PDF shows up as a Buffer object in Node.js: how do I parse it into text?

gudnpqoy, posted 9 months ago in Node.js

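I am handling files on the backend. I am trying to send a PDF to the backend and then parse it so that I can read its text. Sending the PDF to the backend seems to work, but I don't know how to read the PDF's text once it arrives. Here is my POST route: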

app.post("/submitPDF", (request, response) => {
  console.log("Made a post request>>>", request.body);

  // if (!request.files && !request.files.pdfFile) {
  //   console.log("No file received");
  //   response.status(400);
  //   response.end();
  // }
  pdfParse(request.body.pdfFile).then((result) => {
    console.log(result.text);
  });
  response.status(201).send({ message: "File upload successful" });
});

Below is my API POST request, just to show how I send the PDF. I create a FormData object, append my PDF to it, and send it as the body of the POST request:

export const fetchPDF = (value) => {
  console.log("The value>>>", value);
  const formData = new FormData();
  formData.append('pdfFile', value);
  console.log(Object.fromEntries(formData.entries())) // this is how to console log items in FormData

  return fetch(`${baseURL}/submitPDF`, {
    method: 'POST',
    headers: {
      'Content-Type': 'multipart/form-data', // had to change content-type to accept pdfs. this fixed the cors error
    },
    body: formData
  })
    .then((response) => {
      if (response.ok) {
        console.log("The response is ok");
        return response;
      } else {
        // If not successful, handle the error
        console.log("the response is not ok", response);
        throw new Error(`Error: ${response.status} - ${response.statusText}`);
      }
    })
    .catch((error) => {
      console.log("There is an error>>>", error.message);
    })
}
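One thing that may be worth checking in the client code above: when the body is a FormData object, fetch normally generates the Content-Type header itself, including the boundary parameter that a multipart parser on the server needs. Setting the header by hand to plain 'multipart/form-data' leaves that boundary out. Here is a minimal sketch of the effect, assuming Node 18+ (or a browser) where FormData, Blob and Request are globals. The URL and the stand-in file contents are placeholders, not taken from the question.

```javascript
// Build the same kind of body as fetchPDF, but let the runtime set Content-Type.
const formData = new FormData();

// Stand-in for the real file; in the browser this would be the File from an <input>.
const fakePdf = new Blob(['%PDF-1.4 placeholder'], { type: 'application/pdf' });
formData.append('pdfFile', fakePdf, 'example.pdf');

// Constructing a Request (which fetch also does internally) fills in the header.
const request = new Request('http://localhost:3000/submitPDF', {
  method: 'POST',
  body: formData, // no `headers` option on purpose
});

// The generated header carries the boundary that separates the form parts.
const contentType = request.headers.get('content-type');
console.log(contentType); // starts with "multipart/form-data; boundary="
```

If the CORS error comes back after removing the manual header, that is probably better handled on the server (for example with CORS middleware) than by overriding Content-Type on the client.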


When I console.log request.body, which contains the PDF, I get a Buffer object like this:
<Buffer 2d 2d 2d 2d 2d 2d 57 65 62 4b 69 74 46 6f ...>
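A side note on that output: the bytes shown are all printable ASCII, and decoding them suggests that request.body is the raw multipart payload rather than the PDF itself. A quick check, using only the fourteen bytes quoted above:

```javascript
// The first bytes of request.body, exactly as logged in the question.
const firstBytes = Buffer.from('2d2d2d2d2d2d5765624b6974466f', 'hex');

// Decode them as ASCII to see what they spell.
const asText = firstBytes.toString('ascii');
console.log(asText); // prints "------WebKitFo"

// A PDF file begins with the signature "%PDF-", so this is not the start of a PDF.
const startsLikePdf = asText.startsWith('%PDF-');
console.log(startsLikePdf); // prints false
```

"------WebKitFo" looks like the beginning of the "------WebKitFormBoundary..." delimiter that WebKit-based browsers put at the top of a multipart/form-data body. If so, the PDF bytes sit further inside the body, and request.body.pdfFile is undefined because nothing has split the body into its parts yet.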
I tried to parse my PDF with pdf-parse, like this:

pdfParse(request.body.pdfFile).then((result) => {
  console.log(result.text);
});


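I got these two errors:
throw new Error('Invalid parameter in getDocument, ' + 'need either Uint8Array, string or a parameter object');
Error: Invalid parameter in getDocument, need either Uint8Array, string or a parameter object
It seems I have to parse the Buffer object, but I'm not sure exactly how. Do I have to convert the Buffer into a string? If so, how? And once that is done, can I use pdf-parse to read the PDF's text?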
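On the string question: converting to a string should not be needed. A Node.js Buffer is a subclass of Uint8Array, which is one of the input types the error message lists, and as far as I know pdf-parse accepts a Buffer directly. The error is more consistent with the argument being undefined. The sketch below uses a hypothetical helper, describePdfInput (not part of pdf-parse or any library), to show the cases worth distinguishing before calling the parser:

```javascript
// Hypothetical guard: classify a value before handing it to a PDF parser.
function describePdfInput(value) {
  if (value === undefined || value === null) {
    return 'missing'; // e.g. request.body.pdfFile when no multipart parser ran
  }
  if (!(value instanceof Uint8Array)) {
    return 'not binary';
  }
  // PDF files start with the ASCII signature "%PDF-".
  const signature = Buffer.from(value.subarray(0, 5)).toString('latin1');
  return signature === '%PDF-' ? 'pdf bytes' : 'binary, but not a PDF';
}

// Buffer extends Uint8Array, so no conversion to a string is required.
console.log(Buffer.from('%PDF-1.7\n') instanceof Uint8Array); // prints true

console.log(describePdfInput(undefined)); // prints "missing"
console.log(describePdfInput(Buffer.from('------WebKitFormBoundary'))); // prints "binary, but not a PDF"
console.log(describePdfInput(Buffer.from('%PDF-1.7\n'))); // prints "pdf bytes"
```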

zyfwsgd6 #1

You need some middleware to handle the file upload.
I suggest Multer: https://github.com/expressjs/multer
Then update your code along these lines:

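const express = require('express')
const fs = require('fs')
const multer  = require('multer')
const pdfParse = require('pdf-parse')
// dest: 'uploads/' makes multer write each upload to disk and set req.file.path
const upload = multer({ dest: 'uploads/' })

const app = express()

// 'pdf' must match the field name the client passes to formData.append()
app.post('/profile', upload.single('pdf'), function (req, res, next) {
    // req.file is the `pdf` file
    // req.body will hold the text fields, if there were any

    // The upload is in req.file, not req.body, so read it back as a Buffer
    fs.promises.readFile(req.file.path)
        .then((data) => pdfParse(data))
        .then((result) => {
            // ... use result.text here, as in the question
        })
        .catch(next)
})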
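Building on the Multer suggestion, the route in the question also replies before parsing has finished and never checks that a file arrived. The sketch below shows one way to handle both, assuming Multer is configured with memoryStorage() so that the uploaded bytes are available at req.file.buffer. makePdfTextHandler is a hypothetical helper, and the 422 status for a parse failure is my own choice. The parser is passed in as a parameter so the handler can be exercised without a real PDF.

```javascript
// Hypothetical factory: wraps any parser shaped like parsePdf(buffer) -> Promise<{ text }>.
function makePdfTextHandler(parsePdf) {
  return async function handlePdfUpload(req, res) {
    // With multer's memory storage, the uploaded bytes are expected at req.file.buffer.
    if (!req.file || !req.file.buffer) {
      return res.status(400).send({ message: 'No file received' });
    }
    try {
      const result = await parsePdf(req.file.buffer);
      // Reply only after parsing has finished, so failures can be reported too.
      return res.status(201).send({ message: 'File upload successful', text: result.text });
    } catch (error) {
      return res.status(422).send({ message: `Could not parse PDF: ${error.message}` });
    }
  };
}

// Possible wiring, assuming express, multer and pdf-parse are installed:
//   const upload = multer({ storage: multer.memoryStorage() });
//   app.post('/submitPDF', upload.single('pdfFile'), makePdfTextHandler(require('pdf-parse')));
```

Here the field name 'pdfFile' and the route '/submitPDF' are taken from the question's client code, so the FormData call there would not need to change.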
