python-3.x: How do I attach multiple files to a PDF?

Posted by vpfxa7rd on 2023-01-27 in Python

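I have a list of objects, List = ['Doc1.xlsx','Doc2.csv','Doc3.pdf'], and a list of their names, List1 = ['Doc1_name.xlsx','Doc2_name.csv','Doc3_name.pdf']. I need to attach them to an existing PDF. The code below works only when there is a single attachment. When I loop over the list to attach all of them, only the last object, 'Doc3.pdf', ends up attached in Final.pdf.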

from PyPDF2 import PdfFileReader, PdfFileWriter

fileReader = PdfFileReader('Existing_pdf.pdf')
fileWriter = PdfFileWriter()
fileWriter.appendPagesFromReader(fileReader)

for j in range(len(List)):
    # addAttachment expects the file's bytes, not its path
    with open(List[j], 'rb') as f:
        fileWriter.addAttachment(List1[j], f.read())

with open('Final.pdf', 'wb') as output_pdf:
    fileWriter.write(output_pdf)
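Since List and List1 are parallel lists, one way to avoid index bookkeeping is to pair them with zip. A minimal sketch, assuming the entries of List are file paths; collect_attachments is a hypothetical helper name, not part of PyPDF2:

```python
from pathlib import Path
from typing import List, Tuple


def collect_attachments(paths: List[str], names: List[str]) -> List[Tuple[str, bytes]]:
    """Pair each file path with its display name and read the file's bytes.

    Hypothetical helper: returns (display_name, data) tuples that can be
    passed to an attachment call one at a time.
    """
    if len(paths) != len(names):
        raise ValueError("paths and names must have the same length")
    # zip walks both lists in lockstep, starting at the first element
    return [(name, Path(path).read_bytes()) for path, name in zip(paths, names)]
```

Each (name, data) pair can then be handed to the attach call inside the loop. On its own this does not fix the overwriting problem discussed in the answers.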

jjhzyzn0 (answer #1)

It looks to me like the addAttachment method always replaces the current attachment.
Here is the method, from pdf.py in the PyPDF2 GitHub repository:

def addAttachment(self, fname, fdata):
    file_entry = DecodedStreamObject()
    file_entry.setData(fdata)
    file_entry.update({
            NameObject("/Type"): NameObject("/EmbeddedFile")
            })

    efEntry = DictionaryObject()
    efEntry.update({ NameObject("/F"):file_entry })

    filespec = DictionaryObject()
    filespec.update({
            NameObject("/Type"): NameObject("/Filespec"),
            NameObject("/F"): createStringObject(fname),  # Perhaps also try TextStringObject
            NameObject("/EF"): efEntry
            })

    embeddedFilesNamesDictionary = DictionaryObject()
    embeddedFilesNamesDictionary.update({
            NameObject("/Names"): ArrayObject([createStringObject(fname), filespec])
            })

    embeddedFilesDictionary = DictionaryObject()
    embeddedFilesDictionary.update({
            NameObject("/EmbeddedFiles"): embeddedFilesNamesDictionary
            })
    # Update the root
    self._root_object.update({
        NameObject("/Names"): embeddedFilesDictionary
        })

I believe that

self._root_object.update({
        NameObject("/Names"): embeddedFilesDictionary
        })

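replaces the attachment instead of adding one: every call overwrites the root's /Names entry with a fresh dictionary whose /EmbeddedFiles name array contains only the current file, so only the last file attached survives.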
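To make the difference concrete, here is a minimal model that uses plain Python dicts and lists as stand-ins for PyPDF2's DictionaryObject and ArrayObject. This is a simplification for illustration only: no PyPDF2 objects are involved, and the function names are hypothetical.

```python
from typing import Dict, List


def add_replacing(root: Dict, fname: str, filespec: Dict) -> None:
    """Mimics the quoted addAttachment: builds a brand-new /Names dict on every call."""
    root["/Names"] = {"/EmbeddedFiles": {"/Names": [fname, filespec]}}


def add_appending(root: Dict, fname: str, filespec: Dict) -> None:
    """Creates the /Names tree once, then extends its flat name array."""
    if "/Names" not in root:
        root["/Names"] = {"/EmbeddedFiles": {"/Names": []}}
    root["/Names"]["/EmbeddedFiles"]["/Names"].extend([fname, filespec])


def attached_names(root: Dict) -> List[str]:
    """Keys sit at even positions of the flat [key1, value1, key2, value2, ...] array."""
    return root["/Names"]["/EmbeddedFiles"]["/Names"][::2]


files = ["Doc1_name.xlsx", "Doc2_name.csv", "Doc3_name.pdf"]

replaced: Dict = {}
appended: Dict = {}
for name in files:
    spec = {"/Type": "/Filespec", "/F": name}
    add_replacing(replaced, name, spec)
    add_appending(appended, name, spec)

print(attached_names(replaced))  # ['Doc3_name.pdf']
print(attached_names(appended))  # ['Doc1_name.xlsx', 'Doc2_name.csv', 'Doc3_name.pdf']
```

The replacing version ends up with only the last file, which matches the symptom in the question.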

**EDIT:** This script attaches two .txt files for me. It uses the addAttachment method above, slightly adjusted to allow attaching multiple files.
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import DecodedStreamObject, NameObject, DictionaryObject, createStringObject, ArrayObject

def appendAttachment(myPdfFileWriterObj, fname, fdata):
    # The entry for the file
    file_entry = DecodedStreamObject()
    file_entry.setData(fdata)
    file_entry.update({NameObject("/Type"): NameObject("/EmbeddedFile")})

    # The Filespec entry
    efEntry = DictionaryObject()
    efEntry.update({ NameObject("/F"):file_entry })

    filespec = DictionaryObject()
    filespec.update({NameObject("/Type"): NameObject("/Filespec"),NameObject("/F"): createStringObject(fname),NameObject("/EF"): efEntry})

    if "/Names" not in myPdfFileWriterObj._root_object.keys():
        # No files attached yet. Create the entry for the root, as it needs a reference to the Filespec
        embeddedFilesNamesDictionary = DictionaryObject()
        embeddedFilesNamesDictionary.update({NameObject("/Names"): ArrayObject([createStringObject(fname), filespec])})

        embeddedFilesDictionary = DictionaryObject()
        embeddedFilesDictionary.update({NameObject("/EmbeddedFiles"): embeddedFilesNamesDictionary})
        myPdfFileWriterObj._root_object.update({NameObject("/Names"): embeddedFilesDictionary})
    else:
        # There are files already attached. Append the new file.
        myPdfFileWriterObj._root_object["/Names"]["/EmbeddedFiles"]["/Names"].append(createStringObject(fname))
        myPdfFileWriterObj._root_object["/Names"]["/EmbeddedFiles"]["/Names"].append(filespec)

fr = PdfFileReader('dummy.pdf')  # the 2nd positional argument is `strict`, not a file mode
fw = PdfFileWriter()
fw.appendPagesFromReader(fr)

my_attach_files = ['test.txt','test2.txt']
for my_test in my_attach_files:
    with open(my_test, 'rb') as my_test_attachment:
        my_test_data = my_test_attachment.read()
    appendAttachment(fw, my_test, my_test_data)

with open('dummy_new.pdf','wb') as file:
    fw.write(file)

Hope this works for you.
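appendAttachment adds entries in the order the files are attached. The PDF specification (ISO 32000-1, section 7.9.6, Name Trees) says the keys in a name tree's /Names array shall be sorted in lexical order, so if files are not attached in name order you may prefer to insert each pair at its sorted position. A minimal sketch, assuming the array is a flat list of alternating keys and values and that file names are ASCII (so Python's string ordering matches byte order); insert_sorted is a hypothetical name, and applying it to PyPDF2's ArrayObject (a list subclass) has not been checked against a specific PyPDF2 version:

```python
import bisect
from typing import Any, List


def insert_sorted(names_array: List[Any], key: str, value: Any) -> None:
    """Insert a (key, value) pair into a flat [k1, v1, k2, v2, ...] list,
    keeping the keys in ascending order."""
    keys = names_array[::2]
    pos = bisect.bisect_right(keys, key)  # position among the keys only
    # Each pair occupies two slots: the key goes at 2*pos, its value right after it
    names_array[2 * pos:2 * pos] = [key, value]


arr: List[Any] = []
for name in ["test2.txt", "a.txt", "test.txt"]:
    insert_sorted(arr, name, {"/F": name})
print(arr[::2])  # ['a.txt', 'test.txt', 'test2.txt']
```

In appendAttachment, the two append calls in the else branch would become a single insert_sorted call on the existing /Names array, passing createStringObject(fname) and filespec.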


qyyhg6bp (answer #2)

**Disclaimer:** I am the author of borb, the library used in this answer.

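In borb, the Document class has a method add_embedded_file, which takes a file name (shown in the PDF viewer) and the file's bytes.
This short snippet shows how to add an embedded file to an existing PDF: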

from borb.pdf import Document
from borb.pdf import PDF

import typing

doc: typing.Optional[Document] = None
with open("input.pdf", "rb") as fh:
    doc = PDF.loads(fh)

# The next line adds an embedded file to the PDF.
# In order to keep this example short, I've used an inline byte string
# but you can of course read a file, and use those bytes
doc.add_embedded_file("name.json", b"{}")

# store
with open("output.pdf", "wb") as fh:
    PDF.dumps(fh, doc)
