Python: manipulating the content of a PDF with pypdf

rkkpypqq, posted on 2023-03-28 in Python

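I have a method that uses PyPDF4 1.27.0 to change the color of all black objects in a PDF file. I now want to switch to pypdf 3.6.0, but I don't know how to port this code.
I have reduced the code to a small working example, which turns all black objects red.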

from PyPDF4 import PdfFileReader, PdfFileWriter
from PyPDF4.pdf import ContentStream
from PyPDF4.generic import TextStringObject, NameObject, NumberObject, FloatObject
from PyPDF4.utils import b_

with open("test.pdf", "rb") as f:
    # The second positional argument is `strict`, not a file mode, so it is omitted.
    source = PdfFileReader(f)
    output = PdfFileWriter()

    for page in range(source.getNumPages()):
        page = source.getPage(page)
        content_object = page["/Contents"].getObject()
        content = ContentStream(content_object, source)

        # Replace every black fill (rg) or stroke (RG) color with red.
        for i, (operands, operator) in enumerate(content.operations):
            if operator in (b_("rg"), b_("RG")) and operands == [0, 0, 0]:
                content.operations[i] = ([FloatObject(1.0),
                                          FloatObject(0.0),
                                          FloatObject(0.0)],
                                         operator)

        page.__setitem__(NameObject('/Contents'), content)
        output.addPage(page)

    with open("test_colored.pdf", "wb") as outputStream:
        output.write(outputStream)

I feel like I have tried everything without making any progress. This is where I am now:

import pypdf
from io import BytesIO

with open("test.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())

# Read from bytes_stream
pdfReader = pypdf.PdfReader(bytes_stream)

page = pdfReader.pages[0]
content_object = page.get_contents()
content = content_object.get_data()

content = content.replace(b'0 0 0 rg', b'0.99 0.0 0.0 rg')
content = content.replace(b'0 0 0 RG', b'0.99 0.0 0.0 RG')

writer = pypdf.PdfWriter()

# Somehow add the content to new_page ??

writer.add_page(new_page)

with open("test_colored.pdf", "wb") as f:
    writer.write(f)

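I tried to find an equivalent of content = ContentStream(content_object, source) that would let me iterate over the content the same way as before, with for operands, operator in content.operations:, but I could not find one.
The next problem is that I don't know how to create a page from the content that I can then add to the writer.
I have this working with PyPDF2 1.26.0, PyPDF4 1.27.0 and PyMuPDF (fitz) 1.21.1, but I would really like to switch to pypdf.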

Answer #1 (7kqas0il)

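Well, I feel stupid. It turns out that most of my problems came from the fact that pypdf was installed with pip while there was also a folder named pypdf in the folder where I keep my Python script, so the local folder was presumably picked up instead of the installed package.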
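
A quick way to check for this kind of name clash is to ask Python where it would load a module from. The following is only a minimal sketch using the standard library; the helper name module_origin is made up for illustration and is not part of pypdf.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return where Python would import `name` from, or None if it is not found.

    `module_origin` is a hypothetical helper for this example only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin:
        # Regular modules and packages report the file they are loaded from.
        return spec.origin
    # A bare folder without __init__.py is a namespace package and has no
    # origin, so fall back to the first directory it was found in.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    # If this prints a path inside the script's own folder rather than
    # site-packages, a local file or folder is shadowing the installed package.
    print(module_origin("pypdf"))
```

If the printed path points into the project folder, renaming or removing the local pypdf folder should let the pip-installed package be imported again.
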
I have now found the right way to do it in pypdf 3.7.0 (the latest version). This example turns all black objects red.

from pypdf import PdfReader, PdfWriter
from pypdf.generic import ContentStream, FloatObject, NameObject

with open("test.pdf", "rb") as f:
    # The second positional argument of PdfReader is `strict`, not a file
    # mode, so the stray "rb" is dropped here.
    source = PdfReader(f)
    output = PdfWriter()

    for page in source.pages:
        # Parse the page's content stream into (operands, operator) pairs.
        content_object = page["/Contents"].get_object()
        content = ContentStream(content_object, source)

        # rg sets the fill color and RG the stroke color, both in RGB.
        for i, (operands, operator) in enumerate(content.operations):
            if operator in (b"rg", b"RG") and operands == [0, 0, 0]:
                content.operations[i] = (
                    [FloatObject(1.0), FloatObject(0.0), FloatObject(0.0)],
                    operator,
                )

        # Put the modified stream back on the page and copy the page over.
        page[NameObject("/Contents")] = content
        output.add_page(page)

    with open("test_colored.pdf", "wb") as output_stream:
        output.write(output_stream)
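
The replacement step in the loop above can also be pulled out into a small function that only touches the list of (operands, operator) pairs, which makes it easy to try out without a PDF at hand. This is a sketch under assumptions, not part of pypdf: the name recolor_operations and its parameters old, new and wrap are invented for this example.

```python
from typing import Callable, List, Sequence, Tuple

Operation = Tuple[list, bytes]


def recolor_operations(
    operations: List[Operation],
    old: Sequence[float] = (0, 0, 0),
    new: Sequence[float] = (1.0, 0.0, 0.0),
    wrap: Callable[[float], object] = float,
) -> int:
    """Replace RGB color `old` with `new` in place; return the number of hits.

    Hypothetical helper: `wrap` converts each new component into the number
    type the PDF library expects (plain float by default).
    """
    replaced = 0
    for i, (operands, operator) in enumerate(operations):
        # rg sets the fill color, RG the stroke color, both with 3 operands.
        if operator in (b"rg", b"RG") and list(operands) == list(old):
            operations[i] = ([wrap(component) for component in new], operator)
            replaced += 1
    return replaced


if __name__ == "__main__":
    demo = [([0, 0, 0], b"rg"), ([0.5, 0.5, 0.5], b"rg"), ([0, 0, 0], b"RG")]
    print(recolor_operations(demo), demo)
```

With pypdf, the call would presumably look like recolor_operations(content.operations, wrap=FloatObject), so that the new operands are pypdf number objects that can be written back to the stream. Note that this only catches black set through rg/RG; a PDF can also set black with the gray (g/G) or CMYK (k/K) operators, which this sketch leaves untouched.
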
