How to split a multi-page PDF file into multiple PDF files with Python?

dgjrabp2 posted on 2023-01-29 in Python

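I want to take a multi-page PDF file and create a separate PDF file for each page.
I have downloaded reportlab and looked through its documentation, but it seems to be aimed at generating PDFs; I haven't seen anything about processing existing PDF files.
Is there an easy way to do this in Python?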

mlnl4t2r1#

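# Legacy PyPDF2 API (PyPDF2 < 3.0); see the last answer for the current names
from PyPDF2 import PdfFileWriter, PdfFileReader

with open("document.pdf", "rb") as inputStream:
    inputpdf = PdfFileReader(inputStream)

    # Write each page to its own file: document-page0.pdf, document-page1.pdf, ...
    for i in range(inputpdf.numPages):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)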

etc.

qxsslcnc2#

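I couldn't find a solution here for splitting a PDF into two parts that together contain all of its pages, so I'm adding mine in case anyone is looking for the same thing: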

from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf_to_two(filename, page_number):
    # Legacy PyPDF2 API (< 3.0). The input file must stay open until both parts are written.
    with open(filename, "rb") as f:
        pdf_reader = PdfFileReader(f)
        # Explicit check instead of assert, which is skipped under python -O
        if page_number >= pdf_reader.numPages:
            print("Error: The PDF you are cutting has fewer pages than you want to cut!")
            return

        pdf_writer1 = PdfFileWriter()
        pdf_writer2 = PdfFileWriter()

        # Pages 0 .. page_number-1 go into part1.pdf
        for page in range(page_number):
            pdf_writer1.addPage(pdf_reader.getPage(page))

        # Pages page_number .. last go into part2.pdf
        for page in range(page_number, pdf_reader.numPages):
            pdf_writer2.addPage(pdf_reader.getPage(page))

        with open("part1.pdf", 'wb') as file1:
            pdf_writer1.write(file1)

        with open("part2.pdf", 'wb') as file2:
            pdf_writer2.write(file2)
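
The same idea generalizes to more than one cut. Below is a minimal sketch, assuming the maintained pypdf package is installed (`pip install pypdf`); the helper names `split_points_to_ranges` and `split_at` are hypothetical. A cut at position k means page k (0-based) starts a new part, matching `page_number` above, and the range computation is plain Python so it can be checked without a PDF:

```python
def split_points_to_ranges(num_pages, split_points):
    """Turn 0-based cut positions into half-open (start, stop) page ranges."""
    points = sorted(set(split_points))
    if any(p <= 0 or p >= num_pages for p in points):
        raise ValueError("split points must lie strictly between 0 and num_pages")
    bounds = [0] + points + [num_pages]
    return list(zip(bounds[:-1], bounds[1:]))


def split_at(input_path, split_points, prefix="part"):
    """Write part1.pdf, part2.pdf, ... (hypothetical helper, pypdf assumed)."""
    from pypdf import PdfReader, PdfWriter  # assumption: pypdf is installed

    reader = PdfReader(input_path)
    ranges = split_points_to_ranges(len(reader.pages), split_points)
    for n, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for i in range(start, stop):
            writer.add_page(reader.pages[i])
        with open(f"{prefix}{n}.pdf", "wb") as out:
            writer.write(out)
    return ranges
```

For example, `split_at("document.pdf", [3, 7])` would write pages 1-3, 4-7 and 8-end into part1.pdf, part2.pdf and part3.pdf.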

k3fezbri3#

The PyPDF2 package lets you split a single PDF into multiple PDFs.

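import os
from PyPDF2 import PdfFileReader, PdfFileWriter

def pdf_splitter(path):
    # Base name without directory or extension, e.g. "w9" for "docs/w9.pdf"
    fname = os.path.splitext(os.path.basename(path))[0]

    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

        # Pages are numbered from 1 in the output file names
        output_filename = '{}_page_{}.pdf'.format(fname, page+1)

        with open(output_filename, 'wb') as out:
            pdf_writer.write(out)

        print('Created: {}'.format(output_filename))

# Example: pdf_splitter("document.pdf")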

Source: https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/
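
One practical wrinkle with the `'{}_page_{}.pdf'` naming: for documents with 10 or more pages, plain numbers sort as `_page_1`, `_page_10`, `_page_2` in most file listings. A minimal sketch of zero-padding the page number to the width of the page count; `page_filename` is a hypothetical helper, not part of PyPDF2:

```python
import os


def page_filename(path, page_index, num_pages):
    """Build '<name>_page_<N>.pdf' with N 1-based and zero-padded.

    The width matches the digit count of num_pages, so a 12-page
    document gives '_page_01' ... '_page_12'.
    """
    fname = os.path.splitext(os.path.basename(path))[0]
    width = len(str(num_pages))
    return f"{fname}_page_{page_index + 1:0{width}d}.pdf"
```

It could replace the `output_filename = ...` line inside the loop, passing the total page count as `num_pages`.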

w6lpcovy4#

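I know this code isn't Python, but I still want to post this R code: it is simple, flexible, and works remarkably well. The pdftools package in R makes splitting and merging PDFs easy.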

library(pdftools) #Rpackage
pdf_subset('D:\\file\\20.02.20\\22 GT 2017.pdf',
           pages = 1:51, output = "subset.pdf")
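
For readers who want the same thing in Python: `pages = 1:51` in R is a 1-based, inclusive sequence, whereas pypdf's `reader.pages` is indexed from 0. A minimal sketch of an equivalent subset extraction, assuming pypdf is installed; `one_based_to_indices` and `pdf_subset_py` are hypothetical names chosen to mirror the R call:

```python
def one_based_to_indices(pages, num_pages):
    """Convert R-style 1-based page numbers to 0-based indices, with bounds checks."""
    indices = []
    for p in pages:
        if not 1 <= p <= num_pages:
            raise ValueError(f"page {p} is out of range 1..{num_pages}")
        indices.append(p - 1)
    return indices


def pdf_subset_py(input_path, pages, output="subset.pdf"):
    """Copy the given 1-based pages into a new PDF (pypdf assumed)."""
    from pypdf import PdfReader, PdfWriter  # assumption: pypdf is installed

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for i in one_based_to_indices(pages, len(reader.pages)):
        writer.add_page(reader.pages[i])
    with open(output, "wb") as out:
        writer.write(out)
```

In Python, `range(1, 52)` plays the role of R's `1:51`, so the call would look like `pdf_subset_py("22 GT 2017.pdf", range(1, 52))`.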

elcex8rz5#

import fitz  # PyMuPDF (pip install pymupdf)

src = fitz.open("source.pdf")
for page in src:
    tar = fitz.open()  # output PDF for 1 page
    # copy over current page
    tar.insert_pdf(src, from_page=page.number, to_page=page.number)
    tar.save(f"page-{page.number}.pdf")
    tar.close()
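
The same `insert_pdf` call can also copy runs of pages: `from_page` and `to_page` are 0-based and both inclusive. A minimal sketch of splitting into fixed-size chunks, assuming PyMuPDF is installed (imported as `fitz`, as in the answer above); `chunk_bounds`, `split_into_chunks` and the `chunk-N.pdf` naming are hypothetical:

```python
def chunk_bounds(num_pages, chunk_size):
    """Inclusive (from_page, to_page) pairs, 0-based, as insert_pdf expects."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, num_pages) - 1)
        for start in range(0, num_pages, chunk_size)
    ]


def split_into_chunks(path, chunk_size):
    """Write chunk-0.pdf, chunk-1.pdf, ... with up to chunk_size pages each."""
    import fitz  # PyMuPDF; assumption: installed

    src = fitz.open(path)
    for n, (first, last) in enumerate(chunk_bounds(len(src), chunk_size)):
        tar = fitz.open()  # new, empty output PDF
        tar.insert_pdf(src, from_page=first, to_page=last)
        tar.save(f"chunk-{n}.pdf")
        tar.close()
    src.close()
```

With `chunk_size=1` this reduces to the one-page-per-file loop above; the last chunk simply holds whatever pages remain.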

3zwtqj6y6#

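The earlier answers that split PDFs with PyPDF2 no longer work after the latest version update. The author recommends switching to pypdf; PyPDF2==3.0.1 will be the last release of PyPDF2. The function needs to be changed as follows: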

import os
from PyPDF2 import PdfReader, PdfWriter

def split_pdfs(input_file_path):
    # PdfReader accepts a path directly, so no file handle is left open
    inputpdf = PdfReader(input_file_path)

    out_paths = []
    os.makedirs("outputs", exist_ok=True)

    # Strip directory and extension: "docs/report.pdf" -> "report"
    base_name = os.path.splitext(os.path.basename(input_file_path))[0]

    for i, page in enumerate(inputpdf.pages):
        output = PdfWriter()
        output.add_page(page)

        out_file_path = os.path.join("outputs", f"{base_name}_{i}.pdf")
        with open(out_file_path, "wb") as output_stream:
            output.write(output_stream)

        out_paths.append(out_file_path)
    return out_paths

Note: the same function also works with pypdf. Import PdfReader and PdfWriter from pypdf instead of PyPDF2.
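
If the code has to run in environments where either package might be installed, a minimal sketch that prefers pypdf and falls back to PyPDF2 3.x, which exposes the same `PdfReader`/`PdfWriter` names; `get_pdf_classes` is a hypothetical helper:

```python
def get_pdf_classes():
    """Return (PdfReader, PdfWriter), preferring pypdf over PyPDF2."""
    try:
        from pypdf import PdfReader, PdfWriter  # maintained successor
    except ImportError:
        # Fallback: PyPDF2 3.x provides the same two class names
        from PyPDF2 import PdfReader, PdfWriter
    return PdfReader, PdfWriter
```

With `PdfReader, PdfWriter = get_pdf_classes()` at the top of the module instead of the fixed import, `split_pdfs` works unchanged with either package; if neither is installed, the `ImportError` from the second import propagates.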
