python-3.x: How to write separate DOCX files by page from one DOCX file?

Posted by bwitn5fc on 2023-04-22 in Python


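I have an MS Word document that consists of several hundred pages.
Every page is identical except for a person's name, which is unique to each page. (One page is one user.)
I would like to automate saving each page of this document separately, so that I end up with several hundred Word documents, one per person, that I can distribute to the individual people, rather than a single file containing everyone.
I have been using the python-docx module found here: https://python-docx.readthedocs.io/en/latest/
I am struggling to work out how to accomplish this.
As far as I can tell, it is not possible to loop over pages, because pages are not defined in the .docx file itself; they are generated by the rendering program, i.e. Microsoft Word.
However, python-docx can read the text, and every page is the same. So can I not tell Python: when you see this text (the last piece of text on a given page), treat it as the end of a page, and treat anything after that point as a new page?
Ideally I would write a loop that looks for such a point, creates a document containing everything up to it, and repeats this for all pages.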
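The marker idea described above can be sketched roughly as follows. This is only an illustration under several assumptions: the end-of-page text appears in a top-level body paragraph, the content is made of plain paragraphs (tables are not handled), and the names `page_ranges`, `split_docx` and the `page_NNN.docx` file naming are hypothetical. python-docx has no public API for deleting a paragraph, so the sketch removes the underlying XML element through the private `_element` attribute, a commonly used workaround.

```python
from pathlib import Path


def page_ranges(texts, end_marker):
    """Return (start, end) paragraph index pairs, one per logical page.

    A page ends at each paragraph containing `end_marker`. Non-blank
    paragraphs left over after the last marker form a final page.
    """
    ranges, start = [], 0
    for i, text in enumerate(texts):
        if end_marker in text:
            ranges.append((start, i))
            start = i + 1
    if any(t.strip() for t in texts[start:]):
        ranges.append((start, len(texts) - 1))
    return ranges


def split_docx(src, out_dir, end_marker):
    """Write one .docx per logical page by pruning fresh copies of `src`."""
    # python-docx is imported here so page_ranges can be used without it.
    from docx import Document

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    texts = [p.text for p in Document(src).paragraphs]
    written = []
    for n, (start, end) in enumerate(page_ranges(texts, end_marker), start=1):
        doc = Document(src)  # reload so styles, headers and footers are kept
        for i, para in enumerate(list(doc.paragraphs)):
            if not start <= i <= end:
                # Workaround: drop the paragraph's XML element from the body.
                para._element.getparent().remove(para._element)
        target = out_dir / f"page_{n:03d}.docx"
        doc.save(str(target))
        written.append(target)
    return written
```

A call such as `split_docx("document.docx", "out", "Kind regards")` would then write `out/page_001.docx`, `out/page_002.docx`, and so on, where `"Kind regards"` stands in for whatever text actually closes each page. Explicit page breaks stored inside a kept paragraph may still produce an empty trailing page.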
I am not opposed to other approaches, such as converting to PDF first, if that is an option.
Any ideas?

python-3.x

Source: https://stackoverflow.com/questions/59993669/how-to-write-separate-docx-files-by-page-from-one-docx-file

3 Answers


wgeznvg71#

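I had exactly the same problem. Unfortunately, I could not find a way to split a .docx by page. My workaround was to first use python-docx or docx2python (whichever you prefer) to walk through the document and extract each page's unique (person) information into a list, so that you end up with: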

people = ['person_A', 'person_B', 'person_C', ....]
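A minimal sketch of that extraction step with python-docx is shown here. It assumes, purely for illustration, that each person's name sits in its own paragraph introduced by a fixed prefix such as `Name:`; the prefix and the helper names `extract_people` and `read_paragraph_texts` are hypothetical and need adapting to the real document.

```python
def extract_people(paragraph_texts, prefix="Name:"):
    """Collect the text after `prefix` from each matching paragraph, in order."""
    people = []
    for text in paragraph_texts:
        stripped = text.strip()
        if stripped.startswith(prefix):
            people.append(stripped[len(prefix):].strip())
    return people


def read_paragraph_texts(path):
    """Return the text of every top-level body paragraph in a .docx file."""
    # python-docx is imported here so extract_people works without it.
    from docx import Document

    return [p.text for p in Document(path).paragraphs]


# Typical use (not run here):
# people = extract_people(read_paragraph_texts("document.docx"))
```

Because the names later become file names, it is worth checking that the list has exactly one entry per page and that no name contains characters that are invalid in a file name.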

Then save the .docx as a PDF, split the PDF by page, and save the pages as person_A.pdf and so on, like this:

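# Requires PyPDF2 >= 2.0; older releases used PdfFileReader/PdfFileWriter,
# which were removed in PyPDF2 3.0.
from PyPDF2 import PdfReader, PdfWriter

# `people` is the list built above: one name per page, in page order.
reader = PdfReader("document.pdf")

for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    # Save each single-page PDF under the matching person's name.
    with open(f"{people[i]}.pdf", "wb") as output_stream:
        writer.write(output_stream)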

The result is a set of one-page PDF files saved as person_A.pdf, person_B.pdf, and so on.
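The "save the .docx as a PDF" step is not shown in this answer and can simply be done by hand in Word. If it needs to be scripted, one possible route is LibreOffice's headless converter. The sketch below assumes LibreOffice is installed and its `soffice` executable is on the PATH; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(docx_path)]


def convert_to_pdf(docx_path, out_dir="."):
    """Run the conversion and return the expected path of the resulting PDF."""
    subprocess.run(build_convert_command(docx_path, out_dir), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

LibreOffice may paginate a document differently from Word, so it is worth confirming that the PDF's page count equals `len(people)` before splitting it.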


ssgvzors2#

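I would suggest another package, aspose-words-cloud, for splitting a Word document into separate pages. Currently it works with cloud storage (Aspose Cloud Storage, Amazon S3, DropBox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage and FTP Storage). In the near future, however, it will support processing files from the request body (streams).
P.S.: I am a developer evangelist at Aspose.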

# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
import os
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile

# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id='xxxxx-xxxxx-xxxx-xxxxx-xxxxxxxxxxx'
client_secret='xxxxxxxxxxxxxxxxxx'

words_api = asposewordscloud.WordsApi(client_id,client_secret)
words_api.api_client.configuration.host='https://api.aspose.cloud'

remoteFolder = 'Temp'
localFolder = 'C:/Temp'
localFileName = '02_pages.docx'
remoteFileName = '02_pages.docx'

#upload file
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName,'rb'),remoteFolder + '/' + remoteFileName))

#Split DOCX pages as a zip file
request = asposewordscloud.models.requests.SplitDocumentRequest(name=remoteFileName, format='docx', folder=remoteFolder, zip_output= 'true')
result = words_api.split_document(request)
print("Result {}".format(result.split_result.zipped_pages.href))

#download file
request_download=asposewordscloud.models.requests.DownloadFileRequest(result.split_result.zipped_pages.href)
response_download = words_api.download_file(request_download)
copyfile(response_download, 'C:/'+ result.split_result.zipped_pages.href)


0g0grzrc3#

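Here is an online DOCX splitter. It splits an uploaded .docx file into multiple .docx files (one per page of the original) and lets you download them as a .zip file.
(Tested on April 13, 2023 with a 30-page .docx file)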
https://products.groupdocs.app/splitter/docx
