How to find the coordinates of a word in a PDF file using PHP or Python? [closed]

Posted by g6baxovj on 2023-04-10 in PHP


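Closed. This question needs to be more focused. It is not currently accepting answers.
**Want to improve this question?** Update the question so it focuses on one problem only by editing this post.

Closed 4 days ago.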
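Example file: https://www.africau.edu/images/default/sample.pdf
In this PDF file, I need the coordinates of the word "boring" on the second page.
I need to get values such as x=25 and y=100.
Once I have them, I need to place new text at that position.
For example: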

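```php
require 'vendor/autoload.php';             // assumes setasign/fpdf and setasign/fpdi installed via Composer
$pdf = new \setasign\Fpdi\Fpdi();
$pdf->setSourceFile('sample.pdf');         // the sample PDF linked above
$pdf->AddPage();
$pdf->useTemplate($pdf->importPage(2));    // draw page 2 of the source file as the background
$pdf->SetFont('Arial', '', 18);
$pdf->SetXY(25, 110);                      // FPDF units: mm from the top-left corner by default
$pdf->Write(5, "Text after boring");
$pdf->Output('F', 'out.pdf');              // write the result to out.pdf
```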


Source: https://stackoverflow.com/questions/75929924/how-to-find-coordinate-of-some-word-in-pdf-file-using-php-or-python

2 Answers


kiayqfof #1

This is no problem with PyMuPDF:

```python
import fitz  # PyMuPDF

doc = fitz.open()
page = doc.new_page()  # default page format A4
# insert some text starting at point (100, 100)
page.insert_text((100, 100), "Hello World out there!")
words = page.get_text("words")
for word in words:
    print(word)
```

This produces the following output:

```text
(100.0, 88.17500305175781, 125.05799865722656, 103.28900146484375, 'Hello', 0, 0, 0)
(128.11599731445312, 88.17500305175781, 156.8369903564453, 103.28900146484375, 'World', 0, 0, 1)
(159.89498901367188, 88.17500305175781, 175.1849822998047, 103.28900146484375, 'out', 0, 0, 2)
(178.24298095703125, 88.17500305175781, 206.36996459960938, 103.28900146484375, 'there!', 0, 0, 3)
```

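The first four floats in each tuple are the bounding box (x0, y0, x1, y1) of the word that follows them; in PyMuPDF the origin is the top-left corner of the page, with y increasing downward. After the word string come the block number, the line number within the block, and the word number within the line.
`.get_text("words")` is only one of six output options, ranging from plain text to per-character details such as font and text properties, writing direction, and so on.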
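A minimal sketch of picking one word out of that list, assuming the tuple layout described above `(x0, y0, x1, y1, word, block_no, line_no, word_no)`. The helper `find_word` is hypothetical, not part of PyMuPDF. Because `"words"` splits on whitespace, punctuation stays attached to the word (as in `'there!'` above), so the sketch strips it before comparing.

```python
import string
from typing import Iterable, List, Tuple

# (x0, y0, x1, y1, text, block_no, line_no, word_no), as returned by page.get_text("words")
WordTuple = Tuple[float, float, float, float, str, int, int, int]
Box = Tuple[float, float, float, float]


def find_word(words: Iterable[WordTuple], target: str) -> List[Box]:
    """Return the bounding box of every word matching `target`.

    Matching is case-insensitive and ignores leading/trailing punctuation,
    so a word such as "Boring," matches the target "boring".
    """
    wanted = target.lower()
    boxes = []
    for x0, y0, x1, y1, text, *_ in words:
        if text.strip(string.punctuation).lower() == wanted:
            boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    # Two of the tuples from the output above.
    sample = [
        (100.0, 88.17500305175781, 125.05799865722656, 103.28900146484375, 'Hello', 0, 0, 0),
        (178.24298095703125, 88.17500305175781, 206.36996459960938, 103.28900146484375, 'there!', 0, 0, 3),
    ]
    print(find_word(sample, "there"))
```

With PyMuPDF, the input for the second page would be `doc[1].get_text("words")` (page indices start at 0). PyMuPDF's own `page.search_for("boring")` is an alternative that returns `Rect` objects directly.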

Answered 2023-04-10

i2byvkas #2

You can open the sample in Notepad and, in very rare cases, read the values you want directly.
Usually you have to decompress the file first.
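When the content streams are compressed (usually with FlateDecode), they have to be inflated before text operators such as `Td` and `Tj` become readable in a text editor. Tools such as `mutool clean -d` do this properly. The sketch below is only a naive stdlib illustration: it assumes plain FlateDecode streams (no other filters or predictors) and finds them with a regular expression instead of parsing the PDF's object structure. `inflate_streams` is a hypothetical helper name.

```python
import re
import zlib
from typing import List

# Naive match of the bytes between the "stream" and "endstream" keywords
# (ignores /Length; any trailing end-of-line bytes are left in the data).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the decompressed contents of every stream zlib can inflate.

    Streams that are not Flate-compressed (or are otherwise undecodable)
    are skipped. zlib ignores bytes after the end of the compressed data,
    so a trailing end-of-line before "endstream" is harmless.
    """
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated.append(zlib.decompress(match.group(1)))
        except zlib.error:
            continue  # not FlateDecode, or corrupted
    return inflated


if __name__ == "__main__":
    with open("sample.pdf", "rb") as fh:
        for data in inflate_streams(fh.read()):
            print(data[:200])
```

Even after decompression, the word may be split into several strings or stored in an encoded form, which is why the conditions listed in this answer rarely all hold.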

Tentative answer (see below for the actual values):

69.2500 640.8000

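However, it is very, very rare for the "word of interest" to meet all of these conditions:
A) it is not split into separate characters or encoded;
B) it is the first word in that line's string;
C) it is at the default scale and size (10 pt), without any other transformation.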
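Note that the Y value is measured against /MediaBox [0 0 612.0000 792.0000].
PDF uses a graph-like coordinate system whose origin is usually at the bottom left, with y increasing upward, so measured from the top left your target is likely at 792 - 640.8 = 151.2.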
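The flip described above is a subtraction from the top edge of the MediaBox. A minimal sketch, assuming an unrotated page with no extra transformation (no /Rotate, no /CropBox offset); `to_top_left` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def to_top_left(x: float, y: float, mediabox: Sequence[float]) -> Tuple[float, float]:
    """Convert PDF user-space coordinates (origin bottom-left, y up)
    to distances from the page's top-left corner (y down).

    `mediabox` is (llx, lly, urx, ury), e.g. (0, 0, 612, 792) for
    /MediaBox [0 0 612.0000 792.0000]. Assumes an unrotated page.
    """
    llx, _lly, _urx, ury = mediabox
    return x - llx, ury - y


if __name__ == "__main__":
    x, y = to_top_left(72.03, 640.8, (0, 0, 612, 792))
    print(round(x, 2), round(y, 2))  # 72.03 151.2
```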

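However, we can scan the page with mutool to get the actual position of the word's first letter, "B" (the command below uses Windows `find`; on Linux or macOS use `grep`):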

```
mutool trace sample.pdf 2 | find "glyph" | find "B"
        <g unicode="B" glyph="B" x="236.55995" y="664.704" adv=".667"/>
        <g unicode="B" glyph="B" x="72.03" y="640.8" adv=".667"/>
```
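Instead of reading the trace output by eye, the `<g .../>` lines can be parsed. A minimal sketch that treats each glyph line as a standalone XML element (true of the two lines shown above); `glyph_positions` is a hypothetical helper, and the input is assumed to be the output of the `mutool trace` command shown.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def glyph_positions(trace_lines: Iterable[str], char: str) -> List[Tuple[float, float]]:
    """Return (x, y) for every <g> element whose unicode attribute equals `char`."""
    positions = []
    for line in trace_lines:
        line = line.strip()
        if not line.startswith("<g "):
            continue  # skip everything that is not a single glyph element
        elem = ET.fromstring(line)
        if elem.get("unicode") == char:
            positions.append((float(elem.get("x")), float(elem.get("y"))))
    return positions


if __name__ == "__main__":
    trace = [
        '        <g unicode="B" glyph="B" x="236.55995" y="664.704" adv=".667"/>',
        '        <g unicode="B" glyph="B" x="72.03" y="640.8" adv=".667"/>',
    ]
    print(glyph_positions(trace, "B"))  # [(236.55995, 664.704), (72.03, 640.8)]
```

The full trace could be captured with `subprocess.run(["mutool", "trace", "sample.pdf", "2"], capture_output=True, text=True).stdout.splitlines()` and passed in as `trace_lines`.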

So, with this knowledge, the final answer is:

X = 72.03 and Y = 151.2 (in PDF points, measured from the top-left corner of the page)
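To use these values with the FPDI/FPDF code in the question, note that FPDF measures in millimetres from the top-left corner by default, while the values above are PDF points (1 pt = 1/72 inch = 25.4/72 mm). A minimal conversion sketch; `pt_to_mm` is a hypothetical helper. The traced y is the glyph's origin, which normally sits on the text baseline, so the new text may still need a small vertical adjustment.

```python
PT_PER_INCH = 72.0
MM_PER_INCH = 25.4


def pt_to_mm(value_pt: float) -> float:
    """Convert PDF points (1/72 inch) to millimetres, FPDF's default unit."""
    return value_pt * MM_PER_INCH / PT_PER_INCH


if __name__ == "__main__":
    x_mm, y_mm = pt_to_mm(72.03), pt_to_mm(151.2)
    print(f"SetXY({x_mm:.2f}, {y_mm:.2f})")  # SetXY(25.41, 53.34)
```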

Answered 2023-04-10
