HTML: How do I use only the data from half of a page?

Posted by zujrkrfu on 2023-11-15 in Other


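I want to import a transcript from a website but analyze only half of the data. I have already imported the URL, and I want to count the total number of unique words in the text, but only starting from the line "The Rental of the Manor of Mayfield, 1545" in the transcript. Does anyone know what code I could use? I don't know how to count words from a URL when starting only from a certain part. This is what I have written so far: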

import requests
source = 'http://www.myjacobfamily.com/historical%20manuscripts/mayfield%201.htm'
r = requests.get(source)
print(r.text)


Source: https://stackoverflow.com/questions/54872148/importing-a-url-how-do-i-only-use-data-from-half-the-page

1 Answer


gcuhipw91#

I've included code below that I think does what you're looking for.

import requests
import bs4

# Download the page and parse its HTML.
response = requests.get('http://www.myjacobfamily.com/historical%20manuscripts/mayfield%201.htm')
soup = bs4.BeautifulSoup(response.text, 'html.parser')
lines = soup.find_all('p')  # every <p> element on the page
story = []
record = False  # becomes True once the heading has been seen
for line in lines:
    if "The Rental of the Manor of Mayfield, 1545." in line.text:
        # Start recording at the heading itself.
        story.append(line.text)
        record = True
        continue
    if record and "---" in line.text:
        # A dashed separator line marks the end of the story.
        break
    if record:
        story.append(line.text)
print(story)

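In this code, I extract one story from the link you posted (what do you mean by "half the page"?) by using the BeautifulSoup module to parse everything between the <p> and </p> tags. You can inspect this markup with the developer tools in your web browser. Once all the lines are loaded, the code iterates over them until it encounters "The Rental of the Manor of Mayfield, 1545." From that point on it collects every line until it reaches a line containing "---" (which seems to be how the website separates stories). It then breaks out of the loop and prints the story. You can join this list into a single string: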

"".join(story)
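Since the question asks for the number of unique words, here is a minimal sketch of how that count could be taken from the list of paragraphs. The function name count_unique_words and the definition of a "word" (a run of letters, case-insensitive, digits ignored) are assumptions made for illustration, not something the original post specifies. Note that it joins with a space rather than an empty string, so that the last word of one paragraph does not fuse with the first word of the next.

```python
import re


def count_unique_words(paragraphs):
    """Return the number of distinct words across a list of paragraph strings."""
    # Join with a space so words at paragraph boundaries stay separate.
    text = " ".join(paragraphs)
    # Assumption: a word is a run of letters, optionally with internal
    # apostrophes; case is ignored and digits are not counted as words.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return len(set(words))


# Stand-in data for illustration; in practice, pass the `story` list built above.
sample = ["The Rental of the Manor of Mayfield, 1545.", "The manor and the rental."]
print(count_unique_words(sample))
```

A set is used because it discards duplicates automatically; if per-word frequencies were also wanted, collections.Counter would be a natural replacement.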

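I think it would be much easier to copy the story you want into a text document and then process that document with something like Python. Web scraping would definitely not be my first choice for solving this problem.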
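As a rough sketch of that text-file approach, the snippet below keeps only the text from the heading onward and collects the unique words. The file name mayfield.txt and the helper names are hypothetical, and the word-splitting rule (whitespace-separated, surrounding punctuation stripped, case ignored) is an assumption.

```python
from pathlib import Path

MARKER = "The Rental of the Manor of Mayfield, 1545"


def text_from_marker(text, marker=MARKER):
    """Return the part of `text` starting at `marker`, or "" if it is absent."""
    start = text.find(marker)
    return text[start:] if start != -1 else ""


def unique_words(text):
    """Return the set of distinct words in `text`."""
    # Assumption: words are whitespace-separated; surrounding punctuation
    # is stripped and case is ignored.
    cleaned = (w.strip(".,;:!?\"'()[]").lower() for w in text.split())
    return {w for w in cleaned if w}


if __name__ == "__main__":
    path = Path("mayfield.txt")  # hypothetical file holding the pasted text
    if path.exists():
        section = text_from_marker(path.read_text(encoding="utf-8"))
        print(len(unique_words(section)))
```

If the pasted text already starts at the heading, text_from_marker simply returns all of it, so the same script works whether or not the file was trimmed by hand.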

Answered on 2023-11-15
