- Python: 3.11.2
- Python editor: PyCharm 2022.3.3 (Community Edition), Build PC-223.8836.43
- Operating system: Windows 11 Pro, 22H2, 22621.1413
- Browser: Chrome 111.0.5563.65 (Official Build) (64-bit)
I have a URL (for example, https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963) from which I scrape nine items. I would like the script to create a CSV file and write my scraped output (the nine items) into columns of that CSV file. Is there a really simple way to do this?
from bs4 import BeautifulSoup
import requests
import csv

html_text = requests.get("https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963").text
soup = BeautifulSoup(html_text, "lxml")

# Each case is wrapped in one of these "jcard" divs.
cases = soup.find_all("div", class_="wrapper jcard has-padding-30 blocks has-no-bottom-padding")

for case in cases:
    # Normalize non-breaking spaces (\xa0) to regular spaces before stripping.
    case_title = case.find("div", class_="title-wrapper").text.replace("\xa0", " ")
    case_plaintiff = case.find("td", {"data-th": "Plaintiff"}).text.replace("\xa0", " ")
    case_defendant = case.find("td", {"data-th": "Defendant"}).text.replace("\xa0", " ")
    case_number = case.find("td", {"data-th": "Case Number"}).text.replace("\xa0", " ")
    case_filed = case.find("td", {"data-th": "Filed"}).text.replace("\xa0", " ")
    court = case.find("td", {"data-th": "Court"}).text.replace("\xa0", " ")
    case_nature_of_suit = case.find("td", {"data-th": "Nature of Suit"}).text.replace("\xa0", " ")
    case_cause_of_action = case.find("td", {"data-th": "Cause of Action"}).text.replace("\xa0", " ")
    jury_demanded = case.find("td", {"data-th": "Jury Demanded By"}).text.replace("\xa0", " ")

    print(case_title.strip())
    print(case_plaintiff.strip())
    print(case_defendant.strip())
    print(case_number.strip())
    print(case_filed.strip())
    print(court.strip())
    print(case_nature_of_suit.strip())
    print(case_cause_of_action.strip())
    print(jury_demanded.strip())
3 Answers

Answer 1:
Build a list of lists from your data and dump it to CSV, as sketched below.
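A minimal sketch of that list-of-lists approach, reusing the selectors from the question; the cases.csv filename and the header labels are illustrative choices, not anything the answer specifies:

import csv

import requests
from bs4 import BeautifulSoup

URL = "https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963"
FIELDS = ["Plaintiff", "Defendant", "Case Number", "Filed", "Court",
          "Nature of Suit", "Cause of Action", "Jury Demanded By"]

soup = BeautifulSoup(requests.get(URL).text, "lxml")
cases = soup.find_all("div", class_="wrapper jcard has-padding-30 blocks has-no-bottom-padding")

# Build one list (row) of nine cleaned values per case.
rows = []
for case in cases:
    title = case.find("div", class_="title-wrapper").text.replace("\xa0", " ").strip()
    cells = [case.find("td", {"data-th": f}).text.replace("\xa0", " ").strip() for f in FIELDS]
    rows.append([title] + cells)

# Dump the list of lists to a CSV file: one header row, then one row per case.
with open("cases.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title"] + FIELDS)
    writer.writerows(rows)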
Answer 2:

Sure, the simplest option is the standard-library csv module. I took the liberty of refactoring your repeated .replace().strip() calls into a single function; the case data is also collected into a list of dictionaries before it is written to the file, which makes it easier to add new columns without handling their names in two places. A sketch of this approach follows.
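A minimal sketch of what that answer describes, assuming a clean() helper for the whitespace handling; the cases.csv filename and the dictionary keys used as column names are illustrative:

import csv

import requests
from bs4 import BeautifulSoup

URL = "https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963"
FIELDS = ["Plaintiff", "Defendant", "Case Number", "Filed", "Court",
          "Nature of Suit", "Cause of Action", "Jury Demanded By"]


def clean(text):
    # Replace non-breaking spaces and strip surrounding whitespace in one place.
    return text.replace("\xa0", " ").strip()


soup = BeautifulSoup(requests.get(URL).text, "lxml")
cases = soup.find_all("div", class_="wrapper jcard has-padding-30 blocks has-no-bottom-padding")

# Collect one dict per case; the dict keys double as the CSV column names.
records = []
for case in cases:
    record = {"Title": clean(case.find("div", class_="title-wrapper").text)}
    for field in FIELDS:
        record[field] = clean(case.find("td", {"data-th": field}).text)
    records.append(record)

with open("cases.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Title"] + FIELDS)
    writer.writeheader()
    writer.writerows(records)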
Answer 3:

pandas has a .to_csv method. You can also use .T to transpose the result, and you can use read_html to pull all of the tables on the page straight from the URL. A sketch follows.
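A minimal sketch along those lines; it assumes the docket page serves plain HTML tables that pandas can parse (read_html needs lxml or html5lib installed), and the table_*.csv filenames are illustrative:

from io import StringIO

import pandas as pd
import requests

URL = "https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963"

# Fetch the page, then let pandas parse every <table> element it contains.
html = requests.get(URL).text
tables = pd.read_html(StringIO(html))

# Each element of `tables` is a DataFrame; write each one to its own CSV file.
# Call .T first if you want the table transposed (columns become rows).
for i, table in enumerate(tables):
    table.to_csv(f"table_{i}.csv", index=False)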