I learned how to scrape data from a website by following a video tutorial.
My Python code is as follows:
#IMPORT LIBRARIES
from bs4 import BeautifulSoup
import requests
import csv
#OPEN A NEW CSV FILE. IT CAN BE CALLED ANYTHING
file = open('csv_py.csv', 'w')
#CREATE A VARIABLE FOR WRITING TO THE CSV
writer = csv.writer(file)
#CREATE THE HEADER ROW OF THE CSV
writer.writerow(['Date', 'Content'])
#REQUEST WEBPAGE AND STORE IT AS A VARIABLE
page_to_scrape = requests.get("https://www.liechi.org/en/")
print(page_to_scrape)
#USE BEAUTIFULSOUP TO PARSE THE HTML AND STORE IT AS A VARIABLE
soup = BeautifulSoup(page_to_scrape.text, 'html.parser')
#FIND ALL THE ITEMS IN THE PAGE WITH A CLASS ATTRIBUTE OF 'archive-item-link'
#AND STORE THE LIST AS A VARIABLE
contents= soup.findAll('a', attrs={'class':'archive-item-link'})
#FIND ALL THE ITEMS IN THE PAGE WITH A CLASS ATTRIBUTE OF 'archive-item-date'
#AND STORE THE LIST AS A VARIABLE
dates = soup.findAll('span', attrs={'class':'archive-item-date'})
#LOOP THROUGH BOTH LISTS USING THE 'ZIP' FUNCTION
#AND PRINT AND FORMAT THE RESULTS
for date, content in zip(dates, contents):
    print(date.text+ "(" + content.text+ ")")
    #WRITE EACH ITEM AS A NEW ROW IN THE CSV
    writer.writerow([date.text, content.text])
#CLOSE THE CSV FILE
file.close()
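The CSV file is created, but I cannot find the dates in the first column. After checking a few times, I found that I have to click on a cell for its date to show.
I would like to know why the dates are not visible in the CSV file and how to fix this. Thank you very much!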
1 Answer
cgfeq70w1#
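I ran the code and noticed that the cells have to be expanded before the date values become visible.
One explanation is that date.text contains newline characters and spaces. For example, one of the date.text values is: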
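To see what such whitespace looks like, here is a small sketch using a made-up string (not the actual value from the page). It assumes the date sits on its own indented line inside the span, so the text starts with a newline; that would likely explain why a spreadsheet shows an apparently empty first line in the cell.

```python
# Hypothetical value, invented to mimic text pulled from an indented <span>.
raw_date = "\n        January 1, 2020\n    "

# repr() makes the otherwise invisible newlines and spaces visible.
print(repr(raw_date))

# strip() drops the leading and trailing whitespace only.
print(repr(raw_date.strip()))
```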
You can clean this up by removing all the whitespace. Update your for loop with the following code:
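One way such a loop could look, as a minimal sketch rather than the answerer's exact code: the helper names `clean` and `write_rows` are hypothetical, the sample data is made up, and an in-memory buffer stands in for the CSV file so the example runs without network access or BeautifulSoup.

```python
import csv
import io


def clean(text):
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return " ".join(text.split())


def write_rows(pairs, out):
    """Write (date, content) string pairs to `out` as CSV rows after cleaning them."""
    writer = csv.writer(out)
    writer.writerow(["Date", "Content"])
    for date_text, content_text in pairs:
        writer.writerow([clean(date_text), clean(content_text)])


# Made-up sample standing in for (date.text, content.text) from the scraped page.
sample = [("\n    January 1, 2020\n  ", "  First post\n")]

buffer = io.StringIO()  # in-memory file, so the sketch does not touch the disk
write_rows(sample, buffer)
print(buffer.getvalue())
```

In the original script, the equivalent change would be to write `writer.writerow([clean(date.text), clean(content.text)])` inside the loop. Calling `date.text.strip()` or BeautifulSoup's `date.get_text(strip=True)` should work as well if only leading and trailing whitespace needs to go.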