I need to web-scrape, and I don't know what to put in place of `userdetail[].text`, or whether that is even what I should be using. I am also getting:
AttributeError: 'NoneType' object has no attribute 'find_all'
import requests as r
from bs4 import BeautifulSoup
import csv
import time as t
import random as rnd

urltoget = 'http://drd.ba.ttu.edu/isqs3358/hw/hw1/'
filename1 = 'exercisedata.csv'
lowval = 5
highval = 7

res = r.get(urltoget)
soup = BeautifulSoup(res.content, 'lxml')
user = soup.find('div', attrs={'id': 'UserIndex'})
userurls = user.find_all('a')

with open(filename1, 'w') as exercisedata:
    datawriter = csv.writer(exercisedata, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    datawriter.writerow(['rank', 'user_id', 'first_name', 'last_name', 'avg_sleep', 'avg_water', 'avg_step', 'day', 'day_water', 'day_step', 'metric'])
    for user in userurls:
        href = user['href']
        userres = r.get(urltoget + href)
        usersoup = BeautifulSoup(userres.content, 'lxml')
        userinfo = usersoup.find('div', attrs={'id': 'userinfo'}).find_all('span', attrs={'class': 'val'})
        datawriter.writerow([href.split('=')[1]
                             ,userdetail[].text
        timetosleep = rnd.randint(lowval, highval) + rnd.random()
        t.sleep(timetosleep)
2 Answers

vi4fp9gy1 #1
Below is an improved version of the code that works. The key thing to note is that the div's id is not UserIndex but UsrIndex.
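The answer's code was not preserved in this copy of the page. A minimal sketch of the fix, assuming the page structure implied by the question (a `UsrIndex` div holding the user links, a `userinfo` div with `span.val` elements holding each user's values), might look like this; it corrects the `UserIndex` typo that made `soup.find()` return `None`, and replaces the invalid `userdetail[].text` expression:

```python
import csv
import random as rnd
import time as t

import requests as r
from bs4 import BeautifulSoup

BASE_URL = 'http://drd.ba.ttu.edu/isqs3358/hw/hw1/'


def parse_user_links(html):
    """Return the href of every user link on the index page.

    The div's id is 'UsrIndex', not 'UserIndex' -- that typo is why
    soup.find() returned None and raised the AttributeError.
    """
    # html.parser avoids the extra lxml dependency; 'lxml' also works.
    soup = BeautifulSoup(html, 'html.parser')
    index_div = soup.find('div', attrs={'id': 'UsrIndex'})
    return [a['href'] for a in index_div.find_all('a')]


def parse_user_values(html):
    """Return the text of every <span class="val"> in the user-info div,
    which is what the broken `userdetail[].text` was trying to express."""
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('div', attrs={'id': 'userinfo'})
    return [span.text for span in info.find_all('span', attrs={'class': 'val'})]


def scrape(filename='exercisedata.csv'):
    links = parse_user_links(r.get(BASE_URL).content)
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
        for href in links:
            values = parse_user_values(r.get(BASE_URL + href).content)
            # The user id is the part of the href after '=',
            # e.g. 'user.html?id=3' -> '3'.
            writer.writerow([href.split('=')[1]] + values)
            # Pause between requests, as in the original code.
            t.sleep(rnd.randint(5, 7) + rnd.random())


if __name__ == '__main__':
    scrape()
```

Splitting the parsing into small functions also makes the scraper testable against static HTML, without hitting the live site.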
w8ntj3qf2 #2
Since the data you want is all inside HTML tables, you can do this with pandas and its read_html method:
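This answer's code was also not preserved. A sketch of the approach: `pandas.read_html` parses every `<table>` it finds into a DataFrame, and accepts a URL directly, so for the live page you could write `tables = pd.read_html('http://drd.ba.ttu.edu/isqs3358/hw/hw1/')`. Below, a small inline table stands in for the page; the column names are an assumption, so adjust them to what the site actually serves:

```python
from io import StringIO

import pandas as pd

# A stand-in for the homework page's table (hypothetical columns).
html = StringIO("""
<table>
  <tr><th>user_id</th><th>first_name</th><th>last_name</th></tr>
  <tr><td>1</td><td>Ada</td><td>Lovelace</td></tr>
  <tr><td>2</td><td>Alan</td><td>Turing</td></tr>
</table>
""")

# read_html returns a list with one DataFrame per <table> on the page.
df = pd.read_html(html)[0]
print(df)
```

This skips BeautifulSoup entirely: no `find`/`find_all`, and therefore no `NoneType` errors when an id is misspelled.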