Web scraping and exporting to CSV

cgh8pdjw · asked on 2023-09-27 · in: Other
Follow (0) | Answers (2) | Views (111)

I need to web-scrape this page, and I don't know what to put for userdetail[].text, or whether that is even what I should use. I am also getting:

AttributeError: 'NoneType' object has no attribute 'find_all'

import requests as r
from bs4 import BeautifulSoup
import csv
import time as t
import random as rnd

urltoget='http://drd.ba.ttu.edu/isqs3358/hw/hw1/'
filename1 = 'exercisedata.csv'
lowval = 5
highval = 7

res=r.get(urltoget)
soup = BeautifulSoup(res.content, 'lxml')

user = soup.find('div', attrs={'id':'UserIndex'})
userurls = user.find_all('a')

with open(filename1,'w') as exercisedata:
    datawriter = csv.writer(exercisedata, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)

    datawriter.writerow(['rank', 'user_id', 'first_name', 'last_name', 'avg_sleep', 'avg_water', 'avg_step', 'day', 'day_water', 'day_step', 'metric'])
    for user in userurls:
        href = user['href']
        userres = r.get(urltoget + href)
        usersoup = BeautifulSoup(userres.content, 'lxml')
        userinfo = usersoup.find('div', attrs={'id': 'userinfo'}).find_all('span', attrs={'class': 'val'})
        datawriter.writerow([href.split('=')[1],
                             userdetail[].text])
        timetosleep = rnd.randint(lowval, highval) + rnd.random()
        t.sleep(timetosleep)
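As an aside on the CSV settings used above: csv.QUOTE_NONNUMERIC quotes every field that is not a number, so strings pulled from .text come out quoted, while values you want written bare must be converted with int()/float() first. A minimal stdlib sketch (the sample values are made up):

```python
import csv
import io

# Write to an in-memory buffer to show how QUOTE_NONNUMERIC formats rows.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)

# String fields get quoted; the int is written unquoted.
writer.writerow(['rank', 'user_id'])
writer.writerow([1, 'abc123'])

print(buf.getvalue())
```
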

vi4fp9gy1#

Here is an improved version of the code that works. The key point is that the div's ID is not UserIndex but UsrIndex.

import requests as r
from bs4 import BeautifulSoup


urltoget = 'http://drd.ba.ttu.edu/isqs3358/hw/hw1/'
filename1 = 'exercisedata.csv'
lowval = 5
highval = 7

res = r.get(urltoget)
soup = BeautifulSoup(res.content, 'lxml')

user = soup.find('div', attrs={'id': 'UsrIndex'})
print(user)
userurls = user.find_all('a')
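The AttributeError in the question comes from soup.find returning None when no element matches (here, because of the UserIndex/UsrIndex mismatch). A minimal sketch of the guard pattern, using a hypothetical HTML fragment shaped like the user pages (the names and values are made up):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the structure the scraper expects.
html = """
<div id="userinfo">
  <span class="val">Alice</span>
  <span class="val">Smith</span>
  <span class="val">7.2</span>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
info = soup.find('div', attrs={'id': 'userinfo'})

# Guard against a missing div before calling find_all; calling it on None
# is exactly what raises "'NoneType' object has no attribute 'find_all'".
if info is not None:
    values = [span.text for span in info.find_all('span', attrs={'class': 'val'})]
    print(values)  # ['Alice', 'Smith', '7.2']
```
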

w8ntj3qf2#

Since the data you want is all in HTML tables, you can do this with pandas and its read_html method:

import pandas as pd

url = 'http://drd.ba.ttu.edu/isqs3358/hw/hw1/'
df = pd.read_html(url,extract_links='body')[0]

output = []
for idx, row in df.iterrows():
    xdf = pd.read_html(url+row[0][1])[0]
    xdf['name'] = row['Name'][0]
    xdf['rank'] = row['Rank'][0]
    xdf['user_id'] = row[0][1].split('=')[-1]

    output.append(xdf)

fdf = pd.concat(output)
fdf.to_csv('output.csv', index=False)
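One detail worth noting in this answer: with extract_links='body', read_html returns each body cell as a (text, href) tuple, which is why the code indexes row['Name'][0] for the visible text and row[0][1] for the link. A small illustration with a hand-built DataFrame standing in for the scraped table (the cell values here are invented):

```python
import pandas as pd

# Each cell mimics what read_html(..., extract_links='body') produces:
# a (display_text, href) tuple per body cell.
df = pd.DataFrame({'Name': [('Alice', 'user.html?id=1')],
                   'Rank': [('1', None)]})

row = df.iloc[0]
name = row['Name'][0]                    # the visible text
user_id = row['Name'][1].split('=')[-1]  # the id pulled from the href
print(name, user_id)
```
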
