I'm running the code below to fetch the same information from multiple URLs.
data = []
alldata = soup.find_all(class_='module show-times')
for item in alldata:
    each_race = item.find_all(class_='racing-time')
    for item in each_race:
        racelinks = item.find_all('a')
        for a in racelinks:
            link = a.get("href")
            full_url = urljoin(url, link)
            driver2 = webdriver.Chrome()
            driver2.get(full_url)
            html2 = driver2.page_source
            soup2 = BeautifulSoup(html2, 'html.parser')
            data = []
            for item in soup2:
                verdict = soup2.find_all(id="oc-verdict")
                for item in verdict:
                    first = item.find_all('span', class_="beta-footnote")[1]
                    odds_first = first.nextSibling
                    odds_first_simple = odds_first.replace(',', '')
                    second = item.find_all('span', class_="beta-footnote")[2]
                    odds_second = second.nextSibling
                    odds_second_simple = odds_second.replace(',', '')
                    third = item.find_all('span', class_="beta-footnote")[3]
                    odds_third = third.nextSibling
                    data.append({
                        "Favourite": odds_first_simple,
                        "Second": odds_second_simple,
                        "Third": odds_third
                    })
            df = pd.DataFrame(data)
            print(df)
The code fetches the information as designed, but I'm trying to build one DataFrame containing everything scraped from all of the URLs. When I run the above, the output (shown here) just overwrites the previous row with the most recently scraped data instead of appending to the previously scraped data:
Favourite Second Third
0 Here Comes Georgie Beneficially Yours Cracking Rhapsody
Favourite Second Third
0 Blue Fin Fia Fuinidh Five Dollar Fine
Favourite Second Third
0 Lisloran R S Ambush Bushmill Boy
I'm looking for output like this, where each new row is appended:
Favourite Second Third
0 Here Comes Georgie Beneficially Yours Cracking Rhapsody
1 Blue Fin Fia Fuinidh Five Dollar Fine
2 Lisloran R S Ambush Bushmill Boy
I've tried moving the append() call around and using concat(), but nothing has worked.
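For reference, both of the approaches mentioned can work, as long as the accumulator is created once, outside the loop. A minimal sketch with pandas, using made-up odds values as a stand-in for the scraped ones:

```python
import pandas as pd

# Hypothetical stand-in for the values scraped from each URL.
scraped_pages = [
    {"Favourite": "Here Comes Georgie", "Second": "Beneficially Yours", "Third": "Cracking Rhapsody"},
    {"Favourite": "Blue Fin", "Second": "Fia Fuinidh", "Third": "Five Dollar Fine"},
]

# Pattern 1: append dicts to ONE list, build the DataFrame once afterwards.
data = []                      # created once, before the loop
for row in scraped_pages:
    data.append(row)
df = pd.DataFrame(data)        # created once, after the loop

# Pattern 2: collect one single-row DataFrame per page, concatenate at the end.
frames = [pd.DataFrame([row]) for row in scraped_pages]
df2 = pd.concat(frames, ignore_index=True)
```

Either pattern produces one DataFrame with a row per scraped page; `ignore_index=True` gives the concatenated frame a fresh 0..n-1 index instead of repeating index 0 from each single-row frame.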
1 Answer

d8tt03nd1#
Since the question lacks some details, here is just one approach: assuming data is a list and the individual dictionaries are appended to that list correctly, move the creation of the DataFrame out of the loop.

Edit: Based on the extra information you added, creating the DataFrame inside the loop and creating data inside the loop are the same problem. There should be only one data, outside the loop.
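Applied to the question's code, the restructuring looks like the sketch below. The Selenium and BeautifulSoup work is collapsed into a hypothetical parse_page helper (not part of the original code) so the loop shape is visible; in the real script its body would be the driver2/soup2 parsing:

```python
import pandas as pd

def parse_page(full_url):
    # Hypothetical stand-in for loading full_url with Selenium and pulling
    # the three odds values out of the "oc-verdict" section with BeautifulSoup.
    return {"Favourite": f"fav-{full_url}",
            "Second": f"sec-{full_url}",
            "Third": f"thd-{full_url}"}

urls = ["race1", "race2", "race3"]

data = []                          # ONE list, created before the loop
for full_url in urls:
    data.append(parse_page(full_url))   # only append inside the loop

df = pd.DataFrame(data)            # ONE DataFrame, created after the loop
print(df)
```

Because data is never re-created inside the loop and pd.DataFrame is called once at the end, each page adds a new row (index 0, 1, 2, ...) instead of overwriting the previous one.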