Why are rows in pandas being overwritten?

Posted by zdwk9cvp 11 months ago in Other

I'm running the code below to scrape the same information from multiple URLs.

from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

# soup (the parsed index page) and url (its address) are defined earlier; not shown
data = []

alldata = soup.find_all(class_='module show-times')

for item in alldata:
    each_race = item.find_all(class_='racing-time')

    for item in each_race:
        racelinks = item.find_all('a')

        for a in racelinks:
            link = a.get("href")
            full_url = urljoin(url, link)

            # start a new Chrome instance for this race page and parse its HTML
            driver2 = webdriver.Chrome()
            driver2.get(full_url)
            html2 = driver2.page_source
            soup2 = BeautifulSoup(html2, 'html.parser')

            data = []

            for item in soup2:
                verdict = soup2.find_all(id="oc-verdict")

                for item in verdict:
                    # read the text right after the 2nd, 3rd and 4th "beta-footnote" spans
                    first = item.find_all('span', class_="beta-footnote")[1]
                    odds_first = first.nextSibling
                    odds_first_simple = odds_first.replace(',', '')

                    second = item.find_all('span', class_="beta-footnote")[2]
                    odds_second = second.nextSibling
                    odds_second_simple = odds_second.replace(',', '')

                    third = item.find_all('span', class_="beta-footnote")[3]
                    odds_third = third.nextSibling

                    data.append({
                        "Favourite": odds_first_simple,
                        "Second": odds_second_simple,
                        "Third": odds_third
                    })

            df = pd.DataFrame(data)
            print(df)

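The code retrieves the information as intended, but I'm trying to build one DataFrame containing all the information scraped from all the URLs. When I run the code above, each newly scraped row just overwrites the previous one instead of being added to the previously scraped data, as in this output: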

   Favourite            Second               Third
0  Here Comes Georgie   Beneficially Yours   Cracking Rhapsody
   Favourite        Second             Third
0  Blue Fin   Fia Fuinidh   Five Dollar Fine
   Favourite       Second         Third
0  Lisloran   R S Ambush   Bushmill Boy
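A stripped-down reproduction of this behaviour suggests that nothing is actually being overwritten. In this sketch the Selenium/BeautifulSoup scraping is replaced by a hypothetical hard-coded RESULTS list (the horse names are copied from the output above). Because data = [] sits inside the per-link loop, every URL starts a fresh list and produces a brand-new one-row DataFrame, which is then printed.

```python
import pandas as pd

# Hypothetical stand-in for the scraped verdicts, one tuple per race URL
# (names copied from the output shown in the question).
RESULTS = [
    ("Here Comes Georgie", "Beneficially Yours", "Cracking Rhapsody"),
    ("Blue Fin", "Fia Fuinidh", "Five Dollar Fine"),
    ("Lisloran", "R S Ambush", "Bushmill Boy"),
]

printed_frames = []
for favourite, second, third in RESULTS:
    data = []  # re-created for every URL, like data=[] inside the original link loop
    data.append({"Favourite": favourite, "Second": second, "Third": third})
    df = pd.DataFrame(data)  # a new one-row DataFrame on each iteration
    print(df)
    printed_frames.append(df)

# After the loop, df only holds the row from the last URL.
print(len(df))
```

Each printed frame has a single row with index 0, which matches the output above.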


What I'm looking for is output like this, where each new row is added:

   Favourite            Second               Third
0  Here Comes Georgie   Beneficially Yours   Cracking Rhapsody
1  Blue Fin             Fia Fuinidh          Five Dollar Fine
2  Lisloran             R S Ambush           Bushmill Boy


I've tried moving the append() part and using concat(), but nothing has worked.
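For comparison, one common way to make concat() work here is to keep a single list of per-URL DataFrames outside all the loops and concatenate once at the end, passing ignore_index=True so the rows are renumbered 0, 1, 2 instead of each keeping index 0. This is a minimal sketch; the hard-coded RESULTS list is a hypothetical stand-in for the scraped values.

```python
import pandas as pd

# Hypothetical scraped values, one tuple per race URL.
RESULTS = [
    ("Here Comes Georgie", "Beneficially Yours", "Cracking Rhapsody"),
    ("Blue Fin", "Fia Fuinidh", "Five Dollar Fine"),
    ("Lisloran", "R S Ambush", "Bushmill Boy"),
]

frames = []  # one small DataFrame per URL, collected outside every loop
for favourite, second, third in RESULTS:
    frames.append(
        pd.DataFrame([{"Favourite": favourite, "Second": second, "Third": third}])
    )

# Concatenate once, after the loop; ignore_index builds a fresh 0..n-1 index.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Calling pd.concat once on the collected list is generally preferable to concatenating inside the loop, which copies the growing DataFrame on every iteration.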

Answer #1, by d8tt03nd

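Since the question lacks some details, here is just one approach, assuming data is a list and the dictionaries are being appended to it correctly: move the DataFrame creation out of the loop.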

Edit

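Based on the extra information you added, it is still the same problem: you create the DataFrame inside the loop, and you also create data inside the loop. There should be only one data list, created outside the loops: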

...
data = []  # a single list for all URLs, created once before the loops

alldata = soup.find_all(class_='module show-times')

for item in alldata:
    each_race = item.find_all(class_='racing-time')

    for item in each_race:
        racelinks = item.find_all('a')

        for a in racelinks:
            ...
            # no data = [] here, so rows from earlier URLs are kept
            for item in soup2:
                verdict = soup2.find_all(id="oc-verdict")
                for item in verdict:
                    ...
                    data.append({
                        "Favourite": odds_first_simple,
                        "Second": odds_second_simple,
                        "Third": odds_third
                    })

# build the DataFrame once, after all the loops have finished
df = pd.DataFrame(data)
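The answer's fix can be checked without a browser. In this minimal sketch the scraping is replaced by a hypothetical SCRAPED list (names taken from the question's output), and build_frame is an illustrative helper, not part of the original code. The list is created once, every URL appends one dict to it, and the DataFrame is built once after the loop.

```python
import pandas as pd

# Hypothetical stand-in for the per-URL scraping step: in the real code each
# tuple would come from fetching full_url and reading the "oc-verdict" block.
SCRAPED = [
    ("Here Comes Georgie", "Beneficially Yours", "Cracking Rhapsody"),
    ("Blue Fin", "Fia Fuinidh", "Five Dollar Fine"),
    ("Lisloran", "R S Ambush", "Bushmill Boy"),
]


def build_frame(pages):
    """Hypothetical helper: accumulate one row per page, then build one DataFrame."""
    data = []  # created once, before the loop, so rows accumulate
    for first, second, third in pages:
        # one dict per URL; there is deliberately no data = [] in here
        data.append({
            "Favourite": first.replace(",", ""),
            "Second": second.replace(",", ""),
            "Third": third,
        })
    # build the DataFrame once, after every page has been processed
    return pd.DataFrame(data)


df = build_frame(SCRAPED)
print(df)
```

The result has one row per URL with indices 0, 1 and 2, which is the layout the question asks for.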
