Selenium WebDriverWait TimeoutException when trying to extract data into a Pandas DataFrame

Posted by m2xkgtsf on 2023-06-20 in Other

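Hello Stack Overflow community,
I am writing a Python script that fetches data from a web page and stores it in a pandas DataFrame. However, I have run into a problem: the DataFrame comes back empty, and I cannot get the records I expect.
Here is the code I am using: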

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd

def extract_data_from_table(table):
    countries = []
    regions_states = []
    start_dates = []
    end_dates = []

    if table is not None:
        for row in table.find_all('tr')[1:]:
            columns = row.find_all('td')

            if len(columns) >= 4:
                countries.append(columns[0].text.strip())
                regions_states.append(columns[1].text.strip())
                start_dates.append(columns[2].text.strip())
                end_dates.append(columns[3].text.strip())

        return pd.DataFrame({
            'Country': countries,
            'Regions/States': regions_states,
            'DST Start Date': start_dates,
            'DST End Date': end_dates
        }) 
    else:
        return None

url = "https://www.timeanddate.com/time/dst/2023.html"

# Create an instance of Chrome Options
options = Options()
options.add_argument("start-maximized")

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Set up a WebDriverWait that will wait up to 1000 seconds for the table to appear
wait = WebDriverWait(driver, 1000)

driver.get(url)

# Wait for the table to appear
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'table table--inner-borders-all table--left table--striped table--hover')))

# Get the page source and parse it with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table', class_='table table--inner-borders-all table--left table--striped table--hover')

df = extract_data_from_table(table)

driver.quit()

if df is not None:
    print(df)

When I run this code, I expect to see a DataFrame populated with the records I am trying to fetch. Instead, I get an empty DataFrame. I tried to debug this by inspecting the page source and confirming that the data really is there, but I still cannot populate the DataFrame.
Here is the error message I receive:

Traceback (most recent call last):
  File "/Users/rajeevranjanpandey/test.py", line 51, in <module>
    wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'table table--inner-borders-all table--left table--striped table--hover')))
  File "/Users/rajeevranjanpandey/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/support/wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)

I am relatively new to Python, Selenium, and pandas, so I am not sure what I am doing wrong. Can anyone tell me what the problem might be? Any help would be greatly appreciated.
Thank you.
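Here are the steps I have tried so far:

  • Checked the URL to make sure it is correct and the page is reachable.
  • Verified that the table I am trying to scrape exists on the page.
  • Checked the table's class name in the page HTML to make sure it matches the class name in my code.
  • Increased the WebDriverWait timeout to see whether the table just needs more time to load.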

Despite taking these steps, I still run into the same problem.

tcomlyy6  #1

Here is a complete solution based on your parsing logic.
Without Selenium (using requests + BeautifulSoup):

from bs4 import BeautifulSoup
import pandas as pd
import requests

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

def extract_data_from_table(table):
    countries = []
    regions_states = []
    start_dates = []
    end_dates = []
    prev_country = None
    if table:
        for row in table.find('tbody').find_all('tr'):
            try:
                country_col = row.find('th').text.strip()
                prev_country = country_col
            except AttributeError:
                pass

            other_col = row.find_all('td')
            if len(other_col) > 2:
                countries.append(prev_country)
                regions_states.append(other_col[0].text.strip())
                start_dates.append(other_col[1].text.strip())
                end_dates.append(other_col[2].text.strip())

        return pd.DataFrame({
            'Country': countries,
            'Regions/States': regions_states,
            'DST Start Date': start_dates,
            'DST End Date': end_dates
        })
    else:
        return None

data = requests.get('https://www.timeanddate.com/time/dst/2023.html', headers={"Accept-Language": "en"})
soup = BeautifulSoup(data.text, 'html.parser')

table = soup.find('table', class_='table table--inner-borders-all table--left table--striped table--hover')
df = extract_data_from_table(table)

if df is not None:
    print(df)

Output:

                      Country                                     Regions/States         DST Start Date            DST End Date
0               Åland Islands                                      All locations       Sunday, 26 March      Sunday, 29 October
1                     Albania                                      All locations       Sunday, 26 March      Sunday, 29 October
2                     Andorra                                      All locations       Sunday, 26 March      Sunday, 29 October
3                  Antarctica                                     Some locations   Sunday, 24 September         Sunday, 2 April
4                  Antarctica                                      Troll Station       Sunday, 19 March      Sunday, 29 October
5                   Australia                                     Most locations      Sunday, 1 October         Sunday, 2 April
6                   Australia                                   Lord Howe Island      Sunday, 1 October         Sunday, 2 April
7                     Austria                                      All locations       Sunday, 26 March      Sunday, 29 October
8                     Belgium                                      All locations       Sunday, 26 March      Sunday, 29 October
9                     Bermuda                                      All locations       Sunday, 12 March      Sunday, 5 November
10     Bosnia and Herzegovina                                      All locations       Sunday, 26 March      Sunday, 29 October
11                   Bulgaria                                      All locations       Sunday, 26 March      Sunday, 29 October
12                     Canada                                     Most locations       Sunday, 12 March      Sunday, 5 November
13                      Chile                                     Most locations    Sunday, 3 September         Sunday, 2 April
14                      Chile                                      Easter Island  Saturday, 2 September       Saturday, 1 April
15                    Croatia                                      All locations       Sunday, 26 March      Sunday, 29 October
16                       Cuba                                      All locations       Sunday, 12 March      Sunday, 5 November
17                     Cyprus                                      All locations       Sunday, 26 March      Sunday, 29 October
18                    Czechia                                      All locations       Sunday, 26 March      Sunday, 29 October
19                    Denmark                                      All locations       Sunday, 26 March      Sunday, 29 October
20                      Egypt                                      All locations       Friday, 28 April      Friday, 27 October
21                    Estonia                                      All locations       Sunday, 26 March      Sunday, 29 October
22              Faroe Islands                                      All locations       Sunday, 26 March      Sunday, 29 October
23                       Fiji                                      All locations    Sunday, 12 November  Does not end this year
24                    Finland                                      All locations       Sunday, 26 March      Sunday, 29 October
25                     France                                     Most locations       Sunday, 26 March      Sunday, 29 October
26                    Germany                                      All locations       Sunday, 26 March      Sunday, 29 October
27                  Gibraltar                                      All locations       Sunday, 26 March      Sunday, 29 October
28                     Greece                                      All locations       Sunday, 26 March      Sunday, 29 October
29                  Greenland                                     Most locations     Saturday, 25 March    Saturday, 28 October
30                  Greenland                                   Ittoqqortoormiit       Sunday, 26 March      Sunday, 29 October
31                  Greenland                                     Thule Air Base       Sunday, 12 March      Sunday, 5 November
32                   Guernsey                                      All locations       Sunday, 26 March      Sunday, 29 October
33                      Haiti                                      All locations       Sunday, 12 March      Sunday, 5 November
34                    Hungary                                      All locations       Sunday, 26 March      Sunday, 29 October
35                    Ireland                                      All locations       Sunday, 26 March      Sunday, 29 October
36                Isle of Man                                      All locations       Sunday, 26 March      Sunday, 29 October
37                     Israel                                      All locations       Friday, 24 March      Sunday, 29 October
38                      Italy                                      All locations       Sunday, 26 March      Sunday, 29 October
39                     Jersey                                      All locations       Sunday, 26 March      Sunday, 29 October
40                     Kosovo                                      All locations       Sunday, 26 March      Sunday, 29 October
41                     Latvia                                      All locations       Sunday, 26 March      Sunday, 29 October
42                    Lebanon                                      All locations     Thursday, 30 March      Sunday, 29 October
43              Liechtenstein                                      All locations       Sunday, 26 March      Sunday, 29 October
44                  Lithuania                                      All locations       Sunday, 26 March      Sunday, 29 October
45                 Luxembourg                                      All locations       Sunday, 26 March      Sunday, 29 October
46                      Malta                                      All locations       Sunday, 26 March      Sunday, 29 October
47                     Mexico  Baja California, much of Chihuahua, much of Ta...       Sunday, 12 March      Sunday, 5 November
48                    Moldova                                      All locations       Sunday, 26 March      Sunday, 29 October
49                     Monaco                                      All locations       Sunday, 26 March      Sunday, 29 October
50                 Montenegro                                      All locations       Sunday, 26 March      Sunday, 29 October
51                    Morocco                                      All locations       Sunday, 23 April        Sunday, 19 March
52                Netherlands                                     Most locations       Sunday, 26 March      Sunday, 29 October
53                New Zealand                                      All locations   Sunday, 24 September         Sunday, 2 April
54             Norfolk Island                                      All locations      Sunday, 1 October         Sunday, 2 April
55            North Macedonia                                      All locations       Sunday, 26 March      Sunday, 29 October
56                     Norway                                      All locations       Sunday, 26 March      Sunday, 29 October
57                  Palestine                                      All locations     Saturday, 29 April    Saturday, 28 October
58                   Paraguay                                      All locations      Sunday, 1 October        Sunday, 26 March
59                     Poland                                      All locations       Sunday, 26 March      Sunday, 29 October
60                   Portugal                                      All locations       Sunday, 26 March      Sunday, 29 October
61                    Romania                                      All locations       Sunday, 26 March      Sunday, 29 October
62  Saint Pierre and Miquelon                                      All locations       Sunday, 12 March      Sunday, 5 November
63                 San Marino                                      All locations       Sunday, 26 March      Sunday, 29 October
64                     Serbia                                      All locations       Sunday, 26 March      Sunday, 29 October
65                   Slovakia                                      All locations       Sunday, 26 March      Sunday, 29 October
66                   Slovenia                                      All locations       Sunday, 26 March      Sunday, 29 October
67                      Spain                                      All locations       Sunday, 26 March      Sunday, 29 October
68                     Sweden                                      All locations       Sunday, 26 March      Sunday, 29 October
69                Switzerland                                      All locations       Sunday, 26 March      Sunday, 29 October
70                The Bahamas                                      All locations       Sunday, 12 March      Sunday, 5 November
71   Turks and Caicos Islands                                      All locations       Sunday, 12 March      Sunday, 5 November
72                    Ukraine                                     Most locations       Sunday, 26 March      Sunday, 29 October
73             United Kingdom                                      All locations       Sunday, 26 March      Sunday, 29 October
74              United States                                     Most locations       Sunday, 12 March      Sunday, 5 November
75    Vatican City (Holy See)                                      All locations       Sunday, 26 March      Sunday, 29 October
76             Western Sahara                                      All locations       Sunday, 23 April        Sunday, 19 March

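To make sure the results come back in English, pass headers along with the request: headers={"Accept-Language": "en"}
[Update]: In answer to your second question in the comments below: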
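In the function above, prev_country carries the most recent country name forward to rows that have no th cell of their own. Judging by the code and its output, the country appears only on the first row of each group (for example, Australia followed by Lord Howe Island). A minimal sketch of that carry-forward step on already-extracted cells, using only the standard library; fill_down_country and the tuple layout are hypothetical names chosen for illustration, and the sample rows are taken from the output above:

```python
def fill_down_country(rows):
    """Carry the last seen country forward to rows where it is missing.

    rows: list of (country_or_None, region, start_date, end_date) tuples.
    """
    filled = []
    last_country = None
    for country, region, start, end in rows:
        if country is not None:
            # A new country group starts on this row.
            last_country = country
        filled.append((last_country, region, start, end))
    return filled


sample_rows = [
    ("Australia", "Most locations", "Sunday, 1 October", "Sunday, 2 April"),
    (None, "Lord Howe Island", "Sunday, 1 October", "Sunday, 2 April"),
    ("Austria", "All locations", "Sunday, 26 March", "Sunday, 29 October"),
]

for row in fill_down_country(sample_rows):
    print(row)
```

If this reading of the table layout is right, it would also explain why the check len(columns) >= 4 in the question never matches: each data row would hold only three td cells, with the country in a th.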

# to make the URL dynamic sensing the current year
from datetime import datetime

data = requests.get(f'https://www.timeanddate.com/time/dst/{datetime.now().year}.html', headers={"Accept-Language": "en"})

You can simply pass datetime.now().year to get the current year. If you run it next year, the URL will be https://www.timeanddate.com/time/dst/2024.html, and so on.

eqzww0vc  #2

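By.CLASS_NAME accepts only a single class name, not several space-separated ones; use By.CSS_SELECTOR instead.

from io import StringIO

# Wait for the table to appear
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.table.table--inner-borders-all.table--left.table--striped.table--hover')))

# Get the HTML of the table element (the property name is case-sensitive: "outerHTML")
tableContent = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.table.table--inner-borders-all.table--left.table--striped.table--hover'))).get_attribute("outerHTML")

# Use pandas' built-in reader to get the DataFrame; no need for BeautifulSoup parsing.
# Wrapping the string in StringIO avoids the deprecation warning newer pandas
# versions emit when read_html is given literal HTML.
df = pd.read_html(StringIO(tableContent))[0]
print(df)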
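To make the difference between the two locators concrete: the string in the question is the whole class attribute (five class names separated by spaces), whereas a CSS selector chains each class with a leading dot. A minimal sketch of that conversion follows; class_attr_to_css is a hypothetical helper, not part of Selenium, and it assumes the class names need no CSS escaping:

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Build a compound CSS selector from a space-separated class attribute."""
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    # Each class name becomes ".name"; chaining them requires all to be present.
    return tag + "".join(f".{name}" for name in classes)


selector = class_attr_to_css(
    "table table--inner-borders-all table--left table--striped table--hover",
    tag="table",
)
print(selector)
```

The resulting string can be passed as the second element of a (By.CSS_SELECTOR, selector) locator tuple. Matching on a single distinctive class would likely work too, but that depends on the page markup.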

Or you can do it in just two lines of code, without Selenium at all.

df=pd.read_html("https://www.timeanddate.com/time/dst/2023.html")
print(df[0])
