I'm trying to scrape IPO data from the Nasdaq pricing page with this code.
The code runs without errors, but every value in the resulting DataFrame is NaN:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
from time import sleep
from datetime import datetime
# Define dates
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 5, 31)
dates = pd.period_range(start_date, end_date, freq='M')
# Create an empty DataFrame
df = pd.DataFrame(columns=['Company Name', 'Symbol', 'Market', 'Price', 'Shares'])
# Set the URL and headers
url = 'https://www.nasdaq.com/markets/ipos/activity.aspx?tab=pricings&month=%s'
headers = {'User-Agent': 'non-profit learning project'}
# Scrape IPO data for each date
for idx in dates:
    print(f'Fetching data for {idx}')
    result = requests.get(url % idx, headers=headers)
    sleep(30)
    content = result.content
    if 'There is no data for this month' not in str(content):
        table = pd.read_html(content)[0]
        print(table)
        df = pd.concat([df, table], ignore_index=True)
        soup = BeautifulSoup(content, features="lxml")
        links = soup.find_all('a', id=re.compile(r'two_column_main_content_rptPricing_company_\d'))
        print(f"Length of table vs length of links: {table.shape[0] - len(links)}")
        for link in links:
            df['Link'].append(link['href'])
# Print the resulting DataFrame
print(df)
The output is:
Fetching data for 2023-01
Unnamed: 0 Unnamed: 1
0 NaN NaN
Length of table vs length of links: 1
Fetching data for 2023-02
Unnamed: 0 Unnamed: 1
0 NaN NaN
Length of table vs length of links: 1
Fetching data for 2023-03
Unnamed: 0 Unnamed: 1
0 NaN NaN
Length of table vs length of links: 1
Fetching data for 2023-04
Unnamed: 0 Unnamed: 1
0 NaN NaN
Length of table vs length of links: 1
Fetching data for 2023-05
Unnamed: 0 Unnamed: 1
0 NaN NaN
Length of table vs length of links: 1
Company Name Symbol Market Price Shares Unnamed: 0 Unnamed: 1
0 NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN
The code appears to fetch a response for each month in the date range, but the resulting DataFrame is unusable, as the NaN values in every column show.
I want to build a model on this IPO data. Any ideas on how to get it working? Thanks.
1 Answer
Don't parse the HTML content; use the public API instead:
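The Nasdaq IPO page is rendered with JavaScript, so `pd.read_html` on the raw response only sees an empty placeholder table, hence the NaN values. A sketch of the API approach, assuming the JSON endpoint `https://api.nasdaq.com/api/ipo/calendar?date=YYYY-MM` (the backend the page itself calls) and the field names in its `data.priced.rows` payload; both are assumptions, so verify them against a live response:

```python
import pandas as pd
import requests

# Assumed endpoint behind the nasdaq.com IPO activity page.
API_URL = 'https://api.nasdaq.com/api/ipo/calendar'
# The API tends to reject requests without a browser-like User-Agent.
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def priced_rows_to_frame(payload):
    """Extract the 'priced' IPO rows from one month's JSON payload
    into a DataFrame; returns an empty frame if the month has no data."""
    rows = ((payload.get('data') or {}).get('priced') or {}).get('rows') or []
    return pd.DataFrame(rows)

def fetch_month(month):
    """Fetch one month (e.g. '2023-01') of priced IPOs."""
    resp = requests.get(API_URL, params={'date': month},
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return priced_rows_to_frame(resp.json())

# Collect the same date range as the original script:
# months = pd.period_range('2023-01', '2023-05', freq='M').strftime('%Y-%m')
# df = pd.concat([fetch_month(m) for m in months], ignore_index=True)
```

Because the JSON rows already contain the deal URL, company name, symbol, price, and share count, there is no need for the separate BeautifulSoup pass over anchor tags.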
Output: