pandas: Scraping Nasdaq IPO data

pinkon5k  posted on 2023-06-20 in Other

I am trying to scrape IPO data from the Nasdaq website using the code below.
The code runs, but the resulting DataFrame contains only NaN values.

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
from time import sleep
from datetime import datetime

# Define dates
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 5, 31)
dates = pd.period_range(start_date, end_date, freq='M')

# Create an empty DataFrame
df = pd.DataFrame(columns=['Company Name', 'Symbol', 'Market', 'Price', 'Shares'])

# Set the URL and headers
url = 'https://www.nasdaq.com/markets/ipos/activity.aspx?tab=pricings&month=%s'
headers = {'User-Agent': 'non-profit learning project'}

# Scrape IPO data for each date
for idx in dates:
    print(f'Fetching data for {idx}')
    result = requests.get(url % idx, headers=headers)
    sleep(30)
    content = result.content
    
    if 'There is no data for this month' not in str(content):
        table = pd.read_html(content)[0]
        print(table)
        df = pd.concat([df, table], ignore_index=True)
    
        soup = BeautifulSoup(content, features="lxml")
        
        links = soup.find_all('a', id=re.compile(r'two_column_main_content_rptPricing_company_\d'))
        print(f"Length of table vs length of links: {table.shape[0] - len(links)}")
        
        for link in links:
            df['Link'].append(link['href'])

# Print the resulting DataFrame
print(df)

The result is:

Fetching data for 2023-01
   Unnamed: 0  Unnamed: 1
0         NaN         NaN
Length of table vs length of links: 1
Fetching data for 2023-02
   Unnamed: 0  Unnamed: 1
0         NaN         NaN
Length of table vs length of links: 1
Fetching data for 2023-03
   Unnamed: 0  Unnamed: 1
0         NaN         NaN
Length of table vs length of links: 1
Fetching data for 2023-04
   Unnamed: 0  Unnamed: 1
0         NaN         NaN
Length of table vs length of links: 1
Fetching data for 2023-05
   Unnamed: 0  Unnamed: 1
0         NaN         NaN
Length of table vs length of links: 1
  Company Name Symbol Market Price Shares  Unnamed: 0  Unnamed: 1
0          NaN    NaN    NaN   NaN    NaN         NaN         NaN
1          NaN    NaN    NaN   NaN    NaN         NaN         NaN
2          NaN    NaN    NaN   NaN    NaN         NaN         NaN
3          NaN    NaN    NaN   NaN    NaN         NaN         NaN
4          NaN    NaN    NaN   NaN    NaN         NaN         NaN

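The code appears to fetch a page for each month in the specified date range. However, the resulting DataFrame is wrong: every column contains only NaN values.
I want to build a model with the IPO data. Any ideas on how to get this working? Thanks.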

e4eetjau  #1

Instead of parsing the HTML content, use the public API:

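import pandas as pd
import requests

url = 'https://api.nasdaq.com/api/ipo/calendar'
# A browser-like User-Agent is sent with every request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0'}
start_date = '2023-1-1'
end_date = '2023-5-31'
# One Period per month: 2023-01, 2023-02, ..., 2023-05
periods = pd.period_range(start_date, end_date, freq='M')
dfs = []
for period in periods:
    # str(period) yields 'YYYY-MM', passed as the 'date' query parameter
    response = requests.get(url, headers=headers, params={'date': str(period)}, timeout=30)
    response.raise_for_status()
    data = response.json()
    # Flatten the rows of IPOs priced in this month into a DataFrame
    dfs.append(pd.json_normalize(data['data']['priced'], 'rows'))
df = pd.concat(dfs, ignore_index=True)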

Output:

>>> df
            dealID proposedTickerSymbol                             companyName      proposedExchange proposedSharePrice sharesOffered pricedDate dollarValueOfSharesOffered dealStatus
0   1225815-104715                 BREA                      Brera Holdings PLC        NASDAQ Capital               5.00     1,705,000  1/27/2023                 $8,525,000     Priced
1    890697-104848                  TXO               TXO Energy Partners, L.P.                  NYSE              20.00     5,000,000  1/27/2023               $100,000,000     Priced
2    405880-103426                 GNLX                            GENELUX CORP        NASDAQ Capital               6.00     2,500,000  1/26/2023                $15,000,000     Priced
3   1241592-105143                  QSG                    QuantaSing Group Ltd         NASDAQ Global              12.50     3,250,000  1/25/2023                $40,625,000     Priced
4   1225290-104329                 CVKD             Cadrenal Therapeutics, Inc.        NASDAQ Capital               5.00     1,400,000  1/20/2023                 $7,000,000     Priced
..             ...                  ...                                     ...                   ...                ...           ...        ...                        ...        ...
64  1210259-102635                  SGE       Strong Global Entertainment, Inc.              NYSE MKT               4.00     1,000,000  5/16/2023                 $4,000,000     Priced
65  1254469-106197                 SLRN                          ACELYRIN, Inc.  NASDAQ Global Select              18.00    30,000,000  5/05/2023               $540,000,000     Priced
66  1239799-104989                ALCYU  Alchemy Investments Acquisition Corp 1         NASDAQ Global              10.00    10,000,000  5/05/2023               $100,000,000     Priced
67  1243360-105271                 KVUE                             Kenvue Inc.                  NYSE              22.00   172,812,560  5/04/2023             $3,801,876,320     Priced
68  1190851-101486                GODNU            Golden Star Acquisition Corp         NASDAQ Global              10.00     6,000,000  5/02/2023                $60,000,000     Priced

[69 rows x 9 columns]
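
The loop above indexes straight into `data['data']['priced']`, which raises an error if any level of the response is missing or null for a given month. Whether the API ever returns such a response is an assumption here, not something shown in the thread. A minimal sketch of a more forgiving extraction step follows; the helper `priced_rows` and the sample payloads are hypothetical and only mimic the nesting used in the answer.

```python
import pandas as pd


def priced_rows(payload):
    """Return the priced-IPO rows of one monthly response as a DataFrame.

    Assumes the nesting used in the answer: payload['data']['priced']['rows'].
    A missing or null level yields an empty DataFrame instead of an exception.
    """
    data = payload.get('data') or {}
    priced = data.get('priced') or {}
    rows = priced.get('rows') or []
    return pd.DataFrame(rows)


# Hypothetical payloads that imitate the assumed shape (not real API responses).
full_month = {'data': {'priced': {'rows': [
    {'proposedTickerSymbol': 'BREA', 'proposedSharePrice': '5.00'},
    {'proposedTickerSymbol': 'TXO', 'proposedSharePrice': '20.00'},
]}}}
empty_month = {'data': {'priced': {'rows': None}}}
no_data = {'data': None}

# Keep only the months that actually produced rows before concatenating.
frames = [priced_rows(p) for p in (full_month, empty_month, no_data)]
frames = [f for f in frames if not f.empty]
combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(combined)
```

In the request loop, `dfs.append(priced_rows(data))` could replace the `pd.json_normalize` call if months without priced IPOs turn out to be a problem.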

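Since the goal is to build a model, note that the values in the output above look like formatted strings (`1,705,000`, `$8,525,000`, `1/27/2023`), so they would need converting to numeric and datetime types first. The following is a sketch under that assumption, using two rows copied from the output; the helper names `to_number` and `clean_ipo_frame` are made up for illustration.

```python
import pandas as pd

# Two rows copied from the output above, limited to the columns being converted.
sample = pd.DataFrame({
    'proposedTickerSymbol': ['BREA', 'KVUE'],
    'proposedSharePrice': ['5.00', '22.00'],
    'sharesOffered': ['1,705,000', '172,812,560'],
    'pricedDate': ['1/27/2023', '5/04/2023'],
    'dollarValueOfSharesOffered': ['$8,525,000', '$3,801,876,320'],
})


def to_number(series):
    """Strip '$' and thousands separators, then convert to a numeric dtype."""
    cleaned = series.astype(str).str.replace(r'[$,]', '', regex=True)
    # Values that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors='coerce')


def clean_ipo_frame(df):
    """Return a copy of df with numeric and datetime columns converted."""
    out = df.copy()
    for col in ('proposedSharePrice', 'sharesOffered', 'dollarValueOfSharesOffered'):
        out[col] = to_number(out[col])
    # Dates appear as month/day/year in the output above.
    out['pricedDate'] = pd.to_datetime(out['pricedDate'], format='%m/%d/%Y', errors='coerce')
    return out


clean = clean_ipo_frame(sample)
print(clean.dtypes)
```

Applying `clean_ipo_frame(df)` to the concatenated frame from the answer should give columns that can be fed to a model directly, provided the real data uses the same formatting as these sample rows.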