pandas: getting the list of S&P 500 ticker symbols

hfsqlsce posted on 2023-03-28 in Other

So I'm working through a Python for Finance tutorial series, and it keeps giving me these errors:

1) line 22, in <module>
    save_sp500_tickers()

2) line 8, in save_sp500_tickers
    soup = bs.BeautifulSoup(resp.text, 'lxml')

3) line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml.
Do you need to install a parser library?

I've been at this all day and I honestly refuse to give up, so any help with this would be greatly appreciated. Also, if anyone has a suggestion for writing this without pickle, so that I can pull up the S&P 500 list some other way, that would be great.

import bs4 as bs    
import pickle    
import requests    
import lxml    
def save_sp500_tickers():
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')        
    soup = bs.BeautifulSoup(resp.text,'lxml')        
    table = soup.find('table', {'class': 'wikitable sortable'})        

    tickers = []

    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)

    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    print(tickers)

    return tickers    

save_sp500_tickers()

9gm1akwq 1#

Running your code as-is works on my system.
Unfortunately, if you're on Windows, pip install lxml won't work unless you have a full compiler toolchain, which you probably don't.
Fortunately, you can get a precompiled binary installer from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml - make sure you pick the installer that matches your Python version and whether it is 32-bit or 64-bit.

**Edit:** Just out of interest, try changing the line to

soup = bs.BeautifulSoup(resp.text, 'html.parser')   # use Python's built-in parser instead

For a list of the available parsers, see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser.
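
If you stick with html.parser, a minimal sketch of the same scrape without any lxml dependency could look like the following (the function name save_sp500_tickers_no_lxml is just an illustrative label; the logic mirrors the code in the question):

import bs4 as bs
import requests

def save_sp500_tickers_no_lxml():
    # Same Wikipedia scrape as in the question, but with Python's built-in
    # parser, so no lxml install is needed.
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'html.parser')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = [row.findAll('td')[0].text.strip() for row in table.findAll('tr')[1:]]
    return tickers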


jobtbby3 2#

To get an unofficial list of the S&P 500 constituents, you can use pandas.read_html. A parser such as lxml or bs4 + html5lib is also required, since pandas uses it internally.

Using Wikipedia
import pandas as pd

def list_wikipedia_sp500() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75845569/
    url = 'https://en.m.wikipedia.org/wiki/List_of_S%26P_500_companies'
    return pd.read_html(url, attrs={'id': 'constituents'}, index_col='Symbol')[0]

>> df = list_wikipedia_sp500()
>> df.head()
           Security             GICS Sector  ...      CIK      Founded
Symbol                                       ...                      
MMM              3M             Industrials  ...    66740         1902
AOS     A. O. Smith             Industrials  ...    91142         1916
ABT          Abbott             Health Care  ...     1800         1888
ABBV         AbbVie             Health Care  ...  1551152  2013 (1888)
ACN       Accenture  Information Technology  ...  1467373         1989
[5 rows x 7 columns]

>> symbols = df.index.to_list()
>> symbols[:5]
['MMM', 'AOS', 'ABT', 'ABBV', 'ACN']
Using Slickcharts
import pandas as pd
import requests

def list_slickcharts_sp500() -> pd.DataFrame:
    # Ref: https://stackoverflow.com/a/75845569/
    url = 'https://www.slickcharts.com/sp500'
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0'  # Default user-agent fails.
    response = requests.get(url, headers={'User-Agent': user_agent})
    return pd.read_html(response.text, match='Symbol', index_col='Symbol')[0]
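
Usage mirrors the Wikipedia version, for example:

>> df = list_slickcharts_sp500()
>> symbols = df.index.to_list()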

These were tested with Pandas 1.5.3.
The results can be cached in memory and/or on disk for some period of time, e.g. 8 hours, to avoid the risk of excessively repeated calls to the sources.
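As a sketch of the on-disk variant (the cache path, the 8-hour window, and the helper name cached_sp500 are illustrative, not part of the original answer; it reuses list_wikipedia_sp500 from above):

import time
from pathlib import Path

import pandas as pd

CACHE_FILE = Path('sp500_cache.csv')  # illustrative cache location
CACHE_MAX_AGE = 8 * 60 * 60           # 8 hours, in seconds

def cached_sp500() -> pd.DataFrame:
    # Serve the cached copy if it is fresh enough; otherwise re-download and overwrite it.
    if CACHE_FILE.exists() and time.time() - CACHE_FILE.stat().st_mtime < CACHE_MAX_AGE:
        return pd.read_csv(CACHE_FILE, index_col='Symbol')
    df = list_wikipedia_sp500()  # defined earlier in this answer
    df.to_csv(CACHE_FILE)
    return df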
A similar answer for the Nasdaq 100 is here.
