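I am importing an HTML file. It has an odd format and multi-indexed data.
I am particularly interested in importing the "Photovoltaic" table, which starts at row 10 of the large table. The table appears to be multi-indexed.
Code: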
import pandas as pd

net_met_cus = 'https://www.eia.gov/electricity/annual/html/epa_04_10.html'
net_met = pd.read_html(net_met_cus)
print(len(net_met))
net_met_pv = net_met[1]
# The Photovoltaic table starts at row 12
print(net_met_pv.loc[12])
Unnamed: 0_level_0  Year              Photovoltaic
Capacity (MW)       Residential       Photovoltaic
                    Commercial        Photovoltaic
                    Industrial        Photovoltaic
                    Transportation    Photovoltaic
                    Total             Photovoltaic
Customers           Residential       Photovoltaic
                    Commercial        Photovoltaic
                    Industrial        Photovoltaic
                    Transportation    Photovoltaic
                    Total             Photovoltaic
Name: 12, dtype: object
# Is it a MultiIndex?
print(net_met_pv.loc[12].index)
MultiIndex([('Unnamed: 0_level_0', 'Year'),
( 'Capacity (MW)', 'Residential'),
( 'Capacity (MW)', 'Commercial'),
( 'Capacity (MW)', 'Industrial'),
( 'Capacity (MW)', 'Transportation'),
( 'Capacity (MW)', 'Total'),
( 'Customers', 'Residential'),
( 'Customers', 'Commercial'),
( 'Customers', 'Industrial'),
( 'Customers', 'Transportation'),
( 'Customers', 'Total')],
)
# Okay, let's flatten it
net_met_pv.to_flat_index()
Current output:
AttributeError: 'DataFrame' object has no attribute 'to_flat_index'
1 Answer
.to_flat_index() is a method of Index and MultiIndex, not of DataFrame, so it has to be called on an index object, for example net_met_pv.loc[12].index.to_flat_index() or a similar call.
References:
https://pandas.pydata.org/docs/reference/api/pandas.Index.to_flat_index.html?highlight=to_flat_index#pandas.Index.to_flat_index
https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.to_flat_index.html?highlight=to_flat_index#pandas.MultiIndex.to_flat_index
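
To illustrate the "similar call" mentioned above, here is a minimal sketch that flattens the column headers of a whole DataFrame rather than the index of a single row. It does not download the EIA page: the small frame `df` is a hypothetical stand-in that reuses a few of the header tuples from the question, and the zeros in it are placeholders, not real data. Joining the tuples into single strings at the end is one possible convention, not something the original answer prescribes.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by pd.read_html:
# two-level column headers copied from the question's MultiIndex output.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Year"),
    ("Capacity (MW)", "Residential"),
    ("Capacity (MW)", "Total"),
    ("Customers", "Residential"),
    ("Customers", "Total"),
])
df = pd.DataFrame([[0, 0, 0, 0, 0]], columns=columns)  # placeholder values

# to_flat_index() belongs to the index object, so call it on df.columns,
# not on the DataFrame itself. The result is an Index of tuples.
flat = df.columns.to_flat_index()

# Optional: turn each tuple into one string label, dropping the
# auto-generated "Unnamed: ..." prefix for the Year column.
df.columns = [
    second if first.startswith("Unnamed") else f"{first} {second}"
    for first, second in flat
]
print(list(df.columns))
```

Applied to the question's data, the same pattern would presumably be net_met_pv.columns.to_flat_index(), assuming net_met_pv still carries the two-level header shown above.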