我遇到了一个奇怪的问题,Pandas Dataframe where in,where()失败,抱怨它不能连接重叠的索引名称。
ValueError: cannot join with no overlapping index names
>>> feed_tail
Close
ABAN.NS ADFFOODS.NS AGARIND.NS AMRUTANJAN.NS ASAHIINDIA.NS
Date
2022-10-13 50.55 699.00 687.50 713.80 622.30
2022-10-14 52.00 709.05 672.85 712.90 609.20
2022-10-17 50.75 711.95 669.25 710.05 611.10
2022-10-18 50.90 730.85 680.25 707.95 609.85
2022-10-19 50.05 713.10 692.10 705.45 604.45
>>> feed_tail.columns
MultiIndex([('Close', 'ABAN.NS'),
('Close', 'ADFFOODS.NS'),
('Close', 'AGARIND.NS'),
('Close', 'AMRUTANJAN.NS'),
('Close', 'ASAHIINDIA.NS')],
)
>>> feed_tail.index
DatetimeIndex(['2022-10-13', '2022-10-14', '2022-10-17', '2022-10-18',
'2022-10-19'],
dtype='datetime64[ns]', name='Date', freq=None)
>>> feed_tail['Close'] > 500
ABAN.NS ADFFOODS.NS AGARIND.NS AMRUTANJAN.NS ASAHIINDIA.NS
Date
2022-10-13 False True True True True
2022-10-14 False True True True True
2022-10-17 False True True True True
2022-10-18 False True True True True
2022-10-19 False True True True True
>>> feed_tail.where(feed_tail['Close'] > 500)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4658, in join
return self._join_multi(other, how=how)
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4782, in _join_multi
raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names
我不知道这个数据框需要什么诊断。任何帮助都是真诚的感谢。
要在控制台上重现此问题:尝试以下内容
>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
>>> data
Close High ... Open Volume
AAPL GOOG MSFT AAPL GOOG MSFT ... AAPL GOOG MSFT AAPL GOOG MSFT
Date ...
2022-12-01 148.309998 101.279999 254.690002 149.130005 102.589996 256.119995 ... 148.210007 101.400002 253.869995 71250400 21771500 26041500
2022-12-02 147.809998 100.830002 255.020004 148.000000 101.150002 256.059998 ... 145.960007 99.370003 249.820007 65421400 18812200 21522800
2022-12-05 146.630005 99.870003 250.199997 150.919998 101.750000 253.820007 ... 147.770004 99.815002 252.009995 68826400 19955500 23435300
[3 rows x 15 columns]
>>> data['Close'] > 100
AAPL GOOG MSFT
Date
2022-12-01 True True True
2022-12-02 True True True
2022-12-05 True False True
>>> data[data['Close'] > 100]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
..
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 229, in join
join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4658, in join
return self._join_multi(other, how=how)
File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4782, in _join_multi
raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names
Dataframe 中可能缺少什么,以至于无法工作?
- 更新**
感谢@JayPeerachai给出了一个方向的答案。至于为什么以前这样做是有效的:请查看here处问题中的解决方案部分。
我一直在用Pandas数据读取器从yahoo finance获取数据,直到昨天坏了,无法获取数据,这促使我切换到yahoo finance,但事实证明,获取结果是好的,但数据本身不知何故不允许where
函数查询,尽管是相同的数据集。因此,这个问题。
对pandas datareader和yfinance代码的进一步探讨帮助我理解了列名是不同的,pandas datareader实际上在将数据集返回给用户代码之前设置了列名。
它所需要的只是在获取数据之后设置列名,并且还能够迭代地运行where
函数查询。
>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
[*********************100%***********************] 3 of 3 completed
>>> data.columns.names = ["Attributes", "Symbols"]
>>> data[data['Close'] > 100]
Attributes Close High Low Open Volume
Symbols AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT AAPL GOOG MSFT
Date
2022-12-01 148.309998 101.279999 254.690002 149.130005 102.589996 256.119995 146.610001 100.669998 250.919998 148.210007 101.400002 253.869995 71250400 21771500.0 26041500
2022-12-02 147.809998 100.830002 255.020004 148.000000 101.150002 256.059998 145.649994 99.169998 249.690002 145.960007 99.370003 249.820007 65421400 18812200.0 21522800
2022-12-05 146.630005 NaN 250.199997 150.919998 NaN 253.820007 145.770004 NaN 248.059998 147.770004 NaN 252.009995 68826400 NaN 23435300
>>>
3条答案
按热度按时间apeeds0o1#
可能是多级列造成的,因为
where()
方法需要一个单级列。请先尝试将其扁平化。nbnkbykc2#
从
yfinance
获取结果后设置列名。理想情况下,希望
yfinance
自己处理此问题。uplii1fm3#
Where方法接受一个带有布尔值的列表类参数进行过滤。你必须传递给它一个Pandas系列,numpy数组,python列表等。但是你传递给它一个 Dataframe (df ['close']〉500),Where方法引发错误。你可以阅读Pandas文档来了解更多信息