pandas ValueError:无法在同一 Dataframe 上没有重叠索引名称的情况下进行联接

58wvjzkj  于 2022-12-21  发布在  其他
关注(0)|答案(3)|浏览(648)

我遇到了一个奇怪的问题,Pandas Dataframe where in,where()失败,抱怨它不能连接重叠的索引名称。

ValueError: cannot join with no overlapping index names 

>>> feed_tail
             Close
           ABAN.NS ADFFOODS.NS AGARIND.NS AMRUTANJAN.NS ASAHIINDIA.NS
Date
2022-10-13   50.55      699.00     687.50        713.80        622.30
2022-10-14   52.00      709.05     672.85        712.90        609.20
2022-10-17   50.75      711.95     669.25        710.05        611.10
2022-10-18   50.90      730.85     680.25        707.95        609.85
2022-10-19   50.05      713.10     692.10        705.45        604.45
>>> feed_tail.columns
MultiIndex([('Close',       'ABAN.NS'),
            ('Close',   'ADFFOODS.NS'),
            ('Close',    'AGARIND.NS'),
            ('Close', 'AMRUTANJAN.NS'),
            ('Close', 'ASAHIINDIA.NS')],
           )
>>> feed_tail.index
DatetimeIndex(['2022-10-13', '2022-10-14', '2022-10-17', '2022-10-18',
               '2022-10-19'],
              dtype='datetime64[ns]', name='Date', freq=None)

>>> feed_tail['Close'] > 500
            ABAN.NS  ADFFOODS.NS  AGARIND.NS  AMRUTANJAN.NS  ASAHIINDIA.NS
Date
2022-10-13    False         True        True           True           True
2022-10-14    False         True        True           True           True
2022-10-17    False         True        True           True           True
2022-10-18    False         True        True           True           True
2022-10-19    False         True        True           True           True
>>> feed_tail.where(feed_tail['Close'] > 500)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4658, in join
    return self._join_multi(other, how=how)
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4782, in _join_multi
    raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names

我不知道这个数据框需要什么诊断。任何帮助都是真诚的感谢。
要在控制台上重现此问题:尝试以下内容

>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)

>>> data
                 Close                                High                          ...        Open                            Volume
                  AAPL        GOOG        MSFT        AAPL        GOOG        MSFT  ...        AAPL        GOOG        MSFT      AAPL      GOOG      MSFT
Date                                                                                ...
2022-12-01  148.309998  101.279999  254.690002  149.130005  102.589996  256.119995  ...  148.210007  101.400002  253.869995  71250400  21771500  26041500
2022-12-02  147.809998  100.830002  255.020004  148.000000  101.150002  256.059998  ...  145.960007   99.370003  249.820007  65421400  18812200  21522800
2022-12-05  146.630005   99.870003  250.199997  150.919998  101.750000  253.820007  ...  147.770004   99.815002  252.009995  68826400  19955500  23435300

[3 rows x 15 columns]
    >>> data['Close'] > 100
            AAPL   GOOG  MSFT
Date
2022-12-01  True   True  True
2022-12-02  True   True  True
2022-12-05  True  False  True

>>> data[data['Close'] > 100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
..
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 229, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4658, in join
    return self._join_multi(other, how=how)
  File "lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4782, in _join_multi
    raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names

Dataframe 中可能缺少什么,以至于无法工作?

    • 更新**

感谢@JayPeerachai给出了一个方向的答案。至于为什么以前这样做是有效的:请查看here处问题中的解决方案部分。
我一直在用Pandas数据读取器从yahoo finance获取数据,直到昨天坏了,无法获取数据,这促使我切换到yahoo finance,但事实证明,获取结果是好的,但数据本身不知何故不允许where函数查询,尽管是相同的数据集。因此,这个问题。
对pandas datareader和yfinance代码的进一步探讨帮助我理解了列名是不同的,pandas datareader实际上在将数据集返回给用户代码之前设置了列名。
它所需要的只是在获取数据之后设置列名,并且还能够迭代地运行where函数查询。

>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
[*********************100%***********************]  3 of 3 completed
>>> data.columns.names = ["Attributes", "Symbols"]
>>> data[data['Close'] > 100]
Attributes       Close                                High                                 Low                                Open                            Volume
Symbols           AAPL        GOOG        MSFT        AAPL        GOOG        MSFT        AAPL        GOOG        MSFT        AAPL        GOOG        MSFT      AAPL        GOOG      MSFT
Date
2022-12-01  148.309998  101.279999  254.690002  149.130005  102.589996  256.119995  146.610001  100.669998  250.919998  148.210007  101.400002  253.869995  71250400  21771500.0  26041500
2022-12-02  147.809998  100.830002  255.020004  148.000000  101.150002  256.059998  145.649994   99.169998  249.690002  145.960007   99.370003  249.820007  65421400  18812200.0  21522800
2022-12-05  146.630005         NaN  250.199997  150.919998         NaN  253.820007  145.770004         NaN  248.059998  147.770004         NaN  252.009995  68826400         NaN  23435300
>>>
apeeds0o

apeeds0o1#

可能是多级列造成的,因为where()方法需要一个单级列。请先尝试将其扁平化。

startdate=datetime(2022,12,1)
enddate=datetime(2022,12,6)
y_symbols = ['GOOG', 'AAPL', 'MSFT']
data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
data = data.stack()
filtered_cond = data['Close'] > 100
filtered_data = data.where(filtered_cond).unstack()
nbnkbykc

nbnkbykc2#

yfinance获取结果后设置列名。
理想情况下,希望yfinance自己处理此问题。

>>> import yfinance as yf
>>> from datetime import datetime
>>> startdate=datetime(2022,12,1)
>>> enddate=datetime(2022,12,6)
>>> y_symbols = ['GOOG', 'AAPL', 'MSFT']
>>> data=yf.download(y_symbols, start=startdate, end=enddate, auto_adjust=True, threads=True)
[*********************100%***********************]  3 of 3 completed
>>> data.columns.names = ["Attributes", "Symbols"]
>>> data[data['Close'] > 100]
Attributes       Close                                High                                 Low                                Open                            Volume
Symbols           AAPL        GOOG        MSFT        AAPL        GOOG        MSFT        AAPL        GOOG        MSFT        AAPL        GOOG        MSFT      AAPL        GOOG      MSFT
Date
2022-12-01  148.309998  101.279999  254.690002  149.130005  102.589996  256.119995  146.610001  100.669998  250.919998  148.210007  101.400002  253.869995  71250400  21771500.0  26041500
2022-12-02  147.809998  100.830002  255.020004  148.000000  101.150002  256.059998  145.649994   99.169998  249.690002  145.960007   99.370003  249.820007  65421400  18812200.0  21522800
2022-12-05  146.630005         NaN  250.199997  150.919998         NaN  253.820007  145.770004         NaN  248.059998  147.770004         NaN  252.009995  68826400         NaN  23435300
>>>

>>> data.where(data['Close'] > 100).where(data['High'] > 120)
Attributes       Close                         High                          Low                         Open                     Volume
Symbols           AAPL GOOG        MSFT        AAPL GOOG        MSFT        AAPL GOOG        MSFT        AAPL GOOG        MSFT      AAPL GOOG      MSFT
Date
2022-12-01  148.309998  NaN  254.690002  149.130005  NaN  256.119995  146.610001  NaN  250.919998  148.210007  NaN  253.869995  71250400  NaN  26041500
2022-12-02  147.809998  NaN  255.020004  148.000000  NaN  256.059998  145.649994  NaN  249.690002  145.960007  NaN  249.820007  65421400  NaN  21522800
2022-12-05  146.630005  NaN  250.199997  150.919998  NaN  253.820007  145.770004  NaN  248.059998  147.770004  NaN  252.009995  68826400  NaN  23435300
uplii1fm

uplii1fm3#

Where方法接受一个带有布尔值的列表类参数进行过滤。你必须传递给它一个Pandas系列,numpy数组,python列表等。但是你传递给它一个 Dataframe (df ['close']〉500),Where方法引发错误。你可以阅读Pandas文档来了解更多信息

相关问题