python 基于列值过滤或查询Pandas多索引数据框架

yzuktlbb  于 2022-12-02  发布在  Python
关注(0)|答案(1)|浏览(148)

我有一个多索引PandasDataFrame,如下图,主要用DateTime对象进行索引。

>>> type(feed_tail)
<class 'pandas.core.frame.DataFrame'>

>>> feed_tail.index
DatetimeIndex(['2022-11-11', '2022-11-14', '2022-11-15', '2022-11-16',
           '2022-11-17', '2022-11-18', '2022-11-21', '2022-11-22',
           '2022-11-23', '2022-11-24'],
          dtype='datetime64[ns]', name='Date', freq=None)

>>> feed_tail.columns
MultiIndex([(       'Close', 'BALKRISIND.NS'),
        (       'Close',        'KSB.NS'),
        (       'SMA13', 'BALKRISIND.NS'),
        (       'SMA13',        'KSB.NS'),
        ('ClosegtSMA13', 'BALKRISIND.NS'),
        ('ClosegtSMA13',        'KSB.NS'),
        (     'MTDPerf', 'BALKRISIND.NS'),
        (     'MTDPerf',        'KSB.NS')],
       names=['Attributes', 'Symbols'])

>>> feed_tail
Attributes         Close                  SMA13           ClosegtSMA13              MTDPerf
Symbols    BALKRISIND.NS   KSB.NS BALKRISIND.NS   KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-11       1889.45  1834.40       1933.03  1959.00         False  False         -3.73 -11.86
2022-11-14       1875.55  1848.60       1927.28  1944.42         False  False         -4.44 -11.18
2022-11-15       1963.20  1954.15       1928.51  1938.12          True   True          0.02  -6.11
2022-11-16       1956.30  1969.75       1929.43  1933.65          True   True         -0.33  -5.36
2022-11-17       1978.35  1959.55       1932.08  1927.51          True   True          0.79  -5.85
2022-11-18       1972.75  1917.90       1932.85  1914.94          True   True          0.51  -7.85
2022-11-21       1945.80  1874.70       1932.80  1902.38          True  False         -0.86  -9.93
2022-11-22       1950.30  1882.85       1932.60  1892.80          True  False         -0.63  -9.54
2022-11-23       1946.60  1930.90       1936.52  1893.97          True   True         -0.82  -7.23
2022-11-24       1975.40  1925.80       1941.11  1901.10          True   True          0.64  -7.47

我试图访问/过滤 Dataframe 到另一个 Dataframe ,对于每个日期时间索引,其中ClosegtSMA13列是True,但似乎我无法理解这里的数据模型。Quest是按顺序迭代日期时间索引,并且获得其中符号的ClosegtSMA13TrueClosegreater thanSMA13的 Dataframe ,并且然后检查过滤/查询的 Dataframe 以在循环内进一步处理。
任何有助于进一步解开这一点的帮助都是真诚的感谢。
谢谢你
更新内容:
遵循@jezrael的建议使用mask。这有助于执行'OR'运算,但它更倾向于获取所有符号都满足Close gt SMA13的所有序列行。

>>> feed_tail
Attributes         Close                  SMA13           ClosegtSMA13              MTDPerf
Symbols    BALKRISIND.NS   KSB.NS BALKRISIND.NS   KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-11       1889.45  1834.40       1933.03  1959.00         False  False         -3.73 -11.86
2022-11-14       1875.55  1848.60       1927.28  1944.42         False  False         -4.44 -11.18
2022-11-15       1963.20  1954.15       1928.51  1938.12          True   True          0.02  -6.11
2022-11-16       1956.30  1969.75       1929.43  1933.65          True   True         -0.33  -5.36
2022-11-17       1978.35  1959.55       1932.08  1927.51          True   True          0.79  -5.85
2022-11-18       1972.75  1917.90       1932.85  1914.94          True   True          0.51  -7.85
2022-11-21       1945.80  1874.70       1932.80  1902.38          True  False         -0.86  -9.93
2022-11-22       1950.30  1882.85       1932.60  1892.80          True  False         -0.63  -9.54
2022-11-23       1946.60  1930.90       1936.52  1893.97          True   True         -0.82  -7.23
2022-11-24       1975.40  1925.80       1941.11  1901.10          True   True          0.64  -7.47
>>> mask = feed_tail['Close'].gt(feed_tail['SMA13']).any(axis=1)
>>> df = feed_tail[mask]
>>> df
Attributes         Close                  SMA13           SMA13gtClose              MTDPerf
Symbols    BALKRISIND.NS   KSB.NS BALKRISIND.NS   KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-15       1963.20  1954.15       1928.51  1938.12          True   True          0.02  -6.11
2022-11-16       1956.30  1969.75       1929.43  1933.65          True   True         -0.33  -5.36
2022-11-17       1978.35  1959.55       1932.08  1927.51          True   True          0.79  -5.85
2022-11-18       1972.75  1917.90       1932.85  1914.94          True   True          0.51  -7.85
2022-11-21       1945.80  1874.70       1932.80  1902.38          True  False         -0.86  -9.93
2022-11-22       1950.30  1882.85       1932.60  1892.80          True  False         -0.63  -9.54
2022-11-23       1946.60  1930.90       1936.52  1893.97          True   True         -0.82  -7.23
2022-11-24       1975.40  1925.80       1941.11  1901.10          True   True          0.64  -7.47
  • 总体任务 * 与此数据框架模型的更大形状相关联,我打算在此模型中获取每天的顶级“MTDPerf”项目,这似乎有帮助,但我希望在检查其MTDPerf值之前,通过确保它们具有“Close gt SMA 13”进行筛选。
>>> for dt in feed_tail.index:
...     
feed_tail['MTDPerf'].loc[dt].head(10).sort_values(ascending=False)

尝试在访问MTDPerf相关内容之前进行筛选,

>>> for dt in feed_tail.index:
...     d=feed_tail[feed_tail['Close'].loc[dt] > feed_tail['SMA13'].loc[dt]]
...     d
...
<stdin>:2: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "lib/python3.9/site-packages/pandas/core/frame.py", line 3796, in __getitem__
    return self._getitem_bool_array(key)
  File "lib/python3.9/site-packages/pandas/core/frame.py", line 3849, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)
  File "lib/python3.9/site-packages/pandas/core/indexing.py", line 2548, in check_bool_indexer
    raise IndexingError(
pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
aelbi1ox

aelbi1ox1#

使用带滤波器MTDPerfSeriesDataFrame.loc
第一个
如果不匹配,用DataFrame.where替换NaN的解决方案:

df = feed_tail['MTDPerf'].where(feed_tail['Close'] > feed_tail['SMA13'])
print (df)
Symbols     BALKRISIND.NS  KSB.NS
2022-11-11            NaN     NaN
2022-11-14            NaN     NaN
2022-11-15           0.02   -6.11
2022-11-16          -0.33   -5.36
2022-11-17           0.79   -5.85
2022-11-18           0.51   -7.85
2022-11-21          -0.86     NaN
2022-11-22          -0.63     NaN
2022-11-23          -0.82   -7.23
2022-11-24           0.64   -7.47

整形后:

s = feed_tail['MTDPerf'].where(feed_tail['Close'] > feed_tail['SMA13']).stack()
print (s)
            Symbols      
2022-11-15  BALKRISIND.NS    0.02
            KSB.NS          -6.11
2022-11-16  BALKRISIND.NS   -0.33
            KSB.NS          -5.36
2022-11-17  BALKRISIND.NS    0.79
            KSB.NS          -5.85
2022-11-18  BALKRISIND.NS    0.51
            KSB.NS          -7.85
2022-11-21  BALKRISIND.NS   -0.86
2022-11-22  BALKRISIND.NS   -0.63
2022-11-23  BALKRISIND.NS   -0.82
            KSB.NS          -7.23
2022-11-24  BALKRISIND.NS    0.64
            KSB.NS          -7.47
dtype: float64

相关问题