Rank, rank, la la la (Pandas rank)

rsl1atfo posted on 2023-02-27 in Other

I have a Pandas DataFrame that looks like this:

                 SPX       RYH       RSP  ...       RYT       RYU      EWRE
Date                                      ...                              
2022-03-04       NaN       NaN       NaN  ...       NaN       NaN       NaN
2022-03-11 -0.028774 -0.037115 -0.026436  ... -0.029486 -0.007445 -0.010430
2022-03-18  0.061558  0.059660  0.051164  ...  0.075097  0.003155  0.020566
2022-03-25  0.017911 -0.004760  0.009611  ...  0.003947  0.035678  0.010814
2022-04-01  0.000616  0.016157  0.001266  ... -0.003325  0.040844  0.035427
2022-04-08 -0.012666  0.019052 -0.008406  ... -0.034156  0.019695 -0.006067
2022-04-14 -0.021320 -0.027425 -0.008669  ... -0.027773 -0.008233 -0.007764
2022-04-22 -0.027503 -0.044911 -0.020189  ... -0.026124 -0.013137  0.009547
2022-04-29 -0.032738 -0.038706 -0.032417  ... -0.016110 -0.044835 -0.052401

Its structure is as follows:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 52 entries, 2022-03-04 to 2023-02-23
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   SPX     51 non-null     float32
 1   RYH     51 non-null     float32
 2   RSP     51 non-null     float32
 3   RCD     51 non-null     float32
 4   RYE     51 non-null     float32
 5   RYF     51 non-null     float32
 6   RGI     51 non-null     float32
 7   EWCO    51 non-null     float32
 8   RTM     51 non-null     float32
 9   RHS     51 non-null     float32
 10  RYT     51 non-null     float32
 11  RYU     51 non-null     float32
 12  EWRE    51 non-null     float32
dtypes: float32(13)
memory usage: 3.0 KB

I can rank it like this:

>>> a.changes.rank(method = "first", axis = 1)
             SPX   RYH  RSP   RCD   RYE  ...   RTM   RHS   RYT   RYU  EWRE
Date                                     ...                              
2022-03-04   NaN   NaN  NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN
2022-03-11   6.0   2.0  8.0   3.0  13.0  ...  11.0   1.0   5.0  12.0  10.0
2022-03-18   9.0   8.0  6.0  13.0   1.0  ...   5.0   4.0  12.0   2.0   3.0
2022-03-25   9.0   2.0  5.0   1.0  13.0  ...  12.0  10.0   4.0  11.0   7.0
2022-04-01   8.0  10.0  9.0   5.0   2.0  ...   3.0  11.0   7.0  13.0  12.0
2022-04-08   6.0  10.0  7.0   3.0  12.0  ...   9.0  13.0   1.0  11.0   8.0
2022-04-14   3.0   2.0  5.0  13.0  10.0  ...  12.0  11.0   1.0   6.0   7.0
2022-04-22   5.0   3.0  7.0   9.0   2.0  ...   4.0  12.0   6.0  10.0  13.0

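However, this method produces only ordinal ranks: ranking [1, 2, 3] gives [1, 2, 3] as expected, but so does ranking [1, 9, 10]. What I need is a ranking that accounts for the distance between values, such as [0.0, 0.888, 1.0] for [1, 9, 10], which is what this expression does: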

[(x - min(row)) / (max(row) - min(row)) for x in row]
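As a quick check of that formula, here is a minimal sketch in plain Python. The helper name `min_max_scale` is made up for illustration and is not part of Pandas.

```python
def min_max_scale(row):
    """Scale values to [0, 1] while keeping their relative distances (hypothetical helper)."""
    lo, hi = min(row), max(row)
    return [(x - lo) / (hi - lo) for x in row]

evenly_spaced = min_max_scale([1, 2, 3])
uneven = min_max_scale([1, 9, 10])

print(evenly_spaced)  # [0.0, 0.5, 1.0]
print(uneven)         # middle value is 8/9, roughly 0.889
```

Note that this sketch divides by zero when every value in the row is identical, so such rows would need separate handling.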

What I need to know is how to apply it to a DataFrame. I tried the following:

self.ranks2 = self.changes.apply(lambda row: [(x - min(row)) / (max(row) - min(row)) for x in row], axis=1)

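This works, but the result holds a list of values for each row, whereas what I need is a modified copy of the original DataFrame. How can I apply this function to the DataFrame so that it produces ranks that respect both magnitude and order?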

uqzxnwby #1

You can compute the minimum and maximum of each row, then apply the formula to each column:

df_min = df.min(axis=1)
df_max = df.max(axis=1)
df.apply(lambda col: (col - df_min) / (df_max - df_min))
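An equivalent formulation, sketched here on the assumption that `df` contains only numeric columns, uses `DataFrame.sub` and `DataFrame.div` with `axis=0` so that the per-row minimum and range are aligned on the index. The small frame below reuses a few values from the question as sample data, and the names `row_min`, `row_range`, `scaled` and `row_wise` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data: three columns and three rows taken from the question.
# The first row is all NaN, as in the original frame.
df = pd.DataFrame(
    {
        "SPX": [np.nan, -0.028774, 0.061558],
        "RYH": [np.nan, -0.037115, 0.059660],
        "RYU": [np.nan, -0.007445, 0.003155],
    },
    index=pd.to_datetime(["2022-03-04", "2022-03-11", "2022-03-18"]),
)

row_min = df.min(axis=1)
row_range = df.max(axis=1) - row_min

# Subtract each row's minimum, then divide by each row's range.
scaled = df.sub(row_min, axis=0).div(row_range, axis=0)

# A row-wise apply also returns a DataFrame, because the lambda
# returns a Series (with the column labels) rather than a list.
row_wise = df.apply(lambda row: (row - row.min()) / (row.max() - row.min()), axis=1)

print(scaled)
```

Both variants keep the original index and column labels. The row-wise `apply` is closer to the attempt in the question, but it calls the lambda once per row, so the column-aligned arithmetic is likely the better fit for larger frames.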

Output (input truncated):

                 SPX       RYH       RSP       RYT       RYU      EWRE
Date                                                                  
2022-03-04       NaN       NaN       NaN       NaN       NaN       NaN
2022-03-11  0.281126  0.000000  0.359926  0.257128  1.000000  0.899393
2022-03-18  0.811807  0.785424  0.667329  1.000000  0.000000  0.242014
2022-03-25  0.560636  0.000000  0.355384  0.215317  1.000000  0.385133
2022-04-01  0.089225  0.441079  0.103942  0.000000  1.000000  0.877357
2022-04-08  0.399064  0.988060  0.478171  0.000000  1.000000  0.521606
2022-04-14  0.322505  0.017392  0.954770  0.000000  0.976561  1.000000
2022-04-22  0.319659  0.000000  0.453965  0.344981  0.583459  1.000000
2022-04-29  0.541815  0.377366  0.550660  1.000000  0.208481  0.000000
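To check that the scaled values still respect the original order, one option is to rank them and compare the result with the rank of the raw data. The sketch below reuses two rows of the six columns shown in the output; `scaled` and `same_order` are illustrative names.

```python
import pandas as pd

# Two rows and six columns copied from the question's printed frame.
df = pd.DataFrame(
    {
        "SPX": [-0.028774, 0.061558],
        "RYH": [-0.037115, 0.059660],
        "RSP": [-0.026436, 0.051164],
        "RYT": [-0.029486, 0.075097],
        "RYU": [-0.007445, 0.003155],
        "EWRE": [-0.010430, 0.020566],
    },
    index=pd.to_datetime(["2022-03-11", "2022-03-18"]),
)

df_min = df.min(axis=1)
df_max = df.max(axis=1)
scaled = df.apply(lambda col: (col - df_min) / (df_max - df_min))

# Min-max scaling is monotonic within a row, so the ordinal ranks
# of the scaled values should match those of the raw changes.
same_order = scaled.rank(axis=1).equals(df.rank(axis=1))
print(same_order)
```

In each row the smallest value maps to 0 and the largest to 1, and the values in between keep their relative spacing.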
