python: Compare the values of N DataFrame columns with each other and check whether they are in ascending or descending order

jdzmm42g posted on 2023-01-08 in Python


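I'm looking for a way to compare multiple columns with each other when the number of columns is not known in advance.
Specifically, given N columns, I want to create an additional column named 'result' where each row's value is:

1 if the row satisfies col(0) > col(1) > col(2) > ... > col(N-1) > col(N)
-1 if the opposite holds (col(0) < col(1) < ... < col(N-1) < col(N))
0 if neither of the above is true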

For example, with the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,1],
                   'col2': [4,3,2,1,0,-1,-2,-3,1],
                   'col3': [8,6,4,2,0,-2,-4,-6,1]})

   col1  col2  col3
0     1     4     8
1     2     3     6
2     3     2     4
3     4     1     2
4     5     0     0
5     6    -1    -2
6     7    -2    -4
7     8    -3    -6
8     1     1     1

I should get the following result column:

   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1       0

I could simply do:

condition1 = (df['col1'] > df['col2']) & (df['col2'] > df['col3'])
condition2 = (df['col1'] < df['col2']) & (df['col2'] < df['col3'])

df['result'] = np.select([condition1,condition2], [1,-1], 0)

The problem is that this quickly becomes very inefficient as the number of columns grows.
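For reference, a minimal sketch of one way to keep the explicit-comparison approach without writing one condition per column: build the comparison for every adjacent column pair in a loop and AND them together with `np.logical_and.reduce`. The helper name `monotonic_flag` is hypothetical, and it assumes the frame passed in contains only the numeric columns to compare (at least two of them).

```python
import numpy as np
import pandas as pd


def monotonic_flag(frame: pd.DataFrame) -> np.ndarray:
    """1 for strictly descending rows, -1 for strictly ascending rows, 0 otherwise.

    Generalizes condition1/condition2 to any number of columns (assumes >= 2).
    """
    cols = list(frame.columns)
    pairs = list(zip(cols[:-1], cols[1:]))  # (col0, col1), (col1, col2), ...
    # condition1: col_i > col_{i+1} for every adjacent pair
    descending = np.logical_and.reduce([(frame[a] > frame[b]).to_numpy() for a, b in pairs])
    # condition2: col_i < col_{i+1} for every adjacent pair
    ascending = np.logical_and.reduce([(frame[a] < frame[b]).to_numpy() for a, b in pairs])
    return np.select([descending, ascending], [1, -1], 0)


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 1],
                   'col2': [4, 3, 2, 1, 0, -1, -2, -3, 1],
                   'col3': [8, 6, 4, 2, 0, -2, -4, -6, 1]})
df['result'] = monotonic_flag(df[['col1', 'col2', 'col3']])
```

The number of comparisons still grows with N, but the code no longer has to be edited when columns are added.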
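I tried converting the columns to lists, taking the first element of each list (then the second, and so on), checking whether those elements are in descending or ascending order (using a function I found while searching for an answer), and then building a "result list" from that.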

#Checking whether the list given is in Ascending or Descending order
def isOrdered(some_list):
    isAscending = True
    isDescending = True
    for i in range(1,len(some_list)):
        if(some_list[i] >= some_list[i-1]):
            isDescending = False
        elif(some_list[i] <= some_list[i-1]):
            isAscending = False
    if(isAscending):
        return -1
    if(isDescending):
        return 1
    return 0

#Converting the columns to lists and compare the nth elements of each, one at a time
#The columns are guaranteed to be of the same length
col_list = [df[x].to_list() for x in df.columns]
result_list = []
n=0
while n in range(len(col_list[0])):
    tmp_lst = []
    for idx in range(len(col_list)):
        tmp_lst.append(col_list[idx][n])
    result_list.append(isOrdered(tmp_lst))
    n +=1

df['result'] = result_list

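This gives me the following DataFrame (its problem is that if all values are equal, it returns -1 instead of 0, but I can live with that as long as it accurately tells me whether the columns are ascending or not):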

   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1      -1
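The -1 for the all-equal row comes from the `>=`/`<=` pair in `isOrdered`: a tie only clears `isDescending`, so `isAscending` stays True. A sketch of a row-wise check that uses strict comparisons instead (the name `is_ordered` is hypothetical; it is still a Python-level loop over rows, so it does not address the performance concern):

```python
import pandas as pd


def is_ordered(values):
    """Return 1 if values strictly decrease, -1 if they strictly increase, else 0.

    Assumes at least two values; equal neighbours rule out both directions.
    """
    pairs = list(zip(values, values[1:]))
    if all(a > b for a, b in pairs):
        return 1
    if all(a < b for a, b in pairs):
        return -1
    return 0


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 1],
                   'col2': [4, 3, 2, 1, 0, -1, -2, -3, 1],
                   'col3': [8, 6, 4, 2, 0, -2, -4, -6, 1]})
cols = ['col1', 'col2', 'col3']
# One Python function call per row
df['result'] = [is_ordered(row) for row in df[cols].to_numpy().tolist()]
```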

This approach doesn't look very good, and I doubt that it's efficient. Is there a better way to do this?


Source: https://stackoverflow.com/questions/75038745/comparing-the-values-of-n-dataframe-columns-with-each-other-and-check-whether-th

3 answers


xmq68pz9 (answer #1)

dirs = np.sign(df.diff(-1, axis="columns")).iloc[:, :-1]

df["result"] = np.select([dirs.eq(1).all(axis="columns"),
                          dirs.eq(-1).all(axis="columns")],
                         [1,
                          -1],
                         default=0)

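Get the "directions" of each row:
  this is the sign of each column minus the next column, going left to right (diff(-1, "columns"))
  since the last column has no next value, it is cut off (iloc[:, :-1])
If the directions all equal 1 => put 1 in the result
Else if they all equal -1 => put -1 in the result
Otherwise, put the default value, i.e. 0

(np.select is a vectorized if-elif-...-else.)
This gives: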

>>> df

   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1       0

where dirs is:

>>> dirs

   col1  col2
0    -1    -1
1    -1    -1
2     1    -1
3     1    -1
4     1     0
5     1     1
6     1     1
7     1     1
8     0     0
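The same steps wrapped in a function (the name `direction_flags` is an assumption, not part of the answer) and applied to a hypothetical four-column frame, as a sketch showing that nothing depends on the column count. Only the columns to compare should be passed in; once a `result` column exists, it would otherwise be included in the next call.

```python
import numpy as np
import pandas as pd


def direction_flags(frame: pd.DataFrame) -> np.ndarray:
    # Sign of (each column - the next column), left to right; the last column
    # has no successor, so its all-NaN difference column is dropped.
    dirs = np.sign(frame.diff(-1, axis="columns")).iloc[:, :-1]
    # All steps +1 -> strictly descending -> 1; all -1 -> strictly ascending -> -1; else 0.
    return np.select([dirs.eq(1).all(axis="columns"),
                      dirs.eq(-1).all(axis="columns")],
                     [1, -1],
                     default=0)


# Hypothetical four-column frame: descending, ascending, and a row with a tie.
wide = pd.DataFrame({"a": [9, 1, 5], "b": [7, 2, 5], "c": [4, 3, 6], "d": [0, 8, 7]})
wide["result"] = direction_flags(wide[["a", "b", "c", "d"]])
```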

Answered 2023-01-08

e4eetjau (answer #2)

You can compute the differences along each row, then check whether all of the differences in a row are greater than 0 or less than 0:

import numpy as np
diff = df.diff(axis=1).iloc[:,1:]
df['result'] = np.where((diff > 0).all(axis=1), -1, np.where((diff < 0).all(axis=1), 1, 0))
df
   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1       0
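A NumPy-only variant of the same idea, offered as a sketch: `np.diff` on the underlying array returns exactly the N-1 adjacent differences, so there is no leading NaN column to slice off. It assumes all compared columns are numeric; the function name `direction_flags_np` is hypothetical.

```python
import numpy as np
import pandas as pd


def direction_flags_np(frame: pd.DataFrame) -> np.ndarray:
    """1 for strictly descending rows, -1 for strictly ascending rows, 0 otherwise."""
    # Shape (rows, N - 1): col[i+1] - col[i] for each adjacent pair of columns.
    steps = np.diff(frame.to_numpy(), axis=1)
    # Same mapping as the answer: every step > 0 (ascending) -> -1, every step < 0 -> 1.
    return np.where((steps > 0).all(axis=1), -1,
                    np.where((steps < 0).all(axis=1), 1, 0))


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 1],
                   'col2': [4, 3, 2, 1, 0, -1, -2, -3, 1],
                   'col3': [8, 6, 4, 2, 0, -2, -4, -6, 1]})
df['result'] = direction_flags_np(df[['col1', 'col2', 'col3']])
```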

Answered 2023-01-08

whitzsjs (answer #3)

Proposed code:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,1],
                   'col2': [4,3,2,1,0,-1,-2,-3,1],
                   'col3': [8,6,4,2,0,-2,-4,-6,1]})

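def func(dat):
    # dat holds the column-to-column differences; the first column is all NaN, so skip it.
    # x / |x| is +1 or -1 per step; a tie gives 0/0 = NaN, which sum() skips.
    steps = dat.iloc[:, 1:]
    return steps.div(steps.abs()).sum(axis=1)

df2 = df1.assign(cmp=lambda dat: func(dat.diff(axis=1)))
# Only a strictly monotonic row has |sum| == number of steps (N - 1).
# Increasing steps are positive, so negate the sign to map ascending -> -1 and descending -> 1.
df2["cmp"] = df2["cmp"].apply(lambda r: int(-np.sign(r)) if abs(r) == len(df1.columns) - 1 else 0)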

print(df2)

Result:

   col1  col2  col3  cmp
0     1     4     8   -1
1     2     3     6   -1
2     3     2     4    0
3     4     1     2    0
4     5     0     0    0
5     6    -1    -2    1
6     7    -2    -4    1
7     8    -3    -6    1
8     1     1     1    0
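The idea behind this answer is that the N-1 step signs sum to ±(N-1) only when every step has the same strict sign. A sketch of that check using `np.sign`, which maps a tie to 0 rather than NaN, so no division is needed (the function name `direction_from_sign_sum` is hypothetical; numeric columns are assumed):

```python
import numpy as np
import pandas as pd


def direction_from_sign_sum(frame: pd.DataFrame) -> np.ndarray:
    # +1 / -1 / 0 for each step col[i+1] - col[i]; ties contribute 0.
    signs = np.sign(np.diff(frame.to_numpy(), axis=1))
    total = signs.sum(axis=1)
    n_steps = frame.shape[1] - 1
    # Only a row whose every step has the same strict sign reaches |total| == n_steps.
    # Increasing steps are positive, so negate to map ascending -> -1.
    return np.where(np.abs(total) == n_steps, -np.sign(total), 0)


df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 1],
                    'col2': [4, 3, 2, 1, 0, -1, -2, -3, 1],
                    'col3': [8, 6, 4, 2, 0, -2, -4, -6, 1]})
df1['cmp'] = direction_from_sign_sum(df1[['col1', 'col2', 'col3']])
```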

Answered 2023-01-08
