pandas: counting repetitions and writing them to a new column

Posted by wfauudbj on 2023-01-07 in Other


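Is it possible to compare the values of N columns row by row within the same DataFrame, and add a new column that counts the repetitions when the values of the 3 columns match those of another row?
From: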

id | column1 | column2 | column3
0  | z       | x       | x       
1  | y       | y       | y       
2  | x       | x       | x       
3  | x       | x       | x       
4  | z       | y       | x      
5  | w       | w       | w     
6  | w       | w       | w     
7  | w       | w       | w

To:

id | column1 | column2 | column3 | counter
0  | z       | x       | x       | 0
1  | y       | y       | y       | 1
2  | x       | x       | x       | 2
3  | x       | x       | x       | 2
4  | z       | y       | x       | 0
5  | w       | w       | w       | 3
6  | w       | w       | w       | 3
7  | w       | w       | w       | 3

Something like: if (column1[someRow] == column1[anotherRow] & column2[someRow] == column2[anotherRow] & column3[someRow] == column3[anotherRow]) then counter[someRow]++
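Read literally, the pseudocode above is a nested loop over rows. A minimal sketch of that reading follows, using the sample data from the "From" table. It assumes one extra rule that the pseudocode does not state but the "To" table implies: a row whose three values are not all equal gets 0. The names `cols`, `rows`, and `counter` are illustrative only.

```python
import pandas as pd

# Sample data copied from the "From" table in the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6, 7],
    "column1": ["z", "y", "x", "x", "z", "w", "w", "w"],
    "column2": ["x", "y", "x", "x", "y", "w", "w", "w"],
    "column3": ["x", "y", "x", "x", "x", "w", "w", "w"],
})

cols = ["column1", "column2", "column3"]
rows = df[cols].values.tolist()  # one [c1, c2, c3] list per row

counter = []
for some_row in rows:
    if len(set(some_row)) != 1:
        # Assumption inferred from the "To" table: non-uniform rows get 0.
        counter.append(0)
        continue
    # Count every row (including this one) whose three values match.
    counter.append(sum(some_row == another_row for another_row in rows))

df["counter"] = counter
print(df)
```

This is quadratic in the number of rows, so it is only useful for checking the intended semantics on small data.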

pandas

Source: https://stackoverflow.com/questions/75005652/counting-repetitions-and-writing-on-a-new-column

3 answers


#1 mzillmmw

You can use:

# keep only relevant columns
df2 = df.drop(columns='id')

# are the values identical in the row?
m = df2.eq(df2.iloc[:, 0], axis=0).all(axis=1)

# count the number of occurrences per group
# and only keep the output when all values are identical
df['counter'] = df2.groupby(list(df2)).transform('size').where(m, 0)

# for older pandas versions
# df['counter'] = df2.groupby(list(df2))[df2.columns[0]].transform('size').where(m, 0)

Output (with an extra row for clarity):

   id column1 column2 column3  counter
0   0       z       x       x        0
1   1       y       y       y        1
2   2       x       x       x        2
3   3       x       x       x        2
4   4       z       y       x        0
5   5       w       w       w        1

Input used:

import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5],
                   'column1': ['z', 'y', 'x', 'x', 'z', 'w'],
                   'column2': ['x', 'y', 'x', 'x', 'y', 'w'],
                   'column3': ['x', 'y', 'x', 'x', 'x', 'w'],
                  })
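To see what each step of this answer contributes, the sketch below runs the same idea on the eight rows from the question and keeps the intermediate results. One detail differs from the answer and is my own choice: the `id` column is selected before `transform`, so the result is a Series regardless of pandas version. The names `cols` and `sizes` are illustrative.

```python
import pandas as pd

# The eight sample rows from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6, 7],
    "column1": ["z", "y", "x", "x", "z", "w", "w", "w"],
    "column2": ["x", "y", "x", "x", "y", "w", "w", "w"],
    "column3": ["x", "y", "x", "x", "x", "w", "w", "w"],
})
cols = ["column1", "column2", "column3"]

# Step 1: True where every value in the row equals the row's first value.
m = df[cols].eq(df[cols[0]], axis=0).all(axis=1)

# Step 2: size of each group of identical rows, broadcast back to every row.
sizes = df.groupby(cols)["id"].transform("size")

# Step 3: keep the group size for uniform rows, 0 for all others.
df["counter"] = sizes.where(m, 0)

print(pd.DataFrame({"uniform": m, "group_size": sizes, "counter": df["counter"]}))
```

The mask matters because a non-uniform row such as `z, x, x` still forms a group of size 1; without `where(m, 0)` it would be reported as 1 instead of 0.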

Posted 2023-01-07

#2 lhcgjxsq

You can do:

s = df.drop("id", axis=1).nunique(1)
df["counter"] = (
    df.groupby(df.where(s.eq(1))["column1"]).transform("size").fillna(0).astype(int)
)

# For older versions of pandas (use instead of the assignment above)
df["counter"] = (
    df.groupby(df.where(s.eq(1))["column1"])["column1"]
    .transform("size")
    .fillna(0)
    .astype(int)
)

print(df)

   id column1 column2 column3  counter
0   0       z       x       x        0
1   1       y       y       y        1
2   2       x       x       x        2
3   3       x       x       x        2
4   4       z       y       x        0
5   5       w       w       w        1

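What we do here is use nunique to get the number of unique elements along axis 1 (excluding the id column), then take the rows that have exactly one unique value and run groupby.transform with size on them.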
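The intermediate values make the explanation above easier to follow. The sketch below prints them for the same six-row input. For the final count it swaps in `value_counts` plus `map` rather than the answer's `groupby(...).transform("size")`, so it does not depend on how groupby treats NaN keys. That substitution is mine, not part of the answer, and the name `key` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "column1": ["z", "y", "x", "x", "z", "w"],
    "column2": ["x", "y", "x", "x", "y", "w"],
    "column3": ["x", "y", "x", "x", "x", "w"],
})

# Number of distinct values in each row, ignoring the id column.
s = df.drop(columns="id").nunique(axis=1)

# Grouping key: column1 where the row is uniform, NaN everywhere else.
key = df["column1"].where(s.eq(1))

# value_counts ignores NaN, so non-uniform rows map to NaN and then to 0.
df["counter"] = key.map(key.value_counts()).fillna(0).astype(int)

print(pd.DataFrame({"nunique": s, "key": key, "counter": df["counter"]}))
```

Grouping on `column1` alone is enough here because, once a row is known to be uniform, its first value determines the whole row.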

Posted 2023-01-07

#3 k10s72fa

Answer:
[code snippet lost in extraction]

Posted 2023-01-07