pandas: How to get the unique values from each column of a DataFrame

Posted by jvlzgdj9 on 2023-08-01 in Other


I am working with a DataFrame that looks like this:

from pandas import DataFrame
import pandas as pd

# Sample data: every combination of B, C and D values within each ID
sample = DataFrame([
    {'ID': 'no1', 'B': 'Eric', 'C': 'George', 'D': 'a'},
    {'ID': 'no1', 'B': 'Eric', 'C': 'George', 'D': 'b'},
    {'ID': 'no1', 'B': 'Eric', 'C': 'George', 'D': 'c'},
    {'ID': 'no1', 'B': 'Eric', 'C': 'Genna', 'D': 'a'},
    {'ID': 'no1', 'B': 'Eric', 'C': 'Genna', 'D': 'b'},
    {'ID': 'no1', 'B': 'Eric', 'C': 'Genna', 'D': 'c'},
    {'ID': 'no1', 'B': 'aa', 'C': 'George', 'D': 'a'},
    {'ID': 'no1', 'B': 'aa', 'C': 'George', 'D': 'b'},
    {'ID': 'no1', 'B': 'aa', 'C': 'George', 'D': 'c'},
    {'ID': 'no1', 'B': 'aa', 'C': 'Genna', 'D': 'a'},
    {'ID': 'no1', 'B': 'aa', 'C': 'Genna', 'D': 'b'},
    {'ID': 'no1', 'B': 'aa', 'C': 'Genna', 'D': 'c'},
    {'ID': 'no2', 'B': 'Cythina', 'C': 'Oliver', 'D': 'x'},
    {'ID': 'no2', 'B': 'Cythina', 'C': 'Oliver', 'D': 'y'},
    {'ID': 'no2', 'B': 'Cythina', 'C': 'Olivia', 'D': 'x'},
    {'ID': 'no2', 'B': 'Cythina', 'C': 'Olivia', 'D': 'y'},
    {'ID': 'no2', 'B': 'Ben', 'C': 'Oliver', 'D': 'x'},
    {'ID': 'no2', 'B': 'Ben', 'C': 'Oliver', 'D': 'y'},
    {'ID': 'no2', 'B': 'Ben', 'C': 'Olivia', 'D': 'x'},
    {'ID': 'no2', 'B': 'Ben', 'C': 'Olivia', 'D': 'y'},
])

It currently looks like this:

    ID  B       C       D
0   no1 Eric    George  a
1   no1 Eric    George  b
2   no1 Eric    George  c
3   no1 Eric    Genna   a
4   no1 Eric    Genna   b
5   no1 Eric    Genna   c
6   no1 aa      George  a
7   no1 aa      George  b
8   no1 aa      George  c
9   no1 aa      Genna   a
10  no1 aa      Genna   b
11  no1 aa      Genna   c
12  no2 Cythina Oliver  x
13  no2 Cythina Oliver  y
14  no2 Cythina Olivia  x
15  no2 Cythina Olivia  y
16  no2 Ben     Oliver  x
17  no2 Ben     Oliver  y
18  no2 Ben     Olivia  x
19  no2 Ben     Olivia  y

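Columns B, C and D are unrelated to one another. For each ID, I want the unique values of B, the unique values of C and the unique values of D, each listed independently, like this: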

    ID  B       C       D
0   no1 Eric    George  a
1   no1 aa      Genna   b
2   no1 NULL    NULL    c
3   no2 Cythina Oliver  x
4   no2 Ben     Olivia  y

Some IDs may have 13 unique values under B, none under C, and perhaps 5 under D. The counts really do vary from one ID to the next.
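Because the counts differ per ID and per column, it can help to measure them before reshaping. The following is only a sketch: `demo` is a hypothetical miniature frame, not the data from the question, and `rows_needed` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data (not the original frame).
demo = pd.DataFrame({
    "ID": ["no1", "no1", "no1", "no2", "no2"],
    "B":  ["Eric", "aa", "Eric", "Ben", "Ben"],
    "C":  ["George", "George", "Genna", "Oliver", "Olivia"],
    "D":  ["a", "b", "c", "x", "x"],
})

# Number of distinct values in each column, per ID.
counts = demo.groupby("ID").nunique()
print(counts)

# The column with the most distinct values decides how many
# output rows each ID needs; shorter columns get padded.
rows_needed = counts.max(axis=1)
print(rows_needed)
```

Here `no1` needs three rows (column D has three distinct values) and `no2` needs two.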

pandas

Source: https://stackoverflow.com/questions/76691288/how-to-get-unique-values-from-each-column-in-a-dataframe

2 Answers


qzwqbdag #1

IIUC, you can try itertools.zip_longest:

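from itertools import zip_longest
import pandas as pd

def fn(x):
    # Unique values of each column within one ID group
    b = x['B'].unique()
    c = x['C'].unique()
    d = x['D'].unique()
    # zip_longest pads the shorter columns with None
    return pd.DataFrame(zip_longest(b, c, d), columns=['B', 'C', 'D'])

# `sample` is the DataFrame defined in the question
out = sample.groupby('ID')[['B', 'C', 'D']].apply(fn).droplevel(level=1).reset_index()
print(out)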

Prints:

ID        B       C  D
0  no1     Eric  George  a
1  no1       aa   Genna  b
2  no1     None    None  c
3  no2  Cythina  Oliver  x
4  no2      Ben  Olivia  y
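The same zip_longest idea can be written without hard-coding the column names. This is a sketch under stated assumptions, not part of the answer: `unique_per_group` is a hypothetical helper, `demo` is made-up data, and the helper additionally drops missing values before taking the uniques.

```python
from itertools import zip_longest

import pandas as pd


def unique_per_group(df, key, cols=None):
    """Hypothetical helper: per-group unique values of each column,
    with shorter columns padded by missing values."""
    cols = [c for c in df.columns if c != key] if cols is None else list(cols)
    pieces = []
    for group_id, grp in df.groupby(key, sort=True):
        # One array of unique, non-missing values per column.
        uniques = [grp[c].dropna().unique() for c in cols]
        # zip_longest lines the arrays up row by row and pads with None.
        block = pd.DataFrame(list(zip_longest(*uniques)), columns=cols)
        block.insert(0, key, group_id)
        pieces.append(block)
    return pd.concat(pieces, ignore_index=True)


# Made-up data for illustration only.
demo = pd.DataFrame({
    "ID": ["no1", "no1", "no1", "no2", "no2"],
    "B":  ["Eric", "aa", "Eric", "Ben", "Ben"],
    "C":  ["George", "George", "Genna", "Oliver", "Olivia"],
    "D":  ["a", "b", "c", "x", "x"],
})

out = unique_per_group(demo, "ID")
print(out)
```

An explicit loop avoids relying on how `groupby(...).apply` treats the grouping column, which has changed between pandas versions. Note that `pd.concat` raises on an empty list, so an empty input frame would need separate handling.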


2023-08-01

ssm49v7z #2

Here is one way:

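# `sample` is the DataFrame defined in the question
out = (sample.set_index('ID')
       # blank out every repeat of a value within each column
       .where(lambda x: x.apply(lambda col: ~col.duplicated()))
       .stack()
       .dropna()  # no-op where stack() already drops NaN; newer pandas may keep them
       .to_frame()
       # number the surviving values within each (ID, column) pair
       .assign(cc=lambda x: x.groupby(level=[0, 1]).cumcount())
       .set_index('cc', append=True)[0]
       .unstack(level=1)
       .droplevel(1)
       .reset_index())
print(out)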

Output:

ID        B       C  D
0  no1     Eric  George  a
1  no1       aa   Genna  b
2  no1      NaN     NaN  c
3  no2  Cythina  Oliver  x
4  no2      Ben  Olivia  y
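One caveat, as far as I can tell from reading the chain: `duplicated()` runs over each whole column, not per ID, so a value that occurs under two IDs would be kept only for the first one. The sample data never shares values across IDs, so the output above is unaffected. A possible per-ID variant, sketched with melt, drop_duplicates and pivot; `demo`, `value_cols`, `long_form` and `pos` are names invented for this illustration.

```python
import pandas as pd

# Made-up data in which "Eric" and "a" occur under both IDs.
demo = pd.DataFrame({
    "ID": ["no1", "no1", "no1", "no2", "no2"],
    "B":  ["Eric", "aa", "Eric", "Eric", "Ben"],
    "C":  ["George", "George", "Genna", "Oliver", "Olivia"],
    "D":  ["a", "b", "c", "a", "a"],
})
value_cols = [c for c in demo.columns if c != "ID"]

# Long format: one row per (ID, column name, value).
long_form = (demo.melt(id_vars="ID", var_name="col", value_name="val")
                 .dropna(subset=["val"])
                 # deduplicate within each ID and column, not across IDs
                 .drop_duplicates(["ID", "col", "val"]))

# Position of each surviving value inside its (ID, column) pair.
long_form["pos"] = long_form.groupby(["ID", "col"]).cumcount()

# Back to wide format: one row per (ID, position).
out = (long_form.pivot(index=["ID", "pos"], columns="col", values="val")
                .reset_index()
                .rename_axis(columns=None)
                .reindex(columns=["ID"] + value_cols))
print(out)
```

With this data, `no2` keeps "Eric" under B and "a" under D even though both already appear under `no1`.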


2023-08-01
