pandas: Is there any way in a Python DataFrame to see whether two columns are the same, but with renamed values?

Posted by 1szpjjfi on 2023-01-01 in Python


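For example, suppose I have a large DataFrame of all the individuals in a zoo, and two of its columns are Animal_Common_Name and Animal_Scientific_Name. I suspect that one of them is redundant, because each feature is completely determined by the other and vice versa. They are essentially the same feature, just with renamed values.
Is there a function that takes two columns and tells you this?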

pandas

Source: https://stackoverflow.com/questions/74955875/is-there-any-way-in-a-python-dataframe-to-see-if-two-columns-are-the-same-but-wi

4 Answers


txu3uszq1#

Assuming this example:

Animal_Common_Name  Animal_Scientific_Name
0               Lion            Panthera leo
1            Giraffe  Giraffa camelopardalis
2               Lion            Panthera leo

Use factorize to convert each column to categorical integer codes, then check whether all the codes are equal:

(pd.factorize(df['Animal_Common_Name'])[0] == pd.factorize(df['Animal_Scientific_Name'])[0]).all()

Output: True
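The one-liner above assumes that df already exists. A minimal self-contained sketch, rebuilding the three-row example by hand, shows why the comparison works: pd.factorize numbers values in order of first appearance, so two columns that are pure relabelings of each other produce identical code arrays. The variable names here are illustrative only.

```python
import pandas as pd

# Rebuild the three-row example from the answer.
df = pd.DataFrame({
    "Animal_Common_Name": ["Lion", "Giraffe", "Lion"],
    "Animal_Scientific_Name": [
        "Panthera leo", "Giraffa camelopardalis", "Panthera leo"
    ],
})

# factorize returns (codes, uniques); codes are assigned in order of
# first appearance, so a consistent renaming yields the same codes.
common_codes, common_labels = pd.factorize(df["Animal_Common_Name"])
scientific_codes, scientific_labels = pd.factorize(df["Animal_Scientific_Name"])

print(common_codes)      # [0 1 0]
print(scientific_codes)  # [0 1 0]

# True only if every row received the same code in both columns.
same_up_to_renaming = bool((common_codes == scientific_codes).all())
print(same_up_to_renaming)  # True
```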
To identify the rows where one scientific name maps to more than one common name:

df[df.groupby('Animal_Scientific_Name')['Animal_Common_Name'].transform('nunique').ne(1)]

Do the same with the column names swapped to check the other direction.
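As a sketch of checking both directions, the groupby expression can be wrapped in a small helper. The function name violations and the extra "African lion" row are hypothetical, added only to produce a mismatch, and the sketch assumes neither column contains missing values.

```python
import pandas as pd

def violations(df, key, other):
    """Return rows whose `key` value is paired with more than one
    distinct `other` value (hypothetical helper, not a pandas API)."""
    n_distinct = df.groupby(key)[other].transform("nunique")
    return df[n_distinct.ne(1)]

# Sample data: "Panthera leo" appears under two different common names.
df = pd.DataFrame({
    "Animal_Common_Name": ["Lion", "Giraffe", "Lion", "African lion"],
    "Animal_Scientific_Name": [
        "Panthera leo", "Giraffa camelopardalis",
        "Panthera leo", "Panthera leo",
    ],
})

# Scientific name -> common name: rows 0, 2 and 3 are flagged.
by_scientific = violations(df, "Animal_Scientific_Name", "Animal_Common_Name")
# Common name -> scientific name: nothing is flagged.
by_common = violations(df, "Animal_Common_Name", "Animal_Scientific_Name")

print(by_scientific)
print(by_common.empty)  # True
```

The columns are renamings of each other only if both results are empty, which is why a single direction is not enough here.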

Posted 2023-01-01

t5fffqht2#

df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])

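This returns True if the two columns are identical element by element, and False otherwise. Note that it tests exact equality, so it returns False for columns that differ only by a renaming of their values, such as common versus scientific names.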
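A short sketch on the zoo example from the first answer illustrates what equals reports for renamed columns. For contrast, it also shows one possible way to test for a one-to-one relabeling: deduplicate the value pairs and check that each column is unique among them. This second check is an illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal_Common_Name": ["Lion", "Giraffe", "Lion"],
    "Animal_Scientific_Name": [
        "Panthera leo", "Giraffa camelopardalis", "Panthera leo"
    ],
})

# equals compares the actual values, so renamed columns are not "equal".
identical = df["Animal_Common_Name"].equals(df["Animal_Scientific_Name"])
print(identical)  # False

# Alternative sketch: keep each distinct (common, scientific) pair once.
# If no value repeats in either column of the pairs, the mapping between
# the two columns is one-to-one.
pairs = df[["Animal_Common_Name", "Animal_Scientific_Name"]].drop_duplicates()
one_to_one = (
    pairs["Animal_Common_Name"].is_unique
    and pairs["Animal_Scientific_Name"].is_unique
)
print(one_to_one)  # True
```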

Posted 2023-01-01

lsmepo6l3#

You can use the pandas.Series.equals() method.
For example:

import pandas as pd

data = {
    'Column1': [1, 2, 3, 4],
    'Column2': [1, 2, 3, 4],
    'Column3': [5, 6, 7, 8]
}

df = pd.DataFrame(data)

# True
print(df['Column1'].equals(df['Column2']))

# False
print(df['Column1'].equals(df['Column3']))

Found via GeeksForGeeks.

Posted 2023-01-01

pieyvz9o4#

You can use pandas' vectorized operations to quickly determine redundancy. Here is an example:

import pandas as pd

# create a sample dataframe from some data
d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
     'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)

# create a new column for your test:
df['is_redundant'] = ''

# flag the rows where both columns hold the same value
# (.loc avoids chained assignment, which may not update df):
df.loc[df['name1'] == df['name2'], 'is_redundant'] = 1

print(df)

    name1   name2   is_redundant
0   Zebra   Zebra   1
1   Lion    Lion    1
2   Seagull Bird    
3   Spider  Insect

You can then replace the empty values with 0 or leave them as they are, depending on your application.
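If a 0/1 flag is what you want, a possible shortcut is to cast the boolean comparison directly, which skips the placeholder column and the later replacement step. This is a sketch on the same sample data. Like the answer above, it flags rows whose two values are literally identical; it does not detect renamed values.

```python
import pandas as pd

df = pd.DataFrame({
    "name1": ["Zebra", "Lion", "Seagull", "Spider"],
    "name2": ["Zebra", "Lion", "Bird", "Insect"],
})

# The element-wise comparison gives a boolean Series;
# astype(int) turns True/False into 1/0 in one step.
df["is_redundant"] = (df["name1"] == df["name2"]).astype(int)

print(df)
```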

Posted 2023-01-01
