Compare two different CSV files on a key column in Python

3hvapo4f  posted on 2023-03-21  in  Python

I have two CSV files named abc.csv and xyz.csv. I can do this in Excel with VLOOKUP, but I would like to do it in Python.

abc.csv format

"uid","isDisabled"
"user.0","active"
"user.1","Disabled"
"user.2","active"
"user.3","Disabled"
"user.4","active"
"user.5","Disabled"
"user.6","active"
"user.8","active"

xyz.csv format

"uid","status"
"user.0","active"
"user.1","active"
"user.2","active"
"user.5","active"
"user.7","active"

I am looking for a result like this.

output.csv

"uid","status","abc_status"
"user.0","active","NOCHANGE"
"user.1","active","Disabled in ABC"
"user.2","active","NOCHANGE"
"user.5","active","Disabled in ABC"
"user.7","active","Does not exist in ABC"

Is this achievable?
I have tried it in Excel, but not in Python.


wbgh16ku1#

Import the package:

import pandas as pd

Read the CSV files:

abc_df = pd.read_csv("abc.csv")
xyz_df = pd.read_csv("xyz.csv")

Merge the DataFrames on uid, keeping every row of xyz.csv (a left join):

merged_df = pd.merge(xyz_df, abc_df, on="uid", how="left")
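To see which xyz rows found no match in abc, `pd.merge` also accepts `indicator=True`, which adds a `_merge` column. The following is only a sketch: it uses a small inline subset of the question's data instead of the files, and `checked_df` and `missing_uids` are names made up for illustration.

```python
import io
import pandas as pd

# Inline sample data (a subset of the question's files) so the sketch runs without files.
abc_csv = '"uid","isDisabled"\n"user.0","active"\n"user.1","Disabled"\n"user.3","Disabled"\n'
xyz_csv = '"uid","status"\n"user.0","active"\n"user.1","active"\n"user.7","active"\n'

abc_df = pd.read_csv(io.StringIO(abc_csv))
xyz_df = pd.read_csv(io.StringIO(xyz_csv))

# indicator=True adds a "_merge" column; for a left join it is "both" or "left_only".
checked_df = pd.merge(xyz_df, abc_df, on="uid", how="left", indicator=True)

# uids that are present in xyz but absent from abc
missing_uids = checked_df.loc[checked_df["_merge"] == "left_only", "uid"].tolist()
print(missing_uids)  # ['user.7']
```

Rows marked `left_only` are the ones that end up with a missing `isDisabled` value after the merge.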

Define a function that checks the status of a row:

def status(row):
    if pd.isna(row["isDisabled"]):
        return "Does not exist in ABC"
    elif row["status"] == row["isDisabled"]:
        return "NOCHANGE"
    else:
        return "Disabled in ABC"

Apply the status() function to each row of merged_df:

merged_df["abc_status"] = merged_df.apply(status, axis=1)
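For larger files, a column-wise version may be faster than a row-wise `apply`. This is a possible alternative rather than part of the answer above; it assumes the same three labels, and the small `merged_df` below is a hand-built stand-in for the merge result.

```python
import pandas as pd

# Stand-in for merged_df after the left merge (None marks a uid missing from abc).
merged_df = pd.DataFrame({
    "uid": ["user.0", "user.1", "user.7"],
    "status": ["active", "active", "active"],
    "isDisabled": ["active", "Disabled", None],
})

# Start from the fallback label, then overwrite where a more specific rule holds.
labels = pd.Series("Disabled in ABC", index=merged_df.index)
labels = labels.mask(merged_df["status"] == merged_df["isDisabled"], "NOCHANGE")
# Applied last so that a missing abc row always wins over the other two labels.
labels = labels.mask(merged_df["isDisabled"].isna(), "Does not exist in ABC")

merged_df["abc_status"] = labels
print(merged_df["abc_status"].tolist())
```

`Series.mask(cond, other)` replaces values where `cond` is true, so the order of the two calls mirrors the priority of the `if`/`elif` branches.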

Build the output DataFrame from the columns you need:

output_df = merged_df[["uid", "status", "abc_status"]]

Save output_df as a CSV file:

output_df.to_csv("output.csv", index=False)

Contents of output.csv:

uid,status,abc_status
user.0,active,NOCHANGE
user.1,active,Disabled in ABC
user.2,active,NOCHANGE
user.5,active,Disabled in ABC
user.7,active,Does not exist in ABC

rqdpfwrv2#

You can do this with the csv module by first reading ABC into a dict that maps each uid to its status:

import csv

abc_status_map: dict[str, str] = {}

with open("abc.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # discard header

    for row in reader:
        abc_status_map[row[0]] = row[1]

The ABC status map looks like this:

{
    "user.0": "active",
    "user.1": "Disabled",
    "user.2": "active",
    "user.3": "Disabled",
    "user.4": "active",
    "user.5": "Disabled",
    "user.6": "active",
    "user.8": "active",
}

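Your sample output does not cover the case where a user is disabled in XYZ but active in ABC. I went ahead and wrote the code to handle that possibility: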

def get_abc_status(uid: str, xyz_status: str) -> str:
    abc_status = abc_status_map.get(uid)

    if abc_status is None:
        return "Does not exist in ABC"

    if abc_status == xyz_status:
        return "NOCHANGE"

    if xyz_status == "active":
        return "Disabled in ABC"

    return "Enabled in ABC"
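The four branches can be checked quickly without touching any files. The sketch below repeats the same logic but takes the map as a parameter; `classify` and `sample_map` are hypothetical names used only for this check.

```python
def classify(abc_map: dict[str, str], uid: str, xyz_status: str) -> str:
    """Same branching as get_abc_status, with the map passed in explicitly."""
    abc_status = abc_map.get(uid)

    if abc_status is None:
        return "Does not exist in ABC"
    if abc_status == xyz_status:
        return "NOCHANGE"
    if xyz_status == "active":
        return "Disabled in ABC"
    return "Enabled in ABC"


sample_map = {"user.0": "active", "user.1": "Disabled", "user.2": "active"}

results = [
    classify(sample_map, "user.0", "active"),    # same status in both files
    classify(sample_map, "user.1", "active"),    # active in XYZ, Disabled in ABC
    classify(sample_map, "user.2", "Disabled"),  # Disabled in XYZ, active in ABC
    classify(sample_map, "user.7", "active"),    # uid missing from ABC
]
print(results)
```

Note that the comparison is case-sensitive, so "Disabled" and "disabled" would count as different statuses.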

I also modified the XYZ CSV to test it:

| uid    | status   |
|--------|----------|
| user.0 | active   |
| user.1 | active   |
| user.2 | Disabled |  ← let's see what happens
| user.5 | active   |
| user.7 | active   |

Create a list of final rows to append the computed rows to, and initialize it with the output header:

final_rows = [["uid", "status", "abc_status"]]

Then read XYZ, compare the statuses, append each computed row, and finally write the output:

with open("xyz.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)

    for row in reader:
        uid = row[0]
        xyz_status = row[1]
        abc_status = get_abc_status(uid, xyz_status)

        final_rows.append([uid, xyz_status, abc_status])

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(final_rows)
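If you want every field quoted, as in the sample output in the question, the writer can be given `quoting=csv.QUOTE_ALL`. Here is a sketch of that variant using `csv.DictReader` and `csv.DictWriter`, so columns are addressed by header name instead of position. It reads from inline strings and writes to an in-memory buffer (both assumptions made so the example is self-contained); with real files you would use `open(..., newline="")` as above.

```python
import csv
import io

# Inline stand-ins for abc.csv and xyz.csv (a subset of the question's data).
abc_text = '"uid","isDisabled"\n"user.0","active"\n"user.1","Disabled"\n'
xyz_text = '"uid","status"\n"user.0","active"\n"user.1","active"\n"user.7","active"\n'

# Map uid -> ABC status, reading columns by header name.
abc_map = {row["uid"]: row["isDisabled"] for row in csv.DictReader(io.StringIO(abc_text))}

out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["uid", "status", "abc_status"],
    quoting=csv.QUOTE_ALL,  # quote every field, matching the question's sample output
)
writer.writeheader()

for row in csv.DictReader(io.StringIO(xyz_text)):
    abc_status = abc_map.get(row["uid"])
    if abc_status is None:
        row["abc_status"] = "Does not exist in ABC"
    elif abc_status == row["status"]:
        row["abc_status"] = "NOCHANGE"
    elif row["status"] == "active":
        row["abc_status"] = "Disabled in ABC"
    else:
        row["abc_status"] = "Enabled in ABC"
    writer.writerow(row)

lines = out.getvalue().splitlines()
print("\n".join(lines))
```

Writing each row as it is computed also avoids holding the whole output in a list, which may matter for very large inputs.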

That gives me:

| uid    | status   | abc_status            |
| ------ | -------- | --------------------- |
| user.0 | active   | NOCHANGE              |
| user.1 | active   | Disabled in ABC       |
| user.2 | Disabled | Enabled in ABC        |  ← looks correct
| user.5 | active   | Disabled in ABC       |
| user.7 | active   | Does not exist in ABC |
