python 合并具有父/子关系的Pandas中的行

qni6mghb  于 2022-11-28  发布在  Python
关注(0)|答案(1)|浏览(149)

请考虑以下示例 Dataframe :

case_number parent_case_number              name                role    paid                            notes
0  NYC-22-1234               None      Bob Cratchit          Accountant   50000   Scrooge's favorite accountant.
1  LON-22-1446               None  Ebenezer Scrooge             Partner  950000  Charles Dickens would be proud.
2  CHI-21-0115               None        Bob Marley  Partner (Deceased)  425000                        Shackled.
3  NYC-22-1235        NYC-22-1234      Bob Cratchit          Accountant   30000    One of Scrooge's accountants.

这可以如下构造:

import pandas as pd

sample_data = [
    {
        "case_number": "NYC-22-1234",
        "parent_case_number": None,
        "name": "Bob Cratchit",
        "role": "Accountant",
        "paid": 50000,
        "notes": "Scrooge's favorite accountant.",
    },
    {
        "case_number": "LON-22-1446",
        "parent_case_number": None,
        "name": "Ebenezer Scrooge",
        "role": "Partner",
        "paid": 950000,
        "notes": "Charles Dickens would be proud.",
    },
    {
        "case_number": "CHI-21-0115",
        "parent_case_number": None,
        "name": "Bob Marley",
        "role": "Partner (Deceased)",
        "paid": 425000,
        "notes": "Shackled.",
    },
    {
        "case_number": "NYC-22-1235",
        "parent_case_number": "NYC-22-1234",
        "name": "Bob Cratchit",
        "role": "Accountant",
        "paid": 30000,
        "notes": "One of Scrooge's accountants.",
    },
]

df = pd.DataFrame(sample_data)

我想将子对象合并到其父对象中,其中parent_case_number指的是父对象case_number

  • 如果字段为空,则它应采用非空值(无论该值来自子字段还是父字段)。
  • 如果字段在两行中具有相同的值,则应保留其中一行。
  • 对于冲突的值(例如paid),应采用 * 最高 * 值。
  • 对于notes,它应该将子节点附加到父节点的注解中。
  • 它应在新列child_case_numbers中捕获已删除行的case_number

在此示例中,预期输出为:

case_number parent_case_number              name                role    paid                                                         notes child_case_numbers
0  NYC-22-1234               None      Bob Cratchit          Accountant   50000  Scrooge's favorite accountant. One of Scrooge's accountants.        NYC-22-1235
1  LON-22-1446               None  Ebenezer Scrooge             Partner  950000                               Charles Dickens would be proud.                NaN
2  CHI-21-0115               None        Bob Marley  Partner (Deceased)  425000                                                     Shackled.                NaN

我最初尝试过分组,但这假设某些列具有相同的值,而不是父/子关系。
我还考虑过将子元素移到一个单独的 Dataframe 中,然后合并回case_number= parent_case_number ',但不确定如何使用逻辑来影响合并的特定字段。
我还认为,按数据 * 类型 * 划分的方法也可能有效:

  • 任何
  • 如果父项或子项的值为空,则使用填充的值
  • 字符串:
  • 如果相同,则无变更
  • 如果不同,则将子项附加到父项
  • 数字(浮点数64)
  • 使用最大的数字(并且值始终“大于”NaN)

我该怎么做?

j8ag8udp

j8ag8udp1#

以下是一种方法:

# Merge relevant subsets on case_number/parent_case_number
new_df = pd.concat(
    [
        df[df["parent_case_number"].isna()].set_index("case_number"),
        df.dropna(subset="parent_case_number")
        .set_index("parent_case_number")
        .pipe(
            lambda df_: df_.rename(columns={col: f"child_{col}" for col in df_.columns})
        ),
    ],
    axis=1,
)

# Set new values
new_df["paid"] = new_df[["paid", "child_paid"]].max(axis=1)
new_df["name"] = new_df.apply(lambda x: x["name"] or x["child_name"], axis=1)
new_df["role"] = new_df.apply(lambda x: x["role"] or x["child_role"], axis=1)
new_df["notes"] = new_df["notes"] + " " + new_df["child_notes"].fillna("")

# Cleanup
new_df = (
    new_df[
        [col for col in new_df.columns if not col.startswith("child_")]
        + ["child_case_number"]
    ]
    .reset_index()
    .rename(columns={"index": "case_number"})
)

然后道:

print(new_df)
# Output
   case_number parent_case_number              name                role  \
0  NYC-22-1234               None      Bob Cratchit          Accountant   
1  LON-22-1446               None  Ebenezer Scrooge             Partner   
2  CHI-21-0115               None        Bob Marley  Partner (Deceased)   

       paid                                                         notes  \
0   50000.0  Scrooge's favorite accountant. One of Scrooge's accountants.   
1  950000.0                              Charles Dickens would be proud.    
2  425000.0                                                    Shackled.    

  child_case_number  
0       NYC-22-1235  
1               NaN  
2               NaN

相关问题