请考虑以下示例 Dataframe :
case_number parent_case_number name role paid notes
0 NYC-22-1234 None Bob Cratchit Accountant 50000 Scrooge's favorite accountant.
1 LON-22-1446 None Ebenezer Scrooge Partner 950000 Charles Dickens would be proud.
2 CHI-21-0115 None Bob Marley Partner (Deceased) 425000 Shackled.
3 NYC-22-1235 NYC-22-1234 Bob Cratchit Accountant 30000 One of Scrooge's accountants.
这可以如下构造:
import pandas as pd
sample_data = [
{
"case_number": "NYC-22-1234",
"parent_case_number": None,
"name": "Bob Cratchit",
"role": "Accountant",
"paid": 50000,
"notes": "Scrooge's favorite accountant.",
},
{
"case_number": "LON-22-1446",
"parent_case_number": None,
"name": "Ebenezer Scrooge",
"role": "Partner",
"paid": 950000,
"notes": "Charles Dickens would be proud.",
},
{
"case_number": "CHI-21-0115",
"parent_case_number": None,
"name": "Bob Marley",
"role": "Partner (Deceased)",
"paid": 425000,
"notes": "Shackled.",
},
{
"case_number": "NYC-22-1235",
"parent_case_number": "NYC-22-1234",
"name": "Bob Cratchit",
"role": "Accountant",
"paid": 30000,
"notes": "One of Scrooge's accountants.",
},
]
df = pd.DataFrame(sample_data)
我想将子对象合并到其父对象中,其中parent_case_number
指的是父对象case_number
。
- 如果字段为空,则它应采用非空值(无论该值来自子字段还是父字段)。
- 如果字段在两行中具有相同的值,则应保留其中一行。
- 对于冲突的值(例如
paid
),应采用 * 最高 * 值。 - 对于
notes
,它应该将子节点附加到父节点的注解中。 - 它应在新列
child_case_numbers
中捕获已删除行的case_number
在此示例中,预期输出为:
case_number parent_case_number name role paid notes child_case_numbers
0 NYC-22-1234 None Bob Cratchit Accountant 50000 Scrooge's favorite accountant. One of Scrooge's accountants. NYC-22-1235
1 LON-22-1446 None Ebenezer Scrooge Partner 950000 Charles Dickens would be proud. NaN
2 CHI-21-0115 None Bob Marley Partner (Deceased) 425000 Shackled. NaN
我最初尝试过分组,但这假设某些列具有相同的值,而不是父/子关系。
我还考虑过将子元素移到一个单独的 Dataframe 中,然后合并回case_number=
parent_case_number ',但不确定如何使用逻辑来影响合并的特定字段。
我还认为,按数据 * 类型 * 划分的方法也可能有效:
- 任何
- 如果父项或子项的值为空,则使用填充的值
- 字符串:
- 如果相同,则无变更
- 如果不同,则将子项附加到父项
- 数字(浮点数64)
- 使用最大的数字(并且值始终“大于”NaN)
我该怎么做?
1条答案
按热度按时间j8ag8udp1#
以下是一种方法:
然后道: