Programmatically modifying a pandas DataFrame [closed]

Posted by e7arh2l6 on 2023-09-29 in Other

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 3 days ago.
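In the code below, for each date I pick the Name with the highest weight_normalized. On August 1 there are three candidates, A, B and C, and by that criterion C is chosen. What I want is to replace every place in the resulting frame where A or B was chosen with C.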

import pandas as pd

df_data = pd.DataFrame(
    {'date': ['2023-08-01', '2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02', '2023-08-03', '2023-08-04', '2023-08-05'],
     'weight_normalized': [0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.7],
     'Name': ['A', 'B', 'C', 'A', 'C', 'A', 'B', 'ABC']})

# For each date, keep the row with the highest weight_normalized
# (ties broken by Name in descending order), then drop the weight column
df_data = (df_data
           .sort_values(["date", "weight_normalized", "Name"], ascending=[False, False, False])
           .drop_duplicates(subset="date")
           .drop(columns=["weight_normalized"])
           )

Expected output DataFrame:

df_data = pd.DataFrame(
    {'date': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04', '2023-08-05'],
     'Name': ['C', 'C', 'C', 'C', 'ABC']})
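For reference, the per-date selection step in the question can also be written with `groupby(...).idxmax()`, which returns the row label of the largest `weight_normalized` within each date. This is a minimal sketch assuming the original (unsorted) `df_data` from the question; the names `winner_idx` and `winners` are illustrative. One behavioral difference: `idxmax` breaks ties by taking the first occurrence, whereas the question's sort breaks ties by `Name` in descending order.

```python
import pandas as pd

# Original candidates from the question
df_data = pd.DataFrame(
    {'date': ['2023-08-01', '2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02',
              '2023-08-03', '2023-08-04', '2023-08-05'],
     'weight_normalized': [0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.7],
     'Name': ['A', 'B', 'C', 'A', 'C', 'A', 'B', 'ABC']})

# For each date, the row label of the largest weight_normalized
winner_idx = df_data.groupby("date")["weight_normalized"].idxmax()

# Keep only those rows; dates come out in ascending (sorted) order
winners = df_data.loc[winner_idx, ["date", "Name"]].reset_index(drop=True)
print(winners)
```

Before any replacement, the winners are C, C, A, B and ABC, so A and B are exactly the names that still need rewriting.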

Answer #1 (xfb7svmp)

Is this what you're looking for?

df_data = (df_data
           .sort_values(["date", "weight_normalized", "Name"], ascending=[False, False, False])
           .drop_duplicates(subset="date", keep='first')
           .drop(columns=["weight_normalized"])
           )

# add code here to generate bad names; maybe starting with df_data['Name'].unique() and then filtering
bad_names = ['A', 'B']
# update Name column: every bad name becomes 'C'
df_data.loc[df_data['Name'].isin(bad_names), "Name"] = 'C'
print(df_data)

Output:

         date Name
7  2023-08-05  ABC
6  2023-08-04    C
5  2023-08-03    C
4  2023-08-02    C
2  2023-08-01    C
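Following the comment in this answer, `bad_names` could be generated from the data instead of being hard-coded. This is a minimal sketch under an assumed rule: a name that lost to a different name on some date is replaced by that date's winner, using the earliest such date if it lost more than once. The variables `df_raw`, `merged`, `losers` and `replacement` are illustrative. Unlike answer 2, the mapping is applied to every date (including dates before the loss), and it is not transitive: if C later lost to D, A would still map to C.

```python
import pandas as pd

# Original candidates from the question (before sort/drop_duplicates)
df_raw = pd.DataFrame(
    {'date': ['2023-08-01', '2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02',
              '2023-08-03', '2023-08-04', '2023-08-05'],
     'weight_normalized': [0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.7],
     'Name': ['A', 'B', 'C', 'A', 'C', 'A', 'B', 'ABC']})

# Winner per date: the row with the largest weight_normalized
winners = df_raw.loc[df_raw.groupby("date")["weight_normalized"].idxmax(), ["date", "Name"]]

# Attach each date's winner to every candidate row of that date
merged = df_raw.merge(winners, on="date", suffixes=("", "_winner"))

# Losers: candidates beaten by a different name on some date, earliest date first
losers = merged[merged["Name"] != merged["Name_winner"]].sort_values("date", kind="stable")

# Assumed rule: map each loser to the winner of the earliest date it lost on
replacement = (losers.drop_duplicates(subset="Name")
               .set_index("Name")["Name_winner"]
               .to_dict())

# The generated counterpart of the hard-coded bad_names list
bad_names = list(replacement)

result = (winners.assign(Name=winners["Name"].replace(replacement))
          .reset_index(drop=True))
print(result)
```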

Answer #2 (bfnvny8b)

Here is another attempt. This example replaces each overlapping name with the Name that beat it on an *earlier* date in the DataFrame:

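import pandas as pd

# df_data here is the original frame from the question (before the
# sort/drop_duplicates step), so weight_normalized is still present.

def find_overlaps():
    # Maps a name that lost on an earlier date to the name that beat it
    current_overlaps = {}

    def _inner(group):
        # Replace names already known to be overlapped by an earlier winner
        group = group.assign(Name=group["Name"].replace(current_overlaps))

        # Row label of the highest weight_normalized on this date
        i = group["weight_normalized"].idxmax()
        name = group.loc[i, "Name"]

        # Every other candidate on this date is now overlapped by `name`
        for n in group.loc[(group["Name"] != name), "Name"].unique():
            current_overlaps[n] = name

        group = group.drop(columns="weight_normalized")

        return group.loc[i]

    return _inner

# groupby visits dates in sorted (ascending) order, so earlier dates are processed first
out = df_data.groupby("date", as_index=False).apply(find_overlaps())
print(out)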

Prints:

         date Name
0  2023-08-01    C
1  2023-08-02    C
2  2023-08-03    C
3  2023-08-04    C
4  2023-08-05  ABC
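The stateful closure in this answer relies on `groupby(...).apply()` visiting dates in sorted order and on `apply` passing the grouping column `date` to the function, which pandas 2.2 and later flag with a DeprecationWarning. The same logic can be sketched as a plain loop over the groups, which makes the processing order explicit. `pick_with_overlaps` is a hypothetical helper for illustration, not a pandas function, and it assumes the original `df_data` from the question.

```python
import pandas as pd

# Original candidates from the question
df_data = pd.DataFrame(
    {'date': ['2023-08-01', '2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02',
              '2023-08-03', '2023-08-04', '2023-08-05'],
     'weight_normalized': [0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.7],
     'Name': ['A', 'B', 'C', 'A', 'C', 'A', 'B', 'ABC']})


def pick_with_overlaps(df):
    """Hypothetical helper: walk dates in ascending order, carrying the
    overlap mapping forward explicitly instead of inside an apply closure."""
    overlaps = {}  # loser name -> name that beat it on an earlier date
    rows = []
    for date, group in df.groupby("date", sort=True):
        # Rewrite names that already lost on an earlier date
        names = group["Name"].replace(overlaps)
        # Winner is the (rewritten) name on the row with the largest weight
        winner = names.loc[group["weight_normalized"].idxmax()]
        # Every other candidate on this date now maps to the winner
        for n in names[names != winner].unique():
            overlaps[n] = winner
        rows.append({"date": date, "Name": winner})
    return pd.DataFrame(rows)


out = pick_with_overlaps(df_data)
print(out)
```

Because the loop only ever applies mappings learned on earlier dates, a name is rewritten on later dates but never retroactively, matching the behavior of the closure above.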
