pandas: fill NaN cells in a DataFrame from a reference table, based on specific row and column values

xesrikrc posted 11 months ago in Other

I have two tables. The first is a reference table:

| Name | Target  | Bonus |
|------|--------:|------:|
| Joe  |      40 |    46 |
| Phil |      38 |    42 |
| Dean |      65 |    70 |

The Python code that generates this table is:

import pandas as pd

# Data for the table
data = {
    'Name': ['Joe', 'Phil', 'Dean'],
    'Target': [40, 38, 65],
    'Bonus': [46, 42, 70]
}

# Creating the DataFrame
ref = pd.DataFrame(data)


My second table looks like this:

| week       | Metrics | Joe | Dean |
|------------|---------|----:|-----:|
| 11/6/2023  | Target  |  40 |   65 |
| 11/6/2023  | Bonus   |  46 |   70 |
| 11/6/2023  | Score   |  33 |   71 |
| 11/13/2023 | Target  |  40 |  NaN |
| 11/13/2023 | Bonus   |  46 |  NaN |
| 11/13/2023 | Score   |  45 |  NaN |
| 11/20/2023 | Target  |  40 |   65 |
| 11/20/2023 | Bonus   |  46 |   70 |
| 11/20/2023 | Score   |  35 |   68 |
| 11/27/2023 | Target  | NaN |   65 |
| 11/27/2023 | Bonus   | NaN |   70 |
| 11/27/2023 | Score   | NaN |   44 |
| 12/4/2023  | Target  |  40 |   65 |
| 12/4/2023  | Bonus   |  46 |   70 |
| 12/4/2023  | Score   |  42 |   66 |


The Python code that generates this table is:

# Data for the new table
data = {
    'week': ['11/6/2023', '11/6/2023', '11/6/2023', '11/13/2023', '11/13/2023', '11/13/2023',
             '11/20/2023', '11/20/2023', '11/20/2023', '11/27/2023', '11/27/2023', '11/27/2023',
             '12/4/2023', '12/4/2023', '12/4/2023'],
    'Metrics': ['Target', 'Bonus', 'Score', 'Target', 'Bonus', 'Score',
                'Target', 'Bonus', 'Score', 'Target', 'Bonus', 'Score',
                'Target', 'Bonus', 'Score'],
    'Joe': [40, 46, 33, 40, 46, 45, 40, 46, 35, None, None, None, 40, 46, 42],
    'Dean': [65, 70, 71, None, None, None, 65, 70, 68, 65, 70, 44, 65, 70, 66]
}

# Creating the DataFrame
df = pd.DataFrame(data)


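As you can see, Dean's Target, Bonus, and Score cells are empty for one week, and the same is true for Joe in a later week. Whenever these cells are NaN, I want to fill them using the following rules:

  • Take each person's Target and Bonus values from the first (reference) table and fill the NaN cells accordingly.
  • Set the Score cell equal to that person's Target value.

The output table I want looks like this: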

| week       | Metrics | Joe | Dean |
|------------|---------|----:|-----:|
| 11/6/2023  | Target  |  40 |   65 |
| 11/6/2023  | Bonus   |  46 |   70 |
| 11/6/2023  | Score   |  33 |   71 |
| 11/13/2023 | Target  |  40 |   65 |
| 11/13/2023 | Bonus   |  46 |   70 |
| 11/13/2023 | Score   |  45 |   65 |
| 11/20/2023 | Target  |  40 |   65 |
| 11/20/2023 | Bonus   |  46 |   70 |
| 11/20/2023 | Score   |  35 |   68 |
| 11/27/2023 | Target  |  40 |   65 |
| 11/27/2023 | Bonus   |  46 |   70 |
| 11/27/2023 | Score   |  40 |   44 |
| 12/4/2023  | Target  |  40 |   65 |
| 12/4/2023  | Bonus   |  46 |   70 |
| 12/4/2023  | Score   |  42 |   66 |

xdyibdwo #1

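At most one NaN block per column

Another possible solution: loop over the df columns that correspond to each person and, for each block of NaNs (selected with loc), assign the corresponding block of values from ref (also selected with loc):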

names = ['Joe', 'Dean']

d = ref.assign(Score = ref['Target'])

for x in names:
    df.loc[df[x].isna(), x] = d.loc[d['Name'].eq(x), 'Target':'Score'].T.values

General case

If a person has more than one NaN block, the code needs a small change:

import numpy as np

names = ['Joe', 'Dean']

d = ref.assign(Score = ref['Target'])

for x in names:
    n_blocks = df[x].isna().sum() // 3
    df.loc[df[x].isna(), x] = np.tile(d.loc[d['Name'].eq(x), 'Target':'Score']
                                      .values.flatten(), n_blocks)
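
The tiling step above relies on every NaN block covering all three metric rows in the order Target, Bonus, Score. As a minimal sketch under that assumption, the following self-contained example applies the same idea to a hypothetical single-person frame with two consecutive NaN blocks (the toy data is illustrative and not taken from the question):

```python
import numpy as np
import pandas as pd

# Reference values for one person; Score is defined as a copy of Target.
ref = pd.DataFrame({'Name': ['Joe'], 'Target': [40], 'Bonus': [46]})
d = ref.assign(Score=ref['Target'])

# Hypothetical column with two complete NaN blocks (rows 3-5 and 6-8).
df = pd.DataFrame({
    'Metrics': ['Target', 'Bonus', 'Score'] * 3,
    'Joe': [40, 46, 33, None, None, None, None, None, None],
})

mask = df['Joe'].isna()
n_blocks = mask.sum() // 3  # each block has exactly three rows

# One block of fill values in row order: Target, Bonus, Score.
block = d.loc[d['Name'].eq('Joe'), 'Target':'Score'].to_numpy().ravel()

# Repeat the block once per NaN block and write it into the NaN positions.
df.loc[mask, 'Joe'] = np.tile(block, n_blocks)
print(df['Joe'].tolist())
# [40.0, 46.0, 33.0, 40.0, 46.0, 40.0, 40.0, 46.0, 40.0]
```

If a block were only partially NaN, the tiled array would no longer line up with the mask, so this approach is best suited to data where whole weeks are missing.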

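Edit

To meet the OP's new requirement: the rows are no longer ordered Target, Bonus, Score but Bonus, Target, Score. In that case, the previous code needs to be adjusted: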

names = ['Joe', 'Dean']

d = ref.assign(Score = ref['Target'])
d = d[['Name', 'Bonus', 'Target', 'Score']]

for x in names:
    n_blocks = df[x].isna().sum() // 3
    df.loc[df[x].isna(), x] = np.tile(d.loc[d['Name'].eq(x), 'Bonus':'Score']
                                      .values.flatten(), n_blocks)


Output:

          week Metrics   Joe  Dean
0    11/6/2023  Target  40.0  65.0
1    11/6/2023   Bonus  46.0  70.0
2    11/6/2023   Score  33.0  71.0
3   11/13/2023  Target  40.0  65.0
4   11/13/2023   Bonus  46.0  70.0
5   11/13/2023   Score  45.0  65.0
6   11/20/2023  Target  40.0  65.0
7   11/20/2023   Bonus  46.0  70.0
8   11/20/2023   Score  35.0  68.0
9   11/27/2023  Target  40.0  65.0
10  11/27/2023   Bonus  46.0  70.0
11  11/27/2023   Score  40.0  44.0
12   12/4/2023  Target  40.0  65.0
13   12/4/2023   Bonus  46.0  70.0
14   12/4/2023   Score  42.0  66.0

n9vozmp4 #2

In the code below, df is the reference table. I renamed the second DataFrame to df2, since the two cannot share the same name:

# Iterate over each row in df2
for i, row in df2.iterrows():
    # For each person
    for person in ['Joe', 'Dean']:
        # If the value is NaN
        if pd.isnull(row[person]):
            # If the metric is 'Score', use the 'Target' value
            if row['Metrics'] == 'Score':
                value = df.loc[df['Name'] == person, 'Target'].values[0]
            # Otherwise, check if the metric exists in df and use its value
            elif row['Metrics'] in df.columns:
                value = df.loc[df['Name'] == person, row['Metrics']].values[0]
            else:
                continue  # Skip if the metric is not in df and is not 'Score'
            # Replace the NaN value in df2
            df2.at[i, person] = value

This should serve your purpose.

5cnsuln7 #3

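import pandas as pd

# data is the reference table; data2 (used below) is assumed to be the second, weekly table
data = pd.DataFrame({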
        'Name': ['Joe', 'Phil', 'Dean'],
        'Target': [40, 38, 65],
        'Bonus': [46, 42, 70]
    })
data["Score"] = data["Target"]
transposed = data.set_index('Name').transpose().rename(columns=data['Name'].to_dict())
#
Name    Joe  Phil  Dean
Target   40    38    65
Bonus    46    42    70
Score    40    38    65
#

data2 = data2.merge(transposed[["Joe","Dean"]], how="left", right_index=True, left_on="Metrics", suffixes=("","_filler"))
data2["Joe"] = data2["Joe"].fillna(data2["Joe_filler"])
data2["Dean"] = data2["Dean"].fillna(data2["Dean_filler"])
data2.drop(columns=["Joe_filler","Dean_filler"])
#
          week Metrics   Joe  Dean  Joe_filler  Dean_filler
0    11/6/2023  Target  40.0  65.0          40           65
1    11/6/2023   Bonus  46.0  70.0          46           70
2    11/6/2023   Score  33.0  71.0          40           65
3   11/13/2023  Target  40.0  65.0          40           65
4   11/13/2023   Bonus  46.0  70.0          46           70
5   11/13/2023   Score  45.0  65.0          40           65
6   11/20/2023  Target  40.0  65.0          40           65
7   11/20/2023   Bonus  46.0  70.0          46           70
8   11/20/2023   Score  35.0  68.0          40           65
9   11/27/2023  Target  40.0  65.0          40           65
10  11/27/2023   Bonus  46.0  70.0          46           70
11  11/27/2023   Score  40.0  44.0          40           65
12   12/4/2023  Target  40.0  65.0          40           65
13   12/4/2023   Bonus  46.0  70.0          46           70
14   12/4/2023   Score  42.0  66.0          40           65

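I kept the last two columns (the drop is not done in place) so you can see what the left merge does and how fillna works.
There is probably a cleaner and more compact solution, but this may serve as inspiration.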

# solution 2
import numpy as np
data = pd.DataFrame({
    'Name': ['Joe', 'Phil', 'Dean'],
    'Target': [40, 38, 65],
    'Bonus': [46, 42, 70]
})
data["Score"] = data["Target"]

transposed = data.set_index('Name').transpose().rename(columns=data['Name'].to_dict())
data2["Joe"] = np.where(data2["Joe"].isna(), data2["Metrics"].map(transposed["Joe"].to_dict()),data2["Joe"])
data2["Dean"] = np.where(data2["Dean"].isna(), data2["Metrics"].map(transposed["Dean"].to_dict()),data2["Dean"])
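
The two near-identical lines in solution 2 can be folded into a loop. The following is only a sketch of that idea, not the answer author's code: the names `lookup` and `people` are introduced here for illustration, and `data2` is a shortened, hypothetical version of the weekly table. It uses `Series.map` with a Series argument and `fillna`, both standard pandas calls.

```python
import pandas as pd

# Reference table, indexed by person, with Score defined as a copy of Target.
ref = pd.DataFrame({
    'Name': ['Joe', 'Phil', 'Dean'],
    'Target': [40, 38, 65],
    'Bonus': [46, 42, 70],
})
lookup = ref.assign(Score=ref['Target']).set_index('Name')

# Shortened, hypothetical weekly table with one missing week per person.
data2 = pd.DataFrame({
    'week': ['11/13/2023'] * 3 + ['11/27/2023'] * 3,
    'Metrics': ['Target', 'Bonus', 'Score'] * 2,
    'Joe': [40, 46, 45, None, None, None],
    'Dean': [None, None, None, 65, 70, 44],
})

people = ['Joe', 'Dean']
for name in people:
    # lookup.loc[name] is a Series indexed by Target/Bonus/Score, so mapping
    # the Metrics column through it yields the fill value for every row.
    filler = data2['Metrics'].map(lookup.loc[name])
    # fillna only touches the NaN cells; existing values are kept.
    data2[name] = data2[name].fillna(filler)

print(data2)
```

Because the fill value is looked up by the row's own Metrics label, this variant does not depend on the order of the rows or on NaNs arriving in complete three-row blocks.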

rsl1atfo #4

Try this:

1. Create a reference map from 'ref':

ref['Score'] = ref['Target']
ref.set_index('Name', inplace=True)
ref_map = ref.to_dict('index')
print(ref_map)
>>>
{'Joe': {'Target': 40, 'Bonus': 46, 'Score': 40},
 'Phil': {'Target': 38, 'Bonus': 42, 'Score': 38},
 'Dean': {'Target': 65, 'Bonus': 70, 'Score': 65}}

2. Fill the NaN cells in each listed person's column of the DataFrame 'df' with that row's 'Metrics' label:

columns_to_fill = ['Joe', 'Dean']

df[columns_to_fill] = df[columns_to_fill].apply(lambda x: x.fillna(df['Metrics']))
print(df)
>>>
          week Metrics     Joe    Dean
0    11/6/2023  Target    40.0    65.0
1    11/6/2023   Bonus    46.0    70.0
2    11/6/2023   Score    33.0    71.0
3   11/13/2023  Target    40.0  Target
4   11/13/2023   Bonus    46.0   Bonus
5   11/13/2023   Score    45.0   Score
6   11/20/2023  Target    40.0    65.0
7   11/20/2023   Bonus    46.0    70.0
8   11/20/2023   Score    35.0    68.0
9   11/27/2023  Target  Target    65.0
10  11/27/2023   Bonus   Bonus    70.0
11  11/27/2023   Score   Score    44.0
12   12/4/2023  Target    40.0    65.0
13   12/4/2023   Bonus    46.0    70.0
14   12/4/2023   Score    42.0    66.0

3. Use the reference map 'ref_map' to replace the values in 'df':

result = df.replace(ref_map)
print(result)
>>>
          week Metrics   Joe  Dean
0    11/6/2023  Target  40.0  65.0
1    11/6/2023   Bonus  46.0  70.0
2    11/6/2023   Score  33.0  71.0
3   11/13/2023  Target  40.0  65.0
4   11/13/2023   Bonus  46.0  70.0
5   11/13/2023   Score  45.0  65.0
6   11/20/2023  Target  40.0  65.0
7   11/20/2023   Bonus  46.0  70.0
8   11/20/2023   Score  35.0  68.0
9   11/27/2023  Target  40.0  65.0
10  11/27/2023   Bonus  46.0  70.0
11  11/27/2023   Score  40.0  44.0
12   12/4/2023  Target  40.0  65.0
13   12/4/2023   Bonus  46.0  70.0
14   12/4/2023   Score  42.0  66.0
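
Step 3 works because `DataFrame.replace` reads a nested dictionary as column name, then old value, then new value. Since step 2 puts strings into numeric columns, the columns may end up with object dtype. The sketch below shows both points on a hypothetical one-column frame; the trailing `pd.to_numeric` call is an addition of this example and is not part of the answer.

```python
import pandas as pd

# Hypothetical column after step 2: NaNs were replaced by metric labels.
df = pd.DataFrame({'Joe': [33.0, 'Target', 'Bonus', 'Score']})

# Outer key = column name, inner dict = {old value: new value}.
ref_map = {'Joe': {'Target': 40, 'Bonus': 46, 'Score': 40}}

# Only the 'Joe' column is searched for the labels listed under 'Joe'.
result = df.replace(ref_map)

# The column may still be object dtype, so convert it back to numbers.
result['Joe'] = pd.to_numeric(result['Joe'])
print(result['Joe'].tolist())
```

Converting back to a numeric dtype matters if the filled columns are later used in arithmetic or aggregations.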
