Refactoring age imputation in pandas

0md85ypi posted on 2023-03-06 in Other

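How can I refactor the code below so that it is easier to read and makes better use of a function? The code and DataFrames can be reproduced with the CSV files I posted on my GitHub: https://github.com/hamidpat/titanic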

import numpy as np
import pandas as pd

train_df = pd.read_csv("train_df.csv")
test_df = pd.read_csv("test_df.csv")
combine = [train_df, test_df]

# Sex must already be encoded as 0/1 and Pclass as 1/2/3 for the loops below
guess_ages = np.zeros((2, 3))
for df in combine:
    # Median age of each (Sex, Pclass) group, rounded to the nearest 0.5
    for i in range(0, 2):
        for j in range(0, 3):
            guess_df = df[(df['Sex'] == i) & (
                df['Pclass'] == j + 1)]['Age'].dropna()
            age_guess = guess_df.median()
            guess_ages[i, j] = int(age_guess/0.5 + 0.5) * 0.5
    # Fill each missing age with the guess for its (Sex, Pclass) group
    for i in range(0, 2):
        for j in range(0, 3):
            df.loc[(df.Age.isnull()) & (df.Sex == i) & (
                df.Pclass == j + 1), 'Age'] = guess_ages[i, j]

    df.Age = df.Age.astype(int)
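For reference, the expression `int(age_guess/0.5 + 0.5) * 0.5` rounds each group median to the nearest 0.5, with ties going up. This only holds for non-negative values, because `int()` truncates toward zero. The quick check below wraps it in a hypothetical helper, `round_to_half`. It also shows that the final `astype(int)` truncates a filled value like 28.5 back down to 28, so in this code the half-year rounding does not survive in the output column.

```python
def round_to_half(x: float) -> float:
    """Round a non-negative number to the nearest 0.5 (ties go up), as in the loop above."""
    # Adding 0.5 before int() truncates gives round-half-up, but only for x >= 0
    return int(x / 0.5 + 0.5) * 0.5


for median in [28.0, 28.25, 28.74, 28.75]:
    rounded = round_to_half(median)
    # int() here mimics what df.Age.astype(int) later does to the filled value
    print(f"{median} -> {rounded} -> {int(rounded)}")
```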
Answer 1 (tquggr8v)

IIUC, what you want is to replace Age, wherever it is null, with the result of the formula computed for each ('Sex', 'Pclass') group:

import numpy as np
import pandas as pd

train_df = pd.read_csv('train_df.csv', index_col=0)
test_df = pd.read_csv('test_df.csv', index_col=0)

guess_age = lambda x: int(x.median() / 0.5 + 0.5) * 0.5

train_df['Age'] = train_df['Age'].fillna(train_df.groupby(['Sex', 'Pclass'])['Age']
                                                 .transform(guess_age)).astype(int)

test_df['Age'] = test_df['Age'].fillna(test_df.groupby(['Sex', 'Pclass'])['Age']
                                              .transform(guess_age)).astype(int)

Before:

>>> train_df['Age'].isna().sum()
177

>>> test_df['Age'].isna().sum()
86

After:

>>> train_df['Age'].isna().sum()
0

>>> test_df['Age'].isna().sum()
0
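Since the question asks for a function, the two `fillna(... groupby(...).transform(...))` statements above can be folded into one helper and called once per frame. This is a minimal sketch that assumes Sex and Pclass are already numeric, as in the question. `impute_age` and the toy data are hypothetical. Like the original loop, it computes the medians separately within each frame.

```python
import numpy as np
import pandas as pd


def impute_age(df: pd.DataFrame, by=("Sex", "Pclass")) -> pd.DataFrame:
    """Return a copy of df with missing Age filled per group, then cast to int."""

    def guess_age(ages: pd.Series) -> float:
        # Same rule as the question: group median rounded to the nearest 0.5.
        # int() raises ValueError if a group has no known ages at all.
        return int(ages.median() / 0.5 + 0.5) * 0.5

    out = df.copy()
    # transform() broadcasts each group's scalar guess back onto that group's rows
    group_guess = out.groupby(list(by))["Age"].transform(guess_age)
    out["Age"] = out["Age"].fillna(group_guess).astype(int)
    return out


toy = pd.DataFrame(
    {
        "Sex": [0, 0, 0, 1, 1, 1],
        "Pclass": [1, 1, 1, 3, 3, 3],
        "Age": [30.0, 41.0, np.nan, 20.0, np.nan, 25.0],
    }
)
print(impute_age(toy)["Age"].tolist())
```

On the real data this would be called as `train_df, test_df = impute_age(train_df), impute_age(test_df)`. Returning a copy rather than mutating in place keeps the original frames available for comparison.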
Answer 2 (blpfk2vs)

I think you could do something like:

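medians = train_df.groupby(["Sex", "Pclass"])['Age'].median()  # learned on the training set only
train_df['Age'] = train_df['Age'].fillna(train_df.groupby(["Sex", "Pclass"])['Age'].transform('median'))
# fillna aligns on row labels, so look up each test row's train median by its (Sex, Pclass) group
test_df['Age'] = test_df['Age'].fillna(test_df[["Sex", "Pclass"]].join(medians, on=["Sex", "Pclass"])['Age'])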
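A groupby `transform` result keeps the row labels of the frame it was computed on, and `fillna` aligns on those labels. Filling the test set from `train_set.groupby(...).transform('median')` therefore matches rows by index rather than by group: test rows are either left unfilled or given the median of whichever training row happens to share their label. Below is a minimal sketch of the usual alternative. It learns one median per (Sex, Pclass) pair on the training set and looks that median up for each row with `DataFrame.join`. `fill_age_from_train` and the toy frames are hypothetical, and, like this answer, the sketch skips the 0.5 rounding and the int cast.

```python
import numpy as np
import pandas as pd


def fill_age_from_train(train: pd.DataFrame, test: pd.DataFrame, by=("Sex", "Pclass")):
    """Fill missing Age in both frames using per-group medians learned on train only."""
    keys = list(by)
    # One median per group, computed from the training rows before any filling;
    # the result is a Series named "Age" indexed by the group keys
    medians = train.groupby(keys)["Age"].median()

    def fill(df: pd.DataFrame) -> pd.Series:
        # join(on=keys) looks up each row's group median and keeps df's own index,
        # so the fillna below lines up row by row
        lookup = df[keys].join(medians, on=keys)["Age"]
        return df["Age"].fillna(lookup)

    return fill(train), fill(test)


toy_train = pd.DataFrame(
    {"Sex": [0, 0, 1, 1], "Pclass": [1, 1, 3, 3], "Age": [30.0, 40.0, 20.0, np.nan]}
)
toy_test = pd.DataFrame(
    {"Sex": [1, 0], "Pclass": [3, 1], "Age": [np.nan, np.nan]}, index=[10, 11]
)

train_age, test_age = fill_age_from_train(toy_train, toy_test)
print(train_age.tolist())
print(test_age.tolist())
```

With these toy frames, the test rows (labels 10 and 11) get 20.0 and 35.0 from their groups. The label-aligned version would leave both missing, because the training frame has no rows labelled 10 or 11.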
