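How can I refactor the code below so that it is easier to read and makes better use of a function? The code and DataFrame can be reproduced with the CSV files posted on my GitHub: https://github.com/hamidpat/titanic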
import numpy as np
import pandas as pd

train_df = pd.read_csv("train_df.csv")
test_df = pd.read_csv("test_df.csv.csv")
combine = [train_df, test_df]
guess_ages = np.zeros((2, 3))
for df in combine:
    for i in range(0, 2):
        for j in range(0, 3):
            guess_df = df[(df['Sex'] == i) & (
                df['Pclass'] == j + 1)]['Age'].dropna()
            age_guess = guess_df.median()
            guess_ages[i, j] = int(age_guess/0.5 + 0.5) * 0.5
    for i in range(0, 2):
        for j in range(0, 3):
            df.loc[(df.Age.isnull()) & (df.Sex == i) & (
                df.Pclass == j + 1), 'Age'] = guess_ages[i, j]
    df.Age = df.Age.astype(int)
2 Answers
tquggr8v1#
IIUC, what you need is to replace Age, wherever it is null, with the value your formula gives for each ('Sex', 'Pclass') group.
Before:
After:
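
A minimal sketch of that per-group replacement, assuming Sex is already encoded as 0/1 as the loops in the question imply. The toy DataFrame and the round_half helper are made up for illustration, and this is not necessarily the code this answer originally showed. It prints the frame before and after the fill.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df / test_df.
df = pd.DataFrame({
    "Sex":    [0, 0, 0, 1, 1, 1],
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Age":    [22.0, 30.0, np.nan, 40.0, np.nan, 41.5],
})

def round_half(x):
    """Round to the nearest 0.5, using the same formula as the question."""
    return int(x / 0.5 + 0.5) * 0.5

# For every row, the rounded median age of that row's (Sex, Pclass) group.
# Like the original loops, this raises if a group has no known ages at all.
group_guess = df.groupby(["Sex", "Pclass"])["Age"].transform(
    lambda s: round_half(s.median())
)

print(df)  # before
# Fill only the missing ages, then truncate to int as the question does.
df["Age"] = df["Age"].fillna(group_guess).astype(int)
print(df)  # after
```

groupby(...).transform returns a Series aligned with the original index, so fillna can use it directly and the nested i/j loops and the guess_ages array are no longer needed.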
blpfk2vs2#
I think you could do something like
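
One possible shape for such a refactor into functions, offered as a sketch and not as the code this answer originally contained. The names guess_age_table and fill_missing_ages and the toy frames are hypothetical, and Sex is again assumed to be encoded as 0/1. np.floor replaces int() here, which gives the same result for non-negative ages.

```python
import numpy as np
import pandas as pd

def guess_age_table(df):
    """Median Age per (Sex, Pclass), rounded to the nearest 0.5."""
    medians = df.groupby(["Sex", "Pclass"])["Age"].median()
    return np.floor(medians / 0.5 + 0.5) * 0.5

def fill_missing_ages(df):
    """Return a copy of df with missing Age values filled per group."""
    table = guess_age_table(df).rename("AgeGuess")
    # Look up each row's guess by its (Sex, Pclass) pair; the index is kept.
    guesses = df[["Sex", "Pclass"]].join(table, on=["Sex", "Pclass"])["AgeGuess"]
    out = df.copy()
    # astype(int) fails if a group had no known ages, as in the original.
    out["Age"] = out["Age"].fillna(guesses).astype(int)
    return out

# Hypothetical stand-ins for the two CSV files in the question.
train_df = pd.DataFrame({"Sex": [0, 0, 1, 1], "Pclass": [1, 1, 2, 2],
                         "Age": [20.0, np.nan, 35.0, np.nan]})
test_df = pd.DataFrame({"Sex": [0, 0, 0], "Pclass": [3, 3, 3],
                        "Age": [np.nan, 10.0, 13.5]})

train_filled, test_filled = (fill_missing_ages(d) for d in (train_df, test_df))
print(train_filled)
print(test_filled)
```

Returning a copy instead of mutating in place keeps each function easy to test on its own. As in the question, each DataFrame is filled from its own group medians.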