pandas: create a column holding the year if all following years meet a condition

km0tfn4u, posted on 2023-03-11 in Other

I have the following dataset:

import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0]
})

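For each UID, I am trying to find the first year in which "Good?" is 1 and after which every following year also has "Good?" equal to 1. If no year meets that condition, I want to assign the value 2017.
I seem to have an indexing problem, because the code throws a 'KeyError: ######'. I suspect that in some cases I only have a single year value, and that this triggers the error. This is what I have so far: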

# Group the DataFrame by UID
groups = df.groupby('UID')

# Initialize an empty list to store the results
results = []

# Loop over each UID group
for uid, group in groups:
    # Find the first index with a Good value of 1
    first_good_index = group[group['Good?'] == 1].index[0]
    print(first_good_index)
    
    # Check if all following years have a Good value of 1
    if (group.loc[first_good_index+1:, 'Good?'] == 1).all():
        # If so, append the UID and the year of the first good row to the results list
        results.append((uid, group.loc[first_good_index, 'Year']))
    else:
        results.append((uid, 2017))

    
# Create a DataFrame from the results
results_df = pd.DataFrame(results, columns=['UID', 'First Good Year'])

# Print the results
print(results_df)

These are the expected results:

results_df = pd.DataFrame({
    'UID': [1, 2, 3],
    'First Good Year': [2016, 2017, 2017],
})

results_df
Answer #1, by ndasle7k

Use:

# True where 'Good?' equals 1
m = df['Good?'].eq(1)

# flag rows that come after a UID's first 1 but are not 1 themselves
mask = m.groupby(df['UID']).cummax() & ~m

# keep the rows with 1 for UIDs that have no flagged row
df1 = df[~df['UID'].isin(df.loc[mask, 'UID']) & m]
print (df1)
   UID  Year  Good?
1    1  2016      1
2    1  2017      1
5    2  2017      1
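
To make the two masks easier to follow, here is a small sketch that prints the intermediate values side by side. It reuses the sample data from the question; the column names `is_good`, `seen_good` and `bad_after` and the variable `bad_uids` are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0]
})

# True where 'Good?' equals 1
m = df['Good?'].eq(1)

# Per UID, becomes True at the first 1 and stays True for all later rows
seen_good = m.groupby(df['UID']).cummax()

# Rows that follow a UID's first 1 but are not 1 themselves
bad_after = seen_good & ~m

# Illustrative column names, added only for inspection
steps = df.assign(is_good=m, seen_good=seen_good, bad_after=bad_after)
print(steps)

# UIDs that break the "all following years are good" rule
bad_uids = df.loc[bad_after, 'UID'].unique().tolist()
print(bad_uids)
```

With the sample data, only the last row of UID 3 is flagged in `bad_after`, which is why UID 3 is excluded from `df1` and later falls back to 2017.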

# take the first row per `UID`, then add the missing `UID`s filled with 2017
out = (df1.drop_duplicates('UID')
          .set_index('UID')['Year']
          .reindex(df['UID'].unique(), fill_value=2017)
          .reset_index())
print (out)
   UID  Year
0    1  2016
1    2  2017
2    3  2017
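
The steps above can also be wrapped in a single function. This is only a sketch under some assumptions: the helper name `first_good_year` and its `default` parameter are made up for illustration, the rows are sorted by `UID` and `Year` first in case the real data is not already ordered, and the result column is renamed to `First Good Year` to match the expected output in the question.

```python
import pandas as pd

def first_good_year(df, default=2017):
    """Hypothetical helper (not a pandas function): first year per UID
    from which every remaining year has 'Good?' == 1, else `default`."""
    # Assumption: "following years" means later rows after sorting by Year
    ordered = df.sort_values(['UID', 'Year'])
    m = ordered['Good?'].eq(1)
    # Rows after a UID's first 1 that are not 1 themselves
    bad_after = m.groupby(ordered['UID']).cummax() & ~m
    bad_uids = ordered.loc[bad_after, 'UID']
    # Good rows of UIDs that never break the rule
    kept = ordered[m & ~ordered['UID'].isin(bad_uids)]
    return (kept.drop_duplicates('UID')
                .set_index('UID')['Year']
                .reindex(ordered['UID'].unique(), fill_value=default)
                .rename('First Good Year')
                .reset_index())

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0]
})

results_df = first_good_year(df)
print(results_df)
```

Because this version never slices by index label, a UID with a single row or with no good rows at all simply receives the default value instead of raising an error.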
