I have the following dataset:
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0]
})
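For each UID, I'm trying to find the first Year in which 'Good?' is 1, provided that every following Year for that UID also has 'Good?' equal to 1. If that condition isn't met, I want to assign 2017 instead.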
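To pin the rule down, here is a minimal pure-Python sketch of it for a single UID. `first_good_year` is a hypothetical helper (not from the question) that takes one UID's rows as (year, good) pairs; it assumes "following years" means later years, so it sorts by year first. It returns the first good year only if every later year is good too, and 2017 otherwise.

```python
def first_good_year(rows, default=2017):
    """Hypothetical helper: rows is a list of (year, good) pairs for one UID."""
    # Sort by year so that "following" means "later"
    rows = sorted(rows)
    goods = [good for _, good in rows]
    # Position of the first row with Good? == 1, or None if there is none
    first = next((i for i, g in enumerate(goods) if g == 1), None)
    if first is None:
        return default
    # Every later row must also be good; an empty tail counts as "all good"
    if all(g == 1 for g in goods[first + 1:]):
        return rows[first][0]
    return default


print(first_good_year([(2015, 0), (2016, 1), (2017, 1)]))  # UID 1 -> 2016
print(first_good_year([(2014, 0), (2015, 0), (2017, 1)]))  # UID 2 -> 2017
print(first_good_year([(2014, 0), (2015, 1), (2016, 0)]))  # UID 3 -> 2017
print(first_good_year([(2016, 0)]))  # single year, never good -> 2017
```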
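I seem to have a problem with the indexing, because it throws a 'KeyError: ######'. I think there are cases where a UID has only a single Year value, and that is what throws the error. Here's what I have so far: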
# Group the DataFrame by UID
groups = df.groupby('UID')
# Initialize an empty list to store the results
results = []
# Loop over each UID group
for uid, group in groups:
    # Find the first index with a Good value of 1
    first_good_index = group[group['Good?'] == 1].index[0]
    print(first_good_index)
    # Check if all following years have a Good value of 1
    if (group.loc[first_good_index+1:, 'Good?'] == 1).all():
        # If so, append the UID and the year of the first good row to the results list
        results.append((uid, group.loc[first_good_index, 'Year']))
    else:
        results.append((uid, 2017))
# Create a DataFrame from the results
results_df = pd.DataFrame(results, columns=['UID', 'First Good Year'])
# Print the results
print(results_df)
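On the sample data the loop above runs without error, so the failure presumably comes from the real data. Two likely causes, both assumptions since the real data isn't shown: a UID with no 'Good?' == 1 row makes `.index[0]` raise an IndexError, and a DataFrame index that isn't sorted can make the label slice `first_good_index+1:` raise a KeyError when that label doesn't exist. The sketch below keeps the same loop but sorts each group by Year, renumbers its rows with `reset_index(drop=True)` so labels equal positions, and handles the "no good year" case explicitly.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0],
})

results = []
for uid, group in df.groupby('UID'):
    # Sort by year and renumber rows 0, 1, 2, ... so that index labels
    # and positions coincide, whatever the original DataFrame index was
    group = group.sort_values('Year').reset_index(drop=True)
    good_rows = group.index[group['Good?'].eq(1).to_numpy()]
    if len(good_rows) == 0:
        # No year with Good? == 1: the original .index[0] would fail here
        results.append((uid, 2017))
        continue
    first = good_rows[0]
    # Positional slice: empty (so .all() is True) when `first` is the
    # last row, e.g. for a UID with a single year
    if (group['Good?'].iloc[first + 1:] == 1).all():
        results.append((uid, group.loc[first, 'Year']))
    else:
        results.append((uid, 2017))

results_df = pd.DataFrame(results, columns=['UID', 'First Good Year'])
print(results_df)
```

On the sample data this gives UID 1 → 2016, UID 2 → 2017 and UID 3 → 2017.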
These are the expected results:
results_df = pd.DataFrame({
    'UID': [1, 2, 3],
    'First Good Year': [2016, 2017, 2017],
})
results_df
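The same result can also be produced without an explicit loop, using `groupby(...).apply(...)`. This is a swapped-in technique, not the question's code: it assumes each Year appears at most once per UID, and `first_stable_good_year` is a hypothetical name. Setting Year as the index means each group's labels are the years themselves, so no row-label arithmetic like `first_good_index+1` is needed.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0],
})


def first_stable_good_year(good_by_year, default=2017):
    # good_by_year: one UID's Good? values, indexed by Year in ascending order
    good = good_by_year.eq(1)
    if not good.any():
        return default
    # Label (i.e. the Year) of the first row with Good? == 1
    first = good[good].index[0]
    # That year and every later year must all be good
    return first if good.loc[first:].all() else default


results_df = (
    df.sort_values(['UID', 'Year'])
    .set_index('Year')
    .groupby('UID')['Good?']
    .apply(first_stable_good_year)
    .reset_index(name='First Good Year')
)
print(results_df)
```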
Answer by ndasle7k1
Use: