How do I set a condition correctly in pandas?

gkl3eglg  posted on 2023-09-29  in Other

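I have a data frame in pandas. I need to check how banners affect orders, so I need to compare the order date with the dates of events such as "banner_show" and "banner_click". The order date must, of course, be later than the date of those events. Above all, I need to count such orders. This is my data frame. The title can be "banner_show", "banner_click", or "order", and each row has its date. I want to get the number of users who placed an order after seeing or clicking a banner.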

     user         title                 time
0  user_0  banner_click  2017-02-09 20:24:04
1  user_0         order  2017-03-20 19:24:04  # user_0 clicked a banner and bought some time later
2  user_1   banner_show  2017-04-14 20:24:04
3  user_1         order  2017-02-04 20:24:04  # user_1 ordered before he saw a banner, so we don't count him
4  user_2         order  2017-08-12 20:24:04  # user_2 ordered, then clicked a banner and ordered again, so we count him
5  user_2         order  2017-03-12 20:24:04
5  user_2  banner_click  2017-08-11 20:24:04

I tried this:

banner_click = set(df.loc[df['title'].eq('banner_click'), 'user'])
order = set(df.loc[df['title'].eq('order'), 'user'])

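# Users who have both a banner_click and an order (the time column is not checked here)
unique_user_buy_from_click = len(banner_click & order)
print(unique_user_buy_from_click)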

I also tried using "query", but it doesn't work correctly.
Expected output: unique users: 2, total_made_orders: 2 (user_0 and user_2 match)
Reproducible example:

import pandas as pd

df = pd.DataFrame({'user': ['user_0', 'user_0', 'user_1', 'user_1', 'user_2', 'user_2','user_2'],
                   'title': ['banner_click', 'order', 'banner_show', 'order', 'order','order','banner_click']
                  })
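
The example above has no time column, although every comparison in this question depends on it. A fuller version might look like the sketch below. It assumes a default integer index and copies the timestamps from the table earlier in the question, converting them with pd.to_datetime so that they compare as dates rather than as strings.

```python
import pandas as pd

# Same rows as the table in the question; the 'time' values are copied from it.
df = pd.DataFrame({
    'user': ['user_0', 'user_0', 'user_1', 'user_1', 'user_2', 'user_2', 'user_2'],
    'title': ['banner_click', 'order', 'banner_show', 'order',
              'order', 'order', 'banner_click'],
    'time': pd.to_datetime([
        '2017-02-09 20:24:04', '2017-03-20 19:24:04',
        '2017-04-14 20:24:04', '2017-02-04 20:24:04',
        '2017-08-12 20:24:04', '2017-03-12 20:24:04',
        '2017-08-11 20:24:04',
    ]),
})

# Confirm that 'time' is a datetime column and not plain strings
print(df.dtypes)
```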

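It would be even better to get both the number of unique users who placed an order after seeing this event and the total number of such orders placed by those users.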

sqougxex1#

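To solve this, you need to:
1. Sort the rows by user and then by time.
2. For each user, check whether a banner_show or banner_click event comes before an order event.
3. Count the users who satisfy the condition.
Here is a solution: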

import pandas as pd

# Make sure the timestamps are real datetimes, then sort by user and time
df['time'] = pd.to_datetime(df['time'])
df = df.sort_values(['user', 'time'])

# Check for each user whether an order was placed after their first banner event.
# Timestamps are compared directly; index labels say nothing about time order.
def check_order_after_banner(user_df):
    is_banner = user_df['title'].isin(['banner_show', 'banner_click'])
    banner_times = user_df.loc[is_banner, 'time']
    order_times = user_df.loc[user_df['title'] == 'order', 'time']
    if banner_times.empty or order_times.empty:
        return False
    return bool((order_times > banner_times.min()).any())

# One boolean per user; selecting the columns keeps 'user' out of the applied frame
users_with_condition = df.groupby('user')[['title', 'time']].apply(check_order_after_banner)
count_users = int(users_with_condition.sum())
print(count_users)

This solution gives you the number of users who placed an order after viewing or clicking a banner.

myzjeezk2#

I would use:

# ensure data is sorted by date
df = df.sort_values(by=['user', 'time'])

# flag rows after the banner was seen/clicked
banner_seen = (df['title'].str.startswith('banner')
               .groupby(df['user']).cummax()
              )

N = df.loc[df['title'].eq('order') & banner_seen, 'user'].nunique()

Output: 2
Intermediate results:

     user         title                 time  banner_seen  order  order_after_banner
0  user_0  banner_click  2017-02-09 20:24:04         True  False               False
1  user_0         order  2017-03-20 19:24:04         True   True                True
3  user_1         order  2017-02-04 20:24:04        False   True               False
2  user_1   banner_show  2017-04-14 20:24:04         True  False               False
5  user_2         order  2017-03-12 20:24:04        False   True               False
5  user_2  banner_click  2017-08-11 20:24:04         True  False               False
4  user_2         order  2017-08-12 20:24:04         True   True                True
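
The code in this answer does not build the extra columns shown in the table, so here is one possible way to reproduce them. This is a sketch only: the frame is rebuilt from the question's table with a default integer index (so the row labels may differ from those printed above), and the name intermediate is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed default integer index)
df = pd.DataFrame({
    'user': ['user_0', 'user_0', 'user_1', 'user_1', 'user_2', 'user_2', 'user_2'],
    'title': ['banner_click', 'order', 'banner_show', 'order',
              'order', 'order', 'banner_click'],
    'time': pd.to_datetime([
        '2017-02-09 20:24:04', '2017-03-20 19:24:04',
        '2017-04-14 20:24:04', '2017-02-04 20:24:04',
        '2017-08-12 20:24:04', '2017-03-12 20:24:04',
        '2017-08-11 20:24:04',
    ]),
})
df = df.sort_values(by=['user', 'time'])

# Per user, the running maximum of "is a banner row" becomes True at the
# first banner event and stays True for every later row of that user
banner_seen = df['title'].str.startswith('banner').groupby(df['user']).cummax()
is_order = df['title'].eq('order')

# 'intermediate' is a hypothetical name for the debugging view
intermediate = df.assign(
    banner_seen=banner_seen,
    order=is_order,
    order_after_banner=is_order & banner_seen,
)
print(intermediate)
```

Because the flag depends on row order, an order with exactly the same timestamp as a banner event is flagged or not depending on how the sort places the two rows.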

To get the number of orders each user placed after seeing a banner:

(df['title'].eq('order') & banner_seen).groupby(df['user']).sum()

Output:

user
user_0    1
user_1    0
user_2    1
Name: title, dtype: int64
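
The question also asks for the two totals together. One way to derive both from the per-user counts is sketched below. It rebuilds the sample frame from the question's table, and orders_after_banner, unique_users and total_orders are names chosen here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed default integer index)
df = pd.DataFrame({
    'user': ['user_0', 'user_0', 'user_1', 'user_1', 'user_2', 'user_2', 'user_2'],
    'title': ['banner_click', 'order', 'banner_show', 'order',
              'order', 'order', 'banner_click'],
    'time': pd.to_datetime([
        '2017-02-09 20:24:04', '2017-03-20 19:24:04',
        '2017-04-14 20:24:04', '2017-02-04 20:24:04',
        '2017-08-12 20:24:04', '2017-03-12 20:24:04',
        '2017-08-11 20:24:04',
    ]),
})
df = df.sort_values(by=['user', 'time'])

banner_seen = df['title'].str.startswith('banner').groupby(df['user']).cummax()

# Number of orders per user that come after that user's first banner event
orders_after_banner = (df['title'].eq('order') & banner_seen).groupby(df['user']).sum()

# Users with at least one such order, and the total number of such orders
unique_users = int((orders_after_banner > 0).sum())
total_orders = int(orders_after_banner.sum())
print(unique_users, total_orders)
```

For the sample data both values come out as 2, which matches the expected output given in the question.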
