Pandas-创建客户移动矩阵

dm7nw8vv  于 2023-03-06  发布在  其他
关注(0)|答案(2)|浏览(109)

我有一个如下所示的 Dataframe

customer_id,month,Group,category,days_ago
A1,Jan,Premium,saf,13
A1,Jan,Premium,ewf,54
A2,Jan,Lost,ds,32
A3,Jan,Lost,dfs,78
A4,Jan,Lost,sdfg,94
A5,Jan,Loyal,sa,14
A6,Jan,Need Attention,ewf,13

A1,Mar,Premium,efWCC,78
A2,Mar,Need Attention,POI
A3,Mar,Lost,QWE
A4,Mar,Need Attention,QOEP
A4,Mar,Need Attention,POTU
A5,Mar,Loyal,FANC
A6,Mar,Lost,FAS
A7,Mar,New,qewr
A8,Mar,New,wqer

t1 = pd.read_clipboard(sep=',')

我想做以下几点
a)根据一月和三月创建矩阵
B)在矩阵中填入每组的客户数
我希望我的输出在如下所示的表中

我试过下面的方法,但不知道如何把所有东西都放在一张整洁的table上

cust_info = t1.groupby(['customer_id','month','Group']).size().reset_index()
group_info = t1.groupby(['customer_id','Group']).size().reset_index()
group_info.merge(cust_info,on='customer_id',how='left')

有没有办法捕捉他们在JanMar这两个月之间从一个组到另一个组的移动?我有一个20K客户的大数据。有没有什么优雅的方法来生成下面的输出?

z9gpfhce

z9gpfhce1#

让我们做点积以提高计算效率

s = pd.crosstab([t1['month'], t1['Group']], t1['customer_id'])
s.loc['Jan'] @ s.loc['Mar'].T

结果

Group           Lost  Loyal  Need Attention  Premium
Group                                               
Lost               1      0               2        0
Loyal              0      1               0        0
Need Attention     1      0               0        0
Premium            0      0               0        1
7xzttuei

7xzttuei2#

使用merge的筛选 Dataframe ,crosstab

df = df[df['month'].eq('Jan')].merge(df[df['month'].eq('Mar')], on='customer_id')

df = pd.crosstab([df['month_x'], df['Group_x']], [df['month_y'], df['Group_y']])
print (df)
month_y                 Mar                             
Group_y                Lost Loyal Need Attention Premium
month_x Group_x                                         
Jan     Lost              1     0              2       0
        Loyal             0     1              0       0
        Need Attention    1     0              0       0
        Premium           0     0              0       1

IIUC使用:

df = df.drop_duplicates(['month','customer_id','customer_id', 'Group'])

df = df[df['month'].eq('Jan')].merge(df[df['month'].eq('Mar')], 
                                     on='customer_id', 
                                     how='outer').fillna('no match')

df = pd.crosstab([df['month_x'], df['Group_x']], [df['month_y'], df['Group_y']])
print (df)
month_y                  Mar                                 
Group_y                 Lost Loyal Need Attention New Premium
month_x  Group_x                                             
Jan      Lost              1     0              2   0       0
         Loyal             0     1              0   0       0
         Need Attention    1     0              0   0       0
         Premium           0     0              0   0       1
no match no match          0     0              0   2       0

相关问题