pandas Dataframe列,该列组合两列并统计每个值出现的次数

wyyhbhjk  于 2023-08-01  发布在  其他
关注(0)|答案(1)|浏览(103)

我有一个数据框架,结构如下:

key ticket_id             ticket_category
    
CNI-1    T1               Reset
CNI-2    T2, T3           Reset, Error
CNI-3    T1, T4           Reset, Server
CNI-4    T4, T5, T6, T7   Server, Server, Server, Error
CNI-5    T1, T8, T9, T10  Reset, Reset, Error, Server

字符串
是否可以创建一个新的列来计算每个类别在一行中出现的次数以及匹配的票证?
所需输出:

key      ticket_id        ticket_category                 combined
    
CNI-1    T1               Reset                           Reset - 1(T1)
CNI-2    T2, T3           Reset, Error                    Reset - 1(T1), Error - 1(T2)
CNI-3    T1, T4           Reset, Server                   Reset - 1(T1), Server - 1(T4)
CNI-4    T4, T5, T6, T7   Server, Server, Server, Error   Server - 3(T4, T5, T6), Error - 1(T7)
CNI-5    T1, T8, T9, T10  Reset, Reset, Error, Server     Reset - 2(T1, T8), Error - 1(T9), Server - 1(T10)

tjjdgumg

tjjdgumg1#

您可以使用多个groupby来构建所需的数据框架。请参阅代码中的注解以获取解释:

data = pd.DataFrame({"key": [f"CNI-{i+1}" for i in range(5)],
                   "ticket_id": ["T1", "T2, T3", "T1, T4", "T4, T5, T6, T7", "T1, T8, T9, T10"],
                   "ticket_category": ["Reset", "Reset, Error", "Reset, Server", "Server, Server, Server, Error", "Reset, Reset, Error, Server"]
                   })

#convert strings to lists for explode
df = data.copy().set_index("key").apply(lambda x: x.str.split(", "))

#explode to get one row per ticket_id
df = df.explode(column=list(df.columns)).reset_index()

#get count of ticket_category per original row
df["count"] = df.groupby(["key","ticket_category"])["ticket_id"].transform("count")

#groupby and construct labels
df = df.groupby(["key", "ticket_category"], as_index=False, sort=False).agg({"count": "first", "ticket_id": ", ".join})
df["combined"] = df["ticket_category"] + " - " + df["count"].astype(str) + "(" + df["ticket_id"] + ")"

#get back original df structure
df = df.groupby("key")["ticket_category", "ticket_id", "combined"].agg(", ".join).reset_index()

#get back the original column from the original data
df["ticket_category"] = df["key"].map(dict(zip(data["key"],data["ticket_category"])))

>>> df
     key                ticket_category        ticket_id  \
0  CNI-1                          Reset               T1   
1  CNI-2                   Reset, Error           T2, T3   
2  CNI-3                  Reset, Server           T1, T4   
3  CNI-4  Server, Server, Server, Error   T4, T5, T6, T7   
4  CNI-5    Reset, Reset, Error, Server  T1, T8, T9, T10   

                                            combined  
0                                      Reset - 1(T1)  
1                       Reset - 1(T2), Error - 1(T3)  
2                      Reset - 1(T1), Server - 1(T4)  
3              Server - 3(T4, T5, T6), Error - 1(T7)  
4  Reset - 2(T1, T8), Error - 1(T9), Server - 1(T10)

字符串

相关问题