pandas: get frequency counts based on multiple DataFrame columns

5cg8jx4n  posted on 2023-09-29  in Other

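I have the following DataFrame:
| Group | Size |
| -- | -- |
| Short | Small |
| Short | Small |
| Moderate | Medium |
| Moderate | Small |
| Tall | Large |
I want to count how many times each distinct row occurs in the DataFrame, like this: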

Group           Size      Time
Short          Small        2
Moderate       Medium       1 
Moderate       Small        1
Tall           Large        1
2lpgd968  #1

You can use groupby together with size():

import pandas as pd

# load the sample data
data = {'Group': ['Short', 'Short', 'Moderate', 'Moderate', 'Tall'], 'Size': ['Small', 'Small', 'Medium', 'Small', 'Large']}
df = pd.DataFrame(data)

Option 1:

dfg = df.groupby(by=["Group", "Size"]).size()

# which results in a pandas.core.series.Series
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

Option 2:

dfg = df.groupby(by=["Group", "Size"]).size().reset_index(name="Time")

# which results in a pandas.core.frame.DataFrame
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1

Option 3:

dfg = df.groupby(by=["Group", "Size"], as_index=False).size()

# which results in a pandas.core.frame.DataFrame (pandas >= 1.1);
# the count column is named "size", so rename it if "Time" is needed
      Group    Size  size
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1
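
The options above return the groups sorted by key, while the expected output in the question lists the rows in order of first appearance. A minimal sketch, assuming that ordering matters to you, passes sort=False to groupby (the variable name counts is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# sort=False keeps the groups in order of first appearance
# instead of sorting them by the group keys
counts = (
    df.groupby(["Group", "Size"], sort=False)
      .size()
      .reset_index(name="Time")
)
print(counts)
```

If the order does not matter, the default sort=True used in the options above is fine.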
e4eetjau  #2

Update: as of pandas 1.1, DataFrame.value_counts accepts multiple columns:
df.value_counts(["Group", "Size"])
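
value_counts returns a Series indexed by (Group, Size) and sorted by count in descending order. A short sketch of turning that into the three-column layout from the question, assuming pandas 1.1 or later (the names vc and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Series with a (Group, Size) MultiIndex, most frequent combination first
vc = df.value_counts(["Group", "Size"])

# Move the index levels back into columns and name the count column "Time"
result = vc.reset_index(name="Time")
print(result)
```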

You can also try pd.crosstab(). Given this data:

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group,df.Size)

Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

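Edit: to get your expected output (this needs numpy for np.nan, and the counts come back as floats because of the NaN step):

import numpy as np

pd.crosstab(df.Group, df.Size).replace(0, np.nan).\
     stack().reset_index().rename(columns={0:'Time'})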
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0
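
The version above goes through NaN, which turns the counts into floats (1.0, 2.0) and depends on stack() dropping the missing entries. A possible alternative, sketched here as an assumption rather than as part of the original answer, stacks first and then filters out the zero counts so the values stay integers:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Wide table of counts -> long format with one row per (Group, Size) pair
long_counts = (
    pd.crosstab(df["Group"], df["Size"])
      .stack()
      .reset_index(name="Time")
)

# Keep only the combinations that actually occur; counts remain integers
result = long_counts[long_counts["Time"] > 0].reset_index(drop=True)
print(result)
```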
bgibtngc  #3

Another possibility is to use .pivot_table() with aggfunc='size':

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
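
With only the two key columns in the frame, this call should give back a Series of counts indexed by (Group, Size) rather than a DataFrame. A minimal sketch under that assumption, converting it to the layout from the question (the name result is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Assumption: no value columns remain, so the result is a Series of group sizes
df_solution = df.pivot_table(index=["Group", "Size"], aggfunc="size")

# Turn the (Group, Size) index into columns and name the counts "Time"
result = df_solution.reset_index(name="Time")
print(result)
```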
