Access pandas column data with GroupBy

Posted by ev7lccsx on 2023-05-12 in Other


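I'm new to pandas and am trying to group a DataFrame (loaded from an Excel file), then loop over each unique group and perform additional operations on it. For this example, assume I just want to access and print the column data in each group's DataFrame.
Sample table data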
| id | color | value |
| --- | --- | --- |
| 1 | Red | 20 |
| 1 | Red | 25 |
| 1 | Blue | 20 |
| 1 | Blue | 15 |
| 1 | Green | 5 |
| 2 | Red | 100 |
| 2 | Blue | 20 |
| 2 | Blue | 220 |
| 2 | Blue | 120 |
| 2 | Blue | 20 |
| 2 | Green | 220 |
| 2 | Green | 120 |
| 2 | Green | 45 |
| 2 | Green | 2 |
| 2 | Green | 5 |
My initial code

import pandas as pd

df = pd.read_excel('test.xlsx')
grouped = df.groupby(['id'])

# .groups maps each group key to the index labels of its rows
for key, values in grouped.groups.items():
    print("keys: {}".format(key))
    print("values: {}".format(values))

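I would like to do the following:
1. Loop over each group ('id').
2. For each 'id', loop over each color and print the values for each color in that group.
I'm struggling to work out how to iterate over each group ('id') and access the column data of each group. I tried the code above just to see what iterating over the groups looks like.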
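For context on what the loop above prints: `GroupBy.groups` maps each group key to the index labels of its rows, not to the rows themselves, so the labels still have to be looked up in the original DataFrame. A minimal sketch, assuming the sample table is rebuilt in memory (rather than read from the Excel file) with lowercase column names `id`, `color`, and `value`:

```python
import pandas as pd

# Assumption: the sample table rebuilt in memory instead of read from Excel.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "color": ["Red", "Red", "Blue", "Blue", "Green",
              "Red", "Blue", "Blue", "Blue", "Blue",
              "Green", "Green", "Green", "Green", "Green"],
    "value": [20, 25, 20, 15, 5, 100, 20, 220, 120, 20, 220, 120, 45, 2, 5],
})

grouped = df.groupby("id")

# Each entry in .groups is (group key, index labels of the rows in that group).
for key, labels in grouped.groups.items():
    rows = df.loc[labels]  # look the rows up by their index labels
    print("id", key, "values", rows["value"].tolist())
```

Iterating over the GroupBy object directly (`for key, group in grouped:`) avoids the extra lookup, because it yields each group's sub-DataFrame alongside its key.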

pandas

Source: https://stackoverflow.com/questions/76185025/access-pandas-column-data-with-groupby

2 Answers


dl5txlt9 #1

This basically prints out the whole table, but I'm not sure how you want the printing of the subgroups structured.
It might help you anyway:

for name, group in df.groupby("id"):
    print("id", name)
    for idx, row in group.iterrows():
        print("\trow index", idx, "color", row.color, "value", row.value)

Prints:

id 1
    row index 0 color Red value 20
    row index 1 color Red value 25
    row index 2 color Blue value 20
    row index 3 color Blue value 15
    row index 4 color Green value 5
id 2
    row index 5 color Red value 100
    row index 6 color Blue value 20
    row index 7 color Blue value 220
    row index 8 color Blue value 120
    row index 9 color Blue value 20
    row index 10 color Green value 220
    row index 11 color Green value 120
    row index 12 color Green value 45
    row index 13 color Green value 2
    row index 14 color Green value 5

Edit:

for name, group in df.groupby("id"):
    print("id", name)
    for color, subgroup in group.groupby("color"):
        print("\t", color, list(subgroup["value"]))

Gives:

id 1
     Blue [20, 15]
     Green [5]
     Red [20, 25]
id 2
     Blue [20, 220, 120, 20]
     Green [220, 120, 45, 2, 5]
     Red [100]

Is that what you are looking for?
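As a possible alternative to the nested loops, both columns can be passed to a single `groupby` call, which yields one sub-DataFrame per (id, color) pair. This is only a sketch, assuming the sample table is rebuilt in memory with the column names `id`, `color`, and `value`; the name `values_by_group` is made up for illustration:

```python
import pandas as pd

# Assumption: the sample table rebuilt in memory instead of read from Excel.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "color": ["Red", "Red", "Blue", "Blue", "Green",
              "Red", "Blue", "Blue", "Blue", "Blue",
              "Green", "Green", "Green", "Green", "Green"],
    "value": [20, 25, 20, 15, 5, 100, 20, 220, 120, 20, 220, 120, 45, 2, 5],
})

# Grouping by two columns gives tuple keys: (id, color).
for (id_, color), subgroup in df.groupby(["id", "color"]):
    print(id_, color, subgroup["value"].tolist())

# The same information collected into a Series indexed by (id, color).
values_by_group = df.groupby(["id", "color"])["value"].apply(list)
print(values_by_group)
```

The single-call form loses the explicit "one header line per id" structure of the nested version, so which one fits better depends on how the output should be laid out.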

Posted 2023-05-12

nimxete2 #2

import pandas as pd
import numpy as np

data = pd.read_excel('test.csv.xlsx')

id_list = list(data['id'].unique())
color_list = list(data['color'].unique())

# Count the rows for every (id, color) combination
for i in id_list:
    for j in color_list:
        count = (data['color'][(data['color'] == j) & (data['id'] == i)]).count()
        print(f"The count of {j} in ID {i} is equal to {count}")

Output:

The count of Red in ID 1 is equal to 2 
The count of Blue in ID 1 is equal to 2
The count of Green in ID 1 is equal to 1
The count of Red in ID 2 is equal to 1
The count of Blue in ID 2 is equal to 4
The count of Green in ID 2 is equal to 5
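The same counts can probably be obtained without the boolean masks by letting `groupby` do the counting. A minimal sketch, assuming the sample table is rebuilt in memory with the column names `id`, `color`, and `value`; `sort=False` is used here so groups appear in order of first occurrence rather than sorted by key:

```python
import pandas as pd

# Assumption: the sample table rebuilt in memory instead of read from Excel.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "color": ["Red", "Red", "Blue", "Blue", "Green",
              "Red", "Blue", "Blue", "Blue", "Blue",
              "Green", "Green", "Green", "Green", "Green"],
    "value": [20, 25, 20, 15, 5, 100, 20, 220, 120, 20, 220, 120, 45, 2, 5],
})

# size() counts the rows in each (id, color) group.
counts = df.groupby(["id", "color"], sort=False).size()

for (id_, color), count in counts.items():
    print(f"The count of {color} in ID {id_} is equal to {count}")
```

One difference from the nested loops: this only reports (id, color) combinations that actually occur in the data, so a color missing from an id is skipped rather than reported with a count of 0.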

Posted 2023-05-12
