pandas: converting an ndarray to a dict in Python 3

zysjyyx4  posted on 2023-03-06 in Python

I have an ndarray that looks like this:

LABEL1              99       113           2010-04-26 20:12:23+00:00
LABEL1              29       143           2010-05-06 20:12:23+00:00
LABEL1              99       323           2010-02-12 20:12:23+00:00
LABEL1              23       223           2010-04-25 20:12:23+00:00
LABEL2              23        23           2010-01-21 20:12:23+00:00
LABEL1             234       123           2010-12-26 20:12:23+00:00
LABEL1              93       133           2010-02-23 20:12:23+00:00
LABEL4              19      1223           2010-07-24 20:12:23+00:00

I need to do some aggregation and return the result as a dict.
What I should end up with is something like this:

[ 
  { 'LABEL1': { 'COLA':577,  'COLB': 1058, 'LAST': '2010-12-26 20:12:23+00:00' } },
  { 'LABEL2': { 'COLA':23,   'COLB': 23,   'LAST': '2010-01-21 20:12:23+00:00' } },
  { 'LABEL4': { 'COLA':19,   'COLB':1223,  'LAST': '2010-07-24 20:12:23+00:00' } }
]

The approach I was considering is to convert it to a DataFrame and then run groupby().agg ...

aggr = select_df.groupby('LABELS').agg({'LABELS': [('LABELS', 'max')], 'COLA': [('COLA', 'sum'), ('COLB', 'count')], {'LAST': [('LAST', 'max')]})

I'm fairly new to Python... all the data conversions needed to do this are a nightmare...
The original structure is a list:

[
    { 'Label': 'xxxx', 'LABELS': 'xxxx', 'COLA': ##, 'COLB': ##, 'LAST': 'datetime' },...
  ]

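If only I could aggregate this list directly and then combine the result with the next pass (the list is read in chunks) to get the final list shown above...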
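
One way to aggregate the list directly, without pandas, is to keep a running dict keyed by label and fold each chunk into it. The following is only a minimal sketch: the names `fold_chunk` and `totals` are made up for illustration, and it assumes every `LAST` value uses the same fixed-width format and UTC offset, so that plain string comparison orders the timestamps chronologically.

```python
def fold_chunk(totals, chunk):
    """Fold one chunk (a list of row dicts) into the running totals, in place."""
    for row in chunk:
        entry = totals.setdefault(
            row["LABELS"], {"COLA": 0, "COLB": 0, "LAST": row["LAST"]}
        )
        entry["COLA"] += row["COLA"]
        entry["COLB"] += row["COLB"]
        # Assumption: same-format timestamp strings, so the string max is the latest.
        entry["LAST"] = max(entry["LAST"], row["LAST"])
    return totals


# Two small hand-made chunks standing in for successive reads.
chunk1 = [
    {"LABELS": "LABEL1", "COLA": 99, "COLB": 113, "LAST": "2010-04-26 20:12:23+00:00"},
    {"LABELS": "LABEL2", "COLA": 23, "COLB": 23, "LAST": "2010-01-21 20:12:23+00:00"},
]
chunk2 = [
    {"LABELS": "LABEL1", "COLA": 234, "COLB": 123, "LAST": "2010-12-26 20:12:23+00:00"},
]

totals = {}
for chunk in (chunk1, chunk2):
    fold_chunk(totals, chunk)

# Reshape into the list-of-single-key-dicts layout from the question.
result = [{label: values} for label, values in totals.items()]
print(result)
```

Because the running dict only stores one entry per label, memory use depends on the number of labels rather than on the number of rows read.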


kx7yvsdv  #1

You were almost there.
Code:

import pandas as pd

# Rows in the question's original list-of-dicts layout
# (named "records" to avoid shadowing the built-in input())
records = [
    {"LABELS": "LABEL1", "COLA": 99, "COLB": 113, "LAST": "2010-04-26 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 29, "COLB": 143, "LAST": "2010-05-06 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 99, "COLB": 323, "LAST": "2010-02-12 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 23, "COLB": 223, "LAST": "2010-04-25 20:12:23+00:00"},
    {"LABELS": "LABEL2", "COLA": 23, "COLB": 23, "LAST": "2010-01-21 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 234, "COLB": 123, "LAST": "2010-12-26 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 93, "COLB": 133, "LAST": "2010-02-23 20:12:23+00:00"},
    {"LABELS": "LABEL4", "COLA": 19, "COLB": 1223, "LAST": "2010-07-24 20:12:23+00:00"},
]

df = (
    pd.DataFrame(records)
    .groupby(["LABELS"])
    # LAST holds same-format timestamp strings, so "max" picks the latest one
    .agg({"COLA": "sum", "COLB": "sum", "LAST": "max"})
)

# One nested dict per label: {label: {column: value}}
print(df.to_dict("index"))

Output:

{'LABEL1': {'COLA': 577, 'COLB': 1058, 'LAST': '2010-12-26 20:12:23+00:00'}, 'LABEL2': {'COLA': 23, 'COLB': 23, 'LAST': '2010-01-21 20:12:23+00:00'}, 'LABEL4': {'COLA': 19, 'COLB': 1223, 'LAST': '2010-07-24 20:12:23+00:00'}}
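
The question asks for a list of single-key dicts rather than one nested dict. A small sketch of that reshaping step, on a shortened copy of the data (the names `records`, `by_label` and `as_list` are illustrative only):

```python
import pandas as pd

records = [
    {"LABELS": "LABEL1", "COLA": 99, "COLB": 113, "LAST": "2010-04-26 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 29, "COLB": 143, "LAST": "2010-05-06 20:12:23+00:00"},
    {"LABELS": "LABEL2", "COLA": 23, "COLB": 23, "LAST": "2010-01-21 20:12:23+00:00"},
]

# Same aggregation as above, producing {label: {column: value}}.
by_label = (
    pd.DataFrame(records)
    .groupby("LABELS")
    .agg({"COLA": "sum", "COLB": "sum", "LAST": "max"})
    .to_dict("index")
)

# Wrap each label's entry in its own one-key dict.
as_list = [{label: values} for label, values in by_label.items()]
print(as_list)
```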

kupeojn6  #2

First convert it to a DataFrame:

df:

        0    1     2           3               4
0  LABEL1   29   143  2010-05-06  20:12:23+00:00
1  LABEL1   99   323  2010-02-12  20:12:23+00:00
2  LABEL1   23   223  2010-04-25  20:12:23+00:00
3  LABEL2   23    23  2010-01-21  20:12:23+00:00
4  LABEL1  234   123  2010-12-26  20:12:23+00:00
5  LABEL1   93   133  2010-02-23  20:12:23+00:00
6  LABEL4   19  1223  2010-07-24  20:12:23+00:00

df.columns = ['label','x','y','z','w']
df.set_index('label').T.to_dict('dict')
Result:
{'LABEL1': {'x': 93, 'y': 133, 'z': '2010-02-23', 'w': '20:12:23+00:00'},
 'LABEL2': {'x': 23, 'y': 23, 'z': '2010-01-21', 'w': '20:12:23+00:00'},
 'LABEL4': {'x': 19, 'y': 1223, 'z': '2010-07-24', 'w': '20:12:23+00:00'}}
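Edit: then group by label and aggregate with sum and max. Note that the frame shown above lacks the question's first row (LABEL1, 99, 113), so the LABEL1 sums come out as 478 and 945 instead of 577 and 1058.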
df.groupby(["label"])\
    .agg({"x": "sum", "y": "sum", "z": "max", "w": "max"}).T.to_dict('dict')
Result:
{'LABEL1': {'x': 478, 'y': 945, 'z': '2010-12-26', 'w': '20:12:23+00:00'},
 'LABEL2': {'x': 23, 'y': 23, 'z': '2010-01-21', 'w': '20:12:23+00:00'},
 'LABEL4': {'x': 19, 'y': 1223, 'z': '2010-07-24', 'w': '20:12:23+00:00'}}
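
Because the date and the time ended up in separate columns here, taking `max` of `z` and `w` independently could pair a date from one row with a time from another. A hedged sketch of one way around that: join the two columns into a single timestamp first, parse it with `pd.to_datetime`, and aggregate that instead. The sample rows (with deliberately different times) and the column name `last` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["LABEL1", 29, 143, "2010-05-06", "08:00:00+00:00"],
        ["LABEL1", 99, 323, "2010-02-12", "20:12:23+00:00"],
        ["LABEL2", 23, 23, "2010-01-21", "20:12:23+00:00"],
    ],
    columns=["label", "x", "y", "z", "w"],
)

# Rebuild one timezone-aware timestamp per row before aggregating.
df["last"] = pd.to_datetime(df["z"] + " " + df["w"], utc=True)

agg = df.groupby("label").agg({"x": "sum", "y": "sum", "last": "max"})

# Turn the timestamps back into strings so the dict holds plain values.
agg["last"] = agg["last"].map(lambda ts: ts.isoformat(sep=" "))

out = agg.to_dict("index")
print(out)
```

With these rows, LABEL1 keeps the 08:00:00 time that belongs to its latest date, whereas separate maxima on `z` and `w` would have combined 2010-05-06 with 20:12:23.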

ppcbkaq5  #3

data = '''
col1  COLA  COLB  LAST
LABEL1              99       113           2010-04-26 20:12:23+00:00
LABEL1              29       143           2010-05-06 20:12:23+00:00
LABEL1              99       323           2010-02-12 20:12:23+00:00
LABEL1              23       223           2010-04-25 20:12:23+00:00
LABEL2              23        23           2010-01-21 20:12:23+00:00
LABEL1             234       123           2010-12-26 20:12:23+00:00
LABEL1              93       133           2010-02-23 20:12:23+00:00
LABEL4              19      1223           2010-07-24 20:12:23+00:00
'''

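Code:

import pandas as pd

# Build df1 from the text above; split at most 3 times so LAST keeps its inner space
rows = [line.split(None, 3) for line in data.strip().splitlines()]
df1 = pd.DataFrame(rows[1:], columns=rows[0]).astype({'COLA': int, 'COLB': int})

# 'max' (not 'last') so LAST is the latest timestamp rather than the last row seen
df1.groupby('col1').agg({'COLA': 'sum', 'COLB': 'sum', 'LAST': 'max'}).groupby(level=0)\
    .apply(lambda dd: dd.to_dict("index")).tolist()

Output:

[{'LABEL1': {'COLA': 577, 'COLB': 1058, 'LAST': '2010-12-26 20:12:23+00:00'}},
 {'LABEL2': {'COLA': 23, 'COLB': 23, 'LAST': '2010-01-21 20:12:23+00:00'}},
 {'LABEL4': {'COLA': 19, 'COLB': 1223, 'LAST': '2010-07-24 20:12:23+00:00'}}]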
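
For the chunked reading mentioned in the question, one option is to aggregate each chunk on its own and then aggregate the partial results again, since a sum of sums is the overall sum and a max of maxes is the overall max. A sketch with two hand-made chunks, using the question's `LABELS` column name (the names `SPEC`, `aggregate`, `chunks` and `combined` are illustrative assumptions):

```python
import pandas as pd

SPEC = {"COLA": "sum", "COLB": "sum", "LAST": "max"}


def aggregate(frame):
    """Group by label; as_index=False keeps LABELS as a regular column."""
    return frame.groupby("LABELS", as_index=False).agg(SPEC)


chunks = [
    [
        {"LABELS": "LABEL1", "COLA": 99, "COLB": 113, "LAST": "2010-04-26 20:12:23+00:00"},
        {"LABELS": "LABEL4", "COLA": 19, "COLB": 1223, "LAST": "2010-07-24 20:12:23+00:00"},
    ],
    [
        {"LABELS": "LABEL1", "COLA": 29, "COLB": 143, "LAST": "2010-05-06 20:12:23+00:00"},
    ],
]

# First pass: one small aggregated frame per chunk.
partials = [aggregate(pd.DataFrame(chunk)) for chunk in chunks]

# Second pass: re-aggregate the stacked partial results.
combined = aggregate(pd.concat(partials, ignore_index=True)).set_index("LABELS")

result = [{label: values} for label, values in combined.to_dict("index").items()]
print(result)
```

This only works for aggregations that can be re-applied to their own output, such as sum, min and max. A mean or a count would need its partial sums and counts carried along separately.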
