pandas Python:如何分解两列并设置前缀

iszxjhcz  于 2023-01-19  发布在  Python
关注(0)|答案(2)|浏览(129)

我有一个DataFrame,如下所示:

time         asks                                                                    bids
2022-01-01   [{'price':'0.99', 'size':'32213'}, {'price':'0.98', 'size':'23483'}]    [{'price':'1.0', 'size':'23142'}, {'price':'0.99', 'size':'6436'}]
2022-01-02   [{'price':'0.99', 'size':'33434'}, {'price':'0.98', 'size':'33333'}]    [{'price':'1.0', 'size':'343'}, {'price':'0.99', 'size':'2342'}]

...
2022-01-21   [{'price':'0.98', 'size':'32333'}, {'price':'0.98', 'size':'23663'}]    [{'price':'1.0', 'size':'23412'}]

我想分解要价和出价列,并设置前缀,以获得以下内容:

time        asks_price  asks_size   bids_price   bids_size
2022-01-01        0.99      32213          1.0       23142
2022-01-01        0.98      23483         0.99        6436
2022-01-02        0.99      33434          1.0         343
2022-01-02        0.98      33333         0.99        2342
...
2022-01-21        0.98      32333          1.0       23412
2022-01-21        0.98      23663          NaN         NaN

注意最后一行的bids_price和bids_size列下的值为NaN,因为没有对应的值。
如何利用Pandas实现这一目标?
编辑:以下是数据的一个片段:

{'time': {0: '2022-05-07T00:00:00.000000000Z',
  1: '2022-05-07T01:00:00.000000000Z',
  2: '2022-05-07T02:00:00.000000000Z',
  3: '2022-05-07T03:00:00.000000000Z',
  4: '2022-05-07T04:00:00.000000000Z'},
 'asks': {0: [{'price': '0.9999', 'size': '4220492'},
   {'price': '1', 'size': '2556759'},
   {'price': '1.0001', 'size': '941039'},
   {'price': '1.0002', 'size': '458602'},
   {'price': '1.0003', 'size': '257955'},
zengzsys

zengzsys1#

试试这样的方法:

df1 = df.explode(['asks', 'bids'])

df_asks = pd.json_normalize(df1['asks']).rename(columns={'price': 'asks_price', 'size':'asks_size'})
df_bids = pd.json_normalize(df1['bids']).rename(columns={'price': 'bids_price', 'size':'bids_size'})

df_final = df_asks.join(df_bids)

df_final输出:

|    |   asks_price |   asks_size |   bids_price |   bids_size |
|---:|-------------:|------------:|-------------:|------------:|
|  0 |         0.99 |       32213 |         1    |       23142 |
|  1 |         0.98 |       23483 |         0.99 |        6436 |
|  2 |         0.99 |       33434 |         1    |         343 |
|  3 |         0.98 |       33333 |         0.99 |        2342 |

设置:

df = pd.DataFrame({"time":[1,2], 
                   "asks":[
                       [{'price':'0.99', 'size':'32213'},
                        {'price':'0.98', 'size':'23483'}],
                       [{'price':'0.99', 'size':'33434'},
                        {'price':'0.98', 'size':'33333'}]], 
                   
                   "bids":[[{'price':'1.0', 'size':'23142'}, 
                            {'price':'0.99', 'size':'6436'}],    
                           [{'price':'1.0', 'size':'343'},
                            {'price':'0.99', 'size':'2342'}] 
                          ]})
k97glaaz

k97glaaz2#

尝试使用.explode()

df = df.explode(['asks','bids'])

df['asks_price'] = [dict(i)['price'] for i in df['asks']]
df['asks_size'] = [dict(i)['size'] for i in df['asks']]
df['bids_price'] = [dict(i)['price'] for i in df['bids']]
df['bids_size'] = [dict(i)['size'] for i in df['bids']]

相关问题