Pandas DataFrame with a MultiIndex from a numpy array

64jmpszr posted on 2023-02-02 in Other

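I am working with a numpy array called array_test with shape (5, 359, 2), as checked with array_test.shape. It holds the mean and uncertainty from an experiment that was repeated 5 times.
The goal is to estimate, for each observation, the mean over the 5 replicates, and likewise the total uncertainty for each observation, also as an average over the 5 replicates.
I need to create a Pandas DataFrame from it, I believe with a MultiIndex, where the first level has 5 values taken from the first dimension (simply named '1', '2', etc.) and the second level is 'mean' and 'uncertainty'.
Suggestions are very welcome!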

Answer #1 (wqlqzqxt)

IIUC, you may want to aggregate in numpy, then construct a DataFrame and stack it:

import numpy as np
import pandas as pd

a = np.random.random((5, 359, 2))

out = pd.DataFrame(a.mean(1), index=range(1, a.shape[0]+1),
                   columns=['mean', 'uncertainty']).stack()

This yields a Series. For a DataFrame:

out = pd.DataFrame(a.mean(1), index=range(1, a.shape[0]+1),
                   columns=['mean', 'uncertainty']).stack().to_frame('value')

Output:

                  value
1 mean         0.499102
  uncertainty  0.511757
2 mean         0.480295
  uncertainty  0.473132
3 mean         0.500507
  uncertainty  0.519352
4 mean         0.505443
  uncertainty  0.493672
5 mean         0.514302
  uncertainty  0.519299
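
Note that `a.mean(1)` above averages over the 359 observations within each replicate. If the aim is instead one row per observation, as the question describes, the following is a minimal sketch under that assumption. The random array stands in for `array_test`, and the level names `replicate` and `quantity` are illustrative choices, not from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.random((5, 359, 2))  # stand-in for array_test
n_rep, n_obs, n_val = a.shape

# Put observations on the rows: (5, 359, 2) -> (359, 5, 2) -> (359, 10)
wide = a.transpose(1, 0, 2).reshape(n_obs, n_rep * n_val)

# Two-level column index: replicate '1'..'5', then 'mean' / 'uncertainty'
columns = pd.MultiIndex.from_product(
    [[str(i) for i in range(1, n_rep + 1)], ["mean", "uncertainty"]],
    names=["replicate", "quantity"],  # hypothetical level names
)
df_wide = pd.DataFrame(wide, columns=columns)

# Per-observation averages over the 5 replicates (axis 0 of the array)
per_obs = pd.DataFrame(a.mean(axis=0), columns=["mean", "uncertainty"])
```

Here `df_wide[("3", "uncertainty")]` selects the uncertainty column of the third replicate, and `per_obs` has one row per observation.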

Answer #2 (m1m5dgzv)

I would handle this with a plain DataFrame, but add columns for the observation and experiment numbers.

import numpy as np
import pandas as pd

a = np.random.rand(5, 10, 2)

# Get the shape
n_experiments, n_observations, n_values = a.shape

# Reshape array into a 2-dimensional array
# (stacking experiments on top of each other)
a = a.reshape(-1, n_values)

# Create Dataframe and add experiment and observation number
df = pd.DataFrame(a, columns=["mean", "uncertainty"])

# np.repeat repeats each experiment index n_observations times:
# [0, 0, ..., 0, 1, 1, ..., 1, ..., 4, 4, ..., 4]
experiment = np.repeat(range(n_experiments), n_observations)
df["experiment"] = experiment
# np.tile cycles through the observation indices once per experiment:
# [0, 1, ..., 9, 0, 1, ..., 9, ...]
observation = np.tile(range(n_observations), n_experiments)
df["observation"] = observation

The DataFrame now looks like this:

print(df.head(15))

        mean  uncertainty  experiment  observation
0   0.741436     0.775086           0            0
1   0.401934     0.277716           0            1
2   0.148269     0.406040           0            2
3   0.852485     0.702986           0            3
4   0.240930     0.644746           0            4
5   0.309648     0.914761           0            5
6   0.479186     0.495845           0            6
7   0.154647     0.422658           0            7
8   0.381012     0.756473           0            8
9   0.939797     0.764821           0            9
10  0.994342     0.019140           1            0
11  0.300225     0.992146           1            1
12  0.265698     0.823469           1            2
13  0.791907     0.555051           1            3
14  0.503281     0.249237           1            4
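
If a MultiIndex is preferred over plain columns, the same long layout can be sketched with `pd.MultiIndex.from_product` in place of `np.repeat` and `np.tile`. This is only one possible variant; the name `df_mi` is an illustrative choice, and the random array again stands in for the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.random((5, 10, 2))
n_experiments, n_observations, n_values = a.shape

# Row index with one (experiment, observation) pair per row, in the same
# order as a.reshape(-1, n_values): experiment varies slowest
index = pd.MultiIndex.from_product(
    [range(n_experiments), range(n_observations)],
    names=["experiment", "observation"],
)
df_mi = pd.DataFrame(
    a.reshape(-1, n_values), index=index, columns=["mean", "uncertainty"]
)

# All observations of experiment 1
exp1 = df_mi.loc[1]

# Average over experiments for each observation
per_obs = df_mi.groupby(level="observation").mean()
```

Calling `df_mi.reset_index()` turns the two index levels back into ordinary `experiment` and `observation` columns.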

Now you can analyze the DataFrame (using groupby and mean):

# Only the mean 
print(df[['observation', 'mean', 'uncertainty']].groupby(['observation']).mean())

                 mean  uncertainty
observation                       
0            0.699324     0.506369
1            0.382288     0.456324
2            0.333396     0.324469
3            0.690545     0.564583
4            0.365198     0.555231
5            0.453545     0.596149
6            0.526988     0.395162
7            0.565689     0.569904
8            0.425595     0.415944
9            0.731776     0.375612

Or use the more advanced aggregate function, which may be useful for your use case:

# Use aggregate function to calculate not only mean, but min and max as well
print(df[['observation', 'mean', 'uncertainty']].groupby(['observation']).aggregate(['mean', 'min', 'max']))


                 mean                     uncertainty                    
                 mean       min       max        mean       min       max
observation                                                              
0            0.699324  0.297030  0.994342    0.506369  0.019140  0.974842
1            0.382288  0.063046  0.810411    0.456324  0.108774  0.992146
2            0.333396  0.148269  0.698921    0.324469  0.009539  0.823469
3            0.690545  0.175471  0.895190    0.564583  0.260557  0.721265
4            0.365198  0.015501  0.726352    0.555231  0.249237  0.929258
5            0.453545  0.111355  0.807582    0.596149  0.101421  0.914761
6            0.526988  0.323945  0.786167    0.395162  0.007105  0.691998
7            0.565689  0.154647  0.813336    0.569904  0.302157  0.964782
8            0.425595  0.116968  0.567544    0.415944  0.014439  0.756473
9            0.731776  0.411324  0.939797    0.375612  0.085988  0.764821
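
The aggregated result has two-level columns, so individual statistics are selected with a tuple. The sketch below shows that, plus pandas named aggregation as a way to get flat column names instead. The output names `mean_avg` and `uncertainty_avg` are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.random((5, 10, 2))
n_experiments, n_observations, n_values = a.shape

df = pd.DataFrame(a.reshape(-1, n_values), columns=["mean", "uncertainty"])
df["observation"] = np.tile(np.arange(n_observations), n_experiments)

summary = df.groupby("observation").aggregate(["mean", "min", "max"])

# Columns are a MultiIndex: pick a single statistic with a tuple ...
max_of_means = summary[("mean", "max")]
# ... or every statistic of one quantity with the top-level label
uncertainty_stats = summary["uncertainty"]

# Named aggregation gives flat, self-describing column names
flat = df.groupby("observation").agg(
    mean_avg=("mean", "mean"),                # hypothetical output name
    uncertainty_avg=("uncertainty", "mean"),  # hypothetical output name
)
```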
