pandas: random sampling of multi-level columns

ohtdti5x  posted on 2023-01-28 in Other

I have a DataFrame with multi-level columns, as shown below:

df

Solid             Liquid                Gas
pen paper pipe    water juice milk      oxygen nitrogen helium
5   2     1       4     3     1         7      8        10
5   2     1       4     3     1         7      8        10
5   2     1       4     3     1         7      8        10
4   4     7       3     2     0         6      7        9
3   7     9       4     6     5         3      3        4

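What I want is to randomly select 2 of the top-level columns "Solid", "Liquid", and "Gas", each together with its 3 sub-columns.
For example, if Solid and Gas are randomly selected, the expected result would be: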

Solid             Gas
pen paper pipe    oxygen nitrogen helium
5   2     1       7      8        10
5   2     1       7      8        10
5   2     1       7      8        10
4   4     7       6      7        9
3   7     9       3      3        4

I tried this code, but it does not give me the result I want.

result = df.sample(n=5, axis=1)
result

[output]

Solid    Gas
pipe     oxygen
1        7
1        7
1        7
1        7
7        6
9        3

Could anyone help me figure this out? Thanks :)

pcrecxhr1#

You can sample the labels of the first column level and then select the sampled columns:

df[pd.Series(df.columns.levels[0]).sample(2)]
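For context, `df.sample(axis=1)` draws from the individual leaf columns, which is why the attempt in the question returns single sub-columns instead of whole groups. The following is a minimal sketch of the contrast, assuming the question's frame is rebuilt with `pd.MultiIndex.from_tuples`; the construction and the `random_state` values are assumptions added for illustration, not part of the original post.

```python
import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
columns = pd.MultiIndex.from_tuples([
    ("Solid", "pen"), ("Solid", "paper"), ("Solid", "pipe"),
    ("Liquid", "water"), ("Liquid", "juice"), ("Liquid", "milk"),
    ("Gas", "oxygen"), ("Gas", "nitrogen"), ("Gas", "helium"),
])
data = [
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(data, columns=columns)

# Sampling along axis=1 picks individual leaf columns, not whole groups.
leaf_sample = df.sample(n=2, axis=1, random_state=0)

# Sampling the first-level labels instead; random_state makes it repeatable.
top_labels = pd.Series(df.columns.levels[0]).sample(2, random_state=0)

# Selecting with a list of first-level labels keeps every sub-column under them.
result = df[top_labels.tolist()]

print(leaf_sample.shape)  # two single sub-columns
print(result.shape)       # two groups of three sub-columns each
```

Selecting by first-level label returns all sub-columns of each chosen group, so two sampled labels yield six columns here.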

Or use the random.sample function:

import random
df[random.sample(df.columns.levels[0].tolist(),2)]
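One caveat: `df.columns.levels[0]` is stored sorted and can still contain labels that no longer occur after the frame has been filtered. A hedged variant is sketched below. It samples only from labels that are actually present and keeps the columns in their original left-to-right order. The helper name `sample_top_level`, its `seed` parameter, and the toy frame are hypothetical and exist only for this illustration.

```python
import random

import numpy as np
import pandas as pd


def sample_top_level(frame, k, seed=None):
    """Return `frame` restricted to k randomly chosen first-level column labels.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # Labels that actually occur, in their original left-to-right order.
    present = list(frame.columns.get_level_values(0).unique())
    rng = random.Random(seed)
    chosen = rng.sample(present, k)
    # A boolean mask over the leaf columns preserves the original column order.
    mask = frame.columns.get_level_values(0).isin(chosen)
    return frame.loc[:, mask]


# Toy frame with the same three groups (sub-column names are placeholders).
columns = pd.MultiIndex.from_product([["Solid", "Liquid", "Gas"], ["a", "b", "c"]])
df = pd.DataFrame(np.arange(27).reshape(3, 9), columns=columns)

picked = sample_top_level(df, 2, seed=42)
print(picked.columns.get_level_values(0).unique().tolist())
```

The boolean mask is a design choice: unlike label-based selection, it never reorders the groups, which may matter when the column order carries meaning.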

wlzqhblo2#

import numpy as np
import pandas as pd


arrays = [np.array(['Liquid', 'Liquid','Liquid', 'Solid', 'Solid','Solid', 'Gas', 'Gas', 'Gas']),
          np.array(['water', 'nitrogen', 'juice', 'pen', 'paper', 'nitrogen', 'oxygen', 'helium','nitrogen'])]

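# Random data: values differ from run to run (the sample outputs below do not share the same numbers).
df = pd.DataFrame(np.random.randn(3, 9), columns=arrays)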
print(df.to_string())

"""
     Liquid                         Solid                           Gas                    
      water  nitrogen     juice       pen     paper  nitrogen    oxygen    helium  nitrogen
0  0.778774  0.243654  0.823253 -0.608256 -0.415255  1.472267  1.474572 -0.002190  0.712878
1 -0.648450 -0.801950 -2.100596 -0.627754 -0.060161 -0.691433  1.170950  0.023768 -0.613677
2  0.901922  0.069219  1.919909 -1.460708 -0.216709 -1.922276  1.045664  0.528569  0.779230
"""


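# First-level labels and, for each one, the second-level labels to keep
l0 = ['Liquid','Solid','Gas']
l1 = [['water','juice'],['pen'],['helium','nitrogen']]

# One small frame of (first level, second level) pairs per first-level label
aa = [pd.DataFrame({'a': a,'b':b}) for a,b in zip(l0,l1)]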
print(aa)

"""

[        a      b
0  Liquid  water
1  Liquid  juice,        a    b
0  Solid  pen,      a         b
0  Gas    helium
1  Gas  nitrogen]
"""
bb = pd.concat(aa)
print(bb)

"""

        a         b
0  Liquid     water
1  Liquid     juice
0   Solid       pen
0     Gas    helium
1     Gas  nitrogen
"""
cc = pd.concat(aa).values
print(cc)


"""
[['Liquid' 'water']
 ['Liquid' 'juice']
 ['Solid' 'pen']
 ['Gas' 'helium']
 ['Gas' 'nitrogen']]
"""
# Each row of cc acts as one (first level, second level) column key
dd = df[cc]
print(dd)

"""

     Liquid               Solid       Gas          
      water     juice       pen    helium  nitrogen
0 -1.484977 -1.202752  0.048415 -0.054465 -0.355568
1  0.906612  1.355189  1.653327  1.184810 -0.934969
2  0.091918 -0.737838  0.610323 -2.164317 -1.529826
"""

"""
Similarly, to keep only 2 of the top-level columns,
select the two items from Liquid and the two from Gas:
"""
l2 = ['Liquid','Gas']
l3 = [['water','juice'],['helium','nitrogen']]

p = pd.concat([pd.DataFrame({'a': a, 'b': b}) for a, b in zip(l2, l3)]).values
print(p)
p1 = df[p]
print(p1)

"""
   Liquid                 Gas          
      water     juice    helium  nitrogen
0 -1.484977 -1.202752 -0.054465 -0.355568
1  0.906612  1.355189  1.184810 -0.934969
2  0.091918 -0.737838 -2.164317 -1.529826
"""

"""
To keep only the nitrogen sub-columns across all top-level groups:
"""
aa = df.iloc[:, df.columns.get_level_values(1) == 'nitrogen']
print(aa)

"""
     Liquid     Solid       Gas
   nitrogen  nitrogen  nitrogen
0  0.369143  1.762105 -0.887656
1  2.035025  0.317349 -0.896609
2 -1.570745  0.208936  0.979549
"""
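The explicit selection above can also be written without the intermediate DataFrames by passing a list of `(first level, second level)` tuples. Here is a minimal sketch, assuming the column layout from this answer. The `wanted` mapping, the seeded generator, and the use of `DataFrame.xs` for the nitrogen case are illustrative choices, not the answerer's code.

```python
import numpy as np
import pandas as pd

# Same two-level column layout as in the answer above.
columns = pd.MultiIndex.from_arrays([
    ["Liquid"] * 3 + ["Solid"] * 3 + ["Gas"] * 3,
    ["water", "nitrogen", "juice", "pen", "paper", "nitrogen",
     "oxygen", "helium", "nitrogen"],
])
rng = np.random.default_rng(0)  # seeded so repeated runs print the same values
df = pd.DataFrame(rng.standard_normal((3, 9)), columns=columns)

# Hypothetical mapping: first-level label -> second-level labels to keep.
wanted = {"Liquid": ["water", "juice"], "Gas": ["helium", "nitrogen"]}

# Flatten the mapping into full column keys and select them in one step.
pairs = [(top, sub) for top, subs in wanted.items() for sub in subs]
selected = df[pairs]

# Cross-section on the second level; drop_level=False keeps both header rows.
nitrogen = df.xs("nitrogen", axis=1, level=1, drop_level=False)

print(selected.columns.tolist())
print(nitrogen.columns.tolist())
```

A list of tuples states each column key explicitly, so the selection does not depend on how a 2-D array is interpreted by the indexer.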
