Transforming a Pandas DataFrame, somewhat like a melt

wlp8pajw  posted on 2023-03-16 in Other

I have this DataFrame:

import pandas as pd

df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

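which looks like this:

| day | category | min_feature1 | max_feature1 | min_feature2 | max_feature2 |
| --- | --- | --- | --- | --- | --- |
| 1 | a | 1 | 8 | 2 | 6 |
| 1 | b | 2 | 9 | 3 | 9 |
| 2 | a | 3 | 10 | 4 | 12 |
| 2 | b | 4 | 11 | 5 | 13 |

I want to transform this data so that it looks like this: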

pd.DataFrame([[1, 'a', 'feature1', 1, 8],
              [1, 'a', 'feature2', 2, 6],
              [1, 'b', 'feature1', 2, 9],
              [1, 'b', 'feature2', 3, 9],
              [2, 'a', 'feature1', 3, 10],
              [2, 'a', 'feature2', 4, 12],
              [2, 'b', 'feature1', 4, 11],
              [2, 'b', 'feature2', 5, 13]],
             columns=['day', 'category', 'feature', 'min', 'max'])

| day | category | feature | min | max |
| --- | --- | --- | --- | --- |
| 1 | a | feature1 | 1 | 8 |
| 1 | a | feature2 | 2 | 6 |
| 1 | b | feature1 | 2 | 9 |
| 1 | b | feature2 | 3 | 9 |
| 2 | a | feature1 | 3 | 10 |
| 2 | a | feature2 | 4 | 12 |
| 2 | b | feature1 | 4 | 11 |
| 2 | b | feature2 | 5 | 13 |

How can I do this?

soat7uwm #1

One option is a custom reshape through a MultiIndex: first str.split the column names, then stack:

(df.set_index(['day', 'category'])
   .pipe(lambda d: d.set_axis(d.columns.str.split('_', n=1, expand=True), axis=1))
   .rename_axis(columns=(None, 'features'))
   .stack().reset_index()
)

Or use pivot_longer from pyjanitor:

# pip install pyjanitor
import janitor

out = df.pivot_longer(['day', 'category'], sort_by_appearance=True,
                      names_sep='_', names_to=('.value', 'feature'))

Output:

   day category  features  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
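
The output shown above lists max before min and names the new column features, whereas the question asks for day, category, feature, min, max. Here is a minimal sketch of the same split-then-stack reshape with an explicit sort and column selection at the end. The names wide and tidy are only illustrative, and the final sort is there because I would rather not rely on whatever row and column order stack happens to produce.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

# Keep the identifier columns out of the way, then split 'min_feature1'
# into a two-level column index: ('min', 'feature1').
wide = df.set_index(['day', 'category'])
wide.columns = wide.columns.str.split('_', n=1, expand=True)

tidy = (wide.rename_axis(columns=[None, 'feature'])  # name the level to be stacked
            .stack()                                  # move 'feature' into the rows
            .reset_index()
            .sort_values(['day', 'category', 'feature'], ignore_index=True)
            [['day', 'category', 'feature', 'min', 'max']])  # order asked for

print(tidy)
```

If the exact column order does not matter, the last two steps can be dropped.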

c2e8gylq #2

Use str.split to build a MultiIndex on the columns, then reshape with DataFrame.stack:

df1 = df.set_index(['day', 'category'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.rename_axis(columns=(None, 'feature')).stack().reset_index()
print(df1)
   day category   feature  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5

Another idea, using wide_to_long:

df.columns = df.columns.str.replace(r'(\w+)_\s*(\w+)', r'\2_\1', regex=True)
df = (pd.wide_to_long(df, 
                     stubnames=['feature1','feature2'],
                     i=['day','category'], 
                     j='tmp',
                     sep='_', 
                     suffix=r'\w+').rename_axis(columns='feature')
       .stack()
       .unstack(2)
       .reset_index()
       .rename_axis(columns=None))
print(df)
   day category   feature  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
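
Note that the wide_to_long snippet above overwrites df.columns and rebinds df, so the other answers need the original frame re-created before they are run. As a possible variant (my own sketch, not the answerer's method), min and max can be used as the stub names directly, which avoids the regex rename and the stack/unstack round trip. The name tidy is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

# Treat 'min' and 'max' as the stubs; whatever follows the '_' separator
# ('feature1', 'feature2') becomes the value of the new 'feature' level.
tidy = (pd.wide_to_long(df,
                        stubnames=['min', 'max'],
                        i=['day', 'category'],
                        j='feature',
                        sep='_',
                        suffix=r'\w+')   # the default suffix only matches digits
          .reset_index()
          .sort_values(['day', 'category', 'feature'], ignore_index=True)
          [['day', 'category', 'feature', 'min', 'max']])

print(tidy)
```

This leaves df itself untouched, so it can be run alongside the other approaches.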

i86rm4rw #3

You can also use melt as an alternative:

out = (df.rename(columns=lambda x: tuple(m) if len(m := x.split('_')) > 1 else x)
         .melt(['day', 'category'])
         .assign(var1=lambda x: x['variable'].str[1], var2=lambda x: x['variable'].str[0])
         .pivot(index=['day', 'category', 'var1'], columns='var2', values='value')
         .rename_axis(columns=None).reset_index())

Output:

>>> out
   day category      var1  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5

To better understand the transformation, go through it step by step:

# Step 1: rename your columns
>>> out = df.rename(columns=lambda x: tuple(m) if len(m := x.split('_')) > 1 else x)
   day category  (min, feature1)  (max, feature1)  (min, feature2)  (max, feature2)
0    1        a                1                8                2                6
1    1        b                2                9                3                9
2    2        a                3               10                4               12
3    2        b                4               11                5               13

# Step 2: flatten your dataframe
>>> out = out.melt(['day', 'category'])
    day category         variable  value
0     1        a  (min, feature1)      1
1     1        b  (min, feature1)      2
2     2        a  (min, feature1)      3
3     2        b  (min, feature1)      4
4     1        a  (max, feature1)      8
5     1        b  (max, feature1)      9
...

# Step 3: expand variable column in two new variables
>>> out = out.assign(var1=lambda x: x['variable'].str[1], var2=lambda x: x['variable'].str[0])
    day category         variable  value      var1 var2
0     1        a  (min, feature1)      1  feature1  min
1     1        b  (min, feature1)      2  feature1  min
2     2        a  (min, feature1)      3  feature1  min
3     2        b  (min, feature1)      4  feature1  min
4     1        a  (max, feature1)      8  feature1  max
5     1        b  (max, feature1)      9  feature1  max
...

# Step 4: reshape your dataframe
>>> out = out.pivot(index=['day', 'category', 'var1'], columns='var2', values='value')
var2                   max  min
day category var1              
1   a        feature1    8    1
             feature2    6    2
    b        feature1    9    2
             feature2    9    3
2   a        feature1   10    3
             feature2   12    4
    b        feature1   11    4
             feature2   13    5

# Step 5: final output
>>> out = out.rename_axis(columns=None).reset_index()
   day category      var1  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
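
The steps above could probably be shortened by melting first and splitting the melted names with str.split(expand=True), instead of renaming the columns to tuples. A rough sketch under that assumption follows. The column names name, stat and feature, and the variables melted and tidy, are my own choices rather than anything from the answer.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

# Step 1: one row per (day, category, original column name).
melted = df.melt(id_vars=['day', 'category'], var_name='name')

# Step 2: split 'min_feature1' into 'min' and 'feature1'.
melted[['stat', 'feature']] = melted['name'].str.split('_', n=1, expand=True)

# Step 3: spread 'stat' back out into separate min / max columns.
tidy = (melted.pivot(index=['day', 'category', 'feature'],
                     columns='stat', values='value')
              .rename_axis(columns=None)
              .reset_index()
              [['day', 'category', 'feature', 'min', 'max']])

print(tidy)
```

As in the original answer, pivot needs each (day, category, feature, stat) combination to be unique; with duplicated keys it raises an error and something like pivot_table would be needed instead.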
