Python Pandas将 Dataframe 转换为整洁格式

clj7thdc  于 2022-12-10  发布在  Python
关注(0)|答案(3)|浏览(158)
dt = {'ID': [1, 1, 1, 1, 2, 2, 2, 2],   
            'Test': [‘Math’, 'Math', 'Writing', 'Writing', ‘Math’, 'Math', 'Writing', 'Writing', ‘Math’]
            'Year': ['2008', '2009', '2008', '2009', '2008', ‘2009’, ‘2008’, ‘2009’],
            'Fall': [15, 12, 22, 10, 12, 16, 13, 23]
            ‘Spring’: [16, 13, 22, 14, 13, 14, 11, 20]
            ‘Winter’: [19, 27, 24, 20, 25, 21, 29, 26]}
mydt = pd.DataFrame(dt, columns = ['ID', ‘Test’, 'Year', 'Fall', ‘Spring’, ‘Winter’])

所以我有上面的数据集。我怎样才能把上面的数据集转换成下面的样子呢?请告诉我。

iezvtpos

iezvtpos1#

下面是另一个解决方案:

import pandas as pd

data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2],
      'Test': ['Math', 'Math', 'Writing', 'Writing', 'Math', 'Math', 'Writing', 'Writing'],
      'Year': ['2008', '2009', '2008', '2009', '2008', '2009', '2008', '2009'],
      'Fall': [15, 12, 22, 10, 12, 16, 13, 23],
      'Spring': [16, 13, 22, 14, 13, 14, 11, 20],
      'Winter': [19, 27, 24, 20, 25, 21, 29, 26]}
df_data = pd.DataFrame(data, columns=['ID', 'Test', 'Year', 'Fall', 'Spring', 'Winter'])

df = df_data.melt(id_vars=['ID', 'Year', 'Test'], var_name='Quarter', value_name='Score')
df = df.pivot(index=['ID', 'Year', 'Quarter'], columns=['Test'], values=['Score'])
df.columns = df.columns.droplevel(level=0)
df = df.add_suffix('_Score').reset_index(drop=False)
carvr3hs

carvr3hs2#

您可以尝试使用set_indexstack + unstack

out = (df.set_index(['ID','Test','Year']).
            stack().unstack(level=1).
            add_suffix('_Score').reset_index())
out
Out[271]: 
Test  ID  Year level_2  Math_Score  Writing_Score
0      1  2008    Fall          15             22
1      1  2008  Spring          16             22
2      1  2008  Winter          19             24
3      1  2009    Fall          12             10
4      1  2009  Spring          13             14
5      1  2009  Winter          27             20
6      2  2008    Fall          12             13
7      2  2008  Spring          13             11
8      2  2008  Winter          25             29
9      2  2009    Fall          16             23
10     2  2009  Spring          14             20
11     2  2009  Winter          21             26
1dkrff03

1dkrff033#

这需要使用tidypandas进行两次旋转操作:

import pandas as pd
from tidypandas.tidy_accessor import tp

data = {'ID': [1, 1, 1, 1, 2, 2, 2, 2],
      'Test': ['Math', 'Math', 'Writing', 'Writing', 'Math', 'Math', 'Writing', 'Writing'],
      'Year': ['2008', '2009', '2008', '2009', '2008', '2009', '2008', '2009'],
      'Fall': [15, 12, 22, 10, 12, 16, 13, 23],
      'Spring': [16, 13, 22, 14, 13, 14, 11, 20],
      'Winter': [19, 27, 24, 20, 25, 21, 29, 26]}
df_data = pd.DataFrame(data, columns=['ID', 'Test', 'Year', 'Fall', 'Spring', 'Winter'])

>>> (df_data.tp.pivot_longer(cols = ['Fall', 'Spring', 'Winter'],
...                          names_to='quarter'
...                          )
...         .tp.pivot_wider(id_cols = ['ID', 'quarter', 'Year'],
...                         names_from = 'Test',
...                         values_from = 'value',
...                         names_prefix = 'score_'
...                         )
...         )
    Year quarter  ID  score_Math  score_Writing
0   2008    Fall   1          15             22
1   2009    Fall   1          12             10
2   2008  Spring   1          16             22
3   2009  Spring   1          13             14
4   2008  Winter   1          19             24
5   2009  Winter   1          27             20
6   2008    Fall   2          12             13
7   2009    Fall   2          16             23
8   2008  Spring   2          13             11
9   2009  Spring   2          14             20
10  2008  Winter   2          25             29
11  2009  Winter   2          21             26

相关问题