pandas: Reshaping World Bank data in Python

64jmpszr  posted on 2023-02-14 in Python

I have the following data (shown below as the output of df.to_dict()):
{'Country Name': {0: 'China', 1: 'China', 2: 'China'}, 'Country Code': {0: 'CHN', 1: 'CHN', 2: 'CHN'}, 'Series Name': {0: 'Consumer price index (2010 = 100)', 1: 'Age dependency ratio (% of working-age population)', 2: 'Age dependency ratio, old (% of working-age population)'}, '1972 [YR1972]': {0: '..', 1: '79.4811770762984', 2: '6.7804365054766'}, '1973 [YR1973]': {0: '..', 1: '78.8312385191076', 2: '6.83482919991518'}}
This is the output I want, with the stub names removed from the year columns and the data reshaped to long format:

desired_output = pd.DataFrame({'Country Code': ['CHN', 'CHN'], 'Country Name' : ['China', 'China'],
                           'Year': ['1972', '1973'],
                           'Age dependency ratio' : ['79.4811770762984', '78.8312385191076' ],
                          'Age dependency ratio, old' : ['6.7804365054766', '6.83482919991518']})

I tried:

df.pivot(index = ['Country Name', 'Country Code'], columns = 'Series Name', values = ['1972 [YR1972]', '1973 [YR1973]'])

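It runs, but the result is still not in the format I want.
As for removing the suffix from the year columns, I really can't figure it out, because it changes with every year.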

xt0899hw  #1

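If your original DataFrame is similar to the one you shared, you can try the following solution. Some data cleaning is needed before we can actually pivot and get the output you want: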

# First, extract the future column names from the 'Series Name' column into a
# new column, Series_Name. Rows whose series name does not match the pattern
# (here, the consumer price index) yield NaN and are dropped.
# Raw strings avoid invalid-escape warnings; expand=False returns a Series.

df3 = (df.assign(Series_Name = lambda c: c['Series Name'].str.extract(r'(Age.*)(?=\s+\()', expand=False))
 .drop(columns='Series Name').loc[lambda c: c['Series_Name'].notna()])

# Then melt the year columns into long format, keep only the leading digits
# of each year label, and pivot the series names back out into columns.

(df3.melt(id_vars=['Country Name', 'Country Code', 'Series_Name'],
         value_vars=['1972 [YR1972]', '1973 [YR1973]'], var_name='Year')
 .assign(Year = lambda c: c['Year'].str.extract(r'(\d+)(?=\s+)', expand=False))
 .pivot(index = ['Country Name', 'Country Code', 'Year'], columns='Series_Name', values='value'))



Series_Name                    Age dependency ratio Age dependency ratio, old
Country Name Country Code Year                                               
China        CHN          1972     79.4811770762984           6.7804365054766
                          1973     78.8312385191076          6.83482919991518
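
Since the question mentions that the year suffix changes for every column, here is a minimal sketch of a variant that does not hard-code the year columns. It assumes every year column is labelled like "1972 [YR1972]", that ".." marks a missing value, and that the unit is always a trailing "(...)" in the series name. The names `id_cols`, `year_cols`, `long_df`, `result` and the `Indicator` column are introduced here for illustration only. Unlike the desired output in the question, the values end up as floats rather than strings.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Country Name': {0: 'China', 1: 'China', 2: 'China'},
    'Country Code': {0: 'CHN', 1: 'CHN', 2: 'CHN'},
    'Series Name': {0: 'Consumer price index (2010 = 100)',
                    1: 'Age dependency ratio (% of working-age population)',
                    2: 'Age dependency ratio, old (% of working-age population)'},
    '1972 [YR1972]': {0: '..', 1: '79.4811770762984', 2: '6.7804365054766'},
    '1973 [YR1973]': {0: '..', 1: '78.8312385191076', 2: '6.83482919991518'},
})

id_cols = ['Country Name', 'Country Code', 'Series Name']

# Assumption: year columns end in "[YRdddd]"; pick them up by pattern
# instead of listing them by hand.
year_cols = df.columns[df.columns.str.contains(r'\[YR\d{4}\]$')].tolist()

long_df = (
    df.melt(id_vars=id_cols, value_vars=year_cols, var_name='Year')
      .assign(
          # Keep only the leading four-digit year, e.g. "1972 [YR1972]" -> "1972".
          Year=lambda d: d['Year'].str.extract(r'^(\d{4})', expand=False),
          # Strip the trailing "(...)" unit from the series name.
          Indicator=lambda d: d['Series Name'].str.replace(r'\s*\(.*\)$', '', regex=True),
          # ".." cannot be parsed as a number, so it is coerced to NaN.
          value=lambda d: pd.to_numeric(d['value'], errors='coerce'),
      )
      .dropna(subset=['value'])  # drops series with no data, here the CPI rows
)

result = (
    long_df.pivot(index=['Country Code', 'Country Name', 'Year'],
                  columns='Indicator', values='value')
           .rename_axis(columns=None)  # remove the "Indicator" label on the columns
           .reset_index()              # turn the index levels back into columns
)
print(result)
```

Dropping the missing values before pivoting is a design choice: a series that has data for only some years would still appear, with NaN in the years where it is missing.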
