pandas: how to pair each even-numbered column with the odd-numbered column after it and build a new DataFrame - Python

Posted by dly7yett on 2023-05-05 in Python

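I read a huge Excel file in which every even-numbered column (e.g., 0, 2) holds a number of days and the column right after it holds the values for a location. The locations have different lengths. As a small example, here is a sample DataFrame.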

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

data = {
    'Unnamed: 0': [1.8, 2, 5.9],
    'Location A': [0.2, 0.3, 0.87],
    'Unnamed: 2': [6, 7],
    'Location B': [1.5, 2.0],
    'Unnamed: 4': [11],
    'Location C': [],
    'Unnamed: 6': [16.7, 17, 18, 19.6, 26,72.9],
    'Location D': [3.5, 4.0, 5.5, 6.0, 7.5, 8.0]
}

max_len = max([len(v) for v in data.values()])

for key in data.keys():
    if len(data[key]) < max_len:
        data[key].extend([np.nan] * (max_len - len(data[key])))

df = pd.DataFrame(data)

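Because the even-numbered columns have no header, pd.read_excel names them Unnamed. I want to convert this into a new DataFrame using the following logic. Let date_value = pd.to_datetime('2023-01-01', format='%Y-%m-%d').
The first column is the location name, the second column is date_value + timedelta(days=days_to_add), where days_to_add is the value in the Unnamed column, and the third column is the value under the Location column.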

Location A 2023-01-02 0.2
Location A 2023-01-03 0.3
Location A 2023-01-06 0.87
...

Location D 2023-03-12 8

Answer 1 (gjmwrych)

Use unstack together with assign:

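# odd-numbered columns (the values) -> long format, one row per (Location, row)
(df.iloc[:, 1::2].rename_axis(columns='Location').unstack()
   .reset_index(level=0, name='value')
   # even-numbered columns: days -> Timedelta -> dates, flattened in the same column order
   .assign(date=df.iloc[:,::2].mul(pd.Timedelta('1D')).add(date_value).unstack().to_numpy())
   # drop only the padding rows where both value and date are missing
   .dropna(subset=['value','date'], how='all')
)

Output:

     Location  value                date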
0  Location A   0.20 2023-01-02 19:12:00
1  Location A   0.30 2023-01-03 00:00:00
2  Location A   0.87 2023-01-06 21:36:00
0  Location B   1.50 2023-01-07 00:00:00
1  Location B   2.00 2023-01-08 00:00:00
0  Location C    NaN 2023-01-12 00:00:00
0  Location D   3.50 2023-01-17 16:48:00
1  Location D   4.00 2023-01-18 00:00:00
2  Location D   5.50 2023-01-19 00:00:00
3  Location D   6.00 2023-01-20 14:24:00
4  Location D   7.50 2023-01-27 00:00:00
5  Location D   8.00 2023-03-14 21:36:00
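
The timestamps above carry a time of day because the day offsets are fractional (1.8 days is 1 day 19:12:00), whereas the expected output in the question shows dates only. If dates without a time component are wanted, one possible option is to truncate with `Series.dt.normalize()`. The snippet below is only a minimal sketch on a few values taken from the 'Unnamed: 0' column; the names `days`, `stamps`, and `dates_only` are illustrative and do not appear in the answer.

```python
import pandas as pd

date_value = pd.to_datetime('2023-01-01', format='%Y-%m-%d')

# Fractional day offsets, as in the 'Unnamed: 0' column of the question.
days = pd.Series([1.8, 2, 5.9])

# Convert the offsets to timedeltas and add them to the base date.
stamps = pd.to_timedelta(days, unit='D') + date_value

# normalize() resets the time of day to midnight, keeping only the date part.
dates_only = stamps.dt.normalize()
print(dates_only.tolist())
```

Whether truncating is appropriate (as opposed to rounding to the nearest day) depends on what the fractional days mean in the source data, which the question does not say.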

Answer 2 (xmakbtuz)

Use unstack and concat:

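out = (pd.concat([
                  # odd-numbered columns -> long format with a 'Location' column
                  df.iloc[:, 1::2].rename_axis(columns='Location')
                    .unstack().reset_index(0, name='value'),
                  # even-numbered columns: days -> timedelta -> date
                  pd.to_timedelta(df.iloc[:, 0::2].unstack()
                                    .droplevel(0), unit='D'
                                 ).add(date_value).rename('date')
                 ], axis=1)
         # keep only the rows that actually have a value
         .dropna(subset='value')
      )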

Output:

     Location  value                date
0  Location A   0.20 2023-01-02 19:12:00
1  Location A   0.30 2023-01-03 00:00:00
2  Location A   0.87 2023-01-06 21:36:00
0  Location B   1.50 2023-01-07 00:00:00
1  Location B   2.00 2023-01-08 00:00:00
0  Location D   3.50 2023-01-17 16:48:00
1  Location D   4.00 2023-01-18 00:00:00
2  Location D   5.50 2023-01-19 00:00:00
3  Location D   6.00 2023-01-20 14:24:00
4  Location D   7.50 2023-01-27 00:00:00
5  Location D   8.00 2023-03-14 21:36:00
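
For readers who find the chained `unstack` calls hard to follow, the same pairing can also be written as an explicit loop over column positions. This is only a sketch of an alternative, not the method from either answer: it assumes the columns always come in (days, values) pairs, and the small `df` here is a stand-in with illustrative values rather than the full sample from the question. The names `pieces` and `long_df` are made up for this example.

```python
import numpy as np
import pandas as pd

date_value = pd.to_datetime('2023-01-01', format='%Y-%m-%d')

# Small stand-in for the sheet read with pd.read_excel (illustrative values).
df = pd.DataFrame({
    'Unnamed: 0': [1.8, 2.0, 5.9],
    'Location A': [0.2, 0.3, 0.87],
    'Unnamed: 2': [6.0, 7.0, np.nan],
    'Location B': [1.5, 2.0, np.nan],
})

pieces = []
# Walk the columns two at a time: position i holds days, i + 1 holds values.
for i in range(0, df.shape[1], 2):
    days = df.iloc[:, i]
    values = df.iloc[:, i + 1]
    piece = pd.DataFrame({
        'Location': df.columns[i + 1],  # scalar, repeated for every row
        'date': date_value + pd.to_timedelta(days, unit='D'),
        'value': values,
    }).dropna(subset=['value'])  # drop the NaN padding rows
    pieces.append(piece)

long_df = pd.concat(pieces, ignore_index=True)
print(long_df)
```

The loop is likely slower than the vectorized answers on a very wide sheet, but it makes the column pairing explicit and is easy to adapt, for example to skip locations that have no values at all.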
