无法以特定格式排列Pandas Dataframe

mzsu5hc0  于 2023-03-16  发布在  其他
关注(0)|答案(1)|浏览(116)

以下是我的数据框

Actual      Expected
F00000191U  F00000191U
2022-09-30  2022-12-20
0.546437    0.546437638135
F00000191U  F00000191U
2022-09-30  2022-12-20
F00000191U  F00000191U
2022-09-30  2022-12-20
0.546437    0.517306000000000
F00000QAXU  F00000QAXU
2022-02-28  2022-07-31
E0USA00HCY  E0USA00HCY
0.864700    0.911518
F000016X81  F000016X81
0.446887    0.446887122753
F000016X81  F000016X81
0.446887    0.446887000000000
E0USA0073X  E0USA0073X
2023-03-07  2023-03-14
0.836902    0.993576
F00000SZIP  F00000SZIP

我希望它按以下格式排列。字母数字文本是实体ID,下面的日期和小数值与该实体ID相关。它只是日期和小数值相对于该特定实体ID的实际值和预期值。我只希望它按如下所示的水平方式排列。

在输出中,我没有使用以下代码为任何实体ID填充值列。

def process(df):
    actual_entity = None
    actual_date = None
    actual_value = None
    expected_entity = None
    expected_date = None
    expected_value = None
    for i, row in df.iterrows():
        if isinstance(row[0], str) and row[0].startswith('F') or isinstance(row[0], str) and row[0].startswith('E'):
            if actual_entity or actual_date or actual_value:
                yield {
                    'Actual Entity ID': actual_entity,
                    'Actual start date': actual_date,
                    'Actual Value': actual_value,
                    'Expected Entity ID': expected_entity,
                    'Expected date': expected_date,
                    'Expected val': expected_value
                }
                actual_date = None
                actual_value = None
                expected_date = None
                expected_value = None
            actual_entity = row[0]
            expected_entity = row[1]
        elif isinstance(row[0], str) and re.fullmatch(r'\d{4}-\d{2}-\d{2}', row[0]):
            actual_date = row[0]
            expected_date = row[1]
        elif isinstance(row[0], float):
            actual_value = row[0]
            expected_value = row[1]
    yield {
        'Actual Entity ID': actual_entity,
        'Actual start date': actual_date,
        'Actual Value': actual_value,
        'Expected Entity ID': expected_entity,
        'Expected date': expected_date,
        'Expected val': expected_value
    }

我得到的输出如下所示。

wb1gzix0

wb1gzix01#

您可以使用自定义的pivot,我在这里假设ID将开始一个新行,并且格式为LETTER-digits-LETTERS(如果需要,可以进行调整):

names = ['ID', 'Start date', 'Value']

# identify types of values
col = (df['Actual'].str.extract(r'((?:F|E)\d*)|(^\d{4}-\d{2}-\d{2}$)|(^\d+\.*\d*$)')
       .set_axis(names, axis=1).stack().index.get_level_values(1)
      )

# pivot
out = (df.assign(index=df.groupby(col).cumcount(),
                 col=col)
         .pivot(index='index', columns='col')
         .rename_axis(index=None)
       )

out.columns = out.columns.map('_'.join)

print(out)

输出:

Actual_ID Actual_Start date Actual_Value Expected_ID Expected_Start date     Expected_Value
0  F00000191U        2022-09-30     0.546437  F00000191U          2022-12-20     0.546437638135
1  F00000191U        2022-09-30     0.546437  F00000191U          2022-12-20  0.517306000000000
2  F00000191U        2022-09-30     0.864700  F00000191U          2022-12-20           0.911518
3  F00000QAXU        2022-02-28     0.446887  F00000QAXU          2022-07-31     0.446887122753
4  E0USA00HCY        2023-03-07     0.446887  E0USA00HCY          2023-03-14  0.446887000000000
5  F000016X81               NaN     0.836902  F000016X81                 NaN           0.993576
6  F000016X81               NaN          NaN  F000016X81                 NaN                NaN
7  E0USA0073X               NaN          NaN  E0USA0073X                 NaN                NaN
8  F00000SZIP               NaN          NaN  F00000SZIP                 NaN                NaN

可重现输入:

df = pd.DataFrame({'Actual': ['F00000191U', '2022-09-30', '0.546437', 'F00000191U', '2022-09-30', 'F00000191U', '2022-09-30', '0.546437', 'F00000QAXU', '2022-02-28', 'E0USA00HCY', '0.864700', 'F000016X81', '0.446887', 'F000016X81', '0.446887', 'E0USA0073X', '2023-03-07', '0.836902', 'F00000SZIP'],
                   'Expected': ['F00000191U', '2022-12-20', '0.546437638135', 'F00000191U', '2022-12-20', 'F00000191U', '2022-12-20', '0.517306000000000', 'F00000QAXU', '2022-07-31', 'E0USA00HCY', '0.911518', 'F000016X81', '0.446887122753', 'F000016X81', '0.446887000000000', 'E0USA0073X', '2023-03-14', '0.993576', 'F00000SZIP']})

相关问题