An efficient way to restructure a Pandas DataFrame from a row-oriented layout into resampled columns

Posted by 1mrurvl1 on 2023-01-04 in Other

I have a Pandas DataFrame structured as follows:

TimeStamp                                                                           
2022-12-30 10:31:58.483700+00:00       1        FixType            4    4.000000e+00
2022-12-30 10:31:58.483700+00:00       1     Satellites           11    1.100000e+01
2022-12-30 10:31:58.484150+00:00       2  TimeConfirmed            0    0.000000e+00
2022-12-30 10:31:58.484150+00:00       2          Epoch  63797521999    1.641638e+09
2022-12-30 10:31:58.484150+00:00       2      TimeValid            1    1.000000e+00
...                                  ...            ...          ...             ...
2022-12-30 10:54:32.714050+00:00       9   AngularRateZ         1020   -1.000000e+00
2022-12-30 10:54:32.714050+00:00       9  AccelerationY          513    1.250000e-01
2022-12-30 10:54:32.714050+00:00       9  AccelerationZ          594    1.025000e+01
2022-12-30 10:54:32.714050+00:00       9   AngularRateX         1025    2.500000e-01
2022-12-30 10:54:32.714050+00:00       9       ImuValid            1    1.000000e+00

[973528 rows x 4 columns]

I need to bring it into the following structure, with one column per signal, while resampling it to a specific frequency (e.g. 1S):

                           FixType  Satellites  ...  AngularRateZ  ImuValid
TimeStamp                                       ...                        
2022-12-30 10:31:59+00:00      4.0        11.0  ...           NaN       NaN
2022-12-30 10:32:00+00:00      4.0        11.0  ...         -1.00       1.0
2022-12-30 10:32:01+00:00      4.0        12.0  ...         -1.00       1.0
2022-12-30 10:32:02+00:00      4.0        12.0  ...         -1.00       1.0
2022-12-30 10:32:03+00:00      4.0        12.0  ...         -1.00       1.0
...                            ...         ...  ...           ...       ...
2022-12-30 10:54:28+00:00      4.0        13.0  ...         -1.00       1.0
2022-12-30 10:54:29+00:00      4.0        14.0  ...         -1.00       1.0
2022-12-30 10:54:30+00:00      4.0        14.0  ...         -0.75       1.0
2022-12-30 10:54:31+00:00      4.0        14.0  ...         -1.00       1.0
2022-12-30 10:54:32+00:00      4.0        14.0  ...         -1.00       1.0

[1354 rows x 39 columns]

Currently I achieve this with the following code:

import pandas as pd


def restructure_data(df_phys, res):
    """Return one column per signal, resampled to the frequency `res`."""
    # Start from an empty frame that only carries the join key.
    df_phys_join = pd.DataFrame({"TimeStamp": []})
    if not df_phys.empty:
        for message, df_phys_message in df_phys.groupby("CAN ID"):
            for signal, data in df_phys_message.groupby("Signal"):
                # Resample this signal on its own, forward-filling gaps,
                # then merge it into the result as a new column.
                df_phys_join = pd.merge_ordered(
                    df_phys_join,
                    data["Physical Value"].rename(signal).resample(res).ffill().dropna(),
                    on="TimeStamp",
                    fill_method="none",
                ).set_index("TimeStamp")

    return df_phys_join

This works, but it seems inefficient. Is there a smarter, perhaps more Pythonic, way to achieve a similar result?

Answer #1 (sd2nnvve)

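Without more data or information (are there any duplicate timestamp/new-column combinations that need handling?), it is hard to say whether this will work for you, but you could use a one-liner: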

df.pivot_table(values="col4", index=pd.Grouper(freq="S", key="TimeStamp"), columns="col3")

Fill in the correct column names from your original df for values= and columns=.
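As a rough sketch of how that one-liner might look with the column names used in the question's own code ("Signal" and "Physical Value"), the example below assumes that "TimeStamp" is the index of the frame, so it is first moved into a column for `pd.Grouper(key=...)` to find. The function name `restructure_with_pivot`, the choice of `aggfunc="last"`, and the sample values are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def restructure_with_pivot(df_phys: pd.DataFrame, res: str = "1s") -> pd.DataFrame:
    """Pivot long-format signal rows into one column per signal, binned at `res`."""
    # Assumption: "TimeStamp" is the index, as in the question. Move it into a
    # regular column so that pd.Grouper(key="TimeStamp") can locate it.
    long_df = df_phys.reset_index()
    return long_df.pivot_table(
        values="Physical Value",
        index=pd.Grouper(key="TimeStamp", freq=res),
        columns="Signal",
        # Keep the last sample in each time bin; pivot_table's default
        # aggregation would be the mean instead.
        aggfunc="last",
    )


# Small made-up sample in the same long format as the question's data.
timestamps = pd.to_datetime(
    [
        "2022-12-30 10:31:58.483700",
        "2022-12-30 10:31:58.483700",
        "2022-12-30 10:31:58.900000",
        "2022-12-30 10:31:59.200000",
        "2022-12-30 10:31:59.700000",
    ],
    utc=True,
).rename("TimeStamp")

sample = pd.DataFrame(
    {
        "CAN ID": [1, 1, 1, 1, 1],
        "Signal": ["FixType", "Satellites", "Satellites", "FixType", "Satellites"],
        "Physical Value": [4.0, 11.0, 12.0, 4.0, 13.0],
    },
    index=timestamps,
)

wide = restructure_with_pivot(sample, "1s")
print(wide)
```

Note that this is not guaranteed to reproduce the output of the original loop exactly. Here each row is labelled with the start of its time bin and aggregated within that bin, whereas `resample(res).ffill()` carries the most recent earlier sample forward onto each grid point, so rows may be shifted and gaps filled differently. If the same signal name can occur under more than one CAN ID, adding "CAN ID" to `columns=` would keep those apart.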
