How can I reshape this pandas dataframe with time series data so that it is easier to work with?

Posted by up9lanfz on 2023-04-18 in Other

I have a pandas dataframe with time series data whose columns look like this:
| Customer | Item | Date | 00:00 | 00:30 | 01:00 | ... | 23:30 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| XYZ | A | 2020-01-01 | 0 | 1 | 2 | ... | 3 |
| XYZ | B | 2020-01-02 | 0 | 2 | 2 | ... | 5 |
| ABC | A | 2020-01-01 | 0 | 1 | 5 | ... | 3 |
| ABC | B | 2020-01-02 | 0 | 2 | 2 | ... | 1 |
So the times of day are in the columns instead of the rows. I want to reshape this dataframe by combining the time columns with the date column and turning them into separate rows, like this:
| Customer | Date | Item A | Item B |
| ------------ | ------------ | ------------ | ------------ |
| XYZ | 2020-01-01 00:00 | 1 | 2 |
| XYZ | 2020-01-01 00:30 | 1 | 2 |
| XYZ | 2020-01-01 01:00 | 1 | 2 |
| XYZ | 2020-01-02 00:00 | 1 | 2 |
| XYZ | 2020-01-02 00:30 | 1 | 2 |
| XYZ | 2020-01-02 01:00 | 1 | 2 |
| ABC | 2020-01-01 00:00 | 2 | 3 |
| ABC | 2020-01-01 00:30 | 2 | 2 |
| ABC | 2020-01-01 01:00 | 4 | 2 |
| ABC | 2020-01-02 00:00 | 2 | 3 |
| ABC | 2020-01-02 00:30 | 2 | 2 |
| ABC | 2020-01-02 01:00 | 4 | 2 |
How can I do this? I tried a method using a cross join, but that is very inefficient because I have a lot of rows (~100000).

qyswt5oh #1

You could try the following (with df as your dataframe):

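import pandas as pd

# Parse the date strings so the time-of-day offsets can be added to them.
df["Date"] = pd.to_datetime(df["Date"])
df = (
    # Rename "Item" to free that name for the melted value column.
    df.rename(columns={"Item": "Items"})
    # One row per (Customer, Items, Date, Time).
    .melt(id_vars=["Customer", "Items", "Date"], var_name="Time", value_name="Item")
    # "00:30" -> "00:30:00" so that pd.to_timedelta can parse it.
    .assign(Date=lambda d: d["Date"] + pd.to_timedelta(d["Time"] + ":00"))
    .drop(columns="Time")
    # One column per item; the column labels become pairs such as ("Item", "A").
    .pivot(index=["Customer", "Date"], columns="Items")
    .reset_index()
)
# Flatten the two-level column labels into "Customer", "Date", "Item A", "Item B".
df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]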
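
To see the steps above on concrete data, here is a minimal self-contained sketch. The sample frame `wide`, the helper name `reshape_half_hourly`, and the intermediate column name `Value` are made up for illustration. It also assumes both items are recorded for the same date, which differs from the sample in the question; if an item has no row for a given date, its column holds NaN for those timestamps.

```python
import pandas as pd

# Hypothetical sample data in the question's layout, cut down to three time columns.
wide = pd.DataFrame(
    {
        "Customer": ["XYZ", "XYZ", "ABC", "ABC"],
        "Item": ["A", "B", "A", "B"],
        "Date": ["2020-01-01"] * 4,
        "00:00": [0, 0, 0, 0],
        "00:30": [1, 2, 1, 2],
        "01:00": [2, 2, 5, 2],
    }
)


def reshape_half_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the time-of-day columns into rows and the items into columns."""
    id_cols = ["Customer", "Item", "Date"]
    # Wide -> long: one row per (Customer, Item, Date, Time).
    long = df.melt(id_vars=id_cols, var_name="Time", value_name="Value")
    # Combine the date and the time of day into a single timestamp.
    long["Date"] = pd.to_datetime(long["Date"]) + pd.to_timedelta(long["Time"] + ":00")
    # Long -> wide again, this time with one column per item.
    out = (
        long.pivot(index=["Customer", "Date"], columns="Item", values="Value")
        .add_prefix("Item ")
        .reset_index()
    )
    out.columns.name = None  # drop the leftover "Item" label on the column axis
    return out


result = reshape_half_hourly(wide)
print(result)
```

Passing `values="Value"` to `pivot` keeps the column labels single-level, so no flattening step is needed afterwards. Passing a list as `index` to `pivot` needs a reasonably recent pandas version (1.1 or later, as far as I know).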
