Exporting a DataFrame timedelta column to a time-formatted Excel column

yebdmbv4  posted on 2023-02-05 in Other

I have a DataFrame that contains a datetime64 column and a timedelta64 column. Unfortunately, I can't export the latter to an Excel file as a properly formatted hh:mm:ss column:

import pandas as pd

data = {
    "date": [
        "2023-02-05",
        "2023-02-05",
        "2022-12-02",
        "2022-11-29",
        "2022-11-18",
    ],
    "duration": [
        "01:07:48",
        "05:23:06",
        "02:41:58",
        "00:35:11",
        "02:00:20",
    ],
}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration'])

with pd.ExcelWriter(
    "df.xlsx",
    datetime_format="YYYY-MM-DD",
    engine="xlsxwriter",
) as writer:
    workbook = writer.book
    time_format = workbook.add_format({"num_format": "HH:MM:SS"})
    df.to_excel(writer, sheet_name="sheet", index=False)
    worksheet = writer.sheets["sheet"]
    worksheet.set_column("A:A", 20)
    worksheet.set_column("B:B", 50, cell_format=time_format)

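In the resulting Excel file, the datetime_format set on the ExcelWriter object is applied correctly to column A, and the width of column B is also set correctly, but the number format for column B is not applied.
What am I doing wrong?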

8cdiaqws  #1

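The issue is that Excel measures time in days. For example, the first value (1:07:48 = 4068 s) is stored as a duration of 4068/(24*3600) days, roughly 0.0471.
You can find a possible solution here: formatting timedelta64 when using pandas.to_excel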
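To make the arithmetic above concrete, here is a minimal sketch using only the standard library. The names `SECONDS_PER_DAY`, `duration`, and `excel_value` are just illustrative; a pandas Timedelta is a subclass of `datetime.timedelta`, so the same calculation should carry over.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 3600  # Excel stores times as fractions of a day

# First duration from the question: 01:07:48
duration = timedelta(hours=1, minutes=7, seconds=48)

total_seconds = duration.total_seconds()        # 4068.0
excel_value = total_seconds / SECONDS_PER_DAY   # day fraction, about 0.0471

print(total_seconds, round(excel_value, 6))  # prints: 4068.0 0.047083
```

With an hh:mm:ss number format applied to the cell, Excel should display that fraction as 01:07:48.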

h43kikqp  #2

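The column format isn't applied because Pandas applies a cell number format of "0" to timedelta values. A cell format overrides the column format, so the column format never takes effect. You can verify this by adding the following line at the end of the with block; that cell has no format of its own, so it picks up the column format and is displayed as expected: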

worksheet.write(7, 1, .5)

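I don't know what the best workaround is, but you could iterate over the timedelta values and rewrite them, which overwrites the cells that Pandas formatted.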

import pandas as pd

data = {
    "date": [
        "2023-02-05",
        "2023-02-05",
        "2022-12-02",
        "2022-11-29",
        "2022-11-18",
    ],
    "duration": [
        "01:07:48",
        "05:23:06",
        "02:41:58",
        "00:35:11",
        "02:00:20",
    ],
}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration'])

with pd.ExcelWriter(
    "df.xlsx",
    datetime_format="YYYY-MM-DD",
    engine="xlsxwriter",
) as writer:
    workbook = writer.book
    time_format = workbook.add_format({"num_format": "HH:MM:SS"})
    df.to_excel(writer, sheet_name="sheet", index=False)
    worksheet = writer.sheets["sheet"]
    worksheet.set_column("A:A", 20)
    worksheet.set_column("B:B", 50, cell_format=time_format)

    col = df.columns.get_loc('duration')
    for row, timedelta in enumerate(df['duration'], 1):
        worksheet.write(row, col, timedelta)

Output: the duration column is now displayed as hh:mm:ss.

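You could also convert the timedeltas to plain numbers yourself (which is what Pandas does anyway), since dates and times in Excel are just numbers with a format. Once the column holds ordinary floats, Pandas no longer attaches the timedelta cell format, so the column format applies.
Something like the following, which gives the same result as above: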

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration']).dt.total_seconds() / 86400

with pd.ExcelWriter(
    "df.xlsx",
    datetime_format="YYYY-MM-DD",
    engine="xlsxwriter",
) as writer:
    workbook = writer.book
    time_format = workbook.add_format({"num_format": "HH:MM:SS"})
    df.to_excel(writer, sheet_name="sheet", index=False)
    worksheet = writer.sheets["sheet"]
    worksheet.set_column("A:A", 20)
    worksheet.set_column("B:B", 50, cell_format=time_format)
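Both approaches end up storing a day-fraction number in the cell and relying on a format to display it. As a rough sketch of the same idea without pandas, the format can also be passed explicitly per cell when writing with XlsxWriter directly. This assumes the xlsxwriter package is installed; the in-memory buffer and the names `durations`, `buffer`, and `xlsx_bytes` are illustrative choices, not part of the original answer.

```python
import io
from datetime import timedelta

import xlsxwriter

# Illustrative data: the first two durations from the question.
durations = [
    timedelta(hours=1, minutes=7, seconds=48),
    timedelta(hours=5, minutes=23, seconds=6),
]

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.BytesIO()
workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
worksheet = workbook.add_worksheet("sheet")
time_format = workbook.add_format({"num_format": "HH:MM:SS"})

worksheet.write(0, 0, "duration")
for row, duration in enumerate(durations, start=1):
    # A format passed to write() is stored on the cell itself,
    # so it does not depend on any column format being honored.
    worksheet.write(row, 0, duration, time_format)

workbook.close()
xlsx_bytes = buffer.getvalue()
```

Replacing `buffer` with a file name such as "df.xlsx" would write the workbook to disk instead.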
