How to parse date strings whose day and month are not zero-padded in the Rust version of polars?

Posted by q7solyqu on 2023-06-23 in Other

I am reading a CSV file with dates in month/day/year format (e.g. "11/15/2022"), but the month and day are not zero-padded. Here is my test code:

use polars::prelude::*;
use polars_lazy::prelude::*;

fn main() {
    let df = df![
        "x" => ["1/4/2011", "2/4/2011", "3/4/2011", "4/4/2011"],
        "y" => [1, 2, 3, 4],
    ].unwrap();
    let lf: LazyFrame = df.lazy();

    let options = StrpTimeOptions {
        fmt: Some("%m/%d/%Y".into()),
        date_dtype: DataType::Date,
        ..Default::default()
    };

    let res = lf.clone()
    .with_column(col("x").str().strptime(options).alias("new time"))
    .collect().unwrap();

    println!("{:?}", res);

}

The output is:

shape: (4, 3)
┌──────────┬─────┬──────────┐
│ x        ┆ y   ┆ new time │
│ ---      ┆ --- ┆ ---      │
│ str      ┆ i32 ┆ date     │
╞══════════╪═════╪══════════╡
│ 1/4/2011 ┆ 1   ┆ null     │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2/4/2011 ┆ 2   ┆ null     │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3/4/2011 ┆ 3   ┆ null     │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4/4/2011 ┆ 4   ┆ null     │
└──────────┴─────┴──────────┘

In the options, I tried "%-m/%-d/%Y" instead of "%m/%d/%Y", following the documentation, but it panicked at runtime:

thread '<unnamed>' panicked at 'attempt to subtract with overflow', /home/xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/polars-time-0.21.1/src/chunkedarray/utf8/mod.rs:234:33

What is the correct way to read this format? I am using Ubuntu 20.04.4 LTS.
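To make the intent of "%-m/%-d/%Y" concrete: in the chrono-style format strings polars uses, the "-" modifier suppresses zero padding, so the month and day may each appear with one or two digits. The std-only sketch below (no polars involved; parse_mdy_unpadded and its helpers are hypothetical names) shows which strings such a format is meant to accept. It illustrates the format's meaning only and is not how polars parses internally.

```rust
/// True for leap years in the proleptic Gregorian calendar.
fn is_leap(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month` (1-12) of `year`; 0 for an invalid month.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(year) => 29,
        2 => 28,
        _ => 0,
    }
}

/// Hypothetical parser for "M/D/YYYY" where month and day have 1 or 2 digits
/// (the intent of "%-m/%-d/%Y"). Zero-padded input such as "01/04/2011" is
/// accepted too. Returns (year, month, day), or None if the string does not match.
fn parse_mdy_unpadded(s: &str) -> Option<(i32, u32, u32)> {
    let mut parts = s.split('/');
    let (m, d, y) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None; // more than three '/'-separated fields
    }
    // Month and day: 1 or 2 ASCII digits; year: exactly 4 digits.
    let digits = |t: &str, min: usize, max: usize| -> bool {
        (min..=max).contains(&t.len()) && t.bytes().all(|b| b.is_ascii_digit())
    };
    if !digits(m, 1, 2) || !digits(d, 1, 2) || !digits(y, 4, 4) {
        return None;
    }
    let month: u32 = m.parse().ok()?;
    let day: u32 = d.parse().ok()?;
    let year: i32 = y.parse().ok()?;
    // Reject impossible calendar dates such as 2/30/2011.
    if !(1..=12).contains(&month) || day == 0 || day > days_in_month(year, month) {
        return None;
    }
    Some((year, month, day))
}
```

For example, parse_mdy_unpadded("1/4/2011") and parse_mdy_unpadded("01/04/2011") both yield Some((2011, 1, 4)), while "2/30/2011" and "1/4/11" yield None.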

Answer #1 (ql3eal8s)

Filling the remaining options with Default makes strptime run with the wrong flags. You need to set exact to true:

...
    let options = StrpTimeOptions {
        fmt: Some("%-m/%-d/%Y".into()),
        date_dtype: DataType::Date,
        exact: true,
        ..Default::default()
    };
...

Full code, including a zero-padded date as a test case:

use polars::prelude::*;
use polars_lazy::dsl::StrpTimeOptions;
use polars_lazy::prelude::{col, IntoLazy, LazyFrame};

fn main() {
    let df = df![
        "x" => ["01/04/2011", "2/4/2011", "3/4/2011", "4/4/2011"],
        "y" => [1, 2, 3, 4],
    ]
    .unwrap();
    let lf: LazyFrame = df.lazy();

    let options = StrpTimeOptions {
        fmt: Some("%-m/%-d/%Y".into()),
        date_dtype: DataType::Date,
        exact: true,
        ..Default::default()
    };

    let res = lf
        .clone()
        .with_column(col("x").str().strptime(options).alias("new time"))
        .collect()
        .unwrap();

    println!("{:?}", res);
}

Output:

shape: (4, 3)
┌────────────┬─────┬────────────┐
│ x          ┆ y   ┆ new time   │
│ ---        ┆ --- ┆ ---        │
│ str        ┆ i32 ┆ date       │
╞════════════╪═════╪════════════╡
│ 01/04/2011 ┆ 1   ┆ 2011-01-04 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2/4/2011   ┆ 2   ┆ 2011-02-04 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3/4/2011   ┆ 3   ┆ 2011-03-04 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4/4/2011   ┆ 4   ┆ 2011-04-04 │
└────────────┴─────┴────────────┘
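
The exact flag controls how much of the input has to match the format. According to the polars documentation, exact = true requires the whole string to match, while exact = false allows the format to match a substring, e.g. "2021-01-01" inside "foo-2021-01-01-bar". The std-only sketch below is a hypothetical model of those two modes (match_iso_date and is_iso_date_shape are illustrative names, and a fixed YYYY-MM-DD shape is used for brevity); it is not polars' implementation.

```rust
/// True if `s` has the shape YYYY-MM-DD (digits and dashes only, no range checks).
fn is_iso_date_shape(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10
        && b.iter().enumerate().all(|(i, c)| {
            if i == 4 || i == 7 {
                *c == b'-'
            } else {
                c.is_ascii_digit()
            }
        })
}

/// Hypothetical model of the `exact` flag:
/// - exact = true: the entire input must match the pattern;
/// - exact = false: the first matching substring anywhere in the input is returned.
fn match_iso_date(s: &str, exact: bool) -> Option<&str> {
    if exact {
        return if is_iso_date_shape(s) { Some(s) } else { None };
    }
    // Slide a 10-byte window over the input; `str::get` returns None for
    // windows that run past the end or would split a multi-byte character.
    (0..=s.len().saturating_sub(10))
        .filter_map(|start| s.get(start..start + 10))
        .find(|window| is_iso_date_shape(window))
}
```

With this model, match_iso_date("foo-2021-01-01-bar", false) finds "2021-01-01", whereas the same call with exact = true returns None.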

Answer #2 (pbgvytdp)

This works with Rust Polars version 0.30.
In Cargo.toml:

[dependencies]
polars = { version = "0.30", features = [
    "lazy", # Lazy API
    "dtype-date",
    # others features
] }

In this version the options struct is StrptimeOptions (with format instead of fmt):

let time_options = StrptimeOptions {
    format: Some("%-m/%-d/%Y".into()),
    strict: false, // if true, polars returns an error when any date fails to parse (otherwise the value becomes null)
    exact: true,   // require the whole string to match; with false, "foo-2021-01-01-bar" could match "2021-01-01"
    cache: true,   // use a cache of unique, converted dates to apply the conversion
};
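
The strict and cache comments above can be read as follows. This std-only sketch is only a conceptual model of those two options, not polars' code: parse_one and parse_column are hypothetical names, and parse_one is deliberately loose (month 1-12, day 1-31, no calendar check).

```rust
use std::collections::HashMap;

/// (year, month, day)
type Ymd = (i32, u32, u32);

/// Hypothetical, deliberately loose single-value parser for "M/D/YYYY":
/// month and day may have 1 or 2 digits; only 1-12 / 1-31 range checks.
fn parse_one(s: &str) -> Option<Ymd> {
    let mut parts = s.split('/');
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    let year: i32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Conceptual model of the `strict` and `cache` options (not polars' code).
/// Returns the parsed column plus the number of times `parse_one` actually ran.
fn parse_column(
    values: &[&str],
    strict: bool,
    cache: bool,
) -> Result<(Vec<Option<Ymd>>, usize), String> {
    let mut memo: HashMap<&str, Option<Ymd>> = HashMap::new();
    let mut parse_calls = 0;
    let mut out = Vec::with_capacity(values.len());
    for &v in values {
        // With `cache`, each distinct string is parsed at most once.
        let parsed = match memo.get(v).copied() {
            Some(hit) => hit,
            None => {
                parse_calls += 1;
                let p = parse_one(v);
                if cache {
                    memo.insert(v, p);
                }
                p
            }
        };
        // strict: the first failure aborts the whole column with an error;
        // otherwise the failed value simply becomes None (null).
        if strict && parsed.is_none() {
            return Err(format!("could not parse {v:?} as M/D/YYYY"));
        }
        out.push(parsed);
    }
    Ok((out, parse_calls))
}
```

On a column with repeated values, such as ["1/4/2011", "1/4/2011", "2/4/2011", "bad"], the cached run parses only three distinct strings, while strict = true turns the "bad" entry into an error instead of a null.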

After these replacements, the full program is:

use polars::prelude::*;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let df: DataFrame = df![
        "x" => ["01/04/2011", "2/4/2011", "3/4/2011", "4/4/2011"],
        "y" => [1, 2, 3, 4],
    ]?;

    let time_options = StrptimeOptions {
        format: Some("%-m/%-d/%Y".into()),
        strict: false, // if true, return an error when any date fails to parse
        exact: true,   // require the whole string to match the format
        cache: true,   // cache unique strings so each one is converted only once
    };

    let lz: LazyFrame = df
        .lazy()
        // Older versions: .with_column(col("x").str().strptime(options).alias("new time"))
        .with_column(col("x").str().to_date(time_options).alias("new time"));

    println!("result:\n{:?}", lz.collect()?);

    Ok(())
}
