Rust Polars: How do I get the number of rows in a DataFrame?

jjhzyzn0 posted 12 months ago in Other

I want to filter a Polars DataFrame and then get the number of rows.
What I'm doing now seems to work, but it feels wrong:

let item_count = item_df
    .lazy()
    .filter(not(col("status").is_in(lit(filter))))
    .collect()?
    .shape().0;

In a subsequent DataFrame operation I need to use this value as the divisor:

.with_column(
    col("count")
        .div(lit(item_count as f64))
        .mul(lit(100.0))
        .alias("percentage"),
);

This is a tiny dataset (a few dozen rows), so I'm not worried about performance, but I'd like to know what the best approach is.
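For reference, the arithmetic described above can be modelled without Polars. The sketch below is plain Rust, not the Polars API: `Item`, `count_kept` and `percentage` are hypothetical names chosen for illustration. It counts the rows whose status is not in an exclusion list, then expresses a per-group count as a percentage of that total. On the Polars side, `.shape().0` is the same number that `DataFrame::height()` returns.

```rust
// Plain-Rust model of the question's pipeline; this is not the Polars API.
// `Item` and both function names are hypothetical, chosen for illustration.

/// One row of the hypothetical item table.
pub struct Item {
    pub status: &'static str,
}

/// Count the rows whose status is NOT in `excluded`
/// (the role played by `filter(not(... is_in ...))` followed by a row count).
pub fn count_kept(items: &[Item], excluded: &[&str]) -> usize {
    items
        .iter()
        .filter(|item| !excluded.contains(&item.status))
        .count()
}

/// Express `count` as a percentage of `total`, mirroring
/// `col("count").div(lit(item_count as f64)).mul(lit(100.0))`.
pub fn percentage(count: u32, total: usize) -> f64 {
    f64::from(count) / total as f64 * 100.0
}
```

If every row is filtered out, `total` is 0 and the division yields NaN or infinity, so a real pipeline would probably want to guard against that case.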


luaexgnf1#

There doesn't seem to be a predefined method for this on LazyFrame, but you can use a Polars expression:

use polars::prelude::*;

let df = df!["a" => [1, 2], "b" => [3, 4]].unwrap();
dbg!(df.lazy().select([count()]).collect().unwrap());

pqwbnv8z2#

Use with_row_count:

fn with_row_count(self, name: &str, offset: Option<IdxSize>) -> LazyFrame

use std::error::Error;
use polars::prelude::*;

fn main() -> Result<(), Box<dyn Error>> {
    let dataframe_01: DataFrame = df!(
        "strings" => &["aa", "bb", "cc", "dd", "ee","ff"],
        "float64"  => [23.654, 0.319, 10.0049, 89.01999, -3.41501, 52.0766],
        "options"  => [Some(28), Some(300), None, Some(2), Some(-30), None],
    )?;

    println!("original: {dataframe_01}\n");

    let dataframe_02: DataFrame = dataframe_01
        .lazy()
        .with_row_count("count lines", Some(1u32))
        .collect()?;

    println!("with new column: {dataframe_02}\n");

    let new_col: Vec<u32> = dataframe_02
        .column("count lines")?
        .u32()?
        .into_iter()
        .map(|opt_u32| opt_u32.unwrap())
        .collect();

    println!("new_col: {new_col:?}");

    Ok(())
}

Output:

original: shape: (6, 3)
┌─────────┬──────────┬─────────┐
│ strings ┆ float64  ┆ options │
│ ---     ┆ ---      ┆ ---     │
│ str     ┆ f64      ┆ i32     │
╞═════════╪══════════╪═════════╡
│ aa      ┆ 23.654   ┆ 28      │
│ bb      ┆ 0.319    ┆ 300     │
│ cc      ┆ 10.0049  ┆ null    │
│ dd      ┆ 89.01999 ┆ 2       │
│ ee      ┆ -3.41501 ┆ -30     │
│ ff      ┆ 52.0766  ┆ null    │
└─────────┴──────────┴─────────┘

with new column: shape: (6, 4)
┌─────────────┬─────────┬──────────┬─────────┐
│ count lines ┆ strings ┆ float64  ┆ options │
│ ---         ┆ ---     ┆ ---      ┆ ---     │
│ u32         ┆ str     ┆ f64      ┆ i32     │
╞═════════════╪═════════╪══════════╪═════════╡
│ 1           ┆ aa      ┆ 23.654   ┆ 28      │
│ 2           ┆ bb      ┆ 0.319    ┆ 300     │
│ 3           ┆ cc      ┆ 10.0049  ┆ null    │
│ 4           ┆ dd      ┆ 89.01999 ┆ 2       │
│ 5           ┆ ee      ┆ -3.41501 ┆ -30     │
│ 6           ┆ ff      ┆ 52.0766  ┆ null    │
└─────────────┴─────────┴──────────┴─────────┘

new_col: [1, 2, 3, 4, 5, 6]
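The index column in this output runs from the offset up to offset + rows - 1, so the row count can be read from its last value (or from the length of the collected vector). A minimal plain-Rust sketch of that relationship follows. It is a model of the column's contents, not the Polars implementation, and `number_rows` and `row_count_from_index` are hypothetical helper names.

```rust
// Plain-Rust model of what a row-count column contains; not the Polars API.
// `number_rows` and `row_count_from_index` are hypothetical helper names.

/// Pair every row with a running index that starts at `offset`.
pub fn number_rows<T>(rows: Vec<T>, offset: u32) -> Vec<(u32, T)> {
    rows.into_iter()
        .enumerate()
        .map(|(i, row)| (offset + i as u32, row))
        .collect()
}

/// Recover the number of rows from the index column alone:
/// last index minus offset plus one, or 0 for an empty table.
pub fn row_count_from_index<T>(numbered: &[(u32, T)], offset: u32) -> u32 {
    numbered
        .last()
        .map_or(0, |(last_index, _)| last_index - offset + 1)
}
```

With the six rows from the example and an offset of 1, the indices are 1 through 6 and the recovered count is 6. The same offset has to be passed to both functions for the result to be meaningful.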
