Rust: Is returning a Vec<f32> from a function expensive?

tf7tbtn2 posted 11 months ago in Other

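I'm new to Rust and I'm facing a strange performance problem when returning a Vec from a function to main.
Given this code, the running time in release mode is about 13 µs.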

fn main() -> Result<(), ()> {
    // ... Some code for loading data, ends up with:
    let series: ChunkedArray<Float32Type>

    let start = Instant::now();
    let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    let mut result = Vec::<f32>::with_capacity(percentiles.len());
    for p in percentiles {
        match series.quantile(p as f64, QuantileInterpolOptions::Lower) {
            Ok(v) => result.push(v.unwrap()),
            _ => {}
        }
    }
    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
    // Vec<f32> does not implement Display, so print it with the Debug formatter.
    println!("{:?}", result);
    Ok(())
}

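Time elapsed is: 44.484µs
That's fine, but as soon as I extract the logic into a separate function, the measured time becomes about 72 ms (!). Note that I added another time measurement inside the percentile function to confirm that its body still runs very fast.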

fn main() -> Result<(), ()> {
    // ... Some code for loading data, ends up with:
    let series: ChunkedArray<Float32Type>

    let start = Instant::now();
    // Bind the return value to `result` so the println! below refers to it.
    let result = percentile(series);
    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
    println!("{:?}", result);
    Ok(())
}

pub fn percentile(data: Series) -> Vec<f32> {
    let data = data;

    let start = Instant::now();
    let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    let mut result = Vec::<f32>::with_capacity(percentiles.len());
    for p in percentiles {
        match data.quantile(p as f64, QuantileInterpolOptions::Lower) {
            Ok(v) => result.push(v.unwrap()),
            _ => {}
        }
    }
    let duration = start.elapsed();
    println!("internal time: {:?}", duration);

    result
}


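This results in:
internal time: 13.796µs
Time elapsed is: 73.91404ms
Running this version of the code in debug mode takes about 600 ms.
I'm super confused. Why does returning a value from a function seem to cost so much? What am I missing?
Edit: measuring the cost of dropping series in the first (fast) version of the code gives 241µs.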
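A minimal sketch of how such a drop measurement might look. It uses a plain Vec<f32> as a stand-in for the polars series, and the helper names sum_owned and sum_borrowed are hypothetical, made up for illustration. It only shows where the drop happens relative to the timers: a function that takes its argument by value frees it before returning, so that cost lands inside the caller's timed region, while a borrowing function leaves the drop to the caller.

```rust
use std::time::Instant;

/// Hypothetical helper that takes ownership: `data` is dropped
/// (and its buffer freed) just before this function returns.
fn sum_owned(data: Vec<f32>) -> f32 {
    data.iter().sum()
}

/// Hypothetical helper that only borrows: the caller keeps ownership
/// and decides when the buffer is dropped.
fn sum_borrowed(data: &[f32]) -> f32 {
    data.iter().sum()
}

fn main() {
    // Stand-in data; the original question uses a polars ChunkedArray.
    let series: Vec<f32> = vec![1.0; 1_000_000];
    let copy = series.clone();

    // Timed region includes the drop of `copy` inside the callee.
    let start = Instant::now();
    let owned_total = sum_owned(copy);
    println!("owned call (includes drop): {:?}", start.elapsed());

    // Timed region excludes the drop, because `series` is only borrowed.
    let start = Instant::now();
    let borrowed_total = sum_borrowed(&series);
    println!("borrowed call: {:?}", start.elapsed());

    // Time the drop on its own.
    let start = Instant::now();
    drop(series);
    println!("drop alone: {:?}", start.elapsed());

    println!("{} {}", owned_total, borrowed_total);
}
```

Actual timings depend on the allocator and the data size, so no expected numbers are given here.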
Edit 2:
Here is the full output:

mr@Ripper:~/git/me/distributor$ cargo run -r
   Compiling distributor v0.1.0 (/home/or/git/me/distributor)
    Finished release [optimized] target(s) in 3.03s
     Running `target/release/distributor`
[0.15219125, 0.977695, 0.9928536, 1.0046335, 1.0219395, 552.4709]
Time elapsed is: 44.484µs
mr@Ripper:~/git/me/distributor$ cargo run -r
   Compiling distributor v0.1.0 (/home/or/git/me/distributor)
    Finished release [optimized] target(s) in 2.76s
     Running `target/release/distributor`
internal time: 13.556µs
Time elapsed is: 73.22045ms
[0.15219125, 0.977695, 0.9928536, 1.0046335, 1.0219395, 552.4709]

szqfcxe2


Since result is never used, Rust has likely optimized away the body of the main function. You are probably benchmarking this:

fn main() -> Result<(), ()> {
    // ... Some code for loading data, ends up with:
    // let series: ChunkedArray<Float32Type>

    let start = Instant::now();

    // let percentiles = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
    // let mut result = Vec::<f32>::with_capacity(percentiles.len());
    // for p in percentiles {
    //     match series.quantile(p as f64, QuantileInterpolOptions::Lower) {
    //         Ok(v) => result.push(v.unwrap()),
    //         _ => {}
    //     }
    // }

    let duration = start.elapsed();
    println!("Time elapsed is: {:?}", duration);
    Ok(())
}

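Once the code is moved into a function, the Rust optimizer may no longer realize that the call is unnecessary, so it calls the function, does the work, and throws the result away.
You can avoid this by doing something with the result, for example passing it to black_box (std::hint::black_box).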
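A minimal sketch of that idea, assuming a recent stable toolchain where std::hint::black_box is available. The percentile function here is a hypothetical stand-in that works on a plain slice instead of a polars ChunkedArray, so the example compiles without external crates. It is not the polars quantile implementation.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for the polars-based function in the question:
/// returns "lower" quantiles of `data` at 0.0, 0.2, ..., 1.0.
fn percentile(data: &[f32]) -> Vec<f32> {
    if data.is_empty() {
        return Vec::new();
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let quantiles = [0.0_f64, 0.2, 0.4, 0.6, 0.8, 1.0];
    quantiles
        .iter()
        .map(|&q| {
            // Lower interpolation: index floor(q * (n - 1)).
            let idx = (q * (sorted.len() - 1) as f64).floor() as usize;
            sorted[idx]
        })
        .collect()
}

fn main() {
    let data: Vec<f32> = (0..1_000_000).map(|i| i as f32).collect();

    let start = Instant::now();
    // black_box on the input hides its value from the optimizer;
    // black_box on the output marks the result as used, so the
    // computation cannot be discarded as dead code.
    let result = black_box(percentile(black_box(&data)));
    let duration = start.elapsed();

    println!("Time elapsed is: {:?}", duration);
    println!("{:?}", result);
}
```

Wrapping both the inlined version and the function version this way should make the two measurements comparable, since neither can be removed by the optimizer. For anything beyond a quick check, a benchmarking harness is usually a better fit than a single Instant measurement.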
