Pandas rolling window selection and calculation based on a condition

Posted by nnsrf1az on 2022-11-27 in Other

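How can I compute a rolling window mean based on a condition? For each index, the window should contain every row whose coordinate differs from that row's coordinate by less than 400, and I need the mean of the coordinates in that window.
I need to add the result as a new column.
For example: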

at Index 
cg13869341 = mean(cg13869341, cg14008030)
cg14008030 = mean(cg13869341, cg14008030) 
cg14008031 = mean(cg13869341)  
...
cg14008033 = mean(cg14008031,cg40826798, cg14008034, cg40826792)
....        
cg40826792 = mean(cg60826792, cg47454306, cg14008034, cg14008033, cg40826792)

Sample dataset

Index       coordinate   rolling_mean
cg13869341  100         
cg14008030  200         
cg14008031  800         
cg40826798  900         
cg14008033  1000        
cg14008034  1050            
cg40826792  1250            
cg47454306  1500

rt4zxlrg #1

Using the DataFrame you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "index": [
            "cg13869341",
            "cg14008030",
            "cg14008031",
            "cg40826798",
            "cg14008033",
            "cg14008034",
            "cg40826792",
            "cg47454306",
        ],
        "coordinate": [100, 200, 800, 900, 1000, 1050, 1250, 1500],
    }
)

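Here is one way to do it with Pandas apply. For each row, it selects every row whose coordinate lies within 400 of the current one and takes the mean of those coordinates. Note that the bounds below are inclusive (a difference of exactly 400 counts); no two rows in the sample data are exactly 400 apart, so a strict comparison would give the same result here: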

df["rolling_mean"] = df.apply(
    lambda x: df.loc[
        (df["coordinate"] >= x["coordinate"] - 400)
        & (df["coordinate"] <= x["coordinate"] + 400),
        "coordinate",
    ].mean(),
    axis=1,
)

This gives:

print(df)
# Output
        index  coordinate  rolling_mean
0  cg13869341         100         150.0
1  cg14008030         200         150.0
2  cg14008031         800         937.5
3  cg40826798         900        1000.0
4  cg14008033        1000        1000.0
5  cg14008034        1050        1000.0
6  cg40826792        1250        1140.0
7  cg47454306        1500        1375.0
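
The apply approach above compares every row against every other row, which can get slow on large frames. As a rough sketch of an alternative (not part of the original answer), the same inclusive window can be computed with NumPy's `searchsorted` and prefix sums. The helper name `window_mean`, its `radius` parameter, and the column name `rolling_mean_fast` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "index": [
            "cg13869341", "cg14008030", "cg14008031", "cg40826798",
            "cg14008033", "cg14008034", "cg40826792", "cg47454306",
        ],
        "coordinate": [100, 200, 800, 900, 1000, 1050, 1250, 1500],
    }
)


def window_mean(coords, radius=400):
    """Hypothetical helper: mean of all values within `radius` (inclusive) of each value."""
    values = np.asarray(coords, dtype=float)
    # Sort once so that every window is a contiguous slice.
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    # Prefix sums give the sum of any contiguous slice in constant time.
    prefix = np.concatenate(([0.0], np.cumsum(sorted_vals)))
    # First position with value >= v - radius, and one past the last with value <= v + radius.
    lo = np.searchsorted(sorted_vals, sorted_vals - radius, side="left")
    hi = np.searchsorted(sorted_vals, sorted_vals + radius, side="right")
    means_sorted = (prefix[hi] - prefix[lo]) / (hi - lo)
    # Undo the sort so the result lines up with the input order.
    out = np.empty_like(means_sorted)
    out[order] = means_sorted
    return out


df["rolling_mean_fast"] = window_mean(df["coordinate"])
print(df)
```

Each window always contains the row itself, so the division never hits an empty window. For a strict "less than 400" window, one option would be to swap the two `side` arguments (`"right"` for the lower bound, `"left"` for the upper bound).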
