pandas: subtract the 'LMP' value from the 'INT' column, using only the 'time0' index row of each unique 'ID'

yyyllmsg  posted on 2023-05-27  in Other

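I would like to create a new column named 'sub' in a DataFrame. Its values should be the 'INT' column minus the 'LMP' value, where the 'LMP' value is taken only from the latest row of each unique 'ID' whose 'FM' column is set to 'time0'. I compute FM as shown below, but I don't know how to implement the 'sub' column.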

import numpy as np
import pandas as pd

data = {
    'ID': [0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    'VIS': [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    'STA': [float('NaN'), 4.0, 7.0, 7.0, 7.0, 7.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    'LMP': [float('NaN'), -35.0, 411.0, 773.0, 1143.0, 1506.0, float('NaN'), float('NaN'), float('NaN'), float('NaN'), float('NaN'), float('NaN')],
    'INT': [0.0, 0.0, 413.0, 777.0, 1171.0, 1509.0, 1967.0, 2310.0, 2627.0, 2970.0, 3357.0, 3768.0],
    'FM': [-1, -1, "time0", -1, -1, "time0", -1, -1, -1, -1, -1, -1]
}

sorted_data = pd.DataFrame(data)

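# Recompute FM: mark the last row with a non-null LMP in each ID as 'time0'.
# Start from an object column so that assigning the string 'time0' does not
# conflict with a float dtype.
sorted_data['FM'] = pd.Series(np.nan, index=sorted_data.index, dtype=object)
for id_ in sorted_data['ID'].unique():
    filter_condition = (sorted_data['ID'] == id_) & sorted_data['LMP'].notna()
    if filter_condition.any():
        last_row_index = sorted_data.loc[filter_condition].index[-1]
        sorted_data.loc[last_row_index, 'FM'] = 'time0'

# Every other row gets -1
sorted_data['FM'] = sorted_data['FM'].fillna(-1)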
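
The loop above can probably be replaced by a vectorized version. The following is only a minimal sketch on a shortened copy of the data (the name `df` and the reduced row count are assumptions made for illustration). It drops the rows with a missing LMP, takes the last remaining row of each ID, and marks those index labels as 'time0':

```python
import numpy as np
import pandas as pd

# Shortened, illustrative copy of the question's data
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 2, 2],
    "LMP": [np.nan, -35.0, 411.0, 773.0, 1506.0, np.nan, np.nan],
})

# Index labels of the last non-null LMP row in each ID group.
# IDs whose LMP values are all NaN do not appear, because dropna removes their rows.
last_valid = df.dropna(subset=["LMP"]).groupby("ID").tail(1).index

# Object column so that it can hold both -1 and the string 'time0'
df["FM"] = pd.Series(-1, index=df.index, dtype=object)
df.loc[last_valid, "FM"] = "time0"
```

Here ID 0 gets no 'time0' row at all, which matches what the loop does when `filter_condition.any()` is false.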

The expected output should be computed as follows:

'sub': [float('NaN'), 0 - 411.0, 413 - 411, 777 - 1509.0, 1171.0 - 1509.0, 1509 - 1509, 1967.0 - 1509, 2310.0 - 1509, 2627.0 - 1509, 2970.0 - 1509, 3357.0 - 1509, 3768.0 - 1509]

frebpwbc  #1

Here is sample code that subtracts the LMP value at 'time0' from the INT column:

import pandas as pd

data = {
    "ID": [0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "VIS": [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "STA": [float("NaN"), 4.0, 7.0, 7.0, 7.0, 7.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "LMP": [
        float("NaN"),
        -35.0,
        411.0,
        773.0,
        1143.0,
        1506.0,
        float("NaN"),
        float("NaN"),
        float("NaN"),
        float("NaN"),
        float("NaN"),
        float("NaN"),
    ],
    "INT": [
        0.0,
        0.0,
        413.0,
        777.0,
        1171.0,
        1509.0,
        1967.0,
        2310.0,
        2627.0,
        2970.0,
        3357.0,
        3768.0,
    ],
    "FM": [-1, -1, "time0", -1, -1, "time0", -1, -1, -1, -1, -1, -1],
}

sorted_data = pd.DataFrame(data)

# LMP value of the 'time0' row for each ID (IDs without a 'time0' row drop out)
lmp_at_time0 = (
    sorted_data.groupby(["ID"])
    .apply(lambda grp: grp[grp["FM"] == "time0"]["LMP"])
    .reset_index()
    .drop(columns=["level_1"])
)
lmp_at_time0.columns = ["ID", "LMP_at_time0"]
# Attach the per-ID time0 LMP to every row, then subtract it from INT
sorted_data = sorted_data.merge(lmp_at_time0, on="ID", how="left")
sorted_data["sub"] = sorted_data["INT"] - sorted_data["LMP_at_time0"]
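
The same merge idea might be written without `groupby(...).apply`, by filtering the 'time0' rows with a boolean mask. This is a sketch on a shortened copy of the data (the name `df` is illustrative), and it assumes there is at most one 'time0' row per ID. With several such rows per ID the merge would duplicate rows.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative copy of the data
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 2],
    "LMP": [np.nan, -35.0, 411.0, 773.0, 1506.0, np.nan],
    "INT": [0.0, 0.0, 413.0, 777.0, 1509.0, 1967.0],
    "FM": [-1, -1, "time0", -1, "time0", -1],
})

# One row per ID: the LMP value where FM == 'time0'
lmp_at_time0 = (
    df.loc[df["FM"] == "time0", ["ID", "LMP"]]
    .rename(columns={"LMP": "LMP_at_time0"})
)

# A left merge keeps every original row; IDs without a 'time0' row get NaN
df = df.merge(lmp_at_time0, on="ID", how="left")
df["sub"] = df["INT"] - df["LMP_at_time0"]
```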

zdwk9cvp  #2

IIUC, you can use:

# get the last time0 value per ID
mapper = (sorted_data
          .loc[sorted_data['FM'].eq('time0')]
          .drop_duplicates(subset='ID', keep='last')
          .set_index('ID')['LMP']
         )

# map and subtract
sorted_data['sub'] = sorted_data['INT'].sub(sorted_data['ID'].map(mapper))

Output:

    ID   VIS  STA     LMP     INT     FM     sub
0    0   0.0  NaN     NaN     0.0     -1     NaN
1    1   0.0  4.0   -35.0     0.0     -1  -411.0
2    1   1.0  7.0   411.0   413.0  time0     2.0
3    2   2.0  7.0   773.0   777.0     -1  -729.0
4    2   3.0  7.0  1143.0  1171.0     -1  -335.0
5    2   4.0  7.0  1506.0  1509.0  time0     3.0
6    2   5.0  2.0     NaN  1967.0     -1   461.0
7    2   6.0  2.0     NaN  2310.0     -1   804.0
8    2   7.0  2.0     NaN  2627.0     -1  1121.0
9    2   8.0  2.0     NaN  2970.0     -1  1464.0
10   2   9.0  2.0     NaN  3357.0     -1  1851.0
11   2  10.0  2.0     NaN  3768.0     -1  2262.0
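
As a cross-check of the table above, a possible variant without the intermediate mapper keeps LMP only on the 'time0' rows and broadcasts the last such value per ID with `groupby(...).transform("last")`. This is a sketch, not the answer's own method, and the names `df` and `time0_lmp` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "LMP": [np.nan, -35.0, 411.0, 773.0, 1143.0, 1506.0] + [np.nan] * 6,
    "INT": [0.0, 0.0, 413.0, 777.0, 1171.0, 1509.0,
            1967.0, 2310.0, 2627.0, 2970.0, 3357.0, 3768.0],
    "FM": [-1, -1, "time0", -1, -1, "time0", -1, -1, -1, -1, -1, -1],
})

# Keep LMP only on 'time0' rows; every other row becomes NaN
time0_lmp = df["LMP"].where(df["FM"].eq("time0"))

# "last" skips NaN, so each row receives the last time0 LMP of its ID
# (NaN for IDs that have no 'time0' row)
df["sub"] = df["INT"] - time0_lmp.groupby(df["ID"]).transform("last")
```

On this data the resulting `sub` column should agree with the output shown above.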
