pandas 如何根据参考标志的位置计算相对值

eoxn13cs  于 2023-03-16  发布在  其他
关注(0)|答案(2)|浏览(139)

我有一个简单的表,其中IsReference列定义了指示Pmpp基值的标志,应根据该标志计算按Item分组的所有其他Pmpp值的相对值,如下图所示。类似地,我可以根据参考日期计算日期之间的差异。等。我希望得到一些提示,如何在Python中做到这一点。下面是一个代码,我开始。最好的问候

import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"], 
 'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
 'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
 'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]
}

df = pd.DataFrame(data = d)

# find location of reference
ref = df['Pmpp'][df['IsReference'] == 1].values

# calculate relative values
df['Pmpp_norm'] = df.groupby('Item')['Pmpp'].apply(lambda x: x/ref)
db2dz4w8

db2dz4w81#

必须将引用值广播到所有行:

ref = df['Pmpp'].where(df['IsReference'] == 1).groupby(df['Item']).transform('max')

df['Pmpp_norm'] = df['Pmpp'] / ref

输出:

>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50

>>> ref

更新

您还可以使用Mapdict:

ref = df[df['IsReference'] == 1].set_index('Item')['Pmpp']

df['Pmpp_norm'] = df['Pmpp'] / df['Item'].map(ref)

输出:

>>> df
   Item  Pmpp  IsReference   TimeStamp  Pmpp_norm
0  dmc1     3            0  22.02.2023       0.75
1  dmc1     4            1  25.02.2023       1.00
2  dmc1     3            0  28.02.2023       0.75
3  dmc1     1            0   3.03.2023       0.25
4  dmc2     2            1  24.02.2023       1.00
5  dmc2     4            0  25.02.2023       2.00
6  dmc2     3            0   2.03.2023       1.50
7  dmc2     1            0   5.03.2023       0.50

>>> ref
Item
dmc1    4
dmc2    2
Name: Pmpp, dtype: int64
rpppsulh

rpppsulh2#

我觉得这应该行得通:

import pandas as pd

d = {'Item': ["dmc1", "dmc1", "dmc1", "dmc1", "dmc2", "dmc2", "dmc2", "dmc2"], 
 'Pmpp': [3, 4, 3, 1, 2, 4, 3, 1],
 'IsReference': [0, 1, 0, 0, 1, 0, 0, 0],
 'TimeStamp': ["22.02.2023", "25.02.2023", "28.02.2023", "3.03.2023", "24.02.2023", "25.02.2023", "2.03.2023", "5.03.2023"]
}

df = pd.DataFrame(data=d)

# Convert TimeStamp column to datetime format
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%d.%m.%Y')

# Calculate the reference Pmpp for each Item
reference_pmpp = df.loc[df['IsReference'] == 1, ['Item', 'Pmpp']].set_index('Item')['Pmpp']

# Calculate relative Pmpp values
df['Pmpp_norm'] = df.apply(lambda x: x['Pmpp'] / reference_pmpp.loc[x['Item']], axis=1)

# Calculate the reference TimeStamp for each Item
reference_timestamp = df.loc[df['IsReference'] == 1, ['Item', 'TimeStamp']].set_index('Item')['TimeStamp']

# Calculate the difference between dates based on reference date
df['Date_diff'] = df.apply(lambda x: (x['TimeStamp'] - reference_timestamp.loc[x['Item']]).days, axis=1)

print(df)

相关问题