pandas 如何计算两列之间的差异并根据条件进行标记?

oipij1gg  于 2023-03-16  发布在  其他
关注(0)|答案(3)|浏览(199)

我有 Dataframe

Group  Required  stock
0     A        10      5
1     A        10      8
2     A        10      7
3     B        13      6
4     B        13      5
5     C         8      4
6     C         8      5
7     C         8      8
8     D        16    NaN

这里A、B、C、D所需的是[10,13,8,16],我各自的库存在上表中提到。我需要标记所有需要移动的行以及需要移动的数量
输出应为

Group  Required  stock  to_move flag
0     A        10    5.0      5.0  yes
1     A        10    8.0      5.0  yes
2     A        10    7.0      0.0   no
3     B        13    6.0      6.0  yes
4     B        13    5.0      5.0  yes
5     C         8    4.0      4.0  yes
6     C         8    5.0      4.0  yes
7     C         8    8.0      0.0   no
8     D        16    NaN      NaN   no
b0zn9rqh

b0zn9rqh1#

您可以使用groupby.cumsumclip来计算要移动而不会溢出的累积值,然后使用groupby.diff来反算单个值:

# compute the cumsum per group
# clip it to not go over the required value
s = df.groupby('Group')['stock'].cumsum().clip(upper=df['Required'].values)

# back calculate the incremental values
df['to_move'] = s.groupby(df['Group']).diff().fillna(s)

# assign the flag if a strictly positive value was moved
df['flag'] = np.where(df['to_move'].gt(0), 'yes', 'no')

输出:

Group  Required  stock  to_move flag
0     A        10    5.0      5.0  yes
1     A        10    8.0      5.0  yes
2     A        10    7.0      0.0   no
3     B        13    6.0      6.0  yes
4     B        13    5.0      5.0  yes
5     C         8    4.0      4.0  yes
6     C         8    5.0      4.0  yes
7     C         8    8.0      0.0   no
8     D        16    NaN      NaN   no
mnemlml8

mnemlml82#

用途:

#create cumulative sum per groups
s = df.groupby('Group')['stock'].cumsum()
#get difference with Required
diff = df['Required'].rsub(s)
#comapre if difference is less or equal like Stock
m = diff.le(df['stock'])

#subtract stock if diffrence less 0
df['to_move'] = df['stock'].sub(diff.where(diff.gt(0), 0)).where(m, 0)
#create Flag column
df['Flag'] = np.where(m, 'Yes', 'No')

print (df)
  Group  Required  stock  to_move Flag
0     A        10    5.0      5.0  Yes
1     A        10    8.0      5.0  Yes
2     A        10    7.0      0.0   No
3     B        13    6.0      6.0  Yes
4     B        13    5.0      5.0  Yes
5     C         8    4.0      4.0  Yes
6     C         8    5.0      4.0  Yes
7     C         8    8.0      0.0   No
8     D        16    NaN      0.0   No
wribegjk

wribegjk3#

您可以在Pandas中指定新的列:

>>> df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D']})
>>> df
  Group
0     A
1     A
2     A
3     B
4     B
5     C
6     C
7     C
8     D
>>> df['to_move'] = ['Yes']*2+['No']+['Yes']*4+['No']*2
>>> df
  Group to_move
0     A     Yes
1     A     Yes
2     A      No
3     B     Yes
4     B     Yes
5     C     Yes
6     C     Yes
7     C      No
8     D      No

相关问题