pandas / seaborn: cumulative sum and hue

eqoofvh9, posted on 2023-03-28 in Other

I have the following dataframe in pandas:

import numpy as np
import pandas as pd

data = {
    'idx': [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10],
    'hue_val': ["A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C","C",],
    'value': np.random.rand(30),
}

df = pd.DataFrame(data)

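Now I want a line plot of the cumulative sum of "value", following "idx", for each "hue_val". The result should be three strictly increasing curves (because the values are positive), one each for "A", "B" and "C".
I found this code in several sources: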

sns.lineplot(x="idx", y="value", hue="hue_val", data=df, estimator="cumsum")

This does not work: both the curves and the x-axis come out wrong.


Answer 1 (bpsygsoo)

You can compute the cumulative sum separately and then plot the result:

df['cumsum'] = df.groupby('hue_val').value.transform('cumsum')
sns.lineplot(x="idx", y="cumsum", hue="hue_val", data=df)
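A minimal sketch to sanity-check this approach, under two assumptions that are not in the original: a seeded random generator (only for reproducibility) and an explicit sort by idx, since a cumulative sum follows row order. It checks that each group's curve never decreases.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: seeded only so the run is reproducible
df = pd.DataFrame({
    'idx': list(range(1, 11)) * 3,
    'hue_val': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
    'value': rng.random(30),
})

# The cumulative sum follows row order, so sort by idx within each group first.
df = df.sort_values(['hue_val', 'idx']).reset_index(drop=True)
df['cumsum'] = df.groupby('hue_val')['value'].transform('cumsum')

# All values are non-negative, so each group's running total should never decrease.
is_monotonic = df.groupby('hue_val')['cumsum'].apply(lambda s: s.is_monotonic_increasing)
print(is_monotonic)
```

Because the column is named cumsum, it has to be read with df['cumsum']; the attribute df.cumsum still refers to the DataFrame method.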


Answer 2 (neekobn8)

Given the OP's dataframe:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {
    'idx': [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10],
    'hue_val': ["A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C","C",],
    'value': np.random.rand(30)
}

df = pd.DataFrame(data)

Two things need to be done:
1. Compute the cumulative sum for each hue_val
2. Plot it

1. Compute the cumulative sum for each hue_val

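To compute the cumulative sum, you can use pandas.DataFrame.groupby together with pandas.Series.cumsum. As the OP requested, a variable named column selects the column to operate on, as follows: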

column = 'value'
df['cum_sum'] = df.groupby('hue_val')[column].cumsum()

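Since NumPy is already used to generate some of the dataframe values, you can also compute the cumulative sum per group with the groupby apply method and numpy.cumsum. Passing group_keys=False keeps the original row index, so the result can still be assigned back as a column in recent pandas versions:

df['cum_sum'] = df.groupby('hue_val', group_keys=False)[column].apply(lambda x: np.cumsum(x))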
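As a rough check that the built-in and NumPy-based variants agree, the sketch below compares them on seeded data (the seed is an assumption for reproducibility). It uses transform rather than apply for the NumPy variant, a swapped-in choice, because transform returns a result aligned to the original row index.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # assumption: seeded only so the run is reproducible
df = pd.DataFrame({
    'idx': list(range(1, 11)) * 3,
    'hue_val': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
    'value': rng.random(30),
})
column = 'value'

# Built-in per-group cumulative sum.
via_cumsum = df.groupby('hue_val')[column].cumsum()

# NumPy-based variant; transform keeps the result aligned with df's index.
via_numpy = df.groupby('hue_val')[column].transform(lambda x: np.cumsum(x))

same = np.allclose(via_cumsum, via_numpy)
print(same)
```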

2. Plot it

You can then plot it with seaborn.lineplot, as follows:

sns.lineplot(data=df, x='idx', y='cum_sum', hue='hue_val')

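Notes:

  • Using a variable to specify the column makes this more convenient when the dataframe has more columns, as in the example below (one of the OP's concerns), because you only need to change the variable to, say, value3: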
data = {
    'idx': [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10],
    'hue_val': ["A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C","C",],
    'value': np.random.rand(30),
    'value1': np.random.rand(30),
    'value2': np.random.rand(30),
    'value3': np.random.rand(30),
}

df = pd.DataFrame(data)

column = 'value3'
df['cum_sum'] = df.groupby('hue_val')[column].cumsum()

sns.lineplot(data=df, x='idx', y='cum_sum', hue='hue_val')
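The same idea can be wrapped in a small helper so that switching columns is a single argument. This is only a sketch: the function name add_group_cumsum and its parameters by and out are hypothetical, and the seed is there purely for reproducibility.

```python
import numpy as np
import pandas as pd

def add_group_cumsum(frame, column, by='hue_val', out='cum_sum'):
    """Return a copy of frame with the per-group cumulative sum of `column` stored in `out`."""
    result = frame.copy()  # leave the caller's dataframe untouched
    result[out] = result.groupby(by)[column].cumsum()
    return result

rng = np.random.default_rng(2)  # assumption: seeded only so the run is reproducible
df = pd.DataFrame({
    'idx': list(range(1, 11)) * 3,
    'hue_val': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
    'value': rng.random(30),
    'value3': rng.random(30),
})

df_value = add_group_cumsum(df, 'value')
df_value3 = add_group_cumsum(df, 'value3')
print(df_value3.head())
```

Either returned dataframe can then be passed to sns.lineplot with x='idx', y='cum_sum' and hue='hue_val', as above.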
