Line plot of time series for grouped data in pandas

zrfyljdw posted on 2023-08-01 in Other

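I have monthly data for customers that belong to different categories. I want to show a time series line plot grouped by customer category.
Here is a snapshot with 4 consecutive months of data ("volume") for 3 customers belonging to 2 categories: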

import pandas as pd

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

How can I draw a line plot (time periods on the x-axis) that shows the average monthly volume for each of these two groups?

Answer #1 (yrefmtwq)

How about using groupby.mean and seaborn.lineplot:

import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates

# optional: to ensure having a categorical palette
df['category'] = df['category'].astype('category')

tmp = (df.groupby(['category', pd.to_datetime(df['month'], format='%Y%m')])
       ['volume'].mean().reset_index(name='average volume')
      )

ax = sns.lineplot(data=tmp, x='month', y='average volume', hue='category')

# change labels
ax.tick_params(axis='x', rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

Output: [figure: line plot of average volume by month, one line per category]

To change the figure width:

import matplotlib.pyplot as plt

# ...

fig, ax = plt.subplots(figsize=(15, 8))
sns.lineplot(data=tmp, x='month', y='average volume', hue='category', ax=ax)

# ...


Output: [figure: the same line plot drawn on the wider 15 x 8 figure]
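
The numbers behind the plot can be checked without drawing anything. The sketch below rebuilds the question's sample data and repeats the aggregation step from this answer. The `.astype(str)` call is an extra defensive step that is not in the answer; it is only there so the `%Y%m` format is applied to strings.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'cust_id': [1] * 4 + [2] * 4 + [3] * 4,
    'month': [200101, 200102, 200103, 200104] * 3,
    'volume': list(range(1, 13)),
    'category': [1] * 4 + [2] * 8,
})

# Parse the YYYYMM integers via strings (assumption: defensive, not in the answer)
month = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# Mean volume per category and month, as a long table ready for seaborn
tmp = (df.groupby(['category', month])['volume']
         .mean()
         .reset_index(name='average volume'))

print(tmp)
```

Category 1 contains a single customer, so its averages equal that customer's volumes; category 2 averages customers 2 and 3.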


Answer #2 (cyvaqqii)

pivot_table gives good control over the shape of the aggregated result in very few words.
The time series you asked for:

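import pandas as pd

# Passing the string 'mean' avoids the FutureWarning that recent pandas
# versions raise when a NumPy callable such as np.mean is passed as aggfunc
TS = pd.pivot_table(data    = df,
                    values  = ['volume'],
                    index   = ['month'],
                    columns = ['category'],
                    aggfunc = 'mean')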

Output:

         volume
category      1   2
month              
200101        1   7
200102        2   8
200103        3   9
200104        4  10
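
As a small illustration of that control over shape: passing `values` as a list keeps `volume` as an extra column level, while passing it as a plain string yields flat columns, which tends to give a cleaner legend when plotting. This is a minimal sketch on the question's data; the names `nested` and `flat` are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'cust_id': [1] * 4 + [2] * 4 + [3] * 4,
    'month': [200101, 200102, 200103, 200104] * 3,
    'volume': list(range(1, 13)),
    'category': [1] * 4 + [2] * 8,
})

# values as a list: columns become a two-level MultiIndex ('volume', category)
nested = pd.pivot_table(df, values=['volume'], index='month',
                        columns='category', aggfunc='mean')

# values as a string: columns are just the category labels
flat = pd.pivot_table(df, values='volume', index='month',
                      columns='category', aggfunc='mean')

print(nested.columns.nlevels)  # 2
print(flat.columns.nlevels)    # 1
print(flat)
```

Both tables hold the same averages; only the column labelling differs.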


In fact, as the other answerer suggested, it is always preferable to convert to datetime beforehand, as Mozway's answer also does:

df['month'] = pd.to_datetime(df['month'], format='%Y%m')
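
One reason the conversion matters, sketched with hypothetical months that are not in the question's data: YYYYMM integers are not evenly spaced across a year boundary, so a numeric axis would stretch the step from December to January.

```python
import pandas as pd

# Hypothetical months spanning a year boundary (not part of the sample data)
months = pd.Series([200111, 200112, 200201])

# Treated as plain integers, the December-to-January step looks 89 units wide
int_gaps = months.diff().dropna().astype(int).tolist()

# Parsed as dates, consecutive months are about one month apart
dates = pd.to_datetime(months.astype(str), format='%Y%m')
day_gaps = dates.diff().dropna().dt.days.tolist()

print(int_gaps)  # [1, 89]
print(day_gaps)  # [30, 31]
```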


The plot comes out as expected; add whatever embellishments you like.

TS.plot(figsize=(24,8), 
        ylabel = 'Monthly average volume')

Answer #3 (beq87vna)

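This code creates a line plot with the month on the x-axis and the volume on the y-axis, with one line per customer category. The legend shows the customer category associated with each line.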

import pandas as pd
import matplotlib.pyplot as plt

# Your DataFrame
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Convert the 'month' column (YYYYMM integers) to datetime for proper sorting and spacing
df['month'] = pd.to_datetime(df['month'], format='%Y%m')

# Average the volume per month and category, then pivot so that each category is a column
grouped_data = df.groupby(['month', 'category'])['volume'].mean().unstack('category')

# Plotting the data
plt.figure(figsize=(10, 6))
for category in grouped_data.columns:
    plt.plot(grouped_data.index, grouped_data[category], label=f'Category {category}')

plt.xlabel('Month')
plt.ylabel('Volume')
plt.title('Time Series of Volume Grouped by Customer Category')
plt.legend()
plt.show()
