pandas: How do I rename categories after using pandas.cut with an IntervalIndex?

Posted by xqkwcwgp on 2022-12-28 in Other

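I discretized a column in my DataFrame using pandas.cut, with bins created by IntervalIndex.from_tuples.
The cut works as intended, but the categories are shown as the tuples I specified in the IntervalIndex. Is there a way to rename the categories to different labels, e.g. (small, medium, large)?
Example: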

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

The resulting categories are:

[NaN, (0, 1], NaN, (2, 3], (4, 5]]
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

I am trying to change [(0, 1] < (2, 3] < (4, 5]] into something like 1, 2, 3 or small, medium, large.
Unfortunately, the labels argument of pd.cut is ignored when an IntervalIndex is used.
Thanks!

Update:

Thanks to @SergeyBushmanov, I noticed that this problem only occurs when trying to change the category labels inside a DataFrame (which is what I am trying to do).

In [1]: df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
In [2]: bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
In [3]: df['col1'] = pd.cut(df['col1'], bins)
In [4]: df['col1'].categories = ['small','med','large']

In [5]: df['col1']

Out [5]:
0       NaN
1    (0, 1]
2       NaN
3    (2, 3]
4    (4, 5]
Name: col1, dtype: category
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

Answer #1 (093gszye)

Suppose we have some data:

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

You can try reassigning the categories, for example:

In [7]: x.categories = [1,2,3]

In [8]: x   
Out[8]: 
[NaN, 1, NaN, 2, 3]
Categories (3, int64): [1 < 2 < 3]

Or:

In [9]: x.categories = ["small", "medium", "big"]                         

In [10]: x                                             
Out[10]: 
[NaN, small, NaN, medium, big]
Categories (3, object): [small < medium < big]
Update:
df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut(df["col1"].to_list(),bins)
x.categories = [1,2,3]
df['col1'] = x
df.col1
0    NaN
1      1
2    NaN
3      2
4      3
Name: col1, dtype: category
Categories (3, int64): [1 < 2 < 3]
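Update 2:

In newer versions of pandas, do not reassign the categories with x.categories = [1, 2, 3]; use rename_categories instead. It returns a new object rather than modifying x in place (the inplace argument was removed in pandas 2.0). Because x here is a Categorical rather than a Series, the method is called on x directly, not through the .cat accessor:

labels = [1, 2, 3]
x = x.rename_categories(labels)

labels can be of any type, and in every case the original category order set when the pd.IntervalIndex was created is preserved.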
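
For the DataFrame case from the question, here is a minimal sketch of the same rename applied to a column. It assumes a recent pandas release in which rename_categories returns a new object. The column name col1 and the bins come from the question; the labels are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0.5, 1.5, 2.5, 4.5]})
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# Bin the column; values that fall outside every interval become NaN.
binned = pd.cut(df["col1"], bins)

# On a Series the rename goes through the .cat accessor. It returns a new
# Series, so the result has to be assigned back to the column.
df["col1"] = binned.cat.rename_categories(["small", "medium", "large"])

print(df["col1"])
```

Assigning the result back to df["col1"] is the step that makes the new labels stick. Setting an attribute on df["col1"], as in the question, does not touch the underlying categorical.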


Answer #2 (mctunoxg)

series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])

bins = [(0, 1), (2, 3), (4, 5)]
index = pd.IntervalIndex.from_tuples(bins)
intervals = index.values
names = ['small', 'med', 'large']
to_name = {interval: name for interval, name in zip(intervals, names)}

named_series = pd.Series(
    pd.CategoricalIndex(pd.cut(series, index)).rename_categories(to_name)
)
print(named_series)

0      NaN
1    small
2      NaN
3      med
4    large
dtype: category
Categories (3, object): ['small' < 'med' < 'large']
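
The same interval-to-name mapping can probably be written without the round trip through pd.CategoricalIndex, by calling rename_categories through the Series .cat accessor. This is a sketch under the assumption that a dict keyed by Interval objects is accepted there, as it is in the answer above. The variable names follow that answer.

```python
import pandas as pd

series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])
index = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
names = ["small", "med", "large"]

# Iterating an IntervalIndex yields Interval objects, which are hashable,
# so they can serve as dict keys pairing each bin with its label.
to_name = dict(zip(index, names))

# pd.cut on a Series returns a categorical Series; rename its categories
# with the dict and keep the returned Series.
named_series = pd.cut(series, index).cat.rename_categories(to_name)
print(named_series)
```

A dict makes the pairing between each bin and its label explicit, which helps when the bins are not listed in label order.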
