pandas: cut with infinite upper/lower bounds

xesrikrc posted on 2022-11-20 in Other

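The pandas cut() documentation states: "Out of bounds values will be NA in the resulting Categorical object." This makes things difficult when the upper bound is not necessarily clear or important. For example: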

cut (weight, bins=[10,50,100,200])

will produce the bins:

[(10, 50] < (50, 100] < (100, 200]]

So cut (250, bins=[10,50,100,200]) will produce NaN, as will cut (5, bins=[10,50,100,200]). What I want is to produce something like > 200 for the first example and < 10 for the second.
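The out-of-bounds behavior described above can be reproduced with a minimal sketch. The sample values are hypothetical, and a Series is used because pd.cut expects one-dimensional array-like input rather than a bare scalar.

```python
import pandas as pd

# Hypothetical sample weights: one below, one inside, one above the bin edges
weights = pd.Series([5, 30, 250])
binned = pd.cut(weights, bins=[10, 50, 100, 200])

# 5 and 250 fall outside (10, 200], so pandas marks them as missing
print(binned.isna().tolist())
```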
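I realize I could use cut (weight, bins=[float("-inf"),10,50,100,200,float("inf")]) or the equivalent, but the report style I am following doesn't allow labels like (200, inf]. I also realize I could specify custom labels via the labels parameter of cut(), but that means remembering to adjust them every time I adjust the bins, which could be often.
Have I exhausted all the possibilities, or is there something in cut() or elsewhere in pandas that would help me do this? I'm thinking about writing a wrapper function for cut() that automatically generates labels in the desired format from the bins, but I wanted to check here first.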

5q4ezhmt #1

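You can use float("inf") as the upper bound and -float("inf") as the lower bound in the list of bin edges. That way no value falls outside the bins, so no NaN values are produced.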
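As a rough illustration of this suggestion (the sample values are made up), padding the edges with infinities means every finite value lands in some bin. The outermost bins then get pandas' default interval labels, which contain inf; that is the format the asker's report style rules out.

```python
import pandas as pd

# Hypothetical sample weights, including values outside the original edges
weights = pd.Series([5, 30, 250])
edges = [float("-inf"), 10, 50, 100, 200, float("inf")]
binned = pd.cut(weights, bins=edges)

# No value is out of bounds any more
print(binned.isna().any())
# Category positions of each value: first, second, and last bin
print(binned.cat.codes.tolist())
```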

bttbmeg0 #2

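After waiting a few days there were still no answers. I think that's probably because there is really no way around this other than writing a wrapper function for cut(). I'm posting my version here and marking the question as answered. I'll change that if new answers come along.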

import pandas as pd


def my_cut(x, bins,
           lower_infinite=True, upper_infinite=True,
           **kwargs):
    r"""Wrapper around pandas cut() to create infinite lower/upper bounds with proper labeling.

    Takes all the same arguments as pandas cut(), plus two more.

    Args :
        lower_infinite (bool, optional) : set whether the lower bound is infinite
            Default is True. If true, and your first bin element is something like 20, the
            first bin label will be '<= 20' (depending on other cut() parameters)
        upper_infinite (bool, optional) : set whether the upper bound is infinite
            Default is True. If true, and your last bin element is something like 20, the
            last bin label will be '> 20' (depending on other cut() parameters)
        **kwargs : any standard pandas cut() labeled parameters

    Returns :
        out : same as pandas cut() return value
        bins : same as pandas cut() return value; only returned when retbins=True is passed
    """

    # Quick passthru if no infinite bounds
    if not lower_infinite and not upper_infinite:
        return pd.cut(x, bins, **kwargs)

    # Setup
    num_labels      = len(bins) - 1
    include_lowest  = kwargs.get("include_lowest", False)
    right           = kwargs.get("right", True)

    # Prepend/append infinities where indicated
    bins_final = list(bins)  # copy; also accepts tuples and NumPy arrays
    if upper_infinite:
        bins_final.append(float("inf"))
        num_labels += 1
    if lower_infinite:
        bins_final.insert(0, float("-inf"))
        num_labels += 1

    # Decide all boundary symbols based on traditional cut() parameters
    # With right=True the lowest bin is (-inf, b], so it contains b itself
    symbol_lower  = "<=" if right else "<"
    left_bracket  = "(" if right else "["
    right_bracket = "]" if right else ")"
    symbol_upper  = ">" if right else ">="

    # Inner function reused in multiple clauses for labeling
    def make_label(i, lb=left_bracket, rb=right_bracket):
        return "{0}{1}, {2}{3}".format(lb, bins_final[i], bins_final[i+1], rb)

    # Create custom labels
    labels=[]
    for i in range(0,num_labels):
        new_label = None

        if i == 0:
            if lower_infinite:
                new_label = "{0} {1}".format(symbol_lower, bins_final[i+1])
            elif include_lowest:
                new_label = make_label(i, lb="[")
            else:
                new_label = make_label(i)
        elif upper_infinite and i == (num_labels - 1):
            new_label = "{0} {1}".format(symbol_upper, bins_final[i])
        else:
            new_label = make_label(i)

        labels.append(new_label)

    # Pass thru to pandas cut()
    return pd.cut(x, bins_final, labels=labels, **kwargs)
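
The label-generation idea behind the wrapper can also be sketched more compactly. This is an illustrative alternative, not the answer's own code: open_ended_labels is a hypothetical helper name, it assumes both ends are open-ended, and it ignores include_lowest.

```python
import pandas as pd


def open_ended_labels(bins, right=True):
    """Hypothetical helper: labels for `bins` padded with -inf and +inf."""
    # With right=True the intervals are (a, b], so the lowest bin includes
    # its upper edge and the highest bin excludes its lower edge.
    lower = "<=" if right else "<"
    upper = ">" if right else ">="
    lb, rb = ("(", "]") if right else ("[", ")")
    inner = [f"{lb}{a}, {b}{rb}" for a, b in zip(bins[:-1], bins[1:])]
    return [f"{lower} {bins[0]}"] + inner + [f"{upper} {bins[-1]}"]


bins = [10, 50, 100, 200]
edges = [float("-inf"), *bins, float("inf")]
labels = open_ended_labels(bins)

# Hypothetical sample weights below, inside, and above the original edges
binned = pd.cut(pd.Series([5, 30, 250]), bins=edges, labels=labels)
print(binned.tolist())  # ['<= 10', '(10, 50]', '> 200']
```

Because the labels are derived from the same list as the edges, changing bins does not require editing the labels by hand, which was the asker's main concern.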

gstyhher #3

Just add np.inf, for example:

import pandas as pd
import numpy as np

pd.cut(df['weight'], [0, 50, 100, np.inf], labels=['0-50', '50-100', '100-'])
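
A runnable variant of the snippet above, using a made-up DataFrame, is sketched below. One thing it illustrates: with the default right=True the first bin is (0, 50], so a weight of exactly 0 still comes back as NaN unless include_lowest=True is passed (or -np.inf is used as the first edge).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column name follows the answer's snippet
df = pd.DataFrame({"weight": [0, 25, 75, 300]})
edges = [0, 50, 100, np.inf]
labels = ["0-50", "50-100", "100-"]

# Default: the lowest edge itself is excluded from the first bin
default = pd.cut(df["weight"], edges, labels=labels)
# include_lowest=True closes the first bin on the left
closed = pd.cut(df["weight"], edges, labels=labels, include_lowest=True)

print(default.isna().tolist())
print(closed.tolist())
```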
