scipy: How to write a while loop function in Python for winsorizing

Posted by vmdwslir 11 months ago in Python


I have the following function:

import numpy as np  # needed for np.percentile below
import pandas as pd
from scipy.stats.mstats import winsorize

# winsorize function
def winsor_try1(var, lower, upper):
    var = winsorize(var,limits=[lower,upper])
    ''' 
    Outliers Calculation using IQR 
    ''' 
    q1, q3= np.percentile(var, [25, 75])                 # q1,q3 calc
    iqr = q3 - q1                                        # iqr calc
    lower_bound = round(q1 - (1.5 * iqr),3)              # lower bound
    upper_bound = round(q3 + (1.5 * iqr),3)              # upper bound
    outliers = [x for x in var if x < lower_bound or x > upper_bound]  
    print('These would be the outliers:', set(outliers),'\n',
          'Total:', len(outliers),'.Upper bound & Lower bound:', lower_bound,'&',upper_bound)

# the variable 
df = pd.DataFrame({
    'age': [1,1,2,5,5,2,5,4,8,2,5,1,41,2,1,4,4,1,1,4,1,2,15,21,5,1,8,22,1,5,2,5,256,5,6,2,2,8,452]})

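I want to write a while-loop function that applies winsor_try1 to the variable df['age'], starting at lower = .01 and upper = .01 and continuing until len(outliers) == 0.
My reasoning: as long as len(outliers) > 0, I want to repeat the function until I find the limit at which the age distribution has no outliers left.
The desired output would look like this: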

print('At limit =', i, 'there is no more outliers presented in the age variable.')

Here i is the limit at which len(outliers) == 0.


Source: https://stackoverflow.com/questions/77616113/how-to-write-a-while-loop-function-in-python-for-winsorizing

1 Answer

brvekthn1#

Rather than writing the while loop yourself, you can treat this as a scalar root-finding problem and use scipy.optimize.root_scalar.

import numpy as np
from scipy.stats.mstats import winsorize
from scipy.optimize import root_scalar 

# winsorize function
def winsor_try1(var, lower, upper):
    ''' 
    Compute the number of IQR outliers
    ''' 
    var = winsorize(var,limits=[lower,upper])
    q1, q3= np.percentile(var, [25, 75])                 # q1,q3 calc
    iqr = q3 - q1                                        # iqr calc
    lower_bound = round(q1 - (1.5 * iqr),3)              # lower bound
    upper_bound = round(q3 + (1.5 * iqr),3)              # upper bound
    outliers = [x for x in var if x < lower_bound or x > upper_bound]  
    return len(outliers)

# the variable 
var = np.asarray([1,1,2,5,5,2,5,4,8,2,5,1,41,2,1,4,4,1,1,4,1,2,15,21,5,1,8,22,1,5,2,5,256,5,6,2,2,8,452])

def fun(i):
  # try to find `i` at which there is half an outlier
  # it doesn't exist, but this should get closer to the transition
  return winsor_try1(var, i, i) - 0.5

# root_scalar tries to find the argument `i` that makes `fun` return zero
res = root_scalar(fun, bracket=(0, 0.5))

eps = 1e-6
print(winsor_try1(var, res.root + eps, res.root + eps))  # 0
print(winsor_try1(var, res.root - eps, res.root - eps))  # 6
res.root  # 0.15384615384656308

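There may be better ways to solve this problem, but I tried to answer in a way that resembles writing a while loop. If you want to know how such a while loop works, there are plenty of references on the bisection method and other scalar root-finding algorithms.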
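For comparison, here is a minimal sketch of the explicit while loop the question asks for. The helper names `count_iqr_outliers` and `find_limit`, the fixed step of 0.01, and the `max_limit` guard are all assumptions made for illustration; they are not part of the original post. The loop raises the symmetric winsorizing limit in fixed steps and stops at the first limit that leaves no IQR outliers.

```python
import numpy as np
from scipy.stats.mstats import winsorize


def count_iqr_outliers(values, limit):
    """Winsorize both tails by `limit`, then count the 1.5*IQR outliers that remain."""
    clipped = np.asarray(winsorize(values, limits=[limit, limit]))
    q1, q3 = np.percentile(clipped, [25, 75])
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return int(np.sum((clipped < lower_bound) | (clipped > upper_bound)))


def find_limit(values, start=0.01, step=0.01, max_limit=0.5):
    """Raise the limit in fixed steps until no outliers remain (hypothetical helper)."""
    n_steps = 0
    limit = start
    while count_iqr_outliers(values, limit) > 0:
        n_steps += 1
        # Recompute from the step count to avoid accumulating float error.
        limit = round(start + n_steps * step, 10)
        if limit >= max_limit:
            raise ValueError("no limit below max_limit removes all outliers")
    return limit


var = np.asarray([1, 1, 2, 5, 5, 2, 5, 4, 8, 2, 5, 1, 41, 2, 1, 4, 4, 1, 1, 4,
                  1, 2, 15, 21, 5, 1, 8, 22, 1, 5, 2, 5, 256, 5, 6, 2, 2, 8, 452])

i = find_limit(var)
print('At limit =', i, 'there are no more outliers in the age variable.')
```

Because the loop only tests limits on a 0.01 grid, it should stop at the first grid point past the transition that root_scalar locates (about 0.1538), rather than at the transition itself. A smaller step narrows that gap at the cost of more iterations.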
