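I'm trying to resample 20%, or n = 158 records, of the "True" values in the "Churn" column to rebalance the dataset "churn_train", but I get an error message. The dataset is not empty: I checked its shape and its value counts. How can I fix this error? Any help would be appreciated. Thanks.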
**DataFrame 'churn':** below are some rows of the DataFrame.
```
State,Account Length,Area Code,Phone,Intl Plan,VMail Plan,VMail Message,Day Mins,Day Calls,Day Charge,Eve Mins,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,Intl Mins,Intl Calls,Intl Charge,CustServ Calls,Old Churn,Churn
"KS",128,415,"382-4657","no","yes",25,265.100000,110,45.070000,197.400000,99,16.780000,244.700000,91,11.010000,10.000000,3,2.700000,1,"False.","False"
"OH",107,415,"371-7191","no","yes",26,161.600000,123,27.470000,195.500000,103,16.620000,254.400000,103,11.450000,13.700000,3,3.700000,1,"False.","False"
"NJ",137,415,"358-1921","no","no",0,243.400000,114,41.380000,121.200000,110,10.300000,162.600000,104,7.320000,12.200000,5,3.290000,0,"False.","False"
"OH",84,408,"375-9999","yes","no",0,299.400000,71,50.900000,61.900000,88,5.260000,196.900000,89,8.860000,6.600000,7,1.780000,2,"False.","False"
"OK",75,415,"330-6626","yes","no",0,166.700000,113,28.340000,148.300000,122,12.610000,186.900000,121,8.410000,10.100000,3,2.730000,3,"False.","False"
"AL",118,510,"391-8027","yes","no",0,223.400000,98,37.980000,220.600000,101,18.750000,203.900000,118,9.180000,6.300000,6,1.700000,0,"False.","False"
"MA",121,510,"355-9993","no","yes",24,218.200000,88,37.090000,348.500000,108,29.620000,212.600000,118,9.570000,7.500000,7,2.030000,3,"False.","False"
"MO",147,415,"329-9001","yes","no",0,157.000000,79,26.690000,103.100000,94,8.760000,211.800000,96,9.530000,7.100000,6,1.920000,0,"False.","False"
"WV",141,415,"330-8173","yes","yes",37,258.600000,84,43.960000,222.000000,111,18.870000,326.400000,97,14.690000,11.200000,5,3.020000,0,"False.","False"
"IN",65,415,"329-6603","no","no",0,129.100000,137,21.950000,228.500000,83,19.420000,208.800000,111,9.400000,12.700000,6,3.430000,4,"True.","True"
```
My code:
```python
churn_train['Churn'].value_counts()
```
```
False 1913
True 320
Name: Churn, dtype: int64
```
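The value counts give 320 True rows out of 1913 + 320 = 2233. The question does not spell out where n = 158 comes from. One reading that reproduces it, and which is only an assumption, is that 20% is the target share of True rows *after* rebalancing. Solving (320 + x) / (2233 + x) = 0.20 for x gives about 158.25:

```python
n_false, n_true = 1913, 320  # counts from value_counts() above
target = 0.20                # assumed: True should be 20% of the rebalanced set

# Solve (n_true + x) / (n_true + n_false + x) = target for x,
# the number of extra True rows to add by resampling with replacement.
x = (target * (n_true + n_false) - n_true) / (1 - target)
n = round(x)  # whole number of rows to sample

share_after = (n_true + n) / (n_true + n_false + n)
print(n, share_after)  # 158 rows gives a True share just under 20%
```

Under this reading, adding 158 resampled True rows brings the True share to 478 / 2391, which is just under 20%.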
```python
to_resample = churn_train.loc[churn_train['Churn'] == "True"]
our_resample = to_resample.sample(n = 158, replace = True)
churn_train_rebal = pd.concat([churn_train, our_resample])
```
Error message:
```
ValueError Traceback (most recent call last)
/var/folders/wv/42dn23fd1cb0czpvqdnb6zw00000gn/T/ipykernel_7751/2929105044.py in <module>
1 to_resample = churn_train.loc[churn_train['Churn'] == "True"]
----> 2 our_resample = to_resample.sample(n = 158, replace = True)
3 churn_train_rebal = pd.concat([churn_train, our_resample])
~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py in sample(self, n, frac, replace, weights, random_state, axis, ignore_index)
5452 weights = sample.preprocess_weights(self, weights, axis)
5453
-> 5454 sampled_indices = sample.sample(obj_len, size, replace, weights, rs)
5455 result = self.take(sampled_indices, axis=axis)
5456
~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/sample.py in sample(obj_len, size, replace, weights, random_state)
148 raise ValueError("Invalid weights: weights sum to zero")
149
--> 150 return random_state.choice(obj_len, size=size, replace=replace, p=weights).astype(
151 np.intp, copy=False
152 )
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: a must be greater than 0 unless no samples are taken
```
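The last line of the traceback means NumPy was asked to draw 158 items from a population of size 0. In other words, `to_resample` is empty even though `churn_train` is not, because the filter matched no rows. A likely cause, assumed here and not confirmed by the question: `pd.read_csv` parses the strings True/False in `Churn` as a bool column, and a bool column never equals the string `"True"`. The sketch below uses a small hypothetical stand-in for `churn_train` to show the empty filter, then filters with a boolean mask instead:

```python
import pandas as pd

# Hypothetical stand-in for churn_train. Its Churn column is bool,
# which is what read_csv produces from True/False values (assumption).
churn_train = pd.DataFrame({
    "State": ["KS", "OH", "NJ", "IN", "MO"],
    "Churn": [False, False, False, True, True],
})

# Comparing a bool column with the *string* "True" matches no rows.
by_string = churn_train.loc[churn_train["Churn"] == "True"]
print(len(by_string))  # 0

# Sampling from an empty frame reproduces the ValueError.
raised = False
try:
    by_string.sample(n=158, replace=True)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Use the bool column itself as the mask to select the True rows.
to_resample = churn_train.loc[churn_train["Churn"]]
our_resample = to_resample.sample(n=158, replace=True, random_state=0)
churn_train_rebal = pd.concat([churn_train, our_resample], ignore_index=True)
print(churn_train_rebal["Churn"].value_counts())
```

If `Churn` really holds strings, `churn_train['Churn'].unique()` shows the exact values to compare against, such as a stray space or a trailing period like the one in `Old Churn`.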
1 answer
brgchamk1#
Remove the unused categories that have 0 rows:
In this case:
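A categorical column keeps every declared category even when no row uses it, so a category with 0 rows still shows up in `value_counts()` with a count of 0. Here is a minimal illustration of dropping such categories with pandas' `Series.cat.remove_unused_categories()`, assuming `Churn` is a categorical column. The `"Unknown"` category is hypothetical:

```python
import pandas as pd

# Hypothetical categorical Churn column with a category ("Unknown") no row uses.
churn = pd.Series(
    pd.Categorical(["False", "False", "True"], categories=["False", "True", "Unknown"]),
    name="Churn",
)
before = churn.value_counts()
print(before)  # "Unknown" is listed with a count of 0

# Drop the categories that have 0 rows.
churn = churn.cat.remove_unused_categories()
print(list(churn.cat.categories))  # ['False', 'True']
```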