pandas: drop rows that have more than 5 missing values, then print the percentage of missing values in each column

Posted by wi3ka0sx on 2022-11-20 in Other
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d= df.loc[df.isnull().sum(axis=1)>5]
d.dropna(axis=0,inplace=True)
print(round(100*(1-df.count()/len(df)),2))

The output I get is:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    1.30

dtype: float64

But the expected output is:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06

dtype: float64
jm2pwxwz 1#

Try this:

df.drop(df[df.isnull().sum(axis=1)>5].index,axis=0,inplace=True)

print(round(100*(1-df.count()/len(df)),2))
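
A note on why the code in the question changes nothing: `df.loc[mask]` returns a new DataFrame, so `d.dropna(inplace=True)` only modifies `d`, and the percentages are then computed from the untouched `df`. Below is a minimal sketch on a made-up frame. The column names, the values, and the variable names `toy`, `sparse_rows` and `cleaned` are invented for illustration and are not taken from the CSV; the `.copy()` is added only to avoid a chained-assignment warning.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 7 columns, 4 rows; the row labelled 1 has 6 missing values.
toy = pd.DataFrame({
    "a": [1, np.nan, 3, 4],
    "b": [1, np.nan, 3, 4],
    "c": [1, np.nan, 3, 4],
    "d": [1, np.nan, np.nan, 4],
    "e": [1, np.nan, 3, 4],
    "f": [1, np.nan, 3, 4],
    "g": [1, 2, 3, 4],
})

# Approach from the question: filter into a separate frame, then dropna on that frame.
sparse_rows = toy.loc[toy.isnull().sum(axis=1) > 5].copy()
sparse_rows.dropna(axis=0, inplace=True)
rows_after_question_code = len(toy)  # toy itself was never modified

# Approach from this answer: drop the offending index labels from the frame itself.
cleaned = toy.drop(toy[toy.isnull().sum(axis=1) > 5].index, axis=0)
rows_after_drop = len(cleaned)

# Percentage of missing values per column, computed on the cleaned frame.
missing_pct = round(100 * (1 - cleaned.count() / len(cleaned)), 2)
print(rows_after_question_code, rows_after_drop)
print(missing_pct)
```

With this toy data, the first approach leaves all four rows in `toy`, while the second removes the one row that has six missing values.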
wfsdck30 2#

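I think you are trying to find the index of every row that has more than 5 null values. Use np.where instead of df.loc to find those positions, then drop the matching rows.
Try: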

import pandas as pd
import numpy as np
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
# np.where returns a tuple of arrays; [0] takes the array of row positions
d = np.where(df.isnull().sum(axis=1) > 5)[0]
df = df.drop(df.index[d])
print(round(100*(1-df.count()/len(df)),2))

Output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06
dtype: float64
ddhy6vgd 3#

Try this; it should work:

df = df[df.isnull().sum(axis=1) <= 5]
print(round(100*(1-df.count()/len(df)),2))
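
An equivalent that avoids building the mask by hand is `DataFrame.dropna` with `thresh`, which keeps rows that have at least that many non-missing values. Keeping rows with at most 5 missing values then means requiring `number of columns - 5` non-missing ones. Here is a sketch on invented data; the helper name `drop_sparse_rows` and the frame `toy` are hypothetical, and the sketch assumes the frame has more than 5 columns.

```python
import numpy as np
import pandas as pd


def drop_sparse_rows(frame: pd.DataFrame, max_missing: int = 5) -> pd.DataFrame:
    """Keep rows with at most `max_missing` missing values."""
    # thresh is the minimum number of non-missing values a row needs to be kept
    return frame.dropna(axis=0, thresh=frame.shape[1] - max_missing)


# Toy frame with 8 columns, all ones, then some cells blanked out.
toy = pd.DataFrame(np.ones((4, 8)), columns=list("abcdefgh"))
toy.iloc[1, :6] = np.nan  # 6 missing values: should be dropped
toy.iloc[2, :5] = np.nan  # exactly 5 missing values: should be kept
toy.iloc[3, 0] = np.nan   # 1 missing value: kept

by_thresh = drop_sparse_rows(toy)
by_mask = toy[toy.isnull().sum(axis=1) <= 5]

# Both approaches should select the same rows.
print(by_thresh.equals(by_mask))
```

The boundary case matters here: a row with exactly 5 missing values is kept, which matches `<= 5` in the mask version.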
a8jjtwal 4#

Try this solution:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1)<=5]
print(round(100*(df.isnull().sum()/len(df.index)),2))
vaj7vani 5#

This should work:

df = df.drop(df[df.isnull().sum(axis=1) > 5].index)

print(round(100 * (df.isnull().sum() / len(df.index)), 2))
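
The percentage step can also be written with `mean()`, because the mean of a boolean mask is the fraction of `True` values. A small sketch follows; the column names are borrowed from the question, but the values are made up.

```python
import numpy as np
import pandas as pd

# Invented values; only the column names come from the question's output.
toy = pd.DataFrame({
    "Sales": [10.0, np.nan, 30.0, 40.0],
    "Profit": [1.0, 2.0, 3.0, 4.0],
})

# Fraction of missing cells per column, as a percentage.
pct_via_mean = (toy.isnull().mean() * 100).round(2)

# The form used in this answer, for comparison.
pct_via_sum = round(100 * (toy.isnull().sum() / len(toy.index)), 2)

print(pct_via_mean)
```

Both expressions give the same Series; `mean()` just saves the explicit division by the row count.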
6mw9ycah 6#

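# keep rows with at most 5 missing values (<= 5, so rows with exactly 5 are kept)
marks = marks[marks.isnull().sum(axis=1) <= 5]
# number of missing values remaining in each column
print(marks.isna().sum())

Please try this; it should help.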

g0czyy6m 7#

This works:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1) <= 5]  # keep rows with at most 5 missing values
print(df.isnull().sum())
