pandas: remove the rows having more than 5 missing values, then print the percentage of missing values in each column

Posted by wi3ka0sx on 2022-11-20 in Other

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d= df.loc[df.isnull().sum(axis=1)>5]
d.dropna(axis=0,inplace=True)
print(round(100*(1-df.count()/len(df)),2))

The output I get is:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    1.30

dtype: float64

But the expected output is:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06

dtype: float64

pandas

Source: https://stackoverflow.com/questions/55207940/remove-the-missing-values-from-the-rows-having-greater-than-5-missing-values-and

7 answers

jm2pwxwz1#

Try this approach:

df.drop(df[df.isnull().sum(axis=1)>5].index,axis=0,inplace=True)

print(round(100*(1-df.count()/len(df)),2))
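
A likely reason the code in the question leaves the percentages unchanged is that `dropna` is applied to a separate object (`d`), so `df` itself is never modified. The sketch below illustrates this on a small made-up DataFrame; the column names and values are hypothetical and are not taken from the question's CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 rows, 7 columns; row 1 has 6 missing values.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [1.0, np.nan, 3.0],
        "c": [1.0, np.nan, np.nan],
        "d": [1.0, np.nan, 3.0],
        "e": [1.0, np.nan, 3.0],
        "f": [1.0, np.nan, 3.0],
        "g": [1.0, 2.0, 3.0],
    }
)

# The question's approach: select the sparse rows into a new object...
sparse_rows = df.loc[df.isnull().sum(axis=1) > 5]
# ...and drop rows from that object only. df still has all 3 rows.
sparse_rows = sparse_rows.dropna(axis=0)
rows_after_question_code = len(df)

# The approach from this answer: drop the sparse rows from df by index label.
cleaned = df.drop(df[df.isnull().sum(axis=1) > 5].index)
rows_after_fix = len(cleaned)

# Percentage of missing values per column, computed on the cleaned frame.
pct_missing = round(100 * (1 - cleaned.count() / len(cleaned)), 2)
print(rows_after_question_code, rows_after_fix)
print(pct_missing)
```

Here `df` keeps its 3 rows after the question's two steps, while the index-based `drop` leaves 2 rows.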

wfsdck302#

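I think you are trying to find the indices of the rows whose number of null values is greater than 5. Use np.where instead of df.loc to find those row positions, then drop them.
Try: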

import pandas as pd
import numpy as np
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d = np.where(df.isnull().sum(axis=1)>5)
df= df.drop(df.index[d])
print(round(100*(1-df.count()/len(df)),2))

Output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06
dtype: float64
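
For reference, `np.where` called with a single condition returns a tuple of position arrays, which is why the code above passes the result through `df.index[...]` before dropping. A minimal sketch on made-up data (the index labels and column names are hypothetical) shows that a plain boolean mask selects the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with non-default index labels.
df = pd.DataFrame(
    np.ones((3, 7)), index=["x", "y", "z"], columns=list("abcdefg")
)
df.loc["y", list("abcdef")] = np.nan  # row "y" has 6 missing values

mask = df.isnull().sum(axis=1) > 5

# np.where with one argument returns a tuple of integer position arrays.
positions = np.where(mask)[0]
dropped_by_position = df.drop(df.index[positions])

# Boolean indexing keeps the same rows without converting to positions.
kept_by_mask = df[~mask]

print(positions.tolist())
print(dropped_by_position.equals(kept_by_mask))
```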

ddhy6vgd3#

Try this, it should work:

df = df[df.isnull().sum(axis=1) <= 5]
print(round(100*(1-df.count()/len(df)),2))
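
As an alternative sketch, the same filter can be written with the `thresh` parameter of `DataFrame.dropna`, which is the minimum number of non-missing values a row must have to be kept. The toy data and the `max_missing` name below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 10 columns; rows have 0, 5 and 6 missing values.
df = pd.DataFrame(np.ones((3, 10)), columns=[f"c{i}" for i in range(10)])
df.iloc[1, :5] = np.nan
df.iloc[2, :6] = np.nan

max_missing = 5  # rows with more missing values than this are removed

# Boolean-mask version, as in this answer.
by_mask = df[df.isnull().sum(axis=1) <= max_missing]

# dropna version: a row with at most 5 missing values out of 10 columns
# has at least 10 - 5 = 5 non-missing values.
by_thresh = df.dropna(thresh=df.shape[1] - max_missing)

print(by_mask.index.tolist())
print(by_thresh.equals(by_mask))
```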

a8jjtwal4#

Try this solution:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1)<=5]
print(round(100*(df.isnull().sum()/len(df.index)),2))
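
The percentage line can be written in a few equivalent ways. The sketch below compares them on a tiny made-up frame (the values are hypothetical); `isnull().mean()` works because the mean of a boolean column is the fraction of `True` entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one of four Sales values is missing.
df = pd.DataFrame(
    {"Sales": [10.0, np.nan, 30.0, 40.0], "Profit": [1.0, 2.0, 3.0, 4.0]}
)

# 1 minus the fraction of non-missing values, as in the question.
via_count = round(100 * (1 - df.count() / len(df)), 2)
# Number of missing values divided by the number of rows, as in this answer.
via_sum = round(100 * (df.isnull().sum() / len(df.index)), 2)
# Mean of the boolean mask: the fraction of missing values per column.
via_mean = (100 * df.isnull().mean()).round(2)

print(via_mean)
```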

vaj7vani5#

This should work:

df = df.drop(df[df.isnull().sum(axis=1) > 5].index)

print(round(100 * (df.isnull().sum() / len(df.index)), 2))

6mw9ycah6#

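# `marks` is the DataFrame; keep rows with at most 5 missing values
marks = marks[marks.isnull().sum(axis=1) <= 5]
# number of missing values per column
print(marks.isna().sum())

Try this; it should help.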

g0czyy6m7#

This works:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1) <= 5]  # keep rows with at most 5 missing values
print(df.isnull().sum())
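
One detail worth checking is the comparison operator, since it decides what happens to rows with exactly 5 missing values: `< 5` drops them and `<= 5` keeps them. The question asks to remove rows with more than 5 missing values, so `<= 5` matches that wording. A minimal sketch on made-up data (hypothetical column names):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows with 4, 5 and 6 missing values out of 8 columns.
df = pd.DataFrame(np.ones((3, 8)), columns=list("abcdefgh"))
df.iloc[0, :4] = np.nan
df.iloc[1, :5] = np.nan
df.iloc[2, :6] = np.nan

missing_per_row = df.isnull().sum(axis=1)

kept_strict = df[missing_per_row < 5]      # also drops the row with exactly 5
kept_inclusive = df[missing_per_row <= 5]  # drops only rows with more than 5

print(kept_strict.index.tolist())
print(kept_inclusive.index.tolist())
```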
