numpy: specifying NaN encoding in np.where() logic

Posted by iecba09b on 2023-03-23 in Other

I have data like this:

    id    case2_q6
0   300   3.0
1   304   4.0
2   306   3.0
3   309   1.0
4   311   3.0
5   312   4.0
6   314   NaN
7   315   2.0
8   316   3.0
9   317   3.0

I use this np.where() call to generate a new variable:

df['fluid_2'] = np.where((df['case2_q6'] == 1) | (df['case2_q6'] == 2), 1, 0)

Now df has a fluid_2 column, as shown here:

    id    case2_q6  fluid_2
0   300   3.0       0
1   304   4.0       0
2   306   3.0       0
3   309   1.0       1
4   311   3.0       0
5   312   4.0       0
6   314   NaN       0
7   315   2.0       1
8   316   3.0       0
9   317   3.0       0

As you can see, the NaN value at index 6 was converted to 0. Is there a way to set up np.where() so that these values stay NaN in fluid_2?
The expected output is:

    id    case2_q6  fluid_2
0   300   3.0       0
1   304   4.0       0
2   306   3.0       0
3   309   1.0       1
4   311   3.0       0
5   312   4.0       0
6   314   NaN       NaN
7   315   2.0       1
8   316   3.0       0
9   317   3.0       0

with the NaN preserved.
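
For anyone who wants to reproduce this, here is a minimal sketch of the behaviour described above. The DataFrame is rebuilt by hand from the sample rows, and the name `cond` is only illustrative. It shows that comparing NaN with 1 or 2 yields False, which is why np.where() falls through to 0 for that row.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "id": [300, 304, 306, 309, 311, 312, 314, 315, 316, 317],
    "case2_q6": [3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
})

# NaN == 1 and NaN == 2 are both False, so the combined condition is
# False for row 6 and np.where picks the "else" value 0 there.
cond = (df["case2_q6"] == 1) | (df["case2_q6"] == 2)
df["fluid_2"] = np.where(cond, 1, 0)

print(cond[6])               # condition for the NaN row
print(df.loc[6, "fluid_2"])  # value written for the NaN row
```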

pprl5pva #1

Let's drop the null values and check the condition with isin:

df['fluid_2'] = df['case2_q6'].dropna().isin([1, 2]).astype('int')

Output:

    id  case2_q6  fluid_2
0  300       3.0      0.0
1  304       4.0      0.0
2  306       3.0      0.0
3  309       1.0      1.0
4  311       3.0      0.0
5  312       4.0      0.0
6  314       NaN      NaN
7  315       2.0      1.0
8  316       3.0      0.0
9  317       3.0      0.0
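
To illustrate why the result above shows 0.0/1.0 and a NaN even though the expression ends in astype('int'), here is a small sketch that splits the one-liner into two steps. The intermediate name `flags` is only for illustration, and the DataFrame is rebuilt from the sample rows in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [300, 304, 306, 309, 311, 312, 314, 315, 316, 317],
    "case2_q6": [3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
})

# dropna() removes row 6, so this Series has 9 rows and no label 6.
flags = df["case2_q6"].dropna().isin([1, 2]).astype("int")

# Column assignment aligns on the index. Label 6 has no matching value,
# so it is filled with NaN, which upcasts the column to float64.
df["fluid_2"] = flags

print(flags.index.tolist())
print(df["fluid_2"].dtype)
```

The NaN comes from index alignment, not from isin itself, so this approach relies on the dropped rows keeping their original labels.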
lhcgjxsq #2

A possible solution:

df['fluid_2'] = np.where(
    df['case2_q6'].isna(), np.nan, 
    np.where((df['case2_q6'] == 1) | (df['case2_q6'] == 2), 1, 0))

Another possible solution:

df['fluid_2'] = df['case2_q6'].clip(upper=1).mul(df['case2_q6'].isin([1,2]))

Output:

    id  case2_q6  fluid_2
0  300       3.0      0.0
1  304       4.0      0.0
2  306       3.0      0.0
3  309       1.0      1.0
4  311       3.0      0.0
5  312       4.0      0.0
6  314       NaN      NaN
7  315       2.0      1.0
8  316       3.0      0.0
9  317       3.0      0.0
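
As a rough check, the sketch below rebuilds the sample frame, evaluates both expressions, and confirms that they agree. The last line is not part of either answer: it is an optional extra step that assumes a pandas version with the nullable "Int64" dtype, in case you want 0/1 shown as integers next to a missing value, as in the expected output of the question. The names `col`, `nested` and `clipped` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [300, 304, 306, 309, 311, 312, 314, 315, 316, 317],
    "case2_q6": [3.0, 4.0, 3.0, 1.0, 3.0, 4.0, np.nan, 2.0, 3.0, 3.0],
})
col = df["case2_q6"]

# First solution: test for NaN first, then apply the original condition.
nested = pd.Series(
    np.where(col.isna(), np.nan, np.where((col == 1) | (col == 2), 1, 0)),
    index=df.index,
)

# Second solution: clip(upper=1) caps every value at 1 and leaves NaN alone.
# The sample codes are all >= 1, so each non-NaN entry becomes 1.0.
# Multiplying by the boolean isin mask then gives 1.0, 0.0 or NaN.
clipped = col.clip(upper=1).mul(col.isin([1, 2]))

# Raises AssertionError if the two results differ (NaNs in the same
# position count as equal).
pd.testing.assert_series_equal(nested, clipped, check_names=False)

# Optional: nullable integer dtype, so the column holds 0, 1 and <NA>.
df["fluid_2"] = nested.astype("Int64")
print(df)
```

With "Int64" the missing entry is displayed as <NA> rather than NaN.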
