pandas 迭代行,同时一次以两列为目标

n3h0vuf2  于 2023-01-11  发布在  其他
关注(0)|答案(1)|浏览(122)

cols_df表示DataFrame的块,我希望能够在其中执行多个操作,但每次都将第一列和第二列作为目标。(例如,第一轮中的"0"、"2"、"3"列,以及第二轮中的"0"、"4"、"5"列)。在新列中,如果每一行在两个目标列中没有包含数值,我就用X标记它。我继续对每对cols_df的列应用这个方法。然后,我将有一个DataFrame,其中包含新标记的列以及所有其他列。
输入:

import pandas as pd

cols_dict = {'matr': {0: '18I1', 1: '03I2', 2: '03I3', 3: '18I4', 4: '03I5', 5: '03I6', 6: '03I7', 7: '03I8', 8: '18I9', 9: '18I0'}, 'cat': {0: '3', 1: '3', 2: '3', 3: '3', 4: '3', 5: '18', 6: '3', 7: '3', 8: '3', 9: '3'}, 'Unnamed: 5': {0: 81, 1: 81, 2: 81, 3: 77, 4: None, 5: None, 6: 83, 7: 81, 8: 79, 9: 81}, 'Unnamed: 6': {0: 91, 1: 97, 2: 97, 3: 91, 4: None, 5: 93, 6: 89, 7: 83, 8: 81, 9: 99}, 'Unnamed: 7': {0: 117.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 115.0, 5: None, 6: 115.0, 7: 115.0, 8: 115.0, 9: 115.0}, 'Unnamed: 8': {0: 123.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 123.0, 5: 123.0, 6: 125.0, 7: 123.0, 8: 117.0, 9: None}}  

cols_df = pd.DataFrame.from_dict(cols_dict)

所需输出:

cols_dict_out = {'matr': {0: '18I1', 1: '03I2', 2: '03I3', 3: '18I4', 4: '03I5', 5: '03I6', 6: '03I7', 7: '03I8', 8: '18I9', 9: '18I0'}, 'xs': {0: None, 1: None, 2: None, 3: None, 4: None, 5: 'X', 6: None, 7: None, 8: None, 9: 'X'}, 'cat': {0: '3', 1: '3', 2: '3', 3: '3', 4: '3', 5: '18', 6: '3', 7: '3', 8: '3', 9: '3'}, 'Unnamed: 5': {0: 81, 1: 81, 2: 81, 3: 77, 4: None, 5: None, 6: 83, 7: 81, 8: 79, 9: 81}, 'Unnamed: 6': {0: 91, 1: 97, 2: 97, 3: 91, 4: None, 5: 93, 6: 89, 7: 83, 8: 81, 9: 99}, 'Unnamed: 7': {0: 117.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 115.0, 5: None, 6: 115.0, 7: 115.0, 8: 115.0, 9: 115.0}, 'Unnamed: 8': {0: 123.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 123.0, 5: 123.0, 6: 125.0, 7: 123.0, 8: 117.0, 9: None}}

cols_out_df = pd.DataFrame.from_dict(cols_dict_out)
eiee3dmh

eiee3dmh1#

更新答案
# get columns 2/3 or 4/5 and check if NaN
m1 = cols_df.iloc[:, [2,3]].isna()
m2 = cols_df.iloc[:, [4,5]].isna()

# check if all values are homogeneous
m = ( m1.eq(m1.iloc[:, 0], axis=0).all(axis=1)
    & m2.eq(m2.iloc[:, 0], axis=0).all(axis=1)
    )

cols_df['xs'] = np.where(m, None, 'X')
numpy的一般答案:
N = 2
a = cols_df.filter(like='Unnamed').isna().to_numpy()
b = a.reshape((a.shape[0], N, -1,))
m = (b == b[..., [0]]).all(-1).all(1)

cols_df.insert(1, 'xs', np.where(m, None, 'X'))

输出:

matr    xs cat  Unnamed: 5  Unnamed: 6  Unnamed: 7  Unnamed: 8
0  18I1  None   3        81.0        91.0       117.0       123.0
1  03I2  None   3        81.0        97.0       115.0       115.0
2  03I3  None   3        81.0        97.0       115.0       115.0
3  18I4  None   3        77.0        91.0       115.0       115.0
4  03I5  None   3         NaN         NaN       115.0       123.0
5  03I6     X  18         NaN        93.0         NaN       123.0
6  03I7  None   3        83.0        89.0       115.0       125.0
7  03I8  None   3        81.0        83.0       115.0       123.0
8  18I9  None   3        79.0        81.0       115.0       117.0
9  18I0     X   3        81.0        99.0       115.0         NaN
原始答案

逻辑并不完全清楚,但看起来您可能需要:

cols_df['xs'] = np.where(cols_df.filter(like='Unnamed').isna().any(axis=1), 'X', None)

或者,如果要作为第二列插入:

cols_df.insert(1, 'xs', np.where(cols_df.filter(like='Unnamed').isna().any(axis=1), 'X', None))

输出:

matr    xs cat  Unnamed: 5  Unnamed: 6  Unnamed: 7  Unnamed: 8
0  18I1  None   3        81.0          91       117.0       123.0
1  03I2  None   3        81.0          97       115.0       115.0
2  03I3  None   3        81.0          97       115.0       115.0
3  18I4  None   3        77.0          91       115.0       115.0
4  03I5     X   3         NaN          81       115.0       123.0
5  03I6     X  18         NaN          93         NaN       123.0
6  03I7  None   3        83.0          89       115.0       125.0
7  03I8  None   3        81.0          83       115.0       123.0
8  18I9  None   3        79.0          81       115.0       117.0
9  18I0     X   3        81.0          99       115.0         NaN

相关问题