Ragged list to a pandas DataFrame

Posted by pkln4tw6 on 2022-11-20 in Other

I have a non-uniform (ragged) list like this:

[['E', 'A', 'P'],
 ['E', 'A', 'X', 'P'],
 ['E', 'A', 'P'],
 ['P'],
 ['E', 'A', 'X', 'P'],
 ['E', 'A', 'P'],
 ['A', 'X', 'P'],
 ['E', 'A', 'P'],
 ['E', 'A', 'P'],
 ['E', 'A', 'X', 'P'],
 ['E', 'A', 'P'],
 ['E', 'A', 'P'],
 ['A', 'X', 'P']]

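I want to build a DataFrame from it in which each column one-hot encodes one of the four possible letters "E", "A", "X" and "P". What is the most efficient way to do this?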

Answer #1 (ukqbszuj)

I would recommend MultiLabelBinarizer from sklearn:

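import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 array; mlb.classes_ holds the sorted column labels
df = pd.DataFrame(mlb.fit_transform(l), columns=mlb.classes_)
print(df)

Output: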
    A  E  P  X
0   1  1  1  0
1   1  1  1  1
2   1  1  1  0
3   0  0  1  0
4   1  1  1  1
5   1  1  1  0
6   1  0  1  1
7   1  1  1  0
8   1  1  1  0
9   1  1  1  1
10  1  1  1  0
11  1  1  1  0
12  1  0  1  1
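If you want the columns in the order used in the question (E, A, X, P) rather than sklearn's sorted order, MultiLabelBinarizer also accepts a classes argument that fixes both the set of columns and their order. A minimal sketch, using a shortened four-row version of the question's list for brevity:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Shortened version of the question's list, for illustration only
l = [['E', 'A', 'P'], ['E', 'A', 'X', 'P'], ['P'], ['A', 'X', 'P']]

# classes= fixes the column order; without it, classes_ is the sorted label set
mlb = MultiLabelBinarizer(classes=['E', 'A', 'X', 'P'])
df = pd.DataFrame(mlb.fit_transform(l), columns=mlb.classes_)
print(df)
```

With classes given, mlb.classes_ is simply that list, so the DataFrame columns come out as E, A, X, P.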

Alternatively, a pure-pandas approach with explode and str.get_dummies:

df = pd.Series(l).explode().str.get_dummies().groupby(level=0).sum()
print(df)

Output:
    A  E  P  X
0   1  1  1  0
1   1  1  1  1
2   1  1  1  0
3   0  0  1  0
4   1  1  1  1
5   1  1  1  0
6   1  0  1  1
7   1  1  1  0
8   1  1  1  0
9   1  1  1  1
10  1  1  1  0
11  1  1  1  0
12  1  0  1  1

Note that l here is the list from the question.
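The explode/groupby round trip can also be skipped: Series.str.join glues each inner list into a single string, and str.get_dummies then splits it again on its default separator "|". A minimal sketch of this variant (not part of the original answer), assuming no letter itself contains "|":

```python
import pandas as pd

# Shortened version of the question's list, for illustration only
l = [['E', 'A', 'P'], ['E', 'A', 'X', 'P'], ['P'], ['A', 'X', 'P']]

# Join each row's letters into e.g. "E|A|P", then split on "|" into 0/1 columns.
# Assumes no label contains the "|" separator.
df = pd.Series(l).str.join('|').str.get_dummies()

# get_dummies sorts the columns (A, E, P, X); reorder to match the question
df = df[['E', 'A', 'X', 'P']]
print(df)
```

Each original row stays a single row throughout, so no groupby is needed to collapse the exploded rows back together.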

Answer #2 (jutyujz0)

Try:

import pandas as pd

lst = [
    ["E", "A", "P"],
    ["E", "A", "X", "P"],
    ["E", "A", "P"],
    ["P"],
    ["E", "A", "X", "P"],
    ["E", "A", "P"],
    ["A", "X", "P"],
    ["E", "A", "P"],
    ["E", "A", "P"],
    ["E", "A", "X", "P"],
    ["E", "A", "P"],
    ["E", "A", "P"],
    ["A", "X", "P"],
]

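# Each row becomes a dict {letter: 1}; letters missing from a row become NaN,
# so notna() yields True/False and astype(int) turns that into 1/0.
# Columns appear in order of first occurrence (E, A, P, X).
df = pd.DataFrame({v: 1 for v in l} for l in lst).notna().astype(int)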
print(df)

Prints:

    E  A  P  X
0   1  1  1  0
1   1  1  1  1
2   1  1  1  0
3   0  0  1  0
4   1  1  1  1
5   1  1  1  0
6   0  1  1  1
7   1  1  1  0
8   1  1  1  0
9   1  1  1  1
10  1  1  1  0
11  1  1  1  0
12  0  1  1  1
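Since the question asks about efficiency: a different technique, not taken from either answer, is to map each letter to a fixed column index and fill a preallocated NumPy zero matrix in one vectorized assignment. A sketch, assuming the set of possible letters (E, A, X, P) is known in advance; it has not been benchmarked here against the approaches above, and a letter outside that set raises KeyError.

```python
import numpy as np
import pandas as pd

# Shortened version of the question's list, for illustration only
l = [['E', 'A', 'P'], ['E', 'A', 'X', 'P'], ['P'], ['A', 'X', 'P']]

# Assumption: all possible letters are known up front, in the desired column order
letters = ['E', 'A', 'X', 'P']
col = {c: i for i, c in enumerate(letters)}  # letter -> column index

# Parallel lists of (row, column) coordinates, one entry per letter occurrence
rows = [i for i, row in enumerate(l) for _ in row]
cols = [col[c] for row in l for c in row]

arr = np.zeros((len(l), len(letters)), dtype=np.int64)
arr[rows, cols] = 1  # fancy indexing sets every hit in a single step

df = pd.DataFrame(arr, columns=letters)
print(df)
```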
