您需要为唯一值指定索引。stringindexer不合适,因为它考虑了字符的频率。
如何进行因式分解
这样地:
activity_start activity_end activity_start_code activity_end_code
0 Stage_0 Stage_3 0 0
1 Stage_3 Stage_5 1 1
2 Stage_5 Stage_2 2 2
3 Stage_2 Stage_7 3 3
4 Stage_7 end 4 4
5 Stage_0 Stage_2 0 2
6 Stage_2 Stage_4 3 5
7 Stage_4 Stage_3 5 0
8 Stage_3 Stage_8 1 6
9 Stage_8 end 6 4
43 Stage_0 Stage_2 0 2
44 Stage_2 Stage_5 3 1
45 Stage_5 Stage_7 2 3
46 Stage_7 end 4 4
457 Stage_2 Stage_3 3 0
458 Stage_3 Stage_8 1 6
459 Stage_8 end 6 4
1条答案
按热度按时间lvmkulzt1#
这可以通过降低到rdd级别来实现。如果事先知道唯一行的id,则可以进一步简化该过程。以下是示例代码: