I have a table like this:
+-------+------+------+-------+----
|movieId|Action|Comedy|Fantasy| ...
+-------+------+------+-------+----
|   1001|     1|     1|      0| ...
|   1011|     0|     1|      1| ...
+-------+------+------+-------+----
How can I convert each row into an IndexedRow, so that I end up with something like this:
+-------+----------------+
|movieId| Features |
+-------+----------------+
| 1001 | [1, 1, 0, ...] |
| 1011 | [0, 1, 1, ...] |
+-------+----------------+
1 Answer
If you need array-typed output, you can use the array() function.
If you are doing this for ML operations, it is better to use VectorAssembler: http://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/ml/feature.html#vectorassembler