How do I convert a PySpark DataFrame to a dictionary, with the first column as the key and the remaining columns and their contents as key-value pairs?

idfiyjo8 asked on 2022-11-16 in Apache

I created a DataFrame in PySpark as follows:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# spark is the active SparkSession (created here so the snippet runs standalone)
spark = SparkSession.builder.getOrCreate()

data_1 = [
    ("rule1", "", "1", "2", "3", "4"),
    ("rule2", "1", "3", "5", "6", "4"),
    ("rule3", "", "0", "1", "2", "5"),
    ("rule4", "0", "1", "3", "6", "2"),
]

schema = StructType(
    [
        StructField("_c0", StringType(), True),
        StructField("para1", StringType(), True),
        StructField("para2", StringType(), True),
        StructField("para3", StringType(), True),
        StructField("para4", StringType(), True),
        StructField("para5", StringType(), True),
    ]
)
 
df = spark.createDataFrame(data=data_1, schema=schema)

This gives:

+-----+-----+-----+-----+-----+-----+
|_c0  |para1|para2|para3|para4|para5|
+-----+-----+-----+-----+-----+-----+
|rule1|     |1    |2    |3    |4    |
|rule2|1    |3    |5    |6    |4    |
|rule3|     |0    |1    |2    |5    |
|rule4|0    |1    |3    |6    |2    |
+-----+-----+-----+-----+-----+-----+

I want to convert it into a dictionary like this:

dict = {'rule1': {'para2': '1', 'para3': '2','para4': '3','para5': '4'},
        'rule2': {'para1': '1', 'para2': '3','para3': '5','para4': '6','para5': '4'}, ...}

Columns with an empty "" value should not appear in the final dictionary; for example, 'para1' is missing from the dictionary for 'rule1', while all the other columns are present.
I tried the following as a starting point, but it does not produce the desired result:

dict1 = df.rdd.map(lambda row: row.asDict()).collect()
final_dict = {d['_c0']: d[col] for d in dict1 for col in df.columns}

# Returns {'rule1': '4', 'rule2': '4', 'rule3': '5', 'rule4': '2'}

ulmd4ohb 1#

Your comprehension maps each rule key to a single column value, so each later column overwrites the previous one and only the last column (para5) survives. Instead, you can build the inner dictionaries with a nested dictionary comprehension that skips the key column and the empty values:

dict_rules = {r['_c0']: {k: v 
                         for k, v in r.asDict().items() 
                         if k != '_c0' and v != ''}
              for r in df.collect()}

# {'rule1': {'para2': '1', 'para3': '2', 'para4': '3', 'para5': '4'},
#  'rule2': {'para1': '1', 'para2': '3', 'para3': '5', 'para4': '6', 'para5': '4'},
#  'rule3': {'para2': '0', 'para3': '1', 'para4': '2', 'para5': '5'},
#  'rule4': {'para1': '0', 'para2': '1', 'para3': '3', 'para4': '6', 'para5': '2'}}
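
For larger DataFrames, note that df.collect() pulls every full row to the driver before the filtering happens. A variant of the same idea (a sketch, not part of the original answer) does the filtering inside rdd.map on the executors and only collects the already-trimmed key/value pairs:

rows = (
    df.rdd
      .map(lambda r: (r["_c0"],
                      {k: v for k, v in r.asDict().items()
                       if k != "_c0" and v != ""}))  # drop the key column and empty values per row
      .collect()
)
dict_rules = dict(rows)

# Produces the same dictionary as above; only the place where the
# empty values are discarded differs.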
