I have the following json file:
{"name":"John", "age":31, "city":"New York"}
{"name":"Henry", "age":41, "city":"Boston"}
{"name":"Dave", "age":26, "city":"New York"}
So I need to read each JSON line as one row of a DataFrame, and also keep the original JSON string of each row as a column (that is the expected output).
I tried the following code:
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName('Read Json') \
.getOrCreate()
df = spark.read.format('json').load('sample_json')
df.show()
But I only get the parsed name, age, and city columns, without the original JSON string column. Please help. Thanks in advance.
1 Answer
Read the file as json, then use the to_json function to create the Json_column.

1. Using the to_json function:
```
from pyspark.sql.functions import *

spark.read.json("sample.json") \
    .withColumn("Json_column", to_json(struct(col("age"), col("city"), col("name")))) \
    .show(10, False)
+---+--------+-----+------------------------------------------+
|age|city |name |Json_column |
+---+--------+-----+------------------------------------------+
|31 |New York|John |{"age":31,"city":"New York","name":"John"}|
|41 |Boston |Henry|{"age":41,"city":"Boston","name":"Henry"} |
|26 |New York|Dave |{"age":26,"city":"New York","name":"Dave"}|
+---+--------+-----+------------------------------------------+
```

or, a more dynamic way that builds the struct from df.columns, so the field names are not hard-coded:

```
df = spark.read.json("sample.json")
df.withColumn("Json_column", to_json(struct([col(c) for c in df.columns]))).show(10, False)
+---+--------+-----+------------------------------------------+
|age|city |name |Json_column |
+---+--------+-----+------------------------------------------+
|31 |New York|John |{"age":31,"city":"New York","name":"John"}|
|41 |Boston |Henry|{"age":41,"city":"Boston","name":"Henry"} |
|26 |New York|Dave |{"age":26,"city":"New York","name":"Dave"}|
+---+--------+-----+------------------------------------------+
```
2. Another approach, using the get_json_object function:
Read the json file as text, then extract the name, age, and city columns from the json object with get_json_object, as in the sketch below.
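A minimal sketch of this second approach, assuming the same sample.json path and the SparkSession created in the question (the JSON paths follow the sample data):

```
from pyspark.sql.functions import get_json_object, col

# Read each JSON line as plain text; spark.read.text exposes it as a "value" column,
# which is kept here as the Json_column.
df = spark.read.text("sample.json").withColumnRenamed("value", "Json_column")

# Extract name, age and city from the JSON string with get_json_object.
df.select(
    get_json_object(col("Json_column"), "$.name").alias("name"),
    get_json_object(col("Json_column"), "$.age").alias("age"),
    get_json_object(col("Json_column"), "$.city").alias("city"),
    col("Json_column"),
).show(10, False)
```

Note that get_json_object returns strings, so the age column would need a cast (e.g. .cast("int")) if a numeric type is required.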