PySpark: display a Spark DataFrame in JSON format instead of as a table

h5qlskok posted on 2023-11-16 in Spark

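I want to look at a few rows of nested data, but in JSON format.
Is that possible? df.show() prints the rows as a table, and I have many columns. Can the rows be printed as JSON instead?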

fcipmucu #1

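If you want to see what the final DataFrame looks like when it is printed to the console, I think the best approach is to create a temporary directory, write a few rows to that directory as JSON, and then read the directory back one file at a time. Here is an example.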

from pyspark.sql import SparkSession
import os

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

# Sample rows: member_id, age, gender, date, cost
data1 = [
    [1, 35, "Male", "2023-10-01", 200],
    [1, 35, "Male", "2023-10-02", 210],
    [2, 28, "Female", "2023-10-01", 150],
    [2, 28, "Female", "2023-10-02", 160],
    [1, 35, "Male", "2023-10-01", 200],
    [1, 35, "Male", "2023-10-02", 210],
    [2, 28, "Female", "2023-10-01", 150],
    [2, 28, "Female", "2023-10-02", 160],
    [1, 35, "Male", "2023-10-01", 200],
    [1, 35, "Male", "2023-10-02", 210],
]

columns = ["member_id", "age", "gender", "date", "cost"]

# spark.createDataFrame replaces the legacy SQLContext entry point
df1 = spark.createDataFrame(data=data1, schema=columns)

print("dataframe printed")
df1.show(n=10, truncate=False)


import tempfile
temp_dir = tempfile.mkdtemp()

# The real DataFrame may be large, so write only the first 7 rows
df1.limit(7).write.format("json").mode('overwrite').save(temp_dir)

files = [os.path.join(temp_dir, f) for f in os.listdir(temp_dir) if os.path.isfile(os.path.join(temp_dir, f))]

print(files)
print()

for file in files:
    # Spark writes the data into part-* files; skip _SUCCESS and .crc files
    if os.path.basename(file).startswith("part"):
        with open(file, 'r', encoding="utf-8") as f:
            print("file being printed", file)
            final = f.read()
            print(final)

Output:

dataframe printed
+---------+---+------+----------+----+
|member_id|age|gender|date      |cost|
+---------+---+------+----------+----+
|1        |35 |Male  |2023-10-01|200 |
|1        |35 |Male  |2023-10-02|210 |
|2        |28 |Female|2023-10-01|150 |
|2        |28 |Female|2023-10-02|160 |
|1        |35 |Male  |2023-10-01|200 |
|1        |35 |Male  |2023-10-02|210 |
|2        |28 |Female|2023-10-01|150 |
|2        |28 |Female|2023-10-02|160 |
|1        |35 |Male  |2023-10-01|200 |
|1        |35 |Male  |2023-10-02|210 |
+---------+---+------+----------+----+

['/tmp/tmptset62u3/_SUCCESS', '/tmp/tmptset62u3/._SUCCESS.crc', '/tmp/tmptset62u3/part-00000-b481e6a1-f4af-4f21-88c5-c7e9e4134030-c000.json', '/tmp/tmptset62u3/.part-00000-b481e6a1-f4af-4f21-88c5-c7e9e4134030-c000.json.crc']

file being printed /tmp/tmptset62u3/part-00000-b481e6a1-f4af-4f21-88c5-c7e9e4134030-c000.json
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-01","cost":200}
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-02","cost":210}
{"member_id":2,"age":28,"gender":"Female","date":"2023-10-01","cost":150}
{"member_id":2,"age":28,"gender":"Female","date":"2023-10-02","cost":160}
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-01","cost":200}
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-02","cost":210}
{"member_id":2,"age":28,"gender":"Female","date":"2023-10-01","cost":150}
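
The part file above holds one compact JSON object per line, which can still be hard to read when the rows are nested or have many columns. As a rough sketch (not part of the original answer), the lines could be re-serialized with indentation using Python's standard json module. The helper name pretty_json_lines is hypothetical, and the sample strings are copied from the output above. In PySpark, the same kind of strings could also be obtained with df1.limit(7).toJSON().collect(), which is an assumption here and is not executed in this sketch.

```python
import json


def pretty_json_lines(lines, indent=2):
    """Re-serialize JSON-lines records with indentation (hypothetical helper)."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline in the file
        record = json.loads(line)
        # ensure_ascii=False keeps non-ASCII text readable in the console
        blocks.append(json.dumps(record, indent=indent, ensure_ascii=False))
    return "\n".join(blocks)


# Sample records copied from the part file printed above.
# Assumption: with Spark available, df1.limit(7).toJSON().collect()
# would yield strings of the same shape.
sample = [
    '{"member_id":1,"age":35,"gender":"Male","date":"2023-10-01","cost":200}',
    '{"member_id":2,"age":28,"gender":"Female","date":"2023-10-01","cost":150}',
]

print(pretty_json_lines(sample))
```

Each record is printed as an indented block, so nested structs and arrays show their hierarchy instead of running along a single line.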
