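I want to look at a few rows of a nested DataFrame, but in JSON format. Is that possible? df.show() prints the rows as a table, and I have a lot of columns, so the output is hard to read. Can the rows be printed as JSON instead?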
df.show()
fcipmucu1#
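If you want to see on the console what the final DataFrame looks like as JSON, one approach is to create a temporary directory, write a few rows to it in JSON format, and then read that directory back file by file. Here is an example.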
import os
import tempfile

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

data1 = [
    [1, 35, "Male", "2023-10-01", 200],
    [1, 35, "Male", "2023-10-02", 210],
    [2, 28, "Female", "2023-10-01", 150],
    [2, 28, "Female", "2023-10-02", 160],
    [1, 35, "Male", "2023-10-01", 200],
    [1, 35, "Male", "2023-10-02", 210],
    [2, 28, "Female", "2023-10-01", 150],
    [2, 28, "Female", "2023-10-02", 160],
    [1, 35, "Male", "2023-10-01", 200],
    [1, 35, "Male", "2023-10-02", 210],
]
columns = ["member_id", "age", "gender", "date", "cost"]

# spark.createDataFrame replaces the deprecated SQLContext entry point
df1 = spark.createDataFrame(data=data1, schema=columns)
print("dataframe printed")
df1.show(n=10, truncate=False)

temp_dir = tempfile.mkdtemp()

# The real DataFrame may be large, so write only the first 7 rows
df1.limit(7).write.format("json").mode("overwrite").save(temp_dir)

files = [os.path.join(temp_dir, f) for f in os.listdir(temp_dir)
         if os.path.isfile(os.path.join(temp_dir, f))]
print(files)
print()

for file in files:
    # Skip _SUCCESS and .crc files; only the part-* files contain rows
    if os.path.basename(file).startswith("part"):
        with open(file, "r", encoding="utf-8") as f:
            print("file being printed", file)
            print(f.read())
Output:
dataframe printed
+---------+---+------+----------+----+
|member_id|age|gender|date      |cost|
+---------+---+------+----------+----+
|1        |35 |Male  |2023-10-01|200 |
|1        |35 |Male  |2023-10-02|210 |
|2        |28 |Female|2023-10-01|150 |
|2        |28 |Female|2023-10-02|160 |
|1        |35 |Male  |2023-10-01|200 |
|1        |35 |Male  |2023-10-02|210 |
|2        |28 |Female|2023-10-01|150 |
|2        |28 |Female|2023-10-02|160 |
|1        |35 |Male  |2023-10-01|200 |
|1        |35 |Male  |2023-10-02|210 |
+---------+---+------+----------+----+

['/tmp/tmptset62u3/_SUCCESS', '/tmp/tmptset62u3/._SUCCESS.crc', '/tmp/tmptset62u3/part-00000-b481e6a1-f4af-4f21-88c5-c7e9e4134030-c000.json', '/tmp/tmptset62u3/.part-00000-b481e6a1-f4af-4f21-88c5-c7e9e4134030-c000.json.crc']

file being printed /tmp/tmptset62u3/part-00000-b481e6a1-f4af-4f21-88c5-c7e9e4134030-c000.json
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-01","cost":200}
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-02","cost":210}
{"member_id":2,"age":28,"gender":"Female","date":"2023-10-01","cost":150}
{"member_id":2,"age":28,"gender":"Female","date":"2023-10-02","cost":160}
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-01","cost":200}
{"member_id":1,"age":35,"gender":"Male","date":"2023-10-02","cost":210}
{"member_id":2,"age":28,"gender":"Female","date":"2023-10-01","cost":150}
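Since the question is about nested rows, the one-object-per-line output may still be hard to scan. A minimal sketch of the "read the directory back file by file" step with indentation added, using only the Python standard library: the helper name pretty_print_json_dir is hypothetical, and the nested "profile" row in the demo is made up for illustration, not taken from the data in the answer. It assumes the directory holds JSON Lines files whose names start with "part", as in the file listing printed by the answer's example.

```python
import json
import os
import tempfile


def pretty_print_json_dir(directory, indent=2):
    """Return each row found in part-* JSON Lines files as an indented JSON string.

    Hypothetical helper; assumes one JSON object per line, as Spark's JSON writer emits.
    """
    rendered = []
    for name in sorted(os.listdir(directory)):
        # Spark also writes _SUCCESS and hidden .crc files; only part-* files hold rows.
        if not name.startswith("part"):
            continue
        with open(os.path.join(directory, name), "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # each non-empty line is one JSON document
                    rendered.append(json.dumps(json.loads(line), indent=indent))
    return rendered


if __name__ == "__main__":
    # Stand-in for the directory Spark wrote to; the nested row is invented for the demo.
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "part-00000-demo.json"), "w", encoding="utf-8") as f:
        f.write('{"member_id": 1, "profile": {"age": 35, "gender": "Male"}}\n')
    for row in pretty_print_json_dir(demo_dir):
        print(row)
```

With the answer's example, this would be called as pretty_print_json_dir(temp_dir) after the write step. Parsing each line and re-serializing it with indent makes nested structs expand over several lines instead of staying on one long line.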