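import org.apache.spark.sql.functions.{col, from_json, unix_timestamp}
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import org.apache.spark.sql.types.StringType

// df (the Kafka source), processor and config are defined elsewhere in the job
val query = df.withColumn("value", col("value").cast(StringType))
  .withColumn("value", from_json(col("value"), processor.Schema))
  .select(unix_timestamp(col("timestamp")).alias("kafka_time"), col("value.*"))
  .filter(processor.filter)
  .transform(processor.transform)
  .writeStream
  .format("parquet")
  .partitionBy("grass_date")
  .option("path", config.savePath)
  .option("checkpointLocation", config.checkpointLocation)
  .trigger(Trigger.ProcessingTime("15 minutes"))
  .outputMode(OutputMode.Append)
  .start()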
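When running a Structured Streaming job with the parquet file sink, Spark creates a _spark_metadata folder under the job's output path. Because of this folder, partition discovery does not seem to work. Is it possible to get rid of the _spark_metadata folder, or at least change its location?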
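A minimal sketch of the "hidden path" rule that, as I understand it, Spark 2.x applies when listing files for partition discovery (modeled on InMemoryFileIndex.shouldFilterOut; this is an approximation written for illustration, not Spark's actual code). Under that rule, names starting with "_" that contain no "=" are skipped, so _spark_metadata itself should not be picked up as a partition directory:

```scala
object HiddenPathFilter {
  // Approximation (assumption) of Spark 2.x's listing filter:
  // skip names starting with "_" (unless they look like "col=value") or ".",
  // and in-progress copies, but keep Parquet's _metadata/_common_metadata files.
  def shouldFilterOut(name: String): Boolean = {
    val exclude = (name.startsWith("_") && !name.contains("=")) ||
      name.startsWith(".") || name.endsWith("._COPYING_")
    val include = name.startsWith("_common_metadata") || name.startsWith("_metadata")
    exclude && !include
  }

  def main(args: Array[String]): Unit = {
    // The two entries found directly under savePath in the question
    val entries = Seq("_spark_metadata", "grass_date=2020-05-20")
    entries.foreach { e =>
      val status = if (shouldFilterOut(e)) "skipped" else "listed"
      println(s"$e -> $status")
    }
  }
}
```

Running main prints "_spark_metadata -> skipped" followed by "grass_date=2020-05-20 -> listed".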
Edit 1: I am using Spark 2.4.4.
Edit 2: I can create a Hive table on config.savePath, but I don't see any data in that table. Here is what I have under savePath:
[xxx]$ hadoop fs -ls /tmp/ravi.mondal/product_click/remind_me_button
Found 2 items
drwxrwxr-x - ravi.mondal supergroup 0 2020-05-20 12:36 /tmp/ravi.mondal/product_click/remind_me_button/_spark_metadata
drwxrwxr-x - ravi.mondal supergroup 0 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20
[xxx]$
[xxx]$ hadoop fs -ls /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20
Found 27 items
-rw-rw-r-- 3 ravi.mondal supergroup 1575 2020-05-20 12:46 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00009-34ec06fb-4506-4e73-963b-4441bd00410d.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1798 2020-05-20 12:31 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00017-e0d550b4-225c-44d5-a539-1e4e38a1069e.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1681 2020-05-20 11:46 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00023-9caf4a09-6c99-482b-9212-f03513c80070.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1561 2020-05-20 12:32 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00028-493b6d84-9638-4428-a0c7-99252d2efcd5.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1737 2020-05-20 12:32 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00032-4a72a3f3-a221-4071-b4f5-a49d16aadbba.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1773 2020-05-20 12:47 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00036-dca34760-861f-45f8-8ce0-51feb5ac2768.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1539 2020-05-20 11:47 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00042-cc062316-2afd-49c2-9ad8-8709693b2986.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1584 2020-05-20 12:47 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00048-9432d414-2aaa-424a-84b8-cd4364fa4e87.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1665 2020-05-20 12:17 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00049-a8c3f0f0-80f5-4690-a928-1f2108aa39df.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1656 2020-05-20 11:30 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00051-0c016684-cf71-4681-b1cd-fcb325452e89.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1825 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00063-2dc3d00d-46ed-41cc-b189-2ed475ed5c5c.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1584 2020-05-20 12:20 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00065-70b9e314-8292-4e48-81c4-e3b983977563.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1629 2020-05-20 12:50 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00065-bfed91f6-1398-4038-aee7-56cb0cf87414.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1584 2020-05-20 12:18 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00074-4beb1880-2bc0-4001-9684-546e240b6888.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1665 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00075-adbc8782-7b6f-4dbd-a8f8-e878648b1ff2.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1584 2020-05-20 11:31 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00081-9e56a444-161f-43d8-9e50-bf24c6484d83.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1688 2020-05-20 11:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00081-f246df73-8db5-49f4-9682-9a12bdeb0b5a.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1656 2020-05-20 11:30 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00083-8e0fdecb-8d0b-49d5-8e93-6edeee1539fc.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1656 2020-05-20 12:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00092-b292d9ed-ce41-4426-833d-38f994af87d4.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1665 2020-05-20 12:05 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00105-59bf04c1-b79f-42f1-995d-f3673486886d.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1823 2020-05-20 12:05 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00108-00f0fc98-4e10-43c5-b5b3-9e0a10a7db03.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1737 2020-05-20 12:51 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00109-1389f070-e430-4246-95da-d2d4606b46ec.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1672 2020-05-20 12:20 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00109-b8e42728-ef8c-49d9-8451-aab55e3045cc.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1584 2020-05-20 11:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00110-de37d04f-f26e-4a9c-872b-3b04ac8a188c.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1672 2020-05-20 11:49 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00112-b06ee506-04c1-4969-bf12-069a9a88f222.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1825 2020-05-20 12:51 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00119-f36fc943-3f97-47ff-9502-a4dbcd69b591.c000.snappy.parquet
-rw-rw-r-- 3 ravi.mondal supergroup 1584 2020-05-20 12:05 /tmp/ravi.mondal/product_click/remind_me_button/grass_date=2020-05-20/part-00124-bcfcbd1c-15aa-4410-b016-719715c8e775.c000.snappy.parquet
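For context on the listing above: the grass_date=2020-05-20 directory follows the Hive-style "column=value" naming that partitionBy("grass_date") produces and that partition discovery reads back. A minimal sketch of that mapping, using a hypothetical helper named PartitionDirName.parse (real discovery also unescapes names and infers column types, which this sketch does not do):

```scala
object PartitionDirName {
  // Hypothetical helper: split a Hive-style directory name "column=value"
  // into (partition column, raw string value); None if it is not of that form.
  def parse(dirName: String): Option[(String, String)] =
    dirName.split("=", 2) match {
      case Array(column, value) if column.nonEmpty => Some(column -> value)
      case _                                       => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("grass_date=2020-05-20")) // Some((grass_date,2020-05-20))
    println(parse("_spark_metadata"))       // None
  }
}
```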
Answer:
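Looking at the Spark source code, there is no way to change the location of the _spark_metadata directory. For reference, the FileStreamSink source code in the Spark Git repository is where this directory is created, and it is always created inside the specified output path.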
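To illustrate the answer's point, here is a sketch of how the log location is derived. As I read Spark 2.4's FileStreamSink, the log path is built as new Path(basePath, "_spark_metadata"), i.e. a fixed-name child of the sink path with no option to override it. The helper below (SinkMetadataPath.metadataDirFor, a hypothetical name) mimics that with plain strings so it runs without Hadoop on the classpath:

```scala
object SinkMetadataPath {
  // Fixed directory name used by the file sink's metadata log (Spark 2.x)
  val MetadataDirName = "_spark_metadata"

  // Hypothetical helper: the log directory is always a direct child of the
  // sink path, so "moving" it means changing the sink path itself.
  def metadataDirFor(sinkPath: String): String =
    sinkPath.stripSuffix("/") + "/" + MetadataDirName

  def main(args: Array[String]): Unit = {
    val sinkPath = "/tmp/ravi.mondal/product_click/remind_me_button"
    println(metadataDirFor(sinkPath))
  }
}
```

With the savePath from the question, this prints /tmp/ravi.mondal/product_click/remind_me_button/_spark_metadata, which matches the directory in the hadoop fs -ls output.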