Why does R give me "Errore: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:" when I try to import .parquet files?

pw136qt2 · asked 2021-05-22 · tagged: Spark

I need to work with data recorded by a sensor over 5 months. There are 30 columns sampled every 5 seconds, so the number of readings is fairly large.
The data was sent to me as a folder containing 4 ".parquet" files.
I now need to import them into RStudio to run some analyses.
So far I have only set up a local Spark cluster through the sparklyr interface. Here is the code I ran:

library(sparklyr)

sc <- spark_connect(master = "local", version = "3.0")
parquet_dir <- "C:/Users/Utente/Desktop/Tesi/DatiP"
spark_read_parquet(sc, name = "dati", path = parquet_dir)

The problem is that when I try to load the data something goes wrong, and I get the following error:

Errore: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition, true, [id=#3117]
+- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#15332L])
   +- Scan In-memory table `dati`
         +- InMemoryRelation [Time#14013, Wind Sensor Direction#14014, Wind Sensor Speed#14015, Air temp#14016, Air humidity#14017, Air pressure#14018, Rain amount#14019, Rain duration#14020, Rain intensity#14021, Hail amount#14022, Hail duration#14023, Hail intensity#14024, Thermoelement 1#14025, Thermoelement 2#14026, Thermoelement 3#14027, Thermoelement 4#14028, Thermoelement 17#14029, Thermoelement 18#14030, Thermoelement 19#14031, Thermoelement 20#14032, Thermoelement 21#14033, Thermoelement 22#14034, Thermoelement 23#14035, Thermoelement 24#14036, ... 27 more fields], StorageLevel(disk, memory, deserialized, 1 replicas)
               +- *(1) ColumnarToRow
                  +- FileScan parquet [Time#14013,Wind Sensor Direction#14014,Wind Sensor Speed#14015,Air temp#14016,Air humidity#14017,Air pressure#14018,Rain amount#14019,Rain duration#14020,Rain intensity#14021,Hail amount#14022,Hail duration#14023,Hail intensity#14024,Thermoelement 1#14025,Thermoelement 2#14026,Thermoelement 3#14027,Thermoelement 4#14028,Thermoelement 17#14029,Thermoelement 18#14030,Thermoelement 19#14031,Thermoelement 20#14032,Thermoelement 21#14033,Thermoelement 22#14034,Thermoelement 23#14035,Thermoelement 24#14036,... 27 more fields] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/C:/Users/Utente/Desktop/Tesi/DatiP], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Time:timestamp,Wind Sensor Direction:double,Wind Sensor Speed:double,Air temp:double,Air h...

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
    at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:162)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:316)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:382)
    at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:2979)
    at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:2978)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2978)
    at org.apache.spark.sql.execution.command.CacheTableCommand.run(cache.scala:62)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at sparklyr.Invoke.invoke(invoke.scala:147)
    at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
    at sparklyr.StreamHandler.read(stream.scala:61)
    at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:58)
    at scala.util.control.Breaks.breakable(Breaks.scala:42)
    at sparklyr.BackendHandler.channelRead0(handler.scala:39)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at
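From the trace it looks like the exception is raised while Spark caches the freshly read table and counts its rows (CacheTableCommand calling Dataset.count), which sparklyr triggers by default right after spark_read_parquet. For reference, this is a minimal sketch of the same read with that caching step deferred via the documented memory argument (dati_tbl is just a placeholder name, and I have not confirmed this avoids the error):

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local", version = "3.0")

# memory = FALSE skips the immediate cache (and the count that goes with it)
dati_tbl <- spark_read_parquet(
  sc,
  name   = "dati",
  path   = "C:/Users/Utente/Desktop/Tesi/DatiP",
  memory = FALSE
)

# Small action just to check whether the files themselves can be scanned
dati_tbl %>% head(10)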

I have read several answers concerning the

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

error; however, I noticed that the people asking for clarification about this kind of error were trying to perform joins, which is not my case: I am simply trying to import the data.
Could this problem be directly related to the structure of the data I am trying to load?
I also tried viewing the data with a ".parquet file viewer" from GitHub, but nothing about it seems unusual.
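In case it helps to answer that, this is roughly how the schema of the folder could also be inspected directly from R with the arrow package (an assumption on my side: arrow is not part of the workflow above, and what it reports may differ from what Spark infers):

library(arrow)

# Open the folder of .parquet files as a single dataset
ds <- open_dataset("C:/Users/Utente/Desktop/Tesi/DatiP", format = "parquet")

# Column names and types as arrow sees them; the plan in the trace above
# shows names containing spaces, e.g. "Wind Sensor Direction"
print(ds$schema)
names(ds)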
Thanks in advance for your help.
