Spark Google BigQuery table DataFrame returns no rows and is empty, but printSchema returns the metadata

v8wbuo2f · posted 2021-05-27 in Spark
Followers (0) | Answers (1) | Views (441)

I am using the spark-bigquery connector from the IntelliJ IDE to connect to a Google BigQuery table. When I read and print the table, no records are displayed; however, the metadata is fetched from BigQuery. The table emp does contain records.

val spark = SparkSession.builder.appName("my first app")
  .config("spark.master", "local")
  .getOrCreate()

val myDF = spark.read.format("bigquery")
  .option("credentialsFile", "src\\main\\resources\\gcloud-rkg-cred.json")
  .load("decoded-tribute-279515:gcp_test_db.emp")
val newDF = myDF.select("empid", "empname", "salary")
println(myDF.printSchema)
println(newDF.printSchema)
println(newDF.show)

The printSchema calls on myDF and newDF return the columns, but newDF.show only returns ().
My build.sbt file is as follows:

name := "myTestGCPProject"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.16.1"

A snapshot of the table's schema and data:

[screenshot of the emp table schema and rows not reproduced here]
After I tried 0.17.0, as David suggested in the comments below, this is the error I received:

20/07/23 11:17:43 INFO ComputeEngineCredentials: Failed to detect whether we are running on Google Compute Engine.
    root
     |-- empid: long (nullable = false)
     |-- empname: string (nullable = false)
     |-- location: string (nullable = false)
     |-- salary: long (nullable = false)
    root
     |-- empid: long (nullable = false)
     |-- empname: string (nullable = false)
     |-- salary: long (nullable = false)
    20/07/23 11:17:51 INFO DirectBigQueryRelation: Querying table decoded-tribute-279515.gcp_test_db.emp, parameters sent from Spark: requiredColumns=[empid,empname,salary], filters=[]
    20/07/23 11:17:51 INFO DirectBigQueryRelation: Going to read from decoded-tribute-279515.gcp_test_db.emp columns=[empid, empname, salary], filter=''
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 49
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 79
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 71
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 69
    20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_5_piece0 on raghav-VAIO:49977 in memory (size: 6.5 KB, free: 639.2 MB)
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 88
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 83
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 58
    20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_4_piece0 on raghav-VAIO:49977 in memory (size: 20.8 KB, free: 639.2 MB)
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 14
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 62
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 87
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 76
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 8
    20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_3_piece0 on raghav-VAIO:49977 in memory (size: 7.2 KB, free: 639.3 MB)
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 9
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 10
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 72
    20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_1_piece0 on raghav-VAIO:49977 in memory (size: 4.5 KB, free: 639.3 MB)
    20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 42
    [error] (run-main-0) com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.UnknownException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNKNOWN: Channel Pipeline: [WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
    [error] com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.UnknownException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNKNOWN: Channel Pipeline: [WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:47)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1083)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1174)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:969)
    [error]         at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:760)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:545)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:515)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    [error]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    [error]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    [error]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    [error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [error]         at java.lang.Thread.run(Thread.java:748)
    [error] Caused by: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNKNOWN: Channel Pipeline: [WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.Status.asRuntimeException(Status.java:533)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:515)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    [error]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    [error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    [error]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    [error]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    [error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [error]         at java.lang.Thread.run(Thread.java:748)
    [error] Caused by: com.google.cloud.spark.bigquery.repackaged.io.netty.channel.ChannelPipelineException: com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.ProtocolNegotiators$ClientTlsHandler.handlerAdded() has thrown an exception; removed.
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:624)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:572)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:515)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.fireProtocolNegotiationEvent(ProtocolNegotiators.java:767)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.ProtocolNegotiators$WaitUntilActiveHandler.channelActive(ProtocolNegotiators.java:676)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:305)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    [error]         at java.lang.Thread.run(Thread.java:748)
    [error] Caused by: java.lang.RuntimeException: ALPN unsupported. Is your classpath configured correctly? For Conscrypt, add the appropriate Conscrypt JAR to classpath and set the security provider. For Jetty-ALPN, see http://www.eclipse.org/jetty/documentation/current/alpn-chapter.html#alpn-starting
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.handler.ssl.JdkAlpnApplicationProtocolNegotiator$FailureWrapper.wrapSslEngine(JdkAlpnApplicationProtocolNegotiator.java:122)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.handler.ssl.JdkSslContext.configureAndWrapEngine(JdkSslContext.java:360)
    [error]         at com.google.cloud.spark.bigquery.repackaged.io.netty.handler.ssl.JdkSslContext.newEngine(JdkSslContext.java:335)

Please help.


50pmv0ei #1

EDIT: Please make sure you are using the spark-bigquery-with-dependencies artifact.
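A minimal sketch of the corresponding build.sbt dependency line, assuming the 0.17.0 version mentioned in the question:

libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.17.0"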
You don't need println() for these methods; try:

myDF.printSchema()
newDF.printSchema()
newDF.show()
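
Putting both points together, a minimal sketch of the corrected read, reusing the project, dataset, and credentials path from the question:

// Read the BigQuery table; printSchema() and show() return Unit and print
// to the console themselves, so they should not be wrapped in println.
val myDF = spark.read.format("bigquery")
  .option("credentialsFile", "src\\main\\resources\\gcloud-rkg-cred.json")
  .load("decoded-tribute-279515:gcp_test_db.emp")

val newDF = myDF.select("empid", "empname", "salary")
newDF.printSchema()
newDF.show()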
