I am using the Spark BigQuery connector in IntelliJ IDEA to connect to a Google BigQuery table. When I read and print the table, no records are displayed, although the table metadata is fetched from BigQuery. The table emp does contain records.
val spark= SparkSession.builder.appName("my first app")
.config("spark.master", "local")
.getOrCreate()
val myDF = spark.read.format("bigquery")
  .option("credentialsFile", "src\\main\\resources\\gcloud-rkg-cred.json")
  .load("decoded-tribute-279515:gcp_test_db.emp")
val newDF = myDF.select("empid", "empname", "salary")
println(myDF.printSchema)
println(newDF.printSchema)
println(newDF.show)
printSchema on both myDF and newDF returns the columns, but newDF.show only returns ().
My build.sbt file is as follows:
name := "myTestGCPProject"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.16.1"
Snapshot of the table schema and data:
After I tried version 0.17.0, as David suggested in the comments below, the following is the error I received.
20/07/23 11:17:43 INFO ComputeEngineCredentials: Failed to detect whether we are running on Google Compute Engine.
root
|-- empid: long (nullable = false)
|-- empname: string (nullable = false)
|-- location: string (nullable = false)
|-- salary: long (nullable = false)
root
|-- empid: long (nullable = false)
|-- empname: string (nullable = false)
|-- salary: long (nullable = false)
20/07/23 11:17:51 INFO DirectBigQueryRelation: Querying table decoded-tribute-279515.gcp_test_db.emp, parameters sent from Spark: requiredColumns=[empid,empname,salary], filters=[]
20/07/23 11:17:51 INFO DirectBigQueryRelation: Going to read from decoded-tribute-279515.gcp_test_db.emp columns=[empid, empname, salary], filter=''
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 49
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 79
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 71
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 69
20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_5_piece0 on raghav-VAIO:49977 in memory (size: 6.5 KB, free: 639.2 MB)
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 88
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 83
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 58
20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_4_piece0 on raghav-VAIO:49977 in memory (size: 20.8 KB, free: 639.2 MB)
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 14
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 62
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 87
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 76
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 8
20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_3_piece0 on raghav-VAIO:49977 in memory (size: 7.2 KB, free: 639.3 MB)
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 9
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 10
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 72
20/07/23 11:17:52 INFO BlockManagerInfo: Removed broadcast_1_piece0 on raghav-VAIO:49977 in memory (size: 4.5 KB, free: 639.3 MB)
20/07/23 11:17:52 INFO ContextCleaner: Cleaned accumulator 42
[error] (run-main-0) com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.UnknownException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNKNOWN: Channel Pipeline: [WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
[error] com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.UnknownException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNKNOWN: Channel Pipeline: [WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:47)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1083)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1174)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:969)
[error] at com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:760)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:545)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:515)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
[error] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[error] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] Caused by: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNKNOWN: Channel Pipeline: [WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.Status.asRuntimeException(Status.java:533)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:515)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
[error] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[error] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] Caused by: com.google.cloud.spark.bigquery.repackaged.io.netty.channel.ChannelPipelineException: com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.ProtocolNegotiators$ClientTlsHandler.handlerAdded() has thrown an exception; removed.
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:624)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:572)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:515)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.fireProtocolNegotiationEvent(ProtocolNegotiators.java:767)
[error] at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.ProtocolNegotiators$WaitUntilActiveHandler.channelActive(ProtocolNegotiators.java:676)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:305)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[error] at java.lang.Thread.run(Thread.java:748)
[error] Caused by: java.lang.RuntimeException: ALPN unsupported. Is your classpath configured correctly? For Conscrypt, add the appropriate Conscrypt JAR to classpath and set the security provider. For Jetty-ALPN, see http://www.eclipse.org/jetty/documentation/current/alpn-chapter.html#alpn-starting
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.handler.ssl.JdkAlpnApplicationProtocolNegotiator$FailureWrapper.wrapSslEngine(JdkAlpnApplicationProtocolNegotiator.java:122)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.handler.ssl.JdkSslContext.configureAndWrapEngine(JdkSslContext.java:360)
[error] at com.google.cloud.spark.bigquery.repackaged.io.netty.handler.ssl.JdkSslContext.newEngine(JdkSslContext.java:335)
Please help.
1 Answer
Edit: please make sure you are using the spark-bigquery-with-dependencies artifact.
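For reference, a minimal sketch of that dependency line using the 0.17.0 version mentioned in the question (assuming nothing else in build.sbt changes):

libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.17.0"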
You don't need println() for these methods; try calling them directly instead.
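As an illustrative sketch using the DataFrames from the question: printSchema and show return Unit in Spark and write their output to standard output themselves, which is why println of their result prints (). Called directly, the pattern would look like this:

myDF.printSchema()   // prints the schema of the full table
newDF.printSchema()  // prints the schema of the projected columns
newDF.show()         // prints the first rows of empid, empname, salary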