Developing a Spark application against a local cluster from IntelliJ

jw5wzhpr · posted 2021-06-29 in Hive

I have tried many ways to run the application against the local cluster, but none of them worked.
I am using CDH 5.7, and the Spark version is 1.6. I am trying to create a DataFrame from Hive on CDH 5.7.
Everything works fine when I run the code from spark-shell, but I don't know how to set up an IntelliJ run configuration for an efficient development environment.
Here is my code:

import org.apache.spark.{SparkConf, SparkContext}

object DataFrame {
  def main(args: Array[String]): Unit = {
    println("Hello DataFrame")

    val conf = new SparkConf() // skip loading external settings
      .setMaster("local") // could be "local[4]" for 4 threads
      .setAppName("DataFrame-Example")
      .set("spark.logConf", "true")

    val sc = new SparkContext(conf) 
    sc.setLogLevel("WARN") 
    println(s"Running Spark Version ${sc.version}")

    // HiveContext picks up hive-site.xml from the classpath; if none is found,
    // it falls back to a local embedded metastore.
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql("From src select key, value").collect().foreach(println)

  }
}

When I run this program from IntelliJ, the error output is as follows:

Hello DataFrame
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/29 11:30:57 INFO Slf4jLogger: Slf4jLogger started
Running Spark Version 1.6.0
16/05/29 11:31:02 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
16/05/29 11:31:02 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:329)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239)
at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:458)
at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:475)
at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:475)
at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:474)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at org.corus.spark.example.DataFrame$.main(DataFrame.scala:25)
at org.corus.spark.example.DataFrame.main(DataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:539)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
... 24 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:624)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:573)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:517)
... 25 more

Process finished with exit code 1

Does anyone know how to solve this? Thanks.
I found some material on this problem, but none of it worked:
https://www.linkedin.com/pulse/develop-apache-spark-apps-intellij-idea-windows-os-samuel-yee
http://blog.cloudera.com/blog/2014/06/how-to-create-an-intellij-idea-project-for-apache-hadoop/
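For completeness, a minimal sbt build for this kind of project might look like the following (a sketch only; the plain Apache 1.6.0 artifacts below are an assumption standing in for whatever CDH 5.7 actually ships):

name := "spark-dataframe-example"

scalaVersion := "2.10.6"  // Spark 1.6 is built against Scala 2.10

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0",
  "org.apache.spark" %% "spark-sql"  % "1.6.0",
  "org.apache.spark" %% "spark-hive" % "1.6.0"  // provides HiveContext
)

// When packaging for spark-submit these would normally be marked "provided";
// for running directly from IntelliJ they stay on the compile classpath.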

qco9c6ql1#

Thanks, everyone. I solved it myself. The problem was that the local Spark build (pulled in through Maven) knew nothing about the Hive metastore on the cluster.
The fix is simple.
Just add the following to the source code:

conf.set("spark.sql.hive.thriftServer.singleSession", "true")
System.setProperty("hive.metastore.uris", "thrift://hostname:serviceport")

It works! Let's play with Spark.
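Put together, the driver ends up looking roughly like this (a sketch, not a definitive version; hostname and serviceport remain placeholders for the actual metastore host and its Thrift port, commonly 9083):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object DataFrame {
  def main(args: Array[String]): Unit = {
    // Point the embedded Hive client at the cluster's metastore
    // before the HiveContext is created.
    System.setProperty("hive.metastore.uris", "thrift://hostname:serviceport")

    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("DataFrame-Example")
      .set("spark.sql.hive.thriftServer.singleSession", "true")

    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    val sqlContext = new HiveContext(sc)
    sqlContext.sql("From src select key, value").collect().foreach(println)

    sc.stop()
  }
}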
