I have the following Cloudera cluster specification:
I created a simple Spark SQL application that queries Hive tables. The tables are external: the data for the healtpersonalcare_reviews table is stored as JSON files, and the data for the healtpersonalcare_ratings table is in CSV format (115 MB). Here is my code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val warehouseLocation = "/hive/warehouse"
val args_list = args.toList

val conf = new SparkConf()
  .set("spark.sql.warehouse.dir", warehouseLocation)
  .set("spark.kryoserializer.buffer.max", "1024m")

val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()

val table_view_name = args_list(0)
val limit = args_list(1)

// JSON SerDe required for the external table backed by JSON files
val df_addjar = spark.sql("ADD JAR /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar")
val df_use = spark.sql("use testing")

// Left join the reviews table against the ratings table
val df = spark.sql("SELECT hp.asin, hp.helpful, hp.overall, hp.reviewerid, hp.reviewername, hp.reviewtext, hp.reviewtime, hp.summary, hp.unixreviewtime FROM testing.healtpersonalcare_reviews hp LEFT JOIN testing.health_ratings hr ON (hp.reviewerid = hr.reviewerid)")

val df_create_join_table = spark.sql("CREATE TABLE IF NOT EXISTS healtpersonalcare_joins (asin string, helpful array<int>, overall double, reviewerid string, reviewername string, reviewtext string, reviewtime string, summary string, unixreviewtime int)")

df.cache()
// Pulls the entire join result into the driver JVM
df.collect().foreach(println)
System.exit(0)
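As an aside, the healtpersonalcare_joins table above is created but never populated. A minimal sketch of writing the join result straight into it (assuming the column order matches the table definition), which would keep the rows on the executors rather than on the driver:

// Hypothetical alternative: write the join result into the new table
// rather than collecting it to the driver (column order assumed to match).
df.write.mode("append").insertInto("testing.healtpersonalcare_joins")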
I run the application with the following command:
spark-submit --class org.sia.chapter03app.App --master yarn --deploy-mode client --executor-memory 1024m --driver-memory 1024m --conf spark.driver.maxResultSize=2g --verbose /root/sparktest/original-chapter03app-0.0.1-SNAPSHOT.jar name 10
I tried varying the values of --executor-memory and --driver-memory:
With --executor-memory 1024m --driver-memory 1024m I get the error "java.lang.OutOfMemoryError: Java heap space".
With --executor-memory 2048m --driver-memory 2048m I get "Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded".
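I also notice that my limit argument is parsed but never used. A minimal sketch of how it could bound what reaches the driver (hypothetical use of that argument):

// Hypothetical: cap the rows pulled back to the driver with the parsed
// `limit` argument instead of collecting the full join result.
df.limit(limit.toInt).collect().foreach(println)

// Or print a bounded number of rows without a full collect:
df.show(limit.toInt, truncate = false)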
Has anyone run into a problem like this? What is the solution? Thanks.