Reading a Hive table in Spark

e4eetjau · posted 2021-05-27 · in Spark

Given a Hive table with billions of records, which of the following approaches is better?

Directly, via HiveContext:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("DCA_HIVE_HDFS");
SparkContext sc = new SparkContext(conf);
HiveContext hc = new HiveContext(sc);
DataFrame df = hc.table(tableName);   // read through the Hive metastore
df.write().orc(outputHdfsFile);       // write out as ORC
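
(For reference, on Spark 2.x and later the same direct read is usually written with SparkSession instead of HiveContext. A minimal sketch, assuming Hive support is available on the classpath; the variable names are carried over from the question:)

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// enableHiveSupport() requires hive-site.xml and the Hive jars on the classpath.
SparkSession spark = SparkSession.builder()
        .appName("DCA_HIVE_HDFS")
        .enableHiveSupport()
        .getOrCreate();

Dataset<Row> df = spark.table(tableName);   // same metastore-backed read
df.write().orc(outputHdfsFile);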

Using JDBC:

import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("DCA_HIVE_HDFS");
SparkContext sc = new SparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

// Load the JDBC driver class (e.g. org.apache.hive.jdbc.HiveDriver).
try {
    Class.forName(driverName);
} catch (ClassNotFoundException e) {
    e.printStackTrace();
}

Properties props = new Properties();
props.setProperty("user", userName);
props.setProperty("password", password);
props.setProperty("driver", driverName);

// Pull the table over a JDBC connection (e.g. HiveServer2) and write it as ORC.
DataFrame df = sqlContext.read().jdbc(connectionUri, tableName, props);
df.write().orc(outputHdfsFile);
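
Worth noting for a table of this size: the two-argument jdbc() call above reads the entire table through a single partition (one connection, one task). Spark's DataFrameReader also offers a partitioned overload that splits the read across executors on a numeric column. A minimal sketch, assuming the table has a roughly evenly distributed numeric column; the column name "id" and the bounds below are illustrative, not from the question:

// Spark issues numPartitions parallel queries, each covering a slice of
// [lowerBound, upperBound] on the partition column.
DataFrame df = sqlContext.read().jdbc(
        connectionUri,
        tableName,
        "id",           // numeric partition column (hypothetical)
        0L,             // lowerBound (illustrative)
        1000000000L,    // upperBound (illustrative)
        100,            // numPartitions (illustrative)
        props);
df.write().orc(outputHdfsFile);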

No answers yet.
