Transferring data from Oracle to Hive with Spark

ulydmbyx · asked 2021-06-26 · tagged Hive

How can I use Spark to load data from an Oracle database into a DataFrame or RDD, and then write that data into a Hive table?
Here is the code I have so far:

import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public static void main(String[] args) {

    // Local Spark context for testing the Oracle -> Hive transfer
    SparkConf conf = new SparkConf().setAppName("Data transfer test (Oracle -> Hive)").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // JDBC connection options for the Oracle source table
    HashMap<String, String> options = new HashMap<>();
    options.put("url", "jdbc:oracle:thin:@<ip>:<port>:orcl");
    options.put("dbtable", "ACCOUNTS");
    options.put("user", "username");
    options.put("password", "12345");
    options.put("driver", "oracle.jdbc.OracleDriver");
    options.put("numPartitions", "4");

    // Read the Oracle table into a DataFrame via the JDBC data source
    DataFrame oracleDataFrame = sqlContext.read()
              .format("jdbc")
              .options(options)
              .load();

}
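As an aside on the options above: with the Spark JDBC source, numPartitions on its own does not parallelize the read. The scan is only split when partitionColumn, lowerBound, and upperBound are supplied together. A minimal sketch is below; ACCOUNT_ID and the bound values are placeholders, not part of the original question.

    // Hypothetical partitioned read: ACCOUNT_ID is an assumed numeric column,
    // and the bounds are illustrative values only.
    options.put("partitionColumn", "ACCOUNT_ID");
    options.put("lowerBound", "1");
    options.put("upperBound", "1000000");
    options.put("numPartitions", "4");

    DataFrame partitionedOracleDataFrame = sqlContext.read()
              .format("jdbc")
              .options(options)
              .load();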

However, if I create a HiveContext instance in order to work with Hive:

HiveContext hiveContext = new HiveContext(sc);

I get the following error:

ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser oracle.xml.jaxp.JXDocumentBuilderFactory@51be472e:java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
        at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:614)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2534)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2503)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2409)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1144)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1116)
        at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:525)
        at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:543)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:437)
        at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:2750)
        at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:2713)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:185)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:329)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239)
        at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:443)
        at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
        at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:103)
        at replicator.ImportFromOracleToHive.init(ImportFromOracleToHive.java:52)
        at replicator.ImportFromOracleToHive.main(ImportFromOracleToHive.java:76)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Answer by bt1cpqcv:

This looks like a problem with an outdated Xerces dependency, as detailed in this question. My guess is that you are pulling it in transitively somehow, but it's impossible to say without seeing your pom.xml. You'll notice from the posted stack trace that the error originates in the Hadoop Common Configuration object, not in Spark itself. The fix is to make sure you are using a sufficiently recent version:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>
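Once the HiveContext can be constructed, the second half of your question (writing the JDBC DataFrame into Hive) could look roughly like the sketch below. This is a minimal sketch assuming Spark 1.x, reusing the sc and options from your snippet; the target table name accounts_copy is a placeholder.

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

// Use a Hive-aware context instead of the plain SQLContext
HiveContext hiveContext = new HiveContext(sc);

// Read the Oracle table through the HiveContext via JDBC
DataFrame oracleDataFrame = hiveContext.read()
        .format("jdbc")
        .options(options)
        .load();

// Write the result into a Hive-managed table; "accounts_copy" is a placeholder name
oracleDataFrame.write()
        .mode(SaveMode.Overwrite)
        .saveAsTable("accounts_copy");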
