Spark with Hive: unable to instantiate SparkSession with Hive support because Hive classes are not found

czq61nw1 · posted 2021-05-31 in Hadoop

The Spark application loads data from Hive:

// connect to the remote Hive metastore and enable Hive support
SparkSession spark = SparkSession.builder()
        .appName(topics)
        .config("hive.metastore.uris", "thrift://device1:9083")
        .enableHiveSupport()
        .getOrCreate();

I launch the job with spark-submit like this:

spark-submit --master local[*] --class zhihu.SparkConsumer target/original-kafka-consumer-0.1-SNAPSHOT.jar  --jars spark-hive_2.11-2.4.4.jar
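
Note that spark-submit treats everything after the application JAR as arguments to the application's main class, so the --jars flag above is most likely being passed to zhihu.SparkConsumer rather than to Spark. For the flag to take effect it has to come before the JAR, roughly:

spark-submit --master local[*] --class zhihu.SparkConsumer --jars spark-hive_2.11-2.4.4.jar target/original-kafka-consumer-0.1-SNAPSHOT.jar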

The Maven pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.zhihu</groupId>
  <artifactId>kafka-consumer</artifactId>
  <packaging>jar</packaging>
  <version>0.1-SNAPSHOT</version>
  <name>kafkadev</name>
  <url>http://maven.apache.org</url>
  <repositories>
    <repository>
      <!-- Proper URL for Cloudera maven artifactory -->
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>

  </repositories>
<dependencies>

<!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core -->
<!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.8.2</version>
</dependency>
<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.4.4</version>
</dependency>

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.1.0</version>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.log4j</groupId>
            <artifactId>log4j-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.2</version>
</dependency>

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>2.1.1-cdh6.2.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-service</artifactId>
    <version>2.1.1-cdh6.2.0</version>
</dependency>

<!-- runtime Hive -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-common</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-beeline</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-shims</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-serde</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-contrib</artifactId>
    <version>2.1.1-cdh6.2.0</version>
    <scope>runtime</scope>
</dependency>

</dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>**/Log4j2Plugins.dat</exclude>
                                    </excludes>
                                </filter>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <artifactSet>
                                <excludes>
                                    <exclude>classworlds:classworlds</exclude>
                                    <exclude>junit:junit</exclude>
                                    <exclude>jmock:*</exclude>
                                    <exclude>*:xml-apis</exclude>
                                    <exclude>org.apache.maven:lib:tests</exclude>
                                </excludes>
                            </artifactSet>
                            <skip>true</skip>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

        </plugins>
    </build>
</project>

It all looks fine, but it always fails with:

20/05/07 12:03:17 INFO spark.SparkContext: Added JAR file:/data/projects/zhihu_scraper/consumers/target/original-kafka-consumer-0.1-SNAPSHOT.jar at spark://device2:42395/jars/original-kafka-consumer-0.1-SNAPSHOT.jar with timestamp 1588824197724
20/05/07 12:03:17 INFO executor.Executor: Starting executor ID driver on host localhost
20/05/07 12:03:17 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33849.
20/05/07 12:03:17 INFO netty.NettyBlockTransferService: Server created on device2:33849
20/05/07 12:03:17 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/05/07 12:03:17 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager device2:33849 with 366.3 MB RAM, BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63e5e5b4{/metrics/json,null,AVAILABLE,@Spark}
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
    at zhihu.SparkConsumer.main(SparkConsumer.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/07 12:03:18 INFO spark.SparkContext: Invoking stop() from shutdown hook

I have already tried every answer from the post "How to create SparkSession with Hive support", but none of them worked for me.

o2g1uqev 1#

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>

I don't see why these should be compile scope; runtime would arguably be the right scope here. Since you are using the maven-shade-plugin, you can package an uber JAR that bundles all of the dependencies into a single archive, so everything ends up on the classpath and nothing is missed. One caveat: by default the shade plugin keeps the thin, pre-shade JAR as target/original-kafka-consumer-0.1-SNAPSHOT.jar and writes the shaded uber JAR to target/kafka-consumer-0.1-SNAPSHOT.jar, so make sure you submit the shaded one.
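
A quick sanity check, sketched under the assumption that the shade step actually ran during mvn package (org.apache.spark.sql.hive.HiveSessionStateBuilder is the class that enableHiveSupport() probes for):

# the shaded JAR should contain the Spark Hive integration classes
jar tf target/kafka-consumer-0.1-SNAPSHOT.jar | grep HiveSessionStateBuilder

# submit the shaded JAR, not the original-*.jar
spark-submit --master local[*] --class zhihu.SparkConsumer target/kafka-consumer-0.1-SNAPSHOT.jar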
Also, hive-site.xml should be on the classpath; then there is no need to configure the metastore URIs programmatically at all.
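
For example, with hive-site.xml under src/main/resources (or in $SPARK_CONF_DIR), a minimal sketch of the builder, assuming that file already carries hive.metastore.uris:

// hive-site.xml on the classpath supplies hive.metastore.uris,
// so nothing Hive-specific needs to be set in code
SparkSession spark = SparkSession.builder()
        .appName(topics)
        .enableHiveSupport()
        .getOrCreate();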
