java - How to get a running Spring Data Hadoop project with Cloudera CDH4 and Maven

Posted by gz5pxeao on 2021-06-04 in Hadoop

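Since Spring Data Hadoop has not been released yet, it is hard to find a working example configuration for using it with Cloudera.
Which dependencies do I need in order to run Spring Data Hadoop together with CDH4 (Hadoop 2.0.0-cdh4.1.3)?
With different dependency combinations I got the following exceptions:
NullPointerException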

Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError
    at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)
    at java.lang.Thread.run(Thread.java:722)
    Caused by: java.lang.NullPointerException
    at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405)
    at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123)
    ... 2 more

IPC version mismatch (server 7, client 4)

Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
    at org.apache.hadoop.ipc.Client.call(Client.java:1070)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
    at org.springframework.data.hadoop.mapreduce.JobFactoryBean.afterPropertiesSet(JobFactoryBean.java:208)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1545)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1483)
... 12 more

uxh89sit1#

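Here is an example of how to configure it.
Maven setup:
Notes:
(Optional) Exclude hadoop-streaming and hadoop-tools from spring-data-hadoop.
Add hadoop-common and hadoop-hdfs in the generic version: 2.0.0-cdhx.x.x
Add hadoop-tools and hadoop-streaming in the MR1 version: 2.0.0-mr1-cdhx.x.x
Spring Data Hadoop currently supports only MR1, so make sure that none of your other dependencies pulls in MR2. Check this with mvn dependency:tree!
pom.xml: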

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>com.example.main</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <java-version>1.7</java-version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spring.version>3.2.0.RELEASE</spring.version>
        <spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version>
        <hadoop.version.generic>2.0.0-cdh4.1.3</hadoop.version.generic>
        <hadoop.version.mr1>2.0.0-mr1-cdh4.1.3</hadoop.version.mr1>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${spring.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>commons-logging</groupId>
                    <artifactId>commons-logging</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context</artifactId>
            <version>${spring.version}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-hadoop</artifactId>
            <version>${spring.hadoop.version}</version>

            <exclusions>
                <!-- Exclude these Hadoop dependencies so they do not get mixed
                    with the ones provided by Cloudera. -->
                <exclusion>
                    <artifactId>hadoop-streaming</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>hadoop-tools</artifactId>
                    <groupId>org.apache.hadoop</groupId>
                </exclusion>
            </exclusions>

        </dependency>

        <!-- Hadoop Cloudera Dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version.generic}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version.generic}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-tools</artifactId>
            <version>${hadoop.version.mr1}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-streaming</artifactId>
            <version>${hadoop.version.mr1}</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${java-version}</source>
                    <target>${java-version}</target>
                </configuration>
            </plugin>

        </plugins>
    </build>

    <repositories>
        <repository>
            <id>spring-milestones</id>
            <url>http://repo.springsource.org/libs-milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>

        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>

        <repository>
            <id>spring-snapshot</id>
            <name>Spring Maven SNAPSHOT Repository</name>
            <url>http://repo.springframework.org/snapshot</url>
        </repository>
    </repositories>
</project>
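
The MR2 check mentioned in the notes can also be automated, so the build fails instead of relying on someone reading the mvn dependency:tree output. The fragment below is only a sketch using the maven-enforcer-plugin and its bannedDependencies rule. The execution id ban-mr2 is made up, and the artifact pattern is an assumption about how the MR2 client artifacts are named; adjust it to whatever dependency:tree actually shows in your project.

```xml
<!-- Sketch only: place inside <build><plugins> of the pom.xml above. -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <!-- No version given here on purpose; pin the one your build already uses. -->
    <executions>
        <execution>
            <!-- Hypothetical execution id, any unique name works. -->
            <id>ban-mr2</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <bannedDependencies>
                        <excludes>
                            <!-- Assumed MR2 artifact name pattern (groupId:artifactId). -->
                            <exclude>org.apache.hadoop:hadoop-mapreduce-client-*</exclude>
                        </excludes>
                    </bannedDependencies>
                </rules>
            </configuration>
        </execution>
    </executions>
</plugin>
```

With this in place, a dependency that matches the pattern, whether direct or transitive, should make the build fail with a message naming the offending artifact.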

Spring setup (applicationContext.xml):
Replace the fs.default.name value with the domain of your NameNode.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xsi:schemaLocation="
                    http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                    http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
                    http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd">

    <hdp:configuration id="hadoopConfiguration">
        fs.default.name=hdfs://example.com:8020
    </hdp:configuration>

    <hdp:job id="wordCountJob" 
        mapper="com.example.WordMapper"
        reducer="com.example.WordReducer" 
        input-path="/user/christian/input/test"
        output-path="/user/christian/output2" />

    <hdp:job-runner job-ref="wordCountJob" run-at-startup="true"
        wait-for-completion="true" />

</beans>

With this setup you should be able to access the cluster.
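
The configuration above only sets the file system. If the job should also be submitted to the cluster's MR1 JobTracker rather than run in the local JVM, the same hdp:configuration block can carry the JobTracker address as well. This is a sketch under assumptions: the host example.com and port 8021 are placeholders that are not taken from the setup above, so use the values of your own cluster.

```xml
<!-- Sketch only: replaces the hdp:configuration element shown above. -->
<!-- Assumption: the JobTracker runs on example.com:8021; check your cluster settings. -->
<hdp:configuration id="hadoopConfiguration">
    fs.default.name=hdfs://example.com:8020
    mapred.job.tracker=example.com:8021
</hdp:configuration>
```

mapred.job.tracker is the MR1 property name, which fits the MR1 artifacts selected in the pom.xml.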
Some references:
SpringSource forum - NullPointerException
Cloudera Maven repository


8yparm6h2#

Hey, you can get an example from https://github.com/spring-projects/spring-data-book.
How to build and run it is described in the README.
