Mahout: missing class when creating a sequence file

dpiehjr4 posted on 2021-05-30 in Hadoop

I followed the instructions on the Mahout site to convert existing files into a sequence file:

VectorWriter vectorWriter = SequenceFile.createWriter(filesystem,
                                                  configuration,
                                                  outfile,
                                                  LongWritable.class,
                                                  SparseVector.class);

long numDocs = vectorWriter.write(new VectorIterable(), Long.MAX_VALUE);

I have included the Mahout jar in my Maven project:

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.9</version>
</dependency>

But it does not write the file.
I get this error:

Caused by: java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:963)
at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1136)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)

On further investigation, the cause is:

Serilization class not found: java.lang.ClassNotFoundException: org.apache.hadoop.io.serializer.WritableSerialization

This suggests that I am missing a jar. Does anyone know which one?
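
Before hunting for a jar, it can help to ask the JVM whether a class is visible to a given class loader and, if so, where it was loaded from. The following is a minimal JDK-only sketch; the class name `ClassLocator` and its `locate` method are made up for illustration and are not part of Hadoop or Mahout.

```java
import java.security.CodeSource;

// Hypothetical helper (not a Hadoop or Mahout API): reports where a class comes from.
final class ClassLocator {

    /**
     * Returns the location the class was loaded from, "bootstrap/JDK" when the
     * class has no code source, or null when the loader cannot find the class.
     */
    static String locate(String className, ClassLoader loader) {
        try {
            // 'false' avoids running static initializers; we only want to resolve the name.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        // Class name taken from the error message in the question.
        String name = "org.apache.hadoop.io.serializer.WritableSerialization";
        System.out.println(name + " -> " + locate(name, contextLoader));
    }
}
```

Calling `ClassLocator.locate(...)` with `Thread.currentThread().getContextClassLoader()` from the failing code path would show whether that loader can see the class at all. A `null` result while the Hadoop jar is on the classpath points at a class loader problem rather than a missing dependency.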

dy1byipe #1

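The problem was that I was using this inside LensKit. The Configuration class tries to use Thread.currentThread().getContextClassLoader(), which (for whatever reason) did not have the Mahout or Hadoop packages. Explicitly setting the Configuration's class loader to the one that loaded Configuration itself fixes this. The full code is: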

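import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.VectorWritable;

// Resolve Hadoop/Mahout classes through the loader that loaded Configuration,
// not through the thread's context class loader.
final Configuration configuration = new Configuration();
configuration.setClassLoader(Configuration.class.getClassLoader());

final Path path = new Path(POINTS_PATH + "/pointsFile");

// Write to the local file system.
LocalFileSystem fs = new LocalFileSystem();
fs.initialize(URI.create(POINTS_PATH + "/pointsFile"), configuration);

// Keys are Text, values are Mahout VectorWritable.
final SequenceFile.Writer writer =
        SequenceFile.createWriter(
                fs,
                configuration,
                path,
                Text.class,
                VectorWritable.class);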
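
The mechanism behind this fix can be reproduced with the JDK alone. The sketch below is not Hadoop code: `ContextLoaderDemo`, `isVisible`, and `restrictedLoader` are hypothetical names, and the restricted loader merely stands in for a context class loader that, as in the LensKit case, cannot see the application's jars.

```java
// Hypothetical demo (JDK only): a class can be invisible to the thread's context
// class loader while still being visible to the loader that defined a given class.
final class ContextLoaderDemo {

    /** Returns true if the given loader can resolve the class name. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** A loader with only the bootstrap loader as parent: it sees JDK classes, not application classes. */
    static ClassLoader restrictedLoader() {
        return new ClassLoader(null) { };
    }

    public static void main(String[] args) {
        String name = ContextLoaderDemo.class.getName();
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        try {
            // Simulate a framework that installs a context loader without our classes.
            thread.setContextClassLoader(restrictedLoader());
            System.out.println("via context loader:  "
                    + isVisible(name, thread.getContextClassLoader()));
            // Same idea as configuration.setClassLoader(Configuration.class.getClassLoader()).
            System.out.println("via defining loader: "
                    + isVisible(name, ContextLoaderDemo.class.getClassLoader()));
        } finally {
            thread.setContextClassLoader(original);
        }
    }
}
```

Run as an ordinary application, the first lookup should fail and the second should succeed, which mirrors why pointing the Configuration at `Configuration.class.getClassLoader()` lets Hadoop find its serialization classes.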
