How to write Twitter tweets to HDFS using the Spark Streaming Java API

k75qkfdt · posted 2021-05-29 in Hadoop
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterHelloWorldExample");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));
System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
System.setProperty("twitter4j.oauth.accessToken", accessToken);
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
String[] filters = new String[] {"Narendra Modi"};
JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc,filters);

// Map each status to its text, print it, and try to save it
JavaDStream<String> statuses = twitterStream.map(
        new Function<Status, String>() {
            public String call(Status status) { return status.getText(); }
        }
);
statuses.print();
statuses.saveAsHadoopFiles("hdfs://HadoopSystem-150s:8020/Spark_Twitter_out","txt");

I am able to fetch the Twitter tweets, but I get an error when writing to HDFS.
Can someone help me save the tweets to HDFS in Java?
Below is the error I get:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project sparktwitterhelloworldexample: Compilation failure
[ERROR] /home/hadoop/mani/SparkTwitterHelloWorldExample-master/src/main/java/de/michaelgoettsche/SparkTwitterHelloWorldExample.java:[58,17] cannot find symbol
[ERROR]   symbol:   method saveAsHadoopFiles(java.lang.String,java.lang.String)
[ERROR]   location: class org.apache.spark.streaming.api.java.JavaDStream


eivnm1vs1#

You need to use the saveAsTextFiles() method instead. The Hadoop output-format methods are only available on JavaPairDStream (they require keys and values).
The solution is:

statuses.dstream().saveAsTextFiles(prefix, suffix);
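
For reference, here is a minimal sketch of how the tail end of the question's example could look with that fix applied. It assumes Java 8+ for the lambda; the HDFS URL and output path are copied from the question, and the JavaPairDStream variant is only an illustration of the key/value requirement mentioned above, not part of this answer.

// Extra imports assumed for this sketch:
//   import scala.Tuple2;
//   import org.apache.hadoop.mapred.TextOutputFormat;
//   import org.apache.spark.streaming.api.java.JavaPairDStream;

// Fix from this answer: saveAsTextFiles is defined on the underlying DStream.
// Each micro-batch is written to a directory named <prefix>-<batch time in ms>.<suffix>.
statuses.dstream().saveAsTextFiles(
        "hdfs://HadoopSystem-150s:8020/Spark_Twitter_out", "txt");

// Illustrative alternative (assumption, not from this answer): to keep using the
// Hadoop output-format API, first turn the stream into key/value pairs.
JavaPairDStream<String, String> pairs = statuses.mapToPair(
        text -> new Tuple2<>("tweet", text));
pairs.saveAsHadoopFiles(
        "hdfs://HadoopSystem-150s:8020/Spark_Twitter_out", "txt",
        String.class, String.class, TextOutputFormat.class);

// The job only runs once the streaming context is started (not shown in the question).
jssc.start();
jssc.awaitTermination();

Note that saveAsTextFiles writes one output directory per batch interval (here every 60 seconds), containing the usual part-00000 style files.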
