How to write Twitter tweets to HDFS using the Spark Streaming Java API

Posted by k75qkfdt on 2021-05-29 in Hadoop

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterHelloWorldExample");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));
System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
System.setProperty("twitter4j.oauth.accessToken", accessToken);
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
String[] filters = new String[] {"Narendra Modi"};
JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc,filters);

// Without filter: Output text of all tweets
JavaDStream<String> statuses = twitterStream.map(
        new Function<Status, String>() {
            public String call(Status status) { return status.getText(); }
        }
);
statuses.print();
statuses.saveAsHadoopFiles("hdfs://HadoopSystem-150s:8020/Spark_Twitter_out","txt");

I am able to fetch tweets from Twitter, but I get an error when writing them to HDFS.
Can anyone help me save the tweets to HDFS using Java?
Here is the error I get:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project SparkTwitterHelloWorldExample: Compilation failure
[ERROR] /home/hadoop/mani/SparkTwitterHelloWorldExample-master/src/main/java/de/michaelgoettsche/SparkTwitterHelloWorldExample.java:[58,17] cannot find symbol
[ERROR] symbol: method saveAsHadoopFiles(java.lang.String,java.lang.String)
[ERROR] location: class org.apache.spark.streaming.api.java.JavaDStream

Java hadoop apache-spark spark-streaming twitter

Source: https://stackoverflow.com/questions/32568436/how-to-write-twitter-tweets-to-hdfs-using-spark-streaming-java-api

1 Answer

eivnm1vs1#

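You need to use the saveAsTextFiles() method, which is called here on the underlying DStream obtained through dstream(). The Hadoop output formats (saveAsHadoopFiles) are only available on JavaPairDStream, because they require both keys and values.
The solution is: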

statuses.dstream().saveAsTextFiles(prefix, suffix);
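
To make the prefix and suffix arguments concrete: the Spark Streaming documentation describes the output name for each batch interval as "prefix-TIME_IN_MS[.suffix]", and each such path is a directory of part files rather than a single file. The sketch below only reproduces that naming rule in plain Java so the resulting HDFS paths can be predicted. The class BatchOutputName and the method batchOutputPath are hypothetical names introduced for illustration and are not part of the Spark API.

```java
// Hypothetical helper (not part of Spark): mirrors the documented
// "prefix-TIME_IN_MS[.suffix]" naming used by saveAsTextFiles(prefix, suffix).
class BatchOutputName {

    // Returns the path that one batch would be written to, given the
    // batch time in milliseconds since the epoch.
    static String batchOutputPath(String prefix, String suffix, long batchTimeMs) {
        if (suffix == null || suffix.isEmpty()) {
            // No suffix: the name is just prefix-TIME_IN_MS.
            return prefix + "-" + batchTimeMs;
        }
        return prefix + "-" + batchTimeMs + "." + suffix;
    }

    public static void main(String[] args) {
        // Prefix and suffix taken from the question; the batch time is an
        // arbitrary example value chosen for this sketch.
        String prefix = "hdfs://HadoopSystem-150s:8020/Spark_Twitter_out";
        System.out.println(batchOutputPath(prefix, "txt", 1432900800000L));
    }
}
```

With the 60-second batch duration from the question, this would mean one new output directory per minute, so the prefix is best treated as a base path rather than the name of a single file.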

Posted 2021-05-30
