Start ZooKeeper, Hadoop, and Spark
# 1. On all three nodes, start ZooKeeper
/usr/zookeeper/zookeeper-3.4.10/bin/zkServer.sh start
/usr/zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
# 2. On the master node, start Hadoop
/usr/hadoop/hadoop-2.7.3/sbin/start-all.sh
# 3. On the master node, start Spark
/usr/spark/spark-2.4.0-bin-hadoop2.7/sbin/start-all.sh
# Start Kafka in the background on all three nodes
cd $KAFKA_HOME
./bin/kafka-server-start.sh config/server.properties &
# Create the topic badou_topic
./bin/kafka-topics.sh --create --zookeeper master:2181,slave1:2181,slave2:2181 --replication-factor 3 --partitions 6 --topic badou_topic
# Start a console producer
./bin/kafka-console-producer.sh --broker-list master:9092,slave1:9092,slave2:9092 --topic badou_topic
# Start a console consumer
./bin/kafka-console-consumer.sh --from-beginning --topic badou_topic --bootstrap-server master:9092,slave1:9092,slave2:9092
Run a word count over the data pulled in each 2-second batch.
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverTest01 {
  def main(args: Array[String]): Unit = {
    val Array(group_id, topic, exectime) = Array("group_badou_topic", "badou_topic", "2")
    val conf = new SparkConf().setAppName("Receiver Test").setMaster("local[2]")
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    val ssc = new StreamingContext(conf, Seconds(exectime.toInt))

    // Topics passed in from outside; a comma-separated list allows multiple topics
    val topicSet = topic.split(",").toSet
    // Number of receiver threads per topic
    val numThreads = 1
    val topicMap = topicSet.map((_, numThreads)).toMap
    val zkQuorum = "192.168.142.128:2181" // ZooKeeper address on master

    // createStream is the receiver-based approach (the core operation).
    // Each record is a (key, value) pair; without map(_._2) you would see
    // (null, line) tuples, so map(_._2) keeps only the message value.
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group_id, topicMap).map(_._2)
    lines.flatMap(_.split(" ")).map((_, 1L)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
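The `map(_._2)` step matters because every record from `createStream` arrives as a (key, value) pair, and console-producer messages carry a null key. A minimal plain-Scala sketch (no Spark, hypothetical sample records) of the same extraction and per-batch count:

```scala
object ReceiverSketch {
  // Simulated batch from the receiver: (key, value) pairs with null keys,
  // as the console producer would send them
  val batch: Seq[(String, String)] = Seq((null, "word"), (null, "set"), (null, "word"))

  // Keep only the message value, mirroring map(_._2)
  val lines: Seq[String] = batch.map(_._2)

  // Plain-collections equivalent of map((_, 1L)).reduceByKey(_ + _) on one batch
  def wordCount(lines: Seq[String]): Map[String, Long] =
    lines.flatMap(_.split(" "))
      .map((_, 1L))
      .groupBy(_._1)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    println(wordCount(lines))
}
```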
Run the program, then type data into the Kafka producer console (in Xshell). For example, entering:
word
word
word
set
set
set
srt
produces counts of (word,3), (set,3), and (srt,1) in the program's output, assuming all seven lines arrive within the same 2-second batch.
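Because `Seconds(2)` defines the micro-batch interval, lines typed more than 2 seconds apart end up in different batches, and `reduceByKey` counts each batch independently (no state is carried over). A sketch of that bucketing, using hypothetical millisecond timestamps for each typed word:

```scala
object BatchSketch {
  // Hypothetical (timestampMillis, word) events typed into the producer
  val events: Seq[(Long, String)] = Seq(
    (100L, "word"), (600L, "word"), (1500L, "word"),
    (1900L, "set"), (2100L, "set"), (2800L, "set"), (3900L, "srt")
  )

  // Bucket events into 2-second micro-batches, as Seconds(2) does,
  // then count words within each batch independently
  def microBatchCounts(events: Seq[(Long, String)]): Map[Long, Map[String, Long]] =
    events.groupBy { case (ts, _) => ts / 2000 }
      .map { case (batchId, evs) =>
        val counts = evs.groupBy(_._2).map { case (w, g) => (w, g.size.toLong) }
        (batchId, counts)
      }

  def main(args: Array[String]): Unit =
    microBatchCounts(events).toSeq.sortBy(_._1).foreach(println)
}
```

Here the first batch (timestamps below 2000 ms) counts word three times and set once; the second batch counts set twice and srt once, illustrating that counts reset at each batch boundary.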
Copyright notice: this is a reposted article; copyright belongs to the original author.
Original link: https://blog.csdn.net/weixin_44775255/article/details/121863315