Spark Structured Streaming: Avro to Avro and a custom sink

Posted by vddsk6oq on 2021-06-07 in Kafka

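Can someone point me to a good example or sample of writing Avro to S3 or any other file system? I am using a custom sink, but I would like to pass a Map of properties through the SinkProvider's constructor, which I assume could then be passed on to the sink?
Updated code: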

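import org.apache.avro.generic.GenericRecord
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import spark.implicits._ // String encoder for mapPartitions; assumes a SparkSession named spark

// Decode the Avro bytes in the "value" column and render each record as a string
val query = df.mapPartitions { itr =>
  itr.map { row =>
    val rowInBytes = row.getAs[Array[Byte]]("value")
    MyUtils.deserializeAvro[GenericRecord](rowInBytes).toString
  }
}.writeStream
  .format("com.test.MyStreamingSinkProvider")
  .outputMode(OutputMode.Append())
  .queryName("testQ")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .option("checkpointLocation", "my_checkpoint_dir")
  .start()

query.awaitTermination()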

Sink provider:

class MyStreamingSinkProvider extends StreamSinkProvider {

  override def createSink(sqlContext: SQLContext, parameters: Map[String, String], partitionColumns: Seq[String], outputMode: OutputMode): Sink = {
    new MyStreamingSink
  }
}

Sink:

class MyStreamingSink extends Sink with Serializable {

  final val log: Logger = LoggerFactory.getLogger(classOf[MyStreamingSink])

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    //For saving as text doc
    data.rdd.saveAsTextFile("path")

    log.warn(s"Total records processed: ${data.count()}")
    log.warn("Data saved.")
  }
}

avro scala apache-kafka spark-structured-streaming

Source: https://stackoverflow.com/questions/49339469/spark-structured-streaming-avro-to-avro-and-custom-sink

1 answer

mm9b1k5b1#

You should be able to pass the parameters to your custom sink via writeStream.option(key, value):

// start() returns a StreamingQuery, not a DataStreamWriter
val query = df.writeStream
  .format("com.test.MyStreamingSinkProvider")
  .outputMode(OutputMode.Append())
  .queryName("testQ")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .option("key_1", "value_1")
  .option("key_2", "value_2")
  .start()

In this case, the parameters argument of the method MyStreamingSinkProvider.createSink(...) will contain key_1 and key_2.
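
To make the hand-off explicit, here is a minimal sketch of how the provider could forward those options to the sink. It assumes the Spark 2.x/3.x Sink API used in the question. Spark creates the provider itself through a no-argument constructor, so the options reach it through createSink rather than through the provider's constructor. The constructor argument on MyStreamingSink, the key1 field, and the fallback value "default_1" are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// Spark instantiates the provider reflectively (no-arg constructor),
// so the writeStream options are only available inside createSink.
class MyStreamingSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = {
    // Forward everything set with .option(key, value) to the sink.
    new MyStreamingSink(parameters)
  }
}

// Hypothetical variant of the question's sink that accepts the options map.
class MyStreamingSink(parameters: Map[String, String]) extends Sink {

  // "key_1" is the option name used in the answer;
  // "default_1" is an assumed fallback for illustration only.
  val key1: String = parameters.getOrElse("key_1", "default_1")

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // The per-batch write logic from the question would go here;
    // this sketch only shows that the option is available.
    println(s"Batch $batchId: key_1 = $key1")
  }
}
```

With the writeStream call above, key1 would hold "value_1" inside the sink; any other option set on the writer can be read from the same map.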
