Problem deserializing Avro data in Scala

b1payxdu posted on 2021-06-08 in Kafka

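I'm building an Apache Flink application in Scala that reads streaming data from a Kafka bus and then performs summarizing operations on it. The data from Kafka is in Avro format and needs a special deserialization class. I found this Scala class, AvroDeserializationSchema (http://codegists.com/snippet/scala/avrodeserializationschemascala_saveveltri_scala):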

package org.myorg.quickstart
import org.apache.avro.io.BinaryDecoder
import org.apache.avro.io.DatumReader
import org.apache.avro.io.DecoderFactory
import org.apache.avro.reflect.ReflectDatumReader
import org.apache.avro.specific.{SpecificDatumReader, SpecificRecordBase}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor
import org.apache.flink.api.common.serialization._
import java.io.IOException

class AvroDeserializationSchema[T](val avroType: Class[T]) extends DeserializationSchema[T] {
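  // Avro's reader and decoder are not serializable, so they are built lazily on first use.
  @transient private var reader: DatumReader[T] = null
  @transient private var decoder: BinaryDecoder = null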

  def deserialize(message: Array[Byte]): T = {
    ensureInitialized()
    try {
      decoder = DecoderFactory.get.binaryDecoder(message, decoder)
      reader.read(null.asInstanceOf[T], decoder)
    }
    catch {
      case e: IOException => {
        throw new RuntimeException(e)
      }
    }
  }

  def isEndOfStream(nextElement: T): Boolean = false

  def getProducedType: TypeInformation[T] = TypeExtractor.getForClass(avroType)

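  // Generated Avro classes get a SpecificDatumReader; anything else falls back to reflection.
  private def ensureInitialized(): Unit = {
    if (reader == null) {
      if (classOf[SpecificRecordBase].isAssignableFrom(avroType)) {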
        reader = new SpecificDatumReader[T](avroType)
      }
      else {
        reader = new ReflectDatumReader[T](avroType)
      }
    }
  }
}
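The class above creates its DatumReader lazily inside ensureInitialized() instead of in the constructor. A likely reason: DeserializationSchema extends java.io.Serializable and Flink serializes the schema instance to ship it to the parallel tasks, while Avro's readers and decoders are generally not serializable. The sketch below reproduces that pattern with plain Java serialization and no Flink or Avro; NotSerializableReader, LazySchema and TransientDemo are hypothetical stand-ins, not library classes. Marking the field @transient keeps the object serializable even after the helper has been built.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical stand-in for a non-serializable helper such as Avro's DatumReader.
final class NotSerializableReader {
  def read(bytes: Array[Byte]): String = new String(bytes, "UTF-8")
}

// Mirrors ensureInitialized(): the helper is @transient (skipped by Java serialization)
// and rebuilt on first use after the object has been deserialized.
class LazySchema extends Serializable {
  @transient private var reader: NotSerializableReader = null

  private def ensureInitialized(): Unit =
    if (reader == null) reader = new NotSerializableReader

  def deserialize(message: Array[Byte]): String = {
    ensureInitialized()
    reader.read(message)
  }
}

object TransientDemo {
  // Java-serializes and deserializes a value, roughly what happens when a job is shipped to workers.
  def roundTrip[A <: java.io.Serializable](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray)) {
      // Resolve classes through this code's class loader (helps when run from a REPL or script).
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, TransientDemo.getClass.getClassLoader)
    }
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

Without @transient, the round trip would fail with a NotSerializableException once deserialize() had been called, because the reader field would then be non-null.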

In my streaming class, I use it like this:

val stream = env
        .addSource(new FlinkKafkaConsumer010[String]("test", new AvroDeserializationSchema[DeviceData](Class[DeviceData]), properties))

where DeviceData is a Scala case class defined in the same project:

/**Case class to hold the Device data. */
case class DeviceData(deviceId: String,
                    sw_version: String,
                    timestamp: String,
                    reading: Double
                   )

Compiling the StreamingKafkaClient.scala class gives the following error:

Error:(24, 102) object java.lang.Class is not a value
        .addSource(new FlinkKafkaConsumer010[String]("test", new AvroDeserializationSchema[DeviceData](Class[DeviceData]), properties))

I also tried:

val stream = env
        .addSource(new FlinkKafkaConsumer010[String]("test", new AvroDeserializationSchema[DeviceData](classOf[DeviceData]), properties))

and got a different error:

Error:(21, 20) overloaded method constructor FlinkKafkaConsumer010 with alternatives:
  (x$1: java.util.regex.Pattern,x$2: org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema[String],x$3: java.util.Properties)org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010[String] <and>
  (x$1: java.util.regex.Pattern,x$2: org.apache.flink.api.common.serialization.DeserializationSchema[String],x$3: java.util.Properties)org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010[String] <and>
  (x$1: java.util.List[String],x$2: org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema[String],x$3: java.util.Properties)org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010[String] <and>
  (x$1: java.util.List[String],x$2: org.apache.flink.api.common.serialization.DeserializationSchema[String],x$3: java.util.Properties)org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010[String] <and>
  (x$1: String,x$2: org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema[String],x$3: java.util.Properties)org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010[String] <and>
  (x$1: String,x$2: org.apache.flink.api.common.serialization.DeserializationSchema[String],x$3: java.util.Properties)org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010[String]
 cannot be applied to (String, org.myorg.quickstart.AvroDeserializationSchema[org.myorg.quickstart.DeviceData], java.util.Properties)
        .addSource(new FlinkKafkaConsumer010[String]("test", new AvroDeserializationSchema[DeviceData](classOf[DeviceData]), properties))

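I'm new to Scala (this is my first Scala program), so I know I'm missing something basic. As I try to learn Scala, could someone point out what I'm doing wrong? My intent is basically to read Avro-encoded data from Kafka into Flink and do some operations on the streaming data. I couldn't find any examples that use the AvroDeserializationSchema class, and it seems to me this should be something built natively into the Flink packages.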

Answer #1 (m0rkklqb)

To get a class object in Scala you need classOf[DeviceData], not Class[DeviceData]:

new AvroDeserializationSchema[DeviceData](classOf[DeviceData])
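The underlying reason: Class[DeviceData] names a type, while classOf[DeviceData] is an expression whose value is the java.lang.Class object (the Scala counterpart of Java's DeviceData.class), and that value is what the AvroDeserializationSchema constructor expects. A small standalone sketch in pure Scala, no Flink or Avro; ClassOfDemo and isProduct are hypothetical names used only for illustration.

```scala
// Same case class as in the question.
case class DeviceData(deviceId: String, sw_version: String, timestamp: String, reading: Double)

object ClassOfDemo {
  // classOf[T] produces the java.lang.Class[T] value for T.
  // Writing Class[DeviceData] in value position does not compile
  // ("object java.lang.Class is not a value") because it names a type, not a value.
  val deviceClass: Class[DeviceData] = classOf[DeviceData]

  // The same kind of runtime subtype check that ensureInitialized() performs with
  // SpecificRecordBase; here the supertype is Product, which every case class extends.
  def isProduct[T](avroType: Class[T]): Boolean = classOf[Product].isAssignableFrom(avroType)
}
```

Note that a plain case class is not a SpecificRecordBase, so the schema class would take its ReflectDatumReader branch for DeviceData.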

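"I couldn't find any examples of using the AvroDeserializationSchema class"
I found one (in Java).
Also, in the Flink 1.6 release they are adding this class to Flink itself, so you won't have to copy it from elsewhere. See FLINK-9337 and FLINK-9338.
As mentioned in the comments, if you want to use the Confluent Avro Schema Registry rather than giving a class type, see this answer, or refer to the code in the GitHub link above.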
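Also, if you're running Kafka 0.11+ (or Confluent 3.3+), you're better off using FlinkKafkaConsumer011, parameterized with the class you deserialize to:

new FlinkKafkaConsumer011[DeviceData]("test", new AvroDeserializationSchema[DeviceData](classOf[DeviceData]), properties)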
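Switching to classOf only gets past the first error. Judging by the overload list in the second error, FlinkKafkaConsumer010[String] accepts only a DeserializationSchema[String] (or KeyedDeserializationSchema[String]), but the schema passed in is a DeserializationSchema[DeviceData]; the consumer's type parameter has to match what the schema produces. The mismatch can be reproduced without Flink using hypothetical stand-ins: Schema[T] plays the role of DeserializationSchema[T], Consumer[T] that of the Kafka consumer, and CsvDeviceSchema parses comma-separated text instead of Avro purely to keep the sketch self-contained.

```scala
// Hypothetical stand-ins, not Flink's API.
trait Schema[T] {
  def deserialize(message: Array[Byte]): T
}

final class Consumer[T](val topic: String, schema: Schema[T]) {
  // Decodes one raw record; the result type is fixed by the consumer's type parameter T.
  def poll(raw: Array[Byte]): T = schema.deserialize(raw)
}

case class DeviceData(deviceId: String, sw_version: String, timestamp: String, reading: Double)

// Toy schema that parses "id,version,timestamp,reading" text instead of Avro bytes.
object CsvDeviceSchema extends Schema[DeviceData] {
  def deserialize(message: Array[Byte]): DeviceData = {
    val fields = new String(message, "UTF-8").split(",")
    DeviceData(fields(0), fields(1), fields(2), fields(3).toDouble)
  }
}

object ConsumerDemo {
  // new Consumer[String]("test", CsvDeviceSchema) would not compile: a Schema[DeviceData]
  // is not a Schema[String], the same kind of mismatch as in the overload error.
  val consumer = new Consumer[DeviceData]("test", CsvDeviceSchema)
}
```

Usage: ConsumerDemo.consumer.poll("dev-1,1.0,2021-06-08T00:00:00Z,21.5".getBytes("UTF-8")) yields a DeviceData value directly, with no String step in between.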
