After reading up on Kafka's GenericRecord, I wrote the following sample code to write a stream out in Parquet format:
Properties config = new Properties();
config.setProperty("bootstrap.servers", "localhost:9092");
config.setProperty("group.id", "1");
config.setProperty("zookeeper.connect", "localhost:2181");
String schemaRegistryUrl = "http://127.0.0.1:8081";

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Load the Avro schema that the Parquet writer will use.
File file = new File(EventProcessor.class.getClassLoader()
        .getResource("event.avsc").getFile());
Schema schema = new Schema.Parser().parse(file);

// Read Avro-encoded values from Kafka as GenericRecord.
DataStreamSource<GenericRecord> input = env.addSource(
        new FlinkKafkaConsumer010<GenericRecord>(
                "event_new",
                new KafkaGenericAvroDeserializationSchema(schemaRegistryUrl),
                config)
                .setStartFromEarliest());

// Write the stream as Parquet files under /tmp, using the same schema.
Path path = new Path("/tmp");
final StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(path, ParquetAvroWriters.forGenericRecord(schema))
        .build();
input.addSink(sink);

// Nothing runs until the job is submitted.
env.execute("kafka-to-parquet");
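For context, KafkaGenericAvroDeserializationSchema is roughly equivalent to the following simplified sketch, which wraps Confluent's KafkaAvroDeserializer (error handling and production-grade type info omitted; the implementation shown here is illustrative, not the exact class):

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import java.io.IOException;
import java.util.Collections;

public class KafkaGenericAvroDeserializationSchema
        extends AbstractDeserializationSchema<GenericRecord> {

    private final String schemaRegistryUrl;
    private transient KafkaAvroDeserializer inner; // not serializable, so create it lazily

    public KafkaGenericAvroDeserializationSchema(String schemaRegistryUrl) {
        this.schemaRegistryUrl = schemaRegistryUrl;
    }

    @Override
    public GenericRecord deserialize(byte[] message) throws IOException {
        if (inner == null) {
            inner = new KafkaAvroDeserializer();
            inner.configure(
                    Collections.singletonMap("schema.registry.url", schemaRegistryUrl),
                    false); // false = configure as a value (not key) deserializer
        }
        // The deserializer looks up the writer schema via the id embedded in each message.
        return (GenericRecord) inner.deserialize("event_new", message);
    }
}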
When I run this code, I get the following error:
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to org.apache.avro.generic.IndexedRecord
at org.apache.avro.generic.GenericData.getField(GenericData.java:697)
at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:188)
I don't understand what is going wrong. Please help me understand and resolve this issue.
1 Answer
The most likely cause is that event.avsc does not match the records actually stored in Kafka: at a position where your schema declares a nested record, the data contains a plain string (Avro's Utf8), which is exactly what the ClassCastException is telling you.
If you add the schema and a sample record from Kafka to your question (for example, printed with the console consumer), I can help further.
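To check this yourself first, you can fetch the schema that producers actually registered for the topic and compare it with your local event.avsc. A minimal sketch using Confluent's CachedSchemaRegistryClient; it assumes the default TopicNameStrategy (so the value subject is "event_new-value") and a hypothetical local path for event.avsc:

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import org.apache.avro.Schema;
import java.io.File;

public class SchemaCheck {
    public static void main(String[] args) throws Exception {
        // Fetch the latest value schema registered for the topic.
        CachedSchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://127.0.0.1:8081", 100);
        SchemaMetadata latest = client.getLatestSchemaMetadata("event_new-value");
        Schema registrySchema = new Schema.Parser().parse(latest.getSchema());

        // Parse the local schema the Parquet writer is configured with
        // (adjust the path to wherever event.avsc lives in your project).
        Schema localSchema = new Schema.Parser()
                .parse(new File("src/main/resources/event.avsc"));

        System.out.println("registry: " + registrySchema.toString(true));
        System.out.println("local:    " + localSchema.toString(true));
        System.out.println("match:    " + registrySchema.equals(localSchema));
    }
}

If the two schemas differ, align them (for example, regenerate event.avsc from the registry output) so the schema used by ParquetAvroWriters.forGenericRecord matches the records the consumer actually produces.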