My goal is to read a file from GCS and write it to Cassandra. I'm new to Apache Beam/Dataflow, and most of what I can find is built with Python. Unfortunately, CassandraIO is Java-native only for Beam.
I used the word count example as a template and tried to get rid of TextIO.write()
and replace it with CassandraIO.<Words>write()
.
Here is the Java class for my Cassandra table:
package org.apache.beam.examples;

import java.io.Serializable;

import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;

@Table(keyspace = "test", name = "words", readConsistency = "ONE", writeConsistency = "QUORUM",
    caseSensitiveKeyspace = false, caseSensitiveTable = false)
public class Words implements Serializable {
    // private static final long serialVersionUID = 1L;

    @PartitionKey
    @Column(name = "word")
    public String word;

    @Column(name = "count")
    public long count;

    public Words() {
    }

    public Words(String word, long count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Words)) {
            return false;
        }
        Words other = (Words) obj;
        return this.word.equals(other.word) && this.count == other.count;
    }
}
And here is the pipeline part of the main code:
static void runWordCount(WordCount.WordCountOptions options) {
    Pipeline p = Pipeline.create(options);

    // Concepts #2 and #3: Our pipeline applies the composite CountWords transform, and passes the
    // static FormatAsTextFn() to the ParDo transform.
    p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
        .apply(new WordCountToCassandra.CountWords())
        // Here I'm not sure how to transform PCollection<KV<String, Long>> into PCollection<Words>
        .apply(MapElements.into(TypeDescriptor.of(Words.class))
            .via(/* how do I map KV<String, Long> to Words? */))
        .apply(CassandraIO.<Words>write()
            .withHosts(Collections.singletonList("my_ip"))
            .withPort(9142)
            .withKeyspace("test")
            .withEntity(Words.class));

    p.run().waitUntilFinish();
}
My understanding is that I need a PTransform
to go from PCollection<T1>
to PCollection<T2>
, but I don't know how to map it.
1 Answer
If it's a 1:1 mapping, then
MapElements.into
is the right choice. You can either specify a class that implements
SerializableFunction<FromType, ToType>
, or simply use a lambda. For more details, check the MapElements documentation.
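The lambda example the answer alludes to was not preserved; here is a minimal JDK-only sketch of the 1:1 function. The Beam-specific call it stands in for is shown in the comment, and the Words class below is a stripped-down stand-in for the entity from the question:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

public class MapWordsExample {
    // Stripped-down stand-in for the annotated Words entity.
    static class Words {
        final String word;
        final long count;
        Words(String word, long count) { this.word = word; this.count = count; }
    }

    // The same 1:1 function you would hand to MapElements; with Beam it would read:
    //   .apply(MapElements.into(TypeDescriptor.of(Words.class))
    //       .via((KV<String, Long> kv) -> new Words(kv.getKey(), kv.getValue())))
    static final Function<Map.Entry<String, Long>, Words> TO_WORDS =
        kv -> new Words(kv.getKey(), kv.getValue());

    public static void main(String[] args) {
        Words w = TO_WORDS.apply(new SimpleEntry<>("hello", 3L));
        System.out.println(w.word + ":" + w.count); // prints hello:3
    }
}
```

The lambda must be serializable so Beam can ship it to workers, which is why Beam uses its own SerializableFunction type rather than java.util.function.Function.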
If the transformation is not one-to-one, there are other options available, such as FlatMapElements or ParDo.