MongoDB: MongoIO Apache Beam GCP Dataflow with Mongo upsert pipeline example

ogq8wdun  posted on 2023-11-17  in Go

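I am looking for an example of an Apache Beam pipeline on GCP Dataflow that updates data in MongoDB using an upsert operation: if the value exists it should be updated, and if it does not it should be inserted.
Something like the following: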

pipeline.apply(...)
.apply(MongoDbIO.write()
.withUri("mongodb://localhost:27017")
.withDatabase("my-database")
.withCollection("my-collection")
.withUpdateConfiguration(UpdateConfiguration.create().withUpdateKey("key1")
      .withUpdateFields(UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
                        UpdateField.fieldUpdate("$set","source-field2", "dest-field2"),
                       //pushes entire input doc to the dest field
                         UpdateField.fullUpdate("$push", "dest-field3") )));

Below is my pipeline code. At the moment I insert documents after preparing the collection, and they look like this:

{"_id":{"$oid":"619632693261e80017c44145"},"vin":"SATESTCAVA74621","timestamp":"2021-11-18T10:48:59.889Z","key":"EV_CHARGE_NOW_SETTING","value":"DEFAULT"}


Now I want to update 'value' and 'timestamp' if the combination of 'vin' and 'key' already exists, and insert a new document via upsert if that combination does not exist.
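
To make the intended behaviour concrete, here is a minimal in-memory sketch of the upsert rule described above, matching on the ('vin', 'key') pair. It is plain Java and deliberately uses neither Beam nor MongoDB; the class name `UpsertSketch`, the `id-N` identifiers, and the second sample value and timestamp are all hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for a collection; NOT MongoDB or Beam code.
class UpsertSketch {
    // Stored documents, indexed by the compound match key "vin|key".
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();
    // Hypothetical stand-in for an auto-generated _id.
    private int nextId = 1;

    // Updates value/timestamp when (vin, key) already exists, otherwise inserts.
    // Returns true when a new document was inserted.
    boolean upsert(String vin, String key, String value, String timestamp) {
        String matchKey = vin + "|" + key;
        Map<String, String> existing = docs.get(matchKey);
        if (existing != null) {
            existing.put("value", value);
            existing.put("timestamp", timestamp);
            return false;
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("_id", "id-" + nextId++);
        doc.put("vin", vin);
        doc.put("key", key);
        doc.put("value", value);
        doc.put("timestamp", timestamp);
        docs.put(matchKey, doc);
        return true;
    }

    // Returns the stored document for (vin, key), or null when none exists.
    Map<String, String> find(String vin, String key) {
        return docs.get(vin + "|" + key);
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        UpsertSketch store = new UpsertSketch();
        // First call inserts; the second matches the same (vin, key) and updates.
        store.upsert("SATESTCAVA74621", "EV_CHARGE_NOW_SETTING", "DEFAULT", "2021-11-18T10:48:59.889Z");
        store.upsert("SATESTCAVA74621", "EV_CHARGE_NOW_SETTING", "ON", "2021-11-18T11:00:00.000Z");
        System.out.println(store.size() + " document(s): "
            + store.find("SATESTCAVA74621", "EV_CHARGE_NOW_SETTING"));
    }
}
```

The point of the sketch is that the match key is the pair of fields, while the generated id is assigned only on insert and left untouched on update.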

PCollection<PubsubMessage> pubsubMessagePCollection= pubsubMessagePCollectionMap.get(topic);
            pubsubMessagePCollection.apply("Convert pubsub to kv,k=vin", ParDo.of(new ConvertPubsubToKVFn()))
                .apply("group by vin key",GroupByKey.<String,String>create())
                .apply("filter data for alerts, status and vehicle data", ParDo.of(new filterMessages()))
                .apply("converting message to document type", ParDo.of(
                    new ConvertMessageToDocumentTypeFn(list_of_keys_str, collection, options.getMongoDBHostName(),options.getMongoDBDatabaseName())).withSideInputs(list_of_keys_str))
                .apply(MongoDbIO.write()
                    .withUri(options.getMongoDBHostName())
                    .withDatabase(options.getMongoDBDatabaseName())
                    .withCollection(collection));


Now, if I want to use the following lines of code:

.withUpdateConfiguration(UpdateConfiguration.create().withUpdateKey("key1")
      .withUpdateFields(UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
                        UpdateField.fieldUpdate("$set","source-field2", "dest-field2"),
                       //pushes entire input doc to the dest field
                         UpdateField.fullUpdate("$push", "dest-field3") )));


What would "key1", "source-field1", "dest-field1", "source-field2", "dest-field2", and **"dest-field3"** be in my case?
I am confused about these values. Please help!
Below is the code I am trying for the update:

MongoDbIO.write()
.withUri(options.getMongoDBHostName())
.withDatabase(options.getMongoDBDatabaseName())
.withCollection(collection)
.withUpdateConfiguration(UpdateConfiguration.create()
                            .withIsUpsert(true)
                            .withUpdateKey("vin")
                            .withUpdateKey("key")
                            .withUpdateFields(UpdateField.fieldUpdate("$set", "vin", "vin"),
                                              UpdateField.fieldUpdate("$set", "key", "key"),
                                              UpdateField.fieldUpdate("$set", "timestamp", "timestamp"),
                                              UpdateField.fieldUpdate("$set", "value", "value")))


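With the code above, my document is not updated; instead a new one is added with id = vin. It should update the existing record that matches on both vin and key, and when it does insert, it should insert with an auto-generated id value.
Please suggest what to do here.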

xoshrz7s #1

From reading the upsert configuration here, you can enable it with withIsUpsert(true).
In the original syntax, add one extra line to enable upsert.

pipeline.apply(...)
  .apply(MongoDbIO.write()
    .withUri("mongodb://localhost:27017")
    .withDatabase("my-database")
    .withCollection("my-collection")
    .withUpdateConfiguration(
      UpdateConfiguration.create()
        .withIsUpsert(true)
        .withUpdateKey("key1")
        .withUpdateFields(
          UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
          UpdateField.fieldUpdate("$set","source-field2", "dest-field2"),
          //pushes entire input doc to the dest field
          UpdateField.fullUpdate("$push", "dest-field3"))));
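
As for what the source and destination fields mean: as far as I understand the connector, each `fieldUpdate(operator, sourceField, destField)` copies the value of `sourceField` from the incoming document into `destField` under the given operator, while `fullUpdate` uses the whole incoming document as the value. The plain-Java sketch below only mimics that mapping with nested maps so the shape of the resulting update is visible. It is an assumption about the behaviour, not the library's code, and `UpdateDocSketch`, `addField`, and `buildUpdate` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; no Beam or MongoDB classes are used.
class UpdateDocSketch {
    // Adds {operator: {destField: input[sourceField]}} to the update.
    // A null sourceField stands for the "full update" case, where the
    // whole input document becomes the value (assumed behaviour).
    static void addField(Map<String, Map<String, Object>> update,
                         Map<String, Object> input,
                         String operator, String sourceField, String destField) {
        Object value = (sourceField == null) ? input : input.get(sourceField);
        update.computeIfAbsent(operator, op -> new LinkedHashMap<>()).put(destField, value);
    }

    // Builds the update for the question's use case: only timestamp and
    // value are written; vin and key are match criteria, not updated fields.
    static Map<String, Map<String, Object>> buildUpdate(Map<String, Object> input) {
        Map<String, Map<String, Object>> update = new LinkedHashMap<>();
        addField(update, input, "$set", "timestamp", "timestamp");
        addField(update, input, "$set", "value", "value");
        return update;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("vin", "SATESTCAVA74621");
        input.put("key", "EV_CHARGE_NOW_SETTING");
        input.put("timestamp", "2021-11-18T10:48:59.889Z");
        input.put("value", "DEFAULT");
        System.out.println(buildUpdate(input));
    }
}
```

Under this reading, the question's use case would pass "timestamp" and "value" as both source and destination. Note also that withUpdateKey takes a single field name, so calling it twice most likely keeps only one of the two keys. If the connector matches that key's value against the document id, that would be consistent with the id = vin behaviour reported in the question, and a compound (vin, key) match would need a different approach.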
