我正在开发一个gcp云数据流作业,它使用kafka代理和模式注册表。我们的kafka代理和模式注册表需要tls客户端证书。我在部署时遇到了架构注册表的连接问题。任何建议都是受欢迎的。
下面是我为dataflow工作所做的。我为tls配置创建使用者属性。
props.put("security.protocol", "SSL");
props.put("ssl.truststore.password", "aaa");
props.put("ssl.keystore.password", "bbb");
props.put("ssl.key.password", "ccc"));
props.put("schema.registry.url", "https://host:port")
props.put("specific.avro.reader", true);
并通过updateconsumerproperties更新使用者属性。
Pipeline p = Pipeline.create(options)
...
.updateConsumerProperties(properties)
...
正如这个stackoverflow答案所建议的,我还将keystore和truststore下载到本地目录,并在consumerfactory中的consumerproperties上指定truststore/keystore位置。
truststore和google云数据流
Pipeline p = Pipeline.create(options)
...
.withConsumerFactoryFn(new MyConsumerFactory(...))
...
在consumerfactory中:
public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
// download keyStore and trustStore from GCS bucket
config.put("ssl.truststore.location", (Object)localTrustStoreFilePath)
config.put("ssl.keystore.location", (Object)localKeyStoreFilePath)
new KafkaConsumer<byte[], byte[]>(config);
}
使用此代码,我成功地进行了部署,但数据流作业获得了tls服务器证书验证错误。
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
sun.security.validator.Validator.validate(Validator.java:260)
sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1513)
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:208)
io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:252)
io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:482)
io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:475)
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:151)
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:230)
io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getById(CachedSchemaRegistryClient.java:209)
io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:116)
io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:88)
org.fastretailing.rfid.store.siv.EPCTransactionKafkaAvroDeserializer.deserialize(EPCTransactionKafkaAvroDeserializer.scala:14)
org.fastretailing.rfid.store.siv.EPCTransactionKafkaAvroDeserializer.deserialize(EPCTransactionKafkaAvroDeserializer.scala:7)
org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance(KafkaUnboundedReader.java:234)
org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.start(KafkaUnboundedReader.java:176)
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:779)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:76)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1228)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:143)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:967)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
然后我发现schema registry客户机从系统属性加载tls配置。https://github.com/confluentinc/schema-registry/issues/943
我测试了Kafka消费者与相同的配置,我确认它的工作正常。
props.put("schema.registry.url", "https://host:port")
props.put("specific.avro.reader", true);
props.put("ssl.truststore.location", System.getProperty("javax.net.ssl.trustStore"));
props.put("ssl.truststore.password", System.getProperty("javax.net.ssl.keyStore"));
props.put("ssl.keystore.location", System.getProperty("javax.net.ssl.keyStore"));
props.put("ssl.keystore.password", System.getProperty("javax.net.ssl.keyStorePassword"));
props.put("ssl.key.password", System.getProperty("javax.net.ssl.key.password"));
接下来我应用了相同的方法,这意味着对系统属性和使用者属性应用相同的tls配置,对数据流作业代码应用相同的tls配置。
在执行应用程序时,我通过系统属性指定了密码。
-Djavax.net.ssl.keyStorePassword=aaa \
-Djavax.net.ssl.key.password=bbb \
-Djavax.net.ssl.trustStorePassword=ccc \
注意:我在使用者工厂中为信任库和密钥库位置设置了系统属性,因为这些文件下载到本地临时目录。
config.put("ssl.truststore.location", (Object)localTrustStoreFilePath)
config.put("ssl.keystore.location", (Object)localKeyStoreFilePath)
System.setProperty("javax.net.ssl.trustStore", localTrustStoreFilePath)
System.setProperty("javax.net.ssl.keyStore", localKeyStoreFilePath)
但即使是部署也因超时错误而失败。
Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:224)
...
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
Caused by: java.lang.IllegalArgumentException: DataflowRunner requires gcpTempLocation, but failed to retrieve a value from PipelineOptions
at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:246)
Caused by: java.lang.IllegalArgumentException: Error constructing default value for gcpTempLocation: tempLocation is not a valid GCS path, gs://dev-k8s-rfid-store-dataflow/rfid-store-siv-epc-transactions-to-bq/tmp.
at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:255)
...
Caused by: java.lang.RuntimeException: Unable to verify that GCS bucket gs://dev-k8s-rfid-store-dataflow exists.
at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPathIsAccessible(GcsPathValidator.java:86)
...
Caused by: java.io.IOException: Error getting access token for service account: java.security.NoSuchAlgorithmException: Error constructing implementation (algorithm: Default, provider: SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)
at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:401)
...
Caused by: java.net.SocketException: java.security.NoSuchAlgorithmException: Error constructing implementation (algorithm: Default, provider: SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)
at javax.net.ssl.DefaultSSLSocketFactory.throwException(SSLSocketFactory.java:248)
...
Caused by: java.security.NoSuchAlgorithmException: Error constructing implementation (algorithm: Default, provider: SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)
at java.security.Provider$Service.newInstance(Provider.java:1617)
...
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:780)
Caused by: java.security.UnrecoverableKeyException: Password verification failed
at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:778)
我错过什么了吗?
2条答案
按热度按时间9fkzdhlc1#
在
ConsumerFactoryFn
,您需要将证书从某个位置(如gcs)复制到计算机上的本地文件路径。在truststore和google云数据流中
ConsumerFnFactory
用户编写的代码片段从gcs获取信任库:您将需要做一些类似的事情(它不必是gcs,但它确实需要从所有在googleclouddataflow上执行管道的vm访问)。
qrjkbowd2#
我得到了gcp支持部门的回复。apachebeam似乎不支持schema注册表。
你好,数据流Maven给我回电了。我现在要揭露他们告诉我的事情。
您的问题的答案是否定的,apachebeam不支持schema注册表。但是,他们告诉我,您可以自己实现对schema registry的调用,因为beam只消耗原始消息,用户有责任对数据做任何他们需要的事情。
这是基于我们对这种情况的理解,即您希望向kafka发布消息,并让df使用这些消息,根据注册表中的模式对它们进行解析。
我希望这些信息能对你有用,让我知道如果我可以进一步帮助。
但dataflow作业仍然可以接收avro格式的二进制消息。所以您可以在内部调用schema registry restapi,如下所示。https://stackoverflow.com/a/55917157