Apache Spark: how to apply CDC (change data capture)?

Posted by 7xzttuei on 2023-02-24 in Apache


How can I apply CDC (change data capture)? I read a database with Spark and then save the result as Parquet to Hadoop HDFS. Here is the code:

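from pyspark.sql import SparkSession

# Local Spark session with the MySQL JDBC driver on the driver classpath.
spark = SparkSession \
        .builder \
        .appName("Ingest") \
        .master("local[*]") \
        .config("spark.driver.extraClassPath", "/home.../mysql-connector-java-5.1.30.jar") \
        .getOrCreate()

# Read the whole `employees` table over JDBC.
df = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:mysql://localhost:3306/classicmodels") \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", "employees") \
        .option("user", "...") \
        .option("password", "...").load()

# show() prints the rows itself and returns None, so it is not wrapped in print().
df.show()

# Write the DataFrame that was just read (the original referenced an undefined name).
df.write.parquet("hdfs://localhost:9000/...")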

The code returns the data it reads as a DataFrame.
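One common way to approximate CDC with a plain JDBC read is query-based extraction with a high-water mark: each run reads only the rows changed since the previous run. The sketch below is a rough illustration under stated assumptions, not a definitive implementation. It assumes the source table has a last-modified timestamp column; the column name `updated_at`, the alias `changed_rows`, and both helper functions are hypothetical and do not come from the code above.

```python
from datetime import datetime


def incremental_dbtable(table: str, watermark_col: str, last_seen: datetime) -> str:
    """Build a subquery for the JDBC `dbtable` option that selects only
    rows whose watermark column is newer than `last_seen`.

    `watermark_col` is an assumed last-modified column (hypothetical).
    """
    ts = last_seen.strftime("%Y-%m-%d %H:%M:%S")
    # Spark accepts a parenthesized subquery with an alias in place of a table name.
    return f"(SELECT * FROM {table} WHERE {watermark_col} > '{ts}') AS changed_rows"


def next_watermark(rows, watermark_col: str, previous: datetime) -> datetime:
    """Return the highest watermark value in this batch, or `previous`
    if the batch is empty, so the next run starts where this one ended."""
    return max((row[watermark_col] for row in rows), default=previous)


if __name__ == "__main__":
    last_run = datetime(2023, 2, 24, 12, 0, 0)
    print(incremental_dbtable("employees", "updated_at", last_run))
```

The returned string would be passed as `.option("dbtable", ...)` in place of `"employees"`, and the resulting DataFrame written with `df.write.mode("append").parquet(...)` so each run adds only the changed rows. This approach sees inserts and updates but not deletes; capturing deletes generally needs a log-based approach that reads the MySQL binlog.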

apache-spark

Source: https://stackoverflow.com/questions/69227512/how-to-apply-cdc