Spark Cassandra deleteFromCassandra method

x9ybnkn6  asked on 2021-06-10  in  Cassandra
Follow (0) | Answers (2) | Views (684)

I am using Spark, Cassandra, and the Spark Cassandra Connector in a Databricks notebook. According to their documentation, rows can be deleted with 'deleteFromCassandra': https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md, https://datastax-oss.atlassian.net/browse/sparkc-349. Here is my Python script:

from pyspark.sql.functions import col

def read_table(tableName, keyspace, columns):
  dfData = (spark
        .read
        .format("org.apache.spark.sql.cassandra")
        .options(table=tableName, keyspace=keyspace)
        .load()
        .select(*columns))
  return dfData

emails = 'abc@test.com'.split(",")
df = read_table(my_table, my_keyspace, "*").where(col("email").isin(emails))
df.rdd.deleteFromCassandra(my_keyspace, my_table)

But it fails with:

AttributeError: 'RDD' object has no attribute 'deleteFromCassandra'

I noticed that all the examples they provide are in Scala. Does this mean the function deleteFromCassandra is not available in Python?

cig3rfwq1#

This is not possible with the stock Spark Cassandra Connector, because its Python bindings only support DataFrames. It should be possible with PySpark Cassandra, which is also available on the Spark Packages site ( --packages anguenot:pyspark-cassandra:2.4.0 ), like this:

dataFrame.rdd.deleteFromCassandra(keyspace, table)
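If pulling in an extra package is not an option, another route is to collect the matching keys on the driver and delete them through the DataStax Python driver (cassandra-driver). Below is a minimal sketch of the statement-building half, which is plain Python. The helper build_delete_statements is hypothetical (not part of any library), and it assumes email is the partition key of the table; if email is only a regular column, Cassandra cannot delete by it directly.

```python
# Sketch: build one CQL DELETE per matched key value.
# ASSUMPTION: the key column is the table's partition key; the helper
# name build_delete_statements is illustrative, not a library API.

def build_delete_statements(keyspace, table, key_column, key_values):
    """Return a CQL DELETE statement string for each key value."""
    stmts = []
    for value in key_values:
        # Escape single quotes per CQL string-literal rules.
        escaped = str(value).replace("'", "''")
        stmts.append(
            f"DELETE FROM {keyspace}.{table} WHERE {key_column} = '{escaped}'"
        )
    return stmts

emails = 'abc@test.com'.split(",")
for stmt in build_delete_statements("my_keyspace", "my_table", "email", emails):
    print(stmt)
```

The resulting statements would then be executed with session.execute(stmt) on a cassandra-driver session, either on the driver for a small key list or inside foreachPartition for larger ones.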
kulphzqa2#

Hope this solves it (in Scala, the method is brought into scope with):

import com.datastax.spark.connector._
