Spark repartitionAndSortWithinPartitions with tuples

aiazj4mn · posted 2021-06-09 in HBase

I'm trying to partition HBase rows following this example: https://www.opencore.com/blog/2016/10/efficient-bulk-load-of-hbase-using-spark/
However, my data is already stored as (String, String, String) tuples, where the first element is the row key, the second the column name, and the third the column value.
I tried writing an implicit Ordering to satisfy the Ordering[K] requirement:

implicit val caseInsensitiveOrdering: Ordering[(String, String, String)] =
  new Ordering[(String, String, String)] {
    override def compare(x: (String, String, String), y: (String, String, String)): Int = ???
  }

But repartitionAndSortWithinPartitions still isn't available on my RDD. Is there a way to use this method with these tuples?
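For context, repartitionAndSortWithinPartitions is added by an implicit conversion in Spark (rddToOrderedRDDFunctions, in the companion of org.apache.spark.rdd.RDD) that exists only for RDD[(K, V)] with an Ordering[K], so no Ordering on a three-element tuple can make the method appear on an RDD[(String, String, String)]. A minimal sketch of the mismatch, assuming an existing sparkContext:

import org.apache.spark.rdd.RDD

val triples: RDD[(String, String, String)] =
  sparkContext.parallelize(List(("row1", "colA", "v1")))

// Does not compile: a 3-tuple element type is not a (key, value) pair.
// triples.repartitionAndSortWithinPartitions(new HashPartitioner(2))

// Keying by the whole tuple produces a pair RDD, which does get the method.
val keyed: RDD[((String, String, String), Unit)] = triples.map(t => (t, ()))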


r3i60tvu 1#

The RDD must consist of key/value pairs, not just values, for example:

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

val data = List((("5", "6", "1"), 1))
val rdd: RDD[((String, String, String), Int)] = sparkContext.parallelize(data)

// Compare the key field by field, ignoring case. (A constant result,
// e.g. always returning 1, is not a valid Ordering and breaks the sort.)
implicit val caseInsensitiveOrdering = new Ordering[(String, String, String)] {
  override def compare(x: (String, String, String), y: (String, String, String)): Int =
    Seq(x._1 compareToIgnoreCase y._1, x._2 compareToIgnoreCase y._2,
        x._3 compareToIgnoreCase y._3).find(_ != 0).getOrElse(0)
}

rdd.repartitionAndSortWithinPartitions(new HashPartitioner(2))
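To connect this back to the bulk-load example from the question, the sorted pair RDD can then be mapped to HBase KeyValue cells for writing HFiles. This is only a sketch: the column family "cf" is a hypothetical name, and for a real bulk load the sort order must match HBase's raw byte ordering rather than a case-insensitive one.

import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes

// Map each ((rowkey, column, value), _) entry to a cell; "cf" is hypothetical.
val cells = rdd
  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
  .map { case ((row, col, value), _) =>
    (new ImmutableBytesWritable(Bytes.toBytes(row)),
     new KeyValue(Bytes.toBytes(row), Bytes.toBytes("cf"),
                  Bytes.toBytes(col), Bytes.toBytes(value)))
  }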
