Scala and Hive: best way to write a generic method that works with all types of Writable

Posted by 4ioopgfo on 2021-06-02 in Hadoop


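I am writing generic UDFs for Hive in Scala. My first test is a function that sums the elements of an array (a complex data type).
My code stub looks like this (since it is a stub, please ignore the asInstanceOf :D):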

...

class SumElements extends GenericUDF {

  protected val expectedCategories: Array[Category] = Array(ObjectInspector.Category.LIST)
  protected var listInspector: ListObjectInspector = _

  @throws(classOf[UDFNullArgumentException])
  @throws(classOf[UDFArgumentLengthException])
  @throws(classOf[UDFArgumentTypeException])
  override def initialize(inspectors: Array[ObjectInspector]): ObjectInspector = {
    ...
    listInspector = inspectors(0).asInstanceOf[ListObjectInspector]
    ...
  }

  @throws(classOf[HiveException])
  override def evaluate(args: Array[DeferredObject]): AnyRef = {

    val list: util.List[_] = listInspector.getList(args(0).get)
    val listLength: Int = listInspector.getListLength(list)

    val tmp: IndexedSeq[Int] = for {
      i <- 0 until listLength
    } yield listInspector.getListElement(list, i).asInstanceOf[IntWritable].get

    tmp.sum.asInstanceOf[AnyRef]
  }

  override def getDisplayString(args: Array[String]): String = "SumElements(Array<Numeric>)"
}

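Basically, I have to read each element of the list, cast it to IntWritable, and then call get to obtain the primitive value. The code above works and returns the correct sum, but it is not generic: it only works for Int.
Trying to make it generic, I came up with this: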

class HadoopList(list: util.List[_], listInspector: ListObjectInspector) {

  def fromWritableToPrimitive[W <: Writable, N]: IndexedSeq[N] = {

    val listLength: Int = listInspector.getListLength(list)

    val tmp: IndexedSeq[N] = for {
      i <- 0 until listLength
    } yield listInspector.getListElement(list, i).asInstanceOf[W].get.asInstanceOf[N]

    tmp
  }

}

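But it turns out that the Writable interface does not require a get method! Concrete Writable types are therefore not guaranteed to have one.
My questions are:
Am I missing something? Is there a superclass of IntWritable that provides a contract for get, so that I can be generic?
Why does Java seem to convert IntWritable to int automatically while Scala does not? The Java examples have no cast step.
Is there a better way to do this in Scala?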

hadoop Hive scala Generics

Source: https://stackoverflow.com/questions/40196516/scala-and-hive-best-way-to-write-a-generic-method-that-works-with-all-types-of

1 answer


xiozqbni1#

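Although they are not written in Scala, the generic UDAFs in Hive itself, such as GenericUDAFAverage or GenericUDAFHistogramNumeric, appear to convert any primitive numeric input to double (using PrimitiveObjectInspectorUtils.getDouble) and then operate on the double.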
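A minimal sketch of that idea in Scala, assuming the list elements are numeric primitives. The object name `ListSum` and the method `sumAsDouble` are hypothetical and not part of the question or of Hive. Only `PrimitiveObjectInspectorUtils.getDouble` and the `ListObjectInspector` accessors come from Hive. Null elements are skipped here, which is a design choice of this sketch rather than something the answer specifies.

```scala
import org.apache.hadoop.hive.serde2.objectinspector.{ListObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils

// Hypothetical helper, not part of Hive.
object ListSum {

  // Sums a Hive list of numeric primitives as Double, whatever the element type is.
  // `data` is the raw list object (for example args(0).get inside evaluate).
  def sumAsDouble(data: AnyRef, listInspector: ListObjectInspector): Double = {
    // The element inspector knows the concrete element type (int, bigint, float, ...).
    val elementInspector = listInspector.getListElementObjectInspector match {
      case p: PrimitiveObjectInspector => p
      case other =>
        throw new IllegalArgumentException(
          s"Expected a list of primitives, got ${other.getTypeName}")
    }

    val length = listInspector.getListLength(data) // -1 when data is null

    (0 until length).iterator
      .map(i => listInspector.getListElement(data, i))
      .filter(_ != null) // assumption of this sketch: skip null elements
      .map(e => PrimitiveObjectInspectorUtils.getDouble(e, elementInspector))
      .sum
  }
}
```

Inside `evaluate`, this could be called as `ListSum.sumAsDouble(args(0).get, listInspector)`. The object inspector returned from `initialize` would then have to describe a double result.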
Another approach is to look up the type at runtime (with an isInstanceOf check for each numeric primitive type) and then call a generic function.
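The runtime check could be sketched in Scala with a pattern match, which is the idiomatic form of a chain of isInstanceOf tests. The object name `WritableNumbers` and the method `toDouble` are hypothetical. The sketch covers only four Writable classes from `org.apache.hadoop.io`, so any other type (including Hive-specific Writables) would need its own case.

```scala
import org.apache.hadoop.io.{DoubleWritable, FloatWritable, IntWritable, LongWritable, Writable}

// Hypothetical helper, not part of Hadoop or Hive.
object WritableNumbers {

  // Writable itself declares no get method, so each concrete class is matched explicitly.
  def toDouble(w: Writable): Double = w match {
    case null              => throw new IllegalArgumentException("null Writable")
    case i: IntWritable    => i.get.toDouble
    case l: LongWritable   => l.get.toDouble
    case f: FloatWritable  => f.get.toDouble
    case d: DoubleWritable => d.get
    case other =>
      throw new IllegalArgumentException(
        s"Unsupported Writable type: ${other.getClass.getName}")
  }
}
```

Once every element is a Double, the rest of the function (a sum, an average, and so on) no longer depends on the input type.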

Posted 2021-06-03
