How do I correctly extract array<bigint> from a Hive table in Spark?

gz5pxeao posted on 2021-06-26 in Hive


I have a Hive table with a column (c4) of type array<bigint>. I want to extract this column with Spark. Here is the code snippet:

val query = """select c1, c2, c3, c4 from 
               some_table where some_condition"""
val rddHive = hiveContext.sql(query).rdd.map { row =>
  // Is there any other way to extract c4? (String does not seem to work here.)
  // No compile error and no runtime error.
  val w = if (row.isNullAt(3)) List() else row.getAs[scala.collection.mutable.WrappedArray[String]]("c4").toList
  w
}
-> rddHive: org.apache.spark.rdd.RDD[List[String]] = MapPartitionsRDD[7] at map at <console>:32

rddHive.map(x => x(0).getClass.getSimpleName).take(1)
-> Array[String] = Array(Long)

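So I used getAs[scala.collection.mutable.WrappedArray[String]], even though the original data type is array<bigint>. Yet there is no compile error and no runtime error, and the extracted data is still of type bigint (Long). What is happening here (why is there no compiler or runtime error)? And what is the correct way to extract array<bigint> as List[String] in Spark?
==================== Additional information ====================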
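A likely explanation for the missing errors is JVM type erasure. Row.getAs[T] amounts to an unchecked asInstanceOf[T], so the element type written inside WrappedArray[...] is not verified at runtime and the boxed Long values pass through unchanged. The sketch below reproduces this in plain Scala without Spark. The object ErasureDemo and its getAs helper are hypothetical stand-ins for illustration, not Spark's API.

```scala
import scala.util.Try

object ErasureDemo {
  // Hypothetical stand-in for Row.getAs[T]: an unchecked cast of an untyped value.
  def getAs[T](value: Any): T = value.asInstanceOf[T]

  // Mimics an array<bigint> cell as Spark exposes it: boxed Longs behind Any.
  val stored: Any = Seq[Any](21772244666L, 111L)

  // Compiles and runs: the cast only checks for Seq, the element type is erased.
  val mislabeled: List[String] = getAs[Seq[String]](stored).toList

  // Runtime class names of the elements, read without treating them as Strings.
  def elementClasses: List[String] =
    (mislabeled: List[Any]).map(_.getClass.getSimpleName)

  // The failure only surfaces once an element is actually used as a String.
  def useAsString: Try[Int] = Try(mislabeled.head.length)

  // Reading with the real element type and converting explicitly.
  def converted: List[String] = getAs[Seq[Long]](stored).map(_.toString).toList

  def main(args: Array[String]): Unit = {
    println(elementClasses)        // element classes are still Long
    println(useAsString.isFailure) // String-specific use fails with ClassCastException
    println(converted)             // genuine Strings after explicit conversion
  }
}
```

In this sketch the mislabeled list behaves until a String method is called on an element, which matches the observation that getClass still reports Long.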

hiveContext.sql(query).printSchema
root
 |-- c1: string (nullable = true)
 |-- c2: integer (nullable = true)
 |-- c3: string (nullable = true)
 |-- c4: array (nullable = true)
 |    |-- element: long (containsNull = true)

hiveContext.sql(query).show(3)
+--------+----+----------------+--------------------+
|      c1|  c2|              c3|                  c4|
+--------+----+----------------+--------------------+
|   c1111|   1|5511798399.22222|[21772244666, 111...|
|   c1112|   1|5511798399.88888|[11111111, 111111...|
|   c1113|   2| 5555117114.3333|[77777777777, 112...|
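
Since the schema reports the elements of c4 as long, one way to obtain List[String] is to read the column with its real element type and convert each element, or to let Spark SQL cast the column to array<string>. The following is a minimal sketch under stated assumptions: it uses SparkSession (Spark 2.x or later) and a small in-memory DataFrame with made-up rows in place of the Hive table, and the names ExtractArrayDemo and toStringList are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col

object ExtractArrayDemo {
  // Hypothetical helper: reads field 3 (c4) as a sequence of Long, then converts.
  // Note: a null element inside the array would be unboxed to 0 here.
  def toStringList(row: Row): List[String] =
    if (row.isNullAt(3)) List.empty[String]
    else row.getSeq[Long](3).map(_.toString).toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("extract-array-bigint")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows shaped like the schema above; None becomes a null c4.
    val df = Seq(
      ("c1111", 1, "5511798399.22222", Option(Seq(21772244666L, 111L))),
      ("c1112", 1, "5511798399.88888", Option.empty[Seq[Long]])
    ).toDF("c1", "c2", "c3", "c4")

    // Option A: convert per row on the RDD.
    df.rdd.map(toStringList).collect().foreach(println)

    // Option B: cast in Spark SQL so the column itself becomes array<string>.
    df.select(col("c4").cast("array<string>").as("c4")).printSchema()

    spark.stop()
  }
}
```

Option A keeps the conversion in Scala code, while Option B pushes it into the query, so any later getAs or getSeq call can legitimately use String as the element type.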

Hive scala apache-spark Arrays spark-dataframe

Source: https://stackoverflow.com/questions/47586605/how-to-extract-arraybigint-in-hive-table-in-spark-properly
