Inconsistent result set between Hive and Hive LLAP

sqougxex, posted on 2021-06-24 in Hive

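We are running Hive 3.1.x clusters on HDI 4.0: one is an LLAP cluster, the other a plain Hive 3.1.x cluster.
We created a managed table with 272409 rows on both clusters.
Before the merge, on both clusters: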

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272409              | 2020-06-15 00:00:12.0  | 2020-07-26 23:42:17.0  |
+---------------------+------------+---------------------+------------------------+------------------------+
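
The post does not include the query behind this summary. A minimal sketch of an aggregate that would produce these columns, assuming a hypothetical table `orders` with a key column `col` and a last-modified timestamp `lmd` (none of these names are confirmed by the original), might look like this:

```sql
-- Hypothetical names: `orders` (table), `col` (key column), `lmd` (last-modified timestamp).
-- Only `order_created_date` and the output aliases come from the result shown above.
SELECT order_created_date,
       COUNT(col)          AS col_count,
       COUNT(DISTINCT col) AS col_distinct_count,
       MIN(lmd)            AS min_lmd,
       MAX(lmd)            AS max_lmd
FROM orders
GROUP BY order_created_date;
```

Running the same statement on both clusters after each step makes the results directly comparable.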
Based on the delta, we perform a merge operation (which updates 17 rows).

After the merge on the Hive LLAP cluster (before compaction):

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272392              | 2020-06-15 00:00:12.0  | 2020-07-27 22:52:34.0  |
+---------------------+------------+---------------------+------------------------+------------------------+

After the merge on the Hive LLAP cluster (after compaction):

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272409              | 2020-06-15 00:00:12.0  | 2020-07-27 22:52:34.0  |
+---------------------+------------+---------------------+------------------------+------------------------+
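
For anyone trying to reproduce the compaction step, a major compaction can be requested manually. This is a sketch only: `orders` is a hypothetical table name, and the post does not say whether compaction was triggered manually or ran automatically.

```sql
-- `orders` is a hypothetical table name; the original post does not name the table.
-- Queue a major compaction, which rewrites the base and delta files into a new base.
ALTER TABLE orders COMPACT 'major';

-- Check the state of queued and completed compactions.
SHOW COMPACTIONS;
```

If the table is partitioned, the `ALTER TABLE` statement also needs a `PARTITION (...)` clause naming the partition to compact.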

After the merge on the plain Hive cluster (without compacting the deltas):

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272409              | 2020-06-15 00:00:12.0  | 2020-07-27 22:52:34.0  |
+---------------------+------------+---------------------+------------------------+------------------------+

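This is the inconsistency we observed: before compaction, the LLAP cluster reports 272392 distinct values, 17 fewer than the row count, which matches the number of rows the merge updated.
After compacting the table on Hive LLAP, however, the inconsistency disappears and both clusters return the same result. We thought it might be due to either caching or an LLAP issue, so we restarted the HiveServer2 process, which clears the cache. The issue still persists. We also created a dummy table with the same schema on the plain Hive cluster and pointed its location to that of the LLAP table; that table produces the expected result. We even queried the table from Spark using the Qubole spark-acid reader (a direct Hive managed table reader), which also produces the expected result. This is very strange. Can anyone help?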

6qqygrtg #1

We ran into a similar issue on an HDInsight Hive LLAP cluster. Setting hive.llap.io.enabled to false resolved it.
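
As a sketch of how that setting might be applied for a single session, assuming your cluster allows the property to be changed at runtime (if it does not, it would have to be set in the cluster's Hive configuration instead):

```sql
-- Session-level override from Beeline or another Hive client connected to the LLAP cluster.
-- Assumption: the property is not locked down for runtime changes on your cluster.
SET hive.llap.io.enabled=false;
```

This turns off the LLAP IO layer, including its cache, for the session, so scans may be slower. Re-running the count query afterwards should show whether the distinct count matches the plain Hive cluster.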


gmxoilav #2

Qubole does not support Hive LLAP yet. (However, we at Qubole are evaluating whether to support it in the future.)
