Spark SQL无法递归读取hive表的HDFS子文件夹(Spark - 2.4.6)

mwyxok5s  于 2022-12-09  发布在  HDFS
关注(0)|答案(1)|浏览(260)

我们正在尝试使用Spark-SQL读取配置单元表,但它未显示任何记录(输出中给出0条记录)。检查时,我们发现该表的HDFS文件存储在多个子目录中,如下所示-

hive> [hadoop@ip-10-37-195-106 CDPJobs]$ hdfs dfs -ls /its/cdp/refn/cot_tbl_cnt_hive/     
Found 18 items     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/1     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/10     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/11     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/12     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/13     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/14     
drwxrwxr-x+ - hadoop hadoop 0 2021-12-19 20:17 /its/cdp/refn/cot_tbl_cnt_hive/15

我们尝试在spark-defaults.conf文件中设置以下属性,但问题仍然存在。

set spark.hadoop.hive.supports.subdirectories = true;    
set spark.hadoop.hive.mapred.supports.subdirectories = true;     
set spark.hadoop.hive.input.dir.recursive=true;     
set mapreduce.input.fileinputformat.input.dir.recursive=true;          
set recursiveFileLookup=true;            
set spark.hive.mapred.supports.subdirectories=true;         
set spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true;

有人知道解决这个问题的方法吗?我们使用的是Spark 2.4.6版。

更新(找到解决方案)-

我已将此属性更改为false,现在spark可以从子目录读取数据。
如果您的数据库中有一个错误,请将其设置为false;

zxlwwiss

zxlwwiss1#

sparkSession = (SparkSession
                    .builder
                    .appName('USS - Unified Scheme of Sells')
                    .config("hive.metastore.uris", "thrift://probighhwm001:9083", conf=SparkConf())
                    .config("hive.input.dir.recursive", "true")
                    .config("hive.mapred.supports.subdirectories", "true")
                    .config("hive.supports.subdirectories", "true")
                    .config("mapred.input.dir.recursive", "true")
                    .enableHiveSupport()
                    .getOrCreate()
                    )

相关问题