Spark Hive Sqoop: saving data in a Hive table using Spark shows junk characters with sqoop export

Posted by gblwokeq on 2021-07-15 in Hadoop


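I am trying to load data from a Hive table (hivetable1), modify it with Spark, and save it to another Hive table (hivetable2). When I run select * from hivetable2, it shows the correct data, but when I view the same file in HDFS, it shows junk characters, as below. When I try to export the same data to Postgres with sqoop, the whole row ends up in a single column of the Postgres table.
Spark script: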

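from pyspark.sql import SparkSession

# Hive-enabled session; config() takes the key and the value as separate arguments.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("hive.metastore.uris", "thrift://localhost:9083") \
    .config("spark.sql.catalogImplementation", "hive") \
    .enableHiveSupport() \
    .getOrCreate()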

df = spark.sql("select * from hivetable1")

df.write.format("hive").mode('overwrite').option("delimiter", "\t").saveAsTable("hivetable2")
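One possible cause is that the "delimiter" option is not among the options the "hive" format reads, so the files may be written with Hive's default field separator. The Spark documentation on Hive table storage lists `fileFormat` and `fieldDelim` as the relevant options, with `fieldDelim` applying only to the `textfile` format. A minimal sketch under that assumption follows; the helper name `copy_table_as_tab_delimited_text` is hypothetical, and `spark` is assumed to be a Hive-enabled SparkSession like the one built above.

```python
def copy_table_as_tab_delimited_text(spark, source_table, target_table):
    """Copy source_table into target_table, stored as tab-delimited Hive text.

    Assumption: `spark` is a SparkSession created with enableHiveSupport().
    """
    df = spark.sql(f"SELECT * FROM {source_table}")
    (
        df.write.format("hive")
        .mode("overwrite")
        .option("fileFormat", "textfile")  # plain-text storage, required for fieldDelim
        .option("fieldDelim", "\t")        # field separator written to the HDFS files
        .saveAsTable(target_table)
    )


# Usage with the table names from the question:
# copy_table_as_tab_delimited_text(spark, "hivetable1", "hivetable2")
```

If this assumption holds, the files under the table directory should contain tab-separated fields, which matches the `--input-fields-terminated-by '\t'` setting used in the sqoop export.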

HDFS file data:
hadoop fs -cat /user/hive/warehouse/tb.db/hivetable2/part-0000

lnullUnknownnullNull\n\n\n\nnullNull0.00.0Null\nnull\nnullnull\nnullnull\nnullnull\nnullnull\nnullnull\nnullnull
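Output like this, where fields run together with no visible separator, would be consistent with Hive's default text layout, which separates fields with the non-printing character `\x01` (Ctrl-A) and writes NULL as `\N`. That is an assumption about this table, not something the post confirms. The sketch below uses a hypothetical sample row (not taken from the file above) to show why such a line looks like junk in a terminal and why splitting it on a tab yields a single column.

```python
# Hypothetical sample row: three fields joined by Hive's default text
# delimiter \x01 (Ctrl-A), with \N standing for NULL.
FIELD_DELIM = "\x01"
sample_line = FIELD_DELIM.join(["1", "Unknown", r"\N"])


def show_control_chars(line):
    """Render control characters in caret notation (e.g. \\x01 as ^A)."""
    return "".join(
        "^" + chr(ord(ch) + 64) if ord(ch) < 32 else ch
        for ch in line
    )


def split_fields(line, delimiter):
    """Split one text row into fields on the given delimiter."""
    return line.split(delimiter)


print(show_control_chars(sample_line))         # makes the hidden separators visible
print(split_fields(sample_line, "\t"))         # one element: no tab in the row
print(split_fields(sample_line, FIELD_DELIM))  # three separate fields
```

On the real file, piping the `hadoop fs -cat` output through `cat -v` or `od -c` would show which separator is actually present.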
sqoop export:

sqoop export --connect jdbc:postgresql://localhost:5432/postgres?stringtype=unspecified -m 1 --table test --export-dir /user/hive/warehouse/tb.db/hivetable2 \
 --username test --password test --input-fields-terminated-by '\t'
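If the files in the export directory are in fact separated by Hive's default `\x01` character rather than tabs (an assumption worth checking against the raw file first), one option is to tell sqoop about that separator instead of rewriting the table. The sketch below only builds the argument list; `build_sqoop_export_args` is a hypothetical helper, while the flags themselves (`--input-fields-terminated-by`, `--input-null-string`, `--input-null-non-string`, `-P`) are standard sqoop export options.

```python
import shlex


def build_sqoop_export_args(jdbc_url, table, export_dir, username,
                            field_delim=r"\001"):
    """Build a sqoop export command as an argument list (nothing is executed)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
        "--username", username,
        "-P",  # prompt for the password instead of passing it on the command line
        "-m", "1",
        "--input-fields-terminated-by", field_delim,  # octal form of Ctrl-A
        "--input-null-string", r"\\N",        # treat Hive's \N as NULL in string columns
        "--input-null-non-string", r"\\N",    # and in non-string columns
    ]


args = build_sqoop_export_args(
    "jdbc:postgresql://localhost:5432/postgres?stringtype=unspecified",
    "test",
    "/user/hive/warehouse/tb.db/hivetable2",
    "test",
)
print(shlex.join(args))  # shell-quoted form, safe to paste into a terminal
```

Passing the list to `subprocess.run(args, check=True)` would avoid shell quoting problems with the `?` in the JDBC URL.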

hadoop Hive hdfs sqoop apache-spark

Source: https://stackoverflow.com/questions/65698547/spark-hive-sqoop-saving-data-in-hive-table-using-spark-showing-junk-char-with-s