java.lang.classcastexception:org.apache.hadoop.hive.ql.io.orc.orcstruct不能转换为org.apache.hadoop.io.textjson serde错误

oyt4ldly  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(605)

我对在hive上使用json数据还不熟悉。我正在开发一个spark应用程序,它获取json数据并将其存储到配置单元表中。我有这样一个json:

展开后如下所示:

我能够将json读入Dataframe并将其保存在hdfs上的某个位置。但是让Hive能够读取数据是很困难的。
例如,我在网上搜索之后,就尝试这样做:
使用 STRUCT 然后使用 column.element .
例如: web_app_security 将是(类型)列的名称 STRUCT )在table里面还有其他的JSON config_web_cms_authentication, web_threat_intel_alert_external 也将是structs(带 rating 以及 rating_numeric 就像田野一样)。
我尝试用json-serde创建表。以下是我的表定义:

CREATE EXTERNAL TABLE jsons (
web_app_security struct<config_web_cms_authentication: struct<rating: string, rating_numeric: float>, web_threat_intel_alert_external: struct<rating: string, rating_numeric: float>, web_http_security_headers: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float>,
dns_security struct<domain_hijacking_protection: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float, dns_hosting_providers: struct<rating:string, rating_numeric: float>>,
email_security struct<rating: string, email_encryption_enabled: struct<rating: string, rating_numeric: float>, rating_numeric: float, email_hosting_providers: struct<rating: string, rating_numeric: float>, email_authentication: struct<rating: string, rating_numeric: float>>,
threat_intell struct<rating: string, threat_intel_alert_internal_3: struct<rating: string, rating_numeric: float>, threat_intel_alert_internal_1: struct<rating: string, rating_numeric: float>, rating_numeric: float,  threat_intel_alert_internal_12: struct<rating: string, rating_numeric: float>, threat_intel_alert_internal_6: struct<rating: string, rating_numeric: float>>,
data_loss struct<data_loss_6: struct<rating: string, rating_numeric: float>, rating: string, data_loss_36plus: struct<rating: string, rating_numeric: float>, rating_numeric: float,  data_loss_36: struct<rating: string, rating_numeric: float>, data_loss_12: struct<rating: string, rating_numeric: float>, data_loss_24: struct<rating: string, rating_numeric: float>>,
system_hosting struct<host_hosting_providers: struct<rating: string, rating_numeric: float>,  hosting_countries: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float>,
defensibility struct<attack_surface_web_ip: struct<rating: string, rating_numeric: float>, shared_hosting: struct<rating: string, rating_numeric: float>, defensibility_hosting_providers: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float, attack_surface_web_hostname: struct<rating: string, rating_numeric: float>>,
software_patching struct<patching_web_cms: struct<rating: string, rating_numeric: float>, rating: string, patching_web_server: struct<rating: string, rating_numeric: float>, patching_vuln_open_ssl: struct<rating: string, rating_numeric: float>, patching_app_server: struct<rating: string, rating_numeric: float>, rating_numeric: float>,
governance struct<governance_customer_base: struct<rating: string, rating_numeric: float>, governance_security_certifications: struct<rating: string, rating_numeric: float>, governance_regulatory_requirements: struct<rating: string, rating_numeric: float>, rating: string, rating_numeric: float>
)ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS orc
LOCATION 'hdfs://nameservice1/data/gis/final/rr_current_analysis'

我试着用json serde解析这些行。在我将一些数据保存到表中之后,当我尝试查询它时,会出现以下错误:

Error: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text (state=,code=0)

我不确定我做得对不对。
我对将数据存储到表中的任何其他方法也持开放态度。任何帮助都将不胜感激。谢谢您。

mnemlml8

mnemlml81#

那是因为你把兽人当作储藏室( STORED AS orc )json作为serde( ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' )覆盖兽人的默认值 OrcSerde serde,但不输入( OrcInputFormat )和输出( OrcOutputFormat )格式。
您需要使用orc存储而不重写其默认serde。在这种情况下,确保spark应用程序写入orc表,而不是json。
或者,如果希望数据存储在json中,请使用 JsonSerDe 与纯文本文件一起作为存储( STORED AS TEXTFILE ).
《hive开发人员指南》解释了serde和存储的工作原理-https://cwiki.apache.org/confluence/display/hive/developerguide#developerguide-海维塞德

相关问题