How to create an external Hive table from Parquet files that contain complex struct data types

bcs8qyzn  posted on 2021-06-26  in  Hive

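I have a set of Parquet files containing the data for a table named people. The data in these files includes complex data types such as structs. The schema of the Parquet data is shown below: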

 |-- distinct_id: string (nullable = true)
 |-- android_app_version: string (nullable = true)
 |-- android_app_version_code: string (nullable = true)
 |-- android_brand: string (nullable = true)
 |-- android_devices: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- android_lib_version: string (nullable = true)
 |-- android_manufacturer: string (nullable = true)
 |-- android_os: string (nullable = true)
 |-- android_os_version: string (nullable = true)
 |-- android_push_error: string (nullable = true)
 |-- browser: string (nullable = true)
 |-- browser_version: double (nullable = true)
 |-- campaigns: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- country_code: string (nullable = true)
 |-- deliveries: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- initial_referrer: string (nullable = true)
 |-- initial_referring_domain: string (nullable = true)
 |-- ios_app_release: string (nullable = true)
 |-- ios_app_version: string (nullable = true)
 |-- ios_device_model: string (nullable = true)
 |-- ios_devices: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- ios_lib_version: string (nullable = true)
 |-- ios_version: string (nullable = true)
 |-- last_seen: string (nullable = true)
 |-- notifications: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- $time: string (nullable = true)
 |    |    |-- campaign_id: long (nullable = true)
 |    |    |-- message_id: long (nullable = true)
 |    |    |-- message_subtype: string (nullable = true)
 |    |    |-- message_type: string (nullable = true)
 |    |    |-- time: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- os: string (nullable = true)
 |-- predict_grade: string (nullable = true)
 |-- region: string (nullable = true)
 |-- swift_lib_version: string (nullable = true)
 |-- timezone: string (nullable = true)
 |-- area: string (nullable = true)
 |-- country: string (nullable = true)
 |-- dob: string (nullable = true)
 |-- date: string (nullable = true)
 |-- default_languages: string (nullable = true)
 |-- email: string (nullable = true)
 |-- first_app_launch: string (nullable = true)
 |-- first_app_launch_date: string (nullable = true)
 |-- first_login: boolean (nullable = true)
 |-- gaid: string (nullable = true)
 |-- lr_age: string (nullable = true)
 |-- lr_birthdate: string (nullable = true)
 |-- lr_country: string (nullable = true)
 |-- lr_gender: string (nullable = true)
 |-- language: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languages: string (nullable = true)
 |-- languages_disabled: string (nullable = true)
 |-- languages_selected: string (nullable = true)
 |-- launched: string (nullable = true)
 |-- location: string (nullable = true)
 |-- media_id: string (nullable = true)
 |-- no_of_logins: long (nullable = true)
 |-- pop-strata: string (nullable = true)
 |-- price: string (nullable = true)
 |-- random_number: long (nullable = true)
 |-- second_name: string (nullable = true)
 |-- state: string (nullable = true)
 |-- state_as_per_barc: string (nullable = true)
 |-- total_app_opens: long (nullable = true)
 |-- total_app_sessions: string (nullable = true)
 |-- total_sessions: string (nullable = true)
 |-- town: string (nullable = true)
 |-- user_type: string (nullable = true)
 |-- userid: string (nullable = true)
 |-- appversion: string (nullable = true)
 |-- birthdate: string (nullable = true)
 |-- campaign: string (nullable = true)
 |-- city: string (nullable = true)
 |-- media_source: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- ios_ifa: string (nullable = true)
 |-- android_model: string (nullable = true)
 |-- age: string (nullable = true)
 |-- uid: string (nullable = true)

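What I want is to end up with a Hive external table that points at the data in the Parquet files. One option would be to flatten the structs, or to use SQL explode to break them out into individual columns, but when I tried that, every column that originally belonged to a struct data type came back as null. The Parquet files are stored in an Azure Blob location.
I also tried loading the Parquet files into a Spark SQL DataFrame, but it returns null values for the columns with complex data types.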
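
For reference, here is a minimal sketch of the kind of DDL being aimed at. It is not a tested answer. It assumes the complex columns can be declared with their native Hive types (`array<...>`, `struct<...>`) rather than flattened. Hive resolves Parquet columns by name by default, so a name or type that does not line up with the file schema is a commonly reported cause of NULL columns. The helper names (`quote`, `hive_type`, `build_ddl`), the column subset, and the `wasbs://` location are placeholders invented for illustration. Whether a given Hive version accepts backtick-quoted names such as `$time` inside a struct definition, or `pop-strata` as a column name, would need to be checked.

```python
# Sketch: build a CREATE EXTERNAL TABLE statement from a Spark-style schema.
# Only a subset of the columns from the schema above is listed here.

# Spark primitive type -> Hive type (covers only the types in this schema).
PRIMITIVES = {
    "string": "string",
    "long": "bigint",
    "double": "double",
    "boolean": "boolean",
}


def quote(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def hive_type(spark_type):
    """Translate a nested type description into a Hive type string."""
    if isinstance(spark_type, str):
        return PRIMITIVES[spark_type]
    kind = spark_type["type"]
    if kind == "array":
        return "array<" + hive_type(spark_type["element"]) + ">"
    if kind == "struct":
        fields = ",".join(
            quote(name) + ":" + hive_type(ftype)
            for name, ftype in spark_type["fields"]
        )
        return "struct<" + fields + ">"
    raise ValueError("unsupported type: " + repr(kind))


def build_ddl(table, columns, location):
    """Assemble the DDL text for an external Parquet-backed table."""
    cols = ",\n".join(
        "  " + quote(name) + " " + hive_type(ctype) for name, ctype in columns
    )
    return (
        "CREATE EXTERNAL TABLE " + quote(table) + " (\n" + cols + "\n)\n"
        "STORED AS PARQUET\n"
        "LOCATION '" + location + "'"
    )


# Element type of the `notifications` array, copied from the schema above.
NOTIFICATION = {
    "type": "struct",
    "fields": [
        ("$time", "string"),
        ("campaign_id", "long"),
        ("message_id", "long"),
        ("message_subtype", "string"),
        ("message_type", "string"),
        ("time", "string"),
        ("type", "string"),
    ],
}

COLUMNS = [
    ("distinct_id", "string"),
    ("android_devices", {"type": "array", "element": "string"}),
    ("browser_version", "double"),
    ("campaigns", {"type": "array", "element": "long"}),
    ("notifications", {"type": "array", "element": NOTIFICATION}),
    ("first_login", "boolean"),
    ("pop-strata", "string"),
]

# Placeholder location: substitute the real container, account and path.
LOCATION = "wasbs://<container>@<account>.blob.core.windows.net/<path>"

ddl = build_ddl("people", COLUMNS, LOCATION)
print(ddl)
```

The printed statement could then be run in Hive, or through `spark.sql(...)` on a session with Hive support enabled, with the remaining columns added in the same way.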
