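I have a set of Parquet files containing the data for a table named people. The data in these files includes complex data types such as structs. The schema of the Parquet data is shown below: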
|-- distinct_id: string (nullable = true)
|-- android_app_version: string (nullable = true)
|-- android_app_version_code: string (nullable = true)
|-- android_brand: string (nullable = true)
|-- android_devices: array (nullable = true)
| |-- element: string (containsNull = true)
|-- android_lib_version: string (nullable = true)
|-- android_manufacturer: string (nullable = true)
|-- android_os: string (nullable = true)
|-- android_os_version: string (nullable = true)
|-- android_push_error: string (nullable = true)
|-- browser: string (nullable = true)
|-- browser_version: double (nullable = true)
|-- campaigns: array (nullable = true)
| |-- element: long (containsNull = true)
|-- country_code: string (nullable = true)
|-- deliveries: array (nullable = true)
| |-- element: long (containsNull = true)
|-- initial_referrer: string (nullable = true)
|-- initial_referring_domain: string (nullable = true)
|-- ios_app_release: string (nullable = true)
|-- ios_app_version: string (nullable = true)
|-- ios_device_model: string (nullable = true)
|-- ios_devices: array (nullable = true)
| |-- element: string (containsNull = true)
|-- ios_lib_version: string (nullable = true)
|-- ios_version: string (nullable = true)
|-- last_seen: string (nullable = true)
|-- notifications: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- $time: string (nullable = true)
| | |-- campaign_id: long (nullable = true)
| | |-- message_id: long (nullable = true)
| | |-- message_subtype: string (nullable = true)
| | |-- message_type: string (nullable = true)
| | |-- time: string (nullable = true)
| | |-- type: string (nullable = true)
|-- os: string (nullable = true)
|-- predict_grade: string (nullable = true)
|-- region: string (nullable = true)
|-- swift_lib_version: string (nullable = true)
|-- timezone: string (nullable = true)
|-- area: string (nullable = true)
|-- country: string (nullable = true)
|-- dob: string (nullable = true)
|-- date: string (nullable = true)
|-- default_languages: string (nullable = true)
|-- email: string (nullable = true)
|-- first_app_launch: string (nullable = true)
|-- first_app_launch_date: string (nullable = true)
|-- first_login: boolean (nullable = true)
|-- gaid: string (nullable = true)
|-- lr_age: string (nullable = true)
|-- lr_birthdate: string (nullable = true)
|-- lr_country: string (nullable = true)
|-- lr_gender: string (nullable = true)
|-- language: array (nullable = true)
| |-- element: string (containsNull = true)
|-- languages: string (nullable = true)
|-- languages_disabled: string (nullable = true)
|-- languages_selected: string (nullable = true)
|-- launched: string (nullable = true)
|-- location: string (nullable = true)
|-- media_id: string (nullable = true)
|-- no_of_logins: long (nullable = true)
|-- pop-strata: string (nullable = true)
|-- price: string (nullable = true)
|-- random_number: long (nullable = true)
|-- second_name: string (nullable = true)
|-- state: string (nullable = true)
|-- state_as_per_barc: string (nullable = true)
|-- total_app_opens: long (nullable = true)
|-- total_app_sessions: string (nullable = true)
|-- total_sessions: string (nullable = true)
|-- town: string (nullable = true)
|-- user_type: string (nullable = true)
|-- userid: string (nullable = true)
|-- appversion: string (nullable = true)
|-- birthdate: string (nullable = true)
|-- campaign: string (nullable = true)
|-- city: string (nullable = true)
|-- media_source: string (nullable = true)
|-- last_name: string (nullable = true)
|-- first_name: string (nullable = true)
|-- ios_ifa: string (nullable = true)
|-- android_model: string (nullable = true)
|-- age: string (nullable = true)
|-- uid: string (nullable = true)
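What I ultimately want is to create a Hive external table that points to the data in the Parquet files. One solution could be to flatten the data, or to use SQL explode to break the structs out into individual columns, but I end up with null values for every column that was originally of struct type. The Parquet files are stored in an Azure Blob location.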
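To make the flattening idea concrete, here is a minimal sketch of the kind of explode query meant above, written as a small Python helper that only builds the SQL text. The view name `people`, the `notification_` alias prefix, and the helper names are assumptions for illustration and are not part of the original setup. The field list is copied from the schema. Note that `$time` has to be backtick-quoted, and it gets a distinct alias so that it does not collide with the separate `time` field.

```python
# Field names of the notifications struct, copied from the schema above.
NOTIFICATION_FIELDS = [
    "$time", "campaign_id", "message_id",
    "message_subtype", "message_type", "time", "type",
]


def quote(name):
    """Backtick-quote an identifier so names such as $time parse as one token."""
    return "`" + name.replace("`", "``") + "`"


def alias_for(field):
    """Flat column name (hypothetical naming): '$time' -> 'notification_dollar_time'."""
    return "notification_" + field.replace("$", "dollar_")


def build_flatten_query(view="people", key="distinct_id",
                        array_col="notifications",
                        fields=NOTIFICATION_FIELDS):
    """Build a query that emits one row per element of the struct array."""
    select_items = [quote(key)]
    for field in fields:
        select_items.append(
            "n.{} AS {}".format(quote(field), quote(alias_for(field)))
        )
    lines = [
        "SELECT " + ",\n       ".join(select_items),
        "FROM " + quote(view),
        # OUTER keeps rows whose array is null or empty.
        "LATERAL VIEW OUTER explode({}) exploded AS n".format(quote(array_col)),
    ]
    return "\n".join(lines)


print(build_flatten_query())
```

The resulting string could be passed to `spark.sql(...)` once the DataFrame has been registered as a temporary view named `people` (again, an assumed name). Each row of the result then carries one notification's fields as plain columns.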
I tried loading the Parquet files into a Spark SQL DataFrame, but it returns null values for the columns with complex data types: