Error converting CSV to Parquet using an EMR cluster

I created an EMR cluster and added a Hive script to it as a step. My Hive script looks like this:
    -- Create a Hive external table for the existing data
    CREATE EXTERNAL TABLE calls_csv (
      id int,
      campaign_id int,
      campaign_name string,
      offer_id int,
      offer_name string,
      is_offer_not_found int,
      ivr_key string,
      call_uuid string,
      a_leg_uuid string,
      a_leg_request_uuid string,
      to_number string,
      promo_id int,
      description string,
      call_type string,
      answer_type string,
      agent_id int,
      from_number string,
      from_caller_name string,
      from_line_type string,
      from_state string,
      from_city string,
      from_country string,
      from_zip string,
      from_latitude string,
      from_longitude string,
      b_leg_uuid string,
      b_leg_number string,
      b_leg_duration int,
      b_leg_bill_rate double,
      b_leg_bill_duration int,
      b_leg_total_cost double,
      b_leg_hangup_cause string,
      b_leg_start_time string,
      b_leg_answer_time string,
      b_leg_end_time string,
      b_leg_active tinyint,
      bill_rate double,
      bill_duration int,
      hangup_cause string,
      start_time string,
      answer_time string,
      end_time string,
      status string,
      selected_ivr_keys string,
      processed_ivr_keys string,
      filter_id int,
      filter_name string,
      ivr_action string,
      selected_zip_code string,
      processed_zip_code string,
      duration int,
      payout double,
      min_duration int,
      connected_duration int,
      provider_cost double,
      caller_id_cost double,
      total_revenue double,
      total_cost double,
      total_profit double,
      publisher_id int,
      publisher_name string,
      publisher_revenue double,
      publisher_cost double,
      publisher_profit double,
      advertiser_id int,
      advertiser_name string,
      advertiser_cost double,
      is_test tinyint,
      is_sale tinyint,
      is_repeat tinyint,
      is_machine_detection tinyint,
      no_of_call_transfer int,
      offer_ivr_status tinyint,
      file_url string,
      algo string,
      callback_service_status tinyint,
      hangup_service_status tinyint,
      sms_uuid string,
      number_name string,
      keyword string,
      keywordmatchtype string,
      created_at string,
      updated_at string,
      ymdhm bigint
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      'separatorChar' = ',',
      'quoteChar' = '\"'
    )
    LOCATION 's3://calls-csv/'
    TBLPROPERTIES ('has_encrypted_data'='false', 'serialization.null.format'='');

    MSCK REPAIR TABLE calls_csv;

    -- Now let's create an external table in Parquet format
    CREATE EXTERNAL TABLE calls_parquet (
      id int,
      campaign_id int,
      campaign_name string,
      offer_id int,
      offer_name string,
      is_offer_not_found int,
      ivr_key string,
      call_uuid string,
      a_leg_uuid string,
      a_leg_request_uuid string,
      to_number string,
      promo_id int,
      description string,
      call_type string,
      answer_type string,
      agent_id int,
      from_number string,
      from_caller_name string,
      from_line_type string,
      from_state string,
      from_city string,
      from_country string,
      from_zip string,
      from_latitude string,
      from_longitude string,
      b_leg_uuid string,
      b_leg_number string,
      b_leg_duration int,
      b_leg_bill_rate double,
      b_leg_bill_duration int,
      b_leg_total_cost double,
      b_leg_hangup_cause string,
      b_leg_start_time string,
      b_leg_answer_time string,
      b_leg_end_time string,
      b_leg_active tinyint,
      bill_rate double,
      bill_duration int,
      hangup_cause string,
      start_time string,
      answer_time string,
      end_time string,
      status string,
      selected_ivr_keys string,
      processed_ivr_keys string,
      filter_id int,
      filter_name string,
      ivr_action string,
      selected_zip_code string,
      processed_zip_code string,
      duration int,
      payout double,
      min_duration int,
      connected_duration int,
      provider_cost double,
      caller_id_cost double,
      total_revenue double,
      total_cost double,
      total_profit double,
      publisher_id int,
      publisher_name string,
      publisher_revenue double,
      publisher_cost double,
      publisher_profit double,
      advertiser_id int,
      advertiser_name string,
      advertiser_cost double,
      is_test tinyint,
      is_sale tinyint,
      is_repeat tinyint,
      is_machine_detection tinyint,
      no_of_call_transfer int,
      offer_ivr_status tinyint,
      file_url string,
      algo string,
      callback_service_status tinyint,
      hangup_service_status tinyint,
      sms_uuid string,
      number_name string,
      keyword string,
      keywordmatchtype string,
      created_at string,
      updated_at string,
      ymdhm bigint
    )
    STORED AS PARQUET
    LOCATION 's3://calls-parquet/';

    -- Time to convert and export. This step will run for a long time,
    -- depending on the data size and the cluster size.
    INSERT OVERWRITE TABLE calls_parquet SELECT * FROM calls_csv;
Below is the error I get when I run this step on the EMR cluster:

Status: FAILED
Details: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Error moving: s3://calls-parquet/.hive-staging_hive_2018-03-20_07-09-28_592_6773618098932115163-1/-ext-10000 into: s3://calls-parquet/
JAR location: command-runner.jar
Main class: None
Arguments: hive-script --run-hive-script --args -f s3://calls-scripts/convertoparquethive.sql -d input=s3://calls-csv -d output=s3://calls-parquet
Action on failure: Continue
