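I'm using the elephant-bird project to load JSON files into Pig, but I'm not sure how to define a schema at load time, and I couldn't find any documentation that describes it.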
Data:
{"id":22522,"name":"Product1","colors":["Red","Blue"],"sizes":["S","M"]}
{"id":22523,"name":"Product2","colors":["White","Blue"],"sizes":["M"]}
Code:
feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;
extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id,
    products_json#'name' AS name,
    products_json#'colors' AS colors,
    products_json#'sizes' AS sizes;
describe extracted_products;
Result:
extracted_products: {id: chararray,name: bytearray,colors: bytearray,sizes: bytearray}
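How can I give these fields the correct schema (int, string, array, array), and how can I flatten the array elements into separate rows?
Thanks in advance.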
Answer:
To convert the JSON arrays into tuples:
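A minimal sketch of this step, under stated assumptions. It assumes elephant-bird's `JsonLoader` accepts the `-nestedLoad` option, which makes it return JSON arrays as bags of single-field tuples; without that option the array values arrive as their JSON text and cannot simply be cast to bags. Pig has no `string` or `array` types, so the closest equivalents are `chararray` and a bag of tuples. The explicit casts assume Pig can convert the untyped map values from their runtime types.

```pig
-- Assumption: '-nestedLoad' turns JSON arrays into bags of one-field tuples.
feed = LOAD '$INPUT'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (products_json:map[]);

-- Map values are untyped (bytearray), so cast each one to the intended type.
extracted_products = FOREACH feed GENERATE
    (int) products_json#'id'                         AS id,
    (chararray) products_json#'name'                 AS name,
    (bag{tuple(chararray)}) products_json#'colors'   AS colors,
    (bag{tuple(chararray)}) products_json#'sizes'    AS sizes;

DESCRIBE extracted_products;
```

If the casts take effect, `DESCRIBE` should report `id` as `int`, `name` as `chararray`, and `colors`/`sizes` as bags of `chararray` tuples rather than `bytearray`.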
To flatten the tuples:
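A sketch of the flattening step, assuming `extracted_products` has the typed schema from the previous step (bags of single-field `chararray` tuples). `FLATTEN` on a bag emits one output row per tuple in the bag; flattening two bags in the same `GENERATE` produces their cross product.

```pig
-- One row per color for each product.
color_rows = FOREACH extracted_products GENERATE
    id, name, FLATTEN(colors) AS color;

-- Flattening both bags together yields every (color, size) combination.
product_rows = FOREACH extracted_products GENERATE
    id, name, FLATTEN(colors) AS color, FLATTEN(sizes) AS size;

DUMP product_rows;
```

With the sample data, `product_rows` should contain four rows for Product1 (2 colors × 2 sizes) and two for Product2 (2 colors × 1 size). If you only want colors or only sizes as rows, flatten one bag per `FOREACH` as in `color_rows`.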