Loading data with a text qualifier

6ovsh4lw  posted on 2021-06-21  in  Pig

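I am trying to load a data file with a Pig Latin script. The data has 2 columns, but the second column is wrapped in a text qualifier (double quotes) and contains commas. Sample data: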

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

When I try to load the data as shown below, the second column is not recognized as a single column:

deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );

How do I define a text qualifier when loading the dataset?


guicsvcw  #1

Try this, and let me know if you need a different output format.
Input file:

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

Pig script:

A = LOAD 'input.txt' AS line;
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
DUMP deviceList;

Output:

(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")
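
Note that the regex approach keeps the surrounding double quotes in SUPPORTED_TECH and also passes the header row through. If that is not wanted, one possible alternative is the CSVExcelStorage loader from Piggybank, which treats double quotes as a text qualifier. The following is only a sketch under assumptions: the path to piggybank.jar is hypothetical, and the four-argument constructor (delimiter, multiline treatment, end-of-line treatment, header treatment) requires a Pig release recent enough to support header handling.

```pig
-- Assumption: piggybank.jar lives at this hypothetical path; adjust for your install.
REGISTER /path/to/piggybank.jar;

-- ',' is the field delimiter; double-quoted fields are read as a single value.
-- 'SKIP_INPUT_HEADER' asks the loader to drop the header row.
deviceList = LOAD 'deviceList.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);

DUMP deviceList;
```

With this loader each row should come back as two fields, with the quotes removed from the second one. Whether this fits better than the regex depends on whether you need the raw quoted text.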
