Pig Latin: filtering numbers < 5 and >= 5 inside a chararray (mixed text and numbers)

bmp9r5qi posted on 2021-06-24 in Pig

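How can I filter or group the records into those with fewer than five years of experience and those with five years or more? I am new to Pig Latin. The ID (for example BUS2003) should stay as it is.
Input data: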

ID,Experience
BUS2003,More than 17 years teaching experience
BUS1303,2 years teaching experience
BUS4543,13 plus years of teaching experience; 4 plus years of corporate experience
BUS2103,4 year + 6 years in business
BUS2913,8 yrs teaching experience

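I know how to load the data with PigStorage or CSVLoader, but because the words and numbers are mixed together in one field, I am having trouble solving this.
Expected result: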


**Less than five years**

BUS1303,2 years teaching experience
BUS2103,4 year + 6 years in business

**Equal or greater than five years**

BUS2003,More than 17 years teaching experience
BUS4543,13 plus years of teaching experience; 4 plus years of corporate experience
BUS2913,8 yrs teaching experience

Thanks in advance.

wljmcqd8 #1

You have to extract the number from the text first and then split on it. This should give you what you want:

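-- Load each line as two chararray fields: the ID and the free-text experience.
A = LOAD 'input.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
-- Skip any leading non-digits, capture the first number, and cast it to int.
-- Rows with no digits at all (such as the header line) get a null exp.
B = FOREACH A GENERATE a1, a2, (int)REGEX_EXTRACT(a2, '\\D*(\\d+).*', 1) AS exp;
-- SPLIT is a standalone statement (no "C =") and refers to the field as exp, not B.exp.
SPLIT B INTO C1 IF exp < 5, C2 IF exp >= 5;
DUMP C1;
DUMP C2;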
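
The dumped relations still carry the extracted number as a third field, and rows with no number (for example the `ID,Experience` header) match neither condition, so they silently disappear. Here is a minimal sketch of one way to handle both points, assuming the input is the sample file from the question. The relation names (`under_five`, `five_or_more`, `unparsed`), the field names and the output paths are hypothetical, chosen only for illustration.

```pig
-- Assumption: input.txt holds the sample rows from the question, header line included.
A = LOAD 'input.txt' USING PigStorage(',') AS (id:chararray, experience:chararray);

-- Capture the first number in the text; the cast yields null when no digits are found.
B = FOREACH A GENERATE id, experience,
        (int)REGEX_EXTRACT(experience, '\\D*(\\d+).*', 1) AS years;

-- A third branch keeps rows that could not be parsed, so they can be inspected.
SPLIT B INTO
    under_five   IF years < 5,
    five_or_more IF years >= 5,
    unparsed     IF years IS NULL;

-- Drop the helper column so the output matches the original two-field layout.
under_five_out   = FOREACH under_five   GENERATE id, experience;
five_or_more_out = FOREACH five_or_more GENERATE id, experience;

-- Hypothetical output locations; adjust to your environment.
STORE under_five_out   INTO 'out/under_five'   USING PigStorage(',');
STORE five_or_more_out INTO 'out/five_or_more' USING PigStorage(',');
DUMP unparsed;
```

Note that this classifies each row by the first number in the text only, which is what the expected result in the question implies (BUS2103 counts as 4 years, BUS4543 as 13). If the experience text itself can contain commas, PigStorage(',') would cut it off at the first one, so a CSV-aware loader may be a better fit in that case.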
