Pig filter for the most recent year

Posted by jgwigjjp on 2021-06-02 in Hadoop

How can I filter the data so that only the most recently updated records are kept? Here is some sample data.
Data:

United States of America    2000    Dentistry personnel density 162.7
United States of America    2000    Health management & support workers 1237.9
United States of America    2000    Laboratory health workers   228.4
United States of America    1995    Nursing and midwifery personnel 879.80005
United States of America    2000    Nursing and midwifery personnel 936.69995
United States of America    2005    Nursing and midwifery personnel 981.49994
United States of America    1995    Other health workers    650.89996
United States of America    2000    Other health workers    1452.1
United States of America    2005    Other health workers    494.3
United States of America    2009    Other health workers    849.89996
United States of America    2010    Other health workers    857.9
United States of America    2011    Other health workers    845.89996
United States of America    2000    Pharmaceutical personnel    87.6
United States of America    2010    Pharmaceutical personnel    88.1
United States of America    1995    Physicians  239.5
United States of America    2000    Physicians  256.4
United States of America    2004    Physicians  267.19998
United States of America    2005    Physicians  240.9
United States of America    2006    Physicians  240.2
United States of America    2007    Physicians  241.00002
United States of America    2008    Physicians  241.59999
United States of America    2009    Physicians  242.2
United States of America    2010    Physicians  241.00002
United States of America    2011    Physicians  245.2
Uruguay 2002    Dentistry personnel density 116.1
Uruguay 2008    Dentistry personnel density 70.1
Uruguay 2008    Health management & support workers 69.5
Uruguay 2008    Laboratory health workers   17.0
Uruguay 2002    Nursing and midwifery personnel 84.899994
Uruguay 2008    Nursing and midwifery personnel 554.8
Uruguay 2008    Other health workers    137.0
Uruguay 2008    Pharmaceutical personnel    53.100002
Uruguay 2002    Physicians  365.19998
Uruguay 2008    Physicians  373.6

What I want is:

United States of America    2000    Dentistry personnel density 162.7
United States of America    2000    Health management & support workers 1237.9
United States of America    2000    Laboratory health workers   228.4
United States of America    2005    Nursing and midwifery personnel 981.49994
United States of America    2011    Other health workers    845.89996
United States of America    2010    Pharmaceutical personnel    88.1
United States of America    2011    Physicians  245.2
Uruguay 2008    Dentistry personnel density 70.1
Uruguay 2008    Health management & support workers 69.5
Uruguay 2008    Laboratory health workers   17.0
Uruguay 2008    Nursing and midwifery personnel 554.8
Uruguay 2008    Other health workers    137.0
Uruguay 2008    Pharmaceutical personnel    53.100002
Uruguay 2008    Physicians  373.6

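When several rows share the same country and career, I want to keep only the row for the most recent year.
This is my code, but it doesn't work.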

b = LOAD '/nomnom' AS (country:chararray, year:int, career:chararray, density:chararray);

c = GROUP b by (country,career);

d = FOREACH c GENERATE MAX(b.year) AS val, group, $1 as max;

e = FOREACH d {
row = FILTER b BY (year == val);
GENERATE FLATTEN(row);
};

DUMP e;
Answer #1 (db2dz4w8)

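After grouping (relation c), sort the rows inside each group by year in descending order and take the first one, which is the most recent: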

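d = foreach c {
  -- within each (country, career) group, put the newest year first
  sorted = order b by year desc;
  -- keep only that first (most recent) row
  latest = limit sorted 1;
  generate FLATTEN(latest);
}
dump d;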
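
To check the expected result outside Pig, the same "group by (country, career), keep the row with the largest year" logic can be sketched in plain Python. This is only an illustration and not part of the Pig script: the function name latest_per_group and the small sample list are assumptions made for the example, with values copied from the data in the question.

```python
from typing import Dict, List, Tuple

# One record: (country, year, career, density), the same fields as the Pig schema.
Row = Tuple[str, int, str, str]


def latest_per_group(rows: List[Row]) -> List[Row]:
    """Keep, for each (country, career) pair, the row with the largest year."""
    latest: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        country, year, career, _density = row
        key = (country, career)
        # Replace the stored row only when this one is strictly newer.
        if key not in latest or year > latest[key][1]:
            latest[key] = row
    # Sort by country, then career, so the output order is stable.
    return sorted(latest.values(), key=lambda r: (r[0], r[2]))


# A few rows copied from the sample data in the question.
sample: List[Row] = [
    ("Uruguay", 2002, "Physicians", "365.19998"),
    ("Uruguay", 2008, "Physicians", "373.6"),
    ("United States of America", 2000, "Pharmaceutical personnel", "87.6"),
    ("United States of America", 2010, "Pharmaceutical personnel", "88.1"),
]

result = latest_per_group(sample)
for row in result:
    print("\t".join(str(field) for field in row))
```

If two rows in a group share the latest year, this sketch keeps only the first one it sees, much as limit sorted 1 keeps a single row per group.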
