How do I extract distinct values from a bag of tuples?

ezykj2lf  posted on 2021-06-21  in  Pig

I have the following data structure in Pig, as shown by DESCRIBE:

--------------------------------------------------------------------------------------------------------------------------------------------------------
| summed_hours_and_miles_by_driver     | group:int     | :bag{:tuple(driver_name:chararray)}             | total_hours:long     | total_miles:long     | 
--------------------------------------------------------------------------------------------------------------------------------------------------------
|                                      | 27            | {(Mark Lochbihler), ..., (Mark Lochbihler)}     | 220                  | 11006                | 
--------------------------------------------------------------------------------------------------------------------------------------------------------

The problem is that the driver name (Mark Lochbihler) is repeated many times in the bag of tuples. How can I reduce it to a single name, like DISTINCT in SQL?

qjp7pelc1#

Use Distinct. Assuming your relation is built like this:

-- A is assumed to be the alias of the relation that was grouped into grp
summed_hours_and_miles_by_driver = FOREACH grp GENERATE 
                                       group,
                                       org.apache.pig.builtin.Distinct(A.driver_name),
                                       total_hours,
                                       total_miles;
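
An alternative that may read more naturally is the DISTINCT operator inside a nested FOREACH block. The following is only a sketch under assumptions: the input file 'drivers.csv', the alias drivers, and the field names driver_id, hours and miles are hypothetical and do not come from the question, which only shows the final schema.

```pig
-- Hypothetical input: one row per trip, with the driver's id, name, hours and miles.
drivers = LOAD 'drivers.csv' USING PigStorage(',')
          AS (driver_id:int, driver_name:chararray, hours:long, miles:long);

-- One group per driver; each group carries a bag of that driver's rows.
grp = GROUP drivers BY driver_id;

summed_hours_and_miles_by_driver = FOREACH grp {
    -- Project only the name column out of the bag, then de-duplicate it.
    names      = drivers.driver_name;
    uniq_names = DISTINCT names;
    GENERATE group              AS driver_id,
             uniq_names         AS driver_names,
             SUM(drivers.hours) AS total_hours,
             SUM(drivers.miles) AS total_miles;
};

DUMP summed_hours_and_miles_by_driver;
```

With this form the bag in each output row should hold one tuple per distinct name, so a driver who always appears under the same name ends up with a single-element bag.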
