I have the following data structure, as described in Pig:
--------------------------------------------------------------------------------------------------------------------------------------------------------
| summed_hours_and_miles_by_driver | group:int | :bag{:tuple(driver_name:chararray)} | total_hours:long | total_miles:long |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| | 27 | {(Mark Lochbihler), ..., (Mark Lochbihler)} | 220 | 11006 |
--------------------------------------------------------------------------------------------------------------------------------------------------------
The problem is that the driver name (Mark Lochbihler) is repeated many times in a bag of tuples. How can I limit it to a single name, like DISTINCT in SQL?
1 Answer
Use DISTINCT, assuming your relation looks like this:
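As a minimal sketch of how this could look in Pig Latin, a nested DISTINCT inside a FOREACH block removes duplicate names from each group's bag. The input relation `joined`, the file path, and the field names `driver_id`, `hours`, and `miles` are assumptions made for illustration; only `driver_name`, `total_hours`, `total_miles`, and `summed_hours_and_miles_by_driver` come from the question.

```pig
-- Hypothetical input: one row per timesheet entry, already joined with the driver's name.
joined = LOAD 'timesheet_with_names.csv' USING PigStorage(',')
         AS (driver_id:int, driver_name:chararray, hours:long, miles:long);

-- One group per driver; each group carries a bag with all of that driver's rows.
grouped = GROUP joined BY driver_id;

summed_hours_and_miles_by_driver = FOREACH grouped {
    -- Nested DISTINCT collapses repeated names inside this group's bag.
    unique_names = DISTINCT joined.driver_name;
    GENERATE group AS driver_id,
             unique_names AS driver_names,
             SUM(joined.hours) AS total_hours,
             SUM(joined.miles) AS total_miles;
};

DUMP summed_hours_and_miles_by_driver;
```

If each driver ID maps to exactly one name, wrapping the bag as `FLATTEN(unique_names) AS driver_name` in the GENERATE clause should yield a plain chararray column instead of a one-element bag.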