Get all tweets based on a specific word and store all the tweets in a single bag

Posted by sy5wg1nm on 2021-05-29 in Hadoop

I am trying to process sample tweets and store them based on a filter condition.
For example, here is a sample tweet:

{"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a word about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, "created_date": "Sun Sep 30 2012"}

twitter = LOAD 'Tweet.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
grouped = GROUP twitter BY (text,id);
filtered = FOREACH grouped {
    row = FILTER $1 BY (text MATCHES '.*word.*');
    GENERATE FLATTEN(row);
}
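A plain-Python model of what this script computes may help show why the result comes out flat. It is only an illustration of the Pig semantics, not Pig itself; the sample rows and the `asker_script` name are hypothetical.

```python
import re

# Hypothetical sample rows; only the fields the script actually uses are modeled.
tweets = [
    {"id": "1", "text": "give a word about the Ryder cup"},
    {"id": "2", "text": "nothing relevant here"},
    {"id": "3", "text": "one more word"},
]


def asker_script(rows, word):
    """Model of GROUP twitter BY (text,id) + nested FILTER ... MATCHES + FLATTEN."""
    # GROUP twitter BY (text,id): one bag per distinct (text, id) key
    groups = {}
    for row in rows:
        groups.setdefault((row["text"], row["id"]), []).append(row)
    # Pig's MATCHES is a whole-string match, so use fullmatch with .* on both sides
    regex = re.compile(".*" + re.escape(word) + ".*")
    out = []
    for bag in groups.values():
        # FILTER $1 BY (text MATCHES ...) then FLATTEN(row): matching rows come out flat
        out.extend(r for r in bag if regex.fullmatch(r["text"]))
    return out


print(asker_script(tweets, "word"))  # prints the rows with id "1" and id "3"
```

Each matching tweet is emitted as its own row, and nothing in the output carries the search word as a key.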

This returns the complete tweets that match the word.
But I need output like the following:

(word)(all tweets that contain that word)

How can I achieve this?
Any help is appreciated.
Mohan V

hadoop JSON apache-pig hadoop-streaming

Source: https://stackoverflow.com/questions/39244479/get-all-tweets-based-on-specific-word-and-store-all-tweets-in-single-bag

1 answer

31moq8wy1

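After filtering, add the word to the filtered relation as a field (call it "pattern") and then group by that field. This gives you the word together with a bag of the matching tweets.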

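twitter = LOAD 'Tweet.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
-- keep only tweets whose text contains "word" (MATCHES must match the whole string)
filtered = FILTER twitter BY (text MATCHES '.*word.*');
-- tag every matching tweet with the search word as a constant field
newfiltered = FOREACH filtered GENERATE 'word' AS pattern, text;
-- all rows share the same pattern, so this yields one (word, bag of tweets) tuple
final = GROUP newfiltered BY pattern;
-- optional: keep only the tweet text inside the bag
result = FOREACH final GENERATE group AS pattern, newfiltered.text AS tweets;
DUMP result;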
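To make the shape of that result concrete, here is a plain-Python model of the filter, tag, and group steps described above. It illustrates Pig's semantics only and is not Pig code; `group_by_pattern` and the sample strings are hypothetical.

```python
import re
from collections import defaultdict


def group_by_pattern(texts, word):
    # FILTER ... BY (text MATCHES '.*word.*'): Pig's MATCHES is a whole-string match
    regex = re.compile(".*" + re.escape(word) + ".*")
    filtered = [t for t in texts if regex.fullmatch(t)]
    # FOREACH filtered GENERATE 'word' AS pattern, text: tag each row with the word
    tagged = [(word, t) for t in filtered]
    # GROUP ... BY pattern: one (group, bag) tuple per distinct key;
    # like Pig, the bag keeps the full tagged tuples, pattern field included
    grouped = defaultdict(list)
    for pattern, text in tagged:
        grouped[pattern].append((pattern, text))
    return [(key, bag) for key, bag in grouped.items()]


print(group_by_pattern(["give a word", "nothing", "one more word"], "word"))
```

Because every tagged row carries the same constant key, the grouping produces exactly one tuple, which is the `(word)(all tweets that contain that word)` shape the question asks for. In Pig this also means all matching tweets end up in a single group.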

Answered on 2021-05-29
