I have a file containing JSON entries, one per line, like this:
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
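I want to count how often each distinct JSON object occurs in the file. I have seen other answers that use GROUP BY and the COUNT() function in Pig. I'm not sure whether I'm using them correctly, but I'm not getting the result I need. My output should look like this: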
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua", "count": "3"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin", "count": "2"}
The order doesn't matter. Can anyone give me some advice?
1 Answer
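Here is code you can use. The grouping has to be done on all of the fields. If you want a different output format, you can read the fields from the tuple and write them out in whatever format you need.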
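As an illustration of the group-on-all-fields idea outside Pig, here is a minimal Python sketch (it is not the Pig script this answer refers to). It assumes one flat JSON object per line with string values. The function name `count_json_objects` is made up for this example, and `count` is written as a string only to match the output shown in the question.

```python
import json
from collections import Counter


def count_json_objects(lines):
    """Count identical JSON objects; hypothetical helper for illustration."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Group on ALL fields: the key is every (field, value) pair, sorted
        # so that the field order in the input line does not matter.
        # Assumption: values are hashable (flat records, no nested objects).
        key = tuple(sorted(record.items()))
        counts[key] += 1

    results = []
    for key, n in counts.items():
        obj = dict(key)
        obj["count"] = str(n)  # string, as in the question's sample output
        results.append(json.dumps(obj))
    return results


if __name__ == "__main__":
    sample = [
        '{"parent": "fighter", "child": "virtua"}',
        '{"parent": "case", "child": "martin"}',
        '{"parent": "fighter", "child": "virtua"}',
    ]
    for row in count_json_objects(sample):
        print(row)
```

To run it on a real file, pass an open file handle, e.g. `count_json_objects(open("input.json"))`, where `input.json` is a placeholder name. In Pig the equivalent shape is to load the records, group by every field, and emit the group tuple together with the count of each group.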