Pig flatten error

Posted by pxiryf3j on 2021-05-30 in Hadoop

I tried the following script on nested data:

`books = load 'data/book-seded-workings-reduced.json'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');`

`group_auth = group books by title;
maped = foreach group_auth generate group, books.authors;
fil = foreach maped generate flatten(books);
DUMP fil;`

But I get an error: "A column needs to be projected from a relation for it to be used as a scalar".
Any ideas?

Answer #1 (ycggw6v2)

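-- Load the JSON records; authors is a bag of (name) tuples.
-- The schema string is kept on one line because a quoted string cannot span lines.
books = load 'input.data'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');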

flatten_authors = foreach books generate title, FLATTEN(authors.name);

dump flatten_authors;

Output (using the input from the question on loading a JSON file with a SerDe in Cloudera):

(Modern Database Systems: The Object Model, Interoperability, and Beyond.,null)
(Inequalities: Theory of Majorization and Its Application.,Albert W. Marshall)
(Inequalities: Theory of Majorization and Its Application.,Ingram Olkin)
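
As for why the script in the question fails: after the `maped` step the only fields are `group` and the projected bag, so `books` in `flatten(books)` no longer names a field, and Pig appears to read it as the relation `books` used as a scalar. Below is a minimal sketch of the grouped variant, assuming the same `books` relation as in the answer. The aliases `title` (on the group key) and `author_bags` are illustrative names introduced here, not part of the original post.

```pig
group_auth = group books by title;

-- Name the projected bag explicitly so it can be referenced in the next step
-- (author_bags is a hypothetical alias chosen for this sketch).
maped = foreach group_auth generate group as title, books.authors as author_bags;

-- Flatten a field of maped rather than the relation name books.
fil = foreach maped generate title, flatten(author_bags);

dump fil;
```

Each row of `fil` should still carry a bag of names, so one more flatten would be needed to get one row per author. Flattening `authors.name` directly, as in the answer, avoids the grouping step altogether.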
