SQL: getting the word count after splitting a URL into an array of words

e5njpo68  posted on 2021-06-24  in Hive

I have a table containing a list of URLs:

url

http://03cubsml.baseball.cbssports.com/stats/stats-main?selectedplayer=2122997
http://08flb.baseball.cbssports.com/scoring/standard
http://100-poems.com/poems/life/index2.htm
http://10000lakesrbl.baseball.cbssports.com/stats/stats-main
http://1000pictures.com/view.htm?cscenic/sunset+fnoy-2011-07-21-211010+a1112212325323435434553545885949hh9
http://05command.wikidot.com/tech-hub-tag-list
http://10000lakesrbl.baseball.cbssports.com/players/playerpage/2504134
http://1001goroskop.ru/gadanie/?kniga-sudeb
http://04spfbl.baseball.cbssports.com/standings/overall
http://05command.wikidot.com
http://05command.wikidot.com/tech-hub-tag
http://05fbl.baseball.cbssports.com/stats/stats-main
http://100-poems.com/poems/life/0464004.htm
http://10000islands.proboards.com/board/129/tito-headquarters
http://10000islands.proboards.com/thread/11959/tip-islands-party?page=477
http://10000islands.proboards.com/thread/14172/illustrious-house-improving-wordiness?page=82
http://1000pictures.com/view.htm?cscenic/sunset+feilat05-040+a1112212325323435434553545885949hh9
http://1001-rimes.com/listeperson.php?letter=%E9&start=30
http://1001-rimes.com/listeperson.php?letter=ques&start=30
http://1001goroskop.ru/?god

I currently use the following code to split each URL into a list of the words it contains:

Create table url_keyword
(url string,
keywords Array<String>);

-- INSERT OVERWRITE ... SELECT takes no AS keyword in Hive (AS belongs to CREATE TABLE ... AS SELECT)
Insert Overwrite table url_keyword
Select url,split(lcase (parse_url (url,'PATH')),"[=/_%:|^$#@!&,?*_~+.`<>(){}' \-\;\" \\ \\[\\]{[0 -9]+ }]") AS keywords from url_table;

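The output has the url and the keywords produced by the split (an array, displayed space-separated). Now I want the number of words generated from each URL, but whenever I try

regexp_replace(keywords,' ',',')

to convert it into a comma-separated array, so that I can use the length function to get the word count, I get this error: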

Wrong arguments '','': No matching method for class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with (array<string>, string, string). Possible choices: _FUNC_(string, string, string)

How can I get the word count in this case?
My keyword output looks like this:

stats stats main
 scoring standard
 poems life index  htm
 stats stats main
 view htm
 tech hub tag list
 players playerpage        
 gadanie 
 standings overall

 tech hub tag
 stats stats main
 poems life         htm
 board     tito headquarters
 thread       tip islands party
 thread       illustrious house improving wordiness
 view htm
 listeperson php
 listeperson php
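
The error above suggests that regexp_replace only accepts strings, while keywords is declared as Array<String>, so the array would not need to be turned into text before counting. A minimal sketch of two possible approaches follows, assuming the url_keyword table defined earlier and assuming each url appears only once in it (the second query groups by url, so duplicate URLs would be merged). The aliases uk, kw, word, token_count and word_count are made up for illustration.

```sql
-- Assumes url_keyword(url string, keywords array<string>) as created above.

-- 1) Raw element count. size() works directly on an array column,
--    but it also counts the empty strings that the split leaves behind
--    (for example between two adjacent delimiters).
SELECT url,
       size(keywords) AS token_count
FROM url_keyword;

-- 2) Count only the non-empty tokens by exploding the array into rows.
--    LATERAL VIEW OUTER keeps URLs whose array is empty or NULL;
--    count() skips NULLs, so those URLs end up with 0.
SELECT uk.url,
       count(IF(kw.word IS NOT NULL AND kw.word <> '', 1, NULL)) AS word_count
FROM url_keyword uk
LATERAL VIEW OUTER explode(uk.keywords) kw AS word
GROUP BY uk.url;
```

Since the sample output shows many blank entries between words, the second query is probably closer to the intended word count; the first is cheaper if the empty strings are acceptable or are removed when the array is built.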
