select distinct
promo_name
,case
when substring(promo_name,instr(promo_name, "P0"),2) = "P0" then 0
when substring(promo_name,instr(promo_name, "P1"),2) = "P1" then 1
When substring(promo_name,instr(promo_name, "P01"),3) = "P01" then 1
when substring(promo_name,instr(promo_name, "P2"),2) = "P2" then 2
When substring(promo_name,instr(promo_name, "P02"),3) = "P02" then 2
when substring(promo_name,instr(promo_name, "P3"),2) = "P3" then 3
when substring(promo_name,instr(promo_name, "P03"),3) = "P03" then 3
when substring(promo_name,instr(promo_name, "P4"),2) = "P4" then 4
when substring(promo_name,instr(promo_name, "P04"),3) = "P04" then 4
when substring(promo_name,instr(promo_name, "P5"),2) = "P5" then 5
when substring(promo_name,instr(promo_name, "P05"),3) = "P05" then 5
when substring(promo_name,instr(promo_name, "P6"),2) = "P6" then 6
when substring(promo_name,instr(promo_name, "P06"),3) = "P06" then 6
when substring(promo_name,instr(promo_name, "P7"),2) = "P7" then 7
when substring(promo_name,instr(promo_name, "P07"),3) = "P07" then 7
when trim(substring(promo_name,instr(promo_name, "P8"),2)) ="P8" then 8
when trim(substring(promo_name,instr(promo_name, "P08"),3)) ="P08" then 8
when trim(substring(promo_name,instr(promo_name, "P9"),2)) ="P9" then 9
when trim(substring(promo_name,instr(promo_name, "P09"),3)) ="P09" then 9
when trim(substring(promo_name,instr(promo_name, "P10"),3)) ="P10" then 10
when trim(substring(promo_name,instr(promo_name, "P11"),3)) ="P11" then 11
when trim(substring(promo_name,instr(promo_name, "P12"),3)) ="P12" then 12
else 0以promo\u id结尾,case when trim(substring(promo\u name,instr(promo\u name,“p10”),3))=“p10”,10 when trim(substring(promo\u name,instr(promo\u name,“p11”),3))=“p11”,11 when trim(substring(promo\u name,instr(promo\u name,“p12”),3))=“p12”,12 when trim(substring(promo\u name,instr(promo\u name,“p13”),3) )=“p13”然后13当修剪时(子字符串(promo\u name,instr(promo\u name,“p14”),3))=“p14”然后14否则0结束为hbi\u dns\u protected.store\u zones\u stock\u v7\u 1\u 4,其中promo\u name不为空
试着从字符串中提取id,当我在一个单独的列中使用时,它从p10到p14工作得很好,当我在同一列中使用时,它只选择1而不是11,1而不是12等等。。。
我是不是搞错了?样本数据
2条答案
按热度按时间k5ifujac1#
代码在第一次匹配时停止,因此“11”与“1”匹配。
我建议重新订购和使用
like
:也许你应该问一个新问题,包括样本数据和期望的结果。可能有更简单的方法。
oewdyzsn2#
为什么不用
regexp_extract
从字符串中提取正则表达式,而不是为每种情况编写代码,例如:我的结果:
注意正则表达式区分大小写。如果您需要捕获小写或大写的ps,那么可以使用字符类ie
[pP]
相反。对所用正则表达式模式的完整解释:
正则表达式以空格字符和大写p开头。这将符合字面上的空格和大写p。如果要使匹配不区分大小写,可以使用字符类,例如
[pP]
表示匹配括号中的任何字符(区分大小写)正则表达式的下一个组件是
(\\d+)
. 这是由正则表达式模式组成的\d
对于匹配的数字+
表示“匹配一个或多个”的符号。括号将其分成一组,即第1组。这个\d
有一个附加的斜杠,它是的spark sql实现所需的转义字符regexp_extract
.最后一个论点
regexp_extract
值为1,表示“从函数返回组1”我使用regex101.com测试和练习regex表达式。