databricksparksql中的字符串比较

xyhw6mcr 于 2021-05-27 发布在 Spark

关注(0)|答案(2)|浏览(412)

select distinct
 promo_name 
  ,case 
    when substring(promo_name,instr(promo_name, "P0"),2)  = "P0"   then 0
    when substring(promo_name,instr(promo_name, "P1"),2)  = "P1"   then 1
    When substring(promo_name,instr(promo_name, "P01"),3) = "P01"  then 1
    when substring(promo_name,instr(promo_name, "P2"),2)  = "P2"   then 2 
    When substring(promo_name,instr(promo_name, "P02"),3) = "P02"  then 2  
    when substring(promo_name,instr(promo_name, "P3"),2)  = "P3"   then 3 
    when substring(promo_name,instr(promo_name, "P03"),3) = "P03"  then 3
    when substring(promo_name,instr(promo_name, "P4"),2)  = "P4"   then 4  
    when substring(promo_name,instr(promo_name, "P04"),3) = "P04"  then 4 
    when substring(promo_name,instr(promo_name, "P5"),2)  = "P5"   then 5
    when substring(promo_name,instr(promo_name, "P05"),3) = "P05"  then 5
    when substring(promo_name,instr(promo_name, "P6"),2)  = "P6"   then 6
    when substring(promo_name,instr(promo_name, "P06"),3) = "P06"  then 6
    when substring(promo_name,instr(promo_name, "P7"),2)  = "P7"   then 7
    when substring(promo_name,instr(promo_name, "P07"),3) = "P07"  then 7 
    when trim(substring(promo_name,instr(promo_name, "P8"),2))  ="P8"  then 8 
    when trim(substring(promo_name,instr(promo_name, "P08"),3)) ="P08" then 8
    when trim(substring(promo_name,instr(promo_name, "P9"),2))  ="P9"  then 9
    when trim(substring(promo_name,instr(promo_name, "P09"),3)) ="P09" then 9
    when trim(substring(promo_name,instr(promo_name, "P10"),3)) ="P10" then 10 
    when trim(substring(promo_name,instr(promo_name, "P11"),3)) ="P11" then 11
    when trim(substring(promo_name,instr(promo_name, "P12"),3)) ="P12" then 12

else 0以promo\u id结尾，case when trim（substring（promo\u name，instr（promo\u name，“p10”），3））=“p10”，10 when trim（substring（promo\u name，instr（promo\u name，“p11”），3））=“p11”，11 when trim（substring（promo\u name，instr（promo\u name，“p12”），3））=“p12”，12 when trim（substring（promo\u name，instr（promo\u name，“p13”），3））=“p13”然后13当修剪时（子字符串（promo\u name，instr（promo\u name，“p14”），3））=“p14”然后14否则0结束为hbi\u dns\u protected.store\u zones\u stock\u v7\u 1\u 4，其中promo\u name不为空
试着从字符串中提取id，当我在一个单独的列中使用时，它从p10到p14工作得很好，当我在同一列中使用时，它只选择1而不是11，1而不是12等等。。。
我是不是搞错了？样本数据

sql apache-spark databricks instr

来源：https://stackoverflow.com/questions/63017975/string-comparison-in-databricks-spark-sql

2条答案

按热度按时间

k5ifujac1#

代码在第一次匹配时停止，因此“11”与“1”匹配。
我建议重新订购和使用 like :

(case when promo_name like 'P14%' then 14
      when promo_name like 'P13%' then 13
      . . .
 end)

也许你应该问一个新问题，包括样本数据和期望的结果。可能有更简单的方法。

赞(0）回复(0）举报 2021-05-27

oewdyzsn2#

为什么不用 regexp_extract 从字符串中提取正则表达式，而不是为每种情况编写代码，例如：

%sql
SELECT *,
  regexp_extract( promo_name, ' P(\\d+)', 1 ) AS promoNumber
FROM tmp

我的结果：

注意正则表达式区分大小写。如果您需要捕获小写或大写的ps，那么可以使用字符类ie [pP] 相反。
对所用正则表达式模式的完整解释：
正则表达式以空格字符和大写p开头。这将符合字面上的空格和大写p。如果要使匹配不区分大小写，可以使用字符类，例如 [pP] 表示匹配括号中的任何字符（区分大小写）
正则表达式的下一个组件是 (\\d+) . 这是由正则表达式模式组成的 \d 对于匹配的数字 + 表示“匹配一个或多个”的符号。括号将其分成一组，即第1组。这个 \d 有一个附加的斜杠，它是的spark sql实现所需的转义字符 regexp_extract .
最后一个论点 regexp_extract 值为1，表示“从函数返回组1”
我使用regex101.com测试和练习regex表达式。

赞(0）回复(0）举报 2021-05-27

我来回答

databricksparksql中的字符串比较

2条答案

相关问题

热门标签

最新问答