python - How to select columns whose name contains a given word and whose value equals a given value, in Hive SQL on Databricks

Posted by nbnkbykc on 2021-06-24 in Hive

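I am trying to select columns whose names contain a specific word, using Hive SQL on Databricks.
I based my attempt on the question "Select column names in Hive using a regular expression?"
My code: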

%py
t = spark.createDataFrame([('50', 'rscds', 'tyhdvs'),], ['id', 'col_pattern_1', 'col_pattern_2'])
t.write.saveAsTable('my_database.my_table')

%sql
set hive.support.quoted.identifiers=none;
select `col_pattern.*`
from my_database.my_table

I got:

Error in SQL statement: AnalysisException: cannot resolve '`col_pattern.*`' given input
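
A possible explanation, offered as an assumption rather than a confirmed diagnosis: `hive.support.quoted.identifiers` is a Hive setting, and Spark SQL (which runs `%sql` cells on Databricks) has its own option for the same behavior, `spark.sql.parser.quotedRegexColumnNames`. The sketch below sets that option and runs the regex select from Python. The helper name `select_columns_by_regex` is hypothetical, and `table` and `pattern` are pasted into the query unescaped, so they must come from trusted input.

```python
def select_columns_by_regex(spark, table, pattern):
    """Return a DataFrame holding only the columns whose names match `pattern`.

    `spark` is an existing SparkSession (predefined in Databricks notebooks).
    """
    # Assumption: the cluster runs Spark 2.3 or later, where this option exists.
    # With it enabled, a backquoted name in a SELECT list is read as a regex.
    spark.conf.set("spark.sql.parser.quotedRegexColumnNames", "true")
    return spark.sql(f"SELECT `{pattern}` FROM {table}")
```

In a notebook this would be called as `select_columns_by_regex(spark, 'my_database.my_table', 'col_pattern.*')`. The same option can presumably be set in a `%sql` cell with `set spark.sql.parser.quotedRegexColumnNames=true;` before the original query.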

I also tried:

import pyspark.sql.functions as F
selected = [s for s in t.columns if 'col_pattern' in s]
t.filter(t[x]=='rscds' for x in selected)

I got:

TypeError: condition should be string or Column
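
The error message suggests that `filter` wants a single condition, either a SQL string or a `Column`, while the call above passes a generator of several conditions. One way around this, sketched under assumptions, is to fold the per-column comparisons into one SQL condition string joined with `OR`. The helper name `build_prefix_filter` is hypothetical, and the escaping assumes Spark's default string-literal parsing (backslash escapes) and backtick-quoted identifiers.

```python
def build_prefix_filter(columns, prefix, value):
    """Build one SQL condition: any column containing `prefix` equals `value`."""
    selected = [c for c in columns if prefix in c]
    if not selected:
        raise ValueError(f"no column name contains {prefix!r}")
    # Assumption: default Spark SQL literal rules, so backslash escapes a quote.
    literal = "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    # Backticks inside a column name are escaped by doubling them.
    quoted = ["`" + c.replace("`", "``") + "`" for c in selected]
    return " OR ".join(f"{q} = {literal}" for q in quoted)
```

With the DataFrame from the question, `t.filter(build_prefix_filter(t.columns, 'col_pattern', 'rscds'))` should keep the rows where at least one prefixed column equals `'rscds'`. An equivalent that stays with `Column` objects would be `functools.reduce(operator.or_, [t[c] == 'rscds' for c in selected])`.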

Input:

The DataFrame may have 20+ columns with the same prefix, so I cannot type them into the query one by one. I need a way to filter the DataFrame by a given value across all columns that share the prefix.

+---+-------------+-------------+-------------+
| id|col_pattern_1|col_pattern_2|col_pattern_3|
+---+-------------+-------------+-------------+
| 50|        rscds|       tyhdvs|       tyhdvs|
+---+-------------+-------------+-------------+

Output:

For example, I need the rows in which a column with the given prefix ('col_pattern') has the value 'rscds', showing only the matching column:

+---+-------------+
| id|col_pattern_1|
+---+-------------+
| 50|        rscds|
+---+-------------+
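
Getting from the input to this output takes two decisions: which prefixed columns contain the value at all, and which rows match. A rough two-step sketch follows, assuming the table is small enough that one aggregate pass is acceptable. Both function names are hypothetical, the table name is pasted into the SQL unescaped, and the literal escaping assumes Spark's default backslash rules.

```python
def _ident(name):
    # Backtick-quote a column name, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"

def _literal(value):
    # Assumption: default Spark SQL string-literal rules (backslash escapes).
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def build_count_query(table, columns, prefix, value):
    """Step 1: a one-row query counting matches in each prefixed column."""
    parts = [
        f"SUM(CASE WHEN {_ident(c)} = {_literal(value)} THEN 1 ELSE 0 END) AS {_ident(c)}"
        for c in columns if prefix in c
    ]
    if not parts:
        raise ValueError(f"no column name contains {prefix!r}")
    return f"SELECT {', '.join(parts)} FROM {table}"

def build_result_query(table, key_columns, counts, value):
    """Step 2: keep the key columns plus every column with a nonzero count."""
    hits = [c for c, n in counts.items() if n]
    if not hits:
        raise ValueError("the value does not occur in any counted column")
    select_list = ", ".join(_ident(c) for c in list(key_columns) + hits)
    where = " OR ".join(f"{_ident(c)} = {_literal(value)}" for c in hits)
    return f"SELECT {select_list} FROM {table} WHERE {where}"
```

In a notebook, `counts` could come from `spark.sql(build_count_query(...)).first().asDict()`, and the string returned by `build_result_query` could then be passed to `spark.sql` to produce a table shaped like the one above.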

In short: select the columns whose name contains a given word and whose value == a given value.
Thanks
