I am trying to select columns whose names contain certain words, using Hive SQL on Databricks.
Based on: "Select column names using a regular expression in Hive"?
My code:
%py
t = spark.createDataFrame([('50', 'rscds', 'tyhdvs'),], ['id', 'col_pattern_1', 'col_pattern_2'])
t.write.saveAsTable('my_database.my_table')
%sql
set hive.support.quoted.identifiers=none;
select `col_pattern.*`
from my_database.my_table
I got:
Error in SQL statement: AnalysisException: cannot resolve '`col_pattern.*`' given input
I tried:
import pyspark.sql.functions as F
selected = [s for s in t.columns if 'col_pattern' in s]
t.filter(t[x]=='rscds' for x in selected)
I got:
TypeError: condition should be string or Column
Input:
The DataFrame may have 20+ columns with the same prefix, so I cannot type them into the query one by one. I need a way to filter the DataFrame by a given value across all columns that share that prefix.
+---+-------------+-------------+-------------+
| id|col_pattern_1|col_pattern_2|col_pattern_3|
+---+-------------+-------------+-------------+
| 50|        rscds|       tyhdvs|       tyhdvs|
+---+-------------+-------------+-------------+
Output:
E.g., I need to find the rows in which a column with the given prefix ('col_pattern') has the value 'rscds':
+---+-------------+
| id|col_pattern_1|
+---+-------------+
| 50|        rscds|
+---+-------------+
In short: select the columns whose names contain a given word and whose value == a given value.
Thanks