How to select columns whose names contain a given word, and whose values equal a given value, in Hive SQL on Databricks

Problem description

I am trying, in Hive SQL on Databricks, to select columns whose names contain certain specific words.

基于HIVE Select columns names with regular expression?

My code

  %py
  t = spark.createDataFrame([('50','rscds','tyhdvs'),],['id','col_pattern_1','col_pattern_2'])
  t.write.saveAsTable('my_database.my_table')

  %sql 
  set hive.support.quoted.identifiers=none;
  select `col_pattern.*` 
  from my_database.my_table

I got:

 Error in sql statement: AnalysisException: cannot resolve '`col_pattern.*`' given input 
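On Databricks the statement is parsed by Spark SQL, not Hive, so `hive.support.quoted.identifiers` has no effect. Spark has its own switch for regex column names, `spark.sql.parser.quotedRegexColumnNames` (available since Spark 2.3; worth verifying on your runtime version):

```sql
-- Spark's analog of hive.support.quoted.identifiers: when enabled,
-- backtick-quoted identifiers in SELECT are treated as regexes.
SET spark.sql.parser.quotedRegexColumnNames=true;
SELECT `col_pattern.*` FROM my_database.my_table
```

Note this only selects the matching columns; it does not filter rows by their values.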
 

What I tried:

 import pyspark.sql.functions as F
 selected = [s for s in t.columns if 'col_pattern' in s]
 t.filter(t[x]=='rscds' for x in selected)

which gave:

  TypeError: condition should be string or Column
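The TypeError comes from passing a generator expression to `filter()`, which expects a single `Column`. The per-column conditions need to be combined into one `Column` first (with OR here, to match rows where any prefixed column equals the value), e.g. via `functools.reduce`. A minimal sketch of that combining logic on plain Python dicts (mock data standing in for the DataFrame):

```python
from functools import reduce
import operator

# Mock rows standing in for the Spark DataFrame (illustration only).
rows = [
    {'id': '50', 'col_pattern_1': 'rscds', 'col_pattern_2': 'tyhdvs'},
    {'id': '51', 'col_pattern_1': 'xxxxx', 'col_pattern_2': 'yyyyy'},
]
selected = [c for c in rows[0] if 'col_pattern' in c]

# OR-combine one equality test per selected column, then keep
# only the rows where the combined condition holds.
matched = [r for r in rows
           if reduce(operator.or_, (r[c] == 'rscds' for c in selected))]

# The equivalent pyspark filter builds one Column the same way,
# instead of handing filter() a generator (untested sketch):
#   cond = reduce(operator.or_, [t[c] == 'rscds' for c in selected])
#   t.filter(cond)
```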

Input:

The dataframe may have 20+ columns with the same prefix; I cannot type them into the query one by one, so I need a way to filter the DF by a given value across all columns sharing that prefix.

 +---+-------------+-------------+-------------+
 | id|col_pattern_1|col_pattern_2|col_pattern_3|
 +---+-------------+-------------+-------------+
 | 50|        rscds|       tyhdvs|       tyhdvs|
 +---+-------------+-------------+-------------+

Output

e.g. I need to find the rows where some column with the given prefix ('col_pattern') has the value 'rscds':

  +---+-------------+
  | id|col_pattern_1|
  +---+-------------|
  | 50|        rscds|
  +---+-------------+

In short: select the columns whose names contain the given word and whose value == the given value.

Thanks

Solution

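The thread did not record a confirmed answer. To also produce the desired output shape (keep `id` plus only the prefixed columns whose value matches), a per-row projection is needed, since the matching columns can differ from row to row. A minimal plain-Python sketch with mock data:

```python
rows = [{'id': '50', 'col_pattern_1': 'rscds',
         'col_pattern_2': 'tyhdvs', 'col_pattern_3': 'tyhdvs'}]
prefix, target = 'col_pattern', 'rscds'

out = []
for r in rows:
    # Keep 'id' plus every prefixed column whose value matches.
    kept = {'id': r['id']}
    kept.update({c: v for c, v in r.items()
                 if c.startswith(prefix) and v == target})
    if len(kept) > 1:          # at least one prefixed column matched
        out.append(kept)
```

For the sample input this keeps `id` and `col_pattern_1` only, matching the desired output table above.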