Problem description
I am trying to select columns whose names contain certain words, using Hive SQL on Databricks.
Based on: HIVE Select columns names with regular expression?
My code:
%py
t = spark.createDataFrame([('50','rscds','tyhdvs'),],['id','col_pattern_1','col_pattern_2'])
t.write.saveAsTable('my_database.my_table')
%sql
set hive.support.quoted.identifiers=none;
select `col_pattern.*`
from my_database.my_table
I got:
Error in sql statement: AnalysisException: cannot resolve '`col_pattern.*`' given input
What I tried:
import pyspark.sql.functions as F
selected = [s for s in t.columns if 'col_pattern' in s]
t.filter(t[x]=='rscds' for x in selected)
and got:
TypeError: condition should be string or Column
Input:
The dataframe may have 20+ columns with the same prefix; I cannot list them in the query one by one, so I need a way to filter the DataFrame by a given value across all columns that share the prefix.
+---+-------------+-------------+-------------+
| id|col_pattern_1|col_pattern_2|col_pattern_3|
+---+-------------+-------------+-------------+
| 50| rscds| tyhdvs| tyhdvs|
+---+-------------+-------------+-------------+
Output:
E.g., I need the rows where some column with the given prefix ('col_pattern') has the value == 'rscds':
+---+-------------+
| id|col_pattern_1|
+---+-------------+
| 50| rscds|
+---+-------------+
In short: select the columns whose names contain a given word and whose value == a given value.
Thanks
Solution
No effective solution has been posted for this question yet.