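Problem description
I am trying to remove the whitespace inside the quotes, but I am not getting the correct result. Could you help me with how to do this?
Example: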
Local_Manufacturer|SKU_PackID_ProductNumber|Molecule_Name|BrandName_Intl
"UPJOHN "|"894265"|"SILDENAFIL"|"REVATIO"
Desired output:
Local_Manufacturer|SKU_PackID_ProductNumber|Molecule_Name|BrandName_Intl
"UPJOHN"|"894265"|"SILDENAFIL"|"REVATIO"
I tried the following code:
for c_name in df1.columns:
    df1 = df1.withColumn(c_name, trim(df1[c_name]))
Solution
Import the trim function, here through the pyspark.sql.functions module aliased as f.
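import pyspark.sql.functions as f

# Trim leading and trailing whitespace in every column
for c_name in df1.columns:
    df1 = df1.withColumn(c_name, f.trim(df1[c_name]))

# Collect the rows to the driver and print them
df_list = df1.collect()
print(df_list)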
[Row(Local_Manufacturer='UPJOHN',SKU_PackID_ProductNumber='894265',Molecule_Name='SILDENAFIL',BrandName_Intl='REVATIO')]
The values are now trimmed.
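One caveat: trim only strips whitespace at the two ends of the whole string. If the double quotes are part of the stored value (for example, if the file was read without quote handling, which is an assumption and not something the question states), the space in "UPJOHN " sits inside the quotes and trim would leave it alone. Below is a minimal plain-Python sketch of one way to handle that case. The helper names strip_inside_quotes and clean_line are hypothetical and are not part of PySpark.

```python
import re

# Hypothetical pattern: a value wrapped in double quotes, with optional
# whitespace directly after the opening quote and before the closing quote.
_QUOTED = re.compile(r'^"\s*(.*?)\s*"$')


def strip_inside_quotes(value: str) -> str:
    """Remove whitespace just inside a pair of surrounding double quotes.

    Values that are not wrapped in double quotes are returned unchanged.
    """
    return _QUOTED.sub(r'"\1"', value)


def clean_line(line: str, sep: str = "|") -> str:
    """Apply strip_inside_quotes to each field of a delimited line."""
    return sep.join(strip_inside_quotes(field) for field in line.split(sep))


print(clean_line('"UPJOHN "|"894265"|"SILDENAFIL"|"REVATIO"'))
# prints "UPJOHN"|"894265"|"SILDENAFIL"|"REVATIO"
```

In PySpark, a similar pattern could presumably be applied per column with regexp_replace from pyspark.sql.functions. That variant is not shown in the original answer and is untested here.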