Creating a PySpark DataFrame from a Python dictionary with special characters

Problem description

I have a Python list of dictionaries like this:

data = [{"cust_decision": "buy","cust_details": "Easy to use"},{"cust_decision": "buy","cust_details": "econoimical"},{"cust_decision":"no buy","cust_details": "Didn’t like Product"}]

I am creating a PySpark DataFrame and a temp view from this data:

from pyspark.sql import SparkSession,Row
spark = SparkSession.builder.getOrCreate()  # already defined in the pyspark shell
spark.createDataFrame([Row(**i) for i in data]).createOrReplaceTempView("cust")

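Now, when I look at the data in this temp view, the special character ’ (this is not a plain single quote; it is the curly right quotation mark) has turned into another character, â. Here is the result: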

spark.table("cust").show(10,False)
+-------------+---------------------+                                           
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didn’t like Product|
+-------------+---------------------+ 

But I want the character to appear unchanged in each value. How can I achieve this? The expected result is:

+-------------+---------------------+                                           
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didn’t like Product  |
+-------------+---------------------+ 

Thanks.

Solution

Try decoding the string values in your data dictionary as UTF-8 before you create the DataFrame.
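
A minimal sketch of that suggestion, assuming the values reach Spark as UTF-8 encoded byte strings (which is what a plain "..." literal is in Python 2). The helper name decode_record is hypothetical, and the byte literals below only stand in for the original data so the example runs without Spark:

```python
# -*- coding: utf-8 -*-

def decode_record(record, encoding="utf-8"):
    """Return a copy of `record` with byte-string values decoded to text."""
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in record.items()
    }

# Byte strings simulate undecoded input; b"\xe2\x80\x99" is the UTF-8
# encoding of the curly quote (U+2019).
raw = [
    {"cust_decision": b"buy", "cust_details": b"Easy to use"},
    {"cust_decision": b"no buy", "cust_details": b"Didn\xe2\x80\x99t like Product"},
]

decoded = [decode_record(r) for r in raw]

# Reading the same bytes with a single-byte encoding gives the garbled
# text, which begins with the character from the question.
garbled = raw[1]["cust_details"].decode("latin-1")

# With an active SparkSession named `spark` (assumed, not created here):
# spark.createDataFrame([Row(**r) for r in decoded]).createOrReplaceTempView("cust")
```

If the values are already text (for example ordinary str literals in Python 3), decode_record leaves them unchanged, so the garbling would have to come from somewhere else, such as the encoding of the terminal or the source file.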