How do I perform nested if/otherwise logic in PySpark?

Problem Description

Hi everyone, I am trying to interpret this Power BI syntax and convert it to PySpark:

 if(UCS_Incidents[Intensity]="Very High",IF(UCS_Incidents[Severity]="Very High","Red",IF(UCS_Incidents[Severity]="High",IF(UCS_Incidents[Severity]="Medium","Orange","Yellow"))),if(UCS_Incidents[Intensity]="High",if(UCS_Incidents[Intensity]="Medium","Yellow","Green"))),if(UCS_Incidents[Intensity]="Low","Green",""))))

This is what I tried:

 Intensities = df.withColumn(('Intensities',f.when((f.col('Intensity') == 'Very High') & (f.col('Severity') == 'Very High'),"Red").
                        otherwise(f.when((f.col('Intensity') == 'Very High') & (f.col('Severity') == 'High'),"Red").
                        otherwise(f.when((f.col('Intensity') == 'Very High') & (f.col('Severity') == 'Medium'),"Orange")
                        .otherwise('Yellow'))))
                        .otherwise(f.when((f.col('Intensity') == 'High') & (f.col('Severity') == 'Very High'),"Red").
                        otherwise(f.when((f.col('Intensity') == 'High') & (f.col('Severity') == 'High'),"Orange").
                        otherwise(f.when((f.col('Intensity') == 'High') & (f.col('Severity') == 'Medium'),"Orange")
                        .otherwise('Yellow'))))
                        .otherwise(f.when((f.col('Intensity') == 'Medium') & (f.col('Severity') == 'Very High'),"Orange").
                        otherwise(f.when((f.col('Intensity') == 'Medium') & (f.col('Severity') == 'High'),"Yellow").
                        otherwise(f.when((f.col('Intensity') == 'Medium') & (f.col('Severity') == 'Medium'),"Yellow")
                        .otherwise('Green'))))
                        .otherwise(f.when((f.col('Intensity') == 'Low') & (f.col('Severity') == 'Very High'),"Yellow").
                        otherwise(f.when((f.col('Intensity') == 'Low') & (f.col('Severity') == 'High'),"Green").
                        otherwise(f.when((f.col('Intensity') == 'Low') & (f.col('Severity') == 'Medium'),"Green")
                        .otherwise('Green'))))

                        ).otherwise("")

However, I get this error:

  A Tuple Object dosen't have an attribute Otherwise

Any help would be greatly appreciated, thanks.

Solution

Just to illustrate what @jxc meant: assuming you already have a DataFrame named df:

from pyspark.sql.functions import expr

Intensities = df.withColumn('Intensities',expr("CASE WHEN Intensity = 'Very High' AND Severity = 'Very High' THEN 'Red' WHEN .... ELSE ... END"))

I left "..." as placeholders, but I think it makes the approach clear.