Problem Description
The sample code is shown below:
accounting.groupBy("department","cityCode","accountNumber","siret").agg(...);
simpleData = (("James","Sales",3000), \
    ("Michael","Sales",4600), \
    ("Robert","Sales",4100), \
    ("Maria","Finance",3000), \
    ("James","Sales",3000), \
    ("Scott","Finance",3300), \
    ("Jen","Finance",3900), \
    ("Jeff","Marketing",3000), \
    ("Kumar","Marketing",2000), \
    ("Saif","Sales",4100) \
  )
columns= ["employee_name","department","salary"]
df = spark.createDataFrame(data = simpleData,schema = columns)
from pyspark.sql import Window
from pyspark.sql.functions import col,avg,sum,min,max,row_number

windowSpec = Window.partitionBy("department").orderBy("salary")
windowSpecAgg = Window.partitionBy("department")
df.withColumn("row",row_number().over(windowSpec)) \
.withColumn("avg",avg(col("salary")).over(windowSpecAgg)) \
.withColumn("sum",sum(col("salary")).over(windowSpecAgg)) \
.withColumn("min",min(col("salary")).over(windowSpecAgg)) \
.withColumn("max",max(col("salary")).over(windowSpecAgg)) \
.where(col("row")==1).select("department","avg","sum","min","max") \
.show()
As you can see, I call withColumn with a window function 4 times. I'd like to know whether this triggers 4 shuffle operations, or only 1, since they are all computed over the same window partitioning.