How to add incrementing date values relative to the first row's value in a Spark DataFrame

Problem description

Input:

+------+--------+  
|Test  |01-12-20|  
|ravi  |    null|  
|Son   |    null|

Expected output:

+------+--------+  
|Test  |01-12-20|  
|ravi  |02-12-20|  
|Son   |03-12-20|

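I tried .withColumn(col("dated"),date_add(col("dated"),1)); but this makes every value in the column NULL.

Can you help me fill the second (date) column with incrementing values?

Solution

The following should work for you: parse the string into a date, number the rows with a window function, and add each row's number (in days) to the first row's date.

Input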

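from pyspark.sql import SparkSession, Window as W, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("Test", "01-12-20"), ("Ravi", None), ("Son", None)], ["col1", "col2"])
df.show()
# Parse the dd-MM-yy strings into a proper date column
df = df.withColumn("col2", F.to_date(F.col("col2"), "dd-MM-yy"))
# A constant dummy column, so that all rows fall into a single window partition
df = df.withColumn("del_col", F.lit(0))
# Ordering by a constant does not guarantee row order; it happens to follow the input order here
_w = W.partitionBy(F.col("del_col")).orderBy(F.col("del_col").desc())
# Zero-based row number: the number of days to add to the first date
df = df.withColumn("rn_no", F.row_number().over(_w) - 1)
# Repeat the first non-null date on every row
df = df.withColumn("dated", F.first("col2", ignorenulls=True).over(_w))

df = df.selectExpr("*", "date_add(dated, rn_no) as next_date")
df.show()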

Input DataFrame (first df.show())

+----+--------+
|col1|    col2|
+----+--------+
|Test|01-12-20|
|Ravi|    null|
| Son|    null|
+----+--------+

Final output

+----+----------+-------+-----+----------+----------+
|col1|      col2|del_col|rn_no|     dated| next_date|
+----+----------+-------+-----+----------+----------+
|Test|2020-12-01|      0|    0|2020-12-01|2020-12-01|
|Ravi|      null|      0|    1|2020-12-01|2020-12-02|
| Son|      null|      0|    2|2020-12-01|2020-12-03|
+----+----------+-------+-----+----------+----------+
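
As a quick sanity check on the arithmetic, here is a minimal plain-Python sketch (no Spark involved) that applies the same rule to the sample rows: take the first row's date as the base and add each row's zero-based position in days. The helper name fill_incremental_dates is hypothetical and used only for illustration. It assumes the first row always carries the date and that dates use the dd-MM-yy layout, written as "%d-%m-%y" in Python.

```python
from datetime import datetime, timedelta


def fill_incremental_dates(rows, fmt="%d-%m-%y"):
    """Hypothetical helper: give row i the first row's date plus i days."""
    # Parse the first row's date; this plays the role of the "dated" column.
    base = datetime.strptime(rows[0][1], fmt).date()
    # enumerate() supplies the zero-based offset, like row_number() - 1.
    return [
        (name, (base + timedelta(days=offset)).strftime(fmt))
        for offset, (name, _) in enumerate(rows)
    ]


rows = [("Test", "01-12-20"), ("Ravi", None), ("Son", None)]
result = fill_incremental_dates(rows)
print(result)
# [('Test', '01-12-20'), ('Ravi', '02-12-20'), ('Son', '03-12-20')]
```

A Python list keeps its order, so the offsets here are unambiguous. In Spark the row order inside the window depends on the orderBy clause, so on larger or repartitioned data an explicit ordering column would probably be needed to get the same result reliably.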