Problem description
Input:
+------+--------+
|Test  |01-12-20|
|ravi  |    null|
|Son   |    null|
+------+--------+
Expected output:
+------+--------+
|Test  |01-12-20|
|ravi  |02-12-20|
|Son   |03-12-20|
+------+--------+
I tried .withColumn(col("dated"), date_add(col("dated"), 1));
but that leaves every value in the column NULL.
Can you help me fill the date column with incrementing values?
Solution
date_add returns null when its input is null, so applying it directly to the sparse column cannot fill the missing rows. Instead, anchor on the first non-null date and add a row-number offset to it. Here is a solution that should work for you.
Input
from pyspark.sql import SparkSession, functions as F, Window as W

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("Test", "01-12-20"), ("Ravi", None), ("Son", None)], ["col1", "col2"])
df.show()

# parse the dd-MM-yy strings into a proper date column
df = df.withColumn("col2", F.to_date(F.col("col2"), "dd-MM-yy"))
# a dummy constant column so the window spans the whole DataFrame
df = df.withColumn("del_col", F.lit(0))
_w = W.partitionBy(F.col("del_col")).orderBy(F.col("del_col").desc())
# 0-based row number within the window
df = df.withColumn("rn_no", F.row_number().over(_w) - 1)
# broadcast the first (anchor) date to every row
df = df.withColumn("dated", F.first("col2").over(_w))
# add the row offset to the anchor date
df = df.selectExpr("*", "date_add(dated, rn_no) as next_date")
df.show()
Initial DataFrame
+----+--------+
|col1| col2|
+----+--------+
|Test|01-12-20|
|Ravi| null|
| Son| null|
+----+--------+
Final output
+----+----------+-------+-----+----------+----------+
|col1| col2|del_col|rn_no| dated| next_date|
+----+----------+-------+-----+----------+----------+
|Test|2020-12-01| 0| 0|2020-12-01|2020-12-01|
|Ravi| null| 0| 1|2020-12-01|2020-12-02|
| Son| null| 0| 2|2020-12-01|2020-12-03|
+----+----------+-------+-----+----------+----------+
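For readers without a Spark session at hand, here is a minimal pure-Python sketch of the same idea (the data and dd-MM-yy format are taken from the question; the variable names are my own): take the first non-null date as an anchor, then add each row's 0-based index to it, mirroring row_number() - 1 and date_add above.

```python
from datetime import datetime, timedelta

rows = [("Test", "01-12-20"), ("Ravi", None), ("Son", None)]

# anchor on the first row that actually has a date
anchor = next(datetime.strptime(d, "%d-%m-%y") for _, d in rows if d is not None)

# add the row index (the row_number() - 1 analogue) to the anchor date
result = [
    (name, (anchor + timedelta(days=i)).strftime("%Y-%m-%d"))
    for i, (name, _) in enumerate(rows)
]
print(result)
# → [('Test', '2020-12-01'), ('Ravi', '2020-12-02'), ('Son', '2020-12-03')]
```

This matches the next_date column in the final output above.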