Problem description
I keep the feature definitions as strings in the dictionary below:
features = {
    "journey_email_been_sent_flag": "F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0))",
    "journey_opened_flag": "F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))",
}
retrieved_features = {}
non_retrieved_features = {}
Or I keep them as the definitions themselves (Column expressions):
from pyspark.sql import functions as F

features = {
    "journey_email_been_sent_flag": F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
    "journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
}
def feature_extract(*featurenames):
    for featurename in featurenames:
        if featurename in features:
            print(f"{featurename} : {features[featurename]}")
            retrieved_features[featurename] = features[featurename]
        else:
            print('failure')
            non_retrieved_features[featurename] = "Not Found in the feature definition"
    return retrieved_features  # a dict {featurename: definition}, not a single Column

feature_extract('journey_email_been_sent_flag', 'journey_opened_flag')
But when I try to retrieve a feature, it does not work. With the definitions stored in the dictionary, I get the following result:
Out[19]: {'journey_email_been_sent_flag': Column<b'CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END'>}
When I then use the retrieved feature on a DataFrame:
.withColumn('journey_email_been_sent_flag',feature_extract('journey_email_been_sent_flag'))
I get the following error:
AssertionError: col should be Column
Solution
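I was able to work around it as follows.
I keep the feature definitions as the definitions themselves (Column expressions):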
features = {
    "journey_email_been_sent_flag": F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
    "journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
}
and then call feature_extract, wrapping the looked-up value in F.lit:
F.lit(feature_extract('journey_email_been_sent_flag').get('journey_email_been_sent_flag'))