Problem description
I am trying to create a custom rolling-sum primitive with featuretools. Here is the code:
class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.

    Description:
        Given a list of values and a Datetime time index, return the rolling sum.
    """
    name = "rolling_sum_on_datetime"
    input_types = [Numeric, DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self, window=None, on=None):
        self.window = window
        self.on = on

    def get_function(self):
        def rolling_sum(to_roll, on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has both columns in it
            df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
            rolled_df = df.rolling(window=self.window, on=on_column.name).sum()
            return rolled_df[to_roll.name]
        return rolling_sum

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    n_jobs=10,
    target_entity="contracts",
    agg_primitives=agg_prim,
    trans_primitives=trans_prim,
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D", on=es["days"]["datetime"])
    ],
    max_depth=2,
    drop_contains=["contract_id", "merchant_id"],
)
The first part of the code is the custom primitive, and the second part is where I call the function. It raises this error:
ValueError: setting an array element with a sequence.
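Solution
You need to remove on=es["days"]["datetime"] when you pass the primitive to groupby_trans_primitives. It does not belong among the __init__ parameters of RollingSumOnDatetime, so it does not apply here: rolling_sum never reads self.on and takes the time index from its second input, on_column.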
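To see why no on argument is needed, the inner function can be exercised on its own with plain pandas. The following is a minimal sketch, not part of featuretools: the sample values and the standalone rolling_sum signature (with window passed as an argument instead of read from self) are assumptions made for illustration.

```python
import pandas as pd


def rolling_sum(to_roll, on_column, window="5D"):
    """Time-based rolling sum of `to_roll`, using `on_column` as the time axis."""
    # Put both series in one DataFrame so pandas can roll "on" the datetime column.
    df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
    rolled_df = df.rolling(window=window, on=on_column.name).sum()
    return rolled_df[to_roll.name]


# Made-up sample data; the datetime column must be sorted for a time-based window.
amount = pd.Series([10.0, 20.0, 30.0, 40.0], name="amount")
transaction_time = pd.Series(
    pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-07", "2021-01-20"]),
    name="transaction_time",
)

result = rolling_sum(amount, transaction_time)
# Each value sums the amounts from the 5 days up to and including that row:
# 10.0, 30.0, 50.0, 40.0
print(result.tolist())
```

The name of the time column comes from on_column.name, which is why the primitive only needs window as configuration.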
Here is a minimal, reproducible example:
import pandas as pd
import featuretools as ft
from featuretools.primitives import TransformPrimitive
from featuretools.variable_types import Numeric, DatetimeTimeIndex


class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.

    Description:
        Given a list of values and a Datetime time index, return the rolling sum.
    """
    name = "rolling_sum_on_datetime"
    input_types = [Numeric, DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self, window=None):
        # only the window size is configurable; the columns come from input_types
        self.window = window

    def get_function(self):
        def rolling_sum(to_roll, on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has both columns in it
            df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
            rolled_df = df.rolling(window=self.window, on=on_column.name).sum()
            return rolled_df[to_roll.name]
        return rolling_sum


es = ft.demo.load_mock_customer(return_entityset=True)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="transactions",
    agg_primitives=[],
    trans_primitives=[],
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D")
    ],
)
feature_defs
If we print feature_defs, we get:
[<Feature: session_id>,
 <Feature: amount>,
 <Feature: product_id>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount,transaction_time,window=5D) by product_id>,
 window=5D) by session_id>,
 <Feature: products.brand>,
 <Feature: sessions.customer_id>,
 <Feature: sessions.device>,
 <Feature: sessions.customers.zip_code>,
 sessions.session_start,window=5D) by sessions.customer_id>]
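The "by product_id" and "by session_id" suffixes indicate that the rolling sum is computed separately within each group. As a rough illustration of that idea in plain pandas (a sketch with made-up data, not a description of how featuretools implements it internally):

```python
import pandas as pd

# Hypothetical transactions table with two sessions interleaved in time.
df = pd.DataFrame({
    "session_id": [1, 2, 1, 2],
    "transaction_time": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]
    ),
    "amount": [10.0, 100.0, 20.0, 200.0],
})

pieces = []
for _, group in df.groupby("session_id"):
    # Roll only over this group's rows, using the datetime column as the time axis.
    rolled = (
        group[["transaction_time", "amount"]]
        .rolling(window="5D", on="transaction_time")
        .sum()["amount"]
    )
    pieces.append(rolled)

# Stitch the per-group results back together in the original row order.
by_session = pd.concat(pieces).sort_index()
print(by_session.tolist())  # [10.0, 100.0, 30.0, 300.0]
```

Each row only accumulates amounts from its own session, so the two sessions do not leak into each other's sums.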