Featuretools - RollingSum

Problem description

I am trying to create a custom rolling-sum primitive with featuretools. Here is the code:

class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.
    Description:
        Given a list of values and a Datetime time index,return the rolling sum.
    """

    name = "rolling_sum_on_datetime"
    input_types = [Numeric,DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"

    def __init__(self,window=None,on=None):
        self.window = window
        self.on = on


    def get_function(self):
        def rolling_sum(to_roll,on_column):
            """method is passed a pandas series"""
            # create a DataFrame that has the both columns in it
            df = pd.DataFrame({to_roll.name: to_roll,on_column.name: on_column})
            rolled_df = df.rolling(window=self.window,on=on_column.name).sum()
            return rolled_df[to_roll.name]

        return rolling_sum


feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    n_jobs=10,
    target_entity="contracts",
    agg_primitives=agg_prim,
    trans_primitives=trans_prim,
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D", on=es["days"]["datetime"])
    ],
    max_depth=2,
    drop_contains=["contract_id", "merchant_id"],
)

The first part of the code is the custom primitive, and in the second part I call ft.dfs. The call raises this error:

ValueError: setting an array element with a sequence.

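Solution

When you pass the primitive to groupby_trans_primitives, you need to remove on=es["days"]["datetime"]. RollingSumOnDatetime does not need it as an __init__ argument: the time index column already reaches the function as its second input (the DatetimeTimeIndex entry in input_types), so the argument does not apply.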
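To see why `on` is unnecessary, the inner function can be exercised with plain pandas, outside featuretools. The sketch below copies `rolling_sum` from the primitive and takes `window` as an ordinary argument. The sample values and the column names `amount` and `transaction_time` are made up for illustration.

```python
import pandas as pd


def rolling_sum(to_roll, on_column, window="5D"):
    """Rolling sum of `to_roll` over a time window measured on `on_column`."""
    # Put both Series in one DataFrame so rolling() can find the datetime column
    df = pd.DataFrame({to_roll.name: to_roll, on_column.name: on_column})
    rolled_df = df.rolling(window=window, on=on_column.name).sum()
    return rolled_df[to_roll.name]


# Hypothetical sample data; the datetime column must be sorted
times = pd.Series(
    pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-07", "2021-01-08"]),
    name="transaction_time",
)
amounts = pd.Series([10.0, 20.0, 30.0, 40.0], name="amount")

result = rolling_sum(amounts, times, window="5D")
print(result.tolist())  # [10.0, 30.0, 50.0, 70.0]
```

The datetime column is identified by the name of the second Series, which is all that `df.rolling(..., on=...)` needs, so no separate `on` argument has to be stored on the primitive.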

Here is a minimal, reproducible example:

import pandas as pd
from featuretools.primitives import TransformPrimitive
from featuretools.variable_types import Numeric, DatetimeTimeIndex

class RollingSumOnDatetime(TransformPrimitive):
    """Calculates the rolling sum on a Datetime time index column.
    Description:
        Given a list of values and a Datetime time index,return the rolling sum.
    """
    name = "rolling_sum_on_datetime"
    input_types = [Numeric,DatetimeTimeIndex]
    return_type = Numeric
    uses_full_entity = True
    description_template = "the rolling sum of {} on {}"
    def __init__(self,window=None):
        self.window = window

    def get_function(self):
        def rolling_sum(to_roll,on_column):
            """method is passed a pandas series"""
            #create a DataFrame that has the both columns in it
            df = pd.DataFrame({to_roll.name:to_roll,on_column.name:on_column})
            rolled_df = df.rolling(window=self.window,on=on_column.name).sum()
            return rolled_df[to_roll.name]
        return rolling_sum

import featuretools as ft 

es = ft.demo.load_mock_customer(return_entityset=True)

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="transactions",
    agg_primitives=[],
    trans_primitives=[],
    groupby_trans_primitives=[
        RollingSumOnDatetime(window="5D")
    ],
)
feature_defs

If we print feature_defs, we get:

[<Feature: session_id>,
 <Feature: amount>,
 <Feature: product_id>,
 <Feature: ROLLING_SUM_ON_DATETIME(amount,transaction_time,window=5D) by product_id>,
 ...,window=5D) by session_id>,
 <Feature: products.brand>,
 <Feature: sessions.customer_id>,
 <Feature: sessions.device>,
 <Feature: sessions.customers.zip_code>,
 ...sessions.session_start,window=5D) by sessions.customer_id>]
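The "by product_id" suffix in the feature names indicates that the rolling sum is computed separately within each group. A rough pandas equivalent is sketched below. It is an illustration of the idea, not featuretools' internal implementation, and the data frame contents are hypothetical.

```python
import pandas as pd

# Hypothetical transactions, already sorted by time
df = pd.DataFrame({
    "product_id": [1, 2, 1, 2],
    "transaction_time": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-09"]
    ),
    "amount": [10.0, 20.0, 30.0, 40.0],
})

pieces = []
for _, group in df.groupby("product_id"):
    # Roll only over the rows that belong to this product
    rolled = (
        group[["transaction_time", "amount"]]
        .rolling(window="5D", on="transaction_time")
        .sum()
    )
    pieces.append(rolled["amount"])

# Restore the original row order
by_product = pd.concat(pieces).sort_index()
print(by_product.tolist())  # [10.0, 20.0, 40.0, 40.0]
```

The last row stays at 40.0 because the previous transaction for product 2 falls outside its 5-day window.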
