How do I define a custom primitive that takes two columns in Featuretools?

Problem description

I created the following custom primitive.

class Correlate(TransformPrimitive):
    name = 'correlate'
    input_types = [Numeric, Numeric]
    return_type = Numeric
    commutative = True
    compatibility = [Library.PANDAS, Library.DASK, Library.KOALAS]

    def get_function(self):
        def correlate(column1, column2):
            return np.correlate(column1, column2, "same")

        return correlate

Then, just to be sure, I checked the result against the direct calculation below.

np.correlate(feature_matrix["alcohol"],feature_matrix["chlorides"],mode="same")

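However, the result from the custom primitive and the result from this direct np.correlate call are different.

Do you know why they differ?

If my code is fundamentally wrong, please correct me.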

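Solution

Thanks for the question! You can create a custom primitive with fixed arguments to calculate this correlation by using TransformPrimitive as the base class. I'll go through an example using this data.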

import pandas as pd

data = [
    [0.40168819, 0.0857946],
    [0.06268886, 0.27811651],
    [0.16931269, 0.96509497],
    [0.15123022, 0.80546244],
    [0.58610794, 0.56928692],
]

df = pd.DataFrame(data=data,columns=list('ab'))
df.reset_index(inplace=True)
df

index         a         b
    0  0.401688  0.085795
    1  0.062689  0.278117
    2  0.169313  0.965095
    3  0.151230  0.805462
    4  0.586108  0.569287

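With mode='same', np.correlate returns one value per input row when the two columns have equal length, so it behaves as a transform. Define the custom primitive with TransformPrimitive as the base class.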

from featuretools.primitives import TransformPrimitive
from featuretools.variable_types import Numeric
import numpy as np


class Correlate(TransformPrimitive):
    name = 'correlate'
    input_types = [Numeric,Numeric]
    return_type = Numeric

    def get_function(self):
        def correlate(a,b):
            return np.correlate(a,b,mode='same')

        return correlate
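
As a small NumPy-only illustration of why mode='same' suits a transform primitive, the sketch below compares the output length of the three np.correlate modes. The arrays x and y are made-up values for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical inputs, used only to compare output lengths.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.5])

full = np.correlate(x, y, mode="full")    # every overlap: length 2 * 3 - 1 = 5
same = np.correlate(x, y, mode="same")    # centered slice of "full": length 3
valid = np.correlate(x, y, mode="valid")  # complete overlaps only: length 1

print(full.shape, same.shape, valid.shape)
```

Only mode='same' yields an array aligned one-to-one with the input rows, which is what a transform primitive is expected to return.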

The DFS call requires the data to be structured as an EntitySet. Once it is, you can pass in the custom primitive.

import featuretools as ft

es = ft.EntitySet()

es.entity_from_dataframe(
    entity_id='data',
    dataframe=df,
    index='index',
)

fm, fd = ft.dfs(
    entityset=es,
    target_entity='data',
    trans_primitives=[Correlate],
    max_depth=1,
)

fm[['CORRELATE(a,b)']]

       CORRELATE(a,b)
index                 
0             0.534548
1             0.394685
2             0.670774
3             0.670506
4             0.622236

You should get the same values from the feature matrix and from np.correlate.

actual = fm['CORRELATE(a,b)'].values
expected = np.correlate(df['a'],df['b'],mode='same')
np.testing.assert_array_equal(actual,expected)
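
If the two results still disagree in your own data, two things may be worth checking. The primitive in the question sets commutative = True, while the one in this answer does not, and np.correlate also depends on row order. The thread does not confirm either as the cause, so treat the following NumPy-only sketch as a diagnostic aid under that assumption. It reuses the a and b values from the example DataFrame; the permutation is arbitrary.

```python
import numpy as np

# Columns 'a' and 'b' from the example DataFrame.
a = np.array([0.40168819, 0.06268886, 0.16931269, 0.15123022, 0.58610794])
b = np.array([0.0857946, 0.27811651, 0.96509497, 0.80546244, 0.56928692])

ab = np.correlate(a, b, mode="same")
ba = np.correlate(b, a, mode="same")

# Swapping the arguments changes the result. For real inputs of equal,
# odd length, the swapped result is the original one reversed.
arg_order_irrelevant = np.allclose(ab, ba)
swapped_is_reversed = np.allclose(ab, ba[::-1])

# Reordering the rows of both columns in the same way also changes the
# values, not just their positions in the output.
order = [2, 0, 4, 1, 3]  # arbitrary permutation, for illustration only
shuffled = np.correlate(a[order], b[order], mode="same")
row_order_irrelevant = np.allclose(shuffled, ab[order])

print(arg_order_irrelevant, swapped_is_reversed, row_order_irrelevant)
```

This prints False True False. Cross-correlation is therefore not commutative in the way that addition is, and comparing against np.correlate only makes sense when the columns are passed in the same argument order and the rows are in the same order as in the feature matrix.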

You can learn more about defining simple custom primitives and advanced custom primitives in the linked pages. Let me know if you found this helpful.

featuretools python
