Is the Featuretools documentation for specifying primitive options wrong?

Problem description

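The documentation says:

Specifying for Individual Primitives

Options for individual primitives or groups of primitives are set by the primitive_options parameter of DFS. This parameter maps any desired options to specific primitives. In the case of conflicting options, options set at this level will override options set at the entire DFS run level, and the include options will always take priority over their ignore counterparts.

However, I find that this is not true: the ignore options actually take priority over the include options.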

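Below is the setup I will use to demonstrate the claimed behaviour: an entity set with one grandparent (gp), two parents (p1, p2), and one child (c) belonging to one of the parents (p1).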


The code that builds this entity set:

    import pandas as pd
    import featuretools as ft
    from featuretools import variable_types as vt

    # NOTE: several list values in df_p1, df_p2 and df_c were lost when this
    # page was extracted. Values marked "inferred" are filled in from the
    # surrounding pattern and may differ from the original post.
    fmt = "%d-%m-%Y"

    # Creating the relational dataset

    # Grandparent
    df_gp = pd.DataFrame({
        'gp_ind': ['a', 'b'],
        'gp_ncol1': [1, 2],
        'gp_ncol2': [3, 4],
        'gp_ccol1': ['x', 'y'],
        'gp_ccol2': ['p', 'q'],
        'gp_time_col1': pd.to_datetime(['20-01-2020', '20-01-2019'], format=fmt),
        'gp_time_ind': pd.to_datetime(['20-01-2021', '20-01-2020'], format=fmt),
    })

    # Parent 1
    df_p1 = pd.DataFrame({
        'p1_ind': ['a1', 'a2', 'b1'],
        'p1_id': ['a', 'a', 'b'],  # last value inferred
        'p1_ncol1': [1, 2, 3],
        'p1_ncol2': [3, 4, 5],
        'p1_ccol1': ['x', 'y', 'z'],
        'p1_ccol2': ['p', 'q', 'r'],
        'p1_id1': ['t', 't', 'u'],
        'p1_time_col1': pd.to_datetime(['16-01-2020', '11-12-2019', '16-01-2019'], format=fmt),
        'p1_time_ind': pd.to_datetime(['15-01-2021', '10-12-2020', '15-01-2020'], format=fmt),
    })

    # Parent 2
    df_p2 = pd.DataFrame({
        'p2_ind': ['a1_', 'a2_', 'b1_'],
        'p2_id': ['a', 'a', 'b'],        # all but the first value inferred
        'p2_ncol1': [1, 2, 3],           # all but the first value inferred
        'p2_ncol2': [3, 4, 5],           # all but the first value inferred
        'p2_ccol1': ['x', 'y', 'z'],     # all but the first value inferred
        'p2_ccol2': ['p', 'q', 'r'],     # all but the first value inferred
        'p2_time_col1': pd.to_datetime(['18-01-2020', '13-12-2019', '18-01-2019'], format=fmt),
        'p2_time_ind': pd.to_datetime(['17-01-2021', '12-12-2020', '17-01-2020'], format=fmt),
    })

    # Child
    df_c = pd.DataFrame({
        'c_ind': ['a1_1', 'a1_2', 'a2_1', 'a2_2', 'a2_3', 'b1_1'],
        'c_id': ['a1', 'a1', 'a2', 'a2', 'a2', 'b1'],  # last four values inferred
        'c_ncol1': [1, 2, 3, 4, 5, 6],                 # 2 and 4 inferred
        'c_ncol2': [3, 4, 5, 6, 7, 8],                 # 4 and 5 inferred
        'c_ccol1': ['x', 'y', 'z', 'a', 'b', 'c'],     # 'y' and 'a' inferred
        'c_ccol2': ['p', 'q', 'r', 's', 't', 'u'],     # 'q', 't', 'u' inferred
        'c_time_col1': pd.to_datetime(['13-01-2020', '10-12-2019', '8-12-2019',
                                       '5-11-2019', '2-10-2019', '13-01-2019'], format=fmt),
        'c_time_ind': pd.to_datetime(['10-01-2021', '5-12-2020', '9-12-2020',
                                      '6-11-2020', '3-10-2019', '12-01-2020'], format=fmt),
    })

    # Creating the entity set
    es = ft.EntitySet(id='experimentation')

    # Adding entities

    # gp
    vt_gp = {'gp_ind': vt.Index, 'gp_ncol1': vt.Numeric, 'gp_ncol2': vt.Numeric,
             'gp_ccol1': vt.Categorical, 'gp_ccol2': vt.Categorical,
             'gp_time_col1': vt.Datetime, 'gp_time_ind': vt.DatetimeTimeIndex}
    es.entity_from_dataframe(entity_id='gp', dataframe=df_gp, index='gp_ind',
                             variable_types=vt_gp, time_index='gp_time_ind')

    # p1
    vt_p1 = {'p1_ind': vt.Index, 'p1_id': vt.Id, 'p1_id1': vt.Id,
             'p1_ncol1': vt.Numeric, 'p1_ncol2': vt.Numeric,
             'p1_ccol1': vt.Categorical, 'p1_ccol2': vt.Categorical,
             'p1_time_col1': vt.Datetime, 'p1_time_ind': vt.DatetimeTimeIndex}
    es.entity_from_dataframe(entity_id='p1', dataframe=df_p1, index='p1_ind',
                             variable_types=vt_p1, time_index='p1_time_ind')

    # p2
    vt_p2 = {'p2_ind': vt.Index, 'p2_id': vt.Id,
             'p2_ncol1': vt.Numeric, 'p2_ncol2': vt.Numeric,
             'p2_ccol1': vt.Categorical, 'p2_ccol2': vt.Categorical,
             'p2_time_col1': vt.Datetime, 'p2_time_ind': vt.DatetimeTimeIndex}
    es.entity_from_dataframe(entity_id='p2', dataframe=df_p2, index='p2_ind',
                             variable_types=vt_p2, time_index='p2_time_ind')

    # c
    vt_c = {'c_ind': vt.Index, 'c_id': vt.Id,
            'c_ncol1': vt.Numeric, 'c_ncol2': vt.Numeric,
            'c_ccol1': vt.Categorical, 'c_ccol2': vt.Categorical,
            'c_time_col1': vt.Datetime, 'c_time_ind': vt.DatetimeTimeIndex}
    es.entity_from_dataframe(entity_id='c', dataframe=df_c, index='c_ind',
                             variable_types=vt_c, time_index='c_time_ind')

    # Adding relationships (parent variable first, child variable second)
    r_gp_p1 = ft.Relationship(es['gp']['gp_ind'], es['p1']['p1_id'])
    r_gp_p2 = ft.Relationship(es['gp']['gp_ind'], es['p2']['p2_id'])
    r_p1_c = ft.Relationship(es['p1']['p1_ind'], es['c']['c_id'])
    es.add_relationships([r_gp_p1, r_gp_p2, r_p1_c])

    # Create cutoff times
    cutoff_times = df_gp.loc[:, ['gp_ind', 'gp_time_ind']].copy(deep=True)

    # Add interesting values
    es['p1']['p1_ccol1'].interesting_values = es['p1'].df['p1_ccol1'].unique()[0:1]
    es['c']['c_ccol1'].interesting_values = es['c'].df['c_ccol1'].unique()[0:1]

    # Add last time indexes
    es.add_last_time_indexes()

    # Plot the entity set
    es.plot()

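In the primitive_options passed to dfs, I have included p1 in both the ignore_entities and include_entities keys. In this way, I am giving dfs conflicting commands about whether to include the entity p1 in the feature creation process.

Expected behaviour: include_entities overrides ignore_entities, and features on the variables of entity p1 are generated.

Behaviour seen: ignore_entities overrides include_entities, and no features on the variables of p1 are generated.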
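To make the two competing rules concrete, here is a minimal plain-Python sketch. It does not use Featuretools and is not how the library is implemented; `resolve_entities` and its `include_wins` flag are hypothetical names introduced only to contrast the documented rule with the behaviour I observe.

```python
def resolve_entities(all_entities, include=None, ignore=None, include_wins=True):
    """Hypothetical helper: list the entities a primitive may be applied to.

    include_wins=True  models the rule as documented (include beats ignore).
    include_wins=False models the behaviour observed (ignore beats include).
    """
    include = set(include or [])
    ignore = set(ignore or [])
    if include_wins:
        # A non-empty include list is the final word; ignore is only a fallback.
        if include:
            return [e for e in all_entities if e in include]
        return [e for e in all_entities if e not in ignore]
    # Otherwise the ignore list is applied last, so it removes included entities too.
    allowed = [e for e in all_entities if not include or e in include]
    return [e for e in allowed if e not in ignore]


entities = ['gp', 'p1', 'p2', 'c']
documented = resolve_entities(entities, include=['p1'], ignore=['p1'], include_wins=True)
observed = resolve_entities(entities, include=['p1'], ignore=['p1'], include_wins=False)
print(documented)  # ['p1']
print(observed)    # []
```

Under the documented rule the primitive would still be applied to p1; under the observed rule it is left with no entity at all.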


The dfs call I run on this entity set:

    agg_primitives = ['sum']
    where_primitives = ['sum']

    # Deliberately list p1 under both keys to create the conflict
    primitive_options = {}
    primitive_options[('sum',)] = {}
    primitive_options[('sum',)]['ignore_entities'] = ['p1']
    primitive_options[('sum',)]['include_entities'] = ['p1']

    features = ft.dfs(entityset=es,
                      target_entity='gp',
                      cutoff_time=cutoff_times,
                      agg_primitives=agg_primitives,
                      where_primitives=where_primitives,
                      trans_primitives=[],
                      primitive_options=primitive_options,
                      max_depth=2,
                      features_only=True)
    features

Output:

    [<Feature: gp_ncol1>, <Feature: gp_ncol2>, <Feature: gp_ccol1>, <Feature: gp_ccol2>]

There are no features on p1, which is contrary to what the documentation states.

Am I missing something here, or is the documentation actually wrong, so that I should understand that ignore_entities overrides include_entities?

Solution

This is a bug. You can track the proposed fix here: https://github.com/alteryx/featuretools/pull/1518
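Until a release containing the fix is installed, one possible workaround is to avoid passing the conflict at all. The sketch below is plain Python: `drop_conflicting_ignores` is a hypothetical helper, not part of Featuretools, and it assumes each value in primitive_options is a single dict that uses the ignore_entities and include_entities keys from the question.

```python
import copy


def drop_conflicting_ignores(primitive_options):
    """Hypothetical helper: return a copy in which any entity listed under both
    keys is removed from 'ignore_entities', so the include list wins."""
    cleaned = copy.deepcopy(primitive_options)
    for opts in cleaned.values():
        include = set(opts.get('include_entities', []))
        if not include or 'ignore_entities' not in opts:
            continue
        remaining = [e for e in opts['ignore_entities'] if e not in include]
        if remaining:
            opts['ignore_entities'] = remaining
        else:
            # Nothing left to ignore, so drop the key entirely.
            del opts['ignore_entities']
    return cleaned


primitive_options = {('sum',): {'ignore_entities': ['p1'], 'include_entities': ['p1']}}
cleaned = drop_conflicting_ignores(primitive_options)
print(cleaned)  # {('sum',): {'include_entities': ['p1']}}
```

The cleaned dictionary could then be passed as primitive_options in place of the original one. I have not verified the resulting feature list against a specific Featuretools version.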
