Problem description
The documentation section "Specifying for Individual Primitives" states:
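"Options for individual primitives or groups of primitives are set by the primitive_options parameter of DFS. This parameter maps any desired options to specific primitives. In the case of conflicting options, options set at this level will override options set at the entire DFS run level, and the include options will always take priority over their ignore counterparts."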
However, I found that this is not true: the ignore options actually take priority over the include options.
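To make the documented rule concrete, here is a minimal sketch of the precedence it describes. documented_allowed is a hypothetical helper written only for illustration, not part of featuretools; it treats entities as plain string ids and assumes that, when an include list is given, that list alone decides which entities a primitive may use.

```python
from typing import Iterable, List, Optional


def documented_allowed(candidates: Iterable[str],
                       include: Optional[List[str]] = None,
                       ignore: Optional[List[str]] = None) -> List[str]:
    """Hypothetical: entities a primitive may use if include beats ignore."""
    ignore = ignore or []
    allowed = []
    for entity in candidates:
        if include is not None:
            # An include list decides on its own; the ignore list
            # cannot remove an entity that was explicitly included.
            if entity in include:
                allowed.append(entity)
        elif entity not in ignore:
            # No include list: fall back to filtering out ignored entities.
            allowed.append(entity)
    return allowed


# Entities other than the target 'gp' in the example entityset.
candidates = ['p1', 'p2', 'c']
print(documented_allowed(candidates, include=['p1'], ignore=['p1']))  # ['p1']
```

Under this reading, listing p1 in both keys should still leave p1 available to sum.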
Below is the setup I will use to demonstrate the claimed behavior. It is an entityset with a grandparent (gp), two parents (p1, p2), and a child (c) of one parent (p1):
import pandas as pd
import featuretools as ft
from featuretools import variable_types as vt
# # Creating Relational Dataset
# ## Grand Parent
df_gp = pd.DataFrame({'gp_ind':['a','b'],'gp_ncol1':[1,2],'gp_ncol2':[3,4],'gp_ccol1':['x','y'],'gp_ccol2':['p','q'],'gp_time_col1':pd.to_datetime(['20-01-2020','20-01-2019'],format="%d-%m-%Y"),'gp_time_ind':pd.to_datetime(['20-01-2021','20-01-2020'],format="%d-%m-%Y")})
# ## Parent 1
df_p1 = pd.DataFrame({'p1_ind':['a1','a2','b1'],'p1_id':['a','a','b'],'p1_ncol1':[1,2,3],'p1_ncol2':[3,4,5],'p1_ccol1':['x','y','z'],'p1_ccol2':['p','q','r'],'p1_id1':['t','t','u'],'p1_time_col1':pd.to_datetime(['16-01-2020','11-12-2019','16-01-2019'],format="%d-%m-%Y"),'p1_time_ind':pd.to_datetime(['15-01-2021','10-12-2020','15-01-2020'],format="%d-%m-%Y")})
# ## Parent 2
df_p2 = pd.DataFrame({'p2_ind':['a1_','a2_','b1_'],'p2_id':['a','a','b'],'p2_ncol1':[1,2,3],'p2_ncol2':[3,4,5],'p2_ccol1':['x','y','z'],'p2_ccol2':['p','q','r'],'p2_time_col1':pd.to_datetime(['18-01-2020','13-12-2019','18-01-2019'],format="%d-%m-%Y"),'p2_time_ind':pd.to_datetime(['17-01-2021','12-12-2020','17-01-2020'],format="%d-%m-%Y")})
# ## Child
df_c = pd.DataFrame({'c_ind':['a1_1','a1_2','a2_1','a2_2','a2_3','b1_1'],'c_id':['a1','a1','a2','a2','a2','b1'],'c_ncol1':[1,2,3,4,5,6],'c_ncol2':[3,4,5,6,7,8],'c_ccol1':['x','y','z','a','b','c'],'c_ccol2':['p','q','r','s','t','u'],'c_time_col1':pd.to_datetime(['13-01-2020','10-12-2019','8-12-2019','5-11-2019','2-10-2019','13-01-2019'],format="%d-%m-%Y"),'c_time_ind':pd.to_datetime(['10-01-2021','5-12-2020','9-12-2020','6-11-2020','3-10-2019','12-01-2020'],format="%d-%m-%Y")})
# # Creating Entityset
es = ft.EntitySet(id='experimentation')
# ## Adding entities
# ### Adding gp
vt_gp = {'gp_ind':vt.Index,'gp_ncol1':vt.Numeric,'gp_ncol2':vt.Numeric,'gp_ccol1':vt.Categorical,'gp_ccol2':vt.Categorical,'gp_time_col1':vt.Datetime,'gp_time_ind':vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='gp',dataframe=df_gp,index='gp_ind',variable_types=vt_gp,time_index='gp_time_ind')
# ### Adding p1
vt_p1 = {'p1_ind':vt.Index,'p1_id':vt.Id,'p1_id1' : vt.Id,'p1_ncol1':vt.Numeric,'p1_ncol2':vt.Numeric,'p1_ccol1':vt.Categorical,'p1_ccol2':vt.Categorical,'p1_time_col1':vt.Datetime,'p1_time_ind':vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='p1',dataframe=df_p1,index='p1_ind',variable_types=vt_p1,time_index='p1_time_ind')
# ### Adding p2
vt_p2 = {'p2_ind':vt.Index,'p2_id':vt.Id,'p2_ncol1':vt.Numeric,'p2_ncol2':vt.Numeric,'p2_ccol1':vt.Categorical,'p2_ccol2':vt.Categorical,'p2_time_col1':vt.Datetime,'p2_time_ind':vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='p2',dataframe=df_p2,index='p2_ind',variable_types=vt_p2,time_index='p2_time_ind')
# ### Adding c
vt_c = {'c_ind':vt.Index,'c_id':vt.Id,'c_ncol1':vt.Numeric,'c_ncol2':vt.Numeric,'c_ccol1':vt.Categorical,'c_ccol2':vt.Categorical,'c_time_col1':vt.Datetime,'c_time_ind':vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='c',dataframe=df_c,index='c_ind',variable_types=vt_c,time_index='c_time_ind')
# ## Adding Relationships
r_gp_p1 = ft.Relationship(es['gp']['gp_ind'],es['p1']['p1_id'])
r_gp_p2 = ft.Relationship(es['gp']['gp_ind'],es['p2']['p2_id'])
r_p1_c = ft.Relationship(es['p1']['p1_ind'],es['c']['c_id'])
es.add_relationships([r_gp_p1,r_gp_p2,r_p1_c])
# ## Create Cutoff Times
cutoff_times = df_gp.loc[:,['gp_ind','gp_time_ind']].copy(deep=True)
# ## add interesting values
es['p1']['p1_ccol1'].interesting_values = es['p1'].df['p1_ccol1'].unique()[0:1]
es['c']['c_ccol1'].interesting_values = es['c'].df['c_ccol1'].unique()[0:1]
# ## Add last time index
es.add_last_time_indexes()
# ## Plotting entityset
es.plot()
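The relationships above form a small tree: gp is the parent of p1 and p2, and p1 is the parent of c. As a rough, hypothetical model (plain Python, not the featuretools API), the sketch below walks that tree breadth-first from the target entity gp and records how many relationship hops away each entity is. This approximates which entities an aggregation such as sum could draw on with max_depth=2; featuretools counts depth by stacked primitives, so treat it only as an approximation.

```python
from collections import deque
from typing import Dict, List

# Hypothetical parent -> children map mirroring r_gp_p1, r_gp_p2 and r_p1_c.
CHILDREN: Dict[str, List[str]] = {
    'gp': ['p1', 'p2'],
    'p1': ['c'],
    'p2': [],
    'c': [],
}


def reachable_children(target: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first walk down child relationships, recording each entity's hop count."""
    depths: Dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend past the depth limit
        for child in CHILDREN[entity]:
            if child not in depths:
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


print(reachable_children('gp', max_depth=2))  # {'p1': 1, 'p2': 1, 'c': 2}
```

So, without any primitive options, sum would have p1, p2 and (through p1) c to aggregate over.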
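I put p1 in both the ignore_entities and include_entities keys of the sum options. This gives dfs conflicting instructions about whether the p1 entity should be used during feature creation.
Expected behavior: include_entities overrides ignore_entities, and variables on entity p1 are used.
Observed behavior: ignore_entities overrides include_entities, and no features are generated on p1.
Here is the code I run on this entityset:
agg_primitives = ['sum']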
where_primitives = ['sum']
primitive_options = {}
primitive_options[('sum',)] = {}
# Conflicting options for sum: p1 is both ignored and included
primitive_options[('sum',)]['ignore_entities'] = ['p1']
primitive_options[('sum',)]['include_entities'] = ['p1']
features = ft.dfs(entityset=es,
                  target_entity='gp',
                  cutoff_time=cutoff_times,
                  agg_primitives=agg_primitives,
                  where_primitives=where_primitives,
                  trans_primitives=[],
                  primitive_options=primitive_options,
                  max_depth=2,
                  features_only=True)
features
Output:
[<Feature: gp_ncol1>,<Feature: gp_ncol2>,<Feature: gp_ccol1>,<Feature: gp_ccol2>]
There are no features on p1, contrary to what the documentation states.
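One reading that is consistent with this output: both filters are applied, so include_entities first narrows the candidates to p1 and ignore_entities then removes p1, leaving sum with no entity at all. That would also explain why there are no sum features over p2 or c either. The sketch below models this interpretation with a hypothetical helper, observed_allowed; it is not featuretools code.

```python
from typing import Iterable, List, Optional


def observed_allowed(candidates: Iterable[str],
                     include: Optional[List[str]] = None,
                     ignore: Optional[List[str]] = None) -> List[str]:
    """Hypothetical: include and ignore are both checked, so ignore wins."""
    ignore = ignore or []
    allowed = []
    for entity in candidates:
        if include is not None and entity not in include:
            continue  # filtered out by the include list
        if entity in ignore:
            continue  # ignore is still checked, even for included entities
        allowed.append(entity)
    return allowed


candidates = ['p1', 'p2', 'c']
print(observed_allowed(candidates, include=['p1'], ignore=['p1']))  # []
```

An empty result here corresponds to sum producing no aggregation features, leaving only gp's own variables.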
Am I missing something here, or is the documentation I am reading actually wrong, and I should understand that ignore_entities overrides include_entities?
Solution
This is a bug. You can track the proposed fix here: https://github.com/alteryx/featuretools/pull/1518