Multivariate correlation filter

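Problem description

How can I identify whether the combination of two categorical features is associated with the target variable?

For example:

Suppose there are three variables: two categorical features and one target variable. When I used the chi-square test to determine each feature's relationship with the target variable, I could not find a strong relationship. So I would like to use the combination of the two features to check whether there is an association with the target variable, but I am unsure whether the chi-square test can be used in this situation or whether some other method is needed.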
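
A p-value alone does not say how strong an association is, so the per-feature check described above could be paired with an effect size such as Cramér's V, which ranges from 0 (no association) to 1 (perfect association). The sketch below is only an illustration: the helper `cramers_v` and the toy `feature` and `target` series are hypothetical and are not taken from the original data.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Return (Cramér's V, p-value) for two categorical series."""
    table = pd.crosstab(x, y)
    # correction=False so that V is based on the plain Pearson statistic.
    stat, p, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    if k == 0:
        # Only one category on one side: V is undefined.
        return float("nan"), p
    return float(np.sqrt(stat / (n * k))), p


# Hypothetical toy data: 60 rows, two feature levels, a binary target.
feature = pd.Series(["a"] * 30 + ["b"] * 30, name="feature")
target = pd.Series(
    ["yes"] * 10 + ["no"] * 20 + ["yes"] * 20 + ["no"] * 10, name="target"
)

v, p = cramers_v(feature, target)
print(f"Cramér's V = {v:.3f}, p-value = {p:.4f}")
```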

For example, here is what I tried:

import pandas as pd
from scipy.stats import chi2, chi2_contingency

# Sample the rows once so that all three columns come from the same rows
# (df_offer_details is the offer data frame loaded earlier).
sample = df_offer_details.sample(frac=0.5, replace=True, random_state=1).reset_index(drop=True)
ct_reloc_status = pd.crosstab(
    sample['percentage_hike_offered_bin'],
    [sample['Candidate relocation status'], sample['Acceptance status']],
)
ct_reloc_status

# We carry out a contingency test to check whether there is a relationship
# with the target variable and relocation status.
# Rows are the hike bins; columns are (relocation status, acceptance status) pairs.
H0 = "There is no relationship between Relocation status and Acceptance status"
Ha = "There is a relationship between Relocation status and Acceptance status"

stat, p, dof, expected = chi2_contingency(ct_reloc_status)
print('p-value: ', p)

prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f,critical=%.3f,stat=%.3f' % (prob, critical, stat))

# stat >= critical at prob = 0.95 is equivalent to p <= 0.05
if abs(stat) >= critical:
    print(f'''Since p-value {p} < 0.05 we reject the null hypothesis: {H0}.Thus alternate hypothesis: {Ha} holds good ''')
else:
    print(f'Fail to reject null hypothesis {H0}')

Result:


p-value:  0.019814129159194147
probability=0.950,critical=28.869,stat=32.380
Since p-value 0.019814129159194147 < 0.05 we reject the null hypothesis: There is no relationship between Relocation status and Acceptance status.Thus alternate hypothesis: There is a relationship between Relocation status and Acceptance status holds good 

But I am not sure whether this is the right approach.

Solution
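
One possible way to test the combination, offered as a sketch rather than a confirmed answer: put both features on the rows of the contingency table and the target alone on the columns, so that the chi-square test compares each feature combination against the target. In the crosstab in the question, `Acceptance status` sits among the columns together with `Candidate relocation status`, so that table relates the hike bins to the (relocation, acceptance) pairs instead. The helper `joint_chi2` and the toy data below are hypothetical; the toy target is built so that neither feature alone is associated with it but their combination is.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def joint_chi2(df, features, target):
    """Chi-square test of the joint feature combination against the target."""
    # Rows: every observed combination of the features; columns: the target.
    table = pd.crosstab([df[f] for f in features], df[target])
    stat, p, dof, expected = chi2_contingency(table)
    return table, stat, p, dof


# Hypothetical toy data: 25 rows for each of the four feature combinations.
toy = pd.DataFrame({
    "relocation": ["yes", "yes", "no", "no"] * 25,
    "hike_bin": ["low", "high", "low", "high"] * 25,
})
# The target depends only on the combination of the two features (XOR pattern).
toy["accepted"] = (
    (toy["relocation"] == "yes") ^ (toy["hike_bin"] == "high")
).map({True: "yes", False: "no"})

# One feature alone versus the target.
_, stat_single, p_single, _ = joint_chi2(toy, ["relocation"], "accepted")
# Both features together versus the target.
table, stat_joint, p_joint, dof_joint = joint_chi2(
    toy, ["relocation", "hike_bin"], "accepted"
)

print(table)
print(f"single feature: stat={stat_single:.3f}, p={p_single:.3f}")
print(f"joint features: stat={stat_joint:.3f}, p={p_joint:.3g}, dof={dof_joint}")
```

With many feature combinations, some cells may end up with small expected counts, in which case the chi-square approximation becomes less reliable; merging sparse categories is one common workaround.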
