Problem description
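For example, suppose I have three features: 2 categorical variables and 1 target variable. When I used the chi-square test to check each feature's association with the target variable, I could not find a strong relationship. So I want to check whether the combination of the two features is associated with the target variable, but I'm not sure whether the chi-square test can be used in this case or whether some other method is needed.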
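The per-feature check described above can be sketched as a loop that builds one contingency table per feature and runs `scipy.stats.chi2_contingency` on it. This is a minimal sketch on hypothetical toy data: the DataFrame, its column names (`feature_a`, `feature_b`, `target`) and the helper `chi2_per_feature` are illustrative assumptions, not taken from the original question. Note that for 2×2 tables SciPy applies Yates' continuity correction by default.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "feature_a": ["x", "x", "y", "y", "x", "y", "x", "y"],
    "feature_b": ["p", "q", "p", "q", "q", "p", "p", "q"],
    "target":    ["yes", "no", "no", "yes", "no", "yes", "yes", "no"],
})

def chi2_per_feature(data, features, target):
    """Return {feature: p-value} for a chi-square test of each feature vs. the target."""
    results = {}
    for feature in features:
        # One contingency table per feature: feature levels x target classes
        table = pd.crosstab(data[feature], data[target])
        stat, p, dof, expected = chi2_contingency(table)
        results[feature] = p
    return results

print(chi2_per_feature(df, ["feature_a", "feature_b"], "target"))
```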
For example, this is what I tried:
import pandas as pd
from scipy.stats import chi2, chi2_contingency
# df_offer_details is assumed to be already loaded.
# Sample the rows once (with replacement, as before) so the three columns stay aligned,
# instead of sampling each column separately.
sample = df_offer_details.sample(frac=0.5, replace=True, random_state=1)
# Rows: hike bins; columns: every (relocation status, acceptance status) pair
ct_reloc_status = pd.crosstab(sample['percentage_hike_offered_bin'],
                              [sample['Candidate relocation status'], sample['Acceptance status']])
ct_reloc_status
# Chi-square test of independence on the contingency table
H0 = "There is no relationship between Relocation status and Acceptance status"
Ha = "There is a relationship between Relocation status and Acceptance status"
stat, p, dof, expected = chi2_contingency(ct_reloc_status)
print('p-value: ', p)
prob = 0.95
critical = chi2.ppf(prob, dof)  # critical value at the 5% significance level
print('probability=%.3f,critical=%.3f,stat=%.3f' % (prob, critical, stat))
if stat >= critical:  # same decision as p <= 0.05
    print(f'''Since p-value {p} < 0.05 we reject the null hypothesis: {H0}.Thus alternate hypothesis: {Ha} holds good''')
else:
    print(f'Fail to reject null hypothesis {H0}')
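The `if` in the code compares the statistic with a critical value while the message reports the p-value. For a chi-square test the two rules give the same decision, because `chi2.ppf` is the inverse of the CDF and `chi2.sf` is one minus the CDF. The helper `decide` below is a hypothetical name used for illustration, not part of SciPy. The chi-square statistic is never negative, so an `abs()` around it in the condition is unnecessary.

```python
from scipy.stats import chi2

def decide(stat, dof, alpha=0.05):
    """Apply both decision rules to a chi-square statistic with `dof` degrees of freedom."""
    critical = chi2.ppf(1 - alpha, dof)  # reject H0 when stat reaches this value
    p_value = chi2.sf(stat, dof)         # upper-tail probability P(X >= stat) under H0
    reject_by_critical = stat >= critical
    reject_by_p = p_value <= alpha
    return reject_by_critical, reject_by_p, critical, p_value

# Statistic from the printed result; dof = 18 is an assumption (see note below)
print(decide(32.380, 18))
```

The degrees of freedom are not printed in the result; 18 is assumed because `chi2.ppf(0.95, 18)` gives the printed critical value of about 28.869. With those inputs both rules reject H0.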
Result:
p-value: 0.019814129159194147
probability=0.950,critical=28.869,stat=32.380
Since p-value 0.019814129159194147 < 0.05 we reject the null hypothesis: There is no relationship between Relocation status and Acceptance status.Thus alternate hypothesis: There is a relationship between Relocation status and Acceptance status holds good
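For reference, this is the Pearson statistic that `chi2_contingency` computes for an r × c table (SciPy only applies a continuity correction when dof = 1), where $O_{ij}$ are the observed counts, $O_{i\cdot}$ and $O_{\cdot j}$ are the row and column totals, and $N$ is the grand total. With a list passed as the columns of `pd.crosstab`, c is the number of (relocation status, acceptance status) pairs. The dof of 18 below is inferred, not printed: 28.869 is the 0.95 quantile of the χ² distribution with 18 degrees of freedom.

$$
X^2=\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},\qquad E_{ij}=\frac{O_{i\cdot}\,O_{\cdot j}}{N},\qquad \mathrm{dof}=(r-1)(c-1)
$$

$$
X^2=32.380\ \ge\ \chi^2_{0.95,\,18}=28.869
\quad\Longleftrightarrow\quad
p=P\left(\chi^2_{18}\ge 32.380\right)\approx 0.0198<0.05
$$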
But I'm not sure whether this is the right approach.
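In the crosstab in the question, the list passed as the second argument becomes the columns, so the table tests the hike bin against each (relocation status, acceptance status) pair. One way to test two features jointly against a target with a chi-square test is to put both features in the index, which treats every combination of their levels as one category; concatenating the two columns into one string column gives the same table. This is a minimal sketch on hypothetical toy data (`hike_bin`, `relocation`, `accepted` and all values are made up). Two caveats: combining features multiplies the number of rows in the table, so expected counts shrink quickly (a common rule of thumb asks for expected counts of at least 5 in most cells, which this toy table does not meet), and a significant joint test does not by itself show that the combination adds anything beyond either feature alone. A model with an interaction term, such as a logistic regression, is a common alternative for that question.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; names only mirror the question, values are made up.
df = pd.DataFrame({
    "hike_bin":   ["low", "low", "high", "high", "low", "high", "low", "high", "low", "high"],
    "relocation": ["yes", "no", "yes", "no", "yes", "no", "no", "yes", "yes", "no"],
    "accepted":   ["yes", "no", "yes", "yes", "no", "no", "no", "yes", "yes", "no"],
})

# Both features in the index: each row is one (hike_bin, relocation) combination,
# the columns are the target classes.
table = pd.crosstab([df["hike_bin"], df["relocation"]], df["accepted"])
stat, p, dof, expected = chi2_contingency(table)

# Equivalent formulation: collapse the two features into one categorical column.
combo = df["hike_bin"] + "|" + df["relocation"]
table2 = pd.crosstab(combo, df["accepted"])

print(table)
print(f"chi2={stat:.3f}, dof={dof}, p={p:.3f}")
print("smallest expected count:", expected.min())  # check before trusting the p-value
```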