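Problem description
I have a dataset with imbalanced response values: the "rejected" and "not rejected" classes are far from equally represented, so I would like to balance the dataset.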
To do this, there is code that works with the now-deprecated cross_validation.StratifiedKFold, but I now need to adapt it and I don't understand it very well, so I'm asking for help.
The original code is:
def stratified_cv(X, y, clf_class, shuffle=True, n_folds=10, **kwargs):
    # Old API: the labels y are passed to the constructor.
    stratified_k_fold = cross_validation.StratifiedKFold(y, n_folds=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    # ii -> train indices
    # jj -> test indices
    for ii, jj in stratified_k_fold:
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train, y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred
Here X is the fit_transformed dataset, converted to a numpy float array and scaled, and y is the "rejected" vs. "not rejected" classification converted to an array of ints (0 or 1). Finally, clf_class(**kwargs) can be ensemble.GradientBoostingClassifier, svm.SVC, or ensemble.RandomForestClassifier.
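Solution
StratifiedKFold has moved to sklearn.model_selection, and its interface changed along the way: the number of folds is now passed as n_splits, and the labels are passed to split(X, y) instead of to the constructor. So you should do this: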
from sklearn.model_selection import StratifiedKFold

def stratified_cv(X, y, clf_class, shuffle=True, n_folds=10, **kwargs):
    stratified_k_fold = StratifiedKFold(n_splits=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    # ii -> train indices
    # jj -> test indices
    for ii, jj in stratified_k_fold.split(X, y):
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train, y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred
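To see how the updated function might be called, here is a minimal sketch. The data is synthetic (generated with make_classification as a stand-in for the asker's real X and y, which are not shown), and the choice of RandomForestClassifier with n_estimators=50 is only an illustrative assumption. The last lines check that each test fold keeps roughly the same minority-class share as the full dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold


def stratified_cv(X, y, clf_class, shuffle=True, n_folds=10, **kwargs):
    stratified_k_fold = StratifiedKFold(n_splits=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    for ii, jj in stratified_k_fold.split(X, y):
        clf = clf_class(**kwargs)          # fresh classifier for every fold
        clf.fit(X[ii], y[ii])              # train on the training indices
        y_pred[jj] = clf.predict(X[jj])    # fill in out-of-fold predictions
    return y_pred


# Synthetic, imbalanced stand-in for the real data (roughly 90% / 10%).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Extra keyword arguments are forwarded to the classifier constructor.
y_pred = stratified_cv(X, y, RandomForestClassifier, n_estimators=50, random_state=0)
accuracy = float(np.mean(y_pred == y))
print(f"out-of-fold accuracy: {accuracy:.3f}")

# Minority-class share in each test fold, compared with the overall share.
fold_shares = [
    float(y[test].mean()) for _, test in StratifiedKFold(n_splits=10).split(X, y)
]
print(f"overall share: {y.mean():.3f}, per-fold shares: {np.round(fold_shares, 3)}")
```

Note that stratification only keeps the class ratio the same in every fold; it does not rebalance the data. If you only need the out-of-fold predictions, sklearn.model_selection.cross_val_predict with cv=StratifiedKFold(...) covers the same loop.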