Problem description
I am trying to find the best parameters for my classifier and CSP, but the F1 scores I get from train_test_split and from cross-validation are different. Here is my train_test_split code:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_fscore_support
from mne.decoding import Scaler, CSP

X_train, X_test, y_train, y_test = train_test_split(data_1, classes_1, test_size=0.2, stratify=classes_1, random_state=42)
sc = Scaler(scalings='mean')
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
csp = CSP(n_components=10)
X_train = csp.fit_transform(X_train, y_train)
X_test = csp.transform(X_test)
svc = SVC(random_state=42)
svc.fit(X_train, y_train)
precision_recall_fscore_support(y_test, svc.predict(X_test), average='macro')
Result:
Computing rank from data with rank=None
Using tolerance 9.9 (2.2e-16 eps * 22 dim * 2e+15 max singular value)
Estimated rank (mag): 22
MAG: rank 22 computed from 22 data channels with 0 projectors
Reducing data rank from 22 -> 22
Estimating covariance using EMPIRICAL
Done.
Computing rank from data with rank=None
Using tolerance 9.7 (2.2e-16 eps * 22 dim * 2e+15 max singular value)
Estimated rank (mag): 22
MAG: rank 22 computed from 22 data channels with 0 projectors
Reducing data rank from 22 -> 22
Estimating covariance using EMPIRICAL
Done.
Computing rank from data with rank=None
Using tolerance 9.6 (2.2e-16 eps * 22 dim * 2e+15 max singular value)
Estimated rank (mag): 22
MAG: rank 22 computed from 22 data channels with 0 projectors
Reducing data rank from 22 -> 22
Estimating covariance using EMPIRICAL
Done.
Computing rank from data with rank=None
Using tolerance 9.7 (2.2e-16 eps * 22 dim * 2e+15 max singular value)
Estimated rank (mag): 22
MAG: rank 22 computed from 22 data channels with 0 projectors
Reducing data rank from 22 -> 22
Estimating covariance using EMPIRICAL
Done.
(0.6939451810472472,0.6924810661243371,0.690826330421708,None)
Here is my cross_validate code:
scorer = {'precision': make_scorer(precision_score, average='macro'),
          'recall': make_scorer(recall_score, average='macro'),
          'f1_score': make_scorer(f1_score, average='macro')}
pipe = Pipeline([('sc', Scaler(scalings='mean')), ('csp', CSP(n_components=10)), ('svc', SVC(random_state=42))])
cross_validate(pipe, data_1, classes_1, cv=5, scoring=scorer, n_jobs=-1)
Result:
{'fit_time': array([14.22801042, 14.27563739, 14.52271605, 14.33256483, 14.07182837]),
 'score_time': array([0.51321149, 0.46285748, 0.39439225, 0.47808695, 0.49983597]),
 'test_precision': array([0.45328272, 0.51844098, 0.45984035, 0.57410918, 0.53001667]),
 'test_recall': array([0.45506263, 0.5237793, 0.45782758, 0.56854307, 0.53123101]),
 'test_f1_score': array([0.45276712, 0.51451502, 0.45853913, 0.56818711, 0.5278188])}
I expected the results to differ slightly, but a gap of about 0.2 in macro F1 seems far too large. Does anyone know why this happens?
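One concrete difference between the two setups that I know of: when `cross_validate` is given an integer `cv` and a classifier, scikit-learn uses an *unshuffled* `StratifiedKFold`, whereas the `train_test_split` call above shuffles (with `random_state=42`). A minimal sketch of passing an explicit shuffled splitter instead, using random synthetic data as a stand-in for the EEG pipeline (`X`, `y` here are hypothetical, not my actual features):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.metrics import make_scorer, f1_score
from sklearn.svm import SVC

# Hypothetical stand-in for the real feature matrix and labels
rng = np.random.RandomState(42)
X = rng.randn(120, 10)
y = rng.randint(0, 4, size=120)

# Explicit shuffled StratifiedKFold, mirroring the stratified,
# shuffled train_test_split used in the hold-out experiment
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(SVC(random_state=42), X, y, cv=cv,
                        scoring={'f1_score': make_scorer(f1_score, average='macro')})
```

This only changes how folds are drawn; whether it explains the full 0.2 gap is exactly what I am unsure about.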
I am using the A01E BCI Competition IV 2a GDF dataset from https://www.inhinyeru.com/blog/implement-nuxt-dynamic-sitemaps-in-multiple-data-sources-1624734684 and applying a custom windowing function to reduce my data length to 1 second:
# Apply windowing
def windowing(data, classes, duracion=1, overlap=0.8, fs=250):
    # Number of samples per trial (1000)
    limite = len(data[0, 0])
    data_convertida = []
    classes_convertida = []
    # Window length in samples
    muestras = int(duracion * fs)
    for idx in range(len(data)):
        ptrLeft = 0
        ptrRight = muestras
        while limite >= ptrRight:
            # Keep all channels; slice only the time axis
            data_convertida.append(data[idx, :, ptrLeft:ptrRight])
            ptrLeft = ptrRight - int((ptrRight - ptrLeft) * overlap)
            ptrRight = ptrLeft + muestras
            classes_convertida.append(classes[idx])
    return np.array(data_convertida), np.array(classes_convertida)
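To sanity-check the windowing step, here is a self-contained run on synthetic data (the shapes are hypothetical stand-ins for the GDF epochs, not the real recordings). With 500 samples per trial, a 250-sample window and 80% overlap, the step is 50 samples, so each trial yields 6 windows:

```python
import numpy as np

def windowing(data, classes, duracion=1, overlap=0.8, fs=250):
    limite = len(data[0, 0])       # samples per trial
    data_convertida = []
    classes_convertida = []
    muestras = int(duracion * fs)  # window length in samples
    for idx in range(len(data)):
        ptrLeft, ptrRight = 0, muestras
        while limite >= ptrRight:
            # Keep all channels; slice only the time axis
            data_convertida.append(data[idx, :, ptrLeft:ptrRight])
            ptrLeft = ptrRight - int((ptrRight - ptrLeft) * overlap)
            ptrRight = ptrLeft + muestras
            classes_convertida.append(classes[idx])
    return np.array(data_convertida), np.array(classes_convertida)

# Synthetic stand-in: 2 trials, 3 channels, 2 s at 250 Hz
data = np.random.randn(2, 3, 500)
classes = np.array([0, 1])
Xw, yw = windowing(data, classes)
print(Xw.shape, yw.shape)  # (12, 3, 250) (12,)
```

Note that each label is replicated once per window, so overlapping windows from the same trial share a label; I wonder whether windows from one trial ending up in both the training and validation side of a split could be part of the discrepancy.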