How to use DTW + kNN

Problem description

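I have been trying to distinguish spoken words by extracting MFCCs from the audio (with the librosa library), comparing recordings with dynamic time warping (DTW), and classifying them with kNN.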

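For example, I am trying to tell apart the words "cat" and "anything". My problem is that I cannot find any difference between the word "anything" and two different pronunciations of the word "cat": according to DTW, the distances between these three recordings all seem to be about the same. I tried decreasing and increasing the number of MFCC coefficients, and I tried preprocessing the MFCCs (normalization and mean removal), but nothing seems to have any effect.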
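A common form of the normalization mentioned above is per-utterance cepstral mean and variance normalization (CMVN): each MFCC coefficient is shifted to zero mean and scaled to unit variance over the frames of one recording. The sketch below assumes librosa's layout, an array of shape (n_mfcc, n_frames). The function name cmvn and the option to drop the 0th coefficient (which mostly tracks loudness) are illustrative choices, not part of the original code.

```python
import numpy as np


def cmvn(mfcc: np.ndarray, drop_c0: bool = True, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (illustrative).

    mfcc: array of shape (n_mfcc, n_frames), the layout returned by
    librosa.feature.mfcc. Returns the same layout, optionally without c0.
    """
    if drop_c0:
        # c0 mostly reflects overall energy, which varies with recording level
        mfcc = mfcc[1:]
    mean = mfcc.mean(axis=1, keepdims=True)  # one mean per coefficient
    std = mfcc.std(axis=1, keepdims=True)    # one std per coefficient
    return (mfcc - mean) / (std + eps)


# Usage with random data standing in for a real MFCC matrix
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(loc=5.0, scale=3.0, size=(20, 50))
normalized = cmvn(fake_mfcc)
print(normalized.shape)  # (19, 50)
```

Normalizing per recording removes differences in microphone gain and channel that otherwise dominate frame-to-frame distances, which can make all recordings look equally far apart.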

I am using the dtw function from the dtw package:
dist, cost, acc_cost, path = dtw(mfcc3.T, mfcc2.T, dist=lambda x, y: norm(x - y, ord=1))
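To make it clearer what the dtw call computes, and where it can be tuned, here is a minimal from-scratch DTW sketch in NumPy. It is not the dtw package's implementation; the function name dtw_distance and the normalization by n + m are assumptions made for illustration. Like the call above, it expects frame-major input (hence mfcc.T) and uses the L1 norm as the local cost. Normalizing matters when recordings differ in length, because the raw accumulated cost grows with the number of frames.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Classic DTW between two feature sequences (illustrative sketch).

    x: (n, d) and y: (m, d), one row per frame. The local cost is the L1
    norm; the accumulated cost is divided by n + m (an assumed normalization)
    so that long and short recordings give comparable values.
    """
    n, m = len(x), len(y)
    # Pairwise L1 distances between every frame of x and every frame of y
    cost = np.abs(x[:, None, :] - y[None, :, :]).sum(axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed steps: diagonal match, vertical, horizontal
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    return float(acc[n, m] / (n + m))


a = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
a_slow = np.repeat(a, 2, axis=0)  # same contour, "spoken" twice as slowly
b = np.array([[2.0], [2.0], [0.0], [0.0], [2.0]])
print(dtw_distance(a, a_slow))  # 0.0
print(dtw_distance(a, b) > 0)   # True
```

The time-stretched copy gets distance 0, which is exactly the invariance DTW provides; a different contour gets a positive distance.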

So my question is: why do you think I cannot classify this data?

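- Is my preprocessing of the data insufficient before comparing recordings with DTW?
- Do I need to tune DTW more cleverly so that it separates the distances between different words?
- Are kNN or DTW simply not enough for my case? If so, how can I solve this?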
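Before changing the classifier, it can help to check whether the DTW distances separate the two words at all. The sketch below (the function name class_separation is hypothetical) compares the mean distance between recordings of the same word with the mean distance between recordings of different words. If the two means are close, as described above, kNN on top of these distances is unlikely to do better, and the features or the DTW setup are the more likely culprits.

```python
import numpy as np


def class_separation(distances: np.ndarray, labels: list) -> tuple:
    """Mean DTW distance within the same word vs. between different words.

    distances: square (n, n) matrix of pairwise DTW distances.
    labels: one label per row/column, e.g. "cat" or "anything".
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # skip self-distances (0)
    within = distances[same & off_diag].mean()
    between = distances[~same].mean()
    return float(within), float(between)


# Usage with a made-up 4x4 distance matrix (not real measurements)
d = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 4.0],
    [4.0, 4.5, 0.0, 2.0],
    [5.0, 4.0, 2.0, 0.0],
])
within, between = class_separation(d, ["cat", "cat", "anything", "anything"])
print(within, between)  # 1.5 4.375
```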

Here are the main lines of the code:

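import numpy as np
import librosa
from numpy.linalg import norm
from dtw import dtw  # same dtw() as above: returns (dist, cost, acc_cost, path)

# Pairwise DTW distances between all training recordings
# (mots, the list of file names, and dirname are defined elsewhere)
distances = np.zeros((len(mots), len(mots)))
for i in range(len(mots)):
    y1, sr1 = librosa.load(dirname + "/" + mots[i])
    mfcc1 = librosa.feature.mfcc(y=y1, sr=sr1)
    for j in range(len(mots)):
        y2, sr2 = librosa.load(dirname + "/" + mots[j])
        mfcc2 = librosa.feature.mfcc(y=y2, sr=sr2)
        # Transpose so each row is one frame; L1 norm as the local frame distance
        dist, _, _, _ = dtw(mfcc1.T, mfcc2.T, dist=lambda a, b: norm(a - b, ord=1))
        distances[i, j] = dist  # distance between the spoken words i and j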


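# One label per training file in mots, in the same order.
# Hypothetical assumption: each file name starts with the spoken word, e.g. "cat_01.wav"
labels = ['cat' if m.startswith('cat') else 'anything' for m in mots]

# Train a kNN classifier to decide whether a recording is "cat" or "anything"
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
classifier.fit(distances, labels)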


# Compare a new sample with every training recording to find the most similar word
y, sr = librosa.load(dst)
mfcc = librosa.feature.mfcc(y=y, sr=sr)
distanceTest = []
for i in range(len(mots)):
    y1, sr1 = librosa.load(dirname + "/" + mots[i])
    mfcc1 = librosa.feature.mfcc(y=y1, sr=sr1)
    dist, _, _, _ = dtw(mfcc.T, mfcc1.T, dist=lambda a, b: norm(a - b, ord=1))
    distanceTest.append(dist)

# Result
pre = classifier.predict([distanceTest])[0]
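The code above feeds each row of the DTW distance matrix to kNN as a feature vector and then measures Euclidean distance between those rows, so kNN compares distance profiles rather than using the DTW distances directly. A more common pattern with DTW is to use the DTW distance itself as the kNN metric. scikit-learn's KNeighborsClassifier supports this with metric='precomputed': fit takes the square training distance matrix, and predict takes the distances from each query to every training recording. The NumPy sketch below does the same by hand; the function name knn_predict_precomputed and the numbers are illustrative only.

```python
from collections import Counter

import numpy as np


def knn_predict_precomputed(test_dists: np.ndarray, train_labels: list, k: int = 1) -> list:
    """kNN on distances that are already computed (e.g. by DTW).

    test_dists: (n_queries, n_train) distances from each query to each
    training recording. train_labels: one label per training recording.
    """
    predictions = []
    for row in np.atleast_2d(test_dists):
        nearest = np.argsort(row)[:k]  # indices of the k closest recordings
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Made-up DTW distances from one test recording to four training recordings
train_labels = ["cat", "cat", "anything", "anything"]
test_dists = np.array([[3.2, 2.9, 1.1, 1.4]])
print(knn_predict_precomputed(test_dists, train_labels, k=1))  # ['anything']
print(knn_predict_precomputed(test_dists, train_labels, k=3))  # ['anything']
```

With k=1 this reduces to nearest-template matching, the classic DTW recognizer setup. Note also that if mots holds only the three recordings mentioned in the question, n_neighbors=3 makes every prediction a majority vote over the whole training set, so the two "cat" recordings would always outvote "anything".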


dtw knn librosa python speech-recognition
