Problem description
I am working on a speech recognition system and took the code from GitHub. I added a few things to it:
import json
import math
import os

import librosa

DATASET_PATH = "F://MS//MS-4//LibriSpeech"
JSON_PATH = "data_10.json"
SAMPLE_RATE = 22050
TRACK_DURATION = 15  # seconds
SAMPLES_PER_TRACK = SAMPLE_RATE * TRACK_DURATION


def save_mfcc(dataset_path, json_path, n_mfcc=13, n_fft=2048, hop_length=512, num_segments=5):
    # Extracted MFCCs, their integer labels, and the label names
    data = {"mapping": [], "mfcc": [], "labels": []}

    num_samples_per_segment = int(SAMPLES_PER_TRACK / num_segments)
    expected_num_mfcc_vectors_per_segment = math.ceil(num_samples_per_segment / hop_length)

    for i, (dirpath, dirnames, filenames) in enumerate(os.walk(dataset_path)):
        # Skip the dataset root itself (compare by value, not identity)
        if dirpath != dataset_path:
            # Use the folder name as the label
            dirpath_components = os.path.split(dirpath)
            semantic_label = dirpath_components[-1]
            data["mapping"].append(semantic_label)
            print("\nProcessing: {}".format(semantic_label))

            for f in filenames:
                file_path = os.path.join(dirpath, f)
                signal, sr = librosa.load(file_path, sr=SAMPLE_RATE)

                # Split the track into fixed-length segments and extract MFCCs from each
                for s in range(num_segments):
                    start_sample = num_samples_per_segment * s
                    finish_sample = start_sample + num_samples_per_segment

                    mfcc = librosa.feature.mfcc(y=signal[start_sample:finish_sample], sr=sr, n_fft=n_fft, n_mfcc=n_mfcc, hop_length=hop_length)
                    mfcc = mfcc.T

                    # Keep only segments with the expected number of MFCC vectors
                    if len(mfcc) == expected_num_mfcc_vectors_per_segment:
                        data["mfcc"].append(mfcc.tolist())
                        data["labels"].append(i - 1)
                        print("{},segment:{}".format(file_path, s + 1))

    with open(json_path, "w") as fp:
        json.dump(data, fp, indent=4)


if __name__ == "__main__":
    save_mfcc(DATASET_PATH, JSON_PATH, num_segments=10)
This is the error. I would like to know how to fix it:
Warning (from warnings module):
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\lib\site-packages\librosa\core\spectrum.py", line 222
    warnings.warn(
UserWarning: n_fft=2048 is too small for input signal of length=0
Traceback (most recent call last):
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\datasetread.py", line 73, in <module>
    save_mfcc(DATASET_PATH,num_segments=10)
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\datasetread.py", line 55, in save_mfcc
    mfcc=librosa.feature.mfcc(signal[start_sample:finish_sample],hop_length=hop_length)
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\lib\site-packages\librosa\feature\spectral.py", line 1852, in mfcc
    S = power_to_db(melspectrogram(y=y,**kwargs))
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\lib\site-packages\librosa\feature\spectral.py", line 1996, in melspectrogram
    S,n_fft = _spectrogram(
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\lib\site-packages\librosa\core\spectrum.py", line 2512, in _spectrogram
    stft(
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\lib\site-packages\librosa\core\spectrum.py", line 228, in stft
    y = np.pad(y,int(n_fft // 2),mode=pad_mode)
  File "<__array_function__ internals>", line 5, in pad
  File "C:\Users\Hp\AppData\Local\Programs\Python\python39\lib\site-packages\numpy\lib\arraypad.py", line 814, in pad
    raise ValueError(
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
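Solution
In your code you set TRACK_DURATION = 15 (seconds), but I think some tracks (files) in your dataset are shorter than 15 seconds. For such a file, the slice signal[start_sample:finish_sample] is empty once start_sample is past the end of the audio, which matches the "length=0" in the warning. So try reducing TRACK_DURATION.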
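Reducing TRACK_DURATION discards audio from longer files, and a file shorter than the new value would probably still fail. Another option is to skip any segment that does not fit inside the loaded signal. Below is a minimal sketch of that idea; full_segment_bounds is a hypothetical helper, not part of librosa or the original script, and the 8-second file length is only an illustrative assumption.

```python
def full_segment_bounds(num_samples, samples_per_segment, num_segments):
    """Return (start, finish) sample indices for segments that fit entirely in the signal.

    Hypothetical helper for illustration; not part of librosa or the original script.
    """
    bounds = []
    for s in range(num_segments):
        start = samples_per_segment * s
        finish = start + samples_per_segment
        if finish > num_samples:
            # This segment and every later one would be short or empty, so stop here.
            break
        bounds.append((start, finish))
    return bounds


SAMPLE_RATE = 22050
SAMPLES_PER_SEGMENT = int(SAMPLE_RATE * 15 / 10)  # 33075 samples, as in the question

# An 8-second file has 176400 samples, so only the first 5 of the 10 segments are complete.
print(full_segment_bounds(8 * SAMPLE_RATE, SAMPLES_PER_SEGMENT, 10))
```

In save_mfcc, the inner loop could then iterate over full_segment_bounds(len(signal), num_samples_per_segment, num_segments) instead of range(num_segments), so librosa.feature.mfcc is never called on an empty slice.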