MemoryError when using mfcc from python_speech_features

Problem description

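I am using mfcc from python_speech_features to extract features from WAV files that are between 5 and 120 seconds long. For shorter files (roughly 10 to 20 seconds) the features are extracted without problems, but for longer files I get the following error: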

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-6-ea3546938d03> in <module>
     14     print("\n\tFeatures\n")
     15     data,sampling_rate = librosa.load(sample_data[i])
---> 16     mfcc_features = mfcc(data,sampling_rate,winlen=30,nfft=66150)
     17     print(pd.DataFrame(mfcc_features))
     18     print("========================================\n")

~/anaconda3/lib/python3.8/site-packages/python_speech_features/base.py in mfcc(signal,samplerate,winlen,winstep,numcep,nfilt,nfft,lowfreq,highfreq,preemph,ceplifter,appendEnergy,winfunc)
     26     :returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
     27     """
---> 28     feat,energy = fbank(signal,winfunc)
     29     feat = numpy.log(feat)
     30     feat = dct(feat,type=2,axis=1,norm='ortho')[:,:numcep]

~/anaconda3/lib/python3.8/site-packages/python_speech_features/base.py in fbank(signal,winfunc)
     53     highfreq= highfreq or samplerate/2
     54     signal = sigproc.preemphasis(signal,preemph)
---> 55     frames = sigproc.framesig(signal,winlen*samplerate,winstep*samplerate,winfunc)
     56     pspec = sigproc.powspec(frames,nfft)
     57     energy = numpy.sum(pspec,1) # this stores the total energy in each frame

~/anaconda3/lib/python3.8/site-packages/python_speech_features/sigproc.py in framesig(sig,frame_len,frame_step,winfunc)
     33     padsignal = numpy.concatenate((sig,zeros))
     34 
---> 35     indices = numpy.tile(numpy.arange(0,frame_len),(numframes,1)) + numpy.tile(numpy.arange(0,numframes*frame_step,frame_step),(frame_len,1)).T
     36     indices = numpy.array(indices,dtype=numpy.int32)
     37     frames = padsignal[indices]

<__array_function__ internals> in tile(*args,**kwargs)

~/anaconda3/lib/python3.8/site-packages/numpy/lib/shape_base.py in tile(A,reps)
   1256         for dim_in,nrep in zip(c.shape,tup):
   1257             if nrep != 1:
-> 1258                 c = c.reshape(-1,n).repeat(nrep,0)
   1259             n //= dim_in
   1260     return c.reshape(shape_out)

MemoryError: Unable to allocate 12.8 GiB for an array with shape (2591,661500) and data type int64
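
The size in the last line of the traceback can be checked by hand, which helps show where the memory goes. The following is a rough sketch, not the library's code. It assumes librosa.load resampled the audio to its default 22050 Hz (so winlen=30 means 30 * 22050 = 661500 samples per frame), and it assumes frames are counted as 1 + ceil((n_samples - frame_len) / frame_step). index_array_gib is a hypothetical helper introduced only for this illustration.

```python
import math

# Shape and dtype taken directly from the error message:
# (2591, 661500) int64 -> 8 bytes per element.
reported_gib = 2591 * 661500 * 8 / 2**30
print(f"reported index array: {reported_gib:.2f} GiB")  # about 12.8 GiB


def index_array_gib(n_samples, samplerate, winlen, winstep, itemsize=8):
    """Estimate the size of a (numframes, frame_len) integer index array.

    Hypothetical helper: the frame-count formula is an assumption about
    how the framing step works, not a copy of the library source.
    """
    frame_len = round(winlen * samplerate)
    frame_step = max(1, round(winstep * samplerate))
    if n_samples <= frame_len:
        numframes = 1
    else:
        numframes = 1 + math.ceil((n_samples - frame_len) / frame_step)
    gib = numframes * frame_len * itemsize / 2**30
    return numframes, frame_len, gib


# Assumed example: a 60-second clip at 22050 Hz.
n_samples = 60 * 22050
for winlen in (30, 0.025):
    numframes, frame_len, gib = index_array_gib(n_samples, 22050, winlen, 0.01)
    print(f"winlen={winlen}: {numframes} frames x {frame_len} samples = {gib:.3f} GiB")
```

Under these assumptions the 30-second window needs well over 12 GiB for the index array alone, while a 25 ms window stays far below 1 GiB for the same clip. That would explain why the error shows up even on a machine with 32 GB of RAM once the other intermediate arrays are added.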

Here is the code. I am running it in a Jupyter notebook. I have tried it on a laptop with 8 GB of RAM, a PC with 32 GB of RAM, and a Google Colab instance with almost 12 GB of RAM, but the error persists.

import os

import IPython.display as ipd
import librosa
import pandas as pd
from python_speech_features import mfcc

print("\nSample Data:")
print("============\n")
path = 'speech-sample-data'
# Collect every .wav file under the sample directory.
sample_data = [os.path.join(dp,f) for dp,dn,filenames in os.walk(path) for f in filenames if os.path.splitext(f)[1] == '.wav']

for i in range(5):
    print("Speech: ")
    ipd.display(ipd.Audio(sample_data[i]))
    print("Type: \n\tnormal\n")
    print("\n\tFeatures\n")
    data,sampling_rate = librosa.load(sample_data[i])
    # Same call as in the traceback above; this is the line that raises MemoryError.
    mfcc_features = mfcc(data,sampling_rate,winlen=30,nfft=66150)
    print(pd.DataFrame(mfcc_features))
    print("========================================\n")
    print("Speech: ")
    ipd.display(ipd.Audio(sample_data[i+5]))
    print("Type: \n\tToxic\n")
    print("\n\tFeatures\n")
    data,sampling_rate = librosa.load(sample_data[i+5])
    mfcc_features = mfcc(data,sampling_rate,winlen=30,nfft=66150)
    print(pd.DataFrame(mfcc_features))
    print("========================================\n")

Solution
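
One possible direction, offered as a sketch rather than a confirmed fix: winlen is given in seconds, so winlen=30 asks for 30-second analysis frames, whereas MFCCs are usually computed on frames of a few tens of milliseconds. The code below assumes a 25 ms window with a 10 ms step and picks nfft as the next power of two at or above the frame length in samples. short_window_params and extract_mfcc are hypothetical helper names, and the window values are assumptions that may need tuning for the data.

```python
def short_window_params(samplerate, winlen=0.025, winstep=0.01):
    """Build keyword arguments for mfcc() with a short analysis window.

    Hypothetical helper: nfft is chosen as the smallest power of two
    that is not shorter than one frame, so no frame gets truncated.
    """
    frame_len = int(round(winlen * samplerate))
    nfft = 1
    while nfft < frame_len:
        nfft *= 2
    return {
        "samplerate": samplerate,
        "winlen": winlen,
        "winstep": winstep,
        "nfft": nfft,
    }


def extract_mfcc(path):
    """Load one WAV file and return its (numframes, numcep) MFCC matrix."""
    # Imported lazily so the helper above can be used without these packages.
    import librosa
    from python_speech_features import mfcc

    data, sampling_rate = librosa.load(path)
    return mfcc(data, **short_window_params(sampling_rate))


# At librosa's default 22050 Hz, a 25 ms frame is 551 samples.
print(short_window_params(22050))
```

With these settings the number of rows in the result grows with the length of the file. If a single fixed-length vector per file is needed, the rows could be summarized afterwards, for example by taking the mean over frames; whether that suits the classification task is an assumption to verify.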
