Is there a relatively simple CNN architecture for audio denoising?

Problem description

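I need to use a CNN for some audio task, and rather than the obvious audio classifier I picked something more interesting: a convolutional denoising autoencoder. However, applying a simple image-denoising architecture to this task has turned out to be very unsuccessful. I'm using Keras on a cloud GPU in Google Colab, and I represent my audio with the STFT. Below are my architecture so far and the results (links to spectrograms). Any suggestions?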
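The question does not say how the STFT input is computed, so here is a minimal NumPy sketch of one common choice: a magnitude spectrogram from Hann-windowed frames. The parameters (`n_fft=512`, `hop=128`, a 16 kHz test tone) and the helper name `stft_magnitude` are hypothetical; libraries such as librosa or SciPy provide equivalent functions. Only the magnitude is kept here, so the phase would have to come from somewhere else (for example the noisy input) to turn a prediction back into audio.

```python
import numpy as np

def stft_magnitude(signal, n_fft=512, hop=128):
    """Hypothetical helper: magnitude STFT with a Hann window.

    Assumes len(signal) >= n_fft. Returns an array of shape
    (n_fft // 2 + 1 frequency bins, n_frames time frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Stack overlapping frames: shape (n_frames, n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # One-sided FFT of each windowed frame: shape (n_frames, n_fft // 2 + 1)
    spectrum = np.fft.rfft(frames * window, axis=1)
    # Transpose so frequency is the first axis, as in a spectrogram image
    return np.abs(spectrum).T

sr = 16000                               # assumed sample rate
t = np.arange(sr) / sr                   # one second of audio
x = np.sin(2 * np.pi * 440 * t)          # 440 Hz test tone
S = stft_magnitude(x)
print(S.shape)                           # (257, 122)
```

Note that `n_fft=512` gives 257 frequency bins, an odd number, which matters for the pooling/upsampling layers in the model below.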

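from tensorflow.keras.layers import (Input, Reshape, Conv2D, BatchNormalization,
                                     MaxPooling2D, UpSampling2D, add)
from tensorflow.keras.models import Model

# X_train: STFT magnitudes, shape (n_examples, freq_bins, time_frames).
# Both freq_bins and time_frames must be even, otherwise the UpSampling2D
# output will not match skip0 in add() below.
input_layer = Input(shape=X_train[0].shape)
# Add a channel axis: (freq, time) -> (freq, time, 1)
reshaped_input = Reshape((X_train[0].shape[0], X_train[0].shape[1], 1))(input_layer)

# encoder
skip0 = Conv2D(64, (3, 3), activation='relu', padding='same')(reshaped_input)
h = BatchNormalization()(skip0)
h = MaxPooling2D((2, 2), padding='same')(h)

# decoder (Conv2D requires a kernel_size; (3, 3) assumed here)
h = Conv2D(64, (3, 3), padding='same')(h)
h = BatchNormalization()(h)
h = UpSampling2D((2, 2))(h)
h = add([h, skip0])                       # skip connection from the encoder
h = Conv2D(1, (3, 3), padding='same')(h)  # single-channel linear output
output_layer = add([h, reshaped_input])   # residual: predict a correction to the noisy input
reshaped_output = Reshape(X_train[0].shape)(output_layer)

autoencoder = Model(input_layer, reshaped_output)
autoencoder.summary()
autoencoder.compile(loss='mse', optimizer='adam')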
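One concrete pitfall in this architecture: `MaxPooling2D((2,2), padding='same')` maps an axis of length n to ceil(n/2), and `UpSampling2D((2,2))` doubles it again, so an odd axis such as 257 frequency bins comes back as 258 and `add([h, skip0])` fails with a shape mismatch. A minimal workaround, sketched below with a hypothetical helper `pad_to_multiple`, is to zero-pad the spectrograms before training; with k pooling stages you would use `multiple=2**k`.

```python
import numpy as np

def pad_to_multiple(spec, multiple=2):
    """Hypothetical helper: zero-pad the last two axes (freq, time)
    up to the next multiple of `multiple`; leading axes are untouched."""
    pad_f = (-spec.shape[-2]) % multiple   # rows needed on the frequency axis
    pad_t = (-spec.shape[-1]) % multiple   # columns needed on the time axis
    pad_width = [(0, 0)] * (spec.ndim - 2) + [(0, pad_f), (0, pad_t)]
    return np.pad(spec, pad_width)         # default mode pads with zeros

batch = np.ones((8, 257, 122))             # e.g. 8 spectrograms with 257 bins
padded = pad_to_multiple(batch, multiple=2)
print(padded.shape)                        # (8, 258, 122)
```

After prediction, the padding can be cropped off again with slicing such as `pred[..., :257, :122]`.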

[Spectrogram images: Clean Recording | Noisy Recording (I simply added scaled-down Gaussian noise) | Prediction]
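The post only says the noise was "scaled-down Gaussian noise" without giving the scale. A minimal sketch of one way to make that scale explicit is to set it from a target signal-to-noise ratio. The helper name `add_gaussian_noise`, the 10 dB value, and the assumption that noise is added to the waveform before the STFT (so clean and noisy spectrograms form aligned training pairs) are all illustrative choices, not the author's method.

```python
import numpy as np

def add_gaussian_noise(clean, snr_db, rng):
    """Hypothetical helper: return clean + white Gaussian noise whose
    power is set so that 10*log10(P_signal / P_noise) is about snr_db."""
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(clean.shape) * np.sqrt(noise_power)
    return clean + noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # assumed 16 kHz tone
noisy = add_gaussian_noise(clean, snr_db=10, rng=rng)

# Measured SNR should land close to the requested 10 dB
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured, 1))
```

Drawing a different SNR per example is one way to vary how hard each training pair is.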


conv-neural-network python signal-processing