Problem description
I recently started reading Andrew W. Trask's Grokking Deep Learning and implemented a CNN, which worked well. But when I tried to add more hidden CNN layers it failed: I just can't get the dimensions to line up for the CNN backpropagation.
My code is as follows:
for iteration in range(iterations):
    '''
    images: (1000,3,32,32)
    kernel_rows, kernel_cols, num_colors = 4, 4, 3
    num_kernels_1, num_kernels_2 = 15, 30
    hidden_size = ((input_rows - 2*kernel_rows) * (input_cols - 2*kernel_cols)) * num_kernels_2
    hidden_size is the size of the matrix that comes out after the two
    convolutions (that's why it's 2*kernel_rows and 2*kernel_cols)
    kernels_1 = (kernel_rows*kernel_cols * num_colors, num_kernels_1)
    kernels_2 = (kernel_rows*kernel_cols * num_kernels_1, num_kernels_2)
    weights_1 = (hidden_size, 100)
    weights_2 = (100, 30)
    weights_3 = (30, 10)
    '''
    # forward pass
    sample_size = len(images)
    C_0 = convolution(images, input_rows, input_cols, kernel_rows, kernel_cols)
    C_1 = tanh(C_0 @ kernels_1)
    C_1_flattened = C_1.reshape(sample_size, -1)
    C_1 = C_1.reshape(sample_size, -1, (input_rows - kernel_rows), (input_cols - kernel_cols))
    C_1 = convolution(C_1, C_1.shape[2], C_1.shape[3], kernel_rows, kernel_cols)
    C_2 = tanh(C_1 @ kernels_2)
    C_2 = C_2.reshape(sample_size, -1)
    Z_1 = C_2 @ weights_1
    A_1 = tanh(Z_1)
    Z_2 = A_1 @ weights_2
    A_2 = tanh(Z_2)
    Z_3 = A_2 @ weights_3
    A_3 = softmax(Z_3)
    # backward pass
    delta_A_3 = (labels - A_3) / len(images)
    delta_A_2 = (delta_A_3 @ weights_3.T) * tanh2deriv(A_2)
    delta_A_1 = (delta_A_2 @ weights_2.T) * tanh2deriv(A_1)
    delta_C_2 = (delta_A_1 @ weights_1.T) * tanh2deriv(C_2)
    k_update_2 = C_1.reshape(kernel_rows*kernel_cols*num_kernels_1, -1) @ delta_C_2.reshape(-1, num_kernels_2)
    delta_C_1 = (delta_C_2.reshape(sample_size, num_kernels_2) @ kernels_2.T) * tanh2deriv(C_1)
    k_update_1 = C_0.reshape(kernel_rows*kernel_cols*num_colors, -1) @ delta_C_1.reshape(-1, num_kernels_1)
    # parameter updates (each layer's update uses that layer's input activation)
    cost = np.sum((labels - A_3)**2) / len(images)
    weights_3 += alpha * (A_2.T @ delta_A_3)
    weights_2 += alpha * (A_1.T @ delta_A_2)
    weights_1 += alpha * (C_2.T @ delta_A_1)
    kernels_2 -= alpha * k_update_2
    kernels_1 -= alpha * k_update_1
    print(str(cost)[:8])
The problematic line is the one where I compute k_update_1: C_0.reshape(kernel_rows*kernel_cols*num_colors,-1) has shape (48,784000), while delta_C_1.reshape(-1,num_kernels_1) has shape (9216000,15), and I'm trying to use their product to update my kernels_1 matrix, which has shape (48,15). The dimensions obviously don't add up.
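As background on what the kernel update has to look like dimensionally, here is a small self-contained sketch with made-up sizes (not the ones above): in the im2col layout both flattenings must agree on the leading dimension, batch times number of patch positions, before the update matching the kernel matrix's shape can be formed.

```python
import numpy as np

batch, rows, cols = 2, 8, 8
k = 3                                    # k x k kernel, single channel for simplicity
num_patches = (rows - k) * (cols - k)    # 25 patch positions per image
num_kernels = 4

patches = np.random.randn(batch, num_patches, k * k)      # im2col output
delta = np.random.randn(batch, num_patches, num_kernels)  # gradient w.r.t. conv output

# Both reshapes collapse to the same leading dim (batch * num_patches = 50),
# so the product has the kernel matrix's shape (k*k, num_kernels).
k_update = patches.reshape(-1, k * k).T @ delta.reshape(-1, num_kernels)
print(k_update.shape)  # (9, 4)
```

The shape mismatch in the question arises because delta_C_1 is laid out over the second convolution's patch positions, not the first's, so its flattened row count no longer matches C_0's.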
The helper functions are:
def convolution(data, input_rows, input_cols, kernel_rows, kernel_cols):
    sects = []
    for row_start in range(input_rows - kernel_rows):
        for col_start in range(input_cols - kernel_cols):
            section = get_image_section(data, row_start, row_start + kernel_rows,
                                        col_start, col_start + kernel_cols)
            sects.append(section)
    expanded_input = np.concatenate(sects, axis=1)
    es = expanded_input.shape
    return expanded_input.reshape(es[0], es[1], -1)
and:
def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, :, row_from:row_to, col_from:col_to]
    return np.expand_dims(section, axis=1)
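To see what these helpers produce, here is a self-contained run on a made-up batch of 5 random 32x32 RGB images. Note that range(input_rows - kernel_rows) gives 28 positions per axis, hence 28*28 = 784 patches, and the final flatten gives 48 = 4*4*3 features per patch, matching kernels_1's row count:

```python
import numpy as np

def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, :, row_from:row_to, col_from:col_to]
    return np.expand_dims(section, axis=1)

images = np.random.randn(5, 3, 32, 32)  # (batch, channels, rows, cols)
input_rows, input_cols, kernel_rows, kernel_cols = 32, 32, 4, 4

sects = []
for row_start in range(input_rows - kernel_rows):
    for col_start in range(input_cols - kernel_cols):
        sects.append(get_image_section(images, row_start, row_start + kernel_rows,
                                       col_start, col_start + kernel_cols))
expanded = np.concatenate(sects, axis=1)
print(expanded.shape)   # (5, 784, 3, 4, 4)
flattened = expanded.reshape(expanded.shape[0], expanded.shape[1], -1)
print(flattened.shape)  # (5, 784, 48)
```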
Solution
I found a fix for my problem: before I can multiply the two matrices, I have to undo the convolution, which I did with the following function:
def reverse_convolution(data, sample_size, num_kernels, kernel_rows, kernel_cols, output_shape, stride):
    data_expanded_dims = data.reshape(sample_size, -1, kernel_cols)
    output = np.zeros(output_shape)
    data_row = 0
    for row_start in range(0, output_shape[2] - kernel_rows, stride):
        for col_start in range(0, output_shape[3] - kernel_cols, stride):
            output[:, :, row_start:row_start + kernel_rows, col_start:col_start + kernel_cols] = data_expanded_dims[:, data_row, :]
            data_row += 1
    return output.reshape(sample_size, -1)
I call the function right after computing delta_C_1, like this:
delta_C_1_reverse_conv = reverse_convolution(delta_C_1, sample_size, num_kernels_1, kernel_rows, kernel_cols, output_shape=C_1_shape, stride=stride)
where C_1_shape is the shape C_1 had before the convolution.
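For comparison, the textbook way to express this "undo the convolution" step is a col2im-style scatter, where each patch gradient is added back into the image-layout tensor and overlapping positions accumulate (the function above assigns instead of accumulating, which overwrites overlaps but restores the shape either way). A minimal sketch with made-up sizes and stride 1:

```python
import numpy as np

batch, channels, rows, cols, k = 2, 3, 6, 6, 2
num_patches = (rows - k) * (cols - k)  # 16 patch positions, matching the loops above
patch_grads = np.random.randn(batch, num_patches, channels, k, k)

# Scatter patch-layout gradients back into image layout, accumulating overlaps.
output = np.zeros((batch, channels, rows, cols))
idx = 0
for row_start in range(rows - k):
    for col_start in range(cols - k):
        output[:, :, row_start:row_start + k, col_start:col_start + k] += patch_grads[:, idx]
        idx += 1
print(output.shape)  # (2, 3, 6, 6)
```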