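Problem description
I am trying to train a simple CNN in Caffe to classify time-series data that has been split into vectors of 500 elements. I am doing this in Google Colab, first running: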
!apt install caffe-cpu
followed by
import caffe
To do this, I saved the data to hdf5 files and defined three .prototxt files. The model used for training is defined in sg_train.prototxt:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/content/gdrive/MyDrive/data/train2_h5_list.txt"
    batch_size: 32
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/content/gdrive/MyDrive/data/test2_h5_list.txt"
    batch_size: 32
  }
  include {
    phase: TEST
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 4
    kernel_size: 10
    stride: 1
    pad: 1
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  convolution_param {
    num_output: 8
    kernel_size: 10
    stride: 1
    pad: 1
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "conv2"
  top: "fc1"
  inner_product_param {
    num_output: 1
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc1"
  bottom: "label"
}
I then run:
caffe.set_mode_cpu()
solver = caffe.get_solver("sg_caffe_solver.prototxt")
solver.solve()
where sg_caffe_solver.prototxt contains:
# The train/test net protocol buffer definition
net: "/content/gdrive/MyDrive/models/sg_train.prototxt"
# test_iter specifies how many forward passes the test should carry out.
test_iter: 1
# Carry out testing every test_interval training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.0001
momentum: 0.001
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# display every 1000 iterations
display: 1000
# The maximum number of iterations
max_iter: 5000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "sg_"
# solver mode: CPU or GPU
solver_mode: CPU # GPU
Importantly, if I use kernel_h and kernel_w in the definition of the conv layers in sg_train.prototxt, the session crashes and the session log contains the warning:
F0209 12:13:53.816366 2919 base_conv_layer.cpp:27] Check failed: num_spatial_axes_ == 2 (0 vs. 2) kernel_h & kernel_w can only be used for 2D convolution.
Otherwise, solver.solve() runs without obvious errors (although the kernel sometimes restarts) and creates the sg__iter_5000.caffemodel and sg__iter_5000.solverstate snapshots.
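A note on the num_spatial_axes_ check: as far as I understand it, the convolution layer treats every axis after the channel axis (axis 1 by default) as spatial. The sketch below is a simplified model of that counting, not Caffe's actual code. The (32, 500) shape is an assumption about how the HDF5 data might have been stored, since the exact dataset shape is not given here.

```python
def num_spatial_axes(blob_shape, channel_axis=1):
    """Count the axes that follow the channel axis (simplified model)."""
    return len(blob_shape) - (channel_axis + 1)

# Assumed shapes, for illustration only.
flat_batch = (32, 500)        # batch x samples: nothing left after the channel axis
image_like = (32, 1, 1, 500)  # batch x channels x height x width

print(num_spatial_axes(flat_batch))   # 0
print(num_spatial_axes(image_like))   # 2
```

If the stored data really is two-dimensional, this would be consistent with the "0 vs. 2" in the message.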
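When I later try to test the trained model, starting by instantiating the CNN with net = caffe.Net("sg_deploy.prototxt","sg__iter_5000.caffemodel",caffe.TEST), I get various errors and kernel crashes that I do not understand. sg_deploy.prototxt differs from sg_train.prototxt in that the HDF5 layers are replaced with an Input layer: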
layer {
  type: "Input"
  name: "data"
  top: "data"
  input_param { shape: { dim: 1 dim: 1 dim: 1 dim: 500 } }
}
and the SoftmaxWithLoss layer is replaced with a Softmax layer:
layer {
  name: "prob1"
  type: "Softmax"
  bottom: "fc1"
  top: "prob1"
}
When I try to initialize net, the kernel crashes and I get the following warnings in the log:
F0209 12:40:08.371551 3281 blob.cpp:32] Check failed: shape[i] >= 0 (-6 vs. 0)
I0209 12:40:08.371512 3281 net.cpp:380] conv1 -> conv1
I0209 12:40:08.371505 3281 net.cpp:406] conv1 <- data
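The -6 in that message can be reproduced with the usual convolution output-size formula, (input + 2*pad - kernel) / stride + 1, applied to each spatial axis. This is a minimal sketch with no dilation; the function name conv_output_dim is hypothetical and the floor division matches the formula for the stride of 1 used here.

```python
def conv_output_dim(input_dim, kernel, pad=0, stride=1):
    """Output length along one axis: (input + 2*pad - kernel) // stride + 1."""
    return (input_dim + 2 * pad - kernel) // stride + 1

# Input blob from sg_deploy.prototxt: 1 x 1 x 1 x 500, i.e. height 1, width 500.
# kernel_size: 10 and pad: 1 apply to both spatial axes.
height = conv_output_dim(1, kernel=10, pad=1)
width = conv_output_dim(500, kernel=10, pad=1)
print(height, width)  # -6 493
```

A 10-element kernel does not fit into a height of 1 even with one row of padding on each side, so the computed height is negative.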
I tried replacing kernel_size = 10 with kernel_w = 10 and kernel_h = 1. Initializing net still crashes after that, but with different log warnings:
F0209 12:43:09.321560 3410 net.cpp:757] Cannot copy param 0 weights from layer 'conv1'; shape mismatch. Source param shape is 4 500 (2000); target param shape is 4 1 1 10 (40). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
I0209 12:43:09.310168 3410 net.cpp:255] Network initialization done.
I0209 12:43:09.310160 3410 net.cpp:242] This network produces output prob1
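The two shapes in the mismatch message can be reproduced with a simplified model of how convolution weights are laid out: (num_output, input channels, then one entry per kernel dimension). This is only a sketch for group = 1. The helper name conv_weight_shape is hypothetical, and the (32, 500) training blob is an assumption that would explain the saved "4 500" shape.

```python
from math import prod

def conv_weight_shape(blob_shape, num_output, kernel_shape):
    """Weight shape as (num_output, channels, *kernel_shape); simplified, group=1."""
    channels = blob_shape[1]
    return (num_output, channels, *kernel_shape)

# Assumed training blob (batch x 500): no spatial axes, so no kernel dimensions.
source = conv_weight_shape((32, 500), num_output=4, kernel_shape=())
# Deploy blob 1 x 1 x 1 x 500 with kernel_h = 1, kernel_w = 10.
target = conv_weight_shape((1, 1, 1, 500), num_output=4, kernel_shape=(1, 10))

print(source, prod(source))  # (4, 500) 2000
print(target, prod(target))  # (4, 1, 1, 10) 40
```

Under that assumption, the saved model treated the 500 samples as channels, while the deploy net treats them as width, so the two sets of conv1 weights cannot be matched.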
At this point I do not know how to proceed.