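Problem description
I have a train/test dataset containing 120000 inputs x, y and a label l such that
if(x>100 && y>100) l = 1 else l = 0.
I generate this train/test data in Python and store it in numpy arrays. Some information about the train/test data:
X_train.shape = (120000, 2); y_train.shape = (120000, 1); X_test.shape = (20000, 2); y_test.shape = (20000, 1)
I use h5py to convert these arrays to train.h5 and test.h5, and I train the following network on this data.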
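For readers who want to reproduce the setup, the data generation described above might look roughly like the sketch below. The post does not say how x and y are distributed, so the uniform range [0, 200) is an assumption, and `make_dataset` and `write_h5` are hypothetical helper names. The dataset names "data" and "label" are the ones the conversion script later reads.

```python
import numpy as np


def make_dataset(n, low=0.0, high=200.0, seed=0):
    """Return (X, y): X has shape (n, 2), y has shape (n, 1)."""
    rng = np.random.default_rng(seed)
    # Assumption: inputs are uniform in [low, high); the post does not say.
    X = rng.uniform(low, high, size=(n, 2)).astype(np.float32)
    # Label rule from the post: l = 1 if x > 100 and y > 100, else 0.
    y = ((X[:, 0] > 100) & (X[:, 1] > 100)).astype(np.int64).reshape(n, 1)
    return X, y


def write_h5(path, X, y, data_name="data", label_name="label"):
    """Write X and y to an HDF5 file under the given dataset names."""
    import h5py  # imported here so make_dataset works without h5py

    with h5py.File(path, "w") as f:
        f.create_dataset(data_name, data=X)
        f.create_dataset(label_name, data=y)


X_train, y_train = make_dataset(120000, seed=0)
X_test, y_test = make_dataset(20000, seed=1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Calling `write_h5("train.h5", X_train, y_train)` and `write_h5("test.h5", X_test, y_test)` would then produce the two files mentioned in the post.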
name: "LogisticRegressionNet"
layer {
  name: "data"
  type: "HDF5Data"
  #type: "Data"
  #top: "data"
  top: "image"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "F:/swap/caffe/caffe/data/scai/data/train.txt"
    batch_size: 10
  }
  #data_param {
  #  source: "F:/swap/caffe/caffe/data/scai/data/train.lmdb"
  #  batch_size: 10
  #  backend: LMDB
  #}
}
layer {
  name: "data"
  type: "HDF5Data"
  #type: "Data"
  #top: "data"
  top: "image"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "F:/swap/caffe/caffe/data/scai/data/test.txt"
    batch_size: 10
  }
  #data_param {
  #  source: "F:/swap/caffe/caffe/data/scai/data/test.lmdb"
  #  batch_size: 10
  #  backend: LMDB
  #}
}
layer {
  name: "fc1"
  type: "InnerProduct"
  #bottom: "data"
  bottom: "image"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc1"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc1"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
I get 90% accuracy on this data. However, I cannot use Caffe's HDF5 layer, so I convert the h5 files to lmdb with the following script.
from numpy import *
import lmdb
import h5py
import sys
sys.path.append (r'F:\swap\caffe\caffe\python')
import caffe
tgt_db = "test.lmdb"
src_db = "test.h5"
[...]
env = lmdb.open(tgt_db,map_size=1000000)
with h5py.File(src_db,'r') as f:
    # extract data from hdf file
    ar_data = array(f['data'],dtype=float32)
    ar_label = array(f['label'],dtype=int)
    n,c,w,h = 120000,1,2,1
    ar_label = ar_label.flatten()
    assert len(ar_label) == n # number of labels has to match the number of input images!
    # write data to lmdb
    for i in range(n):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = c
        datum.height = h
        datum.width = w
        datum.data = ar_data[i,:].tobytes()
        datum.label = ar_label[i]
        #print(datum)
        str_id = '{:08}'.format(i) # create an 8-digit string id based on the index
        with env.begin(write=True) as txn:
            txn.put(str_id.encode('ascii'),datum.SerializeToString())
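I run this lmdb through the same network, but I only get 60% accuracy.
Can anyone point out where I am going wrong?
The original train/test data has shape (120000, 2), but lmdb needs 4D input. Am I messing up n, c, w, h in the conversion script?
Please help. Thanks.
PS: A way to convert the np arrays directly to lmdb would also work for me.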
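On the n, c, w, h question: the values 1, 1, 2 are at least consistent with two features per sample, so a quick look at the byte count may be more telling. The check below is a minimal sketch that assumes Caffe reads `datum.data` as one unsigned byte per element, with channels * height * width elements in total. That assumption is not confirmed anywhere in this post.

```python
import numpy as np

# One sample with two float32 features, as in the conversion script.
sample = np.array([150.0, 50.0], dtype=np.float32)
c, h, w = 1, 1, 2

raw = sample.tobytes()      # what the script stores in datum.data
n_declared = c * h * w      # elements the Datum shape announces
n_bytes = len(raw)          # bytes actually stored

print(n_declared, n_bytes)  # 2 declared elements, 8 stored bytes

# Reading the first n_declared bytes as uint8 (the assumed Caffe behaviour)
# does not recover the original feature values.
as_uint8 = np.frombuffer(raw, dtype=np.uint8)[:n_declared]
print(as_uint8)
```

If the assumption holds, the network would be trained on the first two raw bytes of each sample rather than on x and y, which could explain an accuracy well below the HDF5 run.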
Solution
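What follows is a sketch of one possible fix, not a confirmed one. It assumes that Caffe's Data layer treats `datum.data` as uint8 bytes and falls back to `datum.float_data` when `data` is empty, so the float features are stored in `float_data` instead. It also writes the numpy arrays to LMDB directly, as the PS asks, skipping the h5 step. The names `make_key`, `estimate_map_size`, and `arrays_to_lmdb`, and the map-size heuristic, are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_key(i):
    """Zero-padded 8-digit key, as in the original script."""
    return "{:08}".format(i).encode("ascii")


def estimate_map_size(n_items, values_per_item, bytes_per_value=8, factor=10):
    """Rough LMDB map_size guess; floor and safety factor are assumptions."""
    return max(1 << 26, n_items * values_per_item * bytes_per_value * factor)


def arrays_to_lmdb(X, y, db_path):
    """Write X of shape (N, D) and labels y to an LMDB of Caffe Datums."""
    # Imported lazily so the helpers above can be used without Caffe or lmdb.
    import lmdb
    from caffe.proto import caffe_pb2

    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y).astype(int).flatten()
    assert len(X) == len(y), "need one label per sample"
    n, d = X.shape

    env = lmdb.open(db_path, map_size=estimate_map_size(n, d))
    with env.begin(write=True) as txn:
        for i in range(n):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = 1, 1, d
            # Store floats in float_data rather than raw bytes in data.
            datum.float_data.extend(X[i].tolist())
            datum.label = int(y[i])
            txn.put(make_key(i), datum.SerializeToString())
    env.close()
```

A call such as `arrays_to_lmdb(X_train, y_train, "train.lmdb")` would then replace the h5-based script. The 1 MB `map_size` in the original script may also be too small for 120000 records, which is why the sketch computes a larger value. pycaffe's `caffe.io.array_to_datum` appears to make the same `float_data` choice for non-uint8 arrays and may be a simpler route if it is available.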