Why is my accuracy getting lower and my loss getting larger?

Problem description

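I chose TensorFlow Estimators for the implementation because they support the distributed training API. To be honest, I found some code that is easy to understand, so I picked it to implement sensor-based signal recognition on multiple GPUs.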

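The code was written mainly for training on the MNIST dataset across multiple GPUs. Running it produced an error, caused by the MNIST dataset download API failing. Here is the link to the code: https://github.com/shu-yusa/tensorflow-mirrored-strategy-sample/blob/master/cnn_mnist.py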

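I could not find any solution on Google. There may be several causes behind this; the TensorFlow 1 implementation could be one of them. I tried to convert the code to TensorFlow 2. Most of it converted, but the tf.contrib-related parts did not. So I decided to edit the code to work on sensor-based signals (time series). But when I run the code, the accuracy is about 30% and the loss is larger. By contrast, when I implemented a CNN on the same dataset with the low-level tensor API, I got 95% accuracy. I don't know why the tf Estimator gives such low accuracy. In my view, one possible reason is a wrong input to the CNN. The code is as follows: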

import tensorflow as tf

# segment_size and num_input_channels are defined elsewhere in the script
# (their values are not shown in the post).


def cnn_model_fn(features, labels, mode):
    """Model function for CNN."""
    # Input Layer
    # Reshape X to a 4-D tensor: [batch_size, height, width, channels].
    # Each input is 1 x segment_size with three channels (accelerometer x, y, z).
    input_layer = tf.reshape(features["x"], [-1, 1, segment_size, num_input_channels])

    # Convolutional Layer #1
    # Computes 32 features using a 1x12 filter with ReLU activation.
    # Padding is added to preserve height and width.
    # Input Tensor Shape: [batch_size, 1, segment_size, num_input_channels]
    # Output Tensor Shape: [batch_size, 1, segment_size, 32]
    conv1 = tf.compat.v1.layers.conv2d(
        inputs=input_layer, filters=32, kernel_size=[1, 12], padding="same",
        activation=tf.nn.relu)

    # Pooling Layer #1
    # First max pooling layer with a 1x4 filter and stride of 2.
    # Input Tensor Shape: [batch_size, 1, segment_size, 32]
    # Output Tensor Shape: [batch_size, 1, ceil(segment_size / 2), 32]
    pool1 = tf.compat.v1.layers.max_pooling2d(
        inputs=conv1, pool_size=[1, 4], strides=2, padding="same")

    # Convolutional Layer #2
    # Computes 64 features with ReLU activation.
    # NOTE: kernel_size and padding are missing from the posted call, and
    # kernel_size is a required argument. The values below are an ASSUMPTION
    # that mirrors conv1.
    conv2 = tf.compat.v1.layers.conv2d(
        inputs=pool1, filters=64, kernel_size=[1, 12], padding="same",
        activation=tf.nn.relu)

    # Pooling Layer #2
    # Second max pooling layer.
    # NOTE: pool_size and strides are missing from the posted call and are
    # required arguments. The values below are an ASSUMPTION that mirrors pool1.
    pool2 = tf.compat.v1.layers.max_pooling2d(
        inputs=conv2, pool_size=[1, 4], strides=2, padding="same")

    # Flatten tensor into a batch of vectors.
    # The posted call was missing "[-1," before the flattened size.
    # 1 * 50 * 64 is correct only if pool2 has width 50.
    # Output Tensor Shape: [batch_size, 1 * 50 * 64]
    pool2_flat = tf.reshape(pool2, [-1, 1 * 50 * 64])

    # Dense Layer
    # Densely connected layer with 1024 neurons.
    # Input Tensor Shape: [batch_size, 1 * 50 * 64]
    # Output Tensor Shape: [batch_size, 1024]
    dense = tf.compat.v1.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

    # Add dropout operation; 0.6 probability that an element will be kept.
    dropout = tf.compat.v1.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Logits layer
    # Input Tensor Shape: [batch_size, 1024]
    # Output Tensor Shape: [batch_size, 6] (6 classes instead of MNIST's 10)
    logits = tf.compat.v1.layers.dense(inputs=dropout, units=6)

    predictions = {
        # Generate predictions (for PREDICT and EVAL mode).
        "classes": tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
        # `logging_hook`.
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor"),
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # labels = tf.argmax(tf.cast(labels, dtype=tf.int32), 1)
    # Calculate loss (for both TRAIN and EVAL modes).
    loss = tf.compat.v1.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # To monitor training accuracy, uncomment these two lines:
    # accuracy = tf.compat.v1.metrics.accuracy(labels=labels, predictions=predictions['classes'], name='acc_op')
    # tf.summary.scalar('accuracy', accuracy[1])

    # Configure the training op (for TRAIN mode).
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.compat.v1.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode). EstimatorSpec requires a loss in
    # EVAL mode, so it is passed here as well.
    eval_metric_ops = {
        "accuracy": tf.compat.v1.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
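
One way to test the suspicion that the CNN receives the wrong input is to trace the width axis through the network by hand. The sketch below is plain Python and is not part of the posted code. The helper names (same_pad_out, trace_width, flat_size) are hypothetical, and it assumes that both convolutions use padding='same' with stride 1 and that both pooling layers use stride 2 (the second pooling layer's arguments are not shown in the post).

```python
import math


def same_pad_out(size, stride):
    """Output length along one axis for a layer with padding='same'."""
    return math.ceil(size / stride)


def trace_width(segment_size, pool_stride=2):
    """Follow the width axis through conv1 -> pool1 -> conv2 -> pool2."""
    w = same_pad_out(segment_size, 1)  # conv1, stride 1 keeps the width
    w = same_pad_out(w, pool_stride)   # pool1 (assumed stride 2)
    w = same_pad_out(w, 1)             # conv2, stride 1 keeps the width
    w = same_pad_out(w, pool_stride)   # pool2 (assumed stride 2)
    return w


def flat_size(segment_size, filters=64):
    """Number of features per example after pool2 is flattened."""
    # The height axis stays 1 because ceil(1 / 2) == 1 under 'same' padding.
    return 1 * trace_width(segment_size) * filters


if __name__ == "__main__":
    for s in (90, 128, 200):
        print(s, trace_width(s), flat_size(s))
```

Under these assumptions, the hard-coded 1 * 50 * 64 in the flatten step matches only when segment_size is between 197 and 200. For any other segment_size, a reshape to [-1, 3200] either fails or, if the total element count happens to divide evenly, silently mixes values from different examples in the batch, which might explain near-chance accuracy.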

After debugging: inside def cnn_model_fn(features, labels, mode), features is {'x': }, labels is Tensor("IteratorGetNext:1", shape=(?,), dtype=int64), and mode is {str} train.
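
The label tensor above has shape (?,) and dtype int64, which is the form sparse_softmax_cross_entropy expects: one integer class index per example, in the range 0 to num_classes - 1. Whether the values actually fall in that range cannot be seen from the debugger output. The sketch below is a hypothetical plain-Python check (check_sparse_labels is not a TensorFlow function) that could be run on the raw label array before it is handed to the input function.

```python
def check_sparse_labels(labels, num_classes):
    """Return a list of problems found; an empty list means the labels look fine."""
    labels = list(labels)
    if not labels:
        return ["no labels given"]

    problems = []
    # Sparse cross-entropy needs class indices, not floats or one-hot rows.
    if any(int(y) != y for y in labels):
        problems.append("labels must be integer class indices")
        return problems

    lo, hi = min(labels), max(labels)
    if lo < 0 or hi >= num_classes:
        problems.append(
            f"labels span [{lo}, {hi}] but must lie in [0, {num_classes - 1}]")

    # A class that never appears often points to an off-by-one encoding.
    missing = sorted(set(range(num_classes)) - set(labels))
    if missing:
        problems.append(f"classes never seen: {missing}")
    return problems


if __name__ == "__main__":
    print(check_sparse_labels([0, 1, 2, 3, 4, 5], 6))  # zero-based labels
    print(check_sparse_labels([1, 2, 3, 4, 5, 6], 6))  # one-based labels
```

Labels encoded as 1 to 6 rather than 0 to 5 are a common cause of trouble with a 6-unit logits layer, so this is one of the cheaper things to rule out.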

Here is the result on the test data: Saving 'checkpoint_path' summary for global step 1000: /tmp/tmp77ffy2i9/model.ckpt-1000 {'accuracy': 0.3959022, 'loss': 1.698279, 'global_step': 1000}
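
To put these numbers in context, it helps to compare them with what a classifier that has learned nothing would score. The sketch below is an illustration rather than part of the posted code. It assumes 6 classes (the size of the logits layer) and, for the accuracy baseline, roughly balanced classes, which the post does not confirm.

```python
import math

num_classes = 6  # the logits layer in the post has 6 units

# Cross-entropy of a uniform prediction over num_classes classes: ln(6).
chance_loss = math.log(num_classes)
# Accuracy of uniform guessing, assuming balanced classes (an assumption).
chance_accuracy = 1.0 / num_classes

reported_loss = 1.698279       # from the evaluation output in the post
reported_accuracy = 0.3959022  # from the evaluation output in the post

print(f"chance loss     = {chance_loss:.4f}, reported = {reported_loss:.4f}")
print(f"chance accuracy = {chance_accuracy:.4f}, reported = {reported_accuracy:.4f}")
```

The reported loss is only slightly below ln(6), so after 1000 steps the model has moved very little from an uninformed starting point. One hypothesis worth testing is that plain gradient descent with learning_rate=0.001 is too slow for this network. If the classes are imbalanced, an accuracy near 40% could also simply reflect the share of the most frequent class.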

Can anyone help me understand why my model gives such low accuracy and such a large loss value?

Solution

No working solution to this problem has been found yet.
