Why is the Google Colab TPU slower than the CPU or GPU?

Problem description

from keras.datasets import mnist
from keras import models,layers
from keras.utils import to_categorical

import tensorflow as tf



tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
print('Running on TPU ',tpu.cluster_spec().as_dict()['worker'])

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

strategy = tf.distribute.experimental.TPUStrategy(tpu)
print("REPLICAS: ",strategy.num_replicas_in_sync)




with strategy.scope():
  network = models.Sequential()
  network.add(layers.Dense(512,activation='relu',input_shape=(28 * 28,)))
  network.add(layers.Dense(10,activation='softmax'))

  network.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])

 
  
(train_images,train_labels),(test_images,test_labels) = mnist.load_data()

train_images = train_images.reshape((60000,28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000,28 * 28))
test_images = test_images.astype('float32') / 255        

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

network.fit(train_images,train_labels,epochs=5,batch_size=128)

test_loss,test_acc = network.evaluate(test_images,test_labels)

print("test_acc: ",test_acc)


Output:
Running on TPU  ['10.123.181.34:8470']
WARNING:tensorflow:TPU system grpc://10.123.181.34:8470 has already been initialized. Reinitializing the TPU can cause previously created variables on TPU to be lost.
WARNING:tensorflow:TPU system grpc://10.123.181.34:8470 has already been initialized. Reinitializing the TPU can cause previously created variables on TPU to be lost.
INFO:tensorflow:Initializing the TPU system: grpc://10.123.181.34:8470
INFO:tensorflow:Initializing the TPU system: grpc://10.123.181.34:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Finished initializing TPU system.
WARNING:absl:`tf.distribute.experimental.TPUStrategy` is deprecated, please use the non experimental symbol `tf.distribute.TPUStrategy` instead.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0,CPU,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0,TPU,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0,TPU_SYSTEM,0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0,XLA_CPU,0)
REPLICAS:  8
Epoch 1/5
469/469 [==============================] - 11s 18ms/step - loss: 0.4224 - accuracy: 0.8787
Epoch 2/5
469/469 [==============================] - 8s 18ms/step - loss: 0.1089 - accuracy: 0.9675
Epoch 3/5
469/469 [==============================] - 9s 18ms/step - loss: 0.0695 - accuracy: 0.9789
Epoch 4/5
469/469 [==============================] - 8s 17ms/step - loss: 0.0485 - accuracy: 0.9856
Epoch 5/5
469/469 [==============================] - 8s 18ms/step - loss: 0.0345 - accuracy: 0.9898
313/313 [==============================] - 7s 18ms/step - loss: 0.0684 - accuracy: 0.9791
test_acc:  0.9790999889373779
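
The step counts in the log above can be reproduced with a little arithmetic, which also shows how little work each TPU core gets per step. This is only a sketch: it assumes that Keras treats batch_size=128 as the global batch and splits it evenly over the 8 replicas, and that evaluate() ran with its default batch size of 32. The helper names are made up for illustration.

```python
import math

TRAIN_EXAMPLES = 60000   # MNIST training set size used in the script
TEST_EXAMPLES = 10000    # MNIST test set size used in the script
GLOBAL_BATCH = 128       # batch_size passed to fit()
EVAL_BATCH = 32          # assumption: evaluate() default batch size
NUM_REPLICAS = 8         # "REPLICAS: 8" in the log
STEP_SECONDS = 0.018     # roughly 18 ms/step, read off the log


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to cover the data once (last one may be partial)."""
    return math.ceil(num_examples / batch_size)


def per_replica_batch(global_batch, num_replicas):
    """Examples each replica sees per step, assuming an even split."""
    return global_batch // num_replicas


train_steps = steps_per_epoch(TRAIN_EXAMPLES, GLOBAL_BATCH)   # 469
eval_steps = steps_per_epoch(TEST_EXAMPLES, EVAL_BATCH)       # 313
per_core = per_replica_batch(GLOBAL_BATCH, NUM_REPLICAS)      # 16
epoch_seconds = train_steps * STEP_SECONDS                    # about 8.4 s

print(train_steps, eval_steps, per_core, round(epoch_seconds, 1))
```

Under these assumptions each core processes only 16 examples per step, and the epoch time is essentially the number of steps multiplied by a fixed per-step cost of about 18 ms.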

This runs slower than on the CPU or GPU. Why is it slow, and how can I fix it?

Solution

No effective solution to this problem has been found yet.
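
One direction that may be worth trying, offered as an untested sketch rather than a confirmed fix: the model is tiny and each step is small, so per-step overhead could be dominating. The code below feeds the data through a tf.data pipeline with a larger global batch and sets steps_per_execution so that several steps run per call. The helper names make_dataset and build_model, the batch size 1024, and the value 50 are illustrative assumptions and do not come from the original post.

```python
import numpy as np
import tensorflow as tf


def make_dataset(images, labels, global_batch_size):
    """Build a shuffled, batched, prefetched pipeline from in-memory arrays."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(len(images))
    # drop_remainder=True keeps every batch the same static shape.
    ds = ds.batch(global_batch_size, drop_remainder=True)
    return ds.prefetch(tf.data.AUTOTUNE)


def build_model(steps_per_execution=1):
    """Same two-layer network as in the question, compiled the same way."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(28 * 28,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="rmsprop",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
        # Run several training steps per call instead of one.
        steps_per_execution=steps_per_execution,
    )
    return model


# Hypothetical usage on the TPU runtime, reusing `strategy` and the
# preprocessed arrays from the question (values are illustrative):
#
# with strategy.scope():
#     network = build_model(steps_per_execution=50)
# network.fit(make_dataset(train_images, train_labels, 1024), epochs=5)
```

Whether this actually beats the CPU or GPU for a model this small is not something the original post establishes, so timings should be compared on the same runtime before and after the change.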