Why does using uint32 in TensorFlow automatically trigger XLA compilation?

Problem description

I'm running into a problem with TF 2.3, specifically with tensors of dtype tf.uint32. Several operations apparently do not support tf.uint32; for example, with tf.foldl, if the accumulator has dtype tf.uint32 the op is compiled with XLA, which brings a whole new set of complications (e.g. "Support for TensorList crossing the XLA/TF boundary is not implemented"). Why not simply raise an error and let the user decide whether to compile the function?

My question is: am I right in thinking that the op is compiled because the tensor has dtype uint32, or is something else going on?

Here is an example:

In [1]: import tensorflow as tf
2020-12-22 20:16:40.400712: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1

In [2]: tf.foldl(lambda a,x: tf.bitwise.bitwise_or(a,x),tf.constant([1,2,3],tf.uint32),tf.constant(0,tf.uint32))
2020-12-22 20:16:43.607662: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-22 20:16:43.632255: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] Failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2020-12-22 20:16:43.632280: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: earth
2020-12-22 20:16:43.632287: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: earth
2020-12-22 20:16:43.632362: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 455.45.1
2020-12-22 20:16:43.632381: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 455.45.1
2020-12-22 20:16:43.632388: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 455.45.1
2020-12-22 20:16:43.632703: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-22 20:16:43.639576: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299965000 Hz
2020-12-22 20:16:43.640755: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55de211c04b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 20:16:43.640815: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-22 20:16:43.644069: I tensorflow/compiler/jit/xla_device.cc:398] XLA_GPU and XLA_CPU devices are deprecated and will be removed in subsequent releases. Instead, use either @tf.function(experimental_compile=True) for must-compile semantics, or run with TF_XLA_FLAGS=--tf_xla_auto_jit=2 for auto-clustering best-effort compilation.
2020-12-22 20:16:43.653183: I tensorflow/compiler/jit/xla_compilation_cache.cc:314] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Out[2]: <tf.Tensor: shape=(), dtype=uint32, numpy=3>

In [3]: 

If I use tf.int32 instead, the compilation is avoided entirely:

In [1]: import tensorflow as tf
2020-12-22 20:17:38.788992: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
In [2]: tf.foldl(lambda a,x: tf.bitwise.bitwise_or(a,x),tf.cast(tf.constant([1,2,3],tf.uint32),tf.int32),tf.cast(tf.constant(0,tf.uint32),tf.int32))
2020-12-22 20:17:58.931867: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-22 20:17:58.953301: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] Failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2020-12-22 20:17:58.953368: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: earth
2020-12-22 20:17:58.953377: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: earth
2020-12-22 20:17:58.953573: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 455.45.1
2020-12-22 20:17:58.953634: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 455.45.1
2020-12-22 20:17:58.953661: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 455.45.1
2020-12-22 20:17:58.954212: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-22 20:17:58.965071: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299965000 Hz
2020-12-22 20:17:58.967271: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf83532db0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 20:17:58.967367: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Out[2]: <tf.Tensor: shape=(), dtype=int32, numpy=3>

Workaround

No definitive workaround for this problem has been confirmed yet.
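That said, the question's own second example suggests a possible approach: run the fold with an int32 accumulator and cast the result back to uint32 afterwards. Below is a minimal sketch of that idea; the helper name foldl_bitwise_or_uint32 is made up for illustration, and it assumes that tf.cast between uint32 and int32 preserves the bit pattern on the backend in use (typical two's-complement wraparound, but not documented as a guarantee).

import tensorflow as tf

# Sketch: do the reduction with an int32 accumulator so tf.foldl does not
# take the uint32 path that gets routed through XLA, then cast back.
# Assumption: tf.cast uint32<->int32 preserves the bit pattern here.
def foldl_bitwise_or_uint32(elems_u32, init_u32):
    elems_i32 = tf.cast(elems_u32, tf.int32)
    init_i32 = tf.cast(init_u32, tf.int32)
    result_i32 = tf.foldl(lambda a, x: tf.bitwise.bitwise_or(a, x),
                          elems_i32, initializer=init_i32)
    return tf.cast(result_i32, tf.uint32)

print(foldl_bitwise_or_uint32(tf.constant([1, 2, 3], tf.uint32),
                              tf.constant(0, tf.uint32)))
# Expected: <tf.Tensor: shape=(), dtype=uint32, numpy=3>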

