Why does using uint32 in TensorFlow automatically trigger XLA compilation?

Problem description

I'm running into a problem with TF 2.3, specifically with tensors of dtype tf.uint32. Several operations apparently do not support tf.uint32; for example, with tf.foldl, if the accumulator has dtype tf.uint32 the op is compiled with XLA, which brings a whole new set of complications (e.g. "Support for TensorList crossing the XLA/TF boundary is not implemented"). Why not simply raise an error and let the user decide whether to compile the function?

My question is: am I right in thinking that the op is compiled because the tensor has dtype uint32, or is something else going on?

Here is an example:

In [1]: import tensorflow as tf
2020-12-22 20:16:40.400712: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1

In [2]: tf.foldl(lambda a,x: tf.bitwise.bitwise_or(a,x),tf.constant([1,2,3],tf.uint32),tf.constant(0,tf.uint32))
2020-12-22 20:16:43.607662: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-22 20:16:43.632255: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] Failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2020-12-22 20:16:43.632280: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: earth
2020-12-22 20:16:43.632287: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: earth
2020-12-22 20:16:43.632362: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 455.45.1
2020-12-22 20:16:43.632381: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 455.45.1
2020-12-22 20:16:43.632388: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 455.45.1
2020-12-22 20:16:43.632703: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-22 20:16:43.639576: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299965000 Hz
2020-12-22 20:16:43.640755: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55de211c04b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 20:16:43.640815: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-22 20:16:43.644069: I tensorflow/compiler/jit/xla_device.cc:398] XLA_GPU and XLA_CPU devices are deprecated and will be removed in subsequent releases. Instead, use either @tf.function(experimental_compile=True) for must-compile semantics, or run with TF_XLA_FLAGS=--tf_xla_auto_jit=2 for auto-clustering best-effort compilation.
2020-12-22 20:16:43.653183: I tensorflow/compiler/jit/xla_compilation_cache.cc:314] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Out[2]: <tf.Tensor: shape=(), dtype=uint32, numpy=3>

In [3]: 

If I use tf.int32 instead, the compilation is avoided entirely:

In [1]: import tensorflow as tf
2020-12-22 20:17:38.788992: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
In [2]: tf.foldl(lambda a,x: tf.bitwise.bitwise_or(a,x),tf.cast(tf.constant([1,2,3],tf.uint32),tf.int32),tf.cast(tf.constant(0,tf.uint32),tf.int32))
2020-12-22 20:17:58.931867: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-22 20:17:58.953301: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] Failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2020-12-22 20:17:58.953368: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: earth
2020-12-22 20:17:58.953377: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: earth
2020-12-22 20:17:58.953573: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 455.45.1
2020-12-22 20:17:58.953634: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 455.45.1
2020-12-22 20:17:58.953661: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 455.45.1
2020-12-22 20:17:58.954212: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-22 20:17:58.965071: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299965000 Hz
2020-12-22 20:17:58.967271: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf83532db0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 20:17:58.967367: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Out[2]: <tf.Tensor: shape=(), dtype=int32, numpy=3>

Workaround

No definitive workaround for this problem has been confirmed yet.
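That said, the question's own second example suggests a possible approach: run the fold with an int32 accumulator and cast the result back to uint32 afterwards. Below is a minimal sketch of that idea; the helper name foldl_bitwise_or_uint32 is made up for illustration, and it assumes that tf.cast between uint32 and int32 preserves the bit pattern on the backend in use (typical two's-complement wraparound, but not documented as a guarantee).

import tensorflow as tf

# Sketch: do the reduction with an int32 accumulator so tf.foldl does not
# take the uint32 path that gets routed through XLA, then cast back.
# Assumption: tf.cast uint32<->int32 preserves the bit pattern here.
def foldl_bitwise_or_uint32(elems_u32, init_u32):
    elems_i32 = tf.cast(elems_u32, tf.int32)
    init_i32 = tf.cast(init_u32, tf.int32)
    result_i32 = tf.foldl(lambda a, x: tf.bitwise.bitwise_or(a, x),
                          elems_i32, initializer=init_i32)
    return tf.cast(result_i32, tf.uint32)

print(foldl_bitwise_or_uint32(tf.constant([1, 2, 3], tf.uint32),
                              tf.constant(0, tf.uint32)))
# Expected: <tf.Tensor: shape=(), dtype=uint32, numpy=3>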

