Sporadic failures in GLX

Problem description

I am running OpenGL in headless mode against an X server and calling this API many times: https://github.com/RobotLocomotion/drake/blob/74292cacd1c42d6b3e682dc836254cdb834ea2e6/geometry/render/render_engine_vtk.cc#L311

The failure is sporadic, but on almost every run I eventually get:

X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  61
  Current serial number in output stream:  62

glxinfo:

glxinfo
name of display: :0
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 50 requests (50 known processed) with 0 events remaining.

The last few lines of /var/log/Xorg.0.log:

[ 47757.261] (EE) Backtrace:
[ 47757.261] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4d) [0x557e48dd2acd]
[ 47757.261] (EE) 1: /usr/lib/xorg/Xorg (0x557e48c1a000+0x1bc869) [0x557e48dd6869]
[ 47757.261] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f4cbddc7000+0x128a0) [0x7f4cbddd98a0]
[ 47757.261] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7f4cba768000+0x479100) [0x7f4cbabe1100] 
[ 47757.261] (EE) 
[ 47757.262] (EE) Segmentation fault at address 0x8
[ 47757.262] (EE) 
Fatal server error:
[ 47757.262] (EE) Caught signal 11 (Segmentation fault). Server aborting

Machine: 18.04.2-Ubuntu

NVIDIA-SMI 440.100   Driver Version: 440.100   CUDA Version: 10.2

Can someone tell me what to debug next here?

Answer

I am seeing this in my own CI as well:

[ 18228.470] (EE) Backtrace:
[ 18228.470] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4d) [0x55e0ca9fcacd]
[ 18228.470] (EE) 1: /usr/lib/xorg/Xorg (0x55e0ca844000+0x1bc869) [0x55e0caa00869]
[ 18228.470] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fce3e7d6000+0x128a0) [0x7fce3e7e88a0]
[ 18228.470] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7fce3b177000+0x479100) [0x7fce3b5f0100]
[ 18228.470] (EE) 
[ 18228.470] (EE) Segmentation fault at address 0x8

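ASLR gives different base addresses, but the low-order bytes of every frame in the trace are the same: both crashes are at offset 0x479100 in nvidia_drv.so.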
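To make that comparison less error-prone than reading hex by eye, here is a minimal Python sketch that pulls the module-relative offsets out of the backtrace lines. The names `module_offsets`, `QUESTION_LOG` and `ANSWER_LOG` are made up for illustration, and the regular expression only assumes the frame layout visible in the two logs above; other Xorg versions may print frames differently.

```python
import re

# An Xorg backtrace frame written as "N: /path/module (0xBASE+0xOFFSET) [0xADDR]".
# Frames written as (symbol+0xOFFSET), such as frame 0, are deliberately skipped.
FRAME = re.compile(
    r"\d+: (\S+) \(0x([0-9a-f]+)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]"
)


def module_offsets(log_text):
    """Return a list of (module path, offset within module), one per frame."""
    frames = []
    for match in FRAME.finditer(log_text):
        module = match.group(1)
        base = int(match.group(2), 16)
        offset = int(match.group(3), 16)
        address = int(match.group(4), 16)
        # Sanity check: the absolute address is the load base plus the offset.
        if base + offset != address:
            raise ValueError(f"inconsistent frame: {match.group(0)}")
        frames.append((module, offset))
    return frames


# Frames 1-3 copied from the log in the question.
QUESTION_LOG = """\
[ 47757.261] (EE) 1: /usr/lib/xorg/Xorg (0x557e48c1a000+0x1bc869) [0x557e48dd6869]
[ 47757.261] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f4cbddc7000+0x128a0) [0x7f4cbddd98a0]
[ 47757.261] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7f4cba768000+0x479100) [0x7f4cbabe1100]
"""

# Frames 1-3 copied from the log in this answer.
ANSWER_LOG = """\
[ 18228.470] (EE) 1: /usr/lib/xorg/Xorg (0x55e0ca844000+0x1bc869) [0x55e0caa00869]
[ 18228.470] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fce3e7d6000+0x128a0) [0x7fce3e7e88a0]
[ 18228.470] (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (0x7fce3b177000+0x479100) [0x7fce3b5f0100]
"""

# Print each frame's module and offset, then whether the two traces agree.
for module, offset in module_offsets(QUESTION_LOG):
    print(f"{module} +{offset:#x}")
print("same crash site:", module_offsets(QUESTION_LOG) == module_offsets(ANSWER_LOG))
```

Matching offsets only show that both servers died at the same instruction in the same driver build; they do not say why.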

I am using xorg-server 2:1.19.6-1ubuntu4.4.

[ 17925.887] (II) Module nvidia: vendor="NVIDIA Corporation"
	compiled for 1.6.99.901, module version = 1.0.0
[ 17925.887]    Module class: X.Org Video Driver
[ 17925.887] (II) NVIDIA dlloader X Driver  440.100  Fri May 29 08:21:27 UTC 2020

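Unfortunately, I have not been able to debug it yet.

Anecdotally (I have no data to confirm this), the crash became more frequent a few months ago, when Ubuntu upgraded everyone from nvidia 430 to nvidia 440.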
