OpenACC managed-memory code works with debug symbols but crashes without them

Problem description

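I have a C++ program that uses the ROS libraries and OpenACC directives. I compile it with the PGI compiler (version 20.11) on Ubuntu 18.04. If I build the program with debug symbols and managed memory enabled (-g -ta=tesla:managed), it runs as expected (albeit slowly) and does not crash. If I build it with debug symbols disabled and managed memory enabled (-ta=tesla:managed), it crashes during startup with the following message: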

free(): invalid pointer

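In addition, if I run the same program with CPU multithreading (using -ta=multicore, or using OpenMP directives instead of OpenACC directives), it works as expected and does not crash. From these experiments, I suspect a conflict between the CUDA Unified Memory enabled by -ta=tesla:managed and the ROS messaging library, which passes pointers around. I am not 100% sure, though, because debugging the program has turned out to be very tricky.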

Here is the main function that starts our ROS node:

int main(int argc,char** argv) {
    ros::init(argc,argv,"MCL node");
    ros::NodeHandle node_handle;

    MonteCarloLocalizationNode mcl_node(node_handle);

    ros::spin();

    return 0;
}

And here are the first few lines of the constructor of our MonteCarloLocalizationNode class:

MonteCarloLocalizationNode::MonteCarloLocalizationNode(const ros::NodeHandle &node_handle) : node_handle_(node_handle) {

    ROS_INFO("Waiting to receive map...");
    occupancy_grid_map_ptr_ = ros::topic::waitForMessage<nav_msgs::OccupancyGrid>("/map",node_handle_);
    ROS_INFO("Map received.");
    ...
    ...
    ...
}

When I run the program above, it reaches this line in the constructor:

ROS_INFO("Waiting to receive map...");

but it never reaches this one:

ROS_INFO("Map received.");

So the problem seems to be with the pointer assigned in

occupancy_grid_map_ptr_ = ros::topic::waitForMessage<nav_msgs::OccupancyGrid>("/map",node_handle_);

This pointer has type nav_msgs::OccupancyGridConstPtr. On closer inspection, that type is a typedef for a Boost shared pointer, because the message header contains typedef boost::shared_ptr< ::nav_msgs::OccupancyGrid const> OccupancyGridConstPtr;.

Here is the gdb backtrace. It was taken without debug symbols, because building with them makes the program work again:

[ INFO] [1609543638.562749034]: Waiting to receive map...
[New Thread 0x7fffe6129700 (LWP 12427)]
[New Thread 0x7fffe5928700 (LWP 12428)]
[New Thread 0x7fffe50a6700 (LWP 12429)]
free(): invalid pointer

Thread 3 "mcl_node" received signal SIGABRT,Aborted.
[Switching to Thread 0x7fffed1bd700 (LWP 12423)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) [ INFO] [1609543643.706149732,1669.454000000]: Read a 4000 X 4000 map @ 0.050 m/cell
bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff1121921 in __GI_abort () at abort.c:79
#2  0x00007ffff116a967 in __libc_message (action=action@entry=do_abort,fmt=fmt@entry=0x7ffff1297b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff11719da in malloc_printerr (str=str@entry=0x7ffff1295d08 "free(): invalid pointer") at malloc.c:5342
#4  0x00007ffff1178f0c in _int_free (have_lock=0,p=0x7fff840000f0,av=0x7ffff14ccc40 <main_arena>) at malloc.c:4167
#5  __GI___libc_free (mem=0x7fff84000100) at malloc.c:3134
#6  0x00007ffff7970e9d in ros::Subscription::negotiateConnection(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char> > const&) () from /opt/ros/melodic/lib/libroscpp.so
#7  0x00007ffff79728af in ros::Subscription::pubUpdate(std::vector<std::__cxx11::basic_string<char,std::allocator<char> >,std::allocator<std::__cxx11::basic_string<char,std::allocator<char> > > > const&) () from /opt/ros/melodic/lib/libroscpp.so
#8  0x00007ffff79006f3 in ros::TopicManager::pubUpdate(std::__cxx11::basic_string<char,std::allocator<char> > const&,std::vector<std::__cxx11::basic_string<char,std::allocator<char> > > > const&) () from /opt/ros/melodic/lib/libroscpp.so
#9  0x00007ffff7904173 in ros::TopicManager::pubUpdateCallback(XmlRpc::XmlRpcValue&,XmlRpc::XmlRpcValue&) () from /opt/ros/melodic/lib/libroscpp.so
#10 0x00007ffff78f2edc in ros::XMLRPCCallWrapper::execute(XmlRpc::XmlRpcValue&,XmlRpc::XmlRpcValue&) () from /opt/ros/melodic/lib/libroscpp.so
#11 0x00007ffff66e8dbf in XmlRpc::XmlRpcServerConnection::executeMethod(std::__cxx11::basic_string<char,XmlRpc::XmlRpcValue&,XmlRpc::XmlRpcValue&)
    () from /opt/ros/melodic/lib/libxmlrpcpp.so
#12 0x00007ffff66eaff1 in XmlRpc::XmlRpcServerConnection::executeRequest() () from /opt/ros/melodic/lib/libxmlrpcpp.so
#13 0x00007ffff66e8a28 in XmlRpc::XmlRpcServerConnection::writeResponse() () from /opt/ros/melodic/lib/libxmlrpcpp.so
#14 0x00007ffff66e8bd8 in XmlRpc::XmlRpcServerConnection::handleEvent(unsigned int) () from /opt/ros/melodic/lib/libxmlrpcpp.so
#15 0x00007ffff66e640f in XmlRpc::XmlRpcdispatch::work(double) () from /opt/ros/melodic/lib/libxmlrpcpp.so
#16 0x00007ffff78eed98 in ros::XMLRPCManager::serverThreadFunc() () from /opt/ros/melodic/lib/libroscpp.so
#17 0x00007ffff5c93bcd in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.65.1
#18 0x00007ffff54526db in start_thread (arg=0x7fffed1bd700) at pthread_create.c:463
#19 0x00007ffff120271f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Valgrind also reports an Invalid free() / delete / delete[] / realloc() error. Most of the valgrind output is omitted here because it is quite long, but I am happy to provide the rest if needed:

[ INFO] [1609543808.695082796]: Waiting to receive map...
==12718== Warning: set address range perms: large range [0x59e43000,0x87e42000) (noaccess)
==12718== Warning: set address range perms: large range [0x5a000000,0x85828000) (defined)
==12718== Thread 3:
==12718== Invalid free() / delete / delete[] / realloc()
==12718==    at 0x4C3323B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12718==    by 0x517EE9C: ros::Subscription::negotiateConnection(std::__cxx11::basic_string<char,std::allocator<char> > const&) (in /opt/ros/melodic/lib/libroscpp.so)
==12718==    by 0x51808AE: ros::Subscription::pubUpdate(std::vector<std::__cxx11::basic_string<char,std::allocator<char> > > > const&) (in /opt/ros/melodic/lib/libroscpp.so)
==12718==    by 0x510E6F2: ros::TopicManager::pubUpdate(std::__cxx11::basic_string<char,std::allocator<char> > > > const&) (in /opt/ros/melodic/lib/libroscpp.so)
==12718==    by 0x5112172: ros::TopicManager::pubUpdateCallback(XmlRpc::XmlRpcValue&,XmlRpc::XmlRpcValue&) (in /opt/ros/melodic/lib/libroscpp.so)
==12718==    by 0x5100EDB: ros::XMLRPCCallWrapper::execute(XmlRpc::XmlRpcValue&,XmlRpc::XmlRpcValue&) (in /opt/ros/melodic/lib/libroscpp.so)
==12718==    by 0x6326DBE: XmlRpc::XmlRpcServerConnection::executeMethod(std::__cxx11::basic_string<char,XmlRpc::XmlRpcValue&) (in /opt/ros/melodic/lib/libxmlrpcpp.so)
==12718==    by 0x6328FF0: XmlRpc::XmlRpcServerConnection::executeRequest() (in /opt/ros/melodic/lib/libxmlrpcpp.so)
==12718==    by 0x6326A27: XmlRpc::XmlRpcServerConnection::writeResponse() (in /opt/ros/melodic/lib/libxmlrpcpp.so)
==12718==    by 0x6326BD7: XmlRpc::XmlRpcServerConnection::handleEvent(unsigned int) (in /opt/ros/melodic/lib/libxmlrpcpp.so)
==12718==    by 0x632440E: XmlRpc::XmlRpcdispatch::work(double) (in /opt/ros/melodic/lib/libxmlrpcpp.so)
==12718==    by 0x50FCD97: ros::XMLRPCManager::serverThreadFunc() (in /opt/ros/melodic/lib/libroscpp.so)
==12718==  Address 0x5a000100 is in a rw- mapped file /dev/nvidia-uvm segment

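Is there anything I can do to fix this free(): invalid pointer error? It appears to come from a conflict between ROS message pointers and OpenACC managed memory. Any help would be greatly appreciated.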
