Ticket #692 (new defect)
Opened 3 months ago
crash in cpnd at opensaf restart
| Reported by: | chris.j.leary@… | Owned by: | |
|---|---|---|---|
| Priority: | critical | Milestone: | PL 3.0.1 |
| Component: | CPSv | Version: | 3.0.0-GA |
| Keywords: | | Cc: | |
| patch waiting for maintainer: | no | | |
Description
Our system had been running for 6-8 hours. It was then stopped and restarted via /etc/init.d/opensafd stop ; /etc/init.d/opensafd start, and the system rebooted shortly thereafter.
After the system came back up, we found that ncs_cpnd had dumped core 6 times in a row during startup before the failure escalated to a node reboot.
Below is the backtrace of one of the core dumps (all are identical). The problem appears to be req->info.read.i_addr as passed to ncs_os_posix_shm: in the dump it is 0x4, which is not a plausible mapped address, so the memcpy in frame #0 would fault when reading from it.
Core was generated by `/usr/lib/opensaf/ncs_cpnd'.
Program terminated with signal 11, Segmentation fault.
[New process 28112]
[New process 28111]
[New process 28110]
[New process 28109]
[New process 28108]
#0 0x00002b17ffb09d90 in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x00002b17ffb09d90 in memcpy () from /lib/libc.so.6
#1 0x00002b17ff20e462 in ncs_os_posix_shm (req=0x40dd21c0)
at /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/leap/src/os_defs.c:1326
#2 0x000000000040af10 in cpnd_find_free_loc (cb=0x631980, type=<value optimized out>)
at /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/cpsv/cpnd/cpnd_res.c:640
#3 0x000000000040b168 in cpnd_restart_shm_client_update (cb=0x40dd2200, cl_node=0x639110)
at /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/cpsv/cpnd/cpnd_res.c:1045
#4 0x000000000041780e in cpnd_process_evt (evt=0x637eb0)
at /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/cpsv/cpnd/cpnd_evt.c:466
#5 0x0000000000403622 in cpnd_main_process (info=<value optimized out>)
at /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/cpsv/cpnd/cpnd_init.c:628
#6 0x00002b17ffdf53f7 in start_thread () from /lib/libpthread.so.0
#7 0x00002b17ffb64b4d in clone () from /lib/libc.so.6
#8 0x0000000000000000 in ?? ()
(gdb) up
#1 0x00002b17ff20e462 in ncs_os_posix_shm (req=0x40dd21c0)
at /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/leap/src/os_defs.c:1326
1326 /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/leap/src/os_defs.c: No such file or directory.
in /home/cleary/build_opensaf3/cae/extern/opensaf-3.0.0/services/leap/src/os_defs.c
(gdb) print req
$1 = (NCS_OS_POSIX_SHM_REQ_INFO *) 0x40dd21c0
(gdb) print *req
$2 = {type = NCS_OS_POSIX_SHM_REQ_READ, info = {open = {i_name = 0x0, i_flags = 4, i_map_flags = 0,
i_size = 1088233984, i_offset = 40, o_addr = 0x0, o_fd = 0, o_hdl = 0}, close = {i_hdl = 0,
i_addr = 0x4, i_fd = 1088233984, i_size = 40}, unlink = {i_name = 0x0}, read = {i_hdl = 0,
i_addr = 0x4, i_to_buff = 0x40dd2200, i_read_size = 40, i_offset = 0}, write = {i_hdl = 0,
i_addr = 0x4, i_from_buff = 0x40dd2200, i_write_size = 40, i_offset = 0}}}
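To illustrate the failure mode seen in the dump, here is a minimal sketch of a guarded shared-memory read. It is not the OpenSAF code: shm_read_req is a hypothetical, simplified stand-in for the read member of NCS_OS_POSIX_SHM_REQ_INFO (field names are taken from the gdb output, the types are guesses), and shm_read_checked and SHM_MIN_VALID_ADDR are made up for this example. The low-address test is only a heuristic, assuming the first page is never mapped, as is typical on Linux.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the "read" member of NCS_OS_POSIX_SHM_REQ_INFO.
 * Field names follow the gdb dump; the types are assumptions. */
typedef struct {
    void    *i_addr;      /* base address of the mapped shared-memory segment */
    void    *i_to_buff;   /* destination buffer for the copy */
    uint32_t i_read_size; /* number of bytes to copy */
    uint64_t i_offset;    /* byte offset from i_addr */
} shm_read_req;

/* Assumed heuristic: treat anything below one 4 KiB page as unmapped. */
#define SHM_MIN_VALID_ADDR ((uintptr_t)4096)

/* Copies i_read_size bytes from i_addr + i_offset into i_to_buff.
 * Returns 0 on success, -1 if the request looks invalid. */
int shm_read_checked(const shm_read_req *req)
{
    if (req == NULL || req->i_to_buff == NULL)
        return -1;

    /* Rejects NULL and near-NULL source addresses such as the 0x4 in the dump. */
    if ((uintptr_t)req->i_addr < SHM_MIN_VALID_ADDR)
        return -1;

    memcpy(req->i_to_buff,
           (const char *)req->i_addr + req->i_offset,
           req->i_read_size);
    return 0;
}
```

With a request like the one in the core dump (i_addr = 0x4, i_read_size = 40), this sketch returns -1 instead of faulting inside memcpy. A check like this would only turn the crash into an error return; it would not explain why cpnd ends up with a bogus segment address after the restart.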
