Ticket #568 (new defect)
Controller failover does not work when EntityLocations is not 1 or 2
| Reported by: | troyh | Owned by: | |
|---|---|---|---|
| Priority: | blocker | Milestone: | PL 3.0.1 |
| Component: | AvSv | Version: | 3.0.0-RC1 |
| Keywords: | | Cc: | |
| patch waiting for maintainer: | no | | |
Description
Controller failover does not work when the controllers are not physically located in EntityLocations 1 and 2.
My c-class cluster has four physical slots: 9, 10, 11 and 12. When I configure the cluster using the default slot_ids 1-4 and set the EntityLocations to 9-12 respectively, the cluster starts and mostly works as expected, except that active controller failover does not work. When the active controller fails, the log on the standby controller shows:
opensaf_immnd: Director Service in NOACTIVE state
kernel: TIPC: Resetting link <1.1.47:eth0-1.1.31:eth0>, peer not responding
kernel: TIPC: Lost link <1.1.47:eth0-1.1.31:eth0> on network plane A
kernel: TIPC: Lost contact with <1.1.31>
opensaf_immd: IMMND DOWN on active controller f1 detected at standby immd!! f2. Possible failover
opensaf_immd: Resend of fevs message 20, will not mbcp to peer IMMD
opensaf_immd: Resend of fevs message 21, will not mbcp to peer IMMD
opensaf_immnd: DISCARD DUPLICATE FEVS message:20
opensaf_immnd: DISCARD DUPLICATE FEVS message:21
opensaf_immnd: Global discard node received for nodeId:2010f pid:22233
ncs_scap: AVD: Heart Beat missed with active director on 2010f
opensaf_fmsd: Role: STANDBY, FM_EVT_HB_LOSS: for slot_id: 9, subslot_id: 15
The STANDBY clearly detects that the ACTIVE controller has gone away, yet it does not take over the ACTIVE role.
If I simply move the blades to physical slots 1-4 and update the EntityLocations accordingly, everything seems to work as expected.
There seems to be a disconnect between the configured slot/subslot ID and the actual EntityLocation.
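The suspected mismatch can be illustrated with a small sketch. The log hints at it. If the nodeId `2010f` is read as one byte each for shelf, slot and subslot (an assumed layout, not confirmed against the OpenSAF source), it decodes to shelf 2, slot 1, subslot 15. FM, however, reports the heartbeat loss for slot_id 9, subslot 15, which is the physical slot. The sketch is hypothetical throughout: the mapping table mirrors the configuration described above, the function names are invented for illustration, and it does not reproduce OpenSAF's actual failover logic. It only shows why comparing a physical slot directly against a configured slot_id would fail whenever the two differ.

```python
# Hypothetical illustration of the suspected slot_id / EntityLocation mismatch.
# Function names and the node-ID byte layout are assumptions, not OpenSAF code.

# Configuration from the ticket: logical slot_id -> physical EntityLocation (blade slot).
SLOT_TO_LOCATION = {1: 9, 2: 10, 3: 11, 4: 12}
LOCATION_TO_SLOT = {loc: slot for slot, loc in SLOT_TO_LOCATION.items()}


def decode_node_id(node_id: int) -> tuple[int, int, int]:
    """Split a node ID into (shelf, slot, subslot), ASSUMING a 0xSSLLBB byte layout."""
    return (node_id >> 16) & 0xFF, (node_id >> 8) & 0xFF, node_id & 0xFF


def peer_lost_naive(hb_loss_slot: int, active_slot_id: int) -> bool:
    """Suspected behaviour: physical slot from the FM event compared to the logical slot_id."""
    return hb_loss_slot == active_slot_id


def peer_lost_mapped(hb_loss_slot: int, active_slot_id: int) -> bool:
    """Translate the physical location back to a slot_id before comparing."""
    return LOCATION_TO_SLOT.get(hb_loss_slot) == active_slot_id


# Values taken from the standby controller's log.
shelf, active_slot_id, subslot = decode_node_id(0x2010F)  # "nodeId:2010f"
hb_loss_slot = 9                                          # "FM_EVT_HB_LOSS: for slot_id: 9"

print(f"active director: shelf={shelf} slot_id={active_slot_id} subslot={subslot}")
print("naive match: ", peer_lost_naive(hb_loss_slot, active_slot_id))
print("mapped match:", peer_lost_mapped(hb_loss_slot, active_slot_id))
```

With the values from the log, the naive comparison (9 against 1) is False, so a standby using it would never recognise the lost node as its active peer, while the mapped comparison is True. With blades in physical slots 1-4 the two tables coincide and both checks agree, which would be consistent with failover working in that layout.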
