Ticket #266 (assigned defect)
AMF: Wrong behavior of N-way active redundancy model
| Reported by: | marioa | Owned by: | marioa |
|---|---|---|---|
| Priority: | critical | Milestone: | 3.0.0-GA |
| Component: | AvSv | Version: | 3.0.0-FC |
| Keywords: | | Cc: | |
| patch waiting for maintainer: | no |
Description
See background information:
http://list.opensaf.org/pipermail/users/2008-September/001492.html
According to the AMF spec, one characteristic of the N-way active redundancy model is: "At any given time, the Availability Management Framework should make sure that the redundancy level (the preferred number of active assignments) for each SI is guaranteed, if possible, while the maximum number of service units is not exceeded." (AMF Chapter 3.7.5.1, bullet 6).
The goal is unquestionable. The remaining question is whether it is possible to assign HA-state=ACTIVE to the service units on nodes nine and ten (see the background information in the mail thread).
According to the AMF spec (chapter 3.3.2.3), the "readiness state" of a component indicates whether the component is ready to take service instance assignments. When a component's readiness state is In-service, it is eligible for CSI assignments. A component's readiness state is In-service if its operational state is enabled and the readiness state of the SU containing it is In-service.
According to the AMF spec (chapter 3.3.1.4), the readiness state of an SU is In-service if all of the following hold: its operational state and the operational state of its containing node are enabled; its administrative state and the administrative states of its containing service group, node and cluster are unlocked; and its presence state is either instantiated or restarting.
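The two readiness rules quoted above can be written down as simple predicates. The following is only a minimal sketch of the conditions as they are paraphrased in this ticket, not AvSv code: the class, field and function names are hypothetical, and the states are reduced to booleans and plain strings for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceUnit:
    """Hypothetical, simplified view of the SU state that the rules refer to."""
    op_enabled: bool              # operational state of the SU is enabled
    node_op_enabled: bool         # operational state of the containing node is enabled
    admin_unlocked: bool          # administrative state of the SU is unlocked
    sg_admin_unlocked: bool       # administrative state of the containing SG is unlocked
    node_admin_unlocked: bool     # administrative state of the containing node is unlocked
    cluster_admin_unlocked: bool  # administrative state of the cluster is unlocked
    presence: str                 # e.g. "instantiated", "restarting", "instantiation-failed"


def su_in_service(su: ServiceUnit) -> bool:
    """SU readiness rule as quoted in the ticket (AMF chapter 3.3.1.4)."""
    return (
        su.op_enabled
        and su.node_op_enabled
        and su.admin_unlocked
        and su.sg_admin_unlocked
        and su.node_admin_unlocked
        and su.cluster_admin_unlocked
        and su.presence in ("instantiated", "restarting")
    )


def component_in_service(comp_op_enabled: bool, su: ServiceUnit) -> bool:
    """Component readiness rule as quoted in the ticket (AMF chapter 3.3.2.3)."""
    return comp_op_enabled and su_in_service(su)


# An SU with everything enabled and unlocked, and presence state "instantiated".
healthy_su = ServiceUnit(True, True, True, True, True, True, "instantiated")
# The same SU, but with a presence state outside the two allowed values.
failed_su = ServiceUnit(True, True, True, True, True, True, "instantiation-failed")
```

With these definitions, `component_in_service(True, healthy_su)` is true while `component_in_service(True, failed_su)` is false. Note that neither rule, as quoted, refers to whether the service group as a whole is stable.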
The log records that we have show that the SUs on both node nine and node ten have readiness state In-service, so nothing should prevent AMF from assigning CSIs with HA-state=Active to the components of these SUs.
The problem has been explained (in the mail thread) as a consequence of the SG not being "in a stable state". We cannot find anything in the AMF specification stating that the SG has to be in a stable state before SUs can be assigned.
One possible view of the problem is that AMF detected that a csiSet(QUIESCED) operation timed out on node 5 and tried to recover. During the recovery, CLEANUP completed successfully and an attempt was then made to INSTANTIATE the component. The instantiation failed, however, leaving the component in the INSTANTIATION-FAILED state. (The mistake is perhaps that the successful CLEANUP was not reported internally to the SG so that the SI-state of the SU could be removed.)
