Diffstat (limited to 'docs')
-rw-r--r--  docs/requirements/03-architecture.rst     18
-rw-r--r--  docs/requirements/05-implementation.rst  116
2 files changed, 86 insertions, 48 deletions
diff --git a/docs/requirements/03-architecture.rst b/docs/requirements/03-architecture.rst
index d9ccd221..b7417691 100644
--- a/docs/requirements/03-architecture.rst
+++ b/docs/requirements/03-architecture.rst
@@ -191,11 +191,15 @@ fencing, but there has not been any progress. The general description is
 available here:
 https://wiki.openstack.org/wiki/Fencing_Instances_of_an_Unreachable_Host
 
-As OpenStack does not cover fencing it is in the responsibility of the Doctor
-project to make sure fencing is done by using tools like pacemaker and by
-calling OpenStack APIs. Only after fencing is done OpenStack resources can be
-marked as down. In case there are gaps in OpenStack projects to have all
-relevant resources marked as down, those gaps need to be identified and fixed.
+OpenStack provides some mechanisms that allow fencing of faulty resources. Some
+are automatically invoked by the platform itself (e.g. Nova disables the
+compute service when libvirtd stops running, preventing new VMs to be scheduled
+to that node), while other mechanisms are consumer trigger-based actions (e.g.
+Neutron port admin-state-up). For other fencing actions not supported by
+OpenStack, the Doctor project may suggest ways to address the gap (e.g. through
+means of resourcing to external tools and orchestration methods), or
+documenting or implementing them upstream.
+
 The Doctor Inspector component will be responsible of marking resources down in
 the OpenStack and back up if necessary.
 
@@ -206,11 +210,11 @@ In the basic :ref:`uc-fault1` use case, no automatic actions will be taken by
 the VIM, but all recovery actions executed by the VIM and the NFVI will be
 instructed and coordinated by the Consumer.
 
-In a more advanced use case, the VIM shall be able to recover the failed virtual
+In a more advanced use case, the VIM may be able to recover the failed virtual
 resources according to a pre-defined behavior for that resource. In principle
 this means that the owner of the resource (i.e., its consumer or administrator)
 can define which recovery actions shall be taken by the VIM. Examples are a
-restart of the VM, migration/evacuation of the VM, or no action.
+restart of the VM or migration/evacuation of the VM.
diff --git a/docs/requirements/05-implementation.rst b/docs/requirements/05-implementation.rst
index 848d5a04..84979772 100644
--- a/docs/requirements/05-implementation.rst
+++ b/docs/requirements/05-implementation.rst
@@ -672,47 +672,81 @@ and correlated alarms. Instead the AODH alarm class has attributes for actions,
 rules and user and project id.
 
-+------------------------+------------------------+------------------------+
-| ETSI NFV Alarm Type    | OPNFV Doctor Req Spec  | AODH Alarm Type        |
-+========================+========================+========================+
-| AlarmId                | FaultId                | Alarm Id               |
-+------------------------+------------------------+------------------------+
-| managedObjectId        | virtualResourceId      | (N/A)                  |
-+------------------------+------------------------+------------------------+
-| \-                     | \-                     | User_Id, Project_Id    |
-+------------------------+------------------------+------------------------+
-| alarmRaisedTime        | \-                     | (N/A)                  |
-+------------------------+------------------------+------------------------+
-| alarmChangedTime       | \-                     | (N/A)                  |
-+------------------------+------------------------+------------------------+
-| alarmClearedTime       | \-                     | (N/A)                  |
-+------------------------+------------------------+------------------------+
-| alarmState:            | virtualResourceState   | State: ok, alarm,      |
-| New, Updated, Cleared  | (e.g. normal,          | insufficient data      |
-|                        | maintenance, down,     |                        |
-|                        | error)                 |                        |
-+------------------------+------------------------+------------------------+
-| vrPerceivedSeverity:   | Severity (Integer)     | Severity: low,         |
-| Critical, Major, Minor,|                        | moderate, critical     |
-| Warning, Indeterminate,|                        |                        |
-| Cleared                |                        |                        |
-+------------------------+------------------------+------------------------+
-| eventTime (unclear?)   | EventTime              | (N/A)                  |
-+------------------------+------------------------+------------------------+
-| faultType              | FaultType              | type                   |
-+------------------------+------------------------+------------------------+
-| probableCause          | ProbableCause          | description            |
-+------------------------+------------------------+------------------------+
-| isRootCause            | IsRootCause            | \-                     |
-+------------------------+------------------------+------------------------+
-| correlatedAlarmId      | CorrelatedFaultId      | \-                     |
-+------------------------+------------------------+------------------------+
-| faultDetails           | FaultDetails           | \-                     |
-+------------------------+------------------------+------------------------+
-| \-                     | \-                     | actions, rule, time    |
-|                        |                        | constraints            |
-+------------------------+------------------------+------------------------+
-
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| ETSI NFV Alarm Type    | OPNFV Doctor           | AODH Event Alarm    | Description / Comment                       | Recommendations                       |
+|                        | Requirement Specs      | Notification        |                                             |                                       |
++========================+========================+=====================+=============================================+=======================================+
+| alarmId                | FaultId                | alarm_id            | Identifier of an alarm.                     | \-                                    |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| \-                     | \-                     | alarm_name          | Human readable alarm name.                  | May be added in ETSI NFV Stage 3.     |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| managedObjectId        | VirtualResourceId      | (reason)            | Identifier of the affected virtual resource | \-                                    |
+|                        |                        |                     | is part of the AODH reason parameter.       |                                       |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| \-                     | \-                     | user_id, project_id | User and project identifiers.               | May be added in ETSI NFV Stage 3.     |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| alarmRaisedTime        | \-                     | \-                  | Timestamp when alarm was raised.            | To be added to Doctor and AODH. May   |
+|                        |                        |                     |                                             | be derived (e.g. in a shimlayer) from |
+|                        |                        |                     |                                             | the AODH alarm history.               |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| alarmChangedTime       | \-                     | \-                  | Timestamp when alarm was changed/updated.   | see above                             |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| alarmClearedTime       | \-                     | \-                  | Timestamp when alarm was cleared.           | see above                             |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| eventTime              | \-                     | \-                  | Timestamp when alarm was first observed by  | see above                             |
+|                        |                        |                     | the Monitor.                                |                                       |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| \-                     | EventTime              | generated           | Timestamp of the Notification.              | Update parameter name in Doctor spec. |
+|                        |                        |                     |                                             | May be added in ETSI NFV Stage 3.     |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| state:                 | VirtualResourceState:  | current: ok, alarm, | ETSI NFV IFA 005/006 lists example alarm    | Maintenance state is missing in AODH. |
+| E.g. Fired, Updated    | E.g. normal, down      | insufficient_data   | states.                                     | List of alarm states will be          |
+| Cleared                | maintenance, error     |                     |                                             | specified in ETSI NFV Stage 3.        |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| perceivedSeverity:     | Severity (Integer)     | Severity:           | ETSI NFV IFA 005/006 lists example          | List of alarm states will be          |
+| E.g. Critical, Major,  |                        | low (default),      | perceived severity values.                  | specified in ETSI NFV Stage 3.        |
+| Minor, Warning,        |                        | moderate, critical  |                                             |                                       |
+| Indeterminate, Cleared |                        |                     |                                             | **OPNFV: Severity (Integer)**:        |
+|                        |                        |                     |                                             | * update OPNFV Doctor specification   |
+|                        |                        |                     |                                             |   to *Enum*                           |
+|                        |                        |                     |                                             |                                       |
+|                        |                        |                     |                                             | **perceivedSeverity=Indetermined**:   |
+|                        |                        |                     |                                             | * remove value *Indetermined* in      |
+|                        |                        |                     |                                             |   IFA and map undefined values to     |
+|                        |                        |                     |                                             |   “minor” severity, or                |
+|                        |                        |                     |                                             | * add value *indetermined* in AODH    |
+|                        |                        |                     |                                             |   and make it the default value.      |
+|                        |                        |                     |                                             |                                       |
+|                        |                        |                     |                                             | **perceivedSeverity=Cleared**:        |
+|                        |                        |                     |                                             | * remove value *Cleared* in IFA as    |
+|                        |                        |                     |                                             |   the information about a cleared     |
+|                        |                        |                     |                                             |   alarm alarm can be derived from     |
+|                        |                        |                     |                                             |   the alarm state parameter, or       |
+|                        |                        |                     |                                             | * add value *cleared* in AODH and     |
+|                        |                        |                     |                                             |   set a rule that the severity is     |
+|                        |                        |                     |                                             |   “cleared” when the state is *ok*.   |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| faultType              | FaultType              | event_type in       | Type of the fault, e.g. “CPU failure” of a  | OpenStack Alarming (Aodh) can use a   |
+|                        |                        | reason_data         | compute resource, in machine interpretable  | fuzzy matching with wildcard string,  |
+|                        |                        |                     | format.                                     | "compute.cpu.failure".                |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| N/A                    | N/A                    | type = "event"      | Type of the notification. For fault         | \-                                    |
+|                        |                        |                     | notifications the type in AODH is “event”.  |                                       |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| probableCause          | ProbableCause          | \-                  | Probable cause of the alarm.                | May be provided (e.g. in a shimlayer) |
+|                        |                        |                     |                                             | based on Vitrage topology awareness / |
+|                        |                        |                     |                                             | root-cause-analysis.                  |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| isRootCause            | IsRootCause            | \-                  | Boolean indicating whether the fault is the | see above                             |
+|                        |                        |                     | root cause of other faults.                 |                                       |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| correlatedAlarmId      | CorrelatedFaultId      | \-                  | List of IDs of correlated faults.           | see above                             |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| faultDetails           | FaultDetails           | \-                  | Additional details about the fault/alarm.   | FaultDetails information element will |
+|                        |                        |                     |                                             | be specified in ETSI NFV Stage 3.     |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
+| \-                     | \-                     | action, previous    | Additional AODH alarm related parameters.   | \-                                    |
++------------------------+------------------------+---------------------+---------------------------------------------+---------------------------------------+
 
 Table: Comparison of alarm attributes
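
Note (not part of the commit): the attribute comparison in the new table could be exercised with a small shim of the kind the "Recommendations" column alludes to. The following is a minimal sketch under stated assumptions. The function name `to_etsi_alarm` and the sample payload are hypothetical; only the keys named in the AODH column of the table are read, and values are passed through unchanged because the table does not define a value mapping for state or severity.

```python
from typing import Any, Dict, Optional


def to_etsi_alarm(notification: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Rename AODH event alarm notification fields to ETSI NFV names.

    Hypothetical helper: key names follow the comparison table, values
    are copied as-is (no state or severity translation is attempted).
    """
    reason_data = notification.get("reason_data") or {}
    return {
        # Attributes with an AODH counterpart in the table.
        "alarmId": notification.get("alarm_id"),
        "state": notification.get("current"),
        "perceivedSeverity": notification.get("severity"),
        "faultType": reason_data.get("event_type"),
        # The table only says the resource id is "part of" the AODH
        # reason parameter; how to extract it is not specified, so it
        # is left unset here.
        "managedObjectId": None,
        # Attributes marked "\-" in the AODH column: no source field.
        "alarmRaisedTime": None,
        "alarmChangedTime": None,
        "alarmClearedTime": None,
        "eventTime": None,
        "probableCause": None,
        "isRootCause": None,
        "correlatedAlarmId": None,
        "faultDetails": None,
    }


# Hypothetical payload using the field names from the AODH column.
sample = {
    "alarm_id": "a1",
    "alarm_name": "host-down",
    "severity": "critical",
    "previous": "ok",
    "current": "alarm",
    "reason": "example reason text",
    "reason_data": {"event_type": "compute.cpu.failure"},
}
alarm = to_etsi_alarm(sample)
```

Leaving the unmapped attributes as `None` keeps the gaps listed in the table visible to the caller instead of hiding them behind guessed defaults.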