-rw-r--r--  design_docs/README                  | 14
-rw-r--r--  requirements/02-use_cases.rst       | 34
-rw-r--r--  requirements/03-architecture.rst    | 51
-rw-r--r--  requirements/04-gaps.rst            |  4
-rw-r--r--  requirements/05-implementation.rst  |  8
-rw-r--r--  requirements/index.rst              | 12
6 files changed, 80 insertions, 43 deletions
diff --git a/design_docs/README b/design_docs/README
index 52f31ff1..f0491cf6 100644
--- a/design_docs/README
+++ b/design_docs/README
@@ -1,9 +1,9 @@
-This is directory to store design documents which may includes draft version
-of blueprints written before proposing upstream OSS communities such as
-OpenStack, in order to keep the original blueprint as reviewed in OPNFV.
-That means there could be out-of-dated blueprints as result of further
-refinements in the upstream OSS coummunity. So please refer the link in each
-document to find current version of blueprint and status of development in the
-relevant OSS community.
+This is the directory to store design documents which may include draft
+versions of blueprints written before proposing to upstream OSS communities
+such as OpenStack, in order to keep the original blueprint as reviewed in
+OPNFV. That means there could be out-dated blueprints as result of further
+refinements in the upstream OSS community. Please refer to the link in each
+document to find the latest version of the blueprint and status of development
+in the relevant OSS community.

 See also https://wiki.opnfv.org/requirements_projects .
diff --git a/requirements/02-use_cases.rst b/requirements/02-use_cases.rst
index 0a119521..f69151df 100644
--- a/requirements/02-use_cases.rst
+++ b/requirements/02-use_cases.rst
@@ -43,6 +43,8 @@ operation.
 Faults
 ------

+.. _uc-fault1:
+
 Fault management using ACT-STBY configuration
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -115,22 +117,22 @@ ACT-STBY case), or if more active instances are needed as well.
 Preventive actions based on fault prediction
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The fault management scenario explained in Clause 2.1.1 can also be performed
-based on fault prediction. In such cases, in VIM, there is an intelligent fault
-prediction module which, based on its NFVI monitoring information, can predict
-an imminent fault in the elements of NFVI. A simple example is raising
-temperature of a Hardware Server which might trigger a pre-emptive recovery
-action. The requirements of such fault prediction in the VIM are investigated in
-the OPNFV project "Data Collection for Failure Prediction" [PRED]_.
-
-This use case is very similar to "Fault management using ACT-STBY configuration"
-in Clause 2.1.1. Instead of a fault detection (Step 1 "Fault Notification in"
-:num:`Figure #figure1`), the trigger comes from a fault prediction module in the
-VIM, or from a third party module which notifies the VIM about an imminent
-fault. From Step 2~5, the work flow is the same as in the "Fault management
-using ACT-STBY configuration" use case, except in this case, the Consumer of a
-VM/VNF switches to STBY configuration based on a predicted fault, rather than an
-occurred fault.
+The fault management scenario explained in :ref:`uc-fault1` can also be
+performed based on fault prediction. In such cases, in VIM, there is an
+intelligent fault prediction module which, based on its NFVI monitoring
+information, can predict an imminent fault in the elements of NFVI.
+A simple example is raising temperature of a Hardware Server which might
+trigger a pre-emptive recovery action. The requirements of such fault
+prediction in the VIM are investigated in the OPNFV project "Data Collection
+for Failure Prediction" [PRED]_.
+
+This use case is very similar to :ref:`uc-fault1`. Instead of a fault
+detection (Step 1 "Fault Notification in" :num:`Figure #figure1`), the trigger
+comes from a fault prediction module in the VIM, or from a third party module
+which notifies the VIM about an imminent fault. From Step 2~5, the work flow is
+the same as in the "Fault management using ACT-STBY configuration" use case,
+except in this case, the Consumer of a VM/VNF switches to STBY configuration
+based on a predicted fault, rather than an occurred fault.

 NVFI Maintenance
 ----------------
diff --git a/requirements/03-architecture.rst b/requirements/03-architecture.rst
index b55d5a02..2f9d24be 100644
--- a/requirements/03-architecture.rst
+++ b/requirements/03-architecture.rst
@@ -69,12 +69,13 @@ General Features and Requirements
 The following features are required for the VIM to achieve high availability of
 applications (e.g., MME, S/P-GW) and the Network Services:

-* Monitoring: Monitor physical and virtual resources.
-* Detection: Detect unavailability of physical resources.
-* Correlation and Cognition: Correlate faults and identify affected virtual
-  resources.
-* Notification: Notify unavailable virtual resources to their Consumer(s).
-* Recovery action: Execute actions to process fault recovery and maintenance.
+1. Monitoring: Monitor physical and virtual resources.
+2. Detection: Detect unavailability of physical resources.
+3. Correlation and Cognition: Correlate faults and identify affected virtual
+   resources.
+4. Notification: Notify unavailable virtual resources to their Consumer(s).
+5. Fencing: Shut down or isolate a faulty resource
+6. Recovery action: Execute actions to process fault recovery and maintenance.

 The time interval between the instant that an event is detected by the
 monitoring system and the Consumer notification of unavailable resources shall
@@ -160,24 +161,42 @@ would lead to heavy signaling traffic. Thus, a publication/subscription
 messaging model is better suited for these notifications, as notifications are
 only sent to subscribed consumers.

-Note: the VIM should only accept individual notification URLs for each resource
-by its owner or administrator.
+Notifications will be send out along with the configuration by the consumer.
+The configuration includes endpoint(s) in which the consumers can specify
+multiple targets for the notification subscription, so that various and
+multiple receiver functions can consume the notification message.
+Also, the conditions for notifications shall be configurable, such that
+the consumer can set according policies, e.g. whether it wants to receive
+fault notifications or not.
+Note: the VIM should only accept notification subscriptions for each resource
+by its owner or administrator.

 Notifications to the Consumer about the unavailability of virtualized
 resources will include a description of the fault, preferably with sufficient
-abstraction rather than detailed physical fault information. Flexibility in
-notifications is important. For example, the receiver function in the
-consumer-side implementation could have different schema, location, and policies
-(e.g. receive or not, aggregate events with the same cause, etc.).
+abstraction rather than detailed physical fault information.
+
+.. _fencing:
+
+Fencing
+^^^^^^^
+Recovery actions, e.g. safe VM evacuation, have to be preceded by fencing the
+failed host. Fencing hereby means to isolate or shut down a faulty resource.
+Without fencing -- when the perceived disconnection is due to some transient
+or partial failure -- the evacuation might lead into two identical instances
+running together and having a dangerous conflict.
+
+There is a cross-project effort in OpenStack ongoing to implement fencing. A
+general description of fencing in OpenStack is available here:
+https://wiki.openstack.org/wiki/Fencing_Instances_of_an_Unreachable_Host .

 Recovery Action
 ^^^^^^^^^^^^^^^

-In the basic "Fault management using ACT-STBY configuration" use case, no
-automatic actions will be taken by the VIM, but all recovery actions executed by
-the VIM and the NFVI will be instructed and coordinated by the Consumer.
+In the basic :ref:`uc-fault1` use case, no automatic actions will be taken by
+the VIM, but all recovery actions executed by the VIM and the NFVI will be
+instructed and coordinated by the Consumer.

-In a more advanced use case, the VIM shall be able to recover the failed virtual
+In a more advanced use case, the VIM shall be able to recover the failed virtual
 resources according to a pre-defined behavior for that resource. In principle
 this means that the owner of the resource (i.e., its consumer or administrator)
 can define which recovery actions shall be taken by the VIM. Examples are a
diff --git a/requirements/04-gaps.rst b/requirements/04-gaps.rst
index 38dcbd27..a5e37fd4 100644
--- a/requirements/04-gaps.rst
+++ b/requirements/04-gaps.rst
@@ -95,6 +95,10 @@ Maintenance Notification

     - VIM user cannot receive maintenance notifications.

+* Related blueprints
+
+  + https://blueprints.launchpad.net/nova/+spec/service-status-notification
+
 VIM Southbound interface
 ------------------------

diff --git a/requirements/05-implementation.rst b/requirements/05-implementation.rst
index a4048df7..6fbf613c 100644
--- a/requirements/05-implementation.rst
+++ b/requirements/05-implementation.rst
@@ -12,6 +12,8 @@ the related northbound interface and the related information elements. Finally,
 Section 5.6 provides a first set of blueprints to address selected gaps required
 for the realization functionalities of the Doctor project.

+.. _impl_fb:
+
 Functional Blocks
 -----------------

@@ -88,7 +90,7 @@ to users with relevant ownership, whereas the latter is related to raw devices
 or small entities which should be handled with an administrator privilege.

 The northbound interface between the Notifier and the Consumer/Administrator is
-specified in Section 5.5.
+specified in :ref:`impl_nbi`.

 Sequence
 --------
@@ -144,7 +146,7 @@ shall be less than 1 second.
    Fault management scenario

 :num:`Figure #figure8` shows a more detailed message flow (Steps 4 to 6) between
-the 4 building blocks introduced in Section 5.1.
+the 4 building blocks introduced in :ref:`impl_fb`.

 4. The Monitor observed a fault in the NFVI and reports the raw fault to the
    Inspector.
@@ -428,6 +430,8 @@ and :num:`Figure #figure14`):
 - PhysicalResourceState [1] (String)
 - ZoneID [0..1] (Identifier)

+.. _impl_nbi:
+
 Detailed northbound interface specification
 -------------------------------------------

diff --git a/requirements/index.rst b/requirements/index.rst
index f280a63f..8495365d 100644
--- a/requirements/index.rst
+++ b/requirements/index.rst
@@ -15,8 +15,6 @@ Doctor: Fault Management and Maintenance
 :Editors: Ashiq Khan (NTT DOCOMO), Gerald Kunzmann (NTT DOCOMO)
 :Authors: Ryota Mibu (NEC), Carlos Goncalves (NEC), Tomi Juvonen (Nokia),
           Tommy Lindgren (Ericsson)
-:Project creation date: 2014-12-02
-:Submission date: 2015-03-XX

 :Abstract: Doctor is an OPNFV requirement project [DOCT]_. Its scope is NFVI
            fault management, and maintenance and it aims at developing and
@@ -33,6 +31,16 @@ Doctor: Fault Management and Maintenance
            realization for a NFVI fault management and maintenance solution in open
            source software.

+:History:
+
+          ========== =====================================================
+          Date       Description
+          ========== =====================================================
+          02.12.2014 Project creation
+          14.04.2015 Initial version of the deliverable uploaded to Gerrit
+          18.05.2015 Stable version of the Doctor deliverable
+          ========== =====================================================
+
 .. raw:: latex
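
As a reading aid for the notification text this commit adds to requirements/03-architecture.rst, the subscription rules can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the commit and not a Doctor or OpenStack API: the names `Subscription`, `Notifier`, `want_faults`, and the endpoint URLs are all hypothetical. It illustrates three points from the text: a subscription may list several endpoints, the consumer can configure whether it wants fault notifications, and only the owner of a resource or an administrator may subscribe to it.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """One consumer's notification configuration (hypothetical shape)."""
    consumer: str
    endpoints: list[str]      # several targets may be given per subscription
    want_faults: bool = True  # configurable condition: receive faults or not


@dataclass
class Notifier:
    """Toy stand-in for the VIM notifier; names here are assumptions."""
    owners: dict[str, str]                        # resource id -> owner
    admins: set[str] = field(default_factory=set)
    subs: dict[str, list[Subscription]] = field(default_factory=dict)

    def subscribe(self, resource: str, sub: Subscription) -> None:
        # Accept a subscription only from the resource owner or an admin.
        is_owner = sub.consumer == self.owners.get(resource)
        if not is_owner and sub.consumer not in self.admins:
            raise PermissionError(
                f"{sub.consumer} may not subscribe to {resource}")
        self.subs.setdefault(resource, []).append(sub)

    def notify_fault(self, resource: str,
                     description: str) -> list[tuple[str, str]]:
        # Return (endpoint, message) pairs instead of doing network I/O.
        return [
            (endpoint, f"{resource}: {description}")
            for sub in self.subs.get(resource, [])
            if sub.want_faults
            for endpoint in sub.endpoints
        ]
```

Returning the (endpoint, message) pairs rather than sending them keeps the sketch free of any transport assumption; a real notifier would deliver each message to its endpoint.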