author Carlos Goncalves <carlos.goncalves@neclab.eu> 2015-06-23 11:34:23 +0200
committer Carlos Goncalves <carlos.goncalves@neclab.eu> 2015-06-24 10:43:27 +0200
commit 07c72004e54a6c7ea1f3adadbce652d0913f4578
tree 7153c1ab6a88d5a2fdc8fe85ae989c200ca72c95
parent fc270496c1ac63343d5d43913553e0394f24bd5f
Adding fencing as general feature

This patch is based on a related unresolved comment in
https://gerrit.opnfv.org/gerrit/#/c/304

At an earlier stage, we had identified fencing as a gap in Nova, but this gap
was removed. Who remembers why we made this decision? Is the feature already
sufficiently implemented in Nova? The related BP seems obsoleted.

JIRA: DOCTOR-10
Co-Authored-By: Gerald Kunzmann <kunzmann@docomolab-euro.com>
Signed-off-by: Carlos Goncalves <carlos.goncalves@neclab.eu>
Change-Id: Id31ac1552a8f1e3506c5e4d533416611d6b95217
-rw-r--r-- requirements/03-architecture.rst 29
1 files changed, 22 insertions, 7 deletions
diff --git a/requirements/03-architecture.rst b/requirements/03-architecture.rst
index fee136d7..984f254e 100644
--- a/requirements/03-architecture.rst
+++ b/requirements/03-architecture.rst
@@ -69,12 +69,13 @@ General Features and Requirements
The following features are required for the VIM to achieve high availability of
applications (e.g., MME, S/P-GW) and the Network Services:
-* Monitoring: Monitor physical and virtual resources.
-* Detection: Detect unavailability of physical resources.
-* Correlation and Cognition: Correlate faults and identify affected virtual
- resources.
-* Notification: Notify unavailable virtual resources to their Consumer(s).
-* Recovery action: Execute actions to process fault recovery and maintenance.
+1. Monitoring: Monitor physical and virtual resources.
+2. Detection: Detect unavailability of physical resources.
+3. Correlation and Cognition: Correlate faults and identify affected virtual
+ resources.
+4. Notification: Notify unavailable virtual resources to their Consumer(s).
+5. Fencing: Shut down or isolate a faulty resource.
+6. Recovery action: Execute actions to process fault recovery and maintenance.
The time interval between the instant that an event is detected by the
monitoring system and the Consumer notification of unavailable resources shall
@@ -167,6 +168,20 @@ notifications is important. For example, the receiver function in the
consumer-side implementation could have different schema, location, and policies
(e.g. receive or not, aggregate events with the same cause, etc.).
+.. _fencing:
+
+Fencing
+^^^^^^^
+Recovery actions, e.g. safe VM evacuation, have to be preceded by fencing the
+failed host. Fencing here means isolating or shutting down a faulty resource.
+Without fencing, when the perceived disconnection is due to some transient
+or partial failure, the evacuation might lead to two identical instances
+running at the same time and conflicting dangerously.
+
+There is an ongoing cross-project effort in OpenStack to implement fencing. A
+general description of fencing in OpenStack is available here:
+https://wiki.openstack.org/wiki/Fencing_Instances_of_an_Unreachable_Host .
+
Recovery Action
^^^^^^^^^^^^^^^
@@ -174,7 +189,7 @@ In the basic "Fault management using ACT-STBY configuration" use case, no
automatic actions will be taken by the VIM, but all recovery actions executed by
the VIM and the NFVI will be instructed and coordinated by the Consumer.
-In a more advanced use case, the VIM shall be able to recover the failed virtual
+In a more advanced use case, the VIM shall be able to recover the failed virtual
resources according to a pre-defined behavior for that resource. In principle
this means that the owner of the resource (i.e., its consumer or administrator)
can define which recovery actions shall be taken by the VIM. Examples are a