Section 5 - patch set3 for review - HA Project Requirements

Fixed review comments from patch set 1
Patch set2: upload a wrong version of doc
Patch set4: change file name to Section_5_VIM_HA.rst and fix comments
Patch set5: delete all the endline blanks
Patch set6: update according to the comments
Patch set7: add the ability to configure HA functions
Patch set8: add some small changes in comments for patch set 3

Closes-bug: JIRA: HA-8
Signed-off-by: zhang.jun3g@zte.com.cn
Change-Id: Ie8f12ca141448087affc66d4a6a3594a21125f0d
author: jun.zhang <zhang.jun3g@zte.com.cn> 2015-06-02 19:39:07 -0400
committer: Julien-zte <zhang.jun3g@zte.com.cn> 2015-10-29 10:56:59 +0800
commit: 28a4aba23a93ca9096a60d3a556bf9c8be3e62cc (patch)
tree: 192b456a82121c61a640a3ecd704129e75ce1f4b /Section_5_VIM_HA.rst
parent: 5198884f87118fe0595e6882c9ea5c318b8b484a (diff)
1 files changed, 133 insertions, 0 deletions
diff --git a/Section_5_VIM_HA.rst b/Section_5_VIM_HA.rst
new file mode 100644
index 0000000..ac16515
--- /dev/null
+++ b/Section_5_VIM_HA.rst
@@ -0,0 +1,133 @@
+
+This section is about VIM high availability
+
+============================
+5     VIM High availability
+============================
+The VIM in the NFV reference architecture contains all the control nodes of OpenStack, SDN controllers
+and hardware controllers. It manages the NFVI according to the instructions/requests of the VNFM and
+NFVO and reports the NFVI status back to them. Guaranteeing the high availability of the VIM is
+a basic requirement of the OPNFV platform. The VIM should also provide mechanisms that VNFs can use
+to achieve their own high availability.
+
+5.1 Architecture requirement of VIM HA
+---------------------------------------
+The architecture of the control nodes should avoid any single point of failure, and the management
+network plane that connects the control nodes should also be redundant. Stateless services on the
+control nodes, such as nova-api and glance-api, should be redundant but need no data synchronization.
+Stateful services such as MySQL, RabbitMQ and the SDN controller require more complex redundancy policies.
+Clouds of different scales may also require different HA policies.
+
+Requirement:
+------------
+- In a small-scale scenario, an active-standby redundancy policy is acceptable.
+
+- In a large-scale scenario, all stateful services such as the database, message queue and SDN controller
+  should be deployed in cluster mode, which supports N-way and N+M active-standby redundancy.
+
+- In a large-scale scenario, all stateless services such as nova-api and glance-api should be deployed
+  in all-active mode.
+
+- Load balancer nodes that are introduced for the all-active and N+M modes should themselves avoid any single point
+  of failure.
+
+- All control node servers shall have at least two network ports to connect to different network
+  planes. These ports shall work in bonding mode.
+
+- Any failure of a service in a redundant pair should be detected, and the switchover should be carried out
+  automatically, in less than 5 seconds in total.
+
+- Status of services must be monitored.
+
+
+5.2 Fault detection and alarm requirement of VIM
+--------------------------------------------------
+A redundant architecture provides functional continuity for the VIM. For maintenance purposes,
+all failures in the VIM should be detected and notifications should be sent to the NFVO, VNFM and other
+VIM consumers.
+
+Requirement:
+------------
+- All hardware failures of control nodes should be detected and relevant alarms should be triggered.
+  OSS, NFVO, VNFM and other VIM consumers can subscribe to these alarms.
+
+- Software on control nodes, such as OpenStack or ODL, should be monitored by the clustering software
+  at process level, and alarms should be triggered when exceptions are detected.
+
+- Software on compute nodes, such as OpenStack/nova agents and OVS, should be monitored by a watchdog. When
+  exceptions are detected, the software should be restored automatically and alarms should be triggered.
+
+- Software on storage nodes, such as Ceph, should be monitored by a watchdog. When
+  exceptions are detected, the software should be restored automatically and alarms should be triggered.
+
+- Every alarm should include the failure time, failure location, failure type and failure level.
+
+- The VIM should provide an interface through which consumers can subscribe to alarms and notifications.
+
+- All alarms and notifications should be kept in the VIM for future inquiry, and the ageing policy of these records
+  should be configurable.
+
+- VIM should distinguish between the failure of the compute node and the failure of the host HW.
+
+- VIM should be able to publish the health status of the compute node to NFV MANO.
+
+5.3 HA mechanism of VIM provided for VNFs
+------------------------------------------
+When VNFs deploy their HA scheme, they usually require the underlying resources to provide some supporting mechanisms.
+This is similar to the hardware watchdog in traditional network devices. Virtualization also
+introduces other requirements, such as affinity and anti-affinity with respect to the allocation of the
+different virtual resources.
+
+Requirement
+------------
+- VIM should provide the ability to configure HA functions such as watchdog timers,
+  redundant network ports, etc. These HA functions should be properly tagged and exposed to
+  the VNF and VNFM through standard APIs.
+
+- VIM should provide an anti-affinity scheme so that a VNF can deploy redundant services on different levels of
+  resource aggregation.
+
+- VIM should be able to deploy classified virtual resources to VNFs following the SAL description in VNFD.
+
+- VIM should provide data collection to calculate the HA-related metrics for VNFs.
+
+- VIM should support the VNF/VNFM to initiate the operation of resources of the NFVI, such as repair/reboot.
+
+- VIM should correlate the failures detected on collocated virtual resources to identify latent faults in
+  HW and virtualization facilities.
+
+- VIM should be able to disallow the live migration of VMs, and when live migration is allowed it should be possible
+  to specify the tolerated interruption time.
+
+- VIM should be able to restrict the simultaneous migration of VMs hosting a given VNF.
+
+- VIM should provide APIs that allow the VNFM/VNF to trigger scale in/out.
+
+- When the scheduler of the VIM uses the active/active HA scheme, multiple scheduler instances must not create
+  a race condition.
+
+- VIM should be able to trigger the evacuation of the VMs before bringing the host down
+  when *maintenance mode* is set for the compute host.
+
+- VIM should configure Consoleauth in active/active HA mode, and should store the tokens in the database.
+
+- VIM should replace a failed VM with a new VM and this new VM should start in the same initial state
+  as the failed VM.
+
+- VIM should support policies to prioritize a certain VNF.
+
+5.4 SDN controller
+-------------------
+The SDN controller can be deployed in a distributed or a centralized model.
+
+Requirements
+-------------
+- In the centralized model, the SDN controller must be deployed as a redundant pair.
+
+- In the distributed model, a mastership election must determine which node is in overall control.
+
+- In the distributed model, the VNF should not be aware of the HA of the controller. That is, the controller is
+  a logically centralized system from the point of view of the NBI (Northbound Interface).
+
+- Event notification is required, as described in section 5.2.
+