From 197313b46e5e1b21f71ff5e264f43a284ac20dbe Mon Sep 17 00:00:00 2001
From: fuqiao
Date: Wed, 25 Nov 2015 10:11:59 +0800
Subject: Scenario analysis doc - general issues for VNF HA

Scenario Analysis doc - general issues for VNF HA

JIRA:HA 15

Change-Id: I8dff0d1120ac4f5046f678667204cb6bc80d761e
---
 Scenario/scenario_analysis_multi_site.rst          | 70 ----------------
 .../scenario_analysis_VNF_external_interface.rst   | 97 ++++++++++++++++++++++
 2 files changed, 97 insertions(+), 70 deletions(-)
 delete mode 100644 Scenario/scenario_analysis_multi_site.rst
 create mode 100644 Scenario_1/scenario_analysis_VNF_external_interface.rst

diff --git a/Scenario/scenario_analysis_multi_site.rst b/Scenario/scenario_analysis_multi_site.rst
deleted file mode 100644
index 016fe58..0000000
--- a/Scenario/scenario_analysis_multi_site.rst
+++ /dev/null
@@ -1,70 +0,0 @@
-5, Multisite Scenario
-====================================================
-
-The Multisite scenario refers to the cases when VNFs are deployed on multiple VIMs.
-There could be two typical usecases for such scenario.
-
-One is in one DC, multiple openstack cloud are deployed. Taking consideration that the
-number of compute nodes in one openstack cloud are quite limited (nearly 100) for
-both opensource and commercial product of openstack, multiple openstack cloud will
-have to be deployed in the DC to manage thousands of servers. VNFs in such DC should
-be possible to be deployed accross openstack cloud.
-..[MT] Do we anticipate HA VNFs that require more than 100 VMs so that they need to
-be deployed across DCs? Or the goal is to provide higher availability by deploying
-across DCs?
-..[fq] Here I just try to explain what multisite scenario means. I don't think HA should
-be discussed in this scenario since as you said, we can not have 100 more VMs deployed
-to be HA.
-
-The other typical usecase is geographic redundancy.
-GR deployment is to deal with more
-catastrophic failures (flood, earthquake, propagating software fault, and etc.) for one site.
-In the Geographic redundancy usecase, VNFs are deployed in two sites, which are
-geographically seperated and are managed by seperate VIM. When such a catastrophic
-failure happens, the VNFs at the failed site can failover to the redundant one so as to
-proceed the service.
-..[MT] I agree and this scenario is definitely not limited to HA VNFs. Thus there could
-be different mechanisms for the state replication between the sites and from an HA
-perspective in this case it is important that the replication mechanism does not degrade
-the performance at normal behaviour.
-
-The multisite scenario is also captured by the Multisite project, in which specific
-requirements of openstack are also proposed for different usecases. However,
-the multisite project mainly focuses on the requirement of these multisite
-usecases on openstack. HA requirements are not necessarily the requirement
-for the approaches discussed in multisite. While the HA project tries to
-capture the HA requirements in these usecases.
-https://gerrit.opnfv.org/gerrit/#/c/2123/
-https://gerrit.opnfv.org/gerrit/#/c/1438/.
-
-
-An architecure of stateful VNF with redundancy in the multisite scenario can be as
-follows. Architecture for the other cases can be worked out accordingly.
-https://wiki.opnfv.org/_detail/stateful_vnf_in_multisite_scenario.png?id=scenario_analysis_of_high_availability_in_nfv
-..[MT] What is the relation of the VMs of a single site e.g. on the left hand side?
-Do they collaborate? Do they protect each other? What makes the two VIMs independent
-if they need to support that VNF and its VNFM? Could they be logically the same
-VIM and wouldn't that be a better solution for the VNF?
-..[fq] This is kind of architecture captureed from the multisite project's work.
-One VM on the left site is acting as the active VNFC, and the other VM at the right
-site is acting as the standby. I assume the two VIM are cooperate with each other
-under the control of the orchestrator. I am also thinking that if the two VMs contrled
-by one VIM would be a better solution. But apparently that is not the scenario for
-multisite, cause they are thinking multisite means you have multi openstack.
-
-
-Below listed the additinal labor and extra requirements of multisite comparing with
-the basic usecases.
-
-1, specific network support for the active/standby or active/active VNFs across VIM.
-
-In the multisite scenario, instances constructing the VNFs can be placed across VIM.
-This will introduce extra network support requirement. For example, heartbeat between
-active/standby VMs placed across VIM requires overlay L2 network. The IP address used
-for VNF to connect with other VNFs should be able to be floating across VIM as well.
-
-2, in the multisite scenario, a logical instance of VNFM should be put on multiple
-VIM to manage the instances of VNFs placed across the VIM.
-
-3, in the VM failure scenarios, recovery of failed VM requires interface between
-VNFM and the VIM. In the multisite scenario, the VNFM should have knowledge of
-which VIM it should communicate with so as to recover the failed VNF.
\ No newline at end of file
diff --git a/Scenario_1/scenario_analysis_VNF_external_interface.rst b/Scenario_1/scenario_analysis_VNF_external_interface.rst
new file mode 100644
index 0000000..7667993
--- /dev/null
+++ b/Scenario_1/scenario_analysis_VNF_external_interface.rst
@@ -0,0 +1,97 @@
+2. Discussion of the General Issues for VNF HA Schemes
+===========================================================
+
+This section discusses some general issues in the VNF HA schemes.
+In section 1, the use cases of both stateful and stateless VNFs are discussed.
+In this section, we discuss some specific issues
+that are common to all the use cases proposed in the previous sections.
+
+2.1. VNF External Interface
+---------------------------
+
+Regardless of whether the VNF is stateful or stateless, all the VNFCs should act as
+a single unit from the perspective of the outside world. That means all the VNFCs
+should share a common interface through which the outside modules (e.g., the other
+VNFs) can access the service. There could be multiple solutions for sharing this
+IP interface. However, all of this sharing and switching of the IP address should
+be transparent to the outside modules.
+
+There are several approaches for the VNFs to share the interface. A few of them
+are listed as follows and will be discussed in detail:
+
+1) a common IP address for the active/standby VMs;
+
+2) a load balancer for active/active use cases.
+
+Note that a combination of these two approaches is also feasible.
+
+For active/standby VNFCs, the HA manager manages a common IP address for
+the active and standby VMs, so that they appear as one instance from the outside.
+(The HA manager may not be aware of this, i.e., the address may be configured
+and the active/standby state management linked to the possession of the IP
+address, i.e., the active VNFC claims it as part of becoming active.) Only the
+active one possesses the IP address. When failover happens, the standby is set
+to active and takes possession of the IP address to continue traffic processing.
+
+..[MT] In general I would rather say that the IP address is managed by the HA
+manager and not provided. But as a concrete use case "provide" works fine.
+So it depends how you want to use this text.
+..[fq] Agree, thank you!
+
+For active/active VNFCs, an LB (Load Balancer) could be used. In such a scenario,
+there are two cases for the deployment and usage of the LB.
+
+Case 1: LB used in front of a cluster of VNFCs to distribute the traffic flow.
+
+In this case, the LB is deployed in front of a cluster of multiple VNFCs. Such a
+cluster can be managed by a separate cluster manager, or can be managed just by
+the LB, which uses heartbeats to monitor each VNFC. When one of the VNFCs fails,
+the cluster manager should recover the failed one, and should also exclude the
+failed VNFC from the cluster so that the LB will re-route the traffic to
+the other VNFCs. In the case where the LB acts as the cluster manager, it is
+the LB's responsibility to inform the VNFM to recover the failed VNFC if possible.
+
+
+Case 2: LB used in front of a cluster of VMs to distribute the traffic flow.
+
+In this case, there is a cluster manager (e.g., Pacemaker) to monitor and manage
+the VMs in the cluster. The LB sits in front of the VM cluster so as to distribute
+the traffic. When one of the VMs fails, the cluster manager will detect that and
+will be in charge of the recovery. The cluster manager will also exclude the failed
+VM from the cluster, so that the LB won't re-route traffic to the failed one.
+
+In both cases, the HA of the LB itself should also be considered.
+
+..[MT] I think this use case needs to show also how the LB learns about the new VNFC.
+Also we should distinguish VNFC and VM failures as a VNFC failure wouldn't be detected
+in the NFVI, e.g. by the LB, so we need a resolution, an applicability comment at least.
+..[fq] I think I have made a mistake here by saying VNFC. Actually, if the failure
+only happens in the VNFC, the VNFC should reboot itself rather than have a new VNFC
+take its place. So in this case, I think I should change VNFC to VM. And as you
+mentioned, the NFVI level can hardly detect VNFC level failures.
+
+..[MT] There could also be a combined case for the N+M redundancy, when there are N
+actives but also M standbys at the VNF level.
+..[fq] It could be. But I actually haven't seen such a case deployed, so I am not
+sure whether I can describe the schemes correctly :)
+
+2.2. Intra-VNF Communication
+----------------------------
+
+For stateful VNFs, data synchronization is necessary between the active and standby
+VMs. The HA manager is responsible for handling VNFC failover and for assigning the
+active/standby states to the VNFCs of the VNF. Data synchronization can be handled
+either by the HA manager or by the VNFC itself.
+
+The state synchronization can happen as:
+
+- direct communication between the active and the standby VNFCs;
+
+- information received from the HA manager on a channel, or messages using a common queue;
+
+..[MT] I don't understand the yellow inserted text
+..[fq] Neither do I, actually. I think it was added by someone else and I can't
+make out what it means either :)
+
+- a shared storage assigned to the whole VNF;
+
+- an in-memory database (checkpointing), where the database (checkpoint service) takes care of the data replication.
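As an illustrative aside (not part of the patch itself), the active/standby scheme of the new document, where role management is tied to possession of the shared IP address and session state is replicated to the standby, can be sketched as a small in-process model. All names here (`Vnfc`, `HaManager`, the address `10.0.0.100`) are hypothetical; a real deployment would move an actual floating IP (e.g. via the VIM or a VRRP-style protocol) and carry heartbeats over the network rather than method calls.

```python
import time


class Vnfc:
    """Hypothetical model of one VNFC instance (one VM)."""

    def __init__(self, name):
        self.name = name
        self.role = "standby"
        self.state = {}                       # replicated session state
        self.last_heartbeat = time.monotonic()
        self.alive = True

    def heartbeat(self):
        # A live VNFC periodically refreshes its heartbeat timestamp.
        if self.alive:
            self.last_heartbeat = time.monotonic()


class HaManager:
    """Sketch of an HA manager: assigns roles, replicates state, and
    moves the shared service IP to the promoted standby on failover."""

    def __init__(self, service_ip, vnfcs, timeout=3.0):
        self.service_ip = service_ip
        self.vnfcs = vnfcs
        self.timeout = timeout                # max heartbeat silence, seconds
        self.active = vnfcs[0]
        self.active.role = "active"

    def replicate(self, key, value):
        # One of the sync options listed above: push state to every VNFC
        # so a promoted standby can continue existing sessions.
        for v in self.vnfcs:
            v.state[key] = value

    def check_failover(self, now=None):
        # If the active VNFC missed its heartbeat window, promote a live
        # standby; possession of the service IP follows the active role.
        now = time.monotonic() if now is None else now
        if now - self.active.last_heartbeat <= self.timeout:
            return self.active
        failed = self.active
        failed.role = "failed"
        self.active = next(v for v in self.vnfcs if v is not failed and v.alive)
        self.active.role = "active"
        return self.active

    def ip_owner(self):
        # From the outside, the service IP always answers: callers never
        # see which VNFC currently holds it.
        return self.active.name


# Minimal usage: replicate a session, fail the active, verify takeover.
a, b = Vnfc("vnfc-a"), Vnfc("vnfc-b")
mgr = HaManager("10.0.0.100", [a, b])
mgr.replicate("session-1", "established")
a.alive = False
promoted = mgr.check_failover(now=a.last_heartbeat + 10.0)
assert promoted is b and b.state["session-1"] == "established"
```

The point of the sketch is the invariant the document asks for: the outside world only ever addresses `service_ip`, and failover changes which VNFC owns it without the caller noticing, provided the replicated state is current.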