From 197313b46e5e1b21f71ff5e264f43a284ac20dbe Mon Sep 17 00:00:00 2001
From: fuqiao
Date: Wed, 25 Nov 2015 10:11:59 +0800
Subject: Scenario analysis doc - general issues for VNF HA

Scenario Analysis doc - general issues for VNF HA

JIRA:HA 15

Change-Id: I8dff0d1120ac4f5046f678667204cb6bc80d761e
---
 Scenario/scenario_analysis_multi_site.rst          | 70 ----------------
 .../scenario_analysis_VNF_external_interface.rst   | 97 ++++++++++++++++++++++
 2 files changed, 97 insertions(+), 70 deletions(-)
 delete mode 100644 Scenario/scenario_analysis_multi_site.rst
 create mode 100644 Scenario_1/scenario_analysis_VNF_external_interface.rst

diff --git a/Scenario/scenario_analysis_multi_site.rst b/Scenario/scenario_analysis_multi_site.rst
deleted file mode 100644
index 016fe58..0000000
--- a/Scenario/scenario_analysis_multi_site.rst
+++ /dev/null
@@ -1,70 +0,0 @@
-5, Multisite Scenario
-====================================================
-
-The Multisite scenario refers to the cases when VNFs are deployed on multiple VIMs.
-There could be two typical usecases for such scenario.
-
-One is in one DC, multiple openstack cloud are deployed. Taking consideration that the
-number of compute nodes in one openstack cloud are quite limited (nearly 100) for
-both opensource and commercial product of openstack, multiple openstack cloud will
-have to be deployed in the DC to manage thousands of servers. VNFs in such DC should
-be possible to be deployed accross openstack cloud.
-..[MT] Do we anticipate HA VNFs that require more than 100 VMs so that they need to
-be deployed across DCs? Or the goal is to provide higher availability by deploying
-across DCs?
-..[fq] Here I just try to explain what multisite scenario means. I don't think HA should
-be discussed in this scenario since as you said, we can not have 100 more VMs deployed
-to be HA.
-
-The other typical usecase is geographic redundancy.
-GR deployment is to deal with more
-catastrophic failures (flood, earthquake, propagating software fault, and etc.) for one site.
-In the Geographic redundancy usecase, VNFs are deployed in two sites, which are
-geographically seperated and are managed by seperate VIM. When such a catastrophic
-failure happens, the VNFs at the failed site can failover to the redundant one so as to
-proceed the service.
-..[MT] I agree and this scenario is definitely not limited to HA VNFs. Thus there could
-be different mechanisms for the state replication between the sites and from an HA
-perspective in this case it is important that the replication mechanism does not degrade
-the performance at normal behaviour.
-
-The multisite scenario is also captured by the Multisite project, in which specific
-requirements of openstack are also proposed for different usecases. However,
-the multisite project mainly focuses on the requirement of these multisite
-usecases on openstack. HA requirements are not necessarily the requirement
-for the approaches discussed in multisite. While the HA project tries to
-capture the HA requirements in these usecases.
-https://gerrit.opnfv.org/gerrit/#/c/2123/
-https://gerrit.opnfv.org/gerrit/#/c/1438/.
-
-
-An architecure of stateful VNF with redundancy in the multisite scenario can be as
-follows. Architecture for the other cases can be worked out accordingly.
-https://wiki.opnfv.org/_detail/stateful_vnf_in_multisite_scenario.png?id=scenario_analysis_of_high_availability_in_nfv
-..[MT] What is the relation of the VMs of a single site e.g. on the left hand side?
-Do they collaborate? Do they protect each other? What makes the two VIMs independent
-if they need to support that VNF and its VNFM? Could they be logically the same
-VIM and wouldn't that be a better solution for the VNF?
-..[fq] This is kind of architecture captureed from the multisite project's work.
-One VM on the left site is acting as the active VNFC, and the other VM at the right
-site is acting as the standby. I assume the two VIM are cooperate with each other
-under the control of the orchestrator. I am also thinking that if the two VMs contrled
-by one VIM would be a better solution. But apparently that is not the scenario for
-multisite, cause they are thinking multisite means you have multi openstack.
-
-
-Below listed the additinal labor and extra requirements of multisite comparing with
-the basic usecases.
-
-1, specific network support for the active/standby or active/active VNFs across VIM.
-
-In the multisite scenario, instances constructing the VNFs can be placed across VIM.
-This will introduce extra network support requirement. For example, heartbeat between
-active/standby VMs placed across VIM requires overlay L2 network. The IP address used
-for VNF to connect with other VNFs should be able to be floating across VIM as well.
-
-2, in the multisite scenario, a logical instance of VNFM should be put on multiple
-VIM to manage the instances of VNFs placed across the VIM.
-
-3, in the VM failure scenarios, recovery of failed VM requires interface between
-VNFM and the VIM. In the multisite scenario, the VNFM should have knowledge of
-which VIM it should communicate with so as to recover the failed VNF.
\ No newline at end of file
diff --git a/Scenario_1/scenario_analysis_VNF_external_interface.rst b/Scenario_1/scenario_analysis_VNF_external_interface.rst
new file mode 100644
index 0000000..7667993
--- /dev/null
+++ b/Scenario_1/scenario_analysis_VNF_external_interface.rst
@@ -0,0 +1,97 @@
+2. Discussion of the General Issues for VNF HA Schemes
+===========================================================
+
+This section discusses some general issues in the VNF HA schemes.
+In section 1, the use cases of both stateful and stateless VNFs are discussed.
+In this section, we discuss some specific issues
+that are common to all the use cases proposed in the previous sections.
+
+2.1. VNF External Interface
+---------------------------
+
+Regardless of whether the VNF is stateful or stateless, all the VNFCs should act as
+a single unit from the perspective of the outside world. That means all the VNFCs
+should share a common interface through which the outside modules (e.g., the other
+VNFs) can access the service. There could be multiple solutions for sharing this
+IP interface. However, all of this sharing and switching of the IP address should
+be transparent to the outside modules.
+
+There are several approaches for the VNFs to share the interface. A few of them
+are listed as follows and will be discussed in detail:
+
+1) a common IP address for the active/standby VMs;
+
+2) a load balancer for active/active use cases.
+
+Note that a combination of these two approaches is also feasible.
+
+For active/standby VNFCs, the HA manager manages a common IP address for
+the active and standby VMs, so that they appear as one instance from the outside.
+(The HA manager may not be aware of this, i.e., the address may be configured
+and the active/standby state management linked to the possession of the IP
+address, i.e., the active VNFC claims it as part of becoming active.) Only the
+active one possesses the IP address. When failover happens, the standby is set
+to active and takes possession of the IP address to continue traffic processing.
+
+..[MT] In general I would rather say that the IP address is managed by the HA
+manager and not provided. But as a concrete use case "provide" works fine.
+So it depends how you want to use this text.
+..[fq] Agree, thank you!
+
+For active/active VNFCs, an LB (Load Balancer) could be used. In such a scenario,
+there are two cases for the deployment and usage of the LB.
+
+Case 1: LB used in front of a cluster of VNFCs to distribute the traffic flow.
+
+In this case, the LB is deployed in front of a cluster of multiple VNFCs. Such a
+cluster can be managed by a separate cluster manager, or can be managed just by
+the LB, which uses heartbeats to monitor each VNFC. When one of the VNFCs fails,
+the cluster manager should recover the failed one, and should also exclude the
+failed VNFC from the cluster so that the LB will re-route the traffic to
+the other VNFCs. In the case where the LB acts as the cluster manager, it is
+the LB's responsibility to inform the VNFM to recover the failed VNFC if possible.
+
+
+Case 2: LB used in front of a cluster of VMs to distribute the traffic flow.
+
+In this case, there is a cluster manager (e.g., Pacemaker) to monitor and manage
+the VMs in the cluster. The LB sits in front of the VM cluster so as to distribute
+the traffic. When one of the VMs fails, the cluster manager will detect that and
+will be in charge of the recovery. The cluster manager will also exclude the failed
+VM from the cluster, so that the LB won't re-route traffic to the failed one.
+
+In both cases, the HA of the LB itself should also be considered.
+
+..[MT] I think this use case needs to show also how the LB learns about the new VNFC.
+Also we should distinguish VNFC and VM failures as a VNFC failure wouldn't be detected
+in the NFVI, e.g. by the LB, so we need a resolution, an applicability comment at least.
+..[fq] I think I have made a mistake here by saying VNFC. Actually, if the failure
+only happens in the VNFC, the VNFC should reboot itself rather than have a new VNFC
+take its place. So in this case, I think I should change VNFC to VM. And as you
+mentioned, the NFVI level can hardly detect VNFC level failures.
+
+..[MT] There could also be a combined case for the N+M redundancy, when there are N
+actives but also M standbys at the VNF level.
+..[fq] It could be. But I actually haven't seen such a case deployed, so I am not
+sure whether I can describe the schemes correctly :)
+
+2.2. Intra-VNF Communication
+----------------------------
+
+For stateful VNFs, data synchronization is necessary between the active and standby
+VMs. The HA manager is responsible for handling VNFC failover and for assigning the
+active/standby states to the VNFCs of the VNF. Data synchronization can be handled
+either by the HA manager or by the VNFC itself.
+
+The state synchronization can happen as:
+
+- direct communication between the active and the standby VNFCs;
+
+- information received from the HA manager on a channel, or messages using a common queue;
+
+..[MT] I don't understand the yellow inserted text
+..[fq] Neither do I, actually. I think it was added by someone else and I can't
+make out what it means either :)
+
+- a shared storage assigned to the whole VNF;
+
+- an in-memory database (checkpointing), where the database (checkpoint service) takes care of the data replication.
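As an illustrative aside (not part of the patch itself), the active/standby scheme of the new document, where role management is tied to possession of the shared IP address and session state is replicated to the standby, can be sketched as a small in-process model. All names here (`Vnfc`, `HaManager`, the address `10.0.0.100`) are hypothetical; a real deployment would move an actual floating IP (e.g. via the VIM or a VRRP-style protocol) and carry heartbeats over the network rather than method calls.

```python
import time


class Vnfc:
    """Hypothetical model of one VNFC instance (one VM)."""

    def __init__(self, name):
        self.name = name
        self.role = "standby"
        self.state = {}                       # replicated session state
        self.last_heartbeat = time.monotonic()
        self.alive = True

    def heartbeat(self):
        # A live VNFC periodically refreshes its heartbeat timestamp.
        if self.alive:
            self.last_heartbeat = time.monotonic()


class HaManager:
    """Sketch of an HA manager: assigns roles, replicates state, and
    moves the shared service IP to the promoted standby on failover."""

    def __init__(self, service_ip, vnfcs, timeout=3.0):
        self.service_ip = service_ip
        self.vnfcs = vnfcs
        self.timeout = timeout                # max heartbeat silence, seconds
        self.active = vnfcs[0]
        self.active.role = "active"

    def replicate(self, key, value):
        # One of the sync options listed above: push state to every VNFC
        # so a promoted standby can continue existing sessions.
        for v in self.vnfcs:
            v.state[key] = value

    def check_failover(self, now=None):
        # If the active VNFC missed its heartbeat window, promote a live
        # standby; possession of the service IP follows the active role.
        now = time.monotonic() if now is None else now
        if now - self.active.last_heartbeat <= self.timeout:
            return self.active
        failed = self.active
        failed.role = "failed"
        self.active = next(v for v in self.vnfcs if v is not failed and v.alive)
        self.active.role = "active"
        return self.active

    def ip_owner(self):
        # From the outside, the service IP always answers: callers never
        # see which VNFC currently holds it.
        return self.active.name


# Minimal usage: replicate a session, fail the active, verify takeover.
a, b = Vnfc("vnfc-a"), Vnfc("vnfc-b")
mgr = HaManager("10.0.0.100", [a, b])
mgr.replicate("session-1", "established")
a.alive = False
promoted = mgr.check_failover(now=a.last_heartbeat + 10.0)
assert promoted is b and b.state["session-1"] == "established"
```

The point of the sketch is the invariant the document asks for: the outside world only ever addresses `service_ip`, and failover changes which VNFC owns it without the caller noticing, provided the replicated state is current.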