author    | Chaoyi Huang <joehuang@huawei.com> | 2015-09-30 11:23:32 +0800
committer | Chaoyi Huang <joehuang@huawei.com> | 2015-09-30 11:41:14 +0800
commit    | f16289a06b32a5a68029f3bedf5f502691f4e91a (patch)
tree      | 752acd260d9663caf196b7fcdf5450794e3afc46 /VNF_high_availability_across_VIM.rst
parent    | 9009692b06aa7323af3c318d02de21d0fa1ac335 (diff)
Use Case 2 VNF_high_availability_across_VIM
Cross-Neutron L2/L3 networking to support a high-availability telecom
application (VNF) deployed across different OpenStack instances.
Change-Id: I46ab9de72193e0b06efa6c9d79b8b2a976d1c68b
Signed-off-by: Chaoyi Huang <joehuang@huawei.com>
Diffstat (limited to 'VNF_high_availability_across_VIM.rst')
-rw-r--r-- | VNF_high_availability_across_VIM.rst | 160
1 file changed, 160 insertions, 0 deletions
diff --git a/VNF_high_availability_across_VIM.rst b/VNF_high_availability_across_VIM.rst
new file mode 100644
index 0000000..1a7d41b
--- /dev/null
+++ b/VNF_high_availability_across_VIM.rst
@@ -0,0 +1,160 @@

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode


================================
VNF high availability across VIM
================================

Problem description
===================

Abstract
--------

A VNF (telecom application) should be able to realize high availability
deployment across OpenStack instances.

Description
-----------

A VNF (telecom application running over the cloud) may already be designed as
Active-Standby, Active-Active or N-Way to achieve high availability.

With a telecoms focus, this generally refers both to availability of service
(i.e. the ability to make new calls) and to maintenance of ongoing control
plane state and active media processing (i.e. "keeping up" existing calls).

Traditionally, telecoms systems are designed to maintain state and calls across
pretty much the full range of single-point failures. This includes failure of a
power supply, hard drive, physical server or network switch, but also covers
software failure and maintenance operations such as software upgrade.

Providing this support typically requires state replication between
application instances (directly, via replicated database services, or via a
privately designed message format). It may also require special-case handling
of media endpoints, to allow transfer of media on short time scales (<1s)
without requiring end-to-end re-signalling (e.g. RTP redirection via IP / MAC
address transfer, cf. VRRP).

With a migration to NFV, a commonly expressed desire by carriers is to provide
the same resilience to any single point(s) of failure in the cloud
infrastructure.

This could be done by making each cloud instance fully HA (a non-trivial task
to do right, and to prove it has been done right), but the preferred approach
appears to be to accept the currently limited availability of a given cloud
instance (there is no desire to radically rework this for telecoms) and instead
to provide solution availability by spreading function across multiple cloud
instances (i.e. the same approach used today to deal with hardware and software
failures).

A further advantage of this approach is that it provides a good basis for
seamless upgrade of the infrastructure software revision: you can spin up an
additional up-level cloud, gradually transfer resources / application instances
over from one of your other clouds, and finally turn down the old cloud
instance when it is no longer required.

If fast media / control failover is still required (which many, if not most,
carriers still seem to believe it is), there are some interesting and hard
requirements on the networking between cloud instances. To help with this,
many people appear willing to provide multiple "independent" cloud instances
in a single geographic site, with special networking between the clouds in
that physical site. "Independent" is in quotes because some coordination
between cloud instances is obviously required, but it has to be implemented in
a fashion which reduces the potential for correlated failure to very low
levels (at least as low as the required overall application availability).
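
The IP / MAC address transfer mentioned above is typically announced with a
gratuitous ARP, as VRRP implementations do. The following is a minimal
illustrative sketch of such an announcement, assuming the scapy Python
library; the service IP and interface name are placeholders, not part of this
proposal.

.. code-block:: python

    # Sketch: announce an IP / MAC takeover with gratuitous ARP (the
    # VRRP-style mechanism referred to above). Placeholder values only.
    from scapy.all import ARP, Ether, sendp

    SERVICE_IP = "203.0.113.10"   # shared service IP being taken over
    IFACE = "eth0"                # interface on the newly active instance

    def announce_takeover():
        # Gratuitous ARP: op=2 ("is-at"), sender and target protocol
        # addresses both set to the service IP, broadcast so that peers
        # refresh their ARP caches and switches relearn the MAC, which
        # redirects RTP / control traffic without re-signalling.
        garp = Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(
            op=2, psrc=SERVICE_IP, pdst=SERVICE_IP)
        sendp(garp, iface=IFACE, count=3)

    if __name__ == "__main__":
        announce_takeover()
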
Analysis of requirements on OpenStack
=====================================

A VNF often has different networking planes for different purposes:

* external network plane: used for communication with other VNFs
* inter-communication plane: one VNF often consists of several components;
  this plane is designed for the components to communicate with each other
* backup plane: used for the heartbeat or state replication between the
  components of an active/standby, active/active or N-way cluster
* management plane: mainly for management purposes

Generally these planes are separated from each other, and for a legacy telecom
application each internal plane will have its own fixed or flexible IP
addressing plan.

To make the VNF work in HA mode across different OpenStack instances in one
site (but not limited to one site), at least the backup plane needs to be
supported across different OpenStack instances:

1) Overlay L2 networking or shared L2 provider networks as the backup plane
for heartbeat or state replication. Overlay L2 networking is preferred,
because:

a. it supports legacy compatibility: some telecom applications have a built-in
internal L2 network, and providing an L2 network makes it easier to move these
applications to VNFs;

b. it supports IP overlapping: multiple VNFs may have overlapping IP addresses
for cross-OpenStack-instance networking.

Therefore, an overlay L2 networking across Neutron feature is required in
OpenStack.

2) L3 networking across OpenStack instances for heartbeat or state
replication. For L3 networking we can leverage the floating IPs provided by
current Neutron, so there is no new feature requirement on OpenStack.

3) The IP address used by the VNF to connect with other VNFs should be able to
float across OpenStack instances. For example, if the master fails, its IP
address should be taken over by the standby running in another OpenStack
instance. Methods such as VRRP/GARP can handle the movement of this external
IP (see the gratuitous ARP sketch at the end of the Problem description), so
no new feature needs to be added to OpenStack.


Prototype
---------

None.

Proposed solution
-----------------

From a requirements perspective, it is up to the application to decide whether
to use L2 or L3 networking across Neutron.

For Neutron, an L2 network consists of many ports. To make cross-Neutron L2
networking workable, we need some fake remote ports in the local Neutron to
represent VMs in the remote site (the remote OpenStack instance).

A fake remote port will reside on some VTEP (for VxLAN); the tunneling IP
address of that VTEP should be an attribute of the fake remote port, so that
local ports can forward packets to the correct tunneling endpoint.

The idea is to add one more ML2 mechanism driver to capture fake remote port
CRUD (create, retrieve, update, delete) operations; a sketch of such a driver
is given below.

When a fake remote port is added/updated/deleted, the ML2 mechanism driver for
these fake ports will activate L2 population, so that the VTEP tunneling
endpoint information becomes known to the other local ports.

It is also required to be able to query a port's VTEP tunneling endpoint
information through the Neutron API, in order to use this information to
create the corresponding fake remote port in another Neutron.

Until now, a port's VTEP IP address has been the IP of the host where the VM
resides, but the BP https://review.openstack.org/#/c/215409/ will free the
port from binding to the host IP as the tunneling endpoint; you will even be
able to specify an L2GW IP address as the tunneling endpoint.
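
To make the proposal concrete, the sketch below shows what such an ML2
mechanism driver could look like. It is only a sketch under assumptions: the
``remote_vtep_ip`` key in ``binding:profile`` and the ``_l2pop_*`` helpers are
hypothetical conventions, not an agreed design; only the MechanismDriver hooks
themselves are standard ML2 API.

.. code-block:: python

    # Illustrative sketch of an ML2 mechanism driver reacting to fake
    # remote port CRUD. The binding:profile key "remote_vtep_ip" and the
    # _l2pop_* helpers are hypothetical placeholders.
    from neutron.plugins.ml2 import driver_api as api

    class FakeRemotePortDriver(api.MechanismDriver):
        """Feeds fake remote ports' VTEP endpoints into L2 population."""

        def initialize(self):
            pass

        @staticmethod
        def _remote_vtep(port):
            # Hypothetical convention: the site that creates the fake
            # remote port stores the remote VTEP tunneling IP address
            # in the port's binding:profile.
            return (port.get('binding:profile') or {}).get('remote_vtep_ip')

        def create_port_postcommit(self, context):
            vtep_ip = self._remote_vtep(context.current)
            if vtep_ip:
                # Publish an FDB entry (port MAC/IP -> remote VTEP) so
                # local ports learn where to tunnel VxLAN packets.
                self._l2pop_add_fdb(context.current, vtep_ip)

        def delete_port_postcommit(self, context):
            vtep_ip = self._remote_vtep(context.current)
            if vtep_ip:
                self._l2pop_remove_fdb(context.current, vtep_ip)

        def _l2pop_add_fdb(self, port, vtep_ip):
            # Placeholder: would notify the l2population RPC endpoints;
            # elided in this sketch.
            pass

        def _l2pop_remove_fdb(self, port, vtep_ip):
            pass

A real driver would also handle port updates and be registered under the
``neutron.ml2.mechanism_drivers`` entry point; the peer Neutron would then
create the fake remote port through the ordinary port-create API, carrying the
VTEP IP obtained from the query API described above.
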
Therefore a new BP will be registered to handle the fake remote port, in order
to make cross-Neutron L2 networking feasible. An RFE has been registered
first: https://bugs.launchpad.net/neutron/+bug/1484005


Gaps
====

1) fake remote port for cross-Neutron L2 networking


**NAME-THE-MODULE issues:**

* Neutron

Affected By
-----------

OPNFV multisite cloud.

References
==========