author    | Chaoyi Huang <joehuang@huawei.com>         | 2015-11-06 00:18:27 +0000
committer | Gerrit Code Review <gerrit@172.30.200.206> | 2015-11-06 00:18:27 +0000
commit    | 817cb001b0ec1e30fd1db359cd71c00db59deba7 (patch)
tree      | 50b0bb3f67425ff16587c36f1d293025609976b3 /VNF_high_availability_across_VIM.rst
parent    | 2783cc9d74916667ffcbb4616b687b5dd2701933 (diff)
parent    | f16289a06b32a5a68029f3bedf5f502691f4e91a (diff)
Merge "Use Case 2 VNF_high_availability_across_VIM"
Diffstat (limited to 'VNF_high_availability_across_VIM.rst')
-rw-r--r-- | VNF_high_availability_across_VIM.rst | 160
1 file changed, 160 insertions, 0 deletions
diff --git a/VNF_high_availability_across_VIM.rst b/VNF_high_availability_across_VIM.rst
new file mode 100644
index 0000000..1a7d41b
--- /dev/null
+++ b/VNF_high_availability_across_VIM.rst
@@ -0,0 +1,160 @@

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode


================================
VNF high availability across VIM
================================

Problem description
===================

Abstract
--------

A VNF (telecom application) should be able to realize high availability
deployment across OpenStack instances.

Description
-----------

A VNF (telecom application running over cloud) may already be designed as
Active-Standby/Active-Active/N-Way to achieve high availability.

With a telecoms focus, this generally refers both to availability of service
(i.e. the ability to make new calls) and to maintenance of ongoing control
plane state and active media processing (i.e. "keeping up" existing calls).

Traditionally, telecoms systems are designed to maintain state and calls across
pretty much the full range of single-point failures. This includes power
supply, hard drive, physical server or network switch, but also covers software
failure and maintenance operations such as software upgrade.

Providing this support typically requires state replication between application
instances (directly, via replicated database services, or via a privately
designed message format). It may also require special-case handling of media
endpoints, to allow transfer of media on short timescales (<1s) without
requiring end-to-end re-signalling (e.g. RTP redirection via IP/MAC address
transfers, c.f. VRRP).

With a migration to NFV, a commonly expressed desire by carriers is to provide
the same resilience to any single point(s) of failure in the cloud
infrastructure.

This could be done by making each cloud instance fully HA (a non-trivial task
to do right, and to prove it has been done right), but the preferred approach
appears to be to accept the currently limited availability of a given cloud
instance (there is no desire to radically rework this for telecoms), and
instead to provide solution availability by spreading function across multiple
cloud instances (i.e. the same approach used today to deal with hardware and
software failures).

A further advantage of this approach is that it provides a good basis for
seamless upgrade of infrastructure software, where you can spin up an
additional up-level cloud, gradually transfer resources / application instances
over from one of your other clouds, and finally turn down the old cloud
instance when it is no longer required.

If fast media / control failover is still required (which many/most carriers
still seem to believe it is), there are some interesting/hard requirements on
the networking between cloud instances. To help with this, many people appear
willing to provide multiple "independent" cloud instances in a single
geographic site, with special networking between the clouds in that physical
site. "Independent" is in quotes because some coordination between cloud
instances is obviously required, but it has to be implemented in a fashion
which reduces the potential for correlated failure to very low levels (at
least as low as the required overall application availability).

Analysis of requirements on OpenStack
=====================================

The VNF often has different networking planes for different purposes:

* external network plane: used for communication with other VNFs
* components inter-communication plane: one VNF often consists of several
  components; this plane is designed for the components to communicate with
  each other
* backup plane: this plane is used for the heartbeat or state replication
  between the components of an active/standby, active/active or N-way cluster
* management plane: this plane is mainly for management purposes

Generally these planes are separated from each other, and for a legacy telecom
application, each internal plane will have its own fixed or flexible IP
addressing plan.

To make the VNF work in HA mode across different OpenStack instances in one
site (but not limited to one site), at least the backup plane needs to be
supported across the different OpenStack instances:

1) Overlay L2 networking or shared L2 provider networks as the backup plane
   for heartbeat or state replication. Overlay L2 networking is preferred,
   because it:

   a. supports legacy compatibility: some telecom applications come with a
      built-in internal L2 network, and providing an L2 network makes it
      easier to move these applications to VNFs;
   b. supports IP overlapping: multiple VNFs may have overlapping IP addresses
      for cross-OpenStack-instance networking.

   Therefore, overlay L2 networking across Neutron instances is a required
   OpenStack feature.

2) L3 networking across OpenStack instances for heartbeat or state
   replication. For L3 networking we can leverage the floating IPs provided by
   current Neutron, so no new OpenStack feature is required.

3) The IP address used by the VNF to connect with other VNFs should be able to
   float across OpenStack instances. For example, if the master fails, the IP
   address should be taken over by the standby running in another OpenStack
   instance. Methods such as VRRP/GARP can handle the movement of this
   external IP (a sketch follows this list), so no new feature needs to be
   added to OpenStack.
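To make the takeover described in 3) concrete, the following is a minimal
sketch of the gratuitous ARP announcement a newly active instance could send.
It assumes Scapy is available in the guest; the interface name, MAC address
and virtual IP are placeholder values, not something defined by this use
case::

    # Sketch: announce takeover of a virtual IP via gratuitous ARP.
    # All constants below are placeholders for illustration only.
    from scapy.all import ARP, Ether, sendp

    IFACE = "eth1"             # NIC on the external network plane
    VIP = "192.0.2.100"        # virtual IP shared by active/standby
    MAC = "fa:16:3e:11:22:33"  # MAC of the newly active instance

    def announce_takeover(count=3):
        """Broadcast gratuitous ARP replies so that upstream switches and
        peers update their ARP caches and redirect traffic here."""
        garp = (Ether(src=MAC, dst="ff:ff:ff:ff:ff:ff") /
                ARP(op=2,  # ARP reply ("is-at")
                    hwsrc=MAC, psrc=VIP,
                    hwdst="ff:ff:ff:ff:ff:ff", pdst=VIP))
        sendp(garp, iface=IFACE, count=count, verbose=False)

    if __name__ == "__main__":
        announce_takeover()

Note that for such a GARP to be effective on a Neutron port, the VIP has to be
permitted by port security, e.g. via the allowed-address-pairs extension;
otherwise the anti-spoofing rules will drop it.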
Prototype
---------

None.

Proposed solution
-----------------

From a requirements perspective, it is up to the application to decide whether
to use L2 or L3 networking across Neutron instances.

For Neutron, an L2 network consists of a set of ports. To make cross-Neutron
L2 networking workable, we need some fake remote ports in the local Neutron to
represent VMs in the remote site (the remote OpenStack instance).

A fake remote port will reside on some VTEP (for VxLAN). The tunneling IP
address of the VTEP should be an attribute of the fake remote port, so that
local ports can forward packets to the correct tunneling endpoint.

The idea is to add one more ML2 mechanism driver to capture the fake remote
port CRUD (create, retrieve, update, delete) operations.

When a fake remote port is added/updated/deleted, the ML2 mechanism driver for
these fake ports will trigger L2 population, so that the VTEP tunneling
endpoint information becomes known to the other local ports.

It is also required to be able to query a port's VTEP tunneling endpoint
information through the Neutron API, so that this information can be used to
create the corresponding fake remote port in the other Neutron instance.

In the past, a port's VTEP IP address was the IP of the host where the VM
resides, but the BP https://review.openstack.org/#/c/215409/ will free the
port from being bound to the host IP as the tunneling endpoint; you can even
specify an L2GW IP address as the tunneling endpoint.
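Such a mechanism driver might look roughly like the sketch below, written
against the 2015-era in-tree ML2 driver API
(``neutron.plugins.ml2.driver_api``). The ``binding:profile`` keys
(``remote_port``, ``tunnel_ip``) and the ``_fanout_fdb_entries`` helper are
illustrative assumptions of this sketch, not an existing Neutron interface::

    # Illustrative sketch only: an ML2 mechanism driver reacting to
    # "fake remote port" CRUD. The binding:profile keys ("remote_port",
    # "tunnel_ip") and the fan-out helper are assumptions, not an
    # existing Neutron interface.
    from neutron.plugins.ml2 import driver_api as api


    class FakeRemotePortDriver(api.MechanismDriver):

        def initialize(self):
            pass

        def _is_fake_remote(self, port):
            profile = port.get('binding:profile') or {}
            return profile.get('remote_port', False)

        def create_port_postcommit(self, context):
            if self._is_fake_remote(context.current):
                # Announce the remote VTEP so local L2 agents install
                # forwarding entries toward the remote OpenStack instance.
                self._fanout_fdb_entries('add', context.current)

        def update_port_postcommit(self, context):
            if self._is_fake_remote(context.current):
                self._fanout_fdb_entries('update', context.current)

        def delete_port_postcommit(self, context):
            if self._is_fake_remote(context.current):
                self._fanout_fdb_entries('remove', context.current)

        def _fanout_fdb_entries(self, action, port):
            # Hypothetical helper: build an l2pop-style FDB entry from
            # the port's MAC/IP and the VTEP tunneling IP carried in
            # binding:profile, then fan it out to the L2 agents.
            tunnel_ip = (port.get('binding:profile') or {}).get('tunnel_ip')
            ...

How the VTEP attributes are actually modelled, and how they reach the l2pop
RPC machinery, would be defined by the work item tracked below.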
Therefore, a new BP will be registered to handle these fake remote ports, in
order to make cross-Neutron L2 networking feasible. An RFE has been registered
first: https://bugs.launchpad.net/neutron/+bug/1484005


Gaps
====

1) Fake remote port for cross-Neutron L2 networking


**NAME-THE-MODULE issues:**

* Neutron

Affected By
-----------

OPNFV multisite cloud.

References
==========