This document will provide an overall framework for the high availability
deployment of NFV system. It will also continiously update to include HA
deployment guidelines and suggestions for the releases of OPNFV.

*********************************************************************
Overview of High Available Deployment of OPNFV
*********************************************************************

In this section, we would like to discuss the overall HA deployment of NFV system.
Different modules, such as hardware,VIM,VMs and etc, will be included, and HA
deployment of each single module will be discussed. However, not all of these HA
schemes should be deployed in on system at the same time. For the HA deployment of
a single system, we should consider the tradeoff between high availability and the
cost and resource to leverage.


Architecture of HA deployment
==================================================================

This section intends to introduce the different modules we should consider
when talking about HA deployment. These moduels include the Hardware
(compute, network, storage hardware), the VIM, the hypervisor, VMs and VNFs.
HA schemes for these different moduels should all be considered when deploying
an NFV system. And the schemes should be coordinated so that the system can make
sure to react in its best way when facing failure.

The following picture shows the the architecture of HA deployment based on the
framework from ETSI NFV ISG.

.. figure:: Overview.png
    :alt: Architecture for HA Deployment
    :figclass: align-center

    Fig 1. Architecture of HA Deployment based on the Framework of ETSI NFV ISG

HA deployment topology
==================================================================

This section will introduce the HA deployment topology for an NFV system.
The topology explained in this section is to support the software
cluster of OPNFV platform, which we will discuss in detail in section1.3.

The typical topology of deployment OPNFV platform should include at
least the controller nodes, and the compute nodes. Depend on the request of
the users, standalone network nodes or storage nodes can be added into this
topology. The simplest HA deployment of OPNFV only include the control nodes. Further
HA schemes can be provided to the compute nodes, the network nodes and the storage
nodes, according to the requirement of services deployed on the NFV system.
Figure 2 shows the deployment topology, in which the controller nodes are all in
a cluster, and the compute nodes can be in another cluster.

The control node cluster here is to provide HA for the controller services, so
that the services on the control node can successfully failover when failure
happens and the service can continue. The cluster service should also provide
automatic recovery for the control nodes. For OPNFV, the control node cluster
should include at least 3 nodes, and should be an odd number if the cluster
management system use quorum. This may change if we use different cluster
management schemes though.

The compute node clusters is responsible for providing HA for the services running
on the compute nodes. These services may include agents for openstack, host os,
hypervisors. Such cluster is responsible for the recovery and repair
of the services. However, compute node cluster will certainly bring complexity to
the whole system, and would increase the cost. There could be multiple solutions
for the compute cluster, e.g., senlin from openstack.

There could be other HA solutions for the compute nodes except for cluster. Combination
of congress and doctor can be one of them, in which doctor provides quickly notification
of failure to the VIM, and congress provides proper recovery procedure. In such scheme,
the compute nodes are not recovered by the cluster scheme, but recovered under the
supervision of VIM.

.. figure:: topology_control_compute.png
    :alt: HA Deployment Topology of Control Nodes and Compute Nodes
    :figclass: align-center

    Fig 2. HA Deployment Topology of Control Nodes and Compute Nodes

When the cloud is supporting heavy network traffic, which is often the case for the data
plane services in the Telecom scenarios, it is necessary to deploy standalone network
nodes for openstack, so that the large amont of traffic switching and routing will not
bring extra load to the controller nodes. In figure 3, we add network nodes into the
topology and shows how to deploy it in a high available way. In this figure, the
network nodes are deployed in a cluster. The cluster will provide HA for the services
runing on the network nodes. Such cluster scheme could be the same with that of the
compute nodes.

On thing to be notify is that all hosts in the NFV system should have at least two NICs
that are bonded via LACP.

.. figure:: topology_control_compute_network.png
    :alt: HA Deployment Topology of Control Nodes and Compute Nodes and Network Nodes
    :figclass: align-center

    Fig 3. HA Deployment Topology of Control Nodes, Compute Nodes and network Nodes

The HA deployment for storage can be different for all different storage schemes. We will
discuss the detail of the storage HA deployment in section 1.3.3

Software HA Framework
==================================================================

In this section, we introduce more details about the HA schemes for a complete NFV system.

Openstack Controller services (Openstack services)
--------------------------------------------------------

For the High Availability of OpenStack Controller nodes, Pacemaker and Corosync are
often used. The following texts are refering from the HA guideline of OpenStack, which
gives an example of solution of HA deployment.(http://docs.openstack.org/ha-guide/)

At its core, a cluster is a distributed finite state machine capable of co-ordinating the startup and recovery
of inter-related services across a set of machines. For OpenStack Controller nodes, a cluster management system,
such as Pacemaker, is recommended to use to provide the following metrics.

1, Awareness of other applications in the stack

2, Awareness of instances on other machines

3, A shared implementation and calculation of quorum.

4, Data integrity through fencing (a non-responsive process does not imply it is not doing anything)

5, Automated recovery of failed instances

Figure 4 shows the details of HA schemes for Openstack controller nodes with Pacemaker.

.. figure:: HA_control.png
    :alt: HA Deployment of Openstack Control Nodes based on Pacemaker
    :figclass: align-center

    Fig 4. HA Deployment of Openstack Control Nodes based on Pacemaker

High availability of all stateless services are provided by pacemaker and HAProxy.

Pacemaker cluster stack is the state-of-the-art high availability and load
balancing stack for the Linux platform. Pacemaker is useful to make OpenStack
infrastructure highly available. Also, it is storage and application-agnostic,
and in no way specific to OpenStack.

Pacemaker relies on the Corosync messaging layer for reliable cluster
communications. Corosync implements the Totem single-ring ordering and
membership protocol. It also provides UDP and InfiniBand based messaging,
quorum, and cluster membership to Pacemaker.

Pacemaker does not inherently (need or want to) understand the applications
it manages. Instead, it relies on resource agents (RAs), scripts that
encapsulate the knowledge of how to start, stop, and check the health
of each application managed by the cluster.These agents must conform
to one of the OCF, SysV Init, Upstart, or Systemd standards.Pacemaker
ships with a large set of OCF agents (such as those managing MySQL
databases, virtual IP addresses, and RabbitMQ), but can also use any
agents already installed on your system and can be extended with your
own (see the developer guide).

After deployment of Pacemaker, HAProxy is used to provide VIP for all the
OpenStack services and act as load balancer. HAProxy provides a fast and
reliable HTTP reverse proxy and load balancer for TCP or HTTP applications.
It is particularly suited for web crawling under very high loads while
needing persistence or Layer 7 processing. It realistically supports tens
of thousands of connections with recent hardware.

Each instance of HAProxy configures its front end to accept connections
only from the virtual IP (VIP) address and to terminate them as a list
of all instances of the corresponding service under load balancing, such
as any OpenStack API service. This makes the instances of HAProxy act
independently and fail over transparently together with the network endpoints
(VIP addresses) failover and, therefore, shares the same SLA.

We can alternatively use a commercial load balancer, which is a hardware or
software. A hardware load balancer generally has good performance.

Galera Cluster, or other database cluster service, should also be deployed
to provide data replication and  synchronization between data base. Galera
Cluster is a synchronous multi-master database cluster, based on MySQL and
the InnoDB storage engine. It is a high-availability service that provides
high system uptime, no data loss, and scalability for growth. The selection
of DB also will have potential influence on the behaviour on the application
code. For instance using Galera Clusterl may give you higher concurrent write
perfomance but may require a more complex conflict resolution.

We can also achieve high availability for the OpenStack database in many different
ways, depending on the type of database that we are using. There are three
implementations of Galera Cluster available:

1, Galera Cluster for MySQL The MySQL reference implementation from Codership;

2, MariaDB Galera Cluster The MariaDB implementation of Galera Cluster, which is
commonly supported in environments based on Red Hat distributions;

3, Percona XtraDB Cluster The XtraDB implementation of Galera Cluster from Percona.

In addition to Galera Cluster, we can also achieve high availability through other
database options, such as PostgreSQL, which has its own replication system.

To make the RabbitMQ high available, Rabbit HA queue should be configued, and all
openstack services should be configurd to use the Rabbit HA queue.

In the meantime, specific schemes should also be provided to avoid single point of
failure of Pacemaker. And services failed should be automaticly repaired.

Note that the scheme we described above is just one possible scheme for the HA
deployment of the controller nodes. Other schemes can also be used to provide cluster
management and monitoring.

SDN controller services
---------------------------------------

SDN controller software is data intensive application. All static and dynamic data has
one or more duplicates distributed to other physical nodes in cluster. Built-in HA schema
always be concordant with data distribution and built-in mechanism will select or
re-select master nodes in cluster. In deployment stage software of SDN controller
should be deployed to at least two or more physical nodes regardless whether the
software is deployed inside VM or containner. Dual management network plane should
be provided for SDN controller cluster to support built-in HA schema.

Storage
----------------------------------------
Depending on what storage scheme deployed, different HA schemes should be used. The following
text are refering from the Mirantis OpenStack reference architecture, which provides suggestions
on the HA deployment of different storage schemes.

1, Ceph

Ceph implements its own HA. When deploying it, enough controller nodes running the Ceph Monitor
service to form a quarum, and enough Ceph OSD nodes to satisfy the object replication factor are
needed.

2, Swift

Swift API relies on the same HAProxy setup with VIP on controller nodes as the other REST
APIs. For small scale deployment, swift storage and Proxy services can be deployed on the
controller nodes. However, for a larger production environment, dedicated storage nodes, in
which two for swift proxy and at least three for swift storage, are needed.


Host OS and Hypervisor
---------------------------------------

The Host OS and Hypervisor should be supervised and monitored for failure, and should be
repaired when failure happens. Such supervision can based on a cluster scheme, or can
just simply use controller to constantly monitor the computer host. Figure 6 shows a
simplified framework for hypervisor cluster.

When host/hypervisor failure happens, VMs on that host should be evacuated. However,
such scheme should coordinate with the VM HA scheme, so that when both the host and the
VM detect the failure, they should know who should take responsibility for the evacuation.


.. figure:: HA_Hypervisor.png
    :alt: HA Deployment of Host OS and Hypervisor
    :figclass: align-center

    Fig 5. HA Deployment of Host OS and Hypervisor

Virtual Machine (VM)
---------------------------------------

VM should be supervised and monitored for failure, and should be repaired when failure
happens. We can rely on the hypervisor to monitor the VM failure. Another scheme can be
used is a cluster for the VM, in which failure of VMs in one cluster can be supervised
and will be repaired by the cluster manager. Pacemaker and other cluster management
schemes can be considered for the VM cluster.

In case when VNFs do not have HA schemes, extra HA scheme for VM should be taken into
consideration. Such approach is kind of best effort for the NFV platform to provide HA
for the VNF service, and may lead to failure copy between VMs when VNF fails. Since the
NFVI can hardly know of the service runing in the VNF, it is imporssible for the NFVI
level to provide overall HA solution for the VNF services. Therefore, even though we
mention this scheme here, we strongly suggest the VNF should have its own HA schemes.

Figure 6 gives an example for the VM active/standby deployment. In this case, both the
active VM and the standby VM are deployed with the same VNF image. When failure happens
to the active VM, the standby VM should take the traffic and replace the active VM. Such
scheme is the best effort of the NFVI when VNFs do not have HA schemes and would only
rely on VMs to provide redundancy. However, for stateful VNFs, there should be data copy
between the active VM and standby VM. In this case, fault for the active VM can also be
copied to the standby VM, leading to failure of the new active VM.

.. figure:: HA_VM.png
    :alt: VM Active/Standby Deployment
    :figclass: align-center

    Fig 6. VM Active/Standby Deployment

Virtual Network Functions (VNF)
---------------------------------------

For telecom services, it is suggested that VNFs should have its own built-in HA schemes
or HA schemes implemented in VNF Managerhave to provide high available services to
the customers. HA schemes for the VNFs can based on cluster. In this case, OpenSAF,
pacemaker and other cluster management services can be used.

HA schemes for the VNFs should be coordinate with the lower layer. For example, it
should be clear which level will take responsibility for VM restart. A suggested
schemes could be, the VNF layer should be responsible for the redundancy and failover
of the VNFs when failure happens. Such failover should take place in quite a short
time (less then seconds). The repairing procedure will then take place from upper
layer to lower layer, that is, the VNF layer will first check if the failure is at
its layer, and should try to repair itself. If it fails to repaire the failure,
the failure should escalate to lower layers and let the NFVI layer to do the repair
work. There could also be cases that the NFVI layer has detected the failure and will
repair it before the escalation. These functions should be complished by the coordination
of all different component, including the VNFM, VIM, VNFs and NFVI.

In the meantime, the VNFs can take advantage of API the hypervisor can provide to
them to enhance HA. Such API may include constant health check from the hypervisor,
affinity/inaffinity deployment support. example about watchdog

Figure 7 gives an example for the VNF HA scheme.

.. figure:: HA_VNF.png
    :alt: HA Deployment of VNFs
    :figclass: align-center

    Fig 7. HA Deployment of VNFs

*********************************************************************************
HA deployment guideline for OPNFV releases
*********************************************************************************

In this section, we will continiously update the HA deployment guideline for the releases
of OPNFV.

HA deployment guideline for Arno
==============================================

Deployment Framework
-----------------------------------------------

Figure 8 shows an overall architecture for the HA deployment of ARNO.

.. figure:: HA_ARNO.png
    :alt: HA Deployment of OPNFV ARNO release
    :figclass: align-center

    Fig 8. HA Deployment of OPNFV ARNO release

For OPNFV Arno release, HA deployment of Openstack Control Node (Openstack Juno) and ODL
controller (ODL Helium) is supported. Both deployment tools (fuel and forman)support
such HA deployment.

For such HA deployment, the following components�� failure is protected

Software:
* Nova scheduler
* Nova conductor
* Cinder scheduler
* Neutron server
* Heat engine

Controller hardware:
* dead server
* dead switch
* dead port
* dead disk
* full disk


HA test result for ARNO
-------------------------------------------------

Two specific High Availability testcases are done on the ARNO release. These test cases
are collaboratively developed by the High Availability project and the Yardstick project.

Both cases are excuted in the China Mobile's Lab, where ARNO SR1 release is deployed with
Fuel.

The two testcases respectively test the following two aspects:

1, Controll Node Service HA

In this test, HA of "nova-api" is tested. According to the result, the service can
successfully failover to the other controller nodes  within 2.36s, once failure happens
at the active node. However, the service can't repair itself automatically. more
explaination about the repair, other services are not tested yet.

2, Control Node Hardware HA

In this test, HA of the controller node hardware is tested. One of the hardware is
abnormally shutdown, and the service of "nova-api" is monitored. According to the test
results, the service can failover to the other controller node within 10.71 secondes.
However, the failed hardware can't automatically repair itself.

See more details about these test cases in the Yardstick doc of "Test Results for
yardstick-opnfv-ha"(https://gerrit.opnfv.org/gerrit/#/c/7543/).

From these basic test cases we can see that OPNFV ARNO has integrated with some HA
schemes in its controller nodes. However, its capability of self repair should be
enhanced.

HA deployment guideline for Brahmaputra
==============================================
In the Brahmaputra release, 4 installers are provided. We will discuss about the HA
deployment of each installer.

Apex
----------------------------------------------------

For the installer of Apex, all of the OpenStack services are in HA on all 3 controllers.
The services are monitored by pacemaker and load balanced by HA Proxy with VIPs.  The
SDN controllers usually only run as a single instance on the first controller with no
HA scheme.

Database is clustered with galera in an active passive failover via pacemaker and the
message bus is rabbitHA and the services are managed by pacemaker.

Storage is using ceph, clustered across the control nodes.

In the future, more work is on the way to provide HA for the SDN controller. The Apex
team has already finished a demo that runs ODL on each controller, load balanced to
neutron via a VIP + HA Proxy, but is not using pacemaker. Meanwhile, they are also
working to include ceph storage HA for compute nodes as well.

Compass
---------------------------------------------------------
TBD

Fuel
-------------------------------------------------------------

At moment Fuel installer support the following HA schemes.

1)Openstackcontrollers: N-way redundant (1,3,5, etc)
2)OpenDaylight:No redundancy
3)Cephstorage OSD: N-way redundant (1,3,5, etc)
4)Networkingattachment redundancy: LAG
5)NTPredundancy: N-way relays, up to 3 upstream sources
6)DNSredundancy: N-way relays, up to 3 upstream sources
7)DHCP:1+1

JOID
---------------------------------------------------------

JOID provides HA based on openstack services. Individual service charms have been
deployed in a container within a host, and each charms are distributed in a way each
service which meant for HA will go into container on individual nodes. For example
keystone service, there are three containers on each control node and VIP has been
assigned to use by the front end API to use keystone. So in case any of the container
fails VIP will keep responding to via the other two services. As HA can be maintainer
with odd units at least one service container is required to response.


Reference
==========

* https://www.rdoproject.org/ha/ha-architecture/
* http://docs.openstack.org/ha-guide/
* https://wiki.opnfv.org/display/availability?preview=/2926706/2926714/scenario_analysis_for_high_availability_in_nfv.pdf
* https://wiki.opnfv.org/display/availability?preview=/2926706/2926708/ha_requirement.pdf