From 49bb869c1ca5b86164730b842f712bafbe5cef58 Mon Sep 17 00:00:00 2001 From: Yifei Xue Date: Wed, 2 Dec 2015 12:56:23 +0800 Subject: Add the use cases for network nodes JIRA: HA-16 Change-Id: I1c44ce114522b9af95ce056856f0e36c729e0722 Signed-off-by: Yifei Xue --- UseCases/UseCases_for_Network_Nodes.rst | 157 +++++++++++++++++++++ UseCases/images_network_nodes/DHCP_deployment.png | Bin 0 -> 25005 bytes UseCases/images_network_nodes/DPCH_failure.png | Bin 0 -> 30573 bytes UseCases/images_network_nodes/L3_deployment.png | Bin 0 -> 28040 bytes UseCases/images_network_nodes/L3_failure.png | Bin 0 -> 28900 bytes UseCases/images_network_nodes/L3_ha_principle.png | Bin 0 -> 30089 bytes UseCases/images_network_nodes/L3_ha_workflow.png | Bin 0 -> 11844 bytes UseCases/images_network_nodes/LBaaS_deployment.png | Bin 0 -> 29490 bytes UseCases/images_network_nodes/LBaaS_failure.png | Bin 0 -> 31159 bytes .../Network_nodes_deployment.png | Bin 0 -> 77307 bytes 10 files changed, 157 insertions(+) create mode 100644 UseCases/UseCases_for_Network_Nodes.rst create mode 100755 UseCases/images_network_nodes/DHCP_deployment.png create mode 100755 UseCases/images_network_nodes/DPCH_failure.png create mode 100755 UseCases/images_network_nodes/L3_deployment.png create mode 100755 UseCases/images_network_nodes/L3_failure.png create mode 100755 UseCases/images_network_nodes/L3_ha_principle.png create mode 100755 UseCases/images_network_nodes/L3_ha_workflow.png create mode 100755 UseCases/images_network_nodes/LBaaS_deployment.png create mode 100755 UseCases/images_network_nodes/LBaaS_failure.png create mode 100755 UseCases/images_network_nodes/Network_nodes_deployment.png diff --git a/UseCases/UseCases_for_Network_Nodes.rst b/UseCases/UseCases_for_Network_Nodes.rst new file mode 100644 index 0000000..bc9266a --- /dev/null +++ b/UseCases/UseCases_for_Network_Nodes.rst @@ -0,0 +1,157 @@ +4 High Availability Scenarios for Network Nodes +=============================================== + +4.1 
Network nodes and HA deployment +----------------------------------- + +OpenStack network nodes run the Neutron DHCP agent, Neutron L2 agent, Neutron L3 agent, Neutron LBaaS +agent and Neutron Metadata agent. The DHCP agent provides DHCP services for virtual networks. The +metadata agent provides configuration information such as credentials to instances. Note that the +L2 agent cannot be distributed or made highly available. Instead, it must be installed on each data +forwarding node to control virtual network drivers such as Open vSwitch or Linux Bridge. One L2 +agent runs per node and controls its virtual interfaces. + +A typical HA deployment of network nodes is shown in Fig 20, using a two-node cluster as an example. +The number of nodes depends on the size of the cluster and can be 2 or more. More details are given in +the section for each agent. + + +.. figure:: images_network_nodes/Network_nodes_deployment.png + :alt: HA deployment of network nodes + :figclass: align-center + + Fig 20. A typical HA deployment of network nodes + + +4.2 DHCP agent +-------------- + +The DHCP agent can be made highly available natively. Neutron has a scheduler which lets you run multiple +agents across nodes. To enable HA, set the dhcp_agents_per_network parameter in the neutron.conf file +to X (X >= 2 for HA; the default is 1). + +Fig 21 uses three tenant networks as an example (there can be more). With X set to 2, +six DHCP agents are deployed on two nodes for the three networks, and all of them +are active. The two dhcp1 agents serve one network, while the dhcp2 and dhcp3 agents serve the other two networks. Within a +network, all DHCP traffic is broadcast, and the DHCP servers race to offer an IP address. All the servers update their +lease tables. In Fig 22, when the agent(s) on Node1 stop working because of a software +or hardware failure, the DHCP agent(s) on Node2 continue to offer IP addresses for the network. + + +..
figure:: images_network_nodes/DHCP_deployment.png + :alt: HA deployment of DHCP agents + :figclass: align-center + + Fig 21. Native HA deployment of DHCP agents + + +.. figure:: images_network_nodes/DHCP_failure.png + :alt: Failure of DHCP agents + :figclass: align-center + + Fig 22. Failure of DHCP agents + + +4.3 L3 agent +------------ + +The L3 agent is also natively highly available. To enable HA, set the following options in the neutron.conf +file. + +.. code-block:: ini + + # All routers are highly available by default + l3_ha = True + # Enable automatic L3 agent failover for routers + allow_automatic_l3agent_failover = True + # Maximum number of network nodes to use for the HA router + max_l3_agents_per_router = 2 + # Minimum number of network nodes to use for the HA router; a new router can be created only if this many network nodes are available + min_l3_agents_per_router = 2 + +With these settings, the L3 agent scheduler uses the Virtual Router Redundancy +Protocol (VRRP) to distribute virtual routers across multiple nodes (e.g. 2). The scheduler chooses +a number of nodes between the minimum and the maximum according to its scheduling algorithm. VRRP is implemented +by Keepalived. + +As depicted in Fig 23, the L3 agents on both Node1 and Node2 host vRouter 1 and vRouter 2. On Node 1, +vRouter 1 is active and vRouter 2 is a hot standby. On Node2, vRouter 1 is standby and +vRouter 2 is active. To spread the load, the two active routers are placed on +different nodes. As shown in Fig 24, Keepalived manages the VIP interfaces. One instance of +Keepalived runs per virtual router, and therefore one per namespace. 169.254.192.0/18 is a dedicated HA network +created to isolate the administrative traffic from the tenant traffic; each vRouter +is connected to this dedicated network via an HA port. More details can be found in the +Reference at the bottom. + + +..
figure:: images_network_nodes/L3_deployment.png + :alt: HA deployment of L3 agents + :figclass: align-center + + Fig 23. Native HA deployment of L3 agents + + +.. figure:: images_network_nodes/L3_ha_principle.png + :alt: HA principle of L3 agents + :figclass: align-center + + Fig 24. Native HA principle of L3 agents + + +In Fig 25, when vRouter 1 on Node1 goes down because of a software or hardware failure, +Keepalived detects the failure and the standby takes over as active. To preserve +TCP connections, Conntrackd maintains the TCP sessions going through the router. One instance +of Conntrackd runs per virtual router, and therefore one per namespace. Afterwards, a rescheduling procedure is +triggered to respawn the failed virtual router on another L3 agent as standby. The whole workflow is +depicted in Fig 26. + + +.. figure:: images_network_nodes/L3_failure.png + :alt: Failure of L3 agents + :figclass: align-center + + Fig 25. Failure of L3 agents + + +.. figure:: images_network_nodes/L3_ha_workflow.png + :alt: HA workflow of L3 agents + :figclass: align-center + + Fig 26. HA workflow of L3 agents + + +4.4 LBaaS agent and Metadata agent +---------------------------------- + +Currently, no native feature is provided to make the LBaaS agent highly available when using the default +plug-in HAProxy. A common way to make HAProxy highly available is to use Pacemaker. + + +.. figure:: images_network_nodes/LBaaS_deployment.png + :alt: HA deployment of LBaaS agents + :figclass: align-center + + Fig 27. HA deployment of LBaaS agents using Pacemaker + + +As shown in Fig 27, HAProxy and Pacemaker are deployed on both network nodes. The number of network +nodes can be 2 or more, depending on your cluster. HAProxy on Node 1 is the master and the VIP is on +Node 1. Pacemaker monitors the liveness of HAProxy. + + +.. figure:: images_network_nodes/LBaaS_failure.png + :alt: Failure of LBaaS agents + :figclass: align-center + + Fig 28.
Failure of LBaaS agents + + +As shown in Fig 28, when HAProxy on Node1 goes down because of a software or hardware +failure, Pacemaker fails over HAProxy and the VIP to Node 2. + +Note that the default plug-in HAProxy only supports TCP and HTTP. + +No native feature is available to make the Metadata agent highly available. At this time, an Active/Passive +solution exists that runs the Neutron metadata agent in failover mode with Pacemaker. The deployment and +failover procedure can be the same as for LBaaS. + diff --git a/UseCases/images_network_nodes/DHCP_deployment.png b/UseCases/images_network_nodes/DHCP_deployment.png new file mode 100755 index 0000000..90bb740 Binary files /dev/null and b/UseCases/images_network_nodes/DHCP_deployment.png differ diff --git a/UseCases/images_network_nodes/DPCH_failure.png b/UseCases/images_network_nodes/DPCH_failure.png new file mode 100755 index 0000000..07a51f8 Binary files /dev/null and b/UseCases/images_network_nodes/DPCH_failure.png differ diff --git a/UseCases/images_network_nodes/L3_deployment.png b/UseCases/images_network_nodes/L3_deployment.png new file mode 100755 index 0000000..ff573b6 Binary files /dev/null and b/UseCases/images_network_nodes/L3_deployment.png differ diff --git a/UseCases/images_network_nodes/L3_failure.png b/UseCases/images_network_nodes/L3_failure.png new file mode 100755 index 0000000..57485ad Binary files /dev/null and b/UseCases/images_network_nodes/L3_failure.png differ diff --git a/UseCases/images_network_nodes/L3_ha_principle.png b/UseCases/images_network_nodes/L3_ha_principle.png new file mode 100755 index 0000000..59a3161 Binary files /dev/null and b/UseCases/images_network_nodes/L3_ha_principle.png differ diff --git a/UseCases/images_network_nodes/L3_ha_workflow.png b/UseCases/images_network_nodes/L3_ha_workflow.png new file mode 100755 index 0000000..d923f4f Binary files /dev/null and b/UseCases/images_network_nodes/L3_ha_workflow.png differ diff --git
a/UseCases/images_network_nodes/LBaaS_deployment.png b/UseCases/images_network_nodes/LBaaS_deployment.png new file mode 100755 index 0000000..d4e5929 Binary files /dev/null and b/UseCases/images_network_nodes/LBaaS_deployment.png differ diff --git a/UseCases/images_network_nodes/LBaaS_failure.png b/UseCases/images_network_nodes/LBaaS_failure.png new file mode 100755 index 0000000..5262fd0 Binary files /dev/null and b/UseCases/images_network_nodes/LBaaS_failure.png differ diff --git a/UseCases/images_network_nodes/Network_nodes_deployment.png b/UseCases/images_network_nodes/Network_nodes_deployment.png new file mode 100755 index 0000000..bb0f3db Binary files /dev/null and b/UseCases/images_network_nodes/Network_nodes_deployment.png differ -- cgit 1.2.3-korg
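Reviewer note: sections 4.2 and 4.3 of the added document describe HA purely in terms of neutron.conf options, so a deployment could be sanity-checked with a few lines of Python. The following is only a minimal sketch under stated assumptions: the helper name `check_network_node_ha` is hypothetical and not part of Neutron, and it assumes the options live in the `[DEFAULT]` section of the file. It treats the DHCP agent as HA-capable when `dhcp_agents_per_network` is at least 2 (the document gives 1 as the default) and the L3 agent as HA-capable when `l3_ha` is true.

```python
import configparser


def check_network_node_ha(conf_text):
    """Report whether the DHCP and L3 HA options are set to HA-capable values.

    Hypothetical helper for illustration only; it is not part of Neutron.
    Assumes the options are placed in the [DEFAULT] section.
    """
    # Disable interpolation so that '%' characters in other values
    # of a real configuration file do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    defaults = parser[parser.default_section]

    # The document states that the default is one DHCP agent per network.
    dhcp_agents = defaults.getint("dhcp_agents_per_network", fallback=1)
    # Treat a missing l3_ha option as "not enabled" (an assumption).
    l3_ha = defaults.getboolean("l3_ha", fallback=False)

    return {
        "dhcp_ha": dhcp_agents >= 2,
        "l3_ha": l3_ha,
    }


if __name__ == "__main__":
    sample = """
[DEFAULT]
dhcp_agents_per_network = 2
l3_ha = True
"""
    print(check_network_node_ha(sample))
```

The check only reads the file. It does not verify that enough network nodes are actually available, which is what `min_l3_agents_per_router` governs at router creation time.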