author    fuqiao <fuqiao@chinamobile.com>    2017-02-13 10:44:55 +0800
committer fuqiao <fuqiao@chinamobile.com>    2017-02-13 10:44:55 +0800
commit    4260b10e3a7c7780385a61d6d708a9480d208a1a (patch)
tree      a61dd18b0273932bb1f6d347022c2829399be021 /docs
parent    593ef4f91ee0dac6cde4101b75174d88d0a01b28 (diff)

Danube MS6 adding document directives

add overview doc and design docs

JIRA: HA-28
Change-Id: I890d2056a0fe61ca6aa1297210a1028d38d1514d
Signed-off-by: fuqiao@chinamobile.com
Diffstat (limited to 'docs')
-rw-r--r--  docs/development/design/OPNFV_HA_Guest_APIs-Base-Messaging-Layer.rst          6
-rw-r--r--  docs/development/design/OPNFV_HA_Guest_APIs-Server-Group-Messaging_HLD.rst    6
-rw-r--r--  docs/development/design/index.rst (renamed from docs/scenarios/index.rst)     7
-rw-r--r--  docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst                6
-rw-r--r--  docs/development/overview/index.rst (renamed from docs/userguide/index.rst)   6
-rw-r--r--  docs/scenarios/GAP_Analysis_Colorado.rst                                      278
-rw-r--r--  docs/scenarios/GAP_Analysis_Colorado.rst.bak                                  278
-rw-r--r--  docs/userguide/Deployment_Guideline.pdf                                       bin 762547 -> 0 bytes
-rw-r--r--  docs/userguide/Deployment_Guideline.rst                                       452
-rw-r--r--  docs/userguide/Deployment_Guideline.rst.bak                                   452
-rw-r--r--  docs/userguide/HA_ARNO.png                                                    bin 308412 -> 0 bytes
-rw-r--r--  docs/userguide/HA_Hypervisor.png                                              bin 160501 -> 0 bytes
-rw-r--r--  docs/userguide/HA_VM.png                                                      bin 133579 -> 0 bytes
-rw-r--r--  docs/userguide/HA_VNF.png                                                     bin 131539 -> 0 bytes
-rw-r--r--  docs/userguide/HA_control.png                                                 bin 356056 -> 0 bytes
-rw-r--r--  docs/userguide/Overview.png                                                   bin 347799 -> 0 bytes
-rw-r--r--  docs/userguide/topology_control_compute.png                                   bin 321043 -> 0 bytes
-rw-r--r--  docs/userguide/topology_control_compute_network.png                           bin 299765 -> 0 bytes
-rw-r--r--  docs/userguide/topology_control_compute_network_storage.png                   bin 364315 -> 0 bytes
19 files changed, 25 insertions, 1466 deletions
diff --git a/docs/development/design/OPNFV_HA_Guest_APIs-Base-Messaging-Layer.rst b/docs/development/design/OPNFV_HA_Guest_APIs-Base-Messaging-Layer.rst
new file mode 100644
index 0000000..ef82456
--- /dev/null
+++ b/docs/development/design/OPNFV_HA_Guest_APIs-Base-Messaging-Layer.rst
@@ -0,0 +1,6 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) <optionally add copywriters name>
+=========================================================
+OPNFV HA Guest APIs -- Base Host-to-Guest Messaging Layer
+=========================================================
\ No newline at end of file
diff --git a/docs/development/design/OPNFV_HA_Guest_APIs-Server-Group-Messaging_HLD.rst b/docs/development/design/OPNFV_HA_Guest_APIs-Server-Group-Messaging_HLD.rst
new file mode 100644
index 0000000..59ca24e
--- /dev/null
+++ b/docs/development/design/OPNFV_HA_Guest_APIs-Server-Group-Messaging_HLD.rst
@@ -0,0 +1,6 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) <optionally add copywriters name>
+==============================================
+OPNFV HA Guest APIs -- Server Group Messaging
+==============================================
\ No newline at end of file
diff --git a/docs/scenarios/index.rst b/docs/development/design/index.rst
index dec46c7..e2d3241 100644
--- a/docs/scenarios/index.rst
+++ b/docs/development/design/index.rst
@@ -3,11 +3,12 @@
.. (c) <optionally add copywriters name>
*********************************
-Gap Analysis of High Availability
+OPNFV HA Guest APIs
*********************************
.. toctree::
:numbered:
- :maxdepth: 4
+ :maxdepth: 2
- GAP_Analysis_Colorado.rst
+ OPNFV_HA_Guest_APIs-Base-Messaging-Layer.rst
+   OPNFV_HA_Guest_APIs-Server-Group-Messaging_HLD.rst
diff --git a/docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst b/docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst
new file mode 100644
index 0000000..92ff964
--- /dev/null
+++ b/docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst
@@ -0,0 +1,6 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) <optionally add copywriters name>
+====================================
+OPNFV HA Guest APIs -- Overview
+====================================
\ No newline at end of file
diff --git a/docs/userguide/index.rst b/docs/development/overview/index.rst
index 2938b15..71bdbf4 100644
--- a/docs/userguide/index.rst
+++ b/docs/development/overview/index.rst
@@ -3,11 +3,11 @@
.. (c) <optionally add copywriters name>
*********************************
-HA Deployment Framework Guideline
+OPNFV HA Guest APIs -- Overview
*********************************
.. toctree::
:numbered:
- :maxdepth: 4
+ :maxdepth: 2
- Deployment_Guideline.rst
+ OPNFV_HA_Guest_APIs-Overview_HLD.rst
diff --git a/docs/scenarios/GAP_Analysis_Colorado.rst b/docs/scenarios/GAP_Analysis_Colorado.rst
deleted file mode 100644
index 4fefc09..0000000
--- a/docs/scenarios/GAP_Analysis_Colorado.rst
+++ /dev/null
@@ -1,278 +0,0 @@
-Introduction:
-^^^^^^^^^^^^^
-
-During the Colorado release the OPNFV availability team has reviewed a number of gaps
-in support for high availability in various areas of OPNFV. The focus and goal was
-to find gaps and work with the various open source communities (OpenStack, for
-example) to develop solutions and blueprints. This would enhance the overall
-system availability and reliability of OPNFV going forward. We also worked with
-the OPNFV Doctor team to ensure our activities were coordinated. In the next
-releases of OPNFV the availability team will update the status of open gaps and
-continue to look for additional gaps.
-
-Summary of findings:
-^^^^^^^^^^^^^^^^^^^^
-
-1. Publish health status of compute node - this gap is now closed through an
-OpenStack blueprint in Mitaka.
-
-2. Health status of compute node - some good work underway in OpenStack and with
-the Doctor team, we will continue to monitor this work.
-
-3. Store consoleauth tokens to the database - this gap can be addressed by
-changing OpenStack configurations
-
-4. Active/Active HA of cinder-volume - active work underway in Newton, we will
-monitor closely
-
-5. Cinder volume multi-attachment - this work has been completed in OpenStack -
-this gap is now closed
-
-6. Add HA tests into Fuel - the Availability team has been working with the
-Yardstick team to create additional test cases for the Colorado release. Some of
-these test cases would be good additions to installers like Fuel.
-
-Detailed explanation of the gaps and findings:
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-GAP 1: Publish the health status of compute node
-================================================
-
-* Type: 'reliability'
-* Description:
-
-  Current compute node status is only kept within nova. However, the NFVO and VNFM
-  may also need this information. For example, the NFVO may trigger scale up/down
-  based on the status, and the VNFM may trigger evacuation. In high availability
-  scenarios, the VNFM may also need the host status info from the VIM so that it
-  can figure out where exactly the failure is located. Therefore, this information
-  needs to be published outside nova to the NFVO and VNFM.
-
- + Desired state
-
- - Be able to have the health status of compute nodes published.
-
- + Current behaviour
-
- - Nova queries the ServiceGroup API to get the node liveness information.
-
- + Gap
-
-    - Currently the ServiceGroup API keeps the health status of compute nodes
-      internal to nova; this status could be published to the NFV MANO plane.
-
-Findings:
-
-A blueprint from the OPNFV Doctor team has covered this gap by adding a notification
-for service status changes (a minimal consumer sketch follows the status details below).
-
-Status: Merged (Mitaka release)
-
- + Owner: Balazs
-
- + BP: https://blueprints.launchpad.net/nova/+spec/service-status-notification
-
- + Spec: https://review.openstack.org/182350
-
- + Code: https://review.openstack.org/#/c/245678/
-
- + Merged Jan 2016 - Mitaka
-
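-A minimal sketch of how an NFVO/VNFM-side component could consume the
-``service.update`` versioned notifications introduced by that blueprint, using
-oslo.messaging (the transport URL and the reaction to the payload are assumptions
-for illustration):
-
-.. code-block:: python
-
-   import oslo_messaging as messaging
-   from oslo_config import cfg
-
-   transport = messaging.get_notification_transport(
-       cfg.CONF, url='rabbit://guest:guest@192.0.2.11:5672/')
-   targets = [messaging.Target(topic='versioned_notifications')]
-
-   class ServiceStatusEndpoint(object):
-       def info(self, ctxt, publisher_id, event_type, payload, metadata):
-           # nova emits service.update when a compute service changes state
-           if event_type == 'service.update':
-               print(publisher_id, payload)
-
-   listener = messaging.get_notification_listener(
-       transport, targets, [ServiceStatusEndpoint()])
-   listener.start()
-   listener.wait()
-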
-GAP 2: Health status of compute node
-====================================
-
-* Type: 'reliability'
-* Description:
-
- + Desired state:
-
- - Provide the health status of compute nodes.
-
- + Current Behaviour
-
-   - Currently, while performing some actions like evacuation, Nova checks the
-     compute service. If the service is down, it is assumed that the host is
-     down. This is not exactly true, since it is possible that only the compute
-     service is down while all VMs running on the host are actually up. There is
-     no way to distinguish between two really different things: the host status
-     and the status of the nova-compute service deployed on the host.
-   - Also, the host information provided by the API and commands is service
-     centric, i.e. "nova host-list" is just another wrapper for "nova
-     service-list" with a different format (in fact "service-list" is a superset
-     of "host-list").
-
-
- + Gap
-
-   - Not all the health information of compute nodes can be provided. Nova seems
-     to treat the *host* term as equal to *compute-host*, which might be
-     misleading. Such situations can be error prone when there is a need to
-     perform host evacuation.
-
-
-Related BP:
-
-Pacemaker and Corosync can provide info about the host. Therefore, there is a
-requirement for nova to support a pacemaker service group driver. Another option
-would be to add a tooz servicegroup driver to nova, and then have tooz support a
-corosync driver.
-
- + https://blueprints.launchpad.net/nova/+spec/tooz-for-service-groups
-
-The Doctor team is not working on this blueprint.
-
-NOTE: This bp is active. A suggestion is to adopt this bp and add a corosync
-driver to tooz, which could be a solution.
-
-We should keep following this bp and, when it is finished, see if we can add a
-corosync driver for tooz to close this gap (a sketch of the tooz coordination
-API is given below).
-
-The currently supported drivers in tooz are listed at
-https://github.com/openstack/tooz/blob/master/doc/source/drivers.rst. Meanwhile,
-we should also look into the doctor project and see if this could be solved there.
-
-This work is still underway but does not directly map to the gap identified
-above. The Doctor team is looking to get faster updates on node status and
-failure status - these are other blueprints. These are good problems to solve.
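-
-A minimal sketch of the tooz coordination API that such a servicegroup driver would
-build on (the memcached backend URL, member id and group name are assumptions; a
-corosync backend is exactly what is still missing):
-
-.. code-block:: python
-
-   from tooz import coordination
-
-   # join a service group and list its members; a corosync driver would plug in here
-   coordinator = coordination.get_coordinator(
-       'memcached://192.0.2.20:11211', b'compute-1')
-   coordinator.start()
-   try:
-       coordinator.create_group(b'nova-compute').get()
-   except coordination.GroupAlreadyExist:
-       pass
-   coordinator.join_group(b'nova-compute').get()
-   print(coordinator.get_members(b'nova-compute').get())
-   coordinator.leave_group(b'nova-compute').get()
-   coordinator.stop()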
-
-GAP 3: Store consoleauth tokens to the database
-===============================================
-
-* Type: 'performance'
-* Description:
-
-+ Desired state
-
-  - Change the consoleauth service to store the tokens in the database and,
-    optionally, cache them in memory as it does now for fast access.
-
-+ Current State
-
-  - Currently the consoleauth service is storing the tokens and the connection data
-    only in memory. This behavior makes it impossible to have multiple instances of this
-    service in a cluster, as there is no way for one of the instances to know the tokens
-    issued by the others.
-
-  - The consoleauth service can use a memcached server to store those tokens, but again,
-    if we want to share them among different instances of it we would be relying on one
-    memcached server, which makes this solution unsuitable for a highly available
-    architecture where we should be able to replicate all of the services in our cluster.
-
-+ Gap
-
-  - The consoleauth service is storing the tokens and the connection data only in memory.
-    This behavior makes it impossible to have multiple instances of this service in a cluster,
-    as there is no way for one of the instances to know the tokens issued by the others.
-
-* Related BP
-
- + https://blueprints.launchpad.net/nova/+spec/consoleauth-tokens-in-db
-
-  The advice in the blueprint is to use memcached as a backend. Looking at the
-  documentation, memcached is not able to replicate data, so this is not a
-  complete solution. But maybe redis (http://redis.io/) is a suitable backend
-  to store the tokens so that they survive node failures. This blueprint is not
-  directly needed for this gap.
-
-Findings:
-
-This bp has been rejected, since the community feedback is that A/A can be
-supported by memcached. The use case for this bp is not quite clear, since when
-the consoleauth service is down and the token is lost, the other service can
-retrieve the token again after it recovers. This can be accomplished through a
-different configuration set-up for OpenStack, and is therefore not a gap. The
-recommendation of the team is to verify the redis approach (sketched below).
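-
-A minimal sketch of the redis approach (host, key prefix and TTL are assumptions;
-this is not how nova-consoleauth is actually implemented):
-
-.. code-block:: python
-
-   import json
-
-   import redis
-
-   r = redis.StrictRedis(host='192.0.2.10', port=6379)
-
-   def store_token(token, connect_info, ttl=600):
-       # tokens survive a consoleauth restart and are visible to all instances
-       r.setex('consoleauth:%s' % token, ttl, json.dumps(connect_info))
-
-   def check_token(token):
-       data = r.get('consoleauth:%s' % token)
-       return json.loads(data) if data else None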
-
-
-GAP 4: Active/Active HA of cinder-volume
-========================================
-
-* Type: 'reliability/scalability'
-
-* Description:
-
- + Desired State:
-
- - Cinder-volume can run in an active/active configuration.
-
- + Current State:
-
-  - Only one cinder-volume instance can be active. Failover has to be handled by an
-    external mechanism such as pacemaker/corosync.
-
- + Gap
-
-  - Cinder-volume doesn't support an active/active configuration.
-
-* Related BP
-
- + https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support
-
-* Findings:
-
-  + This blueprint is underway for Newton - as of July 6, 2016 great progress has
-    been made; we will continue to monitor the progress.
-
-GAP 5: Cinder volume multi-attachment
-=====================================
-
-* Type: 'reliability'
-* Description:
-
- + Desired State
-
-  - Cinder volumes can be attached to multiple VMs at the same time, so that
-    active/standby stateful VNFs can share the same Cinder volume.
-
- + Current State
-
- - Cinder volumes can only be attached to one VM at a time.
-
- + Gap
-
- - Nova and cinder do not allow for multiple simultaneous attachments.
-
-* Related BP
-
- + https://blueprints.launchpad.net/openstack/?searchtext=multi-attach-volume
-
-* Findings
-
- + Multi-attach volume is still WIP in OpenStack. There is coordination work required with Nova.
- + At risk for Newton
- + Recommend adding a Yardstick test case.
-
-General comment for the next release: remote volume replication is another
-important project for storage HA. The HA team will monitor this multi-blueprint
-activity, which will span multiple OpenStack releases. The blueprints aren't
-approved yet and there are dependencies on generic-volume-group.
-
-
-
-GAP 6: HA tests improvements in fuel
-====================================
-
-* Type: 'robustness'
-* Description:
-
- + Desired State
- - Increased test coverage for HA during install
- + Current State
- - A few test cases are available
-
- * Related BP
-
- - https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements
- - Tie in with the test plans we have discussed previously.
- - Look at Yardstick tests that could be proposed back to Openstack.
- - Discussions planned with Yardstick team to engage with Openstack community to enhance Fuel or Tempest as appropriate.
-
-
-Next Steps:
-^^^^^^^^^^^
-
-The six gaps above demonstrate that ongoing progress is being made in various
-OPNFV and OpenStack communities. The OPNFV-HA team will work to suggest
-blueprints for the next OpenStack Summit to help continue the progress of high
-availability in the community.
diff --git a/docs/scenarios/GAP_Analysis_Colorado.rst.bak b/docs/scenarios/GAP_Analysis_Colorado.rst.bak
deleted file mode 100644
index b6b7313..0000000
--- a/docs/scenarios/GAP_Analysis_Colorado.rst.bak
+++ /dev/null
diff --git a/docs/userguide/Deployment_Guideline.pdf b/docs/userguide/Deployment_Guideline.pdf
deleted file mode 100644
index 3e32429..0000000
--- a/docs/userguide/Deployment_Guideline.pdf
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/Deployment_Guideline.rst b/docs/userguide/Deployment_Guideline.rst
deleted file mode 100644
index 7d3b018..0000000
--- a/docs/userguide/Deployment_Guideline.rst
+++ /dev/null
@@ -1,452 +0,0 @@
-This document provides an overall framework for the high availability
-deployment of an NFV system. It will also be continuously updated to include HA
-deployment guidelines and suggestions for the releases of OPNFV.
-
-*********************************************************************
-Overview of Highly Available Deployment of OPNFV
-*********************************************************************
-
-In this section, we discuss the overall HA deployment of an NFV system.
-Different modules, such as hardware, VIM, VMs, etc., will be included, and the HA
-deployment of each single module will be discussed. However, not all of these HA
-schemes should be deployed in one system at the same time. For the HA deployment
-of a single system, we should consider the tradeoff between high availability and
-the cost and resources required.
-
-
-Architecture of HA deployment
-==================================================================
-
-This section introduces the different modules we should consider when talking
-about HA deployment. These modules include the hardware (compute, network and
-storage hardware), the VIM, the hypervisor, VMs and VNFs. HA schemes for these
-different modules should all be considered when deploying an NFV system, and the
-schemes should be coordinated so that the system reacts in the best way when
-facing failure.
-
-The following picture shows the architecture of HA deployment based on the
-framework from ETSI NFV ISG.
-
-.. figure:: Overview.png
- :alt: Architecture for HA Deployment
- :figclass: align-center
-
- Fig 1. Architecture of HA Deployment based on the Framework of ETSI NFV ISG
-
-HA deployment topology
-==================================================================
-
-This section introduces the HA deployment topology for an NFV system.
-The topology explained in this section is meant to support the software
-cluster of the OPNFV platform, which we will discuss in detail in section 1.3.
-
-The typical topology for deploying the OPNFV platform should include at
-least the controller nodes and the compute nodes. Depending on the requests of
-the users, standalone network nodes or storage nodes can be added to this
-topology. The simplest HA deployment of OPNFV only includes the control nodes.
-Further HA schemes can be provided for the compute nodes, the network nodes and
-the storage nodes, according to the requirements of the services deployed on the
-NFV system. Figure 2 shows the deployment topology, in which the controller nodes
-are all in a cluster, and the compute nodes can be in another cluster.
-
-The control node cluster here is to provide HA for the controller services, so
-that the services on the control node can successfully fail over when failure
-happens and the service can continue. The cluster service should also provide
-automatic recovery for the control nodes. For OPNFV, the control node cluster
-should include at least 3 nodes, and the number should be odd if the cluster
-management system uses quorum. This may change if we use different cluster
-management schemes though.
-
-The compute node cluster is responsible for providing HA for the services running
-on the compute nodes. These services may include OpenStack agents, the host OS and
-hypervisors. Such a cluster is responsible for the recovery and repair
-of the services. However, a compute node cluster will certainly bring complexity to
-the whole system and would increase the cost. There could be multiple solutions
-for the compute cluster, e.g., Senlin from OpenStack.
-
-There could be other HA solutions for the compute nodes besides a cluster. A
-combination of Congress and Doctor can be one of them, in which Doctor provides quick
-notification of failure to the VIM, and Congress provides a proper recovery procedure.
-In such a scheme, the compute nodes are not recovered by the cluster scheme, but
-recovered under the supervision of the VIM.
-
-.. figure:: topology_control_compute.png
- :alt: HA Deployment Topology of Control Nodes and Compute Nodes
- :figclass: align-center
-
- Fig 2. HA Deployment Topology of Control Nodes and Compute Nodes
-
-When the cloud is supporting heavy network traffic, which is often the case for the data
-plane services in Telecom scenarios, it is necessary to deploy standalone network
-nodes for OpenStack, so that the large amount of traffic switching and routing will not
-bring extra load to the controller nodes. In figure 3, we add network nodes into the
-topology and show how to deploy them in a highly available way. In this figure, the
-network nodes are deployed in a cluster. The cluster will provide HA for the services
-running on the network nodes. Such a cluster scheme could be the same as that of the
-compute nodes.
-
-One thing to note is that all hosts in the NFV system should have at least two NICs
-that are bonded via LACP.
-
-.. figure:: topology_control_compute_network.png
- :alt: HA Deployment Topology of Control Nodes and Compute Nodes and Network Nodes
- :figclass: align-center
-
- Fig 3. HA Deployment Topology of Control Nodes, Compute Nodes and network Nodes
-
-The HA deployment for storage can be different for different storage schemes. We will
-discuss the details of the storage HA deployment in section 1.3.3.
-
-Software HA Framework
-==================================================================
-
-In this section, we introduce more details about the HA schemes for a complete NFV system.
-
-Openstack Controller services (Openstack services)
---------------------------------------------------------
-
-For the high availability of OpenStack controller nodes, Pacemaker and Corosync are
-often used. The following text is taken from the OpenStack HA guide, which
-gives an example of an HA deployment solution (http://docs.openstack.org/ha-guide/).
-
-At its core, a cluster is a distributed finite state machine capable of co-ordinating the startup and recovery
-of inter-related services across a set of machines. For OpenStack controller nodes, a cluster management system,
-such as Pacemaker, is recommended in order to provide the following capabilities:
-
-1, Awareness of other applications in the stack
-
-2, Awareness of instances on other machines
-
-3, A shared implementation and calculation of quorum.
-
-4, Data integrity through fencing (a non-responsive process does not imply it is not doing anything)
-
-5, Automated recovery of failed instances
-
-Figure 4 shows the details of HA schemes for Openstack controller nodes with Pacemaker.
-
-.. figure:: HA_control.png
- :alt: HA Deployment of Openstack Control Nodes based on Pacemaker
- :figclass: align-center
-
- Fig 4. HA Deployment of Openstack Control Nodes based on Pacemaker
-
-High availability of all stateless services is provided by Pacemaker and HAProxy.
-
-Pacemaker cluster stack is the state-of-the-art high availability and load
-balancing stack for the Linux platform. Pacemaker is useful to make OpenStack
-infrastructure highly available. Also, it is storage and application-agnostic,
-and in no way specific to OpenStack.
-
-Pacemaker relies on the Corosync messaging layer for reliable cluster
-communications. Corosync implements the Totem single-ring ordering and
-membership protocol. It also provides UDP and InfiniBand based messaging,
-quorum, and cluster membership to Pacemaker.
-
-Pacemaker does not inherently (need or want to) understand the applications
-it manages. Instead, it relies on resource agents (RAs), scripts that
-encapsulate the knowledge of how to start, stop, and check the health
-of each application managed by the cluster. These agents must conform
-to one of the OCF, SysV Init, Upstart, or Systemd standards. Pacemaker
-ships with a large set of OCF agents (such as those managing MySQL
-databases, virtual IP addresses, and RabbitMQ), but can also use any
-agents already installed on your system and can be extended with your
-own (see the developer guide).
-
-After the deployment of Pacemaker, HAProxy is used to provide a VIP for all the
-OpenStack services and to act as a load balancer. HAProxy provides a fast and
-reliable HTTP reverse proxy and load balancer for TCP or HTTP applications.
-It is particularly suited for web sites crawling under very high loads while
-needing persistence or Layer 7 processing. It realistically supports tens
-of thousands of connections with recent hardware.
-
-Each instance of HAProxy configures its front end to accept connections only
-on the virtual IP (VIP) address and terminates them towards a list
-of all instances of the corresponding service under load balancing, such
-as any OpenStack API service. This makes the instances of HAProxy act
-independently and fail over transparently together with the network endpoints
-(VIP addresses), and therefore they share the same SLA.
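-
-A minimal sketch of probing the API endpoints through the VIP, e.g. to verify that
-HAProxy keeps them reachable during a controller failover (the VIP address and the
-ports are assumptions for illustration):
-
-.. code-block:: python
-
-   import requests
-
-   VIP = '192.0.2.11'
-   ENDPOINTS = {
-       'keystone': 'http://%s:5000/v3' % VIP,
-       'nova': 'http://%s:8774/' % VIP,
-       'neutron': 'http://%s:9696/' % VIP,
-   }
-
-   for name, url in ENDPOINTS.items():
-       try:
-           status = requests.get(url, timeout=2).status_code
-       except requests.RequestException as exc:
-           status = exc
-       print(name, status)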
-
-We can alternatively use a commercial load balancer, which can be hardware- or
-software-based. A hardware load balancer generally has good performance.
-
-Galera Cluster, or another database cluster service, should also be deployed
-to provide data replication and synchronization between databases. Galera
-Cluster is a synchronous multi-master database cluster, based on MySQL and
-the InnoDB storage engine. It is a high-availability service that provides
-high system uptime, no data loss, and scalability for growth. The selection
-of the DB will also have a potential influence on the behaviour of the application
-code. For instance, using Galera Cluster may give you higher concurrent write
-performance but may require more complex conflict resolution.
-
-We can also achieve high availability for the OpenStack database in many different
-ways, depending on the type of database that we are using. There are three
-implementations of Galera Cluster available:
-
-1, Galera Cluster for MySQL: the MySQL reference implementation from Codership;
-
-2, MariaDB Galera Cluster: the MariaDB implementation of Galera Cluster, which is
-commonly supported in environments based on Red Hat distributions;
-
-3, Percona XtraDB Cluster: the XtraDB implementation of Galera Cluster from Percona.
-
-In addition to Galera Cluster, we can also achieve high availability through other
-database options, such as PostgreSQL, which has its own replication system.
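-
-A minimal sketch of checking Galera cluster health from a monitoring script
-(host and credentials are assumptions; ``wsrep_cluster_size`` and
-``wsrep_local_state_comment`` are standard Galera status variables):
-
-.. code-block:: python
-
-   import pymysql
-
-   def galera_healthy(host, user, password, expected_size=3):
-       conn = pymysql.connect(host=host, user=user, password=password)
-       try:
-           with conn.cursor() as cur:
-               cur.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
-               size = int(cur.fetchone()[1])
-               cur.execute("SHOW STATUS LIKE 'wsrep_local_state_comment'")
-               state = cur.fetchone()[1]
-       finally:
-           conn.close()
-       # the node is usable only if it is Synced and the cluster is complete
-       return state == 'Synced' and size >= expected_size
-
-   print(galera_healthy('192.0.2.12', 'monitor', 'secret'))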
-
-To make RabbitMQ highly available, RabbitMQ HA (mirrored) queues should be configured,
-and all OpenStack services should be configured to use them (a policy sketch follows
-below).
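-
-A minimal sketch of declaring an "ha-all" mirroring policy through the RabbitMQ
-management HTTP API (the VIP, credentials and pattern are assumptions; the same
-policy can also be set with ``rabbitmqctl set_policy``):
-
-.. code-block:: python
-
-   import json
-
-   import requests
-
-   # %2F is the URL-encoded default vhost "/"
-   url = 'http://192.0.2.11:15672/api/policies/%2F/ha-all'
-   policy = {
-       'pattern': '^(?!amq\\.).*',
-       'definition': {'ha-mode': 'all', 'ha-sync-mode': 'automatic'},
-       'apply-to': 'queues',
-   }
-   resp = requests.put(url, auth=('guest', 'guest'),
-                       headers={'content-type': 'application/json'},
-                       data=json.dumps(policy))
-   resp.raise_for_status()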
-
-In the meantime, specific schemes should also be provided to avoid a single point of
-failure in Pacemaker itself, and failed services should be automatically repaired.
-
-Note that the scheme we described above is just one possible scheme for the HA
-deployment of the controller nodes. Other schemes can also be used to provide cluster
-management and monitoring.
-
-SDN controller services
----------------------------------------
-
-SDN controller software is a data-intensive application. All static and dynamic data
-has one or more replicas distributed to other physical nodes in the cluster. The built-in
-HA scheme is always consistent with the data distribution, and a built-in mechanism will
-select or re-select master nodes in the cluster. In the deployment stage, the SDN
-controller software should be deployed on at least two physical nodes, regardless of
-whether the software is deployed inside a VM or a container. A dual management network
-plane should be provided for the SDN controller cluster to support the built-in HA scheme.
-
-Storage
-----------------------------------------
-Depending on which storage scheme is deployed, different HA schemes should be used. The following
-text is taken from the Mirantis OpenStack reference architecture, which provides suggestions
-on the HA deployment of different storage schemes.
-
-1, Ceph
-
-Ceph implements its own HA. When deploying it, enough controller nodes running the Ceph Monitor
-service to form a quorum, and enough Ceph OSD nodes to satisfy the object replication factor, are
-needed (a quorum check sketch follows below).
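-
-A minimal sketch of verifying the monitor quorum from a deployment check (assumes
-the ``ceph`` CLI is available and an admin keyring is configured on the node):
-
-.. code-block:: python
-
-   import json
-   import subprocess
-
-   def ceph_quorum_ok(min_monitors=3):
-       out = subprocess.check_output(
-           ['ceph', 'quorum_status', '--format', 'json'])
-       status = json.loads(out)
-       # 'quorum' lists the ranks of the monitors currently in quorum
-       return len(status.get('quorum', [])) >= min_monitors
-
-   print(ceph_quorum_ok())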
-
-2, Swift
-
-The Swift API relies on the same HAProxy setup with a VIP on the controller nodes as the other REST
-APIs. For a small scale deployment, Swift storage and proxy services can be deployed on the
-controller nodes. However, for a larger production environment, dedicated storage nodes - two
-for the Swift proxy and at least three for Swift storage - are needed.
-
-
-
-Host OS and Hypervisor
----------------------------------------
-
-The host OS and hypervisor should be supervised and monitored for failure, and should be
-repaired when failure happens. Such supervision can be based on a cluster scheme, or can
-simply use the controller to constantly monitor the compute host. Figure 5 shows a
-simplified framework for a hypervisor cluster.
-
-When a host/hypervisor failure happens, VMs on that host should be evacuated. However,
-such a scheme should be coordinated with the VM HA scheme, so that when both the host and the
-VM detect the failure, they know who takes responsibility for the evacuation (an
-evacuation sketch follows Fig 5).
-
-
-.. figure:: HA_Hypervisor.png
- :alt: HA Deployment of Host OS and Hypervisor
- :figclass: align-center
-
- Fig 5. HA Deployment of Host OS and Hypervisor
-
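-A minimal sketch of evacuating a failed compute host with novaclient, the kind of
-action the VIM would take once the host is confirmed dead (credentials, the auth URL
-and the hostname are assumptions; microversion 2.11 is assumed for force_down):
-
-.. code-block:: python
-
-   from keystoneauth1.identity import v3
-   from keystoneauth1.session import Session
-   from novaclient import client as nova_client
-
-   auth = v3.Password(auth_url='http://192.0.2.11:5000/v3',
-                      username='admin', password='secret', project_name='admin',
-                      user_domain_name='Default', project_domain_name='Default')
-   nova = nova_client.Client('2.11', session=Session(auth=auth))
-
-   def evacuate_host(failed_host):
-       # stop scheduling to the dead host, then evacuate its instances
-       nova.services.force_down(failed_host, 'nova-compute', True)
-       servers = nova.servers.list(
-           search_opts={'host': failed_host, 'all_tenants': 1})
-       for server in servers:
-           nova.servers.evacuate(server)
-
-   evacuate_host('compute-3')
-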
-Virtual Machine (VM)
----------------------------------------
-
-VMs should be supervised and monitored for failure, and should be repaired when failure
-happens. We can rely on the hypervisor to monitor VM failure. Another scheme that can be
-used is a cluster for the VMs, in which failure of VMs in one cluster can be supervised
-and will be repaired by the cluster manager. Pacemaker and other cluster management
-schemes can be considered for the VM cluster.
-
-In cases where VNFs do not have HA schemes, an extra HA scheme for the VM should be taken into
-consideration. Such an approach is a kind of best effort for the NFV platform to provide HA
-for the VNF service, and may lead to failures being copied between VMs when the VNF fails. Since
-the NFVI can hardly know about the service running in the VNF, it is impossible for the NFVI
-level to provide an overall HA solution for the VNF services. Therefore, even though we
-mention this scheme here, we strongly suggest that the VNF should have its own HA schemes.
-
-Figure 6 gives an example of the VM active/standby deployment (a floating IP
-takeover sketch follows the figure). In this case, both the
-active VM and the standby VM are deployed with the same VNF image. When failure happens
-to the active VM, the standby VM should take the traffic and replace the active VM. Such
-a scheme is the best effort of the NFVI when VNFs do not have HA schemes and only
-rely on VMs to provide redundancy. However, for stateful VNFs, there has to be a data copy
-between the active VM and the standby VM. In this case, a fault in the active VM can also be
-copied to the standby VM, leading to failure of the new active VM.
-
-.. figure:: HA_VM.png
- :alt: VM Active/Standby Deployment
- :figclass: align-center
-
- Fig 6. VM Active/Standby Deployment
-
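-A minimal sketch of a best-effort takeover for the scheme in Fig 6: repoint the
-service floating IP from the failed active VM to the standby VM's port with
-neutronclient (IDs, credentials and the auth URL are assumptions; state
-replication is out of scope here):
-
-.. code-block:: python
-
-   from keystoneauth1.identity import v3
-   from keystoneauth1.session import Session
-   from neutronclient.v2_0 import client as neutron_client
-
-   auth = v3.Password(auth_url='http://192.0.2.11:5000/v3',
-                      username='admin', password='secret', project_name='admin',
-                      user_domain_name='Default', project_domain_name='Default')
-   neutron = neutron_client.Client(session=Session(auth=auth))
-
-   def fail_over_floating_ip(fip_id, standby_port_id):
-       # move the floating IP that clients use onto the standby VM's port
-       neutron.update_floatingip(
-           fip_id, {'floatingip': {'port_id': standby_port_id}})
-
-   fail_over_floating_ip('FIP_UUID', 'STANDBY_PORT_UUID')
-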
-Virtual Network Functions (VNF)
----------------------------------------
-
-For telecom services, it is suggested that VNFs should have their own built-in HA schemes,
-or HA schemes implemented in the VNF Manager, to provide highly available services to
-the customers. HA schemes for the VNFs can be based on a cluster. In this case, OpenSAF,
-Pacemaker and other cluster management services can be used.
-
-HA schemes for the VNFs should be coordinated with the lower layers. For example, it
-should be clear which level will take responsibility for a VM restart. A suggested
-scheme could be that the VNF layer is responsible for the redundancy and failover
-of the VNFs when failure happens. Such failover should take place in quite a short
-time (less than seconds). The repair procedure will then take place from the upper
-layer to the lower layer, that is, the VNF layer will first check if the failure is at
-its layer, and should try to repair itself. If it fails to repair the failure,
-the failure should be escalated to lower layers to let the NFVI layer do the repair
-work. There could also be cases where the NFVI layer has detected the failure and will
-repair it before the escalation. These functions should be accomplished by the coordination
-of all the different components, including the VNFM, VIM, VNFs and NFVI.
-
-In the meantime, the VNFs can take advantage of APIs the hypervisor can provide to
-them to enhance HA. Such APIs may include a constant health check (watchdog) from the
-hypervisor and affinity/anti-affinity deployment support, as sketched below.
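-
-A minimal sketch of requesting a virtual watchdog device and anti-affinity placement
-through the nova API (flavor, image and credentials are assumptions;
-``hw:watchdog_action`` and the ``anti-affinity`` server group policy are standard
-nova features):
-
-.. code-block:: python
-
-   from keystoneauth1.identity import v3
-   from keystoneauth1.session import Session
-   from novaclient import client as nova_client
-
-   auth = v3.Password(auth_url='http://192.0.2.11:5000/v3',
-                      username='admin', password='secret', project_name='admin',
-                      user_domain_name='Default', project_domain_name='Default')
-   nova = nova_client.Client('2', session=Session(auth=auth))
-
-   # ask the hypervisor to reset the guest if its watchdog stops being kicked
-   flavor = nova.flavors.find(name='vnf.small')
-   flavor.set_keys({'hw:watchdog_action': 'reset'})
-
-   # keep active and standby VNF VMs on different hosts
-   group = nova.server_groups.create(name='vnf-ha', policies=['anti-affinity'])
-   server = nova.servers.create(name='vnf-active', image='IMAGE_UUID',
-                                flavor=flavor.id,
-                                scheduler_hints={'group': group.id})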
-
-Figure 7 gives an example for the VNF HA scheme.
-
-.. figure:: HA_VNF.png
- :alt: HA Deployment of VNFs
- :figclass: align-center
-
- Fig 7. HA Deployment of VNFs
-
-*********************************************************************************
-HA deployment guideline for OPNFV releases
-*********************************************************************************
-
-In this section, we will continuously update the HA deployment guideline for the releases
-of OPNFV.
-
-HA deployment guideline for Arno
-==============================================
-
-Deployment Framework
------------------------------------------------
-
-Figure 8 shows an overall architecture for the HA deployment of ARNO.
-
-.. figure:: HA_ARNO.png
- :alt: HA Deployment of OPNFV ARNO release
- :figclass: align-center
-
- Fig 8. HA Deployment of OPNFV ARNO release
-
-For the OPNFV Arno release, HA deployment of the OpenStack control node (OpenStack Juno) and the ODL
-controller (ODL Helium) is supported. Both deployment tools (Fuel and Foreman) support
-such HA deployment.
-
-For such HA deployment, failure of the following components is protected against:
-
-Software:
-* Nova scheduler
-* Nova conductor
-* Cinder scheduler
-* Neutron server
-* Heat engine
-
-Controller hardware:
-* dead server
-* dead switch
-* dead port
-* dead disk
-* full disk
-
-
-HA test result for ARNO
--------------------------------------------------
-
-Two specific high availability test cases were run on the ARNO release. These test cases
-were collaboratively developed by the High Availability project and the Yardstick project.
-
-Both cases were executed in China Mobile's lab, where the ARNO SR1 release is deployed with
-Fuel.
-
-The two test cases respectively test the following two aspects:
-
-1, Control Node Service HA
-
-In this test, HA of "nova-api" is tested. According to the result, the service can
-successfully fail over to the other controller nodes within 2.36s once failure happens
-at the active node. However, the service can't repair itself automatically (more
-explanation about the repair is needed), and other services are not tested yet. A
-probe similar to the one used for this measurement is sketched below.
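-
-A minimal sketch of the kind of outage measurement such a test performs: poll the
-nova-api endpoint through the VIP and time how long requests keep failing (the URL
-and polling interval are assumptions, not the actual Yardstick implementation):
-
-.. code-block:: python
-
-   import time
-
-   import requests
-
-   def measure_outage(endpoint, interval=0.1):
-       # blocks until a failure is observed and then recovered
-       outage_start = None
-       while True:
-           try:
-               requests.get(endpoint, timeout=1).raise_for_status()
-               if outage_start is not None:
-                   return time.time() - outage_start
-           except requests.RequestException:
-               if outage_start is None:
-                   outage_start = time.time()
-           time.sleep(interval)
-
-   print(measure_outage('http://192.0.2.11:8774/'))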
-
-2, Control Node Hardware HA
-
-In this test, HA of the controller node hardware is tested. One of the hardware nodes is
-abnormally shut down, and the "nova-api" service is monitored. According to the test
-results, the service can fail over to the other controller nodes within 10.71 seconds.
-However, the failed hardware can't automatically repair itself.
-
-See more details about these test cases in the Yardstick doc of "Test Results for
-yardstick-opnfv-ha"(https://gerrit.opnfv.org/gerrit/#/c/7543/).
-
-From these basic test cases we can see that OPNFV ARNO has integrated some HA
-schemes in its controller nodes. However, its capability of self-repair should be
-enhanced.
-
-HA deployment guideline for Brahmaputra
-==============================================
-In the Brahmaputra release, 4 installers are provided. We will discuss the HA
-deployment of each installer.
-
-Apex
-----------------------------------------------------
-
-For the installer of Apex, all of the OpenStack services are in HA on all 3 controllers.
-The services are monitored by pacemaker and load balanced by HA Proxy with VIPs. The
-SDN controllers usually only run as a single instance on the first controller with no
-HA scheme.
-
-The database is clustered with Galera in an active/passive failover via Pacemaker, the
-message bus uses RabbitMQ HA, and the services are managed by Pacemaker.
-
-Storage is using ceph, clustered across the control nodes.
-
-In the future, more work is on the way to provide HA for the SDN controller. The Apex
-team has already finished a demo that runs ODL on each controller, load balanced to
-neutron via a VIP + HA Proxy, but is not using pacemaker. Meanwhile, they are also
-working to include ceph storage HA for compute nodes as well.
-
-Compass
----------------------------------------------------------
-TBD
-
-Fuel
--------------------------------------------------------------
-
-At the moment the Fuel installer supports the following HA schemes:
-
-1) OpenStack controllers: N-way redundant (1, 3, 5, etc.)
-2) OpenDaylight: no redundancy
-3) Ceph storage OSD: N-way redundant (1, 3, 5, etc.)
-4) Networking attachment redundancy: LAG
-5) NTP redundancy: N-way relays, up to 3 upstream sources
-6) DNS redundancy: N-way relays, up to 3 upstream sources
-7) DHCP: 1+1
-
-JOID
----------------------------------------------------------
-
-JOID provides HA based on OpenStack services. Individual service charms are
-deployed in containers within a host, and the charms are distributed in such a way that each
-service meant for HA goes into a container on an individual node. For example, for the
-keystone service there are three containers, one on each control node, and a VIP has been
-assigned for the front-end API to reach keystone. So in case any of the containers
-fails, the VIP will keep responding via the other two services. As HA is maintained
-with an odd number of units, at least one service container is required to respond.
-
-
-Reference
-==========
-
-* https://www.rdoproject.org/ha/ha-architecture/
-* http://docs.openstack.org/ha-guide/
-* https://wiki.opnfv.org/display/availability?preview=/2926706/2926714/scenario_analysis_for_high_availability_in_nfv.pdf
-* https://wiki.opnfv.org/display/availability?preview=/2926706/2926708/ha_requirement.pdf
-
diff --git a/docs/userguide/Deployment_Guideline.rst.bak b/docs/userguide/Deployment_Guideline.rst.bak
deleted file mode 100644
index 6b04dc7..0000000
--- a/docs/userguide/Deployment_Guideline.rst.bak
+++ /dev/null
@@ -1,452 +0,0 @@
-This document will provide an overall framework for the high availability
-deployment of NFV system. It will also continiously update to include HA
-deployment guidelines and suggestions for the releases of OPNFV.
-
-*********************************************************************
-Overview of High Available Deployment of OPNFV
-*********************************************************************
-
-In this section, we would like to discuss the overall HA deployment of NFV system.
-Different modules, such as hardware,VIM,VMs and etc, will be included, and HA
-deployment of each single module will be discussed. However, not all of these HA
-schemes should be deployed in on system at the same time. For the HA deployment of
-a single system, we should consider the tradeoff between high availability and the
-cost and resource to leverage.
-
-
-Architecture of HA deployment
-==================================================================
-
-This section intends to introduce the different modules we should consider
-when talking about HA deployment. These moduels include the Hardware
-(compute, network, storage hardware), the VIM, the hypervisor, VMs and VNFs.
-HA schemes for these different moduels should all be considered when deploying
-an NFV system. And the schemes should be coordinated so that the system can make
-sure to react in its best way when facing failure.
-
-The following picture shows the the architecture of HA deployment based on the
-framework from ETSI NFV ISG.
-
-.. figure:: Overview.png
- :alt: Architecture for HA Deployment
- :figclass: align-center
-
- Fig 1. Architecture of HA Deployment based on the Framework of ETSI NFV ISG
-
-HA deployment topology
-==================================================================
-
-This section will introduce the HA deployment topology for an NFV system.
-The topology explained in this section is to support the software
-cluster of OPNFV platform, which we will discuss in detail in section1.3.
-
-The typical topology of deployment OPNFV platform should include at
-least the controller nodes, and the compute nodes. Depend on the request of
-the users, standalone network nodes or storage nodes can be added into this
-topology. The simplest HA deployment of OPNFV only include the control nodes. Further
-HA schemes can be provided to the compute nodes, the network nodes and the storage
-nodes, according to the requirement of services deployed on the NFV system.
-Figure 2 shows the deployment topology, in which the controller nodes are all in
-a cluster, and the compute nodes can be in another cluster.
-
-The control node cluster here is to provide HA for the controller services, so
-that the services on the control node can successfully failover when failure
-happens and the service can continue. The cluster service should also provide
-automatic recovery for the control nodes. For OPNFV, the control node cluster
-should include at least 3 nodes, and should be an odd number if the cluster
-management system use quorum. This may change if we use different cluster
-management schemes though.
-
-The compute node clusters is responsible for providing HA for the services running
-on the compute nodes. These services may include agents for openstack, host os,
-hypervisors. Such cluster is responsible for the recovery and repair
-of the services. However, compute node cluster will certainly bring complexity to
-the whole system, and would increase the cost. There could be multiple solutions
-for the compute cluster, e.g., senlin from openstack.
-
-There could be other HA solutions for the compute nodes except for cluster. Combination
-of congress and doctor can be one of them, in which doctor provides quickly notification
-of failure to the VIM, and congress provides proper recovery procedure. In such scheme,
-the compute nodes are not recovered by the cluster scheme, but recovered under the
-supervision of VIM.
-
-.. figure:: topology_control_compute.png
- :alt: HA Deployment Topology of Control Nodes and Compute Nodes
- :figclass: align-center
-
- Fig 2. HA Deployment Topology of Control Nodes and Compute Nodes
-
-When the cloud is supporting heavy network traffic, which is often the case for the data
-plane services in the Telecom scenarios, it is necessary to deploy standalone network
-nodes for openstack, so that the large amont of traffic switching and routing will not
-bring extra load to the controller nodes. In figure 3, we add network nodes into the
-topology and shows how to deploy it in a high available way. In this figure, the
-network nodes are deployed in a cluster. The cluster will provide HA for the services
-runing on the network nodes. Such cluster scheme could be the same with that of the
-compute nodes.
-
-On thing to be notify is that all hosts in the NFV system should have at least two NICs
-that are bonded via LACP.
-
-.. figure:: topology_control_compute_network.png
- :alt: HA Deployment Topology of Control Nodes and Compute Nodes and Network Nodes
- :figclass: align-center
-
- Fig 3. HA Deployment Topology of Control Nodes, Compute Nodes and network Nodes
-
-The HA deployment for storage can be different for all different storage schemes. We will
-discuss the detail of the storage HA deployment in section 1.3.3
-
-Software HA Framework
-==================================================================
-
-In this section, we introduce more details about the HA schemes for a complete NFV system.
-
-Openstack Controller services (Openstack services)
---------------------------------------------------------
-
-For the High Availability of OpenStack Controller nodes, Pacemaker and Corosync are
-often used. The following texts are refering from the HA guideline of OpenStack, which
-gives an example of solution of HA deployment.(http://docs.openstack.org/ha-guide/)
-
-At its core, a cluster is a distributed finite state machine capable of co-ordinating the startup and recovery
-of inter-related services across a set of machines. For OpenStack Controller nodes, a cluster management system,
-such as Pacemaker, is recommended to use to provide the following metrics.
-
-1, Awareness of other applications in the stack
-
-2, Awareness of instances on other machines
-
-3, A shared implementation and calculation of quorum.
-
-4, Data integrity through fencing (a non-responsive process does not imply it is not doing anything)
-
-5, Automated recovery of failed instances
-
-Figure 4 shows the details of HA schemes for Openstack controller nodes with Pacemaker.
-
-.. figure:: HA_control.png
- :alt: HA Deployment of Openstack Control Nodes based on Pacemaker
- :figclass: align-center
-
- Fig 4. HA Deployment of Openstack Control Nodes based on Pacemaker
-
-High availability of all stateless services are provided by pacemaker and HAProxy.
-
-Pacemaker cluster stack is the state-of-the-art high availability and load
-balancing stack for the Linux platform. Pacemaker is useful to make OpenStack
-infrastructure highly available. Also, it is storage and application-agnostic,
-and in no way specific to OpenStack.
-
-Pacemaker relies on the Corosync messaging layer for reliable cluster
-communications. Corosync implements the Totem single-ring ordering and
-membership protocol. It also provides UDP and InfiniBand based messaging,
-quorum, and cluster membership to Pacemaker.
-
-Pacemaker does not inherently (need or want to) understand the applications
-it manages. Instead, it relies on resource agents (RAs), scripts that
-encapsulate the knowledge of how to start, stop, and check the health
-of each application managed by the cluster.These agents must conform
-to one of the OCF, SysV Init, Upstart, or Systemd standards.Pacemaker
-ships with a large set of OCF agents (such as those managing MySQL
-databases, virtual IP addresses, and RabbitMQ), but can also use any
-agents already installed on your system and can be extended with your
-own (see the developer guide).
-
-After deployment of Pacemaker, HAProxy is used to provide VIP for all the
-OpenStack services and act as load balancer. HAProxy provides a fast and
-reliable HTTP reverse proxy and load balancer for TCP or HTTP applications.
-It is particularly suited for web crawling under very high loads while
-needing persistence or Layer 7 processing. It realistically supports tens
-of thousands of connections with recent hardware.
-
-Each instance of HAProxy configures its front end to accept connections
-only from the virtual IP (VIP) address and to terminate them as a list
-of all instances of the corresponding service under load balancing, such
-as any OpenStack API service. This makes the instances of HAProxy act
-independently and fail over transparently together with the network endpoints
-(VIP addresses) failover and, therefore, shares the same SLA.
-
-We can alternatively use a commercial load balancer, which is a hardware or
-software. A hardware load balancer generally has good performance.
-
-Galera Cluster, or another database clustering service, should also be deployed
-to provide data replication and synchronization between databases. Galera
-Cluster is a synchronous multi-master database cluster, based on MySQL and
-the InnoDB storage engine. It is a high-availability service that provides
-high system uptime, no data loss, and scalability for growth. The choice
-of database also influences the behaviour of the application code. For
-instance, using Galera Cluster may give you higher concurrent write
-performance but may require more complex conflict resolution.
-
-We can also achieve high availability for the OpenStack database in many different
-ways, depending on the type of database that we are using. There are three
-implementations of Galera Cluster available:
-
-1. Galera Cluster for MySQL: the MySQL reference implementation from Codership;
-
-2. MariaDB Galera Cluster: the MariaDB implementation of Galera Cluster, which is
-commonly supported in environments based on Red Hat distributions;
-
-3. Percona XtraDB Cluster: the XtraDB implementation of Galera Cluster from Percona.
-
-In addition to Galera Cluster, we can also achieve high availability through other
-database options, such as PostgreSQL, which has its own replication system.
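-
-One way to see whether a Galera node is actually part of a healthy cluster is to read
-its ``wsrep_*`` status variables. The sketch below does so with the ``pymysql`` driver;
-the host and credentials are placeholders, while ``wsrep_cluster_size`` and
-``wsrep_local_state_comment`` are standard Galera status variables.
-
-.. code-block:: python
-
-    # Hedged example: query Galera's wsrep status variables on one node.
-    # Host and credentials are placeholders for illustration only.
-    import pymysql
-
-    conn = pymysql.connect(host="192.0.2.21", user="monitor", password="secret")
-    try:
-        with conn.cursor() as cur:
-            cur.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
-            _, cluster_size = cur.fetchone()
-            cur.execute("SHOW STATUS LIKE 'wsrep_local_state_comment'")
-            _, state = cur.fetchone()
-        print("cluster size: %s, local state: %s" % (cluster_size, state))
-        if state != "Synced":
-            print("warning: this node is not synced with the cluster")
-    finally:
-        conn.close()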
-
-To make RabbitMQ highly available, RabbitMQ mirrored (HA) queues should be configured,
-and all OpenStack services should be configured to use them.
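-
-Mirrored queues are normally enabled with a RabbitMQ policy (for example via
-``rabbitmqctl set_policy``), and the OpenStack services are then pointed at the clustered
-brokers (in many releases via the ``rabbit_ha_queues`` option of oslo.messaging). The
-sketch below sets such a policy through the RabbitMQ management HTTP API; the broker
-address, credentials and policy name are placeholders.
-
-.. code-block:: python
-
-    # Hedged sketch: declare an "ha-all" policy that mirrors every queue to
-    # all nodes, using the RabbitMQ management plugin's HTTP API.
-    # Broker address, credentials and policy name are placeholders.
-    import json
-    import urllib.request
-
-    url = "http://192.0.2.30:15672/api/policies/%2F/ha-all"
-    body = json.dumps({
-        "pattern": ".*",                     # apply to every queue
-        "definition": {"ha-mode": "all"},    # mirror to all cluster nodes
-        "apply-to": "queues",
-    }).encode()
-
-    req = urllib.request.Request(url, data=body, method="PUT",
-                                 headers={"Content-Type": "application/json"})
-    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
-    password_mgr.add_password(None, url, "guest", "guest")
-    opener = urllib.request.build_opener(
-        urllib.request.HTTPBasicAuthHandler(password_mgr))
-    print(opener.open(req).getcode())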
-
-In the meantime, specific schemes should also be provided to avoid a single point of
-failure of Pacemaker itself, and failed services should be automatically repaired.
-
-Note that the scheme we described above is just one possible scheme for the HA
-deployment of the controller nodes. Other schemes can also be used to provide cluster
-management and monitoring.
-
-SDN controller services
----------------------------------------
-
-SDN controller software is a data-intensive application. All static and dynamic data has
-one or more replicas distributed to other physical nodes in the cluster. The built-in HA
-scheme is always consistent with this data distribution, and a built-in mechanism selects
-or re-elects the master nodes in the cluster. At the deployment stage, the SDN controller
-software should be deployed on at least two physical nodes, regardless of whether the
-software runs inside VMs or containers. A dual management network plane should
-be provided for the SDN controller cluster to support the built-in HA scheme.
-
-Storage
-----------------------------------------
-Depending on which storage scheme is deployed, different HA schemes should be used. The
-following text draws on the Mirantis OpenStack reference architecture, which provides
-suggestions on the HA deployment of different storage schemes.
-
-1. Ceph
-
-Ceph implements its own HA. When deploying it, enough controller nodes running the Ceph
-Monitor service to form a quorum, and enough Ceph OSD nodes to satisfy the object
-replication factor, are needed (a minimal quorum-check sketch follows this list).
-
-2. Swift
-
-The Swift API relies on the same HAProxy setup with a VIP on the controller nodes as the
-other REST APIs. For a small-scale deployment, the Swift storage and proxy services can
-be deployed on the controller nodes. However, a larger production environment needs
-dedicated storage nodes: two for the Swift proxy and at least three for Swift storage.
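-
-As a rough illustration of the quorum requirement mentioned for Ceph above, the sketch
-below calls ``ceph quorum_status`` and compares the number of monitors in quorum with the
-simple majority that is needed. It assumes the ``ceph`` CLI is available and authorized
-on the node where it runs, and the JSON field names may differ between Ceph releases.
-
-.. code-block:: python
-
-    # Hedged sketch: check that a majority of Ceph monitors is in quorum.
-    # Assumes the "ceph" CLI is installed and has access to the cluster.
-    import json
-    import subprocess
-
-    status = json.loads(
-        subprocess.check_output(["ceph", "quorum_status", "--format", "json"]))
-    monitors = status["monmap"]["mons"]   # field names as seen in recent releases
-    in_quorum = status["quorum"]
-    needed = len(monitors) // 2 + 1       # simple majority
-
-    print("monitors: %d, in quorum: %d, needed: %d"
-          % (len(monitors), len(in_quorum), needed))
-    if len(in_quorum) < needed:
-        print("warning: monitor quorum lost")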
-
-
-
-Host OS and Hypervisor
----------------------------------------
-
-The host OS and hypervisor should be supervised and monitored for failure, and should be
-repaired when a failure happens. Such supervision can be based on a cluster scheme, or
-the controller can simply monitor the compute hosts constantly. Figure 5 shows a
-simplified framework for a hypervisor cluster.
-
-When a host/hypervisor failure happens, the VMs on that host should be evacuated.
-However, such a scheme should be coordinated with the VM HA scheme, so that when both the
-host and the VM detect the failure, it is clear which layer takes responsibility for the
-evacuation.
-
-.. figure:: HA_Hypervisor.png
- :alt: HA Deployment of Host OS and Hypervisor
- :figclass: align-center
-
- Fig 5. HA Deployment of Host OS and Hypervisor
-
-Virtual Machine (VM)
----------------------------------------
-
-VMs should be supervised and monitored for failure, and should be repaired when a failure
-happens. We can rely on the hypervisor to monitor VM failures. Another scheme that can be
-used is a cluster of VMs, in which the failure of a VM in the cluster is detected
-and repaired by the cluster manager. Pacemaker and other cluster management
-schemes can be considered for the VM cluster.
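-
-Hypervisor-side supervision of VMs can be as simple as periodically asking libvirt which
-domains are still active. The sketch below uses the libvirt Python bindings to list the
-domains on a host and report the inactive ones; what to do on failure (restart locally,
-notify the VIM, trigger evacuation) is left to the surrounding HA scheme.
-
-.. code-block:: python
-
-    # Hedged sketch: report libvirt domains that are no longer running.
-    # The recovery action itself is left to the surrounding HA scheme.
-    import time
-    import libvirt
-
-    conn = libvirt.openReadOnly("qemu:///system")
-    try:
-        while True:
-            for dom in conn.listAllDomains():
-                if not dom.isActive():
-                    print("domain %s is not running" % dom.name())
-            time.sleep(5)
-    finally:
-        conn.close()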
-
-When VNFs do not have their own HA schemes, an extra HA scheme for the VM should be taken
-into consideration. Such an approach is a best effort by the NFV platform to provide HA
-for the VNF service, and may copy failures between VMs when the VNF fails. Since the
-NFVI hardly knows anything about the service running in the VNF, it is impossible for the
-NFVI level to provide an overall HA solution for the VNF services. Therefore, even though
-we mention this scheme here, we strongly suggest that the VNF have its own HA schemes.
-
-Figure 6 gives an example of a VM active/standby deployment. In this case, both the
-active VM and the standby VM are deployed with the same VNF image. When a failure happens
-to the active VM, the standby VM takes over the traffic and replaces the active VM. Such
-a scheme is the best effort of the NFVI when VNFs have no HA schemes and rely
-only on VMs to provide redundancy. However, for stateful VNFs, data must be copied
-between the active VM and the standby VM. In this case, a fault in the active VM can also
-be copied to the standby VM, leading to failure of the new active VM.
-
-.. figure:: HA_VM.png
- :alt: VM Active/Standby Deployment
- :figclass: align-center
-
- Fig 6. VM Active/Standby Deployment
-
-Virtual Network Functions (VNF)
----------------------------------------
-
-For telecom services, it is suggested that VNFs have their own built-in HA schemes,
-or HA schemes implemented in the VNF Manager, in order to provide highly available
-services to the customers. HA schemes for the VNFs can be based on a cluster; in this
-case, OpenSAF, Pacemaker and other cluster management services can be used.
-
-HA schemes for the VNFs should be coordinated with the lower layers. For example, it
-should be clear which level takes responsibility for restarting a VM. A suggested
-scheme could be: the VNF layer is responsible for the redundancy and failover
-of the VNFs when a failure happens, and such failover should take place within a very
-short time (less than seconds). The repair procedure then proceeds from the upper
-layer to the lower layers; that is, the VNF layer first checks whether the failure is at
-its own layer and tries to repair itself. If it fails to repair the failure,
-the failure is escalated to the lower layers so that the NFVI layer can do the repair
-work. There can also be cases in which the NFVI layer has already detected the failure
-and repairs it before the escalation. These functions should be accomplished through the
-coordination of all the different components, including the VNFM, VIM, VNFs and NFVI.
-
-In the meantime, the VNFs can take advantage of APIs that the hypervisor provides to
-them to enhance HA. Such APIs may include constant health checks from the hypervisor,
-affinity/anti-affinity deployment support, and a virtual watchdog device that recovers
-the guest when it is no longer kicked, as illustrated by the sketch below.
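-
-When the VM is given such a virtual watchdog device (libvirt/QEMU can emulate one and
-reset or power off the guest when it expires), the VNF only has to "kick" the watchdog
-while it is healthy. The sketch below shows a minimal guest-side kicker using the
-standard Linux ``/dev/watchdog`` interface; the health check itself is a placeholder
-that the VNF would replace with its own logic.
-
-.. code-block:: python
-
-    # Hedged sketch of a guest-side watchdog kicker: as long as the VNF's own
-    # health check passes, /dev/watchdog is written to; if the process stops
-    # kicking, the (virtual) watchdog expires and the hypervisor-defined
-    # action (reset, power off, ...) recovers the VM.
-    import time
-
-    def vnf_is_healthy():
-        # Placeholder: the VNF plugs its real health check in here.
-        return True
-
-    with open("/dev/watchdog", "wb", buffering=0) as wd:
-        while vnf_is_healthy():
-            wd.write(b"\n")   # any write resets the watchdog timer
-            time.sleep(5)
-        # Stop kicking on purpose: the watchdog expires and triggers recovery.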
-
-Figure 7 gives an example of a VNF HA scheme.
-
-.. figure:: HA_VNF.png
- :alt: HA Deployment of VNFs
- :figclass: align-center
-
- Fig 7. HA Deployment of VNFs
-
-*********************************************************************************
-HA deployment guideline for OPNFV releases
-*********************************************************************************
-
-In this section, we will continuously update the HA deployment guideline for the releases
-of OPNFV.
-
-HA deployment guideline for Arno
-==============================================
-
-Deployment Framework
------------------------------------------------
-
-Figure 8 shows an overall architecture for the HA deployment of ARNO.
-
-.. figure:: HA_ARNO.png
- :alt: HA Deployment of OPNFV ARNO release
- :figclass: align-center
-
- Fig 8. HA Deployment of OPNFV ARNO release
-
-For the OPNFV Arno release, HA deployment of the OpenStack control nodes (OpenStack Juno)
-and the ODL controller (ODL Helium) is supported. Both deployment tools (Fuel and
-Foreman) support such HA deployment.
-
-For such an HA deployment, failures of the following components are protected against:
-
-Software:
-
-* Nova scheduler
-* Nova conductor
-* Cinder scheduler
-* Neutron server
-* Heat engine
-
-Controller hardware:
-
-* dead server
-* dead switch
-* dead port
-* dead disk
-* full disk
-
-
-HA test result for ARNO
--------------------------------------------------
-
-Two specific high availability test cases were run on the ARNO release. These test cases
-were collaboratively developed by the High Availability project and the Yardstick project.
-
-Both cases were executed in China Mobile's lab, where the ARNO SR1 release was deployed
-with Fuel.
-
-The two test cases respectively cover the following two aspects:
-
-1. Control Node Service HA
-
-In this test, the HA of "nova-api" is tested. According to the results, the service
-successfully fails over to the other controller nodes within 2.36 s once a failure
-happens on the active node. However, the failed service cannot repair itself
-automatically, and other services have not been tested yet.
-
-2. Control Node Hardware HA
-
-In this test, the HA of the controller node hardware is tested. One of the controller
-nodes is abnormally shut down, and the "nova-api" service is monitored. According to the
-test results, the service fails over to the other controller nodes within 10.71 seconds.
-However, the failed hardware cannot automatically repair itself.
-
-See more details about these test cases in the Yardstick document "Test Results for
-yardstick-opnfv-ha" (https://gerrit.opnfv.org/gerrit/#/c/7543/).
-
-From these basic test cases we can see that OPNFV ARNO has integrated some HA
-schemes for its controller nodes. However, its capability of self-repair should be
-enhanced.
-
-HA deployment guideline for Brahmaputra
-==============================================
-In the Brahmaputra release, four installers are provided. We discuss the HA
-deployment of each installer below.
-
-Apex
-----------------------------------------------------
-
-For the Apex installer, all of the OpenStack services are in HA on all three controllers.
-The services are monitored by Pacemaker and load balanced by HAProxy with VIPs. The
-SDN controllers usually run only as a single instance on the first controller, with no
-HA scheme.
-
-The database is clustered with Galera in an active/passive failover managed by Pacemaker,
-the message bus is RabbitMQ with HA queues, and these services are also managed by
-Pacemaker.
-
-Storage uses Ceph, clustered across the control nodes.
-
-In the future, more work is on the way to provide HA for the SDN controller. The Apex
-team has already finished a demo that runs ODL on each controller, load balanced towards
-Neutron via a VIP + HAProxy, but not yet managed by Pacemaker. Meanwhile, they are also
-working to include Ceph storage HA for the compute nodes as well.
-
-Compass
----------------------------------------------------------
-TBD
-
-Fuel
--------------------------------------------------------------
-
-At the moment, the Fuel installer supports the following HA schemes:
-
-1) OpenStack controllers: N-way redundant (1, 3, 5, etc.)
-2) OpenDaylight: no redundancy
-3) Ceph storage OSDs: N-way redundant (1, 3, 5, etc.)
-4) Network attachment redundancy: LAG
-5) NTP redundancy: N-way relays, up to 3 upstream sources
-6) DNS redundancy: N-way relays, up to 3 upstream sources
-7) DHCP: 1+1
-
-JOID
----------------------------------------------------------
-
-JOID provides HA based on OpenStack services. Individual service charms are deployed
-in containers within a host, and the charms are distributed so that each service
-meant for HA goes into a container on an individual node. For the Keystone service,
-for example, there are three containers, one on each control node, and a VIP is
-assigned for the front-end API to reach Keystone. So if any of the containers
-fails, the VIP keeps responding via the other two service instances. As HA is maintained
-with an odd number of units, at least one service container is required to respond.
-
-
-References
-==========
-
-* https://www.rdoproject.org/ha/ha-architecture/
-* http://docs.openstack.org/ha-guide/
-* https://wiki.opnfv.org/display/availability?preview=/2926706/2926714/scenario_analysis_for_high_availability_in_nfv.pdf
-* https://wiki.opnfv.org/display/availability?preview=/2926706/2926708/ha_requirement.pdf
-
diff --git a/docs/userguide/HA_ARNO.png b/docs/userguide/HA_ARNO.png
deleted file mode 100644
index 4d59d41..0000000
--- a/docs/userguide/HA_ARNO.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/HA_Hypervisor.png b/docs/userguide/HA_Hypervisor.png
deleted file mode 100644
index 4206a1e..0000000
--- a/docs/userguide/HA_Hypervisor.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/HA_VM.png b/docs/userguide/HA_VM.png
deleted file mode 100644
index 68fbedf..0000000
--- a/docs/userguide/HA_VM.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/HA_VNF.png b/docs/userguide/HA_VNF.png
deleted file mode 100644
index 000fe44..0000000
--- a/docs/userguide/HA_VNF.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/HA_control.png b/docs/userguide/HA_control.png
deleted file mode 100644
index 3b82663..0000000
--- a/docs/userguide/HA_control.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/Overview.png b/docs/userguide/Overview.png
deleted file mode 100644
index a860038..0000000
--- a/docs/userguide/Overview.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/topology_control_compute.png b/docs/userguide/topology_control_compute.png
deleted file mode 100644
index 673aa46..0000000
--- a/docs/userguide/topology_control_compute.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/topology_control_compute_network.png b/docs/userguide/topology_control_compute_network.png
deleted file mode 100644
index 7b2b582..0000000
--- a/docs/userguide/topology_control_compute_network.png
+++ /dev/null
Binary files differ
diff --git a/docs/userguide/topology_control_compute_network_storage.png b/docs/userguide/topology_control_compute_network_storage.png
deleted file mode 100644
index 84c7faf..0000000
--- a/docs/userguide/topology_control_compute_network_storage.png
+++ /dev/null
Binary files differ