From 4260b10e3a7c7780385a61d6d708a9480d208a1a Mon Sep 17 00:00:00 2001
From: fuqiao
Date: Mon, 13 Feb 2017 10:44:55 +0800
Subject: Danube MS6 adding document directives

add overview doc and design docs

JIRA: HA-28

Change-Id: I890d2056a0fe61ca6aa1297210a1028d38d1514d
Signed-off-by: fuqiao@chinamobile.com
---
 docs/scenarios/GAP_Analysis_Colorado.rst     | 278 ---------------------------
 docs/scenarios/GAP_Analysis_Colorado.rst.bak | 278 ---------------------------
 docs/scenarios/index.rst                     |  13 --
 3 files changed, 569 deletions(-)
 delete mode 100644 docs/scenarios/GAP_Analysis_Colorado.rst
 delete mode 100644 docs/scenarios/GAP_Analysis_Colorado.rst.bak
 delete mode 100644 docs/scenarios/index.rst

(limited to 'docs/scenarios')

diff --git a/docs/scenarios/GAP_Analysis_Colorado.rst b/docs/scenarios/GAP_Analysis_Colorado.rst
deleted file mode 100644
index 4fefc09..0000000
--- a/docs/scenarios/GAP_Analysis_Colorado.rst
+++ /dev/null
@@ -1,278 +0,0 @@

Introduction:
^^^^^^^^^^^^^

During the Colorado release, the OPNFV availability team reviewed a number of
gaps in high-availability support across various areas of OPNFV. The goal was
to identify gaps and work with the relevant open source communities (OpenStack,
for example) to develop solutions and blueprints, enhancing the overall
availability and reliability of OPNFV going forward. We also worked with the
OPNFV Doctor team to keep our activities coordinated. In the next releases of
OPNFV, the availability team will update the status of open gaps and continue
to look for additional ones.

Summary of findings:
^^^^^^^^^^^^^^^^^^^^

1. Publish health status of compute node - this gap is now closed through an
   OpenStack blueprint merged in Mitaka.

2. Health status of compute node - good work is underway in OpenStack and with
   the Doctor team; we will continue to monitor it.

3. Store consoleauth tokens to the database - this gap can be addressed by
   changing OpenStack configuration.

4. Active/Active HA of cinder-volume - work is actively underway in Newton; we
   will monitor it closely.

5. Cinder volume multi-attachment - work is in progress in OpenStack; see the
   detailed findings below.

6. Add HA tests into Fuel - the availability team has been working with the
   Yardstick team to create additional test cases for the Colorado release;
   some of these would be good additions to installers such as Fuel.

Detailed explanation of the gaps and findings:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

GAP 1: Publish the health status of compute node
================================================

* Type: 'reliability'
* Description:

  Compute node status is currently kept within nova only. However, the NFVO
  and VNFM may also need this information: the NFVO may trigger scale up/down
  based on it, and the VNFM may trigger evacuation. In high-availability
  scenarios, the VNFM may also need host status information from the VIM so
  that it can determine where exactly a failure is located. This information
  therefore needs to be published to the NFVO and VNFM.

  + Desired state

    - Have the health status of compute nodes published.

  + Current behaviour

    - Nova queries the ServiceGroup API to get node liveness information.

  + Gap

    - The ServiceGroup API currently keeps the health status of compute nodes
      internal to nova; this status could instead be published to the NFV MANO
      plane.

Findings:

A blueprint from the OPNFV Doctor team covers this gap by adding notifications
for service status changes.

Status: Merged (Mitaka release)

  + Owner: Balazs

  + BP: https://blueprints.launchpad.net/nova/+spec/service-status-notification

  + Spec: https://review.openstack.org/182350

  + Code: https://review.openstack.org/#/c/245678/

  + Merged Jan 2016 - Mitaka
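With this blueprint merged, nova emits a versioned ``service.update``
notification whenever a service's status changes. As a hedged illustration
(ours, not part of the original analysis), a MANO-side component could consume
these notifications through oslo.messaging; the transport URL below is a
placeholder, and the topic assumes nova's default versioned-notification
topic:

.. code-block:: python

    # Hypothetical consumer of nova's service.update versioned notifications.
    # The transport URL is a placeholder; point it at nova's message bus.
    from oslo_config import cfg
    import oslo_messaging


    class ServiceStatusEndpoint(object):
        """Receives nova versioned notifications for service status changes."""

        filter_rule = oslo_messaging.NotificationFilter(
            event_type='service.update')

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # The payload reports fields such as the host, the binary and
            # whether the service is disabled.
            print('service status changed on %s: %s' % (publisher_id, payload))


    transport = oslo_messaging.get_notification_transport(
        cfg.CONF, url='rabbit://guest:guest@controller:5672/')
    targets = [oslo_messaging.Target(topic='versioned_notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [ServiceStatusEndpoint()])
    listener.start()
    listener.wait()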
GAP 2: Health status of compute node
====================================

* Type: 'reliability'
* Description:

  + Desired state:

    - Provide the health status of compute nodes.

  + Current behaviour

    - Currently, while performing actions such as evacuation, nova checks the
      compute service: if the service is down, the host is assumed to be down.
      This is not exactly true, since it is possible for only the compute
      service to be down while all VMs running on the host are actually up.
      There is no way to distinguish between two really different things: the
      status of the host and the status of the nova-compute service deployed
      on it.

      Also, the host information provided by the API and commands is
      service-centric, i.e. "nova host-list" is just another wrapper around
      "nova service-list" with a different format (in fact "service-list" is
      a superset of "host-list").

  + Gap

    - Not all health information about compute nodes can be provided. Nova
      appears to treat the term *host* as equivalent to *compute-host*, which
      can be misleading and error prone in cases where host evacuation needs
      to be performed.

Related BP:

Pacemaker and Corosync can provide information about the host. There is
therefore a requirement for nova to support a pacemaker servicegroup driver.
Another option would be to add a tooz servicegroup driver to nova and then
support a corosync driver in tooz.

  + https://blueprints.launchpad.net/nova/+spec/tooz-for-service-groups

The Doctor team is not working on this blueprint.

NOTE: This bp is active. A suggestion is to adopt this bp and add a corosync
driver to tooz; that could be a solution. We should keep following this bp
and, once it is finished, see whether we can add a corosync driver for tooz
to close this gap.

The currently supported drivers in tooz are listed at
https://github.com/openstack/tooz/blob/master/doc/source/drivers.rst.
Meanwhile, we should also look into the Doctor project and see if this could
be solved there.

This work is still underway but does not map directly to the gap identified
above. The Doctor team is looking at getting faster updates on node status
and failure status - these are covered by other blueprints and are good
problems to solve.
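To make the tooz option concrete, the sketch below (our illustration, not code
from the blueprint) shows how tooz group membership could track compute-host
liveness; the ZooKeeper URL, member id and group name are assumptions:

.. code-block:: python

    # Illustrative only: tracking compute-host liveness with tooz group
    # membership. The backend URL and group name are placeholders.
    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://controller:2181', b'compute-1')
    coordinator.start()

    # Each compute host joins a shared group and heartbeats periodically.
    try:
        coordinator.create_group(b'compute-hosts').get()
    except coordination.GroupAlreadyExist:
        pass
    coordinator.join_group(b'compute-hosts').get()
    coordinator.heartbeat()

    # A monitor can then list the members that are currently alive.
    members = coordinator.get_members(b'compute-hosts').get()
    print('live compute hosts: %s' % members)

    coordinator.stop()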
GAP 3: Store consoleauth tokens to the database
===============================================

* Type: 'performance'
* Description:

  + Desired state

    - Change the consoleauth service to store the tokens in the database and,
      optionally, cache them in memory as it does now for fast access.

  + Current state

    - The consoleauth service currently stores the tokens and the connection
      data only in memory. This behaviour makes it impossible to have
      multiple instances of the service in a cluster, as there is no way for
      one instance to know the tokens issued by the others.

    - The consoleauth service can use a memcached server to store those
      tokens, but again, to share them among different instances we would be
      relying on a single memcached server, which makes this solution
      unsuitable for a highly available architecture where we should be able
      to replicate all of the services in our cluster.

  + Gap

    - The consoleauth service stores the tokens and the connection data only
      in memory, so multiple instances of the service cannot run in a
      cluster: no instance can know the tokens issued by the others.

* Related BP

  + https://blueprints.launchpad.net/nova/+spec/consoleauth-tokens-in-db

    The advice in the blueprint is to use memcached as a backend. Looking at
    the documentation, memcached is not able to replicate data, so this is
    not a complete solution. Redis (http://redis.io/), however, may be a
    suitable backend for storing tokens that survive node failures. This
    blueprint is not directly needed for this gap.

Findings:

This bp has been rejected, since the community feedback is that active/active
can be supported by memcached. The use case for the bp is not entirely clear:
when the consoleauth service goes down and a token is lost, the other
services can retrieve the token again once it recovers. The desired behaviour
can be accomplished through a different OpenStack configuration, so this is
not a gap. The team's recommendation is to verify the redis approach.
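To illustrate the redis approach recommended above (a sketch of ours, not an
implementation from the blueprint), tokens can be written to redis with an
expiry so that every consoleauth instance - including one replacing a failed
instance - sees the same data. The key prefix and the 10-minute TTL are
illustrative assumptions:

.. code-block:: python

    # Sketch of a shared token store using redis; key names and the TTL
    # are illustrative assumptions, not nova code.
    import json

    import redis

    TOKEN_TTL = 600  # seconds; roughly a console token's default lifetime

    client = redis.StrictRedis(host='controller', port=6379)


    def store_token(token, connect_info):
        # setex writes the value with an expiry, so stale tokens age out.
        client.setex('consoleauth:%s' % token, TOKEN_TTL,
                     json.dumps(connect_info))


    def check_token(token):
        data = client.get('consoleauth:%s' % token)
        return json.loads(data) if data else None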
GAP 4: Active/Active HA of cinder-volume
========================================

* Type: 'reliability/scalability'

* Description:

  + Desired state:

    - Cinder-volume can run in an active/active configuration.

  + Current state:

    - Only one cinder-volume instance can be active. Failover has to be
      handled by an external mechanism such as pacemaker/corosync.

  + Gap

    - Cinder-volume doesn't support an active/active configuration.

* Related BP

  + https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support

* Findings:

  + This blueprint is underway for Newton - as of July 6, 2016 great progress
    has been made, and we will continue to monitor it.
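Part of what active/active cinder-volume needs is mutual exclusion between
instances acting on the same resource. As a hedged illustration of one
building block (distributed locking through tooz, not the blueprint's actual
code), two cinder-volume instances could serialize operations on a volume as
follows; the backend URL and lock name are assumptions:

.. code-block:: python

    # Illustrative distributed lock around a per-volume operation; the
    # backend URL and lock name are placeholders.
    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://controller:2181', b'cinder-volume-a')
    coordinator.start()

    lock = coordinator.get_lock(b'volume-3f2a')
    # Only one cinder-volume instance at a time enters this block.
    with lock:
        pass  # perform the state-changing operation on the volume here

    coordinator.stop()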
GAP 5: Cinder volume multi-attachment
=====================================

* Type: 'reliability'
* Description:

  + Desired state

    - Cinder volumes can be attached to multiple VMs at the same time, so
      that active/standby stateful VNFs can share the same Cinder volume.

  + Current state

    - Cinder volumes can only be attached to one VM at a time.

  + Gap

    - Nova and cinder do not allow multiple simultaneous attachments.

* Related BP

  + https://blueprints.launchpad.net/openstack/?searchtext=multi-attach-volume

* Findings

  + Multi-attach volume is still WIP in OpenStack; coordination work with
    Nova is required.
  + At risk for Newton.
  + Recommend adding a Yardstick test case; a sketch of the intended usage
    follows this section.

General comment for the next release: remote volume replication is another
important project for storage HA. The HA team will monitor this
multi-blueprint activity, which will span multiple OpenStack releases. The
blueprints aren't approved yet, and there are dependencies on
generic-volume-group.
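To show what the desired end state would look like once multi-attach lands
(our sketch against the python clients; the ``multiattach`` flag, credentials
and server UUIDs are assumptions that depend on what the API finally ships):

.. code-block:: python

    # Hypothetical end-state usage once multi-attach is supported; the flag
    # and exact client calls depend on the released API version.
    from cinderclient import client as cinder_client
    from novaclient import client as nova_client

    cinder = cinder_client.Client('2', 'user', 'password', 'project',
                                  'http://controller:5000/v2.0')
    nova = nova_client.Client('2', 'user', 'password', 'project',
                              'http://controller:5000/v2.0')

    # Create a volume that allows multiple attachments.
    volume = cinder.volumes.create(size=10, multiattach=True)

    # Attach the same volume to both the active and the standby VNF VM.
    nova.volumes.create_server_volume('active-vm-uuid', volume.id)
    nova.volumes.create_server_volume('standby-vm-uuid', volume.id)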
GAP 6: HA tests improvements in fuel
====================================

* Type: 'robustness'
* Description:

  + Desired state
    - Increased test coverage for HA during install.
  + Current state
    - A few test cases are available.

* Related BP

  + https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements
  + Tie in with the test plans we have discussed previously.
  + Look at Yardstick tests that could be proposed back to OpenStack.
  + Discussions are planned with the Yardstick team to engage the OpenStack
    community and enhance Fuel or Tempest as appropriate.
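HA test cases of this kind typically follow a common pattern: inject a
failure, monitor recovery, and check the recovery time against an SLA. A
minimal standalone sketch of that pattern (ours, not Yardstick or Fuel code;
the commands and the 30-second SLA are placeholders):

.. code-block:: python

    # Minimal fault-injection test pattern; the commands and SLA value are
    # illustrative placeholders, not an actual test configuration.
    import subprocess
    import time


    def ha_recovery_check(kill_cmd, probe_cmd, sla_seconds=30):
        """Kill a service, then measure how long it takes to become healthy."""
        subprocess.check_call(kill_cmd, shell=True)
        start = time.time()
        while time.time() - start < sla_seconds:
            # probe_cmd returns 0 once the service answers again.
            if subprocess.call(probe_cmd, shell=True) == 0:
                return time.time() - start
            time.sleep(1)
        raise AssertionError('service did not recover within the SLA')


    recovery = ha_recovery_check(
        'ssh controller sudo systemctl kill openstack-nova-api',
        'nova service-list > /dev/null 2>&1')
    print('recovered in %.1f seconds' % recovery)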
Next Steps:
^^^^^^^^^^^

The six gaps above demonstrate that ongoing progress is being made across the
OPNFV and OpenStack communities. The OPNFV-HA team will propose blueprints for
the next OpenStack Summit to help continue the progress of high availability
in the community.

diff --git a/docs/scenarios/GAP_Analysis_Colorado.rst.bak b/docs/scenarios/GAP_Analysis_Colorado.rst.bak
deleted file mode 100644
index b6b7313..0000000
--- a/docs/scenarios/GAP_Analysis_Colorado.rst.bak
+++ /dev/null
@@ -1,278 +0,0 @@
(content identical to docs/scenarios/GAP_Analysis_Colorado.rst above)

diff --git a/docs/scenarios/index.rst b/docs/scenarios/index.rst
deleted file mode 100644
index dec46c7..0000000
--- a/docs/scenarios/index.rst
+++ /dev/null
@@ -1,13 +0,0 @@

.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0
.. (c)

*********************************
Gap Analysis of High Availability
*********************************

.. toctree::
   :numbered:
   :maxdepth: 4

   GAP_Analysis_Colorado.rst