author     Chaoyi Huang <joehuang@huawei.com>   2015-12-22 13:24:10 +0800
committer  Chaoyi Huang <joehuang@huawei.com>   2016-01-04 11:01:31 +0800
commit     8e3d1f151aa2ba629c9ed4ad61862ac147fff3ec (patch)
tree       cc042c0c9934f13c8a6d3f868421ec38550111d8 /docs
parent     1a148540057b9bcbc32865217319021ba09ae07b (diff)
Add doc and update output according to OpenStack spec's update (brahmaputra.1.0)
Add admin-user-guide and configuration-guide for multisite, and update the output requirement to sync with the change in the OpenStack spec, which was reviewed in the OpenStack community.
Change-Id: Icff3dda7e204404f8003d6e06cde45151eb03446
Signed-off-by: Chaoyi Huang <joehuang@huawei.com>
Diffstat (limited to 'docs')
-rw-r--r--  docs/configguide/index.rst                                   |  30
-rw-r--r--  docs/configguide/multisite-configuration-guide.rst           | 120
-rw-r--r--  docs/how-to-use-docs/documentation-example.rst               |   9
-rw-r--r--  docs/requirements/VNF_high_availability_across_VIM.rst       | 160
-rw-r--r--  docs/requirements/multisite-identity-service-management.rst  | 376
-rw-r--r--  docs/requirements/multisite-vnf-gr-requirement.rst           | 241
-rw-r--r--  docs/userguide/index.rst                                     |  30
-rw-r--r--  docs/userguide/multisite-admin-user-guide.rst                | 405
8 files changed, 1367 insertions, 4 deletions
diff --git a/docs/configguide/index.rst b/docs/configguide/index.rst
new file mode 100644
index 0000000..49b30f1
--- /dev/null
+++ b/docs/configguide/index.rst
@@ -0,0 +1,30 @@
+.. OPNFV Release Engineering documentation, created by
+ sphinx-quickstart on Tue Jun 9 19:12:31 2015.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+
+table of contents
+=======================================
+
+Contents:
+
+.. toctree::
+ :numbered:
+ :maxdepth: 4
+
+ multisite-configuration-guide.rst
+
+Indices and tables
+==================
+
+* :ref:`search`
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/configguide/multisite-configuration-guide.rst b/docs/configguide/multisite-configuration-guide.rst
new file mode 100644
index 0000000..4a20e83
--- /dev/null
+++ b/docs/configguide/multisite-configuration-guide.rst
@@ -0,0 +1,120 @@
+.. two dots create a comment. please leave this logo at the top of each of your rst files.
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+.. these two pipes are to separate the logo from the first title
+|
+|
+=============================
+Multisite configuration guide
+=============================
+
+Multisite identity service management
+=====================================
+
+Goal
+----
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Before you read
+---------------
+
+This chapter does not intend to cover all the configuration needed for
+KeyStone and other OpenStack services to work together.
+
+This chapter focuses only on the configuration that should be taken into
+account in a multi-site scenario.
+
+Please read the configuration documentation related to identity management
+of OpenStack for all configuration items.
+
+http://docs.openstack.org/liberty/config-reference/content/ch_configuring-openstack-identity.html
+
+How to configure the database cluster for synchronous or asynchronous
+replication in a multi-site scenario is out of scope of this document. The
+only reminder is that for the synchronization or replication, only the
+Keystone database is required. If you are using MySQL, you can configure it
+like this:
+
+In the master:
+
+ .. code-block:: bash
+
+ binlog-do-db=keystone
+
+In the slave:
+
+ .. code-block:: bash
+
+ replicate-do-db=keystone
+
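+As an illustrative sketch only (server IDs and log file names are assumptions
+that must be adapted to your deployment), the corresponding my.cnf fragments
+could look like this:
+
+ .. code-block:: bash
+
+    # master my.cnf (assumed values)
+    [mysqld]
+    server-id=1
+    log-bin=mysql-bin
+    binlog-do-db=keystone
+
+    # slave my.cnf (assumed values)
+    [mysqld]
+    server-id=2
+    replicate-do-db=keystone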
+
+Deployment options
+------------------
+
+For a detailed description of each deployment option, please refer to the
+admin-user-guide.
+
+- Distributed KeyStone service with PKI token
+
+  In the KeyStone configuration file, the PKI token format should be
+  configured:
+
+ .. code-block:: bash
+
+ provider = pki
+
+ or
+
+ .. code-block:: bash
+
+ provider = pkiz
+
+  In the [keystone_authtoken] section of each OpenStack service configuration
+  file in each site, configure identity_uri and auth_uri to the address
+  of the KeyStone service:
+
+ .. code-block:: bash
+
+ identity_uri = https://keystone.your.com:35357/
+ auth_uri = http://keystone.your.com:5000/v2.0
+
+  It's better to use a domain name for the KeyStone service rather than an IP
+  address, especially if you deployed the KeyStone service in at least two
+  sites for site level high availability.
+
+- Distributed KeyStone service with Fernet token
+- Distributed KeyStone service with Fernet token + Async replication (
+ star-mode).
+
+  In these two deployment options, the token validation is planned to be done
+  in the local site.
+
+  In the KeyStone configuration file, the Fernet token format should be
+  configured:
+
+ .. code-block:: bash
+
+ provider = fernet
+
+  In the [keystone_authtoken] section of each OpenStack service configuration
+  file in each site, configure identity_uri and auth_uri to the address
+  of the local KeyStone service:
+
+ .. code-block:: bash
+
+ identity_uri = https://local-keystone.your.com:35357/
+ auth_uri = http://local-keystone.your.com:5000/v2.0
+
+  In particular, configure region_name to your local region name. For
+  example, if you are configuring services in RegionOne and there is a local
+  KeyStone service in RegionOne, then:
+
+ .. code-block:: bash
+
+ region_name = RegionOne
+
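+  As an illustrative sketch only, the [keystone_authtoken] section of a
+  service configuration file (for example nova.conf) in RegionOne could then
+  combine the settings above; the host names are assumptions and must match
+  your deployment:
+
+  .. code-block:: bash
+
+     [keystone_authtoken]
+     # address of the local KeyStone service (assumed host name)
+     identity_uri = https://local-keystone.your.com:35357/
+     auth_uri = http://local-keystone.your.com:5000/v2.0
+     # validate tokens against the KeyStone service in the local region
+     region_name = RegionOne
+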
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/how-to-use-docs/documentation-example.rst b/docs/how-to-use-docs/documentation-example.rst
index afcf758..24b4c8a 100644
--- a/docs/how-to-use-docs/documentation-example.rst
+++ b/docs/how-to-use-docs/documentation-example.rst
@@ -1,5 +1,5 @@
.. two dots create a comment. please leave this logo at the top of each of your rst files.
-.. image:: ../etc/opnfv-logo.png
+.. image:: ../etc/opnfv-logo.png
:height: 40
:width: 200
:alt: OPNFV
@@ -21,8 +21,9 @@ this is the directory structure of the docs/ directory that can be found in the
./how-to-use-docs/documentation-example.rst
./how-to-use-docs/index.rst
-To create your own documentation, Create any number of directories (depending on your need) and place in each of them an index.rst.
-This index file must refence your other rst files.
+To create your own documentation, create any number of directories (depending
+on your need) and place in each of them an index.rst. This index file must
+reference your other rst files.
* Here is an example index.rst
@@ -59,7 +60,7 @@ For verify jobs a link to the documentation will show up as a comment in gerrit
* Merge jobs
-Once you are happy with the look of your documentation you can submit the patchset the merge job will
+Once you are happy with the look of your documentation you can submit the patchset; the merge job will
copy the output of each documentation directory to http://artifacts.opnfv.org/$project/docs/$name_of_your_folder/index.html
Here are some quick examples of how to use rst markup
diff --git a/docs/requirements/VNF_high_availability_across_VIM.rst b/docs/requirements/VNF_high_availability_across_VIM.rst
new file mode 100644
index 0000000..1a7d41b
--- /dev/null
+++ b/docs/requirements/VNF_high_availability_across_VIM.rst
@@ -0,0 +1,160 @@
+This work is licensed under a Creative Commons Attribution 3.0 Unported License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=======================================
+VNF high availability across VIM
+=======================================
+
+Problem description
+===================
+
+Abstract
+------------
+
+A VNF (telecom application) should be able to realize high availability
+deployment across OpenStack instances.
+
+Description
+------------
+A VNF (telecom application running over the cloud) may already be designed as
+Active-Standby/Active-Active/N-Way to achieve high availability.
+
+With a telecoms focus, this generally refers both to availability of service
+(i.e. the ability to make new calls) and to maintenance of ongoing control
+plane state and active media processing (i.e. “keeping up” existing calls).
+
+Traditionally, telecoms systems are designed to maintain state and calls across
+pretty much the full range of single-point failures. This includes failures of
+power supply, hard drive, physical server or network switch, but also covers
+software failures and maintenance operations such as software upgrades.
+
+Providing this support typically requires state replication between
+application instances (directly, via replicated database services, or via a
+privately designed message format). It may also require special-case handling
+of media endpoints, to allow transfer of media on short time scales (<1s)
+without requiring end-to-end resignalling (e.g. RTP redirection via IP / MAC
+address transfers, c.f. VRRP).
+
+With a migration to NFV, a commonly expressed desire by carriers is to provide
+the same resilience to any single point(s) of failure in the cloud
+infrastructure.
+
+This could be done by making each cloud instance fully HA (a non-trivial task
+to do right and to prove it has been done right), but the preferred approach
+appears to be to accept the currently limited availability of a given cloud
+instance (no desire to radically rework this for telecoms), and instead to
+provide solution availability by spreading function across multiple cloud
+instances (i.e. the same approach used today to deal with hardware and
+software failures).
+
+A further advantage of this approach is that it provides a good basis for
+seamless upgrade of infrastructure software revisions: you can spin up an
+additional up-level cloud, gradually transfer over resources / app instances
+from one of your other clouds, and finally turn down the old cloud instance
+when it is no longer required.
+
+If fast media / control failover is still required (which many/most carriers
+still seem to believe it is), there are some interesting/hard requirements on
+the networking between cloud instances. To help with this, many people appear
+willing to provide multiple “independent” cloud instances in a single
+geographic site, with special networking between clouds in that physical site.
+"Independent" is in quotes because some coordination between cloud instances is
+obviously required, but this has to be implemented in a fashion which reduces
+the potential for correlated failure to very low levels (at least as low as the
+required overall application availability).
+
+Analysis of requirements to OpenStack
+=====================================
+The VNF often has different networking planes for different purposes:
+
+- external network plane: used for communication with other VNFs
+- components inter-communication plane: one VNF often consists of several
+  components; this plane is designed for the components to communicate with
+  each other
+- backup plane: this plane is used for the heartbeat or state replication
+  between the component's active/standby or active/active or N-way cluster
+- management plane: this plane is mainly for management purposes
+
+Generally these planes are separated from each other. And for a legacy telecom
+application, each internal plane will have its own fixed or flexible IP
+addressing plan.
+
+To make the VNF work in HA mode across different OpenStack instances in
+one site (but not limited to one site), at least the backup plane needs to be
+supported across different OpenStack instances:
+
+1) Overlay L2 networking or shared L2 provider networks as the backup plane for
+   heartbeat or state replication. Overlay L2 networking is preferred, because:
+
+   a. it supports legacy compatibility: some telecom apps have a built-in
+      internal L2 network, and to ease moving these apps to VNFs it is better
+      to provide an L2 network;
+   b. it supports IP overlapping: multiple VNFs may have overlapping IP
+      addresses for cross OpenStack instance networking.
+
+   Therefore, overlay L2 networking across Neutron is a required feature in
+   OpenStack.
+
+2) L3 networking across OpenStack instances for heartbeat or state
+   replication. For L3 networking, we can leverage the floating IPs provided
+   by current Neutron, so there is no new feature requirement to OpenStack.
+
+3) The IP address used by the VNF to connect with other VNFs should be able to
+   float across OpenStack instances. For example, if the master fails, the IP
+   address should be taken over by the standby, which is running in another
+   OpenStack instance. Methods like VRRP/GARP can help move the external IP,
+   so no new feature will be added to OpenStack. A minimal floating IP sketch
+   is shown after this list.
+
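+As a minimal sketch only (network and port names below are placeholders,
+assuming an external network named "ext-net" exists in each OpenStack
+instance), within a single OpenStack instance the external IP can be
+re-pointed to the active VNF member using standard Neutron floating IP
+operations; coordination across OpenStack instances remains up to the
+application (e.g. via VRRP/GARP or DNS):
+
+ .. code-block:: bash
+
+    # allocate a floating IP from the external network (assumed name: ext-net)
+    neutron floatingip-create ext-net
+    # associate it with the port of the currently active VNF member
+    neutron floatingip-associate FLOATING_IP_ID ACTIVE_PORT_ID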
+
+Prototype
+-----------
+ None.
+
+Proposed solution
+-----------------
+
+ From a requirements perspective, it is up to the application to decide
+whether to use L2 or L3 networking across Neutron.
+
+ For Neutron, an L2 network consists of lots of ports. To make cross-Neutron
+L2 networking workable, we need some fake remote ports in the local Neutron to
+represent VMs in the remote site (remote OpenStack).
+
+ The fake remote port will reside on some VTEP (for VxLAN); the tunneling
+IP address of the VTEP should be an attribute of the fake remote port, so that
+the local port can forward packets to the correct tunneling endpoint.
+
+ The idea is to add one more ML2 mechanism driver to capture the fake remote
+port CRUD (creation, retrieval, update, deletion).
+
+ When a fake remote port is added/updated/deleted, the ML2 mechanism
+driver for these fake ports will trigger L2 population, so that the VTEP
+tunneling endpoint information can be learned by other local ports.
+
+ It is also required to be able to query a port's VTEP tunneling endpoint
+information through the Neutron API, in order to use this information to
+create the fake remote port in another Neutron.
+
+ In the past, the port's VTEP IP address was the host IP where the VM resides,
+but the BP https://review.openstack.org/#/c/215409/ will free the port from
+binding to the host IP as the tunneling endpoint; you can even specify an L2GW
+IP address as the tunneling endpoint.
+
+ Therefore a new BP will be registered to process the fake remote port, in
+order to make cross-Neutron L2 networking feasible. An RFE has been registered
+first: https://bugs.launchpad.net/neutron/+bug/1484005
+
+
+Gaps
+====
+ 1) fake remote port for cross Neutron L2 networking
+
+
+**NAME-THE-MODULE issues:**
+
+* Neutron
+
+Affected By
+-----------
+ OPNFV multisite cloud.
+
+References
+==========
+
diff --git a/docs/requirements/multisite-identity-service-management.rst b/docs/requirements/multisite-identity-service-management.rst
new file mode 100644
index 0000000..b411c28
--- /dev/null
+++ b/docs/requirements/multisite-identity-service-management.rst
@@ -0,0 +1,376 @@
+This work is licensed under a Creative Commons Attribution 3.0 Unported
+License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=======================================
+ Multisite identity service management
+=======================================
+
+Glossary
+========
+
+There are 3 types of tokens supported by OpenStack KeyStone:
+ **UUID**
+
+ **PKI/PKIZ**
+
+ **FERNET**
+
+Please refer to the References section for these token formats, their
+benchmarks and comparison.
+
+
+Problem description
+===================
+
+Abstract
+------------
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Description
+------------
+
+- User/Group Management: e.g. use of LDAP; should OPNFV be agnostic to this?
+  Reusing an LDAP infrastructure that is mature and has features lacking in
+Keystone (e.g. password aging and policies) is attractive. KeyStone can use an
+external system to do the user authentication, and user/group management could
+be the job of that external system, so that KeyStone can reuse/co-work with
+enterprise identity management. KeyStone's main role in OpenStack is to provide
+service (Nova, Cinder...) aware tokens and to do the authorization. You can
+refer to this post: https://blog-nkinder.rhcloud.com/?p=130. Therefore, LDAP
+itself is a topic out of our scope.
+
+- Role assignment: In the case of federation (and perhaps other solutions) it
+  is not feasible/scalable to do role assignment to individual users; role
+  assignment to groups is better. Role assignment will usually be done based
+  on groups, and KeyStone supports this.
+
+- Amount of inter region traffic: should be kept as little as possible;
+  consider CERN's Ceilometer issue as described in
+http://openstack-in-production.blogspot.se/2014/03/cern-cloud-architecture-update-for.html
+
+Requirement analysis
+===========================
+
+- A user is provided with a single authentication URL to the Identity
+ (Keystone) service. Using that URL, the user authenticates with Keystone by
+requesting a token, typically using username/password credentials. The keystone
+server validates the credentials, possibly with an external LDAP/AD server and
+returns a token to the user. With token type UUID/Fernet, the user requests the
+service catalog. With PKI tokens the service catalog is included in the token.
+The user sends a request to a service in a selected region including the token.
+Now the service in the region, say Nova, needs to validate the token. Nova uses
+its configured keystone endpoint and service credentials to request token
+validation from Keystone. The Keystone token validation should preferably be
+done in the same region as Nova itself. Now Keystone has to validate the token
+that also (always?) includes a project ID in order to make sure the user is
+authorized to use Nova. The project ID is stored in the assignment backend -
+tables in the Keystone SQL database. For this project ID validation the
+assignment backend database needs to have the same content as the keystone
+server that issued the token.
+
+- So either 1) services in all regions are configured with a central keystone
+  endpoint through which all token validations will happen, or 2) the Keystone
+assignment backend database is replicated and thus available to Keystone
+instances locally in each region.
+
+ Alt 2) is obviously the only scalable solution that produces no inter region
+traffic for normal service usage. Only when data in the assignment backend is
+changed, replication traffic will be sent between regions. Assignment data
+includes domains, projects, roles and role assignments.
+
+Keystone deployment:
+
+ - Centralized: a single Keystone service installed in some location, either
+ in a "master" region or totally external as a service to OpenStack
+ regions.
+ - Distributed: a Keystone service is deployed in each region
+
+Token types:
+
+    - UUID: tokens are persistently stored and create a lot of database
+      traffic; the persistence of tokens is for revocation purposes. UUID
+tokens are validated online by Keystone; each API call to a service will ask
+for token validation from KeyStone. Keystone can become a bottleneck in a large
+system due to this. The UUID token type is not suitable for use in multi region
+clouds at all, no matter the solution used for the Keystone database
+replication (or not). UUID tokens have a fixed size.
+
+    - PKI: tokens are non persistent cryptographic based tokens and validated
+      offline (not by the Keystone service) by Keystone middleware,
+which is part of other services such as Nova. Since PKI tokens include the
+endpoints of all services in all regions, the token size can become big. There
+are several ways to reduce the token size: a no-catalog policy, an endpoint
+filter to bind a project to limited endpoints, and the compressed PKI token -
+PKIZ; but the size of the token is still unpredictable, making it difficult to
+manage. If the no-catalog policy is applied, the user can access all regions,
+which in some scenarios is not allowed.
+
+ - Fernet: tokens are non persistent cryptographic based tokens and online
+      validated by the Keystone service. Fernet tokens are more lightweight
+than PKI tokens and have a fixed size.
+
+    PKI tokens (offline validated) are needed with a centralized Keystone to
+avoid inter region traffic. PKI tokens do, however, produce Keystone traffic
+for revocation lists.
+
+    Fernet tokens require Keystone deployed in a distributed manner, again to
+avoid inter region traffic.
+
+    Cryptographic tokens bring new (compared to UUID tokens) issues/use-cases
+such as key rotation and certificate revocation. Key management is out of
+scope of this use case.
+
+Database deployment:
+
+ Database replication:
+    - Master/slave asynchronous: supported by the database server itself
+(mysql/mariadb etc), works over WAN, and is more scalable.
+    - Multi-master synchronous: Galera (and others like Percona), not so
+scalable for multi-master writing, and needs more parameter tuning for WAN
+latency.
+    - Symmetrical/asymmetrical: data replicated to all regions or a subset;
+in the latter case some regions need to access Keystone in another region.
+
+ Database server sharing:
+ In an OpenStack controller normally many databases from different
+services are provided from the same database server instance. For HA reasons,
+the database server is usually synchronously replicated to a few other nodes
+(controllers) to form a cluster. Note that _all_ databases are replicated in
+this case, for example when Galera sync repl is used.
+
+ Only the Keystone database can be replicated to other sites. Replicating
+databases for other services will cause those services to get out of sync and
+malfunction.
+
+    Since only the Keystone database is to be sync replicated to another
+region/site, it's better to deploy the Keystone database into its own
+database server, with the extra networking requirements and the cluster or
+replication configuration this implies. How to support this by the installer
+is out of scope.
+
+    The database server can be shared when async master/slave replication is
+used, if global transaction identifiers (GTID) are enabled.
+
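+    As an illustrative sketch only (assuming MySQL 5.6+; option names and
+values may differ for other database servers), GTID based replication is
+enabled in my.cnf like this:
+
+ .. code-block:: bash
+
+    [mysqld]
+    # enable global transaction identifiers
+    gtid_mode = ON
+    enforce_gtid_consistency = ON
+    # needed so a replica can also feed further replicas
+    log_slave_updates = ON
+    log_bin = mysql-bin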
+
+Candidate solution analysis
+------------------------------------
+
+- KeyStone service (Distributed) with Fernet token
+
+  The Fernet token is a very new format, just introduced recently. The biggest
+gains of this token format are: 1) it is lightweight, the size is small enough
+to be carried in the API request, unlike the PKI token (as sites increase, the
+endpoint list grows and the token size becomes too long to carry in the API
+request); 2) no token persistence, which also means the DB is not changed much
+and holds only lightweight data (just project, user, domain, endpoint, etc).
+The drawback of the Fernet token is that the token has to be validated by
+KeyStone for each API request.
+
+  This means that the DB of KeyStone can work as a cluster in multisite (for
+example, using a MySQL Galera cluster). That is, install a KeyStone API server
+in each site, but share the same backend DB cluster. Because the DB cluster
+synchronizes data in real time across the sites, all KeyStone servers can see
+the same data.
+
+  Because each site has KeyStone installed and all data is kept the same,
+all token validation can be done locally in the same site.
+
+  The challenge for this solution is how many sites the DB cluster can
+support. The question was asked of the MySQL Galera developers, and their
+answer is that there is no number/distance/network latency limitation in the
+code. But in practice, they have seen a case using a MySQL cluster across 5
+data centers, each data center with 3 nodes.
+
+  This solution will be very good for a limited number of sites which the DB
+cluster can cover very well.
+
+- KeyStone service(Distributed) with Fernet token + Async replication (
+ multi-cluster mode).
+
+  We may have several KeyStone clusters with Fernet tokens, for example,
+cluster1 (site1, site2, ... site10), cluster2 (site11, site12, ..., site20).
+Then do the DB replication among the different clusters asynchronously.
+
+  A prototype of this has been done. In some blogs they call it
+"hybrid replication". Architecturally you have a master region where you do
+keystone writes. The other regions are read-only.
+http://severalnines.com/blog/deploy-asynchronous-slave-galera-mysql-easy-way
+http://severalnines.com/blog/replicate-mysql-server-galera-cluster
+
+  Only one DB cluster (the master DB cluster) is allowed to write (but still
+multisite, not all sites); the other clusters wait for replication. Inside the
+master cluster, "write" is allowed in multiple regions thanks to the
+distributed lock in the DB. But please note the challenge of key distribution
+and rotation for Fernet tokens; you can refer to these two blogs:
+http://lbragstad.com/?p=133, http://lbragstad.com/?p=156
+
+- KeyStone service(Distributed) with Fernet token + Async replication (
+ star-mode).
+
+  One master KeyStone cluster with Fernet tokens spans two sites (for site
+level high availability purposes); other sites will be installed with at least
+2 slave nodes, where each node is configured with DB async replication from a
+master cluster member: one slave's master node is in site1, another slave's
+master node is in site2.
+
+  Only the master cluster nodes are allowed to write; the other slave nodes
+wait for replication from the master cluster members (with very little delay).
+But the challenge of key distribution and rotation for Fernet tokens has to be
+settled; you can refer to these two blogs: http://lbragstad.com/?p=133,
+http://lbragstad.com/?p=156
+
+ Pros.
+    Why a cluster in the master sites? There are several master nodes in the
+cluster, so more slaves can be fed with async. replication in parallel. Why two
+sites for the master cluster? To provide higher reliability (site level) for
+write requests.
+    Why multiple slaves in other sites? A slave has no knowledge of other
+slaves, so it is easier to manage multiple slaves in one site than a cluster,
+and the multiple slaves work independently but provide multi-instance
+redundancy (like a cluster, but independent).
+
+  Cons. The distribution/rotation of the Fernet keys has to be managed (a
+minimal key management sketch follows).
+
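+  As an illustrative sketch only (paths and host names are assumptions), the
+Fernet key repository is created and rotated on the master site and then has
+to be distributed to every other KeyStone node, for example:
+
+  .. code-block:: bash
+
+     # initialize the Fernet key repository on the master KeyStone node
+     keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
+     # rotate keys periodically on the master node only
+     keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone
+     # distribute the key repository to the other KeyStone nodes (assumed host)
+     rsync -a /etc/keystone/fernet-keys/ keystone-slave.site3.your.com:/etc/keystone/fernet-keys/
+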
+- KeyStone service(Distributed) with PKI token
+
+  The PKI token has one great advantage: the token validation can be
+done locally, without sending a token validation request to the KeyStone
+server. The drawbacks of the PKI token are: 1) the endpoint list size in the
+token; if a project is only spread over a very limited number of sites
+(regions), then we can use the endpoint filter to reduce the token size, making
+it workable even with a lot of sites in the cloud; 2) KeyStone middleware (the
+old KeyStone client, which is co-located in Nova/xxx-API) will have to send
+requests to the KeyStone server frequently for the revocation list, in order to
+reject malicious API requests, for example, a user that has been deactivated
+but uses an old token to access an OpenStack service.
+
+  For this solution, besides the above issues, we also need to provide a
+KeyStone Active-Active mode across sites to reduce the impact of a site
+failure. And since the revocation list is requested very frequently, the
+performance of the KeyStone server also needs to be taken care of.
+
+  Site level keystone load balancing is required to provide site level
+redundancy. Otherwise the KeyStone middleware will not switch requests to a
+healthy KeyStone server in time.
+
+  This solution can be used in some scenarios, especially when a project is
+only spread over a limited number of sites (regions).
+
+  Also, cert distribution/revocation to each site / API server for token
+validation is required.
+
+- KeyStone service(Distributed) with UUID token
+
+  Because each token validation will be sent to the KeyStone server, and the
+token persistence also makes the DB size larger than with Fernet tokens, it is
+not as good as the Fernet token for providing a distributed KeyStone service.
+UUID is a solution better suited for small scale and inside one site.
+
+  Cons: UUID tokens are persistently stored and so will cause a lot of inter
+region replication traffic; tokens are persisted for authorization and
+revocation purposes, and the frequently changing database leads to a lot of
+inter region replication traffic.
+
+- KeyStone service(Distributed) with Fernet token + KeyStone federation
+
+  You have to accept the drawbacks of KeyStone federation if you have a lot of
+sites/regions. Please refer to the KeyStone federation section.
+
+- KeyStone federation
+  In this solution, we can install a KeyStone service in each site, each with
+its own database. Because we have to make the KeyStone IdP and SP know each
+other, the configuration needs to be done accordingly: set up the
+role/domain/group mapping and create the corresponding region in each pair. As
+sites increase, if each user is able to access all sites, then a full-meshed
+mapping/configuration has to be done. Whenever you add one more site, you have
+to do configuration/mapping for n*(n-1) site pairs. The complexity grows
+quickly as the number of sites increases.
+
+  KeyStone Federation is mainly for different cloud admins to borrow/rent
+resources, for example, company A and company B, or a private cloud and a
+public cloud, both of them using OpenStack based clouds. Therefore a lot of
+mapping and configuration has to be done to make it work.
+
+- KeyStone service (Centralized) with Fernet token
+
+  cons: inter region traffic for token validation; token validation requests
+from all other sites have to be sent to the centralized site. Too frequent
+inter region traffic.
+
+- KeyStone service(Centralized) with PKI token
+
+  cons: inter region traffic for token revocation list management; the token
+revocation list requests from all other sites have to be sent to the
+centralized site. Too frequent inter region traffic.
+
+- KeyStone service(Centralized) with UUID token
+
+  cons: inter region traffic for token validation; the token validation
+requests from all other sites have to be sent to the centralized site. Too
+frequent inter region traffic.
+
+Prototype
+-----------
+ A prototype of the candidate solution "KeyStone service(Distributed) with
+Fernet token + Async replication ( multi-cluster mode)" has been executed by
+Hans Feldt and Chaoyi Huang; please refer to https://github.com/hafe/dockers/ .
+One issue was found: "Can't specify identity endpoint for token validation
+among several keystone servers in keystonemiddleware"; please refer to the Gaps
+section.
+
+Gaps
+====
+ Can't specify identity endpoint for token validation among several keystone
+servers in keystonemiddleware.
+
+
+**NAME-THE-MODULE issues:**
+
+* keystonemiddleware
+
+  * Can't specify identity endpoint for token validation among several
+    keystone servers in keystonemiddleware:
+    https://bugs.launchpad.net/keystone/+bug/1488347
+
+Affected By
+-----------
+ OPNFV multisite cloud.
+
+Conclusion
+-----------
+
+ As the prototype demonstrates, cluster level async. replication and Fernet
+token validation in the local site are feasible. The candidate solution
+"KeyStone service(Distributed) with Fernet token + Async replication (
+star-mode)" is a simplified version of the prototyped one; it is much easier
+in deployment and maintenance, with better scalability.
+
+ Therefore the candidate solution "KeyStone service(Distributed) with Fernet
+token + Async replication ( star-mode)" is recommended for the multisite OPNFV
+cloud.
+
+References
+==========
+
+ There are 3 token formats (UUID, PKI/PKIZ, Fernet) provided by KeyStone;
+these blogs give a very good description, benchmark and comparison:
+ http://dolphm.com/the-anatomy-of-openstack-keystone-token-formats/
+ http://dolphm.com/benchmarking-openstack-keystone-token-formats/
+
+ To understand the benefits and shortcomings of the PKI/PKIZ token, please refer to:
+ https://www.mirantis.com/blog/understanding-openstack-authentication-keystone-pk
+
+ To understand KeyStone federation and how to use it:
+ http://blog.rodrigods.com/playing-with-keystone-to-keystone-federation/
+
+ To integrate KeyStone with external enterprise ready authentication system
+ https://blog-nkinder.rhcloud.com/?p=130.
+
+ Key replication used for the KeyStone Fernet token
+ http://lbragstad.com/?p=133,
+ http://lbragstad.com/?p=156
+
+ KeyStone revoke
+ http://specs.openstack.org/openstack/keystone-specs/api/v3/identity-api-v3-os-revoke-ext.html
diff --git a/docs/requirements/multisite-vnf-gr-requirement.rst b/docs/requirements/multisite-vnf-gr-requirement.rst
new file mode 100644
index 0000000..7e67cd0
--- /dev/null
+++ b/docs/requirements/multisite-vnf-gr-requirement.rst
@@ -0,0 +1,241 @@
+This work is licensed under a Creative Commons Attribution 3.0 Unported License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=========================================
+ Multisite VNF Geo site disaster recovery
+=========================================
+
+Glossary
+========
+
+
+There are several concepts that need to be understood first:
+ **Volume Snapshot**
+
+ **Volume Backup**
+
+ **Volume Replication**
+
+ **VM Snapshot**
+
+Please refer to the References section for these concepts and their comparison.
+
+
+Problem description
+===================
+
+Abstract
+------------
+
+A VNF (telecom application) should be able to be restored in another site
+when catastrophic failures happen.
+
+Description
+------------
+GR deals with more catastrophic failures (flood, earthquake, propagating
+software fault), where loss of calls, or even temporary loss of service,
+is acceptable. It also seems more common to accept/expect manual /
+administrator intervention to drive the process, not least because you don’t
+want to trigger the transfer by mistake.
+
+In terms of coordination/replication or backup/restore between geographic
+sites, discussion often (but not always) seems to focus on limited application
+level data/config replication, as opposed to replication or backup/restore
+of the cloud infrastructure between different sites.
+
+And finally, the lack of a requirement to do fast media transfer (without
+resignalling) generally removes the need for special networking behavior, with
+slower DNS-style redirection being acceptable.
+
+This use case is more concerned with the cloud infrastructure level capability
+to support VNF geo site redundancy.
+
+Requirement and candidate solutions analysis
+============================================
+
+For VNF to be restored from the backup site for catastrophic failures,
+the VNF's bootable volume and data volumes must be restorable.
+
+There are three ways of making boot and data volumes restorable. Choosing the
+right one largely depends on the underlying characteristics and requirements
+of a VNF.
+
+1. Nova Quiesce + Cinder Consistency volume snapshot + Cinder backup
+   1).GR (Geo site disaster recovery) software gets the volumes for each VM
+   in the VNF from Nova
+   2).GR software calls the Nova quiesce API to guarantee quiescing the VMs in
+   the desired order
+   3).GR software takes snapshots of these volumes in Cinder (NOTE: because
+   storage often provides fast snapshots, the duration between quiesce and
+   unquiesce is a short interval)
+   4).GR software calls the Nova unquiesce API to unquiesce the VMs of the VNF
+   in reverse order
+   5).GR software creates volumes from the snapshots just taken in Cinder
+   6).GR software creates backups (incremental) of these volumes to remote
+   backup storage ( swift or ceph, or .. ) in Cinder; a Cinder CLI sketch of
+   steps 3), 5) and 6) follows this list
+   7).if this site fails,
+   7.1)GR software restores these backup volumes in the remote Cinder in the
+   backup site.
+   7.2)GR software boots VMs from the bootable volumes from the remote Cinder
+   in the backup site and attaches the corresponding data volumes.
+
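+As a hedged sketch of steps 3), 5) and 6) above (IDs are placeholders and a
+backup driver is assumed to be configured in Cinder), the GR software could
+drive the Cinder CLI as follows:
+
+ .. code-block:: bash
+
+    # 3) snapshot an attached (quiesced) volume; --force is needed for in-use volumes
+    cinder snapshot-create --force True VOLUME_ID
+    # 5) create a new volume from the snapshot
+    cinder create --snapshot-id SNAPSHOT_ID SIZE_IN_GB
+    # 6) back up the volume to the configured backup store; add --incremental
+    #    once an initial full backup exists
+    cinder backup-create VOLUME_FROM_SNAPSHOT_ID
+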
+Pros: With the quiesce / unquiesce API from Nova, making a transactional
+snapshot of a group of VMs is possible, for example, quiesce VM1, quiesce VM2,
+quiesce VM3, snapshot VM1's volumes, snapshot VM2's volumes, snapshot
+VM3's volumes, unquiesce VM3, unquiesce VM2, unquiesce VM1. For some
+telecom applications, the order is very important for a group of VMs
+with a strong relationship.
+
+Cons: Nova needs to expose the quiesce / unquiesce functionality; fortunately
+it's already there in Nova-compute, and only an API layer needs to be added to
+expose it.
+NOTE: It's up to the DR policy and the VNF characteristics. Some VNFs may
+afford a short unavailability for DR purposes, and some others may use the
+standby of the VNF or a member of the cluster to do disaster recovery
+replication so as not to interfere with the service provided by the VNF. Those
+VNFs which can't be quiesced/unquiesced should use option 3 (VNF aware) to do
+the backup/replication.
+
+Requirement to OpenStack: Nova needs to expose quiesce / unquiesce api,
+which is lacking in Nova now.
+
+Example characteristics and requirements of a VNF:
+ - VNF requires full data consistency during backup/restore process -
+ entire data should be replicated.
+ - VNF's data changes infrequently, which results in a smaller number of volume
+ snapshots during a given time interval (hour, day, etc.);
+ - VNF is not highly dynamic, e.g. the number of scaling (in/out) operations
+ is small.
+ - VNF is not geo-redundant, is not aware of available cloud replication
+   mechanisms, and has no built-in logic for replication: it doesn't pre-select the
+ minimum replication data required for restarting the VNF in a different
+ site.
+ (NOTE: The VNF who can perform such data cherry picking should consider
+ case 3)
+
+2. Nova Snapshot + Glance Image + Cinder Snapshot + Cinder Backup
+   - GR software creates a VM snapshot in Nova
+   - Nova quiesces the VM internally
+     (NOTE: The upper level application or GR software should take care of
+     avoiding infra level outage induced VNF outage)
+   - Nova creates an image in Glance
+   - Nova creates a snapshot of the VM, including volumes
+   - If the VM is a volume backed VM, then a volume snapshot is created in
+     Cinder
+   - No image is uploaded to glance, but the snapshot is added in the metadata
+     of the image in Glance
+   - GR software gets the snapshot information from Glance
+   - GR software creates volumes from these snapshots
+   - GR software creates backups (incremental) of these volumes to backup
+     storage ( swift or ceph, or .. ) in Cinder; if this site fails,
+   - GR software restores these backup volumes to Cinder in the backup site.
+   - GR software boots VMs from the bootable volumes from Cinder in the backup
+     site and attaches the data volumes.
+
+Pros: 1) Automatic quiesce/unquiesce, and snapshot of the volumes of one VM.
+
+Cons: 1) Impossible to form a transactional backup of a group of VMs, for
+   example, quiesce VM1, quiesce VM2, quiesce VM3, snapshot VM1, snapshot VM2,
+   snapshot VM3, unquiesce VM3, unquiesce VM2, unquiesce VM1. This is
+   quite important for telecom applications in some scenarios.
+   2) Does not leverage the Cinder consistency group.
+   3) One more service, Glance, is involved in the backup: not only the
+      increased number of snapshots in Cinder, but also the corresponding
+      temporary images in Glance need to be managed.
+
+Requirement to OpenStack: None.
+
+Example: It's suitable for single VM backup/restore, for example, for a small
+scale configuration database virtual machine which is running in active/standby
+mode. It is a rare use case that an application needs only one VM to be
+snapshotted for backup (see the Nova CLI sketch below).
+
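+As a hedged sketch of this option (the server and image names are
+placeholders), the snapshot of a single VM is triggered through Nova:
+
+ .. code-block:: bash
+
+    # create a snapshot image of the VM; for a volume backed VM this results in
+    # Cinder volume snapshots referenced from the Glance image metadata
+    nova image-create --poll VM_NAME_OR_ID vm-snapshot-for-dr
+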
+3. Selective Replication of Persistent Data
+ - GR software creates datastore (Block/Cinder, Object/Swift, App Custom
+ storage) with replication enabled at the relevant scope, for use to
+     selectively backup/replicate desired data to the GR backup site
+ - Cinder : Various work underway to provide async replication of cinder
+ volumes for disaster recovery use, including this presentation from
+ Vancouver http://www.slideshare.net/SeanCohen/dude-wheres-my-volume-open-stack-summit-vancouver-2015
+ - Swift : Range of options of using native Swift replicas (at expense of
+ tighter coupling) to replication using backend plugins or volume
+ replication
+ - Custom : A wide range of OpenSource technologies including Cassandra
+ and Ceph, with fully application level solutions also possible
+   - GR software gets the reference to the storage in the remote site
+   - If the primary site fails,
+ - GR software managing recovery in backup site gets references to
+ relevant storage and passes to new software instances
+ - Software attaches (or has attached) replicated storage, in the case of
+ volumes promoting to writable.
+
+Pros: 1) Replication will be done at the storage level automatically, no need
+ to create backup regularly, for example, daily.
+ 2) Application selection of limited amount of data to replicate reduces
+       risk of replicating failed state and generates less overhead.
+ 3) Type of replication and model (active/backup, active/active, etc) can
+ be tailored to application needs
+
+Cons: 1) Applications need to be designed with support in mind, including both
+ selection of data to be replicated and consideration of consistency
+      2) "Standard" support in OpenStack for Disaster Recovery is currently
+      fairly limited, though there is active work in this area.
+
+Requirement to OpenStack: save the real ref to volume admin_metadata after it
+has been managed by the driver https://review.openstack.org/#/c/182150/.
+
+Prototype
+-----------
+ None.
+
+Proposed solution
+-----------------
+
+ From a requirements perspective we could recommend all three options for
+ different scenarios; it is an operator choice.
+ Options 1 & 2 seem to be more about replicating/backing up any VNF, whereas
+ option 3 is about providing a service to a replication aware application. It
+ should be noted that the HA requirement is not a priority here; the HA for
+ VNF project will handle the specific HA requirements. It should also be noted
+ that it's up to the specific application how to do HA (out of scope here).
+ For the 3rd option, the app should know which volume has replication
+ capability, write the relevant data to this volume, and guarantee
+ consistency by the app itself. Option 3 is preferable in an HA scenario.
+
+
+Gaps
+====
+ 1) Nova to expose quiesce / unquiesce API:
+ https://blueprints.launchpad.net/nova/+spec/expose-quiesce-unquiesce-api
+ 2) Get the real ref to volume admin_metadata in Cinder:
+ https://review.openstack.org/#/c/182150/
+
+
+**NAME-THE-MODULE issues:**
+
+* Nova
+
+Affected By
+-----------
+ OPNFV multisite cloud.
+
+References
+==========
+
+ Cinder snapshot ( no material/BP about snapshot itself available from the web )
+ http://docs.openstack.org/cli-reference/content/cinderclient_commands.html
+
+
+ Cinder volume backup
+ https://blueprints.launchpad.net/cinder/+spec/volume-backups
+
+ Cinder incremental backup
+ https://blueprints.launchpad.net/cinder/+spec/incremental-backup
+
+ Cinder volume replication
+ https://blueprints.launchpad.net/cinder/+spec/volume-replication
+
+ Create VM snapshot with volume backed VM ( no better material found to
+ explain the volume backed VM snapshot, only the code tells )
+ https://bugs.launchpad.net/nova/+bug/1322195
+
+ Cinder consistency group
+ https://github.com/openstack/cinder-specs/blob/master/specs/juno/consistency-groups.rst
diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
new file mode 100644
index 0000000..56b0c59
--- /dev/null
+++ b/docs/userguide/index.rst
@@ -0,0 +1,30 @@
+.. OPNFV Release Engineering documentation, created by
+ sphinx-quickstart on Tue Jun 9 19:12:31 2015.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+
+table of contents
+=======================================
+
+Contents:
+
+.. toctree::
+ :numbered:
+ :maxdepth: 4
+
+ multisite-admin-user-guide.rst
+
+Indices and tables
+==================
+
+* :ref:`search`
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/userguide/multisite-admin-user-guide.rst b/docs/userguide/multisite-admin-user-guide.rst
new file mode 100644
index 0000000..ed91446
--- /dev/null
+++ b/docs/userguide/multisite-admin-user-guide.rst
@@ -0,0 +1,405 @@
+.. two dots create a comment. please leave this logo at the top of each of your rst files.
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+.. these two pipes are to separate the logo from the first title
+|
+|
+==========================
+Multisite admin user guide
+==========================
+
+Multisite identity service management
+=====================================
+
+Goal
+----
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Token Format
+------------
+
+There are 3 types of token formats supported by OpenStack KeyStone:
+
+ **UUID**
+ **PKI/PKIZ**
+ **FERNET**
+
+It's very important to understand these token formats before we begin the
+multisite identity service management. Please refer to the OpenStack
+official site for identity management:
+http://docs.openstack.org/admin-guide-cloud/identity_management.html
+
+Key consideration in multisite scenario
+---------------------------------------
+
+A user is provided with a single authentication URL to the Identity (Keystone)
+service. Using that URL, the user authenticates with Keystone by
+requesting a token, typically using username/password credentials. The Keystone
+server validates the credentials, possibly with an external LDAP/AD server, and
+returns a token to the user. The user sends a request to a service in a
+selected region, including the token. Now the service in the region, say Nova,
+needs to validate the token. The service uses its configured keystone endpoint
+and service credentials to request token validation from Keystone. After the
+token is validated by KeyStone, the user is authorized to use the service.
+
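+As an illustrative sketch only (the URL and credentials are placeholders), the
+authentication step looks like this from the user's point of view:
+
+ .. code-block:: bash
+
+    export OS_AUTH_URL=http://keystone.your.com:5000/v2.0
+    export OS_TENANT_NAME=demo
+    export OS_USERNAME=demo
+    export OS_PASSWORD=secret
+    # request a token from the single authentication point
+    openstack token issue
+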
+The key considerations for token validation in a multisite scenario are:
+
+* Site level failure: impact on authN and authZ should be as minimal as
+  possible
+* Scalability: as more and more sites are added, no bottleneck in token
+  validation
+* Amount of inter region traffic: should be kept as little as possible
+
+Hence, Keystone token validation should preferably be done in the same
+region as the service itself.
+
+The challenge of distributing the KeyStone service into each region is the
+KeyStone backend. Different token formats have different data persisted in the
+backend.
+
+* UUID: UUID tokens have a fixed size. Tokens are persistently stored and
+create a lot of database traffic; the persistence of tokens is for revocation
+purposes. UUID tokens are validated online by Keystone; each call to a service
+will request token validation from keystone. Keystone can become a
+bottleneck in a large system. Due to this, the UUID token type is not suitable
+for use in multi region clouds, no matter whether the Keystone database
+is replicated or not.
+
+* PKI: Tokens are non persistent cryptographic based tokens and validated
+offline (not by the Keystone service) by Keystone middleware which is part
+of other services such as Nova. Since PKI tokens include the endpoints of all
+services in all regions, the token size can become big. There are
+several ways to reduce the token size such as no catalog policy, endpoint
+filter to make a project binding with limited endpoints, and compressed PKI
+token - PKIZ, but the size of token is still unpredictable, making it difficult
+to manage. If no catalog policy is applied, the user can access all regions;
+in some scenarios this is not allowed. A centralized Keystone with PKI tokens
+reduces inter region backend synchronization traffic, but PKI tokens do
+produce Keystone traffic for revocation lists.
+
+* Fernet: Tokens are non persistent cryptographic based tokens and validated
+online by the Keystone service. Fernet tokens are more lightweight
+than PKI tokens and have a fixed size. Fernet tokens require Keystone
+deployed in a distributed manner, again to avoid inter region traffic. The
+data synchronization cost for the Keystone backend is smaller due to the
+non-persisted tokens.
+
+Cryptographic tokens bring new (compared to UUID tokens) issues/use-cases
+like key rotation, certificate revocation. Key management is out of scope for
+this use case.
+
+Database deployment as the backend for KeyStone service
+--------------------------------------------------------
+
+ Database replication:
+    - Master/slave asynchronous: supported by the database server itself
+(mysql/mariadb etc), works over WAN, and is more scalable. But only the master
+will provide write functionality (domain/project/role provisioning).
+    - Multi-master synchronous: Galera (and others like Percona), not so
+scalable for multi-master writing, and needs more parameter tuning for WAN
+latency. It can provide the capability for a limited multi-site multi-write
+function for a distributed KeyStone service.
+    - Symmetrical/asymmetrical: data replicated to all regions or a subset;
+in the latter case some regions need to access Keystone in another region.
+
+ Database server sharing:
+ In an OpenStack controller, normally many databases from different
+services are provided from the same database server instance. For HA reasons,
+the database server is usually synchronously replicated to a few other nodes
+(controllers) to form a cluster. Note that _all_ databases are replicated in
+this case, for example when Galera sync repl is used.
+
+ Only the Keystone database can be replicated to other sites. Replicating
+databases for other services will cause those services to get out of sync and
+malfunction.
+
+    Since only the Keystone database is to be sync or async replicated to
+another region/site, it's better to deploy the Keystone database into its own
+database server, with the extra networking requirements and the cluster or
+replication configuration this implies. How to support this by the installer
+is out of scope.
+
+ The database server can be shared when async master/slave replication is
+used, if global transaction identifiers (GTID) are enabled.
+
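+    As an illustrative sketch only (host name, user and password are
+assumptions), a slave node in another site could be pointed at the master
+Keystone database server using GTID based auto positioning, assuming GTID is
+enabled as noted above:
+
+ .. code-block:: bash
+
+    # run on the slave database node; the replication user must exist on the master
+    mysql -u root -p -e "CHANGE MASTER TO \
+        MASTER_HOST='keystone-db-master.site1.your.com', \
+        MASTER_USER='repl', MASTER_PASSWORD='replpass', \
+        MASTER_AUTO_POSITION=1; START SLAVE;"
+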
+Deployment options
+------------------
+
+- Distributed KeyStone service with PKI token
+
+  Deploy the KeyStone service in two sites with database replication. If the
+site level failure impact is not considered, then the KeyStone service can be
+deployed in only one site.
+
+  The PKI token has one great advantage: the token validation can be
+done locally, without sending a token validation request to the KeyStone
+server. The drawbacks of the PKI token are:
+  * the endpoint list size in the token. If a project is only spread over a
+    very limited number of sites (regions), then we can use the endpoint
+    filter to reduce the token size, making it workable even with a lot of
+    sites in the cloud.
+  * KeyStone middleware (which is co-located in the service, like
+    Nova-API/xxx-API) will have to send requests to the KeyStone server
+    frequently for the revocation list, in order to reject malicious API
+    requests, for example, a user that has been deactivated but uses an old
+    token to access an OpenStack service.
+
+  This option needs to leverage database replication to provide a
+KeyStone Active-Active mode across sites to reduce the impact of a site
+failure. And since the revocation list is requested very frequently, the
+performance of the KeyStone server also needs to be taken care of.
+
+  Site level keystone load balancing is required to provide site level
+redundancy, otherwise the KeyStone middleware will not switch requests to a
+healthy KeyStone server in time.
+
+  Also, cert distribution/revocation to each site / API server for token
+validation is required; a minimal setup sketch follows at the end of this
+option.
+
+  This option can be used in scenarios where there are very few sites,
+especially if each project only spreads over a limited number of sites
+(regions).
+
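+  As an illustrative sketch only, the certificates used for PKI token signing
+are generated with keystone-manage, and the resulting signing certificate and
+CA then have to be distributed to every site / API server that validates
+tokens (file locations are the defaults and may differ in your deployment):
+
+  .. code-block:: bash
+
+     # on the KeyStone node: create the signing certificates for PKI tokens
+     keystone-manage pki_setup --keystone-user keystone --keystone-group keystone
+     # the files under /etc/keystone/ssl/certs/ then need to be distributed to
+     # each site / API server for offline token validation
+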
+- Distributed KeyStone service with Fernet token
+
+  The Fernet token is a very new format, just introduced recently. The biggest
+gains of this token format are: 1) it is lightweight, the size is small enough
+to be carried in the API request, unlike the PKI token (as sites increase, the
+endpoint list grows and the token size becomes too long to carry in the API
+request); 2) no token persistence, which also means the DB is not changed much
+and holds only lightweight data (just project, role, domain, endpoint, etc).
+The drawback of the Fernet token is that the token has to be validated by
+KeyStone for each API request.
+
+  This means that the DB of KeyStone can work as a cluster in multisite (for
+example, using a MySQL Galera cluster). That is, install a KeyStone API server
+in each site, but share the same backend DB cluster. Because the DB cluster
+synchronizes data in real time across the sites, all KeyStone servers can see
+the same data.
+
+  Because each site has KeyStone installed and all data is kept the same,
+all token validation can be done locally in the same site.
+
+  The challenge for this solution is how many sites the DB cluster can
+support. The question was asked of the MySQL Galera developers, and their
+answer is that there is no number/distance/network latency limitation in the
+code. But in practice, they have seen a case using a MySQL cluster across 5
+data centers, each data center with 3 nodes.
+
+  This solution will be very good for a limited number of sites which the DB
+cluster can cover very well.
+
+- Distributed KeyStone service with Fernet token + Async replication (
+ star-mode).
+
+  One master KeyStone cluster with Fernet tokens spans two sites (for site
+level high availability purposes); other sites will be installed with at least
+2 slave nodes, where each node is configured with DB async replication from a
+master cluster member: one slave's master node is in site1, another slave's
+master node is in site2.
+
+  Only the master cluster nodes are allowed to write; the other slave nodes
+wait for replication from the master cluster members (with very little delay).
+
+    Pros.
+    Deploying a database cluster in the master sites provides more master
+nodes, so that more slaves can be fed with async. replication in parallel.
+Using two sites for the master cluster provides higher reliability (site
+level) for write requests, while reducing the maintenance challenge at the
+same time by limiting the cluster from spreading over too many sites.
+    Multiple slaves are used in other sites because a slave has no knowledge
+of other slaves, so it is easier to manage multiple slaves in one site than a
+cluster, and the multiple slaves work independently but provide multi-instance
+redundancy (like a cluster, but independent).
+
+  Cons. Need to be aware of the challenge of key distribution and rotation
+for Fernet tokens (a minimal configuration sketch follows).
+
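+  As an illustrative sketch only (values are assumptions), the Fernet provider
+and its key repository are configured in keystone.conf on every KeyStone node,
+while the keys themselves are created on the master cluster and distributed to
+the slaves:
+
+  .. code-block:: bash
+
+     [token]
+     provider = fernet
+
+     [fernet_tokens]
+     # directory that has to hold the same keys on every KeyStone node
+     key_repository = /etc/keystone/fernet-keys/
+     # number of keys kept during rotation (assumed value)
+     max_active_keys = 3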
+
+Multisite VNF Geo site disaster recovery
+========================================
+
+Goal
+----
+
+A VNF (telecom application) should be able to be restored in another site
+when catastrophic failures happen.
+
+Key consideration in multisite scenario
+---------------------------------------
+
+Geo site disaster recovery deals with more catastrophic failures
+(flood, earthquake, propagating software fault), where loss of calls, or
+even temporary loss of service, is acceptable. It also seems more common
+to accept/expect manual / administrator intervention to drive the process, not
+least because you don’t want to trigger the transfer by mistake.
+
+In terms of coordination/replication or backup/restore between geographic
+sites, discussion often (but not always) seems to focus on limited application
+level data/config replication, as opposed to replication or backup/restore
+of the cloud infrastructure between different sites.
+
+And finally, the lack of a requirement to do fast media transfer (without
+resignalling) generally removes the need for special networking behavior, with
+slower DNS-style redirection being acceptable.
+
+Here the concern is more about the cloud infrastructure level capability to
+support VNF geo site disaster recovery.
+
+Option1, Consistency application backup
+---------------------------------------
+
+The disaster recovery process will work like this:
+
+1).DR (Geo site disaster recovery) software gets the volumes for each VM
+in the VNF from Nova
+2).DR software calls the Nova quiesce API to guarantee quiescing the VMs in
+the desired order
+3).DR software takes snapshots of these volumes in Cinder (NOTE: because
+storage often provides fast snapshots, the duration between quiesce and
+unquiesce is a short interval)
+4).DR software calls the Nova unquiesce API to unquiesce the VMs of the VNF in
+reverse order
+5).DR software creates volumes from the snapshots just taken in Cinder
+6).DR software creates backups (incremental) of these volumes to remote
+backup storage ( swift or ceph, or .. ) in Cinder
+7).if this site fails,
+7.1)DR software restores these backup volumes in the remote Cinder in the
+backup site (see the restore sketch after this option).
+7.2)DR software boots VMs from the bootable volumes in the remote Cinder in
+the backup site and attaches the corresponding data volumes.
+
+Note: It’s up to the DR policy and VNF characteristics how to use the API. Some
+VNFs may allow the standby of the VNF or a member of the cluster to do
+quiesce/unquiesce to avoid interfering with the service provided by the VNF.
+Some other VNFs may afford a short unavailability for DR purposes.
+
+This option provides application level consistency disaster recovery.
+This feature is WIP in the OpenStack Mitaka release, and will be available in
+the next OPNFV release.
+
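+As a hedged sketch of the restore step 7.1) (IDs are placeholders and the
+backups are assumed to be visible in the Cinder of the backup site), the DR
+software could use the Cinder CLI like this:
+
+ .. code-block:: bash
+
+    # list the backups available in the backup site
+    cinder backup-list
+    # restore a backup into a new volume in the backup site
+    cinder backup-restore BACKUP_ID
+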
+Option2, Virtual Machine Snapshot
+---------------------------------
+1).DR software creates a VM snapshot in Nova
+2).Nova quiesces the VM internally
+   (NOTE: The upper level application or DR software should take care of
+   avoiding infra level outage induced VNF outage)
+3).Nova creates an image in Glance
+4).Nova creates a snapshot of the VM, including volumes
+5).If the VM is a volume backed VM, then a volume snapshot is created in Cinder
+6).No image is uploaded to glance, but the snapshot is added in the metadata
+   of the image in Glance
+7).DR software gets the snapshot information from Glance
+8).DR software creates volumes from these snapshots
+9).DR software creates backups (incremental) of these volumes to backup storage
+   ( swift or ceph, or .. ) in Cinder
+10).if this site fails,
+10.1).DR software restores these backup volumes to Cinder in the backup site.
+10.2).DR software boots VMs from the bootable volumes from Cinder in the backup
+   site and attaches the data volumes.
+
+This option only provides single VM level consistency disaster recovery.
+
+This feature is already available in current OPNFV release.
+
+Option3, Consistency volume replication
+---------------------------------------
+1).DR software creates a datastore (Block/Cinder, Object/Swift, App Custom
+   storage) with replication enabled at the relevant scope, for use to
+   selectively backup/replicate desired data to the GR backup site
+2).DR software gets the reference to the storage in the remote site
+3).If the primary site fails,
+3.1).DR software managing recovery in the backup site gets references to
+   relevant storage and passes them to new software instances
+3.2).Software attaches (or has attached) the replicated storage, in the case
+   of volumes promoting them to writable.
+
+Pros:1) Replication will be done at the storage level automatically, no need to
+ create backup regularly, for example, daily.
+ 2) Application selection of limited amount of data to replicate reduces
+     risk of replicating failed state and generates less overhead.
+ 3) Type of replication and model (active/backup, active/active, etc) can
+ be tailored to application needs
+
+Cons:1) Applications need to be designed with support in mind, including both
+ selection of data to be replicated and consideration of consistency
+     2) "Standard" support in OpenStack for Disaster Recovery is currently
+        fairly limited, though there is active work in this area.
+
+This feature is under discussion in the OpenStack Mitaka release, and hopefully
+will be available in the next OPNFV release.
+
+
+VNF high availability across VIM
+================================
+
+Goal
+----
+
+A VNF (telecom application) should be able to realize high availability
+deployment across OpenStack instances.
+
+Key consideration in multisite scenario
+---------------------------------------
+
+Most telecom applications have already been designed as
+Active-Standby/Active-Active/N-Way to achieve high availability
+(99.999%, which corresponds to 5.26 minutes of unplanned downtime in a year);
+typically state replication or heartbeat between the
+Active-Standby/Active-Active/N-Way members (directly or via replicated database
+services, or via a privately designed message format) is required.
+
+We have to accept the currently limited availability (99.99%) of a
+given OpenStack instance, and intend to provide the availability of the
+telecom application by spreading its function across multiple OpenStack
+instances. To help with this, many people appear willing to provide multiple
+“independent” OpenStack instances in a single geographic site, with special
+networking (L2/L3) between clouds in that physical site.
+
+The telecom application often has different networking planes for different
+purposes:
+
+1) external network plane: used for communication with other telecom
+   applications.
+
+2) components inter-communication plane: one VNF often consists of several
+   components; this plane is designed for the components to communicate with
+   each other.
+
+3) backup plane: this plane is used for the heartbeat or state replication
+   between the component's active/standby or active/active or N-way cluster.
+
+4) management plane: this plane is mainly for management purposes, like
+   configuration.
+
+Generally these planes are separated from each other. And for a legacy telecom
+application, each internal plane will have its own fixed or flexible IP
+addressing plan. There are some interesting/hard requirements on the networking
+(L2/L3) between OpenStack instances; at least the backup plane needs to be
+supported across different OpenStack instances:
+
+1) Overlay L2 networking or shared L2 provider networks as the backup plane
+   for heartbeat or state replication. Overlay L2 networking is preferred, the
+   reasons are:
+
+   a) Support legacy compatibility: some telecom apps have a built-in internal
+      L2 network; to ease moving these apps to virtualized telecom
+      applications, it is better to provide an L2 network.
+
+   b) Support IP overlapping: multiple telecom applications may have
+      overlapping IP addresses for cross OpenStack instance networking.
+
+   Therefore, overlay L2 networking across Neutron is a required feature in
+   OpenStack.
+
+2) L3 networking across OpenStack instances for heartbeat or state
+   replication. For L3 networking, we can leverage the floating IPs provided
+   by current Neutron, so there is no new feature requirement to OpenStack.
+
+Overlay L2 networking across OpenStack instances is under discussion with the
+Neutron community.
+
+
+Revision: _sha1_
+
+Build date: |today|