From 8e3d1f151aa2ba629c9ed4ad61862ac147fff3ec Mon Sep 17 00:00:00 2001 From: Chaoyi Huang Date: Tue, 22 Dec 2015 13:24:10 +0800 Subject: Add doc and update output according OpenStack spec's update Add admin-user-guide and configuration-guide for multisite, and update the output requirement to sync. the change in OpenStack spec, which was reviewed in OpenStack community Change-Id: Icff3dda7e204404f8003d6e06cde45151eb03446 Signed-off-by: Chaoyi Huang --- docs/configguide/index.rst | 30 ++ docs/configguide/multisite-configuration-guide.rst | 120 ++++++ docs/how-to-use-docs/documentation-example.rst | 9 +- .../VNF_high_availability_across_VIM.rst | 160 ++++++++ .../multisite-identity-service-management.rst | 376 +++++++++++++++++++ docs/requirements/multisite-vnf-gr-requirement.rst | 241 ++++++++++++ docs/userguide/index.rst | 30 ++ docs/userguide/multisite-admin-user-guide.rst | 405 +++++++++++++++++++++ 8 files changed, 1367 insertions(+), 4 deletions(-) create mode 100644 docs/configguide/index.rst create mode 100644 docs/configguide/multisite-configuration-guide.rst create mode 100644 docs/requirements/VNF_high_availability_across_VIM.rst create mode 100644 docs/requirements/multisite-identity-service-management.rst create mode 100644 docs/requirements/multisite-vnf-gr-requirement.rst create mode 100644 docs/userguide/index.rst create mode 100644 docs/userguide/multisite-admin-user-guide.rst (limited to 'docs') diff --git a/docs/configguide/index.rst b/docs/configguide/index.rst new file mode 100644 index 0000000..49b30f1 --- /dev/null +++ b/docs/configguide/index.rst @@ -0,0 +1,30 @@ +.. OPNFV Release Engineering documentation, created by + sphinx-quickstart on Tue Jun 9 19:12:31 2015. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +.. 
image:: ../etc/opnfv-logo.png + :height: 40 + :width: 200 + :alt: OPNFV + :align: left + +Table of contents +======================================= + +Contents: + +.. toctree:: + :numbered: + :maxdepth: 4 + + multisite-configuration-guide.rst + +Indices and tables +================== + +* :ref:`search` + +Revision: _sha1_ + +Build date: |today| diff --git a/docs/configguide/multisite-configuration-guide.rst b/docs/configguide/multisite-configuration-guide.rst new file mode 100644 index 0000000..4a20e83 --- /dev/null +++ b/docs/configguide/multisite-configuration-guide.rst @@ -0,0 +1,120 @@ +.. two dots create a comment. please leave this logo at the top of each of your rst files. +.. image:: ../etc/opnfv-logo.png + :height: 40 + :width: 200 + :alt: OPNFV + :align: left +.. these two pipes are to separate the logo from the first title +| +| +============================= +Multisite configuration guide +============================= + +Multisite identity service management +===================================== + +Goal +---- + +A user should, using a single authentication point, be able to manage virtual +resources spread over multiple OpenStack regions. + +Before you read +--------------- + +This chapter does not intend to cover all configuration of KeyStone and other +OpenStack services to work together with KeyStone. + +This chapter focuses only on the configuration that should be taken into +account in a multi-site scenario. + +Please read the configuration documentation related to identity management +of OpenStack for all configuration items. + +http://docs.openstack.org/liberty/config-reference/content/ch_configuring-openstack-identity.html + +How to configure the database cluster for synchronous or asynchronous +replication in a multi-site scenario is out of the scope of this document. The only +reminder is that for the synchronization or replication, only the Keystone +database is required. 
If you are using MySQL, you can configure it like this: + +In the master: + + .. code-block:: bash + + binlog-do-db=keystone + +In the slave: + + .. code-block:: bash + + replicate-do-db=keystone + + +Deployment options +------------------ + +For a detailed description of each deployment option, please refer to the +admin-user-guide. + +- Distributed KeyStone service with PKI token + + In the KeyStone configuration file, the PKI token format should be configured + + .. code-block:: bash + + provider = pki + + or + + .. code-block:: bash + + provider = pkiz + + In the [keystone_authtoken] section of each OpenStack service configuration + file in each site, configure the identity_uri and auth_uri to the address + of the KeyStone service + + .. code-block:: bash + + identity_uri = https://keystone.your.com:35357/ + auth_uri = http://keystone.your.com:5000/v2.0 + + It's better to use a domain name for the KeyStone service rather than an IP + address directly, especially if you deployed the KeyStone service in at least + two sites for site level high availability. + +- Distributed KeyStone service with Fernet token +- Distributed KeyStone service with Fernet token + Async replication ( + star-mode). + + In these two deployment options, the token validation is planned to be done + in the local site. + + In the KeyStone configuration file, the Fernet token format should be configured + + .. code-block:: bash + + provider = fernet + + In the [keystone_authtoken] section of each OpenStack service configuration + file in each site, configure the identity_uri and auth_uri to the address + of the local KeyStone service + + .. code-block:: bash + + identity_uri = https://local-keystone.your.com:35357/ + auth_uri = http://local-keystone.your.com:5000/v2.0 + + and especially, configure the region_name to your local region name; for + example, if you are configuring services in RegionOne, and there is a local +KeyStone service in RegionOne, then + + .. 
code-block:: bash + + region_name = RegionOne + +Revision: _sha1_ + +Build date: |today| diff --git a/docs/how-to-use-docs/documentation-example.rst b/docs/how-to-use-docs/documentation-example.rst index afcf758..24b4c8a 100644 --- a/docs/how-to-use-docs/documentation-example.rst +++ b/docs/how-to-use-docs/documentation-example.rst @@ -1,5 +1,5 @@ .. two dots create a comment. please leave this logo at the top of each of your rst files. -.. image:: ../etc/opnfv-logo.png +.. image:: ../etc/opnfv-logo.png :height: 40 :width: 200 :alt: OPNFV @@ -21,8 +21,9 @@ this is the directory structure of the docs/ directory that can be found in the ./how-to-use-docs/documentation-example.rst ./how-to-use-docs/index.rst -To create your own documentation, Create any number of directories (depending on your need) and place in each of them an index.rst. -This index file must refence your other rst files. +To create your own documentation, create any number of directories (depending +on your need) and place in each of them an index.rst. This index file must +reference your other rst files. + Here is an example index.rst @@ -59,7 +60,7 @@ For verify jobs a link to the documentation will show up as a comment in gerrit * Merge jobs -Once you are happy with the look of your documentation you can submit the patchset the merge job will +Once you are happy with the look of your documentation you can submit the patchset; the merge job will copy the output of each documentation directory to http://artifacts.opnfv.org/$project/docs/$name_of_your_folder/index.html Here are some quick examples of how to use rst markup diff --git a/docs/requirements/VNF_high_availability_across_VIM.rst b/docs/requirements/VNF_high_availability_across_VIM.rst new file mode 100644 index 0000000..1a7d41b --- /dev/null +++ b/docs/requirements/VNF_high_availability_across_VIM.rst @@ -0,0 +1,160 @@ +This work is licensed under a Creative Commons Attribution 3.0 Unported License. 
+http://creativecommons.org/licenses/by/3.0/legalcode + + +================================ +VNF high availability across VIM +================================ + +Problem description +=================== + +Abstract +------------ + +A VNF (telecom application) should be able to realize a high availability +deployment across OpenStack instances. + +Description +------------ +A VNF (telecom application running over cloud) may (already) be designed as +Active-Standby/Active-Active/N-Way to achieve high availability. + +With a telecoms focus, this generally refers both to availability of service +(i.e. the ability to make new calls), but also maintenance of ongoing control +plane state and active media processing (i.e. “keeping up” existing calls). + +Traditionally telecoms systems are designed to maintain state and calls across +pretty much the full range of single-point failures. As listed this includes +power supply, hard drive, physical server or network switch, but also covers +software failure, and maintenance operations such as software upgrade. + +Providing this support typically requires state replication between +application instances (directly or via replicated database services, or via +privately designed message formats). It may also require special case handling of +media endpoints, to allow transfer of media on short time scales (<1s) without +requiring end-to-end resignalling (e.g. RTP redirection via IP / MAC address +transfers, cf. VRRP). + +With a migration to NFV, a commonly expressed desire by carriers is to provide +the same resilience to any single point(s) of failure in the cloud +infrastructure. 
+ +This could be done by making each cloud instance fully HA (a non-trivial task to +do right and to prove it has been done right), but the preferred approach +appears to be to accept the currently limited availability of a given cloud +instance (no desire to radically rework this for telecoms), and instead to +provide solution availability by spreading function across multiple cloud +instances (i.e. the same approach used today to deal with hardware and software +failures). + +A further advantage of this approach is that it provides a good basis for seamless +upgrade of infrastructure software revision, where you can spin up an additional +up-level cloud, gradually transfer over resources / app instances from one of +your other clouds, before finally turning down the old cloud instance when no +longer required. + +If fast media / control failover is still required (which many/most carriers +still seem to believe it is) there are some interesting/hard requirements on the +networking between cloud instances. To help with this, many people appear +willing to provide multiple “independent” cloud instances in a single geographic +site, with special networking between clouds in that physical site. +"independent" in quotes is because some coordination between cloud instances is +obviously required, but this has to be implemented in a fashion which reduces +the potential for correlated failure to very low levels (at least as low as the +required overall application availability). 
+ +Analysis of requirements to OpenStack +===================================== +The VNF often has different networking planes for different purposes: + +- external network plane: used for communication with other VNFs +- inter-communication plane: one VNF often consists of several +components; this plane is designed for the components to inter-communicate with each +other +- backup plane: this plane is used for the heartbeat or state replication +between the components' active/standby or active/active or N-way cluster. +- management plane: this plane is mainly for management purposes + +Generally these planes are separated from each other. And for a legacy telecom +application, each internal plane will have its fixed or flexible IP addressing +plan. + +To make the VNF work in HA mode across different OpenStack instances in +one site (but not limited to one site), at least the backup plane needs to be supported across +different OpenStack instances: + +1) Overlay L2 networking or shared L2 provider networks as the backup plane for +heartbeat or state replication. An overlay L2 network is preferred, the reasons being: +a. Legacy compatibility: some telecom applications have built-in internal L2 +networks; to ease moving these applications to VNFs, it is better to provide an L2 +network. b. IP overlapping: multiple VNFs may have overlapping IP addresses +for cross OpenStack instance networking. +Therefore, overlay L2 networking across Neutron instances is required in OpenStack. + +2) L3 networking across OpenStack instances for heartbeat or state replication. +For L3 networking, we can leverage the floating IP provided in current Neutron, +so there is no new feature requirement to OpenStack. + +3) The IP address used for the VNF to connect with other VNFs should be able to +float across OpenStack instances. For example, if the master failed, the IP +address should be usable in the standby which is running in another OpenStack +instance. 
There are some methods like VRRP/GARP etc. that can help the movement of the +external IP, so no new feature will be added to OpenStack. + + +Prototype +----------- + None. + +Proposed solution +----------- + + From a requirements perspective, it's up to the application's decision to use L2 or L3 +networking across Neutron. + + For Neutron, an L2 network consists of many ports. To make cross-Neutron +L2 networking workable, we need some fake remote ports in the local +Neutron to represent VMs in the remote site (remote OpenStack). + + The fake remote port will reside on some VTEP (for VxLAN); the tunneling +IP address of the VTEP should be an attribute of the fake remote port, so that +the local port can forward packets to the correct tunneling endpoint. + + The idea is to add one more ML2 mechanism driver to capture the fake remote +port CRUD (create, retrieve, update, delete). + + When a fake remote port is added/updated/deleted, the ML2 mechanism +driver for these fake ports will activate L2 population, so that the VTEP +tunneling endpoint information can be understood by other local ports. + + It's also required to be able to query the port's VTEP tunneling endpoint +information through the Neutron API, in order to use this information to create +the fake remote port in another Neutron. + + In the past, the port's VTEP IP address was the host IP where the VM resides. +But this BP https://review.openstack.org/#/c/215409/ will make the port free +of binding to the host IP as the tunneling endpoint; you can even specify an L2GW IP +address as the tunneling endpoint. + + Therefore a new BP will be registered to process the fake remote port, in +order to make cross-Neutron L2 networking feasible. An RFE is registered first: +https://bugs.launchpad.net/neutron/+bug/1484005 + + +Gaps +==== + 1) fake remote port for cross Neutron L2 networking + + +**NAME-THE-MODULE issues:** + +* Neutron + +Affected By +----------- + OPNFV multisite cloud. 
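The fake remote port described above does not exist in Neutron today; as a rough illustration of the intent, the sketch below creates a local port whose admin-only ``binding:profile`` dict carries the remote VTEP IP. The ``vtep_ip`` key, the hostnames, and the whole workflow are assumptions for illustration only (this is exactly what the RFE above asks Neutron to support properly):

```shell
# Illustration only: create a "fake remote port" in the local Neutron that
# stands in for a VM living in the remote OpenStack instance.
# binding:profile is a real, admin-only free-form dict on Neutron ports;
# the vtep_ip key inside it is hypothetical, not an existing convention.
curl -s -X POST http://local-neutron.your.com:9696/v2.0/ports \
  -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"port": {
        "network_id": "<local-network-uuid>",
        "mac_address": "fa:16:3e:aa:bb:cc",
        "binding:profile": {"vtep_ip": "192.0.2.10"}
      }}'
```

The proposed ML2 mechanism driver would watch for ports carrying such a profile and feed the VTEP IP into L2 population instead of the usual host IP.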
+ +References +========== + diff --git a/docs/requirements/multisite-identity-service-management.rst b/docs/requirements/multisite-identity-service-management.rst new file mode 100644 index 0000000..b411c28 --- /dev/null +++ b/docs/requirements/multisite-identity-service-management.rst @@ -0,0 +1,376 @@ +This work is licensed under a Creative Commons Attribution 3.0 Unported +License. +http://creativecommons.org/licenses/by/3.0/legalcode + + +======================================= + Multisite identity service management +======================================= + +Glossary +======== + +There are 3 types of tokens supported by OpenStack KeyStone + **UUID** + + **PKI/PKIZ** + + **FERNET** + +Please refer to the reference section for these token formats, benchmarks and +comparison. + + +Problem description +=================== + +Abstract +------------ + +A user should, using a single authentication point, be able to manage virtual +resources spread over multiple OpenStack regions. + +Description +------------ + +- User/Group Management: e.g. use of LDAP, should OPNFV be agnostic to this? + Reusing the LDAP infrastructure that is mature and has features lacking in +Keystone (e.g. password aging and policies). KeyStone can use an external system to +do the user authentication, and user/group management could be the job of the +external system, so that KeyStone can reuse/co-work with enterprise identity +management. KeyStone's main role in OpenStack is to provide +service (Nova, Cinder, ...) aware tokens, and do the authorization. You can refer to +this post https://blog-nkinder.rhcloud.com/?p=130. Therefore, LDAP itself should +be a topic out of our scope. + +- Role assignment: In case of federation (and perhaps other solutions) it is not + feasible/scalable to do role assignment to users. Role assignment to groups + is better. Role assignment will usually be done based on groups. KeyStone + supports this. 
+ +- Amount of inter region traffic: should be kept as little as possible, + consider CERN's Ceilometer issue as described in +http://openstack-in-production.blogspot.se/2014/03/cern-cloud-architecture-update-for.html + +Requirement analysis +=========================== + +- A user is provided with a single authentication URL to the Identity + (Keystone) service. Using that URL, the user authenticates with Keystone by +requesting a token, typically using username/password credentials. The Keystone +server validates the credentials, possibly with an external LDAP/AD server, and +returns a token to the user. With token type UUID/Fernet, the user requests the +service catalog. With PKI tokens the service catalog is included in the token. +The user sends a request to a service in a selected region including the token. +Now the service in the region, say Nova, needs to validate the token. Nova uses +its configured Keystone endpoint and service credentials to request token +validation from Keystone. The Keystone token validation should preferably be +done in the same region as Nova itself. Now Keystone has to validate the token +that also (always?) includes a project ID in order to make sure the user is +authorized to use Nova. The project ID is stored in the assignment backend - +tables in the Keystone SQL database. For this project ID validation the +assignment backend database needs to have the same content as the Keystone that +issued the token. + +- So either 1) services in all regions are configured with a central Keystone + endpoint through which all token validations will happen, or 2) the Keystone +assignment backend database is replicated and thus available to Keystone +instances locally in each region. + + Alt 2) is obviously the only scalable solution that produces no inter region +traffic for normal service usage. Only when data in the assignment backend is +changed will replication traffic be sent between regions. 
Assignment data +includes domains, projects, roles and role assignments. + +Keystone deployment: + + - Centralized: a single Keystone service installed in some location, either + in a "master" region or totally external as a service to OpenStack + regions. + - Distributed: a Keystone service is deployed in each region + +Token types: + + - UUID: tokens are persistently stored and create a lot of database + traffic; the persistence of tokens is for revocation purposes. UUID tokens +are online validated by Keystone; each API call to a service will require token +validation from KeyStone. Keystone can become a bottleneck in a large system +due to this. The UUID token type is not suitable for use in multi region clouds at +all, no matter the solution used for the Keystone database replication (or +not). UUID tokens have a fixed size. + + - PKI: tokens are non persistent cryptographic based tokens and offline + validated (not by the Keystone service) by Keystone middleware, +which is part of other services such as Nova. Since PKI tokens include endpoints +for all services in all regions, the token size can become big. There are +several ways to reduce the token size: a no-catalog policy, an endpoint filter to +bind a project to limited endpoints, and the compressed PKI token - PKIZ - +but the size of the token is still not predictable, making it difficult to manage. If +no catalog is applied, the user can access all regions, which is not allowed +in some scenarios. + + - Fernet: tokens are non persistent cryptographic based tokens and online + validated by the Keystone service. Fernet tokens are more lightweight +than PKI tokens and have a fixed size. + + PKI tokens (offline validated) are needed with a centralized Keystone to avoid +inter region traffic. PKI tokens do produce Keystone traffic for revocation +lists. + + Fernet tokens require Keystone to be deployed in a distributed manner, again to +avoid inter region traffic. 
+ + Cryptographic tokens bring new (compared to UUID tokens) issues/use-cases +like key rotation and certificate revocation. Key management is out of the scope of +this use case. + +Database deployment: + + Database replication: + -Master/slave asynchronous: supported by the database server itself +(mysql/mariadb etc.), works over WAN, it's more scalable + -Multi master synchronous: Galera (others like Percona), not so scalable +for multi-master writing, and needs more parameter tuning for WAN latency. + -Symmetrical/asymmetrical: data replicated to all regions or a subset; +in the latter case it means some regions need to access Keystone in another +region. + + Database server sharing: + In an OpenStack controller, normally many databases from different +services are provided from the same database server instance. For HA reasons, +the database server is usually synchronously replicated to a few other nodes +(controllers) to form a cluster. Note that _all_ databases are replicated in +this case, for example when Galera sync repl is used. + + Only the Keystone database can be replicated to other sites. Replicating +databases for other services will cause those services to get out of sync and +malfunction. + + Since only the Keystone database is to be sync replicated to another +region/site, it's better to deploy the Keystone database on its own +database server, with the extra networking requirements and cluster or replication +configuration. How installers support this is out of scope. + + The database server can be shared when async master/slave repl is used, if +global transaction identifiers (GTID) are enabled. 
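The Keystone-only async replication described above can be sketched in MySQL configuration; this is a minimal sketch under assumptions, with example server-ids, hostnames and credentials, restricting replication to the ``keystone`` database and using GTID auto-positioning so the database server can still be shared:

```
# master (site 1) my.cnf - minimal sketch, values are examples
[mysqld]
server-id                 = 1
log-bin                   = mysql-bin
binlog-do-db              = keystone   # only ship Keystone changes
gtid_mode                 = ON
enforce-gtid-consistency  = ON

# slave (site 2) my.cnf
[mysqld]
server-id                 = 2
replicate-do-db           = keystone   # only apply Keystone changes
gtid_mode                 = ON
enforce-gtid-consistency  = ON

-- then on the slave, attach to the master via GTID auto-positioning:
--   CHANGE MASTER TO MASTER_HOST='keystone-db.site1.your.com',
--     MASTER_USER='repl', MASTER_PASSWORD='<secret>',
--     MASTER_AUTO_POSITION=1;
--   START SLAVE;
```

With ``binlog-do-db``/``replicate-do-db`` scoped to Keystone, the other service databases on the shared server stay local, matching the "only the Keystone database is replicated" requirement above.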
+ + +Candidate solution analysis +------------------------------------ + +- KeyStone service (Distributed) with Fernet token + + The Fernet token is a very new format, just introduced recently. The biggest +gains for this token format are: 1) lightweight - the size is small enough to be carried in +the API request, unlike the PKI token (as sites increase, the endpoint list +grows and the token becomes too long to carry in the API request); 2) no +token persistence - this also keeps the DB mostly unchanged and with lightweight +data (just project, user, domain, endpoint etc.). The drawback of +the Fernet token is that the token has to be validated by KeyStone for each API +request. + + This means the KeyStone DB can work as a cluster in multisite (for +example, using a MySQL Galera cluster). That means installing a KeyStone API server in +each site, but sharing the same backend DB cluster. Because the DB cluster +synchronizes data in real time to multiple sites, all KeyStone servers can see +the same data. + + Because each site has KeyStone installed and all data is kept the same, +all token validation can be done locally in the same site. + + The challenge for this solution is how many sites the DB cluster can +support. The question was asked of MySQL Galera developers; their answer is that there is no +number/distance/network latency limitation in the code. But in practice, +they have seen a case using a MySQL cluster in 5 data centers, each data center +with 3 nodes. + + This solution will be very good for a limited number of sites which the DB cluster can +cover very well. + +- KeyStone service(Distributed) with Fernet token + Async replication ( + multi-cluster mode). + + We may have several KeyStone clusters with Fernet tokens, for example, +cluster1 (site1, site2, …, site10), cluster2 (site11, site12, …, site20). +Then do the DB replication among the different clusters asynchronously. + + A prototype of this has been done. 
In some blogs they call it +"hybrid replication". Architecturally you have a master region where you do +Keystone writes. The other regions are read-only. +http://severalnines.com/blog/deploy-asynchronous-slave-galera-mysql-easy-way +http://severalnines.com/blog/replicate-mysql-server-galera-cluster + + Only one DB cluster (the master DB cluster) is allowed to write (but still +multisite, not all sites); other clusters wait for replication. Inside the +master cluster, "write" is allowed in multiple regions because of the distributed lock +in the DB. But please notice the challenge of key distribution and rotation for +Fernet tokens; you can refer to these two blogs: http://lbragstad.com/?p=133, +http://lbragstad.com/?p=156 + +- KeyStone service(Distributed) with Fernet token + Async replication ( + star-mode). + + One master KeyStone cluster with Fernet tokens in two sites (for site level +high availability purposes); other sites will be installed with at least 2 slave +nodes, where each node is configured with DB async replication from the master +cluster members - one slave's master node in site1, another slave's master +node in site 2. + + Only the master cluster nodes are allowed to write; other slave nodes +wait for replication from a master cluster member (with very little delay). +But the challenge of key distribution and rotation for Fernet tokens should be +settled; you can refer to these two blogs: http://lbragstad.com/?p=133, +http://lbragstad.com/?p=156 + + Pros. + Why a cluster in the master sites? There are lots of master nodes in the +cluster, so that serving more slaves can be done with async replication +in parallel. Why two sites for the master cluster? To provide higher +reliability (site level) for write requests. + Why use multiple slaves in other sites? 
A slave has no knowledge of other +slaves, so it is easier to manage multiple slaves in one site than a cluster, and +multiple slaves work independently but provide multi-instance redundancy (like a +cluster, but independent). + + Cons. The distribution/rotation of key management. + +- KeyStone service(Distributed) with PKI token + + The PKI token has one great advantage: the token validation can be +done locally, without sending a token validation request to the KeyStone server. The +drawbacks of the PKI token are: 1) the endpoint list size in the token. If a project +is only spread across a very limited number of sites (regions), then we can use +the endpoint filter to reduce the token size, making it workable even with a lot of +sites in the cloud. 2) KeyStone middleware (the old KeyStone client, which +co-locates in Nova/xxx-API) will have to send requests to the KeyStone server +frequently for the revocation list, in order to reject malicious API requests, +for example, a user that has been deactivated but uses an old token to access an +OpenStack service. + + For this solution, besides the above issues, we also need to provide a KeyStone +Active-Active mode across sites to reduce the impact of site failure. And the +revocation list is requested very frequently, so the performance of the +KeyStone server also needs to be taken care of. + + Site level Keystone load balancing is required to provide site level +redundancy. Otherwise the KeyStone middleware will not switch requests to a +healthy KeyStone server in time. + + This solution can be used for some scenarios, especially a project only +spread across limited sites (regions). + + Also, cert distribution/revocation to each site / API server for token +validation is required. + +- KeyStone service(Distributed) with UUID token + + Because each token validation will be sent to the KeyStone server, and the token +persistence also makes the DB size larger than with Fernet tokens, it is not as good as the +Fernet token for providing a distributed KeyStone service. 
UUID is a solution +better suited for small scale and inside one site. + + Cons: UUID tokens are persistently stored and so will cause a lot of inter +region replication traffic; tokens will be persisted for authorization and +revocation purposes, and the frequently changed database leads to a lot of inter region +replication traffic. + +- KeyStone service(Distributed) with Fernet token + KeyStone federation: you + have to accept the drawbacks of KeyStone federation if you have a lot of +sites/regions. Please refer to the KeyStone federation section. + +- KeyStone federation + In this solution, we can install a KeyStone service in each site, each with +its own database. Because we have to make the KeyStone IdP and SP know each +other, the configuration needs to be done accordingly: set up the +role/domain/group mapping and create the corresponding region in the pair. As sites +increase, if each user is able to access all sites, then full-meshed +mapping/configuration has to be done. Whenever you add one more site, you have +to do n*(n-1) sites configuration/mapping. The complexity grows +as the number of sites increases. + + KeyStone Federation is mainly for different cloud admins to borrow/rent +resources, for example, company A and company B, or a private cloud A and a public +cloud B, both of them using OpenStack based clouds. Therefore a lot of mapping +and configuration has to be done to make it work. + +- KeyStone service (Centralized) with Fernet token + + cons: inter region traffic for token validation; token validation requests +from all other sites have to be sent to the centralized site. Too frequent inter +region traffic. + +- KeyStone service(Centralized) with PKI token + + cons: inter region traffic for token revocation list management; the token +revocation list requests from all other sites have to be sent to the centralized +site. Too frequent inter region traffic. 
+ +- KeyStone service(Centralized) with UUID token + + cons: inter region traffic for token validation; the token validation +requests from all other sites have to be sent to the centralized site. Too +frequent inter region traffic. + +Prototype +----------- + A prototype of the candidate solution "KeyStone service(Distributed) with +Fernet token + Async replication ( multi-cluster mode)" has been executed by Hans +Feldt and Chaoyi Huang; please refer to https://github.com/hafe/dockers/ . And +one issue was found: "Can't specify identity endpoint for token validation among +several keystone servers in keystonemiddleware"; please refer to the Gaps +section. + +Gaps +==== + Can't specify identity endpoint for token validation among several keystone +servers in keystonemiddleware. + + +**NAME-THE-MODULE issues:** + +* keystonemiddleware + + * Can't specify identity endpoint for token validation among several keystone + * servers in keystonemiddleware: + * https://bugs.launchpad.net/keystone/+bug/1488347 + +Affected By +----------- + OPNFV multisite cloud. + +Conclusion +----------- + + As the prototype demonstrates, cluster-level async replication and +Fernet token validation in the local site are feasible. And the candidate +solution "KeyStone service(Distributed) with Fernet token + Async replication ( +star-mode)" is a simplified version of the prototyped one; it's much easier +in deployment and maintenance, with better scalability. + + Therefore the candidate solution "KeyStone service(Distributed) with Fernet +token + Async replication ( star-mode)" for the multisite OPNFV cloud is +recommended. 
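The Fernet key distribution/rotation challenge called out in the candidate solutions is typically handled with ``keystone-manage`` plus an out-of-band copy of the key repository from the master to every slave site. A minimal sketch, assuming the default ``/etc/keystone/fernet-keys/`` repository, passwordless SSH between sites, and example hostnames:

```shell
# One-time setup of the Fernet key repository on the master site
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone

# Periodic rotation, run on the master site only
keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone

# Push the rotated key repository to every slave site before the old
# primary key ages out (hostnames are examples)
for site in keystone.site2.your.com keystone.site3.your.com; do
    rsync -a --delete /etc/keystone/fernet-keys/ \
        root@${site}:/etc/keystone/fernet-keys/
done
```

Rotating only on the master and distributing outward keeps all sites able to validate tokens issued anywhere, which is the property the star-mode solution depends on.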
+ +References +========== + + There are 3 token formats (UUID, PKI/PKIZ, Fernet) provided by KeyStone; these +blogs give a very good description, benchmark and comparison: + http://dolphm.com/the-anatomy-of-openstack-keystone-token-formats/ + http://dolphm.com/benchmarking-openstack-keystone-token-formats/ + + To understand the benefits and shortcomings of PKI/PKIZ tokens, please refer to: + https://www.mirantis.com/blog/understanding-openstack-authentication-keystone-pk + + To understand KeyStone federation and how to use it: + http://blog.rodrigods.com/playing-with-keystone-to-keystone-federation/ + + To integrate KeyStone with an external enterprise-ready authentication system: + https://blog-nkinder.rhcloud.com/?p=130 + + Key replication used in the KeyStone Fernet token: + http://lbragstad.com/?p=133, + http://lbragstad.com/?p=156 + + KeyStone revocation: + http://specs.openstack.org/openstack/keystone-specs/api/v3/identity-api-v3-os-revoke-ext.html diff --git a/docs/requirements/multisite-vnf-gr-requirement.rst b/docs/requirements/multisite-vnf-gr-requirement.rst new file mode 100644 index 0000000..7e67cd0 --- /dev/null +++ b/docs/requirements/multisite-vnf-gr-requirement.rst @@ -0,0 +1,241 @@ +This work is licensed under a Creative Commons Attribution 3.0 Unported License. +http://creativecommons.org/licenses/by/3.0/legalcode + + +========================================= + Multisite VNF Geo site disaster recovery +========================================= + +Glossary +======== + + +There are several concepts required to be understood first + **Volume Snapshot** + + **Volume Backup** + + **Volume Replication** + + **VM Snapshot** + +Please refer to the reference section for these concepts and their comparison. + + +Problem description +=================== + +Abstract +------------ + +A VNF (telecom application) should be able to be restored in another site when +catastrophic failures happen. 
+
+Description
+-----------
+GR deals with more catastrophic failures (flood, earthquake, propagating
+software fault), where loss of calls, or even temporary loss of service,
+is acceptable. It also seems more common to accept/expect
+manual/administrator intervention to drive the process, not least because
+you don't want to trigger the transfer by mistake.
+
+In terms of coordination/replication or backup/restore between geographic
+sites, discussion often (but not always) seems to focus on limited application
+level data/config replication, as opposed to backup/restore of the cloud
+infrastructure between different sites.
+
+And finally, the lack of a requirement to do fast media transfer (without
+resignalling) generally removes the need for special networking behavior, with
+slower DNS-style redirection being acceptable.
+
+This use case is mainly concerned with the cloud infrastructure level
+capability to support VNF geo site redundancy.
+
+Requirement and candidate solutions analysis
+============================================
+
+For a VNF to be restored in the backup site after a catastrophic failure,
+the VNF's bootable volume and data volumes must be restorable.
+
+There are three ways to make boot and data volumes restorable. Choosing the
+right one largely depends on the underlying characteristics and requirements
+of a VNF.
+
+1. Nova Quiesce + Cinder consistency volume snapshot + Cinder backup
+
+   1) GR (Geo site disaster recovery) software gets the volumes for each VM
+      in the VNF from Nova.
+   2) GR software calls the Nova quiesce API to guarantee quiescing the VMs
+      in the desired order.
+   3) GR software takes snapshots of these volumes in Cinder (NOTE: because
+      storage often provides fast snapshots, the duration between quiesce
+      and unquiesce is a short interval).
+   4) GR software calls the Nova unquiesce API to unquiesce the VMs of the
+      VNF in reverse order.
+   5) GR software creates volumes from the snapshots just taken in Cinder.
+   6) GR software creates (incremental) backups of these volumes to remote
+      backup storage (Swift or Ceph, etc.) in Cinder.
+   7) If this site fails:
+
+      7.1) GR software restores these backup volumes in the remote Cinder in
+           the backup site.
+      7.2) GR software boots VMs from the bootable volumes from the remote
+           Cinder in the backup site and attaches the corresponding data
+           volumes.
+
+Pros: The quiesce/unquiesce APIs from Nova make a transactional snapshot
+of a group of VMs possible, for example: quiesce VM1, quiesce VM2,
+quiesce VM3, snapshot VM1's volumes, snapshot VM2's volumes, snapshot
+VM3's volumes, unquiesce VM3, unquiesce VM2, unquiesce VM1. For some
+telecom applications, the order is very important for a group of VMs
+with a strong relationship.
+
+Cons: Nova needs to expose quiesce/unquiesce; fortunately the functionality
+is already there in Nova-compute, only an API layer has to be added to expose
+it. NOTE: It's up to the DR policy and VNF character. Some VNFs may afford a
+short unavailability for DR purposes, while others may use the standby of the
+VNF or a member of the cluster to do disaster recovery replication so as not
+to interfere with the service provided by the VNF. VNFs which can't be
+quiesced/unquiesced should use option 3 (VNF aware) to do the
+backup/replication.
+
+Requirement to OpenStack: Nova needs to expose the quiesce/unquiesce API,
+which is lacking in Nova now.
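The transactional ordering described in steps 1)-4) can be sketched as follows. All Nova/Cinder calls below are hypothetical stubs (Nova does not yet expose quiesce/unquiesce through its public API, as noted above), so the sketch only models the ordering guarantee with an in-memory event log:

```python
# Sketch of the transactional snapshot ordering from steps 1)-4): quiesce
# the VMs in the desired order, snapshot every volume, then unquiesce in
# reverse order.  The nova_*/cinder_* functions are hypothetical stand-ins,
# not real client calls.
events = []

def nova_quiesce(vm):
    events.append(("quiesce", vm))

def cinder_snapshot(volume):
    events.append(("snapshot", volume))

def nova_unquiesce(vm):
    events.append(("unquiesce", vm))

def transactional_snapshot(vnf):
    """vnf: ordered list of (vm_name, [volume_names]) pairs."""
    for vm, _ in vnf:                 # step 2: quiesce in desired order
        nova_quiesce(vm)
    for _, volumes in vnf:            # step 3: snapshot all volumes
        for vol in volumes:
            cinder_snapshot(vol)
    for vm, _ in reversed(vnf):       # step 4: unquiesce in reverse order
        nova_unquiesce(vm)

vnf = [("VM1", ["vol-1"]), ("VM2", ["vol-2"]), ("VM3", ["vol-3"])]
transactional_snapshot(vnf)
print(events[0], events[-1])  # ('quiesce', 'VM1') ('unquiesce', 'VM1')
```

The reverse unquiesce order is what makes the snapshot transactional for a VM group with a strong ordering relationship.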
+
+Example characteristics and requirements of a VNF:
+ - The VNF requires full data consistency during the backup/restore process -
+   the entire data set should be replicated.
+ - The VNF's data changes infrequently, which results in a smaller number of
+   volume snapshots during a given time interval (hour, day, etc.);
+ - The VNF is not highly dynamic, e.g. the number of scaling (in/out)
+   operations is small.
+ - The VNF is not geo-redundant, is not aware of available cloud replication
+   mechanisms, and has no built-in logic for replication: it doesn't
+   pre-select the minimum replication data required for restarting the VNF
+   in a different site.
+   (NOTE: A VNF that can perform such data cherry-picking should consider
+   case 3.)
+
+2. Nova Snapshot + Glance Image + Cinder Snapshot + Cinder Backup
+
+   - GR software creates a VM snapshot in Nova.
+   - Nova quiesces the VM internally
+     (NOTE: the upper level application or GR software should take care of
+     avoiding infra level outage induced VNF outage).
+   - Nova creates an image in Glance.
+   - Nova creates a snapshot of the VM, including volumes.
+   - If the VM is a volume-backed VM, then a volume snapshot is created in
+     Cinder.
+   - No image is uploaded to Glance, but the snapshot is added in the
+     metadata of the image in Glance.
+   - GR software gets the snapshot information from Glance.
+   - GR software creates volumes from these snapshots.
+   - GR software creates (incremental) backups of these volumes to backup
+     storage (Swift or Ceph, etc.) in Cinder.
+   - If this site fails:
+   - GR software restores these backup volumes to Cinder in the backup site.
+   - GR software boots the VM from the bootable volume from Cinder in the
+     backup site and attaches the data volumes.
+
+Pros: 1) Automatic quiesce/unquiesce, and snapshot of the volumes of one VM.
+
+Cons: 1) Impossible to form a transactional backup of a group of VMs, for
+         example: quiesce VM1, quiesce VM2, quiesce VM3, snapshot VM1,
+         snapshot VM2, snapshot VM3, unquiesce VM3, unquiesce VM2,
+         unquiesce VM1. This is quite important for telecom applications in
+         some scenarios.
+      2) Does not leverage the Cinder consistency group.
+      3) One more service, Glance, is involved in the backup. Not only the
+         increased snapshots in Cinder, but also the corresponding temporary
+         images in Glance need to be managed.
+
+Requirement to OpenStack: None.
+
+Example: It's suitable for single VM backup/restore, for example for a small
+scale configuration database virtual machine which is running in
+active/standby mode. There are very rare use cases where only one VM of an
+application needs to be snapshotted for backup.
+
+3. Selective Replication of Persistent Data
+
+   - GR software creates a datastore (Block/Cinder, Object/Swift, App Custom
+     storage) with replication enabled at the relevant scope, for use to
+     selectively backup/replicate the desired data to the GR backup site:
+
+     - Cinder: various work is underway to provide async replication of
+       Cinder volumes for disaster recovery use, including this presentation
+       from Vancouver:
+       http://www.slideshare.net/SeanCohen/dude-wheres-my-volume-open-stack-summit-vancouver-2015
+     - Swift: a range of options, from using native Swift replicas (at the
+       expense of tighter coupling) to replication using backend plugins or
+       volume replication.
+     - Custom: a wide range of open source technologies including Cassandra
+       and Ceph, with fully application level solutions also possible.
+
+   - GR software gets the reference of the storage in the remote site
+     storage.
+   - If the primary site fails:
+
+     - GR software managing recovery in the backup site gets references to
+       the relevant storage and passes them to new software instances.
+     - Software attaches (or has attached) the replicated storage, in the
+       case of volumes promoting them to writable.
+
+Pros: 1) Replication will be done at the storage level automatically, no need
+         to create backups regularly, for example, daily.
+      2) Application selection of a limited amount of data to replicate
+         reduces the risk of replicating failed state and generates less
+         overhead.
+
+      3) The type of replication and the model (active/backup, active/active,
+         etc.) can be tailored to application needs.
+
+Cons: 1) Applications need to be designed with support in mind, including both
+         selection of data to be replicated and consideration of consistency.
+      2) "Standard" support in OpenStack for Disaster Recovery is currently
+         fairly limited, though there is active work in this area.
+
+Requirement to OpenStack: save the real reference to the volume
+admin_metadata after it has been managed by the driver:
+https://review.openstack.org/#/c/182150/
+
+Prototype
+---------
+    None.
+
+Proposed solution
+-----------------
+
+  From a requirements perspective we could recommend all three options for
+  different scenarios; it is an operator choice.
+  Options 1 & 2 seem to be more about replicating/backing up any VNF, whereas
+  option 3 is about providing a service to a replication-aware application.
+  It should be noted that the HA requirement is not a priority here; the HA
+  for VNF project will handle the specific HA requirements. It should also be
+  noted that it's up to the specific application how to do HA (out of scope
+  here). For the 3rd option, the app should know which volume has replication
+  capability, write the relevant data to this volume, and guarantee
+  consistency by itself. Option 3 is preferable in an HA scenario.
+
+
+Gaps
+====
+  1) Nova to expose the quiesce / unquiesce API:
+  https://blueprints.launchpad.net/nova/+spec/expose-quiesce-unquiesce-api
+  2) Get the real reference to volume admin_metadata in Cinder:
+  https://review.openstack.org/#/c/182150/
+
+
+**NAME-THE-MODULE issues:**
+
+* Nova
+
+Affected By
+-----------
+  OPNFV multisite cloud.
+
+References
+==========
+
+  Cinder snapshot (no material/BP about the snapshot itself available from
+the web)
+  http://docs.openstack.org/cli-reference/content/cinderclient_commands.html
+
+
+  Cinder volume backup
+  https://blueprints.launchpad.net/cinder/+spec/volume-backups
+
+  Cinder incremental backup
+  https://blueprints.launchpad.net/cinder/+spec/incremental-backup
+
+  Cinder volume replication
+  https://blueprints.launchpad.net/cinder/+spec/volume-replication
+
+  Create VM snapshot with a volume-backed VM (no better material found to
+  explain the volume-backed VM snapshot; only the code tells)
+  https://bugs.launchpad.net/nova/+bug/1322195
+
+  Cinder consistency group
+  https://github.com/openstack/cinder-specs/blob/master/specs/juno/consistency-groups.rst
diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
new file mode 100644
index 0000000..56b0c59
--- /dev/null
+++ b/docs/userguide/index.rst
+.. OPNFV Release Engineering documentation, created by
+   sphinx-quickstart on Tue Jun 9 19:12:31 2015.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+.. image:: ../etc/opnfv-logo.png
+  :height: 40
+  :width: 200
+  :alt: OPNFV
+  :align: left
+
+table of contents
+=======================================
+
+Contents:
+
+.. toctree::
+   :numbered:
+   :maxdepth: 4
+
+   multisite-admin-user-guide.rst
+
+Indices and tables
+==================
+
+* :ref:`search`
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/userguide/multisite-admin-user-guide.rst b/docs/userguide/multisite-admin-user-guide.rst
new file mode 100644
index 0000000..ed91446
--- /dev/null
+++ b/docs/userguide/multisite-admin-user-guide.rst
+.. two dots create a comment. please leave this logo at the top of each of your rst files.
+.. image:: ../etc/opnfv-logo.png
+  :height: 40
+  :width: 200
+  :alt: OPNFV
+  :align: left
+.. these two pipes are to separate the logo from the first title
+|
+|
+==========================
+Multisite admin user guide
+==========================
+
+Multisite identity service management
+=====================================
+
+Goal
+----
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Token Format
+------------
+
+There are three token formats supported by OpenStack KeyStone:
+
+  **UUID**
+  **PKI/PKIZ**
+  **FERNET**
+
+It's very important to understand these token formats before we begin
+multisite identity service management. Please refer to the official OpenStack
+documentation on identity management:
+http://docs.openstack.org/admin-guide-cloud/identity_management.html
+
+Key consideration in multisite scenario
+---------------------------------------
+
+A user is provided with a single authentication URL to the Identity (Keystone)
+service. Using that URL, the user authenticates with Keystone by
+requesting a token, typically using username/password credentials. The
+Keystone server validates the credentials, possibly with an external LDAP/AD
+server, and returns a token to the user. The user sends a request, including
+the token, to a service in a selected region. Now the service in the region,
+say Nova, needs to validate the token. The service uses its configured
+keystone endpoint and service credentials to request token validation from
+Keystone. After the token is validated by KeyStone, the user is authorized to
+use the service.
+
+The key considerations for token validation in the multisite scenario are:
+
+* Site level failure: impact on authN and authZ should be as minimal as
+  possible
+* Scalable: as more and more sites are added, no bottleneck in token
+  validation
+* Amount of inter region traffic: should be kept as little as possible
+
+Hence, Keystone token validation should preferably be done in the same
+region as the service itself.
+
+The challenge in distributing the KeyStone service into each region is the
+KeyStone backend. Different token formats have different data persisted in
+the backend.
+
+* UUID: UUID tokens have a fixed size. Tokens are persistently stored and
+create a lot of database traffic; tokens are persisted to support revocation.
+UUID tokens are validated online by Keystone: every call to a service results
+in a token validation request to Keystone, so Keystone can become a
+bottleneck in a large system. Due to this, the UUID token type is not
+suitable for use in multi-region clouds, no matter whether the Keystone
+database is replicated or not.
+
+* PKI: Tokens are non-persistent, cryptography-based tokens, validated
+offline (not by the Keystone service) by Keystone middleware, which is part
+of other services such as Nova. Since PKI tokens include the endpoints of all
+services in all regions, the token size can become big. There are
+several ways to reduce the token size, such as a no-catalog policy, an
+endpoint filter to bind a project to limited endpoints, and the compressed
+PKI token (PKIZ), but the size of the token is still unpredictable, making it
+difficult to manage. If the no-catalog policy is applied, the user can access
+all regions, which is not allowed in some scenarios. A centralized
+Keystone with PKI tokens reduces inter-region backend synchronization
+traffic, but PKI tokens do produce Keystone traffic for revocation lists.
+
+* Fernet: Tokens are non-persistent, cryptography-based tokens, validated
+online by the Keystone service. Fernet tokens are more lightweight
+than PKI tokens and have a fixed size. Fernet tokens require Keystone to be
+deployed in a distributed manner, again to avoid inter-region traffic. The
+data synchronization cost for the Keystone backend is smaller due to the
+non-persisted tokens.
+
+Cryptographic tokens bring new (compared to UUID tokens) issues/use-cases
+like key rotation and certificate revocation. Key management is out of scope
+for this use case.
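To make the "non-persistent cryptographic token" property concrete, here is a deliberately simplified sketch. Real Keystone Fernet tokens are AES-encrypted and HMAC-signed using keys from a distributed key repository; this toy version uses a bare HMAC signature only, just to show that validation needs nothing but the shared key, so no token table has to be replicated between sites:

```python
# Toy model of a non-persisted cryptographic token (Fernet-style).  The key
# name and payload fields are made up for illustration.  The point: any site
# holding the shared key can validate a token locally -- no token record is
# stored in any database.
import base64
import hashlib
import hmac
import json

SHARED_KEY = b"key-synchronized-to-every-site"  # illustrative value only

def issue_token(payload):
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def validate_token(token):
    body, sig = token.rsplit(".", 1)
    good = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, good):
        raise ValueError("invalid token")
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"user": "alice", "project": "vnf-a"})
print(validate_token(token)["user"])  # alice
```

This is also why key distribution and rotation become the new management burden, as noted above.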
+
+Database deployment as the backend for KeyStone service
+-------------------------------------------------------
+
+  Database replication:
+
+  - Master/slave asynchronous: supported by the database server itself
+    (MySQL/MariaDB etc.), works over WAN, and is more scalable. But only the
+    master will provide write functionality (domain/project/role
+    provisioning).
+  - Multi-master synchronous: Galera (or alternatives such as Percona), not
+    so scalable, allows multi-master writing, and needs more parameter
+    tuning for WAN latency. It can provide the capability for a limited
+    multi-site multi-write function for a distributed KeyStone service.
+  - Symmetrical/asymmetrical: data replicated to all regions or a subset;
+    in the latter case some regions need to access Keystone in another
+    region.
+
+  Database server sharing:
+  In an OpenStack controller, normally many databases from different
+services are provided from the same database server instance. For HA reasons,
+the database server is usually synchronously replicated to a few other nodes
+(controllers) to form a cluster. Note that _all_ databases are replicated in
+this case, for example when Galera synchronous replication is used.
+
+  Only the Keystone database can be replicated to other sites. Replicating
+databases for other services will cause those services to get out of sync
+and malfunction.
+
+  Since only the Keystone database is to be synchronized or replicated to
+another region/site, it's better to deploy the Keystone database on its own
+database server, with the extra networking requirements and cluster or
+replication configuration. How installers should support this is out of
+scope.
+
+  The database server can be shared when asynchronous master/slave
+replication is used, if global transaction identifiers (GTID) are enabled.
+
+Deployment options
+------------------
+
+- Distributed KeyStone service with PKI token
+
+  Deploy the KeyStone service in two sites with database replication. If
+site-level failure impact is not considered, then the KeyStone service can
+be deployed in only one site.
+
+  The PKI token has one great advantage: token validation can be done
+locally, without sending a token validation request to the KeyStone server.
+The drawbacks of the PKI token are:
+
+    * the endpoint list size in the token. If a project is spread over only
+      a very limited number of sites (regions), then we can use the endpoint
+      filter to reduce the token size, making it workable even with a lot of
+      sites in the cloud.
+    * KeyStone middleware (which is co-located with the service, like
+      Nova-API/xxx-API) will have to send requests to the KeyStone server
+      frequently for the revocation list, in order to reject malicious API
+      requests, for example a user who has been deactivated but uses an old
+      token to access OpenStack services.
+
+  This option needs to leverage database replication to provide KeyStone in
+Active-Active mode across sites to reduce the impact of site failure. And
+since the revocation list is requested very frequently, the performance of
+the KeyStone server also needs to be taken care of.
+
+  Site-level KeyStone load balancing is required to provide site-level
+redundancy, otherwise the KeyStone middleware will not switch requests to a
+healthy KeyStone server in time.
+
+  Certificate distribution/revocation to each site / API server for token
+validation is also required.
+
+  This option can be used in scenarios with a very limited number of sites,
+especially if each project only spreads into limited sites (regions).
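The endpoint-list growth behind the first drawback can be made concrete with a rough back-of-the-envelope model (all service names, region names, and URLs below are made up for illustration): the PKI token embeds the service catalog, one endpoint entry per service per region, so the payload grows roughly linearly with the region count, and an endpoint filter that binds a project to a few regions caps that growth.

```python
# Back-of-the-envelope model (made-up names and URLs) of why PKI token size
# is hard to manage: the embedded catalog holds one endpoint entry per
# service per region, so the payload grows linearly with the region count.
import json

SERVICES = ["nova", "neutron", "cinder", "glance", "keystone"]

def catalog(region_names):
    return [
        {"region": r, "service": s,
         "publicURL": "http://%s.example.com/%s" % (r, s)}
        for r in region_names for s in SERVICES
    ]

def payload_bytes(region_names):
    """Size in bytes of the JSON-serialized catalog carried in the token."""
    return len(json.dumps(catalog(region_names)))

all_regions = ["region%02d" % i for i in range(50)]
full = payload_bytes(all_regions)
# An endpoint filter binding the project to 2 regions keeps the token small.
filtered = payload_bytes(all_regions[:2])
print(full > 10 * filtered)  # True: the payload tracks the region count
```

The same arithmetic explains why the option is only recommended when each project spreads into a limited number of regions.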
+
+- Distributed KeyStone service with Fernet token
+
+  Fernet token is a very new format, just introduced recently. The biggest
+gains of this token format are: 1) it is lightweight, the size is small
+enough to be carried in the API request, unlike the PKI token (as sites are
+added, the endpoint list grows and the token size becomes too long to carry
+in the API request); 2) no token persistence, which also means the DB does
+not change much and holds only lightweight data (just project, role, domain,
+endpoint etc.). The drawback of the Fernet token is that the token has to be
+validated by KeyStone for each API request.
+
+  This means that the KeyStone DB can work as a cluster in multisite (for
+example, using a MySQL Galera cluster). That means installing a KeyStone API
+server in each site, but sharing the same backend DB cluster. Because the DB
+cluster synchronizes data in real time across the sites, all KeyStone
+servers can see the same data.
+
+  Because each site has KeyStone installed and all data is kept the same,
+all token validation can be done locally in the same site.
+
+  The challenge for this solution is how many sites the DB cluster can
+support. The question was asked to the MySQL Galera developers; their answer
+is that there is no number/distance/network latency limitation in the code.
+But in practice, they have seen a case using a MySQL cluster across 5 data
+centers, each data center with 3 nodes.
+
+  This solution will be very good for a limited number of sites which the
+DB cluster can cover very well.
+
+- Distributed KeyStone service with Fernet token + Async replication (
+  star-mode).
+
+  One master KeyStone cluster with Fernet token in two sites (for site-level
+high availability purposes); other sites will be installed with at least 2
+slave nodes, where each node is configured with DB async replication from a
+master cluster member, with one slave's master node in site 1 and another
+slave's master node in site 2.
+
+  Only the master cluster nodes are allowed to write; the other slave nodes
+wait for replication from a master cluster member (with very little delay).
+
+  Pros.
+    Deploying a database cluster in the master sites provides more master
+nodes, so that more slaves can be replicated to asynchronously in parallel.
+Using two sites for the master cluster provides higher reliability (site
+level) for write requests, while at the same time reducing the maintenance
+challenge by limiting the cluster from spreading over too many sites.
+    Multiple slaves are used in the other sites because a slave has no
+knowledge of other slaves, so it is easier to manage multiple slaves in one
+site than a cluster, and multiple slaves work independently but provide
+multi-instance redundancy (like a cluster, but independent).
+
+  Cons. Need to be aware of the challenge of key distribution and rotation
+for the Fernet token.
+
+
+Multisite VNF Geo site disaster recovery
+========================================
+
+Goal
+----
+
+A VNF (telecom application) should be able to be restored in another site
+after a catastrophic failure.
+
+Key consideration in multisite scenario
+---------------------------------------
+
+Geo site disaster recovery deals with more catastrophic failures
+(flood, earthquake, propagating software fault), where loss of calls, or
+even temporary loss of service, is acceptable. It also seems more common
+to accept/expect manual/administrator intervention to drive the process, not
+least because you don't want to trigger the transfer by mistake.
+
+In terms of coordination/replication or backup/restore between geographic
+sites, discussion often (but not always) seems to focus on limited application
+level data/config replication, as opposed to backup/restore of the cloud
+infrastructure between different sites.
+
+And finally, the lack of a requirement to do fast media transfer (without
+resignalling) generally removes the need for special networking behavior, with
+slower DNS-style redirection being acceptable.
+
+The main concern here is the cloud infrastructure level capability to
+support VNF geo site disaster recovery.
+
+Option 1: Consistent application backup
+---------------------------------------
+
+The disaster recovery process will work like this:
+
+1) DR (Geo site disaster recovery) software gets the volumes for each VM
+   in the VNF from Nova.
+2) DR software calls the Nova quiesce API to guarantee quiescing the VMs in
+   the desired order.
+3) DR software takes snapshots of these volumes in Cinder (NOTE: because
+   storage often provides fast snapshots, the duration between quiesce and
+   unquiesce is a short interval).
+4) DR software calls the Nova unquiesce API to unquiesce the VMs of the VNF
+   in reverse order.
+5) DR software creates volumes from the snapshots just taken in Cinder.
+6) DR software creates (incremental) backups of these volumes to remote
+   backup storage (Swift or Ceph, etc.) in Cinder.
+7) If this site fails:
+
+   7.1) DR software restores these backup volumes in the remote Cinder in
+        the backup site.
+   7.2) DR software boots VMs from the bootable volumes from the remote
+        Cinder in the backup site and attaches the corresponding data
+        volumes.
+
+Note: It's up to the DR policy and VNF character how to use the API. Some
+VNFs may allow the standby of the VNF or a member of the cluster to do the
+quiesce/unquiesce, to avoid interfering with the service provided by the
+VNF. Some other VNFs may afford a short unavailability for DR purposes.
+
+This option provides application level consistency disaster recovery.
+This feature is WIP in the OpenStack Mitaka release, and will be available
+in the next OPNFV release.
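The failover path in steps 7.1) and 7.2) can be sketched as below. Every client call is a hypothetical stub standing in for the corresponding Cinder/Nova operation in the backup site, so only the restore ordering (bootable volume first, then data volumes) is modeled:

```python
# Stub model of failover steps 7.1)-7.2): restore each backed-up volume in
# the backup site's Cinder, boot the VM from its bootable volume, then
# attach its data volumes.  cinder_restore/nova_* are hypothetical
# stand-ins, not real client calls.
actions = []

def cinder_restore(backup_id):
    volume = "vol-from-" + backup_id
    actions.append(("restore", backup_id))
    return volume

def nova_boot_from_volume(vm_name, boot_volume):
    actions.append(("boot", vm_name, boot_volume))
    return "server-" + vm_name

def nova_attach_volume(server, volume):
    actions.append(("attach", server, volume))

def failover(vnf_backups):
    """vnf_backups: {vm_name: {"boot": backup_id, "data": [backup_ids]}}"""
    for vm_name, backups in vnf_backups.items():
        boot_volume = cinder_restore(backups["boot"])
        server = nova_boot_from_volume(vm_name, boot_volume)
        for backup_id in backups["data"]:
            nova_attach_volume(server, cinder_restore(backup_id))

failover({"VM1": {"boot": "bk-boot-1", "data": ["bk-data-1", "bk-data-2"]}})
print(actions[0])  # ('restore', 'bk-boot-1')
```

Booting from the restored bootable volume before attaching the data volumes mirrors the order given in the steps above.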
+
+Option 2: Virtual machine snapshot
+----------------------------------
+
+1) DR software creates a VM snapshot in Nova.
+2) Nova quiesces the VM internally
+   (NOTE: the upper level application or DR software should take care of
+   avoiding infra level outage induced VNF outage).
+3) Nova creates an image in Glance.
+4) Nova creates a snapshot of the VM, including volumes.
+5) If the VM is a volume-backed VM, then a volume snapshot is created in
+   Cinder.
+6) No image is uploaded to Glance, but the snapshot is added in the metadata
+   of the image in Glance.
+7) DR software gets the snapshot information from Glance.
+8) DR software creates volumes from these snapshots.
+9) DR software creates (incremental) backups of these volumes to backup
+   storage (Swift or Ceph, etc.) in Cinder.
+10) If this site fails:
+
+    10.1) DR software restores these backup volumes to Cinder in the backup
+          site.
+    10.2) DR software boots VMs from the bootable volumes from Cinder in the
+          backup site and attaches the data volumes.
+
+This option only provides single-VM level consistency disaster recovery.
+
+This feature is already available in the current OPNFV release.
+
+Option 3: Consistent volume replication
+---------------------------------------
+
+1) DR software creates a datastore (Block/Cinder, Object/Swift, App Custom
+   storage) with replication enabled at the relevant scope, for use to
+   selectively backup/replicate the desired data to the DR backup site.
+2) DR software gets the reference of the storage in the remote site storage.
+3) If the primary site fails:
+
+   3.1) DR software managing recovery in the backup site gets references to
+        the relevant storage and passes them to new software instances.
+   3.2) Software attaches (or has attached) the replicated storage, in the
+        case of volumes promoting them to writable.
+
+Pros: 1) Replication will be done at the storage level automatically, no
+         need to create backups regularly, for example, daily.
+
+      2) Application selection of a limited amount of data to replicate
+         reduces the risk of replicating failed state and generates less
+         overhead.
+      3) The type of replication and the model (active/backup, active/active,
+         etc.) can be tailored to application needs.
+
+Cons: 1) Applications need to be designed with support in mind, including
+         both selection of data to be replicated and consideration of
+         consistency.
+      2) "Standard" support in OpenStack for Disaster Recovery is currently
+         fairly limited, though there is active work in this area.
+
+This feature is in discussion for the OpenStack Mitaka release, and hopefully
+will be available in the next OPNFV release.
+
+
+VNF high availability across VIM
+================================
+
+Goal
+----
+
+A VNF (telecom application) should be able to realize a high availability
+deployment across OpenStack instances.
+
+Key consideration in multisite scenario
+---------------------------------------
+
+Most telecom applications have already been designed as
+Active-Standby/Active-Active/N-Way to achieve high availability
+(99.999%, which corresponds to 5.26 minutes of unplanned downtime in a
+year); typically state replication or heartbeat between
+Active-Standby/Active-Active/N-Way members (directly or via replicated
+database services, or via privately designed message formats) is required.
+
+We have to accept the currently limited availability (99.99%) of a
+given OpenStack instance, and intend to provide the availability of the
+telecom application by spreading its functions across multiple OpenStack
+instances. To help with this, many people appear willing to provide multiple
+"independent" OpenStack instances in a single geographic site, with special
+networking (L2/L3) between clouds in that physical site.
+
+The telecom application often has different networking planes for different
+purposes:
+
+1) external network plane: used for communication with other telecom
+   applications.
+
+2) components inter-communication plane: one VNF often consists of several
+   components; this plane is designed for the components to communicate with
+   each other.
+
+3) backup plane: this plane is used for the heartbeat or state replication
+   between the components' active/standby or active/active or N-way cluster.
+
+4) management plane: this plane is mainly for management purposes, like
+   configuration.
+
+Generally these planes are separated from each other. And for a legacy
+telecom application, each internal plane will have its fixed or flexible IP
+addressing plan. There are some interesting/hard requirements on the
+networking (L2/L3) between OpenStack instances, at least for the backup
+plane across different OpenStack instances:
+
+1) Overlay L2 networking or shared L2 provider networks as the backup plane
+   for heartbeat or state replication. Overlay L2 networks are preferred,
+   the reasons being:
+
+   a) Legacy compatibility: some telecom apps have a built-in internal L2
+      network; to make it easy to move these apps to virtualized telecom
+      applications, it would be better to provide an L2 network.
+
+   b) IP overlapping: multiple telecom applications may have overlapping IP
+      addresses for cross-OpenStack-instance networking. Therefore, overlay
+      L2 networking across Neutron instances is required in OpenStack.
+
+2) L3 networking across OpenStack instances for heartbeat or state
+   replication. For L3 networking, we can leverage the floating IP provided
+   by current Neutron, so there is no new feature requirement for OpenStack.
+
+Overlay L2 networking across OpenStack instances is under discussion with
+the Neutron community.
+
+
+Revision: _sha1_
+
+Build date: |today|