-rw-r--r--  docs/configguide/index.rst                                                                              30
-rw-r--r--  docs/configguide/multisite-configuration-guide.rst                                                     120
-rw-r--r--  docs/how-to-use-docs/documentation-example.rst                                                           9
-rw-r--r--  docs/requirements/VNF_high_availability_across_VIM.rst (renamed from VNF_high_availability_across_VIM.rst)                 0
-rw-r--r--  docs/requirements/multisite-identity-service-management.rst (renamed from multisite-identity-service-management.rst)       6
-rw-r--r--  docs/requirements/multisite-vnf-gr-requirement.rst (renamed from multisite-vnf-gr-requirement.rst)                       152
-rw-r--r--  docs/userguide/index.rst                                                                                 30
-rw-r--r--  docs/userguide/multisite-admin-user-guide.rst                                                           405
-rw-r--r--  setup.cfg                                                                                                24
-rwxr-xr-x  setup.py                                                                                                 22
-rw-r--r--  tox.ini                                                                                                  25
11 files changed, 743 insertions, 80 deletions
diff --git a/docs/configguide/index.rst b/docs/configguide/index.rst
new file mode 100644
index 0000000..49b30f1
--- /dev/null
+++ b/docs/configguide/index.rst
@@ -0,0 +1,30 @@
+.. OPNFV Release Engineering documentation, created by
+ sphinx-quickstart on Tue Jun 9 19:12:31 2015.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+
+table of contents
+=======================================
+
+Contents:
+
+.. toctree::
+ :numbered:
+ :maxdepth: 4
+
+ multisite-configuration-guide.rst
+
+Indices and tables
+==================
+
+* :ref:`search`
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/configguide/multisite-configuration-guide.rst b/docs/configguide/multisite-configuration-guide.rst
new file mode 100644
index 0000000..4a20e83
--- /dev/null
+++ b/docs/configguide/multisite-configuration-guide.rst
@@ -0,0 +1,120 @@
+.. two dots create a comment. please leave this logo at the top of each of your rst files.
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+.. these two pipes are to separate the logo from the first title
+|
+|
+=============================
+Multisite configuration guide
+=============================
+
+Multisite identity service management
+=====================================
+
+Goal
+----
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Before you read
+---------------
+
+This chapter does not intend to cover all of the configuration needed for
+KeyStone and the other OpenStack services to work together with KeyStone.
+
+This chapter focuses only on the configuration aspects that should be taken
+into account in a multi-site scenario.
+
+Please read the configuration documentation related to identity management
+of OpenStack for all configuration items.
+
+http://docs.openstack.org/liberty/config-reference/content/ch_configuring-openstack-identity.html
+
+How to configure the database cluster for synchronous or asynchronous
+replication in a multi-site scenario is out of the scope of this document. The
+only reminder is that for the synchronization or replication, only the
+Keystone database is required. If you are using MySQL, you can configure it
+like this:
+
+In the master:
+
+ .. code-block:: bash
+
+ binlog-do-db=keystone
+
+In the slave:
+
+ .. code-block:: bash
+
+ replicate-do-db=keystone
+
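+A slightly fuller sketch of such a master/slave setup; the server-id values
+and the binary log name below are illustrative placeholders, not mandated
+values:
+
+.. code-block:: bash
+
+    # master my.cnf (sketch): enable binary logging, restricted to keystone
+    [mysqld]
+    server-id = 1
+    log-bin = mysql-bin
+    binlog-do-db = keystone
+
+    # slave my.cnf (sketch): replicate only the keystone database
+    [mysqld]
+    server-id = 2
+    replicate-do-db = keystone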
+
+Deployment options
+------------------
+
+For a detailed description of each deployment option, please refer to the
+admin user guide.
+
+- Distributed KeyStone service with PKI token
+
+  In the KeyStone configuration file, the PKI token format should be
+  configured:
+
+ .. code-block:: bash
+
+ provider = pki
+
+ or
+
+ .. code-block:: bash
+
+ provider = pkiz
+
+  In the [keystone_authtoken] section of each OpenStack service configuration
+  file in each site, configure the identity_uri and auth_uri to the address
+  of the KeyStone service
+
+ .. code-block:: bash
+
+ identity_uri = https://keystone.your.com:35357/
+ auth_uri = http://keystone.your.com:5000/v2.0
+
+  It's better to use a domain name for the KeyStone service rather than an IP
+  address directly, especially if you deployed the KeyStone service in at
+  least two sites for site level high availability.
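+
+  For PKI/PKIZ tokens, signing certificates and keys must also exist on the
+  KeyStone server so that tokens can be signed, and the CA/signing
+  certificates must be reachable by the services that validate tokens
+  offline. A minimal sketch, assuming the pki_setup helper of this release
+  and its default certificate paths (your deployment may differ):
+
+  .. code-block:: bash
+
+     # generate the PKI signing material on the KeyStone server
+     keystone-manage pki_setup --keystone-user keystone --keystone-group keystone
+
+     # the resulting CA and signing certificates live by default under
+     # /etc/keystone/ssl/certs/ and are fetched by keystonemiddleware in the
+     # other sites when it validates PKI tokens offline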
+
+- Distributed KeyStone service with Fernet token
+- Distributed KeyStone service with Fernet token + Async replication (
+ star-mode).
+
+  In these two deployment options, the token validation is planned to be done
+  in the local site.
+
+  In the KeyStone configuration file, the Fernet token format should be
+  configured:
+
+ .. code-block:: bash
+
+ provider = fernet
+
+  In the [keystone_authtoken] section of each OpenStack service configuration
+  file in each site, configure the identity_uri and auth_uri to the address
+  of the local KeyStone service
+
+ .. code-block:: bash
+
+ identity_uri = https://local-keystone.your.com:35357/
+ auth_uri = http://local-keystone.your.com:5000/v2.0
+
+  and especially, configure the region_name to your local region name. For
+  example, if you are configuring services in RegionOne, and there is a local
+  KeyStone service in RegionOne, then
+
+ .. code-block:: bash
+
+ region_name = RegionOne
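+
+  Fernet tokens also require a key repository on every KeyStone node. A
+  minimal sketch, assuming the default repository location and the keystone
+  system user/group (adjust to your deployment):
+
+  .. code-block:: bash
+
+     # keystone.conf: location of the Fernet key repository
+     [fernet_tokens]
+     key_repository = /etc/keystone/fernet-keys/
+
+     # initialize the key repository before starting KeyStone
+     keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone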
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/how-to-use-docs/documentation-example.rst b/docs/how-to-use-docs/documentation-example.rst
index afcf758..24b4c8a 100644
--- a/docs/how-to-use-docs/documentation-example.rst
+++ b/docs/how-to-use-docs/documentation-example.rst
@@ -1,5 +1,5 @@
.. two dots create a comment. please leave this logo at the top of each of your rst files.
-.. image:: ../etc/opnfv-logo.png
+.. image:: ../etc/opnfv-logo.png
:height: 40
:width: 200
:alt: OPNFV
@@ -21,8 +21,9 @@ this is the directory structure of the docs/ directory that can be found in the
./how-to-use-docs/documentation-example.rst
./how-to-use-docs/index.rst
-To create your own documentation, Create any number of directories (depending on your need) and place in each of them an index.rst.
-This index file must refence your other rst files.
+To create your own documentation, create any number of directories (depending
+on your need) and place in each of them an index.rst. This index file must
+reference your other rst files.
* Here is an example index.rst
@@ -59,7 +60,7 @@ For verify jobs a link to the documentation will show up as a comment in gerrit
* Merge jobs
-Once you are happy with the look of your documentation you can submit the patchset the merge job will
+Once you are happy with the look of your documentation you can submit the patchset; the merge job will
copy the output of each documentation directory to http://artifacts.opnfv.org/$project/docs/$name_of_your_folder/index.html
Here are some quick examples of how to use rst markup
diff --git a/VNF_high_availability_across_VIM.rst b/docs/requirements/VNF_high_availability_across_VIM.rst
index 1a7d41b..1a7d41b 100644
--- a/VNF_high_availability_across_VIM.rst
+++ b/docs/requirements/VNF_high_availability_across_VIM.rst
diff --git a/multisite-identity-service-management.rst b/docs/requirements/multisite-identity-service-management.rst
index f46c4b4..b411c28 100644
--- a/multisite-identity-service-management.rst
+++ b/docs/requirements/multisite-identity-service-management.rst
@@ -45,8 +45,8 @@ be a topic out of our scope.
- Role assignment: In case of federation(and perhaps other solutions) it is not
feasible/scalable to do role assignment to users. Role assignment to groups
-is better. Role assignment will be done usually based on group. KeyStone
-supports this.
+  is better. Role assignment will usually be done based on groups. KeyStone
+  supports this.
- Amount of inter region traffic: should be kept as little as possible,
consider CERNs Ceilometer issue as described in
@@ -129,7 +129,7 @@ Database deployment:
Database replication:
-Master/slave asynchronous: supported by the database server itself
(mysql/mariadb etc), works over WAN, it's more scalable
- -Multi master synchronous: Galera(others like percona), not so scalable,
+ -Multi master synchronous: Galera(others like percona), not so scalable,
for multi-master writing, and need more parameter tunning for WAN latency.
-Symmetrical/asymmetrical: data replicated to all regions or a subset,
in the latter case it means some regions needs to access Keystone in another
diff --git a/multisite-vnf-gr-requirement.rst b/docs/requirements/multisite-vnf-gr-requirement.rst
index a3755c2..7e67cd0 100644
--- a/multisite-vnf-gr-requirement.rst
+++ b/docs/requirements/multisite-vnf-gr-requirement.rst
@@ -2,9 +2,9 @@ This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode
-=======================================
- Multisite VNF Geo site redundancy
-=======================================
+=========================================
+ Multisite VNF Geo site disaster recovery
+=========================================
Glossary
========
@@ -36,13 +36,13 @@ Description
GR is to deal with more catastrophic failures (flood, earthquake, propagating
software fault), and that loss of calls, or even temporary loss of service,
is acceptable. It is also seems more common to accept/expect manual /
-administrator intervene into drive the process, not least because you don’t want
-to trigger the transfer by mistake.
+administrator intervention to drive the process, not least because you don't
+want to trigger the transfer by mistake.
-In terms of coordination/replication or backup/restore between geographic sites,
-discussion often (but not always) seems to focus on limited application level
-data/config replication, as opposed to replication backup/restore between of
-cloud infrastructure between different sites.
+In terms of coordination/replication or backup/restore between geographic
+sites, discussion often (but not always) seems to focus on limited application
+level data/config replication, as opposed to replication or backup/restore of
+cloud infrastructure between different sites.
And finally, the lack of a requirement to do fast media transfer (without
resignalling) generally removes the need for special networking behavior, with
@@ -52,45 +52,48 @@ This use case is more concerns about cloud infrastructure level capability to
support VNF geo site redundancy
Requirement and candidate solutions analysis
-===========================
+============================================
For VNF to be restored from the backup site for catastrophic failures,
the VNF's bootable volume and data volumes must be restorable.
-There are three ways of restorable boot and data volumes. Choosing the right one
-largely depends on the underlying characteristics and requirements of a VNF.
+There are three ways to make boot and data volumes restorable. Choosing the
+right one largely depends on the underlying characteristics and requirements
+of a VNF.
1. Nova Quiesce + Cinder Consistency volume snapshot+ Cinder backup
- - GR software get the attached volumes for the VMs in a VNF from Nova
- - GR software add these attached volumes to the consistency group in Cinder
- (NOTE: make sure that the block storage driver supports CG technology)
- - GR software call Nova API Quiesce to freeze VM and flush buffer
- - GR software make cgsnapshots of these volumes in Cinder
- - GR software call Nova API Unquiece to unfreeze VM. (NOTE: Because storage
- often provide fast snapshot, so the duration between quiece and unquiece is
- a short interval)
- - GR software create volumes from the cgsnapshots in Cinder
- - GR software create backup (incremental) for these volumes to backup storage
- ( swift or ceph, or.. ) in Cinder if this site failed,
- - GR software restore these backup volumes to Cinder in the backup site.
- - GR software boot vm from bootable volume rom Cinder in the backup site and
- attach the data volumes.
-
-Pros: 1) atomic quiesce / unquiesce api from Nova, make transactional snapshot
- of a group of VMs is possible, for example, quiesce VM1, quiesce VM2,
- quiesce VM3, snapshot VM1's volumes, snapshot VM2's volumes, snapshot
- VM3's volumes, unquiesce VM3, unquiesce VM2, unquiesce VM1. For some
- telecom application, the order is very important for a group of VMs
- with strong relationship.
- 2) leverage the Cinder consistency group functionality.
-
-Cons: Need Nova to expose the quiesce / unquiesce, fortunately it's alreay there
-in Nova-compute, just to add API layer to expose the functionality. NOTE: It's up
-to the GR policy and VNF character. Some VNF may afford short unavailable for GR
-purpose, and some other may use the standby of the VNF or member of the cluster to
-do disaster recovery replication to not interfere the service provided by the VNF.
-For these VNFs which can't be quieced/unquiece should use the option3 (VNF aware)
-to do the backup/replication.
+   1) GR (Geo site disaster recovery) software gets the volumes for each VM
+      in the VNF from Nova
+   2) GR software calls the Nova quiesce API to guarantee quiescing the VMs
+      in the desired order
+   3) GR software takes snapshots of these volumes in Cinder (NOTE: because
+      storage often provides fast snapshots, the duration between quiesce
+      and unquiesce is a short interval)
+   4) GR software calls the Nova unquiesce API to unquiesce the VMs of the
+      VNF in reverse order
+   5) GR software creates volumes from the snapshots just taken in Cinder
+   6) GR software creates backups (incremental) for these volumes to remote
+      backup storage (Swift or Ceph, etc.) in Cinder
+   7) if this site failed,
+      7.1) GR software restores these backup volumes in the remote Cinder in
+           the backup site.
+      7.2) GR software boots VMs from the bootable volumes from the remote
+           Cinder in the backup site and attaches the corresponding data
+           volumes.
+
+Pros: The quiesce / unquiesce APIs from Nova make a transactional snapshot of
+a group of VMs possible, for example: quiesce VM1, quiesce VM2, quiesce VM3,
+snapshot VM1's volumes, snapshot VM2's volumes, snapshot VM3's volumes,
+unquiesce VM3, unquiesce VM2, unquiesce VM1. For some telecom applications,
+the order is very important for a group of VMs with a strong relationship.
+
+Cons: Nova needs to expose the quiesce / unquiesce functionality; fortunately
+it's already there in Nova-compute, and only an API layer needs to be added to
+expose it. NOTE: It's up to the DR policy and the VNF's characteristics. Some
+VNFs may afford a short unavailability for DR purposes, while others may use
+the standby of the VNF or a member of the cluster to do the disaster recovery
+replication so as not to interfere with the service provided by the VNF. VNFs
+which can't be quiesced/unquiesced should use option 3 (VNF aware) to do the
+backup/replication.
Requirement to OpenStack: Nova needs to expose quiesce / unquiesce api,
which is lack in Nova now.
@@ -100,37 +103,39 @@ Example characteristics and requirements of a VNF:
entire data should be replicated.
- VNF's data changes infrequently, which results in less number of volume
snapshots during a given time interval (hour, day, etc.);
- - VNF is not highly dynamic, e.g. the number of scaling (in/out) operations is
- small.
+ - VNF is not highly dynamic, e.g. the number of scaling (in/out) operations
+ is small.
- VNF is not geo-redundant, does not aware of available cloud replication
mechanisms, has no built-in logic for replication: doesn't pre-select the
- minimum replication data required for restarting the VNF in a different site.
- (NOTE: The VNF who can perform such data cherry picking should consider case 3)
+ minimum replication data required for restarting the VNF in a different
+ site.
+   (NOTE: A VNF that can perform such data cherry-picking should consider
+   case 3)
2. Nova Snapshot + Glance Image + Cinder Snapshot + Cinder Backup
- GR software create VM snapshot in Nova
- Nova quiece the VM internally
- (NOTE: The upper level application or GR software should take care of avoiding
- infra level outage induced VNF outage)
+   (NOTE: The upper level application or GR software should take care of
+   avoiding a VNF outage induced by an infra level outage)
- Nova create image in Glance
- Nova create a snapshot of the VM, including volumes
- If the VM is volume backed VM, then create volume snapshot in Cinder
- - No image uploaded to glance, but add the snapshot in the meta data of the image
- in Glance
+ - No image is uploaded to Glance, but the snapshot is added to the metadata
+   of the image in Glance
- GR software to get the snapshot information from the Glance
- GR software create volumes from these snapshots
- - GR software create backup (incremental) for these volumes to backup storage
- ( swift or ceph, or.. ) in Cinder if this site failed,
+ - GR software creates backups (incremental) for these volumes to backup
+   storage (Swift or Ceph, etc.) in Cinder; if this site failed,
- GR software restore these backup volumes to Cinder in the backup site.
- - GR software boot vm from bootable volume from Cinder in the backup site and attach
- the data volumes.
+ - GR software boots the VM from the bootable volume from Cinder in the
+   backup site and attaches the data volumes.
Pros: 1) Automatically quiesce/unquiesce, and snapshot of volumes of one VM.
Cons: 1) Impossible to form a transactional group of VMs backup. for example,
quiesce VM1, quiesce VM2, quiesce VM3, snapshot VM1, snapshot VM2,
- snapshot VM3, unquiesce VM3, unquiesce VM2, unquiesce VM1. This is quite
- important in telecom application in some scenario
+      snapshot VM3, unquiesce VM3, unquiesce VM2, unquiesce VM1. This is
+      quite important for telecom applications in some scenarios.
2) not leverage the Cinder consistency group.
3) One more service Glance involved in the backup. Not only to manage the
increased snapshot in Cinder, but also need to manage the regarding
@@ -144,25 +149,26 @@ model. There is very rare use case for application that only one VM need to be
taken snapshot for back up.
3. Selective Replication of Persistent Data
- - GR software creates datastore (Block/Cinder, Object/Swift, App Custom storage)
- with replication enabled at the relevant scope, for use to selectively
- backup/replicate desire data to GR backup site
+ - GR software creates a datastore (Block/Cinder, Object/Swift, App Custom
+   storage) with replication enabled at the relevant scope, for use to
+   selectively backup/replicate the desired data to the GR backup site
- Cinder : Various work underway to provide async replication of cinder
volumes for disaster recovery use, including this presentation from
Vancouver http://www.slideshare.net/SeanCohen/dude-wheres-my-volume-open-stack-summit-vancouver-2015
- Swift : Range of options of using native Swift replicas (at expense of
- tighter coupling) to replication using backend plugins or volume replication
+ tighter coupling) to replication using backend plugins or volume
+ replication
- Custom : A wide range of OpenSource technologies including Cassandra
and Ceph, with fully application level solutions also possible
- GR software get the reference of storage in the remote site storage
- If primary site failed,
- - GR software managing recovery in backup site gets references to relevant
- storage and passes to new software instances
+   - GR software managing recovery in the backup site gets references to
+     relevant storage and passes them to new software instances
- Software attaches (or has attached) replicated storage, in the case of
volumes promoting to writable.
-Pros: 1) Replication will be done in the storage level automatically, no need to
- create backup regularly, for example, daily.
+Pros: 1) Replication will be done at the storage level automatically; no need
+      to create backups regularly, for example, daily.
2) Application selection of limited amount of data to replicate reduces
risk of replicating failed state and generates less overhear.
3) Type of replication and model (active/backup, active/active, etc) can
@@ -170,11 +176,11 @@ Pros: 1) Replication will be done in the storage level automatically, no need t
Cons: 1) Applications need to be designed with support in mind, including both
selection of data to be replicated and consideration of consistency
- 2) "Standard" support in Openstack for Disaster Recovery currently fairly
- limited, though active work in this area.
+      2) "Standard" support in OpenStack for Disaster Recovery is currently
+      fairly limited, though there is active work in this area.
-Requirement to OpenStack: save the real ref to volume admin_metadata after it has
-been managed by the driver https://review.openstack.org/#/c/182150/.
+Requirement to OpenStack: save the real ref to volume admin_metadata after it
+has been managed by the driver https://review.openstack.org/#/c/182150/.
Prototype
-----------
@@ -186,10 +192,10 @@ Proposed solution
requirements perspective we could recommend all three options for different
sceanrio, that it is an operator choice.
Options 1 & 2 seem to be more about replicating/backing up any VNF, whereas
- option 3 is about proving a service to a replication aware application. It should
- be noted that HA requirement is not a priority here, HA for VNF project will handle
- the specific HA requirement. It should also be noted that it's up to specific
- application how to do HA (out of scope here).
+  option 3 is about providing a service to a replication aware application. It
+ should be noted that HA requirement is not a priority here, HA for VNF
+ project will handle the specific HA requirement. It should also be noted
+ that it's up to specific application how to do HA (out of scope here).
For the 3rd option, the app should know which volume has replication
capability, and write regarding data to this volume, and guarantee
consistency by the app itself. Option 3 is preferrable in HA scenario.
@@ -199,7 +205,7 @@ Gaps
====
1) Nova to expose quiesce / unquiesce API:
https://blueprints.launchpad.net/nova/+spec/expose-quiesce-unquiesce-api
- 2) save the real ref to volume admin_metadata in Cinder:
+ 2) Get the real ref to volume admin_metadata in Cinder:
https://review.openstack.org/#/c/182150/
diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
new file mode 100644
index 0000000..56b0c59
--- /dev/null
+++ b/docs/userguide/index.rst
@@ -0,0 +1,30 @@
+.. OPNFV Release Engineering documentation, created by
+ sphinx-quickstart on Tue Jun 9 19:12:31 2015.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+
+table of contents
+=======================================
+
+Contents:
+
+.. toctree::
+ :numbered:
+ :maxdepth: 4
+
+ multisite-admin-user-guide.rst
+
+Indices and tables
+==================
+
+* :ref:`search`
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/userguide/multisite-admin-user-guide.rst b/docs/userguide/multisite-admin-user-guide.rst
new file mode 100644
index 0000000..ed91446
--- /dev/null
+++ b/docs/userguide/multisite-admin-user-guide.rst
@@ -0,0 +1,405 @@
+.. two dots create a comment. please leave this logo at the top of each of your rst files.
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+.. these two pipes are to separate the logo from the first title
+|
+|
+==========================
+Multisite admin user guide
+==========================
+
+Multisite identity service management
+=====================================
+
+Goal
+----
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Token Format
+------------
+
+There are 3 types of token formats supported by OpenStack KeyStone:
+
+ **UUID**
+ **PKI/PKIZ**
+ **FERNET**
+
+It's very important to understand these token formats before we begin the
+multisite identity service management. Please refer to the OpenStack
+official site for identity management.
+http://docs.openstack.org/admin-guide-cloud/identity_management.html
+
+Key consideration in multisite scenario
+---------------------------------------
+
+A user is provided with a single authentication URL to the Identity (Keystone)
+service. Using that URL, the user authenticates with Keystone by
+requesting a token typically using username/password credentials. Keystone
+server validates the credentials, possibly with an external LDAP/AD server and
+returns a token to the user. The user sends a request to a service in a
+selected region including the token. Now the service in the region, say Nova
+needs to validate the token. The service uses its configured keystone endpoint
+and service credentials to request token validation from Keystone. After the
+token is validated by KeyStone, the user is authorized to use the service.
+
+The key considerations for token validation in a multisite scenario are:
+
+* Site level failure: impact on authN and authZ should be as minimal as
+  possible
+* Scalable: as more and more sites are added, no bottleneck in token
+  validation
+* Amount of inter region traffic: should be kept as little as possible
+
+Hence, Keystone token validation should preferably be done in the same
+region as the service itself.
+
+The challenge of distributing the KeyStone service into each region is the
+KeyStone backend. Different token formats have different data persisted in
+the backend.
+
+* UUID: UUID tokens have a fixed size. Tokens are persistently stored and
+create a lot of database traffic; the persistence of the token is for
+revocation purposes. UUID tokens are validated online by Keystone: each call
+to a service triggers a token validation request to Keystone. Keystone can
+become a bottleneck in a large system. Due to this, the UUID token type is not
+suitable for use in multi region clouds, no matter whether the Keystone
+database is replicated or not.
+
+* PKI: Tokens are non persistent cryptographic based tokens and validated
+offline (not by the Keystone service) by Keystone middleware which is part
+of other services such as Nova. Since PKI tokens include the endpoints for all
+services in all regions, the token size can become big. There are several ways
+to reduce the token size, such as a no catalog policy, an endpoint filter that
+binds a project to a limited set of endpoints, and the compressed PKI token -
+PKIZ, but the size of the token is still unpredictable, making it difficult to
+manage. If no catalog policy is applied, the user can access all regions,
+which is not allowed in some scenarios. A centralized Keystone with PKI tokens
+reduces inter region backend synchronization traffic, but PKI tokens do
+produce Keystone traffic for revocation lists.
+
+* Fernet: Tokens are non persistent cryptographic based tokens and validated
+online by the Keystone service. Fernet tokens are more lightweight than PKI
+tokens and have a fixed size. Fernet tokens require Keystone to be deployed in
+a distributed manner, again to avoid inter region traffic. The data
+synchronization cost for the Keystone backend is smaller due to the
+non-persisted token.
+
+Cryptographic tokens bring new (compared to UUID tokens) issues/use-cases
+like key rotation, certificate revocation. Key management is out of scope for
+this use case.
+
+Database deployment as the backend for KeyStone service
+--------------------------------------------------------
+
+    Database replication:
+  -Master/slave asynchronous: supported by the database server itself
+(mysql/mariadb etc), works over WAN, and is more scalable. But only the master
+will provide write functionality (domain/project/role provisioning).
+  -Multi master synchronous: Galera (others like Percona), not so scalable
+for multi-master writing, and needs more parameter tuning for WAN latency. It
+can provide the capability for a limited multi-site multi-write
+function for the distributed KeyStone service.
+  -Symmetrical/asymmetrical: data replicated to all regions or a subset;
+in the latter case it means some regions need to access Keystone in another
+region.
+
+ Database server sharing:
+ In an OpenStack controller, normally many databases from different
+services are provided from the same database server instance. For HA reasons,
+the database server is usually synchronously replicated to a few other nodes
+(controllers) to form a cluster. Note that *all* databases are replicated in
+this case, for example when Galera sync replication is used.
+
+ Only the Keystone database can be replicated to other sites. Replicating
+databases for other services will cause those services to get out of sync and
+malfunction.
+
+    Since only the Keystone database is to be synced or replicated to another
+region/site, it's better to deploy the Keystone database into its own
+database server, with the extra networking requirements and cluster or
+replication configuration that this implies. How to support this in an
+installer is out of scope.
+
+    The database server can be shared when async master/slave replication is
+used, if global transaction identifiers (GTID) are enabled.
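+
+    A minimal sketch of such a GTID-based async setup, assuming MySQL 5.6 or
+later; the host, user and password below are placeholders:
+
+.. code-block:: bash
+
+    # my.cnf on master and slave (sketch): enable GTID-based replication
+    [mysqld]
+    gtid_mode = ON
+    enforce_gtid_consistency = 1
+    log_slave_updates = ON
+    log_bin = mysql-bin
+
+    # on the slave: attach to the master using GTID auto-positioning
+    CHANGE MASTER TO MASTER_HOST='keystone-db-master.your.com',
+        MASTER_USER='repl', MASTER_PASSWORD='replpass',
+        MASTER_AUTO_POSITION=1;
+    START SLAVE;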
+
+Deployment options
+------------------
+
+- Distributed KeyStone service with PKI token
+
+    Deploy the KeyStone service in two sites with database replication. If
+site level failure impact is not considered, then the KeyStone service can be
+deployed in one site only.
+
+    The PKI token has one great advantage: the token validation can be done
+locally, without sending a token validation request to the KeyStone server.
+The drawbacks of the PKI token are:
+
+    * the endpoint list size in the token. If a project is spread over only a
+      very limited number of sites (regions), then we can use the endpoint
+      filter to reduce the token size, making it workable even with a lot of
+      sites in the cloud.
+    * KeyStone middleware (which is co-located in the service, like
+      Nova-API/xxx-API) will have to send requests to the KeyStone server
+      frequently for the revoke-list, in order to reject malicious API
+      requests, for example when a user has been deactivated but uses an old
+      token to access an OpenStack service.
+
+    This option needs to leverage database replication to provide a KeyStone
+Active-Active mode across sites to reduce the impact of a site failure. The
+revoke-list is requested very frequently, so the performance of the KeyStone
+server also needs to be taken care of.
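+
+    A sketch of the relevant keystonemiddleware settings for this option; the
+option names below come from the Liberty-era [keystone_authtoken] section and
+the values are examples only, to be tuned to your security requirements:
+
+.. code-block:: bash
+
+    [keystone_authtoken]
+    identity_uri = https://keystone.your.com:35357/
+    auth_uri = http://keystone.your.com:5000/v2.0
+    # cache validated tokens and the revocation list locally to reduce the
+    # load on the KeyStone servers
+    memcached_servers = 127.0.0.1:11211
+    token_cache_time = 300
+    revocation_cache_time = 10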
+
+    Site level KeyStone load balancing is required to provide site level
+redundancy, otherwise the KeyStone middleware will not switch requests to a
+healthy KeyStone server in time.
+
+    Cert distribution/revocation to each site / API server for token
+validation is also required.
+
+    This option can be used in scenarios where there are very few sites,
+especially if each project only spreads into a limited number of sites
+(regions).
+
+- Distributed KeyStone service with Fernet token
+
+    Fernet token is a very new format, only recently introduced. The biggest
+gains of this token format are: 1) lightweight, the size is small enough to be
+carried in the API request, unlike the PKI token (as sites are added, the
+endpoint-list grows and the token becomes too long to carry in the API
+request); 2) no token persistence, which also keeps the DB mostly unchanged
+and with a lightweight data size (just project, role, domain, endpoint etc).
+The drawback of the Fernet token is that the token has to be validated by
+KeyStone for each API request.
+
+    This allows the DB of KeyStone to work as a cluster across multiple sites
+(for example, using a MySQL Galera cluster). That means installing a KeyStone
+API server in each site, but sharing the same backend DB cluster. Because the
+DB cluster synchronizes data in real time across the sites, all KeyStone
+servers can see the same data.
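+
+    A minimal sketch of the Galera-related settings for such a shared
+KeyStone DB cluster; the node addresses and cluster name are placeholders and
+the wsrep_provider path depends on your distribution:
+
+.. code-block:: bash
+
+    [mysqld]
+    binlog_format = ROW
+    default_storage_engine = InnoDB
+    innodb_autoinc_lock_mode = 2
+    wsrep_provider = /usr/lib/galera/libgalera_smm.so
+    wsrep_cluster_name = keystone_cluster
+    wsrep_cluster_address = gcomm://site1-db,site2-db,site3-db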
+
+    Because each site has KeyStone installed and all data is kept the same,
+all token validation can be done locally in the same site.
+
+    The challenge for this solution is how many sites the DB cluster can
+support. The question was asked of the MySQL Galera developers, and their
+answer is that there is no number/distance/network latency limitation in the
+code. But in practice, they have seen a case using a MySQL cluster across 5
+data centers, each data center with 3 nodes.
+
+    This solution works very well for a limited number of sites that the DB
+cluster can cover well.
+
+- Distributed KeyStone service with Fernet token + Async replication (
+ star-mode).
+
+    One master KeyStone cluster with Fernet tokens spans two sites (for site
+level high availability purposes); the other sites are installed with at least
+2 slave nodes, where each node is configured with DB async replication from
+the master cluster members: one slave's master node is in site 1, the other
+slave's master node is in site 2.
+
+    Only the master cluster nodes are allowed to write; the other slave nodes
+wait for replication from the master cluster members (with very little delay).
+
+    Pros.
+    Deploying the database cluster in the master sites provides more master
+nodes, so that more slaves can be fed with async replication in parallel.
+Using two sites for the master cluster provides higher reliability (site
+level) for write requests, while reducing the maintenance challenge by
+limiting how many sites the cluster spreads over.
+    Multiple slaves are used in the other sites because a slave has no
+knowledge of other slaves, so it is easier to manage multiple slaves in one
+site than a cluster, and multiple slaves work independently but provide
+multi-instance redundancy (like a cluster, but independent).
+
+    Cons. Need to be aware of the challenge of key distribution and rotation
+for Fernet tokens; a minimal sketch of one rotation/distribution flow is given
+below.
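+
+    A minimal sketch of such a rotation/distribution flow, assuming the
+default key repository location and plain rsync over SSH from the master site
+to the slave sites (host names are placeholders):
+
+.. code-block:: bash
+
+    # on one designated KeyStone node in the master cluster: rotate the keys
+    keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone
+
+    # push the updated key repository to the KeyStone nodes in the other sites
+    for host in site2-keystone site3-keystone; do
+        rsync -a --delete /etc/keystone/fernet-keys/ $host:/etc/keystone/fernet-keys/
+    done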
+
+
+Multisite VNF Geo site disaster recovery
+========================================
+
+Goal
+----
+
+A VNF (telecom application) should be able to be restored in another site
+when a catastrophic failure happens.
+
+Key consideration in multisite scenario
+---------------------------------------
+
+Geo site disaster recovery deals with more catastrophic failures
+(flood, earthquake, propagating software fault), where loss of calls, or
+even temporary loss of service, is acceptable. It also seems more common
+to accept/expect manual / administrator intervention to drive the process,
+not least because you don’t want to trigger the transfer by mistake.
+
+In terms of coordination/replication or backup/restore between geographic
+sites, discussion often (but not always) seems to focus on limited application
+level data/config replication, as opposed to replication or backup/restore of
+cloud infrastructure between different sites.
+
+And finally, the lack of a requirement to do fast media transfer (without
+resignalling) generally removes the need for special networking behavior, with
+slower DNS-style redirection being acceptable.
+
+This section is more concerned with the cloud infrastructure level capability
+to support VNF geo site disaster recovery.
+
+Option1, Consistency application backup
+---------------------------------------
+
+The disaster recovery process will work like this:
+
+1) DR (Geo site disaster recovery) software gets the volumes for each VM
+   in the VNF from Nova
+2) DR software calls the Nova quiesce API to guarantee quiescing the VMs in
+   the desired order
+3) DR software takes snapshots of these volumes in Cinder (NOTE: because
+   storage often provides fast snapshots, the duration between quiesce and
+   unquiesce is a short interval)
+4) DR software calls the Nova unquiesce API to unquiesce the VMs of the VNF
+   in reverse order
+5) DR software creates volumes from the snapshots just taken in Cinder
+6) DR software creates backups (incremental) for these volumes to remote
+   backup storage (Swift or Ceph, etc.) in Cinder
+7) if this site failed,
+   7.1) DR software restores these backup volumes in the remote Cinder in
+        the backup site.
+   7.2) DR software boots VMs from the bootable volumes from the remote
+        Cinder in the backup site and attaches the corresponding data volumes.
+
+Note: It’s up to the DR policy and the VNF's characteristics how to use the
+API. Some VNFs may allow the standby of the VNF or a member of the cluster to
+do the quiesce/unquiesce, to avoid interfering with the service provided by
+the VNF. Other VNFs may afford a short unavailability for DR purposes. A
+CLI-level sketch of the volume-related steps is given below.
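+
+A CLI-level sketch of steps 3), 5) and 6) above, assuming the Liberty-era
+cinder client and placeholder volume/snapshot names; the quiesce/unquiesce
+calls of steps 2) and 4) are omitted because the corresponding Nova API is the
+gap this use case asks for:
+
+.. code-block:: bash
+
+    # 3) snapshot an attached volume of the VNF (force, since it is in-use)
+    cinder snapshot-create --force True --name vnf-vol1-snap vnf-vol1
+
+    # 5) create a new volume from that snapshot (size in GB, at least the
+    #    snapshot size)
+    cinder create --snapshot-id <vnf-vol1-snap-id> --name vnf-vol1-copy 10
+
+    # 6) create an incremental backup of the new volume to the backup store
+    cinder backup-create --incremental --name vnf-vol1-bak vnf-vol1-copy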
+
+This option provides application level consistency disaster recovery.
+This feature is WIP in the OpenStack Mitaka release, and will be available in
+the next OPNFV release.
+
+Option2, Virtual Machine Snapshot
+---------------------------------
+
+1) DR software creates a VM snapshot in Nova
+2) Nova quiesces the VM internally
+   (NOTE: The upper level application or DR software should take care of
+   avoiding a VNF outage induced by an infra level outage)
+3) Nova creates an image in Glance
+4) Nova creates a snapshot of the VM, including volumes
+5) If the VM is a volume backed VM, then volume snapshots are created in
+   Cinder
+6) No image is uploaded to Glance, but the snapshot is added to the metadata
+   of the image in Glance
+7) DR software gets the snapshot information from Glance
+8) DR software creates volumes from these snapshots
+9) DR software creates backups (incremental) for these volumes to backup
+   storage (Swift or Ceph, etc.) in Cinder
+10) if this site failed,
+    10.1) DR software restores these backup volumes to Cinder in the backup
+          site.
+    10.2) DR software boots the VM from the bootable volume from Cinder in
+          the backup site and attaches the data volumes.
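+
+A quick sketch of steps 1) to 6) above using the Liberty-era CLI; the server
+and snapshot names are placeholders. For a volume backed VM the resulting
+Glance image carries the volume snapshots in its block_device_mapping
+metadata rather than an uploaded image:
+
+.. code-block:: bash
+
+    # 1)-6) snapshot the VM: Nova quiesces it, snapshots its volumes and
+    # registers an image whose metadata references those snapshots
+    nova image-create --poll vnf-vm1 vnf-vm1-dr-snap
+
+    # inspect the image and the volume snapshots that were created
+    nova image-show vnf-vm1-dr-snap
+    cinder snapshot-list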
+
+This option only provides single VM level consistency disaster recovery.
+
+This feature is already available in the current OPNFV release.
+
+Option3, Consistency volume replication
+---------------------------------------
+
+1) DR software creates a datastore (Block/Cinder, Object/Swift, App Custom
+   storage) with replication enabled at the relevant scope, for use to
+   selectively backup/replicate the desired data to the DR backup site
+2) DR software gets the reference of the storage in the remote site storage
+3) If the primary site failed,
+   3.1) DR software managing recovery in the backup site gets references to
+        the relevant storage and passes them to new software instances
+   3.2) Software attaches (or has attached) the replicated storage, in the
+        case of volumes promoting to writable.
+
+Pros: 1) Replication will be done at the storage level automatically; no need
+      to create backups regularly, for example, daily.
+      2) Application selection of a limited amount of data to replicate
+      reduces the risk of replicating failed state and generates less
+      overhead.
+      3) The type of replication and model (active/backup, active/active,
+      etc.) can be tailored to application needs
+
+Cons: 1) Applications need to be designed with support in mind, including
+      both selection of data to be replicated and consideration of
+      consistency.
+      2) "Standard" support in OpenStack for Disaster Recovery is currently
+      fairly limited, though there is active work in this area.
+
+This feature is under discussion in the OpenStack Mitaka release, and
+hopefully will be available in the next OPNFV release.
+
+
+VNF high availability across VIM
+================================
+
+Goal
+----
+
+A VNF (telecom application) should be able to realize high availability
+deployment across OpenStack instances.
+
+Key consideration in multisite scenario
+---------------------------------------
+
+Most telecom applications have already been designed as
+Active-Standby/Active-Active/N-Way to achieve high availability
+(99.999%, which corresponds to 5.26 minutes of unplanned downtime in a year);
+typically state replication or heartbeat between the
+Active-Standby/Active-Active/N-Way members (directly, or via replicated
+database services, or via a privately designed message format) is required.
+
+We have to accept the currently limited availability (99.99%) of a
+given OpenStack instance, and intend to provide the availability of the
+telecom application by spreading its functions across multiple OpenStack
+instances. To help with this, many people appear willing to provide multiple
+“independent” OpenStack instances in a single geographic site, with special
+networking (L2/L3) between clouds in that physical site.
+
+The telecom application often has different networking planes for different
+purposes:
+
+1) external network plane: used for communication with other telecom
+   applications.
+
+2) components inter-communication plane: one VNF often consists of several
+   components; this plane is designed for the components to inter-communicate
+   with each other.
+
+3) backup plane: this plane is used for the heartbeat or state replication
+   between the component's active/standby or active/active or N-way cluster.
+
+4) management plane: this plane is mainly for management purposes, like
+   configuration.
+
+Generally these planes are separated from each other. For a legacy telecom
+application, each internal plane will have its fixed or flexible IP addressing
+plan. There are some interesting/hard requirements on the networking (L2/L3)
+between OpenStack instances, at least for the backup plane across different
+OpenStack instances:
+
+1) Overlay L2 networking or shared L2 provider networks as the backup plane
+   for heartbeat or state replication. An overlay L2 network is preferred,
+   for the following reasons:
+
+   a) Support legacy compatibility: some telecom applications have a built-in
+      internal L2 network; to make it easy to move these applications to a
+      virtualized telecom application, it would be better to provide an L2
+      network.
+
+   b) Support IP overlapping: multiple telecom applications may have
+      overlapping IP addresses for cross OpenStack instance networking.
+      Therefore, overlay L2 networking across Neutron instances is required
+      in OpenStack.
+
+2) L3 networking across OpenStack instances for heartbeat or state
+   replication. For L3 networking, we can leverage the floating IPs provided
+   by the current Neutron, so there is no new feature requirement for
+   OpenStack. A minimal sketch follows.
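+
+   A minimal sketch of the floating IP based approach, assuming the
+   Liberty-era neutron CLI; the external network and port below are
+   placeholders for the backup plane of one VNF component:
+
+   .. code-block:: bash
+
+      # allocate a floating IP from the external network and bind it to the
+      # backup-plane port of the VNF component in this OpenStack instance
+      neutron floatingip-create ext-net
+      neutron floatingip-associate <floatingip-id> <backup-plane-port-id>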
+
+Overlay L2 networking across OpenStack instances is under discussion with the
+Neutron community.
+
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/setup.cfg b/setup.cfg
new file mode 100644
index 0000000..5f94af8
--- /dev/null
+++ b/setup.cfg
@@ -0,0 +1,24 @@
+[metadata]
+name = telcowg-usecases
+summary = Use case repo for Telco Working Group
+description-file =
+ README.rst
+author = OpenStack
+author-email = openstack-dev@lists.openstack.org
+home-page = http://www.openstack.org/
+classifier =
+ Environment :: OpenStack
+ Intended Audience :: Developers
+ License :: OSI Approved :: Apache Software License
+ Operating System :: POSIX :: Linux
+
+[build_sphinx]
+source-dir = doc/source
+build-dir = doc/build
+all_files = 1
+
+[pbr]
+warnerrors = True
+
+[upload_sphinx]
+upload-dir = doc/build/html
diff --git a/setup.py b/setup.py
new file mode 100755
index 0000000..769f681
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,22 @@
+#!/usr/bin/env python
+# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# THIS FILE IS MANAGED BY THE GLOBAL REQUIREMENTS REPO - DO NOT EDIT
+import setuptools
+
+setuptools.setup(
+ setup_requires=['pbr>=0.6,<1.0'],
+ pbr=True)
diff --git a/tox.ini b/tox.ini
new file mode 100644
index 0000000..be6cbbb
--- /dev/null
+++ b/tox.ini
@@ -0,0 +1,25 @@
+[tox]
+minversion = 1.6
+envlist = docs
+skipsdist = True
+
+[testenv]
+usedevelop = True
+install_command = pip install -U {opts} {packages}
+setenv =
+ VIRTUAL_ENV={envdir}
+deps = -r{toxinidir}/requirements.txt
+ -r{toxinidir}/test-requirements.txt
+
+[testenv:venv]
+commands = {posargs}
+
+[testenv:docs]
+commands = python setup.py build_sphinx
+
+[testenv:spelling]
+deps =
+ -r{toxinidir}/requirements.txt
+ sphinxcontrib-spelling
+ PyEnchant
+commands = sphinx-build -b spelling doc/source doc/build/spelling