Merge "Multisite identity service management"

author: Chaoyi Huang <joehuang@huawei.com> 2015-11-06 00:17:52 +0000
committer: Gerrit Code Review <gerrit@172.30.200.206> 2015-11-06 00:17:52 +0000
commit: 2783cc9d74916667ffcbb4616b687b5dd2701933 (patch)
tree: 6eca93a9e93571d7b3a0286370dfedf885d60dd5
parent: 9009692b06aa7323af3c318d02de21d0fa1ac335 (diff)
parent: 33e4c34ba7d4017049a54138dc5980bee395e274 (diff)
1 files changed, 376 insertions, 0 deletions
diff --git a/multisite-identity-service-management.rst b/multisite-identity-service-management.rst
new file mode 100644
index 0000000..f46c4b4
--- /dev/null
+++ b/multisite-identity-service-management.rst
@@ -0,0 +1,376 @@
+This work is licensed under a Creative Commons Attribution 3.0 Unported
+License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=======================================
+ Multisite identity service management
+=======================================
+
+Glossary
+========
+
+There are 3 types of token supported by OpenStack KeyStone
+    **UUID**
+
+    **PKI/PKIZ**
+
+    **FERNET**
+
+Please refer to reference section for these token formats, benchmark and
+comparation.
+
+
+Problem description
+===================
+
+Abstract
+------------
+
+a user should, using a single authentication point be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Description
+------------
+
+- User/Group Management: e.g. use of LDAP, should OPNFV be agnostic to this?
+  Reusing the LDAP infrastructure that is mature and has features lacking in
+Keystone (e.g.password aging and policies). KeyStone can use external system to
+do the user authentication, and user/group management could be the job of
+external system, so that KeyStone can reuse/co-work with enterprise identity
+management. KeyStone's main role in OpenStack is to provide
+service(Nova,Cinder...) aware token, and do the authorization. You can refer to
+this post https://blog-nkinder.rhcloud.com/?p=130.Therefore, LDAP itself should
+be a topic out of our scope.
+
+- Role assignment: In case of federation(and perhaps other solutions) it is not
+  feasible/scalable to do role assignment to users. Role assignment to groups
+is better. Role assignment will be done usually based on group. KeyStone
+supports this.
+
+- Amount of inter region traffic: should be kept as little as possible,
+  consider CERNs Ceilometer issue as described in
+http://openstack-in-production.blogspot.se/2014/03/cern-cloud-architecture-update-for.html
+
+Requirement analysis
+===========================
+
+- A user is provided with a single authentication URL to the Identity
+  (Keystone) service. Using that URL, the user authenticates with Keystone by
+requesting a token typically using username/password credentials. The keystone
+server validates the credentials, possibly with an external LDAP/AD server and
+returns a token to the user. With token type UUID/Fernet, the user request the
+service catalog. With PKI tokens the service catalog is included in the token.
+The user sends a request to a service in a selected region including the token.
+Now the service in the region, say Nova needs to validate the token. Nova uses
+its configured keystone endpoint and service credentials to request token
+validation from Keystone. The Keystone token validation should preferably be
+done in the same region as Nova itself. Now Keystone has to validate the token
+that also (always?) includes a project ID in order to make sure the user is
+authorized to use Nova. The project ID is stored in the assignment backend -
+tables in the Keystone SQL database. For this project ID validation the
+assignment backend database needs to have the same content as the keystone who
+issued the token.
+
+- So either 1) services in all regions are configured with a central keystone
+  endpoint through which all token validations will happen. or 2) the Keystone
+assignment backend database is replicated and thus available to Keystone
+instances locally in each region.
+
+  Alt 2) is obviously the only scalable solution that produce no inter region
+traffic for normal service usage. Only when data in the assignment backend is
+changed, replication traffic will be sent between regions. Assignment data
+includes domains, projects, roles and role assignments.
+
+Keystone deployment:
+
+    - Centralized: a single Keystone service installed in some location, either
+      in a "master" region or totally external as a service to OpenStack
+      regions.
+    - Distributed: a Keystone service is deployed in each region
+
+Token types:
+
+    - UUID: tokens are persistently stored and creates a lot of database
+      traffic, the persistence of token is for the revoke purpose. UUID tokens
+are online validated by Keystone, each API calling to service will ask token
+validation from KeyStone. Keystone can become a bottleneck in a large system
+due to this. UUID token type is not suitable for use in multi region clouds at
+all, no matter the solution used for the Keystone database replication (or
+not). UUID tokens have a fixed size.
+
+    - PKI: tokens are non persistent cryptographic based tokens and offline
+      validated (not by the Keystone service) by Keystone middleware
+which is part of other services such as Nova. Since PKI tokens include endpoint
+for all services in all regions, the token size can become big.There are
+several ways to reduce the token size, no catalog policy, endpoint filter to
+make a project binding with limited endpoints, and compressed PKI token - PKIZ,
+but the size of token is still predictable, make it difficult to manage. If no
+catalog applied, that means the user can access all regions, in some scenario,
+it's not allowed to do like this.
+
+    - Fernet: tokens are non persistent cryptographic based tokens and online
+      validated by the Keystone service. Fernet tokens are more lightweigth
+then PKI tokens and have a fixed size.
+
+    PKI (offline validated) are needed with a centralized Keystone to avoid
+inter region traffic. PKI tokens do produce Keystone traffic for revocation
+lists.
+
+    Fernet tokens requires Keystone deployed in a distributed manner, again to
+avoid inter region traffic.
+
+    Cryptographic tokens brings new (compared to UUID tokens) issues/use-cases
+like key rotation, certificate revocation. Key management is out of scope of
+this use case.
+
+Database deployment:
+
+    Database replication:
+    -Master/slave asynchronous: supported by the database server itself
+(mysql/mariadb etc), works over WAN, it's more scalable
+    -Multi master synchronous: Galera(others like percona), not so scalable, 
+for multi-master writing, and need more parameter tunning for WAN latency.
+    -Symmetrical/asymmetrical: data replicated to all regions or a subset,
+in the latter case it means some regions needs to access Keystone in another
+region.
+
+    Database server sharing:
+    In an OpenStack controller normally many databases from different
+services are provided from the same database server instance. For HA reasons,
+the database server is usually synchronously replicated to a few other nodes
+(controllers) to form a cluster. Note that _all_ database are replicated in
+this case, for example when Galera sync repl is used.
+
+    Only the Keystone database can be replicated to other sites. Replicating
+databases for other services will cause those services to get of out sync and
+malfunction.
+
+    Since only the Keystone database is to be sync replicated to another
+region/site, it's better to deploy Keystone database into its own
+database server with extra networking requirement, cluster or replication
+configuration. How to support this by installer is out of scope.
+
+    The database server can be shared when async master/slave repl is used, if
+global transaction identifiers GTID is enabled.
+
+
+Candidate solution analysis
+------------------------------------
+
+-  KeyStone service (Distributed) with Fernet token
+
+    Fernet token is a very new format, and just introduced recently,the biggest
+gain for this token format is :1) lightweight, size is small to be carried in
+the API request, not like PKI token( as the sites increased, the endpoint-list
+will grows  and the token size is too long to carry in the API request) 2) no
+token persistence, this also make the DB not changed too much and with light
+weight data size (just project. User, domain, endpoint etc). The drawback for
+the Fernet token is that token has to be validated by KeyStone for each API
+request.
+
+    This makes that the DB of KeyStone can work as a cluster in multisite (for
+example, using MySQL galera cluster). That means install KeyStone API server in
+each site, but share the same the backend DB cluster.Because the DB cluster
+will synchronize data in real time to multisite, all KeyStone server can see
+the same data.
+
+    Because each site with KeyStone installed, and all data kept same,
+therefore all token validation could be done locally in the same site.
+
+    The challenge for this solution is how many sites the DB cluster can
+support. Question is aksed to MySQL galera developers, their answer is that no
+number/distance/network latency limitation in the code. But in the practice,
+they have seen a case to use MySQL cluster in 5 data centers, each data centers
+with 3 nodes.
+
+    This solution will be very good for limited sites which the DB cluster can
+cover very well.
+
+-  KeyStone service(Distributed) with Fernet token + Async replication (
+   multi-cluster mode).
+
+    We may have several KeyStone cluster with Fernet token, for example,
+cluster1 ( site1, site2, … site 10 ), cluster 2 ( site11, site 12,..,site 20).
+Then do the DB async replication among different cluster asynchronously.
+
+    A prototype of this has been down on this. In some blogs they call it
+"hybridreplication". Architecturally you have a master region where you do
+keystone writes. The other regions is read-only.
+http://severalnines.com/blog/deploy-asynchronous-slave-galera-mysql-easy-way
+http://severalnines.com/blog/replicate-mysql-server-galera-cluster
+
+    Only one DB cluster (the master DB cluster) is allowed to write(but still
+multisite, not all sites), other clusters waiting for replication. Inside the
+master cluster, "write" is allowed in multiple region for the distributed lock
+in the DB. But please notice the challenge of key distribution and rotation for
+Fernet token, you can refer to these two blogs: http://lbragstad.com/?p=133,
+http://lbragstad.com/?p=156
+
+-  KeyStone service(Distributed) with Fernet token + Async replication (
+   star-mode).
+
+    one master KeyStone cluster with Fernet token in two sites (for site level
+high availability purpose), other sites will be installed with at least 2 slave
+nodes where the node is configured with DB async replication from the master
+cluster members, and one slave’s mater node in site1, another slave’s master
+node in site 2.
+
+    Only the master cluster nodes are allowed to write,  other slave nodes
+waiting for replication from the master cluster ( very little delay) member.
+But  the chanllenge of key distribution and rotation for Fernet token should be
+settled, you can refer to these two blogs: http://lbragstad.com/?p=133,
+http://lbragstad.com/?p=156
+
+    Pros.
+    Why cluster in the master sites? There are lots of master nodes in the
+cluster, in order to provide more slaves could be done with async. replication
+in parallel.  Why two sites for the master cluster? to provide higher
+reliability (site level) for writing request.
+    Why using multi-slaves in other sites. Slave has no knowledge of other
+slaves, so easy to manage multi-slaves in one site than a cluster, and
+multi-slaves work independently but provide multi-instance redundancy(like a
+cluster, but independent).
+
+    Cons. The distribution/rotation of key management.
+
+-  KeyStone service(Distributed) with PKI token
+
+    The PKI token has one great advantage is that the token validation can be
+done locally, without sending token validation request toKeyStone server. The
+drawback of PKI token is 1) the endpoint list size in the token. If a project
+will be only spread in very limited site number(region number), then we can use
+the endpoint filter to reduce the token size, make it workable even a lot of
+sites in the cloud. 2) KeyStone middleware(the old KeyStone client, which
+co-locate in Nova/xxx-API) will have to send the request to the KeyStone server
+frequently for the revoke-list, in order to reject some malicious API request,
+for example, a user has be deactivated, but use an old token to access
+OpenStack service.
+
+    For this solution, except above issues, we need also to provide KeyStone
+Active-Active mode across site to reduce the impact of site failure. And the
+revoke-list request is very frequently asked, so the performance of the
+KeyStone server needs also to be taken care.
+
+    Site level keystone load balance is required to provide site level
+redundancy. Otherwise the KeyStone middleware will not switch request to the
+health KeyStone server in time.
+
+    This solution can be used for some scenario, especially a project only
+spread in limited sites ( regions ).
+
+    And also the cert distribution/revoke to each site / API server for token
+validation is required.
+
+-  KeyStone service(Distributed) with UUID token
+
+    Because each token validation will be sent to KeyStone server,and the token
+persistence also makes the DB size larger than Fernet token, not so good as the
+fernet token to provide a distributed KeyStone service. UUID is a solution
+better for small scale and inside one site.
+
+    Cons: UUID tokens are persistently stored so will cause a lot of inter
+region replication traffic, tokens will be persisted for authorization and
+revoke purpose, the frequent changed database leads to a lot of inter region
+replication traffic.
+
+-  KeyStone service(Distributed) with Fernet token + KeyStone federation You
+    have to accept the drawback of KeyStone federation if you have a lot of
+sites/regions. Please refer to KeyStone federation section
+
+-  KeyStone federation
+    In this solution, we can install KeyStone  service in each site and with
+its own database. Because we have to make the KeyStone IdP and SP know each
+other, therefore the configuration needs to be done accordingly, and setup the
+role/domain/group mapping, create regarding region in the pair.As sites
+increase, if each user is able to access all sites, then full-meshed
+mapping/configuration has to be done. Whenever you add one more site, you have
+to do n*(n-1) sites configuration/mapping. The complexity will be great enough
+as the sites number increase.
+
+    KeyStone Federation is mainly for different cloud admin to borrow/rent
+resources, for example, A company and B company, A private cloud and B public
+cloud, and both of them using OpenStack based cloud. Therefore a lot of mapping
+and configuration has to be done to make it work.
+
+-  KeyStone service (Centralized)with Fernet token
+
+    cons: inter region traffic for token validation, token validation requests
+from all other sites has to be sent to the centralized site. Too frequent inter
+region traffic.
+
+-  KeyStone service(Centralized) with PKI token
+
+    cons: inter region traffic for tokenrevocation list management, the token
+revocation list request from all other sites has to be sent to the centralized
+site. Too frequent inter region traffic.
+
+-  KeyStone service(Centralized) with UUID token
+
+    cons: inter region traffic for token validation, the token validation
+request from all other sites has to be sent to the centralized site. Too
+frequent inter region traffic.
+
+Prototype
+-----------
+    A prototype of the candidate solution "KeyStone service(Distributed) with
+Fernet token + Async replication ( multi-cluster mode)" has been executed Hans
+Feldt and Chaoyi Huang, please refer to https://github.com/hafe/dockers/ . And
+one issue was found "Can't specify identity endpoint for token validation among
+several keystone servers in keystonemiddleware", please refer to the Gaps
+section.
+
+Gaps
+====
+    Can't specify identity endpoint for token validation among several keystone
+servers in keystonemiddleware.
+
+
+**NAME-THE-MODULE issues:**
+
+* keystonemiddleware
+
+  * Can't specify identity endpoint for token validation among several keystone
+  * servers in keystonemiddleware:
+  * https://bugs.launchpad.net/keystone/+bug/1488347
+
+Affected By
+-----------
+    OPNFV multisite cloud.
+
+Conclusion
+-----------
+
+    As the prototype demonstrate the cluster level aysn. replication capability
+and fernet token validation in local site is feasible. And the candidate
+solution "KeyStone service(Distributed) with Fernet token + Async replication (
+star-mode)" is simplified solution of the prototyped one, it's much more easier
+in deployment and maintenance, with better scalability.
+
+    Therefore the candidate solution "KeyStone service(Distributed) with Fernet
+token + Async replication ( star-mode)" for multsite OPNFV cloud is
+recommended.
+
+References
+==========
+
+    There are 3 format token (UUID, PKI/PKIZ, Fernet) provided byKeyStone, this
+blog give a very good description, benchmark and comparation:
+    http://dolphm.com/the-anatomy-of-openstack-keystone-token-formats/
+    http://dolphm.com/benchmarking-openstack-keystone-token-formats/
+
+    To understand the benefit and shortage of PKI/PKIZ token, pleaserefer to :
+    https://www.mirantis.com/blog/understanding-openstack-authentication-keystone-pk
+
+    To understand KeyStone federation and how to use it:
+    http://blog.rodrigods.com/playing-with-keystone-to-keystone-federation/
+
+    To integrate KeyStone with external enterprise ready authentication system
+    https://blog-nkinder.rhcloud.com/?p=130.
+
+    Key repliocation used in KeyStone Fernet token
+    http://lbragstad.com/?p=133,
+    http://lbragstad.com/?p=156
+
+    KeyStone revoke
+    http://specs.openstack.org/openstack/keystone-specs/api/v3/identity-api-v3-os-revoke-ext.html
author	Chaoyi Huang <joehuang@huawei.com>	2015-11-06 00:17:52 +0000
committer	Gerrit Code Review <gerrit@172.30.200.206>	2015-11-06 00:17:52 +0000
commit	2783cc9d74916667ffcbb4616b687b5dd2701933 (patch)
tree	6eca93a9e93571d7b3a0286370dfedf885d60dd5
parent	9009692b06aa7323af3c318d02de21d0fa1ac335 (diff)
parent	33e4c34ba7d4017049a54138dc5980bee395e274 (diff)