author | Chaoyi Huang <joehuang@huawei.com> | 2015-09-01 11:27:19 +0800 |
---|---|---|
committer | Chaoyi Huang <joehuang@huawei.com> | 2015-09-09 17:36:25 +0800 |
commit | 33e4c34ba7d4017049a54138dc5980bee395e274 (patch) | |
tree | 57ef8c6dbe7e755e6a78ba073a43da1a5fcb2f06 /multisite-identity-service-management.rst | |
parent | 90c2b21ea0f4e23ce554e531e8ef4695bb4aa6b4 (diff) |
Multisite identity service management
A user should, using a single authentication point, be able to manage virtual resources spread
over multiple OpenStack regions. This document describes the use case, requirements, token formats,
the different Keystone service deployment solutions, their analysis and prototype, the gaps found
and the conclusion.
JIRA ID: https://jira.opnfv.org/browse/MULTISITE-2
Change-Id: Ic546f27fc9ca44bce4e972ecedeacf1b0560b88e
Signed-off-by: Chaoyi Huang <joehuang@huawei.com>
Diffstat (limited to 'multisite-identity-service-management.rst')
-rw-r--r-- | multisite-identity-service-management.rst | 376 |
1 files changed, 376 insertions, 0 deletions
diff --git a/multisite-identity-service-management.rst b/multisite-identity-service-management.rst
new file mode 100644
index 0000000..f46c4b4
--- /dev/null
+++ b/multisite-identity-service-management.rst
@@ -0,0 +1,376 @@
+This work is licensed under a Creative Commons Attribution 3.0 Unported
+License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+
+=======================================
+ Multisite identity service management
+=======================================
+
+Glossary
+========
+
+There are 3 types of token supported by OpenStack Keystone:
+  **UUID**
+
+  **PKI/PKIZ**
+
+  **FERNET**
+
+Please refer to the References section for these token formats, benchmarks and
+comparison.
+
+
+Problem description
+===================
+
+Abstract
+------------
+
+A user should, using a single authentication point, be able to manage virtual
+resources spread over multiple OpenStack regions.
+
+Description
+------------
+
+- User/Group Management: e.g. use of LDAP, should OPNFV be agnostic to this?
+  Reusing the LDAP infrastructure that is mature and has features lacking in
+  Keystone (e.g. password aging and policies). Keystone can use an external
+  system to do the user authentication, and user/group management could be the
+  job of the external system, so that Keystone can reuse/co-work with
+  enterprise identity management. Keystone's main role in OpenStack is to
+  provide service-aware (Nova, Cinder...) tokens and to do the authorization.
+  See this post: https://blog-nkinder.rhcloud.com/?p=130. Therefore, LDAP
+  itself should be a topic out of our scope.
+
+- Role assignment: In case of federation (and perhaps other solutions) it is
+  not feasible/scalable to do role assignment to users. Role assignment to
+  groups is better, and role assignment will usually be done based on groups.
+  Keystone supports this.
+
+- Amount of inter region traffic: should be kept as little as possible,
+  consider CERN's Ceilometer issue as described in
+  http://openstack-in-production.blogspot.se/2014/03/cern-cloud-architecture-update-for.html
+
+Requirement analysis
+===========================
+
+- A user is provided with a single authentication URL to the Identity
+  (Keystone) service. Using that URL, the user authenticates with Keystone by
+  requesting a token, typically using username/password credentials. The
+  Keystone server validates the credentials, possibly with an external LDAP/AD
+  server, and returns a token to the user. With token type UUID/Fernet, the
+  user requests the service catalog. With PKI tokens the service catalog is
+  included in the token. The user sends a request, including the token, to a
+  service in a selected region. Now the service in the region, say Nova, needs
+  to validate the token. Nova uses its configured Keystone endpoint and service
+  credentials to request token validation from Keystone. The Keystone token
+  validation should preferably be done in the same region as Nova itself. Now
+  Keystone has to validate the token, which also (always?) includes a project
+  ID, in order to make sure the user is authorized to use Nova. The project ID
+  is stored in the assignment backend, i.e. tables in the Keystone SQL
+  database. For this project ID validation the assignment backend database
+  needs to have the same content as that of the Keystone that issued the token.
+
+- So either 1) services in all regions are configured with a central Keystone
+  endpoint through which all token validations will happen, or 2) the Keystone
+  assignment backend database is replicated and thus available to Keystone
+  instances locally in each region.
+
+  Alt 2) is the only scalable solution, since it produces no inter region
+  traffic for normal service usage. Only when data in the assignment backend
+  is changed will replication traffic be sent between regions.
+  Assignment data includes domains, projects, roles and role assignments.
+
+Keystone deployment:
+
+  - Centralized: a single Keystone service installed in some location, either
+    in a "master" region or totally external as a service to OpenStack
+    regions.
+  - Distributed: a Keystone service is deployed in each region.
+
+Token types:
+
+  - UUID: tokens are persistently stored and create a lot of database
+    traffic; tokens are persisted so that they can be revoked. UUID tokens
+    are validated online by Keystone: each API call to a service triggers a
+    token validation request to Keystone. Keystone can become a bottleneck in
+    a large system due to this. The UUID token type is not suitable for use in
+    multi region clouds at all, no matter which solution is used for Keystone
+    database replication (or not). UUID tokens have a fixed size.
+
+  - PKI: tokens are non-persistent, cryptography-based tokens that are
+    validated offline (not by the Keystone service) by the Keystone middleware,
+    which is part of other services such as Nova. Since PKI tokens include the
+    endpoints for all services in all regions, the token size can become big.
+    There are several ways to reduce the token size: a no-catalog policy, an
+    endpoint filter that binds a project to a limited set of endpoints, and the
+    compressed PKI token (PKIZ). Even so, the token size remains unpredictable,
+    which makes it difficult to manage. If no catalog is applied, the user can
+    access all regions, which is not allowed in some scenarios.
+
+  - Fernet: tokens are non-persistent, cryptography-based tokens that are
+    validated online by the Keystone service. Fernet tokens are more
+    lightweight than PKI tokens and have a fixed size.
+
+  PKI tokens (validated offline) are needed with a centralized Keystone to
+  avoid inter region traffic. PKI tokens do produce Keystone traffic for
+  revocation lists.
+
+  Fernet tokens require Keystone to be deployed in a distributed manner, again
+  to avoid inter region traffic.
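
The pairing of token format and Keystone deployment described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the names `TokenFormat`, `validation_crosses_regions` and `tokens_replicated_between_regions` are hypothetical and not part of Keystone, and the attributes encode nothing more than the properties listed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenFormat:
    """Hypothetical model of a token format, using only properties from the text."""
    name: str
    persisted: bool          # token is stored in the Keystone database
    online_validated: bool   # each API call asks the Keystone service to validate


UUID = TokenFormat("UUID", persisted=True, online_validated=True)
PKI = TokenFormat("PKI", persisted=False, online_validated=False)
FERNET = TokenFormat("Fernet", persisted=False, online_validated=True)


def validation_crosses_regions(token: TokenFormat, distributed: bool) -> bool:
    """True if validating a token requires a call to Keystone in another region.

    Only online-validated tokens call the Keystone service at all; with a
    Keystone in every region (distributed) that call stays local.
    Revocation-list traffic for PKI tokens is not modelled here.
    """
    return token.online_validated and not distributed


def tokens_replicated_between_regions(token: TokenFormat, distributed: bool) -> bool:
    """True if issued tokens themselves add to inter region replication traffic."""
    return token.persisted and distributed


if __name__ == "__main__":
    for token in (UUID, PKI, FERNET):
        for distributed in (False, True):
            mode = "distributed" if distributed else "centralized"
            print(token.name, mode,
                  validation_crosses_regions(token, distributed),
                  tokens_replicated_between_regions(token, distributed))
```

In this model only Fernet with a distributed Keystone avoids both kinds of token-related inter region traffic, which matches the reasoning in the text.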
+
+  Cryptographic tokens bring new (compared to UUID tokens) issues/use-cases
+  like key rotation and certificate revocation. Key management is out of scope
+  of this use case.
+
+Database deployment:
+
+  Database replication:
+  - Master/slave asynchronous: supported by the database server itself
+    (MySQL/MariaDB etc.), works over WAN, and is more scalable.
+  - Multi-master synchronous: Galera (or others like Percona), not so scalable
+    for multi-master writing, and needs more parameter tuning for WAN latency.
+  - Symmetrical/asymmetrical: data is replicated to all regions or to a subset;
+    in the latter case some regions need to access Keystone in another
+    region.
+
+  Database server sharing:
+  In an OpenStack controller, many databases from different services are
+  normally provided by the same database server instance. For HA reasons, the
+  database server is usually synchronously replicated to a few other nodes
+  (controllers) to form a cluster. Note that _all_ databases are replicated in
+  this case, for example when Galera sync replication is used.
+
+  Only the Keystone database can be replicated to other sites. Replicating
+  databases for other services will cause those services to get out of sync
+  and malfunction.
+
+  Since only the Keystone database is to be sync replicated to another
+  region/site, it's better to deploy the Keystone database on its own database
+  server, with extra networking requirements and cluster or replication
+  configuration. How to support this in the installer is out of scope.
+
+  The database server can be shared when async master/slave replication is
+  used, if global transaction identifiers (GTID) are enabled.
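
The difference between symmetrical and asymmetrical replication mentioned above can be illustrated with a short Python sketch. The function names and the notion of a single fallback region are assumptions made for the example; the document does not prescribe how a region without a replica selects a remote Keystone.

```python
def remote_regions(all_regions, replica_regions):
    """Regions without a local Keystone database replica, in sorted order."""
    return sorted(r for r in all_regions if r not in replica_regions)


def keystone_region_for(region, replica_regions, fallback_region):
    """Pick the region whose Keystone a service in `region` would use.

    A region holding a replica validates locally; any other region has to
    cross the WAN to `fallback_region` (a hypothetical, statically chosen one).
    """
    if fallback_region not in replica_regions:
        raise ValueError("fallback region must hold a Keystone database replica")
    return region if region in replica_regions else fallback_region


if __name__ == "__main__":
    regions = ["r1", "r2", "r3", "r4"]
    # Symmetrical: every region has a replica, so nobody goes remote.
    print(remote_regions(regions, set(regions)))
    # Asymmetrical: only r1 and r2 hold replicas.
    print(remote_regions(regions, {"r1", "r2"}))
    print(keystone_region_for("r3", {"r1", "r2"}, "r1"))
```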
+
+
+Candidate solution analysis
+------------------------------------
+
+- Keystone service (Distributed) with Fernet token
+
+  The Fernet token is a very new format, introduced only recently. The biggest
+  gains of this token format are: 1) it is lightweight, small enough to be
+  carried in the API request, unlike the PKI token (as the number of sites
+  increases, the endpoint list grows and the token becomes too long to carry in
+  the API request); 2) no token persistence, which means the DB changes little
+  and holds only light-weight data (just project, user, domain, endpoint etc.).
+  The drawback of the Fernet token is that the token has to be validated by
+  Keystone for each API request.
+
+  This allows the Keystone DB to work as a cluster across multiple sites (for
+  example, using a MySQL Galera cluster). That means installing a Keystone API
+  server in each site, all sharing the same backend DB cluster. Because the DB
+  cluster synchronizes data in real time across sites, all Keystone servers see
+  the same data.
+
+  Because each site has Keystone installed and all data is kept the same,
+  all token validation can be done locally in the same site.
+
+  The challenge for this solution is how many sites the DB cluster can
+  support. The question was asked to the MySQL Galera developers; their answer
+  is that there is no number/distance/network latency limitation in the code.
+  In practice, however, they have seen a case of a MySQL cluster used in 5 data
+  centers, each data center with 3 nodes.
+
+  This solution will be very good for a limited number of sites that the DB
+  cluster can cover well.
+
+- Keystone service (Distributed) with Fernet token + Async replication
+  (multi-cluster mode).
+
+  We may have several Keystone clusters with Fernet tokens, for example
+  cluster 1 (site 1, site 2, ..., site 10) and cluster 2 (site 11, site 12,
+  ..., site 20), with asynchronous DB replication among the clusters.
+
+  A prototype of this has been done.
+  In some blogs this is called "hybrid replication". Architecturally you have
+  a master region where you do Keystone writes. The other regions are read-only.
+  http://severalnines.com/blog/deploy-asynchronous-slave-galera-mysql-easy-way
+  http://severalnines.com/blog/replicate-mysql-server-galera-cluster
+
+  Only one DB cluster (the master DB cluster) is allowed to write (it still
+  spans multiple sites, but not all sites); the other clusters wait for
+  replication. Inside the master cluster, writes are allowed in multiple
+  regions because of the distributed lock in the DB. Note the challenge of key
+  distribution and rotation for Fernet tokens; refer to these two blogs:
+  http://lbragstad.com/?p=133, http://lbragstad.com/?p=156
+
+- Keystone service (Distributed) with Fernet token + Async replication
+  (star-mode).
+
+  One master Keystone cluster with Fernet tokens spans two sites (for site
+  level high availability); every other site is installed with at least 2 slave
+  nodes, each configured with DB async replication from a master cluster
+  member, with one slave's master node in site 1 and the other slave's master
+  node in site 2.
+
+  Only the master cluster nodes are allowed to write; the slave nodes wait for
+  replication from their master cluster member (with very little delay). The
+  challenge of key distribution and rotation for Fernet tokens still has to be
+  settled; refer to these two blogs: http://lbragstad.com/?p=133,
+  http://lbragstad.com/?p=156
+
+  Pros.
+  Why a cluster in the master sites? The cluster has many master nodes, so
+  that async replication to more slaves can be done in parallel. Why two sites
+  for the master cluster? To provide higher reliability (site level) for write
+  requests.
+  Why use multiple slaves in the other sites?
+  A slave has no knowledge of other slaves, so multiple slaves in one site are
+  easier to manage than a cluster; they work independently but still provide
+  multi-instance redundancy (like a cluster, but independent).
+
+  Cons. Key distribution/rotation has to be managed.
+
+- Keystone service (Distributed) with PKI token
+
+  The PKI token has one great advantage: token validation can be done locally,
+  without sending a token validation request to the Keystone server. The
+  drawbacks of the PKI token are: 1) the size of the endpoint list in the
+  token. If a project is spread over only a very limited number of sites
+  (regions), we can use the endpoint filter to reduce the token size and make
+  it workable even with a lot of sites in the cloud. 2) The Keystone middleware
+  (the old Keystone client, co-located with Nova/xxx-API) has to request the
+  revoke-list from the Keystone server frequently, in order to reject malicious
+  API requests, for example when a user has been deactivated but uses an old
+  token to access an OpenStack service.
+
+  For this solution, besides the issues above, we also need to provide a
+  Keystone Active-Active mode across sites to reduce the impact of site
+  failure. The revoke-list is requested very frequently, so the performance of
+  the Keystone server also needs to be taken care of.
+
+  Site level Keystone load balancing is required to provide site level
+  redundancy. Otherwise the Keystone middleware will not switch requests to a
+  healthy Keystone server in time.
+
+  This solution can be used in some scenarios, especially when a project is
+  spread over only a limited number of sites (regions).
+
+  The distribution/revocation of certificates to each site / API server for
+  token validation is also required.
+
+- Keystone service (Distributed) with UUID token
+
+  Because each token validation is sent to the Keystone server, and token
+  persistence also makes the DB larger than with Fernet tokens, UUID is not as
+  good as the Fernet token for providing a distributed Keystone service.
+  UUID is a solution better suited to small scale deployments inside one site.
+
+  Cons: UUID tokens are persistently stored for authorization and revocation
+  purposes. Every issued token changes the database, and these frequent
+  changes lead to a lot of inter region replication traffic when the database
+  is replicated.
+
+- Keystone service (Distributed) with Fernet token + Keystone federation: you
+  have to accept the drawbacks of Keystone federation if you have a lot of
+  sites/regions. Please refer to the Keystone federation item below.
+
+- Keystone federation
+  In this solution, we can install a Keystone service in each site, each with
+  its own database. Because the Keystone IdP and SP have to know each other,
+  the configuration needs to be done accordingly: set up the
+  role/domain/group mapping and create the corresponding region for each pair.
+  As the number of sites increases, if each user is able to access all sites,
+  a full-mesh mapping/configuration has to be done: with n sites, n*(n-1)
+  site mappings have to be configured. The complexity grows quickly as the
+  number of sites increases.
+
+  Keystone federation is mainly meant for different cloud admins to borrow/rent
+  resources from each other, for example company A and company B, or private
+  cloud A and public cloud B, both using OpenStack based clouds. Therefore a
+  lot of mapping and configuration has to be done to make it work.
+
+- Keystone service (Centralized) with Fernet token
+
+  Cons: inter region traffic for token validation; token validation requests
+  from all other sites have to be sent to the centralized site. Too frequent
+  inter region traffic.
+
+- Keystone service (Centralized) with PKI token
+
+  Cons: inter region traffic for token revocation list management; the token
+  revocation list requests from all other sites have to be sent to the
+  centralized site. Too frequent inter region traffic.
+
+- Keystone service (Centralized) with UUID token
+
+  Cons: inter region traffic for token validation; the token validation
+  requests from all other sites have to be sent to the centralized site. Too
+  frequent inter region traffic.
+
+Prototype
+-----------
+  A prototype of the candidate solution "Keystone service (Distributed) with
+  Fernet token + Async replication (multi-cluster mode)" has been executed by
+  Hans Feldt and Chaoyi Huang, please refer to https://github.com/hafe/dockers/ .
+  One issue was found: "Can't specify identity endpoint for token validation
+  among several keystone servers in keystonemiddleware"; please refer to the
+  Gaps section.
+
+Gaps
+====
+  Can't specify identity endpoint for token validation among several keystone
+  servers in keystonemiddleware.
+
+
+**NAME-THE-MODULE issues:**
+
+* keystonemiddleware
+
+  * Can't specify identity endpoint for token validation among several keystone
+    servers in keystonemiddleware:
+    https://bugs.launchpad.net/keystone/+bug/1488347
+
+Affected By
+-----------
+  OPNFV multisite cloud.
+
+Conclusion
+-----------
+
+  The prototype demonstrates that cluster level async replication and Fernet
+  token validation in the local site are feasible. The candidate solution
+  "Keystone service (Distributed) with Fernet token + Async replication
+  (star-mode)" is a simplified version of the prototyped one; it is much easier
+  to deploy and maintain, with better scalability.
+
+  Therefore the candidate solution "Keystone service (Distributed) with Fernet
+  token + Async replication (star-mode)" is recommended for the multisite OPNFV
+  cloud.
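
The scalability argument behind this recommendation can be made concrete with a rough count. The sketch below compares the n*(n-1) federation mappings stated in the candidate analysis with the number of async replication links in star mode. The link-counting model is an assumption for illustration only: it takes the two master sites and the minimum of 2 slave nodes per other site from the star-mode description, gives each slave one link to a master cluster member, and ignores the traffic inside the master cluster itself.

```python
def federation_mappings(sites: int) -> int:
    """Full-mesh Keystone federation: n*(n-1) site mappings, as stated in the text."""
    return sites * (sites - 1)


def star_replication_links(sites: int, master_sites: int = 2,
                           slaves_per_site: int = 2) -> int:
    """Async replication links in star mode (simplified, hypothetical model).

    Every non-master site runs `slaves_per_site` slave nodes, and each slave
    replicates from exactly one master cluster member.
    """
    if sites < master_sites:
        raise ValueError("need at least as many sites as master sites")
    return (sites - master_sites) * slaves_per_site


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, federation_mappings(n), star_replication_links(n))
    # prints:
    # 5 20 6
    # 10 90 16
    # 20 380 36
```

Under these assumptions the federation configuration effort grows quadratically with the number of sites, while the star-mode replication links grow linearly.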
+
+References
+==========
+
+  There are 3 token formats (UUID, PKI/PKIZ, Fernet) provided by Keystone; this
+  blog gives a very good description, benchmark and comparison:
+  http://dolphm.com/the-anatomy-of-openstack-keystone-token-formats/
+  http://dolphm.com/benchmarking-openstack-keystone-token-formats/
+
+  To understand the benefits and shortcomings of PKI/PKIZ tokens, please refer to:
+  https://www.mirantis.com/blog/understanding-openstack-authentication-keystone-pk
+
+  To understand Keystone federation and how to use it:
+  http://blog.rodrigods.com/playing-with-keystone-to-keystone-federation/
+
+  To integrate Keystone with an external, enterprise-ready authentication system:
+  https://blog-nkinder.rhcloud.com/?p=130
+
+  Key replication used with Keystone Fernet tokens:
+  http://lbragstad.com/?p=133,
+  http://lbragstad.com/?p=156
+
+  Keystone revoke:
+  http://specs.openstack.org/openstack/keystone-specs/api/v3/identity-api-v3-os-revoke-ext.html