Diffstat (limited to 'cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike')
4 files changed, 877 insertions, 0 deletions
diff --git a/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst
new file mode 100644
index 0000000..f47ae11
--- /dev/null
+++ b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst
@@ -0,0 +1,164 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================================
+ Cyborg Agent Proposal
+==========================================
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-agent
+
+This spec proposes the responsibilities and initial design of the
+Cyborg Agent.
+
+Problem description
+===================
+
+Cyborg requires an agent on the compute hosts to manage several
+responsibilities, including locating accelerators, monitoring their
+status, and orchestrating driver operations.
+
+Use Cases
+---------
+
+Use of accelerators attached to virtual machine instances in OpenStack
+
+Proposed change
+===============
+
+Cyborg Agent resides on various compute hosts and monitors them for accelerators.
+On its first run Cyborg Agent will run the detect accelerator functions of all
+its installed drivers. The resulting list of accelerators available on the host
+will be reported to the conductor where it will be stored into the database and
+listed during API requests. By default accelerators will be inserted into the
+database in an inactive state. It will be up to the operators to manually set
+an accelerator to 'ready' at which point Cyborg Agent will be responsible for
+calling the driver's install function and ensuring that the accelerator is ready
+for use.
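A minimal sketch of the first-run flow described in the paragraph above, offered as an illustration and not as part of the committed spec. The `Driver` interface, the `Agent` class, the `FakeDriver` helper, and the method names `detect`, `install`, `first_run`, and `set_ready` are all hypothetical; the spec only says that drivers expose a detect function and an install function, and that accelerators start out inactive until an operator marks them 'ready'.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Driver(Protocol):
    """Hypothetical driver interface; only 'detect' and 'install' are implied by the spec."""

    def detect(self) -> List[str]: ...

    def install(self, accelerator_id: str) -> None: ...


@dataclass
class Agent:
    drivers: List[Driver]
    # accelerator id -> state, standing in for what would be reported to the conductor
    states: Dict[str, str] = field(default_factory=dict)
    # accelerator id -> driver that detected it
    owners: Dict[str, Driver] = field(default_factory=dict)

    def first_run(self) -> Dict[str, str]:
        """Run every driver's detect function and record results as 'inactive'."""
        for driver in self.drivers:
            for acc_id in driver.detect():
                self.states[acc_id] = "inactive"
                self.owners[acc_id] = driver
        return dict(self.states)

    def set_ready(self, acc_id: str) -> None:
        """Operator action: call the owning driver's install function, then mark ready."""
        self.owners[acc_id].install(acc_id)
        self.states[acc_id] = "ready"


class FakeDriver:
    """Stand-in driver used only to exercise the sketch."""

    def __init__(self, ids: List[str]) -> None:
        self.ids = list(ids)
        self.installed: List[str] = []

    def detect(self) -> List[str]:
        return list(self.ids)

    def install(self, accelerator_id: str) -> None:
        self.installed.append(accelerator_id)
```

In this sketch the agent never promotes an accelerator on its own; the state only changes when `set_ready` is called, which matches the spec's intent of leaving activation to operators.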
+
+In order to mirror the current Nova model of using the placement API each Agent
+will send updates on its resources directly to the placement API endpoint as well
+as to the conductor for usage aggregation. This should keep the placement API up to date
+on accelerators and their usage.
+
+Alternatives
+------------
+
+There are lots of alternate ways to lay out the communication between the Agent
+and the API endpoint or the driver. Almost all of them involve exactly where we
+draw the line between the driver, Conductor, and Agent. I've written my proposal
+with the goal of having the Agent act mostly as a monitoring tool, reporting to
+the cloud operator or other Cyborg components to take action. A more active role
+for Cyborg Agent is possible but either requires significant synchronization with
+the Conductor or potentially steps on the toes of operators.
+
+Data model impact
+-----------------
+
+Cyborg Agent will create new entries in the database for accelerators it detects.
+It will also update those entries with the current status of the accelerator
+at a high level. More temporary data like the current usage of a given accelerator
+will be broadcast via a message passing system and won't be stored.
+
+Cyborg Agent will retain a local cache of this data with the goal of not losing accelerator
+state on system interruption or loss of connection.
+
+
+REST API impact
+---------------
+
+TODO once we firm up who's responsible for what.
+
+Security impact
+---------------
+
+Monitoring capability might be useful to an attacker, but without root
+this is a fairly minor concern.
+
+Notifications impact
+--------------------
+
+Notifying users that their accelerators are ready?
+
+Other end user impact
+---------------------
+
+Interaction details around adding/removing/setting up accelerators
+are TBD.
+
+Performance Impact
+------------------
+
+Agent heartbeat for updated accelerator performance stats might make
+scaling to many accelerator hosts a challenge for the Cyborg endpoint
+and database. Perhaps we should consider doing an active 'load census'
+before scheduling instances? But that just moves the problem from constant
+load to issues with a boot storm.
+
+
+Other deployer impact
+---------------------
+
+By not placing the drivers with the Agent we keep the deployment footprint
+pretty small. We do add development complexity and security concerns sending
+them over the wire though.
+
+Developer impact
+----------------
+
+TBD
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  <jkilpatr>
+
+Other contributors:
+  <launchpad-id or None>
+
+Work Items
+----------
+
+* Agent implementation
+
+Dependencies
+============
+
+* Cyborg Driver Spec
+* Cyborg API Spec
+* Cyborg Conductor Spec
+
+Testing
+=======
+
+CI infrastructure with a set of accelerators, drivers, and hardware will be
+required for testing the Agent installation and operation regularly.
+
+Documentation Impact
+====================
+
+Little to none. Perhaps an on-compute config file may need to be
+documented. But I think it's best to avoid local configuration where possible.
+
+References
+==========
+
+Other Cyborg Specs
+
+History
+=======
+
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release
+     - Description
+   * - Pike
+     - Introduced
diff --git a/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-api-proposal.rst b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-api-proposal.rst
new file mode 100644
index 0000000..42ddad7
--- /dev/null
+++ b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-api-proposal.rst
@@ -0,0 +1,410 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+===================
+Cyborg API proposal
+===================
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-api
+
+This spec proposes to provide the initial API design for Cyborg.
+
+Problem description
+===================
+
+Cyborg as a common management framework for dedicated devices (hardware/
+software accelerators, high-speed storage, etc) needs a RESTful API to expose
+the basic functionalities.
+
+Use Cases
+---------
+
+* As a user I want to be able to spawn a VM with dedicated hardware, so
+  that I can utilize provided hardware.
+* As a compute service I need to know how the requested resource should be
+  attached to the VM.
+* As a scheduler service I'd like to know on which resource provider
+  the requested resource can be found.
+
+Proposed change
+===============
+
+In general we want to develop the APIs that support basic life cycle management
+for Cyborg.
+
+Life Cycle Management Phases
+----------------------------
+
+For Cyborg, LCM phases include typical create, retrieve, update, delete operations.
+Note that deprovisioning mainly refers to the detach (delete) operation,
+which deactivates an acceleration capability but preserves the resource itself
+for future use. For Cyborg, from a functional point of view, the LCM includes provision,
+attach, update, list, and detach. There is no notion of deprovisioning for the Cyborg API
+in the sense that we decommission or disconnect an entire accelerator device from
+the bus.
+
+Difference between Provision and Attach/Detach
+----------------------------------------------
+
+Note that while the APIs support provisioning via CRUD operations, attach/detach
+are considered different:
+
+* Provision operations (create) will involve the
+  api->conductor->agent->driver workflow, whereas attach/detach (update/delete) could be taken
+  care of at the driver layer without the involvement of the aforementioned workflow. This
+  is similar to the difference between creating a volume and attaching/detaching a volume in Cinder.
+
+* Attach/detach in the Cyborg API will mainly involve DB status modification.
+
+Difference between Attach/Detach To VM and Host
+-----------------------------------------------
+
+Moreover there are also differences when we attach an accelerator to a VM or
+a host, similar to Cinder.
+
+* When the attachment happens to a VM, we are expecting that Nova could call
+  the virt driver to perform the action for the instance. In this case Nova
+  needs to support the acc-attach and acc-detach actions.
+
+* When the attachment happens to a host, we are expecting that Cyborg could
+  take care of the action itself via the Cyborg driver. Although currently there
+  is the generic driver to accomplish the job, we should consider an os-brick-like
+  standalone library for accelerator attach/detach operations.
+
+Alternatives
+------------
+
+* For attaching an accelerator to a VM, we could let Cyborg perform the action
+  itself; however, this risks tight coupling with Nova, from which Cyborg
+  would need to get instance-related information.
+* For attaching an accelerator to a host, we could consider using Ironic drivers;
+  however, this might not work well with the standalone accelerator rack scenarios where
+  accelerators are not attached to a server at all.
+
+Data model impact
+-----------------
+
+A new table in the API database will be created::
+
+    CREATE TABLE accelerators (
+        accelerator_id INT NOT NULL,
+        device_type STRING NOT NULL,
+        acc_type STRING NOT NULL,
+        acc_capability STRING NOT NULL,
+        vendor_id STRING,
+        product_id STRING,
+        remotable INT
+    );
+
+Note that there is an ongoing discussion on new nested resource
+provider data structures that will impact the Cyborg DB
+implementation. The code should be aligned with the resource
+provider DB requirements as much as possible.
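To make the proposed `accelerators` table concrete, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. It is illustrative only: the spec does not name a database engine, the `STRING` columns are mapped to `TEXT`, the `PRIMARY KEY` on `accelerator_id` is an added assumption, and the inserted values (`"ASIC"`, `"ipsec"`) are made-up sample data.

```python
import sqlite3

# Column names follow the spec; TEXT/INTEGER types and the primary key are assumptions.
SCHEMA = """
CREATE TABLE accelerators (
    accelerator_id INTEGER NOT NULL PRIMARY KEY,
    device_type    TEXT NOT NULL,
    acc_type       TEXT NOT NULL,
    acc_capability TEXT NOT NULL,
    vendor_id      TEXT,
    product_id     TEXT,
    remotable      INTEGER
)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the accelerators table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


conn = make_db()
# vendor_id and product_id are nullable, so they can be omitted on insert.
conn.execute(
    "INSERT INTO accelerators "
    "(accelerator_id, device_type, acc_type, acc_capability, remotable) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "CRYPTO", "ASIC", "ipsec", 0),
)
row = conn.execute(
    "SELECT device_type, vendor_id FROM accelerators WHERE accelerator_id = 1"
).fetchone()
```

Here `row` holds the stored device type and a `None` for the unset `vendor_id`, showing which columns the schema treats as optional.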
+
+
+REST API impact
+---------------
+
+The API changes add resource endpoints to:
+
+* `GET` a list of all the accelerators
+* `GET` a single accelerator for a given id
+* `POST` create a new accelerator resource
+* `PUT` an update to an existing accelerator spec
+* `PUT` attach an accelerator to a VM or a host
+* `DELETE` detach an existing accelerator for a given id
+
+The following new REST API calls will be created:
+
+'GET /accelerators'
+*******************
+
+Return a list of accelerators managed by Cyborg
+
+Example message body of the response to the GET operation::
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "accelerator":[
+        {
+          "uuid":"8e45a2ea-5364-4b0d-a252-bf8becaa606e",
+          "acc_specs":
+          {
+            "remote":0,
+            "num":1,
+            "device_type":"CRYPTO",
+            "acc_capability":
+            {
+              "num":2,
+              "ipsec":
+              {
+                "aes":
+                {
+                  "3des":50,
+                  "num":1
+                }
+              }
+            }
+          }
+        },
+        {
+          "uuid":"eaaf1c04-ced2-40e4-89a2-87edded06d64",
+          "acc_specs":
+          {
+            "remote":0,
+            "num":1,
+            "device_type":"CRYPTO",
+            "acc_capability":
+            {
+              "num":2,
+              "ipsec":
+              {
+                "aes":
+                {
+                  "3des":40,
+                  "num":1
+                }
+              }
+            }
+          }
+        }
+      ]
+    }
+
+'GET /accelerators/{uuid}'
+**************************
+
+Retrieve information for the accelerator identified by '{uuid}'
+
+Example GET Request::
+
+    GET /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+    200 OK
+    Content-Type: application/json
+
+    {
+      "uuid":"8e45a2ea-5364-4b0d-a252-bf8becaa606e",
+      "acc_specs":{
+        "remote":0,
+        "num":1,
+        "device_type":"CRYPTO",
+        "acc_capability":{
+          "num":2,
+          "ipsec":{
+            "aes":{
+              "3des":50,
+              "num":1
+            }
+          }
+        }
+      }
+    }
+
+If the accelerator does not exist a `404 Not Found` must be
+returned.
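The lookup behaviour of `GET /accelerators/{uuid}` can be sketched without any web framework. This is only an illustration under stated assumptions: the in-memory `ACCELERATORS` store and the `get_accelerator` function are hypothetical names, and the real service would read from the conductor or database instead of a dictionary.

```python
import json
from typing import Dict, Tuple

# Hypothetical in-memory store keyed by accelerator UUID (sample data from the spec).
ACCELERATORS: Dict[str, dict] = {
    "8e45a2ea-5364-4b0d-a252-bf8becaa606e": {
        "remote": 0,
        "num": 1,
        "device_type": "CRYPTO",
    },
}


def get_accelerator(uuid: str, store: Dict[str, dict] = ACCELERATORS) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a single accelerator lookup."""
    specs = store.get(uuid)
    if specs is None:
        # Unknown UUID: the spec requires 404 Not Found.
        return 404, ""
    return 200, json.dumps({"uuid": uuid, "acc_specs": specs})
```

Returning a status and body pair keeps the sketch independent of any particular HTTP library; a real handler would map these onto its framework's response object.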
+
+'POST /accelerators/{uuid}'
+***************************
+
+Create a new accelerator
+
+Example POST Request::
+
+    Content-type: application/json
+
+    {
+        "name": "IPSec Card",
+        "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e"
+    }
+
+The body of the request must match the following JSONSchema document::
+
+    {
+        "type": "object",
+        "properties": {
+            "name": {
+                "type": "string"
+            },
+            "uuid": {
+                "type": "string",
+                "format": "uuid"
+            }
+        },
+        "required": [
+            "name"
+        ],
+        "additionalProperties": false
+    }
+
+The response body is empty. The headers include a location header
+pointing to the created accelerator resource::
+
+    201 Created
+    Location: /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+A `409 Conflict` response code will be returned if another accelerator
+exists with the provided name.
+
+'PUT /accelerators/{uuid}/{acc_spec}'
+*************************************
+
+Update the spec for the accelerator identified by `{uuid}`.
+
+Example::
+
+    PUT /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+    Content-type: application/json
+
+    {
+      "acc_specs":{
+        "remote":0,
+        "num":1,
+        "device_type":"CRYPTO",
+        "acc_capability":{
+          "num":2,
+          "ipsec":{
+            "aes":{
+              "3des":50,
+              "num":1
+            }
+          }
+        }
+      }
+    }
+
+The returned HTTP response code will be one of the following:
+
+* `200 OK` if the spec is successfully updated
+* `404 Not Found` if the accelerator identified by `{uuid}` was
+  not found
+* `400 Bad Request` for bad or invalid syntax
+* `409 Conflict` if another process updated the same spec.
+
+
+'PUT /accelerators/{uuid}'
+**************************
+
+Attach the accelerator identified by `{uuid}`.
+
+Example::
+
+    PUT /accelerators/8e45a2ea-5364-4b0d-a252-bf8becaa606e
+
+    Content-type: application/json
+
+    {
+        "name": "IPSec Card",
+        "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e"
+    }
+
+The returned HTTP response code will be one of the following:
+
+* `200 OK` if the accelerator is successfully attached
+* `404 Not Found` if the accelerator identified by `{uuid}` was
+  not found
+* `400 Bad Request` for bad or invalid syntax
+* `409 Conflict` if another process attaches the same accelerator.
+
+
+'DELETE /accelerators/{uuid}'
+*****************************
+
+Detach the accelerator identified by `{uuid}`.
+
+The body of the request and the response is empty.
+
+The returned HTTP response code will be one of the following:
+
+* `204 No Content` if the request was successful and the accelerator was detached.
+* `404 Not Found` if the accelerator identified by `{uuid}` was
+  not found.
+* `409 Conflict` if allocation records exist for any of the
+  accelerator resources that would be detached as a result of detaching the accelerator.
+
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+Developers can use this REST API after it has been implemented.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  zhipengh <huangzhipeng@huawei.com>
+
+Work Items
+----------
+
+* Implement the APIs specified in this spec
+* Proposal to Nova about the new accelerator
+  attach/detach API
+* Implement the DB specified in this spec
+
+
+Dependencies
+============
+
+None.
+
+Testing
+=======
+
+* Unit tests will be added to the Cyborg API.
+
+Documentation Impact
+====================
+
+None
+
+References
+==========
+
+None
+
+History
+=======
+
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release
+     - Description
+   * - Pike
+     - Introduced
diff --git a/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-conductor.rst b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-conductor.rst
new file mode 100644
index 0000000..a1e8ffc
--- /dev/null
+++ b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-conductor.rst
@@ -0,0 +1,142 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================================
+ Cyborg Conductor Proposal
+==========================================
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-agent
+
+This spec proposes the responsibilities and initial design of the
+Cyborg Conductor.
+
+Problem description
+===================
+
+Cyborg requires a conductor on the controller hosts to manage the Cyborg
+system state and coalesce database operations.
+
+Use Cases
+---------
+
+Use of accelerators attached to virtual machine instances in OpenStack
+
+Proposed change
+===============
+
+Cyborg Conductor will reside on the control node and will be
+responsible for stateful actions taken by Cyborg. It acts as both a cache for
+the database and as a method of combining reads and writes to the database.
+All other Cyborg components will go through the conductor for database operations.
+
+Alternatives
+------------
+
+Having each Cyborg Agent instance hit the database on its own is a possible
+alternative, and it may even be feasible if the accelerator load monitoring rate is
+very low and the vast majority of operations are reads. But since we intend to store
+metadata about accelerator usage updated regularly this model probably will not scale
+well.
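One way to picture the conductor "combining reads and writes" is a small write coalescer: agents report usage often, but only the latest value per accelerator is written out in each batch. The sketch below is a hypothetical illustration, not the conductor's actual design; the class name `WriteCoalescer`, its methods, and the `sink` callback standing in for a database write are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Batch = List[Tuple[str, dict]]


class WriteCoalescer:
    """Keep only the newest usage report per accelerator until the next flush."""

    def __init__(self, sink: Callable[[Batch], None]) -> None:
        # 'sink' stands in for a single batched database write.
        self._sink = sink
        self._pending: Dict[str, dict] = {}

    def report(self, acc_id: str, usage: dict) -> None:
        """Record a usage update, overwriting any unflushed update for the same id."""
        self._pending[acc_id] = usage

    def flush(self) -> int:
        """Write all pending updates as one batch and return how many were written."""
        batch = sorted(self._pending.items())
        self._pending.clear()
        if batch:
            self._sink(batch)
        return len(batch)
```

With this shape, three agent reports covering two accelerators produce one batched write of two rows, which is the kind of reduction in database load the Alternatives paragraph is arguing for.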
+
+Data model impact
+-----------------
+
+Using the conductor 'properly' will result in little or no per-instance state, with stateful
+operations moving through the conductor, except for some local caching where it
+can be guaranteed to work well.
+
+REST API impact
+---------------
+
+N/A
+
+Security impact
+---------------
+
+Negligible
+
+Notifications impact
+--------------------
+
+N/A
+
+Other end user impact
+---------------------
+
+Faster Cyborg operation and less database load.
+
+Performance Impact
+------------------
+
+Generally positive so long as we don't overload the messaging bus trying
+to pass things to the Conductor to write out.
+
+Other deployer impact
+---------------------
+
+Conductor must be installed and configured on the controllers.
+
+
+Developer impact
+----------------
+
+None for API users. Internally, heavy use of message passing will
+be required if we want to keep all system state in the controllers.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  jkilpatr
+
+Other contributors:
+  None
+
+Work Items
+----------
+
+* Implementation
+* Integration with API and Agent
+
+Dependencies
+============
+
+* Cyborg API spec
+* Cyborg Agent spec
+
+Testing
+=======
+
+It should be possible to fully test this component using unit tests and functional
+CI using the dummy driver.
+
+Documentation Impact
+====================
+
+Some configuration values that tune the save-out rate and other parameters on the controller
+will need to be documented for end users.
+
+References
+==========
+
+Cyborg API Spec
+Cyborg Agent Spec
+
+History
+=======
+
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release
+     - Description
+   * - Pike
+     - Introduced
diff --git a/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-driver-proposal.rst b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-driver-proposal.rst
new file mode 100644
index 0000000..4fabf56
--- /dev/null
+++ b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-driver-proposal.rst
@@ -0,0 +1,161 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==============================
+Cyborg Generic Driver Proposal
+==============================
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/generic-driver-cyborg
+
+This spec proposes to provide the initial design for Cyborg's generic driver.
+
+Problem description
+===================
+
+This blueprint proposes to add a generic driver for openstack-cyborg.
+The goal is to provide users & operators with a reliable generic
+implementation that is hardware agnostic and provides basic
+accelerator functionality.
+
+Use Cases
+---------
+
+* As an admin user and a non-admin user with elevated privileges, I should be
+  able to identify and discover attached accelerator backends.
+* As an admin user and a non-admin user with elevated privileges, I should be
+  able to view services on each attached backend after the agent has
+  discovered services on each backend.
+* As an admin user and a non-admin user, I should be able to list and update
+  attached accelerators by driver by querying Nova with the Cyborg API.
+* As an admin user and a non-admin user with elevated privileges, I should be
+  able to install the accelerator generic driver.
+* As an admin user and a non-admin user with elevated privileges, I should be
+  able to uninstall the accelerator generic driver.
+* As an admin user and a non-admin user with elevated privileges, I should be
+  able to issue an attach command to the instance via the driver which gets
+  routed to Nova via the Cyborg API.
+* As an admin user and a non-admin user with elevated privileges, I should be
+  able to issue a detach command to the instance via the driver which gets
+  routed to Nova via the Cyborg API.
+
+Proposed change
+===============
+
+* Cyborg needs a reference implementation that can be used as a model for
+  future driver implementations and that will be referred to as the generic
+  driver implementation.
+* Develop the generic driver implementation that supports CRUD operations for
+  accelerators for single backend and multi backend setup scenarios.
+
+
+Alternatives
+------------
+
+None
+
+Data model impact
+-----------------
+
+* The generic driver will update the central database when any CRUD or
+  attach/detach operations take place.
+
+REST API impact
+---------------
+
+This blueprint proposes to add the following APIs:
+* cyborg install-driver <driver_id>
+* cyborg uninstall-driver <driver_id>
+* cyborg attach-instance <instance_id>
+* cyborg detach-instance <instance_id>
+* cyborg service-list
+* cyborg driver-list
+* cyborg update-driver <driver_id>
+* cyborg discover-services
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+Developers will have access to a reference generic implementation which
+can be used to build vendor-specific drivers.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Rushil Chugh <rushil.chugh@gmail.com>
+
+Work Items
+----------
+
+This change would entail the following:
+* Add a feature to identify and discover attached accelerator backends.
+* Add a feature to list services running on the backend.
+* Add a feature to attach accelerators to the generic backend.
+* Add a feature to detach accelerators from the generic backend.
+* Add a feature to list accelerators attached to the generic backend.
+* Add a feature to modify accelerators attached to the generic backend.
+* Define a reference implementation detailing the flow of requests between
+  the cyborg-api, cyborg-conductor and nova-compute services.
+
+Dependencies
+============
+
+Dependent on Cyborg API and Agent implementations.
+
+Testing
+=======
+
+* Unit tests will be added to test the Cyborg generic driver.
+
+Documentation Impact
+====================
+
+None
+
+References
+==========
+
+None
+
+History
+=======
+
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release
+     - Description
+   * - Pike
+     - Introduced