summaryrefslogtreecommitdiffstats
path: root/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst
diff options
context:
space:
mode:
Diffstat (limited to 'cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst')
-rw-r--r--cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst164
1 files changed, 164 insertions, 0 deletions
diff --git a/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst
new file mode 100644
index 0000000..f47ae11
--- /dev/null
+++ b/cyborg_enhancement/mitaka_version/cyborg/doc/source/devdoc/specs/pike/approved/cyborg-agent.rst
@@ -0,0 +1,164 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================================
+ Cyborg Agent Proposal
+==========================================
+
+https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-agent
+
+This spec proposes the responsibilities and initial design of the
+Cyborg Agent.
+
+Problem description
+===================
+
+Cyborg requires an agent on the compute hosts to manage the several
+responsibilities, including locating accelerators, monitoring their
+status, and orchestrating driver operations.
+
+Use Cases
+---------
+
+Use of accelerators attached to virtual machine instances in OpenStack
+
+Proposed change
+===============
+
+Cyborg Agent resides on various compute hosts and monitors them for accelerators.
+On it's first run Cyborg Agent will run the detect accelerator functions of all
+it's installed drivers. The resulting list of accelerators available on the host
+will be reported to the conductor where it will be stored into the database and
+listed during API requests. By default accelerators will be inserted into the
+database in a inactive state. It will be up to the operators to manually set
+an accelerator to 'ready' at which point cyborg agent will be responsible for
+calling the drivers install function and ensuring that the accelerator is ready
+for use.
+
+In order to mirror the current Nova model of using the placement API each Agent
+will send updates on it's resources directly to the placement API endpoint as well
+as to the conductor for usage aggregation. This should keep placement API up to date
+on accelerators and their usage.
+
+Alternatives
+------------
+
+There are lots of alternate ways to lay out the communication between the Agent
+and the API endpoint or the driver. Almost all of them involving exactly where we
+draw the line between the driver, Conductor , and Agent. I've written my proposal
+with the goal of having the Agent act mostly as a monitoring tool, reporting to
+the cloud operator or other Cyborg components to take action. A more active role
+for Cyborg Agent is possible but either requires significant synchronization with
+the Conductor or potentially steps on the toes of operators.
+
+Data model impact
+-----------------
+
+Cyborg Agent will create new entries in the database for accelerators it detects
+it will also update those entries with the current status of the accelerator
+at a high level. More temporary data like the current usage of a given accelerator
+will be broadcast via a message passing system and won't be stored.
+
+Cyborg Agent will retain a local cache of this data with the goal of not losing accelerator
+state on system interruption or loss of connection.
+
+
+REST API impact
+---------------
+
+TODO once we firm up who's responsible for what.
+
+Security impact
+---------------
+
+Monitoring capability might be useful to an attacker, but without root
+this is a fairly minor concern.
+
+Notifications impact
+--------------------
+
+Notifying users that their accelerators are ready?
+
+Other end user impact
+---------------------
+
+Interaction details around adding/removing/setting up accelerators
+details TBD.
+
+Performance Impact
+------------------
+
+Agent heartbeat for updated accelerator performance stats might make
+scaling to many accelerator hosts a challenge for the Cyborg endpoint
+and database. Perhaps we should consider doing an active 'load census'
+before scheduling instances? But that just moves the problem from constant
+load to issues with a bootstorm.
+
+
+Other deployer impact
+---------------------
+
+By not placing the drivers with the Agent we keep the deployment footprint
+pretty small. We do add development complexity and security concerns sending
+them over the wire though.
+
+Developer impact
+----------------
+
+TBD
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+ <jkilpatr>
+
+Other contributors:
+ <launchpad-id or None>
+
+Work Items
+----------
+
+* Agent implementation
+
+Dependencies
+============
+
+* Cyborg Driver Spec
+* Cyborg API Spec
+* Cyborg Conductor Spec
+
+Testing
+=======
+
+CI infrastructure with a set of accelerators, drivers, and hardware will be
+required for testing the Agent installation and operation regularly.
+
+Documentation Impact
+====================
+
+Little to none. Perhaps on an on compute config file that may need to be
+documented. But I think it's best to avoid local configuration where possible.
+
+References
+==========
+
+Other Cyborg Specs
+
+History
+=======
+
+
+.. list-table:: Revisions
+ :header-rows: 1
+
+ * - Release
+ - Description
+ * - Pike
+ - Introduced