Merge "Update docs structure according to new guidelines in https://wiki.opnfv.org/display/DOC"

author: Ryota Mibu <r-mibu@cq.jp.nec.com> 2017-02-17 04:36:05 +0000
committer: Gerrit Code Review <gerrit@opnfv.org> 2017-02-17 04:36:05 +0000
commit: f3ab498aaddb27f6f598a84e2dbe0203ced6d666 (patch)
tree: fc7b2be2681db87adc1eb935e6fdcc93a8bc1645 /docs/development/design
parent: 5d9c24fd28bcc02243306a8c96d0c68809523343 (diff)
parent: d0b22e1d856cf8f78e152dfb6c150e001e03dd52 (diff)
7 files changed, 899 insertions, 0 deletions
diff --git a/docs/development/design/index.rst b/docs/development/design/index.rst
new file mode 100644
index 00000000..963002a0
--- /dev/null
+++ b/docs/development/design/index.rst
@@ -0,0 +1,27 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+****************
+Design Documents
+****************
+
+This is the directory to store design documents which may include draft
+versions of blueprints written before proposing to upstream OSS communities
+such as OpenStack, in order to keep the original blueprint as reviewed in
+OPNFV. That means there could be out-dated blueprints as a result of further
+refinements in the upstream OSS community. Please refer to the link in each
+document to find the latest version of the blueprint and status of development
+in the relevant OSS community.
+
+See also https://wiki.opnfv.org/requirements_projects .
+
+.. toctree::
+   :numbered:
+   :maxdepth: 4
+
+   report-host-fault-to-update-server-state-immediately.rst
+   notification-alarm-evaluator.rst
+   rfe-port-status-update.rst
+   port-data-plane-status.rst
+   inspector-design-guideline.rst
+   performance-profiler.rst
diff --git a/docs/development/design/inspector-design-guideline.rst b/docs/development/design/inspector-design-guideline.rst
new file mode 100644
index 00000000..4add8c0f
--- /dev/null
+++ b/docs/development/design/inspector-design-guideline.rst
@@ -0,0 +1,46 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+==========================
+Inspector Design Guideline
+==========================
+
+.. NOTE::
+   This is a draft spec of the design guideline for the inspector component.
+   JIRA ticket to track the update and collect comments: `DOCTOR-73`_.
+
+This document summarizes best practices for designing a high-performance
+inspector that meets the requirements of the `OPNFV Doctor project`_.
+
+Problem Description
+===================
+
+Some pitfalls have been detected during development of the sample inspector,
+e.g. a significant `performance degrading in listing VMs in a host`_.
+
+A `patch set for caching the list`_ has been committed to solve the issue. When
+a new inspector is integrated, it would be nice to evaluate the existing design
+and give recommendations for improvements.
+
+This document can be treated as a source of related blueprints in inspector
+projects.
+
+Guidelines
+==========
+
+Host specific VMs list
+----------------------
+
+TBD, see `DOCTOR-76`_.
+
+Parallel execution
+------------------
+
+TBD, see `discussion in mailing list`_.
+
+.. _DOCTOR-73: https://jira.opnfv.org/browse/DOCTOR-73
+.. _OPNFV Doctor project: https://wiki.opnfv.org/doctor
+.. _performance degrading in listing VMs in a host: https://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-September/012591.html
+.. _patch set for caching the list: https://gerrit.opnfv.org/gerrit/#/c/20877/
+.. _DOCTOR-76: https://jira.opnfv.org/browse/DOCTOR-76
+.. _discussion in mailing list: https://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-October/013036.html
diff --git a/docs/development/design/notification-alarm-evaluator.rst b/docs/development/design/notification-alarm-evaluator.rst
new file mode 100644
index 00000000..d1bf787a
--- /dev/null
+++ b/docs/development/design/notification-alarm-evaluator.rst
@@ -0,0 +1,248 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+============================
+Notification Alarm Evaluator
+============================
+
+.. NOTE::
+   This is a draft spec of a blueprint for OpenStack Ceilometer Liberty.
+   To see current version: https://review.openstack.org/172893
+   To track development activity:
+   https://blueprints.launchpad.net/ceilometer/+spec/notification-alarm-evaluator
+
+https://blueprints.launchpad.net/ceilometer/+spec/notification-alarm-evaluator
+
+This blueprint proposes adding a new alarm evaluator that handles alarms on
+events passed from other OpenStack services. It provides event-driven alarm
+evaluation, which introduces a new sequence in Ceilometer in place of the
+polling-based approach of the existing Alarm Evaluator, and delivers immediate
+alarm notification to end users.
+
+Problem description
+===================
+
+As an end user, I need to receive an alarm notification immediately once
+Ceilometer captures an event that would fire an alarm, so that I can
+perform recovery actions promptly to shorten the downtime of my service.
+The typical use case is that an end user sets an alarm on
+"compute.instance.update" in order to trigger recovery actions once the instance
+status has changed to 'shutdown' or 'error'. Ideally, an end user receives the
+notification within 1 second after the fault is observed, as other health-check
+mechanisms can do in some cases.
+
+The existing Alarm Evaluator periodically queries/polls the databases
+in order to check all alarms independently from other processes. This is a good
+approach for evaluating an alarm on samples stored over a certain period.
+However, it is not efficient for evaluating an alarm on events which are emitted
+by other OpenStack services once in a while.
+
+The periodic evaluation delays the sending of alarm notifications to users.
+The default period of the evaluation cycle is 60 seconds. It is recommended that
+an operator set an interval longer than the configured pipeline interval for the
+underlying metrics, and long enough to evaluate all defined alarms
+within that period, taking into account the number of resources, users and
+alarms.
+
+Proposed change
+===============
+
+The proposal is to add a new event-driven alarm evaluator which receives
+messages from the Notification Agent, finds related alarms, and then evaluates
+each alarm:
+
+* The new alarm evaluator could receive event notifications from the Notification
+  Agent by adding a dedicated notifier as a publisher in pipeline.yaml
+  (e.g. notifier://?topic=event_eval).
+
+* When the new alarm evaluator receives an event notification, it queries the
+  alarm database by the Project ID and Resource ID written in the notification.
+
+* Found alarms are evaluated by referring to the event notification.
+
+* Depending on the result of the evaluation, those alarms would be fired through
+  the Alarm Notifier in the same way as the existing Alarm Evaluator does.
+
+This proposal also adds a new alarm type "notification" and "notification_rule".
+This enables users to create alarms on events. The separation from other alarm
+types (such as the "threshold" type) is intended to show the different timing of
+evaluation and the different format of the condition, since the new evaluator
+checks each event notification once it is received, whereas a "threshold" alarm
+can evaluate the average of values over a certain period from multiple samples.
+
+The new alarm evaluator handles notification-type alarms, so we have to change
+the existing alarm evaluator to exclude "notification" type alarms from its
+evaluation targets.
+
+Alternatives
+------------
+
+There was a similar blueprint proposal, "Alarm type based on notification", but
+the approach is different. The old proposal was to add a new step (alarm
+evaluation) in the Notification Agent every time it received an event from other
+OpenStack services, whereas this proposal intends to execute alarm evaluation
+in another component, which minimizes impact on existing pipeline processing.
+
+Another approach is to enhance the existing alarm evaluator by adding a
+notification listener. However, there are two issues: 1) this approach could
+stall periodic evaluations when it receives a burst of notifications,
+and 2) it could break alarm partitioning, i.e. when an alarm evaluator receives a
+notification, it might have to evaluate some alarms which are not assigned to it.
+
+Data model impact
+-----------------
+
+Resource ID will be added to the Alarm model as an optional attribute.
+This would help the new alarm evaluator filter out unrelated alarms
+while querying alarms; otherwise it has to evaluate all alarms in the project.
+
+REST API impact
+---------------
+
+The Alarm API will be extended as follows:
+
+* Add "notification" type into alarm type list
+* Add "resource_id" to "alarm"
+* Add "notification_rule" to "alarm"
+
+Sample data of Notification-type alarm::
+
+  {
+      "alarm_actions": [
+          "http://site:8000/alarm"
+      ],
+      "alarm_id": null,
+      "description": "An alarm",
+      "enabled": true,
+      "insufficient_data_actions": [
+          "http://site:8000/nodata"
+      ],
+      "name": "InstanceStatusAlarm",
+      "notification_rule": {
+          "event_type": "compute.instance.update",
+          "query" : [
+              {
+                  "field" : "traits.state",
+                  "type" : "string",
+                  "value" : "error",
+                  "op" : "eq"
+              }
+          ]
+      },
+      "ok_actions": [],
+      "project_id": "c96c887c216949acbdfbd8b494863567",
+      "repeat_actions": false,
+      "resource_id": "153462d0-a9b8-4b5b-8175-9e4b05e9b856",
+      "severity": "moderate",
+      "state": "ok",
+      "state_timestamp": "2015-04-03T17:49:38.406845",
+      "timestamp": "2015-04-03T17:49:38.406839",
+      "type": "notification",
+      "user_id": "c96c887c216949acbdfbd8b494863567"
+  }
+
+"resource_id" is used only to query alarms; it is not checked for permission or
+for membership in the project.
+
+Security impact
+---------------
+
+None
+
+Pipeline impact
+---------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance/Scalability Impacts
+-------------------------------
+
+When Ceilometer receives a large number of events from other OpenStack services
+in a short period, this alarm evaluator can keep working since events are queued
+in a messaging queue system, but it can delay alarm notification to users
+and increase the number of read and write accesses to the alarm database.
+
+"resource_id" can be optional, but making it mandatory could reduce the
+performance impact. If a user creates a "notification" alarm without
+"resource_id", that alarm will be evaluated every time an event occurs in the
+project, which may put a heavy load on the new evaluator.
+
+Other deployer impact
+---------------------
+
+A new service process has to be run.
+
+Developer impact
+----------------
+
+Developers should be aware that events could be notified to end users, and avoid
+passing raw infra information to end users when defining events and traits.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  r-mibu
+
+Other contributors:
+  None
+
+Ongoing maintainer:
+  None
+
+Work Items
+----------
+
+* New event-driven alarm evaluator
+
+* Add new alarm type "notification" as well as AlarmNotificationRule
+
+* Add "resource_id" to Alarm model
+
+* Modify existing alarm evaluator to filter out "notification" alarms
+
+* Add a new config parameter for the alarm request check that controls whether
+  alarms without "resource_id" are accepted
+
+Future lifecycle
+================
+
+This proposal is a key feature for providing information about cloud resources
+to end users in real time, which enables efficient integration with a user-side
+manager or Orchestrator, whereas currently this information is considered to be
+consumed by admin-side tools or services.
+Based on this change, we will seek orchestration scenarios including fault
+recovery, and add useful event definitions as well as additional traits.
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+New unit/scenario tests are required for this change.
+
+Documentation Impact
+====================
+
+* The proposed evaluator will be described in the developer documentation.
+
+* The new alarm type and how to use it will be explained in the user guide.
+
+References
+==========
+
+* OPNFV Doctor project: https://wiki.opnfv.org/doctor
+
+* Blueprint "Alarm type based on notification":
+  https://blueprints.launchpad.net/ceilometer/+spec/alarm-on-notification
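The event-driven flow proposed in notification-alarm-evaluator.rst (look up alarms by Project ID and Resource ID, then evaluate each one against the event) can be sketched as below. This is a minimal illustration under stated assumptions, not Ceilometer code: the helper names ``matching_alarms``, ``rule_matches`` and ``evaluate``, the in-memory alarm list, and the event dictionary layout are all hypothetical. Only the alarm field names are taken from the sample data in the spec, and only the "eq" operator is handled.

```python
# Hypothetical in-memory stand-in for the alarm database.
ALARMS = [
    {
        "name": "InstanceStatusAlarm",
        "type": "notification",
        "project_id": "c96c887c216949acbdfbd8b494863567",
        "resource_id": "153462d0-a9b8-4b5b-8175-9e4b05e9b856",
        "notification_rule": {
            "event_type": "compute.instance.update",
            "query": [
                {"field": "traits.state", "type": "string",
                 "value": "error", "op": "eq"},
            ],
        },
    },
]


def matching_alarms(alarms, event):
    """Select notification alarms by project and (optional) resource ID."""
    for alarm in alarms:
        if alarm["type"] != "notification":
            continue
        if alarm["project_id"] != event["project_id"]:
            continue
        # resource_id is optional: an alarm without it matches every
        # resource in the project.
        if alarm.get("resource_id") not in (None, event["resource_id"]):
            continue
        yield alarm


def rule_matches(rule, event):
    """Evaluate a notification_rule against one event ('eq' only)."""
    if rule["event_type"] != event["event_type"]:
        return False
    for cond in rule["query"]:
        # "traits.state" -> look up "state" among the event's traits.
        trait = cond["field"].partition(".")[2]
        if cond["op"] != "eq" or event["traits"].get(trait) != cond["value"]:
            return False
    return True


def evaluate(alarms, event):
    """Return the names of the alarms that the event would fire."""
    return [alarm["name"] for alarm in matching_alarms(alarms, event)
            if rule_matches(alarm["notification_rule"], event)]
```

The sketch also shows why the spec worries about alarms without "resource_id": such an alarm passes the filter for every event in its project, so it is evaluated far more often.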
diff --git a/docs/development/design/performance-profiler.rst b/docs/development/design/performance-profiler.rst
new file mode 100644
index 00000000..f834a915
--- /dev/null
+++ b/docs/development/design/performance-profiler.rst
@@ -0,0 +1,118 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+
+====================
+Performance Profiler
+====================
+
+https://goo.gl/98Osig
+
+This blueprint proposes to create a performance profiler for doctor scenarios.
+
+Problem Description
+===================
+
+In the verification job for notification time, we have encountered some
+performance issues, such as
+
+1. In an environment deployed by APEX, it meets the criteria, while in one
+   deployed by Fuel the performance is much poorer.
+2. Significant performance degradation was spotted when we increased the total
+   number of VMs.
+
+It takes time to dig through the logs and analyse the cause. People have to
+collect timestamps at each checkpoint manually to find the bottleneck. A
+performance profiler will make this process automatic.
+
+Proposed Change
+===============
+
+The current Doctor scenario covers the inspector and notifier within the whole
+fault management cycle::
+
+  start                                          end
+    +       +         +        +       +          +
+    |       |         |        |       |          |
+    |monitor|inspector|notifier|manager|controller|
+    +------>+         |        |       |          |
+  occurred  +-------->+        |       |          |
+    |     detected    +------->+       |          |
+    |       |     identified   +-------+          |
+    |       |               notified   +--------->+
+    |       |                  |    processed  resolved
+    |       |                  |                  |
+    |       +<-----doctor----->+                  |
+    |                                             |
+    |                                             |
+    +<---------------fault management------------>+
+
+The notification time can be split into several parts and visualized as a
+timeline::
+
+  start                                         end
+    0----5---10---15---20---25---30---35---40---45--> (x 10ms)
+    +    +   +   +   +    +      +   +   +   +   +
+  0-hostdown |   |   |    |      |   |   |   |   |
+    +--->+   |   |   |    |      |   |   |   |   |
+    |  1-raw failure |    |      |   |   |   |   |
+    |    +-->+   |   |    |      |   |   |   |   |
+    |    | 2-found affected      |   |   |   |   |
+    |    |   +-->+   |    |      |   |   |   |   |
+    |    |     3-marked host down|   |   |   |   |
+    |    |       +-->+    |      |   |   |   |   |
+    |    |         4-set VM error|   |   |   |   |
+    |    |           +--->+      |   |   |   |   |
+    |    |           |  5-notified VM error  |   |
+    |    |           |    +----->|   |   |   |   |
+    |    |           |    |    6-transformed event
+    |    |           |    |      +-->+   |   |   |
+    |    |           |    |      | 7-evaluated event
+    |    |           |    |      |   +-->+   |   |
+    |    |           |    |      |     8-fired alarm
+    |    |           |    |      |       +-->+   |
+    |    |           |    |      |         9-received alarm
+    |    |           |    |      |           +-->+
+  sample | sample    |    |      |           |10-handled alarm
+  monitor| inspector |nova| c/m  |    aodh   |
+    |                                        |
+    +<-----------------doctor--------------->+
+
+Note: c/m = ceilometer
+
+And a table of components sorted by time cost from most to least:
+
++----------+---------+----------+
+|Component |Time Cost|Percentage|
++==========+=========+==========+
+|inspector |160ms    | 40%      |
++----------+---------+----------+
+|aodh      |110ms    | 30%      |
++----------+---------+----------+
+|monitor   |50ms     | 14%      |
++----------+---------+----------+
+|...       |         |          |
++----------+---------+----------+
+|...       |         |          |
++----------+---------+----------+
+
+Note: data in the table is for demonstration only, not actual measurement
+
+Timestamps can be collected from various sources:
+
+1. log files
+2. trace points in code
+
+The performance profiler will be integrated into the verification job to provide
+detailed test results. It can also be deployed independently to diagnose
+performance issues in a specified environment.
+
+Working Items
+=============
+
+1. PoC with limited checkpoints
+2. Integration with verification job
+3. Collect timestamps at all checkpoints
+4. Display the profiling result in console
+5. Report the profiling result to test database
+6. Independent package which can be installed to specified environment
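The per-component table described in performance-profiler.rst could be derived from checkpoint timestamps roughly as follows. This is only a sketch: the ``profile`` function, the tuple layout, and the mapping of checkpoints to components are assumptions made for illustration, and the timestamps are invented, not measurements.

```python
from collections import defaultdict


def profile(checkpoints):
    """Aggregate elapsed time per component from ordered checkpoints.

    `checkpoints` is a list of (name, timestamp_ms, component) tuples; the
    component is charged for the time elapsed since the previous checkpoint.
    Returns (component, cost_ms, percentage) rows, most expensive first.
    """
    costs = defaultdict(float)
    for (_, prev_ts, _), (_, ts, component) in zip(checkpoints, checkpoints[1:]):
        costs[component] += ts - prev_ts
    total = sum(costs.values())
    if total == 0:
        return []
    rows = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    return [(comp, cost, 100.0 * cost / total) for comp, cost in rows]


# Illustrative timestamps only; the component mapping is an assumption.
CHECKPOINTS = [
    ("0-hostdown", 0, None),
    ("1-raw failure", 50, "monitor"),
    ("2-found affected", 120, "inspector"),
    ("3-marked host down", 210, "inspector"),
    ("4-set VM error", 250, "nova"),
]

if __name__ == "__main__":
    for component, cost, percent in profile(CHECKPOINTS):
        print("%-10s %6.0fms %5.1f%%" % (component, cost, percent))
```

Charging each interval to a single component keeps the percentages additive, which is what makes the sorted table usable for spotting the bottleneck.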
diff --git a/docs/development/design/port-data-plane-status.rst b/docs/development/design/port-data-plane-status.rst
new file mode 100644
index 00000000..06cfc3c6
--- /dev/null
+++ b/docs/development/design/port-data-plane-status.rst
@@ -0,0 +1,180 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+====================================
+Port data plane status
+====================================
+
+https://bugs.launchpad.net/neutron/+bug/1598081
+
+Neutron does not detect data plane failures affecting its logical resources.
+This spec addresses that issue by means of allowing external tools to report to
+Neutron about faults in the data plane that are affecting the ports. A new REST
+API field is proposed to that end.
+
+
+Problem Description
+===================
+
+An initial description of the problem was introduced in bug #1598081 [1_]. This
+spec focuses on capturing one (main) part of the problem described there, i.e.
+extending Neutron's REST API to cover the scenario of allowing external tools
+to report network failures to Neutron. Out of scope of this spec is work to
+enable port status changes to be received and managed by mechanism drivers.
+
+This spec also tries to address bug #1575146 [2_]. Specifically, as argued by
+the Neutron drivers team in [3_]:
+
+ * Neutron should not shut down the port completely upon detection of physnet
+   failure; instances on the same node may still be able to reach each other.
+   External tools may or may not want to trigger a status change on
+   the port based on their own logic and orchestration.
+
+ * Port down is not detected when an uplink of a switch is down;
+
+ * The physnet bridge may have multiple physical interfaces plugged; shutting
+   down the logical port may not be needed in case network redundancy is in
+   place.
+
+
+Proposed Change
+===============
+
+A couple of possible approaches were proposed in [1_] (comment #3). This spec
+proposes tackling the problem via a new extension API to the port resource.
+The extension adds a new attribute 'dp-down' (data plane down) to represent the
+status of the data plane. The field should be read-only by tenants and
+read-write by admins.
+
+Neutron should send out an event to the message bus upon toggling the data
+plane status value. The event is relevant for e.g. auditing.
+
+
+Data Model Impact
+-----------------
+
+A new attribute will be added as an extension to the 'ports' table.
+
++------------+-------+----------+---------+--------------------+--------------+
+|Attribute   |Type   |Access    |Default  |Validation/         |Description   |
+|Name        |       |          |Value    |Conversion          |              |
++============+=======+==========+=========+====================+==============+
+|dp_down     |boolean|RO, tenant|False    |True/False          |              |
+|            |       |RW, admin |         |                    |              |
++------------+-------+----------+---------+--------------------+--------------+
+
+
+REST API Impact
+---------------
+
+A new API extension to the ports resource is going to be introduced.
+
+.. code-block:: python
+
+  EXTENDED_ATTRIBUTES_2_0 = {
+      'ports': {
+          'dp_down': {'allow_post': False, 'allow_put': True,
+                      'default': False, 'convert_to': convert_to_boolean,
+                      'is_visible': True},
+      },
+  }
+
+
+Examples
+~~~~~~~~
+
+Updating port data plane status to down:
+
+.. code-block:: json
+
+   PUT /v2.0/ports/<port-uuid>
+   Accept: application/json
+   {
+       "port": {
+           "dp_down": true
+       }
+   }
+
+
+
+Command Line Client Impact
+--------------------------
+
+::
+
+  neutron port-update [--dp-down <True/False>] <port>
+  openstack port set [--dp-down <True/False>] <port>
+
+Argument --dp-down is optional. Defaults to False.
+
+
+Security Impact
+---------------
+
+None
+
+Notifications Impact
+--------------------
+
+A notification (event) upon toggling the data plane status (i.e. 'dp-down'
+attribute) value should be sent to the message bus. Such events do not happen
+with high frequency and thus no negative impact on the notification bus is
+expected.
+
+Performance Impact
+------------------
+
+None
+
+IPv6 Impact
+-----------
+
+None
+
+Other Deployer Impact
+---------------------
+
+None
+
+Developer Impact
+----------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+ * cgoncalves
+
+Work Items
+----------
+
+ * New 'dp-down' attribute in 'ports' database table
+ * API extension to introduce new field to port
+ * Client changes to allow for data plane status (i.e. 'dp-down' attribute)
+   being set
+ * Policy (tenants read-only; admins read-write)
+
+
+Documentation Impact
+====================
+
+Documentation for both administrators and end users will have to be
+contemplated. Administrators will need to know how to set/unset the data plane
+status field.
+
+
+References
+==========
+
+.. [1] RFE: Port status update,
+   https://bugs.launchpad.net/neutron/+bug/1598081
+
+.. [2] RFE: ovs port status should the same as physnet
+   https://bugs.launchpad.net/neutron/+bug/1575146
+
+.. [3] Neutron Drivers meeting, July 21, 2016
+   http://eavesdrop.openstack.org/meetings/neutron_drivers/2016/neutron_drivers.2016-07-21-22.00.html
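The PUT example in port-data-plane-status.rst can be expressed as a small client-side sketch. Note that 'dp_down' is the attribute proposed by the spec, not an existing Neutron field, and that the function name, endpoint and token used here are hypothetical. The request is only built, never sent.

```python
import json
import urllib.request


def build_dp_down_request(endpoint, port_id, dp_down, token):
    """Build (but do not send) the PUT request that sets 'dp_down'."""
    body = json.dumps({"port": {"dp_down": bool(dp_down)}}).encode("utf-8")
    return urllib.request.Request(
        url="%s/v2.0/ports/%s" % (endpoint.rstrip("/"), port_id),
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Token": token,  # admin token: the field is RW for admins only
        },
    )
```

An external monitoring tool would pass the resulting object to ``urllib.request.urlopen`` once it has decided, by its own logic, that the port's data plane is down.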
diff --git a/docs/development/design/report-host-fault-to-update-server-state-immediately.rst b/docs/development/design/report-host-fault-to-update-server-state-immediately.rst
new file mode 100644
index 00000000..2f6ce145
--- /dev/null
+++ b/docs/development/design/report-host-fault-to-update-server-state-immediately.rst
@@ -0,0 +1,248 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+.. NOTE::
+   This is a specification draft of a blueprint proposed for OpenStack Nova
+   Liberty. It was written by project member(s) and agreed within the project
+   before submitting it upstream. No further changes to its content will be
+   made here anymore; please follow it upstream:
+
+   * Current version upstream: https://review.openstack.org/#/c/169836/
+   * Development activity:
+     https://blueprints.launchpad.net/nova/+spec/mark-host-down
+
+   **The original draft is as follows:**
+
+====================================================
+Report host fault to update server state immediately
+====================================================
+
+https://blueprints.launchpad.net/nova/+spec/update-server-state-immediately
+
+A new API is needed to report a host fault to change the state of the
+instances and compute node immediately. This allows usage of evacuate API
+without a delay. The new API provides the possibility for external monitoring
+system to detect any kind of host failure fast and reliably and inform
+OpenStack about it. Nova updates the compute node state and states of the
+instances. This way the states in the Nova DB will be in sync with the
+real state of the system.
+
+Problem description
+===================
+* The Nova state change for a failed or unreachable host is slow and does not
+  reliably state whether the compute node is down. This might cause the same
+  instance to run twice if action is taken to evacuate it to another host.
+* The Nova state for instances on a failed compute node will not change,
+  but remains active and running. This gives the user false information about
+  the instance state. Currently one would need to call "nova reset-state" for
+  each instance to put them in the error state.
+* An OpenStack user cannot take HA actions fast and reliably by trusting the
+  instance state and compute node state.
+* As the compute node state changes slowly, one cannot evacuate instances.
+
+Use Cases
+---------
+In general, when there is a host fault, the compute node state should change
+fast and reliably when using the DB servicegroup backend. In addition, these use
+cases are currently not covered in a way that changes instance states correctly:
+
+* Management network connectivity lost between controller and compute node.
+* Host HW failed.
+
+Generic use case flow:
+
+* The external monitoring system detects a host fault.
+* The external monitoring system fences the host if not down already.
+* The external system calls the new Nova API to force the failed compute node
+  into down state as well as instances running on it.
+* Nova updates the compute node state and state of the affected instances in
+  the Nova DB.
+
+Currently the nova-compute state will change to "down", but it takes a long
+time. The server state stays as "vm_state: active" and "power_state:
+running", which is not correct. By having an external tool detect host faults
+fast, fence the host by powering it down and then report host down to OpenStack,
+all these states would reflect the actual situation. Also, if OpenStack does not
+implement automatic actions for fault correlation, an external tool can do that.
+This could easily be configured, for example, in server instance METADATA and be
+read by the external tool.
+
+Project Priority
+-----------------
+Liberty priorities have not yet been defined.
+
+Proposed change
+===============
+There needs to be a new API for the admin to state that a host is down. This API
+is used to mark the compute node and the instances running on it as down, to
+reflect the real situation.
+
+Example on compute node is:
+
+* When compute node is up and running:
+  vm_state: active and power_state: running
+  nova-compute state: up status: enabled
+* When compute node goes down and new API is called to state host is down:
+  vm_state: stopped power_state: shutdown
+  nova-compute state: down status: enabled
+
+The vm_state values soft-delete, deleted, resized and error
+should not be touched.
+Whether task_state needs to be touched still has to be worked out.
+
+Alternatives
+------------
+There are no attractive alternatives for detecting all the different host faults
+other than an external tool. For this kind of tool
+to exist, there needs to be a new API in Nova to report the fault. Currently,
+some kind of workaround must have been implemented, as one cannot trust or get
+the states from OpenStack fast enough.
+
+Data model impact
+-----------------
+None
+
+REST API impact
+---------------
+* Update CLI to report host is down
+
+  nova host-update command
+
+  usage: nova host-update [--status <enable|disable>]
+                        [--maintenance <enable|disable>]
+                        [--report-host-down]
+                        <hostname>
+
+  Update host settings.
+
+  Positional arguments
+
+  <hostname>
+  Name of host.
+
+  Optional arguments
+
+  --status <enable|disable>
+  Either enable or disable a host.
+
+  --maintenance <enable|disable>
+  Either put or resume host to/from maintenance.
+
+  --down
+  Report host down to update instance and compute node state in db.
+
+* Update Compute API to report host is down:
+
+  /v2.1/{tenant_id}/os-hosts/{host_name}
+
+  Normal response codes: 200
+  Request parameters
+
+  Parameter     Style   Type          Description
+  host_name     URI     xsd:string    The name of the host of interest to you.
+
+  {
+      "host": {
+          "status": "enable",
+          "maintenance_mode": "enable",
+          "host_down_reported": "true"
+
+      }
+
+  }
+
+  {
+      "host": {
+          "host": "65c5d5b7e3bd44308e67fc50f362aee6",
+          "maintenance_mode": "enabled",
+          "status": "enabled",
+          "host_down_reported": "true"
+
+      }
+
+  }
+
+* New method in the nova.compute.api module HostAPI class to
+  mark the host-related instances and compute node down:
+  set_host_down(context, host_name)
+
+* class novaclient.v2.hosts.HostManager(api) method update(host, values)
+  Needs to handle reporting host down.
+
+* Schema does not need changes as in db only service and server states are to
+  be changed.
+
+Security impact
+---------------
+API call needs admin privileges (in the default policy configuration).
+
+Notifications impact
+--------------------
+None
+
+Other end user impact
+---------------------
+None
+
+Performance Impact
+------------------
+The only impact is that the user can get information about instance and
+compute node state faster. This also makes faster evacuation possible.
+No impact would slow things down. Host down should be a rare occurrence.
+
+Other deployer impact
+---------------------
+Developer can make use of any external tool to detect host fault and report it
+to OpenStack.
+
+Developer impact
+----------------
+None
+
+Implementation
+==============
+Assignee(s)
+-----------
+Primary assignee:   Tomi Juvonen
+Other contributors: Ryota Mibu
+
+Work Items
+----------
+* Test cases.
+* API changes.
+* Documentation.
+
+Dependencies
+============
+None
+
+Testing
+=======
+Existing test cases for enabling a host or putting it into maintenance should be
+altered, or similar new cases added, to test the new functionality.
+
+Documentation Impact
+====================
+
+New API needs to be documented:
+
+* Compute API extensions documentation.
+  http://developer.openstack.org/api-ref-compute-v2.1.html
+* Nova commands documentation.
+  http://docs.openstack.org/user-guide-admin/content/novaclient_commands.html
+* Compute command-line client documentation.
+  http://docs.openstack.org/cli-reference/content/novaclient_commands.html
+* nova.compute.api documentation.
+  http://docs.openstack.org/developer/nova/api/nova.compute.api.html
+* The High Availability guide might have a page explaining that an external tool
+  can provide faster HA, as it is able to update states through the new API.
+  http://docs.openstack.org/high-availability-guide/content/index.html
+
+References
+==========
+* OPNFV Doctor project: https://wiki.opnfv.org/doctor
+* OpenStack Instance HA Proposal:
+  http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
+* The Different Facets of OpenStack HA:
+  http://blog.russellbryant.net/2015/03/10/the-different-facets-of-openstack-ha/
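The state changes proposed in report-host-fault-to-update-server-state-immediately.rst can be modelled with plain dictionaries, as sketched below. This is not Nova code: ``mark_host_down`` and the record layout are hypothetical, and task_state is left alone because the spec leaves its handling open.

```python
# vm_state values that the spec says should not be touched.
UNTOUCHED_VM_STATES = {"soft-delete", "deleted", "resized", "error"}


def mark_host_down(service, instances):
    """Return updated copies of the nova-compute service and its instances."""
    # Only the state changes; the enabled/disabled status is kept as is.
    new_service = dict(service, state="down")
    new_instances = []
    for inst in instances:
        if inst["vm_state"] in UNTOUCHED_VM_STATES:
            new_instances.append(dict(inst))
        else:
            new_instances.append(
                dict(inst, vm_state="stopped", power_state="shutdown"))
    return new_service, new_instances
```

Returning copies rather than mutating the inputs keeps the sketch easy to test; a real implementation would persist the changes to the Nova DB instead.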
diff --git a/docs/development/design/rfe-port-status-update.rst b/docs/development/design/rfe-port-status-update.rst
new file mode 100644
index 00000000..d87d7d7b
--- /dev/null
+++ b/docs/development/design/rfe-port-status-update.rst
@@ -0,0 +1,32 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+==========================
+Neutron Port Status Update
+==========================
+
+.. NOTE::
+   This document represents a Neutron RFE reviewed in the Doctor project before submitting upstream to Launchpad Neutron
+   space. The document is not intended to follow a blueprint format or to be an extensive document.
+   For more information, please visit http://docs.openstack.org/developer/neutron/policies/blueprints.html
+
+   The RFE was submitted to Neutron. You can follow the discussions in https://bugs.launchpad.net/neutron/+bug/1598081
+
+Neutron port status field represents the current status of a port in the cloud infrastructure. The field can take one of
+the following values: 'ACTIVE', 'DOWN', 'BUILD' and 'ERROR'.
+
+At present, if a network event occurs in the data-plane (e.g. a virtual or physical switch or one of its ports fails, a
+cable gets pulled unintentionally, infrastructure topology changes, etc.), connectivity to logical ports may be affected
+and tenants' services interrupted. When tenants/cloud administrators are looking up their resources' status (e.g. Nova
+instances and services running in them, network ports, etc.), they will wrongly see everything looks fine. The problem
+is that Neutron will continue reporting port 'status' as 'ACTIVE'.
+
+Many SDN Controllers managing network elements have the ability to detect and report network events to upper layers.
+This allows SDN Controllers' users to be notified of changes and react accordingly. Such information could be consumed
+by Neutron so that Neutron could update the 'status' field of those logical ports, and additionally generate a
+notification message to the message bus.
+
+However, Neutron lacks a way to receive such information through e.g. an ML2 driver or the REST API ('status'
+field is read-only). There are pros and cons to both of these approaches as well as other possible approaches. This RFE
+intends to trigger a discussion on how Neutron could be improved to receive fault/change events from SDN Controllers or
+even also from 3rd parties not in charge of controlling the network (e.g. monitoring systems, human admins).