4 files changed, 287 insertions, 0 deletions
diff --git a/docs/release/userguide/get-valid-server-state.rst b/docs/release/userguide/get-valid-server-state.rst
new file mode 100644
index 00000000..824ea3c2
--- /dev/null
+++ b/docs/release/userguide/get-valid-server-state.rst
@@ -0,0 +1,125 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+======================
+Get valid server state
+======================
+
+Related Blueprints:
+===================
+
+https://blueprints.launchpad.net/nova/+spec/get-valid-server-state
+
+Problem description
+===================
+
+Previously, when the owner of a VM queried the state of their VMs, they did
+not receive enough state information. In addition, states did not change
+fast enough in the VIM, and in some scenarios they were not accurate. This
+change closes that gap.
+
+A typical case is that, when a host fails, the user of a high
+availability service running on top of that host needs to make an immediate
+switchover from the faulty host to an active standby host. If the compute
+host is forced down [1] as a result of that fault, the user has to be notified
+about this state change so that the user can react accordingly. Similarly,
+the users should also be notified of a change of the host state to
+"maintenance".
+
+What is changed
+===============
+
+A new ``host_status`` parameter is added to the ``/servers/{server_id}`` and
+``/servers/detail`` endpoints in microversion 2.16. With this new parameter,
+the user can get additional state information about the host.
+
+Possible ``host_status`` values, where each value in the list can override
+the previous one:
+
+- ``UP`` if nova-compute is up.
+- ``UNKNOWN`` if the nova-compute status was not reported by the servicegroup
+  driver within the configured time period. The default is 60 seconds,
+  but it can be changed with ``service_down_time`` in nova.conf.
+- ``DOWN`` if nova-compute was forced down.
+- ``MAINTENANCE`` if nova-compute was disabled. MAINTENANCE in the API means
+  exactly that the nova-compute service is disabled. Different wording is used
+  to avoid the impression that the whole host is down, as only scheduling of
+  new VMs is disabled.
+- An empty string indicates that there is no host for the server.
+
+``host_status`` is returned in the response only if the policy permits it. By
+default, the policy in Nova's policy.json restricts it to admins::
+
+  "os_compute_api:servers:show:host_status": "rule:admin_api"
+
+For an NFV use case, this also has to be enabled for the owner of the VM::
+
+  "os_compute_api:servers:show:host_status": "rule:admin_or_owner"
+
+REST API examples:
+==================
+
+Case where nova-compute is enabled and reporting normally::
+
+    GET /v2.1/{tenant_id}/servers/{server_id}
+
+    200 OK
+    {
+      "server": {
+        "host_status": "UP",
+        ...
+      }
+    }
+
+Case where nova-compute is enabled, but not reporting normally::
+
+    GET /v2.1/{tenant_id}/servers/{server_id}
+
+    200 OK
+    {
+      "server": {
+        "host_status": "UNKNOWN",
+        ...
+      }
+    }
+
+Case where nova-compute is enabled, but forced_down::
+
+    GET /v2.1/{tenant_id}/servers/{server_id}
+
+    200 OK
+    {
+      "server": {
+        "host_status": "DOWN",
+        ...
+      }
+    }
+
+Case where nova-compute is disabled::
+
+    GET /v2.1/{tenant_id}/servers/{server_id}
+
+    200 OK
+    {
+      "server": {
+        "host_status": "MAINTENANCE",
+        ...
+      }
+    }
+
+Host Status is also visible in python-novaclient::
+
+  +-------+------+--------+------------+-------------+----------+-------------+
+  | ID    | Name | Status | Task State | Power State | Networks | Host Status |
+  +-------+------+--------+------------+-------------+----------+-------------+
+  | 9a... | vm1  | ACTIVE | -          | RUNNING     | xnet=... | UP          |
+  +-------+------+--------+------------+-------------+----------+-------------+
+
+Links:
+======
+
+[1] Manual for OpenStack NOVA API for marking host down
+http://artifacts.opnfv.org/doctor/docs/manuals/mark-host-down_manual.html
+
+[2] OpenStack compute manual page
+http://developer.openstack.org/api-ref-compute-v2.1.html#compute-v2.1
diff --git a/docs/release/userguide/index.rst b/docs/release/userguide/index.rst
index eee855dc..577072c7 100644
--- a/docs/release/userguide/index.rst
+++ b/docs/release/userguide/index.rst
@@ -11,3 +11,6 @@ Doctor User Guide
     :maxdepth: 2
 
     feature.userguide.rst
+    get-valid-server-state.rst
+    mark-host-down_manual.rst
+    monitors.rst
diff --git a/docs/release/userguide/mark-host-down_manual.rst b/docs/release/userguide/mark-host-down_manual.rst
new file mode 100644
index 00000000..3815205d
--- /dev/null
+++ b/docs/release/userguide/mark-host-down_manual.rst
@@ -0,0 +1,122 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+=========================================
+OpenStack NOVA API for marking host down.
+=========================================
+
+Related Blueprints:
+===================
+
+  https://blueprints.launchpad.net/nova/+spec/mark-host-down
+  https://blueprints.launchpad.net/python-novaclient/+spec/support-force-down-service
+
+What the API is for
+===================
+
+  This API lets an external fault monitoring system quickly tell OpenStack
+  Nova that a compute host is down. This immediately makes it possible to
+  evacuate any VM on the host, which in turn enables faster HA
+  actions.
+
+What this API does
+==================
+
+  In OpenStack, the nova-compute service state can represent the compute host
+  state, and this new API is used to force this service down. It is assumed
+  that the caller of this API has made sure the host is also fenced or
+  powered down. This is important so that the same VM instance cannot
+  appear twice if it is evacuated to a new compute host. When the host is
+  recovered by any means, the external system is responsible for calling the
+  API again to clear the forced_down flag and let the host's nova-compute
+  service report the host as up again. If a network-fenced host comes up
+  again, it should not boot the VMs it had if it finds that they have been
+  evacuated to another compute host. The decision to delete or boot VMs that
+  used to be on the host should later be made more reliable by Nova blueprint:
+  https://blueprints.launchpad.net/nova/+spec/robustify-evacuate
+
+REST API for forcing down:
+==========================
+
+  Parameter explanations:
+  tenant_id:       Identifier of the tenant.
+  binary:          Compute service binary name.
+  host:            Compute host name.
+  forced_down:     Compute service forced down flag.
+  token:           Token received after successful authentication.
+  service_host_ip: Serving controller node ip.
+
+  request:
+  PUT /v2.1/{tenant_id}/os-services/force-down
+  {
+  "binary": "nova-compute",
+  "host": "compute1",
+  "forced_down": true
+  }
+
+  response:
+  200 OK
+  {
+  "service": {
+  "host": "compute1",
+  "binary": "nova-compute",
+  "forced_down": true
+  }
+  }
+
+  Example:
+  curl -g -i -X PUT http://{service_host_ip}:8774/v2.1/{tenant_id}/os-services/force-down \
+  -H "Content-Type: application/json" -H "Accept: application/json" \
+  -H "X-OpenStack-Nova-API-Version: 2.11" -H "X-Auth-Token: {token}" \
+  -d '{"binary": "nova-compute", "host": "compute1", "forced_down": true}'
+
+CLI for forcing down:
+=====================
+
+  nova service-force-down <hostname> nova-compute
+
+  Example:
+  nova service-force-down compute1 nova-compute
+
+REST API for disabling forced down:
+===================================
+
+  Parameter explanations:
+  tenant_id:       Identifier of the tenant.
+  binary:          Compute service binary name.
+  host:            Compute host name.
+  forced_down:     Compute service forced down flag.
+  token:           Token received after successful authentication.
+  service_host_ip: Serving controller node ip.
+
+  request:
+  PUT /v2.1/{tenant_id}/os-services/force-down
+  {
+  "binary": "nova-compute",
+  "host": "compute1",
+  "forced_down": false
+  }
+
+  response:
+  200 OK
+  {
+  "service": {
+  "host": "compute1",
+  "binary": "nova-compute",
+  "forced_down": false
+  }
+  }
+
+  Example:
+  curl -g -i -X PUT http://{service_host_ip}:8774/v2.1/{tenant_id}/os-services/force-down \
+  -H "Content-Type: application/json" -H "Accept: application/json" \
+  -H "X-OpenStack-Nova-API-Version: 2.11" -H "X-Auth-Token: {token}" \
+  -d '{"binary": "nova-compute", "host": "compute1", "forced_down": false}'
+
+CLI for disabling forced down:
+==============================
+
+  nova service-force-down --unset <hostname> nova-compute
+
+  Example:
+  nova service-force-down --unset compute1 nova-compute
diff --git a/docs/release/userguide/monitors.rst b/docs/release/userguide/monitors.rst
new file mode 100644
index 00000000..eeb5e226
--- /dev/null
+++ b/docs/release/userguide/monitors.rst
@@ -0,0 +1,37 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+Monitor Types and Limitations
+=============================
+
+Currently, two monitor types are supported: sample and collectd.
+
+Sample Monitor
+--------------
+
+The sample monitor type pings the compute host from the control host and calculates
+the notification time after the ping times out.
+Also, if the inspector type is sample, the compute node needs to communicate with the
+control node on port 12345. This port must be open for incoming traffic on the control node.
+
+Collectd Monitor
+----------------
+
+The collectd monitor type uses a collectd daemon running the ovs_events plugin. Collectd
+runs on the compute node and sends instant notifications to the control node. The
+notification time is calculated as the difference between the time at which the compute
+node sends the notification to the control node and the time at which the consumer is
+notified. For this reason, the clocks on the control and compute nodes have to be
+synchronized. For further details on setting up collectd on the compute node, see:
+:doc:`<barometer:release/userguide/feature.userguide>`
+
+
+Collectd monitors an interface managed by OVS. If the interface is not assigned
+an IP, the user has to provide the name of the interface to be monitored. The command
+to launch the doctor test in that case is:
+``MONITOR_TYPE=collectd INSPECTOR_TYPE=sample INTERFACE_NAME=example_iface ./run.sh``
+
+If neither the interface name nor an IP is provided, the collectd monitor type monitors
+the default management interface. This may cause the doctor run.sh test case to fail:
+the test case sets the monitored interface down, and if the inspector (sample or congress)
+is running on the same subnet, the collectd monitor will not be able to communicate with it.