diff options
Diffstat (limited to 'docs/release/userguide')
-rw-r--r-- | docs/release/userguide/get-valid-server-state.rst | 125 | ||||
-rw-r--r-- | docs/release/userguide/index.rst | 3 | ||||
-rw-r--r-- | docs/release/userguide/mark-host-down_manual.rst | 122 | ||||
-rw-r--r-- | docs/release/userguide/monitors.rst | 37 |
4 files changed, 287 insertions, 0 deletions
diff --git a/docs/release/userguide/get-valid-server-state.rst b/docs/release/userguide/get-valid-server-state.rst new file mode 100644 index 00000000..824ea3c2 --- /dev/null +++ b/docs/release/userguide/get-valid-server-state.rst @@ -0,0 +1,125 @@ +.. This work is licensed under a Creative Commons Attribution 4.0 International License. +.. http://creativecommons.org/licenses/by/4.0 + +====================== +Get valid server state +====================== + +Related Blueprints: +=================== + +https://blueprints.launchpad.net/nova/+spec/get-valid-server-state + +Problem description +=================== + +Previously when the owner of a VM has queried his VMs, he has not received +enough state information, states have not changed fast enough in the VIM and +they have not been accurate in some scenarios. With this change this gap is now +closed. + +A typical case is that, in case of a fault of a host, the user of a high +availability service running on top of that host, needs to make an immediate +switch over from the faulty host to an active standby host. Now, if the compute +host is forced down [1] as a result of that fault, the user has to be notified +about this state change such that the user can react accordingly. Similarly, +a change of the host state to "maintenance" should also be notified to the +users. + +What is changed +=============== + +A new ``host_status`` parameter is added to the ``/servers/{server_id}`` and +``/servers/detail`` endpoints in microversion 2.16. By this new parameter +user can get additional state information about the host. + +``host_status`` possible values where next value in list can override the +previous: + +- ``UP`` if nova-compute is up. +- ``UNKNOWN`` if nova-compute status was not reported by servicegroup driver + within configured time period. Default is within 60 seconds, + but can be changed with ``service_down_time`` in nova.conf. +- ``DOWN`` if nova-compute was forced down. +- ``MAINTENANCE`` if nova-compute was disabled. MAINTENANCE in API directly + means nova-compute service is disabled. Different wording is used to avoid + the impression that the whole host is down, as only scheduling of new VMs + is disabled. +- Empty string indicates there is no host for server. + +``host_status`` is returned in the response in case the policy permits. By +default the policy is for admin only in Nova policy.json:: + + "os_compute_api:servers:show:host_status": "rule:admin_api" + +For an NFV use case this has to also be enabled for the owner of the VM:: + + "os_compute_api:servers:show:host_status": "rule:admin_or_owner" + +REST API examples: +================== + +Case where nova-compute is enabled and reporting normally:: + + GET /v2.1/{tenant_id}/servers/{server_id} + + 200 OK + { + "server": { + "host_status": "UP", + ... + } + } + +Case where nova-compute is enabled, but not reporting normally:: + + GET /v2.1/{tenant_id}/servers/{server_id} + + 200 OK + { + "server": { + "host_status": "UNKNOWN", + ... + } + } + +Case where nova-compute is enabled, but forced_down:: + + GET /v2.1/{tenant_id}/servers/{server_id} + + 200 OK + { + "server": { + "host_status": "DOWN", + ... + } + } + +Case where nova-compute is disabled:: + + GET /v2.1/{tenant_id}/servers/{server_id} + + 200 OK + { + "server": { + "host_status": "MAINTENANCE", + ... + } + } + +Host Status is also visible in python-novaclient:: + + +-------+------+--------+------------+-------------+----------+-------------+ + | ID | Name | Status | Task State | Power State | Networks | Host Status | + +-------+------+--------+------------+-------------+----------+-------------+ + | 9a... | vm1 | ACTIVE | - | RUNNING | xnet=... | UP | + +-------+------+--------+------------+-------------+----------+-------------+ + +Links: +====== + +[1] Manual for OpenStack NOVA API for marking host down +http://artifacts.opnfv.org/doctor/docs/manuals/mark-host-down_manual.html + +[2] OpenStack compute manual page +http://developer.openstack.org/api-ref-compute-v2.1.html#compute-v2.1 diff --git a/docs/release/userguide/index.rst b/docs/release/userguide/index.rst index eee855dc..577072c7 100644 --- a/docs/release/userguide/index.rst +++ b/docs/release/userguide/index.rst @@ -11,3 +11,6 @@ Doctor User Guide :maxdepth: 2 feature.userguide.rst + get-valid-server-state.rst + mark-host-down_manual.rst + monitors.rst diff --git a/docs/release/userguide/mark-host-down_manual.rst b/docs/release/userguide/mark-host-down_manual.rst new file mode 100644 index 00000000..3815205d --- /dev/null +++ b/docs/release/userguide/mark-host-down_manual.rst @@ -0,0 +1,122 @@ +.. This work is licensed under a Creative Commons Attribution 4.0 International License. +.. http://creativecommons.org/licenses/by/4.0 + +========================================= +OpenStack NOVA API for marking host down. +========================================= + +Related Blueprints: +=================== + + https://blueprints.launchpad.net/nova/+spec/mark-host-down + https://blueprints.launchpad.net/python-novaclient/+spec/support-force-down-service + +What the API is for +=================== + + This API will give external fault monitoring system a possibility of telling + OpenStack Nova fast that compute host is down. This will immediately enable + calling of evacuation of any VM on host and further enabling faster HA + actions. + +What this API does +================== + + In OpenStack the nova-compute service state can represent the compute host + state and this new API is used to force this service down. It is assumed + that the one calling this API has made sure the host is also fenced or + powered down. This is important, so there is no chance same VM instance will + appear twice in case evacuated to new compute host. When host is recovered + by any means, the external system is responsible of calling the API again to + disable forced_down flag and let the host nova-compute service report again + host being up. If network fenced host come up again it should not boot VMs + it had if figuring out they are evacuated to other compute host. The + decision of deleting or booting VMs there used to be on host should be + enhanced later to be more reliable by Nova blueprint: + https://blueprints.launchpad.net/nova/+spec/robustify-evacuate + +REST API for forcing down: +========================== + + Parameter explanations: + tenant_id: Identifier of the tenant. + binary: Compute service binary name. + host: Compute host name. + forced_down: Compute service forced down flag. + token: Token received after successful authentication. + service_host_ip: Serving controller node ip. + + request: + PUT /v2.1/{tenant_id}/os-services/force-down + { + "binary": "nova-compute", + "host": "compute1", + "forced_down": true + } + + response: + 200 OK + { + "service": { + "host": "compute1", + "binary": "nova-compute", + "forced_down": true + } + } + + Example: + curl -g -i -X PUT http://{service_host_ip}:8774/v2.1/{tenant_id}/os-services + /force-down -H "Content-Type: application/json" -H "Accept: application/json + " -H "X-OpenStack-Nova-API-Version: 2.11" -H "X-Auth-Token: {token}" -d '{"b + inary": "nova-compute", "host": "compute1", "forced_down": true}' + +CLI for forcing down: +===================== + + nova service-force-down <hostname> nova-compute + + Example: + nova service-force-down compute1 nova-compute + +REST API for disabling forced down: +=================================== + + Parameter explanations: + tenant_id: Identifier of the tenant. + binary: Compute service binary name. + host: Compute host name. + forced_down: Compute service forced down flag. + token: Token received after successful authentication. + service_host_ip: Serving controller node ip. + + request: + PUT /v2.1/{tenant_id}/os-services/force-down + { + "binary": "nova-compute", + "host": "compute1", + "forced_down": false + } + + response: + 200 OK + { + "service": { + "host": "compute1", + "binary": "nova-compute", + "forced_down": false + } + } + + Example: + curl -g -i -X PUT http://{service_host_ip}:8774/v2.1/{tenant_id}/os-services + /force-down -H "Content-Type: application/json" -H "Accept: application/json + " -H "X-OpenStack-Nova-API-Version: 2.11" -H "X-Auth-Token: {token}" -d '{"b + inary": "nova-compute", "host": "compute1", "forced_down": false}' + +CLI for disabling forced down: +============================== + + nova service-force-down --unset <hostname> nova-compute + + Example: + nova service-force-down --unset compute1 nova-compute diff --git a/docs/release/userguide/monitors.rst b/docs/release/userguide/monitors.rst new file mode 100644 index 00000000..eeb5e226 --- /dev/null +++ b/docs/release/userguide/monitors.rst @@ -0,0 +1,37 @@ +.. This work is licensed under a Creative Commons Attribution 4.0 International License. +.. http://creativecommons.org/licenses/by/4.0 + +Monitor Types and Limitations +============================= + +Currently there are two monitor types supported: sample and collectd + +Sample Monitor +-------------- + +Sample monitor type pings the compute host from the control host and calculates the +notification time after the ping timeout. +Also if inspector type is sample, the compute node needs to communicate with the control +node on port 12345. This port needs to be opened for incomming traffic on control node. + +Collectd Monitor +---------------- + +Collectd monitor type uses collectd daemon running ovs_events plugin. Collectd runs on +compute to send instant notification to the control node. The notification time is +calculated by using the difference of time at which compute node sends notification to +control node and the time at which consumer is notified. The time on control and compute +node has to be synchronized for this reason. For further details on setting up collectd +on the compute node, use the following link: +:doc:`<barometer:release/userguide/feature.userguide>` + + +Collectd monitors an interface managed by OVS. If the interface is not be assigned +an IP, the user has to provide the name of interface to be monitored. The command to +launch the doctor test in that case is: +MONITOR_TYPE=collectd INSPECTOR_TYPE=sample INTERFACE_NAME=example_iface ./run.sh + +If the interface name or IP is not provided, the collectd monitor type will monitor the +default management interface. This may result in the failure of doctor run.sh test case. +The test case sets the monitored interface down and if the inspector (sample or congress) +is running on the same subnet, collectd monitor will not be able to communicate with it. |