author     Tomi Juvonen <tomi.juvonen@nokia.com>   2020-10-13 16:37:57 +0300
committer  Tomi Juvonen <tomi.juvonen@nokia.com>   2020-10-13 16:48:12 +0300
commit     72a1f8c92f1692f1ea8dcb5bc706ec9939c30e0a (patch)
tree       e851c88b40b16d6c8d791b746b8c78728fff0c4f /docs/release
parent     6ff11513a0d3728c79033af623c79dd6df7c621e (diff)
According to document guidelines: Release notes ETSI FEAT03 support and other minor enhancements

JIRA: DOCTOR-143
Signed-off-by: Tomi Juvonen <tomi.juvonen@nokia.com>
Change-Id: Iefa74004dfada376d1ab05c0149029a26f822275
Diffstat (limited to 'docs/release')
-rw-r--r--  docs/release/configguide/feature.configuration.rst | 54
-rw-r--r--  docs/release/configguide/index.rst | 6
-rw-r--r--  docs/release/index.rst | 12
-rw-r--r--  docs/release/installation/index.rst | 13
-rw-r--r--  docs/release/installation/installation.rst | 44
-rw-r--r--  docs/release/release-notes/release-notes.rst | 61
-rw-r--r--  docs/release/release-notes/releasenotes_iruya.rst | 129
-rw-r--r--  docs/release/scenarios/fault_management/fault_management.rst | 90
-rw-r--r--  docs/release/scenarios/maintenance/images/Fault-management-design.png | bin 0 -> 237110 bytes
-rw-r--r--  docs/release/scenarios/maintenance/images/LICENSE | 14
-rw-r--r--  docs/release/scenarios/maintenance/images/Maintenance-design.png | bin 0 -> 316640 bytes
-rw-r--r--  docs/release/scenarios/maintenance/images/Maintenance-workflow.png | bin 0 -> 81286 bytes
-rw-r--r--  docs/release/scenarios/maintenance/maintenance.rst | 120
-rw-r--r--  docs/release/userguide/get-valid-server-state.rst | 125
-rw-r--r--  docs/release/userguide/index.rst | 3
-rw-r--r--  docs/release/userguide/mark-host-down_manual.rst | 122
-rw-r--r--  docs/release/userguide/monitors.rst | 37
17 files changed, 801 insertions, 29 deletions
diff --git a/docs/release/configguide/feature.configuration.rst b/docs/release/configguide/feature.configuration.rst
index 64928eea..8fbff50e 100644
--- a/docs/release/configguide/feature.configuration.rst
+++ b/docs/release/configguide/feature.configuration.rst
@@ -159,3 +159,57 @@ You can configure the Sample Monitor as follows (Example for Apex deployment):
"http://127.0.0.1:$INSPECTOR_PORT/events" > monitor.log 2>&1 &
**Collectd Monitor**
+
+OpenStack components
+====================
+
+In OPNFV and with Doctor testing you can have all OpenStack components configured
+as needed. Here is a sample of the needed configuration modifications.
+
+Ceilometer
+----------
+
+/etc/ceilometer/event_definitions.yaml:
+# Maintenance use case needs new alarm definitions to be added
+- event_type: maintenance.scheduled
+ traits:
+ actions_at:
+ fields: payload.maintenance_at
+ type: datetime
+ allowed_actions:
+ fields: payload.allowed_actions
+ host_id:
+ fields: payload.host_id
+ instances:
+ fields: payload.instances
+ metadata:
+ fields: payload.metadata
+ project_id:
+ fields: payload.project_id
+ reply_url:
+ fields: payload.reply_url
+ session_id:
+ fields: payload.session_id
+ state:
+ fields: payload.state
+- event_type: maintenance.host
+ traits:
+ host:
+ fields: payload.host
+ project_id:
+ fields: payload.project_id
+ session_id:
+ fields: payload.session_id
+ state:
+ fields: payload.state
+
+/etc/ceilometer/event_pipeline.yaml:
+# Maintenance and Fault management both need these to be added
+ - notifier://
+ - notifier://?topic=alarm.all
+
+Nova
+----
+
+/etc/nova/nova.conf
+cpu_allocation_ratio=1.0
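+
+Setting ``cpu_allocation_ratio=1.0`` disables CPU overcommit, which the
+maintenance test case relies on when it fills all compute capacity with
+instances. After changing the files above, the affected services need to be
+restarted. A minimal sketch, assuming a systemd based controller where the
+unit names are as below (the names are assumptions and vary per installer):
+
+.. code-block:: bash
+
+    # reload the new event definitions and pipeline sinks
+    sudo systemctl restart openstack-ceilometer-notification.service
+    # apply the new allocation ratio used for scheduling
+    sudo systemctl restart openstack-nova-scheduler.service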
diff --git a/docs/release/configguide/index.rst b/docs/release/configguide/index.rst
index b1e7c33d..c2331115 100644
--- a/docs/release/configguide/index.rst
+++ b/docs/release/configguide/index.rst
@@ -3,9 +3,9 @@
.. _doctor-configguide:
-*************************
-Doctor Installation Guide
-*************************
+**************************
+Doctor Configuration Guide
+**************************
.. toctree::
:maxdepth: 2
diff --git a/docs/release/index.rst b/docs/release/index.rst
index 8a1bf405..67eb4c5f 100644
--- a/docs/release/index.rst
+++ b/docs/release/index.rst
@@ -2,14 +2,18 @@
.. http://creativecommons.org/licenses/by/4.0
.. (c) 2017 OPNFV.
+.. _release:
-======
-Doctor
-======
+=======
+Release
+=======
.. toctree::
:maxdepth: 2
+ ./configguide/index.rst
./installation/index.rst
+ ./release-notes/index.rst
+ ./scenarios/fault_management/fault_management.rst
+ ./scenarios/maintenance/maintenance.rst
./userguide/index.rst
-
diff --git a/docs/release/installation/index.rst b/docs/release/installation/index.rst
new file mode 100644
index 00000000..f6527e5d
--- /dev/null
+++ b/docs/release/installation/index.rst
@@ -0,0 +1,13 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+.. _doctor-installation:
+
+*************************
+Doctor Installation Guide
+*************************
+
+.. toctree::
+ :maxdepth: 2
+
+ installation.rst
diff --git a/docs/release/installation/installation.rst b/docs/release/installation/installation.rst
new file mode 100644
index 00000000..564f19fd
--- /dev/null
+++ b/docs/release/installation/installation.rst
@@ -0,0 +1,44 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+Doctor Installation
+====================
+
+You can clone the Doctor project on the OPNFV installer jumphost, or if you are
+not in an OPNFV environment you can clone Doctor to the DevStack controller
+node:
+
+.. code-block:: bash
+
+    git clone https://gerrit.opnfv.org/gerrit/doctor
+
+On the DevStack controller, here is a sample of what Doctor testing will
+require for sample fault management testing and for maintenance testing using
+Fenix:
+
+.. code-block:: bash
+
+ git clone https://github.com/openstack/devstack -b stable/train
+
+.. code-block:: bash
+
+    cd devstack
+    vi local.conf
+
+.. code-block:: bash
+
+ [[local|localrc]]
+ GIT_BASE=https://git.openstack.org
+ HOST_IP=<host_ip>
+ ADMIN_PASSWORD=admin
+ DATABASE_PASSWORD=admin
+ RABBIT_PASSWORD=admin
+ SERVICE_PASSWORD=admin
+ LOGFILE=/opt/stack/stack.sh.log
+
+ PUBLIC_INTERFACE=eth0
+
+ CEILOMETER_EVENT_ALARM=True
+
+ ENABLED_SERVICES=key,rabbit,mysql,fenix-engine,fenix-api,aodh-evaluator,aodh-notifier,aodh-api
+
+ enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer stable/train
+ enable_plugin aodh https://git.openstack.org/openstack/aodh stable/train
+ enable_plugin gnocchi https://github.com/openstack/gnocchi
+ enable_plugin fenix https://opendev.org/x/fenix master
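+
+With this ``local.conf`` in place, DevStack can be brought up as usual. A
+minimal sketch, assuming the stack is deployed as the normal DevStack user and
+that the systemd unit names follow the service names given in
+``ENABLED_SERVICES`` (an assumption; names may differ):
+
+.. code-block:: bash
+
+    # deploy DevStack together with the Fenix, Ceilometer, AODH and Gnocchi
+    # plugins enabled above
+    ./stack.sh
+
+    # afterwards, verify that the Fenix API and engine services came up
+    sudo systemctl status devstack@fenix-api devstack@fenix-engine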
diff --git a/docs/release/release-notes/release-notes.rst b/docs/release/release-notes/release-notes.rst
index 92775557..b525335e 100644
--- a/docs/release/release-notes/release-notes.rst
+++ b/docs/release/release-notes/release-notes.rst
@@ -7,33 +7,41 @@ This document provides the release notes for Iruya version of Doctor.
Important notes
===============
-In Iruya release there has not been many changes.
-
-All testing is now being made with Fuel installer. Maintenance use case
-is now only tested against latest upstream Fenix. Only sample inspector is
-tested as Fuel do not support Vitrage or Congress.
+The Jerma release has mainly been about finalizing maintenance use case testing
+to support the ETSI FEAT03 defined interaction between VNFM and infrastructure.
+The aim is to have infrastructure maintenance and upgrade operations optimized
+to run as fast as they can, while VNFs on top keep running with zero impact on
+their service.
+
+Furthermore, this is the final release of Doctor and deeper testing is moving
+to upstream projects such as Fenix for the maintenance use case. In this
+release we have also made sure that all Doctor testing, and any deeper testing
+with the upstream projects, can be done in DevStack. This also makes DevStack
+the most important installer.
Summary
=======
-Iruya Doctor framework uses OpenStack Stein integrated into its test cases.
+Jerma Doctor framework uses OpenStack Train integrated into its test cases.
Release Data
============
Doctor changes
-- Maintenance use case updated to support latest version of Fenix running
- in container on controller node
-- Maintenance use case now support Fuel installer
-- Doctor updated to use OpenStack Stein and only python 3.6
-- Testing only sample inspector as lacking installer support for
- Vitrage and Congress
+- Maintenance use case updated to support latest version of Fenix.
+- Maintenance use case now supports ETSI FEAT03 optimization with Fenix.
+- Doctor testing is now preferably done in a DevStack environment, where one
+  can easily select the OpenStack release from Rocky to Ussuri to test Doctor
+  functionality. The latest OPNFV Fuel can also be used for the OpenStack
+  version it supports.
-Releng changes
+Doctor CI
-- Doctor testing running with python 3.6 and with sample inspector
-- Doctor is only tested with Fuel installer
+- Doctor is tested with the Fuel installer.
+- Fault management use case is tested with the sample inspector.
+- Maintenance use case is tested with the sample implementation and against
+  the latest Fenix version. This includes the new ETSI FEAT03 optimization.
Version change
^^^^^^^^^^^^^^
@@ -41,12 +49,13 @@ Version change
Module version changes
~~~~~~~~~~~~~~~~~~~~~~
-- OpenStack has changed from Rocky to Stein since previous Hunter release.
+- OpenStack has changed from Stein to Train since the previous Iruya release.
Document version changes
~~~~~~~~~~~~~~~~~~~~~~~~
-N/A
+All documentation is updated to the OPNFV unified format according to the
+documentation guidelines. There are small updates in many documents.
Reason for version
^^^^^^^^^^^^^^^^^^
@@ -56,11 +65,14 @@ N/A
Feature additions
~~~~~~~~~~~~~~~~~
-+--------------------+--------------------------------------------------------------+
-| **JIRA REFERENCE** | **SLOGAN** |
-+--------------------+--------------------------------------------------------------+
-| DOCTOR-134 | Update Doctor maintenance use case to work with latest Fenix |
-+--------------------+--------------------------------------------------------------+
++--------------------+--------------------------------------------+
+| **JIRA REFERENCE** | **SLOGAN** |
++--------------------+--------------------------------------------+
+| DOCTOR-137 | VNFM maintenance with ETSI changes |
++--------------------+--------------------------------------------+
+| DOCTOR-136 | DevStack support |
++--------------------+--------------------------------------------+
+
Deliverables
------------
@@ -127,3 +139,8 @@ References
For more information about the OPNFV Doctor latest work, please see:
https://wiki.opnfv.org/display/doctor/Doctor+Home
+
+Further information about ETSI FEAT03 optimization can be found from Fenix
+Documentation:
+
+https://fenix.readthedocs.io/en/latest
diff --git a/docs/release/release-notes/releasenotes_iruya.rst b/docs/release/release-notes/releasenotes_iruya.rst
new file mode 100644
index 00000000..92775557
--- /dev/null
+++ b/docs/release/release-notes/releasenotes_iruya.rst
@@ -0,0 +1,129 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+
+This document provides the release notes for Iruya version of Doctor.
+
+Important notes
+===============
+
+In the Iruya release there have not been many changes.
+
+All testing is now being made with the Fuel installer. The maintenance use case
+is now only tested against the latest upstream Fenix. Only the sample inspector
+is tested, as Fuel does not support Vitrage or Congress.
+
+Summary
+=======
+
+Iruya Doctor framework uses OpenStack Stein integrated into its test cases.
+
+Release Data
+============
+
+Doctor changes
+
+- Maintenance use case updated to support the latest version of Fenix running
+  in a container on the controller node
+- Maintenance use case now supports the Fuel installer
+- Doctor updated to use OpenStack Stein and only Python 3.6
+- Testing only the sample inspector due to lacking installer support for
+  Vitrage and Congress
+
+Releng changes
+
+- Doctor testing running with python 3.6 and with sample inspector
+- Doctor is only tested with Fuel installer
+
+Version change
+^^^^^^^^^^^^^^
+
+Module version changes
+~~~~~~~~~~~~~~~~~~~~~~
+
+- OpenStack has changed from Rocky to Stein since previous Hunter release.
+
+Document version changes
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+N/A
+
+Reason for version
+^^^^^^^^^^^^^^^^^^
+
+N/A
+
+Feature additions
+~~~~~~~~~~~~~~~~~
+
++--------------------+--------------------------------------------------------------+
+| **JIRA REFERENCE** | **SLOGAN** |
++--------------------+--------------------------------------------------------------+
+| DOCTOR-134 | Update Doctor maintenance use case to work with latest Fenix |
++--------------------+--------------------------------------------------------------+
+
+Deliverables
+------------
+
+Software deliverables
+=====================
+
+None
+
+Documentation deliverables
+==========================
+
+https://git.opnfv.org/doctor/tree/docs
+
+Known Limitations, Issues and Workarounds
+=========================================
+
+System Limitations
+^^^^^^^^^^^^^^^^^^
+
+Maintenance test case requirements:
+
+- Minimum number of nodes: 1 Controller, 3 Computes
+- Min number of VCPUs: 2 VCPUs for each compute
+
+Known issues
+^^^^^^^^^^^^
+
+None
+
+Workarounds
+^^^^^^^^^^^
+
+None
+
+Test Result
+===========
+
+Doctor CI results with TEST_CASE='fault_management' and INSPECTOR_TYPE=sample
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++--------------------------------------+--------------+
+| **TEST-SUITE** | **Results:** |
++--------------------------------------+--------------+
+| INSTALLER_TYPE='fuel' | SUCCESS |
++--------------------------------------+--------------+
+
+Doctor CI results with TEST_CASE='maintenance' and INSPECTOR_TYPE=sample
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++--------------------------------------+--------------+
+| **TEST-SUITE** | **Results:** |
++--------------------------------------+--------------+
+| INSTALLER_TYPE='fuel' | SUCCESS |
+| ADMIN_TOOL_TYPE='fenix' *) | |
++--------------------------------------+--------------+
+
+*) Sample implementation not updated according to latest upstream Fenix
+ and is currently not being tested.
+
+References
+==========
+
+For more information about the OPNFV Doctor latest work, please see:
+
+https://wiki.opnfv.org/display/doctor/Doctor+Home
diff --git a/docs/release/scenarios/fault_management/fault_management.rst b/docs/release/scenarios/fault_management/fault_management.rst
new file mode 100644
index 00000000..99371201
--- /dev/null
+++ b/docs/release/scenarios/fault_management/fault_management.rst
@@ -0,0 +1,90 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+
+Running test cases
+""""""""""""""""""
+
+Functest will call "doctor_tests/main.py" in Doctor to run the test job.
+Doctor testing can also be triggered by tox on the OPNFV installer jumphost.
+Tox is normally used for functional, module and coding style testing in Python
+projects.
+
+Currently the 'MCP' and 'devstack' installers are supported.
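+
+For illustration, a manually triggered fault management run could look roughly
+like the following. This is only a sketch: the environment variable values
+follow the installer and inspector names used in this document, and the way
+they are consumed may differ between Doctor versions:
+
+.. code-block:: bash
+
+    # the same variables are honoured whether the test is run through
+    # Functest, tox or directly as below
+    export INSTALLER_TYPE=devstack
+    export INSPECTOR_TYPE=sample
+    export TEST_CASE='fault_management'
+    python3 doctor_tests/main.py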
+
+
+Fault management use case
+"""""""""""""""""""""""""
+
+* A consumer of the NFVI wants to receive immediate notifications about faults
+ in the NFVI affecting the proper functioning of the virtual resources.
+ Therefore, such faults have to be detected as quickly as possible, and, when
+ a critical error is observed, the affected consumer is immediately informed
+ about the fault and can switch over to the STBY configuration.
+
+The faults to be monitored (and at which detection rate) will be configured by
+the consumer. Once a fault is detected, the Inspector in the Doctor
+architecture will check the resource map maintained by the Controller, to find
+out which virtual resources are affected and then update the resources state.
+The Notifier will receive the failure event requests sent from the Controller,
+and notify the consumer(s) of the affected resources according to the alarm
+configuration.
+
+Detailed workflow information is as follows:
+
+* Consumer(VNFM): (step 0) creates resources (network, server/instance) and an
+ event alarm on state down notification of that server/instance or Neutron
+ port.
+
+* Monitor: (step 1) periodically checks nodes, for example by pinging from/to
+  each dplane NIC to/from the gateway of the node, and (step 2) once a check
+  fails it sends out an event with "raw" fault event information to the Inspector
+
+* Inspector: when it receives an event, it will (step 3) mark the host down
+  ("mark-host-down"), (step 4) map the PM to the VM, and change the VM status to
+  down. In the network failure case, the Neutron port is also changed to down.
+
+* Controller: (step 5) sends out an instance update event to Ceilometer. In the
+  network failure case, the Neutron port is also changed to down and a
+  corresponding event is sent to Ceilometer.
+
+* Notifier: (step 6) Ceilometer transforms and passes the events to AODH,
+ (step 7) AODH will evaluate events with the registered alarm definitions,
+ then (step 8) it will fire the alarm to the "consumer" who owns the
+ instance
+
+* Consumer(VNFM): (step 9) receives the event and (step 10) recreates a new
+ instance
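+
+The event alarm referred to in step 0 can be created through the aodh command
+line. The following is only a sketch; the exact event type, query and alarm
+action used by the Doctor sample consumer may differ:
+
+.. code-block:: bash
+
+    # fire an HTTP callback to the consumer when the given instance reports
+    # a state change; $INSTANCE_ID, $CONSUMER_IP and $CONSUMER_PORT are
+    # placeholders
+    aodh alarm create --type event --name "doctor_alarm_$INSTANCE_ID" \
+        --event-type "compute.instance.update" \
+        --query "traits.instance_id=string::$INSTANCE_ID" \
+        --alarm-action "http://$CONSUMER_IP:$CONSUMER_PORT/failure"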
+
+Fault management test case
+""""""""""""""""""""""""""
+
+Functest will call the 'doctor-test' command in Doctor to run the test job.
+
+The following steps are executed:
+
+Firstly, get the installer IP according to the installer type. Then ssh to
+the installer node to get the private key for accessing the cloud. For the
+'fuel' installer, ssh to the controller node to modify the nova and ceilometer
+configurations.
+
+Secondly, prepare image for booting VM, then create a test project and test
+user (both default to doctor) for the Doctor tests.
+
+Thirdly, boot a VM under the doctor project and check the VM status to verify
+that the VM is launched completely. Then get the compute host info where the VM
+is launched to verify connectivity to the target compute host. Get the consumer
+IP according to the route to the compute IP and create an alarm event in
+Ceilometer using the consumer IP.
+
+Fourthly, the Doctor components are started, and, based on the above preparation,
+a failure is injected into the system, i.e. the network of the compute host is
+disabled for 3 minutes. To ensure the host is down, the status of the host
+will be checked.
+
+Finally, the notification time, i.e. the time between the execution of step 2
+(Monitor detects failure) and step 9 (Consumer receives failure notification)
+is calculated.
+
+According to the Doctor requirements, the Doctor test is successful if the
+notification time is below 1 second.
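+
+The failure injection mentioned above essentially just brings the data
+interface of the target compute host down for the test period. A rough sketch
+of the idea (the interface variable is a placeholder; the actual Doctor script
+may do this differently):
+
+.. code-block:: bash
+
+    # executed over ssh on the chosen compute host
+    sudo ip link set "$COMPUTE_INTERFACE" down
+    sleep 180   # keep the network down for 3 minutes
+    sudo ip link set "$COMPUTE_INTERFACE" up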
diff --git a/docs/release/scenarios/maintenance/images/Fault-management-design.png b/docs/release/scenarios/maintenance/images/Fault-management-design.png
new file mode 100644
index 00000000..6d98cdec
--- /dev/null
+++ b/docs/release/scenarios/maintenance/images/Fault-management-design.png
Binary files differ
diff --git a/docs/release/scenarios/maintenance/images/LICENSE b/docs/release/scenarios/maintenance/images/LICENSE
new file mode 100644
index 00000000..21a2d03d
--- /dev/null
+++ b/docs/release/scenarios/maintenance/images/LICENSE
@@ -0,0 +1,14 @@
+Copyright 2017 Open Platform for NFV Project, Inc. and its contributors
+
+Open Platform for NFV Project Documentation License
+===================================================
+Any documentation developed by the "Open Platform for NFV Project"
+is licensed under a Creative Commons Attribution 4.0 International License.
+You should have received a copy of the license along with this. If not,
+see <http://creativecommons.org/licenses/by/4.0/>.
+
+Unless required by applicable law or agreed to in writing, documentation
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/docs/release/scenarios/maintenance/images/Maintenance-design.png b/docs/release/scenarios/maintenance/images/Maintenance-design.png
new file mode 100644
index 00000000..8f21db6a
--- /dev/null
+++ b/docs/release/scenarios/maintenance/images/Maintenance-design.png
Binary files differ
diff --git a/docs/release/scenarios/maintenance/images/Maintenance-workflow.png b/docs/release/scenarios/maintenance/images/Maintenance-workflow.png
new file mode 100644
index 00000000..9b65fd59
--- /dev/null
+++ b/docs/release/scenarios/maintenance/images/Maintenance-workflow.png
Binary files differ
diff --git a/docs/release/scenarios/maintenance/maintenance.rst b/docs/release/scenarios/maintenance/maintenance.rst
new file mode 100644
index 00000000..ecfe76b1
--- /dev/null
+++ b/docs/release/scenarios/maintenance/maintenance.rst
@@ -0,0 +1,120 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+
+Maintenance use case
+""""""""""""""""""""
+
+* A consumer of the NFVI wants to interact with NFVI maintenance, upgrade and
+  scaling operations, and to have graceful retirement. By receiving
+  notifications about these NFVI events and responding to them within a given
+  time window, the consumer can guarantee zero downtime for its service.
+
+The maintenance use case adds an `admin tool` and an `app manager` component
+to the Doctor platform. An overview of the maintenance components can be seen
+in :numref:`figure-p2`.
+
+.. figure:: ./images/Maintenance-design.png
+ :name: figure-p2
+ :width: 100%
+
+ Doctor platform components in maintenance use case
+
+In the maintenance use case, the `app manager` (VNFM) will subscribe to
+maintenance notifications triggered by project specific alarms through AODH.
+This is the way it gets to know the different NFVI maintenance, upgrade and
+scaling operations that affect its instances. The `app manager` can take the
+actions depicted in `green color` or tell the `admin tool` to take the admin
+actions depicted in `orange color`.
+
+Any infrastructure component, like the `Inspector`, can subscribe to
+maintenance notifications triggered by host specific alarms through AODH.
+Subscribing to these notifications requires admin privileges; they tell when a
+host is out of use, as in maintenance, and when it is taken back to production.
+
+Maintenance test case
+"""""""""""""""""""""
+
+The maintenance test case is currently running in our Apex CI and is executed
+by tox. This is because of the special limitation mentioned below, and because
+we currently have only a sample implementation as a proof of concept, while
+also supporting the unofficial OpenStack project Fenix. The environment
+variable TEST_CASE='maintenance' needs to be used when executing
+"doctor_tests/main.py", and ADMIN_TOOL_TYPE='fenix' if you want to test with
+Fenix instead of the sample implementation. The test case workflow can be seen
+in :numref:`figure-p3`.
+
+.. figure:: ./images/Maintenance-workflow.png
+ :name: figure-p3
+ :width: 100%
+
+ Maintenance test case workflow
+
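+Putting the environment variables described above together, a maintenance test
+run could be triggered roughly as follows (a sketch; installer related
+variables may also need to be set, as for the fault management test case):
+
+.. code-block:: bash
+
+    # run the maintenance test case against the latest upstream Fenix
+    export TEST_CASE='maintenance'
+    export ADMIN_TOOL_TYPE='fenix'
+    export INSPECTOR_TYPE=sample
+    python3 doctor_tests/main.py
+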
+In the test case all compute capacity will be consumed by project (VNF)
+instances. For redundant services on instances, and for the empty compute node
+needed for maintenance, the test case requires at least 3 compute nodes in the
+system. There will be 2 instances on each compute node, so the minimum number
+of VCPUs is also 2. Regardless of how many compute nodes there are, the
+application will always have 2 redundant instances (ACT-STDBY) on different
+compute nodes, and the rest of the compute capacity will be filled with
+non-redundant instances.
+
+For each project specific maintenance message there is a time window for the
+`app manager` to take any needed action. This guarantees zero downtime for its
+service. All replies are made by calling the `admin tool` API given in the
+message.
+
+The following steps are executed:
+
+The infrastructure admin will call the `admin tool` API to trigger maintenance
+for the compute hosts having instances belonging to a VNF.
+
+A project specific `MAINTENANCE` notification is triggered to tell the
+`app manager` that its instances are going to be hit by infrastructure
+maintenance at a specific point in time. The `app manager` will call the
+`admin tool` API to answer back `ACK_MAINTENANCE`.
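+
+All project level replies follow the same pattern: the `app manager` calls the
+API endpoint given in the notification. A rough sketch using the ``reply_url``,
+``session_id`` and ``state`` fields defined for the ``maintenance.scheduled``
+event in the configuration guide (the exact request body depends on whether the
+sample `admin tool` or Fenix is used):
+
+.. code-block:: bash
+
+    # answer back ACK_MAINTENANCE to the admin tool; the variables are
+    # placeholders taken from the received notification
+    curl -X PUT "$REPLY_URL" \
+        -H "Content-Type: application/json" \
+        -H "X-Auth-Token: $OS_AUTH_TOKEN" \
+        -d '{"session_id": "'"$SESSION_ID"'", "state": "ACK_MAINTENANCE"}'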
+
+When the time comes to start the actual maintenance workflow in the
+`admin tool`, a `DOWN_SCALE` notification is triggered, as there is no empty
+compute node for maintenance (or compute upgrade). The project receives the
+corresponding alarm, scales down its instances and calls the `admin tool` API
+to answer back `ACK_DOWN_SCALE`.
+
+As it might happen that instances are not scaled down (removed) from a single
+compute node, the `admin tool` might need to figure out which compute node
+should be made empty first and send `PREPARE_MAINTENANCE` to the project,
+telling which instance needs to be migrated to get the needed empty compute
+node. The `app manager` makes sure it is ready to migrate the instance and
+calls the `admin tool` API to answer back `ACK_PREPARE_MAINTENANCE`. The
+`admin tool` will make the migration and answer `ADMIN_ACTION_DONE`, so the
+`app manager` knows the instance can be used again.
+
+Next, :numref:`figure-p3` has a light blue section of actions to be done for
+each compute node. However, as we now have one empty compute node, we will
+maintain/upgrade that one first. So on the first round we can directly put the
+compute node into maintenance and send the admin level, host specific
+`IN_MAINTENANCE` message. This is caught by the `Inspector`, so it knows the
+host is down for maintenance. The `Inspector` can now disable any automatic
+fault management actions for the host, as it may be down on purpose. After the
+`admin tool` has completed the maintenance/upgrade, a `MAINTENANCE_COMPLETE`
+message is sent to tell the host is back in production.
+
+On the next rounds we always have instances on the compute node, so we need a
+`PLANNED_MAINTENANCE` message to tell that those instances are now going to be
+hit by maintenance. When the `app manager` receives this message, it knows the
+instances to be moved away from the compute node will now move to an already
+maintained/upgraded host. In the test case no upgrade is done on the
+application side to upgrade instances according to the new infrastructure
+capabilities, but this could be done here as that information is also passed
+in the message. This might be just upgrading some RPMs, but could also be a
+complete re-instantiation of the instance with a new flavor. Now, if the
+application runs the active side of a redundant instance on this compute node,
+a switch over will be done. When the `app manager` is ready, it will call the
+`admin tool` API to answer back `ACK_PLANNED_MAINTENANCE`. In the test case the
+answer is `migrate`, so the `admin tool` will migrate the instances and reply
+`ADMIN_ACTION_DONE`, and then the `app manager` knows the instances can be used
+again. Then we are ready to do the actual maintenance as previously, through
+the `IN_MAINTENANCE` and `MAINTENANCE_COMPLETE` steps.
+
+After all compute nodes are maintained, the `admin tool` can send
+`MAINTENANCE_COMPLETE` to tell that the maintenance/upgrade is now complete.
+For the `app manager` this means it can scale back to full capacity.
+
+There is currently a sample VNFM implementation in the test case. On the
+infrastructure side there is a sample implementation of the 'admin_tool', and
+there is also support for OpenStack Fenix, which extends the use case to
+support 'ETSI FEAT03' for VNFM interaction and to optimize the whole
+infrastructure maintenance and upgrade.
diff --git a/docs/release/userguide/get-valid-server-state.rst b/docs/release/userguide/get-valid-server-state.rst
new file mode 100644
index 00000000..824ea3c2
--- /dev/null
+++ b/docs/release/userguide/get-valid-server-state.rst
@@ -0,0 +1,125 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+======================
+Get valid server state
+======================
+
+Related Blueprints:
+===================
+
+https://blueprints.launchpad.net/nova/+spec/get-valid-server-state
+
+Problem description
+===================
+
+Previously, when the owner of a VM queried his VMs, he did not receive enough
+state information, states did not change fast enough in the VIM and they were
+not accurate in some scenarios. With this change this gap is now closed.
+
+A typical case is that, in case of a fault of a host, the user of a high
+availability service running on top of that host needs to make an immediate
+switch over from the faulty host to an active standby host. Now, if the compute
+host is forced down [1] as a result of that fault, the user has to be notified
+about this state change such that the user can react accordingly. Similarly,
+a change of the host state to "maintenance" should also be notified to the
+users.
+
+What is changed
+===============
+
+A new ``host_status`` parameter is added to the ``/servers/{server_id}`` and
+``/servers/detail`` endpoints in microversion 2.16. With this new parameter
+the user can get additional state information about the host.
+
+Possible ``host_status`` values, where the next value in the list can override
+the previous one:
+
+- ``UP`` if nova-compute is up.
+- ``UNKNOWN`` if the nova-compute status was not reported by the servicegroup
+  driver within the configured time period. The default is 60 seconds, but it
+  can be changed with ``service_down_time`` in nova.conf.
+- ``DOWN`` if nova-compute was forced down.
+- ``MAINTENANCE`` if nova-compute was disabled. MAINTENANCE in API directly
+ means nova-compute service is disabled. Different wording is used to avoid
+ the impression that the whole host is down, as only scheduling of new VMs
+ is disabled.
+- An empty string indicates there is no host for the server.
+
+``host_status`` is returned in the response if the policy permits. By default
+the policy allows it for admin only in the Nova policy.json::
+
+ "os_compute_api:servers:show:host_status": "rule:admin_api"
+
+For an NFV use case this has to also be enabled for the owner of the VM::
+
+ "os_compute_api:servers:show:host_status": "rule:admin_or_owner"
+
+REST API examples:
+==================
+
+Case where nova-compute is enabled and reporting normally::
+
+ GET /v2.1/{tenant_id}/servers/{server_id}
+
+ 200 OK
+ {
+ "server": {
+ "host_status": "UP",
+ ...
+ }
+ }
+
+Case where nova-compute is enabled, but not reporting normally::
+
+ GET /v2.1/{tenant_id}/servers/{server_id}
+
+ 200 OK
+ {
+ "server": {
+ "host_status": "UNKNOWN",
+ ...
+ }
+ }
+
+Case where nova-compute is enabled, but forced_down::
+
+ GET /v2.1/{tenant_id}/servers/{server_id}
+
+ 200 OK
+ {
+ "server": {
+ "host_status": "DOWN",
+ ...
+ }
+ }
+
+Case where nova-compute is disabled::
+
+ GET /v2.1/{tenant_id}/servers/{server_id}
+
+ 200 OK
+ {
+ "server": {
+ "host_status": "MAINTENANCE",
+ ...
+ }
+ }
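+
+The responses above are only returned with compute API microversion 2.16 or
+later, where ``host_status`` was introduced. A minimal sketch of such a request
+with curl (the placeholders are illustrative, following the same notation as
+the REST API examples in this document)::
+
+    curl -s http://{service_host_ip}:8774/v2.1/{tenant_id}/servers/{server_id} \
+         -H "X-OpenStack-Nova-API-Version: 2.16" \
+         -H "X-Auth-Token: {token}"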
+
+Host Status is also visible in python-novaclient::
+
+ +-------+------+--------+------------+-------------+----------+-------------+
+ | ID | Name | Status | Task State | Power State | Networks | Host Status |
+ +-------+------+--------+------------+-------------+----------+-------------+
+ | 9a... | vm1 | ACTIVE | - | RUNNING | xnet=... | UP |
+ +-------+------+--------+------------+-------------+----------+-------------+
+
+Links:
+======
+
+[1] Manual for OpenStack NOVA API for marking host down
+http://artifacts.opnfv.org/doctor/docs/manuals/mark-host-down_manual.html
+
+[2] OpenStack compute manual page
+http://developer.openstack.org/api-ref-compute-v2.1.html#compute-v2.1
diff --git a/docs/release/userguide/index.rst b/docs/release/userguide/index.rst
index eee855dc..577072c7 100644
--- a/docs/release/userguide/index.rst
+++ b/docs/release/userguide/index.rst
@@ -11,3 +11,6 @@ Doctor User Guide
:maxdepth: 2
feature.userguide.rst
+ get-valid-server-state.rst
+ mark-host-down_manual.rst
+ monitors.rst
diff --git a/docs/release/userguide/mark-host-down_manual.rst b/docs/release/userguide/mark-host-down_manual.rst
new file mode 100644
index 00000000..3815205d
--- /dev/null
+++ b/docs/release/userguide/mark-host-down_manual.rst
@@ -0,0 +1,122 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+=========================================
+OpenStack NOVA API for marking host down.
+=========================================
+
+Related Blueprints:
+===================
+
+ https://blueprints.launchpad.net/nova/+spec/mark-host-down
+ https://blueprints.launchpad.net/python-novaclient/+spec/support-force-down-service
+
+What the API is for
+===================
+
+ This API gives an external fault monitoring system the possibility of telling
+ OpenStack Nova quickly that a compute host is down. This immediately enables
+ evacuation of any VM on the host, further enabling faster HA actions.
+
+What this API does
+==================
+
+ In OpenStack the nova-compute service state can represent the compute host
+ state, and this new API is used to force this service down. It is assumed
+ that the one calling this API has made sure the host is also fenced or
+ powered down. This is important, so there is no chance the same VM instance
+ will appear twice in case it is evacuated to a new compute host. When the host
+ is recovered by any means, the external system is responsible for calling the
+ API again to disable the forced_down flag and to let the host's nova-compute
+ service report again that the host is up. If a network fenced host comes up
+ again, it should not boot the VMs it had if it figures out they have been
+ evacuated to another compute host. The decision of deleting or booting the VMs
+ that used to be on the host should be enhanced later to be more reliable by
+ the Nova blueprint:
+ https://blueprints.launchpad.net/nova/+spec/robustify-evacuate
+
+REST API for forcing down:
+==========================
+
+ Parameter explanations:
+ tenant_id: Identifier of the tenant.
+ binary: Compute service binary name.
+ host: Compute host name.
+ forced_down: Compute service forced down flag.
+ token: Token received after successful authentication.
+ service_host_ip: Serving controller node ip.
+
+ request:
+ PUT /v2.1/{tenant_id}/os-services/force-down
+ {
+ "binary": "nova-compute",
+ "host": "compute1",
+ "forced_down": true
+ }
+
+ response:
+ 200 OK
+ {
+ "service": {
+ "host": "compute1",
+ "binary": "nova-compute",
+ "forced_down": true
+ }
+ }
+
+ Example:
+ curl -g -i -X PUT http://{service_host_ip}:8774/v2.1/{tenant_id}/os-services
+ /force-down -H "Content-Type: application/json" -H "Accept: application/json
+ " -H "X-OpenStack-Nova-API-Version: 2.11" -H "X-Auth-Token: {token}" -d '{"b
+ inary": "nova-compute", "host": "compute1", "forced_down": true}'
+
+CLI for forcing down:
+=====================
+
+ nova service-force-down <hostname> nova-compute
+
+ Example:
+ nova service-force-down compute1 nova-compute
+
+REST API for disabling forced down:
+===================================
+
+ Parameter explanations:
+ tenant_id: Identifier of the tenant.
+ binary: Compute service binary name.
+ host: Compute host name.
+ forced_down: Compute service forced down flag.
+ token: Token received after successful authentication.
+ service_host_ip: Serving controller node ip.
+
+ request:
+ PUT /v2.1/{tenant_id}/os-services/force-down
+ {
+ "binary": "nova-compute",
+ "host": "compute1",
+ "forced_down": false
+ }
+
+ response:
+ 200 OK
+ {
+ "service": {
+ "host": "compute1",
+ "binary": "nova-compute",
+ "forced_down": false
+ }
+ }
+
+ Example:
+ curl -g -i -X PUT http://{service_host_ip}:8774/v2.1/{tenant_id}/os-services
+ /force-down -H "Content-Type: application/json" -H "Accept: application/json
+ " -H "X-OpenStack-Nova-API-Version: 2.11" -H "X-Auth-Token: {token}" -d '{"b
+ inary": "nova-compute", "host": "compute1", "forced_down": false}'
+
+CLI for disabling forced down:
+==============================
+
+ nova service-force-down --unset <hostname> nova-compute
+
+ Example:
+ nova service-force-down --unset compute1 nova-compute
diff --git a/docs/release/userguide/monitors.rst b/docs/release/userguide/monitors.rst
new file mode 100644
index 00000000..eeb5e226
--- /dev/null
+++ b/docs/release/userguide/monitors.rst
@@ -0,0 +1,37 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+Monitor Types and Limitations
+=============================
+
+Currently there are two monitor types supported: sample and collectd
+
+Sample Monitor
+--------------
+
+The sample monitor type pings the compute host from the control host and
+calculates the notification time after the ping timeout. Also, if the inspector
+type is sample, the compute node needs to communicate with the control node on
+port 12345. This port needs to be opened for incoming traffic on the control
+node.
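+
+A minimal sketch of opening that port on the control node, assuming plain
+iptables is used (the firewall tooling differs per distribution and installer)::
+
+    # allow the monitor/compute side to reach the sample inspector on port 12345
+    sudo iptables -I INPUT -p tcp --dport 12345 -j ACCEPT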
+
+Collectd Monitor
+----------------
+
+The collectd monitor type uses the collectd daemon running the ovs_events
+plugin. Collectd runs on the compute node to send an instant notification to
+the control node. The notification time is calculated as the difference between
+the time at which the compute node sends the notification to the control node
+and the time at which the consumer is notified. The clocks of the control and
+compute nodes have to be synchronized for this reason. For further details on
+setting up collectd on the compute node, see
+:doc:`the Barometer user guide <barometer:release/userguide/feature.userguide>`.
+
+
+Collectd monitors an interface managed by OVS. If the interface is not assigned
+an IP, the user has to provide the name of the interface to be monitored. The
+command to launch the doctor test in that case is::
+
+    MONITOR_TYPE=collectd INSPECTOR_TYPE=sample INTERFACE_NAME=example_iface ./run.sh
+
+If the interface name or IP is not provided, the collectd monitor type will
+monitor the default management interface. This may result in the failure of the
+doctor run.sh test case. The test case sets the monitored interface down, and if
+the inspector (sample or congress) is running on the same subnet, the collectd
+monitor will not be able to communicate with it.