-rw-r--r--   docs/design/index.rst                           |   1
-rw-r--r--   docs/design/port-status-update.rst              |  32
-rw-r--r--   docs/userguide/doctor_scenario_in_functest.rst  | 118
3 files changed, 151 insertions, 0 deletions
diff --git a/docs/design/index.rst b/docs/design/index.rst
index 9b062349..30ce3c6c 100644
--- a/docs/design/index.rst
+++ b/docs/design/index.rst
@@ -21,3 +21,4 @@ See also https://wiki.opnfv.org/requirements_projects .
 
    report-host-fault-to-update-server-state-immediately.rst
    notification-alarm-evaluator.rst
+   port-status-update.rst
diff --git a/docs/design/port-status-update.rst b/docs/design/port-status-update.rst
new file mode 100644
index 00000000..d87d7d7b
--- /dev/null
+++ b/docs/design/port-status-update.rst
@@ -0,0 +1,32 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+==========================
+Neutron Port Status Update
+==========================
+
+.. NOTE::
+   This document represents a Neutron RFE reviewed in the Doctor project before being submitted
+   upstream to the Neutron Launchpad space. It is not intended to follow a blueprint format or to
+   be an extensive document. For more information, please visit
+   http://docs.openstack.org/developer/neutron/policies/blueprints.html
+
+   The RFE was submitted to Neutron. You can follow the discussions in
+   https://bugs.launchpad.net/neutron/+bug/1598081
+
+The Neutron port status field represents the current status of a port in the cloud infrastructure.
+The field can take one of the following values: 'ACTIVE', 'DOWN', 'BUILD' and 'ERROR'.
+
+At present, if a network event occurs in the data plane (e.g. a virtual or physical switch, or one
+of its ports, fails; a cable gets pulled unintentionally; the infrastructure topology changes;
+etc.), connectivity to logical ports may be affected and tenants' services interrupted. When
+tenants or cloud administrators look up the status of their resources (e.g. Nova instances and the
+services running in them, network ports, etc.), everything will wrongly appear to be fine. The
+problem is that Neutron will continue reporting the port 'status' as 'ACTIVE'.
+
+Many SDN controllers managing network elements have the ability to detect and report network
+events to upper layers. This allows the SDN controllers' users to be notified of changes and react
+accordingly. Such information could be consumed by Neutron so that it could update the 'status'
+field of the affected logical ports and, additionally, generate a notification message on the
+message bus.
+
+However, Neutron currently lacks a way to receive such information, whether through e.g. an ML2
+driver or the REST API (the 'status' field is read-only). There are pros and cons to both of these
+approaches, as well as to other possible approaches. This RFE intends to trigger a discussion on
+how Neutron could be improved to receive fault/change events from SDN controllers, or even from
+third parties not in charge of controlling the network (e.g. monitoring systems, human admins).
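As a purely illustrative sketch of the REST API approach discussed in the RFE above, an SDN
controller or monitoring system that detected a data-plane fault might report it roughly as
follows. This assumes the RFE were accepted and the 'status' field made writable for privileged
callers; today Neutron would reject such a request, and the endpoint, credentials and port UUID
below are placeholders.

.. code-block:: python

    # Hypothetical sketch: assumes 'status' became writable per the RFE.
    # Endpoint, credentials and port UUID are placeholders.
    from neutronclient.v2_0 import client as neutron_client

    neutron = neutron_client.Client(
        username='admin', password='secret', tenant_name='admin',
        auth_url='http://controller:5000/v2.0')

    port_id = '4b7e2f0e-0000-0000-0000-000000000000'

    # Report the data-plane fault by flipping the logical port status so
    # tenants no longer see a stale 'ACTIVE' value; Neutron could then also
    # emit a notification on the message bus.
    neutron.update_port(port_id, {'port': {'status': 'DOWN'}})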
diff --git a/docs/userguide/doctor_scenario_in_functest.rst b/docs/userguide/doctor_scenario_in_functest.rst
new file mode 100644
index 00000000..3a435706
--- /dev/null
+++ b/docs/userguide/doctor_scenario_in_functest.rst
@@ -0,0 +1,118 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+Doctor
+^^^^^^
+
+Platform overview
+"""""""""""""""""
+
+The Doctor platform provides these features in the `Colorado Release <https://wiki.opnfv.org/display/SWREL/Colorado>`_:
+
+* Immediate Notification
+* Consistent resource state awareness for compute host down
+* Valid compute host status given to VM owner
+
+These features enable high availability of Network Services on top of
+the virtualized infrastructure. Immediate notification allows VNF managers
+(VNFM) to process recovery actions promptly once a failure has occurred.
+
+Consistency of resource state is necessary to execute recovery actions
+properly in the VIM.
+
+The ability to query host status gives the VM owner the possibility to get
+consistent state information through an API in case of a compute host
+fault.
+
+The Doctor platform consists of the following components:
+
+* OpenStack Compute (Nova)
+* OpenStack Telemetry (Ceilometer)
+* OpenStack Alarming (Aodh)
+* Doctor Inspector
+* Doctor Monitor
+
+.. note::
+    Doctor Inspector and Monitor are sample implementations for reference.
+
+You can see an overview of the Doctor platform and how its components interact
+in :numref:`figure-p1`.
+
+.. figure:: /platformoverview/images/figure-p1.png
+    :name: figure-p1
+    :width: 100%
+
+    Doctor platform and typical sequence (Colorado)
+
+Detailed information on the Doctor architecture can be found in the Doctor
+requirements documentation:
+http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html
+
+
+Use case
+""""""""
+
+* A consumer of the NFVI wants to receive immediate notifications about faults
+  in the NFVI affecting the proper functioning of the virtual resources.
+  Therefore, such faults have to be detected as quickly as possible, and, when
+  a critical error is observed, the affected consumer is immediately informed
+  about the fault and can switch over to the STBY configuration.
+
+The faults to be monitored (and at which detection rate) will be configured by
+the consumer. Once a fault is detected, the Inspector in the Doctor
+architecture will check the resource map maintained by the Controller, to find
+out which virtual resources are affected, and then update the resources' state.
+The Notifier will receive the failure event requests sent from the Controller,
+and notify the consumer(s) of the affected resources according to the alarm
+configuration.
+
+Detailed workflow information is as follows:
+
+* Consumer (VNFM): (step 0) creates resources (network, server/instance) and an
+  event alarm on the state-down notification of that server/instance
+
+* Monitor: (step 1) periodically checks nodes, e.g. by pinging from/to each
+  dplane nic to/from the gateway of the node, and (step 2) once a failure is
+  detected, sends an event with "raw" fault information to the Inspector
+
+* Inspector: when it receives an event, it will (step 3) mark the host down
+  ("mark-host-down"), (step 4) map the PM to its VMs, and change the VM status
+  to down
+
+* Controller: (step 5) sends out an instance update event to Ceilometer
+
+* Notifier: (step 6) Ceilometer transforms and passes the event to Aodh,
+  (step 7) Aodh evaluates the event against the registered alarm definitions,
+  and then (step 8) fires the alarm to the "consumer" who owns the instance
+
+* Consumer (VNFM): (step 9) receives the event and (step 10) recreates a new
+  instance
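To make steps 3 and 4 of the workflow above more concrete, a minimal Inspector sketch using
python-novaclient could look like the following. This is not Doctor's actual Inspector code: the
authentication options and host name are placeholders, the error handling is omitted, and Nova
microversion 2.11 is assumed for the force-down ("mark-host-down") API.

.. code-block:: python

    # Minimal sketch of Inspector steps 3-4; credentials and host name are
    # placeholders, not the Doctor sample implementation.
    from keystoneauth1 import loading, session
    from novaclient import client as nova_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3', username='admin',
        password='secret', project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    sess = session.Session(auth=auth)

    # Microversion 2.11 or later is needed for the force-down API.
    nova = nova_client.Client('2.11', session=sess)

    failed_host = 'compute-1'

    # Step 3: mark the nova-compute service on the failed host as forced down
    # so Nova immediately reports the host as down and stops scheduling to it.
    nova.services.force_down(failed_host, 'nova-compute', True)

    # Step 4: map the physical host to its VMs and set their state to error,
    # which triggers the instance update events consumed in step 5.
    for server in nova.servers.list(
            search_opts={'host': failed_host, 'all_tenants': 1}):
        nova.servers.reset_state(server.id, state='error')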
+Test case
+"""""""""
+
+Functest will call the "run.sh" script in Doctor to run the test job.
+
+The "run.sh" script will execute the following steps.
+
+Firstly, connectivity to the target compute host is verified according to the
+installer in use, and an image is prepared for booting the VM. Currently, only
+the 'Apex' and 'local' installers are supported.
+
+Secondly, the Doctor components are started and, based on the above
+preparation, a test user (default: demo) is created for the Doctor tests.
+
+Thirdly, the VM is booted, and an alarm event is created in Ceilometer.
+After sleeping for 1 minute in order to wait for the VM launch to complete,
+a failure is injected into the system, i.e. the network of the compute host is
+disabled for 3 minutes. To ensure the host is down, the status of the host is
+checked.
+
+Finally, the notification time, i.e. the time between the execution of step 2
+(Monitor detects failure) and step 9 (Consumer receives failure notification),
+is calculated.
+
+According to the Doctor requirements, the Doctor test is successful if the
+notification time is below 1 second.
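As an illustration of that final calculation, the notification time and the pass criterion amount
to the arithmetic below. The actual "run.sh" derives the two timestamps from the Monitor and
Consumer logs; the helper function and the example values here are purely illustrative and not
part of the Doctor scripts.

.. code-block:: python

    # Illustrative only: the real test extracts the two timestamps from the
    # Monitor and Consumer logs; the values below are made-up examples.
    from datetime import datetime

    FMT = '%Y-%m-%d %H:%M:%S.%f'

    def notification_time(detected_at, notified_at):
        """Seconds between failure detection (step 2) and notification (step 9)."""
        return (notified_at - detected_at).total_seconds()

    detected = datetime.strptime('2016-09-26 12:00:00.135', FMT)   # step 2
    notified = datetime.strptime('2016-09-26 12:00:00.674', FMT)   # step 9

    # Doctor requirement: the test is successful only if this stays below 1 s.
    assert notification_time(detected, notified) < 1.0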