Diffstat (limited to 'docs/design')
-rw-r--r-- | docs/design/index.rst | 27
-rw-r--r-- | docs/design/inspector-design-guideline.rst | 46
-rw-r--r-- | docs/design/notification-alarm-evaluator.rst | 248
-rw-r--r-- | docs/design/performance-profiler.rst | 118
-rw-r--r-- | docs/design/port-data-plane-status.rst | 180
-rw-r--r-- | docs/design/report-host-fault-to-update-server-state-immediately.rst | 248
-rw-r--r-- | docs/design/rfe-port-status-update.rst | 32
7 files changed, 0 insertions, 899 deletions
diff --git a/docs/design/index.rst b/docs/design/index.rst deleted file mode 100644 index 963002a0..00000000 --- a/docs/design/index.rst +++ /dev/null @@ -1,27 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - -**************** -Design Documents -**************** - -This is the directory to store design documents which may include draft -versions of blueprints written before proposing to upstream OSS communities -such as OpenStack, in order to keep the original blueprint as reviewed in -OPNFV. That means there could be out-dated blueprints as result of further -refinements in the upstream OSS community. Please refer to the link in each -document to find the latest version of the blueprint and status of development -in the relevant OSS community. - -See also https://wiki.opnfv.org/requirements_projects . - -.. toctree:: - :numbered: - :maxdepth: 4 - - report-host-fault-to-update-server-state-immediately.rst - notification-alarm-evaluator.rst - rfe-port-status-update.rst - port-data-plane-status.rst - inspector-design-guideline.rst - performance-profiler.rst diff --git a/docs/design/inspector-design-guideline.rst b/docs/design/inspector-design-guideline.rst deleted file mode 100644 index 4add8c0f..00000000 --- a/docs/design/inspector-design-guideline.rst +++ /dev/null @@ -1,46 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - -========================== -Inspector Design Guideline -========================== - -.. NOTE:: - This is spec draft of design guideline for inspector component. - JIRA ticket to track the update and collect comments: `DOCTOR-73`_. - -This document summarize the best practise in designing a high performance -inspector to meet the requirements in `OPNFV Doctor project`_. 
- -Problem Description -=================== - -Some pitfalls has be detected during the development of sample inspector, e.g. -we suffered a significant `performance degrading in listing VMs in a host`_. - -A `patch set for caching the list`_ has been committed to solve issue. When a -new inspector is integrated, it would be nice to have an evaluation of existing -design and give recommendations for improvements. - -This document can be treated as a source of related blueprints in inspector -projects. - -Guidelines -========== - -Host specific VMs list ----------------------- - -TBD, see `DOCTOR-76`_. - -Parallel execution ------------------- - -TBD, see `discussion in mailing list`_. - -.. _DOCTOR-73: https://jira.opnfv.org/browse/DOCTOR-73 -.. _OPNFV Doctor project: https://wiki.opnfv.org/doctor -.. _performance degrading in listing VMs in a host: https://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-September/012591.html -.. _patch set for caching the list: https://gerrit.opnfv.org/gerrit/#/c/20877/ -.. _DOCTOR-76: https://jira.opnfv.org/browse/DOCTOR-76 -.. _discussion in mailing list: https://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-October/013036.html diff --git a/docs/design/notification-alarm-evaluator.rst b/docs/design/notification-alarm-evaluator.rst deleted file mode 100644 index d1bf787a..00000000 --- a/docs/design/notification-alarm-evaluator.rst +++ /dev/null @@ -1,248 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - -============================ -Notification Alarm Evaluator -============================ - -.. NOTE:: - This is spec draft of blueprint for OpenStack Ceilomter Liberty. 
- To see current version: https://review.openstack.org/172893 - To track development activity: - https://blueprints.launchpad.net/ceilometer/+spec/notification-alarm-evaluator - -https://blueprints.launchpad.net/ceilometer/+spec/notification-alarm-evaluator - -This blueprint proposes to add a new alarm evaluator for handling alarms on -events passed from other OpenStack services, that provides event-driven alarm -evaluation which makes new sequence in Ceilometer instead of the polling-based -approach of the existing Alarm Evaluator, and realizes immediate alarm -notification to end users. - -Problem description -=================== - -As an end user, I need to receive alarm notification immediately once -Ceilometer captured an event which would make alarm fired, so that I can -perform recovery actions promptly to shorten downtime of my service. -The typical use case is that an end user set alarm on "compute.instance.update" -in order to trigger recovery actions once the instance status has changed to -'shutdown' or 'error'. It should be nice that an end user can receive -notification within 1 second after fault observed as the same as other helth- -check mechanisms can do in some cases. - -The existing Alarm Evaluator is periodically querying/polling the databases -in order to check all alarms independently from other processes. This is good -approach for evaluating an alarm on samples stored in a certain period. -However, this is not efficient to evaluate an alarm on events which are emitted -by other OpenStack servers once in a while. - -The periodical evaluation leads delay on sending alarm notification to users. -The default period of evaluation cycle is 60 seconds. It is recommended that -an operator set longer interval than configured pipeline interval for -underlying metrics, and also longer enough to evaluate all defined alarms -in certain period while taking into account the number of resources, users and -alarms. 
- -Proposed change -=============== - -The proposal is to add a new event-driven alarm evaluator which receives -messages from Notification Agent and finds related Alarms, then evaluates each -alarms; - -* New alarm evaluator could receive event notification from Notification Agent - by which adding a dedicated notifier as a publisher in pipeline.yaml - (e.g. notifier://?topic=event_eval). - -* When new alarm evaluator received event notification, it queries alarm - database by Project ID and Resource ID written in the event notification. - -* Found alarms are evaluated by referring event notification. - -* Depending on the result of evaluation, those alarms would be fired through - Alarm Notifier as the same as existing Alarm Evaluator does. - -This proposal also adds new alarm type "notification" and "notification_rule". -This enables users to create alarms on events. The separation from other alarm -types (such as "threshold" type) is intended to show different timing of -evaluation and different format of condition, since the new evaluator will -check each event notification once it received whereas "threshold" alarm can -evaluate average of values in certain period calculated from multiple samples. - -The new alarm evaluator handles Notification type alarms, so we have to change -existing alarm evaluator to exclude "notification" type alarms from evaluation -targets. - -Alternatives ------------- - -There was similar blueprint proposal "Alarm type based on notification", but -the approach is different. The old proposal was to adding new step (alarm -evaluations) in Notification Agent every time it received event from other -OpenStack services, whereas this proposal intends to execute alarm evaluation -in another component which can minimize impact to existing pipeline processing. - -Another approach is enhancement of existing alarm evaluator by adding -notification listener. 
However, there are two issues; 1) this approach could -cause stall of periodical evaluations when it receives bulk of notifications, -and 2) this could break the alarm portioning i.e. when alarm evaluator received -notification, it might have to evaluate some alarms which are not assign to it. - -Data model impact ------------------ - -Resource ID will be added to Alarm model as an optional attribute. -This would help the new alarm evaluator to filter out non-related alarms -while querying alarms, otherwise it have to evaluate all alarms in the project. - -REST API impact ---------------- - -Alarm API will be extended as follows; - -* Add "notification" type into alarm type list -* Add "resource_id" to "alarm" -* Add "notification_rule" to "alarm" - -Sample data of Notification-type alarm:: - - { - "alarm_actions": [ - "http://site:8000/alarm" - ], - "alarm_id": null, - "description": "An alarm", - "enabled": true, - "insufficient_data_actions": [ - "http://site:8000/nodata" - ], - "name": "InstanceStatusAlarm", - "notification_rule": { - "event_type": "compute.instance.update", - "query" : [ - { - "field" : "traits.state", - "type" : "string", - "value" : "error", - "op" : "eq", - }, - ] - }, - "ok_actions": [], - "project_id": "c96c887c216949acbdfbd8b494863567", - "repeat_actions": false, - "resource_id": "153462d0-a9b8-4b5b-8175-9e4b05e9b856", - "severity": "moderate", - "state": "ok", - "state_timestamp": "2015-04-03T17:49:38.406845", - "timestamp": "2015-04-03T17:49:38.406839", - "type": "notification", - "user_id": "c96c887c216949acbdfbd8b494863567" - } - -"resource_id" will be refered to query alarm and will not be check permission -and belonging of project. 
- -Security impact ---------------- - -None - -Pipeline impact ---------------- - -None - -Other end user impact ---------------------- - -None - -Performance/Scalability Impacts -------------------------------- - -When Ceilomter received a number of events from other OpenStack services in -short period, this alarm evaluator can keep working since events are queued in -a messaging queue system, but it can cause delay of alarm notification to users -and increase the number of read and write access to alarm database. - -"resource_id" can be optional, but restricting it to mandatory could be reduce -performance impact. If user create "notification" alarm without "resource_id", -those alarms will be evaluated every time event occurred in the project. -That may lead new evaluator heavy. - -Other deployer impact ---------------------- - -New service process have to be run. - -Developer impact ----------------- - -Developers should be aware that events could be notified to end users and avoid -passing raw infra information to end users, while defining events and traits. - -Implementation -============== - -Assignee(s) ------------ - -Primary assignee: - r-mibu - -Other contributors: - None - -Ongoing maintainer: - None - -Work Items ----------- - -* New event-driven alarm evaluator - -* Add new alarm type "notification" as well as AlarmNotificationRule - -* Add "resource_id" to Alarm model - -* Modify existing alarm evaluator to filter out "notification" alarms - -* Add new config parameter for alarm request check whether accepting alarms - without specifying "resource_id" or not - -Future lifecycle -================ - -This proposal is key feature to provide information of cloud resources to end -users in real-time that enables efficient integration with user-side manager -or Orchestrator, whereas currently those information are considered to be -consumed by admin side tool or service. 
-Based on this change, we will seek orchestrating scenarios including fault -recovery and add useful event definition as well as additional traits. - -Dependencies -============ - -None - -Testing -======= - -New unit/scenario tests are required for this change. - -Documentation Impact -==================== - -* Proposed evaluator will be described in the developer document. - -* New alarm type and how to use will be explained in user guide. - -References -========== - -* OPNFV Doctor project: https://wiki.opnfv.org/doctor - -* Blueprint "Alarm type based on notification": - https://blueprints.launchpad.net/ceilometer/+spec/alarm-on-notification diff --git a/docs/design/performance-profiler.rst b/docs/design/performance-profiler.rst deleted file mode 100644 index f834a915..00000000 --- a/docs/design/performance-profiler.rst +++ /dev/null @@ -1,118 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - - -==================== -Performance Profiler -==================== - -https://goo.gl/98Osig - -This blueprint proposes to create a performance profiler for doctor scenarios. - -Problem Description -=================== - -In the verification job for notification time, we have encountered some -performance issues, such as - -1. In environment deployed by APEX, it meets the criteria while in the one by -Fuel, the performance is much more poor. -2. Signification performance degradation was spotted when we increase the total -number of VMs - -It takes time to dig the log and analyse the reason. People have to collect -timestamp at each checkpoints manually to find out the bottleneck. A performance -profiler will make this process automatic. 
- -Proposed Change -=============== - -Current Doctor scenario covers the inspector and notifier in the whole fault -management cycle:: - - start end - + + + + + + - | | | | | | - |monitor|inspector|notifier|manager|controller| - +------>+ | | | | - occurred +-------->+ | | | - | detected +------->+ | | - | | identified +-------+ | - | | notified +--------->+ - | | | processed resolved - | | | | - | +<-----doctor----->+ | - | | - | | - +<---------------fault management------------>+ - -The notification time can be split into several parts and visualized as a -timeline:: - - start end - 0----5---10---15---20---25---30---35---40---45--> (x 10ms) - + + + + + + + + + + + - 0-hostdown | | | | | | | | | - +--->+ | | | | | | | | | - | 1-raw failure | | | | | | | - | +-->+ | | | | | | | | - | | 2-found affected | | | | | - | | +-->+ | | | | | | | - | | 3-marked host down| | | | | - | | +-->+ | | | | | | - | | 4-set VM error| | | | | - | | +--->+ | | | | | - | | | 5-notified VM error | | - | | | +----->| | | | | - | | | | 6-transformed event - | | | | +-->+ | | | - | | | | | 7-evaluated event - | | | | | +-->+ | | - | | | | | 8-fired alarm - | | | | | +-->+ | - | | | | | 9-received alarm - | | | | | +-->+ - sample | sample | | | |10-handled alarm - monitor| inspector |nova| c/m | aodh | - | | - +<-----------------doctor--------------->+ - -Note: c/m = ceilometer - -And a table of components sorted by time cost from most to least - -+----------+---------+----------+ -|Component |Time Cost|Percentage| -+==========+=========+==========+ -|inspector |160ms | 40% | -+----------+---------+----------+ -|aodh |110ms | 30% | -+----------+---------+----------+ -|monitor |50ms | 14% | -+----------+---------+----------+ -|... | | | -+----------+---------+----------+ -|... | | | -+----------+---------+----------+ - -Note: data in the table is for demonstration only, not actual measurement - -Timestamps can be collected from various sources - -1. log files -2. 
trace point in code - -The performance profiler will be integrated into the verification job to provide -detail result of the test. It can also be deployed independently to diagnose -performance issue in specified environment. - -Working Items -============= - -1. PoC with limited checkpoints -2. Integration with verification job -3. Collect timestamp at all checkpoints -4. Display the profiling result in console -5. Report the profiling result to test database -6. Independent package which can be installed to specified environment diff --git a/docs/design/port-data-plane-status.rst b/docs/design/port-data-plane-status.rst deleted file mode 100644 index 06cfc3c6..00000000 --- a/docs/design/port-data-plane-status.rst +++ /dev/null @@ -1,180 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - -==================================== -Port data plane status -==================================== - -https://bugs.launchpad.net/neutron/+bug/1598081 - -Neutron does not detect data plane failures affecting its logical resources. -This spec addresses that issue by means of allowing external tools to report to -Neutron about faults in the data plane that are affecting the ports. A new REST -API field is proposed to that end. - - -Problem Description -=================== - -An initial description of the problem was introduced in bug #159801 [1_]. This -spec focuses on capturing one (main) part of the problem there described, i.e. -extending Neutron's REST API to cover the scenario of allowing external tools -to report network failures to Neutron. Out of scope of this spec are works to -enable port status changes to be received and managed by mechanism drivers. - -This spec also tries to address bug #1575146 [2_]. 
Specifically, and argued by -the Neutron driver team in [3_]: - - * Neutron should not shut down the port completly upon detection of physnet - failure; connectivity between instances on the same node may still be - reachable. Externals tools may or may not want to trigger a status change on - the port based on their own logic and orchestration. - - * Port down is not detected when an uplink of a switch is down; - - * The physnet bridge may have multiple physical interfaces plugged; shutting - down the logical port may not be needed in case network redundancy is in - place. - - -Proposed Change -=============== - -A couple of possible approaches were proposed in [1_] (comment #3). This spec -proposes tackling the problema via a new extension API to the port resource. -The extension adds a new attribute 'dp-down' (data plane down) to represent the -status of the data plane. The field should be read-only by tenants and -read-write by admins. - -Neutron should send out an event to the message bus upon toggling the data -plane status value. The event is relevant for e.g. auditing. - - -Data Model Impact ------------------ - -A new attribute as extension will be added to the 'ports' table. - -+------------+-------+----------+---------+--------------------+--------------+ -|Attribute |Type |Access |Default |Validation/ |Description | -|Name | | |Value |Conversion | | -+============+=======+==========+=========+====================+==============+ -|dp_down |boolean|RO, tenant|False |True/False | | -| | |RW, admin | | | | -+------------+-------+----------+---------+--------------------+--------------+ - - -REST API Impact ---------------- - -A new API extension to the ports resource is going to be introduced. - -.. 
code-block:: python - - EXTENDED_ATTRIBUTES_2_0 = { - 'ports': { - 'dp_down': {'allow_post': False, 'allow_put': True, - 'default': False, 'convert_to': convert_to_boolean, - 'is_visible': True}, - }, - } - - -Examples -~~~~~~~~ - -Updating port data plane status to down: - -.. code-block:: json - - PUT /v2.0/ports/<port-uuid> - Accept: application/json - { - "port": { - "dp_down": true - } - } - - - -Command Line Client Impact --------------------------- - -:: - - neutron port-update [--dp-down <True/False>] <port> - openstack port set [--dp-down <True/False>] <port> - -Argument --dp-down is optional. Defaults to False. - - -Security Impact ---------------- - -None - -Notifications Impact --------------------- - -A notification (event) upon toggling the data plane status (i.e. 'dp-down' -attribute) value should be sent to the message bus. Such events do not happen -with high frequency and thus no negative impact on the notification bus is -expected. - -Performance Impact ------------------- - -None - -IPv6 Impact ------------ - -None - -Other Deployer Impact ---------------------- - -None - -Developer Impact ----------------- - -None - -Implementation -============== - -Assignee(s) ------------ - - * cgoncalves - -Work Items ----------- - - * New 'dp-down' attribute in 'ports' database table - * API extension to introduce new field to port - * Client changes to allow for data plane status (i.e. 'dp-down' attribute') - being set - * Policy (tenants read-only; admins read-write) - - -Documentation Impact -==================== - -Documentation for both administrators and end users will have to be -contemplated. Administrators will need to know how to set/unset the data plane -status field. - - -References -========== - -.. [1] RFE: Port status update, - https://bugs.launchpad.net/neutron/+bug/1598081 - -.. [2] RFE: ovs port status should the same as physnet - https://bugs.launchpad.net/neutron/+bug/1575146 - -.. 
[3] Neutron Drivers meeting, July 21, 2016 - http://eavesdrop.openstack.org/meetings/neutron_drivers/2016/neutron_drivers.2016-07-21-22.00.html diff --git a/docs/design/report-host-fault-to-update-server-state-immediately.rst b/docs/design/report-host-fault-to-update-server-state-immediately.rst deleted file mode 100644 index 2f6ce145..00000000 --- a/docs/design/report-host-fault-to-update-server-state-immediately.rst +++ /dev/null @@ -1,248 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - -.. NOTE:: - This is a specification draft of a blueprint proposed for OpenStack Nova - Liberty. It was written by project member(s) and agreed within the project - before submitting it upstream. No further changes to its content will be - made here anymore; please follow it upstream: - - * Current version upstream: https://review.openstack.org/#/c/169836/ - * Development activity: - https://blueprints.launchpad.net/nova/+spec/mark-host-down - - **Original draft is as follow:** - -==================================================== -Report host fault to update server state immediately -==================================================== - -https://blueprints.launchpad.net/nova/+spec/update-server-state-immediately - -A new API is needed to report a host fault to change the state of the -instances and compute node immediately. This allows usage of evacuate API -without a delay. The new API provides the possibility for external monitoring -system to detect any kind of host failure fast and reliably and inform -OpenStack about it. Nova updates the compute node state and states of the -instances. This way the states in the Nova DB will be in sync with the -real state of the system. - -Problem description -=================== -* Nova state change for failed or unreachable host is slow and does not - reliably state compute node is down or not. 
This might cause same instance - to run twice if action taken to evacuate instance to another host. -* Nova state for instances on failed compute node will not change, - but remains active and running. This gives user a false information about - instance state. Currently one would need to call "nova reset-state" for each - instance to have them in error state. -* OpenStack user cannot make HA actions fast and reliably by trusting instance - state and compute node state. -* As compute node state changes slowly one cannot evacuate instances. - -Use Cases ---------- -Use case in general is that in case there is a host fault one should change -compute node state fast and reliably when using DB servicegroup backend. -On top of this here is the use cases that are not covered currently to have -instance states changed correctly: -* Management network connectivity lost between controller and compute node. -* Host HW failed. - -Generic use case flow: - -* The external monitoring system detects a host fault. -* The external monitoring system fences the host if not down already. -* The external system calls the new Nova API to force the failed compute node - into down state as well as instances running on it. -* Nova updates the compute node state and state of the effected instances to - Nova DB. - -Currently nova-compute state will be changing "down", but it takes a long -time. Server state keeps as "vm_state: active" and "power_state: -running", which is not correct. By having external tool to detect host faults -fast, fence host by powering down and then report host down to OpenStack, all -these states would reflect to actual situation. Also if OpenStack will not -implement automatic actions for fault correlation, external tool can do that. -This could be configured for example in server instance METADATA easily and be -read by external tool. - -Project Priority ------------------ -Liberty priorities have not yet been defined. 
- -Proposed change -=============== -There needs to be a new API for Admin to state host is down. This API is used -to mark compute node and instances running on it down to reflect the real -situation. - -Example on compute node is: - -* When compute node is up and running: - vm_state: active and power_state: running - nova-compute state: up status: enabled -* When compute node goes down and new API is called to state host is down: - vm_state: stopped power_state: shutdown - nova-compute state: down status: enabled - -vm_state values: soft-delete, deleted, resized and error -should not be touched. -task_state effect needs to be worked out if needs to be touched. - -Alternatives ------------- -There is no attractive alternatives to detect all different host faults than -to have a external tool to detect different host faults. For this kind of tool -to exist there needs to be new API in Nova to report fault. Currently there -must have been some kind of workarounds implemented as cannot trust or get the -states from OpenStack fast enough. - -Data model impact ------------------ -None - -REST API impact ---------------- -* Update CLI to report host is down - - nova host-update command - - usage: nova host-update [--status <enable|disable>] - [--maintenance <enable|disable>] - [--report-host-down] - <hostname> - - Update host settings. - - Positional arguments - - <hostname> - Name of host. - - Optional arguments - - --status <enable|disable> - Either enable or disable a host. - - --maintenance <enable|disable> - Either put or resume host to/from maintenance. - - --down - Report host down to update instance and compute node state in db. - -* Update Compute API to report host is down: - - /v2.1/{tenant_id}/os-hosts/{host_name} - - Normal response codes: 200 - Request parameters - - Parameter Style Type Description - host_name URI xsd:string The name of the host of interest to you. 
- - { - "host": { - "status": "enable", - "maintenance_mode": "enable" - "host_down_reported": "true" - - } - - } - - { - "host": { - "host": "65c5d5b7e3bd44308e67fc50f362aee6", - "maintenance_mode": "enabled", - "status": "enabled" - "host_down_reported": "true" - - } - - } - -* New method to nova.compute.api module HostAPI class to have a - to mark host related instances and compute node down: - set_host_down(context, host_name) - -* class novaclient.v2.hosts.HostManager(api) method update(host, values) - Needs to handle reporting host down. - -* Schema does not need changes as in db only service and server states are to - be changed. - -Security impact ---------------- -API call needs admin privileges (in the default policy configuration). - -Notifications impact --------------------- -None - -Other end user impact ---------------------- -None - -Performance Impact ------------------- -Only impact is that user can get information faster about instance and -compute node state. This also gives possibility to evacuate faster. -No impact that would slow down. Host down should be rare occurrence. - -Other deployer impact ---------------------- -Developer can make use of any external tool to detect host fault and report it -to OpenStack. - -Developer impact ----------------- -None - -Implementation -============== -Assignee(s) ------------ -Primary assignee: Tomi Juvonen -Other contributors: Ryota Mibu - -Work Items ----------- -* Test cases. -* API changes. -* Documentation. - -Dependencies -============ -None - -Testing -======= -Test cases that exists for enabling or putting host to maintenance should be -altered or similar new cases made test new functionality. - -Documentation Impact -==================== - -New API needs to be documented: - -* Compute API extensions documentation. - http://developer.openstack.org/api-ref-compute-v2.1.html -* Nova commands documentation. 
- http://docs.openstack.org/user-guide-admin/content/novaclient_commands.html -* Compute command-line client documentation. - http://docs.openstack.org/cli-reference/content/novaclient_commands.html -* nova.compute.api documentation. - http://docs.openstack.org/developer/nova/api/nova.compute.api.html -* High Availability guide might have page to tell external tool could provide - ability to provide faster HA as able to update states by new API. - http://docs.openstack.org/high-availability-guide/content/index.html - -References -========== -* OPNFV Doctor project: https://wiki.opnfv.org/doctor -* OpenStack Instance HA Proposal: - http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/ -* The Different Facets of OpenStack HA: - http://blog.russellbryant.net/2015/03/10/ - the-different-facets-of-openstack-ha/ diff --git a/docs/design/rfe-port-status-update.rst b/docs/design/rfe-port-status-update.rst deleted file mode 100644 index d87d7d7b..00000000 --- a/docs/design/rfe-port-status-update.rst +++ /dev/null @@ -1,32 +0,0 @@ -.. This work is licensed under a Creative Commons Attribution 4.0 International License. -.. http://creativecommons.org/licenses/by/4.0 - -========================== -Neutron Port Status Update -========================== - -.. NOTE:: - This document represents a Neutron RFE reviewed in the Doctor project before submitting upstream to Launchpad Neutron - space. The document is not intended to follow a blueprint format or to be an extensive document. - For more information, please visit http://docs.openstack.org/developer/neutron/policies/blueprints.html - - The RFE was submitted to Neutron. You can follow the discussions in https://bugs.launchpad.net/neutron/+bug/1598081 - -Neutron port status field represents the current status of a port in the cloud infrastructure. The field can take one of -the following values: 'ACTIVE', 'DOWN', 'BUILD' and 'ERROR'. - -At present, if a network event occurs in the data-plane (e.g. 
virtual or physical switch fails or one of its ports, -cable gets pulled unintentionally, infrastructure topology changes, etc.), connectivity to logical ports may be affected -and tenants' services interrupted. When tenants/cloud administrators are looking up their resources' status (e.g. Nova -instances and services running in them, network ports, etc.), they will wrongly see everything looks fine. The problem -is that Neutron will continue reporting port 'status' as 'ACTIVE'. - -Many SDN Controllers managing network elements have the ability to detect and report network events to upper layers. -This allows SDN Controllers' users to be notified of changes and react accordingly. Such information could be consumed -by Neutron so that Neutron could update the 'status' field of those logical ports, and additionally generate a -notification message to the message bus. - -However, Neutron misses a way to be able to receive such information through e.g. ML2 driver or the REST API ('status' -field is read-only). There are pros and cons on both of these approaches as well as other possible approaches. This RFE -intends to trigger a discussion on how Neutron could be improved to receive fault/change events from SDN Controllers or -even also from 3rd parties not in charge of controlling the network (e.g. monitoring systems, human admins). |