author Yujun Zhang <zhang.yujunz@zte.com.cn> 2017-09-06 14:05:36 +0800
committer Yujun Zhang <zhang.yujunz@zte.com.cn> 2017-09-21 16:18:29 +0800
commit 1f00955295c2461a181aa1fa5d8587f12832bf4d
tree 0f34450bdfa5eeec18e8d0e233ea609dc96d1b65
parent f5a750f19896f6f161876a5babc5b42c720dedf8
Add parallel execution and shortcut notification to inspector design guideline
JIRA: DOCTOR-73
Change-Id: Ic412b0c5e966f4391bc0f9e5e71d64e23e2eba68
Signed-off-by: Yujun Zhang <zhang.yujunz@zte.com.cn>
-rw-r--r-- docs/development/design/images/conservative-notification.png bin 0 -> 63926 bytes
-rw-r--r-- docs/development/design/images/notification-time.png bin 0 -> 34847 bytes
-rw-r--r-- docs/development/design/images/shortcut-notification.png bin 0 -> 66098 bytes
-rw-r--r-- docs/development/design/index.rst 1
-rw-r--r-- docs/development/design/inspector-design-guideline.rst 48
5 files changed, 48 insertions, 1 deletions
diff --git a/docs/development/design/images/conservative-notification.png b/docs/development/design/images/conservative-notification.png
new file mode 100644
index 00000000..b2645720
--- /dev/null
+++ b/docs/development/design/images/conservative-notification.png
Binary files differ
diff --git a/docs/development/design/images/notification-time.png b/docs/development/design/images/notification-time.png
new file mode 100644
index 00000000..8e140172
--- /dev/null
+++ b/docs/development/design/images/notification-time.png
Binary files differ
diff --git a/docs/development/design/images/shortcut-notification.png b/docs/development/design/images/shortcut-notification.png
new file mode 100644
index 00000000..54a3ce28
--- /dev/null
+++ b/docs/development/design/images/shortcut-notification.png
Binary files differ
diff --git a/docs/development/design/index.rst b/docs/development/design/index.rst
index e50c1704..713bb9b4 100644
--- a/docs/development/design/index.rst
+++ b/docs/development/design/index.rst
@@ -27,3 +27,4 @@ See also https://wiki.opnfv.org/requirements_projects .
inspector-design-guideline.rst
performance-profiler.rst
maintenance-design-guideline.rst
+ inspector-design-guideline.rst
diff --git a/docs/development/design/inspector-design-guideline.rst b/docs/development/design/inspector-design-guideline.rst
index faa5e424..5396f883 100644
--- a/docs/development/design/inspector-design-guideline.rst
+++ b/docs/development/design/inspector-design-guideline.rst
@@ -53,7 +53,51 @@ This guideline can be summarized as following:
Parallel execution
------------------
-TBD, see `discussion in mailing list`_.
+In Doctor's architecture, the inspector is responsible for setting the error state on the affected VMs in order to
+notify consumers of the failure. This is done by calling the nova `reset-state`_ API. However, this action is a
+synchronous request with many underlying steps and typically costs hundreds of milliseconds. According to the
+`discussion in mailing list`_, this time cost grows linearly if the requests are sent one by one, which becomes
+a critical issue in large-scale systems.
+
+It is recommended to introduce **parallel execution** for actions like ``reset-state`` that apply to a list of targets.
+
+Shortcut notification
+---------------------
+
+An alternative way to improve notification performance is to take a shortcut from the inspector to the notifier instead
+of triggering it from the controller. The difference between the two workflows is shown below:
+
+.. figure:: images/conservative-notification.png
+ :alt: conservative notification
+
+ Conservative Notification
+
+.. figure:: images/shortcut-notification.png
+ :alt: shortcut notification
+
+ Shortcut Notification
+
+It is worth noting that the shortcut notification has a side effect: cloud resource states could still be out of sync
+by the time the consumer processes the alarm notification. This is out of scope for the inspector design but needs to
+be taken into consideration at the system level.
+
+Also, the call to "reset servers state to error" is not necessary in the alternative notification case, where "host
+forced down" is still called. "get-valid-server-state" was implemented to provide a valid server state, which
+previously could not be obtained without calling "reset servers state to error". Without "reset servers state to
+error", states are less likely to be out of sync while the notification and forcing down the host run in parallel.
+
+Appendix
+========
+
+A study evaluating the effect of parallel execution and shortcut notification was presented at the OPNFV Beijing Summit
+2017.
+
+.. figure:: images/notification-time.png
+ :alt: notification time
+
+ Notification Time
+
+Download the `full presentation slides`_ here.
.. _DOCTOR-73: https://jira.opnfv.org/browse/DOCTOR-73
.. _OPNFV Doctor project: https://wiki.opnfv.org/doctor
@@ -61,3 +105,5 @@ TBD, see `discussion in mailing list`_.
.. _patch set for caching the list: https://gerrit.opnfv.org/gerrit/#/c/20877/
.. _DOCTOR-76: https://jira.opnfv.org/browse/DOCTOR-76
.. _discussion in mailing list: https://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-October/013036.html
+.. _reset-state: https://developer.openstack.org/api-ref/compute/#reset-server-state-os-resetstate-action
+.. _full presentation slides: https://wiki.opnfv.org/download/attachments/5046291/doctor_qtip_faster_higher_stronger.pdf
\ No newline at end of file
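
The "Parallel execution" section added by this patch recommends issuing ``reset-state`` requests concurrently rather than one by one. The following is only a minimal sketch of that idea and is not part of the patch or of the Doctor code base. It assumes a hypothetical ``reset_state`` function that stands in for one synchronous request; the 50 ms latency, the server names, and the worker count are illustrative assumptions, not measurements.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def reset_state(server_id):
    """Hypothetical stand-in for one synchronous reset-state request.

    A real inspector would call the compute API here; the sleep only
    simulates an assumed per-request latency.
    """
    time.sleep(0.05)
    return server_id, "error"


def reset_states_sequential(server_ids):
    """Send the requests one by one; total time grows linearly."""
    return [reset_state(server_id) for server_id in server_ids]


def reset_states_parallel(server_ids, max_workers=10):
    """Send the requests concurrently from a thread pool.

    pool.map returns results in the same order as server_ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(reset_state, server_ids))


if __name__ == "__main__":
    servers = ["vm-%d" % i for i in range(10)]
    for func in (reset_states_sequential, reset_states_parallel):
        start = time.perf_counter()
        func(servers)
        elapsed = time.perf_counter() - start
        print("%s took %.2f s" % (func.__name__, elapsed))
```

Threads are a reasonable fit in this sketch because each request spends its time waiting on I/O; whether a thread pool, green threads, or another mechanism suits a particular inspector is a design choice outside this example.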