Diffstat (limited to 'docs')
-rw-r--r-- docs/design/performance-profiler.rst 118
-rw-r--r-- docs/requirements/02-use_cases.rst 2
-rw-r--r-- docs/requirements/03-architecture.rst 4
-rw-r--r--[-rwxr-xr-x] docs/requirements/images/figure1.png bin 977880 -> 79420 bytes
-rw-r--r--[-rwxr-xr-x] docs/requirements/images/figure2.png bin 1043699 -> 82010 bytes
5 files changed, 121 insertions, 3 deletions
diff --git a/docs/design/performance-profiler.rst b/docs/design/performance-profiler.rst
new file mode 100644
index 00000000..f834a915
--- /dev/null
+++ b/docs/design/performance-profiler.rst
@@ -0,0 +1,118 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+
+====================
+Performance Profiler
+====================
+
+https://goo.gl/98Osig
+
+This blueprint proposes creating a performance profiler for Doctor scenarios.
+
+Problem Description
+===================
+
+In the verification job for notification time, we have encountered some
+performance issues, such as:
+
+1. In the environment deployed by APEX the criteria are met, while in the one
+   deployed by Fuel the performance is much poorer.
+2. Significant performance degradation was observed when the total number of
+   VMs was increased.
+
+Digging through the logs to analyse the cause takes time: people have to
+collect the timestamp at each checkpoint manually to find the bottleneck. A
+performance profiler will automate this process.
+
+Proposed Change
+===============
+
+The current Doctor scenario covers the inspector and notifier in the whole fault
+management cycle::
+
+        start                                          end
+          +       +         +        +       +          +
+          |       |         |        |       |          |
+          |monitor|inspector|notifier|manager|controller|
+          +------>+         |        |       |          |
+       occurred   +-------->+        |       |          |
+          |   detected      +------->+       |          |
+          |       |    identified    +-------+          |
+          |       |               notified   +--------->+
+          |       |                  |   processed  resolved
+          |       |                  |                  |
+          |       +<-----doctor----->+                  |
+          |                                             |
+          |                                             |
+          +<---------------fault management------------>+
+
+The notification time can be split into several parts and visualized as a
+timeline::
+
+  start                                     end
+    0----5---10---15---20---25---30---35---40---45--> (x 10ms)
+    +    +   +   +   +    +      +   +   +   +   +
+  0-hostdown |   |   |    |      |   |   |   |   |
+    +--->+   |   |   |    |      |   |   |   |   |
+    |  1-raw failure |    |      |   |   |   |   |
+    |    +-->+   |   |    |      |   |   |   |   |
+    |    | 2-found affected      |   |   |   |   |
+    |    |   +-->+   |    |      |   |   |   |   |
+    |    |     3-marked host down|   |   |   |   |
+    |    |       +-->+    |      |   |   |   |   |
+    |    |         4-set VM error|   |   |   |   |
+    |    |           +--->+      |   |   |   |   |
+    |    |           |  5-notified VM error  |   |
+    |    |           |    +----->|   |   |   |   |
+    |    |           |    |    6-transformed event
+    |    |           |    |      +-->+   |   |   |
+    |    |           |    |      | 7-evaluated event
+    |    |           |    |      |   +-->+   |   |
+    |    |           |    |      |     8-fired alarm
+    |    |           |    |      |       +-->+   |
+    |    |           |    |      |         9-received alarm
+    |    |           |    |      |           +-->+
+  sample |  sample   |    |      |           |10-handled alarm
+  monitor| inspector |nova| c/m  |    aodh   |
+    |                                        |
+    +<-----------------doctor--------------->+
+
+Note: c/m = ceilometer
+
+It can also be summarized in a table of components sorted by time cost, from most to least:
+
++----------+---------+----------+
+|Component |Time Cost|Percentage|
++==========+=========+==========+
+|inspector |160ms    |40%       |
++----------+---------+----------+
+|aodh      |110ms    |30%       |
++----------+---------+----------+
+|monitor   |50ms     |14%       |
++----------+---------+----------+
+|...       |         |          |
++----------+---------+----------+
+|...       |         |          |
++----------+---------+----------+
+
+Note: the data in the table is for demonstration only, not actual measurements.
+
+Timestamps can be collected from various sources:
+
+1. log files
+2. trace points in code
+
+The performance profiler will be integrated into the verification job to provide
+detailed test results. It can also be deployed independently to diagnose
+performance issues in a specific environment.
+
+Working Items
+=============
+
+1. PoC with limited checkpoints
+2. Integration with verification job
+3. Collect timestamps at all checkpoints
+4. Display the profiling result in the console
+5. Report the profiling result to the test database
+6. Independent package that can be installed in a specific environment
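A minimal Python sketch of how the profiler might turn checkpoint timestamps taken from log files into the per-component table described in ``performance-profiler.rst`` above. It is not part of the blueprint and rests on stated assumptions: each checkpoint is represented by one log line that starts with a ``YYYY-MM-DD HH:MM:SS.mmm`` timestamp, the interval ending at a checkpoint is charged to the component that produced it (following the component spans in the timeline), and ``parse_timestamp``, ``profile`` and the ``SAMPLE`` log lines are hypothetical illustrations, not measured data.

```python
from datetime import datetime, timedelta

# Assumed timestamp prefix of every log line, e.g. "2017-02-08 10:00:00.050 ..."
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = len("2017-02-08 10:00:00.050")  # 23 characters


def parse_timestamp(line):
    """Return the datetime encoded at the start of a (hypothetical) log line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def profile(checkpoints):
    """Attribute the time between consecutive checkpoints to components.

    ``checkpoints`` is an ordered list of (component, log_line) pairs. The
    first entry only marks the start of the timeline; every later entry
    charges the interval since the previous checkpoint to its component.
    Returns (component, cost_ms, percentage) rows, most expensive first.
    """
    times = [(component, parse_timestamp(line)) for component, line in checkpoints]
    costs = {}
    for (_, previous), (component, current) in zip(times, times[1:]):
        cost_ms = (current - previous) / timedelta(milliseconds=1)
        costs[component] = costs.get(component, 0.0) + cost_ms
    # Avoid dividing by zero if every interval happens to be 0 ms
    total = sum(costs.values()) or 1.0
    rows = [(name, ms, 100.0 * ms / total) for name, ms in costs.items()]
    # sorted() is stable, so components with equal cost keep timeline order
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical log lines for a subset of the checkpoints in the timeline
SAMPLE = [
    ("start", "2017-02-08 10:00:00.000 0-hostdown"),
    ("monitor", "2017-02-08 10:00:00.050 1-raw failure"),
    ("inspector", "2017-02-08 10:00:00.210 4-set VM error"),
    ("nova", "2017-02-08 10:00:00.250 5-notified VM error"),
    ("ceilometer", "2017-02-08 10:00:00.290 6-transformed event"),
    ("aodh", "2017-02-08 10:00:00.400 9-received alarm"),
]

if __name__ == "__main__":
    for name, ms, pct in profile(SAMPLE):
        print(f"{name:<10} {ms:>6.0f}ms {pct:>5.1f}%")
```

Run as a script, it prints one row per component, most expensive first; with the hypothetical sample the inspector comes first with 160 ms, i.e. 40% of the 400 ms total. Charging each interval to a single component keeps the sketch simple: checkpoints inside one component (such as 2-found affected and 3-marked host down) could be added to the input without changing the per-component totals.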
diff --git a/docs/requirements/02-use_cases.rst b/docs/requirements/02-use_cases.rst
index 424a3c6e..0a1f6413 100644
--- a/docs/requirements/02-use_cases.rst
+++ b/docs/requirements/02-use_cases.rst
@@ -136,7 +136,7 @@ the same as in the "Fault management using ACT-STBY configuration" use case,
except in this case, the Consumer of a VM/VNF switches to STBY configuration
based on a predicted fault, rather than an occurred fault.
-NVFI Maintenance
+NFVI Maintenance
----------------
VM Retirement
diff --git a/docs/requirements/03-architecture.rst b/docs/requirements/03-architecture.rst
index 8ff5dacf..9f620e68 100644
--- a/docs/requirements/03-architecture.rst
+++ b/docs/requirements/03-architecture.rst
@@ -217,7 +217,7 @@ restart of the VM, migration/evacuation of the VM, or no action.
High level northbound interface specification
---------------------------------------------
-Fault management
+Fault Management
^^^^^^^^^^^^^^^^
This interface allows the Consumer to subscribe to fault notification from the
@@ -321,7 +321,7 @@ An example of a high level message flow to cover the failed NFVI maintenance cas
shown in :numref:`figure5c`.
It consists of the following steps:
-5. The Consumer C3 switches to standby configuration (STDBY).
+5. The Consumer C3 switches to standby configuration (STBY).
6. Instructions from Consumers C2/C3 are shared to VIM requesting certain actions to be performed (steps 6a, 6b).
The VIM executes the requested actions and sends back a NACK to consumer C2 (step 6d) as the
migration of the virtual resource(s) is not completed by the given timeout.
diff --git a/docs/requirements/images/figure1.png b/docs/requirements/images/figure1.png
index dacf0dd4..267ddddc 100755..100644
--- a/docs/requirements/images/figure1.png
+++ b/docs/requirements/images/figure1.png
Binary files differ
diff --git a/docs/requirements/images/figure2.png b/docs/requirements/images/figure2.png
index 3c8a2bf1..9a3b166d 100755..100644
--- a/docs/requirements/images/figure2.png
+++ b/docs/requirements/images/figure2.png
Binary files differ