Diffstat (limited to 'docs')
-rw-r--r--             | docs/design/performance-profiler.rst  | 118
-rw-r--r--             | docs/requirements/02-use_cases.rst    | 2
-rw-r--r--             | docs/requirements/03-architecture.rst | 4
-rw-r--r--[-rwxr-xr-x] | docs/requirements/images/figure1.png  | bin 977880 -> 79420 bytes
-rw-r--r--[-rwxr-xr-x] | docs/requirements/images/figure2.png  | bin 1043699 -> 82010 bytes
5 files changed, 121 insertions, 3 deletions
diff --git a/docs/design/performance-profiler.rst b/docs/design/performance-profiler.rst
new file mode 100644
index 00000000..f834a915
--- /dev/null
+++ b/docs/design/performance-profiler.rst
@@ -0,0 +1,118 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+
+====================
+Performance Profiler
+====================
+
+https://goo.gl/98Osig
+
+This blueprint proposes to create a performance profiler for doctor scenarios.
+
+Problem Description
+===================
+
+In the verification job for notification time, we have encountered some
+performance issues, such as:
+
+1. In an environment deployed by APEX, it meets the criteria, while in one
+   deployed by Fuel, the performance is much poorer.
+2. Significant performance degradation was spotted when we increased the total
+   number of VMs.
+
+It takes time to dig through the logs and analyse the cause. People have to
+collect timestamps at each checkpoint manually to find the bottleneck. A
+performance profiler will make this process automatic.
+
+Proposed Change
+===============
+
+The current Doctor scenario covers the inspector and notifier in the whole fault
+management cycle::
+
+  start                                          end
+    +       +         +        +       +          +
+    |       |         |        |       |          |
+    |monitor|inspector|notifier|manager|controller|
+    +------>+         |        |       |          |
+  occurred  +-------->+        |       |          |
+    |   detected      +------->+       |          |
+    |       |    identified    +-------+          |
+    |       |              notified    +--------->+
+    |       |                  |   processed  resolved
+    |       |                  |                  |
+    |       +<-----doctor----->+                  |
+    |                                             |
+    |                                             |
+    +<---------------fault management------------>+
+
+The notification time can be split into several parts and visualized as a
+timeline::
+
+  start                                         end
+    0----5---10---15---20---25---30---35---40---45--> (x 10ms)
+    +    +   +   +   +    +      +   +   +   +   +
+  0-hostdown |   |   |    |      |   |   |   |   |
+    +--->+   |   |   |    |      |   |   |   |   |
+    |  1-raw failure |    |      |   |   |   |   |
+    |    +-->+   |   |    |      |   |   |   |   |
+    |    | 2-found affected      |   |   |   |   |
+    |    |   +-->+   |    |      |   |   |   |   |
+    |    |     3-marked host down|   |   |   |   |
+    |    |       +-->+    |      |   |   |   |   |
+    |    |         4-set VM error|   |   |   |   |
+    |    |           +--->+      |   |   |   |   |
+    |    |           |  5-notified VM error  |   |
+    |    |           |    +----->|   |   |   |   |
+    |    |           |    |    6-transformed event
+    |    |           |    |      +-->+   |   |   |
+    |    |           |    |      | 7-evaluated event
+    |    |           |    |      |   +-->+   |   |
+    |    |           |    |      |     8-fired alarm
+    |    |           |    |      |       +-->+   |
+    |    |           |    |      |         9-received alarm
+    |    |           |    |      |           +-->+
+  sample | sample    |    |      |           |10-handled alarm
+  monitor| inspector |nova| c/m  |    aodh   |
+    |                                        |
+    +<-----------------doctor--------------->+
+
+Note: c/m = ceilometer
+
+And a table of components, sorted by time cost from most to least:
+
++----------+---------+----------+
+|Component |Time Cost|Percentage|
++==========+=========+==========+
+|inspector |160ms    | 40%      |
++----------+---------+----------+
+|aodh      |110ms    | 30%      |
++----------+---------+----------+
+|monitor   |50ms     | 14%      |
++----------+---------+----------+
+|...       |         |          |
++----------+---------+----------+
+|...       |         |          |
++----------+---------+----------+
+
+Note: the data in the table is for demonstration only, not actual measurements.
+
+Timestamps can be collected from various sources:
+
+1. log files
+2. trace points in code
+
+The performance profiler will be integrated into the verification job to provide
+detailed results of the test. It can also be deployed independently to diagnose
+performance issues in a specified environment.
+
+Working Items
+=============
+
+1. PoC with limited checkpoints
+2. Integration with verification job
+3. Collect timestamps at all checkpoints
+4. Display the profiling result in the console
+5. Report the profiling result to the test database
+6. Independent package which can be installed in a specified environment
diff --git a/docs/requirements/02-use_cases.rst b/docs/requirements/02-use_cases.rst
index 424a3c6e..0a1f6413 100644
--- a/docs/requirements/02-use_cases.rst
+++ b/docs/requirements/02-use_cases.rst
@@ -136,7 +136,7 @@ the same as in the "Fault management using ACT-STBY configuration" use case,
 except in this case, the Consumer of a VM/VNF switches to STBY configuration
 based on a predicted fault, rather than an occurred fault.
 
-NVFI Maintenance
+NFVI Maintenance
 ----------------
 
 VM Retirement
diff --git a/docs/requirements/03-architecture.rst b/docs/requirements/03-architecture.rst
index 8ff5dacf..9f620e68 100644
--- a/docs/requirements/03-architecture.rst
+++ b/docs/requirements/03-architecture.rst
@@ -217,7 +217,7 @@ restart of the VM, migration/evacuation of the VM, or no action.
 High level northbound interface specification
 ---------------------------------------------
 
-Fault management
+Fault Management
 ^^^^^^^^^^^^^^^^
 
 This interface allows the Consumer to subscribe to fault notification from the
@@ -321,7 +321,7 @@ An example of a high level message flow to cover the failed NFVI maintenance cas
 shown in :numref:`figure5c`. It consists of the following steps:
 
-5. The Consumer C3 switches to standby configuration (STDBY).
+5. The Consumer C3 switches to standby configuration (STBY).
 6. Instructions from Consumers C2/C3 are shared to VIM requesting certain actions to be performed (steps 6a, 6b).
    The VIM executes the requested actions and sends back a NACK to consumer C2 (step 6d) as the migration of the virtual resource(s) is not completed by the given timeout.
diff --git a/docs/requirements/images/figure1.png b/docs/requirements/images/figure1.png
Binary files differ
index dacf0dd4..267ddddc 100755..100644
--- a/docs/requirements/images/figure1.png
+++ b/docs/requirements/images/figure1.png
diff --git a/docs/requirements/images/figure2.png b/docs/requirements/images/figure2.png
Binary files differ
index 3c8a2bf1..9a3b166d 100755..100644
--- a/docs/requirements/images/figure2.png
+++ b/docs/requirements/images/figure2.png
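The per-component breakdown proposed in performance-profiler.rst can be illustrated with a short Python sketch. It assumes a timestamp has already been collected for every checkpoint (from log files or trace points) and that each component is delimited by exactly one start and one end checkpoint. The checkpoint names, the component boundaries, the "YYYY-MM-DD HH:MM:SS.mmm" log prefix and all numbers are hypothetical and are not taken from the Doctor code. The output is a grid table in the same layout as the demonstration table.

```python
"""Sketch: turn per-checkpoint timestamps into a sorted per-component table.

All checkpoint names, component boundaries and the log timestamp format are
hypothetical illustrations, not the actual Doctor profiler.
"""
from datetime import datetime


def parse_log_timestamp(line):
    """Parse an assumed 'YYYY-MM-DD HH:MM:SS.mmm' prefix of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def profile(checkpoints, components):
    """Return (component, cost_ms, percentage) tuples, most costly first.

    checkpoints: checkpoint name -> timestamp in ms
    components:  component name -> (start checkpoint, end checkpoint)
    Percentages are relative to the summed cost of all listed components.
    """
    costs = {name: checkpoints[end] - checkpoints[start]
             for name, (start, end) in components.items()}
    total = sum(costs.values())
    rows = [(name, cost, 100.0 * cost / total) for name, cost in costs.items()]
    # sorted() is stable, so components with equal cost keep their input order
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_rst_table(rows):
    """Render rows as a reStructuredText grid table like the design example."""
    header = ("Component", "Time Cost", "Percentage")
    body = [(name, f"{cost:g}ms", f"{pct:.1f}%") for name, cost, pct in rows]
    widths = [max(len(r[i]) for r in [header] + body) for i in range(3)]

    def border(ch):
        return "+" + "+".join(ch * w for w in widths) + "+"

    def line(cells):
        return "|" + "|".join(c.ljust(w) for c, w in zip(cells, widths)) + "|"

    out = [border("-"), line(header), border("=")]
    for cells in body:
        out += [line(cells), border("-")]
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up timestamps in ms after the host went down, for illustration only.
    checkpoints = {"hostdown": 0, "raw_failure": 50, "vm_error_set": 210,
                   "vm_error_notified": 250, "event_transformed": 290,
                   "alarm_received": 400}
    components = {"monitor": ("hostdown", "raw_failure"),
                  "inspector": ("raw_failure", "vm_error_set"),
                  "nova": ("vm_error_set", "vm_error_notified"),
                  "ceilometer": ("vm_error_notified", "event_transformed"),
                  "aodh": ("event_transformed", "alarm_received")}
    print(to_rst_table(profile(checkpoints, components)))
```

With these made-up numbers the inspector comes first (160ms, 40.0%), followed by aodh (110ms, 27.5%). Keeping the component boundaries as data rather than code means that adding checkpoints, as in the "collect timestamps at all checkpoints" work item, only changes the two dictionaries.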