diff --git a/docs/release/userguide/UC02-feature.userguide.rst b/docs/release/userguide/UC02-feature.userguide.rst
new file mode 100644
index 0000000..9746914
--- /dev/null
+++ b/docs/release/userguide/UC02-feature.userguide.rst
@@ -0,0 +1,176 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. SPDX-License-Identifier: CC-BY-4.0
+.. (c) Open Platform for NFV Project, Inc. and its contributors
+
+
+================================================================
+Auto User Guide: Use Case 2 Resiliency Improvements Through ONAP
+================================================================
+
+This document provides the user guide for the Fraser release of Auto,
+specifically for Use Case 2: Resiliency Improvements Through ONAP.
+
+
+Description
+===========
+
+This use case illustrates how ONAP's automated monitoring and management can reduce VNF failure
+recovery time. It:
+
+* simulates an underlying problem (failure, stress, or any adverse condition in the network that can impact VNFs)
+* tracks a VNF
+* measures the amount of time it takes for ONAP to restore the VNF functionality.
+
+The benefit for NFV edge service providers is the ability to assess how much platform resilience for VNFs
+is added by leveraging ONAP closed-loop control, compared to VIM+NFVI self-managed resilience (which may not be aware
+of the VNF or the corresponding end-to-end service, but only of underlying resources such as VMs and servers).
+
+Also, a problem or challenge is not necessarily a failure (which could also be recovered by other layers):
+it could be an issue leading to suboptimal performance without any failure. A VNF management layer such as
+ONAP may detect such non-failure problems and provide a recovery solution that no other layer could provide
+in a given deployment.
+
+
+Preconditions:
+
+#. hardware environment in which Edge cloud may be deployed
+#. Edge cloud has been deployed and is ready for operation
+#. ONAP has been deployed onto a cloud and is interfaced (i.e. provisioned for API access) to the Edge cloud
+#. Components of ONAP have been deployed on the Edge cloud as necessary for specific test objectives
+
+In future releases, Auto use cases will also include the deployment of ONAP (if not already installed),
+the deployment of test VNFs (pre-existing VNFs in a pre-existing ONAP can be used in the test as well),
+and the configuration of ONAP for monitoring these VNFs (policies, CLAMP, DCAE), in addition to the test
+scripts which simulate a problem and measure recovery time.
+
+Different types of problems can be simulated, hence the identification of multiple test cases corresponding
+to this use case, as illustrated in this diagram:
+
+.. image:: auto-UC02-testcases.jpg
+
+Description of simulated problems/challenges, leading to various test cases:
+
+* Physical Infra Failure
+
+ * Migration upon host failure: Compute host power is interrupted, and affected workloads are migrated to other available hosts.
+ * Migration upon disk failure: Disk volumes are unmounted, and affected workloads are migrated to other available hosts.
+ * Migration upon link failure: Traffic on links is interrupted/corrupted, and affected workloads are migrated to other available hosts.
+ * Migration upon NIC failure: NIC ports are disabled by host commands, and affected workloads are migrated to other available hosts.
+
+* Virtual Infra Failure
+
+ * OpenStack compute host service fail: Core OpenStack service processes on compute hosts are terminated, and auto-restored, or affected workloads are migrated to other available hosts.
+ * SDNC service fail: Core SDNC service processes are terminated, and auto-restored.
+ * OVS fail: OVS bridges are disabled, and affected workloads are migrated to other available hosts.
+ * etc.
+
+* Security
+
+ * Host tampering: Host tampering is detected, the host is fenced, and affected workloads are migrated to other available hosts.
+ * Host intrusion: Host intrusion attempts are detected, an offending workload, device, or flow is identified and fenced, and as needed affected workloads are migrated to other available hosts.
+ * Network intrusion: Network intrusion attempts are detected, and an offending flow is identified and fenced.
+
+
+
+Test execution high-level description
+=====================================
+
+The following two MSCs (Message Sequence Charts) show the actors and high-level interactions.
+
+The first MSC shows the preparation activities (assuming the hardware, network, cloud, and ONAP have already
+been installed): onboarding and deployment of VNFs (via ONAP portal and modules in sequence: SDC, VID, SO),
+and ONAP configuration (policy framework, closed-loops in CLAMP, activation of DCAE).
+
+.. image:: auto-UC02-preparation.jpg
+
+
+The second MSC illustrates the pattern of all test cases for the Resiliency Improvements:
+
+* simulate the chosen problem (a.k.a. "Challenge") for this test case, for example suspending a VM used by the VNF
+* start tracking the target VNF of this test case
+* measure the ONAP-orchestrated VNF Recovery Time
+* stop simulating the problem (for example, resume the suspended VM)
+
+In parallel, the MSC also shows the sequence of events happening in ONAP, thanks to its configuration to provide Service Assurance for the VNF.
+
+.. image:: auto-UC02-pattern.jpg
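+
+As a minimal sketch of this measurement pattern, the core loop could look as follows (the
+challenge and health-check callables are hypothetical placeholders supplied by each test case,
+not the actual Auto code):
+
+.. code-block:: python
+
+    import time
+
+    def measure_recovery_time(apply_challenge, remove_challenge, vnf_is_functional,
+                              timeout=600.0, poll_interval=1.0):
+        """Apply a challenge, poll the VNF, and return the Recovery Time in
+        seconds, or None if the VNF was not restored within the timeout."""
+        apply_challenge()                # e.g., suspend a VM used by the VNF
+        start = time.time()
+        recovery_time = None
+        while time.time() - start < timeout:
+            if vnf_is_functional():      # health check supplied by the test case
+                recovery_time = time.time() - start
+                break
+            time.sleep(poll_interval)
+        remove_challenge()               # e.g., resume the suspended VM
+        return recovery_time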
+
+
+Test design: data model, implementation modules
+===============================================
+
+The high-level design of classes identifies several entities, described as follows:
+
+* ``Test Case`` : as identified above, each is a special case of the overall use case (e.g., categorized by challenge type)
+* ``Test Definition`` : gathers all the information necessary to run a certain test case
+* ``Metric Definition`` : describes a certain metric that may be measured for a Test Case, in addition to Recovery Time
+* ``Challenge Definition`` : describes the challenge (problem, failure, stress, ...) simulated by the test case
+* ``Recipient`` : entity that can receive commands and send responses, and that is queried by the Test Definition
+  or Challenge Definition (a recipient would typically be a management service, with interfaces (CLI or API) for
+  clients to query)
+* ``Resources`` : with three types (VNF, cloud virtual resource such as a VM, physical resource such as a server)
+
+
+Three of these entities have corresponding execution-time classes:
+
+* ``Test Execution`` , which captures all the relevant data of the execution of a Test Definition
+* ``Challenge Execution`` , which captures all the relevant data of the execution of a Challenge Definition
+* ``Metric Value`` , which captures the quantitative measurement of a Metric Definition (with a timestamp)
+
+.. image:: auto-UC02-data1.jpg
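+
+As an illustration only (attribute names here are hypothetical, not the actual Auto
+implementation), these entities could be sketched as Python classes:
+
+.. code-block:: python
+
+    from dataclasses import dataclass
+    from datetime import datetime
+
+    @dataclass
+    class ChallengeDefinition:
+        ID: int
+        name: str
+        challenge_type: str    # e.g., "Physical Infra Failure"
+        recipient_ID: int      # management service that executes the challenge
+
+    @dataclass
+    class TestDefinition:
+        ID: int
+        name: str
+        challenge_def_ID: int  # challenge simulated by this test case
+        VNF_ID: int            # target VNF tracked by the test
+
+    @dataclass
+    class TestExecution:
+        test_def_ID: int
+        start_time: datetime
+        finish_time: datetime = None
+        recovery_time: float = None  # Metric Value in seconds; None if not recovered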
+
+
+The following diagram illustrates an implementation-independent design of the attributes of these entities:
+
+.. image:: auto-UC02-data2.jpg
+
+
+This next diagram shows the Python classes and attributes, as implemented by this Use Case (for all test cases):
+
+.. image:: auto-UC02-data3.jpg
+
+
+Test definition data is stored in serialization files (Python pickles), while test execution data is stored in CSV files, for easier post-analysis.
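+
+As an illustration of this storage split (file names and attributes are hypothetical), test
+definitions can be pickled as a whole, while each execution appends one CSV row:
+
+.. code-block:: python
+
+    import csv
+    import pickle
+
+    def save_test_definitions(test_definitions, path="test_definitions.bin"):
+        """Serialize the Test Definition objects with pickle."""
+        with open(path, "wb") as f:
+            pickle.dump(test_definitions, f)
+
+    def load_test_definitions(path="test_definitions.bin"):
+        """Restore the Test Definition objects."""
+        with open(path, "rb") as f:
+            return pickle.load(f)
+
+    def append_execution_result(execution, path="test_executions.csv"):
+        """Append one Test Execution row to a CSV file for post-analysis."""
+        with open(path, "a", newline="") as f:
+            csv.writer(f).writerow([execution.test_def_ID, execution.start_time,
+                                    execution.finish_time, execution.recovery_time])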
+
+The module design is straightforward: functions and classes for managing data, for interfacing with recipients,
+for executing tests, and for interacting with the test user (choosing a Test Definition, showing the details of
+a Test Definition, starting the execution).
+
+.. image:: auto-UC02-module1.jpg
+
+
+The next diagram shows the test user menu functions, when used interactively:
+
+.. image:: auto-UC02-module2.jpg
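+
+A minimal sketch of such an interactive menu loop (menu labels and prompts are illustrative):
+
+.. code-block:: python
+
+    def main_menu(test_definitions):
+        """Simple interactive loop: list Test Definitions, show one, run one."""
+        while True:
+            print("\n1: list  2: show details  3: run test  q: quit")
+            choice = input("> ").strip()
+            if choice == "1":
+                for td in test_definitions:
+                    print(f"{td.ID}: {td.name}")
+            elif choice in ("2", "3"):
+                td_id = int(input("Test Definition ID: "))
+                td = next((t for t in test_definitions if t.ID == td_id), None)
+                if td is None:
+                    print("unknown ID")
+                elif choice == "2":
+                    print(vars(td))     # show all attributes
+                else:
+                    td.run_test_code()  # generic execution method
+            elif choice == "q":
+                break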
+
+
+In future releases of Auto, testing frameworks such as Robot, FuncTest and Yardstick might be leveraged. Use Case code will then be invoked via an API rather than through CLI interaction.
+
+Also, anonymized test results could be collected from users willing to share them, and aggregates could be
+maintained as benchmarks.
+
+As further illustration, the next figure shows the cardinalities of class instances: one Test Definition per Test Case,
+multiple Test Executions per Test Definition, zero or one Recovery Time Metric Value per Test Execution (zero if
+the test failed for any reason, including if ONAP failed to recover from the challenge), etc.
+
+.. image:: auto-UC02-cardinalities.png
+
+
+In this particular implementation, both the Test Definition and Challenge Definition classes have a generic execution method
+(e.g., ``run_test_code()`` for Test Definition) which can invoke a particular script, selected by a configurable ID
+(the ID serves as a script selector for each Test Definition instance). The overall test execution logic
+between classes is shown in the next figure.
+
+.. image:: auto-UC02-logic.png
+
+The execution of a test case starts by invoking the generic method of the Test Definition, which then creates Execution
+instances, invokes the Challenge Definition methods, performs the Recovery Time calculation, performs script-specific
+actions, and writes results to the CSV files.
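+
+A sketch of this script-selector pattern follows (the dispatch table and script bodies are
+illustrative, not the actual Auto code):
+
+.. code-block:: python
+
+    class TestDefinition:
+        def __init__(self, ID, name, test_code_ID):
+            self.ID = ID
+            self.name = name
+            self.test_code_ID = test_code_ID  # configurable script selector
+
+        def run_test_code(self):
+            """Generic entry point: dispatch to the script chosen by test_code_ID."""
+            scripts = {
+                1: self.test_code_1,  # e.g., VM suspend/resume test case
+                2: self.test_code_2,  # e.g., compute host service failure test case
+            }
+            return scripts[self.test_code_ID]()
+
+        def test_code_1(self):
+            # create Execution instances, invoke Challenge Definition methods,
+            # compute the Recovery Time, and write results to the CSV files
+            pass
+
+        def test_code_2(self):
+            pass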
+
+Finally, the following diagram shows a mapping between these class instances and the initial test case design. It
+corresponds to the test case which simulates a VM failure, and shows how the OpenStack SDK API is invoked (with
+a connection object) by the Challenge Definition methods to suspend and resume a VM.
+
+.. image:: auto-UC02-TC-mapping.png
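+
+For reference, suspending and resuming a VM through the OpenStack SDK follows this pattern
+(cloud and server names are placeholders):
+
+.. code-block:: python
+
+    import openstack
+
+    # connection object, built from clouds.yaml or environment variables
+    conn = openstack.connect(cloud="my-cloud")
+
+    server = conn.compute.find_server("my-vnf-vm")
+
+    # challenge start: suspend the VM used by the VNF
+    conn.compute.suspend_server(server)
+
+    # ... Recovery Time measurement happens here ...
+
+    # challenge stop: resume the suspended VM
+    conn.compute.resume_server(server)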
+