-rw-r--r-- Requirement/virtual_facilities_HA_new.rst | 39
-rw-r--r-- Section6_VNF_HA.rst | 329
-rw-r--r-- Section_2_Hardware_HA.rst | 186
-rw-r--r-- Section_4_Virtual_Infra.rst | 181
-rw-r--r-- UseCases/UseCases.rst | 731
-rw-r--r-- UseCases/images/Slide10.png | bin 0 -> 119323 bytes
-rw-r--r-- UseCases/images/Slide11.png | bin 0 -> 143237 bytes
-rw-r--r-- UseCases/images/Slide13.png | bin 0 -> 118070 bytes
-rw-r--r-- UseCases/images/Slide14.png | bin 0 -> 142022 bytes
-rw-r--r-- UseCases/images/Slide16.png | bin 0 -> 113076 bytes
-rw-r--r-- UseCases/images/Slide17.png | bin 0 -> 137640 bytes
-rw-r--r-- UseCases/images/Slide19.png | bin 0 -> 115266 bytes
-rw-r--r-- UseCases/images/Slide20.png | bin 0 -> 114247 bytes
-rw-r--r-- UseCases/images/Slide4.png | bin 0 -> 118822 bytes
-rw-r--r-- UseCases/images/Slide6.png | bin 0 -> 141097 bytes
-rw-r--r-- UseCases/images/StatefullVNF-VMfailure.png | bin 0 -> 51353 bytes
-rw-r--r-- UseCases/images/StatefullVNF-VMfailureNoRed.png | bin 0 -> 39956 bytes
-rw-r--r-- UseCases/images/StatefullVNF-VNFCfailure.png | bin 0 -> 28838 bytes
-rw-r--r-- UseCases/images/StatefullVNF-VNFCfailureNoRed.png | bin 0 -> 30368 bytes
-rw-r--r-- UseCases/images/StatelessVNF-VMfailure.png | bin 0 -> 48843 bytes
-rw-r--r-- UseCases/images/StatelessVNF-VMfailureNoRed.png | bin 0 -> 35560 bytes
-rw-r--r-- UseCases/images/StatelessVNF-VNFCfailure.png | bin 0 -> 26125 bytes
-rw-r--r-- UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png | bin 0 -> 41571 bytes
-rw-r--r-- UseCases/images/StatelessVNF-VNFCfailureNoRed.png | bin 0 -> 18897 bytes
-rw-r--r-- docs/etc/conf.py | 34
-rw-r--r-- docs/etc/opnfv-logo.png | bin 0 -> 2829 bytes
-rw-r--r-- docs/how-to-use-docs/documentation-example.rst | 86
-rw-r--r-- docs/how-to-use-docs/index.rst | 30
28 files changed, 1616 insertions, 0 deletions
diff --git a/Requirement/virtual_facilities_HA_new.rst b/Requirement/virtual_facilities_HA_new.rst
new file mode 100644
index 0000000..e313230
--- /dev/null
+++ b/Requirement/virtual_facilities_HA_new.rst
@@ -0,0 +1,39 @@
+3 Virtualization Facilities (Host OS, Hypervisor)
+====================================================
+
+3.1 Requirements on Host OS and Hypervisor and Storage
+Requirements:
+- The hypervisor should support a distributed HA mechanism.
+- The hypervisor should detect the failure of a VM. The failure of a VM should be
+  reported to the VIM within 1s.
+- The hypervisor should report (and if possible log) its failures and recovery actions,
+  and the destination to which they are reported should be configurable.
+- The hypervisor should support VM migration
+- The hypervisor should provide isolation for VMs, so that VMs running on the same
+ hardware do not impact each other.
+- The host OS should provide sufficient process isolation so that VMs running on
+ the same hardware do not impact each other.
+- The hypervisor should record the VM information regularly and provide logs of
+ VM actions for future diagnoses.
+- The NFVI should maintain the number of VMs provided to the VNF in the face of failures,
+  i.e. failed VM instances should be replaced by new VM instances.
+3.2 Requirements on Middleware
+Requirements:
+- It should be possible to detect and automatically recover from hypervisor failures
+ without the involvement of the VIM
+- Failure of the hypervisor should be reported to the VIM within 1s
+- Notifications about the state of the (distributed) storage backends shall be sent to the
+  VIM (in-sync/healthy, re-balancing/re-building, degraded).
+- VIM processes running on the compute node should be monitored, and their failure
+  should be reported to the VIM within 1s.
+- Fault detection and reporting capability: there should be middleware supporting in-band
+  reporting of HW failures to the VIM.
+- Storage data path traffic shall be redundant and fail over within 1 second on link
+ failures.
+- Large deployments using distributed software-based storage shall separate storage and
+ compute nodes (non-hyperconverged deployment).
+- Distributed software-based storage services shall be deployed redundantly.
+- Data shall be stored redundantly in distributed storage backends.
+- Upon failures of storage services, repair mechanisms (re-build/re-balance of
+  data) shall be triggered automatically.
+- The storage backend shall support geo-redundancy.
\ No newline at end of file
diff --git a/Section6_VNF_HA.rst b/Section6_VNF_HA.rst
new file mode 100644
index 0000000..afc84ac
--- /dev/null
+++ b/Section6_VNF_HA.rst
@@ -0,0 +1,329 @@
+=======================
+6 VNF High Availability
+=======================
+
+
+************************
+6.1 Service Availability
+************************
+
+In the context of NFV, Service Availability refers to the End-to-End (E2E) Service
+Availability which includes all the elements in the end-to-end service (VNFs and
+infrastructure components) with the exception of customer terminals such as
+handsets, computers, modems, etc. The service availability requirements for NFV
+should be the same as those for legacy systems (for the same service).
+
+Service Availability =
+total service available time /
+(total service available time + total service recovery time)
+
+The service recovery time depends, among other things, on the number of redundant resources
+provisioned and/or instantiated that can be used for restoring the service.
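+
+As a worked illustration (the figures are assumed for the example and do not come
+from [ETSI-NFV-REL_]), consider a service observed over one year of 525,600 minutes
+with a total service recovery time of 5.256 minutes:
+
+.. math::
+
+   \text{Service Availability}
+   = \frac{525\,600 - 5.256}{(525\,600 - 5.256) + 5.256}
+   = \frac{525\,594.744}{525\,600}
+   = 0.99999
+
+That is, roughly five minutes of accumulated recovery time per year corresponds to
+"five nines" (99.999%) availability.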
+
+In the E2E relation a Network Service is available only if all the necessary
+Network Functions are available and interconnected appropriately to collaborate
+according to the NF chain.
+
+General Service Availability Requirements
+=========================================
+
+* We need to be able to define the E2E (V)NF chain based on which the E2E availability
+ requirements can be decomposed into requirements applicable to individual VNFs and
+ their interconnections
+* The interconnection of the VNFs should be logical and be maintained by the NFVI with
+ guaranteed characteristics, e.g. in case of failure the connection should be
+ restored within the acceptable tolerance time
+* These characteristics should be maintained during VM migration, failover, switchover,
+  scale in/out and similar scenarios
+* It should be possible to prioritize the different network services and their VNFs.
+ These priorities should be used when pre-emption policies are applied due to
+ resource shortage for example.
+* The VIM should support policies to prioritize a certain VNF.
+* The VIM should be able to provide classified virtual resources to VNFs of different SALs
+
+6.1.1 Service Availability Classification Levels
+================================================
+
+[ETSI-NFV-REL_] defines three Service Availability Levels (SALs), which are
+classified in Table 1. They are based on the relevant ITU-T recommendations
+and reflect the service types and the customer agreements a network operator should
+consider.
+
+.. [ETSI-NFV-REL] `ETSI GS NFV-REL 001 V1.1.1 (2015-01) <http://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_NFV-REL001v010101p.pdf>`_
+
+
+*Table 1: Service Availability classification levels*
+
++-------------+-----------------+-----------------------+---------------------+
+|SAL Type     | Customer Type   | Service/Function      | Notes               |
++=============+=================+=======================+=====================+
+|Level 1      | Network Operator|  * Intra-carrier      | Sub-levels within   |
+|             | Control Traffic |    engineering        | Level 1 may be      |
+|             |                 |    traffic            | created by the      |
+|             | Government/     |  * Emergency          | Network Operator    |
+|             | Regulatory      |    telecommunication  | depending on        |
+|             | Emergency       |    service (emergency | Customer demands.   |
+|             | Services        |    response, emergency| E.g.:               |
+|             |                 |    dispatch)          |                     |
+|             |                 |  * Critical Network   | * 1A - Control;     |
+|             |                 |    Infrastructure     | * 1B - Real-time;   |
+|             |                 |    Functions (e.g.    | * 1C - Data;        |
+|             |                 |    VoLTE functions,   |                     |
+|             |                 |    DNS servers, etc.) | May require 1+1     |
+|             |                 |                       | Redundancy with     |
+|             |                 |                       | Instantaneous       |
+|             |                 |                       | Switchover          |
++-------------+-----------------+-----------------------+---------------------+
+|Level 2      | Enterprise and/ |  * VPN                | Sub-levels within   |
+|             | or large scale  |  * Real-time traffic  | Level 2 may be      |
+|             | customers       |    (Voice and video)  | created by the      |
+|             | (e.g.           |  * Network            | Network Operator    |
+|             | Corporations,   |    Infrastructure     | depending on        |
+|             | University)     |    Functions          | Customer demands.   |
+|             |                 |    supporting Level   | E.g.:               |
+|             | Network         |    2 services (e.g.   |                     |
+|             | Operators       |    VPN servers,       | * 2A - VPN;         |
+|             | (Tier1/2/3)     |    Corporate Web/     | * 2B - Real-time;   |
+|             | service traffic |    Mail servers)      | * 2C - Data;        |
+|             |                 |                       |                     |
+|             |                 |                       | May require 1:1     |
+|             |                 |                       | Redundancy with     |
+|             |                 |                       | Fast (maybe         |
+|             |                 |                       | Instantaneous)      |
+|             |                 |                       | Switchover          |
++-------------+-----------------+-----------------------+---------------------+
+|Level 3      | General Consumer|  * Data traffic       | While this is       |
+|             | Public and ISP  |    (including voice   | typically           |
+|             | Traffic         |    and video traffic  | considered to be    |
+|             |                 |    provided by OTT)   | "Best Effort"       |
+|             |                 |  * Network            | traffic, it is      |
+|             |                 |    Infrastructure     | expected that       |
+|             |                 |    Functions          | Network Operators   |
+|             |                 |    supporting Level   | will devote         |
+|             |                 |    3 services         | sufficient          |
+|             |                 |                       | resources to        |
+|             |                 |                       | assure              |
+|             |                 |                       | "satisfactory"      |
+|             |                 |                       | levels of           |
+|             |                 |                       | availability.       |
+|             |                 |                       | This level of       |
+|             |                 |                       | service may be      |
+|             |                 |                       | pre-empted by       |
+|             |                 |                       | those with          |
+|             |                 |                       | higher levels of    |
+|             |                 |                       | Service             |
+|             |                 |                       | Availability. May   |
+|             |                 |                       | require M+1         |
+|             |                 |                       | Redundancy with     |
+|             |                 |                       | Fast Switchover;    |
+|             |                 |                       | where M > 1 and     |
+|             |                 |                       | the value of M to   |
+|             |                 |                       | be determined by    |
+|             |                 |                       | further study.      |
++-------------+-----------------+-----------------------+---------------------+
+
+Requirements
+^^^^^^^^^^^^
+
+* It shall be possible to define different service availability levels
+* It shall be possible to classify the virtual resources for the different
+ availability class levels
+* The VIM shall provide a mechanism by which VNF-specific requirements
+ can be mapped to NFVI-specific capabilities.
+
+More specifically, the requirements and capabilities may or may not be made up of the
+same KPI-like strings, but the cloud administrator must be able to configure which
+HA-specific VNF requirements are satisfied by which HA-specific NFVI capabilities.
+
+
+
+6.1.2 Metrics for Service Availability
+======================================
+
+The [ETSI-NFV-REL_] identifies four metrics relevant to service
+availability:
+
+* Failure recovery time,
+* Failure impact fraction,
+* Failure frequency, and
+* Call drop rate.
+
+6.1.2.1 Failure Recovery Time
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The failure recovery time is the time interval from the occurrence of an abnormal
+event (e.g. failure, manual interruption of service, etc.) until the recovery of the
+service, regardless of whether the abnormal event is scheduled or unscheduled. For the
+unscheduled case, the recovery time includes the failure detection time and the
+failure restoration time.
+More specifically, restoration also allows for service recovery by restarting
+the failed provider(s), while failover implies that the service is recovered by a
+redundant provider taking over the service. This provider may be a standby
+(i.e. synchronizing the service state with the active provider) or a spare
+(i.e. having no state information). Accordingly, failover also covers switchover, that
+is, an orderly takeover of the service from the active provider by the standby/spare.
+
+Requirements
+^^^^^^^^^^^^
+
+* It should be irrelevant whether the abnormal event is due to a scheduled or
+ unscheduled operation or it is caused by a fault.
+* Failure detection mechanisms should be available in the NFVI and configurable so
+ that the target recovery times can be met
+* Abnormal events should be logged and communicated (i.e. notifications and alarms as
+ appropriate)
+
+The TL-9000 forum has specified a service interruption time of 15 seconds as an outage
+for all traditional telecom system services. [ETSI-NFV-REL_]
+recommends setting different thresholds for the different Service Availability
+Levels. An example setting is given in Table 2 below. Note that for all
+Service Availability Levels, real-time services require the fastest recovery time.
+Data services can tolerate longer recovery times. These recovery times are applicable
+to the user plane. A failure in the control plane does not have to impact the user plane.
+The main concern should be simultaneous failures in the control and user planes,
+as the user plane typically cannot recover without the control plane. However, an HA
+mechanism in the VNF itself can further mitigate the risk. Note also that the impact on
+the user plane depends on which control plane service experiences the failure, as
+some of them are more critical than others.
+
+
+*Table 2: Example service recovery times for the service availability levels*
+
++------------+-----------------+------------------------------------------+
+|SAL         | Service         | Notes                                    |
+|            | Recovery        |                                          |
+|            | Time            |                                          |
+|            | Threshold       |                                          |
++============+=================+==========================================+
+|1           | 5 - 6 seconds   | Recommendation: Redundant resources to be|
+|            |                 | made available on-site to ensure fast    |
+|            |                 | recovery.                                |
++------------+-----------------+------------------------------------------+
+|2           | 10 - 15 seconds | Recommendation: Redundant resources to be|
+|            |                 | available as a mix of on-site and        |
+|            |                 | off-site as appropriate.                 |
+|            |                 |                                          |
+|            |                 | * On-site resources to be utilized for   |
+|            |                 |   recovery of real-time services.        |
+|            |                 | * Off-site resources to be utilized for  |
+|            |                 |   recovery of data services.             |
++------------+-----------------+------------------------------------------+
+|3           | 20 - 25 seconds | Recommendation: Redundant resources to be|
+|            |                 | mostly available off-site. Real-time     |
+|            |                 | services should be recovered before data |
+|            |                 | services.                                |
++------------+-----------------+------------------------------------------+
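+
+The relation between detection time, restoration time and the example thresholds of
+Table 2 can be sketched as follows. This is a minimal illustration, not part of
+[ETSI-NFV-REL_]: the names ``RECOVERY_THRESHOLD_S``, ``recovery_time`` and
+``meets_threshold`` are hypothetical, and taking the upper bound of each range in
+Table 2 as the limit is an assumption.
+
+.. code-block:: python
+
+   # Upper bounds (in seconds) of the example ranges in Table 2, keyed by SAL.
+   # Using the upper bound as the limit is an assumption of this sketch.
+   RECOVERY_THRESHOLD_S = {1: 6.0, 2: 15.0, 3: 25.0}
+
+
+   def recovery_time(detection_s: float, restoration_s: float) -> float:
+       """Recovery time of an unscheduled event: detection plus restoration."""
+       return detection_s + restoration_s
+
+
+   def meets_threshold(sal: int, detection_s: float, restoration_s: float) -> bool:
+       """Return True if the recovery time is within the example threshold."""
+       return recovery_time(detection_s, restoration_s) <= RECOVERY_THRESHOLD_S[sal]
+
+
+   # A failure detected in 1 s and restored in 4 s fits SAL 1 (5 s <= 6 s);
+   # one detected in 1 s and restored in 7 s does not (8 s > 6 s) but fits SAL 2.
+   print(meets_threshold(1, 1.0, 4.0))
+   print(meets_threshold(1, 1.0, 7.0))
+   print(meets_threshold(2, 1.0, 7.0))
+
+Keeping the thresholds in a single table makes it easy to replace the example values
+with operator-specific ones.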
+
+
+6.1.2.2 Failure Impact Fraction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The failure impact fraction is the maximum percentage of the capacity or user
+population affected by a failure compared with the total capacity or the user
+population supported by a service. It is directly associated with the failure impact
+zone which is the set of resources/elements of the system to which the fault may
+propagate.
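+
+Read as a ratio, the definition above can be written as follows. This is only a sketch
+of the definition; the wording of the terms and the numbers in the example are
+assumptions made for illustration:
+
+.. math::
+
+   \text{Failure impact fraction}
+   = \frac{\text{capacity (or users) affected by the failure}}
+          {\text{total capacity (or users) supported by the service}}
+     \times 100\%
+
+For example, if a failure impact zone contains 2 of the 10 equally loaded VMs serving
+a VNF, a failure that propagates through the whole zone affects at most
+:math:`2/10 \times 100\% = 20\%` of the capacity.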
+
+Requirements
+^^^^^^^^^^^^
+
+* It should be possible to define the failure impact zone for all the elements of the
+ system
+* At the detection of a failure of an element, its failure impact zone must be
+ isolated before the associated recovery mechanism is triggered
+* If the isolation of the failure impact zone is unsuccessful the isolation should be
+ attempted at the next higher level as soon as possible to prevent fault propagation.
+* It should be possible to define different levels of failure impact zones with
+ associated isolation and alarm generation policies
+* It should be possible to limit the collocation of VMs to reduce the failure impact
+ zone as well as to provide sufficient resources
+
+6.1.2.3 Failure Frequency
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Failure frequency is the number of failures in a certain period of time.
+
+Requirements
+^^^^^^^^^^^^
+
+* There should be a probation period for each failure impact zone within which
+  failures are correlated.
+* The threshold and the probation period for the failure impact zones should be
+ configurable
+* It should be possible to define failure escalation policies for the different
+ failure impact zones
+
+
+6.1.2.4 Call Drop Rate
+^^^^^^^^^^^^^^^^^^^^^^
+
+Call drop rate reflects service continuity as well as system reliability and
+stability. The metric is internal to the VNF and is therefore not specified further for
+the NFV environment.
+
+Requirements
+^^^^^^^^^^^^
+
+* It shall be possible to specify for each service availability class the associated
+ availability metrics and their thresholds
+* It shall be possible to collect data for the defined metrics
+* It shall be possible to delegate the enforcement of some thresholds to the NFVI
+* Accordingly it shall be possible to request virtual resources with guaranteed
+ characteristics, such as guaranteed latency between VMs (i.e. VNFCs), between a VM
+ and storage, between VNFs
+
+
+**********************
+6.2 Service Continuity
+**********************
+
+The determining factor with respect to service continuity is the statefulness of the
+VNF. If the VNF is stateless, there is no state information which needs to be
+preserved to prevent the perception of service discontinuity in case of failure or
+other disruptive events.
+If the VNF is stateful, the NF has a service state which needs to be preserved
+throughout such disruptive events in order to shield the service consumer from these
+events and provide the perception of service continuity. A VNF may maintain this state
+internally, externally, or in a combination of both, with or without the NFVI being
+aware of the purpose of the stored data.
+
+Requirements
+============
+
+* The NFVI should maintain the number of VMs provided to the VNF in the face of
+  failures, i.e. failed VM instances should be replaced by new VM instances
+* It should be possible to specify whether the NFVI or the VNF/VNFM handles the
+ service recovery and continuity
+* If the VNF/VNFM handles the service recovery it should be able to receive error
+ reports and/or detect failures in a timely manner.
+* The VNF (i.e. between VNFCs) may have its own fault detection mechanism, which might
+  be triggered prior to receiving the error report from the underlying NFVI; therefore
+  the NFVI/VIM should not attempt to preserve the state of a failing VM unless
+  configured to do so
+* The VNF/VNFM should be able to initiate the repair/reboot of resources of the NFVI
+  (e.g. to recover from a fault persisting at the VNF level => failure impact zone
+  escalation)
+* It should be possible to disallow the live migration of VMs and when it is allowed
+ it should be possible to specify the tolerated interruption time.
+* It should be possible to restrict the simultaneous migration of VMs hosting a given
+ VNF
+* It should be possible to define under which circumstances the NFV-MANO in
+ collaboration with the NFVI should provide error handling (e.g. VNF handles local
+ recoveries while NFV-MANO handles geo-redundancy)
+* The NFVI/VIM should provide virtual resources such as storage according to the needs
+  of the VNF with the required guarantees (see virtual resource classification).
+* The VNF shall be able to define the information to be stored on its associated
+ virtual storage
+* It should be possible to define HA requirements for the storage, its availability,
+ accessibility, resilience options, i.e. the NFVI shall handle the failover for the
+ storage.
+* The NFVI shall handle network/connectivity failures transparently to the VNFs
+* The VNFs with different requirements should be able to coexist in the NFV Framework
+* The scale in/out is triggered by the VNF (VNFM) towards the VIM (to be executed in
+ the NFVI)
+* It should be possible to define the metrics to monitor and the related thresholds
+ that trigger the scale in/out operation
+* Scale in operation should not jeopardize availability (managed by the VNF/VNFM),
+ i.e. resources can only be removed one at a time with a period in between sufficient
+ for the VNF to restore any required redundancy.
+
diff --git a/Section_2_Hardware_HA.rst b/Section_2_Hardware_HA.rst
new file mode 100644
index 0000000..7f4e054
--- /dev/null
+++ b/Section_2_Hardware_HA.rst
@@ -0,0 +1,186 @@
+===============
+2.0 Hardware HA
+===============
+
+Hardware HA can be addressed by several legacy HA schemes. However, in
+NFV scenarios a hardware failure causes collateral damage not only to
+the services but also to the virtual infrastructure running on the hardware.
+
+A redundant architecture and automatic failover for the hardware are required
+for the NFV scenario. At the same time, HW failures must be detected and
+reported from the hardware to the VIM, the VNFM and, if necessary, the Orchestrator to achieve
+HA in OPNFV. A sample fault table can be found in the Doctor project (https://wiki.opnfv.org/doctor/faults).
+All critical hardware failures should be reported to the VIM within 1s.
+
+.. (MT2) Should we keep the 50ms here? Other places have been modified to <1sec, e.g. for SAL 1.
+
+.. (fq2) agree with 1s
+
+Other warnings for the hardware should also be reported to the VIM in a
+timely manner.
+
+*********************
+General Requirements:
+*********************
+
+.. (MT) Are these general requirements or just for the servers?
+
+.. (fq) I think these should be the general requirements. not just the server.
+
+* Hardware Failures should be reported to the hypervisor and the VIM.
+* Hardware Failures should not be directly reported to the VNF as in the traditional ATCA
+ architecture.
+* Hardware failure detection message should be sent to the VIM within a specified period of time,
+ based on the SAL as defined in Section 1.
+* Alarm thresholds should be detected and the alarm delivered to the VIM within 1min. A certain
+ threshold can be set for such notification.
+* Direct notification from the hardware to some specific VNF should be possible.
+ Such notification should be within 1s.
+* Periodic updates of the hardware running conditions (operational state?) to the
+  NFVI and VIM are required for further operations, which may include fault
+  prediction, failure analysis, etc. Such information should be updated every 60s.
+* Transparent failover is required when storage or network
+  hardware fails.
+* Hardware should support SNMP and IPMI for centralized management, monitoring and
+ control.
+
+.. (MT) I would assume that this is OK if no guest was impacted, if there was a guest impact I think the VIM etc should know about the issue; in any case logging the failure and its correction would be still important
+.. (fq) It seems the hardware failure detection message should be sent to the VIM; shall we delete the hypervisor part?
+.. (MT) The reason I asked the question whether this is about the servers was the hypervisor. I agree to remove this from the general requirement.
+.. (Yifei) Shall we take the VIM users (VNFM & NFVO) into consideration? Some of the messages should be sent to the VIM users.
+.. (fq) yifei, I am a little bit confused, do you mean the hardware sends messages directly to the VIM user? I think this may not be possible.
+.. (Yifei) Yes, you're right, they should be sent to the VIM first.
+.. (MT) I agree, they should be sent to the VIM, the hypervisor can only be conditional because it may not be relevant as in a general requirement or may be dead with the HW.
+.. (fq) Agree. I have deleted the hypervisor part so that it is not a general requirement.
+.. may require realtime features in openstack
+
+.. (fq) We may need some discussion about the time constraints, including failure detection time, VNF failover time, and warnings for abnormal situations. A table might be needed to clarify these. Different levels of VNF may require different failover times.
+
+.. (MT) I agree. A VNF that manages its own availability with "built-in" redundancy wouldn't really care whether it's 1s or 1min because it would detect the failure and do the failover at the VNF level. But if the availability is managed by the VIM and VNFM then this time becomes critical.
+
+.. (joe) VIM can only rescue or migrate the VM onto another host in case of hardware failure. The VNF should already have finished the failover before the failed/faulty VM is rescued or migrated. VIM's responsibility is to keep the number of alive VM instances required by the VNF, even for auto scaling, but not to replace the VNF failover. That's why the hardware failure detection message for the VIM is not so time sensitive, because VM creation is often a slow task compared to failover (although there are many technologies to accelerate VM generation or to use a spare VM pool).
+
+.. (fq) Yes. But here we just mean failure detection, not rescue or migration of the VM. I mean the hardware and NFVI failure should be reported to the VIM and the VNF in a timely manner, then the VNF can do the failover, and the VIM can do the migration and rescue afterwards.
+
+.. (bb) There is confusion regarding time span within which hardware failure should be reported to VIM. In 2nd paragraph(of Hardware HA), it has been mentioned as; "within 50ms" and in this point it is "1s".
+
+.. (fq) I try to modify the 50ms to 1s.
+
+.. (chayi) hard for openstack
+
+.. VNF failover time < 1s
+
+.. (MT) Indeed, it's not designed for that
+
+.. (MT) Do the "hardware failure detection message" and the "alarm of hardware failure" refer to the same notification? It may be better to speak about hardware failure detection (and reporting) time.
+
+.. (fq) I have made the modification. see if it makes sense to you now.
+
+.. (MT) Based on the definition section I think you are talking about these threshold alarms only, because a failure is also an abnormal situation, but you want to detect it within a second
+
+.. (fq) Actually, I want to define Alarm as messages about conditions that might lead to failure in the near future, for example, a high temperature, or maybe a prediction of failure. These alarms may be important, but they do not need to be answered and solved within seconds.
+
+.. Alarms for abnormal situations and performance decrease (i.e. overuse of cpu)
+.. should be raised to the VIM within 1min(?).
+
+
+.. (MT) It should be possible to set some threshold at which the notification should be triggered, and probably ceilometer is not reliable enough to deliver such notifications since it has no real-time requirement, nor is it expected to be lossless.
+
+.. (fq) modification made.
+
+.. (MT) agree with the realtime extension part :-)
+
+.. (MT) Considering the modified definitions can we say that: Alarm conditions should be detected and the alarm delivered to the VIM within 1min?
+
+.. This effectively results in two requirements: one on the detection and one on the
+.. delivery mechanism.
+
+.. (fq) Agree. I have made the modification.
+
+
+
+.. In the meantime, I see the discussion of
+.. this requirement is still open.
+
+.. (Yifei) As before, I do not think it is necessary to send HW faults/failures to the VNF. Unlike a traditional integrated NF, the whole lifecycle of a VNF is managed by the VNFM.
+
+.. (joe) the HW fault/failure to VNF is required directly for VNF failover purpose. For example, memory or nic failure should be noticed by VNF ASAP, so that the service can be taken over and handled correctly by another VNF instance.
+
+.. (YY) In what case is a HW failure reported to the VNF directly? Next is my understanding, which may not be correct. If the CPU/memory fails, the host OS may crash at the same time the failure occurs, and then no notification can be sent anywhere. If it does not crash in some well managed SMP OS, and if we use CPU pinning for the VM, the VM guest OS may crash. If CPU pinning is not applied to the VM, the hypervisor can continue scheduling the VMs on the server just like in over-allocation mode. Another point: to accelerate the failover, the failure should be sent to the standby service entity, not the failed one. The standby VM should not be on the same server because of the anti-affinity scheme. How can "direct notice" apply?
+
+.. (joe) Not all HW faults lead to a crash of the VNF. For example, if the NIC cannot send packets as usual, it will affect the service, but the VNF is still running.
+
+
+.. Maybe 10 min is too long. As far as I know, Zabbix which is used by Doctor can
+.. achieve 60s.
+
+.. (fq) change the constraint to 60s
+
+.. (MT2) I think this applies primarily to storage, network hardware and maybe some controllers, which also run in some type of redundancy e.g. active/active or active/standby. For compute, we need redundancy, but it's more of the spare concept to replace any failed compute in the cluster (e.g. N+1). In this context the failover doesn't mean the recovery of a state, it only means replacing the failed HW with a healthy one in the initial state and that's not transparent at the HW level at least, i.e. the host is not brought up with the same identity as the failed one.
+
+.. (fq) agree. I have made some modification. I wonder what controller do you mean? is it SDN controller?
+
+.. (MT3) Yes, SDN, storage controllers. I don't know if any of the OpenStack controllers would also have such requirement, e.g. Ironic
+
+
+
+.. (MT) Is it expected for _all_ hardware?
+
+.. (YY) As general requirement should we add that the hardware should allow for
+.. centralized management and control? Maybe we could be even more specific
+.. e.g. what protocol should be supported.
+
+.. (fq) I agree. as far as I know, the protocol we use for hardware include SNMP and IPMI.
+
+.. (MT) OK, we can start with those as minimum requirement, i.e. HW should support at least them. Also I think the Ironic project in OpenStack manages the HW and also supports these. I was thinking maybe it could also be used for the HW management although that's not the general goal of Ironic as far as I know.
+
+***************************
+Network plane Requirements:
+***************************
+
+* The hardware should provide a redundant architecture for the network plane.
+* Failures of the network plane should be reported to the VIM within 1s.
+* QoS should be used to protect against link congestion.
+
+.. (MT) Do you mean the failure of the entire network plane?
+.. (fq) no, I mean the failure of the network connection of a certain HW, or a VNF.
+
+********************
+Power supply system:
+********************
+
+* The power supply architecture should be redundant at the server and site level.
+* Fault of the power supply system should be reported to the VIM within 1s.
+* Failure of a power supply will trigger automatic failover to the redundant supply.
+
+***************
+Cooling system:
+***************
+
+* The architecture of the cooling system should be redundant.
+* Fault of the cooling system should be reported to the VIM within 1s.
+* Failure of the cooling system will trigger automatic failover of the system.
+
+***********
+Disk Array:
+***********
+
+* The architecture for the disk array should be redundant.
+* Fault of the disk array should be reported to the VIM within 1s
+* Failure of the disk array will trigger automatic failover of the system
+* The disk array should support protected cache after an unexpected power loss.
+
+* Data shall be stored redundantly in the storage backend
+ (e.g., by means of RAID across disks.)
+* Upon failures of storage hardware components (e.g., disks services, storage
+ nodes) automatic repair mechanisms (re-build/re-balance of data) shall be
+ triggered automatically.
+* Centralized storage arrays shall consist of redundant hardware
+
+********
+Servers:
+********
+
+* Support precise timing with accuracy better than 4.6 ppm
+
+.. (MT2) Should we have time synchronization requirements in the other parts? I.e. having NTP in control nodes or even in all hosts
diff --git a/Section_4_Virtual_Infra.rst b/Section_4_Virtual_Infra.rst
new file mode 100644
index 0000000..7779f6c
--- /dev/null
+++ b/Section_4_Virtual_Infra.rst
@@ -0,0 +1,181 @@
+4.0 Virtual Infrastructure HA – Requirements:
+=============================================
+
+This section is written with the goal to ensure that there is alignment with
+Section 4.2 of the ETSI/NFV REL-001 document.
+
+Key reference requirements from ETSI/NFV document:
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+[Req.4.2.12] On the NFVI level, there should be a transparent fail-over in the
+case of, for example, compute, memory, storage or connectivity failures.
+
+.. (fq) According to VNF part, the following bullet may be added:
+
+* The virtual infrastructure should provide classified virtual resources for
+  VNFs of different SALs. Each class of resources should have guaranteed
+  performance metrics.
+
+* Specific HA handling schemes for each classified virtual resource,
+ e.g. recovery mechanisms, recovery priorities, migration options,
+ should be defined.
+
+* The NFVI should maintain the number of VMs provided to the VNF in the face of
+ failures. I.e. the failed VM instances should be replaced by new VM instances.
+
+.. (MT) this might be a requirement on the hypervisor and/or the
+.. VIM. In this respect I wonder where the nova agent running on the compute node
+.. belongs. Is it the VIM already or the Virtualization Facilities? The reason I'm
+.. asking is that together with the hypervisor they are in a unique position of
+.. correlating different failures on the host that may be due to HW, OS or
+.. hypervisor.
+
+.. (fq) I agree this might be for the hypervisor part. The VNF (i.e.
+.. between VNFCs) may have its own fault detection mechanism, which might be
+.. triggered prior to receiving the error report from the underlying NFVI therefore
+.. the NFVI/VIM should not attempt to preserve the state of a failing VM if not
+.. configured to do so
+
+4.1 Compute
+===========
+
+VM including CPU, memory and ephemeral disk
+
+.. (Yifei) Including nova-compute. (fq) What do you mean? (Yifei) I mean
+.. nova-compute is important enough for us to define some requirement about it.
+.. (IJ) Nova-compute is important, but implementation specific; this should be
+.. requirements focused.
+
+Requirements:
+
+* Failures must be detected in less than 1 second.
+* Recovery of a failed VM (VNF) must be automatic. The recovery must re-launch
+ the VM based on the required initial state defined in the VNFD.
+
+.. (MT) I think this is the same essentially as the one brought over from the VNF part in the paragraph above, where I have the question also.
+.. (Yifei) Different mechanisms should be defined according to the SLA of the service running on the VM.
+.. (fq) What do you mean by failure detection? Do you mean hypervisor notice the failure and perform automatic recovery? or do you mean hypervisor notice the failure and inform VIM?
+.. (fq) How to define the time limit for the failure detection? whether 1s is sufficient enough, or we should require for sometime less?
+
+.. Requirements do have some dependency on the NFVI interface definitions that are
+.. currently being defined by ETSI/NFV working groups. Ongoing alignment will
+.. be required.
+
+* On evacuation, fencing of instances from an unreachable host is required.
+
+.. original wording for above: Fencing instances of an unreachable host when evacuation happens. [GAP 10]
+
+.. (YY) If a host is unreachable, how can the VMs on it be evacuated? The fencing function may be moved to the VIM part.
+.. (fq) copy from the Gap 10:
+
+.. Safe VM evacuation has to be preceded by fencing (isolate, shut down) the failed
+.. host. Failing to do so – when the perceived disconnection is due to some
+.. transient or partial failure – the evacuation might lead into two identical
+.. instances running together and having a dangerous conflict.
+
+.. (unknown commenter) I agree it should be moved to the VIM part.
+.. (IJ) Not clear what or if the above comment has been moved.
+
+.. (Yifei) In OpenStack, evacuate means that "VMs whose storage is accessible from other nodes (e.g. shared storage) could be rebuilt and restarted on a target node", it is different from migration. link: https://wiki.openstack.org/wiki/Evacuate
+
+* Resources of a migrated VM must be evacuated once the VM is migrated
+ to a different compute node (for example, during maintenance activities);
+ placement policies must be preserved.
+
+.. (MT) Do you mean maintenance of the compute node? In any case I think the evacuation should follow the placement policy.
+.. (fq) Yes. What placement policy do you mean?
+.. (Yifei) e.g. keep the same scheduler hints as before, am I right ,@Maria?
+.. (MT) Yes, the affinity, anti-affinity, etc
+.. (fq) Got it. I am adding a requirement that the evacuation should follow the placement policy.
+.. (fq) insert below.
+
+* Failure detection of the VNF software process is required
+ in order to detect the failure of the VNF sufficiently. Detection should
+ take less than 1 second.
+
+.. (may require interface extension)
+
+.. (MT) What do you mean by the VNF software process? Is it the application(s) running in the VM? If yes, Heat has such consideration already, but I'm only familiar with the first version which was cron job based and therefore the resolution was 1 minute.
+.. (fq) Yes, I mean the applications. 1 min might be too long I am afraid. I think this failure detection should be at least less than the failover time. Otherwise it does not make sense.
+.. (I don't know if 50ms is sufficient enough, since we require the failover of the VNFs should be within 50ms, if the detection is longer than this, there is no meaning to do the detection)
+.. (MT) Do you assume that the entire VM needs to be repaired in case of application failure? Also the question is whether there's a VM ready to fail over to. It might be that OpenStack just starts to build the VM when the failover is triggered. If that's the case it can take minutes. If the VM exists then starting it still takes ~half a minute I think.
+.. I think there's a need to have the VM images in shared storage otherwise there's an issue with migration and failover
+.. (fq) I don't mean the recovery of the entire VM. I only mean the failover of the service. In our testing, we use an active /active VM, so it only takes less than 1s to do the failover. I understand the situation you said above. I wonder if we should set a time constraint for such failover? for me, I think such constraint should be less than second.
+.. (Yifei) Maria, I cannot understand " If the VM exists then starting it still takes ~half a minute", would please explain it more detailed? Thank you.
+.. (MT) As far as I know Heat rebuilds the VM from scratch as part of the failure recovery. Once the VM is rebuilt it's booted and only after that can it actually provide service. This time until the VM is ready to serve can take 20-30sec after the VM is already reported as existing.
+.. (Yifei) Ah, I see. Thank you so much!
+.. (YY) As I understand, what Heat provides is not what fuqiao wants here. To fail over within 50ms or 1s means two VMs are both running; in the NFVI view there are two VMs running, but in the application view one is master and the other is standby. What I did not find above is how to monitor application processes in the VM. Traditionally a watchdog is applied to this task. In new versions of Qemu the watchdog is simulated in software, but the timeslot of this watchdog cannot be as narrow as that of a hardware watchdog. I was told lower than 15s may cause fault action.
+.. Do you mean this watchdog? https://libvirt.org/formatdomain.html#elementsWatchdog
+.. (fq) Yes, Yuan Yue got my idea:)
+
+.. 4.2 Storage dedicated section (new section 7).
+.. (GK) please see dedicated section on storage below (Section 7)
+.. Virtual disk and volumes for applications.
+.. Storage related to NFVI must be redundant.
+.. Requirements:
+.. For small systems a small local redundant file system must be supported.
+.. For larger system – replication of data across multiple storage nodes. Processes controlling the storage nodes must also be replicated, such that there is no single point of failure.
+.. Block storage supported by a clustered files system is required.
+.. Should be transparent to the storage user
+
+4.2 Network
+===========
+
+Virtual network:
+^^^^^^^^^^^^^^^^
+
+Requirements:
+
+* Redundant top of rack switches must be supported as part of the deployment.
+
+.. (MT) Shouldn't this be a HW requirement?
+.. (Yifei) Agree with Maria
+.. (IJ) The ToR is not typically in the NFVI, that is why I put the ToR here.
+
+* Static LAG must be supported to ensure sub 50ms detection and failover of
+ redundant links between nodes. The distributed virtual router should
+ support HA.
+
+.. (Yifei) Add?: Services provided by network agents should be kept available and continuous, e.g. VRRP is used for L3 agent HA (keepalived or pacemaker).
+.. (IJ) this is a requirements document. Exclude the implementation details. Added the requirement below
+
+* Services provided by network agents (for example, the L3 agent and the DHCP
+ agent) should be highly available.
+
+* L3-agent, DHCP-agent should clean up network artifacts (IPs, Namespaces) from
+ the database in case of failover.
+
+vSwitch Requirements:
+^^^^^^^^^^^^^^^^^^^^^
+
+* Monitoring of the health of vSwitch processes is required.
+* The vSwitch must adapt to changes in network topology and automatically
+ support recovery modes in a transparent manner.
+
+Link Redundancy Requirements:
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* The ability to manage redundant interfaces and support of LAG on the compute
+ node is required.
+* Support of LAG is required on all interfaces: internal platform control
+ interfaces, internal platform storage interfaces, as well as interfaces
+ connecting to provider networks.
+* LACP is optional for dynamic management of LAG links.
+* Automated configuration of LAG should support active/standby and
+ balanced modes. It should adapt to changes in network topology and automatically
+ support recovery modes in a transparent manner.
+* In the SR-IOV scenario, link redundancy cannot be transparent. The VM should have
+ two ports directly connected to physical ports on the host, so that the
+ application can bond these two ports for HA.
+
+.. (MT) Should we consider also load balancers? I'm not familiar with the LBaaS, but it seems to be key for the load distribution for the multi-VM VNFs.
+.. (YY) As far as I know LBaaS is not mature at this time in OpenStack. OpenStack does provide an API for LBaaS, but it depends on the LB entity and its plugin. We have not found any mature LB agent or LB entity in the community. The LB inside a VNF is usually handled by the VNF itself.
+.. (fq) I think LB should be taken into consideration as well, even though OpenStack is not mature now. This is how OPNFV is working: we work out requirements on our side and propose possible bp to OpenStack so that these features can be added in future releases.
+.. (Yifei) Agree. Because it is not mature, there is a possibility of finding gaps between OpenStack and our requirements.
+.. (MT) Agree. We may even influence how it matures ;-)
+.. vlb, vFW are part of virtual resources?
+.. (Yifei) From my side, network node.
+.. (Yifei) If you mean LB or FW in NFVI, I do not think vXX is a suitable name as in OpenStack Neutron there are LBaas and FWaas. If you mean VNF, then you can call them vLB and vFW. However i do not think LBaas is the same as vLB, they are different use cases. What we need to consider should be LBaas and FWaas not vLB or vFW.
+.. For more details about LBaas and FWaas, you can find on the wiki page of neutron...
+.. (fq) Thank you for Yifei. I wonder what's the difference between vLB and LBaas. You mean they have different functions?
+.. (IJ) LBaaS is good for enterprise - for Carrier applications won't higher data rates be needed and therefore a Load Balancer in a VNF is probably a better solution.
\ No newline at end of file
diff --git a/UseCases/UseCases.rst b/UseCases/UseCases.rst
new file mode 100644
index 0000000..57dfbc0
--- /dev/null
+++ b/UseCases/UseCases.rst
@@ -0,0 +1,731 @@
+============
+HA Use Cases
+============
+
+**************
+1 Introduction
+**************
+
+This use case document outlines the model and failure modes for NFV systems. Its goal is, along
+with the requirements documents and gap analysis, to help set the context for engagement with various
+upstream projects. The OPNFV HA project team is continuously evolving these documents, and in
+particular this use case document, starting with a set of basic use cases.
+
+*****************
+2 Basic Use Cases
+*****************
+
+
+In this section we review some of the basic use cases related to service high availability,
+that is, the availability of the service or function provided by a VNF. The goal is to
+understand the different scenarios that need to be considered and the specific requirements
+to provide service high availability. More complex use cases will be discussed in
+other sections.
+
+With respect to service high availability we need to consider whether a VNF implementation is
+stateful or stateless and whether or not it includes an HA manager which handles redundancy.
+For stateful VNFs we can also distinguish the cases when the state is maintained inside
+the VNF or stored in an external shared storage, making the VNF itself virtually
+stateless.
+
+Managing availability usually implies a fault detection mechanism, which triggers the
+actions necessary for fault isolation followed by the recovery from the fault.
+This recovery includes two parts:
+
+* the recovery of the service and
+* the repair of the failed entity.
+
+Very often the recovery of the service and the repair actions are perceived to be the same, for
+example, restarting a failed application repairs the application, which then provides the service again.
+Such a restart may take significant time causing service outage, for which redundancy is the solution.
+In cases when the service is protected by redundancy of the providing entities (e.g. application
+processes), the service is "failed over" to the standby or a spare entity, which replaces the
+failed entity while it is being repaired. E.g. when an application process providing the service fails,
+the standby application process takes over providing the service, while the failed one is restarted.
+Such a failover often allows for faster recovery of the service.
+
+We also need to distinguish between the failed and the faulty entities as a fault may or
+may not manifest in the entity containing the fault. Faults may propagate, i.e. cause other entities
+to fail or misbehave (an error), which in turn might be detected by a different failure or
+error detector entity, each of which has its own scope. Similarly, the managers acting on these
+detected errors may have a limited scope. E.g. an HA manager contained in a VNF can only repair
+entities within the VNF. It cannot repair a failed VM; in fact, due to the layered architecture,
+it cannot even know whether it was the VM, its hosting hypervisor, or the physical host that failed.
+But its error detection mechanism will detect the result of such failures - a failure in the VNF -
+and the service can be recovered at the VNF level.
+On the other hand, the failure should be detected in the NFVI and the VIM should repair the failed
+entity (e.g. the VM). Accordingly a failure may be detected by different managers in different layers
+of the system, each of which may react to the event. This may cause interference.
+Thus, to resolve the problem in a consistent manner and completely recover from
+a failure the managers may need to collaborate and coordinate their actions.
+
+Considering all these issues the following basic use cases can be identified (see Table 1).
+These use cases assume that the failure is detected in the faulty entity (VNF component
+or the VM).
+
+
+*Table 1: VNF high availability use cases*
+
++---------+-------------------+----------------+-------------------+----------+
+|         | VNF Statefulness  | VNF Redundancy | Failure detection | Use Case |
++=========+===================+================+===================+==========+
+| VNF     | yes               | yes            | VNF level only    | UC1      |
+|         |                   |                +-------------------+----------+
+|         |                   |                | VNF & NFVI levels | UC2      |
+|         |                   +----------------+-------------------+----------+
+|         |                   | no             | VNF level only    | UC3      |
+|         |                   |                +-------------------+----------+
+|         |                   |                | VNF & NFVI levels | UC4      |
+|         +-------------------+----------------+-------------------+----------+
+|         | no                | yes            | VNF level only    | UC5      |
+|         |                   |                +-------------------+----------+
+|         |                   |                | VNF & NFVI levels | UC6      |
+|         |                   +----------------+-------------------+----------+
+|         |                   | no             | VNF level only    | UC7      |
+|         |                   |                +-------------------+----------+
+|         |                   |                | VNF & NFVI levels | UC8      |
++---------+-------------------+----------------+-------------------+----------+
+
+As discussed, there is no guarantee that a fault manifests within the faulty entity. For
+example, a memory leak in one process may impact or even crash any other process running in
+the same execution environment. Accordingly, the repair of a failing entity (i.e. the crashed process)
+may not resolve the problem and soon the same or another process may fail within this execution
+environment indicating that the fault has remained in the system.
+Thus, there is a need for extrapolating the failure to a wider scope and performing the
+recovery at that level to get rid of the problem (at least temporarily, until a patch is available
+for the leaking process).
+This requires the correlation of repeated failures in a wider scope and the escalation of the
+recovery action to this wider scope. In the layered architecture this means that the manager detecting the
+failure may not be the one in charge of the scope at which it can be resolved, so the escalation needs to
+be forwarded to the manager in charge of that scope, which brings us to an additional use case UC9.
+
+We need to consider for each of these use cases the events detected, their impact on other entities,
+and the actions triggered to recover the service provided by the VNF, and to repair the
+faulty entity.
+
+We are going to describe each of the listed use cases from this perspective to better
+understand how the problem of service high availability can best be tackled.
+
+Before getting into the details it is worth mentioning the example end-to-end service recovery
+times provided in the ETSI NFV REL document [REL]_ (see Table 2). These values may change over time;
+in particular, the thresholds may be lowered.
+
+*Table 2: Service availability levels (SAL)*
+
++----+---------------+----------------------+------------------------------------+
+|SAL |Service        |Customer Type         | Recommendation                     |
+|    |Recovery       |                      |                                    |
+|    |Time           |                      |                                    |
+|    |Threshold      |                      |                                    |
++====+===============+======================+====================================+
+|1   |5 - 6 seconds  |Network Operator      |Redundant resources to be           |
+|    |               |Control Traffic       |made available on-site to           |
+|    |               |                      |ensure fast recovery.               |
+|    |               |Government/Regulatory |                                    |
+|    |               |Emergency Services    |                                    |
++----+---------------+----------------------+------------------------------------+
+|2   |10 - 15 seconds|Enterprise and/or     |Redundant resources to be available |
+|    |               |large scale customers |as a mix of on-site and off-site    |
+|    |               |                      |as appropriate: On-site resources to|
+|    |               |Network Operators     |be utilized for recovery of         |
+|    |               |service traffic       |real-time service; Off-site         |
+|    |               |                      |resources to be utilized for        |
+|    |               |                      |recovery of data services           |
++----+---------------+----------------------+------------------------------------+
+|3   |20 - 25 seconds|General Consumer      |Redundant resources to be mostly    |
+|    |               |Public and ISP        |available off-site. Real-time       |
+|    |               |Traffic               |services should be recovered before |
+|    |               |                      |data services                       |
++----+---------------+----------------------+------------------------------------+
+
+Note that even though SAL 1 of [REL]_ allows for 5-6 seconds of service recovery,
+for many services this is too long and such an outage causes a service level reset or
+the loss of a significant amount of data. Also the end-to-end service or network service
+may be served by multiple VNFs. Therefore for a single VNF the desired
+service recovery time is sub-second.
+
+Note that failing over the service to another provider entity implies the redirection of the traffic
+flow the VNF is handling. This could be achieved in different ways ranging from floating IP addresses
+to load balancers. The topic deserves its own investigation, therefore in this first set of
+use cases we assume that it is part of the solution without going into the details, which
+we will address in a complementary set of use cases.
+
+.. [REL] ETSI GS NFV-REL 001 V1.1.1 (2015-01)
+
+
+2.1 Use Case 1: VNFC failure in a stateful VNF with redundancy
+==============================================================
+
+Use case 1 represents a stateful VNF with redundancy managed by an HA manager,
+which is part of the VNF (Fig 1). The VNF consists of VNFC1, VNFC2 and the HA Manager.
+The latter manages the two VNFCs, e.g. the roles they play in providing the service
+named "Provided NF" (Fig 2).
+
+The failure happens in one of the VNFCs and it is detected and handled by the HA manager.
+In practice the HA manager could be part of the VNFC implementations or it could
+be a separate entity in the VNF. The point is that the communication of these
+entities inside the VNF is not visible to the rest of the system. The observable
+events need to cross the boundary represented by the VNF box.
+
+
+.. figure:: images/Slide4.png
+ :alt: VNFC failure in a stateful VNF
+ :figclass: align-center
+
+ Fig 1. VNFC failure in a stateful VNF with built-in HA manager
+
+
+.. figure:: images/StatefullVNF-VNFCfailure.png
+ :alt: MSC of the VNFC failure in a stateful VNF
+ :figclass: align-center
+
+ Fig 2. Sequence of events for use case 1
+
+
+As shown in Fig 2, initially VNFC2 is active, i.e. provides the Provided NF, and VNFC1
+is a standby. It is not shown, but it is expected that VNFC1 has some means to get updates
+of the state of the Provided NF from the active VNFC2, so that it is prepared to continue to
+provide the service in case VNFC2 fails.
+The sequence of events starts with the failure of VNFC2, which also interrupts the
+Provided NF. This failure is detected somehow and/or reported to the HA Manager, which
+in turn may report the failure to the VNFM while simultaneously trying to isolate the
+fault by cleaning up VNFC2.
+
+Once the cleanup succeeds (i.e. the OK is received) the HA Manager fails over the active role to
+VNFC1 by setting it active. This recovers the service: the Provided NF is
+provided again. Thus this point marks the end of the outage caused by the failure
+that needs to be considered from the perspective of service availability.
+
+The repair of the failed VNFC2, which might have started at the same time
+when VNFC1 was assigned the active state, may take longer but without further impact
+on the availability of the Provided NF service.
+If the HA Manager reported the interruption of the Provided NF to the VNFM, it should
+clear the error condition.
+
+The key points in this scenario are:
+
+* The failure of VNFC2 is not detectable by any other part of the system except
+ the consumer of the Provided NF. The VNFM only
+ knows about the failure because of the error report, and knows only the information this
+ report provides. I.e. it may or may not include the information on what failed.
+* The Provided NF is resumed as soon as VNFC1 is assigned active regardless of how long
+ it takes to repair VNFC2.
+* The HA manager could be part of the VNFM as well. This requires an interface to
+ detect the failures and to manage the VNFC life-cycle and the role assignments.
+
+2.2 Use Case 2: VM failure in a stateful VNF with redundancy
+============================================================
+
+Use case 2 also represents a stateful VNF with its redundancy managed by an HA manager,
+which is part of the VNF. The VNFCs of the VNF are hosted on the VMs provided by
+the NFVI (Fig 3).
+
+The VNF consists of VNFC1, VNFC2 and the HA Manager (Fig 4). The latter manages
+the roles the VNFCs play in providing the service - Provided NF.
+The VMs provided by the NFVI are managed by the VIM.
+
+
+In this use case one of the VMs hosting the VNF fails. The failure is detected
+and handled at both the NFVI and the VNF levels simultaneously. The coordination occurs
+between the VIM and the VNFM.
+
+
+.. figure:: images/Slide6.png
+ :alt: VM failure in a stateful VNF
+ :figclass: align-center
+
+ Fig 3. VM failure in a stateful VNF with built-in HA manager
+
+
+.. figure:: images/StatefullVNF-VMfailure.png
+ :alt: MSC of the VM failure in a stateful VNF
+ :figclass: align-center
+
+ Fig 4. Sequence of events for use case 2
+
+
+Again initially VNFC2 is active and provides the Provided NF, while VNFC1 is the standby.
+It is not shown in Fig 4., but it is expected that VNFC1 has some means to learn the state
+of the Provided NF from the active VNFC2, so that it is able to continue providing the
+service if VNFC2 fails. VNFC1 is hosted on VM1, while VNFC2 is hosted on VM2 as indicated by
+the arrows between these objects in Fig 4.
+
+The sequence of events starts with the failure of VM2, which results in VNFC2 failing and
+interrupting the Provided NF. The HA Manager detects the failure of VNFC2 somehow
+and tries to handle it the same way as in use case 1. However, because the VM is gone, the
+cleanup is either not initiated at all or is interrupted as soon as the failure of the VM is
+identified. In either case the faulty VNFC2 is considered as isolated.
+
+To recover the service the HA Manager fails over the active role to VNFC1 by setting it active.
+This recovers the Provided NF. Thus this point again marks the end of the outage caused
+by the VM failure that needs to be considered from the perspective of service availability.
+If the HA Manager reported the interruption of the Provided NF to the VNFM, it should
+clear the error condition.
+
+On the other hand the failure of the VM is also detected in the NFVI and reported to the VIM.
+The VIM reports the VM failure to the VNFM, which passes on this information
+to the HA Manager of the VNF. This confirms for the VNF HA Manager the VM failure and that
+it needs to wait with the repair of the failed VNFC2 until the VM is provided again. The
+VNFM also confirms towards the VIM that it is safe to restart the VM.
+
+The repair of the failed VM may take some time, but since the service has been failed over
+to VNFC1 in the VNF, there is no further impact on the availability of Provided NF.
+
+When eventually VM2 is restarted the VIM reports this to the VNFM and
+the VNFC2 can be restored.
+
+The key points in this scenario are:
+
+* The failure of VM2 is detectable at both the VNF and NFVI levels, therefore both the HA
+ manager and the VIM react to it. It is essential that these reactions do not interfere,
+ e.g. if the VIM tries to protect the VM state at NFVI level that would conflict with the
+ service failover action at the VNF level.
+* While the failure detection happens at both NFVI and VNF levels, the time frame within
+ which the VIM and the HA manager detect and react may be very different. For service
+ availability the VNF level detection, i.e. by the HA manager is the critical one and expected
+ to be faster.
+* The Provided NF is resumed as soon as VNFC1 is assigned active regardless of how long
+ it takes to repair VM2 and VNFC2.
+* The HA manager could be part of the VNFM as well.
+ This requires an interface to detect failures in/of the VNFC and to manage its life-cycle and
+ role assignments.
+* The VNFM may not know for sure that the VM failed until the VIM reports it, nor whether
+ the VM failure is due to a host, hypervisor, or host OS failure. Thus the VIM should report/alarm
+ and log VM, hypervisor, and physical host failures. The use cases for these failures
+ are similar with respect to the Provided NF.
+* The VM repair also should start with the fault isolation as appropriate for the actual
+ failed entity, e.g. if the VM failed due to a host failure a host may be fenced first.
+* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
+ E.g. on error restart VM in initial state, restart VM from last snapshot, or fail VM over to standby.
+
+
+2.3 Use Case 3: VNFC failure in a stateful VNF with no redundancy
+=================================================================
+
+Use case 3 also represents a stateful VNF, but it stores its state externally on a
+virtual disk provided by the NFVI. It has a single VNFC and it is managed by the VNFM
+(Fig 5).
+
+In this use case the VNFC fails and the failure is detected and handled by the VNFM.
+
+
+.. figure:: images/Slide10.png
+ :alt: VNFC failure in a stateful VNF No-Red
+ :figclass: align-center
+
+ Fig 5. VNFC failure in a stateful VNF with no redundancy
+
+
+.. figure:: images/StatefullVNF-VNFCfailureNoRed.png
+ :alt: MSC of the VNFC failure in a stateful VNF No-Red
+ :figclass: align-center
+
+ Fig 6. Sequence of events for use case 3
+
+
+The VNFC periodically checkpoints the state of the Provided NF to the external storage,
+so that in case of failure the Provided NF can be resumed (Fig 6).
+
+When the VNFC fails the Provided NF is interrupted. The VNFM detects the failure
+somehow and, to isolate the fault, first cleans up the VNFC; if the cleanup is
+successful, it then restarts the VNFC. When the VNFC starts up, it first reads the last checkpoint
+for the Provided NF, then resumes providing it. The service outage lasts from the VNFC failure
+until this moment.
+
+The key points in this scenario are:
+
+* The service state is saved in an external storage which should be highly available too to
+ protect the service.
+* The NFVI should provide this guarantee and also that storage and access network failures
+ are handled seamlessly from the VNF's perspective.
+* The VNFM has means to detect VNFC failures and manage its life-cycle appropriately. This is
+ not required if the VNF also provides its availability management.
+* The Provided NF can be resumed only after the VNFC is restarted and it has restored the
+ service state from the last checkpoint created before the failure.
+* Having a spare VNFC can speed up the service recovery. This requires that the VNFM coordinates
+ the role each VNFC takes with respect to the Provided NF, i.e. it must ensure that the VNFCs do not act on the
+ stored state simultaneously, potentially interfering with and corrupting it.
+
+
+
+2.4 Use Case 4: VM failure in a stateful VNF with no redundancy
+===============================================================
+
+Use case 4 also represents a stateful VNF without redundancy, which stores its state externally on a
+virtual disk provided by the NFVI. It has a single VNFC managed by the VNFM
+(Fig 7) as in use case 3.
+
+In this use case the VM hosting the VNFC fails and the failure is detected and handled by
+the VNFM and the VIM simultaneously.
+
+
+.. figure:: images/Slide11.png
+ :alt: VM failure in a stateful VNF No-Red
+ :figclass: align-center
+
+ Fig 7. VM failure in a stateful VNF with no redundancy
+
+.. figure:: images/StatefullVNF-VMfailureNoRed.png
+ :alt: MSC of the VM failure in a stateful VNF No-Red
+ :figclass: align-center
+
+ Fig 8. Sequence of events for use case 4
+
+Again, the VNFC regularly checkpoints the state of the Provided NF to the external storage,
+so that it can be resumed in case of a failure (Fig 8).
+
+When the VM hosting the VNFC fails the Provided NF is interrupted.
+
+On the one hand, the failure is detected by the VNFM somehow, which, to isolate the fault, tries
+to clean up the VNFC. This cannot be done because of the VM failure. When the absence of the VM has been
+determined, the VNFM has to wait with restarting the VNFC until the hosting VM is restored. The VNFM
+may report the problem to the VIM, requesting a repair.
+
+On the other hand the failure is detected in the NFVI and reported to the VIM, which reports it
+to the VNFM, if the VNFM hasn't reported it yet.
+If the VNFM has requested the VM repair or if it acknowledges the repair, the VIM restarts the VM.
+Once the VM is up the VIM reports it to the VNFM, which in turn can restart the VNFC.
+
+When the VNFC restarts first it reads the last checkpoint for the Provided NF,
+to be able to resume it.
+The service outage lasts until this recovery is completed.
+
+The key points in this scenario are:
+
+
+* The service state is saved in external storage which should be highly available to
+ protect the service.
+* The NFVI should provide such a guarantee and also that storage and access network failures
+ are handled seamlessly from the perspective of the VNF.
+* The Provided NF can be resumed only after the VM and the VNFC are restarted and the VNFC
+ has restored the service state from the last checkpoint created before the failure.
+* The VNFM has means to detect VNFC failures and manage its life-cycle appropriately. Alternatively
+ the VNF may also provide its availability management.
+* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
+ distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
+ VM, hypervisor, and physical host failures. The use cases for these failures are
+ similar with respect to the Provided NF.
+* The VM repair also should start with the fault isolation as appropriate for the actual
+ failed entity, e.g. if the VM failed due to a host failure the host may be fenced first.
+* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
+* VM level redundancy, i.e. running a standby or spare VM in the NFVI would allow faster service
+ recovery for this use case, but by itself it may not protect against VNFC level failures. I.e.
+ VNFC level error detection is still required.
+
+
+
+2.5 Use Case 5: VNFC failure in a stateless VNF with redundancy
+===============================================================
+
+Use case 5 represents a stateless VNF with redundancy, i.e. it is composed of VNFC1 and VNFC2.
+They are managed by an HA manager within the VNF. The HA manager assigns the active role to provide
+the Provided NF to one of the VNFCs while the other remains a spare, meaning that it has no state
+information for the Provided NF (Fig 9). Therefore it could replace any other VNFC capable of
+providing the Provided NF service.
+
+In this use case the VNFC fails and the failure is detected and handled by the HA manager.
+
+
+.. figure:: images/Slide13.png
+ :alt: VNFC failure in a stateless VNF with redundancy
+ :figclass: align-center
+
+ Fig 9. VNFC failure in a stateless VNF with redundancy
+
+
+.. figure:: images/StatelessVNF-VNFCfailure.png
+ :alt: MSC of the VNFC failure in a stateless VNF with redundancy
+ :figclass: align-center
+
+ Fig 10. Sequence of events for use case 5
+
+
+Initially VNFC2 provides the Provided NF while VNFC1 is idle or might not even have been instantiated
+yet (Fig 10).
+
+When VNFC2 fails the Provided NF is interrupted. This failure is detected by the HA manager,
+which as a first reaction cleans up VNFC2 (fault isolation), then it assigns the active role to
+VNFC1. It may report an error to the VNFM as well.
+
+Since there is no state information to recover, VNFC1 can accept the active role right away
+and resume providing the Provided NF service. Thus the service outage is over. If the HA manager
+reported an error to the VNFM it should clear it at this point.
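The failover sequence described above can be sketched as follows. This is an illustrative sketch and not part of the committed file: the `HaManager` class, its method names and the event strings are hypothetical, and the optional error report to the VNFM is modelled as a simple flag.

```python
class HaManager:
    """Hypothetical HA manager for a stateless VNF with one spare VNFC."""

    def __init__(self, vnfcs, active):
        self.vnfcs = list(vnfcs)
        self.active = active
        self.error_reported = False

    def on_failure(self, failed, report_to_vnfm=False):
        events = []
        # Fault isolation comes first: clean up the failed VNFC.
        events.append(f"cleanup {failed}")
        if report_to_vnfm:
            events.append("report error to VNFM")
            self.error_reported = True
        # No state to recover, so the spare can take over right away.
        spare = next(v for v in self.vnfcs if v != failed)
        self.active = spare
        events.append(f"assign active role to {spare}")
        # The outage is over, so a previously reported error is cleared.
        if self.error_reported:
            events.append("clear error at VNFM")
            self.error_reported = False
        return events


manager = HaManager(["VNFC1", "VNFC2"], active="VNFC2")
events = manager.on_failure("VNFC2", report_to_vnfm=True)
print(events)
```

The sketch assumes exactly one spare; with several active VNFCs sharing the load, the choice of the replacement would need a real selection policy.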
+
+The key points in this scenario are:
+
+* The spare VNFC may be instantiated only once the failure of the active VNFC is detected.
+* As a result the HA manager's role might be limited to life-cycle management, i.e. no role
+ assignment is needed if the VNFCs provide the service as soon as they are started up.
+* Accordingly the HA management could be part of a generic VNFM provided it is capable of detecting
+ the VNFC failures. Apart from the service users, the VNFC failure may not be detectable in any other
+ part of the system.
+* Also there could be multiple active VNFCs sharing the load of the Provided NF and the spare/standby
+ may protect all of them.
+* Reporting the service failure to the VNFM is optional as the HA manager is in charge of recovering
+ the service and it is aware of the redundancy needed to do so.
+
+
+2.6 Use Case 6: VM failure in a stateless VNF with redundancy
+=============================================================
+
+
+Similarly to use case 5, use case 6 represents a stateless VNF composed of VNFC1 and VNFC2,
+which are managed by an HA manager within the VNF. The HA manager assigns the active role to
+provide the Provided NF to one of the VNFCs while the other remains a spare, meaning that it has
+no state information for the Provided NF (Fig 11) and it could replace any other VNFC capable
+of providing the Provided NF service.
+
+As opposed to use case 5, in this use case the VM hosting one of the VNFCs fails. This failure is
+detected and handled by the HA manager as well as the VIM.
+
+
+.. figure:: images/Slide14.png
+ :alt: VM failure in a stateless VNF with redundancy
+ :figclass: align-center
+
+ Fig 11. VM failure in a stateless VNF with redundancy
+
+
+.. figure:: images/StatelessVNF-VMfailure.png
+ :alt: MSC of the VM failure in a stateless VNF with redundancy
+ :figclass: align-center
+
+ Fig 12. Sequence of events for use case 6
+
+
+Initially VNFC2 provides the Provided NF while VNFC1 is idle or might not have been instantiated
+yet (Fig 12) as in use case 5.
+
+When VM2 fails VNFC2 fails with it and the Provided NF is interrupted. The failure is detected by
+the HA manager and by the VIM simultaneously and independently.
+
+The HA manager's first reaction is trying to clean up VNFC2 to isolate the fault. This is considered to
+be successful as soon as the disappearance of the VM is confirmed.
+After this the HA manager assigns the active role to VNFC1. It may report the error to the VNFM as well
+requesting a VM repair.
+
+Since there is no state information to recover, VNFC1 can accept the assignment right away
+and resume the Provided NF service. Thus the service outage is over. If the HA manager reported
+an error to the VNFM for the service it should clear it at this point.
+
+Simultaneously the VM failure is detected in the NFVI and reported to the VIM, which reports it
+to the VNFM, if the VNFM hasn't requested a repair yet. If the VNFM requested the VM repair or if
+it acknowledges the repair, the VIM restarts the VM.
+
+Once the VM is up the VIM reports it to the VNFM, which in turn may restart the VNFC if needed.
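The repair negotiation between the VIM and the VNFM described above can be sketched as follows. This is an illustrative sketch and not part of the committed file: the `Vim` class, the `vnfm_acknowledges` callback and the log strings are hypothetical and do not correspond to any real VIM API.

```python
class Vim:
    """Hypothetical VIM that restarts a failed VM once the VNFM agrees."""

    def __init__(self):
        self.repair_requested = set()
        self.log = []

    def request_repair(self, vm):
        # Called by the VNFM when it has already asked for the VM repair.
        self.repair_requested.add(vm)

    def on_vm_failure(self, vm, vnfm_acknowledges):
        if vm not in self.repair_requested:
            # The VNFM has not asked for a repair yet, so inform it and
            # wait for its acknowledgement before touching the VM.
            self.log.append(f"report {vm} failure to VNFM")
            if not vnfm_acknowledges(vm):
                return False
        self.log.append(f"restart {vm}")
        self.log.append(f"report {vm} up to VNFM")
        self.repair_requested.discard(vm)
        return True


# Path 1: the VNFM requested the repair before the VIM detected the failure.
vim_a = Vim()
vim_a.request_repair("VM2")
repaired_a = vim_a.on_vm_failure("VM2", vnfm_acknowledges=lambda vm: True)

# Path 2: the VIM detects the failure first and the VNFM acknowledges it.
vim_b = Vim()
repaired_b = vim_b.on_vm_failure("VM2", vnfm_acknowledges=lambda vm: True)
print(vim_a.log, vim_b.log)
```

Replacing the callback with a fixed policy corresponds to the configured repair actions mentioned in the key points.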
+
+
+The key points in this scenario are:
+
+* The spare VNFC may be instantiated only after the detection of the failure of the active VNFC.
+* As a result the HA manager's role might be limited to life-cycle management, i.e. no role
+ assignment is needed if the VNFC provides the service as soon as it is started up.
+* Accordingly the HA management could be part of a generic VNFM provided it is capable of detecting
+ failures in/of the VNFC and managing its life-cycle.
+* Also there could be multiple active VNFCs sharing the load of the Provided NF and the spare/standby
+ may protect all of them.
+* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
+ distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
+ VM, hypervisor, and physical host failures. The use cases for these failures are
+ similar with respect to each Provided NF.
+* The VM repair also should start with the fault isolation as appropriate for the actual
+ failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first.
+* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
+* Reporting the service failure to the VNFM is optional as the HA manager is in charge of recovering
+ the service and it is aware of the redundancy needed to do so.
+
+
+
+2.7 Use Case 7: VNFC failure in a stateless VNF with no redundancy
+==================================================================
+
+Use case 7 represents a stateless VNF composed of a single VNFC, i.e. with no redundancy.
+The VNF and in particular its VNFC is managed by the VNFM through managing its life-cycle (Fig 13).
+
+In this use case the VNFC fails. This failure is detected and handled by the VNFM. This use case
+requires that the VNFM can detect the failures in the VNF or that they are reported to the VNFM.
+
+The failure is only detectable at the VNFM level and it is handled by the VNFM restarting the VNFC.
+
+
+.. figure:: images/Slide16.png
+ :alt: VNFC failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 13. VNFC failure in a stateless VNF with no redundancy
+
+
+.. figure:: images/StatelessVNF-VNFCfailureNoRed.png
+ :alt: MSC of the VNFC failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 14. Sequence of events for use case 7
+
+The VNFC is providing the Provided NF when it fails (Fig 14). This failure is detected by or reported to
+the VNFM, which has to clean up the VNFC to isolate the fault. After successful cleanup it can proceed
+with restarting the VNFC, which starts to provide the Provided NF as soon as it is up,
+as there is no state to recover.
+
+Thus the service outage is over, but it has included the entire time needed to restart the VNFC.
+Still, considering that the VNF is stateless, this may not be significant.
+
+
+The key points in this scenario are:
+
+* The VNFM has to have the means to detect VNFC failures and manage its life-cycle appropriately.
+ This is not required if the VNF comes with its availability management, but this is very unlikely
+ for such stateless VNFs.
+* The Provided NF can be resumed as soon as the VNFC is restarted, i.e. the restart time determines
+ the outage.
+* In case multiple VNFCs are used they should not interfere with one another, they should
+ operate independently.
+
+
+2.8 Use Case 8: VM failure in a stateless VNF with no redundancy
+================================================================
+
+Use case 8 represents the same stateless VNF composed of a single VNFC as use case 7, i.e. with
+no redundancy. The VNF and in particular its VNFC is managed by the VNFM through managing its
+life-cycle (Fig 15).
+
+In this use case the VM hosting the VNFC fails. This failure is detected and handled by the VNFM
+as well as by the VIM.
+
+
+.. figure:: images/Slide17.png
+ :alt: VM failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 15. VM failure in a stateless VNF with no redundancy
+
+
+.. figure:: images/StatelessVNF-VMfailureNoRed.png
+ :alt: MSC of the VM failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 16. Sequence of events for use case 8
+
+The VNFC is providing the Provided NF when the VM hosting the VNFC fails (Fig 16).
+
+This failure may be detected by or reported to the VNFM as a failure of the VNFC. The VNFM may
+not be aware at this point that it is a VM failure. Accordingly its first reaction, as in use case 7,
+is to clean up the VNFC to isolate the fault. Since the VM is gone, this cannot succeed and the VNFM
+becomes aware of the VM failure through this or it is reported by the VIM. In either case it has to wait
+with the repair of the VNFC until the VM becomes available again.
+
+Meanwhile the VIM also detects the VM failure and reports it to the VNFM unless the VNFM has already
+requested the VM repair. After the VNFM confirms the VM repair, the VIM restarts the VM and reports
+the successful repair to the VNFM, which in turn can start the VNFC hosted on it.
+
+
+Thus the recovery of the Provided NF includes the restart time of the VM and of the VNFC.
+
+The key points in this scenario are:
+
+* The VNFM has to have the means to detect VNFC failures and manage its life-cycle appropriately.
+ This is not required if the VNF comes with its availability management, but this is very unlikely
+ for such stateless VNFs.
+* The Provided NF can be resumed only after the VNFC is restarted on the repaired VM, i.e. the
+ restart time of the VM and the VNFC determines the outage.
+* In case multiple VNFCs are used they should not interfere with one another, they should
+ operate independently.
+* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
+ distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
+ VM, hypervisor, and physical host failures. The use cases for these failures are
+ similar with respect to each Provided NF.
+* The VM repair also should start with the fault isolation as appropriate for the actual
+ failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first.
+* The repair negotiation between the VNFM and the VIM may be replaced by configured repair actions.
+* VM level redundancy, i.e. running a standby or spare VM in the NFVI would allow faster service
+ recovery for this use case, but by itself it may not protect against VNFC level failures. I.e.
+ VNFC level error detection is still required.
+
+2.9 Use Case 9: Repeated VNFC failure in a stateless VNF with no redundancy
+===========================================================================
+
+Finally use case 9 represents again a stateless VNF composed of a single VNFC as in use case 7, i.e.
+with no redundancy. The VNF and in particular its VNFC is managed by the VNFM through managing its
+life-cycle.
+
+In this use case the VNFC fails repeatedly. This failure is detected and handled by the VNFM,
+but results in no resolution of the fault (Fig 17) because the VNFC is manifesting a fault,
+which is not in its scope. I.e. the fault is propagating to the VNFC from a faulty VM or host,
+for example. Thus the VNFM cannot resolve the problem by itself.
+
+
+.. figure:: images/Slide19.png
+ :alt: Repeated VNFC failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 17. Repeated VNFC failure in a stateless VNF with no redundancy
+
+
+To handle this case the failure handling needs to be escalated to a bigger fault zone
+(or fault domain), i.e. a scope within which the faults may propagate and manifest. In case of the
+VNF the bigger fault zone is the VM and the facilities hosting it, all managed by the VIM.
+
+Thus the VNFM should request the repair from the VIM (Fig 18).
+
+Since the VNFM is only aware of the VM, it needs to report an error on the VM and it is the
+VIM's responsibility to sort out what might be the scope of the actual fault depending on other
+failures and error reports in its scope.
+
+
+.. figure:: images/Slide20.png
+ :alt: Escalation of repeated VNFC failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 18. Escalation of repeated VNFC failure in a stateless VNF with no redundancy
+
+
+.. figure:: images/StatelessVNF-VNFCfailureNoRed-Escalation.png
+ :alt: MSC of the escalation of repeated VNFC failure in a stateless VNF with no redundancy
+ :figclass: align-center
+
+ Fig 19. Sequence of events for use case 9
+
+
+This use case starts similarly to use case 7, i.e. the VNFC is providing the Provided NF when it fails
+(Fig 17).
+This failure is detected by or reported to the VNFM, which cleans up the VNFC to isolate the fault.
+After successful cleanup the VNFM proceeds with restarting the VNFC, which as soon as it is up
+starts to provide the Provided NF again as in use case 7.
+
+However, the VNFC fails repeatedly: N failures occur within a Probation time, for which the VNFM starts
+a timer when it detects the first failure of the VNFC. When the VNFC fails once more while still within the
+probation time, the Escalation counter maximum is exceeded and the VNFM reports an error to the VIM on
+the VM hosting the VNFC, as cleaning up and restarting the VNFC evidently did not solve the problem.
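The probation-time escalation described above can be sketched as follows. This is an illustrative sketch and not part of the committed file: the `EscalationPolicy` class, its parameter names and the returned action strings are hypothetical, and time is passed in explicitly as seconds to keep the example deterministic.

```python
class EscalationPolicy:
    """Hypothetical VNFM-side failure counter with a probation window."""

    def __init__(self, max_failures, probation_time):
        self.max_failures = max_failures      # N in the text above
        self.probation_time = probation_time  # window length in seconds
        self.window_start = None
        self.count = 0

    def on_failure(self, now):
        # The first failure, or a failure after the window has expired,
        # starts a new probation window.
        expired = (
            self.window_start is not None
            and now - self.window_start > self.probation_time
        )
        if self.window_start is None or expired:
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.max_failures:
            # Counter maximum exceeded within the probation time:
            # restarting the VNFC did not help, so escalate.
            self.window_start = None
            self.count = 0
            return "report_vm_error_to_vim"
        return "restart_vnfc"


policy = EscalationPolicy(max_failures=3, probation_time=60)
actions = [policy.on_failure(t) for t in (0, 10, 20, 30)]
print(actions)
```

With N = 3 the first three failures lead to a VNFC restart and the fourth one, still inside the 60 second window, is escalated to the VIM. Failures spread out over more than the probation time would each start a new window and never escalate under this policy.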
+
+When the VIM receives the error report for the VM it has to isolate the fault by cleaning up at least
+the VM. After successful cleanup it can restart the VM and once it is up report the VM repair to the VNFM.
+At this point the VNFM can restart the VNFC, which in turn resumes the Provided NF.
+
+In this scenario the VIM needs to evaluate what may be the scope of the fault to determine what entity
+needs a repair. For example, if it has detected VM failures on that same host, or other VNFMs
+reported errors on VMs hosted on the same host, it should consider that the entire host needs a repair.
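The fault-scope evaluation described above can be sketched as follows. This is an illustrative sketch and not part of the committed file: the `repair_scope` function, the VM-to-host mapping and the rule that a single additional error on the same host is enough to suspect the host are assumptions made for the example.

```python
def repair_scope(reported_vm, vm_to_host, other_error_vms):
    """Decide whether to repair only the VM or the whole host.

    reported_vm: VM on which the VNFM reported an error.
    vm_to_host: mapping from VM name to the host it runs on.
    other_error_vms: VMs with detected failures or error reports
        from other VNFMs.
    """
    host = vm_to_host[reported_vm]
    # Look for other failing VMs that share the host of the reported VM.
    same_host_errors = [
        vm
        for vm in other_error_vms
        if vm != reported_vm and vm_to_host.get(vm) == host
    ]
    if same_host_errors:
        return ("host", host)
    return ("vm", reported_vm)


placement = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB"}
print(repair_scope("vm1", placement, {"vm2"}))
print(repair_scope("vm1", placement, {"vm3"}))
```

A real VIM would likely weigh more evidence, such as hypervisor alarms or the timing of the reports, before fencing an entire host.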
+
+
+The key points in this scenario are:
+
+* The VNFM has to have the means to detect VNFC failures and manage its life-cycle appropriately.
+ This is not required if the VNF comes with its availability management, but this is very unlikely
+ for such stateless VNFs.
+* The VNFM needs to correlate VNFC failures over time to be able to detect failure of a bigger fault zone.
+ One way to do so is through counting the failures within a probation time.
+* The VIM cannot detect all failures caused by faults in the entities under its control. It should be
+ able to receive error reports and correlate these error reports based on the dependencies
+ of the different entities.
+* The VNFM does not know the source of the failure, i.e. the faulty entity.
+* The VM repair should start with the fault isolation as appropriate for the actual
+ failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first.
+
+********************
+3 Concluding remarks
+********************
+
+This use case document outlined the model and some failure modes for NFV systems. These are an
+initial list. The OPNFV HA project team is continuing to grow the list of use cases and will
+issue additional documents going forward. The basic use cases and service availability considerations
+help define the key considerations for each use case taking into account the impact on the end service.
+The use case document along with the requirements documents and gap analysis help set context for
+engagement with various upstream projects.
diff --git a/UseCases/images/Slide10.png b/UseCases/images/Slide10.png
new file mode 100644
index 0000000..b3545e8
--- /dev/null
+++ b/UseCases/images/Slide10.png
Binary files differ
diff --git a/UseCases/images/Slide11.png b/UseCases/images/Slide11.png
new file mode 100644
index 0000000..3aa5f67
--- /dev/null
+++ b/UseCases/images/Slide11.png
Binary files differ
diff --git a/UseCases/images/Slide13.png b/UseCases/images/Slide13.png
new file mode 100644
index 0000000..207c4a7
--- /dev/null
+++ b/UseCases/images/Slide13.png
Binary files differ
diff --git a/UseCases/images/Slide14.png b/UseCases/images/Slide14.png
new file mode 100644
index 0000000..e6083c9
--- /dev/null
+++ b/UseCases/images/Slide14.png
Binary files differ
diff --git a/UseCases/images/Slide16.png b/UseCases/images/Slide16.png
new file mode 100644
index 0000000..484ffa2
--- /dev/null
+++ b/UseCases/images/Slide16.png
Binary files differ
diff --git a/UseCases/images/Slide17.png b/UseCases/images/Slide17.png
new file mode 100644
index 0000000..7240aaa
--- /dev/null
+++ b/UseCases/images/Slide17.png
Binary files differ
diff --git a/UseCases/images/Slide19.png b/UseCases/images/Slide19.png
new file mode 100644
index 0000000..7e3c10b
--- /dev/null
+++ b/UseCases/images/Slide19.png
Binary files differ
diff --git a/UseCases/images/Slide20.png b/UseCases/images/Slide20.png
new file mode 100644
index 0000000..2e9759b
--- /dev/null
+++ b/UseCases/images/Slide20.png
Binary files differ
diff --git a/UseCases/images/Slide4.png b/UseCases/images/Slide4.png
new file mode 100644
index 0000000..a701f42
--- /dev/null
+++ b/UseCases/images/Slide4.png
Binary files differ
diff --git a/UseCases/images/Slide6.png b/UseCases/images/Slide6.png
new file mode 100644
index 0000000..04a904f
--- /dev/null
+++ b/UseCases/images/Slide6.png
Binary files differ
diff --git a/UseCases/images/StatefullVNF-VMfailure.png b/UseCases/images/StatefullVNF-VMfailure.png
new file mode 100644
index 0000000..2f62232
--- /dev/null
+++ b/UseCases/images/StatefullVNF-VMfailure.png
Binary files differ
diff --git a/UseCases/images/StatefullVNF-VMfailureNoRed.png b/UseCases/images/StatefullVNF-VMfailureNoRed.png
new file mode 100644
index 0000000..6f3058d
--- /dev/null
+++ b/UseCases/images/StatefullVNF-VMfailureNoRed.png
Binary files differ
diff --git a/UseCases/images/StatefullVNF-VNFCfailure.png b/UseCases/images/StatefullVNF-VNFCfailure.png
new file mode 100644
index 0000000..9021f2d
--- /dev/null
+++ b/UseCases/images/StatefullVNF-VNFCfailure.png
Binary files differ
diff --git a/UseCases/images/StatefullVNF-VNFCfailureNoRed.png b/UseCases/images/StatefullVNF-VNFCfailureNoRed.png
new file mode 100644
index 0000000..4fd7e2e
--- /dev/null
+++ b/UseCases/images/StatefullVNF-VNFCfailureNoRed.png
Binary files differ
diff --git a/UseCases/images/StatelessVNF-VMfailure.png b/UseCases/images/StatelessVNF-VMfailure.png
new file mode 100644
index 0000000..9b94183
--- /dev/null
+++ b/UseCases/images/StatelessVNF-VMfailure.png
Binary files differ
diff --git a/UseCases/images/StatelessVNF-VMfailureNoRed.png b/UseCases/images/StatelessVNF-VMfailureNoRed.png
new file mode 100644
index 0000000..2a14b67
--- /dev/null
+++ b/UseCases/images/StatelessVNF-VMfailureNoRed.png
Binary files differ
diff --git a/UseCases/images/StatelessVNF-VNFCfailure.png b/UseCases/images/StatelessVNF-VNFCfailure.png
new file mode 100644
index 0000000..f2dcc3b
--- /dev/null
+++ b/UseCases/images/StatelessVNF-VNFCfailure.png
Binary files differ
diff --git a/UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png b/UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png
new file mode 100644
index 0000000..6719177
--- /dev/null
+++ b/UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png
Binary files differ
diff --git a/UseCases/images/StatelessVNF-VNFCfailureNoRed.png b/UseCases/images/StatelessVNF-VNFCfailureNoRed.png
new file mode 100644
index 0000000..a0970fc
--- /dev/null
+++ b/UseCases/images/StatelessVNF-VNFCfailureNoRed.png
Binary files differ
diff --git a/docs/etc/conf.py b/docs/etc/conf.py
new file mode 100644
index 0000000..0066035
--- /dev/null
+++ b/docs/etc/conf.py
@@ -0,0 +1,34 @@
+import datetime
+import sys
+import os
+
+try:
+ __import__('imp').find_module('sphinx.ext.numfig')
+ extensions = ['sphinx.ext.numfig']
+except ImportError:
+ # 'pip install sphinx_numfig'
+ extensions = ['sphinx_numfig']
+
+# numfig:
+number_figures = True
+figure_caption_prefix = "Fig."
+
+source_suffix = '.rst'
+master_doc = 'index'
+pygments_style = 'sphinx'
+html_use_index = False
+
+pdf_documents = [('index', u'OPNFV', u'OPNFV Project', u'OPNFV')]
+pdf_fit_mode = "shrink"
+pdf_stylesheets = ['sphinx','kerning','a4']
+#latex_domain_indices = False
+#latex_use_modindex = False
+
+latex_elements = {
+ 'printindex': '',
+}
+
+project = u'OPNFV: Template documentation config'
+copyright = u'%s, OPNFV' % datetime.date.today().year
+version = u'1.0.0'
+release = u'1.0.0'
diff --git a/docs/etc/opnfv-logo.png b/docs/etc/opnfv-logo.png
new file mode 100644
index 0000000..1519503
--- /dev/null
+++ b/docs/etc/opnfv-logo.png
Binary files differ
diff --git a/docs/how-to-use-docs/documentation-example.rst b/docs/how-to-use-docs/documentation-example.rst
new file mode 100644
index 0000000..afcf758
--- /dev/null
+++ b/docs/how-to-use-docs/documentation-example.rst
@@ -0,0 +1,86 @@
+.. two dots create a comment. please leave this logo at the top of each of your rst files.
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+.. these two pipes are to separate the logo from the first title
+|
+|
+How to create documentation for your OPNFV project
+==================================================
+
+This is the directory structure of the docs/ directory that can be found in the root of your project directory:
+
+.. code-block:: bash
+
+ ./etc
+ ./etc/opnfv-logo.png
+ ./etc/conf.py
+ ./how-to-use-docs
+ ./how-to-use-docs/documentation-example.rst
+ ./how-to-use-docs/index.rst
+
+To create your own documentation, create any number of directories (depending on your need) and place in each of them an index.rst.
+This index file must reference your other rst files.
+
+* Here is an example index.rst
+
+.. code-block:: bash
+
+ Example Documentation table of contents
+ =======================================
+
+ Contents:
+
+ .. toctree::
+ :numbered:
+ :maxdepth: 4
+
+ documentation-example.rst
+
+ Indices and tables
+ ==================
+
+ * :ref:`search`
+
+ Revision: _sha1_
+
+ Build date: |today|
+
+
+The Sphinx Build
+================
+
+When you push documentation changes to gerrit a jenkins job will create html documentation.
+
+* Verify Jobs
+For verify jobs a link to the documentation will show up as a comment in gerrit for you to see the result.
+
+* Merge jobs
+
+Once you are happy with the look of your documentation you can submit the patchset; the merge job will
+copy the output of each documentation directory to http://artifacts.opnfv.org/$project/docs/$name_of_your_folder/index.html
+
+Here are some quick examples of how to use rst markup
+
+This is a headline::
+
+ here is some code, note that it is indented
+
+Links are easy to add: here is a link to Sphinx, the tool that we are using to generate documentation http://sphinx-doc.org/
+
+* Bulleted Items
+
+ **this will be bold**
+
+.. code-block:: bash
+
+ echo "Here is a code block with bash syntax highlighting"
+
+
+Leave these at the bottom of each of your documents; they are used internally
+
+Revision: _sha1_
+
+Build date: |today|
diff --git a/docs/how-to-use-docs/index.rst b/docs/how-to-use-docs/index.rst
new file mode 100644
index 0000000..36710b3
--- /dev/null
+++ b/docs/how-to-use-docs/index.rst
@@ -0,0 +1,30 @@
+.. OPNFV Release Engineering documentation, created by
+ sphinx-quickstart on Tue Jun 9 19:12:31 2015.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+
+.. image:: ../etc/opnfv-logo.png
+ :height: 40
+ :width: 200
+ :alt: OPNFV
+ :align: left
+
+Example Documentation table of contents
+=======================================
+
+Contents:
+
+.. toctree::
+ :numbered:
+ :maxdepth: 4
+
+ documentation-example.rst
+
+Indices and tables
+==================
+
+* :ref:`search`
+
+Revision: _sha1_
+
+Build date: |today|