47 files changed, 0 insertions, 5264 deletions
diff --git a/Requirement/HA Requirement.pdf b/Requirement/HA Requirement.pdf
Binary files differ
deleted file mode 100644
index 7fd3c61..0000000
--- a/Requirement/HA Requirement.pdf
+++ /dev/null
diff --git a/Requirement/HA Requirement.rst b/Requirement/HA Requirement.rst
deleted file mode 100644
index ee28c39..0000000
--- a/Requirement/HA Requirement.rst
+++ /dev/null
@@ -1,1153 +0,0 @@
-.. image:: opnfv-logo.png
- :height: 40
- :width: 200
- :alt: OPNFV
- :align: left
-
-
-===================================================
-1 Overall Principle for High Availability in NFV
-===================================================
-
-The ultimate goal for the High Availability schema is to provide high
-availability to the upper layer services.
-
-High availability is provided by the following steps once a failure happens:
-
- Step 1: failover of services once failure happens and service is out of work
-
- Step 2: Recovery of failed parts in each layer.
-
-******************************************
-1.1 Framework for High Availability in NFV
-******************************************
-
-Framework for Carrier Grade High availability:
-
-A layered approach to availability is required for the following reasons:
-
-* fault isolation
-* fault tolerance
-* fault recovery
-
-Among the OPNFV projects the OPNFV-HA project's focus is on requirements related
-to service high availability. This is complemented by other projects such as the
-OPNFV - Doctor project, whose focus is reporting and management of faults along
-with maintenance, the OPNFV-Escalator project that considers the upgrade of the
-NFVI and VIM, or the OPNFV-Multisite that adds geographical redundancy to the
-picture.
-
-A layered approach allows the definition of failure domains (e.g., the
-networking hardware, the distributed storage system, etc.). If possible, a fault
-shall be handled at the layer (failure domain) where it occurs.
If a failure
-cannot be handled at its corresponding layer, the next higher layer needs to be
-able to handle it. In no case, shall a failure cause cascading failures at other
-layers.
-
-The layers are:
-
-
-+---------------------------+-------------------------------------+
-+ Service                   + End customer visible service        |
-+===========================+=====================================+
-+ Application               + VNF's, VNFC's                       |
-+---------------------------+-------------------------------------+
-+ NFVI/VIM                  + Infrastructure, VIM, VNFM, VM       |
-+---------------------------+-------------------------------------+
-+ Hardware                  + Servers, COTS platforms             |
-+---------------------------+-------------------------------------+
-
-The following document describes the various layers and how they need to
-address high availability.
-
-**************
-1.2 Definitons
-**************
-
-Reference from the ETSI NFV doc.
-
-**Availability:** Availability of an item to be in a state to perform a required
-function at a given instant of time or at any instant of time within a given
-time interval, assuming that the external resources, if required, are provided.
-
-**Accessibility:** It is the ability of a service to access (physical) resources
-necessary to provide that service. If the target service satisfies the minimum
-level of accessibility, it is possible to provide this service to end users.
-
-**Admission control:** It is the administrative decision (e.g. by operator's
-policy) to actually provide a service. In order to provide a more stable and
-reliable service, admission control may require better performance and/or
-additional resources than the minimum requirement. Failure: deviation of the
-delivered service from fulfilling the system function.
-
-**Fault:** adjudged or hypothesized cause of an error
-
-**Service availability:** service availability of <Service X> is the long-term
-average of the ratio of aggregate time between interruptions to scheduled
-service time of <ServiceX> (expressed as a percentage) on a user-to-user basis.
-The time between interruptions is categorized as Available (Up time) using the
-availability criteria as defined by the parameter thresholds that are relevant
-for <Service X>.
-
-Accoring to the ETSI GS NFV-REL 001 V1.1.1 (2015-01) document service
-availability in the context of NFV is defined as End-to-End Service availability
-
-.. (MT) The relevant parts in NFV-REL defines SA as:
-
-Service Availability refers to the End-to-End Service Availability which
-includes all the elements in the end-to-end service (VNFs and infrastructure
-components) with the exception of the customer terminal. This is a customer
-facing (end user) availability definition and it is the result of accessibility
-and #admission control (see their respective definitions above).
-
-Service Availability=total service available time/
- (total service available time + total restoration time)
-
-**Service continuity:** Continuous delivery of service in conformance with
-service's functional and behavioural specification and SLA requirements,
-both in the control and data planes, for any initiated transaction or session
-until its full completion even in the events of intervening exceptions or
-anomalies, whether scheduled or unscheduled, malicious, intentional
-or unintentional.
-
-The relevant parts in NFV-REL:
-The basic property of service continuity is that the same service is provided
-during VNF scaling in/out operations, or when the VNF offering that service
-needs to be relocated to another site due to an anomaly event
-(e.g. CPU overload, hardware failure or security threat).
-
-**Service failover:** when the instance providing a service/VNF becomes
-unavailable due to fault or failure, another instance will (automatically) take
-over the service, and this whole process is transparent to the user. It is
-possible that an entire VNF instance becomes unavailble while providing its
-service.
-
-.. (MT) I think the service or an instance of it is a logical entity on its own and the service availability and continuity is with respect to this logical entity. For examlpe if a HTTP server serves a given URL, the HTTP server is the provider while that URL is the service it is providing. As long as I have an HTTP server running and serving this URL I have the service available. But no matter how many HTTP servers I'm running if they are not assigned to serve the URL, then it is not available. Unfortunately in the ETSI NFV documents there's not a clear distinction between the service and the provider logical entities. The distinction is more on the level of the different incarnations of the provider entity, i.e. VNF and its instances or VNFC and its instances. I don't know if I'm clear enough and to what extent we should go into this, but I tried to modify the definition along these lines. Now regarding the user perception and whether it's automatic I agreed that we want it automatic and seemless for the user, but I don't think that this is part of the failover definition. If it's done manually or if the user detects it it's still a failover. It's just not seemless. Requiring it being automatic and seemless should be in the requirement section as appropriate.
-
-.. (fq) Agree.
-
-**Service failover time:** Service failover is when the instance providing a
-service becomes unavailable due to a fault or a failure and another healthy
-instance takes over in providing the service. In the HA context this should be
-an automatic action and this whole process should be transparent to the user.
-It is possible that an entire VNF instance becomes unavailble while providing
-its service.
-
-.. (MT) Aligned with the above I would say that the serice failover time is the time from the moment of detecting the failure of the instance providing the service until the service is provided again by a new instance.
-
-.. (fq) So in such definition, the time duration for the failure of the service=failure detection time+service failover time. Am I correct?
-
-.. (bb) I feel, it is; "time duration for failover of the service = failure detection time + service failover time".
-.. (MT) I would say that the "failure detection time" + "service failover time" = "service outage time" or actually we defined it below as the "service recovery time" . To reduce the outage we probably can't do much with the "service failover time", it is whatever is needed to perform the failover procedure, so it's tied to the implementation. It's somewhat "given". We may have more control over the detection time as that depends on the frequency of the health-check/heartbeat as this is often configurable.
-
-.. (fq) Got it. Agree.
-
-**Failure detection:** If a failure is detected, the failure must be identified
-to the component responsible for correction.
-
-.. (MT) I would rather say "failure detection" as the fault is not detectable until it becomes a failure, even then we may not know where the actual fault is. We only know what failed due to the fault. E.g. we can detect the memory leak, something may crash due to it, but it's much more difficult to figure out where the fault is, i.e. the bug in the software.
-
-.. (MT) Also I think failures may be detected by different entities in the system, e.g. it could be a monitoring entity, a watchdog, the hypervisor, the VNF itself or a VNF tryng to use the services of a failed VNF. For me all these are failure detections regardless whether they are reported to the VNF.
I think from an HA perspective what's important is the error report API(s) that entities should use if they detect a failure they are not in charge of correcting.
-.. (fq) Agree. I modify the definition.
-
-**Failure detection time:** Failure detection time is the time interval from the
-moment the failure occurs till it is reported as a detected failure.
-
-**Alarm:** Alarms are notifications (not queried) that are activated in response
-to an event, a set of conditions, or the state of an inventory object. They
-also require attention from an entity external to the reporting entity (if not
-then the entity should cope with it and not raise the alarm).
-
-.. (MT) According to NFV-INF 004: Alarms are notifications (not queried) that are activated in response to an event, a set of conditions, or the state of an inventory object. I would add also that they also require attention from an entity external to the reporting entity (if not then the entity should cope with it and not raise the alarm).
-
-**Alarm threshold condition detection:** Alarm threshold condition is detected
-by the component responsible for it. The component periodically evaluates the
-condition associated with the alarm and if the threshold is reached, it
-generates an alarm on the approprite channel, which in turn delivers it to the
-entity(ies) responsible, such as the VIM.
-
-.. (fq) I don't think the VNF need to know all the alarm. so I use VIM as the terminal point for the alarm detection
-
-.. (MT) The same way as for the faults/failures, I don't think it's the receiving end that is important but the generatitng end and that it has the right and appropriate channel to communicate the alarm. But I have the impression that you are focusing on a particular type of alarm (i.e. threshold alarm) and not alarms in general.
-
-.. (fq) Yes, I actully have the threshold alarm in my mind when I wrote this. So I think VIM might be the right receiving end for these alarm.
I agree with your ideas about the right channel. I am just not sure whether we should put this part in a high lever perspective or we should define some details. After all OPNFV is an opensource project and we don't want it to be like standarization projects in ETSI. But I agree for the definition part we should have a high level and abstract definition for these, and then we can specify the detail channels in the API definition.
-
-.. (MT) I tried to modify accordingly. Pls check. I think when it comes to the receiver we don't need to be specific from the detection perspective as usually there is a well-known notification channel that the management entity if it exists would listen to. The alarm detection does not require this entity, it just states that something is wrong and someone should deal with it hence the alarm.
-
-**Alarm threshold detection time:** the threshold time interval between the
-metrics exceeding the threshold and the alarm been detected.
-
-.. (MT) I assume you are focusing on these threshold alarms, and not alarms in general.
-.. (MT) Here similar to the failover time, we may have some control over the detection time (i.e. shorten the evaluation period), but may not on the delivery time.
-.. (MT2) I changed "condition" to "threshold" to make it clearer as failure is a "condition" too :-)
-
-**Service recovery:** The restoration of the service state after the instance of
-a service/VNF is unavailable due to fault or failure or manual interuption.
-
-.. (MT) I think the service recovery is the restoration of the state in which the required function is provided
-
-**Service recovery time:** Service recovery time is the time interval from the
-occurrence of an abnormal event (e.g. failure, manual interruption of service,
-etc.) until recovery of the service.
-
-.. (MT) in NFV-REL: Service recovery time is the time interval from the occurrence of an abnormal event (e.g. failure, manual interruption of service, etc.) until recovery of the service.
-
-**SAL:** Service Availability Level
-
-************************
-1.3 Overall requirements
-************************
-
-Service availability shall be considered with respect to the delivery of end to
-end services.
-
-* There should be no single point of failure in the NFV framework
-* All resiliency mechanisms shall be designed for a multi-vendor environment,
-  where for example the NFVI, NFV-MANO, and VNFs may be supplied by different
-  vendors.
-* Resiliency related information shall always be explicitly specified and
-  communicated using the reference interfaces (including policies/templates) of
-  the NFV framework.
-
-*********************
-1.4 Time requirements
-*********************
-
-The time requirements below are examples in order to break out of the failure
-detection times considering the service recovery times presented as examples for
-the different service availability levels in the ETSI GS NFV-REL 001 V1.1.1
-(2015-01) document.
-
-The table below maps failure modes to example failure detection times.
-
-+------------------------------------------------------------+---------------+
-|Failure Mode                                                | Time          |
-+============================================================+===============+
-|Failure detection of HW                                     | <1s           |
-+------------------------------------------------------------+---------------+
-|Failure detection of virtual resource                       | <1s           |
-+------------------------------------------------------------+---------------+
-|Alarm threshold detection                                   | <1min         |
-+------------------------------------------------------------+---------------+
-|Failure detection over of SAL 1                             | <1s           |
-+------------------------------------------------------------+---------------+
-|Recovery of SAL 1                                           | 5-6s          |
-+------------------------------------------------------------+---------------+
-|Failure detectionover of SAL 2                              | <5s           |
-+------------------------------------------------------------+---------------+
-|Recovery of SAL 2                                           | 10-15s        |
-+------------------------------------------------------------+---------------+
-|Failure detectionover of SAL 3                              | <10s          |
-+------------------------------------------------------------+---------------+
-|Recovery of SAL 3                                           | 20-25s        |
-+------------------------------------------------------------+---------------+
-
-
-===============
-2 Hardware HA
-===============
-
-The hardware HA can be solved by several legacy HA schemes. However, when
-considering the NFV scenarios, a hardware failure will cause collateral damage to
-not only to the services but also virtual infrastructure running on it.
-
-A redundant architecture and automatic failover for the hardware are required
-for the NFV scenario. At the same time, the fault detection and report of HW
-failure from the hardware to VIM, VNFM and if necessary the Orchestrator to achieve HA in OPNFV. A
-sample fault table can be found in the Doctor project. (https://wiki.opnfv.org/doctor/faults)
-All the critical hardware failures should be reported to the VIM within 1s.
-
-..
(MT2) Should we keep the 50ms here? Other places have been modified to <1sec, e.g. for SAL 1.
-
-.. (fq2) agree with 1s
-
-Other warnings for the hardware should also be reported to the VIM in a
-timely manner.
-
-*****************************
-2.1 General Requirements
-*****************************
-
-.. (MT) Are these general requirements or just for the servers?
-
-.. (fq) I think these should be the general requirements. not just the server.
-
-* Hardware Failures should be reported to the hypervisor and the VIM.
-* Hardware Failures should not be directly reported to the VNF as in the traditional ATCA
-  architecture.
-* Hardware failure detection message should be sent to the VIM within a specified period of time,
-  based on the SAL as defined in Section 1.
-* Alarm thresholds should be detected and the alarm delivered to the VIM within 1min. A certain
-  threshold can be set for such notification.
-* Direct notification from the hardware to some specific VNF should be possible.
-  Such notification should be within 1s.
-* Periodical update of hardware running conditions (operational state?) to the
-  NFVI and VIM is required for further operation, which may include fault
-  prediction, failure analysis, and etc.. Such info should be updated every 60s
-* Transparent failover is required once the failure of storage and network
-  hardware happens.
-* Hardware should support SNMP and IPMI for centralized management, monitoring and
-  control.
-
-.. (MT) I would assume that this is OK if no guest was impacted, if there was a guest impact I think the VIM etc should know about the issue; in any case logging the failure and its correction would be still important
-.. (fq) It seems the hardware failure detection message should send to VIM, shall we delete the hypervisor part?
-.. (MT) The reason I asked the question whether this is about the servers was the hypervisor. I agree to remove this from the genaral requirement.
-..
(Yifei) Shall we take VIM user (VNFM & NFVO) into consideration? As some of the messages should be send to VIM user.
-.. (fq) yifei, I am a little bit confused, do you mean the Hardware send messages directly to VIM user? I myself think this may not be possible?
-.. (Yifei) Yes, ur right, they should be sent to VIM first.
-.. (MT) I agree, they should be sent to the VIM, the hypervisor can only be conditional because it may not be relevant as in a general requirement or may be dead with the HW.
-.. (fq) Agree. I have delete the hypervisor part so that it is not a general requirement.
-.. may require realtime features in openstack
-
-.. (fq) We may need some discussion about the time constraints? including failure detection time, VNF failover time, warning for abnormal situations. A table might be needed to clearify these. Different level of VNF may require differnent failover time.
-
-.. (MT) I agree. A VNF that manages its own availability with "built-in" redundancy wouldn't really care whether it's 1s or 1min because it would detect the failure and do the failover at the VNF level. But if the availability is managed by the VIM and VNFM then this time becomes critical.
-
-.. (joe) VIM can only rescue or migrate the VM onto anther host in case of hardware failure. The VNF should have being rescalready finish the failover before the failed/fault VM ued or migrated. VIM's responisbility is to keep the number of alive VM instances required by VNF, even for auto scaling, but not to replacethe VNF failover.That's why hardware failure dection message for VIM is not so time sensitive, because VM creation is often a slow task compared to failover(Althoug a lot of technology to accelerate the VM generation speed or use spare VM pool ).
-
-.. (fq) Yes. But here we just mean failure detection, not rescue or migration of the VM.
I mean the hardware and NFVI failure should be reported to the VIM and the VNF in a timely manner, then the VNF can do the failover, and the VIM can do the migration and rescue afterwards.
-
-.. (bb) There is confusion regarding time span within which hardware failure should be reported to VIM. In 2nd paragraph(of Hardware HA), it has been mentioned as; "within 50ms" and in this point it is "1s".
-
-.. (fq) I try to modify the 50ms to 1s.
-
-.. (chayi) hard for openstack
-
-.. VNF failover time < 1s
-
-.. (MT) Indeed, it's not designed for that
-
-.. (MT) Do the "hardware failure detection message" and the "alarm of hardware failure" refer to the same notification? It may be better to speak about hardware failure detection (and reporting) time.
-
-.. (fq) I have made the modification. see if it makes sense to you now.
-
-.. (MT) Based on the definition section I think you are talking about these threshold alarms only, because a failure is also an abnormal situation, but you want to detect it within a second
-
-.. (fq) Actually, I want to define Alarm as messages that might lead to failure in the near future, for example, a high tempreture, or maybe a prediction of failure. These alarm maybe important, but they do not need to be answered and solved within seconds.
-
-.. Alarms for abnormal situations and performance decrease (i.e. overuse of cpu)
-.. should be raised to the VIM within 1min(?).
-
-
-.. (MT) There should be possible to set some threshold at which the notification should be triggered and probably ceilometer is not reliable enough to deliver such notifications since it has no real-time requirement nor it is expected to be lossless.
-
-.. (fq) modification made.
-
-.. (MT) agree with the realtime extension part :-)
-
-.. (MT) Considering the modified definitions can we say that: Alarm conditions should be detected and the alarm delivered to the VIM within 1min?
-
-.. This effectively result in two requirements: one on the detection and one on the
-..
delivery mechanism.
-
-.. (fq) Agree. I have made the modification.
-
-
-
-.. In the meantime, I see the discussion of
-.. this requirement is still open.
-
-.. (Yifei) As before I do not think it is needed to send HW fault/failure to VNF. For it is different from traditional interated NF, all the lifecycle of VNF is managed by VNFM.
-
-.. (joe) the HW fault/failure to VNF is required directly for VNF failover purpose. For example, memory or nic failure should be noticed by VNF ASAP, so that the service can be taken over and handled correctly by another VNF instance.
-
-.. (YY) In what case HW failure to VNF directly?Next is my understanding,may be not correct. If cpu/memory fails hostOS may be crashed at the same time the failure occured then no notification could be send to anywhere. If it is not crashed in some well managed smp OS, and if we use cpu-pinning to VM, the vm guestOS may be crashed. If cpu-pinning is not applied to VM, the hypervisor can continue scheduling the VMs on the server just like over-allocation mode. Another point, to accelerate the failover, the failure should be sent to standby service entity not the failed one. The standby vm should not be in same server because of anti-affinity scheme. How can "direct notice" apply?
-
-.. (joe) not all HW fault leads to the VNF will be crushed. For example, the nic can not send packet as usual, then it'll affect the service, but the VNF is still running.
-
-
-.. Maybe 10 min is too long. As far as I know, Zabbix which is used by Doctor can
-.. achieve 60s.
-
-.. (fq) change the constraint to 60s
-
-.. (MT2) I think this applies primarily to storage, network hardware and maybe some controllers, which also run in some type of redundancy e.g. active/active or active/standby. For compute, we need redundancy, but it's more of the spare concept to replace any failed compute in the cluster (e.g. N+1).
In this context the failover doesn't mean the recovery of a state, it only means replacing the failed HW with a healthy one in the initial state and that's not transparent at the HW level at least, i.e. the host is not brought up with the same identiy as the failed one.
-
-.. (fq) agree. I have made some modification. I wonder what controller do you mean? is it SDN controller?
-
-.. (MT3) Yes, SDN, storage controllers. I don't know if any of the OpenStack controllers would also have such requirement, e.g. Ironic
-
-
-
-.. (MT) Is it expected for _all_ hardware?
-
-.. (YY) As general requirement should we add that the hardware should allow for
-.. centralized management and control? Maybe we could be even more specific
-.. e.g. what protocol should be supported.
-
-.. (fq) I agree. as far as I know, the protocol we use for hardware include SNMP and IPMI.
-
-.. (MT) OK, we can start with those as minimum requirement, i.e. HW should support at least them. Also I think the Ironic project in OpenStack manages the HW and also supports these. I was thinking maybe it could also be used for the HW management although that's not the general goal of Ironic as far as I know.
-
-*********************************
-2.2 Network plane Requirements
-*********************************
-
-* The hardware should provide a redundant architecture for the network plane.
-* Failures of the network plane should be reported to the VIM within 1s.
-* QoS should be used to protect against link congestion.
-
-.. (MT) Do you mean the failure of the entire network plane?
-.. (fq) no, I mean the failure of the network connection of a certain HW, or a VNF.
-
-**************************
-2.3 Power supply system
-**************************
-
-* The power supply architecture should be redundant at the server and site level.
-* Fault of the power supply system should be reported to the VIM within 1s.
-* Failure of a power supply will trigure automatic failover to the redundant supply.
-
-*********************
-2.4 Cooling system
-*********************
-
-* The architecture of the cooling system should be redundant.
-* Fault of the cooling system should be reported to the VIM within 1s
-* Failure of the cooling systme will trigger automatic failover of the system
-
-***************
-2.5 Disk Array
-***************
-
-* The architecture for the disk array should be redundant.
-* Fault of the disk array should be reported to the VIM within 1s
-* Failure of the the disk array will trigger automatic failover of the system
-  support for protected cache after an unexpected power loss.
-
-* Data shall be stored redundantly in the storage backend
-  (e.g., by means of RAID across disks.)
-* Upon failures of storage hardware components (e.g., disks services, storage
-  nodes) automatic repair mechanisms (re-build/re-balance of data) shall be
-  triggered automatically.
-* Centralized storage arrays shall consist of redundant hardware
-
-*************
-2.6 Servers
-*************
-
-* Support precise timing with accuracy higher than 4.6ppm
-
-.. (MT2) Should we have time synchronization requirements in the other parts? I.e. having NTP in control nodes or even in all hosts
-
-
-====================================================
-3 Virtualization Facilities (Host OS, Hypervisor)
-====================================================
-
-**********************************************************
-3.1 Requirements on Host OS and Hypervisor and Storage
-**********************************************************
-
-Requirements:
-==============
-
-- The hypervisor should support distributed HA mechanism
-- Hypervisor should detect the failure of the VM. Failure of the VM should be reported to
-  the VIM within 1s
-- The hypervisor should report (and if possible log) its failure and recovery action.
-  and the destination to whom they are reported should be configurable.
-- The hypervisor should support VM migration
-- The hypervisor should provide isolation for VMs, so that VMs running on the same
-  hardware do not impact each other.
-- The host OS should provide sufficient process isolation so that VMs running on
-  the same hardware do not impact each other.
-- The hypervisor should record the VM information regularly and provide logs of
-  VM actions for future diagnoses.
-- The NFVI should maintain the number of VMs provided to the VNF in the face of failures.
-  I.e. the failed VM instances should be replaced by new VM instances
-
-************************************
-3.2 Requirements on Middlewares
-************************************
-
-Requirements:
-==============
-
-- It should be possible to detect and automatically recover from hypervisor failures
-  without the involvement of the VIM
-- Failure of the hypervisor should be reported to the VIM within 1s
-- Notifications about the state of the (distributed) storage backends shall be send to the
-  VIM (in-synch/healthy, re-balancing/re-building, degraded).
-- Process of VIM runing on the compute node should be monitored, and failure of it should
-  be notified to the VIM within 1s
-- Fault detection and reporting capability. There should be middlewares supporting in-band
-  reporting of HW failure to VIM.
-- Storage data path traffic shall be redundant and fail over within 1 second on link
-  failures.
-- Large deployments using distributed software-based storage shall separate storage and
-  compute nodes (non-hyperconverged deployment).
-- Distributed software-based storage services shall be deployed redundantly.
-- Data shall be stored redundantly in distributed storage backends.
-- Upon failures of storage services, automatic repair mechanisms (re-build/re-balance of
-  data) shall be triggered automatically.
-- The storage backend shall support geo-redundancy.
-
-=============================================
-4 Virtual Infrastructure HA - Requirements
-=============================================
-
-This section is written with the goal to ensure that there is alignment with
-Section 4.2 of the ETSI/NFV REL-001 document.
-
-Key reference requirements from ETSI/NFV document:
-===================================================
-
-[Req.4.2.12] On the NFVI level, there should be a transparent fail-over in the
-case of for example compute, memory,storage or connectivity failures.
-
-.. (fq) According to VNF part, the following bullet may be added:
-
-* The virtual infrastructure should provide classified virtual resource for
-  different SAL VNFs. Each class of the resources should have guaranteed
-  performance metrics.
-
-* Specific HA handling schemes for each classified virtual resource,
-  e.g. recovery mechanisms, recovery priorities, migration options,
-  should be defined.
-
-* The NFVI should maintain the number of VMs provided to the VNF in the face of
-  failures. I.e. the failed VM instances should be replaced by new VM instances.
-
-.. (MT) this might be a requirement on the hypervisor and/or the
-.. VIM. In this respect I wonder where the nova agent running on the compute node
-.. belongs. Is it the VIM already or the Virtualization Facilities? The reason I'm
-.. asking is that together with the hypervisor they are in a unique position of
-.. correlating different failures on the host that may be due to HW, OS or
-.. hypervisor.
-
-.. (fq) I agree this might be for the hypervisor part. The VNF (i.e.
-.. between VNFCs) may have its own fault detection mechanism, which might be
-.. triggered prior to receiving the error report from the underlying NFVI therefore
-.. the NFVI/VIM should not attempt to preserve the state of a failing VM if not
-.. configured to do so
-
-**************
-4.1 Compute
-**************
-
-VM including CPU, memory and ephemeral disk
-
-.. (Yifei) Including noca-compute fq) What do you mean?
Yifei) I mean nova-
-.. (compute is important enough for us to define some requirement about it.
-.. (IJ)(Nova-compute is important, but implementation specific, this should be
-.. requirements focused.
-
-Requirements:
-==============
-
-* Detection of failures must be sub 1 second.
-* Recovery of a failed VM (VNF) must be automatic. The recovery must re-launch
-  the VM based on the required initial state defined in the VNFD.
-
-.. (MT) I think this is the same essentially as the one brought over from the VNF part in the paragraph above, where I have the question also.
-.. (Yifei) Different mechanisms should be defined according to the SLA of the service running on the VM.
-.. (fq) What do you mean by failure detection? Do you mean hypervisor notice the failure and perform automatic recovery? or do you mean hypervisor notice the failure and inform VIM?
-.. (fq) How to define the time limit for the failure detection? whether 1s is sufficient enough, or we should require for sometime less?
-
-.. Requirements do have some dependency on the NFVI interface definitions that are
-.. currently being defined by ETSI/NFV working groups. Ongoing alignment will
-.. be required.
-
-* On evacuation, fencing of instances from an unreachable host is required.
-
-.. orginal wording for above: Fencing instances of an unreachable host when evacuation happens.[GAP 10]
-
-.. (YY) If a host is unreachable how to evacuate VMs on it? Fencing function may be moved toVIM part.
-.. (fq) copy from the Gap 10:
-
-.. Safe VM evacuation has to be preceded by fencing (isolate, shut down) the failed
-.. host. Failing to do so - when the perceived disconnection is due to some
-.. transient or partial failure - the evacuation might lead into two identical
-.. instances running together and having a dangerous conflict.
-
-.. (unknown commenter) I agree it should be move to VIM part.
-.. (IJ) Not clear what or if the above comment has been moved.
-
-..
(Yifei) In OpenStack, evacuate means that "VMs whose storage is accessible from other nodes (e.g. shared storage) could be rebuilt and restarted on a target node", it is different from migration. link: https://wiki.openstack.org/wiki/Evacuate - -* Resources of a migrated VM must be evacuated once the VM is - migrated to a different compute node, placement policies must be preserved. - For example during maintenance activities. - -.. (MT) Do you mean maintenance of the compute node? In any case I think the evacuation should follow the palcement policy. -.. (fq) Yes. What placement policy do you mean? -.. (Yifei) e.g. keep the same scheduler hints as before, am I right ,@Maria? -.. (MT) Yes, the affinity, anti-affinity, etc -.. (fq) Got it. I am adding a requirement that the evacuation should follow the placement policy. -.. (fq) insert below. - -* Failure detection of the VNF software process is required - in order to detect the failure of the VNF sufficiently. Detection should be - within less than 1 second. - -.. ( may require interface extension) - -.. (MT) What do youy mean by the VNF software process? Is it the application(s) running in the VM? If yes, Heat has such consideration already, but I'm only familiar with the first version which was cron job based and therefore the resolution was 1 minute. -.. (fq) Yes, I mean the applications. 1 min might be too long I am afraid. I think this failure detection should be at least less than the failover time. Otherwise it does not make sense. -.. (I don't know if 50ms is sufficient enough, since we require the failover of the VNFs should be within 50ms, if the detection is longer than this, there is no meaning to do the detection) -.. (MT) Do you assume that the entire VM needs to be repaired in case of application failure? Also the question is whether there's a VM ready to failover to. It might be that OpenStack just starts to build the VM when the failover is triggere. If that's the case it can take minutes. 
If the VM exists then starting it still takes ~half a minute I think. -.. I think there's a need to have the VM images in shared storage otherwise there's an issue with migration and failover -.. (fq) I don't mean the recovery of the entire VM. I only mean the failover of the service. In our testing, we use an active /active VM, so it only takes less than 1s to do the failover. I understand the situation you said above. I wonder if we should set a time constraint for such failover? for me, I think such constraint should be less than second. -.. (Yifei) Maria, I cannot understand " If the VM exists then starting it still takes ~half a minute", would please explain it more detailed? Thank you. -.. (MT) As far as I know Heat rebuilds the VM from scratch as part of the failure recovery. Once the VM is rebuilt it's booted and only after that it can actualy provide service. This time till the VM is ready to serve can take 20-30sec after the VM is already reported as existing. -.. ([Yifei) ah, I see. Thank you so much! -.. (YY) As I understand, what heat provides is not what fuqiao wants here. To failover within 50ms/or 1s means two VMs are all running, in NFVI view there are two VMs running, but in application view one is master the other is standby. What I did not find above is how to monitoring application processes in VM? Tradictionally watchdog is applied to this task. In new version of Qemu watchdog is simulated with software but timeslot of watchdog could not be as narrow as hardware watchdog. I was told lower than 15s may cause fault action. -.. Do you mean this watchdog? https://libvirt.org/formatdomain.html#elementsWatchdog -.. (fq) Yes, Yuan Yue got my idea:) - -.. 4.2 Storage dedicated section (new section 7). -.. (GK) please see dedicated section on storage below (Section 7) -.. Virtual disk and volumes for applications. -.. Storage related to NFVI must be redundant. -.. Requirements: -.. 
For small systems a small local redundant file system must be supported. -.. For larger systems, replication of data across multiple storage nodes. Processes controlling the storage nodes must also be replicated, such that there is no single point of failure. -.. Block storage supported by a clustered file system is required. -.. Should be transparent to the storage user - -************ -4.2 Network -************ - -4.2.1 Virtual network -======================== - -Requirements: --------------- -* Redundant top of rack switches must be supported as part of the deployment. - -.. (MT) Shouldn't this be a HW requirement? -.. (Yifei) Agree with Maria -.. (IJ) The ToR is not typically in the NFVI, that is why I put the ToR here. - -* Static LAG must be supported to ensure sub 50ms detection and failover of - redundant links between nodes. The distributed virtual router should - support HA. - -.. (Yifei) Add ?: Service provided by Network agents should be kept available and continuous. e.g. VRRP is used for L3 agent HA (keepalived or pacemaker) -.. (IJ) this is a requirements document. Exclude the implementation details. Added the requirement below - -* Service provided by network agents should be highly available (L3 Agent, DHCP - agent as examples) - -* L3-agent, DHCP-agent should clean up network artifacts (IPs, Namespaces) from - the database in case of failover. - -4.2.2 vSwitch -=============== - -Requirements: --------------- - -* Health monitoring of vSwitch processes is required. -* The vSwitch must adapt to changes in network topology and automatically - support recovery modes in a transparent manner. - -4.2.3 Link Redundancy -========================= - -Requirements: --------------- - -* The ability to manage redundant interfaces and support of LAG on the compute - node is required. -* Support of LAG on all interfaces: internal platform control - interfaces, internal platform storage interfaces, as well as interfaces - connecting to provider networks.
-* LACP is optional for dynamic management of LAG links -* Automated configuration LAG should support active/standby and - balanced modes. Should adapt to changes in network topology and automatically - support recovery modes in a transparent manner. -* In SR-IOV scenario, link redundancy could not be transparent, VM should have - two ports directly connect to physical port on host. Then app may bind - these two ports for HA. - -.. (MT) Should we consider also load balancers? I'm not familiar with the LBaaS, but it seems to be key for the load distribution for the multi-VM VNFs. -.. (YY) As I know LBaaS was not mature this time in openstack. Openstack does provide API for LBaaS,but it depend on LB entity and its plugin. We have not found any mature LB agent and LB entity in community. The LB inside VNF usually approached by VNF itsself. -.. (fq) I think LB should be taken into consideration as well. eventhough openstack now is not mature. This is how OPNFV is working, we work out requirement for our side, propose possible bp to openstack so that these features can be added in the future releases. -.. (YIfei) Agree. Because of it is not mature, there is possibility to find gap between OpenStack and our requirement. -.. (MT) Agree. We may even influence how it matures ;-) -.. vlb, vFW are part of virtual resources? -.. (Yifei) From my side, network node. -.. (Yifei) If you mean LB or FW in NFVI, I do not think vXX is a suitable name as in OpenStack Neutron there are LBaas and FWaas. If you mean VNF, then you can call them vLB and vFW. However i do not think LBaas is the same as vLB, they are different use cases. What we need to consider should be LBaas and FWaas not vLB or vFW. -.. For more details about LBaas and FWaas, you can find on the wiki page of neutron... -.. (fq) Thank you for Yifei. I wonder what's the difference between vLB and LBaas. You mean they have different functions? -.. 
(IJ) LBaaS is good for enterprise - for Carrier applications won't higher data rates be needed and therefore a Load Balancer in a VNF is probably a better solution. - - - -============================ -5 VIM High availability -============================ -The VIM in the NFV reference architecture contains all the control nodes of OpenStack, SDN controllers -and hardware controllers. It manages the NFVI according to the instructions/requests of the VNFM and -NFVO and reports the NFVI status back to them. Guaranteeing the high availability of the VIM is -a basic requirement of the OPNFV platform. Also the VIM should provide some mechanism for VNFs to achieve -their own high availability. - - -******************************************* -5.1 Architecture requirement of VIM HA -******************************************* - -The architecture of the control nodes should avoid any single point of failure and the management -network plane which connects the control nodes should also be redundant. Services of the control nodes -which are stateless like nova-API, glance-API etc. should be redundant but without data synchronization. -Stateful services like MySQL, RabbitMQ, SDN controller should provide complex redundancy policies. -Clouds of different scale may also require different HA policies. - -Requirement: -============= -- In a small scale scenario an active-standby redundancy policy would be acceptable. - -- In a large scale scenario all stateful services like database, message queue, SDN controller - should be deployed in cluster mode which supports N-way, N+M active-standby redundancy. - -- In a large scale scenario all stateless services like nova-api, glance-api etc. should be deployed - in all active mode. - -- Load balancer nodes which are introduced for all active and N+M mode should also avoid a single point - of failure. - -- All control node servers shall have at least two network ports to connect to different network - planes.
These ports shall work in bonding manner. - -- Any failures of services in the redundant pairs should be detected and switch over should be carried out - automatically in less than 5 seconds totally. - -- Status of services must be monitored. - -****************************************************** -5.2 Fault detection and alarm requirement of VIM -****************************************************** - - -Redundant architecture can provide function continuity for the VIM. For maintenance considerations -all failures in the VIM should be detected and notifications should be triggered to NFVO, VNFM and other -VIM consumers. - -Requirement: -============= - -- All hardware failures of control nodes should be detected and relevant alarms should be triggered. - OSS, NFVO, VNFM and other VIM consumers can subscribe these alarms. - -- Software on control nodes like OpenStack or ODL should be monitored by the clustering software - at process level and alarms should be triggered when exceptions are detected. - -- Software on compute nodes like OpenStack/nova agents, ovs should be monitored by watchdog. When - exceptions are detected the software should be restored automatically and alarms should be triggered. - -- Software on storage nodes like Ceph, should be monitored by watchdog. When - exceptions are detected the software should be restored automatically and alarms should be triggered. - -- All alarm indicators should include: Failure time, Failure location, Failure type, Failure level. - -- The VIM should provide an interface through which consumers can subscribe to alarms and notifications. - -- All alarms and notifications should be kept for future inquiry in VIM, ageing policy of these records - should be configurable. - -- VIM should distinguish between the failure of the compute node and the failure of the host HW. - -- VIM should be able to publish the health status of the compute node to NFV MANO. 
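The alarm requirements in section 5.2 (four mandatory indicators, a subscription interface, and a configurable ageing policy) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the names `Alarm`, `AlarmStore`, `subscribe`, `raise_alarm` and `purge` are hypothetical and do not correspond to any OpenStack or ETSI interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record carrying the four indicators from section 5.2."""
    failure_time: datetime
    failure_location: str   # e.g. a node or service name
    failure_type: str       # e.g. "hardware" or "process"
    failure_level: str      # e.g. "critical" or "major"


class AlarmStore:
    """Hypothetical in-memory store: keeps alarms, notifies subscribers, ages records."""

    def __init__(self, max_age: timedelta) -> None:
        self.max_age = max_age  # configurable ageing policy
        self._records: List[Alarm] = []
        self._subscribers: List[Callable[[Alarm], None]] = []

    def subscribe(self, callback: Callable[[Alarm], None]) -> None:
        # Consumers (OSS, NFVO, VNFM in the text) register a callback.
        self._subscribers.append(callback)

    def raise_alarm(self, alarm: Alarm) -> None:
        # Keep the alarm for later inquiry, then notify every subscriber.
        self._records.append(alarm)
        for callback in self._subscribers:
            callback(alarm)

    def purge(self, now: datetime) -> int:
        # Drop records older than max_age; return how many remain.
        self._records = [a for a in self._records
                         if now - a.failure_time <= self.max_age]
        return len(self._records)

    def records(self) -> List[Alarm]:
        return list(self._records)
```

A consumer would call `subscribe` once and then receive every alarm passed to `raise_alarm`; `purge` would be run periodically with the current time.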
- -******************************************* -5.3 HA mechanism of VIM provided for VNFs -******************************************* - -When VNFs deploy their HA scheme, they usually require the underlying resources to provide some mechanisms. -This is similar to the hardware watchdog in the traditional network devices. Also virtualization -introduces some other requirements like affinity and anti-affinity with respect to the allocation of the -different virtual resources. - -Requirement: -============ - -- VIM should provide the ability to configure HA functions like watchdog timers, - redundant network ports, etc. These HA functions should be properly tagged and exposed to - VNF and VNFM with standard APIs. - -- VIM should provide anti-affinity scheme for VNF to deploy redundant service on different level of - aggregation of resource. - -- VIM should be able to deploy classified virtual resources to VNFs following the SAL description in VNFD. - -- VIM should provide data collection to calculate the HA related metrics for VNFs. - -- VIM should support the VNF/VNFM to initiate the operation of resources of the NFVI, such as repair/reboot. - -- VIM should correlate the failures detected on collocated virtual resources to identify latent faults in - HW and virtualization facilities - -- VIM should be able to disallow the live migration of VMs and when it is allowed it should be possible - to specify the tolerated interruption time. - -- VIM should be able to restrict the simultaneous migration of VMs hosting a given VNF. - -- VIM should provide the APIs to trigger scale in/out to VNFM/VNF. - -- When the scheduler of the VIM uses the Active/active HA scheme, multiple scheduler instances must not create - a race condition - -- VIM should be able to trigger the evacuation of the VMs before bringing the host down - when *maintenance mode* is set for the compute host. - -- VIM should configure Consoleauth in active/active HA mode, and should store the token in database.
- -- VIM should replace a failed VM with a new VM and this new VM should start in the same initial state - as the failed VM. - -- VIM should support policies to prioritize a certain VNF. - -********************* -5.4 SDN controller -********************* - -SDN controller: Distributed or Centralized - -Requirements: -============== -- In centralized model SDN controller must be deployed as redundant pairs. - -- In distributed model, mastership election must determine which node is in overall control. - -- For distributed model, VNF should not be aware of HA of controller. That is, it is a - logically centralized - system for the NBI (Northbound Interface). - -- Event notification is required as described in section 5.2. - -======================= -6 VNF High Availability -======================= - - -************************ -6.1 Service Availability -************************ - -In the context of NFV, Service Availability refers to the End-to-End (E2E) Service -Availability which includes all the elements in the end-to-end service (VNFs and -infrastructure components) with the exception of the customer terminal such as -handsets, computers, modems, etc. The service availability requirements for NFV -should be the same as those for legacy systems (for the same service). - -Service Availability = -total service available time / -(total service available time + total service recovery time) - -The service recovery time depends, among other factors, on the number of redundant resources -provisioned and/or instantiated that can be used for restoring the service. - -In the E2E relation a Network Service is available only if all the necessary -Network Functions are available and interconnected appropriately to collaborate -according to the NF chain.
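As a worked illustration of the Service Availability formula above, the sketch below computes the ratio for one service and then combines several Network Functions in a chain. The chain calculation multiplies the individual availabilities, which is an added assumption (independent failures, every function required); the source only states that all functions must be available. The function names are hypothetical.

```python
from typing import Iterable


def service_availability(available_time_s: float, recovery_time_s: float) -> float:
    """Availability = available time / (available time + recovery time)."""
    total = available_time_s + recovery_time_s
    if available_time_s < 0 or recovery_time_s < 0 or total <= 0:
        raise ValueError("times must be non-negative and not both zero")
    return available_time_s / total


def chain_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every element is required.

    Assumption (not from the source): elements fail independently,
    so the chain availability is the product of the element values.
    """
    result = 1.0
    for value in availabilities:
        if not 0.0 <= value <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        result *= value
    return result
```

For example, a service that is available for 99 hours and spends 1 hour in recovery has an availability of 0.99, and a chain is never more available than its least available element.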
- -General Service Availability Requirements -========================================= - -* We need to be able to define the E2E (V)NF chain based on which the E2E availability - requirements can be decomposed into requirements applicable to individual VNFs and - their interconnections -* The interconnection of the VNFs should be logical and be maintained by the NFVI with - guaranteed characteristics, e.g. in case of failure the connection should be - restored within the acceptable tolerance time -* These characteristics should be maintained in VM migration, failovers and switchover, - scale in/out, etc. scenarios -* It should be possible to prioritize the different network services and their VNFs. - These priorities should be used when pre-emption policies are applied due to - resource shortage for example. -* VIM should support policies to prioritize a certain VNF. -* VIM should be able to provide classified virtual resources to VNFs in different SAL - - -6.1.1 Service Availability Classification Levels -================================================= - - -The [ETSI-NFV-REL_] defined three Service Availability Levels -(SAL) are classified in Table 1. They are based on the relevant ITU-T recommendations -and reflect the service types and the customer agreements a network operator should -consider. - -.. 
[ETSI-NFV-REL] `ETSI GS NFV-REL 001 V1.1.1 (2015-01) <http://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_NFV-REL001v010101p.pdf>`_ - - -*Table 1: Service Availability classification levels* - -+-------------+-----------------+-----------------------+---------------------+ -|SAL Type | Customer Type | Service/Function | Notes | -+=============+=================+=======================+=====================+ -|Level 1 | Network Operator| * Intra-carrier | Sub-levels within | -| | Control Traffic | engineering | Level 1 may be | -| | | traffic | created by the | -| | Government/ | * Emergency | Network Operator | -| | Regulatory | telecommunication | depending on | -| | Emergency | service (emergency | Customer demands | -| | Services | response, emergency| E.g.: | -| | | dispatch) | | -| | | * Critical Network | * 1A - Control; | -| | | Infrastructure | * 1B - Real-time; | -| | | Functions (e.g | * 1C - Data; | -| | | VoLTE functions | | -| | | DNS Servers,etc.) | May require 1+1 | -| | | | Redundancy with | -| | | | Instantaneous | -| | | | Switchover | -+-------------+-----------------+-----------------------+---------------------+ -|Level 2 | Enterprise and/ | * VPN | Sub-levels within | -| | or large scale | * Real-time traffic | Level 2 may be | -| | customers | (Voice and video) | created by the | -| | (e.g. | * Network | Network Operator | -| | Corporations, | Infrastructure | depending on | -| | University) | Functions | Customer demands. | -| | | supporting Level | E.g.: | -| | Network | 2 services (e.g. 
| | -| | Operators | VPN servers, | * 2A - VPN; | -| | (Tier1/2/3) | Corporate Web/ | * 2B - Real-time; | -| | service traffic | Mail servers) | * 2C - Data; | -| | | | | -| | | | May require 1:1 | -| | | | Redundancy with | -| | | | Fast (maybe | -| | | | Instantaneous) | -| | | | Switchover | -+-------------+-----------------+-----------------------+---------------------+ -|Level 3 | General Consumer| * Data traffic | While this is | -| | Public and ISP | (including voice | typically | -| | Traffic | and video traffic | considered to be | -| | | provided by OTT) | "Best Effort" | -| | | * Network | traffic, it is | -| | | Infrastructure | expected that | -| | | Functions | Network Operators | -| | | supporting Level | will devote | -| | | 3 services | sufficient | -| | | | resources to | -| | | | assure | -| | | | "satisfactory" | -| | | | levels of | -| | | | availability. | -| | | | This level of | -| | | | service may be | -| | | | pre-empted by | -| | | | those with | -| | | | higher levels of | -| | | | Service | -| | | | Availability. May | -| | | | require M+1 | -| | | | Redundancy with | -| | | | Fast Switchover; | -| | | | where M > 1 and | -| | | | the value of M to | -| | | | be determined by | -| | | | further study | -+-------------+-----------------+-----------------------+---------------------+ - -Requirements -------------- - -* It shall be possible to define different service availability levels -* It shall be possible to classify the virtual resources for the different - availability class levels -* The VIM shall provide a mechanism by which VNF-specific requirements - can be mapped to NFVI-specific capabilities. - -More specifically, the requirements and capabilities may or may not be made up of the -same KPI-like strings, but the cloud administrator must be able to configure which -HA-specific VNF requirements are satisfied by which HA-specific NFVI capabilities. 
- - - -6.1.2 Metrics for Service Availability -====================================== - -The [ETSI-NFV-REL_] identifies four metrics relevant to service -availability: - -* Failure recovery time, -* Failure impact fraction, -* Failure frequency, and -* Call drop rate. - -6.1.2.1 Failure Recovery Time ---------------------------------- - -The failure recovery time is the time interval from the occurrence of an abnormal -event (e.g. failure, manual interruption of service, etc.) until the recovery of the -service regardless of whether it is a scheduled or unscheduled abnormal event. For the -unscheduled case, the recovery time includes the failure detection time and the -failure restoration time. -More specifically, restoration also allows for a service recovery by the restart of -the failed provider(s) while failover implies that the service is recovered by a -redundant provider taking over the service. This provider may be a standby -(i.e. synchronizing the service state with the active provider) or a spare -(i.e. having no state information). Accordingly, failover also means switchover, that -is, an orderly takeover of the service from the active provider by the standby/spare. - -Requirements: -^^^^^^^^^^^^^^^ - -* It should be irrelevant whether the abnormal event is due to a scheduled or - unscheduled operation or it is caused by a fault. -* Failure detection mechanisms should be available in the NFVI and configurable so - that the target recovery times can be met -* Abnormal events should be logged and communicated (i.e. notifications and alarms as - appropriate) - -The TL-9000 forum has specified a service interruption time of 15 seconds as outage -for all traditional telecom system services. [ETSI-NFV-REL_] -recommends the setting of different thresholds for the different Service Availability -Levels. An example setting is given in Table 2 below. Note that for all -Service Availability levels Real-time Services require the fastest recovery time.
-Data services can tolerate longer recovery times. These recovery times are applicable -to the user plane. A failure in the control plane does not have to impact the user plane. -The main concern should be simultaneous failures in the control and user planes -as the user plane cannot typically recover without the control plane. However an HA -mechanism in VNF itself can further mitigate the risk. Note also that the impact on -the user plane depends on the control plane service experiencing the failure, -some of them are more critical than others. - - -*Table 2: Example service recovery times for the service availability levels* - -+------------+-----------------+------------------------------------------+ -|SAL | Service | Notes | -| | Recovery | | -| | Time | | -| | Threshold | | -+============+=================+==========================================+ -|1 | 5 - 6 seconds | Recommendation: Redundant resources to be| -| | | made available on-site to ensure fast | -| | | recovery. | -+------------+-----------------+------------------------------------------+ -|2 | 10 - 15 seconds | Recommendation: Redundant resources to be| -| | | available as a mix of on-site and off- | -| | | site as appropriate. | -| | | | -| | | * On-site resources to be utilized for | -| | | recovery of real-time services. | -| | | * Off-site resources to be utilized for | -| | | recovery of data services. | -+------------+-----------------+------------------------------------------+ -|3 | 20 - 25 seconds | Recommendation: Redundant resources to be| -| | | mostly available off-site. 
Real-time | -| | | services should be recovered before data | -| | | services | -+------------+-----------------+------------------------------------------+ - - -6.1.2.2 Failure Impact Fraction ------------------------------------- - -The failure impact fraction is the maximum percentage of the capacity or user -population affected by a failure compared with the total capacity or the user -population supported by a service. It is directly associated with the failure impact -zone which is the set of resources/elements of the system to which the fault may -propagate. - -Requirements: -^^^^^^^^^^^^^^^ - -* It should be possible to define the failure impact zone for all the elements of the - system -* At the detection of a failure of an element, its failure impact zone must be - isolated before the associated recovery mechanism is triggered -* If the isolation of the failure impact zone is unsuccessful the isolation should be - attempted at the next higher level as soon as possible to prevent fault propagation. -* It should be possible to define different levels of failure impact zones with - associated isolation and alarm generation policies -* It should be possible to limit the collocation of VMs to reduce the failure impact - zone as well as to provide sufficient resources - -6.1.2.3 Failure Frequency ---------------------------- - -Failure frequency is the number of failures in a certain period of time. - -Requirements: -^^^^^^^^^^^^^^^^ - -* There should be a probation period for each failure impact zones within which - failures are correlated. -* The threshold and the probation period for the failure impact zones should be - configurable -* It should be possible to define failure escalation policies for the different - failure impact zones - - -6.1.2.4 Call Drop Rate ------------------------- - -Call drop rate reflects service continuity as well as system reliability and -stability. 
The metric is inside the VNF and therefore is not specified further for -the NFV environment. - -Requirements: -^^^^^^^^^^^^^^^^ - -* It shall be possible to specify for each service availability class the associated - availability metrics and their thresholds -* It shall be possible to collect data for the defined metrics -* It shall be possible to delegate the enforcement of some thresholds to the NFVI -* Accordingly it shall be possible to request virtual resources with guaranteed - characteristics, such as guaranteed latency between VMs (i.e. VNFCs), between a VM - and storage, between VNFs - - -********************** -6.2 Service Continuity -********************** - -The determining factor with respect to service continuity is the statefulness of the -VNF. If the VNF is stateless, there is no state information which needs to be -preserved to prevent the perception of service discontinuity in case of failure or -other disruptive events. -If the VNF is stateful, the NF has a service state which needs to be preserved -throughout such disruptive events in order to shield the service consumer from these -events and provide the perception of service continuity. A VNF may maintain this state -internally or externally or a combination with or without the NFVI being aware of the -purpose of the stored data. - -Requirements: -=============== - -* The NFVI should maintain the number of VMs provided to the VNF in the face of - failures. I.e. the failed VM instances should be replaced by new VM instances -* It should be possible to specify whether the NFVI or the VNF/VNFM handles the - service recovery and continuity -* If the VNF/VNFM handles the service recovery it should be able to receive error - reports and/or detect failures in a timely manner. -* The VNF (i.e. 
between VNFCs) may have its own fault detection mechanism, which might - be triggered prior to receiving the error report from the underlying NFVI; therefore - the NFVI/VIM should not attempt to preserve the state of a failing VM if not - configured to do so -* The VNF/VNFM should be able to initiate the repair/reboot of resources of the NFVI - (e.g. to recover from a fault persisting at the VNF level => failure impact zone - escalation) -* It should be possible to disallow the live migration of VMs and when it is allowed - it should be possible to specify the tolerated interruption time. -* It should be possible to restrict the simultaneous migration of VMs hosting a given - VNF -* It should be possible to define under which circumstances the NFV-MANO in - collaboration with the NFVI should provide error handling (e.g. VNF handles local - recoveries while NFV-MANO handles geo-redundancy) -* The NFVI/VIM should provide virtual resources such as storage according to the needs - of the VNF with the required guarantees (see virtual resource classification). -* The VNF shall be able to define the information to be stored on its associated - virtual storage -* It should be possible to define HA requirements for the storage, its availability, - accessibility, resilience options, i.e. the NFVI shall handle the failover for the - storage. -* The NFVI shall handle the network/connectivity failures transparently to the VNFs -* The VNFs with different requirements should be able to coexist in the NFV Framework -* The scale in/out is triggered by the VNF (VNFM) towards the VIM (to be executed in - the NFVI) -* It should be possible to define the metrics to monitor and the related thresholds - that trigger the scale in/out operation -* Scale in operation should not jeopardize availability (managed by the VNF/VNFM), - i.e. resources can only be removed one at a time with a period in between sufficient - for the VNF to restore any required redundancy.
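The last requirement in the list above (remove resources one at a time, leaving the VNF enough time to restore redundancy in between) can be sketched as a simple loop. This is a minimal sketch, not a VIM or VNFM API: `scale_in_one_at_a_time` and its callback parameters `remove`, `redundancy_restored` and `wait` are hypothetical names, and the `max_waits` bound is an added safeguard that the source does not mention.

```python
from typing import Callable, List, Sequence


def scale_in_one_at_a_time(
    instances: Sequence[str],
    remove: Callable[[str], None],
    redundancy_restored: Callable[[], bool],
    wait: Callable[[], None],
    max_waits: int = 10,
) -> List[str]:
    """Remove instances sequentially; pause after each until redundancy is back.

    Returns the instances actually removed. Scale-in stops early if
    redundancy is not restored within max_waits polling rounds.
    """
    removed: List[str] = []
    for instance in instances:
        remove(instance)
        removed.append(instance)
        for _ in range(max_waits):
            if redundancy_restored():
                break
            wait()
        else:
            # Redundancy never came back: do not remove anything else.
            break
    return removed
```

In a real system `wait` would typically sleep for a configured interval and `redundancy_restored` would query the VNF or VNFM; here both are injected so the control flow can be exercised without any infrastructure.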
- - - diff --git a/Requirement/virtual_facilities_HA_new.rst b/Requirement/virtual_facilities_HA_new.rst deleted file mode 100644 index e313230..0000000 --- a/Requirement/virtual_facilities_HA_new.rst +++ /dev/null @@ -1,39 +0,0 @@ -3 Virtualization Facilities (Host OS, Hypervisor) -==================================================== - -3.1 Requirements on Host OS and Hypervisor and Storage -Requirements: -- The hypervisor should support distributed HA mechanism -- Hypervisor should detect the failure of the VM. Failure of the VM should be reported to - the VIM within 1s -- The hypervisor should report (and if possible log) its failures and recovery actions, - and the destination to which they are reported should be configurable. -- The hypervisor should support VM migration -- The hypervisor should provide isolation for VMs, so that VMs running on the same - hardware do not impact each other. -- The host OS should provide sufficient process isolation so that VMs running on - the same hardware do not impact each other. -- The hypervisor should record the VM information regularly and provide logs of - VM actions for future diagnoses. -- The NFVI should maintain the number of VMs provided to the VNF in the face of failures. - I.e. the failed VM instances should be replaced by new VM instances -3.2 Requirements on Middleware -Requirements: -- It should be possible to detect and automatically recover from hypervisor failures - without the involvement of the VIM -- Failure of the hypervisor should be reported to the VIM within 1s -- Notifications about the state of the (distributed) storage backends shall be sent to the - VIM (in-sync/healthy, re-balancing/re-building, degraded). -- VIM processes running on the compute node should be monitored, and their failure should - be reported to the VIM within 1s -- Fault detection and reporting capability. There should be middleware supporting in-band - reporting of HW failure to VIM.
-- Storage data path traffic shall be redundant and fail over within 1 second on link - failures. -- Large deployments using distributed software-based storage shall separate storage and - compute nodes (non-hyperconverged deployment). -- Distributed software-based storage services shall be deployed redundantly. -- Data shall be stored redundantly in distributed storage backends. -- Upon failures of storage services, automatic repair mechanisms (re-build/re-balance of - data) shall be triggered automatically. -- The storage backend shall support geo-redundancy.
\ No newline at end of file diff --git a/Scenario/Scenario Analysis for High Availability in NFV.pdf b/Scenario/Scenario Analysis for High Availability in NFV.pdf Binary files differdeleted file mode 100644 index 769201c..0000000 --- a/Scenario/Scenario Analysis for High Availability in NFV.pdf +++ /dev/null diff --git a/Scenario/Scenario.rst b/Scenario/Scenario.rst deleted file mode 100644 index 396569f..0000000 --- a/Scenario/Scenario.rst +++ /dev/null @@ -1,1452 +0,0 @@ -.. image:: opnfv-logo.png - :height: 40 - :width: 200 - :alt: OPNFV - :align: left - -============ -Scenario Analysis for High Availability in NFV -============ - -****************** -1 Introduction -****************** - -This scenario analysis document outlines the model and failure modes for NFV systems. Its goal is along -with the requirements documents and gap analysis help set context for engagement with various -upstream projects. The OPNFV HA project team will continuously evolve these documents. - - -******************** -2 Basic Use Cases -******************** - - -In this section we review some of the basic use cases related to service high availability, -that is, the availability of the service or function provided by a VNF. The goal is to -understand the different scenarios that need to be considered and the specific requirements -to provide service high availability. More complex use cases will be discussed in -other sections. - -With respect to service high availability we need to consider whether a VNF implementation is -statefull or stateless and if it includes or not an HA manager which handles redundancy. -For statefull VNFs we can also distinguish the cases when the state is maintained inside -of the VNF or it is stored in an external shared storage making the VNF itself virtually -stateless. - -Managing availability usually implies a fault detection mechanism, which triggers the -actions necessary for fault isolation followed by the recovery from the fault. 
-This recovery includes two parts: - -* the recovery of the service and -* the repair of the failed entity. - -Very often the recovery of the service and the repair actions are perceived to be the same, for -example, restarting a failed application repairs the application, which then provides the service again. -Such a restart may take significant time causing service outage, for which redundancy is the solution. -In cases when the service is protected by redundancy of the providing entities (e.g. application -processes), the service is "failed over" to the standby or a spare entity, which replaces the -failed entity while it is being repaired. E.g. when an application process providing the service fails, -the standby application process takes over providing the service, while the failed one is restarted. -Such a failover often allows for faster recovery of the service. - -We also need to distinguish between the failed and the faulty entities as a fault may or -may not manifest in the entity containing the fault. Faults may propagate, i.e. cause other entities -to fail or misbehave, i.e. an error, which in turn might be detected by a different failure or -error detector entity each of which has its own scope. Similarly, the managers acting on these -detected errors may have a limited scope. E.g. an HA manager contained in a VNF can only repair -entities within the VNF. It cannot repair a failed VM, in fact due to the layered architecture -in the VNF it cannot even know whether the VM failed, its hosting hypervisor, or the physical host. -But its error detection mechanism will detect the result of such failures - a failure in the VNF - -and the service can be recovered at the VNF level. -On the other hand, the failure should be detected in the NFVI and the VIM should repair the failed -entity (e.g. the VM). Accordingly a failure may be detected by different managers in different layers -of the system, each of which may react to the event. 
This may cause interference.
-Thus, to resolve the problem in a consistent manner and completely recover from
-a failure the managers may need to collaborate and coordinate their actions.
-
-Considering all these issues the following basic use cases can be identified (see table 1.).
-These use cases assume that the failure is detected in the faulty entity (VNF component
-or the VM).
-
-
-*Table 1: VNF high availability use cases*
-
-+---------+-------------------+----------------+-------------------+----------+
-|         | VNF Statefullness | VNF Redundancy | Failure detection | Use Case |
-+=========+===================+================+===================+==========+
-| VNF     | yes               | yes            | VNF level only    | UC1      |
-|         |                   |                +-------------------+----------+
-|         |                   |                | VNF & NFVI levels | UC2      |
-|         |                   +----------------+-------------------+----------+
-|         |                   | no             | VNF level only    | UC3      |
-|         |                   |                +-------------------+----------+
-|         |                   |                | VNF & NFVI levels | UC4      |
-|         +-------------------+----------------+-------------------+----------+
-|         | no                | yes            | VNF level only    | UC5      |
-|         |                   |                +-------------------+----------+
-|         |                   |                | VNF & NFVI levels | UC6      |
-|         |                   +----------------+-------------------+----------+
-|         |                   | no             | VNF level only    | UC7      |
-|         |                   |                +-------------------+----------+
-|         |                   |                | VNF & NFVI levels | UC8      |
-+---------+-------------------+----------------+-------------------+----------+
-
-As discussed, there is no guarantee that a fault manifests within the faulty entity. For
-example, a memory leak in one process may impact or even crash any other process running in
-the same execution environment. Accordingly, the repair of a failing entity (i.e. the crashed process)
-may not resolve the problem and soon the same or another process may fail within this execution
-environment indicating that the fault has remained in the system.
-Thus, there is a need for extrapolating the failure to a wider scope and performing the
-recovery at that level to get rid of the problem (at least temporarily till a patch is available
-for our leaking process).
-This requires the correlation of repeated failures in a wider scope and the escalation of the
-recovery action to this wider scope. In the layered architecture this means that the manager detecting the
-failure may not be the one in charge of the scope at which it can be resolved, so the escalation needs to
-be forwarded to the manager in charge of that scope, which brings us to an additional use case UC9.
-
-We need to consider for each of these use cases the events detected, their impact on other entities,
-and the actions triggered to recover the service provided by the VNF, and to repair the
-faulty entity.
-
-We are going to describe each of the listed use cases from this perspective to better
-understand how the problem of service high availability can best be tackled.
-
-Before getting into the details it is worth mentioning the example end-to-end service recovery
-times provided in the ETSI NFV REL document [REL]_ (see Table 2). These values may change over time,
-and the thresholds may be lowered.
-
-*Table 2: Service availability levels (SAL)*
-
-+----+---------------+----------------------+------------------------------------+
-|SAL |Service        |Customer Type         | Recommendation                     |
-|    |Recovery       |                      |                                    |
-|    |Time           |                      |                                    |
-|    |Threshold      |                      |                                    |
-+====+===============+======================+====================================+
-|1   |5 - 6 seconds  |Network Operator      |Redundant resources to be           |
-|    |               |Control Traffic       |made available on-site to           |
-|    |               |                      |ensure fast recovery.               |
-|    |               |Government/Regulatory |                                    |
-|    |               |Emergency Services    |                                    |
-+----+---------------+----------------------+------------------------------------+
-|2   |10 - 15 seconds|Enterprise and/or     |Redundant resources to be available |
-|    |               |large scale customers |as a mix of on-site and off-site    |
-|    |               |                      |as appropriate: On-site resources to|
-|    |               |Network Operators     |be utilized for recovery of         |
-|    |               |service traffic       |real-time service; Off-site         |
-|    |               |                      |resources to be utilized for        |
-|    |               |                      |recovery of data services           |
-+----+---------------+----------------------+------------------------------------+
-|3   |20 - 25 seconds|General Consumer      |Redundant resources to be mostly    |
-|    |               |Public and ISP        |available off-site. Real-time       |
-|    |               |Traffic               |services should be recovered before |
-|    |               |                      |data services                       |
-+----+---------------+----------------------+------------------------------------+
-
-Note that even though SAL 1 of [REL]_ allows for 5-6 seconds of service recovery,
-for many services this is too long and such an outage causes a service level reset or
-the loss of a significant amount of data. Also the end-to-end service or network service
-may be served by multiple VNFs. Therefore for a single VNF the desired
-service recovery time is sub-second.
-
-Note that failing over the service to another provider entity implies the redirection of the traffic
-flow the VNF is handling. This could be achieved in different ways ranging from floating IP addresses
-to load balancers. The topic deserves its own investigation, therefore in this first set of
-use cases we assume that it is part of the solution without going into the details, which
-we will address as a complementary set of use cases.
-
-.. [REL] ETSI GS NFV-REL 001 V1.1.1 (2015-01)
-
-
-2.1 Use Case 1: VNFC failure in a statefull VNF with redundancy
-==================================================================
-
-Use case 1 represents a statefull VNF with redundancy managed by an HA manager,
-which is part of the VNF (Fig 1). The VNF consists of VNFC1, VNFC2 and the HA Manager.
-The latter manages the two VNFCs, e.g. the role they play in providing the service
-named "Provided NF" (Fig 2).
-
-The failure happens in one of the VNFCs and it is detected and handled by the HA manager.
-In practice the HA manager could be part of the VNFC implementations or it could
-be a separate entity in the VNF. The point is that the communication of these
-entities inside the VNF is not visible to the rest of the system. The observable
-events need to cross the boundary represented by the VNF box.
-
-
-.. figure:: images/Slide4.png
-   :alt: VNFC failure in a statefull VNF
-   :figclass: align-center
-
-   Fig 1. VNFC failure in a statefull VNF with built-in HA manager
-
-
-.. figure:: images/StatefullVNF-VNFCfailure.png
-   :alt: MSC of the VNFC failure in a statefull VNF
-   :figclass: align-center
-
-   Fig 2. Sequence of events for use case 1
-
-
-As shown in Fig 2, initially VNFC2 is active, i.e. provides the Provided NF and VNFC1
-is a standby. It is not shown, but it is expected that VNFC1 has some means to get the update
-of the state of the Provided NF from the active VNFC2, so that it is prepared to continue to
-provide the service in case VNFC2 fails.
-The sequence of events starts with the failure of VNFC2, which also interrupts the
-Provided NF. This failure is detected somehow and/or reported to the HA Manager, which
-in turn may report the failure to the VNFM and simultaneously tries to isolate the
-fault by cleaning up VNFC2.
-
-Once the cleanup succeeds (i.e. the OK is received) it fails over the active role to
-VNFC1 by setting it active.
This recovers the service: the Provided NF is indeed
-provided again. Thus this point marks the end of the outage caused by the failure
-that needs to be considered from the perspective of service availability.
-
-The repair of the failed VNFC2, which might have started at the same time
-when VNFC1 was assigned the active state, may take longer but without further impact
-on the availability of the Provided NF service.
-If the HA Manager reported the interruption of the Provided NF to the VNFM, it should
-clear the error condition.
-
-The key points in this scenario are:
-
-* The failure of the VNFC2 is not detectable by any other part of the system except
-  the consumer of the Provided NF. The VNFM only
-  knows about the failure because of the error report, and only the information this
-  report provides. I.e. it may or may not include the information on what failed.
-* The Provided NF is resumed as soon as VNFC1 is assigned active regardless of how long
-  it takes to repair VNFC2.
-* The HA manager could be part of the VNFM as well. This requires an interface to
-  detect the failures and to manage the VNFC life-cycle and the role assignments.
-
-2.2 Use Case 2: VM failure in a statefull VNF with redundancy
-==============================================================
-
-Use case 2 also represents a statefull VNF with its redundancy managed by an HA manager,
-which is part of the VNF. The VNFCs of the VNF are hosted on the VMs provided by
-the NFVI (Fig 3).
-
-The VNF consists of VNFC1, VNFC2 and the HA Manager (Fig 4). The latter manages
-the role the VNFCs play in providing the service - Provided NF.
-The VMs provided by the NFVI are managed by the VIM.
-
-
-In this use case one of the VMs hosting the VNF fails. The failure is detected
-and handled at both the NFVI and the VNF levels simultaneously. The coordination occurs
-between the VIM and the VNFM.
-
-
-.. figure:: images/Slide6.png
-   :alt: VM failure in a statefull VNF
-   :figclass: align-center
-
-   Fig 3. VM failure in a statefull VNF with built-in HA manager
-
-
-.. figure:: images/StatefullVNF-VMfailure.png
-   :alt: MSC of the VM failure in a statefull VNF
-   :figclass: align-center
-
-   Fig 4. Sequence of events for use case 2
-
-
-Again initially VNFC2 is active and provides the Provided NF, while VNFC1 is the standby.
-It is not shown in Fig 4, but it is expected that VNFC1 has some means to learn the state
-of the Provided NF from the active VNFC2, so that it is able to continue providing the
-service if VNFC2 fails. VNFC1 is hosted on VM1, while VNFC2 is hosted on VM2 as indicated by
-the arrows between these objects in Fig 4.
-
-The sequence of events starts with the failure of VM2, which results in VNFC2 failing and
-interrupting the Provided NF. The HA Manager detects the failure of VNFC2 somehow
-and tries to handle it the same way as in use case 1. However because the VM is gone the
-cleanup is either not initiated at all or is interrupted as soon as the failure of the VM is
-identified. In either case the faulty VNFC2 is considered as isolated.
-
-To recover the service the HA Manager fails over the active role to VNFC1 by setting it active.
-This recovers the Provided NF. Thus this point marks again the end of the outage caused
-by the VM failure that needs to be considered from the perspective of service availability.
-If the HA Manager reported the interruption of the Provided NF to the VNFM, it should
-clear the error condition.
-
-On the other hand the failure of the VM is also detected in the NFVI and reported to the VIM.
-The VIM reports the VM failure to the VNFM, which passes on this information
-to the HA Manager of the VNF. This confirms for the VNF HA Manager the VM failure and that
-it needs to wait with the repair of the failed VNFC2 until the VM is provided again. The
-VNFM also confirms towards the VIM that it is safe to restart the VM.
- -The repair of the failed VM may take some time, but since the service has been failed over -to VNFC1 in the VNF, there is no further impact on the availability of Provided NF. - -When eventually VM2 is restarted the VIM reports this to the VNFM and -the VNFC2 can be restored. - -The key points in this scenario are: - -* The failure of the VM2 is detectable at both levels VNF and NFVI, therefore both the HA - manager and the VIM reacts to it. It is essential that these reactions do not interfere, - e.g. if the VIM tries to protect the VM state at NFVI level that would conflict with the - service failover action at the VNF level. -* While the failure detection happens at both NFVI and VNF levels, the time frame within - which the VIM and the HA manager detect and react may be very different. For service - availability the VNF level detection, i.e. by the HA manager is the critical one and expected - to be faster. -* The Provided NF is resumed as soon as VNFC1 is assigned active regardless how long - it takes to repair VM2 and VNFC2. -* The HA manager could be part of the VNFM as well. - This requires an interface to detect failures in/of the VNFC and to manage its life-cycle and - role assignments. -* The VNFM may not know for sure that the VM failed until the VIM reports it, i.e. whether - the VM failure is due to host, hypervisor, host OS failure. Thus the VIM should report/alarm - and log VM, hypervisor, and physical host failures. The use cases for these failures - are similar with respect to the Provided NF. -* The VM repair also should start with the fault isolation as appropriate for the actual - failed entity, e.g. if the VM failed due to a host failure a host may be fenced first. -* The negotiation between the VNFM and the VIM may be replaced by configured repair actions. - E.g. on error restart VM in initial state, restart VM from last snapshot, or fail VM over to standby. 
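The last key point above, replacing the VNFM/VIM negotiation with configured repair actions, can be pictured as a simple policy lookup. The following is only a minimal sketch: the three actions come from the text, while the enum, the policy table, the VM identifiers and the default action are hypothetical names chosen for illustration and are not part of any OPNFV or VIM interface.

```python
from enum import Enum


class RepairAction(Enum):
    """Repair actions named in the text; the identifiers are hypothetical."""
    RESTART_INITIAL = "restart VM in initial state"
    RESTART_FROM_SNAPSHOT = "restart VM from last snapshot"
    FAILOVER_TO_STANDBY = "fail VM over to standby"


# Hypothetical per-VM policy table that a VIM could consult on a VM error
# instead of negotiating each repair with the VNFM.
REPAIR_POLICY = {
    "VM1": RepairAction.RESTART_FROM_SNAPSHOT,
    "VM2": RepairAction.RESTART_INITIAL,
}


def select_repair_action(vm_id, policy=REPAIR_POLICY,
                         default=RepairAction.RESTART_INITIAL):
    """Return the preconfigured repair action for a failed VM."""
    return policy.get(vm_id, default)
```

With such a table the VIM can act on a VM failure immediately. The trade-off is that the configured action must not conflict with the failover the HA manager performs at the VNF level.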
-
-
-2.3 Use Case 3: VNFC failure in a statefull VNF with no redundancy
-====================================================================
-
-Use case 3 also represents a statefull VNF, but it stores its state externally on a
-virtual disk provided by the NFVI. It has a single VNFC and it is managed by the VNFM
-(Fig 5).
-
-In this use case the VNFC fails and the failure is detected and handled by the VNFM.
-
-
-.. figure:: images/Slide10.png
-   :alt: VNFC failure in a statefull VNF No-Red
-   :figclass: align-center
-
-   Fig 5. VNFC failure in a statefull VNF with no redundancy
-
-
-.. figure:: images/StatefullVNF-VNFCfailureNoRed.png
-   :alt: MSC of the VNFC failure in a statefull VNF No-Red
-   :figclass: align-center
-
-   Fig 6. Sequence of events for use case 3
-
-
-The VNFC periodically checkpoints the state of the Provided NF to the external storage,
-so that in case of failure the Provided NF can be resumed (Fig 6).
-
-When the VNFC fails the Provided NF is interrupted. The failure is detected by the VNFM
-somehow, which, to isolate the fault, first cleans up the VNFC and then, if the cleanup is
-successful, restarts the VNFC. When the VNFC starts up, first it reads the last checkpoint
-for the Provided NF, then resumes providing it. The service outage lasts from the VNFC failure
-till this moment.
-
-The key points in this scenario are:
-
-* The service state is saved in an external storage which should be highly available too to
-  protect the service.
-* The NFVI should provide this guarantee and also that storage and access network failures
-  are handled seamlessly from the VNF's perspective.
-* The VNFM has means to detect VNFC failures and manage its life-cycle appropriately. This is
-  not required if the VNF also provides its availability management.
-* The Provided NF can be resumed only after the VNFC is restarted and it has restored the
-  service state from the last checkpoint created before the failure.
-* Having a spare VNFC can speed up the service recovery. This requires that the VNFM coordinates - the role each VNFC takes with respect to the Provided NF. I.e. the VNFCs do not act on the - stored state simultaneously potentially interfering and corrupting it. - - - -2.4 Use Case 4: VM failure in a statefull VNF with no redundancy -================================================================== - -Use case 4 also represents a statefull VNF without redundancy, which stores its state externally on a -virtual disk provided by the NFVI. It has a single VNFC managed by the VNFM -(Fig 7) as in use case 3. - -In this use case the VM hosting the VNFC fails and the failure is detected and handled by -the VNFM and the VIM simultaneously. - - -.. figure:: images/Slide11.png - :alt: VM failure in a statefull VNF No-Red - :figclass: align-center - - Fig 7. VM failure in a statefull VNF with no redundancy - -.. figure:: images/StatefullVNF-VMfailureNoRed.png - :alt: MSC of the VM failure in a statefull VNF No-Red - :figclass: align-center - - Fig 8. Sequence of events for use case 4 - -Again, the VNFC regularly checkpoints the state of the Provided NF to the external storage, -so that it can be resumed in case of a failure (Fig 8). - -When the VM hosting the VNFC fails the Provided NF is interrupted. - -On the one hand side, the failure is detected by the VNFM somehow, which to isolate the fault tries -to clean the VNFC up which cannot be done because of the VM failure. When the absence of the VM has been -determined the VNFM has to wait with restarting the VNFC until the hosting VM is restored. The VNFM -may report the problem to the VIM, requesting a repair. - -On the other hand the failure is detected in the NFVI and reported to the VIM, which reports it -to the VNFM, if the VNFM hasn't reported it yet. -If the VNFM has requested the VM repair or if it acknowledges the repair, the VIM restarts the VM. 
-Once the VM is up the VIM reports it to the VNFM, which in turn can restart the VNFC.
-
-When the VNFC restarts first it reads the last checkpoint for the Provided NF,
-to be able to resume it.
-The service outage lasts until this recovery is completed.
-
-The key points in this scenario are:
-
-
-* The service state is saved in external storage which should be highly available to
-  protect the service.
-* The NFVI should provide such a guarantee and also that storage and access network failures
-  are handled seamlessly from the perspective of the VNF.
-* The Provided NF can be resumed only after the VM and the VNFC are restarted and the VNFC
-  has restored the service state from the last checkpoint created before the failure.
-* The VNFM has means to detect VNFC failures and manage its life-cycle appropriately. Alternatively
-  the VNF may also provide its availability management.
-* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
-  distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
-  VM, hypervisor, and physical host failures. The use cases for these failures are
-  similar with respect to the Provided NF.
-* The VM repair also should start with the fault isolation as appropriate for the actual
-  failed entity, e.g. if the VM failed due to a host failure the host may be fenced first.
-* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
-* VM level redundancy, i.e. running a standby or spare VM in the NFVI would allow faster service
-  recovery for this use case, but by itself it may not protect against VNFC level failures. I.e.
-  VNFC level error detection is still required.
-
-
-
-2.5 Use Case 5: VNFC failure in a stateless VNF with redundancy
-=================================================================
-
-Use case 5 represents a stateless VNF with redundancy, i.e. it is composed of VNFC1 and VNFC2.
-They are managed by an HA manager within the VNF. The HA manager assigns the active role to provide -the Provided NF to one of the VNFCs while the other remains a spare meaning that it has no state -information for the Provided NF (Fig 9) therefore it could replace any other VNFC capable of -providing the Provided NF service. - -In this use case the VNFC fails and the failure is detected and handled by the HA manager. - - -.. figure:: images/Slide13.png - :alt: VNFC failure in a stateless VNF with redundancy - :figclass: align-center - - Fig 9. VNFC failure in a stateless VNF with redundancy - - -.. figure:: images/StatelessVNF-VNFCfailure.png - :alt: MSC of the VNFC failure in a stateless VNF with redundancy - :figclass: align-center - - Fig 10. Sequence of events for use case 5 - - -Initially VNFC2 provides the Provided NF while VNFC1 is idle or might not even been instantiated -yet (Fig 10). - -When VNFC2 fails the Provided NF is interrupted. This failure is detected by the HA manager, -which as a first reaction cleans up VNFC2 (fault isolation), then it assigns the active role to -VNFC1. It may report an error to the VNFM as well. - -Since there is no state information to recover, VNFC1 can accept the active role right away -and resume providing the Provided NF service. Thus the service outage is over. If the HA manager -reported an error to the VNFM it should clear it at this point. - -The key points in this scenario are: - -* The spare VNFC may be instantiated only once the failure of active VNFC is detected. -* As a result the HA manager's role might be limited to life-cycle management, i.e. no role - assignment is needed if the VNFCs provide the service as soon as they are started up. -* Accordingly the HA management could be part of a generic VNFM provided it is capable of detecting - the VNFC failures. Besides the service users, the VNFC failure may not be detectable at any other - part of the system. 
-* Also there could be multiple active VNFCs sharing the load of Provided NF and the spare/standby - may protect all of them. -* Reporting the service failure to the VNFM is optional as the HA manager is in charge of recovering - the service and it is aware of the redundancy needed to do so. - - -2.6 Use Case 6: VM failure in a stateless VNF with redundancy -================================================================= - - -Similarly to use case 5, use case 6 represents a stateless VNF composed of VNFC1 and VNFC2, -which are managed by an HA manager within the VNF. The HA manager assigns the active role to -provide the Provided NF to one of the VNFCs while the other remains a spare meaning that it has -no state information for the Provided NF (Fig 11) and it could replace any other VNFC capable -of providing the Provided NF service. - -As opposed to use case 5 in this use case the VM hosting one of the VNFCs fails. This failure is -detected and handled by the HA manager as well as the VIM. - - -.. figure:: images/Slide14.png - :alt: VM failure in a stateless VNF with redundancy - :figclass: align-center - - Fig 11. VM failure in a stateless VNF with redundancy - - -.. figure:: images/StatelessVNF-VMfailure.png - :alt: MSC of the VM failure in a stateless VNF with redundancy - :figclass: align-center - - Fig 12. Sequence of events for use case 6 - - -Initially VNFC2 provides the Provided NF while VNFC1 is idle or might not have been instantiated -yet (Fig 12) as in use case 5. - -When VM2 fails VNFC2 fails with it and the Provided NF is interrupted. The failure is detected by -the HA manager and by the VIM simultaneously and independently. - -The HA manager's first reaction is trying to clean up VNFC2 to isolate the fault. This is considered to -be successful as soon as the disappearance of the VM is confirmed. -After this the HA manager assigns the active role to VNFC1. It may report the error to the VNFM as well -requesting a VM repair. 
-
-Since there is no state information to recover, VNFC1 can accept the assignment right away
-and resume the Provided NF service. Thus the service outage is over. If the HA manager reported
-an error to the VNFM for the service it should clear it at this point.
-
-Simultaneously the VM failure is detected in the NFVI and reported to the VIM, which reports it
-to the VNFM, if the VNFM hasn't requested a repair yet. If the VNFM requested the VM repair or if
-it acknowledges the repair, the VIM restarts the VM.
-
-Once the VM is up the VIM reports it to the VNFM, which in turn may restart the VNFC if needed.
-
-
-The key points in this scenario are:
-
-* The spare VNFC may be instantiated only after the detection of the failure of the active VNFC.
-* As a result the HA manager's role might be limited to life-cycle management, i.e. no role
-  assignment is needed if the VNFC provides the service as soon as it is started up.
-* Accordingly the HA management could be part of a generic VNFM provided it is capable of detecting
-  failures in/of the VNFC and managing its life-cycle.
-* Also there could be multiple active VNFCs sharing the load of Provided NF and the spare/standby
-  may protect all of them.
-* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
-  distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
-  VM, hypervisor, and physical host failures. The use cases for these failures are
-  similar with respect to each Provided NF.
-* The VM repair also should start with the fault isolation as appropriate for the actual
-  failed entity, e.g. if the VM failed due to a host failure a host needs to be fenced first.
-* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
-* Reporting the service failure to the VNFM is optional as the HA manager is in charge of recovering
-  the service and it is aware of the redundancy needed to do so.
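The HA manager's part of use cases 5 and 6 (isolate the failed VNFC, instantiate the spare if it is not running yet, assign it the active role) can be sketched as a small state model. This is an illustration under stated assumptions, not the behaviour of any particular HA manager: the class, its method names and the log strings are hypothetical, and failure detection and the VIM-side VM repair are left out.

```python
class StatelessHaManager:
    """Toy model of the HA manager in use cases 5 and 6 (names are hypothetical)."""

    def __init__(self, active, spare):
        self.active = active          # VNFC currently providing the Provided NF
        self.spare = spare            # spare VNFC, possibly not instantiated yet
        self.instantiated = {active}  # VNFCs that are currently running
        self.log = []                 # ordered record of the actions taken

    def on_vnfc_failure(self, failed, vm_gone=False):
        """Isolate the failed VNFC, then move the active role to the spare."""
        if failed != self.active:
            # A failed spare does not interrupt the service; nothing to fail over.
            return self.log
        # Fault isolation: if the VM is gone, the cleanup counts as successful
        # once the disappearance of the VM is confirmed.
        reason = "VM gone" if vm_gone else "cleanup"
        self.log.append("isolate %s (%s)" % (failed, reason))
        self.instantiated.discard(failed)
        # The spare holds no state, so it may be instantiated only now.
        if self.spare not in self.instantiated:
            self.instantiated.add(self.spare)
            self.log.append("instantiate %s" % self.spare)
        # Role assignment: the spare becomes active, the failed VNFC the new spare.
        self.active, self.spare = self.spare, failed
        self.log.append("activate %s" % self.active)
        return self.log
```

Calling `StatelessHaManager("VNFC2", "VNFC1").on_vnfc_failure("VNFC2", vm_gone=True)` walks through the use case 6 sequence at the VNF level. The restart of VM2 by the VIM would proceed independently of it.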
- - - -2.7 Use Case 7: VNFC failure in a stateless VNF with no redundancy -===================================================================== - -Use case 7 represents a stateless VNF composed of a single VNFC, i.e. with no redundancy. -The VNF and in particular its VNFC is managed by the VNFM through managing its life-cycle (Fig 13). - -In this use case the VNFC fails. This failure is detected and handled by the VNFM. This use case -requires that the VNFM can detect the failures in the VNF or they are reported to the VNFM. - -The failure is only detectable at the VNFM level and it is handled by the VNFM restarting the VNFC. - - -.. figure:: images/Slide16.png - :alt: VNFC failure in a stateless VNF with no redundancy - :figclass: align-center - - Fig 13. VNFC failure in a stateless VNF with no redundancy - - -.. figure:: images/StatelessVNF-VNFCfailureNoRed.png - :alt: MSC of the VNFC failure in a stateless VNF with no redundancy - :figclass: align-center - - Fig 14. Sequence of events for use case 7 - -The VNFC is providing the Provided NF when it fails (Fig 14). This failure is detected or reported to -the VNFM, which has to clean up the VNFC to isolate the fault. After cleanup success it can proceed -with restarting the VNFC, which as soon as it is up it starts to provide the Provided NF -as there is no state to recover. - -Thus the service outage is over, but it has included the entire time needed to restart the VNFC. -Considering that the VNF is stateless this may not be significant still. - - -The key points in this scenario are: - -* The VNFM has to have the means to detect VNFC failures and manage its life-cycle appropriately. - This is not required if the VNF comes with its availability management, but this is very unlikely - for such stateless VNFs. -* The Provided NF can be resumed as soon as the VNFC is restarted, i.e. the restart time determines - the outage. 
-* In case multiple VNFCs are used they should not interfere with one another, they should
-  operate independently.
-
-
-2.8 Use Case 8: VM failure in a stateless VNF with no redundancy
-==================================================================
-
-Use case 8 represents the same stateless VNF composed of a single VNFC as use case 7, i.e. with
-no redundancy. The VNF and in particular its VNFC is managed by the VNFM through managing its
-life-cycle (Fig 15).
-
-In this use case the VM hosting the VNFC fails. This failure is detected and handled by the VNFM
-as well as by the VIM.
-
-
-.. figure:: images/Slide17.png
-   :alt: VM failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 15. VM failure in a stateless VNF with no redundancy
-
-
-.. figure:: images/StatelessVNF-VMfailureNoRed.png
-   :alt: MSC of the VM failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 16. Sequence of events for use case 8
-
-The VNFC is providing the Provided NF when the VM hosting the VNFC fails (Fig 16).
-
-This failure may be detected or reported to the VNFM as a failure of the VNFC. The VNFM may
-not be aware at this point that it is a VM failure. Accordingly its first reaction as in use case 7
-is to clean up the VNFC to isolate the fault. Since the VM is gone, this cannot succeed and the VNFM
-becomes aware of the VM failure through this or it is reported by the VIM. In either case it has to wait
-with the repair of the VNFC until the VM becomes available again.
-
-Meanwhile the VIM also detects the VM failure and reports it to the VNFM unless the VNFM has already
-requested the VM repair. After the VNFM confirms the VM repair, the VIM restarts the VM and reports
-the successful repair to the VNFM, which in turn can start the VNFC hosted on it.
-
-
-Thus the recovery of the Provided NF includes the restart time of the VM and of the VNFC.
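The effect of the extra VM restart on the outage can be illustrated with a little arithmetic against the service recovery time thresholds of Table 2. This is a rough sketch only: it assumes the upper bound of each threshold range as the limit, ignores detection and cleanup time, and uses hypothetical function names and restart times.

```python
# Upper bounds of the service recovery time thresholds from Table 2, in seconds.
# Using the upper bound of each range is an assumption made for this sketch.
SAL_THRESHOLDS = {1: 6.0, 2: 15.0, 3: 25.0}


def recovery_time(vnfc_restart, vm_restart=0.0):
    """Outage of a non-redundant stateless VNF: VM restart plus VNFC restart.

    Detection and cleanup times are ignored here for simplicity.
    """
    return vm_restart + vnfc_restart


def strictest_sal_met(outage):
    """Return the lowest-numbered SAL whose threshold the outage meets, or None."""
    for level in sorted(SAL_THRESHOLDS):
        if outage <= SAL_THRESHOLDS[level]:
            return level
    return None
```

For example, with a hypothetical VNFC restart of 2 seconds the outage of use case 7 would fit SAL 1, while adding a hypothetical 10 second VM restart as in use case 8 would only fit SAL 2.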
- -The key points in this scenario are: - -* The VNFM has to have the means to detect VNFC failures and manage its life-cycle appropriately. - This is not required if the VNF comes with its availability management, but this is very unlikely - for such stateless VNFs. -* The Provided NF can be resumed only after the VNFC is restarted on the repaired VM, i.e. the - restart time of the VM and the VNFC determines the outage. -* In case multiple VNFCs are used they should not interfere with one another, they should - operate independently. -* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot - distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log - VM, hypervisor, and physical host failures. The use cases for these failures are - similar with respect to each Provided NF. -* The VM repair also should start with the fault isolation as appropriate for the actual - failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first. -* The repair negotiation between the VNFM and the VIM may be replaced by configured repair actions. -* VM level redundancy, i.e. running a standby or spare VM in the NFVI would allow faster service - recovery for this use case, but by itself it may not protect against VNFC level failures. I.e. - VNFC level error detection is still required. - -2.9 Use Case 9: Repeated VNFC failure in a stateless VNF with no redundancy -============================================================================== - -Finally use case 9 represents again a stateless VNF composed of a single VNFC as in use case 7, i.e. -with no redundancy. The VNF and in particular its VNFC is managed by the VNFM through managing its -life-cycle. - -In this use case the VNFC fails repeatedly. This failure is detected and handled by the VNFM, -but results in no resolution of the fault (Fig 17) because the VNFC is manifesting a fault, -which is not in its scope. I.e. 
the fault is propagating to the VNFC from a faulty VM or host, -for example. Thus the VNFM cannot resolve the problem by itself. - - -.. figure:: images/Slide19.png - :alt: Repeated VNFC failure in a stateless VNF with no redundancy - :figclass: align-center - - Fig 17. Repeated VNFC failure in a stateless VNF with no redundancy - - -To handle this case the failure handling needs to be escalated to a bigger fault zone -(or fault domain), i.e. a scope within which the faults may propagate and manifest. In case of the -VNF the bigger fault zone is the VM and the facilities hosting it, all managed by the VIM. - -Thus the VNFM should request the repair from the VIM (Fig 18). - -Since the VNFM is only aware of the VM, it needs to report an error on the VM and it is the -VIM's responsibility to sort out what might be the scope of the actual fault depending on other -failures and error reports in its scope. - - -.. figure:: images/Slide20.png - :alt: Escalation of repeated VNFC failure in a stateless VNF with no redundancy - :figclass: align-center - - Fig 18. Escalation of repeated VNFC failure in a stateless VNF with no redundancy - - -.. figure:: images/StatelessVNF-VNFCfailureNoRed-Escalation.png - :alt: MSC of the VM failure in a stateless VNF with no redundancy - :figclass: align-center - - Fig 19. Sequence of events for use case 9 - - -This use case starts similarly to use case 7, i.e. the VNFC is providing the Provided NF when it fails -(Fig 17). -This failure is detected by or reported to the VNFM, which cleans up the VNFC to isolate the fault. -After successful cleanup the VNFM proceeds with restarting the VNFC, which as soon as it is up -starts to provide the Provided NF again as in use case 7. - -However, the VNFC fails N times within some Probation time, for which the VNFM starts -the timer when it detects the first failure of the VNFC. 
When the VNFC fails once more, still within the -probation time, the maximum of the Escalation counter is exceeded and the VNFM reports an error to the VIM on -the VM hosting the VNFC, as cleaning up and restarting the VNFC evidently did not solve the problem. - -When the VIM receives the error report for the VM it has to isolate the fault by cleaning up at least -the VM. After successful cleanup it can restart the VM and once it is up report the VM repair to the VNFM. -At this point the VNFM can restart the VNFC, which in turn resumes the Provided NF. - -In this scenario the VIM needs to evaluate what may be the scope of the fault to determine what entity -needs a repair. For example, if it has detected VM failures on that same host, or other VNFMs -reported errors on VMs hosted on the same host, it should consider that the entire host needs a repair. - - -The key points in this scenario are: - -* The VNFM has to have the means to detect VNFC failures and manage its life-cycle appropriately. - This is not required if the VNF comes with its availability management, but this is very unlikely - for such stateless VNFs. -* The VNFM needs to correlate VNFC failures over time to be able to detect failure of a bigger fault zone. - One way to do so is through counting the failures within a probation time. -* The VIM cannot detect all failures caused by faults in the entities under its control. It should be - able to receive error reports and correlate these error reports based on the dependencies - of the different entities. -* The VNFM does not know the source of the failure, i.e. the faulty entity. -* The VM repair should start with the fault isolation as appropriate for the actual - failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first. - - - -***************************************************************** -3. 
Communication Interfaces for VNF HA schemes -***************************************************************** - -This section discusses some general issues about communication interfaces -in the VNF HA schemes. Section 2 discusses the use cases of both stateful and -stateless VNFs. This section covers -issues that are common to all the use cases proposed -in the previous sections. - -3.1. VNF External Interfaces -============================= - -Regardless of whether the VNF is stateful or stateless, all the VNFCs should act as -a single unit from the perspective of the outside world. That means all the VNFCs should -share a common interface through which the outside modules (e.g., the other VNFs) can -access the service. There could be multiple solutions for sharing this IP -interface. However, all of this sharing and switching of IP addresses should be -transparent to the outside modules. - -There are several approaches for the VNFs to share the interfaces. A few of them -are listed as follows and will be discussed in detail. - -1) A shared IP address for active/standby VMs. - -2) Load balancers for active/active use cases. - -Note that a combination of these two approaches is also feasible. - -For active/standby VNFCs, there is a common IP address shared by the VMs hosting -the active and standby VNFCs, so that they look like one instance from outside. -The HA manager will manage the assignment of the IP address to the VMs. -(The HA manager may not be aware of this, i.e. the address may be configured -and the active/standby state management is linked to the possession of the IP -address, i.e. the active VNFC claims it as part of becoming active.) Only the -active one possesses the IP address. When failover happens, the standby -is set to be active and can take possession of the IP address to continue processing -traffic. - - -For active/active VNFCs, a load balancer (LB) could be used. 
In such a scenario, there -could be two cases for the deployment and usage of the LB. - -Case 1: LB used in front of a cluster of VNFCs to distribute the traffic flow. - -In this case, the LB is deployed in front of a cluster of multiple VNFCs. Such a -cluster can be managed by a separate cluster manager, or can be managed just -by the LB, which uses heartbeat to monitor each VNFC. When one of the VNFCs fails, -the cluster manager should first exclude the failed VNFC from the cluster so that -the LB will re-route the traffic to the other VNFCs, and then the failed one should -be recovered. In the case when the LB is acting as the cluster manager, it is -the LB's responsibility to inform the VNFM to recover the failed VNFC if possible. - - -Case 2: LB used in front of a cluster of VMs to distribute traffic flow. - -In this case, there exists a cluster manager (e.g. Pacemaker) to monitor and manage -the VMs in the cluster. The LB sits in front of the VM cluster so as to distribute -the traffic. When one of the VMs fails, the cluster manager will detect that and will -be in charge of the recovery. The cluster manager will also exclude the failed VM -from the cluster, so that the LB won't route traffic to the failed one. - -In both cases, the HA of the LB should also be considered. - - -3.2. Intra-VNF Communication -============================== - -For stateful VNFs, data synchronization is necessary between the active and standby VMs. -The HA manager is responsible for handling VNFC failover, and for assigning the -active/standby states between the VNFCs of the VNF. Data synchronization can be handled -either by the HA manager or by the VNFC itself. 
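The division of responsibilities described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical ``HaManager`` that both assigns the active/standby roles and copies every state update to the standby; the class and method names are not taken from any specific HA framework.

.. code-block:: python

    class Vnfc:
        """Illustrative VNFC with a role and some service state."""

        def __init__(self, name):
            self.name = name
            self.role = "standby"
            self.state = {}


    class HaManager:
        """Hypothetical HA manager for one active/standby VNFC pair."""

        def __init__(self, first, second):
            self.active, self.standby = first, second
            self.active.role = "active"
            self.standby.role = "standby"

        def update_state(self, key, value):
            # The state change is applied on the active VNFC and synchronized
            # to the standby, here by the HA manager itself.
            self.active.state[key] = value
            self.standby.state[key] = value

        def failover(self):
            # The failed VNFC loses the active role; the standby, which already
            # holds the synchronized state, becomes active.
            self.active.role = "failed"
            self.active, self.standby = self.standby, self.active
            self.active.role = "active"
            return self.active

Because the standby already holds a copy of the state, the failover in this sketch is only a role change; no state has to be recovered afterwards.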
- -The state synchronization can happen - -- through direct communication between the active and the standby VNFCs, - -- based on the information received from the HA manager on a channel or through messages using a common queue, - -- through a shared storage assigned to the whole VNF, or - -- through the checkpointing of state information via underlying memory and/or database checkpointing services to a separate VM and storage repository. - - -***************************************************************** -4 High Availability Scenarios for Network Nodes -***************************************************************** - -4.1 Network nodes and HA deployment -==================================== - -OpenStack network nodes contain: Neutron DHCP agent, Neutron L2 agent, Neutron L3 agent, Neutron LBaaS -agent and Neutron Metadata agent. The DHCP agent provides DHCP services for virtual networks. The -metadata agent provides configuration information such as credentials to instances. Note that the -L2 agent cannot be distributed and highly available. Instead, it must be installed on each data -forwarding node to control the virtual network drivers such as Open vSwitch or Linux Bridge. One L2 -agent runs per node and controls its virtual interfaces. - -A typical HA deployment of network nodes is shown in Fig 20, which depicts a two-node cluster. -The number of nodes depends on the size of the cluster; it can be 2 or more. More details are -given in the section for each agent. - - -.. figure:: images_network_nodes/Network_nodes_deployment.png - :alt: HA deployment of network nodes - :figclass: align-center - - Fig 20. A typical HA deployment of network nodes - - -4.2 DHCP agent -================ - -The DHCP agent can be natively highly available. Neutron has a scheduler which lets you run multiple -agents across nodes. You can configure the dhcp_agents_per_network parameter in the neutron.conf file -and set it to X (X >=2 for HA, default is 1). 
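As an illustration, the setting can be checked with a few lines of Python. This is a sketch under assumptions: the sample text stands in for a real neutron.conf, the option is assumed to live in the ``[DEFAULT]`` section, and the helper names are hypothetical. The fallback of 1 mirrors the default stated above.

.. code-block:: python

    import configparser

    # Stand-in for the relevant part of a neutron.conf file (assumed layout).
    SAMPLE_NEUTRON_CONF = """
    [DEFAULT]
    dhcp_agents_per_network = 2
    """


    def dhcp_agents_per_network(conf_text: str) -> int:
        """Return the configured number of DHCP agents per network (1 if unset)."""
        parser = configparser.ConfigParser()
        # Strip the indentation that this sample carries inside the code block.
        cleaned = "\n".join(line.strip() for line in conf_text.splitlines())
        parser.read_string(cleaned)
        return parser.getint("DEFAULT", "dhcp_agents_per_network", fallback=1)


    def is_dhcp_ha(conf_text: str) -> bool:
        """HA requires at least two DHCP agents serving each network."""
        return dhcp_agents_per_network(conf_text) >= 2

With the sample above ``is_dhcp_ha(SAMPLE_NEUTRON_CONF)`` is true, while a file that omits the option falls back to a single agent and is reported as not highly available.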
- -If X is set to 2, as depicted in Fig 21, where three tenant networks are used as an example (there can be -more), six DHCP agents are deployed on two nodes for the three networks, and they are -all active. The two dhcp1 agents serve one network, while the dhcp2 and dhcp3 agents serve the other two networks. In a -network, all DHCP traffic is broadcast and the DHCP servers race to offer an IP address. All the servers update their -lease tables. In Fig 22, when the agent(s) on Node1 stop working, which can be caused by a software -failure or a hardware failure, the DHCP agent(s) on Node2 continue to offer IP addresses for the network. - - -.. figure:: images_network_nodes/DHCP_deployment.png - :alt: HA deployment of DHCP agents - :figclass: align-center - - Fig 21. Natively HA deployment of DHCP agents - - -.. figure:: images_network_nodes/DHCP_failure.png - :alt: Failure of DHCP agents - :figclass: align-center - - Fig 22. Failure of DHCP agents - - -4.3 L3 agent -============== - -The L3 agent is also natively highly available. To achieve HA, it can be configured in the neutron.conf -file. - -.. code-block:: bash - - l3_ha = True # All routers are highly available by default - - allow_automatic_l3agent_failover = True # Set automatic L3 agent failover for routers - - max_l3_agents_per_router = 2 # Maximum number of network nodes to use for the HA router - - min_l3_agents_per_router = 2 # Minimum number of network nodes to use for the HA router. A new router - # can be created only if this number of network nodes is available. - -With this configuration in the neutron.conf file, the L3 agent scheduler uses the Virtual Router Redundancy -Protocol (VRRP) to distribute virtual routers across multiple nodes (e.g. 2). The scheduler will choose -a number between the minimum and the maximum according to the scheduling algorithm. VRRP is implemented -by Keepalived. - -As depicted in Fig 23, both L3 agents in Node1 and Node2 host vRouter 1 and vRouter 2. 
In Node 1, -vRouter 1 is active and vRouter 2 is standby (hot standby). In Node2, vRouter 1 is standby and -vRouter 2 is active. To spread the load, the two active routers are placed on different -nodes. As shown in Fig 24, Keepalived is used to manage the VIP interfaces, with one instance of -Keepalived per virtual router and therefore one per namespace. 169.254.192.0/18 is a dedicated HA network -which is created in order to isolate the administrative traffic from the tenant traffic; each vRouter -is connected to this dedicated network via an HA port. More details can be found in the -Reference at the bottom. - - -.. figure:: images_network_nodes/L3_deployment.png - :alt: HA deployment of L3 agents - :figclass: align-center - - Fig 23. Natively HA deployment of L3 agents - - -.. figure:: images_network_nodes/L3_ha_principle.png - :alt: HA principle of L3 agents - :figclass: align-center - - Fig 24. Natively HA principle of L3 agents - - -In Fig 25, when vRouter 1 in Node1 is down, which can be caused by a software failure or a hardware failure, -Keepalived will detect the failure and the standby will take over to be active. In order to keep the -TCP connections, Conntrackd is used to maintain the TCP sessions going through the router, with one instance -of Conntrackd per virtual router and therefore one per namespace. After that, a rescheduling procedure will be -triggered to respawn the failed virtual router on another L3 agent as standby. The complete workflow is -depicted in Fig 26. - - -.. figure:: images_network_nodes/L3_failure.png - :alt: Failure of L3 agents - :figclass: align-center - - Fig 25. Failure of L3 agents - - -.. figure:: images_network_nodes/L3_ha_workflow.png - :alt: HA workflow of L3 agents - :figclass: align-center - - Fig 26. HA workflow of L3 agents - - -4.4 LBaaS agent and Metadata agent -==================================== - -Currently, no native feature is provided to make the LBaaS agent highly available using the default -plug-in HAProxy. 
A common way to make HAProxy highly available is to use Pacemaker. - - -.. figure:: images_network_nodes/LBaaS_deployment.png - :alt: HA deployment of LBaaS agents - :figclass: align-center - - Fig 27. HA deployment of LBaaS agents using Pacemaker - - -As shown in Fig 27, HAProxy and Pacemaker are deployed on both of the network nodes. The number of network -nodes can be 2 or more, depending on your cluster. HAProxy in Node 1 is the master and the VIP is in -Node 1. Pacemaker monitors the liveness of HAProxy. - - -.. figure:: images_network_nodes/LBaaS_failure.png - :alt: Failure of LBaaS agents - :figclass: align-center - - Fig 28. Failure of LBaaS agents - - -As shown in Fig 28, when HAProxy in Node1 goes down, which can be caused by a software failure or a hardware -failure, Pacemaker will fail over HAProxy and the VIP to Node 2. - -Note that the default plug-in HAProxy only supports TCP and HTTP. - -No native feature is available to make the Metadata agent highly available. At this time, an Active/Passive -solution exists to run the neutron metadata agent in failover mode with Pacemaker. The deployment and -failover procedure can be the same as in the case of LBaaS. - - -********************************************** -5 Storage and High Availability Scenarios -********************************************** - - -5.1 Elements of HA Storage Management and Delivery -====================================================== - -Storage infrastructure, in any environment, can be broken down into two -domains: Data Path and Control Path. Generally, High Availability of the -storage infrastructure is measured by the occurrence of Data -Unavailability and Data Loss (DU/DL) events. While that meaning is -obvious as it relates to the Data Path, it is also applicable to the Control -Path. The inability to attach a volume that has data to a host, -for example, can be considered a Data Unavailability event. 
Likewise, -the inability to create a volume to store data could be considered Data -Loss since it may result in the inability to store critical data. - -Storage HA mechanisms are an integral part of most High Availability -solutions today. In the first two sections below, we define the -mechanisms of redundancy and protection required in the infrastructure -for storage delivery in both the Data and Control Paths. Storage -services that have these mechanisms can be used in HA environments that -are based on a highly available storage infrastructure. - -In the third section below, we examine HA implementations that rely on -highly available storage infrastructure. Note that the scope throughout this -section is focused on local HA solutions. This does not address rapid remote -Disaster Recovery scenarios that may be provided by storage, nor -does it address metro active/active environments that implement stretched -clusters of hosts across multiple sites for workload migration and availability. - - -5.2 Storage Failure & Recovery Scenarios: Storage Data Path -============================================================ - -In the failure and recovery scenarios described below, a redundant -network infrastructure provides HA through network-related device -failures, while a variety of strategies are used to reduce or minimize -DU/DL events based on storage system failures. This starts with redundant -storage network paths, as shown in Fig 29. - -.. figure:: StorageImages/RedundantStoragePaths.png - :alt: HA Storage Infrastructure - :figclass: align-center - - Fig 29: Typical Highly Available Storage Infrastructure - -Storage implementations vary tremendously, and the recovery mechanisms -for each implementation will vary. 
The scenarios described below are -limited to 1) high level descriptions of the most common implementations -since it is unpredictable as to -which storage implementations may be used for NFVI; 2) HW- and -SW-related failures (and recovery) of the storage data path, and not -anything associated with user configuration and operational issues which -typically create the most common storage failure scenarios; 3) -non-LVM/DAS based storage implementations (managing failure and recovery -in LVM-based storage for OpenStack is a very different scenario with -less of a reliable track record); and 4) I will assume block storage -only, and not object storage, which is often used for stateless -applications (at a high level, object stores may include a -subset of the block scenarios under the covers). - -To define the requirements for the data path, I will start at the -compute node and work my way down the storage IO stack and touch on both -HW and SW failure/recovery scenarios for HA along the way. I will use Fig 29 as a reference. - -1. Compute IO driver: Assuming iSCSI for connectivity between the -compute and storage, an iSCSI initiator on the compute node maintains -redundant connections to multiple iSCSI targets for the same storage -service. These redundant connections may be aggregated for greater -throughput, or run independently. This redundancy allows the iSCSI -Initiator to handle failures in network connectivity from compute to -storage infrastructure. (Fibre Channel works largely the same way, as do -proprietary drivers that connect a host's IO stack to storage systems). - -2. Compute node network interface controller (NIC): This device may -fail, and said failure is reported via whatever means is in place for such -reporting from the host. The redundant paths between iSCSI initiators and -targets will allow connectivity from compute to storage to remain up, -though operating at reduced capacity. - -3. 
Network Switch failure for storage network: Assuming there are -redundant switches in place, and everything is properly configured so -that two compute NICs go to two separate switches, which in turn go to -two different storage controllers, then a switch may fail and the -redundant paths between iSCSI initiators and targets allow connectivity -from compute to storage to remain operational, though operating at reduced -capacity. - -4. Storage system network interface failure: Assuming there are -redundant storage system network interfaces (on separate storage -controllers), then one may fail and the redundant paths between iSCSI -initiators and targets allow connectivity from compute to storage to -remain operational, though operating at reduced performance. The extent -of the reduced performance is dependent upon the storage architecture. -See 3.5 for more. - -5. Storage controller failure: A storage system can, at a very high -level, be described as composed of network interfaces, one or more -storage controllers that manage access to data, and a shared Data Path -access to the HDD/SSD subsystem. The network interface failure is -described in #4, and the HDD/SSD subsystem is described in #6. All -modern storage architectures have either redundant or distributed -storage controller architectures. In **dual storage controller -architectures**, high availability is maintained through the ALUA -protocol maintaining access to primary and secondary paths to iSCSI -targets. Once a storage controller fails, the array operates in -(potentially) degraded performance mode until the failed storage controller is -replaced. The degree of reduced performance is dependent on the overall -original load on the array. Dual storage controller arrays also remain at risk -of a Data Unavailability event if the second storage controller should fail. -This is rare, but should be accounted for in planning support and -maintenance contracts. 
- -**Distributed storage controller architectures** are generally server-based, -which may or may not operate on the compute servers in Converged -Infrastructure environments. Hence the concept of storage controller -is abstract in that it may involve a distribution of software components -across multiple servers. Examples: Ceph and ScaleIO. In these environments, -the data may be stored -redundantly, and metadata for accessing the data in these redundant -locations is available for whichever compute node needs the data (with -authorization, of course). Data may also be stored using erasure coding -(EC) for greater efficiency. The loss of a storage controller in this -context leads to a discussion of impact caused by loss of a server in -this distributed storage controller architecture. In the event of such a loss, -if data is held in duplicate or triplicate on other servers, then access -is simply redirected to maintain data availability. In the case of -EC-based protection, then the data is simply re-built on the fly. The -performance and increased risk impact in this case is dependent on the -time required to rebalance storage distribution across other servers in -the environment. Depending on configuration and implementation, it could -impact storage access performance to VNFs as well. - -6. HDD/SSD subsystem: This subsystem contains any RAID controllers, -spinning hard disk drives, and Solid State Drives. The failure of a RAID -controller is equivalent to failure of a storage controller, as -described in 5 above. The failure of one or more storage devices is -protected by either RAID parity-based protection, Erasure Coding -protection, or duplicate/triplicate storage of the data. RAID and -Erasure Coding are typically more efficient in terms of space -efficiency, but duplicate/triplicate provides better performance. 
This -tradeoff is a common point of contention among implementations, and this -will not go into greater detail than to assume that failed devices do -not cause Data Loss events due to these protection algorithms. Multiple -device failures can potentially cause Data Loss events, and the risk of -each method must be taken into consideration for the HA requirements of -the desired deployment. - -5.3 Storage Failure & Recovery Scenarios: Storage Control Path -=============================================================== - -As it relates to an NFVI environment, as proposed by OPNFV, there are -two parts to the storage control path. - -* The storage system-specific control path to the storage controller - -* The OpenStack-specific cloud management framework for managing different storage elements - - -5.3.1 Storage System Control Paths ---------------------------------------- - -High Availability of a storage controller is storage -system-specific. Breaking it down to implementation variants is the best -approach. However, both variants assume an IP-based management API in -order to leverage network redundancy mechanisms for ubiquitous -management access. - -An appliance style storage array with dual storage controllers must implement IP -address failover for the management API's IP endpoint in either an -active/active or active/passive configuration. Likewise, a storage array -with >2 storage controllers would bring up a management endpoint on -another storage controller in such an event. Cluster-style IP address load -balancing is also a viable implementation in these scenarios. - -In the case of distributed storage controller architectures, the storage system -provides redundant storage controller interfaces. E.g., Ceph's RADOS provides -redundant paths to access an OSD for volume creation or access. In EMC's -ScaleIO, there are redundant MetaData Managers for managing volume -creation and access. 
In the case of the former, the access is via -proprietary protocol, in the case of the latter, it is via HTTP-based -REST API. Other storage implementations may also provide alternative -methods, but any enterprise-class storage system will have built-in HA -for management API access. - -Finally, note that single server-based storage solutions, such as LVM, -do not have HA solutions for control paths. If the server is failed, the -management of that server's storage is not available. - -5.3.2 OpenStack Controller Management -------------------------------------------- - -OpenStack cloud management is comprised of a number of different -function-specific management modules such as Keystone for Identity and -Access management (IAM), Nova for compute management, Cinder for block -storage management, Swift for Object Storage delivery, Neutron for -Network management, and Glance as an image repository. In smaller -single-cloud environments, these management systems are managed in -concert for High Availability; in larger multi-cloud environments, the -Keystone IAM may logically stand alone in its own HA delivery across the -multiple clouds, as might Swift as a common Object Store. Nova, Cinder, -and Glance may have separate scopes of management, but they are more -typically managed together as a logical cloud deployment. - -It is the OpenStack deployment mechanisms that are responsible for HA -deployment of these HA management infrastructures. These tools, such as -Fuel, RDO, and others, have matured to include highly available -implementations for the database, the API, and each of the manager -modules associated with the scope of cloud management domains. - -There are many interdependencies among these modules that impact Cinder high availability. -For example: - -* Cinder is implemented as an Active/Standby failover implementation since it requires a single point of control at one time for the Cinder manager/driver implementation. 
The Cinder manager/driver is deployed on two of the three OpenStack controller nodes, and one is made active while the other is passive. This may be improved to active/active in a future release. - -* A highly available database implementation must be delivered using something like MySQL/Galera replication across the 3 OpenStack controller nodes. Cinder requires an HA database in order for it to be HA. - -* A redundant RabbitMQ messaging implementation across the same three OpenStack controller nodes. Likewise, Cinder requires an HA messaging system. - -* A redundant OpenStack API to ensure Cinder requests can be delivered. - -* An HA Cluster Manager, like Pacemaker, for monitoring each of the deployed manager elements on the OpenStack controllers, with restart capability. Keepalived is an alternative implementation for monitoring processes and restarting on alternate OpenStack controller nodes. While statistics are lacking, it is generally believed that Pacemaker is more frequently used in HA environments. - - -For more information on OpenStack and Cinder HA, see http://docs.openstack.org/ha-guide -for current thinking. - -While the specific combinations of management functions in these -redundant OpenStack controllers may vary with the specific small/large environment -deployment requirements, the basic implementation of redundancy across three OpenStack -controllers remains relatively common. In these implementations, the -highly available OpenStack controller environment provides HA access to -the highly available storage controllers via the highly available IP -network. - - -5.4 The Role of Storage in HA -=============================== - -In the sections above, we describe data and control path requirements -and example implementations for delivery of highly available storage -infrastructure. In summary: - -* Most modern storage infrastructure implementations are inherently highly available. 
Exceptions certainly apply; e.g., simply using LVM for storage presentation at each server does not satisfy HA requirements. However, modern storage systems such as Ceph, ScaleIO, XIV, VNX, and many others with OpenStack integrations, certainly do have such HA capabilities. - -* This is predominantly through network-accessible shared storage systems in tightly coupled configurations such as clustered hosts, or in loosely coupled configurations such as with global object stores. - - -Storage is an integral part of HA delivery today for applications, -including VNFs. This is examined below in terms of using storage as a -key part of HA delivery, the possible scope and limitations of that -delivery, and example implementations for delivery of such service. We -will examine this for both block and object storage infrastructures below. - -5.4.1 VNF, VNFC, and VM HA in a Block Storage HA Context ----------------------------------------------------------- - -Several scenarios were described in another section with regard to -managing HA at the VNFC level, with variants of recovery based on either -VIM- or VNFM-based reporting/detection/recovery mechanisms. In a block -storage environment, these differentiations are abstract and -meaningless, regardless of whether it is or is not intended to be HA. - -In a block storage context, HA is delivered via a logical block device -(sometimes called a Logical Unit, or LUN), or in some cases, to a VM. -VM and logical block devices are the units of currency. - -.. figure:: StorageImages/HostStorageCluster.png - :alt: Host Storage Cluster - :figclass: align-center - - Fig 30: Typical HA Cluster With Shared Storage - -In Fig 30, several hosts all share access, via an IP network -or via Fibre Channel, to a common set of logical storage devices. In an -ESX cluster implementation, these hosts all access all devices with -coordination provided with the SCSI Reservation mechanism. 
In the -particular ESX case, the logical storage devices provided by the storage -service actually aggregate volumes (VMDKs) utilized by VMs. As a result, -multiple host access to the same storage service logical device is -dynamic. The vSphere management layer provides for host cluster -management. - -In other cases, such as for KVM, cluster management is not formally -required, per se, because each logical block device presented by the -storage service is uniquely allocated for one particular VM which can -only execute on a single host at a time. In this case, any host that can -access the same storage service is potentially a part of the "cluster". -While *potential* access from another host to the same logical block -device is necessary, the actual connectivity is restricted to one host -at a time. This is more of a loosely coupled cluster implementation, -rather than the tightly coupled cluster implementation of ESX. - -So, if a single VNF is implemented as a single VM, then HA is provided -by allowing that VM to execute on a different host, with access to the -same logical block device and persistent data for that VM, located on -the storage service. This also applies to multiple VNFs implemented -within a single VM, though it impacts all VNFs together. - -If a single VNF is implemented across multiple VMs as multiple VNFCs, -then all of the VMs that comprise the VNF may need to be protected in a consistent -fashion. The storage service is not aware of the -distinction from the previous example. However, a higher level -implementation, such as an HA Manager (perhaps implemented in a VNFM) -may monitor and restart a collection of VMs on alternate hosts. In an ESX environment, -VM restarts are most expeditiously handled by using vSphere-level HA -mechanisms within an HA cluster for individual or collections of VMs. 
-In KVM environments, a separate HA -monitoring service, such as Pacemaker, can be used to monitor individual -VMs, or entire multi-VM applications, and provide restart capabilities -on separately configured hosts that also have access to the same logical -storage devices. - -VM restart times, however, are measured in 10's of seconds. This may -sometimes meet the SAL-3 recovery requirements for General Consumer, -Public, and ISP Traffic, but will never meet the 5-6 seconds required -for SAL-1 Network Operator Control and Emergency Services. For this, -additional capabilities are necessary. - -In order to meet SAL-1 restart times, it is necessary to have: 1. A hot -spare VM already up and running in an active/passive configuration 2. -Little-to-no-state update requirements for the passive VM to takeover. - -Having a spare VM up and running is easy enough, but putting that VM in -an appropriate state to take over execution is the difficult part. In shared storage -implementations for Fault Tolerance, which can achieve SAL-1 requirements, -the VMs share access to the same storage device, and another wrapper function -is used to update internal memory state for every interaction to the active -VM. - -This may be done in one of two ways, as illustrated in Fig 31. In the first way, -the hypervisor sends all interface interactions to the passive as well -as the active VM. The interaction is handled completely by -hypervisor-to-hypervisor wrappers, as represented by the purple box encapsulating -the VM in Figure 31, and is completely transparent to the VM. -This is available with the vSphere Fault Tolerant option, but not with -KVM at this time. - -.. 
figure:: StorageImages/FTCluster.png - :alt: FT host and storage cluster - :figclass: align-center - - Fig 31: A Fault Tolerant Host/Storage Configuration - -In the second way, a VM-level wrapper is used to capture checkpoints of -state from the active VM and transfer these to the passive VM, similarly represented -as the purple box encapsulating the VM in Fig 31. There -are various levels of application-specific integration required for this -wrapper to capture and transfer checkpoints of state, depending on the -level of state consistency required. OpenSAF is an example of an -application wrapper that can be used for this purpose. Both techniques -have significant network bandwidth requirements and may have certain -limitations and requirements for implementation. - -In both cases, the active and passive VMs share the same storage infrastructure, -although the OpenSAF implementation may also utilize separate storage infrastructure -(not shown in Fig 31). - -Looking forward to the long term, both of these may be made obsolete. As soon as 2016, -PCIe fabrics will start to be available that enable shared NVMe-based -storage systems. While these storage systems may be used with -traditional protocols like SCSI, they will also be usable with true -NVMe-oriented applications whose memory state is persisted, and can be -shared, in an active/passive mode across hosts. The HA mechanisms here -are yet to be defined, but are expected to be far superior to either of the -mechanisms described above. This is still in the future. - - -5.4.2 HA and Object stores in loosely coupled compute environments -------------------------------------------------------------------- - -Whereas block storage services require tight coupling of hosts to -storage services via SCSI protocols, the interaction of applications -with HTTP-based object stores utilizes a very loosely coupled -relationship.
This means that VMs can come and go, or be organized as an -N+1 redundant deployment of VMs for a given VNF. Each individual object -transaction constitutes the duration of the coupling, whereas with -SCSI-based logical block devices, the coupling is active for the -duration of the VM's mounting of the device. - -However, the requirement for implementation here is that the state of a -transaction being performed is made persistent to the object store by -the VM, as the restartable checkpoint for high availability. Multiple -VMs may access the object store somewhat simultaneously, and it is -required that each object transaction is made idempotent by the -application. - -HA restart of a transaction in this environment is dependent on failure -detection and transaction timeout values for applications calling the -VNFs. These may be rather high and even unachievable for the SAL -requirements. For example, while the General Consumer, Public, and ISP -Traffic recovery time for SAL-3 is 20-25 seconds, default browser -timeouts are upwards of 120 seconds. Common default timeouts for -applications using HTTP are typically around 10 seconds or higher -(browsers are upward of 120 seconds), so this puts a requirement on the -load balancers to manage and restart transactions in a timeframe that -may be a challenge to meeting even SAL-3 requirements. - -Despite these issues of performance, the use of object storage for highly -available solutions in native cloud applications is very powerful. Object -storage services are generally globally distributed and replicated using -eventual consistency techniques, though transaction-level consistency can -also be achieved in some cases (at the cost of performance). (For an interesting -discussion of this, lookup the CAP Theorem.) - - -5.5 Summary -============= - -This section addressed several points: - -* Modern storage systems are inherently Highly Available based on modern and reasonable implementations and deployments. 
- -* Storage is typically a central component in offering highly available infrastructures, whether for block storage services for traditional applications, or through object storage services that may be shared globally with eventual consistency. - -* Cinder HA management capabilities are defined and available through the use of OpenStack deployment tools, making the entire storage control and data paths highly available. - -************************** -6 Multisite Scenario -************************** - -The Multisite scenario refers to the cases when VNFs are deployed on multiple VIMs. -There could be three typical use cases for such a scenario. - -The first is the deployment of multiple OpenStack clouds in one DC. Taking into consideration that the -number of compute nodes in one OpenStack cloud is quite limited (nearly 100) for -both open source and commercial OpenStack products, multiple OpenStack clouds will -have to be deployed in the DC to manage thousands of servers. In such a DC, it should -be possible to deploy VNFs across OpenStack clouds. - - -Another typical use case is Geographic Redundancy (GR). GR deployment deals with more -catastrophic failures (flood, earthquake, propagating software fault, etc.) of a single site. -In the Geographic Redundancy use case, VNFs are deployed in two sites, which are -geographically separated and are deployed on NFVI managed by separate VIMs. When -such a catastrophic failure happens, the VNFs at the failed site can fail over to -the redundant one so as to continue the service. Different VNFs may have specific -requirements for such failover. Some VNFs may need stateful failover, while others -may just need their VMs restarted on the redundant site in their initial state. -The first would create the overhead of state replication. The latter may still -have state replication through the storage.
Accordingly, for storage we don't want -to lose any data, and for networking the NFs should be connected the same way as -they were in the original site. We probably also want to have the same number of -VMs on the redundant site coming up for the VNFs. - - -The third use case is maintenance. When one site is planning maintenance, -it should first replicate the services to another site before it stops them. Such -replication should not disturb the service, nor should it cause any data loss. The -service at the second site should be running before the first site is stopped and -begins maintenance. In such a case, the multisite schemes may be used. - -The multisite scenario is also captured by the Multisite project, in which specific -requirements on OpenStack are also proposed for different use cases. However, -the Multisite project mainly focuses on the requirements of these multisite -use cases on OpenStack. HA requirements are not necessarily the requirements -for the approaches discussed in Multisite, while the HA project tries to -capture the HA requirements in these use cases. The following links are the scenarios -and use cases discussed in the Multisite project. -https://gerrit.opnfv.org/gerrit/#/c/2123/ https://gerrit.opnfv.org/gerrit/#/c/1438/. - -************************* -7 Concluding remarks -************************* - -This scenario analysis document outlined the model and some failure modes for NFV systems. This is an -initial list. The OPNFV HA project team is continuing to grow the list of scenarios and will -issue additional documents going forward. The basic use cases and service availability considerations -help define the key considerations for each use case taking into account the impact on the end service. -The use case document along with the requirements documents and gap analysis helps set context for -engagement with various upstream projects.
- -Reference -========== - -* OpenStack HA guide: http://docs.openstack.org/ha-guide/networking-ha.html - -* L3 High Availability VRRP: https://wiki.openstack.org/wiki/Neutron/L3_High_Availability_VRRP
\ No newline at end of file diff --git a/Scenario_1/Scenario_Analysis_Communication_Interfaces.rst b/Scenario_1/Scenario_Analysis_Communication_Interfaces.rst deleted file mode 100644 index c97776b..0000000 --- a/Scenario_1/Scenario_Analysis_Communication_Interfaces.rst +++ /dev/null @@ -1,80 +0,0 @@ -3. Communication Interfaces for VNF HA schemes -=========================================================== - -This section discusses some general issues about communication interfaces -in the VNF HA schemes. Section 2 discusses the use cases of both stateful and -stateless VNFs. This section discusses -some specific issues that are common to all the use cases proposed -in the previous sections. - -3.1. VNF External Interfaces - -Regardless of whether the VNF is stateful or stateless, all the VNFCs should act as -a single unit from the perspective of the outside world. That means all the VNFCs should -share a common interface through which the outside modules (e.g., the other VNFs) can -access the service. There could be multiple solutions for this sharing of the IP -interface. However, all of this sharing and switching of the IP address should be -transparent to the outside modules. - -There are several approaches for the VNFs to share the interfaces. A few of them -are listed as follows and will be discussed in detail. - -1) A shared IP address for active/standby VMs. - -2) Load balancers for active/active use cases - -Note that a combination of these two approaches is also feasible. - -For active/standby VNFCs, there is a common IP address shared by the VMs hosting -the active and standby VNFCs, so that they look like one instance from the outside. -The HA manager will manage the assignment of the IP address to the VMs. -(The HA manager may not be aware of this, i.e. the address may be configured -and the active/standby state management is linked to the possession of the IP -address, i.e. the active VNFC claims it as part of becoming active.)
Only the -active one possesses the IP address. When failover happens, the standby -is set to be active and can take possession of the IP address to continue traffic -processing. - - -For active/active VNFCs, an LB (Load Balancer) could be used. In such a scenario, there -could be two cases for the deployment and usage of the LB. - -Case 1: LB used in front of a cluster of VNFCs to distribute the traffic flow. - -In such a case, the LB is deployed in front of a cluster of multiple VNFCs. Such a -cluster can be managed by a separate cluster manager, or can be managed just -by the LB, which uses heartbeat to monitor each VNFC. When one of the VNFCs fails, -the cluster manager should first exclude the failed VNFC from the cluster so that -the LB will re-route the traffic to the other VNFCs, and then the failed one should -be recovered. In the case when the LB is acting as the cluster manager, it is -the LB's responsibility to inform the VNFM to recover the failed VNFC if possible. - - -Case 2: LB used in front of a cluster of VMs to distribute traffic flow. - -In this case, there exists a cluster manager (e.g. Pacemaker) to monitor and manage -the VMs in the cluster. The LB sits in front of the VM cluster so as to distribute -the traffic. When one of the VMs fails, the cluster manager will detect that and will -be in charge of the recovery. The cluster manager will also exclude the failed VM -from the cluster, so that the LB won't route traffic to the failed one. - -In both cases, the HA of the LB should also be considered. - - -3.2. Intra-VNF Communication - -For stateful VNFs, data synchronization is necessary between the active and standby VMs. -The HA manager is responsible for handling VNFC failover, and for assigning the -active/standby states between the VNFCs of the VNF. Data synchronization can be handled -either by the HA manager or by the VNFC itself.
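The HA manager's assignment of active/standby states and its handling of VNFC failover, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not an OPNFV, OpenStack, or Pacemaker API: the class ``HAManager``, its methods, and the VNFC identifiers are hypothetical names chosen for this example, and data synchronization is deliberately left out.

```python
class HAManager:
    """Toy HA manager tracking active/standby roles of VNFCs (hypothetical)."""

    def __init__(self, vnfc_ids):
        if len(vnfc_ids) < 2:
            raise ValueError("need at least one active and one standby VNFC")
        # Assumption: the first VNFC starts as active, the rest as standbys.
        self.roles = {vnfc_id: "standby" for vnfc_id in vnfc_ids}
        self.roles[vnfc_ids[0]] = "active"

    def active(self):
        """Return the identifier of the VNFC currently in the active role."""
        return next(v for v, role in self.roles.items() if role == "active")

    def report_failure(self, vnfc_id):
        """Mark a VNFC as failed and promote a standby if the active one failed."""
        was_active = self.roles[vnfc_id] == "active"
        self.roles[vnfc_id] = "failed"
        if was_active:
            standby = next(
                (v for v, role in self.roles.items() if role == "standby"), None
            )
            if standby is None:
                raise RuntimeError("no standby VNFC left to take over")
            # The promoted VNFC would also claim the shared IP address here.
            self.roles[standby] = "active"
        return self.active()
```

A standby failure leaves the active VNFC untouched, while an active failure promotes the first remaining standby; a real HA manager would additionally trigger recovery of the failed VNFC.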
- -The state synchronization can happen as - -- direct communication between the active and the standby VNFCs - -- based on the information received from the HA manager on channel or messages using a common queue, - -- it could be through a shared storage assigned to the whole VNF - -- through the checkpointing of state information via underlying memory and/or -database checkpointing services to a separate VM and storage repository. diff --git a/Scenario_1/scenario_analysis_VNF_external_interface.rst b/Scenario_1/scenario_analysis_VNF_external_interface.rst deleted file mode 100644 index c634c20..0000000 --- a/Scenario_1/scenario_analysis_VNF_external_interface.rst +++ /dev/null @@ -1,99 +0,0 @@ -3. Communication Interfaces for VNF HA schemes -=========================================================== - -This section discusses some general issues about communication interfaces -in the VNF HA schemes. Section 2 discusses the use cases of both stateful and -stateless VNFs. This section discusses -some specific issues that are common to all the use cases proposed -in the previous sections. - -3.1. VNF External Interfaces - -Regardless of whether the VNF is stateful or stateless, all the VNFCs should act as -a single unit from the perspective of the outside world. That means all the VNFCs should -share a common interface through which the outside modules (e.g., the other VNFs) can -access the service. There could be multiple solutions for this sharing of the IP -interface. However, all of this sharing and switching of the IP address should be -transparent to the outside modules. - -There are several approaches for the VNFs to share the interfaces. A few of them -are listed as follows and will be discussed in detail. - -1) A shared IP address for active/standby VMs. - -2) Load balancers for active/active use cases - -Note that a combination of these two approaches is also feasible.
- -For active/standby VNFCs, there is a common IP address shared by the VMs hosting -the active and standby VNFCs, so that they look like one instance from the outside. -The HA manager will manage the assignment of the IP address to the VMs. -(The HA manager may not be aware of this, i.e. the address may be configured -and the active/standby state management is linked to the possession of the IP -address, i.e. the active VNFC claims it as part of becoming active.) Only the -active one possesses the IP address. When failover happens, the standby -is set to be active and can take possession of the IP address to continue traffic -processing. - -..[MT] In general I would rather say that the IP address is managed by the HA -manager and not provided. But as a concrete use case "provide" works fine. -So it depends how you want to use this text. -..[fq] Agree, Thank you! - -For active/active VNFCs, an LB (Load Balancer) could be used. In such a scenario, there -could be two cases for the deployment and usage of the LB. - -Case 1: LB used in front of a cluster of VNFCs to distribute the traffic flow. - -In such a case, the LB is deployed in front of a cluster of multiple VNFCs. Such a -cluster can be managed by a separate cluster manager, or can be managed just -by the LB, which uses heartbeat to monitor each VNFC. When one of the VNFCs fails, -the cluster manager should recover the failed one, and should also exclude the -failed VNFC from the cluster so that the LB will re-route the traffic -to the other VNFCs. In the case when the LB is acting as the cluster manager, it is -the LB's responsibility to inform the VNFM to recover the failed VNFC if possible. - - -Case 2: LB used in front of a cluster of VMs to distribute traffic flow. - -In this case, there exists a cluster manager (e.g. Pacemaker) to monitor and manage -the VMs in the cluster. The LB sits in front of the VM cluster so as to distribute -the traffic.
When one of the VMs fails, the cluster manager will detect that and will -be in charge of the recovery. The cluster manager will also exclude the failed VM -from the cluster, so that the LB won't route traffic to the failed one. - -In both cases, the HA of the LB should also be considered. - -..[MT] I think this use case needs to show also how the LB learns about the new VNFC. -Also we should distinguish VNFC and VM failures as VNFC failure wouldn't be detected -in the NFVI e.g. LB, so we need a resolution, an applicability comment at least. -..[fq] I think I have made a mistake here by saying the VNFC. Actually if the failure -only happens in VNFC, the VNFC should reboot itself rather than have a new VNFC taking -its place. So in this case, I think I should modify VNFC into VMs. And as you mentioned, -the NFVI level can hardly detect VNFC level failure. - -..[MT] There could also be a combined case for the N+M redundancy, when there are N -actives but also M standbys at the VNF level. -..[fq] It could be. But I actually haven't see such a deployed case. So I am not sure -if I can discribe the schemes correctly:) - -3.2. Intra-VNF Communication - -For stateful VNFs, data synchronization is necessary between the active and standby VMs. -The HA manager is responsible for handling VNFC failover, and for assigning the -active/standby states between the VNFCs of the VNF. Data synchronization can be handled -either by the HA manager or by the VNFC itself. - -The state synchronization can happen as - -- direct communication between the active and the standby VNFCs - -- based on the information received from the HA manager on channel or messages using a common queue, - -..[MT] I don't understand the yellow inserted text -..[fq] Neither do I, actually.
I think it is added by some one else and I can't make -out what it means as well:) - -- it could be through a shared storage assigned to the whole VNF - -- through in-memory database (checkpointing), when the database (checkpoint service) takes care of the data replication. diff --git a/Scenario_2/scenario_analysis_multi_site.rst b/Scenario_2/scenario_analysis_multi_site.rst deleted file mode 100644 index 2e43471..0000000 --- a/Scenario_2/scenario_analysis_multi_site.rst +++ /dev/null @@ -1,45 +0,0 @@ -6. Multisite Scenario -==================================================== - -The Multisite scenario refers to the cases when VNFs are deployed on multiple VIMs. -There could be three typical use cases for such a scenario. - -The first is the deployment of multiple OpenStack clouds in one DC. Taking into consideration that the -number of compute nodes in one OpenStack cloud is quite limited (nearly 100) for -both open source and commercial OpenStack products, multiple OpenStack clouds will -have to be deployed in the DC to manage thousands of servers. In such a DC, it should -be possible to deploy VNFs across OpenStack clouds. - - -Another typical use case is Geographic Redundancy (GR). GR deployment deals with more -catastrophic failures (flood, earthquake, propagating software fault, etc.) of a single site. -In the Geographic Redundancy use case, VNFs are deployed in two sites, which are -geographically separated and are deployed on NFVI managed by separate VIMs. When -such a catastrophic failure happens, the VNFs at the failed site can fail over to -the redundant one so as to continue the service. Different VNFs may have specific -requirements for such failover. Some VNFs may need stateful failover, while others -may just need their VMs restarted on the redundant site in their initial state. -The first would create the overhead of state replication. The latter may still -have state replication through the storage.
Accordingly, for storage we don't want -to lose any data, and for networking the NFs should be connected the same way as -they were in the original site. We probably also want to have the same number of -VMs on the redundant site coming up for the VNFs. - - -The third use case is maintenance. When one site is planning maintenance, -it should first replicate the services to another site before it stops them. Such -replication should not disturb the service, nor should it cause any data loss. The -service at the second site should be running before the first site is stopped and -begins maintenance. In such a case, the multisite schemes may be used. - -The multisite scenario is also captured by the Multisite project, in which specific -requirements on OpenStack are also proposed for different use cases. However, -the Multisite project mainly focuses on the requirements of these multisite -use cases on OpenStack. HA requirements are not necessarily the requirements -for the approaches discussed in Multisite, while the HA project tries to -capture the HA requirements in these use cases. The following links are the scenarios -and use cases discussed in the Multisite project. -https://gerrit.opnfv.org/gerrit/#/c/2123/ -https://gerrit.opnfv.org/gerrit/#/c/1438/. - - diff --git a/Section6_VNF_HA.rst b/Section6_VNF_HA.rst deleted file mode 100644 index afc84ac..0000000 --- a/Section6_VNF_HA.rst +++ /dev/null @@ -1,329 +0,0 @@ -======================= -6 VNF High Availability -======================= - - -************************ -6.1 Service Availability -************************ - -In the context of NFV, Service Availability refers to the End-to-End (E2E) Service -Availability which includes all the elements in the end-to-end service (VNFs and -infrastructure components) with the exception of the customer terminal such as -handsets, computers, modems, etc. The service availability requirements for NFV -should be the same as those for legacy systems (for the same service).
- -Service Availability = -total service available time / -(total service available time + total service recovery time) - -The service recovery time depends, among other factors, on the number of redundant resources -provisioned and/or instantiated that can be used for restoring the service. - -In the E2E relation a Network Service is available only if all the necessary -Network Functions are available and interconnected appropriately to collaborate -according to the NF chain. - -General Service Availability Requirements -========================================= - -* We need to be able to define the E2E (V)NF chain based on which the E2E availability - requirements can be decomposed into requirements applicable to individual VNFs and - their interconnections -* The interconnection of the VNFs should be logical and be maintained by the NFVI with - guaranteed characteristics, e.g. in case of failure the connection should be - restored within the acceptable tolerance time -* These characteristics should be maintained across scenarios such as VM migration, failover, switchover, - and scale in/out -* It should be possible to prioritize the different network services and their VNFs. - These priorities should be used when pre-emption policies are applied due to - resource shortage for example. -* VIM should support policies to prioritize a certain VNF. -* VIM should be able to provide classified virtual resources to VNFs at different SALs - -6.1.1 Service Availability Classification Levels -================================================ - -The three Service Availability Levels (SAL) defined in [ETSI-NFV-REL_] -are classified in Table 1. They are based on the relevant ITU-T recommendations -and reflect the service types and the customer agreements a network operator should -consider. - -.. 
[ETSI-NFV-REL] `ETSI GS NFV-REL 001 V1.1.1 (2015-01) <http://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_NFV-REL001v010101p.pdf>`_ - - -*Table 1: Service Availability classification levels* - -+-------------+-----------------+-----------------------+---------------------+ -|SAL Type | Customer Type | Service/Function | Notes | -+=============+=================+=======================+=====================+ -|Level 1 | Network Operator| * Intra-carrier | Sub-levels within | -| | Control Traffic | engineering | Level 1 may be | -| | | traffic | created by the | -| | Government/ | * Emergency | Network Operator | -| | Regulatory | telecommunication | depending on | -| | Emergency | service (emergency | Customer demands | -| | Services | response, emergency| E.g.: | -| | | dispatch) | | -| | | * Critical Network | * 1A - Control; | -| | | Infrastructure | * 1B - Real-time; | -| | | Functions (e.g | * 1C - Data; | -| | | VoLTE functions | | -| | | DNS Servers,etc.) | May require 1+1 | -| | | | Redundancy with | -| | | | Instantaneous | -| | | | Switchover | -+-------------+-----------------+-----------------------+---------------------+ -|Level 2 | Enterprise and/ | * VPN | Sub-levels within | -| | or large scale | * Real-time traffic | Level 2 may be | -| | customers | (Voice and video) | created by the | -| | (e.g. | * Network | Network Operator | -| | Corporations, | Infrastructure | depending on | -| | University) | Functions | Customer demands. | -| | | supporting Level | E.g.: | -| | Network | 2 services (e.g. 
| | -| | Operators | VPN servers, | * 2A - VPN; | -| | (Tier1/2/3) | Corporate Web/ | * 2B - Real-time; | -| | service traffic | Mail servers) | * 2C - Data; | -| | | | | -| | | | May require 1:1 | -| | | | Redundancy with | -| | | | Fast (maybe | -| | | | Instantaneous) | -| | | | Switchover | -+-------------+-----------------+-----------------------+---------------------+ -|Level 3 | General Consumer| * Data traffic | While this is | -| | Public and ISP | (including voice | typically | -| | Traffic | and video traffic | considered to be | -| | | provided by OTT) | "Best Effort" | -| | | * Network | traffic, it is | -| | | Infrastructure | expected that | -| | | Functions | Network Operators | -| | | supporting Level | will devote | -| | | 3 services | sufficient | -| | | | resources to | -| | | | assure | -| | | | "satisfactory" | -| | | | levels of | -| | | | availability. | -| | | | This level of | -| | | | service may be | -| | | | pre-empted by | -| | | | those with | -| | | | higher levels of | -| | | | Service | -| | | | Availability. May | -| | | | require M+1 | -| | | | Redundancy with | -| | | | Fast Switchover; | -| | | | where M > 1 and | -| | | | the value of M to | -| | | | be determined by | -| | | | further study | -+-------------+-----------------+-----------------------+---------------------+ - -Requirements -^^^^^^^^^^^^ - -* It shall be possible to define different service availability levels -* It shall be possible to classify the virtual resources for the different - availability class levels -* The VIM shall provide a mechanism by which VNF-specific requirements - can be mapped to NFVI-specific capabilities. - -More specifically, the requirements and capabilities may or may not be made up of the -same KPI-like strings, but the cloud administrator must be able to configure which -HA-specific VNF requirements are satisfied by which HA-specific NFVI capabilities. 
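The administrator-configured mapping described above can be sketched as a simple lookup table. This is only an illustration under stated assumptions: the requirement and capability strings, the table ``REQUIREMENT_TO_CAPABILITIES``, and both functions are hypothetical and do not come from ETSI, OPNFV, or any VIM API.

```python
# Hypothetical administrator-defined mapping: which HA-specific NFVI
# capabilities satisfy which HA-specific VNF requirement.
REQUIREMENT_TO_CAPABILITIES = {
    "sal1-recovery": {"ft-host-pair", "onsite-hot-standby"},
    "sal2-recovery": {"onsite-hot-standby", "ha-cluster-restart"},
    "sal3-recovery": {"ha-cluster-restart", "offsite-restart"},
}


def satisfying_capabilities(requirement, offered):
    """Return the offered NFVI capabilities that satisfy one VNF requirement."""
    allowed = REQUIREMENT_TO_CAPABILITIES.get(requirement, set())
    return sorted(allowed & set(offered))


def can_place(requirements, offered):
    """True if every VNF requirement is met by at least one offered capability."""
    return all(satisfying_capabilities(r, offered) for r in requirements)
```

Keeping requirement names and capability names in separate vocabularies, joined only by the table, reflects the point that the two need not be made up of the same KPI-like strings. An unknown requirement maps to no capability, so placement is refused rather than silently allowed.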
- - - -6.1.2 Metrics for Service Availability -====================================== - -[ETSI-NFV-REL_] identifies four metrics relevant to service -availability: - -* Failure recovery time, -* Failure impact fraction, -* Failure frequency, and -* Call drop rate. - -6.1.2.1 Failure Recovery Time -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The failure recovery time is the time interval from the occurrence of an abnormal -event (e.g. failure, manual interruption of service, etc.) until the recovery of the -service, regardless of whether it is a scheduled or unscheduled abnormal event. For the -unscheduled case, the recovery time includes the failure detection time and the -failure restoration time. -More specifically, restoration also allows for a service recovery by the restart of -the failed provider(s) while failover implies that the service is recovered by a -redundant provider taking over the service. This provider may be a standby -(i.e. synchronizing the service state with the active provider) or a spare -(i.e. having no state information). Accordingly failover also means switchover, that -is, an orderly takeover of the service from the active provider by the standby/spare. - -Requirements -^^^^^^^^^^^^ - -* It should be irrelevant whether the abnormal event is due to a scheduled or - unscheduled operation or it is caused by a fault. -* Failure detection mechanisms should be available in the NFVI and configurable so - that the target recovery times can be met -* Abnormal events should be logged and communicated (i.e. notifications and alarms as - appropriate) - -The TL-9000 forum has specified a service interruption time of 15 seconds as outage -for all traditional telecom system services. [ETSI-NFV-REL_] -recommends the setting of different thresholds for the different Service Availability -Levels. An example setting is given in Table 2. Note that for all -Service Availability levels Real-time Services require the fastest recovery time.
-Data services can tolerate longer recovery times. These recovery times are applicable -to the user plane. A failure in the control plane does not have to impact the user plane. -The main concern should be simultaneous failures in the control and user planes -as the user plane cannot typically recover without the control plane. However an HA -mechanism in VNF itself can further mitigate the risk. Note also that the impact on -the user plane depends on the control plane service experiencing the failure, -some of them are more critical than others. - - -*Table 2: Example service recovery times for the service availability levels* - -+------------+-----------------+------------------------------------------+ -|SAL | Service | Notes | -| | Recovery | | -| | Time | | -| | Threshold | | -+============+=================+==========================================+ -|1 | 5 - 6 seconds | Recommendation: Redundant resources to be| -| | | made available on-site to ensure fast | -| | | recovery. | -+------------+-----------------+------------------------------------------+ -|2 | 10 - 15 seconds | Recommendation: Redundant resources to be| -| | | available as a mix of on-site and off- | -| | | site as appropriate. | -| | | | -| | | * On-site resources to be utilized for | -| | | recovery of real-time services. | -| | | * Off-site resources to be utilized for | -| | | recovery of data services. | -+------------+-----------------+------------------------------------------+ -|3 | 20 - 25 seconds | Recommendation: Redundant resources to be| -| | | mostly available off-site. 
Real-time | -| | | services should be recovered before data | -| | | services | -+------------+-----------------+------------------------------------------+ - - -6.1.2.2 Failure Impact Fraction -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The failure impact fraction is the maximum percentage of the capacity or user -population affected by a failure compared with the total capacity or the user -population supported by a service. It is directly associated with the failure impact -zone which is the set of resources/elements of the system to which the fault may -propagate. - -Requirements -^^^^^^^^^^^^ - -* It should be possible to define the failure impact zone for all the elements of the - system -* At the detection of a failure of an element, its failure impact zone must be - isolated before the associated recovery mechanism is triggered -* If the isolation of the failure impact zone is unsuccessful the isolation should be - attempted at the next higher level as soon as possible to prevent fault propagation. -* It should be possible to define different levels of failure impact zones with - associated isolation and alarm generation policies -* It should be possible to limit the collocation of VMs to reduce the failure impact - zone as well as to provide sufficient resources - -6.1.2.3 Failure Frequency -^^^^^^^^^^^^^^^^^^^^^^^^^ - -Failure frequency is the number of failures in a certain period of time. - -Requirements -^^^^^^^^^^^^ - -* There should be a probation period for each failure impact zones within which - failures are correlated. -* The threshold and the probation period for the failure impact zones should be - configurable -* It should be possible to define failure escalation policies for the different - failure impact zones - - -6.1.2.4 Call Drop Rate -^^^^^^^^^^^^^^^^^^^^^^ - -Call drop rate reflects service continuity as well as system reliability and -stability. The metric is inside the VNF and therefore is not specified further for -the NFV environment. 
- -Requirements -^^^^^^^^^^^^ - -* It shall be possible to specify for each service availability class the associated - availability metrics and their thresholds -* It shall be possible to collect data for the defined metrics -* It shall be possible to delegate the enforcement of some thresholds to the NFVI -* Accordingly it shall be possible to request virtual resources with guaranteed - characteristics, such as guaranteed latency between VMs (i.e. VNFCs), between a VM - and storage, between VNFs - - -********************** -6.2 Service Continuity -********************** - -The determining factor with respect to service continuity is the statefulness of the -VNF. If the VNF is stateless, there is no state information which needs to be -preserved to prevent the perception of service discontinuity in case of failure or -other disruptive events. -If the VNF is stateful, the NF has a service state which needs to be preserved -throughout such disruptive events in order to shield the service consumer from these -events and provide the perception of service continuity. A VNF may maintain this state -internally or externally or a combination with or without the NFVI being aware of the -purpose of the stored data. - -Requirements -============ - -* The NFVI should maintain the number of VMs provided to the VNF in the face of - failures. I.e. the failed VM instances should be replaced by new VM instances -* It should be possible to specify whether the NFVI or the VNF/VNFM handles the - service recovery and continuity -* If the VNF/VNFM handles the service recovery it should be able to receive error - reports and/or detect failures in a timely manner. -* The VNF (i.e. 
between VNFCs) may have its own fault detection mechanism, which might - be triggered prior to receiving the error report from the underlying NFVI therefore - the NFVI/VIM should not attempt to preserve the state of a failing VM if not - configured to do so -* The VNF/VNFM should be able to initiate the repair/reboot of resources of the VNFI - (e.g. to recover from a fault persisting at the VNF level => failure impact zone - escalation) -* It should be possible to disallow the live migration of VMs and when it is allowed - it should be possible to specify the tolerated interruption time. -* It should be possible to restrict the simultaneous migration of VMs hosting a given - VNF -* It should be possible to define under which circumstances the NFV-MANO in - collaboration with the NFVI should provide error handling (e.g. VNF handles local - recoveries while NFV-MANO handles geo-redundancy) -* The NFVI/VIM should provide virtual resource such as storage according to the needs - of the VNF with the required guarantees (see virtual resource classification). -* The VNF shall be able to define the information to be stored on its associated - virtual storage -* It should be possible to define HA requirements for the storage, its availability, - accessibility, resilience options, i.e. the NFVI shall handle the failover for the - storage. -* The NFVI shall handle the network/connectivity failures transparent to the VNFs -* The VNFs with different requirements should be able to coexist in the NFV Framework -* The scale in/out is triggered by the VNF (VNFM) towards the VIM (to be executed in - the NFVI) -* It should be possible to define the metrics to monitor and the related thresholds - that trigger the scale in/out operation -* Scale in operation should not jeopardize availability (managed by the VNF/VNFM), - i.e. resources can only be removed one at a time with a period in between sufficient - for the VNF to restore any required redundancy. 
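The last requirement above (remove resources one at a time, waiting in between for redundancy to be restored) can be sketched as follows. This is a minimal sketch under stated assumptions: `remove_instance` and `redundancy_restored` are hypothetical callbacks standing in for whatever the VNF/VNFM and VIM actually expose, and the polling interval and timeout are illustrative values, not figures from this document.

```python
import time


def scale_in(instances, count, remove_instance, redundancy_restored,
             poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Remove up to `count` instances, strictly one at a time.

    `remove_instance(instance)` and `redundancy_restored()` are hypothetical
    callbacks. After each removal the function waits until the VNF reports
    that its redundancy is restored; if that does not happen within
    `timeout` seconds, scale-in stops early.
    Returns the list of instances actually removed.
    """
    removed = []
    for _ in range(count):
        if not instances:
            break
        victim = instances.pop()
        remove_instance(victim)
        removed.append(victim)
        waited = 0.0
        while not redundancy_restored():
            if waited >= timeout:
                # Availability would be at risk: do not remove anything else.
                return removed
            sleep(poll_interval)
            waited += poll_interval
    return removed
```

Stopping early instead of raising keeps the already removed instances visible to the caller, which can then decide whether to retry or to scale back out.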
- diff --git a/Section_1.rst b/Section_1.rst deleted file mode 100644 index 0b8d72f..0000000 --- a/Section_1.rst +++ /dev/null @@ -1,237 +0,0 @@ -=================================================== -1.0 Overall Principle for High Availability in NFV -=================================================== - -The ultimate goal for the High Availability schema is to provide high -availability to the upper layer services. - -High availability is provided by the following steps once a failure happens: - - Step 1: failover of services once failure happens and service is out of work - - Step 2: Recovery of failed parts in each layer. - -****************************************** -1.1 Framework for High Availability in NFV -****************************************** - -Framework for Carrier Grade High availability: - -A layered approach to availability is required for the following reasons: - -* fault isolation -* fault tolerance -* fault recovery - -Among the OPNFV projects the OPNFV-HA project's focus is on requirements related -to service high availability. This is complemented by other projects such as the -OPNFV - Doctor project, whose focus is reporting and management of faults along -with maintenance, the OPNFV-Escalator project that considers the upgrade of the -NFVI and VIM, or the OPNFV-Multisite that adds geographical redundancy to the -picture. - -A layered approach allows the definition of failure domains (e.g., the -networking hardware, the distributed storage system, etc.). If possible, a fault -shall be handled at the layer (failure domain) where it occurs. If a failure -cannot be handled at its corresponding layer, the next higher layer needs to be -able to handle it. In no case, shall a failure cause cascading failures at other -layers. 
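The escalation rule in the paragraph above (handle a fault in its own layer, otherwise pass it to the next higher layer) can be sketched as a simple loop. This is an illustration only: the `handlers` mapping and the function name are hypothetical, and the layer names are taken from the layer table of this section, ordered from lowest to highest.

```python
# Layers from lowest to highest, following the layer table of this section.
LAYERS = ["Hardware", "NFVI/VIM", "Application", "Service"]


def handle_fault(fault, origin_layer, handlers):
    """Try the layer where the fault occurred, then escalate upward.

    `handlers` is a hypothetical mapping from layer name to a callable that
    returns True when it handled the fault. Returns the name of the layer
    that handled the fault, or None if no layer could.
    """
    start = LAYERS.index(origin_layer)
    # Only the originating layer and the layers above it are consulted, so a
    # fault is never pushed down into a lower failure domain.
    for layer in LAYERS[start:]:
        handler = handlers.get(layer)
        if handler is not None and handler(fault):
            return layer
    return None
```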
- -The layers are: - - -+---------------------------+-------------------------------------+ -| Service | End customer visible service | -+===========================+=====================================+ -| Application | VNFs, VNFCs | -+---------------------------+-------------------------------------+ -| NFVI/VIM | Infrastructure, VIM, VNFM, VM | -+---------------------------+-------------------------------------+ -| Hardware | Servers, COTS platforms | -+---------------------------+-------------------------------------+ - -The following document describes the various layers and how they need to -address high availability. - -*************** -1.2 Definitions -*************** - -Reference from the ETSI NFV doc. - -**Availability:** Ability of an item to be in a state to perform a required -function at a given instant of time or at any instant of time within a given -time interval, assuming that the external resources, if required, are provided. - -**Accessibility:** It is the ability of a service to access (physical) resources -necessary to provide that service. If the target service satisfies the minimum -level of accessibility, it is possible to provide this service to end users. - -**Admission control:** It is the administrative decision (e.g. by operator's -policy) to actually provide a service. In order to provide a more stable and -reliable service, admission control may require better performance and/or -additional resources than the minimum requirement. - -**Failure:** deviation of the -delivered service from fulfilling the system function. - -**Fault:** adjudged or hypothesized cause of an error - -**Service availability:** service availability of <Service X> is the long-term -average of the ratio of aggregate time between interruptions to scheduled -service time of <Service X> (expressed as a percentage) on a user-to-user basis. 
-The time between interruptions is categorized as Available (Up time) using the -availability criteria as defined by the parameter thresholds that are relevant -for <Service X>. - -According to the ETSI GS NFV-REL 001 V1.1.1 (2015-01) document, service -availability in the context of NFV is defined as End-to-End Service availability. - -.. (MT) The relevant parts in NFV-REL define SA as: - -Service Availability refers to the End-to-End Service Availability which -includes all the elements in the end-to-end service (VNFs and infrastructure -components) with the exception of the customer terminal. This is a customer -facing (end user) availability definition and it is the result of accessibility -and admission control (see their respective definitions above). - -Service Availability = total service available time / - (total service available time + total restoration time) - -**Service continuity:** Continuous delivery of service in conformance with -service's functional and behavioural specification and SLA requirements, -both in the control and data planes, for any initiated transaction or session -until its full completion even in the event of intervening exceptions or -anomalies, whether scheduled or unscheduled, malicious, intentional -or unintentional. - -The relevant parts in NFV-REL: -The basic property of service continuity is that the same service is provided -during VNF scaling in/out operations, or when the VNF offering that service -needs to be relocated to another site due to an anomaly event -(e.g. CPU overload, hardware failure or security threat). - -**Service failover:** when the instance providing a service/VNF becomes -unavailable due to fault or failure, another instance will (automatically) take -over the service, and this whole process is transparent to the user. It is -possible that an entire VNF instance becomes unavailable while providing its -service. - -.. 
(MT) I think the service or an instance of it is a logical entity on its own and the service availability and continuity is with respect to this logical entity. For example if an HTTP server serves a given URL, the HTTP server is the provider while that URL is the service it is providing. As long as I have an HTTP server running and serving this URL I have the service available. But no matter how many HTTP servers I'm running if they are not assigned to serve the URL, then it is not available. Unfortunately in the ETSI NFV documents there's not a clear distinction between the service and the provider logical entities. The distinction is more on the level of the different incarnations of the provider entity, i.e. VNF and its instances or VNFC and its instances. I don't know if I'm clear enough and to what extent we should go into this, but I tried to modify the definition along these lines. Now regarding the user perception and whether it's automatic I agreed that we want it automatic and seamless for the user, but I don't think that this is part of the failover definition. If it's done manually or if the user detects it it's still a failover. It's just not seamless. Requiring it being automatic and seamless should be in the requirement section as appropriate. - -.. (fq) Agree. - -**Service failover time:** Service failover is when the instance providing a -service becomes unavailable due to a fault or a failure and another healthy -instance takes over in providing the service. In the HA context this should be -an automatic action and this whole process should be transparent to the user. -It is possible that an entire VNF instance becomes unavailable while providing -its service. - -.. (MT) Aligned with the above I would say that the service failover time is the time from the moment of detecting the failure of the instance providing the service until the service is provided again by a new instance. - -.. 
(fq) So in such definition, the time duration for the failure of the service=failure detection time+service failover time. Am I correct? - -.. (bb) I feel, it is; "time duration for failover of the service = failure detection time + service failover time". -.. (MT) I would say that the "failure detection time" + "service failover time" = "service outage time" or actually we defined it below as the "service recovery time" . To reduce the outage we probably can't do much with the "service failover time", it is whatever is needed to perform the failover procedure, so it's tied to the implementation. It's somewhat "given". We may have more control over the detection time as that depends on the frequency of the health-check/heartbeat as this is often configurable. - -.. (fq) Got it. Agree. - -**Failure detection:** If a failure is detected, the failure must be identified -to the component responsible for correction. - -.. (MT) I would rather say "failure detection" as the fault is not detectable until it becomes a failure, even then we may not know where the actual fault is. We only know what failed due to the fault. E.g. we can detect the memory leak, something may crash due to it, but it's much more difficult to figure out where the fault is, i.e. the bug in the software. - -.. (MT) Also I think failures may be detected by different entities in the system, e.g. it could be a monitoring entity, a watchdog, the hypervisor, the VNF itself or a VNF tryng to use the services of a failed VNF. For me all these are failure detections regardless whether they are reported to the VNF. I think from an HA perspective what's important is the error report API(s) that entities should use if they detect a failure they are not in charge of correcting. -.. (fq) Agree. I modify the definition. - -**Failure detection time:** Failure detection time is the time interval from the -moment the failure occurs till it is reported as a detected failure. 
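The relation agreed in the discussion above (failure detection time plus service failover time equals service recovery time) can be checked against a recovery threshold as sketched below. This is a minimal sketch: the function and table names are hypothetical, and the thresholds are the upper bounds of the example recovery times quoted for SAL 1 to 3 (5-6 s, 10-15 s, 20-25 s), which are examples rather than normative values.

```python
# Upper bounds, in seconds, of the example service recovery times given for
# the three service availability levels. Example values only.
SAL_RECOVERY_THRESHOLD_S = {1: 6.0, 2: 15.0, 3: 25.0}


def service_recovery_time(detection_s, failover_s):
    """Service recovery time = failure detection time + service failover time."""
    return detection_s + failover_s


def meets_sal(sal, detection_s, failover_s):
    """True if detection plus failover fits the example threshold for `sal`."""
    return service_recovery_time(detection_s, failover_s) <= SAL_RECOVERY_THRESHOLD_S[sal]


def detection_budget(sal, failover_s):
    """Time left for failure detection once the failover time is fixed.

    The failover time is treated as given by the implementation, so the
    detection time (e.g. the heartbeat period) is the tunable part.
    """
    return max(0.0, SAL_RECOVERY_THRESHOLD_S[sal] - failover_s)
```

For instance, `detection_budget(1, 4.0)` leaves 2.0 seconds for detection under the SAL 1 example threshold.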
- -**Alarm:** Alarms are notifications (not queried) that are activated in response -to an event, a set of conditions, or the state of an inventory object. They -also require attention from an entity external to the reporting entity (if not -then the entity should cope with it and not raise the alarm). - -.. (MT) According to NFV-INF 004: Alarms are notifications (not queried) that are activated in response to an event, a set of conditions, or the state of an inventory object. I would add also that they also require attention from an entity external to the reporting entity (if not then the entity should cope with it and not raise the alarm). - -**Alarm threshold condition detection:** Alarm threshold condition is detected -by the component responsible for it. The component periodically evaluates the -condition associated with the alarm and if the threshold is reached, it -generates an alarm on the appropriate channel, which in turn delivers it to the -entity(ies) responsible, such as the VIM. - -.. (fq) I don't think the VNF needs to know all the alarms, so I use VIM as the terminal point for the alarm detection - -.. (MT) The same way as for the faults/failures, I don't think it's the receiving end that is important but the generating end and that it has the right and appropriate channel to communicate the alarm. But I have the impression that you are focusing on a particular type of alarm (i.e. threshold alarm) and not alarms in general. - -.. (fq) Yes, I actually had the threshold alarm in my mind when I wrote this. So I think VIM might be the right receiving end for these alarms. I agree with your ideas about the right channel. I am just not sure whether we should put this part in a high level perspective or we should define some details. After all OPNFV is an opensource project and we don't want it to be like standardization projects in ETSI. 
But I agree for the definition part we should have a high level and abstract definition for these, and then we can specify the detail channels in the API definition. - -.. (MT) I tried to modify accordingly. Pls check. I think when it comes to the receiver we don't need to be specific from the detection perspective as usually there is a well-known notification channel that the management entity if it exists would listen to. The alarm detection does not require this entity, it just states that something is wrong and someone should deal with it hence the alarm. - -**Alarm threshold detection time:** the time interval between the -metrics exceeding the threshold and the alarm being detected. - -.. (MT) I assume you are focusing on these threshold alarms, and not alarms in general. -.. (MT) Here similar to the failover time, we may have some control over the detection time (i.e. shorten the evaluation period), but may not on the delivery time. -.. (MT2) I changed "condition" to "threshold" to make it clearer as failure is a "condition" too :-) - -**Service recovery:** The restoration of the service state after the instance of -a service/VNF is unavailable due to fault or failure or manual interruption. - -.. (MT) I think the service recovery is the restoration of the state in which the required function is provided - -**Service recovery time:** Service recovery time is the time interval from the -occurrence of an abnormal event (e.g. failure, manual interruption of service, -etc.) until recovery of the service. - -.. (MT) in NFV-REL: Service recovery time is the time interval from the occurrence of an abnormal event (e.g. failure, manual interruption of service, etc.) until recovery of the service. - -**SAL:** Service Availability Level - -************************ -1.3 Overall requirements -************************ - -Service availability shall be considered with respect to the delivery of end to -end services. 
- -* There should be no single point of failure in the NFV framework -* All resiliency mechanisms shall be designed for a multi-vendor environment, - where for example the NFVI, NFV-MANO, and VNFs may be supplied by different - vendors. -* Resiliency related information shall always be explicitly specified and - communicated using the reference interfaces (including policies/templates) of - the NFV framework. - -********************* -1.4 Time requirements -********************* - -The time requirements below are examples that break out the failure -detection times, considering the service recovery times presented as examples for -the different service availability levels in the ETSI GS NFV-REL 001 V1.1.1 -(2015-01) document. - -The table below maps failure modes to example failure detection times. - -+------------------------------------------------------------+---------------+ -|Failure Mode | Time | -+============================================================+===============+ -|Failure detection of HW | <1s | -+------------------------------------------------------------+---------------+ -|Failure detection of virtual resource | <1s | -+------------------------------------------------------------+---------------+ -|Alarm threshold detection | <1min | -+------------------------------------------------------------+---------------+ -|Failure detection of SAL 1 | <1s | -+------------------------------------------------------------+---------------+ -|Recovery of SAL 1 | 5-6s | -+------------------------------------------------------------+---------------+ -|Failure detection of SAL 2 | <5s | -+------------------------------------------------------------+---------------+ -|Recovery of SAL 2 | 10-15s | -+------------------------------------------------------------+---------------+ -|Failure detection of SAL 3 | <10s | -+------------------------------------------------------------+---------------+ -|Recovery of SAL 3 | 20-25s | 
-+------------------------------------------------------------+---------------+ - diff --git a/Section_2_Hardware_HA.rst b/Section_2_Hardware_HA.rst deleted file mode 100644 index 7f4e054..0000000 --- a/Section_2_Hardware_HA.rst +++ /dev/null @@ -1,186 +0,0 @@ -=============== -2.0 Hardware HA -=============== - -The hardware HA can be solved by several legacy HA schemes. However, when -considering the NFV scenarios, a hardware failure will cause collateral damage to -not only to the services but also virtual infrastructure running on it. - -A redundant architecture and automatic failover for the hardware are required -for the NFV scenario. At the same time, the fault detection and report of HW -failure from the hardware to VIM, VNFM and if necessary the Orchestrator to achieve HA in OPNFV. A -sample fault table can be found in the Doctor project. (https://wiki.opnfv.org/doctor/faults) -All the critical hardware failures should be reported to the VIM within 1s. - -.. (MT2) Should we keep the 50ms here? Other places have been modified to <1sec, e.g. for SAL 1. - -.. (fq2) agree with 1s - -Other warnings for the hardware should also be reported to the VIM in a -timely manner. - -********************* -General Requirements: -********************* - -.. (MT) Are these general requirements or just for the servers? - -.. (fq) I think these should be the general requirements. not just the server. - -* Hardware Failures should be reported to the hypervisor and the VIM. -* Hardware Failures should not be directly reported to the VNF as in the traditional ATCA - architecture. -* Hardware failure detection message should be sent to the VIM within a specified period of time, - based on the SAL as defined in Section 1. -* Alarm thresholds should be detected and the alarm delivered to the VIM within 1min. A certain - threshold can be set for such notification. -* Direct notification from the hardware to some specific VNF should be possible. 
- Such notification should be within 1s. -* Periodical update of hardware running conditions (operational state?) to the - NFVI and VIM is required for further operation, which may include fault - prediction, failure analysis, and etc.. Such info should be updated every 60s -* Transparent failover is required once the failure of storage and network - hardware happens. -* Hardware should support SNMP and IPMI for centralized management, monitoring and - control. - -.. (MT) I would assume that this is OK if no guest was impacted, if there was a guest impact I think the VIM etc should know about the issue; in any case logging the failure and its correction would be still important -.. (fq) It seems the hardware failure detection message should send to VIM, shall we delete the hypervisor part? -.. (MT) The reason I asked the question whether this is about the servers was the hypervisor. I agree to remove this from the genaral requirement. -.. (Yifei) Shall we take VIM user (VNFM & NFVO) into consideration? As some of the messages should be send to VIM user. -.. (fq) yifei, I am a little bit confused, do you mean the Hardware send messages directly to VIM user? I myself think this may not be possible? -.. (Yifei) Yes, ur right, they should be sent to VIM first. -.. (MT) I agree, they should be sent to the VIM, the hypervisor can only be conditional because it may not be relevant as in a general requirement or may be dead with the HW. -.. (fq) Agree. I have delete the hypervisor part so that it is not a general requirement. -.. may require realtime features in openstack - -.. (fq) We may need some discussion about the time constraints? including failure detection time, VNF failover time, warning for abnormal situations. A table might be needed to clearify these. Different level of VNF may require differnent failover time. - -.. (MT) I agree. 
A VNF that manages its own availability with "built-in" redundancy wouldn't really care whether it's 1s or 1min because it would detect the failure and do the failover at the VNF level. But if the availability is managed by the VIM and VNFM then this time becomes critical. - -.. (joe) VIM can only rescue or migrate the VM onto another host in case of hardware failure. The VNF should have already finished the failover before the failed/faulty VM is rescued or migrated. VIM's responsibility is to keep the number of alive VM instances required by VNF, even for auto scaling, but not to replace the VNF failover. That's why the hardware failure detection message for VIM is not so time sensitive, because VM creation is often a slow task compared to failover (although there are many technologies to accelerate VM creation, or a spare VM pool can be used). - -.. (fq) Yes. But here we just mean failure detection, not rescue or migration of the VM. I mean the hardware and NFVI failure should be reported to the VIM and the VNF in a timely manner, then the VNF can do the failover, and the VIM can do the migration and rescue afterwards. - -.. (bb) There is confusion regarding time span within which hardware failure should be reported to VIM. In 2nd paragraph(of Hardware HA), it has been mentioned as; "within 50ms" and in this point it is "1s". - -.. (fq) I try to modify the 50ms to 1s. - -.. (chayi) hard for openstack - -.. VNF failover time < 1s - -.. (MT) Indeed, it's not designed for that - -.. (MT) Do the "hardware failure detection message" and the "alarm of hardware failure" refer to the same notification? It may be better to speak about hardware failure detection (and reporting) time. - -.. (fq) I have made the modification. see if it makes sense to you now. - -.. (MT) Based on the definition section I think you are talking about these threshold alarms only, because a failure is also an abnormal situation, but you want to detect it within a second - -.. 
(fq) Actually, I want to define Alarm as messages that might lead to failure in the near future, for example, a high tempreture, or maybe a prediction of failure. These alarm maybe important, but they do not need to be answered and solved within seconds. - -.. Alarms for abnormal situations and performance decrease (i.e. overuse of cpu) -.. should be raised to the VIM within 1min(?). - - -.. (MT) There should be possible to set some threshold at which the notification should be triggered and probably ceilometer is not reliable enough to deliver such notifications since it has no real-time requirement nor it is expected to be lossless. - -.. (fq) modification made. - -.. (MT) agree with the realtime extension part :-) - -.. (MT) Considering the modified definitions can we say that: Alarm conditions should be detected and the alarm delivered to the VIM within 1min? - -.. This effectively result in two requirements: one on the detection and one on the -.. delivery mechanism. - -.. (fq) Agree. I have made the modification. - - - -.. In the meantime, I see the discussion of -.. this requirement is still open. - -.. (Yifei) As before I do not think it is needed to send HW fault/failure to VNF. For it is different from traditional interated NF, all the lifecycle of VNF is managed by VNFM. - -.. (joe) the HW fault/failure to VNF is required directly for VNF failover purpose. For example, memory or nic failure should be noticed by VNF ASAP, so that the service can be taken over and handled correctly by another VNF instance. - -.. (YY) In what case HW failure to VNF directly?Next is my understanding,may be not correct. If cpu/memory fails hostOS may be crashed at the same time the failure occured then no notification could be send to anywhere. If it is not crashed in some well managed smp OS, and if we use cpu-pinning to VM, the vm guestOS may be crashed. 
If cpu-pinning is not applied to VM, the hypervisor can continue scheduling the VMs on the server just like over-allocation mode. Another point, to accelerate the failover, the failure should be sent to standby service entity not the failed one. The standby vm should not be in same server because of anti-affinity scheme. How can "direct notice" apply? - -.. (joe) not all HW fault leads to the VNF will be crushed. For example, the nic can not send packet as usual, then it'll affect the service, but the VNF is still running. - - -.. Maybe 10 min is too long. As far as I know, Zabbix which is used by Doctor can -.. achieve 60s. - -.. (fq) change the constraint to 60s - -.. (MT2) I think this applies primarily to storage, network hardware and maybe some controllers, which also run in some type of redundancy e.g. active/active or active/standby. For compute, we need redundancy, but it's more of the spare concept to replace any failed compute in the cluster (e.g. N+1). In this context the failover doesn't mean the recovery of a state, it only means replacing the failed HW with a healthy one in the initial state and that's not transparent at the HW level at least, i.e. the host is not brought up with the same identiy as the failed one. - -.. (fq) agree. I have made some modification. I wonder what controller do you mean? is it SDN controller? - -.. (MT3) Yes, SDN, storage controllers. I don't know if any of the OpenStack controllers would also have such requirement, e.g. Ironic - - - -.. (MT) Is it expected for _all_ hardware? - -.. (YY) As general requirement should we add that the hardware should allow for -.. centralized management and control? Maybe we could be even more specific -.. e.g. what protocol should be supported. - -.. (fq) I agree. as far as I know, the protocol we use for hardware include SNMP and IPMI. - -.. (MT) OK, we can start with those as minimum requirement, i.e. HW should support at least them. 
Also I think the Ironic project in OpenStack manages the HW and also supports these. I was thinking maybe it could also be used for the HW management although that's not the general goal of Ironic as far as I know. - -*************************** -Network plane Requirements: -*************************** - -* The hardware should provide a redundant architecture for the network plane. -* Failures of the network plane should be reported to the VIM within 1s. -* QoS should be used to protect against link congestion. - -.. (MT) Do you mean the failure of the entire network plane? -.. (fq) no, I mean the failure of the network connection of a certain HW, or a VNF. - -******************** -Power supply system: -******************** - -* The power supply architecture should be redundant at the server and site level. -* Fault of the power supply system should be reported to the VIM within 1s. -* Failure of a power supply will trigger automatic failover to the redundant supply. - -*************** -Cooling system: -*************** - -* The architecture of the cooling system should be redundant. -* Fault of the cooling system should be reported to the VIM within 1s. -* Failure of the cooling system will trigger automatic failover of the system. - -*********** -Disk Array: -*********** - -* The architecture for the disk array should be redundant. -* Fault of the disk array should be reported to the VIM within 1s. -* Failure of the disk array will trigger automatic failover of the system - support for protected cache after an unexpected power loss. - -* Data shall be stored redundantly in the storage backend - (e.g., by means of RAID across disks.) -* Upon failures of storage hardware components (e.g., disks services, storage - nodes) automatic repair mechanisms (re-build/re-balance of data) shall be - triggered automatically.
-* Centralized storage arrays shall consist of redundant hardware - -******** -Servers: -******** - -* Support precise timing with accuracy higher than 4.6ppm - -.. (MT2) Should we have time synchronization requirements in the other parts? I.e. having NTP in control nodes or even in all hosts diff --git a/Section_4_Virtual_Infra.rst b/Section_4_Virtual_Infra.rst deleted file mode 100644 index 7779f6c..0000000 --- a/Section_4_Virtual_Infra.rst +++ /dev/null @@ -1,181 +0,0 @@ -4.0 Virtual Infrastructure HA – Requirements: -============================================= - -This section is written with the goal to ensure that there is alignment with -Section 4.2 of the ETSI/NFV REL-001 document. - -Key reference requirements from ETSI/NFV document: -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -[Req.4.2.12] On the NFVI level, there should be a transparent fail-over in the -case of for example compute, memory, storage or connectivity failures. - -.. (fq) According to VNF part, the following bullet may be added: - -* The virtual infrastructure should provide classified virtual resource for - different SAL VNFs. Each class of the resources should have guaranteed - performance metrics. - -* Specific HA handling schemes for each classified virtual resource, - e.g. recovery mechanisms, recovery priorities, migration options, - should be defined. - -* The NFVI should maintain the number of VMs provided to the VNF in the face of - failures. I.e. the failed VM instances should be replaced by new VM instances. - -.. (MT) this might be a requirement on the hypervisor and/or the -.. VIM. In this respect I wonder where the nova agent running on the compute node -.. belongs. Is it the VIM already or the Virtualization Facilities? The reason I'm -.. asking is that together with the hypervisor they are in a unique position of -.. correlating different failures on the host that may be due to HW, OS or -.. hypervisor. - -.. (fq) I agree this might be for the hypervisor part.
The VNF (i.e. -.. between VNFCs) may have its own fault detection mechanism, which might be -.. triggered prior to receiving the error report from the underlying NFVI therefore -.. the NFVI/VIM should not attempt to preserve the state of a failing VM if not -.. configured to do so - -4.1 Compute -=========== - -VM including CPU, memory and ephemeral disk - -.. (Yifei) Including noca-compute fq) What do you mean? Yifei) I mean nova- -.. (compute is important enough for us to define some requirement about it. -.. (IJ)(Nova-compute is important, but implementation specific, this should be -.. requirements focused. - -Requirements: - -* Detection of failures must be sub 1 second. -* Recovery of a failed VM (VNF) must be automatic. The recovery must re-launch - the VM based on the required initial state defined in the VNFD. - -.. (MT) I think this is the same essentially as the one brought over from the VNF part in the paragraph above, where I have the question also. -.. (Yifei) Different mechanisms should be defined according to the SLA of the service running on the VM. -.. (fq) What do you mean by failure detection? Do you mean hypervisor notice the failure and perform automatic recovery? or do you mean hypervisor notice the failure and inform VIM? -.. (fq) How to define the time limit for the failure detection? whether 1s is sufficient enough, or we should require for sometime less? - -.. Requirements do have some dependency on the NFVI interface definitions that are -.. currently being defined by ETSI/NFV working groups. Ongoing alignment will -.. be required. - -* On evacuation, fencing of instances from an unreachable host is required. - -.. orginal wording for above: Fencing instances of an unreachable host when evacuation happens.[GAP 10] - -.. (YY) If a host is unreachable how to evacuate VMs on it? Fencing function may be moved toVIM part. -.. (fq) copy from the Gap 10: - -.. Safe VM evacuation has to be preceded by fencing (isolate, shut down) the failed -.. 
host. Failing to do so – when the perceived disconnection is due to some -.. transient or partial failure – the evacuation might lead into two identical -.. instances running together and having a dangerous conflict. - -.. (unknown commenter) I agree it should be move to VIM part. -.. (IJ) Not clear what or if the above comment has been moved. - -.. (Yifei) In OpenStack, evacuate means that "VMs whose storage is accessible from other nodes (e.g. shared storage) could be rebuilt and restarted on a target node", it is different from migration. link: https://wiki.openstack.org/wiki/Evacuate - -* Resources of a migrated VM must be evacuated once the VM is - migrated to a different compute node, placement policies must be preserved. - For example during maintenance activities. - -.. (MT) Do you mean maintenance of the compute node? In any case I think the evacuation should follow the palcement policy. -.. (fq) Yes. What placement policy do you mean? -.. (Yifei) e.g. keep the same scheduler hints as before, am I right ,@Maria? -.. (MT) Yes, the affinity, anti-affinity, etc -.. (fq) Got it. I am adding a requirement that the evacuation should follow the placement policy. -.. (fq) insert below. - -* Failure detection of the VNF software process is required - in order to detect the failure of the VNF sufficiently. Detection should be - within less than 1 second. - -.. ( may require interface extension) - -.. (MT) What do youy mean by the VNF software process? Is it the application(s) running in the VM? If yes, Heat has such consideration already, but I'm only familiar with the first version which was cron job based and therefore the resolution was 1 minute. -.. (fq) Yes, I mean the applications. 1 min might be too long I am afraid. I think this failure detection should be at least less than the failover time. Otherwise it does not make sense. -.. 
(I don't know if 50ms is sufficient enough, since we require the failover of the VNFs should be within 50ms, if the detection is longer than this, there is no meaning to do the detection) -.. (MT) Do you assume that the entire VM needs to be repaired in case of application failure? Also the question is whether there's a VM ready to failover to. It might be that OpenStack just starts to build the VM when the failover is triggere. If that's the case it can take minutes. If the VM exists then starting it still takes ~half a minute I think. -.. I think there's a need to have the VM images in shared storage otherwise there's an issue with migration and failover -.. (fq) I don't mean the recovery of the entire VM. I only mean the failover of the service. In our testing, we use an active /active VM, so it only takes less than 1s to do the failover. I understand the situation you said above. I wonder if we should set a time constraint for such failover? for me, I think such constraint should be less than second. -.. (Yifei) Maria, I cannot understand " If the VM exists then starting it still takes ~half a minute", would please explain it more detailed? Thank you. -.. (MT) As far as I know Heat rebuilds the VM from scratch as part of the failure recovery. Once the VM is rebuilt it's booted and only after that it can actualy provide service. This time till the VM is ready to serve can take 20-30sec after the VM is already reported as existing. -.. ([Yifei) ah, I see. Thank you so much! -.. (YY) As I understand, what heat provides is not what fuqiao wants here. To failover within 50ms/or 1s means two VMs are all running, in NFVI view there are two VMs running, but in application view one is master the other is standby. What I did not find above is how to monitoring application processes in VM? Tradictionally watchdog is applied to this task. In new version of Qemu watchdog is simulated with software but timeslot of watchdog could not be as narrow as hardware watchdog. 
I was told lower than 15s may cause fault action. -.. Do you mean this watchdog? https://libvirt.org/formatdomain.html#elementsWatchdog -.. (fq) Yes, Yuan Yue got my idea:) - -.. 4.2 Storage dedicated section (new section 7). -.. (GK) please see dedicated section on storage below (Section 7) -.. Virtual disk and volumes for applications. -.. Storage related to NFVI must be redundant. -.. Requirements: -.. For small systems a small local redundant file system must be supported. -.. For larger system – replication of data across multiple storage nodes. Processes controlling the storage nodes must also be replicated, such that there is no single point of failure. -.. Block storage supported by a clustered file system is required. -.. Should be transparent to the storage user - -4.2 Network -=========== - -Virtual network: -^^^^^^^^^^^^^^^^ - -Requirements: - -* Redundant top of rack switches must be supported as part of the deployment. - -.. (MT) Shouldn't this be a HW requirement? -.. (Yifei) Agree with Maria -.. (IJ) The ToR is not typically in the NFVI, that is why I put the ToR here. - -* Static LAG must be supported to ensure sub 50ms detection and failover of - redundant links between nodes. The distributed virtual router should - support HA. - -.. (Yifei) Add ?: Service provided by Network agents should be kept available and continuous. e.g. VRRP is used for L3 agent HA (keepalived or pacemaker) -.. (IJ) this is a requirements document. Exclude the implementation details. Added the requirement below - -* Service provided by network agents should be highly available (L3 Agent, DHCP - agent as examples) - -* L3-agent, DHCP-agent should clean up network artifacts (IPs, Namespaces) from - the database in case of failover. - -vSwitch Requirements: -^^^^^^^^^^^^^^^^^^^^^ - -* Monitoring and health of vSwitch processes is required. -* The vSwitch must adapt to changes in network topology and automatically - support recovery modes in a transparent manner.
- -Link Redundancy Requirements: -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -* The ability to manage redundant interfaces and support of LAG on the compute - node is required. -* Support of LAG on all interfaces, internal platform control - interfaces,internal platform storage interfaces, as well as interfaces - connecting to provide networks. -* LACP is optional for dynamic management of LAG links -* Automated configuration LAG should support active/standby and - balanced modes. Should adapt to changes in network topology and automatically - support recovery modes in a transparent manner. -* In SR-IOV scenario, link redundancy could not be transparent, VM should have - two ports directly connect to physical port on host. Then app may bind - these two ports for HA. - -.. (MT) Should we consider also load balancers? I'm not familiar with the LBaaS, but it seems to be key for the load distribution for the multi-VM VNFs. -.. (YY) As I know LBaaS was not mature this time in openstack. Openstack does provide API for LBaaS,but it depend on LB entity and its plugin. We have not found any mature LB agent and LB entity in community. The LB inside VNF usually approached by VNF itsself. -.. (fq) I think LB should be taken into consideration as well. eventhough openstack now is not mature. This is how OPNFV is working, we work out requirement for our side, propose possible bp to openstack so that these features can be added in the future releases. -.. (YIfei) Agree. Because of it is not mature, there is possibility to find gap between OpenStack and our requirement. -.. (MT) Agree. We may even influence how it matures ;-) -.. vlb, vFW are part of virtual resources? -.. (Yifei) From my side, network node. -.. (Yifei) If you mean LB or FW in NFVI, I do not think vXX is a suitable name as in OpenStack Neutron there are LBaas and FWaas. If you mean VNF, then you can call them vLB and vFW. However i do not think LBaas is the same as vLB, they are different use cases. 
What we need to consider should be LBaaS and FWaaS, not vLB or vFW. -.. For more details about LBaaS and FWaaS, you can find on the wiki page of neutron... -.. (fq) Thank you, Yifei. I wonder what's the difference between vLB and LBaaS. You mean they have different functions? -.. (IJ) LBaaS is good for enterprise - for Carrier applications won't higher data rates be needed and therefore a Load Balancer in a VNF is probably a better solution.
\ No newline at end of file diff --git a/Section_5_VIM_HA.rst b/Section_5_VIM_HA.rst deleted file mode 100644 index ac16515..0000000 --- a/Section_5_VIM_HA.rst +++ /dev/null @@ -1,133 +0,0 @@ - -This section about VIM High availability - -============================ -5 VIM High availability -============================ -The VIM in the NFV reference architecture contains all the control nodes of OpenStack, SDN controllers -and hardware controllers. It manages the NFVI according to the instructions/requests of the VNFM and -NFVO and reports them back about the NFVI status. To guarantee the high availability of the VIM is -a basic requirement of the OPNFV platform. Also the VIM should provide some mechanism for VNFs to achieve -their own high availability. - -5.1 Architecture requirement of VIM HA ---------------------------------------- -The architecture of the control nodes should avoid any single point of failure and the management -network plane which connects the control nodes should also be redundant. Services of the control nodes -which are stateless like nova-API, glance-API etc. should be redundant but without data synchronization. -Stateful services like MySQL, Rabbit MQ, SDN controller should provide complex redundancy policies. -Cloud of different scale may also require different HA policies. - -Requirement: ------------- -- In small scale scenario active-standby redundancy policy would be acceptable. - -- In large scale scenario all stateful services like database, message queue, SDN controller - should be deployed in cluster mode which support N-way, N+M active-standby redundancy. - -- In large scale scenario all stateless services like nova-api, glance-api etc. should be deployed - in all active mode. - -- Load balance nodes which introduced for all active and N+M mode should also avoid the single point - of failure. - -- All control node servers shall have at least two network ports to connect to different networks - plane. 
These ports shall work in bonding manner. - -- Any failures of services in the redundant pairs should be detected and switch over should be carried out - automatically in less than 5 seconds totally. - -- Status of services must be monitored. - - -5.2 Fault detection and alarm requirement of VIM --------------------------------------------------- -Redundant architecture can provide function continuity for the VIM. For maintenance considerations -all failures in the VIM should be detected and notifications should be triggered to NFVO, VNFM and other -VIM consumers. - -Requirement: ------------- -- All hardware failures of control nodes should be detected and relevant alarms should be triggered. - OSS, NFVO, VNFM and other VIM consumers can subscribe these alarms. - -- Software on control nodes like OpenStack or ODL should be monitored by the clustering software - at process level and alarms should be triggered when exceptions are detected. - -- Software on compute nodes like OpenStack/nova agents, ovs should be monitored by watchdog. When - exceptions are detected the software should be restored automatically and alarms should be triggered. - -- Software on storage nodes like Ceph, should be monitored by watchdog. When - exceptions are detected the software should be restored automatically and alarms should be triggered. - -- All alarm indicators should include: Failure time, Failure location, Failure type, Failure level. - -- The VIM should provide an interface through which consumers can subscribe to alarms and notifications. - -- All alarms and notifications should be kept for future inquiry in VIM, ageing policy of these records - should be configurable. - -- VIM should distinguish between the failure of the compute node and the failure of the host HW. - -- VIM should be able to publish the health status of the compute node to NFV MANO. 
- -5.3 HA mechanism of VIM provided for VNFs ------------------------------------------- -When VNFs deploy their HA scheme, they usually require from underlying resource to provide some mechanism. -This is similar to the hardware watchdog in the traditional network devices. Also virtualization -introduces some other requirements like affinity and anti-affinity with respect to the allocation of the -different virtual resources. - -Requirement ------------- -- VIM should provide the ability to configure HA functions like watchdog timers, - redundant network ports and etc. These HA functions should be properly tagged and exposed to - VNF and VNFM with standard APIs. - -- VIM should provide anti-affinity scheme for VNF to deploy redundant service on different level of - aggregation of resource. - -- VIM should be able to deploy classified virtual resources to VNFs following the SAL description in VNFD. - -- VIM should provide data collection to calculate the HA related metrics for VNFs. - -- VIM should support the VNF/VNFM to initiate the operation of resources of the NFVI, such as repair/reboot. - -- VIM should correlate the failures detected on collocated virtual resources to identify latent faults in - HW and virtualization facilities - -- VIM should be able to disallow the live migration of VMs and when it is allowed it should be possible - to specify the tolerated interruption time. - -- VIM should be able to restrict the simultaneous migration of VMs hosting a given VNF. - -- VIM should provide the APIs to trigger scale in/out to VNFM/VNF. - -- When scheduler of the VIM use the Active/active HA scheme, multiple scheduler instances must not create - a race condition - -- VIM should be able to trigger the evacuation of the VMs before bringing the host down - when *maintenance mode* is set for the compute host. - -- VIM should configure Consoleauth in active/active HA mode, and should store the token in database. 
- -- VIM should replace a failed VM with a new VM and this new VM should start in the same initial state - as the failed VM. - -- VIM should support policies to prioritize a certain VNF. - -5.4 SDN controller -------------------- -SDN controller: Distributed or Centralized - -Requirements -------------- -- In centralized model SDN controller must be deployed as redundant pairs. - -- In distributed model, mastership election must determine which node is in overall control. - -- For distributed model, VNF should not be aware of HA of controller. That is, it is a - logically centralized - system for NBI (Northbound Interface). - -- Event notification is required as section 5.2 mentioned. - diff --git a/Storage-HA-Scenarios.rst b/Storage-HA-Scenarios.rst deleted file mode 100644 index b8b37a3..0000000 --- a/Storage-HA-Scenarios.rst +++ /dev/null @@ -1,442 +0,0 @@ -Storage and High Availability Scenarios -======================================= - -5.1 Elements of HA Storage Management and Delivery --------------------------------------------------- - -Storage infrastructure, in any environment, can be broken down into two -domains: Data Path and Control Path. Generally, High Availability of the -storage infrastructure is measured by the occurrence of Data -Unavailability and Data Loss (DU/DL) events. While that meaning is -obvious as it relates to the Data Path, it is also applicable to Control -Path as well. The inability to attach a volume that has data to a host, -for example, can be considered a Data Unavailability event. Likewise, -the inability to create a volume to store data could be considered Data -Loss since it may result in the inability to store critical data. - -Storage HA mechanisms are an integral part of most High Availability -solutions today. In the first two sections below, we define the -mechanisms of redundancy and protection required in the infrastructure -for storage delivery in both the Data and Control Paths.
Storage -services that have these mechanisms can be used in HA environments that -are based on a highly available storage infrastructure. - -In the third section below, we examine HA implementations that rely on -highly available storage infrastructure. Note that the scope throughout this -section is focused on local HA solutions. This does not address rapid remote -Disaster Recovery scenarios that may be provided by storage, nor -does it address metro active/active environments that implement stretched -clusters of hosts across multiple sites for workload migration and availability. - - -5.2 Storage Failure & Recovery Scenarios: Storage Data Path ------------------------------------------------------------ - -In the failure and recovery scenarios described below, a redundant -network infrastructure provides HA through network-related device -failures, while a variety of strategies are used to reduce or minimize -DU/DL events based on storage system failures. This starts with redundant -storage network paths, as shown in Figure 29. - -.. figure:: StorageImages/RedundantStoragePaths.png - :alt: HA Storage Infrastructure - :figclass: align-center - - Figure 29: Typical Highly Available Storage Infrastructure - -Storage implementations vary tremendously, and the recovery mechanisms -for each implementation will vary. 
The scenarios described below are -limited to 1) high level descriptions of the most common implementations -since it is unpredictable as to -which storage implementations may be used for NFVI; 2) HW- and -SW-related failures (and recovery) of the storage data path, and not -anything associated with user configuration and operational issues which -typically create the most common storage failure scenarios; 3) -non-LVM/DAS based storage implementations (managing failure and recovery -in LVM-based storage for OpenStack is a very different scenario with -less of a reliable track record); and 4) I will assume block storage -only, and not object storage, which is often used for stateless -applications (at a high level, object stores may include a -subset of the block scenarios under the covers). - -To define the requirements for the data path, I will start at the -compute node and work my way down the storage IO stack and touch on both -HW and SW failure/recovery scenarios for HA along the way. I will use Figure 1 as a reference. - -1. Compute IO driver: Assuming iSCSI for connectivity between the -compute and storage, an iSCSI initiator on the compute node maintains -redundant connections to multiple iSCSI targets for the same storage -service. These redundant connections may be aggregated for greater -throughput, or run independently. This redundancy allows the iSCSI -Initiator to handle failures in network connectivity from compute to -storage infrastructure. (Fibre Channel works largely the same way, as do -proprietary drivers that connect a host's IO stack to storage systems). - -2. Compute node network interface controller (NIC): This device may -fail, and said failure is reported via whatever means is in place for such -reporting from the host. The redundant paths between iSCSI initiators and -targets will allow connectivity from compute to storage to remain up, -though operating at reduced capacity. - -3.
Network Switch failure for storage network: Assuming there are -redundant switches in place, and everything is properly configured so -that two compute NICs go to two separate switches, which in turn go to -two different storage controllers, then a switch may fail and the -redundant paths between iSCSI initiators and targets allow connectivity -from compute to storage to remain operational, though operating at reduced -capacity. - -4. Storage system network interface failure: Assuming there are -redundant storage system network interfaces (on separate storage -controllers), then one may fail and the redundant paths between iSCSI -initiators and targets allow connectivity from compute to storage to -remain operational, though operating at reduced performance. The extent -of the reduced performance is dependent upon the storage architecture. -See 3.5 for more. - -5. Storage controller failure: A storage system can, at a very high -level, be described as composed of network interfaces, one or more -storage controllers that manage access to data, and a shared Data Path -access to the HDD/SSD subsystem. The network interface failure is -described in #4, and the HDD/SSD subsystem is described in #6. All -modern storage architectures have either redundant or distributed -storage controller architectures. In **dual storage controller -architectures**, high availability is maintained through the ALUA -protocol maintaining access to primary and secondary paths to iSCSI -targets. Once a storage controller fails, the array operates in -(potentially) degraded performance mode until the failed storage controller is -replaced. The degree of reduced performance is dependent on the overall -original load on the array. Dual storage controller arrays also remain at risk -of a Data Unavailability event if the second storage controller should fail. -This is rare, but should be accounted for in planning support and -maintenance contracts.
- -**Distributed storage controller architectures** are generally server-based, -which may or may not operate on the compute servers in Converged -Infrastructure environments. Hence the concept of “storage controller” -is abstract in that it may involve a distribution of software components -across multiple servers. Examples: Ceph and ScaleIO. In these environments, -the data may be stored -redundantly, and metadata for accessing the data in these redundant -locations is available for whichever compute node needs the data (with -authorization, of course). Data may also be stored using erasure coding -(EC) for greater efficiency. The loss of a storage controller in this -context leads to a discussion of impact caused by loss of a server in -this distributed storage controller architecture. In the event of such a loss, -if data is held in duplicate or triplicate on other servers, then access -is simply redirected to maintain data availability. In the case of -EC-based protection, then the data is simply re-built on the fly. The -performance and increased risk impact in this case is dependent on the -time required to rebalance storage distribution across other servers in -the environment. Depending on configuration and implementation, it could -impact storage access performance to VNFs as well. - -6. HDD/SSD subsystem: This subsystem contains any RAID controllers, -spinning hard disk drives, and Solid State Drives. The failure of a RAID -controller is equivalent to failure of a storage controller, as -described in 5 above. The failure of one or more storage devices is -protected by either RAID parity-based protection, Erasure Coding -protection, or duplicate/triplicate storage of the data. RAID and -Erasure Coding are typically more efficient in terms of space -efficiency, but duplicate/triplicate provides better performance. 
This -tradeoff is a common point of contention among implementations, and this -will not go into greater detail than to assume that failed devices do -not cause Data Loss events due to these protection algorithms. Multiple -device failures can potentially cause Data Loss events, and the risk of -each method must be taken into consideration for the HA requirements of -the desired deployment. - -5.3 Storage Failure & Recovery Scenarios: Storage Control Path --------------------------------------------------------------- - -As it relates to an NFVI environment, as proposed by OPNFV, there are -two parts to the storage control path. - -* The storage system-specific control path to the storage controller - -* The OpenStack-specific cloud management framework for managing different -storage elements - - -5.3.1 Storage System Control Paths -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -High Availability of a storage controller is storage -system-specific. Breaking it down to implementation variants is the best -approach. However, both variants assume an IP-based management API in -order to leverage network redundancy mechanisms for ubiquitous -management access. - -An appliance style storage array with dual storage controllers must implement IP -address failover for the management API's IP endpoint in either an -active/active or active/passive configuration. Likewise, a storage array -with >2 storage controllers would bring up a management endpoint on -another storage controller in such an event. Cluster-style IP address load -balancing is also a viable implementation in these scenarios. - -In the case of distributed storage controller architectures, the storage system -provides redundant storage controller interfaces. E.g., Ceph's RADOS provides -redundant paths to access an OSD for volume creation or access. In EMC's -ScaleIO, there are redundant MetaData Managers for managing volume -creation and access. 
In the case of the former, the access is via -proprietary protocol, in the case of the latter, it is via HTTP-based -REST API. Other storage implementations may also provide alternative -methods, but any enterprise-class storage system will have built-in HA -for management API access. - -Finally, note that single server-based storage solutions, such as LVM, -do not have HA solutions for control paths. If the server is failed, the -management of that server's storage is not available. - -5.3.2 OpenStack Controller Management -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -OpenStack cloud management is comprised of a number of different -function-specific management modules such as Keystone for Identity and -Access management (IAM), Nova for compute management, Cinder for block -storage management, Swift for Object Storage delivery, Neutron for -Network management, and Glance as an image repository. In smaller -single-cloud environments, these management systems are managed in -concert for High Availability; in larger multi-cloud environments, the -Keystone IAM may logically stand alone in its own HA delivery across the -multiple clouds, as might Swift as a common Object Store. Nova, Cinder, -and Glance may have separate scopes of management, but they are more -typically managed together as a logical cloud deployment. - -It is the OpenStack deployment mechanisms that are responsible for HA -deployment of these HA management infrastructures. These tools, such as -Fuel, RDO, and others, have matured to include highly available -implementations for the database, the API, and each of the manager -modules associated with the scope of cloud management domains. - -There are many interdependencies among these modules that impact Cinder high availability. -For example: - -* Cinder is implemented as an Active/Standby failover implementation since it -requires a single point of control at one time for the Cinder manager/driver implementation. 
-The Cinder manager/driver is deployed on two of the three OpenStack controller nodes, and -one is made active while the other is passive. This may be improved to active/active -in a future release. - -* A highly available database implementation must be delivered -using something like MySQL/Galera replication across the 3 OpenStack controller -nodes. Cinder requires an HA database in order for it to be HA. - -* A redundant RabbitMQ messaging implementation across the same -three OpenStack controller nodes. Likewise, Cinder requires an HA messaging system. - -* A redundant OpenStack API to ensure Cinder requests can be delivered. - -* An HA Cluster Manager, like PaceMaker for monitoring each of the -deployed manager elements on the OpenStack controllers, with restart capability. -Keepalived is an alternative implementation for monitoring processes and restarting on -alternate OpenStack controller nodes. While statistics are lacking, it is generally -believed that the PaceMaker implementation is more frequently implemented -in HA environments. - - -For more information on OpenStack and Cinder HA, see http://docs.openstack.org/ha-guide -for current thinking. - -While the specific combinations of management functions in these -redundant OpenStack controllers may vary with the specific small/large environment -deployment requirements, the basic implementation of three OpenStack controller -redundancy remains relatively common. In these implementations, the -highly available OpenStack controller environment provides HA access to -the highly available storage controllers via the highly available IP -network. - - -5.4 The Role of Storage in HA ------------------------------ - -In the sections above, we describe data and control path requirements -and example implementations for delivery of highly available storage -infrastructure. In summary: - -* Most modern storage infrastructure implementations are inherently -highly available. 
Exceptions certainly apply; e.g., simply using LVM for -storage presentation at each server does not satisfy HA requirements. -However, modern storage systems such as Ceph, ScaleIO, XIV, VNX, and -many others with OpenStack integrations, certainly do have such HA -capabilities. - -* This is predominantly through network-accessible shared storage -systems in tightly coupled configurations such as clustered hosts, or in -loosely coupled configurations such as with global object stores. - - -Storage is an integral part of HA delivery today for applications, -including VNFs. This is examined below in terms of using storage as a -key part of HA delivery, the possible scope and limitations of that -delivery, and example implementations for delivery of such service. We -will examine this for both block and object storage infrastructures below. - -5.4.1 VNF, VNFC, and VM HA in a Block Storage HA Context -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Several scenarios were described in another section with regard to -managing HA at the VNFC level, with variants of recovery based on either -VIM- or VNFM-based reporting/detection/recovery mechanisms. In a block -storage environment, these differentiations are abstract and -meaningless, regardless of whether it is or is not intended to be HA. - -In a block storage context, HA is delivered via a logical block device -(sometimes called a Logical Unit, or LUN), or in some cases, to a VM. -VM and logical block devices are the units of currency. - -.. figure:: StorageImages/HostStorageCluster.png - :alt: Host Storage Cluster - :figclass: align-center - - Figure 30: Typical HA Cluster With Shared Storage - -In Figure 30, several hosts all share access, via an IP network -or via Fibre Channel, to a common set of logical storage devices. In an -ESX cluster implementation, these hosts all access all devices with -coordination provided with the SCSI Reservation mechanism. 
In the -particular ESX case, the logical storage devices provided by the storage -service actually aggregate volumes (VMDKs) utilized by VMs. As a result, -multiple host access to the same storage service logical device is -dynamic. The vSphere management layer provides for host cluster -management. - -In other cases, such as for KVM, cluster management is not formally -required, per se, because each logical block device presented by the -storage service is uniquely allocated for one particular VM which can -only execute on a single host at a time. In this case, any host that can -access the same storage service is potentially a part of the "cluster". -While *potential* access from another host to the same logical block -device is necessary, the actual connectivity is restricted to one host -at a time. This is more of a loosely coupled cluster implementation, -rather than the tightly coupled cluster implementation of ESX. - -So, if a single VNF is implemented as a single VM, then HA is provided -by allowing that VM to execute on a different host, with access to the -same logical block device and persistent data for that VM, located on -the storage service. This also applies to multiple VNFs implemented -within a single VM, though it impacts all VNFs together. - -If a single VNF is implemented across multiple VMs as multiple VNFCs, -then all of the VMs that comprise the VNF may need to be protected in a consistent -fashion. The storage service is not aware of the -distinction from the previous example. However, a higher level -implementation, such as an HA Manager (perhaps implemented in a VNFM) -may monitor and restart a collection of VMs on alternate hosts. In an ESX environment, -VM restarts are most expeditiously handled by using vSphere-level HA -mechanisms within an HA cluster for individual or collections of VMs. 
-In KVM environments, a separate HA -monitoring service, such as Pacemaker, can be used to monitor individual -VMs, or entire multi-VM applications, and provide restart capabilities -on separately configured hosts that also have access to the same logical -storage devices. - -VM restart times, however, are measured in 10's of seconds. This may -sometimes meet the SAL-3 recovery requirements for General Consumer, -Public, and ISP Traffic, but will never meet the 5-6 seconds required -for SAL-1 Network Operator Control and Emergency Services. For this, -additional capabilities are necessary. - -In order to meet SAL-1 restart times, it is necessary to have: 1. A hot -spare VM already up and running in an active/passive configuration 2. -Little-to-no-state update requirements for the passive VM to takeover. - -Having a spare VM up and running is easy enough, but putting that VM in -an appropriate state to take over execution is the difficult part. In shared storage -implementations for Fault Tolerance, which can achieve SAL-1 requirements, -the VMs share access to the same storage device, and another wrapper function -is used to update internal memory state for every interaction to the active -VM. - -This may be done in one of two ways, as illustrated in Figure 31. In the first way, -the hypervisor sends all interface interactions to the passive as well -as the active VM. The interaction is handled completely by -hypervisor-to-hypervisor wrappers, as represented by the purple box encapsulating -the VM in Figure 31, and is completely transparent to the VM. -This is available with the vSphere Fault Tolerant option, but not with -KVM at this time. - -.. 
figure:: StorageImages/FTCluster.png - :alt: FT host and storage cluster - :figclass: align-center - - Figure 31: A Fault Tolerant Host/Storage Configuration - -In the second way, a VM-level wrapper is used to capture checkpoints of -state from the active VM and transfer these to the passive VM, similarly represented -as the purple box encapsulating the VM in Figure 31. There -are various levels of application-specific integration required for this -wrapper to capture and transfer checkpoints of state, depending on the -level of state consistency required. OpenSAF is an example of an -application wrapper that can be used for this purpose. Both techniques -have significant network bandwidth requirements and may have certain -limitations and requirements for implementation. - -In both cases, the active and passive VMs share the same storage infrastructure, -although the OpenSAF implementation may also utilize separate storage infrastructure -(not shown in Figure 31). - -Looking forward to the long term, both of these may be made obsolete. As soon as 2016, -PCIe fabrics will start to be available that enable shared NVMe-based -storage systems. While these storage systems may be used with -traditional protocols like SCSI, they will also be usable with true -NVMe-oriented applications whose memory state is persisted, and can be -shared, in an active/passive mode across hosts. The HA mechanisms here -are yet to be defined, but are expected to be far superior to either of the -mechanisms described above. This is still in the future. - - -5.4.2 HA and Object stores in loosely coupled compute environments -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Whereas block storage services require tight coupling of hosts to -storage services via SCSI protocols, the interaction of applications -with HTTP-based object stores utilizes a very loosely coupled -relationship.
This means that VMs can come and go, or be organized as an -N+1 redundant deployment of VMs for a given VNF. Each individual object -transaction constitutes the duration of the coupling, whereas with -SCSI-based logical block devices, the coupling is active for the -duration of the VM's mounting of the device. - -However, the requirement for implementation here is that the state of a -transaction being performed is made persistent to the object store by -the VM, as the restartable checkpoint for high availability. Multiple -VMs may access the object store concurrently, and it is -required that each object transaction is made idempotent by the -application. - -HA restart of a transaction in this environment is dependent on failure -detection and transaction timeout values for applications calling the -VNFs. These values may be rather high, which can make the SAL -requirements unachievable. For example, while the General Consumer, Public, and ISP -Traffic recovery time for SAL-3 is 20-25 seconds, default browser -timeouts are upwards of 120 seconds. Common default timeouts for -applications using HTTP are typically around 10 seconds or higher, -so this puts a requirement on the -load balancers to manage and restart transactions in a timeframe that -may make it a challenge to meet even SAL-3 requirements. - -Despite these issues of performance, the use of object storage for highly -available solutions in native cloud applications is very powerful. Object -storage services are generally globally distributed and replicated using -eventual consistency techniques, though transaction-level consistency can -also be achieved in some cases (at the cost of performance). (For an interesting -discussion of this, look up the CAP Theorem.) - - -5.5 Summary ------------ - -This section addressed several points: - -* Modern storage systems are inherently Highly Available based on modern and reasonable -implementations and deployments.
- -* Storage is typically a central component in offering highly available infrastructures, -whether for block storage services for traditional applications, or through object -storage services that may be shared globally with eventual consistency. - -* Cinder HA management capabilities are defined and available through the use of -OpenStack deployment tools, making the entire storage control and data paths -highly available. - diff --git a/StorageImages/FTCluster.png b/StorageImages/FTCluster.png Binary files differdeleted file mode 100644 index c42f4a0..0000000 --- a/StorageImages/FTCluster.png +++ /dev/null diff --git a/StorageImages/HostStorageCluster.png b/StorageImages/HostStorageCluster.png Binary files differdeleted file mode 100644 index baca591..0000000 --- a/StorageImages/HostStorageCluster.png +++ /dev/null diff --git a/StorageImages/RedundantStoragePaths.png b/StorageImages/RedundantStoragePaths.png Binary files differdeleted file mode 100644 index 1850ca9..0000000 --- a/StorageImages/RedundantStoragePaths.png +++ /dev/null diff --git a/UseCases/UseCases.rst b/UseCases/UseCases.rst deleted file mode 100644 index 57dfbc0..0000000 --- a/UseCases/UseCases.rst +++ /dev/null @@ -1,731 +0,0 @@ -============ -HA Use Cases -============ - -************** -1 Introduction -************** - -This use case document outlines the model and failure modes for NFV systems. Its goal is along -with the requirements documents and gap analysis help set context for engagement with various -upstream projects. The OPNFV HA project team continuously evolving these documents, and in -particular this use case document starting with a set of basic use cases. - -***************** -2 Basic Use Cases -***************** - - -In this section we review some of the basic use cases related to service high availability, -that is, the availability of the service or function provided by a VNF. 
The goal is to -understand the different scenarios that need to be considered and the specific requirements -to provide service high availability. More complex use cases will be discussed in -other sections. - -With respect to service high availability we need to consider whether a VNF implementation is -statefull or stateless and if it includes or not an HA manager which handles redundancy. -For statefull VNFs we can also distinguish the cases when the state is maintained inside -of the VNF or it is stored in an external shared storage making the VNF itself virtually -stateless. - -Managing availability usually implies a fault detection mechanism, which triggers the -actions necessary for fault isolation followed by the recovery from the fault. -This recovery includes two parts: - -* the recovery of the service and -* the repair of the failed entity. - -Very often the recovery of the service and the repair actions are perceived to be the same, for -example, restarting a failed application repairs the application, which then provides the service again. -Such a restart may take significant time causing service outage, for which redundancy is the solution. -In cases when the service is protected by redundancy of the providing entities (e.g. application -processes), the service is "failed over" to the standby or a spare entity, which replaces the -failed entity while it is being repaired. E.g. when an application process providing the service fails, -the standby application process takes over providing the service, while the failed one is restarted. -Such a failover often allows for faster recovery of the service. - -We also need to distinguish between the failed and the faulty entities as a fault may or -may not manifest in the entity containing the fault. Faults may propagate, i.e. cause other entities -to fail or misbehave, i.e. an error, which in turn might be detected by a different failure or -error detector entity each of which has its own scope. 
Similarly, the managers acting on these -detected errors may have a limited scope. E.g. an HA manager contained in a VNF can only repair -entities within the VNF. It cannot repair a failed VM, in fact due to the layered architecture -in the VNF it cannot even know whether the VM failed, its hosting hypervisor, or the physical host. -But its error detection mechanism will detect the result of such failures - a failure in the VNF - -and the service can be recovered at the VNF level. -On the other hand, the failure should be detected in the NFVI and the VIM should repair the failed -entity (e.g. the VM). Accordingly a failure may be detected by different managers in different layers -of the system, each of which may react to the event. This may cause interference. -Thus, to resolve the problem in a consistent manner and completely recover from -a failure the managers may need to collaborate and coordinate their actions. - -Considering all these issues the following basic use cases can be identified (see table 1.). -These use cases assume that the failure is detected in the faulty entity (VNF component -or the VM). 
- - -*Table 1: VNF high availability use cases* - -+---------+-------------------+----------------+-------------------+----------+ -| | VNF Statefullness | VNF Redundancy | Failure detection | Use Case | -+=========+===================+================+===================+==========+ -| VNF | yes | yes | VNF level only | UC1 | -| | | +-------------------+----------+ -| | | | VNF & NFVI levels | UC2 | -| | +----------------+-------------------+----------+ -| | | no | VNF level only | UC3 | -| | | +-------------------+----------+ -| | | | VNF & NFVI levels | UC4 | -| +-------------------+----------------+-------------------+----------+ -| | no | yes | VNF level only | UC5 | -| | | +-------------------+----------+ -| | | | VNF & NFVI levels | UC6 | -| | +----------------+-------------------+----------+ -| | | no | VNF level only | UC7 | -| | | +-------------------+----------+ -| | | | VNF & NFVI levels | UC8 | -+---------+-------------------+----------------+-------------------+----------+ - -As discussed, there is no guarantee that a fault manifests within the faulty entity. For -example, a memory leak in one process may impact or even crash any other process running in -the same execution environment. Accordingly, the repair of a failing entity (i.e. the crashed process) -may not resolve the problem and soon the same or another process may fail within this execution -environment indicating that the fault has remained in the system. -Thus, there is a need for extrapolating the failure to a wider scope and perform the -recovery at that level to get rid of the problem (at least temporarily till a patch is available -for our leaking process). -This requires the correlation of repeated failures in a wider scope and the escalation of the -recovery action to this wider scope. 
In the layered architecture this means that the manager detecting the -failure may not be the one in charge of the scope at which it can be resolved, so the escalation needs to -be forwarded to the manager in charge of that scope, which brings us to an additional use case UC9. - -We need to consider for each of these use cases the events detected, their impact on other entities, -and the actions triggered to recover the service provided by the VNF, and to repair the -faulty entity. - -We are going to describe each of the listed use cases from this perspective to better -understand how the problem of service high availability can best be tackled. - -Before getting into the details it is worth mentioning the example end-to-end service recovery -times provided in the ETSI NFV REL document [REL]_ (see table 2.). These values may change over time, -and the thresholds may be lowered. - -*Table 2: Service availability levels (SAL)* - -+----+---------------+----------------------+------------------------------------+ -|SAL |Service |Customer Type | Recommendation | -| |Recovery | | | -| |Time | | | -| |Threshold | | | -+====+===============+======================+====================================+ -|1 |5 - 6 seconds |Network Operator |Redundant resources to be | -| | |Control Traffic |made available on-site to | -| | | |ensure fast recovery. 
| -| | |Government/Regulatory | | -| | |Emergency Services | | -+----+---------------+----------------------+------------------------------------+ -|2 |10 - 15 seconds|Enterprise and/or |Redundant resources to be available | -| | |large scale customers |as a mix of on-site and off-site | -| | | |as appropriate: On-site resources to| -| | |Network Operators |be utilized for recovery of | -| | |service traffic |real-time service; Off-site | -| | | |resources to be utilized for | -| | | |recovery of data services | -+----+---------------+----------------------+------------------------------------+ -|3 |20 - 25 seconds|General Consumer |Redundant resources to be mostly | -| | |Public and ISP |available off-site. Real-time | -| | |Traffic |services should be recovered before | -| | | |data services | -+----+---------------+----------------------+------------------------------------+ - -Note that even though SAL 1 of [REL]_ allows for 5-6 seconds of service recovery, -for many services this is too long and such outage causes a service level reset or -the loss of significant amount of data. Also the end-to-end service or network service -may be served by multiple VNFs. Therefore for a single VNF the desired -service recovery time is sub-second. - -Note that failing over the service to another provider entity implies the redirection of the traffic -flow the VNF is handling. This could be achieved in different ways ranging from floating IP addresses -to load balancers. The topic deserves its own investigation, therefore in these first set of -use cases we assume that it is part of the solution without going into the details, which -we will address as a complementary set of use cases. - -.. 
[REL] ETSI GS NFV-REL 001 V1.1.1 (2015-01) - - -2.1 Use Case 1: VNFC failure in a statefull VNF with redundancy -============================================================== - -Use case 1 represents a statefull VNF with redundancy managed by an HA manager, -which is part of the VNF (Fig 1). The VNF consists of VNFC1, VNFC2 and the HA Manager. -The latter manages the two VNFCs, e.g. the role they play in providing the service -named "Provided NF" (Fig 2). - -The failure happens in one of the VNFCs and it is detected and handled by the HA manager. -In practice the HA manager could be part of the VNFC implementations or it could -be a separate entity in the VNF. The point is that the communication of these -entities inside the VNF is not visible to the rest of the system. The observable -events need to cross the boundary represented by the VNF box. - - -.. figure:: images/Slide4.png - :alt: VNFC failure in a statefull VNF - :figclass: align-center - - Fig 1. VNFC failure in a statefull VNF with built-in HA manager - - -.. figure:: images/StatefullVNF-VNFCfailure.png - :alt: MSC of the VNFC failure in a statefull VNF - :figclass: align-center - - Fig 2. Sequence of events for use case 1 - - -As shown in Fig 2, initially VNFC2 is active, i.e. provides the Provided NF, and VNFC1 -is a standby. It is not shown, but it is expected that VNFC1 has some means to get updates -of the state of the Provided NF from the active VNFC2, so that it is prepared to continue to -provide the service in case VNFC2 fails. -The sequence of events starts with the failure of VNFC2, which also interrupts the -Provided NF. This failure is detected somehow and/or reported to the HA Manager, which -in turn may report the failure to the VNFM and simultaneously tries to isolate the -fault by cleaning up VNFC2. - -Once the cleanup succeeds (i.e. the OK is received) it fails over the active role to -VNFC1 by setting it active.
This recovers the service: the Provided NF is -provided again. Thus this point marks the end of the outage caused by the failure -that needs to be considered from the perspective of service availability. - -The repair of the failed VNFC2, which might have started at the same time -when VNFC1 was assigned the active state, may take longer but without further impact -on the availability of the Provided NF service. -If the HA Manager reported the interruption of the Provided NF to the VNFM, it should -clear the error condition. - -The key points in this scenario are: - -* The failure of the VNFC2 is not detectable by any other part of the system except - the consumer of the Provided NF. The VNFM only - knows about the failure because of the error report, and knows only the information this - report provides. I.e. it may or may not include the information on what failed. -* The Provided NF is resumed as soon as VNFC1 is assigned active regardless of how long - it takes to repair VNFC2. -* The HA manager could be part of the VNFM as well. This requires an interface to - detect the failures and to manage the VNFC life-cycle and the role assignments. - -2.2 Use Case 2: VM failure in a statefull VNF with redundancy -============================================================= - -Use case 2 also represents a statefull VNF with its redundancy managed by an HA manager, -which is part of the VNF. The VNFCs of the VNF are hosted on the VMs provided by -the NFVI (Fig 3). - -The VNF consists of VNFC1, VNFC2 and the HA Manager (Fig 4). The latter manages -the role the VNFCs play in providing the service - Provided NF. -The VMs provided by the NFVI are managed by the VIM. - - -In this use case one of the VMs hosting the VNF fails. The failure is detected -and handled at both the NFVI and the VNF levels simultaneously. The coordination occurs -between the VIM and the VNFM. - - -.. 
figure:: images/Slide6.png - :alt: VM failure in a statefull VNF - :figclass: align-center - - Fig 3. VM failure in a statefull VNF with built-in HA manager - - -.. figure:: images/StatefullVNF-VMfailure.png - :alt: MSC of the VM failure in a statefull VNF - :figclass: align-center - - Fig 4. Sequence of events for use case 2 - - -Again, initially VNFC2 is active and provides the Provided NF, while VNFC1 is the standby. -It is not shown in Fig 4, but it is expected that VNFC1 has some means to learn the state -of the Provided NF from the active VNFC2, so that it is able to continue providing the -service if VNFC2 fails. VNFC1 is hosted on VM1, while VNFC2 is hosted on VM2 as indicated by -the arrows between these objects in Fig 4. - -The sequence of events starts with the failure of VM2, which results in VNFC2 failing and -interrupting the Provided NF. The HA Manager detects the failure of VNFC2 somehow -and tries to handle it the same way as in use case 1. However, because the VM is gone, the -cleanup is either not initiated at all or is interrupted as soon as the failure of the VM is -identified. In either case the faulty VNFC2 is considered isolated. - -To recover the service the HA Manager fails over the active role to VNFC1 by setting it active. -This recovers the Provided NF. Thus this point again marks the end of the outage caused -by the VM failure that needs to be considered from the perspective of service availability. -If the HA Manager reported the interruption of the Provided NF to the VNFM, it should -clear the error condition. - -On the other hand the failure of the VM is also detected in the NFVI and reported to the VIM. -The VIM reports the VM failure to the VNFM, which passes on this information -to the HA Manager of the VNF. This confirms the VM failure for the VNF HA Manager and tells it that -it needs to wait with the repair of the failed VNFC2 until the VM is provided again. The -VNFM also confirms to the VIM that it is safe to restart the VM.
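
The interplay of the two detection paths in use case 2 can be illustrated with a small model. This is only a sketch under stated assumptions: the class ``HaManager``, its method names, and the log strings are hypothetical and do not correspond to any OPNFV, VIM, or VNFM interface. It shows the VNF-level failover happening immediately, while the repair of the failed VNFC is held back until the VM failure reported through the VIM and VNFM has been resolved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HaManager:
    """Hypothetical VNF-level HA manager for two VNFCs (illustration only)."""

    active: Optional[str] = "VNFC2"
    standby: Optional[str] = "VNFC1"
    hosting_vm: dict = field(
        default_factory=lambda: {"VNFC1": "VM1", "VNFC2": "VM2"}
    )
    failed_vms: set = field(default_factory=set)
    pending_repair: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def on_vnfc_failure(self, vnfc):
        # VNF-level path: isolate the failed VNFC, then fail over at once.
        self.pending_repair.add(vnfc)
        self.log.append(f"isolate {vnfc}")
        if vnfc == self.active:
            self.active, self.standby = self.standby, None
            self.log.append(f"activate {self.active}")

    def on_vm_failure(self, vm):
        # NFVI-level path: the VIM reports to the VNFM, which informs us.
        self.failed_vms.add(vm)

    def can_repair(self, vnfc):
        # A VNFC can only be repaired once its hosting VM is available.
        return self.hosting_vm[vnfc] not in self.failed_vms

    def on_vm_restarted(self, vm):
        # The VM is back, so VNFCs hosted on it can be restored as standby.
        self.failed_vms.discard(vm)
        for vnfc in sorted(self.pending_repair):
            if self.hosting_vm[vnfc] == vm:
                self.pending_repair.discard(vnfc)
                self.standby = vnfc
                self.log.append(f"restore {vnfc} as standby")


ha = HaManager()
ha.on_vnfc_failure("VNFC2")   # service outage ends with the failover
ha.on_vm_failure("VM2")       # repair of VNFC2 must now wait
ha.on_vm_restarted("VM2")     # VNFC2 can be restored
```

The service outage in this model ends at the ``activate`` step; everything after it concerns repair only, which matches the separation of service recovery and repair described for this use case.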
- -The repair of the failed VM may take some time, but since the service has been failed over -to VNFC1 in the VNF, there is no further impact on the availability of Provided NF. - -When eventually VM2 is restarted the VIM reports this to the VNFM and -the VNFC2 can be restored. - -The key points in this scenario are: - -* The failure of the VM2 is detectable at both the VNF and NFVI levels; therefore both the HA - manager and the VIM react to it. It is essential that these reactions do not interfere, - e.g. if the VIM tries to protect the VM state at NFVI level that would conflict with the - service failover action at the VNF level. -* While the failure detection happens at both NFVI and VNF levels, the time frame within - which the VIM and the HA manager detect and react may be very different. For service - availability the VNF level detection, i.e. by the HA manager, is the critical one and is expected - to be faster. -* The Provided NF is resumed as soon as VNFC1 is assigned active regardless of how long - it takes to repair VM2 and VNFC2. -* The HA manager could be part of the VNFM as well. - This requires an interface to detect failures in/of the VNFC and to manage its life-cycle and - role assignments. -* The VNFM may not know for sure that the VM failed until the VIM reports it, and it cannot tell whether - the VM failure is due to a host, hypervisor, or host OS failure. Thus the VIM should report/alarm - and log VM, hypervisor, and physical host failures. The use cases for these failures - are similar with respect to the Provided NF. -* The VM repair also should start with the fault isolation as appropriate for the actual - failed entity, e.g. if the VM failed due to a host failure the host may be fenced first. -* The negotiation between the VNFM and the VIM may be replaced by configured repair actions. - E.g. on error restart VM in initial state, restart VM from last snapshot, or fail VM over to standby.
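
The configured repair actions named in the last key point could be captured in a simple policy table. The following is a minimal sketch, assuming a hypothetical per-VM policy mapping and a hypothetical helper ``select_repair_steps``; it is not an interface of any real VIM. It also reflects the preceding key point that fault isolation (here, fencing the host) comes before the repair itself.

```python
from enum import Enum


class RepairAction(Enum):
    """The three example repair actions listed in the text."""

    RESTART_INITIAL = "restart VM in initial state"
    RESTART_FROM_SNAPSHOT = "restart VM from last snapshot"
    FAILOVER_TO_STANDBY = "fail VM over to standby"


# Hypothetical per-VM configuration; VM names are illustrative.
REPAIR_POLICY = {
    "VM1": RepairAction.RESTART_FROM_SNAPSHOT,
    "VM2": RepairAction.RESTART_INITIAL,
}

# Assumed fallback when no action is configured for a VM.
DEFAULT_ACTION = RepairAction.RESTART_INITIAL


def select_repair_steps(vm_id, host_failed=False):
    """Return the ordered steps for repairing a failed VM."""
    steps = []
    if host_failed:
        # Isolate the fault first: fence the host before repairing the VM.
        steps.append("fence host")
    steps.append(REPAIR_POLICY.get(vm_id, DEFAULT_ACTION).value)
    return steps


print(select_repair_steps("VM1"))
print(select_repair_steps("VM3", host_failed=True))
```

With such a table the VIM would not need to negotiate with the VNFM for each failure, at the cost of the policy having to be kept consistent with what the VNF-level HA manager does on its own.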
- - -2.3 Use Case 3: VNFC failure in a statefull VNF with no redundancy -================================================================== - -Use case 3 also represents a statefull VNF, but it stores its state externally on a -virtual disk provided by the NFVI. It has a single VNFC and it is managed by the VNFM -(Fig 5). - -In this use case the VNFC fails and the failure is detected and handled by the VNFM. - - -.. figure:: images/Slide10.png - :alt: VNFC failure in a statefull VNF No-Red - :figclass: align-center - - Fig 5. VNFC failure in a statefull VNF with no redundancy - - -.. figure:: images/StatefullVNF-VNFCfailureNoRed.png - :alt: MSC of the VNFC failure in a statefull VNF No-Red - :figclass: align-center - - Fig 6. Sequence of events for use case 3 - - -The VNFC periodically checkpoints the state of the Provided NF to the external storage, -so that in case of failure the Provided NF can be resumed (Fig 6). - -When the VNFC fails the Provided NF is interrupted. The VNFM detects the failure -somehow; to isolate the fault it first cleans up the VNFC and then, if the cleanup is -successful, restarts the VNFC. When the VNFC starts up, first it reads the last checkpoint -for the Provided NF, then resumes providing it. The service outage lasts from the VNFC failure -till this moment. - -The key points in this scenario are: - -* The service state is saved in external storage, which should also be highly available to - protect the service. -* The NFVI should provide this guarantee and also that storage and access network failures - are handled seamlessly from the VNF's perspective. -* The VNFM has means to detect VNFC failures and manage its life-cycle appropriately. This is - not required if the VNF also provides its availability management. -* The Provided NF can be resumed only after the VNFC is restarted and it has restored the - service state from the last checkpoint created before the failure.
-* Having a spare VNFC can speed up the service recovery. This requires that the VNFM coordinates - the role each VNFC takes with respect to the Provided NF, i.e. the VNFCs must not act on the - stored state simultaneously, which could interfere with and corrupt it. - - - -2.4 Use Case 4: VM failure in a statefull VNF with no redundancy -================================================================ - -Use case 4 also represents a statefull VNF without redundancy, which stores its state externally on a -virtual disk provided by the NFVI. It has a single VNFC managed by the VNFM -(Fig 7) as in use case 3. - -In this use case the VM hosting the VNFC fails and the failure is detected and handled by -the VNFM and the VIM simultaneously. - - -.. figure:: images/Slide11.png - :alt: VM failure in a statefull VNF No-Red - :figclass: align-center - - Fig 7. VM failure in a statefull VNF with no redundancy - -.. figure:: images/StatefullVNF-VMfailureNoRed.png - :alt: MSC of the VM failure in a statefull VNF No-Red - :figclass: align-center - - Fig 8. Sequence of events for use case 4 - -Again, the VNFC regularly checkpoints the state of the Provided NF to the external storage, -so that it can be resumed in case of a failure (Fig 8). - -When the VM hosting the VNFC fails the Provided NF is interrupted. - -On the one hand, the failure is detected by the VNFM somehow, which to isolate the fault tries -to clean up the VNFC; this cannot be done because of the VM failure. When the absence of the VM has been -determined the VNFM has to wait with restarting the VNFC until the hosting VM is restored. The VNFM -may report the problem to the VIM, requesting a repair. - -On the other hand the failure is detected in the NFVI and reported to the VIM, which reports it -to the VNFM, if the VNFM hasn't reported it yet. -If the VNFM has requested the VM repair or if it acknowledges the repair, the VIM restarts the VM.
-Once the VM is up the VIM reports it to the VNFM, which in turn can restart the VNFC.
-
-When the VNFC restarts, it first reads the last checkpoint for the Provided NF
-to be able to resume it.
-The service outage lasts until this recovery is completed.
-
-The key points in this scenario are:
-
-
-* The service state is saved in external storage which should be highly available to
-  protect the service.
-* The NFVI should provide such a guarantee and should also ensure that storage and access network
-  failures are handled seamlessly from the perspective of the VNF.
-* The Provided NF can be resumed only after the VM and the VNFC are restarted and the VNFC
-  has restored the service state from the last checkpoint created before the failure.
-* The VNFM has the means to detect VNFC failures and manage the VNFC life-cycle appropriately.
-  Alternatively the VNF may provide its own availability management.
-* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
-  distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
-  VM, hypervisor, and physical host failures. The use cases for these failures are
-  similar with respect to the Provided NF.
-* The VM repair should also start with fault isolation as appropriate for the actual
-  failed entity, e.g. if the VM failed due to a host failure the host may be fenced first.
-* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
-* VM level redundancy, i.e. running a standby or spare VM in the NFVI, would allow faster service
-  recovery for this use case, but by itself it may not protect against VNFC level failures, i.e.
-  VNFC level error detection is still required.
-
-
-
-2.5 Use Case 5: VNFC failure in a stateless VNF with redundancy
-===============================================================
-
-Use case 5 represents a stateless VNF with redundancy, i.e. it is composed of VNFC1 and VNFC2.
-They are managed by an HA manager within the VNF. The HA manager assigns the active role of providing
-the Provided NF to one of the VNFCs, while the other remains a spare, meaning that it has no state
-information for the Provided NF (Fig 9); it could therefore replace any other VNFC capable of
-providing the Provided NF service.
-
-In this use case the VNFC fails and the failure is detected and handled by the HA manager.
-
-
-.. figure:: images/Slide13.png
-   :alt: VNFC failure in a stateless VNF with redundancy
-   :figclass: align-center
-
-   Fig 9. VNFC failure in a stateless VNF with redundancy
-
-
-.. figure:: images/StatelessVNF-VNFCfailure.png
-   :alt: MSC of the VNFC failure in a stateless VNF with redundancy
-   :figclass: align-center
-
-   Fig 10. Sequence of events for use case 5
-
-
-Initially VNFC2 provides the Provided NF while VNFC1 is idle or might not even have been instantiated
-yet (Fig 10).
-
-When VNFC2 fails the Provided NF is interrupted. This failure is detected by the HA manager,
-which as a first reaction cleans up VNFC2 (fault isolation) and then assigns the active role to
-VNFC1. It may report an error to the VNFM as well.
-
-Since there is no state information to recover, VNFC1 can accept the active role right away
-and resume providing the Provided NF service. Thus the service outage is over. If the HA manager
-reported an error to the VNFM it should clear it at this point.
-
-The key points in this scenario are:
-
-* The spare VNFC may be instantiated only once the failure of the active VNFC is detected.
-* As a result the HA manager's role might be limited to life-cycle management, i.e. no role
-  assignment is needed if the VNFCs provide the service as soon as they are started up.
-* Accordingly the HA management could be part of a generic VNFM, provided it is capable of detecting
-  the VNFC failures. Apart from the service users, the VNFC failure may not be detectable at any other
-  part of the system.
-* Also there could be multiple active VNFCs sharing the load of the Provided NF and the spare/standby
-  may protect all of them.
-* Reporting the service failure to the VNFM is optional as the HA manager is in charge of recovering
-  the service and it is aware of the redundancy needed to do so.
-
-
-2.6 Use Case 6: VM failure in a stateless VNF with redundancy
-=============================================================
-
-
-Similarly to use case 5, use case 6 represents a stateless VNF composed of VNFC1 and VNFC2,
-which are managed by an HA manager within the VNF. The HA manager assigns the active role of
-providing the Provided NF to one of the VNFCs, while the other remains a spare, meaning that it has
-no state information for the Provided NF (Fig 11) and could replace any other VNFC capable
-of providing the Provided NF service.
-
-As opposed to use case 5, in this use case the VM hosting one of the VNFCs fails. This failure is
-detected and handled by the HA manager as well as the VIM.
-
-
-.. figure:: images/Slide14.png
-   :alt: VM failure in a stateless VNF with redundancy
-   :figclass: align-center
-
-   Fig 11. VM failure in a stateless VNF with redundancy
-
-
-.. figure:: images/StatelessVNF-VMfailure.png
-   :alt: MSC of the VM failure in a stateless VNF with redundancy
-   :figclass: align-center
-
-   Fig 12. Sequence of events for use case 6
-
-
-Initially VNFC2 provides the Provided NF while VNFC1 is idle or might not have been instantiated
-yet (Fig 12) as in use case 5.
-
-When VM2 fails VNFC2 fails with it and the Provided NF is interrupted. The failure is detected by
-the HA manager and by the VIM simultaneously and independently.
-
-The HA manager's first reaction is to try to clean up VNFC2 to isolate the fault. This is considered
-successful as soon as the disappearance of the VM is confirmed.
-After this the HA manager assigns the active role to VNFC1. It may also report the error to the VNFM,
-requesting a VM repair.
-
-Since there is no state information to recover, VNFC1 can accept the assignment right away
-and resume the Provided NF service. Thus the service outage is over. If the HA manager reported
-an error to the VNFM for the service it should clear it at this point.
-
-Simultaneously the VM failure is detected in the NFVI and reported to the VIM, which reports it
-to the VNFM if the VNFM hasn't requested a repair yet. If the VNFM requested the VM repair or if
-it acknowledges the repair, the VIM restarts the VM.
-
-Once the VM is up the VIM reports it to the VNFM, which in turn may restart the VNFC if needed.
-
-
-The key points in this scenario are:
-
-* The spare VNFC may be instantiated only after the detection of the failure of the active VNFC.
-* As a result the HA manager's role might be limited to life-cycle management, i.e. no role
-  assignment is needed if the VNFC provides the service as soon as it is started up.
-* Accordingly the HA management could be part of a generic VNFM, provided it is capable of detecting
-  failures in/of the VNFC and managing its life-cycle.
-* Also there could be multiple active VNFCs sharing the load of the Provided NF and the spare/standby
-  may protect all of them.
-* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
-  distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
-  VM, hypervisor, and physical host failures. The use cases for these failures are
-  similar with respect to each Provided NF.
-* The VM repair should also start with fault isolation as appropriate for the actual
-  failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first.
-* The negotiation between the VNFM and the VIM may be replaced by configured repair actions.
-* Reporting the service failure to the VNFM is optional as the HA manager is in charge of recovering
-  the service and it is aware of the redundancy needed to do so.
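The HA manager's part in use cases 5 and 6 can be sketched as follows. This is only an illustrative model, not an OPNFV or ETSI interface: the `Vnfc` and `HaManager` classes, the `reports` list standing in for errors reported to the VNFM, and the way fault isolation is reduced to dropping the active role are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vnfc:
    name: str
    healthy: bool = True
    active: bool = False


@dataclass
class HaManager:
    """Hypothetical HA manager for a stateless VNF with one spare VNFC."""
    vnfcs: List[Vnfc]
    reports: List[str] = field(default_factory=list)  # errors reported to the VNFM

    def handle_failure(self, failed: Vnfc, report_to_vnfm: bool = False) -> Optional[Vnfc]:
        # Step 1: fault isolation, modelled here as dropping the active role.
        failed.healthy = False
        failed.active = False
        error = f"{failed.name} failed"
        if report_to_vnfm:
            self.reports.append(error)
        # Step 2: assign the active role to a healthy spare; there is no state to restore.
        spare = next((v for v in self.vnfcs if v.healthy and not v.active), None)
        if spare is not None:
            spare.active = True
            # Step 3: the service is provided again, so clear the reported error.
            if report_to_vnfm:
                self.reports.remove(error)
        return spare


vnfc1, vnfc2 = Vnfc("VNFC1"), Vnfc("VNFC2", active=True)
manager = HaManager([vnfc1, vnfc2])
new_active = manager.handle_failure(vnfc2, report_to_vnfm=True)
print(new_active.name)  # prints VNFC1
```

Because the spare holds no state, the switch-over consists of nothing more than the role assignment, which is why the outage in these use cases can be short compared with the stateful cases.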
-
-
-
-2.7 Use Case 7: VNFC failure in a stateless VNF with no redundancy
-==================================================================
-
-Use case 7 represents a stateless VNF composed of a single VNFC, i.e. with no redundancy.
-The VNF, and in particular its VNFC, is managed by the VNFM through managing its life-cycle (Fig 13).
-
-In this use case the VNFC fails. This failure is detected and handled by the VNFM. This use case
-requires that the VNFM can detect failures in the VNF or that they are reported to the VNFM.
-
-The failure is only detectable at the VNFM level and it is handled by the VNFM restarting the VNFC.
-
-
-.. figure:: images/Slide16.png
-   :alt: VNFC failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 13. VNFC failure in a stateless VNF with no redundancy
-
-
-.. figure:: images/StatelessVNF-VNFCfailureNoRed.png
-   :alt: MSC of the VNFC failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 14. Sequence of events for use case 7
-
-The VNFC is providing the Provided NF when it fails (Fig 14). This failure is detected by or reported to
-the VNFM, which has to clean up the VNFC to isolate the fault. After a successful cleanup it can proceed
-with restarting the VNFC, which starts to provide the Provided NF as soon as it is up,
-as there is no state to recover.
-
-Thus the service outage is over, but it has included the entire time needed to restart the VNFC.
-Since the VNF is stateless, this may still not be significant.
-
-
-The key points in this scenario are:
-
-* The VNFM has to have the means to detect VNFC failures and manage the VNFC life-cycle appropriately.
-  This is not required if the VNF comes with its own availability management, but this is very unlikely
-  for such stateless VNFs.
-* The Provided NF can be resumed as soon as the VNFC is restarted, i.e. the restart time determines
-  the outage.
-* In case multiple VNFCs are used they should not interfere with one another; they should
-  operate independently.
-
-
-2.8 Use Case 8: VM failure in a stateless VNF with no redundancy
-================================================================
-
-Use case 8 represents the same stateless VNF composed of a single VNFC as use case 7, i.e. with
-no redundancy. The VNF, and in particular its VNFC, is managed by the VNFM through managing its
-life-cycle (Fig 15).
-
-In this use case the VM hosting the VNFC fails. This failure is detected and handled by the VNFM
-as well as by the VIM.
-
-
-.. figure:: images/Slide17.png
-   :alt: VM failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 15. VM failure in a stateless VNF with no redundancy
-
-
-.. figure:: images/StatelessVNF-VMfailureNoRed.png
-   :alt: MSC of the VM failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 16. Sequence of events for use case 8
-
-The VNFC is providing the Provided NF when the VM hosting the VNFC fails (Fig 16).
-
-This failure may be detected by or reported to the VNFM as a failure of the VNFC. The VNFM may
-not be aware at this point that it is a VM failure. Accordingly its first reaction, as in use case 7,
-is to clean up the VNFC to isolate the fault. Since the VM is gone, this cannot succeed and the VNFM
-becomes aware of the VM failure either through this or through a report from the VIM. In either case
-it has to wait with the repair of the VNFC until the VM becomes available again.
-
-Meanwhile the VIM also detects the VM failure and reports it to the VNFM unless the VNFM has already
-requested the VM repair. After the VNFM confirms the VM repair, the VIM restarts the VM and reports
-the successful repair to the VNFM, which in turn can start the VNFC hosted on it.
-
-
-Thus the recovery of the Provided NF includes the restart time of the VM and of the VNFC.
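The VNFM side of this recovery can be sketched roughly as below. The function `recover_vnfc`, the `CleanupResult` enum and the four callables passed in are hypothetical stand-ins for whatever interfaces a real VNFM and VIM expose; the sketch only illustrates the order of the steps described above.

```python
from enum import Enum


class CleanupResult(Enum):
    OK = "ok"
    VM_GONE = "vm_gone"


def recover_vnfc(cleanup, vm_is_up, request_vm_repair, restart_vnfc):
    """Hypothetical VNFM recovery routine; all four callables are assumptions."""
    log = []
    result = cleanup()  # first reaction: clean up the VNFC to isolate the fault
    log.append(f"cleanup:{result.value}")
    if result is CleanupResult.VM_GONE:
        # The cleanup cannot succeed because the VM is gone: ask the VIM for a repair.
        request_vm_repair()
        log.append("vm_repair_requested")
        if not vm_is_up():
            # The VNFC can only be restarted once the VIM reports the VM repair.
            log.append("waiting_for_vm")
            return log
    restart_vnfc()
    log.append("vnfc_restarted")
    return log


def noop():
    return None


# Use case 7: the VM is fine, so the VNFC is cleaned up and restarted right away.
vnfc_only = recover_vnfc(lambda: CleanupResult.OK, lambda: True, noop, noop)
# Use case 8: the VM is gone and has not been repaired yet, so the VNFM must wait.
vm_failed = recover_vnfc(lambda: CleanupResult.VM_GONE, lambda: False, noop, noop)
print(vnfc_only)
print(vm_failed)
```

In the second call the routine returns without restarting the VNFC, which mirrors the point that the outage includes the VM restart time in addition to the VNFC restart time.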
-
-The key points in this scenario are:
-
-* The VNFM has to have the means to detect VNFC failures and manage the VNFC life-cycle appropriately.
-  This is not required if the VNF comes with its own availability management, but this is very unlikely
-  for such stateless VNFs.
-* The Provided NF can be resumed only after the VNFC is restarted on the repaired VM, i.e. the
-  restart time of the VM and the VNFC determines the outage.
-* In case multiple VNFCs are used they should not interfere with one another; they should
-  operate independently.
-* The VNFM may not know for sure that the VM failed until the VIM reports this. It also cannot
-  distinguish host, hypervisor and host OS failures. Thus the VIM should report/alarm and log
-  VM, hypervisor, and physical host failures. The use cases for these failures are
-  similar with respect to each Provided NF.
-* The VM repair should also start with fault isolation as appropriate for the actual
-  failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first.
-* The repair negotiation between the VNFM and the VIM may be replaced by configured repair actions.
-* VM level redundancy, i.e. running a standby or spare VM in the NFVI, would allow faster service
-  recovery for this use case, but by itself it may not protect against VNFC level failures, i.e.
-  VNFC level error detection is still required.
-
-2.9 Use Case 9: Repeated VNFC failure in a stateless VNF with no redundancy
-===========================================================================
-
-Finally, use case 9 again represents a stateless VNF composed of a single VNFC as in use case 7, i.e.
-with no redundancy. The VNF, and in particular its VNFC, is managed by the VNFM through managing its
-life-cycle.
-
-In this use case the VNFC fails repeatedly. This failure is detected and handled by the VNFM,
-but the handling does not resolve the fault (Fig 17) because the VNFC is manifesting a fault
-that is not in its scope. I.e.
the fault is propagating to the VNFC from a faulty VM or host,
-for example. Thus the VNFM cannot resolve the problem by itself.
-
-
-.. figure:: images/Slide19.png
-   :alt: Repeated VNFC failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 17. Repeated VNFC failure in a stateless VNF with no redundancy
-
-
-To handle this case the failure handling needs to be escalated to a bigger fault zone
-(or fault domain), i.e. a scope within which the faults may propagate and manifest. In the case of the
-VNF the bigger fault zone is the VM and the facilities hosting it, all managed by the VIM.
-
-Thus the VNFM should request the repair from the VIM (Fig 18).
-
-Since the VNFM is only aware of the VM, it needs to report an error on the VM and it is the
-VIM's responsibility to sort out what might be the scope of the actual fault depending on other
-failures and error reports in its scope.
-
-
-.. figure:: images/Slide20.png
-   :alt: Escalation of repeated VNFC failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 18. Escalation of repeated VNFC failure in a stateless VNF with no redundancy
-
-
-.. figure:: images/StatelessVNF-VNFCfailureNoRed-Escalation.png
-   :alt: MSC of the repeated VNFC failure in a stateless VNF with no redundancy
-   :figclass: align-center
-
-   Fig 19. Sequence of events for use case 9
-
-
-This use case starts similarly to use case 7, i.e. the VNFC is providing the Provided NF when it fails
-(Fig 17).
-This failure is detected by or reported to the VNFM, which cleans up the VNFC to isolate the fault.
-After a successful cleanup the VNFM proceeds with restarting the VNFC, which as soon as it is up
-starts to provide the Provided NF again as in use case 7.
-
-However, the VNFC fails N times within some probation time, for which the VNFM starts
-a timer when it detects the first failure of the VNFC.
When the VNFC fails once more, still within the
-probation time, the escalation counter maximum is exceeded and the VNFM reports an error to the VIM on
-the VM hosting the VNFC, as cleaning up and restarting the VNFC evidently did not solve the problem.
-
-When the VIM receives the error report for the VM it has to isolate the fault by cleaning up at least
-the VM. After a successful cleanup it can restart the VM and, once it is up, report the VM repair to the VNFM.
-At this point the VNFM can restart the VNFC, which in turn resumes the Provided NF.
-
-In this scenario the VIM needs to evaluate what may be the scope of the fault to determine what entity
-needs a repair. For example, if it has detected VM failures on that same host, or other VNFMs
-reported errors on VMs hosted on the same host, it should consider that the entire host needs a repair.
-
-
-The key points in this scenario are:
-
-* The VNFM has to have the means to detect VNFC failures and manage the VNFC life-cycle appropriately.
-  This is not required if the VNF comes with its own availability management, but this is very unlikely
-  for such stateless VNFs.
-* The VNFM needs to correlate VNFC failures over time to be able to detect failure of a bigger fault zone.
-  One way to do so is through counting the failures within a probation time.
-* The VIM cannot detect all failures caused by faults in the entities under its control. It should be
-  able to receive error reports and correlate these error reports based on the dependencies
-  of the different entities.
-* The VNFM does not know the source of the failure, i.e. the faulty entity.
-* The VM repair should start with fault isolation as appropriate for the actual
-  failed entity, e.g. if the VM failed due to a host failure the host needs to be fenced first.
-
-********************
-3 Concluding remarks
-********************
-
-This use case document outlined the model and some failure modes for NFV systems. These are an
-initial list.
The OPNFV HA project team is continuing to grow the list of use cases and will
-issue additional documents going forward. The basic use cases and service availability considerations
-help define the key considerations for each use case, taking into account the impact on the end service.
-The use case document, along with the requirements documents and gap analysis, helps set the context for
-engagement with various upstream projects.
diff --git a/UseCases/UseCases_for_Network_Nodes.rst b/UseCases/UseCases_for_Network_Nodes.rst
deleted file mode 100644
index bc9266a..0000000
--- a/UseCases/UseCases_for_Network_Nodes.rst
+++ /dev/null
@@ -1,157 +0,0 @@
-4 High Availability Scenarios for Network Nodes
-===============================================
-
-4.1 Network nodes and HA deployment
------------------------------------
-
-OpenStack network nodes contain the Neutron DHCP agent, Neutron L2 agent, Neutron L3 agent, Neutron LBaaS
-agent and Neutron Metadata agent. The DHCP agent provides DHCP services for virtual networks. The
-metadata agent provides configuration information such as credentials to instances. Note that the
-L2 agent cannot be distributed and made highly available. Instead, it must be installed on each data
-forwarding node to control the virtual network drivers such as Open vSwitch or Linux Bridge. One L2
-agent runs per node and controls its virtual interfaces.
-
-A typical HA deployment of network nodes is shown in Fig 20, which depicts a two-node cluster.
-The number of nodes depends on the size of the cluster; it can be 2 or more. More details are
-given in the section for each agent.
-
-
-.. figure:: images_network_nodes/Network_nodes_deployment.png
-   :alt: HA deployment of network nodes
-   :figclass: align-center
-
-   Fig 20. A typical HA deployment of network nodes
-
-
-4.2 DHCP agent
---------------
-
-The DHCP agent can be natively highly available. Neutron has a scheduler which lets you run multiple
-agents across nodes.
You can configure the dhcp_agents_per_network parameter in the neutron.conf file
-and set it to X (X >= 2 for HA, default is 1).
-
-If X is set to 2, as depicted in Fig 21 with three tenant networks as an example (there can be any
-number of tenant networks), six DHCP agents are deployed on two nodes for the three networks, and all
-of them are active. The two dhcp1 agents serve one network, while the dhcp2 and dhcp3 agents serve the
-other two networks. Within a network all DHCP traffic is broadcast, and the DHCP servers race to offer
-an IP address. All the servers update their lease tables. In Fig 22, when the agent(s) on Node1 stop
-working, which can be caused by a software or hardware failure, the DHCP agent(s) on Node2 continue to
-offer IP addresses for the network.
-
-
-.. figure:: images_network_nodes/DHCP_deployment.png
-   :alt: HA deployment of DHCP agents
-   :figclass: align-center
-
-   Fig 21. Natively HA deployment of DHCP agents
-
-
-.. figure:: images_network_nodes/DHCP_failure.png
-   :alt: Failure of DHCP agents
-   :figclass: align-center
-
-   Fig 22. Failure of DHCP agents
-
-
-4.3 L3 agent
-------------
-
-The L3 agent is also natively highly available. To achieve HA, it can be configured in the neutron.conf
-file.
-
-.. code-block:: ini
-
-   # All routers are highly available by default
-   l3_ha = True
-
-   # Set automatic L3 agent failover for routers
-   allow_automatic_l3agent_failover = True
-
-   # Maximum number of network nodes to use for the HA router
-   max_l3_agents_per_router = 2
-
-   # Minimum number of network nodes to use for the HA router. A new router
-   # can be created only if this number of network nodes is available.
-   min_l3_agents_per_router = 2
-
-With these settings the L3 agent scheduler uses the Virtual Router Redundancy
-Protocol (VRRP) to distribute virtual routers across multiple nodes (e.g. 2). The scheduler chooses
-a number of nodes between the minimum and the maximum according to its scheduling algorithm. VRRP is
-implemented by Keepalived.
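As a rough illustration, the DHCP and L3 settings discussed above could be sanity-checked with Python's standard `configparser` module. The helper `check_ha_settings`, the sample file contents, the assumption that all options live in the `[DEFAULT]` section, and the fallback values used when an option is absent are choices made for this sketch rather than something stated in this document.

```python
import configparser

# Sample neutron.conf fragment; placing the options under [DEFAULT] is an assumption.
NEUTRON_CONF = """
[DEFAULT]
dhcp_agents_per_network = 2
l3_ha = True
allow_automatic_l3agent_failover = True
max_l3_agents_per_router = 2
min_l3_agents_per_router = 2
"""


def check_ha_settings(text):
    """Return a list of problems found in the HA-related options (hypothetical helper)."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    opts = parser["DEFAULT"]
    problems = []
    # The text above gives 1 as the default and asks for at least 2 for HA.
    if opts.getint("dhcp_agents_per_network", fallback=1) < 2:
        problems.append("dhcp_agents_per_network should be >= 2 for HA")
    if not opts.getboolean("l3_ha", fallback=False):
        problems.append("l3_ha is not enabled")
    # The fallbacks of 2 only keep the sketch simple; they are not documented defaults.
    lowest = opts.getint("min_l3_agents_per_router", fallback=2)
    highest = opts.getint("max_l3_agents_per_router", fallback=2)
    if lowest > highest:
        problems.append("min_l3_agents_per_router exceeds max_l3_agents_per_router")
    return problems


print(check_ha_settings(NEUTRON_CONF))  # prints []
```

A check of this kind only validates the configuration file; whether the agents are actually scheduled on several nodes still has to be verified on the running deployment.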
-
-As depicted in Fig 23, the L3 agents on both Node1 and Node2 host vRouter 1 and vRouter 2. On Node1,
-vRouter 1 is active and vRouter 2 is standby (hot standby). On Node2, vRouter 1 is standby and
-vRouter 2 is active. To spread the load, the two active routers are placed on different nodes.
-As shown in Fig 24, Keepalived manages the VIP interfaces. There is one Keepalived instance
-per virtual router, and thus one per namespace. 169.254.192.0/18 is a dedicated HA network
-which is created in order to isolate the administrative traffic from the tenant traffic; each vRouter
-is connected to this dedicated network via an HA port. More details can be found in the
-reference at the bottom.
-
-
-.. figure:: images_network_nodes/L3_deployment.png
-   :alt: HA deployment of L3 agents
-   :figclass: align-center
-
-   Fig 23. Natively HA deployment of L3 agents
-
-
-.. figure:: images_network_nodes/L3_ha_principle.png
-   :alt: HA principle of L3 agents
-   :figclass: align-center
-
-   Fig 24. Natively HA principle of L3 agents
-
-
-In Fig 25, when vRouter 1 on Node1 goes down, which can be caused by a software or hardware failure,
-Keepalived detects the failure and the standby takes over as active. To keep the
-TCP connections alive, Conntrackd is used to maintain the TCP sessions going through the router. There
-is one Conntrackd instance per virtual router, and thus one per namespace. After that, a rescheduling
-procedure is triggered to respawn the failed virtual router on another L3 agent as standby. The whole
-workflow is depicted in Fig 26.
-
-
-.. figure:: images_network_nodes/L3_failure.png
-   :alt: Failure of L3 agents
-   :figclass: align-center
-
-   Fig 25. Failure of L3 agents
-
-
-.. figure:: images_network_nodes/L3_ha_workflow.png
-   :alt: HA workflow of L3 agents
-   :figclass: align-center
-
-   Fig 26.
HA workflow of L3 agents
-
-
-4.4 LBaaS agent and Metadata agent
-----------------------------------
-
-Currently, no native feature is provided to make the LBaaS agent highly available using the default
-plug-in HAProxy. A common way to make HAProxy highly available is to use Pacemaker.
-
-
-.. figure:: images_network_nodes/LBaaS_deployment.png
-   :alt: HA deployment of LBaaS agents
-   :figclass: align-center
-
-   Fig 27. HA deployment of LBaaS agents using Pacemaker
-
-
-As shown in Fig 27, HAProxy and Pacemaker are deployed on both network nodes. The number of network
-nodes can be 2 or more, depending on your cluster. HAProxy on Node1 is the master and the VIP is on
-Node1. Pacemaker monitors the liveness of HAProxy.
-
-
-.. figure:: images_network_nodes/LBaaS_failure.png
-   :alt: Failure of LBaaS agents
-   :figclass: align-center
-
-   Fig 28. Failure of LBaaS agents
-
-
-As shown in Fig 28, when HAProxy on Node1 goes down, which can be caused by a software or hardware
-failure, Pacemaker fails over HAProxy and the VIP to Node2.
-
-Note that the default plug-in HAProxy only supports TCP and HTTP.
-
-No native feature is available to make the Metadata agent highly available. At this time, an Active/Passive
-solution exists to run the Neutron metadata agent in failover mode with Pacemaker. The deployment and
-failover procedure can be the same as in the case of LBaaS.
- diff --git a/UseCases/images/Slide10.png b/UseCases/images/Slide10.png Binary files differdeleted file mode 100644 index b3545e8..0000000 --- a/UseCases/images/Slide10.png +++ /dev/null diff --git a/UseCases/images/Slide11.png b/UseCases/images/Slide11.png Binary files differdeleted file mode 100644 index 3aa5f67..0000000 --- a/UseCases/images/Slide11.png +++ /dev/null diff --git a/UseCases/images/Slide13.png b/UseCases/images/Slide13.png Binary files differdeleted file mode 100644 index 207c4a7..0000000 --- a/UseCases/images/Slide13.png +++ /dev/null diff --git a/UseCases/images/Slide14.png b/UseCases/images/Slide14.png Binary files differdeleted file mode 100644 index e6083c9..0000000 --- a/UseCases/images/Slide14.png +++ /dev/null diff --git a/UseCases/images/Slide16.png b/UseCases/images/Slide16.png Binary files differdeleted file mode 100644 index 484ffa2..0000000 --- a/UseCases/images/Slide16.png +++ /dev/null diff --git a/UseCases/images/Slide17.png b/UseCases/images/Slide17.png Binary files differdeleted file mode 100644 index 7240aaa..0000000 --- a/UseCases/images/Slide17.png +++ /dev/null diff --git a/UseCases/images/Slide19.png b/UseCases/images/Slide19.png Binary files differdeleted file mode 100644 index 7e3c10b..0000000 --- a/UseCases/images/Slide19.png +++ /dev/null diff --git a/UseCases/images/Slide20.png b/UseCases/images/Slide20.png Binary files differdeleted file mode 100644 index 2e9759b..0000000 --- a/UseCases/images/Slide20.png +++ /dev/null diff --git a/UseCases/images/Slide4.png b/UseCases/images/Slide4.png Binary files differdeleted file mode 100644 index a701f42..0000000 --- a/UseCases/images/Slide4.png +++ /dev/null diff --git a/UseCases/images/Slide6.png b/UseCases/images/Slide6.png Binary files differdeleted file mode 100644 index 04a904f..0000000 --- a/UseCases/images/Slide6.png +++ /dev/null diff --git a/UseCases/images/StatefullVNF-VMfailure.png b/UseCases/images/StatefullVNF-VMfailure.png Binary files differdeleted file mode 
100644 index 2f62232..0000000 --- a/UseCases/images/StatefullVNF-VMfailure.png +++ /dev/null diff --git a/UseCases/images/StatefullVNF-VMfailureNoRed.png b/UseCases/images/StatefullVNF-VMfailureNoRed.png Binary files differdeleted file mode 100644 index 6f3058d..0000000 --- a/UseCases/images/StatefullVNF-VMfailureNoRed.png +++ /dev/null diff --git a/UseCases/images/StatefullVNF-VNFCfailure.png b/UseCases/images/StatefullVNF-VNFCfailure.png Binary files differdeleted file mode 100644 index 9021f2d..0000000 --- a/UseCases/images/StatefullVNF-VNFCfailure.png +++ /dev/null diff --git a/UseCases/images/StatefullVNF-VNFCfailureNoRed.png b/UseCases/images/StatefullVNF-VNFCfailureNoRed.png Binary files differdeleted file mode 100644 index 4fd7e2e..0000000 --- a/UseCases/images/StatefullVNF-VNFCfailureNoRed.png +++ /dev/null diff --git a/UseCases/images/StatelessVNF-VMfailure.png b/UseCases/images/StatelessVNF-VMfailure.png Binary files differdeleted file mode 100644 index 9b94183..0000000 --- a/UseCases/images/StatelessVNF-VMfailure.png +++ /dev/null diff --git a/UseCases/images/StatelessVNF-VMfailureNoRed.png b/UseCases/images/StatelessVNF-VMfailureNoRed.png Binary files differdeleted file mode 100644 index 2a14b67..0000000 --- a/UseCases/images/StatelessVNF-VMfailureNoRed.png +++ /dev/null diff --git a/UseCases/images/StatelessVNF-VNFCfailure.png b/UseCases/images/StatelessVNF-VNFCfailure.png Binary files differdeleted file mode 100644 index f2dcc3b..0000000 --- a/UseCases/images/StatelessVNF-VNFCfailure.png +++ /dev/null diff --git a/UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png b/UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png Binary files differdeleted file mode 100644 index 6719177..0000000 --- a/UseCases/images/StatelessVNF-VNFCfailureNoRed-Escalation.png +++ /dev/null diff --git a/UseCases/images/StatelessVNF-VNFCfailureNoRed.png b/UseCases/images/StatelessVNF-VNFCfailureNoRed.png Binary files differdeleted file mode 100644 index 
a0970fc..0000000 --- a/UseCases/images/StatelessVNF-VNFCfailureNoRed.png +++ /dev/null diff --git a/UseCases/images_network_nodes/DHCP_deployment.png b/UseCases/images_network_nodes/DHCP_deployment.png Binary files differdeleted file mode 100755 index 90bb740..0000000 --- a/UseCases/images_network_nodes/DHCP_deployment.png +++ /dev/null diff --git a/UseCases/images_network_nodes/DPCH_failure.png b/UseCases/images_network_nodes/DPCH_failure.png Binary files differdeleted file mode 100755 index 07a51f8..0000000 --- a/UseCases/images_network_nodes/DPCH_failure.png +++ /dev/null diff --git a/UseCases/images_network_nodes/L3_deployment.png b/UseCases/images_network_nodes/L3_deployment.png Binary files differdeleted file mode 100755 index ff573b6..0000000 --- a/UseCases/images_network_nodes/L3_deployment.png +++ /dev/null diff --git a/UseCases/images_network_nodes/L3_failure.png b/UseCases/images_network_nodes/L3_failure.png Binary files differdeleted file mode 100755 index 57485ad..0000000 --- a/UseCases/images_network_nodes/L3_failure.png +++ /dev/null diff --git a/UseCases/images_network_nodes/L3_ha_principle.png b/UseCases/images_network_nodes/L3_ha_principle.png Binary files differdeleted file mode 100755 index 59a3161..0000000 --- a/UseCases/images_network_nodes/L3_ha_principle.png +++ /dev/null diff --git a/UseCases/images_network_nodes/L3_ha_workflow.png b/UseCases/images_network_nodes/L3_ha_workflow.png Binary files differdeleted file mode 100755 index d923f4f..0000000 --- a/UseCases/images_network_nodes/L3_ha_workflow.png +++ /dev/null diff --git a/UseCases/images_network_nodes/LBaaS_deployment.png b/UseCases/images_network_nodes/LBaaS_deployment.png Binary files differdeleted file mode 100755 index d4e5929..0000000 --- a/UseCases/images_network_nodes/LBaaS_deployment.png +++ /dev/null diff --git a/UseCases/images_network_nodes/LBaaS_failure.png b/UseCases/images_network_nodes/LBaaS_failure.png Binary files differdeleted file mode 100755 index 5262fd0..0000000 
--- a/UseCases/images_network_nodes/LBaaS_failure.png +++ /dev/null diff --git a/UseCases/images_network_nodes/Network_nodes_deployment.png b/UseCases/images_network_nodes/Network_nodes_deployment.png Binary files differdeleted file mode 100755 index bb0f3db..0000000 --- a/UseCases/images_network_nodes/Network_nodes_deployment.png +++ /dev/null |