From cdc8a81fe94348282c5a4f975163f8c231976fa8 Mon Sep 17 00:00:00 2001 From: Maria Toeroe Date: Thu, 22 Oct 2015 10:04:31 -0400 Subject: Incorporate software dimensions and other comments Add definitions of rollback, downgrade and restore Add rollforward and apply other comments JIRA: ESCALATOR-24 Change-Id: I4a576c8fe1a7751ee934693ed8f948617a5542a0 Signed-off-by: Maria Toeroe --- doc/02-Background_and_Terminologies.rst | 267 +++++++++++++++++++------------- 1 file changed, 163 insertions(+), 104 deletions(-) (limited to 'doc/02-Background_and_Terminologies.rst') diff --git a/doc/02-Background_and_Terminologies.rst b/doc/02-Background_and_Terminologies.rst index afb392f..36a81f2 100644 --- a/doc/02-Background_and_Terminologies.rst +++ b/doc/02-Background_and_Terminologies.rst @@ -1,4 +1,4 @@ -General Requirements Background and Terminology +General Requirements Background and Terminology ----------------------------------------------- Terminologies and definitions @@ -12,7 +12,7 @@ NFVI VIM The term is an abbreviation for Virtual Infrastructure Management; sometimes it is also referred as control plane in this document. - + Operator The term refers to network service providers and Virtual Network Function (VNF) providers. @@ -25,12 +25,12 @@ Network Service End-users using a set of (virtualized) Network Functions Infrastructure Services - The term refers to services provided by the NFV Infrastructure and the - the Management & Orchestration functions to the VNFs. I.e. + The term refers to services provided by the NFV Infrastructure and the + the Management & Orchestration functions to the VNFs. I.e. these are the virtual resources as perceived by the VNFs. Smooth Upgrade - The term refers to an upgrade that results in no service outage + The term refers to an upgrade that results in no service outage for the end-users. Rolling Upgrade @@ -38,7 +38,7 @@ Rolling Upgrade a subset of nodes in a wave style rolling through the data centre. 
It is a popular upgrade strategy to maintain service availability. -Parallel Universe +Parallel Universe Upgrade The term refers to an upgrade strategy that creates and deploys a new universe - a system with the new configuration - while the old system continues running. The state of the old system is transferred @@ -59,22 +59,57 @@ Virtual Resource they are the resources on which VNF entities are deployed, e.g. the VMs, virtual switches, virtual routers, virtual disks etc. -.. I don't think the VNF is the virtual - resource. Virtual - resources are the VMs, virtual switches, virtual routers, virtual - disks etc. The VNF uses them, but I don't think they are equal. The - VIM doesn't manage the VNF, but it does manage virtual resources. - Visualization Facility - The term refers to a resource that enables the creation - of virtual environments on top of the physical resources, e.g. - hypervisor, OpenStack, etc. + The term refers to a resource that enables the creation + of virtual environments on top of the physical resources, e.g. + hypervisor, OpenStack, etc. + +Upgrade Campaign + The term refers to a choreography that describes how the upgrade should + be performed in terms of its targets (i.e. upgrade objects), the + steps/actions required for upgrading each, and the coordination of these + steps so that service availability can be maintained. It is an input to an + upgrade tool (Escalator) to carry out the upgrade. + +Upgrade Duration + The duration of an upgrade is characterized by the time elapsed between its + initiation and its completion, e.g. from the moment the execution of an + upgrade campaign has started until it has been committed. Depending on + the upgrade method and its target, some parts of the system may be in a more + vulnerable state during this period. + +Outage + The period of time during which a given service is not provided is referred + to as the outage of that given service. 
If a subsystem or the entire system + does not provide any service, it is the outage of the given subsystem or the + system. Smooth upgrade means upgrade with no outage for the user plane, i.e. + no VNF should experience service outage. + +Rollback + The term refers to a failure handling strategy that reverts the changes + done by a potentially failed upgrade execution one by one in reverse order. + I.e. it is like undoing the changes done by the upgrade. + +Restore + The term refers to a failure handling strategy that reverts the changes + done by an upgrade by restoring the system from some backup data. This + results in the loss of any data persisted since the backup was taken. + +Rollforward + The term refers to a failure handling strategy applied after a restore + (from a backup) operation to recover the data persisted between + the time the backup was taken and the moment it is restored. Rollforward + requires that data that needs to survive the restore operation is logged at + a location not impacted by the restore so that it can be re-applied to the + system after its restoration from the backup. + +Downgrade + The term refers to an upgrade in which an earlier version of the software + is restored through the upgrade procedure. A system can be downgraded to any + earlier version, and the compatibility of the versions will determine the + applicable upgrade strategies and whether service outage can be avoided. + In particular, any data conversion needs special attention. -Upgrade Plan (or Campaign?) - The term refers to a choreography that describes how the upgrade should - be performed in terms of its targets (i.e. upgrade objects), the - steps/actions required of upgrading each, and the coordination of these - steps so that service availability can be maintained. 
It is an input to an - upgrade tool (Escalator) to carry out the upgrade Upgrade Objects @@ -83,29 +118,33 @@ Upgrade Objects Physical Resource ^^^^^^^^^^^^^^^^^ -Most of cloud infrastructures support dynamic addition/removal of -hardware. A hardware upgrade could be done by adding the new -hardware node and removing the old one. From the persepctive of smooth +Most cloud infrastructures support the dynamic addition/removal of +hardware. Accordingly, a hardware upgrade could be done by adding the new +piece of hardware and removing the old one. From the perspective of smooth upgrade the orchestration/scheduling of this actions is the primary concern. -Upgrading a physical resource, -like upgrading its firmware and/or modify its configuration data, may -also be considered in the future. +Upgrading a physical resource may also involve upgrading its firmware +and/or modifying its configuration data. This may require the restart of the +hardware. + Virtual Resources ^^^^^^^^^^^^^^^^^ -Virtual resource upgrade mainly done by users. OPNFV may facilitate -the activity, but suggest to have it in long term roadmap instead of -initiate release. +Addition and removal of virtual resources may be initiated by the users or be +a result of an elasticity action. Users may also request the upgrade of their +virtual resources using a new VM image. + +.. Needs to be moved to requirement section: Escalator should facilitate such an +option and allow for a smooth upgrade. -.. same comment here: I don't think the VNF is the virtual - resource. Virtual resources are the VMs, virtual switches, virtual - routers, virtual disks etc. The VNF uses them, but I don't think they - are equal. For example if by some reason the hypervisor is changed and - the current VMs cannot be migrated to the new hypervisor, they are - incompatible, then the VMs need to be upgraded too. This is not - something the NFVI user (i.e. VNFs ) would even know about. 
+On the other hand, changes in the infrastructure, namely in the hardware and/or +the virtualization facility resources, may result in the upgrade of the virtual +resources. For example, if for some reason the hypervisor is changed and +the current VMs cannot be migrated to the new hypervisor - they are +incompatible - then the VMs need to be upgraded too. This is not +something the NFVI user (i.e. the VNFs) would know about. In such cases +smooth upgrade is essential. Virtualization Facility Resources @@ -189,95 +228,115 @@ result in the same. Updating 3 might lead to control plane services interruption if not an HA deployment. -Upgrade Span -~~~~~~~~~~~~ -**Major Upgrade** -Upgrades between major releases may introducing significant changes in -function, configuration and data, such as the upgrade of OPNFV from -Arno to Brahmaputra. -**Minor Upgrade** - -Upgrades inside one major releases which would not leads to changing -the structure of the platform and may not infect the schema of the -system data. Upgrade Granularity ~~~~~~~~~~~~~~~~~~~ -Physical/Hardware Dimension -^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The granularity of an upgrade can be characterized from two perspectives: +- the physical dimension and +- the software dimension + + +Physical Dimension +^^^^^^^^^^^^^^^^^^ + +The physical dimension characterizes the number of similar upgrade objects +targeted by the upgrade, i.e. whether it is a full or partial upgrade of a +data centre, cluster or zone. +The upgrade of a data centre or a zone may be divided into +several batches. Thus there is a need for efficiency in the execution of +upgrades of a potentially huge number of upgrade objects while still maintaining +availability to fulfill the requirement of smooth upgrade. -Support full / partial upgrade for data centre, cluster, zone. Because -of the upgrade of a data centre or a zone, it may be divided into -several batches. 
The upgrade of a cloud environment (cluster) may also +The upgrade of a cloud environment (cluster) may also be partial. For example, in one cloud environment running a number of -VNFs, we may just try one of them to check the stability and +VNFs, we may just try to upgrade one of them to check the stability and performance, before we upgrade all of them. +Thus there is a need for proper organization of the artifacts associated with +the different upgrade objects. Also, the different versions should be able +to coexist beyond the upgrade period. + +From this perspective special attention may be needed when upgrading +objects that are collaborating in a redundancy schema, as in this case +different versions not only need to coexist but also collaborate. This +primarily puts requirements on the upgrade objects. If this is not possible, +the upgrade campaign should be designed in such a way that the proper +isolation is ensured. Software Dimension ^^^^^^^^^^^^^^^^^^ -- The upgrade of host OS or kernel may need a 'hot migration' -- The upgrade of OpenStack’s components +The software dimension of the upgrade characterizes the upgrade object +types targeted and the combination in which they are upgraded together. - i.the one-shot upgrade of all components - - ii.the partial upgrade (or bugfix patch) which only affects some - components (e.g., computing, storage, network, database, message - queue, etc.) +Even though the upgrade may +initially target only one type of upgrade object, e.g. the hypervisor, +the dependency of other upgrade objects on this initial target object may +require their upgrade as well. I.e. the upgrades need to be combined. From this +perspective the main concern is compatibility of the dependent and +sponsor objects. To take these dependencies into consideration, +they need to be described together with the version compatibility information. +Breaking dependencies is the major cause of outages during upgrades. -.. this section seems to overlap with 2.1. 
I can see the following dimensions for the software. +In other cases it is more efficient to upgrade a combination of upgrade +objects than to do it one by one. One aspect of the combination is how +the upgrade packages can be combined: whether a new image can be created for +them beforehand, or the different packages can be installed during the upgrade +independently, but activated together. -.. different software packages +The combination of upgrade objects may span +layers (e.g. software stack in the host and the VM of the VNF). +Thus, it may require additional coordination between the management layers. -.. different functions - Considering that the target versions of all - software are compatible the upgrade needs to ensure that any - dependencies between SW and therefore packages are taken into account - in the upgrade plan, i.e. no version mismatch occurs during the - upgrade therefore dependencies are not broken - -.. same function - This is an upgrade specific question if different - versions can coexist in the system when a SW is being upgraded from - one version to another. This is particularly important for stateful - functions e.g. storage, networking, control services. The upgrade - method must consider the compatibility of the redundant entities. +With respect to each upgrade object type and even stacks we can +distinguish major and minor upgrades: -.. different versions of the same software package +**Major Upgrade** + +Upgrades between major releases may introduce significant changes in +function, configuration and data, such as the upgrade of OPNFV from +Arno to Brahmaputra. + +**Minor Upgrade** + +Upgrades within one major release, which do not change +the structure of the platform and may not affect the schema of the +system data. 
+ +Scope of Impact ~~~~~~~~~~~~~~~ + +Considering availability and therefore smooth upgrade, one of the major +concerns is the predictability and control of the outcome of the different +upgrade operations. Ideally an upgrade can be performed without impacting any +entity in the system, which means none of the operations change or potentially +change the behaviour of any entity in the system in an uncontrolled manner. +Accordingly, the operations of such an upgrade can be performed any time while +the system is running, while all the entities are online. No entity needs to be +taken offline to avoid such adverse effects. Hence such upgrade operations +are referred to as online operations. The effects of the upgrade might be activated +the next time the upgraded entity is used, or may require a special activation action such as a +restart. Note that the activation action provides more control and predictability. + +If an entity's behavior in the system may change due to the upgrade, it may +be better to take it offline for the time of the relevant upgrade operations. +The main question, however, is which hosted entities are impacted, +considering the hosting relation of the upgrade object. Accordingly we can identify a scope +which is impacted by taking the given upgrade object offline. The entities +that are in the scope of impact may need to be taken offline or moved out of +this scope, i.e. migrated. + +If the impacted entity is in a different layer managed by another manager, +this may require coordination, because the +infrastructure resources taken out of service for the time of their upgrade support virtual +resources used by VNFs that should not experience outages. The hosted VNFs +may or may not allow for the hot migration of their VMs. In case of migration +the VMs' placement policy should be considered. 
minor version changes - they must not introduce incompatibility - between versions, these should be primarily bug fixes, so live - patches should be possible - -.. different installations of the same software package -.. using different installation options - they may reflect different - users with different needs so redundancy issues are less likely - between installations of different options; but they could be the - reflection of the heterogeneous system in which case they may provide - redundancy for higher availability, i.e. deeper inspection is needed - -.. using the same installation options - they often reflect that the are - used by redundant entities across space - -.. different distribution possibilities in space - same or different - availability zones, multi-site, geo-redundancy - -.. different entities running from the same installation of a software - package - -.. using different start-up options - they may reflect different users so - redundancy may not be an issues between them - -.. using same start-up options - they often reflect redundant - entities Upgrade duration ~~~~~~~~~~~~~~~~ -- cgit 1.2.3-korg