summaryrefslogtreecommitdiffstats
path: root/doc
diff options
context:
space:
mode:
authorJie Hu <hu.jie@zte.com.cn>2015-09-17 19:43:22 +0800
committerJie Hu <hu.jie@zte.com.cn>2015-09-17 19:43:22 +0800
commit81eeba7607f9453ef18ba0917024fe0476cc9178 (patch)
tree1af43bb3f6200ac354d27fa3a3280f94652ee8ce /doc
parent4c56dfd4b3434d217a13f14808f4a41184e1d3bd (diff)
JIRA ESCALATOR-3
Change-Id: I6044ea74eca6ad6337cbc19f35cbfb437f8d1386 Signed-off-by: Jie Hu <hu.jie@zte.com.cn>
Diffstat (limited to 'doc')
-rw-r--r--doc/00-Authors.rst15
-rw-r--r--doc/01-Scope.rst26
-rw-r--r--doc/02-Background_and_Terminologies.rst362
-rw-r--r--doc/03-Functional_Requirements.rst224
-rw-r--r--doc/04-Use_Cases_and_Scenarios.rst32
-rw-r--r--doc/05-Reference_Architecture.rst6
-rw-r--r--doc/06-Information_Flows.rst7
-rw-r--r--doc/07-Interfaces_and_Files.rst27
-rw-r--r--doc/08-Requirements_from_other_OPNFV_Project.rst35
-rw-r--r--doc/09-Reference.rst11
-rw-r--r--doc/10-Useful_Working_Drafts_of_ETSI_NFV.rst10
-rw-r--r--doc/A1-Appendix.rst46
12 files changed, 801 insertions, 0 deletions
diff --git a/doc/00-Authors.rst b/doc/00-Authors.rst
new file mode 100644
index 0000000..fdbf61b
--- /dev/null
+++ b/doc/00-Authors.rst
@@ -0,0 +1,15 @@
+Authors:
+--------
+
+| Jie Hu (ZTE, hu.jie@zte.com.cn)
+| Qiao Fu (China Mobile, fuqiao@chinamobile.com)
+| Ulrich Kleber (Huawei, Ulrich.Kleber@huawei.com)
+| Maria Toeroe (Ericsson, maria.toeroe@ericsson.com)
+| Sama, Malla Reddy (DOCOMO, sama@docomolab-euro.com)
+| Zhong Chao (ZTE, chao.zhong@zte.com.cn)
+| Julien Zhang (ZTE, zhang.jun3g@zte.com.cn)
+| Yuri Yuan (ZTE, yuan.yue@zte.com.cn)
+| Zhipeng Huang (Huawei, huangzhipeng@huawei.com)
+| Jia Meng (ZTE, meng.jia@zte.com.cn)
+| Liyi Meng (Ericsson, liyi.meng@ericsson.com)
+| Pasi Vaananen (Stratus, pasi.vaananen@stratus.com) \ No newline at end of file
diff --git a/doc/01-Scope.rst b/doc/01-Scope.rst
new file mode 100644
index 0000000..2f791eb
--- /dev/null
+++ b/doc/01-Scope.rst
@@ -0,0 +1,26 @@
+Scope
+-----
+
+| This document describes the user requirements on the smooth upgrade
+ function of the NFVI and VIM with respect to the upgrades of the OPNFV
+ platform from one version to another. Smooth upgrade means that the
+ upgrade results in no service outage for the end-users. This requires
+ that the process of the upgrade is automatically carried out by a tool
+ (code name: Escalator) with pre-configured data. The upgrade process
+ includes preparation, validation, execution, monitoring and
+ conclusion.
+| ==[MT] While it is good to have a tool for the entire upgrade process,
+ but it is a challenging task, so maybe we shouldn't require automation
+ for the entire process right away. Automation is essential at
+ execution.==
+| ==[hujie] Maybe we can analysis information flows of the upgrade tool,
+ abstract the basic / essential actions from the tool (or tools), and
+ map them to a command set of NFVI / VIM's interfaces.==
+
+The requirements are defined in a stepwise approach, i.e. in the first
+phase focusing on the upgrade of the VIM then widening the scope to the
+NFVI.
+
+The requirements may apply to different NFV functions (NFVI, or VIM, or
+both of them) . They will be classified in the Appendix of this
+document. \ No newline at end of file
diff --git a/doc/02-Background_and_Terminologies.rst b/doc/02-Background_and_Terminologies.rst
new file mode 100644
index 0000000..b6e9552
--- /dev/null
+++ b/doc/02-Background_and_Terminologies.rst
@@ -0,0 +1,362 @@
+General Requirements Background and Terminology
+-----------------------------------------------
+
+Terminologies and definitions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **NFVI** is abbreviation for Network Function Virtualization
+ Infrastructure; sometimes it is also referred as data plane in this
+ document.
+- **VIM** is abbreviation for Virtual Infrastructure Management;
+ sometimes it is also referred as control plane in this document.
+- **Operators** are network service providers and Virtual Network
+ Function (VNF) providers.
+- **End-Users** are subscribers of Operator's services.
+- **Network Service** is a service provided by an Operator to its
+ End-users using a set of (virtualized) Network Functions
+- **Infrastructure Services** are those provided by the NFV
+ Infrastructure and the Management & Orchestration functions to the
+ VNFs. I.e. these are the virtual resources as perceived by the VNFs.
+- **Smooth Upgrade** means that the upgrade results in no service
+ outage for the end-users.
+- **Rolling Upgrade** is an upgrade strategy that upgrades each node or
+ a subset of nodes in a wave rolling style through the data centre. It
+ is a popular upgrade strategy to maintains service availability.
+- **Parallel Universe** is an upgrade strategy that creates and deploys
+ a new universe - a system with the new configuration - while the old
+ system continues running. The state of the old system is transferred
+ to the new system after sufficient testing of the later.
+- **Infrastructure Resource Model** ==(suggested by MT)== is identified
+ as: physical resources, virtualization facility resources and virtual
+ resources.
+- **Physical Resources** are the hardware of the infrastructure, may
+ also includes the firmware that enable the hardware.
+- **Virtual Resources** are resources provided as services built on top
+ of the physical resources via the virtualization facilities; in our
+ case, they are the components that VNF entities are built on, e.g.
+ the VMs, virtual switches, virtual routers, virtual disks etc
+ ==[MT] I don't think the VNF is the virtual resource. Virtual
+ resources are the VMs, virtual switches, virtual routers, virtual
+ disks etc. The VNF uses them, but I don't think they are equal. The
+ VIM doesn't manage the VNF, but it does manage virtual resources.==
+- **Visualization Facilities** are resources that enable the creation
+ of virtual environments on top of the physical resources, e.g.
+ hypervisor, OpenStack, etc.
+
+Upgrade Objects
+~~~~~~~~~~~~~~~
+
+Physical Resource
+^^^^^^^^^^^^^^^^^
+
+| Most of the cloud infrastructures support dynamic addition/removal of
+ hardware. A hardware upgrade could be done by removing the old
+ hardware node and adding the new one. Upgrade a physical resource,
+ like upgrade the firmware and modify the configuration data, may
+ be considered in the future.
+
+Virtual Resources
+^^^^^^^^^^^^^^^^^
+
+| Virtual resource upgrade mainly done by users. OPNFV may facilitate
+ the activity, but suggest to have it in long term roadmap instead of
+ initiate release.
+| ==[MT] same comment here: I don't think the VNF is the virtual
+ resource. Virtual resources are the VMs, virtual switches, virtual
+ routers, virtual disks etc. The VNF uses them, but I don't think they
+ are equal. For example if by some reason the hypervisor is changed and
+ the current VMs cannot be migrated to the new hypervisor, they are
+ incompatible, then the VMs need to be upgraded too. This is not
+ something the NFVI user (i.e. VNFs ) would even know about.==
+
+Virtualization Facility Resources
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| Based on the functionality they provide, virtualization facility
+ resources could be divided into computing node, networking node,
+ storage node and management node.
+| The possible upgrade objects in these nodes are addressed below:
+ (Note: hardware based virtualization may considered as virtualization
+ facility resource, but from escalator perspective, it is better
+ considered it as part of hardware upgrade. )
+
+**Computing node**
+
+1. OS Kernel
+2. Hypvervisor and virtual switch
+3. Other kernel modules, like driver
+4. User space software packages, like nova-compute agents and other
+ control plane programs
+
+| Updating 1 and 2 will cause the loss of virtualzation functionality of
+ the compute node, which may lead to data plane services interruption
+ if the virtual resource is not redudant.
+| Updating 3 might result the same.
+| Updating 4 might lead to control plane services interruption if not an
+ HA deployment.
+
+**Networking node**
+
+1. OS kernel, optional, not all switch/router allow you to upgrade its
+ OS since it is more like a firmware than a generic OS.
+2. User space software package, like neutron agents and other control
+ plane programs
+
+| Updating 1 if allowed will cause a node reboot and therefore leads to
+ data plane services interruption if the virtual resource is not
+ redudant.
+| Updating 2 might lead to control plane services interruption if not an
+ HA deployment.
+
+**Storage node**
+
+1. OS kernel, optional, not all storage node allow you to upgrade its OS
+ since it is more like a firmware than a generic OS.
+2. Kernel modules
+3. User space software packages, control plane programs
+
+| Updating 1 if allowed will cause a node reboot and therefore leads to
+ data plane services interruption if the virtual resource is not
+ redudant.
+| Update 2 might result in the same.
+| Updating 3 might lead to control plane services interruption if not an
+ HA deployment.
+
+**Management node**
+
+1. OS Kernel
+2. Kernel modules, like driver
+3. User space software packages, like database, message queue and
+ control plane programs.
+
+| Updating 1 will cause a node reboot and therefore leads to control
+ plane services interruption if not an HA deployment. Updating 2 might
+ result in the same.
+| Updating 3 might lead to control plane services interruption if not an
+ HA deployment.
+
+Upgrade Span
+~~~~~~~~~~~~
+
+| **Major Upgrade**
+| Upgrades between major releases may introducing significant changes in
+ function, configuration and data, such as the upgrade of OPNFV from
+ Arno to Brahmaputra.
+
+| **Minor Upgrade**
+| Upgrades inside one major releases which would not leads to changing
+ the structure of the platform and may not infect the schema of the
+ system data.
+
+Upgrade Granularity
+~~~~~~~~~~~~~~~~~~~
+
+Physical/Hardware Dimension
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Support full / partial upgrade for data centre, cluster, zone. Because
+of the upgrade of a data centre or a zone, it may be divided into
+several batches. The upgrade of a cloud environment (cluster) may also
+be partial. For example, in one cloud environment running a number of
+VNFs, we may just try one of them to check the stability and
+performance, before we upgrade all of them.
+
+Software Dimension
+^^^^^^^^^^^^^^^^^^
+
+- The upgrade of host OS or kernel may need a 'hot migration'
+- The upgrade of OpenStack’s components
+ i.the one-shot upgrade of all components
+ ii.the partial upgrade (or bugfix patch) which only affects some
+ components (e.g., computing, storage, network, database, message
+ queue, etc.)
+
+| ==[MT] this section seems to overlap with 2.1.==
+| I can see the following dimensions for the software
+
+- different software packages
+- different funtions - Considering that the target versions of all
+ software are compatible the upgrade needs to ensure that any
+ dependencies between SW and therefore packages are taken into account
+ in the upgrade plan, i.e. no version mismatch occurs during the
+ upgrade therefore dependencies are not broken
+- same function - This is an upgrade specific question if different
+ versions can coexist in the system when a SW is being upgraded from
+ one version to another. This is particularly important for stateful
+ functions e.g. storage, networking, control services. The upgrade
+ method must consider the compatibility of the redundant entities.
+
+- different versions of the same software package
+- major version changes - they may introduce incompatibilities. Even
+ when there are backward compatibility requirements changes may cause
+ issues at graceful rollback
+- minor version changes - they must not introduce incompatibility
+ between versions, these should be primarily bug fixes, so live
+ patches should be possible
+
+- different installations of the same software package
+- using different installation options - they may reflect different
+ users with different needs so redundancy issues are less likely
+ between installations of different options; but they could be the
+ reflection of the heterogeneous system in which case they may provide
+ redundancy for higher availability, i.e. deeper inspection is needed
+- using the same installation options - they often reflect that the are
+ used by redundant entities across space
+
+- different distribution possibilities in space - same or different
+ availability zones, multi-site, geo-redundancy
+
+- different entities running from the same installation of a software
+ package
+- using different startup options - they may reflect different users so
+ redundancy may not be an issues between them
+- using same startup options - they often reflect redundant
+ entities====
+
+Upgrade duration
+~~~~~~~~~~~~~~~~
+
+As the OPNFV end-users are primarily Telco operators, the network
+services provided by the VNFs deployed on the NFVI should meet the
+requirement of 'Carrier Grade'.
+
+In telecommunication, a "carrier grade" or"carrier class" refers to a
+system, or a hardware or software component that is extremely reliable,
+well tested and proven in its capabilities. Carrier grade systems are
+tested and engineered to meet or exceed "five nines" high availability
+standards, and provide very fast fault recovery through redundancy
+(normally less than 50 milliseconds). [from wikipedia.org]
+
+"five nines" means working all the time in ONE YEAR except 5'15".
+
+We have learnt that a well prepared upgrade of OpenStack needs 10
+minutes. The major time slot in the outage time is used spent on
+synchronizing the database. [from ' Ten minutes OpenStack Upgrade? Done!
+' by Symantec]
+
+This 10 minutes of downtime of OpenStack however did not impact the
+users, i.e. the VMs running on the compute nodes. This was the outage of
+the control plane only. On the other hand with respect to the
+preparations this was a manually tailored upgrade specific to the
+particular deployment and the versions of each OpenStack service.
+
+The project targets to achieve a more generic methodology, which however
+requires that the upgrade objects fulfill ceratin requirements. Since
+this is only possible on the long run we target first upgrades from
+version to version for the different VIM services.
+
+**Questions:**
+
+#. | Can we manage to upgrade OPNFV in only 5 minutes?
+ | ==[MT] The first question is whether we have the same carrier grade
+ requirement on the control plane as on the user plane. I.e. how
+ much control plane outage we can/willing to tolerate?
+ | In the above case probably if the database is only half of the size
+ we can do the upgrade in 5 minutes, but is that good? It also means
+ that if the database is twice as much then the outage is 20
+ minutes.
+ | For the user plane we should go for less as with two release yearly
+ that means 10 minutes outage per year.==
+ | ==[Malla] 10 minutes outage per year to the users? Plus, if we take
+ control plane into the consideration, then total outage will be
+ more than 10 minute in whole network, right?==
+ | ==[MT] The control plane outage does not have to cause outage to
+ the users, but it may of course depending on the size of the system
+ as it's more likely that there's a failure that needs to be handled
+ by the control plane.==
+
+#. | Is it acceptable for end users ? Such as a planed service
+ interruption will lasting more than ten minutes for software
+ upgrade.
+ | ==[MT] For user plane, no it's not acceptable in case of
+ carrier-grade. The 5' 15" downtime should include unplanned and
+ planned downtimes.==
+ | ==[Malla] I go agree with Maria, it is not acceptable.==
+
+#. | Will any VNFs still working well when VIM is down?
+ | ==[MT] In case of OpenStack it seems yes. .:)==
+
+The maximum duration of an upgrade
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| The duration of an upgrade is related to and proportional with the
+ scale and the complexity of the OPNFV platform as well as the
+ granularity (in function and in space) of the upgrade.
+| [Malla] Also, if is a partial upgrade like module upgrade, it depends
+ also on the OPNFV modules and their tight connection entities as well.
+
+The maximum duration of a roll back when an upgrade is failed
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| The duration of a roll back is short than the corresponding upgrade. It
+ depends on the duration of restore the software and configure data from
+ pre-upgrade backup / snapshot.
+| ==[MT] During the upgrade process two types of failure may happen:
+| In case we can recover from the failure by undoing the upgrade
+ actions it is possible to roll back the already executed part of the
+ upgrade in graceful manner introducing no more service outage than
+ what was introduced during the upgrade. Such a graceful roll back
+ requires typically the same amount of time as the executed portion of
+ the upgrade and impose minimal state/data loss.==
+| ==[MT] Requirement: It should be possible to roll back gracefully the
+ failed upgrade of stateful services of the control plane.
+| In case we cannot recover from the failure by just undoing the
+ upgrade actions, we have to restore the upgraded entities from their
+ backed up state. In other terms the system falls back to an earlier
+ state, which is typically a faster recovery procedure than graceful
+ roll back and depending on the statefulness of the entities involved it
+ may result in significant state/data loss.==
+| **Two possible types of failures can happen during an upgrade**
+
+#. We can recover from the failure that occurred in the upgrade process:
+ In this case, a graceful rolling back of the executed part of the
+ upgrade may be possible which would "undo" the executed part in a
+ similar fashion. Thus, such a roll back introduces no more service
+ outage during an upgrade than the executed part introduced. This
+ process typically requires the same amount of time as the executed
+ portion of the upgrade and impose minimal state/data loss.
+#. We cannot recover from the failure that occurred in the upgrade
+ process: In this case, the system needs to fall back to an earlier
+ consistent state by reloading this backed-up state. This is typically
+ a faster recovery procedure than the graceful roll back, but can cause
+ state/data loss. The state/data loss usually depends on the
+ statefulness of the entities whose state is restored from the backup.
+
+The maximum duration of a VNF interruption (Service outage)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| Since not the entire process of a smooth upgrade will affect the VNFs,
+ the duration of the VNF interruption may be shorter than the duration
+ of the upgrade. In some cases, the VNF running without the control
+ from of the VIM is acceptable.
+| ==[MT] Should require explicitly that the NFVI should be able to
+ provide its services to the VNFs independent of the control plane?==
+| ==[MT] Requirement: The upgrade of the control plane must not cause
+ interruption of the NFVI services provided to the VNFs.==
+| ==[MT] With respect to carrier-grade the yearly service outage of the
+ VNF should not exceed 5' 15" regardless whether it is planned or
+ unplanned outage. Considering the HA requirements TL-9000 requires an
+ ent-to-end service recovery time of 15 seconds based on which the ETSI
+ GS NFV-REL 001 V1.1.1 (2015-01) document defines three service
+ availability levels (SAL). The proposed example service recovery times
+ for these levels are:
+| SAL1: 5-6 seconds
+| SAL2: 10-15 seconds
+| SAL3: 20-25 seconds==
+| ==[Pva] my comment was actually that the downtime metrics of the
+ underlying elements, components and services are small fraction of the
+ total E2E service availability time. No-one on the E2E service path
+ will get the whole downtime allocation (in this context it includes
+ upgrade process related outages for the services provided by VIM etc.
+ elements that are subject to upgrade process).==
+| ==[MT] So what you are saying is that the upgrade of any entity
+ (component, service) shouldn't cause even this much service
+ interruption. This was the reason I brought these figures here as well
+ that they are posing some kind of upper-upper boundary. Ideally the
+ interruption is in the millisecond range i.e. no more than a
+ switchover or a live migration.==
+| ==[MT] Requirement: Any interruption caused to the VNF by the upgrade
+ of the NFVI should be in the sub-second range.==
+
+==[MT] In the future we also need to consider the upgrade of the NFVI,
+i.e. HW, firmware, hypervisors, host OS etc.== \ No newline at end of file
diff --git a/doc/03-Functional_Requirements.rst b/doc/03-Functional_Requirements.rst
new file mode 100644
index 0000000..a380180
--- /dev/null
+++ b/doc/03-Functional_Requirements.rst
@@ -0,0 +1,224 @@
+Functional Requirements
+-----------------------
+
+Basic Actions
+~~~~~~~~~~~~~
+
+This section describes the basic functions may required by Escalator.
+
+Preparation (offline)
+^^^^^^^^^^^^^^^^^^^^^
+
+This is the design phase when the upgrade plan (or upgrade campaign) is
+being designed so that it can be executed automatically with minimal
+service outage. It may include the following work:
+
+1. Check the dependencies of the software modules and their impact,
+ backward compatibilities to figure out the appropriate upgrade method
+ and ordering.
+2. Find out if a rolling upgrade could be planned with several rolling
+ steps to avoid any service outage due to the upgrade some
+ parts/services at the same time.
+3. Collect the proper version files and check the integration for
+ upgrading.
+4. The preparation step should produce an output (i.e. upgrade
+ campaign/plan), which is executable automatically in an NFV Framework
+ and which can be validated before execution.
+
+ - The upgrade campaign should not be referring to scalable entities
+ directly, but allow for adaptation to the system configuration and
+ state at any given moment.
+ - The upgrade campaign should describe the ordering of the upgrade
+ of different entities so that dependencies, redundancies can be
+ maintained during the upgrade execution
+ - The upgrade campaign should provide information about the
+ applicable recovery procedures and their ordering.
+ - The upgrade campaign should consider information about the
+ verification/testing procedures to be performed during the upgrade
+ so that upgrade failures can be detected as soon as possible and
+ the appropriate recovery procedure can be identified and applied.
+ - The upgrade campaign should provide information on the expected
+ execution time so that hanging execution can be identified
+ - The upgrade campaign should indicate any point in the upgrade when
+ coordination with the users (VNFs) is required.
+
+==[hujie]Depends on the attributes of the object being upgraded, the
+upgrade plan may be slitted into step(s) and/or sub-plan(s), and even
+more small sub-plans in design phase. The plan(s) or sub-plan(s) my
+include step(s) or sub-plan(s).==
+
+Validation the upgrade plan / Checking the pre-requisites of System( offline / online)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| The upgrade plan should be validated before the execution by testing
+ it in a test environment which is similar to the product environment.
+| ==[MT]However it could also mean that we can identify some properties
+ that it should satisfy e.g. what operations can or cannot be executed
+ simultaneously like never take out two VMs of the same VNF.
+| Another question is if it requires that the system is in a particular
+ state when the upgrade is applied. I.e. if there's certain amount of
+ redundancy in the system, migration is enabled for VMs, when the NFVI
+ is upgraded the VIM is healthy, when the VIM is upgraded the NFVI is
+ healthy, etc.
+| I'm not sure what online validation means: Is it the validation of the
+ upgrade plan/campaign or the validation of the system that it is in a
+ state that the upgrade can be performed without too much risk?==
+
+| Before the upgrade plan being executed, the system heathly of the
+ online product environment should be checked and confirmed to satisfy
+ the requirements which were described in the upgrade plan. The
+ sysinfo, e.g. which included system alarms, performance statistics and
+ diagnostic logs, will be collected and analyized. It is required to
+ resolve all of the system faults or exclud the unhealthy part before
+ executing the upgrade plan.
+| ==[hujie] Text merged.==
+
+Backup/Snapshot (online)
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For avoid loss of data when a unsuccessful upgrade was encountered, the
+data should be backuped and the system state snapshot should be taken
+before the excution of upgrade plan. This would be considered in the
+upgrade plan.
+
+Several backups/Snapshots may be generated and stored before the single
+steps of changes. The following data/files are required to be
+considered:
+
+1. running version files for each node.
+2. system components' configuration file and database.
+3. image and storage, if it is necessary.
+ ==[MT] Does 3 imply VNF image and storage? I.e. VNF state and data?==
+
+| ==[hujie] The following text is derived from previous "4. Negotiate
+ with the VNF if it's ready for the upgrade"==
+
+| Although the upper layer, which include VNFs and VNFMs, is out of the
+ scope of Escalator, but it is still recommended to let it ready for a
+ smooth system upgrade. The escalator could not guarantee the safe of
+ VNFs. The upper layer should have some safe guard mechanism in design,
+ and ready for avoiding failure in system upgrade.
+
+Execution (online)
+^^^^^^^^^^^^^^^^^^
+
+| The execution of upgrade plan should be a dynamical procedure which is
+ controlled by Escalator.
+| ==[hujie] Revised text to be general.==
+
+1. It is required to supporting execution ether in sequence or in
+ parallel.
+2. It is required to checke the result of the execution and take the
+ action according the situation and the policies in the upgrade plan.
+3. It is required to execute properly on various configurations of
+ system object. I.e. stand-alone, HA, etc.
+4. It is required to excecute on the designated different parts of the
+ system. I.e. physical server, virtualized server, rack, chassis,
+ cluster, even different geographical places.
+
+Testing (online)
+^^^^^^^^^^^^^^^^
+
+| The testing after upgrade the whole system or parts of system to make
+ sure the upgraded system(object) is working normally.
+| ==[hujie] Revised text to be general.==
+
+1. It is recommended to run the prepared test cases to see if the
+ functionalities are available without any problem.
+2. It is recommended to check the sysinfo, e.g. system alarms,
+ performance statistics and diagnostic logs to see if there are any
+ abnormal.
+
+Restore/Roll-back (online)
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| When upgrade is failure unfortunately, a quick system restore or system
+ roll-back should be taken to recovery the system and the services.
+| ==[hujie] Revised text to be general.==
+
+1. It is recommend to support system restore from backup when upgrade
+ was failed.
+2. It is recommend to support graceful roll-back with reverse order
+ steps if possible.
+
+Monitoring (online)
+^^^^^^^^^^^^^^^^^^^
+
+| Escalator should continually monitor the process of upgrade. It is
+ keeping update status of each module, each node, each cluster into a
+ status table during upgrade.
+| ==[hujie] Revised text to be general.==
+
+1. It is required to collect the status of every objects being upgraded
+ and sending abnormal alerms during the upgrade.
+2. It is recommend to reuse the existing monitoring system, like alarm.
+3. It is recommend to support pro-actively query.
+4. It is recommend to support passively wait for notification.
+
+| **Two possible ways for monitoring:**
+| **Pro-Actively Query** requires NFVI/VIM provides proper API or CLI
+ interface. If Escalator serves as a service, it should pass on these
+ interfaces.
+| **Passively Wait for Notification** requires Escalator provides
+ callback interface, which could be used by NFVI/VIM systems or upgrade
+ agent to send back notification.
+| [hujie] I am not sure why not to subscribe the notification.
+
+Logging (online)
+^^^^^^^^^^^^^^^^
+
+Record the information generated by escalator into log files. The log
+file is used for manual diagnostic of exceptions.
+
+1. It is required to support logging.
+2. It is recommended to include time stamp, object id, action name,
+ error code, etc.
+
+Administrative Control (online)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Administrative Control is used for control the privilege to start any
+escalator's actions for avoiding unauthorized operations.
+
+#. It is required to support administrative control mechanism
+#. It is recommend to reuse the system's own secure system.
+#. It is required to avoid conflicts when the system's own secure system
+ being upgraded.
+
+Requirements on Object being upgraded
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+| ==We can develop BPs in future from requirements of this section and
+ gap analysis for upper stream projects==
+| Escalator focus on smooth upgrade. In practical implementation, it
+ might be combined with installer/deplorer, or act as an independent
+ tool/service. In either way, it requires targeting systems(NFVI and
+ VIM) are developed/deployed in a way that Escalator could perform
+ upgrade on them.
+
+On NFVI system, live-migration is likely used to maintain availability
+because OPNFV would like to make HA transparent from end user. This
+requires VIM system being able to put compute node into maintenance mode
+and then isolated from normal service. Otherwise, new NFVI instances
+might risk at being schedule into the upgrading node.
+
+| On VIM system, availability is likely achieved by redundancy. This
+ impose less requirements on system/services being upgrade (see PVA
+ comments in early version). However, there should be a way to put the
+ target system into standby mode. Because starting upgrade on the
+ master node in a cluster is likely a bad idea.
+| ==[hujie] Revised text to be general.==
+
+1. It is required for NFVI/VIM to support **service handover** mechanism
+ that minimize interruption to 0.001%(i.e. 99.999% service
+ availability). Possible implementations are live-migration, redundant
+ deployment, etc, (Note: for VIM, interruption could be less
+ restrictive)
+2. It is required for NFVI/VIM to restore the early version in a efficient
+ way, such as **snapshot**.
+3. It is required for NFVI/VIM to **migration data** efficiently between
+ base and upgraded system.
+ ==[hujie] What is exact meaning of "base" here?==
+4. It is recommend for NFV/VIM's interface to support upgrade
+ orchestration, e.g. reading/setting system state
+ ==[hujie] I am not sure if it reflect the previous text.==
diff --git a/doc/04-Use_Cases_and_Scenarios.rst b/doc/04-Use_Cases_and_Scenarios.rst
new file mode 100644
index 0000000..f7cfeb1
--- /dev/null
+++ b/doc/04-Use_Cases_and_Scenarios.rst
@@ -0,0 +1,32 @@
+Use Cases and Scenarios
+-----------------------
+
+This section describes the use cases and scenarios to verify the
+requirements of Escalator.
+
+Upgrade a system with minimal configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A minimal configuration system is normally deployed for experimental or
+development usages, such as a OPNFV test bed. Although it dose not have
+large workload, but it is a typical system to be upgraded frequently.
+
+Upgrade a system with HA configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A HA configuration system is very popular in the operator's data centre.
+And it is a typical product environment. It always running 7 \* 24 a
+week with VNFs running on it to provide services to the end users.
+
+Upgrade a system with Multi-Site configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Upgrade in one site may cause service interruption to other site, if
+both sites are depended and sharing the same modules/data base (e.g. a
+keystone for both sites).
+
+If a site failure during an upgrade, the roll-back missing any minimal
+state/data loss can cause an affect/failure to the depended site.
+
+==Consider one site of ARNO release first. Then, multi-site in the
+future.== \ No newline at end of file
diff --git a/doc/05-Reference_Architecture.rst b/doc/05-Reference_Architecture.rst
new file mode 100644
index 0000000..1b16dbe
--- /dev/null
+++ b/doc/05-Reference_Architecture.rst
@@ -0,0 +1,6 @@
+Reference Architecture
+----------------------
+
+This section describes the reference architecture, the function blocks,
+the function entities of Escalator for the reader to well understand how
+the basic functions be organized. \ No newline at end of file
diff --git a/doc/06-Information_Flows.rst b/doc/06-Information_Flows.rst
new file mode 100644
index 0000000..684377f
--- /dev/null
+++ b/doc/06-Information_Flows.rst
@@ -0,0 +1,7 @@
+Information Flows
+-----------------
+
+| This section describes the information flows among the function
+ entities when Escalator is in actions.
+| We should consider a generic procedure / frameworks of upgrading. And
+ may provide plug-ins interface for specialized tasks \ No newline at end of file
diff --git a/doc/07-Interfaces_and_Files.rst b/doc/07-Interfaces_and_Files.rst
new file mode 100644
index 0000000..87f916e
--- /dev/null
+++ b/doc/07-Interfaces_and_Files.rst
@@ -0,0 +1,27 @@
+Interfaces and Files
+--------------------
+
+This section describes the required interfaces and files of Escalator.
+
+
+CLI Interface
+~~~~~~~~~~~~~~~~
+
+This section describes CLI of Escalator.
+
+RESTful API
+~~~~~~~~~~~
+
+This section describes the API of Escalator for developer.
+
+Configuration File
+~~~~~~~~~~~~~~~~~~
+
+This section will suggest a format of the configuration files and how to
+deal with it.
+
+Log File
+~~~~~~~~
+
+This section will suggest a format of the log files and how to deal with
+it. \ No newline at end of file
diff --git a/doc/08-Requirements_from_other_OPNFV_Project.rst b/doc/08-Requirements_from_other_OPNFV_Project.rst
new file mode 100644
index 0000000..17bb677
--- /dev/null
+++ b/doc/08-Requirements_from_other_OPNFV_Project.rst
@@ -0,0 +1,35 @@
+Requirements from other OPNFV projects
+--------------------------------------
+
+| We have created a questionnaire for collecting other projects
+ requirments
+ (https://docs.google.com/forms/d/11o1mt15zcq0WBtXYK0n6lKF8XuIzQTwvv8ePTjmcoF0/viewform?usp=send_form),
+ please advertise it.
+| ==[hujie] Can we force other OPNFV projects to complete the survey by
+ using JIRA dependence?==
+
+Doctor Project
+~~~~~~~~~~~~~~
+
+| ==Note: This scenario could be out of scope in Escalator project, but
+ having the option to support this should be better to align with
+ Doctor requirements.==
+| The scope of Doctor project also covers maintenance scenario in which
+ 1) the VIM administorator requests host maintenance to VIM, 2) VIM
+ will notifiy it to consumer such as VNFM to trigger application level
+ migration or switching active-standby nodes, and 3) VIM waits responce
+ from the consumer for a short while.
+
+- VIM should send out notification of VM migration to consumer (VNFM)
+ as abstracted message like "maintenance".
+- VIM could wait VM migration until it receives "VM ready to
+ maintenance" message from the owner (VNFM)
+
+HA Project
+~~~~~~~~~~
+
+Multi-site Project
+~~~~~~~~~~~~~~~~~~
+
+- Escalator upgrade one site should at least not lead to the other site
+ API token validation failed.
diff --git a/doc/09-Reference.rst b/doc/09-Reference.rst
new file mode 100644
index 0000000..6ca2949
--- /dev/null
+++ b/doc/09-Reference.rst
@@ -0,0 +1,11 @@
+Reference
+---------
+
+| [1] ETSI GS NFV 002 (V1.1.1): “Architectural Framework”
+| [2] ETSI GS NFV 003 (V1.1.1): "Terminology for Main Concepts in NFV".
+| [3] ETSI GS NFV-SWA001:“Virtual Network Function Architecture”
+| [4] ETSI GS NFV-MAN001:“Management and Orchestration”
+| [5] ETSI GS NFV-REL001:"Resiliency Requirements"
+| [6] QuEST Forum TL-9000:"Quality Management System Requirement
+ Handbook"
+| [7] Service Availability Forum AIS:"Software Management Framework" \ No newline at end of file
diff --git a/doc/10-Useful_Working_Drafts_of_ETSI_NFV.rst b/doc/10-Useful_Working_Drafts_of_ETSI_NFV.rst
new file mode 100644
index 0000000..f2a5227
--- /dev/null
+++ b/doc/10-Useful_Working_Drafts_of_ETSI_NFV.rst
@@ -0,0 +1,10 @@
+Useful Working Drafts of ETSI NFV
+---------------------------------
+
+| Access them with your own ETSI account, please DO NOT disclose the
+ content.
+| [1] Migrate Virtualised Compute Resource operation @ 7.3.1.8
+| ftp://docbox.etsi.org/ISG/NFV/Open/Drafts/IFA005_Or-Vi_ref_point_Spec/NFV-IFA005v070.zip
+| [2] Reliability issues during NFV Software upgrade and improvement
+ mechanisms @ 8
+| ftp://@docbox.etsi.org/ISG/NFV/Open/Drafts/REL003_E2E_reliability_models/NFV-REL003v030.zip \ No newline at end of file
diff --git a/doc/A1-Appendix.rst b/doc/A1-Appendix.rst
new file mode 100644
index 0000000..dcacccf
--- /dev/null
+++ b/doc/A1-Appendix.rst
@@ -0,0 +1,46 @@
+Appendix
+--------
+
+A.1 Impact Analysis
+~~~~~~~~~~~~~~~~~~~
+
+Upgrading the different software modules may cause different impact on
+the availability of the infrastructure resources and even on the service
+continuity of the vNFs.
+
+**Software modules in the computing nodes**
+
+#. Host OS patch
+ ==[MT] As SW module, we should list the host OS and maybe ====its
+ drivers as well. From upgrade perspective do we limit host OS
+ upgrades to patches only?==
+#. Hypervisor, such as KVM, QEMU, XEN, libvirt
+#. Openstack agent in computing nodes (like Nova agent, Ceilometer
+ agent...)
+
+**Software modules in network nodes**
+
+#. Neutron L2/L3 agent
+#. OVS, SR-IOV Driver
+
+**Software modules storage nodes**
+
+#. Ceph
+
+The table below analyses such an impact - considering a single instance
+of each software module - from the following aspects:
+
+- the function which will be lost during upgrade,
+- the duration of the loss of this specific function,
+- if this causes the loss of the vNF function,
+- if it causes incompatibility in the different parts of the software,
+- what should be backed up before the upgrade,
+- the duration of restoration time if the upgrade fails
+
+| These values provided come from internal testing and based on some
+ assumptions, they may vary depending on the deployment techniques.
+ Please feel free to add if you find more efficient values during your
+ testing.
+| https://wiki.opnfv.org/_media/upgrade_analysis_v0.5.xlsx
+| Note that no redundancy of the software modules is considered in the
+ table.