summaryrefslogtreecommitdiffstats
path: root/docs/userguide
diff options
context:
space:
mode:
Diffstat (limited to 'docs/userguide')
-rw-r--r--docs/userguide/abstract.rst16
-rw-r--r--docs/userguide/common.platform.render.rst15
-rw-r--r--docs/userguide/feature.userguide.render.rst14
-rw-r--r--docs/userguide/index.rst20
-rw-r--r--docs/userguide/introduction.rst53
-rw-r--r--docs/userguide/live_migration.userguide.rst121
-rw-r--r--docs/userguide/lmdowntime.jpgbin0 -> 6372 bytes
-rw-r--r--docs/userguide/lmnetwork.jpgbin0 -> 53332 bytes
-rw-r--r--docs/userguide/lmtotaltime.jpgbin0 -> 24171 bytes
-rw-r--r--docs/userguide/low_latency.userguide.rst68
-rw-r--r--docs/userguide/openstack.rst51
-rw-r--r--docs/userguide/tuning.userguide.rst102
12 files changed, 460 insertions, 0 deletions
diff --git a/docs/userguide/abstract.rst b/docs/userguide/abstract.rst
new file mode 100644
index 000000000..8c36c268f
--- /dev/null
+++ b/docs/userguide/abstract.rst
@@ -0,0 +1,16 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+========
+Abstract
+========
+
+In KVM4NFV project, we focus on the KVM hypervisor to enhance it for NFV,
+by looking at the following areas initially-
+
+* Minimal Interrupt latency variation for data plane VNFs:
+ * Minimal Timing Variation for Timing correctness of real-time VNFs
+ * Minimal packet latency variation for data-plane VNFs
+* Inter-VM communication
+* Fast live migration
diff --git a/docs/userguide/common.platform.render.rst b/docs/userguide/common.platform.render.rst
new file mode 100644
index 000000000..486ca469f
--- /dev/null
+++ b/docs/userguide/common.platform.render.rst
@@ -0,0 +1,15 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+================================
+Using common platform components
+================================
+
+This section outlines basic usage principals and methods for some of the
+commonly deployed components of supported OPNFV scenario's in Colorado.
+The subsections provide an outline of how these components are commonly
+used and how to address them in an OPNFV deployment.The components derive
+from autonomous upstream communities and where possible this guide will
+provide direction to the relevant documentation made available by those
+communities to better help you navigate the OPNFV deployment.
diff --git a/docs/userguide/feature.userguide.render.rst b/docs/userguide/feature.userguide.render.rst
new file mode 100644
index 000000000..d903f0711
--- /dev/null
+++ b/docs/userguide/feature.userguide.render.rst
@@ -0,0 +1,14 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+==========================
+Using Colorado Features
+==========================
+
+The following sections of the user guide provide feature specific usage
+guidelines and references for KVM4NFV CICD project.
+
+* <project>/docs/userguide/low_latency.userguide.rst
+* <project>/docs/userguide/live_migration.userguide.rst
+* <project>/docs/userguide/tuning.userguide.rst
diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
new file mode 100644
index 000000000..55042ec04
--- /dev/null
+++ b/docs/userguide/index.rst
@@ -0,0 +1,20 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+****************
+OPNFV User Guide
+****************
+Colorado 1.0
+------------
+
+.. toctree::
+ :maxdepth: 2
+
+ ./abstract.rst
+ ./introduction.rst
+ ./common.platform.render.rst
+ ./feature.userguide.render.rst
+ ./low_latency.userguide.rst
+ ./live_migration.userguide.rst
+ ./tuning.userguide.rst
diff --git a/docs/userguide/introduction.rst b/docs/userguide/introduction.rst
new file mode 100644
index 000000000..501d6391b
--- /dev/null
+++ b/docs/userguide/introduction.rst
@@ -0,0 +1,53 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+========
+Overview
+========
+
+The project "NFV Hypervisors-KVM" makes collaborative efforts to enable NFV
+features for existing hypervisors, which are not necessarily designed or
+targeted to meet the requirements for the NFVI.The KVM4NFV CICD scenario
+consists of Continuous Integration builds, deployments and testing
+combinations of virtual infrastructure components.
+
+KVM4NFV Features
+================
+
+Using this project, the following areas are targeted-
+
+* Minimal Interrupt latency variation for data plane VNFs:
+ * Minimal Timing Variation for Timing correctness of real-time VNFs
+ * Minimal packet latency variation for data-plane VNFs
+* Inter-VM communication
+* Fast live migration
+
+Some of the above items would require software development and/or specific
+hardware features, and some need just configurations information for the
+system (hardware, BIOS, OS, etc.).
+
+We include a requirements gathering stage as a formal part of the project.
+For each subproject, we will start with an organized requirement stage so
+that we can determine specific use cases (e.g. what kind of VMs should be
+live migrated) and requirements (e.g. interrupt latency, jitters, Mpps,
+migration-time, down-time, etc.) to set out the performance goals.
+
+Potential future projects would include:
+
+* Dynamic scaling (via scale-out) using VM instantiation
+* Fast live migration for SR-IOV
+
+The user guide outlines how to work with key components and features in
+the platform, each feature description section will indicate the scenarios
+that provide the components and configurations required to use it.
+
+The configuration guide details which scenarios are best for you and how to
+install and configure them.
+
+General usage guidelines
+========================
+
+The user guide for KVM4NFV CICD features and capabilities provide step by step
+instructions for using features that have been configured according to the
+installation and configuration instructions.
diff --git a/docs/userguide/live_migration.userguide.rst b/docs/userguide/live_migration.userguide.rst
new file mode 100644
index 000000000..9fa9b82fd
--- /dev/null
+++ b/docs/userguide/live_migration.userguide.rst
@@ -0,0 +1,121 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+Fast Live Migration
+===================
+
+The NFV project requires fast live migration. The specific requirement is
+total live migration time < 2Sec, while keeping the VM down time < 10ms when
+running DPDK L2 forwarding workload.
+
+We measured the baseline data of migrating an idle 8GiB guest running a DPDK L2
+forwarding work load and observed that the total live migration time was 2271ms
+while the VM downtime was 26ms. Both of these two indicators failed to satisfy
+the requirements.
+
+Current Challenges
+------------------
+
+The following 4 features have been developed over the years to make the live
+migration process faster.
+
++ XBZRLE:
+ Helps to reduce the network traffic by just sending the
+ compressed data.
++ RDMA:
+ Uses a specific NIC to increase the efficiency of data
+ transmission.
++ Multi thread compression:
+ Compresses the data before transmission.
++ Auto convergence:
+ Reduces the data rate of dirty pages.
+
+Tests show none of the above features can satisfy the requirement of NFV.
+XBZRLE and Multi thread compression do the compression entirely in software and
+they are not fast enough in a 10Gbps network environment. RDMA is not flexible
+because it has to transport all the guest memory to the destination without zero
+page optimization. Auto convergence is not appropriate for NFV because it will
+impact guest’s performance.
+
+So we need to find other ways for optimization.
+
+Optimizations
+-------------------------
+a. Delay non-emergency operations
+ By profiling, it was discovered that some of the cleanup operations during
+ the stop and copy stage are the main reason for the long VM down time. The
+ cleanup operation includes stopping the dirty page logging, which is a time
+ consuming operation. By deferring these operations until the data transmission
+ is completed the VM down time is reduced to about 5-7ms.
+b. Optimize zero page checking
+ Currently QEMU uses the SSE2 instruction to optimize the zero pages
+ checking. The SSE2 instruction can process 16 bytes per instruction.
+ By using the AVX2 instruction, we can process 32 bytes per instruction.
+ Testing shows that using AVX2 can speed up the zero pages checking process
+ by about 25%.
+c. Remove unnecessary context synchronization.
+ The CPU context was being synchronized twice during live migration. Removing
+ this unnecessary synchronization shortened the VM downtime by about 100us.
+
+Test Environment
+----------------
+
+The source and destination host have the same hardware and OS:
+::
+Host: HSW-EP
+CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
+RAM: 64G
+OS: RHEL 7.1
+Kernel: 4.2
+QEMU v2.4.0
+
+Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit
+X540-AT2 (rev 01)
+QEMU parameters:
+::
+${qemu} -smp ${guest_cpus} -monitor unix:${qmp_sock},server,nowait -daemonize \
+-cpu host,migratable=off,+invtsc,+tsc-deadline,pmu=off \
+-realtime mlock=on -mem-prealloc -enable-kvm -m 1G \
+-mem-path /mnt/hugetlbfs-1g \
+-drive file=/root/minimal-centos1.qcow2,cache=none,aio=threads \
+-netdev user,id=guest0,hostfwd=tcp:5555-:22 \
+-device virtio-net-pci,netdev=guest0 \
+-nographic -serial /dev/null -parallel /dev/null
+
+Network connection
+
+.. figure:: lmnetwork.jpg
+ :align: center
+ :alt: live migration network connection
+ :figwidth: 80%
+
+
+Test Result
+-----------
+The down time is set to 10ms when doing the test. We use pktgen to send the
+packages to guest, the package size is 64 bytes, and the line rate is 2013
+Mbps.
+
+a. Total live migration time
+
+ The total live migration time before and after optimization is shown in the
+ chart below. For an idle guest, we can reduce the total live migration time
+ from 2070ms to 401ms. For a guest running the DPDK L2 forwarding workload,
+ the total live migration time is reduced from 2271ms to 654ms.
+
+.. figure:: lmtotaltime.jpg
+ :align: center
+ :alt: total live migration time
+
+b. VM downtime
+
+ The VM down time before and after optimization is shown in the chart below.
+ For an idle guest, we can reduce the VM down time from 29ms to 9ms. For a guest
+ running the DPDK L2 forwarding workload, the VM down time is reduced from 26ms to
+ 5ms.
+
+.. figure:: lmdowntime.jpg
+ :align: center
+ :alt: vm downtime
+ :figwidth: 80%
diff --git a/docs/userguide/lmdowntime.jpg b/docs/userguide/lmdowntime.jpg
new file mode 100644
index 000000000..c9faa4c73
--- /dev/null
+++ b/docs/userguide/lmdowntime.jpg
Binary files differ
diff --git a/docs/userguide/lmnetwork.jpg b/docs/userguide/lmnetwork.jpg
new file mode 100644
index 000000000..8a9a324c3
--- /dev/null
+++ b/docs/userguide/lmnetwork.jpg
Binary files differ
diff --git a/docs/userguide/lmtotaltime.jpg b/docs/userguide/lmtotaltime.jpg
new file mode 100644
index 000000000..2dced3987
--- /dev/null
+++ b/docs/userguide/lmtotaltime.jpg
Binary files differ
diff --git a/docs/userguide/low_latency.userguide.rst b/docs/userguide/low_latency.userguide.rst
new file mode 100644
index 000000000..df4581506
--- /dev/null
+++ b/docs/userguide/low_latency.userguide.rst
@@ -0,0 +1,68 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+Low Latency Environment
+=======================
+
+Achieving low latency with the KVM4NFV project requires setting up a special
+test environment. This environment includes the BIOS settings, kernel
+configuration, kernel parameters and the run-time environment.
+
+Hardware Environment Description
+--------------------------------
+
+BIOS setup plays an important role in achieving real-time latency. A collection
+of relevant settings, used on the platform where the baseline performance data
+was collected, is detailed below:
+
+CPU Features
+~~~~~~~~~~~~
+
+Some special CPU features like TSC-deadline timer, invariant TSC and Process
+posted interrupts, etc, are helpful for latency reduction.
+
+CPU Topology
+~~~~~~~~~~~~
+
+NUMA topology is also important for latency reduction.
+
+BIOS Setup
+~~~~~~~~~~
+
+Careful BIOS setup is important in achieving real time latency. Different
+platforms have different BIOS setups, below are the important BIOS settings on
+the platform used to collect the baseline performance data.
+
+Software Environment Setup
+--------------------------
+Both the host and the guest environment need to be configured properly to
+reduce latency variations. Below are some suggested kernel configurations.
+The ci/envs/ directory gives detailed implementation on how to setup the
+environment.
+
+Kernel Parameter
+~~~~~~~~~~~~~~~~
+
+Please check the default kernel configuration in the source code at:
+kernel/arch/x86/configs/opnfv.config.
+
+Below is host kernel boot line example:
+::
+isolcpus=11-15,31-35 nohz_full=11-15,31-35 rcu_nocbs=11-15,31-35
+iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G mce=off idle=poll
+intel_pstate=disable processor.max_cstate=1 pcie_asmp=off tsc=reliable
+
+Below is guest kernel boot line example
+::
+isolcpus=1 nohz_full=1 rcu_nocbs=1 mce=off idle=poll default_hugepagesz=1G
+hugepagesz=1G
+
+Please refer to `tunning.userguide` for more explanation.
+
+Run-time Environment Setup
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Not only are special kernel parameters needed but a special run-time
+environment is also required. Please refer to `tunning.userguide` for
+more explanation.
diff --git a/docs/userguide/openstack.rst b/docs/userguide/openstack.rst
new file mode 100644
index 000000000..bd1919991
--- /dev/null
+++ b/docs/userguide/openstack.rst
@@ -0,0 +1,51 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+--------------------------------
+Colorado OpenStack User Guide
+--------------------------------
+
+OpenStack is a cloud operating system developed and released by the
+`OpenStack project <https://www.openstack.org>`_. OpenStack is used in OPNFV
+for controlling pools of compute, storage, and networking resources in a Pharos
+compliant infrastructure.
+
+OpenStack is used in Colorado to manage tenants (known in OpenStack as
+projects),users, services, images, flavours, and quotas across the Pharos
+infrastructure.The OpenStack interface provides the primary interface for an
+operational Colorado deployment and it is from the "horizon console" that an
+OPNFV user will perform the majority of administrative and operational
+activities on the deployment.
+
+OpenStack references
+--------------------
+
+The `OpenStack user guide <http://docs.openstack.org/user-guide>`_ provides
+details and descriptions of how to configure and interact with the OpenStack
+deployment.This guide can be used by lab engineers and operators to tune the
+OpenStack deployment to your liking.
+
+Once you have configured OpenStack to your purposes, or the Colorado
+deployment meets your needs as deployed, an operator, or administrator, will
+find the best guidance for working with OpenStack in the
+`OpenStack administration guide <http://docs.openstack.org/user-guide-admin>`_.
+
+Connecting to the OpenStack instance
+------------------------------------
+
+Once familiar with the basic of working with OpenStack you will want to connect
+to the OpenStack instance via the Horizon Console. The Horizon console provide
+a Web based GUI that will allow you operate the deployment.
+To do this you should open a browser on the JumpHost to the following address
+and enter the username and password:
+
+
+ http://{Controller-VIP}:80/index.html>
+ username: admin
+ password: admin
+
+Other methods of interacting with and configuring OpenStack,, like the REST API
+and CLI are also available in the Colorado deployment, see the
+`OpenStack administration guide <http://docs.openstack.org/user-guide-admin>`_
+for more information on using those interfaces.
diff --git a/docs/userguide/tuning.userguide.rst b/docs/userguide/tuning.userguide.rst
new file mode 100644
index 000000000..3673ae2d4
--- /dev/null
+++ b/docs/userguide/tuning.userguide.rst
@@ -0,0 +1,102 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+
+.. http://creativecommons.org/licenses/by/4.0
+
+Low Latency Tunning Suggestion
+==============================
+
+The correct configuration is critical for improving the NFV
+performance/latency.Even working on the same codebase, configurations can cause
+wildly different performance/latency results.
+
+There are many combinations of configurations, from hardware configuration to
+Operating System configuration and application level configuration. And there
+is no one simple configuration that works for every case. To tune a specific
+scenario, it's important to know the behaviors of different configurations and
+their impact.
+
+Platform Configuration
+----------------------
+
+Some hardware features can be configured through firmware interface(like BIOS)
+but others may not be configurable (e.g. SMI on most platforms).
+
+* **Power management:**
+ Most power management related features save power at the
+ expensive of latency. These features include: Intel®Turbo Boost Technology,
+ Enhanced Intel®SpeedStep, Processor C state and P state. Normally they
+ should be disabled but, depending on the real-time application design and
+ latency requirements, there might be some features that can be enabled if
+ the impact on deterministic execution of the workload is small.
+
+* **Hyper-Threading:**
+ The logic cores that share resource with other logic cores can introduce
+ latency so the recommendation is to disable this feature for realtime use
+ cases.
+
+* **Legacy USB Support/Port 60/64 Emulation:**
+ These features involve some emulation in firmware and can introduce random
+ latency. It is recommended that they are disabled.
+
+* **SMI (System Management Interrupt):**
+ SMI runs outside of the kernel code and can potentially cause
+ latency. It is a pity there is no simple way to disable it. Some vendors may
+ provide related switches in BIOS but most machines do not have this
+ capability.
+
+Operating System Configuration
+------------------------------
+
+* **CPU isolation:**
+ To achieve deterministic latency, dedicated CPUs should be allocated for
+ realtime application. This can be achieved by isolating cpus from kernel
+ scheduler. Please refer to
+ http://lxr.free-electrons.com/source/Documentation/kernel-parameters.txt#L1608
+ for more information.
+
+* **Memory allocation:**
+ Memory shoud be reserved for realtime applications and usually hugepage
+ should be used to reduce page fauts/TLB misses.
+
+* **IRQ affinity:**
+ All the non-realtime IRQs should be affinitized to non realtime CPUs to
+ reduce the impact on realtime CPUs. Some OS distributions contain an
+ irqbalance daemon which balances the IRQs among all the cores dynamically.
+ It should be disabled as well.
+
+* **Device assignment for VM:**
+ If a device is used in a VM, then device passthrough is desirable. In this
+ case,the IOMMU should be enabled.
+
+* **Tickless:**
+ Frequent clock ticks cause latency. CONFIG_NOHZ_FULL should be enabled in
+ the linux kernel. With CONFIG_NOHZ_FULL, the physical CPU will trigger many
+ fewer clock tick interrupts(currently, 1 tick per second). This can reduce
+ latency because each host timer interrupt triggers a VM exit from guest to
+ host which causes performance/latency impacts.
+
+* **TSC:**
+ Mark TSC clock source as reliable. A TSC clock source that seems to be
+ unreliable causes the kernel to continuously enable the clock source
+ watchdog to check if TSC frequency is still correct. On recent Intel
+ platforms with Constant TSC/Invariant TSC/Synchronized TSC, the TSC is
+ reliable so the watchdog is useless but cause latency.
+
+* **Idle:**
+ The poll option forces a polling idle loop that can slightly improve the
+ performance of waking up an idle CPU.
+
+* **RCU_NOCB:**
+ RCU is a kernel synchronization mechanism. Refer to
+ http://lxr.free-electrons.com/source/Documentation/RCU/whatisRCU.txt for more
+ information. With RCU_NOCB, the impact from RCU to the VNF will be reduced.
+
+* **Disable the RT throttling:**
+ RT Throttling is a Linux kernel mechanism that
+ occurs when a process or thread uses 100% of the core, leaving no resources
+ for the Linux scheduler to execute the kernel/housekeeping tasks. RT
+ Throttling increases the latency so should be disabled.
+
+* **NUMA configuration:**
+ To achieve the best latency. CPU/Memory and device allocated for realtime
+ application/VM should be in the same NUMA node.