summaryrefslogtreecommitdiffstats
path: root/docs/all
diff options
context:
space:
mode:
Diffstat (limited to 'docs/all')
-rw-r--r--docs/all/environment-setup.rst147
-rw-r--r--docs/all/index.rst44
-rw-r--r--docs/all/live_migration.rst108
-rw-r--r--docs/all/lmdowntime.jpgbin0 -> 6372 bytes
-rw-r--r--docs/all/lmnetwork.jpgbin0 -> 53332 bytes
-rw-r--r--docs/all/lmtotaltime.jpgbin0 -> 24171 bytes
-rw-r--r--docs/all/tunning.rst97
7 files changed, 396 insertions, 0 deletions
diff --git a/docs/all/environment-setup.rst b/docs/all/environment-setup.rst
new file mode 100644
index 000000000..b60deb7e0
--- /dev/null
+++ b/docs/all/environment-setup.rst
@@ -0,0 +1,147 @@
+Low Latency Environment
+=======================
+
+Achieving low latency with the KVM4NFV project requires setting up a special
+test environment. This environment includes the BIOS settings, kernel
+configuration, kernel parameters and the run-time environment.
+
+Hardware Environment Description
+--------------------------------
+
+BIOS setup plays an important role in achieving real-time latency. A collection
+of relevant settings, used on the platform where the baseline performance data
+was collected, is detailed below:
+
+CPU Features
+~~~~~~~~~~~~
+
+Some special CPU features like TSC-deadline timer, invariant TSC and Process posted
+interrupts, etc, are helpful for latency reduction.
+
+Below is the CPU information on the baseline test platform.
+::
+ processor : 35
+ vendor_id : GenuineIntel
+ cpu family : 6
+ model : 63
+ model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
+ stepping : 2
+ microcode : 0x2d
+ cpu MHz : 2294.795
+ cache size : 46080 KB
+ physical id : 1
+ siblings : 18
+ core id : 27
+ cpu cores : 18
+ apicid : 118
+ initial apicid : 118
+ fpu : yes
+ fpu_exception : yes
+ cpuid level : 15
+ wp : yes
+ flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
+ mca cmov pat pse36 clflush dts acpi mmx fxsr sse
+ sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
+ constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
+ aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
+ ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
+ tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm arat epb
+ pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
+ tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
+ cqm_occup_llcbugs
+ bogomips : 4595.54
+ clflush size : 64
+ cache_alignment : 64
+ address sizes : 46 bits physical, 48 bits virtual
+ power management:
+
+CPU Topology
+~~~~~~~~~~~~
+
+NUMA topology is also important for latency reduction.
+
+Below is the CPU topology on the baseline test platform.
+::
+ [nfv@otcnfv02 ~]$ lscpu
+ Architecture: x86_64
+ CPU op-mode(s): 32-bit, 64-bit
+ Byte Order: Little Endian
+ CPU(s): 36
+ On-line CPU(s) list: 0-35
+ Thread(s) per core: 1
+ Core(s) per socket: 18
+ Socket(s): 2
+ NUMA node(s): 2
+ Vendor ID: GenuineIntel
+ CPU family: 6
+ Model: 63
+ Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
+ Stepping: 2
+ CPU MHz: 2294.795
+ BogoMIPS: 4595.54
+ Virtualization: VT-x
+ L1d cache: 32K
+ L1i cache: 32K
+ L2 cache: 256K
+ L3 cache: 46080K
+ NUMA node0 CPU(s): 0-17
+ NUMA node1 CPU(s): 18-35
+
+BIOS Setup
+~~~~~~~~~~
+
+Careful BIOS setup is important in achieving real time latency. Different
+platforms have different BIOS setups, below are the important BIOS settings on
+the platform used to collect the baseline performance data.
+::
+ CPU Power and Performance <Performance>
+ CPU C-State <Disabled>
+ C1E Autopromote <Disabled>
+ Processor C3 <Disabled>
+ Processor C6 <Disabled>
+ Select Memory RAS <Maximum Performance>
+ NUMA Optimized <Enabled>
+ Cluster-on-Die <Disabled>
+ Patrol Scrub <Disabled>
+ Demand Scrub <Disabled>
+ Correctable Error <10>
+ Intel(R) Hyper-Threading <Disabled>
+ Active Processor Cores <All>
+ Execute Disable Bit <Enabled>
+ Intel(R) Virtualization Technology <Enabled>
+ Intel(R) TXT <Disabled>
+ Enhanced Error Containment Mode <Disabled>
+ USB Controller <Enabled>
+ USB 3.0 Controller <Auto>
+ Legacy USB Support <Disabled>
+ Port 60/64 Emulation <Disabled>
+
+Software Environment Setup
+--------------------------
+Both the host and the guest environment need to be configured properly to
+reduce latency variations. Below are some suggested kernel configurations.
+The ci/envs/ directory gives detailed implementation on how to setup the
+environment.
+
+Kernel Parameter
+~~~~~~~~~~~~~~~~
+
+Please check the default kernel configuration in the source code at:
+kernel/arch/x86/configs/opnfv.config.
+
+Below is host kernel boot line example:
+::
+ isolcpus=11-15,31-35 nohz_full=11-15,31-35 rcu_nocbs=11-15,31-35 iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G mce=off idle=poll intel_pstate=disable processor.max_cstate=1 pcie_asmp=off tsc=reliable
+
+Below is guest kernel boot line example
+::
+ isolcpus=1 nohz_full=1 rcu_nocbs=1 mce=off idle=poll default_hugepagesz=1G hugepagesz=1G
+
+Please refer to :doc:`tunning` for more explanation.
+
+Run-time Environment Setup
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Not only are special kernel parameters needed but a special run-time
+environment is also required. Please refer to :doc:`tunning` for more
+explanation.
diff --git a/docs/all/index.rst b/docs/all/index.rst
new file mode 100644
index 000000000..fb1b137d7
--- /dev/null
+++ b/docs/all/index.rst
@@ -0,0 +1,44 @@
+===============
+KVM4NFV project
+===============
+
+Welcome to KVM4NFV_ project!
+
+
+
+.. _KVM4NFV: https://wiki.opnfv.org/nfv-kvm
+
+Contents:
+
+KVM4NFV Project Description
+===========================
+
+The NFV hypervisors provide crucial functionality in the NFV Infrastructure
+(NFVI). The existing hypervisors, however, are not necessarily designed or
+targeted to meet the requirements for the NFVI, and we need to make
+collaborative efforts toward enabling the NFV features.
+
+The KVM4NFV project focuses on the KVM hypervisor to enhance it for NFV, by
+looking at the following areas
+
++ Minimal Interrupt latency variation for data plane VNFs
+ * Minimal Timing Variation for Timing correctness of real-time VNFs
+ * Minimal packet latency variation for data-plane VNFs
++ Fast live migration
+
+While these items require software development and/or specific hardware features
+there are also some adjustments that need to be made to system configuration
+information, like hardware, BIOS, OS, etc.
+
+.. toctree::
+ :numbered:
+ :maxdepth: 1
+
+Setup Guides
+============
+.. toctree::
+ :maxdepth: 2
+
+ environment-setup
+ tunning
+ live_migration
diff --git a/docs/all/live_migration.rst b/docs/all/live_migration.rst
new file mode 100644
index 000000000..51265cc8d
--- /dev/null
+++ b/docs/all/live_migration.rst
@@ -0,0 +1,108 @@
+Fast Live Migration
+===================
+
+The NFV project requires fast live migration. The specific requirement is total
+live migration time < 2Sec, while keeping the VM down time < 10ms when running
+DPDK L2 forwarding workload.
+
+We measured the baseline data of migrating an idle 8GiB guest running a DPDK L2
+forwarding work load and observed that the total live migration time was 2271ms
+while the VM downtime was 26ms. Both of these two indicators failed to satisfy
+the requirements.
+
+Current Challenges
+------------------
+
+The following 4 features have been developed over the years to make the live
+migration process faster.
+
++ XBZRLE:
+ Helps to reduce the network traffic by just sending the
+ compressed data.
++ RDMA:
+ Uses a specific NIC to increase the efficiency of data
+ transmission.
++ Multi thread compression:
+ Compresses the data before transmission.
++ Auto convergence:
+ Reduces the data rate of dirty pages.
+
+Tests show none of the above features can satisfy the requirement of NFV.
+XBZRLE and Multi thread compression do the compression entirely in software and
+they are not fast enough in a 10Gbps network environment. RDMA is not flexible
+because it has to transport all the guest memory to the destination without zero
+page optimization. Auto convergence is not appropriate for NFV because it will
+impact guest’s performance.
+
+So we need to find other ways for optimization.
+
+Optimizations
+-------------------------
+a. Delay non-emergency operations
+ By profiling, it was discovered that some of the cleanup operations during
+ the stop and copy stage are the main reason for the long VM down time. The
+ cleanup operation includes stopping the dirty page logging, which is a time
+ consuming operation. By deferring these operations until the data transmission
+ is completed the VM down time is reduced to about 5-7ms.
+b. Optimize zero page checking
+ Currently QEMU uses the SSE2 instruction to optimize the zero pages
+ checking. The SSE2 instruction can process 16 bytes per instruction. By using
+ the AVX2 instruction, we can process 32 bytes per instruction. Testingt shows
+ that using AVX2 can speed up the zero pages checking process by about 25%.
+c. Remove unnecessary context synchronization.
+ The CPU context was being synchronized twice during live migration. Removing
+ this unnecessary synchronization shortened the VM downtime by about 100us.
+
+Test Environment
+----------------
+
+The source and destination host have the same hardware and OS:
+::
+Host: HSW-EP
+CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
+RAM: 64G
+OS: RHEL 7.1
+Kernel: 4.2
+QEMU v2.4.0
+
+Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
+QEMU parameters:
+::
+ /root/qemu.git/x86_64-softmmu/qemu-system-x86_64-enable-kvm -cpu host -smp 4 –device virtio-net-pci,netdev=net1,mac=52:54:00:12:34:56 –netdev type=tap,id=net1,script=/etc/kvm/qemu-ifup,downscript=no,vhost=on–device virtio-net-pci,netdev=net2,mac=54:54:00:12:34:56 –netdevtype=tap,id=net2,script=/etc/kvm/qemu-ifup2,downscript=no,vhost=on -balloon virtio -m 8192-monitor stdio /mnt/liang/ia32e_rhel6u5.qcow
+
+Network connection
+
+.. figure:: lmnetwork.jpg
+ :align: center
+ :alt: live migration network connection
+ :figwidth: 80%
+
+
+Test Result
+-----------
+The down time is set to 10ms when doing the test. We use pktgen to send the
+packages to guest, the package size is 64 bytes, and the line rate is 2013
+Mbps.
+
+a. Total live migration time
+
+ The total live migration time before and after optimization is shown in the
+ chart below. For an idle guest, we can reduce the total live migration time
+ from 2070ms to 401ms. For a guest running the DPDK L2 forwarding workload,
+ the total live migration time is reduced from 2271ms to 654ms.
+
+.. figure:: lmtotaltime.jpg
+ :align: center
+ :alt: total live migration time
+
+b. VM downtime
+
+ The VM down time before and after optimization is shown in the chart below.
+ For an idle guest, we can reduce the VM down time from 29ms to 9ms. For a guest
+ running the DPDK L2 forwarding workload, the VM down time is reduced from 26ms to
+ 5ms.
+
+.. figure:: lmdowntime.jpg
+ :align: center
+ :alt: vm downtime
+ :figwidth: 80%
diff --git a/docs/all/lmdowntime.jpg b/docs/all/lmdowntime.jpg
new file mode 100644
index 000000000..c9faa4c73
--- /dev/null
+++ b/docs/all/lmdowntime.jpg
Binary files differ
diff --git a/docs/all/lmnetwork.jpg b/docs/all/lmnetwork.jpg
new file mode 100644
index 000000000..8a9a324c3
--- /dev/null
+++ b/docs/all/lmnetwork.jpg
Binary files differ
diff --git a/docs/all/lmtotaltime.jpg b/docs/all/lmtotaltime.jpg
new file mode 100644
index 000000000..2dced3987
--- /dev/null
+++ b/docs/all/lmtotaltime.jpg
Binary files differ
diff --git a/docs/all/tunning.rst b/docs/all/tunning.rst
new file mode 100644
index 000000000..337255328
--- /dev/null
+++ b/docs/all/tunning.rst
@@ -0,0 +1,97 @@
+Low Latency Tunning Suggestion
+==============================
+
+The correct configuration is critical for improving the NFV performance/latency.
+Even working on the same codebase, configurations can cause wildly different
+performance/latency results.
+
+There are many combinations of configurations, from hardware configuration to
+Operating System configuration and application level configuration. And there
+is no one simple configuration that works for every case. To tune a specific
+scenario, it's important to know the behaviors of different configurations and
+their impact.
+
+Platform Configuration
+----------------------
+
+Some hardware features can be configured through firmware interface(like BIOS)
+but others may not be configurable (e.g. SMI on most platforms).
+
+* **Power management:**
+ Most power management related features save power at the
+ expensive of latency. These features include: Intel®Turbo Boost Technology,
+ Enhanced Intel®SpeedStep, Processor C state and P state. Normally they should
+ be disabled but, depending on the real-time application design and latency
+ requirements, there might be some features that can be enabled if the impact on
+ deterministic execution of the workload is small.
+
+* **Hyper-Threading:**
+ The logic cores that share resource with other logic cores can introduce
+ latency so the recommendation is to disable this feature for realtime use
+ cases.
+
+* **Legacy USB Support/Port 60/64 Emulation:**
+ These features involve some emulation in firmware and can introduce random
+ latency. It is recommended that they are disabled.
+
+* **SMI (System Management Interrupt):**
+ SMI runs outside of the kernel code and can potentially cause
+ latency. It is a pity there is no simple way to disable it. Some vendors may
+ provide related switches in BIOS but most machines do not have this capability.
+
+Operating System Configuration
+------------------------------
+
+* **CPU isolation:**
+ To achieve deterministic latency, dedicated CPUs should be allocated for
+ realtime application. This can be achieved by isolating cpus from kernel
+ scheduler. Please refer to
+ http://lxr.free-electrons.com/source/Documentation/kernel-parameters.txt#L1608
+ for more information.
+
+* **Memory allocation:**
+ Memory shoud be reserved for realtime applications and usually hugepage should
+ be used to reduce page fauts/TLB misses.
+
+* **IRQ affinity:**
+ All the non-realtime IRQs should be affinitized to non realtime CPUs to
+ reduce the impact on realtime CPUs. Some OS distributions contain an irqbalance
+ daemon which balances the IRQs among all the cores dynamically. It should be
+ disabled as well.
+
+* **Device assignment for VM:**
+ If a device is used in a VM, then device passthrough is desirable. In this case,
+ the IOMMU should be enabled.
+
+* **Tickless:**
+ Frequent clock ticks cause latency. CONFIG_NOHZ_FULL should be enabled in the
+ linux kernel. With CONFIG_NOHZ_FULL, the physical CPU will trigger many fewer
+ clock tick interrupts(currently, 1 tick per second). This can reduce latency
+ because each host timer interrupt triggers a VM exit from guest to host which
+ causes performance/latency impacts.
+
+* **TSC:**
+ Mark TSC clock source as reliable. A TSC clock source that seems to be
+ unreliable causes the kernel to continuously enable the clock source watchdog
+ to check if TSC frequency is still correct. On recent Intel platforms with
+ Constant TSC/Invariant TSC/Synchronized TSC, the TSC is reliable so the
+ watchdog is useless but cause latency.
+
+* **Idle:**
+ The poll option forces a polling idle loop that can slightly improve the
+ performance of waking up an idle CPU.
+
+* **RCU_NOCB:**
+ RCU is a kernel synchronization mechanism. Refer to
+ http://lxr.free-electrons.com/source/Documentation/RCU/whatisRCU.txt for more
+ information. With RCU_NOCB, the impact from RCU to the VNF will be reduced.
+
+* **Disable the RT throttling:**
+ RT Throttling is a Linux kernel mechanism that
+ occurs when a process or thread uses 100% of the core, leaving no resources for
+ the Linux scheduler to execute the kernel/housekeeping tasks. RT Throttling
+ increases the latency so should be disabled.
+
+* **NUMA configuration:**
+ To achieve the best latency. CPU/Memory and device allocated for realtime
+ application/VM should be in the same NUMA node.