Merge "OPNFV KVM4NFV: Documentation"

author: Jiang, Yunhong <yunhong.jiang@intel.com> 2016-08-19 22:38:30 +0000
committer: Gerrit Code Review <gerrit@172.30.200.206> 2016-08-19 22:38:30 +0000
commit: 1ec14a31c3ee8df1b8602632e0e3295547e3ea12 (patch)
tree: 13d77cab75c7aebcfd88988e1c104188472e2e6f /docs/all
parent: b1c117f1c8414bddbe4370414590f5f0b62ae4d1 (diff)
parent: 5a56bf69988b7c72e88546eb4659576fb51bfb77 (diff)
7 files changed, 0 insertions, 412 deletions
diff --git a/docs/all/environment-setup.rst b/docs/all/environment-setup.rst
deleted file mode 100644
index e3814310a..000000000
--- a/docs/all/environment-setup.rst
+++ /dev/null
@@ -1,151 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) <optionally add copywriters name>
-
-Low Latency Environment
-=======================
-
-Achieving low latency with the KVM4NFV project requires setting up a special
-test environment. This environment includes the BIOS settings, kernel
-configuration, kernel parameters and the run-time environment.
-
-Hardware Environment Description
---------------------------------
-
-BIOS setup plays an important role in achieving real-time latency. A collection
-of relevant settings, used on the platform where the baseline performance data
-was collected, is detailed below:
-
-CPU Features
-~~~~~~~~~~~~
-
-Some special CPU features, such as the TSC-deadline timer, invariant TSC and
-posted interrupts, are helpful for latency reduction.
-
-Below is the CPU information on the baseline test platform.
-::
-        processor       : 35
-        vendor_id       : GenuineIntel
-        cpu family      : 6
-        model           : 63
-        model name      : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
-        stepping        : 2
-        microcode       : 0x2d
-        cpu MHz         : 2294.795
-        cache size      : 46080 KB
-        physical id     : 1
-        siblings        : 18
-        core id         : 27
-        cpu cores       : 18
-        apicid          : 118
-        initial apicid  : 118
-        fpu             : yes
-        fpu_exception   : yes
-        cpuid level     : 15
-        wp              : yes
-        flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
-                          mca cmov pat pse36 clflush dts acpi mmx fxsr sse
-                          sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
-                          constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
-                          aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
-                          ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
-                          tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm arat epb
-                          pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
-                          tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
-                          cqm_occup_llcbugs
-        bogomips        : 4595.54
-        clflush size    : 64
-        cache_alignment : 64
-        address sizes   : 46 bits physical, 48 bits virtual
-        power management:
-
-CPU Topology
-~~~~~~~~~~~~
-
-NUMA topology is also important for latency reduction.
-
-Below is the CPU topology on the baseline test platform.
-::
-        [nfv@otcnfv02 ~]$ lscpu
-        Architecture:          x86_64
-        CPU op-mode(s):        32-bit, 64-bit
-        Byte Order:            Little Endian
-        CPU(s):                36
-        On-line CPU(s) list:   0-35
-        Thread(s) per core:    1
-        Core(s) per socket:    18
-        Socket(s):             2
-        NUMA node(s):          2
-        Vendor ID:             GenuineIntel
-        CPU family:            6
-        Model:                 63
-        Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
-        Stepping:              2
-        CPU MHz:               2294.795
-        BogoMIPS:              4595.54
-        Virtualization:        VT-x
-        L1d cache:             32K
-        L1i cache:             32K
-        L2 cache:              256K
-        L3 cache:              46080K
-        NUMA node0 CPU(s):     0-17
-        NUMA node1 CPU(s):     18-35
-
-BIOS Setup
-~~~~~~~~~~
-
-Careful BIOS setup is important in achieving real-time latency. BIOS setups
-differ between platforms; below are the important BIOS settings on the
-platform used to collect the baseline performance data.
-::
-        CPU Power and Performance <Performance>
-        CPU C-State <Disabled>
-        C1E Autopromote <Disabled>
-        Processor C3 <Disabled>
-        Processor C6 <Disabled>
-        Select Memory RAS <Maximum Performance>
-        NUMA Optimized <Enabled>
-        Cluster-on-Die <Disabled>
-        Patrol Scrub <Disabled>
-        Demand Scrub <Disabled>
-        Correctable Error <10>
-        Intel(R) Hyper-Threading <Disabled>
-        Active Processor Cores <All>
-        Execute Disable Bit <Enabled>
-        Intel(R) Virtualization Technology <Enabled>
-        Intel(R) TXT <Disabled>
-        Enhanced Error Containment Mode <Disabled>
-        USB Controller <Enabled>
-        USB 3.0 Controller <Auto>
-        Legacy USB Support <Disabled>
-        Port 60/64 Emulation <Disabled>
-
-Software Environment Setup
---------------------------
-Both the host and the guest environment need to be configured properly to
-reduce latency variations. Below are some suggested kernel configurations.
-The ci/envs/ directory gives a detailed implementation of how to set up the
-environment.
-
-Kernel Parameter
-~~~~~~~~~~~~~~~~
-
-Please check the default kernel configuration in the source code at:
-kernel/arch/x86/configs/opnfv.config.
-
-Below is an example host kernel boot line:
-::
-        isolcpus=11-15,31-35 nohz_full=11-15,31-35 rcu_nocbs=11-15,31-35 iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G mce=off idle=poll intel_pstate=disable processor.max_cstate=1 pcie_aspm=off tsc=reliable
-
-Below is an example guest kernel boot line:
-::
-        isolcpus=1 nohz_full=1 rcu_nocbs=1 mce=off idle=poll default_hugepagesz=1G hugepagesz=1G
-
-Please refer to :doc:`tuning` for more explanation.
-
-Run-time Environment Setup
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-In addition to special kernel parameters, a special run-time
-environment is required. Please refer to :doc:`tuning` for more
-explanation.
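
The host boot line in environment-setup.rst passes the same CPU list to `isolcpus`, `nohz_full` and `rcu_nocbs`. A minimal Python sketch of a consistency check is shown below. The helper names are hypothetical and not part of the KVM4NFV tree, the boot line is shortened for illustration, and only plain comma-separated ranges are handled (the kernel accepts richer CPU list syntax).

```python
def parse_cpu_list(spec):
    """Expand a plain CPU list such as '11-15,31-35' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_cmdline(cmdline):
    """Split a boot line into a dict; flags without '=' map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def isolation_sets(cmdline):
    """Return the CPU sets named by the three isolation-related parameters."""
    params = parse_cmdline(cmdline)
    keys = ("isolcpus", "nohz_full", "rcu_nocbs")
    return {k: parse_cpu_list(params[k]) for k in keys if params.get(k)}


# Shortened version of the host boot line from the document (illustration only).
HOST_CMDLINE = (
    "isolcpus=11-15,31-35 nohz_full=11-15,31-35 rcu_nocbs=11-15,31-35 "
    "iommu=pt intel_iommu=on idle=poll tsc=reliable"
)

sets = isolation_sets(HOST_CMDLINE)
# All three parameters should describe the same set of realtime CPUs.
consistent = len({frozenset(s) for s in sets.values()}) == 1
print(sorted(sets["isolcpus"]), consistent)
```

A check like this could also be run against `/proc/cmdline` on a booted host to confirm that the intended parameters took effect.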
diff --git a/docs/all/index.rst b/docs/all/index.rst
deleted file mode 100644
index 7f5f7a694..000000000
--- a/docs/all/index.rst
+++ /dev/null
@@ -1,48 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) <optionally add copywriters name>
-
-===============
-KVM4NFV project
-===============
-
-Welcome to the KVM4NFV_ project!
-
-
-
-.. _KVM4NFV: https://wiki.opnfv.org/nfv-kvm
-
-Contents:
-
-KVM4NFV Project Description
-===========================
-
-The NFV hypervisors provide crucial functionality in the NFV Infrastructure
-(NFVI). The existing hypervisors, however, are not necessarily designed or
-targeted to meet the requirements for the NFVI, and we need to make
-collaborative efforts toward enabling the NFV features.
-
-The KVM4NFV project focuses on the KVM hypervisor to enhance it for NFV, by
-looking at the following areas:
-
-+ Minimal Interrupt latency variation for data plane VNFs
-    * Minimal Timing Variation for Timing correctness of real-time VNFs
-    * Minimal packet latency variation for data-plane VNFs
-+ Fast live migration
-
-While these items require software development and/or specific hardware features,
-some adjustments also need to be made to the system configuration,
-including hardware, BIOS and OS settings.
-
-.. toctree::
-        :numbered:
-        :maxdepth: 1
-
-Setup Guides
-============
-.. toctree::
-        :maxdepth: 2
-
-        environment-setup
-        tuning
-        live_migration
diff --git a/docs/all/live_migration.rst b/docs/all/live_migration.rst
deleted file mode 100644
index 4af19b6f4..000000000
--- a/docs/all/live_migration.rst
+++ /dev/null
@@ -1,112 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) <optionally add copywriters name>
-
-Fast Live Migration
-===================
-
-The NFV project requires fast live migration. The specific requirement is a total
-live migration time below 2 s, while keeping the VM downtime below 10 ms when
-running a DPDK L2 forwarding workload.
-
-We measured the baseline by migrating an 8GiB guest running a DPDK L2
-forwarding workload and observed that the total live migration time was 2271ms
-while the VM downtime was 26ms. Both indicators failed to satisfy
-the requirements.
-
-Current Challenges
-------------------
-
-The following 4 features have been developed over the years to make the live
-migration process faster.
-
-+ XBZRLE:
-        Helps to reduce the network traffic by just sending the
-        compressed data.
-+ RDMA:
-        Uses a specific NIC to increase the efficiency of data
-        transmission.
-+ Multi thread compression:
-        Compresses the data before transmission.
-+ Auto convergence:
-        Reduces the data rate of dirty pages.
-
-Tests show that none of the above features can satisfy the NFV requirement.
-XBZRLE and multi-thread compression do the compression entirely in software and
-are not fast enough in a 10Gbps network environment. RDMA is not flexible
-because it has to transport all the guest memory to the destination without zero
-page optimization. Auto convergence is not appropriate for NFV because it
-impacts the guest's performance.
-
-So we need to find other ways for optimization.
-
-Optimizations
--------------------------
-a. Delay non-emergency operations
-   By profiling, it was discovered that some of the cleanup operations during
-   the stop and copy stage are the main reason for the long VM down time. The
-   cleanup operation includes stopping the dirty page logging, which is a time
-   consuming operation. By deferring these operations until the data transmission
-   is completed the VM down time is reduced to about 5-7ms.
-b. Optimize zero page checking
-   Currently QEMU uses SSE2 instructions to optimize zero-page
-   checking. An SSE2 instruction can process 16 bytes at a time; an AVX2
-   instruction can process 32 bytes at a time. Testing shows
-   that using AVX2 can speed up the zero-page checking process by about 25%.
-c. Remove unnecessary context synchronization.
-   The CPU context was being synchronized twice during live migration. Removing
-   this unnecessary synchronization shortened the VM downtime by about 100us.
-
-Test Environment
-----------------
-
-The source and destination host have the same hardware and OS:
-::
-        Host: HSW-EP
-        CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
-        RAM: 64G
-        OS: RHEL 7.1
-        Kernel: 4.2
-        QEMU: v2.4.0
-        Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
-
-QEMU parameters:
-::
-  /root/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu host -smp 4 -device virtio-net-pci,netdev=net1,mac=52:54:00:12:34:56 -netdev type=tap,id=net1,script=/etc/kvm/qemu-ifup,downscript=no,vhost=on -device virtio-net-pci,netdev=net2,mac=54:54:00:12:34:56 -netdev type=tap,id=net2,script=/etc/kvm/qemu-ifup2,downscript=no,vhost=on -balloon virtio -m 8192 -monitor stdio /mnt/liang/ia32e_rhel6u5.qcow
-
-Network connection
-
-.. figure:: lmnetwork.jpg
-   :align: center
-   :alt: live migration network connection
-   :figwidth: 80%
-
-
-Test Result
------------
-The down time is set to 10ms when doing the test. We use pktgen to send
-packets to the guest; the packet size is 64 bytes, and the line rate is 2013
-Mbps.
-
-a. Total live migration time
-
-   The total live migration time before and after optimization is shown in the
-   chart below. For an idle guest, we can reduce the total live migration time
-   from 2070ms to 401ms. For a guest running the DPDK L2 forwarding workload,
-   the total live migration time is reduced from 2271ms to 654ms.
-
-.. figure:: lmtotaltime.jpg
-   :align: center
-   :alt: total live migration time
-
-b. VM downtime
-
-   The VM down time before and after optimization is shown in the chart below.
-   For an idle guest, we can reduce the VM down time from 29ms to 9ms. For a guest
-   running the DPDK L2 forwarding workload, the VM down time is reduced from 26ms to
-   5ms.
-
-.. figure:: lmdowntime.jpg
-   :align: center
-   :alt: vm downtime
-   :figwidth: 80%
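
The before/after figures reported in live_migration.rst can be checked against the stated requirements (total time below 2 s, downtime below 10 ms) with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the text; the function and constant names are hypothetical.

```python
REQUIRED_TOTAL_MS = 2000    # total live migration time must be below 2 s
REQUIRED_DOWNTIME_MS = 10   # VM downtime must be below 10 ms

# (before, after) in milliseconds, as quoted in the Test Result section.
RESULTS = {
    "idle guest": {"total": (2070, 401), "downtime": (29, 9)},
    "DPDK L2 forwarding": {"total": (2271, 654), "downtime": (26, 5)},
}


def reduction_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


def meets_requirements(total_ms, downtime_ms):
    """True when both the total-time and downtime requirements are met."""
    return total_ms < REQUIRED_TOTAL_MS and downtime_ms < REQUIRED_DOWNTIME_MS


for name, r in RESULTS.items():
    before_ok = meets_requirements(r["total"][0], r["downtime"][0])
    after_ok = meets_requirements(r["total"][1], r["downtime"][1])
    print(
        f"{name}: total -{reduction_percent(*r['total']):.1f}%, "
        f"downtime -{reduction_percent(*r['downtime']):.1f}%, "
        f"requirements met before={before_ok} after={after_ok}"
    )
```

With the reported figures, neither workload meets both requirements before the optimizations and both do afterwards.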
diff --git a/docs/all/lmdowntime.jpg b/docs/all/lmdowntime.jpg
deleted file mode 100644
index c9faa4c73..000000000
--- a/docs/all/lmdowntime.jpg
+++ /dev/null
diff --git a/docs/all/lmnetwork.jpg b/docs/all/lmnetwork.jpg
deleted file mode 100644
index 8a9a324c3..000000000
--- a/docs/all/lmnetwork.jpg
+++ /dev/null
diff --git a/docs/all/lmtotaltime.jpg b/docs/all/lmtotaltime.jpg
deleted file mode 100644
index 2dced3987..000000000
--- a/docs/all/lmtotaltime.jpg
+++ /dev/null
diff --git a/docs/all/tuning.rst b/docs/all/tuning.rst
deleted file mode 100644
index 760861b8b..000000000
--- a/docs/all/tuning.rst
+++ /dev/null
@@ -1,101 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) <optionally add copywriters name>
-
-Low Latency Tuning Suggestions
-==============================
-
-The correct configuration is critical for improving NFV performance/latency.
-Even with the same codebase, different configurations can produce wildly
-different performance/latency results.
-
-There are many combinations of configurations, from hardware configuration to
-Operating System configuration and application level configuration. And there
-is no one simple configuration that works for every case. To tune a specific
-scenario, it's important to know the behaviors of different configurations and
-their impact.
-
-Platform Configuration
-----------------------
-
-Some hardware features can be configured through a firmware interface (such as
-the BIOS) but others may not be configurable (e.g. SMI on most platforms).
-
-* **Power management:**
-  Most power management related features save power at the
-  expense of latency. These features include: Intel® Turbo Boost Technology,
-  Enhanced Intel® SpeedStep, Processor C state and P state. Normally they should
-  be disabled but, depending on the real-time application design and latency
-  requirements, there might be some features that can be enabled if the impact on
-  deterministic execution of the workload is small.
-
-* **Hyper-Threading:**
-  Logical cores that share resources with other logical cores can introduce
-  latency, so the recommendation is to disable this feature for realtime use
-  cases.
-
-* **Legacy USB Support/Port 60/64 Emulation:**
-  These features involve some emulation in firmware and can introduce random
-  latency. It is recommended that they are disabled.
-
-* **SMI (System Management Interrupt):**
-  SMI runs outside of the kernel code and can potentially cause
-  latency. Unfortunately, there is no simple way to disable it. Some vendors may
-  provide related switches in BIOS but most machines do not have this capability.
-
-Operating System Configuration
-------------------------------
-
-* **CPU isolation:**
-  To achieve deterministic latency, dedicated CPUs should be allocated for
-  realtime applications. This can be achieved by isolating CPUs from the kernel
-  scheduler. Please refer to
-  http://lxr.free-electrons.com/source/Documentation/kernel-parameters.txt#L1608
-  for more information.
-
-* **Memory allocation:**
-  Memory should be reserved for realtime applications and usually hugepages should
-  be used to reduce page faults/TLB misses.
-
-* **IRQ affinity:**
-  All the non-realtime IRQs should be affinitized to non-realtime CPUs to
-  reduce the impact on realtime CPUs. Some OS distributions contain an irqbalance
-  daemon which balances the IRQs among all the cores dynamically. It should be
-  disabled as well.
-
-* **Device assignment for VM:**
-  If a device is used in a VM, then device passthrough is desirable. In this case,
-  the IOMMU should be enabled.
-
-* **Tickless:**
-  Frequent clock ticks cause latency. CONFIG_NOHZ_FULL should be enabled in the
-  Linux kernel. With CONFIG_NOHZ_FULL, the physical CPU will trigger many fewer
-  clock tick interrupts (currently, 1 tick per second). This can reduce latency
-  because each host timer interrupt triggers a VM exit from guest to host which
-  causes performance/latency impacts.
-
-* **TSC:**
-  Mark the TSC clock source as reliable. A TSC clock source that seems to be
-  unreliable causes the kernel to continuously enable the clock source watchdog
-  to check if the TSC frequency is still correct. On recent Intel platforms with
-  Constant TSC/Invariant TSC/Synchronized TSC, the TSC is reliable so the
-  watchdog is unnecessary and only adds latency.
-
-* **Idle:**
-  The idle=poll option forces a polling idle loop that can slightly improve the
-  performance of waking up an idle CPU.
-
-* **RCU_NOCB:**
-  RCU is a kernel synchronization mechanism. Refer to
-  http://lxr.free-electrons.com/source/Documentation/RCU/whatisRCU.txt for more
-  information. With RCU_NOCB, the impact from RCU to the VNF will be reduced.
-
-* **Disable the RT throttling:**
-  RT throttling is a Linux kernel mechanism that
-  takes effect when a process or thread uses 100% of the core, leaving no resources for
-  the Linux scheduler to execute the kernel/housekeeping tasks. RT throttling
-  increases latency, so it should be disabled.
-
-* **NUMA configuration:**
-  To achieve the best latency, the CPU, memory and devices allocated to a realtime
-  application/VM should be in the same NUMA node.
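
The IRQ affinity advice in tuning.rst amounts to computing the set of non-realtime (housekeeping) CPUs and steering IRQs to it. The Python sketch below derives that set in the compact CPU list form accepted by `/proc/irq/<N>/smp_affinity_list` on Linux. The helper names are hypothetical; the CPU numbers assume the 36-CPU host and the `isolcpus=11-15,31-35` example from environment-setup.rst. Writing the value to procfs is deliberately left out.

```python
def housekeeping_cpus(total_cpus, isolated):
    """Return the CPUs that are not reserved for realtime work."""
    isolated = set(isolated)
    return [cpu for cpu in range(total_cpus) if cpu not in isolated]


def to_cpu_list(cpus):
    """Format CPU numbers as a compact list, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in ranges
    )


# Assumption: 36 online CPUs with CPUs 11-15 and 31-35 isolated, as in the
# example host boot line.
ISOLATED = list(range(11, 16)) + list(range(31, 36))
print(to_cpu_list(housekeeping_cpus(36, ISOLATED)))
```

The resulting string would be applied per IRQ by an administrator or a setup script, after the irqbalance daemon has been disabled so that it does not overwrite the setting.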