|
The preemption timer for nested VMX is emulated by an hrtimer which is started on
L2 entry, stopped on L2 exit and evaluated via the check_nested_events hook.
However, nested_vmx_exit_handled always returns true for preemption timer
vmexits. As a result, the L1 preemption timer vmexit is captured and treated as
an L2 preemption timer vmexit, causing NULL pointer dereferences or worse in the
L1 guest's vmexit handler:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
PGD 0
Oops: 0010 [#1] SMP
Call Trace:
? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
handle_preemption_timer+0xe/0x20 [kvm_intel]
vmx_handle_exit+0x169/0x15a0 [kvm_intel]
? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
? vcpu_load+0x1c/0x60 [kvm]
? kvm_arch_vcpu_load+0x57/0x260 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
SyS_ioctl+0x79/0x90
do_syscall_64+0x68/0x180
entry_SYSCALL64_slow_path+0x25/0x25
Code: Bad RIP value.
RIP [< (null)>] (null)
RSP <ffff8800b5263c48>
CR2: 0000000000000000
---[ end trace 9c70c48b1a2bc66e ]---
This can be reproduced readily with the preemption timer enabled on L0 and
disabled on L1.
Return false since preemption timer vmexits must never be reflected to L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: Iaffcd503666879e8157c8559876330110a66e5c4
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Simplify cpu_has_vmx_preemption_timer. This is consistent with the
rest of setup_vmcs_config and preparatory for the next patch.
Tested-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: I3b33a881c5e47d5d3046e28374d0b0ca363ffad7
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
INFO: rcu_sched detected stalls on CPUs/tasks:
1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
(detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R running task 0 3529 3525 0x00080808
ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25
This can be reproduced readily by running a full dynticks guest (since the
hrtimer in the guest is heavily used) with lapic_timer_advance disabled.
If programming the hardware preemption timer fails, we fall back to the
hrtimer-based method. However, a previously programmed preemption timer is not
cancelled in this scenario, which results in one hardware preemption timer and
one hrtimer-emulated TSC deadline timer running simultaneously. So sometimes the
target guest deadline TSC is earlier than the guest TSC, which makes the
computation in vmx_set_hv_timer underflow and sets delta_tsc to a huge value,
leading to the host soft lockup shown above.
This patch fixes it by cancelling the previously programmed preemption timer,
if there is one, once we fail to program the new preemption timer and fall back
to the hrtimer-based method.
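The fallback rule described above can be sketched as a small state model. This is
only an illustration, not the kernel code: struct deadline_timer,
arm_deadline_timer() and the hv_program_ok flag are hypothetical names, and the
flag merely stands in for the result of programming the hardware timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one vCPU's TSC deadline timer backends. */
struct deadline_timer {
    bool hv_timer_armed;   /* hardware preemption timer is programmed */
    bool hrtimer_armed;    /* hrtimer-based emulation is running */
    uint64_t deadline_tsc; /* most recently requested deadline */
};

/*
 * Arm the timer for a new deadline. hv_program_ok stands in for the
 * outcome of programming the hardware preemption timer (assumption).
 */
void arm_deadline_timer(struct deadline_timer *t, uint64_t deadline_tsc,
                        bool hv_program_ok)
{
    t->deadline_tsc = deadline_tsc;

    if (hv_program_ok) {
        /* Hardware timer took the new deadline; no hrtimer is needed. */
        t->hv_timer_armed = true;
        t->hrtimer_armed = false;
        return;
    }

    /*
     * Fallback path: cancel any previously programmed preemption timer
     * before the hrtimer takes over, so the two never run together.
     */
    t->hv_timer_armed = false;
    t->hrtimer_armed = true;
}
```

With this rule, at most one of the two backends is armed after every call,
which is the invariant the message above relies on.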
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: I8a2decefab743aecdfab676fb9267324bf42b848
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Introduce cancel_hv_tscdeadline() to encapsulate the preemption
timer cancellation logic.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: Icc038176cbf361a9ecdf37ed3425108db57617f2
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer can underflow and
cause delta_tsc to be set to a huge value. This generally results
in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
delta_tsc to be positive or zero.
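A minimal sketch of the clamping idea, assuming unsigned 64-bit TSC values. The
helper name tsc_delta_until() is hypothetical and is not the function touched by
this patch; it only shows why an unclamped subtraction misbehaves.

```c
#include <stdint.h>

/*
 * Hypothetical helper: host TSC ticks remaining until the deadline,
 * clamped to zero when the deadline is already in the past.
 */
uint64_t tsc_delta_until(uint64_t deadline_tsc, uint64_t now_tsc)
{
    /*
     * Without the comparison, deadline_tsc - now_tsc would wrap around
     * to a value close to 2^64 whenever now_tsc > deadline_tsc.
     */
    return deadline_tsc > now_tsc ? deadline_tsc - now_tsc : 0;
}
```

A zero delta simply means the timer should fire as soon as possible.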
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: I12eea18c3ec648dbf782d7754b7b574d7d6aa92c
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Hook the VMX preemption timer to the "hv timer" functionality added
by the previous patch. This includes: checking if the feature is
supported, if the feature is broken on the CPU, the hooks to
setup/clean the VMX preemption timer, arming the timer on vmentry
and handling the vmexit.
A module parameter states if the VMX preemption timer should be
utilized.
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
[Move hv_deadline_tsc to struct vcpu_vmx, use -1 as the "unset" value.
Put all VMX bits here. Enable it by default #yolo. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: Icb8e0b853eedce3d52c394e510fa14d2cdd432e9
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Prepare to switch from preemption timer to hrtimer in the
vmx_pre/post_block. Current functions are only for posted interrupt,
rename them accordingly.
upstream-status: backport
Change-Id: Ie1dde9be21deeb661de095e07d6c29bcba2e7d73
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
The VMX preemption timer can be used to virtualize the TSC deadline timer.
The VMX preemption timer is armed when the vCPU is running, and a VMExit
will happen if the virtual TSC deadline timer expires.
When the vCPU thread is blocked because of HLT, KVM will switch to use
an hrtimer, and then go back to the VMX preemption timer when the vCPU
thread is unblocked.
This solution avoids the OS's complex hrtimer system, and the host
timer interrupt handling cost, replacing them with a little math
(for guest->host TSC and host TSC->preemption timer conversion)
and a cheaper VMexit. This benefits latency for isolated pCPUs.
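The second conversion mentioned above (host TSC to preemption timer) can be
sketched as follows. The VMX preemption timer counts down at the TSC rate
divided by a power of two reported in bits 4:0 of the IA32_VMX_MISC MSR, and the
timer value is a 32-bit field. The function name and the -1 error return are
assumptions for illustration; guest-to-host TSC scaling is not covered here.

```c
#include <stdint.h>

/*
 * Sketch: convert a host TSC delta into a VMX preemption timer value.
 * rate_shift is the 5-bit value from IA32_VMX_MISC bits 4:0.
 * Returns 0 on success, -1 if the delta does not fit (hypothetical
 * error convention).
 */
int tsc_delta_to_preemption_timer(uint64_t delta_tsc, unsigned int rate_shift,
                                  uint32_t *timer_value)
{
    uint64_t ticks;

    if (rate_shift > 31)
        return -1;              /* the field is only 5 bits wide */

    ticks = delta_tsc >> rate_shift;
    if (ticks > UINT32_MAX)
        return -1;              /* too far away for a 32-bit timer */

    *timer_value = (uint32_t)ticks;
    return 0;
}
```

A caller that gets the error back would have to fall back to another timer
mechanism, for example the hrtimer path.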
[A word about performance... Yunhong reported a 30% reduction in average
latency from cyclictest. I made a similar test with tscdeadline_latency
from kvm-unit-tests, and measured
- ~20 clock cycles loss (out of ~3200, so less than 1% but still
statistically significant) in the worst case where the test halts
just after programming the TSC deadline timer
- ~800 clock cycles gain (25% reduction in latency) in the best case
where the test busy waits.
I removed the VMX bits from Yunhong's patch, to concentrate them in the
next patch - Paolo]
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change-Id: I4aa1ecfa3463d1cbfb317511b45d2074b33d9b6f
upstream-status: backport
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
The function to start the tsc deadline timer virtualization will be used
also by the pre_block hook when we use the preemption timer; change it
to a separate function. No logic changes.
upstream-status: backport
Change-Id: Ie2fc19108c3252f8a299b17aba16c14aa8d31ae8
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
1. Uplift fuel-plugin for kvmfornfv to fuel 9.0, because fuel 9.0 will be the
deployment tool of the OPNFV Colorado release.
2. Fixed quirk for kernel.
3. Added all tools and libs for building the OVS module.
4. Allow KVM developers to build fuel-plugin-kvm after they modify the kernel
code without committing their changes to the repo first. They can test their
code changes with fuel-plugin-kvm until they are satisfied, then commit to the
repo.
5. The final code built into the OPNFV fuel iso will depend on the commit ID for
kvmfornfv specified in the fuel for OPNFV source tree.
Change-Id: Iaf9ff49c69df374d0757884cfdac4cccee3eebe4
Signed-off-by: davidjchou <david.j.chou@intel.com>
|
|
Due to expediency concerns some quirk code was placed in the build
script in order to get the Brahmaputra release out on time. These
quirks don't belong in the script itself and, now that there is a
mechanism to apply arbitrary patch files during the build,
this patch moves that quirk code out to the separate patch file
area.
Upstream status - NA
Change-Id: Ib8100eef00009bbaf0e16b28849821ef5878f9f5
Signed-off-by: Donald Dugger <n0ano@n0ano.com>
|
|
Since the upstream kvmfornfv kernel version and OVS changed, update
the related files in fuel-plugin so that fuel-plugin-kvm can be built with
the latest kvmfornfv kernel and OVS module.
Upstream status: kvmfornfv kernel 4.4.6-rt14nfv
Change-Id: I6a7c312f7527acae4d2de64c7b43da4fbec41504
Signed-off-by: davidjchou <david.j.chou@intel.com>
|
|
These changes are no longer needed as they are now (correctly) incorporated
in the actual config file in the source tree.
Upstream status: NA
Change-Id: Ide33453789338ba3f7692ca6108de89a4ac3c222
Signed-off-by: Don Dugger <n0ano@n0ano.com>
|
|
metal execution of dpdk-16.04."
|
|
Collisions happened in the following patches:
migration: do cleanup operation after completion (738df5b9)
Bug fix. (1750c932f86)
kvmclock: add a new function to update env->tsc. (b52baab2)
The code provided by the patches was already in the upstream
version.
Change-Id: I3cc11841a6a76ae20887b2e245710199e1ea7f9a
Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
|
|
Given that OVS doesn't support Linux 4.4 yet, we need to add a patch
to the OVS sources so that we can build a 4.4 version of the OVS
loadable kernel module that works with the RT Linux 4.4 kernel used
in OPNFV.
The directory `patches/ovs' contains patches (currently only one) that
are applied against the OVS tree. The Fuel build script is modified
to apply all of the patches in this directory to the OVS tree. Then
a working OVS KLM is created that is then inserted into the RT kernel
DEB package so that the end result is an RT kernel that supports OVS.
Upstream status: NA
Change-Id: I361f92526fb4bcafbeab9ce21570202f4aad1632
Signed-off-by: Don Dugger <n0ano@n0ano.com>
|
|
in bare metal execution of dpdk-16.04.
Upstream: NA.
Change-Id: Ia98461b15348a667c4989dfe1399f0c5bc0f0c12
Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
|
|
Restored the execute attribute of shell script files in the fuel-plugin
sub-dir. The execute attribute was lost when these files were
committed inside the fuel-plugin sub-dir, and it is necessary for
building fuel-plugin-kvm.
Upstream status: NA
Change-Id: I0ac060a27c4a1570bff17864aa7db5b6b9edf732
Signed-off-by: davidjchou <david.j.chou@intel.com>
|
|
Upstream status: NA
In the Brahmaputra release, KVM plugin functionality was bundled inside
fuel-plugin-qemu. For easier maintenance, the KVM plugin functionality
is separated from fuel-plugin-qemu and moved into an independent
fuel-plugin-kvm, with the source code kept here starting from the Colorado release.
Change-Id: Id89069234a4529cca40f1887e2d947378f928dd2
Signed-off-by: davidjchou <david.j.chou@intel.com>
Signed-off-by: Guo Ruijing <ruijing.guo@intel.com>
|
|
Change-Id: Id67d11079e738bc4180c395b86d56cfc81c3770f
Signed-off-by: Aric Gardner <agardner@linuxfoundation.org>
|
|
The OPNFV environment requires many kernel modules that are not
part of the default RT kernel environment. This patch adds those
modules back in.
Upstream status: NA
Change-Id: Id4e63f3d2dd3e19614e9e080adf1cdae9ab26ee1
|
|
This config file is based on the previous one, adding the
changes needed in the config file for this new kernel version.
In-kernel support for CephFS has also been added.
Upstream: NA.
Change-Id: I1de8b4678bdfa81f4fc204f4a02d11f11cb5ae87
Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
|
|
are taken from kernel.org, and rt patch from the rt wiki download page.
During the rebasing, the following patch collided:
Force tick interrupt and get rid of softirq magic(I70131fb85).
The colliding changes were dropped because their logic was already
present in the source.
Change-Id: I7f57a4081d9deaa0d9ccfc41a6c8daccdee3b769
Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
|
|
Add jose.pekkarinen@nokia.com as committer
RT Ticket: 19272, 19265
Change-Id: Ia49e92674f2da2feea53e08231bb4fdd6b54ca88
Signed-off-by: Aric Gardner <agardner@linuxfoundation.org>
|
|
Upstream Status : Not Applicable
For detailed license information please refer to https://wiki.opnfv.org/documentation#licencing_your_documentation
Change-Id: Iacba57a9162b25abc474dbac236b4dee5f27544c
|
|
This is a candidate for the Brahmaputra branch.
Change-Id: Idbe41ac77f1aae902cd00af4bc9a0e3532f4284a
Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
|
|
A document that describes the fast live migration environment setup and test
method.
Change-Id: Icd616f58e2ea39d101fcebc1d760178151d8629f
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Raghuveer Reddy <raghuveer.reddy@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Adding documentation to the kvm4nfv project based on
https://wiki.opnfv.org/documentation/tools; the content mostly comes
from https://wiki.opnfv.org/nfv-kvm,
https://wiki.opnfv.org/nfv-kvm-tuning and
https://wiki.opnfv.org/nfv-kvm-test.
Change-Id: If321221724ec9b76db065af7cdab97ce981be740
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
The patch to run cyclictest on baremetal
(https://gerrit.opnfv.org/gerrit/#/c/3633/) has been merged, so we
no longer need to apply the patch. Remove it.
Change-Id: I7058f9d6c3e873b56be52a0e886fe460506a9911
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
* changes:
Cyclictest invocation script
Add the yardstick invocation script
|
|
* changes:
A script to launch real time guest
Script to create the guest image
Script to create the rt-tests rpm
|
|
The configuration of the kernel enables the framebuffer console without
any framebuffer selected other than i915. Adding the VESA compliant
framebuffer should fix this issue.
Change-Id: Icc384e05774e1de20985aeb19dfef25ae2431bb6
Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
|
|
We run cyclictest through yardstick, which helps to set up the
environment.
Environment setup scripts are copied to the yardstick docker images. A
yardstick cyclictest yaml file is also used.
Change-Id: Iacf1299a38c3c81a08fd5fdbbf64c5a57f30c38b
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Launching a real time guest needs special options such as locking memory. A
script is used to achieve this. We invoke the qemu utility directly
instead of going through a middle layer like libvirt, to get better
control.
Change-Id: Ia6ad7313463e2f858516bddd4a4b58e95d8c943e
Signed-off-by: David Su <david.w.su@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
This script does the real yardstick work. It downloads the yardstick code
and runs the cyclictest test case.
This script is copied by cyclictest.sh to the container image and
is executed from the yardstick container.
It's based on a script from QiLiang when discussing the integration with
yardstick.
Change-Id: I5920a21401a3e442d5f4fada05d9e789f2a99add
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: QiLiang <liangqi1@huawei.com>
|
|
This script downloads the latest CentOS cloud image and modifies it to meet
the test requirements, including removing cloud related scripts, setting up
the ssh keys etc.
It derives from the yardstick project, which currently only provides support
for Ubuntu images. If they support CentOS images, we can possibly switch to
them.
Change-Id: I9482936615a28da696ff8f51248a62b13e5677f4
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
To achieve good real time and live migration performance, special setup
is needed in the guest environment.
Two scripts are used to set up the guest environment. guest-setup0.sh
sets up the environment that should take effect before the tested kernel is
brought up, including installing the kernel rpm and the rt-test package and
modifying the grub entries. guest-setup1.sh sets up the environment that takes
effect after the tested kernel is up, like some sysfs entries, interrupt
affinity etc.
Change-Id: Icaed71e250b314723d6b1814c9ac33c10d99c6a0
Signed-off-by: David Su <david.w.su@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
The kernel rpm installation on rhel/centos uses grubby to set up the
default kernel entry, so we should not change the grub default set from
saved to 0.
Change-Id: I5910f498f5889c052e43d2e1e92b209c05b01455
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
There is no rt-tests rpm in the centos repo, and we also need a special
compile flag to enable the -a parameter. So we build the rpm
ourselves. The version is specified as 0.96.
Please note that the Makefile in the rt-tests tree requires some
key, and we have to disable that requirement. So this rpm is only for OPNFV
testing purposes.
Change-Id: Ifdd52649bc14405dbe5ad375dc7fd32087139b18
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|
|
Testing the KVM4NFV project requires a special host environment to get the
best result.
Two scripts are used to set up the environment. host-setup0.sh sets up
the environment that should take effect before the kernel is brought up,
mostly the grub entries. host-setup1.sh sets up the environment that
takes effect after the kernel is up, like some sysfs entries, interrupt
affinity etc. The host-config provides the configurations.
Change-Id: Ie933ea0089ac82acd39fc48088615215993312f3
Signed-off-by: David Su <david.w.su@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
|