summaryrefslogtreecommitdiffstats
path: root/kernel/Documentation/virtual
diff options
context:
space:
mode:
authorJosé Pekkarinen <jose.pekkarinen@nokia.com>2016-04-11 10:41:07 +0300
committerJosé Pekkarinen <jose.pekkarinen@nokia.com>2016-04-13 08:17:18 +0300
commite09b41010ba33a20a87472ee821fa407a5b8da36 (patch)
treed10dc367189862e7ca5c592f033dc3726e1df4e3 /kernel/Documentation/virtual
parentf93b97fd65072de626c074dbe099a1fff05ce060 (diff)
These changes are the raw update to linux-4.4.6-rt14. Kernel sources
are taken from kernel.org, and rt patch from the rt wiki download page. During the rebasing, the following patch collided: Force tick interrupt and get rid of softirq magic(I70131fb85). Collisions have been removed because its logic was found on the source already. Change-Id: I7f57a4081d9deaa0d9ccfc41a6c8daccdee3b769 Signed-off-by: José Pekkarinen <jose.pekkarinen@nokia.com>
Diffstat (limited to 'kernel/Documentation/virtual')
-rw-r--r--kernel/Documentation/virtual/kvm/api.txt145
-rw-r--r--kernel/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt187
-rw-r--r--kernel/Documentation/virtual/kvm/devices/arm-vgic.txt18
-rw-r--r--kernel/Documentation/virtual/kvm/devices/vm.txt2
-rw-r--r--kernel/Documentation/virtual/kvm/locking.txt12
-rw-r--r--kernel/Documentation/virtual/kvm/mmu.txt9
-rw-r--r--kernel/Documentation/virtual/kvm/ppc-pv.txt2
7 files changed, 339 insertions, 36 deletions
diff --git a/kernel/Documentation/virtual/kvm/api.txt b/kernel/Documentation/virtual/kvm/api.txt
index 9fa2bf8c3..092ee9fba 100644
--- a/kernel/Documentation/virtual/kvm/api.txt
+++ b/kernel/Documentation/virtual/kvm/api.txt
@@ -254,6 +254,11 @@ since the last call to this ioctl. Bit 0 is the first page in the
memory slot. Ensure the entire structure is cleared to avoid padding
issues.
+If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 specifies
+the address space for which you want to return the dirty bitmap.
+They must be less than the value that KVM_CHECK_EXTENSION returns for
+the KVM_CAP_MULTI_ADDRESS_SPACE capability.
+
4.9 KVM_SET_MEMORY_ALIAS
@@ -396,10 +401,9 @@ Capability: basic
Architectures: x86, ppc, mips
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
-Returns: 0 on success, -1 on error
+Returns: 0 on success, negative on failure.
-Queues a hardware interrupt vector to be injected. This is only
-useful if in-kernel local APIC or equivalent is not used.
+Queues a hardware interrupt vector to be injected.
/* for KVM_INTERRUPT */
struct kvm_interrupt {
@@ -409,7 +413,14 @@ struct kvm_interrupt {
X86:
-Note 'irq' is an interrupt vector, not an interrupt pin or line.
+Returns: 0 on success,
+ -EEXIST if an interrupt is already enqueued
+ -EINVAL the the irq number is invalid
+ -ENXIO if the PIC is in the kernel
+ -EFAULT if the pointer is invalid
+
+Note 'irq' is an interrupt vector, not an interrupt pin or line. This
+ioctl is useful if the in-kernel PIC is not used.
PPC:
@@ -820,11 +831,21 @@ struct kvm_vcpu_events {
} nmi;
__u32 sipi_vector;
__u32 flags;
+ struct {
+ __u8 smm;
+ __u8 pending;
+ __u8 smm_inside_nmi;
+ __u8 latched_init;
+ } smi;
};
-KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
-interrupt.shadow contains a valid state. Otherwise, this field is undefined.
+Only two fields are defined in the flags field:
+- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
+ interrupt.shadow contains a valid state.
+
+- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
+ smi contains a valid state.
4.32 KVM_SET_VCPU_EVENTS
@@ -841,17 +862,20 @@ vcpu.
See KVM_GET_VCPU_EVENTS for the data structure.
Fields that may be modified asynchronously by running VCPUs can be excluded
-from the update. These fields are nmi.pending and sipi_vector. Keep the
-corresponding bits in the flags field cleared to suppress overwriting the
-current in-kernel state. The bits are:
+from the update. These fields are nmi.pending, sipi_vector, smi.smm,
+smi.pending. Keep the corresponding bits in the flags field cleared to
+suppress overwriting the current in-kernel state. The bits are:
KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
+KVM_VCPUEVENT_VALID_SMM - transfer the smi sub-struct.
If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.
+KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
+
4.33 KVM_GET_DEBUGREGS
@@ -911,6 +935,13 @@ slot. When changing an existing slot, it may be moved in the guest
physical memory space, or its flags may be modified. It may not be
resized. Slots may not overlap in guest physical address space.
+If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
+specifies the address space which is being modified. They must be
+less than the value that KVM_CHECK_EXTENSION returns for the
+KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces
+are unrelated; the restriction on overlapping slots only applies within
+each address space.
+
Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for
the entire memory slot size. Any object may back this memory, including
@@ -959,7 +990,8 @@ documentation when it pops into existence).
4.37 KVM_ENABLE_CAP
Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
-Architectures: ppc, s390
+Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
+ mips (only KVM_CAP_ENABLE_CAP), ppc, s390
Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error
@@ -1268,7 +1300,7 @@ The flags bitmap is defined as:
/* the host supports the ePAPR idle hcall
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)
-4.48 KVM_ASSIGN_PCI_DEVICE
+4.48 KVM_ASSIGN_PCI_DEVICE (deprecated)
Capability: none
Architectures: x86
@@ -1318,7 +1350,7 @@ Errors:
have their standard meanings.
-4.49 KVM_DEASSIGN_PCI_DEVICE
+4.49 KVM_DEASSIGN_PCI_DEVICE (deprecated)
Capability: none
Architectures: x86
@@ -1337,7 +1369,7 @@ Errors:
Other error conditions may be defined by individual device types or
have their standard meanings.
-4.50 KVM_ASSIGN_DEV_IRQ
+4.50 KVM_ASSIGN_DEV_IRQ (deprecated)
Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86
@@ -1377,7 +1409,7 @@ Errors:
have their standard meanings.
-4.51 KVM_DEASSIGN_DEV_IRQ
+4.51 KVM_DEASSIGN_DEV_IRQ (deprecated)
Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86
@@ -1451,7 +1483,7 @@ struct kvm_irq_routing_s390_adapter {
};
-4.53 KVM_ASSIGN_SET_MSIX_NR
+4.53 KVM_ASSIGN_SET_MSIX_NR (deprecated)
Capability: none
Architectures: x86
@@ -1473,7 +1505,7 @@ struct kvm_assigned_msix_nr {
#define KVM_MAX_MSIX_PER_DEV 256
-4.54 KVM_ASSIGN_SET_MSIX_ENTRY
+4.54 KVM_ASSIGN_SET_MSIX_ENTRY (deprecated)
Capability: none
Architectures: x86
@@ -1572,7 +1604,7 @@ provided event instead of triggering an exit.
struct kvm_ioeventfd {
__u64 datamatch;
__u64 addr; /* legal pio/mmio address */
- __u32 len; /* 1, 2, 4, or 8 bytes */
+ __u32 len; /* 0, 1, 2, 4, or 8 bytes */
__s32 fd;
__u32 flags;
__u8 pad[36];
@@ -1595,6 +1627,10 @@ to the registered address is equal to datamatch in struct kvm_ioeventfd.
For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.
+With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
+the kernel will ignore the length of guest write and may get a faster vmexit.
+The speedup may only apply to specific architectures, but the ioeventfd will
+work anyway.
4.60 KVM_DIRTY_TLB
@@ -1629,7 +1665,7 @@ should skip processing the bitmap and just invalidate everything. It must
be set to the number of set bits in the bitmap.
-4.61 KVM_ASSIGN_SET_INTX_MASK
+4.61 KVM_ASSIGN_SET_INTX_MASK (deprecated)
Capability: KVM_CAP_PCI_2_3
Architectures: x86
@@ -1748,7 +1784,7 @@ has been called, this interface is completely emulated within the kernel.
To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:
- - pause the vpcu
+ - pause the vcpu
- read the local APIC's state (KVM_GET_LAPIC)
- check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
- if so, issue KVM_NMI
@@ -2645,7 +2681,7 @@ handled.
4.87 KVM_SET_GUEST_DEBUG
Capability: KVM_CAP_SET_GUEST_DEBUG
-Architectures: x86, s390, ppc
+Architectures: x86, s390, ppc, arm64
Type: vcpu ioctl
Parameters: struct kvm_guest_debug (in)
Returns: 0 on success; -1 on error
@@ -2667,8 +2703,8 @@ when running. Common control bits are:
The top 16 bits of the control field are architecture specific control
flags which can include the following:
- - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86]
- - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
+ - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
+ - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
- KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
@@ -2683,6 +2719,11 @@ updated to the correct (supplied) values.
The second part of the structure is architecture specific and
typically contains a set of debug registers.
+For arm64 the number of debug registers is implementation defined and
+can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
+KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
+indicating the number of supported registers.
+
When debug events exit the main run loop with the reason
KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
structure containing architecture specific debug information.
@@ -2767,7 +2808,7 @@ Returns: = 0 on success,
< 0 on generic error (e.g. -EFAULT or -ENOMEM),
> 0 if an exception occurred while walking the page tables
-Read or write data from/to the logical (virtual) memory of a VPCU.
+Read or write data from/to the logical (virtual) memory of a VCPU.
Parameters are specified via the following structure:
@@ -2978,6 +3019,16 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
which is the maximum number of possibly pending cpu-local interrupts.
+4.90 KVM_SMI
+
+Capability: KVM_CAP_X86_SMM
+Architectures: x86
+Type: vcpu ioctl
+Parameters: none
+Returns: 0 on success, -1 on error
+
+Queues an SMI on the thread's vcpu.
+
5. The kvm_run structure
------------------------
@@ -3013,7 +3064,12 @@ an interrupt can be injected now with KVM_INTERRUPT.
The value of the current interrupt flag. Only valid if in-kernel
local APIC is not used.
- __u8 padding2[2];
+ __u16 flags;
+
+More architecture-specific flags detailing state of the VCPU that may
+affect the device's behavior. The only currently defined flag is
+KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
+VCPU is in system management mode.
/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
@@ -3070,11 +3126,13 @@ data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
+ /* KVM_EXIT_DEBUG */
struct {
struct kvm_debug_exit_arch arch;
} debug;
-Unused.
+If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
+for which architecture specific information is returned.
/* KVM_EXIT_MMIO */
struct {
@@ -3236,6 +3294,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET 2
+#define KVM_SYSTEM_EVENT_CRASH 3
__u32 type;
__u64 flags;
} system_event;
@@ -3255,6 +3314,22 @@ Valid values for 'type' are:
KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
As with SHUTDOWN, userspace can choose to ignore the request, or
to schedule the reset to occur in the future and may call KVM_RUN again.
+ KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest
+ has requested a crash condition maintenance. Userspace can choose
+ to ignore the request, or to gather VM memory core dump and/or
+ reset/shutdown of the VM.
+
+ /* KVM_EXIT_IOAPIC_EOI */
+ struct {
+ __u8 vector;
+ } eoi;
+
+Indicates that the VCPU's in-kernel local APIC received an EOI for a
+level-triggered IOAPIC interrupt. This exit only triggers when the
+IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
+the userspace IOAPIC should process the EOI and retrigger the interrupt if
+it is still asserted. Vector is the LAPIC interrupt vector for which the
+EOI was received.
/* Fix the size of the union. */
char padding[256];
@@ -3574,6 +3649,26 @@ struct {
KVM handlers should exit to userspace with rc = -EREMOTE.
+7.5 KVM_CAP_SPLIT_IRQCHIP
+
+Architectures: x86
+Parameters: args[0] - number of routes reserved for userspace IOAPICs
+Returns: 0 on success, -1 on error
+
+Create a local apic for each processor in the kernel. This can be used
+instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the
+IOAPIC and PIC (and also the PIT, even though this has to be enabled
+separately).
+
+This capability also enables in kernel routing of interrupt requests;
+when KVM_CAP_SPLIT_IRQCHIP only routes of KVM_IRQ_ROUTING_MSI type are
+used in the IRQ routing table. The first args[0] MSI routes are reserved
+for the IOAPIC pins. Whenever the LAPIC receives an EOI for these routes,
+a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.
+
+Fails if VCPU has already been created, or if the irqchip is already in the
+kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
+
8. Other capabilities.
----------------------
diff --git a/kernel/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/kernel/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
new file mode 100644
index 000000000..38bca2835
--- /dev/null
+++ b/kernel/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
@@ -0,0 +1,187 @@
+KVM/ARM VGIC Forwarded Physical Interrupts
+==========================================
+
+The KVM/ARM code implements software support for the ARM Generic
+Interrupt Controller's (GIC's) hardware support for virtualization by
+allowing software to inject virtual interrupts to a VM, which the guest
+OS sees as regular interrupts. The code is famously known as the VGIC.
+
+Some of these virtual interrupts, however, correspond to physical
+interrupts from real physical devices. One example could be the
+architected timer, which itself supports virtualization, and therefore
+lets a guest OS program the hardware device directly to raise an
+interrupt at some point in time. When such an interrupt is raised, the
+host OS initially handles the interrupt and must somehow signal this
+event as a virtual interrupt to the guest. Another example could be a
+passthrough device, where the physical interrupts are initially handled
+by the host, but the device driver for the device lives in the guest OS
+and KVM must therefore somehow inject a virtual interrupt on behalf of
+the physical one to the guest OS.
+
+These virtual interrupts corresponding to a physical interrupt on the
+host are called forwarded physical interrupts, but are also sometimes
+referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
+
+Forwarded physical interrupts are handled slightly differently compared
+to virtual interrupts generated purely by a software emulated device.
+
+
+The HW bit
+----------
+Virtual interrupts are signalled to the guest by programming the List
+Registers (LRs) on the GIC before running a VCPU. The LR is programmed
+with the virtual IRQ number and the state of the interrupt (Pending,
+Active, or Pending+Active). When the guest ACKs and EOIs a virtual
+interrupt, the LR state moves from Pending to Active, and finally to
+inactive.
+
+The LRs include an extra bit, called the HW bit. When this bit is set,
+KVM must also program an additional field in the LR, the physical IRQ
+number, to link the virtual with the physical IRQ.
+
+When the HW bit is set, KVM must EITHER set the Pending OR the Active
+bit, never both at the same time.
+
+Setting the HW bit causes the hardware to deactivate the physical
+interrupt on the physical distributor when the guest deactivates the
+corresponding virtual interrupt.
+
+
+Forwarded Physical Interrupts Life Cycle
+----------------------------------------
+
+The state of forwarded physical interrupts is managed in the following way:
+
+ - The physical interrupt is acked by the host, and becomes active on
+ the physical distributor (*).
+ - KVM sets the LR.Pending bit, because this is the only way the GICV
+ interface is going to present it to the guest.
+ - LR.Pending will stay set as long as the guest has not acked the interrupt.
+ - LR.Pending transitions to LR.Active on the guest read of the IAR, as
+ expected.
+ - On guest EOI, the *physical distributor* active bit gets cleared,
+ but the LR.Active is left untouched (set).
+ - KVM clears the LR on VM exits when the physical distributor
+ active state has been cleared.
+
+(*): The host handling is slightly more complicated. For some forwarded
+interrupts (shared), KVM directly sets the active state on the physical
+distributor before entering the guest, because the interrupt is never actually
+handled on the host (see details on the timer as an example below). For other
+forwarded interrupts (non-shared) the host does not deactivate the interrupt
+when the host ISR completes, but leaves the interrupt active until the guest
+deactivates it. Leaving the interrupt active is allowed, because Linux
+configures the physical GIC with EOIMode=1, which causes EOI operations to
+perform a priority drop allowing the GIC to receive other interrupts of the
+default priority.
+
+
+Forwarded Edge and Level Triggered PPIs and SPIs
+------------------------------------------------
+Forwarded physical interrupts injected should always be active on the
+physical distributor when injected to a guest.
+
+Level-triggered interrupts will keep the interrupt line to the GIC
+asserted, typically until the guest programs the device to deassert the
+line. This means that the interrupt will remain pending on the physical
+distributor until the guest has reprogrammed the device. Since we
+always run the VM with interrupts enabled on the CPU, a pending
+interrupt will exit the guest as soon as we switch into the guest,
+preventing the guest from ever making progress as the process repeats
+over and over. Therefore, the active state on the physical distributor
+must be set when entering the guest, preventing the GIC from forwarding
+the pending interrupt to the CPU. As soon as the guest deactivates the
+interrupt, the physical line is sampled by the hardware again and the host
+takes a new interrupt if and only if the physical line is still asserted.
+
+Edge-triggered interrupts do not exhibit the same problem with
+preventing guest execution that level-triggered interrupts do. One
+option is to not use HW bit at all, and inject edge-triggered interrupts
+from a physical device as pure virtual interrupts. But that would
+potentially slow down handling of the interrupt in the guest, because a
+physical interrupt occurring in the middle of the guest ISR would
+preempt the guest for the host to handle the interrupt. Additionally,
+if you configure the system to handle interrupts on a separate physical
+core from that running your VCPU, you still have to interrupt the VCPU
+to queue the pending state onto the LR, even though the guest won't use
+this information until the guest ISR completes. Therefore, the HW
+bit should always be set for forwarded edge-triggered interrupts. With
+the HW bit set, the virtual interrupt is injected and additional
+physical interrupts occurring before the guest deactivates the interrupt
+simply mark the state on the physical distributor as Pending+Active. As
+soon as the guest deactivates the interrupt, the host takes another
+interrupt if and only if there was a physical interrupt between injecting
+the forwarded interrupt to the guest and the guest deactivating the
+interrupt.
+
+Consequently, whenever we schedule a VCPU with one or more LRs with the
+HW bit set, the interrupt must also be active on the physical
+distributor.
+
+
+Forwarded LPIs
+--------------
+LPIs, introduced in GICv3, are always edge-triggered and do not have an
+active state. They become pending when a device signal them, and as
+soon as they are acked by the CPU, they are inactive again.
+
+It therefore doesn't make sense, and is not supported, to set the HW bit
+for physical LPIs that are forwarded to a VM as virtual interrupts,
+typically virtual SPIs.
+
+For LPIs, there is no other choice than to preempt the VCPU thread if
+necessary, and queue the pending state onto the LR.
+
+
+Putting It Together: The Architected Timer
+------------------------------------------
+The architected timer is a device that signals interrupts with level
+triggered semantics. The timer hardware is directly accessed by VCPUs
+which program the timer to fire at some point in time. Each VCPU on a
+system programs the timer to fire at different times, and therefore the
+hardware is multiplexed between multiple VCPUs. This is implemented by
+context-switching the timer state along with each VCPU thread.
+
+However, this means that a scenario like the following is entirely
+possible, and in fact, typical:
+
+1. KVM runs the VCPU
+2. The guest programs the time to fire in T+100
+3. The guest is idle and calls WFI (wait-for-interrupts)
+4. The hardware traps to the host
+5. KVM stores the timer state to memory and disables the hardware timer
+6. KVM schedules a soft timer to fire in T+(100 - time since step 2)
+7. KVM puts the VCPU thread to sleep (on a waitqueue)
+8. The soft timer fires, waking up the VCPU thread
+9. KVM reprograms the timer hardware with the VCPU's values
+10. KVM marks the timer interrupt as active on the physical distributor
+11. KVM injects a forwarded physical interrupt to the guest
+12. KVM runs the VCPU
+
+Notice that KVM injects a forwarded physical interrupt in step 11 without
+the corresponding interrupt having actually fired on the host. That is
+exactly why we mark the timer interrupt as active in step 10, because
+the active state on the physical distributor is part of the state
+belonging to the timer hardware, which is context-switched along with
+the VCPU thread.
+
+If the guest does not idle because it is busy, the flow looks like this
+instead:
+
+1. KVM runs the VCPU
+2. The guest programs the time to fire in T+100
+4. At T+100 the timer fires and a physical IRQ causes the VM to exit
+ (note that this initially only traps to EL2 and does not run the host ISR
+ until KVM has returned to the host).
+5. With interrupts still disabled on the CPU coming back from the guest, KVM
+ stores the virtual timer state to memory and disables the virtual hw timer.
+6. KVM looks at the timer state (in memory) and injects a forwarded physical
+ interrupt because it concludes the timer has expired.
+7. KVM marks the timer interrupt as active on the physical distributor
+7. KVM enables the timer, enables interrupts, and runs the VCPU
+
+Notice that again the forwarded physical interrupt is injected to the
+guest without having actually been handled on the host. In this case it
+is because the physical interrupt is never actually seen by the host because the
+timer is disabled upon guest return, and the virtual forwarded interrupt is
+injected on the KVM guest entry path.
diff --git a/kernel/Documentation/virtual/kvm/devices/arm-vgic.txt b/kernel/Documentation/virtual/kvm/devices/arm-vgic.txt
index 3fb905429..59541d49e 100644
--- a/kernel/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/kernel/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -44,28 +44,29 @@ Groups:
Attributes:
The attr field of kvm_device_attr encodes two values:
bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 |
- values: | reserved | cpu id | offset |
+ values: | reserved | vcpu_index | offset |
All distributor regs are (rw, 32-bit)
The offset is relative to the "Distributor base address" as defined in the
GICv2 specs. Getting or setting such a register has the same effect as
- reading or writing the register on the actual hardware from the cpu
- specified with cpu id field. Note that most distributor fields are not
- banked, but return the same value regardless of the cpu id used to access
- the register.
+ reading or writing the register on the actual hardware from the cpu whose
+ index is specified with the vcpu_index field. Note that most distributor
+ fields are not banked, but return the same value regardless of the
+ vcpu_index used to access the register.
Limitations:
- Priorities are not implemented, and registers are RAZ/WI
- Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
Errors:
- -ENODEV: Getting or setting this register is not yet supported
+ -ENXIO: Getting or setting this register is not yet supported
-EBUSY: One or more VCPUs are running
+ -EINVAL: Invalid vcpu_index supplied
KVM_DEV_ARM_VGIC_GRP_CPU_REGS
Attributes:
The attr field of kvm_device_attr encodes two values:
bits: | 63 .... 40 | 39 .. 32 | 31 .... 0 |
- values: | reserved | cpu id | offset |
+ values: | reserved | vcpu_index | offset |
All CPU interface regs are (rw, 32-bit)
@@ -91,8 +92,9 @@ Groups:
- Priorities are not implemented, and registers are RAZ/WI
- Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
Errors:
- -ENODEV: Getting or setting this register is not yet supported
+ -ENXIO: Getting or setting this register is not yet supported
-EBUSY: One or more VCPUs are running
+ -EINVAL: Invalid vcpu_index supplied
KVM_DEV_ARM_VGIC_GRP_NR_IRQS
Attributes:
diff --git a/kernel/Documentation/virtual/kvm/devices/vm.txt b/kernel/Documentation/virtual/kvm/devices/vm.txt
index 5542c4641..2d09d1ed8 100644
--- a/kernel/Documentation/virtual/kvm/devices/vm.txt
+++ b/kernel/Documentation/virtual/kvm/devices/vm.txt
@@ -74,7 +74,7 @@ struct kvm_s390_vm_cpu_processor {
KVM does not enforce or limit the cpu model data in any form. Take the information
retrieved by means of KVM_S390_VM_CPU_MACHINE as hint for reasonable configuration
-setups. Instruction interceptions triggered by additionally set facilitiy bits that
+setups. Instruction interceptions triggered by additionally set facility bits that
are not handled by KVM need to by imlemented in the VM driver code.
Parameters: address of buffer to store/set the processor related cpu
diff --git a/kernel/Documentation/virtual/kvm/locking.txt b/kernel/Documentation/virtual/kvm/locking.txt
index d68af4dc3..19f94a6b9 100644
--- a/kernel/Documentation/virtual/kvm/locking.txt
+++ b/kernel/Documentation/virtual/kvm/locking.txt
@@ -166,3 +166,15 @@ Comment: The srcu read lock must be held while accessing memslots (e.g.
MMIO/PIO address->device structure mapping (kvm->buses).
The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
if it is needed by multiple functions.
+
+Name: blocked_vcpu_on_cpu_lock
+Type: spinlock_t
+Arch: x86
+Protects: blocked_vcpu_on_cpu
+Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
+ When VT-d posted-interrupts is supported and the VM has assigned
+ devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
+ protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
+ wakeup notification event since external interrupts from the
+ assigned devices happens, we will find the vCPU on the list to
+ wakeup.
diff --git a/kernel/Documentation/virtual/kvm/mmu.txt b/kernel/Documentation/virtual/kvm/mmu.txt
index c59bd9bc4..b653641d4 100644
--- a/kernel/Documentation/virtual/kvm/mmu.txt
+++ b/kernel/Documentation/virtual/kvm/mmu.txt
@@ -173,6 +173,12 @@ Shadow pages contain the following information:
Contains the value of cr4.smap && !cr0.wp for which the page is valid
(pages for which this is true are different from other pages; see the
treatment of cr0.wp=0 below).
+ role.smm:
+ Is 1 if the page is valid in system management mode. This field
+ determines which of the kvm_memslots array was used to build this
+ shadow page; it is also used to go back from a struct kvm_mmu_page
+ to a memslot, through the kvm_memslots_for_spte_role macro and
+ __gfn_to_memslot.
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
@@ -352,7 +358,8 @@ In the first case there are two additional complications:
- if CR4.SMEP is enabled: since we've turned the page into a kernel page,
the kernel may now execute it. We handle this by also setting spte.nx.
If we get a user fetch or read fault, we'll change spte.u=1 and
- spte.nx=gpte.nx back.
+ spte.nx=gpte.nx back. For this to work, KVM forces EFER.NX to 1 when
+ shadow paging is in use.
- if CR4.SMAP is disabled: since the page has been changed to a kernel
page, it can not be reused when CR4.SMAP is enabled. We set
CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/kernel/Documentation/virtual/kvm/ppc-pv.txt b/kernel/Documentation/virtual/kvm/ppc-pv.txt
index 319560646..e26115ce4 100644
--- a/kernel/Documentation/virtual/kvm/ppc-pv.txt
+++ b/kernel/Documentation/virtual/kvm/ppc-pv.txt
@@ -110,7 +110,7 @@ Flags are passed to the host in the low 12 bits of the Effective Address.
The following flags are currently available for a guest to expose:
- MAGIC_PAGE_FLAG_NOT_MAPPED_NX Guest handles NX bits correclty wrt magic page
+ MAGIC_PAGE_FLAG_NOT_MAPPED_NX Guest handles NX bits correctly wrt magic page
MSR bits
========