summaryrefslogtreecommitdiffstats
path: root/kernel/Documentation/x86
diff options
context:
space:
mode:
Diffstat (limited to 'kernel/Documentation/x86')
-rw-r--r--kernel/Documentation/x86/boot.txt3
-rw-r--r--kernel/Documentation/x86/entry_64.txt12
-rw-r--r--kernel/Documentation/x86/kernel-stacks (renamed from kernel/Documentation/x86/x86_64/kernel-stacks)56
-rw-r--r--kernel/Documentation/x86/mtrr.txt30
-rw-r--r--kernel/Documentation/x86/pat.txt48
-rw-r--r--kernel/Documentation/x86/x86_64/boot-options.txt3
-rw-r--r--kernel/Documentation/x86/zero-page.txt3
7 files changed, 130 insertions, 25 deletions
diff --git a/kernel/Documentation/x86/boot.txt b/kernel/Documentation/x86/boot.txt
index 88b85899d..9da6f3512 100644
--- a/kernel/Documentation/x86/boot.txt
+++ b/kernel/Documentation/x86/boot.txt
@@ -406,7 +406,7 @@ Protocol: 2.00+
- If 0, the protected-mode code is loaded at 0x10000.
- If 1, the protected-mode code is loaded at 0x100000.
- Bit 1 (kernel internal): ALSR_FLAG
+ Bit 1 (kernel internal): KASLR_FLAG
- Used internally by the compressed kernel to communicate
KASLR status to kernel proper.
If 1, KASLR enabled.
@@ -1124,7 +1124,6 @@ The boot loader *must* fill out the following fields in bp,
o hdr.code32_start
o hdr.cmd_line_ptr
- o hdr.cmdline_size
o hdr.ramdisk_image (if applicable)
o hdr.ramdisk_size (if applicable)
diff --git a/kernel/Documentation/x86/entry_64.txt b/kernel/Documentation/x86/entry_64.txt
index 9132b8617..c1df8eba9 100644
--- a/kernel/Documentation/x86/entry_64.txt
+++ b/kernel/Documentation/x86/entry_64.txt
@@ -1,14 +1,14 @@
This file documents some of the kernel entries in
-arch/x86/kernel/entry_64.S. A lot of this explanation is adapted from
+arch/x86/entry/entry_64.S. A lot of this explanation is adapted from
an email from Ingo Molnar:
http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
The x86 architecture has quite a few different ways to jump into
kernel code. Most of these entry points are registered in
-arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
-for 64-bit, arch/x86/kernel/entry_32.S for 32-bit and finally
-arch/x86/ia32/ia32entry.S which implements the 32-bit compatibility
+arch/x86/kernel/traps.c and implemented in arch/x86/entry/entry_64.S
+for 64-bit, arch/x86/entry/entry_32.S for 32-bit and finally
+arch/x86/entry/entry_64_compat.S which implements the 32-bit compatibility
syscall entry points and thus provides for 32-bit processes the
ability to execute syscalls when running on 64-bit kernels.
@@ -18,10 +18,10 @@ Some of these entries are:
- system_call: syscall instruction from 64-bit code.
- - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+ - entry_INT80_compat: int 0x80 from 32-bit or 64-bit code; compat syscall
either way.
- - ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit
+ - entry_INT80_compat, ia32_sysenter: syscall and sysenter from 32-bit
code
- interrupt: An array of entries. Every IDT vector that doesn't
diff --git a/kernel/Documentation/x86/x86_64/kernel-stacks b/kernel/Documentation/x86/kernel-stacks
index e3c8a49d1..9a0aa4d3a 100644
--- a/kernel/Documentation/x86/x86_64/kernel-stacks
+++ b/kernel/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
Most of the text from Keith Owens, hacked by AK
x86_64 page size (PAGE_SIZE) is 4K.
@@ -13,7 +16,7 @@ associated with each CPU. These stacks are only used while the kernel
is in control on that CPU; when a CPU returns to user space the
specialized stacks contain no useful data. The main CPU stacks are:
-* Interrupt stack. IRQSTACKSIZE
+* Interrupt stack. IRQ_STACK_SIZE
Used for external hardware interrupts. If this is the first external
hardware interrupt (i.e. not a nested hardware interrupt) then the
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
The currently assigned IST stacks are :-
-* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
-
- Used for interrupt 12 - Stack Fault Exception (#SS).
-
- This allows the CPU to recover from invalid stack segments. Rarely
- happens.
-
* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
Used for interrupt 8 - Double Fault Exception (#DF).
@@ -99,3 +95,47 @@ The currently assigned IST stacks are :-
assumptions about the previous state of the kernel stack.
For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up, here's an indepth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+ values on the kernel stack, from earlier function calls. This is
+ the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+ up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+ the right order, and try to cross from one stack into another
+ reconstructing the call chain. This works most of the time.
diff --git a/kernel/Documentation/x86/mtrr.txt b/kernel/Documentation/x86/mtrr.txt
index cc071dc33..dc3e70391 100644
--- a/kernel/Documentation/x86/mtrr.txt
+++ b/kernel/Documentation/x86/mtrr.txt
@@ -1,7 +1,31 @@
MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing out MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Direct MTRR use by
+drivers on Linux is now completely phased out, device drivers should use
+arch_phys_wc_add() in combination with ioremap_wc() to make MTRR effective on
+non-PAT systems while a no-op but equally effective on PAT enabled systems.
+
+Even if Linux does not use MTRRs directly, some x86 platform firmware may still
+set up MTRRs early before booting the OS. They do this as some platform
+firmware may still have implemented access to MTRRs which would be controlled
+and handled by the platform firmware directly. An example of platform use of
+MTRRs is through the use of SMI handlers, one case could be for fan control,
+the platform code would need uncachable access to some of its fan control
+registers. Such platform access does not need any Operating System MTRR code in
+place other than mtrr_type_lookup() to ensure any OS specific mapping requests
+are aligned with platform MTRR setup. If MTRRs are only set up by the platform
+firmware code though and the OS does not make any specific MTRR mapping
+requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
On Intel P6 family processors (Pentium Pro, Pentium II and later)
the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/kernel/Documentation/x86/pat.txt b/kernel/Documentation/x86/pat.txt
index cf08c9fff..54944c71b 100644
--- a/kernel/Documentation/x86/pat.txt
+++ b/kernel/Documentation/x86/pat.txt
@@ -12,7 +12,7 @@ virtual addresses.
PAT allows for different types of memory attributes. The most commonly used
ones that will be supported at this time are Write-back, Uncached,
-Write-combined and Uncached Minus.
+Write-combined, Write-through and Uncached Minus.
PAT APIs
@@ -34,16 +34,23 @@ ioremap | -- | UC- | UC- |
| | | |
ioremap_cache | -- | WB | WB |
| | | |
+ioremap_uc | -- | UC | UC |
+ | | | |
ioremap_nocache | -- | UC- | UC- |
| | | |
ioremap_wc | -- | -- | WC |
| | | |
+ioremap_wt | -- | -- | WT |
+ | | | |
set_memory_uc | UC- | -- | -- |
set_memory_wb | | | |
| | | |
set_memory_wc | WC | -- | -- |
set_memory_wb | | | |
| | | |
+set_memory_wt | WT | -- | -- |
+ set_memory_wb | | | |
+ | | | |
pci sysfs resource | -- | -- | UC- |
| | | |
pci sysfs resource_wc | -- | -- | WC |
@@ -102,7 +109,38 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above and also track the usage of those pages and use set_memory_wb()
before the page is freed to free pool.
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
+is made, should already have been ioremapped with WC attributes or PAT entries,
+this can be done by using ioremap_wc() / set_memory_wc(). Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas. Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where otherwise MTRR write-combining would
+otherwise not be effective.
+
+----------------------------------------------------------------------
+MTRR Non-PAT PAT Linux ioremap value Effective memory type
+----------------------------------------------------------------------
+ Non-PAT | PAT
+ PAT
+ |PCD
+ ||PWT
+ |||
+WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
+WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
+WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
+WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
Notes:
@@ -115,8 +153,8 @@ can be more restrictive, in case of any existing aliasing for that address.
For example: If there is an existing uncached mapping, a new ioremap_wc can
return uncached mapping in place of write-combine requested.
-set_memory_[uc|wc] and set_memory_wb should be used in pairs, where driver will
-first make a region uc or wc and switch it back to wb after use.
+set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
+will first make a region uc, wc or wt and switch it back to wb after use.
Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
@@ -124,7 +162,7 @@ interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
types.
-Drivers should use set_memory_[uc|wc] to set access type for RAM ranges.
+Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
PAT debugging
diff --git a/kernel/Documentation/x86/x86_64/boot-options.txt b/kernel/Documentation/x86/x86_64/boot-options.txt
index 522347929..68ed3114c 100644
--- a/kernel/Documentation/x86/x86_64/boot-options.txt
+++ b/kernel/Documentation/x86/x86_64/boot-options.txt
@@ -31,6 +31,9 @@ Machine check
(e.g. BIOS or hardware monitoring applications), conflicting
with OS's error handling, and you cannot deactivate the agent,
then this option will be a help.
+ mce=no_lmce
+ Do not opt-in to Local MCE delivery. Use legacy method
+ to broadcast MCEs.
mce=bootlog
Enable logging of machine checks left over from booting.
Disabled by default on AMD because some BIOS leave bogus ones.
diff --git a/kernel/Documentation/x86/zero-page.txt b/kernel/Documentation/x86/zero-page.txt
index 82fbdbc1e..95a4d34af 100644
--- a/kernel/Documentation/x86/zero-page.txt
+++ b/kernel/Documentation/x86/zero-page.txt
@@ -17,7 +17,8 @@ Offset Proto Name Meaning
(struct ist_info)
080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!!
090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!!
-0A0/010 ALL sys_desc_table System description table (struct sys_desc_table)
+0A0/010 ALL sys_desc_table System description table (struct sys_desc_table),
+ OBSOLETE!!
0B0/010 ALL olpc_ofw_header OLPC's OpenFirmware CIF and friends
0C0/004 ALL ext_ramdisk_image ramdisk_image high 32bits
0C4/004 ALL ext_ramdisk_size ramdisk_size high 32bits