diff options
author | Yunhong Jiang <yunhong.jiang@intel.com> | 2015-08-04 12:17:53 -0700 |
---|---|---|
committer | Yunhong Jiang <yunhong.jiang@intel.com> | 2015-08-04 15:44:42 -0700 |
commit | 9ca8dbcc65cfc63d6f5ef3312a33184e1d726e00 (patch) | |
tree | 1c9cafbcd35f783a87880a10f85d1a060db1a563 /kernel/Documentation/arm64 | |
parent | 98260f3884f4a202f9ca5eabed40b1354c489b29 (diff) |
Add the rt linux 4.1.3-rt3 as base
Import the rt linux 4.1.3-rt3 as OPNFV kvm base.
It's from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt and
the base is:
commit 0917f823c59692d751951bf5ea699a2d1e2f26a2
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Sat Jul 25 12:13:34 2015 +0200
Prepare v4.1.3-rt3
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
We lose all the git history this way and it's not good. We
should apply another opnfv project repo in future.
Change-Id: I87543d81c9df70d99c5001fbdf646b202c19f423
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Diffstat (limited to 'kernel/Documentation/arm64')
-rw-r--r-- | kernel/Documentation/arm64/acpi_object_usage.txt | 593 | ||||
-rw-r--r-- | kernel/Documentation/arm64/arm-acpi.txt | 505 | ||||
-rw-r--r-- | kernel/Documentation/arm64/booting.txt | 224 | ||||
-rw-r--r-- | kernel/Documentation/arm64/legacy_instructions.txt | 57 | ||||
-rw-r--r-- | kernel/Documentation/arm64/memory.txt | 94 | ||||
-rw-r--r-- | kernel/Documentation/arm64/tagged-pointers.txt | 34 |
6 files changed, 1507 insertions, 0 deletions
diff --git a/kernel/Documentation/arm64/acpi_object_usage.txt b/kernel/Documentation/arm64/acpi_object_usage.txt new file mode 100644 index 000000000..a6e1a1805 --- /dev/null +++ b/kernel/Documentation/arm64/acpi_object_usage.txt @@ -0,0 +1,593 @@ +ACPI Tables +----------- +The expectations of individual ACPI tables are discussed in the list that +follows. + +If a section number is used, it refers to a section number in the ACPI +specification where the object is defined. If "Signature Reserved" is used, +the table signature (the first four bytes of the table) is the only portion +of the table recognized by the specification, and the actual table is defined +outside of the UEFI Forum (see Section 5.2.6 of the specification). + +For ACPI on arm64, tables also fall into the following categories: + + -- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT + + -- Recommended: BERT, EINJ, ERST, HEST, SSDT + + -- Optional: BGRT, CPEP, CSRT, DRTM, ECDT, FACS, FPDT, MCHI, MPST, + MSCT, RASF, SBST, SLIT, SPMI, SRAT, TCPA, TPM2, UEFI + + -- Not supported: BOOT, DBG2, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, + LPIT, MSDM, RSDT, SLIC, WAET, WDAT, WDRT, WPBT + + +Table Usage for ARMv8 Linux +----- ---------------------------------------------------------------- +BERT Section 18.3 (signature == "BERT") + == Boot Error Record Table == + Must be supplied if RAS support is provided by the platform. It + is recommended this table be supplied. + +BOOT Signature Reserved (signature == "BOOT") + == simple BOOT flag table == + Microsoft only table, will not be supported. + +BGRT Section 5.2.22 (signature == "BGRT") + == Boot Graphics Resource Table == + Optional, not currently supported, with no real use-case for an + ARM server. + +CPEP Section 5.2.18 (signature == "CPEP") + == Corrected Platform Error Polling table == + Optional, not currently supported, and not recommended until such + time as ARM-compatible hardware is available, and the specification + suitably modified. + +CSRT Signature Reserved (signature == "CSRT") + == Core System Resources Table == + Optional, not currently supported. + +DBG2 Signature Reserved (signature == "DBG2") + == DeBuG port table 2 == + Microsoft only table, will not be supported. + +DBGP Signature Reserved (signature == "DBGP") + == DeBuG Port table == + Microsoft only table, will not be supported. + +DSDT Section 5.2.11.1 (signature == "DSDT") + == Differentiated System Description Table == + A DSDT is required; see also SSDT. + + ACPI tables contain only one DSDT but can contain one or more SSDTs, + which are optional. Each SSDT can only add to the ACPI namespace, + but cannot modify or replace anything in the DSDT. + +DMAR Signature Reserved (signature == "DMAR") + == DMA Remapping table == + x86 only table, will not be supported. + +DRTM Signature Reserved (signature == "DRTM") + == Dynamic Root of Trust for Measurement table == + Optional, not currently supported. + +ECDT Section 5.2.16 (signature == "ECDT") + == Embedded Controller Description Table == + Optional, not currently supported, but could be used on ARM if and + only if one uses the GPE_BIT field to represent an IRQ number, since + there are no GPE blocks defined in hardware reduced mode. This would + need to be modified in the ACPI specification. + +EINJ Section 18.6 (signature == "EINJ") + == Error Injection table == + This table is very useful for testing platform response to error + conditions; it allows one to inject an error into the system as + if it had actually occurred. However, this table should not be + shipped with a production system; it should be dynamically loaded + and executed with the ACPICA tools only during testing. + +ERST Section 18.5 (signature == "ERST") + == Error Record Serialization Table == + On a platform supports RAS, this table must be supplied if it is not + UEFI-based; if it is UEFI-based, this table may be supplied. When this + table is not present, UEFI run time service will be utilized to save + and retrieve hardware error information to and from a persistent store. + +ETDT Signature Reserved (signature == "ETDT") + == Event Timer Description Table == + Obsolete table, will not be supported. + +FACS Section 5.2.10 (signature == "FACS") + == Firmware ACPI Control Structure == + It is unlikely that this table will be terribly useful. If it is + provided, the Global Lock will NOT be used since it is not part of + the hardware reduced profile, and only 64-bit address fields will + be considered valid. + +FADT Section 5.2.9 (signature == "FACP") + == Fixed ACPI Description Table == + Required for arm64. + + The HW_REDUCED_ACPI flag must be set. All of the fields that are + to be ignored when HW_REDUCED_ACPI is set are expected to be set to + zero. + + If an FACS table is provided, the X_FIRMWARE_CTRL field is to be + used, not FIRMWARE_CTRL. + + If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is + filled in properly -- that the PSCI_COMPLIANT flag is set and that + PSCI_USE_HVC is set or unset as needed (see table 5-37). + + For the DSDT that is also required, the X_DSDT field is to be used, + not the DSDT field. + +FPDT Section 5.2.23 (signature == "FPDT") + == Firmware Performance Data Table == + Optional, not currently supported. + +GTDT Section 5.2.24 (signature == "GTDT") + == Generic Timer Description Table == + Required for arm64. + +HEST Section 18.3.2 (signature == "HEST") + == Hardware Error Source Table == + Until further error source types are defined, use only types 6 (AER + Root Port), 7 (AER Endpoint), 8 (AER Bridge), or 9 (Generic Hardware + Error Source). Firmware first error handling is possible if and only + if Trusted Firmware is being used on arm64. + + Must be supplied if RAS support is provided by the platform. It + is recommended this table be supplied. + +HPET Signature Reserved (signature == "HPET") + == High Precision Event timer Table == + x86 only table, will not be supported. + +IBFT Signature Reserved (signature == "IBFT") + == iSCSI Boot Firmware Table == + Microsoft defined table, support TBD. + +IVRS Signature Reserved (signature == "IVRS") + == I/O Virtualization Reporting Structure == + x86_64 (AMD) only table, will not be supported. + +LPIT Signature Reserved (signature == "LPIT") + == Low Power Idle Table == + x86 only table as of ACPI 5.1; future versions have been adapted for + use with ARM and will be recommended in order to support ACPI power + management. + +MADT Section 5.2.12 (signature == "APIC") + == Multiple APIC Description Table == + Required for arm64. Only the GIC interrupt controller structures + should be used (types 0xA - 0xE). + +MCFG Signature Reserved (signature == "MCFG") + == Memory-mapped ConFiGuration space == + If the platform supports PCI/PCIe, an MCFG table is required. + +MCHI Signature Reserved (signature == "MCHI") + == Management Controller Host Interface table == + Optional, not currently supported. + +MPST Section 5.2.21 (signature == "MPST") + == Memory Power State Table == + Optional, not currently supported. + +MSDM Signature Reserved (signature == "MSDM") + == Microsoft Data Management table == + Microsoft only table, will not be supported. + +MSCT Section 5.2.19 (signature == "MSCT") + == Maximum System Characteristic Table == + Optional, not currently supported. + +RASF Section 5.2.20 (signature == "RASF") + == RAS Feature table == + Optional, not currently supported. + +RSDP Section 5.2.5 (signature == "RSD PTR") + == Root System Description PoinTeR == + Required for arm64. + +RSDT Section 5.2.7 (signature == "RSDT") + == Root System Description Table == + Since this table can only provide 32-bit addresses, it is deprecated + on arm64, and will not be used. + +SBST Section 5.2.14 (signature == "SBST") + == Smart Battery Subsystem Table == + Optional, not currently supported. + +SLIC Signature Reserved (signature == "SLIC") + == Software LIcensing table == + Microsoft only table, will not be supported. + +SLIT Section 5.2.17 (signature == "SLIT") + == System Locality distance Information Table == + Optional in general, but required for NUMA systems. + +SPCR Signature Reserved (signature == "SPCR") + == Serial Port Console Redirection table == + Required for arm64. + +SPMI Signature Reserved (signature == "SPMI") + == Server Platform Management Interface table == + Optional, not currently supported. + +SRAT Section 5.2.16 (signature == "SRAT") + == System Resource Affinity Table == + Optional, but if used, only the GICC Affinity structures are read. + To support NUMA, this table is required. + +SSDT Section 5.2.11.2 (signature == "SSDT") + == Secondary System Description Table == + These tables are a continuation of the DSDT; these are recommended + for use with devices that can be added to a running system, but can + also serve the purpose of dividing up device descriptions into more + manageable pieces. + + An SSDT can only ADD to the ACPI namespace. It cannot modify or + replace existing device descriptions already in the namespace. + + These tables are optional, however. ACPI tables should contain only + one DSDT but can contain many SSDTs. + +TCPA Signature Reserved (signature == "TCPA") + == Trusted Computing Platform Alliance table == + Optional, not currently supported, and may need changes to fully + interoperate with arm64. + +TPM2 Signature Reserved (signature == "TPM2") + == Trusted Platform Module 2 table == + Optional, not currently supported, and may need changes to fully + interoperate with arm64. + +UEFI Signature Reserved (signature == "UEFI") + == UEFI ACPI data table == + Optional, not currently supported. No known use case for arm64, + at present. + +WAET Signature Reserved (signature == "WAET") + == Windows ACPI Emulated devices Table == + Microsoft only table, will not be supported. + +WDAT Signature Reserved (signature == "WDAT") + == Watch Dog Action Table == + Microsoft only table, will not be supported. + +WDRT Signature Reserved (signature == "WDRT") + == Watch Dog Resource Table == + Microsoft only table, will not be supported. + +WPBT Signature Reserved (signature == "WPBT") + == Windows Platform Binary Table == + Microsoft only table, will not be supported. + +XSDT Section 5.2.8 (signature == "XSDT") + == eXtended System Description Table == + Required for arm64. + + +ACPI Objects +------------ +The expectations on individual ACPI objects are discussed in the list that +follows: + +Name Section Usage for ARMv8 Linux +---- ------------ ------------------------------------------------- +_ADR 6.1.1 Use as needed. + +_BBN 6.5.5 Use as needed; PCI-specific. + +_BDN 6.5.3 Optional; not likely to be used on arm64. + +_CCA 6.2.17 This method should be defined for all bus masters + on arm64. While cache coherency is assumed, making + it explicit ensures the kernel will set up DMA as + it should. + +_CDM 6.2.1 Optional, to be used only for processor devices. + +_CID 6.1.2 Use as needed. + +_CLS 6.1.3 Use as needed. + +_CRS 6.2.2 Required on arm64. + +_DCK 6.5.2 Optional; not likely to be used on arm64. + +_DDN 6.1.4 This field can be used for a device name. However, + it is meant for DOS device names (e.g., COM1), so be + careful of its use across OSes. + +_DEP 6.5.8 Use as needed. + +_DIS 6.2.3 Optional, for power management use. + +_DLM 5.7.5 Optional. + +_DMA 6.2.4 Optional. + +_DSD 6.2.5 To be used with caution. If this object is used, try + to use it within the constraints already defined by the + Device Properties UUID. Only in rare circumstances + should it be necessary to create a new _DSD UUID. + + In either case, submit the _DSD definition along with + any driver patches for discussion, especially when + device properties are used. A driver will not be + considered complete without a corresponding _DSD + description. Once approved by kernel maintainers, + the UUID or device properties must then be registered + with the UEFI Forum; this may cause some iteration as + more than one OS will be registering entries. + +_DSM Do not use this method. It is not standardized, the + return values are not well documented, and it is + currently a frequent source of error. + +_DSW 7.2.1 Use as needed; power management specific. + +_EDL 6.3.1 Optional. + +_EJD 6.3.2 Optional. + +_EJx 6.3.3 Optional. + +_FIX 6.2.7 x86 specific, not used on arm64. + +\_GL 5.7.1 This object is not to be used in hardware reduced + mode, and therefore should not be used on arm64. + +_GLK 6.5.7 This object requires a global lock be defined; there + is no global lock on arm64 since it runs in hardware + reduced mode. Hence, do not use this object on arm64. + +\_GPE 5.3.1 This namespace is for x86 use only. Do not use it + on arm64. + +_GSB 6.2.7 Optional. + +_HID 6.1.5 Use as needed. This is the primary object to use in + device probing, though _CID and _CLS may also be used. + +_HPP 6.2.8 Optional, PCI specific. + +_HPX 6.2.9 Optional, PCI specific. + +_HRV 6.1.6 Optional, use as needed to clarify device behavior; in + some cases, this may be easier to use than _DSD. + +_INI 6.5.1 Not required, but can be useful in setting up devices + when UEFI leaves them in a state that may not be what + the driver expects before it starts probing. + +_IRC 7.2.15 Use as needed; power management specific. + +_LCK 6.3.4 Optional. + +_MAT 6.2.10 Optional; see also the MADT. + +_MLS 6.1.7 Optional, but highly recommended for use in + internationalization. + +_OFF 7.1.2 It is recommended to define this method for any device + that can be turned on or off. + +_ON 7.1.3 It is recommended to define this method for any device + that can be turned on or off. + +\_OS 5.7.3 This method will return "Linux" by default (this is + the value of the macro ACPI_OS_NAME on Linux). The + command line parameter acpi_os=<string> can be used + to set it to some other value. + +_OSC 6.2.11 This method can be a global method in ACPI (i.e., + \_SB._OSC), or it may be associated with a specific + device (e.g., \_SB.DEV0._OSC), or both. When used + as a global method, only capabilities published in + the ACPI specification are allowed. When used as + a device-specific method, the process described for + using _DSD MUST be used to create an _OSC definition; + out-of-process use of _OSC is not allowed. That is, + submit the device-specific _OSC usage description as + part of the kernel driver submission, get it approved + by the kernel community, then register it with the + UEFI Forum. + +\_OSI 5.7.2 Deprecated on ARM64. Any invocation of this method + will print a warning on the console and return false. + That is, as far as ACPI firmware is concerned, _OSI + cannot be used to determine what sort of system is + being used or what functionality is provided. The + _OSC method is to be used instead. + +_OST 6.3.5 Optional. + +_PDC 8.4.1 Deprecated, do not use on arm64. + +\_PIC 5.8.1 The method should not be used. On arm64, the only + interrupt model available is GIC. + +_PLD 6.1.8 Optional. + +\_PR 5.3.1 This namespace is for x86 use only on legacy systems. + Do not use it on arm64. + +_PRS 6.2.12 Optional. + +_PRT 6.2.13 Required as part of the definition of all PCI root + devices. + +_PRW 7.2.13 Use as needed; power management specific. + +_PRx 7.2.8-11 Use as needed; power management specific. If _PR0 is + defined, _PR3 must also be defined. + +_PSC 7.2.6 Use as needed; power management specific. + +_PSE 7.2.7 Use as needed; power management specific. + +_PSW 7.2.14 Use as needed; power management specific. + +_PSx 7.2.2-5 Use as needed; power management specific. If _PS0 is + defined, _PS3 must also be defined. If clocks or + regulators need adjusting to be consistent with power + usage, change them in these methods. + +\_PTS 7.3.1 Use as needed; power management specific. + +_PXM 6.2.14 Optional. + +_REG 6.5.4 Use as needed. + +\_REV 5.7.4 Always returns the latest version of ACPI supported. + +_RMV 6.3.6 Optional. + +\_SB 5.3.1 Required on arm64; all devices must be defined in this + namespace. + +_SEG 6.5.6 Use as needed; PCI-specific. + +\_SI 5.3.1, Optional. + 9.1 + +_SLI 6.2.15 Optional; recommended when SLIT table is in use. + +_STA 6.3.7, It is recommended to define this method for any device + 7.1.4 that can be turned on or off. + +_SRS 6.2.16 Optional; see also _PRS. + +_STR 6.1.10 Recommended for conveying device names to end users; + this is preferred over using _DDN. + +_SUB 6.1.9 Use as needed; _HID or _CID are preferred. + +_SUN 6.1.11 Optional. + +\_Sx 7.3.2 Use as needed; power management specific. + +_SxD 7.2.16-19 Use as needed; power management specific. + +_SxW 7.2.20-24 Use as needed; power management specific. + +_SWS 7.3.3 Use as needed; power management specific; this may + require specification changes for use on arm64. + +\_TTS 7.3.4 Use as needed; power management specific. + +\_TZ 5.3.1 Optional. + +_UID 6.1.12 Recommended for distinguishing devices of the same + class; define it if at all possible. + +\_WAK 7.3.5 Use as needed; power management specific. + + +ACPI Event Model +---------------- +Do not use GPE block devices; these are not supported in the hardware reduced +profile used by arm64. Since there are no GPE blocks defined for use on ARM +platforms, GPIO-signaled interrupts should be used for creating system events. + + +ACPI Processor Control +---------------------- +Section 8 of the ACPI specification is currently undergoing change that +should be completed in the 6.0 version of the specification. Processor +performance control will be handled differently for arm64 at that point +in time. Processor aggregator devices (section 8.5) will not be used, +for example, but another similar mechanism instead. + +While UEFI constrains what we can say until the release of 6.0, it is +recommended that CPPC (8.4.5) be used as the primary model. This will +still be useful into the future. C-states and P-states will still be +provided, but most of the current design work appears to favor CPPC. + +Further, it is essential that the ARMv8 SoC provide a fully functional +implementation of PSCI; this will be the only mechanism supported by ACPI +to control CPU power state (including secondary CPU booting). + +More details will be provided on the release of the ACPI 6.0 specification. + + +ACPI System Address Map Interfaces +---------------------------------- +In Section 15 of the ACPI specification, several methods are mentioned as +possible mechanisms for conveying memory resource information to the kernel. +For arm64, we will only support UEFI for booting with ACPI, hence the UEFI +GetMemoryMap() boot service is the only mechanism that will be used. + + +ACPI Platform Error Interfaces (APEI) +------------------------------------- +The APEI tables supported are described above. + +APEI requires the equivalent of an SCI and an NMI on ARMv8. The SCI is used +to notify the OSPM of errors that have occurred but can be corrected and the +system can continue correct operation, even if possibly degraded. The NMI is +used to indicate fatal errors that cannot be corrected, and require immediate +attention. + +Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles +these slightly differently. The SCI is handled as a normal GPIO-signaled +interrupt; given that these are corrected (or correctable) errors being +reported, this is sufficient. The NMI is emulated as the highest priority +GPIO-signaled interrupt possible. This implies some caution must be used +since there could be interrupts at higher privilege levels or even interrupts +at the same priority as the emulated NMI. In Linux, this should not be the +case but one should be aware it could happen. + + +ACPI Objects Not Supported on ARM64 +----------------------------------- +While this may change in the future, there are several classes of objects +that can be defined, but are not currently of general interest to ARM servers. + +These are not supported: + + -- Section 9.2: ambient light sensor devices + + -- Section 9.3: battery devices + + -- Section 9.4: lids (e.g., laptop lids) + + -- Section 9.8.2: IDE controllers + + -- Section 9.9: floppy controllers + + -- Section 9.10: GPE block devices + + -- Section 9.15: PC/AT RTC/CMOS devices + + -- Section 9.16: user presence detection devices + + -- Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT + + -- Section 9.18: time and alarm devices (see 9.15) + + +ACPI Objects Not Yet Implemented +-------------------------------- +While these objects have x86 equivalents, and they do make some sense in ARM +servers, there is either no hardware available at present, or in some cases +there may not yet be a non-ARM implementation. Hence, they are currently not +implemented though that may change in the future. + +Not yet implemented are: + + -- Section 10: power source and power meter devices + + -- Section 11: thermal management + + -- Section 12: embedded controllers interface + + -- Section 13: SMBus interfaces + + -- Section 17: NUMA support (prototypes have been submitted for + review) diff --git a/kernel/Documentation/arm64/arm-acpi.txt b/kernel/Documentation/arm64/arm-acpi.txt new file mode 100644 index 000000000..570a4f8e1 --- /dev/null +++ b/kernel/Documentation/arm64/arm-acpi.txt @@ -0,0 +1,505 @@ +ACPI on ARMv8 Servers +--------------------- +ACPI can be used for ARMv8 general purpose servers designed to follow +the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server +Base Boot Requirements) [1] specifications. Please note that the SBBR +can be retrieved simply by visiting [1], but the SBSA is currently only +available to those with an ARM login due to ARM IP licensing concerns. + +The ARMv8 kernel implements the reduced hardware model of ACPI version +5.1 or later. Links to the specification and all external documents +it refers to are managed by the UEFI Forum. The specification is +available at http://www.uefi.org/specifications and documents referenced +by the specification can be found via http://www.uefi.org/acpi. + +If an ARMv8 system does not meet the requirements of the SBSA and SBBR, +or cannot be described using the mechanisms defined in the required ACPI +specifications, then ACPI may not be a good fit for the hardware. + +While the documents mentioned above set out the requirements for building +industry-standard ARMv8 servers, they also apply to more than one operating +system. The purpose of this document is to describe the interaction between +ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of +ACPI and what ACPI can expect of Linux. + + +Why ACPI on ARM? +---------------- +Before examining the details of the interface between ACPI and Linux, it is +useful to understand why ACPI is being used. Several technologies already +exist in Linux for describing non-enumerable hardware, after all. In this +section we summarize a blog post [2] from Grant Likely that outlines the +reasoning behind ACPI on ARMv8 servers. Actually, we snitch a good portion +of the summary text almost directly, to be honest. + +The short form of the rationale for ACPI on ARM is: + +-- ACPI’s bytecode (AML) allows the platform to encode hardware behavior, + while DT explicitly does not support this. For hardware vendors, being + able to encode behavior is a key tool used in supporting operating + system releases on new hardware. + +-- ACPI’s OSPM defines a power management model that constrains what the + platform is allowed to do into a specific model, while still providing + flexibility in hardware design. + +-- In the enterprise server environment, ACPI has established bindings (such + as for RAS) which are currently used in production systems. DT does not. + Such bindings could be defined in DT at some point, but doing so means ARM + and x86 would end up using completely different code paths in both firmware + and the kernel. + +-- Choosing a single interface to describe the abstraction between a platform + and an OS is important. Hardware vendors would not be required to implement + both DT and ACPI if they want to support multiple operating systems. And, + agreeing on a single interface instead of being fragmented into per OS + interfaces makes for better interoperability overall. + +-- The new ACPI governance process works well and Linux is now at the same + table as hardware vendors and other OS vendors. In fact, there is no + longer any reason to feel that ACPI is only belongs to Windows or that + Linux is in any way secondary to Microsoft in this arena. The move of + ACPI governance into the UEFI forum has significantly opened up the + specification development process, and currently, a large portion of the + changes being made to ACPI is being driven by Linux. + +Key to the use of ACPI is the support model. For servers in general, the +responsibility for hardware behaviour cannot solely be the domain of the +kernel, but rather must be split between the platform and the kernel, in +order to allow for orderly change over time. ACPI frees the OS from needing +to understand all the minute details of the hardware so that the OS doesn’t +need to be ported to each and every device individually. It allows the +hardware vendors to take responsibility for power management behaviour without +depending on an OS release cycle which is not under their control. + +ACPI is also important because hardware and OS vendors have already worked +out the mechanisms for supporting a general purpose computing ecosystem. The +infrastructure is in place, the bindings are in place, and the processes are +in place. DT does exactly what Linux needs it to when working with vertically +integrated devices, but there are no good processes for supporting what the +server vendors need. Linux could potentially get there with DT, but doing so +really just duplicates something that already works. ACPI already does what +the hardware vendors need, Microsoft won’t collaborate on DT, and hardware +vendors would still end up providing two completely separate firmware +interfaces -- one for Linux and one for Windows. + + +Kernel Compatibility +-------------------- +One of the primary motivations for ACPI is standardization, and using that +to provide backward compatibility for Linux kernels. In the server market, +software and hardware are often used for long periods. ACPI allows the +kernel and firmware to agree on a consistent abstraction that can be +maintained over time, even as hardware or software change. As long as the +abstraction is supported, systems can be updated without necessarily having +to replace the kernel. + +When a Linux driver or subsystem is first implemented using ACPI, it by +definition ends up requiring a specific version of the ACPI specification +-- it's baseline. ACPI firmware must continue to work, even though it may +not be optimal, with the earliest kernel version that first provides support +for that baseline version of ACPI. There may be a need for additional drivers, +but adding new functionality (e.g., CPU power management) should not break +older kernel versions. Further, ACPI firmware must also work with the most +recent version of the kernel. + + +Relationship with Device Tree +----------------------------- +ACPI support in drivers and subsystems for ARMv8 should never be mutually +exclusive with DT support at compile time. + +At boot time the kernel will only use one description method depending on +parameters passed from the bootloader (including kernel bootargs). + +Regardless of whether DT or ACPI is used, the kernel must always be capable +of booting with either scheme (in kernels with both schemes enabled at compile +time). + + +Booting using ACPI tables +------------------------- +The only defined method for passing ACPI tables to the kernel on ARMv8 +is via the UEFI system configuration table. Just so it is explicit, this +means that ACPI is only supported on platforms that boot via UEFI. + +When an ARMv8 system boots, it can either have DT information, ACPI tables, +or in some very unusual cases, both. If no command line parameters are used, +the kernel will try to use DT for device enumeration; if there is no DT +present, the kernel will try to use ACPI tables, but only if they are present. +In neither is available, the kernel will not boot. If acpi=force is used +on the command line, the kernel will attempt to use ACPI tables first, but +fall back to DT if there are no ACPI tables present. The basic idea is that +the kernel will not fail to boot unless it absolutely has no other choice. + +Processing of ACPI tables may be disabled by passing acpi=off on the kernel +command line; this is the default behavior. + +In order for the kernel to load and use ACPI tables, the UEFI implementation +MUST set the ACPI_20_TABLE_GUID to point to the RSDP table (the table with +the ACPI signature "RSD PTR "). If this pointer is incorrect and acpi=force +is used, the kernel will disable ACPI and try to use DT to boot instead; the +kernel has, in effect, determined that ACPI tables are not present at that +point. + +If the pointer to the RSDP table is correct, the table will be mapped into +the kernel by the ACPI core, using the address provided by UEFI. + +The ACPI core will then locate and map in all other ACPI tables provided by +using the addresses in the RSDP table to find the XSDT (eXtended System +Description Table). The XSDT in turn provides the addresses to all other +ACPI tables provided by the system firmware; the ACPI core will then traverse +this table and map in the tables listed. + +The ACPI core will ignore any provided RSDT (Root System Description Table). +RSDTs have been deprecated and are ignored on arm64 since they only allow +for 32-bit addresses. + +Further, the ACPI core will only use the 64-bit address fields in the FADT +(Fixed ACPI Description Table). Any 32-bit address fields in the FADT will +be ignored on arm64. + +Hardware reduced mode (see Section 4.1 of the ACPI 5.1 specification) will +be enforced by the ACPI core on arm64. Doing so allows the ACPI core to +run less complex code since it no longer has to provide support for legacy +hardware from other architectures. Any fields that are not to be used for +hardware reduced mode must be set to zero. + +For the ACPI core to operate properly, and in turn provide the information +the kernel needs to configure devices, it expects to find the following +tables (all section numbers refer to the ACPI 5.1 specfication): + + -- RSDP (Root System Description Pointer), section 5.2.5 + + -- XSDT (eXtended System Description Table), section 5.2.8 + + -- FADT (Fixed ACPI Description Table), section 5.2.9 + + -- DSDT (Differentiated System Description Table), section + 5.2.11.1 + + -- MADT (Multiple APIC Description Table), section 5.2.12 + + -- GTDT (Generic Timer Description Table), section 5.2.24 + + -- If PCI is supported, the MCFG (Memory mapped ConFiGuration + Table), section 5.2.6, specifically Table 5-31. + +If the above tables are not all present, the kernel may or may not be +able to boot properly since it may not be able to configure all of the +devices available. + + +ACPI Detection +-------------- +Drivers should determine their probe() type by checking for a null +value for ACPI_HANDLE, or checking .of_node, or other information in +the device structure. This is detailed further in the "Driver +Recommendations" section. + +In non-driver code, if the presence of ACPI needs to be detected at +runtime, then check the value of acpi_disabled. If CONFIG_ACPI is not +set, acpi_disabled will always be 1. + + +Device Enumeration +------------------ +Device descriptions in ACPI should use standard recognized ACPI interfaces. +These may contain less information than is typically provided via a Device +Tree description for the same device. This is also one of the reasons that +ACPI can be useful -- the driver takes into account that it may have less +detailed information about the device and uses sensible defaults instead. +If done properly in the driver, the hardware can change and improve over +time without the driver having to change at all. + +Clocks provide an excellent example. In DT, clocks need to be specified +and the drivers need to take them into account. In ACPI, the assumption +is that UEFI will leave the device in a reasonable default state, including +any clock settings. If for some reason the driver needs to change a clock +value, this can be done in an ACPI method; all the driver needs to do is +invoke the method and not concern itself with what the method needs to do +to change the clock. Changing the hardware can then take place over time +by changing what the ACPI method does, and not the driver. + +In DT, the parameters needed by the driver to set up clocks as in the example +above are known as "bindings"; in ACPI, these are known as "Device Properties" +and provided to a driver via the _DSD object. + +ACPI tables are described with a formal language called ASL, the ACPI +Source Language (section 19 of the specification). This means that there +are always multiple ways to describe the same thing -- including device +properties. For example, device properties could use an ASL construct +that looks like this: Name(KEY0, "value0"). An ACPI device driver would +then retrieve the value of the property by evaluating the KEY0 object. +However, using Name() this way has multiple problems: (1) ACPI limits +names ("KEY0") to four characters unlike DT; (2) there is no industry +wide registry that maintains a list of names, minimzing re-use; (3) +there is also no registry for the definition of property values ("value0"), +again making re-use difficult; and (4) how does one maintain backward +compatibility as new hardware comes out? The _DSD method was created +to solve precisely these sorts of problems; Linux drivers should ALWAYS +use the _DSD method for device properties and nothing else. + +The _DSM object (ACPI Section 9.14.1) could also be used for conveying +device properties to a driver. Linux drivers should only expect it to +be used if _DSD cannot represent the data required, and there is no way +to create a new UUID for the _DSD object. Note that there is even less +regulation of the use of _DSM than there is of _DSD. Drivers that depend +on the contents of _DSM objects will be more difficult to maintain over +time because of this; as of this writing, the use of _DSM is the cause +of quite a few firmware problems and is not recommended. + +Drivers should look for device properties in the _DSD object ONLY; the _DSD +object is described in the ACPI specification section 6.2.5, but this only +describes how to define the structure of an object returned via _DSD, and +how specific data structures are defined by specific UUIDs. Linux should +only use the _DSD Device Properties UUID [5]: + + -- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 + + -- http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf + +The UEFI Forum provides a mechanism for registering device properties [4] +so that they may be used across all operating systems supporting ACPI. +Device properties that have not been registered with the UEFI Forum should +not be used. + +Before creating new device properties, check to be sure that they have not +been defined before and either registered in the Linux kernel documentation +as DT bindings, or the UEFI Forum as device properties. While we do not want +to simply move all DT bindings into ACPI device properties, we can learn from +what has been previously defined. + +If it is necessary to define a new device property, or if it makes sense to +synthesize the definition of a binding so it can be used in any firmware, +both DT bindings and ACPI device properties for device drivers have review +processes. Use them both. When the driver itself is submitted for review +to the Linux mailing lists, the device property definitions needed must be +submitted at the same time. A driver that supports ACPI and uses device +properties will not be considered complete without their definitions. Once +the device property has been accepted by the Linux community, it must be +registered with the UEFI Forum [4], which will review it again for consistency +within the registry. This may require iteration. The UEFI Forum, though, +will always be the canonical site for device property definitions. + +It may make sense to provide notice to the UEFI Forum that there is the +intent to register a previously unused device property name as a means of +reserving the name for later use. Other operating system vendors will +also be submitting registration requests and this may help smooth the +process. + +Once registration and review have been completed, the kernel provides an +interface for looking up device properties in a manner independent of +whether DT or ACPI is being used. This API should be used [6]; it can +eliminate some duplication of code paths in driver probing functions and +discourage divergence between DT bindings and ACPI device properties. + + +Programmable Power Control Resources +------------------------------------ +Programmable power control resources include such resources as voltage/current +providers (regulators) and clock sources. + +With ACPI, the kernel clock and regulator framework is not expected to be used +at all. + +The kernel assumes that power control of these resources is represented with +Power Resource Objects (ACPI section 7.1). The ACPI core will then handle +correctly enabling and disabling resources as they are needed. In order to +get that to work, ACPI assumes each device has defined D-states and that these +can be controlled through the optional ACPI methods _PS0, _PS1, _PS2, and _PS3; +in ACPI, _PS0 is the method to invoke to turn a device full on, and _PS3 is for +turning a device full off. + +There are two options for using those Power Resources. They can: + + -- be managed in a _PSx method which gets called on entry to power + state Dx. + + -- be declared separately as power resources with their own _ON and _OFF + methods. They are then tied back to D-states for a particular device + via _PRx which specifies which power resources a device needs to be on + while in Dx. Kernel then tracks number of devices using a power resource + and calls _ON/_OFF as needed. + +The kernel ACPI code will also assume that the _PSx methods follow the normal +ACPI rules for such methods: + + -- If either _PS0 or _PS3 is implemented, then the other method must also + be implemented. + + -- If a device requires usage or setup of a power resource when on, the ASL + should organize that it is allocated/enabled using the _PS0 method. + + -- Resources allocated or enabled in the _PS0 method should be disabled + or de-allocated in the _PS3 method. + + -- Firmware will leave the resources in a reasonable state before handing + over control to the kernel. + +Such code in _PSx methods will of course be very platform specific. But, +this allows the driver to abstract out the interface for operating the device +and avoid having to read special non-standard values from ACPI tables. Further, +abstracting the use of these resources allows the hardware to change over time +without requiring updates to the driver. + + +Clocks +------ +ACPI makes the assumption that clocks are initialized by the firmware -- +UEFI, in this case -- to some working value before control is handed over +to the kernel. This has implications for devices such as UARTs, or SoC-driven +LCD displays, for example. + +When the kernel boots, the clocks are assumed to be set to reasonable +working values. If for some reason the frequency needs to change -- e.g., +throttling for power management -- the device driver should expect that +process to be abstracted out into some ACPI method that can be invoked +(please see the ACPI specification for further recommendations on standard +methods to be expected). The only exceptions to this are CPU clocks where +CPPC provides a much richer interface than ACPI methods. If the clocks +are not set, there is no direct way for Linux to control them. + +If an SoC vendor wants to provide fine-grained control of the system clocks, +they could do so by providing ACPI methods that could be invoked by Linux +drivers. However, this is NOT recommended and Linux drivers should NOT use +such methods, even if they are provided. Such methods are not currently +standardized in the ACPI specification, and using them could tie a kernel +to a very specific SoC, or tie an SoC to a very specific version of the +kernel, both of which we are trying to avoid. + + +Driver Recommendations +---------------------- +DO NOT remove any DT handling when adding ACPI support for a driver. The +same device may be used on many different systems. + +DO try to structure the driver so that it is data-driven. That is, set up +a struct containing internal per-device state based on defaults and whatever +else must be discovered by the driver probe function. Then, have the rest +of the driver operate off of the contents of that struct. Doing so should +allow most divergence between ACPI and DT functionality to be kept local to +the probe function instead of being scattered throughout the driver. For +example: + +static int device_probe_dt(struct platform_device *pdev) +{ + /* DT specific functionality */ + ... +} + +static int device_probe_acpi(struct platform_device *pdev) +{ + /* ACPI specific functionality */ + ... +} + +static int device_probe(struct platform_device *pdev) +{ + ... + struct device_node node = pdev->dev.of_node; + ... + + if (node) + ret = device_probe_dt(pdev); + else if (ACPI_HANDLE(&pdev->dev)) + ret = device_probe_acpi(pdev); + else + /* other initialization */ + ... + /* Continue with any generic probe operations */ + ... +} + +DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it +clear the different names the driver is probed for, both from DT and from +ACPI: + +static struct of_device_id virtio_mmio_match[] = { + { .compatible = "virtio,mmio", }, + { } +}; +MODULE_DEVICE_TABLE(of, virtio_mmio_match); + +static const struct acpi_device_id virtio_mmio_acpi_match[] = { + { "LNRO0005", }, + { } +}; +MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); + + +ASWG +---- +The ACPI specification changes regularly. During the year 2014, for instance, +version 5.1 was released and version 6.0 substantially completed, with most of +the changes being driven by ARM-specific requirements. Proposed changes are +presented and discussed in the ASWG (ACPI Specification Working Group) which +is a part of the UEFI Forum. + +Participation in this group is open to all UEFI members. Please see +http://www.uefi.org/workinggroup for details on group membership. + +It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification +as closely as possible, and to only implement functionality that complies with +the released standards from UEFI ASWG. As a practical matter, there will be +vendors that provide bad ACPI tables or violate the standards in some way. +If this is because of errors, quirks and fixups may be necessary, but will +be avoided if possible. If there are features missing from ACPI that preclude +it from being used on a platform, ECRs (Engineering Change Requests) should be +submitted to ASWG and go through the normal approval process; for those that +are not UEFI members, many other members of the Linux community are and would +likely be willing to assist in submitting ECRs. + + +Linux Code +---------- +Individual items specific to Linux on ARM, contained in the the Linux +source code, are in the list that follows: + +ACPI_OS_NAME This macro defines the string to be returned when + an ACPI method invokes the _OS method. On ARM64 + systems, this macro will be "Linux" by default. + The command line parameter acpi_os=<string> + can be used to set it to some other value. The + default value for other architectures is "Microsoft + Windows NT", for example. + +ACPI Objects +------------ +Detailed expectations for ACPI tables and object are listed in the file +Documentation/arm64/acpi_object_usage.txt. + + +References +---------- +[0] http://silver.arm.com -- document ARM-DEN-0029, or newer + "Server Base System Architecture", version 2.3, dated 27 Mar 2014 + +[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf + Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System + Software on ARM Platforms", dated 16 Aug 2014 + +[2] http://www.secretlab.ca/archives/151, 10 Jan 2015, Copyright (c) 2015, + Linaro Ltd., written by Grant Likely. A copy of the verbatim text (apart + from formatting) is also in Documentation/arm64/why_use_acpi.txt. + +[3] AMD ACPI for Seattle platform documentation: + http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf + +[4] http://www.uefi.org/acpi -- please see the link for the "ACPI _DSD Device + Property Registry Instructions" + +[5] http://www.uefi.org/acpi -- please see the link for the "_DSD (Device + Specific Data) Implementation Guide" + +[6] Kernel code for the unified device property interface can be found in + include/linux/property.h and drivers/base/property.c. + + +Authors +------- +Al Stone <al.stone@linaro.org> +Graeme Gregory <graeme.gregory@linaro.org> +Hanjun Guo <hanjun.guo@linaro.org> + +Grant Likely <grant.likely@linaro.org>, for the "Why ACPI on ARM?" section diff --git a/kernel/Documentation/arm64/booting.txt b/kernel/Documentation/arm64/booting.txt new file mode 100644 index 000000000..f3c05b5f9 --- /dev/null +++ b/kernel/Documentation/arm64/booting.txt @@ -0,0 +1,224 @@ + Booting AArch64 Linux + ===================== + +Author: Will Deacon <will.deacon@arm.com> +Date : 07 September 2012 + +This document is based on the ARM booting document by Russell King and +is relevant to all public releases of the AArch64 Linux kernel. + +The AArch64 exception model is made up of a number of exception levels +(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure +counterpart. EL2 is the hypervisor level and exists only in non-secure +mode. EL3 is the highest priority level and exists only in secure mode. + +For the purposes of this document, we will use the term `boot loader' +simply to define all software that executes on the CPU(s) before control +is passed to the Linux kernel. This may include secure monitor and +hypervisor code, or it may just be a handful of instructions for +preparing a minimal boot environment. + +Essentially, the boot loader should provide (as a minimum) the +following: + +1. Setup and initialise the RAM +2. Setup the device tree +3. Decompress the kernel image +4. Call the kernel image + + +1. Setup and initialise RAM +--------------------------- + +Requirement: MANDATORY + +The boot loader is expected to find and initialise all RAM that the +kernel will use for volatile data storage in the system. It performs +this in a machine dependent manner. (It may use internal algorithms +to automatically locate and size all RAM, or it may use knowledge of +the RAM in the machine, or any other method the boot loader designer +sees fit.) + + +2. Setup the device tree +------------------------- + +Requirement: MANDATORY + +The device tree blob (dtb) must be placed on an 8-byte boundary within +the first 512 megabytes from the start of the kernel image and must not +cross a 2-megabyte boundary. This is to allow the kernel to map the +blob using a single section mapping in the initial page tables. + + +3. Decompress the kernel image +------------------------------ + +Requirement: OPTIONAL + +The AArch64 kernel does not currently provide a decompressor and +therefore requires decompression (gzip etc.) to be performed by the boot +loader if a compressed Image target (e.g. Image.gz) is used. For +bootloaders that do not implement this requirement, the uncompressed +Image target is available instead. + + +4. Call the kernel image +------------------------ + +Requirement: MANDATORY + +The decompressed kernel image contains a 64-byte header as follows: + + u32 code0; /* Executable code */ + u32 code1; /* Executable code */ + u64 text_offset; /* Image load offset, little endian */ + u64 image_size; /* Effective Image size, little endian */ + u64 flags; /* kernel flags, little endian */ + u64 res2 = 0; /* reserved */ + u64 res3 = 0; /* reserved */ + u64 res4 = 0; /* reserved */ + u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ + u32 res5; /* reserved (used for PE COFF offset) */ + + +Header notes: + +- As of v3.17, all fields are little endian unless stated otherwise. + +- code0/code1 are responsible for branching to stext. + +- when booting through EFI, code0/code1 are initially skipped. + res5 is an offset to the PE header and the PE header has the EFI + entry point (efi_stub_entry). When the stub has done its work, it + jumps to code0 to resume the normal boot process. + +- Prior to v3.17, the endianness of text_offset was not specified. In + these cases image_size is zero and text_offset is 0x80000 in the + endianness of the kernel. Where image_size is non-zero image_size is + little-endian and must be respected. Where image_size is zero, + text_offset can be assumed to be 0x80000. + +- The flags field (introduced in v3.17) is a little-endian 64-bit field + composed as follows: + Bit 0: Kernel endianness. 1 if BE, 0 if LE. + Bits 1-63: Reserved. + +- When image_size is zero, a bootloader should attempt to keep as much + memory as possible free for use by the kernel immediately after the + end of the kernel image. The amount of space required will vary + depending on selected features, and is effectively unbound. + +The Image must be placed text_offset bytes from a 2MB aligned base +address near the start of usable system RAM and called there. Memory +below that base address is currently unusable by Linux, and therefore it +is strongly recommended that this location is the start of system RAM. +At least image_size bytes from the start of the image must be free for +use by the kernel. + +Any memory described to the kernel (even that below the 2MB aligned base +address) which is not marked as reserved from the kernel e.g. with a +memreserve region in the device tree) will be considered as available to +the kernel. + +Before jumping into the kernel, the following conditions must be met: + +- Quiesce all DMA capable devices so that memory does not get + corrupted by bogus network packets or disk data. This will save + you many hours of debug. + +- Primary CPU general-purpose register settings + x0 = physical address of device tree blob (dtb) in system RAM. + x1 = 0 (reserved for future use) + x2 = 0 (reserved for future use) + x3 = 0 (reserved for future use) + +- CPU mode + All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, + IRQ and FIQ). + The CPU must be in either EL2 (RECOMMENDED in order to have access to + the virtualisation extensions) or non-secure EL1. + +- Caches, MMUs + The MMU must be off. + Instruction cache may be on or off. + The address range corresponding to the loaded kernel image must be + cleaned to the PoC. In the presence of a system cache or other + coherent masters with caches enabled, this will typically require + cache maintenance by VA rather than set/way operations. + System caches which respect the architected cache maintenance by VA + operations must be configured and may be enabled. + System caches which do not respect architected cache maintenance by VA + operations (not recommended) must be configured and disabled. + +- Architected timers + CNTFRQ must be programmed with the timer frequency and CNTVOFF must + be programmed with a consistent value on all CPUs. If entering the + kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where + available. + +- Coherency + All CPUs to be booted by the kernel must be part of the same coherency + domain on entry to the kernel. This may require IMPLEMENTATION DEFINED + initialisation to enable the receiving of maintenance operations on + each CPU. + +- System registers + All writable architected system registers at the exception level where + the kernel image will be entered must be initialised by software at a + higher exception level to prevent execution in an UNKNOWN state. + + For systems with a GICv3 interrupt controller: + - If EL3 is present: + ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1. + ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. + - If the kernel is entered at EL1: + ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1 + ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. + +The requirements described above for CPU mode, caches, MMUs, architected +timers, coherency and system registers apply to all CPUs. All CPUs must +enter the kernel in the same exception level. + +The boot loader is expected to enter the kernel on each CPU in the +following manner: + +- The primary CPU must jump directly to the first instruction of the + kernel image. The device tree blob passed by this CPU must contain + an 'enable-method' property for each cpu node. The supported + enable-methods are described below. + + It is expected that the bootloader will generate these device tree + properties and insert them into the blob prior to kernel entry. + +- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' + property in their cpu node. This property identifies a + naturally-aligned 64-bit zero-initalised memory location. + + These CPUs should spin outside of the kernel in a reserved area of + memory (communicated to the kernel by a /memreserve/ region in the + device tree) polling their cpu-release-addr location, which must be + contained in the reserved region. A wfe instruction may be inserted + to reduce the overhead of the busy-loop and a sev will be issued by + the primary CPU. When a read of the location pointed to by the + cpu-release-addr returns a non-zero value, the CPU must jump to this + value. The value will be written as a single 64-bit little-endian + value, so CPUs must convert the read value to their native endianness + before jumping to it. + +- CPUs with a "psci" enable method should remain outside of + the kernel (i.e. outside of the regions of memory described to the + kernel in the memory node, or in a reserved area of memory described + to the kernel by a /memreserve/ region in the device tree). The + kernel will issue CPU_ON calls as described in ARM document number ARM + DEN 0022A ("Power State Coordination Interface System Software on ARM + processors") to bring CPUs into the kernel. + + The device tree should contain a 'psci' node, as described in + Documentation/devicetree/bindings/arm/psci.txt. + +- Secondary CPU general-purpose register settings + x0 = 0 (reserved for future use) + x1 = 0 (reserved for future use) + x2 = 0 (reserved for future use) + x3 = 0 (reserved for future use) diff --git a/kernel/Documentation/arm64/legacy_instructions.txt b/kernel/Documentation/arm64/legacy_instructions.txt new file mode 100644 index 000000000..01bf3d9fa --- /dev/null +++ b/kernel/Documentation/arm64/legacy_instructions.txt @@ -0,0 +1,57 @@ +The arm64 port of the Linux kernel provides infrastructure to support +emulation of instructions which have been deprecated, or obsoleted in +the architecture. The infrastructure code uses undefined instruction +hooks to support emulation. Where available it also allows turning on +the instruction execution in hardware. + +The emulation mode can be controlled by writing to sysctl nodes +(/proc/sys/abi). The following explains the different execution +behaviours and the corresponding values of the sysctl nodes - + +* Undef + Value: 0 + Generates undefined instruction abort. Default for instructions that + have been obsoleted in the architecture, e.g., SWP + +* Emulate + Value: 1 + Uses software emulation. To aid migration of software, in this mode + usage of emulated instruction is traced as well as rate limited + warnings are issued. This is the default for deprecated + instructions, .e.g., CP15 barriers + +* Hardware Execution + Value: 2 + Although marked as deprecated, some implementations may support the + enabling/disabling of hardware support for the execution of these + instructions. Using hardware execution generally provides better + performance, but at the loss of ability to gather runtime statistics + about the use of the deprecated instructions. + +The default mode depends on the status of the instruction in the +architecture. Deprecated instructions should default to emulation +while obsolete instructions must be undefined by default. + +Note: Instruction emulation may not be possible in all cases. See +individual instruction notes for further information. + +Supported legacy instructions +----------------------------- +* SWP{B} +Node: /proc/sys/abi/swp +Status: Obsolete +Default: Undef (0) + +* CP15 Barriers +Node: /proc/sys/abi/cp15_barrier +Status: Deprecated +Default: Emulate (1) + +* SETEND +Node: /proc/sys/abi/setend +Status: Deprecated +Default: Emulate (1)* +Note: All the cpus on the system must have mixed endian support at EL0 +for this feature to be enabled. If a new CPU - which doesn't support mixed +endian - is hotplugged in after this feature has been enabled, there could +be unexpected results in the application. diff --git a/kernel/Documentation/arm64/memory.txt b/kernel/Documentation/arm64/memory.txt new file mode 100644 index 000000000..d7273a5f6 --- /dev/null +++ b/kernel/Documentation/arm64/memory.txt @@ -0,0 +1,94 @@ + Memory Layout on AArch64 Linux + ============================== + +Author: Catalin Marinas <catalin.marinas@arm.com> + +This document describes the virtual memory layout used by the AArch64 +Linux kernel. The architecture allows up to 4 levels of translation +tables with a 4KB page size and up to 3 levels with a 64KB page size. + +AArch64 Linux uses either 3 levels or 4 levels of translation tables +with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit +(256TB) virtual addresses, respectively, for both user and kernel. With +64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) +virtual address, are used but the memory layout is the same. + +User addresses have bits 63:48 set to 0 while the kernel addresses have +the same bits set to 1. TTBRx selection is given by bit 63 of the +virtual address. The swapper_pg_dir contains only kernel (global) +mappings while the user pgd contains only user (non-global) mappings. +The swapper_pg_dir address is written to TTBR1 and never written to +TTBR0. + + +AArch64 Linux memory layout with 4KB pages + 3 levels: + +Start End Size Use +----------------------------------------------------------------------- +0000000000000000 0000007fffffffff 512GB user +ffffff8000000000 ffffffffffffffff 512GB kernel + + +AArch64 Linux memory layout with 4KB pages + 4 levels: + +Start End Size Use +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB user +ffff000000000000 ffffffffffffffff 256TB kernel + + +AArch64 Linux memory layout with 64KB pages + 2 levels: + +Start End Size Use +----------------------------------------------------------------------- +0000000000000000 000003ffffffffff 4TB user +fffffc0000000000 ffffffffffffffff 4TB kernel + + +AArch64 Linux memory layout with 64KB pages + 3 levels: + +Start End Size Use +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB user +ffff000000000000 ffffffffffffffff 256TB kernel + + +For details of the virtual kernel memory layout please see the kernel +booting log. + + +Translation table lookup with 4KB pages: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] in-page offset + | | | | +-> [20:12] L3 index + | | | +-----------> [29:21] L2 index + | | +---------------------> [38:30] L1 index + | +-------------------------------> [47:39] L0 index + +-------------------------------------------------> [63] TTBR0/1 + + +Translation table lookup with 64KB pages: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] in-page offset + | | | +----------> [28:16] L3 index + | | +--------------------------> [41:29] L2 index + | +-------------------------------> [47:42] L1 index + +-------------------------------------------------> [63] TTBR0/1 + + +When using KVM, the hypervisor maps kernel pages in EL2, at a fixed +offset from the kernel VA (top 24bits of the kernel VA set to zero): + +Start End Size Use +----------------------------------------------------------------------- +0000004000000000 0000007fffffffff 256GB kernel objects mapped in HYP diff --git a/kernel/Documentation/arm64/tagged-pointers.txt b/kernel/Documentation/arm64/tagged-pointers.txt new file mode 100644 index 000000000..d9995f1f5 --- /dev/null +++ b/kernel/Documentation/arm64/tagged-pointers.txt @@ -0,0 +1,34 @@ + Tagged virtual addresses in AArch64 Linux + ========================================= + +Author: Will Deacon <will.deacon@arm.com> +Date : 12 June 2013 + +This document briefly describes the provision of tagged virtual +addresses in the AArch64 translation system and their potential uses +in AArch64 Linux. + +The kernel configures the translation tables so that translations made +via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of +the virtual address ignored by the translation hardware. This frees up +this byte for application use, with the following caveats: + + (1) The kernel requires that all user addresses passed to EL1 + are tagged with tag 0x00. This means that any syscall + parameters containing user virtual addresses *must* have + their top byte cleared before trapping to the kernel. + + (2) Non-zero tags are not preserved when delivering signals. + This means that signal handlers in applications making use + of tags cannot rely on the tag information for user virtual + addresses being maintained for fields inside siginfo_t. + One exception to this rule is for signals raised in response + to watchpoint debug exceptions, where the tag information + will be preserved. + + (3) Special care should be taken when using tagged pointers, + since it is likely that C compilers will not hazard two + virtual addresses differing only in the upper byte. + +The architecture prevents the use of a tagged PC, so the upper byte will +be set to a sign-extension of bit 55 on exception return. |