From 9ca8dbcc65cfc63d6f5ef3312a33184e1d726e00 Mon Sep 17 00:00:00 2001 From: Yunhong Jiang Date: Tue, 4 Aug 2015 12:17:53 -0700 Subject: Add the rt linux 4.1.3-rt3 as base Import the rt linux 4.1.3-rt3 as OPNFV kvm base. It's from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt and the base is: commit 0917f823c59692d751951bf5ea699a2d1e2f26a2 Author: Sebastian Andrzej Siewior Date: Sat Jul 25 12:13:34 2015 +0200 Prepare v4.1.3-rt3 Signed-off-by: Sebastian Andrzej Siewior We lose all the git history this way and it's not good. We should apply another opnfv project repo in future. Change-Id: I87543d81c9df70d99c5001fbdf646b202c19f423 Signed-off-by: Yunhong Jiang --- kernel/Documentation/cpu-freq/amd-powernow.txt | 38 +++ kernel/Documentation/cpu-freq/boost.txt | 93 ++++++++ kernel/Documentation/cpu-freq/core.txt | 123 ++++++++++ kernel/Documentation/cpu-freq/cpu-drivers.txt | 274 ++++++++++++++++++++++ kernel/Documentation/cpu-freq/cpufreq-nforce2.txt | 19 ++ kernel/Documentation/cpu-freq/cpufreq-stats.txt | 128 ++++++++++ kernel/Documentation/cpu-freq/governors.txt | 269 +++++++++++++++++++++ kernel/Documentation/cpu-freq/index.txt | 54 +++++ kernel/Documentation/cpu-freq/intel-pstate.txt | 64 +++++ kernel/Documentation/cpu-freq/pcc-cpufreq.txt | 207 ++++++++++++++++ kernel/Documentation/cpu-freq/user-guide.txt | 224 ++++++++++++++++++ 11 files changed, 1493 insertions(+) create mode 100644 kernel/Documentation/cpu-freq/amd-powernow.txt create mode 100644 kernel/Documentation/cpu-freq/boost.txt create mode 100644 kernel/Documentation/cpu-freq/core.txt create mode 100644 kernel/Documentation/cpu-freq/cpu-drivers.txt create mode 100644 kernel/Documentation/cpu-freq/cpufreq-nforce2.txt create mode 100644 kernel/Documentation/cpu-freq/cpufreq-stats.txt create mode 100644 kernel/Documentation/cpu-freq/governors.txt create mode 100644 kernel/Documentation/cpu-freq/index.txt create mode 100644 kernel/Documentation/cpu-freq/intel-pstate.txt create mode 100644 kernel/Documentation/cpu-freq/pcc-cpufreq.txt create mode 100644 kernel/Documentation/cpu-freq/user-guide.txt (limited to 'kernel/Documentation/cpu-freq') diff --git a/kernel/Documentation/cpu-freq/amd-powernow.txt b/kernel/Documentation/cpu-freq/amd-powernow.txt new file mode 100644 index 000000000..254da155f --- /dev/null +++ b/kernel/Documentation/cpu-freq/amd-powernow.txt @@ -0,0 +1,38 @@ + +PowerNow! and Cool'n'Quiet are AMD names for frequency +management capabilities in AMD processors. As the hardware +implementation changes in new generations of the processors, +there is a different cpu-freq driver for each generation. + +Note that the driver's will not load on the "wrong" hardware, +so it is safe to try each driver in turn when in doubt as to +which is the correct driver. + +Note that the functionality to change frequency (and voltage) +is not available in all processors. The drivers will refuse +to load on processors without this capability. The capability +is detected with the cpuid instruction. + +The drivers use BIOS supplied tables to obtain frequency and +voltage information appropriate for a particular platform. +Frequency transitions will be unavailable if the BIOS does +not supply these tables. + +6th Generation: powernow-k6 + +7th Generation: powernow-k7: Athlon, Duron, Geode. + +8th Generation: powernow-k8: Athlon, Athlon 64, Opteron, Sempron. +Documentation on this functionality in 8th generation processors +is available in the "BIOS and Kernel Developer's Guide", publication +26094, in chapter 9, available for download from www.amd.com. + +BIOS supplied data, for powernow-k7 and for powernow-k8, may be +from either the PSB table or from ACPI objects. The ACPI support +is only available if the kernel config sets CONFIG_ACPI_PROCESSOR. +The powernow-k8 driver will attempt to use ACPI if so configured, +and fall back to PST if that fails. +The powernow-k7 driver will try to use the PSB support first, and +fall back to ACPI if the PSB support fails. A module parameter, +acpi_force, is provided to force ACPI support to be used instead +of PSB support. diff --git a/kernel/Documentation/cpu-freq/boost.txt b/kernel/Documentation/cpu-freq/boost.txt new file mode 100644 index 000000000..dd62e1334 --- /dev/null +++ b/kernel/Documentation/cpu-freq/boost.txt @@ -0,0 +1,93 @@ +Processor boosting control + + - information for users - + +Quick guide for the impatient: +-------------------- +/sys/devices/system/cpu/cpufreq/boost +controls the boost setting for the whole system. You can read and write +that file with either "0" (boosting disabled) or "1" (boosting allowed). +Reading or writing 1 does not mean that the system is boosting at this +very moment, but only that the CPU _may_ raise the frequency at it's +discretion. +-------------------- + +Introduction +------------- +Some CPUs support a functionality to raise the operating frequency of +some cores in a multi-core package if certain conditions apply, mostly +if the whole chip is not fully utilized and below it's intended thermal +budget. The decision about boost disable/enable is made either at hardware +(e.g. x86) or software (e.g ARM). +On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core", +in technical documentation "Core performance boost". In Linux we use +the term "boost" for convenience. + +Rationale for disable switch +---------------------------- + +Though the idea is to just give better performance without any user +intervention, sometimes the need arises to disable this functionality. +Most systems offer a switch in the (BIOS) firmware to disable the +functionality at all, but a more fine-grained and dynamic control would +be desirable: +1. While running benchmarks, reproducible results are important. Since + the boosting functionality depends on the load of the whole package, + single thread performance can vary. By explicitly disabling the boost + functionality at least for the benchmark's run-time the system will run + at a fixed frequency and results are reproducible again. +2. To examine the impact of the boosting functionality it is helpful + to do tests with and without boosting. +3. Boosting means overclocking the processor, though under controlled + conditions. By raising the frequency and the voltage the processor + will consume more power than without the boosting, which may be + undesirable for instance for mobile users. Disabling boosting may + save power here, though this depends on the workload. + + +User controlled switch +---------------------- + +To allow the user to toggle the boosting functionality, the cpufreq core +driver exports a sysfs knob to enable or disable it. There is a file: +/sys/devices/system/cpu/cpufreq/boost +which can either read "0" (boosting disabled) or "1" (boosting enabled). +The file is exported only when cpufreq driver supports boosting. +Explicitly changing the permissions and writing to that file anyway will +return EINVAL. + +On supported CPUs one can write either a "0" or a "1" into this file. +This will either disable the boost functionality on all cores in the +whole system (0) or will allow the software or hardware to boost at will +(1). + +Writing a "1" does not explicitly boost the system, but just allows the +CPU to boost at their discretion. Some implementations take external +factors like the chip's temperature into account, so boosting once does +not necessarily mean that it will occur every time even using the exact +same software setup. + + +AMD legacy cpb switch +--------------------- +The AMD powernow-k8 driver used to support a very similar switch to +disable or enable the "Core Performance Boost" feature of some AMD CPUs. +This switch was instantiated in each CPU's cpufreq directory +(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb". +Though the per CPU existence hints at a more fine grained control, the +actual implementation only supported a system-global switch semantics, +which was simply reflected into each CPU's file. Writing a 0 or 1 into it +would pull the other CPUs to the same state. +For compatibility reasons this file and its behavior is still supported +on AMD CPUs, though it is now protected by a config switch +(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created, +even with the config option set. +This functionality is considered legacy and will be removed in some future +kernel version. + +More fine grained boosting control +---------------------------------- + +Technically it is possible to switch the boosting functionality at least +on a per package basis, for some CPUs even per core. Currently the driver +does not support it, but this may be implemented in the future. diff --git a/kernel/Documentation/cpu-freq/core.txt b/kernel/Documentation/cpu-freq/core.txt new file mode 100644 index 000000000..70933eadc --- /dev/null +++ b/kernel/Documentation/cpu-freq/core.txt @@ -0,0 +1,123 @@ + CPU frequency and voltage scaling code in the Linux(TM) kernel + + + L i n u x C P U F r e q + + C P U F r e q C o r e + + + Dominik Brodowski + David Kimdon + + + + Clock scaling allows you to change the clock speed of the CPUs on the + fly. This is a nice method to save battery power, because the lower + the clock speed, the less power the CPU consumes. + + +Contents: +--------- +1. CPUFreq core and interfaces +2. CPUFreq notifiers +3. CPUFreq Table Generation with Operating Performance Point (OPP) + +1. General Information +======================= + +The CPUFreq core code is located in drivers/cpufreq/cpufreq.c. This +cpufreq code offers a standardized interface for the CPUFreq +architecture drivers (those pieces of code that do actual +frequency transitions), as well as to "notifiers". These are device +drivers or other part of the kernel that need to be informed of +policy changes (ex. thermal modules like ACPI) or of all +frequency changes (ex. timing code) or even need to force certain +speed limits (like LCD drivers on ARM architecture). Additionally, the +kernel "constant" loops_per_jiffy is updated on frequency changes +here. + +Reference counting is done by cpufreq_get_cpu and cpufreq_put_cpu, +which make sure that the cpufreq processor driver is correctly +registered with the core, and will not be unloaded until +cpufreq_put_cpu is called. + +2. CPUFreq notifiers +==================== + +CPUFreq notifiers conform to the standard kernel notifier interface. +See linux/include/linux/notifier.h for details on notifiers. + +There are two different CPUFreq notifiers - policy notifiers and +transition notifiers. + + +2.1 CPUFreq policy notifiers +---------------------------- + +These are notified when a new policy is intended to be set. Each +CPUFreq policy notifier is called three times for a policy transition: + +1.) During CPUFREQ_ADJUST all CPUFreq notifiers may change the limit if + they see a need for this - may it be thermal considerations or + hardware limitations. + +2.) During CPUFREQ_INCOMPATIBLE only changes may be done in order to avoid + hardware failure. + +3.) And during CPUFREQ_NOTIFY all notifiers are informed of the new policy + - if two hardware drivers failed to agree on a new policy before this + stage, the incompatible hardware shall be shut down, and the user + informed of this. + +The phase is specified in the second argument to the notifier. + +The third argument, a void *pointer, points to a struct cpufreq_policy +consisting of five values: cpu, min, max, policy and max_cpu_freq. min +and max are the lower and upper frequencies (in kHz) of the new +policy, policy the new policy, cpu the number of the affected CPU; and +max_cpu_freq the maximum supported CPU frequency. This value is given +for informational purposes only. + + +2.2 CPUFreq transition notifiers +-------------------------------- + +These are notified twice when the CPUfreq driver switches the CPU core +frequency and this change has any external implications. + +The second argument specifies the phase - CPUFREQ_PRECHANGE or +CPUFREQ_POSTCHANGE. + +The third argument is a struct cpufreq_freqs with the following +values: +cpu - number of the affected CPU +old - old frequency +new - new frequency + +3. CPUFreq Table Generation with Operating Performance Point (OPP) +================================================================== +For details about OPP, see Documentation/power/opp.txt + +dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with + cpufreq_frequency_table_cpuinfo which is provided with the list of + frequencies that are available for operation. This function provides + a ready to use conversion routine to translate the OPP layer's internal + information about the available frequencies into a format readily + providable to cpufreq. + + WARNING: Do not use this function in interrupt context. + + Example: + soc_pm_init() + { + /* Do things */ + r = dev_pm_opp_init_cpufreq_table(dev, &freq_table); + if (!r) + cpufreq_frequency_table_cpuinfo(policy, freq_table); + /* Do other things */ + } + + NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in + addition to CONFIG_PM_OPP. + +dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table diff --git a/kernel/Documentation/cpu-freq/cpu-drivers.txt b/kernel/Documentation/cpu-freq/cpu-drivers.txt new file mode 100644 index 000000000..14f4e6336 --- /dev/null +++ b/kernel/Documentation/cpu-freq/cpu-drivers.txt @@ -0,0 +1,274 @@ + CPU frequency and voltage scaling code in the Linux(TM) kernel + + + L i n u x C P U F r e q + + C P U D r i v e r s + + - information for developers - + + + Dominik Brodowski + + + + Clock scaling allows you to change the clock speed of the CPUs on the + fly. This is a nice method to save battery power, because the lower + the clock speed, the less power the CPU consumes. + + +Contents: +--------- +1. What To Do? +1.1 Initialization +1.2 Per-CPU Initialization +1.3 verify +1.4 target/target_index or setpolicy? +1.5 target/target_index +1.6 setpolicy +1.7 get_intermediate and target_intermediate +2. Frequency Table Helpers + + + +1. What To Do? +============== + +So, you just got a brand-new CPU / chipset with datasheets and want to +add cpufreq support for this CPU / chipset? Great. Here are some hints +on what is necessary: + + +1.1 Initialization +------------------ + +First of all, in an __initcall level 7 (module_init()) or later +function check whether this kernel runs on the right CPU and the right +chipset. If so, register a struct cpufreq_driver with the CPUfreq core +using cpufreq_register_driver() + +What shall this struct cpufreq_driver contain? + +cpufreq_driver.name - The name of this driver. + +cpufreq_driver.init - A pointer to the per-CPU initialization + function. + +cpufreq_driver.verify - A pointer to a "verification" function. + +cpufreq_driver.setpolicy _or_ +cpufreq_driver.target/ +target_index - See below on the differences. + +And optionally + +cpufreq_driver.exit - A pointer to a per-CPU cleanup + function called during CPU_POST_DEAD + phase of cpu hotplug process. + +cpufreq_driver.stop_cpu - A pointer to a per-CPU stop function + called during CPU_DOWN_PREPARE phase of + cpu hotplug process. + +cpufreq_driver.resume - A pointer to a per-CPU resume function + which is called with interrupts disabled + and _before_ the pre-suspend frequency + and/or policy is restored by a call to + ->target/target_index or ->setpolicy. + +cpufreq_driver.attr - A pointer to a NULL-terminated list of + "struct freq_attr" which allow to + export values to sysfs. + +cpufreq_driver.get_intermediate +and target_intermediate Used to switch to stable frequency while + changing CPU frequency. + + +1.2 Per-CPU Initialization +-------------------------- + +Whenever a new CPU is registered with the device model, or after the +cpufreq driver registers itself, the per-CPU initialization function +cpufreq_driver.init is called. It takes a struct cpufreq_policy +*policy as argument. What to do now? + +If necessary, activate the CPUfreq support on your CPU. + +Then, the driver must fill in the following values: + +policy->cpuinfo.min_freq _and_ +policy->cpuinfo.max_freq - the minimum and maximum frequency + (in kHz) which is supported by + this CPU +policy->cpuinfo.transition_latency the time it takes on this CPU to + switch between two frequencies in + nanoseconds (if appropriate, else + specify CPUFREQ_ETERNAL) + +policy->cur The current operating frequency of + this CPU (if appropriate) +policy->min, +policy->max, +policy->policy and, if necessary, +policy->governor must contain the "default policy" for + this CPU. A few moments later, + cpufreq_driver.verify and either + cpufreq_driver.setpolicy or + cpufreq_driver.target/target_index is called + with these values. + +For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the +frequency table helpers might be helpful. See the section 2 for more information +on them. + +SMP systems normally have same clock source for a group of cpus. For these the +.init() would be called only once for the first online cpu. Here the .init() +routine must initialize policy->cpus with mask of all possible cpus (Online + +Offline) that share the clock. Then the core would copy this mask onto +policy->related_cpus and will reset policy->cpus to carry only online cpus. + + +1.3 verify +------------ + +When the user decides a new policy (consisting of +"policy,governor,min,max") shall be set, this policy must be validated +so that incompatible values can be corrected. For verifying these +values, a frequency table helper and/or the +cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned +int min_freq, unsigned int max_freq) function might be helpful. See +section 2 for details on frequency table helpers. + +You need to make sure that at least one valid frequency (or operating +range) is within policy->min and policy->max. If necessary, increase +policy->max first, and only if this is no solution, decrease policy->min. + + +1.4 target/target_index or setpolicy? +---------------------------- + +Most cpufreq drivers or even most cpu frequency scaling algorithms +only allow the CPU to be set to one frequency. For these, you use the +->target/target_index call. + +Some cpufreq-capable processors switch the frequency between certain +limits on their own. These shall use the ->setpolicy call + + +1.5. target/target_index +------------- + +The target_index call has two arguments: struct cpufreq_policy *policy, +and unsigned int index (into the exposed frequency table). + +The CPUfreq driver must set the new frequency when called here. The +actual frequency must be determined by freq_table[index].frequency. + +It should always restore to earlier frequency (i.e. policy->restore_freq) in +case of errors, even if we switched to intermediate frequency earlier. + +Deprecated: +---------- +The target call has three arguments: struct cpufreq_policy *policy, +unsigned int target_frequency, unsigned int relation. + +The CPUfreq driver must set the new frequency when called here. The +actual frequency must be determined using the following rules: + +- keep close to "target_freq" +- policy->min <= new_freq <= policy->max (THIS MUST BE VALID!!!) +- if relation==CPUFREQ_REL_L, try to select a new_freq higher than or equal + target_freq. ("L for lowest, but no lower than") +- if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal + target_freq. ("H for highest, but no higher than") + +Here again the frequency table helper might assist you - see section 2 +for details. + + +1.6 setpolicy +--------------- + +The setpolicy call only takes a struct cpufreq_policy *policy as +argument. You need to set the lower limit of the in-processor or +in-chipset dynamic frequency switching to policy->min, the upper limit +to policy->max, and -if supported- select a performance-oriented +setting when policy->policy is CPUFREQ_POLICY_PERFORMANCE, and a +powersaving-oriented setting when CPUFREQ_POLICY_POWERSAVE. Also check +the reference implementation in drivers/cpufreq/longrun.c + +1.7 get_intermediate and target_intermediate +-------------------------------------------- + +Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. + +get_intermediate should return a stable intermediate frequency platform wants to +switch to, and target_intermediate() should set CPU to to that frequency, before +jumping to the frequency corresponding to 'index'. Core will take care of +sending notifications and driver doesn't have to handle them in +target_intermediate() or target_index(). + +Drivers can return '0' from get_intermediate() in case they don't wish to switch +to intermediate frequency for some target frequency. In that case core will +directly call ->target_index(). + +NOTE: ->target_index() should restore to policy->restore_freq in case of +failures as core would send notifications for that. + + +2. Frequency Table Helpers +========================== + +As most cpufreq processors only allow for being set to a few specific +frequencies, a "frequency table" with some functions might assist in +some work of the processor driver. Such a "frequency table" consists +of an array of struct cpufreq_frequency_table entries, with any value in +"driver_data" you want to use, and the corresponding frequency in +"frequency". At the end of the table, you need to add a +cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END. And +if you want to skip one entry in the table, set the frequency to +CPUFREQ_ENTRY_INVALID. The entries don't need to be in ascending +order. + +By calling cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy, + struct cpufreq_frequency_table *table); +the cpuinfo.min_freq and cpuinfo.max_freq values are detected, and +policy->min and policy->max are set to the same values. This is +helpful for the per-CPU initialization stage. + +int cpufreq_frequency_table_verify(struct cpufreq_policy *policy, + struct cpufreq_frequency_table *table); +assures that at least one valid frequency is within policy->min and +policy->max, and all other criteria are met. This is helpful for the +->verify call. + +int cpufreq_frequency_table_target(struct cpufreq_policy *policy, + struct cpufreq_frequency_table *table, + unsigned int target_freq, + unsigned int relation, + unsigned int *index); + +is the corresponding frequency table helper for the ->target +stage. Just pass the values to this function, and the unsigned int +index returns the number of the frequency table entry which contains +the frequency the CPU shall be set to. + +The following macros can be used as iterators over cpufreq_frequency_table: + +cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency +table. + +cpufreq-for_each_valid_entry(pos, table) - iterates over all entries, +excluding CPUFREQ_ENTRY_INVALID frequencies. +Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and +"table" - the cpufreq_frequency_table * you want to iterate over. + +For example: + + struct cpufreq_frequency_table *pos, *driver_freq_table; + + cpufreq_for_each_entry(pos, driver_freq_table) { + /* Do something with pos */ + pos->frequency = ... + } diff --git a/kernel/Documentation/cpu-freq/cpufreq-nforce2.txt b/kernel/Documentation/cpu-freq/cpufreq-nforce2.txt new file mode 100644 index 000000000..babce1315 --- /dev/null +++ b/kernel/Documentation/cpu-freq/cpufreq-nforce2.txt @@ -0,0 +1,19 @@ + +The cpufreq-nforce2 driver changes the FSB on nVidia nForce2 platforms. + +This works better than on other platforms, because the FSB of the CPU +can be controlled independently from the PCI/AGP clock. + +The module has two options: + + fid: multiplier * 10 (for example 8.5 = 85) + min_fsb: minimum FSB + +If not set, fid is calculated from the current CPU speed and the FSB. +min_fsb defaults to FSB at boot time - 50 MHz. + +IMPORTANT: The available range is limited downwards! + Also the minimum available FSB can differ, for systems + booting with 200 MHz, 150 should always work. + + diff --git a/kernel/Documentation/cpu-freq/cpufreq-stats.txt b/kernel/Documentation/cpu-freq/cpufreq-stats.txt new file mode 100644 index 000000000..fc647492e --- /dev/null +++ b/kernel/Documentation/cpu-freq/cpufreq-stats.txt @@ -0,0 +1,128 @@ + + CPU frequency and voltage scaling statistics in the Linux(TM) kernel + + + L i n u x c p u f r e q - s t a t s d r i v e r + + - information for users - + + + Venkatesh Pallipadi + +Contents +1. Introduction +2. Statistics Provided (with example) +3. Configuring cpufreq-stats + + +1. Introduction + +cpufreq-stats is a driver that provides CPU frequency statistics for each CPU. +These statistics are provided in /sysfs as a bunch of read_only interfaces. This +interface (when configured) will appear in a separate directory under cpufreq +in /sysfs (/devices/system/cpu/cpuX/cpufreq/stats/) for each CPU. +Various statistics will form read_only files under this directory. + +This driver is designed to be independent of any particular cpufreq_driver +that may be running on your CPU. So, it will work with any cpufreq_driver. + + +2. Statistics Provided (with example) + +cpufreq stats provides following statistics (explained in detail below). +- time_in_state +- total_trans +- trans_table + +All the statistics will be from the time the stats driver has been inserted +to the time when a read of a particular statistic is done. Obviously, stats +driver will not have any information about the frequency transitions before +the stats driver insertion. + +-------------------------------------------------------------------------------- +:/sys/devices/system/cpu/cpu0/cpufreq/stats # ls -l +total 0 +drwxr-xr-x 2 root root 0 May 14 16:06 . +drwxr-xr-x 3 root root 0 May 14 15:58 .. +-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state +-r--r--r-- 1 root root 4096 May 14 16:06 total_trans +-r--r--r-- 1 root root 4096 May 14 16:06 trans_table +-------------------------------------------------------------------------------- + +- time_in_state +This gives the amount of time spent in each of the frequencies supported by +this CPU. The cat output will have "