summaryrefslogtreecommitdiffstats
path: root/kernel/Documentation/IRQ-affinity.txt
blob: 01a675175a3674ef88a08ebb4f430dca3a4e4ec2 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
ChangeLog:
	Started by Ingo Molnar <mingo@redhat.com>
	Update by Max Krasnyansky <maxk@qualcomm.com>

SMP IRQ affinity

/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
which target CPUs are permitted for a given IRQ source.  It's a bitmask
(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs.  It's not
allowed to turn off all CPUs, and if an IRQ controller does not support
IRQ affinity then the value will not change from the default of all cpus.

/proc/irq/default_smp_affinity specifies default affinity mask that applies
to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
will be set to the default mask. It can then be changed as described above.
Default mask is 0xffffffff.

Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
it to CPU4-7 (this is an 8-CPU SMP box):

[root@moon 44]# cd /proc/irq/44
[root@moon 44]# cat smp_affinity
ffffffff

[root@moon 44]# echo 0f > smp_affinity
[root@moon 44]# cat smp_affinity
0000000f
[root@moon 44]# ping -f h
PING hell (195.4.7.3): 56 data bytes
...
--- hell ping statistics ---
6029 packets transmitted, 6027 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/0.4 ms
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
           CPU0       CPU1       CPU2       CPU3      CPU4       CPU5        CPU6       CPU7
 44:       1068       1785       1785       1783         0          0           0         0    IO-APIC-level  eth1

As can be seen from the line above IRQ44 was delivered only to the first four
processors (0-3).
Now lets restrict that IRQ to CPU(4-7).

[root@moon 44]# echo f0 > smp_affinity
[root@moon 44]# cat smp_affinity
000000f0
[root@moon 44]# ping -f h
PING hell (195.4.7.3): 56 data bytes
..
--- hell ping statistics ---
2779 packets transmitted, 2777 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.5/585.4 ms
[root@moon 44]# cat /proc/interrupts |  'CPU\|44:'
           CPU0       CPU1       CPU2       CPU3      CPU4       CPU5        CPU6       CPU7
 44:       1068       1785       1785       1783      1784       1069        1070       1069   IO-APIC-level  eth1

This time around IRQ44 was delivered only to the last four processors.
i.e counters for the CPU0-3 did not change.

Here is an example of limiting that same irq (44) to cpus 1024 to 1031:

[root@moon 44]# echo 1024-1031 > smp_affinity_list
[root@moon 44]# cat smp_affinity_list
1024-1031

Note that to do this with a bitmask would require 32 bitmasks of zero
to follow the pertinent one.
back to life (i.e. a watchdog triggered). In such cases, RAM may be somewhat corrupt, but usually it is restorable. 2. Setting the parameters Setting the ramoops parameters can be done in 2 different manners: 1. Use the module parameters (which have the names of the variables described as before). For quick debugging, you can also reserve parts of memory during boot and then use the reserved memory for ramoops. For example, assuming a machine with > 128 MB of memory, the following kernel command line will tell the kernel to use only the first 128 MB of memory, and place ECC-protected ramoops region at 128 MB boundary: "mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1" 2. Use a platform device and set the platform data. The parameters can then be set through that platform data. An example of doing that is: #include <linux/pstore_ram.h> [...] static struct ramoops_platform_data ramoops_data = { .mem_size = <...>, .mem_address = <...>, .mem_type = <...>, .record_size = <...>, .dump_oops = <...>, .ecc = <...>, }; static struct platform_device ramoops_dev = { .name = "ramoops", .dev = { .platform_data = &ramoops_data, }, }; [... inside a function ...] int ret; ret = platform_device_register(&ramoops_dev); if (ret) { printk(KERN_ERR "unable to register platform device\n"); return ret; } You can specify either RAM memory or peripheral devices' memory. However, when specifying RAM, be sure to reserve the memory by issuing memblock_reserve() very early in the architecture code, e.g.: #include <linux/memblock.h> memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size); 3. Dump format The data dump begins with a header, currently defined as "====" followed by a timestamp and a new line. The dump then continues with the actual data. 4. Reading the data The dump data can be read from the pstore filesystem. The format for these files is "dmesg-ramoops-N", where N is the record number in memory. To delete a stored record from RAM, simply unlink the respective pstore file. 5. Persistent function tracing Persistent function tracing might be useful for debugging software or hardware related hangs. The functions call chain log is stored in a "ftrace-ramoops" file. Here is an example of usage: # mount -t debugfs debugfs /sys/kernel/debug/ # echo 1 > /sys/kernel/debug/pstore/record_ftrace # reboot -f [...] # mount -t pstore pstore /mnt/ # tail /mnt/ftrace-ramoops 0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0 0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0 0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90 0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90 0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40 0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0 0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0 0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0 0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40 0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20