2.5.2-rmk5
----------

This is the first kernel that contains a major shake-up of some of the
architecture-specific subsystems.

Firstly, it contains some pretty major changes to the way we handle the
MMU TLB.  Each MMU TLB variant is now handled completely separately -
we have TLB v3, TLB v4 (without write buffer), TLB v4 (with write buffer),
and finally TLB v4 (with write buffer, with I TLB invalidate entry).
There is more assembly code inside each of these functions, mainly to
allow more flexible TLB handling for the future.

Secondly, the IRQ subsystem.

The 2.5 kernels will bring major changes to the way IRQs are handled.
Unfortunately, this means that any machine type that touches the
irq_desc[] array will break, which in practice is every machine type
that we currently have.

Let's take an example.  On the Assabet with Neponset, we have:

                  GPIO25                 IRR:2
        SA1100 ------------> Neponset -----------> SA1111
                                         IRR:1
                                      -----------> USAR
                                         IRR:0
                                      -----------> SMC9196

The way stuff currently works, all SA1111 interrupts are mutually
exclusive of each other - if you're processing one interrupt from the
SA1111 and another comes in, you have to wait for that interrupt to
finish processing before you can service the new interrupt.  Eg, an
IDE PIO-based interrupt on the SA1111 excludes all other SA1111 and
SMC9196 interrupts until it has finished transferring its multi-sector
data, which can be a long time.  Note also that since we loop in the
SA1111 IRQ handler, SA1111 IRQs can hold off SMC9196 IRQs indefinitely.


The new approach brings several new ideas...

We introduce the concept of a "parent" and a "child".  For example,
to the Neponset handler, the "parent" is GPIO25, and the "children"
are SA1111, SMC9196 and USAR.

We also bring the idea of an IRQ "chip" (mainly to reduce the size of
the irqdesc array).  This doesn't have to be a real "IC"; indeed the
SA11x0 IRQs are handled by two separate "chip" structures, one for
GPIO0-10, and another for all the rest.  It is just a container for
the various operations (maybe this'll change to a better name).
This structure has the following operations:

struct irqchip {
        /*
         * Acknowledge the IRQ.
         * If this is a level-based IRQ, then it is expected to mask the IRQ
         * as well.
         */
        void (*ack)(unsigned int irq);
        /*
         * Mask the IRQ in hardware.
         */
        void (*mask)(unsigned int irq);
        /*
         * Unmask the IRQ in hardware.
         */
        void (*unmask)(unsigned int irq);
        /*
         * Re-run the IRQ
         */
        void (*rerun)(unsigned int irq);
        /*
         * Set the type of the IRQ.
         */
        int (*type)(unsigned int irq, unsigned int type);
};

ack    - required.  May be the same function as mask for IRQs
         handled by do_level_IRQ.
mask   - required.
unmask - required.
rerun  - optional.  Not required if you're using do_level_IRQ for all
         IRQs that use this 'irqchip'.  Generally expected to re-trigger
         the hardware IRQ if possible.  If not, may call the handler
         directly.
type   - optional.  If you don't support changing the type of an IRQ,
         it should be null so people can detect if they are unable to
         set the IRQ type.
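
As an illustration of the rules above, here is a minimal sketch of an
irqchip for an imaginary controller.  It is a user-space mock, not code
from any real machine: the demo_* names, the two variables standing in
for hardware registers and the IRQ numbering are all assumptions made
for the example.  The ack method also masks, as expected for IRQs
handled by do_level_IRQ, and the optional methods are left NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the structure described above, so the sketch compiles alone. */
struct irqchip {
        void (*ack)(unsigned int irq);
        void (*mask)(unsigned int irq);
        void (*unmask)(unsigned int irq);
        void (*rerun)(unsigned int irq);
        int (*type)(unsigned int irq, unsigned int type);
};

/* Hypothetical controller: plain variables stand in for registers. */
#define DEMO_IRQ_START  32u     /* assumed first IRQ number on this chip */
#define DEMO_NR_IRQS    8u

static uint32_t demo_status;    /* bit set = IRQ latched */
static uint32_t demo_enable;    /* bit set = IRQ unmasked */

static uint32_t demo_bit(unsigned int irq)
{
        assert(irq >= DEMO_IRQ_START && irq < DEMO_IRQ_START + DEMO_NR_IRQS);
        return 1u << (irq - DEMO_IRQ_START);
}

static void demo_mask(unsigned int irq)
{
        demo_enable &= ~demo_bit(irq);
}

static void demo_unmask(unsigned int irq)
{
        demo_enable |= demo_bit(irq);
}

/* Level-style ack: clear the latched status and mask the IRQ as well. */
static void demo_ack(unsigned int irq)
{
        demo_status &= ~demo_bit(irq);
        demo_mask(irq);
}

static struct irqchip demo_chip = {
        .ack    = demo_ack,
        .mask   = demo_mask,
        .unmask = demo_unmask,
        .rerun  = NULL,         /* optional: level IRQs only */
        .type   = NULL,         /* NULL: IRQ type cannot be changed */
};
```

A caller that wants to change the IRQ type would first check that
demo_chip.type is non-NULL, which is how "unable to set the IRQ type"
is detected.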

For each IRQ, we keep the following information:

        - "disable" depth (number of disable_irq()s without enable_irq()s)
        - flags indicating what we can do with this IRQ (valid, probe,
          noautoenable) as before
        - status of the IRQ (probing, enable, etc)
        - chip
        - per-IRQ handler
        - irqaction structure list
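
The "disable" depth is the least obvious item in this list, so here is
a small sketch of how such a counter might behave.  This is a
user-space model under assumptions: struct demo_irqdesc and the demo_*
functions are hypothetical, and the hw_masked field stands in for
calls to the chip's mask and unmask methods.

```c
#include <assert.h>

/* Hypothetical, cut-down per-IRQ descriptor. */
struct demo_irqdesc {
        unsigned int disable_depth;     /* disables without enables */
        unsigned int hw_masked;         /* stand-in for chip->mask/unmask */
};

/* Only the first disable touches the hardware. */
static void demo_disable_irq(struct demo_irqdesc *d)
{
        if (d->disable_depth++ == 0)
                d->hw_masked = 1;
}

/* Only the enable that balances the first disable unmasks again. */
static void demo_enable_irq(struct demo_irqdesc *d)
{
        assert(d->disable_depth > 0);   /* unbalanced enable is a bug */
        if (--d->disable_depth == 0)
                d->hw_masked = 0;
}
```

With this model, two nested disables followed by a single enable leave
the IRQ masked; it is unmasked only by the second enable.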

The handler can be one of the 3 standard handlers - "level", "edge" and
"simple", or your own specific handler if you need to do something special.

The "level" handler is what we currently have - it's pretty simple.
"edge" knows about the brokenness of such IRQ implementations - that you
need to leave the hardware IRQ enabled while processing it, and queue
further IRQ events should the IRQ happen again while processing.  The
"simple" handler is very basic, and does not perform any hardware
manipulation, nor state tracking.  This is useful for things like the
SMC9196 and USAR above.
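
The queueing done by the "edge" handler can be sketched roughly as
follows.  This is a simplified user-space model and not the kernel's
handler: the demo_* names are hypothetical, and a nested call stands in
for a new edge arriving while the previous one is still being
processed.

```c
#include <assert.h>

struct demo_edge_irq {
        int running;            /* handler currently processing */
        int pending;            /* edge seen while running */
        unsigned int handled;   /* number of times the action ran */
        void (*action)(struct demo_edge_irq *d);
};

static void demo_edge_handler(struct demo_edge_irq *d);

/* Remaining number of edges to inject during processing. */
static int demo_retrigger_budget;

/* Example action: a new edge arrives while we are still running. */
static void demo_action(struct demo_edge_irq *d)
{
        if (demo_retrigger_budget > 0) {
                demo_retrigger_budget--;
                demo_edge_handler(d);
        }
}

static void demo_edge_handler(struct demo_edge_irq *d)
{
        if (d->running) {
                /* Already processing: just remember the event. */
                d->pending = 1;
                return;
        }

        d->running = 1;
        do {
                d->pending = 0;
                d->action(d);
                d->handled++;
        } while (d->pending);   /* replay events queued meanwhile */
        d->running = 0;
}
```

The hardware IRQ is never masked in this model; an event that arrives
mid-processing is recorded and replayed rather than lost.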

So, what's changed?

1. Machine implementations must not write to the irqdesc array.

2. New functions to manipulate the irqdesc array.  The first 4 are expected
   to be useful only to machine specific code.  The last is recommended to
   only be used by machine specific code, but may be used in drivers if
   absolutely necessary.

        set_irq_chip(irq,chip)

                Set the mask/unmask methods for handling this IRQ

        set_irq_handler(irq,handler)

                Set the handler for this IRQ (level, edge, simple)

        set_irq_chained_handler(irq,handler)

                Set a "chained" handler for this IRQ - automatically
                enables this IRQ (eg, Neponset and SA1111 handlers).

        set_irq_flags(irq,flags)

                Set the valid/probe/noautoenable flags.

        set_irq_type(irq,type)

                Set the active IRQ edge(s)/level.  This replaces the
                SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
                function.  Type should be one of IRQ_TYPE_xxx defined in
                <linux/irq.h>.

3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.

4. Direct access to SA1111 INTPOL is deprecated.  Use set_irq_type instead.

5. A handler is expected to perform any necessary acknowledgement of the
   parent IRQ via the correct chip specific function.  For instance, if
   the SA1111 is directly connected to a SA1110 GPIO, then you should
   acknowledge the SA1110 IRQ each time you re-read the SA1111 IRQ status.

6. For any child which doesn't have its own IRQ enable/disable controls
   (eg, SMC9196), the handler must mask or acknowledge the parent IRQ
   while the child handler is called, and the child handler should be the
   "simple" handler (not "edge" nor "level").  After the handler completes,
   the parent IRQ should be unmasked, and the status of all children must
   be re-checked for pending events.  (see the Neponset IRQ handler for
   details).

7. fixup_irq() is gone, as is arch/arm/mach-*/include/mach/irq.h
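
To illustrate points 1 and 2, the following sketch shows machine
initialisation code that goes through the accessors instead of writing
to the descriptor array itself.  It is a user-space mock: the three
set_irq_*() functions here are stand-ins with simplified signatures,
and the DEMO_* flag values, struct demo_irqdesc, demo_do_level() and
demo_init_irq() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_IRQS    4u
#define DEMO_F_VALID    (1u << 0)       /* assumed flag values */
#define DEMO_F_PROBE    (1u << 1)

struct irqchip { int unused; };         /* contents irrelevant here */
typedef void (*demo_handler_t)(unsigned int irq);

struct demo_irqdesc {
        const struct irqchip *chip;
        demo_handler_t handler;
        unsigned int flags;
};

/* Owned by the core code; machine code must not touch it directly. */
static struct demo_irqdesc demo_desc[DEMO_NR_IRQS];

/* Stand-ins for the accessors listed in point 2. */
static void set_irq_chip(unsigned int irq, const struct irqchip *chip)
{
        assert(irq < DEMO_NR_IRQS);
        demo_desc[irq].chip = chip;
}

static void set_irq_handler(unsigned int irq, demo_handler_t handler)
{
        assert(irq < DEMO_NR_IRQS);
        demo_desc[irq].handler = handler;
}

static void set_irq_flags(unsigned int irq, unsigned int flags)
{
        assert(irq < DEMO_NR_IRQS);
        demo_desc[irq].flags = flags;
}

/* Placeholder for a standard "level" handler. */
static void demo_do_level(unsigned int irq)
{
        (void)irq;
}

static const struct irqchip demo_chip = { 0 };

/* What a machine's IRQ init routine might look like. */
static void demo_init_irq(void)
{
        unsigned int irq;

        for (irq = 0; irq < DEMO_NR_IRQS; irq++) {
                set_irq_chip(irq, &demo_chip);
                set_irq_handler(irq, demo_do_level);
                set_irq_flags(irq, DEMO_F_VALID | DEMO_F_PROBE);
        }
}
```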

Please note that this will not solve all problems - some of them are
hardware based.  Mixing level-based and edge-based IRQs on the same
parent signal (eg, Neponset) is one such area where a software based
solution can't provide the full answer to low IRQ latency.
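
Returning to points 5 and 6 above, the shape of a parent handler that
demultiplexes to "simple" children can be sketched like this.  It is a
user-space model loosely following the Neponset arrangement, not the
actual Neponset handler: demo_irr stands in for the child status
register, and all demo_* names and the three-child layout are
assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CHILDREN 3u

static uint32_t demo_irr;               /* stand-in child status register */
static unsigned int demo_parent_acks;   /* counts parent acknowledgements */
static unsigned int demo_child_calls[DEMO_NR_CHILDREN];

/* Stand-in for the parent chip's ack method. */
static void demo_ack_parent(void)
{
        demo_parent_acks++;
}

/* "Simple" child handler: no hardware manipulation of its own. */
static void demo_child(unsigned int bit)
{
        demo_child_calls[bit]++;
        demo_irr &= ~(1u << bit);       /* servicing clears the source */
}

static void demo_chained_handler(void)
{
        for (;;) {
                uint32_t pending;
                unsigned int bit;

                /* Acknowledge the parent each time status is re-read. */
                demo_ack_parent();

                pending = demo_irr & ((1u << DEMO_NR_CHILDREN) - 1);
                if (!pending)
                        break;

                for (bit = 0; bit < DEMO_NR_CHILDREN; bit++)
                        if (pending & (1u << bit))
                                demo_child(bit);
                /* Loop: re-check every child for new events. */
        }
}
```

The loop ends only when a fresh read of the status shows nothing
pending, so an event raised by one child while another is being
serviced is picked up on the next pass.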