From c0b7206652b2852bc574694e7ba07ba1c2acdc00 Mon Sep 17 00:00:00 2001
From: hongbotian
The original document was written by Dan Milstein.

This describes the Apache JServ Protocol version 1.3 (hereafter ajp13). There is, apparently, no current documentation of how the protocol works. This document is an attempt to remedy that, in order to make life easier for maintainers of JK, and for anyone who wants to port the protocol somewhere (into jakarta 4.x, for example).

I am not one of the designers of this protocol; I believe that Gal Shachor was the original designer. Everything in this document is derived from the actual implementation I found in the Tomcat 3.x code. I hope it is useful, but I can't make any grand claims to perfect accuracy. I also don't know why certain design decisions were made. Where I was able, I've offered some possible justifications for certain choices, but those are only my guesses. In general, the C code which Shachor wrote is very clean and comprehensible (if almost totally undocumented). I've cleaned up the Java code, and I think it's reasonably readable.

According to email from Gal Shachor to the jakarta-dev mailing list, the original goals of JK (and thus ajp13) were to extend mod_jserv and ajp12 by the following (I am only including the goals which relate to communication between the web server and the servlet container):

- Adding support for SSL, so that `isSecure()` and `getScheme()` will function correctly within the servlet container. The client certificates and cipher suite will be available to servlets as request attributes.

The ajp13 protocol is packet-oriented. A binary format was presumably chosen over the more readable plain text for reasons of performance. The web server communicates with the servlet container over TCP connections. To cut down on the expensive process of socket creation, the web server will attempt to maintain persistent TCP connections to the servlet container, and to reuse a connection for multiple request/response cycles.

Once a connection is assigned to a particular request, it will not be used for any others until the request-handling cycle has terminated. In other words, requests are not multiplexed over connections. This makes for much simpler code at either end of the connection, although it does cause more connections to be open at once.
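
Once the web server has opened a connection to the servlet container, the connection can be in one of the following states:

- Idle: no request is being handled over the connection.
- Assigned: the connection is handling a specific request.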

Once a connection is assigned to handle a particular request, the basic request information (e.g. HTTP headers) is sent over the connection in a highly condensed form (e.g. common strings are encoded as integers). Details of that format are below in Request Packet Structure. If there is a body to the request (content-length > 0), that is sent in a separate packet immediately after.
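
At this point, the servlet container is presumably ready to start processing the request. As it does so, it can send the following messages back to the web server:

- Send Headers: send a set of response headers back to the browser.
- Send Body Chunk: send a chunk of body data back to the browser.
- Get Body Chunk: get further data from the request if it hasn't all been transferred yet.
- End Response: finish the request-handling cycle.

Each message is accompanied by a differently formatted packet of data. See Response Packet Structures below for details.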

There is a bit of an XDR heritage to this protocol, but it differs in lots of ways (no 4-byte alignment, for example).

Byte order: multi-byte integers are big-endian (network byte order), with the most significant byte sent first. This is consistent with XDR, which also specifies big-endian encoding, and with the packet layout below, where the magic number 0x1234 appears as byte 0x12 followed by byte 0x34.

There are four data types in the protocol: bytes, booleans, integers and strings.

- Byte: a single byte.
- Boolean: a single byte; 1 is true and 0 is false.
- Integer: an unsigned number in the range 0 to 65535 (2^16 - 1), stored in two bytes, most significant byte first.
- String: a variable-sized string, encoded as its length (an integer, as above), followed by the string bytes, followed by a terminating `\0`. The encoded length does not include the trailing `\0`; it is like `strlen`. This is a touch confusing on the Java side, which is littered with odd autoincrement statements to skip over these terminators. I believe the reason this was done was to allow the C code to be extra efficient when reading strings which the servlet container is sending back: with the terminating `\0` character, the C code can pass around references into a single buffer, without copying. If the `\0` was missing, the C code would have to copy things out in order to get its notion of a string. Note that a size of -1 (65535) indicates a null string, and no data follow the length in this case.

According to much of the code, the max packet size is 8 * 1024 bytes (8K). The actual length of the packet is encoded in the header.
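The four data types can be sketched in Python with the standard `struct` module, where `>` selects the big-endian (most significant byte first) layout. This is a minimal, hypothetical sketch: the function names are illustrative. Decoding string bytes as Latin-1 is an assumption, because the text does not say which character set is used.

```python
import struct

NULL_STRING_LENGTH = 0xFFFF  # a length of -1 (65535) marks a null string


def encode_byte(value: int) -> bytes:
    return struct.pack(">B", value)


def encode_boolean(value: bool) -> bytes:
    # 1 is true, 0 is false.
    return struct.pack(">B", 1 if value else 0)


def encode_integer(value: int) -> bytes:
    # Two bytes, most significant byte first.
    return struct.pack(">H", value)


def encode_string(value):
    """Length (not counting the terminator), the bytes, then a trailing \\0."""
    if value is None:
        # Null string: only the length is sent; no data follow it.
        return encode_integer(NULL_STRING_LENGTH)
    data = value.encode("latin-1")  # assumption: single-byte character set
    return encode_integer(len(data)) + data + b"\x00"


def decode_string(buf: bytes, offset: int):
    """Return (string or None, offset just past the encoded string)."""
    (length,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    if length == NULL_STRING_LENGTH:
        return None, offset
    value = buf[offset:offset + length].decode("latin-1")
    # Skip the data and the \0 terminator that the length does not count.
    return value, offset + length + 1
```

For example, `encode_string("GET")` produces `b"\x00\x03GET\x00"`. That is a length of 3, the three characters, and the terminator.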

Packets sent from the server to the container begin with `0x1234`. Packets sent from the container to the server begin with `AB` (that's the ASCII code for A followed by the ASCII code for B). After those first two bytes, there is an integer (encoded as above) with the length of the payload. Although this might suggest that the maximum payload could be as large as 65535 bytes, in fact, the code sets the maximum to be 8K.

Packet Format (Server->Container)

Byte | 0 | 1 | 2-3 | 4...(n+3) |
---|---|---|---|---|
Contents | 0x12 | 0x34 | Data Length (n) | Data |

Packet Format (Container->Server)

Byte | 0 | 1 | 2-3 | 4...(n+3) |
---|---|---|---|---|
Contents | A | B | Data Length (n) | Data |
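Framing a payload with the header above could look like the following sketch. The later 8186-byte body limit (8K minus 6) suggests that the 8K maximum covers the whole packet, including the four header bytes, and the sketch assumes this. The function names are hypothetical.

```python
import struct

SERVER_TO_CONTAINER_MAGIC = b"\x12\x34"
CONTAINER_TO_SERVER_MAGIC = b"AB"
MAX_PACKET_SIZE = 8 * 1024          # assumption: includes the header
MAX_PAYLOAD = MAX_PACKET_SIZE - 4   # 2 magic bytes + 2 length bytes


def frame(payload: bytes, magic: bytes = SERVER_TO_CONTAINER_MAGIC) -> bytes:
    """Prefix a payload with the magic bytes and its big-endian length."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the 8K packet limit")
    return magic + struct.pack(">H", len(payload)) + payload


def unframe(packet: bytes, magic: bytes = CONTAINER_TO_SERVER_MAGIC) -> bytes:
    """Check the magic bytes and return the payload the header describes."""
    if packet[:2] != magic:
        raise ValueError("unexpected packet magic")
    (length,) = struct.unpack_from(">H", packet, 2)
    payload = packet[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return payload
```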

For most packets, the first byte of the payload encodes the type of message. The exception is request body packets sent from the server to the container: they are sent with a standard packet header (0x1234 and then the length of the packet), but without any prefix code after that (this seems like a mistake to me).

The web server can send the following messages to the servlet container:

Code | Type of Packet | Meaning |
---|---|---|
2 | Forward Request | Begin the request-processing cycle with the following data. |
7 | Shutdown | The web server asks the container to shut itself down. |
8 | Ping | The web server asks the container to take control (secure login phase). |
10 | CPing | The web server asks the container to respond quickly with a CPong. |
none | Data | Size (2 bytes) and corresponding body data. |

To ensure some basic security, the container will only actually do the Shutdown if the request comes from the same machine on which it's hosted.

The first Data packet is sent immediately after the Forward Request by the web server.

The servlet container can send the following types of messages to the web server:
Code | Type of Packet | Meaning |
---|---|---|
3 | Send Body Chunk | Send a chunk of the body from the servlet container to the web server (and presumably, onto the browser). |
4 | Send Headers | Send the response headers from the servlet container to the web server (and presumably, onto the browser). |
5 | End Response | Marks the end of the response (and thus the request-handling cycle). |
6 | Get Body Chunk | Get further data from the request if it hasn't all been transferred yet. |
9 | CPong Reply | The reply to a CPing request. |

Each of the above messages has a different internal structure, detailed below.
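A web server reading from the container could identify each message by its prefix code, as in this minimal sketch. The function and table names are hypothetical.

```python
CONTAINER_MESSAGES = {
    3: "Send Body Chunk",
    4: "Send Headers",
    5: "End Response",
    6: "Get Body Chunk",
    9: "CPong Reply",
}


def message_type(packet: bytes) -> str:
    """Name the message carried by a container-to-server packet."""
    if packet[:2] != b"AB":
        raise ValueError("not a container-to-server packet")
    length = int.from_bytes(packet[2:4], "big")
    if length == 0:
        raise ValueError("empty payload has no prefix code")
    code = packet[4]  # first payload byte is the prefix code
    try:
        return CONTAINER_MESSAGES[code]
    except KeyError:
        raise ValueError(f"unknown prefix code {code}") from None
```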

For messages from the server to the container of type "Forward Request":

The `request_headers` have the following structure:

The `attributes` are optional and have the following structure:

Note that the all-important header is `content-length`, because it determines whether or not the container looks for another packet immediately.

Detailed description of the elements of Forward Request:

Prefix code: for all requests, this will be 2. See above for details on other prefix codes.

The HTTP method, encoded as a single byte:

Command Name | Code |
---|---|
OPTIONS | 1 |
GET | 2 |
HEAD | 3 |
POST | 4 |
PUT | 5 |
DELETE | 6 |
TRACE | 7 |
PROPFIND | 8 |
PROPPATCH | 9 |
MKCOL | 10 |
COPY | 11 |
MOVE | 12 |
LOCK | 13 |
UNLOCK | 14 |
ACL | 15 |
REPORT | 16 |
VERSION-CONTROL | 17 |
CHECKIN | 18 |
CHECKOUT | 19 |
UNCHECKOUT | 20 |
SEARCH | 21 |
MKWORKSPACE | 22 |
UPDATE | 23 |
LABEL | 24 |
MERGE | 25 |
BASELINE_CONTROL | 26 |
MKACTIVITY | 27 |

Later versions of ajp13, when used with mod_jk2, will transport additional methods, even if they are not in this list.
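Because the method codes run consecutively from 1, the table can be represented as an ordered list, as in this hypothetical sketch. It spells code 26 with a hyphen, because WebDAV versioning (RFC 3253) names the HTTP method BASELINE-CONTROL. It simply rejects methods outside the list.

```python
# HTTP methods in code order: each code is the 1-based position in this list.
AJP13_METHODS = [
    "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE",
    "PROPFIND", "PROPPATCH", "MKCOL", "COPY", "MOVE", "LOCK", "UNLOCK",
    "ACL", "REPORT", "VERSION-CONTROL", "CHECKIN", "CHECKOUT",
    "UNCHECKOUT", "SEARCH", "MKWORKSPACE", "UPDATE", "LABEL", "MERGE",
    "BASELINE-CONTROL",  # code 26; the RFC 3253 method name uses a hyphen
    "MKACTIVITY",
]
METHOD_CODES = {name: index + 1 for index, name in enumerate(AJP13_METHODS)}


def encode_method(name: str) -> bytes:
    """Encode an HTTP method as its single-byte ajp13 code."""
    try:
        return bytes([METHOD_CODES[name.upper()]])
    except KeyError:
        raise ValueError(f"method {name!r} has no ajp13 code") from None
```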

These are all fairly self-explanatory. Each of these is required, and will be sent for every request.

The structure of `request_headers` is the following: first, the number of headers `num_headers` is encoded. Then, a series of header name `req_header_name` / value `req_header_value` pairs follows. Common header names are encoded as integers, to save space. If the header name is not in the list of basic headers, it is encoded normally (as a string, with prefixed length). The list of common headers `sc_req_header_name` and their codes is as follows (all are case-sensitive):

Name | Code value | Code name |
---|---|---|
accept | 0xA001 | SC_REQ_ACCEPT |
accept-charset | 0xA002 | SC_REQ_ACCEPT_CHARSET |
accept-encoding | 0xA003 | SC_REQ_ACCEPT_ENCODING |
accept-language | 0xA004 | SC_REQ_ACCEPT_LANGUAGE |
authorization | 0xA005 | SC_REQ_AUTHORIZATION |
connection | 0xA006 | SC_REQ_CONNECTION |
content-type | 0xA007 | SC_REQ_CONTENT_TYPE |
content-length | 0xA008 | SC_REQ_CONTENT_LENGTH |
cookie | 0xA009 | SC_REQ_COOKIE |
cookie2 | 0xA00A | SC_REQ_COOKIE2 |
host | 0xA00B | SC_REQ_HOST |
pragma | 0xA00C | SC_REQ_PRAGMA |
referer | 0xA00D | SC_REQ_REFERER |
user-agent | 0xA00E | SC_REQ_USER_AGENT |

The Java code that reads this grabs the first two-byte integer and, if it sees `0xA0` in the most significant byte, uses the integer in the second byte as an index into an array of header names. If the first byte is not `0xA0`, it assumes that the two-byte integer is the length of a string, which is then read in.

This works on the assumption that no header names will have length greater than 0x9FFF (== 0xA000 - 1), which is perfectly reasonable, though somewhat arbitrary. (If you, like me, started to think about the cookie spec here, and about how long headers can get, fear not: this limit is on header names, not header values. It seems unlikely that unmanageably huge header names will be showing up in the HTTP spec any time soon.)
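The header-name rule above can be sketched as a matching encoder and decoder. This is an illustrative sketch, not the JK code, and the function names are hypothetical. Names are matched case-sensitively, as the table notes. Other names are sent as strings with the usual length prefix and `\0` terminator, and Latin-1 is assumed for their bytes.

```python
import struct

# Request header names in code order: 0xA001 is "accept", 0xA00E "user-agent".
REQUEST_HEADER_NAMES = [
    "accept", "accept-charset", "accept-encoding", "accept-language",
    "authorization", "connection", "content-type", "content-length",
    "cookie", "cookie2", "host", "pragma", "referer", "user-agent",
]


def encode_header_name(name: str) -> bytes:
    if name in REQUEST_HEADER_NAMES:
        return struct.pack(">H", 0xA001 + REQUEST_HEADER_NAMES.index(name))
    data = name.encode("latin-1")  # assumption: single-byte character set
    return struct.pack(">H", len(data)) + data + b"\x00"


def decode_header_name(buf: bytes, offset: int):
    """Return (name, new_offset), following the rule described above."""
    (value,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    if value >> 8 == 0xA0:
        # Coded header: the low byte is a 1-based index into the name table.
        return REQUEST_HEADER_NAMES[(value & 0xFF) - 1], offset
    # Otherwise the integer is a string length; skip the trailing \0 too.
    name = buf[offset:offset + value].decode("latin-1")
    return name, offset + value + 1
```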

Note: the `content-length` header is extremely important. If it is present and non-zero, the container assumes that the request has a body (a POST request, for example), and immediately reads a separate packet off the input stream to get that body.

The attributes prefixed with a `?` (e.g. `?context`) are all optional. For each, there is a single byte code to indicate the type of attribute, and then a string to give its value. They can be sent in any order (though the C code always sends them in the order listed below). A special terminating code is sent to signal the end of the list of optional attributes. The list of byte codes is:

Information | Code Value | Note |
---|---|---|
?context | 0x01 | Not currently implemented |
?servlet_path | 0x02 | Not currently implemented |
?remote_user | 0x03 | |
?auth_type | 0x04 | |
?query_string | 0x05 | |
?route | 0x06 | |
?ssl_cert | 0x07 | |
?ssl_cipher | 0x08 | |
?ssl_session | 0x09 | |
?req_attribute | 0x0A | Name (the name of the attribute follows) |
?ssl_key_size | 0x0B | |
?secret | 0x0C | |
?stored_method | 0x0D | |
are_done | 0xFF | request_terminator |

The `context` and `servlet_path` are not currently set by the C code, and most of the Java code completely ignores whatever is sent over for those fields (and some of it will actually break if a string is sent along after one of those codes). I don't know if this is a bug or an unimplemented feature or just vestigial code, but it's missing from both sides of the connection.

The `remote_user` and `auth_type` presumably refer to HTTP-level authentication, and communicate the remote user's username and the type of authentication used to establish their identity (e.g. Basic, Digest). I'm not clear on why the password isn't also sent, but I don't know HTTP authentication inside and out.

The `query_string`, `ssl_cert`, `ssl_cipher`, and `ssl_session` refer to the corresponding pieces of HTTP and HTTPS.

The `route`, as I understand it, is used to support sticky sessions: associating a user's session with a particular Tomcat instance in the presence of multiple, load-balancing servers. I don't know the details.

Beyond this list of basic attributes, any number of other attributes can be sent via the `req_attribute` code (0x0A). A pair of strings to represent the attribute name and value are sent immediately after each instance of that code. Environment values are passed in via this method.

Finally, after all the attributes have been sent, the attribute terminator, 0xFF, is sent. This signals both the end of the list of attributes and also the end of the Request Packet.
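The attribute list could be serialized as in the sketch below. It uses a subset of the codes from the table and emits each value as a length-prefixed, `\0`-terminated string. Extra name/value pairs go out under `req_attribute`, and the list ends with the 0xFF terminator. The function names and the use of Latin-1 for string bytes are assumptions.

```python
import struct

# A subset of the single-byte attribute codes from the table above.
ATTRIBUTE_CODES = {
    "remote_user": 0x03, "auth_type": 0x04, "query_string": 0x05,
    "route": 0x06, "ssl_cert": 0x07, "ssl_cipher": 0x08, "ssl_session": 0x09,
}
REQ_ATTRIBUTE = 0x0A
ARE_DONE = 0xFF


def ajp_string(value: str) -> bytes:
    data = value.encode("latin-1")  # assumption: single-byte character set
    return struct.pack(">H", len(data)) + data + b"\x00"


def encode_attributes(basic: dict, extra: dict) -> bytes:
    out = bytearray()
    for name, value in basic.items():
        # Basic attribute: its code, then the value string.
        out.append(ATTRIBUTE_CODES[name])
        out += ajp_string(value)
    for name, value in extra.items():
        # Generic attribute: the code, then a name string and a value string.
        out.append(REQ_ATTRIBUTE)
        out += ajp_string(name) + ajp_string(value)
    out.append(ARE_DONE)  # ends the attribute list and the Forward Request
    return bytes(out)
```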

For messages which the container can send back to the server:

Details:

Send Body Chunk: the chunk is basically binary data, and is sent directly back to the browser.

Send Headers: the status code and message are the usual HTTP things (e.g. "200" and "OK"). The response header names are encoded the same way the request header names are. See above for details about how the codes are distinguished from the strings. The codes for common headers are:

Name | Code value |
---|---|
Content-Type | 0xA001 |
Content-Language | 0xA002 |
Content-Length | 0xA003 |
Date | 0xA004 |
Last-Modified | 0xA005 |
Location | 0xA006 |
Set-Cookie | 0xA007 |
Set-Cookie2 | 0xA008 |
Servlet-Engine | 0xA009 |
Status | 0xA00A |
WWW-Authenticate | 0xA00B |

After the code or the string header name, the header value is immediately encoded.
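A Send Headers payload (the packet contents after the `AB` header) could be parsed as below. The surviving text does not spell out the full layout, so this sketch assumes the following order:

- prefix code 4;
- the status code, as an integer;
- the status message, as a string;
- a header count, as an integer (mirroring `num_headers` on the request side);
- the name/value pairs.

Null strings are not handled, and the function names are hypothetical.

```python
import struct

# Response header names in code order: 0xA001 is "Content-Type".
RESPONSE_HEADER_NAMES = [
    "Content-Type", "Content-Language", "Content-Length", "Date",
    "Last-Modified", "Location", "Set-Cookie", "Set-Cookie2",
    "Servlet-Engine", "Status", "WWW-Authenticate",
]


def read_int(buf: bytes, offset: int):
    return struct.unpack_from(">H", buf, offset)[0], offset + 2


def read_string(buf: bytes, offset: int):
    length, offset = read_int(buf, offset)
    # Skip the string bytes and the \0 terminator.
    return buf[offset:offset + length].decode("latin-1"), offset + length + 1


def parse_send_headers(payload: bytes):
    """Return (status, message, [(name, value), ...])."""
    if payload[0] != 4:
        raise ValueError("not a Send Headers message")
    status, offset = read_int(payload, 1)
    message, offset = read_string(payload, offset)
    count, offset = read_int(payload, offset)
    headers = []
    for _ in range(count):
        code, _ = read_int(payload, offset)
        if code >> 8 == 0xA0:
            # Coded name: consume the two bytes and look it up.
            name, offset = RESPONSE_HEADER_NAMES[(code & 0xFF) - 1], offset + 2
        else:
            name, offset = read_string(payload, offset)
        value, offset = read_string(payload, offset)
        headers.append((name, value))
    return status, message, headers
```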

End Response: signals the end of this request-handling cycle. If the `reuse` flag is true (== 1), this TCP connection can now be used to handle new incoming requests. If `reuse` is false (anything other than 1 in the actual C code), the connection should be closed.
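Acting on the `reuse` flag might look like this sketch. It takes the payload after the `AB` header and assumes the flag is the single boolean byte following prefix code 5. As in the C code, only the value 1 means "reuse". `finish_cycle` and its `idle_pool` list are hypothetical stand-ins for a real connection pool.

```python
def end_response_reuse(payload: bytes) -> bool:
    """Return True if the connection may serve another request."""
    if len(payload) < 2 or payload[0] != 5:
        raise ValueError("not an End Response message")
    # Only 1 means reuse; any other value means the connection is closed.
    return payload[1] == 1


def finish_cycle(payload: bytes, connection, idle_pool: list) -> None:
    if end_response_reuse(payload):
        idle_pool.append(connection)  # keep it for the next request
    else:
        connection.close()
```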

Get Body Chunk: the container asks for more data from the request (if the body was too large to fit in the first packet sent over, or when the request is chunked). The server will send a body packet back with an amount of data which is the minimum of the `request_length`, the maximum send body size (8186 bytes, i.e. 8K minus 6), and the number of bytes actually left to send from the request body.

If there is no more data in the body (i.e. the servlet container is trying to read past the end of the body), the server will send back an "empty" packet, which is a body packet with a payload length of 0: (0x12, 0x34, 0x00, 0x00).
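The server's reply to Get Body Chunk can be sketched as follows. The surviving text does not spell out the request layout, so the sketch assumes prefix code 6 followed by `request_length` as an integer. The function names are hypothetical.

```python
import struct

MAX_SEND_BODY = 8186  # 8K minus 4 header bytes and the 2-byte size field
EMPTY_BODY_PACKET = b"\x12\x34\x00\x00"


def parse_get_body_chunk(payload: bytes) -> int:
    """Return the requested length from a Get Body Chunk payload."""
    if len(payload) < 3 or payload[0] != 6:
        raise ValueError("not a Get Body Chunk message")
    return struct.unpack_from(">H", payload, 1)[0]


def body_reply(request_length: int, remaining: bytes) -> bytes:
    """Build the body packet sent back in answer to Get Body Chunk."""
    size = min(request_length, MAX_SEND_BODY, len(remaining))
    if size == 0:
        # Nothing to send: the "empty" body packet with payload length 0.
        return EMPTY_BODY_PACKET
    payload = struct.pack(">H", size) + remaining[:size]
    return b"\x12\x34" + struct.pack(">H", len(payload)) + payload
```

After sending a reply, the caller would drop the bytes it sent from `remaining` before answering the next Get Body Chunk.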

What happens if the request headers exceed the max packet size? There is no provision to send a second packet of request headers in case there are more than 8K (I think this is correctly handled for response headers, though I'm not certain). I don't know if there is a way to get more than 8K worth of data into that initial set of request headers, but I'll bet there is (combine long cookies with long SSL information and a lot of environment variables, and you should hit 8K easily). I think the connector would just fail before trying to send any headers in this case, but I'm not certain.

What about authentication? There doesn't seem to be any authentication of the connection between the web server and the container. This strikes me as potentially dangerous.