From 9401f816dd0d9d550fe98a8507224bde51c4b847 Mon Sep 17 00:00:00 2001
From: hongbotian
+The original document was written by
+Dan Milstein,
+This describes the Apache JServ Protocol version 1.3 (hereafter
+ajp13). There is, apparently, no current documentation of how the
+protocol works. This document is an attempt to remedy that, in order to
+make life easier for maintainers of JK, and for anyone who wants to
+port the protocol somewhere (into jakarta 4.x, for example).
+
+I am not one of the designers of this protocol -- I believe that Gal
+Shachor was the original designer. Everything in this document is derived
+from the actual implementation I found in the tomcat 3.x code. I hope it
+is useful, but I can't make any grand claims to perfect accuracy. I also
+don't know why certain design decisions were made. Where I was able, I've
+offered some possible justifications for certain choices, but those are
+only my guesses. In general, the C code which Shachor wrote is very clean
+and comprehensible (if almost totally undocumented). I've cleaned up the
+Java code, and I think it's reasonably readable.
+
+According to email from Gal Shachor to the jakarta-dev mailing list,
+the original goals of JK (and thus ajp13) were to extend
+mod_jserv and ajp12 by (I am only including the goals which
+relate to communication between the web server and the servlet container):
+
+
+
+isSecure() and getScheme() will function correctly within the servlet
+container. The client certificates and cipher suite will be
+available to servlets as request attributes.
+The ajp13 protocol is packet-oriented. A binary format was
+presumably chosen over the more readable plain text for reasons of
+performance. The web server communicates with the servlet container over
+TCP connections. To cut down on the expensive process of socket creation,
+the web server will attempt to maintain persistent TCP connections to the
+servlet container, and to reuse a connection for multiple request/response
+cycles.
+
+Once a connection is assigned to a particular request, it will not be
+used for any others until the request-handling cycle has terminated. In
+other words, requests are not multiplexed over connections. This makes
+for much simpler code at either end of the connection, although it does
+cause more connections to be open at once.
+
+Once the web server has opened a connection to the servlet container,
+the connection can be in one of the following states:
+
+Once a connection is assigned to handle a particular request, the basic
+request information (e.g. HTTP headers, etc) is sent over the connection in
+a highly condensed form (e.g. common strings are encoded as integers).
+Details of that format are below in Request Packet Structure. If there is a
+body to the request (content-length > 0), that is sent in a separate
+packet immediately after.
+At this point, the servlet container is presumably ready to start
+processing the request. As it does so, it can send the
+following messages back to the web server:
+
+Each message is accompanied by a differently formatted packet of data. See
+Response Packet Structures below for details.
+
+There is a bit of an XDR heritage to this protocol, but it differs in
+lots of ways (no 4 byte alignment, for example).
+Byte order: I am not clear about the endian-ness of the individual
+bytes. I'm guessing the bytes are big-endian (network byte order), because
+that's what XDR specifies and because the 0x1234 marker goes out as 0x12
+followed by 0x34, and I'm guessing that sys/socket library is magically making
+that so (on the C side). If anyone with a better knowledge of socket calls
+can step in, that would be great.
+There are four data types in the protocol: bytes, booleans, integers and
+strings.
+
+strlen. This is a touch
+ confusing on the Java side, which is littered with odd autoincrement
+ statements to skip over these terminators. I believe the reason this was
+ done was to allow the C code to be extra efficient when reading strings
+ which the servlet container is sending back -- with the terminating \0
+ character, the C code can pass around references into a single buffer,
+ without copying. If the \0 was missing, the C code would have to copy
+ things out in order to get its notion of a string. Note a size of -1
+ (65535) indicates a null string and no data follows the length in this
+ case.
+
+According to much of the code, the max packet
+size is 8 * 1024 bytes (8K). The actual length of the packet is encoded in the
+header.
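As a rough illustration of the integer and string types, here is a minimal Python sketch. It assumes a two-byte length prefix with the high-order byte first, a trailing \0 that is not counted in the length, and ISO-8859-1 text; the function names are made up for this example and are not taken from the JK or Tomcat sources.

```python
import struct

NULL_STRING_LENGTH = 0xFFFF  # a size of -1 (65535) marks a null string


def encode_integer(value: int) -> bytes:
    """Encode a two-byte unsigned integer, high-order byte first (assumed)."""
    return struct.pack(">H", value)


def encode_string(text):
    """Length prefix, the bytes, then a terminating \\0 not counted in the length."""
    if text is None:
        # Null string: only the length is sent, no data follows.
        return encode_integer(NULL_STRING_LENGTH)
    data = text.encode("iso-8859-1")  # charset is an assumption
    return encode_integer(len(data)) + data + b"\x00"


def decode_string(buf: bytes, offset: int = 0):
    """Return (text_or_None, offset_of_next_field)."""
    (length,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    if length == NULL_STRING_LENGTH:
        return None, offset
    text = buf[offset:offset + length].decode("iso-8859-1")
    # The extra +1 skips the terminator, like the autoincrements on the Java side.
    return text, offset + length + 1
```

The `+ 1` in `decode_string` is the Python counterpart of the odd autoincrement statements mentioned above.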
+
+Packets sent from the server to the container begin with 0x1234.
+Packets sent from the container to the server begin with AB (that's
+the ASCII code for A followed by the ASCII code for B). After those
+first two bytes, there is an integer (encoded as above) with the length
+of the payload. Although this might suggest that the maximum payload
+could be as large as 2^16, in fact, the code sets the maximum to be 8K.
+
+
+
Packet Format (Server->Container)

Byte | 0 | 1 | 2-3 | 4...(n+3) |
---|---|---|---|---|
Contents | 0x12 | 0x34 | Data Length (n) | Data |

Packet Format (Container->Server)

Byte | 0 | 1 | 2-3 | 4...(n+3) |
---|---|---|---|---|
Contents | A | B | Data Length (n) | Data |
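The two packet layouts can be sketched in Python as follows. This is only an illustration: `frame` and `unframe` are hypothetical helper names, and treating the 8K limit as covering the four header bytes as well as the payload is an assumption, since the text leaves that open.

```python
import struct

SERVER_TO_CONTAINER = b"\x12\x34"
CONTAINER_TO_SERVER = b"AB"
MAX_PACKET_SIZE = 8 * 1024  # assumed to include the 4 header bytes


def frame(payload: bytes, magic: bytes = SERVER_TO_CONTAINER) -> bytes:
    """Prefix a payload with the two marker bytes and its two-byte length."""
    if len(payload) + 4 > MAX_PACKET_SIZE:
        raise ValueError("payload too large for one packet")
    return magic + struct.pack(">H", len(payload)) + payload


def unframe(packet: bytes, magic: bytes = CONTAINER_TO_SERVER) -> bytes:
    """Check the marker bytes and return the payload of one packet."""
    if packet[:2] != magic:
        raise ValueError("unexpected marker bytes")
    (length,) = struct.unpack_from(">H", packet, 2)
    payload = packet[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return payload
```

For example, `frame(b"\x0a")` yields the five bytes 0x12 0x34 0x00 0x01 0x0A.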
+For most packets, the first byte of the
+payload encodes the type of message. The exception is for request body
+packets sent from the server to the container -- they are sent with a
+standard packet header (0x1234 and then length of the packet), but without
+any prefix code after that (this seems like a mistake to me).
+
+The web server can send the following messages to the servlet container:
+
Code | Type of Packet | Meaning |
---|---|---|
2 | Forward Request | Begin the request-processing cycle with the following data |
7 | Shutdown | The web server asks the container to shut itself down. |
8 | Ping | The web server asks the container to take control (secure login phase). |
10 | CPing | The web server asks the container to respond quickly with a CPong. |
none | Data | Size (2 bytes) and corresponding body data. |
+To ensure some basic security, the container will only actually do the
+Shutdown if the request comes from the same machine on which it's hosted.
+
+The first Data packet is sent immediately after the Forward Request
+by the web server.
+
+The servlet container can send the following types of messages to the web
+server:
+
Code | Type of Packet | Meaning |
---|---|---|
3 | Send Body Chunk | Send a chunk of the body from the servlet container to the web server (and presumably, onto the browser). |
4 | Send Headers | Send the response headers from the servlet container to the web server (and presumably, onto the browser). |
5 | End Response | Marks the end of the response (and thus the request-handling cycle). |
6 | Get Body Chunk | Get further data from the request if it hasn't all been transferred yet. |
9 | CPong Reply | The reply to a CPing request |
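A receiver on the web server side might dispatch on the first payload byte roughly like this. The sketch only mirrors the table of container-to-server codes; `response_type` is a name invented for the example.

```python
# Prefix codes for container -> server messages, copied from the table.
RESPONSE_TYPES = {
    3: "Send Body Chunk",
    4: "Send Headers",
    5: "End Response",
    6: "Get Body Chunk",
    9: "CPong Reply",
}


def response_type(payload: bytes) -> str:
    """Name the message type from the first byte of a packet payload."""
    if not payload:
        raise ValueError("empty payload has no prefix code")
    code = payload[0]
    if code not in RESPONSE_TYPES:
        raise ValueError("unknown prefix code %d" % code)
    return RESPONSE_TYPES[code]
```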
+Each of the above messages has a different internal structure, detailed below.
+
+For messages from the server to the container of type "Forward Request":
+ +
+The request_headers
have the following structure:
+
+ +
+
+The attributes
are optional and have the following structure:
+
+ +
+Note that the all-important header is "content-length", because it
+determines whether or not the container looks for another packet
+immediately.
+
+Detailed description of the elements of Forward Request.
+
+For all requests, this will be 2.
+See above for details on other prefix codes.
+
+The HTTP method, encoded as a single byte:
+
Command Name | Code |
---|---|
OPTIONS | 1 |
GET | 2 |
HEAD | 3 |
POST | 4 |
PUT | 5 |
DELETE | 6 |
TRACE | 7 |
PROPFIND | 8 |
PROPPATCH | 9 |
MKCOL | 10 |
COPY | 11 |
MOVE | 12 |
LOCK | 13 |
UNLOCK | 14 |
ACL | 15 |
REPORT | 16 |
VERSION-CONTROL | 17 |
CHECKIN | 18 |
CHECKOUT | 19 |
UNCHECKOUT | 20 |
SEARCH | 21 |
MKWORKSPACE | 22 |
UPDATE | 23 |
LABEL | 24 |
MERGE | 25 |
BASELINE_CONTROL | 26 |
MKACTIVITY | 27 |
+Later versions of ajp13, when used with mod_jk2, will transport
+additional methods, even if they are not in this list.
+
+These are all fairly self-explanatory. Each of these is required, and
+will be sent for every request.
+
+ The structure of request_headers is the following:
+ First, the number of headers num_headers is encoded.
+ Then, a series of header name req_header_name / value
+ req_header_value pairs follows.
+ Common header names are encoded as integers,
+ to save space. If the header name is not in the list of basic headers,
+ it is encoded normally (as a string, with prefixed length). The list of
+ common headers sc_req_header_name and their codes
+ is as follows (all are case-sensitive):
+
+
Name | Code value | Code name |
---|---|---|
accept | 0xA001 | SC_REQ_ACCEPT |
accept-charset | 0xA002 | SC_REQ_ACCEPT_CHARSET |
accept-encoding | 0xA003 | SC_REQ_ACCEPT_ENCODING |
accept-language | 0xA004 | SC_REQ_ACCEPT_LANGUAGE |
authorization | 0xA005 | SC_REQ_AUTHORIZATION |
connection | 0xA006 | SC_REQ_CONNECTION |
content-type | 0xA007 | SC_REQ_CONTENT_TYPE |
content-length | 0xA008 | SC_REQ_CONTENT_LENGTH |
cookie | 0xA009 | SC_REQ_COOKIE |
cookie2 | 0xA00A | SC_REQ_COOKIE2 |
host | 0xA00B | SC_REQ_HOST |
pragma | 0xA00C | SC_REQ_PRAGMA |
referer | 0xA00D | SC_REQ_REFERER |
user-agent | 0xA00E | SC_REQ_USER_AGENT |
+ The Java code that reads this grabs the first two-byte integer and if
+ it sees an '0xA0' in the most significant
+ byte, it uses the integer in the second byte as an index into an array of
+ header names. If the first byte is not '0xA0', it assumes that the
+ two-byte integer is the length of a string, which is then read in.
+
+ This works on the assumption that no header names will have length
+ greater than 0x9FFF (==0xA000 - 1), which is perfectly reasonable, though
+ somewhat arbitrary. (If you, like me, started to think about the cookie
+ spec here, and about how long headers can get, fear not -- this limit is
+ on header names not header values. It seems unlikely that
+ unmanageably huge header names will be showing up in the HTTP spec any time
+ soon).
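The header-name decoding rule can be sketched in Python like this. It is a paraphrase of the behaviour described above, not the actual Java code; `read_header_name` is a hypothetical name, and subtracting 1 from the low byte to index a zero-based list is a choice made for this example.

```python
import struct

# Common request header names in code order (0xA001 .. 0xA00E).
REQ_HEADER_NAMES = [
    "accept", "accept-charset", "accept-encoding", "accept-language",
    "authorization", "connection", "content-type", "content-length",
    "cookie", "cookie2", "host", "pragma", "referer", "user-agent",
]


def read_header_name(buf: bytes, offset: int = 0):
    """Return (header_name, offset_of_next_field)."""
    (first,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    if first >> 8 == 0xA0:
        # Coded name: the second byte selects an entry in the table.
        return REQ_HEADER_NAMES[(first & 0xFF) - 1], offset
    # Otherwise the integer is a string length; the string is followed by \0.
    name = buf[offset:offset + first].decode("iso-8859-1")
    return name, offset + first + 1
```

So the bytes 0xA0 0x0B decode to "host", while 0x00 0x05 followed by "x-foo" and a \0 decode to the literal name.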
+ Note: The content-length header is extremely
+ important. If it is present and non-zero, the container assumes that
+ the request has a body (a POST request, for example), and immediately
+ reads a separate packet off the input stream to get that body.
+
+
+ The attributes prefixed with a ? (e.g. ?context) are all optional.
+ For each, there is a
+ single byte code to indicate the type of attribute, and then a string to
+ give its value. They can be sent in any order (though the C code always
+ sends them in the order listed below). A special terminating code is
+ sent to signal the end of the list of optional attributes. The list of
+ byte codes is:
+
+ +
Information | Code Value | Note |
---|---|---|
?context | 0x01 | Not currently implemented |
?servlet_path | 0x02 | Not currently implemented |
?remote_user | 0x03 | |
?auth_type | 0x04 | |
?query_string | 0x05 | |
?route | 0x06 | |
?ssl_cert | 0x07 | |
?ssl_cipher | 0x08 | |
?ssl_session | 0x09 | |
?req_attribute | 0x0A | Name (the name of the attribute follows) |
?ssl_key_size | 0x0B | |
?secret | 0x0C | |
?stored_method | 0x0D | |
are_done | 0xFF | request_terminator |
+
+ The context and servlet_path are not currently
+ set by the C code, and most of the Java code completely ignores whatever
+ is sent over for those fields (and some of it will actually break if a
+ string is sent along after one of those codes). I don't know if this is
+ a bug or an unimplemented feature or just vestigial code, but it's
+ missing from both sides of the connection.
+
+ The remote_user and auth_type presumably refer
+ to HTTP-level authentication, and communicate the remote user's username
+ and the type of authentication used to establish their identity (e.g. Basic,
+ Digest). I'm not clear on why the password isn't also sent, but I don't
+ know HTTP authentication inside and out.
+
+ The query_string, ssl_cert, ssl_cipher, and ssl_session refer to the
+ corresponding pieces of HTTP and HTTPS.
+
+ The route, as I understand it, is used to support sticky
+ sessions -- associating a user's session with a particular Tomcat instance
+ in the presence of multiple, load-balancing servers. I don't know the
+ details.
+
+ Beyond this list of basic attributes, any number of other attributes can
+ be sent via the req_attribute code (0x0A). A pair of strings
+ to represent the attribute name and value are sent immediately after each
+ instance of that code. Environment values are passed in via this method.
+
+ Finally, after all the attributes have been sent, the attribute terminator,
+ 0xFF, is sent. This signals both the end of the list of attributes and
+ also the end of the Request Packet.
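To make the attribute list concrete, here is a small Python sketch that encodes a few optional attributes followed by the terminator. Only a subset of the codes from the table is included, `encode_attributes` and `_string` are hypothetical helpers, and the string encoding (two-byte length, bytes, trailing \0) is assumed to match the string type used elsewhere in the protocol.

```python
import struct

# A subset of the single-byte attribute codes from the table.
ATTRIBUTE_CODES = {
    "remote_user": 0x03,
    "auth_type": 0x04,
    "query_string": 0x05,
    "route": 0x06,
    "ssl_cert": 0x07,
    "ssl_cipher": 0x08,
    "ssl_session": 0x09,
}
REQ_ATTRIBUTE = 0x0A
ARE_DONE = 0xFF  # request terminator


def _string(text: str) -> bytes:
    """Assumed string encoding: length, bytes, terminating \\0."""
    data = text.encode("iso-8859-1")
    return struct.pack(">H", len(data)) + data + b"\x00"


def encode_attributes(attributes: dict, req_attributes=()) -> bytes:
    """Encode optional attributes, then name/value pairs, then 0xFF."""
    out = bytearray()
    for name, value in attributes.items():
        out.append(ATTRIBUTE_CODES[name])
        out += _string(value)
    for name, value in req_attributes:
        # req_attribute carries a pair of strings: name, then value.
        out.append(REQ_ATTRIBUTE)
        out += _string(name) + _string(value)
    out.append(ARE_DONE)
    return bytes(out)
```

A request with no optional attributes at all still ends with the single byte 0xFF.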
+For messages which the container can send back to the server.
+
+Details:
+
+ The chunk is basically binary data, and is sent directly back to the browser.
+
+ The status code and message are the usual HTTP things (e.g. "200" and "OK").
+ The response header names are encoded the same way the request header names are.
+ See above for details about how the
+ codes are distinguished from the strings. The codes for common headers are:
+ ++
Name | Code value |
---|---|
Content-Type | 0xA001 |
Content-Language | 0xA002 |
Content-Length | 0xA003 |
Date | 0xA004 |
Last-Modified | 0xA005 |
Location | 0xA006 |
Set-Cookie | 0xA007 |
Set-Cookie2 | 0xA008 |
Servlet-Engine | 0xA009 |
Status | 0xA00A |
WWW-Authenticate | 0xA00B |
+ After the code or the string header name, the header value is immediately
+ encoded.
+ +
+ Signals the end of this request-handling cycle. If the
+ reuse flag is true (==1), this TCP connection can now be used to
+ handle new incoming requests. If reuse is false (anything
+ other than 1 in the actual C code), the connection should be closed.
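The reuse rule might be checked like this in Python. The sketch assumes the End Response payload is the prefix code 5 followed by a one-byte reuse flag; `connection_reusable` is a name made up for the example.

```python
END_RESPONSE = 5  # prefix code from the table of container -> server messages


def connection_reusable(payload: bytes) -> bool:
    """Return True if an End Response payload allows reusing the connection."""
    if len(payload) < 2 or payload[0] != END_RESPONSE:
        raise ValueError("not an End Response payload")
    # Only the value 1 counts as true; anything else means close the connection.
    return payload[1] == 1
```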
+
+ The container asks for more data from the request (if the body was
+ too large to fit in the first packet sent over or when the request is
+ chunked).
+ The server will send a body packet back with an amount of data which is
+ the minimum of the request_length,
+ the maximum send body size (8186 (8 Kbytes - 6)), and the
+ number of bytes actually left to send from the request body.
+
+ If there is no more data in the body (i.e. the servlet container is
+ trying to read past the end of the body), the server will send back an
+ "empty" packet, which is a body packet with a payload length of 0.
+ (0x12,0x34,0x00,0x00)
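The sizing rule for the server's reply to Get Body Chunk can be sketched as follows. `body_packet` is a hypothetical helper; it assumes the body is held in memory and that a data packet is the 0x1234 header, then a two-byte size, then the body bytes, as listed in the table of server messages.

```python
import struct

MAX_SEND_BODY_SIZE = 8186  # 8 Kbytes - 6


def body_packet(body: bytes, sent: int, requested_length: int) -> bytes:
    """Build the next request body packet after `sent` bytes have gone out."""
    remaining = len(body) - sent
    size = max(0, min(requested_length, MAX_SEND_BODY_SIZE, remaining))
    if size == 0:
        # Nothing left: the "empty" packet with a payload length of 0.
        return b"\x12\x34\x00\x00"
    chunk = body[sent:sent + size]
    payload = struct.pack(">H", size) + chunk
    return b"\x12\x34" + struct.pack(">H", len(payload)) + payload
```

With a large body and a large request, the resulting packet is exactly 8192 bytes: 4 header bytes, 2 size bytes and 8186 bytes of data.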
+
+What happens if the request headers > max packet size? There is no
+provision to send a second packet of request headers in case there are more
+than 8K (I think this is correctly handled for response headers, though I'm
+not certain). I don't know if there is a way to get more than 8K worth of
+data into that initial set of request headers, but I'll bet there is
+(combine long cookies with long ssl information and a lot of environment
+variables, and you should hit 8K easily). I think the connector would just
+fail before trying to send any headers in this case, but I'm not certain.
+
+What about authentication? There doesn't seem to be any authentication
+of the connection between the web server and the container. This strikes
+me as potentially dangerous.
+This document is a proposal for evolving the current
+Apache JServ Protocol version 1.3, also known as ajp13.
+I will not cover the full protocol here, only the additions to ajp13.
+
+This nth pass includes comments from the tomcat-dev list and
+omissions discovered during development.
+
+ajp13 is a good protocol to link a servlet engine like tomcat to a web server like Apache:
+
+But ajp13 lacks support for:
+
+Let's describe here the features and add-ons that could be added to AJP13.
+Since this document is a proposal, a reasonable level of chaos must be expected at first.
+Discussion on the tomcat list will surely help clarify points and add
+features, but the current list seems to be the vital minimum.
+ +
+ + +
+AJP13 lacks a functionality of AJP12, the shutdown command.
+A logout will tell the servlet engine to shut itself down.
+NOTA:
+
+While working on AJP13 in JK, I discovered "JkEnvVar".
+The "Extended Env Vars feature" described below may not
+be implemented in extended AJP13, since it is already available in the original
+implementation.
+
+DESC:
+
+Many users will want to see some of their web-server env vars
+passed to their servlet engine.
+
+To reduce the network traffic, the web server will send a
+table describing the external vars in a shorter fashion.
+
+We'll use a functionality already present in AJP13, the
+attributes list:
+
+In AJP13, we've got:
+
+Using a short 'web server attribute name' will reduce the
+network traffic.
+
+ie:
+
+During transmission in extended AJP13 we'll see attribute names
+containing S1, S2, S3 and attribute values of
+2001/01/03, 2002/01/03, 0123AFE56.
+
+This example showed the use of extended SSL vars, but
+any 'personal' web-server vars, such as custom authentication
+vars, could be reused in the servlet engine.
+The cost will be only a few more bytes in the AJP traffic.
+Just after the LOGON PHASE, the web server will ask for the list of contexts
+and URLs/URIs handled by the servlet engine.
+It will ease installation at many sites, reduce questions about configuration
+on the tomcat-user list, and be ready for servlet API 2.3.
+
+This mode will be activated by a new directive, JkAutoMount.
+
+ie: JkAutoMount examples myworker1 /examples/
+
+If we want to get ALL the contexts handled by the servlet engine, a wildcard
+could be used:
+
+ie: JkAutoMount * myworker1 *
+
+A servlet engine could have many contexts: /examples, /admin, /test.
+We may want to use only some contexts for a given worker. This was
+done previously, in Apache HTTP server for example, by setting
+JkMount by hand in each [virtual] area of Apache.
+
+If your web server supports virtual hosting, we'll also forward that
+information to the servlet engine, which will only return the URLs/URIs
+matching that particular virtual server (defined in server.xml).
+This feature will help ISPs and big sites which pool large farms
+of Tomcat in load-balancing configurations.
+
+Via context-query, we'll discover the list of URLs/MIMEs handled by the remote servlet engine
+for a list of contexts.
+In wildcard mode, CONTEXTA will contain just '*'.
+Context updates are messages coming from the servlet engine each time a context
+is deactivated/reactivated. The update will be in use when the directive JkUpdateMount is set.
+This directive will set the AJP13_CONTEXT_UPDATE_NEG flag.
+
+ie: JkUpdateMount myworker1
+
+This query will be used by the web server to determine if given
+contexts are UP, DOWN or INVALID (and should be removed).
+
+Sometimes, even with a well negotiated protocol, we may be in a situation
+where one end (web server or servlet engine) receives a message it
+cannot understand. In that case the receiver will send an
+'UNKNOW PACKET CMD' with the unhandled message attached.
+
+Depending on the message, the sender will report an error and, if
+possible, will try to forward the message to another endpoint.
+
+NOTA: This functionality may never be used, since it may slow down normal processing
+by requiring an extra IO (read) on the web-server side before forwarding
+the request.
+
+One of the beauties of socket APIs is that you can write on a half-closed socket.
+When the servlet engine closes the socket, the web server will discover it only at the
+next read() on the socket.
+Basically, in the AJP13 protocol, the web server sends the HTTP HEADER and HTTP BODY
+(POST in chunks of 8K) to the servlet engine and then tries to receive the reply.
+If the connection was broken, the web server will learn it only at receive time.
+
+We could use a buffering scheme, but what happens when you use the servlet engine
+for upload operations with more than 8K of data?
+
+The hack in the AJP13 protocol is to add some bytes to read after the end of the
+service:
+
+The AJP STATUS will not be read by the servlet engine at the end of
+the request/response #N but at the beginning of the next session.
+
+Moreover, at that time the web server could also use OS-dependent functions
+(or better, APR functions) to determine if there is more data
+to read. That data could be CONTEXT Updates.
+
+This will avoid the web server sending a request to a
+deactivated context. In that case, if load-balancing is used,
+it will search for another servlet engine to handle the request.
+
+That feature will help ISPs and big sites with farms of Tomcat
+to update their servlet engines without any service interruption.
+
+The goal of the extended AJP13 protocol is to overcome some of the original AJP13 limitations:
+easier configuration, better support for large sites and farms of Tomcat,
+a simple authentication system, and provision for protocol updates.
+
+Using the stable ajp13 implementation in JK (native) and in the servlet
+engine (java), it's a reasonable evolution of the well known ajp13.
++Index of Commands and ID to be added in AJP13 Protocol +
+ ++
Command Name | Command Number |
---|---|
AJP13_LOGINIT_CMD | 0x10 |
AJP13_LOGSEED_CMD | 0x11 |
AJP13_LOGCOMP_CMD | 0x12 |
AJP13_LOGOK_CMD | 0x13 |
AJP13_LOGNOK_CMD | 0x14 |
AJP13_CONTEXT_QRY_CMD | 0x15 |
AJP13_CONTEXT_INFO_CMD | 0x16 |
AJP13_CONTEXT_UPDATE_CMD | 0x17 |
AJP13_STATUS_CMD | 0x18 |
AJP13_SHUTDOWN_CMD | 0x19 |
AJP13_SHUTOK_CMD | 0x1A |
AJP13_SHUTNOK_CMD | 0x1B |
AJP13_CONTEXT_STATE_CMD | 0x1C |
AJP13_CONTEXT_STATE_REP_CMD | 0x1D |
AJP13_UNKNOW_PACKET_CMD | 0x1E |
+
Command Name | Number | Description |
---|---|---|
AJP13_CONTEXT_INFO_NEG | 0x80000000 | web server wants context info after login |
AJP13_CONTEXT_UPDATE_NEG | 0x40000000 | web server wants context updates |
AJP13_GZIP_STREAM_NEG | 0x20000000 | web server wants compressed stream |
AJP13_DES56_STREAM_NEG | 0x10000000 | web server wants encrypted DES56 stream with secret key |
AJP13_SSL_VSERVER_NEG | 0x08000000 | Extended info on server SSL vars |
AJP13_SSL_VCLIENT_NEG | 0x04000000 | Extended info on client SSL vars |
AJP13_SSL_VCRYPTO_NEG | 0x02000000 | Extended info on crypto SSL vars |
AJP13_SSL_VMISC_NEG | 0x01000000 | Extended info on misc SSL vars |
Negotiation ID | Number | Description |
---|---|---|
AJP13_PROTO_SUPPORT_AJPXX_NEG | 0x00FF0000 | mask of protocol supported |
AJP13_PROTO_SUPPORT_AJP13L1_NEG | 0x00010000 | communication could use AJP13 Level 1 |
AJP13_PROTO_SUPPORT_AJP13L2_NEG | 0x00020000 | communication could use AJP13 Level 2 |
AJP13_PROTO_SUPPORT_AJP13L3_NEG | 0x00040000 | communication could use AJP13 Level 3 |
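Since every negotiation value above is a distinct bit, one plausible reading is that they are OR-ed together into a single 32-bit word. The Python sketch below decodes such a word under that assumption; the proposal does not show the actual wire layout here, and `describe_negotiation` is an invented name.

```python
# Feature flags from the negotiation table.
NEGOTIATION_FLAGS = {
    "AJP13_CONTEXT_INFO_NEG": 0x80000000,
    "AJP13_CONTEXT_UPDATE_NEG": 0x40000000,
    "AJP13_GZIP_STREAM_NEG": 0x20000000,
    "AJP13_DES56_STREAM_NEG": 0x10000000,
    "AJP13_SSL_VSERVER_NEG": 0x08000000,
    "AJP13_SSL_VCLIENT_NEG": 0x04000000,
    "AJP13_SSL_VCRYPTO_NEG": 0x02000000,
    "AJP13_SSL_VMISC_NEG": 0x01000000,
}
AJP13_PROTO_SUPPORT_AJPXX_NEG = 0x00FF0000  # mask of protocol supported
PROTO_LEVELS = {
    "AJP13_PROTO_SUPPORT_AJP13L1_NEG": 0x00010000,
    "AJP13_PROTO_SUPPORT_AJP13L2_NEG": 0x00020000,
    "AJP13_PROTO_SUPPORT_AJP13L3_NEG": 0x00040000,
}


def describe_negotiation(word: int):
    """Return (feature flag names, protocol level names) set in `word`."""
    flags = [name for name, bit in NEGOTIATION_FLAGS.items() if word & bit]
    proto = word & AJP13_PROTO_SUPPORT_AJPXX_NEG
    levels = [name for name, bit in PROTO_LEVELS.items() if proto & bit]
    return flags, levels
```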
+
Failure Id | Number |
---|---|
AJP13_BAD_KEY_ERR | 0xFFFFFFFF |
AJP13_ENGINE_DOWN_ERR | 0xFFFFFFFE |
AJP13_RETRY_LATER_ERR | 0xFFFFFFFD |
AJP13_SHUT_AUTHOR_FAILED_ERR | 0xFFFFFFFC |
+
Failure Id | Number |
---|---|
AJP13_CONTEXT_DOWN | 0x01 |
AJP13_CONTEXT_UP | 0x02 |
AJP13_CONTEXT_OK | 0x03 |