summaryrefslogtreecommitdiffstats
path: root/src/ceph/doc/dev/rados-client-protocol.rst
diff options
context:
space:
mode:
Diffstat (limited to 'src/ceph/doc/dev/rados-client-protocol.rst')
-rw-r--r--src/ceph/doc/dev/rados-client-protocol.rst117
1 files changed, 0 insertions, 117 deletions
diff --git a/src/ceph/doc/dev/rados-client-protocol.rst b/src/ceph/doc/dev/rados-client-protocol.rst
deleted file mode 100644
index b05e753..0000000
--- a/src/ceph/doc/dev/rados-client-protocol.rst
+++ /dev/null
@@ -1,117 +0,0 @@
-RADOS client protocol
-=====================
-
-This is very incomplete, but one must start somewhere.
-
-Basics
-------
-
-Requests are MOSDOp messages. Replies are MOSDOpReply messages.
-
-An object request is targetted at an hobject_t, which includes a pool,
-hash value, object name, placement key (usually empty), and snapid.
-
-The hash value is a 32-bit hash value, normally generated by hashing
-the object name. The hobject_t can be arbitrarily constructed,
-though, with any hash value and name. Note that in the MOSDOp these
-components are spread across several fields and not logically
-assembled in an actual hobject_t member (mainly historical reasons).
-
-A request can also target a PG. In this case, the *ps* value matches
-a specific PG, the object name is empty, and (hopefully) the ops in
-the request are PG ops.
-
-Either way, the request ultimately targets a PG, either by using the
-explicit pgid or by folding the hash value onto the current number of
-pgs in the pool. The client sends the request to the primary for the
-assocated PG.
-
-Each request is assigned a unique tid.
-
-Resends
--------
-
-If there is a connection drop, the client will resend any outstanding
-requets.
-
-Any time there is a PG mapping change such that the primary changes,
-the client is responsible for resending the request. Note that
-although there may be an interval change from the OSD's perspective
-(triggering PG peering), if the primary doesn't change then the client
-need not resend.
-
-There are a few exceptions to this rule:
-
- * There is a last_force_op_resend field in the pg_pool_t in the
- OSDMap. If this changes, then the clients are forced to resend any
- outstanding requests. (This happens when tiering is adjusted, for
- example.)
- * Some requests are such that they are resent on *any* PG interval
- change, as defined by pg_interval_t's is_new_interval() (the same
- criteria used by peering in the OSD).
- * If the PAUSE OSDMap flag is set and unset.
-
-Each time a request is sent to the OSD the *attempt* field is incremented. The
-first time it is 0, the next 1, etc.
-
-Backoff
--------
-
-Ordinarily the OSD will simply queue any requests it can't immeidately
-process in memory until such time as it can. This can become
-problematic because the OSD limits the total amount of RAM consumed by
-incoming messages: if either of the thresholds for the number of
-messages or the number of bytes is reached, new messages will not be
-read off the network socket, causing backpressure through the network.
-
-In some cases, though, the OSD knows or expects that a PG or object
-will be unavailable for some time and does not want to consume memory
-by queuing requests. In these cases it can send a MOSDBackoff message
-to the client.
-
-A backoff request has four properties:
-
-#. the op code (block, unblock, or ack-block)
-#. *id*, a unique id assigned within this session
-#. hobject_t begin
-#. hobject_t end
-
-There are two types of backoff: a *PG* backoff will plug all requests
-targetting an entire PG at the client, as described by a range of the
-hash/hobject_t space [begin,end), while an *object* backoff will plug
-all requests targetting a single object (begin == end).
-
-When the client receives a *block* backoff message, it is now
-responsible for *not* sending any requests for hobject_ts described by
-the backoff. The backoff remains in effect until the backoff is
-cleared (via an 'unblock' message) or the OSD session is closed. A
-*ack_block* message is sent back to the OSD immediately to acknowledge
-receipt of the backoff.
-
-When an unblock is
-received, it will reference a specific id that the client previous had
-blocked. However, the range described by the unblock may be smaller
-than the original range, as the PG may have split on the OSD. The unblock
-should *only* unblock the range specified in the unblock message. Any requests
-that fall within the unblock request range are reexamined and, if no other
-installed backoff applies, resent.
-
-On the OSD, Backoffs are also tracked across ranges of the hash space, and
-exist in three states:
-
-#. new
-#. acked
-#. deleting
-
-A newly installed backoff is set to *new* and a message is sent to the
-client. When the *ack-block* message is received it is changed to the
-*acked* state. The OSD may process other messages from the client that
-are covered by the backoff in the *new* state, but once the backoff is
-*acked* it should never see a blocked request unless there is a bug.
-
-If the OSD wants to a remove a backoff in the *acked* state it can
-simply remove it and notify the client. If the backoff is in the
-*new* state it must move it to the *deleting* state and continue to
-use it to discard client requests until the *ack-block* message is
-received, at which point it can finally be removed. This is necessary to
-preserve the order of operations processed by the OSD.