summaryrefslogtreecommitdiffstats
path: root/src/ceph/doc/dev/osd_internals/backfill_reservation.rst
diff options
context:
space:
mode:
authorQiaowei Ren <qiaowei.ren@intel.com>2018-01-04 13:43:33 +0800
committerQiaowei Ren <qiaowei.ren@intel.com>2018-01-05 11:59:39 +0800
commit812ff6ca9fcd3e629e49d4328905f33eee8ca3f5 (patch)
tree04ece7b4da00d9d2f98093774594f4057ae561d4 /src/ceph/doc/dev/osd_internals/backfill_reservation.rst
parent15280273faafb77777eab341909a3f495cf248d9 (diff)
initial code repo
This patch creates initial code repo. For ceph, luminous stable release will be used for base code, and next changes and optimization for ceph will be added to it. For opensds, currently any changes can be upstreamed into original opensds repo (https://github.com/opensds/opensds), and so stor4nfv will directly clone opensds code to deploy stor4nfv environment. And the scripts for deployment based on ceph and opensds will be put into 'ci' directory. Change-Id: I46a32218884c75dda2936337604ff03c554648e4 Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Diffstat (limited to 'src/ceph/doc/dev/osd_internals/backfill_reservation.rst')
-rw-r--r--src/ceph/doc/dev/osd_internals/backfill_reservation.rst38
1 files changed, 38 insertions, 0 deletions
diff --git a/src/ceph/doc/dev/osd_internals/backfill_reservation.rst b/src/ceph/doc/dev/osd_internals/backfill_reservation.rst
new file mode 100644
index 0000000..cf9dab4
--- /dev/null
+++ b/src/ceph/doc/dev/osd_internals/backfill_reservation.rst
@@ -0,0 +1,38 @@
+====================
+Backfill Reservation
+====================
+
+When a new osd joins a cluster, all pgs containing it must eventually backfill
+to it. If all of these backfills happen simultaneously, it would put excessive
+load on the osd. osd_max_backfills limits the number of outgoing or
+incoming backfills on a single node. The maximum number of outgoing backfills is
+osd_max_backfills. The maximum number of incoming backfills is
+osd_max_backfills. Therefore there can be a maximum of osd_max_backfills * 2
+simultaneous backfills on one osd.
+
+Each OSDService now has two AsyncReserver instances: one for backfills going
+from the osd (local_reserver) and one for backfills going to the osd
+(remote_reserver). An AsyncReserver (common/AsyncReserver.h) manages a queue
+by priority of waiting items and a set of current reservation holders. When a
+slot frees up, the AsyncReserver queues the Context* associated with the next
+item on the highest priority queue in the finisher provided to the constructor.
+
+For a primary to initiate a backfill, it must first obtain a reservation from
+its own local_reserver. Then, it must obtain a reservation from the backfill
+target's remote_reserver via a MBackfillReserve message. This process is
+managed by substates of Active and ReplicaActive (see the substates of Active
+in PG.h). The reservations are dropped either on the Backfilled event, which
+is sent on the primary before calling recovery_complete and on the replica on
+receipt of the BackfillComplete progress message), or upon leaving Active or
+ReplicaActive.
+
+It's important that we always grab the local reservation before the remote
+reservation in order to prevent a circular dependency.
+
+We want to minimize the risk of data loss by prioritizing the order in
+which PGs are recovered. The highest priority is log based recovery
+(OSD_RECOVERY_PRIORITY_MAX) since this must always complete before
+backfill can start. The next priority is backfill of degraded PGs and
+is a function of the degradation. A backfill for a PG missing two
+replicas will have a priority higher than a backfill for a PG missing
+one replica. The lowest priority is backfill of non-degraded PGs.