====================
Recovery Reservation
====================

Recovery reservation extends and subsumes backfill reservation. The
reservation system from backfill recovery is used for local and remote
reservations.

When a PG goes active, it first determines what type of recovery is
necessary, if any. It may need log-based recovery, backfill recovery,
both, or neither.

In log-based recovery, the primary first acquires a local reservation
from the OSDService's local_reserver. Then an MRemoteReservationRequest
message is sent to each replica in order of OSD number. These requests
are always granted eventually (i.e., they cannot be rejected), but a
grant may be delayed if the remote has already handed out all of its
remote reservation slots.
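
The "never rejected, but possibly delayed" behaviour described above can be
sketched with a toy slot-based reserver. This is an illustrative model only:
the class ``SlotReserver`` and the helper ``remote_request_order`` are
hypothetical names, not Ceph's actual reserver API, and the sketch ignores any
prioritisation the real reserver may apply.

```python
from collections import deque


class SlotReserver:
    """Toy reserver with a fixed number of slots (hypothetical, not Ceph code)."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.granted = set()    # PGs that currently hold a slot
        self.waiting = deque()  # PGs queued in arrival order

    def request(self, pg):
        """Never rejects. Returns True if granted now, False if queued."""
        if len(self.granted) < self.max_slots:
            self.granted.add(pg)
            return True
        self.waiting.append(pg)
        return False

    def release(self, pg):
        """Free pg's slot and grant it to the oldest waiter, if any.

        Returns the newly granted PG, or None when nobody was waiting.
        """
        self.granted.discard(pg)
        if self.waiting and len(self.granted) < self.max_slots:
            nxt = self.waiting.popleft()
            self.granted.add(nxt)
            return nxt
        return None


def remote_request_order(replica_osds):
    """Replicas are asked for reservations in ascending OSD number."""
    return sorted(replica_osds)
```

With a single slot, a second request is queued rather than refused, and it is
granted as soon as the first holder releases.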

After all reservations are acquired, log-based recovery proceeds as it
would without the reservation system.

After log-based recovery completes, the primary releases all remote
reservations. The local reservation remains held. The primary then
determines whether backfill is necessary. If it is not necessary, the
primary releases its local reservation and waits in the Recovered state
for all OSDs to indicate that they are clean.

If backfill recovery occurs after log-based recovery, the local
reservation does not need to be reacquired since it is still held from
before. If it occurs immediately after activation (log-based recovery
not possible/necessary), the local reservation is acquired according to
the typical process.

Once the primary has its local reservation, it requests a remote
reservation from the backfill target. This reservation CAN be rejected,
for instance if the OSD is too full (see the backfillfull_ratio OSD
setting). If the reservation is rejected, the primary drops its local
reservation, waits for osd_backfill_retry_interval, and then retries. It
will retry indefinitely.
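
The reject-and-retry loop above might look roughly like the following sketch.
The function name and its callback parameters are hypothetical stand-ins
introduced for illustration; only the order of operations (acquire local,
request remote, drop local on rejection, wait, retry) is taken from the text.

```python
def reserve_for_backfill(acquire_local, release_local, request_remote,
                         wait, retry_interval):
    """Loop until the backfill target grants its remote reservation.

    All arguments except retry_interval are caller-supplied callables
    (hypothetical stand-ins for the real reservation machinery).
    request_remote() returns True when granted and False when rejected,
    for example because the target is too full.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:  # the text says the primary retries indefinitely
        attempts += 1
        acquire_local()
        if request_remote():
            return attempts  # both reservations are now held
        # Rejected: give the local slot back so other PGs can use it
        # while this one waits.
        release_local()
        wait(retry_interval)
```

In a real daemon the wait would be a timer event rather than a blocking call;
passing ``wait`` in as a callable keeps the sketch testable without sleeping.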

Once the primary has the local and remote reservations, backfill
proceeds as usual. After backfill completes the remote reservation is
dropped.

Finally, after backfill (or log-based recovery if backfill was not
necessary), the primary drops the local reservation and enters the
Recovered state. Once all the OSDs have reported that they are clean,
the primary enters the Clean state and marks itself active+clean.
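
As a summary, the sequence described in this section can be written out as a
small function that lists the reservation steps in order. The function and
its step labels are hypothetical and purely illustrative; it assumes every
reservation is granted, so the rejection and retry path is left out.

```python
def reservation_steps(needs_log_recovery, needs_backfill,
                      replicas=(), backfill_target=None):
    """Return the ordered reservation steps for one PG (illustrative only)."""
    steps = []
    local_held = False

    if needs_log_recovery:
        steps.append("acquire local")
        local_held = True
        # Remote reservations are requested in ascending OSD number.
        for osd in sorted(replicas):
            steps.append(f"request remote osd.{osd}")
        steps.append("log-based recovery")
        # Remote reservations are released; the local one stays held.
        steps.append("release all remote")

    if needs_backfill:
        # Reuse the local reservation if log-based recovery already took it.
        if not local_held:
            steps.append("acquire local")
            local_held = True
        steps.append(f"request remote osd.{backfill_target}")
        steps.append("backfill")
        steps.append("release remote")

    if local_held:
        steps.append("release local")
    steps.append("Recovered")
    steps.append("Clean")
    return steps
```

For a PG that needs both kinds of recovery, "acquire local" appears exactly
once in the result, which reflects the point that the local reservation is
carried over from log-based recovery into backfill.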


--------------
Things to Note
--------------

We always grab the local reservation first, to prevent a circular
dependency. We grab remote reservations in order of OSD number for the
same reason.
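
Why a fixed acquisition order matters can be shown with a generic toy
simulation. It does not model Ceph's reservers in detail: the function
``all_complete`` is hypothetical, each OSD is reduced to a counter of slots,
and PGs take turns in round-robin fashion.

```python
def all_complete(orders, slots=1):
    """Simulate PGs acquiring reservations one OSD at a time.

    orders maps a PG name to the list of OSD ids it requests, in order.
    Each OSD can grant `slots` reservations at once. A PG that holds all
    of its reservations finishes and releases them. Returns True if every
    PG finishes, False if the PGs end up waiting on each other forever.
    """
    held = {}                           # osd id -> set of PGs holding a slot
    progress = {pg: 0 for pg in orders}  # reservations acquired so far
    done = set()
    while len(done) < len(orders):
        advanced = False
        for pg, order in orders.items():
            if pg in done:
                continue
            if progress[pg] < len(order):
                holders = held.setdefault(order[progress[pg]], set())
                if len(holders) < slots:
                    holders.add(pg)
                    progress[pg] += 1
                    advanced = True
            if progress[pg] == len(order):
                # Fully reserved: do the work, then release everything.
                for osd in order:
                    held[osd].discard(pg)
                done.add(pg)
                advanced = True
        if not advanced:
            return False  # nobody made progress: circular wait
    return True
```

With ``{"A": [1, 2], "B": [2, 1]}`` each PG grabs one slot and then waits on
the other, so the function returns ``False``. When both PGs request in
ascending OSD number, ``{"A": [1, 2], "B": [1, 2]}``, it returns ``True``.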

The recovery reservation state chart controls the PG state as reported
to the monitor. The state chart can set:

 - recovery_wait: waiting for local/remote reservations
 - recovering: log-based recovery in progress
 - recovery_toofull: recovery stopped, OSD(s) above full ratio
 - backfill_wait: waiting for remote backfill reservations
 - backfilling: backfill in progress
 - backfill_toofull: backfill stopped, OSD(s) above backfillfull ratio


--------
See Also
--------

The Active substate of the automatically generated OSD state diagram.