author: Qiaowei Ren <qiaowei.ren@intel.com> 2018-01-04 13:43:33 +0800
committer: Qiaowei Ren <qiaowei.ren@intel.com> 2018-01-05 11:59:39 +0800
commit: 812ff6ca9fcd3e629e49d4328905f33eee8ca3f5
tree: 04ece7b4da00d9d2f98093774594f4057ae561d4 /src/ceph/doc/dev/osd_internals/last_epoch_started.rst
parent: 15280273faafb77777eab341909a3f495cf248d9
initial code repo
This patch creates initial code repo. For ceph, luminous stable release will be used for base code, and next changes and optimization for ceph will be added to it. For opensds, currently any changes can be upstreamed into original opensds repo (https://github.com/opensds/opensds), and so stor4nfv will directly clone opensds code to deploy stor4nfv environment. And the scripts for deployment based on ceph and opensds will be put into 'ci' directory.

Change-Id: I46a32218884c75dda2936337604ff03c554648e4
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Diffstat (limited to 'src/ceph/doc/dev/osd_internals/last_epoch_started.rst')
-rw-r--r-- src/ceph/doc/dev/osd_internals/last_epoch_started.rst | 60
1 files changed, 60 insertions, 0 deletions
diff --git a/src/ceph/doc/dev/osd_internals/last_epoch_started.rst b/src/ceph/doc/dev/osd_internals/last_epoch_started.rst
new file mode 100644
index 0000000..9978bd3
--- /dev/null
+++ b/src/ceph/doc/dev/osd_internals/last_epoch_started.rst
@@ -0,0 +1,60 @@
+======================
+last_epoch_started
+======================
+
+info.last_epoch_started records an activation epoch e for interval i
+such that all writes committed in i or earlier are reflected in the
+local info/log and no writes after i are reflected in the local
+info/log. Since no committed write is ever divergent, even if we
+get an authoritative log/info with an older info.last_epoch_started,
+we can leave our info.last_epoch_started alone since no writes could
+have committed in any intervening interval (see PG::proc_master_log).
+
+info.history.last_epoch_started records a lower bound on the most
+recent interval in which the pg as a whole went active and accepted
+writes. On a particular osd, it is also an upper bound on the
+activation epoch of intervals in which writes in the local pg log
+occurred (we update it before accepting writes). Because all
+committed writes are committed by all acting set osds, any
+non-divergent writes ensure that history.last_epoch_started was
+recorded by all acting set members in the interval. Once peering has
+queried one osd from each interval back to some seen
+history.last_epoch_started, it follows that no interval after the max
+history.last_epoch_started can have reported writes as committed
+(since we record it before recording client writes in an interval).
+Thus, the minimum last_update across all infos with
+info.last_epoch_started >= MAX(history.last_epoch_started) must be an
+upper bound on writes reported as committed to the client.
+
+We update info.last_epoch_started with the initial activation message,
+but we only update history.last_epoch_started after the new
+info.last_epoch_started is persisted (possibly along with the first
+write). This ensures that we do not require an osd with the most
+recent info.last_epoch_started until all acting set osds have recorded
+it.
+
+In find_best_info, we do include info.last_epoch_started values when
+calculating the max_last_epoch_started_found because we want to avoid
+designating a log entry divergent which in a prior interval would have
+been non-divergent since it might have been used to serve a read. In
+activate(), we use the peer's last_epoch_started value as a bound on
+how far back divergent log entries can be found.
+
+However, in a case like
+
+.. code::
+
+ calc_acting osd.0 1.4e( v 473'302 (292'200,473'302] local-les=473 n=4 ec=5 les/c 473/473 556/556/556
+ calc_acting osd.1 1.4e( v 473'302 (293'202,473'302] lb 0//0//-1 local-les=477 n=0 ec=5 les/c 473/473 556/556/556
+ calc_acting osd.4 1.4e( v 473'302 (120'121,473'302] local-les=473 n=4 ec=5 les/c 473/473 556/556/556
+ calc_acting osd.5 1.4e( empty local-les=0 n=0 ec=5 les/c 473/473 556/556/556
+
+since osd.1 is the only one which recorded info.les=477, while osd.4 and
+osd.0, which were the acting set in that interval, did not (osd.4
+restarted and osd.0 did not get the message in time), the pg is marked
+incomplete when either osd.4 or osd.0 would have been a valid choice. To
+avoid this, we do not consider info.les for incomplete peers when
+calculating max_last_epoch_started_found. An incomplete peer would not
+have been in the acting set, so we must have another osd from that
+interval anyway (if maybe_went_rw). If that osd does not remember that
+info.les, then we cannot have served reads.