From 7da45d65be36d36b880cc55c5036e96c24b53f00 Mon Sep 17 00:00:00 2001
From: Qiaowei Ren
Date: Thu, 1 Mar 2018 14:38:11 +0800
Subject: remove ceph code

This patch removes initial ceph code, due to license issue.

Change-Id: I092d44f601cdf34aed92300fe13214925563081c
Signed-off-by: Qiaowei Ren
---
 src/ceph/doc/dev/osd_internals/snaps.rst | 128 -------------------------------
 1 file changed, 128 deletions(-)
 delete mode 100644 src/ceph/doc/dev/osd_internals/snaps.rst

diff --git a/src/ceph/doc/dev/osd_internals/snaps.rst b/src/ceph/doc/dev/osd_internals/snaps.rst
deleted file mode 100644
index e17378f..0000000
--- a/src/ceph/doc/dev/osd_internals/snaps.rst
+++ /dev/null
@@ -1,128 +0,0 @@
-======
-Snaps
-======
-
-Overview
---------
-Rados supports two related snapshotting mechanisms:
-
-  1. *pool snaps*: snapshots are implicitly applied to all objects
-     in a pool
-  2. *self managed snaps*: the user must provide the current *SnapContext*
-     on each write.
-
-These two are mutually exclusive, only one or the other can be used on
-a particular pool.
-
-The *SnapContext* is the set of snapshots currently defined for an object
-as well as the most recent snapshot (the *seq*) requested from the mon for
-sequencing purposes (a *SnapContext* with a newer *seq* is considered to
-be more recent).
-
-The difference between *pool snaps* and *self managed snaps* from the
-OSD's point of view lies in whether the *SnapContext* comes to the OSD
-via the client's MOSDOp or via the most recent OSDMap.
-
-See OSD::make_writeable
-
-Ondisk Structures
------------------
-Each object has in the pg collection a *head* object (or *snapdir*, which we
-will come to shortly) and possibly a set of *clone* objects.
-Each hobject_t has a snap field. For the *head* (the only writeable version
-of an object), the snap field is set to CEPH_NOSNAP. For the *clones*, the
-snap field is set to the *seq* of the *SnapContext* at their creation.
-When the OSD services a write, it first checks whether the most recent
-*clone* is tagged with a snapid prior to the most recent snap represented
-in the *SnapContext*. If so, at least one snapshot has occurred between
-the time of the write and the time of the last clone. Therefore, prior
-to performing the mutation, the OSD creates a new clone for servicing
-reads on snaps between the snapid of the last clone and the most recent
-snapid.
-
-The *head* object contains a *SnapSet* encoded in an attribute, which tracks
-
-  1. The full set of snaps defined for the object
-  2. The full set of clones which currently exist
-  3. Overlapping intervals between clones for tracking space usage
-  4. Clone size
-
-If the *head* is deleted while there are still clones, a *snapdir* object
-is created instead to house the *SnapSet*.
-
-Additionally, the *object_info_t* on each clone includes a vector of snaps
-for which clone is defined.
-
-Snap Removal
-------------
-To remove a snapshot, a request is made to the *Monitor* cluster to
-add the snapshot id to the list of purged snaps (or to remove it from
-the set of pool snaps in the case of *pool snaps*). In either case,
-the *PG* adds the snap to its *snap_trimq* for trimming.
-
-A clone can be removed when all of its snaps have been removed. In
-order to determine which clones might need to be removed upon snap
-removal, we maintain a mapping from snap to *hobject_t* using the
-*SnapMapper*.
-
-See PrimaryLogPG::SnapTrimmer, SnapMapper
-
-This trimming is performed asynchronously by the snap_trim_wq while the
-pg is clean and not scrubbing.
-
-  #. The next snap in PG::snap_trimq is selected for trimming
-  #. We determine the next object for trimming out of PG::snap_mapper.
-     For each object, we create a log entry and repop updating the
-     object info and the snap set (including adjusting the overlaps).
-     If the object is a clone which no longer belongs to any live snapshots,
-     it is removed here.
-     (See PrimaryLogPG::trim_object() when new_snaps
-     is empty.)
-  #. We also locally update our *SnapMapper* instance with the object's
-     new snaps.
-  #. The log entry containing the modification of the object also
-     contains the new set of snaps, which the replica uses to update
-     its own *SnapMapper* instance.
-  #. The primary shares the info with the replica, which persists
-     the new set of purged_snaps along with the rest of the info.
-
-
-
-Recovery
---------
-Because the trim operations are implemented using repops and log entries,
-normal pg peering and recovery maintain the snap trimmer operations with
-the caveat that push and removal operations need to update the local
-*SnapMapper* instance. If the purged_snaps update is lost, we merely
-retrim a now empty snap.
-
-SnapMapper
-----------
-*SnapMapper* is implemented on top of map_cacher,
-which provides an interface over a backing store such as the filesystem
-with async transactions. While transactions are incomplete, the map_cacher
-instance buffers unstable keys allowing consistent access without having
-to flush the filestore. *SnapMapper* provides two mappings:
-
-  1. hobject_t -> set<snapid_t>: stores the set of snaps for each clone
-     object
-  2. snapid_t -> hobject_t: stores the set of hobjects with the snapshot
-     as one of its snaps
-
-Assumption: there are lots of hobjects and relatively few snaps. The
-first encoding has a stringification of the object as the key and an
-encoding of the set of snaps as a value. The second mapping, because there
-might be many hobjects for a single snap, is stored as a collection of keys
-of the form stringify(snap)_stringify(object) such that stringify(snap)
-is constant length. These keys have a bufferlist encoding
-pair<snapid, hobject_t> as a value. Thus, creating or trimming a single
-object does not involve reading all objects for any snap.
-Additionally, upon construction, the *SnapMapper* is provided with a mask
-for filtering the objects in the single SnapMapper keyspace belonging to
-that pg.
-
-Split
------
-The snapid_t -> hobject_t key entries are arranged such that for any pg,
-up to 8 prefixes need to be checked to determine all hobjects in a particular
-snap for a particular pg. Upon split, the prefixes to check on the parent
-are adjusted such that only the objects remaining in the pg will be visible.
-The children will immediately have the correct mapping.
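
Note on the removed document: the two SnapMapper mappings it describes can be illustrated with a small in-memory sketch. This is not the Ceph implementation. The class name `ToySnapMapper`, the helper `snap_key`, the 16-digit hex width, and the use of plain strings for objects are all assumptions made for illustration. The sketch only shows why a fixed-length stringify(snap) prefix lets one object's entries be created or trimmed without reading every object in a snap.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a fixed hex width, so every stringified snap has the same length.
SNAP_WIDTH = 16


def snap_key(snap: int, obj: str) -> str:
    """Build a key of the form stringify(snap)_stringify(object)."""
    return f"{snap:0{SNAP_WIDTH}X}_{obj}"


class ToySnapMapper:
    """Hypothetical in-memory stand-in for the two mappings in the text."""

    def __init__(self) -> None:
        # Mapping 1: object -> set of snaps for that clone.
        self.by_object: Dict[str, Set[int]] = {}
        # Mapping 2: one key per (snap, object) pair, with the pair as value.
        self.by_snap: Dict[str, Tuple[int, str]] = {}

    def set_snaps(self, obj: str, snaps: Set[int]) -> None:
        """Replace the snaps of one clone, touching only that clone's keys."""
        old = self.by_object.get(obj, set())
        for snap in old - snaps:
            del self.by_snap[snap_key(snap, obj)]
        for snap in snaps - old:
            self.by_snap[snap_key(snap, obj)] = (snap, obj)
        if snaps:
            self.by_object[obj] = set(snaps)
        else:
            # No live snaps left: the clone drops out of the mapper entirely.
            self.by_object.pop(obj, None)

    def objects_in_snap(self, snap: int) -> List[str]:
        """List the objects in one snap by matching the fixed-length prefix."""
        prefix = snap_key(snap, "")
        return sorted(o for k, (_, o) in self.by_snap.items() if k.startswith(prefix))


mapper = ToySnapMapper()
mapper.set_snaps("clone_a", {1, 2})
mapper.set_snaps("clone_b", {2})
mapper.set_snaps("clone_a", {2})  # snap 1 is trimmed from clone_a
print(mapper.objects_in_snap(2))
```

A real backing store would serve the prefix lookup with a range scan over ordered keys. The linear filter here only keeps the sketch short.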