summaryrefslogtreecommitdiffstats
path: root/src/ceph/doc/dev/file-striping.rst
diff options
context:
space:
mode:
authorQiaowei Ren <qiaowei.ren@intel.com>2018-01-04 13:43:33 +0800
committerQiaowei Ren <qiaowei.ren@intel.com>2018-01-05 11:59:39 +0800
commit812ff6ca9fcd3e629e49d4328905f33eee8ca3f5 (patch)
tree04ece7b4da00d9d2f98093774594f4057ae561d4 /src/ceph/doc/dev/file-striping.rst
parent15280273faafb77777eab341909a3f495cf248d9 (diff)
initial code repo
This patch creates initial code repo. For ceph, luminous stable release will be used for base code, and next changes and optimization for ceph will be added to it. For opensds, currently any changes can be upstreamed into original opensds repo (https://github.com/opensds/opensds), and so stor4nfv will directly clone opensds code to deploy stor4nfv environment. And the scripts for deployment based on ceph and opensds will be put into 'ci' directory. Change-Id: I46a32218884c75dda2936337604ff03c554648e4 Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Diffstat (limited to 'src/ceph/doc/dev/file-striping.rst')
-rw-r--r--src/ceph/doc/dev/file-striping.rst161
1 files changed, 161 insertions, 0 deletions
diff --git a/src/ceph/doc/dev/file-striping.rst b/src/ceph/doc/dev/file-striping.rst
new file mode 100644
index 0000000..405c971
--- /dev/null
+++ b/src/ceph/doc/dev/file-striping.rst
@@ -0,0 +1,161 @@
+File striping
+=============
+
+The text below describes how files from Ceph file system clients are
+stored across objects stored in RADOS.
+
+ceph_file_layout
+----------------
+
+Ceph distributes (stripes) the data for a given file across a number
+of underlying objects. The way file data is mapped to those objects
+is defined by the ceph_file_layout structure. The data distribution
+is a modified RAID 0, where data is striped across a set of objects up
+to a (per-file) fixed size, at which point another set of objects
+holds the file's data. The second set also holds no more than the
+fixed amount of data, and then another set is used, and so on.
+
+Defining some terminology will go a long way toward explaining the
+way file data is laid out across Ceph objects.
+
+- file
+ A collection of contiguous data, named from the perspective of
+ the Ceph client (i.e., a file on a Linux system using Ceph
+ storage). The data for a file is divided into fixed-size
+ "stripe units," which are stored in ceph "objects."
+- stripe unit
+ The size (in bytes) of a block of data used in the RAID 0
+ distribution of a file. All stripe units for a file have equal
+ size. The last stripe unit is typically incomplete--i.e. it
+ represents the data at the end of the file as well as unused
+ "space" beyond it up to the end of the fixed stripe unit size.
+- stripe count
+ The number of consecutive stripe units that constitute a RAID 0
+ "stripe" of file data.
+- stripe
+ A contiguous range of file data, RAID 0 striped across "stripe
+ count" objects in fixed-size "stripe unit" blocks.
+- object
+ A collection of data maintained by Ceph storage. Objects are
+ used to hold portions of Ceph client files.
+- object set
+ A set of objects that together represent a contiguous portion of
+ a file.
+
+Three fields in the ceph_file_layout structure define this mapping::
+
+ u32 fl_stripe_unit;
+ u32 fl_stripe_count;
+ u32 fl_object_size;
+
+(They are actually maintained in their on-disk format, __le32.)
+
+The role of the first two fields should be clear from the
+definitions above.
+
+The third field is the maximum size (in bytes) of an object used to
+back file data. The object size is a multiple of the stripe unit.
+
+A file's data is blocked into stripe units, and consecutive stripe
+units are stored on objects in an object set. The number of objects
+in a set is the same as the stripe count. No object storing file
+data will exceed the file's designated object size, so after some
+fixed number of complete stripes, a new object set is used to store
+subsequent file data.
+
+Note that by default, Ceph uses a simple striping strategy in which
+object_size equals stripe_unit and stripe_count is 1. This simply
+puts one stripe_unit in each object.
+
+Here's a more complex example::
+
+ file size = 1 trillion = 1000000000000 bytes
+
+ fl_stripe_unit = 64KB = 65536 bytes
+ fl_stripe_count = 5 stripe units per stripe
+ fl_object_size = 64GB = 68719476736 bytes
+
+This means::
+
+ file stripe size = 64KB * 5 = 320KB = 327680 bytes
+ each object holds 64GB / 64KB = 1048576 stripe units
+ file object set size = 64GB * 5 = 320GB = 343597383680 bytes
+ (also 1048576 stripe units * 327680 bytes per stripe unit)
+
+So the file's 1 trillion bytes can be divided into complete object
+sets, then complete stripes, then complete stripe units, and finally
+a single incomplete stripe unit::
+
+ - 1 trillion bytes / 320GB per object set = 2 complete object sets
+ (with 312805232640 bytes remaining)
+ - 312805232640 bytes / 320KB per stripe = 954605 complete stripes
+ (with 266240 bytes remaining)
+ - 266240 bytes / 64KB per stripe unit = 4 complete stripe units
+ (with 4096 bytes remaining)
+ - and the final incomplete stripe unit holds those 4096 bytes.
+
+The ASCII art below attempts to capture this::
+
+ _________ _________ _________ _________ _________
+ /object 0\ /object 1\ /object 2\ /object 3\ /object 4\
+ +=========+ +=========+ +=========+ +=========+ +=========+
+ | stripe | | stripe | | stripe | | stripe | | stripe |
+ o | unit | | unit | | unit | | unit | | unit | stripe 0
+ b | 0 | | 1 | | 2 | | 3 | | 4 |
+ j |---------| |---------| |---------| |---------| |---------|
+ e | stripe | | stripe | | stripe | | stripe | | stripe |
+ c | unit | | unit | | unit | | unit | | unit | stripe 1
+ t | 5 | | 6 | | 7 | | 8 | | 9 |
+ |---------| |---------| |---------| |---------| |---------|
+ s | . | | . | | . | | . | | . |
+ e . . . . .
+ t | . | | . | | . | | . | | . |
+ |---------| |---------| |---------| |---------| |---------|
+ 0 | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ | unit | | unit | | unit | | unit | | unit | 1048575
+ | 5242875 | | 5242876 | | 5242877 | | 5242878 | | 5242879 |
+ \=========/ \=========/ \=========/ \=========/ \=========/
+
+ _________ _________ _________ _________ _________
+ /object 5\ /object 6\ /object 7\ /object 8\ /object 9\
+ +=========+ +=========+ +=========+ +=========+ +=========+
+ | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ o | unit | | unit | | unit | | unit | | unit | 1048576
+ b | 5242880 | | 5242881 | | 5242882 | | 5242883 | | 5242884 |
+ j |---------| |---------| |---------| |---------| |---------|
+ e | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ c | unit | | unit | | unit | | unit | | unit | 1048577
+ t | 5242885 | | 5242886 | | 5242887 | | 5242888 | | 5242889 |
+ |---------| |---------| |---------| |---------| |---------|
+ s | . | | . | | . | | . | | . |
+ e . . . . .
+ t | . | | . | | . | | . | | . |
+ |---------| |---------| |---------| |---------| |---------|
+ 1 | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ | unit | | unit | | unit | | unit | | unit | 2097151
+ | 10485755| | 10485756| | 10485757| | 10485758| | 10485759|
+ \=========/ \=========/ \=========/ \=========/ \=========/
+
+ _________ _________ _________ _________ _________
+ /object 10\ /object 11\ /object 12\ /object 13\ /object 14\
+ +=========+ +=========+ +=========+ +=========+ +=========+
+ | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ o | unit | | unit | | unit | | unit | | unit | 2097152
+ b | 10485760| | 10485761| | 10485762| | 10485763| | 10485764|
+ j |---------| |---------| |---------| |---------| |---------|
+ e | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ c | unit | | unit | | unit | | unit | | unit | 2097153
+ t | 10485765| | 10485766| | 10485767| | 10485768| | 10485769|
+ |---------| |---------| |---------| |---------| |---------|
+ s | . | | . | | . | | . | | . |
+ e . . . . .
+ t | . | | . | | . | | . | | . |
+ |---------| |---------| |---------| |---------| |---------|
+ 2 | stripe | | stripe | | stripe | | stripe | | stripe | stripe
+ | unit | | unit | | unit | | unit | | unit | 3051756
+ | 15258780| | 15258781| | 15258782| | 15258783| | 15258784|
+ |---------| |---------| |---------| |---------| |---------|
+ | stripe | | stripe | | stripe | | stripe | | (partial| (partial
+ | unit | | unit | | unit | | unit | | stripe | stripe
+ | 15258785| | 15258786| | 15258787| | 15258788| | unit) | 3051757)
+ \=========/ \=========/ \=========/ \=========/ \=========/