From 7da45d65be36d36b880cc55c5036e96c24b53f00 Mon Sep 17 00:00:00 2001 From: Qiaowei Ren Date: Thu, 1 Mar 2018 14:38:11 +0800 Subject: remove ceph code This patch removes initial ceph code, due to license issue. Change-Id: I092d44f601cdf34aed92300fe13214925563081c Signed-off-by: Qiaowei Ren --- .../erasure_coding/developer_notes.rst | 223 --------------------- 1 file changed, 223 deletions(-) delete mode 100644 src/ceph/doc/dev/osd_internals/erasure_coding/developer_notes.rst (limited to 'src/ceph/doc/dev/osd_internals/erasure_coding/developer_notes.rst') diff --git a/src/ceph/doc/dev/osd_internals/erasure_coding/developer_notes.rst b/src/ceph/doc/dev/osd_internals/erasure_coding/developer_notes.rst deleted file mode 100644 index 770ff4a..0000000 --- a/src/ceph/doc/dev/osd_internals/erasure_coding/developer_notes.rst +++ /dev/null @@ -1,223 +0,0 @@ -============================ -Erasure Code developer notes -============================ - -Introduction ------------- - -Each chapter of this document explains an aspect of the implementation -of the erasure code within Ceph. It is mostly based on examples being -explained to demonstrate how things work. - -Reading and writing encoded chunks from and to OSDs ---------------------------------------------------- - -An erasure coded pool stores each object as K+M chunks. It is divided -into K data chunks and M coding chunks. The pool is configured to have -a size of K+M so that each chunk is stored in an OSD in the acting -set. The rank of the chunk is stored as an attribute of the object. - -Let's say an erasure coded pool is created to use five OSDs ( K+M = -5 ) and sustain the loss of two of them ( M = 2 ). - -When the object *NYAN* containing *ABCDEFGHI* is written to it, the -erasure encoding function splits the content in three data chunks, -simply by dividing the content in three : the first contains *ABC*, -the second *DEF* and the last *GHI*. The content will be padded if the -content length is not a multiple of K. The function also creates two -coding chunks : the fourth with *YXY* and the fifth with *GQC*. Each -chunk is stored in an OSD in the acting set. The chunks are stored in -objects that have the same name ( *NYAN* ) but reside on different -OSDs. The order in which the chunks were created must be preserved and -is stored as an attribute of the object ( shard_t ), in addition to its -name. Chunk *1* contains *ABC* and is stored on *OSD5* while chunk *4* -contains *YXY* and is stored on *OSD3*. - -:: - - +-------------------+ - name | NYAN | - +-------------------+ - content | ABCDEFGHI | - +--------+----------+ - | - | - v - +------+------+ - +---------------+ encode(3,2) +-----------+ - | +--+--+---+---+ | - | | | | | - | +-------+ | +-----+ | - | | | | | - +--v---+ +--v---+ +--v---+ +--v---+ +--v---+ - name | NYAN | | NYAN | | NYAN | | NYAN | | NYAN | - +------+ +------+ +------+ +------+ +------+ - shard | 1 | | 2 | | 3 | | 4 | | 5 | - +------+ +------+ +------+ +------+ +------+ - content | ABC | | DEF | | GHI | | YXY | | QGC | - +--+---+ +--+---+ +--+---+ +--+---+ +--+---+ - | | | | | - | | | | | - | | +--+---+ | | - | | | OSD1 | | | - | | +------+ | | - | | +------+ | | - | +------>| OSD2 | | | - | +------+ | | - | +------+ | | - | | OSD3 |<----+ | - | +------+ | - | +------+ | - | | OSD4 |<--------------+ - | +------+ - | +------+ - +----------------->| OSD5 | - +------+ - - - - -When the object *NYAN* is read from the erasure coded pool, the -decoding function reads three chunks : chunk *1* containing *ABC*, -chunk *3* containing *GHI* and chunk *4* containing *YXY* and rebuild -the original content of the object *ABCDEFGHI*. The decoding function -is informed that the chunks *2* and *5* are missing ( they are called -*erasures* ). The chunk *5* could not be read because the *OSD4* is -*out*. - -The decoding function could be called as soon as three chunks are -read : *OSD2* was the slowest and its chunk does not need to be taken into -account. This optimization is not implemented in Firefly. - -:: - - +-------------------+ - name | NYAN | - +-------------------+ - content | ABCDEFGHI | - +--------+----------+ - ^ - | - | - +------+------+ - | decode(3,2) | - | erasures 2,5| - +-------------->| | - | +-------------+ - | ^ ^ - | | +-----+ - | | | - +--+---+ +------+ +--+---+ +--+---+ - name | NYAN | | NYAN | | NYAN | | NYAN | - +------+ +------+ +------+ +------+ - shard | 1 | | 2 | | 3 | | 4 | - +------+ +------+ +------+ +------+ - content | ABC | | DEF | | GHI | | YXY | - +--+---+ +--+---+ +--+---+ +--+---+ - ^ . ^ ^ - | TOO . | | - | SLOW . +--+---+ | - | ^ | OSD1 | | - | | +------+ | - | | +------+ | - | +-------| OSD2 | | - | +------+ | - | +------+ | - | | OSD3 |-----+ - | +------+ - | +------+ - | | OSD4 | OUT - | +------+ - | +------+ - +------------------| OSD5 | - +------+ - - -Erasure code library --------------------- - -Using `Reed-Solomon `_, -with parameters K+M, object O is encoded by dividing it into chunks O1, -O2, ... OM and computing coding chunks P1, P2, ... PK. Any K chunks -out of the available K+M chunks can be used to obtain the original -object. If data chunk O2 or coding chunk P2 are lost, they can be -repaired using any K chunks out of the K+M chunks. If more than M -chunks are lost, it is not possible to recover the object. - -Reading the original content of object O can be a simple -concatenation of O1, O2, ... OM, because the plugins are using -`systematic codes -`_. Otherwise the chunks -must be given to the erasure code library *decode* method to retrieve -the content of the object. - -Performance depend on the parameters to the encoding functions and -is also influenced by the packet sizes used when calling the encoding -functions ( for Cauchy or Liberation for instance ): smaller packets -means more calls and more overhead. - -Although Reed-Solomon is provided as a default, Ceph uses it via an -`abstract API `_ designed to -allow each pool to choose the plugin that implements it using -key=value pairs stored in an `erasure code profile`_. - -.. _erasure code profile: ../../../erasure-coded-pool - -:: - - $ ceph osd erasure-code-profile set myprofile \ - crush-failure-domain=osd - $ ceph osd erasure-code-profile get myprofile - directory=/usr/lib/ceph/erasure-code - k=2 - m=1 - plugin=jerasure - technique=reed_sol_van - crush-failure-domain=osd - $ ceph osd pool create ecpool 12 12 erasure myprofile - -The *plugin* is dynamically loaded from *directory* and expected to -implement the *int __erasure_code_init(char *plugin_name, char *directory)* function -which is responsible for registering an object derived from *ErasureCodePlugin* -in the registry. The `ErasureCodePluginExample `_ plugin reads: - -:: - - ErasureCodePluginRegistry &instance = - ErasureCodePluginRegistry::instance(); - instance.add(plugin_name, new ErasureCodePluginExample()); - -The *ErasureCodePlugin* derived object must provide a factory method -from which the concrete implementation of the *ErasureCodeInterface* -object can be generated. The `ErasureCodePluginExample plugin `_ reads: - -:: - - virtual int factory(const map ¶meters, - ErasureCodeInterfaceRef *erasure_code) { - *erasure_code = ErasureCodeInterfaceRef(new ErasureCodeExample(parameters)); - return 0; - } - -The *parameters* argument is the list of *key=value* pairs that were -set in the erasure code profile, before the pool was created. - -:: - - ceph osd erasure-code-profile set myprofile \ - directory= \ # mandatory - plugin=jerasure \ # mandatory - m=10 \ # optional and plugin dependant - k=3 \ # optional and plugin dependant - technique=reed_sol_van \ # optional and plugin dependant - -Notes ------ - -If the objects are large, it may be impractical to encode and decode -them in memory. However, when using *RBD* a 1TB device is divided in -many individual 4MB objects and *RGW* does the same. - -Encoding and decoding is implemented in the OSD. Although it could be -implemented client side for read write, the OSD must be able to encode -and decode on its own when scrubbing. -- cgit 1.2.3-korg