author     Qiaowei Ren <qiaowei.ren@intel.com>  2018-03-01 14:38:11 +0800
committer  Qiaowei Ren <qiaowei.ren@intel.com>  2018-03-01 14:38:11 +0800
commit     7da45d65be36d36b880cc55c5036e96c24b53f00
tree       d4f944eb4f8f8de50a9a7584ffa408dc3a3185b2
parent     691462d09d0987b47e112d6ee8740375df3c51b2

    remove ceph code

    This patch removes initial ceph code, due to license issue.

    Change-Id: I092d44f601cdf34aed92300fe13214925563081c
    Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Diffstat (limited to 'src/ceph/doc/rados/operations')
-rw-r--r--  src/ceph/doc/rados/operations/add-or-rm-mons.rst        | 370
-rw-r--r--  src/ceph/doc/rados/operations/add-or-rm-osds.rst        | 366
-rw-r--r--  src/ceph/doc/rados/operations/cache-tiering.rst         | 461
-rw-r--r--  src/ceph/doc/rados/operations/control.rst               | 453
-rw-r--r--  src/ceph/doc/rados/operations/crush-map-edits.rst       | 654
-rw-r--r--  src/ceph/doc/rados/operations/crush-map.rst             | 956
-rw-r--r--  src/ceph/doc/rados/operations/data-placement.rst        | 37
-rw-r--r--  src/ceph/doc/rados/operations/erasure-code-isa.rst      | 105
-rw-r--r--  src/ceph/doc/rados/operations/erasure-code-jerasure.rst | 120
-rw-r--r--  src/ceph/doc/rados/operations/erasure-code-lrc.rst      | 371
-rw-r--r--  src/ceph/doc/rados/operations/erasure-code-profile.rst  | 121
-rw-r--r--  src/ceph/doc/rados/operations/erasure-code-shec.rst     | 144
-rw-r--r--  src/ceph/doc/rados/operations/erasure-code.rst          | 195
-rw-r--r--  src/ceph/doc/rados/operations/health-checks.rst         | 527
-rw-r--r--  src/ceph/doc/rados/operations/index.rst                 | 90
-rw-r--r--  src/ceph/doc/rados/operations/monitoring-osd-pg.rst     | 617
-rw-r--r--  src/ceph/doc/rados/operations/monitoring.rst            | 351
-rw-r--r--  src/ceph/doc/rados/operations/operating.rst             | 251
-rw-r--r--  src/ceph/doc/rados/operations/pg-concepts.rst           | 102
-rw-r--r--  src/ceph/doc/rados/operations/pg-repair.rst             | 4
-rw-r--r--  src/ceph/doc/rados/operations/pg-states.rst             | 80
-rw-r--r--  src/ceph/doc/rados/operations/placement-groups.rst      | 469
-rw-r--r--  src/ceph/doc/rados/operations/pools.rst                 | 798
-rw-r--r--  src/ceph/doc/rados/operations/upmap.rst                 | 75
-rw-r--r--  src/ceph/doc/rados/operations/user-management.rst       | 665
25 files changed, 0 insertions, 8382 deletions
diff --git a/src/ceph/doc/rados/operations/add-or-rm-mons.rst b/src/ceph/doc/rados/operations/add-or-rm-mons.rst
deleted file mode 100644
index 0cdc431..0000000
--- a/src/ceph/doc/rados/operations/add-or-rm-mons.rst
+++ /dev/null
@@ -1,370 +0,0 @@
-==========================
- Adding/Removing Monitors
-==========================
-
-When you have a cluster up and running, you may add or remove monitors
-from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_
-or `Monitor Bootstrap`_.
-
-Adding Monitors
-===============
-
-Ceph monitors are light-weight processes that maintain a master copy of the
-cluster map. You can run a cluster with 1 monitor. We recommend at least 3
-monitors for a production cluster. Ceph monitors use a variation of the
-`Paxos`_ protocol to establish consensus about maps and other critical
-information across the cluster. Due to the nature of Paxos, Ceph requires
-a majority of monitors running to establish a quorum (thus establishing
-consensus).
-
-It is advisable, but not mandatory, to run an odd number of monitors. An
-odd number of monitors is more resilient to failures than an even
-number. For instance, a two-monitor deployment can tolerate no failures
-and still maintain a quorum; with three monitors, one failure can be
-tolerated; with four monitors, still only one failure can be tolerated;
-with five monitors, two failures can be tolerated. This is why an odd
-number is advisable. In summary, Ceph needs a majority of monitors to be
-running (and able to communicate with each other), but that majority can
-be achieved with a single monitor, 2 out of 2 monitors, 2 out of 3,
-3 out of 4, and so on.
-
-For an initial deployment of a multi-node Ceph cluster, it is advisable to
-deploy three monitors, increasing the number two at a time if a valid need
-for more than three exists.
-
-Since monitors are light-weight, it is possible to run them on the same
-host as an OSD; however, we recommend running them on separate hosts,
-because fsync issues with the kernel may impair performance.
-
-.. note:: A *majority* of monitors in your cluster must be able to
- reach each other in order to establish a quorum.
-
-Deploy your Hardware
---------------------
-
-If you are adding a new host when adding a new monitor, see `Hardware
-Recommendations`_ for details on minimum recommendations for monitor hardware.
-To add a monitor host to your cluster, first make sure you have an up-to-date
-version of Linux installed (typically Ubuntu 14.04 or RHEL 7).
-
-Add your monitor host to a rack in your cluster, connect it to the network
-and ensure that it has network connectivity.
-
-.. _Hardware Recommendations: ../../../start/hardware-recommendations
-
-Install the Required Software
------------------------------
-
-For manually deployed clusters, you must install Ceph packages
-manually. See `Installing Packages`_ for details.
-You should configure SSH to a user with password-less authentication
-and root permissions.
-
-.. _Installing Packages: ../../../install/install-storage-cluster
-
-
-.. _Adding a Monitor (Manual):
-
-Adding a Monitor (Manual)
--------------------------
-
-This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map
-and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If
-this results in only two monitor daemons, you may add more monitors by
-repeating this procedure until you have a sufficient number of ``ceph-mon``
-daemons to achieve a quorum.
-
-At this point you should define your monitor's id. Traditionally, monitors
-have been named with single letters (``a``, ``b``, ``c``, ...), but you are
-free to define the id as you see fit. For the purpose of this document,
-please take into account that ``{mon-id}`` should be the id you chose,
-without the ``mon.`` prefix (i.e., ``{mon-id}`` should be the ``a``
-in ``mon.a``).
-
-#. Create the default directory on the machine that will host your
- new monitor. ::
-
- ssh {new-mon-host}
- sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}
-
-#. Create a temporary directory ``{tmp}`` to keep the files needed during
- this process. This directory should be different from the monitor's default
- directory created in the previous step, and can be removed after all the
- steps are executed. ::
-
- mkdir {tmp}
-
-#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to
- the retrieved keyring, and ``{key-filename}`` is the name of the file
- containing the retrieved monitor key. ::
-
- ceph auth get mon. -o {tmp}/{key-filename}
-
-#. Retrieve the monitor map, where ``{tmp}`` is the path to
- the retrieved monitor map, and ``{map-filename}`` is the name of the file
- containing the retrieved monitor map. ::
-
- ceph mon getmap -o {tmp}/{map-filename}
-
-#. Prepare the monitor's data directory created in the first step. You must
- specify the path to the monitor map so that you can retrieve the
- information about a quorum of monitors and their ``fsid``. You must also
- specify a path to the monitor keyring::
-
- sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}
-
-
-#. Start the new monitor and it will automatically join the cluster.
- The daemon needs to know which address to bind to, either via
- ``--public-addr {ip:port}`` or by setting ``mon addr`` in the
- appropriate section of ``ceph.conf``. For example::
-
- ceph-mon -i {mon-id} --public-addr {ip:port}
-
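-As a concrete illustration of the procedure above, assume a hypothetical new
-monitor ``d`` on a host named ``mon-host-04`` with the address
-``10.0.0.4:6789`` (all of these values are examples only)::
-
- ssh mon-host-04
- sudo mkdir /var/lib/ceph/mon/ceph-d
- mkdir /tmp/mon-add
- ceph auth get mon. -o /tmp/mon-add/keyring
- ceph mon getmap -o /tmp/mon-add/monmap
- sudo ceph-mon -i d --mkfs --monmap /tmp/mon-add/monmap --keyring /tmp/mon-add/keyring
- ceph-mon -i d --public-addr 10.0.0.4:6789
-
-Once the new daemon is running, ``ceph -s`` should show it joining the quorum.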
-
-Removing Monitors
-=================
-
-When you remove monitors from a cluster, consider that Ceph monitors use
-Paxos to establish consensus about the master cluster map. You must have
-a sufficient number of monitors to establish a quorum for consensus about
-the cluster map.
-
-.. _Removing a Monitor (Manual):
-
-Removing a Monitor (Manual)
----------------------------
-
-This procedure removes a ``ceph-mon`` daemon from your cluster. If this
-procedure results in only two monitor daemons, you may add or remove another
-monitor until you have a number of ``ceph-mon`` daemons that can achieve a
-quorum.
-
-#. Stop the monitor. ::
-
- service ceph -a stop mon.{mon-id}
-
-#. Remove the monitor from the cluster. ::
-
- ceph mon remove {mon-id}
-
-#. Remove the monitor entry from ``ceph.conf``.
-
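-For example, to remove a hypothetical monitor ``mon.c``::
-
- service ceph -a stop mon.c
- ceph mon remove c
-
-Afterwards, delete the ``[mon.c]`` section from ``ceph.conf`` and redistribute
-the updated file to the other hosts.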
-
-Removing Monitors from an Unhealthy Cluster
--------------------------------------------
-
-This procedure removes a ``ceph-mon`` daemon from an unhealthy
-cluster, for example a cluster where the monitors cannot form a
-quorum.
-
-
-#. Stop all ``ceph-mon`` daemons on all monitor hosts. ::
-
- ssh {mon-host}
- service ceph stop mon || stop ceph-mon-all
- # and repeat for all mons
-
-#. Identify a surviving monitor and log in to that host. ::
-
- ssh {mon-host}
-
-#. Extract a copy of the monmap file. ::
-
- ceph-mon -i {mon-id} --extract-monmap {map-path}
- # in most cases, that's
- ceph-mon -i `hostname` --extract-monmap /tmp/monmap
-
-#. Remove the non-surviving or problematic monitors. For example, if
- you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where
- only ``mon.a`` will survive, follow the example below::
-
- monmaptool {map-path} --rm {mon-id}
- # for example,
- monmaptool /tmp/monmap --rm b
- monmaptool /tmp/monmap --rm c
-
-#. Inject the modified monmap (with the non-surviving monitors removed) into
- the surviving monitor(s). For example, to inject a map into monitor
- ``mon.a``, follow the example below::
-
- ceph-mon -i {mon-id} --inject-monmap {map-path}
- # for example,
- ceph-mon -i a --inject-monmap /tmp/monmap
-
-#. Start only the surviving monitors.
-
-#. Verify the monitors form a quorum (``ceph -s``).
-
-#. You may wish to archive the removed monitors' data directory in
- ``/var/lib/ceph/mon`` in a safe location, or delete it if you are
- confident the remaining monitors are healthy and are sufficiently
- redundant.
-
-.. _Changing a Monitor's IP address:
-
-Changing a Monitor's IP Address
-===============================
-
-.. important:: Existing monitors are not supposed to change their IP addresses.
-
-Monitors are critical components of a Ceph cluster, and they need to maintain a
-quorum for the whole system to work properly. To establish a quorum, the
-monitors need to discover each other. Ceph has strict requirements for
-discovering monitors.
-
-Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors.
-However, monitors discover each other using the monitor map, not ``ceph.conf``.
-For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you
-need to obtain the current monmap for the cluster when creating a new monitor,
-as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The
-following sections explain the consistency requirements for Ceph monitors, and a
-few safe ways to change a monitor's IP address.
-
-
-Consistency Requirements
-------------------------
-
-A monitor always refers to the local copy of the monmap when discovering other
-monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids
-errors that could break the cluster (e.g., typos in ``ceph.conf`` when
-specifying a monitor address or port). Since monitors use monmaps for discovery
-and they share monmaps with clients and other Ceph daemons, the monmap provides
-monitors with a strict guarantee that their consensus is valid.
-
-Strict consistency also applies to updates to the monmap. As with any other
-updates on the monitor, changes to the monmap always run through a distributed
-consensus algorithm called `Paxos`_. The monitors must agree on each update to
-the monmap, such as adding or removing a monitor, to ensure that each monitor in
-the quorum has the same version of the monmap. Updates to the monmap are
-incremental so that monitors have the latest agreed upon version, and a set of
-previous versions, allowing a monitor that has an older version of the monmap to
-catch up with the current state of the cluster.
-
-If monitors discovered each other through the Ceph configuration file instead of
-through the monmap, it would introduce additional risks because the Ceph
-configuration files are not updated and distributed automatically. Monitors
-might inadvertently use an older ``ceph.conf`` file, fail to recognize a
-monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
-to determine the current state of the system accurately. Consequently, making
-changes to an existing monitor's IP address must be done with great care.
-
-
-Changing a Monitor's IP address (The Right Way)
------------------------------------------------
-
-Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to
-ensure that other monitors in the cluster will receive the update. To change a
-monitor's IP address, you must add a new monitor with the IP address you want
-to use (as described in `Adding a Monitor (Manual)`_), ensure that the new
-monitor successfully joins the quorum, and then remove the monitor that uses
-the old IP address. Finally, update the ``ceph.conf`` file to ensure that
-clients and other daemons know the IP address of the new monitor.
-
-For example, let's assume there are three monitors in place, such as ::
-
- [mon.a]
- host = host01
- addr = 10.0.0.1:6789
- [mon.b]
- host = host02
- addr = 10.0.0.2:6789
- [mon.c]
- host = host03
- addr = 10.0.0.3:6789
-
-To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the
-steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure
-that ``mon.d`` is running before removing ``mon.c``, or it will break the
-quorum. Remove ``mon.c`` as described in `Removing a Monitor (Manual)`_. Moving
-all three monitors would thus require repeating this process as many times as
-needed.
-
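-A minimal sketch of this sequence, reusing the example values above (monitor id
-``d``, host ``host04``, address ``10.0.0.4:6789``), might look like this::
-
- ssh host04
- sudo mkdir /var/lib/ceph/mon/ceph-d
- ceph auth get mon. -o /tmp/keyring
- ceph mon getmap -o /tmp/monmap
- sudo ceph-mon -i d --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
- ceph-mon -i d --public-addr 10.0.0.4:6789
- # once mon.d is in the quorum (check with "ceph -s"), remove the old monitor
- ceph mon remove c
-
-Remember to update ``ceph.conf`` on clients and other daemons afterwards.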
-
-Changing a Monitor's IP address (The Messy Way)
------------------------------------------------
-
-There may come a time when the monitors must be moved to a different network, a
-different part of the datacenter, or a different datacenter altogether. While
-this is possible, the process is more hazardous.
-
-In such a case, the solution is to generate a new monmap with updated IP
-addresses for all the monitors in the cluster, and inject the new map into each
-individual monitor. This is not the most user-friendly approach, but we do not
-expect this to be something that needs to be done every other week. As stated
-at the top of this section, monitors are not supposed to change their IP
-addresses.
-
-Using the previous monitor configuration as an example, assume you want to move
-all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these
-networks are unable to communicate. Use the following procedure:
-
-#. Retrieve the monitor map, where ``{tmp}`` is the path to
- the retrieved monitor map, and ``{filename}`` is the name of the file
- containing the retrieved monitor map. ::
-
- ceph mon getmap -o {tmp}/{filename}
-
-#. The following example demonstrates the contents of the monmap. ::
-
- $ monmaptool --print {tmp}/{filename}
-
- monmaptool: monmap file {tmp}/{filename}
- epoch 1
- fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
- last_changed 2012-12-17 02:46:41.591248
- created 2012-12-17 02:46:41.591248
- 0: 10.0.0.1:6789/0 mon.a
- 1: 10.0.0.2:6789/0 mon.b
- 2: 10.0.0.3:6789/0 mon.c
-
-#. Remove the existing monitors. ::
-
- $ monmaptool --rm a --rm b --rm c {tmp}/{filename}
-
- monmaptool: monmap file {tmp}/{filename}
- monmaptool: removing a
- monmaptool: removing b
- monmaptool: removing c
- monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)
-
-#. Add the new monitor locations. ::
-
- $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}
-
- monmaptool: monmap file {tmp}/{filename}
- monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)
-
-#. Check new contents. ::
-
- $ monmaptool --print {tmp}/{filename}
-
- monmaptool: monmap file {tmp}/{filename}
- epoch 1
- fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
- last_changed 2012-12-17 02:46:41.591248
- created 2012-12-17 02:46:41.591248
- 0: 10.1.0.1:6789/0 mon.a
- 1: 10.1.0.2:6789/0 mon.b
- 2: 10.1.0.3:6789/0 mon.c
-
-At this point, we assume the monitors (and stores) are installed at the new
-location. The next step is to propagate the modified monmap to the new
-monitors, and inject the modified monmap into each new monitor.
-
-#. First, make sure to stop all your monitors. Injection must be done while
- the daemon is not running.
-
-#. Inject the monmap. ::
-
- ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}
-
-#. Restart the monitors.
-
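-For example, assuming systemd-managed daemons, injecting the new monmap into
-``mon.a`` might look like this (repeat for each monitor)::
-
- sudo systemctl stop ceph-mon@a
- ceph-mon -i a --inject-monmap /tmp/monmap
- sudo systemctl start ceph-mon@a
-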
-After this step, migration to the new location is complete and
-the monitors should operate successfully.
-
-
-.. _Manual Deployment: ../../../install/manual-deployment
-.. _Monitor Bootstrap: ../../../dev/mon-bootstrap
-.. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science)
diff --git a/src/ceph/doc/rados/operations/add-or-rm-osds.rst b/src/ceph/doc/rados/operations/add-or-rm-osds.rst
deleted file mode 100644
index 59ce4c7..0000000
--- a/src/ceph/doc/rados/operations/add-or-rm-osds.rst
+++ /dev/null
@@ -1,366 +0,0 @@
-======================
- Adding/Removing OSDs
-======================
-
-When you have a cluster up and running, you may add OSDs or remove OSDs
-from the cluster at runtime.
-
-Adding OSDs
-===========
-
-When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an
-OSD is generally one Ceph ``ceph-osd`` daemon for one storage drive within a
-host machine. If your host has multiple storage drives, you may map one
-``ceph-osd`` daemon for each drive.
-
-Generally, it's a good idea to check the capacity of your cluster to see if you
-are reaching the upper end of its capacity. As your cluster reaches its ``near
-full`` ratio, you should add one or more OSDs to expand your cluster's capacity.
-
-.. warning:: Do not let your cluster reach its ``full ratio`` before
- adding an OSD. OSD failures that occur after the cluster reaches
- its ``near full`` ratio may cause the cluster to exceed its
- ``full ratio``.
-
-Deploy your Hardware
---------------------
-
-If you are adding a new host when adding a new OSD, see `Hardware
-Recommendations`_ for details on minimum recommendations for OSD hardware. To
-add an OSD host to your cluster, first make sure you have an up-to-date version
-of Linux installed, and you have made some initial preparations for your
-storage drives. See `Filesystem Recommendations`_ for details.
-
-Add your OSD host to a rack in your cluster, connect it to the network
-and ensure that it has network connectivity. See the `Network Configuration
-Reference`_ for details.
-
-.. _Hardware Recommendations: ../../../start/hardware-recommendations
-.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations
-.. _Network Configuration Reference: ../../configuration/network-config-ref
-
-Install the Required Software
------------------------------
-
-For manually deployed clusters, you must install Ceph packages
-manually. See `Installing Ceph (Manual)`_ for details.
-You should configure SSH to a user with password-less authentication
-and root permissions.
-
-.. _Installing Ceph (Manual): ../../../install
-
-
-Adding an OSD (Manual)
-----------------------
-
-This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive,
-and configures the cluster to distribute data to the OSD. If your host has
-multiple drives, you may add an OSD for each drive by repeating this procedure.
-
-To add an OSD, create a data directory for it, mount a drive to that directory,
-add the OSD to the cluster, and then add it to the CRUSH map.
-
-When you add the OSD to the CRUSH map, consider the weight you give to the new
-OSD. Hard drive capacity tends to grow over time, so newer OSD hosts may have larger
-hard drives than older hosts in the cluster (i.e., they may have greater
-weight).
-
-.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives
- of dissimilar size, you can adjust their weights. However, for best
- performance, consider a CRUSH hierarchy with drives of the same type/size.
-
-#. Create the OSD. If no UUID is given, it will be set automatically when the
- OSD starts up. The following command will output the OSD number, which you
- will need for subsequent steps. ::
-
- ceph osd create [{uuid} [{id}]]
-
- If the optional parameter {id} is given, it will be used as the OSD id.
- Note that, in this case, the command may fail if that number is already in use.
-
- .. warning:: In general, explicitly specifying {id} is not recommended.
- IDs are allocated as an array, and skipping entries consumes some extra
- memory. This can become significant if there are large gaps and/or
- clusters are large. If {id} is not specified, the smallest available is
- used.
-
-#. Create the default directory on your new OSD. ::
-
- ssh {new-osd-host}
- sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
-
-
-#. If the OSD is for a drive other than the OS drive, prepare it
- for use with Ceph, and mount it to the directory you just created::
-
- ssh {new-osd-host}
- sudo mkfs -t {fstype} /dev/{drive}
- sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
-
-
-#. Initialize the OSD data directory. ::
-
- ssh {new-osd-host}
- ceph-osd -i {osd-num} --mkfs --mkkey
-
- The directory must be empty before you can run ``ceph-osd``.
-
-#. Register the OSD authentication key. The value of ``ceph`` for
- ``ceph-{osd-num}`` in the path is the ``$cluster-$id``. If your
- cluster name differs from ``ceph``, use your cluster name instead. ::
-
- ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
-
-
-#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The
- ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy
- wherever you wish. If you specify at least one bucket, the command
- will place the OSD into the most specific bucket you specify, *and* it will
- move that bucket underneath any other buckets you specify. **Important:** If
- you specify only the root bucket, the command will attach the OSD directly
- to the root, but CRUSH rules expect OSDs to be inside of hosts.
-
- For Argonaut (v 0.48), execute the following::
-
- ceph osd crush add {id} {name} {weight} [{bucket-type}={bucket-name} ...]
-
- For Bobtail (v 0.56) and later releases, execute the following::
-
- ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
-
- You may also decompile the CRUSH map, add the OSD to the device list, add the
- host as a bucket (if it's not already in the CRUSH map), add the device as an
- item in the host, assign it a weight, recompile it and set it. See
- `Add/Move an OSD`_ for details.
-
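-As a concrete sketch of the procedure above, suppose ``ceph osd create``
-returned the id ``12``, the new host is ``osd-host-03``, and the data drive is
-``/dev/sdb`` formatted with XFS (all of these values are examples only, and the
-``osd-host-03`` host bucket is assumed to already exist in the CRUSH map)::
-
- ceph osd create                    # prints the new id, e.g. 12
- ssh osd-host-03
- sudo mkdir /var/lib/ceph/osd/ceph-12
- sudo mkfs -t xfs /dev/sdb
- sudo mount /dev/sdb /var/lib/ceph/osd/ceph-12
- ceph-osd -i 12 --mkfs --mkkey
- ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-12/keyring
- ceph osd crush add osd.12 1.0 host=osd-host-03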
-
-.. topic:: Argonaut (v0.48) Best Practices
-
- To limit impact on user I/O performance, add an OSD to the CRUSH map
- with an initial weight of ``0``. Then, ramp up the CRUSH weight a
- little bit at a time. For example, to ramp by increments of ``0.2``,
- start with::
-
- ceph osd crush reweight {osd-id} .2
-
- and allow migration to complete before reweighting to ``0.4``,
- ``0.6``, and so on until the desired CRUSH weight is reached.
-
- To limit the impact of OSD failures, you can set::
-
- mon osd down out interval = 0
-
- which prevents down OSDs from automatically being marked out, and then
- ramp them down manually with::
-
- ceph osd reweight {osd-num} .8
-
- Again, wait for the cluster to finish migrating data, and then adjust
- the weight further until you reach a weight of 0. Note that this
- approach prevents the cluster from automatically re-replicating data after
- a failure, so please ensure that sufficient monitoring is in place for
- an administrator to intervene promptly.
-
- Note that this practice will no longer be necessary in Bobtail and
- subsequent releases.
-
-
-Replacing an OSD
-----------------
-
-When disks fail, or if an administrator wants to reprovision OSDs with a new
-backend (for instance, when switching from FileStore to BlueStore), OSDs need to
-be replaced. Unlike `Removing the OSD`_, a replaced OSD's id and CRUSH map entry
-need to be kept intact after the OSD is destroyed for replacement.
-
-#. Destroy the OSD first::
-
- ceph osd destroy {id} --yes-i-really-mean-it
-
-#. Zap a disk for the new OSD, if the disk was used before for other purposes.
- It's not necessary for a new disk::
-
- ceph-disk zap /dev/sdX
-
-#. Prepare the disk for replacement by using the previously destroyed OSD id::
-
- ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen`
-
-#. And activate the OSD::
-
- ceph-disk activate /dev/sdX1
-
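-For example, to replace a hypothetical ``osd.3`` whose new (or reused) disk is
-``/dev/sdb``::
-
- ceph osd destroy 3 --yes-i-really-mean-it
- ceph-disk zap /dev/sdb
- ceph-disk prepare --bluestore /dev/sdb --osd-id 3 --osd-uuid `uuidgen`
- ceph-disk activate /dev/sdb1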
-
-Starting the OSD
-----------------
-
-After you add an OSD to Ceph, the OSD is in your configuration. However,
-it is not yet running. The OSD is ``down`` and ``in``. You must start
-your new OSD before it can begin receiving data. You may use
-``service ceph`` from your admin host or start the OSD from its host
-machine.
-
-For Ubuntu Trusty use Upstart. ::
-
- sudo start ceph-osd id={osd-num}
-
-For all other distros use systemd. ::
-
- sudo systemctl start ceph-osd@{osd-num}
-
-
-Once you start your OSD, it is ``up`` and ``in``.
-
-
-Observe the Data Migration
---------------------------
-
-Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing
-the cluster by migrating placement groups to your new OSD. You can observe this
-process with the `ceph`_ tool. ::
-
- ceph -w
-
-You should see the placement group states change from ``active+clean`` to
-``active, some degraded objects``, and finally ``active+clean`` when migration
-completes. (Control-c to exit.)
-
-
-.. _Add/Move an OSD: ../crush-map#addosd
-.. _ceph: ../monitoring
-
-
-
-Removing OSDs (Manual)
-======================
-
-When you want to reduce the size of a cluster or replace hardware, you may
-remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ``ceph-osd``
-daemon for one storage drive within a host machine. If your host has multiple
-storage drives, you may need to remove one ``ceph-osd`` daemon for each drive.
-Generally, it's a good idea to check the capacity of your cluster to see if you
-are reaching the upper end of its capacity. Ensure that when you remove an OSD,
-your cluster is not at its ``near full`` ratio.
-
-.. warning:: Do not let your cluster reach its ``full ratio`` when
- removing an OSD. Removing OSDs could cause the cluster to reach
- or exceed its ``full ratio``.
-
-
-Take the OSD out of the Cluster
------------------------------------
-
-Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it
-out of the cluster so that Ceph can begin rebalancing and copying its data to
-other OSDs. ::
-
- ceph osd out {osd-num}
-
-
-Observe the Data Migration
---------------------------
-
-Once you have taken your OSD ``out`` of the cluster, Ceph will begin
-rebalancing the cluster by migrating placement groups out of the OSD you
-removed. You can observe this process with the `ceph`_ tool. ::
-
- ceph -w
-
-You should see the placement group states change from ``active+clean`` to
-``active, some degraded objects``, and finally ``active+clean`` when migration
-completes. (Control-c to exit.)
-
-.. note:: Sometimes, typically in a "small" cluster with few hosts (for
- instance with a small testing cluster), marking the OSD ``out`` can
- trigger a CRUSH corner case where some PGs remain stuck in the
- ``active+remapped`` state. If you hit this case, you should mark
- the OSD ``in`` with:
-
- ``ceph osd in {osd-num}``
-
- to return to the initial state, and then, instead of marking the OSD
- ``out``, set its weight to 0 with:
-
- ``ceph osd crush reweight osd.{osd-num} 0``
-
- After that, you can observe the data migration, which should run to
- completion. The difference between marking the OSD ``out`` and reweighting
- it to 0 is that in the first case the weight of the bucket that contains
- the OSD is not changed, whereas in the second case the weight of the bucket
- is updated (decreased by the OSD weight). The reweight command may
- sometimes be preferable in the case of a "small" cluster.
-
-
-
-Stopping the OSD
-----------------
-
-After you take an OSD out of the cluster, it may still be running.
-That is, the OSD may be ``up`` and ``out``. You must stop
-your OSD before you remove it from the configuration. ::
-
- ssh {osd-host}
- sudo systemctl stop ceph-osd@{osd-num}
-
-Once you stop your OSD, it is ``down``.
-
-
-Removing the OSD
-----------------
-
-This procedure removes an OSD from a cluster map, removes its authentication
-key, removes the OSD from the OSD map, and removes the OSD from the
-``ceph.conf`` file. If your host has multiple drives, you may need to remove an
-OSD for each drive by repeating this procedure.
-
-#. Let the cluster forget the OSD first. This step removes the OSD from the CRUSH
- map, removes its authentication key, and removes it from the OSD map as
- well. Please note that the `purge subcommand`_ was introduced in Luminous; for
- older versions, see below. ::
-
- ceph osd purge {id} --yes-i-really-mean-it
-
-#. Navigate to the host where you keep the master copy of the cluster's
- ``ceph.conf`` file. ::
-
- ssh {admin-host}
- cd /etc/ceph
- vim ceph.conf
-
-#. Remove the OSD entry from your ``ceph.conf`` file (if it exists). ::
-
- [osd.1]
- host = {hostname}
-
-#. From the host where you keep the master copy of the cluster's ``ceph.conf`` file,
- copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of other
- hosts in your cluster.
-
-If your Ceph cluster is older than Luminous, instead of using ``ceph osd purge``,
-you need to perform this step manually:
-
-
-#. Remove the OSD from the CRUSH map so that it no longer receives data. You may
- also decompile the CRUSH map, remove the OSD from the device list, remove the
- device as an item in the host bucket or remove the host bucket (if it's in the
- CRUSH map and you intend to remove the host), recompile the map and set it.
- See `Remove an OSD`_ for details. ::
-
- ceph osd crush remove {name}
-
-#. Remove the OSD authentication key. ::
-
- ceph auth del osd.{osd-num}
-
- The value of ``ceph`` for ``ceph-{osd-num}`` in the path is the ``$cluster-$id``.
- If your cluster name differs from ``ceph``, use your cluster name instead.
-
-#. Remove the OSD. ::
-
- ceph osd rm {osd-num}
- #for example
- ceph osd rm 1
-
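-Putting the pre-Luminous steps together, removing a hypothetical ``osd.1``
-might look like this::
-
- ceph osd crush remove osd.1
- ceph auth del osd.1
- ceph osd rm 1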
-
-.. _Remove an OSD: ../crush-map#removeosd
-.. _purge subcommand: /man/8/ceph#osd
diff --git a/src/ceph/doc/rados/operations/cache-tiering.rst b/src/ceph/doc/rados/operations/cache-tiering.rst
deleted file mode 100644
index 322c6ff..0000000
--- a/src/ceph/doc/rados/operations/cache-tiering.rst
+++ /dev/null
@@ -1,461 +0,0 @@
-===============
- Cache Tiering
-===============
-
-A cache tier provides Ceph Clients with better I/O performance for a subset of
-the data stored in a backing storage tier. Cache tiering involves creating a
-pool of relatively fast/expensive storage devices (e.g., solid state drives)
-configured to act as a cache tier, and a backing pool of either erasure-coded
-or relatively slower/cheaper devices configured to act as an economical storage
-tier. The Ceph objecter handles where to place the objects and the tiering
-agent determines when to flush objects from the cache to the backing storage
-tier. So the cache tier and the backing storage tier are completely transparent
-to Ceph clients.
-
-
-.. ditaa::
- +-------------+
- | Ceph Client |
- +------+------+
- ^
- Tiering is |
- Transparent | Faster I/O
- to Ceph | +---------------+
- Client Ops | | |
- | +----->+ Cache Tier |
- | | | |
- | | +-----+---+-----+
- | | | ^
- v v | | Active Data in Cache Tier
- +------+----+--+ | |
- | Objecter | | |
- +-----------+--+ | |
- ^ | | Inactive Data in Storage Tier
- | v |
- | +-----+---+-----+
- | | |
- +----->| Storage Tier |
- | |
- +---------------+
- Slower I/O
-
-
-The cache tiering agent handles the migration of data between the cache tier
-and the backing storage tier automatically. However, admins have the ability to
-configure how this migration takes place. There are two main scenarios:
-
-- **Writeback Mode:** When admins configure tiers with ``writeback`` mode, Ceph
- clients write data to the cache tier and receive an ACK from the cache tier.
- In time, the data written to the cache tier migrates to the storage tier
- and gets flushed from the cache tier. Conceptually, the cache tier is
- overlaid "in front" of the backing storage tier. When a Ceph client needs
- data that resides in the storage tier, the cache tiering agent migrates the
- data to the cache tier on read, then it is sent to the Ceph client.
- Thereafter, the Ceph client can perform I/O using the cache tier, until the
- data becomes inactive. This is ideal for mutable data (e.g., photo/video
- editing, transactional data, etc.).
-
-- **Read-proxy Mode:** This mode will use any objects that already
- exist in the cache tier, but if an object is not present in the
- cache the request will be proxied to the base tier. This is useful
- for transitioning from ``writeback`` mode to a disabled cache as it
- allows the workload to function properly while the cache is drained,
- without adding any new objects to the cache.
-
-A word of caution
-=================
-
-Cache tiering will *degrade* performance for most workloads. Users should use
-extreme caution before using this feature.
-
-* *Workload dependent*: Whether a cache will improve performance is
- highly dependent on the workload. Because there is a cost
- associated with moving objects into or out of the cache, it can only
- be effective when there is a *large skew* in the access pattern in
- the data set, such that most of the requests touch a small number of
- objects. The cache pool should be large enough to capture the
- working set for your workload to avoid thrashing.
-
-* *Difficult to benchmark*: Most benchmarks that users run to measure
- performance will show terrible performance with cache tiering, in
- part because very few of them skew requests toward a small set of
- objects, because it can take a long time for the cache to "warm up,"
- and because the warm-up cost can be high.
-
-* *Usually slower*: For workloads that are not cache tiering-friendly,
- performance is often slower than a normal RADOS pool without cache
- tiering enabled.
-
-* *librados object enumeration*: The librados-level object enumeration
- API is not meant to be coherent in the presence of a cache. If
- your application is using librados directly and relies on object
- enumeration, cache tiering will probably not work as expected.
- (This is not a problem for RGW, RBD, or CephFS.)
-
-* *Complexity*: Enabling cache tiering means that a lot of additional
- machinery and complexity within the RADOS cluster is being used.
- This increases the probability that you will encounter a bug in the system
- that other users have not yet encountered and will put your deployment at a
- higher level of risk.
-
-Known Good Workloads
---------------------
-
-* *RGW time-skewed*: If the RGW workload is such that almost all read
- operations are directed at recently written objects, a simple cache
- tiering configuration that destages recently written objects from
- the cache to the base tier after a configurable period can work
- well.
-
-Known Bad Workloads
--------------------
-
-The following configurations are *known to work poorly* with cache
-tiering.
-
-* *RBD with replicated cache and erasure-coded base*: This is a common
- request, but usually does not perform well. Even reasonably skewed
- workloads still send some small writes to cold objects, and because
- small writes are not yet supported by the erasure-coded pool, entire
- (usually 4 MB) objects must be migrated into the cache in order to
- satisfy a small (often 4 KB) write. Only a handful of users have
- successfully deployed this configuration, and it only works for them
- because their data is extremely cold (backups) and they are not in
- any way sensitive to performance.
-
-* *RBD with replicated cache and base*: RBD with a replicated base
- tier does better than when the base is erasure coded, but it is
- still highly dependent on the amount of skew in the workload, and
- very difficult to validate. The user will need to have a good
- understanding of their workload and will need to tune the cache
- tiering parameters carefully.
-
-
-Setting Up Pools
-================
-
-To set up cache tiering, you must have two pools. One will act as the
-backing storage and the other will act as the cache.
-
-
-Setting Up a Backing Storage Pool
----------------------------------
-
-Setting up a backing storage pool typically involves one of two scenarios:
-
-- **Standard Storage**: In this scenario, the pool stores multiple copies
- of an object in the Ceph Storage Cluster.
-
-- **Erasure Coding:** In this scenario, the pool uses erasure coding to
- store data much more efficiently with a small performance tradeoff.
-
-In the standard storage scenario, you can set up a CRUSH ruleset to establish
-the failure domain (e.g., osd, host, chassis, rack, row, etc.). Ceph OSD
-Daemons perform optimally when all storage drives in the ruleset are of the
-same size, speed (both RPMs and throughput) and type. See `CRUSH Maps`_
-for details on creating a ruleset. Once you have created a ruleset, create
-a backing storage pool.
-
-In the erasure coding scenario, the pool creation arguments will generate the
-appropriate ruleset automatically. See `Create a Pool`_ for details.
-
-In subsequent examples, we will refer to the backing storage pool
-as ``cold-storage``.
-
-
-Setting Up a Cache Pool
------------------------
-
-Setting up a cache pool follows the same procedure as the standard storage
-scenario, but with this difference: the drives for the cache tier are typically
-high performance drives that reside in their own servers and have their own
-ruleset. When setting up a ruleset, it should take into account the hosts that
-have the high performance drives while omitting the hosts that don't. See
-`Placing Different Pools on Different OSDs`_ for details.
-
-
-In subsequent examples, we will refer to the cache pool as ``hot-storage`` and
-the backing pool as ``cold-storage``.
-
-For cache tier configuration and default values, see
-`Pools - Set Pool Values`_.
-
-
-Creating a Cache Tier
-=====================
-
-Setting up a cache tier involves associating a backing storage pool with
-a cache pool ::
-
- ceph osd tier add {storagepool} {cachepool}
-
-For example ::
-
- ceph osd tier add cold-storage hot-storage
-
-To set the cache mode, execute the following::
-
- ceph osd tier cache-mode {cachepool} {cache-mode}
-
-For example::
-
- ceph osd tier cache-mode hot-storage writeback
-
-The cache tiers overlay the backing storage tier, so they require one
-additional step: you must direct all client traffic from the storage pool to
-the cache pool. To direct client traffic directly to the cache pool, execute
-the following::
-
- ceph osd tier set-overlay {storagepool} {cachepool}
-
-For example::
-
- ceph osd tier set-overlay cold-storage hot-storage
-
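-Putting the pieces together, a minimal sketch of creating a writeback cache
-tier might look like the following, where the pool names and placement group
-counts are illustrative only (in practice the cache pool would normally use a
-CRUSH rule that restricts it to fast drives)::
-
- ceph osd pool create cold-storage 128
- ceph osd pool create hot-storage 32
- ceph osd tier add cold-storage hot-storage
- ceph osd tier cache-mode hot-storage writeback
- ceph osd tier set-overlay cold-storage hot-storage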
-
-Configuring a Cache Tier
-========================
-
-Cache tiers have several configuration options. You may set
-cache tier configuration options with the following usage::
-
- ceph osd pool set {cachepool} {key} {value}
-
-See `Pools - Set Pool Values`_ for details.
-
-
-Target Size and Type
---------------------
-
-Ceph's production cache tiers use a `Bloom Filter`_ for the ``hit_set_type``::
-
- ceph osd pool set {cachepool} hit_set_type bloom
-
-For example::
-
- ceph osd pool set hot-storage hit_set_type bloom
-
-The ``hit_set_count`` and ``hit_set_period`` define how much time each HitSet
-should cover, and how many such HitSets to store. ::
-
- ceph osd pool set {cachepool} hit_set_count 12
- ceph osd pool set {cachepool} hit_set_period 14400
- ceph osd pool set {cachepool} target_max_bytes 1000000000000
-
-.. note:: A larger ``hit_set_count`` results in more RAM consumed by
- the ``ceph-osd`` process.
-
-Binning accesses over time allows Ceph to determine whether a Ceph client
-accessed an object at least once, or more than once over a time period
-("age" vs "temperature").
-
-The ``min_read_recency_for_promote`` defines how many HitSets to check for the
-existence of an object when handling a read operation. The checking result is
-used to decide whether to promote the object asynchronously. Its value should be
-between 0 and ``hit_set_count``. If it's set to 0, the object is always promoted.
-If it's set to 1, the current HitSet is checked, and the object is promoted only
-if it is found there. For other values, that exact number of archived HitSets is
-checked, and the object is promoted if it is found in any of the most recent
-``min_read_recency_for_promote`` HitSets.
-
-A similar parameter can be set for the write operation, which is
-``min_write_recency_for_promote``. ::
-
- ceph osd pool set {cachepool} min_read_recency_for_promote 2
- ceph osd pool set {cachepool} min_write_recency_for_promote 2
-
-.. note:: The longer the period and the higher the
- ``min_read_recency_for_promote`` and
- ``min_write_recency_for_promote`` values, the more RAM the ``ceph-osd``
- daemon consumes. In particular, when the agent is actively flushing
- or evicting cache objects, all ``hit_set_count`` HitSets are loaded
- into RAM.
-
-
-Cache Sizing
-------------
-
-The cache tiering agent performs two main functions:
-
-- **Flushing:** The agent identifies modified (or dirty) objects and forwards
- them to the storage pool for long-term storage.
-
-- **Evicting:** The agent identifies objects that haven't been modified
- (or clean) and evicts the least recently used among them from the cache.
-
-
-Absolute Sizing
-~~~~~~~~~~~~~~~
-
-The cache tiering agent can flush or evict objects based upon the total number
-of bytes or the total number of objects. To specify a maximum number of bytes,
-execute the following::
-
- ceph osd pool set {cachepool} target_max_bytes {#bytes}
-
-For example, to flush or evict at 1 TB, execute the following::
-
- ceph osd pool set hot-storage target_max_bytes 1099511627776
-
-
-To specify the maximum number of objects, execute the following::
-
- ceph osd pool set {cachepool} target_max_objects {#objects}
-
-For example, to flush or evict at 1M objects, execute the following::
-
- ceph osd pool set hot-storage target_max_objects 1000000
-
-.. note:: Ceph is not able to determine the size of a cache pool automatically, so
- an absolute size must be configured here; otherwise, flushing/evicting
- will not work. If you specify both limits, the cache tiering
- agent will begin flushing or evicting when either threshold is triggered.
-
-.. note:: All client requests will be blocked only when ``target_max_bytes`` or
- ``target_max_objects`` is reached.
-
-Relative Sizing
-~~~~~~~~~~~~~~~
-
-The cache tiering agent can flush or evict objects relative to the size of the
-cache pool (specified by ``target_max_bytes`` / ``target_max_objects`` in
-`Absolute sizing`_). When the cache pool consists of a certain percentage of
-modified (or dirty) objects, the cache tiering agent will flush them to the
-storage pool. To set the ``cache_target_dirty_ratio``, execute the following::
-
- ceph osd pool set {cachepool} cache_target_dirty_ratio {0.0..1.0}
-
-For example, setting the value to ``0.4`` will begin flushing modified
-(dirty) objects when they reach 40% of the cache pool's capacity::
-
- ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
-
-When the dirty objects reach a certain percentage of the cache pool's capacity,
-the agent flushes dirty objects at a higher speed. To set the
-``cache_target_dirty_high_ratio``::
-
- ceph osd pool set {cachepool} cache_target_dirty_high_ratio {0.0..1.0}
-
-For example, setting the value to ``0.6`` will begin aggressively flushing dirty
-objects when they reach 60% of the cache pool's capacity. This value should be
-set between the dirty ratio and the full ratio::
-
- ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6
-
-When the cache pool reaches a certain percentage of its capacity, the cache
-tiering agent will evict objects to maintain free capacity. To set the
-``cache_target_full_ratio``, execute the following::
-
- ceph osd pool set {cachepool} cache_target_full_ratio {0.0..1.0}
-
-For example, setting the value to ``0.8`` will begin evicting unmodified
-(clean) objects when they reach 80% of the cache pool's capacity::
-
- ceph osd pool set hot-storage cache_target_full_ratio 0.8
-
-
-Cache Age
----------
-
-You can specify the minimum age of an object before the cache tiering agent
-flushes a recently modified (or dirty) object to the backing storage pool::
-
- ceph osd pool set {cachepool} cache_min_flush_age {#seconds}
-
-For example, to flush modified (or dirty) objects after 10 minutes, execute
-the following::
-
- ceph osd pool set hot-storage cache_min_flush_age 600
-
-You can specify the minimum age of an object before it will be evicted from
-the cache tier::
-
- ceph osd pool set {cachepool} cache_min_evict_age {#seconds}
-
-For example, to evict objects after 30 minutes, execute the following::
-
- ceph osd pool set hot-storage cache_min_evict_age 1800
-
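-Taken together, an illustrative (not prescriptive) configuration for the
-hypothetical ``hot-storage`` cache pool might combine the settings discussed
-above as follows::
-
- ceph osd pool set hot-storage hit_set_type bloom
- ceph osd pool set hot-storage hit_set_count 12
- ceph osd pool set hot-storage hit_set_period 14400
- ceph osd pool set hot-storage target_max_bytes 1099511627776
- ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
- ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6
- ceph osd pool set hot-storage cache_target_full_ratio 0.8
- ceph osd pool set hot-storage cache_min_flush_age 600
- ceph osd pool set hot-storage cache_min_evict_age 1800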
-
-Removing a Cache Tier
-=====================
-
-Removing a cache tier differs depending on whether it is a writeback
-cache or a read-only cache.
-
-
-Removing a Read-Only Cache
---------------------------
-
-Since a read-only cache does not have modified data, you can disable
-and remove it without losing any recent changes to objects in the cache.
-
-#. Change the cache-mode to ``none`` to disable it. ::
-
- ceph osd tier cache-mode {cachepool} none
-
- For example::
-
- ceph osd tier cache-mode hot-storage none
-
-#. Remove the cache pool from the backing pool. ::
-
- ceph osd tier remove {storagepool} {cachepool}
-
- For example::
-
- ceph osd tier remove cold-storage hot-storage
-
-
-
-Removing a Writeback Cache
---------------------------
-
-Since a writeback cache may have modified data, you must take steps to ensure
-that you do not lose any recent changes to objects in the cache before you
-disable and remove it.
-
-
-#. Change the cache mode to ``forward`` so that new and modified objects will
- flush to the backing storage pool. ::
-
- ceph osd tier cache-mode {cachepool} forward
-
- For example::
-
- ceph osd tier cache-mode hot-storage forward
-
-
-#. Ensure that the cache pool has been flushed. This may take a few minutes::
-
- rados -p {cachepool} ls
-
- If the cache pool still has objects, you can flush them manually.
- For example::
-
- rados -p {cachepool} cache-flush-evict-all
-
-
-#. Remove the overlay so that clients will not direct traffic to the cache. ::
-
- ceph osd tier remove-overlay {storagetier}
-
- For example::
-
- ceph osd tier remove-overlay cold-storage
-
-
-#. Finally, remove the cache tier pool from the backing storage pool. ::
-
- ceph osd tier remove {storagepool} {cachepool}
-
- For example::
-
- ceph osd tier remove cold-storage hot-storage
-
-
-.. _Create a Pool: ../pools#create-a-pool
-.. _Pools - Set Pool Values: ../pools#set-pool-values
-.. _Placing Different Pools on Different OSDs: ../crush-map/#placing-different-pools-on-different-osds
-.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
-.. _CRUSH Maps: ../crush-map
-.. _Absolute Sizing: #absolute-sizing
diff --git a/src/ceph/doc/rados/operations/control.rst b/src/ceph/doc/rados/operations/control.rst
deleted file mode 100644
index 1a58076..0000000
--- a/src/ceph/doc/rados/operations/control.rst
+++ /dev/null
@@ -1,453 +0,0 @@
-.. index:: control, commands
-
-==================
- Control Commands
-==================
-
-
-Monitor Commands
-================
-
-Monitor commands are issued using the ceph utility::
-
- ceph [-m monhost] {command}
-
-The command is usually (though not always) of the form::
-
- ceph {subsystem} {command}
-
-
-System Commands
-===============
-
-Execute the following to display the current status of the cluster. ::
-
- ceph -s
- ceph status
-
-Execute the following to display a running summary of the status of the cluster,
-and major events. ::
-
- ceph -w
-
-Execute the following to show the monitor quorum, including which monitors are
-participating and which one is the leader. ::
-
- ceph quorum_status
-
-Execute the following to query the status of a single monitor, including whether
-or not it is in the quorum. ::
-
- ceph [-m monhost] mon_status
-
-
-Authentication Subsystem
-========================
-
-To add a keyring for an OSD, execute the following::
-
- ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring}
-
-To list the cluster's keys and their capabilities, execute the following::
-
- ceph auth ls
-
-
-Placement Group Subsystem
-=========================
-
-To display the statistics for all placement groups, execute the following::
-
- ceph pg dump [--format {format}]
-
-The valid formats are ``plain`` (default) and ``json``.
-
-To display the statistics for all placement groups stuck in a specified state,
-execute the following::
-
- ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}]
-
-
-``--format`` may be ``plain`` (default) or ``json``
-
-``--threshold`` defines how many seconds "stuck" is (default: 300)
-
-**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD
-with the most up-to-date data to come back.
-
-**Unclean** Placement groups contain objects that are not replicated the desired number
-of times. They should be recovering.
-
-**Stale** Placement groups are in an unknown state - the OSDs that host them have not
-reported to the monitor cluster in a while (configured by
-``mon_osd_report_timeout``).
-
-Delete "lost" objects or revert them to their prior state, either a previous version
-or delete them if they were just created. ::
-
- ceph pg {pgid} mark_unfound_lost revert|delete
-
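-For example, to list placement groups that have been stuck ``stale`` for more
-than ten minutes, and then to revert the unfound objects of a hypothetical
-placement group ``2.1f``::
-
- ceph pg dump_stuck stale --format json-pretty --threshold 600
- ceph pg 2.1f mark_unfound_lost revert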
-
-OSD Subsystem
-=============
-
-Query OSD subsystem status. ::
-
- ceph osd stat
-
-Write a copy of the most recent OSD map to a file. See
-`osdmaptool`_. ::
-
- ceph osd getmap -o file
-
-.. _osdmaptool: ../../man/8/osdmaptool
-
-Write a copy of the crush map from the most recent OSD map to
-file. ::
-
- ceph osd getcrushmap -o file
-
-The foregoing is functionally equivalent to ::
-
- ceph osd getmap -o /tmp/osdmap
- osdmaptool /tmp/osdmap --export-crush file
-
-Dump the OSD map. Valid formats for ``-f`` are ``plain`` and ``json``. If no
-``--format`` option is given, the OSD map is dumped as plain text. ::
-
- ceph osd dump [--format {format}]
-
-Dump the OSD map as a tree with one line per OSD containing weight
-and state. ::
-
- ceph osd tree [--format {format}]
-
-Find out where a specific object is or would be stored in the system::
-
- ceph osd map <pool-name> <object-name>
-
-Add or move a new item (OSD) with the given id/name/weight at the specified
-location. ::
-
- ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]]
-
-Remove an existing item (OSD) from the CRUSH map. ::
-
- ceph osd crush remove {name}
-
-Remove an existing bucket from the CRUSH map. ::
-
- ceph osd crush remove {bucket-name}
-
-Move an existing bucket from one position in the hierarchy to another. ::
-
- ceph osd crush move {id} {loc1} [{loc2} ...]
-
-Set the weight of the item given by ``{name}`` to ``{weight}``. ::
-
- ceph osd crush reweight {name} {weight}
-
-Mark an OSD as lost. This may result in permanent data loss. Use with caution. ::
-
- ceph osd lost {id} [--yes-i-really-mean-it]
-
-Create a new OSD. If no UUID is given, it will be set automatically when the OSD
-starts up. ::
-
- ceph osd create [{uuid}]
-
-Remove the given OSD(s). ::
-
- ceph osd rm [{id}...]
-
-Query the current max_osd parameter in the OSD map. ::
-
- ceph osd getmaxosd
-
-Import the given crush map. ::
-
- ceph osd setcrushmap -i file
-
-Set the ``max_osd`` parameter in the OSD map. This is necessary when
-expanding the storage cluster. ::
-
- ceph osd setmaxosd {max-osd-num}
-
-Mark OSD ``{osd-num}`` down. ::
-
- ceph osd down {osd-num}
-
-Mark OSD ``{osd-num}`` out of the distribution (i.e. allocated no data). ::
-
- ceph osd out {osd-num}
-
-Mark ``{osd-num}`` in the distribution (i.e. allocated data). ::
-
- ceph osd in {osd-num}
-
-Set or clear the pause flags in the OSD map. If set, no IO requests
-will be sent to any OSD. Clearing the flags via unpause results in
-resending pending requests. ::
-
- ceph osd pause
- ceph osd unpause
-
-Set the weight of ``{osd-num}`` to ``{weight}``. Two OSDs with the
-same weight will receive roughly the same number of I/O requests and
-store approximately the same amount of data. ``ceph osd reweight``
-sets an override weight on the OSD. This value is in the range 0 to 1,
-and forces CRUSH to re-place (1-weight) of the data that would
-otherwise live on this drive. It does not change the weights assigned
-to the buckets above the OSD in the crush map, and is a corrective
-measure in case the normal CRUSH distribution is not working out quite
-right. For instance, if one of your OSDs is at 90% and the others are
-at 50%, you could reduce this weight to try and compensate for it. ::
-
- ceph osd reweight {osd-num} {weight}
-
-Reweights all the OSDs by reducing the weight of OSDs which are
-heavily overused. By default it will adjust the weights downward on
-OSDs which have 120% of the average utilization, but if you include
-threshold it will use that percentage instead. ::
-
- ceph osd reweight-by-utilization [threshold]
-
-Describes what reweight-by-utilization would do. ::
-
- ceph osd test-reweight-by-utilization
-
-Adds/removes the address to/from the blacklist. When adding an address,
-you can specify how long it should be blacklisted in seconds; otherwise,
-it will default to 1 hour. A blacklisted address is prevented from
-connecting to any OSD. Blacklisting is most often used to prevent a
-lagging metadata server from making bad changes to data on the OSDs.
-
-These commands are mostly only useful for failure testing, as
-blacklists are normally maintained automatically and shouldn't need
-manual intervention. ::
-
- ceph osd blacklist add ADDRESS[:source_port] [TIME]
- ceph osd blacklist rm ADDRESS[:source_port]
-
-Creates/deletes a snapshot of a pool. ::
-
- ceph osd pool mksnap {pool-name} {snap-name}
- ceph osd pool rmsnap {pool-name} {snap-name}
-
-Creates/deletes/renames a storage pool. ::
-
- ceph osd pool create {pool-name} pg_num [pgp_num]
- ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
- ceph osd pool rename {old-name} {new-name}
-
-Changes a pool setting. ::
-
- ceph osd pool set {pool-name} {field} {value}
-
-Valid fields are:
-
- * ``size``: Sets the number of copies of data in the pool.
- * ``pg_num``: The placement group number.
- * ``pgp_num``: Effective number when calculating pg placement.
- * ``crush_ruleset``: rule number for mapping placement.
-
-Get the value of a pool setting. ::
-
- ceph osd pool get {pool-name} {field}
-
-Valid fields are:
-
- * ``pg_num``: The placement group number.
- * ``pgp_num``: Effective number of placement groups when calculating placement.
- * ``lpg_num``: The number of local placement groups.
- * ``lpgp_num``: The number used for placing the local placement groups.
-
-
-Sends a scrub command to OSD ``{osd-num}``. To send the command to all OSDs, use ``*``. ::
-
- ceph osd scrub {osd-num}
-
-Sends a repair command to OSD.N. To send the command to all OSDs, use ``*``. ::
-
- ceph osd repair N
-
-Runs a simple throughput benchmark against OSD.N, writing ``TOTAL_DATA_BYTES``
-in write requests of ``BYTES_PER_WRITE`` each. By default, the test
-writes 1 GB in total in 4-MB increments.
-The benchmark is non-destructive and will not overwrite existing live
-OSD data, but might temporarily affect the performance of clients
-concurrently accessing the OSD. ::
-
- ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]
-
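-For example, either of the following invocations runs the default benchmark
-(1 GB in 4 MB writes) against a hypothetical ``osd.0``::
-
- ceph tell osd.0 bench
- ceph tell osd.0 bench 1073741824 4194304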
-
-MDS Subsystem
-=============
-
-Change configuration parameters on a running mds. ::
-
- ceph tell mds.{mds-id} injectargs --{switch} {value} [--{switch} {value}]
-
-For example, to enable debug messages::
-
- ceph tell mds.0 injectargs --debug_ms 1 --debug_mds 10
-
-Display the status of all metadata servers. ::
-
- ceph mds stat
-
-Mark the active MDS as failed, triggering failover to a standby if present. ::
-
- ceph mds fail 0
-
-.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap
-
-
-Mon Subsystem
-=============
-
-Show monitor stats::
-
- ceph mon stat
-
- e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c
-
-
-The ``quorum`` list at the end lists monitor nodes that are part of the current quorum.
-
-This is also available more directly::
-
- ceph quorum_status -f json-pretty
-
-.. code-block:: javascript
-
- {
- "election_epoch": 6,
- "quorum": [
- 0,
- 1,
- 2
- ],
- "quorum_names": [
- "a",
- "b",
- "c"
- ],
- "quorum_leader_name": "a",
- "monmap": {
- "epoch": 2,
- "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
- "modified": "2016-12-26 14:42:09.288066",
- "created": "2016-12-26 14:42:03.573585",
- "features": {
- "persistent": [
- "kraken"
- ],
- "optional": []
- },
- "mons": [
- {
- "rank": 0,
- "name": "a",
- "addr": "127.0.0.1:40000\/0",
- "public_addr": "127.0.0.1:40000\/0"
- },
- {
- "rank": 1,
- "name": "b",
- "addr": "127.0.0.1:40001\/0",
- "public_addr": "127.0.0.1:40001\/0"
- },
- {
- "rank": 2,
- "name": "c",
- "addr": "127.0.0.1:40002\/0",
- "public_addr": "127.0.0.1:40002\/0"
- }
- ]
- }
- }
-
-
-The above will block until a quorum is reached.
-
-For a status of just the monitor you connect to (use ``-m HOST:PORT``
-to select)::
-
- ceph mon_status -f json-pretty
-
-
-.. code-block:: javascript
-
- {
- "name": "b",
- "rank": 1,
- "state": "peon",
- "election_epoch": 6,
- "quorum": [
- 0,
- 1,
- 2
- ],
- "features": {
- "required_con": "9025616074522624",
- "required_mon": [
- "kraken"
- ],
- "quorum_con": "1152921504336314367",
- "quorum_mon": [
- "kraken"
- ]
- },
- "outside_quorum": [],
- "extra_probe_peers": [],
- "sync_provider": [],
- "monmap": {
- "epoch": 2,
- "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
- "modified": "2016-12-26 14:42:09.288066",
- "created": "2016-12-26 14:42:03.573585",
- "features": {
- "persistent": [
- "kraken"
- ],
- "optional": []
- },
- "mons": [
- {
- "rank": 0,
- "name": "a",
- "addr": "127.0.0.1:40000\/0",
- "public_addr": "127.0.0.1:40000\/0"
- },
- {
- "rank": 1,
- "name": "b",
- "addr": "127.0.0.1:40001\/0",
- "public_addr": "127.0.0.1:40001\/0"
- },
- {
- "rank": 2,
- "name": "c",
- "addr": "127.0.0.1:40002\/0",
- "public_addr": "127.0.0.1:40002\/0"
- }
- ]
- }
- }
-
-A dump of the monitor state::
-
- ceph mon dump
-
- dumped monmap epoch 2
- epoch 2
- fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc
- last_changed 2016-12-26 14:42:09.288066
- created 2016-12-26 14:42:03.573585
- 0: 127.0.0.1:40000/0 mon.a
- 1: 127.0.0.1:40001/0 mon.b
- 2: 127.0.0.1:40002/0 mon.c
-
diff --git a/src/ceph/doc/rados/operations/crush-map-edits.rst b/src/ceph/doc/rados/operations/crush-map-edits.rst
deleted file mode 100644
index 5222270..0000000
--- a/src/ceph/doc/rados/operations/crush-map-edits.rst
+++ /dev/null
@@ -1,654 +0,0 @@
-Manually editing a CRUSH Map
-============================
-
-.. note:: Manually editing the CRUSH map is considered an advanced
- administrator operation. All CRUSH changes that are
- necessary for the overwhelming majority of installations are
- possible via the standard ceph CLI and do not require manual
- CRUSH map edits. If you have identified a use case where
- manual edits *are* necessary, consider contacting the Ceph
- developers so that future versions of Ceph can make this
- unnecessary.
-
-To edit an existing CRUSH map:
-
-#. `Get the CRUSH map`_.
-#. `Decompile`_ the CRUSH map.
-#. Edit at least one of `Devices`_, `Buckets`_ and `Rules`_.
-#. `Recompile`_ the CRUSH map.
-#. `Set the CRUSH map`_.
-
-To activate CRUSH map rules for a specific pool, identify the common ruleset
-number for those rules and specify that ruleset number for the pool. See `Set
-Pool Values`_ for details.
-
-.. _Get the CRUSH map: #getcrushmap
-.. _Decompile: #decompilecrushmap
-.. _Devices: #crushmapdevices
-.. _Buckets: #crushmapbuckets
-.. _Rules: #crushmaprules
-.. _Recompile: #compilecrushmap
-.. _Set the CRUSH map: #setcrushmap
-.. _Set Pool Values: ../pools#setpoolvalues
-
-.. _getcrushmap:
-
-Get a CRUSH Map
----------------
-
-To get the CRUSH map for your cluster, execute the following::
-
- ceph osd getcrushmap -o {compiled-crushmap-filename}
-
-Ceph will output (-o) a compiled CRUSH map to the filename you specified. Since
-the CRUSH map is in a compiled form, you must decompile it first before you can
-edit it.
-
-.. _decompilecrushmap:
-
-Decompile a CRUSH Map
----------------------
-
-To decompile a CRUSH map, execute the following::
-
- crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}
-
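-For orientation, the whole edit cycle looks like the following sketch, with
-hypothetical filenames::
-
- ceph osd getcrushmap -o crushmap.bin
- crushtool -d crushmap.bin -o crushmap.txt
- # edit crushmap.txt, then recompile and inject the result
- crushtool -c crushmap.txt -o crushmap.new
- ceph osd setcrushmap -i crushmap.new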
-
-Sections
---------
-
-There are six main sections to a CRUSH Map.
-
-#. **tunables:** The preamble at the top of the map describes any *tunables*
- for CRUSH behavior that vary from the historical/legacy CRUSH behavior. These
- correct for old bugs, optimizations, or other changes in behavior that have
- been made over the years to improve CRUSH's behavior.
-
-#. **devices:** Devices are individual ``ceph-osd`` daemons that can
- store data.
-
-#. **types**: Bucket ``types`` define the types of buckets used in
- your CRUSH hierarchy. Buckets consist of a hierarchical aggregation
- of storage locations (e.g., rows, racks, chassis, hosts, etc.) and
- their assigned weights.
-
-#. **buckets:** Once you define bucket types, you must define each node
- in the hierarchy, its type, and which devices or other nodes it
- contains.
-
-#. **rules:** Rules define policy about how data is distributed across
- devices in the hierarchy.
-
-#. **choose_args:** Choose_args are alternative weights associated with
- the hierarchy that have been adjusted to optimize data placement. A single
- choose_args map can be used for the entire cluster, or one can be
- created for each individual pool.
-
-
-.. _crushmapdevices:
-
-CRUSH Map Devices
------------------
-
-Devices are individual ``ceph-osd`` daemons that can store data. You
-will normally have one defined here for each OSD daemon in your
-cluster. Devices are identified by an id (a non-negative integer) and
-a name, normally ``osd.N`` where ``N`` is the device id.
-
-Devices may also have a *device class* associated with them (e.g.,
-``hdd`` or ``ssd``), allowing them to be conveniently targeted by a
-crush rule.
-
-::
-
- # devices
- device {num} {osd.name} [class {class}]
-
-For example::
-
- # devices
- device 0 osd.0 class ssd
- device 1 osd.1 class hdd
- device 2 osd.2
- device 3 osd.3
-
-In most cases, each device maps to a single ``ceph-osd`` daemon. This
-is normally a single storage device, a pair of devices (for example,
-one for data and one for a journal or metadata), or in some cases a
-small RAID device.
-
-
-
-
-
-CRUSH Map Bucket Types
-----------------------
-
-The second list in the CRUSH map defines 'bucket' types. Buckets facilitate
-a hierarchy of nodes and leaves. Node (or non-leaf) buckets typically represent
-physical locations in a hierarchy. Nodes aggregate other nodes or leaves.
-Leaf buckets represent ``ceph-osd`` daemons and their corresponding storage
-media.
-
-.. tip:: The term "bucket" used in the context of CRUSH means a node in
- the hierarchy, i.e. a location or a piece of physical hardware. It
- is a different concept from the term "bucket" when used in the
- context of RADOS Gateway APIs.
-
-To add a bucket type to the CRUSH map, create a new line under your list of
-bucket types. Enter ``type`` followed by a unique numeric ID and a bucket name.
-By convention, there is one leaf bucket and it is ``type 0``; however, you may
-give it any name you like (e.g., osd, disk, drive, storage, etc.)::
-
- #types
- type {num} {bucket-name}
-
-For example::
-
- # types
- type 0 osd
- type 1 host
- type 2 chassis
- type 3 rack
- type 4 row
- type 5 pdu
- type 6 pod
- type 7 room
- type 8 datacenter
- type 9 region
- type 10 root
-
-
-
-.. _crushmapbuckets:
-
-CRUSH Map Bucket Hierarchy
---------------------------
-
-The CRUSH algorithm distributes data objects among storage devices according
-to a per-device weight value, approximating a uniform probability distribution.
-CRUSH distributes objects and their replicas according to the hierarchical
-cluster map you define. Your CRUSH map represents the available storage
-devices and the logical elements that contain them.
-
-To map placement groups to OSDs across failure domains, a CRUSH map defines a
-hierarchical list of bucket types (i.e., under ``#types`` in the generated CRUSH
-map). The purpose of creating a bucket hierarchy is to segregate the
-leaf nodes by their failure domains, such as hosts, chassis, racks, power
-distribution units, pods, rows, rooms, and data centers. With the exception of
-the leaf nodes representing OSDs, the rest of the hierarchy is arbitrary, and
-you may define it according to your own needs.
-
-We recommend adapting your CRUSH map to your firm's hardware naming conventions
-and using instance names that reflect the physical hardware. Your naming
-practice can make it easier to administer the cluster and troubleshoot
-problems when an OSD and/or other hardware malfunctions and the administrator
-needs access to the physical hardware.
-
-In the following example, the bucket hierarchy has a leaf bucket named ``osd``,
-and two node buckets named ``host`` and ``rack`` respectively.
-
-.. ditaa::
- +-----------+
- | {o}rack |
- | Bucket |
- +-----+-----+
- |
- +---------------+---------------+
- | |
- +-----+-----+ +-----+-----+
- | {o}host | | {o}host |
- | Bucket | | Bucket |
- +-----+-----+ +-----+-----+
- | |
- +-------+-------+ +-------+-------+
- | | | |
- +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
- | osd | | osd | | osd | | osd |
- | Bucket | | Bucket | | Bucket | | Bucket |
- +-----------+ +-----------+ +-----------+ +-----------+
-
-.. note:: The higher numbered ``rack`` bucket type aggregates the lower
- numbered ``host`` bucket type.
-
-Since leaf nodes reflect storage devices declared under the ``#devices`` list
-at the beginning of the CRUSH map, you do not need to declare them as bucket
-instances. The second lowest bucket type in your hierarchy usually aggregates
-the devices (i.e., it's usually the computer containing the storage media, and
-uses whatever term you prefer to describe it, such as "node", "computer",
-"server," "host", "machine", etc.). In high density environments, it is
-increasingly common to see multiple hosts/nodes per chassis. You should account
-for chassis failure too--e.g., the need to pull a chassis if a node fails may
-result in bringing down numerous hosts/nodes and their OSDs.
-
-When declaring a bucket instance, you must specify its type, give it a unique
-name (string), assign it a unique ID expressed as a negative integer (optional),
-specify a weight relative to the total capacity/capability of its item(s),
-specify the bucket algorithm (usually ``straw``), and the hash (usually ``0``,
-reflecting hash algorithm ``rjenkins1``). A bucket may have one or more items.
-The items may consist of node buckets or leaves. Items may have a weight that
-reflects the relative weight of the item.
-
-You may declare a node bucket with the following syntax::
-
- [bucket-type] [bucket-name] {
- id [a unique negative numeric ID]
- weight [the relative capacity/capability of the item(s)]
- alg [the bucket type: uniform | list | tree | straw ]
- hash [the hash type: 0 by default]
- item [item-name] weight [weight]
- }
-
-For example, using the diagram above, we would define two host buckets
-and one rack bucket. The OSDs are declared as items within the host buckets::
-
- host node1 {
- id -1
- alg straw
- hash 0
- item osd.0 weight 1.00
- item osd.1 weight 1.00
- }
-
- host node2 {
- id -2
- alg straw
- hash 0
- item osd.2 weight 1.00
- item osd.3 weight 1.00
- }
-
- rack rack1 {
- id -3
- alg straw
- hash 0
- item node1 weight 2.00
- item node2 weight 2.00
- }
-
-.. note:: In the foregoing example, note that the rack bucket does not contain
- any OSDs. Rather it contains lower level host buckets, and includes the
- sum total of their weight in the item entry.
-
-.. topic:: Bucket Types
-
- Ceph supports four bucket types, each representing a tradeoff between
- performance and reorganization efficiency. If you are unsure of which bucket
- type to use, we recommend using a ``straw`` bucket. For a detailed
- discussion of bucket types, refer to
- `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_,
- and more specifically to **Section 3.4**. The bucket types are:
-
- #. **Uniform:** Uniform buckets aggregate devices with **exactly** the same
- weight. For example, when firms commission or decommission hardware, they
- typically do so with many machines that have exactly the same physical
- configuration (e.g., bulk purchases). When storage devices have exactly
- the same weight, you may use the ``uniform`` bucket type, which allows
- CRUSH to map replicas into uniform buckets in constant time. With
- non-uniform weights, you should use another bucket algorithm.
-
- #. **List**: List buckets aggregate their content as linked lists. Based on
- the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`P` algorithm,
- a list is a natural and intuitive choice for an **expanding cluster**:
- either an object is relocated to the newest device with some appropriate
- probability, or it remains on the older devices as before. The result is
- optimal data migration when items are added to the bucket. Items removed
- from the middle or tail of the list, however, can result in a significant
- amount of unnecessary movement, making list buckets most suitable for
- circumstances in which they **never (or very rarely) shrink**.
-
- #. **Tree**: Tree buckets use a binary search tree. They are more efficient
- than list buckets when a bucket contains a larger set of items. Based on
- the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`R` algorithm,
- tree buckets reduce the placement time to O(log n), making them
- suitable for managing much larger sets of devices or nested buckets.
-
- #. **Straw:** List and Tree buckets use a divide and conquer strategy
- in a way that either gives certain items precedence (e.g., those
- at the beginning of a list) or obviates the need to consider entire
- subtrees of items at all. That improves the performance of the replica
- placement process, but can also introduce suboptimal reorganization
- behavior when the contents of a bucket change due to an addition, removal,
- or re-weighting of an item. The straw bucket type allows all items to
- fairly “compete” against each other for replica placement through a
- process analogous to a draw of straws.
-
-.. topic:: Hash
-
- Each bucket uses a hash algorithm. Currently, Ceph supports ``rjenkins1``.
- Enter ``0`` as your hash setting to select ``rjenkins1``.
-
-
-.. _weightingbucketitems:
-
-.. topic:: Weighting Bucket Items
-
- Ceph expresses bucket weights as doubles, which allows for fine
- weighting. A weight expresses the relative capacity of a device. We
- recommend using ``1.00`` as the relative weight for a 1TB storage device.
- In such a scenario, a weight of ``0.5`` would represent approximately 500GB,
- and a weight of ``3.00`` would represent approximately 3TB. Higher level
- buckets have a weight that is the sum total of the leaf items aggregated by
- the bucket.
-
- A bucket item weight is one dimensional, but you may also calculate your
- item weights to reflect the performance of the storage drive. For example,
- if you have many 1TB drives where some have relatively low data transfer
- rate and the others have a relatively high data transfer rate, you may
- weight them differently, even though they have the same capacity (e.g.,
- a weight of 0.80 for the first set of drives with lower total throughput,
- and 1.20 for the second set of drives with higher total throughput).
-
-
-.. _crushmaprules:
-
-CRUSH Map Rules
----------------
-
-CRUSH maps support the notion of 'CRUSH rules', which are the rules that
-determine data placement for a pool. For large clusters, you will likely create
-many pools where each pool may have its own CRUSH ruleset and rules. The default
-CRUSH map has a rule for each pool, and one ruleset assigned to each of the
-default pools.
-
-.. note:: In most cases, you will not need to modify the default rules. When
- you create a new pool, its default ruleset is ``0``.
-
-
-CRUSH rules define placement and replication strategies or distribution policies
-that allow you to specify exactly how CRUSH places object replicas. For
-example, you might create a rule selecting a pair of targets for 2-way
-mirroring, another rule for selecting three targets in two different data
-centers for 3-way mirroring, and yet another rule for erasure coding over six
-storage devices. For a detailed discussion of CRUSH rules, refer to
-`CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_,
-and more specifically to **Section 3.2**.
-
-A rule takes the following form::
-
- rule <rulename> {
-
- ruleset <ruleset>
- type [ replicated | erasure ]
- min_size <min-size>
- max_size <max-size>
- step take <bucket-name> [class <device-class>]
- step [choose|chooseleaf] [firstn|indep] <N> <bucket-type>
- step emit
- }
-
-
-``ruleset``
-
-:Description: A means of classifying a rule as belonging to a set of rules.
- Activated by `setting the ruleset in a pool`_.
-
-:Purpose: A component of the rule mask.
-:Type: Integer
-:Required: Yes
-:Default: 0
-
-.. _setting the ruleset in a pool: ../pools#setpoolvalues
-
-
-``type``
-
-:Description: Describes whether the rule applies to a replicated pool or
- to an erasure-coded pool.
-
-:Purpose: A component of the rule mask.
-:Type: String
-:Required: Yes
-:Default: ``replicated``
-:Valid Values: Currently only ``replicated`` and ``erasure``
-
-``min_size``
-
-:Description: If a pool makes fewer replicas than this number, CRUSH will
- **NOT** select this rule.
-
-:Type: Integer
-:Purpose: A component of the rule mask.
-:Required: Yes
-:Default: ``1``
-
-``max_size``
-
-:Description: If a pool makes more replicas than this number, CRUSH will
- **NOT** select this rule.
-
-:Type: Integer
-:Purpose: A component of the rule mask.
-:Required: Yes
-:Default: 10
-
-
-``step take <bucket-name> [class <device-class>]``
-
-:Description: Takes a bucket name, and begins iterating down the tree.
- If the ``device-class`` is specified, it must match
- a class previously used when defining a device. All
- devices that do not belong to the class are excluded.
-:Purpose: A component of the rule.
-:Required: Yes
-:Example: ``step take data``
-
-
-``step choose firstn {num} type {bucket-type}``
-
-:Description: Selects the number of buckets of the given type. The number is
- usually the number of replicas in the pool (i.e., pool size).
-
- - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available).
- - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets.
- - If ``{num} < 0``, it means ``pool-num-replicas - |{num}|``.
-
-:Purpose: A component of the rule.
-:Prerequisite: Follows ``step take`` or ``step choose``.
-:Example: ``step choose firstn 1 type row``
-
-
-``step chooseleaf firstn {num} type {bucket-type}``
-
-:Description: Selects a set of buckets of ``{bucket-type}`` and chooses a leaf
- node from the subtree of each bucket in the set of buckets. The
- number of buckets in the set is usually the number of replicas in
- the pool (i.e., pool size).
-
- - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available).
- - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets.
- - If ``{num} < 0``, it means ``pool-num-replicas - |{num}|``.
-
-:Purpose: A component of the rule. Usage removes the need to select a device using two steps.
-:Prerequisite: Follows ``step take`` or ``step choose``.
-:Example: ``step chooseleaf firstn 0 type row``
-
-
-
-``step emit``
-
-:Description: Outputs the current value and empties the stack. Typically used
- at the end of a rule, but may also be used to pick from different
- trees in the same rule.
-
-:Purpose: A component of the rule.
-:Prerequisite: Follows ``step choose``.
-:Example: ``step emit``
-
-.. important:: To activate one or more rules with a common ruleset number for a
- pool, set the ruleset number of the pool.
-
-
-Placing Different Pools on Different OSDs
-==========================================
-
-Suppose you want to have most pools default to OSDs backed by large hard drives,
-but have some pools mapped to OSDs backed by fast solid-state drives (SSDs).
-It's possible to have multiple independent CRUSH hierarchies within the same
-CRUSH map. Define two hierarchies with two different root nodes--one for hard
-disks (e.g., "root platter") and one for SSDs (e.g., "root ssd") as shown
-below::
-
- device 0 osd.0
- device 1 osd.1
- device 2 osd.2
- device 3 osd.3
- device 4 osd.4
- device 5 osd.5
- device 6 osd.6
- device 7 osd.7
-
- host ceph-osd-ssd-server-1 {
- id -1
- alg straw
- hash 0
- item osd.0 weight 1.00
- item osd.1 weight 1.00
- }
-
- host ceph-osd-ssd-server-2 {
- id -2
- alg straw
- hash 0
- item osd.2 weight 1.00
- item osd.3 weight 1.00
- }
-
- host ceph-osd-platter-server-1 {
- id -3
- alg straw
- hash 0
- item osd.4 weight 1.00
- item osd.5 weight 1.00
- }
-
- host ceph-osd-platter-server-2 {
- id -4
- alg straw
- hash 0
- item osd.6 weight 1.00
- item osd.7 weight 1.00
- }
-
- root platter {
- id -5
- alg straw
- hash 0
- item ceph-osd-platter-server-1 weight 2.00
- item ceph-osd-platter-server-2 weight 2.00
- }
-
- root ssd {
- id -6
- alg straw
- hash 0
- item ceph-osd-ssd-server-1 weight 2.00
- item ceph-osd-ssd-server-2 weight 2.00
- }
-
- rule data {
- ruleset 0
- type replicated
- min_size 2
- max_size 2
- step take platter
- step chooseleaf firstn 0 type host
- step emit
- }
-
- rule metadata {
- ruleset 1
- type replicated
- min_size 0
- max_size 10
- step take platter
- step chooseleaf firstn 0 type host
- step emit
- }
-
- rule rbd {
- ruleset 2
- type replicated
- min_size 0
- max_size 10
- step take platter
- step chooseleaf firstn 0 type host
- step emit
- }
-
- rule platter {
- ruleset 3
- type replicated
- min_size 0
- max_size 10
- step take platter
- step chooseleaf firstn 0 type host
- step emit
- }
-
- rule ssd {
- ruleset 4
- type replicated
- min_size 0
- max_size 4
- step take ssd
- step chooseleaf firstn 0 type host
- step emit
- }
-
- rule ssd-primary {
- ruleset 5
- type replicated
- min_size 5
- max_size 10
- step take ssd
- step chooseleaf firstn 1 type host
- step emit
- step take platter
- step chooseleaf firstn -1 type host
- step emit
- }
-
-You can then set a pool to use the SSD rule by::
-
- ceph osd pool set <poolname> crush_ruleset 4
-
-Similarly, using the ``ssd-primary`` rule will cause each placement group in the
-pool to be placed with an SSD as the primary and platters as the replicas.
-
-
-Tuning CRUSH, the hard way
---------------------------
-
-If you can ensure that all clients are running recent code, you can
-adjust the tunables by extracting the CRUSH map, modifying the values,
-and reinjecting it into the cluster.
-
-* Extract the latest CRUSH map::
-
- ceph osd getcrushmap -o /tmp/crush
-
-* Adjust tunables. These values appear to offer the best behavior
- for both large and small clusters we tested with. You will need to
- additionally specify the ``--enable-unsafe-tunables`` argument to
- ``crushtool`` for this to work. Please use this option with
- extreme care.::
-
- crushtool -i /tmp/crush --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 -o /tmp/crush.new
-
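-* Optionally, test the modified map offline before reinjecting it. The
-  following sketch assumes a hypothetical rule ``0`` and three replicas;
-  adjust both to match your pools::
-
-   crushtool -i /tmp/crush.new --test --rule 0 --num-rep 3 --show-statistics
-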
-* Reinject modified map::
-
- ceph osd setcrushmap -i /tmp/crush.new
-
-Legacy values
--------------
-
-For reference, the legacy values for the CRUSH tunables can be set
-with::
-
- crushtool -i /tmp/crush --set-choose-local-tries 2 --set-choose-local-fallback-tries 5 --set-choose-total-tries 19 --set-chooseleaf-descend-once 0 --set-chooseleaf-vary-r 0 -o /tmp/crush.legacy
-
-Again, the special ``--enable-unsafe-tunables`` option is required.
-Further, as noted above, be careful running old versions of the
-``ceph-osd`` daemon after reverting to legacy values as the feature
-bit is not perfectly enforced.
diff --git a/src/ceph/doc/rados/operations/crush-map.rst b/src/ceph/doc/rados/operations/crush-map.rst
deleted file mode 100644
index 05fa4ff..0000000
--- a/src/ceph/doc/rados/operations/crush-map.rst
+++ /dev/null
@@ -1,956 +0,0 @@
-============
- CRUSH Maps
-============
-
-The :abbr:`CRUSH (Controlled Replication Under Scalable Hashing)` algorithm
-determines how to store and retrieve data by computing data storage locations.
-CRUSH empowers Ceph clients to communicate with OSDs directly rather than
-through a centralized server or broker. With an algorithmically determined
-method of storing and retrieving data, Ceph avoids a single point of failure, a
-performance bottleneck, and a physical limit to its scalability.
-
-CRUSH requires a map of your cluster, and uses the CRUSH map to pseudo-randomly
-store and retrieve data in OSDs with a uniform distribution of data across the
-cluster. For a detailed discussion of CRUSH, see
-`CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_
-
-CRUSH maps contain a list of :abbr:`OSDs (Object Storage Devices)`, a list of
-'buckets' for aggregating the devices into physical locations, and a list of
-rules that tell CRUSH how it should replicate data in a Ceph cluster's pools. By
-reflecting the underlying physical organization of the installation, CRUSH can
-model—and thereby address—potential sources of correlated device failures.
-Typical sources include physical proximity, a shared power source, and a shared
-network. By encoding this information into the cluster map, CRUSH placement
-policies can separate object replicas across different failure domains while
-still maintaining the desired distribution. For example, to address the
-possibility of concurrent failures, it may be desirable to ensure that data
-replicas are on devices using different shelves, racks, power supplies,
-controllers, and/or physical locations.
-
-When you deploy OSDs they are automatically placed within the CRUSH map under a
-``host`` node named with the hostname for the host they are running on. This,
-combined with the default CRUSH failure domain, ensures that replicas or erasure
-code shards are separated across hosts and a single host failure will not
-affect availability. For larger clusters, however, administrators should carefully consider their choice of failure domain. Separating replicas across racks,
-for example, is common for mid- to large-sized clusters.
-
-
-CRUSH Location
-==============
-
-The location of an OSD in terms of the CRUSH map's hierarchy is
-referred to as a ``crush location``. This location specifier takes the
-form of a list of key and value pairs describing a position. For
-example, if an OSD is in a particular row, rack, chassis and host, and
-is part of the 'default' CRUSH tree (this is the case for the vast
-majority of clusters), its crush location could be described as::
-
- root=default row=a rack=a2 chassis=a2a host=a2a1
-
-Note:
-
-#. The order of the keys does not matter.
-#. The key name (left of ``=``) must be a valid CRUSH ``type``. By default
- these include root, datacenter, room, row, pod, pdu, rack, chassis and host,
- but those types can be customized to be anything appropriate by modifying
- the CRUSH map.
-#. Not all keys need to be specified. For example, by default, Ceph
- automatically sets a ``ceph-osd`` daemon's location to be
- ``root=default host=HOSTNAME`` (based on the output from ``hostname -s``).
-
-The crush location for an OSD is normally expressed via the ``crush location``
-config option being set in the ``ceph.conf`` file. Each time the OSD starts,
-it verifies it is in the correct location in the CRUSH map and, if it is not,
-it moves itself. To disable this automatic CRUSH map management, add the
-following to your configuration file in the ``[osd]`` section::
-
- osd crush update on start = false
-
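-For example, a hypothetical ``ceph.conf`` entry that pins ``osd.0`` to a
-specific rack and host might look like this::
-
- [osd.0]
-     crush location = root=default rack=a2 host=a2a1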
-
-Custom location hooks
----------------------
-
-A customized location hook can be used to generate a more complete
-crush location on startup. The sample ``ceph-crush-location`` utility
-will generate a CRUSH location string for a given daemon. The
-location is based on, in order of preference:
-
-#. A ``crush location`` option in ceph.conf.
-#. A default of ``root=default host=HOSTNAME`` where the hostname is
- generated with the ``hostname -s`` command.
-
-This is not useful by itself, as the OSD itself has the exact same
-behavior. However, the script can be modified to provide additional
-location fields (for example, the rack or datacenter), and then the
-hook enabled via the config option::
-
- crush location hook = /path/to/customized-ceph-crush-location
-
-This hook is passed several arguments (below) and should output a single line
-to stdout with the CRUSH location description.::
-
- $ ceph-crush-location --cluster CLUSTER --id ID --type TYPE
-
-where the cluster name is typically 'ceph', the id is the daemon
-identifier (the OSD number), and the daemon type is typically ``osd``.
-
-
-CRUSH structure
-===============
-
-The CRUSH map consists of, loosely speaking, a hierarchy describing
-the physical topology of the cluster, and a set of rules defining
-policy about how we place data on those devices. The hierarchy has
-devices (``ceph-osd`` daemons) at the leaves, and internal nodes
-corresponding to other physical features or groupings: hosts, racks,
-rows, datacenters, and so on. The rules describe how replicas are
-placed in terms of that hierarchy (e.g., 'three replicas in different
-racks').
-
-Devices
--------
-
-Devices are individual ``ceph-osd`` daemons that can store data. You
-will normally have one defined here for each OSD daemon in your
-cluster. Devices are identified by an id (a non-negative integer) and
-a name, normally ``osd.N`` where ``N`` is the device id.
-
-Devices may also have a *device class* associated with them (e.g.,
-``hdd`` or ``ssd``), allowing them to be conveniently targeted by a
-crush rule.
-
-Types and Buckets
------------------
-
-A bucket is the CRUSH term for internal nodes in the hierarchy: hosts,
-racks, rows, etc. The CRUSH map defines a series of *types* that are
-used to describe these nodes. By default, these types include:
-
-- osd (or device)
-- host
-- chassis
-- rack
-- row
-- pdu
-- pod
-- room
-- datacenter
-- region
-- root
-
-Most clusters make use of only a handful of these types, and others
-can be defined as needed.
-
-The hierarchy is built with devices (normally type ``osd``) at the
-leaves, interior nodes with non-device types, and a root node of type
-``root``. For example,
-
-.. ditaa::
-
- +-----------------+
- | {o}root default |
- +--------+--------+
- |
- +---------------+---------------+
- | |
- +-------+-------+ +-----+-------+
- | {o}host foo | | {o}host bar |
- +-------+-------+ +-----+-------+
- | |
- +-------+-------+ +-------+-------+
- | | | |
- +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
- | osd.0 | | osd.1 | | osd.2 | | osd.3 |
- +-----------+ +-----------+ +-----------+ +-----------+
-
-Each node (device or bucket) in the hierarchy has a *weight*
-associated with it, indicating the relative proportion of the total
-data that device or hierarchy subtree should store. Weights are set
-at the leaves, indicating the size of the device, and automatically
-sum up the tree from there, such that the weight of the default node
-will be the total of all devices contained beneath it. Normally
-weights are in units of terabytes (TB).
-
-You can get a simple view of the CRUSH hierarchy for your cluster,
-including the weights, with::
-
- ceph osd crush tree
-
-Rules
------
-
-Rules define policy about how data is distributed across the devices
-in the hierarchy.
-
-CRUSH rules define placement and replication strategies or
-distribution policies that allow you to specify exactly how CRUSH
-places object replicas. For example, you might create a rule selecting
-a pair of targets for 2-way mirroring, another rule for selecting
-three targets in two different data centers for 3-way mirroring, and
-yet another rule for erasure coding over six storage devices. For a
-detailed discussion of CRUSH rules, refer to `CRUSH - Controlled,
-Scalable, Decentralized Placement of Replicated Data`_, and more
-specifically to **Section 3.2**.
-
-In almost all cases, CRUSH rules can be created via the CLI by
-specifying the *pool type* they will be used for (replicated or
-erasure coded), the *failure domain*, and optionally a *device class*.
-In rare cases rules must be written by hand by manually editing the
-CRUSH map.
-
-You can see what rules are defined for your cluster with::
-
- ceph osd crush rule ls
-
-You can view the contents of the rules with::
-
- ceph osd crush rule dump
-
-Device classes
---------------
-
-Each device can optionally have a *class* associated with it. By
-default, OSDs automatically set their class on startup to either
-`hdd`, `ssd`, or `nvme` based on the type of device they are backed
-by.
-
-The device class for one or more OSDs can be explicitly set with::
-
- ceph osd crush set-device-class <class> <osd-name> [...]
-
-Once a device class is set, it cannot be changed to another class
-until the old class is unset with::
-
- ceph osd crush rm-device-class <osd-name> [...]
-
-This allows administrators to set device classes without the class
-being changed on OSD restart or by some other script.
-
-A placement rule that targets a specific device class can be created with::
-
- ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
-
-A pool can then be changed to use the new rule with::
-
- ceph osd pool set <pool-name> crush_rule <rule-name>
-
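-As a hypothetical end-to-end example, the following creates a rule that keeps
-replicas on ``ssd`` devices separated across hosts and assigns it to a pool
-named ``fastpool``::
-
- ceph osd crush rule create-replicated fast-ssd default host ssd
- ceph osd pool set fastpool crush_rule fast-ssd
-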
-Device classes are implemented by creating a "shadow" CRUSH hierarchy
-for each device class in use that contains only devices of that class.
-Rules can then distribute data over the shadow hierarchy. One nice
-thing about this approach is that it is fully backward compatible with
-old Ceph clients. You can view the CRUSH hierarchy with shadow items
-with::
-
- ceph osd crush tree --show-shadow
-
-
-Weight sets
-------------
-
-A *weight set* is an alternative set of weights to use when
-calculating data placement. The normal weights associated with each
-device in the CRUSH map are set based on the device size and indicate
-how much data we *should* be storing where. However, because CRUSH is
-based on a pseudorandom placement process, there is always some
-variation from this ideal distribution, the same way that rolling a
-die sixty times will not result in rolling exactly 10 ones and 10
-sixes. Weight sets allow the cluster to do a numerical optimization
-based on the specifics of your cluster (hierarchy, pools, etc.) to achieve
-a balanced distribution.
-
-There are two types of weight sets supported:
-
- #. A **compat** weight set is a single alternative set of weights for
- each device and node in the cluster. This is not well-suited for
- correcting for all anomalies (for example, placement groups for
- different pools may be different sizes and have different load
- levels, but will be mostly treated the same by the balancer).
- However, compat weight sets have the huge advantage that they are
- *backward compatible* with previous versions of Ceph, which means
- that even though weight sets were first introduced in Luminous
- v12.2.z, older clients (e.g., firefly) can still connect to the
- cluster when a compat weight set is being used to balance data.
- #. A **per-pool** weight set is more flexible in that it allows
- placement to be optimized for each data pool. Additionally,
- weights can be adjusted for each position of placement, allowing
- the optimizer to correct for a subtle skew of data toward devices
- with small weights relative to their peers (an effect that is
- usually only apparent in very large clusters but which can cause
- balancing problems).
-
-When weight sets are in use, the weights associated with each node in
-the hierarchy are visible as a separate column (labeled either
-``(compat)`` or the pool name) from the command::
-
- ceph osd crush tree
-
-When both *compat* and *per-pool* weight sets are in use, data
-placement for a particular pool will use its own per-pool weight set
-if present. If not, it will use the compat weight set if present. If
-neither are present, it will use the normal CRUSH weights.
-
-Although weight sets can be set up and manipulated by hand, it is
-recommended that the *balancer* module be enabled to do so
-automatically.
-
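-For example, assuming the ``balancer`` manager module is available (Luminous
-and later), a compat weight set can be maintained automatically with commands
-along these lines::
-
- ceph mgr module enable balancer
- ceph balancer mode crush-compat
- ceph balancer on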
-
-Modifying the CRUSH map
-=======================
-
-.. _addosd:
-
-Add/Move an OSD
----------------
-
-.. note: OSDs are normally automatically added to the CRUSH map when
- the OSD is created. This command is rarely needed.
-
-To add or move an OSD in the CRUSH map of a running cluster::
-
- ceph osd crush set {name} {weight} root={root} [{bucket-type}={bucket-name} ...]
-
-Where:
-
-``name``
-
-:Description: The full name of the OSD.
-:Type: String
-:Required: Yes
-:Example: ``osd.0``
-
-
-``weight``
-
-:Description: The CRUSH weight for the OSD, normally its size measured in terabytes (TB).
-:Type: Double
-:Required: Yes
-:Example: ``2.0``
-
-
-``root``
-
-:Description: The root node of the tree in which the OSD resides (normally ``default``)
-:Type: Key/value pair.
-:Required: Yes
-:Example: ``root=default``
-
-
-``bucket-type``
-
-:Description: You may specify the OSD's location in the CRUSH hierarchy.
-:Type: Key/value pairs.
-:Required: No
-:Example: ``datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1``
-
-
-The following example adds ``osd.0`` to the hierarchy, or moves the
-OSD from a previous location. ::
-
- ceph osd crush set osd.0 1.0 root=default datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1
-
-
-Adjust OSD weight
------------------
-
-.. note: Normally OSDs automatically add themselves to the CRUSH map
- with the correct weight when they are created. This command
- is rarely needed.
-
-To adjust an OSD's crush weight in the CRUSH map of a running cluster, execute
-the following::
-
- ceph osd crush reweight {name} {weight}
-
-Where:
-
-``name``
-
-:Description: The full name of the OSD.
-:Type: String
-:Required: Yes
-:Example: ``osd.0``
-
-
-``weight``
-
-:Description: The CRUSH weight for the OSD.
-:Type: Double
-:Required: Yes
-:Example: ``2.0``
-
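-For example, to set the CRUSH weight of a hypothetical ``osd.0`` to ``2.0``::
-
- ceph osd crush reweight osd.0 2.0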
-
-.. _removeosd:
-
-Remove an OSD
--------------
-
-.. note: OSDs are normally removed from the CRUSH as part of the
- ``ceph osd purge`` command. This command is rarely needed.
-
-To remove an OSD from the CRUSH map of a running cluster, execute the
-following::
-
- ceph osd crush remove {name}
-
-Where:
-
-``name``
-
-:Description: The full name of the OSD.
-:Type: String
-:Required: Yes
-:Example: ``osd.0``
-
-
-Add a Bucket
-------------
-
-.. note: Buckets are normally implicitly created when an OSD is added
- that specifies a ``{bucket-type}={bucket-name}`` as part of its
- location and a bucket with that name does not already exist. This
- command is typically used when manually adjusting the structure of the
- hierarchy after OSDs have been created (for example, to move a
- series of hosts underneath a new rack-level bucket).
-
-To add a bucket in the CRUSH map of a running cluster, execute the
-``ceph osd crush add-bucket`` command::
-
- ceph osd crush add-bucket {bucket-name} {bucket-type}
-
-Where:
-
-``bucket-name``
-
-:Description: The full name of the bucket.
-:Type: String
-:Required: Yes
-:Example: ``rack12``
-
-
-``bucket-type``
-
-:Description: The type of the bucket. The type must already exist in the hierarchy.
-:Type: String
-:Required: Yes
-:Example: ``rack``
-
-
-The following example adds the ``rack12`` bucket to the hierarchy::
-
- ceph osd crush add-bucket rack12 rack
-
-Move a Bucket
--------------
-
-To move a bucket to a different location or position in the CRUSH map
-hierarchy, execute the following::
-
- ceph osd crush move {bucket-name} {bucket-type}={bucket-name}, [...]
-
-Where:
-
-``bucket-name``
-
-:Description: The name of the bucket to move/reposition.
-:Type: String
-:Required: Yes
-:Example: ``foo-bar-1``
-
-``bucket-type``
-
-:Description: You may specify the bucket's location in the CRUSH hierarchy.
-:Type: Key/value pairs.
-:Required: No
-:Example: ``datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1``
-
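-For example, to move the ``rack12`` bucket created above directly under the
-``default`` root::
-
- ceph osd crush move rack12 root=default
-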
-Remove a Bucket
----------------
-
-To remove a bucket from the CRUSH map hierarchy, execute the following::
-
- ceph osd crush remove {bucket-name}
-
-.. note:: A bucket must be empty before removing it from the CRUSH hierarchy.
-
-Where:
-
-``bucket-name``
-
-:Description: The name of the bucket that you'd like to remove.
-:Type: String
-:Required: Yes
-:Example: ``rack12``
-
-The following example removes the ``rack12`` bucket from the hierarchy::
-
- ceph osd crush remove rack12
-
-Creating a compat weight set
-----------------------------
-
-.. note: This step is normally done automatically by the ``balancer``
- module when enabled.
-
-To create a *compat* weight set::
-
- ceph osd crush weight-set create-compat
-
-Weights for the compat weight set can be adjusted with::
-
- ceph osd crush weight-set reweight-compat {name} {weight}
-
-The compat weight set can be destroyed with::
-
- ceph osd crush weight-set rm-compat
-
-Creating per-pool weight sets
------------------------------
-
-To create a weight set for a specific pool::
-
- ceph osd crush weight-set create {pool-name} {mode}
-
-.. note:: Per-pool weight sets require that all servers and daemons
- run Luminous v12.2.z or later.
-
-Where:
-
-``pool-name``
-
-:Description: The name of a RADOS pool
-:Type: String
-:Required: Yes
-:Example: ``rbd``
-
-``mode``
-
-:Description: Either ``flat`` or ``positional``. A *flat* weight set
- has a single weight for each device or bucket. A
- *positional* weight set has a potentially different
- weight for each position in the resulting placement
- mapping. For example, if a pool has a replica count of
- 3, then a positional weight set will have three weights
- for each device and bucket.
-:Type: String
-:Required: Yes
-:Example: ``flat``
-
-To adjust the weight of an item in a weight set::
-
- ceph osd crush weight-set reweight {pool-name} {item-name} {weight [...]}
-
-To list existing weight sets::
-
- ceph osd crush weight-set ls
-
-To remove a weight set::
-
- ceph osd crush weight-set rm {pool-name}
-
-Creating a rule for a replicated pool
--------------------------------------
-
-For a replicated pool, the primary decision when creating the CRUSH
-rule is what the failure domain is going to be. For example, if a
-failure domain of ``host`` is selected, then CRUSH will ensure that
-each replica of the data is stored on a different host. If ``rack``
-is selected, then each replica will be stored in a different rack.
-What failure domain you choose primarily depends on the size of your
-cluster and how your hierarchy is structured.
-
-Normally, the entire cluster hierarchy is nested beneath a root node
-named ``default``. If you have customized your hierarchy, you may
-want to create a rule nested at some other node in the hierarchy. It
-doesn't matter what type is associated with that node (it doesn't have
-to be a ``root`` node).
-
-It is also possible to create a rule that restricts data placement to
-a specific *class* of device. By default, Ceph OSDs automatically
-classify themselves as either ``hdd`` or ``ssd``, depending on the
-underlying type of device being used. These classes can also be
-customized.
-
-To create a replicated rule::
-
- ceph osd crush rule create-replicated {name} {root} {failure-domain-type} [{class}]
-
-Where:
-
-``name``
-
-:Description: The name of the rule
-:Type: String
-:Required: Yes
-:Example: ``rbd-rule``
-
-``root``
-
-:Description: The name of the node under which data should be placed.
-:Type: String
-:Required: Yes
-:Example: ``default``
-
-``failure-domain-type``
-
-:Description: The type of CRUSH nodes across which we should separate replicas.
-:Type: String
-:Required: Yes
-:Example: ``rack``
-
-``class``
-
-:Description: The device class data should be placed on.
-:Type: String
-:Required: No
-:Example: ``ssd``
-
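-Putting the example values above together, a hypothetical rule that separates
-replicas across racks and restricts placement to SSDs would be created with::
-
- ceph osd crush rule create-replicated rbd-rule default rack ssd
-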
-Creating a rule for an erasure coded pool
------------------------------------------
-
-For an erasure-coded pool, the same basic decisions need to be made as
-with a replicated pool: what is the failure domain, what node in the
-hierarchy will data be placed under (usually ``default``), and will
-placement be restricted to a specific device class. Erasure code
-pools are created a bit differently, however, because they need to be
-constructed carefully based on the erasure code being used. For this reason,
-you must include this information in the *erasure code profile*. A CRUSH
-rule will then be created from that either explicitly or automatically when
-the profile is used to create a pool.
-
-The erasure code profiles can be listed with::
-
- ceph osd erasure-code-profile ls
-
-An existing profile can be viewed with::
-
- ceph osd erasure-code-profile get {profile-name}
-
-Normally profiles should never be modified; instead, a new profile
-should be created and used when creating a new pool or creating a new
-rule for an existing pool.
-
-An erasure code profile consists of a set of key=value pairs. Most of
-these control the behavior of the erasure code that is encoding data
-in the pool. Those that begin with ``crush-``, however, affect the
-CRUSH rule that is created.
-
-The erasure code profile properties of interest are:
-
- * **crush-root**: the name of the CRUSH node to place data under [default: ``default``].
- * **crush-failure-domain**: the CRUSH type to separate erasure-coded shards across [default: ``host``].
- * **crush-device-class**: the device class to place data on [default: none, meaning all devices are used].
- * **k** and **m** (and, for the ``lrc`` plugin, **l**): these determine the number of erasure code shards, affecting the resulting CRUSH rule.
-
-Once a profile is defined, you can create a CRUSH rule with::
-
- ceph osd crush rule create-erasure {name} {profile-name}
-
-.. note: When creating a new pool, it is not actually necessary to
- explicitly create the rule. If the erasure code profile alone is
- specified and the rule argument is left off then Ceph will create
- the CRUSH rule automatically.
-
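-For example, a hypothetical profile and matching rule that spread ``k=4``
-data and ``m=2`` coding shards across racks could be created with::
-
- ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=rack
- ceph osd crush rule create-erasure ecrule myprofile
-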
-Deleting rules
---------------
-
-Rules that are not in use by pools can be deleted with::
-
- ceph osd crush rule rm {rule-name}
-
-
-Tunables
-========
-
-Over time, we have made (and continue to make) improvements to the
-CRUSH algorithm used to calculate the placement of data. In order to
-support the change in behavior, we have introduced a series of tunable
-options that control whether the legacy or improved variation of the
-algorithm is used.
-
-In order to use newer tunables, both clients and servers must support
-the new version of CRUSH. For this reason, we have created
-``profiles`` that are named after the Ceph version in which they were
-introduced. For example, the ``firefly`` tunables are first supported
-in the firefly release, and will not work with older (e.g., dumpling)
-clients. Once a given set of tunables is changed from the legacy
-default behavior, the ``ceph-mon`` and ``ceph-osd`` daemons will prevent older
-clients that do not support the new CRUSH features from connecting to
-the cluster.
-
-argonaut (legacy)
------------------
-
-The legacy CRUSH behavior used by argonaut and older releases works
-fine for most clusters, provided there are not too many OSDs that have
-been marked out.
-
-bobtail (CRUSH_TUNABLES2)
--------------------------
-
-The bobtail tunable profile fixes a few key misbehaviors:
-
- * For hierarchies with a small number of devices in the leaf buckets,
- some PGs map to fewer than the desired number of replicas. This
- commonly happens for hierarchies with "host" nodes with a small
- number (1-3) of OSDs nested beneath each one.
-
- * For large clusters, some small percentages of PGs map to less than
- the desired number of OSDs. This is more prevalent when there are
- several layers of the hierarchy (e.g., row, rack, host, osd).
-
- * When some OSDs are marked out, the data tends to get redistributed
- to nearby OSDs instead of across the entire hierarchy.
-
-The new tunables are:
-
- * ``choose_local_tries``: Number of local retries. Legacy value is
- 2, optimal value is 0.
-
- * ``choose_local_fallback_tries``: Legacy value is 5, optimal value
- is 0.
-
- * ``choose_total_tries``: Total number of attempts to choose an item.
- Legacy value was 19, subsequent testing indicates that a value of
- 50 is more appropriate for typical clusters. For extremely large
- clusters, a larger value might be necessary.
-
- * ``chooseleaf_descend_once``: Whether a recursive chooseleaf attempt
- will retry, or only try once and allow the original placement to
- retry. Legacy default is 0, optimal value is 1.
-
-Migration impact:
-
- * Moving from argonaut to bobtail tunables triggers a moderate amount
- of data movement. Use caution on a cluster that is already
- populated with data.
-
-firefly (CRUSH_TUNABLES3)
--------------------------
-
-The firefly tunable profile fixes a problem
-with the ``chooseleaf`` CRUSH rule behavior that tends to result in PG
-mappings with too few results when too many OSDs have been marked out.
-
-The new tunable is:
-
- * ``chooseleaf_vary_r``: Whether a recursive chooseleaf attempt will
- start with a non-zero value of r, based on how many attempts the
- parent has already made. Legacy default is 0, but with this value
- CRUSH is sometimes unable to find a mapping. The optimal value (in
- terms of computational cost and correctness) is 1.
-
-Migration impact:
-
- * For existing clusters that have lots of existing data, changing
- from 0 to 1 will cause a lot of data to move; a value of 4 or 5
- will allow CRUSH to find a valid mapping but will make less data
- move.
-
-straw_calc_version tunable (introduced with Firefly too)
---------------------------------------------------------
-
-There were some problems with the internal weights calculated and
-stored in the CRUSH map for ``straw`` buckets. Specifically, when
-there were items with a CRUSH weight of 0 or both a mix of weights and
-some duplicated weights CRUSH would distribute data incorrectly (i.e.,
-not in proportion to the weights).
-
-The new tunable is:
-
- * ``straw_calc_version``: A value of 0 preserves the old, broken
- internal weight calculation; a value of 1 fixes the behavior.
-
-Migration impact:
-
- * Moving to straw_calc_version 1 and then adjusting a straw bucket
- (by adding, removing, or reweighting an item, or by using the
- reweight-all command) can trigger a small to moderate amount of
- data movement *if* the cluster has hit one of the problematic
- conditions.
-
-This tunable option is special because it has absolutely no impact
-on the required client-side kernel version.
-
-hammer (CRUSH_V4)
------------------
-
-The hammer tunable profile does not affect the
-mapping of existing CRUSH maps simply by changing the profile. However:
-
- * There is a new bucket type (``straw2``) supported. The new
- ``straw2`` bucket type fixes several limitations in the original
- ``straw`` bucket. Specifically, the old ``straw`` buckets would
- change some mappings that should have changed when a weight was
- adjusted, while ``straw2`` achieves the original goal of only
- changing mappings to or from the bucket item whose weight has
- changed.
-
- * ``straw2`` is the default for any newly created buckets.
-
-Migration impact:
-
- * Changing a bucket type from ``straw`` to ``straw2`` will result in
- a reasonably small amount of data movement, depending on how much
- the bucket item weights vary from each other. When the weights are
- all the same no data will move, and when item weights vary
- significantly there will be more movement.
-
-jewel (CRUSH_TUNABLES5)
------------------------
-
-The jewel tunable profile improves the
-overall behavior of CRUSH such that significantly fewer mappings
-change when an OSD is marked out of the cluster.
-
-The new tunable is:
-
- * ``chooseleaf_stable``: Whether a recursive chooseleaf attempt will
- use a better value for an inner loop that greatly reduces the number
- of mapping changes when an OSD is marked out. The legacy value is 0,
- while the new value of 1 uses the new approach.
-
-Migration impact:
-
- * Changing this value on an existing cluster will result in a very
- large amount of data movement as almost every PG mapping is likely
- to change.
-
-
-
-
-Which client versions support CRUSH_TUNABLES
---------------------------------------------
-
- * argonaut series, v0.48.1 or later
- * v0.49 or later
- * Linux kernel version v3.6 or later (for the file system and RBD kernel clients)
-
-Which client versions support CRUSH_TUNABLES2
----------------------------------------------
-
- * v0.55 or later, including bobtail series (v0.56.x)
- * Linux kernel version v3.9 or later (for the file system and RBD kernel clients)
-
-Which client versions support CRUSH_TUNABLES3
----------------------------------------------
-
- * v0.78 (firefly) or later
- * Linux kernel version v3.15 or later (for the file system and RBD kernel clients)
-
-Which client versions support CRUSH_V4
---------------------------------------
-
- * v0.94 (hammer) or later
- * Linux kernel version v4.1 or later (for the file system and RBD kernel clients)
-
-Which client versions support CRUSH_TUNABLES5
----------------------------------------------
-
- * v10.0.2 (jewel) or later
- * Linux kernel version v4.5 or later (for the file system and RBD kernel clients)
-
-Warning when tunables are non-optimal
--------------------------------------
-
-Starting with version v0.74, Ceph will issue a health warning if the
-current CRUSH tunables don't include all the optimal values from the
-``default`` profile (see below for the meaning of the ``default`` profile).
-To make this warning go away, you have two options:
-
-1. Adjust the tunables on the existing cluster. Note that this will
- result in some data movement (possibly as much as 10%). This is the
- preferred route, but should be taken with care on a production cluster
- where the data movement may affect performance. You can enable optimal
- tunables with::
-
- ceph osd crush tunables optimal
-
- If things go poorly (e.g., too much load) and not very much
- progress has been made, or there is a client compatibility problem
- (old kernel cephfs or rbd clients, or pre-bobtail librados
- clients), you can switch back with::
-
- ceph osd crush tunables legacy
-
-2. You can make the warning go away without making any changes to CRUSH by
- adding the following option to your ceph.conf ``[mon]`` section::
-
- mon warn on legacy crush tunables = false
-
- For the change to take effect, you will need to restart the monitors, or
- apply the option to running monitors with::
-
- ceph tell mon.\* injectargs --no-mon-warn-on-legacy-crush-tunables
-
-
-A few important points
-----------------------
-
- * Adjusting these values will result in the shift of some PGs between
- storage nodes. If the Ceph cluster is already storing a lot of
- data, be prepared for some fraction of the data to move.
- * The ``ceph-osd`` and ``ceph-mon`` daemons will start requiring the
- feature bits of new connections as soon as they get
- the updated map. However, already-connected clients are
- effectively grandfathered in, and will misbehave if they do not
- support the new feature.
- * If the CRUSH tunables are set to non-legacy values and then later
- changed back to the default values, ``ceph-osd`` daemons will not be
- required to support the feature. However, the OSD peering process
- requires examining and understanding old maps. Therefore, you
- should not run old versions of the ``ceph-osd`` daemon
- if the cluster has previously used non-legacy CRUSH values, even if
- the latest version of the map has been switched back to using the
- legacy defaults.
-
-Tuning CRUSH
-------------
-
-The simplest way to adjust the crush tunables is by changing to a known
-profile. Those are:
-
- * ``legacy``: the legacy behavior from argonaut and earlier.
- * ``argonaut``: the legacy values supported by the original argonaut release
- * ``bobtail``: the values supported by the bobtail release
- * ``firefly``: the values supported by the firefly release
- * ``hammer``: the values supported by the hammer release
- * ``jewel``: the values supported by the jewel release
- * ``optimal``: the best (i.e., optimal) values of the current version of Ceph
- * ``default``: the default values of a new cluster installed from
- scratch. These values, which depend on the current version of Ceph,
- are hard coded and are generally a mix of optimal and legacy values.
- These values generally match the ``optimal`` profile of the previous
- LTS release, or the most recent release for which we generally except
- more users to have up to date clients for.
-
-You can select a profile on a running cluster with the command::
-
- ceph osd crush tunables {PROFILE}
-
-Note that this may result in some data movement.
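-
-To verify which tunables are actually in effect before and after switching
-profiles, you can dump the current values with::
-
-    ceph osd crush show-tunables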
-
-
-.. _CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data: https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
-
-
-Primary Affinity
-================
-
-When a Ceph Client reads or writes data, it always contacts the primary OSD in
-the acting set. For set ``[2, 3, 4]``, ``osd.2`` is the primary. Sometimes an
-OSD is not well suited to act as a primary compared to other OSDs (e.g., it has
-a slow disk or a slow controller). To prevent performance bottlenecks
-(especially on read operations) while maximizing utilization of your hardware,
-you can set a Ceph OSD's primary affinity so that CRUSH is less likely to use
-the OSD as a primary in an acting set. ::
-
- ceph osd primary-affinity <osd-id> <weight>
-
-Primary affinity is ``1`` by default (*i.e.,* an OSD may act as a primary). You
-may set an OSD's primary affinity to any value in the range ``0``-``1``, where
-``0`` means that the OSD may **NOT** be used as a primary and ``1`` means that
-the OSD may be used as a primary. When the weight is ``< 1``, it is less likely
-that CRUSH will select the Ceph OSD Daemon to act as a primary.
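-
-For example, to make CRUSH half as likely to select ``osd.2`` as the primary
-of any acting set it belongs to (the OSD id and weight below are only
-illustrative)::
-
-    ceph osd primary-affinity osd.2 0.5
-
-Setting the affinity back to ``1`` restores the default behavior::
-
-    ceph osd primary-affinity osd.2 1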
-
-
-
diff --git a/src/ceph/doc/rados/operations/data-placement.rst b/src/ceph/doc/rados/operations/data-placement.rst
deleted file mode 100644
index 27966b0..0000000
--- a/src/ceph/doc/rados/operations/data-placement.rst
+++ /dev/null
@@ -1,37 +0,0 @@
-=========================
- Data Placement Overview
-=========================
-
-Ceph stores, replicates and rebalances data objects across a RADOS cluster
-dynamically. With many different users storing objects in different pools for
-different purposes on countless OSDs, Ceph operations require some data
-placement planning. The main data placement planning concepts in Ceph include:
-
-- **Pools:** Ceph stores data within pools, which are logical groups for storing
- objects. Pools manage the number of placement groups, the number of replicas,
- and the ruleset for the pool. To store data in a pool, you must have
- an authenticated user with permissions for the pool. Ceph can snapshot pools.
- See `Pools`_ for additional details.
-
-- **Placement Groups:** Ceph maps objects to placement groups (PGs).
- Placement groups (PGs) are shards or fragments of a logical object pool
- that place objects as a group into OSDs. Placement groups reduce the amount
- of per-object metadata when Ceph stores the data in OSDs. A larger number of
- placement groups (e.g., 100 per OSD) leads to better balancing. See
- `Placement Groups`_ for additional details.
-
-- **CRUSH Maps:** CRUSH is a big part of what allows Ceph to scale without
- performance bottlenecks, without limitations to scalability, and without a
- single point of failure. CRUSH maps provide the physical topology of the
- cluster to the CRUSH algorithm to determine where the data for an object
- and its replicas should be stored, and how to do so across failure domains
- for added data safety among other things. See `CRUSH Maps`_ for additional
- details.
-
-When you initially set up a test cluster, you can use the default values. Once
-you begin planning for a large Ceph cluster, refer to pools, placement groups
-and CRUSH for data placement operations.
-
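-Once the cluster is running, you can see how these concepts fit together by
-asking Ceph where a given object would be placed. For example (the pool and
-object names are only placeholders)::
-
-    ceph osd map {pool-name} {object-name}
-
-The output shows the pool, the placement group the object hashes to, and the
-acting set of OSDs that CRUSH selected for that placement group.
-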
-.. _Pools: ../pools
-.. _Placement Groups: ../placement-groups
-.. _CRUSH Maps: ../crush-map
diff --git a/src/ceph/doc/rados/operations/erasure-code-isa.rst b/src/ceph/doc/rados/operations/erasure-code-isa.rst
deleted file mode 100644
index b52933a..0000000
--- a/src/ceph/doc/rados/operations/erasure-code-isa.rst
+++ /dev/null
@@ -1,105 +0,0 @@
-=======================
-ISA erasure code plugin
-=======================
-
-The *isa* plugin encapsulates the `ISA
-<https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version/>`_
-library. It only runs on Intel processors.
-
-Create an isa profile
-=====================
-
-To create a new *isa* erasure code profile::
-
- ceph osd erasure-code-profile set {name} \
- plugin=isa \
- technique={reed_sol_van|cauchy} \
- [k={data-chunks}] \
- [m={coding-chunks}] \
- [crush-root={root}] \
- [crush-failure-domain={bucket-type}] \
- [crush-device-class={device-class}] \
- [directory={directory}] \
- [--force]
-
-Where:
-
-``k={data chunks}``
-
-:Description: Each object is split into **data-chunks** parts,
- each stored on a different OSD.
-
-:Type: Integer
-:Required: No.
-:Default: 7
-
-``m={coding-chunks}``
-
-:Description: Compute **coding chunks** for each object and store them
- on different OSDs. The number of coding chunks is also
- the number of OSDs that can be down without losing data.
-
-:Type: Integer
-:Required: No.
-:Default: 3
-
-``technique={reed_sol_van|cauchy}``
-
-:Description: The ISA plugin comes in two `Reed Solomon
- <https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction>`_
- forms. If *reed_sol_van* is set, it is `Vandermonde
- <https://en.wikipedia.org/wiki/Vandermonde_matrix>`_, if
- *cauchy* is set, it is `Cauchy
- <https://en.wikipedia.org/wiki/Cauchy_matrix>`_.
-
-:Type: String
-:Required: No.
-:Default: reed_sol_van
-
-``crush-root={root}``
-
-:Description: The name of the crush bucket used for the first step of
-              the ruleset. For instance, **step take default**.
-
-:Type: String
-:Required: No.
-:Default: default
-
-``crush-failure-domain={bucket-type}``
-
-:Description: Ensure that no two chunks are in a bucket with the same
- failure domain. For instance, if the failure domain is
- **host** no two chunks will be stored on the same
- host. It is used to create a ruleset step such as **step
- chooseleaf host**.
-
-:Type: String
-:Required: No.
-:Default: host
-
-``crush-device-class={device-class}``
-
-:Description: Restrict placement to devices of a specific class (e.g.,
- ``ssd`` or ``hdd``), using the crush device class names
- in the CRUSH map.
-
-:Type: String
-:Required: No.
-:Default:
-
-``directory={directory}``
-
-:Description: Set the **directory** name from which the erasure code
- plugin is loaded.
-
-:Type: String
-:Required: No.
-:Default: /usr/lib/ceph/erasure-code
-
-``--force``
-
-:Description: Override an existing profile by the same name.
-
-:Type: String
-:Required: No.
-
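-Example
-=======
-
-As a minimal sketch, the following creates an *isa* profile with the plugin
-defaults and an erasure coded pool that uses it (the profile and pool names
-are only examples)::
-
-    $ ceph osd erasure-code-profile set isaprofile \
-         plugin=isa \
-         k=7 m=3 \
-         crush-failure-domain=host
-    $ ceph osd pool create isapool 12 12 erasure isaprofile
-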
diff --git a/src/ceph/doc/rados/operations/erasure-code-jerasure.rst b/src/ceph/doc/rados/operations/erasure-code-jerasure.rst
deleted file mode 100644
index e8da097..0000000
--- a/src/ceph/doc/rados/operations/erasure-code-jerasure.rst
+++ /dev/null
@@ -1,120 +0,0 @@
-============================
-Jerasure erasure code plugin
-============================
-
-The *jerasure* plugin is the most generic and flexible plugin; it is
-also the default for Ceph erasure coded pools.
-
-The *jerasure* plugin encapsulates the `Jerasure
-<http://jerasure.org>`_ library. It is
-recommended to read the *jerasure* documentation to get a better
-understanding of the parameters.
-
-Create a jerasure profile
-=========================
-
-To create a new *jerasure* erasure code profile::
-
- ceph osd erasure-code-profile set {name} \
- plugin=jerasure \
- k={data-chunks} \
- m={coding-chunks} \
- technique={reed_sol_van|reed_sol_r6_op|cauchy_orig|cauchy_good|liberation|blaum_roth|liber8tion} \
- [crush-root={root}] \
- [crush-failure-domain={bucket-type}] \
- [crush-device-class={device-class}] \
- [directory={directory}] \
- [--force]
-
-Where:
-
-``k={data chunks}``
-
-:Description: Each object is split into **data-chunks** parts,
- each stored on a different OSD.
-
-:Type: Integer
-:Required: Yes.
-:Example: 4
-
-``m={coding-chunks}``
-
-:Description: Compute **coding chunks** for each object and store them
- on different OSDs. The number of coding chunks is also
- the number of OSDs that can be down without losing data.
-
-:Type: Integer
-:Required: Yes.
-:Example: 2
-
-``technique={reed_sol_van|reed_sol_r6_op|cauchy_orig|cauchy_good|liberation|blaum_roth|liber8tion}``
-
-:Description: The most flexible technique is *reed_sol_van*: it is
-              enough to set *k* and *m*. The *cauchy_good* technique
-              can be faster but you need to choose the *packetsize*
- carefully. All of *reed_sol_r6_op*, *liberation*,
- *blaum_roth*, *liber8tion* are *RAID6* equivalents in
- the sense that they can only be configured with *m=2*.
-
-:Type: String
-:Required: No.
-:Default: reed_sol_van
-
-``packetsize={bytes}``
-
-:Description: The encoding will be done on packets of *bytes* size at
-              a time. Choosing the right packet size is difficult. The
- *jerasure* documentation contains extensive information
- on this topic.
-
-:Type: Integer
-:Required: No.
-:Default: 2048
-
-``crush-root={root}``
-
-:Description: The name of the crush bucket used for the first step of
-              the ruleset. For instance, **step take default**.
-
-:Type: String
-:Required: No.
-:Default: default
-
-``crush-failure-domain={bucket-type}``
-
-:Description: Ensure that no two chunks are in a bucket with the same
- failure domain. For instance, if the failure domain is
- **host** no two chunks will be stored on the same
- host. It is used to create a ruleset step such as **step
- chooseleaf host**.
-
-:Type: String
-:Required: No.
-:Default: host
-
-``crush-device-class={device-class}``
-
-:Description: Restrict placement to devices of a specific class (e.g.,
- ``ssd`` or ``hdd``), using the crush device class names
- in the CRUSH map.
-
-:Type: String
-:Required: No.
-:Default:
-
-``directory={directory}``
-
-:Description: Set the **directory** name from which the erasure code
- plugin is loaded.
-
-:Type: String
-:Required: No.
-:Default: /usr/lib/ceph/erasure-code
-
-``--force``
-
-:Description: Override an existing profile by the same name.
-
-:Type: String
-:Required: No.
-
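-Example
-=======
-
-As a minimal sketch, the following creates a *jerasure* profile and an
-erasure coded pool that uses it (the profile and pool names are only
-examples)::
-
-    $ ceph osd erasure-code-profile set jerasureprofile \
-         plugin=jerasure \
-         k=4 m=2 \
-         technique=reed_sol_van \
-         crush-failure-domain=host
-    $ ceph osd pool create jerasurepool 12 12 erasure jerasureprofile
-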
diff --git a/src/ceph/doc/rados/operations/erasure-code-lrc.rst b/src/ceph/doc/rados/operations/erasure-code-lrc.rst
deleted file mode 100644
index 447ce23..0000000
--- a/src/ceph/doc/rados/operations/erasure-code-lrc.rst
+++ /dev/null
@@ -1,371 +0,0 @@
-======================================
-Locally repairable erasure code plugin
-======================================
-
-With the *jerasure* plugin, when an erasure coded object is stored on
-multiple OSDs, recovering from the loss of one OSD requires reading
-from all the others. For instance if *jerasure* is configured with
-*k=8* and *m=4*, losing one OSD requires reading from the eleven
-others to repair.
-
-The *lrc* erasure code plugin creates local parity chunks to be able
-to recover using fewer OSDs. For instance if *lrc* is configured with
-*k=8*, *m=4* and *l=4*, it will create an additional parity chunk for
-every four OSDs. When a single OSD is lost, it can be recovered with
-only four OSDs instead of eleven.
-
-Erasure code profile examples
-=============================
-
-Reduce recovery bandwidth between hosts
----------------------------------------
-
-Although it is probably not an interesting use case when all hosts are
-connected to the same switch, reduced bandwidth usage can actually be
-observed::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- k=4 m=2 l=3 \
- crush-failure-domain=host
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-
-Reduce recovery bandwidth between racks
----------------------------------------
-
-In Firefly the reduced bandwidth will only be observed if the primary
-OSD is in the same rack as the lost chunk::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- k=4 m=2 l=3 \
- crush-locality=rack \
- crush-failure-domain=host
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-
-Create an lrc profile
-=====================
-
-To create a new lrc erasure code profile::
-
- ceph osd erasure-code-profile set {name} \
- plugin=lrc \
- k={data-chunks} \
- m={coding-chunks} \
- l={locality} \
- [crush-root={root}] \
- [crush-locality={bucket-type}] \
- [crush-failure-domain={bucket-type}] \
- [crush-device-class={device-class}] \
- [directory={directory}] \
- [--force]
-
-Where:
-
-``k={data chunks}``
-
-:Description: Each object is split into **data-chunks** parts,
- each stored on a different OSD.
-
-:Type: Integer
-:Required: Yes.
-:Example: 4
-
-``m={coding-chunks}``
-
-:Description: Compute **coding chunks** for each object and store them
- on different OSDs. The number of coding chunks is also
- the number of OSDs that can be down without losing data.
-
-:Type: Integer
-:Required: Yes.
-:Example: 2
-
-``l={locality}``
-
-:Description: Group the coding and data chunks into sets of size
- **locality**. For instance, for **k=4** and **m=2**,
- when **locality=3** two groups of three are created.
- Each set can be recovered without reading chunks
- from another set.
-
-:Type: Integer
-:Required: Yes.
-:Example: 3
-
-``crush-root={root}``
-
-:Description: The name of the crush bucket used for the first step of
-              the ruleset. For instance, **step take default**.
-
-:Type: String
-:Required: No.
-:Default: default
-
-``crush-locality={bucket-type}``
-
-:Description: The type of the crush bucket in which each set of chunks
- defined by **l** will be stored. For instance, if it is
- set to **rack**, each group of **l** chunks will be
- placed in a different rack. It is used to create a
- ruleset step such as **step choose rack**. If it is not
- set, no such grouping is done.
-
-:Type: String
-:Required: No.
-
-``crush-failure-domain={bucket-type}``
-
-:Description: Ensure that no two chunks are in a bucket with the same
- failure domain. For instance, if the failure domain is
- **host** no two chunks will be stored on the same
- host. It is used to create a ruleset step such as **step
- chooseleaf host**.
-
-:Type: String
-:Required: No.
-:Default: host
-
-``crush-device-class={device-class}``
-
-:Description: Restrict placement to devices of a specific class (e.g.,
- ``ssd`` or ``hdd``), using the crush device class names
- in the CRUSH map.
-
-:Type: String
-:Required: No.
-:Default:
-
-``directory={directory}``
-
-:Description: Set the **directory** name from which the erasure code
- plugin is loaded.
-
-:Type: String
-:Required: No.
-:Default: /usr/lib/ceph/erasure-code
-
-``--force``
-
-:Description: Override an existing profile by the same name.
-
-:Type: String
-:Required: No.
-
-Low level plugin configuration
-==============================
-
-The sum of **k** and **m** must be a multiple of the **l** parameter.
-The low level configuration parameters do not impose such a
-restriction and it may be more convenient to use them for specific
-purposes. It is for instance possible to define two groups, one with 4
-chunks and another with 3 chunks. It is also possible to recursively
-define locality sets, for instance datacenters and racks into
-datacenters. The **k/m/l** parameters are implemented by generating a low level
-configuration.
-
-The *lrc* erasure code plugin recursively applies erasure code
-techniques so that recovering from the loss of some chunks only
-requires a subset of the available chunks, most of the time.
-
-For instance, when three coding steps are described as::
-
- chunk nr 01234567
- step 1 _cDD_cDD
- step 2 cDDD____
- step 3 ____cDDD
-
-where *c* are coding chunks calculated from the data chunks *D*, the
-loss of chunk *7* can be recovered with the last four chunks, and the
-loss of chunk *2* can be recovered with the first four
-chunks.
-
-Erasure code profile examples using low level configuration
-===========================================================
-
-Minimal testing
----------------
-
-It is strictly equivalent to using the default erasure code profile. The *DD*
-implies *K=2*, the *c* implies *M=1* and the *jerasure* plugin is used
-by default::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- mapping=DD_ \
- layers='[ [ "DDc", "" ] ]'
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-Reduce recovery bandwidth between hosts
----------------------------------------
-
-Although it is probably not an interesting use case when all hosts are
-connected to the same switch, reduced bandwidth usage can actually be
-observed. It is equivalent to **k=4**, **m=2** and **l=3** although
-the layout of the chunks is different::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- mapping=__DD__DD \
- layers='[
- [ "_cDD_cDD", "" ],
- [ "cDDD____", "" ],
- [ "____cDDD", "" ],
- ]'
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-
-Reduce recovery bandwidth between racks
----------------------------------------
-
-In Firefly the reduced bandwidth will only be observed if the primary
-OSD is in the same rack as the lost chunk::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- mapping=__DD__DD \
- layers='[
- [ "_cDD_cDD", "" ],
- [ "cDDD____", "" ],
- [ "____cDDD", "" ],
- ]' \
- crush-steps='[
- [ "choose", "rack", 2 ],
- [ "chooseleaf", "host", 4 ],
- ]'
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-Testing with different Erasure Code backends
---------------------------------------------
-
-LRC now uses jerasure as the default EC backend. It is possible to
-specify the EC backend/algorithm on a per layer basis using the low
-level configuration. The second argument in layers='[ [ "DDc", "" ] ]'
-is actually an erasure code profile to be used for this level. The
-example below specifies the ISA backend with the cauchy technique to
-be used in the lrcpool::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- mapping=DD_ \
- layers='[ [ "DDc", "plugin=isa technique=cauchy" ] ]'
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-You could also use a different erasure code profile for each
-layer::
-
- $ ceph osd erasure-code-profile set LRCprofile \
- plugin=lrc \
- mapping=__DD__DD \
- layers='[
- [ "_cDD_cDD", "plugin=isa technique=cauchy" ],
- [ "cDDD____", "plugin=isa" ],
- [ "____cDDD", "plugin=jerasure" ],
- ]'
- $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
-
-
-
-Erasure coding and decoding algorithm
-=====================================
-
-The steps found in the layers description::
-
- chunk nr 01234567
-
- step 1 _cDD_cDD
- step 2 cDDD____
- step 3 ____cDDD
-
-are applied in order. For instance, if a 4K object is encoded, it will
-first go through *step 1* and be divided into four 1K chunks (the four
-uppercase D). They are stored in the chunks 2, 3, 6 and 7, in
-order. From these, two coding chunks are calculated (the two lowercase
-c). The coding chunks are stored in the chunks 1 and 5, respectively.
-
-The *step 2* re-uses the content created by *step 1* in a similar
-fashion and stores a single coding chunk *c* at position 0. The last four
-chunks, marked with an underscore (*_*) for readability, are ignored.
-
-The *step 3* stores a single coding chunk *c* at position 4. The three
-chunks created by *step 1* are used to compute this coding chunk,
-i.e. the coding chunk from *step 1* becomes a data chunk in *step 3*.
-
-If chunk *2* is lost::
-
- chunk nr 01234567
-
- step 1 _c D_cDD
- step 2 cD D____
- step 3 __ _cDDD
-
-decoding will attempt to recover it by walking the steps in reverse
-order: *step 3* then *step 2* and finally *step 1*.
-
-The *step 3* knows nothing about chunk *2* (i.e. it is an underscore)
-and is skipped.
-
-The coding chunk from *step 2*, stored in chunk *0*, allows it to
-recover the content of chunk *2*. There are no more chunks to recover
-and the process stops, without considering *step 1*.
-
-Recovering chunk *2* requires reading chunks *0, 1, 3* and writing
-back chunk *2*.
-
-If chunk *2, 3, 6* are lost::
-
- chunk nr 01234567
-
- step 1 _c _c D
- step 2 cD __ _
- step 3 __ cD D
-
-The *step 3* can recover the content of chunk *6*::
-
- chunk nr 01234567
-
- step 1 _c _cDD
- step 2 cD ____
- step 3 __ cDDD
-
-The *step 2* fails to recover and is skipped because there are two
-chunks missing (*2, 3*) and it can only recover from one missing
-chunk.
-
-The coding chunks from *step 1*, stored in chunks *1* and *5*, allow it to
-recover the content of chunks *2* and *3*::
-
- chunk nr 01234567
-
- step 1 _cDD_cDD
- step 2 cDDD____
- step 3 ____cDDD
-
-Controlling crush placement
-===========================
-
-The default crush ruleset provides OSDs that are on different hosts. For instance::
-
- chunk nr 01234567
-
- step 1 _cDD_cDD
- step 2 cDDD____
- step 3 ____cDDD
-
-needs exactly *8* OSDs, one for each chunk. If the hosts are in two
-adjacent racks, the first four chunks can be placed in the first rack
-and the last four in the second rack, so that recovering from the loss
-of a single OSD does not require using bandwidth between the two
-racks.
-
-For instance::
-
- crush-steps='[ [ "choose", "rack", 2 ], [ "chooseleaf", "host", 4 ] ]'
-
-will create a ruleset that will select two crush buckets of type
-*rack* and for each of them choose four OSDs, each of them located in
-different buckets of type *host*.
-
-The ruleset can also be manually crafted for finer control.
diff --git a/src/ceph/doc/rados/operations/erasure-code-profile.rst b/src/ceph/doc/rados/operations/erasure-code-profile.rst
deleted file mode 100644
index ddf772d..0000000
--- a/src/ceph/doc/rados/operations/erasure-code-profile.rst
+++ /dev/null
@@ -1,121 +0,0 @@
-=====================
-Erasure code profiles
-=====================
-
-Erasure code is defined by a **profile** and is used when creating an
-erasure coded pool and the associated crush ruleset.
-
-The **default** erasure code profile (which is created when the Ceph
-cluster is initialized) provides the same level of redundancy as two
-copies but requires 25% less disk space. It is described as a profile
-with **k=2** and **m=1**, meaning the information is spread over three
-OSDs (k+m == 3) and one of them can be lost.
-
-To improve redundancy without increasing raw storage requirements, a
-new profile can be created. For instance, a profile with **k=10** and
-**m=4** can sustain the loss of four (**m=4**) OSDs by distributing an
-object over fourteen (k+m=14) OSDs. The object is first divided into
-**10** chunks (if the object is 10MB, each chunk is 1MB) and **4**
-coding chunks are computed, for recovery (each coding chunk has the
-same size as the data chunk, i.e. 1MB). The raw space overhead is only
-40% and the object will not be lost even if four OSDs break at the
-same time.
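-
-Such a profile could be created with, for instance (the profile name is only
-an example; the plugin defaults to *jerasure*)::
-
-    ceph osd erasure-code-profile set myprofile k=10 m=4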
-
-.. _list of available plugins:
-
-.. toctree::
- :maxdepth: 1
-
- erasure-code-jerasure
- erasure-code-isa
- erasure-code-lrc
- erasure-code-shec
-
-osd erasure-code-profile set
-============================
-
-To create a new erasure code profile::
-
- ceph osd erasure-code-profile set {name} \
- [{directory=directory}] \
- [{plugin=plugin}] \
- [{stripe_unit=stripe_unit}] \
- [{key=value} ...] \
- [--force]
-
-Where:
-
-``{directory=directory}``
-
-:Description: Set the **directory** name from which the erasure code
- plugin is loaded.
-
-:Type: String
-:Required: No.
-:Default: /usr/lib/ceph/erasure-code
-
-``{plugin=plugin}``
-
-:Description: Use the erasure code **plugin** to compute coding chunks
- and recover missing chunks. See the `list of available
- plugins`_ for more information.
-
-:Type: String
-:Required: No.
-:Default: jerasure
-
-``{stripe_unit=stripe_unit}``
-
-:Description: The amount of data in a data chunk, per stripe. For
- example, a profile with 2 data chunks and stripe_unit=4K
- would put the range 0-4K in chunk 0, 4K-8K in chunk 1,
- then 8K-12K in chunk 0 again. This should be a multiple
- of 4K for best performance. The default value is taken
- from the monitor config option
- ``osd_pool_erasure_code_stripe_unit`` when a pool is
- created. The stripe_width of a pool using this profile
- will be the number of data chunks multiplied by this
- stripe_unit.
-
-:Type: String
-:Required: No.
-
-``{key=value}``
-
-:Description: The semantics of the remaining key/value pairs are defined
- by the erasure code plugin.
-
-:Type: String
-:Required: No.
-
-``--force``
-
-:Description: Override an existing profile by the same name, and allow
- setting a non-4K-aligned stripe_unit.
-
-:Type: String
-:Required: No.
-
-osd erasure-code-profile rm
-============================
-
-To remove an erasure code profile::
-
- ceph osd erasure-code-profile rm {name}
-
-If the profile is referenced by a pool, the deletion will fail.
-
-osd erasure-code-profile get
-============================
-
-To display an erasure code profile::
-
- ceph osd erasure-code-profile get {name}
-
-osd erasure-code-profile ls
-===========================
-
-To list the names of all erasure code profiles::
-
- ceph osd erasure-code-profile ls
-
diff --git a/src/ceph/doc/rados/operations/erasure-code-shec.rst b/src/ceph/doc/rados/operations/erasure-code-shec.rst
deleted file mode 100644
index e3bab37..0000000
--- a/src/ceph/doc/rados/operations/erasure-code-shec.rst
+++ /dev/null
@@ -1,144 +0,0 @@
-========================
-SHEC erasure code plugin
-========================
-
-The *shec* plugin encapsulates the `multiple SHEC
-<http://tracker.ceph.com/projects/ceph/wiki/Shingled_Erasure_Code_(SHEC)>`_
-library. It allows Ceph to recover data more efficiently than Reed Solomon codes.
-
-Create an SHEC profile
-======================
-
-To create a new *shec* erasure code profile::
-
- ceph osd erasure-code-profile set {name} \
- plugin=shec \
- [k={data-chunks}] \
- [m={coding-chunks}] \
- [c={durability-estimator}] \
- [crush-root={root}] \
- [crush-failure-domain={bucket-type}] \
- [crush-device-class={device-class}] \
- [directory={directory}] \
- [--force]
-
-Where:
-
-``k={data-chunks}``
-
-:Description: Each object is split into **data-chunks** parts,
- each stored on a different OSD.
-
-:Type: Integer
-:Required: No.
-:Default: 4
-
-``m={coding-chunks}``
-
-:Description: Compute **coding-chunks** for each object and store them on
- different OSDs. The number of **coding-chunks** does not necessarily
- equal the number of OSDs that can be down without losing data.
-
-:Type: Integer
-:Required: No.
-:Default: 3
-
-``c={durability-estimator}``
-
-:Description: The number of parity chunks each of which includes each data chunk in its
- calculation range. The number is used as a **durability estimator**.
- For instance, if c=2, 2 OSDs can be down without losing data.
-
-:Type: Integer
-:Required: No.
-:Default: 2
-
-``crush-root={root}``
-
-:Description: The name of the crush bucket used for the first step of
-              the ruleset. For instance, **step take default**.
-
-:Type: String
-:Required: No.
-:Default: default
-
-``crush-failure-domain={bucket-type}``
-
-:Description: Ensure that no two chunks are in a bucket with the same
- failure domain. For instance, if the failure domain is
- **host** no two chunks will be stored on the same
- host. It is used to create a ruleset step such as **step
- chooseleaf host**.
-
-:Type: String
-:Required: No.
-:Default: host
-
-``crush-device-class={device-class}``
-
-:Description: Restrict placement to devices of a specific class (e.g.,
- ``ssd`` or ``hdd``), using the crush device class names
- in the CRUSH map.
-
-:Type: String
-:Required: No.
-:Default:
-
-``directory={directory}``
-
-:Description: Set the **directory** name from which the erasure code
- plugin is loaded.
-
-:Type: String
-:Required: No.
-:Default: /usr/lib/ceph/erasure-code
-
-``--force``
-
-:Description: Override an existing profile by the same name.
-
-:Type: String
-:Required: No.
-
-Brief description of SHEC's layouts
-===================================
-
-Space Efficiency
-----------------
-
-Space efficiency is the ratio of data chunks to all chunks in an object,
-represented as k/(k+m).
-In order to improve space efficiency, you should increase k or decrease m.
-
-::
-
- space efficiency of SHEC(4,3,2) = 4/(4+3) = 0.57
- SHEC(5,3,2) or SHEC(4,2,2) improves SHEC(4,3,2)'s space efficiency
-
-Durability
-----------
-
-The third parameter of SHEC (=c) is a durability estimator, which approximates
-the number of OSDs that can be down without losing data.
-
-``durability estimator of SHEC(4,3,2) = 2``
-
-Recovery Efficiency
--------------------
-
-A full description of how recovery efficiency is calculated is beyond the scope
-of this document, but, at a minimum, increasing m without increasing c improves
-recovery efficiency (at the cost of some space efficiency).
-
-``SHEC(4,2,2) -> SHEC(4,3,2) : achieves improvement of recovery efficiency``
-
-Erasure code profile examples
-=============================
-
-::
-
- $ ceph osd erasure-code-profile set SHECprofile \
- plugin=shec \
- k=8 m=4 c=3 \
- crush-failure-domain=host
- $ ceph osd pool create shecpool 256 256 erasure SHECprofile
diff --git a/src/ceph/doc/rados/operations/erasure-code.rst b/src/ceph/doc/rados/operations/erasure-code.rst
deleted file mode 100644
index 6ec5a09..0000000
--- a/src/ceph/doc/rados/operations/erasure-code.rst
+++ /dev/null
@@ -1,195 +0,0 @@
-=============
- Erasure code
-=============
-
-A Ceph pool is associated with a type that determines how the pool sustains
-the loss of an OSD (i.e. a disk, since there is usually one OSD per disk). The
-default choice when `creating a pool <../pools>`_ is *replicated*,
-meaning every object is copied on multiple disks. The `Erasure Code
-<https://en.wikipedia.org/wiki/Erasure_code>`_ pool type can be used
-instead to save space.
-
-Creating a sample erasure coded pool
-------------------------------------
-
-The simplest erasure coded pool is equivalent to `RAID5
-<https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5>`_ and
-requires at least three hosts::
-
- $ ceph osd pool create ecpool 12 12 erasure
- pool 'ecpool' created
- $ echo ABCDEFGHI | rados --pool ecpool put NYAN -
- $ rados --pool ecpool get NYAN -
- ABCDEFGHI
-
-.. note:: the 12 in *pool create* stands for
- `the number of placement groups <../pools>`_.
-
-Erasure code profiles
----------------------
-
-The default erasure code profile sustains the loss of a single OSD. It
-is equivalent to a replicated pool of size two but requires 1.5TB
-instead of 2TB to store 1TB of data. The default profile can be
-displayed with::
-
- $ ceph osd erasure-code-profile get default
- k=2
- m=1
- plugin=jerasure
- crush-failure-domain=host
- technique=reed_sol_van
-
-Choosing the right profile is important because it cannot be modified
-after the pool is created: a new pool with a different profile needs
-to be created and all objects from the previous pool moved to the new one.
-
-The most important parameters of the profile are *K*, *M* and
-*crush-failure-domain* because they define the storage overhead and
-the data durability. For instance, if the desired architecture must
-sustain the loss of two racks with a storage overhead of 40% overhead,
-the following profile can be defined::
-
- $ ceph osd erasure-code-profile set myprofile \
- k=3 \
- m=2 \
- crush-failure-domain=rack
- $ ceph osd pool create ecpool 12 12 erasure myprofile
- $ echo ABCDEFGHI | rados --pool ecpool put NYAN -
- $ rados --pool ecpool get NYAN -
- ABCDEFGHI
-
-The *NYAN* object will be divided into three chunks (*K=3*) and two additional
-coding *chunks* will be created (*M=2*). The value of *M* defines how many
-OSDs can be lost simultaneously without losing any data. The
-*crush-failure-domain=rack* will create a CRUSH ruleset that ensures
-no two *chunks* are stored in the same rack.
-
-.. ditaa::
- +-------------------+
- name | NYAN |
- +-------------------+
- content | ABCDEFGHI |
- +--------+----------+
- |
- |
- v
- +------+------+
- +---------------+ encode(3,2) +-----------+
- | +--+--+---+---+ |
- | | | | |
- | +-------+ | +-----+ |
- | | | | |
- +--v---+ +--v---+ +--v---+ +--v---+ +--v---+
- name | NYAN | | NYAN | | NYAN | | NYAN | | NYAN |
- +------+ +------+ +------+ +------+ +------+
- shard | 1 | | 2 | | 3 | | 4 | | 5 |
- +------+ +------+ +------+ +------+ +------+
- content | ABC | | DEF | | GHI | | YXY | | QGC |
- +--+---+ +--+---+ +--+---+ +--+---+ +--+---+
- | | | | |
- | | v | |
- | | +--+---+ | |
- | | | OSD1 | | |
- | | +------+ | |
- | | | |
- | | +------+ | |
- | +------>| OSD2 | | |
- | +------+ | |
- | | |
- | +------+ | |
- | | OSD3 |<----+ |
- | +------+ |
- | |
- | +------+ |
- | | OSD4 |<--------------+
- | +------+
- |
- | +------+
- +----------------->| OSD5 |
- +------+
-
-
-More information can be found in the `erasure code profiles
-<../erasure-code-profile>`_ documentation.
-
-
-Erasure Coding with Overwrites
-------------------------------
-
-By default, erasure coded pools only work with applications like RGW that
-perform full object writes and appends.
-
-Since Luminous, partial writes for an erasure coded pool may be
-enabled with a per-pool setting. This lets RBD and Cephfs store their
-data in an erasure coded pool::
-
- ceph osd pool set ec_pool allow_ec_overwrites true
-
-This can only be enabled on a pool residing on bluestore OSDs, since
-bluestore's checksumming is used to detect bitrot or other corruption
-during deep-scrub. In addition to being unsafe, using filestore with
-ec overwrites yields low performance compared to bluestore.
-
-Erasure coded pools do not support omap, so to use them with RBD and
-Cephfs you must instruct them to store their data in an ec pool, and
-their metadata in a replicated pool. For RBD, this means using the
-erasure coded pool as the ``--data-pool`` during image creation::
-
- rbd create --size 1G --data-pool ec_pool replicated_pool/image_name
-
-For Cephfs, using an erasure coded pool means setting that pool in
-a `file layout <../../../cephfs/file-layouts>`_.
-
-
-Erasure coded pool and cache tiering
-------------------------------------
-
-Erasure coded pools require more resources than replicated pools and
-lack some functionalities such as omap. To overcome these
-limitations, one can set up a `cache tier <../cache-tiering>`_
-before the erasure coded pool.
-
-For instance, if the pool *hot-storage* is made of fast storage::
-
- $ ceph osd tier add ecpool hot-storage
- $ ceph osd tier cache-mode hot-storage writeback
- $ ceph osd tier set-overlay ecpool hot-storage
-
-will place the *hot-storage* pool as a tier of *ecpool* in *writeback*
-mode so that every write and read to the *ecpool* actually uses
-the *hot-storage* pool and benefits from its flexibility and speed.
-
-More information can be found in the `cache tiering
-<../cache-tiering>`_ documentation.
-
-Glossary
---------
-
-*chunk*
-  when the encoding function is called, it returns chunks of the same
-  size: data chunks, which can be concatenated to reconstruct the original
-  object, and coding chunks, which can be used to rebuild a lost chunk.
-
-*K*
- the number of data *chunks*, i.e. the number of *chunks* in which the
- original object is divided. For instance, if *K* = 2, a 10KB object
- will be divided into two *chunks* of 5KB each.
-
-*M*
- the number of coding *chunks*, i.e. the number of additional *chunks*
- computed by the encoding functions. If there are 2 coding *chunks*,
- it means 2 OSDs can be out without losing data.
-
-
-Table of contents
------------------
-
-.. toctree::
- :maxdepth: 1
-
- erasure-code-profile
- erasure-code-jerasure
- erasure-code-isa
- erasure-code-lrc
- erasure-code-shec
diff --git a/src/ceph/doc/rados/operations/health-checks.rst b/src/ceph/doc/rados/operations/health-checks.rst
deleted file mode 100644
index c1e2200..0000000
--- a/src/ceph/doc/rados/operations/health-checks.rst
+++ /dev/null
@@ -1,527 +0,0 @@
-
-=============
-Health checks
-=============
-
-Overview
-========
-
-There is a finite set of possible health messages that a Ceph cluster can
-raise -- these are defined as *health checks* which have unique identifiers.
-
-The identifier is a terse pseudo-human-readable (i.e. like a variable name)
-string. It is intended to enable tools (such as UIs) to make sense of
-health checks, and present them in a way that reflects their meaning.
-
-This page lists the health checks that are raised by the monitor and manager
-daemons. In addition to these, you may also see health checks that originate
-from MDS daemons (see :doc:`/cephfs/health-messages`), and health checks
-that are defined by ceph-mgr python modules.
-
-Definitions
-===========
-
-
-OSDs
-----
-
-OSD_DOWN
-________
-
-One or more OSDs are marked down. The ceph-osd daemon may have been
-stopped, or peer OSDs may be unable to reach the OSD over the network.
-Common causes include a stopped or crashed daemon, a down host, or a
-network outage.
-
-Verify the host is healthy, the daemon is started, and network is
-functioning. If the daemon has crashed, the daemon log file
-(``/var/log/ceph/ceph-osd.*``) may contain debugging information.
-
-OSD_<crush type>_DOWN
-_____________________
-
-(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN)
-
-All the OSDs within a particular CRUSH subtree are marked down, for example
-all OSDs on a host.
-
-OSD_ORPHAN
-__________
-
-An OSD is referenced in the CRUSH map hierarchy but does not exist.
-
-The OSD can be removed from the CRUSH hierarchy with::
-
- ceph osd crush rm osd.<id>
-
-OSD_OUT_OF_ORDER_FULL
-_____________________
-
-The utilization thresholds for `backfillfull`, `nearfull`, `full`,
-and/or `failsafe_full` are not ascending. In particular, we expect
-`backfillfull < nearfull`, `nearfull < full`, and `full <
-failsafe_full`.
-
-The thresholds can be adjusted with::
-
- ceph osd set-backfillfull-ratio <ratio>
- ceph osd set-nearfull-ratio <ratio>
- ceph osd set-full-ratio <ratio>
-
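-For example, to restore the expected ordering with commonly used values (the
-ratios below are only illustrative)::
-
-    ceph osd set-nearfull-ratio 0.85
-    ceph osd set-backfillfull-ratio 0.90
-    ceph osd set-full-ratio 0.95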
-
-OSD_FULL
-________
-
-One or more OSDs has exceeded the `full` threshold and is preventing
-the cluster from servicing writes.
-
-Utilization by pool can be checked with::
-
- ceph df
-
-The currently defined `full` ratio can be seen with::
-
- ceph osd dump | grep full_ratio
-
-A short-term workaround to restore write availability is to raise the full
-threshold by a small amount::
-
- ceph osd set-full-ratio <ratio>
-
-New storage should be added to the cluster by deploying more OSDs or
-existing data should be deleted in order to free up space.
-
-OSD_BACKFILLFULL
-________________
-
-One or more OSDs has exceeded the `backfillfull` threshold, which will
-prevent data from being rebalanced to this device. This is
-an early warning that rebalancing may not be able to complete and that
-the cluster is approaching full.
-
-Utilization by pool can be checked with::
-
- ceph df
-
-OSD_NEARFULL
-____________
-
-One or more OSDs has exceeded the `nearfull` threshold. This is an early
-warning that the cluster is approaching full.
-
-Utilization by pool can be checked with::
-
- ceph df
-
-OSDMAP_FLAGS
-____________
-
-One or more cluster flags of interest has been set. These flags include:
-
-* *full* - the cluster is flagged as full and cannot service writes
-* *pauserd*, *pausewr* - paused reads or writes
-* *noup* - OSDs are not allowed to start
-* *nodown* - OSD failure reports are being ignored, such that the
- monitors will not mark OSDs `down`
-* *noin* - OSDs that were previously marked `out` will not be marked
- back `in` when they start
-* *noout* - down OSDs will not automatically be marked out after the
- configured interval
-* *nobackfill*, *norecover*, *norebalance* - recovery or data
- rebalancing is suspended
-* *noscrub*, *nodeep_scrub* - scrubbing is disabled
-* *notieragent* - cache tiering activity is suspended
-
-With the exception of *full*, these flags can be set or cleared with::
-
- ceph osd set <flag>
- ceph osd unset <flag>
-
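-For example, to prevent ``down`` OSDs from being marked ``out`` during
-planned maintenance, and to clear the flag afterwards::
-
-    ceph osd set noout
-    ceph osd unset noout
-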
-OSD_FLAGS
-_________
-
-One or more OSDs has a per-OSD flag of interest set. These flags include:
-
-* *noup*: OSD is not allowed to start
-* *nodown*: failure reports for this OSD will be ignored
-* *noin*: if this OSD was previously marked `out` automatically
-  after a failure, it will not be marked in when it starts
-* *noout*: if this OSD is down it will not automatically be marked
- `out` after the configured interval
-
-Per-OSD flags can be set and cleared with::
-
- ceph osd add-<flag> <osd-id>
- ceph osd rm-<flag> <osd-id>
-
-For example, ::
-
- ceph osd rm-nodown osd.123
-
-OLD_CRUSH_TUNABLES
-__________________
-
-The CRUSH map is using very old settings and should be updated. The
-oldest tunables that can be used (i.e., the oldest client version that
-can connect to the cluster) without triggering this health warning is
-determined by the ``mon_crush_min_required_version`` config option.
-See :doc:`/rados/operations/crush-map/#tunables` for more information.
-
-OLD_CRUSH_STRAW_CALC_VERSION
-____________________________
-
-The CRUSH map is using an older, non-optimal method for calculating
-intermediate weight values for ``straw`` buckets.
-
-The CRUSH map should be updated to use the newer method
-(``straw_calc_version=1``). See
-:doc:`/rados/operations/crush-map/#tunables` for more information.
-
-CACHE_POOL_NO_HIT_SET
-_____________________
-
-One or more cache pools is not configured with a *hit set* to track
-utilization, which will prevent the tiering agent from identifying
-cold objects to flush and evict from the cache.
-
-Hit sets can be configured on the cache pool with::
-
- ceph osd pool set <poolname> hit_set_type <type>
- ceph osd pool set <poolname> hit_set_period <period-in-seconds>
- ceph osd pool set <poolname> hit_set_count <number-of-hitsets>
- ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate>
-
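-For example, assuming a cache pool named ``hot-storage``, a bloom hit set
-could be configured with::
-
-    ceph osd pool set hot-storage hit_set_type bloom
-    ceph osd pool set hot-storage hit_set_period 3600
-    ceph osd pool set hot-storage hit_set_count 4
-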
-OSD_NO_SORTBITWISE
-__________________
-
-No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not
-been set.
-
-The ``sortbitwise`` flag must be set before luminous v12.y.z or newer
-OSDs can start. You can safely set the flag with::
-
- ceph osd set sortbitwise
-
-POOL_FULL
-_________
-
-One or more pools has reached its quota and is no longer allowing writes.
-
-Pool quotas and utilization can be seen with::
-
- ceph df detail
-
-You can either raise the pool quota with::
-
- ceph osd pool set-quota <poolname> max_objects <num-objects>
- ceph osd pool set-quota <poolname> max_bytes <num-bytes>
-
-or delete some existing data to reduce utilization.
-
-
-Data health (pools & placement groups)
---------------------------------------
-
-PG_AVAILABILITY
-_______________
-
-Data availability is reduced, meaning that the cluster is unable to
-service potential read or write requests for some data in the cluster.
-Specifically, one or more PGs is in a state that does not allow IO
-requests to be serviced. Problematic PG states include *peering*,
-*stale*, *incomplete*, and the lack of *active* (if those conditions do not clear
-quickly).
-
-Detailed information about which PGs are affected is available from::
-
- ceph health detail
-
-In most cases the root cause is that one or more OSDs is currently
-down; see the discussion for ``OSD_DOWN`` above.
-
-The state of specific problematic PGs can be queried with::
-
- ceph tell <pgid> query
-
-PG_DEGRADED
-___________
-
-Data redundancy is reduced for some data, meaning the cluster does not
-have the desired number of replicas for all data (for replicated
-pools) or erasure code fragments (for erasure coded pools).
-Specifically, one or more PGs:
-
-* has the *degraded* or *undersized* flag set, meaning there are not
- enough instances of that placement group in the cluster;
-* has not had the *clean* flag set for some time.
-
-Detailed information about which PGs are affected is available from::
-
- ceph health detail
-
-In most cases the root cause is that one or more OSDs is currently
-down; see the discussion for ``OSD_DOWN`` above.
-
-The state of specific problematic PGs can be queried with::
-
- ceph tell <pgid> query
-
-
-PG_DEGRADED_FULL
-________________
-
-Data redundancy may be reduced or at risk for some data due to a lack
-of free space in the cluster. Specifically, one or more PGs has the
-*backfill_toofull* or *recovery_toofull* flag set, meaning that the
-cluster is unable to migrate or recover data because one or more OSDs
-is above the *backfillfull* threshold.
-
-See the discussion for *OSD_BACKFILLFULL* or *OSD_FULL* above for
-steps to resolve this condition.
-
-PG_DAMAGED
-__________
-
-Data scrubbing has discovered some problems with data consistency in
-the cluster. Specifically, one or more PGs has the *inconsistent* or
-*snaptrim_error* flag set, indicating that an earlier scrub operation
-found a problem, or has the *repair* flag set, meaning a repair
-for such an inconsistency is currently in progress.
-
-See :doc:`pg-repair` for more information.
-
-OSD_SCRUB_ERRORS
-________________
-
-Recent OSD scrubs have uncovered inconsistencies. This error is generally
-paired with *PG_DAMAGED* (see above).
-
-See :doc:`pg-repair` for more information.
-
-CACHE_POOL_NEAR_FULL
-____________________
-
-A cache tier pool is nearly full. Full in this context is determined
-by the ``target_max_bytes`` and ``target_max_objects`` properties on
-the cache pool. Once the pool reaches the target threshold, write
-requests to the pool may block while data is flushed and evicted
-from the cache, a state that normally leads to very high latencies and
-poor performance.
-
-The cache pool target size can be adjusted with::
-
- ceph osd pool set <cache-pool-name> target_max_bytes <bytes>
- ceph osd pool set <cache-pool-name> target_max_objects <objects>
-
-Normal cache flush and evict activity may also be throttled due to reduced
-availability or performance of the base tier, or overall cluster load.
-
-TOO_FEW_PGS
-___________
-
-The number of PGs in use in the cluster is below the configurable
-threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead
-to suboptimal distribution and balance of data across the OSDs in
-the cluster, and similarly reduce overall performance.
-
-This may be an expected condition if data pools have not yet been
-created.
-
-The PG count for existing pools can be increased or new pools can be
-created. Please refer to
-:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
-more information.
-
-TOO_MANY_PGS
-____________
-
-The number of PGs in use in the cluster is above the configurable
-threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is
-exceeded, the cluster will not allow new pools to be created, pool `pg_num` to
-be increased, or pool replication to be increased (any of which would lead to
-more PGs in the cluster). A large number of PGs can lead
-to higher memory utilization for OSD daemons, slower peering after
-cluster state changes (like OSD restarts, additions, or removals), and
-higher load on the Manager and Monitor daemons.
-
-The simplest way to mitigate the problem is to increase the number of
-OSDs in the cluster by adding more hardware. Note that the OSD count
-used for the purposes of this health check is the number of "in" OSDs,
-so marking "out" OSDs "in" (if there are any) can also help::
-
- ceph osd in <osd id(s)>
-
-Please refer to
-:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for
-more information.
-
-SMALLER_PGP_NUM
-_______________
-
-One or more pools has a ``pgp_num`` value less than ``pg_num``. This
-is normally an indication that the PG count was increased without
-also updating ``pgp_num``, which controls data placement.
-
-This is sometimes done deliberately to separate out the `split` step
-when the PG count is adjusted from the data migration that is needed
-when ``pgp_num`` is changed.
-
-This is normally resolved by setting ``pgp_num`` to match ``pg_num``,
-triggering the data migration, with::
-
- ceph osd pool set <pool> pgp_num <pg-num-value>
-
-MANY_OBJECTS_PER_PG
-___________________
-
-One or more pools has an average number of objects per PG that is
-significantly higher than the overall cluster average. The specific
-threshold is controlled by the ``mon_pg_warn_max_object_skew``
-configuration value.
-
-This is usually an indication that the pool(s) containing most of the
-data in the cluster have too few PGs, and/or that other pools that do
-not contain as much data have too many PGs. See the discussion of
-*TOO_MANY_PGS* above.
-
-The threshold can be raised to silence the health warning by adjusting
-the ``mon_pg_warn_max_object_skew`` config option on the monitors.
-
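-For example, mirroring the ``injectargs`` pattern used elsewhere in this
-document, the threshold might be raised on running monitors with something
-like (the value ``20`` is only illustrative)::
-
-    ceph tell mon.\* injectargs --mon-pg-warn-max-object-skew=20
-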
-POOL_APP_NOT_ENABLED
-____________________
-
-A pool exists that contains one or more objects but has not been
-tagged for use by a particular application.
-
-Resolve this warning by labeling the pool for use by an application. For
-example, if the pool is used by RBD,::
-
- rbd pool init <poolname>
-
-If the pool is being used by a custom application 'foo', you can also label
-via the low-level command::
-
-    ceph osd pool application enable <poolname> foo
-
-For more information, see :doc:`pools.rst#associate-pool-to-application`.
-
-POOL_FULL
-_________
-
-One or more pools has reached (or is very close to reaching) its
-quota. The threshold to trigger this error condition is controlled by
-the ``mon_pool_quota_crit_threshold`` configuration option.
-
-Pool quotas can be adjusted up or down (or removed) with::
-
- ceph osd pool set-quota <pool> max_bytes <bytes>
- ceph osd pool set-quota <pool> max_objects <objects>
-
-Setting the quota value to 0 will disable the quota.
-
-POOL_NEAR_FULL
-______________
-
-One or more pools is approaching its quota. The threshold to trigger
-this warning condition is controlled by the
-``mon_pool_quota_warn_threshold`` configuration option.
-
-Pool quotas can be adjusted up or down (or removed) with::
-
- ceph osd pool set-quota <pool> max_bytes <bytes>
- ceph osd pool set-quota <pool> max_objects <objects>
-
-Setting the quota value to 0 will disable the quota.
-
-OBJECT_MISPLACED
-________________
-
-One or more objects in the cluster is not stored on the node the
-cluster would like it to be stored on. This is an indication that
-data migration due to some recent cluster change has not yet completed.
-
-Misplaced data is not a dangerous condition in and of itself; data
-consistency is never at risk, and old copies of objects are never
-removed until the desired number of new copies (in the desired
-locations) are present.
-
-OBJECT_UNFOUND
-______________
-
-One or more objects in the cluster cannot be found. Specifically, the
-OSDs know that a new or updated copy of an object should exist, but a
-copy of that version of the object has not been found on OSDs that are
-currently online.
-
-Read or write requests to unfound objects will block.
-
-Ideally, a down OSD that has the most recent copy of the unfound object
-can be brought back online. Candidate OSDs can be identified from the
-peering state for the PG(s) responsible for the unfound object::
-
- ceph tell <pgid> query
-
-If the latest copy of the object is not available, the cluster can be
-told to roll back to a previous version of the object. See
-:doc:`troubleshooting-pg#Unfound-objects` for more information.
-
-REQUEST_SLOW
-____________
-
-One or more OSD requests is taking a long time to process. This can
-be an indication of extreme load, a slow storage device, or a software
-bug.
-
-The request queue on the OSD(s) in question can be queried with the
-following command, executed from the OSD host::
-
- ceph daemon osd.<id> ops
-
-A summary of the slowest recent requests can be seen with::
-
- ceph daemon osd.<id> dump_historic_ops
-
-The location of an OSD can be found with::
-
- ceph osd find osd.<id>
-
-REQUEST_STUCK
-_____________
-
-One or more OSD requests has been blocked for an extremely long time.
-This is an indication that either the cluster has been unhealthy for
-an extended period of time (e.g., not enough running OSDs) or there is
-some internal problem with the OSD. See the discussion of
-*REQUEST_SLOW* above.
-
-PG_NOT_SCRUBBED
-_______________
-
-One or more PGs has not been scrubbed recently. PGs are normally
-scrubbed every ``mon_scrub_interval`` seconds, and this warning
-triggers when ``mon_warn_not_scrubbed`` such intervals have elapsed
-without a scrub.
-
-PGs will not scrub if they are not flagged as *clean*, which may
-happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
-*PG_DEGRADED* above).
-
-You can manually initiate a scrub of a clean PG with::
-
- ceph pg scrub <pgid>
-
-PG_NOT_DEEP_SCRUBBED
-____________________
-
-One or more PGs has not been deep scrubbed recently. PGs are normally
-scrubbed every ``osd_deep_mon_scrub_interval`` seconds, and this warning
-triggers when ``mon_warn_not_deep_scrubbed`` such intervals have elapsed
-without a scrub.
-
-PGs will not (deep) scrub if they are not flagged as *clean*, which may
-happen if they are misplaced or degraded (see *PG_AVAILABILITY* and
-*PG_DEGRADED* above).
-
-You can manually initiate a scrub of a clean PG with::
-
- ceph pg deep-scrub <pgid>
diff --git a/src/ceph/doc/rados/operations/index.rst b/src/ceph/doc/rados/operations/index.rst
deleted file mode 100644
index aacf764..0000000
--- a/src/ceph/doc/rados/operations/index.rst
+++ /dev/null
@@ -1,90 +0,0 @@
-====================
- Cluster Operations
-====================
-
-.. raw:: html
-
- <table><colgroup><col width="50%"><col width="50%"></colgroup><tbody valign="top"><tr><td><h3>High-level Operations</h3>
-
-High-level cluster operations consist primarily of starting, stopping, and
-restarting a cluster with the ``ceph`` service; checking the cluster's health;
-and, monitoring an operating cluster.
-
-.. toctree::
- :maxdepth: 1
-
- operating
- health-checks
- monitoring
- monitoring-osd-pg
- user-management
-
-.. raw:: html
-
- </td><td><h3>Data Placement</h3>
-
-Once you have your cluster up and running, you may begin working with data
-placement. Ceph supports petabyte-scale data storage clusters, with storage
-pools and placement groups that distribute data across the cluster using Ceph's
-CRUSH algorithm.
-
-.. toctree::
- :maxdepth: 1
-
- data-placement
- pools
- erasure-code
- cache-tiering
- placement-groups
- upmap
- crush-map
- crush-map-edits
-
-
-
-.. raw:: html
-
- </td></tr><tr><td><h3>Low-level Operations</h3>
-
-Low-level cluster operations consist of starting, stopping, and restarting a
-particular daemon within a cluster; changing the settings of a particular
-daemon or subsystem; and, adding a daemon to the cluster or removing a daemon
-from the cluster. The most common use cases for low-level operations include
-growing or shrinking the Ceph cluster and replacing legacy or failed hardware
-with new hardware.
-
-.. toctree::
- :maxdepth: 1
-
- add-or-rm-osds
- add-or-rm-mons
- Command Reference <control>
-
-
-
-.. raw:: html
-
- </td><td><h3>Troubleshooting</h3>
-
-Ceph is still on the leading edge, so you may encounter situations that require
-you to evaluate your Ceph configuration and modify your logging and debugging
-settings to identify and remedy issues you are encountering with your cluster.
-
-.. toctree::
- :maxdepth: 1
-
- ../troubleshooting/community
- ../troubleshooting/troubleshooting-mon
- ../troubleshooting/troubleshooting-osd
- ../troubleshooting/troubleshooting-pg
- ../troubleshooting/log-and-debug
- ../troubleshooting/cpu-profiling
- ../troubleshooting/memory-profiling
-
-
-
-
-.. raw:: html
-
- </td></tr></tbody></table>
-
diff --git a/src/ceph/doc/rados/operations/monitoring-osd-pg.rst b/src/ceph/doc/rados/operations/monitoring-osd-pg.rst
deleted file mode 100644
index 0107e34..0000000
--- a/src/ceph/doc/rados/operations/monitoring-osd-pg.rst
+++ /dev/null
@@ -1,617 +0,0 @@
-=========================
- Monitoring OSDs and PGs
-=========================
-
-High availability and high reliability require a fault-tolerant approach to
-managing hardware and software issues. Ceph has no single point-of-failure, and
-can service requests for data in a "degraded" mode. Ceph's `data placement`_
-introduces a layer of indirection to ensure that data doesn't bind directly to
-particular OSD addresses. This means that tracking down system faults requires
-finding the `placement group`_ and the underlying OSDs at the root of the problem.
-
-.. tip:: A fault in one part of the cluster may prevent you from accessing a
- particular object, but that doesn't mean that you cannot access other objects.
- When you run into a fault, don't panic. Just follow the steps for monitoring
- your OSDs and placement groups. Then, begin troubleshooting.
-
-Ceph is generally self-repairing. However, when problems persist, monitoring
-OSDs and placement groups will help you identify the problem.
-
-
-Monitoring OSDs
-===============
-
-An OSD's status is either in the cluster (``in``) or out of the cluster
-(``out``); and, it is either up and running (``up``), or it is down and not
-running (``down``). If an OSD is ``up``, it may be either ``in`` the cluster
-(you can read and write data) or it is ``out`` of the cluster. If it was
-``in`` the cluster and recently moved ``out`` of the cluster, Ceph will migrate
-placement groups to other OSDs. If an OSD is ``out`` of the cluster, CRUSH will
-not assign placement groups to the OSD. If an OSD is ``down``, it should also be
-``out``.
-
-.. note:: If an OSD is ``down`` and ``in``, there is a problem and the cluster
- will not be in a healthy state.
-
-.. ditaa:: +----------------+ +----------------+
- | | | |
- | OSD #n In | | OSD #n Up |
- | | | |
- +----------------+ +----------------+
- ^ ^
- | |
- | |
- v v
- +----------------+ +----------------+
- | | | |
- | OSD #n Out | | OSD #n Down |
- | | | |
- +----------------+ +----------------+
-
-If you execute a command such as ``ceph health``, ``ceph -s`` or ``ceph -w``,
-you may notice that the cluster does not always echo back ``HEALTH OK``. Don't
-panic. With respect to OSDs, you should expect that the cluster will **NOT**
-echo ``HEALTH OK`` in a few expected circumstances:
-
-#. You haven't started the cluster yet (it won't respond).
-#. You have just started or restarted the cluster and it's not ready yet,
- because the placement groups are getting created and the OSDs are in
- the process of peering.
-#. You just added or removed an OSD.
-#. You have just modified your cluster map.
-
-An important aspect of monitoring OSDs is to ensure that, when the cluster
-is up and running, all OSDs that are ``in`` the cluster are also ``up`` and
-running. To see if all OSDs are running, execute::
-
- ceph osd stat
-
-The result should tell you the map epoch (eNNNN), the total number of OSDs (x),
-how many are ``up`` (y) and how many are ``in`` (z). ::
-
- eNNNN: x osds: y up, z in
-
-If the number of OSDs that are ``in`` the cluster is more than the number of
-OSDs that are ``up``, execute the following command to identify the ``ceph-osd``
-daemons that are not running::
-
- ceph osd tree
-
-::
-
- dumped osdmap tree epoch 1
- # id weight type name up/down reweight
- -1 2 pool openstack
- -3 2 rack dell-2950-rack-A
- -2 2 host dell-2950-A1
- 0 1 osd.0 up 1
- 1 1 osd.1 down 1
-
-
-.. tip:: The ability to search through a well-designed CRUSH hierarchy may help
- you troubleshoot your cluster by identifying the physical locations faster.
-
-If an OSD is ``down``, start it::
-
- sudo systemctl start ceph-osd@1
-
-See `OSD Not Running`_ for problems associated with OSDs that stopped, or won't
-restart.
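-
-Before restarting the daemon, it can help to check why it stopped. A minimal
-sketch using standard systemd tools (``osd.1`` is just an example ID)::
-
- sudo systemctl status ceph-osd@1
- sudo journalctl -u ceph-osd@1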
-
-
-PG Sets
-=======
-
-When CRUSH assigns placement groups to OSDs, it looks at the number of replicas
-for the pool and assigns the placement group to OSDs such that each replica of
-the placement group gets assigned to a different OSD. For example, if the pool
-requires three replicas of a placement group, CRUSH may assign them to
-``osd.1``, ``osd.2`` and ``osd.3`` respectively. CRUSH actually seeks a
-pseudo-random placement that will take into account failure domains you set in
-your `CRUSH map`_, so you will rarely see placement groups assigned to nearest
-neighbor OSDs in a large cluster. We refer to the set of OSDs that should
-contain the replicas of a particular placement group as the **Acting Set**. In
-some cases, an OSD in the Acting Set is ``down`` or otherwise not able to
-service requests for objects in the placement group. When these situations
-arise, don't panic. Common examples include:
-
-- You added or removed an OSD. Then, CRUSH reassigned the placement group to
- other OSDs--thereby changing the composition of the Acting Set and spawning
- the migration of data with a "backfill" process.
-- An OSD was ``down``, was restarted, and is now ``recovering``.
-- An OSD in the Acting Set is ``down`` or unable to service requests,
- and another OSD has temporarily assumed its duties.
-
-Ceph processes a client request using the **Up Set**, which is the set of OSDs
-that will actually handle the requests. In most cases, the Up Set and the Acting
-Set are virtually identical. When they are not, it may indicate that Ceph is
-migrating data, an OSD is recovering, or that there is a problem (i.e., Ceph
-usually echoes a "HEALTH WARN" state with a "stuck stale" message in such
-scenarios).
-
-To retrieve a list of placement groups, execute::
-
- ceph pg dump
-
-To view which OSDs are within the Acting Set or the Up Set for a given placement
-group, execute::
-
- ceph pg map {pg-num}
-
-The result should tell you the osdmap epoch (eNNN), the placement group number
-({pg-num}), the OSDs in the Up Set (up[]), and the OSDs in the acting set
-(acting[]). ::
-
- osdmap eNNN pg {pg-num} -> up [0,1,2] acting [0,1,2]
-
-.. note:: If the Up Set and Acting Set do not match, this may be an indicator
- that the cluster is rebalancing itself or that there is a potential problem
- with the cluster.
-
-
-Peering
-=======
-
-Before you can write data to a placement group, it must be in an ``active``
-state, and it **should** be in a ``clean`` state. For Ceph to determine the
-current state of a placement group, the primary OSD of the placement group
-(i.e., the first OSD in the acting set) peers with the secondary and tertiary
-OSDs to establish agreement on the current state of the placement group
-(assuming a pool with 3 replicas of the PG).
-
-
-.. ditaa:: +---------+ +---------+ +-------+
- | OSD 1 | | OSD 2 | | OSD 3 |
- +---------+ +---------+ +-------+
- | | |
- | Request To | |
- | Peer | |
- |-------------->| |
- |<--------------| |
- | Peering |
- | |
- | Request To |
- | Peer |
- |----------------------------->|
- |<-----------------------------|
- | Peering |
-
-The OSDs also report their status to the monitor. See `Configuring Monitor/OSD
-Interaction`_ for details. To troubleshoot peering issues, see `Peering
-Failure`_.
-
-
-Monitoring Placement Group States
-=================================
-
-If you execute a command such as ``ceph health``, ``ceph -s`` or ``ceph -w``,
-you may notice that the cluster does not always echo back ``HEALTH OK``. After
-you check to see if the OSDs are running, you should also check placement group
-states. You should expect that the cluster will **NOT** echo ``HEALTH OK`` in a
-number of placement group peering-related circumstances:
-
-#. You have just created a pool and placement groups haven't peered yet.
-#. The placement groups are recovering.
-#. You have just added an OSD to or removed an OSD from the cluster.
-#. You have just modified your CRUSH map and your placement groups are migrating.
-#. There is inconsistent data in different replicas of a placement group.
-#. Ceph is scrubbing a placement group's replicas.
-#. Ceph doesn't have enough storage capacity to complete backfilling operations.
-
-If one of the foregoing circumstances causes Ceph to echo ``HEALTH WARN``, don't
-panic. In many cases, the cluster will recover on its own. In some cases, you
-may need to take action. An important aspect of monitoring placement groups is
-to ensure that, when the cluster is up and running, all placement groups are
-``active``, and preferably in the ``clean`` state. To see the status of all
-placement groups, execute::
-
- ceph pg stat
-
-The result should tell you the placement group map version (vNNNNNN), the total
-number of placement groups (x), and how many placement groups are in a
-particular state such as ``active+clean`` (y). ::
-
- vNNNNNN: x pgs: y active+clean; z bytes data, aa MB used, bb GB / cc GB avail
-
-.. note:: It is common for Ceph to report multiple states for placement groups.
-
-In addition to the placement group states, Ceph will also echo back the amount
-of raw storage used (aa), the amount of storage capacity remaining (bb), and
-the total storage capacity (cc) of the cluster. These numbers can be important
-in a few cases:
-
-- You are reaching your ``near full ratio`` or ``full ratio``.
-- Your data is not getting distributed across the cluster due to an
- error in your CRUSH configuration.
-
-
-.. topic:: Placement Group IDs
-
- Placement group IDs consist of the pool number (not pool name) followed
- by a period (.) and the placement group ID--a hexadecimal number. You
- can view pool numbers and their names from the output of ``ceph osd
- lspools``. For example, the default pool ``rbd`` corresponds to
- pool number ``0``. A fully qualified placement group ID has the
- following form::
-
- {pool-num}.{pg-id}
-
- And it typically looks like this::
-
- 0.1f
-
-
-To retrieve a list of placement groups, execute the following::
-
- ceph pg dump
-
-You can also format the output in JSON format and save it to a file::
-
- ceph pg dump -o {filename} --format=json
-
-To query a particular placement group, execute the following::
-
- ceph pg {poolnum}.{pg-id} query
-
-Ceph will output the query in JSON format.
-
-.. code-block:: javascript
-
- {
- "state": "active+clean",
- "up": [
- 1,
- 0
- ],
- "acting": [
- 1,
- 0
- ],
- "info": {
- "pgid": "1.e",
- "last_update": "4'1",
- "last_complete": "4'1",
- "log_tail": "0'0",
- "last_backfill": "MAX",
- "purged_snaps": "[]",
- "history": {
- "epoch_created": 1,
- "last_epoch_started": 537,
- "last_epoch_clean": 537,
- "last_epoch_split": 534,
- "same_up_since": 536,
- "same_interval_since": 536,
- "same_primary_since": 536,
- "last_scrub": "4'1",
- "last_scrub_stamp": "2013-01-25 10:12:23.828174"
- },
- "stats": {
- "version": "4'1",
- "reported": "536'782",
- "state": "active+clean",
- "last_fresh": "2013-01-25 10:12:23.828271",
- "last_change": "2013-01-25 10:12:23.828271",
- "last_active": "2013-01-25 10:12:23.828271",
- "last_clean": "2013-01-25 10:12:23.828271",
- "last_unstale": "2013-01-25 10:12:23.828271",
- "mapping_epoch": 535,
- "log_start": "0'0",
- "ondisk_log_start": "0'0",
- "created": 1,
- "last_epoch_clean": 1,
- "parent": "0.0",
- "parent_split_bits": 0,
- "last_scrub": "4'1",
- "last_scrub_stamp": "2013-01-25 10:12:23.828174",
- "log_size": 128,
- "ondisk_log_size": 128,
- "stat_sum": {
- "num_bytes": 205,
- "num_objects": 1,
- "num_object_clones": 0,
- "num_object_copies": 0,
- "num_objects_missing_on_primary": 0,
- "num_objects_degraded": 0,
- "num_objects_unfound": 0,
- "num_read": 1,
- "num_read_kb": 0,
- "num_write": 3,
- "num_write_kb": 1
- },
- "stat_cat_sum": {
-
- },
- "up": [
- 1,
- 0
- ],
- "acting": [
- 1,
- 0
- ]
- },
- "empty": 0,
- "dne": 0,
- "incomplete": 0
- },
- "recovery_state": [
- {
- "name": "Started\/Primary\/Active",
- "enter_time": "2013-01-23 09:35:37.594691",
- "might_have_unfound": [
-
- ],
- "scrub": {
- "scrub_epoch_start": "536",
- "scrub_active": 0,
- "scrub_block_writes": 0,
- "finalizing_scrub": 0,
- "scrub_waiting_on": 0,
- "scrub_waiting_on_whom": [
-
- ]
- }
- },
- {
- "name": "Started",
- "enter_time": "2013-01-23 09:35:31.581160"
- }
- ]
- }
-
-
-
-The following subsections describe common states in greater detail.
-
-Creating
---------
-
-When you create a pool, it will create the number of placement groups you
-specified. Ceph will echo ``creating`` when it is creating one or more
-placement groups. Once they are created, the OSDs that are part of a placement
-group's Acting Set will peer. Once peering is complete, the placement group
-status should be ``active+clean``, which means a Ceph client can begin writing
-to the placement group.
-
-.. ditaa::
-
- /-----------\ /-----------\ /-----------\
- | Creating |------>| Peering |------>| Active |
- \-----------/ \-----------/ \-----------/
-
-Peering
--------
-
-When Ceph is Peering a placement group, Ceph is bringing the OSDs that
-store the replicas of the placement group into **agreement about the state**
-of the objects and metadata in the placement group. When Ceph completes peering,
-this means that the OSDs that store the placement group agree about the current
-state of the placement group. However, completion of the peering process does
-**NOT** mean that each replica has the latest contents.
-
-.. topic:: Authoritative History
-
- Ceph will **NOT** acknowledge a write operation to a client, until
- all OSDs of the acting set persist the write operation. This practice
- ensures that at least one member of the acting set will have a record
- of every acknowledged write operation since the last successful
- peering operation.
-
- With an accurate record of each acknowledged write operation, Ceph can
- construct and disseminate a new authoritative history of the placement
- group--a complete, and fully ordered set of operations that, if performed,
- would bring an OSD’s copy of a placement group up to date.
-
-
-Active
-------
-
-Once Ceph completes the peering process, a placement group may become
-``active``. The ``active`` state means that the data in the placement group is
-generally available in the primary placement group and the replicas for read
-and write operations.
-
-
-Clean
------
-
-When a placement group is in the ``clean`` state, the primary OSD and the
-replica OSDs have successfully peered and there are no stray replicas for the
-placement group. Ceph replicated all objects in the placement group the correct
-number of times.
-
-
-Degraded
---------
-
-When a client writes an object to the primary OSD, the primary OSD is
-responsible for writing the replicas to the replica OSDs. After the primary OSD
-writes the object to storage, the placement group will remain in a ``degraded``
-state until the primary OSD has received an acknowledgement from the replica
-OSDs that Ceph created the replica objects successfully.
-
-The reason a placement group can be ``active+degraded`` is that an OSD may be
-``active`` even though it doesn't hold all of the objects yet. If an OSD goes
-``down``, Ceph marks each placement group assigned to the OSD as ``degraded``.
-The OSDs must peer again when the OSD comes back online. However, a client can
-still write a new object to a ``degraded`` placement group if it is ``active``.
-
-If an OSD is ``down`` and the ``degraded`` condition persists, Ceph may mark the
-``down`` OSD as ``out`` of the cluster and remap the data from the ``down`` OSD
-to another OSD. The time between being marked ``down`` and being marked ``out``
-is controlled by ``mon osd down out interval``, which is set to ``600`` seconds
-by default.
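-
-As a hedged sketch, this interval can be changed at runtime by injecting the
-setting into the monitors (``mon.a`` is an example monitor name; add the option
-to ``ceph.conf`` to make the change persistent)::
-
- ceph tell mon.a injectargs '--mon-osd-down-out-interval 900'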
-
-A placement group can also be ``degraded``, because Ceph cannot find one or more
-objects that Ceph thinks should be in the placement group. While you cannot
-read or write to unfound objects, you can still access all of the other objects
-in the ``degraded`` placement group.
-
-
-Recovering
-----------
-
-Ceph was designed for fault-tolerance at a scale where hardware and software
-problems are ongoing. When an OSD goes ``down``, its contents may fall behind
-the current state of other replicas in the placement groups. When the OSD is
-back ``up``, the contents of the placement groups must be updated to reflect the
-current state. During that time period, the OSD may reflect a ``recovering``
-state.
-
-Recovery is not always trivial, because a hardware failure might cause a
-cascading failure of multiple OSDs. For example, a network switch for a rack or
-cabinet may fail, which can cause the OSDs of a number of host machines to fall
-behind the current state of the cluster. Each one of the OSDs must recover once
-the fault is resolved.
-
-Ceph provides a number of settings to balance the resource contention between
-new service requests and the need to recover data objects and restore the
-placement groups to the current state. The ``osd recovery delay start`` setting
-allows an OSD to restart, re-peer and even process some replay requests before
-starting the recovery process. The ``osd recovery thread timeout`` sets a
-thread timeout, because multiple OSDs may fail, restart and re-peer at
-staggered rates. The ``osd recovery max active`` setting limits the number of
-recovery requests an OSD will entertain simultaneously, to prevent the OSD from
-failing to serve requests. The ``osd recovery max chunk`` setting
-limits the size of the recovered data chunks to prevent network congestion.
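-
-For example, a hedged sketch of temporarily reducing the recovery load during
-peak client hours (the value shown is illustrative, not a recommendation)::
-
- ceph tell osd.* injectargs '--osd-recovery-max-active 1'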
-
-
-Back Filling
-------------
-
-When a new OSD joins the cluster, CRUSH will reassign placement groups from OSDs
-in the cluster to the newly added OSD. Forcing the new OSD to accept the
-reassigned placement groups immediately can put excessive load on the new OSD.
-Back filling the OSD with the placement groups allows this process to begin in
-the background. Once backfilling is complete, the new OSD will begin serving
-requests when it is ready.
-
-During the backfill operations, you may see one of several states:
-``backfill_wait`` indicates that a backfill operation is pending, but is not
-underway yet; ``backfill`` indicates that a backfill operation is underway;
-and, ``backfill_too_full`` indicates that a backfill operation was requested,
-but couldn't be completed due to insufficient storage capacity. When a
-placement group cannot be backfilled, it may be considered ``incomplete``.
-
-Ceph provides a number of settings to manage the load spike associated with
-reassigning placement groups to an OSD (especially a new OSD). By default,
-``osd_max_backfills`` sets the maximum number of concurrent backfills to or from
-an OSD to 10. The ``backfill full ratio`` enables an OSD to refuse a
-backfill request if the OSD is approaching its full ratio (90%, by default);
-this ratio can be changed with the ``ceph osd set-backfillfull-ratio`` command.
-If an OSD refuses a backfill request, the ``osd backfill retry interval``
-enables an OSD to retry the request (after 10 seconds, by default). OSDs can
-also set ``osd backfill scan min`` and ``osd backfill scan max`` to manage scan
-intervals (64 and 512, by default).
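-
-For example, a hedged sketch of throttling backfill on all OSDs at runtime and
-lowering the backfillfull ratio (the values are illustrative)::
-
- ceph tell osd.* injectargs '--osd-max-backfills 1'
- ceph osd set-backfillfull-ratio 0.85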
-
-
-Remapped
---------
-
-When the Acting Set that services a placement group changes, the data migrates
-from the old acting set to the new acting set. It may take some time for a new
-primary OSD to service requests. So it may ask the old primary to continue to
-service requests until the placement group migration is complete. Once data
-migration completes, the mapping uses the primary OSD of the new acting set.
-
-
-Stale
------
-
-While Ceph uses heartbeats to ensure that hosts and daemons are running, the
-``ceph-osd`` daemons may also get into a ``stuck`` state where they are not
-reporting statistics in a timely manner (e.g., a temporary network fault). By
-default, OSD daemons report their placement group, up thru, boot and failure
-statistics every half second (i.e., ``0.5``), which is more frequent than the
-heartbeat thresholds. If the **Primary OSD** of a placement group's acting set
-fails to report to the monitor or if other OSDs have reported the primary OSD
-``down``, the monitors will mark the placement group ``stale``.
-
-When you start your cluster, it is common to see the ``stale`` state until
-the peering process completes. After your cluster has been running for a while,
-seeing placement groups in the ``stale`` state indicates that the primary OSD
-for those placement groups is ``down`` or not reporting placement group statistics
-to the monitor.
-
-
-Identifying Troubled PGs
-========================
-
-As previously noted, a placement group is not necessarily problematic just
-because its state is not ``active+clean``. Generally, Ceph's ability to
-self-repair may not be working when placement groups get stuck. The stuck states
-include:
-
-- **Unclean**: Placement groups contain objects that are not replicated the
- desired number of times. They should be recovering.
-- **Inactive**: Placement groups cannot process reads or writes because they
- are waiting for an OSD with the most up-to-date data to come back ``up``.
-- **Stale**: Placement groups are in an unknown state, because the OSDs that
- host them have not reported to the monitor cluster in a while (configured
- by ``mon osd report timeout``).
-
-To identify stuck placement groups, execute the following::
-
- ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]
-
-See `Placement Group Subsystem`_ for additional details. To troubleshoot
-stuck placement groups, see `Troubleshooting PG Errors`_.
-
-
-Finding an Object Location
-==========================
-
-To store object data in the Ceph Object Store, a Ceph client must:
-
-#. Set an object name
-#. Specify a `pool`_
-
-The Ceph client retrieves the latest cluster map and the CRUSH algorithm
-calculates how to map the object to a `placement group`_, and then calculates
-how to assign the placement group to an OSD dynamically. To find the object
-location, all you need is the object name and the pool name. For example::
-
- ceph osd map {poolname} {object-name}
-
-.. topic:: Exercise: Locate an Object
-
- As an exercise, let's create an object. Specify an object name, a path to a
- test file containing some object data and a pool name using the
- ``rados put`` command on the command line. For example::
-
- rados put {object-name} {file-path} --pool=data
- rados put test-object-1 testfile.txt --pool=data
-
- To verify that the Ceph Object Store stored the object, execute the following::
-
- rados -p data ls
-
- Now, identify the object location::
-
- ceph osd map {pool-name} {object-name}
- ceph osd map data test-object-1
-
- Ceph should output the object's location. For example::
-
- osdmap e537 pool 'data' (0) object 'test-object-1' -> pg 0.d1743484 (0.4) -> up [1,0] acting [1,0]
-
- To remove the test object, simply delete it using the ``rados rm`` command.
- For example::
-
- rados rm test-object-1 --pool=data
-
-
-As the cluster evolves, the object location may change dynamically. One benefit
-of Ceph's dynamic rebalancing is that Ceph relieves you from having to perform
-the migration manually. See the `Architecture`_ section for details.
-
-.. _data placement: ../data-placement
-.. _pool: ../pools
-.. _placement group: ../placement-groups
-.. _Architecture: ../../../architecture
-.. _OSD Not Running: ../../troubleshooting/troubleshooting-osd#osd-not-running
-.. _Troubleshooting PG Errors: ../../troubleshooting/troubleshooting-pg#troubleshooting-pg-errors
-.. _Peering Failure: ../../troubleshooting/troubleshooting-pg#failures-osd-peering
-.. _CRUSH map: ../crush-map
-.. _Configuring Monitor/OSD Interaction: ../../configuration/mon-osd-interaction/
-.. _Placement Group Subsystem: ../control#placement-group-subsystem
diff --git a/src/ceph/doc/rados/operations/monitoring.rst b/src/ceph/doc/rados/operations/monitoring.rst
deleted file mode 100644
index c291440..0000000
--- a/src/ceph/doc/rados/operations/monitoring.rst
+++ /dev/null
@@ -1,351 +0,0 @@
-======================
- Monitoring a Cluster
-======================
-
-Once you have a running cluster, you may use the ``ceph`` tool to monitor your
-cluster. Monitoring a cluster typically involves checking OSD status, monitor
-status, placement group status and metadata server status.
-
-Using the command line
-======================
-
-Interactive mode
-----------------
-
-To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
-with no arguments. For example::
-
- ceph
- ceph> health
- ceph> status
- ceph> quorum_status
- ceph> mon_status
-
-Non-default paths
------------------
-
-If you specified non-default locations for your configuration or keyring,
-you may specify their locations::
-
- ceph -c /path/to/conf -k /path/to/keyring health
-
-Checking a Cluster's Status
-===========================
-
-After you start your cluster, and before you start reading and/or
-writing data, check your cluster's status first.
-
-To check a cluster's status, execute the following::
-
- ceph status
-
-Or::
-
- ceph -s
-
-In interactive mode, type ``status`` and press **Enter**. ::
-
- ceph> status
-
-Ceph will print the cluster status. For example, a tiny Ceph demonstration
-cluster with one of each service may print the following:
-
-::
-
- cluster:
- id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20
- health: HEALTH_OK
-
- services:
- mon: 1 daemons, quorum a
- mgr: x(active)
- mds: 1/1/1 up {0=a=up:active}
- osd: 1 osds: 1 up, 1 in
-
- data:
- pools: 2 pools, 16 pgs
- objects: 21 objects, 2246 bytes
- usage: 546 GB used, 384 GB / 931 GB avail
- pgs: 16 active+clean
-
-
-.. topic:: How Ceph Calculates Data Usage
-
- The ``usage`` value reflects the *actual* amount of raw storage used. The
- ``xxx GB / xxx GB`` value means the amount available (the lesser number)
- out of the overall storage capacity of the cluster. The notional number reflects
- the size of the stored data before it is replicated, cloned or snapshotted.
- Therefore, the amount of data actually stored typically exceeds the notional
- amount stored, because Ceph creates replicas of the data and may also use
- storage capacity for cloning and snapshotting.
-
-
-Watching a Cluster
-==================
-
-In addition to local logging by each daemon, Ceph clusters maintain
-a *cluster log* that records high level events about the whole system.
-This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
-default), but can also be monitored via the command line.
-
-To follow the cluster log, use the following command
-
-::
-
- ceph -w
-
-Ceph will print the status of the system, followed by each log message as it
-is emitted. For example:
-
-::
-
- cluster:
- id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20
- health: HEALTH_OK
-
- services:
- mon: 1 daemons, quorum a
- mgr: x(active)
- mds: 1/1/1 up {0=a=up:active}
- osd: 1 osds: 1 up, 1 in
-
- data:
- pools: 2 pools, 16 pgs
- objects: 21 objects, 2246 bytes
- usage: 546 GB used, 384 GB / 931 GB avail
- pgs: 16 active+clean
-
-
- 2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
- 2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
- 2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available
-
-
-In addition to using ``ceph -w`` to print log lines as they are emitted,
-use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
-log.
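-
-For example, to show the last ten cluster log entries::
-
- ceph log last 10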
-
-Monitoring Health Checks
-========================
-
-Ceph continuously runs various *health checks* against its own status. When
-a health check fails, this is reflected in the output of ``ceph status`` (or
-``ceph health``). In addition, messages are sent to the cluster log to
-indicate when a check fails, and when the cluster recovers.
-
-For example, when an OSD goes down, the ``health`` section of the status
-output may be updated as follows:
-
-::
-
- health: HEALTH_WARN
- 1 osds down
- Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded
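-
-For per-check detail (e.g., exactly which OSDs are down), you can also run::
-
- ceph health detail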
-
-At this time, cluster log messages are also emitted to record the failure of the
-health checks:
-
-::
-
- 2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
- 2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)
-
-When the OSD comes back online, the cluster log records the cluster's return
-to a healthy state:
-
-::
-
- 2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
- 2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
- 2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy
-
-
-Detecting configuration issues
-==============================
-
-In addition to the health checks that Ceph continuously runs on its
-own status, there are some configuration issues that may only be detected
-by an external tool.
-
-Use the `ceph-medic`_ tool to run these additional checks on your Ceph
-cluster's configuration.
-
-Checking a Cluster's Usage Stats
-================================
-
-To check a cluster's data usage and data distribution among pools, you can
-use the ``ceph df`` command, which is similar to the Linux ``df`` command. Execute
-the following::
-
- ceph df
-
-The **GLOBAL** section of the output provides an overview of the amount of
-storage your cluster uses for your data.
-
-- **SIZE:** The overall storage capacity of the cluster.
-- **AVAIL:** The amount of free space available in the cluster.
-- **RAW USED:** The amount of raw storage used.
-- **% RAW USED:** The percentage of raw storage used. Use this number in
- conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
- you are not reaching your cluster's capacity. See `Storage Capacity`_ for
- additional details.
-
-The **POOLS** section of the output provides a list of pools and the notional
-usage of each pool. The output from this section **DOES NOT** reflect replicas,
-clones or snapshots. For example, if you store an object with 1MB of data, the
-notional usage will be 1MB, but the actual usage may be 2MB or more depending
-on the number of replicas, clones and snapshots.
-
-- **NAME:** The name of the pool.
-- **ID:** The pool ID.
-- **USED:** The notional amount of data stored in kilobytes, unless the number
- is suffixed with **M** for megabytes or **G** for gigabytes.
-- **%USED:** The notional percentage of storage used per pool.
-- **MAX AVAIL:** An estimate of the notional amount of data that can be written
- to this pool.
-- **Objects:** The notional number of objects stored per pool.
-
-.. note:: The numbers in the **POOLS** section are notional. They are not
- inclusive of the number of replicas, snapshots or clones. As a result,
- the sum of the **USED** and **%USED** amounts will not add up to the
- **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the
- output.
-
-.. note:: The **MAX AVAIL** value is a complicated function of the
- replication or erasure code used, the CRUSH rule that maps storage
- to devices, the utilization of those devices, and the configured
- mon_osd_full_ratio.
-
-
-
-Checking OSD Status
-===================
-
-You can check OSDs to ensure they are ``up`` and ``in`` by executing::
-
- ceph osd stat
-
-Or::
-
- ceph osd dump
-
-You can also view OSDs according to their position in the CRUSH map. ::
-
- ceph osd tree
-
-Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
-and their weight. ::
-
- # id weight type name up/down reweight
- -1 3 pool default
- -3 3 rack mainrack
- -2 3 host osd-host
- 0 1 osd.0 up 1
- 1 1 osd.1 up 1
- 2 1 osd.2 up 1
-
-For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
-
-Checking Monitor Status
-=======================
-
-If your cluster has multiple monitors (likely), you should check the monitor
-quorum status after you start the cluster before reading and/or writing data. A
-quorum must be present when multiple monitors are running. You should also check
-monitor status periodically to ensure that they are running.
-
-To display the monitor map, execute the following::
-
- ceph mon stat
-
-Or::
-
- ceph mon dump
-
-To check the quorum status for the monitor cluster, execute the following::
-
- ceph quorum_status
-
-Ceph will return the quorum status. For example, a Ceph cluster consisting of
-three monitors may return the following:
-
-.. code-block:: javascript
-
- { "election_epoch": 10,
- "quorum": [
- 0,
- 1,
- 2],
- "monmap": { "epoch": 1,
- "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
- "modified": "2011-12-12 13:28:27.505520",
- "created": "2011-12-12 13:28:27.505520",
- "mons": [
- { "rank": 0,
- "name": "a",
- "addr": "127.0.0.1:6789\/0"},
- { "rank": 1,
- "name": "b",
- "addr": "127.0.0.1:6790\/0"},
- { "rank": 2,
- "name": "c",
- "addr": "127.0.0.1:6791\/0"}
- ]
- }
- }
-
-Checking MDS Status
-===================
-
-Metadata servers provide metadata services for Ceph FS. Metadata servers have
-two sets of states: ``up | down`` and ``active | inactive``. To ensure your
-metadata servers are ``up`` and ``active``, execute the following::
-
- ceph mds stat
-
-To display details of the metadata cluster, execute the following::
-
- ceph fs dump
-
-
-Checking Placement Group States
-===============================
-
-Placement groups map objects to OSDs. When you monitor your
-placement groups, you will want them to be ``active`` and ``clean``.
-For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.
-
-.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg
-
-
-Using the Admin Socket
-======================
-
-The Ceph admin socket allows you to query a daemon via a socket interface.
-By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
-via the admin socket, log in to the host running the daemon and use the
-following command::
-
- ceph daemon {daemon-name}
- ceph daemon {path-to-socket-file}
-
-For example, the following are equivalent::
-
- ceph daemon osd.0 foo
- ceph daemon /var/run/ceph/ceph-osd.0.asok foo
-
-To view the available admin socket commands, execute the following command::
-
- ceph daemon {daemon-name} help
-
-The admin socket command enables you to show and set your configuration at
-runtime. See `Viewing a Configuration at Runtime`_ for details.
-
-Additionally, you can set configuration values at runtime directly (i.e., the
-admin socket bypasses the monitor, unlike
-``ceph tell {daemon-type}.{id} injectargs``, which relies on the monitor but
-doesn't require you to log in directly to the host in question).
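-
-As an illustrative sketch (``osd.0`` is an example daemon name)::
-
- ceph daemon osd.0 config get osd_max_backfills
- ceph daemon osd.0 config set debug_osd 5/5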
-
-.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
-.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
-.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/
diff --git a/src/ceph/doc/rados/operations/operating.rst b/src/ceph/doc/rados/operations/operating.rst
deleted file mode 100644
index 791941a..0000000
--- a/src/ceph/doc/rados/operations/operating.rst
+++ /dev/null
@@ -1,251 +0,0 @@
-=====================
- Operating a Cluster
-=====================
-
-.. index:: systemd; operating a cluster
-
-
-Running Ceph with systemd
-==========================
-
-For all distributions that support systemd (CentOS 7, Fedora, Debian
-Jessie 8 and later, SUSE), ceph daemons are now managed using native
-systemd files instead of the legacy sysvinit scripts. For example::
-
- sudo systemctl start ceph.target # start all daemons
- sudo systemctl status ceph-osd@12 # check status of osd.12
-
-To list the Ceph systemd units on a node, execute::
-
- sudo systemctl status ceph\*.service ceph\*.target
-
-Starting all Daemons
---------------------
-
-To start all daemons on a Ceph Node (irrespective of type), execute the
-following::
-
- sudo systemctl start ceph.target
-
-
-Stopping all Daemons
---------------------
-
-To stop all daemons on a Ceph Node (irrespective of type), execute the
-following::
-
- sudo systemctl stop ceph\*.service ceph\*.target
-
-
-Starting all Daemons by Type
-----------------------------
-
-To start all daemons of a particular type on a Ceph Node, execute one of the
-following::
-
- sudo systemctl start ceph-osd.target
- sudo systemctl start ceph-mon.target
- sudo systemctl start ceph-mds.target
-
-
-Stopping all Daemons by Type
-----------------------------
-
-To stop all daemons of a particular type on a Ceph Node, execute one of the
-following::
-
- sudo systemctl stop ceph-mon\*.service ceph-mon.target
- sudo systemctl stop ceph-osd\*.service ceph-osd.target
- sudo systemctl stop ceph-mds\*.service ceph-mds.target
-
-
-Starting a Daemon
------------------
-
-To start a specific daemon instance on a Ceph Node, execute one of the
-following::
-
- sudo systemctl start ceph-osd@{id}
- sudo systemctl start ceph-mon@{hostname}
- sudo systemctl start ceph-mds@{hostname}
-
-For example::
-
- sudo systemctl start ceph-osd@1
- sudo systemctl start ceph-mon@ceph-server
- sudo systemctl start ceph-mds@ceph-server
-
-
-Stopping a Daemon
------------------
-
-To stop a specific daemon instance on a Ceph Node, execute one of the
-following::
-
- sudo systemctl stop ceph-osd@{id}
- sudo systemctl stop ceph-mon@{hostname}
- sudo systemctl stop ceph-mds@{hostname}
-
-For example::
-
- sudo systemctl stop ceph-osd@1
- sudo systemctl stop ceph-mon@ceph-server
- sudo systemctl stop ceph-mds@ceph-server
-
-
-.. index:: Ceph service; Upstart; operating a cluster
-
-
-
-Running Ceph with Upstart
-=========================
-
-When deploying Ceph with ``ceph-deploy`` on Ubuntu Trusty, you may start and
-stop Ceph daemons on a :term:`Ceph Node` using the event-based `Upstart`_.
-Upstart does not require you to define daemon instances in the Ceph
-configuration file.
-
-To list the Ceph Upstart jobs and instances on a node, execute::
-
- sudo initctl list | grep ceph
-
-See `initctl`_ for additional details.
-
-
-Starting all Daemons
---------------------
-
-To start all daemons on a Ceph Node (irrespective of type), execute the
-following::
-
- sudo start ceph-all
-
-
-Stopping all Daemons
---------------------
-
-To stop all daemons on a Ceph Node (irrespective of type), execute the
-following::
-
- sudo stop ceph-all
-
-
-Starting all Daemons by Type
-----------------------------
-
-To start all daemons of a particular type on a Ceph Node, execute one of the
-following::
-
- sudo start ceph-osd-all
- sudo start ceph-mon-all
- sudo start ceph-mds-all
-
-
-Stopping all Daemons by Type
-----------------------------
-
-To stop all daemons of a particular type on a Ceph Node, execute one of the
-following::
-
- sudo stop ceph-osd-all
- sudo stop ceph-mon-all
- sudo stop ceph-mds-all
-
-
-Starting a Daemon
------------------
-
-To start a specific daemon instance on a Ceph Node, execute one of the
-following::
-
- sudo start ceph-osd id={id}
- sudo start ceph-mon id={hostname}
- sudo start ceph-mds id={hostname}
-
-For example::
-
- sudo start ceph-osd id=1
- sudo start ceph-mon id=ceph-server
- sudo start ceph-mds id=ceph-server
-
-
-Stopping a Daemon
------------------
-
-To stop a specific daemon instance on a Ceph Node, execute one of the
-following::
-
- sudo stop ceph-osd id={id}
- sudo stop ceph-mon id={hostname}
- sudo stop ceph-mds id={hostname}
-
-For example::
-
- sudo stop ceph-osd id=1
- sudo stop ceph-mon id=ceph-server
- sudo stop ceph-mds id=ceph-server
-
-
-.. index:: Ceph service; sysvinit; operating a cluster
-
-
-Running Ceph
-============
-
-Each time you **start**, **restart**, or **stop** Ceph daemons (or your
-entire cluster) you must specify at least one option and one command. You may
-also specify a daemon type or a daemon instance. ::
-
- {commandline} [options] [commands] [daemons]
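-
-For example, a hedged sketch assuming the legacy sysvinit script is installed
-at ``/etc/init.d/ceph`` and ``osd.0`` is defined in ``ceph.conf``::
-
- sudo /etc/init.d/ceph -a start osd.0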
-
-
-The ``ceph`` options include:
-
-+-----------------+----------+-------------------------------------------------+
-| Option | Shortcut | Description |
-+=================+==========+=================================================+
-| ``--verbose`` | ``-v`` | Use verbose logging. |
-+-----------------+----------+-------------------------------------------------+
-| ``--valgrind`` | ``N/A`` | (Dev and QA only) Use `Valgrind`_ debugging. |
-+-----------------+----------+-------------------------------------------------+
-| ``--allhosts`` | ``-a`` | Execute on all nodes in ``ceph.conf.`` |
-| | | Otherwise, it only executes on ``localhost``. |
-+-----------------+----------+-------------------------------------------------+
-| ``--restart`` | ``N/A`` | Automatically restart daemon if it core dumps. |
-+-----------------+----------+-------------------------------------------------+
-| ``--norestart`` | ``N/A`` | Don't restart a daemon if it core dumps. |
-+-----------------+----------+-------------------------------------------------+
-| ``--conf`` | ``-c`` | Use an alternate configuration file. |
-+-----------------+----------+-------------------------------------------------+
-
-The ``ceph`` commands include:
-
-+------------------+------------------------------------------------------------+
-| Command | Description |
-+==================+============================================================+
-| ``start`` | Start the daemon(s). |
-+------------------+------------------------------------------------------------+
-| ``stop`` | Stop the daemon(s). |
-+------------------+------------------------------------------------------------+
-| ``forcestop`` | Force the daemon(s) to stop. Same as ``kill -9`` |
-+------------------+------------------------------------------------------------+
-| ``killall`` | Kill all daemons of a particular type. |
-+------------------+------------------------------------------------------------+
-| ``cleanlogs`` | Cleans out the log directory. |
-+------------------+------------------------------------------------------------+
-| ``cleanalllogs`` | Cleans out **everything** in the log directory. |
-+------------------+------------------------------------------------------------+
-
-For subsystem operations, the ``ceph`` service can target specific daemon types
-by adding a particular daemon type for the ``[daemons]`` option. Daemon types
-include:
-
-- ``mon``
-- ``osd``
-- ``mds``
-
-
-
-.. _Valgrind: http://www.valgrind.org/
-.. _Upstart: http://upstart.ubuntu.com/index.html
-.. _initctl: http://manpages.ubuntu.com/manpages/raring/en/man8/initctl.8.html
diff --git a/src/ceph/doc/rados/operations/pg-concepts.rst b/src/ceph/doc/rados/operations/pg-concepts.rst
deleted file mode 100644
index 636d6bf..0000000
--- a/src/ceph/doc/rados/operations/pg-concepts.rst
+++ /dev/null
@@ -1,102 +0,0 @@
-==========================
- Placement Group Concepts
-==========================
-
-When you execute commands like ``ceph -w``, ``ceph osd dump``, and other
-commands related to placement groups, Ceph may return values using some
-of the following terms:
-
-*Peering*
- The process of bringing all of the OSDs that store
- a Placement Group (PG) into agreement about the state
- of all of the objects (and their metadata) in that PG.
- Note that agreeing on the state does not mean that
- they all have the latest contents.
-
-*Acting Set*
- The ordered list of OSDs who are (or were as of some epoch)
- responsible for a particular placement group.
-
-*Up Set*
- The ordered list of OSDs responsible for a particular placement
- group for a particular epoch according to CRUSH. Normally this
- is the same as the *Acting Set*, except when the *Acting Set* has
- been explicitly overridden via ``pg_temp`` in the OSD Map.
-
-*Current Interval* or *Past Interval*
- A sequence of OSD map epochs during which the *Acting Set* and *Up
- Set* for a particular placement group do not change.
-
-*Primary*
- The member (and by convention the first) of the *Acting Set*
- that is responsible for coordinating peering, and is
- the only OSD that will accept client-initiated
- writes to objects in a placement group.
-
-*Replica*
- A non-primary OSD in the *Acting Set* for a placement group
- (and who has been recognized as such and *activated* by the primary).
-
-*Stray*
- An OSD that is not a member of the current *Acting Set*, but
- has not yet been told that it can delete its copies of a
- particular placement group.
-
-*Recovery*
- Ensuring that copies of all of the objects in a placement group
- are on all of the OSDs in the *Acting Set*. Once *Peering* has
- been performed, the *Primary* can start accepting write operations,
- and *Recovery* can proceed in the background.
-
-*PG Info*
- Basic metadata about the placement group's creation epoch, the version
- for the most recent write to the placement group, *last epoch started*,
- *last epoch clean*, and the beginning of the *current interval*. Any
- inter-OSD communication about placement groups includes the *PG Info*,
- such that any OSD that knows a placement group exists (or once existed)
- also has a lower bound on *last epoch clean* or *last epoch started*.
-
-*PG Log*
- A list of recent updates made to objects in a placement group.
- Note that these logs can be truncated after all OSDs
- in the *Acting Set* have acknowledged up to a certain
- point.
-
-*Missing Set*
- Each OSD notes update log entries and, if they imply updates to
- the contents of an object, adds that object to a list of needed
- updates. This list is called the *Missing Set* for that ``<OSD,PG>``.
-
-*Authoritative History*
- A complete, and fully ordered set of operations that, if
- performed, would bring an OSD's copy of a placement group
- up to date.
-
-*Epoch*
- A (monotonically increasing) OSD map version number.
-
-*Last Epoch Start*
- The last epoch at which all nodes in the *Acting Set*
- for a particular placement group agreed on an
- *Authoritative History*. At this point, *Peering* is
- deemed to have been successful.
-
-*up_thru*
- Before a *Primary* can successfully complete the *Peering* process,
- it must inform a monitor that it is alive through the current
- OSD map *Epoch* by having the monitor set its *up_thru* in the osd
- map. This helps *Peering* ignore previous *Acting Sets* for which
- *Peering* never completed after certain sequences of failures, such as
- the second interval below:
-
- - *acting set* = [A,B]
- - *acting set* = [A]
- - *acting set* = [] very shortly after (e.g., simultaneous failure, but staggered detection)
- - *acting set* = [B] (B restarts, A does not)
-
-*Last Epoch Clean*
- The last *Epoch* at which all nodes in the *Acting set*
- for a particular placement group were completely
- up to date (both placement group logs and object contents).
- At this point, *recovery* is deemed to have been
- completed.
diff --git a/src/ceph/doc/rados/operations/pg-repair.rst b/src/ceph/doc/rados/operations/pg-repair.rst
deleted file mode 100644
index 0d6692a..0000000
--- a/src/ceph/doc/rados/operations/pg-repair.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-Repairing PG inconsistencies
-============================
-
-
diff --git a/src/ceph/doc/rados/operations/pg-states.rst b/src/ceph/doc/rados/operations/pg-states.rst
deleted file mode 100644
index 0fbd3dc..0000000
--- a/src/ceph/doc/rados/operations/pg-states.rst
+++ /dev/null
@@ -1,80 +0,0 @@
-========================
- Placement Group States
-========================
-
-When checking a cluster's status (e.g., running ``ceph -w`` or ``ceph -s``),
-Ceph will report on the status of the placement groups. A placement group has
-one or more states. The optimum state for placement groups in the placement group
-map is ``active + clean``.
-
-*Creating*
- Ceph is still creating the placement group.
-
-*Active*
- Ceph will process requests to the placement group.
-
-*Clean*
- Ceph replicated all objects in the placement group the correct number of times.
-
-*Down*
- A replica with necessary data is down, so the placement group is offline.
-
-*Scrubbing*
- Ceph is checking the placement group for inconsistencies.
-
-*Degraded*
- Ceph has not replicated some objects in the placement group the correct number of times yet.
-
-*Inconsistent*
- Ceph detects inconsistencies in one or more replicas of an object in the placement group
- (e.g. objects are the wrong size, objects are missing from one replica *after* recovery finished, etc.).
-
-*Peering*
- The placement group is undergoing the peering process.
-
-*Repair*
- Ceph is checking the placement group and repairing any inconsistencies it finds (if possible).
-
-*Recovering*
- Ceph is migrating/synchronizing objects and their replicas.
-
-*Forced-Recovery*
- The user has enforced a high recovery priority for this PG.
-
-*Backfill*
- Ceph is scanning and synchronizing the entire contents of a placement group
- instead of inferring what contents need to be synchronized from the logs of
- recent operations. *Backfill* is a special case of recovery.
-
-*Forced-Backfill*
- The user has enforced a high backfill priority for this PG.
-
-*Wait-backfill*
- The placement group is waiting in line to start backfill.
-
-*Backfill-toofull*
- A backfill operation is waiting because the destination OSD is over its
- full ratio.
-
-*Incomplete*
- Ceph detects that a placement group is missing information about
- writes that may have occurred, or does not have any healthy
- copies. If you see this state, try to start any failed OSDs that may
- contain the needed information. In the case of an erasure coded pool,
- temporarily reducing min_size may allow recovery.
-
-*Stale*
- The placement group is in an unknown state - the monitors have not received
- an update for it since the placement group mapping changed.
-
-*Remapped*
- The placement group is temporarily mapped to a different set of OSDs from what
- CRUSH specified.
-
-*Undersized*
- The placement group has fewer copies than the configured pool replication level.
-
-*Peered*
- The placement group has peered, but cannot serve client IO due to not having
- enough copies to reach the pool's configured min_size parameter. Recovery
- may occur in this state, so the pg may heal up to min_size eventually.
diff --git a/src/ceph/doc/rados/operations/placement-groups.rst b/src/ceph/doc/rados/operations/placement-groups.rst
deleted file mode 100644
index fee833a..0000000
--- a/src/ceph/doc/rados/operations/placement-groups.rst
+++ /dev/null
@@ -1,469 +0,0 @@
-==================
- Placement Groups
-==================
-
-.. _preselection:
-
-A preselection of pg_num
-========================
-
-When creating a new pool with::
-
- ceph osd pool create {pool-name} pg_num
-
-it is mandatory to choose the value of ``pg_num`` because it cannot be
-calculated automatically. Here are a few values commonly used:
-
-- Less than 5 OSDs set ``pg_num`` to 128
-
-- Between 5 and 10 OSDs set ``pg_num`` to 512
-
-- Between 10 and 50 OSDs set ``pg_num`` to 1024
-
-- If you have more than 50 OSDs, you need to understand the tradeoffs
- and how to calculate the ``pg_num`` value by yourself
-
-- To calculate the ``pg_num`` value by yourself, the `pgcalc`_ tool can help (see also the example below)
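-
-For example, a hedged sketch for a cluster of 8 OSDs, where the guidance above
-suggests a ``pg_num`` of 512 (``test-pool`` is an illustrative name; the
-optional second argument sets ``pgp_num`` to the same value)::
-
- ceph osd pool create test-pool 512 512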
-
-As the number of OSDs increases, choosing the right value for pg_num
-becomes more important because it has a significant influence on the
-behavior of the cluster as well as the durability of the data when
-something goes wrong (i.e. the probability that a catastrophic event
-leads to data loss).
-
-How are Placement Groups used?
-===============================
-
-A placement group (PG) aggregates objects within a pool because
-tracking object placement and object metadata on a per-object basis is
-computationally expensive--i.e., a system with millions of objects
-cannot realistically track placement on a per-object basis.
-
-.. ditaa::
- /-----\ /-----\ /-----\ /-----\ /-----\
- | obj | | obj | | obj | | obj | | obj |
- \-----/ \-----/ \-----/ \-----/ \-----/
- | | | | |
- +--------+--------+ +---+----+
- | |
- v v
- +-----------------------+ +-----------------------+
- | Placement Group #1 | | Placement Group #2 |
- | | | |
- +-----------------------+ +-----------------------+
- | |
- +------------------------------+
- |
- v
- +-----------------------+
- | Pool |
- | |
- +-----------------------+
-
-The Ceph client will calculate which placement group an object should
-be in. It does this by hashing the object ID and applying an operation
-based on the number of PGs in the defined pool and the ID of the pool.
-See `Mapping PGs to OSDs`_ for details.
-
-The object's contents within a placement group are stored in a set of
-OSDs. For instance, in a replicated pool of size two, each placement
-group will store objects on two OSDs, as shown below.
-
-.. ditaa::
-
- +-----------------------+ +-----------------------+
- | Placement Group #1 | | Placement Group #2 |
- | | | |
- +-----------------------+ +-----------------------+
- | | | |
- v v v v
- /----------\ /----------\ /----------\ /----------\
- | | | | | | | |
- | OSD #1 | | OSD #2 | | OSD #2 | | OSD #3 |
- | | | | | | | |
- \----------/ \----------/ \----------/ \----------/
-
-
-Should OSD #2 fail, another will be assigned to Placement Group #1 and
-will be filled with copies of all objects in OSD #1. If the pool size
-is changed from two to three, an additional OSD will be assigned to
-the placement group and will receive copies of all objects in the
-placement group.
-
-Placement groups do not own the OSD; they share it with other
-placement groups from the same pool or even other pools. If OSD #2
-fails, the Placement Group #2 will also have to restore copies of
-objects, using OSD #3.
-
-When the number of placement groups increases, the new placement
-groups will be assigned OSDs. The result of the CRUSH function will
-also change and some objects from the former placement groups will be
-copied over to the new Placement Groups and removed from the old ones.
-
-Placement Groups Tradeoffs
-==========================
-
-Data durability and even distribution among all OSDs call for more
-placement groups but their number should be reduced to the minimum to
-save CPU and memory.
-
-.. _data durability:
-
-Data durability
----------------
-
-After an OSD fails, the risk of data loss increases until the data it
-contained is fully recovered. Let's imagine a scenario that causes
-permanent data loss in a single placement group:
-
-- The OSD fails and all copies of the object it contains are lost.
- For all objects within the placement group, the number of replicas
- suddenly drops from three to two.
-
-- Ceph starts recovery for this placement group by choosing a new OSD
- to re-create the third copy of all objects.
-
-- Another OSD, within the same placement group, fails before the new
- OSD is fully populated with the third copy. Some objects will then
- only have one surviving copy.
-
-- Ceph picks yet another OSD and keeps copying objects to restore the
- desired number of copies.
-
-- A third OSD, within the same placement group, fails before recovery
- is complete. If this OSD contained the only remaining copy of an
- object, it is permanently lost.
-
-In a cluster containing 10 OSDs with 512 placement groups in a three
-replica pool, CRUSH will give each placement group three OSDs. In the
-end, each OSD will end up hosting (512 * 3) / 10 = ~150 Placement
-Groups. When the first OSD fails, the above scenario will therefore
-start recovery for all 150 placement groups at the same time.
-
-The 150 placement groups being recovered are likely to be
-homogeneously spread over the 9 remaining OSDs. Each remaining OSD is
-therefore likely to send copies of objects to all others and also
-receive some new objects to be stored because they became part of a
-new placement group.
-
-The amount of time it takes for this recovery to complete entirely
-depends on the architecture of the Ceph cluster. Let's say each OSD is
-hosted by a 1TB SSD on a single machine and all of them are connected
-to a 10Gb/s switch and the recovery for a single OSD completes within
-M minutes. If there are two OSDs per machine using spinners with no
-SSD journal and a 1Gb/s switch, it will at least be an order of
-magnitude slower.
-
-In a cluster of this size, the number of placement groups has almost
-no influence on data durability. It could be 128 or 8192 and the
-recovery would not be slower or faster.
-
-However, growing the same Ceph cluster to 20 OSDs instead of 10 OSDs
-is likely to speed up recovery and therefore improve data durability
-significantly. Each OSD now participates in only ~75 placement groups
-instead of ~150 when there were only 10 OSDs and it will still require
-all 19 remaining OSDs to perform the same amount of object copies in
-order to recover. But where 10 OSDs had to copy approximately 100GB
-each, they now have to copy 50GB each instead. If the network was the
-bottleneck, recovery will happen twice as fast. In other words,
-recovery goes faster when the number of OSDs increases.
-
-If this cluster grows to 40 OSDs, each of them will only host ~35
-placement groups. If an OSD dies, recovery will keep going faster
-unless it is blocked by another bottleneck. However, if this cluster
-grows to 200 OSDs, each of them will only host ~7 placement groups. If
-an OSD dies, recovery will happen among at most ~21 (7 * 3) OSDs
-in these placement groups: recovery will take longer than when there
-were 40 OSDs, meaning the number of placement groups should be
-increased.
-
-No matter how short the recovery time is, there is a chance for a
-second OSD to fail while it is in progress. In the 10-OSD cluster
-described above, if any of them fails, then ~17 placement groups
-(i.e. ~150 / 9 placement groups being recovered) will only have one
-surviving copy. And if any of the 8 remaining OSDs fails, the last
-objects of two placement groups are likely to be lost (i.e. ~17 / 8
-placement groups with only one remaining copy being recovered).
-
-When the size of the cluster grows to 20 OSDs, the number of Placement
-Groups damaged by the loss of three OSDs drops. The second OSD lost
-will degrade ~4 (i.e. ~75 / 19 placement groups being recovered)
-instead of ~17 and the third OSD lost will only lose data if it is one
-of the four OSDs containing the surviving copy. In other words, if the
-probability of losing one OSD is 0.0001% during the recovery time
-frame, it goes from 17 * 10 * 0.0001% in the cluster with 10 OSDs to 4 * 20 *
-0.0001% in the cluster with 20 OSDs.
-
-In a nutshell, more OSDs mean faster recovery and a lower risk of
-cascading failures leading to the permanent loss of a Placement
-Group. Having 512 or 4096 Placement Groups is roughly equivalent in a
-cluster with less than 50 OSDs as far as data durability is concerned.
-
-Note: It may take a long time for a new OSD added to the cluster to be
-populated with placement groups that were assigned to it. However,
-there is no degradation of any object, and this has no impact on the
-durability of the data contained in the cluster.
-
-.. _object distribution:
-
-Object distribution within a pool
----------------------------------
-
-Ideally objects are evenly distributed in each placement group. Since
-CRUSH computes the placement group for each object, but does not
-actually know how much data is stored in each OSD within this
-placement group, the ratio between the number of placement groups and
-the number of OSDs may influence the distribution of the data
-significantly.
-
-For instance, if there was a single placement group for ten OSDs in a
-three replica pool, only three OSDs would be used because CRUSH would
-have no other choice. When more placement groups are available,
-objects are more likely to be evenly spread among them. CRUSH also
-makes every effort to evenly spread OSDs among all existing Placement
-Groups.
-
-As long as there are one or two orders of magnitude more Placement
-Groups than OSDs, the distribution should be even. For instance, 300
-placement groups for 3 OSDs, 1000 placement groups for 10 OSDs etc.
-
-Uneven data distribution can be caused by factors other than the ratio
-between OSDs and placement groups. Since CRUSH does not take into
-account the size of the objects, a few very large objects may create
-an imbalance. Let's say one million 4K objects totaling 4GB are evenly
-spread among 1000 placement groups on 10 OSDs. They will use 4GB / 10
-= 400MB on each OSD. If one 400MB object is added to the pool, the
-three OSDs supporting the placement group in which the object has been
-placed will be filled with 400MB + 400MB = 800MB while the seven
-others will remain occupied with only 400MB.
-
-.. _resource usage:
-
-Memory, CPU and network usage
------------------------------
-
-For each placement group, OSDs and MONs need memory, network and CPU
-at all times and even more during recovery. Sharing this overhead by
-clustering objects within a placement group is one of the main reasons
-they exist.
-
-Minimizing the number of placement groups saves significant amounts of
-resources.
-
-Choosing the number of Placement Groups
-=======================================
-
-If you have more than 50 OSDs, we recommend approximately 50-100
-placement groups per OSD to balance out resource usage, data
-durability and distribution. If you have fewer than 50 OSDs, choosing
-among the `preselection`_ above is best. For a single pool of objects,
-you can use the following formula to get a baseline::
-
- (OSDs * 100)
- Total PGs = ------------
- pool size
-
-Where **pool size** is either the number of replicas for replicated
-pools or the K+M sum for erasure coded pools (as returned by **ceph
-osd erasure-code-profile get**).
-
-You should then check if the result makes sense with the way you
-designed your Ceph cluster to maximize `data durability`_,
-`object distribution`_ and minimize `resource usage`_.
-
-The result should be **rounded up to the nearest power of two.**
-Rounding up is optional, but recommended for CRUSH to evenly balance
-the number of objects among placement groups.
-
-As an example, for a cluster with 200 OSDs and a pool size of 3
-replicas, you would estimate your number of PGs as follows::
-
- (200 * 100)
- ----------- = 6667. Nearest power of 2: 8192
- 3
-
-When using multiple data pools for storing objects, you need to ensure
-that you balance the number of placement groups per pool with the
-number of placement groups per OSD so that you arrive at a reasonable
-total number of placement groups that provides reasonably low variance
-per OSD without taxing system resources or making the peering process
-too slow.
-
-For instance, a cluster of 10 pools, each with 512 placement groups, on
-ten OSDs has a total of 5,120 placement groups spread over ten OSDs,
-that is, 512 placement groups per OSD. That does not use too many
-resources. However, if 1,000 pools were created with 512 placement
-groups each, the OSDs would handle ~50,000 placement groups each and it
-would require significantly more resources and time for peering.
-
-You may find the `PGCalc`_ tool helpful.
-
-
-.. _setting the number of placement groups:
-
-Set the Number of Placement Groups
-==================================
-
-To set the number of placement groups in a pool, you must specify the
-number of placement groups at the time you create the pool.
-See `Create a Pool`_ for details. Once you have set placement groups for a
-pool, you may increase the number of placement groups (but you cannot
-decrease the number of placement groups). To increase the number of
-placement groups, execute the following::
-
- ceph osd pool set {pool-name} pg_num {pg_num}
-
-Once you increase the number of placement groups, you must also
-increase the number of placement groups for placement (``pgp_num``)
-before your cluster will rebalance. The ``pgp_num`` will be the number of
-placement groups that will be considered for placement by the CRUSH
-algorithm. Increasing ``pg_num`` splits the placement groups but data
-will not be migrated to the newer placement groups until placement
-groups for placement, i.e. ``pgp_num``, is increased. The ``pgp_num``
-should be equal to the ``pg_num``. To increase the number of
-placement groups for placement, execute the following::
-
- ceph osd pool set {pool-name} pgp_num {pgp_num}
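-
-As a concrete sketch (the pool name ``rbd`` and the target value of 128
-are illustrative only), increasing both values on a pool and letting the
-cluster rebalance would look like::
-
- ceph osd pool set rbd pg_num 128
- ceph osd pool set rbd pgp_num 128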
-
-
-Get the Number of Placement Groups
-==================================
-
-To get the number of placement groups in a pool, execute the following::
-
- ceph osd pool get {pool-name} pg_num
-
-
-Get a Cluster's PG Statistics
-=============================
-
-To get the statistics for the placement groups in your cluster, execute the following::
-
- ceph pg dump [--format {format}]
-
-Valid formats are ``plain`` (default) and ``json``.
-
-
-Get Statistics for Stuck PGs
-============================
-
-To get the statistics for all placement groups stuck in a specified state,
-execute the following::
-
- ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format <format>] [-t|--threshold <seconds>]
-
-**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD
-with the most up-to-date data to come up and in.
-
-**Unclean** Placement groups contain objects that are not replicated the desired number
-of times. They should be recovering.
-
-**Stale** Placement groups are in an unknown state - the OSDs that host them have not
-reported to the monitor cluster in a while (configured by ``mon_osd_report_timeout``).
-
-Valid formats are ``plain`` (default) and ``json``. The threshold defines the minimum number
-of seconds the placement group is stuck before including it in the returned statistics
-(default 300 seconds).
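-
-For example, to list placement groups that have been stuck in the ``stale``
-state for at least 10 minutes, in JSON format (the state and threshold here
-are illustrative choices)::
-
- ceph pg dump_stuck stale --format json --threshold 600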
-
-
-Get a PG Map
-============
-
-To get the placement group map for a particular placement group, execute the following::
-
- ceph pg map {pg-id}
-
-For example::
-
- ceph pg map 1.6c
-
-Ceph will return the placement group map, the placement group, and the OSD status::
-
- osdmap e13 pg 1.6c (1.6c) -> up [1,0] acting [1,0]
-
-
-Get a PG's Statistics
-=====================
-
-To retrieve statistics for a particular placement group, execute the following::
-
- ceph pg {pg-id} query
-
-
-Scrub a Placement Group
-=======================
-
-To scrub a placement group, execute the following::
-
- ceph pg scrub {pg-id}
-
-Ceph checks the primary and any replica nodes, generates a catalog of all objects
-in the placement group and compares them to ensure that no objects are missing
-or mismatched, and their contents are consistent. Assuming the replicas all
-match, a final semantic sweep ensures that all of the snapshot-related object
-metadata is consistent. Errors are reported via logs.
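-
-For example, reusing the placement group ID from the ``ceph pg map`` example
-above::
-
- ceph pg scrub 1.6c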
-
-Prioritize backfill/recovery of a Placement Group(s)
-====================================================
-
-You may run into a situation where a number of placement groups will require
-recovery and/or backfill, and some of those groups hold more important data
-than others (for example, some PGs may hold data for images used by running
-machines while other PGs may be used by inactive machines or hold less
-relevant data). In that case, you may want to prioritize recovery of those
-groups so that the performance and/or availability of the data stored on them
-is restored earlier. To do this (i.e. mark particular placement groups as
-prioritized during backfill or recovery), execute the following::
-
- ceph pg force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...]
- ceph pg force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...]
-
-This will cause Ceph to perform recovery or backfill on specified placement
-groups first, before other placement groups. This does not interrupt currently
-ongoing backfills or recovery, but causes specified PGs to be processed
-as soon as possible. If you change your mind or prioritize wrong groups,
-use::
-
- ceph pg cancel-force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...]
- ceph pg cancel-force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...]
-
-This will remove the "force" flag from those PGs and they will be processed
-in the default order. Again, this doesn't affect placement groups currently
-being processed, only those that are still queued.
-
-The "force" flag is cleared automatically after recovery or backfill of the
-group is done.
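-
-For example, to prioritize recovery of two placement groups (the IDs below are
-hypothetical) and then change your mind about the second one::
-
- ceph pg force-recovery 1.6c 1.6d
- ceph pg cancel-force-recovery 1.6d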
-
-Revert Lost
-===========
-
-If the cluster has lost one or more objects, and you have decided to
-abandon the search for the lost data, you must mark the unfound objects
-as ``lost``.
-
-If all possible locations have been queried and objects are still
-lost, you may have to give up on the lost objects. This is
-possible given unusual combinations of failures that allow the cluster
-to learn about writes that were performed before the writes themselves
-are recovered.
-
-Currently the only supported option is "revert", which will either roll back to
-a previous version of the object or (if it was a new object) forget about it
-entirely. To mark the "unfound" objects as "lost", execute the following::
-
- ceph pg {pg-id} mark_unfound_lost revert|delete
-
-.. important:: Use this feature with caution, because it may confuse
- applications that expect the object(s) to exist.
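-
-For example, assuming the placement group ``1.6c`` used in the earlier examples
-had unfound objects that you decided to give up on::
-
- ceph pg 1.6c mark_unfound_lost revert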
-
-
-.. toctree::
- :hidden:
-
- pg-states
- pg-concepts
-
-
-.. _Create a Pool: ../pools#createpool
-.. _Mapping PGs to OSDs: ../../../architecture#mapping-pgs-to-osds
-.. _pgcalc: http://ceph.com/pgcalc/
diff --git a/src/ceph/doc/rados/operations/pools.rst b/src/ceph/doc/rados/operations/pools.rst
deleted file mode 100644
index 7015593..0000000
--- a/src/ceph/doc/rados/operations/pools.rst
+++ /dev/null
@@ -1,798 +0,0 @@
-=======
- Pools
-=======
-
-When you first deploy a cluster without creating a pool, Ceph uses the default
-pools for storing data. A pool provides you with:
-
-- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
- For replicated pools, it is the desired number of copies/replicas of an object.
- A typical configuration stores an object and one additional copy
- (i.e., ``size = 2``), but you can determine the number of copies/replicas.
- For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
- (i.e. ``m=2`` in the **erasure code profile**)
-
-- **Placement Groups**: You can set the number of placement groups for the pool.
- A typical configuration uses approximately 100 placement groups per OSD to
- provide optimal balancing without using up too many computing resources. When
- setting up multiple pools, be careful to ensure you set a reasonable number of
- placement groups for both the pool and the cluster as a whole.
-
-- **CRUSH Rules**: When you store data in a pool, a CRUSH ruleset mapped to the
- pool enables CRUSH to identify a rule for the placement of the object
- and its replicas (or chunks for erasure coded pools) in your cluster.
- You can create a custom CRUSH rule for your pool.
-
-- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
- you effectively take a snapshot of a particular pool.
-
-To organize data into pools, you can list, create, and remove pools.
-You can also view the utilization statistics for each pool.
-
-List Pools
-==========
-
-To list your cluster's pools, execute::
-
- ceph osd lspools
-
-On a freshly installed cluster, only the ``rbd`` pool exists.
-
-
-.. _createpool:
-
-Create a Pool
-=============
-
-Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_.
-Ideally, you should override the default value for the number of placement
-groups in your Ceph configuration file, as the default is NOT ideal.
-For details on placement group numbers, refer to `setting the number of placement groups`_.
-
-.. note:: Starting with Luminous, all pools need to be associated to the
- application using the pool. See `Associate Pool to Application`_ below for
- more information.
-
-For example::
-
- osd pool default pg num = 100
- osd pool default pgp num = 100
-
-To create a pool, execute::
-
- ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
- [crush-rule-name] [expected-num-objects]
- ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
- [erasure-code-profile] [crush-rule-name] [expected_num_objects]
-
-Where:
-
-``{pool-name}``
-
-:Description: The name of the pool. It must be unique.
-:Type: String
-:Required: Yes.
-
-``{pg-num}``
-
-:Description: The total number of placement groups for the pool. See `Placement
- Groups`_ for details on calculating a suitable number. The
- default value ``8`` is NOT suitable for most systems.
-
-:Type: Integer
-:Required: Yes.
-:Default: 8
-
-``{pgp-num}``
-
-:Description: The total number of placement groups for placement purposes. This
- **should be equal to the total number of placement groups**, except
- for placement group splitting scenarios.
-
-:Type: Integer
-:Required: Yes. Picks up default or Ceph configuration value if not specified.
-:Default: 8
-
-``{replicated|erasure}``
-
-:Description: The pool type which may either be **replicated** to
- recover from lost OSDs by keeping multiple copies of the
- objects or **erasure** to get a kind of
- `generalized RAID5 <../erasure-code>`_ capability.
- The **replicated** pools require more
- raw storage but implement all Ceph operations. The
- **erasure** pools require less raw storage but only
- implement a subset of the available operations.
-
-:Type: String
-:Required: No.
-:Default: replicated
-
-``[crush-rule-name]``
-
-:Description: The name of a CRUSH rule to use for this pool. The specified
- rule must exist.
-
-:Type: String
-:Required: No.
-:Default: For **replicated** pools it is the ruleset specified by the ``osd
- pool default crush replicated ruleset`` config variable. This
- ruleset must exist.
- For **erasure** pools it is ``erasure-code`` if the ``default``
- `erasure code profile`_ is used or ``{pool-name}`` otherwise. This
- ruleset will be created implicitly if it doesn't exist already.
-
-
-``[erasure-code-profile=profile]``
-
-.. _erasure code profile: ../erasure-code-profile
-
-:Description: For **erasure** pools only. Use the `erasure code profile`_. It
- must be an existing profile as defined by
- **osd erasure-code-profile set**.
-
-:Type: String
-:Required: No.
-
-When you create a pool, set the number of placement groups to a reasonable value
-(e.g., ``100``). Consider the total number of placement groups per OSD too.
-Placement groups are computationally expensive, so performance will degrade when
-you have many pools with many placement groups (e.g., 50 pools with 100
-placement groups each). The point of diminishing returns depends upon the power
-of the OSD host.
-
-See `Placement Groups`_ for details on calculating an appropriate number of
-placement groups for your pool.
-
-.. _Placement Groups: ../placement-groups
-
-``[expected-num-objects]``
-
-:Description: The expected number of objects for this pool. By setting this value
- (together with a negative **filestore merge threshold**), the PG folder
- splitting happens at pool creation time, to avoid the latency
- impact of doing a runtime folder splitting.
-
-:Type: Integer
-:Required: No.
-:Default: 0, no splitting at the pool creation time.
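-
-Putting the above together, a minimal sketch of creating one replicated pool and
-one erasure coded pool might look like the following (the pool names and PG
-counts are illustrative, and the profile ``myprofile`` is assumed to have been
-created beforehand with ``osd erasure-code-profile set``)::
-
- ceph osd pool create mypool 128 128 replicated
- ceph osd pool create ecpool 128 128 erasure myprofile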
-
-Associate Pool to Application
-=============================
-
-Pools need to be associated with an application before use. Pools that will be
-used with CephFS or pools that are automatically created by RGW are
-automatically associated. Pools that are intended for use with RBD should be
-initialized using the ``rbd`` tool (see `Block Device Commands`_ for more
-information).
-
-For other cases, you can manually associate a free-form application name to
-a pool::
-
- ceph osd pool application enable {pool-name} {application-name}
-
-.. note:: CephFS uses the application name ``cephfs``, RBD uses the
- application name ``rbd``, and RGW uses the application name ``rgw``.
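-
-For example, to tag a hypothetical ``librados``-backed pool with a free-form
-application name of your own choosing::
-
- ceph osd pool application enable mypool myapp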
-
-Set Pool Quotas
-===============
-
-You can set pool quotas for the maximum number of bytes and/or the maximum
-number of objects per pool. ::
-
- ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
-
-For example::
-
- ceph osd pool set-quota data max_objects 10000
-
-To remove a quota, set its value to ``0``.
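-
-For example, to remove the object quota set above and set a byte quota instead
-(the 10GB value is illustrative)::
-
- ceph osd pool set-quota data max_objects 0
- ceph osd pool set-quota data max_bytes 10737418240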
-
-
-Delete a Pool
-=============
-
-To delete a pool, execute::
-
- ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
-
-
-To remove a pool, the ``mon_allow_pool_delete`` flag must be set to true in the
-Monitor's configuration. Otherwise, the monitors will refuse to remove the pool.
-
-See `Monitor Configuration`_ for more information.
-
-.. _Monitor Configuration: ../../configuration/mon-config-ref
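-
-As a sketch, with ``mon allow pool delete = true`` set in the ``[mon]`` section
-of ``ceph.conf`` (or injected at runtime), a hypothetical pool named
-``testpool`` could be deleted as follows; note that the pool name must be given
-twice::
-
- ceph osd pool delete testpool testpool --yes-i-really-really-mean-it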
-
-If you created your own rulesets and rules for a pool you created, you should
-consider removing them when you no longer need your pool::
-
- ceph osd pool get {pool-name} crush_ruleset
-
-If the ruleset was "123", for example, you can check the other pools like so::
-
- ceph osd dump | grep "^pool" | grep "crush_ruleset 123"
-
-If no other pools use that custom ruleset, then it's safe to delete that
-ruleset from the cluster.
-
-If you created users with permissions strictly for a pool that no longer
-exists, you should consider deleting those users too::
-
- ceph auth ls | grep -C 5 {pool-name}
- ceph auth del {user}
-
-
-Rename a Pool
-=============
-
-To rename a pool, execute::
-
- ceph osd pool rename {current-pool-name} {new-pool-name}
-
-If you rename a pool and you have per-pool capabilities for an authenticated
-user, you must update the user's capabilities (i.e., caps) with the new pool
-name.
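-
-For example, renaming a hypothetical pool named ``liverpool`` and then updating
-a user whose capabilities reference the old name::
-
- ceph osd pool rename liverpool liverpool-new
- ceph auth caps client.john mon 'allow r' osd 'allow rw pool=liverpool-new'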
-
-.. note:: Version ``0.48`` Argonaut and above.
-
-Show Pool Statistics
-====================
-
-To show a pool's utilization statistics, execute::
-
- rados df
-
-
-Make a Snapshot of a Pool
-=========================
-
-To make a snapshot of a pool, execute::
-
- ceph osd pool mksnap {pool-name} {snap-name}
-
-.. note:: Version ``0.48`` Argonaut and above.
-
-
-Remove a Snapshot of a Pool
-===========================
-
-To remove a snapshot of a pool, execute::
-
- ceph osd pool rmsnap {pool-name} {snap-name}
-
-.. note:: Version ``0.48`` Argonaut and above.
-
-.. _setpoolvalues:
-
-
-Set Pool Values
-===============
-
-To set a value to a pool, execute the following::
-
- ceph osd pool set {pool-name} {key} {value}
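-
-For example, to turn on aggressive inline compression for a hypothetical pool
-named ``mypool``::
-
- ceph osd pool set mypool compression_mode aggressive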
-
-You may set values for the following keys:
-
-.. _compression_algorithm:
-
-``compression_algorithm``
-
-:Description: Sets inline compression algorithm to use for underlying BlueStore.
- This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression algorithm``.
-
-:Type: String
-:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``
-
-``compression_mode``
-
-:Description: Sets the policy for the inline compression algorithm for underlying BlueStore.
- This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression mode``.
-
-:Type: String
-:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``
-
-``compression_min_blob_size``
-
-:Description: Chunks smaller than this are never compressed.
- This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression min blob *``.
-
-:Type: Unsigned Integer
-
-``compression_max_blob_size``
-
-:Description: Chunks larger than this are broken into smaller blobs of at most
- ``compression_max_blob_size`` before being compressed.
-
-:Type: Unsigned Integer
-
-.. _size:
-
-``size``
-
-:Description: Sets the number of replicas for objects in the pool.
- See `Set the Number of Object Replicas`_ for further details.
- Replicated pools only.
-
-:Type: Integer
-
-.. _min_size:
-
-``min_size``
-
-:Description: Sets the minimum number of replicas required for I/O.
- See `Set the Number of Object Replicas`_ for further details.
- Replicated pools only.
-
-:Type: Integer
-:Version: ``0.54`` and above
-
-.. _pg_num:
-
-``pg_num``
-
-:Description: The effective number of placement groups to use when calculating
- data placement.
-:Type: Integer
-:Valid Range: Greater than the current value of ``pg_num``.
-
-.. _pgp_num:
-
-``pgp_num``
-
-:Description: The effective number of placement groups for placement to use
- when calculating data placement.
-
-:Type: Integer
-:Valid Range: Equal to or less than ``pg_num``.
-
-.. _crush_ruleset:
-
-``crush_ruleset``
-
-:Description: The ruleset to use for mapping object placement in the cluster.
-:Type: Integer
-
-.. _allow_ec_overwrites:
-
-``allow_ec_overwrites``
-
-:Description: Whether writes to an erasure coded pool can update part
- of an object, so cephfs and rbd can use it. See
- `Erasure Coding with Overwrites`_ for more details.
-:Type: Boolean
-:Version: ``12.2.0`` and above
-
-.. _hashpspool:
-
-``hashpspool``
-
-:Description: Set/Unset HASHPSPOOL flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-:Version: Version ``0.48`` Argonaut and above.
-
-.. _nodelete:
-
-``nodelete``
-
-:Description: Set/Unset NODELETE flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-:Version: Version ``FIXME``
-
-.. _nopgchange:
-
-``nopgchange``
-
-:Description: Set/Unset NOPGCHANGE flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-:Version: Version ``FIXME``
-
-.. _nosizechange:
-
-``nosizechange``
-
-:Description: Set/Unset NOSIZECHANGE flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-:Version: Version ``FIXME``
-
-.. _write_fadvise_dontneed:
-
-``write_fadvise_dontneed``
-
-:Description: Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-
-.. _noscrub:
-
-``noscrub``
-
-:Description: Set/Unset NOSCRUB flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-
-.. _nodeep-scrub:
-
-``nodeep-scrub``
-
-:Description: Set/Unset NODEEP_SCRUB flag on a given pool.
-:Type: Integer
-:Valid Range: 1 sets flag, 0 unsets flag
-
-.. _hit_set_type:
-
-``hit_set_type``
-
-:Description: Enables hit set tracking for cache pools.
- See `Bloom Filter`_ for additional information.
-
-:Type: String
-:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
-:Default: ``bloom``. Other values are for testing.
-
-.. _hit_set_count:
-
-``hit_set_count``
-
-:Description: The number of hit sets to store for cache pools. The higher
- the number, the more RAM consumed by the ``ceph-osd`` daemon.
-
-:Type: Integer
-:Valid Range: ``1``. Agent doesn't handle > 1 yet.
-
-.. _hit_set_period:
-
-``hit_set_period``
-
-:Description: The duration of a hit set period in seconds for cache pools.
- The higher the number, the more RAM consumed by the
- ``ceph-osd`` daemon.
-
-:Type: Integer
-:Example: ``3600`` 1hr
-
-.. _hit_set_fpp:
-
-``hit_set_fpp``
-
-:Description: The false positive probability for the ``bloom`` hit set type.
- See `Bloom Filter`_ for additional information.
-
-:Type: Double
-:Valid Range: 0.0 - 1.0
-:Default: ``0.05``
-
-.. _cache_target_dirty_ratio:
-
-``cache_target_dirty_ratio``
-
-:Description: The percentage of the cache pool containing modified (dirty)
- objects before the cache tiering agent will flush them to the
- backing storage pool.
-
-:Type: Double
-:Default: ``.4``
-
-.. _cache_target_dirty_high_ratio:
-
-``cache_target_dirty_high_ratio``
-
-:Description: The percentage of the cache pool containing modified (dirty)
- objects before the cache tiering agent will flush them to the
- backing storage pool with a higher speed.
-
-:Type: Double
-:Default: ``.6``
-
-.. _cache_target_full_ratio:
-
-``cache_target_full_ratio``
-
-:Description: The percentage of the cache pool containing unmodified (clean)
- objects before the cache tiering agent will evict them from the
- cache pool.
-
-:Type: Double
-:Default: ``.8``
-
-.. _target_max_bytes:
-
-``target_max_bytes``
-
-:Description: Ceph will begin flushing or evicting objects when the
- ``max_bytes`` threshold is triggered.
-
-:Type: Integer
-:Example: ``1000000000000`` #1-TB
-
-.. _target_max_objects:
-
-``target_max_objects``
-
-:Description: Ceph will begin flushing or evicting objects when the
- ``max_objects`` threshold is triggered.
-
-:Type: Integer
-:Example: ``1000000`` #1M objects
-
-
-``hit_set_grade_decay_rate``
-
-:Description: Temperature decay rate between two successive hit_sets
-:Type: Integer
-:Valid Range: 0 - 100
-:Default: ``20``
-
-
-``hit_set_search_last_n``
-
-:Description: Count at most N appearances in hit_sets for temperature calculation
-:Type: Integer
-:Valid Range: 0 - hit_set_count
-:Default: ``1``
-
-
-.. _cache_min_flush_age:
-
-``cache_min_flush_age``
-
-:Description: The time (in seconds) before the cache tiering agent will flush
- an object from the cache pool to the storage pool.
-
-:Type: Integer
-:Example: ``600`` 10min
-
-.. _cache_min_evict_age:
-
-``cache_min_evict_age``
-
-:Description: The time (in seconds) before the cache tiering agent will evict
- an object from the cache pool.
-
-:Type: Integer
-:Example: ``1800`` 30min
-
-.. _fast_read:
-
-``fast_read``
-
-:Description: On an erasure coded pool, if this flag is turned on, the read request
- issues sub-reads to all shards, and waits until it receives enough
- shards to decode before serving the client. In the case of the jerasure
- and isa erasure plugins, once the first K replies return, the client's
- request is served immediately using the data decoded from these replies.
- This trades some resources for better performance. Currently this
- flag is only supported for erasure coded pools.
-
-:Type: Boolean
-:Default: ``0``
-
-.. _scrub_min_interval:
-
-``scrub_min_interval``
-
-:Description: The minimum interval in seconds for pool scrubbing when
- load is low. If it is 0, the value osd_scrub_min_interval
- from config is used.
-
-:Type: Double
-:Default: ``0``
-
-.. _scrub_max_interval:
-
-``scrub_max_interval``
-
-:Description: The maximum interval in seconds for pool scrubbing
- irrespective of cluster load. If it is 0, the value
- osd_scrub_max_interval from config is used.
-
-:Type: Double
-:Default: ``0``
-
-.. _deep_scrub_interval:
-
-``deep_scrub_interval``
-
-:Description: The interval in seconds for pool “deep” scrubbing. If it
- is 0, the value osd_deep_scrub_interval from config is used.
-
-:Type: Double
-:Default: ``0``
-
-
-Get Pool Values
-===============
-
-To get a value from a pool, execute the following::
-
- ceph osd pool get {pool-name} {key}
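-
-For example, to read back the replica count of a hypothetical pool named
-``mypool``::
-
- ceph osd pool get mypool size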
-
-You may get values for the following keys:
-
-``size``
-
-:Description: see size_
-
-:Type: Integer
-
-``min_size``
-
-:Description: see min_size_
-
-:Type: Integer
-:Version: ``0.54`` and above
-
-``pg_num``
-
-:Description: see pg_num_
-
-:Type: Integer
-
-
-``pgp_num``
-
-:Description: see pgp_num_
-
-:Type: Integer
-:Valid Range: Equal to or less than ``pg_num``.
-
-
-``crush_ruleset``
-
-:Description: see crush_ruleset_
-
-
-``hit_set_type``
-
-:Description: see hit_set_type_
-
-:Type: String
-:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object``
-
-``hit_set_count``
-
-:Description: see hit_set_count_
-
-:Type: Integer
-
-
-``hit_set_period``
-
-:Description: see hit_set_period_
-
-:Type: Integer
-
-
-``hit_set_fpp``
-
-:Description: see hit_set_fpp_
-
-:Type: Double
-
-
-``cache_target_dirty_ratio``
-
-:Description: see cache_target_dirty_ratio_
-
-:Type: Double
-
-
-``cache_target_dirty_high_ratio``
-
-:Description: see cache_target_dirty_high_ratio_
-
-:Type: Double
-
-
-``cache_target_full_ratio``
-
-:Description: see cache_target_full_ratio_
-
-:Type: Double
-
-
-``target_max_bytes``
-
-:Description: see target_max_bytes_
-
-:Type: Integer
-
-
-``target_max_objects``
-
-:Description: see target_max_objects_
-
-:Type: Integer
-
-
-``cache_min_flush_age``
-
-:Description: see cache_min_flush_age_
-
-:Type: Integer
-
-
-``cache_min_evict_age``
-
-:Description: see cache_min_evict_age_
-
-:Type: Integer
-
-
-``fast_read``
-
-:Description: see fast_read_
-
-:Type: Boolean
-
-
-``scrub_min_interval``
-
-:Description: see scrub_min_interval_
-
-:Type: Double
-
-
-``scrub_max_interval``
-
-:Description: see scrub_max_interval_
-
-:Type: Double
-
-
-``deep_scrub_interval``
-
-:Description: see deep_scrub_interval_
-
-:Type: Double
-
-
-Set the Number of Object Replicas
-=================================
-
-To set the number of object replicas on a replicated pool, execute the following::
-
- ceph osd pool set {poolname} size {num-replicas}
-
-.. important:: The ``{num-replicas}`` includes the object itself.
- If you want the object and two copies of the object for a total of
- three instances of the object, specify ``3``.
-
-For example::
-
- ceph osd pool set data size 3
-
-You may execute this command for each pool. **Note:** An object might accept
-I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum
-number of required replicas for I/O, you should use the ``min_size`` setting.
-For example::
-
- ceph osd pool set data min_size 2
-
-This ensures that no object in the data pool will receive I/O with fewer than
-``min_size`` replicas.
-
-
-Get the Number of Object Replicas
-=================================
-
-To get the number of object replicas, execute the following::
-
- ceph osd dump | grep 'replicated size'
-
-Ceph will list the pools, with the ``replicated size`` attribute highlighted.
-By default, ceph creates two replicas of an object (a total of three copies, or
-a size of 3).
-
-
-
-.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
-.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
-.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
-.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
-.. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool
-
diff --git a/src/ceph/doc/rados/operations/upmap.rst b/src/ceph/doc/rados/operations/upmap.rst
deleted file mode 100644
index 58f6322..0000000
--- a/src/ceph/doc/rados/operations/upmap.rst
+++ /dev/null
@@ -1,75 +0,0 @@
-Using the pg-upmap
-==================
-
-Starting in Luminous v12.2.z there is a new *pg-upmap* exception table
-in the OSDMap that allows the cluster to explicitly map specific PGs to
-specific OSDs. This allows the cluster to fine-tune the data
-distribution to, in most cases, perfectly distribute PGs across OSDs.
-
-The key caveat to this new mechanism is that it requires that all
-clients understand the new *pg-upmap* structure in the OSDMap.
-
-Enabling
---------
-
-To allow use of the feature, you must tell the cluster that it only
-needs to support luminous (and newer) clients with::
-
- ceph osd set-require-min-compat-client luminous
-
-This command will fail if any pre-luminous clients or daemons are
-connected to the monitors. You can see what client versions are in
-use with::
-
- ceph features
-
-A word of caution
------------------
-
-This is a new feature and not very user friendly. At the time of this
-writing we are working on a new `balancer` module for ceph-mgr that
-will eventually do all of this automatically.
-
-Until then, you can optimize the PG distribution by hand using the offline
-procedure described below.
-
-Offline optimization
---------------------
-
-Upmap entries are updated with an offline optimizer built into ``osdmaptool``.
-
-#. Grab the latest copy of your osdmap::
-
- ceph osd getmap -o om
-
-#. Run the optimizer::
-
- osdmaptool om --upmap out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]
-
- It is highly recommended that optimization be done for each pool
- individually, or for sets of similarly-utilized pools. You can
- specify the ``--upmap-pool`` option multiple times. "Similar pools"
- means pools that are mapped to the same devices and store the same
- kind of data (e.g., RBD image pools, yes; RGW index pool and RGW
- data pool, no).
-
- The ``max-count`` value is the maximum number of upmap entries to
- identify in the run. The default is 100, but you may want to make
- this a smaller number so that the tool completes more quickly (but
- does less work). If it cannot find any additional changes to make,
- it will stop early (i.e., when the pool distribution is perfect).
-
- The ``max-deviation`` value defaults to `.01` (i.e., 1%). If an OSD's
- utilization varies from the average by less than this amount, it
- will be considered perfect.
-
-#. The proposed changes are written to the output file ``out.txt`` in
- the example above. These are normal ceph CLI commands that can be
- run to apply the changes to the cluster. This can be done with::
-
- source out.txt
-
-The above steps can be repeated as many times as necessary to achieve
-a perfect distribution of PGs for each set of pools.
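-
-Putting the steps together, a minimal sketch for optimizing a single
-hypothetical pool named ``rbd``, limited to 10 upmap entries per run, might
-look like::
-
- ceph osd getmap -o om
- osdmaptool om --upmap out.txt --upmap-pool rbd --upmap-max 10
- source out.txt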
-
-You can see some (gory) details about what the tool is doing by
-passing ``--debug-osd 10`` to ``osdmaptool``.
diff --git a/src/ceph/doc/rados/operations/user-management.rst b/src/ceph/doc/rados/operations/user-management.rst
deleted file mode 100644
index 8a35a50..0000000
--- a/src/ceph/doc/rados/operations/user-management.rst
+++ /dev/null
@@ -1,665 +0,0 @@
-=================
- User Management
-=================
-
-This document describes :term:`Ceph Client` users, and their authentication and
-authorization with the :term:`Ceph Storage Cluster`. Users are either
-individuals or system actors such as applications, which use Ceph clients to
-interact with the Ceph Storage Cluster daemons.
-
-.. ditaa:: +-----+
- | {o} |
- | |
- +--+--+ /---------\ /---------\
- | | Ceph | | Ceph |
- ---+---*----->| |<------------->| |
- | uses | Clients | | Servers |
- | \---------/ \---------/
- /--+--\
- | |
- | |
- actor
-
-
-When Ceph runs with authentication and authorization enabled (enabled by
-default), you must specify a user name and a keyring containing the secret key
-of the specified user (usually via the command line). If you do not specify a
-user name, Ceph will use ``client.admin`` as the default user name. If you do
-not specify a keyring, Ceph will look for a keyring via the ``keyring`` setting
-in the Ceph configuration. For example, if you execute the ``ceph health``
-command without specifying a user or keyring::
-
- ceph health
-
-Ceph interprets the command like this::
-
- ceph -n client.admin --keyring=/etc/ceph/ceph.client.admin.keyring health
-
-Alternatively, you may use the ``CEPH_ARGS`` environment variable to avoid
-re-entry of the user name and secret.
-
-For details on configuring the Ceph Storage Cluster to use authentication,
-see `Cephx Config Reference`_. For details on the architecture of Cephx, see
-`Architecture - High Availability Authentication`_.
-
-
-Background
-==========
-
-Irrespective of the type of Ceph client (e.g., Block Device, Object Storage,
-Filesystem, native API, etc.), Ceph stores all data as objects within `pools`_.
-Ceph users must have access to pools in order to read and write data.
-Additionally, Ceph users must have execute permissions to use Ceph's
-administrative commands. The following concepts will help you understand Ceph
-user management.
-
-
-User
-----
-
-A user is either an individual or a system actor such as an application.
-Creating users allows you to control who (or what) can access your Ceph Storage
-Cluster, its pools, and the data within pools.
-
-Ceph has the notion of a ``type`` of user. For the purposes of user management,
-the type will always be ``client``. Ceph identifies users in period (.)
-delimited form consisting of the user type and the user ID: for example,
-``TYPE.ID``, ``client.admin``, or ``client.user1``. The reason for user typing
-is that Ceph Monitors, OSDs, and Metadata Servers also use the Cephx protocol,
-but they are not clients. Distinguishing the user type helps to distinguish
-between client users and other users--streamlining access control, user
-monitoring and traceability.
-
-Sometimes Ceph's user type may seem confusing, because the Ceph command line
-allows you to specify a user with or without the type, depending upon your
-command line usage. If you specify ``--user`` or ``--id``, you can omit the
-type. So ``client.user1`` can be entered simply as ``user1``. If you specify
-``--name`` or ``-n``, you must specify the type and name, such as
-``client.user1``. We recommend using the type and name as a best practice
-wherever possible.
-
-.. note:: A Ceph Storage Cluster user is not the same as a Ceph Object Storage
- user or a Ceph Filesystem user. The Ceph Object Gateway uses a Ceph Storage
- Cluster user to communicate between the gateway daemon and the storage
- cluster, but the gateway has its own user management functionality for end
- users. The Ceph Filesystem uses POSIX semantics. The user space associated
- with the Ceph Filesystem is not the same as a Ceph Storage Cluster user.
-
-
-
-Authorization (Capabilities)
-----------------------------
-
-Ceph uses the term "capabilities" (caps) to describe authorizing an
-authenticated user to exercise the functionality of the monitors, OSDs and
-metadata servers. Capabilities can also restrict access to data within a pool or
-a namespace within a pool. A Ceph administrative user sets a user's
-capabilities when creating or updating a user.
-
-Capability syntax follows the form::
-
- {daemon-type} '{capspec}[, {capspec} ...]'
-
-- **Monitor Caps:** Monitor capabilities include ``r``, ``w``, ``x`` access
- settings or ``profile {name}``. For example::
-
- mon 'allow rwx'
- mon 'profile osd'
-
-- **OSD Caps:** OSD capabilities include ``r``, ``w``, ``x``, ``class-read``,
- ``class-write`` access settings or ``profile {name}``. Additionally, OSD
- capabilities also allow for pool and namespace settings. ::
-
- osd 'allow {access} [pool={pool-name} [namespace={namespace-name}]]'
- osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]]'
-
-- **Metadata Server Caps:** For administrators, use ``allow *``. For all
- other users, such as CephFS clients, consult :doc:`/cephfs/client-auth`
-
-
-.. note:: The Ceph Object Gateway daemon (``radosgw``) is a client of the
- Ceph Storage Cluster, so it is not represented as a Ceph Storage
- Cluster daemon type.
-
-The following entries describe each capability.
-
-``allow``
-
-:Description: Precedes access settings for a daemon. Implies ``rw``
- for MDS only.
-
-
-``r``
-
-:Description: Gives the user read access. Required with monitors to retrieve
- the CRUSH map.
-
-
-``w``
-
-:Description: Gives the user write access to objects.
-
-
-``x``
-
-:Description: Gives the user the capability to call class methods
- (i.e., both read and write) and to conduct ``auth``
- operations on monitors.
-
-
-``class-read``
-
-:Description: Gives the user the capability to call class read methods.
- Subset of ``x``.
-
-
-``class-write``
-
-:Description: Gives the user the capability to call class write methods.
- Subset of ``x``.
-
-
-``*``
-
-:Description: Gives the user read, write and execute permissions for a
- particular daemon/pool, and the ability to execute
- admin commands.
-
-
-``profile osd`` (Monitor only)
-
-:Description: Gives a user permissions to connect as an OSD to other OSDs or
- monitors. Conferred on OSDs to enable OSDs to handle replication
- heartbeat traffic and status reporting.
-
-
-``profile mds`` (Monitor only)
-
-:Description: Gives a user permissions to connect as a MDS to other MDSs or
- monitors.
-
-
-``profile bootstrap-osd`` (Monitor only)
-
-:Description: Gives a user permissions to bootstrap an OSD. Conferred on
- deployment tools such as ``ceph-disk``, ``ceph-deploy``, etc.
- so that they have permissions to add keys, etc. when
- bootstrapping an OSD.
-
-
-``profile bootstrap-mds`` (Monitor only)
-
-:Description: Gives a user permissions to bootstrap a metadata server.
- Conferred on deployment tools such as ``ceph-deploy``, etc.
- so they have permissions to add keys, etc. when bootstrapping
- a metadata server.
-
-``profile rbd`` (Monitor and OSD)
-
-:Description: Gives a user permissions to manipulate RBD images. When used
- as a Monitor cap, it provides the minimal privileges required
- by an RBD client application. When used as an OSD cap, it
- provides read-write access to an RBD client application.
-
-``profile rbd-read-only`` (OSD only)
-
-:Description: Gives a user read-only permissions to an RBD image.
-
-
-Pool
-----
-
-A pool is a logical partition where users store data.
-In Ceph deployments, it is common to create a pool as a logical partition for
-similar types of data. For example, when deploying Ceph as a backend for
-OpenStack, a typical deployment would have pools for volumes, images, backups
-and virtual machines, and users such as ``client.glance``, ``client.cinder``,
-etc.
-
-
-Namespace
----------
-
-Objects within a pool can be associated to a namespace--a logical group of
-objects within the pool. A user's access to a pool can be associated with a
-namespace such that reads and writes by the user take place only within the
-namespace. Objects written to a namespace within the pool can only be accessed
-by users who have access to the namespace.
-
-.. note:: Namespaces are primarily useful for applications written on top of
- ``librados`` where the logical grouping can alleviate the need to create
- different pools. Ceph Object Gateway (from ``luminous``) uses namespaces for various
- metadata objects.
-
-The rationale for namespaces is that pools can be a computationally expensive
-method of segregating data sets for the purposes of authorizing separate sets
-of users. For example, a pool should have ~100 placement groups per OSD. So an
-exemplary cluster with 1000 OSDs would have 100,000 placement groups for one
-pool. Each pool would create another 100,000 placement groups in the exemplary
-cluster. By contrast, writing an object to a namespace simply associates the
-namespace to the object name without the computational overhead of a separate
-pool. Rather than creating a separate pool for a user or set of users, you may
-use a namespace. **Note:** Only available using ``librados`` at this time.
-
-
-Managing Users
-==============
-
-User management functionality provides Ceph Storage Cluster administrators with
-the ability to create, update and delete users directly in the Ceph Storage
-Cluster.
-
-When you create or delete users in the Ceph Storage Cluster, you may need to
-distribute keys to clients so that they can be added to keyrings. See `Keyring
-Management`_ for details.
-
-
-List Users
-----------
-
-To list the users in your cluster, execute the following::
-
- ceph auth ls
-
-Ceph will list out all users in your cluster. For example, in a two-node
-exemplary cluster, ``ceph auth ls`` will output something that looks like
-this::
-
- installed auth entries:
-
- osd.0
- key: AQCvCbtToC6MDhAATtuT70Sl+DymPCfDSsyV4w==
- caps: [mon] allow profile osd
- caps: [osd] allow *
- osd.1
- key: AQC4CbtTCFJBChAAVq5spj0ff4eHZICxIOVZeA==
- caps: [mon] allow profile osd
- caps: [osd] allow *
- client.admin
- key: AQBHCbtT6APDHhAA5W00cBchwkQjh3dkKsyPjw==
- caps: [mds] allow
- caps: [mon] allow *
- caps: [osd] allow *
- client.bootstrap-mds
- key: AQBICbtTOK9uGBAAdbe5zcIGHZL3T/u2g6EBww==
- caps: [mon] allow profile bootstrap-mds
- client.bootstrap-osd
- key: AQBHCbtT4GxqORAADE5u7RkpCN/oo4e5W0uBtw==
- caps: [mon] allow profile bootstrap-osd
-
-
-Note that the ``TYPE.ID`` notation for users applies such that ``osd.0`` is a
-user of type ``osd`` and its ID is ``0``, ``client.admin`` is a user of type
-``client`` and its ID is ``admin`` (i.e., the default ``client.admin`` user).
-Note also that each entry has a ``key: <value>`` entry, and one or more
-``caps:`` entries.
-
-You may use the ``-o {filename}`` option with ``ceph auth ls`` to
-save the output to a file.
-
-
-Get a User
-----------
-
-To retrieve a specific user, key and capabilities, execute the
-following::
-
- ceph auth get {TYPE.ID}
-
-For example::
-
- ceph auth get client.admin
-
-You may also use the ``-o {filename}`` option with ``ceph auth get`` to
-save the output to a file. Developers may also execute the following::
-
- ceph auth export {TYPE.ID}
-
-The ``auth export`` command is identical to ``auth get``, but also prints
-out the internal ``auid``, which is not relevant to end users.
-
-
-
-Add a User
-----------
-
-Adding a user creates a username (i.e., ``TYPE.ID``), a secret key and
-any capabilities included in the command you use to create the user.
-
-A user's key enables the user to authenticate with the Ceph Storage Cluster.
-The user's capabilities authorize the user to read, write, or execute on Ceph
-monitors (``mon``), Ceph OSDs (``osd``) or Ceph Metadata Servers (``mds``).
-
-There are a few ways to add a user:
-
-- ``ceph auth add``: This command is the canonical way to add a user. It
- will create the user, generate a key and add any specified capabilities.
-
-- ``ceph auth get-or-create``: This command is often the most convenient way
- to create a user, because it returns a keyfile format with the user name
- (in brackets) and the key. If the user already exists, this command
- simply returns the user name and key in the keyfile format. You may use the
- ``-o {filename}`` option to save the output to a file.
-
-- ``ceph auth get-or-create-key``: This command is a convenient way to create
- a user and return the user's key (only). This is useful for clients that
- need the key only (e.g., libvirt). If the user already exists, this command
- simply returns the key. You may use the ``-o {filename}`` option to save the
- output to a file.
-
-When creating client users, you may create a user with no capabilities. A user
-with no capabilities is useless beyond mere authentication, because the client
-cannot retrieve the cluster map from the monitor. However, you can create a
-user with no capabilities if you wish to defer adding capabilities later using
-the ``ceph auth caps`` command.
-
-A typical user has at least read capabilities on the Ceph monitor and
-read and write capability on Ceph OSDs. Additionally, a user's OSD permissions
-are often restricted to accessing a particular pool. ::
-
- ceph auth add client.john mon 'allow r' osd 'allow rw pool=liverpool'
- ceph auth get-or-create client.paul mon 'allow r' osd 'allow rw pool=liverpool'
- ceph auth get-or-create client.george mon 'allow r' osd 'allow rw pool=liverpool' -o george.keyring
- ceph auth get-or-create-key client.ringo mon 'allow r' osd 'allow rw pool=liverpool' -o ringo.key
-
-
-.. important:: If you provide a user with capabilities to OSDs, but you DO NOT
- restrict access to particular pools, the user will have access to ALL
- pools in the cluster!
-
-
-.. _modify-user-capabilities:
-
-Modify User Capabilities
-------------------------
-
-The ``ceph auth caps`` command allows you to specify a user and change the
-user's capabilities. Setting new capabilities will overwrite current capabilities.
-To view current capabilities run ``ceph auth get USERTYPE.USERID``. To add
-capabilities, you must also specify the existing capabilities when using the following form::
-
- ceph auth caps USERTYPE.USERID {daemon} 'allow [r|w|x|*|...] [pool={pool-name}] [namespace={namespace-name}]' [{daemon} 'allow [r|w|x|*|...] [pool={pool-name}] [namespace={namespace-name}]']
-
-For example::
-
- ceph auth get client.john
- ceph auth caps client.john mon 'allow r' osd 'allow rw pool=liverpool'
- ceph auth caps client.paul mon 'allow rw' osd 'allow rwx pool=liverpool'
- ceph auth caps client.brian-manager mon 'allow *' osd 'allow *'
-
-To remove a capability, you may reset the capability. If you want the user
-to have no access to a particular daemon that was previously set, specify
-an empty string. For example::
-
- ceph auth caps client.ringo mon ' ' osd ' '
-
-See `Authorization (Capabilities)`_ for additional details on capabilities.
-
-
-Delete a User
--------------
-
-To delete a user, use ``ceph auth del``::
-
- ceph auth del {TYPE}.{ID}
-
-Where ``{TYPE}`` is one of ``client``, ``osd``, ``mon``, or ``mds``,
-and ``{ID}`` is the user name or ID of the daemon.
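-
-For example, to delete the ``client.john`` user created earlier::
-
- ceph auth del client.john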
-
-
-Print a User's Key
-------------------
-
-To print a user's authentication key to standard output, execute the following::
-
- ceph auth print-key {TYPE}.{ID}
-
-Where ``{TYPE}`` is one of ``client``, ``osd``, ``mon``, or ``mds``,
-and ``{ID}`` is the user name or ID of the daemon.
-
-Printing a user's key is useful when you need to populate client
-software with a user's key (e.g., libvirt). ::
-
- mount -t ceph serverhost:/ mountpoint -o name=client.user,secret=`ceph auth print-key client.user`
-
-
-Import a User(s)
-----------------
-
-To import one or more users, use ``ceph auth import`` and
-specify a keyring::
-
- ceph auth import -i /path/to/keyring
-
-For example::
-
- sudo ceph auth import -i /etc/ceph/ceph.keyring
-
-
-.. note:: The ceph storage cluster will add new users, their keys and their
- capabilities and will update existing users, their keys and their
- capabilities.
-
-
-Keyring Management
-==================
-
-When you access Ceph via a Ceph client, the Ceph client will look for a local
-keyring. Ceph presets the ``keyring`` setting with the following four keyring
-names by default so you don't have to set them in your Ceph configuration file
-unless you want to override the defaults (not recommended):
-
-- ``/etc/ceph/$cluster.$name.keyring``
-- ``/etc/ceph/$cluster.keyring``
-- ``/etc/ceph/keyring``
-- ``/etc/ceph/keyring.bin``
-
-The ``$cluster`` metavariable is your Ceph cluster name as defined by the
-name of the Ceph configuration file (i.e., ``ceph.conf`` means the cluster name
-is ``ceph``; thus, ``ceph.keyring``). The ``$name`` metavariable is the user
-type and user ID (e.g., ``client.admin``; thus, ``ceph.client.admin.keyring``).
-
-.. note:: When executing commands that read or write to ``/etc/ceph``, you may
- need to use ``sudo`` to execute the command as ``root``.
-
-After you create a user (e.g., ``client.ringo``), you must get the key and add
-it to a keyring on a Ceph client so that the user can access the Ceph Storage
-Cluster.
-
-The `User Management`_ section details how to list, get, add, modify and delete
-users directly in the Ceph Storage Cluster. However, Ceph also provides the
-``ceph-authtool`` utility to allow you to manage keyrings from a Ceph client.
-
-
-Create a Keyring
-----------------
-
-When you use the procedures in the `Managing Users`_ section to create users,
-you need to provide user keys to the Ceph client(s) so that the Ceph client
-can retrieve the key for the specified user and authenticate with the Ceph
-Storage Cluster. Ceph Clients access keyrings to look up a user name and
-retrieve the user's key.
-
-The ``ceph-authtool`` utility allows you to create a keyring. To create an
-empty keyring, use ``--create-keyring`` or ``-C``. For example::
-
- ceph-authtool --create-keyring /path/to/keyring
-
-When creating a keyring with multiple users, we recommend using the cluster name
-(e.g., ``$cluster.keyring``) for the keyring filename and saving it in the
-``/etc/ceph`` directory so that the ``keyring`` configuration default setting
-will pick up the filename without requiring you to specify it in the local copy
-of your Ceph configuration file. For example, create ``ceph.keyring`` by
-executing the following::
-
- sudo ceph-authtool -C /etc/ceph/ceph.keyring
-
-When creating a keyring with a single user, we recommend using the cluster name,
-the user type and the user name and saving it in the ``/etc/ceph`` directory.
-For example, ``ceph.client.admin.keyring`` for the ``client.admin`` user.
-
-To create a keyring in ``/etc/ceph``, you must do so as ``root``. This means
-the file will have ``rw`` permissions for the ``root`` user only, which is
-appropriate when the keyring contains administrator keys. However, if you
-intend to use the keyring for a particular user or group of users, ensure
-that you execute ``chown`` or ``chmod`` to establish appropriate keyring
-ownership and access.
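-
-For example, a sketch of handing a single-user keyring over to a hypothetical
-non-root account named ``bob`` (the account name and mode are illustrative)::
-
- sudo chown bob:bob /etc/ceph/ceph.client.ringo.keyring
- sudo chmod 640 /etc/ceph/ceph.client.ringo.keyring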
-
-
-Add a User to a Keyring
------------------------
-
-When you `Add a User`_ to the Ceph Storage Cluster, you can use the `Get a
-User`_ procedure to retrieve a user, key and capabilities and save the user to a
-keyring.
-
-When you only want to use one user per keyring, the `Get a User`_ procedure with
-the ``-o`` option will save the output in the keyring file format. For example,
-to create a keyring for the ``client.admin`` user, execute the following::
-
- sudo ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring
-
-Notice that we use the recommended file format for an individual user.
-
-When you want to import users to a keyring, you can use ``ceph-authtool``
-to specify the destination keyring and the source keyring.
-For example::
-
- sudo ceph-authtool /etc/ceph/ceph.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
-
-
-Create a User
--------------
-
-Ceph provides the `Add a User`_ function to create a user directly in the Ceph
-Storage Cluster. However, you can also create a user, keys and capabilities
-directly on a Ceph client keyring. Then, you can import the user to the Ceph
-Storage Cluster. For example::
-
- sudo ceph-authtool -n client.ringo --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.keyring
-
-See `Authorization (Capabilities)`_ for additional details on capabilities.
-
-You can also create a keyring and add a new user to the keyring simultaneously.
-For example::
-
- sudo ceph-authtool -C /etc/ceph/ceph.keyring -n client.ringo --cap osd 'allow rwx' --cap mon 'allow rwx' --gen-key
-
-In the foregoing scenarios, the new user ``client.ringo`` is only in the
-keyring. To add the new user to the Ceph Storage Cluster, you must still add
-the new user to the Ceph Storage Cluster. ::
-
- sudo ceph auth add client.ringo -i /etc/ceph/ceph.keyring
-
-
-Modify a User
--------------
-
-To modify the capabilities of a user record in a keyring, specify the keyring,
-and the user followed by the capabilities. For example::
-
- sudo ceph-authtool /etc/ceph/ceph.keyring -n client.ringo --cap osd 'allow rwx' --cap mon 'allow rwx'
-
-To update the user to the Ceph Storage Cluster, you must update the user
-in the keyring to the user entry in the Ceph Storage Cluster. ::
-
- sudo ceph auth import -i /etc/ceph/ceph.keyring
-
-See `Import a User(s)`_ for details on updating a Ceph Storage Cluster user
-from a keyring.
-
-You may also `Modify User Capabilities`_ directly in the cluster, store the
-results to a keyring file; then, import the keyring into your main
-``ceph.keyring`` file.
-
-
-Command Line Usage
-==================
-
-Ceph supports the following usage for user name and secret:
-
-``--id`` | ``--user``
-
-:Description: Ceph identifies users with a type and an ID (e.g., ``TYPE.ID`` or
- ``client.admin``, ``client.user1``). The ``id``, ``name`` and
- ``-n`` options enable you to specify the ID portion of the user
- name (e.g., ``admin``, ``user1``, ``foo``, etc.). You can specify
- the user with the ``--id`` and omit the type. For example,
- to specify user ``client.foo`` enter the following::
-
- ceph --id foo --keyring /path/to/keyring health
- ceph --user foo --keyring /path/to/keyring health
-
-
-``--name`` | ``-n``
-
-:Description: Ceph identifies users with a type and an ID (e.g., ``TYPE.ID`` or
- ``client.admin``, ``client.user1``). The ``--name`` and ``-n``
- options enable you to specify the fully qualified user name.
- You must specify the user type (typically ``client``) with the
- user ID. For example::
-
- ceph --name client.foo --keyring /path/to/keyring health
- ceph -n client.foo --keyring /path/to/keyring health
-
-
-``--keyring``
-
-:Description: The path to the keyring containing one or more user names and
- secrets. The ``--secret`` option provides the same functionality,
- but it does not work with Ceph RADOS Gateway, which uses
- ``--secret`` for another purpose. You may retrieve a keyring with
- ``ceph auth get-or-create`` and store it locally. This is a
- preferred approach, because you can switch user names without
- switching the keyring path. For example::
-
- sudo rbd map --id foo --keyring /path/to/keyring mypool/myimage
-
-
-.. _pools: ../pools
-
-
-Limitations
-===========
-
-The ``cephx`` protocol authenticates Ceph clients and servers to each other. It
-is not intended to handle authentication of human users or application programs
-run on their behalf. If that functionality is required to handle your access control
-needs, you must have another mechanism, which is likely to be specific to the
-front end used to access the Ceph object store. This other mechanism has the
-role of ensuring that only acceptable users and programs are able to run on the
-machine that Ceph will permit to access its object store.
-
-The keys used to authenticate Ceph clients and servers are typically stored in
-a plain text file with appropriate permissions in a trusted host.
-
-.. important:: Storing keys in plaintext files has security shortcomings, but
- they are difficult to avoid, given the basic authentication methods Ceph
- uses in the background. Those setting up Ceph systems should be aware of
- these shortcomings.
-
-In particular, arbitrary user machines, especially portable machines, should not
-be configured to interact directly with Ceph, since that mode of use would
-require the storage of a plaintext authentication key on an insecure machine.
-Anyone who stole that machine or obtained surreptitious access to it could
-obtain the key that will allow them to authenticate their own machines to Ceph.
-
-Rather than permitting potentially insecure machines to access a Ceph object
-store directly, users should be required to sign in to a trusted machine in
-your environment using a method that provides sufficient security for your
-purposes. That trusted machine will store the plaintext Ceph keys for the
-human users. A future version of Ceph may address these particular
-authentication issues more fully.
-
-At the moment, none of the Ceph authentication protocols provide secrecy for
-messages in transit. Thus, an eavesdropper on the wire can hear and understand
-all data sent between clients and servers in Ceph, even if it cannot create or
-alter them. Further, Ceph does not include options to encrypt user data in the
-object store. Users can hand-encrypt and store their own data in the Ceph
-object store, of course, but Ceph provides no features to perform object
-encryption itself. Those storing sensitive data in Ceph should consider
-encrypting their data before providing it to the Ceph system.
-
-
-.. _Architecture - High Availability Authentication: ../../../architecture#high-availability-authentication
-.. _Cephx Config Reference: ../../configuration/auth-config-ref