Diffstat (limited to 'src/ceph/doc/rados/configuration')
-rw-r--r--  src/ceph/doc/rados/configuration/auth-config-ref.rst        432
-rw-r--r--  src/ceph/doc/rados/configuration/bluestore-config-ref.rst   297
-rw-r--r--  src/ceph/doc/rados/configuration/ceph-conf.rst              629
-rw-r--r--  src/ceph/doc/rados/configuration/demo-ceph.conf               31
-rw-r--r--  src/ceph/doc/rados/configuration/filestore-config-ref.rst   365
-rw-r--r--  src/ceph/doc/rados/configuration/general-config-ref.rst      66
-rw-r--r--  src/ceph/doc/rados/configuration/index.rst                    64
-rw-r--r--  src/ceph/doc/rados/configuration/journal-ref.rst             116
-rw-r--r--  src/ceph/doc/rados/configuration/mon-config-ref.rst         1222
-rw-r--r--  src/ceph/doc/rados/configuration/mon-lookup-dns.rst           51
-rw-r--r--  src/ceph/doc/rados/configuration/mon-osd-interaction.rst     408
-rw-r--r--  src/ceph/doc/rados/configuration/ms-ref.rst                  154
-rw-r--r--  src/ceph/doc/rados/configuration/network-config-ref.rst      494
-rw-r--r--  src/ceph/doc/rados/configuration/osd-config-ref.rst         1105
-rw-r--r--  src/ceph/doc/rados/configuration/pool-pg-config-ref.rst      270
-rw-r--r--  src/ceph/doc/rados/configuration/pool-pg.conf                  20
-rw-r--r--  src/ceph/doc/rados/configuration/storage-devices.rst          83
17 files changed, 5807 insertions, 0 deletions
diff --git a/src/ceph/doc/rados/configuration/auth-config-ref.rst b/src/ceph/doc/rados/configuration/auth-config-ref.rst
new file mode 100644
index 0000000..eb14fa4
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/auth-config-ref.rst
@@ -0,0 +1,432 @@
+========================
+ Cephx Config Reference
+========================
+
+The ``cephx`` protocol is enabled by default. Cryptographic authentication has
+some computational costs, though they should generally be quite low. If the
+network environment connecting your client and server hosts is very safe and
+you cannot afford the computational cost of authentication, you can turn it off. **This is not generally
+recommended**.
+
+.. note:: If you disable authentication, you are at risk of a man-in-the-middle
+ attack altering your client/server messages, which could lead to disastrous
+ security effects.
+
+For creating users, see `User Management`_. For details on the architecture
+of Cephx, see `Architecture - High Availability Authentication`_.
+
+
+Deployment Scenarios
+====================
+
+There are two main scenarios for deploying a Ceph cluster, which impact
+how you initially configure Cephx. Most first time Ceph users use
+``ceph-deploy`` to create a cluster (easiest). For clusters using
+other deployment tools (e.g., Chef, Juju, Puppet, etc.), you will need
+to use the manual procedures or configure your deployment tool to
+bootstrap your monitor(s).
+
+ceph-deploy
+-----------
+
+When you deploy a cluster with ``ceph-deploy``, you do not have to bootstrap the
+monitor manually or create the ``client.admin`` user or keyring. The steps you
+execute in the `Storage Cluster Quick Start`_ will invoke ``ceph-deploy`` to do
+that for you.
+
+When you execute ``ceph-deploy new {initial-monitor(s)}``, Ceph will create a
+monitor keyring for you (only used to bootstrap monitors), and it will generate
+an initial Ceph configuration file for you, which contains the following
+authentication settings, indicating that Ceph enables authentication by
+default::
+
+ auth_cluster_required = cephx
+ auth_service_required = cephx
+ auth_client_required = cephx
+
+When you execute ``ceph-deploy mon create-initial``, Ceph will bootstrap the
+initial monitor(s), retrieve a ``ceph.client.admin.keyring`` file containing the
+key for the ``client.admin`` user. Additionally, it will also retrieve keyrings
+that give ``ceph-deploy`` and ``ceph-disk`` utilities the ability to prepare and
+activate OSDs and metadata servers.
+
+When you execute ``ceph-deploy admin {node-name}`` (**note:** Ceph must be
+installed first), you are pushing a Ceph configuration file and the
+``ceph.client.admin.keyring`` to the ``/etc/ceph`` directory of the node. You
+will be able to execute Ceph administrative functions as ``root`` on the command
+line of that node.
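+
+As an illustration, the end-to-end ``ceph-deploy`` sequence described above
+might look like the following (``node1`` is a placeholder hostname)::
+
+ ceph-deploy new node1
+ ceph-deploy mon create-initial
+ ceph-deploy admin node1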
+
+
+Manual Deployment
+-----------------
+
+When you deploy a cluster manually, you have to bootstrap the monitor manually
+and create the ``client.admin`` user and keyring. To bootstrap monitors, follow
+the steps in `Monitor Bootstrapping`_. The steps for monitor bootstrapping are
+the logical steps you must perform when using third party deployment tools like
+Chef, Puppet, Juju, etc.
+
+
+Enabling/Disabling Cephx
+========================
+
+Enabling Cephx requires that you have deployed keys for your monitors,
+OSDs and metadata servers. If you are simply toggling Cephx on / off,
+you do not have to repeat the bootstrapping procedures.
+
+
+Enabling Cephx
+--------------
+
+When ``cephx`` is enabled, Ceph will look for the keyring in the default search
+path, which includes ``/etc/ceph/$cluster.$name.keyring``. You can override
+this location by adding a ``keyring`` option in the ``[global]`` section of
+your `Ceph configuration`_ file, but this is not recommended.
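+
+If you do override it, a minimal sketch of such an entry might look like this
+(the path shown is an example only)::
+
+ [global]
+ keyring = /etc/ceph/ceph.client.admin.keyring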
+
+Execute the following procedures to enable ``cephx`` on a cluster with
+authentication disabled. If you (or your deployment utility) have already
+generated the keys, you may skip the steps related to generating keys.
+
+#. Create a ``client.admin`` key, and save a copy of the key for your client
+ host::
+
+ ceph auth get-or-create client.admin mon 'allow *' mds 'allow *' osd 'allow *' -o /etc/ceph/ceph.client.admin.keyring
+
+ **Warning:** This will clobber any existing
+ ``/etc/ceph/ceph.client.admin.keyring`` file. Do not perform this step if a
+ deployment tool has already done it for you. Be careful!
+
+#. Create a keyring for your monitor cluster and generate a monitor
+ secret key. ::
+
+ ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
+
+#. Copy the monitor keyring into a ``ceph.mon.keyring`` file in every monitor's
+ ``mon data`` directory. For example, to copy it to ``mon.a`` in cluster ``ceph``,
+ use the following::
+
+ cp /tmp/ceph.mon.keyring /var/lib/ceph/mon/ceph-a/keyring
+
+#. Generate a secret key for every OSD, where ``{$id}`` is the OSD number::
+
+ ceph auth get-or-create osd.{$id} mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/osd/ceph-{$id}/keyring
+
+#. Generate a secret key for every MDS, where ``{$id}`` is the MDS letter::
+
+ ceph auth get-or-create mds.{$id} mon 'allow rwx' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mds/ceph-{$id}/keyring
+
+#. Enable ``cephx`` authentication by setting the following options in the
+ ``[global]`` section of your `Ceph configuration`_ file::
+
+ auth cluster required = cephx
+ auth service required = cephx
+ auth client required = cephx
+
+
+#. Start or restart the Ceph cluster. See `Operating a Cluster`_ for details.
+
+For details on bootstrapping a monitor manually, see `Manual Deployment`_.
+
+
+
+Disabling Cephx
+---------------
+
+The following procedure describes how to disable Cephx. If your cluster
+environment is relatively safe, you can avoid the computational expense of
+running authentication. **We do not recommend it.** However, it may be easier
+during setup and/or troubleshooting to temporarily disable authentication.
+
+#. Disable ``cephx`` authentication by setting the following options in the
+ ``[global]`` section of your `Ceph configuration`_ file::
+
+ auth cluster required = none
+ auth service required = none
+ auth client required = none
+
+
+#. Start or restart the Ceph cluster. See `Operating a Cluster`_ for details.
+
+
+Configuration Settings
+======================
+
+Enablement
+----------
+
+
+``auth cluster required``
+
+:Description: If enabled, the Ceph Storage Cluster daemons (i.e., ``ceph-mon``,
+ ``ceph-osd``, and ``ceph-mds``) must authenticate with
+ each other. Valid settings are ``cephx`` or ``none``.
+
+:Type: String
+:Required: No
+:Default: ``cephx``.
+
+
+``auth service required``
+
+:Description: If enabled, the Ceph Storage Cluster daemons require Ceph Clients
+ to authenticate with the Ceph Storage Cluster in order to access
+ Ceph services. Valid settings are ``cephx`` or ``none``.
+
+:Type: String
+:Required: No
+:Default: ``cephx``.
+
+
+``auth client required``
+
+:Description: If enabled, the Ceph Client requires the Ceph Storage Cluster to
+ authenticate with the Ceph Client. Valid settings are ``cephx``
+ or ``none``.
+
+:Type: String
+:Required: No
+:Default: ``cephx``.
+
+
+.. index:: keys; keyring
+
+Keys
+----
+
+When you run Ceph with authentication enabled, ``ceph`` administrative commands
+and Ceph Clients require authentication keys to access the Ceph Storage Cluster.
+
+The most common way to provide these keys to the ``ceph`` administrative
+commands and clients is to include a Ceph keyring under the ``/etc/ceph``
+directory. For Cuttlefish and later releases using ``ceph-deploy``, the filename
+is usually ``ceph.client.admin.keyring`` (or ``$cluster.client.admin.keyring``).
+If you include the keyring under the ``/etc/ceph`` directory, you don't need to
+specify a ``keyring`` entry in your Ceph configuration file.
+
+We recommend copying the Ceph Storage Cluster's keyring file to nodes where you
+will run administrative commands, because it contains the ``client.admin`` key.
+
+You may use ``ceph-deploy admin`` to perform this task. See `Create an Admin
+Host`_ for details. To perform this step manually, execute the following::
+
+ sudo scp {user}@{ceph-cluster-host}:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
+
+.. tip:: Ensure the ``ceph.keyring`` file has appropriate permissions set
+ (e.g., ``chmod 644``) on your client machine.
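+
+For example, on the client host (the path assumes the default cluster name)::
+
+ sudo chmod 644 /etc/ceph/ceph.client.admin.keyring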
+
+You may specify the key itself in the Ceph configuration file using the ``key``
+setting (not recommended), or a path to a keyfile using the ``keyfile`` setting.
+
+
+``keyring``
+
+:Description: The path to the keyring file.
+:Type: String
+:Required: No
+:Default: ``/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin``
+
+
+``keyfile``
+
+:Description: The path to a key file (i.e,. a file containing only the key).
+:Type: String
+:Required: No
+:Default: None
+
+
+``key``
+
+:Description: The key (i.e., the text string of the key itself). Not recommended.
+:Type: String
+:Required: No
+:Default: None
+
+
+Daemon Keyrings
+---------------
+
+Administrative users or deployment tools (e.g., ``ceph-deploy``) may generate
+daemon keyrings in the same way as generating user keyrings. By default, Ceph
+stores daemon keyrings inside their data directory. The default keyring
+locations, and the capabilities necessary for the daemon to function, are shown
+below.
+
+``ceph-mon``
+
+:Location: ``$mon_data/keyring``
+:Capabilities: ``mon 'allow *'``
+
+``ceph-osd``
+
+:Location: ``$osd_data/keyring``
+:Capabilities: ``mon 'allow profile osd' osd 'allow *'``
+
+``ceph-mds``
+
+:Location: ``$mds_data/keyring``
+:Capabilities: ``mds 'allow' mon 'allow profile mds' osd 'allow rwx'``
+
+``radosgw``
+
+:Location: ``$rgw_data/keyring``
+:Capabilities: ``mon 'allow rwx' osd 'allow rwx'``
+
+
+.. note:: The monitor keyring (i.e., ``mon.``) contains a key but no
+ capabilities, and is not part of the cluster ``auth`` database.
+
+The daemon data directory locations default to directories of the form::
+
+ /var/lib/ceph/$type/$cluster-$id
+
+For example, ``osd.12`` would be::
+
+ /var/lib/ceph/osd/ceph-12
+
+You can override these locations, but it is not recommended.
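+
+As a sketch, recreating the keyring for ``osd.12`` with the capabilities
+listed above (assuming the default ``ceph`` cluster name) might look like
+this::
+
+ ceph auth get-or-create osd.12 mon 'allow profile osd' osd 'allow *' -o /var/lib/ceph/osd/ceph-12/keyring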
+
+
+.. index:: signatures
+
+Signatures
+----------
+
+In Ceph Bobtail and subsequent versions, we prefer that Ceph authenticate all
+ongoing messages between the entities using the session key set up for that
+initial authentication. However, Argonaut and earlier Ceph daemons do not know
+how to perform ongoing message authentication. To maintain backward
+compatibility (e.g., running both Bobtail and Argonaut daemons in the same
+cluster), message signing is **off** by default. If you are running Bobtail or
+later daemons exclusively, configure Ceph to require signatures.
+
+Like other parts of Ceph authentication, Ceph provides fine-grained control so
+you can enable/disable signatures for service messages between the client and
+Ceph, and you can enable/disable signatures for messages between Ceph daemons.
+
+
+``cephx require signatures``
+
+:Description: If set to ``true``, Ceph requires signatures on all message
+ traffic between the Ceph Client and the Ceph Storage Cluster, and
+ between daemons comprising the Ceph Storage Cluster.
+
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``cephx cluster require signatures``
+
+:Description: If set to ``true``, Ceph requires signatures on all message
+ traffic between Ceph daemons comprising the Ceph Storage Cluster.
+
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``cephx service require signatures``
+
+:Description: If set to ``true``, Ceph requires signatures on all message
+ traffic between Ceph Clients and the Ceph Storage Cluster.
+
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``cephx sign messages``
+
+:Description: If the Ceph version supports message signing, Ceph will sign
+ all messages so they cannot be spoofed.
+
+:Type: Boolean
+:Default: ``true``
+
+
+Time to Live
+------------
+
+``auth service ticket ttl``
+
+:Description: When the Ceph Storage Cluster sends a Ceph Client a ticket for
+ authentication, the Ceph Storage Cluster assigns the ticket a
+ time to live.
+
+:Type: Double
+:Default: ``60*60``
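+
+The default corresponds to one hour (3600 seconds). As an illustration, a
+shorter ticket lifetime could be set in the ``[global]`` section (the value
+is an example only)::
+
+ auth service ticket ttl = 1800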
+
+
+Backward Compatibility
+======================
+
+For Cuttlefish and earlier releases, see `Cephx`_.
+
+In Ceph Argonaut v0.48 and earlier versions, if you enable ``cephx``
+authentication, Ceph only authenticates the initial communication between the
+client and daemon; Ceph does not authenticate the subsequent messages they send
+to each other, which has security implications. In Ceph Bobtail and subsequent
+versions, Ceph authenticates all ongoing messages between the entities using the
+session key set up for that initial authentication.
+
+We identified a backward compatibility issue between Argonaut v0.48 (and prior
+versions) and Bobtail (and subsequent versions). During testing, if you
+attempted to use Argonaut (and earlier) daemons with Bobtail (and later)
+daemons, the Argonaut daemons did not know how to perform ongoing message
+authentication, while the Bobtail versions of the daemons insist on
+authenticating message traffic subsequent to the initial
+request/response--making it impossible for Argonaut (and prior) daemons to
+interoperate with Bobtail (and subsequent) daemons.
+
+We have addressed this potential problem by providing a means for Argonaut (and
+prior) systems to interact with Bobtail (and subsequent) systems. Here's how it
+works: by default, the newer systems will not insist on seeing signatures from
+older systems that do not know how to perform them, but will simply accept such
+messages without authenticating them. This new default behavior provides the
+advantage of allowing two different releases to interact. **We do not recommend
+this as a long term solution**. Allowing newer daemons to forgo ongoing
+authentication has the unfortunate security effect that an attacker with control
+of some of your machines or some access to your network can disable session
+security simply by claiming to be unable to sign messages.
+
+.. note:: Even if you don't actually run any old versions of Ceph,
+ the attacker may be able to force some messages to be accepted unsigned in the
+ default scenario. While running Cephx with the default scenario, Ceph still
+ authenticates the initial communication, but you lose desirable session security.
+
+If you know that you are not running older versions of Ceph, or you are willing
+to accept that old servers and new servers will not be able to interoperate, you
+can eliminate this security risk. If you do so, any Ceph system that is new
+enough to support session authentication and that has Cephx enabled will reject
+unsigned messages. To preclude new servers from interacting with old servers,
+include the following in the ``[global]`` section of your `Ceph
+configuration`_ file directly below the line that specifies the use of Cephx
+for authentication::
+
+ cephx require signatures = true ; everywhere possible
+
+You can also selectively require signatures for cluster internal
+communications only, separate from client-facing service::
+
+ cephx cluster require signatures = true ; for cluster-internal communication
+ cephx service require signatures = true ; for client-facing service
+
+An option to make a client require signatures from the cluster is not
+yet implemented.
+
+**We recommend migrating all daemons to the newer versions and enabling the
+foregoing flag** at the nearest practical time so that you may avail yourself
+of the enhanced authentication.
+
+.. note:: Ceph kernel modules do not support signatures yet.
+
+
+.. _Storage Cluster Quick Start: ../../../start/quick-ceph-deploy/
+.. _Monitor Bootstrapping: ../../../install/manual-deployment#monitor-bootstrapping
+.. _Operating a Cluster: ../../operations/operating
+.. _Manual Deployment: ../../../install/manual-deployment
+.. _Cephx: http://docs.ceph.com/docs/cuttlefish/rados/configuration/auth-config-ref/
+.. _Ceph configuration: ../ceph-conf
+.. _Create an Admin Host: ../../deployment/ceph-deploy-admin
+.. _Architecture - High Availability Authentication: ../../../architecture#high-availability-authentication
+.. _User Management: ../../operations/user-management
diff --git a/src/ceph/doc/rados/configuration/bluestore-config-ref.rst b/src/ceph/doc/rados/configuration/bluestore-config-ref.rst
new file mode 100644
index 0000000..8d8ace6
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/bluestore-config-ref.rst
@@ -0,0 +1,297 @@
+==========================
+BlueStore Config Reference
+==========================
+
+Devices
+=======
+
+BlueStore manages either one, two, or (in certain cases) three storage
+devices.
+
+In the simplest case, BlueStore consumes a single (primary) storage
+device. The storage device is normally partitioned into two parts:
+
+#. A small partition is formatted with XFS and contains basic metadata
+ for the OSD. This *data directory* includes information about the
+ OSD (its identifier, which cluster it belongs to, and its private
+ keyring).
+
+#. The rest of the device is normally a large partition, managed directly
+ by BlueStore, that contains all of the actual data. This *primary
+ device* is normally identified by a ``block`` symlink in the data
+ directory.
+
+It is also possible to deploy BlueStore across two additional devices:
+
+* A *WAL device* can be used for BlueStore's internal journal or
+ write-ahead log. It is identified by the ``block.wal`` symlink in
+ the data directory. It is only useful to use a WAL device if the
+ device is faster than the primary device (e.g., when it is on an SSD
+ and the primary device is an HDD).
+* A *DB device* can be used for storing BlueStore's internal metadata.
+ BlueStore (or rather, the embedded RocksDB) will put as much
+ metadata as it can on the DB device to improve performance. If the
+ DB device fills up, metadata will spill back onto the primary device
+ (where it would have been otherwise). Again, it is only helpful to
+ provision a DB device if it is faster than the primary device.
+
+If there is only a small amount of fast storage available (e.g., less
+than a gigabyte), we recommend using it as a WAL device. If there is
+more, provisioning a DB device makes more sense. The BlueStore
+journal will always be placed on the fastest device available, so
+using a DB device will provide the same benefit that the WAL device
+would while *also* allowing additional metadata to be stored there (if
+it will fit).
+
+A single-device BlueStore OSD can be provisioned with::
+
+ ceph-disk prepare --bluestore <device>
+
+To specify a WAL device and/or DB device, ::
+
+ ceph-disk prepare --bluestore <device> --block.wal <wal-device> --block.db <db-device>
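+
+For example, a hypothetical layout that uses an HDD as the primary device and
+an SSD partition for the DB (the device names are placeholders)::
+
+ ceph-disk prepare --bluestore /dev/sdb --block.db /dev/sdc1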
+
+Cache size
+==========
+
+The amount of memory consumed by each OSD for BlueStore's cache is
+determined by the ``bluestore_cache_size`` configuration option. If
+that config option is not set (i.e., remains at 0), there is a
+different default value that is used depending on whether an HDD or
+SSD is used for the primary device (set by the
+``bluestore_cache_size_ssd`` and ``bluestore_cache_size_hdd`` config
+options).
+
+BlueStore and the rest of the Ceph OSD do the best they can currently
+to stick to the budgeted memory. Note that on top of the configured
+cache size, there is also memory consumed by the OSD itself, and
+generally some overhead due to memory fragmentation and other
+allocator overhead.
+
+The configured cache memory budget can be used in a few different ways:
+
+* Key/Value metadata (i.e., RocksDB's internal cache)
+* BlueStore metadata
+* BlueStore data (i.e., recently read or written object data)
+
+Cache memory usage is governed by the following options:
+``bluestore_cache_meta_ratio``, ``bluestore_cache_kv_ratio``, and
+``bluestore_cache_kv_max``. The fraction of the cache devoted to data
+is 1.0 minus the meta and kv ratios. The memory devoted to kv
+metadata (the RocksDB cache) is capped by ``bluestore_cache_kv_max``
+since our testing indicates there are diminishing returns beyond a
+certain point.
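+
+As a sketch, overriding the overall cache budget for all OSDs in ``ceph.conf``
+might look like this (the value is illustrative)::
+
+ [osd]
+ bluestore cache size = 2147483648 # 2 GB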
+
+``bluestore_cache_size``
+
+:Description: The amount of memory BlueStore will use for its cache. If zero, ``bluestore_cache_size_hdd`` or ``bluestore_cache_size_ssd`` will be used instead.
+:Type: Integer
+:Required: Yes
+:Default: ``0``
+
+``bluestore_cache_size_hdd``
+
+:Description: The default amount of memory BlueStore will use for its cache when backed by an HDD.
+:Type: Integer
+:Required: Yes
+:Default: ``1 * 1024 * 1024 * 1024`` (1 GB)
+
+``bluestore_cache_size_ssd``
+
+:Description: The default amount of memory BlueStore will use for its cache when backed by an SSD.
+:Type: Integer
+:Required: Yes
+:Default: ``3 * 1024 * 1024 * 1024`` (3 GB)
+
+``bluestore_cache_meta_ratio``
+
+:Description: The ratio of cache devoted to metadata.
+:Type: Floating point
+:Required: Yes
+:Default: ``.01``
+
+``bluestore_cache_kv_ratio``
+
+:Description: The ratio of cache devoted to key/value data (rocksdb).
+:Type: Floating point
+:Required: Yes
+:Default: ``.99``
+
+``bluestore_cache_kv_max``
+
+:Description: The maximum amount of cache devoted to key/value data (rocksdb).
+:Type: Floating point
+:Required: Yes
+:Default: ``512 * 1024 * 1024`` (512 MB)
+
+
+Checksums
+=========
+
+BlueStore checksums all metadata and data written to disk. Metadata
+checksumming is handled by RocksDB and uses `crc32c`. Data
+checksumming is done by BlueStore and can make use of `crc32c`,
+`xxhash32`, or `xxhash64`. The default is `crc32c` and should be
+suitable for most purposes.
+
+Full data checksumming does increase the amount of metadata that
+BlueStore must store and manage. When possible, e.g., when clients
+hint that data is written and read sequentially, BlueStore will
+checksum larger blocks, but in many cases it must store a checksum
+value (usually 4 bytes) for every 4 kilobyte block of data.
+
+It is possible to use a smaller checksum value by truncating the
+checksum to two or one byte, reducing the metadata overhead. The
+trade-off is that the probability that a random error will not be
+detected is higher with a smaller checksum, going from about one in
+four billion with a 32-bit (4 byte) checksum to one in 65,536 for a
+16-bit (2 byte) checksum or one in 256 for an 8-bit (1 byte) checksum.
+The smaller checksum values can be used by selecting `crc32c_16` or
+`crc32c_8` as the checksum algorithm.
+
+The *checksum algorithm* can be set either via a per-pool
+``csum_type`` property or the global config option. For example, ::
+
+ ceph osd pool set <pool-name> csum_type <algorithm>
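+
+For example, selecting the truncated 16-bit checksum on a hypothetical pool
+named ``testpool``::
+
+ ceph osd pool set testpool csum_type crc32c_16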
+
+``bluestore_csum_type``
+
+:Description: The default checksum algorithm to use.
+:Type: String
+:Required: Yes
+:Valid Settings: ``none``, ``crc32c``, ``crc32c_16``, ``crc32c_8``, ``xxhash32``, ``xxhash64``
+:Default: ``crc32c``
+
+
+Inline Compression
+==================
+
+BlueStore supports inline compression using `snappy`, `zlib`, or
+`lz4`. Please note that the `lz4` compression plugin is not
+distributed in the official release.
+
+Whether data in BlueStore is compressed is determined by a combination
+of the *compression mode* and any hints associated with a write
+operation. The modes are:
+
+* **none**: Never compress data.
+* **passive**: Do not compress data unless the write operation has a
+ *compressible* hint set.
+* **aggressive**: Compress data unless the write operation has an
+ *incompressible* hint set.
+* **force**: Try to compress data no matter what.
+
+For more information about the *compressible* and *incompressible* IO
+hints, see :doc:`/api/librados/#rados_set_alloc_hint`.
+
+Note that regardless of the mode, if the size of the data chunk is not
+reduced sufficiently it will not be used and the original
+(uncompressed) data will be stored. For example, if the ``bluestore
+compression required ratio`` is set to ``.7`` then the compressed data
+must be 70% of the size of the original (or smaller).
+
+The *compression mode*, *compression algorithm*, *compression required
+ratio*, *min blob size*, and *max blob size* can be set either via a
+per-pool property or a global config option. Pool properties can be
+set with::
+
+ ceph osd pool set <pool-name> compression_algorithm <algorithm>
+ ceph osd pool set <pool-name> compression_mode <mode>
+ ceph osd pool set <pool-name> compression_required_ratio <ratio>
+ ceph osd pool set <pool-name> compression_min_blob_size <size>
+ ceph osd pool set <pool-name> compression_max_blob_size <size>
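+
+As an illustration, enabling aggressive ``snappy`` compression on a
+hypothetical pool named ``testpool``::
+
+ ceph osd pool set testpool compression_algorithm snappy
+ ceph osd pool set testpool compression_mode aggressive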
+
+``bluestore compression algorithm``
+
+:Description: The default compressor to use (if any) if the per-pool property
+ ``compression_algorithm`` is not set. Note that zstd is *not*
+ recommended for bluestore due to high CPU overhead when
+ compressing small amounts of data.
+:Type: String
+:Required: No
+:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd``
+:Default: ``snappy``
+
+``bluestore compression mode``
+
+:Description: The default policy for using compression if the per-pool property
+ ``compression_mode`` is not set. ``none`` means never use
+ compression. ``passive`` means use compression when
+ `clients hint`_ that data is compressible. ``aggressive`` means
+ use compression unless clients hint that data is not compressible.
+ ``force`` means use compression under all circumstances even if
+ the clients hint that the data is not compressible.
+:Type: String
+:Required: No
+:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force``
+:Default: ``none``
+
+``bluestore compression required ratio``
+
+:Description: The ratio of the size of the data chunk after
+ compression relative to the original size must be at
+ least this small in order to store the compressed
+ version.
+
+:Type: Floating point
+:Required: No
+:Default: .875
+
+``bluestore compression min blob size``
+
+:Description: Chunks smaller than this are never compressed.
+ The per-pool property ``compression_min_blob_size`` overrides
+ this setting.
+
+:Type: Unsigned Integer
+:Required: No
+:Default: 0
+
+``bluestore compression min blob size hdd``
+
+:Description: Default value of ``bluestore compression min blob size``
+ for rotational media.
+
+:Type: Unsigned Integer
+:Required: No
+:Default: 128K
+
+``bluestore compression min blob size ssd``
+
+:Description: Default value of ``bluestore compression min blob size``
+ for non-rotational (solid state) media.
+
+:Type: Unsigned Integer
+:Required: No
+:Default: 8K
+
+``bluestore compression max blob size``
+
+:Description: Chunks larger than this are broken into smaller blobs of at most
+ ``bluestore compression max blob size`` bytes before being compressed.
+ The per-pool property ``compression_max_blob_size`` overrides
+ this setting.
+
+:Type: Unsigned Integer
+:Required: No
+:Default: 0
+
+``bluestore compression max blob size hdd``
+
+:Description: Default value of ``bluestore compression max blob size``
+ for rotational media.
+
+:Type: Unsigned Integer
+:Required: No
+:Default: 512K
+
+``bluestore compression max blob size ssd``
+
+:Description: Default value of ``bluestore compression max blob size``
+ for non-rotational (solid state) media.
+
+:Type: Unsigned Integer
+:Required: No
+:Default: 64K
+
+.. _clients hint: ../../api/librados/#rados_set_alloc_hint
diff --git a/src/ceph/doc/rados/configuration/ceph-conf.rst b/src/ceph/doc/rados/configuration/ceph-conf.rst
new file mode 100644
index 0000000..df88452
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/ceph-conf.rst
@@ -0,0 +1,629 @@
+==================
+ Configuring Ceph
+==================
+
+When you start the Ceph service, the initialization process activates a series
+of daemons that run in the background. A :term:`Ceph Storage Cluster` runs
+two types of daemons:
+
+- :term:`Ceph Monitor` (``ceph-mon``)
+- :term:`Ceph OSD Daemon` (``ceph-osd``)
+
+Ceph Storage Clusters that support the :term:`Ceph Filesystem` run at least one
+:term:`Ceph Metadata Server` (``ceph-mds``). Clusters that support :term:`Ceph
+Object Storage` run Ceph Gateway daemons (``radosgw``). For your convenience,
+each daemon has a series of default values (*i.e.*, many are set by
+``ceph/src/common/config_opts.h``). You may override these settings with a Ceph
+configuration file.
+
+
+.. _ceph-conf-file:
+
+The Configuration File
+======================
+
+When you start a Ceph Storage Cluster, each daemon looks for a Ceph
+configuration file (i.e., ``ceph.conf`` by default) that provides the cluster's
+configuration settings. For manual deployments, you need to create a Ceph
+configuration file. For tools that create configuration files for you (*e.g.*,
+``ceph-deploy``, Chef, etc.), you may use the information contained herein as a
+reference. The Ceph configuration file defines:
+
+- Cluster Identity
+- Authentication settings
+- Cluster membership
+- Host names
+- Host addresses
+- Paths to keyrings
+- Paths to journals
+- Paths to data
+- Other runtime options
+
+The default Ceph configuration file locations in sequential order include:
+
+#. ``$CEPH_CONF`` (*i.e.,* the path following the ``$CEPH_CONF``
+ environment variable)
+#. ``-c path/path`` (*i.e.,* the ``-c`` command line argument)
+#. ``/etc/ceph/ceph.conf``
+#. ``~/.ceph/config``
+#. ``./ceph.conf`` (*i.e.,* in the current working directory)
+
+
+The Ceph configuration file uses an *ini* style syntax. You can add comments
+by preceding them with a pound sign (#) or a semi-colon (;). For example:
+
+.. code-block:: ini
+
+ # <--A number (#) sign precedes a comment.
+ ; A comment may be anything.
+ # Comments always follow a semi-colon (;) or a pound (#) on each line.
+ # The end of the line terminates a comment.
+ # We recommend that you provide comments in your configuration file(s).
+
+
+.. _ceph-conf-settings:
+
+Config Sections
+===============
+
+The configuration file can configure all Ceph daemons in a Ceph Storage Cluster,
+or all Ceph daemons of a particular type. To configure a series of daemons, the
+settings must be included under the processes that will receive the
+configuration as follows:
+
+``[global]``
+
+:Description: Settings under ``[global]`` affect all daemons in a Ceph Storage
+ Cluster.
+
+:Example: ``auth supported = cephx``
+
+``[osd]``
+
+:Description: Settings under ``[osd]`` affect all ``ceph-osd`` daemons in
+ the Ceph Storage Cluster, and override the same setting in
+ ``[global]``.
+
+:Example: ``osd journal size = 1000``
+
+``[mon]``
+
+:Description: Settings under ``[mon]`` affect all ``ceph-mon`` daemons in
+ the Ceph Storage Cluster, and override the same setting in
+ ``[global]``.
+
+:Example: ``mon addr = 10.0.0.101:6789``
+
+
+``[mds]``
+
+:Description: Settings under ``[mds]`` affect all ``ceph-mds`` daemons in
+ the Ceph Storage Cluster, and override the same setting in
+ ``[global]``.
+
+:Example: ``host = myserver01``
+
+``[client]``
+
+:Description: Settings under ``[client]`` affect all Ceph Clients
+ (e.g., mounted Ceph Filesystems, mounted Ceph Block Devices,
+ etc.).
+
+:Example: ``log file = /var/log/ceph/radosgw.log``
+
+
+Global settings affect all instances of all daemons in the Ceph Storage Cluster.
+Use the ``[global]`` setting for values that are common for all daemons in the
+Ceph Storage Cluster. You can override each ``[global]`` setting by:
+
+#. Changing the setting in a particular process type
+ (*e.g.,* ``[osd]``, ``[mon]``, ``[mds]`` ).
+
+#. Changing the setting in a particular process (*e.g.,* ``[osd.1]`` ).
+
+Overriding a global setting affects all child processes, except those that
+you specifically override in a particular daemon.
+
+A typical global setting involves activating authentication. For example:
+
+.. code-block:: ini
+
+ [global]
+ #Enable authentication between hosts within the cluster.
+ #v 0.54 and earlier
+ auth supported = cephx
+
+ #v 0.55 and after
+ auth cluster required = cephx
+ auth service required = cephx
+ auth client required = cephx
+
+
+You can specify settings that apply to a particular type of daemon. When you
+specify settings under ``[osd]``, ``[mon]`` or ``[mds]`` without specifying a
+particular instance, the setting will apply to all OSDs, monitors or metadata
+daemons respectively.
+
+A typical daemon-wide setting involves setting journal sizes, filestore
+settings, etc. For example:
+
+.. code-block:: ini
+
+ [osd]
+ osd journal size = 1000
+
+
+You may specify settings for particular instances of a daemon. You may specify
+an instance by entering its type, delimited by a period (.) and by the instance
+ID. The instance ID for a Ceph OSD Daemon is always numeric, but it may be
+alphanumeric for Ceph Monitors and Ceph Metadata Servers.
+
+.. code-block:: ini
+
+ [osd.1]
+ # settings affect osd.1 only.
+
+ [mon.a]
+ # settings affect mon.a only.
+
+ [mds.b]
+ # settings affect mds.b only.
+
+
+If the daemon you specify is a Ceph Gateway client, specify the daemon and the
+instance, delimited by a period (.). For example::
+
+ [client.radosgw.instance-name]
+ # settings affect client.radosgw.instance-name only.
+
+
+
+.. _ceph-metavariables:
+
+Metavariables
+=============
+
+Metavariables simplify Ceph Storage Cluster configuration dramatically. When a
+metavariable is set in a configuration value, Ceph expands the metavariable into
+a concrete value. Metavariables are very powerful when used within the
+``[global]``, ``[osd]``, ``[mon]``, ``[mds]`` or ``[client]`` sections of your
+configuration file. Ceph metavariables are similar to Bash shell expansion.
+
+Ceph supports the following metavariables:
+
+
+``$cluster``
+
+:Description: Expands to the Ceph Storage Cluster name. Useful when running
+ multiple Ceph Storage Clusters on the same hardware.
+
+:Example: ``/etc/ceph/$cluster.keyring``
+:Default: ``ceph``
+
+
+``$type``
+
+:Description: Expands to one of ``mds``, ``osd``, or ``mon``, depending on the
+ type of the instant daemon.
+
+:Example: ``/var/lib/ceph/$type``
+
+
+``$id``
+
+:Description: Expands to the daemon identifier. For ``osd.0``, this would be
+ ``0``; for ``mds.a``, it would be ``a``.
+
+:Example: ``/var/lib/ceph/$type/$cluster-$id``
+
+
+``$host``
+
+:Description: Expands to the host name of the instant daemon.
+
+
+``$name``
+
+:Description: Expands to ``$type.$id``.
+:Example: ``/var/run/ceph/$cluster-$name.asok``
+
+``$pid``
+
+:Description: Expands to daemon pid.
+:Example: ``/var/run/ceph/$cluster-$name-$pid.asok``
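+
+As a sketch, several metavariables can be combined so that each daemon gets
+its own log file and admin socket (the values shown follow the usual
+defaults)::
+
+ [global]
+ log file = /var/log/ceph/$cluster-$name.log
+ admin socket = /var/run/ceph/$cluster-$name.asok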
+
+
+.. _ceph-conf-common-settings:
+
+Common Settings
+===============
+
+The `Hardware Recommendations`_ section provides some hardware guidelines for
+configuring a Ceph Storage Cluster. It is possible for a single :term:`Ceph
+Node` to run multiple daemons. For example, a single node with multiple drives
+may run one ``ceph-osd`` for each drive. Ideally, you will have a node for a
+particular type of process. For example, some nodes may run ``ceph-osd``
+daemons, other nodes may run ``ceph-mds`` daemons, and still other nodes may
+run ``ceph-mon`` daemons.
+
+Each node has a name identified by the ``host`` setting. Monitors also specify
+a network address and port (i.e., domain name or IP address) identified by the
+``addr`` setting. A basic configuration file will typically specify only
+minimal settings for each instance of monitor daemons. For example:
+
+.. code-block:: ini
+
+ [global]
+ mon_initial_members = ceph1
+ mon_host = 10.0.0.1
+
+
+.. important:: The ``host`` setting is the short name of the node (i.e., not
+ an fqdn). It is **NOT** an IP address either. Enter ``hostname -s`` on
+ the command line to retrieve the name of the node. Do not use ``host``
+ settings for anything other than initial monitors unless you are deploying
+ Ceph manually. You **MUST NOT** specify ``host`` under individual daemons
+ when using deployment tools like ``chef`` or ``ceph-deploy``, as those tools
+ will enter the appropriate values for you in the cluster map.
+
+
+.. _ceph-network-config:
+
+Networks
+========
+
+See the `Network Configuration Reference`_ for a detailed discussion about
+configuring a network for use with Ceph.
+
+
+Monitors
+========
+
+Ceph production clusters typically deploy with a minimum of three :term:`Ceph Monitor`
+daemons to ensure high availability should a monitor instance crash. At least
+three (3) monitors ensures that the Paxos algorithm can determine which version
+of the :term:`Ceph Cluster Map` is the most recent from a majority of Ceph
+Monitors in the quorum.
+
+.. note:: You may deploy Ceph with a single monitor, but if the instance fails,
+ the lack of other monitors may interrupt data service availability.
+
+Ceph Monitors typically listen on port ``6789``. For example:
+
+.. code-block:: ini
+
+ [mon.a]
+ host = hostName
+ mon addr = 150.140.130.120:6789
+
+By default, Ceph expects that you will store a monitor's data under the
+following path::
+
+ /var/lib/ceph/mon/$cluster-$id
+
+You or a deployment tool (e.g., ``ceph-deploy``) must create the corresponding
+directory. With metavariables fully expressed and a cluster named "ceph", the
+foregoing directory would evaluate to::
+
+ /var/lib/ceph/mon/ceph-a
+
+For additional details, see the `Monitor Config Reference`_.
+
+.. _Monitor Config Reference: ../mon-config-ref
+
+
+.. _ceph-osd-config:
+
+
+Authentication
+==============
+
+.. versionadded:: Bobtail 0.56
+
+For Bobtail (v 0.56) and beyond, you should expressly enable or disable
+authentication in the ``[global]`` section of your Ceph configuration file. ::
+
+ auth cluster required = cephx
+ auth service required = cephx
+ auth client required = cephx
+
+Additionally, you should enable message signing. See `Cephx Config Reference`_ for details.
+
+.. important:: When upgrading, we recommend expressly disabling authentication
+ first, then performing the upgrade. Once the upgrade is complete, re-enable
+ authentication.
+
+.. _Cephx Config Reference: ../auth-config-ref
+
+
+.. _ceph-monitor-config:
+
+
+OSDs
+====
+
+Ceph production clusters typically deploy :term:`Ceph OSD Daemons` where one node
+has one OSD daemon running a filestore on one storage drive. A typical
+deployment specifies a journal size. For example:
+
+.. code-block:: ini
+
+ [osd]
+ osd journal size = 10000
+
+ [osd.0]
+ host = {hostname} #manual deployments only.
+
+
+By default, Ceph expects that you will store a Ceph OSD Daemon's data with the
+following path::
+
+ /var/lib/ceph/osd/$cluster-$id
+
+You or a deployment tool (e.g., ``ceph-deploy``) must create the corresponding
+directory. With metavariables fully expressed and a cluster named "ceph", the
+foregoing directory would evaluate to::
+
+ /var/lib/ceph/osd/ceph-0
+
+You may override this path using the ``osd data`` setting. We don't recommend
+changing the default location. Create the default directory on your OSD host.
+
+::
+
+ ssh {osd-host}
+ sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
+
+The ``osd data`` path ideally leads to a mount point with a hard disk that is
+separate from the hard disk storing and running the operating system and
+daemons. If the OSD is for a disk other than the OS disk, prepare it for
+use with Ceph, and mount it to the directory you just created::
+
+ ssh {new-osd-host}
+ sudo mkfs -t {fstype} /dev/{disk}
+ sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
+
+We recommend using the ``xfs`` file system when running
+:command:`mkfs`. (``btrfs`` and ``ext4`` are not recommended and no
+longer tested.)
+
+See the `OSD Config Reference`_ for additional configuration details.
+
+
+Heartbeats
+==========
+
+During runtime operations, Ceph OSD Daemons check up on other Ceph OSD Daemons
+and report their findings to the Ceph Monitor. You do not have to provide any
+settings. However, if you have network latency issues, you may wish to modify
+the settings.
+
+See `Configuring Monitor/OSD Interaction`_ for additional details.
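+
+As a sketch, one commonly adjusted setting is the grace period before a Ceph
+OSD Daemon is reported down (the value below is illustrative; the reference
+above covers the full set of options)::
+
+ [osd]
+ osd heartbeat grace = 30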
+
+
+.. _ceph-logging-and-debugging:
+
+Logs / Debugging
+================
+
+Sometimes you may encounter issues with Ceph that require
+modifying logging output and using Ceph's debugging. See `Debugging and
+Logging`_ for details on log rotation.
+
+.. _Debugging and Logging: ../../troubleshooting/log-and-debug
+
+
+Example ceph.conf
+=================
+
+.. literalinclude:: demo-ceph.conf
+ :language: ini
+
+.. _ceph-runtime-config:
+
+Runtime Changes
+===============
+
+Ceph allows you to make changes to the configuration of a ``ceph-osd``,
+``ceph-mon``, or ``ceph-mds`` daemon at runtime. This capability is quite
+useful for increasing/decreasing logging output, enabling/disabling debug
+settings, and even for runtime optimization. The following reflects runtime
+configuration usage::
+
+ ceph tell {daemon-type}.{id or *} injectargs --{name} {value} [--{name} {value}]
+
+Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply
+the runtime setting to all daemons of a particular type with ``*``, or specify
+a specific daemon's ID (i.e., its number or letter). For example, to increase
+debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following::
+
+ ceph tell osd.0 injectargs --debug-osd 20 --debug-ms 1
+
+In your ``ceph.conf`` file, you may use spaces when specifying a
+setting name. When specifying a setting name on the command line,
+ensure that you use an underscore or hyphen (``_`` or ``-``) between
+terms (e.g., ``debug osd`` becomes ``--debug-osd``).
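+
+For example, the runtime change above corresponds roughly to the following
+persistent ``ceph.conf`` entries, which take effect at the next daemon
+start::
+
+ [osd.0]
+ debug osd = 20
+ debug ms = 1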
+
+
+Viewing a Configuration at Runtime
+==================================
+
+If your Ceph Storage Cluster is running, and you would like to see the
+configuration settings from a running daemon, execute the following::
+
+ ceph daemon {daemon-type}.{id} config show | less
+
+If you are on a machine where osd.0 is running, the command would be::
+
+ ceph daemon osd.0 config show | less
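+
+To inspect a single value rather than the full dump, the admin socket also
+provides a ``config get`` command (the setting name is an example)::
+
+ ceph daemon osd.0 config get debug_osd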
+
+Reading Configuration Metadata at Runtime
+=========================================
+
+Information about the available configuration options is available via
+the ``config help`` command:
+
+::
+
+ ceph daemon {daemon-type}.{id} config help | less
+
+
+This metadata is primarily intended to be used when integrating other
+software with Ceph, such as graphical user interfaces. The output is
+a list of JSON objects, for example:
+
+::
+
+ {
+ "name": "mon_host",
+ "type": "std::string",
+ "level": "basic",
+ "desc": "list of hosts or addresses to search for a monitor",
+ "long_desc": "This is a comma, whitespace, or semicolon separated list of IP addresses or hostnames. Hostnames are resolved via DNS and all A or AAAA records are included in the search list.",
+ "default": "",
+ "daemon_default": "",
+ "tags": [],
+ "services": [
+ "common"
+ ],
+ "see_also": [],
+ "enum_values": [],
+ "min": "",
+ "max": ""
+ }
+
+type
+____
+
+The type of the setting, given as a C++ type name.
+
+level
+_____
+
+One of `basic`, `advanced`, `dev`. The `dev` options are not intended
+for use outside of development and testing.
+
+desc
+____
+
+A short description -- this is a sentence fragment suitable for display
+in small spaces like a single line in a list.
+
+long_desc
+_________
+
+A full description of what the setting does, this may be as long as needed.
+
+default
+_______
+
+The default value, if any.
+
+daemon_default
+______________
+
+An alternative default used for daemons (services) as opposed to clients.
+
+tags
+____
+
+A list of strings indicating topics to which this setting relates. Examples
+of tags are `performance` and `networking`.
+
+services
+________
+
+A list of strings indicating which Ceph services the setting relates to, such
+as `osd`, `mds`, `mon`. For settings that are relevant to any Ceph client
+or server, `common` is used.
+
+see_also
+________
+
+A list of strings indicating other configuration options that may also
+be of interest to a user setting this option.
+
+enum_values
+___________
+
+Optional: a list of strings indicating the valid settings.
+
+min, max
+________
+
+Optional: upper and lower (inclusive) bounds on valid settings.
+
+
+
+
+Running Multiple Clusters
+=========================
+
+With Ceph, you can run multiple Ceph Storage Clusters on the same hardware.
+Running multiple clusters provides a higher level of isolation compared to
+using different pools on the same cluster with different CRUSH rulesets. A
+separate cluster will have separate monitor, OSD and metadata server processes.
+When running Ceph with default settings, the default cluster name is ``ceph``,
+which means you would save your Ceph configuration file with the file name
+``ceph.conf`` in the ``/etc/ceph`` default directory.
+
+See `ceph-deploy new`_ for details.
+
+When you run multiple clusters, you must name your cluster and save the Ceph
+configuration file with the name of the cluster. For example, a cluster named
+``openstack`` will have a Ceph configuration file with the file name
+``openstack.conf`` in the ``/etc/ceph`` default directory.
+
+.. important:: Cluster names must consist of letters a-z and digits 0-9 only.
+
+Separate clusters imply separate data disks and journals, which are not shared
+between clusters. Referring to `Metavariables`_, the ``$cluster`` metavariable
+evaluates to the cluster name (i.e., ``openstack`` in the foregoing example).
+Various settings use the ``$cluster`` metavariable, including:
+
+- ``keyring``
+- ``admin socket``
+- ``log file``
+- ``pid file``
+- ``mon data``
+- ``mon cluster log file``
+- ``osd data``
+- ``osd journal``
+- ``mds data``
+- ``rgw data``
+
+See `General Settings`_, `OSD Settings`_, `Monitor Settings`_, `MDS Settings`_,
+`RGW Settings`_ and `Log Settings`_ for relevant path defaults that use the
+``$cluster`` metavariable.
+
+.. _General Settings: ../general-config-ref
+.. _OSD Settings: ../osd-config-ref
+.. _Monitor Settings: ../mon-config-ref
+.. _MDS Settings: ../../../cephfs/mds-config-ref
+.. _RGW Settings: ../../../radosgw/config-ref/
+.. _Log Settings: ../../troubleshooting/log-and-debug
+
+
+When creating default directories or files, you should use the cluster
+name at the appropriate places in the path. For example::
+
+ sudo mkdir /var/lib/ceph/osd/openstack-0
+ sudo mkdir /var/lib/ceph/mon/openstack-a
+
+.. important:: When running monitors on the same host, you should use
+ different ports. By default, monitors use port 6789. If you already
+ have monitors using port 6789, use a different port for your other cluster(s).
+
+To invoke a cluster other than the default ``ceph`` cluster, use the
+``-c {filename}.conf`` option with the ``ceph`` command. For example::
+
+ ceph -c {cluster-name}.conf health
+ ceph -c openstack.conf health
+
+
+.. _Hardware Recommendations: ../../../start/hardware-recommendations
+.. _Network Configuration Reference: ../network-config-ref
+.. _OSD Config Reference: ../osd-config-ref
+.. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction
+.. _ceph-deploy new: ../../deployment/ceph-deploy-new#naming-a-cluster
diff --git a/src/ceph/doc/rados/configuration/demo-ceph.conf b/src/ceph/doc/rados/configuration/demo-ceph.conf
new file mode 100644
index 0000000..ba86d53
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/demo-ceph.conf
@@ -0,0 +1,31 @@
+[global]
+fsid = {cluster-id}
+mon initial members = {hostname}[, {hostname}]
+mon host = {ip-address}[, {ip-address}]
+
+#All clusters have a front-side public network.
+#If you have two NICs, you can configure a back side cluster
+#network for OSD object replication, heart beats, backfilling,
+#recovery, etc.
+public network = {network}[, {network}]
+#cluster network = {network}[, {network}]
+
+#Clusters require authentication by default.
+auth cluster required = cephx
+auth service required = cephx
+auth client required = cephx
+
+#Choose reasonable numbers for your journals, number of replicas
+#and placement groups.
+osd journal size = {n}
+osd pool default size = {n} # Write an object n times.
+osd pool default min size = {n} # Allow writing n copies in a degraded state.
+osd pool default pg num = {n}
+osd pool default pgp num = {n}
+
+#Choose a reasonable crush leaf type.
+#0 for a 1-node cluster.
+#1 for a multi node cluster in a single rack
+#2 for a multi node, multi chassis cluster with multiple hosts in a chassis
+#3 for a multi node cluster with hosts across racks, etc.
+osd crush chooseleaf type = {n}
\ No newline at end of file
diff --git a/src/ceph/doc/rados/configuration/filestore-config-ref.rst b/src/ceph/doc/rados/configuration/filestore-config-ref.rst
new file mode 100644
index 0000000..4dff60c
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/filestore-config-ref.rst
@@ -0,0 +1,365 @@
+============================
+ Filestore Config Reference
+============================
+
+
+``filestore debug omap check``
+
+:Description: Debugging check on synchronization. Expensive. For debugging only.
+:Type: Boolean
+:Required: No
+:Default: ``0``
+
+
+.. index:: filestore; extended attributes
+
+Extended Attributes
+===================
+
+Extended Attributes (XATTRs) are an important aspect of your configuration.
+Some file systems have limits on the number of bytes stored in XATTRs.
+Additionally, in some cases, the filesystem may not be as fast as an alternative
+method of storing XATTRs. The following settings may help improve performance
+by using a method of storing XATTRs that is extrinsic to the underlying filesystem.
+
+Ceph XATTRs are stored as ``inline xattr``, using the XATTRs provided
+by the underlying file system, if it does not impose a size limit. If
+there is a size limit (4KB total on ext4, for instance), some Ceph
+XATTRs will be stored in a key/value database when either the
+``filestore max inline xattr size`` or ``filestore max inline
+xattrs`` threshold is reached.
+
+
+``filestore max inline xattr size``
+
+:Description: The maximum size of an XATTR stored in the filesystem (i.e., XFS,
+ btrfs, ext4, etc.) per object. Should not be larger than the
+ filesystem can handle. Default value of 0 means to use the value
+ specific to the underlying filesystem.
+:Type: Unsigned 32-bit Integer
+:Required: No
+:Default: ``0``
+
+
+``filestore max inline xattr size xfs``
+
+:Description: The maximum size of an XATTR stored in the XFS filesystem.
+ Only used if ``filestore max inline xattr size`` == 0.
+:Type: Unsigned 32-bit Integer
+:Required: No
+:Default: ``65536``
+
+
+``filestore max inline xattr size btrfs``
+
+:Description: The maximum size of an XATTR stored in the btrfs filesystem.
+ Only used if ``filestore max inline xattr size`` == 0.
+:Type: Unsigned 32-bit Integer
+:Required: No
+:Default: ``2048``
+
+
+``filestore max inline xattr size other``
+
+:Description: The maximum size of an XATTR stored in other filesystems.
+ Only used if ``filestore max inline xattr size`` == 0.
+:Type: Unsigned 32-bit Integer
+:Required: No
+:Default: ``512``
+
+
+``filestore max inline xattrs``
+
+:Description: The maximum number of XATTRs stored in the filesystem per object.
+ Default value of 0 means to use the value specific to the
+ underlying filesystem.
+:Type: 32-bit Integer
+:Required: No
+:Default: ``0``
+
+
+``filestore max inline xattrs xfs``
+
+:Description: The maximum number of XATTRs stored in the XFS filesystem per object.
+ Only used if ``filestore max inline xattrs`` == 0.
+:Type: 32-bit Integer
+:Required: No
+:Default: ``10``
+
+
+``filestore max inline xattrs btrfs``
+
+:Description: The maximum number of XATTRs stored in the btrfs filesystem per object.
+ Only used if ``filestore max inline xattrs`` == 0.
+:Type: 32-bit Integer
+:Required: No
+:Default: ``10``
+
+
+``filestore max inline xattrs other``
+
+:Description: The maximum number of XATTRs stored in other filesystems per object.
+ Only used if ``filestore max inline xattrs`` == 0.
+:Type: 32-bit Integer
+:Required: No
+:Default: ``2``
+
+.. index:: filestore; synchronization
+
+Synchronization Intervals
+=========================
+
+Periodically, the filestore needs to quiesce writes and synchronize the
+filesystem, which creates a consistent commit point. It can then free journal
+entries up to the commit point. Synchronizing more frequently tends to reduce
+the time required to perform synchronization, and reduces the amount of data
+that needs to remain in the journal. Less frequent synchronization allows the
+backing filesystem to coalesce small writes and metadata updates more
+optimally--potentially resulting in more efficient synchronization.
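+
+As a sketch, these intervals can be adjusted for all OSDs in ``ceph.conf``
+(the values shown are illustrative)::
+
+ [osd]
+ filestore min sync interval = .01
+ filestore max sync interval = 10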
+
+
+``filestore max sync interval``
+
+:Description: The maximum interval in seconds for synchronizing the filestore.
+:Type: Double
+:Required: No
+:Default: ``5``
+
+
+``filestore min sync interval``
+
+:Description: The minimum interval in seconds for synchronizing the filestore.
+:Type: Double
+:Required: No
+:Default: ``.01``
+
+
+.. index:: filestore; flusher
+
+Flusher
+=======
+
+The filestore flusher forces data from large writes to be written out using
+``sync file range`` before the sync in order to (hopefully) reduce the cost of
+the eventual sync. In practice, disabling 'filestore flusher' seems to improve
+performance in some cases.
+
+
+``filestore flusher``
+
+:Description: Enables the filestore flusher.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+.. deprecated:: v.65
+
+``filestore flusher max fds``
+
+:Description: Sets the maximum number of file descriptors for the flusher.
+:Type: Integer
+:Required: No
+:Default: ``512``
+
+.. deprecated:: v.65
+
+``filestore sync flush``
+
+:Description: Enables the synchronization flusher.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+.. deprecated:: v.65
+
+``filestore fsync flushes journal data``
+
+:Description: Flush journal data during filesystem synchronization.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+.. index:: filestore; queue
+
+Queue
+=====
+
+The following settings provide limits on the size of filestore queue.
+
+``filestore queue max ops``
+
+:Description: Defines the maximum number of in progress operations the file store accepts before blocking on queuing new operations.
+:Type: Integer
+:Required: No. Minimal impact on performance.
+:Default: ``50``
+
+
+``filestore queue max bytes``
+
+:Description: The maximum number of bytes for an operation.
+:Type: Integer
+:Required: No
+:Default: ``100 << 20``
+
+
+
+
+.. index:: filestore; timeouts
+
+Timeouts
+========
+
+
+``filestore op threads``
+
+:Description: The number of filesystem operation threads that execute in parallel.
+:Type: Integer
+:Required: No
+:Default: ``2``
+
+
+``filestore op thread timeout``
+
+:Description: The timeout for a filesystem operation thread (in seconds).
+:Type: Integer
+:Required: No
+:Default: ``60``
+
+
+``filestore op thread suicide timeout``
+
+:Description: The timeout for a commit operation before cancelling the commit (in seconds).
+:Type: Integer
+:Required: No
+:Default: ``180``
+
+
+.. index:: filestore; btrfs
+
+B-Tree Filesystem
+=================
+
+
+``filestore btrfs snap``
+
+:Description: Enable snapshots for a ``btrfs`` filestore.
+:Type: Boolean
+:Required: No. Only used for ``btrfs``.
+:Default: ``true``
+
+
+``filestore btrfs clone range``
+
+:Description: Enable cloning ranges for a ``btrfs`` filestore.
+:Type: Boolean
+:Required: No. Only used for ``btrfs``.
+:Default: ``true``
+
+
+.. index:: filestore; journal
+
+Journal
+=======
+
+
+``filestore journal parallel``
+
+:Description: Enables parallel journaling, default for btrfs.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``filestore journal writeahead``
+
+:Description: Enables writeahead journaling, default for xfs.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``filestore journal trailing``
+
+:Description: Deprecated, never use.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+Misc
+====
+
+
+``filestore merge threshold``
+
+:Description: The minimum number of files in a subdirectory before merging
+              into the parent directory. NOTE: A negative value disables
+              subdirectory merging.
+:Type: Integer
+:Required: No
+:Default: ``10``
+
+
+``filestore split multiple``
+
+:Description: ``(filestore_split_multiple * abs(filestore_merge_threshold) + (rand() % filestore_split_rand_factor)) * 16``
+ is the maximum number of files in a subdirectory before
+ splitting into child directories.
+
+:Type: Integer
+:Required: No
+:Default: ``2``
+
+
+``filestore split rand factor``
+
+:Description: A random factor added to the split threshold to avoid
+              too many filestore splits occurring at once. See
+              ``filestore split multiple`` for details.
+              This can only be changed offline for an existing OSD,
+              via the ``ceph-objectstore-tool`` ``apply-layout-settings`` command.
+
+:Type: Unsigned 32-bit Integer
+:Required: No
+:Default: ``20``
+
+
+``filestore update to``
+
+:Description: Limits filestore auto upgrade to specified version.
+:Type: Integer
+:Required: No
+:Default: ``1000``
+
+
+``filestore blackhole``
+
+:Description: Drop any new transactions on the floor.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``filestore dump file``
+
+:Description: File onto which filestore transaction dumps are written.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``filestore kill at``
+
+:Description: Inject a failure at the n'th opportunity.
+:Type: String
+:Required: No
+:Default: ``false``
+
+
+``filestore fail eio``
+
+:Description: Fail/Crash on EIO.
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
diff --git a/src/ceph/doc/rados/configuration/general-config-ref.rst b/src/ceph/doc/rados/configuration/general-config-ref.rst
new file mode 100644
index 0000000..ca09ee5
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/general-config-ref.rst
@@ -0,0 +1,66 @@
+==========================
+ General Config Reference
+==========================
+
+
+``fsid``
+
+:Description: The filesystem ID. One per cluster.
+:Type: UUID
+:Required: No.
+:Default: N/A. Usually generated by deployment tools.
+
+
+``admin socket``
+
+:Description: The socket for executing administrative commands on a daemon,
+ irrespective of whether Ceph Monitors have established a quorum.
+
+:Type: String
+:Required: No
+:Default: ``/var/run/ceph/$cluster-$name.asok``
+
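+For example, assuming an OSD with ID ``0`` in a cluster named ``ceph``, you
+could query its running configuration through the default admin socket path
+(adjust the path to match your deployment)::
+
+    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show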
+
+``pid file``
+
+:Description: The file in which the mon, osd or mds will write its
+ PID. For instance, ``/var/run/$cluster/$type.$id.pid``
+ will create /var/run/ceph/mon.a.pid for the ``mon`` with
+ id ``a`` running in the ``ceph`` cluster. The ``pid
+ file`` is removed when the daemon stops gracefully. If
+ the process is not daemonized (i.e. runs with the ``-f``
+ or ``-d`` option), the ``pid file`` is not created.
+:Type: String
+:Required: No
+:Default: No
+
+
+``chdir``
+
+:Description: The directory Ceph daemons change to once they are
+              up and running. The default ``/`` directory is recommended.
+
+:Type: String
+:Required: No
+:Default: ``/``
+
+
+``max open files``
+
+:Description: If set, when the :term:`Ceph Storage Cluster` starts, Ceph sets
+              the ``max open fds`` at the OS level (i.e., the max # of file
+              descriptors). It helps prevent Ceph OSD Daemons from running out
+              of file descriptors.
+
+:Type: 64-bit Integer
+:Required: No
+:Default: ``0``
+
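+For example, a sketch of raising the limit cluster-wide (the value shown is an
+illustrative assumption, not a recommendation):
+
+.. code-block:: ini
+
+    [global]
+    max open files = 131072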
+
+``fatal signal handlers``
+
+:Description: If set, we will install signal handlers for SEGV, ABRT, BUS, ILL,
+              FPE, XCPU, XFSZ, SYS signals to generate a useful log message.
+
+:Type: Boolean
+:Default: ``true``
diff --git a/src/ceph/doc/rados/configuration/index.rst b/src/ceph/doc/rados/configuration/index.rst
new file mode 100644
index 0000000..48b58ef
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/index.rst
@@ -0,0 +1,64 @@
+===============
+ Configuration
+===============
+
+Ceph can run with a cluster containing thousands of Object Storage Devices
+(OSDs). A minimal system will have at least two OSDs for data replication. To
+configure OSD clusters, you must provide settings in the configuration file.
+Ceph provides default values for many settings, which you can override in the
+configuration file. Additionally, you can make runtime modification to the
+configuration using command-line utilities.
+
+When Ceph starts, it activates three daemons:
+
+- ``ceph-mon`` (mandatory)
+- ``ceph-osd`` (mandatory)
+- ``ceph-mds`` (mandatory for cephfs only)
+
+Each process, daemon or utility loads the host's configuration file. A process
+may have information about more than one daemon instance (*i.e.,* multiple
+contexts). A daemon or utility only has information about a single daemon
+instance (a single context).
+
+.. note:: Ceph can run on a single host for evaluation purposes.
+
+
+.. raw:: html
+
+ <table cellpadding="10"><colgroup><col width="50%"><col width="50%"></colgroup><tbody valign="top"><tr><td><h3>Configuring the Object Store</h3>
+
+For general object store configuration, refer to the following:
+
+.. toctree::
+ :maxdepth: 1
+
+ Storage devices <storage-devices>
+ ceph-conf
+
+
+.. raw:: html
+
+ </td><td><h3>Reference</h3>
+
+To optimize the performance of your cluster, refer to the following:
+
+.. toctree::
+ :maxdepth: 1
+
+ Network Settings <network-config-ref>
+ Auth Settings <auth-config-ref>
+ Monitor Settings <mon-config-ref>
+ mon-lookup-dns
+ Heartbeat Settings <mon-osd-interaction>
+ OSD Settings <osd-config-ref>
+ BlueStore Settings <bluestore-config-ref>
+ FileStore Settings <filestore-config-ref>
+ Journal Settings <journal-ref>
+ Pool, PG & CRUSH Settings <pool-pg-config-ref.rst>
+ Messaging Settings <ms-ref>
+ General Settings <general-config-ref>
+
+
+.. raw:: html
+
+ </td></tr></tbody></table>
diff --git a/src/ceph/doc/rados/configuration/journal-ref.rst b/src/ceph/doc/rados/configuration/journal-ref.rst
new file mode 100644
index 0000000..97300f4
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/journal-ref.rst
@@ -0,0 +1,116 @@
+==========================
+ Journal Config Reference
+==========================
+
+.. index:: journal; journal configuration
+
+Ceph OSDs use a journal for two reasons: speed and consistency.
+
+- **Speed:** The journal enables the Ceph OSD Daemon to commit small writes
+ quickly. Ceph writes small, random i/o to the journal sequentially, which
+ tends to speed up bursty workloads by allowing the backing filesystem more
+ time to coalesce writes. The Ceph OSD Daemon's journal, however, can lead
+ to spiky performance with short spurts of high-speed writes followed by
+ periods without any write progress as the filesystem catches up to the
+ journal.
+
+- **Consistency:** Ceph OSD Daemons require a filesystem interface that
+ guarantees atomic compound operations. Ceph OSD Daemons write a description
+ of the operation to the journal and apply the operation to the filesystem.
+ This enables atomic updates to an object (for example, placement group
+ metadata). Every few seconds--between ``filestore max sync interval`` and
+ ``filestore min sync interval``--the Ceph OSD Daemon stops writes and
+ synchronizes the journal with the filesystem, allowing Ceph OSD Daemons to
+ trim operations from the journal and reuse the space. On failure, Ceph
+ OSD Daemons replay the journal starting after the last synchronization
+ operation.
+
+Ceph OSD Daemons support the following journal settings:
+
+
+``journal dio``
+
+:Description: Enables direct i/o to the journal. Requires ``journal block
+ align`` set to ``true``.
+
+:Type: Boolean
+:Required: Yes when using ``aio``.
+:Default: ``true``
+
+
+
+``journal aio``
+
+.. versionchanged:: 0.61 Cuttlefish
+
+:Description: Enables using ``libaio`` for asynchronous writes to the journal.
+ Requires ``journal dio`` set to ``true``.
+
+:Type: Boolean
+:Required: No.
+:Default: Version 0.61 and later, ``true``. Version 0.60 and earlier, ``false``.
+
+
+``journal block align``
+
+:Description: Block aligns write operations. Required for ``dio`` and ``aio``.
+:Type: Boolean
+:Required: Yes when using ``dio`` and ``aio``.
+:Default: ``true``
+
+
+``journal max write bytes``
+
+:Description: The maximum number of bytes the journal will write at
+ any one time.
+
+:Type: Integer
+:Required: No
+:Default: ``10 << 20``
+
+
+``journal max write entries``
+
+:Description: The maximum number of entries the journal will write at
+ any one time.
+
+:Type: Integer
+:Required: No
+:Default: ``100``
+
+
+``journal queue max ops``
+
+:Description: The maximum number of operations allowed in the queue at
+ any one time.
+
+:Type: Integer
+:Required: No
+:Default: ``500``
+
+
+``journal queue max bytes``
+
+:Description: The maximum number of bytes allowed in the queue at
+ any one time.
+
+:Type: Integer
+:Required: No
+:Default: ``10 << 20``
+
+
+``journal align min size``
+
+:Description: Align data payloads greater than the specified minimum.
+:Type: Integer
+:Required: No
+:Default: ``64 << 10``
+
+
+``journal zero on create``
+
+:Description: Causes the file store to overwrite the entire journal with
+ ``0``'s during ``mkfs``.
+:Type: Boolean
+:Required: No
+:Default: ``false``
diff --git a/src/ceph/doc/rados/configuration/mon-config-ref.rst b/src/ceph/doc/rados/configuration/mon-config-ref.rst
new file mode 100644
index 0000000..6c8e92b
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/mon-config-ref.rst
@@ -0,0 +1,1222 @@
+==========================
+ Monitor Config Reference
+==========================
+
+Understanding how to configure a :term:`Ceph Monitor` is an important part of
+building a reliable :term:`Ceph Storage Cluster`. **All Ceph Storage Clusters
+have at least one monitor**. A monitor configuration usually remains fairly
+consistent, but you can add, remove or replace a monitor in a cluster. See
+`Adding/Removing a Monitor`_ and `Add/Remove a Monitor (ceph-deploy)`_ for
+details.
+
+
+.. index:: Ceph Monitor; Paxos
+
+Background
+==========
+
+Ceph Monitors maintain a "master copy" of the :term:`cluster map`, which means a
+:term:`Ceph Client` can determine the location of all Ceph Monitors, Ceph OSD
+Daemons, and Ceph Metadata Servers just by connecting to one Ceph Monitor and
+retrieving a current cluster map. Before Ceph Clients can read from or write to
+Ceph OSD Daemons or Ceph Metadata Servers, they must connect to a Ceph Monitor
+first. With a current copy of the cluster map and the CRUSH algorithm, a Ceph
+Client can compute the location for any object. The ability to compute object
+locations allows a Ceph Client to talk directly to Ceph OSD Daemons, which is a
+very important aspect of Ceph's high scalability and performance. See
+`Scalability and High Availability`_ for additional details.
+
+The primary role of the Ceph Monitor is to maintain a master copy of the cluster
+map. Ceph Monitors also provide authentication and logging services. Ceph
+Monitors write all changes in the monitor services to a single Paxos instance,
+and Paxos writes the changes to a key/value store for strong consistency. Ceph
+Monitors can query the most recent version of the cluster map during sync
+operations. Ceph Monitors leverage the key/value store's snapshots and iterators
+(using leveldb) to perform store-wide synchronization.
+
+.. ditaa::
+
+ /-------------\ /-------------\
+ | Monitor | Write Changes | Paxos |
+ | cCCC +-------------->+ cCCC |
+ | | | |
+ +-------------+ \------+------/
+ | Auth | |
+ +-------------+ | Write Changes
+ | Log | |
+ +-------------+ v
+ | Monitor Map | /------+------\
+ +-------------+ | Key / Value |
+ | OSD Map | | Store |
+ +-------------+ | cCCC |
+ | PG Map | \------+------/
+ +-------------+ ^
+ | MDS Map | | Read Changes
+ +-------------+ |
+ | cCCC |*---------------------+
+ \-------------/
+
+
+.. deprecated:: version 0.58
+
+In Ceph versions 0.58 and earlier, Ceph Monitors use a Paxos instance for
+each service and store the map as a file.
+
+.. index:: Ceph Monitor; cluster map
+
+Cluster Maps
+------------
+
+The cluster map is a composite of maps, including the monitor map, the OSD map,
+the placement group map and the metadata server map. The cluster map tracks a
+number of important things: which processes are ``in`` the Ceph Storage Cluster;
+which processes that are ``in`` the Ceph Storage Cluster are ``up`` and running
+or ``down``; whether, the placement groups are ``active`` or ``inactive``, and
+``clean`` or in some other state; and, other details that reflect the current
+state of the cluster such as the total amount of storage space, and the amount
+of storage used.
+
+When there is a significant change in the state of the cluster--e.g., a Ceph OSD
+Daemon goes down, a placement group falls into a degraded state, etc.--the
+cluster map gets updated to reflect the current state of the cluster.
+Additionally, the Ceph Monitor also maintains a history of the prior states of
+the cluster. The monitor map, OSD map, placement group map and metadata server
+map each maintain a history of their map versions. We call each version an
+"epoch."
+
+When operating your Ceph Storage Cluster, keeping track of these states is an
+important part of your system administration duties. See `Monitoring a Cluster`_
+and `Monitoring OSDs and PGs`_ for additional details.
+
+.. index:: high availability; quorum
+
+Monitor Quorum
+--------------
+
+The Configuring Ceph section provides a trivial `Ceph configuration file`_ that
+specifies one monitor in the test cluster. A cluster will run fine with a
+single monitor; however, **a single monitor is a single-point-of-failure**. To
+ensure high availability in a production Ceph Storage Cluster, you should run
+Ceph with multiple monitors so that the failure of a single monitor **WILL NOT**
+bring down your entire cluster.
+
+When a Ceph Storage Cluster runs multiple Ceph Monitors for high availability,
+Ceph Monitors use `Paxos`_ to establish consensus about the master cluster map.
+A consensus requires a majority of monitors running to establish a quorum for
+consensus about the cluster map (e.g., 1; 2 out of 3; 3 out of 5; 4 out of 6;
+etc.).
+
+``mon force quorum join``
+
+:Description: Force monitor to join quorum even if it has been previously removed from the map
+:Type: Boolean
+:Default: ``False``
+
+.. index:: Ceph Monitor; consistency
+
+Consistency
+-----------
+
+When you add monitor settings to your Ceph configuration file, you need to be
+aware of some of the architectural aspects of Ceph Monitors. **Ceph imposes
+strict consistency requirements** for a Ceph monitor when discovering another
+Ceph Monitor within the cluster. Whereas Ceph Clients and other Ceph daemons
+use the Ceph configuration file to discover monitors, monitors discover each
+other using the monitor map (monmap), not the Ceph configuration file.
+
+A Ceph Monitor always refers to the local copy of the monmap when discovering
+other Ceph Monitors in the Ceph Storage Cluster. Using the monmap instead of the
+Ceph configuration file avoids errors that could break the cluster (e.g., typos
+in ``ceph.conf`` when specifying a monitor address or port). Since monitors use
+monmaps for discovery and they share monmaps with clients and other Ceph
+daemons, **the monmap provides monitors with a strict guarantee that their
+consensus is valid.**
+
+Strict consistency also applies to updates to the monmap. As with any other
+updates on the Ceph Monitor, changes to the monmap always run through a
+distributed consensus algorithm called `Paxos`_. The Ceph Monitors must agree on
+each update to the monmap, such as adding or removing a Ceph Monitor, to ensure
+that each monitor in the quorum has the same version of the monmap. Updates to
+the monmap are incremental so that Ceph Monitors have the latest agreed upon
+version, and a set of previous versions. Maintaining a history enables a Ceph
+Monitor that has an older version of the monmap to catch up with the current
+state of the Ceph Storage Cluster.
+
+If Ceph Monitors discovered each other through the Ceph configuration file
+instead of through the monmap, it would introduce additional risks because the
+Ceph configuration files are not updated and distributed automatically. Ceph
+Monitors might inadvertently use an older Ceph configuration file, fail to
+recognize a Ceph Monitor, fall out of a quorum, or develop a situation where
+`Paxos`_ is not able to determine the current state of the system accurately.
+
+
+.. index:: Ceph Monitor; bootstrapping monitors
+
+Bootstrapping Monitors
+----------------------
+
+In most configuration and deployment cases, tools that deploy Ceph may help
+bootstrap the Ceph Monitors by generating a monitor map for you (e.g.,
+``ceph-deploy``, etc). A Ceph Monitor requires a few explicit
+settings:
+
+- **Filesystem ID**: The ``fsid`` is the unique identifier for your
+ object store. Since you can run multiple clusters on the same
+ hardware, you must specify the unique ID of the object store when
+ bootstrapping a monitor. Deployment tools usually do this for you
+ (e.g., ``ceph-deploy`` can call a tool like ``uuidgen``), but you
+ may specify the ``fsid`` manually too.
+
+- **Monitor ID**: A monitor ID is a unique ID assigned to each monitor within
+ the cluster. It is an alphanumeric value, and by convention the identifier
+ usually follows an alphabetical increment (e.g., ``a``, ``b``, etc.). This
+ can be set in a Ceph configuration file (e.g., ``[mon.a]``, ``[mon.b]``, etc.),
+ by a deployment tool, or using the ``ceph`` commandline.
+
+- **Keys**: The monitor must have secret keys. A deployment tool such as
+ ``ceph-deploy`` usually does this for you, but you may
+ perform this step manually too. See `Monitor Keyrings`_ for details.
+
+For additional details on bootstrapping, see `Bootstrapping a Monitor`_.
+
+.. index:: Ceph Monitor; configuring monitors
+
+Configuring Monitors
+====================
+
+To apply configuration settings to the entire cluster, enter the configuration
+settings under ``[global]``. To apply configuration settings to all monitors in
+your cluster, enter the configuration settings under ``[mon]``. To apply
+configuration settings to specific monitors, specify the monitor instance
+(e.g., ``[mon.a]``). By convention, monitor instance names use alpha notation.
+
+.. code-block:: ini
+
+ [global]
+
+ [mon]
+
+ [mon.a]
+
+ [mon.b]
+
+ [mon.c]
+
+
+Minimum Configuration
+---------------------
+
+The bare minimum monitor settings for a Ceph monitor via the Ceph configuration
+file include a hostname and a monitor address for each monitor. You can configure
+these under ``[mon]`` or under the entry for a specific monitor.
+
+.. code-block:: ini
+
+ [mon]
+ mon host = hostname1,hostname2,hostname3
+ mon addr = 10.0.0.10:6789,10.0.0.11:6789,10.0.0.12:6789
+
+
+.. code-block:: ini
+
+ [mon.a]
+ host = hostname1
+ mon addr = 10.0.0.10:6789
+
+See the `Network Configuration Reference`_ for details.
+
+.. note:: This minimum configuration for monitors assumes that a deployment
+ tool generates the ``fsid`` and the ``mon.`` key for you.
+
+Once you deploy a Ceph cluster, you **SHOULD NOT** change the IP address of
+the monitors. However, if you decide to change the monitor's IP address, you
+must follow a specific procedure. See `Changing a Monitor's IP Address`_ for
+details.
+
+Monitors can also be found by clients using DNS SRV records. See `Monitor lookup through DNS`_ for details.
+
+Cluster ID
+----------
+
+Each Ceph Storage Cluster has a unique identifier (``fsid``). If specified, it
+usually appears under the ``[global]`` section of the configuration file.
+Deployment tools usually generate the ``fsid`` and store it in the monitor map,
+so the value may not appear in a configuration file. The ``fsid`` makes it
+possible to run daemons for multiple clusters on the same hardware.
+
+``fsid``
+
+:Description: The cluster ID. One per cluster.
+:Type: UUID
+:Required: Yes.
+:Default: N/A. May be generated by a deployment tool if not specified.
+
+.. note:: Do not set this value if you use a deployment tool that does
+ it for you.
+
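+If you do set it manually, it belongs under ``[global]``; the UUID below is a
+hypothetical placeholder:
+
+.. code-block:: ini
+
+    [global]
+    fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
+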
+
+.. index:: Ceph Monitor; initial members
+
+Initial Members
+---------------
+
+We recommend running a production Ceph Storage Cluster with at least three Ceph
+Monitors to ensure high availability. When you run multiple monitors, you may
+specify the initial monitors that must be members of the cluster in order to
+establish a quorum. This may reduce the time it takes for your cluster to come
+online.
+
+.. code-block:: ini
+
+ [mon]
+ mon initial members = a,b,c
+
+
+``mon initial members``
+
+:Description: The IDs of initial monitors in a cluster during startup. If
+ specified, Ceph requires an odd number of monitors to form an
+ initial quorum (e.g., 3).
+
+:Type: String
+:Default: None
+
+.. note:: A *majority* of monitors in your cluster must be able to reach
+ each other in order to establish a quorum. You can decrease the initial
+ number of monitors to establish a quorum with this setting.
+
+.. index:: Ceph Monitor; data path
+
+Data
+----
+
+Ceph provides a default path where Ceph Monitors store data. For optimal
+performance in a production Ceph Storage Cluster, we recommend running Ceph
+Monitors on separate hosts and drives from Ceph OSD Daemons. Since leveldb uses
+``mmap()`` for writing the data, Ceph Monitors flush their data from memory to disk
+very often, which can interfere with Ceph OSD Daemon workloads if the data
+store is co-located with the OSD Daemons.
+
+In Ceph versions 0.58 and earlier, Ceph Monitors store their data in files. This
+approach allows users to inspect monitor data with common tools like ``ls``
+and ``cat``. However, it doesn't provide strong consistency.
+
+In Ceph versions 0.59 and later, Ceph Monitors store their data as key/value
+pairs. Ceph Monitors require `ACID`_ transactions. Using a data store prevents
+recovering Ceph Monitors from running corrupted versions through Paxos, and it
+enables multiple modification operations in one single atomic batch, among other
+advantages.
+
+Generally, we do not recommend changing the default data location. If you modify
+the default location, we recommend that you make it uniform across Ceph Monitors
+by setting it in the ``[mon]`` section of the configuration file.
+
+
+``mon data``
+
+:Description: The monitor's data location.
+:Type: String
+:Default: ``/var/lib/ceph/mon/$cluster-$id``
+
+
+``mon data size warn``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log when the monitor's data
+ store goes over 15GB.
+:Type: Integer
+:Default: 15*1024*1024*1024
+
+
+``mon data avail warn``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log when the available disk
+ space of monitor's data store is lower or equal to this
+ percentage.
+:Type: Integer
+:Default: 30
+
+
+``mon data avail crit``
+
+:Description: Issue a ``HEALTH_ERR`` in cluster log when the available disk
+ space of monitor's data store is lower or equal to this
+ percentage.
+:Type: Integer
+:Default: 5
+
+
+``mon warn on cache pools without hit sets``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log if a cache pool does not
+              have the hit set type set.
+              See `hit set type <../operations/pools#hit-set-type>`_ for more
+              details.
+:Type: Boolean
+:Default: True
+
+
+``mon warn on crush straw calc version zero``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log if the CRUSH's
+ ``straw_calc_version`` is zero. See
+ `CRUSH map tunables <../operations/crush-map#tunables>`_ for
+ details.
+:Type: Boolean
+:Default: True
+
+
+``mon warn on legacy crush tunables``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log if
+ CRUSH tunables are too old (older than ``mon_min_crush_required_version``)
+:Type: Boolean
+:Default: True
+
+
+``mon crush min required version``
+
+:Description: The minimum tunable profile version required by the cluster.
+ See
+ `CRUSH map tunables <../operations/crush-map#tunables>`_ for
+ details.
+:Type: String
+:Default: ``firefly``
+
+
+``mon warn on osd down out interval zero``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log if
+              ``mon osd down out interval`` is zero. Having this option set to
+              zero on the leader acts much like the ``noout`` flag. It's hard
+              to figure out what's going wrong with clusters without the
+              ``noout`` flag set but acting like that just the same, so we
+              report a warning in this case.
+:Type: Boolean
+:Default: True
+
+
+``mon cache target full warn ratio``
+
+:Description: Position between pool's ``cache_target_full`` and
+ ``target_max_object`` where we start warning
+:Type: Float
+:Default: ``0.66``
+
+
+``mon health data update interval``
+
+:Description: How often (in seconds) the monitor in quorum shares its health
+ status with its peers. (negative number disables it)
+:Type: Float
+:Default: ``60``
+
+
+``mon health to clog``
+
+:Description: Enable sending health summary to cluster log periodically.
+:Type: Boolean
+:Default: True
+
+
+``mon health to clog tick interval``
+
+:Description: How often (in seconds) the monitor sends the health summary to the
+              cluster log (a non-positive number disables it). If the current
+              health summary is empty or identical to the last one, the monitor
+              will not send it to the cluster log.
+:Type: Integer
+:Default: 3600
+
+
+``mon health to clog interval``
+
+:Description: How often (in seconds) the monitor sends the health summary to the
+              cluster log (a non-positive number disables it). The monitor will
+              always send the summary to the cluster log whether or not the
+              summary has changed.
+:Type: Integer
+:Default: 60
+
+
+
+.. index:: Ceph Storage Cluster; capacity planning, Ceph Monitor; capacity planning
+
+Storage Capacity
+----------------
+
+When a Ceph Storage Cluster gets close to its maximum capacity (i.e., ``mon osd
+full ratio``), Ceph prevents you from writing to or reading from Ceph OSD
+Daemons as a safety measure to prevent data loss. Therefore, letting a
+production Ceph Storage Cluster approach its full ratio is not a good practice,
+because it sacrifices high availability. The default full ratio is ``.95``, or
+95% of capacity. This is a very aggressive setting for a test cluster with a small
+number of OSDs.
+
+.. tip:: When monitoring your cluster, be alert to warnings related to the
+   ``nearfull`` ratio. A warning means that a failure of one or more OSDs could
+   result in a temporary service disruption. Consider adding more OSDs to
+   increase storage capacity.
+
+A common scenario for test clusters involves a system administrator removing a
+Ceph OSD Daemon from the Ceph Storage Cluster to watch the cluster rebalance;
+then, removing another Ceph OSD Daemon, and so on until the Ceph Storage Cluster
+eventually reaches the full ratio and locks up. We recommend a bit of capacity
+planning even with a test cluster. Planning enables you to gauge how much spare
+capacity you will need in order to maintain high availability. Ideally, you want
+to plan for a series of Ceph OSD Daemon failures where the cluster can recover
+to an ``active + clean`` state without replacing those Ceph OSD Daemons
+immediately. You can run a cluster in an ``active + degraded`` state, but this
+is not ideal for normal operating conditions.
+
+The following diagram depicts a simplistic Ceph Storage Cluster containing 33
+Ceph Nodes with one Ceph OSD Daemon per host, each Ceph OSD Daemon reading from
+and writing to a 3TB drive. So this exemplary Ceph Storage Cluster has a maximum
+actual capacity of 99TB. With a ``mon osd full ratio`` of ``0.95``, if the Ceph
+Storage Cluster falls to 5TB of remaining capacity, the cluster will not allow
+Ceph Clients to read and write data. So the Ceph Storage Cluster's operating
+capacity is 95TB, not 99TB.
+
+.. ditaa::
+
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | Rack 1 | | Rack 2 | | Rack 3 | | Rack 4 | | Rack 5 | | Rack 6 |
+ | cCCC | | cF00 | | cCCC | | cCCC | | cCCC | | cCCC |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | OSD 1 | | OSD 7 | | OSD 13 | | OSD 19 | | OSD 25 | | OSD 31 |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | OSD 2 | | OSD 8 | | OSD 14 | | OSD 20 | | OSD 26 | | OSD 32 |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | OSD 3 | | OSD 9 | | OSD 15 | | OSD 21 | | OSD 27 | | OSD 33 |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | OSD 4 | | OSD 10 | | OSD 16 | | OSD 22 | | OSD 28 | | Spare |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | OSD 5 | | OSD 11 | | OSD 17 | | OSD 23 | | OSD 29 | | Spare |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+ | OSD 6 | | OSD 12 | | OSD 18 | | OSD 24 | | OSD 30 | | Spare |
+ +--------+ +--------+ +--------+ +--------+ +--------+ +--------+
+
+It is normal in such a cluster for one or two OSDs to fail. A less frequent but
+reasonable scenario involves a rack's router or power supply failing, which
+brings down multiple OSDs simultaneously (e.g., OSDs 7-12). In such a scenario,
+you should still strive for a cluster that can remain operational and achieve an
+``active + clean`` state--even if that means adding a few hosts with additional
+OSDs in short order. If your capacity utilization is too high, you may not lose
+data, but you could still sacrifice data availability while resolving an outage
+within a failure domain if capacity utilization of the cluster exceeds the full
+ratio. For this reason, we recommend at least some rough capacity planning.
+
+Identify two numbers for your cluster:
+
+#. The number of OSDs.
+#. The total capacity of the cluster.
+
+If you divide the total capacity of your cluster by the number of OSDs in your
+cluster, you will find the mean capacity of an OSD within your cluster.
+Consider multiplying that number by the number of OSDs you expect will fail
+simultaneously during normal operations (a relatively small number). Finally,
+multiply the capacity of the cluster by the full ratio to arrive at a maximum
+operating capacity; then, subtract the amount of data on the OSDs
+you expect to fail to arrive at a reasonable full ratio. Repeat the foregoing
+process with a higher number of OSD failures (e.g., a rack of OSDs) to arrive at
+a reasonable number for a near full ratio.
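+
+For example, applying this process to the hypothetical 33-OSD cluster above
+(3TB per OSD, a full ratio of ``.95``, and an expectation that a whole rack of
+6 OSDs could fail at once)::
+
+    mean OSD capacity           = 99TB / 33       = 3TB
+    capacity of one failed rack = 6 * 3TB         = 18TB
+    capacity at the full ratio  = 99TB * .95      = 94.05TB
+    remaining planning capacity = 94.05TB - 18TB  = 76.05TB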
+
+.. code-block:: ini
+
+ [global]
+
+ mon osd full ratio = .80
+ mon osd backfillfull ratio = .75
+ mon osd nearfull ratio = .70
+
+
+``mon osd full ratio``
+
+:Description: The percentage of disk space used before an OSD is
+ considered ``full``.
+
+:Type: Float
+:Default: ``.95``
+
+
+``mon osd backfillfull ratio``
+
+:Description: The percentage of disk space used before an OSD is
+ considered too ``full`` to backfill.
+
+:Type: Float
+:Default: ``.90``
+
+
+``mon osd nearfull ratio``
+
+:Description: The percentage of disk space used before an OSD is
+ considered ``nearfull``.
+
+:Type: Float
+:Default: ``.85``
+
+
+.. tip:: If some OSDs are nearfull, but others have plenty of capacity, you
+ may have a problem with the CRUSH weight for the nearfull OSDs.
+
+.. index:: heartbeat
+
+Heartbeat
+---------
+
+Ceph monitors know about the cluster by requiring reports from each OSD, and by
+receiving reports from OSDs about the status of their neighboring OSDs. Ceph
+provides reasonable default settings for monitor/OSD interaction; however, you
+may modify them as needed. See `Monitor/OSD Interaction`_ for details.
+
+
+.. index:: Ceph Monitor; leader, Ceph Monitor; provider, Ceph Monitor; requester, Ceph Monitor; synchronization
+
+Monitor Store Synchronization
+-----------------------------
+
+When you run a production cluster with multiple monitors (recommended), each
+monitor checks to see if a neighboring monitor has a more recent version of the
+cluster map (e.g., a map in a neighboring monitor with one or more epoch numbers
+higher than the most current epoch in the map of the monitor in question).
+Periodically, one monitor in the cluster may fall behind the other monitors to
+the point where it must leave the quorum, synchronize to retrieve the most
+current information about the cluster, and then rejoin the quorum. For the
+purposes of synchronization, monitors may assume one of three roles:
+
+#. **Leader**: The `Leader` is the first monitor to achieve the most recent
+ Paxos version of the cluster map.
+
+#. **Provider**: The `Provider` is a monitor that has the most recent version
+ of the cluster map, but wasn't the first to achieve the most recent version.
+
+#. **Requester:** A `Requester` is a monitor that has fallen behind the leader
+ and must synchronize in order to retrieve the most recent information about
+ the cluster before it can rejoin the quorum.
+
+These roles enable a leader to delegate synchronization duties to a provider,
+which prevents synchronization requests from overloading the leader--improving
+performance. In the following diagram, the requester has learned that it has
+fallen behind the other monitors. The requester asks the leader to synchronize,
+and the leader tells the requester to synchronize with a provider.
+
+
+.. ditaa:: +-----------+ +---------+ +----------+
+ | Requester | | Leader | | Provider |
+ +-----------+ +---------+ +----------+
+ | | |
+ | | |
+ | Ask to Synchronize | |
+ |------------------->| |
+ | | |
+ |<-------------------| |
+ | Tell Requester to | |
+ | Sync with Provider | |
+ | | |
+ | Synchronize |
+ |--------------------+-------------------->|
+ | | |
+ |<-------------------+---------------------|
+ | Send Chunk to Requester |
+ | (repeat as necessary) |
+ | Requester Acks Chuck to Provider |
+ |--------------------+-------------------->|
+ | |
+ | Sync Complete |
+ | Notification |
+ |------------------->|
+ | |
+ |<-------------------|
+ | Ack |
+ | |
+
+
+Synchronization always occurs when a new monitor joins the cluster. During
+runtime operations, monitors may receive updates to the cluster map at different
+times. This means the leader and provider roles may migrate from one monitor to
+another. If this happens while synchronizing (e.g., a provider falls behind the
+leader), the provider can terminate synchronization with a requester.
+
+Once synchronization is complete, Ceph requires trimming across the cluster.
+Trimming requires that the placement groups are ``active + clean``.
+
+
+``mon sync trim timeout``
+
+:Description:
+:Type: Double
+:Default: ``30.0``
+
+
+``mon sync heartbeat timeout``
+
+:Description:
+:Type: Double
+:Default: ``30.0``
+
+
+``mon sync heartbeat interval``
+
+:Description:
+:Type: Double
+:Default: ``5.0``
+
+
+``mon sync backoff timeout``
+
+:Description:
+:Type: Double
+:Default: ``30.0``
+
+
+``mon sync timeout``
+
+:Description: Number of seconds the monitor will wait for the next update
+              message from its sync provider before it gives up and bootstraps
+              again.
+:Type: Double
+:Default: ``30.0``
+
+
+``mon sync max retries``
+
+:Description:
+:Type: Integer
+:Default: ``5``
+
+
+``mon sync max payload size``
+
+:Description: The maximum size for a sync payload (in bytes).
+:Type: 32-bit Integer
+:Default: ``1045676``
+
+
+``paxos max join drift``
+
+:Description: The maximum Paxos iterations before we must first sync the
+              monitor data stores. When a monitor finds that its peer is too
+              far ahead of it, it will first sync its data store before moving
+              on.
+:Type: Integer
+:Default: ``10``
+
+``paxos stash full interval``
+
+:Description: How often (in commits) to stash a full copy of the PaxosService state.
+              Currently this setting only affects the ``mds``, ``mon``, ``auth`` and
+              ``mgr`` PaxosServices.
+:Type: Integer
+:Default: 25
+
+``paxos propose interval``
+
+:Description: Gather updates for this time interval before proposing
+ a map update.
+:Type: Double
+:Default: ``1.0``
+
+
+``paxos min``
+
+:Description: The minimum number of paxos states to keep around
+:Type: Integer
+:Default: 500
+
+
+``paxos min wait``
+
+:Description: The minimum amount of time to gather updates after a period of
+ inactivity.
+:Type: Double
+:Default: ``0.05``
+
+
+``paxos trim min``
+
+:Description: Number of extra proposals tolerated before trimming
+:Type: Integer
+:Default: 250
+
+
+``paxos trim max``
+
+:Description: The maximum number of extra proposals to trim at a time
+:Type: Integer
+:Default: 500
+
+
+``paxos service trim min``
+
+:Description: The minimum amount of versions to trigger a trim (0 disables it)
+:Type: Integer
+:Default: 250
+
+
+``paxos service trim max``
+
+:Description: The maximum amount of versions to trim during a single proposal (0 disables it)
+:Type: Integer
+:Default: 500
+
+
+``mon max log epochs``
+
+:Description: The maximum amount of log epochs to trim during a single proposal
+:Type: Integer
+:Default: 500
+
+
+``mon max pgmap epochs``
+
+:Description: The maximum amount of pgmap epochs to trim during a single proposal
+:Type: Integer
+:Default: 500
+
+
+``mon mds force trim to``
+
+:Description: Force monitor to trim mdsmaps to this point (0 disables it;
+              dangerous, use with care).
+:Type: Integer
+:Default: 0
+
+
+``mon osd force trim to``
+
+:Description: Force monitor to trim osdmaps to this point, even if there are
+              PGs not clean at the specified epoch (0 disables it; dangerous,
+              use with care).
+:Type: Integer
+:Default: 0
+
+``mon osd cache size``
+
+:Description: The size of the osdmap cache, so as not to rely on the underlying store's cache.
+:Type: Integer
+:Default: 10
+
+
+``mon election timeout``
+
+:Description: The maximum time in seconds the election proposer will wait for all ACKs.
+:Type: Float
+:Default: ``5``
+
+
+``mon lease``
+
+:Description: The length (in seconds) of the lease on the monitor's versions.
+:Type: Float
+:Default: ``5``
+
+
+``mon lease renew interval factor``
+
+:Description: ``mon lease`` \* ``mon lease renew interval factor`` will be the
+ interval for the Leader to renew the other monitor's leases. The
+ factor should be less than ``1.0``.
+:Type: Float
+:Default: ``0.6``
+
+
+``mon lease ack timeout factor``
+
+:Description: The Leader will wait ``mon lease`` \* ``mon lease ack timeout factor``
+ for the Providers to acknowledge the lease extension.
+:Type: Float
+:Default: ``2.0``
+
+
+``mon accept timeout factor``
+
+:Description: The Leader will wait ``mon lease`` \* ``mon accept timeout factor``
+ for the Requester(s) to accept a Paxos update. It is also used
+ during the Paxos recovery phase for similar purposes.
+:Type: Float
+:Default: ``2.0``
+
+
+``mon min osdmap epochs``
+
+:Description: Minimum number of OSD map epochs to keep at all times.
+:Type: 32-bit Integer
+:Default: ``500``
+
+
+``mon max pgmap epochs``
+
+:Description: Maximum number of PG map epochs the monitor should keep.
+:Type: 32-bit Integer
+:Default: ``500``
+
+
+``mon max log epochs``
+
+:Description: Maximum number of Log epochs the monitor should keep.
+:Type: 32-bit Integer
+:Default: ``500``
+
+
+
+.. index:: Ceph Monitor; clock
+
+Clock
+-----
+
+Ceph daemons pass critical messages to each other, which must be processed
+before daemons reach a timeout threshold. If the clocks in Ceph monitors
+are not synchronized, it can lead to a number of anomalies. For example:
+
+- Daemons ignoring received messages (e.g., timestamps outdated)
+- Timeouts triggered too soon/late when a message wasn't received in time.
+
+See `Monitor Store Synchronization`_ for details.
+
+
+.. tip:: You SHOULD install NTP on your Ceph monitor hosts to
+ ensure that the monitor cluster operates with synchronized clocks.
+
+Clock drift may still be noticeable with NTP even though the discrepancy is not
+yet harmful. Ceph's clock drift / clock skew warnings may get triggered even
+though NTP maintains a reasonable level of synchronization. Increasing your
+clock drift may be tolerable under such circumstances; however, a number of
+factors such as workload, network latency, configuring overrides to default
+timeouts and the `Monitor Store Synchronization`_ settings may influence
+the level of acceptable clock drift without compromising Paxos guarantees.
+
+Ceph provides the following tunable options to allow you to find
+acceptable values.
+
+
+``clock offset``
+
+:Description: How much to offset the system clock. See ``Clock.cc`` for details.
+:Type: Double
+:Default: ``0``
+
+
+.. deprecated:: 0.58
+
+``mon tick interval``
+
+:Description: A monitor's tick interval in seconds.
+:Type: 32-bit Integer
+:Default: ``5``
+
+
+``mon clock drift allowed``
+
+:Description: The clock drift in seconds allowed between monitors.
+:Type: Float
+:Default: ``.050``
+
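+For example, a sketch of loosening the allowed drift slightly for a cluster
+where NTP keeps skew small but non-zero (the value is illustrative; fixing
+time synchronization is preferable to raising this threshold):
+
+.. code-block:: ini
+
+    [mon]
+    mon clock drift allowed = 0.15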
+
+``mon clock drift warn backoff``
+
+:Description: Exponential backoff for clock drift warnings
+:Type: Float
+:Default: ``5``
+
+
+``mon timecheck interval``
+
+:Description: The time check interval (clock drift check) in seconds
+ for the Leader.
+
+:Type: Float
+:Default: ``300.0``
+
+
+``mon timecheck skew interval``
+
+:Description: The time check interval (clock drift check) in seconds when a
+              clock skew is present, for the Leader.
+:Type: Float
+:Default: ``30.0``
+
+
+Client
+------
+
+``mon client hunt interval``
+
+:Description: The client will try a new monitor every ``N`` seconds until it
+ establishes a connection.
+
+:Type: Double
+:Default: ``3.0``
+
+
+``mon client ping interval``
+
+:Description: The client will ping the monitor every ``N`` seconds.
+:Type: Double
+:Default: ``10.0``
+
+
+``mon client max log entries per message``
+
+:Description: The maximum number of log entries a monitor will generate
+ per client message.
+
+:Type: Integer
+:Default: ``1000``
+
+
+``mon client bytes``
+
+:Description: The amount of client message data allowed in memory (in bytes).
+:Type: 64-bit Integer Unsigned
+:Default: ``100ul << 20``
+
+
+Pool settings
+=============
+Since version v0.94 there is support for pool flags which allow or disallow changes to be made to pools.
+
+Monitors can also disallow removal of pools if configured that way.
+
+``mon allow pool delete``
+
+:Description: Whether the monitors should allow pools to be removed, regardless of what the pool flags say.
+:Type: Boolean
+:Default: ``false``
+
+``osd pool default flag hashpspool``
+
+:Description: Set the hashpspool flag on new pools
+:Type: Boolean
+:Default: ``true``
+
+``osd pool default flag nodelete``
+
+:Description: Set the nodelete flag on new pools. Prevents removal of a pool with this flag set.
+:Type: Boolean
+:Default: ``false``
+
+``osd pool default flag nopgchange``
+
+:Description: Set the nopgchange flag on new pools. Does not allow the number of PGs to be changed for a pool.
+:Type: Boolean
+:Default: ``false``
+
+``osd pool default flag nosizechange``
+
+:Description: Set the nosizechange flag on new pools. Does not allow the size of a pool to be changed.
+:Type: Boolean
+:Default: ``false``
+
+For more information about the pool flags see `Pool values`_.
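+
+For example, a sketch of a more conservative default policy for new pools (the
+choice of flags is illustrative only):
+
+.. code-block:: ini
+
+    [global]
+    mon allow pool delete = false
+    osd pool default flag nodelete = true
+    osd pool default flag nopgchange = true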
+
+Miscellaneous
+=============
+
+
+``mon max osd``
+
+:Description: The maximum number of OSDs allowed in the cluster.
+:Type: 32-bit Integer
+:Default: ``10000``
+
+``mon globalid prealloc``
+
+:Description: The number of global IDs to pre-allocate for clients and daemons in the cluster.
+:Type: 32-bit Integer
+:Default: ``100``
+
+``mon subscribe interval``
+
+:Description: The refresh interval (in seconds) for subscriptions. The
+ subscription mechanism enables obtaining the cluster maps
+ and log information.
+
+:Type: Double
+:Default: ``300``
+
+
+``mon stat smooth intervals``
+
+:Description: Ceph will smooth statistics over the last ``N`` PG maps.
+:Type: Integer
+:Default: ``2``
+
+
+``mon probe timeout``
+
+:Description: Number of seconds the monitor will wait to find peers before bootstrapping.
+:Type: Double
+:Default: ``2.0``
+
+
+``mon daemon bytes``
+
+:Description: The message memory cap for metadata server and OSD messages (in bytes).
+:Type: 64-bit Integer Unsigned
+:Default: ``400ul << 20``
+
+
+``mon max log entries per event``
+
+:Description: The maximum number of log entries per event.
+:Type: Integer
+:Default: ``4096``
+
+
+``mon osd prime pg temp``
+
+:Description: Enables or disables priming the PGMap with the previous OSDs when an ``out``
+              OSD comes back into the cluster. With the ``true`` setting, clients
+              will continue to use the previous OSDs until the newly ``in`` OSDs
+              have peered for that PG.
+:Type: Boolean
+:Default: ``true``
+
+
+``mon osd prime pg temp max time``
+
+:Description: How much time in seconds the monitor should spend trying to prime the
+ PGMap when an out OSD comes back into the cluster.
+:Type: Float
+:Default: ``0.5``
+
+
+``mon osd prime pg temp max time estimate``
+
+:Description: Maximum estimate of time spent on each PG before we prime all PGs
+ in parallel.
+:Type: Float
+:Default: ``0.25``
+
+
+``mon osd allow primary affinity``
+
+:Description: Allow ``primary_affinity`` to be set in the osdmap.
+:Type: Boolean
+:Default: False
+
+
+``mon osd pool ec fast read``
+
+:Description: Whether to turn on fast read for the pool or not. It will be used as
+              the default setting of newly created erasure coded pools if ``fast_read``
+              is not specified at create time.
+:Type: Boolean
+:Default: False
+
+
+``mon mds skip sanity``
+
+:Description: Skip safety assertions on FSMap (in case of bugs where we want to
+              continue anyway). The monitor terminates if the FSMap sanity check
+              fails, but this can be disabled by enabling this option.
+:Type: Boolean
+:Default: False
+
+
+``mon max mdsmap epochs``
+
+:Description: The maximum amount of mdsmap epochs to trim during a single proposal.
+:Type: Integer
+:Default: 500
+
+
+``mon config key max entry size``
+
+:Description: The maximum size of config-key entry (in bytes)
+:Type: Integer
+:Default: 4096
+
+
+``mon scrub interval``
+
+:Description: How often (in seconds) the monitor scrubs its store by comparing
+              the stored checksums with the computed ones for all stored
+              keys.
+:Type: Integer
+:Default: 3600*24
+
+
+``mon scrub max keys``
+
+:Description: The maximum number of keys to scrub each time.
+:Type: Integer
+:Default: 100
+
+
+``mon compact on start``
+
+:Description: Compact the database used as Ceph Monitor store on
+              ``ceph-mon`` start. A manual compaction helps to shrink the
+              monitor database and improve its performance if the regular
+              compaction fails to work.
+:Type: Boolean
+:Default: False
+
+
+``mon compact on bootstrap``
+
+:Description: Compact the database used as Ceph Monitor store
+              on bootstrap. Monitors start probing each other to create
+              a quorum after bootstrap. If a monitor times out before joining
+              the quorum, it will start over and bootstrap itself again.
+:Type: Boolean
+:Default: False
+
+
+``mon compact on trim``
+
+:Description: Compact a certain prefix (including paxos) when we trim its old states.
+:Type: Boolean
+:Default: True
+
+
+``mon cpu threads``
+
+:Description: Number of threads for performing CPU intensive work on the monitor.
+:Type: Integer
+:Default: ``4``
+
+
+``mon osd mapping pgs per chunk``
+
+:Description: We calculate the mapping from placement group to OSDs in chunks.
+ This option specifies the number of placement groups per chunk.
+:Type: Integer
+:Default: 4096
+
+
+``mon osd max split count``
+
+:Description: Largest number of PGs per "involved" OSD to let split create.
+              When we increase the ``pg_num`` of a pool, the placement groups
+              will be split on all OSDs serving that pool. We want to avoid
+              extreme multipliers on PG splits.
+:Type: Integer
+:Default: 300
+
+
+``mon session timeout``
+
+:Description: The monitor will terminate inactive sessions that stay idle over this
+              time limit.
+:Type: Integer
+:Default: 300
+
+
+
+.. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science)
+.. _Monitor Keyrings: ../../../dev/mon-bootstrap#secret-keys
+.. _Ceph configuration file: ../ceph-conf/#monitors
+.. _Network Configuration Reference: ../network-config-ref
+.. _Monitor lookup through DNS: ../mon-lookup-dns
+.. _ACID: http://en.wikipedia.org/wiki/ACID
+.. _Adding/Removing a Monitor: ../../operations/add-or-rm-mons
+.. _Add/Remove a Monitor (ceph-deploy): ../../deployment/ceph-deploy-mon
+.. _Monitoring a Cluster: ../../operations/monitoring
+.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg
+.. _Bootstrapping a Monitor: ../../../dev/mon-bootstrap
+.. _Changing a Monitor's IP Address: ../../operations/add-or-rm-mons#changing-a-monitor-s-ip-address
+.. _Monitor/OSD Interaction: ../mon-osd-interaction
+.. _Scalability and High Availability: ../../../architecture#scalability-and-high-availability
+.. _Pool values: ../../operations/pools/#set-pool-values
diff --git a/src/ceph/doc/rados/configuration/mon-lookup-dns.rst b/src/ceph/doc/rados/configuration/mon-lookup-dns.rst
new file mode 100644
index 0000000..e32b320
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/mon-lookup-dns.rst
@@ -0,0 +1,51 @@
+===============================
+Looking up Monitors through DNS
+===============================
+
+Since version 11.0.0 RADOS supports looking up Monitors through DNS.
+
+This way daemons and clients do not require a *mon host* configuration directive in their ceph.conf configuration file.
+
+Using DNS SRV TCP records clients are able to look up the monitors.
+
+This allows for less configuration on clients and monitors. Using a DNS update clients and daemons can be made aware of changes in the monitor topology.
+
+By default clients and daemons will look for the TCP service called *ceph-mon* which is configured by the *mon_dns_srv_name* configuration directive.
+
+
+``mon dns srv name``
+
+:Description: The service name used when querying the DNS for the monitor hosts/addresses.
+:Type: String
+:Default: ``ceph-mon``
+
+Example
+-------
+When the DNS search domain is set to *example.com* a DNS zone file might contain the following elements.
+
+First, create records for the Monitors, either IPv4 (A) or IPv6 (AAAA).
+
+::
+
+ mon1.example.com. AAAA 2001:db8::100
+ mon2.example.com. AAAA 2001:db8::200
+ mon3.example.com. AAAA 2001:db8::300
+
+::
+
+ mon1.example.com. A 192.168.0.1
+ mon2.example.com. A 192.168.0.2
+ mon3.example.com. A 192.168.0.3
+
+
+With those records now existing we can create the SRV TCP records with the name *ceph-mon* pointing to the three Monitors.
+
+::
+
+ _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon1.example.com.
+ _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon2.example.com.
+ _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon3.example.com.
+
+In this case the Monitors are running on port *6789*, and their priority and weight are all *10* and *60* respectively.
+
+The current implementation in clients and daemons will *only* respect the priority set in SRV records, and they will only connect to the monitors with lowest-numbered priority. The targets with the same priority will be selected at random.
diff --git a/src/ceph/doc/rados/configuration/mon-osd-interaction.rst b/src/ceph/doc/rados/configuration/mon-osd-interaction.rst
new file mode 100644
index 0000000..e335ff0
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/mon-osd-interaction.rst
@@ -0,0 +1,408 @@
+=====================================
+ Configuring Monitor/OSD Interaction
+=====================================
+
+.. index:: heartbeat
+
+After you have completed your initial Ceph configuration, you may deploy and run
+Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
+:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
+Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
+reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
+OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
+Monitor doesn't receive reports, or if it receives reports of changes in the
+Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph
+Cluster Map`.
+
+Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
+interaction. However, you may override the defaults. The following sections
+describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
+monitoring the Ceph Storage Cluster.
+
+.. index:: heartbeat interval
+
+OSDs Check Heartbeats
+=====================
+
+Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
+seconds. You can change the heartbeat interval by adding an ``osd heartbeat
+interval`` setting under the ``[osd]`` section of your Ceph configuration file,
+or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
+show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may
+consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
+Monitor, which will update the Ceph Cluster Map. You may change this grace
+period by adding an ``osd heartbeat grace`` setting under the ``[mon]``
+and ``[osd]`` or ``[global]`` section of your Ceph configuration file,
+or by setting the value at runtime.
+
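+For example, a sketch of overriding both values in ``ceph.conf`` (the numbers
+are illustrative only):
+
+.. code-block:: ini
+
+    [global]
+    osd heartbeat interval = 5
+    osd heartbeat grace = 30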
+
+.. ditaa:: +---------+ +---------+
+ | OSD 1 | | OSD 2 |
+ +---------+ +---------+
+ | |
+ |----+ Heartbeat |
+ | | Interval |
+ |<---+ Exceeded |
+ | |
+ | Check |
+ | Heartbeat |
+ |------------------->|
+ | |
+ |<-------------------|
+ | Heart Beating |
+ | |
+ |----+ Heartbeat |
+ | | Interval |
+ |<---+ Exceeded |
+ | |
+ | Check |
+ | Heartbeat |
+ |------------------->|
+ | |
+ |----+ Grace |
+ | | Period |
+ |<---+ Exceeded |
+ | |
+ |----+ Mark |
+ | | OSD 2 |
+ |<---+ Down |
+
+
+.. index:: OSD down report
+
+OSDs Report Down OSDs
+=====================
+
+By default, two Ceph OSD Daemons from different hosts must report to the Ceph
+Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
+acknowledge that the reported Ceph OSD Daemon is ``down``. But there is a chance
+that all the OSDs reporting the failure are hosted in a rack with a bad switch
+which has trouble connecting to another OSD. To avoid this sort of false alarm,
+we consider the peers reporting a failure a proxy for a potential "subcluster"
+within the overall cluster that is similarly laggy. This is clearly not true in
+all cases, but will sometimes help us localize the grace correction to a subset
+of the system that is unhappy. ``mon osd reporter subtree level`` is used to
+group the peers into the "subcluster" by their common ancestor type in CRUSH
+map. By default, only two reports from different subtrees are required to report
+another Ceph OSD Daemon ``down``. You can change the number of reporters from
+unique subtrees and the common ancestor type required to report a Ceph OSD
+Daemon ``down`` to a Ceph Monitor by adding ``mon osd min down reporters``
+and ``mon osd reporter subtree level`` settings under the ``[mon]`` section of
+your Ceph configuration file, or by setting the value at runtime.
+
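+For example, a sketch requiring three reporters from different racks before an
+OSD is marked ``down`` (the values are illustrative):
+
+.. code-block:: ini
+
+    [mon]
+    mon osd min down reporters = 3
+    mon osd reporter subtree level = rack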
+
+.. ditaa:: +---------+ +---------+ +---------+
+ | OSD 1 | | OSD 2 | | Monitor |
+ +---------+ +---------+ +---------+
+ | | |
+ | OSD 3 Is Down | |
+ |---------------+--------------->|
+ | | |
+ | | |
+ | | OSD 3 Is Down |
+ | |--------------->|
+ | | |
+ | | |
+ | | |---------+ Mark
+ | | | | OSD 3
+ | | |<--------+ Down
+
+
+.. index:: peering failure
+
+OSDs Report Peering Failure
+===========================
+
+If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
+Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
+the most recent copy of the cluster map every 30 seconds. You can change the
+Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
+setting under the ``[osd]`` section of your Ceph configuration file, or by
+setting the value at runtime.
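+
+For example, a sketch of polling the monitor more frequently when peering
+fails (the value is illustrative only):
+
+.. code-block:: ini
+
+    [osd]
+    osd mon heartbeat interval = 15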
+
+.. ditaa:: +---------+ +---------+ +-------+ +---------+
+ | OSD 1 | | OSD 2 | | OSD 3 | | Monitor |
+ +---------+ +---------+ +-------+ +---------+
+ | | | |
+ | Request To | | |
+ | Peer | | |
+ |-------------->| | |
+ |<--------------| | |
+ | Peering | |
+ | | |
+ | Request To | |
+ | Peer | |
+ |----------------------------->| |
+ | |
+ |----+ OSD Monitor |
+ | | Heartbeat |
+ |<---+ Interval Exceeded |
+ | |
+ | Failed to Peer with OSD 3 |
+ |-------------------------------------------->|
+ |<--------------------------------------------|
+ | Receive New Cluster Map |
+
+
+.. index:: OSD status
+
+OSDs Report Their Status
+========================
+
+If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
+consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
+elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds of
+a reportable event such as a failure, a change in placement group stats, a
+change in ``up_thru``, or a boot. You can change the Ceph OSD
+Daemon minimum report interval by adding an ``osd mon report interval min``
+setting under the ``[osd]`` section of your Ceph configuration file, or by
+setting the value at runtime. A Ceph OSD Daemon sends a report to a Ceph
+Monitor every 120 seconds irrespective of whether any notable changes occur.
+You can change the Ceph Monitor report interval by adding an ``osd mon report
+interval max`` setting under the ``[osd]`` section of your Ceph configuration
+file, or by setting the value at runtime.
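+
+For example, to set both report intervals explicitly in ``ceph.conf`` (the
+values shown are the defaults):
+
+.. code-block:: ini
+
+    [osd]
+    osd mon report interval min = 5
+    osd mon report interval max = 120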
+
+
+.. ditaa:: +---------+ +---------+
+ | OSD 1 | | Monitor |
+ +---------+ +---------+
+ | |
+ |----+ Report Min |
+ | | Interval |
+ |<---+ Exceeded |
+ | |
+ |----+ Reportable |
+ | | Event |
+ |<---+ Occurs |
+ | |
+ | Report To |
+ | Monitor |
+ |------------------->|
+ | |
+ |----+ Report Max |
+ | | Interval |
+ |<---+ Exceeded |
+ | |
+ | Report To |
+ | Monitor |
+ |------------------->|
+ | |
+ |----+ Monitor |
+ | | Fails |
+ |<---+ |
+ +----+ Monitor OSD
+ | | Report Timeout
+ |<---+ Exceeded
+ |
+ +----+ Mark
+ | | OSD 1
+ |<---+ Down
+
+
+
+
+Configuration Settings
+======================
+
+When modifying heartbeat settings, you should include them in the ``[global]``
+section of your configuration file.
+
+.. index:: monitor heartbeat
+
+Monitor Settings
+----------------
+
+``mon osd min up ratio``
+
+:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
+ mark Ceph OSD Daemons ``down``.
+
+:Type: Double
+:Default: ``.3``
+
+
+``mon osd min in ratio``
+
+:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
+ mark Ceph OSD Daemons ``out``.
+
+:Type: Double
+:Default: ``.75``
+
+
+``mon osd laggy halflife``
+
+:Description: The number of seconds laggy estimates will decay.
+:Type: Integer
+:Default: ``60*60``
+
+
+``mon osd laggy weight``
+
+:Description: The weight for new samples in laggy estimation decay.
+:Type: Double
+:Default: ``0.3``
+
+
+
+``mon osd laggy max interval``
+
+:Description: Maximum value of ``laggy_interval`` in laggy estimations (in seconds).
+ Monitor uses an adaptive approach to evaluate the ``laggy_interval`` of
+ a certain OSD. This value will be used to calculate the grace time for
+ that OSD.
+:Type: Integer
+:Default: 300
+
+``mon osd adjust heartbeat grace``
+
+:Description: If set to ``true``, Ceph will scale the heartbeat grace period
+              based on laggy estimations.
+:Type: Boolean
+:Default: ``true``
+
+
+``mon osd adjust down out interval``
+
+:Description: If set to ``true``, Ceph will scale the interval before marking a
+              Ceph OSD Daemon ``out`` based on laggy estimations.
+:Type: Boolean
+:Default: ``true``
+
+
+``mon osd auto mark in``
+
+:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
+ the Ceph Storage Cluster.
+
+:Type: Boolean
+:Default: ``false``
+
+
+``mon osd auto mark auto out in``
+
+:Description: Ceph will mark booting Ceph OSD Daemons auto marked ``out``
+ of the Ceph Storage Cluster as ``in`` the cluster.
+
+:Type: Boolean
+:Default: ``true``
+
+
+``mon osd auto mark new in``
+
+:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
+ Ceph Storage Cluster.
+
+:Type: Boolean
+:Default: ``true``
+
+
+``mon osd down out interval``
+
+:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon
+ ``down`` and ``out`` if it doesn't respond.
+
+:Type: 32-bit Integer
+:Default: ``600``
+
+
+``mon osd down out subtree limit``
+
+:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
+ automatically mark out. For instance, if set to ``host`` and if
+ all OSDs of a host are down, Ceph will not automatically mark out
+ these OSDs.
+
+:Type: String
+:Default: ``rack``
+
+
+``mon osd report timeout``
+
+:Description: The grace period in seconds before declaring
+ unresponsive Ceph OSD Daemons ``down``.
+
+:Type: 32-bit Integer
+:Default: ``900``
+
+``mon osd min down reporters``
+
+:Description: The minimum number of Ceph OSD Daemons required to report a
+ ``down`` Ceph OSD Daemon.
+
+:Type: 32-bit Integer
+:Default: ``2``
+
+
+``mon osd reporter subtree level``
+
+:Description: The level of the parent CRUSH bucket under which failure
+              reporters are counted as distinct. OSDs send failure reports to a
+              monitor if they find that a peer is not responsive, and the
+              monitor marks the reported OSD ``down``, and later ``out``, after
+              a grace period.
+:Type: String
+:Default: ``host``
+
+
+.. index:: OSD heartbeat
+
+OSD Settings
+------------
+
+``osd heartbeat address``
+
+:Description: A Ceph OSD Daemon's network address for heartbeats.
+:Type: Address
+:Default: The host address.
+
+
+``osd heartbeat interval``
+
+:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
+:Type: 32-bit Integer
+:Default: ``6``
+
+
+``osd heartbeat grace``
+
+:Description: The number of seconds a Ceph OSD Daemon may go without showing a
+              heartbeat before the Ceph Storage Cluster considers it ``down``.
+              This setting has to be set in both the ``[mon]`` and ``[osd]``
+              sections (or in ``[global]``) so that it is read by both the MON
+              and OSD daemons.
+:Type: 32-bit Integer
+:Default: ``20``
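+
+For example, to raise the grace period cluster-wide, a single ``[global]``
+entry keeps the MON and OSD values in agreement (the value is illustrative):
+
+.. code-block:: ini
+
+    [global]
+    osd heartbeat grace = 30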
+
+
+``osd mon heartbeat interval``
+
+:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
+ Ceph OSD Daemon peers.
+
+:Type: 32-bit Integer
+:Default: ``30``
+
+
+``osd mon report interval max``
+
+:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before
+ it must report to a Ceph Monitor.
+
+:Type: 32-bit Integer
+:Default: ``120``
+
+
+``osd mon report interval min``
+
+:Description: The minimum number of seconds a Ceph OSD Daemon may wait
+ from startup or another reportable event before reporting
+ to a Ceph Monitor.
+
+:Type: 32-bit Integer
+:Default: ``5``
+:Valid Range: Should be less than ``osd mon report interval max``
+
+
+``osd mon ack timeout``
+
+:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a
+ request for statistics.
+
+:Type: 32-bit Integer
+:Default: ``30``
diff --git a/src/ceph/doc/rados/configuration/ms-ref.rst b/src/ceph/doc/rados/configuration/ms-ref.rst
new file mode 100644
index 0000000..55d009e
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/ms-ref.rst
@@ -0,0 +1,154 @@
+===========
+ Messaging
+===========
+
+General Settings
+================
+
+``ms tcp nodelay``
+
+:Description: Disables Nagle's algorithm on messenger TCP sessions.
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+
+``ms initial backoff``
+
+:Description: The initial time to wait before reconnecting on a fault.
+:Type: Double
+:Required: No
+:Default: ``.2``
+
+
+``ms max backoff``
+
+:Description: The maximum time to wait before reconnecting on a fault.
+:Type: Double
+:Required: No
+:Default: ``15.0``
+
+
+``ms nocrc``
+
+:Description: Disables CRC on network messages. May increase performance if CPU limited.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``ms die on bad msg``
+
+:Description: Debug option; do not configure.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``ms dispatch throttle bytes``
+
+:Description: Throttles total size of messages waiting to be dispatched.
+:Type: 64-bit Unsigned Integer
+:Required: No
+:Default: ``100 << 20``
+
+
+``ms bind ipv6``
+
+:Description: Enable if you want your daemons to bind to IPv6 addresses instead of IPv4 ones. (Not required if you specify a daemon or cluster IP.)
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
+``ms rwthread stack bytes``
+
+:Description: Debug option for stack size; do not configure.
+:Type: 64-bit Unsigned Integer
+:Required: No
+:Default: ``1024 << 10``
+
+
+``ms tcp read timeout``
+
+:Description: Controls how long (in seconds) the messenger will wait before closing an idle connection.
+:Type: 64-bit Unsigned Integer
+:Required: No
+:Default: ``900``
+
+
+``ms inject socket failures``
+
+:Description: Debug option; do not configure.
+:Type: 64-bit Unsigned Integer
+:Required: No
+:Default: ``0``
+
+Async messenger options
+=======================
+
+
+``ms async transport type``
+
+:Description: Transport type used by Async Messenger. Can be ``posix``, ``dpdk``
+ or ``rdma``. Posix uses standard TCP/IP networking and is default.
+ Other transports may be experimental and support may be limited.
+:Type: String
+:Required: No
+:Default: ``posix``
+
+
+``ms async op threads``
+
+:Description: Initial number of worker threads used by each Async Messenger instance.
+ Should be at least equal to highest number of replicas, but you can
+ decrease it if you are low on CPU core count and/or you host a lot of
+ OSDs on single server.
+:Type: 64-bit Unsigned Integer
+:Required: No
+:Default: ``3``
+
+
+``ms async max op threads``
+
+:Description: Maximum number of worker threads used by each Async Messenger instance.
+              Set to lower values when your machine has limited CPU count, and increase
+              when your CPUs are underutilized (i.e., one or more CPUs are
+              constantly at 100% load during I/O operations).
+:Type: 64-bit Unsigned Integer
+:Required: No
+:Default: ``5``
+
+
+``ms async set affinity``
+
+:Description: Set to true to bind Async Messenger workers to particular CPU cores.
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+
+``ms async affinity cores``
+
+:Description: When ``ms async set affinity`` is true, this string specifies how Async
+              Messenger workers are bound to CPU cores. For example, "0,2" will bind
+              workers #1 and #2 to CPU cores #0 and #2, respectively.
+              NOTE: when manually setting affinity, make sure not to assign
+              workers to virtual CPUs created by Hyperthreading or similar
+              technology, because they are slower than regular CPU cores.
+:Type: String
+:Required: No
+:Default: ``(empty)``
+
+
+``ms async send inline``
+
+:Description: Send messages directly from the thread that generated them instead of
+ queuing and sending from Async Messenger thread. This option is known
+ to decrease performance on systems with a lot of CPU cores, so it's
+ disabled by default.
+:Type: Boolean
+:Required: No
+:Default: ``false``
+
+
diff --git a/src/ceph/doc/rados/configuration/network-config-ref.rst b/src/ceph/doc/rados/configuration/network-config-ref.rst
new file mode 100644
index 0000000..2d7f9d6
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/network-config-ref.rst
@@ -0,0 +1,494 @@
+=================================
+ Network Configuration Reference
+=================================
+
+Network configuration is critical for building a high performance :term:`Ceph
+Storage Cluster`. The Ceph Storage Cluster does not perform request routing or
+dispatching on behalf of the :term:`Ceph Client`. Instead, Ceph Clients make
+requests directly to Ceph OSD Daemons. Ceph OSD Daemons perform data replication
+on behalf of Ceph Clients, which means replication and other factors impose
+additional loads on Ceph Storage Cluster networks.
+
+Our Quick Start configurations provide a trivial `Ceph configuration file`_ that
+sets monitor IP addresses and daemon host names only. Unless you specify a
+cluster network, Ceph assumes a single "public" network. Ceph functions just
+fine with a public network only, but you may see significant performance
+improvement with a second "cluster" network in a large cluster.
+
+We recommend running a Ceph Storage Cluster with two networks: a public
+(front-side) network and a cluster (back-side) network. To support two networks,
+each :term:`Ceph Node` will need to have more than one NIC. See `Hardware
+Recommendations - Networks`_ for additional details.
+
+.. ditaa::
+ +-------------+
+ | Ceph Client |
+ +----*--*-----+
+ | ^
+ Request | : Response
+ v |
+ /----------------------------------*--*-------------------------------------\
+ | Public Network |
+ \---*--*------------*--*-------------*--*------------*--*------------*--*---/
+ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
+ | | | | | | | | | |
+ | : | : | : | : | :
+ v v v v v v v v v v
+ +---*--*---+ +---*--*---+ +---*--*---+ +---*--*---+ +---*--*---+
+ | Ceph MON | | Ceph MDS | | Ceph OSD | | Ceph OSD | | Ceph OSD |
+ +----------+ +----------+ +---*--*---+ +---*--*---+ +---*--*---+
+ ^ ^ ^ ^ ^ ^
+ The cluster network relieves | | | | | |
+ OSD replication and heartbeat | : | : | :
+ traffic from the public network. v v v v v v
+ /------------------------------------*--*------------*--*------------*--*---\
+ | cCCC Cluster Network |
+ \---------------------------------------------------------------------------/
+
+
+There are several reasons to consider operating two separate networks:
+
+#. **Performance:** Ceph OSD Daemons handle data replication for the Ceph
+ Clients. When Ceph OSD Daemons replicate data more than once, the network
+ load between Ceph OSD Daemons easily dwarfs the network load between Ceph
+ Clients and the Ceph Storage Cluster. This can introduce latency and
+ create a performance problem. Recovery and rebalancing can
+ also introduce significant latency on the public network. See
+ `Scalability and High Availability`_ for additional details on how Ceph
+ replicates data. See `Monitor / OSD Interaction`_ for details on heartbeat
+ traffic.
+
+#. **Security**: While most people are generally civil, a very tiny segment of
+ the population likes to engage in what's known as a Denial of Service (DoS)
+ attack. When traffic between Ceph OSD Daemons gets disrupted, placement
+ groups may no longer reflect an ``active + clean`` state, which may prevent
+ users from reading and writing data. A great way to defeat this type of
+ attack is to maintain a completely separate cluster network that doesn't
+ connect directly to the internet. Also, consider using `Message Signatures`_
+ to defeat spoofing attacks.
+
+
+IP Tables
+=========
+
+By default, daemons `bind`_ to ports within the ``6800:7300`` range. You may
+configure this range at your discretion. Before configuring your IP tables,
+check the default ``iptables`` configuration. ::
+
+ sudo iptables -L
+
+Some Linux distributions include rules that reject all inbound requests
+except SSH from all network interfaces. For example::
+
+ REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
+
+You will need to delete these rules on both your public and cluster networks
+initially, and replace them with appropriate rules when you are ready to
+harden the ports on your Ceph Nodes.
+
+
+Monitor IP Tables
+-----------------
+
+Ceph Monitors listen on port ``6789`` by default. Additionally, Ceph Monitors
+always operate on the public network. When you add the rule using the example
+below, make sure you replace ``{iface}`` with the public network interface
+(e.g., ``eth0``, ``eth1``, etc.), ``{ip-address}`` with the IP address of the
+public network and ``{netmask}`` with the netmask for the public network. ::
+
+ sudo iptables -A INPUT -i {iface} -p tcp -s {ip-address}/{netmask} --dport 6789 -j ACCEPT
+
+
+MDS IP Tables
+-------------
+
+A :term:`Ceph Metadata Server` listens on the first available port on the public
+network beginning at port 6800. Note that this behavior is not deterministic, so
+if you are running more than one OSD or MDS on the same host, or if you restart
+the daemons within a short window of time, the daemons will bind to higher
+ports. You should open the entire 6800-7300 range by default. When you add the
+rule using the example below, make sure you replace ``{iface}`` with the public
+network interface (e.g., ``eth0``, ``eth1``, etc.), ``{ip-address}`` with the IP
+address of the public network and ``{netmask}`` with the netmask of the public
+network.
+
+For example::
+
+ sudo iptables -A INPUT -i {iface} -m multiport -p tcp -s {ip-address}/{netmask} --dports 6800:7300 -j ACCEPT
+
+
+OSD IP Tables
+-------------
+
+By default, Ceph OSD Daemons `bind`_ to the first available ports on a Ceph Node
+beginning at port 6800. Note that this behavior is not deterministic, so if you
+are running more than one OSD or MDS on the same host, or if you restart the
+daemons within a short window of time, the daemons will bind to higher ports.
+Each Ceph OSD Daemon on a Ceph Node may use up to four ports:
+
+#. One for talking to clients and monitors.
+#. One for sending data to other OSDs.
+#. Two for heartbeating on each interface.
+
+.. ditaa::
+ /---------------\
+ | OSD |
+ | +---+----------------+-----------+
+ | | Clients & Monitors | Heartbeat |
+ | +---+----------------+-----------+
+ | |
+ | +---+----------------+-----------+
+ | | Data Replication | Heartbeat |
+ | +---+----------------+-----------+
+ | cCCC |
+ \---------------/
+
+When a daemon fails and restarts without letting go of the port, the restarted
+daemon will bind to a new port. You should open the entire 6800-7300 port range
+to handle this possibility.
+
+If you set up separate public and cluster networks, you must add rules for both
+the public network and the cluster network, because clients will connect using
+the public network and other Ceph OSD Daemons will connect using the cluster
+network. When you add the rule using the example below, make sure you replace
+``{iface}`` with the network interface (e.g., ``eth0``, ``eth1``, etc.),
+``{ip-address}`` with the IP address and ``{netmask}`` with the netmask of the
+public or cluster network. For example::
+
+ sudo iptables -A INPUT -i {iface} -m multiport -p tcp -s {ip-address}/{netmask} --dports 6800:7300 -j ACCEPT
+
+.. tip:: If you run Ceph Metadata Servers on the same Ceph Node as the
+ Ceph OSD Daemons, you can consolidate the public network configuration step.
+
+
+Ceph Networks
+=============
+
+To configure Ceph networks, you must add a network configuration to the
+``[global]`` section of the configuration file. Our 5-minute Quick Start
+provides a trivial `Ceph configuration file`_ that assumes one public network
+with client and server on the same network and subnet. Ceph functions just fine
+with a public network only. However, Ceph allows you to establish much more
+specific criteria, including multiple IP network and subnet masks for your
+public network. You can also establish a separate cluster network to handle OSD
+heartbeat, object replication and recovery traffic. Don't confuse the IP
+addresses you set in your configuration with the public-facing IP addresses
+network clients may use to access your service. Typical internal IP networks are
+often ``192.168.0.0`` or ``10.0.0.0``.
+
+.. tip:: If you specify more than one IP address and subnet mask for
+ either the public or the cluster network, the subnets within the network
+ must be capable of routing to each other. Additionally, make sure you
+ include each IP address/subnet in your IP tables and open ports for them
+ as necessary.
+
+.. note:: Ceph uses `CIDR`_ notation for subnets (e.g., ``10.0.0.0/24``).
+
+When you have configured your networks, you may restart your cluster or restart
+each daemon. Ceph daemons bind dynamically, so you do not have to restart the
+entire cluster at once if you change your network configuration.
+
+
+Public Network
+--------------
+
+To configure a public network, add the following option to the ``[global]``
+section of your Ceph configuration file.
+
+.. code-block:: ini
+
+ [global]
+ ...
+ public network = {public-network/netmask}
+
+
+Cluster Network
+---------------
+
+If you declare a cluster network, OSDs will route heartbeat, object replication
+and recovery traffic over the cluster network. This may improve performance
+compared to using a single network. To configure a cluster network, add the
+following option to the ``[global]`` section of your Ceph configuration file.
+
+.. code-block:: ini
+
+ [global]
+ ...
+ cluster network = {cluster-network/netmask}
+
+We prefer that the cluster network is **NOT** reachable from the public network
+or the Internet for added security.
+
+
+Ceph Daemons
+============
+
+Ceph has one network configuration requirement that applies to all daemons: the
+Ceph configuration file **MUST** specify the ``host`` for each daemon. Ceph also
+requires that a Ceph configuration file specify the monitor IP address and its
+port.
+
+.. important:: Some deployment tools (e.g., ``ceph-deploy``, Chef) may create a
+ configuration file for you. **DO NOT** set these values if the deployment
+ tool does it for you.
+
+.. tip:: The ``host`` setting is the short name of the host (i.e., not
+ an fqdn). It is **NOT** an IP address either. Enter ``hostname -s`` on
+ the command line to retrieve the name of the host.
+
+
+.. code-block:: ini
+
+ [mon.a]
+
+ host = {hostname}
+ mon addr = {ip-address}:6789
+
+ [osd.0]
+ host = {hostname}
+
+
+You do not have to set the host IP address for a daemon. If you have a static IP
+configuration and both public and cluster networks running, the Ceph
+configuration file may specify the IP address of the host for each daemon. To
+set a static IP address for a daemon, the following option(s) should appear in
+the daemon instance sections of your ``ceph.conf`` file.
+
+.. code-block:: ini
+
+ [osd.0]
+ public addr = {host-public-ip-address}
+ cluster addr = {host-cluster-ip-address}
+
+
+.. topic:: One NIC OSD in a Two Network Cluster
+
+ Generally, we do not recommend deploying an OSD host with a single NIC in a
+ cluster with two networks. However, you may accomplish this by forcing the
+ OSD host to operate on the public network by adding a ``public addr`` entry
+ to the ``[osd.n]`` section of the Ceph configuration file, where ``n``
+ refers to the number of the OSD with one NIC. Additionally, the public
+ network and cluster network must be able to route traffic to each other,
+ which we don't recommend for security reasons.
+
+
+Network Config Settings
+=======================
+
+Network configuration settings are not required. Ceph assumes a public network
+with all hosts operating on it unless you specifically configure a cluster
+network.
+
+
+Public Network
+--------------
+
+The public network configuration allows you to specifically define IP addresses
+and subnets for the public network. You may specifically assign static IP
+addresses or override ``public network`` settings using the ``public addr``
+setting for a specific daemon.
+
+``public network``
+
+:Description: The IP address and netmask of the public (front-side) network
+ (e.g., ``192.168.0.0/24``). Set in ``[global]``. You may specify
+ comma-delimited subnets.
+
+:Type: ``{ip-address}/{netmask} [, {ip-address}/{netmask}]``
+:Required: No
+:Default: N/A
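+
+For example, a public network spanning two subnets might look like this in
+``ceph.conf`` (the addresses are illustrative):
+
+.. code-block:: ini
+
+    [global]
+    public network = 192.168.0.0/24, 192.168.1.0/24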
+
+
+``public addr``
+
+:Description: The IP address for the public (front-side) network.
+ Set for each daemon.
+
+:Type: IP Address
+:Required: No
+:Default: N/A
+
+
+
+Cluster Network
+---------------
+
+The cluster network configuration allows you to declare a cluster network, and
+specifically define IP addresses and subnets for the cluster network. You may
+specifically assign static IP addresses or override ``cluster network``
+settings using the ``cluster addr`` setting for specific OSD daemons.
+
+
+``cluster network``
+
+:Description: The IP address and netmask of the cluster (back-side) network
+ (e.g., ``10.0.0.0/24``). Set in ``[global]``. You may specify
+ comma-delimited subnets.
+
+:Type: ``{ip-address}/{netmask} [, {ip-address}/{netmask}]``
+:Required: No
+:Default: N/A
+
+
+``cluster addr``
+
+:Description: The IP address for the cluster (back-side) network.
+ Set for each daemon.
+
+:Type: Address
+:Required: No
+:Default: N/A
+
+
+Bind
+----
+
+Bind settings set the default port ranges Ceph OSD and MDS daemons use. The
+default range is ``6800:7300``. Ensure that your `IP Tables`_ configuration
+allows you to use the configured port range.
+
+You may also enable Ceph daemons to bind to IPv6 addresses instead of IPv4
+addresses.
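+
+For example, to state the default port range explicitly and bind to IPv6 (the
+port values shown are the defaults):
+
+.. code-block:: ini
+
+    [global]
+    ms bind port min = 6800
+    ms bind port max = 7300
+    ms bind ipv6 = true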
+
+
+``ms bind port min``
+
+:Description: The minimum port number to which an OSD or MDS daemon will bind.
+:Type: 32-bit Integer
+:Default: ``6800``
+:Required: No
+
+
+``ms bind port max``
+
+:Description: The maximum port number to which an OSD or MDS daemon will bind.
+:Type: 32-bit Integer
+:Default: ``7300``
+:Required: No.
+
+
+``ms bind ipv6``
+
+:Description: Enables Ceph daemons to bind to IPv6 addresses. Currently the
+              messenger uses *either* IPv4 or IPv6, but it cannot do both.
+:Type: Boolean
+:Default: ``false``
+:Required: No
+
+``public bind addr``
+
+:Description: In some dynamic deployments the Ceph MON daemon might bind
+              to an IP address locally that is different from the ``public addr``
+              advertised to other peers in the network. The environment must ensure
+              that routing rules are set correctly. If ``public bind addr`` is set
+              the Ceph MON daemon will bind to it locally and use ``public addr``
+              in the monmaps to advertise its address to peers. This behavior is limited
+              to the MON daemon.
+
+:Type: IP Address
+:Required: No
+:Default: N/A
+
+
+
+Hosts
+-----
+
+Ceph expects at least one monitor declared in the Ceph configuration file, with
+a ``mon addr`` setting under each declared monitor. Ceph expects a ``host``
+setting under each declared monitor, metadata server and OSD in the Ceph
+configuration file. Optionally, a monitor can be assigned a priority; when
+priorities are specified, clients will always prefer to connect to the monitor
+with the lowest priority value.
+
+
+``mon addr``
+
+:Description: A list of ``{hostname}:{port}`` entries that clients can use to
+ connect to a Ceph monitor. If not set, Ceph searches ``[mon.*]``
+ sections.
+
+:Type: String
+:Required: No
+:Default: N/A
+
+``mon priority``
+
+:Description: The priority of the declared monitor: the lower the value, the
+              more preferred the monitor is when a client selects one while
+              trying to connect to the cluster.
+
+:Type: Unsigned 16-bit Integer
+:Required: No
+:Default: 0
+
+``host``
+
+:Description: The hostname. Use this setting for specific daemon instances
+ (e.g., ``[osd.0]``).
+
+:Type: String
+:Required: Yes, for daemon instances.
+:Default: ``localhost``
+
+.. tip:: Do not use ``localhost``. To get your host name, execute
+ ``hostname -s`` on your command line and use the name of your host
+ (to the first period, not the fully-qualified domain name).
+
+.. important:: You should not specify any value for ``host`` when using a third
+ party deployment system that retrieves the host name for you.
+
+
+
+TCP
+---
+
+Ceph disables TCP buffering by default.
+
+
+``ms tcp nodelay``
+
+:Description: Ceph enables ``ms tcp nodelay`` so that each request is sent
+ immediately (no buffering). Disabling `Nagle's algorithm`_
+ increases network traffic, which can introduce latency. If you
+ experience large numbers of small packets, you may try
+ disabling ``ms tcp nodelay``.
+
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+
+
+``ms tcp rcvbuf``
+
+:Description: The size of the socket buffer on the receiving end of a network
+              connection. Disabled by default.
+
+:Type: 32-bit Integer
+:Required: No
+:Default: ``0``
+
+
+
+``ms tcp read timeout``
+
+:Description: If a client or daemon makes a request to another Ceph daemon and
+ does not drop an unused connection, the ``ms tcp read timeout``
+ defines the connection as idle after the specified number
+ of seconds.
+
+:Type: Unsigned 64-bit Integer
+:Required: No
+:Default: ``900`` (15 minutes).
+
+
+
+.. _Scalability and High Availability: ../../../architecture#scalability-and-high-availability
+.. _Hardware Recommendations - Networks: ../../../start/hardware-recommendations#networks
+.. _Ceph configuration file: ../../../start/quick-ceph-deploy/#create-a-cluster
+.. _hardware recommendations: ../../../start/hardware-recommendations
+.. _Monitor / OSD Interaction: ../mon-osd-interaction
+.. _Message Signatures: ../auth-config-ref#signatures
+.. _CIDR: http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing
+.. _Nagle's Algorithm: http://en.wikipedia.org/wiki/Nagle's_algorithm
diff --git a/src/ceph/doc/rados/configuration/osd-config-ref.rst b/src/ceph/doc/rados/configuration/osd-config-ref.rst
new file mode 100644
index 0000000..fae7078
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/osd-config-ref.rst
@@ -0,0 +1,1105 @@
+======================
+ OSD Config Reference
+======================
+
+.. index:: OSD; configuration
+
+You can configure Ceph OSD Daemons in the Ceph configuration file, but Ceph OSD
+Daemons can use the default values and a very minimal configuration. A minimal
+Ceph OSD Daemon configuration sets ``osd journal size`` and ``host``, and
+uses default values for nearly everything else.
+
+Ceph OSD Daemons are numerically identified in incremental fashion, beginning
+with ``0`` using the following convention. ::
+
+ osd.0
+ osd.1
+ osd.2
+
+In a configuration file, you may specify settings for all Ceph OSD Daemons in
+the cluster by adding configuration settings to the ``[osd]`` section of your
+configuration file. To add settings directly to a specific Ceph OSD Daemon
+(e.g., ``host``), enter it in an OSD-specific section of your configuration
+file. For example:
+
+.. code-block:: ini
+
+ [osd]
+ osd journal size = 1024
+
+ [osd.0]
+ host = osd-host-a
+
+ [osd.1]
+ host = osd-host-b
+
+
+.. index:: OSD; config settings
+
+General Settings
+================
+
+The following settings provide a Ceph OSD Daemon's ID, and determine paths to
+data and journals. Ceph deployment scripts typically generate the UUID
+automatically. We **DO NOT** recommend changing the default paths for data or
+journals, as it makes it more problematic to troubleshoot Ceph later.
+
+The journal size should be at least twice the product of the expected drive
+speed and ``filestore max sync interval``. However, the most common
+practice is to partition the journal drive (often an SSD), and mount it such
+that Ceph uses the entire partition for the journal.
+
+
+``osd uuid``
+
+:Description: The universally unique identifier (UUID) for the Ceph OSD Daemon.
+:Type: UUID
+:Default: The UUID.
+:Note: The ``osd uuid`` applies to a single Ceph OSD Daemon. The ``fsid``
+ applies to the entire cluster.
+
+
+``osd data``
+
+:Description: The path to the OSD's data. You must create the directory when
+ deploying Ceph. You should mount a drive for OSD data at this
+ mount point. We do not recommend changing the default.
+
+:Type: String
+:Default: ``/var/lib/ceph/osd/$cluster-$id``
+
+
+``osd max write size``
+
+:Description: The maximum size of a write in megabytes.
+:Type: 32-bit Integer
+:Default: ``90``
+
+
+``osd client message size cap``
+
+:Description: The largest client data message allowed in memory.
+:Type: 64-bit Unsigned Integer
+:Default: 500MB. ``500*1024L*1024L``
+
+
+``osd class dir``
+
+:Description: The class path for RADOS class plug-ins.
+:Type: String
+:Default: ``$libdir/rados-classes``
+
+
+.. index:: OSD; file system
+
+File System Settings
+====================
+Ceph builds and mounts file systems which are used for Ceph OSDs.
+
+``osd mkfs options {fs-type}``
+
+:Description: Options used when creating a new Ceph OSD of type {fs-type}.
+
+:Type: String
+:Default for xfs: ``-f -i 2048``
+:Default for other file systems: {empty string}
+
+For example::
+
+  osd mkfs options xfs = -f -d agcount=24
+
+``osd mount options {fs-type}``
+
+:Description: Options used when mounting a Ceph OSD of type {fs-type}.
+
+:Type: String
+:Default for xfs: ``rw,noatime,inode64``
+:Default for other file systems: ``rw, noatime``
+
+For example::
+
+  osd mount options xfs = rw, noatime, inode64, logbufs=8
+
+
+.. index:: OSD; journal settings
+
+Journal Settings
+================
+
+By default, Ceph expects that you will store a Ceph OSD Daemon's journal at
+the following path::
+
+ /var/lib/ceph/osd/$cluster-$id/journal
+
+Without performance optimization, Ceph stores the journal on the same disk as
+the Ceph OSD Daemon's data. A Ceph OSD Daemon optimized for performance may use
+a separate disk to store journal data (e.g., a solid state drive delivers high
+performance journaling).
+
+Ceph's default ``osd journal size`` is 0, so you will need to set this in your
+``ceph.conf`` file. To size the journal, take the product of the ``filestore
+max sync interval`` and the expected throughput, and multiply the product by
+two (2)::
+
+ osd journal size = {2 * (expected throughput * filestore max sync interval)}
+
+The expected throughput number should include the expected disk throughput
+(i.e., sustained data transfer rate), and network throughput. For example,
+a 7200 RPM disk will likely have approximately 100 MB/s. Taking the ``min()``
+of the disk and network throughput should provide a reasonable expected
+throughput. Some users just start off with a 10GB journal size. For
+example::
+
+ osd journal size = 10000
+
+
+``osd journal``
+
+:Description: The path to the OSD's journal. This may be a path to a file or a
+ block device (such as a partition of an SSD). If it is a file,
+ you must create the directory to contain it. We recommend using a
+ drive separate from the ``osd data`` drive.
+
+:Type: String
+:Default: ``/var/lib/ceph/osd/$cluster-$id/journal``
+
+
+``osd journal size``
+
+:Description: The size of the journal in megabytes. If this is 0, and the
+ journal is a block device, the entire block device is used.
+ Since v0.54, this is ignored if the journal is a block device,
+ and the entire block device is used.
+
+:Type: 32-bit Integer
+:Default: ``5120``
+:Recommended: Begin with 1GB. Should be at least twice the product of the
+ expected speed multiplied by ``filestore max sync interval``.
+
+
+See `Journal Config Reference`_ for additional details.
+
+
+Monitor OSD Interaction
+=======================
+
+Ceph OSD Daemons check each other's heartbeats and report to monitors
+periodically. Ceph can use default values in many cases. However, if your
+network has latency issues, you may need to adopt longer intervals. See
+`Configuring Monitor/OSD Interaction`_ for a detailed discussion of heartbeats.
+
+
+Data Placement
+==============
+
+See `Pool & PG Config Reference`_ for details.
+
+
+.. index:: OSD; scrubbing
+
+Scrubbing
+=========
+
+In addition to making multiple copies of objects, Ceph ensures data integrity by
+scrubbing placement groups. Ceph scrubbing is analogous to ``fsck`` on the
+object storage layer. For each placement group, Ceph generates a catalog of all
+objects and compares each primary object and its replicas to ensure that no
+objects are missing or mismatched. Light scrubbing (daily) checks the object
+size and attributes. Deep scrubbing (weekly) reads the data and uses checksums
+to ensure data integrity.
+
+Scrubbing is important for maintaining data integrity, but it can reduce
+performance. You can adjust the following settings to increase or decrease
+scrubbing operations.
+
+
+``osd max scrubs``
+
+:Description: The maximum number of simultaneous scrub operations for
+ a Ceph OSD Daemon.
+
+:Type: 32-bit Int
+:Default: ``1``
+
+``osd scrub begin hour``
+
+:Description: The time of day for the lower bound when a scheduled scrub can be
+ performed.
+:Type: Integer in the range of 0 to 24
+:Default: ``0``
+
+
+``osd scrub end hour``
+
+:Description: The time of day for the upper bound when a scheduled scrub can be
+              performed. Along with ``osd scrub begin hour``, this defines a
+              time window in which scrubs can happen. However, a scrub will be
+              performed regardless of the time window as long as the placement
+              group's scrub interval exceeds ``osd scrub max interval``.
+:Type: Integer in the range of 0 to 24
+:Default: ``24``
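+
+For example, to restrict scheduled scrubs to the early morning hours (the
+values are illustrative):
+
+.. code-block:: ini
+
+    [osd]
+    osd scrub begin hour = 1
+    osd scrub end hour = 7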
+
+
+``osd scrub during recovery``
+
+:Description: Allow scrub during recovery. Setting this to ``false`` will
+              disable scheduling new scrubs (and deep scrubs) while there is
+              active recovery. Scrubs that are already running will continue.
+              This might be useful to reduce load on busy clusters.
+:Type: Boolean
+:Default: ``true``
+
+
+``osd scrub thread timeout``
+
+:Description: The maximum time in seconds before timing out a scrub thread.
+:Type: 32-bit Integer
+:Default: ``60``
+
+
+``osd scrub finalize thread timeout``
+
+:Description: The maximum time in seconds before timing out a scrub finalize
+ thread.
+
+:Type: 32-bit Integer
+:Default: ``60*10``
+
+
+``osd scrub load threshold``
+
+:Description: The maximum load. Ceph will not scrub when the system load
+ (as defined by ``getloadavg()``) is higher than this number.
+ Default is ``0.5``.
+
+:Type: Float
+:Default: ``0.5``
+
+
+``osd scrub min interval``
+
+:Description: The minimal interval in seconds for scrubbing the Ceph OSD Daemon
+ when the Ceph Storage Cluster load is low.
+
+:Type: Float
+:Default: Once per day. ``60*60*24``
+
+
+``osd scrub max interval``
+
+:Description: The maximum interval in seconds for scrubbing the Ceph OSD Daemon
+ irrespective of cluster load.
+
+:Type: Float
+:Default: Once per week. ``7*60*60*24``
+
+
+``osd scrub chunk min``
+
+:Description: The minimum number of object store chunks to scrub during a
+              single operation. Ceph blocks writes to a single chunk during
+              scrub.
+
+:Type: 32-bit Integer
+:Default: 5
+
+
+``osd scrub chunk max``
+
+:Description: The maximum number of object store chunks to scrub during a single operation.
+
+:Type: 32-bit Integer
+:Default: 25
+
+
+``osd scrub sleep``
+
+:Description: Time to sleep before scrubbing the next group of chunks.
+              Increasing this value will slow down the whole scrub operation,
+              while client operations will be less impacted.
+
+:Type: Float
+:Default: 0
+
+
+``osd deep scrub interval``
+
+:Description: The interval for "deep" scrubbing (fully reading all data). The
+ ``osd scrub load threshold`` does not affect this setting.
+
+:Type: Float
+:Default: Once per week. ``60*60*24*7``
+
+
+``osd scrub interval randomize ratio``
+
+:Description: Add a random delay to ``osd scrub min interval`` when scheduling
+              the next scrub job for a placement group. The delay is a random
+              value less than ``osd scrub min interval`` \*
+              ``osd scrub interval randomize ratio``. With the default setting,
+              scrubs are in practice spread randomly over the allowed time
+              window of ``[1, 1.5]`` \* ``osd scrub min interval``.
+:Type: Float
+:Default: ``0.5``
+
+``osd deep scrub stride``
+
+:Description: Read size when doing a deep scrub.
+:Type: 32-bit Integer
+:Default: 512 KB. ``524288``
+
+
+.. index:: OSD; operations settings
+
+Operations
+==========
+
+Operations settings allow you to configure the number of threads for servicing
+requests. If you set ``osd op threads`` to ``0``, it disables multi-threading.
+By default, Ceph uses two threads with a 30 second timeout and a 30 second
+complaint time if an operation doesn't complete within those time parameters.
+You can set operations priority weights between client operations and
+recovery operations to ensure optimal performance during recovery.
+
+
+``osd op threads``
+
+:Description: The number of threads to service Ceph OSD Daemon operations.
+ Set to ``0`` to disable it. Increasing the number may increase
+ the request processing rate.
+
+:Type: 32-bit Integer
+:Default: ``2``
+
+
+``osd op queue``
+
+:Description: This sets the type of queue to be used for prioritizing ops
+ in the OSDs. Both queues feature a strict sub-queue which is
+ dequeued before the normal queue. The normal queue is different
+ between implementations. The original PrioritizedQueue (``prio``) uses a
+ token bucket system which when there are sufficient tokens will
+ dequeue high priority queues first. If there are not enough
+ tokens available, queues are dequeued low priority to high priority.
+ The WeightedPriorityQueue (``wpq``) dequeues all priorities in
+ relation to their priorities to prevent starvation of any queue.
+ WPQ should help in cases where a few OSDs are more overloaded
+ than others. The new mClock based OpClassQueue
+ (``mclock_opclass``) prioritizes operations based on which class
+ they belong to (recovery, scrub, snaptrim, client op, osd subop).
+ And, the mClock based ClientQueue (``mclock_client``) also
+ incorporates the client identifier in order to promote fairness
+ between clients. See `QoS Based on mClock`_. Requires a restart.
+
+:Type: String
+:Valid Choices: prio, wpq, mclock_opclass, mclock_client
+:Default: ``prio``
+
+
+``osd op queue cut off``
+
+:Description: This selects which priority ops will be sent to the strict
+              queue versus the normal queue. The ``low`` setting sends all
+ replication ops and higher to the strict queue, while the ``high``
+ option sends only replication acknowledgement ops and higher to
+ the strict queue. Setting this to ``high`` should help when a few
+ OSDs in the cluster are very busy especially when combined with
+ ``wpq`` in the ``osd op queue`` setting. OSDs that are very busy
+ handling replication traffic could starve primary client traffic
+ on these OSDs without these settings. Requires a restart.
+
+:Type: String
+:Valid Choices: low, high
+:Default: ``low``
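+
+For example, a cluster where a few OSDs are much busier than the rest might
+combine the two settings above (a sketch; both changes require an OSD restart):
+
+.. code-block:: ini
+
+    [osd]
+    osd op queue = wpq
+    osd op queue cut off = high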
+
+
+``osd client op priority``
+
+:Description: The priority set for client operations. It is relative to
+ ``osd recovery op priority``.
+
+:Type: 32-bit Integer
+:Default: ``63``
+:Valid Range: 1-63
+
+
+``osd recovery op priority``
+
+:Description: The priority set for recovery operations. It is relative to
+ ``osd client op priority``.
+
+:Type: 32-bit Integer
+:Default: ``3``
+:Valid Range: 1-63
+
+
+``osd scrub priority``
+
+:Description: The priority set for scrub operations. It is relative to
+ ``osd client op priority``.
+
+:Type: 32-bit Integer
+:Default: ``5``
+:Valid Range: 1-63
+
+
+``osd snap trim priority``
+
+:Description: The priority set for snap trim operations. It is relative to
+ ``osd client op priority``.
+
+:Type: 32-bit Integer
+:Default: ``5``
+:Valid Range: 1-63
+
+
+``osd op thread timeout``
+
+:Description: The Ceph OSD Daemon operation thread timeout in seconds.
+:Type: 32-bit Integer
+:Default: ``15``
+
+
+``osd op complaint time``
+
+:Description: An operation becomes complaint worthy after the specified number
+              of seconds has elapsed.
+
+:Type: Float
+:Default: ``30``
+
+
+``osd disk threads``
+
+:Description: The number of disk threads, which are used to perform background
+ disk intensive OSD operations such as scrubbing and snap
+ trimming.
+
+:Type: 32-bit Integer
+:Default: ``1``
+
+``osd disk thread ioprio class``
+
+:Description: Warning: it will only be used if both ``osd disk thread
+              ioprio class`` and ``osd disk thread ioprio priority`` are
+              set to a non default value. Sets the ioprio_set(2) I/O
+              scheduling ``class`` for the disk thread. Acceptable
+              values are ``idle``, ``be`` or ``rt``. The ``idle``
+              class means the disk thread will have lower priority
+              than any other thread in the OSD. This is useful to slow
+              down scrubbing on an OSD that is busy handling client
+              operations. ``be`` is the default and is the same
+              priority as all other threads in the OSD. ``rt`` means
+              the disk thread will have precedence over all other
+              threads in the OSD. Note: Only works with the Linux Kernel
+              CFQ scheduler. Since Jewel, scrubbing is no longer carried
+              out by the disk iothread; see the osd priority options instead.
+:Type: String
+:Default: the empty string
+
+``osd disk thread ioprio priority``
+
+:Description: Warning: it will only be used if both ``osd disk thread
+ ioprio class`` and ``osd disk thread ioprio priority`` are
+ set to a non default value. It sets the ioprio_set(2)
+ I/O scheduling ``priority`` of the disk thread ranging
+ from 0 (highest) to 7 (lowest). If all OSDs on a given
+ host were in class ``idle`` and compete for I/O
+ (i.e. due to controller congestion), it can be used to
+ lower the disk thread priority of one OSD to 7 so that
+ another OSD with priority 0 can have priority.
+ Note: Only works with the Linux Kernel CFQ scheduler.
+:Type: Integer in the range of 0 to 7 or -1 if not to be used.
+:Default: ``-1``
+
+``osd op history size``
+
+:Description: The maximum number of completed operations to track.
+:Type: 32-bit Unsigned Integer
+:Default: ``20``
+
+
+``osd op history duration``
+
+:Description: The oldest completed operation to track.
+:Type: 32-bit Unsigned Integer
+:Default: ``600``
+
+
+``osd op log threshold``
+
+:Description: How many operations logs to display at once.
+:Type: 32-bit Integer
+:Default: ``5``
+
+
+QoS Based on mClock
+-------------------
+
+Ceph's use of mClock is currently in the experimental phase and should
+be approached with an exploratory mindset.
+
+Core Concepts
+`````````````
+
+The QoS support of Ceph is implemented using a queueing scheduler
+based on `the dmClock algorithm`_. This algorithm allocates the I/O
+resources of the Ceph cluster in proportion to weights, and enforces
+the constraits of minimum reservation and maximum limitation, so that
+the services can compete for the resources fairly. Currently the
+*mclock_opclass* operation queue divides Ceph services involving I/O
+resources into the following buckets:
+
+- client op: the iops issued by client
+- osd subop: the iops issued by primary OSD
+- snap trim: the snap trimming related requests
+- pg recovery: the recovery related requests
+- pg scrub: the scrub related requests
+
+The resources are partitioned using the following three sets of tags. In other
+words, the share of each type of service is controlled by three tags:
+
+#. reservation: the minimum IOPS allocated for the service.
+#. limitation: the maximum IOPS allocated for the service.
+#. weight: the proportional share of capacity if there is extra capacity or the
+   system is oversubscribed.
+
+In Ceph, operations are graded with "cost", and the resources allocated
+for serving the various services are consumed by these "costs". So, for
+example, the more reservation a service has, the more resources it is
+guaranteed to possess, as long as it requires them. Assume there are 2
+services: recovery and client ops:
+
+- recovery: (r:1, l:5, w:1)
+- client ops: (r:2, l:0, w:9)
+
+The settings above ensure that recovery won't get more than 5 requests per
+second serviced, even if it requires more and no other services are competing
+with it (see CURRENT IMPLEMENTATION NOTE below). Conversely, if clients start
+to issue a large amount of I/O requests, they cannot exhaust all the I/O
+resources either: 1 request per second is always allocated for recovery jobs
+as long as there are any such requests, so recovery jobs won't be starved even
+in a cluster under high load. In the meantime, client ops can enjoy a larger
+portion of the I/O resource, because their weight is "9" while their
+competitor's is "1". Client ops are not clamped by the limit setting, so they
+can make use of all the resources if there is no recovery ongoing.
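+
+Expressed with the settings documented later in this section, the example
+above might look roughly like this in ``ceph.conf`` (a sketch, assuming the
+*mclock_opclass* queue has been selected via ``osd op queue``):
+
+.. code-block:: ini
+
+    [osd]
+    osd op queue = mclock_opclass
+    # recovery: (r:1, l:5, w:1)
+    osd op queue mclock recov res = 1.0
+    osd op queue mclock recov lim = 5.0
+    osd op queue mclock recov wgt = 1.0
+    # client ops: (r:2, l:0, w:9)
+    osd op queue mclock client op res = 2.0
+    osd op queue mclock client op lim = 0.0
+    osd op queue mclock client op wgt = 9.0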
+
+Along with *mclock_opclass*, another mclock operation queue named
+*mclock_client* is available. It divides operations based on category
+but also divides them based on the client making the request. This
+helps not only manage the distribution of resources spent on different
+classes of operations but also tries to ensure fairness among clients.
+
+CURRENT IMPLEMENTATION NOTE: the current experimental implementation
+does not enforce the limit values. As a first approximation we decided
+not to prevent operations that would otherwise enter the operation
+sequencer from doing so.
+
+Subtleties of mClock
+````````````````````
+
+The reservation and limit values have a unit of requests per
+second. The weight, however, does not technically have a unit and the
+weights are relative to one another. So if one class of requests has a
+weight of 1 and another a weight of 9, then requests of the latter class
+should get executed at a 9 to 1 ratio relative to the first class.
+However that will only happen once the reservations are met and those
+values include the operations executed under the reservation phase.
+
+Even though the weights do not have units, one must be careful in
+choosing their values due to how the algorithm assigns weight tags to
+requests. If the weight is *W*, then for a given class of requests,
+the next one that comes in will have a weight tag of *1/W* plus the
+previous weight tag or the current time, whichever is larger. That
+means if *W* is sufficiently large and therefore *1/W* is sufficiently
+small, the calculated tag may never be assigned as it will get a value
+of the current time. The ultimate lesson is that values for weight
+should not be too large. They should be under the number of requests
+one expects to be serviced each second.
+
+Caveats
+```````
+
+There are some factors that can reduce the impact of the mClock op
+queues within Ceph. First, requests to an OSD are sharded by their
+placement group identifier. Each shard has its own mClock queue and
+these queues neither interact nor share information among them. The
+number of shards can be controlled with the configuration options
+``osd_op_num_shards``, ``osd_op_num_shards_hdd``, and
+``osd_op_num_shards_ssd``. A lower number of shards will increase the
+impact of the mClock queues, but may have other deleterious effects.
+
+Second, requests are transferred from the operation queue to the
+operation sequencer, in which they go through the phases of
+execution. The operation queue is where mClock resides and mClock
+determines the next op to transfer to the operation sequencer. The
+number of operations allowed in the operation sequencer is a complex
+issue. In general we want to keep enough operations in the sequencer
+so it's always getting work done on some operations while it's waiting
+for disk and network access to complete on other operations. On the
+other hand, once an operation is transferred to the operation
+sequencer, mClock no longer has control over it. Therefore to maximize
+the impact of mClock, we want to keep as few operations in the
+operation sequencer as possible. So we have an inherent tension.
+
+The configuration options that influence the number of operations in
+the operation sequencer are ``bluestore_throttle_bytes``,
+``bluestore_throttle_deferred_bytes``,
+``bluestore_throttle_cost_per_io``,
+``bluestore_throttle_cost_per_io_hdd``, and
+``bluestore_throttle_cost_per_io_ssd``.
+
+A third factor that affects the impact of the mClock algorithm is that
+we're using a distributed system, where requests are made to multiple
+OSDs and each OSD has (can have) multiple shards. Yet we're currently
+using the mClock algorithm, which is not distributed (note: dmClock is
+the distributed version of mClock).
+
+Various organizations and individuals are currently experimenting with
+mClock as it exists in this code base along with their modifications
+to the code base. We hope you'll share you're experiences with your
+mClock and dmClock experiments in the ceph-devel mailing list.
+
+
+``osd push per object cost``
+
+:Description: the overhead for serving a push op
+
+:Type: Unsigned Integer
+:Default: 1000
+
+``osd recovery max chunk``
+
+:Description: the maximum total size of data chunks a recovery op can carry.
+
+:Type: Unsigned Integer
+:Default: 8 MiB
+
+
+``osd op queue mclock client op res``
+
+:Description: the reservation of client op.
+
+:Type: Float
+:Default: 1000.0
+
+
+``osd op queue mclock client op wgt``
+
+:Description: the weight of client op.
+
+:Type: Float
+:Default: 500.0
+
+
+``osd op queue mclock client op lim``
+
+:Description: the limit of client op.
+
+:Type: Float
+:Default: 1000.0
+
+
+``osd op queue mclock osd subop res``
+
+:Description: the reservation of osd subop.
+
+:Type: Float
+:Default: 1000.0
+
+
+``osd op queue mclock osd subop wgt``
+
+:Description: the weight of osd subop.
+
+:Type: Float
+:Default: 500.0
+
+
+``osd op queue mclock osd subop lim``
+
+:Description: the limit of osd subop.
+
+:Type: Float
+:Default: 0.0
+
+
+``osd op queue mclock snap res``
+
+:Description: the reservation of snap trimming.
+
+:Type: Float
+:Default: 0.0
+
+
+``osd op queue mclock snap wgt``
+
+:Description: the weight of snap trimming.
+
+:Type: Float
+:Default: 1.0
+
+
+``osd op queue mclock snap lim``
+
+:Description: the limit of snap trimming.
+
+:Type: Float
+:Default: 0.001
+
+
+``osd op queue mclock recov res``
+
+:Description: the reservation of recovery.
+
+:Type: Float
+:Default: 0.0
+
+
+``osd op queue mclock recov wgt``
+
+:Description: the weight of recovery.
+
+:Type: Float
+:Default: 1.0
+
+
+``osd op queue mclock recov lim``
+
+:Description: the limit of recovery.
+
+:Type: Float
+:Default: 0.001
+
+
+``osd op queue mclock scrub res``
+
+:Description: the reservation of scrub jobs.
+
+:Type: Float
+:Default: 0.0
+
+
+``osd op queue mclock scrub wgt``
+
+:Description: the weight of scrub jobs.
+
+:Type: Float
+:Default: 1.0
+
+
+``osd op queue mclock scrub lim``
+
+:Description: the limit of scrub jobs.
+
+:Type: Float
+:Default: 0.001
+
+.. _the dmClock algorithm: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
+
+
+.. index:: OSD; backfilling
+
+Backfilling
+===========
+
+When you add or remove Ceph OSD Daemons to a cluster, the CRUSH algorithm will
+want to rebalance the cluster by moving placement groups to or from Ceph OSD
+Daemons to restore the balance. The process of migrating placement groups and
+the objects they contain can reduce the cluster's operational performance
+considerably. To maintain operational performance, Ceph performs this migration
+with 'backfilling', which allows Ceph to set backfill operations to a lower
+priority than requests to read or write data.
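+
+Backfill pressure is often tuned at runtime rather than in the configuration
+file. A sketch using the ``injectargs`` mechanism (the exact invocation may
+vary by release), allowing two concurrent backfills per OSD::
+
+    ceph tell osd.* injectargs '--osd-max-backfills 2'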
+
+
+``osd max backfills``
+
+:Description: The maximum number of backfills allowed to or from a single OSD.
+:Type: 64-bit Unsigned Integer
+:Default: ``1``
+
+
+``osd backfill scan min``
+
+:Description: The minimum number of objects per backfill scan.
+
+:Type: 32-bit Integer
+:Default: ``64``
+
+
+``osd backfill scan max``
+
+:Description: The maximum number of objects per backfill scan.
+
+:Type: 32-bit Integer
+:Default: ``512``
+
+
+``osd backfill retry interval``
+
+:Description: The number of seconds to wait before retrying backfill requests.
+:Type: Double
+:Default: ``10.0``
+
+.. index:: OSD; osdmap
+
+OSD Map
+=======
+
+OSD maps reflect the OSD daemons operating in the cluster. Over time, the
+number of map epochs increases. Ceph provides some settings to ensure that
+Ceph performs well as the OSD map grows larger.
+
+
+``osd map dedup``
+
+:Description: Enable removing duplicates in the OSD map.
+:Type: Boolean
+:Default: ``true``
+
+
+``osd map cache size``
+
+:Description: The number of OSD maps to keep cached.
+:Type: 32-bit Integer
+:Default: ``500``
+
+
+``osd map cache bl size``
+
+:Description: The size of the in-memory OSD map cache in OSD daemons.
+:Type: 32-bit Integer
+:Default: ``50``
+
+
+``osd map cache bl inc size``
+
+:Description: The size of the in-memory OSD map cache incrementals in
+ OSD daemons.
+
+:Type: 32-bit Integer
+:Default: ``100``
+
+
+``osd map message max``
+
+:Description: The maximum map entries allowed per MOSDMap message.
+:Type: 32-bit Integer
+:Default: ``100``
+
+
+
+.. index:: OSD; recovery
+
+Recovery
+========
+
+When the cluster starts or when a Ceph OSD Daemon crashes and restarts, the OSD
+begins peering with other Ceph OSD Daemons before writes can occur. See
+`Monitoring OSDs and PGs`_ for details.
+
+If a Ceph OSD Daemon crashes and comes back online, usually it will be out of
+sync with other Ceph OSD Daemons containing more recent versions of objects in
+the placement groups. When this happens, the Ceph OSD Daemon goes into recovery
+mode and seeks to get the latest copy of the data and bring its map back up to
+date. Depending upon how long the Ceph OSD Daemon was down, the OSD's objects
+and placement groups may be significantly out of date. Also, if a failure domain
+went down (e.g., a rack), more than one Ceph OSD Daemon may come back online at
+the same time. This can make the recovery process time consuming and resource
+intensive.
+
+To maintain operational performance, Ceph performs recovery with limitations on
+the number of recovery requests, threads and object chunk sizes, which allows
+Ceph to perform well in a degraded state.
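+
+For example, to further reduce the impact of recovery on client I/O, you might
+lower the number of active recovery requests and add a small sleep between
+recovery ops on HDD-backed OSDs (the values are illustrative):
+
+.. code-block:: ini
+
+    [osd]
+    osd recovery max active = 1
+    osd recovery sleep hdd = 0.2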
+
+
+``osd recovery delay start``
+
+:Description: After peering completes, Ceph will delay for the specified number
+ of seconds before starting to recover objects.
+
+:Type: Float
+:Default: ``0``
+
+
+``osd recovery max active``
+
+:Description: The number of active recovery requests per OSD at one time. More
+              requests will accelerate recovery, but the requests place an
+              increased load on the cluster.
+
+:Type: 32-bit Integer
+:Default: ``3``
+
+
+``osd recovery max chunk``
+
+:Description: The maximum size of a recovered chunk of data to push.
+:Type: 64-bit Unsigned Integer
+:Default: ``8 << 20``
+
+
+``osd recovery max single start``
+
+:Description: The maximum number of recovery operations per OSD that will be
+ newly started when an OSD is recovering.
+:Type: 64-bit Unsigned Integer
+:Default: ``1``
+
+
+``osd recovery thread timeout``
+
+:Description: The maximum time in seconds before timing out a recovery thread.
+:Type: 32-bit Integer
+:Default: ``30``
+
+
+``osd recover clone overlap``
+
+:Description: Preserves clone overlap during recovery. Should always be set
+ to ``true``.
+
+:Type: Boolean
+:Default: ``true``
+
+
+``osd recovery sleep``
+
+:Description: Time in seconds to sleep before the next recovery or backfill op.
+              Increasing this value will slow down recovery operations, while
+              client operations will be less impacted.
+
+:Type: Float
+:Default: ``0``
+
+
+``osd recovery sleep hdd``
+
+:Description: Time in seconds to sleep before next recovery or backfill op
+ for HDDs.
+
+:Type: Float
+:Default: ``0.1``
+
+
+``osd recovery sleep ssd``
+
+:Description: Time in seconds to sleep before next recovery or backfill op
+ for SSDs.
+
+:Type: Float
+:Default: ``0``
+
+
+``osd recovery sleep hybrid``
+
+:Description: Time in seconds to sleep before the next recovery or backfill op
+              when OSD data is on an HDD and the OSD journal is on an SSD.
+
+:Type: Float
+:Default: ``0.025``
+
+Tiering
+=======
+
+``osd agent max ops``
+
+:Description: The maximum number of simultaneous flushing ops per tiering agent
+ in the high speed mode.
+:Type: 32-bit Integer
+:Default: ``4``
+
+
+``osd agent max low ops``
+
+:Description: The maximum number of simultaneous flushing ops per tiering agent
+ in the low speed mode.
+:Type: 32-bit Integer
+:Default: ``2``
+
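+For example (illustrative values only), a busier cache tier might be allowed
+more simultaneous flush operations per tiering agent::
+
+    # Illustrative values only.
+    osd agent max ops = 8
+    osd agent max low ops = 4
+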
+See `cache target dirty high ratio`_ for when the tiering agent flushes dirty
+objects within the high speed mode.
+
+Miscellaneous
+=============
+
+
+``osd snap trim thread timeout``
+
+:Description: The maximum time in seconds before timing out a snap trim thread.
+:Type: 32-bit Integer
+:Default: ``60*60*1``
+
+
+``osd backlog thread timeout``
+
+:Description: The maximum time in seconds before timing out a backlog thread.
+:Type: 32-bit Integer
+:Default: ``60*60*1``
+
+
+``osd default notify timeout``
+
+:Description: The OSD default notification timeout (in seconds).
+:Type: 32-bit Unsigned Integer
+:Default: ``30``
+
+
+``osd check for log corruption``
+
+:Description: Check log files for corruption. Can be computationally expensive.
+:Type: Boolean
+:Default: ``false``
+
+
+``osd remove thread timeout``
+
+:Description: The maximum time in seconds before timing out a remove OSD thread.
+:Type: 32-bit Integer
+:Default: ``60*60``
+
+
+``osd command thread timeout``
+
+:Description: The maximum time in seconds before timing out a command thread.
+:Type: 32-bit Integer
+:Default: ``10*60``
+
+
+``osd command max records``
+
+:Description: Limits the number of lost objects to return.
+:Type: 32-bit Integer
+:Default: ``256``
+
+
+``osd auto upgrade tmap``
+
+:Description: Uses ``tmap`` for ``omap`` on old objects.
+:Type: Boolean
+:Default: ``true``
+
+
+``osd tmapput sets users tmap``
+
+:Description: Uses ``tmap`` for debugging only.
+:Type: Boolean
+:Default: ``false``
+
+
+``osd fast fail on connection refused``
+
+:Description: If this option is enabled, crashed OSDs are marked down
+ immediately by connected peers and MONs (assuming that the
+ crashed OSD host survives). Disable it to restore old
+ behavior, at the expense of possible long I/O stalls when
+ OSDs crash in the middle of I/O operations.
+:Type: Boolean
+:Default: ``true``
+
+
+
+.. _pool: ../../operations/pools
+.. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction
+.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering
+.. _Pool & PG Config Reference: ../pool-pg-config-ref
+.. _Journal Config Reference: ../journal-ref
+.. _cache target dirty high ratio: ../../operations/pools#cache-target-dirty-high-ratio
diff --git a/src/ceph/doc/rados/configuration/pool-pg-config-ref.rst b/src/ceph/doc/rados/configuration/pool-pg-config-ref.rst
new file mode 100644
index 0000000..89a3707
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/pool-pg-config-ref.rst
@@ -0,0 +1,270 @@
+======================================
+ Pool, PG and CRUSH Config Reference
+======================================
+
+.. index:: pools; configuration
+
+When you create pools and set the number of placement groups for the pool, Ceph
+uses default values when you don't specifically override the defaults. **We
+recommend** overriding some of the defaults. Specifically, we recommend setting
+a pool's replica size and overriding the default number of placement groups. You
+can specifically set these values when running `pool`_ commands. You can also
+override the defaults by adding new ones in the ``[global]`` section of your
+Ceph configuration file.
+
+
+.. literalinclude:: pool-pg.conf
+ :language: ini
+
+
+
+``mon max pool pg num``
+
+:Description: The maximum number of placement groups per pool.
+:Type: Integer
+:Default: ``65536``
+
+
+``mon pg create interval``
+
+:Description: Number of seconds between PG creation in the same
+ Ceph OSD Daemon.
+
+:Type: Float
+:Default: ``30.0``
+
+
+``mon pg stuck threshold``
+
+:Description: Number of seconds after which PGs can be considered as
+ being stuck.
+
+:Type: 32-bit Integer
+:Default: ``300``
+
+``mon pg min inactive``
+
+:Description: Issue a ``HEALTH_ERR`` in the cluster log if the number of PGs
+              that stay inactive longer than ``mon_pg_stuck_threshold`` exceeds
+              this setting. A non-positive number disables this check, so the
+              cluster never goes into ``HEALTH_ERR`` for this reason.
+:Type: Integer
+:Default: ``1``
+
+
+``mon pg warn min per osd``
+
+:Description: Issue a ``HEALTH_WARN`` in the cluster log if the average number
+              of PGs per ``in`` OSD is under this number. (A non-positive
+              number disables this.)
+:Type: Integer
+:Default: ``30``
+
+
+``mon pg warn max per osd``
+
+:Description: Issue a ``HEALTH_WARN`` in the cluster log if the average number
+              of PGs per ``in`` OSD is above this number. (A non-positive
+              number disables this.)
+:Type: Integer
+:Default: ``300``
+
+
+``mon pg warn min objects``
+
+:Description: Do not warn if the total number of objects in the cluster is
+              below this number.
+:Type: Integer
+:Default: ``1000``
+
+
+``mon pg warn min pool objects``
+
+:Description: Do not warn on pools whose object count is below this number.
+:Type: Integer
+:Default: ``1000``
+
+
+``mon pg check down all threshold``
+
+:Description: The percentage of ``down`` OSDs above which all PGs are checked
+              for stale ones.
+:Type: Float
+:Default: ``0.5``
+
+
+``mon pg warn max object skew``
+
+:Description: Issue a ``HEALTH_WARN`` in the cluster log if the average number
+              of objects per PG in a pool is greater than
+              ``mon pg warn max object skew`` times the average number of
+              objects per PG across the whole cluster. (A non-positive number
+              disables this.)
+:Type: Float
+:Default: ``10``
+
+
+``mon delta reset interval``
+
+:Description: Seconds of inactivity before the PG delta is reset to 0. Ceph
+              keeps track of the delta of the used space of each pool, which
+              makes it easier, for example, to follow the progress of recovery
+              or the performance of a cache tier. If no activity is reported
+              for a certain pool, the history of deltas for that pool is simply
+              reset.
+:Type: Integer
+:Default: ``10``
+
+
+``mon osd max op age``
+
+:Description: Maximum op age before we get concerned (make it a power of 2).
+ A ``HEALTH_WARN`` will be issued if a request has been blocked longer
+ than this limit.
+:Type: Float
+:Default: ``32.0``
+
+
+``osd pg bits``
+
+:Description: Placement group bits per Ceph OSD Daemon.
+:Type: 32-bit Integer
+:Default: ``6``
+
+
+``osd pgp bits``
+
+:Description: The number of bits per Ceph OSD Daemon for PGPs.
+:Type: 32-bit Integer
+:Default: ``6``
+
+
+``osd crush chooseleaf type``
+
+:Description: The bucket type to use for ``chooseleaf`` in a CRUSH rule. Uses
+ ordinal rank rather than name.
+
+:Type: 32-bit Integer
+:Default: ``1``. Typically a host containing one or more Ceph OSD Daemons.
+
+
+``osd crush initial weight``
+
+:Description: The initial CRUSH weight for OSDs newly added to the CRUSH map.
+
+:Type: Double
+:Default: ``the size of a newly added OSD in TB``. By default, the initial
+          CRUSH weight of a newly added OSD is set to its volume size in TB.
+          See `Weighting Bucket Items`_ for details.
+
+
+``osd pool default crush replicated ruleset``
+
+:Description: The default CRUSH ruleset to use when creating a replicated pool.
+:Type: 8-bit Integer
+:Default: ``CEPH_DEFAULT_CRUSH_REPLICATED_RULESET``, which means "pick
+ a ruleset with the lowest numerical ID and use that". This is to
+ make pool creation work in the absence of ruleset 0.
+
+
+``osd pool erasure code stripe unit``
+
+:Description: Sets the default size, in bytes, of a chunk of an object
+              stripe for erasure coded pools. Every object of size S
+              will be stored as N stripes, with each data chunk
+              receiving ``stripe unit`` bytes. Each stripe of ``N *
+              stripe unit`` bytes will be encoded/decoded
+              individually. This option can be overridden by the
+              ``stripe_unit`` setting in an erasure code profile.
+
+:Type: Unsigned 32-bit Integer
+:Default: ``4096``
+
+
+``osd pool default size``
+
+:Description: Sets the default number of replicas for objects in the pool. This
+              default can be overridden per pool with
+              ``ceph osd pool set {pool-name} size {size}``.
+
+:Type: 32-bit Integer
+:Default: ``3``
+
+
+``osd pool default min size``
+
+:Description: Sets the minimum number of written replicas for objects in the
+ pool in order to acknowledge a write operation to the client.
+ If minimum is not met, Ceph will not acknowledge the write to the
+ client. This setting ensures a minimum number of replicas when
+ operating in ``degraded`` mode.
+
+:Type: 32-bit Integer
+:Default: ``0``, which means no particular minimum is set here. If ``0``, the
+          effective minimum is ``size - (size / 2)``.
+
+
+``osd pool default pg num``
+
+:Description: The default number of placement groups for a pool. The default
+ value is the same as ``pg_num`` with ``mkpool``.
+
+:Type: 32-bit Integer
+:Default: ``8``
+
+
+``osd pool default pgp num``
+
+:Description: The default number of placement groups for placement for a pool.
+ The default value is the same as ``pgp_num`` with ``mkpool``.
+ PG and PGP should be equal (for now).
+
+:Type: 32-bit Integer
+:Default: ``8``
+
+
+``osd pool default flags``
+
+:Description: The default flags for new pools.
+:Type: 32-bit Integer
+:Default: ``0``
+
+
+``osd max pgls``
+
+:Description: The maximum number of placement groups to list. A client
+ requesting a large number can tie up the Ceph OSD Daemon.
+
+:Type: Unsigned 64-bit Integer
+:Default: ``1024``
+:Note: Default should be fine.
+
+
+``osd min pg log entries``
+
+:Description: The minimum number of placement group logs to maintain
+ when trimming log files.
+
+:Type: 32-bit Unsigned Integer
+:Default: ``1000``
+
+
+``osd default data pool replay window``
+
+:Description: The time (in seconds) for an OSD to wait for a client to replay
+ a request.
+
+:Type: 32-bit Integer
+:Default: ``45``
+
+``osd max pg per osd hard ratio``
+
+:Description: The ratio of the number of PGs per OSD allowed by the cluster
+              before an OSD refuses to create new PGs. An OSD stops creating
+              new PGs if the number of PGs it serves exceeds
+              ``osd max pg per osd hard ratio`` \* ``mon max pg per osd``.
+
+:Type: Float
+:Default: ``2``
+
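+For example, assuming ``mon max pg per osd`` were set to ``200`` and this
+ratio were left at its default of ``2``, an OSD would stop creating new PGs
+once the number of PGs it serves exceeded ``200 * 2 = 400``.
+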
+.. _pool: ../../operations/pools
+.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering
+.. _Weighting Bucket Items: ../../operations/crush-map#weightingbucketitems
diff --git a/src/ceph/doc/rados/configuration/pool-pg.conf b/src/ceph/doc/rados/configuration/pool-pg.conf
new file mode 100644
index 0000000..5f1b3b7
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/pool-pg.conf
@@ -0,0 +1,20 @@
+[global]
+
+ # By default, Ceph makes 3 replicas of objects. If you want to make four
+ # copies of an object the default value--a primary copy and three replica
+ # copies--reset the default values as shown in 'osd pool default size'.
+ # If you want to allow Ceph to write a lesser number of copies in a degraded
+ # state, set 'osd pool default min size' to a number less than the
+ # 'osd pool default size' value.
+
+ osd pool default size = 4 # Write an object 4 times.
+ osd pool default min size = 1 # Allow writing one copy in a degraded state.
+
+ # Ensure you have a realistic number of placement groups. We recommend
+ # approximately 100 per OSD. E.g., total number of OSDs multiplied by 100
+ # divided by the number of replicas (i.e., osd pool default size). So for
+ # 10 OSDs and osd pool default size = 4, we'd recommend approximately
+ # (100 * 10) / 4 = 250.
+
+ osd pool default pg num = 250
+ osd pool default pgp num = 250
diff --git a/src/ceph/doc/rados/configuration/storage-devices.rst b/src/ceph/doc/rados/configuration/storage-devices.rst
new file mode 100644
index 0000000..83c0c9b
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/storage-devices.rst
@@ -0,0 +1,83 @@
+=================
+ Storage Devices
+=================
+
+There are two Ceph daemons that store data on disk:
+
+* **Ceph OSDs** (or Object Storage Daemons) are where most of the
+ data is stored in Ceph. Generally speaking, each OSD is backed by
+ a single storage device, like a traditional hard disk (HDD) or
+ solid state disk (SSD). OSDs can also be backed by a combination
+ of devices, like a HDD for most data and an SSD (or partition of an
+ SSD) for some metadata. The number of OSDs in a cluster is
+ generally a function of how much data will be stored, how big each
+ storage device will be, and the level and type of redundancy
+ (replication or erasure coding).
+* **Ceph Monitor** daemons manage critical cluster state like cluster
+ membership and authentication information. For smaller clusters a
+ few gigabytes is all that is needed, although for larger clusters
+ the monitor database can reach tens or possibly hundreds of
+ gigabytes.
+
+
+OSD Backends
+============
+
+There are two ways that OSDs can manage the data they store. Starting
+with the Luminous 12.2.z release, the new default (and recommended) backend is
+*BlueStore*. Prior to Luminous, the default (and only option) was
+*FileStore*.
+
+BlueStore
+---------
+
+BlueStore is a special-purpose storage backend designed specifically
+for managing data on disk for Ceph OSD workloads. It is motivated by
+experience supporting and managing OSDs using FileStore over the
+last ten years. Key BlueStore features include:
+
+* Direct management of storage devices. BlueStore consumes raw block
+ devices or partitions. This avoids any intervening layers of
+ abstraction (such as local file systems like XFS) that may limit
+ performance or add complexity.
+* Metadata management with RocksDB. We embed RocksDB's key/value database
+ in order to manage internal metadata, such as the mapping from object
+ names to block locations on disk.
+* Full data and metadata checksumming. By default all data and
+ metadata written to BlueStore is protected by one or more
+ checksums. No data or metadata will be read from disk or returned
+ to the user without being verified.
+* Inline compression. Data written may be optionally compressed
+  before being written to disk (see the example after this list).
+* Multi-device metadata tiering. BlueStore allows its internal
+  journal (write-ahead log) to be written to a separate, high-speed
+  device (like an SSD, NVMe, or NVDIMM) to increase performance. If
+  a significant amount of faster storage is available, internal
+  metadata can also be stored on the faster device.
+* Efficient copy-on-write. RBD and CephFS snapshots rely on a
+ copy-on-write *clone* mechanism that is implemented efficiently in
+ BlueStore. This results in efficient IO both for regular snapshots
+ and for erasure coded pools (which rely on cloning to implement
+ efficient two-phase commits).
+
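+As a small illustration of the inline compression feature listed above (a
+sketch only: the option names are taken from the BlueStore configuration
+reference rather than from this page, and the values are arbitrary examples,
+not recommendations), compression could be enabled for all OSDs from
+``ceph.conf``::
+
+    # Assumed option names; verify against the BlueStore config reference.
+    bluestore compression algorithm = snappy
+    bluestore compression mode = aggressive
+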
+For more information, see :doc:`bluestore-config-ref`.
+
+FileStore
+---------
+
+FileStore is the legacy approach to storing objects in Ceph. It
+relies on a standard file system (normally XFS) in combination with a
+key/value database (traditionally LevelDB, now RocksDB) for some
+metadata.
+
+FileStore is well-tested and widely used in production but suffers
+from many performance deficiencies due to its overall design and
+reliance on a traditional file system for storing object data.
+
+Although FileStore is generally capable of functioning on most
+POSIX-compatible file systems (including btrfs and ext4), we only
+recommend that XFS be used. Both btrfs and ext4 have known bugs and
+deficiencies and their use may lead to data loss. By default all Ceph
+provisioning tools will use XFS.
+
+For more information, see :doc:`filestore-config-ref`.