Diffstat (limited to 'src/ceph/doc/rados')
70 files changed, 19407 insertions, 0 deletions
diff --git a/src/ceph/doc/rados/api/index.rst b/src/ceph/doc/rados/api/index.rst new file mode 100644 index 0000000..cccc153 --- /dev/null +++ b/src/ceph/doc/rados/api/index.rst @@ -0,0 +1,22 @@ +=========================== + Ceph Storage Cluster APIs +=========================== + +The :term:`Ceph Storage Cluster` has a messaging layer protocol that enables +clients to interact with a :term:`Ceph Monitor` and a :term:`Ceph OSD Daemon`. +``librados`` provides this functionality to :term:`Ceph Clients` in the form of +a library. All Ceph Clients either use ``librados`` or the same functionality +encapsulated in ``librados`` to interact with the object store. For example, +``librbd`` and ``libcephfs`` leverage this functionality. You may use +``librados`` to interact with Ceph directly (e.g., an application that talks to +Ceph, your own interface to Ceph, etc.). + + +.. toctree:: + :maxdepth: 2 + + Introduction to librados <librados-intro> + librados (C) <librados> + librados (C++) <libradospp> + librados (Python) <python> + object class <objclass-sdk> diff --git a/src/ceph/doc/rados/api/librados-intro.rst b/src/ceph/doc/rados/api/librados-intro.rst new file mode 100644 index 0000000..8405f6e --- /dev/null +++ b/src/ceph/doc/rados/api/librados-intro.rst @@ -0,0 +1,1003 @@ +========================== + Introduction to librados +========================== + +The :term:`Ceph Storage Cluster` provides the basic storage service that allows +:term:`Ceph` to uniquely deliver **object, block, and file storage** in one +unified system. However, you are not limited to using the RESTful, block, or +POSIX interfaces. Based upon :abbr:`RADOS (Reliable Autonomic Distributed Object +Store)`, the ``librados`` API enables you to create your own interface to the +Ceph Storage Cluster. + +The ``librados`` API enables you to interact with the two types of daemons in +the Ceph Storage Cluster: + +- The :term:`Ceph Monitor`, which maintains a master copy of the cluster map. +- The :term:`Ceph OSD Daemon` (OSD), which stores data as objects on a storage node. + +.. ditaa:: + +---------------------------------+ + | Ceph Storage Cluster Protocol | + | (librados) | + +---------------------------------+ + +---------------+ +---------------+ + | OSDs | | Monitors | + +---------------+ +---------------+ + +This guide provides a high-level introduction to using ``librados``. +Refer to :doc:`../../architecture` for additional details of the Ceph +Storage Cluster. To use the API, you need a running Ceph Storage Cluster. +See `Installation (Quick)`_ for details. + + +Step 1: Getting librados +======================== + +Your client application must bind with ``librados`` to connect to the Ceph +Storage Cluster. You must install ``librados`` and any required packages to +write applications that use ``librados``. The ``librados`` API is written in +C++, with additional bindings for C, Python, Java and PHP. + + +Getting librados for C/C++ +-------------------------- + +To install ``librados`` development support files for C/C++ on Debian/Ubuntu +distributions, execute the following:: + + sudo apt-get install librados-dev + +To install ``librados`` development support files for C/C++ on RHEL/CentOS +distributions, execute the following:: + + sudo yum install librados2-devel + +Once you install ``librados`` for developers, you can find the required +headers for C/C++ under ``/usr/include/rados``. 
:: + + ls /usr/include/rados + + +Getting librados for Python +--------------------------- + +The ``rados`` module provides ``librados`` support to Python +applications. The ``librados-dev`` package for Debian/Ubuntu +and the ``librados2-devel`` package for RHEL/CentOS will install the +``python-rados`` package for you. You may install ``python-rados`` +directly too. + +To install ``librados`` development support files for Python on Debian/Ubuntu +distributions, execute the following:: + + sudo apt-get install python-rados + +To install ``librados`` development support files for Python on RHEL/CentOS +distributions, execute the following:: + + sudo yum install python-rados + +You can find the module under ``/usr/share/pyshared`` on Debian systems, +or under ``/usr/lib/python*/site-packages`` on CentOS/RHEL systems. + + +Getting librados for Java +------------------------- + +To install ``librados`` for Java, you need to execute the following procedure: + +#. Install ``jna.jar``. For Debian/Ubuntu, execute:: + + sudo apt-get install libjna-java + + For CentOS/RHEL, execute:: + + sudo yum install jna + + The JAR files are located in ``/usr/share/java``. + +#. Clone the ``rados-java`` repository:: + + git clone --recursive https://github.com/ceph/rados-java.git + +#. Build the ``rados-java`` repository:: + + cd rados-java + ant + + The JAR file is located under ``rados-java/target``. + +#. Copy the JAR for RADOS to a common location (e.g., ``/usr/share/java``) and + ensure that it and the JNA JAR are in your JVM's classpath. For example:: + + sudo cp target/rados-0.1.3.jar /usr/share/java/rados-0.1.3.jar + sudo ln -s /usr/share/java/jna-3.2.7.jar /usr/lib/jvm/default-java/jre/lib/ext/jna-3.2.7.jar + sudo ln -s /usr/share/java/rados-0.1.3.jar /usr/lib/jvm/default-java/jre/lib/ext/rados-0.1.3.jar + +To build the documentation, execute the following:: + + ant docs + + +Getting librados for PHP +------------------------- + +To install the ``librados`` extension for PHP, you need to execute the following procedure: + +#. Install php-dev. For Debian/Ubuntu, execute:: + + sudo apt-get install php5-dev build-essential + + For CentOS/RHEL, execute:: + + sudo yum install php-devel + +#. Clone the ``phprados`` repository:: + + git clone https://github.com/ceph/phprados.git + +#. Build ``phprados``:: + + cd phprados + phpize + ./configure + make + sudo make install + +#. Enable ``phprados`` in php.ini by adding:: + + extension=rados.so + + +Step 2: Configuring a Cluster Handle +==================================== + +A :term:`Ceph Client`, via ``librados``, interacts directly with OSDs to store +and retrieve data. To interact with OSDs, the client app must invoke +``librados`` and connect to a Ceph Monitor. Once connected, ``librados`` +retrieves the :term:`Cluster Map` from the Ceph Monitor. When the client app +wants to read or write data, it creates an I/O context and binds to a +:term:`pool`. The pool has an associated :term:`ruleset` that defines how it +will place data in the storage cluster. Via the I/O context, the client +provides the object name to ``librados``, which takes the object name +and the cluster map (i.e., the topology of the cluster) and `computes`_ the +placement group and `OSD`_ for locating the data. Then the client application +can read or write data. The client app doesn't need to learn about the topology +of the cluster directly. + +.. 
ditaa:: + +--------+ Retrieves +---------------+ + | Client |------------>| Cluster Map | + +--------+ +---------------+ + | + v Writes + /-----\ + | obj | + \-----/ + | To + v + +--------+ +---------------+ + | Pool |---------->| CRUSH Ruleset | + +--------+ Selects +---------------+ + + +The Ceph Storage Cluster handle encapsulates the client configuration, including: + +- The `user ID`_ for ``rados_create()`` or user name for ``rados_create2()`` + (preferred). +- The :term:`cephx` authentication key +- The monitor ID and IP address +- Logging levels +- Debugging levels + +Thus, the first steps in using the cluster from your app are to 1) create +a cluster handle that your app will use to connect to the storage cluster, +and then 2) use that handle to connect. To connect to the cluster, the +app must supply a monitor address, a username and an authentication key +(cephx is enabled by default). + +.. tip:: Talking to different Ceph Storage Clusters – or to the same cluster + with different users – requires different cluster handles. + +RADOS provides a number of ways for you to set the required values. For +the monitor and encryption key settings, an easy way to handle them is to ensure +that your Ceph configuration file contains a ``keyring`` path to a keyring file +and at least one monitor address (e.g,. ``mon host``). For example:: + + [global] + mon host = 192.168.1.1 + keyring = /etc/ceph/ceph.client.admin.keyring + +Once you create the handle, you can read a Ceph configuration file to configure +the handle. You can also pass arguments to your app and parse them with the +function for parsing command line arguments (e.g., ``rados_conf_parse_argv()``), +or parse Ceph environment variables (e.g., ``rados_conf_parse_env()``). Some +wrappers may not implement convenience methods, so you may need to implement +these capabilities. The following diagram provides a high-level flow for the +initial connection. + + +.. ditaa:: +---------+ +---------+ + | Client | | Monitor | + +---------+ +---------+ + | | + |-----+ create | + | | cluster | + |<----+ handle | + | | + |-----+ read | + | | config | + |<----+ file | + | | + | connect | + |-------------->| + | | + |<--------------| + | connected | + | | + + +Once connected, your app can invoke functions that affect the whole cluster +with only the cluster handle. For example, once you have a cluster +handle, you can: + +- Get cluster statistics +- Use Pool Operation (exists, create, list, delete) +- Get and set the configuration + + +One of the powerful features of Ceph is the ability to bind to different pools. +Each pool may have a different number of placement groups, object replicas and +replication strategies. For example, a pool could be set up as a "hot" pool that +uses SSDs for frequently used objects or a "cold" pool that uses erasure coding. + +The main difference in the various ``librados`` bindings is between C and +the object-oriented bindings for C++, Java and Python. The object-oriented +bindings use objects to represent cluster handles, IO Contexts, iterators, +exceptions, etc. + + +C Example +--------- + +For C, creating a simple cluster handle using the ``admin`` user, configuring +it and connecting to the cluster might look something like this: + +.. code-block:: c + + #include <stdio.h> + #include <stdlib.h> + #include <string.h> + #include <rados/librados.h> + + int main (int argc, const char **argv) + { + + /* Declare the cluster handle and required arguments. 
*/ + rados_t cluster; + char cluster_name[] = "ceph"; + char user_name[] = "client.admin"; + uint64_t flags; + + /* Initialize the cluster handle with the "ceph" cluster name and the "client.admin" user */ + int err; + err = rados_create2(&cluster, cluster_name, user_name, flags); + + if (err < 0) { + fprintf(stderr, "%s: Couldn't create the cluster handle! %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nCreated a cluster handle.\n"); + } + + + /* Read a Ceph configuration file to configure the cluster handle. */ + err = rados_conf_read_file(cluster, "/etc/ceph/ceph.conf"); + if (err < 0) { + fprintf(stderr, "%s: cannot read config file: %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nRead the config file.\n"); + } + + /* Read command line arguments */ + err = rados_conf_parse_argv(cluster, argc, argv); + if (err < 0) { + fprintf(stderr, "%s: cannot parse command line arguments: %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nRead the command line arguments.\n"); + } + + /* Connect to the cluster */ + err = rados_connect(cluster); + if (err < 0) { + fprintf(stderr, "%s: cannot connect to cluster: %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nConnected to the cluster.\n"); + } + + } + +Compile your client and link to ``librados`` using ``-lrados``. For example:: + + gcc ceph-client.c -lrados -o ceph-client + + +C++ Example +----------- + +The Ceph project provides a C++ example in the ``ceph/examples/librados`` +directory. For C++, a simple cluster handle using the ``admin`` user requires +you to initialize a ``librados::Rados`` cluster handle object: + +.. code-block:: c++ + + #include <iostream> + #include <string> + #include <rados/librados.hpp> + + int main(int argc, const char **argv) + { + + int ret = 0; + + /* Declare the cluster handle and required variables. */ + librados::Rados cluster; + char cluster_name[] = "ceph"; + char user_name[] = "client.admin"; + uint64_t flags = 0; + + /* Initialize the cluster handle with the "ceph" cluster name and "client.admin" user */ + { + ret = cluster.init2(user_name, cluster_name, flags); + if (ret < 0) { + std::cerr << "Couldn't initialize the cluster handle! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Created a cluster handle." << std::endl; + } + } + + /* Read a Ceph configuration file to configure the cluster handle. */ + { + ret = cluster.conf_read_file("/etc/ceph/ceph.conf"); + if (ret < 0) { + std::cerr << "Couldn't read the Ceph configuration file! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Read the Ceph configuration file." << std::endl; + } + } + + /* Read command line arguments */ + { + ret = cluster.conf_parse_argv(argc, argv); + if (ret < 0) { + std::cerr << "Couldn't parse command line options! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Parsed command line options." << std::endl; + } + } + + /* Connect to the cluster */ + { + ret = cluster.connect(); + if (ret < 0) { + std::cerr << "Couldn't connect to cluster! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Connected to the cluster." << std::endl; + } + } + + return 0; + } + + +Compile the source; then, link ``librados`` using ``-lrados``. 
+For example:: + + g++ -g -c ceph-client.cc -o ceph-client.o + g++ -g ceph-client.o -lrados -o ceph-client + + + +Python Example +-------------- + +Python uses the ``admin`` id and the ``ceph`` cluster name by default, and +will read the standard ``ceph.conf`` file if the conffile parameter is +set to the empty string. The Python binding converts C++ errors +into exceptions. + + +.. code-block:: python + + import rados + + try: + cluster = rados.Rados(conffile='') + except TypeError as e: + print 'Argument validation error: ', e + raise e + + print "Created cluster handle." + + try: + cluster.connect() + except Exception as e: + print "connection error: ", e + raise e + finally: + print "Connected to the cluster." + + +Execute the example to verify that it connects to your cluster. :: + + python ceph-client.py + + +Java Example +------------ + +Java requires you to specify the user ID (``admin``) or user name +(``client.admin``), and uses the ``ceph`` cluster name by default . The Java +binding converts C++-based errors into exceptions. + +.. code-block:: java + + import com.ceph.rados.Rados; + import com.ceph.rados.RadosException; + + import java.io.File; + + public class CephClient { + public static void main (String args[]){ + + try { + Rados cluster = new Rados("admin"); + System.out.println("Created cluster handle."); + + File f = new File("/etc/ceph/ceph.conf"); + cluster.confReadFile(f); + System.out.println("Read the configuration file."); + + cluster.connect(); + System.out.println("Connected to the cluster."); + + } catch (RadosException e) { + System.out.println(e.getMessage() + ": " + e.getReturnValue()); + } + } + } + + +Compile the source; then, run it. If you have copied the JAR to +``/usr/share/java`` and sym linked from your ``ext`` directory, you won't need +to specify the classpath. For example:: + + javac CephClient.java + java CephClient + + +PHP Example +------------ + +With the RADOS extension enabled in PHP you can start creating a new cluster handle very easily: + +.. code-block:: php + + <?php + + $r = rados_create(); + rados_conf_read_file($r, '/etc/ceph/ceph.conf'); + if (!rados_connect($r)) { + echo "Failed to connect to Ceph cluster"; + } else { + echo "Successfully connected to Ceph cluster"; + } + + +Save this as rados.php and run the code:: + + php rados.php + + +Step 3: Creating an I/O Context +=============================== + +Once your app has a cluster handle and a connection to a Ceph Storage Cluster, +you may create an I/O Context and begin reading and writing data. An I/O Context +binds the connection to a specific pool. The user must have appropriate +`CAPS`_ permissions to access the specified pool. For example, a user with read +access but not write access will only be able to read data. I/O Context +functionality includes: + +- Write/read data and extended attributes +- List and iterate over objects and extended attributes +- Snapshot pools, list snapshots, etc. + + +.. 
ditaa:: +---------+ +---------+ +---------+ + | Client | | Monitor | | OSD | + +---------+ +---------+ +---------+ + | | | + |-----+ create | | + | | I/O | | + |<----+ context | | + | | | + | write data | | + |---------------+-------------->| + | | | + | write ack | | + |<--------------+---------------| + | | | + | write xattr | | + |---------------+-------------->| + | | | + | xattr ack | | + |<--------------+---------------| + | | | + | read data | | + |---------------+-------------->| + | | | + | read ack | | + |<--------------+---------------| + | | | + | remove data | | + |---------------+-------------->| + | | | + | remove ack | | + |<--------------+---------------| + + + +RADOS enables you to interact both synchronously and asynchronously. Once your +app has an I/O Context, read/write operations only require you to know the +object/xattr name. The CRUSH algorithm encapsulated in ``librados`` uses the +cluster map to identify the appropriate OSD. OSD daemons handle the replication, +as described in `Smart Daemons Enable Hyperscale`_. The ``librados`` library also +maps objects to placement groups, as described in `Calculating PG IDs`_. + +The following examples use the default ``data`` pool. However, you may also +use the API to list pools, ensure they exist, or create and delete pools. For +the write operations, the examples illustrate how to use synchronous mode. For +the read operations, the examples illustrate how to use asynchronous mode. + +.. important:: Use caution when deleting pools with this API. If you delete + a pool, the pool and ALL DATA in the pool will be lost. + + +C Example +--------- + + +.. code-block:: c + + #include <stdio.h> + #include <stdlib.h> + #include <string.h> + #include <rados/librados.h> + + int main (int argc, const char **argv) + { + /* + * Continued from previous C example, where cluster handle and + * connection are established. First declare an I/O Context. + */ + + rados_ioctx_t io; + char *poolname = "data"; + + err = rados_ioctx_create(cluster, poolname, &io); + if (err < 0) { + fprintf(stderr, "%s: cannot open rados pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_shutdown(cluster); + exit(EXIT_FAILURE); + } else { + printf("\nCreated I/O context.\n"); + } + + /* Write data to the cluster synchronously. */ + err = rados_write(io, "hw", "Hello World!", 12, 0); + if (err < 0) { + fprintf(stderr, "%s: Cannot write object \"hw\" to pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nWrote \"Hello World\" to object \"hw\".\n"); + } + + char xattr[] = "en_US"; + err = rados_setxattr(io, "hw", "lang", xattr, 5); + if (err < 0) { + fprintf(stderr, "%s: Cannot write xattr to pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nWrote \"en_US\" to xattr \"lang\" for object \"hw\".\n"); + } + + /* + * Read data from the cluster asynchronously. + * First, set up asynchronous I/O completion. + */ + rados_completion_t comp; + err = rados_aio_create_completion(NULL, NULL, NULL, &comp); + if (err < 0) { + fprintf(stderr, "%s: Could not create aio completion: %s\n", argv[0], strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nCreated AIO completion.\n"); + } + + /* Next, read data using rados_aio_read. 
*/ + char read_res[100]; + err = rados_aio_read(io, "hw", comp, read_res, 12, 0); + if (err < 0) { + fprintf(stderr, "%s: Cannot read object. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRead object \"hw\". The contents are:\n %s \n", read_res); + } + + /* Wait for the operation to complete */ + rados_aio_wait_for_complete(comp); + + /* Release the asynchronous I/O complete handle to avoid memory leaks. */ + rados_aio_release(comp); + + + char xattr_res[100]; + err = rados_getxattr(io, "hw", "lang", xattr_res, 5); + if (err < 0) { + fprintf(stderr, "%s: Cannot read xattr. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRead xattr \"lang\" for object \"hw\". The contents are:\n %s \n", xattr_res); + } + + err = rados_rmxattr(io, "hw", "lang"); + if (err < 0) { + fprintf(stderr, "%s: Cannot remove xattr. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRemoved xattr \"lang\" for object \"hw\".\n"); + } + + err = rados_remove(io, "hw"); + if (err < 0) { + fprintf(stderr, "%s: Cannot remove object. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRemoved object \"hw\".\n"); + } + + } + + + +C++ Example +----------- + + +.. code-block:: c++ + + #include <iostream> + #include <string> + #include <rados/librados.hpp> + + int main(int argc, const char **argv) + { + + /* Continued from previous C++ example, where cluster handle and + * connection are established. First declare an I/O Context. + */ + + librados::IoCtx io_ctx; + const char *pool_name = "data"; + + { + ret = cluster.ioctx_create(pool_name, io_ctx); + if (ret < 0) { + std::cerr << "Couldn't set up ioctx! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Created an ioctx for the pool." << std::endl; + } + } + + + /* Write an object synchronously. */ + { + librados::bufferlist bl; + bl.append("Hello World!"); + ret = io_ctx.write_full("hw", bl); + if (ret < 0) { + std::cerr << "Couldn't write object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Wrote new object 'hw' " << std::endl; + } + } + + + /* + * Add an xattr to the object. + */ + { + librados::bufferlist lang_bl; + lang_bl.append("en_US"); + ret = io_ctx.setxattr("hw", "lang", lang_bl); + if (ret < 0) { + std::cerr << "failed to set xattr version entry! error " + << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Set the xattr 'lang' on our object!" << std::endl; + } + } + + + /* + * Read the object back asynchronously. + */ + { + librados::bufferlist read_buf; + int read_len = 4194304; + + //Create I/O Completion. + librados::AioCompletion *read_completion = librados::Rados::aio_create_completion(); + + //Send read request. + ret = io_ctx.aio_read("hw", read_completion, &read_buf, read_len, 0); + if (ret < 0) { + std::cerr << "Couldn't start read object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } + + // Wait for the request to complete, and check that it succeeded. + read_completion->wait_for_complete(); + ret = read_completion->get_return_value(); + if (ret < 0) { + std::cerr << "Couldn't read object! 
error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Read object hw asynchronously with contents.\n" + << read_buf.c_str() << std::endl; + } + } + + + /* + * Read the xattr. + */ + { + librados::bufferlist lang_res; + ret = io_ctx.getxattr("hw", "lang", lang_res); + if (ret < 0) { + std::cerr << "failed to get xattr version entry! error " + << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Got the xattr 'lang' from object hw!" + << lang_res.c_str() << std::endl; + } + } + + + /* + * Remove the xattr. + */ + { + ret = io_ctx.rmxattr("hw", "lang"); + if (ret < 0) { + std::cerr << "Failed to remove xattr! error " + << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Removed the xattr 'lang' from our object!" << std::endl; + } + } + + /* + * Remove the object. + */ + { + ret = io_ctx.remove("hw"); + if (ret < 0) { + std::cerr << "Couldn't remove object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Removed object 'hw'." << std::endl; + } + } + } + + + +Python Example +-------------- + +.. code-block:: python + + print "\n\nI/O Context and Object Operations" + print "=================================" + + print "\nCreating a context for the 'data' pool" + if not cluster.pool_exists('data'): + raise RuntimeError('No data pool exists') + ioctx = cluster.open_ioctx('data') + + print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'." + ioctx.write("hw", "Hello World!") + print "Writing XATTR 'lang' with value 'en_US' to object 'hw'" + ioctx.set_xattr("hw", "lang", "en_US") + + + print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool 'data'." + ioctx.write("bm", "Bonjour tout le monde!") + print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'" + ioctx.set_xattr("bm", "lang", "fr_FR") + + print "\nContents of object 'hw'\n------------------------" + print ioctx.read("hw") + + print "\n\nGetting XATTR 'lang' from object 'hw'" + print ioctx.get_xattr("hw", "lang") + + print "\nContents of object 'bm'\n------------------------" + print ioctx.read("bm") + + print "Getting XATTR 'lang' from object 'bm'" + print ioctx.get_xattr("bm", "lang") + + + print "\nRemoving object 'hw'" + ioctx.remove_object("hw") + + print "Removing object 'bm'" + ioctx.remove_object("bm") + + +Java-Example +------------ + +.. code-block:: java + + import com.ceph.rados.Rados; + import com.ceph.rados.RadosException; + + import java.io.File; + import com.ceph.rados.IoCTX; + + public class CephClient { + public static void main (String args[]){ + + try { + Rados cluster = new Rados("admin"); + System.out.println("Created cluster handle."); + + File f = new File("/etc/ceph/ceph.conf"); + cluster.confReadFile(f); + System.out.println("Read the configuration file."); + + cluster.connect(); + System.out.println("Connected to the cluster."); + + IoCTX io = cluster.ioCtxCreate("data"); + + String oidone = "hw"; + String contentone = "Hello World!"; + io.write(oidone, contentone); + + String oidtwo = "bm"; + String contenttwo = "Bonjour tout le monde!"; + io.write(oidtwo, contenttwo); + + String[] objects = io.listObjects(); + for (String object: objects) + System.out.println(object); + + io.remove(oidone); + io.remove(oidtwo); + + cluster.ioCtxDestroy(io); + + } catch (RadosException e) { + System.out.println(e.getMessage() + ": " + e.getReturnValue()); + } + } + } + + +PHP Example +----------- + +.. 
code-block:: php + + <?php + + $io = rados_ioctx_create($r, "mypool"); + rados_write_full($io, "oidOne", "mycontents"); + rados_remove("oidOne"); + rados_ioctx_destroy($io); + + +Step 4: Closing Sessions +======================== + +Once your app finishes with the I/O Context and cluster handle, the app should +close the connection and shutdown the handle. For asynchronous I/O, the app +should also ensure that pending asynchronous operations have completed. + + +C Example +--------- + +.. code-block:: c + + rados_ioctx_destroy(io); + rados_shutdown(cluster); + + +C++ Example +----------- + +.. code-block:: c++ + + io_ctx.close(); + cluster.shutdown(); + + +Java Example +-------------- + +.. code-block:: java + + cluster.ioCtxDestroy(io); + cluster.shutDown(); + + +Python Example +-------------- + +.. code-block:: python + + print "\nClosing the connection." + ioctx.close() + + print "Shutting down the handle." + cluster.shutdown() + +PHP Example +----------- + +.. code-block:: php + + rados_shutdown($r); + + + +.. _user ID: ../../operations/user-management#command-line-usage +.. _CAPS: ../../operations/user-management#authorization-capabilities +.. _Installation (Quick): ../../../start +.. _Smart Daemons Enable Hyperscale: ../../../architecture#smart-daemons-enable-hyperscale +.. _Calculating PG IDs: ../../../architecture#calculating-pg-ids +.. _computes: ../../../architecture#calculating-pg-ids +.. _OSD: ../../../architecture#mapping-pgs-to-osds diff --git a/src/ceph/doc/rados/api/librados.rst b/src/ceph/doc/rados/api/librados.rst new file mode 100644 index 0000000..73d0e42 --- /dev/null +++ b/src/ceph/doc/rados/api/librados.rst @@ -0,0 +1,187 @@ +============== + Librados (C) +============== + +.. highlight:: c + +`librados` provides low-level access to the RADOS service. For an +overview of RADOS, see :doc:`../../architecture`. + + +Example: connecting and writing an object +========================================= + +To use `Librados`, you instantiate a :c:type:`rados_t` variable (a cluster handle) and +call :c:func:`rados_create()` with a pointer to it:: + + int err; + rados_t cluster; + + err = rados_create(&cluster, NULL); + if (err < 0) { + fprintf(stderr, "%s: cannot create a cluster handle: %s\n", argv[0], strerror(-err)); + exit(1); + } + +Then you configure your :c:type:`rados_t` to connect to your cluster, +either by setting individual values (:c:func:`rados_conf_set()`), +using a configuration file (:c:func:`rados_conf_read_file()`), using +command line options (:c:func:`rados_conf_parse_argv`), or an +environment variable (:c:func:`rados_conf_parse_env()`):: + + err = rados_conf_read_file(cluster, "/path/to/myceph.conf"); + if (err < 0) { + fprintf(stderr, "%s: cannot read config file: %s\n", argv[0], strerror(-err)); + exit(1); + } + +Once the cluster handle is configured, you can connect to the cluster with :c:func:`rados_connect()`:: + + err = rados_connect(cluster); + if (err < 0) { + fprintf(stderr, "%s: cannot connect to cluster: %s\n", argv[0], strerror(-err)); + exit(1); + } + +Then you open an "IO context", a :c:type:`rados_ioctx_t`, with :c:func:`rados_ioctx_create()`:: + + rados_ioctx_t io; + char *poolname = "mypool"; + + err = rados_ioctx_create(cluster, poolname, &io); + if (err < 0) { + fprintf(stderr, "%s: cannot open rados pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_shutdown(cluster); + exit(1); + } + +Note that the pool you try to access must exist. 
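If you are not sure the pool is there, one possible guard (a minimal sketch,
not part of the original example) is to look the pool up with
:c:func:`rados_pool_lookup()` and create it with :c:func:`rados_pool_create()`
before opening the IO context; include ``<errno.h>`` for ``ENOENT``::

    int64_t pool_id = rados_pool_lookup(cluster, poolname);
    if (pool_id == -ENOENT) {
        /* The pool does not exist yet, so create it. */
        err = rados_pool_create(cluster, poolname);
        if (err < 0) {
            fprintf(stderr, "%s: cannot create pool %s: %s\n", argv[0], poolname, strerror(-err));
            rados_shutdown(cluster);
            exit(1);
        }
    } else if (pool_id < 0) {
        /* Some other error occurred while looking up the pool. */
        fprintf(stderr, "%s: cannot look up pool %s: %s\n", argv[0], poolname, strerror((int)-pool_id));
        rados_shutdown(cluster);
        exit(1);
    }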
+ +Then you can use the RADOS data manipulation functions, for example +write into an object called ``greeting`` with +:c:func:`rados_write_full()`:: + + err = rados_write_full(io, "greeting", "hello", 5); + if (err < 0) { + fprintf(stderr, "%s: cannot write pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } + +In the end, you will want to close your IO context and connection to RADOS with :c:func:`rados_ioctx_destroy()` and :c:func:`rados_shutdown()`:: + + rados_ioctx_destroy(io); + rados_shutdown(cluster); + + +Asychronous IO +============== + +When doing lots of IO, you often don't need to wait for one operation +to complete before starting the next one. `Librados` provides +asynchronous versions of several operations: + +* :c:func:`rados_aio_write` +* :c:func:`rados_aio_append` +* :c:func:`rados_aio_write_full` +* :c:func:`rados_aio_read` + +For each operation, you must first create a +:c:type:`rados_completion_t` that represents what to do when the +operation is safe or complete by calling +:c:func:`rados_aio_create_completion`. If you don't need anything +special to happen, you can pass NULL:: + + rados_completion_t comp; + err = rados_aio_create_completion(NULL, NULL, NULL, &comp); + if (err < 0) { + fprintf(stderr, "%s: could not create aio completion: %s\n", argv[0], strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } + +Now you can call any of the aio operations, and wait for it to +be in memory or on disk on all replicas:: + + err = rados_aio_write(io, "foo", comp, "bar", 3, 0); + if (err < 0) { + fprintf(stderr, "%s: could not schedule aio write: %s\n", argv[0], strerror(-err)); + rados_aio_release(comp); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } + rados_aio_wait_for_complete(comp); // in memory + rados_aio_wait_for_safe(comp); // on disk + +Finally, we need to free the memory used by the completion with :c:func:`rados_aio_release`:: + + rados_aio_release(comp); + +You can use the callbacks to tell your application when writes are +durable, or when read buffers are full. 
For example, if you wanted to +measure the latency of each operation when appending to several +objects, you could schedule several writes and store the ack and +commit time in the corresponding callback, then wait for all of them +to complete using :c:func:`rados_aio_flush` before analyzing the +latencies:: + + typedef struct { + struct timeval start; + struct timeval ack_end; + struct timeval commit_end; + } req_duration; + + void ack_callback(rados_completion_t comp, void *arg) { + req_duration *dur = (req_duration *) arg; + gettimeofday(&dur->ack_end, NULL); + } + + void commit_callback(rados_completion_t comp, void *arg) { + req_duration *dur = (req_duration *) arg; + gettimeofday(&dur->commit_end, NULL); + } + + int output_append_latency(rados_ioctx_t io, const char *data, size_t len, size_t num_writes) { + req_duration times[num_writes]; + rados_completion_t comps[num_writes]; + for (size_t i = 0; i < num_writes; ++i) { + gettimeofday(×[i].start, NULL); + int err = rados_aio_create_completion((void*) ×[i], ack_callback, commit_callback, &comps[i]); + if (err < 0) { + fprintf(stderr, "Error creating rados completion: %s\n", strerror(-err)); + return err; + } + char obj_name[100]; + snprintf(obj_name, sizeof(obj_name), "foo%ld", (unsigned long)i); + err = rados_aio_append(io, obj_name, comps[i], data, len); + if (err < 0) { + fprintf(stderr, "Error from rados_aio_append: %s", strerror(-err)); + return err; + } + } + // wait until all requests finish *and* the callbacks complete + rados_aio_flush(io); + // the latencies can now be analyzed + printf("Request # | Ack latency (s) | Commit latency (s)\n"); + for (size_t i = 0; i < num_writes; ++i) { + // don't forget to free the completions + rados_aio_release(comps[i]); + struct timeval ack_lat, commit_lat; + timersub(×[i].ack_end, ×[i].start, &ack_lat); + timersub(×[i].commit_end, ×[i].start, &commit_lat); + printf("%9ld | %8ld.%06ld | %10ld.%06ld\n", (unsigned long) i, ack_lat.tv_sec, ack_lat.tv_usec, commit_lat.tv_sec, commit_lat.tv_usec); + } + return 0; + } + +Note that all the :c:type:`rados_completion_t` must be freed with :c:func:`rados_aio_release` to avoid leaking memory. + + +API calls +========= + + .. autodoxygenfile:: rados_types.h + .. autodoxygenfile:: librados.h diff --git a/src/ceph/doc/rados/api/libradospp.rst b/src/ceph/doc/rados/api/libradospp.rst new file mode 100644 index 0000000..27d3fa7 --- /dev/null +++ b/src/ceph/doc/rados/api/libradospp.rst @@ -0,0 +1,5 @@ +================== + LibradosPP (C++) +================== + +.. todo:: write me! diff --git a/src/ceph/doc/rados/api/objclass-sdk.rst b/src/ceph/doc/rados/api/objclass-sdk.rst new file mode 100644 index 0000000..6b1162f --- /dev/null +++ b/src/ceph/doc/rados/api/objclass-sdk.rst @@ -0,0 +1,37 @@ +=========================== +SDK for Ceph Object Classes +=========================== + +`Ceph` can be extended by creating shared object classes called `Ceph Object +Classes`. The existing framework to build these object classes has dependencies +on the internal functionality of `Ceph`, which restricts users to build object +classes within the tree. The aim of this project is to create an independent +object class interface, which can be used to build object classes outside the +`Ceph` tree. 
This allows us to have two types of object classes, 1) those that +have in-tree dependencies and reside in the tree and 2) those that can make use +of the `Ceph Object Class SDK framework` and can be built outside of the `Ceph` +tree because they do not depend on any internal implementation of `Ceph`. This +project decouples object class development from Ceph and encourages creation +and distribution of object classes as packages. + +In order to demonstrate the use of this framework, we have provided an example +called ``cls_sdk``, which is a very simple object class that makes use of the +SDK framework. This object class resides in the ``src/cls`` directory. + +Installing objclass.h +--------------------- + +The object class interface that enables out-of-tree development of object +classes resides in ``src/include/rados/`` and gets installed with `Ceph` +installation. After running ``make install``, you should be able to see it +in ``<prefix>/include/rados``. :: + + ls /usr/local/include/rados + +Using the SDK example +--------------------- + +The ``cls_sdk`` object class resides in ``src/cls/sdk/``. This gets built and +loaded into Ceph, with the Ceph build process. You can run the +``ceph_test_cls_sdk`` unittest, which resides in ``src/test/cls_sdk/``, +to test this class. diff --git a/src/ceph/doc/rados/api/python.rst b/src/ceph/doc/rados/api/python.rst new file mode 100644 index 0000000..b4fd7e0 --- /dev/null +++ b/src/ceph/doc/rados/api/python.rst @@ -0,0 +1,397 @@ +=================== + Librados (Python) +=================== + +The ``rados`` module is a thin Python wrapper for ``librados``. + +Installation +============ + +To install Python libraries for Ceph, see `Getting librados for Python`_. + + +Getting Started +=============== + +You can create your own Ceph client using Python. The following tutorial will +show you how to import the Ceph Python module, connect to a Ceph cluster, and +perform object operations as a ``client.admin`` user. + +.. note:: To use the Ceph Python bindings, you must have access to a + running Ceph cluster. To set one up quickly, see `Getting Started`_. + +First, create a Python source file for your Ceph client. :: + :linenos: + + sudo vim client.py + + +Import the Module +----------------- + +To use the ``rados`` module, import it into your source file. + +.. code-block:: python + :linenos: + + import rados + + +Configure a Cluster Handle +-------------------------- + +Before connecting to the Ceph Storage Cluster, create a cluster handle. By +default, the cluster handle assumes a cluster named ``ceph`` (i.e., the default +for deployment tools, and our Getting Started guides too), and a +``client.admin`` user name. You may change these defaults to suit your needs. + +To connect to the Ceph Storage Cluster, your application needs to know where to +find the Ceph Monitor. Provide this information to your application by +specifying the path to your Ceph configuration file, which contains the location +of the initial Ceph monitors. + +.. code-block:: python + :linenos: + + import rados, sys + + #Create Handle Examples. + cluster = rados.Rados(conffile='ceph.conf') + cluster = rados.Rados(conffile=sys.argv[1]) + cluster = rados.Rados(conffile = 'ceph.conf', conf = dict (keyring = '/path/to/keyring')) + +Ensure that the ``conffile`` argument provides the path and file name of your +Ceph configuration file. You may use the ``sys`` module to avoid hard-coding the +Ceph configuration path and file name. + +Your Python client also requires a client keyring. 
For this example, we use the +``client.admin`` key by default. If you would like to specify the keyring when +creating the cluster handle, you may use the ``conf`` argument. Alternatively, +you may specify the keyring path in your Ceph configuration file. For example, +you may add something like the following line to you Ceph configuration file:: + + keyring = /path/to/ceph.client.admin.keyring + +For additional details on modifying your configuration via Python, see `Configuration`_. + + +Connect to the Cluster +---------------------- + +Once you have a cluster handle configured, you may connect to the cluster. +With a connection to the cluster, you may execute methods that return +information about the cluster. + +.. code-block:: python + :linenos: + :emphasize-lines: 7 + + import rados, sys + + cluster = rados.Rados(conffile='ceph.conf') + print "\nlibrados version: " + str(cluster.version()) + print "Will attempt to connect to: " + str(cluster.conf_get('mon initial members')) + + cluster.connect() + print "\nCluster ID: " + cluster.get_fsid() + + print "\n\nCluster Statistics" + print "==================" + cluster_stats = cluster.get_cluster_stats() + + for key, value in cluster_stats.iteritems(): + print key, value + + +By default, Ceph authentication is ``on``. Your application will need to know +the location of the keyring. The ``python-ceph`` module doesn't have the default +location, so you need to specify the keyring path. The easiest way to specify +the keyring is to add it to the Ceph configuration file. The following Ceph +configuration file example uses the ``client.admin`` keyring you generated with +``ceph-deploy``. + +.. code-block:: ini + :linenos: + + [global] + ... + keyring=/path/to/keyring/ceph.client.admin.keyring + + +Manage Pools +------------ + +When connected to the cluster, the ``Rados`` API allows you to manage pools. You +can list pools, check for the existence of a pool, create a pool and delete a +pool. + +.. code-block:: python + :linenos: + :emphasize-lines: 6, 13, 18, 25 + + print "\n\nPool Operations" + print "===============" + + print "\nAvailable Pools" + print "----------------" + pools = cluster.list_pools() + + for pool in pools: + print pool + + print "\nCreate 'test' Pool" + print "------------------" + cluster.create_pool('test') + + print "\nPool named 'test' exists: " + str(cluster.pool_exists('test')) + print "\nVerify 'test' Pool Exists" + print "-------------------------" + pools = cluster.list_pools() + + for pool in pools: + print pool + + print "\nDelete 'test' Pool" + print "------------------" + cluster.delete_pool('test') + print "\nPool named 'test' exists: " + str(cluster.pool_exists('test')) + + + +Input/Output Context +-------------------- + +Reading from and writing to the Ceph Storage Cluster requires an input/output +context (ioctx). You can create an ioctx with the ``open_ioctx()`` method of the +``Rados`` class. The ``ioctx_name`` parameter is the name of the pool you wish +to use. + +.. code-block:: python + :linenos: + + ioctx = cluster.open_ioctx('data') + + +Once you have an I/O context, you can read/write objects, extended attributes, +and perform a number of other operations. After you complete operations, ensure +that you close the connection. For example: + +.. code-block:: python + :linenos: + + print "\nClosing the connection." + ioctx.close() + + +Writing, Reading and Removing Objects +------------------------------------- + +Once you create an I/O context, you can write objects to the cluster. 
If you +write to an object that doesn't exist, Ceph creates it. If you write to an +object that exists, Ceph overwrites it (except when you specify a range, and +then it only overwrites the range). You may read objects (and object ranges) +from the cluster. You may also remove objects from the cluster. For example: + +.. code-block:: python + :linenos: + :emphasize-lines: 2, 5, 8 + + print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'." + ioctx.write_full("hw", "Hello World!") + + print "\n\nContents of object 'hw'\n------------------------\n" + print ioctx.read("hw") + + print "\nRemoving object 'hw'" + ioctx.remove_object("hw") + + +Writing and Reading XATTRS +-------------------------- + +Once you create an object, you can write extended attributes (XATTRs) to +the object and read XATTRs from the object. For example: + +.. code-block:: python + :linenos: + :emphasize-lines: 2, 5 + + print "\n\nWriting XATTR 'lang' with value 'en_US' to object 'hw'" + ioctx.set_xattr("hw", "lang", "en_US") + + print "\n\nGetting XATTR 'lang' from object 'hw'\n" + print ioctx.get_xattr("hw", "lang") + + +Listing Objects +--------------- + +If you want to examine the list of objects in a pool, you may +retrieve the list of objects and iterate over them with the object iterator. +For example: + +.. code-block:: python + :linenos: + :emphasize-lines: 1, 6, 7 + + object_iterator = ioctx.list_objects() + + while True : + + try : + rados_object = object_iterator.next() + print "Object contents = " + rados_object.read() + + except StopIteration : + break + +The ``Object`` class provides a file-like interface to an object, allowing +you to read and write content and extended attributes. Object operations using +the I/O context provide additional functionality and asynchronous capabilities. + + +Cluster Handle API +================== + +The ``Rados`` class provides an interface into the Ceph Storage Daemon. + + +Configuration +------------- + +The ``Rados`` class provides methods for getting and setting configuration +values, reading the Ceph configuration file, and parsing arguments. You +do not need to be connected to the Ceph Storage Cluster to invoke the following +methods. See `Storage Cluster Configuration`_ for details on settings. + +.. currentmodule:: rados +.. automethod:: Rados.conf_get(option) +.. automethod:: Rados.conf_set(option, val) +.. automethod:: Rados.conf_read_file(path=None) +.. automethod:: Rados.conf_parse_argv(args) +.. automethod:: Rados.version() + + +Connection Management +--------------------- + +Once you configure your cluster handle, you may connect to the cluster, check +the cluster ``fsid``, retrieve cluster statistics, and disconnect (shutdown) +from the cluster. You may also assert that the cluster handle is in a particular +state (e.g., "configuring", "connecting", etc.). + + +.. automethod:: Rados.connect(timeout=0) +.. automethod:: Rados.shutdown() +.. automethod:: Rados.get_fsid() +.. automethod:: Rados.get_cluster_stats() +.. automethod:: Rados.require_state(*args) + + +Pool Operations +--------------- + +To use pool operation methods, you must connect to the Ceph Storage Cluster +first. You may list the available pools, create a pool, check to see if a pool +exists, and delete a pool. + +.. automethod:: Rados.list_pools() +.. automethod:: Rados.create_pool(pool_name, auid=None, crush_rule=None) +.. automethod:: Rados.pool_exists() +.. 
automethod:: Rados.delete_pool(pool_name) + + + +Input/Output Context API +======================== + +To write data to and read data from the Ceph Object Store, you must create +an Input/Output context (ioctx). The `Rados` class provides a `open_ioctx()` +method. The remaining ``ioctx`` operations involve invoking methods of the +`Ioctx` and other classes. + +.. automethod:: Rados.open_ioctx(ioctx_name) +.. automethod:: Ioctx.require_ioctx_open() +.. automethod:: Ioctx.get_stats() +.. automethod:: Ioctx.change_auid(auid) +.. automethod:: Ioctx.get_last_version() +.. automethod:: Ioctx.close() + + +.. Pool Snapshots +.. -------------- + +.. The Ceph Storage Cluster allows you to make a snapshot of a pool's state. +.. Whereas, basic pool operations only require a connection to the cluster, +.. snapshots require an I/O context. + +.. Ioctx.create_snap(self, snap_name) +.. Ioctx.list_snaps(self) +.. SnapIterator.next(self) +.. Snap.get_timestamp(self) +.. Ioctx.lookup_snap(self, snap_name) +.. Ioctx.remove_snap(self, snap_name) + +.. not published. This doesn't seem ready yet. + +Object Operations +----------------- + +The Ceph Storage Cluster stores data as objects. You can read and write objects +synchronously or asynchronously. You can read and write from offsets. An object +has a name (or key) and data. + + +.. automethod:: Ioctx.aio_write(object_name, to_write, offset=0, oncomplete=None, onsafe=None) +.. automethod:: Ioctx.aio_write_full(object_name, to_write, oncomplete=None, onsafe=None) +.. automethod:: Ioctx.aio_append(object_name, to_append, oncomplete=None, onsafe=None) +.. automethod:: Ioctx.write(key, data, offset=0) +.. automethod:: Ioctx.write_full(key, data) +.. automethod:: Ioctx.aio_flush() +.. automethod:: Ioctx.set_locator_key(loc_key) +.. automethod:: Ioctx.aio_read(object_name, length, offset, oncomplete) +.. automethod:: Ioctx.read(key, length=8192, offset=0) +.. automethod:: Ioctx.stat(key) +.. automethod:: Ioctx.trunc(key, size) +.. automethod:: Ioctx.remove_object(key) + + +Object Extended Attributes +-------------------------- + +You may set extended attributes (XATTRs) on an object. You can retrieve a list +of objects or XATTRs and iterate over them. + +.. automethod:: Ioctx.set_xattr(key, xattr_name, xattr_value) +.. automethod:: Ioctx.get_xattrs(oid) +.. automethod:: XattrIterator.next() +.. automethod:: Ioctx.get_xattr(key, xattr_name) +.. automethod:: Ioctx.rm_xattr(key, xattr_name) + + + +Object Interface +================ + +From an I/O context, you can retrieve a list of objects from a pool and iterate +over them. The object interface provide makes each object look like a file, and +you may perform synchronous operations on the objects. For asynchronous +operations, you should use the I/O context methods. + +.. automethod:: Ioctx.list_objects() +.. automethod:: ObjectIterator.next() +.. automethod:: Object.read(length = 1024*1024) +.. automethod:: Object.write(string_to_write) +.. automethod:: Object.get_xattrs() +.. automethod:: Object.get_xattr(xattr_name) +.. automethod:: Object.set_xattr(xattr_name, xattr_value) +.. automethod:: Object.rm_xattr(xattr_name) +.. automethod:: Object.stat() +.. automethod:: Object.remove() + + + + +.. _Getting Started: ../../../start +.. _Storage Cluster Configuration: ../../configuration +.. 
_Getting librados for Python: ../librados-intro#getting-librados-for-python diff --git a/src/ceph/doc/rados/command/list-inconsistent-obj.json b/src/ceph/doc/rados/command/list-inconsistent-obj.json new file mode 100644 index 0000000..76ca43e --- /dev/null +++ b/src/ceph/doc/rados/command/list-inconsistent-obj.json @@ -0,0 +1,195 @@ +{ + "$schema": "http://json-schema.org/draft-04/schema#", + "type": "object", + "properties": { + "epoch": { + "description": "Scrub epoch", + "type": "integer" + }, + "inconsistents": { + "type": "array", + "items": { + "type": "object", + "properties": { + "object": { + "description": "Identify a Ceph object", + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "nspace": { + "type": "string" + }, + "locator": { + "type": "string" + }, + "version": { + "type": "integer", + "minimum": 0 + }, + "snap": { + "oneOf": [ + { + "type": "string", + "enum": [ "head", "snapdir" ] + }, + { + "type": "integer", + "minimum": 0 + } + ] + } + }, + "required": [ + "name", + "nspace", + "locator", + "version", + "snap" + ] + }, + "selected_object_info": { + "type": "string" + }, + "union_shard_errors": { + "description": "Union of all shard errors", + "type": "array", + "items": { + "enum": [ + "missing", + "stat_error", + "read_error", + "data_digest_mismatch_oi", + "omap_digest_mismatch_oi", + "size_mismatch_oi", + "ec_hash_error", + "ec_size_error", + "oi_attr_missing", + "oi_attr_corrupted", + "obj_size_oi_mismatch", + "ss_attr_missing", + "ss_attr_corrupted" + ] + }, + "minItems": 0, + "uniqueItems": true + }, + "errors": { + "description": "Errors related to the analysis of this object", + "type": "array", + "items": { + "enum": [ + "object_info_inconsistency", + "data_digest_mismatch", + "omap_digest_mismatch", + "size_mismatch", + "attr_value_mismatch", + "attr_name_mismatch" + ] + }, + "minItems": 0, + "uniqueItems": true + }, + "shards": { + "description": "All found or expected shards", + "type": "array", + "items": { + "description": "Information about a particular shard of object", + "type": "object", + "properties": { + "object_info": { + "type": "string" + }, + "shard": { + "type": "integer" + }, + "osd": { + "type": "integer" + }, + "primary": { + "type": "boolean" + }, + "size": { + "type": "integer" + }, + "omap_digest": { + "description": "Hex representation (e.g. 0x1abd1234)", + "type": "string" + }, + "data_digest": { + "description": "Hex representation (e.g. 
0x1abd1234)", + "type": "string" + }, + "errors": { + "description": "Errors with this shard", + "type": "array", + "items": { + "enum": [ + "missing", + "stat_error", + "read_error", + "data_digest_mismatch_oi", + "omap_digest_mismatch_oi", + "size_mismatch_oi", + "ec_hash_error", + "ec_size_error", + "oi_attr_missing", + "oi_attr_corrupted", + "obj_size_oi_mismatch", + "ss_attr_missing", + "ss_attr_corrupted" + ] + }, + "minItems": 0, + "uniqueItems": true + }, + "attrs": { + "description": "If any shard's attr error is set then all attrs are here", + "type": "array", + "items": { + "description": "Information about a particular shard of object", + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "value": { + "type": "string" + }, + "Base64": { + "type": "boolean" + } + }, + "required": [ + "name", + "value", + "Base64" + ], + "additionalProperties": false, + "minItems": 1 + } + } + }, + "required": [ + "osd", + "primary", + "errors" + ] + } + } + }, + "required": [ + "object", + "union_shard_errors", + "errors", + "shards" + ] + } + } + }, + "required": [ + "epoch", + "inconsistents" + ] +} diff --git a/src/ceph/doc/rados/command/list-inconsistent-snap.json b/src/ceph/doc/rados/command/list-inconsistent-snap.json new file mode 100644 index 0000000..0da6b0f --- /dev/null +++ b/src/ceph/doc/rados/command/list-inconsistent-snap.json @@ -0,0 +1,87 @@ +{ + "$schema": "http://json-schema.org/draft-04/schema#", + "type": "object", + "properties": { + "epoch": { + "description": "Scrub epoch", + "type": "integer" + }, + "inconsistents": { + "type": "array", + "items": { + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "nspace": { + "type": "string" + }, + "locator": { + "type": "string" + }, + "snap": { + "oneOf": [ + { + "type": "string", + "enum": [ + "head", + "snapdir" + ] + }, + { + "type": "integer", + "minimum": 0 + } + ] + }, + "errors": { + "description": "Errors for this object's snap", + "type": "array", + "items": { + "enum": [ + "ss_attr_missing", + "ss_attr_corrupted", + "oi_attr_missing", + "oi_attr_corrupted", + "snapset_mismatch", + "head_mismatch", + "headless", + "size_mismatch", + "extra_clones", + "clone_missing" + ] + }, + "minItems": 1, + "uniqueItems": true + }, + "missing": { + "description": "List of missing clones if clone_missing error set", + "type": "array", + "items": { + "type": "integer" + } + }, + "extra_clones": { + "description": "List of extra clones if extra_clones error set", + "type": "array", + "items": { + "type": "integer" + } + } + }, + "required": [ + "name", + "nspace", + "locator", + "snap", + "errors" + ] + } + } + }, + "required": [ + "epoch", + "inconsistents" + ] +} diff --git a/src/ceph/doc/rados/configuration/auth-config-ref.rst b/src/ceph/doc/rados/configuration/auth-config-ref.rst new file mode 100644 index 0000000..eb14fa4 --- /dev/null +++ b/src/ceph/doc/rados/configuration/auth-config-ref.rst @@ -0,0 +1,432 @@ +======================== + Cephx Config Reference +======================== + +The ``cephx`` protocol is enabled by default. Cryptographic authentication has +some computational costs, though they should generally be quite low. If the +network environment connecting your client and server hosts is very safe and +you cannot afford authentication, you can turn it off. **This is not generally +recommended**. + +.. 
note:: If you disable authentication, you are at risk of a man-in-the-middle + attack altering your client/server messages, which could lead to disastrous + security effects. + +For creating users, see `User Management`_. For details on the architecture +of Cephx, see `Architecture - High Availability Authentication`_. + + +Deployment Scenarios +==================== + +There are two main scenarios for deploying a Ceph cluster, which impact +how you initially configure Cephx. Most first time Ceph users use +``ceph-deploy`` to create a cluster (easiest). For clusters using +other deployment tools (e.g., Chef, Juju, Puppet, etc.), you will need +to use the manual procedures or configure your deployment tool to +bootstrap your monitor(s). + +ceph-deploy +----------- + +When you deploy a cluster with ``ceph-deploy``, you do not have to bootstrap the +monitor manually or create the ``client.admin`` user or keyring. The steps you +execute in the `Storage Cluster Quick Start`_ will invoke ``ceph-deploy`` to do +that for you. + +When you execute ``ceph-deploy new {initial-monitor(s)}``, Ceph will create a +monitor keyring for you (only used to bootstrap monitors), and it will generate +an initial Ceph configuration file for you, which contains the following +authentication settings, indicating that Ceph enables authentication by +default:: + + auth_cluster_required = cephx + auth_service_required = cephx + auth_client_required = cephx + +When you execute ``ceph-deploy mon create-initial``, Ceph will bootstrap the +initial monitor(s), retrieve a ``ceph.client.admin.keyring`` file containing the +key for the ``client.admin`` user. Additionally, it will also retrieve keyrings +that give ``ceph-deploy`` and ``ceph-disk`` utilities the ability to prepare and +activate OSDs and metadata servers. + +When you execute ``ceph-deploy admin {node-name}`` (**note:** Ceph must be +installed first), you are pushing a Ceph configuration file and the +``ceph.client.admin.keyring`` to the ``/etc/ceph`` directory of the node. You +will be able to execute Ceph administrative functions as ``root`` on the command +line of that node. + + +Manual Deployment +----------------- + +When you deploy a cluster manually, you have to bootstrap the monitor manually +and create the ``client.admin`` user and keyring. To bootstrap monitors, follow +the steps in `Monitor Bootstrapping`_. The steps for monitor bootstrapping are +the logical steps you must perform when using third party deployment tools like +Chef, Puppet, Juju, etc. + + +Enabling/Disabling Cephx +======================== + +Enabling Cephx requires that you have deployed keys for your monitors, +OSDs and metadata servers. If you are simply toggling Cephx on / off, +you do not have to repeat the bootstrapping procedures. + + +Enabling Cephx +-------------- + +When ``cephx`` is enabled, Ceph will look for the keyring in the default search +path, which includes ``/etc/ceph/$cluster.$name.keyring``. You can override +this location by adding a ``keyring`` option in the ``[global]`` section of +your `Ceph configuration`_ file, but this is not recommended. + +Execute the following procedures to enable ``cephx`` on a cluster with +authentication disabled. If you (or your deployment utility) have already +generated the keys, you may skip the steps related to generating keys. + +#. 
Create a ``client.admin`` key, and save a copy of the key for your client + host:: + + ceph auth get-or-create client.admin mon 'allow *' mds 'allow *' osd 'allow *' -o /etc/ceph/ceph.client.admin.keyring + + **Warning:** This will clobber any existing + ``/etc/ceph/client.admin.keyring`` file. Do not perform this step if a + deployment tool has already done it for you. Be careful! + +#. Create a keyring for your monitor cluster and generate a monitor + secret key. :: + + ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *' + +#. Copy the monitor keyring into a ``ceph.mon.keyring`` file in every monitor's + ``mon data`` directory. For example, to copy it to ``mon.a`` in cluster ``ceph``, + use the following:: + + cp /tmp/ceph.mon.keyring /var/lib/ceph/mon/ceph-a/keyring + +#. Generate a secret key for every OSD, where ``{$id}`` is the OSD number:: + + ceph auth get-or-create osd.{$id} mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/osd/ceph-{$id}/keyring + +#. Generate a secret key for every MDS, where ``{$id}`` is the MDS letter:: + + ceph auth get-or-create mds.{$id} mon 'allow rwx' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mds/ceph-{$id}/keyring + +#. Enable ``cephx`` authentication by setting the following options in the + ``[global]`` section of your `Ceph configuration`_ file:: + + auth cluster required = cephx + auth service required = cephx + auth client required = cephx + + +#. Start or restart the Ceph cluster. See `Operating a Cluster`_ for details. + +For details on bootstrapping a monitor manually, see `Manual Deployment`_. + + + +Disabling Cephx +--------------- + +The following procedure describes how to disable Cephx. If your cluster +environment is relatively safe, you can offset the computation expense of +running authentication. **We do not recommend it.** However, it may be easier +during setup and/or troubleshooting to temporarily disable authentication. + +#. Disable ``cephx`` authentication by setting the following options in the + ``[global]`` section of your `Ceph configuration`_ file:: + + auth cluster required = none + auth service required = none + auth client required = none + + +#. Start or restart the Ceph cluster. See `Operating a Cluster`_ for details. + + +Configuration Settings +====================== + +Enablement +---------- + + +``auth cluster required`` + +:Description: If enabled, the Ceph Storage Cluster daemons (i.e., ``ceph-mon``, + ``ceph-osd``, and ``ceph-mds``) must authenticate with + each other. Valid settings are ``cephx`` or ``none``. + +:Type: String +:Required: No +:Default: ``cephx``. + + +``auth service required`` + +:Description: If enabled, the Ceph Storage Cluster daemons require Ceph Clients + to authenticate with the Ceph Storage Cluster in order to access + Ceph services. Valid settings are ``cephx`` or ``none``. + +:Type: String +:Required: No +:Default: ``cephx``. + + +``auth client required`` + +:Description: If enabled, the Ceph Client requires the Ceph Storage Cluster to + authenticate with the Ceph Client. Valid settings are ``cephx`` + or ``none``. + +:Type: String +:Required: No +:Default: ``cephx``. + + +.. index:: keys; keyring + +Keys +---- + +When you run Ceph with authentication enabled, ``ceph`` administrative commands +and Ceph Clients require authentication keys to access the Ceph Storage Cluster. + +The most common way to provide these keys to the ``ceph`` administrative +commands and clients is to include a Ceph keyring under the ``/etc/ceph`` +directory. 
For Cuttlefish and later releases using ``ceph-deploy``, the filename +is usually ``ceph.client.admin.keyring`` (or ``$cluster.client.admin.keyring``). +If you include the keyring under the ``/etc/ceph`` directory, you don't need to +specify a ``keyring`` entry in your Ceph configuration file. + +We recommend copying the Ceph Storage Cluster's keyring file to nodes where you +will run administrative commands, because it contains the ``client.admin`` key. + +You may use ``ceph-deploy admin`` to perform this task. See `Create an Admin +Host`_ for details. To perform this step manually, execute the following:: + + sudo scp {user}@{ceph-cluster-host}:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring + +.. tip:: Ensure the ``ceph.client.admin.keyring`` file has appropriate permissions set + (e.g., ``chmod 644``) on your client machine. + +You may specify the key itself in the Ceph configuration file using the ``key`` +setting (not recommended), or a path to a keyfile using the ``keyfile`` setting. + + +``keyring`` + +:Description: The path to the keyring file. +:Type: String +:Required: No +:Default: ``/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin`` + + +``keyfile`` + +:Description: The path to a key file (i.e., a file containing only the key). +:Type: String +:Required: No +:Default: None + + +``key`` + +:Description: The key (i.e., the text string of the key itself). Not recommended. +:Type: String +:Required: No +:Default: None + + +Daemon Keyrings +--------------- + +Administrative users or deployment tools (e.g., ``ceph-deploy``) may generate +daemon keyrings in the same way as generating user keyrings. By default, Ceph +stores daemon keyrings inside their data directory. The default keyring +locations, and the capabilities necessary for the daemon to function, are shown +below. + +``ceph-mon`` + +:Location: ``$mon_data/keyring`` +:Capabilities: ``mon 'allow *'`` + +``ceph-osd`` + +:Location: ``$osd_data/keyring`` +:Capabilities: ``mon 'allow profile osd' osd 'allow *'`` + +``ceph-mds`` + +:Location: ``$mds_data/keyring`` +:Capabilities: ``mds 'allow' mon 'allow profile mds' osd 'allow rwx'`` + +``radosgw`` + +:Location: ``$rgw_data/keyring`` +:Capabilities: ``mon 'allow rwx' osd 'allow rwx'`` + + +.. note:: The monitor keyring (i.e., ``mon.``) contains a key but no + capabilities, and is not part of the cluster ``auth`` database. + +The daemon data directory locations default to directories of the form:: + + /var/lib/ceph/$type/$cluster-$id + +For example, ``osd.12`` would be:: + + /var/lib/ceph/osd/ceph-12 + +You can override these locations, but it is not recommended. + + +.. index:: signatures + +Signatures +---------- + +In Ceph Bobtail and subsequent versions, we prefer that Ceph authenticate all +ongoing messages between the entities using the session key set up for that +initial authentication. However, Argonaut and earlier Ceph daemons do not know +how to perform ongoing message authentication. To maintain backward +compatibility (e.g., running both Bobtail and Argonaut daemons in the same +cluster), message signing is **off** by default. If you are running Bobtail or +later daemons exclusively, configure Ceph to require signatures. + +Like other parts of Ceph authentication, Ceph provides fine-grained control so +you can enable/disable signatures for service messages between the client and +Ceph, and you can enable/disable signatures for messages between Ceph daemons. 
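One way to confirm what a running daemon is actually using for the settings
described below is to query its admin socket; ``osd.0`` here is only an example
daemon name::

    ceph daemon osd.0 config get cephx_require_signatures
    ceph daemon osd.0 config get cephx_sign_messages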
+ + +``cephx require signatures`` + +:Description: If set to ``true``, Ceph requires signatures on all message + traffic between the Ceph Client and the Ceph Storage Cluster, and + between daemons comprising the Ceph Storage Cluster. + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``cephx cluster require signatures`` + +:Description: If set to ``true``, Ceph requires signatures on all message + traffic between Ceph daemons comprising the Ceph Storage Cluster. + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``cephx service require signatures`` + +:Description: If set to ``true``, Ceph requires signatures on all message + traffic between Ceph Clients and the Ceph Storage Cluster. + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``cephx sign messages`` + +:Description: If the Ceph version supports message signing, Ceph will sign + all messages so they cannot be spoofed. + +:Type: Boolean +:Default: ``true`` + + +Time to Live +------------ + +``auth service ticket ttl`` + +:Description: When the Ceph Storage Cluster sends a Ceph Client a ticket for + authentication, the Ceph Storage Cluster assigns the ticket a + time to live. + +:Type: Double +:Default: ``60*60`` + + +Backward Compatibility +====================== + +For Cuttlefish and earlier releases, see `Cephx`_. + +In Ceph Argonaut v0.48 and earlier versions, if you enable ``cephx`` +authentication, Ceph only authenticates the initial communication between the +client and daemon; Ceph does not authenticate the subsequent messages they send +to each other, which has security implications. In Ceph Bobtail and subsequent +versions, Ceph authenticates all ongoing messages between the entities using the +session key set up for that initial authentication. + +We identified a backward compatibility issue between Argonaut v0.48 (and prior +versions) and Bobtail (and subsequent versions). During testing, if you +attempted to use Argonaut (and earlier) daemons with Bobtail (and later) +daemons, the Argonaut daemons did not know how to perform ongoing message +authentication, while the Bobtail versions of the daemons insist on +authenticating message traffic subsequent to the initial +request/response--making it impossible for Argonaut (and prior) daemons to +interoperate with Bobtail (and subsequent) daemons. + +We have addressed this potential problem by providing a means for Argonaut (and +prior) systems to interact with Bobtail (and subsequent) systems. Here's how it +works: by default, the newer systems will not insist on seeing signatures from +older systems that do not know how to perform them, but will simply accept such +messages without authenticating them. This new default behavior provides the +advantage of allowing two different releases to interact. **We do not recommend +this as a long term solution**. Allowing newer daemons to forgo ongoing +authentication has the unfortunate security effect that an attacker with control +of some of your machines or some access to your network can disable session +security simply by claiming to be unable to sign messages. + +.. note:: Even if you don't actually run any old versions of Ceph, + the attacker may be able to force some messages to be accepted unsigned in the + default scenario. While running Cephx with the default scenario, Ceph still + authenticates the initial communication, but you lose desirable session security. 
+ +If you know that you are not running older versions of Ceph, or you are willing +to accept that old servers and new servers will not be able to interoperate, you +can eliminate this security risk. If you do so, any Ceph system that is new +enough to support session authentication and that has Cephx enabled will reject +unsigned messages. To preclude new servers from interacting with old servers, +include the following in the ``[global]`` section of your `Ceph +configuration`_ file directly below the line that specifies the use of Cephx +for authentication:: + + cephx require signatures = true ; everywhere possible + +You can also selectively require signatures for cluster internal +communications only, separate from client-facing service:: + + cephx cluster require signatures = true ; for cluster-internal communication + cephx service require signatures = true ; for client-facing service + +An option to make a client require signatures from the cluster is not +yet implemented. + +**We recommend migrating all daemons to the newer versions and enabling the +foregoing flag** at the nearest practical time so that you may avail yourself +of the enhanced authentication. + +.. note:: Ceph kernel modules do not support signatures yet. + + +.. _Storage Cluster Quick Start: ../../../start/quick-ceph-deploy/ +.. _Monitor Bootstrapping: ../../../install/manual-deployment#monitor-bootstrapping +.. _Operating a Cluster: ../../operations/operating +.. _Manual Deployment: ../../../install/manual-deployment +.. _Cephx: http://docs.ceph.com/docs/cuttlefish/rados/configuration/auth-config-ref/ +.. _Ceph configuration: ../ceph-conf +.. _Create an Admin Host: ../../deployment/ceph-deploy-admin +.. _Architecture - High Availability Authentication: ../../../architecture#high-availability-authentication +.. _User Management: ../../operations/user-management diff --git a/src/ceph/doc/rados/configuration/bluestore-config-ref.rst b/src/ceph/doc/rados/configuration/bluestore-config-ref.rst new file mode 100644 index 0000000..8d8ace6 --- /dev/null +++ b/src/ceph/doc/rados/configuration/bluestore-config-ref.rst @@ -0,0 +1,297 @@ +========================== +BlueStore Config Reference +========================== + +Devices +======= + +BlueStore manages either one, two, or (in certain cases) three storage +devices. + +In the simplest case, BlueStore consumes a single (primary) storage +device. The storage device is normally partitioned into two parts: + +#. A small partition is formatted with XFS and contains basic metadata + for the OSD. This *data directory* includes information about the + OSD (its identifier, which cluster it belongs to, and its private + keyring). + +#. The rest of the device is normally a large partition, managed directly + by BlueStore, that contains all of the actual data. This *primary device* + is normally identified by a ``block`` symlink in the data directory. + +It is also possible to deploy BlueStore across two additional devices: + +* A *WAL device* can be used for BlueStore's internal journal or + write-ahead log. It is identified by the ``block.wal`` symlink in + the data directory. It is only useful to use a WAL device if the + device is faster than the primary device (e.g., when it is on an SSD + and the primary device is an HDD). +* A *DB device* can be used for storing BlueStore's internal metadata. + BlueStore (or rather, the embedded RocksDB) will put as much + metadata as it can on the DB device to improve performance. If the + DB device fills up, metadata will spill back onto the primary device + (where it would have been otherwise). Again, it is only helpful to + provision a DB device if it is faster than the primary device. + +If there is only a small amount of fast storage available (e.g., less +than a gigabyte), we recommend using it as a WAL device. If there is +more, provisioning a DB device makes more sense. The BlueStore +journal will always be placed on the fastest device available, so +using a DB device will provide the same benefit that the WAL device +would while *also* allowing additional metadata to be stored there (if +it will fit). + +A single-device BlueStore OSD can be provisioned with:: + + ceph-disk prepare --bluestore <device> + +To specify a WAL device and/or DB device, :: + + ceph-disk prepare --bluestore <device> --block.wal <wal-device> --block.db <db-device>
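For example, a minimal sketch in which the object data lives on a rotational
drive and the RocksDB metadata goes to a faster SSD partition (the device names
here are assumptions, not recommendations)::

    ceph-disk prepare --bluestore /dev/sdb --block.db /dev/nvme0n1p1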
Cache size +========== + +The amount of memory consumed by each OSD for BlueStore's cache is +determined by the ``bluestore_cache_size`` configuration option. If +that config option is not set (i.e., remains at 0), there is a +different default value that is used depending on whether an HDD or +SSD is used for the primary device (set by the +``bluestore_cache_size_ssd`` and ``bluestore_cache_size_hdd`` config +options). + +BlueStore and the rest of the Ceph OSD do the best they can currently +to stick to the budgeted memory. Note that on top of the configured +cache size, there is also memory consumed by the OSD itself, and +generally some overhead due to memory fragmentation and other +allocator overhead. + +The configured cache memory budget can be used in a few different ways: + +* Key/Value metadata (i.e., RocksDB's internal cache) +* BlueStore metadata +* BlueStore data (i.e., recently read or written object data) + +Cache memory usage is governed by the following options: +``bluestore_cache_meta_ratio``, ``bluestore_cache_kv_ratio``, and +``bluestore_cache_kv_max``. The fraction of the cache devoted to data +is 1.0 minus the meta and kv ratios. The memory devoted to kv +metadata (the RocksDB cache) is capped by ``bluestore_cache_kv_max`` +since our testing indicates there are diminishing returns beyond a +certain point. + +``bluestore_cache_size`` + +:Description: The amount of memory BlueStore will use for its cache. If zero, ``bluestore_cache_size_hdd`` or ``bluestore_cache_size_ssd`` will be used instead. +:Type: Integer +:Required: Yes +:Default: ``0`` + +``bluestore_cache_size_hdd`` + +:Description: The default amount of memory BlueStore will use for its cache when backed by an HDD. +:Type: Integer +:Required: Yes +:Default: ``1 * 1024 * 1024 * 1024`` (1 GB) + +``bluestore_cache_size_ssd`` + +:Description: The default amount of memory BlueStore will use for its cache when backed by an SSD. +:Type: Integer +:Required: Yes +:Default: ``3 * 1024 * 1024 * 1024`` (3 GB) + +``bluestore_cache_meta_ratio`` + +:Description: The ratio of cache devoted to metadata. +:Type: Floating point +:Required: Yes +:Default: ``.01`` + +``bluestore_cache_kv_ratio`` + +:Description: The ratio of cache devoted to key/value data (rocksdb). +:Type: Floating point +:Required: Yes +:Default: ``.99`` + +``bluestore_cache_kv_max`` + +:Description: The maximum amount of cache devoted to key/value data (rocksdb). +:Type: Floating point +:Required: Yes +:Default: ``512 * 1024 * 1024`` (512 MB)
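As an illustration only (the 4 GB budget below is an assumption, not a
recommendation, and the ratio values shown are simply the defaults listed
above), these options might appear in the ``[osd]`` section of ``ceph.conf``::

    [osd]
    bluestore cache size = 4294967296
    bluestore cache meta ratio = .01
    bluestore cache kv ratio = .99
    bluestore cache kv max = 536870912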
Checksums +========= + +BlueStore checksums all metadata and data written to disk. Metadata +checksumming is handled by RocksDB and uses `crc32c`. Data +checksumming is done by BlueStore and can make use of `crc32c`, +`xxhash32`, or `xxhash64`. The default is `crc32c` and should be +suitable for most purposes. + +Full data checksumming does increase the amount of metadata that +BlueStore must store and manage. When possible, e.g., when clients +hint that data is written and read sequentially, BlueStore will +checksum larger blocks, but in many cases it must store a checksum +value (usually 4 bytes) for every 4 kilobyte block of data. + +It is possible to use a smaller checksum value by truncating the +checksum to two or one byte, reducing the metadata overhead. The +trade-off is that the probability that a random error will not be +detected is higher with a smaller checksum, going from about one in +four billion with a 32-bit (4 byte) checksum to one in 65,536 for a +16-bit (2 byte) checksum or one in 256 for an 8-bit (1 byte) checksum. +The smaller checksum values can be used by selecting `crc32c_16` or +`crc32c_8` as the checksum algorithm. + +The *checksum algorithm* can be set either via a per-pool +``csum_type`` property or the global config option. For example, :: + + ceph osd pool set <pool-name> csum_type <algorithm> + +``bluestore_csum_type`` + +:Description: The default checksum algorithm to use. +:Type: String +:Required: Yes +:Valid Settings: ``none``, ``crc32c``, ``crc32c_16``, ``crc32c_8``, ``xxhash32``, ``xxhash64`` +:Default: ``crc32c`` + + +Inline Compression +================== + +BlueStore supports inline compression using `snappy`, `zlib`, or +`lz4`. Please note that the `lz4` compression plugin is not +distributed in the official release. + +Whether data in BlueStore is compressed is determined by a combination +of the *compression mode* and any hints associated with a write +operation. The modes are: + +* **none**: Never compress data. +* **passive**: Do not compress data unless the write operation has a + *compressible* hint set. +* **aggressive**: Compress data unless the write operation has an + *incompressible* hint set. +* **force**: Try to compress data no matter what. + +For more information about the *compressible* and *incompressible* IO +hints, see :doc:`/api/librados/#rados_set_alloc_hint`. + +Note that regardless of the mode, if the size of the data chunk is not +reduced sufficiently it will not be used and the original +(uncompressed) data will be stored. For example, if the ``bluestore +compression required ratio`` is set to ``.7`` then the compressed data +must be 70% of the size of the original (or smaller). + +The *compression mode*, *compression algorithm*, *compression required +ratio*, *min blob size*, and *max blob size* can be set either via a +per-pool property or a global config option. Pool properties can be +set with:: + + ceph osd pool set <pool-name> compression_algorithm <algorithm> + ceph osd pool set <pool-name> compression_mode <mode> + ceph osd pool set <pool-name> compression_required_ratio <ratio> + ceph osd pool set <pool-name> compression_min_blob_size <size> + ceph osd pool set <pool-name> compression_max_blob_size <size> + +``bluestore compression algorithm`` + +:Description: The default compressor to use (if any) if the per-pool property + ``compression_algorithm`` is not set. Note that zstd is *not* + recommended for bluestore due to high CPU overhead when + compressing small amounts of data. 
+:Type: String +:Required: No +:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd`` +:Default: ``snappy`` + +``bluestore compression mode`` + +:Description: The default policy for using compression if the per-pool property + ``compression_mode`` is not set. ``none`` means never use + compression. ``passive`` means use compression when + `clients hint`_ that data is compressible. ``aggressive`` means + use compression unless clients hint that data is not compressible. + ``force`` means use compression under all circumstances even if + the clients hint that the data is not compressible. +:Type: String +:Required: No +:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force`` +:Default: ``none`` + +``bluestore compression required ratio`` + +:Description: The ratio of the size of the data chunk after + compression relative to the original size must be at + least this small in order to store the compressed + version. + +:Type: Floating point +:Required: No +:Default: .875 + +``bluestore compression min blob size`` + +:Description: Chunks smaller than this are never compressed. + The per-pool property ``compression_min_blob_size`` overrides + this setting. + +:Type: Unsigned Integer +:Required: No +:Default: 0 + +``bluestore compression min blob size hdd`` + +:Description: Default value of ``bluestore compression min blob size`` + for rotational media. + +:Type: Unsigned Integer +:Required: No +:Default: 128K + +``bluestore compression min blob size ssd`` + +:Description: Default value of ``bluestore compression min blob size`` + for non-rotational (solid state) media. + +:Type: Unsigned Integer +:Required: No +:Default: 8K + +``bluestore compression max blob size`` + +:Description: Chunks larger than this are broken into smaller blobs sizing + ``bluestore compression max blob size`` before being compressed. + The per-pool property ``compression_max_blob_size`` overrides + this setting. + +:Type: Unsigned Integer +:Required: No +:Default: 0 + +``bluestore compression max blob size hdd`` + +:Description: Default value of ``bluestore compression max blob size`` + for rotational media. + +:Type: Unsigned Integer +:Required: No +:Default: 512K + +``bluestore compression max blob size ssd`` + +:Description: Default value of ``bluestore compression max blob size`` + for non-rotational (solid state) media. + +:Type: Unsigned Integer +:Required: No +:Default: 64K + +.. _clients hint: ../../api/librados/#rados_set_alloc_hint diff --git a/src/ceph/doc/rados/configuration/ceph-conf.rst b/src/ceph/doc/rados/configuration/ceph-conf.rst new file mode 100644 index 0000000..df88452 --- /dev/null +++ b/src/ceph/doc/rados/configuration/ceph-conf.rst @@ -0,0 +1,629 @@ +================== + Configuring Ceph +================== + +When you start the Ceph service, the initialization process activates a series +of daemons that run in the background. A :term:`Ceph Storage Cluster` runs +two types of daemons: + +- :term:`Ceph Monitor` (``ceph-mon``) +- :term:`Ceph OSD Daemon` (``ceph-osd``) + +Ceph Storage Clusters that support the :term:`Ceph Filesystem` run at least one +:term:`Ceph Metadata Server` (``ceph-mds``). Clusters that support :term:`Ceph +Object Storage` run Ceph Gateway daemons (``radosgw``). For your convenience, +each daemon has a series of default values (*i.e.*, many are set by +``ceph/src/common/config_opts.h``). You may override these settings with a Ceph +configuration file. + + +.. 
_ceph-conf-file: + +The Configuration File +====================== + +When you start a Ceph Storage Cluster, each daemon looks for a Ceph +configuration file (i.e., ``ceph.conf`` by default) that provides the cluster's +configuration settings. For manual deployments, you need to create a Ceph +configuration file. For tools that create configuration files for you (*e.g.*, +``ceph-deploy``, Chef, etc.), you may use the information contained herein as a +reference. The Ceph configuration file defines: + +- Cluster Identity +- Authentication settings +- Cluster membership +- Host names +- Host addresses +- Paths to keyrings +- Paths to journals +- Paths to data +- Other runtime options + +The default Ceph configuration file locations in sequential order include: + +#. ``$CEPH_CONF`` (*i.e.,* the path following the ``$CEPH_CONF`` + environment variable) +#. ``-c path/path`` (*i.e.,* the ``-c`` command line argument) +#. ``/etc/ceph/ceph.conf`` +#. ``~/.ceph/config`` +#. ``./ceph.conf`` (*i.e.,* in the current working directory) + + +The Ceph configuration file uses an *ini* style syntax. You can add comments +by preceding comments with a pound sign (#) or a semi-colon (;). For example: + +.. code-block:: ini + + # <--A number (#) sign precedes a comment. + ; A comment may be anything. + # Comments always follow a semi-colon (;) or a pound (#) on each line. + # The end of the line terminates a comment. + # We recommend that you provide comments in your configuration file(s). + + +.. _ceph-conf-settings: + +Config Sections +=============== + +The configuration file can configure all Ceph daemons in a Ceph Storage Cluster, +or all Ceph daemons of a particular type. To configure a series of daemons, the +settings must be included under the processes that will receive the +configuration as follows: + +``[global]`` + +:Description: Settings under ``[global]`` affect all daemons in a Ceph Storage + Cluster. + +:Example: ``auth supported = cephx`` + +``[osd]`` + +:Description: Settings under ``[osd]`` affect all ``ceph-osd`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``[global]``. + +:Example: ``osd journal size = 1000`` + +``[mon]`` + +:Description: Settings under ``[mon]`` affect all ``ceph-mon`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``[global]``. + +:Example: ``mon addr = 10.0.0.101:6789`` + + +``[mds]`` + +:Description: Settings under ``[mds]`` affect all ``ceph-mds`` daemons in + the Ceph Storage Cluster, and override the same setting in + ``[global]``. + +:Example: ``host = myserver01`` + +``[client]`` + +:Description: Settings under ``[client]`` affect all Ceph Clients + (e.g., mounted Ceph Filesystems, mounted Ceph Block Devices, + etc.). + +:Example: ``log file = /var/log/ceph/radosgw.log`` + + +Global settings affect all instances of all daemon in the Ceph Storage Cluster. +Use the ``[global]`` setting for values that are common for all daemons in the +Ceph Storage Cluster. You can override each ``[global]`` setting by: + +#. Changing the setting in a particular process type + (*e.g.,* ``[osd]``, ``[mon]``, ``[mds]`` ). + +#. Changing the setting in a particular process (*e.g.,* ``[osd.1]`` ). + +Overriding a global setting affects all child processes, except those that +you specifically override in a particular daemon. + +A typical global setting involves activating authentication. For example: + +.. code-block:: ini + + [global] + #Enable authentication between hosts within the cluster. 
+ #v 0.54 and earlier + auth supported = cephx + + #v 0.55 and after + auth cluster required = cephx + auth service required = cephx + auth client required = cephx + + +You can specify settings that apply to a particular type of daemon. When you +specify settings under ``[osd]``, ``[mon]`` or ``[mds]`` without specifying a +particular instance, the setting will apply to all OSDs, monitors or metadata +daemons respectively. + +A typical daemon-wide setting involves setting journal sizes, filestore +settings, etc. For example: + +.. code-block:: ini + + [osd] + osd journal size = 1000 + + +You may specify settings for particular instances of a daemon. You may specify +an instance by entering its type, delimited by a period (.) and by the instance +ID. The instance ID for a Ceph OSD Daemon is always numeric, but it may be +alphanumeric for Ceph Monitors and Ceph Metadata Servers. + +.. code-block:: ini + + [osd.1] + # settings affect osd.1 only. + + [mon.a] + # settings affect mon.a only. + + [mds.b] + # settings affect mds.b only. + + +If the daemon you specify is a Ceph Gateway client, specify the daemon and the +instance, delimited by a period (.). For example:: + + [client.radosgw.instance-name] + # settings affect client.radosgw.instance-name only. + + + +.. _ceph-metavariables: + +Metavariables +============= + +Metavariables simplify Ceph Storage Cluster configuration dramatically. When a +metavariable is set in a configuration value, Ceph expands the metavariable into +a concrete value. Metavariables are very powerful when used within the +``[global]``, ``[osd]``, ``[mon]``, ``[mds]`` or ``[client]`` sections of your +configuration file. Ceph metavariables are similar to Bash shell expansion. + +Ceph supports the following metavariables: + + +``$cluster`` + +:Description: Expands to the Ceph Storage Cluster name. Useful when running + multiple Ceph Storage Clusters on the same hardware. + +:Example: ``/etc/ceph/$cluster.keyring`` +:Default: ``ceph`` + + +``$type`` + +:Description: Expands to one of ``mds``, ``osd``, or ``mon``, depending on the + type of the instant daemon. + +:Example: ``/var/lib/ceph/$type`` + + +``$id`` + +:Description: Expands to the daemon identifier. For ``osd.0``, this would be + ``0``; for ``mds.a``, it would be ``a``. + +:Example: ``/var/lib/ceph/$type/$cluster-$id`` + + +``$host`` + +:Description: Expands to the host name of the instant daemon. + + +``$name`` + +:Description: Expands to ``$type.$id``. +:Example: ``/var/run/ceph/$cluster-$name.asok`` + +``$pid`` + +:Description: Expands to daemon pid. +:Example: ``/var/run/ceph/$cluster-$name-$pid.asok`` + + +.. _ceph-conf-common-settings: + +Common Settings +=============== + +The `Hardware Recommendations`_ section provides some hardware guidelines for +configuring a Ceph Storage Cluster. It is possible for a single :term:`Ceph +Node` to run multiple daemons. For example, a single node with multiple drives +may run one ``ceph-osd`` for each drive. Ideally, you will have a node for a +particular type of process. For example, some nodes may run ``ceph-osd`` +daemons, other nodes may run ``ceph-mds`` daemons, and still other nodes may +run ``ceph-mon`` daemons. + +Each node has a name identified by the ``host`` setting. Monitors also specify +a network address and port (i.e., domain name or IP address) identified by the +``addr`` setting. A basic configuration file will typically specify only +minimal settings for each instance of monitor daemons. For example: + +.. 
code-block:: ini + + [global] + mon_initial_members = ceph1 + mon_host = 10.0.0.1 + + +.. important:: The ``host`` setting is the short name of the node (i.e., not + an fqdn). It is **NOT** an IP address either. Enter ``hostname -s`` on + the command line to retrieve the name of the node. Do not use ``host`` + settings for anything other than initial monitors unless you are deploying + Ceph manually. You **MUST NOT** specify ``host`` under individual daemons + when using deployment tools like ``chef`` or ``ceph-deploy``, as those tools + will enter the appropriate values for you in the cluster map. + + +.. _ceph-network-config: + +Networks +======== + +See the `Network Configuration Reference`_ for a detailed discussion about +configuring a network for use with Ceph. + + +Monitors +======== + +Ceph production clusters typically deploy with a minimum 3 :term:`Ceph Monitor` +daemons to ensure high availability should a monitor instance crash. At least +three (3) monitors ensures that the Paxos algorithm can determine which version +of the :term:`Ceph Cluster Map` is the most recent from a majority of Ceph +Monitors in the quorum. + +.. note:: You may deploy Ceph with a single monitor, but if the instance fails, + the lack of other monitors may interrupt data service availability. + +Ceph Monitors typically listen on port ``6789``. For example: + +.. code-block:: ini + + [mon.a] + host = hostName + mon addr = 150.140.130.120:6789 + +By default, Ceph expects that you will store a monitor's data under the +following path:: + + /var/lib/ceph/mon/$cluster-$id + +You or a deployment tool (e.g., ``ceph-deploy``) must create the corresponding +directory. With metavariables fully expressed and a cluster named "ceph", the +foregoing directory would evaluate to:: + + /var/lib/ceph/mon/ceph-a + +For additional details, see the `Monitor Config Reference`_. + +.. _Monitor Config Reference: ../mon-config-ref + + +.. _ceph-osd-config: + + +Authentication +============== + +.. versionadded:: Bobtail 0.56 + +For Bobtail (v 0.56) and beyond, you should expressly enable or disable +authentication in the ``[global]`` section of your Ceph configuration file. :: + + auth cluster required = cephx + auth service required = cephx + auth client required = cephx + +Additionally, you should enable message signing. See `Cephx Config Reference`_ for details. + +.. important:: When upgrading, we recommend expressly disabling authentication + first, then perform the upgrade. Once the upgrade is complete, re-enable + authentication. + +.. _Cephx Config Reference: ../auth-config-ref + + +.. _ceph-monitor-config: + + +OSDs +==== + +Ceph production clusters typically deploy :term:`Ceph OSD Daemons` where one node +has one OSD daemon running a filestore on one storage drive. A typical +deployment specifies a journal size. For example: + +.. code-block:: ini + + [osd] + osd journal size = 10000 + + [osd.0] + host = {hostname} #manual deployments only. + + +By default, Ceph expects that you will store a Ceph OSD Daemon's data with the +following path:: + + /var/lib/ceph/osd/$cluster-$id + +You or a deployment tool (e.g., ``ceph-deploy``) must create the corresponding +directory. With metavariables fully expressed and a cluster named "ceph", the +foregoing directory would evaluate to:: + + /var/lib/ceph/osd/ceph-0 + +You may override this path using the ``osd data`` setting. We don't recommend +changing the default location. Create the default directory on your OSD host. 
+ +:: + + ssh {osd-host} + sudo mkdir /var/lib/ceph/osd/ceph-{osd-number} + +The ``osd data`` path ideally leads to a mount point with a hard disk that is +separate from the hard disk storing and running the operating system and +daemons. If the OSD is for a disk other than the OS disk, prepare it for +use with Ceph, and mount it to the directory you just created:: + + ssh {new-osd-host} + sudo mkfs -t {fstype} /dev/{disk} + sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number} + +We recommend using the ``xfs`` file system when running +:command:`mkfs`. (``btrfs`` and ``ext4`` are not recommended and no +longer tested.) + +See the `OSD Config Reference`_ for additional configuration details. + + +Heartbeats +========== + +During runtime operations, Ceph OSD Daemons check up on other Ceph OSD Daemons +and report their findings to the Ceph Monitor. You do not have to provide any +settings. However, if you have network latency issues, you may wish to modify +the settings. + +See `Configuring Monitor/OSD Interaction`_ for additional details. + + +.. _ceph-logging-and-debugging: + +Logs / Debugging +================ + +Sometimes you may encounter issues with Ceph that require +modifying logging output and using Ceph's debugging. See `Debugging and +Logging`_ for details on log rotation. + +.. _Debugging and Logging: ../../troubleshooting/log-and-debug + + +Example ceph.conf +================= + +.. literalinclude:: demo-ceph.conf + :language: ini + +.. _ceph-runtime-config: + +Runtime Changes +=============== + +Ceph allows you to make changes to the configuration of a ``ceph-osd``, +``ceph-mon``, or ``ceph-mds`` daemon at runtime. This capability is quite +useful for increasing/decreasing logging output, enabling/disabling debug +settings, and even for runtime optimization. The following reflects runtime +configuration usage:: + + ceph tell {daemon-type}.{id or *} injectargs --{name} {value} [--{name} {value}] + +Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply +the runtime setting to all daemons of a particular type with ``*``, or specify +a specific daemon's ID (i.e., its number or letter). For example, to increase +debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following:: + + ceph tell osd.0 injectargs --debug-osd 20 --debug-ms 1 + +In your ``ceph.conf`` file, you may use spaces when specifying a +setting name. When specifying a setting name on the command line, +ensure that you use an underscore or hyphen (``_`` or ``-``) between +terms (e.g., ``debug osd`` becomes ``--debug-osd``). + + +Viewing a Configuration at Runtime +================================== + +If your Ceph Storage Cluster is running, and you would like to see the +configuration settings from a running daemon, execute the following:: + + ceph daemon {daemon-type}.{id} config show | less + +If you are on a machine where osd.0 is running, the command would be:: + + ceph daemon osd.0 config show | less + +Reading Configuration Metadata at Runtime +========================================= + +Information about the available configuration options is available via +the ``config help`` command: + +:: + + ceph daemon {daemon-type}.{id} config help | less + + +This metadata is primarily intended to be used when integrating other +software with Ceph, such as graphical user interfaces. 
The output is +a list of JSON objects, for example: + +:: + + { + "name": "mon_host", + "type": "std::string", + "level": "basic", + "desc": "list of hosts or addresses to search for a monitor", + "long_desc": "This is a comma, whitespace, or semicolon separated list of IP addresses or hostnames. Hostnames are resolved via DNS and all A or AAAA records are included in the search list.", + "default": "", + "daemon_default": "", + "tags": [], + "services": [ + "common" + ], + "see_also": [], + "enum_values": [], + "min": "", + "max": "" + } + +type +____ + +The type of the setting, given as a C++ type name. + +level +_____ + +One of `basic`, `advanced`, `dev`. The `dev` options are not intended +for use outside of development and testing. + +desc +____ + +A short description -- this is a sentence fragment suitable for display +in small spaces like a single line in a list. + +long_desc +_________ + +A full description of what the setting does, this may be as long as needed. + +default +_______ + +The default value, if any. + +daemon_default +______________ + +An alternative default used for daemons (services) as opposed to clients. + +tags +____ + +A list of strings indicating topics to which this setting relates. Examples +of tags are `performance` and `networking`. + +services +________ + +A list of strings indicating which Ceph services the setting relates to, such +as `osd`, `mds`, `mon`. For settings that are relevant to any Ceph client +or server, `common` is used. + +see_also +________ + +A list of strings indicating other configuration options that may also +be of interest to a user setting this option. + +enum_values +___________ + +Optional: a list of strings indicating the valid settings. + +min, max +________ + +Optional: upper and lower (inclusive) bounds on valid settings. + + + + +Running Multiple Clusters +========================= + +With Ceph, you can run multiple Ceph Storage Clusters on the same hardware. +Running multiple clusters provides a higher level of isolation compared to +using different pools on the same cluster with different CRUSH rulesets. A +separate cluster will have separate monitor, OSD and metadata server processes. +When running Ceph with default settings, the default cluster name is ``ceph``, +which means you would save your Ceph configuration file with the file name +``ceph.conf`` in the ``/etc/ceph`` default directory. + +See `ceph-deploy new`_ for details. +.. _ceph-deploy new:../ceph-deploy-new + +When you run multiple clusters, you must name your cluster and save the Ceph +configuration file with the name of the cluster. For example, a cluster named +``openstack`` will have a Ceph configuration file with the file name +``openstack.conf`` in the ``/etc/ceph`` default directory. + +.. important:: Cluster names must consist of letters a-z and digits 0-9 only. + +Separate clusters imply separate data disks and journals, which are not shared +between clusters. Referring to `Metavariables`_, the ``$cluster`` metavariable +evaluates to the cluster name (i.e., ``openstack`` in the foregoing example). +Various settings use the ``$cluster`` metavariable, including: + +- ``keyring`` +- ``admin socket`` +- ``log file`` +- ``pid file`` +- ``mon data`` +- ``mon cluster log file`` +- ``osd data`` +- ``osd journal`` +- ``mds data`` +- ``rgw data`` + +See `General Settings`_, `OSD Settings`_, `Monitor Settings`_, `MDS Settings`_, +`RGW Settings`_ and `Log Settings`_ for relevant path defaults that use the +``$cluster`` metavariable. + +.. 
_General Settings: ../general-config-ref +.. _OSD Settings: ../osd-config-ref +.. _Monitor Settings: ../mon-config-ref +.. _MDS Settings: ../../../cephfs/mds-config-ref +.. _RGW Settings: ../../../radosgw/config-ref/ +.. _Log Settings: ../../troubleshooting/log-and-debug + + +When creating default directories or files, you should use the cluster +name at the appropriate places in the path. For example:: + + sudo mkdir /var/lib/ceph/osd/openstack-0 + sudo mkdir /var/lib/ceph/mon/openstack-a + +.. important:: When running monitors on the same host, you should use + different ports. By default, monitors use port 6789. If you already + have monitors using port 6789, use a different port for your other cluster(s). + +To invoke a cluster other than the default ``ceph`` cluster, use the +``-c {filename}.conf`` option with the ``ceph`` command. For example:: + + ceph -c {cluster-name}.conf health + ceph -c openstack.conf health + + +.. _Hardware Recommendations: ../../../start/hardware-recommendations +.. _Network Configuration Reference: ../network-config-ref +.. _OSD Config Reference: ../osd-config-ref +.. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction +.. _ceph-deploy new: ../../deployment/ceph-deploy-new#naming-a-cluster diff --git a/src/ceph/doc/rados/configuration/demo-ceph.conf b/src/ceph/doc/rados/configuration/demo-ceph.conf new file mode 100644 index 0000000..ba86d53 --- /dev/null +++ b/src/ceph/doc/rados/configuration/demo-ceph.conf @@ -0,0 +1,31 @@ +[global] +fsid = {cluster-id} +mon initial members = {hostname}[, {hostname}] +mon host = {ip-address}[, {ip-address}] + +#All clusters have a front-side public network. +#If you have two NICs, you can configure a back side cluster +#network for OSD object replication, heart beats, backfilling, +#recovery, etc. +public network = {network}[, {network}] +#cluster network = {network}[, {network}] + +#Clusters require authentication by default. +auth cluster required = cephx +auth service required = cephx +auth client required = cephx + +#Choose reasonable numbers for your journals, number of replicas +#and placement groups. +osd journal size = {n} +osd pool default size = {n} # Write an object n times. +osd pool default min size = {n} # Allow writing n copies in a degraded state. +osd pool default pg num = {n} +osd pool default pgp num = {n} + +#Choose a reasonable crush leaf type. +#0 for a 1-node cluster. +#1 for a multi node cluster in a single rack +#2 for a multi node, multi chassis cluster with multiple hosts in a chassis +#3 for a multi node cluster with hosts across racks, etc. +osd crush chooseleaf type = {n}
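#The commented values below sketch how the placeholders above might be
#filled in for a small test cluster. They are illustrative assumptions
#only; substitute numbers that match your own hardware and pools.
#osd journal size = 1024
#osd pool default size = 3
#osd pool default min size = 2
#osd pool default pg num = 128
#osd pool default pgp num = 128
#osd crush chooseleaf type = 1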
\ No newline at end of file diff --git a/src/ceph/doc/rados/configuration/filestore-config-ref.rst b/src/ceph/doc/rados/configuration/filestore-config-ref.rst new file mode 100644 index 0000000..4dff60c --- /dev/null +++ b/src/ceph/doc/rados/configuration/filestore-config-ref.rst @@ -0,0 +1,365 @@ +============================ + Filestore Config Reference +============================ + + +``filestore debug omap check`` + +:Description: Debugging check on synchronization. Expensive. For debugging only. +:Type: Boolean +:Required: No +:Default: ``0`` + + +.. index:: filestore; extended attributes + +Extended Attributes +=================== + +Extended Attributes (XATTRs) are an important aspect of your configuration. +Some file systems have limits on the number of bytes stored in XATTRs. +Additionally, in some cases, the filesystem may not be as fast as an alternative +method of storing XATTRs. The following settings may help improve performance +by using a method of storing XATTRs that is extrinsic to the underlying filesystem. + +Ceph XATTRs are stored as ``inline xattr``, using the XATTRs provided +by the underlying file system, if it does not impose a size limit. If +there is a size limit (4KB total on ext4, for instance), some Ceph +XATTRs will be stored in a key/value database when either the +``filestore max inline xattr size`` or ``filestore max inline +xattrs`` threshold is reached. + + +``filestore max inline xattr size`` + +:Description: The maximum size of an XATTR stored in the filesystem (i.e., XFS, + btrfs, ext4, etc.) per object. Should not be larger than the + filesystem can handle. Default value of 0 means to use the value + specific to the underlying filesystem. +:Type: Unsigned 32-bit Integer +:Required: No +:Default: ``0`` + + +``filestore max inline xattr size xfs`` + +:Description: The maximum size of an XATTR stored in the XFS filesystem. + Only used if ``filestore max inline xattr size`` == 0. +:Type: Unsigned 32-bit Integer +:Required: No +:Default: ``65536`` + + +``filestore max inline xattr size btrfs`` + +:Description: The maximum size of an XATTR stored in the btrfs filesystem. + Only used if ``filestore max inline xattr size`` == 0. +:Type: Unsigned 32-bit Integer +:Required: No +:Default: ``2048`` + + +``filestore max inline xattr size other`` + +:Description: The maximum size of an XATTR stored in other filesystems. + Only used if ``filestore max inline xattr size`` == 0. +:Type: Unsigned 32-bit Integer +:Required: No +:Default: ``512`` + + +``filestore max inline xattrs`` + +:Description: The maximum number of XATTRs stored in the filesystem per object. + Default value of 0 means to use the value specific to the + underlying filesystem. +:Type: 32-bit Integer +:Required: No +:Default: ``0`` + + +``filestore max inline xattrs xfs`` + +:Description: The maximum number of XATTRs stored in the XFS filesystem per object. + Only used if ``filestore max inline xattrs`` == 0. +:Type: 32-bit Integer +:Required: No +:Default: ``10`` + + +``filestore max inline xattrs btrfs`` + +:Description: The maximum number of XATTRs stored in the btrfs filesystem per object. + Only used if ``filestore max inline xattrs`` == 0. +:Type: 32-bit Integer +:Required: No +:Default: ``10`` + + +``filestore max inline xattrs other`` + +:Description: The maximum number of XATTRs stored in other filesystems per object. + Only used if ``filestore max inline xattrs`` == 0. +:Type: 32-bit Integer +:Required: No +:Default: ``2`` + +.. 
index:: filestore; synchronization + +Synchronization Intervals +========================= + +Periodically, the filestore needs to quiesce writes and synchronize the +filesystem, which creates a consistent commit point. It can then free journal +entries up to the commit point. Synchronizing more frequently tends to reduce +the time required to perform synchronization, and reduces the amount of data +that needs to remain in the journal. Less frequent synchronization allows the +backing filesystem to coalesce small writes and metadata updates more +optimally--potentially resulting in more efficient synchronization. + + +``filestore max sync interval`` + +:Description: The maximum interval in seconds for synchronizing the filestore. +:Type: Double +:Required: No +:Default: ``5`` + + +``filestore min sync interval`` + +:Description: The minimum interval in seconds for synchronizing the filestore. +:Type: Double +:Required: No +:Default: ``.01`` + + +.. index:: filestore; flusher + +Flusher +======= + +The filestore flusher forces data from large writes to be written out using +``sync file range`` before the sync in order to (hopefully) reduce the cost of +the eventual sync. In practice, disabling 'filestore flusher' seems to improve +performance in some cases. + + +``filestore flusher`` + +:Description: Enables the filestore flusher. +:Type: Boolean +:Required: No +:Default: ``false`` + +.. deprecated:: v.65 + +``filestore flusher max fds`` + +:Description: Sets the maximum number of file descriptors for the flusher. +:Type: Integer +:Required: No +:Default: ``512`` + +.. deprecated:: v.65 + +``filestore sync flush`` + +:Description: Enables the synchronization flusher. +:Type: Boolean +:Required: No +:Default: ``false`` + +.. deprecated:: v.65 + +``filestore fsync flushes journal data`` + +:Description: Flush journal data during filesystem synchronization. +:Type: Boolean +:Required: No +:Default: ``false`` + + +.. index:: filestore; queue + +Queue +===== + +The following settings provide limits on the size of filestore queue. + +``filestore queue max ops`` + +:Description: Defines the maximum number of in progress operations the file store accepts before blocking on queuing new operations. +:Type: Integer +:Required: No. Minimal impact on performance. +:Default: ``50`` + + +``filestore queue max bytes`` + +:Description: The maximum number of bytes for an operation. +:Type: Integer +:Required: No +:Default: ``100 << 20`` + + + + +.. index:: filestore; timeouts + +Timeouts +======== + + +``filestore op threads`` + +:Description: The number of filesystem operation threads that execute in parallel. +:Type: Integer +:Required: No +:Default: ``2`` + + +``filestore op thread timeout`` + +:Description: The timeout for a filesystem operation thread (in seconds). +:Type: Integer +:Required: No +:Default: ``60`` + + +``filestore op thread suicide timeout`` + +:Description: The timeout for a commit operation before cancelling the commit (in seconds). +:Type: Integer +:Required: No +:Default: ``180`` + + +.. index:: filestore; btrfs + +B-Tree Filesystem +================= + + +``filestore btrfs snap`` + +:Description: Enable snapshots for a ``btrfs`` filestore. +:Type: Boolean +:Required: No. Only used for ``btrfs``. +:Default: ``true`` + + +``filestore btrfs clone range`` + +:Description: Enable cloning ranges for a ``btrfs`` filestore. +:Type: Boolean +:Required: No. Only used for ``btrfs``. +:Default: ``true`` + + +.. 
index:: filestore; journal + +Journal +======= + + +``filestore journal parallel`` + +:Description: Enables parallel journaling, default for btrfs. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``filestore journal writeahead`` + +:Description: Enables writeahead journaling, default for xfs. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``filestore journal trailing`` + +:Description: Deprecated, never use. +:Type: Boolean +:Required: No +:Default: ``false`` + + +Misc +==== + + +``filestore merge threshold`` + +:Description: Min number of files in a subdir before merging into parent + NOTE: A negative value means to disable subdir merging +:Type: Integer +:Required: No +:Default: ``10`` + + +``filestore split multiple`` + +:Description: ``(filestore_split_multiple * abs(filestore_merge_threshold) + (rand() % filestore_split_rand_factor)) * 16`` + is the maximum number of files in a subdirectory before + splitting into child directories. + +:Type: Integer +:Required: No +:Default: ``2`` + + +``filestore split rand factor`` + +:Description: A random factor added to the split threshold to avoid + too many filestore splits occurring at once. See + ``filestore split multiple`` for details. + This can only be changed for an existing osd offline, + via ceph-objectstore-tool's apply-layout-settings command. + +:Type: Unsigned 32-bit Integer +:Required: No +:Default: ``20`` + + +``filestore update to`` + +:Description: Limits filestore auto upgrade to specified version. +:Type: Integer +:Required: No +:Default: ``1000`` + + +``filestore blackhole`` + +:Description: Drop any new transactions on the floor. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``filestore dump file`` + +:Description: File onto which store transaction dumps. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``filestore kill at`` + +:Description: inject a failure at the n'th opportunity +:Type: String +:Required: No +:Default: ``false`` + + +``filestore fail eio`` + +:Description: Fail/Crash on eio. +:Type: Boolean +:Required: No +:Default: ``true`` + diff --git a/src/ceph/doc/rados/configuration/general-config-ref.rst b/src/ceph/doc/rados/configuration/general-config-ref.rst new file mode 100644 index 0000000..ca09ee5 --- /dev/null +++ b/src/ceph/doc/rados/configuration/general-config-ref.rst @@ -0,0 +1,66 @@ +========================== + General Config Reference +========================== + + +``fsid`` + +:Description: The filesystem ID. One per cluster. +:Type: UUID +:Required: No. +:Default: N/A. Usually generated by deployment tools. + + +``admin socket`` + +:Description: The socket for executing administrative commands on a daemon, + irrespective of whether Ceph Monitors have established a quorum. + +:Type: String +:Required: No +:Default: ``/var/run/ceph/$cluster-$name.asok`` + + +``pid file`` + +:Description: The file in which the mon, osd or mds will write its + PID. For instance, ``/var/run/$cluster/$type.$id.pid`` + will create /var/run/ceph/mon.a.pid for the ``mon`` with + id ``a`` running in the ``ceph`` cluster. The ``pid + file`` is removed when the daemon stops gracefully. If + the process is not daemonized (i.e. runs with the ``-f`` + or ``-d`` option), the ``pid file`` is not created. +:Type: String +:Required: No +:Default: No + + +``chdir`` + +:Description: The directory Ceph daemons change to once they are + up and running. Default ``/`` directory recommended. 
+ +:Type: String +:Required: No +:Default: ``/`` + + +``max open files`` + +:Description: If set, when the :term:`Ceph Storage Cluster` starts, Ceph sets + the ``max open fds`` at the OS level (i.e., the max # of file + descriptors). It helps prevents Ceph OSD Daemons from running out + of file descriptors. + +:Type: 64-bit Integer +:Required: No +:Default: ``0`` + + +``fatal signal handlers`` + +:Description: If set, we will install signal handlers for SEGV, ABRT, BUS, ILL, + FPE, XCPU, XFSZ, SYS signals to generate a useful log message + +:Type: Boolean +:Default: ``true`` diff --git a/src/ceph/doc/rados/configuration/index.rst b/src/ceph/doc/rados/configuration/index.rst new file mode 100644 index 0000000..48b58ef --- /dev/null +++ b/src/ceph/doc/rados/configuration/index.rst @@ -0,0 +1,64 @@ +=============== + Configuration +=============== + +Ceph can run with a cluster containing thousands of Object Storage Devices +(OSDs). A minimal system will have at least two OSDs for data replication. To +configure OSD clusters, you must provide settings in the configuration file. +Ceph provides default values for many settings, which you can override in the +configuration file. Additionally, you can make runtime modification to the +configuration using command-line utilities. + +When Ceph starts, it activates three daemons: + +- ``ceph-mon`` (mandatory) +- ``ceph-osd`` (mandatory) +- ``ceph-mds`` (mandatory for cephfs only) + +Each process, daemon or utility loads the host's configuration file. A process +may have information about more than one daemon instance (*i.e.,* multiple +contexts). A daemon or utility only has information about a single daemon +instance (a single context). + +.. note:: Ceph can run on a single host for evaluation purposes. + + +.. raw:: html + + <table cellpadding="10"><colgroup><col width="50%"><col width="50%"></colgroup><tbody valign="top"><tr><td><h3>Configuring the Object Store</h3> + +For general object store configuration, refer to the following: + +.. toctree:: + :maxdepth: 1 + + Storage devices <storage-devices> + ceph-conf + + +.. raw:: html + + </td><td><h3>Reference</h3> + +To optimize the performance of your cluster, refer to the following: + +.. toctree:: + :maxdepth: 1 + + Network Settings <network-config-ref> + Auth Settings <auth-config-ref> + Monitor Settings <mon-config-ref> + mon-lookup-dns + Heartbeat Settings <mon-osd-interaction> + OSD Settings <osd-config-ref> + BlueStore Settings <bluestore-config-ref> + FileStore Settings <filestore-config-ref> + Journal Settings <journal-ref> + Pool, PG & CRUSH Settings <pool-pg-config-ref.rst> + Messaging Settings <ms-ref> + General Settings <general-config-ref> + + +.. raw:: html + + </td></tr></tbody></table> diff --git a/src/ceph/doc/rados/configuration/journal-ref.rst b/src/ceph/doc/rados/configuration/journal-ref.rst new file mode 100644 index 0000000..97300f4 --- /dev/null +++ b/src/ceph/doc/rados/configuration/journal-ref.rst @@ -0,0 +1,116 @@ +========================== + Journal Config Reference +========================== + +.. index:: journal; journal configuration + +Ceph OSDs use a journal for two reasons: speed and consistency. + +- **Speed:** The journal enables the Ceph OSD Daemon to commit small writes + quickly. Ceph writes small, random i/o to the journal sequentially, which + tends to speed up bursty workloads by allowing the backing filesystem more + time to coalesce writes. 
The Ceph OSD Daemon's journal, however, can lead + to spiky performance with short spurts of high-speed writes followed by + periods without any write progress as the filesystem catches up to the + journal. + +- **Consistency:** Ceph OSD Daemons require a filesystem interface that + guarantees atomic compound operations. Ceph OSD Daemons write a description + of the operation to the journal and apply the operation to the filesystem. + This enables atomic updates to an object (for example, placement group + metadata). Every few seconds--between ``filestore max sync interval`` and + ``filestore min sync interval``--the Ceph OSD Daemon stops writes and + synchronizes the journal with the filesystem, allowing Ceph OSD Daemons to + trim operations from the journal and reuse the space. On failure, Ceph + OSD Daemons replay the journal starting after the last synchronization + operation. + +Ceph OSD Daemons support the following journal settings: + + +``journal dio`` + +:Description: Enables direct i/o to the journal. Requires ``journal block + align`` set to ``true``. + +:Type: Boolean +:Required: Yes when using ``aio``. +:Default: ``true`` + + + +``journal aio`` + +.. versionchanged:: 0.61 Cuttlefish + +:Description: Enables using ``libaio`` for asynchronous writes to the journal. + Requires ``journal dio`` set to ``true``. + +:Type: Boolean +:Required: No. +:Default: Version 0.61 and later, ``true``. Version 0.60 and earlier, ``false``. + + +``journal block align`` + +:Description: Block aligns write operations. Required for ``dio`` and ``aio``. +:Type: Boolean +:Required: Yes when using ``dio`` and ``aio``. +:Default: ``true`` + + +``journal max write bytes`` + +:Description: The maximum number of bytes the journal will write at + any one time. + +:Type: Integer +:Required: No +:Default: ``10 << 20`` + + +``journal max write entries`` + +:Description: The maximum number of entries the journal will write at + any one time. + +:Type: Integer +:Required: No +:Default: ``100`` + + +``journal queue max ops`` + +:Description: The maximum number of operations allowed in the queue at + any one time. + +:Type: Integer +:Required: No +:Default: ``500`` + + +``journal queue max bytes`` + +:Description: The maximum number of bytes allowed in the queue at + any one time. + +:Type: Integer +:Required: No +:Default: ``10 << 20`` + + +``journal align min size`` + +:Description: Align data payloads greater than the specified minimum. +:Type: Integer +:Required: No +:Default: ``64 << 10`` + + +``journal zero on create`` + +:Description: Causes the file store to overwrite the entire journal with + ``0``'s during ``mkfs``. +:Type: Boolean +:Required: No +:Default: ``false`` diff --git a/src/ceph/doc/rados/configuration/mon-config-ref.rst b/src/ceph/doc/rados/configuration/mon-config-ref.rst new file mode 100644 index 0000000..6c8e92b --- /dev/null +++ b/src/ceph/doc/rados/configuration/mon-config-ref.rst @@ -0,0 +1,1222 @@ +========================== + Monitor Config Reference +========================== + +Understanding how to configure a :term:`Ceph Monitor` is an important part of +building a reliable :term:`Ceph Storage Cluster`. **All Ceph Storage Clusters +have at least one monitor**. A monitor configuration usually remains fairly +consistent, but you can add, remove or replace a monitor in a cluster. See +`Adding/Removing a Monitor`_ and `Add/Remove a Monitor (ceph-deploy)`_ for +details. + + +.. 
index:: Ceph Monitor; Paxos + +Background +========== + +Ceph Monitors maintain a "master copy" of the :term:`cluster map`, which means a +:term:`Ceph Client` can determine the location of all Ceph Monitors, Ceph OSD +Daemons, and Ceph Metadata Servers just by connecting to one Ceph Monitor and +retrieving a current cluster map. Before Ceph Clients can read from or write to +Ceph OSD Daemons or Ceph Metadata Servers, they must connect to a Ceph Monitor +first. With a current copy of the cluster map and the CRUSH algorithm, a Ceph +Client can compute the location for any object. The ability to compute object +locations allows a Ceph Client to talk directly to Ceph OSD Daemons, which is a +very important aspect of Ceph's high scalability and performance. See +`Scalability and High Availability`_ for additional details. + +The primary role of the Ceph Monitor is to maintain a master copy of the cluster +map. Ceph Monitors also provide authentication and logging services. Ceph +Monitors write all changes in the monitor services to a single Paxos instance, +and Paxos writes the changes to a key/value store for strong consistency. Ceph +Monitors can query the most recent version of the cluster map during sync +operations. Ceph Monitors leverage the key/value store's snapshots and iterators +(using leveldb) to perform store-wide synchronization. + +.. ditaa:: + + /-------------\ /-------------\ + | Monitor | Write Changes | Paxos | + | cCCC +-------------->+ cCCC | + | | | | + +-------------+ \------+------/ + | Auth | | + +-------------+ | Write Changes + | Log | | + +-------------+ v + | Monitor Map | /------+------\ + +-------------+ | Key / Value | + | OSD Map | | Store | + +-------------+ | cCCC | + | PG Map | \------+------/ + +-------------+ ^ + | MDS Map | | Read Changes + +-------------+ | + | cCCC |*---------------------+ + \-------------/ + + +.. deprecated:: version 0.58 + +In Ceph versions 0.58 and earlier, Ceph Monitors use a Paxos instance for +each service and store the map as a file. + +.. index:: Ceph Monitor; cluster map + +Cluster Maps +------------ + +The cluster map is a composite of maps, including the monitor map, the OSD map, +the placement group map and the metadata server map. The cluster map tracks a +number of important things: which processes are ``in`` the Ceph Storage Cluster; +which processes that are ``in`` the Ceph Storage Cluster are ``up`` and running +or ``down``; whether, the placement groups are ``active`` or ``inactive``, and +``clean`` or in some other state; and, other details that reflect the current +state of the cluster such as the total amount of storage space, and the amount +of storage used. + +When there is a significant change in the state of the cluster--e.g., a Ceph OSD +Daemon goes down, a placement group falls into a degraded state, etc.--the +cluster map gets updated to reflect the current state of the cluster. +Additionally, the Ceph Monitor also maintains a history of the prior states of +the cluster. The monitor map, OSD map, placement group map and metadata server +map each maintain a history of their map versions. We call each version an +"epoch." + +When operating your Ceph Storage Cluster, keeping track of these states is an +important part of your system administration duties. See `Monitoring a Cluster`_ +and `Monitoring OSDs and PGs`_ for additional details. + +.. 
index:: high availability; quorum + +Monitor Quorum +-------------- + +Our Configuring ceph section provides a trivial `Ceph configuration file`_ that +provides for one monitor in the test cluster. A cluster will run fine with a +single monitor; however, **a single monitor is a single-point-of-failure**. To +ensure high availability in a production Ceph Storage Cluster, you should run +Ceph with multiple monitors so that the failure of a single monitor **WILL NOT** +bring down your entire cluster. + +When a Ceph Storage Cluster runs multiple Ceph Monitors for high availability, +Ceph Monitors use `Paxos`_ to establish consensus about the master cluster map. +A consensus requires a majority of monitors running to establish a quorum for +consensus about the cluster map (e.g., 1; 2 out of 3; 3 out of 5; 4 out of 6; +etc.). + +``mon force quorum join`` + +:Description: Force monitor to join quorum even if it has been previously removed from the map +:Type: Boolean +:Default: ``False`` + +.. index:: Ceph Monitor; consistency + +Consistency +----------- + +When you add monitor settings to your Ceph configuration file, you need to be +aware of some of the architectural aspects of Ceph Monitors. **Ceph imposes +strict consistency requirements** for a Ceph monitor when discovering another +Ceph Monitor within the cluster. Whereas, Ceph Clients and other Ceph daemons +use the Ceph configuration file to discover monitors, monitors discover each +other using the monitor map (monmap), not the Ceph configuration file. + +A Ceph Monitor always refers to the local copy of the monmap when discovering +other Ceph Monitors in the Ceph Storage Cluster. Using the monmap instead of the +Ceph configuration file avoids errors that could break the cluster (e.g., typos +in ``ceph.conf`` when specifying a monitor address or port). Since monitors use +monmaps for discovery and they share monmaps with clients and other Ceph +daemons, **the monmap provides monitors with a strict guarantee that their +consensus is valid.** + +Strict consistency also applies to updates to the monmap. As with any other +updates on the Ceph Monitor, changes to the monmap always run through a +distributed consensus algorithm called `Paxos`_. The Ceph Monitors must agree on +each update to the monmap, such as adding or removing a Ceph Monitor, to ensure +that each monitor in the quorum has the same version of the monmap. Updates to +the monmap are incremental so that Ceph Monitors have the latest agreed upon +version, and a set of previous versions. Maintaining a history enables a Ceph +Monitor that has an older version of the monmap to catch up with the current +state of the Ceph Storage Cluster. + +If Ceph Monitors discovered each other through the Ceph configuration file +instead of through the monmap, it would introduce additional risks because the +Ceph configuration files are not updated and distributed automatically. Ceph +Monitors might inadvertently use an older Ceph configuration file, fail to +recognize a Ceph Monitor, fall out of a quorum, or develop a situation where +`Paxos`_ is not able to determine the current state of the system accurately. + + +.. index:: Ceph Monitor; bootstrapping monitors + +Bootstrapping Monitors +---------------------- + +In most configuration and deployment cases, tools that deploy Ceph may help +bootstrap the Ceph Monitors by generating a monitor map for you (e.g., +``ceph-deploy``, etc). 
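If you bootstrap monitors by hand rather than with a deployment tool, the
monitor map is typically created with ``monmaptool``. The following is only a
sketch; the monitor name, address, and ``fsid`` shown are placeholders to
replace with your own values (see `Bootstrapping a Monitor`_ for the full
procedure)::

    uuidgen
    # suppose it prints a7f64266-0894-4f1e-a635-d0aeaca0e993
    monmaptool --create --add a 192.168.0.10:6789 \
        --fsid a7f64266-0894-4f1e-a635-d0aeaca0e993 /tmp/monmap
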
A Ceph Monitor requires a few explicit +settings: + +- **Filesystem ID**: The ``fsid`` is the unique identifier for your + object store. Since you can run multiple clusters on the same + hardware, you must specify the unique ID of the object store when + bootstrapping a monitor. Deployment tools usually do this for you + (e.g., ``ceph-deploy`` can call a tool like ``uuidgen``), but you + may specify the ``fsid`` manually too. + +- **Monitor ID**: A monitor ID is a unique ID assigned to each monitor within + the cluster. It is an alphanumeric value, and by convention the identifier + usually follows an alphabetical increment (e.g., ``a``, ``b``, etc.). This + can be set in a Ceph configuration file (e.g., ``[mon.a]``, ``[mon.b]``, etc.), + by a deployment tool, or using the ``ceph`` commandline. + +- **Keys**: The monitor must have secret keys. A deployment tool such as + ``ceph-deploy`` usually does this for you, but you may + perform this step manually too. See `Monitor Keyrings`_ for details. + +For additional details on bootstrapping, see `Bootstrapping a Monitor`_. + +.. index:: Ceph Monitor; configuring monitors + +Configuring Monitors +==================== + +To apply configuration settings to the entire cluster, enter the configuration +settings under ``[global]``. To apply configuration settings to all monitors in +your cluster, enter the configuration settings under ``[mon]``. To apply +configuration settings to specific monitors, specify the monitor instance +(e.g., ``[mon.a]``). By convention, monitor instance names use alpha notation. + +.. code-block:: ini + + [global] + + [mon] + + [mon.a] + + [mon.b] + + [mon.c] + + +Minimum Configuration +--------------------- + +The bare minimum monitor settings for a Ceph monitor via the Ceph configuration +file include a hostname and a monitor address for each monitor. You can configure +these under ``[mon]`` or under the entry for a specific monitor. + +.. code-block:: ini + + [mon] + mon host = hostname1,hostname2,hostname3 + mon addr = 10.0.0.10:6789,10.0.0.11:6789,10.0.0.12:6789 + + +.. code-block:: ini + + [mon.a] + host = hostname1 + mon addr = 10.0.0.10:6789 + +See the `Network Configuration Reference`_ for details. + +.. note:: This minimum configuration for monitors assumes that a deployment + tool generates the ``fsid`` and the ``mon.`` key for you. + +Once you deploy a Ceph cluster, you **SHOULD NOT** change the IP address of +the monitors. However, if you decide to change the monitor's IP address, you +must follow a specific procedure. See `Changing a Monitor's IP Address`_ for +details. + +Monitors can also be found by clients using DNS SRV records. See `Monitor lookup through DNS`_ for details. + +Cluster ID +---------- + +Each Ceph Storage Cluster has a unique identifier (``fsid``). If specified, it +usually appears under the ``[global]`` section of the configuration file. +Deployment tools usually generate the ``fsid`` and store it in the monitor map, +so the value may not appear in a configuration file. The ``fsid`` makes it +possible to run daemons for multiple clusters on the same hardware. + +``fsid`` + +:Description: The cluster ID. One per cluster. +:Type: UUID +:Required: Yes. +:Default: N/A. May be generated by a deployment tool if not specified. + +.. note:: Do not set this value if you use a deployment tool that does + it for you. + + +.. 
index:: Ceph Monitor; initial members + +Initial Members +--------------- + +We recommend running a production Ceph Storage Cluster with at least three Ceph +Monitors to ensure high availability. When you run multiple monitors, you may +specify the initial monitors that must be members of the cluster in order to +establish a quorum. This may reduce the time it takes for your cluster to come +online. + +.. code-block:: ini + + [mon] + mon initial members = a,b,c + + +``mon initial members`` + +:Description: The IDs of initial monitors in a cluster during startup. If + specified, Ceph requires an odd number of monitors to form an + initial quorum (e.g., 3). + +:Type: String +:Default: None + +.. note:: A *majority* of monitors in your cluster must be able to reach + each other in order to establish a quorum. You can decrease the initial + number of monitors to establish a quorum with this setting. + +.. index:: Ceph Monitor; data path + +Data +---- + +Ceph provides a default path where Ceph Monitors store data. For optimal +performance in a production Ceph Storage Cluster, we recommend running Ceph +Monitors on separate hosts and drives from Ceph OSD Daemons. As leveldb is using +``mmap()`` for writing the data, Ceph Monitors flush their data from memory to disk +very often, which can interfere with Ceph OSD Daemon workloads if the data +store is co-located with the OSD Daemons. + +In Ceph versions 0.58 and earlier, Ceph Monitors store their data in files. This +approach allows users to inspect monitor data with common tools like ``ls`` +and ``cat``. However, it doesn't provide strong consistency. + +In Ceph versions 0.59 and later, Ceph Monitors store their data as key/value +pairs. Ceph Monitors require `ACID`_ transactions. Using a data store prevents +recovering Ceph Monitors from running corrupted versions through Paxos, and it +enables multiple modification operations in one single atomic batch, among other +advantages. + +Generally, we do not recommend changing the default data location. If you modify +the default location, we recommend that you make it uniform across Ceph Monitors +by setting it in the ``[mon]`` section of the configuration file. + + +``mon data`` + +:Description: The monitor's data location. +:Type: String +:Default: ``/var/lib/ceph/mon/$cluster-$id`` + + +``mon data size warn`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log when the monitor's data + store goes over 15GB. +:Type: Integer +:Default: 15*1024*1024*1024* + + +``mon data avail warn`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log when the available disk + space of monitor's data store is lower or equal to this + percentage. +:Type: Integer +:Default: 30 + + +``mon data avail crit`` + +:Description: Issue a ``HEALTH_ERR`` in cluster log when the available disk + space of monitor's data store is lower or equal to this + percentage. +:Type: Integer +:Default: 5 + + +``mon warn on cache pools without hit sets`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log if a cache pool does not + have the hitset type set set. + See `hit set type <../operations/pools#hit-set-type>`_ for more + details. +:Type: Boolean +:Default: True + + +``mon warn on crush straw calc version zero`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log if the CRUSH's + ``straw_calc_version`` is zero. See + `CRUSH map tunables <../operations/crush-map#tunables>`_ for + details. 
+:Type: Boolean
+:Default: True
+
+
+``mon warn on legacy crush tunables``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log if the CRUSH tunables are
+              too old (older than ``mon_min_crush_required_version``).
+:Type: Boolean
+:Default: True
+
+
+``mon crush min required version``
+
+:Description: The minimum tunable profile version required by the cluster.
+              See `CRUSH map tunables <../operations/crush-map#tunables>`_
+              for details.
+:Type: String
+:Default: ``firefly``
+
+
+``mon warn on osd down out interval zero``
+
+:Description: Issue a ``HEALTH_WARN`` in cluster log if
+              ``mon osd down out interval`` is zero. Having this option set to
+              zero on the leader acts much like the ``noout`` flag. It is hard
+              to diagnose a cluster that behaves as if ``noout`` were set
+              without the flag actually being set, so we report a warning in
+              this case.
+:Type: Boolean
+:Default: True
+
+
+``mon cache target full warn ratio``
+
+:Description: The position between a pool's ``cache_target_full`` and
+              ``target_max_object`` at which we start warning.
+:Type: Float
+:Default: ``0.66``
+
+
+``mon health data update interval``
+
+:Description: How often (in seconds) a monitor in quorum shares its health
+              status with its peers. (A negative number disables it.)
+:Type: Float
+:Default: ``60``
+
+
+``mon health to clog``
+
+:Description: Enable sending a health summary to the cluster log periodically.
+:Type: Boolean
+:Default: True
+
+
+``mon health to clog tick interval``
+
+:Description: How often (in seconds) the monitor sends a health summary to the
+              cluster log (a non-positive number disables it). If the current
+              health summary is empty or identical to the previous one, the
+              monitor will not send it to the cluster log.
+:Type: Integer
+:Default: 3600
+
+
+``mon health to clog interval``
+
+:Description: How often (in seconds) the monitor sends a health summary to the
+              cluster log (a non-positive number disables it). The monitor
+              always sends the summary, whether or not it has changed.
+:Type: Integer
+:Default: 60
+
+
+
+.. index:: Ceph Storage Cluster; capacity planning, Ceph Monitor; capacity planning
+
+Storage Capacity
+----------------
+
+When a Ceph Storage Cluster gets close to its maximum capacity (i.e., ``mon osd
+full ratio``), Ceph prevents you from writing to or reading from Ceph OSD
+Daemons as a safety measure to prevent data loss. Therefore, letting a
+production Ceph Storage Cluster approach its full ratio is not a good practice,
+because it sacrifices high availability. The default full ratio is ``.95``, or
+95% of capacity. This is a very aggressive setting for a test cluster with a
+small number of OSDs.
+
+.. tip:: When monitoring your cluster, be alert to warnings related to the
+   ``nearfull`` ratio. A ``nearfull`` warning means that the failure of one or
+   more OSDs could result in a temporary service disruption. Consider adding
+   more OSDs to increase storage capacity.
+
+A common scenario for test clusters involves a system administrator removing a
+Ceph OSD Daemon from the Ceph Storage Cluster to watch the cluster rebalance;
+then removing another Ceph OSD Daemon, and so on, until the Ceph Storage
+Cluster eventually reaches the full ratio and locks up. We recommend a bit of
+capacity planning even with a test cluster. Planning enables you to gauge how
+much spare capacity you will need in order to maintain high availability.
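While you plan, it also helps to watch actual utilization as the cluster
fills. For example, the standard ``ceph`` CLI can report per-pool and per-OSD
usage (output omitted here)::

    ceph df
    ceph osd df
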
Ideally, you want +to plan for a series of Ceph OSD Daemon failures where the cluster can recover +to an ``active + clean`` state without replacing those Ceph OSD Daemons +immediately. You can run a cluster in an ``active + degraded`` state, but this +is not ideal for normal operating conditions. + +The following diagram depicts a simplistic Ceph Storage Cluster containing 33 +Ceph Nodes with one Ceph OSD Daemon per host, each Ceph OSD Daemon reading from +and writing to a 3TB drive. So this exemplary Ceph Storage Cluster has a maximum +actual capacity of 99TB. With a ``mon osd full ratio`` of ``0.95``, if the Ceph +Storage Cluster falls to 5TB of remaining capacity, the cluster will not allow +Ceph Clients to read and write data. So the Ceph Storage Cluster's operating +capacity is 95TB, not 99TB. + +.. ditaa:: + + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | Rack 1 | | Rack 2 | | Rack 3 | | Rack 4 | | Rack 5 | | Rack 6 | + | cCCC | | cF00 | | cCCC | | cCCC | | cCCC | | cCCC | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | OSD 1 | | OSD 7 | | OSD 13 | | OSD 19 | | OSD 25 | | OSD 31 | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | OSD 2 | | OSD 8 | | OSD 14 | | OSD 20 | | OSD 26 | | OSD 32 | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | OSD 3 | | OSD 9 | | OSD 15 | | OSD 21 | | OSD 27 | | OSD 33 | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | OSD 4 | | OSD 10 | | OSD 16 | | OSD 22 | | OSD 28 | | Spare | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | OSD 5 | | OSD 11 | | OSD 17 | | OSD 23 | | OSD 29 | | Spare | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + | OSD 6 | | OSD 12 | | OSD 18 | | OSD 24 | | OSD 30 | | Spare | + +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ + +It is normal in such a cluster for one or two OSDs to fail. A less frequent but +reasonable scenario involves a rack's router or power supply failing, which +brings down multiple OSDs simultaneously (e.g., OSDs 7-12). In such a scenario, +you should still strive for a cluster that can remain operational and achieve an +``active + clean`` state--even if that means adding a few hosts with additional +OSDs in short order. If your capacity utilization is too high, you may not lose +data, but you could still sacrifice data availability while resolving an outage +within a failure domain if capacity utilization of the cluster exceeds the full +ratio. For this reason, we recommend at least some rough capacity planning. + +Identify two numbers for your cluster: + +#. The number of OSDs. +#. The total capacity of the cluster + +If you divide the total capacity of your cluster by the number of OSDs in your +cluster, you will find the mean average capacity of an OSD within your cluster. +Consider multiplying that number by the number of OSDs you expect will fail +simultaneously during normal operations (a relatively small number). Finally +multiply the capacity of the cluster by the full ratio to arrive at a maximum +operating capacity; then, subtract the number of amount of data from the OSDs +you expect to fail to arrive at a reasonable full ratio. Repeat the foregoing +process with a higher number of OSD failures (e.g., a rack of OSDs) to arrive at +a reasonable number for a near full ratio. + +.. 
code-block:: ini + + [global] + + mon osd full ratio = .80 + mon osd backfillfull ratio = .75 + mon osd nearfull ratio = .70 + + +``mon osd full ratio`` + +:Description: The percentage of disk space used before an OSD is + considered ``full``. + +:Type: Float +:Default: ``.95`` + + +``mon osd backfillfull ratio`` + +:Description: The percentage of disk space used before an OSD is + considered too ``full`` to backfill. + +:Type: Float +:Default: ``.90`` + + +``mon osd nearfull ratio`` + +:Description: The percentage of disk space used before an OSD is + considered ``nearfull``. + +:Type: Float +:Default: ``.85`` + + +.. tip:: If some OSDs are nearfull, but others have plenty of capacity, you + may have a problem with the CRUSH weight for the nearfull OSDs. + +.. index:: heartbeat + +Heartbeat +--------- + +Ceph monitors know about the cluster by requiring reports from each OSD, and by +receiving reports from OSDs about the status of their neighboring OSDs. Ceph +provides reasonable default settings for monitor/OSD interaction; however, you +may modify them as needed. See `Monitor/OSD Interaction`_ for details. + + +.. index:: Ceph Monitor; leader, Ceph Monitor; provider, Ceph Monitor; requester, Ceph Monitor; synchronization + +Monitor Store Synchronization +----------------------------- + +When you run a production cluster with multiple monitors (recommended), each +monitor checks to see if a neighboring monitor has a more recent version of the +cluster map (e.g., a map in a neighboring monitor with one or more epoch numbers +higher than the most current epoch in the map of the instant monitor). +Periodically, one monitor in the cluster may fall behind the other monitors to +the point where it must leave the quorum, synchronize to retrieve the most +current information about the cluster, and then rejoin the quorum. For the +purposes of synchronization, monitors may assume one of three roles: + +#. **Leader**: The `Leader` is the first monitor to achieve the most recent + Paxos version of the cluster map. + +#. **Provider**: The `Provider` is a monitor that has the most recent version + of the cluster map, but wasn't the first to achieve the most recent version. + +#. **Requester:** A `Requester` is a monitor that has fallen behind the leader + and must synchronize in order to retrieve the most recent information about + the cluster before it can rejoin the quorum. + +These roles enable a leader to delegate synchronization duties to a provider, +which prevents synchronization requests from overloading the leader--improving +performance. In the following diagram, the requester has learned that it has +fallen behind the other monitors. The requester asks the leader to synchronize, +and the leader tells the requester to synchronize with a provider. + + +.. ditaa:: +-----------+ +---------+ +----------+ + | Requester | | Leader | | Provider | + +-----------+ +---------+ +----------+ + | | | + | | | + | Ask to Synchronize | | + |------------------->| | + | | | + |<-------------------| | + | Tell Requester to | | + | Sync with Provider | | + | | | + | Synchronize | + |--------------------+-------------------->| + | | | + |<-------------------+---------------------| + | Send Chunk to Requester | + | (repeat as necessary) | + | Requester Acks Chuck to Provider | + |--------------------+-------------------->| + | | + | Sync Complete | + | Notification | + |------------------->| + | | + |<-------------------| + | Ack | + | | + + +Synchronization always occurs when a new monitor joins the cluster. 
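Synchronization itself is transparent, but you can see which monitors are
currently in quorum, and notice one that has dropped out to synchronize, by
querying the cluster; for example::

    ceph quorum_status --format json-pretty
    ceph mon stat
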
During +runtime operations, monitors may receive updates to the cluster map at different +times. This means the leader and provider roles may migrate from one monitor to +another. If this happens while synchronizing (e.g., a provider falls behind the +leader), the provider can terminate synchronization with a requester. + +Once synchronization is complete, Ceph requires trimming across the cluster. +Trimming requires that the placement groups are ``active + clean``. + + +``mon sync trim timeout`` + +:Description: +:Type: Double +:Default: ``30.0`` + + +``mon sync heartbeat timeout`` + +:Description: +:Type: Double +:Default: ``30.0`` + + +``mon sync heartbeat interval`` + +:Description: +:Type: Double +:Default: ``5.0`` + + +``mon sync backoff timeout`` + +:Description: +:Type: Double +:Default: ``30.0`` + + +``mon sync timeout`` + +:Description: Number of seconds the monitor will wait for the next update + message from its sync provider before it gives up and bootstrap + again. +:Type: Double +:Default: ``30.0`` + + +``mon sync max retries`` + +:Description: +:Type: Integer +:Default: ``5`` + + +``mon sync max payload size`` + +:Description: The maximum size for a sync payload (in bytes). +:Type: 32-bit Integer +:Default: ``1045676`` + + +``paxos max join drift`` + +:Description: The maximum Paxos iterations before we must first sync the + monitor data stores. When a monitor finds that its peer is too + far ahead of it, it will first sync with data stores before moving + on. +:Type: Integer +:Default: ``10`` + +``paxos stash full interval`` + +:Description: How often (in commits) to stash a full copy of the PaxosService state. + Current this setting only affects ``mds``, ``mon``, ``auth`` and ``mgr`` + PaxosServices. +:Type: Integer +:Default: 25 + +``paxos propose interval`` + +:Description: Gather updates for this time interval before proposing + a map update. +:Type: Double +:Default: ``1.0`` + + +``paxos min`` + +:Description: The minimum number of paxos states to keep around +:Type: Integer +:Default: 500 + + +``paxos min wait`` + +:Description: The minimum amount of time to gather updates after a period of + inactivity. +:Type: Double +:Default: ``0.05`` + + +``paxos trim min`` + +:Description: Number of extra proposals tolerated before trimming +:Type: Integer +:Default: 250 + + +``paxos trim max`` + +:Description: The maximum number of extra proposals to trim at a time +:Type: Integer +:Default: 500 + + +``paxos service trim min`` + +:Description: The minimum amount of versions to trigger a trim (0 disables it) +:Type: Integer +:Default: 250 + + +``paxos service trim max`` + +:Description: The maximum amount of versions to trim during a single proposal (0 disables it) +:Type: Integer +:Default: 500 + + +``mon max log epochs`` + +:Description: The maximum amount of log epochs to trim during a single proposal +:Type: Integer +:Default: 500 + + +``mon max pgmap epochs`` + +:Description: The maximum amount of pgmap epochs to trim during a single proposal +:Type: Integer +:Default: 500 + + +``mon mds force trim to`` + +:Description: Force monitor to trim mdsmaps to this point (0 disables it. + dangerous, use with care) +:Type: Integer +:Default: 0 + + +``mon osd force trim to`` + +:Description: Force monitor to trim osdmaps to this point, even if there is + PGs not clean at the specified epoch (0 disables it. 
dangerous, + use with care) +:Type: Integer +:Default: 0 + +``mon osd cache size`` + +:Description: The size of osdmaps cache, not to rely on underlying store's cache +:Type: Integer +:Default: 10 + + +``mon election timeout`` + +:Description: On election proposer, maximum waiting time for all ACKs in seconds. +:Type: Float +:Default: ``5`` + + +``mon lease`` + +:Description: The length (in seconds) of the lease on the monitor's versions. +:Type: Float +:Default: ``5`` + + +``mon lease renew interval factor`` + +:Description: ``mon lease`` \* ``mon lease renew interval factor`` will be the + interval for the Leader to renew the other monitor's leases. The + factor should be less than ``1.0``. +:Type: Float +:Default: ``0.6`` + + +``mon lease ack timeout factor`` + +:Description: The Leader will wait ``mon lease`` \* ``mon lease ack timeout factor`` + for the Providers to acknowledge the lease extension. +:Type: Float +:Default: ``2.0`` + + +``mon accept timeout factor`` + +:Description: The Leader will wait ``mon lease`` \* ``mon accept timeout factor`` + for the Requester(s) to accept a Paxos update. It is also used + during the Paxos recovery phase for similar purposes. +:Type: Float +:Default: ``2.0`` + + +``mon min osdmap epochs`` + +:Description: Minimum number of OSD map epochs to keep at all times. +:Type: 32-bit Integer +:Default: ``500`` + + +``mon max pgmap epochs`` + +:Description: Maximum number of PG map epochs the monitor should keep. +:Type: 32-bit Integer +:Default: ``500`` + + +``mon max log epochs`` + +:Description: Maximum number of Log epochs the monitor should keep. +:Type: 32-bit Integer +:Default: ``500`` + + + +.. index:: Ceph Monitor; clock + +Clock +----- + +Ceph daemons pass critical messages to each other, which must be processed +before daemons reach a timeout threshold. If the clocks in Ceph monitors +are not synchronized, it can lead to a number of anomalies. For example: + +- Daemons ignoring received messages (e.g., timestamps outdated) +- Timeouts triggered too soon/late when a message wasn't received in time. + +See `Monitor Store Synchronization`_ for details. + + +.. tip:: You SHOULD install NTP on your Ceph monitor hosts to + ensure that the monitor cluster operates with synchronized clocks. + +Clock drift may still be noticeable with NTP even though the discrepancy is not +yet harmful. Ceph's clock drift / clock skew warnings may get triggered even +though NTP maintains a reasonable level of synchronization. Increasing your +clock drift may be tolerable under such circumstances; however, a number of +factors such as workload, network latency, configuring overrides to default +timeouts and the `Monitor Store Synchronization`_ settings may influence +the level of acceptable clock drift without compromising Paxos guarantees. + +Ceph provides the following tunable options to allow you to find +acceptable values. + + +``clock offset`` + +:Description: How much to offset the system clock. See ``Clock.cc`` for details. +:Type: Double +:Default: ``0`` + + +.. deprecated:: 0.58 + +``mon tick interval`` + +:Description: A monitor's tick interval in seconds. +:Type: 32-bit Integer +:Default: ``5`` + + +``mon clock drift allowed`` + +:Description: The clock drift in seconds allowed between monitors. 
+:Type: Float +:Default: ``.050`` + + +``mon clock drift warn backoff`` + +:Description: Exponential backoff for clock drift warnings +:Type: Float +:Default: ``5`` + + +``mon timecheck interval`` + +:Description: The time check interval (clock drift check) in seconds + for the Leader. + +:Type: Float +:Default: ``300.0`` + + +``mon timecheck skew interval`` + +:Description: The time check interval (clock drift check) in seconds when in + presence of a skew in seconds for the Leader. +:Type: Float +:Default: ``30.0`` + + +Client +------ + +``mon client hunt interval`` + +:Description: The client will try a new monitor every ``N`` seconds until it + establishes a connection. + +:Type: Double +:Default: ``3.0`` + + +``mon client ping interval`` + +:Description: The client will ping the monitor every ``N`` seconds. +:Type: Double +:Default: ``10.0`` + + +``mon client max log entries per message`` + +:Description: The maximum number of log entries a monitor will generate + per client message. + +:Type: Integer +:Default: ``1000`` + + +``mon client bytes`` + +:Description: The amount of client message data allowed in memory (in bytes). +:Type: 64-bit Integer Unsigned +:Default: ``100ul << 20`` + + +Pool settings +============= +Since version v0.94 there is support for pool flags which allow or disallow changes to be made to pools. + +Monitors can also disallow removal of pools if configured that way. + +``mon allow pool delete`` + +:Description: If the monitors should allow pools to be removed. Regardless of what the pool flags say. +:Type: Boolean +:Default: ``false`` + +``osd pool default flag hashpspool`` + +:Description: Set the hashpspool flag on new pools +:Type: Boolean +:Default: ``true`` + +``osd pool default flag nodelete`` + +:Description: Set the nodelete flag on new pools. Prevents allow pool removal with this flag in any way. +:Type: Boolean +:Default: ``false`` + +``osd pool default flag nopgchange`` + +:Description: Set the nopgchange flag on new pools. Does not allow the number of PGs to be changed for a pool. +:Type: Boolean +:Default: ``false`` + +``osd pool default flag nosizechange`` + +:Description: Set the nosizechange flag on new pools. Does not allow the size to be changed of pool. +:Type: Boolean +:Default: ``false`` + +For more information about the pool flags see `Pool values`_. + +Miscellaneous +============= + + +``mon max osd`` + +:Description: The maximum number of OSDs allowed in the cluster. +:Type: 32-bit Integer +:Default: ``10000`` + +``mon globalid prealloc`` + +:Description: The number of global IDs to pre-allocate for clients and daemons in the cluster. +:Type: 32-bit Integer +:Default: ``100`` + +``mon subscribe interval`` + +:Description: The refresh interval (in seconds) for subscriptions. The + subscription mechanism enables obtaining the cluster maps + and log information. + +:Type: Double +:Default: ``300`` + + +``mon stat smooth intervals`` + +:Description: Ceph will smooth statistics over the last ``N`` PG maps. +:Type: Integer +:Default: ``2`` + + +``mon probe timeout`` + +:Description: Number of seconds the monitor will wait to find peers before bootstrapping. +:Type: Double +:Default: ``2.0`` + + +``mon daemon bytes`` + +:Description: The message memory cap for metadata server and OSD messages (in bytes). +:Type: 64-bit Integer Unsigned +:Default: ``400ul << 20`` + + +``mon max log entries per event`` + +:Description: The maximum number of log entries per event. 
+:Type: Integer +:Default: ``4096`` + + +``mon osd prime pg temp`` + +:Description: Enables or disable priming the PGMap with the previous OSDs when an out + OSD comes back into the cluster. With the ``true`` setting the clients + will continue to use the previous OSDs until the newly in OSDs as that + PG peered. +:Type: Boolean +:Default: ``true`` + + +``mon osd prime pg temp max time`` + +:Description: How much time in seconds the monitor should spend trying to prime the + PGMap when an out OSD comes back into the cluster. +:Type: Float +:Default: ``0.5`` + + +``mon osd prime pg temp max time estimate`` + +:Description: Maximum estimate of time spent on each PG before we prime all PGs + in parallel. +:Type: Float +:Default: ``0.25`` + + +``mon osd allow primary affinity`` + +:Description: allow ``primary_affinity`` to be set in the osdmap. +:Type: Boolean +:Default: False + + +``mon osd pool ec fast read`` + +:Description: Whether turn on fast read on the pool or not. It will be used as + the default setting of newly created erasure pools if ``fast_read`` + is not specified at create time. +:Type: Boolean +:Default: False + + +``mon mds skip sanity`` + +:Description: Skip safety assertions on FSMap (in case of bugs where we want to + continue anyway). Monitor terminates if the FSMap sanity check + fails, but we can disable it by enabling this option. +:Type: Boolean +:Default: False + + +``mon max mdsmap epochs`` + +:Description: The maximum amount of mdsmap epochs to trim during a single proposal. +:Type: Integer +:Default: 500 + + +``mon config key max entry size`` + +:Description: The maximum size of config-key entry (in bytes) +:Type: Integer +:Default: 4096 + + +``mon scrub interval`` + +:Description: How often (in seconds) the monitor scrub its store by comparing + the stored checksums with the computed ones of all the stored + keys. +:Type: Integer +:Default: 3600*24 + + +``mon scrub max keys`` + +:Description: The maximum number of keys to scrub each time. +:Type: Integer +:Default: 100 + + +``mon compact on start`` + +:Description: Compact the database used as Ceph Monitor store on + ``ceph-mon`` start. A manual compaction helps to shrink the + monitor database and improve the performance of it if the regular + compaction fails to work. +:Type: Boolean +:Default: False + + +``mon compact on bootstrap`` + +:Description: Compact the database used as Ceph Monitor store on + on bootstrap. Monitor starts probing each other for creating + a quorum after bootstrap. If it times out before joining the + quorum, it will start over and bootstrap itself again. +:Type: Boolean +:Default: False + + +``mon compact on trim`` + +:Description: Compact a certain prefix (including paxos) when we trim its old states. +:Type: Boolean +:Default: True + + +``mon cpu threads`` + +:Description: Number of threads for performing CPU intensive work on monitor. +:Type: Boolean +:Default: True + + +``mon osd mapping pgs per chunk`` + +:Description: We calculate the mapping from placement group to OSDs in chunks. + This option specifies the number of placement groups per chunk. +:Type: Integer +:Default: 4096 + + +``mon osd max split count`` + +:Description: Largest number of PGs per "involved" OSD to let split create. + When we increase the ``pg_num`` of a pool, the placement groups + will be splitted on all OSDs serving that pool. We want to avoid + extreme multipliers on PG splits. 
+:Type: Integer
+:Default: 300
+
+
+``mon session timeout``
+
+:Description: The monitor will terminate inactive sessions that stay idle
+              beyond this time limit.
+:Type: Integer
+:Default: 300
+
+
+
+.. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science)
+.. _Monitor Keyrings: ../../../dev/mon-bootstrap#secret-keys
+.. _Ceph configuration file: ../ceph-conf/#monitors
+.. _Network Configuration Reference: ../network-config-ref
+.. _Monitor lookup through DNS: ../mon-lookup-dns
+.. _ACID: http://en.wikipedia.org/wiki/ACID
+.. _Adding/Removing a Monitor: ../../operations/add-or-rm-mons
+.. _Add/Remove a Monitor (ceph-deploy): ../../deployment/ceph-deploy-mon
+.. _Monitoring a Cluster: ../../operations/monitoring
+.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg
+.. _Bootstrapping a Monitor: ../../../dev/mon-bootstrap
+.. _Changing a Monitor's IP Address: ../../operations/add-or-rm-mons#changing-a-monitor-s-ip-address
+.. _Monitor/OSD Interaction: ../mon-osd-interaction
+.. _Scalability and High Availability: ../../../architecture#scalability-and-high-availability
+.. _Pool values: ../../operations/pools/#set-pool-values
diff --git a/src/ceph/doc/rados/configuration/mon-lookup-dns.rst b/src/ceph/doc/rados/configuration/mon-lookup-dns.rst
new file mode 100644
index 0000000..e32b320
--- /dev/null
+++ b/src/ceph/doc/rados/configuration/mon-lookup-dns.rst
@@ -0,0 +1,51 @@
+===============================
+Looking up Monitors through DNS
+===============================
+
+Since version 11.0.0, RADOS supports looking up Monitors through DNS.
+
+This way, daemons and clients do not require a *mon host* configuration directive in their ceph.conf configuration file.
+
+Using DNS SRV TCP records, clients are able to look up the monitors.
+
+This allows for less configuration on clients and monitors. Using a DNS update, clients and daemons can be made aware of changes in the monitor topology.
+
+By default, clients and daemons will look for the TCP service called *ceph-mon*, which is configured by the *mon_dns_srv_name* configuration directive.
+
+
+``mon dns srv name``
+
+:Description: The service name used when querying the DNS for the monitor hosts/addresses.
+:Type: String
+:Default: ``ceph-mon``
+
+Example
+-------
+When the DNS search domain is set to *example.com*, a DNS zone file might contain the following elements.
+
+First, create records for the Monitors, either IPv4 (A) or IPv6 (AAAA).
+
+::
+
+    mon1.example.com. AAAA 2001:db8::100
+    mon2.example.com. AAAA 2001:db8::200
+    mon3.example.com. AAAA 2001:db8::300
+
+::
+
+    mon1.example.com. A 192.168.0.1
+    mon2.example.com. A 192.168.0.2
+    mon3.example.com. A 192.168.0.3
+
+
+With those records in place, we can create the SRV TCP records with the name *ceph-mon* pointing to the three Monitors.
+
+::
+
+    _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon1.example.com.
+    _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon2.example.com.
+    _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon3.example.com.
+
+In this case the Monitors are running on port *6789*, and their priority and weight are *10* and *60* respectively.
+
+The current implementation in clients and daemons will *only* respect the priority set in SRV records, and it will only connect to the monitors with the lowest-numbered priority. Targets with the same priority will be selected at random.
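A quick way to verify that the SRV records resolve as expected is an ordinary
DNS query, and a client's ``ceph.conf`` can then omit the ``mon host`` and
``mon addr`` directives entirely. Both snippets below are only a sketch; the
zone and the ``fsid`` are placeholders for your own values::

    dig +short SRV _ceph-mon._tcp.example.com

.. code-block:: ini

    [global]
    fsid = {fsid}
    mon dns srv name = ceph-mon
    # no "mon host" or "mon addr" needed; monitors are found via the SRV records
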
diff --git a/src/ceph/doc/rados/configuration/mon-osd-interaction.rst b/src/ceph/doc/rados/configuration/mon-osd-interaction.rst new file mode 100644 index 0000000..e335ff0 --- /dev/null +++ b/src/ceph/doc/rados/configuration/mon-osd-interaction.rst @@ -0,0 +1,408 @@ +===================================== + Configuring Monitor/OSD Interaction +===================================== + +.. index:: heartbeat + +After you have completed your initial Ceph configuration, you may deploy and run +Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the +:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage +Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring +reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph +OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph +Monitor doesn't receive reports, or if it receives reports of changes in the +Ceph Storage Cluster, the Ceph Monitor updates the status of the :term:`Ceph +Cluster Map`. + +Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon +interaction. However, you may override the defaults. The following sections +describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of +monitoring the Ceph Storage Cluster. + +.. index:: heartbeat interval + +OSDs Check Heartbeats +===================== + +Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 +seconds. You can change the heartbeat interval by adding an ``osd heartbeat +interval`` setting under the ``[osd]`` section of your Ceph configuration file, +or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't +show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may +consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph +Monitor, which will update the Ceph Cluster Map. You may change this grace +period by adding an ``osd heartbeat grace`` setting under the ``[mon]`` +and ``[osd]`` or ``[global]`` section of your Ceph configuration file, +or by setting the value at runtime. + + +.. ditaa:: +---------+ +---------+ + | OSD 1 | | OSD 2 | + +---------+ +---------+ + | | + |----+ Heartbeat | + | | Interval | + |<---+ Exceeded | + | | + | Check | + | Heartbeat | + |------------------->| + | | + |<-------------------| + | Heart Beating | + | | + |----+ Heartbeat | + | | Interval | + |<---+ Exceeded | + | | + | Check | + | Heartbeat | + |------------------->| + | | + |----+ Grace | + | | Period | + |<---+ Exceeded | + | | + |----+ Mark | + | | OSD 2 | + |<---+ Down | + + +.. index:: OSD down report + +OSDs Report Down OSDs +===================== + +By default, two Ceph OSD Daemons from different hosts must report to the Ceph +Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors +acknowledge that the reported Ceph OSD Daemon is ``down``. But there is chance +that all the OSDs reporting the failure are hosted in a rack with a bad switch +which has trouble connecting to another OSD. To avoid this sort of false alarm, +we consider the peers reporting a failure a proxy for a potential "subcluster" +over the overall cluster that is similarly laggy. This is clearly not true in +all cases, but will sometimes help us localize the grace correction to a subset +of the system that is unhappy. ``mon osd reporter subtree level`` is used to +group the peers into the "subcluster" by their common ancestor type in CRUSH +map. 
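For example, to require failure reports from OSDs under at least two different
racks before an OSD is marked ``down``, a sketch of the relevant ``[mon]``
settings (adjust to your own CRUSH hierarchy) might look like this:

.. code-block:: ini

    [mon]
    mon osd min down reporters = 2
    mon osd reporter subtree level = rack
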
By default, only two reports from different subtree are required to report +another Ceph OSD Daemon ``down``. You can change the number of reporters from +unique subtrees and the common ancestor type required to report a Ceph OSD +Daemon ``down`` to a Ceph Monitor by adding an ``mon osd min down reporters`` +and ``mon osd reporter subtree level`` settings under the ``[mon]`` section of +your Ceph configuration file, or by setting the value at runtime. + + +.. ditaa:: +---------+ +---------+ +---------+ + | OSD 1 | | OSD 2 | | Monitor | + +---------+ +---------+ +---------+ + | | | + | OSD 3 Is Down | | + |---------------+--------------->| + | | | + | | | + | | OSD 3 Is Down | + | |--------------->| + | | | + | | | + | | |---------+ Mark + | | | | OSD 3 + | | |<--------+ Down + + +.. index:: peering failure + +OSDs Report Peering Failure +=========================== + +If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its +Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for +the most recent copy of the cluster map every 30 seconds. You can change the +Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval`` +setting under the ``[osd]`` section of your Ceph configuration file, or by +setting the value at runtime. + +.. ditaa:: +---------+ +---------+ +-------+ +---------+ + | OSD 1 | | OSD 2 | | OSD 3 | | Monitor | + +---------+ +---------+ +-------+ +---------+ + | | | | + | Request To | | | + | Peer | | | + |-------------->| | | + |<--------------| | | + | Peering | | + | | | + | Request To | | + | Peer | | + |----------------------------->| | + | | + |----+ OSD Monitor | + | | Heartbeat | + |<---+ Interval Exceeded | + | | + | Failed to Peer with OSD 3 | + |-------------------------------------------->| + |<--------------------------------------------| + | Receive New Cluster Map | + + +.. index:: OSD status + +OSDs Report Their Status +======================== + +If an Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will +consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout`` +elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor when a reportable +event such as a failure, a change in placement group stats, a change in +``up_thru`` or when it boots within 5 seconds. You can change the Ceph OSD +Daemon minimum report interval by adding an ``osd mon report interval min`` +setting under the ``[osd]`` section of your Ceph configuration file, or by +setting the value at runtime. A Ceph OSD Daemon sends a report to a Ceph +Monitor every 120 seconds irrespective of whether any notable changes occur. +You can change the Ceph Monitor report interval by adding an ``osd mon report +interval max`` setting under the ``[osd]`` section of your Ceph configuration +file, or by setting the value at runtime. + + +.. 
ditaa:: +---------+ +---------+ + | OSD 1 | | Monitor | + +---------+ +---------+ + | | + |----+ Report Min | + | | Interval | + |<---+ Exceeded | + | | + |----+ Reportable | + | | Event | + |<---+ Occurs | + | | + | Report To | + | Monitor | + |------------------->| + | | + |----+ Report Max | + | | Interval | + |<---+ Exceeded | + | | + | Report To | + | Monitor | + |------------------->| + | | + |----+ Monitor | + | | Fails | + |<---+ | + +----+ Monitor OSD + | | Report Timeout + |<---+ Exceeded + | + +----+ Mark + | | OSD 1 + |<---+ Down + + + + +Configuration Settings +====================== + +When modifying heartbeat settings, you should include them in the ``[global]`` +section of your configuration file. + +.. index:: monitor heartbeat + +Monitor Settings +---------------- + +``mon osd min up ratio`` + +:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will + mark Ceph OSD Daemons ``down``. + +:Type: Double +:Default: ``.3`` + + +``mon osd min in ratio`` + +:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will + mark Ceph OSD Daemons ``out``. + +:Type: Double +:Default: ``.75`` + + +``mon osd laggy halflife`` + +:Description: The number of seconds laggy estimates will decay. +:Type: Integer +:Default: ``60*60`` + + +``mon osd laggy weight`` + +:Description: The weight for new samples in laggy estimation decay. +:Type: Double +:Default: ``0.3`` + + + +``mon osd laggy max interval`` + +:Description: Maximum value of ``laggy_interval`` in laggy estimations (in seconds). + Monitor uses an adaptive approach to evaluate the ``laggy_interval`` of + a certain OSD. This value will be used to calculate the grace time for + that OSD. +:Type: Integer +:Default: 300 + +``mon osd adjust heartbeat grace`` + +:Description: If set to ``true``, Ceph will scale based on laggy estimations. +:Type: Boolean +:Default: ``true`` + + +``mon osd adjust down out interval`` + +:Description: If set to ``true``, Ceph will scaled based on laggy estimations. +:Type: Boolean +:Default: ``true`` + + +``mon osd auto mark in`` + +:Description: Ceph will mark any booting Ceph OSD Daemons as ``in`` + the Ceph Storage Cluster. + +:Type: Boolean +:Default: ``false`` + + +``mon osd auto mark auto out in`` + +:Description: Ceph will mark booting Ceph OSD Daemons auto marked ``out`` + of the Ceph Storage Cluster as ``in`` the cluster. + +:Type: Boolean +:Default: ``true`` + + +``mon osd auto mark new in`` + +:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the + Ceph Storage Cluster. + +:Type: Boolean +:Default: ``true`` + + +``mon osd down out interval`` + +:Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon + ``down`` and ``out`` if it doesn't respond. + +:Type: 32-bit Integer +:Default: ``600`` + + +``mon osd down out subtree limit`` + +:Description: The smallest :term:`CRUSH` unit type that Ceph will **not** + automatically mark out. For instance, if set to ``host`` and if + all OSDs of a host are down, Ceph will not automatically mark out + these OSDs. + +:Type: String +:Default: ``rack`` + + +``mon osd report timeout`` + +:Description: The grace period in seconds before declaring + unresponsive Ceph OSD Daemons ``down``. + +:Type: 32-bit Integer +:Default: ``900`` + +``mon osd min down reporters`` + +:Description: The minimum number of Ceph OSD Daemons required to report a + ``down`` Ceph OSD Daemon. 
+ +:Type: 32-bit Integer +:Default: ``2`` + + +``mon osd reporter subtree level`` + +:Description: In which level of parent bucket the reporters are counted. The OSDs + send failure reports to monitor if they find its peer is not responsive. + And monitor mark the reported OSD out and then down after a grace period. +:Type: String +:Default: ``host`` + + +.. index:: OSD hearbeat + +OSD Settings +------------ + +``osd heartbeat address`` + +:Description: An Ceph OSD Daemon's network address for heartbeats. +:Type: Address +:Default: The host address. + + +``osd heartbeat interval`` + +:Description: How often an Ceph OSD Daemon pings its peers (in seconds). +:Type: 32-bit Integer +:Default: ``6`` + + +``osd heartbeat grace`` + +:Description: The elapsed time when a Ceph OSD Daemon hasn't shown a heartbeat + that the Ceph Storage Cluster considers it ``down``. + This setting has to be set in both the [mon] and [osd] or [global] + section so that it is read by both the MON and OSD daemons. +:Type: 32-bit Integer +:Default: ``20`` + + +``osd mon heartbeat interval`` + +:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no + Ceph OSD Daemon peers. + +:Type: 32-bit Integer +:Default: ``30`` + + +``osd mon report interval max`` + +:Description: The maximum time in seconds that a Ceph OSD Daemon can wait before + it must report to a Ceph Monitor. + +:Type: 32-bit Integer +:Default: ``120`` + + +``osd mon report interval min`` + +:Description: The minimum number of seconds a Ceph OSD Daemon may wait + from startup or another reportable event before reporting + to a Ceph Monitor. + +:Type: 32-bit Integer +:Default: ``5`` +:Valid Range: Should be less than ``osd mon report interval max`` + + +``osd mon ack timeout`` + +:Description: The number of seconds to wait for a Ceph Monitor to acknowledge a + request for statistics. + +:Type: 32-bit Integer +:Default: ``30`` diff --git a/src/ceph/doc/rados/configuration/ms-ref.rst b/src/ceph/doc/rados/configuration/ms-ref.rst new file mode 100644 index 0000000..55d009e --- /dev/null +++ b/src/ceph/doc/rados/configuration/ms-ref.rst @@ -0,0 +1,154 @@ +=========== + Messaging +=========== + +General Settings +================ + +``ms tcp nodelay`` + +:Description: Disables nagle's algorithm on messenger tcp sessions. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``ms initial backoff`` + +:Description: The initial time to wait before reconnecting on a fault. +:Type: Double +:Required: No +:Default: ``.2`` + + +``ms max backoff`` + +:Description: The maximum time to wait before reconnecting on a fault. +:Type: Double +:Required: No +:Default: ``15.0`` + + +``ms nocrc`` + +:Description: Disables crc on network messages. May increase performance if cpu limited. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``ms die on bad msg`` + +:Description: Debug option; do not configure. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``ms dispatch throttle bytes`` + +:Description: Throttles total size of messages waiting to be dispatched. +:Type: 64-bit Unsigned Integer +:Required: No +:Default: ``100 << 20`` + + +``ms bind ipv6`` + +:Description: Enable if you want your daemons to bind to IPv6 address instead of IPv4 ones. (Not required if you specify a daemon or cluster IP.) +:Type: Boolean +:Required: No +:Default: ``false`` + + +``ms rwthread stack bytes`` + +:Description: Debug option for stack size; do not configure. 
+:Type: 64-bit Unsigned Integer +:Required: No +:Default: ``1024 << 10`` + + +``ms tcp read timeout`` + +:Description: Controls how long (in seconds) the messenger will wait before closing an idle connection. +:Type: 64-bit Unsigned Integer +:Required: No +:Default: ``900`` + + +``ms inject socket failures`` + +:Description: Debug option; do not configure. +:Type: 64-bit Unsigned Integer +:Required: No +:Default: ``0`` + +Async messenger options +======================= + + +``ms async transport type`` + +:Description: Transport type used by Async Messenger. Can be ``posix``, ``dpdk`` + or ``rdma``. Posix uses standard TCP/IP networking and is default. + Other transports may be experimental and support may be limited. +:Type: String +:Required: No +:Default: ``posix`` + + +``ms async op threads`` + +:Description: Initial number of worker threads used by each Async Messenger instance. + Should be at least equal to highest number of replicas, but you can + decrease it if you are low on CPU core count and/or you host a lot of + OSDs on single server. +:Type: 64-bit Unsigned Integer +:Required: No +:Default: ``3`` + + +``ms async max op threads`` + +:Description: Maximum number of worker threads used by each Async Messenger instance. + Set to lower values when your machine has limited CPU count, and increase + when your CPUs are underutilized (i. e. one or more of CPUs are + constantly on 100% load during I/O operations). +:Type: 64-bit Unsigned Integer +:Required: No +:Default: ``5`` + + +``ms async set affinity`` + +:Description: Set to true to bind Async Messenger workers to particular CPU cores. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``ms async affinity cores`` + +:Description: When ``ms async set affinity`` is true, this string specifies how Async + Messenger workers are bound to CPU cores. For example, "0,2" will bind + workers #1 and #2 to CPU cores #0 and #2, respectively. + NOTE: when manually setting affinity, make sure to not assign workers to + processors that are virtual CPUs created as an effect of Hyperthreading + or similar technology, because they are slower than regular CPU cores. +:Type: String +:Required: No +:Default: ``(empty)`` + + +``ms async send inline`` + +:Description: Send messages directly from the thread that generated them instead of + queuing and sending from Async Messenger thread. This option is known + to decrease performance on systems with a lot of CPU cores, so it's + disabled by default. +:Type: Boolean +:Required: No +:Default: ``false`` + + diff --git a/src/ceph/doc/rados/configuration/network-config-ref.rst b/src/ceph/doc/rados/configuration/network-config-ref.rst new file mode 100644 index 0000000..2d7f9d6 --- /dev/null +++ b/src/ceph/doc/rados/configuration/network-config-ref.rst @@ -0,0 +1,494 @@ +================================= + Network Configuration Reference +================================= + +Network configuration is critical for building a high performance :term:`Ceph +Storage Cluster`. The Ceph Storage Cluster does not perform request routing or +dispatching on behalf of the :term:`Ceph Client`. Instead, Ceph Clients make +requests directly to Ceph OSD Daemons. Ceph OSD Daemons perform data replication +on behalf of Ceph Clients, which means replication and other factors impose +additional loads on Ceph Storage Cluster networks. + +Our Quick Start configurations provide a trivial `Ceph configuration file`_ that +sets monitor IP addresses and daemon host names only. 
Unless you specify a +cluster network, Ceph assumes a single "public" network. Ceph functions just +fine with a public network only, but you may see significant performance +improvement with a second "cluster" network in a large cluster. + +We recommend running a Ceph Storage Cluster with two networks: a public +(front-side) network and a cluster (back-side) network. To support two networks, +each :term:`Ceph Node` will need to have more than one NIC. See `Hardware +Recommendations - Networks`_ for additional details. + +.. ditaa:: + +-------------+ + | Ceph Client | + +----*--*-----+ + | ^ + Request | : Response + v | + /----------------------------------*--*-------------------------------------\ + | Public Network | + \---*--*------------*--*-------------*--*------------*--*------------*--*---/ + ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ + | | | | | | | | | | + | : | : | : | : | : + v v v v v v v v v v + +---*--*---+ +---*--*---+ +---*--*---+ +---*--*---+ +---*--*---+ + | Ceph MON | | Ceph MDS | | Ceph OSD | | Ceph OSD | | Ceph OSD | + +----------+ +----------+ +---*--*---+ +---*--*---+ +---*--*---+ + ^ ^ ^ ^ ^ ^ + The cluster network relieves | | | | | | + OSD replication and heartbeat | : | : | : + traffic from the public network. v v v v v v + /------------------------------------*--*------------*--*------------*--*---\ + | cCCC Cluster Network | + \---------------------------------------------------------------------------/ + + +There are several reasons to consider operating two separate networks: + +#. **Performance:** Ceph OSD Daemons handle data replication for the Ceph + Clients. When Ceph OSD Daemons replicate data more than once, the network + load between Ceph OSD Daemons easily dwarfs the network load between Ceph + Clients and the Ceph Storage Cluster. This can introduce latency and + create a performance problem. Recovery and rebalancing can + also introduce significant latency on the public network. See + `Scalability and High Availability`_ for additional details on how Ceph + replicates data. See `Monitor / OSD Interaction`_ for details on heartbeat + traffic. + +#. **Security**: While most people are generally civil, a very tiny segment of + the population likes to engage in what's known as a Denial of Service (DoS) + attack. When traffic between Ceph OSD Daemons gets disrupted, placement + groups may no longer reflect an ``active + clean`` state, which may prevent + users from reading and writing data. A great way to defeat this type of + attack is to maintain a completely separate cluster network that doesn't + connect directly to the internet. Also, consider using `Message Signatures`_ + to defeat spoofing attacks. + + +IP Tables +========= + +By default, daemons `bind`_ to ports within the ``6800:7300`` range. You may +configure this range at your discretion. Before configuring your IP tables, +check the default ``iptables`` configuration. + + sudo iptables -L + +Some Linux distributions include rules that reject all inbound requests +except SSH from all network interfaces. For example:: + + REJECT all -- anywhere anywhere reject-with icmp-host-prohibited + +You will need to delete these rules on both your public and cluster networks +initially, and replace them with appropriate rules when you are ready to +harden the ports on your Ceph Nodes. + + +Monitor IP Tables +----------------- + +Ceph Monitors listen on port ``6789`` by default. Additionally, Ceph Monitors +always operate on the public network. 
When you add the rule using the example +below, make sure you replace ``{iface}`` with the public network interface +(e.g., ``eth0``, ``eth1``, etc.), ``{ip-address}`` with the IP address of the +public network and ``{netmask}`` with the netmask for the public network. :: + + sudo iptables -A INPUT -i {iface} -p tcp -s {ip-address}/{netmask} --dport 6789 -j ACCEPT + + +MDS IP Tables +------------- + +A :term:`Ceph Metadata Server` listens on the first available port on the public +network beginning at port 6800. Note that this behavior is not deterministic, so +if you are running more than one OSD or MDS on the same host, or if you restart +the daemons within a short window of time, the daemons will bind to higher +ports. You should open the entire 6800-7300 range by default. When you add the +rule using the example below, make sure you replace ``{iface}`` with the public +network interface (e.g., ``eth0``, ``eth1``, etc.), ``{ip-address}`` with the IP +address of the public network and ``{netmask}`` with the netmask of the public +network. + +For example:: + + sudo iptables -A INPUT -i {iface} -m multiport -p tcp -s {ip-address}/{netmask} --dports 6800:7300 -j ACCEPT + + +OSD IP Tables +------------- + +By default, Ceph OSD Daemons `bind`_ to the first available ports on a Ceph Node +beginning at port 6800. Note that this behavior is not deterministic, so if you +are running more than one OSD or MDS on the same host, or if you restart the +daemons within a short window of time, the daemons will bind to higher ports. +Each Ceph OSD Daemon on a Ceph Node may use up to four ports: + +#. One for talking to clients and monitors. +#. One for sending data to other OSDs. +#. Two for heartbeating on each interface. + +.. ditaa:: + /---------------\ + | OSD | + | +---+----------------+-----------+ + | | Clients & Monitors | Heartbeat | + | +---+----------------+-----------+ + | | + | +---+----------------+-----------+ + | | Data Replication | Heartbeat | + | +---+----------------+-----------+ + | cCCC | + \---------------/ + +When a daemon fails and restarts without letting go of the port, the restarted +daemon will bind to a new port. You should open the entire 6800-7300 port range +to handle this possibility. + +If you set up separate public and cluster networks, you must add rules for both +the public network and the cluster network, because clients will connect using +the public network and other Ceph OSD Daemons will connect using the cluster +network. When you add the rule using the example below, make sure you replace +``{iface}`` with the network interface (e.g., ``eth0``, ``eth1``, etc.), +``{ip-address}`` with the IP address and ``{netmask}`` with the netmask of the +public or cluster network. For example:: + + sudo iptables -A INPUT -i {iface} -m multiport -p tcp -s {ip-address}/{netmask} --dports 6800:7300 -j ACCEPT + +.. tip:: If you run Ceph Metadata Servers on the same Ceph Node as the + Ceph OSD Daemons, you can consolidate the public network configuration step. + + +Ceph Networks +============= + +To configure Ceph networks, you must add a network configuration to the +``[global]`` section of the configuration file. Our 5-minute Quick Start +provides a trivial `Ceph configuration file`_ that assumes one public network +with client and server on the same network and subnet. Ceph functions just fine +with a public network only. However, Ceph allows you to establish much more +specific criteria, including multiple IP network and subnet masks for your +public network. 
You can also establish a separate cluster network to handle OSD +heartbeat, object replication and recovery traffic. Don't confuse the IP +addresses you set in your configuration with the public-facing IP addresses +network clients may use to access your service. Typical internal IP networks are +often ``192.168.0.0`` or ``10.0.0.0``. + +.. tip:: If you specify more than one IP address and subnet mask for + either the public or the cluster network, the subnets within the network + must be capable of routing to each other. Additionally, make sure you + include each IP address/subnet in your IP tables and open ports for them + as necessary. + +.. note:: Ceph uses `CIDR`_ notation for subnets (e.g., ``10.0.0.0/24``). + +When you have configured your networks, you may restart your cluster or restart +each daemon. Ceph daemons bind dynamically, so you do not have to restart the +entire cluster at once if you change your network configuration. + + +Public Network +-------------- + +To configure a public network, add the following option to the ``[global]`` +section of your Ceph configuration file. + +.. code-block:: ini + + [global] + ... + public network = {public-network/netmask} + + +Cluster Network +--------------- + +If you declare a cluster network, OSDs will route heartbeat, object replication +and recovery traffic over the cluster network. This may improve performance +compared to using a single network. To configure a cluster network, add the +following option to the ``[global]`` section of your Ceph configuration file. + +.. code-block:: ini + + [global] + ... + cluster network = {cluster-network/netmask} + +We prefer that the cluster network is **NOT** reachable from the public network +or the Internet for added security. + + +Ceph Daemons +============ + +Ceph has one network configuration requirement that applies to all daemons: the +Ceph configuration file **MUST** specify the ``host`` for each daemon. Ceph also +requires that a Ceph configuration file specify the monitor IP address and its +port. + +.. important:: Some deployment tools (e.g., ``ceph-deploy``, Chef) may create a + configuration file for you. **DO NOT** set these values if the deployment + tool does it for you. + +.. tip:: The ``host`` setting is the short name of the host (i.e., not + an fqdn). It is **NOT** an IP address either. Enter ``hostname -s`` on + the command line to retrieve the name of the host. + + +.. code-block:: ini + + [mon.a] + + host = {hostname} + mon addr = {ip-address}:6789 + + [osd.0] + host = {hostname} + + +You do not have to set the host IP address for a daemon. If you have a static IP +configuration and both public and cluster networks running, the Ceph +configuration file may specify the IP address of the host for each daemon. To +set a static IP address for a daemon, the following option(s) should appear in +the daemon instance sections of your ``ceph.conf`` file. + +.. code-block:: ini + + [osd.0] + public addr = {host-public-ip-address} + cluster addr = {host-cluster-ip-address} + + +.. topic:: One NIC OSD in a Two Network Cluster + + Generally, we do not recommend deploying an OSD host with a single NIC in a + cluster with two networks. However, you may accomplish this by forcing the + OSD host to operate on the public network by adding a ``public addr`` entry + to the ``[osd.n]`` section of the Ceph configuration file, where ``n`` + refers to the number of the OSD with one NIC. 
Additionally, the public + network and cluster network must be able to route traffic to each other, + which we don't recommend for security reasons. + + +Network Config Settings +======================= + +Network configuration settings are not required. Ceph assumes a public network +with all hosts operating on it unless you specifically configure a cluster +network. + + +Public Network +-------------- + +The public network configuration allows you specifically define IP addresses +and subnets for the public network. You may specifically assign static IP +addresses or override ``public network`` settings using the ``public addr`` +setting for a specific daemon. + +``public network`` + +:Description: The IP address and netmask of the public (front-side) network + (e.g., ``192.168.0.0/24``). Set in ``[global]``. You may specify + comma-delimited subnets. + +:Type: ``{ip-address}/{netmask} [, {ip-address}/{netmask}]`` +:Required: No +:Default: N/A + + +``public addr`` + +:Description: The IP address for the public (front-side) network. + Set for each daemon. + +:Type: IP Address +:Required: No +:Default: N/A + + + +Cluster Network +--------------- + +The cluster network configuration allows you to declare a cluster network, and +specifically define IP addresses and subnets for the cluster network. You may +specifically assign static IP addresses or override ``cluster network`` +settings using the ``cluster addr`` setting for specific OSD daemons. + + +``cluster network`` + +:Description: The IP address and netmask of the cluster (back-side) network + (e.g., ``10.0.0.0/24``). Set in ``[global]``. You may specify + comma-delimited subnets. + +:Type: ``{ip-address}/{netmask} [, {ip-address}/{netmask}]`` +:Required: No +:Default: N/A + + +``cluster addr`` + +:Description: The IP address for the cluster (back-side) network. + Set for each daemon. + +:Type: Address +:Required: No +:Default: N/A + + +Bind +---- + +Bind settings set the default port ranges Ceph OSD and MDS daemons use. The +default range is ``6800:7300``. Ensure that your `IP Tables`_ configuration +allows you to use the configured port range. + +You may also enable Ceph daemons to bind to IPv6 addresses instead of IPv4 +addresses. + + +``ms bind port min`` + +:Description: The minimum port number to which an OSD or MDS daemon will bind. +:Type: 32-bit Integer +:Default: ``6800`` +:Required: No + + +``ms bind port max`` + +:Description: The maximum port number to which an OSD or MDS daemon will bind. +:Type: 32-bit Integer +:Default: ``7300`` +:Required: No. + + +``ms bind ipv6`` + +:Description: Enables Ceph daemons to bind to IPv6 addresses. Currently the + messenger *either* uses IPv4 or IPv6, but it cannot do both. +:Type: Boolean +:Default: ``false`` +:Required: No + +``public bind addr`` + +:Description: In some dynamic deployments the Ceph MON daemon might bind + to an IP address locally that is different from the ``public addr`` + advertised to other peers in the network. The environment must ensure + that routing rules are set correclty. If ``public bind addr`` is set + the Ceph MON daemon will bind to it locally and use ``public addr`` + in the monmaps to advertise its address to peers. This behavior is limited + to the MON daemon. + +:Type: IP Address +:Required: No +:Default: N/A + + + +Hosts +----- + +Ceph expects at least one monitor declared in the Ceph configuration file, with +a ``mon addr`` setting under each declared monitor. 
Ceph expects a ``host`` +setting under each declared monitor, metadata server and OSD in the Ceph +configuration file. Optionally, a monitor can be assigned with a priority, and +the clients will always connect to the monitor with lower value of priority if +specified. + + +``mon addr`` + +:Description: A list of ``{hostname}:{port}`` entries that clients can use to + connect to a Ceph monitor. If not set, Ceph searches ``[mon.*]`` + sections. + +:Type: String +:Required: No +:Default: N/A + +``mon priority`` + +:Description: The priority of the declared monitor, the lower value the more + prefered when a client selects a monitor when trying to connect + to the cluster. + +:Type: Unsigned 16-bit Integer +:Required: No +:Default: 0 + +``host`` + +:Description: The hostname. Use this setting for specific daemon instances + (e.g., ``[osd.0]``). + +:Type: String +:Required: Yes, for daemon instances. +:Default: ``localhost`` + +.. tip:: Do not use ``localhost``. To get your host name, execute + ``hostname -s`` on your command line and use the name of your host + (to the first period, not the fully-qualified domain name). + +.. important:: You should not specify any value for ``host`` when using a third + party deployment system that retrieves the host name for you. + + + +TCP +--- + +Ceph disables TCP buffering by default. + + +``ms tcp nodelay`` + +:Description: Ceph enables ``ms tcp nodelay`` so that each request is sent + immediately (no buffering). Disabling `Nagle's algorithm`_ + increases network traffic, which can introduce latency. If you + experience large numbers of small packets, you may try + disabling ``ms tcp nodelay``. + +:Type: Boolean +:Required: No +:Default: ``true`` + + + +``ms tcp rcvbuf`` + +:Description: The size of the socket buffer on the receiving end of a network + connection. Disable by default. + +:Type: 32-bit Integer +:Required: No +:Default: ``0`` + + + +``ms tcp read timeout`` + +:Description: If a client or daemon makes a request to another Ceph daemon and + does not drop an unused connection, the ``ms tcp read timeout`` + defines the connection as idle after the specified number + of seconds. + +:Type: Unsigned 64-bit Integer +:Required: No +:Default: ``900`` 15 minutes. + + + +.. _Scalability and High Availability: ../../../architecture#scalability-and-high-availability +.. _Hardware Recommendations - Networks: ../../../start/hardware-recommendations#networks +.. _Ceph configuration file: ../../../start/quick-ceph-deploy/#create-a-cluster +.. _hardware recommendations: ../../../start/hardware-recommendations +.. _Monitor / OSD Interaction: ../mon-osd-interaction +.. _Message Signatures: ../auth-config-ref#signatures +.. _CIDR: http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing +.. _Nagle's Algorithm: http://en.wikipedia.org/wiki/Nagle's_algorithm diff --git a/src/ceph/doc/rados/configuration/osd-config-ref.rst b/src/ceph/doc/rados/configuration/osd-config-ref.rst new file mode 100644 index 0000000..fae7078 --- /dev/null +++ b/src/ceph/doc/rados/configuration/osd-config-ref.rst @@ -0,0 +1,1105 @@ +====================== + OSD Config Reference +====================== + +.. index:: OSD; configuration + +You can configure Ceph OSD Daemons in the Ceph configuration file, but Ceph OSD +Daemons can use the default values and a very minimal configuration. A minimal +Ceph OSD Daemon configuration sets ``osd journal size`` and ``host``, and +uses default values for nearly everything else. 
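+
+If you want to verify which values a running daemon is actually using for any
+of the options below, you can ask it through its admin socket on the node where
+it runs. The commands below are only a sketch; they assume a daemon named
+``osd.0`` and the default admin socket location::
+
+    ceph daemon osd.0 config show | grep osd_journal_size
+    ceph daemon osd.0 config get osd_journal_size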
+ +Ceph OSD Daemons are numerically identified in incremental fashion, beginning +with ``0`` using the following convention. :: + + osd.0 + osd.1 + osd.2 + +In a configuration file, you may specify settings for all Ceph OSD Daemons in +the cluster by adding configuration settings to the ``[osd]`` section of your +configuration file. To add settings directly to a specific Ceph OSD Daemon +(e.g., ``host``), enter it in an OSD-specific section of your configuration +file. For example: + +.. code-block:: ini + + [osd] + osd journal size = 1024 + + [osd.0] + host = osd-host-a + + [osd.1] + host = osd-host-b + + +.. index:: OSD; config settings + +General Settings +================ + +The following settings provide an Ceph OSD Daemon's ID, and determine paths to +data and journals. Ceph deployment scripts typically generate the UUID +automatically. We **DO NOT** recommend changing the default paths for data or +journals, as it makes it more problematic to troubleshoot Ceph later. + +The journal size should be at least twice the product of the expected drive +speed multiplied by ``filestore max sync interval``. However, the most common +practice is to partition the journal drive (often an SSD), and mount it such +that Ceph uses the entire partition for the journal. + + +``osd uuid`` + +:Description: The universally unique identifier (UUID) for the Ceph OSD Daemon. +:Type: UUID +:Default: The UUID. +:Note: The ``osd uuid`` applies to a single Ceph OSD Daemon. The ``fsid`` + applies to the entire cluster. + + +``osd data`` + +:Description: The path to the OSDs data. You must create the directory when + deploying Ceph. You should mount a drive for OSD data at this + mount point. We do not recommend changing the default. + +:Type: String +:Default: ``/var/lib/ceph/osd/$cluster-$id`` + + +``osd max write size`` + +:Description: The maximum size of a write in megabytes. +:Type: 32-bit Integer +:Default: ``90`` + + +``osd client message size cap`` + +:Description: The largest client data message allowed in memory. +:Type: 64-bit Unsigned Integer +:Default: 500MB default. ``500*1024L*1024L`` + + +``osd class dir`` + +:Description: The class path for RADOS class plug-ins. +:Type: String +:Default: ``$libdir/rados-classes`` + + +.. index:: OSD; file system + +File System Settings +==================== +Ceph builds and mounts file systems which are used for Ceph OSDs. + +``osd mkfs options {fs-type}`` + +:Description: Options used when creating a new Ceph OSD of type {fs-type}. + +:Type: String +:Default for xfs: ``-f -i 2048`` +:Default for other file systems: {empty string} + +For example:: + ``osd mkfs options xfs = -f -d agcount=24`` + +``osd mount options {fs-type}`` + +:Description: Options used when mounting a Ceph OSD of type {fs-type}. + +:Type: String +:Default for xfs: ``rw,noatime,inode64`` +:Default for other file systems: ``rw, noatime`` + +For example:: + ``osd mount options xfs = rw, noatime, inode64, logbufs=8`` + + +.. index:: OSD; journal settings + +Journal Settings +================ + +By default, Ceph expects that you will store an Ceph OSD Daemons journal with +the following path:: + + /var/lib/ceph/osd/$cluster-$id/journal + +Without performance optimization, Ceph stores the journal on the same disk as +the Ceph OSD Daemons data. An Ceph OSD Daemon optimized for performance may use +a separate disk to store journal data (e.g., a solid state drive delivers high +performance journaling). 
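+
+For example, to point a single OSD's journal at a partition on a separate SSD,
+you could add something like the following to ``ceph.conf``. This is only an
+illustration; the device path ``/dev/sdb1`` is an assumption and will differ on
+your hardware.
+
+.. code-block:: ini
+
+    [osd.0]
+    osd journal = /dev/sdb1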
+ +Ceph's default ``osd journal size`` is 0, so you will need to set this in your +``ceph.conf`` file. A journal size should find the product of the ``filestore +max sync interval`` and the expected throughput, and multiply the product by +two (2):: + + osd journal size = {2 * (expected throughput * filestore max sync interval)} + +The expected throughput number should include the expected disk throughput +(i.e., sustained data transfer rate), and network throughput. For example, +a 7200 RPM disk will likely have approximately 100 MB/s. Taking the ``min()`` +of the disk and network throughput should provide a reasonable expected +throughput. Some users just start off with a 10GB journal size. For +example:: + + osd journal size = 10000 + + +``osd journal`` + +:Description: The path to the OSD's journal. This may be a path to a file or a + block device (such as a partition of an SSD). If it is a file, + you must create the directory to contain it. We recommend using a + drive separate from the ``osd data`` drive. + +:Type: String +:Default: ``/var/lib/ceph/osd/$cluster-$id/journal`` + + +``osd journal size`` + +:Description: The size of the journal in megabytes. If this is 0, and the + journal is a block device, the entire block device is used. + Since v0.54, this is ignored if the journal is a block device, + and the entire block device is used. + +:Type: 32-bit Integer +:Default: ``5120`` +:Recommended: Begin with 1GB. Should be at least twice the product of the + expected speed multiplied by ``filestore max sync interval``. + + +See `Journal Config Reference`_ for additional details. + + +Monitor OSD Interaction +======================= + +Ceph OSD Daemons check each other's heartbeats and report to monitors +periodically. Ceph can use default values in many cases. However, if your +network has latency issues, you may need to adopt longer intervals. See +`Configuring Monitor/OSD Interaction`_ for a detailed discussion of heartbeats. + + +Data Placement +============== + +See `Pool & PG Config Reference`_ for details. + + +.. index:: OSD; scrubbing + +Scrubbing +========= + +In addition to making multiple copies of objects, Ceph insures data integrity by +scrubbing placement groups. Ceph scrubbing is analogous to ``fsck`` on the +object storage layer. For each placement group, Ceph generates a catalog of all +objects and compares each primary object and its replicas to ensure that no +objects are missing or mismatched. Light scrubbing (daily) checks the object +size and attributes. Deep scrubbing (weekly) reads the data and uses checksums +to ensure data integrity. + +Scrubbing is important for maintaining data integrity, but it can reduce +performance. You can adjust the following settings to increase or decrease +scrubbing operations. + + +``osd max scrubs`` + +:Description: The maximum number of simultaneous scrub operations for + a Ceph OSD Daemon. + +:Type: 32-bit Int +:Default: ``1`` + +``osd scrub begin hour`` + +:Description: The time of day for the lower bound when a scheduled scrub can be + performed. +:Type: Integer in the range of 0 to 24 +:Default: ``0`` + + +``osd scrub end hour`` + +:Description: The time of day for the upper bound when a scheduled scrub can be + performed. Along with ``osd scrub begin hour``, they define a time + window, in which the scrubs can happen. But a scrub will be performed + no matter the time window allows or not, as long as the placement + group's scrub interval exceeds ``osd scrub max interval``. 
+:Type: Integer in the range of 0 to 24 +:Default: ``24`` + + +``osd scrub during recovery`` + +:Description: Allow scrub during recovery. Setting this to ``false`` will disable + scheduling new scrub (and deep--scrub) while there is active recovery. + Already running scrubs will be continued. This might be useful to reduce + load on busy clusters. +:Type: Boolean +:Default: ``true`` + + +``osd scrub thread timeout`` + +:Description: The maximum time in seconds before timing out a scrub thread. +:Type: 32-bit Integer +:Default: ``60`` + + +``osd scrub finalize thread timeout`` + +:Description: The maximum time in seconds before timing out a scrub finalize + thread. + +:Type: 32-bit Integer +:Default: ``60*10`` + + +``osd scrub load threshold`` + +:Description: The maximum load. Ceph will not scrub when the system load + (as defined by ``getloadavg()``) is higher than this number. + Default is ``0.5``. + +:Type: Float +:Default: ``0.5`` + + +``osd scrub min interval`` + +:Description: The minimal interval in seconds for scrubbing the Ceph OSD Daemon + when the Ceph Storage Cluster load is low. + +:Type: Float +:Default: Once per day. ``60*60*24`` + + +``osd scrub max interval`` + +:Description: The maximum interval in seconds for scrubbing the Ceph OSD Daemon + irrespective of cluster load. + +:Type: Float +:Default: Once per week. ``7*60*60*24`` + + +``osd scrub chunk min`` + +:Description: The minimal number of object store chunks to scrub during single operation. + Ceph blocks writes to single chunk during scrub. + +:Type: 32-bit Integer +:Default: 5 + + +``osd scrub chunk max`` + +:Description: The maximum number of object store chunks to scrub during single operation. + +:Type: 32-bit Integer +:Default: 25 + + +``osd scrub sleep`` + +:Description: Time to sleep before scrubbing next group of chunks. Increasing this value will slow + down whole scrub operation while client operations will be less impacted. + +:Type: Float +:Default: 0 + + +``osd deep scrub interval`` + +:Description: The interval for "deep" scrubbing (fully reading all data). The + ``osd scrub load threshold`` does not affect this setting. + +:Type: Float +:Default: Once per week. ``60*60*24*7`` + + +``osd scrub interval randomize ratio`` + +:Description: Add a random delay to ``osd scrub min interval`` when scheduling + the next scrub job for a placement group. The delay is a random + value less than ``osd scrub min interval`` \* + ``osd scrub interval randomized ratio``. So the default setting + practically randomly spreads the scrubs out in the allowed time + window of ``[1, 1.5]`` \* ``osd scrub min interval``. +:Type: Float +:Default: ``0.5`` + +``osd deep scrub stride`` + +:Description: Read size when doing a deep scrub. +:Type: 32-bit Integer +:Default: 512 KB. ``524288`` + + +.. index:: OSD; operations settings + +Operations +========== + +Operations settings allow you to configure the number of threads for servicing +requests. If you set ``osd op threads`` to ``0``, it disables multi-threading. +By default, Ceph uses two threads with a 30 second timeout and a 30 second +complaint time if an operation doesn't complete within those time parameters. +You can set operations priority weights between client operations and +recovery operations to ensure optimal performance during recovery. + + +``osd op threads`` + +:Description: The number of threads to service Ceph OSD Daemon operations. + Set to ``0`` to disable it. Increasing the number may increase + the request processing rate. 
+ +:Type: 32-bit Integer +:Default: ``2`` + + +``osd op queue`` + +:Description: This sets the type of queue to be used for prioritizing ops + in the OSDs. Both queues feature a strict sub-queue which is + dequeued before the normal queue. The normal queue is different + between implementations. The original PrioritizedQueue (``prio``) uses a + token bucket system which when there are sufficient tokens will + dequeue high priority queues first. If there are not enough + tokens available, queues are dequeued low priority to high priority. + The WeightedPriorityQueue (``wpq``) dequeues all priorities in + relation to their priorities to prevent starvation of any queue. + WPQ should help in cases where a few OSDs are more overloaded + than others. The new mClock based OpClassQueue + (``mclock_opclass``) prioritizes operations based on which class + they belong to (recovery, scrub, snaptrim, client op, osd subop). + And, the mClock based ClientQueue (``mclock_client``) also + incorporates the client identifier in order to promote fairness + between clients. See `QoS Based on mClock`_. Requires a restart. + +:Type: String +:Valid Choices: prio, wpq, mclock_opclass, mclock_client +:Default: ``prio`` + + +``osd op queue cut off`` + +:Description: This selects which priority ops will be sent to the strict + queue verses the normal queue. The ``low`` setting sends all + replication ops and higher to the strict queue, while the ``high`` + option sends only replication acknowledgement ops and higher to + the strict queue. Setting this to ``high`` should help when a few + OSDs in the cluster are very busy especially when combined with + ``wpq`` in the ``osd op queue`` setting. OSDs that are very busy + handling replication traffic could starve primary client traffic + on these OSDs without these settings. Requires a restart. + +:Type: String +:Valid Choices: low, high +:Default: ``low`` + + +``osd client op priority`` + +:Description: The priority set for client operations. It is relative to + ``osd recovery op priority``. + +:Type: 32-bit Integer +:Default: ``63`` +:Valid Range: 1-63 + + +``osd recovery op priority`` + +:Description: The priority set for recovery operations. It is relative to + ``osd client op priority``. + +:Type: 32-bit Integer +:Default: ``3`` +:Valid Range: 1-63 + + +``osd scrub priority`` + +:Description: The priority set for scrub operations. It is relative to + ``osd client op priority``. + +:Type: 32-bit Integer +:Default: ``5`` +:Valid Range: 1-63 + + +``osd snap trim priority`` + +:Description: The priority set for snap trim operations. It is relative to + ``osd client op priority``. + +:Type: 32-bit Integer +:Default: ``5`` +:Valid Range: 1-63 + + +``osd op thread timeout`` + +:Description: The Ceph OSD Daemon operation thread timeout in seconds. +:Type: 32-bit Integer +:Default: ``15`` + + +``osd op complaint time`` + +:Description: An operation becomes complaint worthy after the specified number + of seconds have elapsed. + +:Type: Float +:Default: ``30`` + + +``osd disk threads`` + +:Description: The number of disk threads, which are used to perform background + disk intensive OSD operations such as scrubbing and snap + trimming. + +:Type: 32-bit Integer +:Default: ``1`` + +``osd disk thread ioprio class`` + +:Description: Warning: it will only be used if both ``osd disk thread + ioprio class`` and ``osd disk thread ioprio priority`` are + set to a non default value. Sets the ioprio_set(2) I/O + scheduling ``class`` for the disk thread. 
Acceptable
+              values are ``idle``, ``be`` or ``rt``. The ``idle``
+              class means the disk thread will have lower priority
+              than any other thread in the OSD. This is useful to slow
+              down scrubbing on an OSD that is busy handling client
+              operations. ``be`` is the default and is the same
+              priority as all other threads in the OSD. ``rt`` means
+              the disk thread will have precedence over all other
+              threads in the OSD. Note: Only works with the Linux Kernel
+              CFQ scheduler. Since Jewel, scrubbing is no longer carried
+              out by the disk iothread; see the osd priority options instead.
+:Type: String
+:Default: the empty string
+
+``osd disk thread ioprio priority``
+
+:Description: Warning: it will only be used if both ``osd disk thread
+              ioprio class`` and ``osd disk thread ioprio priority`` are
+              set to a non default value. It sets the ioprio_set(2)
+              I/O scheduling ``priority`` of the disk thread, ranging
+              from 0 (highest) to 7 (lowest). If all OSDs on a given
+              host were in class ``idle`` and compete for I/O
+              (i.e. due to controller congestion), it can be used to
+              lower the disk thread priority of one OSD to 7 so that
+              another OSD with priority 0 can have priority.
+              Note: Only works with the Linux Kernel CFQ scheduler.
+:Type: Integer in the range of 0 to 7 or -1 if not to be used.
+:Default: ``-1``
+
+``osd op history size``
+
+:Description: The maximum number of completed operations to track.
+:Type: 32-bit Unsigned Integer
+:Default: ``20``
+
+
+``osd op history duration``
+
+:Description: The age in seconds of the oldest completed operation to track.
+:Type: 32-bit Unsigned Integer
+:Default: ``600``
+
+
+``osd op log threshold``
+
+:Description: How many operation logs to display at once.
+:Type: 32-bit Integer
+:Default: ``5``
+
+
+QoS Based on mClock
+-------------------
+
+Ceph's use of mClock is currently in the experimental phase and should
+be approached with an exploratory mindset.
+
+Core Concepts
+`````````````
+
+The QoS support of Ceph is implemented using a queueing scheduler
+based on `the dmClock algorithm`_. This algorithm allocates the I/O
+resources of the Ceph cluster in proportion to weights, and enforces
+the constraints of minimum reservation and maximum limitation, so that
+the services can compete for the resources fairly. Currently the
+*mclock_opclass* operation queue divides Ceph services involving I/O
+resources into the following buckets:
+
+- client op: the iops issued by a client
+- osd subop: the iops issued by a primary OSD
+- snap trim: the snap trimming related requests
+- pg recovery: the recovery related requests
+- pg scrub: the scrub related requests
+
+The resources are partitioned using the following three sets of tags. In other
+words, the share of each type of service is controlled by three tags:
+
+#. reservation: the minimum IOPS allocated for the service.
+#. limitation: the maximum IOPS allocated for the service.
+#. weight: the proportional share of capacity if extra capacity is available
+   or the system is oversubscribed.
+
+In Ceph, operations are graded with a "cost", and the resources allocated
+for serving the various services are consumed by these "costs". So, for
+example, the more reservation a service has, the more resource it is
+guaranteed to possess, as long as it requires it. Assuming there are two
+services, recovery and client ops, with the following settings:
+
+- recovery: (r:1, l:5, w:1)
+- client ops: (r:2, l:0, w:9)
+
+The settings above ensure that recovery will not have more than 5
+requests per second serviced, even if it asks for more (see the CURRENT
+IMPLEMENTATION NOTE below) and no other service is competing with it.
+Conversely, if the clients start to issue a large number of I/O
+requests, they will not exhaust all of the I/O resources: 1 request per
+second is always reserved for recovery jobs as long as there are any
+such requests, so recovery jobs will not be starved even in a cluster
+under high load. In the meantime, the client ops can enjoy a larger
+portion of the I/O resource, because their weight is "9" while their
+competitor's is "1". In the case of client ops, they are not clamped by
+the limit setting, so they can make use of all the resources if there is
+no recovery ongoing.
+
+Along with *mclock_opclass*, another mclock operation queue named
+*mclock_client* is available. It divides operations based on category,
+but also divides them based on the client making the request. This
+helps not only to manage the distribution of resources spent on different
+classes of operations but also tries to ensure fairness among clients.
+
+CURRENT IMPLEMENTATION NOTE: the current experimental implementation
+does not enforce the limit values. As a first approximation we decided
+not to prevent operations that would otherwise enter the operation
+sequencer from doing so.
+
+Subtleties of mClock
+````````````````````
+
+The reservation and limit values have a unit of requests per
+second. The weight, however, does not technically have a unit and the
+weights are relative to one another. So if one class of requests has a
+weight of 1 and another a weight of 9, then the latter class of
+requests should have its requests executed at a 9 to 1 ratio relative
+to the first class. However, that will only happen once the
+reservations are met, and those values include the operations executed
+under the reservation phase.
+
+Even though the weights do not have units, one must be careful in
+choosing their values due to how the algorithm assigns weight tags to
+requests. If the weight is *W*, then for a given class of requests,
+the next one that comes in will have a weight tag of *1/W* plus the
+previous weight tag or the current time, whichever is larger. That
+means if *W* is sufficiently large and therefore *1/W* is sufficiently
+small, the calculated tag may never be assigned as it will get a value
+of the current time. The ultimate lesson is that values for weight
+should not be too large. They should be under the number of requests
+one expects to be serviced each second.
+
+Caveats
+```````
+
+There are some factors that can reduce the impact of the mClock op
+queues within Ceph. First, requests to an OSD are sharded by their
+placement group identifier. Each shard has its own mClock queue and
+these queues neither interact nor share information among them. The
+number of shards can be controlled with the configuration options
+``osd_op_num_shards``, ``osd_op_num_shards_hdd``, and
+``osd_op_num_shards_ssd``. A lower number of shards will increase the
+impact of the mClock queues, but may have other deleterious effects.
+
+Second, requests are transferred from the operation queue to the
+operation sequencer, in which they go through the phases of
+execution. The operation queue is where mClock resides and mClock
+determines the next op to transfer to the operation sequencer.
The +number of operations allowed in the operation sequencer is a complex +issue. In general we want to keep enough operations in the sequencer +so it's always getting work done on some operations while it's waiting +for disk and network access to complete on other operations. On the +other hand, once an operation is transferred to the operation +sequencer, mClock no longer has control over it. Therefore to maximize +the impact of mClock, we want to keep as few operations in the +operation sequencer as possible. So we have an inherent tension. + +The configuration options that influence the number of operations in +the operation sequencer are ``bluestore_throttle_bytes``, +``bluestore_throttle_deferred_bytes``, +``bluestore_throttle_cost_per_io``, +``bluestore_throttle_cost_per_io_hdd``, and +``bluestore_throttle_cost_per_io_ssd``. + +A third factor that affects the impact of the mClock algorithm is that +we're using a distributed system, where requests are made to multiple +OSDs and each OSD has (can have) multiple shards. Yet we're currently +using the mClock algorithm, which is not distributed (note: dmClock is +the distributed version of mClock). + +Various organizations and individuals are currently experimenting with +mClock as it exists in this code base along with their modifications +to the code base. We hope you'll share you're experiences with your +mClock and dmClock experiments in the ceph-devel mailing list. + + +``osd push per object cost`` + +:Description: the overhead for serving a push op + +:Type: Unsigned Integer +:Default: 1000 + +``osd recovery max chunk`` + +:Description: the maximum total size of data chunks a recovery op can carry. + +:Type: Unsigned Integer +:Default: 8 MiB + + +``osd op queue mclock client op res`` + +:Description: the reservation of client op. + +:Type: Float +:Default: 1000.0 + + +``osd op queue mclock client op wgt`` + +:Description: the weight of client op. + +:Type: Float +:Default: 500.0 + + +``osd op queue mclock client op lim`` + +:Description: the limit of client op. + +:Type: Float +:Default: 1000.0 + + +``osd op queue mclock osd subop res`` + +:Description: the reservation of osd subop. + +:Type: Float +:Default: 1000.0 + + +``osd op queue mclock osd subop wgt`` + +:Description: the weight of osd subop. + +:Type: Float +:Default: 500.0 + + +``osd op queue mclock osd subop lim`` + +:Description: the limit of osd subop. + +:Type: Float +:Default: 0.0 + + +``osd op queue mclock snap res`` + +:Description: the reservation of snap trimming. + +:Type: Float +:Default: 0.0 + + +``osd op queue mclock snap wgt`` + +:Description: the weight of snap trimming. + +:Type: Float +:Default: 1.0 + + +``osd op queue mclock snap lim`` + +:Description: the limit of snap trimming. + +:Type: Float +:Default: 0.001 + + +``osd op queue mclock recov res`` + +:Description: the reservation of recovery. + +:Type: Float +:Default: 0.0 + + +``osd op queue mclock recov wgt`` + +:Description: the weight of recovery. + +:Type: Float +:Default: 1.0 + + +``osd op queue mclock recov lim`` + +:Description: the limit of recovery. + +:Type: Float +:Default: 0.001 + + +``osd op queue mclock scrub res`` + +:Description: the reservation of scrub jobs. + +:Type: Float +:Default: 0.0 + + +``osd op queue mclock scrub wgt`` + +:Description: the weight of scrub jobs. + +:Type: Float +:Default: 1.0 + + +``osd op queue mclock scrub lim`` + +:Description: the limit of scrub jobs. + +:Type: Float +:Default: 0.001 + +.. 
_the dmClock algorithm: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf + + +.. index:: OSD; backfilling + +Backfilling +=========== + +When you add or remove Ceph OSD Daemons to a cluster, the CRUSH algorithm will +want to rebalance the cluster by moving placement groups to or from Ceph OSD +Daemons to restore the balance. The process of migrating placement groups and +the objects they contain can reduce the cluster's operational performance +considerably. To maintain operational performance, Ceph performs this migration +with 'backfilling', which allows Ceph to set backfill operations to a lower +priority than requests to read or write data. + + +``osd max backfills`` + +:Description: The maximum number of backfills allowed to or from a single OSD. +:Type: 64-bit Unsigned Integer +:Default: ``1`` + + +``osd backfill scan min`` + +:Description: The minimum number of objects per backfill scan. + +:Type: 32-bit Integer +:Default: ``64`` + + +``osd backfill scan max`` + +:Description: The maximum number of objects per backfill scan. + +:Type: 32-bit Integer +:Default: ``512`` + + +``osd backfill retry interval`` + +:Description: The number of seconds to wait before retrying backfill requests. +:Type: Double +:Default: ``10.0`` + +.. index:: OSD; osdmap + +OSD Map +======= + +OSD maps reflect the OSD daemons operating in the cluster. Over time, the +number of map epochs increases. Ceph provides some settings to ensure that +Ceph performs well as the OSD map grows larger. + + +``osd map dedup`` + +:Description: Enable removing duplicates in the OSD map. +:Type: Boolean +:Default: ``true`` + + +``osd map cache size`` + +:Description: The number of OSD maps to keep cached. +:Type: 32-bit Integer +:Default: ``500`` + + +``osd map cache bl size`` + +:Description: The size of the in-memory OSD map cache in OSD daemons. +:Type: 32-bit Integer +:Default: ``50`` + + +``osd map cache bl inc size`` + +:Description: The size of the in-memory OSD map cache incrementals in + OSD daemons. + +:Type: 32-bit Integer +:Default: ``100`` + + +``osd map message max`` + +:Description: The maximum map entries allowed per MOSDMap message. +:Type: 32-bit Integer +:Default: ``100`` + + + +.. index:: OSD; recovery + +Recovery +======== + +When the cluster starts or when a Ceph OSD Daemon crashes and restarts, the OSD +begins peering with other Ceph OSD Daemons before writes can occur. See +`Monitoring OSDs and PGs`_ for details. + +If a Ceph OSD Daemon crashes and comes back online, usually it will be out of +sync with other Ceph OSD Daemons containing more recent versions of objects in +the placement groups. When this happens, the Ceph OSD Daemon goes into recovery +mode and seeks to get the latest copy of the data and bring its map back up to +date. Depending upon how long the Ceph OSD Daemon was down, the OSD's objects +and placement groups may be significantly out of date. Also, if a failure domain +went down (e.g., a rack), more than one Ceph OSD Daemon may come back online at +the same time. This can make the recovery process time consuming and resource +intensive. + +To maintain operational performance, Ceph performs recovery with limitations on +the number recovery requests, threads and object chunk sizes which allows Ceph +perform well in a degraded state. + + +``osd recovery delay start`` + +:Description: After peering completes, Ceph will delay for the specified number + of seconds before starting to recover objects. 
+ +:Type: Float +:Default: ``0`` + + +``osd recovery max active`` + +:Description: The number of active recovery requests per OSD at one time. More + requests will accelerate recovery, but the requests places an + increased load on the cluster. + +:Type: 32-bit Integer +:Default: ``3`` + + +``osd recovery max chunk`` + +:Description: The maximum size of a recovered chunk of data to push. +:Type: 64-bit Unsigned Integer +:Default: ``8 << 20`` + + +``osd recovery max single start`` + +:Description: The maximum number of recovery operations per OSD that will be + newly started when an OSD is recovering. +:Type: 64-bit Unsigned Integer +:Default: ``1`` + + +``osd recovery thread timeout`` + +:Description: The maximum time in seconds before timing out a recovery thread. +:Type: 32-bit Integer +:Default: ``30`` + + +``osd recover clone overlap`` + +:Description: Preserves clone overlap during recovery. Should always be set + to ``true``. + +:Type: Boolean +:Default: ``true`` + + +``osd recovery sleep`` + +:Description: Time in seconds to sleep before next recovery or backfill op. + Increasing this value will slow down recovery operation while + client operations will be less impacted. + +:Type: Float +:Default: ``0`` + + +``osd recovery sleep hdd`` + +:Description: Time in seconds to sleep before next recovery or backfill op + for HDDs. + +:Type: Float +:Default: ``0.1`` + + +``osd recovery sleep ssd`` + +:Description: Time in seconds to sleep before next recovery or backfill op + for SSDs. + +:Type: Float +:Default: ``0`` + + +``osd recovery sleep hybrid`` + +:Description: Time in seconds to sleep before next recovery or backfill op + when osd data is on HDD and osd journal is on SSD. + +:Type: Float +:Default: ``0.025`` + +Tiering +======= + +``osd agent max ops`` + +:Description: The maximum number of simultaneous flushing ops per tiering agent + in the high speed mode. +:Type: 32-bit Integer +:Default: ``4`` + + +``osd agent max low ops`` + +:Description: The maximum number of simultaneous flushing ops per tiering agent + in the low speed mode. +:Type: 32-bit Integer +:Default: ``2`` + +See `cache target dirty high ratio`_ for when the tiering agent flushes dirty +objects within the high speed mode. + +Miscellaneous +============= + + +``osd snap trim thread timeout`` + +:Description: The maximum time in seconds before timing out a snap trim thread. +:Type: 32-bit Integer +:Default: ``60*60*1`` + + +``osd backlog thread timeout`` + +:Description: The maximum time in seconds before timing out a backlog thread. +:Type: 32-bit Integer +:Default: ``60*60*1`` + + +``osd default notify timeout`` + +:Description: The OSD default notification timeout (in seconds). +:Type: 32-bit Unsigned Integer +:Default: ``30`` + + +``osd check for log corruption`` + +:Description: Check log files for corruption. Can be computationally expensive. +:Type: Boolean +:Default: ``false`` + + +``osd remove thread timeout`` + +:Description: The maximum time in seconds before timing out a remove OSD thread. +:Type: 32-bit Integer +:Default: ``60*60`` + + +``osd command thread timeout`` + +:Description: The maximum time in seconds before timing out a command thread. +:Type: 32-bit Integer +:Default: ``10*60`` + + +``osd command max records`` + +:Description: Limits the number of lost objects to return. +:Type: 32-bit Integer +:Default: ``256`` + + +``osd auto upgrade tmap`` + +:Description: Uses ``tmap`` for ``omap`` on old objects. 
+:Type: Boolean +:Default: ``true`` + + +``osd tmapput sets users tmap`` + +:Description: Uses ``tmap`` for debugging only. +:Type: Boolean +:Default: ``false`` + + +``osd fast fail on connection refused`` + +:Description: If this option is enabled, crashed OSDs are marked down + immediately by connected peers and MONs (assuming that the + crashed OSD host survives). Disable it to restore old + behavior, at the expense of possible long I/O stalls when + OSDs crash in the middle of I/O operations. +:Type: Boolean +:Default: ``true`` + + + +.. _pool: ../../operations/pools +.. _Configuring Monitor/OSD Interaction: ../mon-osd-interaction +.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering +.. _Pool & PG Config Reference: ../pool-pg-config-ref +.. _Journal Config Reference: ../journal-ref +.. _cache target dirty high ratio: ../../operations/pools#cache-target-dirty-high-ratio diff --git a/src/ceph/doc/rados/configuration/pool-pg-config-ref.rst b/src/ceph/doc/rados/configuration/pool-pg-config-ref.rst new file mode 100644 index 0000000..89a3707 --- /dev/null +++ b/src/ceph/doc/rados/configuration/pool-pg-config-ref.rst @@ -0,0 +1,270 @@ +====================================== + Pool, PG and CRUSH Config Reference +====================================== + +.. index:: pools; configuration + +When you create pools and set the number of placement groups for the pool, Ceph +uses default values when you don't specifically override the defaults. **We +recommend** overridding some of the defaults. Specifically, we recommend setting +a pool's replica size and overriding the default number of placement groups. You +can specifically set these values when running `pool`_ commands. You can also +override the defaults by adding new ones in the ``[global]`` section of your +Ceph configuration file. + + +.. literalinclude:: pool-pg.conf + :language: ini + + + +``mon max pool pg num`` + +:Description: The maximum number of placement groups per pool. +:Type: Integer +:Default: ``65536`` + + +``mon pg create interval`` + +:Description: Number of seconds between PG creation in the same + Ceph OSD Daemon. + +:Type: Float +:Default: ``30.0`` + + +``mon pg stuck threshold`` + +:Description: Number of seconds after which PGs can be considered as + being stuck. + +:Type: 32-bit Integer +:Default: ``300`` + +``mon pg min inactive`` + +:Description: Issue a ``HEALTH_ERR`` in cluster log if the number of PGs stay + inactive longer than ``mon_pg_stuck_threshold`` exceeds this + setting. A non-positive number means disabled, never go into ERR. +:Type: Integer +:Default: ``1`` + + +``mon pg warn min per osd`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log if the average number + of PGs per (in) OSD is under this number. (a non-positive number + disables this) +:Type: Integer +:Default: ``30`` + + +``mon pg warn max per osd`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log if the average number + of PGs per (in) OSD is above this number. (a non-positive number + disables this) +:Type: Integer +:Default: ``300`` + + +``mon pg warn min objects`` + +:Description: Do not warn if the total number of objects in cluster is below + this number +:Type: Integer +:Default: ``1000`` + + +``mon pg warn min pool objects`` + +:Description: Do not warn on pools whose object number is below this number +:Type: Integer +:Default: ``1000`` + + +``mon pg check down all threshold`` + +:Description: Threshold of down OSDs percentage after which we check all PGs + for stale ones. 
+:Type: Float +:Default: ``0.5`` + + +``mon pg warn max object skew`` + +:Description: Issue a ``HEALTH_WARN`` in cluster log if the average object number + of a certain pool is greater than ``mon pg warn max object skew`` times + the average object number of the whole pool. (a non-positive number + disables this) +:Type: Float +:Default: ``10`` + + +``mon delta reset interval`` + +:Description: Seconds of inactivity before we reset the pg delta to 0. We keep + track of the delta of the used space of each pool, so, for + example, it would be easier for us to understand the progress of + recovery or the performance of cache tier. But if there's no + activity reported for a certain pool, we just reset the history of + deltas of that pool. +:Type: Integer +:Default: ``10`` + + +``mon osd max op age`` + +:Description: Maximum op age before we get concerned (make it a power of 2). + A ``HEALTH_WARN`` will be issued if a request has been blocked longer + than this limit. +:Type: Float +:Default: ``32.0`` + + +``osd pg bits`` + +:Description: Placement group bits per Ceph OSD Daemon. +:Type: 32-bit Integer +:Default: ``6`` + + +``osd pgp bits`` + +:Description: The number of bits per Ceph OSD Daemon for PGPs. +:Type: 32-bit Integer +:Default: ``6`` + + +``osd crush chooseleaf type`` + +:Description: The bucket type to use for ``chooseleaf`` in a CRUSH rule. Uses + ordinal rank rather than name. + +:Type: 32-bit Integer +:Default: ``1``. Typically a host containing one or more Ceph OSD Daemons. + + +``osd crush initial weight`` + +:Description: The initial crush weight for newly added osds into crushmap. + +:Type: Double +:Default: ``the size of newly added osd in TB``. By default, the initial crush + weight for the newly added osd is set to its volume size in TB. + See `Weighting Bucket Items`_ for details. + + +``osd pool default crush replicated ruleset`` + +:Description: The default CRUSH ruleset to use when creating a replicated pool. +:Type: 8-bit Integer +:Default: ``CEPH_DEFAULT_CRUSH_REPLICATED_RULESET``, which means "pick + a ruleset with the lowest numerical ID and use that". This is to + make pool creation work in the absence of ruleset 0. + + +``osd pool erasure code stripe unit`` + +:Description: Sets the default size, in bytes, of a chunk of an object + stripe for erasure coded pools. Every object of size S + will be stored as N stripes, with each data chunk + receiving ``stripe unit`` bytes. Each stripe of ``N * + stripe unit`` bytes will be encoded/decoded + individually. This option can is overridden by the + ``stripe_unit`` setting in an erasure code profile. + +:Type: Unsigned 32-bit Integer +:Default: ``4096`` + + +``osd pool default size`` + +:Description: Sets the number of replicas for objects in the pool. The default + value is the same as + ``ceph osd pool set {pool-name} size {size}``. + +:Type: 32-bit Integer +:Default: ``3`` + + +``osd pool default min size`` + +:Description: Sets the minimum number of written replicas for objects in the + pool in order to acknowledge a write operation to the client. + If minimum is not met, Ceph will not acknowledge the write to the + client. This setting ensures a minimum number of replicas when + operating in ``degraded`` mode. + +:Type: 32-bit Integer +:Default: ``0``, which means no particular minimum. If ``0``, + minimum is ``size - (size / 2)``. + + +``osd pool default pg num`` + +:Description: The default number of placement groups for a pool. The default + value is the same as ``pg_num`` with ``mkpool``. 
+ +:Type: 32-bit Integer +:Default: ``8`` + + +``osd pool default pgp num`` + +:Description: The default number of placement groups for placement for a pool. + The default value is the same as ``pgp_num`` with ``mkpool``. + PG and PGP should be equal (for now). + +:Type: 32-bit Integer +:Default: ``8`` + + +``osd pool default flags`` + +:Description: The default flags for new pools. +:Type: 32-bit Integer +:Default: ``0`` + + +``osd max pgls`` + +:Description: The maximum number of placement groups to list. A client + requesting a large number can tie up the Ceph OSD Daemon. + +:Type: Unsigned 64-bit Integer +:Default: ``1024`` +:Note: Default should be fine. + + +``osd min pg log entries`` + +:Description: The minimum number of placement group logs to maintain + when trimming log files. + +:Type: 32-bit Int Unsigned +:Default: ``1000`` + + +``osd default data pool replay window`` + +:Description: The time (in seconds) for an OSD to wait for a client to replay + a request. + +:Type: 32-bit Integer +:Default: ``45`` + +``osd max pg per osd hard ratio`` + +:Description: The ratio of number of PGs per OSD allowed by the cluster before + OSD refuses to create new PGs. OSD stops creating new PGs if the number + of PGs it serves exceeds + ``osd max pg per osd hard ratio`` \* ``mon max pg per osd``. + +:Type: Float +:Default: ``2`` + +.. _pool: ../../operations/pools +.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering +.. _Weighting Bucket Items: ../../operations/crush-map#weightingbucketitems diff --git a/src/ceph/doc/rados/configuration/pool-pg.conf b/src/ceph/doc/rados/configuration/pool-pg.conf new file mode 100644 index 0000000..5f1b3b7 --- /dev/null +++ b/src/ceph/doc/rados/configuration/pool-pg.conf @@ -0,0 +1,20 @@ +[global] + + # By default, Ceph makes 3 replicas of objects. If you want to make four + # copies of an object the default value--a primary copy and three replica + # copies--reset the default values as shown in 'osd pool default size'. + # If you want to allow Ceph to write a lesser number of copies in a degraded + # state, set 'osd pool default min size' to a number less than the + # 'osd pool default size' value. + + osd pool default size = 4 # Write an object 4 times. + osd pool default min size = 1 # Allow writing one copy in a degraded state. + + # Ensure you have a realistic number of placement groups. We recommend + # approximately 100 per OSD. E.g., total number of OSDs multiplied by 100 + # divided by the number of replicas (i.e., osd pool default size). So for + # 10 OSDs and osd pool default size = 4, we'd recommend approximately + # (100 * 10) / 4 = 250. + + osd pool default pg num = 250 + osd pool default pgp num = 250 diff --git a/src/ceph/doc/rados/configuration/storage-devices.rst b/src/ceph/doc/rados/configuration/storage-devices.rst new file mode 100644 index 0000000..83c0c9b --- /dev/null +++ b/src/ceph/doc/rados/configuration/storage-devices.rst @@ -0,0 +1,83 @@ +================= + Storage Devices +================= + +There are two Ceph daemons that store data on disk: + +* **Ceph OSDs** (or Object Storage Daemons) are where most of the + data is stored in Ceph. Generally speaking, each OSD is backed by + a single storage device, like a traditional hard disk (HDD) or + solid state disk (SSD). OSDs can also be backed by a combination + of devices, like a HDD for most data and an SSD (or partition of an + SSD) for some metadata. 
The number of OSDs in a cluster is + generally a function of how much data will be stored, how big each + storage device will be, and the level and type of redundancy + (replication or erasure coding). +* **Ceph Monitor** daemons manage critical cluster state like cluster + membership and authentication information. For smaller clusters a + few gigabytes is all that is needed, although for larger clusters + the monitor database can reach tens or possibly hundreds of + gigabytes. + + +OSD Backends +============ + +There are two ways that OSDs can manage the data they store. Starting +with the Luminous 12.2.z release, the new default (and recommended) backend is +*BlueStore*. Prior to Luminous, the default (and only option) was +*FileStore*. + +BlueStore +--------- + +BlueStore is a special-purpose storage backend designed specifically +for managing data on disk for Ceph OSD workloads. It is motivated by +experience supporting and managing OSDs using FileStore over the +last ten years. Key BlueStore features include: + +* Direct management of storage devices. BlueStore consumes raw block + devices or partitions. This avoids any intervening layers of + abstraction (such as local file systems like XFS) that may limit + performance or add complexity. +* Metadata management with RocksDB. We embed RocksDB's key/value database + in order to manage internal metadata, such as the mapping from object + names to block locations on disk. +* Full data and metadata checksumming. By default all data and + metadata written to BlueStore is protected by one or more + checksums. No data or metadata will be read from disk or returned + to the user without being verified. +* Inline compression. Data written may be optionally compressed + before being written to disk. +* Multi-device metadata tiering. BlueStore allows its internal + journal (write-ahead log) to be written to a separate, high-speed + device (like an SSD, NVMe, or NVDIMM) to increased performance. If + a significant amount of faster storage is available, internal + metadata can also be stored on the faster device. +* Efficient copy-on-write. RBD and CephFS snapshots rely on a + copy-on-write *clone* mechanism that is implemented efficiently in + BlueStore. This results in efficient IO both for regular snapshots + and for erasure coded pools (which rely on cloning to implement + efficient two-phase commits). + +For more information, see :doc:`bluestore-config-ref`. + +FileStore +--------- + +FileStore is the legacy approach to storing objects in Ceph. It +relies on a standard file system (normally XFS) in combination with a +key/value database (traditionally LevelDB, now RocksDB) for some +metadata. + +FileStore is well-tested and widely used in production but suffers +from many performance deficiencies due to its overall design and +reliance on a traditional file system for storing object data. + +Although FileStore is generally capable of functioning on most +POSIX-compatible file systems (including btrfs and ext4), we only +recommend that XFS be used. Both btrfs and ext4 have known bugs and +deficiencies and their use may lead to data loss. By default all Ceph +provisioning tools will use XFS. + +For more information, see :doc:`filestore-config-ref`. 
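
If you want to confirm which backend an existing OSD is using, the metadata
each OSD daemon reports to the monitors includes its object store type. For
example (the exact JSON field names may vary slightly between releases)::

    ceph osd metadata {osd-id} | grep osd_objectstore
    # for example, for osd.0
    ceph osd metadata 0 | grep osd_objectstore
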
diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-admin.rst b/src/ceph/doc/rados/deployment/ceph-deploy-admin.rst new file mode 100644 index 0000000..a91f69c --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-admin.rst @@ -0,0 +1,38 @@ +============= + Admin Tasks +============= + +Once you have set up a cluster with ``ceph-deploy``, you may +provide the client admin key and the Ceph configuration file +to another host so that a user on the host may use the ``ceph`` +command line as an administrative user. + + +Create an Admin Host +==================== + +To enable a host to execute ceph commands with administrator +privileges, use the ``admin`` command. :: + + ceph-deploy admin {host-name [host-name]...} + + +Deploy Config File +================== + +To send an updated copy of the Ceph configuration file to hosts +in your cluster, use the ``config push`` command. :: + + ceph-deploy config push {host-name [host-name]...} + +.. tip:: With a base name and increment host-naming convention, + it is easy to deploy configuration files via simple scripts + (e.g., ``ceph-deploy config hostname{1,2,3,4,5}``). + +Retrieve Config File +==================== + +To retrieve a copy of the Ceph configuration file from a host +in your cluster, use the ``config pull`` command. :: + + ceph-deploy config pull {host-name [host-name]...} diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-install.rst b/src/ceph/doc/rados/deployment/ceph-deploy-install.rst new file mode 100644 index 0000000..849d68e --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-install.rst @@ -0,0 +1,46 @@ +==================== + Package Management +==================== + +Install +======= + +To install Ceph packages on your cluster hosts, open a command line on your +client machine and type the following:: + + ceph-deploy install {hostname [hostname] ...} + +Without additional arguments, ``ceph-deploy`` will install the most recent +major release of Ceph to the cluster host(s). To specify a particular package, +you may select from the following: + +- ``--release <code-name>`` +- ``--testing`` +- ``--dev <branch-or-tag>`` + +For example:: + + ceph-deploy install --release cuttlefish hostname1 + ceph-deploy install --testing hostname2 + ceph-deploy install --dev wip-some-branch hostname{1,2,3,4,5} + +For additional usage, execute:: + + ceph-deploy install -h + + +Uninstall +========= + +To uninstall Ceph packages from your cluster hosts, open a terminal on +your admin host and type the following:: + + ceph-deploy uninstall {hostname [hostname] ...} + +On a Debian or Ubuntu system, you may also:: + + ceph-deploy purge {hostname [hostname] ...} + +The tool will unininstall ``ceph`` packages from the specified hosts. Purge +additionally removes configuration files. + diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-keys.rst b/src/ceph/doc/rados/deployment/ceph-deploy-keys.rst new file mode 100644 index 0000000..3e106c9 --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-keys.rst @@ -0,0 +1,32 @@ +================= + Keys Management +================= + + +Gather Keys +=========== + +Before you can provision a host to run OSDs or metadata servers, you must gather +monitor keys and the OSD and MDS bootstrap keyrings. To gather keys, enter the +following:: + + ceph-deploy gatherkeys {monitor-host} + + +.. note:: To retrieve the keys, you specify a host that has a + Ceph monitor. + +.. 
note:: If you have specified multiple monitors in the setup of the cluster, + make sure, that all monitors are up and running. If the monitors haven't + formed quorum, ``ceph-create-keys`` will not finish and the keys are not + generated. + +Forget Keys +=========== + +When you are no longer using ``ceph-deploy`` (or if you are recreating a +cluster), you should delete the keys in the local directory of your admin host. +To delete keys, enter the following:: + + ceph-deploy forgetkeys + diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-mds.rst b/src/ceph/doc/rados/deployment/ceph-deploy-mds.rst new file mode 100644 index 0000000..d2afaec --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-mds.rst @@ -0,0 +1,46 @@ +============================ + Add/Remove Metadata Server +============================ + +With ``ceph-deploy``, adding and removing metadata servers is a simple task. You +just add or remove one or more metadata servers on the command line with one +command. + +.. important:: You must deploy at least one metadata server to use CephFS. + There is experimental support for running multiple metadata servers. + Do not run multiple active metadata servers in production. + +See `MDS Config Reference`_ for details on configuring metadata servers. + + +Add a Metadata Server +===================== + +Once you deploy monitors and OSDs you may deploy the metadata server(s). :: + + ceph-deploy mds create {host-name}[:{daemon-name}] [{host-name}[:{daemon-name}] ...] + +You may specify a daemon instance a name (optional) if you would like to run +multiple daemons on a single server. + + +Remove a Metadata Server +======================== + +Coming soon... + +.. If you have a metadata server in your cluster that you'd like to remove, you may use +.. the ``destroy`` option. :: + +.. ceph-deploy mds destroy {host-name}[:{daemon-name}] [{host-name}[:{daemon-name}] ...] + +.. You may specify a daemon instance a name (optional) if you would like to destroy +.. a particular daemon that runs on a single server with multiple MDS daemons. + +.. .. note:: Ensure that if you remove a metadata server, the remaining metadata + servers will be able to service requests from CephFS clients. If that is not + possible, consider adding a metadata server before destroying the metadata + server you would like to take offline. + + +.. _MDS Config Reference: ../../../cephfs/mds-config-ref diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-mon.rst b/src/ceph/doc/rados/deployment/ceph-deploy-mon.rst new file mode 100644 index 0000000..bda34fe --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-mon.rst @@ -0,0 +1,56 @@ +===================== + Add/Remove Monitors +===================== + +With ``ceph-deploy``, adding and removing monitors is a simple task. You just +add or remove one or more monitors on the command line with one command. Before +``ceph-deploy``, the process of `adding and removing monitors`_ involved +numerous manual steps. Using ``ceph-deploy`` imposes a restriction: **you may +only install one monitor per host.** + +.. note:: We do not recommend comingling monitors and OSDs on + the same host. + +For high availability, you should run a production Ceph cluster with **AT +LEAST** three monitors. Ceph uses the Paxos algorithm, which requires a +consensus among the majority of monitors in a quorum. With Paxos, the monitors +cannot determine a majority for establishing a quorum with only two monitors. 
A +majority of monitors must be counted as such: 1:1, 2:3, 3:4, 3:5, 4:6, etc. + +See `Monitor Config Reference`_ for details on configuring monitors. + + +Add a Monitor +============= + +Once you create a cluster and install Ceph packages to the monitor host(s), you +may deploy the monitor(s) to the monitor host(s). When using ``ceph-deploy``, +the tool enforces a single monitor per host. :: + + ceph-deploy mon create {host-name [host-name]...} + + +.. note:: Ensure that you add monitors such that they may arrive at a consensus + among a majority of monitors, otherwise other steps (like ``ceph-deploy gatherkeys``) + will fail. + +.. note:: When adding a monitor on a host that was not in hosts initially defined + with the ``ceph-deploy new`` command, a ``public network`` statement needs + to be added to the ceph.conf file. + +Remove a Monitor +================ + +If you have a monitor in your cluster that you'd like to remove, you may use +the ``destroy`` option. :: + + ceph-deploy mon destroy {host-name [host-name]...} + + +.. note:: Ensure that if you remove a monitor, the remaining monitors will be + able to establish a consensus. If that is not possible, consider adding a + monitor before removing the monitor you would like to take offline. + + +.. _adding and removing monitors: ../../operations/add-or-rm-mons +.. _Monitor Config Reference: ../../configuration/mon-config-ref diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-new.rst b/src/ceph/doc/rados/deployment/ceph-deploy-new.rst new file mode 100644 index 0000000..5eb37a9 --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-new.rst @@ -0,0 +1,66 @@ +================== + Create a Cluster +================== + +The first step in using Ceph with ``ceph-deploy`` is to create a new Ceph +cluster. A new Ceph cluster has: + +- A Ceph configuration file, and +- A monitor keyring. + +The Ceph configuration file consists of at least: + +- Its own filesystem ID (``fsid``) +- The initial monitor(s) hostname(s), and +- The initial monitor(s) and IP address(es). + +For additional details, see the `Monitor Configuration Reference`_. + +The ``ceph-deploy`` tool also creates a monitor keyring and populates it with a +``[mon.]`` key. For additional details, see the `Cephx Guide`_. + + +Usage +----- + +To create a cluster with ``ceph-deploy``, use the ``new`` command and specify +the host(s) that will be initial members of the monitor quorum. :: + + ceph-deploy new {host [host], ...} + +For example:: + + ceph-deploy new mon1.foo.com + ceph-deploy new mon{1,2,3} + +The ``ceph-deploy`` utility will use DNS to resolve hostnames to IP +addresses. The monitors will be named using the first component of +the name (e.g., ``mon1`` above). It will add the specified host names +to the Ceph configuration file. For additional details, execute:: + + ceph-deploy new -h + + +Naming a Cluster +---------------- + +By default, Ceph clusters have a cluster name of ``ceph``. You can specify +a cluster name if you want to run multiple clusters on the same hardware. For +example, if you want to optimize a cluster for use with block devices, and +another for use with the gateway, you can run two different clusters on the same +hardware if they have a different ``fsid`` and cluster name. :: + + ceph-deploy --cluster {cluster-name} new {host [host], ...} + +For example:: + + ceph-deploy --cluster rbdcluster new ceph-mon1 + ceph-deploy --cluster rbdcluster new ceph-mon{1,2,3} + +.. 
note:: If you run multiple clusters, ensure you adjust the default + port settings and open ports for your additional cluster(s) so that + the networks of the two different clusters don't conflict with each other. + + +.. _Monitor Configuration Reference: ../../configuration/mon-config-ref +.. _Cephx Guide: ../../../dev/mon-bootstrap#secret-keys diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-osd.rst b/src/ceph/doc/rados/deployment/ceph-deploy-osd.rst new file mode 100644 index 0000000..a4eb4d1 --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-osd.rst @@ -0,0 +1,121 @@ +================= + Add/Remove OSDs +================= + +Adding and removing Ceph OSD Daemons to your cluster may involve a few more +steps when compared to adding and removing other Ceph daemons. Ceph OSD Daemons +write data to the disk and to journals. So you need to provide a disk for the +OSD and a path to the journal partition (i.e., this is the most common +configuration, but you may configure your system to your own needs). + +In Ceph v0.60 and later releases, Ceph supports ``dm-crypt`` on disk encryption. +You may specify the ``--dmcrypt`` argument when preparing an OSD to tell +``ceph-deploy`` that you want to use encryption. You may also specify the +``--dmcrypt-key-dir`` argument to specify the location of ``dm-crypt`` +encryption keys. + +You should test various drive configurations to gauge their throughput before +before building out a large cluster. See `Data Storage`_ for additional details. + + +List Disks +========== + +To list the disks on a node, execute the following command:: + + ceph-deploy disk list {node-name [node-name]...} + + +Zap Disks +========= + +To zap a disk (delete its partition table) in preparation for use with Ceph, +execute the following:: + + ceph-deploy disk zap {osd-server-name}:{disk-name} + ceph-deploy disk zap osdserver1:sdb + +.. important:: This will delete all data. + + +Prepare OSDs +============ + +Once you create a cluster, install Ceph packages, and gather keys, you +may prepare the OSDs and deploy them to the OSD node(s). If you need to +identify a disk or zap it prior to preparing it for use as an OSD, +see `List Disks`_ and `Zap Disks`_. :: + + ceph-deploy osd prepare {node-name}:{data-disk}[:{journal-disk}] + ceph-deploy osd prepare osdserver1:sdb:/dev/ssd + ceph-deploy osd prepare osdserver1:sdc:/dev/ssd + +The ``prepare`` command only prepares the OSD. On most operating +systems, the ``activate`` phase will automatically run when the +partitions are created on the disk (using Ceph ``udev`` rules). If not +use the ``activate`` command. See `Activate OSDs`_ for +details. + +The foregoing example assumes a disk dedicated to one Ceph OSD Daemon, and +a path to an SSD journal partition. We recommend storing the journal on +a separate drive to maximize throughput. You may dedicate a single drive +for the journal too (which may be expensive) or place the journal on the +same disk as the OSD (not recommended as it impairs performance). In the +foregoing example we store the journal on a partitioned solid state drive. + +You can use the settings --fs-type or --bluestore to choose which file system +you want to install in the OSD drive. (More information by running +'ceph-deploy osd prepare --help'). + +.. 
note:: When running multiple Ceph OSD daemons on a single node, and + sharing a partioned journal with each OSD daemon, you should consider + the entire node the minimum failure domain for CRUSH purposes, because + if the SSD drive fails, all of the Ceph OSD daemons that journal to it + will fail too. + + +Activate OSDs +============= + +Once you prepare an OSD you may activate it with the following command. :: + + ceph-deploy osd activate {node-name}:{data-disk-partition}[:{journal-disk-partition}] + ceph-deploy osd activate osdserver1:/dev/sdb1:/dev/ssd1 + ceph-deploy osd activate osdserver1:/dev/sdc1:/dev/ssd2 + +The ``activate`` command will cause your OSD to come ``up`` and be placed +``in`` the cluster. The ``activate`` command uses the path to the partition +created when running the ``prepare`` command. + + +Create OSDs +=========== + +You may prepare OSDs, deploy them to the OSD node(s) and activate them in one +step with the ``create`` command. The ``create`` command is a convenience method +for executing the ``prepare`` and ``activate`` command sequentially. :: + + ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] + ceph-deploy osd create osdserver1:sdb:/dev/ssd1 + +.. List OSDs +.. ========= + +.. To list the OSDs deployed on a node(s), execute the following command:: + +.. ceph-deploy osd list {node-name} + + +Destroy OSDs +============ + +.. note:: Coming soon. See `Remove OSDs`_ for manual procedures. + +.. To destroy an OSD, execute the following command:: + +.. ceph-deploy osd destroy {node-name}:{path-to-disk}[:{path/to/journal}] + +.. Destroying an OSD will take it ``down`` and ``out`` of the cluster. + +.. _Data Storage: ../../../start/hardware-recommendations#data-storage +.. _Remove OSDs: ../../operations/add-or-rm-osds#removing-osds-manual diff --git a/src/ceph/doc/rados/deployment/ceph-deploy-purge.rst b/src/ceph/doc/rados/deployment/ceph-deploy-purge.rst new file mode 100644 index 0000000..685c3c4 --- /dev/null +++ b/src/ceph/doc/rados/deployment/ceph-deploy-purge.rst @@ -0,0 +1,25 @@ +============== + Purge a Host +============== + +When you remove Ceph daemons and uninstall Ceph, there may still be extraneous +data from the cluster on your server. The ``purge`` and ``purgedata`` commands +provide a convenient means of cleaning up a host. + + +Purge Data +========== + +To remove all data from ``/var/lib/ceph`` (but leave Ceph packages intact), +execute the ``purgedata`` command. + + ceph-deploy purgedata {hostname} [{hostname} ...] + + +Purge +===== + +To remove all data from ``/var/lib/ceph`` and uninstall Ceph packages, execute +the ``purge`` command. + + ceph-deploy purge {hostname} [{hostname} ...]
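
Taken together with ``forgetkeys``, these commands provide a complete
teardown of a test cluster. For example, to wipe two hosts and clear the keys
cached on the admin node (``node1`` and ``node2`` are placeholder
hostnames)::

    ceph-deploy purge node1 node2
    ceph-deploy purgedata node1 node2
    ceph-deploy forgetkeys
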
\ No newline at end of file diff --git a/src/ceph/doc/rados/deployment/index.rst b/src/ceph/doc/rados/deployment/index.rst new file mode 100644 index 0000000..0853e4a --- /dev/null +++ b/src/ceph/doc/rados/deployment/index.rst @@ -0,0 +1,58 @@ +================= + Ceph Deployment +================= + +The ``ceph-deploy`` tool is a way to deploy Ceph relying only upon SSH access to +the servers, ``sudo``, and some Python. It runs on your workstation, and does +not require servers, databases, or any other tools. If you set up and +tear down Ceph clusters a lot, and want minimal extra bureaucracy, +``ceph-deploy`` is an ideal tool. The ``ceph-deploy`` tool is not a generic +deployment system. It was designed exclusively for Ceph users who want to get +Ceph up and running quickly with sensible initial configuration settings without +the overhead of installing Chef, Puppet or Juju. Users who want fine-control +over security settings, partitions or directory locations should use a tool +such as Juju, Puppet, `Chef`_ or Crowbar. + + +With ``ceph-deploy``, you can develop scripts to install Ceph packages on remote +hosts, create a cluster, add monitors, gather (or forget) keys, add OSDs and +metadata servers, configure admin hosts, and tear down the clusters. + +.. raw:: html + + <table cellpadding="10"><tbody valign="top"><tr><td> + +.. toctree:: + + Preflight Checklist <preflight-checklist> + Install Ceph <ceph-deploy-install> + +.. raw:: html + + </td><td> + +.. toctree:: + + Create a Cluster <ceph-deploy-new> + Add/Remove Monitor(s) <ceph-deploy-mon> + Key Management <ceph-deploy-keys> + Add/Remove OSD(s) <ceph-deploy-osd> + Add/Remove MDS(s) <ceph-deploy-mds> + + +.. raw:: html + + </td><td> + +.. toctree:: + + Purge Hosts <ceph-deploy-purge> + Admin Tasks <ceph-deploy-admin> + + +.. raw:: html + + </td></tr></tbody></table> + + +.. _Chef: http://tracker.ceph.com/projects/ceph/wiki/Deploying_Ceph_with_Chef diff --git a/src/ceph/doc/rados/deployment/preflight-checklist.rst b/src/ceph/doc/rados/deployment/preflight-checklist.rst new file mode 100644 index 0000000..64a669f --- /dev/null +++ b/src/ceph/doc/rados/deployment/preflight-checklist.rst @@ -0,0 +1,109 @@ +===================== + Preflight Checklist +===================== + +.. versionadded:: 0.60 + +This **Preflight Checklist** will help you prepare an admin node for use with +``ceph-deploy``, and server nodes for use with passwordless ``ssh`` and +``sudo``. + +Before you can deploy Ceph using ``ceph-deploy``, you need to ensure that you +have a few things set up first on your admin node and on nodes running Ceph +daemons. + + +Install an Operating System +=========================== + +Install a recent release of Debian or Ubuntu (e.g., 12.04 LTS, 14.04 LTS) on +your nodes. For additional details on operating systems or to use other +operating systems other than Debian or Ubuntu, see `OS Recommendations`_. + + +Install an SSH Server +===================== + +The ``ceph-deploy`` utility requires ``ssh``, so your server node(s) require an +SSH server. :: + + sudo apt-get install openssh-server + + +Create a User +============= + +Create a user on nodes running Ceph daemons. + +.. tip:: We recommend a username that brute force attackers won't + guess easily (e.g., something other than ``root``, ``ceph``, etc). + +:: + + ssh user@ceph-server + sudo useradd -d /home/ceph -m ceph + sudo passwd ceph + + +``ceph-deploy`` installs packages onto your nodes. This means that +the user you create requires passwordless ``sudo`` privileges. + +.. 
note:: We **DO NOT** recommend enabling the ``root`` password + for security reasons. + +To provide full privileges to the user, add the following to +``/etc/sudoers.d/ceph``. :: + + echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph + sudo chmod 0440 /etc/sudoers.d/ceph + + +Configure SSH +============= + +Configure your admin machine with password-less SSH access to each node +running Ceph daemons (leave the passphrase empty). :: + + ssh-keygen + Generating public/private key pair. + Enter file in which to save the key (/ceph-client/.ssh/id_rsa): + Enter passphrase (empty for no passphrase): + Enter same passphrase again: + Your identification has been saved in /ceph-client/.ssh/id_rsa. + Your public key has been saved in /ceph-client/.ssh/id_rsa.pub. + +Copy the key to each node running Ceph daemons:: + + ssh-copy-id ceph@ceph-server + +Modify your ~/.ssh/config file of your admin node so that it defaults +to logging in as the user you created when no username is specified. :: + + Host ceph-server + Hostname ceph-server.fqdn-or-ip-address.com + User ceph + + +Install ceph-deploy +=================== + +To install ``ceph-deploy``, execute the following:: + + wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add - + echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list + sudo apt-get update + sudo apt-get install ceph-deploy + + +Ensure Connectivity +=================== + +Ensure that your Admin node has connectivity to the network and to your Server +node (e.g., ensure ``iptables``, ``ufw`` or other tools that may prevent +connections, traffic forwarding, etc. to allow what you need). + + +Once you have completed this pre-flight checklist, you are ready to begin using +``ceph-deploy``. + +.. _OS Recommendations: ../../../start/os-recommendations diff --git a/src/ceph/doc/rados/index.rst b/src/ceph/doc/rados/index.rst new file mode 100644 index 0000000..929bb7e --- /dev/null +++ b/src/ceph/doc/rados/index.rst @@ -0,0 +1,76 @@ +====================== + Ceph Storage Cluster +====================== + +The :term:`Ceph Storage Cluster` is the foundation for all Ceph deployments. +Based upon :abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, Ceph +Storage Clusters consist of two types of daemons: a :term:`Ceph OSD Daemon` +(OSD) stores data as objects on a storage node; and a :term:`Ceph Monitor` (MON) +maintains a master copy of the cluster map. A Ceph Storage Cluster may contain +thousands of storage nodes. A minimal system will have at least one +Ceph Monitor and two Ceph OSD Daemons for data replication. + +The Ceph Filesystem, Ceph Object Storage and Ceph Block Devices read data from +and write data to the Ceph Storage Cluster. + +.. raw:: html + + <style type="text/css">div.body h3{margin:5px 0px 0px 0px;}</style> + <table cellpadding="10"><colgroup><col width="33%"><col width="33%"><col width="33%"></colgroup><tbody valign="top"><tr><td><h3>Config and Deploy</h3> + +Ceph Storage Clusters have a few required settings, but most configuration +settings have default values. A typical deployment uses a deployment tool +to define a cluster and bootstrap a monitor. See `Deployment`_ for details +on ``ceph-deploy.`` + +.. toctree:: + :maxdepth: 2 + + Configuration <configuration/index> + Deployment <deployment/index> + +.. raw:: html + + </td><td><h3>Operations</h3> + +Once you have a deployed a Ceph Storage Cluster, you may begin operating +your cluster. + +.. 
toctree:: + :maxdepth: 2 + + + Operations <operations/index> + +.. toctree:: + :maxdepth: 1 + + Man Pages <man/index> + + +.. toctree:: + :hidden: + + troubleshooting/index + +.. raw:: html + + </td><td><h3>APIs</h3> + +Most Ceph deployments use `Ceph Block Devices`_, `Ceph Object Storage`_ and/or the +`Ceph Filesystem`_. You may also develop applications that talk directly to +the Ceph Storage Cluster. + +.. toctree:: + :maxdepth: 2 + + APIs <api/index> + +.. raw:: html + + </td></tr></tbody></table> + +.. _Ceph Block Devices: ../rbd/ +.. _Ceph Filesystem: ../cephfs/ +.. _Ceph Object Storage: ../radosgw/ +.. _Deployment: ../rados/deployment/ diff --git a/src/ceph/doc/rados/man/index.rst b/src/ceph/doc/rados/man/index.rst new file mode 100644 index 0000000..abeb88b --- /dev/null +++ b/src/ceph/doc/rados/man/index.rst @@ -0,0 +1,34 @@ +======================= + Object Store Manpages +======================= + +.. toctree:: + :maxdepth: 1 + + ../../man/8/ceph-disk.rst + ../../man/8/ceph-volume.rst + ../../man/8/ceph-volume-systemd.rst + ../../man/8/ceph.rst + ../../man/8/ceph-deploy.rst + ../../man/8/ceph-rest-api.rst + ../../man/8/ceph-authtool.rst + ../../man/8/ceph-clsinfo.rst + ../../man/8/ceph-conf.rst + ../../man/8/ceph-debugpack.rst + ../../man/8/ceph-dencoder.rst + ../../man/8/ceph-mon.rst + ../../man/8/ceph-osd.rst + ../../man/8/ceph-kvstore-tool.rst + ../../man/8/ceph-run.rst + ../../man/8/ceph-syn.rst + ../../man/8/crushtool.rst + ../../man/8/librados-config.rst + ../../man/8/monmaptool.rst + ../../man/8/osdmaptool.rst + ../../man/8/rados.rst + + +.. toctree:: + :hidden: + + ../../man/8/ceph-post-file.rst diff --git a/src/ceph/doc/rados/operations/add-or-rm-mons.rst b/src/ceph/doc/rados/operations/add-or-rm-mons.rst new file mode 100644 index 0000000..0cdc431 --- /dev/null +++ b/src/ceph/doc/rados/operations/add-or-rm-mons.rst @@ -0,0 +1,370 @@ +========================== + Adding/Removing Monitors +========================== + +When you have a cluster up and running, you may add or remove monitors +from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_ +or `Monitor Bootstrap`_. + +Adding Monitors +=============== + +Ceph monitors are light-weight processes that maintain a master copy of the +cluster map. You can run a cluster with 1 monitor. We recommend at least 3 +monitors for a production cluster. Ceph monitors use a variation of the +`Paxos`_ protocol to establish consensus about maps and other critical +information across the cluster. Due to the nature of Paxos, Ceph requires +a majority of monitors running to establish a quorum (thus establishing +consensus). + +It is advisable to run an odd-number of monitors but not mandatory. An +odd-number of monitors has a higher resiliency to failures than an +even-number of monitors. For instance, on a 2 monitor deployment, no +failures can be tolerated in order to maintain a quorum; with 3 monitors, +one failure can be tolerated; in a 4 monitor deployment, one failure can +be tolerated; with 5 monitors, two failures can be tolerated. This is +why an odd-number is advisable. Summarizing, Ceph needs a majority of +monitors to be running (and able to communicate with each other), but that +majority can be achieved using a single monitor, or 2 out of 2 monitors, +2 out of 3, 3 out of 4, etc. + +For an initial deployment of a multi-node Ceph cluster, it is advisable to +deploy three monitors, increasing the number two at a time if a valid need +for more than three exists. 
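
Once your monitors are deployed, you can confirm how many of them currently
form a quorum. For example (assuming the admin keyring is available on the
host where you run the command)::

    ceph mon stat
    # or, for more detail in JSON form
    ceph quorum_status --format json-pretty
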
+ +Since monitors are light-weight, it is possible to run them on the same +host as an OSD; however, we recommend running them on separate hosts, +because fsync issues with the kernel may impair performance. + +.. note:: A *majority* of monitors in your cluster must be able to + reach each other in order to establish a quorum. + +Deploy your Hardware +-------------------- + +If you are adding a new host when adding a new monitor, see `Hardware +Recommendations`_ for details on minimum recommendations for monitor hardware. +To add a monitor host to your cluster, first make sure you have an up-to-date +version of Linux installed (typically Ubuntu 14.04 or RHEL 7). + +Add your monitor host to a rack in your cluster, connect it to the network +and ensure that it has network connectivity. + +.. _Hardware Recommendations: ../../../start/hardware-recommendations + +Install the Required Software +----------------------------- + +For manually deployed clusters, you must install Ceph packages +manually. See `Installing Packages`_ for details. +You should configure SSH to a user with password-less authentication +and root permissions. + +.. _Installing Packages: ../../../install/install-storage-cluster + + +.. _Adding a Monitor (Manual): + +Adding a Monitor (Manual) +------------------------- + +This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map +and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster. If +this results in only two monitor daemons, you may add more monitors by +repeating this procedure until you have a sufficient number of ``ceph-mon`` +daemons to achieve a quorum. + +At this point you should define your monitor's id. Traditionally, monitors +have been named with single letters (``a``, ``b``, ``c``, ...), but you are +free to define the id as you see fit. For the purpose of this document, +please take into account that ``{mon-id}`` should be the id you chose, +without the ``mon.`` prefix (i.e., ``{mon-id}`` should be the ``a`` +on ``mon.a``). + +#. Create the default directory on the machine that will host your + new monitor. :: + + ssh {new-mon-host} + sudo mkdir /var/lib/ceph/mon/ceph-{mon-id} + +#. Create a temporary directory ``{tmp}`` to keep the files needed during + this process. This directory should be different from the monitor's default + directory created in the previous step, and can be removed after all the + steps are executed. :: + + mkdir {tmp} + +#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to + the retrieved keyring, and ``{key-filename}`` is the name of the file + containing the retrieved monitor key. :: + + ceph auth get mon. -o {tmp}/{key-filename} + +#. Retrieve the monitor map, where ``{tmp}`` is the path to + the retrieved monitor map, and ``{map-filename}`` is the name of the file + containing the retrieved monitor monitor map. :: + + ceph mon getmap -o {tmp}/{map-filename} + +#. Prepare the monitor's data directory created in the first step. You must + specify the path to the monitor map so that you can retrieve the + information about a quorum of monitors and their ``fsid``. You must also + specify a path to the monitor keyring:: + + sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename} + + +#. Start the new monitor and it will automatically join the cluster. + The daemon needs to know which address to bind to, either via + ``--public-addr {ip:port}`` or by setting ``mon addr`` in the + appropriate section of ``ceph.conf``. 
For example:: + + ceph-mon -i {mon-id} --public-addr {ip:port} + + +Removing Monitors +================= + +When you remove monitors from a cluster, consider that Ceph monitors use +PAXOS to establish consensus about the master cluster map. You must have +a sufficient number of monitors to establish a quorum for consensus about +the cluster map. + +.. _Removing a Monitor (Manual): + +Removing a Monitor (Manual) +--------------------------- + +This procedure removes a ``ceph-mon`` daemon from your cluster. If this +procedure results in only two monitor daemons, you may add or remove another +monitor until you have a number of ``ceph-mon`` daemons that can achieve a +quorum. + +#. Stop the monitor. :: + + service ceph -a stop mon.{mon-id} + +#. Remove the monitor from the cluster. :: + + ceph mon remove {mon-id} + +#. Remove the monitor entry from ``ceph.conf``. + + +Removing Monitors from an Unhealthy Cluster +------------------------------------------- + +This procedure removes a ``ceph-mon`` daemon from an unhealthy +cluster, for example a cluster where the monitors cannot form a +quorum. + + +#. Stop all ``ceph-mon`` daemons on all monitor hosts. :: + + ssh {mon-host} + service ceph stop mon || stop ceph-mon-all + # and repeat for all mons + +#. Identify a surviving monitor and log in to that host. :: + + ssh {mon-host} + +#. Extract a copy of the monmap file. :: + + ceph-mon -i {mon-id} --extract-monmap {map-path} + # in most cases, that's + ceph-mon -i `hostname` --extract-monmap /tmp/monmap + +#. Remove the non-surviving or problematic monitors. For example, if + you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where + only ``mon.a`` will survive, follow the example below:: + + monmaptool {map-path} --rm {mon-id} + # for example, + monmaptool /tmp/monmap --rm b + monmaptool /tmp/monmap --rm c + +#. Inject the surviving map with the removed monitors into the + surviving monitor(s). For example, to inject a map into monitor + ``mon.a``, follow the example below:: + + ceph-mon -i {mon-id} --inject-monmap {map-path} + # for example, + ceph-mon -i a --inject-monmap /tmp/monmap + +#. Start only the surviving monitors. + +#. Verify the monitors form a quorum (``ceph -s``). + +#. You may wish to archive the removed monitors' data directory in + ``/var/lib/ceph/mon`` in a safe location, or delete it if you are + confident the remaining monitors are healthy and are sufficiently + redundant. + +.. _Changing a Monitor's IP address: + +Changing a Monitor's IP Address +=============================== + +.. important:: Existing monitors are not supposed to change their IP addresses. + +Monitors are critical components of a Ceph cluster, and they need to maintain a +quorum for the whole system to work properly. To establish a quorum, the +monitors need to discover each other. Ceph has strict requirements for +discovering monitors. + +Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors. +However, monitors discover each other using the monitor map, not ``ceph.conf``. +For example, if you refer to `Adding a Monitor (Manual)`_ you will see that you +need to obtain the current monmap for the cluster when creating a new monitor, +as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The +following sections explain the consistency requirements for Ceph monitors, and a +few safe ways to change a monitor's IP address. 
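
Before making any change, it can be helpful to inspect the monmap the cluster
currently agrees on. For example (``/tmp/monmap`` is just a temporary file;
sample output is shown later in this section)::

    ceph mon getmap -o /tmp/monmap
    monmaptool --print /tmp/monmap
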
+ + +Consistency Requirements +------------------------ + +A monitor always refers to the local copy of the monmap when discovering other +monitors in the cluster. Using the monmap instead of ``ceph.conf`` avoids +errors that could break the cluster (e.g., typos in ``ceph.conf`` when +specifying a monitor address or port). Since monitors use monmaps for discovery +and they share monmaps with clients and other Ceph daemons, the monmap provides +monitors with a strict guarantee that their consensus is valid. + +Strict consistency also applies to updates to the monmap. As with any other +updates on the monitor, changes to the monmap always run through a distributed +consensus algorithm called `Paxos`_. The monitors must agree on each update to +the monmap, such as adding or removing a monitor, to ensure that each monitor in +the quorum has the same version of the monmap. Updates to the monmap are +incremental so that monitors have the latest agreed upon version, and a set of +previous versions, allowing a monitor that has an older version of the monmap to +catch up with the current state of the cluster. + +If monitors discovered each other through the Ceph configuration file instead of +through the monmap, it would introduce additional risks because the Ceph +configuration files are not updated and distributed automatically. Monitors +might inadvertently use an older ``ceph.conf`` file, fail to recognize a +monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able +to determine the current state of the system accurately. Consequently, making +changes to an existing monitor's IP address must be done with great care. + + +Changing a Monitor's IP address (The Right Way) +----------------------------------------------- + +Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to +ensure that other monitors in the cluster will receive the update. To change a +monitor's IP address, you must add a new monitor with the IP address you want +to use (as described in `Adding a Monitor (Manual)`_), ensure that the new +monitor successfully joins the quorum; then, remove the monitor that uses the +old IP address. Then, update the ``ceph.conf`` file to ensure that clients and +other daemons know the IP address of the new monitor. + +For example, lets assume there are three monitors in place, such as :: + + [mon.a] + host = host01 + addr = 10.0.0.1:6789 + [mon.b] + host = host02 + addr = 10.0.0.2:6789 + [mon.c] + host = host03 + addr = 10.0.0.3:6789 + +To change ``mon.c`` to ``host04`` with the IP address ``10.0.0.4``, follow the +steps in `Adding a Monitor (Manual)`_ by adding a new monitor ``mon.d``. Ensure +that ``mon.d`` is running before removing ``mon.c``, or it will break the +quorum. Remove ``mon.c`` as described on `Removing a Monitor (Manual)`_. Moving +all three monitors would thus require repeating this process as many times as +needed. + + +Changing a Monitor's IP address (The Messy Way) +----------------------------------------------- + +There may come a time when the monitors must be moved to a different network, a +different part of the datacenter or a different datacenter altogether. While it +is possible to do it, the process becomes a bit more hazardous. + +In such a case, the solution is to generate a new monmap with updated IP +addresses for all the monitors in the cluster, and inject the new map on each +individual monitor. 
This is not the most user-friendly approach, but we do not +expect this to be something that needs to be done every other week. As it is +clearly stated on the top of this section, monitors are not supposed to change +IP addresses. + +Using the previous monitor configuration as an example, assume you want to move +all the monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these +networks are unable to communicate. Use the following procedure: + +#. Retrieve the monitor map, where ``{tmp}`` is the path to + the retrieved monitor map, and ``{filename}`` is the name of the file + containing the retrieved monitor monitor map. :: + + ceph mon getmap -o {tmp}/{filename} + +#. The following example demonstrates the contents of the monmap. :: + + $ monmaptool --print {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + epoch 1 + fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 + last_changed 2012-12-17 02:46:41.591248 + created 2012-12-17 02:46:41.591248 + 0: 10.0.0.1:6789/0 mon.a + 1: 10.0.0.2:6789/0 mon.b + 2: 10.0.0.3:6789/0 mon.c + +#. Remove the existing monitors. :: + + $ monmaptool --rm a --rm b --rm c {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + monmaptool: removing a + monmaptool: removing b + monmaptool: removing c + monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors) + +#. Add the new monitor locations. :: + + $ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors) + +#. Check new contents. :: + + $ monmaptool --print {tmp}/{filename} + + monmaptool: monmap file {tmp}/{filename} + epoch 1 + fsid 224e376d-c5fe-4504-96bb-ea6332a19e61 + last_changed 2012-12-17 02:46:41.591248 + created 2012-12-17 02:46:41.591248 + 0: 10.1.0.1:6789/0 mon.a + 1: 10.1.0.2:6789/0 mon.b + 2: 10.1.0.3:6789/0 mon.c + +At this point, we assume the monitors (and stores) are installed at the new +location. The next step is to propagate the modified monmap to the new +monitors, and inject the modified monmap into each new monitor. + +#. First, make sure to stop all your monitors. Injection must be done while + the daemon is not running. + +#. Inject the monmap. :: + + ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename} + +#. Restart the monitors. + +After this step, migration to the new location is complete and +the monitors should operate successfully. + + +.. _Manual Deployment: ../../../install/manual-deployment +.. _Monitor Bootstrap: ../../../dev/mon-bootstrap +.. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science) diff --git a/src/ceph/doc/rados/operations/add-or-rm-osds.rst b/src/ceph/doc/rados/operations/add-or-rm-osds.rst new file mode 100644 index 0000000..59ce4c7 --- /dev/null +++ b/src/ceph/doc/rados/operations/add-or-rm-osds.rst @@ -0,0 +1,366 @@ +====================== + Adding/Removing OSDs +====================== + +When you have a cluster up and running, you may add OSDs or remove OSDs +from the cluster at runtime. + +Adding OSDs +=========== + +When you want to expand a cluster, you may add an OSD at runtime. With Ceph, an +OSD is generally one Ceph ``ceph-osd`` daemon for one storage drive within a +host machine. If your host has multiple storage drives, you may map one +``ceph-osd`` daemon for each drive. + +Generally, it's a good idea to check the capacity of your cluster to see if you +are reaching the upper end of its capacity. 
As your cluster reaches its ``near +full`` ratio, you should add one or more OSDs to expand your cluster's capacity. + +.. warning:: Do not let your cluster reach its ``full ratio`` before + adding an OSD. OSD failures that occur after the cluster reaches + its ``near full`` ratio may cause the cluster to exceed its + ``full ratio``. + +Deploy your Hardware +-------------------- + +If you are adding a new host when adding a new OSD, see `Hardware +Recommendations`_ for details on minimum recommendations for OSD hardware. To +add an OSD host to your cluster, first make sure you have an up-to-date version +of Linux installed, and you have made some initial preparations for your +storage drives. See `Filesystem Recommendations`_ for details. + +Add your OSD host to a rack in your cluster, connect it to the network +and ensure that it has network connectivity. See the `Network Configuration +Reference`_ for details. + +.. _Hardware Recommendations: ../../../start/hardware-recommendations +.. _Filesystem Recommendations: ../../configuration/filesystem-recommendations +.. _Network Configuration Reference: ../../configuration/network-config-ref + +Install the Required Software +----------------------------- + +For manually deployed clusters, you must install Ceph packages +manually. See `Installing Ceph (Manual)`_ for details. +You should configure SSH to a user with password-less authentication +and root permissions. + +.. _Installing Ceph (Manual): ../../../install + + +Adding an OSD (Manual) +---------------------- + +This procedure sets up a ``ceph-osd`` daemon, configures it to use one drive, +and configures the cluster to distribute data to the OSD. If your host has +multiple drives, you may add an OSD for each drive by repeating this procedure. + +To add an OSD, create a data directory for it, mount a drive to that directory, +add the OSD to the cluster, and then add it to the CRUSH map. + +When you add the OSD to the CRUSH map, consider the weight you give to the new +OSD. Hard drive capacity grows 40% per year, so newer OSD hosts may have larger +hard drives than older hosts in the cluster (i.e., they may have greater +weight). + +.. tip:: Ceph prefers uniform hardware across pools. If you are adding drives + of dissimilar size, you can adjust their weights. However, for best + performance, consider a CRUSH hierarchy with drives of the same type/size. + +#. Create the OSD. If no UUID is given, it will be set automatically when the + OSD starts up. The following command will output the OSD number, which you + will need for subsequent steps. :: + + ceph osd create [{uuid} [{id}]] + + If the optional parameter {id} is given it will be used as the OSD id. + Note, in this case the command may fail if the number is already in use. + + .. warning:: In general, explicitly specifying {id} is not recommended. + IDs are allocated as an array, and skipping entries consumes some extra + memory. This can become significant if there are large gaps and/or + clusters are large. If {id} is not specified, the smallest available is + used. + +#. Create the default directory on your new OSD. :: + + ssh {new-osd-host} + sudo mkdir /var/lib/ceph/osd/ceph-{osd-number} + + +#. If the OSD is for a drive other than the OS drive, prepare it + for use with Ceph, and mount it to the directory you just created:: + + ssh {new-osd-host} + sudo mkfs -t {fstype} /dev/{drive} + sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number} + + +#. Initialize the OSD data directory. 
:: + + ssh {new-osd-host} + ceph-osd -i {osd-num} --mkfs --mkkey + + The directory must be empty before you can run ``ceph-osd``. + +#. Register the OSD authentication key. The value of ``ceph`` for + ``ceph-{osd-num}`` in the path is the ``$cluster-$id``. If your + cluster name differs from ``ceph``, use your cluster name instead.:: + + ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring + + +#. Add the OSD to the CRUSH map so that the OSD can begin receiving data. The + ``ceph osd crush add`` command allows you to add OSDs to the CRUSH hierarchy + wherever you wish. If you specify at least one bucket, the command + will place the OSD into the most specific bucket you specify, *and* it will + move that bucket underneath any other buckets you specify. **Important:** If + you specify only the root bucket, the command will attach the OSD directly + to the root, but CRUSH rules expect OSDs to be inside of hosts. + + For Argonaut (v 0.48), execute the following:: + + ceph osd crush add {id} {name} {weight} [{bucket-type}={bucket-name} ...] + + For Bobtail (v 0.56) and later releases, execute the following:: + + ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...] + + You may also decompile the CRUSH map, add the OSD to the device list, add the + host as a bucket (if it's not already in the CRUSH map), add the device as an + item in the host, assign it a weight, recompile it and set it. See + `Add/Move an OSD`_ for details. + + +.. topic:: Argonaut (v0.48) Best Practices + + To limit impact on user I/O performance, add an OSD to the CRUSH map + with an initial weight of ``0``. Then, ramp up the CRUSH weight a + little bit at a time. For example, to ramp by increments of ``0.2``, + start with:: + + ceph osd crush reweight {osd-id} .2 + + and allow migration to complete before reweighting to ``0.4``, + ``0.6``, and so on until the desired CRUSH weight is reached. + + To limit the impact of OSD failures, you can set:: + + mon osd down out interval = 0 + + which prevents down OSDs from automatically being marked out, and then + ramp them down manually with:: + + ceph osd reweight {osd-num} .8 + + Again, wait for the cluster to finish migrating data, and then adjust + the weight further until you reach a weight of 0. Note that this + problem prevents the cluster to automatically re-replicate data after + a failure, so please ensure that sufficient monitoring is in place for + an administrator to intervene promptly. + + Note that this practice will no longer be necessary in Bobtail and + subsequent releases. + + +Replacing an OSD +---------------- + +When disks fail, or if an admnistrator wants to reprovision OSDs with a new +backend, for instance, for switching from FileStore to BlueStore, OSDs need to +be replaced. Unlike `Removing the OSD`_, replaced OSD's id and CRUSH map entry +need to be keep intact after the OSD is destroyed for replacement. + +#. Destroy the OSD first:: + + ceph osd destroy {id} --yes-i-really-mean-it + +#. Zap a disk for the new OSD, if the disk was used before for other purposes. + It's not necessary for a new disk:: + + ceph-disk zap /dev/sdX + +#. Prepare the disk for replacement by using the previously destroyed OSD id:: + + ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen` + +#. And activate the OSD:: + + ceph-disk activate /dev/sdX1 + + +Starting the OSD +---------------- + +After you add an OSD to Ceph, the OSD is in your configuration. However, +it is not yet running. 
The OSD is ``down`` and ``in``. You must start +your new OSD before it can begin receiving data. You may use +``service ceph`` from your admin host or start the OSD from its host +machine. + +For Ubuntu Trusty use Upstart. :: + + sudo start ceph-osd id={osd-num} + +For all other distros use systemd. :: + + sudo systemctl start ceph-osd@{osd-num} + + +Once you start your OSD, it is ``up`` and ``in``. + + +Observe the Data Migration +-------------------------- + +Once you have added your new OSD to the CRUSH map, Ceph will begin rebalancing +the server by migrating placement groups to your new OSD. You can observe this +process with the `ceph`_ tool. :: + + ceph -w + +You should see the placement group states change from ``active+clean`` to +``active, some degraded objects``, and finally ``active+clean`` when migration +completes. (Control-c to exit.) + + +.. _Add/Move an OSD: ../crush-map#addosd +.. _ceph: ../monitoring + + + +Removing OSDs (Manual) +====================== + +When you want to reduce the size of a cluster or replace hardware, you may +remove an OSD at runtime. With Ceph, an OSD is generally one Ceph ``ceph-osd`` +daemon for one storage drive within a host machine. If your host has multiple +storage drives, you may need to remove one ``ceph-osd`` daemon for each drive. +Generally, it's a good idea to check the capacity of your cluster to see if you +are reaching the upper end of its capacity. Ensure that when you remove an OSD +that your cluster is not at its ``near full`` ratio. + +.. warning:: Do not let your cluster reach its ``full ratio`` when + removing an OSD. Removing OSDs could cause the cluster to reach + or exceed its ``full ratio``. + + +Take the OSD out of the Cluster +----------------------------------- + +Before you remove an OSD, it is usually ``up`` and ``in``. You need to take it +out of the cluster so that Ceph can begin rebalancing and copying its data to +other OSDs. :: + + ceph osd out {osd-num} + + +Observe the Data Migration +-------------------------- + +Once you have taken your OSD ``out`` of the cluster, Ceph will begin +rebalancing the cluster by migrating placement groups out of the OSD you +removed. You can observe this process with the `ceph`_ tool. :: + + ceph -w + +You should see the placement group states change from ``active+clean`` to +``active, some degraded objects``, and finally ``active+clean`` when migration +completes. (Control-c to exit.) + +.. note:: Sometimes, typically in a "small" cluster with few hosts (for + instance with a small testing cluster), the fact to take ``out`` the + OSD can spawn a CRUSH corner case where some PGs remain stuck in the + ``active+remapped`` state. If you are in this case, you should mark + the OSD ``in`` with: + + ``ceph osd in {osd-num}`` + + to come back to the initial state and then, instead of marking ``out`` + the OSD, set its weight to 0 with: + + ``ceph osd crush reweight osd.{osd-num} 0`` + + After that, you can observe the data migration which should come to its + end. The difference between marking ``out`` the OSD and reweighting it + to 0 is that in the first case the weight of the bucket which contains + the OSD is not changed whereas in the second case the weight of the bucket + is updated (and decreased of the OSD weight). The reweight command could + be sometimes favoured in the case of a "small" cluster. + + + +Stopping the OSD +---------------- + +After you take an OSD out of the cluster, it may still be running. +That is, the OSD may be ``up`` and ``out``. 
+
+Stopping the OSD
+----------------
+
+After you take an OSD out of the cluster, it may still be running.
+That is, the OSD may be ``up`` and ``out``. You must stop
+your OSD before you remove it from the configuration. ::
+
+	ssh {osd-host}
+	sudo systemctl stop ceph-osd@{osd-num}
+
+Once you stop your OSD, it is ``down``.
+
+
+Removing the OSD
+----------------
+
+This procedure removes an OSD from a cluster map, removes its authentication
+key, removes the OSD from the OSD map, and removes the OSD from the
+``ceph.conf`` file. If your host has multiple drives, you may need to remove an
+OSD for each drive by repeating this procedure. A consolidated example appears
+at the end of this section.
+
+#. Let the cluster forget the OSD first. This step removes the OSD from the
+   CRUSH map, removes its authentication key, and removes it from the OSD map
+   as well. Please note that the `purge subcommand`_ was introduced in Luminous;
+   for older versions, see below. ::
+
+	ceph osd purge {id} --yes-i-really-mean-it
+
+#. Navigate to the host where you keep the master copy of the cluster's
+   ``ceph.conf`` file. ::
+
+	ssh {admin-host}
+	cd /etc/ceph
+	vim ceph.conf
+
+#. Remove the OSD entry from your ``ceph.conf`` file (if it exists). ::
+
+	[osd.1]
+	host = {hostname}
+
+#. From the host where you keep the master copy of the cluster's ``ceph.conf`` file,
+   copy the updated ``ceph.conf`` file to the ``/etc/ceph`` directory of other
+   hosts in your cluster.
+
+If your Ceph cluster is older than Luminous, instead of using ``ceph osd purge``,
+you need to perform this step manually:
+
+
+#. Remove the OSD from the CRUSH map so that it no longer receives data. You may
+   also decompile the CRUSH map, remove the OSD from the device list, remove the
+   device as an item in the host bucket or remove the host bucket (if it's in the
+   CRUSH map and you intend to remove the host), recompile the map and set it.
+   See `Remove an OSD`_ for details. ::
+
+	ceph osd crush remove {name}
+
+#. Remove the OSD authentication key. ::
+
+	ceph auth del osd.{osd-num}
+
+   The value of ``ceph`` for ``ceph-{osd-num}`` in the path is the ``$cluster-$id``.
+   If your cluster name differs from ``ceph``, use your cluster name instead.
+
+#. Remove the OSD. ::
+
+	ceph osd rm {osd-num}
+	#for example
+	ceph osd rm 1
+
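+Putting the steps in this section together, removing a hypothetical ``osd.1``
+from a Luminous or later cluster might look like this (the id and host are
+examples only)::
+
+	ceph osd out 1
+	ceph -w                                    # wait for rebalancing to finish
+	ssh {osd-host} sudo systemctl stop ceph-osd@1
+	ceph osd purge 1 --yes-i-really-mean-it
+
+Remember to remove any remaining ``[osd.1]`` entry from ``ceph.conf`` afterwards,
+as described above.
+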
+
+
+.. _Remove an OSD: ../crush-map#removeosd
+.. _purge subcommand: /man/8/ceph#osd
diff --git a/src/ceph/doc/rados/operations/cache-tiering.rst b/src/ceph/doc/rados/operations/cache-tiering.rst
new file mode 100644
index 0000000..322c6ff
--- /dev/null
+++ b/src/ceph/doc/rados/operations/cache-tiering.rst
@@ -0,0 +1,461 @@
+===============
+ Cache Tiering
+===============
+
+A cache tier provides Ceph Clients with better I/O performance for a subset of
+the data stored in a backing storage tier. Cache tiering involves creating a
+pool of relatively fast/expensive storage devices (e.g., solid state drives)
+configured to act as a cache tier, and a backing pool of either erasure-coded
+or relatively slower/cheaper devices configured to act as an economical storage
+tier. The Ceph objecter handles where to place the objects and the tiering
+agent determines when to flush objects from the cache to the backing storage
+tier. So the cache tier and the backing storage tier are completely transparent
+to Ceph clients.
+
+
+.. ditaa::
+           +-------------+
+           | Ceph Client |
+           +------+------+
+                  ^
+     Tiering is   |
+    Transparent   |              Faster I/O
+        to Ceph   |           +---------------+
+     Client Ops   |           |               |
+                  |    +----->+   Cache Tier  |
+                  |    |      |               |
+                  |    |      +-----+---+-----+
+                  |    |            |   ^
+                  v    v            |   |   Active Data in Cache Tier
+           +------+----+--+         |   |
+           |   Objecter   |         |   |
+           +-----------+--+         |   |
+                       ^            |   |   Inactive Data in Storage Tier
+                       |            v   |
+                       |      +-----+---+-----+
+                       |      |               |
+                       +----->|  Storage Tier |
+                              |               |
+                              +---------------+
+                                 Slower I/O
+
+The cache tiering agent handles the migration of data between the cache tier
+and the backing storage tier automatically. However, admins have the ability to
+configure how this migration takes place. There are two main scenarios:
+
+- **Writeback Mode:** When admins configure tiers with ``writeback`` mode, Ceph
+  clients write data to the cache tier and receive an ACK from the cache tier.
+  In time, the data written to the cache tier migrates to the storage tier
+  and gets flushed from the cache tier. Conceptually, the cache tier is
+  overlaid "in front" of the backing storage tier. When a Ceph client needs
+  data that resides in the storage tier, the cache tiering agent migrates the
+  data to the cache tier on read, then it is sent to the Ceph client.
+  Thereafter, the Ceph client can perform I/O using the cache tier, until the
+  data becomes inactive. This is ideal for mutable data (e.g., photo/video
+  editing, transactional data, etc.).
+
+- **Read-proxy Mode:** This mode will use any objects that already
+  exist in the cache tier, but if an object is not present in the
+  cache the request will be proxied to the base tier. This is useful
+  for transitioning from ``writeback`` mode to a disabled cache as it
+  allows the workload to function properly while the cache is drained,
+  without adding any new objects to the cache.
+
+A word of caution
+=================
+
+Cache tiering will *degrade* performance for most workloads. Users should use
+extreme caution before using this feature.
+
+* *Workload dependent*: Whether a cache will improve performance is
+  highly dependent on the workload. Because there is a cost
+  associated with moving objects into or out of the cache, it can only
+  be effective when there is a *large skew* in the access pattern in
+  the data set, such that most of the requests touch a small number of
+  objects. The cache pool should be large enough to capture the
+  working set for your workload to avoid thrashing.
+
+* *Difficult to benchmark*: Most benchmarks that users run to measure
+  performance will show terrible performance with cache tiering, in
+  part because very few of them skew requests toward a small set of
+  objects, because it can take a long time for the cache to "warm up,"
+  and because the warm-up cost can be high.
+
+* *Usually slower*: For workloads that are not cache tiering-friendly,
+  performance is often slower than a normal RADOS pool without cache
+  tiering enabled.
+
+* *librados object enumeration*: The librados-level object enumeration
+  API is not meant to be coherent in the presence of a cache. If
+  your application is using librados directly and relies on object
+  enumeration, cache tiering will probably not work as expected.
+  (This is not a problem for RGW, RBD, or CephFS.)
+
+* *Complexity*: Enabling cache tiering means that a lot of additional
+  machinery and complexity within the RADOS cluster is being used.
+ This increases the probability that you will encounter a bug in the system + that other users have not yet encountered and will put your deployment at a + higher level of risk. + +Known Good Workloads +-------------------- + +* *RGW time-skewed*: If the RGW workload is such that almost all read + operations are directed at recently written objects, a simple cache + tiering configuration that destages recently written objects from + the cache to the base tier after a configurable period can work + well. + +Known Bad Workloads +------------------- + +The following configurations are *known to work poorly* with cache +tiering. + +* *RBD with replicated cache and erasure-coded base*: This is a common + request, but usually does not perform well. Even reasonably skewed + workloads still send some small writes to cold objects, and because + small writes are not yet supported by the erasure-coded pool, entire + (usually 4 MB) objects must be migrated into the cache in order to + satisfy a small (often 4 KB) write. Only a handful of users have + successfully deployed this configuration, and it only works for them + because their data is extremely cold (backups) and they are not in + any way sensitive to performance. + +* *RBD with replicated cache and base*: RBD with a replicated base + tier does better than when the base is erasure coded, but it is + still highly dependent on the amount of skew in the workload, and + very difficult to validate. The user will need to have a good + understanding of their workload and will need to tune the cache + tiering parameters carefully. + + +Setting Up Pools +================ + +To set up cache tiering, you must have two pools. One will act as the +backing storage and the other will act as the cache. + + +Setting Up a Backing Storage Pool +--------------------------------- + +Setting up a backing storage pool typically involves one of two scenarios: + +- **Standard Storage**: In this scenario, the pool stores multiple copies + of an object in the Ceph Storage Cluster. + +- **Erasure Coding:** In this scenario, the pool uses erasure coding to + store data much more efficiently with a small performance tradeoff. + +In the standard storage scenario, you can setup a CRUSH ruleset to establish +the failure domain (e.g., osd, host, chassis, rack, row, etc.). Ceph OSD +Daemons perform optimally when all storage drives in the ruleset are of the +same size, speed (both RPMs and throughput) and type. See `CRUSH Maps`_ +for details on creating a ruleset. Once you have created a ruleset, create +a backing storage pool. + +In the erasure coding scenario, the pool creation arguments will generate the +appropriate ruleset automatically. See `Create a Pool`_ for details. + +In subsequent examples, we will refer to the backing storage pool +as ``cold-storage``. + + +Setting Up a Cache Pool +----------------------- + +Setting up a cache pool follows the same procedure as the standard storage +scenario, but with this difference: the drives for the cache tier are typically +high performance drives that reside in their own servers and have their own +ruleset. When setting up a ruleset, it should take account of the hosts that +have the high performance drives while omitting the hosts that don't. See +`Placing Different Pools on Different OSDs`_ for details. + + +In subsequent examples, we will refer to the cache pool as ``hot-storage`` and +the backing pool as ``cold-storage``. + +For cache tier configuration and default values, see +`Pools - Set Pool Values`_. 
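+
+For illustration, the two pools used in the examples of this section could be
+created as follows; the pool names and placement group counts are examples
+only, so choose values appropriate for your cluster (see `Create a Pool`_)::
+
+	ceph osd pool create cold-storage 128
+	ceph osd pool create hot-storage 128
+
+In practice, the ``hot-storage`` pool would also be assigned the CRUSH ruleset
+that targets the high performance drives, as described in the previous section.
+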
+ + +Creating a Cache Tier +===================== + +Setting up a cache tier involves associating a backing storage pool with +a cache pool :: + + ceph osd tier add {storagepool} {cachepool} + +For example :: + + ceph osd tier add cold-storage hot-storage + +To set the cache mode, execute the following:: + + ceph osd tier cache-mode {cachepool} {cache-mode} + +For example:: + + ceph osd tier cache-mode hot-storage writeback + +The cache tiers overlay the backing storage tier, so they require one +additional step: you must direct all client traffic from the storage pool to +the cache pool. To direct client traffic directly to the cache pool, execute +the following:: + + ceph osd tier set-overlay {storagepool} {cachepool} + +For example:: + + ceph osd tier set-overlay cold-storage hot-storage + + +Configuring a Cache Tier +======================== + +Cache tiers have several configuration options. You may set +cache tier configuration options with the following usage:: + + ceph osd pool set {cachepool} {key} {value} + +See `Pools - Set Pool Values`_ for details. + + +Target Size and Type +-------------------- + +Ceph's production cache tiers use a `Bloom Filter`_ for the ``hit_set_type``:: + + ceph osd pool set {cachepool} hit_set_type bloom + +For example:: + + ceph osd pool set hot-storage hit_set_type bloom + +The ``hit_set_count`` and ``hit_set_period`` define how much time each HitSet +should cover, and how many such HitSets to store. :: + + ceph osd pool set {cachepool} hit_set_count 12 + ceph osd pool set {cachepool} hit_set_period 14400 + ceph osd pool set {cachepool} target_max_bytes 1000000000000 + +.. note:: A larger ``hit_set_count`` results in more RAM consumed by + the ``ceph-osd`` process. + +Binning accesses over time allows Ceph to determine whether a Ceph client +accessed an object at least once, or more than once over a time period +("age" vs "temperature"). + +The ``min_read_recency_for_promote`` defines how many HitSets to check for the +existence of an object when handling a read operation. The checking result is +used to decide whether to promote the object asynchronously. Its value should be +between 0 and ``hit_set_count``. If it's set to 0, the object is always promoted. +If it's set to 1, the current HitSet is checked. And if this object is in the +current HitSet, it's promoted. Otherwise not. For the other values, the exact +number of archive HitSets are checked. The object is promoted if the object is +found in any of the most recent ``min_read_recency_for_promote`` HitSets. + +A similar parameter can be set for the write operation, which is +``min_write_recency_for_promote``. :: + + ceph osd pool set {cachepool} min_read_recency_for_promote 2 + ceph osd pool set {cachepool} min_write_recency_for_promote 2 + +.. note:: The longer the period and the higher the + ``min_read_recency_for_promote`` and + ``min_write_recency_for_promote``values, the more RAM the ``ceph-osd`` + daemon consumes. In particular, when the agent is active to flush + or evict cache objects, all ``hit_set_count`` HitSets are loaded + into RAM. + + +Cache Sizing +------------ + +The cache tiering agent performs two main functions: + +- **Flushing:** The agent identifies modified (or dirty) objects and forwards + them to the storage pool for long-term storage. + +- **Evicting:** The agent identifies objects that haven't been modified + (or clean) and evicts the least recently used among them from the cache. 
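+
+Before tuning the thresholds in the following sections, it can be helpful to
+look at what is currently configured on the cache pool. One way to do this
+(assuming the ``hot-storage`` pool from the earlier examples) is::
+
+	ceph osd dump | grep hot-storage
+	ceph osd pool get hot-storage target_max_bytes
+
+The ``ceph osd pool get`` form works for the other cache tiering values shown
+below as well, though the exact set of readable fields can vary by release.
+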
+
+Absolute Sizing
+~~~~~~~~~~~~~~~
+
+The cache tiering agent can flush or evict objects based upon the total number
+of bytes or the total number of objects. To specify a maximum number of bytes,
+execute the following::
+
+	ceph osd pool set {cachepool} target_max_bytes {#bytes}
+
+For example, to flush or evict at 1 TB, execute the following::
+
+	ceph osd pool set hot-storage target_max_bytes 1099511627776
+
+
+To specify the maximum number of objects, execute the following::
+
+	ceph osd pool set {cachepool} target_max_objects {#objects}
+
+For example, to flush or evict at 1M objects, execute the following::
+
+	ceph osd pool set hot-storage target_max_objects 1000000
+
+.. note:: Ceph is not able to determine the size of a cache pool automatically,
+   so you must configure an absolute size here; otherwise, flushing and
+   eviction will not work. If you specify both limits, the cache tiering
+   agent will begin flushing or evicting when either threshold is triggered.
+
+.. note:: Client requests are blocked only when ``target_max_bytes`` or
+   ``target_max_objects`` is reached.
+
+Relative Sizing
+~~~~~~~~~~~~~~~
+
+The cache tiering agent can flush or evict objects relative to the size of the
+cache pool (specified by ``target_max_bytes`` / ``target_max_objects`` in
+`Absolute Sizing`_). When the cache pool consists of a certain percentage of
+modified (or dirty) objects, the cache tiering agent will flush them to the
+storage pool. To set the ``cache_target_dirty_ratio``, execute the following::
+
+	ceph osd pool set {cachepool} cache_target_dirty_ratio {0.0..1.0}
+
+For example, setting the value to ``0.4`` will begin flushing modified
+(dirty) objects when they reach 40% of the cache pool's capacity::
+
+	ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
+
+When the dirty objects reach a certain percentage of the cache pool's capacity,
+the cache tiering agent flushes them at a higher speed. To set the
+``cache_target_dirty_high_ratio``::
+
+	ceph osd pool set {cachepool} cache_target_dirty_high_ratio {0.0..1.0}
+
+For example, setting the value to ``0.6`` will begin aggressively flushing
+dirty objects when they reach 60% of the cache pool's capacity. This value
+should be set between ``cache_target_dirty_ratio`` and ``cache_target_full_ratio``::
+
+	ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6
+
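+As a quick worked example of how the relative thresholds combine with the
+absolute limits above (the numbers are illustrative only): with
+``target_max_bytes`` set to 1099511627776 (1 TiB), a ``cache_target_dirty_ratio``
+of ``0.4`` means flushing begins once roughly 440 GB of the cache pool is
+dirty, and a ``cache_target_dirty_high_ratio`` of ``0.6`` makes flushing more
+aggressive at roughly 660 GB::
+
+	ceph osd pool set hot-storage target_max_bytes 1099511627776
+	ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
+	ceph osd pool set hot-storage cache_target_dirty_high_ratio 0.6
+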
+When the cache pool reaches a certain percentage of its capacity, the cache
+tiering agent will evict objects to maintain free capacity. To set the
+``cache_target_full_ratio``, execute the following::
+
+	ceph osd pool set {cachepool} cache_target_full_ratio {0.0..1.0}
+
+For example, setting the value to ``0.8`` will begin flushing unmodified
+(clean) objects when they reach 80% of the cache pool's capacity::
+
+	ceph osd pool set hot-storage cache_target_full_ratio 0.8
+
+
+Cache Age
+---------
+
+You can specify the minimum age of an object before the cache tiering agent
+flushes a recently modified (or dirty) object to the backing storage pool::
+
+	ceph osd pool set {cachepool} cache_min_flush_age {#seconds}
+
+For example, to flush modified (or dirty) objects after 10 minutes, execute
+the following::
+
+	ceph osd pool set hot-storage cache_min_flush_age 600
+
+You can specify the minimum age of an object before it will be evicted from
+the cache tier::
+
+	ceph osd pool set {cache-tier} cache_min_evict_age {#seconds}
+
+For example, to evict objects after 30 minutes, execute the following::
+
+	ceph osd pool set hot-storage cache_min_evict_age 1800
+
+
+Removing a Cache Tier
+=====================
+
+Removing a cache tier differs depending on whether it is a writeback
+cache or a read-only cache.
+
+
+Removing a Read-Only Cache
+--------------------------
+
+Since a read-only cache does not have modified data, you can disable
+and remove it without losing any recent changes to objects in the cache.
+
+#. Change the cache-mode to ``none`` to disable it. ::
+
+	ceph osd tier cache-mode {cachepool} none
+
+   For example::
+
+	ceph osd tier cache-mode hot-storage none
+
+#. Remove the cache pool from the backing pool. ::
+
+	ceph osd tier remove {storagepool} {cachepool}
+
+   For example::
+
+	ceph osd tier remove cold-storage hot-storage
+
+
+
+Removing a Writeback Cache
+--------------------------
+
+Since a writeback cache may have modified data, you must take steps to ensure
+that you do not lose any recent changes to objects in the cache before you
+disable and remove it.
+
+
+#. Change the cache mode to ``forward`` so that new and modified objects will
+   flush to the backing storage pool. ::
+
+	ceph osd tier cache-mode {cachepool} forward
+
+   For example::
+
+	ceph osd tier cache-mode hot-storage forward
+
+
+#. Ensure that the cache pool has been flushed. This may take a few minutes::
+
+	rados -p {cachepool} ls
+
+   If the cache pool still has objects, you can flush them manually.
+   For example::
+
+	rados -p {cachepool} cache-flush-evict-all
+
+
+#. Remove the overlay so that clients will not direct traffic to the cache. ::
+
+	ceph osd tier remove-overlay {storagepool}
+
+   For example::
+
+	ceph osd tier remove-overlay cold-storage
+
+
+#. Finally, remove the cache tier pool from the backing storage pool. ::
+
+	ceph osd tier remove {storagepool} {cachepool}
+
+   For example::
+
+	ceph osd tier remove cold-storage hot-storage
+
+
+.. _Create a Pool: ../pools#create-a-pool
+.. _Pools - Set Pool Values: ../pools#set-pool-values
+.. _Placing Different Pools on Different OSDs: ../crush-map/#placing-different-pools-on-different-osds
+.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter
+.. _CRUSH Maps: ../crush-map
+.. _Absolute Sizing: #absolute-sizing
diff --git a/src/ceph/doc/rados/operations/control.rst b/src/ceph/doc/rados/operations/control.rst
new file mode 100644
index 0000000..1a58076
--- /dev/null
+++ b/src/ceph/doc/rados/operations/control.rst
@@ -0,0 +1,453 @@
+.. 
index:: control, commands + +================== + Control Commands +================== + + +Monitor Commands +================ + +Monitor commands are issued using the ceph utility:: + + ceph [-m monhost] {command} + +The command is usually (though not always) of the form:: + + ceph {subsystem} {command} + + +System Commands +=============== + +Execute the following to display the current status of the cluster. :: + + ceph -s + ceph status + +Execute the following to display a running summary of the status of the cluster, +and major events. :: + + ceph -w + +Execute the following to show the monitor quorum, including which monitors are +participating and which one is the leader. :: + + ceph quorum_status + +Execute the following to query the status of a single monitor, including whether +or not it is in the quorum. :: + + ceph [-m monhost] mon_status + + +Authentication Subsystem +======================== + +To add a keyring for an OSD, execute the following:: + + ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring} + +To list the cluster's keys and their capabilities, execute the following:: + + ceph auth ls + + +Placement Group Subsystem +========================= + +To display the statistics for all placement groups, execute the following:: + + ceph pg dump [--format {format}] + +The valid formats are ``plain`` (default) and ``json``. + +To display the statistics for all placement groups stuck in a specified state, +execute the following:: + + ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}] + + +``--format`` may be ``plain`` (default) or ``json`` + +``--threshold`` defines how many seconds "stuck" is (default: 300) + +**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD +with the most up-to-date data to come back. + +**Unclean** Placement groups contain objects that are not replicated the desired number +of times. They should be recovering. + +**Stale** Placement groups are in an unknown state - the OSDs that host them have not +reported to the monitor cluster in a while (configured by +``mon_osd_report_timeout``). + +Delete "lost" objects or revert them to their prior state, either a previous version +or delete them if they were just created. :: + + ceph pg {pgid} mark_unfound_lost revert|delete + + +OSD Subsystem +============= + +Query OSD subsystem status. :: + + ceph osd stat + +Write a copy of the most recent OSD map to a file. See +`osdmaptool`_. :: + + ceph osd getmap -o file + +.. _osdmaptool: ../../man/8/osdmaptool + +Write a copy of the crush map from the most recent OSD map to +file. :: + + ceph osd getcrushmap -o file + +The foregoing functionally equivalent to :: + + ceph osd getmap -o /tmp/osdmap + osdmaptool /tmp/osdmap --export-crush file + +Dump the OSD map. Valid formats for ``-f`` are ``plain`` and ``json``. If no +``--format`` option is given, the OSD map is dumped as plain text. :: + + ceph osd dump [--format {format}] + +Dump the OSD map as a tree with one line per OSD containing weight +and state. :: + + ceph osd tree [--format {format}] + +Find out where a specific object is or would be stored in the system:: + + ceph osd map <pool-name> <object-name> + +Add or move a new item (OSD) with the given id/name/weight at the specified +location. :: + + ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]] + +Remove an existing item (OSD) from the CRUSH map. :: + + ceph osd crush remove {name} + +Remove an existing bucket from the CRUSH map. 
:: + + ceph osd crush remove {bucket-name} + +Move an existing bucket from one position in the hierarchy to another. :: + + ceph osd crush move {id} {loc1} [{loc2} ...] + +Set the weight of the item given by ``{name}`` to ``{weight}``. :: + + ceph osd crush reweight {name} {weight} + +Mark an OSD as lost. This may result in permanent data loss. Use with caution. :: + + ceph osd lost {id} [--yes-i-really-mean-it] + +Create a new OSD. If no UUID is given, it will be set automatically when the OSD +starts up. :: + + ceph osd create [{uuid}] + +Remove the given OSD(s). :: + + ceph osd rm [{id}...] + +Query the current max_osd parameter in the OSD map. :: + + ceph osd getmaxosd + +Import the given crush map. :: + + ceph osd setcrushmap -i file + +Set the ``max_osd`` parameter in the OSD map. This is necessary when +expanding the storage cluster. :: + + ceph osd setmaxosd + +Mark OSD ``{osd-num}`` down. :: + + ceph osd down {osd-num} + +Mark OSD ``{osd-num}`` out of the distribution (i.e. allocated no data). :: + + ceph osd out {osd-num} + +Mark ``{osd-num}`` in the distribution (i.e. allocated data). :: + + ceph osd in {osd-num} + +Set or clear the pause flags in the OSD map. If set, no IO requests +will be sent to any OSD. Clearing the flags via unpause results in +resending pending requests. :: + + ceph osd pause + ceph osd unpause + +Set the weight of ``{osd-num}`` to ``{weight}``. Two OSDs with the +same weight will receive roughly the same number of I/O requests and +store approximately the same amount of data. ``ceph osd reweight`` +sets an override weight on the OSD. This value is in the range 0 to 1, +and forces CRUSH to re-place (1-weight) of the data that would +otherwise live on this drive. It does not change the weights assigned +to the buckets above the OSD in the crush map, and is a corrective +measure in case the normal CRUSH distribution is not working out quite +right. For instance, if one of your OSDs is at 90% and the others are +at 50%, you could reduce this weight to try and compensate for it. :: + + ceph osd reweight {osd-num} {weight} + +Reweights all the OSDs by reducing the weight of OSDs which are +heavily overused. By default it will adjust the weights downward on +OSDs which have 120% of the average utilization, but if you include +threshold it will use that percentage instead. :: + + ceph osd reweight-by-utilization [threshold] + +Describes what reweight-by-utilization would do. :: + + ceph osd test-reweight-by-utilization + +Adds/removes the address to/from the blacklist. When adding an address, +you can specify how long it should be blacklisted in seconds; otherwise, +it will default to 1 hour. A blacklisted address is prevented from +connecting to any OSD. Blacklisting is most often used to prevent a +lagging metadata server from making bad changes to data on the OSDs. + +These commands are mostly only useful for failure testing, as +blacklists are normally maintained automatically and shouldn't need +manual intervention. :: + + ceph osd blacklist add ADDRESS[:source_port] [TIME] + ceph osd blacklist rm ADDRESS[:source_port] + +Creates/deletes a snapshot of a pool. :: + + ceph osd pool mksnap {pool-name} {snap-name} + ceph osd pool rmsnap {pool-name} {snap-name} + +Creates/deletes/renames a storage pool. :: + + ceph osd pool create {pool-name} pg_num [pgp_num] + ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it] + ceph osd pool rename {old-name} {new-name} + +Changes a pool setting. 
:: + + ceph osd pool set {pool-name} {field} {value} + +Valid fields are: + + * ``size``: Sets the number of copies of data in the pool. + * ``pg_num``: The placement group number. + * ``pgp_num``: Effective number when calculating pg placement. + * ``crush_ruleset``: rule number for mapping placement. + +Get the value of a pool setting. :: + + ceph osd pool get {pool-name} {field} + +Valid fields are: + + * ``pg_num``: The placement group number. + * ``pgp_num``: Effective number of placement groups when calculating placement. + * ``lpg_num``: The number of local placement groups. + * ``lpgp_num``: The number used for placing the local placement groups. + + +Sends a scrub command to OSD ``{osd-num}``. To send the command to all OSDs, use ``*``. :: + + ceph osd scrub {osd-num} + +Sends a repair command to OSD.N. To send the command to all OSDs, use ``*``. :: + + ceph osd repair N + +Runs a simple throughput benchmark against OSD.N, writing ``TOTAL_DATA_BYTES`` +in write requests of ``BYTES_PER_WRITE`` each. By default, the test +writes 1 GB in total in 4-MB increments. +The benchmark is non-destructive and will not overwrite existing live +OSD data, but might temporarily affect the performance of clients +concurrently accessing the OSD. :: + + ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE] + + +MDS Subsystem +============= + +Change configuration parameters on a running mds. :: + + ceph tell mds.{mds-id} injectargs --{switch} {value} [--{switch} {value}] + +Example:: + + ceph tell mds.0 injectargs --debug_ms 1 --debug_mds 10 + +Enables debug messages. :: + + ceph mds stat + +Displays the status of all metadata servers. :: + + ceph mds fail 0 + +Marks the active MDS as failed, triggering failover to a standby if present. + +.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap + + +Mon Subsystem +============= + +Show monitor stats:: + + ceph mon stat + + e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c + + +The ``quorum`` list at the end lists monitor nodes that are part of the current quorum. + +This is also available more directly:: + + ceph quorum_status -f json-pretty + +.. code-block:: javascript + + { + "election_epoch": 6, + "quorum": [ + 0, + 1, + 2 + ], + "quorum_names": [ + "a", + "b", + "c" + ], + "quorum_leader_name": "a", + "monmap": { + "epoch": 2, + "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc", + "modified": "2016-12-26 14:42:09.288066", + "created": "2016-12-26 14:42:03.573585", + "features": { + "persistent": [ + "kraken" + ], + "optional": [] + }, + "mons": [ + { + "rank": 0, + "name": "a", + "addr": "127.0.0.1:40000\/0", + "public_addr": "127.0.0.1:40000\/0" + }, + { + "rank": 1, + "name": "b", + "addr": "127.0.0.1:40001\/0", + "public_addr": "127.0.0.1:40001\/0" + }, + { + "rank": 2, + "name": "c", + "addr": "127.0.0.1:40002\/0", + "public_addr": "127.0.0.1:40002\/0" + } + ] + } + } + + +The above will block until a quorum is reached. + +For a status of just the monitor you connect to (use ``-m HOST:PORT`` +to select):: + + ceph mon_status -f json-pretty + + +.. 
code-block:: javascript + + { + "name": "b", + "rank": 1, + "state": "peon", + "election_epoch": 6, + "quorum": [ + 0, + 1, + 2 + ], + "features": { + "required_con": "9025616074522624", + "required_mon": [ + "kraken" + ], + "quorum_con": "1152921504336314367", + "quorum_mon": [ + "kraken" + ] + }, + "outside_quorum": [], + "extra_probe_peers": [], + "sync_provider": [], + "monmap": { + "epoch": 2, + "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc", + "modified": "2016-12-26 14:42:09.288066", + "created": "2016-12-26 14:42:03.573585", + "features": { + "persistent": [ + "kraken" + ], + "optional": [] + }, + "mons": [ + { + "rank": 0, + "name": "a", + "addr": "127.0.0.1:40000\/0", + "public_addr": "127.0.0.1:40000\/0" + }, + { + "rank": 1, + "name": "b", + "addr": "127.0.0.1:40001\/0", + "public_addr": "127.0.0.1:40001\/0" + }, + { + "rank": 2, + "name": "c", + "addr": "127.0.0.1:40002\/0", + "public_addr": "127.0.0.1:40002\/0" + } + ] + } + } + +A dump of the monitor state:: + + ceph mon dump + + dumped monmap epoch 2 + epoch 2 + fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc + last_changed 2016-12-26 14:42:09.288066 + created 2016-12-26 14:42:03.573585 + 0: 127.0.0.1:40000/0 mon.a + 1: 127.0.0.1:40001/0 mon.b + 2: 127.0.0.1:40002/0 mon.c + diff --git a/src/ceph/doc/rados/operations/crush-map-edits.rst b/src/ceph/doc/rados/operations/crush-map-edits.rst new file mode 100644 index 0000000..5222270 --- /dev/null +++ b/src/ceph/doc/rados/operations/crush-map-edits.rst @@ -0,0 +1,654 @@ +Manually editing a CRUSH Map +============================ + +.. note:: Manually editing the CRUSH map is considered an advanced + administrator operation. All CRUSH changes that are + necessary for the overwhelming majority of installations are + possible via the standard ceph CLI and do not require manual + CRUSH map edits. If you have identified a use case where + manual edits *are* necessary, consider contacting the Ceph + developers so that future versions of Ceph can make this + unnecessary. + +To edit an existing CRUSH map: + +#. `Get the CRUSH map`_. +#. `Decompile`_ the CRUSH map. +#. Edit at least one of `Devices`_, `Buckets`_ and `Rules`_. +#. `Recompile`_ the CRUSH map. +#. `Set the CRUSH map`_. + +To activate CRUSH map rules for a specific pool, identify the common ruleset +number for those rules and specify that ruleset number for the pool. See `Set +Pool Values`_ for details. + +.. _Get the CRUSH map: #getcrushmap +.. _Decompile: #decompilecrushmap +.. _Devices: #crushmapdevices +.. _Buckets: #crushmapbuckets +.. _Rules: #crushmaprules +.. _Recompile: #compilecrushmap +.. _Set the CRUSH map: #setcrushmap +.. _Set Pool Values: ../pools#setpoolvalues + +.. _getcrushmap: + +Get a CRUSH Map +--------------- + +To get the CRUSH map for your cluster, execute the following:: + + ceph osd getcrushmap -o {compiled-crushmap-filename} + +Ceph will output (-o) a compiled CRUSH map to the filename you specified. Since +the CRUSH map is in a compiled form, you must decompile it first before you can +edit it. + +.. _decompilecrushmap: + +Decompile a CRUSH Map +--------------------- + +To decompile a CRUSH map, execute the following:: + + crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename} + + +Sections +-------- + +There are six main sections to a CRUSH Map. + +#. **tunables:** The preamble at the top of the map described any *tunables* + for CRUSH behavior that vary from the historical/legacy CRUSH behavior. 
These + correct for old bugs, optimizations, or other changes in behavior that have + been made over the years to improve CRUSH's behavior. + +#. **devices:** Devices are individual ``ceph-osd`` daemons that can + store data. + +#. **types**: Bucket ``types`` define the types of buckets used in + your CRUSH hierarchy. Buckets consist of a hierarchical aggregation + of storage locations (e.g., rows, racks, chassis, hosts, etc.) and + their assigned weights. + +#. **buckets:** Once you define bucket types, you must define each node + in the hierarchy, its type, and which devices or other nodes it + containes. + +#. **rules:** Rules define policy about how data is distributed across + devices in the hierarchy. + +#. **choose_args:** Choose_args are alternative weights associated with + the hierarchy that have been adjusted to optimize data placement. A single + choose_args map can be used for the entire cluster, or one can be + created for each individual pool. + + +.. _crushmapdevices: + +CRUSH Map Devices +----------------- + +Devices are individual ``ceph-osd`` daemons that can store data. You +will normally have one defined here for each OSD daemon in your +cluster. Devices are identified by an id (a non-negative integer) and +a name, normally ``osd.N`` where ``N`` is the device id. + +Devices may also have a *device class* associated with them (e.g., +``hdd`` or ``ssd``), allowing them to be conveniently targetted by a +crush rule. + +:: + + # devices + device {num} {osd.name} [class {class}] + +For example:: + + # devices + device 0 osd.0 class ssd + device 1 osd.1 class hdd + device 2 osd.2 + device 3 osd.3 + +In most cases, each device maps to a single ``ceph-osd`` daemon. This +is normally a single storage device, a pair of devices (for example, +one for data and one for a journal or metadata), or in some cases a +small RAID device. + + + + + +CRUSH Map Bucket Types +---------------------- + +The second list in the CRUSH map defines 'bucket' types. Buckets facilitate +a hierarchy of nodes and leaves. Node (or non-leaf) buckets typically represent +physical locations in a hierarchy. Nodes aggregate other nodes or leaves. +Leaf buckets represent ``ceph-osd`` daemons and their corresponding storage +media. + +.. tip:: The term "bucket" used in the context of CRUSH means a node in + the hierarchy, i.e. a location or a piece of physical hardware. It + is a different concept from the term "bucket" when used in the + context of RADOS Gateway APIs. + +To add a bucket type to the CRUSH map, create a new line under your list of +bucket types. Enter ``type`` followed by a unique numeric ID and a bucket name. +By convention, there is one leaf bucket and it is ``type 0``; however, you may +give it any name you like (e.g., osd, disk, drive, storage, etc.):: + + #types + type {num} {bucket-name} + +For example:: + + # types + type 0 osd + type 1 host + type 2 chassis + type 3 rack + type 4 row + type 5 pdu + type 6 pod + type 7 room + type 8 datacenter + type 9 region + type 10 root + + + +.. _crushmapbuckets: + +CRUSH Map Bucket Hierarchy +-------------------------- + +The CRUSH algorithm distributes data objects among storage devices according +to a per-device weight value, approximating a uniform probability distribution. +CRUSH distributes objects and their replicas according to the hierarchical +cluster map you define. Your CRUSH map represents the available storage +devices and the logical elements that contain them. 
+ +To map placement groups to OSDs across failure domains, a CRUSH map defines a +hierarchical list of bucket types (i.e., under ``#types`` in the generated CRUSH +map). The purpose of creating a bucket hierarchy is to segregate the +leaf nodes by their failure domains, such as hosts, chassis, racks, power +distribution units, pods, rows, rooms, and data centers. With the exception of +the leaf nodes representing OSDs, the rest of the hierarchy is arbitrary, and +you may define it according to your own needs. + +We recommend adapting your CRUSH map to your firms's hardware naming conventions +and using instances names that reflect the physical hardware. Your naming +practice can make it easier to administer the cluster and troubleshoot +problems when an OSD and/or other hardware malfunctions and the administrator +need access to physical hardware. + +In the following example, the bucket hierarchy has a leaf bucket named ``osd``, +and two node buckets named ``host`` and ``rack`` respectively. + +.. ditaa:: + +-----------+ + | {o}rack | + | Bucket | + +-----+-----+ + | + +---------------+---------------+ + | | + +-----+-----+ +-----+-----+ + | {o}host | | {o}host | + | Bucket | | Bucket | + +-----+-----+ +-----+-----+ + | | + +-------+-------+ +-------+-------+ + | | | | + +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ + | osd | | osd | | osd | | osd | + | Bucket | | Bucket | | Bucket | | Bucket | + +-----------+ +-----------+ +-----------+ +-----------+ + +.. note:: The higher numbered ``rack`` bucket type aggregates the lower + numbered ``host`` bucket type. + +Since leaf nodes reflect storage devices declared under the ``#devices`` list +at the beginning of the CRUSH map, you do not need to declare them as bucket +instances. The second lowest bucket type in your hierarchy usually aggregates +the devices (i.e., it's usually the computer containing the storage media, and +uses whatever term you prefer to describe it, such as "node", "computer", +"server," "host", "machine", etc.). In high density environments, it is +increasingly common to see multiple hosts/nodes per chassis. You should account +for chassis failure too--e.g., the need to pull a chassis if a node fails may +result in bringing down numerous hosts/nodes and their OSDs. + +When declaring a bucket instance, you must specify its type, give it a unique +name (string), assign it a unique ID expressed as a negative integer (optional), +specify a weight relative to the total capacity/capability of its item(s), +specify the bucket algorithm (usually ``straw``), and the hash (usually ``0``, +reflecting hash algorithm ``rjenkins1``). A bucket may have one or more items. +The items may consist of node buckets or leaves. Items may have a weight that +reflects the relative weight of the item. + +You may declare a node bucket with the following syntax:: + + [bucket-type] [bucket-name] { + id [a unique negative numeric ID] + weight [the relative capacity/capability of the item(s)] + alg [the bucket type: uniform | list | tree | straw ] + hash [the hash type: 0 by default] + item [item-name] weight [weight] + } + +For example, using the diagram above, we would define two host buckets +and one rack bucket. 
The OSDs are declared as items within the host buckets:: + + host node1 { + id -1 + alg straw + hash 0 + item osd.0 weight 1.00 + item osd.1 weight 1.00 + } + + host node2 { + id -2 + alg straw + hash 0 + item osd.2 weight 1.00 + item osd.3 weight 1.00 + } + + rack rack1 { + id -3 + alg straw + hash 0 + item node1 weight 2.00 + item node2 weight 2.00 + } + +.. note:: In the foregoing example, note that the rack bucket does not contain + any OSDs. Rather it contains lower level host buckets, and includes the + sum total of their weight in the item entry. + +.. topic:: Bucket Types + + Ceph supports four bucket types, each representing a tradeoff between + performance and reorganization efficiency. If you are unsure of which bucket + type to use, we recommend using a ``straw`` bucket. For a detailed + discussion of bucket types, refer to + `CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_, + and more specifically to **Section 3.4**. The bucket types are: + + #. **Uniform:** Uniform buckets aggregate devices with **exactly** the same + weight. For example, when firms commission or decommission hardware, they + typically do so with many machines that have exactly the same physical + configuration (e.g., bulk purchases). When storage devices have exactly + the same weight, you may use the ``uniform`` bucket type, which allows + CRUSH to map replicas into uniform buckets in constant time. With + non-uniform weights, you should use another bucket algorithm. + + #. **List**: List buckets aggregate their content as linked lists. Based on + the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`P` algorithm, + a list is a natural and intuitive choice for an **expanding cluster**: + either an object is relocated to the newest device with some appropriate + probability, or it remains on the older devices as before. The result is + optimal data migration when items are added to the bucket. Items removed + from the middle or tail of the list, however, can result in a significant + amount of unnecessary movement, making list buckets most suitable for + circumstances in which they **never (or very rarely) shrink**. + + #. **Tree**: Tree buckets use a binary search tree. They are more efficient + than list buckets when a bucket contains a larger set of items. Based on + the :abbr:`RUSH (Replication Under Scalable Hashing)` :sub:`R` algorithm, + tree buckets reduce the placement time to O(log :sub:`n`), making them + suitable for managing much larger sets of devices or nested buckets. + + #. **Straw:** List and Tree buckets use a divide and conquer strategy + in a way that either gives certain items precedence (e.g., those + at the beginning of a list) or obviates the need to consider entire + subtrees of items at all. That improves the performance of the replica + placement process, but can also introduce suboptimal reorganization + behavior when the contents of a bucket change due an addition, removal, + or re-weighting of an item. The straw bucket type allows all items to + fairly “compete” against each other for replica placement through a + process analogous to a draw of straws. + +.. topic:: Hash + + Each bucket uses a hash algorithm. Currently, Ceph supports ``rjenkins1``. + Enter ``0`` as your hash setting to select ``rjenkins1``. + + +.. _weightingbucketitems: + +.. topic:: Weighting Bucket Items + + Ceph expresses bucket weights as doubles, which allows for fine + weighting. A weight is the relative difference between device capacities. 
We + recommend using ``1.00`` as the relative weight for a 1TB storage device. + In such a scenario, a weight of ``0.5`` would represent approximately 500GB, + and a weight of ``3.00`` would represent approximately 3TB. Higher level + buckets have a weight that is the sum total of the leaf items aggregated by + the bucket. + + A bucket item weight is one dimensional, but you may also calculate your + item weights to reflect the performance of the storage drive. For example, + if you have many 1TB drives where some have relatively low data transfer + rate and the others have a relatively high data transfer rate, you may + weight them differently, even though they have the same capacity (e.g., + a weight of 0.80 for the first set of drives with lower total throughput, + and 1.20 for the second set of drives with higher total throughput). + + +.. _crushmaprules: + +CRUSH Map Rules +--------------- + +CRUSH maps support the notion of 'CRUSH rules', which are the rules that +determine data placement for a pool. For large clusters, you will likely create +many pools where each pool may have its own CRUSH ruleset and rules. The default +CRUSH map has a rule for each pool, and one ruleset assigned to each of the +default pools. + +.. note:: In most cases, you will not need to modify the default rules. When + you create a new pool, its default ruleset is ``0``. + + +CRUSH rules define placement and replication strategies or distribution policies +that allow you to specify exactly how CRUSH places object replicas. For +example, you might create a rule selecting a pair of targets for 2-way +mirroring, another rule for selecting three targets in two different data +centers for 3-way mirroring, and yet another rule for erasure coding over six +storage devices. For a detailed discussion of CRUSH rules, refer to +`CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_, +and more specifically to **Section 3.2**. + +A rule takes the following form:: + + rule <rulename> { + + ruleset <ruleset> + type [ replicated | erasure ] + min_size <min-size> + max_size <max-size> + step take <bucket-name> [class <device-class>] + step [choose|chooseleaf] [firstn|indep] <N> <bucket-type> + step emit + } + + +``ruleset`` + +:Description: A means of classifying a rule as belonging to a set of rules. + Activated by `setting the ruleset in a pool`_. + +:Purpose: A component of the rule mask. +:Type: Integer +:Required: Yes +:Default: 0 + +.. _setting the ruleset in a pool: ../pools#setpoolvalues + + +``type`` + +:Description: Describes a rule for either a storage drive (replicated) + or a RAID. + +:Purpose: A component of the rule mask. +:Type: String +:Required: Yes +:Default: ``replicated`` +:Valid Values: Currently only ``replicated`` and ``erasure`` + +``min_size`` + +:Description: If a pool makes fewer replicas than this number, CRUSH will + **NOT** select this rule. + +:Type: Integer +:Purpose: A component of the rule mask. +:Required: Yes +:Default: ``1`` + +``max_size`` + +:Description: If a pool makes more replicas than this number, CRUSH will + **NOT** select this rule. + +:Type: Integer +:Purpose: A component of the rule mask. +:Required: Yes +:Default: 10 + + +``step take <bucket-name> [class <device-class>]`` + +:Description: Takes a bucket name, and begins iterating down the tree. + If the ``device-class`` is specified, it must match + a class previously used when defining a device. All + devices that do not belong to the class are excluded. +:Purpose: A component of the rule. 
+:Required: Yes +:Example: ``step take data`` + + +``step choose firstn {num} type {bucket-type}`` + +:Description: Selects the number of buckets of the given type. The number is + usually the number of replicas in the pool (i.e., pool size). + + - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available). + - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets. + - If ``{num} < 0``, it means ``pool-num-replicas - {num}``. + +:Purpose: A component of the rule. +:Prerequisite: Follows ``step take`` or ``step choose``. +:Example: ``step choose firstn 1 type row`` + + +``step chooseleaf firstn {num} type {bucket-type}`` + +:Description: Selects a set of buckets of ``{bucket-type}`` and chooses a leaf + node from the subtree of each bucket in the set of buckets. The + number of buckets in the set is usually the number of replicas in + the pool (i.e., pool size). + + - If ``{num} == 0``, choose ``pool-num-replicas`` buckets (all available). + - If ``{num} > 0 && < pool-num-replicas``, choose that many buckets. + - If ``{num} < 0``, it means ``pool-num-replicas - {num}``. + +:Purpose: A component of the rule. Usage removes the need to select a device using two steps. +:Prerequisite: Follows ``step take`` or ``step choose``. +:Example: ``step chooseleaf firstn 0 type row`` + + + +``step emit`` + +:Description: Outputs the current value and empties the stack. Typically used + at the end of a rule, but may also be used to pick from different + trees in the same rule. + +:Purpose: A component of the rule. +:Prerequisite: Follows ``step choose``. +:Example: ``step emit`` + +.. important:: To activate one or more rules with a common ruleset number to a + pool, set the ruleset number of the pool. + + +Placing Different Pools on Different OSDS: +========================================== + +Suppose you want to have most pools default to OSDs backed by large hard drives, +but have some pools mapped to OSDs backed by fast solid-state drives (SSDs). +It's possible to have multiple independent CRUSH hierarchies within the same +CRUSH map. 
Define two hierarchies with two different root nodes--one for hard +disks (e.g., "root platter") and one for SSDs (e.g., "root ssd") as shown +below:: + + device 0 osd.0 + device 1 osd.1 + device 2 osd.2 + device 3 osd.3 + device 4 osd.4 + device 5 osd.5 + device 6 osd.6 + device 7 osd.7 + + host ceph-osd-ssd-server-1 { + id -1 + alg straw + hash 0 + item osd.0 weight 1.00 + item osd.1 weight 1.00 + } + + host ceph-osd-ssd-server-2 { + id -2 + alg straw + hash 0 + item osd.2 weight 1.00 + item osd.3 weight 1.00 + } + + host ceph-osd-platter-server-1 { + id -3 + alg straw + hash 0 + item osd.4 weight 1.00 + item osd.5 weight 1.00 + } + + host ceph-osd-platter-server-2 { + id -4 + alg straw + hash 0 + item osd.6 weight 1.00 + item osd.7 weight 1.00 + } + + root platter { + id -5 + alg straw + hash 0 + item ceph-osd-platter-server-1 weight 2.00 + item ceph-osd-platter-server-2 weight 2.00 + } + + root ssd { + id -6 + alg straw + hash 0 + item ceph-osd-ssd-server-1 weight 2.00 + item ceph-osd-ssd-server-2 weight 2.00 + } + + rule data { + ruleset 0 + type replicated + min_size 2 + max_size 2 + step take platter + step chooseleaf firstn 0 type host + step emit + } + + rule metadata { + ruleset 1 + type replicated + min_size 0 + max_size 10 + step take platter + step chooseleaf firstn 0 type host + step emit + } + + rule rbd { + ruleset 2 + type replicated + min_size 0 + max_size 10 + step take platter + step chooseleaf firstn 0 type host + step emit + } + + rule platter { + ruleset 3 + type replicated + min_size 0 + max_size 10 + step take platter + step chooseleaf firstn 0 type host + step emit + } + + rule ssd { + ruleset 4 + type replicated + min_size 0 + max_size 4 + step take ssd + step chooseleaf firstn 0 type host + step emit + } + + rule ssd-primary { + ruleset 5 + type replicated + min_size 5 + max_size 10 + step take ssd + step chooseleaf firstn 1 type host + step emit + step take platter + step chooseleaf firstn -1 type host + step emit + } + +You can then set a pool to use the SSD rule by:: + + ceph osd pool set <poolname> crush_ruleset 4 + +Similarly, using the ``ssd-primary`` rule will cause each placement group in the +pool to be placed with an SSD as the primary and platters as the replicas. + + +Tuning CRUSH, the hard way +-------------------------- + +If you can ensure that all clients are running recent code, you can +adjust the tunables by extracting the CRUSH map, modifying the values, +and reinjecting it into the cluster. + +* Extract the latest CRUSH map:: + + ceph osd getcrushmap -o /tmp/crush + +* Adjust tunables. These values appear to offer the best behavior + for both large and small clusters we tested with. You will need to + additionally specify the ``--enable-unsafe-tunables`` argument to + ``crushtool`` for this to work. Please use this option with + extreme care.:: + + crushtool -i /tmp/crush --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 -o /tmp/crush.new + +* Reinject modified map:: + + ceph osd setcrushmap -i /tmp/crush.new + +Legacy values +------------- + +For reference, the legacy values for the CRUSH tunables can be set +with:: + + crushtool -i /tmp/crush --set-choose-local-tries 2 --set-choose-local-fallback-tries 5 --set-choose-total-tries 19 --set-chooseleaf-descend-once 0 --set-chooseleaf-vary-r 0 -o /tmp/crush.legacy + +Again, the special ``--enable-unsafe-tunables`` option is required. 
+Further, as noted above, be careful running old versions of the +``ceph-osd`` daemon after reverting to legacy values as the feature +bit is not perfectly enforced. diff --git a/src/ceph/doc/rados/operations/crush-map.rst b/src/ceph/doc/rados/operations/crush-map.rst new file mode 100644 index 0000000..05fa4ff --- /dev/null +++ b/src/ceph/doc/rados/operations/crush-map.rst @@ -0,0 +1,956 @@ +============ + CRUSH Maps +============ + +The :abbr:`CRUSH (Controlled Replication Under Scalable Hashing)` algorithm +determines how to store and retrieve data by computing data storage locations. +CRUSH empowers Ceph clients to communicate with OSDs directly rather than +through a centralized server or broker. With an algorithmically determined +method of storing and retrieving data, Ceph avoids a single point of failure, a +performance bottleneck, and a physical limit to its scalability. + +CRUSH requires a map of your cluster, and uses the CRUSH map to pseudo-randomly +store and retrieve data in OSDs with a uniform distribution of data across the +cluster. For a detailed discussion of CRUSH, see +`CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data`_ + +CRUSH maps contain a list of :abbr:`OSDs (Object Storage Devices)`, a list of +'buckets' for aggregating the devices into physical locations, and a list of +rules that tell CRUSH how it should replicate data in a Ceph cluster's pools. By +reflecting the underlying physical organization of the installation, CRUSH can +model—and thereby address—potential sources of correlated device failures. +Typical sources include physical proximity, a shared power source, and a shared +network. By encoding this information into the cluster map, CRUSH placement +policies can separate object replicas across different failure domains while +still maintaining the desired distribution. For example, to address the +possibility of concurrent failures, it may be desirable to ensure that data +replicas are on devices using different shelves, racks, power supplies, +controllers, and/or physical locations. + +When you deploy OSDs they are automatically placed within the CRUSH map under a +``host`` node named with the hostname for the host they are running on. This, +combined with the default CRUSH failure domain, ensures that replicas or erasure +code shards are separated across hosts and a single host failure will not +affect availability. For larger clusters, however, administrators should carefully consider their choice of failure domain. Separating replicas across racks, +for example, is common for mid- to large-sized clusters. + + +CRUSH Location +============== + +The location of an OSD in terms of the CRUSH map's hierarchy is +referred to as a ``crush location``. This location specifier takes the +form of a list of key and value pairs describing a position. For +example, if an OSD is in a particular row, rack, chassis and host, and +is part of the 'default' CRUSH tree (this is the case for the vast +majority of clusters), its crush location could be described as:: + + root=default row=a rack=a2 chassis=a2a host=a2a1 + +Note: + +#. Note that the order of the keys does not matter. +#. The key name (left of ``=``) must be a valid CRUSH ``type``. By default + these include root, datacenter, room, row, pod, pdu, rack, chassis and host, + but those types can be customized to be anything appropriate by modifying + the CRUSH map. +#. Not all keys need to be specified. 
For example, by default, Ceph + automatically sets a ``ceph-osd`` daemon's location to be + ``root=default host=HOSTNAME`` (based on the output from ``hostname -s``). + +The crush location for an OSD is normally expressed via the ``crush location`` +config option being set in the ``ceph.conf`` file. Each time the OSD starts, +it verifies it is in the correct location in the CRUSH map and, if it is not, +it moved itself. To disable this automatic CRUSH map management, add the +following to your configuration file in the ``[osd]`` section:: + + osd crush update on start = false + + +Custom location hooks +--------------------- + +A customized location hook can be used to generate a more complete +crush location on startup. The sample ``ceph-crush-location`` utility +will generate a CRUSH location string for a given daemon. The +location is based on, in order of preference: + +#. A ``crush location`` option in ceph.conf. +#. A default of ``root=default host=HOSTNAME`` where the hostname is + generated with the ``hostname -s`` command. + +This is not useful by itself, as the OSD itself has the exact same +behavior. However, the script can be modified to provide additional +location fields (for example, the rack or datacenter), and then the +hook enabled via the config option:: + + crush location hook = /path/to/customized-ceph-crush-location + +This hook is passed several arguments (below) and should output a single line +to stdout with the CRUSH location description.:: + + $ ceph-crush-location --cluster CLUSTER --id ID --type TYPE + +where the cluster name is typically 'ceph', the id is the daemon +identifier (the OSD number), and the daemon type is typically ``osd``. + + +CRUSH structure +=============== + +The CRUSH map consists of, loosely speaking, a hierarchy describing +the physical topology of the cluster, and a set of rules defining +policy about how we place data on those devices. The hierarchy has +devices (``ceph-osd`` daemons) at the leaves, and internal nodes +corresponding to other physical features or groupings: hosts, racks, +rows, datacenters, and so on. The rules describe how replicas are +placed in terms of that hierarchy (e.g., 'three replicas in different +racks'). + +Devices +------- + +Devices are individual ``ceph-osd`` daemons that can store data. You +will normally have one defined here for each OSD daemon in your +cluster. Devices are identified by an id (a non-negative integer) and +a name, normally ``osd.N`` where ``N`` is the device id. + +Devices may also have a *device class* associated with them (e.g., +``hdd`` or ``ssd``), allowing them to be conveniently targetted by a +crush rule. + +Types and Buckets +----------------- + +A bucket is the CRUSH term for internal nodes in the hierarchy: hosts, +racks, rows, etc. The CRUSH map defines a series of *types* that are +used to describe these nodes. By default, these types include: + +- osd (or device) +- host +- chassis +- rack +- row +- pdu +- pod +- room +- datacenter +- region +- root + +Most clusters make use of only a handful of these types, and others +can be defined as needed. + +The hierarchy is built with devices (normally type ``osd``) at the +leaves, interior nodes with non-device types, and a root node of type +``root``. For example, + +.. 
ditaa:: + + +-----------------+ + | {o}root default | + +--------+--------+ + | + +---------------+---------------+ + | | + +-------+-------+ +-----+-------+ + | {o}host foo | | {o}host bar | + +-------+-------+ +-----+-------+ + | | + +-------+-------+ +-------+-------+ + | | | | + +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ + | osd.0 | | osd.1 | | osd.2 | | osd.3 | + +-----------+ +-----------+ +-----------+ +-----------+ + +Each node (device or bucket) in the hierarchy has a *weight* +associated with it, indicating the relative proportion of the total +data that device or hierarchy subtree should store. Weights are set +at the leaves, indicating the size of the device, and automatically +sum up the tree from there, such that the weight of the default node +will be the total of all devices contained beneath it. Normally +weights are in units of terabytes (TB). + +You can get a simple view the CRUSH hierarchy for your cluster, +including the weights, with:: + + ceph osd crush tree + +Rules +----- + +Rules define policy about how data is distributed across the devices +in the hierarchy. + +CRUSH rules define placement and replication strategies or +distribution policies that allow you to specify exactly how CRUSH +places object replicas. For example, you might create a rule selecting +a pair of targets for 2-way mirroring, another rule for selecting +three targets in two different data centers for 3-way mirroring, and +yet another rule for erasure coding over six storage devices. For a +detailed discussion of CRUSH rules, refer to `CRUSH - Controlled, +Scalable, Decentralized Placement of Replicated Data`_, and more +specifically to **Section 3.2**. + +In almost all cases, CRUSH rules can be created via the CLI by +specifying the *pool type* they will be used for (replicated or +erasure coded), the *failure domain*, and optionally a *device class*. +In rare cases rules must be written by hand by manually editing the +CRUSH map. + +You can see what rules are defined for your cluster with:: + + ceph osd crush rule ls + +You can view the contents of the rules with:: + + ceph osd crush rule dump + +Device classes +-------------- + +Each device can optionally have a *class* associated with it. By +default, OSDs automatically set their class on startup to either +`hdd`, `ssd`, or `nvme` based on the type of device they are backed +by. + +The device class for one or more OSDs can be explicitly set with:: + + ceph osd crush set-device-class <class> <osd-name> [...] + +Once a device class is set, it cannot be changed to another class +until the old class is unset with:: + + ceph osd crush rm-device-class <osd-name> [...] + +This allows administrators to set device classes without the class +being changed on OSD restart or by some other script. + +A placement rule that targets a specific device class can be created with:: + + ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class> + +A pool can then be changed to use the new rule with:: + + ceph osd pool set <pool-name> crush_rule <rule-name> + +Device classes are implemented by creating a "shadow" CRUSH hierarchy +for each device class in use that contains only devices of that class. +Rules can then distribute data over the shadow hierarchy. One nice +thing about this approach is that it is fully backward compatible with +old Ceph clients. 
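+
+As a rough sketch of how the commands above fit together (the ``fast`` rule
+name and ``mypool`` pool name are only illustrative, and the OSDs are assumed
+not to have a class set already), an SSD-only rule could be created and
+applied as follows::
+
+  ceph osd crush set-device-class ssd osd.0 osd.1 osd.2
+  ceph osd crush rule create-replicated fast default host ssd
+  ceph osd pool set mypool crush_rule fast
+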
You can view the CRUSH hierarchy with shadow items
+with::
+
+  ceph osd crush tree --show-shadow
+
+
+Weight sets
+-----------
+
+A *weight set* is an alternative set of weights to use when
+calculating data placement.  The normal weights associated with each
+device in the CRUSH map are set based on the device size and indicate
+how much data we *should* be storing where.  However, because CRUSH is
+based on a pseudorandom placement process, there is always some
+variation from this ideal distribution, the same way that rolling a
+die sixty times will not result in rolling exactly 10 ones and 10
+sixes.  Weight sets allow the cluster to do a numerical optimization
+based on the specifics of your cluster (hierarchy, pools, etc.) to achieve
+a balanced distribution.
+
+There are two types of weight sets supported:
+
+ #. A **compat** weight set is a single alternative set of weights for
+    each device and node in the cluster.  This is not well-suited for
+    correcting all anomalies (for example, placement groups for
+    different pools may be different sizes and have different load
+    levels, but will be mostly treated the same by the balancer).
+    However, compat weight sets have the huge advantage that they are
+    *backward compatible* with previous versions of Ceph, which means
+    that even though weight sets were first introduced in Luminous
+    v12.2.z, older clients (e.g., firefly) can still connect to the
+    cluster when a compat weight set is being used to balance data.
+ #. A **per-pool** weight set is more flexible in that it allows
+    placement to be optimized for each data pool.  Additionally,
+    weights can be adjusted for each position of placement, allowing
+    the optimizer to correct for a subtle skew of data toward devices
+    with small weights relative to their peers (an effect that is
+    usually only apparent in very large clusters but which can cause
+    balancing problems).
+
+When weight sets are in use, the weights associated with each node in
+the hierarchy are visible as a separate column (labeled either
+``(compat)`` or the pool name) in the output of the command::
+
+  ceph osd crush tree
+
+When both *compat* and *per-pool* weight sets are in use, data
+placement for a particular pool will use its own per-pool weight set
+if present.  If not, it will use the compat weight set if present.  If
+neither is present, it will use the normal CRUSH weights.
+
+Although weight sets can be set up and manipulated by hand, it is
+recommended that the *balancer* module be enabled to do so
+automatically.
+
+
+Modifying the CRUSH map
+=======================
+
+.. _addosd:
+
+Add/Move an OSD
+---------------
+
+.. note:: OSDs are normally automatically added to the CRUSH map when
+   the OSD is created.  This command is rarely needed.
+
+To add or move an OSD in the CRUSH map of a running cluster::
+
+  ceph osd crush set {name} {weight} root={root} [{bucket-type}={bucket-name} ...]
+
+Where:
+
+``name``
+
+:Description: The full name of the OSD.
+:Type: String
+:Required: Yes
+:Example: ``osd.0``
+
+
+``weight``
+
+:Description: The CRUSH weight for the OSD, normally its size measured in terabytes (TB).
+:Type: Double
+:Required: Yes
+:Example: ``2.0``
+
+
+``root``
+
+:Description: The root node of the tree in which the OSD resides (normally ``default``).
+:Type: Key/value pair.
+:Required: Yes
+:Example: ``root=default``
+
+
+``bucket-type``
+
+:Description: You may specify the OSD's location in the CRUSH hierarchy.
+:Type: Key/value pairs.
+:Required: No +:Example: ``datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1`` + + +The following example adds ``osd.0`` to the hierarchy, or moves the +OSD from a previous location. :: + + ceph osd crush set osd.0 1.0 root=default datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1 + + +Adjust OSD weight +----------------- + +.. note: Normally OSDs automatically add themselves to the CRUSH map + with the correct weight when they are created. This command + is rarely needed. + +To adjust an OSD's crush weight in the CRUSH map of a running cluster, execute +the following:: + + ceph osd crush reweight {name} {weight} + +Where: + +``name`` + +:Description: The full name of the OSD. +:Type: String +:Required: Yes +:Example: ``osd.0`` + + +``weight`` + +:Description: The CRUSH weight for the OSD. +:Type: Double +:Required: Yes +:Example: ``2.0`` + + +.. _removeosd: + +Remove an OSD +------------- + +.. note: OSDs are normally removed from the CRUSH as part of the + ``ceph osd purge`` command. This command is rarely needed. + +To remove an OSD from the CRUSH map of a running cluster, execute the +following:: + + ceph osd crush remove {name} + +Where: + +``name`` + +:Description: The full name of the OSD. +:Type: String +:Required: Yes +:Example: ``osd.0`` + + +Add a Bucket +------------ + +.. note: Buckets are normally implicitly created when an OSD is added + that specifies a ``{bucket-type}={bucket-name}`` as part of its + location and a bucket with that name does not already exist. This + command is typically used when manually adjusting the structure of the + hierarchy after OSDs have been created (for example, to move a + series of hosts underneath a new rack-level bucket). + +To add a bucket in the CRUSH map of a running cluster, execute the +``ceph osd crush add-bucket`` command:: + + ceph osd crush add-bucket {bucket-name} {bucket-type} + +Where: + +``bucket-name`` + +:Description: The full name of the bucket. +:Type: String +:Required: Yes +:Example: ``rack12`` + + +``bucket-type`` + +:Description: The type of the bucket. The type must already exist in the hierarchy. +:Type: String +:Required: Yes +:Example: ``rack`` + + +The following example adds the ``rack12`` bucket to the hierarchy:: + + ceph osd crush add-bucket rack12 rack + +Move a Bucket +------------- + +To move a bucket to a different location or position in the CRUSH map +hierarchy, execute the following:: + + ceph osd crush move {bucket-name} {bucket-type}={bucket-name}, [...] + +Where: + +``bucket-name`` + +:Description: The name of the bucket to move/reposition. +:Type: String +:Required: Yes +:Example: ``foo-bar-1`` + +``bucket-type`` + +:Description: You may specify the bucket's location in the CRUSH hierarchy. +:Type: Key/value pairs. +:Required: No +:Example: ``datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1`` + +Remove a Bucket +--------------- + +To remove a bucket from the CRUSH map hierarchy, execute the following:: + + ceph osd crush remove {bucket-name} + +.. note:: A bucket must be empty before removing it from the CRUSH hierarchy. + +Where: + +``bucket-name`` + +:Description: The name of the bucket that you'd like to remove. +:Type: String +:Required: Yes +:Example: ``rack12`` + +The following example removes the ``rack12`` bucket from the hierarchy:: + + ceph osd crush remove rack12 + +Creating a compat weight set +---------------------------- + +.. note: This step is normally done automatically by the ``balancer`` + module when enabled. 
+ +To create a *compat* weight set:: + + ceph osd crush weight-set create-compat + +Weights for the compat weight set can be adjusted with:: + + ceph osd crush weight-set reweight-compat {name} {weight} + +The compat weight set can be destroyed with:: + + ceph osd crush weight-set rm-compat + +Creating per-pool weight sets +----------------------------- + +To create a weight set for a specific pool,:: + + ceph osd crush weight-set create {pool-name} {mode} + +.. note:: Per-pool weight sets require that all servers and daemons + run Luminous v12.2.z or later. + +Where: + +``pool-name`` + +:Description: The name of a RADOS pool +:Type: String +:Required: Yes +:Example: ``rbd`` + +``mode`` + +:Description: Either ``flat`` or ``positional``. A *flat* weight set + has a single weight for each device or bucket. A + *positional* weight set has a potentially different + weight for each position in the resulting placement + mapping. For example, if a pool has a replica count of + 3, then a positional weight set will have three weights + for each device and bucket. +:Type: String +:Required: Yes +:Example: ``flat`` + +To adjust the weight of an item in a weight set:: + + ceph osd crush weight-set reweight {pool-name} {item-name} {weight [...]} + +To list existing weight sets,:: + + ceph osd crush weight-set ls + +To remove a weight set,:: + + ceph osd crush weight-set rm {pool-name} + +Creating a rule for a replicated pool +------------------------------------- + +For a replicated pool, the primary decision when creating the CRUSH +rule is what the failure domain is going to be. For example, if a +failure domain of ``host`` is selected, then CRUSH will ensure that +each replica of the data is stored on a different host. If ``rack`` +is selected, then each replica will be stored in a different rack. +What failure domain you choose primarily depends on the size of your +cluster and how your hierarchy is structured. + +Normally, the entire cluster hierarchy is nested beneath a root node +named ``default``. If you have customized your hierarchy, you may +want to create a rule nested at some other node in the hierarchy. It +doesn't matter what type is associated with that node (it doesn't have +to be a ``root`` node). + +It is also possible to create a rule that restricts data placement to +a specific *class* of device. By default, Ceph OSDs automatically +classify themselves as either ``hdd`` or ``ssd``, depending on the +underlying type of device being used. These classes can also be +customized. + +To create a replicated rule,:: + + ceph osd crush rule create-replicated {name} {root} {failure-domain-type} [{class}] + +Where: + +``name`` + +:Description: The name of the rule +:Type: String +:Required: Yes +:Example: ``rbd-rule`` + +``root`` + +:Description: The name of the node under which data should be placed. +:Type: String +:Required: Yes +:Example: ``default`` + +``failure-domain-type`` + +:Description: The type of CRUSH nodes across which we should separate replicas. +:Type: String +:Required: Yes +:Example: ``rack`` + +``class`` + +:Description: The device class data should be placed on. +:Type: String +:Required: No +:Example: ``ssd`` + +Creating a rule for an erasure coded pool +----------------------------------------- + +For an erasure-coded pool, the same basic decisions need to be made as +with a replicated pool: what is the failure domain, what node in the +hierarchy will data be placed under (usually ``default``), and will +placement be restricted to a specific device class. 
Erasure code +pools are created a bit differently, however, because they need to be +constructed carefully based on the erasure code being used. For this reason, +you must include this information in the *erasure code profile*. A CRUSH +rule will then be created from that either explicitly or automatically when +the profile is used to create a pool. + +The erasure code profiles can be listed with:: + + ceph osd erasure-code-profile ls + +An existing profile can be viewed with:: + + ceph osd erasure-code-profile get {profile-name} + +Normally profiles should never be modified; instead, a new profile +should be created and used when creating a new pool or creating a new +rule for an existing pool. + +An erasure code profile consists of a set of key=value pairs. Most of +these control the behavior of the erasure code that is encoding data +in the pool. Those that begin with ``crush-``, however, affect the +CRUSH rule that is created. + +The erasure code profile properties of interest are: + + * **crush-root**: the name of the CRUSH node to place data under [default: ``default``]. + * **crush-failure-domain**: the CRUSH type to separate erasure-coded shards across [default: ``host``]. + * **crush-device-class**: the device class to place data on [default: none, meaning all devices are used]. + * **k** and **m** (and, for the ``lrc`` plugin, **l**): these determine the number of erasure code shards, affecting the resulting CRUSH rule. + +Once a profile is defined, you can create a CRUSH rule with:: + + ceph osd crush rule create-erasure {name} {profile-name} + +.. note: When creating a new pool, it is not actually necessary to + explicitly create the rule. If the erasure code profile alone is + specified and the rule argument is left off then Ceph will create + the CRUSH rule automatically. + +Deleting rules +-------------- + +Rules that are not in use by pools can be deleted with:: + + ceph osd crush rule rm {rule-name} + + +Tunables +======== + +Over time, we have made (and continue to make) improvements to the +CRUSH algorithm used to calculate the placement of data. In order to +support the change in behavior, we have introduced a series of tunable +options that control whether the legacy or improved variation of the +algorithm is used. + +In order to use newer tunables, both clients and servers must support +the new version of CRUSH. For this reason, we have created +``profiles`` that are named after the Ceph version in which they were +introduced. For example, the ``firefly`` tunables are first supported +in the firefly release, and will not work with older (e.g., dumpling) +clients. Once a given set of tunables are changed from the legacy +default behavior, the ``ceph-mon`` and ``ceph-osd`` will prevent older +clients who do not support the new CRUSH features from connecting to +the cluster. + +argonaut (legacy) +----------------- + +The legacy CRUSH behavior used by argonaut and older releases works +fine for most clusters, provided there are not too many OSDs that have +been marked out. + +bobtail (CRUSH_TUNABLES2) +------------------------- + +The bobtail tunable profile fixes a few key misbehaviors: + + * For hierarchies with a small number of devices in the leaf buckets, + some PGs map to fewer than the desired number of replicas. This + commonly happens for hierarchies with "host" nodes with a small + number (1-3) of OSDs nested beneath each one. + + * For large clusters, some small percentages of PGs map to less than + the desired number of OSDs. 
This is more prevalent when there are + several layers of the hierarchy (e.g., row, rack, host, osd). + + * When some OSDs are marked out, the data tends to get redistributed + to nearby OSDs instead of across the entire hierarchy. + +The new tunables are: + + * ``choose_local_tries``: Number of local retries. Legacy value is + 2, optimal value is 0. + + * ``choose_local_fallback_tries``: Legacy value is 5, optimal value + is 0. + + * ``choose_total_tries``: Total number of attempts to choose an item. + Legacy value was 19, subsequent testing indicates that a value of + 50 is more appropriate for typical clusters. For extremely large + clusters, a larger value might be necessary. + + * ``chooseleaf_descend_once``: Whether a recursive chooseleaf attempt + will retry, or only try once and allow the original placement to + retry. Legacy default is 0, optimal value is 1. + +Migration impact: + + * Moving from argonaut to bobtail tunables triggers a moderate amount + of data movement. Use caution on a cluster that is already + populated with data. + +firefly (CRUSH_TUNABLES3) +------------------------- + +The firefly tunable profile fixes a problem +with the ``chooseleaf`` CRUSH rule behavior that tends to result in PG +mappings with too few results when too many OSDs have been marked out. + +The new tunable is: + + * ``chooseleaf_vary_r``: Whether a recursive chooseleaf attempt will + start with a non-zero value of r, based on how many attempts the + parent has already made. Legacy default is 0, but with this value + CRUSH is sometimes unable to find a mapping. The optimal value (in + terms of computational cost and correctness) is 1. + +Migration impact: + + * For existing clusters that have lots of existing data, changing + from 0 to 1 will cause a lot of data to move; a value of 4 or 5 + will allow CRUSH to find a valid mapping but will make less data + move. + +straw_calc_version tunable (introduced with Firefly too) +-------------------------------------------------------- + +There were some problems with the internal weights calculated and +stored in the CRUSH map for ``straw`` buckets. Specifically, when +there were items with a CRUSH weight of 0 or both a mix of weights and +some duplicated weights CRUSH would distribute data incorrectly (i.e., +not in proportion to the weights). + +The new tunable is: + + * ``straw_calc_version``: A value of 0 preserves the old, broken + internal weight calculation; a value of 1 fixes the behavior. + +Migration impact: + + * Moving to straw_calc_version 1 and then adjusting a straw bucket + (by adding, removing, or reweighting an item, or by using the + reweight-all command) can trigger a small to moderate amount of + data movement *if* the cluster has hit one of the problematic + conditions. + +This tunable option is special because it has absolutely no impact +concerning the required kernel version in the client side. + +hammer (CRUSH_V4) +----------------- + +The hammer tunable profile does not affect the +mapping of existing CRUSH maps simply by changing the profile. However: + + * There is a new bucket type (``straw2``) supported. The new + ``straw2`` bucket type fixes several limitations in the original + ``straw`` bucket. Specifically, the old ``straw`` buckets would + change some mappings that should have changed when a weight was + adjusted, while ``straw2`` achieves the original goal of only + changing mappings to or from the bucket item whose weight has + changed. + + * ``straw2`` is the default for any newly created buckets. 
+ +Migration impact: + + * Changing a bucket type from ``straw`` to ``straw2`` will result in + a reasonably small amount of data movement, depending on how much + the bucket item weights vary from each other. When the weights are + all the same no data will move, and when item weights vary + significantly there will be more movement. + +jewel (CRUSH_TUNABLES5) +----------------------- + +The jewel tunable profile improves the +overall behavior of CRUSH such that significantly fewer mappings +change when an OSD is marked out of the cluster. + +The new tunable is: + + * ``chooseleaf_stable``: Whether a recursive chooseleaf attempt will + use a better value for an inner loop that greatly reduces the number + of mapping changes when an OSD is marked out. The legacy value is 0, + while the new value of 1 uses the new approach. + +Migration impact: + + * Changing this value on an existing cluster will result in a very + large amount of data movement as almost every PG mapping is likely + to change. + + + + +Which client versions support CRUSH_TUNABLES +-------------------------------------------- + + * argonaut series, v0.48.1 or later + * v0.49 or later + * Linux kernel version v3.6 or later (for the file system and RBD kernel clients) + +Which client versions support CRUSH_TUNABLES2 +--------------------------------------------- + + * v0.55 or later, including bobtail series (v0.56.x) + * Linux kernel version v3.9 or later (for the file system and RBD kernel clients) + +Which client versions support CRUSH_TUNABLES3 +--------------------------------------------- + + * v0.78 (firefly) or later + * Linux kernel version v3.15 or later (for the file system and RBD kernel clients) + +Which client versions support CRUSH_V4 +-------------------------------------- + + * v0.94 (hammer) or later + * Linux kernel version v4.1 or later (for the file system and RBD kernel clients) + +Which client versions support CRUSH_TUNABLES5 +--------------------------------------------- + + * v10.0.2 (jewel) or later + * Linux kernel version v4.5 or later (for the file system and RBD kernel clients) + +Warning when tunables are non-optimal +------------------------------------- + +Starting with version v0.74, Ceph will issue a health warning if the +current CRUSH tunables don't include all the optimal values from the +``default`` profile (see below for the meaning of the ``default`` profile). +To make this warning go away, you have two options: + +1. Adjust the tunables on the existing cluster. Note that this will + result in some data movement (possibly as much as 10%). This is the + preferred route, but should be taken with care on a production cluster + where the data movement may affect performance. You can enable optimal + tunables with:: + + ceph osd crush tunables optimal + + If things go poorly (e.g., too much load) and not very much + progress has been made, or there is a client compatibility problem + (old kernel cephfs or rbd clients, or pre-bobtail librados + clients), you can switch back with:: + + ceph osd crush tunables legacy + +2. 
You can make the warning go away without making any changes to CRUSH by
+   adding the following option to your ceph.conf ``[mon]`` section::
+
+     mon warn on legacy crush tunables = false
+
+   For the change to take effect, you will need to restart the monitors, or
+   apply the option to running monitors with::
+
+     ceph tell mon.\* injectargs --no-mon-warn-on-legacy-crush-tunables
+
+
+A few important points
+----------------------
+
+ * Adjusting these values will result in the shift of some PGs between
+   storage nodes.  If the Ceph cluster is already storing a lot of
+   data, be prepared for some fraction of the data to move.
+ * The ``ceph-osd`` and ``ceph-mon`` daemons will start requiring the
+   feature bits of new connections as soon as they get
+   the updated map.  However, already-connected clients are
+   effectively grandfathered in, and will misbehave if they do not
+   support the new feature.
+ * If the CRUSH tunables are set to non-legacy values and then later
+   changed back to the default values, ``ceph-osd`` daemons will not be
+   required to support the feature.  However, the OSD peering process
+   requires examining and understanding old maps.  Therefore, you
+   should not run old versions of the ``ceph-osd`` daemon
+   if the cluster has previously used non-legacy CRUSH values, even if
+   the latest version of the map has been switched back to using the
+   legacy defaults.
+
+Tuning CRUSH
+------------
+
+The simplest way to adjust the CRUSH tunables is by changing to a known
+profile.  Those are:
+
+ * ``legacy``: the legacy behavior from argonaut and earlier.
+ * ``argonaut``: the legacy values supported by the original argonaut release
+ * ``bobtail``: the values supported by the bobtail release
+ * ``firefly``: the values supported by the firefly release
+ * ``hammer``: the values supported by the hammer release
+ * ``jewel``: the values supported by the jewel release
+ * ``optimal``: the best (i.e., optimal) values of the current version of Ceph
+ * ``default``: the default values of a new cluster installed from
+   scratch.  These values, which depend on the current version of Ceph,
+   are hard coded and are generally a mix of optimal and legacy values.
+   They generally match the ``optimal`` profile of the previous
+   LTS release, or of the most recent release for which most users are
+   expected to have up-to-date clients.
+
+You can select a profile on a running cluster with the command::
+
+  ceph osd crush tunables {PROFILE}
+
+Note that this may result in some data movement.
+
+
+.. _CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data: https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
+
+
+Primary Affinity
+================
+
+When a Ceph Client reads or writes data, it always contacts the primary OSD in
+the acting set.  For set ``[2, 3, 4]``, ``osd.2`` is the primary.  Sometimes an
+OSD is not well suited to act as a primary compared to other OSDs (e.g., it has
+a slow disk or a slow controller).  To prevent performance bottlenecks
+(especially on read operations) while maximizing utilization of your hardware,
+you can set a Ceph OSD's primary affinity so that CRUSH is less likely to use
+the OSD as a primary in an acting set. ::
+
+  ceph osd primary-affinity <osd-id> <weight>
+
+Primary affinity is ``1`` by default (*i.e.,* an OSD may act as a primary). You
+may set an OSD's primary affinity in the range ``0-1``, where ``0`` means that
+the OSD may **NOT** be used as a primary and ``1`` means that an OSD may be used as a
+primary.
When the weight is ``< 1``, it is less likely that CRUSH will select +the Ceph OSD Daemon to act as a primary. + + + diff --git a/src/ceph/doc/rados/operations/data-placement.rst b/src/ceph/doc/rados/operations/data-placement.rst new file mode 100644 index 0000000..27966b0 --- /dev/null +++ b/src/ceph/doc/rados/operations/data-placement.rst @@ -0,0 +1,37 @@ +========================= + Data Placement Overview +========================= + +Ceph stores, replicates and rebalances data objects across a RADOS cluster +dynamically. With many different users storing objects in different pools for +different purposes on countless OSDs, Ceph operations require some data +placement planning. The main data placement planning concepts in Ceph include: + +- **Pools:** Ceph stores data within pools, which are logical groups for storing + objects. Pools manage the number of placement groups, the number of replicas, + and the ruleset for the pool. To store data in a pool, you must have + an authenticated user with permissions for the pool. Ceph can snapshot pools. + See `Pools`_ for additional details. + +- **Placement Groups:** Ceph maps objects to placement groups (PGs). + Placement groups (PGs) are shards or fragments of a logical object pool + that place objects as a group into OSDs. Placement groups reduce the amount + of per-object metadata when Ceph stores the data in OSDs. A larger number of + placement groups (e.g., 100 per OSD) leads to better balancing. See + `Placement Groups`_ for additional details. + +- **CRUSH Maps:** CRUSH is a big part of what allows Ceph to scale without + performance bottlenecks, without limitations to scalability, and without a + single point of failure. CRUSH maps provide the physical topology of the + cluster to the CRUSH algorithm to determine where the data for an object + and its replicas should be stored, and how to do so across failure domains + for added data safety among other things. See `CRUSH Maps`_ for additional + details. + +When you initially set up a test cluster, you can use the default values. Once +you begin planning for a large Ceph cluster, refer to pools, placement groups +and CRUSH for data placement operations. + +.. _Pools: ../pools +.. _Placement Groups: ../placement-groups +.. _CRUSH Maps: ../crush-map diff --git a/src/ceph/doc/rados/operations/erasure-code-isa.rst b/src/ceph/doc/rados/operations/erasure-code-isa.rst new file mode 100644 index 0000000..b52933a --- /dev/null +++ b/src/ceph/doc/rados/operations/erasure-code-isa.rst @@ -0,0 +1,105 @@ +======================= +ISA erasure code plugin +======================= + +The *isa* plugin encapsulates the `ISA +<https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version/>`_ +library. It only runs on Intel processors. + +Create an isa profile +===================== + +To create a new *isa* erasure code profile:: + + ceph osd erasure-code-profile set {name} \ + plugin=isa \ + technique={reed_sol_van|cauchy} \ + [k={data-chunks}] \ + [m={coding-chunks}] \ + [crush-root={root}] \ + [crush-failure-domain={bucket-type}] \ + [crush-device-class={device-class}] \ + [directory={directory}] \ + [--force] + +Where: + +``k={data chunks}`` + +:Description: Each object is split in **data-chunks** parts, + each stored on a different OSD. + +:Type: Integer +:Required: No. +:Default: 7 + +``m={coding-chunks}`` + +:Description: Compute **coding chunks** for each object and store them + on different OSDs. 
The number of coding chunks is also + the number of OSDs that can be down without losing data. + +:Type: Integer +:Required: No. +:Default: 3 + +``technique={reed_sol_van|cauchy}`` + +:Description: The ISA plugin comes in two `Reed Solomon + <https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction>`_ + forms. If *reed_sol_van* is set, it is `Vandermonde + <https://en.wikipedia.org/wiki/Vandermonde_matrix>`_, if + *cauchy* is set, it is `Cauchy + <https://en.wikipedia.org/wiki/Cauchy_matrix>`_. + +:Type: String +:Required: No. +:Default: reed_sol_van + +``crush-root={root}`` + +:Description: The name of the crush bucket used for the first step of + the ruleset. For intance **step take default**. + +:Type: String +:Required: No. +:Default: default + +``crush-failure-domain={bucket-type}`` + +:Description: Ensure that no two chunks are in a bucket with the same + failure domain. For instance, if the failure domain is + **host** no two chunks will be stored on the same + host. It is used to create a ruleset step such as **step + chooseleaf host**. + +:Type: String +:Required: No. +:Default: host + +``crush-device-class={device-class}`` + +:Description: Restrict placement to devices of a specific class (e.g., + ``ssd`` or ``hdd``), using the crush device class names + in the CRUSH map. + +:Type: String +:Required: No. +:Default: + +``directory={directory}`` + +:Description: Set the **directory** name from which the erasure code + plugin is loaded. + +:Type: String +:Required: No. +:Default: /usr/lib/ceph/erasure-code + +``--force`` + +:Description: Override an existing profile by the same name. + +:Type: String +:Required: No. + diff --git a/src/ceph/doc/rados/operations/erasure-code-jerasure.rst b/src/ceph/doc/rados/operations/erasure-code-jerasure.rst new file mode 100644 index 0000000..e8da097 --- /dev/null +++ b/src/ceph/doc/rados/operations/erasure-code-jerasure.rst @@ -0,0 +1,120 @@ +============================ +Jerasure erasure code plugin +============================ + +The *jerasure* plugin is the most generic and flexible plugin, it is +also the default for Ceph erasure coded pools. + +The *jerasure* plugin encapsulates the `Jerasure +<http://jerasure.org>`_ library. It is +recommended to read the *jerasure* documentation to get a better +understanding of the parameters. + +Create a jerasure profile +========================= + +To create a new *jerasure* erasure code profile:: + + ceph osd erasure-code-profile set {name} \ + plugin=jerasure \ + k={data-chunks} \ + m={coding-chunks} \ + technique={reed_sol_van|reed_sol_r6_op|cauchy_orig|cauchy_good|liberation|blaum_roth|liber8tion} \ + [crush-root={root}] \ + [crush-failure-domain={bucket-type}] \ + [crush-device-class={device-class}] \ + [directory={directory}] \ + [--force] + +Where: + +``k={data chunks}`` + +:Description: Each object is split in **data-chunks** parts, + each stored on a different OSD. + +:Type: Integer +:Required: Yes. +:Example: 4 + +``m={coding-chunks}`` + +:Description: Compute **coding chunks** for each object and store them + on different OSDs. The number of coding chunks is also + the number of OSDs that can be down without losing data. + +:Type: Integer +:Required: Yes. +:Example: 2 + +``technique={reed_sol_van|reed_sol_r6_op|cauchy_orig|cauchy_good|liberation|blaum_roth|liber8tion}`` + +:Description: The more flexible technique is *reed_sol_van* : it is + enough to set *k* and *m*. The *cauchy_good* technique + can be faster but you need to chose the *packetsize* + carefully. 
All of *reed_sol_r6_op*, *liberation*, + *blaum_roth*, *liber8tion* are *RAID6* equivalents in + the sense that they can only be configured with *m=2*. + +:Type: String +:Required: No. +:Default: reed_sol_van + +``packetsize={bytes}`` + +:Description: The encoding will be done on packets of *bytes* size at + a time. Chosing the right packet size is difficult. The + *jerasure* documentation contains extensive information + on this topic. + +:Type: Integer +:Required: No. +:Default: 2048 + +``crush-root={root}`` + +:Description: The name of the crush bucket used for the first step of + the ruleset. For intance **step take default**. + +:Type: String +:Required: No. +:Default: default + +``crush-failure-domain={bucket-type}`` + +:Description: Ensure that no two chunks are in a bucket with the same + failure domain. For instance, if the failure domain is + **host** no two chunks will be stored on the same + host. It is used to create a ruleset step such as **step + chooseleaf host**. + +:Type: String +:Required: No. +:Default: host + +``crush-device-class={device-class}`` + +:Description: Restrict placement to devices of a specific class (e.g., + ``ssd`` or ``hdd``), using the crush device class names + in the CRUSH map. + +:Type: String +:Required: No. +:Default: + + ``directory={directory}`` + +:Description: Set the **directory** name from which the erasure code + plugin is loaded. + +:Type: String +:Required: No. +:Default: /usr/lib/ceph/erasure-code + +``--force`` + +:Description: Override an existing profile by the same name. + +:Type: String +:Required: No. + diff --git a/src/ceph/doc/rados/operations/erasure-code-lrc.rst b/src/ceph/doc/rados/operations/erasure-code-lrc.rst new file mode 100644 index 0000000..447ce23 --- /dev/null +++ b/src/ceph/doc/rados/operations/erasure-code-lrc.rst @@ -0,0 +1,371 @@ +====================================== +Locally repairable erasure code plugin +====================================== + +With the *jerasure* plugin, when an erasure coded object is stored on +multiple OSDs, recovering from the loss of one OSD requires reading +from all the others. For instance if *jerasure* is configured with +*k=8* and *m=4*, losing one OSD requires reading from the eleven +others to repair. + +The *lrc* erasure code plugin creates local parity chunks to be able +to recover using less OSDs. For instance if *lrc* is configured with +*k=8*, *m=4* and *l=4*, it will create an additional parity chunk for +every four OSDs. When a single OSD is lost, it can be recovered with +only four OSDs instead of eleven. 
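+
+As a hedged illustration of that layout (the profile name is arbitrary, and
+the command form is the one documented in the next section), such a
+configuration could be requested with::
+
+  ceph osd erasure-code-profile set LRCprofile \
+      plugin=lrc \
+      k=8 m=4 l=4 \
+      crush-failure-domain=host
+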
+ +Erasure code profile examples +============================= + +Reduce recovery bandwidth between hosts +--------------------------------------- + +Although it is probably not an interesting use case when all hosts are +connected to the same switch, reduced bandwidth usage can actually be +observed.:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + k=4 m=2 l=3 \ + crush-failure-domain=host + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + + +Reduce recovery bandwidth between racks +--------------------------------------- + +In Firefly the reduced bandwidth will only be observed if the primary +OSD is in the same rack as the lost chunk.:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + k=4 m=2 l=3 \ + crush-locality=rack \ + crush-failure-domain=host + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + + +Create an lrc profile +===================== + +To create a new lrc erasure code profile:: + + ceph osd erasure-code-profile set {name} \ + plugin=lrc \ + k={data-chunks} \ + m={coding-chunks} \ + l={locality} \ + [crush-root={root}] \ + [crush-locality={bucket-type}] \ + [crush-failure-domain={bucket-type}] \ + [crush-device-class={device-class}] \ + [directory={directory}] \ + [--force] + +Where: + +``k={data chunks}`` + +:Description: Each object is split in **data-chunks** parts, + each stored on a different OSD. + +:Type: Integer +:Required: Yes. +:Example: 4 + +``m={coding-chunks}`` + +:Description: Compute **coding chunks** for each object and store them + on different OSDs. The number of coding chunks is also + the number of OSDs that can be down without losing data. + +:Type: Integer +:Required: Yes. +:Example: 2 + +``l={locality}`` + +:Description: Group the coding and data chunks into sets of size + **locality**. For instance, for **k=4** and **m=2**, + when **locality=3** two groups of three are created. + Each set can be recovered without reading chunks + from another set. + +:Type: Integer +:Required: Yes. +:Example: 3 + +``crush-root={root}`` + +:Description: The name of the crush bucket used for the first step of + the ruleset. For intance **step take default**. + +:Type: String +:Required: No. +:Default: default + +``crush-locality={bucket-type}`` + +:Description: The type of the crush bucket in which each set of chunks + defined by **l** will be stored. For instance, if it is + set to **rack**, each group of **l** chunks will be + placed in a different rack. It is used to create a + ruleset step such as **step choose rack**. If it is not + set, no such grouping is done. + +:Type: String +:Required: No. + +``crush-failure-domain={bucket-type}`` + +:Description: Ensure that no two chunks are in a bucket with the same + failure domain. For instance, if the failure domain is + **host** no two chunks will be stored on the same + host. It is used to create a ruleset step such as **step + chooseleaf host**. + +:Type: String +:Required: No. +:Default: host + +``crush-device-class={device-class}`` + +:Description: Restrict placement to devices of a specific class (e.g., + ``ssd`` or ``hdd``), using the crush device class names + in the CRUSH map. + +:Type: String +:Required: No. +:Default: + +``directory={directory}`` + +:Description: Set the **directory** name from which the erasure code + plugin is loaded. + +:Type: String +:Required: No. +:Default: /usr/lib/ceph/erasure-code + +``--force`` + +:Description: Override an existing profile by the same name. + +:Type: String +:Required: No. 
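+
+After setting a profile it can be useful to read it back and confirm the
+stored key/value pairs before creating a pool with it.  A short sketch,
+reusing the ``LRCprofile`` name from the examples above::
+
+    $ ceph osd erasure-code-profile get LRCprofile
+    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
+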
+ +Low level plugin configuration +============================== + +The sum of **k** and **m** must be a multiple of the **l** parameter. +The low level configuration parameters do not impose such a +restriction and it may be more convienient to use it for specific +purposes. It is for instance possible to define two groups, one with 4 +chunks and another with 3 chunks. It is also possible to recursively +define locality sets, for instance datacenters and racks into +datacenters. The **k/m/l** are implemented by generating a low level +configuration. + +The *lrc* erasure code plugin recursively applies erasure code +techniques so that recovering from the loss of some chunks only +requires a subset of the available chunks, most of the time. + +For instance, when three coding steps are described as:: + + chunk nr 01234567 + step 1 _cDD_cDD + step 2 cDDD____ + step 3 ____cDDD + +where *c* are coding chunks calculated from the data chunks *D*, the +loss of chunk *7* can be recovered with the last four chunks. And the +loss of chunk *2* chunk can be recovered with the first four +chunks. + +Erasure code profile examples using low level configuration +=========================================================== + +Minimal testing +--------------- + +It is strictly equivalent to using the default erasure code profile. The *DD* +implies *K=2*, the *c* implies *M=1* and the *jerasure* plugin is used +by default.:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + mapping=DD_ \ + layers='[ [ "DDc", "" ] ]' + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + +Reduce recovery bandwidth between hosts +--------------------------------------- + +Although it is probably not an interesting use case when all hosts are +connected to the same switch, reduced bandwidth usage can actually be +observed. It is equivalent to **k=4**, **m=2** and **l=3** although +the layout of the chunks is different:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + mapping=__DD__DD \ + layers='[ + [ "_cDD_cDD", "" ], + [ "cDDD____", "" ], + [ "____cDDD", "" ], + ]' + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + + +Reduce recovery bandwidth between racks +--------------------------------------- + +In Firefly the reduced bandwidth will only be observed if the primary +OSD is in the same rack as the lost chunk.:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + mapping=__DD__DD \ + layers='[ + [ "_cDD_cDD", "" ], + [ "cDDD____", "" ], + [ "____cDDD", "" ], + ]' \ + crush-steps='[ + [ "choose", "rack", 2 ], + [ "chooseleaf", "host", 4 ], + ]' + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + +Testing with different Erasure Code backends +-------------------------------------------- + +LRC now uses jerasure as the default EC backend. It is possible to +specify the EC backend/algorithm on a per layer basis using the low +level configuration. The second argument in layers='[ [ "DDc", "" ] ]' +is actually an erasure code profile to be used for this level. 
The +example below specifies the ISA backend with the cauchy technique to +be used in the lrcpool.:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + mapping=DD_ \ + layers='[ [ "DDc", "plugin=isa technique=cauchy" ] ]' + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + +You could also use a different erasure code profile for for each +layer.:: + + $ ceph osd erasure-code-profile set LRCprofile \ + plugin=lrc \ + mapping=__DD__DD \ + layers='[ + [ "_cDD_cDD", "plugin=isa technique=cauchy" ], + [ "cDDD____", "plugin=isa" ], + [ "____cDDD", "plugin=jerasure" ], + ]' + $ ceph osd pool create lrcpool 12 12 erasure LRCprofile + + + +Erasure coding and decoding algorithm +===================================== + +The steps found in the layers description:: + + chunk nr 01234567 + + step 1 _cDD_cDD + step 2 cDDD____ + step 3 ____cDDD + +are applied in order. For instance, if a 4K object is encoded, it will +first go thru *step 1* and be divided in four 1K chunks (the four +uppercase D). They are stored in the chunks 2, 3, 6 and 7, in +order. From these, two coding chunks are calculated (the two lowercase +c). The coding chunks are stored in the chunks 1 and 5, respectively. + +The *step 2* re-uses the content created by *step 1* in a similar +fashion and stores a single coding chunk *c* at position 0. The last four +chunks, marked with an underscore (*_*) for readability, are ignored. + +The *step 3* stores a single coding chunk *c* at position 4. The three +chunks created by *step 1* are used to compute this coding chunk, +i.e. the coding chunk from *step 1* becomes a data chunk in *step 3*. + +If chunk *2* is lost:: + + chunk nr 01234567 + + step 1 _c D_cDD + step 2 cD D____ + step 3 __ _cDDD + +decoding will attempt to recover it by walking the steps in reverse +order: *step 3* then *step 2* and finally *step 1*. + +The *step 3* knows nothing about chunk *2* (i.e. it is an underscore) +and is skipped. + +The coding chunk from *step 2*, stored in chunk *0*, allows it to +recover the content of chunk *2*. There are no more chunks to recover +and the process stops, without considering *step 1*. + +Recovering chunk *2* requires reading chunks *0, 1, 3* and writing +back chunk *2*. + +If chunk *2, 3, 6* are lost:: + + chunk nr 01234567 + + step 1 _c _c D + step 2 cD __ _ + step 3 __ cD D + +The *step 3* can recover the content of chunk *6*:: + + chunk nr 01234567 + + step 1 _c _cDD + step 2 cD ____ + step 3 __ cDDD + +The *step 2* fails to recover and is skipped because there are two +chunks missing (*2, 3*) and it can only recover from one missing +chunk. + +The coding chunk from *step 1*, stored in chunk *1, 5*, allows it to +recover the content of chunk *2, 3*:: + + chunk nr 01234567 + + step 1 _cDD_cDD + step 2 cDDD____ + step 3 ____cDDD + +Controlling crush placement +=========================== + +The default crush ruleset provides OSDs that are on different hosts. For instance:: + + chunk nr 01234567 + + step 1 _cDD_cDD + step 2 cDDD____ + step 3 ____cDDD + +needs exactly *8* OSDs, one for each chunk. If the hosts are in two +adjacent racks, the first four chunks can be placed in the first rack +and the last four in the second rack. So that recovering from the loss +of a single OSD does not require using bandwidth between the two +racks. 
+ +For instance:: + + crush-steps='[ [ "choose", "rack", 2 ], [ "chooseleaf", "host", 4 ] ]' + +will create a ruleset that will select two crush buckets of type +*rack* and for each of them choose four OSDs, each of them located in +different buckets of type *host*. + +The ruleset can also be manually crafted for finer control. diff --git a/src/ceph/doc/rados/operations/erasure-code-profile.rst b/src/ceph/doc/rados/operations/erasure-code-profile.rst new file mode 100644 index 0000000..ddf772d --- /dev/null +++ b/src/ceph/doc/rados/operations/erasure-code-profile.rst @@ -0,0 +1,121 @@ +===================== +Erasure code profiles +===================== + +Erasure code is defined by a **profile** and is used when creating an +erasure coded pool and the associated crush ruleset. + +The **default** erasure code profile (which is created when the Ceph +cluster is initialized) provides the same level of redundancy as two +copies but requires 25% less disk space. It is described as a profile +with **k=2** and **m=1**, meaning the information is spread over three +OSD (k+m == 3) and one of them can be lost. + +To improve redundancy without increasing raw storage requirements, a +new profile can be created. For instance, a profile with **k=10** and +**m=4** can sustain the loss of four (**m=4**) OSDs by distributing an +object on fourteen (k+m=14) OSDs. The object is first divided in +**10** chunks (if the object is 10MB, each chunk is 1MB) and **4** +coding chunks are computed, for recovery (each coding chunk has the +same size as the data chunk, i.e. 1MB). The raw space overhead is only +40% and the object will not be lost even if four OSDs break at the +same time. + +.. _list of available plugins: + +.. toctree:: + :maxdepth: 1 + + erasure-code-jerasure + erasure-code-isa + erasure-code-lrc + erasure-code-shec + +osd erasure-code-profile set +============================ + +To create a new erasure code profile:: + + ceph osd erasure-code-profile set {name} \ + [{directory=directory}] \ + [{plugin=plugin}] \ + [{stripe_unit=stripe_unit}] \ + [{key=value} ...] \ + [--force] + +Where: + +``{directory=directory}`` + +:Description: Set the **directory** name from which the erasure code + plugin is loaded. + +:Type: String +:Required: No. +:Default: /usr/lib/ceph/erasure-code + +``{plugin=plugin}`` + +:Description: Use the erasure code **plugin** to compute coding chunks + and recover missing chunks. See the `list of available + plugins`_ for more information. + +:Type: String +:Required: No. +:Default: jerasure + +``{stripe_unit=stripe_unit}`` + +:Description: The amount of data in a data chunk, per stripe. For + example, a profile with 2 data chunks and stripe_unit=4K + would put the range 0-4K in chunk 0, 4K-8K in chunk 1, + then 8K-12K in chunk 0 again. This should be a multiple + of 4K for best performance. The default value is taken + from the monitor config option + ``osd_pool_erasure_code_stripe_unit`` when a pool is + created. The stripe_width of a pool using this profile + will be the number of data chunks multiplied by this + stripe_unit. + +:Type: String +:Required: No. + +``{key=value}`` + +:Description: The semantic of the remaining key/value pairs is defined + by the erasure code plugin. + +:Type: String +:Required: No. + +``--force`` + +:Description: Override an existing profile by the same name, and allow + setting a non-4K-aligned stripe_unit. + +:Type: String +:Required: No. 
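+
+Putting these options together, a minimal sketch of defining a profile and
+then listing the existing profiles might look like the following (the
+``myprofile`` name and the chosen values are only examples)::
+
+  ceph osd erasure-code-profile set myprofile \
+      plugin=jerasure k=4 m=2 \
+      crush-failure-domain=host
+  ceph osd erasure-code-profile ls
+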
+ +osd erasure-code-profile rm +============================ + +To remove an erasure code profile:: + + ceph osd erasure-code-profile rm {name} + +If the profile is referenced by a pool, the deletion will fail. + +osd erasure-code-profile get +============================ + +To display an erasure code profile:: + + ceph osd erasure-code-profile get {name} + +osd erasure-code-profile ls +=========================== + +To list the names of all erasure code profiles:: + + ceph osd erasure-code-profile ls + diff --git a/src/ceph/doc/rados/operations/erasure-code-shec.rst b/src/ceph/doc/rados/operations/erasure-code-shec.rst new file mode 100644 index 0000000..e3bab37 --- /dev/null +++ b/src/ceph/doc/rados/operations/erasure-code-shec.rst @@ -0,0 +1,144 @@ +======================== +SHEC erasure code plugin +======================== + +The *shec* plugin encapsulates the `multiple SHEC +<http://tracker.ceph.com/projects/ceph/wiki/Shingled_Erasure_Code_(SHEC)>`_ +library. It allows ceph to recover data more efficiently than Reed Solomon codes. + +Create an SHEC profile +====================== + +To create a new *shec* erasure code profile:: + + ceph osd erasure-code-profile set {name} \ + plugin=shec \ + [k={data-chunks}] \ + [m={coding-chunks}] \ + [c={durability-estimator}] \ + [crush-root={root}] \ + [crush-failure-domain={bucket-type}] \ + [crush-device-class={device-class}] \ + [directory={directory}] \ + [--force] + +Where: + +``k={data-chunks}`` + +:Description: Each object is split in **data-chunks** parts, + each stored on a different OSD. + +:Type: Integer +:Required: No. +:Default: 4 + +``m={coding-chunks}`` + +:Description: Compute **coding-chunks** for each object and store them on + different OSDs. The number of **coding-chunks** does not necessarily + equal the number of OSDs that can be down without losing data. + +:Type: Integer +:Required: No. +:Default: 3 + +``c={durability-estimator}`` + +:Description: The number of parity chunks each of which includes each data chunk in its + calculation range. The number is used as a **durability estimator**. + For instance, if c=2, 2 OSDs can be down without losing data. + +:Type: Integer +:Required: No. +:Default: 2 + +``crush-root={root}`` + +:Description: The name of the crush bucket used for the first step of + the ruleset. For intance **step take default**. + +:Type: String +:Required: No. +:Default: default + +``crush-failure-domain={bucket-type}`` + +:Description: Ensure that no two chunks are in a bucket with the same + failure domain. For instance, if the failure domain is + **host** no two chunks will be stored on the same + host. It is used to create a ruleset step such as **step + chooseleaf host**. + +:Type: String +:Required: No. +:Default: host + +``crush-device-class={device-class}`` + +:Description: Restrict placement to devices of a specific class (e.g., + ``ssd`` or ``hdd``), using the crush device class names + in the CRUSH map. + +:Type: String +:Required: No. +:Default: + +``directory={directory}`` + +:Description: Set the **directory** name from which the erasure code + plugin is loaded. + +:Type: String +:Required: No. +:Default: /usr/lib/ceph/erasure-code + +``--force`` + +:Description: Override an existing profile by the same name. + +:Type: String +:Required: No. + +Brief description of SHEC's layouts +=================================== + +Space Efficiency +---------------- + +Space efficiency is a ratio of data chunks to all ones in a object and +represented as k/(k+m). 
+In order to improve space efficiency, you should increase k or decrease m.
+
+::
+
+    space efficiency of SHEC(4,3,2) = 4/(4+3) = 0.57
+    SHEC(5,3,2) or SHEC(4,2,2) improves SHEC(4,3,2)'s space efficiency
+
+Durability
+----------
+
+The third parameter of SHEC (=c) is a durability estimator, which approximates
+the number of OSDs that can be down without losing data.
+
+``durability estimator of SHEC(4,3,2) = 2``
+
+Recovery Efficiency
+-------------------
+
+Describing the calculation of recovery efficiency is beyond the scope of this document,
+but at least increasing m without increasing c improves recovery efficiency.
+(However, we must pay attention to the sacrifice of space efficiency in this case.)
+
+``SHEC(4,2,2) -> SHEC(4,3,2) : achieves improvement of recovery efficiency``
+
+Erasure code profile examples
+=============================
+
+::
+
+    $ ceph osd erasure-code-profile set SHECprofile \
+         plugin=shec \
+         k=8 m=4 c=3 \
+         crush-failure-domain=host
+    $ ceph osd pool create shecpool 256 256 erasure SHECprofile
diff --git a/src/ceph/doc/rados/operations/erasure-code.rst b/src/ceph/doc/rados/operations/erasure-code.rst
new file mode 100644
index 0000000..6ec5a09
--- /dev/null
+++ b/src/ceph/doc/rados/operations/erasure-code.rst
@@ -0,0 +1,195 @@
+=============
+ Erasure code
+=============
+
+A Ceph pool is associated with a type that determines how it sustains the
+loss of an OSD (i.e. a disk, since most of the time there is one OSD per disk).  The
+default choice when `creating a pool <../pools>`_ is *replicated*,
+meaning every object is copied on multiple disks.  The `Erasure Code
+<https://en.wikipedia.org/wiki/Erasure_code>`_ pool type can be used
+instead to save space.
+
+Creating a sample erasure coded pool
+------------------------------------
+
+The simplest erasure coded pool is equivalent to `RAID5
+<https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5>`_ and
+requires at least three hosts::
+
+    $ ceph osd pool create ecpool 12 12 erasure
+    pool 'ecpool' created
+    $ echo ABCDEFGHI | rados --pool ecpool put NYAN -
+    $ rados --pool ecpool get NYAN -
+    ABCDEFGHI
+
+.. note:: the 12 in *pool create* stands for
+          `the number of placement groups <../pools>`_.
+
+Erasure code profiles
+---------------------
+
+The default erasure code profile sustains the loss of a single OSD. It
+is equivalent to a replicated pool of size two but requires 1.5TB
+instead of 2TB to store 1TB of data. The default profile can be
+displayed with::
+
+    $ ceph osd erasure-code-profile get default
+    k=2
+    m=1
+    plugin=jerasure
+    crush-failure-domain=host
+    technique=reed_sol_van
+
+Choosing the right profile is important because it cannot be modified
+after the pool is created: a new pool with a different profile needs
+to be created and all objects from the previous pool moved to the new one.
+
+The most important parameters of the profile are *K*, *M* and
+*crush-failure-domain* because they define the storage overhead and
+the data durability. For instance, if the desired architecture must
+sustain the loss of two racks with a storage overhead of 40%,
+the following profile can be defined::
+
+    $ ceph osd erasure-code-profile set myprofile \
+       k=3 \
+       m=2 \
+       crush-failure-domain=rack
+    $ ceph osd pool create ecpool 12 12 erasure myprofile
+    $ echo ABCDEFGHI | rados --pool ecpool put NYAN -
+    $ rados --pool ecpool get NYAN -
+    ABCDEFGHI
+
+The *NYAN* object will be divided into three (*K=3*) and two additional
+*chunks* will be created (*M=2*).
The value of *M* defines how many +OSD can be lost simultaneously without losing any data. The +*crush-failure-domain=rack* will create a CRUSH ruleset that ensures +no two *chunks* are stored in the same rack. + +.. ditaa:: + +-------------------+ + name | NYAN | + +-------------------+ + content | ABCDEFGHI | + +--------+----------+ + | + | + v + +------+------+ + +---------------+ encode(3,2) +-----------+ + | +--+--+---+---+ | + | | | | | + | +-------+ | +-----+ | + | | | | | + +--v---+ +--v---+ +--v---+ +--v---+ +--v---+ + name | NYAN | | NYAN | | NYAN | | NYAN | | NYAN | + +------+ +------+ +------+ +------+ +------+ + shard | 1 | | 2 | | 3 | | 4 | | 5 | + +------+ +------+ +------+ +------+ +------+ + content | ABC | | DEF | | GHI | | YXY | | QGC | + +--+---+ +--+---+ +--+---+ +--+---+ +--+---+ + | | | | | + | | v | | + | | +--+---+ | | + | | | OSD1 | | | + | | +------+ | | + | | | | + | | +------+ | | + | +------>| OSD2 | | | + | +------+ | | + | | | + | +------+ | | + | | OSD3 |<----+ | + | +------+ | + | | + | +------+ | + | | OSD4 |<--------------+ + | +------+ + | + | +------+ + +----------------->| OSD5 | + +------+ + + +More information can be found in the `erasure code profiles +<../erasure-code-profile>`_ documentation. + + +Erasure Coding with Overwrites +------------------------------ + +By default, erasure coded pools only work with uses like RGW that +perform full object writes and appends. + +Since Luminous, partial writes for an erasure coded pool may be +enabled with a per-pool setting. This lets RBD and Cephfs store their +data in an erasure coded pool:: + + ceph osd pool set ec_pool allow_ec_overwrites true + +This can only be enabled on a pool residing on bluestore OSDs, since +bluestore's checksumming is used to detect bitrot or other corruption +during deep-scrub. In addition to being unsafe, using filestore with +ec overwrites yields low performance compared to bluestore. + +Erasure coded pools do not support omap, so to use them with RBD and +Cephfs you must instruct them to store their data in an ec pool, and +their metadata in a replicated pool. For RBD, this means using the +erasure coded pool as the ``--data-pool`` during image creation:: + + rbd create --size 1G --data-pool ec_pool replicated_pool/image_name + +For Cephfs, using an erasure coded pool means setting that pool in +a `file layout <../../../cephfs/file-layouts>`_. + + +Erasure coded pool and cache tiering +------------------------------------ + +Erasure coded pools require more resources than replicated pools and +lack some functionalities such as omap. To overcome these +limitations, one can set up a `cache tier <../cache-tiering>`_ +before the erasure coded pool. + +For instance, if the pool *hot-storage* is made of fast storage:: + + $ ceph osd tier add ecpool hot-storage + $ ceph osd tier cache-mode hot-storage writeback + $ ceph osd tier set-overlay ecpool hot-storage + +will place the *hot-storage* pool as tier of *ecpool* in *writeback* +mode so that every write and read to the *ecpool* are actually using +the *hot-storage* and benefit from its flexibility and speed. + +More information can be found in the `cache tiering +<../cache-tiering>`_ documentation. + +Glossary +-------- + +*chunk* + when the encoding function is called, it returns chunks of the same + size. Data chunks which can be concatenated to reconstruct the original + object and coding chunks which can be used to rebuild a lost chunk. + +*K* + the number of data *chunks*, i.e. 
the number of *chunks* in which the + original object is divided. For instance if *K* = 2 a 10KB object + will be divided into *K* objects of 5KB each. + +*M* + the number of coding *chunks*, i.e. the number of additional *chunks* + computed by the encoding functions. If there are 2 coding *chunks*, + it means 2 OSDs can be out without losing data. + + +Table of content +---------------- + +.. toctree:: + :maxdepth: 1 + + erasure-code-profile + erasure-code-jerasure + erasure-code-isa + erasure-code-lrc + erasure-code-shec diff --git a/src/ceph/doc/rados/operations/health-checks.rst b/src/ceph/doc/rados/operations/health-checks.rst new file mode 100644 index 0000000..c1e2200 --- /dev/null +++ b/src/ceph/doc/rados/operations/health-checks.rst @@ -0,0 +1,527 @@ + +============= +Health checks +============= + +Overview +======== + +There is a finite set of possible health messages that a Ceph cluster can +raise -- these are defined as *health checks* which have unique identifiers. + +The identifier is a terse pseudo-human-readable (i.e. like a variable name) +string. It is intended to enable tools (such as UIs) to make sense of +health checks, and present them in a way that reflects their meaning. + +This page lists the health checks that are raised by the monitor and manager +daemons. In addition to these, you may also see health checks that originate +from MDS daemons (see :doc:`/cephfs/health-messages`), and health checks +that are defined by ceph-mgr python modules. + +Definitions +=========== + + +OSDs +---- + +OSD_DOWN +________ + +One or more OSDs are marked down. The ceph-osd daemon may have been +stopped, or peer OSDs may be unable to reach the OSD over the network. +Common causes include a stopped or crashed daemon, a down host, or a +network outage. + +Verify the host is healthy, the daemon is started, and network is +functioning. If the daemon has crashed, the daemon log file +(``/var/log/ceph/ceph-osd.*``) may contain debugging information. + +OSD_<crush type>_DOWN +_____________________ + +(e.g. OSD_HOST_DOWN, OSD_ROOT_DOWN) + +All the OSDs within a particular CRUSH subtree are marked down, for example +all OSDs on a host. + +OSD_ORPHAN +__________ + +An OSD is referenced in the CRUSH map hierarchy but does not exist. + +The OSD can be removed from the CRUSH hierarchy with:: + + ceph osd crush rm osd.<id> + +OSD_OUT_OF_ORDER_FULL +_____________________ + +The utilization thresholds for `backfillfull`, `nearfull`, `full`, +and/or `failsafe_full` are not ascending. In particular, we expect +`backfillfull < nearfull`, `nearfull < full`, and `full < +failsafe_full`. + +The thresholds can be adjusted with:: + + ceph osd set-backfillfull-ratio <ratio> + ceph osd set-nearfull-ratio <ratio> + ceph osd set-full-ratio <ratio> + + +OSD_FULL +________ + +One or more OSDs has exceeded the `full` threshold and is preventing +the cluster from servicing writes. + +Utilization by pool can be checked with:: + + ceph df + +The currently defined `full` ratio can be seen with:: + + ceph osd dump | grep full_ratio + +A short-term workaround to restore write availability is to raise the full +threshold by a small amount:: + + ceph osd set-full-ratio <ratio> + +New storage should be added to the cluster by deploying more OSDs or +existing data should be deleted in order to free up space. + +OSD_BACKFILLFULL +________________ + +One or more OSDs has exceeded the `backfillfull` threshold, which will +prevent data from being allowed to rebalance to this device. 
This is +an early warning that rebalancing may not be able to complete and that +the cluster is approaching full. + +Utilization by pool can be checked with:: + + ceph df + +OSD_NEARFULL +____________ + +One or more OSDs has exceeded the `nearfull` threshold. This is an early +warning that the cluster is approaching full. + +Utilization by pool can be checked with:: + + ceph df + +OSDMAP_FLAGS +____________ + +One or more cluster flags of interest has been set. These flags include: + +* *full* - the cluster is flagged as full and cannot service writes +* *pauserd*, *pausewr* - paused reads or writes +* *noup* - OSDs are not allowed to start +* *nodown* - OSD failure reports are being ignored, such that the + monitors will not mark OSDs `down` +* *noin* - OSDs that were previously marked `out` will not be marked + back `in` when they start +* *noout* - down OSDs will not automatically be marked out after the + configured interval +* *nobackfill*, *norecover*, *norebalance* - recovery or data + rebalancing is suspended +* *noscrub*, *nodeep_scrub* - scrubbing is disabled +* *notieragent* - cache tiering activity is suspended + +With the exception of *full*, these flags can be set or cleared with:: + + ceph osd set <flag> + ceph osd unset <flag> + +OSD_FLAGS +_________ + +One or more OSDs has a per-OSD flag of interest set. These flags include: + +* *noup*: OSD is not allowed to start +* *nodown*: failure reports for this OSD will be ignored +* *noin*: if this OSD was previously marked `out` automatically + after a failure, it will not be marked in when it stats +* *noout*: if this OSD is down it will not automatically be marked + `out` after the configured interval + +Per-OSD flags can be set and cleared with:: + + ceph osd add-<flag> <osd-id> + ceph osd rm-<flag> <osd-id> + +For example, :: + + ceph osd rm-nodown osd.123 + +OLD_CRUSH_TUNABLES +__________________ + +The CRUSH map is using very old settings and should be updated. The +oldest tunables that can be used (i.e., the oldest client version that +can connect to the cluster) without triggering this health warning is +determined by the ``mon_crush_min_required_version`` config option. +See :doc:`/rados/operations/crush-map/#tunables` for more information. + +OLD_CRUSH_STRAW_CALC_VERSION +____________________________ + +The CRUSH map is using an older, non-optimal method for calculating +intermediate weight values for ``straw`` buckets. + +The CRUSH map should be updated to use the newer method +(``straw_calc_version=1``). See +:doc:`/rados/operations/crush-map/#tunables` for more information. + +CACHE_POOL_NO_HIT_SET +_____________________ + +One or more cache pools is not configured with a *hit set* to track +utilization, which will prevent the tiering agent from identifying +cold objects to flush and evict from the cache. + +Hit sets can be configured on the cache pool with:: + + ceph osd pool set <poolname> hit_set_type <type> + ceph osd pool set <poolname> hit_set_period <period-in-seconds> + ceph osd pool set <poolname> hit_set_count <number-of-hitsets> + ceph osd pool set <poolname> hit_set_fpp <target-false-positive-rate> + +OSD_NO_SORTBITWISE +__________________ + +No pre-luminous v12.y.z OSDs are running but the ``sortbitwise`` flag has not +been set. + +The ``sortbitwise`` flag must be set before luminous v12.y.z or newer +OSDs can start. You can safely set the flag with:: + + ceph osd set sortbitwise + +POOL_FULL +_________ + +One or more pools has reached its quota and is no longer allowing writes. 
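+
+In addition to the cluster-wide view shown below, you can inspect the quota of a
+single pool directly (a quick check, shown here for a hypothetical pool named
+``mypool``)::
+
+  ceph osd pool get-quota mypool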
+
+Pool quotas and utilization can be seen with::
+
+  ceph df detail
+
+You can either raise the pool quota with::
+
+  ceph osd pool set-quota <poolname> max_objects <num-objects>
+  ceph osd pool set-quota <poolname> max_bytes <num-bytes>
+
+or delete some existing data to reduce utilization.
+
+
+Data health (pools & placement groups)
+--------------------------------------
+
+PG_AVAILABILITY
+_______________
+
+Data availability is reduced, meaning that the cluster is unable to
+service potential read or write requests for some data in the cluster.
+Specifically, one or more PGs is in a state that does not allow IO
+requests to be serviced. Problematic PG states include *peering*,
+*stale*, *incomplete*, and the lack of *active* (if those conditions do not clear
+quickly).
+
+Detailed information about which PGs are affected is available from::
+
+  ceph health detail
+
+In most cases the root cause is that one or more OSDs is currently
+down; see the discussion for ``OSD_DOWN`` above.
+
+The state of specific problematic PGs can be queried with::
+
+  ceph tell <pgid> query
+
+PG_DEGRADED
+___________
+
+Data redundancy is reduced for some data, meaning the cluster does not
+have the desired number of replicas for all data (for replicated
+pools) or erasure code fragments (for erasure coded pools).
+Specifically, one or more PGs:
+
+* has the *degraded* or *undersized* flag set, meaning there are not
+  enough instances of that placement group in the cluster;
+* has not had the *clean* flag set for some time.
+
+Detailed information about which PGs are affected is available from::
+
+  ceph health detail
+
+In most cases the root cause is that one or more OSDs is currently
+down; see the discussion for ``OSD_DOWN`` above.
+
+The state of specific problematic PGs can be queried with::
+
+  ceph tell <pgid> query
+
+
+PG_DEGRADED_FULL
+________________
+
+Data redundancy may be reduced or at risk for some data due to a lack
+of free space in the cluster. Specifically, one or more PGs has the
+*backfill_toofull* or *recovery_toofull* flag set, meaning that the
+cluster is unable to migrate or recover data because one or more OSDs
+is above the *backfillfull* threshold.
+
+See the discussion for *OSD_BACKFILLFULL* or *OSD_FULL* above for
+steps to resolve this condition.
+
+PG_DAMAGED
+__________
+
+Data scrubbing has discovered some problems with data consistency in
+the cluster. Specifically, one or more PGs has the *inconsistent* or
+*snaptrim_error* flag set, indicating that an earlier scrub operation
+found a problem, or has the *repair* flag set, meaning a repair for
+such an inconsistency is currently in progress.
+
+See :doc:`pg-repair` for more information.
+
+OSD_SCRUB_ERRORS
+________________
+
+Recent OSD scrubs have uncovered inconsistencies. This error is generally
+paired with *PG_DAMAGED* (see above).
+
+See :doc:`pg-repair` for more information.
+
+CACHE_POOL_NEAR_FULL
+____________________
+
+A cache tier pool is nearly full. Full in this context is determined
+by the ``target_max_bytes`` and ``target_max_objects`` properties on
+the cache pool. Once the pool reaches the target threshold, write
+requests to the pool may block while data is flushed and evicted
+from the cache, a state that normally leads to very high latencies and
+poor performance.
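+
+Before adjusting anything, it can help to confirm which thresholds are
+currently set on the cache pool. A minimal check, assuming a cache pool
+named ``hot-storage``::
+
+  ceph osd pool get hot-storage target_max_bytes
+  ceph osd pool get hot-storage target_max_objects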
+ +The cache pool target size can be adjusted with:: + + ceph osd pool set <cache-pool-name> target_max_bytes <bytes> + ceph osd pool set <cache-pool-name> target_max_objects <objects> + +Normal cache flush and evict activity may also be throttled due to reduced +availability or performance of the base tier, or overall cluster load. + +TOO_FEW_PGS +___________ + +The number of PGs in use in the cluster is below the configurable +threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD. This can lead +to suboptimizal distribution and balance of data across the OSDs in +the cluster, and similar reduce overall performance. + +This may be an expected condition if data pools have not yet been +created. + +The PG count for existing pools can be increased or new pools can be +created. Please refer to +:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for +more information. + +TOO_MANY_PGS +____________ + +The number of PGs in use in the cluster is above the configurable +threshold of ``mon_max_pg_per_osd`` PGs per OSD. If this threshold is +exceed the cluster will not allow new pools to be created, pool `pg_num` to +be increased, or pool replication to be increased (any of which would lead to +more PGs in the cluster). A large number of PGs can lead +to higher memory utilization for OSD daemons, slower peering after +cluster state changes (like OSD restarts, additions, or removals), and +higher load on the Manager and Monitor daemons. + +The simplest way to mitigate the problem is to increase the number of +OSDs in the cluster by adding more hardware. Note that the OSD count +used for the purposes of this health check is the number of "in" OSDs, +so marking "out" OSDs "in" (if there are any) can also help:: + + ceph osd in <osd id(s)> + +Please refer to +:doc:`placement-groups#Choosing-the-number-of-Placement-Groups` for +more information. + +SMALLER_PGP_NUM +_______________ + +One or more pools has a ``pgp_num`` value less than ``pg_num``. This +is normally an indication that the PG count was increased without +also increasing the placement behavior. + +This is sometimes done deliberately to separate out the `split` step +when the PG count is adjusted from the data migration that is needed +when ``pgp_num`` is changed. + +This is normally resolved by setting ``pgp_num`` to match ``pg_num``, +triggering the data migration, with:: + + ceph osd pool set <pool> pgp_num <pg-num-value> + +MANY_OBJECTS_PER_PG +___________________ + +One or more pools has an average number of objects per PG that is +significantly higher than the overall cluster average. The specific +threshold is controlled by the ``mon_pg_warn_max_object_skew`` +configuration value. + +This is usually an indication that the pool(s) containing most of the +data in the cluster have too few PGs, and/or that other pools that do +not contain as much data have too many PGs. See the discussion of +*TOO_MANY_PGS* above. + +The threshold can be raised to silence the health warning by adjusting +the ``mon_pg_warn_max_object_skew`` config option on the monitors. + +POOL_APP_NOT_ENABLED +____________________ + +A pool exists that contains one or more objects but has not been +tagged for use by a particular application. + +Resolve this warning by labeling the pool for use by an application. 
For
+example, if the pool is used by RBD::
+
+  rbd pool init <poolname>
+
+If the pool is being used by a custom application 'foo', you can also label
+it via the low-level command::
+
+  ceph osd pool application enable <poolname> foo
+
+For more information, see :doc:`pools.rst#associate-pool-to-application`.
+
+POOL_FULL
+_________
+
+One or more pools has reached (or is very close to reaching) its
+quota. The threshold to trigger this error condition is controlled by
+the ``mon_pool_quota_crit_threshold`` configuration option.
+
+Pool quotas can be adjusted up or down (or removed) with::
+
+  ceph osd pool set-quota <pool> max_bytes <bytes>
+  ceph osd pool set-quota <pool> max_objects <objects>
+
+Setting the quota value to 0 will disable the quota.
+
+POOL_NEAR_FULL
+______________
+
+One or more pools is approaching its quota. The threshold to trigger
+this warning condition is controlled by the
+``mon_pool_quota_warn_threshold`` configuration option.
+
+Pool quotas can be adjusted up or down (or removed) with::
+
+  ceph osd pool set-quota <pool> max_bytes <bytes>
+  ceph osd pool set-quota <pool> max_objects <objects>
+
+Setting the quota value to 0 will disable the quota.
+
+OBJECT_MISPLACED
+________________
+
+One or more objects in the cluster is not stored on the node the
+cluster would like it to be stored on. This is an indication that
+data migration due to some recent cluster change has not yet completed.
+
+Misplaced data is not a dangerous condition in and of itself; data
+consistency is never at risk, and old copies of objects are never
+removed until the desired number of new copies (in the desired
+locations) are present.
+
+OBJECT_UNFOUND
+______________
+
+One or more objects in the cluster cannot be found. Specifically, the
+OSDs know that a new or updated copy of an object should exist, but a
+copy of that version of the object has not been found on OSDs that are
+currently online.
+
+Read or write requests to unfound objects will block.
+
+Ideally, a down OSD that has the more recent copy of the unfound object
+can be brought back online. Candidate OSDs can be identified from the
+peering state for the PG(s) responsible for the unfound object::
+
+  ceph tell <pgid> query
+
+If the latest copy of the object is not available, the cluster can be
+told to roll back to a previous version of the object. See
+:doc:`troubleshooting-pg#Unfound-objects` for more information.
+
+REQUEST_SLOW
+____________
+
+One or more OSD requests is taking a long time to process. This can
+be an indication of extreme load, a slow storage device, or a software
+bug.
+
+The request queue on the OSD(s) in question can be queried with the
+following command, executed from the OSD host::
+
+  ceph daemon osd.<id> ops
+
+A summary of the slowest recent requests can be seen with::
+
+  ceph daemon osd.<id> dump_historic_ops
+
+The location of an OSD can be found with::
+
+  ceph osd find osd.<id>
+
+REQUEST_STUCK
+_____________
+
+One or more OSD requests has been blocked for an extremely long time.
+This is an indication that either the cluster has been unhealthy for
+an extended period of time (e.g., not enough running OSDs) or there is
+some internal problem with the OSD. See the discussion of
+*REQUEST_SLOW* above.
+
+PG_NOT_SCRUBBED
+_______________
+
+One or more PGs has not been scrubbed recently. PGs are normally
+scrubbed every ``mon_scrub_interval`` seconds, and this warning
+triggers when ``mon_warn_not_scrubbed`` such intervals have elapsed
+without a scrub.
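+
+One way to see which PGs are affected is ``ceph health detail``; the last
+scrub timestamp of each PG is also included in the per-PG statistics, which
+can be dumped as follows (a sketch; the exact columns vary by release)::
+
+  ceph health detail
+  ceph pg dump pgs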
+ +PGs will not scrub if they are not flagged as *clean*, which may +happen if they are misplaced or degraded (see *PG_AVAILABILITY* and +*PG_DEGRADED* above). + +You can manually initiate a scrub of a clean PG with:: + + ceph pg scrub <pgid> + +PG_NOT_DEEP_SCRUBBED +____________________ + +One or more PGs has not been deep scrubbed recently. PGs are normally +scrubbed every ``osd_deep_mon_scrub_interval`` seconds, and this warning +triggers when ``mon_warn_not_deep_scrubbed`` such intervals have elapsed +without a scrub. + +PGs will not (deep) scrub if they are not flagged as *clean*, which may +happen if they are misplaced or degraded (see *PG_AVAILABILITY* and +*PG_DEGRADED* above). + +You can manually initiate a scrub of a clean PG with:: + + ceph pg deep-scrub <pgid> diff --git a/src/ceph/doc/rados/operations/index.rst b/src/ceph/doc/rados/operations/index.rst new file mode 100644 index 0000000..aacf764 --- /dev/null +++ b/src/ceph/doc/rados/operations/index.rst @@ -0,0 +1,90 @@ +==================== + Cluster Operations +==================== + +.. raw:: html + + <table><colgroup><col width="50%"><col width="50%"></colgroup><tbody valign="top"><tr><td><h3>High-level Operations</h3> + +High-level cluster operations consist primarily of starting, stopping, and +restarting a cluster with the ``ceph`` service; checking the cluster's health; +and, monitoring an operating cluster. + +.. toctree:: + :maxdepth: 1 + + operating + health-checks + monitoring + monitoring-osd-pg + user-management + +.. raw:: html + + </td><td><h3>Data Placement</h3> + +Once you have your cluster up and running, you may begin working with data +placement. Ceph supports petabyte-scale data storage clusters, with storage +pools and placement groups that distribute data across the cluster using Ceph's +CRUSH algorithm. + +.. toctree:: + :maxdepth: 1 + + data-placement + pools + erasure-code + cache-tiering + placement-groups + upmap + crush-map + crush-map-edits + + + +.. raw:: html + + </td></tr><tr><td><h3>Low-level Operations</h3> + +Low-level cluster operations consist of starting, stopping, and restarting a +particular daemon within a cluster; changing the settings of a particular +daemon or subsystem; and, adding a daemon to the cluster or removing a daemon +from the cluster. The most common use cases for low-level operations include +growing or shrinking the Ceph cluster and replacing legacy or failed hardware +with new hardware. + +.. toctree:: + :maxdepth: 1 + + add-or-rm-osds + add-or-rm-mons + Command Reference <control> + + + +.. raw:: html + + </td><td><h3>Troubleshooting</h3> + +Ceph is still on the leading edge, so you may encounter situations that require +you to evaluate your Ceph configuration and modify your logging and debugging +settings to identify and remedy issues you are encountering with your cluster. + +.. toctree:: + :maxdepth: 1 + + ../troubleshooting/community + ../troubleshooting/troubleshooting-mon + ../troubleshooting/troubleshooting-osd + ../troubleshooting/troubleshooting-pg + ../troubleshooting/log-and-debug + ../troubleshooting/cpu-profiling + ../troubleshooting/memory-profiling + + + + +.. 
raw:: html + + </td></tr></tbody></table> + diff --git a/src/ceph/doc/rados/operations/monitoring-osd-pg.rst b/src/ceph/doc/rados/operations/monitoring-osd-pg.rst new file mode 100644 index 0000000..0107e34 --- /dev/null +++ b/src/ceph/doc/rados/operations/monitoring-osd-pg.rst @@ -0,0 +1,617 @@ +========================= + Monitoring OSDs and PGs +========================= + +High availability and high reliability require a fault-tolerant approach to +managing hardware and software issues. Ceph has no single point-of-failure, and +can service requests for data in a "degraded" mode. Ceph's `data placement`_ +introduces a layer of indirection to ensure that data doesn't bind directly to +particular OSD addresses. This means that tracking down system faults requires +finding the `placement group`_ and the underlying OSDs at root of the problem. + +.. tip:: A fault in one part of the cluster may prevent you from accessing a + particular object, but that doesn't mean that you cannot access other objects. + When you run into a fault, don't panic. Just follow the steps for monitoring + your OSDs and placement groups. Then, begin troubleshooting. + +Ceph is generally self-repairing. However, when problems persist, monitoring +OSDs and placement groups will help you identify the problem. + + +Monitoring OSDs +=============== + +An OSD's status is either in the cluster (``in``) or out of the cluster +(``out``); and, it is either up and running (``up``), or it is down and not +running (``down``). If an OSD is ``up``, it may be either ``in`` the cluster +(you can read and write data) or it is ``out`` of the cluster. If it was +``in`` the cluster and recently moved ``out`` of the cluster, Ceph will migrate +placement groups to other OSDs. If an OSD is ``out`` of the cluster, CRUSH will +not assign placement groups to the OSD. If an OSD is ``down``, it should also be +``out``. + +.. note:: If an OSD is ``down`` and ``in``, there is a problem and the cluster + will not be in a healthy state. + +.. ditaa:: +----------------+ +----------------+ + | | | | + | OSD #n In | | OSD #n Up | + | | | | + +----------------+ +----------------+ + ^ ^ + | | + | | + v v + +----------------+ +----------------+ + | | | | + | OSD #n Out | | OSD #n Down | + | | | | + +----------------+ +----------------+ + +If you execute a command such as ``ceph health``, ``ceph -s`` or ``ceph -w``, +you may notice that the cluster does not always echo back ``HEALTH OK``. Don't +panic. With respect to OSDs, you should expect that the cluster will **NOT** +echo ``HEALTH OK`` in a few expected circumstances: + +#. You haven't started the cluster yet (it won't respond). +#. You have just started or restarted the cluster and it's not ready yet, + because the placement groups are getting created and the OSDs are in + the process of peering. +#. You just added or removed an OSD. +#. You just have modified your cluster map. + +An important aspect of monitoring OSDs is to ensure that when the cluster +is up and running that all OSDs that are ``in`` the cluster are ``up`` and +running, too. To see if all OSDs are running, execute:: + + ceph osd stat + +The result should tell you the map epoch (eNNNN), the total number of OSDs (x), +how many are ``up`` (y) and how many are ``in`` (z). 
:: + + eNNNN: x osds: y up, z in + +If the number of OSDs that are ``in`` the cluster is more than the number of +OSDs that are ``up``, execute the following command to identify the ``ceph-osd`` +daemons that are not running:: + + ceph osd tree + +:: + + dumped osdmap tree epoch 1 + # id weight type name up/down reweight + -1 2 pool openstack + -3 2 rack dell-2950-rack-A + -2 2 host dell-2950-A1 + 0 1 osd.0 up 1 + 1 1 osd.1 down 1 + + +.. tip:: The ability to search through a well-designed CRUSH hierarchy may help + you troubleshoot your cluster by identifying the physcial locations faster. + +If an OSD is ``down``, start it:: + + sudo systemctl start ceph-osd@1 + +See `OSD Not Running`_ for problems associated with OSDs that stopped, or won't +restart. + + +PG Sets +======= + +When CRUSH assigns placement groups to OSDs, it looks at the number of replicas +for the pool and assigns the placement group to OSDs such that each replica of +the placement group gets assigned to a different OSD. For example, if the pool +requires three replicas of a placement group, CRUSH may assign them to +``osd.1``, ``osd.2`` and ``osd.3`` respectively. CRUSH actually seeks a +pseudo-random placement that will take into account failure domains you set in +your `CRUSH map`_, so you will rarely see placement groups assigned to nearest +neighbor OSDs in a large cluster. We refer to the set of OSDs that should +contain the replicas of a particular placement group as the **Acting Set**. In +some cases, an OSD in the Acting Set is ``down`` or otherwise not able to +service requests for objects in the placement group. When these situations +arise, don't panic. Common examples include: + +- You added or removed an OSD. Then, CRUSH reassigned the placement group to + other OSDs--thereby changing the composition of the Acting Set and spawning + the migration of data with a "backfill" process. +- An OSD was ``down``, was restarted, and is now ``recovering``. +- An OSD in the Acting Set is ``down`` or unable to service requests, + and another OSD has temporarily assumed its duties. + +Ceph processes a client request using the **Up Set**, which is the set of OSDs +that will actually handle the requests. In most cases, the Up Set and the Acting +Set are virtually identical. When they are not, it may indicate that Ceph is +migrating data, an OSD is recovering, or that there is a problem (i.e., Ceph +usually echoes a "HEALTH WARN" state with a "stuck stale" message in such +scenarios). + +To retrieve a list of placement groups, execute:: + + ceph pg dump + +To view which OSDs are within the Acting Set or the Up Set for a given placement +group, execute:: + + ceph pg map {pg-num} + +The result should tell you the osdmap epoch (eNNN), the placement group number +({pg-num}), the OSDs in the Up Set (up[]), and the OSDs in the acting set +(acting[]). :: + + osdmap eNNN pg {pg-num} -> up [0,1,2] acting [0,1,2] + +.. note:: If the Up Set and Acting Set do not match, this may be an indicator + that the cluster rebalancing itself or of a potential problem with + the cluster. + + +Peering +======= + +Before you can write data to a placement group, it must be in an ``active`` +state, and it **should** be in a ``clean`` state. For Ceph to determine the +current state of a placement group, the primary OSD of the placement group +(i.e., the first OSD in the acting set), peers with the secondary and tertiary +OSDs to establish agreement on the current state of the placement group +(assuming a pool with 3 replicas of the PG). + + +.. 
ditaa:: +---------+ +---------+ +-------+ + | OSD 1 | | OSD 2 | | OSD 3 | + +---------+ +---------+ +-------+ + | | | + | Request To | | + | Peer | | + |-------------->| | + |<--------------| | + | Peering | + | | + | Request To | + | Peer | + |----------------------------->| + |<-----------------------------| + | Peering | + +The OSDs also report their status to the monitor. See `Configuring Monitor/OSD +Interaction`_ for details. To troubleshoot peering issues, see `Peering +Failure`_. + + +Monitoring Placement Group States +================================= + +If you execute a command such as ``ceph health``, ``ceph -s`` or ``ceph -w``, +you may notice that the cluster does not always echo back ``HEALTH OK``. After +you check to see if the OSDs are running, you should also check placement group +states. You should expect that the cluster will **NOT** echo ``HEALTH OK`` in a +number of placement group peering-related circumstances: + +#. You have just created a pool and placement groups haven't peered yet. +#. The placement groups are recovering. +#. You have just added an OSD to or removed an OSD from the cluster. +#. You have just modified your CRUSH map and your placement groups are migrating. +#. There is inconsistent data in different replicas of a placement group. +#. Ceph is scrubbing a placement group's replicas. +#. Ceph doesn't have enough storage capacity to complete backfilling operations. + +If one of the foregoing circumstances causes Ceph to echo ``HEALTH WARN``, don't +panic. In many cases, the cluster will recover on its own. In some cases, you +may need to take action. An important aspect of monitoring placement groups is +to ensure that when the cluster is up and running that all placement groups are +``active``, and preferably in the ``clean`` state. To see the status of all +placement groups, execute:: + + ceph pg stat + +The result should tell you the placement group map version (vNNNNNN), the total +number of placement groups (x), and how many placement groups are in a +particular state such as ``active+clean`` (y). :: + + vNNNNNN: x pgs: y active+clean; z bytes data, aa MB used, bb GB / cc GB avail + +.. note:: It is common for Ceph to report multiple states for placement groups. + +In addition to the placement group states, Ceph will also echo back the amount +of data used (aa), the amount of storage capacity remaining (bb), and the total +storage capacity for the placement group. These numbers can be important in a +few cases: + +- You are reaching your ``near full ratio`` or ``full ratio``. +- Your data is not getting distributed across the cluster due to an + error in your CRUSH configuration. + + +.. topic:: Placement Group IDs + + Placement group IDs consist of the pool number (not pool name) followed + by a period (.) and the placement group ID--a hexadecimal number. You + can view pool numbers and their names from the output of ``ceph osd + lspools``. For example, the default pool ``rbd`` corresponds to + pool number ``0``. A fully qualified placement group ID has the + following form:: + + {pool-num}.{pg-id} + + And it typically looks like this:: + + 0.1f + + +To retrieve a list of placement groups, execute the following:: + + ceph pg dump + +You can also format the output in JSON format and save it to a file:: + + ceph pg dump -o {filename} --format=json + +To query a particular placement group, execute the following:: + + ceph pg {poolnum}.{pg-id} query + +Ceph will output the query in JSON format. + +.. 
code-block:: javascript + + { + "state": "active+clean", + "up": [ + 1, + 0 + ], + "acting": [ + 1, + 0 + ], + "info": { + "pgid": "1.e", + "last_update": "4'1", + "last_complete": "4'1", + "log_tail": "0'0", + "last_backfill": "MAX", + "purged_snaps": "[]", + "history": { + "epoch_created": 1, + "last_epoch_started": 537, + "last_epoch_clean": 537, + "last_epoch_split": 534, + "same_up_since": 536, + "same_interval_since": 536, + "same_primary_since": 536, + "last_scrub": "4'1", + "last_scrub_stamp": "2013-01-25 10:12:23.828174" + }, + "stats": { + "version": "4'1", + "reported": "536'782", + "state": "active+clean", + "last_fresh": "2013-01-25 10:12:23.828271", + "last_change": "2013-01-25 10:12:23.828271", + "last_active": "2013-01-25 10:12:23.828271", + "last_clean": "2013-01-25 10:12:23.828271", + "last_unstale": "2013-01-25 10:12:23.828271", + "mapping_epoch": 535, + "log_start": "0'0", + "ondisk_log_start": "0'0", + "created": 1, + "last_epoch_clean": 1, + "parent": "0.0", + "parent_split_bits": 0, + "last_scrub": "4'1", + "last_scrub_stamp": "2013-01-25 10:12:23.828174", + "log_size": 128, + "ondisk_log_size": 128, + "stat_sum": { + "num_bytes": 205, + "num_objects": 1, + "num_object_clones": 0, + "num_object_copies": 0, + "num_objects_missing_on_primary": 0, + "num_objects_degraded": 0, + "num_objects_unfound": 0, + "num_read": 1, + "num_read_kb": 0, + "num_write": 3, + "num_write_kb": 1 + }, + "stat_cat_sum": { + + }, + "up": [ + 1, + 0 + ], + "acting": [ + 1, + 0 + ] + }, + "empty": 0, + "dne": 0, + "incomplete": 0 + }, + "recovery_state": [ + { + "name": "Started\/Primary\/Active", + "enter_time": "2013-01-23 09:35:37.594691", + "might_have_unfound": [ + + ], + "scrub": { + "scrub_epoch_start": "536", + "scrub_active": 0, + "scrub_block_writes": 0, + "finalizing_scrub": 0, + "scrub_waiting_on": 0, + "scrub_waiting_on_whom": [ + + ] + } + }, + { + "name": "Started", + "enter_time": "2013-01-23 09:35:31.581160" + } + ] + } + + + +The following subsections describe common states in greater detail. + +Creating +-------- + +When you create a pool, it will create the number of placement groups you +specified. Ceph will echo ``creating`` when it is creating one or more +placement groups. Once they are created, the OSDs that are part of a placement +group's Acting Set will peer. Once peering is complete, the placement group +status should be ``active+clean``, which means a Ceph client can begin writing +to the placement group. + +.. ditaa:: + + /-----------\ /-----------\ /-----------\ + | Creating |------>| Peering |------>| Active | + \-----------/ \-----------/ \-----------/ + +Peering +------- + +When Ceph is Peering a placement group, Ceph is bringing the OSDs that +store the replicas of the placement group into **agreement about the state** +of the objects and metadata in the placement group. When Ceph completes peering, +this means that the OSDs that store the placement group agree about the current +state of the placement group. However, completion of the peering process does +**NOT** mean that each replica has the latest contents. + +.. topic:: Authoratative History + + Ceph will **NOT** acknowledge a write operation to a client, until + all OSDs of the acting set persist the write operation. This practice + ensures that at least one member of the acting set will have a record + of every acknowledged write operation since the last successful + peering operation. 
+ + With an accurate record of each acknowledged write operation, Ceph can + construct and disseminate a new authoritative history of the placement + group--a complete, and fully ordered set of operations that, if performed, + would bring an OSD’s copy of a placement group up to date. + + +Active +------ + +Once Ceph completes the peering process, a placement group may become +``active``. The ``active`` state means that the data in the placement group is +generally available in the primary placement group and the replicas for read +and write operations. + + +Clean +----- + +When a placement group is in the ``clean`` state, the primary OSD and the +replica OSDs have successfully peered and there are no stray replicas for the +placement group. Ceph replicated all objects in the placement group the correct +number of times. + + +Degraded +-------- + +When a client writes an object to the primary OSD, the primary OSD is +responsible for writing the replicas to the replica OSDs. After the primary OSD +writes the object to storage, the placement group will remain in a ``degraded`` +state until the primary OSD has received an acknowledgement from the replica +OSDs that Ceph created the replica objects successfully. + +The reason a placement group can be ``active+degraded`` is that an OSD may be +``active`` even though it doesn't hold all of the objects yet. If an OSD goes +``down``, Ceph marks each placement group assigned to the OSD as ``degraded``. +The OSDs must peer again when the OSD comes back online. However, a client can +still write a new object to a ``degraded`` placement group if it is ``active``. + +If an OSD is ``down`` and the ``degraded`` condition persists, Ceph may mark the +``down`` OSD as ``out`` of the cluster and remap the data from the ``down`` OSD +to another OSD. The time between being marked ``down`` and being marked ``out`` +is controlled by ``mon osd down out interval``, which is set to ``600`` seconds +by default. + +A placement group can also be ``degraded``, because Ceph cannot find one or more +objects that Ceph thinks should be in the placement group. While you cannot +read or write to unfound objects, you can still access all of the other objects +in the ``degraded`` placement group. + + +Recovering +---------- + +Ceph was designed for fault-tolerance at a scale where hardware and software +problems are ongoing. When an OSD goes ``down``, its contents may fall behind +the current state of other replicas in the placement groups. When the OSD is +back ``up``, the contents of the placement groups must be updated to reflect the +current state. During that time period, the OSD may reflect a ``recovering`` +state. + +Recovery is not always trivial, because a hardware failure might cause a +cascading failure of multiple OSDs. For example, a network switch for a rack or +cabinet may fail, which can cause the OSDs of a number of host machines to fall +behind the current state of the cluster. Each one of the OSDs must recover once +the fault is resolved. + +Ceph provides a number of settings to balance the resource contention between +new service requests and the need to recover data objects and restore the +placement groups to the current state. The ``osd recovery delay start`` setting +allows an OSD to restart, re-peer and even process some replay requests before +starting the recovery process. The ``osd +recovery thread timeout`` sets a thread timeout, because multiple OSDs may fail, +restart and re-peer at staggered rates. 
The ``osd recovery max active`` setting +limits the number of recovery requests an OSD will entertain simultaneously to +prevent the OSD from failing to serve . The ``osd recovery max chunk`` setting +limits the size of the recovered data chunks to prevent network congestion. + + +Back Filling +------------ + +When a new OSD joins the cluster, CRUSH will reassign placement groups from OSDs +in the cluster to the newly added OSD. Forcing the new OSD to accept the +reassigned placement groups immediately can put excessive load on the new OSD. +Back filling the OSD with the placement groups allows this process to begin in +the background. Once backfilling is complete, the new OSD will begin serving +requests when it is ready. + +During the backfill operations, you may see one of several states: +``backfill_wait`` indicates that a backfill operation is pending, but is not +underway yet; ``backfill`` indicates that a backfill operation is underway; +and, ``backfill_too_full`` indicates that a backfill operation was requested, +but couldn't be completed due to insufficient storage capacity. When a +placement group cannot be backfilled, it may be considered ``incomplete``. + +Ceph provides a number of settings to manage the load spike associated with +reassigning placement groups to an OSD (especially a new OSD). By default, +``osd_max_backfills`` sets the maximum number of concurrent backfills to or from +an OSD to 10. The ``backfill full ratio`` enables an OSD to refuse a +backfill request if the OSD is approaching its full ratio (90%, by default) and +change with ``ceph osd set-backfillfull-ratio`` comand. +If an OSD refuses a backfill request, the ``osd backfill retry interval`` +enables an OSD to retry the request (after 10 seconds, by default). OSDs can +also set ``osd backfill scan min`` and ``osd backfill scan max`` to manage scan +intervals (64 and 512, by default). + + +Remapped +-------- + +When the Acting Set that services a placement group changes, the data migrates +from the old acting set to the new acting set. It may take some time for a new +primary OSD to service requests. So it may ask the old primary to continue to +service requests until the placement group migration is complete. Once data +migration completes, the mapping uses the primary OSD of the new acting set. + + +Stale +----- + +While Ceph uses heartbeats to ensure that hosts and daemons are running, the +``ceph-osd`` daemons may also get into a ``stuck`` state where they are not +reporting statistics in a timely manner (e.g., a temporary network fault). By +default, OSD daemons report their placement group, up thru, boot and failure +statistics every half second (i.e., ``0.5``), which is more frequent than the +heartbeat thresholds. If the **Primary OSD** of a placement group's acting set +fails to report to the monitor or if other OSDs have reported the primary OSD +``down``, the monitors will mark the placement group ``stale``. + +When you start your cluster, it is common to see the ``stale`` state until +the peering process completes. After your cluster has been running for awhile, +seeing placement groups in the ``stale`` state indicates that the primary OSD +for those placement groups is ``down`` or not reporting placement group statistics +to the monitor. + + +Identifying Troubled PGs +======================== + +As previously noted, a placement group is not necessarily problematic just +because its state is not ``active+clean``. 
Generally, Ceph's ability to self +repair may not be working when placement groups get stuck. The stuck states +include: + +- **Unclean**: Placement groups contain objects that are not replicated the + desired number of times. They should be recovering. +- **Inactive**: Placement groups cannot process reads or writes because they + are waiting for an OSD with the most up-to-date data to come back ``up``. +- **Stale**: Placement groups are in an unknown state, because the OSDs that + host them have not reported to the monitor cluster in a while (configured + by ``mon osd report timeout``). + +To identify stuck placement groups, execute the following:: + + ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded] + +See `Placement Group Subsystem`_ for additional details. To troubleshoot +stuck placement groups, see `Troubleshooting PG Errors`_. + + +Finding an Object Location +========================== + +To store object data in the Ceph Object Store, a Ceph client must: + +#. Set an object name +#. Specify a `pool`_ + +The Ceph client retrieves the latest cluster map and the CRUSH algorithm +calculates how to map the object to a `placement group`_, and then calculates +how to assign the placement group to an OSD dynamically. To find the object +location, all you need is the object name and the pool name. For example:: + + ceph osd map {poolname} {object-name} + +.. topic:: Exercise: Locate an Object + + As an exercise, lets create an object. Specify an object name, a path to a + test file containing some object data and a pool name using the + ``rados put`` command on the command line. For example:: + + rados put {object-name} {file-path} --pool=data + rados put test-object-1 testfile.txt --pool=data + + To verify that the Ceph Object Store stored the object, execute the following:: + + rados -p data ls + + Now, identify the object location:: + + ceph osd map {pool-name} {object-name} + ceph osd map data test-object-1 + + Ceph should output the object's location. For example:: + + osdmap e537 pool 'data' (0) object 'test-object-1' -> pg 0.d1743484 (0.4) -> up [1,0] acting [1,0] + + To remove the test object, simply delete it using the ``rados rm`` command. + For example:: + + rados rm test-object-1 --pool=data + + +As the cluster evolves, the object location may change dynamically. One benefit +of Ceph's dynamic rebalancing is that Ceph relieves you from having to perform +the migration manually. See the `Architecture`_ section for details. + +.. _data placement: ../data-placement +.. _pool: ../pools +.. _placement group: ../placement-groups +.. _Architecture: ../../../architecture +.. _OSD Not Running: ../../troubleshooting/troubleshooting-osd#osd-not-running +.. _Troubleshooting PG Errors: ../../troubleshooting/troubleshooting-pg#troubleshooting-pg-errors +.. _Peering Failure: ../../troubleshooting/troubleshooting-pg#failures-osd-peering +.. _CRUSH map: ../crush-map +.. _Configuring Monitor/OSD Interaction: ../../configuration/mon-osd-interaction/ +.. _Placement Group Subsystem: ../control#placement-group-subsystem diff --git a/src/ceph/doc/rados/operations/monitoring.rst b/src/ceph/doc/rados/operations/monitoring.rst new file mode 100644 index 0000000..c291440 --- /dev/null +++ b/src/ceph/doc/rados/operations/monitoring.rst @@ -0,0 +1,351 @@ +====================== + Monitoring a Cluster +====================== + +Once you have a running cluster, you may use the ``ceph`` tool to monitor your +cluster. 
Monitoring a cluster typically involves checking OSD status, monitor +status, placement group status and metadata server status. + +Using the command line +====================== + +Interactive mode +---------------- + +To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line +with no arguments. For example:: + + ceph + ceph> health + ceph> status + ceph> quorum_status + ceph> mon_status + +Non-default paths +----------------- + +If you specified non-default locations for your configuration or keyring, +you may specify their locations:: + + ceph -c /path/to/conf -k /path/to/keyring health + +Checking a Cluster's Status +=========================== + +After you start your cluster, and before you start reading and/or +writing data, check your cluster's status first. + +To check a cluster's status, execute the following:: + + ceph status + +Or:: + + ceph -s + +In interactive mode, type ``status`` and press **Enter**. :: + + ceph> status + +Ceph will print the cluster status. For example, a tiny Ceph demonstration +cluster with one of each service may print the following: + +:: + + cluster: + id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20 + health: HEALTH_OK + + services: + mon: 1 daemons, quorum a + mgr: x(active) + mds: 1/1/1 up {0=a=up:active} + osd: 1 osds: 1 up, 1 in + + data: + pools: 2 pools, 16 pgs + objects: 21 objects, 2246 bytes + usage: 546 GB used, 384 GB / 931 GB avail + pgs: 16 active+clean + + +.. topic:: How Ceph Calculates Data Usage + + The ``usage`` value reflects the *actual* amount of raw storage used. The + ``xxx GB / xxx GB`` value means the amount available (the lesser number) + of the overall storage capacity of the cluster. The notional number reflects + the size of the stored data before it is replicated, cloned or snapshotted. + Therefore, the amount of data actually stored typically exceeds the notional + amount stored, because Ceph creates replicas of the data and may also use + storage capacity for cloning and snapshotting. + + +Watching a Cluster +================== + +In addition to local logging by each daemon, Ceph clusters maintain +a *cluster log* that records high level events about the whole system. +This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by +default), but can also be monitored via the command line. + +To follow the cluster log, use the following command + +:: + + ceph -w + +Ceph will print the status of the system, followed by each log message as it +is emitted. For example: + +:: + + cluster: + id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20 + health: HEALTH_OK + + services: + mon: 1 daemons, quorum a + mgr: x(active) + mds: 1/1/1 up {0=a=up:active} + osd: 1 osds: 1 up, 1 in + + data: + pools: 2 pools, 16 pgs + objects: 21 objects, 2246 bytes + usage: 546 GB used, 384 GB / 931 GB avail + pgs: 16 active+clean + + + 2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot + 2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x + 2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available + + +In addition to using ``ceph -w`` to print log lines as they are emitted, +use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster +log. + +Monitoring Health Checks +======================== + +Ceph continously runs various *health checks* against its own status. 
When +a health check fails, this is reflected in the output of ``ceph status`` (or +``ceph health``). In addition, messages are sent to the cluster log to +indicate when a check fails, and when the cluster recovers. + +For example, when an OSD goes down, the ``health`` section of the status +output may be updated as follows: + +:: + + health: HEALTH_WARN + 1 osds down + Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded + +At this time, cluster log messages are also emitted to record the failure of the +health checks: + +:: + + 2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN) + 2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED) + +When the OSD comes back online, the cluster log records the cluster's return +to a health state: + +:: + + 2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED) + 2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized) + 2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy + + +Detecting configuration issues +============================== + +In addition to the health checks that Ceph continuously runs on its +own status, there are some configuration issues that may only be detected +by an external tool. + +Use the `ceph-medic`_ tool to run these additional checks on your Ceph +cluster's configuration. + +Checking a Cluster's Usage Stats +================================ + +To check a cluster's data usage and data distribution among pools, you can +use the ``df`` option. It is similar to Linux ``df``. Execute +the following:: + + ceph df + +The **GLOBAL** section of the output provides an overview of the amount of +storage your cluster uses for your data. + +- **SIZE:** The overall storage capacity of the cluster. +- **AVAIL:** The amount of free space available in the cluster. +- **RAW USED:** The amount of raw storage used. +- **% RAW USED:** The percentage of raw storage used. Use this number in + conjunction with the ``full ratio`` and ``near full ratio`` to ensure that + you are not reaching your cluster's capacity. See `Storage Capacity`_ for + additional details. + +The **POOLS** section of the output provides a list of pools and the notional +usage of each pool. The output from this section **DOES NOT** reflect replicas, +clones or snapshots. For example, if you store an object with 1MB of data, the +notional usage will be 1MB, but the actual usage may be 2MB or more depending +on the number of replicas, clones and snapshots. + +- **NAME:** The name of the pool. +- **ID:** The pool ID. +- **USED:** The notional amount of data stored in kilobytes, unless the number + appends **M** for megabytes or **G** for gigabytes. +- **%USED:** The notional percentage of storage used per pool. +- **MAX AVAIL:** An estimate of the notional amount of data that can be written + to this pool. +- **Objects:** The notional number of objects stored per pool. + +.. note:: The numbers in the **POOLS** section are notional. They are not + inclusive of the number of replicas, shapshots or clones. 
As a result, + the sum of the **USED** and **%USED** amounts will not add up to the + **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the + output. + +.. note:: The **MAX AVAIL** value is a complicated function of the + replication or erasure code used, the CRUSH rule that maps storage + to devices, the utilization of those devices, and the configured + mon_osd_full_ratio. + + + +Checking OSD Status +=================== + +You can check OSDs to ensure they are ``up`` and ``in`` by executing:: + + ceph osd stat + +Or:: + + ceph osd dump + +You can also check view OSDs according to their position in the CRUSH map. :: + + ceph osd tree + +Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up +and their weight. :: + + # id weight type name up/down reweight + -1 3 pool default + -3 3 rack mainrack + -2 3 host osd-host + 0 1 osd.0 up 1 + 1 1 osd.1 up 1 + 2 1 osd.2 up 1 + +For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_. + +Checking Monitor Status +======================= + +If your cluster has multiple monitors (likely), you should check the monitor +quorum status after you start the cluster before reading and/or writing data. A +quorum must be present when multiple monitors are running. You should also check +monitor status periodically to ensure that they are running. + +To see display the monitor map, execute the following:: + + ceph mon stat + +Or:: + + ceph mon dump + +To check the quorum status for the monitor cluster, execute the following:: + + ceph quorum_status + +Ceph will return the quorum status. For example, a Ceph cluster consisting of +three monitors may return the following: + +.. code-block:: javascript + + { "election_epoch": 10, + "quorum": [ + 0, + 1, + 2], + "monmap": { "epoch": 1, + "fsid": "444b489c-4f16-4b75-83f0-cb8097468898", + "modified": "2011-12-12 13:28:27.505520", + "created": "2011-12-12 13:28:27.505520", + "mons": [ + { "rank": 0, + "name": "a", + "addr": "127.0.0.1:6789\/0"}, + { "rank": 1, + "name": "b", + "addr": "127.0.0.1:6790\/0"}, + { "rank": 2, + "name": "c", + "addr": "127.0.0.1:6791\/0"} + ] + } + } + +Checking MDS Status +=================== + +Metadata servers provide metadata services for Ceph FS. Metadata servers have +two sets of states: ``up | down`` and ``active | inactive``. To ensure your +metadata servers are ``up`` and ``active``, execute the following:: + + ceph mds stat + +To display details of the metadata cluster, execute the following:: + + ceph fs dump + + +Checking Placement Group States +=============================== + +Placement groups map objects to OSDs. When you monitor your +placement groups, you will want them to be ``active`` and ``clean``. +For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_. + +.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg + + +Using the Admin Socket +====================== + +The Ceph admin socket allows you to query a daemon via a socket interface. +By default, Ceph sockets reside under ``/var/run/ceph``. 
To access a daemon +via the admin socket, login to the host running the daemon and use the +following command:: + + ceph daemon {daemon-name} + ceph daemon {path-to-socket-file} + +For example, the following are equivalent:: + + ceph daemon osd.0 foo + ceph daemon /var/run/ceph/ceph-osd.0.asok foo + +To view the available admin socket commands, execute the following command:: + + ceph daemon {daemon-name} help + +The admin socket command enables you to show and set your configuration at +runtime. See `Viewing a Configuration at Runtime`_ for details. + +Additionally, you can set configuration values at runtime directly (i.e., the +admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id} +injectargs``, which relies on the monitor but doesn't require you to login +directly to the host in question ). + +.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config +.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity +.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/ diff --git a/src/ceph/doc/rados/operations/operating.rst b/src/ceph/doc/rados/operations/operating.rst new file mode 100644 index 0000000..791941a --- /dev/null +++ b/src/ceph/doc/rados/operations/operating.rst @@ -0,0 +1,251 @@ +===================== + Operating a Cluster +===================== + +.. index:: systemd; operating a cluster + + +Running Ceph with systemd +========================== + +For all distributions that support systemd (CentOS 7, Fedora, Debian +Jessie 8 and later, SUSE), ceph daemons are now managed using native +systemd files instead of the legacy sysvinit scripts. For example:: + + sudo systemctl start ceph.target # start all daemons + sudo systemctl status ceph-osd@12 # check status of osd.12 + +To list the Ceph systemd units on a node, execute:: + + sudo systemctl status ceph\*.service ceph\*.target + +Starting all Daemons +-------------------- + +To start all daemons on a Ceph Node (irrespective of type), execute the +following:: + + sudo systemctl start ceph.target + + +Stopping all Daemons +-------------------- + +To stop all daemons on a Ceph Node (irrespective of type), execute the +following:: + + sudo systemctl stop ceph\*.service ceph\*.target + + +Starting all Daemons by Type +---------------------------- + +To start all daemons of a particular type on a Ceph Node, execute one of the +following:: + + sudo systemctl start ceph-osd.target + sudo systemctl start ceph-mon.target + sudo systemctl start ceph-mds.target + + +Stopping all Daemons by Type +---------------------------- + +To stop all daemons of a particular type on a Ceph Node, execute one of the +following:: + + sudo systemctl stop ceph-mon\*.service ceph-mon.target + sudo systemctl stop ceph-osd\*.service ceph-osd.target + sudo systemctl stop ceph-mds\*.service ceph-mds.target + + +Starting a Daemon +----------------- + +To start a specific daemon instance on a Ceph Node, execute one of the +following:: + + sudo systemctl start ceph-osd@{id} + sudo systemctl start ceph-mon@{hostname} + sudo systemctl start ceph-mds@{hostname} + +For example:: + + sudo systemctl start ceph-osd@1 + sudo systemctl start ceph-mon@ceph-server + sudo systemctl start ceph-mds@ceph-server + + +Stopping a Daemon +----------------- + +To stop a specific daemon instance on a Ceph Node, execute one of the +following:: + + sudo systemctl stop ceph-osd@{id} + sudo systemctl stop ceph-mon@{hostname} + sudo systemctl stop ceph-mds@{hostname} + +For example:: + + sudo systemctl stop ceph-osd@1 + 
sudo systemctl stop ceph-mon@ceph-server + sudo systemctl stop ceph-mds@ceph-server + + +.. index:: Ceph service; Upstart; operating a cluster + + + +Running Ceph with Upstart +========================= + +When deploying Ceph with ``ceph-deploy`` on Ubuntu Trusty, you may start and +stop Ceph daemons on a :term:`Ceph Node` using the event-based `Upstart`_. +Upstart does not require you to define daemon instances in the Ceph +configuration file. + +To list the Ceph Upstart jobs and instances on a node, execute:: + + sudo initctl list | grep ceph + +See `initctl`_ for additional details. + + +Starting all Daemons +-------------------- + +To start all daemons on a Ceph Node (irrespective of type), execute the +following:: + + sudo start ceph-all + + +Stopping all Daemons +-------------------- + +To stop all daemons on a Ceph Node (irrespective of type), execute the +following:: + + sudo stop ceph-all + + +Starting all Daemons by Type +---------------------------- + +To start all daemons of a particular type on a Ceph Node, execute one of the +following:: + + sudo start ceph-osd-all + sudo start ceph-mon-all + sudo start ceph-mds-all + + +Stopping all Daemons by Type +---------------------------- + +To stop all daemons of a particular type on a Ceph Node, execute one of the +following:: + + sudo stop ceph-osd-all + sudo stop ceph-mon-all + sudo stop ceph-mds-all + + +Starting a Daemon +----------------- + +To start a specific daemon instance on a Ceph Node, execute one of the +following:: + + sudo start ceph-osd id={id} + sudo start ceph-mon id={hostname} + sudo start ceph-mds id={hostname} + +For example:: + + sudo start ceph-osd id=1 + sudo start ceph-mon id=ceph-server + sudo start ceph-mds id=ceph-server + + +Stopping a Daemon +----------------- + +To stop a specific daemon instance on a Ceph Node, execute one of the +following:: + + sudo stop ceph-osd id={id} + sudo stop ceph-mon id={hostname} + sudo stop ceph-mds id={hostname} + +For example:: + + sudo stop ceph-osd id=1 + sudo start ceph-mon id=ceph-server + sudo start ceph-mds id=ceph-server + + +.. index:: Ceph service; sysvinit; operating a cluster + + +Running Ceph +============ + +Each time you to **start**, **restart**, and **stop** Ceph daemons (or your +entire cluster) you must specify at least one option and one command. You may +also specify a daemon type or a daemon instance. :: + + {commandline} [options] [commands] [daemons] + + +The ``ceph`` options include: + ++-----------------+----------+-------------------------------------------------+ +| Option | Shortcut | Description | ++=================+==========+=================================================+ +| ``--verbose`` | ``-v`` | Use verbose logging. | ++-----------------+----------+-------------------------------------------------+ +| ``--valgrind`` | ``N/A`` | (Dev and QA only) Use `Valgrind`_ debugging. | ++-----------------+----------+-------------------------------------------------+ +| ``--allhosts`` | ``-a`` | Execute on all nodes in ``ceph.conf.`` | +| | | Otherwise, it only executes on ``localhost``. | ++-----------------+----------+-------------------------------------------------+ +| ``--restart`` | ``N/A`` | Automatically restart daemon if it core dumps. | ++-----------------+----------+-------------------------------------------------+ +| ``--norestart`` | ``N/A`` | Don't restart a daemon if it core dumps. | ++-----------------+----------+-------------------------------------------------+ +| ``--conf`` | ``-c`` | Use an alternate configuration file. 
| ++-----------------+----------+-------------------------------------------------+ + +The ``ceph`` commands include: + ++------------------+------------------------------------------------------------+ +| Command | Description | ++==================+============================================================+ +| ``start`` | Start the daemon(s). | ++------------------+------------------------------------------------------------+ +| ``stop`` | Stop the daemon(s). | ++------------------+------------------------------------------------------------+ +| ``forcestop`` | Force the daemon(s) to stop. Same as ``kill -9`` | ++------------------+------------------------------------------------------------+ +| ``killall`` | Kill all daemons of a particular type. | ++------------------+------------------------------------------------------------+ +| ``cleanlogs`` | Cleans out the log directory. | ++------------------+------------------------------------------------------------+ +| ``cleanalllogs`` | Cleans out **everything** in the log directory. | ++------------------+------------------------------------------------------------+ + +For subsystem operations, the ``ceph`` service can target specific daemon types +by adding a particular daemon type for the ``[daemons]`` option. Daemon types +include: + +- ``mon`` +- ``osd`` +- ``mds`` + + + +.. _Valgrind: http://www.valgrind.org/ +.. _Upstart: http://upstart.ubuntu.com/index.html +.. _initctl: http://manpages.ubuntu.com/manpages/raring/en/man8/initctl.8.html diff --git a/src/ceph/doc/rados/operations/pg-concepts.rst b/src/ceph/doc/rados/operations/pg-concepts.rst new file mode 100644 index 0000000..636d6bf --- /dev/null +++ b/src/ceph/doc/rados/operations/pg-concepts.rst @@ -0,0 +1,102 @@ +========================== + Placement Group Concepts +========================== + +When you execute commands like ``ceph -w``, ``ceph osd dump``, and other +commands related to placement groups, Ceph may return values using some +of the following terms: + +*Peering* + The process of bringing all of the OSDs that store + a Placement Group (PG) into agreement about the state + of all of the objects (and their metadata) in that PG. + Note that agreeing on the state does not mean that + they all have the latest contents. + +*Acting Set* + The ordered list of OSDs who are (or were as of some epoch) + responsible for a particular placement group. + +*Up Set* + The ordered list of OSDs responsible for a particular placement + group for a particular epoch according to CRUSH. Normally this + is the same as the *Acting Set*, except when the *Acting Set* has + been explicitly overridden via ``pg_temp`` in the OSD Map. + +*Current Interval* or *Past Interval* + A sequence of OSD map epochs during which the *Acting Set* and *Up + Set* for particular placement group do not change. + +*Primary* + The member (and by convention first) of the *Acting Set*, + that is responsible for coordination peering, and is + the only OSD that will accept client-initiated + writes to objects in a placement group. + +*Replica* + A non-primary OSD in the *Acting Set* for a placement group + (and who has been recognized as such and *activated* by the primary). + +*Stray* + An OSD that is not a member of the current *Acting Set*, but + has not yet been told that it can delete its copies of a + particular placement group. + +*Recovery* + Ensuring that copies of all of the objects in a placement group + are on all of the OSDs in the *Acting Set*. 
Once *Peering* has + been performed, the *Primary* can start accepting write operations, + and *Recovery* can proceed in the background. + +*PG Info* + Basic metadata about the placement group's creation epoch, the version + for the most recent write to the placement group, *last epoch started*, + *last epoch clean*, and the beginning of the *current interval*. Any + inter-OSD communication about placement groups includes the *PG Info*, + such that any OSD that knows a placement group exists (or once existed) + also has a lower bound on *last epoch clean* or *last epoch started*. + +*PG Log* + A list of recent updates made to objects in a placement group. + Note that these logs can be truncated after all OSDs + in the *Acting Set* have acknowledged up to a certain + point. + +*Missing Set* + Each OSD notes update log entries and if they imply updates to + the contents of an object, adds that object to a list of needed + updates. This list is called the *Missing Set* for that ``<OSD,PG>``. + +*Authoritative History* + A complete, and fully ordered set of operations that, if + performed, would bring an OSD's copy of a placement group + up to date. + +*Epoch* + A (monotonically increasing) OSD map version number + +*Last Epoch Start* + The last epoch at which all nodes in the *Acting Set* + for a particular placement group agreed on an + *Authoritative History*. At this point, *Peering* is + deemed to have been successful. + +*up_thru* + Before a *Primary* can successfully complete the *Peering* process, + it must inform a monitor that is alive through the current + OSD map *Epoch* by having the monitor set its *up_thru* in the osd + map. This helps *Peering* ignore previous *Acting Sets* for which + *Peering* never completed after certain sequences of failures, such as + the second interval below: + + - *acting set* = [A,B] + - *acting set* = [A] + - *acting set* = [] very shortly after (e.g., simultaneous failure, but staggered detection) + - *acting set* = [B] (B restarts, A does not) + +*Last Epoch Clean* + The last *Epoch* at which all nodes in the *Acting set* + for a particular placement group were completely + up to date (both placement group logs and object contents). + At this point, *recovery* is deemed to have been + completed. diff --git a/src/ceph/doc/rados/operations/pg-repair.rst b/src/ceph/doc/rados/operations/pg-repair.rst new file mode 100644 index 0000000..0d6692a --- /dev/null +++ b/src/ceph/doc/rados/operations/pg-repair.rst @@ -0,0 +1,4 @@ +Repairing PG inconsistencies +============================ + + diff --git a/src/ceph/doc/rados/operations/pg-states.rst b/src/ceph/doc/rados/operations/pg-states.rst new file mode 100644 index 0000000..0fbd3dc --- /dev/null +++ b/src/ceph/doc/rados/operations/pg-states.rst @@ -0,0 +1,80 @@ +======================== + Placement Group States +======================== + +When checking a cluster's status (e.g., running ``ceph -w`` or ``ceph -s``), +Ceph will report on the status of the placement groups. A placement group has +one or more states. The optimum state for placement groups in the placement group +map is ``active + clean``. + +*Creating* + Ceph is still creating the placement group. + +*Active* + Ceph will process requests to the placement group. + +*Clean* + Ceph replicated all objects in the placement group the correct number of times. + +*Down* + A replica with necessary data is down, so the placement group is offline. + +*Scrubbing* + Ceph is checking the placement group for inconsistencies. 
+ +*Degraded* + Ceph has not replicated some objects in the placement group the correct number of times yet. + +*Inconsistent* + Ceph detects inconsistencies in the one or more replicas of an object in the placement group + (e.g. objects are the wrong size, objects are missing from one replica *after* recovery finished, etc.). + +*Peering* + The placement group is undergoing the peering process + +*Repair* + Ceph is checking the placement group and repairing any inconsistencies it finds (if possible). + +*Recovering* + Ceph is migrating/synchronizing objects and their replicas. + +*Forced-Recovery* + High recovery priority of that PG is enforced by user. + +*Backfill* + Ceph is scanning and synchronizing the entire contents of a placement group + instead of inferring what contents need to be synchronized from the logs of + recent operations. *Backfill* is a special case of recovery. + +*Forced-Backfill* + High backfill priority of that PG is enforced by user. + +*Wait-backfill* + The placement group is waiting in line to start backfill. + +*Backfill-toofull* + A backfill operation is waiting because the destination OSD is over its + full ratio. + +*Incomplete* + Ceph detects that a placement group is missing information about + writes that may have occurred, or does not have any healthy + copies. If you see this state, try to start any failed OSDs that may + contain the needed information. In the case of an erasure coded pool + temporarily reducing min_size may allow recovery. + +*Stale* + The placement group is in an unknown state - the monitors have not received + an update for it since the placement group mapping changed. + +*Remapped* + The placement group is temporarily mapped to a different set of OSDs from what + CRUSH specified. + +*Undersized* + The placement group fewer copies than the configured pool replication level. + +*Peered* + The placement group has peered, but cannot serve client IO due to not having + enough copies to reach the pool's configured min_size parameter. Recovery + may occur in this state, so the pg may heal up to min_size eventually. diff --git a/src/ceph/doc/rados/operations/placement-groups.rst b/src/ceph/doc/rados/operations/placement-groups.rst new file mode 100644 index 0000000..fee833a --- /dev/null +++ b/src/ceph/doc/rados/operations/placement-groups.rst @@ -0,0 +1,469 @@ +================== + Placement Groups +================== + +.. _preselection: + +A preselection of pg_num +======================== + +When creating a new pool with:: + + ceph osd pool create {pool-name} pg_num + +it is mandatory to choose the value of ``pg_num`` because it cannot be +calculated automatically. Here are a few values commonly used: + +- Less than 5 OSDs set ``pg_num`` to 128 + +- Between 5 and 10 OSDs set ``pg_num`` to 512 + +- Between 10 and 50 OSDs set ``pg_num`` to 1024 + +- If you have more than 50 OSDs, you need to understand the tradeoffs + and how to calculate the ``pg_num`` value by yourself + +- For calculating ``pg_num`` value by yourself please take help of `pgcalc`_ tool + +As the number of OSDs increases, chosing the right value for pg_num +becomes more important because it has a significant influence on the +behavior of the cluster as well as the durability of the data when +something goes wrong (i.e. the probability that a catastrophic event +leads to data loss). + +How are Placement Groups used ? 
+=============================== + +A placement group (PG) aggregates objects within a pool because +tracking object placement and object metadata on a per-object basis is +computationally expensive--i.e., a system with millions of objects +cannot realistically track placement on a per-object basis. + +.. ditaa:: + /-----\ /-----\ /-----\ /-----\ /-----\ + | obj | | obj | | obj | | obj | | obj | + \-----/ \-----/ \-----/ \-----/ \-----/ + | | | | | + +--------+--------+ +---+----+ + | | + v v + +-----------------------+ +-----------------------+ + | Placement Group #1 | | Placement Group #2 | + | | | | + +-----------------------+ +-----------------------+ + | | + +------------------------------+ + | + v + +-----------------------+ + | Pool | + | | + +-----------------------+ + +The Ceph client will calculate which placement group an object should +be in. It does this by hashing the object ID and applying an operation +based on the number of PGs in the defined pool and the ID of the pool. +See `Mapping PGs to OSDs`_ for details. + +The object's contents within a placement group are stored in a set of +OSDs. For instance, in a replicated pool of size two, each placement +group will store objects on two OSDs, as shown below. + +.. ditaa:: + + +-----------------------+ +-----------------------+ + | Placement Group #1 | | Placement Group #2 | + | | | | + +-----------------------+ +-----------------------+ + | | | | + v v v v + /----------\ /----------\ /----------\ /----------\ + | | | | | | | | + | OSD #1 | | OSD #2 | | OSD #2 | | OSD #3 | + | | | | | | | | + \----------/ \----------/ \----------/ \----------/ + + +Should OSD #2 fail, another will be assigned to Placement Group #1 and +will be filled with copies of all objects in OSD #1. If the pool size +is changed from two to three, an additional OSD will be assigned to +the placement group and will receive copies of all objects in the +placement group. + +Placement groups do not own the OSD, they share it with other +placement groups from the same pool or even other pools. If OSD #2 +fails, the Placement Group #2 will also have to restore copies of +objects, using OSD #3. + +When the number of placement groups increases, the new placement +groups will be assigned OSDs. The result of the CRUSH function will +also change and some objects from the former placement groups will be +copied over to the new Placement Groups and removed from the old ones. + +Placement Groups Tradeoffs +========================== + +Data durability and even distribution among all OSDs call for more +placement groups but their number should be reduced to the minimum to +save CPU and memory. + +.. _data durability: + +Data durability +--------------- + +After an OSD fails, the risk of data loss increases until the data it +contained is fully recovered. Let's imagine a scenario that causes +permanent data loss in a single placement group: + +- The OSD fails and all copies of the object it contains are lost. + For all objects within the placement group the number of replica + suddently drops from three to two. + +- Ceph starts recovery for this placement group by chosing a new OSD + to re-create the third copy of all objects. + +- Another OSD, within the same placement group, fails before the new + OSD is fully populated with the third copy. Some objects will then + only have one surviving copies. + +- Ceph picks yet another OSD and keeps copying objects to restore the + desired number of copies. 
- A third OSD, within the same placement group, fails before recovery
  is complete. If this OSD contained the only remaining copy of an
  object, it is permanently lost.

In a cluster containing 10 OSDs with 512 placement groups in a three
replica pool, CRUSH will give each placement group three OSDs. In the
end, each OSD will end up hosting (512 * 3) / 10 = ~150 placement
groups. When the first OSD fails, the above scenario will therefore
start recovery for all 150 placement groups at the same time.

The 150 placement groups being recovered are likely to be
homogeneously spread over the 9 remaining OSDs. Each remaining OSD is
therefore likely to send copies of objects to all the others, and also
to receive some new objects to store because it has become part of a
new placement group.

The amount of time it takes for this recovery to complete entirely
depends on the architecture of the Ceph cluster. Let's say each OSD is
hosted by a 1TB SSD on a single machine, all of them are connected
to a 10Gb/s switch, and the recovery for a single OSD completes within
M minutes. If there are two OSDs per machine using spinners with no
SSD journal and a 1Gb/s switch, it will be at least an order of
magnitude slower.

In a cluster of this size, the number of placement groups has almost
no influence on data durability. It could be 128 or 8192 and the
recovery would be neither slower nor faster.

However, growing the same Ceph cluster to 20 OSDs instead of 10 OSDs
is likely to speed up recovery and therefore improve data durability
significantly. Each OSD now participates in only ~75 placement groups
instead of ~150 when there were only 10 OSDs, and it will still require
all 19 remaining OSDs to perform the same amount of object copying in
order to recover. But where 10 OSDs had to copy approximately 100GB
each, they now have to copy only 50GB each. If the network was the
bottleneck, recovery will happen twice as fast. In other words,
recovery goes faster as the number of OSDs increases.

If this cluster grows to 40 OSDs, each of them will only host ~35
placement groups. If an OSD dies, recovery will keep getting faster
unless it is blocked by another bottleneck. However, if this cluster
grows to 200 OSDs, each of them will only host ~7 placement groups. If
an OSD dies, recovery will involve at most ~21 (7 * 3) OSDs
in these placement groups: recovery will take longer than when there
were 40 OSDs, meaning the number of placement groups should be
increased.

No matter how short the recovery time is, there is a chance for a
second OSD to fail while it is in progress. In the 10-OSD cluster
described above, if any one of them fails, then ~17 placement groups
(i.e. ~150 / 9 placement groups being recovered) will only have one
surviving copy. And if any of the 8 remaining OSDs then fails, the
objects of roughly two placement groups (i.e. ~17 / 8 placement groups
with only one remaining copy being recovered) are likely to be lost.

When the size of the cluster grows to 20 OSDs, the number of placement
groups damaged by the loss of three OSDs drops. The second OSD lost
will degrade ~4 placement groups (i.e. ~75 / 19 placement groups being
recovered) instead of ~17, and the third OSD lost will only lose data
if it is one of the four OSDs containing a surviving copy. In other
words, if the probability of losing one OSD during the recovery time
frame is 0.0001%, the probability of data loss goes from 17 * 10 *
0.0001% in the cluster with 10 OSDs to 4 * 20 * 0.0001% in the cluster
with 20 OSDs.
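If you want to see how many placement groups each OSD in your own cluster
actually hosts, rather than relying on the averages used in these examples,
you can query the cluster directly; recent releases report the per-OSD count
in the ``PGS`` column of the following command::

    ceph osd df tree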
+ +In a nutshell, more OSDs mean faster recovery and a lower risk of +cascading failures leading to the permanent loss of a Placement +Group. Having 512 or 4096 Placement Groups is roughly equivalent in a +cluster with less than 50 OSDs as far as data durability is concerned. + +Note: It may take a long time for a new OSD added to the cluster to be +populated with placement groups that were assigned to it. However +there is no degradation of any object and it has no impact on the +durability of the data contained in the Cluster. + +.. _object distribution: + +Object distribution within a pool +--------------------------------- + +Ideally objects are evenly distributed in each placement group. Since +CRUSH computes the placement group for each object, but does not +actually know how much data is stored in each OSD within this +placement group, the ratio between the number of placement groups and +the number of OSDs may influence the distribution of the data +significantly. + +For instance, if there was single a placement group for ten OSDs in a +three replica pool, only three OSD would be used because CRUSH would +have no other choice. When more placement groups are available, +objects are more likely to be evenly spread among them. CRUSH also +makes every effort to evenly spread OSDs among all existing Placement +Groups. + +As long as there are one or two orders of magnitude more Placement +Groups than OSDs, the distribution should be even. For instance, 300 +placement groups for 3 OSDs, 1000 placement groups for 10 OSDs etc. + +Uneven data distribution can be caused by factors other than the ratio +between OSDs and placement groups. Since CRUSH does not take into +account the size of the objects, a few very large objects may create +an imbalance. Let say one million 4K objects totaling 4GB are evenly +spread among 1000 placement groups on 10 OSDs. They will use 4GB / 10 += 400MB on each OSD. If one 400MB object is added to the pool, the +three OSDs supporting the placement group in which the object has been +placed will be filled with 400MB + 400MB = 800MB while the seven +others will remain occupied with only 400MB. + +.. _resource usage: + +Memory, CPU and network usage +----------------------------- + +For each placement group, OSDs and MONs need memory, network and CPU +at all times and even more during recovery. Sharing this overhead by +clustering objects within a placement group is one of the main reasons +they exist. + +Minimizing the number of placement groups saves significant amounts of +resources. + +Choosing the number of Placement Groups +======================================= + +If you have more than 50 OSDs, we recommend approximately 50-100 +placement groups per OSD to balance out resource usage, data +durability and distribution. If you have less than 50 OSDs, chosing +among the `preselection`_ above is best. For a single pool of objects, +you can use the following formula to get a baseline:: + + (OSDs * 100) + Total PGs = ------------ + pool size + +Where **pool size** is either the number of replicas for replicated +pools or the K+M sum for erasure coded pools (as returned by **ceph +osd erasure-code-profile get**). + +You should then check if the result makes sense with the way you +designed your Ceph cluster to maximize `data durability`_, +`object distribution`_ and minimize `resource usage`_. 
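If you want to sanity check an existing pool against this formula, you can
pull the inputs from the cluster itself (``{pool-name}`` is a placeholder;
for erasure coded pools, read K and M from the relevant profile rather than
``size``)::

    ceph osd stat
    ceph osd pool get {pool-name} size
    ceph osd erasure-code-profile get default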
+ +The result should be **rounded up to the nearest power of two.** +Rounding up is optional, but recommended for CRUSH to evenly balance +the number of objects among placement groups. + +As an example, for a cluster with 200 OSDs and a pool size of 3 +replicas, you would estimate your number of PGs as follows:: + + (200 * 100) + ----------- = 6667. Nearest power of 2: 8192 + 3 + +When using multiple data pools for storing objects, you need to ensure +that you balance the number of placement groups per pool with the +number of placement groups per OSD so that you arrive at a reasonable +total number of placement groups that provides reasonably low variance +per OSD without taxing system resources or making the peering process +too slow. + +For instance a cluster of 10 pools each with 512 placement groups on +ten OSDs is a total of 5,120 placement groups spread over ten OSDs, +that is 512 placement groups per OSD. That does not use too many +resources. However, if 1,000 pools were created with 512 placement +groups each, the OSDs will handle ~50,000 placement groups each and it +would require significantly more resources and time for peering. + +You may find the `PGCalc`_ tool helpful. + + +.. _setting the number of placement groups: + +Set the Number of Placement Groups +================================== + +To set the number of placement groups in a pool, you must specify the +number of placement groups at the time you create the pool. +See `Create a Pool`_ for details. Once you have set placement groups for a +pool, you may increase the number of placement groups (but you cannot +decrease the number of placement groups). To increase the number of +placement groups, execute the following:: + + ceph osd pool set {pool-name} pg_num {pg_num} + +Once you increase the number of placement groups, you must also +increase the number of placement groups for placement (``pgp_num``) +before your cluster will rebalance. The ``pgp_num`` will be the number of +placement groups that will be considered for placement by the CRUSH +algorithm. Increasing ``pg_num`` splits the placement groups but data +will not be migrated to the newer placement groups until placement +groups for placement, ie. ``pgp_num`` is increased. The ``pgp_num`` +should be equal to the ``pg_num``. To increase the number of +placement groups for placement, execute the following:: + + ceph osd pool set {pool-name} pgp_num {pgp_num} + + +Get the Number of Placement Groups +================================== + +To get the number of placement groups in a pool, execute the following:: + + ceph osd pool get {pool-name} pg_num + + +Get a Cluster's PG Statistics +============================= + +To get the statistics for the placement groups in your cluster, execute the following:: + + ceph pg dump [--format {format}] + +Valid formats are ``plain`` (default) and ``json``. + + +Get Statistics for Stuck PGs +============================ + +To get the statistics for all placement groups stuck in a specified state, +execute the following:: + + ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format <format>] [-t|--threshold <seconds>] + +**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD +with the most up-to-date data to come up and in. + +**Unclean** Placement groups contain objects that are not replicated the desired number +of times. They should be recovering. 
+ +**Stale** Placement groups are in an unknown state - the OSDs that host them have not +reported to the monitor cluster in a while (configured by ``mon_osd_report_timeout``). + +Valid formats are ``plain`` (default) and ``json``. The threshold defines the minimum number +of seconds the placement group is stuck before including it in the returned statistics +(default 300 seconds). + + +Get a PG Map +============ + +To get the placement group map for a particular placement group, execute the following:: + + ceph pg map {pg-id} + +For example:: + + ceph pg map 1.6c + +Ceph will return the placement group map, the placement group, and the OSD status:: + + osdmap e13 pg 1.6c (1.6c) -> up [1,0] acting [1,0] + + +Get a PGs Statistics +==================== + +To retrieve statistics for a particular placement group, execute the following:: + + ceph pg {pg-id} query + + +Scrub a Placement Group +======================= + +To scrub a placement group, execute the following:: + + ceph pg scrub {pg-id} + +Ceph checks the primary and any replica nodes, generates a catalog of all objects +in the placement group and compares them to ensure that no objects are missing +or mismatched, and their contents are consistent. Assuming the replicas all +match, a final semantic sweep ensures that all of the snapshot-related object +metadata is consistent. Errors are reported via logs. + +Prioritize backfill/recovery of a Placement Group(s) +==================================================== + +You may run into a situation where a bunch of placement groups will require +recovery and/or backfill, and some particular groups hold data more important +than others (for example, those PGs may hold data for images used by running +machines and other PGs may be used by inactive machines/less relevant data). +In that case, you may want to prioritize recovery of those groups so +performance and/or availability of data stored on those groups is restored +earlier. To do this (mark particular placement group(s) as prioritized during +backfill or recovery), execute the following:: + + ceph pg force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...] + ceph pg force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...] + +This will cause Ceph to perform recovery or backfill on specified placement +groups first, before other placement groups. This does not interrupt currently +ongoing backfills or recovery, but causes specified PGs to be processed +as soon as possible. If you change your mind or prioritize wrong groups, +use:: + + ceph pg cancel-force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...] + ceph pg cancel-force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...] + +This will remove "force" flag from those PGs and they will be processed +in default order. Again, this doesn't affect currently processed placement +group, only those that are still queued. + +The "force" flag is cleared automatically after recovery or backfill of group +is done. + +Revert Lost +=========== + +If the cluster has lost one or more objects, and you have decided to +abandon the search for the lost data, you must mark the unfound objects +as ``lost``. + +If all possible locations have been queried and objects are still +lost, you may have to give up on the lost objects. This is +possible given unusual combinations of failures that allow the cluster +to learn about writes that were performed before the writes themselves +are recovered. 
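Before giving up on an object, you will usually want to confirm which
placement groups still report unfound objects and what those objects are.
For example (the exact output format varies between releases)::

    ceph health detail
    ceph pg {pg-id} list_unfound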
+ +Currently the only supported option is "revert", which will either roll back to +a previous version of the object or (if it was a new object) forget about it +entirely. To mark the "unfound" objects as "lost", execute the following:: + + ceph pg {pg-id} mark_unfound_lost revert|delete + +.. important:: Use this feature with caution, because it may confuse + applications that expect the object(s) to exist. + + +.. toctree:: + :hidden: + + pg-states + pg-concepts + + +.. _Create a Pool: ../pools#createpool +.. _Mapping PGs to OSDs: ../../../architecture#mapping-pgs-to-osds +.. _pgcalc: http://ceph.com/pgcalc/ diff --git a/src/ceph/doc/rados/operations/pools.rst b/src/ceph/doc/rados/operations/pools.rst new file mode 100644 index 0000000..7015593 --- /dev/null +++ b/src/ceph/doc/rados/operations/pools.rst @@ -0,0 +1,798 @@ +======= + Pools +======= + +When you first deploy a cluster without creating a pool, Ceph uses the default +pools for storing data. A pool provides you with: + +- **Resilience**: You can set how many OSD are allowed to fail without losing data. + For replicated pools, it is the desired number of copies/replicas of an object. + A typical configuration stores an object and one additional copy + (i.e., ``size = 2``), but you can determine the number of copies/replicas. + For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks + (i.e. ``m=2`` in the **erasure code profile**) + +- **Placement Groups**: You can set the number of placement groups for the pool. + A typical configuration uses approximately 100 placement groups per OSD to + provide optimal balancing without using up too many computing resources. When + setting up multiple pools, be careful to ensure you set a reasonable number of + placement groups for both the pool and the cluster as a whole. + +- **CRUSH Rules**: When you store data in a pool, a CRUSH ruleset mapped to the + pool enables CRUSH to identify a rule for the placement of the object + and its replicas (or chunks for erasure coded pools) in your cluster. + You can create a custom CRUSH rule for your pool. + +- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``, + you effectively take a snapshot of a particular pool. + +To organize data into pools, you can list, create, and remove pools. +You can also view the utilization statistics for each pool. + +List Pools +========== + +To list your cluster's pools, execute:: + + ceph osd lspools + +On a freshly installed cluster, only the ``rbd`` pool exists. + + +.. _createpool: + +Create a Pool +============= + +Before creating pools, refer to the `Pool, PG and CRUSH Config Reference`_. +Ideally, you should override the default value for the number of placement +groups in your Ceph configuration file, as the default is NOT ideal. +For details on placement group numbers refer to `setting the number of placement groups`_ + +.. note:: Starting with Luminous, all pools need to be associated to the + application using the pool. See `Associate Pool to Application`_ below for + more information. + +For example:: + + osd pool default pg num = 100 + osd pool default pgp num = 100 + +To create a pool, execute:: + + ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \ + [crush-rule-name] [expected-num-objects] + ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \ + [erasure-code-profile] [crush-rule-name] [expected_num_objects] + +Where: + +``{pool-name}`` + +:Description: The name of the pool. It must be unique. +:Type: String +:Required: Yes. 
+ +``{pg-num}`` + +:Description: The total number of placement groups for the pool. See `Placement + Groups`_ for details on calculating a suitable number. The + default value ``8`` is NOT suitable for most systems. + +:Type: Integer +:Required: Yes. +:Default: 8 + +``{pgp-num}`` + +:Description: The total number of placement groups for placement purposes. This + **should be equal to the total number of placement groups**, except + for placement group splitting scenarios. + +:Type: Integer +:Required: Yes. Picks up default or Ceph configuration value if not specified. +:Default: 8 + +``{replicated|erasure}`` + +:Description: The pool type which may either be **replicated** to + recover from lost OSDs by keeping multiple copies of the + objects or **erasure** to get a kind of + `generalized RAID5 <../erasure-code>`_ capability. + The **replicated** pools require more + raw storage but implement all Ceph operations. The + **erasure** pools require less raw storage but only + implement a subset of the available operations. + +:Type: String +:Required: No. +:Default: replicated + +``[crush-rule-name]`` + +:Description: The name of a CRUSH rule to use for this pool. The specified + rule must exist. + +:Type: String +:Required: No. +:Default: For **replicated** pools it is the ruleset specified by the ``osd + pool default crush replicated ruleset`` config variable. This + ruleset must exist. + For **erasure** pools it is ``erasure-code`` if the ``default`` + `erasure code profile`_ is used or ``{pool-name}`` otherwise. This + ruleset will be created implicitly if it doesn't exist already. + + +``[erasure-code-profile=profile]`` + +.. _erasure code profile: ../erasure-code-profile + +:Description: For **erasure** pools only. Use the `erasure code profile`_. It + must be an existing profile as defined by + **osd erasure-code-profile set**. + +:Type: String +:Required: No. + +When you create a pool, set the number of placement groups to a reasonable value +(e.g., ``100``). Consider the total number of placement groups per OSD too. +Placement groups are computationally expensive, so performance will degrade when +you have many pools with many placement groups (e.g., 50 pools with 100 +placement groups each). The point of diminishing returns depends upon the power +of the OSD host. + +See `Placement Groups`_ for details on calculating an appropriate number of +placement groups for your pool. + +.. _Placement Groups: ../placement-groups + +``[expected-num-objects]`` + +:Description: The expected number of objects for this pool. By setting this value ( + together with a negative **filestore merge threshold**), the PG folder + splitting would happen at the pool creation time, to avoid the latency + impact to do a runtime folder splitting. + +:Type: Integer +:Required: No. +:Default: 0, no splitting at the pool creation time. + +Associate Pool to Application +============================= + +Pools need to be associated with an application before use. Pools that will be +used with CephFS or pools that are automatically created by RGW are +automatically associated. Pools that are intended for use with RBD should be +initialized using the ``rbd`` tool (see `Block Device Commands`_ for more +information). + +For other cases, you can manually associate a free-form application name to +a pool.:: + + ceph osd pool application enable {pool-name} {application-name} + +.. note:: CephFS uses the application name ``cephfs``, RBD uses the + application name ``rbd``, and RGW uses the application name ``rgw``. 
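For example, to tag a hypothetical pool named ``my-pool`` for use by RGW and
then verify the association, you could execute::

    ceph osd pool application enable my-pool rgw
    ceph osd pool application get my-pool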
+ +Set Pool Quotas +=============== + +You can set pool quotas for the maximum number of bytes and/or the maximum +number of objects per pool. :: + + ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}] + +For example:: + + ceph osd pool set-quota data max_objects 10000 + +To remove a quota, set its value to ``0``. + + +Delete a Pool +============= + +To delete a pool, execute:: + + ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it] + + +To remove a pool the mon_allow_pool_delete flag must be set to true in the Monitor's +configuration. Otherwise they will refuse to remove a pool. + +See `Monitor Configuration`_ for more information. + +.. _Monitor Configuration: ../../configuration/mon-config-ref + +If you created your own rulesets and rules for a pool you created, you should +consider removing them when you no longer need your pool:: + + ceph osd pool get {pool-name} crush_ruleset + +If the ruleset was "123", for example, you can check the other pools like so:: + + ceph osd dump | grep "^pool" | grep "crush_ruleset 123" + +If no other pools use that custom ruleset, then it's safe to delete that +ruleset from the cluster. + +If you created users with permissions strictly for a pool that no longer +exists, you should consider deleting those users too:: + + ceph auth ls | grep -C 5 {pool-name} + ceph auth del {user} + + +Rename a Pool +============= + +To rename a pool, execute:: + + ceph osd pool rename {current-pool-name} {new-pool-name} + +If you rename a pool and you have per-pool capabilities for an authenticated +user, you must update the user's capabilities (i.e., caps) with the new pool +name. + +.. note:: Version ``0.48`` Argonaut and above. + +Show Pool Statistics +==================== + +To show a pool's utilization statistics, execute:: + + rados df + + +Make a Snapshot of a Pool +========================= + +To make a snapshot of a pool, execute:: + + ceph osd pool mksnap {pool-name} {snap-name} + +.. note:: Version ``0.48`` Argonaut and above. + + +Remove a Snapshot of a Pool +=========================== + +To remove a snapshot of a pool, execute:: + + ceph osd pool rmsnap {pool-name} {snap-name} + +.. note:: Version ``0.48`` Argonaut and above. + +.. _setpoolvalues: + + +Set Pool Values +=============== + +To set a value to a pool, execute the following:: + + ceph osd pool set {pool-name} {key} {value} + +You may set values for the following keys: + +.. _compression_algorithm: + +``compression_algorithm`` +:Description: Sets inline compression algorithm to use for underlying BlueStore. + This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression algorithm``. + +:Type: String +:Valid Settings: ``lz4``, ``snappy``, ``zlib``, ``zstd`` + +``compression_mode`` + +:Description: Sets the policy for the inline compression algorithm for underlying BlueStore. + This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression mode``. + +:Type: String +:Valid Settings: ``none``, ``passive``, ``aggressive``, ``force`` + +``compression_min_blob_size`` + +:Description: Chunks smaller than this are never compressed. + This setting overrides the `global setting <rados/configuration/bluestore-config-ref/#inline-compression>`_ of ``bluestore compression min blob *``. 
+ +:Type: Unsigned Integer + +``compression_max_blob_size`` + +:Description: Chunks larger than this are broken into smaller blobs sizing + ``compression_max_blob_size`` before being compressed. + +:Type: Unsigned Integer + +.. _size: + +``size`` + +:Description: Sets the number of replicas for objects in the pool. + See `Set the Number of Object Replicas`_ for further details. + Replicated pools only. + +:Type: Integer + +.. _min_size: + +``min_size`` + +:Description: Sets the minimum number of replicas required for I/O. + See `Set the Number of Object Replicas`_ for further details. + Replicated pools only. + +:Type: Integer +:Version: ``0.54`` and above + +.. _pg_num: + +``pg_num`` + +:Description: The effective number of placement groups to use when calculating + data placement. +:Type: Integer +:Valid Range: Superior to ``pg_num`` current value. + +.. _pgp_num: + +``pgp_num`` + +:Description: The effective number of placement groups for placement to use + when calculating data placement. + +:Type: Integer +:Valid Range: Equal to or less than ``pg_num``. + +.. _crush_ruleset: + +``crush_ruleset`` + +:Description: The ruleset to use for mapping object placement in the cluster. +:Type: Integer + +.. _allow_ec_overwrites: + +``allow_ec_overwrites`` + +:Description: Whether writes to an erasure coded pool can update part + of an object, so cephfs and rbd can use it. See + `Erasure Coding with Overwrites`_ for more details. +:Type: Boolean +:Version: ``12.2.0`` and above + +.. _hashpspool: + +``hashpspool`` + +:Description: Set/Unset HASHPSPOOL flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``0.48`` Argonaut and above. + +.. _nodelete: + +``nodelete`` + +:Description: Set/Unset NODELETE flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``FIXME`` + +.. _nopgchange: + +``nopgchange`` + +:Description: Set/Unset NOPGCHANGE flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``FIXME`` + +.. _nosizechange: + +``nosizechange`` + +:Description: Set/Unset NOSIZECHANGE flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag +:Version: Version ``FIXME`` + +.. _write_fadvise_dontneed: + +``write_fadvise_dontneed`` + +:Description: Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _noscrub: + +``noscrub`` + +:Description: Set/Unset NOSCRUB flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _nodeep-scrub: + +``nodeep-scrub`` + +:Description: Set/Unset NODEEP_SCRUB flag on a given pool. +:Type: Integer +:Valid Range: 1 sets flag, 0 unsets flag + +.. _hit_set_type: + +``hit_set_type`` + +:Description: Enables hit set tracking for cache pools. + See `Bloom Filter`_ for additional information. + +:Type: String +:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object`` +:Default: ``bloom``. Other values are for testing. + +.. _hit_set_count: + +``hit_set_count`` + +:Description: The number of hit sets to store for cache pools. The higher + the number, the more RAM consumed by the ``ceph-osd`` daemon. + +:Type: Integer +:Valid Range: ``1``. Agent doesn't handle > 1 yet. + +.. _hit_set_period: + +``hit_set_period`` + +:Description: The duration of a hit set period in seconds for cache pools. + The higher the number, the more RAM consumed by the + ``ceph-osd`` daemon. + +:Type: Integer +:Example: ``3600`` 1hr + +.. 
_hit_set_fpp: + +``hit_set_fpp`` + +:Description: The false positive probability for the ``bloom`` hit set type. + See `Bloom Filter`_ for additional information. + +:Type: Double +:Valid Range: 0.0 - 1.0 +:Default: ``0.05`` + +.. _cache_target_dirty_ratio: + +``cache_target_dirty_ratio`` + +:Description: The percentage of the cache pool containing modified (dirty) + objects before the cache tiering agent will flush them to the + backing storage pool. + +:Type: Double +:Default: ``.4`` + +.. _cache_target_dirty_high_ratio: + +``cache_target_dirty_high_ratio`` + +:Description: The percentage of the cache pool containing modified (dirty) + objects before the cache tiering agent will flush them to the + backing storage pool with a higher speed. + +:Type: Double +:Default: ``.6`` + +.. _cache_target_full_ratio: + +``cache_target_full_ratio`` + +:Description: The percentage of the cache pool containing unmodified (clean) + objects before the cache tiering agent will evict them from the + cache pool. + +:Type: Double +:Default: ``.8`` + +.. _target_max_bytes: + +``target_max_bytes`` + +:Description: Ceph will begin flushing or evicting objects when the + ``max_bytes`` threshold is triggered. + +:Type: Integer +:Example: ``1000000000000`` #1-TB + +.. _target_max_objects: + +``target_max_objects`` + +:Description: Ceph will begin flushing or evicting objects when the + ``max_objects`` threshold is triggered. + +:Type: Integer +:Example: ``1000000`` #1M objects + + +``hit_set_grade_decay_rate`` + +:Description: Temperature decay rate between two successive hit_sets +:Type: Integer +:Valid Range: 0 - 100 +:Default: ``20`` + + +``hit_set_search_last_n`` + +:Description: Count at most N appearance in hit_sets for temperature calculation +:Type: Integer +:Valid Range: 0 - hit_set_count +:Default: ``1`` + + +.. _cache_min_flush_age: + +``cache_min_flush_age`` + +:Description: The time (in seconds) before the cache tiering agent will flush + an object from the cache pool to the storage pool. + +:Type: Integer +:Example: ``600`` 10min + +.. _cache_min_evict_age: + +``cache_min_evict_age`` + +:Description: The time (in seconds) before the cache tiering agent will evict + an object from the cache pool. + +:Type: Integer +:Example: ``1800`` 30min + +.. _fast_read: + +``fast_read`` + +:Description: On Erasure Coding pool, if this flag is turned on, the read request + would issue sub reads to all shards, and waits until it receives enough + shards to decode to serve the client. In the case of jerasure and isa + erasure plugins, once the first K replies return, client's request is + served immediately using the data decoded from these replies. This + helps to tradeoff some resources for better performance. Currently this + flag is only supported for Erasure Coding pool. + +:Type: Boolean +:Defaults: ``0`` + +.. _scrub_min_interval: + +``scrub_min_interval`` + +:Description: The minimum interval in seconds for pool scrubbing when + load is low. If it is 0, the value osd_scrub_min_interval + from config is used. + +:Type: Double +:Default: ``0`` + +.. _scrub_max_interval: + +``scrub_max_interval`` + +:Description: The maximum interval in seconds for pool scrubbing + irrespective of cluster load. If it is 0, the value + osd_scrub_max_interval from config is used. + +:Type: Double +:Default: ``0`` + +.. _deep_scrub_interval: + +``deep_scrub_interval`` + +:Description: The interval in seconds for pool “deep” scrubbing. If it + is 0, the value osd_deep_scrub_interval from config is used. 
+ +:Type: Double +:Default: ``0`` + + +Get Pool Values +=============== + +To get a value from a pool, execute the following:: + + ceph osd pool get {pool-name} {key} + +You may get values for the following keys: + +``size`` + +:Description: see size_ + +:Type: Integer + +``min_size`` + +:Description: see min_size_ + +:Type: Integer +:Version: ``0.54`` and above + +``pg_num`` + +:Description: see pg_num_ + +:Type: Integer + + +``pgp_num`` + +:Description: see pgp_num_ + +:Type: Integer +:Valid Range: Equal to or less than ``pg_num``. + + +``crush_ruleset`` + +:Description: see crush_ruleset_ + + +``hit_set_type`` + +:Description: see hit_set_type_ + +:Type: String +:Valid Settings: ``bloom``, ``explicit_hash``, ``explicit_object`` + +``hit_set_count`` + +:Description: see hit_set_count_ + +:Type: Integer + + +``hit_set_period`` + +:Description: see hit_set_period_ + +:Type: Integer + + +``hit_set_fpp`` + +:Description: see hit_set_fpp_ + +:Type: Double + + +``cache_target_dirty_ratio`` + +:Description: see cache_target_dirty_ratio_ + +:Type: Double + + +``cache_target_dirty_high_ratio`` + +:Description: see cache_target_dirty_high_ratio_ + +:Type: Double + + +``cache_target_full_ratio`` + +:Description: see cache_target_full_ratio_ + +:Type: Double + + +``target_max_bytes`` + +:Description: see target_max_bytes_ + +:Type: Integer + + +``target_max_objects`` + +:Description: see target_max_objects_ + +:Type: Integer + + +``cache_min_flush_age`` + +:Description: see cache_min_flush_age_ + +:Type: Integer + + +``cache_min_evict_age`` + +:Description: see cache_min_evict_age_ + +:Type: Integer + + +``fast_read`` + +:Description: see fast_read_ + +:Type: Boolean + + +``scrub_min_interval`` + +:Description: see scrub_min_interval_ + +:Type: Double + + +``scrub_max_interval`` + +:Description: see scrub_max_interval_ + +:Type: Double + + +``deep_scrub_interval`` + +:Description: see deep_scrub_interval_ + +:Type: Double + + +Set the Number of Object Replicas +================================= + +To set the number of object replicas on a replicated pool, execute the following:: + + ceph osd pool set {poolname} size {num-replicas} + +.. important:: The ``{num-replicas}`` includes the object itself. + If you want the object and two copies of the object for a total of + three instances of the object, specify ``3``. + +For example:: + + ceph osd pool set data size 3 + +You may execute this command for each pool. **Note:** An object might accept +I/Os in degraded mode with fewer than ``pool size`` replicas. To set a minimum +number of required replicas for I/O, you should use the ``min_size`` setting. +For example:: + + ceph osd pool set data min_size 2 + +This ensures that no object in the data pool will receive I/O with fewer than +``min_size`` replicas. + + +Get the Number of Object Replicas +================================= + +To get the number of object replicas, execute the following:: + + ceph osd dump | grep 'replicated size' + +Ceph will list the pools, with the ``replicated size`` attribute highlighted. +By default, ceph creates two replicas of an object (a total of three copies, or +a size of 3). + + + +.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref +.. _Bloom Filter: http://en.wikipedia.org/wiki/Bloom_filter +.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups +.. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites +.. 
_Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool + diff --git a/src/ceph/doc/rados/operations/upmap.rst b/src/ceph/doc/rados/operations/upmap.rst new file mode 100644 index 0000000..58f6322 --- /dev/null +++ b/src/ceph/doc/rados/operations/upmap.rst @@ -0,0 +1,75 @@ +Using the pg-upmap +================== + +Starting in Luminous v12.2.z there is a new *pg-upmap* exception table +in the OSDMap that allows the cluster to explicitly map specific PGs to +specific OSDs. This allows the cluster to fine-tune the data +distribution to, in most cases, perfectly distributed PGs across OSDs. + +The key caveat to this new mechanism is that it requires that all +clients understand the new *pg-upmap* structure in the OSDMap. + +Enabling +-------- + +To allow use of the feature, you must tell the cluster that it only +needs to support luminous (and newer) clients with:: + + ceph osd set-require-min-compat-client luminous + +This command will fail if any pre-luminous clients or daemons are +connected to the monitors. You can see what client versions are in +use with:: + + ceph features + +A word of caution +----------------- + +This is a new feature and not very user friendly. At the time of this +writing we are working on a new `balancer` module for ceph-mgr that +will eventually do all of this automatically. + +Until then, + +Offline optimization +-------------------- + +Upmap entries are updated with an offline optimizer built into ``osdmaptool``. + +#. Grab the latest copy of your osdmap:: + + ceph osd getmap -o om + +#. Run the optimizer:: + + osdmaptool om --upmap out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>] + + It is highly recommended that optimization be done for each pool + individually, or for sets of similarly-utilized pools. You can + specify the ``--upmap-pool`` option multiple times. "Similar pools" + means pools that are mapped to the same devices and store the same + kind of data (e.g., RBD image pools, yes; RGW index pool and RGW + data pool, no). + + The ``max-count`` value is the maximum number of upmap entries to + identify in the run. The default is 100, but you may want to make + this a smaller number so that the tool completes more quickly (but + does less work). If it cannot find any additional changes to make + it will stop early (i.e., when the pool distribution is perfect). + + The ``max-deviation`` value defaults to `.01` (i.e., 1%). If an OSD + utilization varies from the average by less than this amount it + will be considered perfect. + +#. The proposed changes are written to the output file ``out.txt`` in + the example above. These are normal ceph CLI commands that can be + run to apply the changes to the cluster. This can be done with:: + + source out.txt + +The above steps can be repeated as many times as necessary to achieve +a perfect distribution of PGs for each set of pools. + +You can see some (gory) details about what the tool is doing by +passing ``--debug-osd 10`` to ``osdmaptool``. diff --git a/src/ceph/doc/rados/operations/user-management.rst b/src/ceph/doc/rados/operations/user-management.rst new file mode 100644 index 0000000..8a35a50 --- /dev/null +++ b/src/ceph/doc/rados/operations/user-management.rst @@ -0,0 +1,665 @@ +================= + User Management +================= + +This document describes :term:`Ceph Client` users, and their authentication and +authorization with the :term:`Ceph Storage Cluster`. 
Users are either +individuals or system actors such as applications, which use Ceph clients to +interact with the Ceph Storage Cluster daemons. + +.. ditaa:: +-----+ + | {o} | + | | + +--+--+ /---------\ /---------\ + | | Ceph | | Ceph | + ---+---*----->| |<------------->| | + | uses | Clients | | Servers | + | \---------/ \---------/ + /--+--\ + | | + | | + actor + + +When Ceph runs with authentication and authorization enabled (enabled by +default), you must specify a user name and a keyring containing the secret key +of the specified user (usually via the command line). If you do not specify a +user name, Ceph will use ``client.admin`` as the default user name. If you do +not specify a keyring, Ceph will look for a keyring via the ``keyring`` setting +in the Ceph configuration. For example, if you execute the ``ceph health`` +command without specifying a user or keyring:: + + ceph health + +Ceph interprets the command like this:: + + ceph -n client.admin --keyring=/etc/ceph/ceph.client.admin.keyring health + +Alternatively, you may use the ``CEPH_ARGS`` environment variable to avoid +re-entry of the user name and secret. + +For details on configuring the Ceph Storage Cluster to use authentication, +see `Cephx Config Reference`_. For details on the architecture of Cephx, see +`Architecture - High Availability Authentication`_. + + +Background +========== + +Irrespective of the type of Ceph client (e.g., Block Device, Object Storage, +Filesystem, native API, etc.), Ceph stores all data as objects within `pools`_. +Ceph users must have access to pools in order to read and write data. +Additionally, Ceph users must have execute permissions to use Ceph's +administrative commands. The following concepts will help you understand Ceph +user management. + + +User +---- + +A user is either an individual or a system actor such as an application. +Creating users allows you to control who (or what) can access your Ceph Storage +Cluster, its pools, and the data within pools. + +Ceph has the notion of a ``type`` of user. For the purposes of user management, +the type will always be ``client``. Ceph identifies users in period (.) +delimited form consisting of the user type and the user ID: for example, +``TYPE.ID``, ``client.admin``, or ``client.user1``. The reason for user typing +is that Ceph Monitors, OSDs, and Metadata Servers also use the Cephx protocol, +but they are not clients. Distinguishing the user type helps to distinguish +between client users and other users--streamlining access control, user +monitoring and traceability. + +Sometimes Ceph's user type may seem confusing, because the Ceph command line +allows you to specify a user with or without the type, depending upon your +command line usage. If you specify ``--user`` or ``--id``, you can omit the +type. So ``client.user1`` can be entered simply as ``user1``. If you specify +``--name`` or ``-n``, you must specify the type and name, such as +``client.user1``. We recommend using the type and name as a best practice +wherever possible. + +.. note:: A Ceph Storage Cluster user is not the same as a Ceph Object Storage + user or a Ceph Filesystem user. The Ceph Object Gateway uses a Ceph Storage + Cluster user to communicate between the gateway daemon and the storage + cluster, but the gateway has its own user management functionality for end + users. The Ceph Filesystem uses POSIX semantics. The user space associated + with the Ceph Filesystem is not the same as a Ceph Storage Cluster user. 
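
As a minimal sketch of the ``TYPE.ID`` convention in practice, the following
Python snippet connects to the cluster as a specific client user via the
``rados`` module. The user ``client.user1`` and the keyring path are
hypothetical placeholders; substitute a user and keyring that actually exist
in your cluster.

.. code-block:: python

    import rados

    # Authenticate as a specific user (TYPE.ID); the keyring must contain that user's key.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf',
                          name='client.user1',
                          conf=dict(keyring='/etc/ceph/ceph.client.user1.keyring'))
    cluster.connect()
    try:
        # If authentication and authorization succeed, basic cluster queries work.
        print("Connected to cluster: %s" % cluster.get_fsid())
    finally:
        cluster.shutdown()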
+ + + +Authorization (Capabilities) +---------------------------- + +Ceph uses the term "capabilities" (caps) to describe authorizing an +authenticated user to exercise the functionality of the monitors, OSDs and +metadata servers. Capabilities can also restrict access to data within a pool or +a namespace within a pool. A Ceph administrative user sets a user's +capabilities when creating or updating a user. + +Capability syntax follows the form:: + + {daemon-type} '{capspec}[, {capspec} ...]' + +- **Monitor Caps:** Monitor capabilities include ``r``, ``w``, ``x`` access + settings or ``profile {name}``. For example:: + + mon 'allow rwx' + mon 'profile osd' + +- **OSD Caps:** OSD capabilities include ``r``, ``w``, ``x``, ``class-read``, + ``class-write`` access settings or ``profile {name}``. Additionally, OSD + capabilities also allow for pool and namespace settings. :: + + osd 'allow {access} [pool={pool-name} [namespace={namespace-name}]]' + osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]]' + +- **Metadata Server Caps:** For administrators, use ``allow *``. For all + other users, such as CephFS clients, consult :doc:`/cephfs/client-auth` + + +.. note:: The Ceph Object Gateway daemon (``radosgw``) is a client of the + Ceph Storage Cluster, so it is not represented as a Ceph Storage + Cluster daemon type. + +The following entries describe each capability. + +``allow`` + +:Description: Precedes access settings for a daemon. Implies ``rw`` + for MDS only. + + +``r`` + +:Description: Gives the user read access. Required with monitors to retrieve + the CRUSH map. + + +``w`` + +:Description: Gives the user write access to objects. + + +``x`` + +:Description: Gives the user the capability to call class methods + (i.e., both read and write) and to conduct ``auth`` + operations on monitors. + + +``class-read`` + +:Descriptions: Gives the user the capability to call class read methods. + Subset of ``x``. + + +``class-write`` + +:Description: Gives the user the capability to call class write methods. + Subset of ``x``. + + +``*`` + +:Description: Gives the user read, write and execute permissions for a + particular daemon/pool, and the ability to execute + admin commands. + + +``profile osd`` (Monitor only) + +:Description: Gives a user permissions to connect as an OSD to other OSDs or + monitors. Conferred on OSDs to enable OSDs to handle replication + heartbeat traffic and status reporting. + + +``profile mds`` (Monitor only) + +:Description: Gives a user permissions to connect as a MDS to other MDSs or + monitors. + + +``profile bootstrap-osd`` (Monitor only) + +:Description: Gives a user permissions to bootstrap an OSD. Conferred on + deployment tools such as ``ceph-disk``, ``ceph-deploy``, etc. + so that they have permissions to add keys, etc. when + bootstrapping an OSD. + + +``profile bootstrap-mds`` (Monitor only) + +:Description: Gives a user permissions to bootstrap a metadata server. + Conferred on deployment tools such as ``ceph-deploy``, etc. + so they have permissions to add keys, etc. when bootstrapping + a metadata server. + +``profile rbd`` (Monitor and OSD) + +:Description: Gives a user permissions to manipulate RBD images. When used + as a Monitor cap, it provides the minimal privileges required + by an RBD client application. When used as an OSD cap, it + provides read-write access to an RBD client application. + +``profile rbd-read-only`` (OSD only) + +:Description: Gives a user read-only permissions to an RBD image. 
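
To make the capability syntax concrete, here is a sketch of how these pieces
are typically combined when creating users (see `Add a User`_ below). The
user, pool and namespace names are hypothetical examples only::

    # Read-only on the monitors; read-write restricted to one pool and namespace
    ceph auth get-or-create client.app1 mon 'allow r' osd 'allow rw pool=app-data namespace=app1'

    # Minimal privileges for an RBD client application, using the rbd profiles
    ceph auth get-or-create client.rbd-user mon 'profile rbd' osd 'profile rbd pool=vms'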
+ + +Pool +---- + +A pool is a logical partition where users store data. +In Ceph deployments, it is common to create a pool as a logical partition for +similar types of data. For example, when deploying Ceph as a backend for +OpenStack, a typical deployment would have pools for volumes, images, backups +and virtual machines, and users such as ``client.glance``, ``client.cinder``, +etc. + + +Namespace +--------- + +Objects within a pool can be associated to a namespace--a logical group of +objects within the pool. A user's access to a pool can be associated with a +namespace such that reads and writes by the user take place only within the +namespace. Objects written to a namespace within the pool can only be accessed +by users who have access to the namespace. + +.. note:: Namespaces are primarily useful for applications written on top of + ``librados`` where the logical grouping can alleviate the need to create + different pools. Ceph Object Gateway (from ``luminous``) uses namespaces for various + metadata objects. + +The rationale for namespaces is that pools can be a computationally expensive +method of segregating data sets for the purposes of authorizing separate sets +of users. For example, a pool should have ~100 placement groups per OSD. So an +exemplary cluster with 1000 OSDs would have 100,000 placement groups for one +pool. Each pool would create another 100,000 placement groups in the exemplary +cluster. By contrast, writing an object to a namespace simply associates the +namespace to the object name with out the computational overhead of a separate +pool. Rather than creating a separate pool for a user or set of users, you may +use a namespace. **Note:** Only available using ``librados`` at this time. + + +Managing Users +============== + +User management functionality provides Ceph Storage Cluster administrators with +the ability to create, update and delete users directly in the Ceph Storage +Cluster. + +When you create or delete users in the Ceph Storage Cluster, you may need to +distribute keys to clients so that they can be added to keyrings. See `Keyring +Management`_ for details. + + +List Users +---------- + +To list the users in your cluster, execute the following:: + + ceph auth ls + +Ceph will list out all users in your cluster. For example, in a two-node +exemplary cluster, ``ceph auth ls`` will output something that looks like +this:: + + installed auth entries: + + osd.0 + key: AQCvCbtToC6MDhAATtuT70Sl+DymPCfDSsyV4w== + caps: [mon] allow profile osd + caps: [osd] allow * + osd.1 + key: AQC4CbtTCFJBChAAVq5spj0ff4eHZICxIOVZeA== + caps: [mon] allow profile osd + caps: [osd] allow * + client.admin + key: AQBHCbtT6APDHhAA5W00cBchwkQjh3dkKsyPjw== + caps: [mds] allow + caps: [mon] allow * + caps: [osd] allow * + client.bootstrap-mds + key: AQBICbtTOK9uGBAAdbe5zcIGHZL3T/u2g6EBww== + caps: [mon] allow profile bootstrap-mds + client.bootstrap-osd + key: AQBHCbtT4GxqORAADE5u7RkpCN/oo4e5W0uBtw== + caps: [mon] allow profile bootstrap-osd + + +Note that the ``TYPE.ID`` notation for users applies such that ``osd.0`` is a +user of type ``osd`` and its ID is ``0``, ``client.admin`` is a user of type +``client`` and its ID is ``admin`` (i.e., the default ``client.admin`` user). +Note also that each entry has a ``key: <value>`` entry, and one or more +``caps:`` entries. + +You may use the ``-o {filename}`` option with ``ceph auth ls`` to +save the output to a file. 
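For example, to keep a copy of the listing under ``/tmp`` (an arbitrary path
chosen here purely for illustration)::

    ceph auth ls -o /tmp/auth.export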
+ + +Get a User +---------- + +To retrieve a specific user, key and capabilities, execute the +following:: + + ceph auth get {TYPE.ID} + +For example:: + + ceph auth get client.admin + +You may also use the ``-o {filename}`` option with ``ceph auth get`` to +save the output to a file. Developers may also execute the following:: + + ceph auth export {TYPE.ID} + +The ``auth export`` command is identical to ``auth get``, but also prints +out the internal ``auid``, which is not relevant to end users. + + + +Add a User +---------- + +Adding a user creates a username (i.e., ``TYPE.ID``), a secret key and +any capabilities included in the command you use to create the user. + +A user's key enables the user to authenticate with the Ceph Storage Cluster. +The user's capabilities authorize the user to read, write, or execute on Ceph +monitors (``mon``), Ceph OSDs (``osd``) or Ceph Metadata Servers (``mds``). + +There are a few ways to add a user: + +- ``ceph auth add``: This command is the canonical way to add a user. It + will create the user, generate a key and add any specified capabilities. + +- ``ceph auth get-or-create``: This command is often the most convenient way + to create a user, because it returns a keyfile format with the user name + (in brackets) and the key. If the user already exists, this command + simply returns the user name and key in the keyfile format. You may use the + ``-o {filename}`` option to save the output to a file. + +- ``ceph auth get-or-create-key``: This command is a convenient way to create + a user and return the user's key (only). This is useful for clients that + need the key only (e.g., libvirt). If the user already exists, this command + simply returns the key. You may use the ``-o {filename}`` option to save the + output to a file. + +When creating client users, you may create a user with no capabilities. A user +with no capabilities is useless beyond mere authentication, because the client +cannot retrieve the cluster map from the monitor. However, you can create a +user with no capabilities if you wish to defer adding capabilities later using +the ``ceph auth caps`` command. + +A typical user has at least read capabilities on the Ceph monitor and +read and write capability on Ceph OSDs. Additionally, a user's OSD permissions +are often restricted to accessing a particular pool. :: + + ceph auth add client.john mon 'allow r' osd 'allow rw pool=liverpool' + ceph auth get-or-create client.paul mon 'allow r' osd 'allow rw pool=liverpool' + ceph auth get-or-create client.george mon 'allow r' osd 'allow rw pool=liverpool' -o george.keyring + ceph auth get-or-create-key client.ringo mon 'allow r' osd 'allow rw pool=liverpool' -o ringo.key + + +.. important:: If you provide a user with capabilities to OSDs, but you DO NOT + restrict access to particular pools, the user will have access to ALL + pools in the cluster! + + +.. _modify-user-capabilities: + +Modify User Capabilities +------------------------ + +The ``ceph auth caps`` command allows you to specify a user and change the +user's capabilities. Setting new capabilities will overwrite current capabilities. +To view current capabilities run ``ceph auth get USERTYPE.USERID``. To add +capabilities, you should also specify the existing capabilities when using the form:: + + ceph auth caps USERTYPE.USERID {daemon} 'allow [r|w|x|*|...] [pool={pool-name}] [namespace={namespace-name}]' [{daemon} 'allow [r|w|x|*|...] 
[pool={pool-name}] [namespace={namespace-name}]'] + +For example:: + + ceph auth get client.john + ceph auth caps client.john mon 'allow r' osd 'allow rw pool=liverpool' + ceph auth caps client.paul mon 'allow rw' osd 'allow rwx pool=liverpool' + ceph auth caps client.brian-manager mon 'allow *' osd 'allow *' + +To remove a capability, you may reset the capability. If you want the user +to have no access to a particular daemon that was previously set, specify +an empty string. For example:: + + ceph auth caps client.ringo mon ' ' osd ' ' + +See `Authorization (Capabilities)`_ for additional details on capabilities. + + +Delete a User +------------- + +To delete a user, use ``ceph auth del``:: + + ceph auth del {TYPE}.{ID} + +Where ``{TYPE}`` is one of ``client``, ``osd``, ``mon``, or ``mds``, +and ``{ID}`` is the user name or ID of the daemon. + + +Print a User's Key +------------------ + +To print a user's authentication key to standard output, execute the following:: + + ceph auth print-key {TYPE}.{ID} + +Where ``{TYPE}`` is one of ``client``, ``osd``, ``mon``, or ``mds``, +and ``{ID}`` is the user name or ID of the daemon. + +Printing a user's key is useful when you need to populate client +software with a user's key (e.g., libvirt). :: + + mount -t ceph serverhost:/ mountpoint -o name=client.user,secret=`ceph auth print-key client.user` + + +Import a User(s) +---------------- + +To import one or more users, use ``ceph auth import`` and +specify a keyring:: + + ceph auth import -i /path/to/keyring + +For example:: + + sudo ceph auth import -i /etc/ceph/ceph.keyring + + +.. note:: The ceph storage cluster will add new users, their keys and their + capabilities and will update existing users, their keys and their + capabilities. + + +Keyring Management +================== + +When you access Ceph via a Ceph client, the Ceph client will look for a local +keyring. Ceph presets the ``keyring`` setting with the following four keyring +names by default so you don't have to set them in your Ceph configuration file +unless you want to override the defaults (not recommended): + +- ``/etc/ceph/$cluster.$name.keyring`` +- ``/etc/ceph/$cluster.keyring`` +- ``/etc/ceph/keyring`` +- ``/etc/ceph/keyring.bin`` + +The ``$cluster`` metavariable is your Ceph cluster name as defined by the +name of the Ceph configuration file (i.e., ``ceph.conf`` means the cluster name +is ``ceph``; thus, ``ceph.keyring``). The ``$name`` metavariable is the user +type and user ID (e.g., ``client.admin``; thus, ``ceph.client.admin.keyring``). + +.. note:: When executing commands that read or write to ``/etc/ceph``, you may + need to use ``sudo`` to execute the command as ``root``. + +After you create a user (e.g., ``client.ringo``), you must get the key and add +it to a keyring on a Ceph client so that the user can access the Ceph Storage +Cluster. + +The `User Management`_ section details how to list, get, add, modify and delete +users directly in the Ceph Storage Cluster. However, Ceph also provides the +``ceph-authtool`` utility to allow you to manage keyrings from a Ceph client. + + +Create a Keyring +---------------- + +When you use the procedures in the `Managing Users`_ section to create users, +you need to provide user keys to the Ceph client(s) so that the Ceph client +can retrieve the key for the specified user and authenticate with the Ceph +Storage Cluster. Ceph Clients access keyrings to lookup a user name and +retrieve the user's key. + +The ``ceph-authtool`` utility allows you to create a keyring. 
To create an +empty keyring, use ``--create-keyring`` or ``-C``. For example:: + + ceph-authtool --create-keyring /path/to/keyring + +When creating a keyring with multiple users, we recommend using the cluster name +(e.g., ``$cluster.keyring``) for the keyring filename and saving it in the +``/etc/ceph`` directory so that the ``keyring`` configuration default setting +will pick up the filename without requiring you to specify it in the local copy +of your Ceph configuration file. For example, create ``ceph.keyring`` by +executing the following:: + + sudo ceph-authtool -C /etc/ceph/ceph.keyring + +When creating a keyring with a single user, we recommend using the cluster name, +the user type and the user name and saving it in the ``/etc/ceph`` directory. +For example, ``ceph.client.admin.keyring`` for the ``client.admin`` user. + +To create a keyring in ``/etc/ceph``, you must do so as ``root``. This means +the file will have ``rw`` permissions for the ``root`` user only, which is +appropriate when the keyring contains administrator keys. However, if you +intend to use the keyring for a particular user or group of users, ensure +that you execute ``chown`` or ``chmod`` to establish appropriate keyring +ownership and access. + + +Add a User to a Keyring +----------------------- + +When you `Add a User`_ to the Ceph Storage Cluster, you can use the `Get a +User`_ procedure to retrieve a user, key and capabilities and save the user to a +keyring. + +When you only want to use one user per keyring, the `Get a User`_ procedure with +the ``-o`` option will save the output in the keyring file format. For example, +to create a keyring for the ``client.admin`` user, execute the following:: + + sudo ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring + +Notice that we use the recommended file format for an individual user. + +When you want to import users to a keyring, you can use ``ceph-authtool`` +to specify the destination keyring and the source keyring. +For example:: + + sudo ceph-authtool /etc/ceph/ceph.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring + + +Create a User +------------- + +Ceph provides the `Add a User`_ function to create a user directly in the Ceph +Storage Cluster. However, you can also create a user, keys and capabilities +directly on a Ceph client keyring. Then, you can import the user to the Ceph +Storage Cluster. For example:: + + sudo ceph-authtool -n client.ringo --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.keyring + +See `Authorization (Capabilities)`_ for additional details on capabilities. + +You can also create a keyring and add a new user to the keyring simultaneously. +For example:: + + sudo ceph-authtool -C /etc/ceph/ceph.keyring -n client.ringo --cap osd 'allow rwx' --cap mon 'allow rwx' --gen-key + +In the foregoing scenarios, the new user ``client.ringo`` is only in the +keyring. To add the new user to the Ceph Storage Cluster, you must still add +the new user to the Ceph Storage Cluster. :: + + sudo ceph auth add client.ringo -i /etc/ceph/ceph.keyring + + +Modify a User +------------- + +To modify the capabilities of a user record in a keyring, specify the keyring, +and the user followed by the capabilities. For example:: + + sudo ceph-authtool /etc/ceph/ceph.keyring -n client.ringo --cap osd 'allow rwx' --cap mon 'allow rwx' + +To update the user to the Ceph Storage Cluster, you must update the user +in the keyring to the user entry in the the Ceph Storage Cluster. 
:: + + sudo ceph auth import -i /etc/ceph/ceph.keyring + +See `Import a User(s)`_ for details on updating a Ceph Storage Cluster user +from a keyring. + +You may also `Modify User Capabilities`_ directly in the cluster, store the +results to a keyring file; then, import the keyring into your main +``ceph.keyring`` file. + + +Command Line Usage +================== + +Ceph supports the following usage for user name and secret: + +``--id`` | ``--user`` + +:Description: Ceph identifies users with a type and an ID (e.g., ``TYPE.ID`` or + ``client.admin``, ``client.user1``). The ``id``, ``name`` and + ``-n`` options enable you to specify the ID portion of the user + name (e.g., ``admin``, ``user1``, ``foo``, etc.). You can specify + the user with the ``--id`` and omit the type. For example, + to specify user ``client.foo`` enter the following:: + + ceph --id foo --keyring /path/to/keyring health + ceph --user foo --keyring /path/to/keyring health + + +``--name`` | ``-n`` + +:Description: Ceph identifies users with a type and an ID (e.g., ``TYPE.ID`` or + ``client.admin``, ``client.user1``). The ``--name`` and ``-n`` + options enables you to specify the fully qualified user name. + You must specify the user type (typically ``client``) with the + user ID. For example:: + + ceph --name client.foo --keyring /path/to/keyring health + ceph -n client.foo --keyring /path/to/keyring health + + +``--keyring`` + +:Description: The path to the keyring containing one or more user name and + secret. The ``--secret`` option provides the same functionality, + but it does not work with Ceph RADOS Gateway, which uses + ``--secret`` for another purpose. You may retrieve a keyring with + ``ceph auth get-or-create`` and store it locally. This is a + preferred approach, because you can switch user names without + switching the keyring path. For example:: + + sudo rbd map --id foo --keyring /path/to/keyring mypool/myimage + + +.. _pools: ../pools + + +Limitations +=========== + +The ``cephx`` protocol authenticates Ceph clients and servers to each other. It +is not intended to handle authentication of human users or application programs +run on their behalf. If that effect is required to handle your access control +needs, you must have another mechanism, which is likely to be specific to the +front end used to access the Ceph object store. This other mechanism has the +role of ensuring that only acceptable users and programs are able to run on the +machine that Ceph will permit to access its object store. + +The keys used to authenticate Ceph clients and servers are typically stored in +a plain text file with appropriate permissions in a trusted host. + +.. important:: Storing keys in plaintext files has security shortcomings, but + they are difficult to avoid, given the basic authentication methods Ceph + uses in the background. Those setting up Ceph systems should be aware of + these shortcomings. + +In particular, arbitrary user machines, especially portable machines, should not +be configured to interact directly with Ceph, since that mode of use would +require the storage of a plaintext authentication key on an insecure machine. +Anyone who stole that machine or obtained surreptitious access to it could +obtain the key that will allow them to authenticate their own machines to Ceph. 
+ +Rather than permitting potentially insecure machines to access a Ceph object +store directly, users should be required to sign in to a trusted machine in +your environment using a method that provides sufficient security for your +purposes. That trusted machine will store the plaintext Ceph keys for the +human users. A future version of Ceph may address these particular +authentication issues more fully. + +At the moment, none of the Ceph authentication protocols provide secrecy for +messages in transit. Thus, an eavesdropper on the wire can hear and understand +all data sent between clients and servers in Ceph, even if it cannot create or +alter them. Further, Ceph does not include options to encrypt user data in the +object store. Users can hand-encrypt and store their own data in the Ceph +object store, of course, but Ceph provides no features to perform object +encryption itself. Those storing sensitive data in Ceph should consider +encrypting their data before providing it to the Ceph system. + + +.. _Architecture - High Availability Authentication: ../../../architecture#high-availability-authentication +.. _Cephx Config Reference: ../../configuration/auth-config-ref diff --git a/src/ceph/doc/rados/troubleshooting/community.rst b/src/ceph/doc/rados/troubleshooting/community.rst new file mode 100644 index 0000000..9faad13 --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/community.rst @@ -0,0 +1,29 @@ +==================== + The Ceph Community +==================== + +The Ceph community is an excellent source of information and help. For +operational issues with Ceph releases we recommend you `subscribe to the +ceph-users email list`_. When you no longer want to receive emails, you can +`unsubscribe from the ceph-users email list`_. + +You may also `subscribe to the ceph-devel email list`_. You should do so if +your issue is: + +- Likely related to a bug +- Related to a development release package +- Related to a development testing package +- Related to your own builds + +If you no longer want to receive emails from the ``ceph-devel`` email list, you +may `unsubscribe from the ceph-devel email list`_. + +.. tip:: The Ceph community is growing rapidly, and community members can help + you if you provide them with detailed information about your problem. You + can attach the output of the ``ceph report`` command to help people understand your issues. + +.. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel +.. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel +.. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com +.. _unsubscribe from the ceph-users email list: mailto:ceph-users-leave@lists.ceph.com +.. _ceph-devel: ceph-devel@vger.kernel.org
\ No newline at end of file diff --git a/src/ceph/doc/rados/troubleshooting/cpu-profiling.rst b/src/ceph/doc/rados/troubleshooting/cpu-profiling.rst new file mode 100644 index 0000000..159f799 --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/cpu-profiling.rst @@ -0,0 +1,67 @@ +=============== + CPU Profiling +=============== + +If you built Ceph from source and compiled Ceph for use with `oprofile`_ +you can profile Ceph's CPU usage. See `Installing Oprofile`_ for details. + + +Initializing oprofile +===================== + +The first time you use ``oprofile`` you need to initialize it. Locate the +``vmlinux`` image corresponding to the kernel you are now running. :: + + ls /boot + sudo opcontrol --init + sudo opcontrol --setup --vmlinux={path-to-image} --separate=library --callgraph=6 + + +Starting oprofile +================= + +To start ``oprofile`` execute the following command:: + + opcontrol --start + +Once you start ``oprofile``, you may run some tests with Ceph. + + +Stopping oprofile +================= + +To stop ``oprofile`` execute the following command:: + + opcontrol --stop + + +Retrieving oprofile Results +=========================== + +To retrieve the top ``cmon`` results, execute the following command:: + + opreport -gal ./cmon | less + + +To retrieve the top ``cmon`` results with call graphs attached, execute the +following command:: + + opreport -cal ./cmon | less + +.. important:: After reviewing results, you should reset ``oprofile`` before + running it again. Resetting ``oprofile`` removes data from the session + directory. + + +Resetting oprofile +================== + +To reset ``oprofile``, execute the following command:: + + sudo opcontrol --reset + +.. important:: You should reset ``oprofile`` after analyzing data so that + you do not commingle results from different tests. + +.. _oprofile: http://oprofile.sourceforge.net/about/ +.. _Installing Oprofile: ../../../dev/cpu-profiler diff --git a/src/ceph/doc/rados/troubleshooting/index.rst b/src/ceph/doc/rados/troubleshooting/index.rst new file mode 100644 index 0000000..80d14f3 --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/index.rst @@ -0,0 +1,19 @@ +================= + Troubleshooting +================= + +Ceph is still on the leading edge, so you may encounter situations that require +you to examine your configuration, modify your logging output, troubleshoot +monitors and OSDs, profile memory and CPU usage, and reach out to the +Ceph community for help. + +.. toctree:: + :maxdepth: 1 + + community + log-and-debug + troubleshooting-mon + troubleshooting-osd + troubleshooting-pg + memory-profiling + cpu-profiling diff --git a/src/ceph/doc/rados/troubleshooting/log-and-debug.rst b/src/ceph/doc/rados/troubleshooting/log-and-debug.rst new file mode 100644 index 0000000..c91f272 --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/log-and-debug.rst @@ -0,0 +1,550 @@ +======================= + Logging and Debugging +======================= + +Typically, when you add debugging to your Ceph configuration, you do so at +runtime. You can also add Ceph debug logging to your Ceph configuration file if +you are encountering issues when starting your cluster. You may view Ceph log +files under ``/var/log/ceph`` (the default location). + +.. tip:: When debug output slows down your system, the latency can hide + race conditions. + +Logging is resource intensive. If you are encountering a problem in a specific +area of your cluster, enable logging for that area of the cluster. 
For example, +if your OSDs are running fine, but your metadata servers are not, you should +start by enabling debug logging for the specific metadata server instance(s) +giving you trouble. Enable logging for each subsystem as needed. + +.. important:: Verbose logging can generate over 1GB of data per hour. If your + OS disk reaches its capacity, the node will stop working. + +If you enable or increase the rate of Ceph logging, ensure that you have +sufficient disk space on your OS disk. See `Accelerating Log Rotation`_ for +details on rotating log files. When your system is running well, remove +unnecessary debugging settings to ensure your cluster runs optimally. Logging +debug output messages is relatively slow, and a waste of resources when +operating your cluster. + +See `Subsystem, Log and Debug Settings`_ for details on available settings. + +Runtime +======= + +If you would like to see the configuration settings at runtime, you must log +in to a host with a running daemon and execute the following:: + + ceph daemon {daemon-name} config show | less + +For example,:: + + ceph daemon osd.0 config show | less + +To activate Ceph's debugging output (*i.e.*, ``dout()``) at runtime, use the +``ceph tell`` command to inject arguments into the runtime configuration:: + + ceph tell {daemon-type}.{daemon id or *} injectargs --{name} {value} [--{name} {value}] + +Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply +the runtime setting to all daemons of a particular type with ``*``, or specify +a specific daemon's ID. For example, to increase +debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following:: + + ceph tell osd.0 injectargs --debug-osd 0/5 + +The ``ceph tell`` command goes through the monitors. If you cannot bind to the +monitor, you can still make the change by logging into the host of the daemon +whose configuration you'd like to change using ``ceph daemon``. +For example:: + + sudo ceph daemon osd.0 config set debug_osd 0/5 + +See `Subsystem, Log and Debug Settings`_ for details on available settings. + + +Boot Time +========= + +To activate Ceph's debugging output (*i.e.*, ``dout()``) at boot time, you must +add settings to your Ceph configuration file. Subsystems common to each daemon +may be set under ``[global]`` in your configuration file. Subsystems for +particular daemons are set under the daemon section in your configuration file +(*e.g.*, ``[mon]``, ``[osd]``, ``[mds]``). For example:: + + [global] + debug ms = 1/5 + + [mon] + debug mon = 20 + debug paxos = 1/5 + debug auth = 2 + + [osd] + debug osd = 1/5 + debug filestore = 1/5 + debug journal = 1 + debug monc = 5/20 + + [mds] + debug mds = 1 + debug mds balancer = 1 + + +See `Subsystem, Log and Debug Settings`_ for details. + + +Accelerating Log Rotation +========================= + +If your OS disk is relatively full, you can accelerate log rotation by modifying +the Ceph log rotation file at ``/etc/logrotate.d/ceph``. Add a size setting +after the rotation frequency to accelerate log rotation (via cronjob) if your +logs exceed the size setting. For example, the default setting looks like +this:: + + rotate 7 + weekly + compress + sharedscripts + +Modify it by adding a ``size`` setting. :: + + rotate 7 + weekly + size 500M + compress + sharedscripts + +Then, start the crontab editor for your user space. :: + + crontab -e + +Finally, add an entry to check the ``etc/logrotate.d/ceph`` file. 
:: + + 30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1 + +The preceding example checks the ``etc/logrotate.d/ceph`` file every 30 minutes. + + +Valgrind +======== + +Debugging may also require you to track down memory and threading issues. +You can run a single daemon, a type of daemon, or the whole cluster with +Valgrind. You should only use Valgrind when developing or debugging Ceph. +Valgrind is computationally expensive, and will slow down your system otherwise. +Valgrind messages are logged to ``stderr``. + + +Subsystem, Log and Debug Settings +================================= + +In most cases, you will enable debug logging output via subsystems. + +Ceph Subsystems +--------------- + +Each subsystem has a logging level for its output logs, and for its logs +in-memory. You may set different values for each of these subsystems by setting +a log file level and a memory level for debug logging. Ceph's logging levels +operate on a scale of ``1`` to ``20``, where ``1`` is terse and ``20`` is +verbose [#]_ . In general, the logs in-memory are not sent to the output log unless: + +- a fatal signal is raised or +- an ``assert`` in source code is triggered or +- upon requested. Please consult `document on admin socket <http://docs.ceph.com/docs/master/man/8/ceph/#daemon>`_ for more details. + +A debug logging setting can take a single value for the log level and the +memory level, which sets them both as the same value. For example, if you +specify ``debug ms = 5``, Ceph will treat it as a log level and a memory level +of ``5``. You may also specify them separately. The first setting is the log +level, and the second setting is the memory level. You must separate them with +a forward slash (/). For example, if you want to set the ``ms`` subsystem's +debug logging level to ``1`` and its memory level to ``5``, you would specify it +as ``debug ms = 1/5``. For example: + + + +.. code-block:: ini + + debug {subsystem} = {log-level}/{memory-level} + #for example + debug mds balancer = 1/20 + + +The following table provides a list of Ceph subsystems and their default log and +memory levels. Once you complete your logging efforts, restore the subsystems +to their default level or to a level suitable for normal operations. 
+ + ++--------------------+-----------+--------------+ +| Subsystem | Log Level | Memory Level | ++====================+===========+==============+ +| ``default`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``lockdep`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``context`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``crush`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``mds`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``mds balancer`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``mds locker`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``mds log`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``mds log expire`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``mds migrator`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``buffer`` | 0 | 0 | ++--------------------+-----------+--------------+ +| ``timer`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``filer`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``objecter`` | 0 | 0 | ++--------------------+-----------+--------------+ +| ``rados`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``rbd`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``journaler`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``objectcacher`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``client`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``osd`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``optracker`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``objclass`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``filestore`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``journal`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``ms`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``mon`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``monc`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``paxos`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``tp`` | 0 | 5 | ++--------------------+-----------+--------------+ +| ``auth`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``finisher`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``heartbeatmap`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``perfcounter`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``rgw`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``javaclient`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``asok`` | 1 | 5 | ++--------------------+-----------+--------------+ +| ``throttle`` | 1 | 5 | ++--------------------+-----------+--------------+ + + +Logging Settings +---------------- + +Logging and debugging settings are not required in a Ceph configuration file, +but you may override default settings as needed. Ceph supports the following +settings: + + +``log file`` + +:Description: The location of the logging file for your cluster. +:Type: String +:Required: No +:Default: ``/var/log/ceph/$cluster-$name.log`` + + +``log max new`` + +:Description: The maximum number of new log files. 
+:Type: Integer +:Required: No +:Default: ``1000`` + + +``log max recent`` + +:Description: The maximum number of recent events to include in a log file. +:Type: Integer +:Required: No +:Default: ``1000000`` + + +``log to stderr`` + +:Description: Determines if logging messages should appear in ``stderr``. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``err to stderr`` + +:Description: Determines if error messages should appear in ``stderr``. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``log to syslog`` + +:Description: Determines if logging messages should appear in ``syslog``. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``err to syslog`` + +:Description: Determines if error messages should appear in ``syslog``. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``log flush on exit`` + +:Description: Determines if Ceph should flush the log files after exit. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``clog to monitors`` + +:Description: Determines if ``clog`` messages should be sent to monitors. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``clog to syslog`` + +:Description: Determines if ``clog`` messages should be sent to syslog. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``mon cluster log to syslog`` + +:Description: Determines if the cluster log should be output to the syslog. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``mon cluster log file`` + +:Description: The location of the cluster's log file. +:Type: String +:Required: No +:Default: ``/var/log/ceph/$cluster.log`` + + + +OSD +--- + + +``osd debug drop ping probability`` + +:Description: ? +:Type: Double +:Required: No +:Default: 0 + + +``osd debug drop ping duration`` + +:Description: +:Type: Integer +:Required: No +:Default: 0 + +``osd debug drop pg create probability`` + +:Description: +:Type: Integer +:Required: No +:Default: 0 + +``osd debug drop pg create duration`` + +:Description: ? +:Type: Double +:Required: No +:Default: 1 + + +``osd tmapput sets uses tmap`` + +:Description: Uses ``tmap``. For debug only. +:Type: Boolean +:Required: No +:Default: ``false`` + + +``osd min pg log entries`` + +:Description: The minimum number of log entries for placement groups. +:Type: 32-bit Unsigned Integer +:Required: No +:Default: 1000 + + +``osd op log threshold`` + +:Description: How many op log messages to show up in one pass. +:Type: Integer +:Required: No +:Default: 5 + + + +Filestore +--------- + +``filestore debug omap check`` + +:Description: Debugging check on synchronization. This is an expensive operation. +:Type: Boolean +:Required: No +:Default: 0 + + +MDS +--- + + +``mds debug scatterstat`` + +:Description: Ceph will assert that various recursive stat invariants are true + (for developers only). + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``mds debug frag`` + +:Description: Ceph will verify directory fragmentation invariants when + convenient (developers only). + +:Type: Boolean +:Required: No +:Default: ``false`` + + +``mds debug auth pins`` + +:Description: The debug auth pin invariants (for developers only). +:Type: Boolean +:Required: No +:Default: ``false`` + + +``mds debug subtrees`` + +:Description: The debug subtree invariants (for developers only). +:Type: Boolean +:Required: No +:Default: ``false`` + + + +RADOS Gateway +------------- + + +``rgw log nonexistent bucket`` + +:Description: Should we log a non-existent buckets? 
+:Type: Boolean +:Required: No +:Default: ``false`` + + +``rgw log object name`` + +:Description: Should an object's name be logged. // man date to see codes (a subset are supported) +:Type: String +:Required: No +:Default: ``%Y-%m-%d-%H-%i-%n`` + + +``rgw log object name utc`` + +:Description: Object log name contains UTC? +:Type: Boolean +:Required: No +:Default: ``false`` + + +``rgw enable ops log`` + +:Description: Enables logging of every RGW operation. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``rgw enable usage log`` + +:Description: Enable logging of RGW's bandwidth usage. +:Type: Boolean +:Required: No +:Default: ``true`` + + +``rgw usage log flush threshold`` + +:Description: Threshold to flush pending log data. +:Type: Integer +:Required: No +:Default: ``1024`` + + +``rgw usage log tick interval`` + +:Description: Flush pending log data every ``s`` seconds. +:Type: Integer +:Required: No +:Default: 30 + + +``rgw intent log object name`` + +:Description: +:Type: String +:Required: No +:Default: ``%Y-%m-%d-%i-%n`` + + +``rgw intent log object name utc`` + +:Description: Include a UTC timestamp in the intent log object name. +:Type: Boolean +:Required: No +:Default: ``false`` + +.. [#] there are levels >20 in some rare cases and that they are extremely verbose. diff --git a/src/ceph/doc/rados/troubleshooting/memory-profiling.rst b/src/ceph/doc/rados/troubleshooting/memory-profiling.rst new file mode 100644 index 0000000..e2396e2 --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/memory-profiling.rst @@ -0,0 +1,142 @@ +================== + Memory Profiling +================== + +Ceph MON, OSD and MDS can generate heap profiles using +``tcmalloc``. To generate heap profiles, ensure you have +``google-perftools`` installed:: + + sudo apt-get install google-perftools + +The profiler dumps output to your ``log file`` directory (i.e., +``/var/log/ceph``). See `Logging and Debugging`_ for details. +To view the profiler logs with Google's performance tools, execute the +following:: + + google-pprof --text {path-to-daemon} {log-path/filename} + +For example:: + + $ ceph tell osd.0 heap start_profiler + $ ceph tell osd.0 heap dump + osd.0 tcmalloc heap stats:------------------------------------------------ + MALLOC: 2632288 ( 2.5 MiB) Bytes in use by application + MALLOC: + 499712 ( 0.5 MiB) Bytes in page heap freelist + MALLOC: + 543800 ( 0.5 MiB) Bytes in central cache freelist + MALLOC: + 327680 ( 0.3 MiB) Bytes in transfer cache freelist + MALLOC: + 1239400 ( 1.2 MiB) Bytes in thread cache freelists + MALLOC: + 1142936 ( 1.1 MiB) Bytes in malloc metadata + MALLOC: ------------ + MALLOC: = 6385816 ( 6.1 MiB) Actual memory used (physical + swap) + MALLOC: + 0 ( 0.0 MiB) Bytes released to OS (aka unmapped) + MALLOC: ------------ + MALLOC: = 6385816 ( 6.1 MiB) Virtual address space used + MALLOC: + MALLOC: 231 Spans in use + MALLOC: 56 Thread heaps in use + MALLOC: 8192 Tcmalloc page size + ------------------------------------------------ + Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). + Bytes released to the OS take up virtual address space but no physical memory. + $ google-pprof --text \ + /usr/bin/ceph-osd \ + /var/log/ceph/ceph-osd.0.profile.0001.heap + Total: 3.7 MB + 1.9 51.1% 51.1% 1.9 51.1% ceph::log::Log::create_entry + 1.8 47.3% 98.4% 1.8 47.3% std::string::_Rep::_S_create + 0.0 0.4% 98.9% 0.0 0.6% SimpleMessenger::add_accept_pipe + 0.0 0.4% 99.2% 0.0 0.6% decode_message + ... 
+ +Another heap dump on the same daemon will add another file. It is +convenient to compare to a previous heap dump to show what has grown +in the interval. For instance:: + + $ google-pprof --text --base out/osd.0.profile.0001.heap \ + ceph-osd out/osd.0.profile.0003.heap + Total: 0.2 MB + 0.1 50.3% 50.3% 0.1 50.3% ceph::log::Log::create_entry + 0.1 46.6% 96.8% 0.1 46.6% std::string::_Rep::_S_create + 0.0 0.9% 97.7% 0.0 26.1% ReplicatedPG::do_op + 0.0 0.8% 98.5% 0.0 0.8% __gnu_cxx::new_allocator::allocate + +Refer to `Google Heap Profiler`_ for additional details. + +Once you have the heap profiler installed, start your cluster and +begin using the heap profiler. You may enable or disable the heap +profiler at runtime, or ensure that it runs continuously. For the +following commandline usage, replace ``{daemon-type}`` with ``mon``, +``osd`` or ``mds``, and replace ``{daemon-id}`` with the OSD number or +the MON or MDS id. + + +Starting the Profiler +--------------------- + +To start the heap profiler, execute the following:: + + ceph tell {daemon-type}.{daemon-id} heap start_profiler + +For example:: + + ceph tell osd.1 heap start_profiler + +Alternatively the profile can be started when the daemon starts +running if the ``CEPH_HEAP_PROFILER_INIT=true`` variable is found in +the environment. + +Printing Stats +-------------- + +To print out statistics, execute the following:: + + ceph tell {daemon-type}.{daemon-id} heap stats + +For example:: + + ceph tell osd.0 heap stats + +.. note:: Printing stats does not require the profiler to be running and does + not dump the heap allocation information to a file. + + +Dumping Heap Information +------------------------ + +To dump heap information, execute the following:: + + ceph tell {daemon-type}.{daemon-id} heap dump + +For example:: + + ceph tell mds.a heap dump + +.. note:: Dumping heap information only works when the profiler is running. + + +Releasing Memory +---------------- + +To release memory that ``tcmalloc`` has allocated but which is not being used by +the Ceph daemon itself, execute the following:: + + ceph tell {daemon-type}{daemon-id} heap release + +For example:: + + ceph tell osd.2 heap release + + +Stopping the Profiler +--------------------- + +To stop the heap profiler, execute the following:: + + ceph tell {daemon-type}.{daemon-id} heap stop_profiler + +For example:: + + ceph tell osd.0 heap stop_profiler + +.. _Logging and Debugging: ../log-and-debug +.. _Google Heap Profiler: http://goog-perftools.sourceforge.net/doc/heap_profiler.html diff --git a/src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst b/src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst new file mode 100644 index 0000000..89fb94c --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst @@ -0,0 +1,567 @@ +================================= + Troubleshooting Monitors +================================= + +.. index:: monitor, high availability + +When a cluster encounters monitor-related troubles there's a tendency to +panic, and some times with good reason. You should keep in mind that losing +a monitor, or a bunch of them, don't necessarily mean that your cluster is +down, as long as a majority is up, running and with a formed quorum. +Regardless of how bad the situation is, the first thing you should do is to +calm down, take a breath and try answering our initial troubleshooting script. 
+ + +Initial Troubleshooting +======================== + + +**Are the monitors running?** + + First of all, we need to make sure the monitors are running. You would be + amazed by how often people forget to run the monitors, or restart them after + an upgrade. There's no shame in that, but let's try not losing a couple of + hours chasing an issue that is not there. + +**Are you able to connect to the monitor's servers?** + + Doesn't happen often, but sometimes people do have ``iptables`` rules that + block accesses to monitor servers or monitor ports. Usually leftovers from + monitor stress-testing that were forgotten at some point. Try ssh'ing into + the server and, if that succeeds, try connecting to the monitor's port + using you tool of choice (telnet, nc,...). + +**Does ceph -s run and obtain a reply from the cluster?** + + If the answer is yes then your cluster is up and running. One thing you + can take for granted is that the monitors will only answer to a ``status`` + request if there is a formed quorum. + + If ``ceph -s`` blocked however, without obtaining a reply from the cluster + or showing a lot of ``fault`` messages, then it is likely that your monitors + are either down completely or just a portion is up -- a portion that is not + enough to form a quorum (keep in mind that a quorum if formed by a majority + of monitors). + +**What if ceph -s doesn't finish?** + + If you haven't gone through all the steps so far, please go back and do. + + For those running on Emperor 0.72-rc1 and forward, you will be able to + contact each monitor individually asking them for their status, regardless + of a quorum being formed. This an be achieved using ``ceph ping mon.ID``, + ID being the monitor's identifier. You should perform this for each monitor + in the cluster. In section `Understanding mon_status`_ we will explain how + to interpret the output of this command. + + For the rest of you who don't tread on the bleeding edge, you will need to + ssh into the server and use the monitor's admin socket. Please jump to + `Using the monitor's admin socket`_. + +For other specific issues, keep on reading. + + +Using the monitor's admin socket +================================= + +The admin socket allows you to interact with a given daemon directly using a +Unix socket file. This file can be found in your monitor's ``run`` directory. +By default, the admin socket will be kept in ``/var/run/ceph/ceph-mon.ID.asok`` +but this can vary if you defined it otherwise. If you don't find it there, +please check your ``ceph.conf`` for an alternative path or run:: + + ceph-conf --name mon.ID --show-config-value admin_socket + +Please bear in mind that the admin socket will only be available while the +monitor is running. When the monitor is properly shutdown, the admin socket +will be removed. If however the monitor is not running and the admin socket +still persists, it is likely that the monitor was improperly shutdown. +Regardless, if the monitor is not running, you will not be able to use the +admin socket, with ``ceph`` likely returning ``Error 111: Connection Refused``. + +Accessing the admin socket is as simple as telling the ``ceph`` tool to use +the ``asok`` file. 
In pre-Dumpling Ceph, this can be achieved by:: + + ceph --admin-daemon /var/run/ceph/ceph-mon.<id>.asok <command> + +while in Dumpling and beyond you can use the alternate (and recommended) +format:: + + ceph daemon mon.<id> <command> + +Using ``help`` as the command to the ``ceph`` tool will show you the +supported commands available through the admin socket. Please take a look +at ``config get``, ``config show``, ``mon_status`` and ``quorum_status``, +as those can be enlightening when troubleshooting a monitor. + + +Understanding mon_status +========================= + +``mon_status`` can be obtained through the ``ceph`` tool when you have +a formed quorum, or via the admin socket if you don't. This command will +output a multitude of information about the monitor, including the same +output you would get with ``quorum_status``. + +Take the following example of ``mon_status``:: + + + { "name": "c", + "rank": 2, + "state": "peon", + "election_epoch": 38, + "quorum": [ + 1, + 2], + "outside_quorum": [], + "extra_probe_peers": [], + "sync_provider": [], + "monmap": { "epoch": 3, + "fsid": "5c4e9d53-e2e1-478a-8061-f543f8be4cf8", + "modified": "2013-10-30 04:12:01.945629", + "created": "2013-10-29 14:14:41.914786", + "mons": [ + { "rank": 0, + "name": "a", + "addr": "127.0.0.1:6789\/0"}, + { "rank": 1, + "name": "b", + "addr": "127.0.0.1:6790\/0"}, + { "rank": 2, + "name": "c", + "addr": "127.0.0.1:6795\/0"}]}} + +A couple of things are obvious: we have three monitors in the monmap (*a*, *b* +and *c*), the quorum is formed by only two monitors, and *c* is in the quorum +as a *peon*. + +Which monitor is out of the quorum? + + The answer would be **a**. + +Why? + + Take a look at the ``quorum`` set. We have two monitors in this set: *1* + and *2*. These are not monitor names. These are monitor ranks, as established + in the current monmap. We are missing the monitor with rank 0, and according + to the monmap that would be ``mon.a``. + +By the way, how are ranks established? + + Ranks are (re)calculated whenever you add or remove monitors and follow a + simple rule: the **greater** the ``IP:PORT`` combination, the **lower** the + rank is. In this case, considering that ``127.0.0.1:6789`` is lower than all + the remaining ``IP:PORT`` combinations, ``mon.a`` has rank 0. + +Most Common Monitor Issues +=========================== + +Have Quorum but at least one Monitor is down +--------------------------------------------- + +When this happens, depending on the version of Ceph you are running, +you should be seeing something similar to:: + + $ ceph health detail + [snip] + mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum) + +How to troubleshoot this? + + First, make sure ``mon.a`` is running. + + Second, make sure you are able to connect to ``mon.a``'s server from the + other monitors' servers. Check the ports as well. Check ``iptables`` on + all your monitor nodes and make sure you are not dropping/rejecting + connections. + + If this initial troubleshooting doesn't solve your problems, then it's + time to go deeper. + + First, check the problematic monitor's ``mon_status`` via the admin + socket as explained in `Using the monitor's admin socket`_ and + `Understanding mon_status`_. + + Considering the monitor is out of the quorum, its state should be one of + ``probing``, ``electing`` or ``synchronizing``. 
If it happens to be either + ``leader`` or ``peon``, then the monitor believes to be in quorum, while + the remaining cluster is sure it is not; or maybe it got into the quorum + while we were troubleshooting the monitor, so check you ``ceph -s`` again + just to make sure. Proceed if the monitor is not yet in the quorum. + +What if the state is ``probing``? + + This means the monitor is still looking for the other monitors. Every time + you start a monitor, the monitor will stay in this state for some time + while trying to find the rest of the monitors specified in the ``monmap``. + The time a monitor will spend in this state can vary. For instance, when on + a single-monitor cluster, the monitor will pass through the probing state + almost instantaneously, since there are no other monitors around. On a + multi-monitor cluster, the monitors will stay in this state until they + find enough monitors to form a quorum -- this means that if you have 2 out + of 3 monitors down, the one remaining monitor will stay in this state + indefinitively until you bring one of the other monitors up. + + If you have a quorum, however, the monitor should be able to find the + remaining monitors pretty fast, as long as they can be reached. If your + monitor is stuck probing and you have gone through with all the communication + troubleshooting, then there is a fair chance that the monitor is trying + to reach the other monitors on a wrong address. ``mon_status`` outputs the + ``monmap`` known to the monitor: check if the other monitor's locations + match reality. If they don't, jump to + `Recovering a Monitor's Broken monmap`_; if they do, then it may be related + to severe clock skews amongst the monitor nodes and you should refer to + `Clock Skews`_ first, but if that doesn't solve your problem then it is + the time to prepare some logs and reach out to the community (please refer + to `Preparing your logs`_ on how to best prepare your logs). + + +What if state is ``electing``? + + This means the monitor is in the middle of an election. These should be + fast to complete, but at times the monitors can get stuck electing. This + is usually a sign of a clock skew among the monitor nodes; jump to + `Clock Skews`_ for more infos on that. If all your clocks are properly + synchronized, it is best if you prepare some logs and reach out to the + community. This is not a state that is likely to persist and aside from + (*really*) old bugs there is not an obvious reason besides clock skews on + why this would happen. + +What if state is ``synchronizing``? + + This means the monitor is synchronizing with the rest of the cluster in + order to join the quorum. The synchronization process is as faster as + smaller your monitor store is, so if you have a big store it may + take a while. Don't worry, it should be finished soon enough. + + However, if you notice that the monitor jumps from ``synchronizing`` to + ``electing`` and then back to ``synchronizing``, then you do have a + problem: the cluster state is advancing (i.e., generating new maps) way + too fast for the synchronization process to keep up. This used to be a + thing in early Cuttlefish, but since then the synchronization process was + quite refactored and enhanced to avoid just this sort of behavior. If this + happens in later versions let us know. And bring some logs + (see `Preparing your logs`_). + +What if state is ``leader`` or ``peon``? + + This should not happen. 
There is a chance this might happen however, and + it has a lot to do with clock skews -- see `Clock Skews`_. If you are not + suffering from clock skews, then please prepare your logs (see + `Preparing your logs`_) and reach out to us. + + +Recovering a Monitor's Broken monmap +------------------------------------- + +This is how a ``monmap`` usually looks like, depending on the number of +monitors:: + + + epoch 3 + fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8 + last_changed 2013-10-30 04:12:01.945629 + created 2013-10-29 14:14:41.914786 + 0: 127.0.0.1:6789/0 mon.a + 1: 127.0.0.1:6790/0 mon.b + 2: 127.0.0.1:6795/0 mon.c + +This may not be what you have however. For instance, in some versions of +early Cuttlefish there was this one bug that could cause your ``monmap`` +to be nullified. Completely filled with zeros. This means that not even +``monmaptool`` would be able to read it because it would find it hard to +make sense of only-zeros. Some other times, you may end up with a monitor +with a severely outdated monmap, thus being unable to find the remaining +monitors (e.g., say ``mon.c`` is down; you add a new monitor ``mon.d``, +then remove ``mon.a``, then add a new monitor ``mon.e`` and remove +``mon.b``; you will end up with a totally different monmap from the one +``mon.c`` knows). + +In this sort of situations, you have two possible solutions: + +Scrap the monitor and create a new one + + You should only take this route if you are positive that you won't + lose the information kept by that monitor; that you have other monitors + and that they are running just fine so that your new monitor is able + to synchronize from the remaining monitors. Keep in mind that destroying + a monitor, if there are no other copies of its contents, may lead to + loss of data. + +Inject a monmap into the monitor + + Usually the safest path. You should grab the monmap from the remaining + monitors and inject it into the monitor with the corrupted/lost monmap. + + These are the basic steps: + + 1. Is there a formed quorum? If so, grab the monmap from the quorum:: + + $ ceph mon getmap -o /tmp/monmap + + 2. No quorum? Grab the monmap directly from another monitor (this + assumes the monitor you are grabbing the monmap from has id ID-FOO + and has been stopped):: + + $ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap + + 3. Stop the monitor you are going to inject the monmap into. + + 4. Inject the monmap:: + + $ ceph-mon -i ID --inject-monmap /tmp/monmap + + 5. Start the monitor + + Please keep in mind that the ability to inject monmaps is a powerful + feature that can cause havoc with your monitors if misused as it will + overwrite the latest, existing monmap kept by the monitor. + + +Clock Skews +------------ + +Monitors can be severely affected by significant clock skews across the +monitor nodes. This usually translates into weird behavior with no obvious +cause. To avoid such issues, you should run a clock synchronization tool +on your monitor nodes. + + +What's the maximum tolerated clock skew? + + By default the monitors will allow clocks to drift up to ``0.05 seconds``. + + +Can I increase the maximum tolerated clock skew? + + This value is configurable via the ``mon-clock-drift-allowed`` option, and + although you *CAN* it doesn't mean you *SHOULD*. The clock skew mechanism + is in place because clock skewed monitor may not properly behave. We, as + developers and QA afficcionados, are comfortable with the current default + value, as it will alert the user before the monitors get out hand. 
  Changing this value without testing it first may cause unforeseen effects
  on the stability of the monitors and overall cluster health, although
  there is no risk of data loss.


How do I know there's a clock skew?

  The monitors will warn you in the form of a ``HEALTH_WARN``. ``ceph health
  detail`` should show something in the form of::

      mon.c addr 10.10.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

  That means that ``mon.c`` has been flagged as suffering from a clock skew.


What should I do if there's a clock skew?

  Synchronize your clocks. Running an NTP client may help. If you are
  already using one and you hit this sort of issue, check whether you are
  using an NTP server remote to your network and consider hosting your own
  NTP server on your network. This last option tends to reduce the number
  of issues with monitor clock skews.


Client Can't Connect or Mount
------------------------------

Check your IP tables. Some OS install utilities add a ``REJECT`` rule to
``iptables``. The rule rejects all clients trying to connect to the host except
for ``ssh``. If your monitor host's IP tables have such a ``REJECT`` rule in
place, clients connecting from a separate node will fail to mount with a timeout
error. You need to address ``iptables`` rules that reject clients trying to
connect to Ceph daemons. For example, you would need to address rules that look
like this appropriately::

    REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

You may also need to add rules to IP tables on your Ceph hosts to ensure
that clients can access the ports associated with your Ceph monitors (i.e., port
6789 by default) and Ceph OSDs (i.e., 6800 through 7300 by default). For
example::

    iptables -A INPUT -m multiport -p tcp -s {ip-address}/{netmask} --dports 6789,6800:7300 -j ACCEPT


Monitor Store Failures
======================

Symptoms of store corruption
----------------------------

Ceph monitors store the `cluster map`_ in a key/value store such as LevelDB. If
a monitor fails due to key/value store corruption, the following error messages
might be found in the monitor log::

    Corruption: error in middle of record

or::

    Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

Recovery using healthy monitor(s)
---------------------------------

If there are any survivors, we can always `replace`_ the corrupted monitor with
a new one. After booting up, the new joiner will sync up with a healthy peer,
and once it is fully synchronized, it will be able to serve clients.

Recovery using OSDs
-------------------

But what if all monitors fail at the same time? Since users are encouraged to
deploy at least three monitors in a Ceph cluster, the chance of simultaneous
failure is small. But unplanned power-downs in a data center combined with
improperly configured disk/fs settings could fail the underlying filesystem,
and hence kill all the monitors.
In this case, we can recover the monitor store with the +information stored in OSDs.:: + + ms=/tmp/mon-store + mkdir $ms + # collect the cluster map from OSDs + for host in $hosts; do + rsync -avz $ms user@host:$ms + rm -rf $ms + ssh user@host <<EOF + for osd in /var/lib/osd/osd-*; do + ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms + done + EOF + rsync -avz user@host:$ms $ms + done + # rebuild the monitor store from the collected map, if the cluster does not + # use cephx authentication, we can skip the following steps to update the + # keyring with the caps, and there is no need to pass the "--keyring" option. + # i.e. just use "ceph-monstore-tool /tmp/mon-store rebuild" instead + ceph-authtool /path/to/admin.keyring -n mon. \ + --cap mon 'allow *' + ceph-authtool /path/to/admin.keyring -n client.admin \ + --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' + ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring + # backup corrupted store.db just in case + mv /var/lib/ceph/mon/mon.0/store.db /var/lib/ceph/mon/mon.0/store.db.corrupted + mv /tmp/mon-store/store.db /var/lib/ceph/mon/mon.0/store.db + chown -R ceph:ceph /var/lib/ceph/mon/mon.0/store.db + +The steps above + +#. collect the map from all OSD hosts, +#. then rebuild the store, +#. fill the entities in keyring file with appropriate caps +#. replace the corrupted store on ``mon.0`` with the recovered copy. + +Known limitations +~~~~~~~~~~~~~~~~~ + +Following information are not recoverable using the steps above: + +- **some added keyrings**: all the OSD keyrings added using ``ceph auth add`` command + are recovered from the OSD's copy. And the ``client.admin`` keyring is imported + using ``ceph-monstore-tool``. But the MDS keyrings and other keyrings are missing + in the recovered monitor store. You might need to re-add them manually. + +- **pg settings**: the ``full ratio`` and ``nearfull ratio`` settings configured using + ``ceph pg set_full_ratio`` and ``ceph pg set_nearfull_ratio`` will be lost. + +- **MDS Maps**: the MDS maps are lost. + + +Everything Failed! Now What? +============================= + +Reaching out for help +---------------------- + +You can find us on IRC at #ceph and #ceph-devel at OFTC (server irc.oftc.net) +and on ``ceph-devel@vger.kernel.org`` and ``ceph-users@lists.ceph.com``. Make +sure you have grabbed your logs and have them ready if someone asks: the faster +the interaction and lower the latency in response, the better chances everyone's +time is optimized. + + +Preparing your logs +--------------------- + +Monitor logs are, by default, kept in ``/var/log/ceph/ceph-mon.FOO.log*``. We +may want them. However, your logs may not have the necessary information. If +you don't find your monitor logs at their default location, you can check +where they should be by running:: + + ceph-conf --name mon.FOO --show-config-value log_file + +The amount of information in the logs are subject to the debug levels being +enforced by your configuration files. If you have not enforced a specific +debug level then Ceph is using the default levels and your logs may not +contain important information to track down you issue. +A first step in getting relevant information into your logs will be to raise +debug levels. In this case we will be interested in the information from the +monitor. +Similarly to what happens on other components, different parts of the monitor +will output their debug information on different subsystems. 

You will have to raise the debug levels of those subsystems more closely
related to your issue. This may not be an easy task for someone unfamiliar
with troubleshooting Ceph. For most situations, setting the following options
on your monitors will be enough to pinpoint a potential source of the issue::

    debug mon = 10
    debug ms = 1

If we find that these debug levels are not enough, there's a chance we may
ask you to raise them or even define other debug subsystems to obtain
information from -- but at least we started off with some useful information,
instead of a mostly empty log without much to go on.

Do I need to restart a monitor to adjust debug levels?
-------------------------------------------------------

No. You may do it in one of two ways:

You have quorum

  Either inject the debug option into the monitor you want to debug::

      ceph tell mon.FOO injectargs --debug_mon 10/10

  or into all monitors at once::

      ceph tell mon.* injectargs --debug_mon 10/10

No quorum

  Use the monitor's admin socket and directly adjust the configuration
  options::

      ceph daemon mon.FOO config set debug_mon 10/10


Going back to default values is as easy as rerunning the above commands
using the debug level ``1/10`` instead. You can check your current
values using the admin socket and the following commands::

    ceph daemon mon.FOO config show

or::

    ceph daemon mon.FOO config get 'OPTION_NAME'


Reproduced the problem with appropriate debug levels. Now what?
----------------------------------------------------------------

Ideally you would send us only the relevant portions of your logs.
We realise that figuring out the corresponding portion may not be the
easiest of tasks. Therefore, we won't hold it against you if you provide the
full log, but common sense should be employed. If your log has hundreds of
thousands of lines, it may get tricky to go through the whole thing,
especially if we are not aware of the point at which your issue happened.
For instance, when reproducing the problem, make a note of the current time
and date so that the relevant portions of your logs can be extracted based
on that.

Finally, you should reach out to us on the mailing lists, on IRC or file
a new issue on the `tracker`_.

.. _cluster map: ../../architecture#cluster-map
.. _replace: ../operation/add-or-rm-mons
.. _tracker: http://tracker.ceph.com/projects/ceph/issues/new
diff --git a/src/ceph/doc/rados/troubleshooting/troubleshooting-osd.rst b/src/ceph/doc/rados/troubleshooting/troubleshooting-osd.rst
new file mode 100644
index 0000000..88307fe
--- /dev/null
+++ b/src/ceph/doc/rados/troubleshooting/troubleshooting-osd.rst
@@ -0,0 +1,536 @@
======================
 Troubleshooting OSDs
======================

Before troubleshooting your OSDs, check your monitors and network first. If
you execute ``ceph health`` or ``ceph -s`` on the command line and Ceph returns
a health status, it means that the monitors have a quorum.
If you don't have a monitor quorum or if there are errors with the monitor
status, `address the monitor issues first <../troubleshooting-mon>`_.
Check your networks to ensure they
are running properly, because networks may have a significant impact on OSD
operation and performance.


Obtaining Data About OSDs
=========================

A good first step in troubleshooting your OSDs is to obtain information in
addition to the information you collected while `monitoring your OSDs`_
(e.g., ``ceph osd tree``).
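
For example, a quick first pass might look like the following (``osd.0`` is
just a placeholder id; adjust it for your cluster)::

    ceph -s                    # overall cluster status, including OSD up/in counts
    ceph osd tree              # which OSDs are up or down, and where they sit in the CRUSH hierarchy
    ceph osd df                # per-OSD utilization and PG counts
    ceph tell osd.0 version    # confirm that a specific OSD daemon is responsive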
+ + +Ceph Logs +--------- + +If you haven't changed the default path, you can find Ceph log files at +``/var/log/ceph``:: + + ls /var/log/ceph + +If you don't get enough log detail, you can change your logging level. See +`Logging and Debugging`_ for details to ensure that Ceph performs adequately +under high logging volume. + + +Admin Socket +------------ + +Use the admin socket tool to retrieve runtime information. For details, list +the sockets for your Ceph processes:: + + ls /var/run/ceph + +Then, execute the following, replacing ``{daemon-name}`` with an actual +daemon (e.g., ``osd.0``):: + + ceph daemon osd.0 help + +Alternatively, you can specify a ``{socket-file}`` (e.g., something in ``/var/run/ceph``):: + + ceph daemon {socket-file} help + + +The admin socket, among other things, allows you to: + +- List your configuration at runtime +- Dump historic operations +- Dump the operation priority queue state +- Dump operations in flight +- Dump perfcounters + + +Display Freespace +----------------- + +Filesystem issues may arise. To display your filesystem's free space, execute +``df``. :: + + df -h + +Execute ``df --help`` for additional usage. + + +I/O Statistics +-------------- + +Use `iostat`_ to identify I/O-related issues. :: + + iostat -x + + +Diagnostic Messages +------------------- + +To retrieve diagnostic messages, use ``dmesg`` with ``less``, ``more``, ``grep`` +or ``tail``. For example:: + + dmesg | grep scsi + + +Stopping w/out Rebalancing +========================== + +Periodically, you may need to perform maintenance on a subset of your cluster, +or resolve a problem that affects a failure domain (e.g., a rack). If you do not +want CRUSH to automatically rebalance the cluster as you stop OSDs for +maintenance, set the cluster to ``noout`` first:: + + ceph osd set noout + +Once the cluster is set to ``noout``, you can begin stopping the OSDs within the +failure domain that requires maintenance work. :: + + stop ceph-osd id={num} + +.. note:: Placement groups within the OSDs you stop will become ``degraded`` + while you are addressing issues with within the failure domain. + +Once you have completed your maintenance, restart the OSDs. :: + + start ceph-osd id={num} + +Finally, you must unset the cluster from ``noout``. :: + + ceph osd unset noout + + + +.. _osd-not-running: + +OSD Not Running +=============== + +Under normal circumstances, simply restarting the ``ceph-osd`` daemon will +allow it to rejoin the cluster and recover. + +An OSD Won't Start +------------------ + +If you start your cluster and an OSD won't start, check the following: + +- **Configuration File:** If you were not able to get OSDs running from + a new installation, check your configuration file to ensure it conforms + (e.g., ``host`` not ``hostname``, etc.). + +- **Check Paths:** Check the paths in your configuration, and the actual + paths themselves for data and journals. If you separate the OSD data from + the journal data and there are errors in your configuration file or in the + actual mounts, you may have trouble starting OSDs. If you want to store the + journal on a block device, you should partition your journal disk and assign + one partition per OSD. + +- **Check Max Threadcount:** If you have a node with a lot of OSDs, you may be + hitting the default maximum number of threads (e.g., usually 32k), especially + during recovery. 
You can increase the number of threads using ``sysctl`` to + see if increasing the maximum number of threads to the maximum possible + number of threads allowed (i.e., 4194303) will help. For example:: + + sysctl -w kernel.pid_max=4194303 + + If increasing the maximum thread count resolves the issue, you can make it + permanent by including a ``kernel.pid_max`` setting in the + ``/etc/sysctl.conf`` file. For example:: + + kernel.pid_max = 4194303 + +- **Kernel Version:** Identify the kernel version and distribution you + are using. Ceph uses some third party tools by default, which may be + buggy or may conflict with certain distributions and/or kernel + versions (e.g., Google perftools). Check the `OS recommendations`_ + to ensure you have addressed any issues related to your kernel. + +- **Segment Fault:** If there is a segment fault, turn your logging up + (if it is not already), and try again. If it segment faults again, + contact the ceph-devel email list and provide your Ceph configuration + file, your monitor output and the contents of your log file(s). + + + +An OSD Failed +------------- + +When a ``ceph-osd`` process dies, the monitor will learn about the failure +from surviving ``ceph-osd`` daemons and report it via the ``ceph health`` +command:: + + ceph health + HEALTH_WARN 1/3 in osds are down + +Specifically, you will get a warning whenever there are ``ceph-osd`` +processes that are marked ``in`` and ``down``. You can identify which +``ceph-osds`` are ``down`` with:: + + ceph health detail + HEALTH_WARN 1/3 in osds are down + osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080 + +If there is a disk +failure or other fault preventing ``ceph-osd`` from functioning or +restarting, an error message should be present in its log file in +``/var/log/ceph``. + +If the daemon stopped because of a heartbeat failure, the underlying +kernel file system may be unresponsive. Check ``dmesg`` output for disk +or other kernel errors. + +If the problem is a software error (failed assertion or other +unexpected error), it should be reported to the `ceph-devel`_ email list. + + +No Free Drive Space +------------------- + +Ceph prevents you from writing to a full OSD so that you don't lose data. +In an operational cluster, you should receive a warning when your cluster +is getting near its full ratio. The ``mon osd full ratio`` defaults to +``0.95``, or 95% of capacity before it stops clients from writing data. +The ``mon osd backfillfull ratio`` defaults to ``0.90``, or 90 % of +capacity when it blocks backfills from starting. The +``mon osd nearfull ratio`` defaults to ``0.85``, or 85% of capacity +when it generates a health warning. + +Full cluster issues usually arise when testing how Ceph handles an OSD +failure on a small cluster. When one node has a high percentage of the +cluster's data, the cluster can easily eclipse its nearfull and full ratio +immediately. If you are testing how Ceph reacts to OSD failures on a small +cluster, you should leave ample free disk space and consider temporarily +lowering the ``mon osd full ratio``, ``mon osd backfillfull ratio`` and +``mon osd nearfull ratio``. 
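
How you adjust these ratios at runtime depends on your Ceph release. As a
sketch, on recent releases the ratios live in the OSDMap and can be changed
with the commands below (the values are illustrative only and are not
recommended for production); on older releases the equivalent knobs were
``ceph pg set_nearfull_ratio`` and ``ceph pg set_full_ratio``::

    ceph osd df                           # check per-OSD utilization first
    ceph osd set-nearfull-ratio 0.90      # threshold for the nearfull health warning
    ceph osd set-backfillfull-ratio 0.95  # threshold at which backfills are refused
    ceph osd set-full-ratio 0.97          # threshold at which client writes are blocked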
+ +Full ``ceph-osds`` will be reported by ``ceph health``:: + + ceph health + HEALTH_WARN 1 nearfull osd(s) + +Or:: + + ceph health detail + HEALTH_ERR 1 full osd(s); 1 backfillfull osd(s); 1 nearfull osd(s) + osd.3 is full at 97% + osd.4 is backfill full at 91% + osd.2 is near full at 87% + +The best way to deal with a full cluster is to add new ``ceph-osds``, allowing +the cluster to redistribute data to the newly available storage. + +If you cannot start an OSD because it is full, you may delete some data by deleting +some placement group directories in the full OSD. + +.. important:: If you choose to delete a placement group directory on a full OSD, + **DO NOT** delete the same placement group directory on another full OSD, or + **YOU MAY LOSE DATA**. You **MUST** maintain at least one copy of your data on + at least one OSD. + +See `Monitor Config Reference`_ for additional details. + + +OSDs are Slow/Unresponsive +========================== + +A commonly recurring issue involves slow or unresponsive OSDs. Ensure that you +have eliminated other troubleshooting possibilities before delving into OSD +performance issues. For example, ensure that your network(s) is working properly +and your OSDs are running. Check to see if OSDs are throttling recovery traffic. + +.. tip:: Newer versions of Ceph provide better recovery handling by preventing + recovering OSDs from using up system resources so that ``up`` and ``in`` + OSDs are not available or are otherwise slow. + + +Networking Issues +----------------- + +Ceph is a distributed storage system, so it depends upon networks to peer with +OSDs, replicate objects, recover from faults and check heartbeats. Networking +issues can cause OSD latency and flapping OSDs. See `Flapping OSDs`_ for +details. + +Ensure that Ceph processes and Ceph-dependent processes are connected and/or +listening. :: + + netstat -a | grep ceph + netstat -l | grep ceph + sudo netstat -p | grep ceph + +Check network statistics. :: + + netstat -s + + +Drive Configuration +------------------- + +A storage drive should only support one OSD. Sequential read and sequential +write throughput can bottleneck if other processes share the drive, including +journals, operating systems, monitors, other OSDs and non-Ceph processes. + +Ceph acknowledges writes *after* journaling, so fast SSDs are an +attractive option to accelerate the response time--particularly when +using the ``XFS`` or ``ext4`` filesystems. By contrast, the ``btrfs`` +filesystem can write and journal simultaneously. (Note, however, that +we recommend against using ``btrfs`` for production deployments.) + +.. note:: Partitioning a drive does not change its total throughput or + sequential read/write limits. Running a journal in a separate partition + may help, but you should prefer a separate physical drive. + + +Bad Sectors / Fragmented Disk +----------------------------- + +Check your disks for bad sectors and fragmentation. This can cause total throughput +to drop substantially. + + +Co-resident Monitors/OSDs +------------------------- + +Monitors are generally light-weight processes, but they do lots of ``fsync()``, +which can interfere with other workloads, particularly if monitors run on the +same drive as your OSDs. Additionally, if you run monitors on the same host as +the OSDs, you may incur performance issues related to: + +- Running an older kernel (pre-3.0) +- Running Argonaut with an old ``glibc`` +- Running a kernel with no syncfs(2) syscall. 
+ +In these cases, multiple OSDs running on the same host can drag each other down +by doing lots of commits. That often leads to the bursty writes. + + +Co-resident Processes +--------------------- + +Spinning up co-resident processes such as a cloud-based solution, virtual +machines and other applications that write data to Ceph while operating on the +same hardware as OSDs can introduce significant OSD latency. Generally, we +recommend optimizing a host for use with Ceph and using other hosts for other +processes. The practice of separating Ceph operations from other applications +may help improve performance and may streamline troubleshooting and maintenance. + + +Logging Levels +-------------- + +If you turned logging levels up to track an issue and then forgot to turn +logging levels back down, the OSD may be putting a lot of logs onto the disk. If +you intend to keep logging levels high, you may consider mounting a drive to the +default path for logging (i.e., ``/var/log/ceph/$cluster-$name.log``). + + +Recovery Throttling +------------------- + +Depending upon your configuration, Ceph may reduce recovery rates to maintain +performance or it may increase recovery rates to the point that recovery +impacts OSD performance. Check to see if the OSD is recovering. + + +Kernel Version +-------------- + +Check the kernel version you are running. Older kernels may not receive +new backports that Ceph depends upon for better performance. + + +Kernel Issues with SyncFS +------------------------- + +Try running one OSD per host to see if performance improves. Old kernels +might not have a recent enough version of ``glibc`` to support ``syncfs(2)``. + + +Filesystem Issues +----------------- + +Currently, we recommend deploying clusters with XFS. + +We recommend against using btrfs or ext4. The btrfs filesystem has +many attractive features, but bugs in the filesystem may lead to +performance issues and suprious ENOSPC errors. We do not recommend +ext4 because xattr size limitations break our support for long object +names (needed for RGW). + +For more information, see `Filesystem Recommendations`_. + +.. _Filesystem Recommendations: ../configuration/filesystem-recommendations + + +Insufficient RAM +---------------- + +We recommend 1GB of RAM per OSD daemon. You may notice that during normal +operations, the OSD only uses a fraction of that amount (e.g., 100-200MB). +Unused RAM makes it tempting to use the excess RAM for co-resident applications, +VMs and so forth. However, when OSDs go into recovery mode, their memory +utilization spikes. If there is no RAM available, the OSD performance will slow +considerably. + + +Old Requests or Slow Requests +----------------------------- + +If a ``ceph-osd`` daemon is slow to respond to a request, it will generate log messages +complaining about requests that are taking too long. The warning threshold +defaults to 30 seconds, and is configurable via the ``osd op complaint time`` +option. When this happens, the cluster log will receive messages. 
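
To confirm the threshold a given OSD is actually using, or to raise it
temporarily while investigating, you can go through the admin socket
(``osd.0`` is a placeholder; the change lasts only until the daemon
restarts)::

    ceph daemon osd.0 config get osd_op_complaint_time
    ceph daemon osd.0 config set osd_op_complaint_time 60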

Legacy versions of Ceph complain about ``old requests``::

    osd.0 192.168.106.220:6800/18813 312 : [WRN] old request osd_op(client.5099.0:790 fatty_26485_object789 [write 0~4096] 2.5e54f643) v4 received at 2012-03-06 15:42:56.054801 currently waiting for sub ops

New versions of Ceph complain about ``slow requests``::

    {date} {osd.num} [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
    {date} {osd.num} [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]


Possible causes include:

- A bad drive (check ``dmesg`` output)
- A bug in the kernel file system (check ``dmesg`` output)
- An overloaded cluster (check system load, iostat, etc.)
- A bug in the ``ceph-osd`` daemon.

Possible solutions:

- Remove VMs or other cloud solutions from Ceph hosts
- Upgrade the kernel
- Upgrade Ceph
- Restart OSDs

Debugging Slow Requests
-----------------------

If you run ``ceph daemon osd.<id> dump_historic_ops`` or ``ceph daemon osd.<id>
dump_ops_in_flight``, you will see a set of operations and a list of events
each operation went through. These are briefly described below.

Events from the Messenger layer:

- header_read: when the messenger first started reading the message off the wire
- throttled: when the messenger tried to acquire memory throttle space to read
  the message into memory
- all_read: when the messenger finished reading the message off the wire
- dispatched: when the messenger gave the message to the OSD
- initiated: identical to header_read; the existence of both is a historical
  oddity

Events from the OSD as it prepares operations:

- queued_for_pg: the op has been put into the queue for processing by its PG
- reached_pg: the PG has started doing the op
- waiting for \*: the op is waiting for some other work to complete before it
  can proceed (a new OSDMap; for its object target to scrub; for the PG to
  finish peering; all as specified in the message)
- started: the op has been accepted as something the OSD should actually do
  (reasons not to do it: failed security/permission checks; out-of-date local
  state; etc.) and is now actually being performed
- waiting for subops from: the op has been sent to replica OSDs

Events from the FileStore:

- commit_queued_for_journal_write: the op has been given to the FileStore
- write_thread_in_journal_buffer: the op is in the journal's buffer and waiting
  to be persisted (as the next disk write)
- journaled_completion_queued: the op was journaled to disk and its callback
  queued for invocation

Events from the OSD after the data has been given to the local disk:

- op_commit: the op has been committed (i.e., written to the journal) by the
  primary OSD
- op_applied: the op has been written (with ``write()``) to the backing FS
  (i.e., applied in memory but not flushed out to disk) on the primary
- sub_op_applied: op_applied, but for a replica's "subop"
- sub_op_committed: op_committed, but for a replica's subop (only for EC pools)
- sub_op_commit_rec/sub_op_apply_rec from <X>: the primary marks this when it
  hears about the above, but for a particular replica <X>
- commit_sent: we sent a reply back to the client (or primary OSD, for sub ops)

Many of these events are seemingly redundant, but cross important boundaries in
the internal code (such as passing data across locks into new threads).
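
As a rough illustration of how this output can be used, the sketch below dumps
the recent operation history from one OSD and lists the slowest operations
first. It assumes ``osd.0``, that the ``jq`` utility is installed, and that the
JSON output contains an ``ops`` array with ``description`` and ``duration``
fields (field names can vary between releases)::

    ceph daemon osd.0 dump_historic_ops > /tmp/ops.json
    # show the five slowest recent operations, without their full event timelines
    jq '.ops | sort_by(-.duration) | .[:5] | .[] | {description, duration}' /tmp/ops.json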
+ +Flapping OSDs +============= + +We recommend using both a public (front-end) network and a cluster (back-end) +network so that you can better meet the capacity requirements of object +replication. Another advantage is that you can run a cluster network such that +it is not connected to the internet, thereby preventing some denial of service +attacks. When OSDs peer and check heartbeats, they use the cluster (back-end) +network when it's available. See `Monitor/OSD Interaction`_ for details. + +However, if the cluster (back-end) network fails or develops significant latency +while the public (front-end) network operates optimally, OSDs currently do not +handle this situation well. What happens is that OSDs mark each other ``down`` +on the monitor, while marking themselves ``up``. We call this scenario +'flapping`. + +If something is causing OSDs to 'flap' (repeatedly getting marked ``down`` and +then ``up`` again), you can force the monitors to stop the flapping with:: + + ceph osd set noup # prevent OSDs from getting marked up + ceph osd set nodown # prevent OSDs from getting marked down + +These flags are recorded in the osdmap structure:: + + ceph osd dump | grep flags + flags no-up,no-down + +You can clear the flags with:: + + ceph osd unset noup + ceph osd unset nodown + +Two other flags are supported, ``noin`` and ``noout``, which prevent +booting OSDs from being marked ``in`` (allocated data) or protect OSDs +from eventually being marked ``out`` (regardless of what the current value for +``mon osd down out interval`` is). + +.. note:: ``noup``, ``noout``, and ``nodown`` are temporary in the + sense that once the flags are cleared, the action they were blocking + should occur shortly after. The ``noin`` flag, on the other hand, + prevents OSDs from being marked ``in`` on boot, and any daemons that + started while the flag was set will remain that way. + + + + + + +.. _iostat: http://en.wikipedia.org/wiki/Iostat +.. _Ceph Logging and Debugging: ../../configuration/ceph-conf#ceph-logging-and-debugging +.. _Logging and Debugging: ../log-and-debug +.. _Debugging and Logging: ../debug +.. _Monitor/OSD Interaction: ../../configuration/mon-osd-interaction +.. _Monitor Config Reference: ../../configuration/mon-config-ref +.. _monitoring your OSDs: ../../operations/monitoring-osd-pg +.. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel +.. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel +.. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com +.. _unsubscribe from the ceph-users email list: mailto:ceph-users-leave@lists.ceph.com +.. _OS recommendations: ../../../start/os-recommendations +.. _ceph-devel: ceph-devel@vger.kernel.org diff --git a/src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst b/src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst new file mode 100644 index 0000000..4241fee --- /dev/null +++ b/src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst @@ -0,0 +1,668 @@ +===================== + Troubleshooting PGs +===================== + +Placement Groups Never Get Clean +================================ + +When you create a cluster and your cluster remains in ``active``, +``active+remapped`` or ``active+degraded`` status and never achieve an +``active+clean`` status, you likely have a problem with your configuration. + +You may need to review settings in the `Pool, PG and CRUSH Config Reference`_ +and make appropriate adjustments. 
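
For example, it often helps to confirm the replication and placement group
settings of the affected pool before digging deeper (``rbd`` is only a
placeholder pool name)::

    ceph osd pool get rbd size       # number of replicas
    ceph osd pool get rbd min_size   # replicas required to serve I/O
    ceph osd pool get rbd pg_num     # placement group count
    ceph osd pool ls detail          # the same information for all pools at once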
+ +As a general rule, you should run your cluster with more than one OSD and a +pool size greater than 1 object replica. + +One Node Cluster +---------------- + +Ceph no longer provides documentation for operating on a single node, because +you would never deploy a system designed for distributed computing on a single +node. Additionally, mounting client kernel modules on a single node containing a +Ceph daemon may cause a deadlock due to issues with the Linux kernel itself +(unless you use VMs for the clients). You can experiment with Ceph in a 1-node +configuration, in spite of the limitations as described herein. + +If you are trying to create a cluster on a single node, you must change the +default of the ``osd crush chooseleaf type`` setting from ``1`` (meaning +``host`` or ``node``) to ``0`` (meaning ``osd``) in your Ceph configuration +file before you create your monitors and OSDs. This tells Ceph that an OSD +can peer with another OSD on the same host. If you are trying to set up a +1-node cluster and ``osd crush chooseleaf type`` is greater than ``0``, +Ceph will try to peer the PGs of one OSD with the PGs of another OSD on +another node, chassis, rack, row, or even datacenter depending on the setting. + +.. tip:: DO NOT mount kernel clients directly on the same node as your + Ceph Storage Cluster, because kernel conflicts can arise. However, you + can mount kernel clients within virtual machines (VMs) on a single node. + +If you are creating OSDs using a single disk, you must create directories +for the data manually first. For example:: + + mkdir /var/local/osd0 /var/local/osd1 + ceph-deploy osd prepare {localhost-name}:/var/local/osd0 {localhost-name}:/var/local/osd1 + ceph-deploy osd activate {localhost-name}:/var/local/osd0 {localhost-name}:/var/local/osd1 + + +Fewer OSDs than Replicas +------------------------ + +If you have brought up two OSDs to an ``up`` and ``in`` state, but you still +don't see ``active + clean`` placement groups, you may have an +``osd pool default size`` set to greater than ``2``. + +There are a few ways to address this situation. If you want to operate your +cluster in an ``active + degraded`` state with two replicas, you can set the +``osd pool default min size`` to ``2`` so that you can write objects in +an ``active + degraded`` state. You may also set the ``osd pool default size`` +setting to ``2`` so that you only have two stored replicas (the original and +one replica), in which case the cluster should achieve an ``active + clean`` +state. + +.. note:: You can make the changes at runtime. If you make the changes in + your Ceph configuration file, you may need to restart your cluster. + + +Pool Size = 1 +------------- + +If you have the ``osd pool default size`` set to ``1``, you will only have +one copy of the object. OSDs rely on other OSDs to tell them which objects +they should have. If a first OSD has a copy of an object and there is no +second copy, then no second OSD can tell the first OSD that it should have +that copy. For each placement group mapped to the first OSD (see +``ceph pg dump``), you can force the first OSD to notice the placement groups +it needs by running:: + + ceph osd force-create-pg <pgid> + + +CRUSH Map Errors +---------------- + +Another candidate for placement groups remaining unclean involves errors +in your CRUSH map. + + +Stuck Placement Groups +====================== + +It is normal for placement groups to enter states like "degraded" or "peering" +following a failure. 
Normally these states indicate the normal progression +through the failure recovery process. However, if a placement group stays in one +of these states for a long time this may be an indication of a larger problem. +For this reason, the monitor will warn when placement groups get "stuck" in a +non-optimal state. Specifically, we check for: + +* ``inactive`` - The placement group has not been ``active`` for too long + (i.e., it hasn't been able to service read/write requests). + +* ``unclean`` - The placement group has not been ``clean`` for too long + (i.e., it hasn't been able to completely recover from a previous failure). + +* ``stale`` - The placement group status has not been updated by a ``ceph-osd``, + indicating that all nodes storing this placement group may be ``down``. + +You can explicitly list stuck placement groups with one of:: + + ceph pg dump_stuck stale + ceph pg dump_stuck inactive + ceph pg dump_stuck unclean + +For stuck ``stale`` placement groups, it is normally a matter of getting the +right ``ceph-osd`` daemons running again. For stuck ``inactive`` placement +groups, it is usually a peering problem (see :ref:`failures-osd-peering`). For +stuck ``unclean`` placement groups, there is usually something preventing +recovery from completing, like unfound objects (see +:ref:`failures-osd-unfound`); + + + +.. _failures-osd-peering: + +Placement Group Down - Peering Failure +====================================== + +In certain cases, the ``ceph-osd`` `Peering` process can run into +problems, preventing a PG from becoming active and usable. For +example, ``ceph health`` might report:: + + ceph health detail + HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down + ... + pg 0.5 is down+peering + pg 1.4 is down+peering + ... + osd.1 is down since epoch 69, last address 192.168.106.220:6801/8651 + +We can query the cluster to determine exactly why the PG is marked ``down`` with:: + + ceph pg 0.5 query + +.. code-block:: javascript + + { "state": "down+peering", + ... + "recovery_state": [ + { "name": "Started\/Primary\/Peering\/GetInfo", + "enter_time": "2012-03-06 14:40:16.169679", + "requested_info_from": []}, + { "name": "Started\/Primary\/Peering", + "enter_time": "2012-03-06 14:40:16.169659", + "probing_osds": [ + 0, + 1], + "blocked": "peering is blocked due to down osds", + "down_osds_we_would_probe": [ + 1], + "peering_blocked_by": [ + { "osd": 1, + "current_lost_at": 0, + "comment": "starting or marking this osd lost may let us proceed"}]}, + { "name": "Started", + "enter_time": "2012-03-06 14:40:16.169513"} + ] + } + +The ``recovery_state`` section tells us that peering is blocked due to +down ``ceph-osd`` daemons, specifically ``osd.1``. In this case, we can start that ``ceph-osd`` +and things will recover. + +Alternatively, if there is a catastrophic failure of ``osd.1`` (e.g., disk +failure), we can tell the cluster that it is ``lost`` and to cope as +best it can. + +.. important:: This is dangerous in that the cluster cannot + guarantee that the other copies of the data are consistent + and up to date. + +To instruct Ceph to continue anyway:: + + ceph osd lost 1 + +Recovery will proceed. + + +.. 
_failures-osd-unfound: + +Unfound Objects +=============== + +Under certain combinations of failures Ceph may complain about +``unfound`` objects:: + + ceph health detail + HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%) + pg 2.4 is active+degraded, 78 unfound + +This means that the storage cluster knows that some objects (or newer +copies of existing objects) exist, but it hasn't found copies of them. +One example of how this might come about for a PG whose data is on ceph-osds +1 and 2: + +* 1 goes down +* 2 handles some writes, alone +* 1 comes up +* 1 and 2 repeer, and the objects missing on 1 are queued for recovery. +* Before the new objects are copied, 2 goes down. + +Now 1 knows that these object exist, but there is no live ``ceph-osd`` who +has a copy. In this case, IO to those objects will block, and the +cluster will hope that the failed node comes back soon; this is +assumed to be preferable to returning an IO error to the user. + +First, you can identify which objects are unfound with:: + + ceph pg 2.4 list_missing [starting offset, in json] + +.. code-block:: javascript + + { "offset": { "oid": "", + "key": "", + "snapid": 0, + "hash": 0, + "max": 0}, + "num_missing": 0, + "num_unfound": 0, + "objects": [ + { "oid": "object 1", + "key": "", + "hash": 0, + "max": 0 }, + ... + ], + "more": 0} + +If there are too many objects to list in a single result, the ``more`` +field will be true and you can query for more. (Eventually the +command line tool will hide this from you, but not yet.) + +Second, you can identify which OSDs have been probed or might contain +data:: + + ceph pg 2.4 query + +.. code-block:: javascript + + "recovery_state": [ + { "name": "Started\/Primary\/Active", + "enter_time": "2012-03-06 15:15:46.713212", + "might_have_unfound": [ + { "osd": 1, + "status": "osd is down"}]}, + +In this case, for example, the cluster knows that ``osd.1`` might have +data, but it is ``down``. The full range of possible states include: + +* already probed +* querying +* OSD is down +* not queried (yet) + +Sometimes it simply takes some time for the cluster to query possible +locations. + +It is possible that there are other locations where the object can +exist that are not listed. For example, if a ceph-osd is stopped and +taken out of the cluster, the cluster fully recovers, and due to some +future set of failures ends up with an unfound object, it won't +consider the long-departed ceph-osd as a potential location to +consider. (This scenario, however, is unlikely.) + +If all possible locations have been queried and objects are still +lost, you may have to give up on the lost objects. This, again, is +possible given unusual combinations of failures that allow the cluster +to learn about writes that were performed before the writes themselves +are recovered. To mark the "unfound" objects as "lost":: + + ceph pg 2.5 mark_unfound_lost revert|delete + +This the final argument specifies how the cluster should deal with +lost objects. + +The "delete" option will forget about them entirely. + +The "revert" option (not available for erasure coded pools) will +either roll back to a previous version of the object or (if it was a +new object) forget about it entirely. Use this with caution, as it +may confuse applications that expected the object to exist. + + +Homeless Placement Groups +========================= + +It is possible for all OSDs that had copies of a given placement groups to fail. 
+If that's the case, that subset of the object store is unavailable, and the +monitor will receive no status updates for those placement groups. To detect +this situation, the monitor marks any placement group whose primary OSD has +failed as ``stale``. For example:: + + ceph health + HEALTH_WARN 24 pgs stale; 3/300 in osds are down + +You can identify which placement groups are ``stale``, and what the last OSDs to +store them were, with:: + + ceph health detail + HEALTH_WARN 24 pgs stale; 3/300 in osds are down + ... + pg 2.5 is stuck stale+active+remapped, last acting [2,0] + ... + osd.10 is down since epoch 23, last address 192.168.106.220:6800/11080 + osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539 + osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861 + +If we want to get placement group 2.5 back online, for example, this tells us that +it was last managed by ``osd.0`` and ``osd.2``. Restarting those ``ceph-osd`` +daemons will allow the cluster to recover that placement group (and, presumably, +many others). + + +Only a Few OSDs Receive Data +============================ + +If you have many nodes in your cluster and only a few of them receive data, +`check`_ the number of placement groups in your pool. Since placement groups get +mapped to OSDs, a small number of placement groups will not distribute across +your cluster. Try creating a pool with a placement group count that is a +multiple of the number of OSDs. See `Placement Groups`_ for details. The default +placement group count for pools is not useful, but you can change it `here`_. + + +Can't Write Data +================ + +If your cluster is up, but some OSDs are down and you cannot write data, +check to ensure that you have the minimum number of OSDs running for the +placement group. If you don't have the minimum number of OSDs running, +Ceph will not allow you to write data because there is no guarantee +that Ceph can replicate your data. See ``osd pool default min size`` +in the `Pool, PG and CRUSH Config Reference`_ for details. + + +PGs Inconsistent +================ + +If you receive an ``active + clean + inconsistent`` state, this may happen +due to an error during scrubbing. As always, we can identify the inconsistent +placement group(s) with:: + + $ ceph health detail + HEALTH_ERR 1 pgs inconsistent; 2 scrub errors + pg 0.6 is active+clean+inconsistent, acting [0,1,2] + 2 scrub errors + +Or if you prefer inspecting the output in a programmatic way:: + + $ rados list-inconsistent-pg rbd + ["0.6"] + +There is only one consistent state, but in the worst case, we could have +different inconsistencies in multiple perspectives found in more than one +objects. If an object named ``foo`` in PG ``0.6`` is truncated, we will have:: + + $ rados list-inconsistent-obj 0.6 --format=json-pretty + +.. 
code-block:: javascript + + { + "epoch": 14, + "inconsistents": [ + { + "object": { + "name": "foo", + "nspace": "", + "locator": "", + "snap": "head", + "version": 1 + }, + "errors": [ + "data_digest_mismatch", + "size_mismatch" + ], + "union_shard_errors": [ + "data_digest_mismatch_oi", + "size_mismatch_oi" + ], + "selected_object_info": "0:602f83fe:::foo:head(16'1 client.4110.0:1 dirty|data_digest|omap_digest s 968 uv 1 dd e978e67f od ffffffff alloc_hint [0 0 0])", + "shards": [ + { + "osd": 0, + "errors": [], + "size": 968, + "omap_digest": "0xffffffff", + "data_digest": "0xe978e67f" + }, + { + "osd": 1, + "errors": [], + "size": 968, + "omap_digest": "0xffffffff", + "data_digest": "0xe978e67f" + }, + { + "osd": 2, + "errors": [ + "data_digest_mismatch_oi", + "size_mismatch_oi" + ], + "size": 0, + "omap_digest": "0xffffffff", + "data_digest": "0xffffffff" + } + ] + } + ] + } + +In this case, we can learn from the output: + +* The only inconsistent object is named ``foo``, and it is its head that has + inconsistencies. +* The inconsistencies fall into two categories: + + * ``errors``: these errors indicate inconsistencies between shards without a + determination of which shard(s) are bad. Check for the ``errors`` in the + `shards` array, if available, to pinpoint the problem. + + * ``data_digest_mismatch``: the digest of the replica read from OSD.2 is + different from the ones of OSD.0 and OSD.1 + * ``size_mismatch``: the size of the replica read from OSD.2 is 0, while + the size reported by OSD.0 and OSD.1 is 968. + * ``union_shard_errors``: the union of all shard specific ``errors`` in + ``shards`` array. The ``errors`` are set for the given shard that has the + problem. They include errors like ``read_error``. The ``errors`` ending in + ``oi`` indicate a comparison with ``selected_object_info``. Look at the + ``shards`` array to determine which shard has which error(s). + + * ``data_digest_mismatch_oi``: the digest stored in the object-info is not + ``0xffffffff``, which is calculated from the shard read from OSD.2 + * ``size_mismatch_oi``: the size stored in the object-info is different + from the one read from OSD.2. The latter is 0. + +You can repair the inconsistent placement group by executing:: + + ceph pg repair {placement-group-ID} + +Which overwrites the `bad` copies with the `authoritative` ones. In most cases, +Ceph is able to choose authoritative copies from all available replicas using +some predefined criteria. But this does not always work. For example, the stored +data digest could be missing, and the calculated digest will be ignored when +choosing the authoritative copies. So, please use the above command with caution. + +If ``read_error`` is listed in the ``errors`` attribute of a shard, the +inconsistency is likely due to disk errors. You might want to check your disk +used by that OSD. + +If you receive ``active + clean + inconsistent`` states periodically due to +clock skew, you may consider configuring your `NTP`_ daemons on your +monitor hosts to act as peers. See `The Network Time Protocol`_ and Ceph +`Clock Settings`_ for additional details. + + +Erasure Coded PGs are not active+clean +====================================== + +When CRUSH fails to find enough OSDs to map to a PG, it will show as a +``2147483647`` which is ITEM_NONE or ``no OSD found``. For instance:: + + [2,1,6,0,5,8,2147483647,7,4] + +Not enough OSDs +--------------- + +If the Ceph cluster only has 8 OSDs and the erasure coded pool needs +9, that is what it will show. 
You can either create another erasure +coded pool that requires less OSDs:: + + ceph osd erasure-code-profile set myprofile k=5 m=3 + ceph osd pool create erasurepool 16 16 erasure myprofile + +or add a new OSDs and the PG will automatically use them. + +CRUSH constraints cannot be satisfied +------------------------------------- + +If the cluster has enough OSDs, it is possible that the CRUSH ruleset +imposes constraints that cannot be satisfied. If there are 10 OSDs on +two hosts and the CRUSH rulesets require that no two OSDs from the +same host are used in the same PG, the mapping may fail because only +two OSD will be found. You can check the constraint by displaying the +ruleset:: + + $ ceph osd crush rule ls + [ + "replicated_ruleset", + "erasurepool"] + $ ceph osd crush rule dump erasurepool + { "rule_id": 1, + "rule_name": "erasurepool", + "ruleset": 1, + "type": 3, + "min_size": 3, + "max_size": 20, + "steps": [ + { "op": "take", + "item": -1, + "item_name": "default"}, + { "op": "chooseleaf_indep", + "num": 0, + "type": "host"}, + { "op": "emit"}]} + + +You can resolve the problem by creating a new pool in which PGs are allowed +to have OSDs residing on the same host with:: + + ceph osd erasure-code-profile set myprofile crush-failure-domain=osd + ceph osd pool create erasurepool 16 16 erasure myprofile + +CRUSH gives up too soon +----------------------- + +If the Ceph cluster has just enough OSDs to map the PG (for instance a +cluster with a total of 9 OSDs and an erasure coded pool that requires +9 OSDs per PG), it is possible that CRUSH gives up before finding a +mapping. It can be resolved by: + +* lowering the erasure coded pool requirements to use less OSDs per PG + (that requires the creation of another pool as erasure code profiles + cannot be dynamically modified). + +* adding more OSDs to the cluster (that does not require the erasure + coded pool to be modified, it will become clean automatically) + +* use a hand made CRUSH ruleset that tries more times to find a good + mapping. It can be done by setting ``set_choose_tries`` to a value + greater than the default. + +You should first verify the problem with ``crushtool`` after +extracting the crushmap from the cluster so your experiments do not +modify the Ceph cluster and only work on a local files:: + + $ ceph osd crush rule dump erasurepool + { "rule_name": "erasurepool", + "ruleset": 1, + "type": 3, + "min_size": 3, + "max_size": 20, + "steps": [ + { "op": "take", + "item": -1, + "item_name": "default"}, + { "op": "chooseleaf_indep", + "num": 0, + "type": "host"}, + { "op": "emit"}]} + $ ceph osd getcrushmap > crush.map + got crush map from osdmap epoch 13 + $ crushtool -i crush.map --test --show-bad-mappings \ + --rule 1 \ + --num-rep 9 \ + --min-x 1 --max-x $((1024 * 1024)) + bad mapping rule 8 x 43 num_rep 9 result [3,2,7,1,2147483647,8,5,6,0] + bad mapping rule 8 x 79 num_rep 9 result [6,0,2,1,4,7,2147483647,5,8] + bad mapping rule 8 x 173 num_rep 9 result [0,4,6,8,2,1,3,7,2147483647] + +Where ``--num-rep`` is the number of OSDs the erasure code crush +ruleset needs, ``--rule`` is the value of the ``ruleset`` field +displayed by ``ceph osd crush rule dump``. The test will try mapping +one million values (i.e. the range defined by ``[--min-x,--max-x]``) +and must display at least one bad mapping. If it outputs nothing it +means all mappings are successfull and you can stop right there: the +problem is elsewhere. 
+ +The crush ruleset can be edited by decompiling the crush map:: + + $ crushtool --decompile crush.map > crush.txt + +and adding the following line to the ruleset:: + + step set_choose_tries 100 + +The relevant part of of the ``crush.txt`` file should look something +like:: + + rule erasurepool { + ruleset 1 + type erasure + min_size 3 + max_size 20 + step set_chooseleaf_tries 5 + step set_choose_tries 100 + step take default + step chooseleaf indep 0 type host + step emit + } + +It can then be compiled and tested again:: + + $ crushtool --compile crush.txt -o better-crush.map + +When all mappings succeed, an histogram of the number of tries that +were necessary to find all of them can be displayed with the +``--show-choose-tries`` option of ``crushtool``:: + + $ crushtool -i better-crush.map --test --show-bad-mappings \ + --show-choose-tries \ + --rule 1 \ + --num-rep 9 \ + --min-x 1 --max-x $((1024 * 1024)) + ... + 11: 42 + 12: 44 + 13: 54 + 14: 45 + 15: 35 + 16: 34 + 17: 30 + 18: 25 + 19: 19 + 20: 22 + 21: 20 + 22: 17 + 23: 13 + 24: 16 + 25: 13 + 26: 11 + 27: 11 + 28: 13 + 29: 11 + 30: 10 + 31: 6 + 32: 5 + 33: 10 + 34: 3 + 35: 7 + 36: 5 + 37: 2 + 38: 5 + 39: 5 + 40: 2 + 41: 5 + 42: 4 + 43: 1 + 44: 2 + 45: 2 + 46: 3 + 47: 1 + 48: 0 + ... + 102: 0 + 103: 1 + 104: 0 + ... + +It took 11 tries to map 42 PGs, 12 tries to map 44 PGs etc. The highest number of tries is the minimum value of ``set_choose_tries`` that prevents bad mappings (i.e. 103 in the above output because it did not take more than 103 tries for any PG to be mapped). + +.. _check: ../../operations/placement-groups#get-the-number-of-placement-groups +.. _here: ../../configuration/pool-pg-config-ref +.. _Placement Groups: ../../operations/placement-groups +.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref +.. _NTP: http://en.wikipedia.org/wiki/Network_Time_Protocol +.. _The Network Time Protocol: http://www.ntp.org/ +.. _Clock Settings: ../../configuration/mon-config-ref/#clock + + |