From 7da45d65be36d36b880cc55c5036e96c24b53f00 Mon Sep 17 00:00:00 2001 From: Qiaowei Ren Date: Thu, 1 Mar 2018 14:38:11 +0800 Subject: remove ceph code This patch removes initial ceph code, due to license issue. Change-Id: I092d44f601cdf34aed92300fe13214925563081c Signed-off-by: Qiaowei Ren --- src/ceph/doc/rados/troubleshooting/community.rst | 29 - .../doc/rados/troubleshooting/cpu-profiling.rst | 67 --- src/ceph/doc/rados/troubleshooting/index.rst | 19 - .../doc/rados/troubleshooting/log-and-debug.rst | 550 ----------------- .../doc/rados/troubleshooting/memory-profiling.rst | 142 ----- .../rados/troubleshooting/troubleshooting-mon.rst | 567 ----------------- .../rados/troubleshooting/troubleshooting-osd.rst | 536 ----------------- .../rados/troubleshooting/troubleshooting-pg.rst | 668 --------------------- 8 files changed, 2578 deletions(-) delete mode 100644 src/ceph/doc/rados/troubleshooting/community.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/cpu-profiling.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/index.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/log-and-debug.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/memory-profiling.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/troubleshooting-osd.rst delete mode 100644 src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst (limited to 'src/ceph/doc/rados/troubleshooting') diff --git a/src/ceph/doc/rados/troubleshooting/community.rst b/src/ceph/doc/rados/troubleshooting/community.rst deleted file mode 100644 index 9faad13..0000000 --- a/src/ceph/doc/rados/troubleshooting/community.rst +++ /dev/null @@ -1,29 +0,0 @@ -==================== - The Ceph Community -==================== - -The Ceph community is an excellent source of information and help. For -operational issues with Ceph releases we recommend you `subscribe to the -ceph-users email list`_. When you no longer want to receive emails, you can -`unsubscribe from the ceph-users email list`_. - -You may also `subscribe to the ceph-devel email list`_. You should do so if -your issue is: - -- Likely related to a bug -- Related to a development release package -- Related to a development testing package -- Related to your own builds - -If you no longer want to receive emails from the ``ceph-devel`` email list, you -may `unsubscribe from the ceph-devel email list`_. - -.. tip:: The Ceph community is growing rapidly, and community members can help - you if you provide them with detailed information about your problem. You - can attach the output of the ``ceph report`` command to help people understand your issues. - -.. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel -.. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel -.. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com -.. _unsubscribe from the ceph-users email list: mailto:ceph-users-leave@lists.ceph.com -.. 
_ceph-devel: ceph-devel@vger.kernel.org \ No newline at end of file diff --git a/src/ceph/doc/rados/troubleshooting/cpu-profiling.rst b/src/ceph/doc/rados/troubleshooting/cpu-profiling.rst deleted file mode 100644 index 159f799..0000000 --- a/src/ceph/doc/rados/troubleshooting/cpu-profiling.rst +++ /dev/null @@ -1,67 +0,0 @@ -=============== - CPU Profiling -=============== - -If you built Ceph from source and compiled Ceph for use with `oprofile`_ -you can profile Ceph's CPU usage. See `Installing Oprofile`_ for details. - - -Initializing oprofile -===================== - -The first time you use ``oprofile`` you need to initialize it. Locate the -``vmlinux`` image corresponding to the kernel you are now running. :: - - ls /boot - sudo opcontrol --init - sudo opcontrol --setup --vmlinux={path-to-image} --separate=library --callgraph=6 - - -Starting oprofile -================= - -To start ``oprofile`` execute the following command:: - - opcontrol --start - -Once you start ``oprofile``, you may run some tests with Ceph. - - -Stopping oprofile -================= - -To stop ``oprofile`` execute the following command:: - - opcontrol --stop - - -Retrieving oprofile Results -=========================== - -To retrieve the top ``cmon`` results, execute the following command:: - - opreport -gal ./cmon | less - - -To retrieve the top ``cmon`` results with call graphs attached, execute the -following command:: - - opreport -cal ./cmon | less - -.. important:: After reviewing results, you should reset ``oprofile`` before - running it again. Resetting ``oprofile`` removes data from the session - directory. - - -Resetting oprofile -================== - -To reset ``oprofile``, execute the following command:: - - sudo opcontrol --reset - -.. important:: You should reset ``oprofile`` after analyzing data so that - you do not commingle results from different tests. - -.. _oprofile: http://oprofile.sourceforge.net/about/ -.. _Installing Oprofile: ../../../dev/cpu-profiler diff --git a/src/ceph/doc/rados/troubleshooting/index.rst b/src/ceph/doc/rados/troubleshooting/index.rst deleted file mode 100644 index 80d14f3..0000000 --- a/src/ceph/doc/rados/troubleshooting/index.rst +++ /dev/null @@ -1,19 +0,0 @@ -================= - Troubleshooting -================= - -Ceph is still on the leading edge, so you may encounter situations that require -you to examine your configuration, modify your logging output, troubleshoot -monitors and OSDs, profile memory and CPU usage, and reach out to the -Ceph community for help. - -.. toctree:: - :maxdepth: 1 - - community - log-and-debug - troubleshooting-mon - troubleshooting-osd - troubleshooting-pg - memory-profiling - cpu-profiling diff --git a/src/ceph/doc/rados/troubleshooting/log-and-debug.rst b/src/ceph/doc/rados/troubleshooting/log-and-debug.rst deleted file mode 100644 index c91f272..0000000 --- a/src/ceph/doc/rados/troubleshooting/log-and-debug.rst +++ /dev/null @@ -1,550 +0,0 @@ -======================= - Logging and Debugging -======================= - -Typically, when you add debugging to your Ceph configuration, you do so at -runtime. You can also add Ceph debug logging to your Ceph configuration file if -you are encountering issues when starting your cluster. You may view Ceph log -files under ``/var/log/ceph`` (the default location). - -.. tip:: When debug output slows down your system, the latency can hide - race conditions. - -Logging is resource intensive. 
If you are encountering a problem in a specific -area of your cluster, enable logging for that area of the cluster. For example, -if your OSDs are running fine, but your metadata servers are not, you should -start by enabling debug logging for the specific metadata server instance(s) -giving you trouble. Enable logging for each subsystem as needed. - -.. important:: Verbose logging can generate over 1GB of data per hour. If your - OS disk reaches its capacity, the node will stop working. - -If you enable or increase the rate of Ceph logging, ensure that you have -sufficient disk space on your OS disk. See `Accelerating Log Rotation`_ for -details on rotating log files. When your system is running well, remove -unnecessary debugging settings to ensure your cluster runs optimally. Logging -debug output messages is relatively slow, and a waste of resources when -operating your cluster. - -See `Subsystem, Log and Debug Settings`_ for details on available settings. - -Runtime -======= - -If you would like to see the configuration settings at runtime, you must log -in to a host with a running daemon and execute the following:: - - ceph daemon {daemon-name} config show | less - -For example,:: - - ceph daemon osd.0 config show | less - -To activate Ceph's debugging output (*i.e.*, ``dout()``) at runtime, use the -``ceph tell`` command to inject arguments into the runtime configuration:: - - ceph tell {daemon-type}.{daemon id or *} injectargs --{name} {value} [--{name} {value}] - -Replace ``{daemon-type}`` with one of ``osd``, ``mon`` or ``mds``. You may apply -the runtime setting to all daemons of a particular type with ``*``, or specify -a specific daemon's ID. For example, to increase -debug logging for a ``ceph-osd`` daemon named ``osd.0``, execute the following:: - - ceph tell osd.0 injectargs --debug-osd 0/5 - -The ``ceph tell`` command goes through the monitors. If you cannot bind to the -monitor, you can still make the change by logging into the host of the daemon -whose configuration you'd like to change using ``ceph daemon``. -For example:: - - sudo ceph daemon osd.0 config set debug_osd 0/5 - -See `Subsystem, Log and Debug Settings`_ for details on available settings. - - -Boot Time -========= - -To activate Ceph's debugging output (*i.e.*, ``dout()``) at boot time, you must -add settings to your Ceph configuration file. Subsystems common to each daemon -may be set under ``[global]`` in your configuration file. Subsystems for -particular daemons are set under the daemon section in your configuration file -(*e.g.*, ``[mon]``, ``[osd]``, ``[mds]``). For example:: - - [global] - debug ms = 1/5 - - [mon] - debug mon = 20 - debug paxos = 1/5 - debug auth = 2 - - [osd] - debug osd = 1/5 - debug filestore = 1/5 - debug journal = 1 - debug monc = 5/20 - - [mds] - debug mds = 1 - debug mds balancer = 1 - - -See `Subsystem, Log and Debug Settings`_ for details. - - -Accelerating Log Rotation -========================= - -If your OS disk is relatively full, you can accelerate log rotation by modifying -the Ceph log rotation file at ``/etc/logrotate.d/ceph``. Add a size setting -after the rotation frequency to accelerate log rotation (via cronjob) if your -logs exceed the size setting. For example, the default setting looks like -this:: - - rotate 7 - weekly - compress - sharedscripts - -Modify it by adding a ``size`` setting. :: - - rotate 7 - weekly - size 500M - compress - sharedscripts - -Then, start the crontab editor for your user space. 
::

    crontab -e

Finally, add an entry to check the ``/etc/logrotate.d/ceph`` file. ::

    30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1

The preceding example checks the ``/etc/logrotate.d/ceph`` file every 30 minutes.


Valgrind
========

Debugging may also require you to track down memory and threading issues.
You can run a single daemon, a type of daemon, or the whole cluster with
Valgrind. You should only use Valgrind when developing or debugging Ceph;
Valgrind is computationally expensive and will otherwise slow down your system.
Valgrind messages are logged to ``stderr``.


Subsystem, Log and Debug Settings
=================================

In most cases, you will enable debug logging output via subsystems.

Ceph Subsystems
---------------

Each subsystem has a logging level for its output logs and for its in-memory
logs. You may set different values for each of these subsystems by setting a
log file level and a memory level for debug logging. Ceph's logging levels
operate on a scale of ``1`` to ``20``, where ``1`` is terse and ``20`` is
verbose [#]_. In general, the in-memory logs are not sent to the output log
unless:

- a fatal signal is raised, or
- an ``assert`` in the source code is triggered, or
- it is explicitly requested, for example through the admin socket (consult the
  admin socket documentation for more details).

A debug logging setting can take a single value for the log level and the
memory level, which sets them both to the same value. For example, if you
specify ``debug ms = 5``, Ceph will treat it as a log level and a memory level
of ``5``. You may also specify them separately. The first setting is the log
level, and the second setting is the memory level. You must separate them with
a forward slash (/). For example, if you want to set the ``ms`` subsystem's
debug logging level to ``1`` and its memory level to ``5``, you would specify it
as ``debug ms = 1/5``. For example:

.. code-block:: ini

    debug {subsystem} = {log-level}/{memory-level}
    # for example
    debug mds balancer = 1/20

The following table provides a list of Ceph subsystems and their default log and
memory levels. Once you complete your logging efforts, restore the subsystems
to their default level or to a level suitable for normal operations.
- - -+--------------------+-----------+--------------+ -| Subsystem | Log Level | Memory Level | -+====================+===========+==============+ -| ``default`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``lockdep`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``context`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``crush`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``mds`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``mds balancer`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``mds locker`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``mds log`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``mds log expire`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``mds migrator`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``buffer`` | 0 | 0 | -+--------------------+-----------+--------------+ -| ``timer`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``filer`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``objecter`` | 0 | 0 | -+--------------------+-----------+--------------+ -| ``rados`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``rbd`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``journaler`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``objectcacher`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``client`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``osd`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``optracker`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``objclass`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``filestore`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``journal`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``ms`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``mon`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``monc`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``paxos`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``tp`` | 0 | 5 | -+--------------------+-----------+--------------+ -| ``auth`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``finisher`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``heartbeatmap`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``perfcounter`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``rgw`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``javaclient`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``asok`` | 1 | 5 | -+--------------------+-----------+--------------+ -| ``throttle`` | 1 | 5 | -+--------------------+-----------+--------------+ - - -Logging Settings ----------------- - -Logging and debugging settings are not required in a Ceph configuration file, -but you may override default settings as needed. Ceph supports the following -settings: - - -``log file`` - -:Description: The location of the logging file for your cluster. -:Type: String -:Required: No -:Default: ``/var/log/ceph/$cluster-$name.log`` - - -``log max new`` - -:Description: The maximum number of new log files. 
:Type: Integer
:Required: No
:Default: ``1000``


``log max recent``

:Description: The maximum number of recent events to include in a log file.
:Type: Integer
:Required: No
:Default: ``1000000``


``log to stderr``

:Description: Determines if logging messages should appear in ``stderr``.
:Type: Boolean
:Required: No
:Default: ``true``


``err to stderr``

:Description: Determines if error messages should appear in ``stderr``.
:Type: Boolean
:Required: No
:Default: ``true``


``log to syslog``

:Description: Determines if logging messages should appear in ``syslog``.
:Type: Boolean
:Required: No
:Default: ``false``


``err to syslog``

:Description: Determines if error messages should appear in ``syslog``.
:Type: Boolean
:Required: No
:Default: ``false``


``log flush on exit``

:Description: Determines if Ceph should flush the log files after exit.
:Type: Boolean
:Required: No
:Default: ``true``


``clog to monitors``

:Description: Determines if ``clog`` messages should be sent to monitors.
:Type: Boolean
:Required: No
:Default: ``true``


``clog to syslog``

:Description: Determines if ``clog`` messages should be sent to syslog.
:Type: Boolean
:Required: No
:Default: ``false``


``mon cluster log to syslog``

:Description: Determines if the cluster log should be output to the syslog.
:Type: Boolean
:Required: No
:Default: ``false``


``mon cluster log file``

:Description: The location of the cluster's log file.
:Type: String
:Required: No
:Default: ``/var/log/ceph/$cluster.log``



OSD
---


``osd debug drop ping probability``

:Description: The probability that the OSD will deliberately drop a heartbeat
              ping message (for debugging and testing only).
:Type: Double
:Required: No
:Default: 0


``osd debug drop ping duration``

:Description: How long the OSD keeps dropping heartbeat ping messages once a
              drop has been triggered (for debugging and testing only).
:Type: Integer
:Required: No
:Default: 0

``osd debug drop pg create probability``

:Description: The probability that the OSD will deliberately drop a PG create
              message (for debugging and testing only).
:Type: Integer
:Required: No
:Default: 0

``osd debug drop pg create duration``

:Description: How long the OSD keeps dropping PG create messages once a drop
              has been triggered (for debugging and testing only).
:Type: Double
:Required: No
:Default: 1


``osd tmapput sets uses tmap``

:Description: Causes ``tmapput`` operations to use ``tmap``. For debugging only.
:Type: Boolean
:Required: No
:Default: ``false``


``osd min pg log entries``

:Description: The minimum number of log entries for placement groups.
:Type: 32-bit Unsigned Integer
:Required: No
:Default: 1000


``osd op log threshold``

:Description: How many operations log messages to display at a time.
:Type: Integer
:Required: No
:Default: 5



Filestore
---------

``filestore debug omap check``

:Description: Enables a debugging check on ``omap`` synchronization. This is an
              expensive operation.
:Type: Boolean
:Required: No
:Default: ``false``


MDS
---


``mds debug scatterstat``

:Description: Ceph will assert that various recursive stat invariants are true
              (for developers only).
:Type: Boolean
:Required: No
:Default: ``false``


``mds debug frag``

:Description: Ceph will verify directory fragmentation invariants when
              convenient (for developers only).
:Type: Boolean
:Required: No
:Default: ``false``


``mds debug auth pins``

:Description: Enables debug checks on auth pin invariants (for developers only).
:Type: Boolean
:Required: No
:Default: ``false``


``mds debug subtrees``

:Description: Enables debug checks on subtree invariants (for developers only).
:Type: Boolean
:Required: No
:Default: ``false``



RADOS Gateway
-------------


``rgw log nonexistent bucket``

:Description: Determines whether RGW should log requests for buckets that do not exist.
-:Type: Boolean -:Required: No -:Default: ``false`` - - -``rgw log object name`` - -:Description: Should an object's name be logged. // man date to see codes (a subset are supported) -:Type: String -:Required: No -:Default: ``%Y-%m-%d-%H-%i-%n`` - - -``rgw log object name utc`` - -:Description: Object log name contains UTC? -:Type: Boolean -:Required: No -:Default: ``false`` - - -``rgw enable ops log`` - -:Description: Enables logging of every RGW operation. -:Type: Boolean -:Required: No -:Default: ``true`` - - -``rgw enable usage log`` - -:Description: Enable logging of RGW's bandwidth usage. -:Type: Boolean -:Required: No -:Default: ``true`` - - -``rgw usage log flush threshold`` - -:Description: Threshold to flush pending log data. -:Type: Integer -:Required: No -:Default: ``1024`` - - -``rgw usage log tick interval`` - -:Description: Flush pending log data every ``s`` seconds. -:Type: Integer -:Required: No -:Default: 30 - - -``rgw intent log object name`` - -:Description: -:Type: String -:Required: No -:Default: ``%Y-%m-%d-%i-%n`` - - -``rgw intent log object name utc`` - -:Description: Include a UTC timestamp in the intent log object name. -:Type: Boolean -:Required: No -:Default: ``false`` - -.. [#] there are levels >20 in some rare cases and that they are extremely verbose. diff --git a/src/ceph/doc/rados/troubleshooting/memory-profiling.rst b/src/ceph/doc/rados/troubleshooting/memory-profiling.rst deleted file mode 100644 index e2396e2..0000000 --- a/src/ceph/doc/rados/troubleshooting/memory-profiling.rst +++ /dev/null @@ -1,142 +0,0 @@ -================== - Memory Profiling -================== - -Ceph MON, OSD and MDS can generate heap profiles using -``tcmalloc``. To generate heap profiles, ensure you have -``google-perftools`` installed:: - - sudo apt-get install google-perftools - -The profiler dumps output to your ``log file`` directory (i.e., -``/var/log/ceph``). See `Logging and Debugging`_ for details. -To view the profiler logs with Google's performance tools, execute the -following:: - - google-pprof --text {path-to-daemon} {log-path/filename} - -For example:: - - $ ceph tell osd.0 heap start_profiler - $ ceph tell osd.0 heap dump - osd.0 tcmalloc heap stats:------------------------------------------------ - MALLOC: 2632288 ( 2.5 MiB) Bytes in use by application - MALLOC: + 499712 ( 0.5 MiB) Bytes in page heap freelist - MALLOC: + 543800 ( 0.5 MiB) Bytes in central cache freelist - MALLOC: + 327680 ( 0.3 MiB) Bytes in transfer cache freelist - MALLOC: + 1239400 ( 1.2 MiB) Bytes in thread cache freelists - MALLOC: + 1142936 ( 1.1 MiB) Bytes in malloc metadata - MALLOC: ------------ - MALLOC: = 6385816 ( 6.1 MiB) Actual memory used (physical + swap) - MALLOC: + 0 ( 0.0 MiB) Bytes released to OS (aka unmapped) - MALLOC: ------------ - MALLOC: = 6385816 ( 6.1 MiB) Virtual address space used - MALLOC: - MALLOC: 231 Spans in use - MALLOC: 56 Thread heaps in use - MALLOC: 8192 Tcmalloc page size - ------------------------------------------------ - Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). - Bytes released to the OS take up virtual address space but no physical memory. - $ google-pprof --text \ - /usr/bin/ceph-osd \ - /var/log/ceph/ceph-osd.0.profile.0001.heap - Total: 3.7 MB - 1.9 51.1% 51.1% 1.9 51.1% ceph::log::Log::create_entry - 1.8 47.3% 98.4% 1.8 47.3% std::string::_Rep::_S_create - 0.0 0.4% 98.9% 0.0 0.6% SimpleMessenger::add_accept_pipe - 0.0 0.4% 99.2% 0.0 0.6% decode_message - ... 
Another heap dump on the same daemon will add another file. It is
convenient to compare to a previous heap dump to show what has grown
in the interval. For instance::

    $ google-pprof --text --base out/osd.0.profile.0001.heap \
      ceph-osd out/osd.0.profile.0003.heap
    Total: 0.2 MB
    0.1  50.3% 50.3%  0.1  50.3% ceph::log::Log::create_entry
    0.1  46.6% 96.8%  0.1  46.6% std::string::_Rep::_S_create
    0.0   0.9% 97.7%  0.0  26.1% ReplicatedPG::do_op
    0.0   0.8% 98.5%  0.0   0.8% __gnu_cxx::new_allocator::allocate

Refer to `Google Heap Profiler`_ for additional details.

Once you have the heap profiler installed, start your cluster and
begin using the heap profiler. You may enable or disable the heap
profiler at runtime, or ensure that it runs continuously. For the
following command line usage, replace ``{daemon-type}`` with ``mon``,
``osd`` or ``mds``, and replace ``{daemon-id}`` with the OSD number or
the MON or MDS id.


Starting the Profiler
---------------------

To start the heap profiler, execute the following::

    ceph tell {daemon-type}.{daemon-id} heap start_profiler

For example::

    ceph tell osd.1 heap start_profiler

Alternatively, the profiler can be started when the daemon starts
running if the ``CEPH_HEAP_PROFILER_INIT=true`` variable is found in
the environment.

Printing Stats
--------------

To print out statistics, execute the following::

    ceph tell {daemon-type}.{daemon-id} heap stats

For example::

    ceph tell osd.0 heap stats

.. note:: Printing stats does not require the profiler to be running and does
   not dump the heap allocation information to a file.


Dumping Heap Information
------------------------

To dump heap information, execute the following::

    ceph tell {daemon-type}.{daemon-id} heap dump

For example::

    ceph tell mds.a heap dump

.. note:: Dumping heap information only works when the profiler is running.


Releasing Memory
----------------

To release memory that ``tcmalloc`` has allocated but which is not being used by
the Ceph daemon itself, execute the following::

    ceph tell {daemon-type}.{daemon-id} heap release

For example::

    ceph tell osd.2 heap release


Stopping the Profiler
---------------------

To stop the heap profiler, execute the following::

    ceph tell {daemon-type}.{daemon-id} heap stop_profiler

For example::

    ceph tell osd.0 heap stop_profiler

.. _Logging and Debugging: ../log-and-debug
.. _Google Heap Profiler: http://goog-perftools.sourceforge.net/doc/heap_profiler.html
diff --git a/src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst b/src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst
deleted file mode 100644
index 89fb94c..0000000
--- a/src/ceph/doc/rados/troubleshooting/troubleshooting-mon.rst
+++ /dev/null
@@ -1,567 +0,0 @@
=================================
 Troubleshooting Monitors
=================================

.. index:: monitor, high availability

When a cluster encounters monitor-related trouble, there is a tendency to
panic, and sometimes with good reason. Keep in mind that losing one monitor,
or even several, does not necessarily mean that your cluster is down, as long
as a majority of monitors is up and running with a formed quorum. Regardless
of how bad the situation is, the first thing you should do is stay calm, take
a breath, and work through the initial troubleshooting questions below.
Initial Troubleshooting
========================


**Are the monitors running?**

  First of all, make sure the monitors are running. You would be amazed by how
  often people forget to run the monitors, or forget to restart them after an
  upgrade. There is no shame in that, but let's try not to lose a couple of
  hours chasing an issue that is not there.

**Are you able to connect to the monitors' servers?**

  It doesn't happen often, but sometimes there are ``iptables`` rules that
  block access to the monitor servers or monitor ports, usually leftovers from
  monitor stress-testing that were forgotten at some point. Try sshing into
  the server and, if that succeeds, try connecting to the monitor's port
  using your tool of choice (telnet, nc, ...).

**Does ceph -s run and obtain a reply from the cluster?**

  If the answer is yes, then your cluster is up and running. One thing you
  can take for granted is that the monitors will only answer a ``status``
  request if there is a formed quorum.

  If ``ceph -s`` blocks, however, without obtaining a reply from the cluster
  or showing a lot of ``fault`` messages, then it is likely that your monitors
  are either down completely or only a portion of them is up -- a portion that
  is not enough to form a quorum (keep in mind that a quorum is formed by a
  majority of monitors).

**What if ceph -s doesn't finish?**

  If you haven't gone through all the steps so far, please go back and
  complete them.

  If you are running Emperor 0.72-rc1 or later, you can contact each monitor
  individually and ask it for its status, regardless of whether a quorum has
  formed. This can be achieved using ``ceph ping mon.ID``, ID being the
  monitor's identifier. You should perform this for each monitor in the
  cluster. In section `Understanding mon_status`_ we will explain how to
  interpret the output of this command.

  On earlier versions, you will need to ssh into the server and use the
  monitor's admin socket. Please jump to
  `Using the monitor's admin socket`_.

For other specific issues, keep on reading.


Using the monitor's admin socket
=================================

The admin socket allows you to interact with a given daemon directly using a
Unix socket file. This file can be found in your monitor's ``run`` directory.
By default, the admin socket will be kept in ``/var/run/ceph/ceph-mon.ID.asok``,
but this can vary if you defined it otherwise. If you don't find it there,
check your ``ceph.conf`` for an alternative path, or run::

    ceph-conf --name mon.ID --show-config-value admin_socket

Bear in mind that the admin socket is only available while the monitor is
running. When the monitor is properly shut down, the admin socket is removed.
If the monitor is not running but the admin socket still persists, it is
likely that the monitor was improperly shut down. Regardless, if the monitor
is not running, you will not be able to use the admin socket, and ``ceph`` is
likely to return ``Error 111: Connection Refused``.

Accessing the admin socket is as simple as telling the ``ceph`` tool to use
the ``asok`` file. In pre-Dumpling Ceph, this can be achieved by::

    ceph --admin-daemon /var/run/ceph/ceph-mon.{id}.asok {command}

while in Dumpling and beyond you can use the alternate (and recommended)
format::

    ceph daemon mon.{id} {command}

Using ``help`` as the command to the ``ceph`` tool will show you the
supported commands available through the admin socket.
Please take a look -at ``config get``, ``config show``, ``mon_status`` and ``quorum_status``, -as those can be enlightening when troubleshooting a monitor. - - -Understanding mon_status -========================= - -``mon_status`` can be obtained through the ``ceph`` tool when you have -a formed quorum, or via the admin socket if you don't. This command will -output a multitude of information about the monitor, including the same -output you would get with ``quorum_status``. - -Take the following example of ``mon_status``:: - - - { "name": "c", - "rank": 2, - "state": "peon", - "election_epoch": 38, - "quorum": [ - 1, - 2], - "outside_quorum": [], - "extra_probe_peers": [], - "sync_provider": [], - "monmap": { "epoch": 3, - "fsid": "5c4e9d53-e2e1-478a-8061-f543f8be4cf8", - "modified": "2013-10-30 04:12:01.945629", - "created": "2013-10-29 14:14:41.914786", - "mons": [ - { "rank": 0, - "name": "a", - "addr": "127.0.0.1:6789\/0"}, - { "rank": 1, - "name": "b", - "addr": "127.0.0.1:6790\/0"}, - { "rank": 2, - "name": "c", - "addr": "127.0.0.1:6795\/0"}]}} - -A couple of things are obvious: we have three monitors in the monmap (*a*, *b* -and *c*), the quorum is formed by only two monitors, and *c* is in the quorum -as a *peon*. - -Which monitor is out of the quorum? - - The answer would be **a**. - -Why? - - Take a look at the ``quorum`` set. We have two monitors in this set: *1* - and *2*. These are not monitor names. These are monitor ranks, as established - in the current monmap. We are missing the monitor with rank 0, and according - to the monmap that would be ``mon.a``. - -By the way, how are ranks established? - - Ranks are (re)calculated whenever you add or remove monitors and follow a - simple rule: the **greater** the ``IP:PORT`` combination, the **lower** the - rank is. In this case, considering that ``127.0.0.1:6789`` is lower than all - the remaining ``IP:PORT`` combinations, ``mon.a`` has rank 0. - -Most Common Monitor Issues -=========================== - -Have Quorum but at least one Monitor is down ---------------------------------------------- - -When this happens, depending on the version of Ceph you are running, -you should be seeing something similar to:: - - $ ceph health detail - [snip] - mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum) - -How to troubleshoot this? - - First, make sure ``mon.a`` is running. - - Second, make sure you are able to connect to ``mon.a``'s server from the - other monitors' servers. Check the ports as well. Check ``iptables`` on - all your monitor nodes and make sure you are not dropping/rejecting - connections. - - If this initial troubleshooting doesn't solve your problems, then it's - time to go deeper. - - First, check the problematic monitor's ``mon_status`` via the admin - socket as explained in `Using the monitor's admin socket`_ and - `Understanding mon_status`_. - - Considering the monitor is out of the quorum, its state should be one of - ``probing``, ``electing`` or ``synchronizing``. If it happens to be either - ``leader`` or ``peon``, then the monitor believes to be in quorum, while - the remaining cluster is sure it is not; or maybe it got into the quorum - while we were troubleshooting the monitor, so check you ``ceph -s`` again - just to make sure. Proceed if the monitor is not yet in the quorum. - -What if the state is ``probing``? - - This means the monitor is still looking for the other monitors. 
Every time you start a monitor, the monitor will stay in this state for
  some time while trying to find the rest of the monitors specified in the
  ``monmap``. The time a monitor spends in this state can vary. For instance,
  on a single-monitor cluster, the monitor will pass through the probing state
  almost instantaneously, since there are no other monitors around. On a
  multi-monitor cluster, the monitors will stay in this state until they find
  enough monitors to form a quorum -- this means that if you have 2 out of 3
  monitors down, the one remaining monitor will stay in this state
  indefinitely until you bring one of the other monitors up.

  If you have a quorum, however, the monitor should be able to find the
  remaining monitors pretty fast, as long as they can be reached. If your
  monitor is stuck probing and you have gone through all the communication
  troubleshooting, then there is a fair chance that the monitor is trying
  to reach the other monitors at a wrong address. ``mon_status`` outputs the
  ``monmap`` known to the monitor: check whether the other monitors' locations
  match reality. If they don't, jump to
  `Recovering a Monitor's Broken monmap`_; if they do, then it may be related
  to severe clock skews amongst the monitor nodes, and you should refer to
  `Clock Skews`_ first. If that doesn't solve your problem, then it is time
  to prepare some logs and reach out to the community (please refer to
  `Preparing your logs`_ on how to best prepare your logs).


What if state is ``electing``?

  This means the monitor is in the middle of an election. Elections should be
  fast to complete, but at times the monitors can get stuck electing. This
  is usually a sign of a clock skew among the monitor nodes; jump to
  `Clock Skews`_ for more information. If all your clocks are properly
  synchronized, it is best to prepare some logs and reach out to the
  community. This is not a state that is likely to persist, and aside from
  (*really*) old bugs there is no obvious reason besides clock skews why
  this would happen.

What if state is ``synchronizing``?

  This means the monitor is synchronizing with the rest of the cluster in
  order to join the quorum. The smaller your monitor store, the faster the
  synchronization process, so if you have a big store it may take a while.
  Don't worry; it should finish soon enough.

  However, if you notice that the monitor jumps from ``synchronizing`` to
  ``electing`` and then back to ``synchronizing``, then you do have a
  problem: the cluster state is advancing (i.e., generating new maps) too
  fast for the synchronization process to keep up. This could happen on early
  Cuttlefish, but the synchronization process has since been refactored and
  enhanced to avoid exactly this sort of behavior. If this happens on later
  versions, let us know -- and bring some logs
  (see `Preparing your logs`_).

What if state is ``leader`` or ``peon``?

  This should not happen. If it does happen, however, it likely has a lot to
  do with clock skews -- see `Clock Skews`_. If you are not suffering from
  clock skews, then please prepare your logs (see `Preparing your logs`_) and
  reach out to us.
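When working through the states above, it can help to see every monitor's view
at once. The following is a minimal shell sketch (assuming three monitors named
``a``, ``b`` and ``c`` with default admin socket paths; adjust the IDs to match
your cluster) that polls each monitor's admin socket for its current state::

    # Print each monitor's current state: probing, electing,
    # synchronizing, leader or peon.
    for id in a b c; do
        printf 'mon.%s: ' "$id"
        sudo ceph daemon mon.$id mon_status 2>/dev/null \
            | grep '"state"' || echo 'unreachable (daemon down?)'
    done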
Recovering a Monitor's Broken monmap
-------------------------------------

This is what a ``monmap`` usually looks like, depending on the number of
monitors::

    epoch 3
    fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8
    last_changed 2013-10-30 04:12:01.945629
    created 2013-10-29 14:14:41.914786
    0: 127.0.0.1:6789/0 mon.a
    1: 127.0.0.1:6790/0 mon.b
    2: 127.0.0.1:6795/0 mon.c

However, this may not be what you have. For instance, some early Cuttlefish
versions had a bug that could cause your ``monmap`` to be nullified:
completely filled with zeros. In that case, not even ``monmaptool`` would be
able to read it, because it cannot make sense of all zeros. At other times,
you may end up with a monitor with a severely outdated monmap that is unable
to find the remaining monitors (e.g., say ``mon.c`` is down; you add a new
monitor ``mon.d``, then remove ``mon.a``, then add a new monitor ``mon.e`` and
remove ``mon.b``; you will end up with a totally different monmap from the one
``mon.c`` knows).

In these situations, you have two possible solutions:

Scrap the monitor and create a new one

  You should only take this route if you are positive that you won't lose
  the information kept by that monitor; that you have other monitors and
  that they are running just fine, so that your new monitor is able to
  synchronize from the remaining monitors. Keep in mind that destroying a
  monitor, if there are no other copies of its contents, may lead to loss
  of data.

Inject a monmap into the monitor

  This is usually the safest path. You should grab the monmap from the
  remaining monitors and inject it into the monitor with the corrupted or
  lost monmap.

  These are the basic steps:

  1. Is there a formed quorum? If so, grab the monmap from the quorum::

      $ ceph mon getmap -o /tmp/monmap

  2. No quorum? Grab the monmap directly from another monitor (this
     assumes the monitor you are grabbing the monmap from has id ID-FOO
     and has been stopped)::

      $ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap

  3. Stop the monitor you are going to inject the monmap into.

  4. Inject the monmap::

      $ ceph-mon -i ID --inject-monmap /tmp/monmap

  5. Start the monitor.

  Keep in mind that the ability to inject monmaps is a powerful feature that
  can cause havoc with your monitors if misused, as it will overwrite the
  latest existing monmap kept by the monitor.


Clock Skews
------------

Monitors can be severely affected by significant clock skews across the
monitor nodes. This usually translates into weird behavior with no obvious
cause. To avoid such issues, you should run a clock synchronization tool
on your monitor nodes.


What's the maximum tolerated clock skew?

  By default the monitors will allow clocks to drift up to ``0.05 seconds``.


Can I increase the maximum tolerated clock skew?

  This value is configurable via the ``mon clock drift allowed`` option, and
  although you *CAN* change it, that doesn't mean you *SHOULD*. The clock skew
  mechanism is in place because monitors with skewed clocks may not behave
  properly. We, as developers and QA aficionados, are comfortable with the
  current default value, as it will alert the user before the monitors get
  out of hand. Changing this value without testing it first may cause
  unforeseen effects on the stability of the monitors and overall cluster
  health, although there is no risk of data loss.


How do I know there's a clock skew?
  The monitors will warn you in the form of a ``HEALTH_WARN``. ``ceph health
  detail`` should show something in the form of::

      mon.c addr 10.10.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

  That means that ``mon.c`` has been flagged as suffering from a clock skew.


What should I do if there's a clock skew?

  Synchronize your clocks. Running an NTP client may help. If you are already
  using one and you still hit this sort of issue, check whether you are using
  an NTP server remote to your network, and consider hosting your own NTP
  server on your network. This last option tends to reduce the number of
  issues with monitor clock skews.


Client Can't Connect or Mount
------------------------------

Check your IP tables. Some OS install utilities add a ``REJECT`` rule to
``iptables``. The rule rejects all clients trying to connect to the host except
for ``ssh``. If your monitor host's IP tables have such a ``REJECT`` rule in
place, clients connecting from a separate node will fail to mount with a timeout
error. You need to address ``iptables`` rules that reject clients trying to
connect to Ceph daemons. For example, you would need to address rules that look
like this appropriately::

    REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

You may also need to add rules to IP tables on your Ceph hosts to ensure
that clients can access the ports associated with your Ceph monitors (i.e., port
6789 by default) and Ceph OSDs (i.e., 6800 through 7300 by default). For
example::

    iptables -A INPUT -m multiport -p tcp -s {ip-address}/{netmask} --dports 6789,6800:7300 -j ACCEPT

Monitor Store Failures
======================

Symptoms of store corruption
----------------------------

A Ceph monitor stores the `cluster map`_ in a key/value store such as LevelDB.
If a monitor fails due to key/value store corruption, the following error
messages might be found in the monitor log::

    Corruption: error in middle of record

or::

    Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

Recovery using healthy monitor(s)
---------------------------------

If there are any survivors, we can always `replace`_ the corrupted monitor
with a new one. After booting up, the new joiner will synchronize with a
healthy peer, and once it is fully synchronized, it will be able to serve
clients.

Recovery using OSDs
-------------------

But what if all monitors fail at the same time? Since users are encouraged to
deploy at least three monitors in a Ceph cluster, the chance of simultaneous
failure is rare. But unplanned power-downs in a data center with improperly
configured disk/fs settings could fail the underlying filesystem, and hence
kill all the monitors. In this case, we can recover the monitor store with the
information stored in OSDs::

    ms=/tmp/mon-store
    mkdir $ms
    # collect the cluster map from OSDs
    for host in $hosts; do
      rsync -avz $ms user@host:$ms
      rm -rf $ms
      ssh user@host <

Check your networks to ensure they
are running properly, because networks may have a significant impact on OSD
operation and performance.



Obtaining Data About OSDs
=========================

A good first step in troubleshooting your OSDs is to obtain information in
addition to the information you collected while `monitoring your OSDs`_
(e.g., ``ceph osd tree``).
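For example, the following read-only commands are a typical first pass and are
safe to run on a live cluster; the exact output format varies by release::

    ceph osd tree          # OSD topology, weights, and up/down state
    ceph osd stat          # summary of how many OSDs are up and in
    ceph pg stat           # placement group summary
    ceph health detail     # which OSDs and PGs are implicated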
- - -Ceph Logs ---------- - -If you haven't changed the default path, you can find Ceph log files at -``/var/log/ceph``:: - - ls /var/log/ceph - -If you don't get enough log detail, you can change your logging level. See -`Logging and Debugging`_ for details to ensure that Ceph performs adequately -under high logging volume. - - -Admin Socket ------------- - -Use the admin socket tool to retrieve runtime information. For details, list -the sockets for your Ceph processes:: - - ls /var/run/ceph - -Then, execute the following, replacing ``{daemon-name}`` with an actual -daemon (e.g., ``osd.0``):: - - ceph daemon osd.0 help - -Alternatively, you can specify a ``{socket-file}`` (e.g., something in ``/var/run/ceph``):: - - ceph daemon {socket-file} help - - -The admin socket, among other things, allows you to: - -- List your configuration at runtime -- Dump historic operations -- Dump the operation priority queue state -- Dump operations in flight -- Dump perfcounters - - -Display Freespace ------------------ - -Filesystem issues may arise. To display your filesystem's free space, execute -``df``. :: - - df -h - -Execute ``df --help`` for additional usage. - - -I/O Statistics --------------- - -Use `iostat`_ to identify I/O-related issues. :: - - iostat -x - - -Diagnostic Messages -------------------- - -To retrieve diagnostic messages, use ``dmesg`` with ``less``, ``more``, ``grep`` -or ``tail``. For example:: - - dmesg | grep scsi - - -Stopping w/out Rebalancing -========================== - -Periodically, you may need to perform maintenance on a subset of your cluster, -or resolve a problem that affects a failure domain (e.g., a rack). If you do not -want CRUSH to automatically rebalance the cluster as you stop OSDs for -maintenance, set the cluster to ``noout`` first:: - - ceph osd set noout - -Once the cluster is set to ``noout``, you can begin stopping the OSDs within the -failure domain that requires maintenance work. :: - - stop ceph-osd id={num} - -.. note:: Placement groups within the OSDs you stop will become ``degraded`` - while you are addressing issues with within the failure domain. - -Once you have completed your maintenance, restart the OSDs. :: - - start ceph-osd id={num} - -Finally, you must unset the cluster from ``noout``. :: - - ceph osd unset noout - - - -.. _osd-not-running: - -OSD Not Running -=============== - -Under normal circumstances, simply restarting the ``ceph-osd`` daemon will -allow it to rejoin the cluster and recover. - -An OSD Won't Start ------------------- - -If you start your cluster and an OSD won't start, check the following: - -- **Configuration File:** If you were not able to get OSDs running from - a new installation, check your configuration file to ensure it conforms - (e.g., ``host`` not ``hostname``, etc.). - -- **Check Paths:** Check the paths in your configuration, and the actual - paths themselves for data and journals. If you separate the OSD data from - the journal data and there are errors in your configuration file or in the - actual mounts, you may have trouble starting OSDs. If you want to store the - journal on a block device, you should partition your journal disk and assign - one partition per OSD. - -- **Check Max Threadcount:** If you have a node with a lot of OSDs, you may be - hitting the default maximum number of threads (e.g., usually 32k), especially - during recovery. 
  You can increase the number of threads using ``sysctl`` to see whether
  raising the maximum number of threads to the highest allowed value
  (i.e., 4194303) helps. For example::

    sysctl -w kernel.pid_max=4194303

  If increasing the maximum thread count resolves the issue, you can make it
  permanent by including a ``kernel.pid_max`` setting in the
  ``/etc/sysctl.conf`` file. For example::

    kernel.pid_max = 4194303

- **Kernel Version:** Identify the kernel version and distribution you
  are using. Ceph uses some third party tools by default, which may be
  buggy or may conflict with certain distributions and/or kernel
  versions (e.g., Google perftools). Check the `OS recommendations`_
  to ensure you have addressed any issues related to your kernel.

- **Segmentation Fault:** If there is a segmentation fault, turn your logging
  up (if it is not already) and try again. If it segfaults again,
  contact the ceph-devel email list and provide your Ceph configuration
  file, your monitor output and the contents of your log file(s).



An OSD Failed
-------------

When a ``ceph-osd`` process dies, the monitor will learn about the failure
from surviving ``ceph-osd`` daemons and report it via the ``ceph health``
command::

    ceph health
    HEALTH_WARN 1/3 in osds are down

Specifically, you will get a warning whenever there are ``ceph-osd``
processes that are marked ``in`` and ``down``. You can identify which
``ceph-osds`` are ``down`` with::

    ceph health detail
    HEALTH_WARN 1/3 in osds are down
    osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080

If there is a disk failure or other fault preventing ``ceph-osd`` from
functioning or restarting, an error message should be present in its log
file in ``/var/log/ceph``.

If the daemon stopped because of a heartbeat failure, the underlying
kernel file system may be unresponsive. Check ``dmesg`` output for disk
or other kernel errors.

If the problem is a software error (failed assertion or other
unexpected error), it should be reported to the `ceph-devel`_ email list.


No Free Drive Space
-------------------

Ceph prevents you from writing to a full OSD so that you don't lose data.
In an operational cluster, you should receive a warning when your cluster
is getting near its full ratio. The ``mon osd full ratio`` defaults to
``0.95``, or 95% of capacity, before it stops clients from writing data.
The ``mon osd backfillfull ratio`` defaults to ``0.90``, or 90% of
capacity, when it blocks backfills from starting. The
``mon osd nearfull ratio`` defaults to ``0.85``, or 85% of capacity,
when it generates a health warning.

Full cluster issues usually arise when testing how Ceph handles an OSD
failure on a small cluster. When one node has a high percentage of the
cluster's data, the cluster can easily exceed its nearfull and full ratios
immediately. If you are testing how Ceph reacts to OSD failures on a small
cluster, you should leave ample free disk space and consider temporarily
lowering the ``mon osd full ratio``, ``mon osd backfillfull ratio`` and
``mon osd nearfull ratio``.
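For example, on a disposable test cluster you might lower the thresholds at
runtime so that full-cluster behavior triggers sooner. This is only a sketch:
the values below are arbitrary test settings, and whether injected values take
effect immediately varies between releases, so prefer setting them in
``ceph.conf`` and restarting the monitors if they do not::

    # test-only values; do not use on a production cluster
    ceph tell mon.* injectargs '--mon-osd-nearfull-ratio 0.70'
    ceph tell mon.* injectargs '--mon-osd-backfillfull-ratio 0.75'
    ceph tell mon.* injectargs '--mon-osd-full-ratio 0.80'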
Full ``ceph-osds`` will be reported by ``ceph health``::

    ceph health
    HEALTH_WARN 1 nearfull osd(s)

Or::

    ceph health detail
    HEALTH_ERR 1 full osd(s); 1 backfillfull osd(s); 1 nearfull osd(s)
    osd.3 is full at 97%
    osd.4 is backfill full at 91%
    osd.2 is near full at 87%

The best way to deal with a full cluster is to add new ``ceph-osds``, allowing
the cluster to redistribute data to the newly available storage.

If you cannot start an OSD because it is full, you may delete some data by
deleting some placement group directories in the full OSD.

.. important:: If you choose to delete a placement group directory on a full
   OSD, **DO NOT** delete the same placement group directory on another full
   OSD, or **YOU MAY LOSE DATA**. You **MUST** maintain at least one copy of
   your data on at least one OSD.

See `Monitor Config Reference`_ for additional details.


OSDs are Slow/Unresponsive
==========================

A commonly recurring issue involves slow or unresponsive OSDs. Ensure that you
have eliminated other troubleshooting possibilities before delving into OSD
performance issues. For example, ensure that your networks are working properly
and your OSDs are running. Check to see if OSDs are throttling recovery
traffic.

.. tip:: Newer versions of Ceph provide better recovery handling by preventing
   recovering OSDs from using up so many system resources that ``up`` and
   ``in`` OSDs become unavailable or otherwise slow.


Networking Issues
-----------------

Ceph is a distributed storage system, so it depends upon networks to peer with
OSDs, replicate objects, recover from faults and check heartbeats. Networking
issues can cause OSD latency and flapping OSDs. See `Flapping OSDs`_ for
details.

Ensure that Ceph processes and Ceph-dependent processes are connected and/or
listening. ::

    netstat -a | grep ceph
    netstat -l | grep ceph
    sudo netstat -p | grep ceph

Check network statistics. ::

    netstat -s


Drive Configuration
-------------------

A storage drive should only support one OSD. Sequential read and sequential
write throughput can bottleneck if other processes share the drive, including
journals, operating systems, monitors, other OSDs and non-Ceph processes.

Ceph acknowledges writes *after* journaling, so fast SSDs are an attractive
option to accelerate the response time, particularly when using the ``XFS``
or ``ext4`` filesystems. By contrast, the ``btrfs`` filesystem can write and
journal simultaneously. (Note, however, that we recommend against using
``btrfs`` for production deployments.)

.. note:: Partitioning a drive does not change its total throughput or
   sequential read/write limits. Running a journal in a separate partition
   may help, but you should prefer a separate physical drive.


Bad Sectors / Fragmented Disk
-----------------------------

Check your disks for bad sectors and fragmentation. These can cause total
throughput to drop substantially.


Co-resident Monitors/OSDs
-------------------------

Monitors are generally light-weight processes, but they do lots of ``fsync()``,
which can interfere with other workloads, particularly if monitors run on the
same drive as your OSDs. Additionally, if you run monitors on the same host as
the OSDs, you may incur performance issues related to:

- Running an older kernel (pre-3.0)
- Running Argonaut with an old ``glibc``
- Running a kernel with no ``syncfs(2)`` syscall.
In these cases, multiple OSDs running on the same host can drag each other
down by doing lots of commits. That often leads to bursty writes.


Co-resident Processes
---------------------

Spinning up co-resident processes, such as cloud-based solutions, virtual
machines and other applications that write data to Ceph while operating on
the same hardware as OSDs, can introduce significant OSD latency. Generally,
we recommend optimizing a host for use with Ceph and using other hosts for
other processes. Separating Ceph operations from other applications may help
improve performance and may streamline troubleshooting and maintenance.


Logging Levels
--------------

If you turned logging levels up to track an issue and then forgot to turn
logging levels back down, the OSD may be putting a lot of logs onto the disk.
If you intend to keep logging levels high, you may consider mounting a drive
to the default path for logging (i.e., ``/var/log/ceph/$cluster-$name.log``).


Recovery Throttling
-------------------

Depending upon your configuration, Ceph may reduce recovery rates to maintain
performance or it may increase recovery rates to the point that recovery
impacts OSD performance. Check to see if the OSD is recovering.


Kernel Version
--------------

Check the kernel version you are running. Older kernels may not receive
new backports that Ceph depends upon for better performance.


Kernel Issues with SyncFS
-------------------------

Try running one OSD per host to see if performance improves. Old kernels
might not have a recent enough version of ``glibc`` to support ``syncfs(2)``.


Filesystem Issues
-----------------

Currently, we recommend deploying clusters with XFS.

We recommend against using btrfs or ext4. The btrfs filesystem has
many attractive features, but bugs in the filesystem may lead to
performance issues and spurious ENOSPC errors. We do not recommend
ext4 because xattr size limitations break our support for long object
names (needed for RGW).

For more information, see `Filesystem Recommendations`_.

.. _Filesystem Recommendations: ../configuration/filesystem-recommendations


Insufficient RAM
----------------

We recommend 1GB of RAM per OSD daemon. You may notice that during normal
operations, the OSD only uses a fraction of that amount (e.g., 100-200MB).
Unused RAM makes it tempting to use the excess RAM for co-resident
applications, VMs and so forth. However, when OSDs go into recovery mode,
their memory utilization spikes. If there is no RAM available, OSD
performance will slow considerably.


Old Requests or Slow Requests
-----------------------------

If a ``ceph-osd`` daemon is slow to respond to a request, it will generate
log messages complaining about requests that are taking too long. The warning
threshold defaults to 30 seconds, and is configurable via the
``osd op complaint time`` option. When this happens, the cluster log will
receive messages.
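For example, while investigating you can lower the threshold at runtime so
that marginal operations are also reported; this sketch uses the same
``injectargs`` mechanism described in `Logging and Debugging`_, applies it to
all OSDs, and assumes a temporary 10-second threshold::

    ceph tell osd.* injectargs '--osd-op-complaint-time 10'

Remember to restore the default (30 seconds) afterwards; a low threshold can
flood the cluster log with warnings.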
Legacy versions of Ceph complain about ``old requests``::

    osd.0 192.168.106.220:6800/18813 312 : [WRN] old request osd_op(client.5099.0:790 fatty_26485_object789 [write 0~4096] 2.5e54f643) v4 received at 2012-03-06 15:42:56.054801 currently waiting for sub ops

New versions of Ceph complain about ``slow requests``::

    {date} {osd.num} [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
    {date} {osd.num} [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]


Possible causes include:

- A bad drive (check ``dmesg`` output)
- A kernel file system bug (check ``dmesg`` output)
- An overloaded cluster (check system load, iostat, etc.)
- A bug in the ``ceph-osd`` daemon.

Possible solutions:

- Remove VMs and other cloud solutions from Ceph hosts
- Upgrade the kernel
- Upgrade Ceph
- Restart OSDs

Debugging Slow Requests
-----------------------

If you run ``ceph daemon osd.{id} dump_historic_ops`` or
``ceph daemon osd.{id} dump_ops_in_flight``, you will see a set of operations
and a list of events each operation went through. These are briefly described
below.

Events from the Messenger layer:

- header_read: when the messenger first started reading the message off the wire
- throttled: when the messenger tried to acquire memory throttle space to read
  the message into memory
- all_read: when the messenger finished reading the message off the wire
- dispatched: when the messenger gave the message to the OSD
- initiated: identical to header_read; the existence of both is a historical
  oddity
- sub_op_commit_rec: the primary marks this when it hears that a particular
  replica has committed the op
- commit_sent: we sent a reply back to the client (or primary OSD, for sub ops)

Many of these events are seemingly redundant, but cross important boundaries in
the internal code (such as passing data across locks into new threads).

Flapping OSDs
=============

We recommend using both a public (front-end) network and a cluster (back-end)
network so that you can better meet the capacity requirements of object
replication. Another advantage is that you can run a cluster network such that
it is not connected to the internet, thereby preventing some denial of service
attacks. When OSDs peer and check heartbeats, they use the cluster (back-end)
network when it's available. See `Monitor/OSD Interaction`_ for details.

However, if the cluster (back-end) network fails or develops significant
latency while the public (front-end) network operates optimally, OSDs
currently do not handle this situation well. What happens is that OSDs mark
each other ``down`` on the monitor, while marking themselves ``up``. We call
this scenario 'flapping'.

If something is causing OSDs to 'flap' (repeatedly getting marked ``down`` and
then ``up`` again), you can force the monitors to stop the flapping with::

    ceph osd set noup      # prevent OSDs from getting marked up
    ceph osd set nodown    # prevent OSDs from getting marked down

These flags are recorded in the osdmap structure::

    ceph osd dump | grep flags
    flags no-up,no-down

You can clear the flags with::

    ceph osd unset noup
    ceph osd unset nodown

Two other flags are supported, ``noin`` and ``noout``, which prevent
booting OSDs from being marked ``in`` (allocated data) or protect OSDs
from eventually being marked ``out`` (regardless of what the current value for
``mon osd down out interval`` is).

..
note:: ``noup``, ``noout``, and ``nodown`` are temporary in the - sense that once the flags are cleared, the action they were blocking - should occur shortly after. The ``noin`` flag, on the other hand, - prevents OSDs from being marked ``in`` on boot, and any daemons that - started while the flag was set will remain that way. - - - - - - -.. _iostat: http://en.wikipedia.org/wiki/Iostat -.. _Ceph Logging and Debugging: ../../configuration/ceph-conf#ceph-logging-and-debugging -.. _Logging and Debugging: ../log-and-debug -.. _Debugging and Logging: ../debug -.. _Monitor/OSD Interaction: ../../configuration/mon-osd-interaction -.. _Monitor Config Reference: ../../configuration/mon-config-ref -.. _monitoring your OSDs: ../../operations/monitoring-osd-pg -.. _subscribe to the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=subscribe+ceph-devel -.. _unsubscribe from the ceph-devel email list: mailto:majordomo@vger.kernel.org?body=unsubscribe+ceph-devel -.. _subscribe to the ceph-users email list: mailto:ceph-users-join@lists.ceph.com -.. _unsubscribe from the ceph-users email list: mailto:ceph-users-leave@lists.ceph.com -.. _OS recommendations: ../../../start/os-recommendations -.. _ceph-devel: ceph-devel@vger.kernel.org diff --git a/src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst b/src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst deleted file mode 100644 index 4241fee..0000000 --- a/src/ceph/doc/rados/troubleshooting/troubleshooting-pg.rst +++ /dev/null @@ -1,668 +0,0 @@ -===================== - Troubleshooting PGs -===================== - -Placement Groups Never Get Clean -================================ - -When you create a cluster and your cluster remains in ``active``, -``active+remapped`` or ``active+degraded`` status and never achieve an -``active+clean`` status, you likely have a problem with your configuration. - -You may need to review settings in the `Pool, PG and CRUSH Config Reference`_ -and make appropriate adjustments. - -As a general rule, you should run your cluster with more than one OSD and a -pool size greater than 1 object replica. - -One Node Cluster ----------------- - -Ceph no longer provides documentation for operating on a single node, because -you would never deploy a system designed for distributed computing on a single -node. Additionally, mounting client kernel modules on a single node containing a -Ceph daemon may cause a deadlock due to issues with the Linux kernel itself -(unless you use VMs for the clients). You can experiment with Ceph in a 1-node -configuration, in spite of the limitations as described herein. - -If you are trying to create a cluster on a single node, you must change the -default of the ``osd crush chooseleaf type`` setting from ``1`` (meaning -``host`` or ``node``) to ``0`` (meaning ``osd``) in your Ceph configuration -file before you create your monitors and OSDs. This tells Ceph that an OSD -can peer with another OSD on the same host. If you are trying to set up a -1-node cluster and ``osd crush chooseleaf type`` is greater than ``0``, -Ceph will try to peer the PGs of one OSD with the PGs of another OSD on -another node, chassis, rack, row, or even datacenter depending on the setting. - -.. tip:: DO NOT mount kernel clients directly on the same node as your - Ceph Storage Cluster, because kernel conflicts can arise. However, you - can mount kernel clients within virtual machines (VMs) on a single node. - -If you are creating OSDs using a single disk, you must create directories -for the data manually first. 
For example:: - - mkdir /var/local/osd0 /var/local/osd1 - ceph-deploy osd prepare {localhost-name}:/var/local/osd0 {localhost-name}:/var/local/osd1 - ceph-deploy osd activate {localhost-name}:/var/local/osd0 {localhost-name}:/var/local/osd1 - - -Fewer OSDs than Replicas ------------------------- - -If you have brought up two OSDs to an ``up`` and ``in`` state, but you still -don't see ``active + clean`` placement groups, you may have an -``osd pool default size`` set to greater than ``2``. - -There are a few ways to address this situation. If you want to operate your -cluster in an ``active + degraded`` state with two replicas, you can set the -``osd pool default min size`` to ``2`` so that you can write objects in -an ``active + degraded`` state. You may also set the ``osd pool default size`` -setting to ``2`` so that you only have two stored replicas (the original and -one replica), in which case the cluster should achieve an ``active + clean`` -state. - -.. note:: You can make the changes at runtime. If you make the changes in - your Ceph configuration file, you may need to restart your cluster. - - -Pool Size = 1 -------------- - -If you have the ``osd pool default size`` set to ``1``, you will only have -one copy of the object. OSDs rely on other OSDs to tell them which objects -they should have. If a first OSD has a copy of an object and there is no -second copy, then no second OSD can tell the first OSD that it should have -that copy. For each placement group mapped to the first OSD (see -``ceph pg dump``), you can force the first OSD to notice the placement groups -it needs by running:: - - ceph osd force-create-pg - - -CRUSH Map Errors ----------------- - -Another candidate for placement groups remaining unclean involves errors -in your CRUSH map. - - -Stuck Placement Groups -====================== - -It is normal for placement groups to enter states like "degraded" or "peering" -following a failure. Normally these states indicate the normal progression -through the failure recovery process. However, if a placement group stays in one -of these states for a long time this may be an indication of a larger problem. -For this reason, the monitor will warn when placement groups get "stuck" in a -non-optimal state. Specifically, we check for: - -* ``inactive`` - The placement group has not been ``active`` for too long - (i.e., it hasn't been able to service read/write requests). - -* ``unclean`` - The placement group has not been ``clean`` for too long - (i.e., it hasn't been able to completely recover from a previous failure). - -* ``stale`` - The placement group status has not been updated by a ``ceph-osd``, - indicating that all nodes storing this placement group may be ``down``. - -You can explicitly list stuck placement groups with one of:: - - ceph pg dump_stuck stale - ceph pg dump_stuck inactive - ceph pg dump_stuck unclean - -For stuck ``stale`` placement groups, it is normally a matter of getting the -right ``ceph-osd`` daemons running again. For stuck ``inactive`` placement -groups, it is usually a peering problem (see :ref:`failures-osd-peering`). For -stuck ``unclean`` placement groups, there is usually something preventing -recovery from completing, like unfound objects (see -:ref:`failures-osd-unfound`); - - - -.. _failures-osd-peering: - -Placement Group Down - Peering Failure -====================================== - -In certain cases, the ``ceph-osd`` `Peering` process can run into -problems, preventing a PG from becoming active and usable. 
For -example, ``ceph health`` might report:: - - ceph health detail - HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down - ... - pg 0.5 is down+peering - pg 1.4 is down+peering - ... - osd.1 is down since epoch 69, last address 192.168.106.220:6801/8651 - -We can query the cluster to determine exactly why the PG is marked ``down`` with:: - - ceph pg 0.5 query - -.. code-block:: javascript - - { "state": "down+peering", - ... - "recovery_state": [ - { "name": "Started\/Primary\/Peering\/GetInfo", - "enter_time": "2012-03-06 14:40:16.169679", - "requested_info_from": []}, - { "name": "Started\/Primary\/Peering", - "enter_time": "2012-03-06 14:40:16.169659", - "probing_osds": [ - 0, - 1], - "blocked": "peering is blocked due to down osds", - "down_osds_we_would_probe": [ - 1], - "peering_blocked_by": [ - { "osd": 1, - "current_lost_at": 0, - "comment": "starting or marking this osd lost may let us proceed"}]}, - { "name": "Started", - "enter_time": "2012-03-06 14:40:16.169513"} - ] - } - -The ``recovery_state`` section tells us that peering is blocked due to -down ``ceph-osd`` daemons, specifically ``osd.1``. In this case, we can start that ``ceph-osd`` -and things will recover. - -Alternatively, if there is a catastrophic failure of ``osd.1`` (e.g., disk -failure), we can tell the cluster that it is ``lost`` and to cope as -best it can. - -.. important:: This is dangerous in that the cluster cannot - guarantee that the other copies of the data are consistent - and up to date. - -To instruct Ceph to continue anyway:: - - ceph osd lost 1 - -Recovery will proceed. - - -.. _failures-osd-unfound: - -Unfound Objects -=============== - -Under certain combinations of failures Ceph may complain about -``unfound`` objects:: - - ceph health detail - HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%) - pg 2.4 is active+degraded, 78 unfound - -This means that the storage cluster knows that some objects (or newer -copies of existing objects) exist, but it hasn't found copies of them. -One example of how this might come about for a PG whose data is on ceph-osds -1 and 2: - -* 1 goes down -* 2 handles some writes, alone -* 1 comes up -* 1 and 2 repeer, and the objects missing on 1 are queued for recovery. -* Before the new objects are copied, 2 goes down. - -Now 1 knows that these object exist, but there is no live ``ceph-osd`` who -has a copy. In this case, IO to those objects will block, and the -cluster will hope that the failed node comes back soon; this is -assumed to be preferable to returning an IO error to the user. - -First, you can identify which objects are unfound with:: - - ceph pg 2.4 list_missing [starting offset, in json] - -.. code-block:: javascript - - { "offset": { "oid": "", - "key": "", - "snapid": 0, - "hash": 0, - "max": 0}, - "num_missing": 0, - "num_unfound": 0, - "objects": [ - { "oid": "object 1", - "key": "", - "hash": 0, - "max": 0 }, - ... - ], - "more": 0} - -If there are too many objects to list in a single result, the ``more`` -field will be true and you can query for more. (Eventually the -command line tool will hide this from you, but not yet.) - -Second, you can identify which OSDs have been probed or might contain -data:: - - ceph pg 2.4 query - -.. 
code-block:: javascript - - "recovery_state": [ - { "name": "Started\/Primary\/Active", - "enter_time": "2012-03-06 15:15:46.713212", - "might_have_unfound": [ - { "osd": 1, - "status": "osd is down"}]}, - -In this case, for example, the cluster knows that ``osd.1`` might have -data, but it is ``down``. The full range of possible states include: - -* already probed -* querying -* OSD is down -* not queried (yet) - -Sometimes it simply takes some time for the cluster to query possible -locations. - -It is possible that there are other locations where the object can -exist that are not listed. For example, if a ceph-osd is stopped and -taken out of the cluster, the cluster fully recovers, and due to some -future set of failures ends up with an unfound object, it won't -consider the long-departed ceph-osd as a potential location to -consider. (This scenario, however, is unlikely.) - -If all possible locations have been queried and objects are still -lost, you may have to give up on the lost objects. This, again, is -possible given unusual combinations of failures that allow the cluster -to learn about writes that were performed before the writes themselves -are recovered. To mark the "unfound" objects as "lost":: - - ceph pg 2.5 mark_unfound_lost revert|delete - -This the final argument specifies how the cluster should deal with -lost objects. - -The "delete" option will forget about them entirely. - -The "revert" option (not available for erasure coded pools) will -either roll back to a previous version of the object or (if it was a -new object) forget about it entirely. Use this with caution, as it -may confuse applications that expected the object to exist. - - -Homeless Placement Groups -========================= - -It is possible for all OSDs that had copies of a given placement groups to fail. -If that's the case, that subset of the object store is unavailable, and the -monitor will receive no status updates for those placement groups. To detect -this situation, the monitor marks any placement group whose primary OSD has -failed as ``stale``. For example:: - - ceph health - HEALTH_WARN 24 pgs stale; 3/300 in osds are down - -You can identify which placement groups are ``stale``, and what the last OSDs to -store them were, with:: - - ceph health detail - HEALTH_WARN 24 pgs stale; 3/300 in osds are down - ... - pg 2.5 is stuck stale+active+remapped, last acting [2,0] - ... - osd.10 is down since epoch 23, last address 192.168.106.220:6800/11080 - osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539 - osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861 - -If we want to get placement group 2.5 back online, for example, this tells us that -it was last managed by ``osd.0`` and ``osd.2``. Restarting those ``ceph-osd`` -daemons will allow the cluster to recover that placement group (and, presumably, -many others). - - -Only a Few OSDs Receive Data -============================ - -If you have many nodes in your cluster and only a few of them receive data, -`check`_ the number of placement groups in your pool. Since placement groups get -mapped to OSDs, a small number of placement groups will not distribute across -your cluster. Try creating a pool with a placement group count that is a -multiple of the number of OSDs. See `Placement Groups`_ for details. The default -placement group count for pools is not useful, but you can change it `here`_. 
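-
-For example, here is a minimal sketch of creating a pool with an explicit
-placement group count (assuming roughly 10 OSDs and a replica count of 3;
-the pool name and the counts below are illustrative only)::
-
-    # a common starting heuristic: (OSDs * 100) / replicas, rounded up to
-    # the next power of two: (10 * 100) / 3 ~= 333, rounded up to 512
-    ceph osd pool create testpool 512 512
-
-    # verify the placement group count of an existing pool
-    ceph osd pool get testpool pg_num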
-
-
-Can't Write Data
-================
-
-If your cluster is up, but some OSDs are down and you cannot write data,
-check to ensure that you have the minimum number of OSDs running for the
-placement group. If you don't have the minimum number of OSDs running,
-Ceph will not allow you to write data because there is no guarantee
-that Ceph can replicate your data. See ``osd pool default min size``
-in the `Pool, PG and CRUSH Config Reference`_ for details.
-
-
-PGs Inconsistent
-================
-
-If a placement group reports an ``active+clean+inconsistent`` state, this is
-usually caused by an error during scrubbing. As always, we can identify the
-inconsistent placement group(s) with::
-
-    $ ceph health detail
-    HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
-    pg 0.6 is active+clean+inconsistent, acting [0,1,2]
-    2 scrub errors
-
-Or if you prefer inspecting the output in a programmatic way::
-
-    $ rados list-inconsistent-pg rbd
-    ["0.6"]
-
-There is only one consistent state, but in the worst case we could find
-different inconsistencies, from multiple perspectives, in more than one
-object. If an object named ``foo`` in PG ``0.6`` is truncated, we will have::
-
-    $ rados list-inconsistent-obj 0.6 --format=json-pretty
-
-.. code-block:: javascript
-
-    {
-        "epoch": 14,
-        "inconsistents": [
-            {
-                "object": {
-                    "name": "foo",
-                    "nspace": "",
-                    "locator": "",
-                    "snap": "head",
-                    "version": 1
-                },
-                "errors": [
-                    "data_digest_mismatch",
-                    "size_mismatch"
-                ],
-                "union_shard_errors": [
-                    "data_digest_mismatch_oi",
-                    "size_mismatch_oi"
-                ],
-                "selected_object_info": "0:602f83fe:::foo:head(16'1 client.4110.0:1 dirty|data_digest|omap_digest s 968 uv 1 dd e978e67f od ffffffff alloc_hint [0 0 0])",
-                "shards": [
-                    {
-                        "osd": 0,
-                        "errors": [],
-                        "size": 968,
-                        "omap_digest": "0xffffffff",
-                        "data_digest": "0xe978e67f"
-                    },
-                    {
-                        "osd": 1,
-                        "errors": [],
-                        "size": 968,
-                        "omap_digest": "0xffffffff",
-                        "data_digest": "0xe978e67f"
-                    },
-                    {
-                        "osd": 2,
-                        "errors": [
-                            "data_digest_mismatch_oi",
-                            "size_mismatch_oi"
-                        ],
-                        "size": 0,
-                        "omap_digest": "0xffffffff",
-                        "data_digest": "0xffffffff"
-                    }
-                ]
-            }
-        ]
-    }
-
-In this case, we can learn from the output:
-
-* The only inconsistent object is named ``foo``, and it is its head that has
-  inconsistencies.
-* The inconsistencies fall into two categories:
-
-  * ``errors``: these errors indicate inconsistencies between shards, without
-    a determination of which shard(s) are bad. Check for the ``errors`` in
-    the ``shards`` array, if available, to pinpoint the problem.
-
-    * ``data_digest_mismatch``: the digest of the replica read from OSD.2 is
-      different from that of OSD.0 and OSD.1
-    * ``size_mismatch``: the size of the replica read from OSD.2 is 0, while
-      the size reported by OSD.0 and OSD.1 is 968.
-
-  * ``union_shard_errors``: the union of all shard-specific ``errors`` in the
-    ``shards`` array. The ``errors`` are set for the given shard that has the
-    problem. They include errors like ``read_error``. The ``errors`` ending
-    in ``oi`` indicate a comparison with ``selected_object_info``. Look at
-    the ``shards`` array to determine which shard has which error(s).
-
-    * ``data_digest_mismatch_oi``: the digest stored in the object-info does
-      not match the digest ``0xffffffff`` calculated from the shard read
-      from OSD.2
-    * ``size_mismatch_oi``: the size stored in the object-info is different
-      from the one read from OSD.2. The latter is 0.
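-
-To pull just the failing shards out of that JSON, here is a small sketch
-(assuming the ``jq`` utility is installed; the PG id is the one from the
-example above)::
-
-    # list each inconsistent object together with the shards that
-    # carry errors and the error names themselves
-    rados list-inconsistent-obj 0.6 --format=json |
-      jq -r '.inconsistents[]
-             | .object.name as $obj
-             | .shards[]
-             | select((.errors | length) > 0)
-             | "\($obj) osd.\(.osd): \(.errors | join(", "))"'
-
-For the example above, this would print ``foo osd.2: data_digest_mismatch_oi,
-size_mismatch_oi``, pointing at the shard on OSD.2 as the damaged one.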
-
-You can repair the inconsistent placement group by executing::
-
-    ceph pg repair {placement-group-ID}
-
-This command overwrites the `bad` copies with the `authoritative` ones. In
-most cases, Ceph is able to choose authoritative copies from all available
-replicas using some predefined criteria. This does not always work, however.
-For example, the stored data digest could be missing, and the calculated
-digest will then be ignored when choosing the authoritative copies. So,
-please use the above command with caution.
-
-If ``read_error`` is listed in the ``errors`` attribute of a shard, the
-inconsistency is likely due to disk errors. You might want to check the disk
-used by that OSD.
-
-If you receive ``active+clean+inconsistent`` states periodically due to
-clock skew, you may consider configuring the `NTP`_ daemons on your
-monitor hosts to act as peers. See `The Network Time Protocol`_ and Ceph
-`Clock Settings`_ for additional details.
-
-
-Erasure Coded PGs are not active+clean
-======================================
-
-When CRUSH fails to find enough OSDs to map to a PG, it will show as
-``2147483647``, which is ``ITEM_NONE`` or ``no OSD found``. For instance::
-
-    [2,1,6,0,5,8,2147483647,7,4]
-
-Not enough OSDs
----------------
-
-If the Ceph cluster only has 8 OSDs and the erasure coded pool needs
-9, that is what it will show. You can either create another erasure
-coded pool that requires fewer OSDs::
-
-    ceph osd erasure-code-profile set myprofile k=5 m=3
-    ceph osd pool create erasurepool 16 16 erasure myprofile
-
-or add new OSDs, and the PG will automatically use them.
-
-CRUSH constraints cannot be satisfied
--------------------------------------
-
-If the cluster has enough OSDs, it is possible that the CRUSH ruleset
-imposes constraints that cannot be satisfied. If there are 10 OSDs on
-two hosts and the CRUSH ruleset requires that no two OSDs from the
-same host are used in the same PG, the mapping may fail because only
-two OSDs will be found. You can check the constraint by displaying the
-ruleset::
-
-    $ ceph osd crush rule ls
-    [
-        "replicated_ruleset",
-        "erasurepool"]
-    $ ceph osd crush rule dump erasurepool
-    { "rule_id": 1,
-      "rule_name": "erasurepool",
-      "ruleset": 1,
-      "type": 3,
-      "min_size": 3,
-      "max_size": 20,
-      "steps": [
-            { "op": "take",
-              "item": -1,
-              "item_name": "default"},
-            { "op": "chooseleaf_indep",
-              "num": 0,
-              "type": "host"},
-            { "op": "emit"}]}
-
-
-You can resolve the problem by creating a new pool in which PGs are allowed
-to have OSDs residing on the same host with::
-
-    ceph osd erasure-code-profile set myprofile crush-failure-domain=osd
-    ceph osd pool create erasurepool 16 16 erasure myprofile
-
-CRUSH gives up too soon
------------------------
-
-If the Ceph cluster has just enough OSDs to map the PG (for instance a
-cluster with a total of 9 OSDs and an erasure coded pool that requires
-9 OSDs per PG), it is possible that CRUSH gives up before finding a
-mapping. It can be resolved by:
-
-* lowering the erasure coded pool requirements to use fewer OSDs per PG
-  (that requires the creation of another pool, as erasure code profiles
-  cannot be dynamically modified).
-
-* adding more OSDs to the cluster (that does not require the erasure
-  coded pool to be modified; it will become clean automatically).
-
-* using a hand-made CRUSH ruleset that tries more times to find a good
-  mapping. This can be done by setting ``set_choose_tries`` to a value
-  greater than the default, as shown in the walkthrough below.
-
-You should first verify the problem with ``crushtool`` after
-extracting the crushmap from the cluster, so your experiments do not
-modify the Ceph cluster and work only on local files::
-
-    $ ceph osd crush rule dump erasurepool
-    { "rule_name": "erasurepool",
-      "ruleset": 1,
-      "type": 3,
-      "min_size": 3,
-      "max_size": 20,
-      "steps": [
-            { "op": "take",
-              "item": -1,
-              "item_name": "default"},
-            { "op": "chooseleaf_indep",
-              "num": 0,
-              "type": "host"},
-            { "op": "emit"}]}
-    $ ceph osd getcrushmap > crush.map
-    got crush map from osdmap epoch 13
-    $ crushtool -i crush.map --test --show-bad-mappings \
-       --rule 1 \
-       --num-rep 9 \
-       --min-x 1 --max-x $((1024 * 1024))
-    bad mapping rule 8 x 43 num_rep 9 result [3,2,7,1,2147483647,8,5,6,0]
-    bad mapping rule 8 x 79 num_rep 9 result [6,0,2,1,4,7,2147483647,5,8]
-    bad mapping rule 8 x 173 num_rep 9 result [0,4,6,8,2,1,3,7,2147483647]
-
-Here ``--num-rep`` is the number of OSDs the erasure code CRUSH
-ruleset needs, and ``--rule`` is the value of the ``ruleset`` field
-displayed by ``ceph osd crush rule dump``. The test will try mapping
-one million values (i.e. the range defined by ``[--min-x,--max-x]``)
-and must display at least one bad mapping. If it outputs nothing, it
-means all mappings are successful and you can stop right there: the
-problem is elsewhere.
-
-The CRUSH ruleset can be edited by decompiling the crush map::
-
-    $ crushtool --decompile crush.map > crush.txt
-
-and adding the following line to the ruleset::
-
-    step set_choose_tries 100
-
-The relevant part of the ``crush.txt`` file should look something
-like::
-
-    rule erasurepool {
-            ruleset 1
-            type erasure
-            min_size 3
-            max_size 20
-            step set_chooseleaf_tries 5
-            step set_choose_tries 100
-            step take default
-            step chooseleaf indep 0 type host
-            step emit
-    }
-
-It can then be compiled and tested again::
-
-    $ crushtool --compile crush.txt -o better-crush.map
-
-When all mappings succeed, a histogram of the number of tries that
-were necessary to find all of them can be displayed with the
-``--show-choose-tries`` option of ``crushtool``::
-
-    $ crushtool -i better-crush.map --test --show-bad-mappings \
-       --show-choose-tries \
-       --rule 1 \
-       --num-rep 9 \
-       --min-x 1 --max-x $((1024 * 1024))
-    ...
-    11:        42
-    12:        44
-    13:        54
-    14:        45
-    15:        35
-    16:        34
-    17:        30
-    18:        25
-    19:        19
-    20:        22
-    21:        20
-    22:        17
-    23:        13
-    24:        16
-    25:        13
-    26:        11
-    27:        11
-    28:        13
-    29:        11
-    30:        10
-    31:         6
-    32:         5
-    33:        10
-    34:         3
-    35:         7
-    36:         5
-    37:         2
-    38:         5
-    39:         5
-    40:         2
-    41:         5
-    42:         4
-    43:         1
-    44:         2
-    45:         2
-    46:         3
-    47:         1
-    48:         0
-    ...
-    102:        0
-    103:        1
-    104:        0
-    ...
-
-It took 11 tries to map 42 PGs, 12 tries to map 44 PGs, etc. The highest
-number of tries is the minimum value of ``set_choose_tries`` that prevents
-bad mappings (i.e. 103 in the above output, because it did not take more
-than 103 tries for any PG to be mapped).
-
-.. _check: ../../operations/placement-groups#get-the-number-of-placement-groups
-.. _here: ../../configuration/pool-pg-config-ref
-.. _Placement Groups: ../../operations/placement-groups
-.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
-.. _NTP: http://en.wikipedia.org/wiki/Network_Time_Protocol
-.. _The Network Time Protocol: http://www.ntp.org/
-.. _Clock Settings: ../../configuration/mon-config-ref/#clock
--
-
--
cgit 1.2.3-korg