Diffstat (limited to 'src/ceph/doc/rbd')
-rw-r--r-- src/ceph/doc/rbd/api/index.rst | 8
-rw-r--r-- src/ceph/doc/rbd/api/librbdpy.rst | 82
-rw-r--r-- src/ceph/doc/rbd/disk.conf | 8
-rw-r--r-- src/ceph/doc/rbd/index.rst | 72
-rw-r--r-- src/ceph/doc/rbd/iscsi-initiator-esx.rst | 36
-rw-r--r-- src/ceph/doc/rbd/iscsi-initiator-rhel.rst | 90
-rw-r--r-- src/ceph/doc/rbd/iscsi-initiator-win.rst | 100
-rw-r--r-- src/ceph/doc/rbd/iscsi-initiators.rst | 10
-rw-r--r-- src/ceph/doc/rbd/iscsi-monitoring.rst | 103
-rw-r--r-- src/ceph/doc/rbd/iscsi-overview.rst | 50
-rw-r--r-- src/ceph/doc/rbd/iscsi-requirements.rst | 49
-rw-r--r-- src/ceph/doc/rbd/iscsi-target-ansible.rst | 343
-rw-r--r-- src/ceph/doc/rbd/iscsi-target-cli.rst | 163
-rw-r--r-- src/ceph/doc/rbd/iscsi-targets.rst | 27
-rw-r--r-- src/ceph/doc/rbd/libvirt.rst | 319
-rw-r--r-- src/ceph/doc/rbd/man/index.rst | 16
-rw-r--r-- src/ceph/doc/rbd/qemu-rbd.rst | 218
-rw-r--r-- src/ceph/doc/rbd/rados-rbd-cmds.rst | 223
-rw-r--r-- src/ceph/doc/rbd/rbd-cloudstack.rst | 135
-rw-r--r-- src/ceph/doc/rbd/rbd-config-ref.rst | 136
-rw-r--r-- src/ceph/doc/rbd/rbd-ko.rst | 59
-rw-r--r-- src/ceph/doc/rbd/rbd-mirroring.rst | 318
-rw-r--r-- src/ceph/doc/rbd/rbd-openstack.rst | 512
-rw-r--r-- src/ceph/doc/rbd/rbd-replay.rst | 42
-rw-r--r-- src/ceph/doc/rbd/rbd-snapshot.rst | 308
25 files changed, 3427 insertions, 0 deletions
diff --git a/src/ceph/doc/rbd/api/index.rst b/src/ceph/doc/rbd/api/index.rst
new file mode 100644
index 0000000..71f6809
--- /dev/null
+++ b/src/ceph/doc/rbd/api/index.rst
@@ -0,0 +1,8 @@
+========================
+ Ceph Block Device APIs
+========================
+
+.. toctree::
+ :maxdepth: 2
+
+ Librbd (Python) <librbdpy>
diff --git a/src/ceph/doc/rbd/api/librbdpy.rst b/src/ceph/doc/rbd/api/librbdpy.rst
new file mode 100644
index 0000000..fa90331
--- /dev/null
+++ b/src/ceph/doc/rbd/api/librbdpy.rst
@@ -0,0 +1,82 @@
+================
+ Librbd (Python)
+================
+
+.. highlight:: python
+
+The `rbd` Python module provides file-like access to RBD images.
+
+
+Example: Creating and writing to an image
+=========================================
+
+To use `rbd`, you must first connect to RADOS and open an IO
+context::
+
+ cluster = rados.Rados(conffile='my_ceph.conf')
+ cluster.connect()
+ ioctx = cluster.open_ioctx('mypool')
+
+Then you instantiate an :class:`rbd.RBD` object, which you use to create the
+image::
+
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+
+To perform I/O on the image, you instantiate an :class:`rbd.Image` object::
+
+ image = rbd.Image(ioctx, 'myimage')
+ data = 'foo' * 200
+ image.write(data, 0)
+
+This writes 'foo' to the first 600 bytes of the image. Note that data
+cannot be :type:`unicode` - `Librbd` does not know how to deal with
+characters wider than a :c:type:`char`.
+
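+Under Python 3 the same restriction means the data must be a ``bytes``
+object rather than ``str``; a minimal sketch of the equivalent write::
+
+    data = b'foo' * 200
+    image.write(data, 0)
+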
+In the end, you will want to close the image, the IO context and the connection to RADOS::
+
+ image.close()
+ ioctx.close()
+ cluster.shutdown()
+
+To be safe, each of these calls would need to be in a separate ``finally``
+block::
+
+ cluster = rados.Rados(conffile='my_ceph_conf')
+ try:
+ ioctx = cluster.open_ioctx('my_pool')
+ try:
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+ image = rbd.Image(ioctx, 'myimage')
+ try:
+ data = 'foo' * 200
+ image.write(data, 0)
+ finally:
+ image.close()
+ finally:
+ ioctx.close()
+ finally:
+ cluster.shutdown()
+
+This can be cumbersome, so the :class:`Rados`, :class:`Ioctx`, and
+:class:`Image` classes can be used as context managers that close/shutdown
+automatically (see :pep:`343`). Using them as context managers, the
+above example becomes::
+
+ with rados.Rados(conffile='my_ceph.conf') as cluster:
+ with cluster.open_ioctx('mypool') as ioctx:
+ rbd_inst = rbd.RBD()
+ size = 4 * 1024**3 # 4 GiB
+ rbd_inst.create(ioctx, 'myimage', size)
+ with rbd.Image(ioctx, 'myimage') as image:
+ data = 'foo' * 200
+ image.write(data, 0)
+
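+The :class:`rbd.RBD` and :class:`rbd.Image` classes also provide calls for
+listing images and reading data back. A minimal sketch, assuming the image
+created above still exists::
+
+    with rados.Rados(conffile='my_ceph.conf') as cluster:
+        with cluster.open_ioctx('mypool') as ioctx:
+            rbd_inst = rbd.RBD()
+            print(rbd_inst.list(ioctx))       # image names in the pool
+            with rbd.Image(ioctx, 'myimage') as image:
+                print(image.size())           # image size in bytes
+                print(image.read(0, 600))     # the 600 bytes written earlier
+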
+API Reference
+=============
+
+.. automodule:: rbd
+ :members: RBD, Image, SnapIterator
diff --git a/src/ceph/doc/rbd/disk.conf b/src/ceph/doc/rbd/disk.conf
new file mode 100644
index 0000000..3db9b8a
--- /dev/null
+++ b/src/ceph/doc/rbd/disk.conf
@@ -0,0 +1,8 @@
+<disk type='network' device='disk'>
+ <source protocol='rbd' name='poolname/imagename'>
+ <host name='{fqdn}' port='6789'/>
+ <host name='{fqdn}' port='6790'/>
+ <host name='{fqdn}' port='6791'/>
+ </source>
+ <target dev='vda' bus='virtio'/>
+</disk>
diff --git a/src/ceph/doc/rbd/index.rst b/src/ceph/doc/rbd/index.rst
new file mode 100644
index 0000000..c297d0d
--- /dev/null
+++ b/src/ceph/doc/rbd/index.rst
@@ -0,0 +1,72 @@
+===================
+ Ceph Block Device
+===================
+
+.. index:: Ceph Block Device; introduction
+
+A block is a sequence of bytes (for example, a 512-byte block of data).
+Block-based storage interfaces are the most common way to store data with
+rotating media such as hard disks, CDs, floppy disks, and even traditional
+9-track tape. The ubiquity of block device interfaces makes a virtual block
+device an ideal candidate to interact with a mass data storage system like Ceph.
+
+Ceph block devices are thin-provisioned, resizable and store data striped over
+multiple OSDs in a Ceph cluster. Ceph block devices leverage
+:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` capabilities
+such as snapshotting, replication and consistency. Ceph's
+:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` Block Devices (RBD)
+interact with OSDs using kernel modules or the ``librbd`` library.
+
+.. ditaa:: +------------------------+ +------------------------+
+ | Kernel Module | | librbd |
+ +------------------------+-+------------------------+
+ | RADOS Protocol |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. note:: Kernel modules can use Linux page caching. For ``librbd``-based
+ applications, Ceph supports `RBD Caching`_.
+
+Ceph's block devices deliver high performance with infinite scalability to
+`kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and
+cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on
+libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster
+to operate the `Ceph RADOS Gateway`_, the `Ceph FS filesystem`_, and Ceph block
+devices simultaneously.
+
+.. important:: To use Ceph Block Devices, you must have access to a running
+ Ceph cluster.
+
+.. toctree::
+ :maxdepth: 1
+
+ Commands <rados-rbd-cmds>
+ Kernel Modules <rbd-ko>
+ Snapshots <rbd-snapshot>
+ Mirroring <rbd-mirroring>
+ iSCSI Gateway <iscsi-overview>
+ QEMU <qemu-rbd>
+ libvirt <libvirt>
+ Cache Settings <rbd-config-ref>
+ OpenStack <rbd-openstack>
+ CloudStack <rbd-cloudstack>
+ RBD Replay <rbd-replay>
+
+.. toctree::
+ :maxdepth: 2
+
+ Manpages <man/index>
+
+.. toctree::
+ :maxdepth: 2
+
+ APIs <api/index>
+
+.. _RBD Caching: ../rbd-config-ref/
+.. _kernel modules: ../rbd-ko/
+.. _QEMU: ../qemu-rbd/
+.. _OpenStack: ../rbd-openstack
+.. _CloudStack: ../rbd-cloudstack
+.. _Ceph RADOS Gateway: ../../radosgw/
+.. _Ceph FS filesystem: ../../cephfs/
diff --git a/src/ceph/doc/rbd/iscsi-initiator-esx.rst b/src/ceph/doc/rbd/iscsi-initiator-esx.rst
new file mode 100644
index 0000000..18dd583
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-initiator-esx.rst
@@ -0,0 +1,36 @@
+----------------------------------
+The iSCSI Initiator for VMware ESX
+----------------------------------
+
+**Prerequisite:**
+
+- VMware ESX 6.0 or later
+
+**iSCSI Discovery and Multipath Device Setup:**
+
+#. From vSphere, open Storage Adapters on the Configuration tab. Right-click
+ the iSCSI Software Adapter and select Properties.
+
+#. In the General tab click the "Advanced" button and in the "Advanced Settings"
+ set RecoveryTimeout to 25.
+
+#. If CHAP was setup on the iSCSI gateway, in the General tab click the "CHAP…​"
+ button. If CHAP is not being used, skip to step 4.
+
+#. On the CHAP Credentials windows, select “Do not use CHAP unless required by target”,
+ and enter the "Name" and "Secret" values used on the initial setup for the iSCSI
+ gateway, then click on the "OK" button.
+
+#. On the Dynamic Discovery tab, click the "Add…​" button, and enter the IP address
+ and port of one of the iSCSI target portals. Click on the "OK" button.
+
+#. Close the iSCSI Initiator Properties window. A prompt will ask to rescan the
+ iSCSI software adapter. Select Yes.
+
+#. In the Details pane, the LUN on the iSCSI target will be displayed. Right click
+ on a device and select "Manage Paths".
+
+#. On the Manage Paths window, select “Most Recently Used (VMware)” for the policy
+ path selection. Close and repeat for the other disks.
+
+Now the disks can be used for datastores.
diff --git a/src/ceph/doc/rbd/iscsi-initiator-rhel.rst b/src/ceph/doc/rbd/iscsi-initiator-rhel.rst
new file mode 100644
index 0000000..51248e4
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-initiator-rhel.rst
@@ -0,0 +1,90 @@
+------------------------------------------------
+The iSCSI Initiator for Red Hat Enterprise Linux
+------------------------------------------------
+
+**Prerequisite:**
+
+- Package ``iscsi-initiator-utils-6.2.0.873-35`` or newer must be
+ installed
+
+- Package ``device-mapper-multipath-0.4.9-99`` or newer must be
+ installed
+
+**Installing:**
+
+Install the iSCSI initiator and multipath tools:
+
+ ::
+
+ # yum install iscsi-initiator-utils
+ # yum install device-mapper-multipath
+
+**Configuring:**
+
+#. Create the default ``/etc/multipath.conf`` file and enable the
+ ``multipathd`` service:
+
+ ::
+
+ # mpathconf --enable --with_multipathd y
+
+#. Add the following to ``/etc/multipath.conf`` file:
+
+ ::
+
+ devices {
+ device {
+ vendor "LIO-ORG"
+ hardware_handler "1 alua"
+ path_grouping_policy "failover"
+ path_selector "queue-length 0"
+ failback 60
+ path_checker tur
+ prio alua
+ prio_args exclusive_pref_bit
+ fast_io_fail_tmo 25
+ no_path_retry queue
+ }
+ }
+
+#. Restart the ``multipathd`` service:
+
+ ::
+
+ # systemctl reload multipathd
+
+**iSCSI Discovery and Setup:**
+
+#. Discover the target portals:
+
+ ::
+
+ # iscsiadm -m discovery -t st -p 192.168.56.101
+ 192.168.56.101:3260,1 iqn.2003-01.org.linux-iscsi.rheln1
+ 192.168.56.102:3260,2 iqn.2003-01.org.linux-iscsi.rheln1
+
+#. Login to target:
+
+ ::
+
+ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.rheln1 -l
+
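+Optionally, confirm that a session to each target portal was established
+(detailed output omitted here)::
+
+    # iscsiadm -m session -P 1
+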
+**Multipath IO Setup:**
+
+The multipath daemon (``multipathd``) will set up devices automatically
+based on the ``multipath.conf`` settings. Running the ``multipath``
+command shows devices set up in a failover configuration with a priority
+group for each path.
+
+::
+
+ # multipath -ll
+ mpathbt (360014059ca317516a69465c883a29603) dm-1 LIO-ORG ,IBLOCK
+ size=1.0G features='0' hwhandler='1 alua' wp=rw
+ |-+- policy='queue-length 0' prio=50 status=active
+ | `- 28:0:0:1 sde 8:64 active ready running
+ `-+- policy='queue-length 0' prio=10 status=enabled
+ `- 29:0:0:1 sdc 8:32 active ready running
+
+You should now be able to use the RBD image like you would a normal
+multipath’d iSCSI disk.
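+
+For example, using the multipath device name reported above (device and
+mount point names are illustrative), you could create a filesystem on the
+device and mount it::
+
+    # mkfs.xfs /dev/mapper/mpathbt
+    # mount /dev/mapper/mpathbt /mnt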
diff --git a/src/ceph/doc/rbd/iscsi-initiator-win.rst b/src/ceph/doc/rbd/iscsi-initiator-win.rst
new file mode 100644
index 0000000..08a1cfb
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-initiator-win.rst
@@ -0,0 +1,100 @@
+-----------------------------------------
+The iSCSI Initiator for Microsoft Windows
+-----------------------------------------
+
+**Prerequisite:**
+
+- Microsoft Windows 2016
+
+**iSCSI Initiator, Discovery and Setup:**
+
+#. Install the iSCSI initiator driver and MPIO tools.
+
+#. Launch the MPIO program, click on the “Discover Multi-Paths” tab, and select
+ “Add support for iSCSI devices”.
+
+#. On the iSCSI Initiator Properties window, on the "Discovery" tab, add a target
+ portal. Enter the IP address or DNS name and Port of the Ceph iSCSI gateway.
+
+#. On the “Targets” tab, select the target and click on “Connect”.
+
+#. On the “Connect To Target” window, select the “Enable multi-path” option, and
+ click the “Advanced” button.
+
+#. Under the "Connet using" section, select a “Target portal IP” . Select the
+ “Enable CHAP login on” and enter the "Name" and "Target secret" values from the
+ Ceph iSCSI Ansible client credentials section, and click OK.
+
+#. Repeat steps 5 and 6 for each target portal defined when setting up
+ the iSCSI gateway.
+
+**Multipath IO Setup:**
+
+The MPIO load balancing policy, timeout, and retry options are configured
+using PowerShell with the ``mpclaim`` command. The rest is done in the
+MPIO tool.
+
+.. note::
+ It is recommended to increase the ``PDORemovePeriod`` option to 120
+ seconds from PowerShell. This value might need to be adjusted based
+ on the application. When all paths are down, and 120 seconds
+ expires, the operating system will start failing IO requests.
+
+::
+
+ Set-MPIOSetting -NewPDORemovePeriod 120
+
+::
+
+ mpclaim.exe -l -m 1
+
+::
+
+ mpclaim -s -m
+ MSDSM-wide Load Balance Policy: Fail Over Only
+
+#. Using the MPIO tool, from the “Targets” tab, click on the
+ “Devices...” button.
+
+#. From the Devices window, select a disk and click the
+ “MPIO...” button.
+
+#. On the "Device Details" window the paths to each target portal is
+ displayed. If using the ``ceph-ansible`` setup method, the
+ iSCSI gateway will use ALUA to tell the iSCSI initiator which path
+ and iSCSI gateway should be used as the primary path. The Load
+ Balancing Policy “Fail Over Only” must be selected
+
+::
+
+ mpclaim -s -d $MPIO_DISK_ID
+
+.. note::
+ For the ``ceph-ansible`` setup method, there will be one
+ Active/Optimized path which is the path to the iSCSI gateway node
+ that owns the LUN, and there will be an Active/Unoptimized path for
+ each other iSCSI gateway node.
+
+**Tuning:**
+
+Consider using the following registry settings:
+
+- Windows Disk Timeout
+
+ ::
+
+ HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk
+
+ ::
+
+ TimeOutValue = 65
+
+- Microsoft iSCSI Initiator Driver
+
+ ::
+
+ HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters
+
+ ::
+
+ LinkDownTime = 25
+ SRBTimeoutDelta = 15
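+
+As an alternative to editing the registry by hand, the registry values above
+can also be set from PowerShell; a sketch setting the disk ``TimeOutValue``,
+assuming the value already exists under the ``Disk`` service key::
+
+    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Disk" -Name TimeOutValue -Value 65 -Type DWord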
diff --git a/src/ceph/doc/rbd/iscsi-initiators.rst b/src/ceph/doc/rbd/iscsi-initiators.rst
new file mode 100644
index 0000000..d3ad633
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-initiators.rst
@@ -0,0 +1,10 @@
+--------------------------------
+Configuring the iSCSI Initiators
+--------------------------------
+
+.. toctree::
+ :maxdepth: 1
+
+ The iSCSI Initiator for Red Hat Enterprise Linux <iscsi-initiator-rhel>
+ The iSCSI Initiator for Microsoft Windows <iscsi-initiator-win>
+ The iSCSI Initiator for VMware ESX <iscsi-initiator-esx>
diff --git a/src/ceph/doc/rbd/iscsi-monitoring.rst b/src/ceph/doc/rbd/iscsi-monitoring.rst
new file mode 100644
index 0000000..d425232
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-monitoring.rst
@@ -0,0 +1,103 @@
+-----------------------------
+Monitoring the iSCSI gateways
+-----------------------------
+
+Ceph provides an additional tool for iSCSI gateway environments
+to monitor performance of exported RADOS Block Device (RBD) images.
+
+The ``gwtop`` tool is a ``top``-like tool that displays aggregated
+performance metrics of RBD images that are exported to clients over
+iSCSI. The metrics are sourced from a Performance Metrics Domain Agent
+(PMDA). Information from the Linux-IO target (LIO) PMDA is used to list
+each exported RBD image with the connected client and its associated I/O
+metrics.
+
+**Requirements:**
+
+- A running Ceph iSCSI gateway
+
+**Installing:**
+
+#. As ``root``, install the ``ceph-iscsi-tools`` package on each iSCSI
+ gateway node:
+
+ ::
+
+ # yum install ceph-iscsi-tools
+
+#. As ``root``, install the performance co-pilot package on each iSCSI
+ gateway node:
+
+ ::
+
+ # yum install pcp
+
+#. As ``root``, install the LIO PMDA package on each iSCSI gateway node:
+
+ ::
+
+ # yum install pcp-pmda-lio
+
+#. As ``root``, enable and start the performance co-pilot service on
+ each iSCSI gateway node:
+
+ ::
+
+ # systemctl enable pmcd
+ # systemctl start pmcd
+
+#. As ``root``, register the ``pcp-pmda-lio`` agent:
+
+ ::
+
+ cd /var/lib/pcp/pmdas/lio
+ ./Install
+
+By default, ``gwtop`` assumes the iSCSI gateway configuration object is
+stored in a RADOS object called ``gateway.conf`` in the ``rbd`` pool.
+This configuration defines the iSCSI gateways to contact for gathering
+the performance statistics. This can be overridden by using either the
+``-g`` or ``-c`` flags. See ``gwtop --help`` for more details.
+
+The LIO configuration determines which type of performance statistics to
+extract from performance co-pilot. When ``gwtop`` starts it looks at the
+LIO configuration, and if it finds user-space disks, then ``gwtop``
+selects the LIO collector automatically.
+
+**Example ``gwtop`` Outputs**
+
+For kernel RBD-based devices:
+
+::
+
+ gwtop 2/2 Gateways CPU% MIN: 4 MAX: 5 Network Total In: 2M Out: 3M 10:20:09
+ Capacity: 8G Disks: 8 IOPS: 500 Clients: 1 Ceph: HEALTH_OK OSDs: 3
+ Pool.Image Src Device Size r/s w/s rMB/s wMB/s await r_await w_await Client
+ iscsi.t1703 rbd0 500M 0 0 0.00 0.00 0.00 0.00 0.00
+ iscsi.testme1 rbd5 500M 0 0 0.00 0.00 0.00 0.00 0.00
+ iscsi.testme2 rbd2 500M 0 0 0.00 0.00 0.00 0.00 0.00
+ iscsi.testme3 rbd3 500M 0 0 0.00 0.00 0.00 0.00 0.00
+ iscsi.testme5 rbd1 500M 0 0 0.00 0.00 0.00 0.00 0.00
+ rbd.myhost_1 T rbd4 4G 500 0 1.95 0.00 2.37 2.37 0.00 rh460p(CON)
+ rbd.test_2 rbd6 1G 0 0 0.00 0.00 0.00 0.00 0.00
+ rbd.testme rbd7 500M 0 0 0.00 0.00 0.00 0.00 0.00
+
+For user backed storage (TCMU) devices:
+
+::
+
+ gwtop 2/2 Gateways CPU% MIN: 4 MAX: 5 Network Total In: 2M Out: 3M 10:20:00
+ Capacity: 8G Disks: 8 IOPS: 503 Clients: 1 Ceph: HEALTH_OK OSDs: 3
+ Pool.Image Src Size iops rMB/s wMB/s Client
+ iscsi.t1703 500M 0 0.00 0.00
+ iscsi.testme1 500M 0 0.00 0.00
+ iscsi.testme2 500M 0 0.00 0.00
+ iscsi.testme3 500M 0 0.00 0.00
+ iscsi.testme5 500M 0 0.00 0.00
+ rbd.myhost_1 T 4G 504 1.95 0.00 rh460p(CON)
+ rbd.test_2 1G 0 0.00 0.00
+ rbd.testme 500M 0 0.00 0.00
+
+In the *Client* column, ``(CON)`` means the iSCSI initiator (client) is
+currently logged into the iSCSI gateway. If ``-multi-`` is displayed,
+then multiple clients are mapped to the single RBD image.
diff --git a/src/ceph/doc/rbd/iscsi-overview.rst b/src/ceph/doc/rbd/iscsi-overview.rst
new file mode 100644
index 0000000..a8c64e2
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-overview.rst
@@ -0,0 +1,50 @@
+==================
+Ceph iSCSI Gateway
+==================
+
+The iSCSI gateway integrates Ceph Storage with the iSCSI standard to provide
+a Highly Available (HA) iSCSI target that exports RADOS Block Device (RBD) images
+as SCSI disks. The iSCSI protocol allows clients (initiators) to send SCSI commands
+to SCSI storage devices (targets) over a TCP/IP network. This allows for heterogeneous
+clients, such as Microsoft Windows, to access the Ceph Storage cluster.
+
+Each iSCSI gateway runs the Linux IO target kernel subsystem (LIO) to provide the
+iSCSI protocol support. LIO utilizes a userspace passthrough (TCMU) to interact
+with Ceph's librbd library and expose RBD images to iSCSI clients. With Ceph’s
+iSCSI gateway you can effectively run a fully integrated block-storage
+infrastructure with all the features and benefits of a conventional Storage Area
+Network (SAN).
+
+.. ditaa::
+ Cluster Network
+ +-------------------------------------------+
+ | | | |
+ +-------+ +-------+ +-------+ +-------+
+ | | | | | | | |
+ | OSD 1 | | OSD 2 | | OSD 3 | | OSD N |
+ | {s}| | {s}| | {s}| | {s}|
+ +-------+ +-------+ +-------+ +-------+
+ | | | |
+ +--------->| | +---------+ | |<---------+
+ : | | | RBD | | | :
+ | +----------------| Image |----------------+ |
+ | Public Network | {d} | |
+ | +---------+ |
+ | |
+ | +-------------------+ |
+ | +--------------+ | iSCSI Initators | +--------------+ |
+ | | iSCSI GW | | +-----------+ | | iSCSI GW | |
+ +-->| RBD Module |<--+ | Various | +-->| RBD Module |<--+
+ | | | | Operating | | | |
+ +--------------+ | | Systems | | +--------------+
+ | +-----------+ |
+ +-------------------+
+
+
+.. toctree::
+ :maxdepth: 1
+
+ Requirements <iscsi-requirements>
+ Configuring the iSCSI Target <iscsi-targets>
+ Configuring the iSCSI Initiator <iscsi-initiators>
+ Monitoring the iSCSI Gateways <iscsi-monitoring>
diff --git a/src/ceph/doc/rbd/iscsi-requirements.rst b/src/ceph/doc/rbd/iscsi-requirements.rst
new file mode 100644
index 0000000..1ae19e0
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-requirements.rst
@@ -0,0 +1,49 @@
+==========================
+iSCSI Gateway Requirements
+==========================
+
+To implement the Ceph iSCSI gateway there are a few requirements. It is recommended
+to use two to four iSCSI gateway nodes for a highly available Ceph iSCSI gateway
+solution.
+
+For hardware recommendations, see the `Hardware Recommendation page <http://docs.ceph.com/docs/master/start/hardware-recommendations/>`_
+for more details.
+
+.. note::
+ On the iSCSI gateway nodes, the memory footprint of the RBD images
+ can grow to a large size. Plan memory requirements accordingly, based
+ on the number of RBD images mapped.
+
+There are no specific iSCSI gateway options for the Ceph Monitors or
+OSDs, but it is important to lower the default timers for detecting
+down OSDs to reduce the possibility of initiator timeouts. The following
+configuration options are suggested for each OSD node in the storage
+cluster::
+
+ [osd]
+ osd heartbeat grace = 20
+ osd heartbeat interval = 5
+
+- Online Updating Using the Ceph Monitor
+
+ ::
+
+ ceph tell <daemon_type>.<id> injectargs '--<parameter_name> <new_value>'
+
+ ::
+
+ ceph tell osd.0 injectargs '--osd_heartbeat_grace 20'
+ ceph tell osd.0 injectargs '--osd_heartbeat_interval 5'
+
+- Online Updating on the OSD Node
+
+ ::
+
+ ceph daemon <daemon_type>.<id> config set <parameter_name> <new_value>
+
+ ::
+
+ ceph daemon osd.0 config set osd_heartbeat_grace 20
+ ceph daemon osd.0 config set osd_heartbeat_interval 5
+
+For more details on setting Ceph's configuration options, see the `Configuration page <http://docs.ceph.com/docs/master/rados/configuration/>`_.
diff --git a/src/ceph/doc/rbd/iscsi-target-ansible.rst b/src/ceph/doc/rbd/iscsi-target-ansible.rst
new file mode 100644
index 0000000..4169a9f
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-target-ansible.rst
@@ -0,0 +1,343 @@
+==========================================
+Configuring the iSCSI Target using Ansible
+==========================================
+
+The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client
+node. The Ceph iSCSI gateway can be a standalone node or be colocated on
+a Ceph Object Store Disk (OSD) node. Completing the following steps will
+install and configure the Ceph iSCSI gateway for basic operation.
+
+**Requirements:**
+
+- A running Ceph Luminous (12.2.x) cluster or newer
+
+- RHEL/CentOS 7.4; or Linux kernel v4.14 or newer
+
+- The ``ceph-iscsi-config`` package installed on all the iSCSI gateway nodes
+
+**Installing:**
+
+#. On the Ansible installer node, which could be either the administration node
+ or a dedicated deployment node, perform the following steps:
+
+ #. As ``root``, install the ``ceph-ansible`` package:
+
+ ::
+
+ # yum install ceph-ansible
+
+ #. Add an entry in ``/etc/ansible/hosts`` file for the gateway group:
+
+ ::
+
+ [ceph-iscsi-gw]
+ ceph-igw-1
+ ceph-igw-2
+
+.. note::
+ If co-locating the iSCSI gateway with an OSD node, then add the OSD node to the
+ ``[ceph-iscsi-gw]`` section.
+
+**Configuring:**
+
+The ``ceph-ansible`` package places a file in the ``/usr/share/ceph-ansible/group_vars/``
+directory called ``ceph-iscsi-gw.sample``. Create a copy of this sample file named
+``ceph-iscsi-gw.yml``. Review the following Ansible variables and descriptions,
+and update accordingly.
+
++--------------------------------------+--------------------------------------+
+| Variable | Meaning/Purpose |
++======================================+======================================+
+| ``seed_monitor`` | Each gateway needs access to the |
+| | ceph cluster for rados and rbd |
+| | calls. This means the iSCSI gateway |
+| | must have an appropriate |
+| | ``/etc/ceph/`` directory defined. |
+| | The ``seed_monitor`` host is used to |
+| | populate the iSCSI gateway’s |
+| | ``/etc/ceph/`` directory. |
++--------------------------------------+--------------------------------------+
+| ``cluster_name`` | Define a custom storage cluster |
+| | name. |
++--------------------------------------+--------------------------------------+
+| ``gateway_keyring`` | Define a custom keyring name. |
++--------------------------------------+--------------------------------------+
+| ``deploy_settings`` | If set to ``true``, then deploy the |
+| | settings when the playbook is run. |
++--------------------------------------+--------------------------------------+
+| ``perform_system_checks`` | This is a boolean value that checks |
+| | for multipath and lvm configuration |
+| | settings on each gateway. It must be |
+| | set to true for at least the first |
+| | run to ensure multipathd and lvm are |
+| | configured properly. |
++--------------------------------------+--------------------------------------+
+| ``gateway_iqn`` | This is the iSCSI IQN that all the |
+| | gateways will expose to clients. |
+| | This means each client will see the |
+| | gateway group as a single subsystem. |
++--------------------------------------+--------------------------------------+
+| ``gateway_ip_list`` | The ip list defines the IP addresses |
+| | that will be used on the front end |
+| | network for iSCSI traffic. This IP |
+| | will be bound to the active target |
+| | portal group on each node, and is |
+| | the access point for iSCSI traffic. |
+| | Each IP should correspond to an IP |
+| | available on the hosts defined in |
+| | the ``ceph-iscsi-gw`` host group in |
+| | ``/etc/ansible/hosts``. |
++--------------------------------------+--------------------------------------+
+| ``rbd_devices`` | This section defines the RBD images |
+| | that will be controlled and managed |
+| | within the iSCSI gateway |
+| | configuration. Parameters like |
+| | ``pool`` and ``image`` are self |
+| | explanatory. Here are the other |
+| | parameters: ``size`` = This defines |
+| | the size of the RBD. You may |
+| | increase the size later, by simply |
+| | changing this value, but shrinking |
+| | the size of an RBD is not supported |
+| | and is ignored. ``host`` = This is |
+| | the iSCSI gateway host name that |
+| | will be responsible for the rbd |
+| | allocation/resize. Every defined |
+| | ``rbd_device`` entry must have a |
+| | host assigned. ``state`` = This is |
+| | typical Ansible syntax for whether |
+| | the resource should be defined or |
+| | removed. A request with a state of |
+| | absent will first be checked to |
+| | ensure the rbd is not mapped to any |
+| | client. If the RBD is unallocated, |
+| | it will be removed from the iSCSI |
+| | gateway and deleted from the |
+| | configuration. |
++--------------------------------------+--------------------------------------+
+| ``client_connections`` | This section defines the iSCSI |
+| | client connection details together |
+| | with the LUN (RBD image) masking. |
+| | Currently only CHAP is supported as |
+| | an authentication mechanism. Each |
+| | connection defines an ``image_list`` |
+| | which is a comma separated list of |
+| | the form |
+| | ``pool.rbd_image[,pool.rbd_image]``. |
+| | RBD images can be added and removed |
+| | from this list, to change the client |
+| | masking. Note that there are no |
+| | checks done to limit RBD sharing |
+| | across client connections. |
++--------------------------------------+--------------------------------------+
+
+.. note::
+ When using the ``gateway_iqn`` variable, and for Red Hat Enterprise Linux
+ clients, installing the ``iscsi-initiator-utils`` package is required for
+ retrieving the gateway’s IQN name. The iSCSI initiator name is located in the
+ ``/etc/iscsi/initiatorname.iscsi`` file.
+
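+As an illustration only, a minimal ``ceph-iscsi-gw.yml`` built from the
+variables above might look like the following; every value is a placeholder,
+and the ``ceph-iscsi-gw.sample`` file shipped in
+``/usr/share/ceph-ansible/group_vars/`` remains the authoritative reference
+for the exact syntax::
+
+    seed_monitor: mon1
+    cluster_name: ceph
+    gateway_keyring: ceph.client.admin.keyring
+    deploy_settings: true
+    perform_system_checks: true
+    gateway_iqn: "iqn.2003-01.com.redhat.iscsi-gw:ceph-igw"
+    gateway_ip_list: "192.168.122.201,192.168.122.202"
+    rbd_devices:
+      - { pool: 'rbd', image: 'disk_1', size: '50G', host: 'ceph-igw-1', state: 'present' }
+    client_connections:
+      - { client: 'iqn.1994-05.com.redhat:rh7-client', image_list: 'rbd.disk_1', chap: 'user/password123', status: 'present' }
+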
+**Deploying:**
+
+On the Ansible installer node, perform the following steps.
+
+#. As ``root``, execute the Ansible playbook:
+
+ ::
+
+ # cd /usr/share/ceph-ansible
+ # ansible-playbook ceph-iscsi-gw.yml
+
+ .. note::
+ The Ansible playbook will handle RPM dependencies, RBD creation
+ and Linux IO configuration.
+
+#. Verify the configuration from an iSCSI gateway node:
+
+ ::
+
+ # gwcli ls
+
+ .. note::
+ For more information on using the ``gwcli`` command to install and configure
+ a Ceph iSCSI gateway, see the `Configuring the iSCSI Target using the Command Line Interface`_
+ section.
+
+ .. important::
+ Attempting to use the ``targetcli`` tool to change the configuration will
+ result in issues such as ALUA misconfiguration and path failover
+ problems. There is the potential to corrupt data, to have mismatched
+ configuration across iSCSI gateways, and to have mismatched WWN information,
+ which will lead to client multipath problems.
+
+**Service Management:**
+
+The ``ceph-iscsi-config`` package installs the configuration management
+logic and a Systemd service called ``rbd-target-gw``. When the Systemd
+service is enabled, the ``rbd-target-gw`` will start at boot time and
+will restore the Linux IO state. The Ansible playbook disables the
+target service during the deployment. Below are the outcomes of
+interacting with the ``rbd-target-gw`` Systemd service.
+
+::
+
+ # systemctl <start|stop|restart|reload> rbd-target-gw
+
+- ``reload``
+
+ A reload request will force ``rbd-target-gw`` to reread the
+ configuration and apply it to the current running environment. This
+ is normally not required, since changes are deployed in parallel from
+ Ansible to all iSCSI gateway nodes.
+
+- ``stop``
+
+ A stop request will close the gateway’s portal interfaces, dropping
+ connections to clients and wiping the current LIO configuration from
+ the kernel. This returns the iSCSI gateway to a clean state. When
+ clients are disconnected, active I/O is rescheduled to the other
+ iSCSI gateways by the client side multipathing layer.
+
+**Administration:**
+
+Within the ``/usr/share/ceph-ansible/group_vars/ceph-iscsi-gw`` file
+there are a number of operational workflows that the Ansible playbook
+supports.
+
+.. warning::
+ Before removing RBD images from the iSCSI gateway configuration,
+ follow the standard procedures for removing a storage device from
+ the operating system.
+
++--------------------------------------+--------------------------------------+
+| I want to…​ | Update the ``ceph-iscsi-gw`` file |
+| | by…​ |
++======================================+======================================+
+| Add more RBD images | Adding another entry to the |
+| | ``rbd_devices`` section with the new |
+| | image. |
++--------------------------------------+--------------------------------------+
+| Resize an existing RBD image | Updating the size parameter within |
+| | the ``rbd_devices`` section. Client |
+| | side actions are required to pick up |
+| | the new size of the disk. |
++--------------------------------------+--------------------------------------+
+| Add a client | Adding an entry to the |
+| | ``client_connections`` section. |
++--------------------------------------+--------------------------------------+
+| Add another RBD to a client | Adding the relevant RBD |
+| | ``pool.image`` name to the |
+| | ``image_list`` variable for the |
+| | client. |
++--------------------------------------+--------------------------------------+
+| Remove an RBD from a client | Removing the RBD ``pool.image`` name |
+| | from the clients ``image_list`` |
+| | variable. |
++--------------------------------------+--------------------------------------+
+| Remove an RBD from the system | Changing the RBD entry state |
+| | variable to ``absent``. The RBD |
+| | image must be unallocated from the |
+| | operating system first for this to |
+| | succeed. |
++--------------------------------------+--------------------------------------+
+| Change the clients CHAP credentials | Updating the relevant CHAP details |
+| | in ``client_connections``. This will |
+| | need to be coordinated with the |
+| | clients. For example, the client |
+| | issues an iSCSI logout, the |
+| | credentials are changed by the |
+| | Ansible playbook, the credentials |
+| | are changed at the client, then the |
+| | client performs an iSCSI login. |
++--------------------------------------+--------------------------------------+
+| Remove a client | Updating the relevant |
+| | ``client_connections`` item with a |
+| | state of ``absent``. Once the |
+| | Ansible playbook is run, the client |
+| | will be purged from the system, but |
+| | the disks will remain defined to |
+| | Linux IO for potential reuse. |
++--------------------------------------+--------------------------------------+
+
+Once a change has been made, rerun the Ansible playbook to apply the
+change across the iSCSI gateway nodes.
+
+::
+
+ # ansible-playbook ceph-iscsi-gw.yml
+
+**Removing the Configuration:**
+
+The ``ceph-ansible`` package provides an Ansible playbook to
+remove the iSCSI gateway configuration and related RBD images. The
+Ansible playbook is ``/usr/share/ceph-ansible/purge_gateways.yml``. When
+this Ansible playbook is run, you are prompted for the type of purge to
+perform:
+
+*lio* :
+
+In this mode the LIO configuration is purged on all iSCSI gateways that
+are defined. Disks that were created are left untouched within the Ceph
+storage cluster.
+
+*all* :
+
+When ``all`` is chosen, the LIO configuration is removed together with
+**all** RBD images that were defined within the iSCSI gateway
+environment; other unrelated RBD images will not be removed. Ensure the
+correct mode is chosen, as this operation will delete data.
+
+.. warning::
+ A purge operation is a destructive action against your iSCSI gateway
+ environment.
+
+.. warning::
+ A purge operation will fail if RBD images have snapshots or clones
+ and are exported through the Ceph iSCSI gateway.
+
+::
+
+ [root@rh7-iscsi-client ceph-ansible]# ansible-playbook purge_gateways.yml
+ Which configuration elements should be purged? (all, lio or abort) [abort]: all
+
+
+ PLAY [Confirm removal of the iSCSI gateway configuration] *********************
+
+
+ GATHERING FACTS ***************************************************************
+ ok: [localhost]
+
+
+ TASK: [Exit playbook if user aborted the purge] *******************************
+ skipping: [localhost]
+
+
+ TASK: [set_fact ] *************************************************************
+ ok: [localhost]
+
+
+ PLAY [Removing the gateway configuration] *************************************
+
+
+ GATHERING FACTS ***************************************************************
+ ok: [ceph-igw-1]
+ ok: [ceph-igw-2]
+
+
+ TASK: [igw_purge | purging the gateway configuration] *************************
+ changed: [ceph-igw-1]
+ changed: [ceph-igw-2]
+
+
+ TASK: [igw_purge | deleting configured rbd devices] ***************************
+ changed: [ceph-igw-1]
+ changed: [ceph-igw-2]
+
+
+ PLAY RECAP ********************************************************************
+ ceph-igw-1 : ok=3 changed=2 unreachable=0 failed=0
+ ceph-igw-2 : ok=3 changed=2 unreachable=0 failed=0
+ localhost : ok=2 changed=0 unreachable=0 failed=0
+
+
+.. _Configuring the iSCSI Target using the Command Line Interface: ../iscsi-target-cli
diff --git a/src/ceph/doc/rbd/iscsi-target-cli.rst b/src/ceph/doc/rbd/iscsi-target-cli.rst
new file mode 100644
index 0000000..6da6f10
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-target-cli.rst
@@ -0,0 +1,163 @@
+=============================================================
+Configuring the iSCSI Target using the Command Line Interface
+=============================================================
+
+The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client
+node. The Ceph iSCSI gateway can be a standalone node or be colocated on
+a Ceph Object Store Disk (OSD) node. Completing the following steps will
+install and configure the Ceph iSCSI gateway for basic operation.
+
+**Requirements:**
+
+- A running Ceph Luminous or later storage cluster
+
+- RHEL/CentOS 7.4; or Linux kernel v4.14 or newer
+
+- The following packages must be installed from your Linux distribution's software repository:
+
+ - ``targetcli-2.1.fb47`` or newer package
+
+ - ``python-rtslib-2.1.fb64`` or newer package
+
+ - ``tcmu-runner-1.3.0`` or newer package
+
+ - ``ceph-iscsi-config-2.3`` or newer package
+
+ - ``ceph-iscsi-cli-2.5`` or newer package
+
+ .. important::
+ If previous versions of these packages exist, then they must
+ be removed first before installing the newer versions.
+
+Do the following steps on the Ceph iSCSI gateway node before proceeding
+to the *Installing* section:
+
+#. If the Ceph iSCSI gateway is not colocated on an OSD node, then copy
+ the Ceph configuration files, located in ``/etc/ceph/``, from a
+ running Ceph node in the storage cluster to the iSCSI Gateway node.
+ The Ceph configuration files must exist on the iSCSI gateway node
+ under ``/etc/ceph/``.
+
+#. Install and configure the `Ceph Command-line
+ Interface <http://docs.ceph.com/docs/master/start/quick-rbd/#install-ceph>`_
+
+#. If needed, open TCP ports 3260 and 5000 on the firewall.
+
+#. Create a new or use an existing RADOS Block Device (RBD).
+
+**Installing:**
+
+#. As ``root``, on all iSCSI gateway nodes, install the
+ ``ceph-iscsi-cli`` package:
+
+ ::
+
+ # yum install ceph-iscsi-cli
+
+#. As ``root``, on all iSCSI gateway nodes, install the ``tcmu-runner``
+ package:
+
+ ::
+
+ # yum install tcmu-runner
+
+#. As ``root``, on an iSCSI gateway node, create a file named
+ ``iscsi-gateway.cfg`` in the ``/etc/ceph/`` directory:
+
+ ::
+
+ # touch /etc/ceph/iscsi-gateway.cfg
+
+ #. Edit the ``iscsi-gateway.cfg`` file and add the following lines:
+
+ ::
+
+ [config]
+ # Name of the Ceph storage cluster. A suitable Ceph configuration file allowing
+ # access to the Ceph storage cluster from the gateway node is required, if not
+ # colocated on an OSD node.
+ cluster_name = ceph
+
+ # Place a copy of the ceph cluster's admin keyring in the gateway's /etc/ceph
+ # directory and reference the filename here
+ gateway_keyring = ceph.client.admin.keyring
+
+
+ # API settings.
+ # The API supports a number of options that allow you to tailor it to your
+ # local environment. If you want to run the API under https, you will need to
+ # create cert/key files that are compatible for each iSCSI gateway node, that is
+ # not locked to a specific node. SSL cert and key files *must* be called
+ # 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory
+ # on *each* gateway node. With the SSL files in place, you can use 'api_secure = true'
+ # to switch to https mode.
+
+ # To support the API, the bare minimum settings are:
+ api_secure = false
+
+ # Additional API configuration options are as follows, defaults shown.
+ # api_user = admin
+ # api_password = admin
+ # api_port = 5001
+ # trusted_ip_list = 192.168.0.10,192.168.0.11
+
+ .. important::
+ The ``iscsi-gateway.cfg`` file must be identical on all iSCSI gateway nodes.
+
+ #. As ``root``, copy the ``iscsi-gateway.cfg`` file to all iSCSI
+ gateway nodes.
+
+#. As ``root``, on all iSCSI gateway nodes, enable and start the API
+ service:
+
+ ::
+
+ # systemctl enable rbd-target-api
+ # systemctl start rbd-target-api
+
+**Configuring:**
+
+#. As ``root``, on an iSCSI gateway node, start the iSCSI gateway
+ command-line interface:
+
+ ::
+
+ # gwcli
+
+#. Creating the iSCSI gateways:
+
+ ::
+
+ >/iscsi-target create iqn.2003-01.com.redhat.iscsi-gw:<target_name>
+ > goto gateways
+ > create <iscsi_gw_name> <IP_addr_of_gw>
+ > create <iscsi_gw_name> <IP_addr_of_gw>
+
+#. Adding a RADOS Block Device (RBD):
+
+ ::
+
+ > cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:<target_name>/disks/
+ >/disks/ create pool=<pool_name> image=<image_name> size=<image_size>m|g|t
+
+#. Creating a client:
+
+ ::
+
+ > goto hosts
+ > create iqn.1994-05.com.redhat:<client_name>
+ > auth chap=<user_name>/<password> | nochap
+
+
+ .. warning::
+ CHAP must always be configured. Without CHAP, the target will
+ reject any login requests.
+
+#. Adding disks to a client:
+
+ ::
+
+ >/iscsi-target..eph-igw/hosts> cd iqn.1994-05.com.redhat:<client_name>
+ > disk add <pool_name>.<image_name>
+
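+Putting these steps together, a complete ``gwcli`` session might look like the
+following sketch; the target IQN, gateway names, IP addresses, pool, image and
+CHAP credentials are all placeholders for your own values::
+
+    > /iscsi-target create iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
+    > goto gateways
+    > create ceph-igw-1 192.168.122.201
+    > create ceph-igw-2 192.168.122.202
+    > cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/disks/
+    > create pool=rbd image=disk_1 size=50g
+    > goto hosts
+    > create iqn.1994-05.com.redhat:rh7-client
+    > auth chap=myiscsiusername/myiscsipassword
+    > disk add rbd.disk_1
+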
+The next step is to configure the iSCSI initiators.
diff --git a/src/ceph/doc/rbd/iscsi-targets.rst b/src/ceph/doc/rbd/iscsi-targets.rst
new file mode 100644
index 0000000..b7dcac7
--- /dev/null
+++ b/src/ceph/doc/rbd/iscsi-targets.rst
@@ -0,0 +1,27 @@
+=============
+iSCSI Targets
+=============
+
+Traditionally, block-level access to a Ceph storage cluster has been
+limited to QEMU and ``librbd``, which is a key enabler for adoption
+within OpenStack environments. Starting with the Ceph Luminous release,
+block-level access is expanding to offer standard iSCSI support allowing
+wider platform usage, and potentially opening new use cases.
+
+- RHEL/CentOS 7.4; or Linux kernel v4.14 or newer
+
+- A working Ceph Storage cluster, deployed with ``ceph-ansible`` or using the command-line interface
+
+- iSCSI gateway nodes, which can either be colocated with OSD nodes or on dedicated nodes
+
+- Separate network subnets for iSCSI front-end traffic and Ceph back-end traffic
+
+The Ceph iSCSI gateway can be installed and configured using either
+Ansible or the command-line interface:
+
+.. toctree::
+ :maxdepth: 1
+
+ Using Ansible <iscsi-target-ansible>
+ Using the Command Line Interface <iscsi-target-cli>
diff --git a/src/ceph/doc/rbd/libvirt.rst b/src/ceph/doc/rbd/libvirt.rst
new file mode 100644
index 0000000..f953b1f
--- /dev/null
+++ b/src/ceph/doc/rbd/libvirt.rst
@@ -0,0 +1,319 @@
+=================================
+ Using libvirt with Ceph RBD
+=================================
+
+.. index:: Ceph Block Device; libvirt
+
+The ``libvirt`` library creates a virtual machine abstraction layer between
+hypervisor interfaces and the software applications that use them. With
+``libvirt``, developers and system administrators can focus on a common
+management framework, common API, and common shell interface (i.e., ``virsh``)
+to many different hypervisors, including:
+
+- QEMU/KVM
+- XEN
+- LXC
+- VirtualBox
+- etc.
+
+Ceph block devices support QEMU/KVM. You can use Ceph block devices with
+software that interfaces with ``libvirt``. The following stack diagram
+illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
+
+
+.. ditaa:: +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+The most common ``libvirt`` use case involves providing Ceph block devices to
+cloud solutions like OpenStack or CloudStack. The cloud solution uses
+``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block
+devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices
+and CloudStack`_ for details. See `Installation`_ for installation details.
+
+You can also use Ceph block devices with ``libvirt``, ``virsh`` and the
+``libvirt`` API. See `libvirt Virtualization API`_ for details.
+
+
+To create VMs that use Ceph block devices, use the procedures in the following
+sections. In the following examples, we use ``libvirt-pool`` for the pool
+name, ``client.libvirt`` for the user name, and ``new-libvirt-image`` for the
+image name. You may use any value you like, but ensure you replace those values
+when executing commands in the subsequent procedures.
+
+
+Configuring Ceph
+================
+
+To configure Ceph for use with ``libvirt``, perform the following steps:
+
+#. `Create a pool`_. The following example uses the
+ pool name ``libvirt-pool`` with 128 placement groups. ::
+
+ ceph osd pool create libvirt-pool 128 128
+
+ Verify the pool exists. ::
+
+ ceph osd lspools
+
+#. Use the ``rbd`` tool to initialize the pool for use by RBD::
+
+ rbd pool init <pool-name>
+
+#. `Create a Ceph User`_ (or use ``client.admin`` for version 0.9.7 and
+ earlier). The following example uses the Ceph user name ``client.libvirt``
+ and references ``libvirt-pool``. ::
+
+ ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool'
+
+ Verify the name exists. ::
+
+ ceph auth ls
+
+ **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
+ not the Ceph name ``client.libvirt``. See `User Management - User`_ and
+ `User Management - CLI`_ for a detailed explanation of the difference
+ between ID and name.
+
+#. Use QEMU to `create an image`_ in your RBD pool.
+ The following example uses the image name ``new-libvirt-image``
+ and references ``libvirt-pool``. ::
+
+ qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G
+
+ Verify the image exists. ::
+
+ rbd -p libvirt-pool ls
+
+ **NOTE:** You can also use `rbd create`_ to create an image, but we
+ recommend ensuring that QEMU is working properly.
+
+.. tip:: Optionally, if you wish to enable debug logs and the admin socket for
+ this client, you can add the following section to ``/etc/ceph/ceph.conf``::
+
+ [client.libvirt]
+ log file = /var/log/ceph/qemu-guest-$pid.log
+ admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
+
+ The ``client.libvirt`` section name should match the cephx user you created
+ above. If SELinux or AppArmor is enabled, note that this could prevent the
+ client process (qemu via libvirt) from writing the logs or admin socket to
+ the destination locations (``/var/log/ceph`` or ``/var/run/ceph``).
+
+
+
+Preparing the VM Manager
+========================
+
+You may use ``libvirt`` without a VM manager, but you may find it simpler to
+create your first domain with ``virt-manager``.
+
+#. Install a virtual machine manager. See `KVM/VirtManager`_ for details. ::
+
+ sudo apt-get install virt-manager
+
+#. Download an OS image (if necessary).
+
+#. Launch the virtual machine manager. ::
+
+ sudo virt-manager
+
+
+
+Creating a VM
+=============
+
+To create a VM with ``virt-manager``, perform the following steps:
+
+#. Press the **Create New Virtual Machine** button.
+
+#. Name the new virtual machine domain. In the following example, we
+ use the name ``libvirt-virtual-machine``. You may use any name you wish,
+ but ensure you replace ``libvirt-virtual-machine`` with the name you
+ choose in subsequent commandline and configuration examples. ::
+
+ libvirt-virtual-machine
+
+#. Import the image. ::
+
+ /path/to/image/recent-linux.img
+
+ **NOTE:** Import a recent image. Some older images may not rescan for
+ virtual devices properly.
+
+#. Configure and start the VM.
+
+#. You may use ``virsh list`` to verify the VM domain exists. ::
+
+ sudo virsh list
+
+#. Log in to the VM (root/root).
+
+#. Stop the VM before configuring it for use with Ceph.
+
+
+Configuring the VM
+==================
+
+When configuring the VM for use with Ceph, it is important to use ``virsh``
+where appropriate. Additionally, ``virsh`` commands often require root
+privileges (i.e., ``sudo``) and will not return appropriate results or notify
+you that root privileges are required. For a reference of ``virsh``
+commands, refer to `Virsh Command Reference`_.
+
+
+#. Open the configuration file with ``virsh edit``. ::
+
+ sudo virsh edit {vm-domain-name}
+
+ Under ``<devices>`` there should be a ``<disk>`` entry. ::
+
+ <devices>
+ <emulator>/usr/bin/kvm</emulator>
+ <disk type='file' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source file='/path/to/image/recent-linux.img'/>
+ <target dev='vda' bus='virtio'/>
+ <address type='drive' controller='0' bus='0' unit='0'/>
+ </disk>
+
+
+ Replace ``/path/to/image/recent-linux.img`` with the path to the OS image.
+ The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
+ `Virtio`_ for details.
+
+ **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
+ the configuration file under ``/etc/libvirt/qemu`` with a text editor,
+ ``libvirt`` may not recognize the change. If there is a discrepancy between
+ the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
+ ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
+ properly.
+
+
+#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
+
+ <disk type='network' device='disk'>
+ <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
+ <host name='{monitor-host}' port='6789'/>
+ </source>
+ <target dev='vda' bus='virtio'/>
+ </disk>
+
+ Replace ``{monitor-host}`` with the name of your host, and replace the
+ pool and/or image name as necessary. You may add multiple ``<host>``
+ entries for your Ceph monitors. The ``dev`` attribute is the logical
+ device name that will appear under the ``/dev`` directory of your
+ VM. The optional ``bus`` attribute indicates the type of disk device to
+ emulate. The valid settings are driver specific (e.g., "ide", "scsi",
+ "virtio", "xen", "usb" or "sata").
+
+ See `Disks`_ for details of the ``<disk>`` element, and its child elements
+ and attributes.
+
+#. Save the file.
+
+#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
+ default), you must generate a secret. ::
+
+ cat > secret.xml <<EOF
+ <secret ephemeral='no' private='no'>
+ <usage type='ceph'>
+ <name>client.libvirt secret</name>
+ </usage>
+ </secret>
+ EOF
+
+#. Define the secret. ::
+
+ sudo virsh secret-define --file secret.xml
+ <uuid of secret is output here>
+
+#. Get the ``client.libvirt`` key and save the key string to a file. ::
+
+ ceph auth get-key client.libvirt | sudo tee client.libvirt.key
+
+#. Set the UUID of the secret. ::
+
+ sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml
+
+ You must also set the secret manually by adding the following ``<auth>``
+ entry to the ``<disk>`` element you entered earlier (replacing the
+ ``uuid`` value with the result from the command line example above). ::
+
+ sudo virsh edit {vm-domain-name}
+
+ Then, add ``<auth></auth>`` element to the domain configuration file::
+
+ ...
+ </source>
+ <auth username='libvirt'>
+ <secret type='ceph' uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/>
+ </auth>
+ <target ...
+
+
+ **NOTE:** The exemplary ID is ``libvirt``, not the Ceph name
+ ``client.libvirt`` as generated at step 2 of `Configuring Ceph`_. Ensure
+ you use the ID component of the Ceph name you generated. If for some reason
+ you need to regenerate the secret, you will have to execute
+ ``sudo virsh secret-undefine {uuid}`` before executing
+ ``sudo virsh secret-set-value`` again.
+
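+Putting the pieces together, the complete ``<disk>`` entry in the domain
+configuration file should look roughly like the following (the monitor host
+name and the secret ``uuid`` are placeholders for your own values)::
+
+    <disk type='network' device='disk'>
+      <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
+        <host name='{monitor-host}' port='6789'/>
+      </source>
+      <auth username='libvirt'>
+        <secret type='ceph' uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/>
+      </auth>
+      <target dev='vda' bus='virtio'/>
+    </disk>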
+
+Summary
+=======
+
+Once you have configured the VM for use with Ceph, you can start the VM.
+To verify that the VM and Ceph are communicating, you may perform the
+following procedures.
+
+
+#. Check to see if Ceph is running::
+
+ ceph health
+
+#. Check to see if the VM is running. ::
+
+ sudo virsh list
+
+#. Check to see if the VM is communicating with Ceph. Replace
+ ``{vm-domain-name}`` with the name of your VM domain::
+
+ sudo virsh qemu-monitor-command --hmp {vm-domain-name} 'info block'
+
+#. Check to see if the device from ``<target dev='vda' bus='virtio'/>`` appears
+ under ``/dev`` or under ``/proc/partitions``. ::
+
+ ls /dev
+ cat /proc/partitions
+
+If everything looks okay, you may begin using the Ceph block device
+within your VM.
+
+
+.. _Installation: ../../install
+.. _libvirt Virtualization API: http://www.libvirt.org
+.. _Block Devices and OpenStack: ../rbd-openstack
+.. _Block Devices and CloudStack: ../rbd-cloudstack
+.. _Create a pool: ../../rados/operations/pools#create-a-pool
+.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user
+.. _create an image: ../qemu-rbd#creating-images-with-qemu
+.. _Virsh Command Reference: http://www.libvirt.org/virshcmdref.html
+.. _KVM/VirtManager: https://help.ubuntu.com/community/KVM/VirtManager
+.. _Ceph Authentication: ../../rados/configuration/auth-config-ref
+.. _Disks: http://www.libvirt.org/formatdomain.html#elementsDisks
+.. _rbd create: ../rados-rbd-cmds#creating-a-block-device-image
+.. _User Management - User: ../../rados/operations/user-management#user
+.. _User Management - CLI: ../../rados/operations/user-management#command-line-usage
+.. _Virtio: http://www.linux-kvm.org/page/Virtio
diff --git a/src/ceph/doc/rbd/man/index.rst b/src/ceph/doc/rbd/man/index.rst
new file mode 100644
index 0000000..33a192a
--- /dev/null
+++ b/src/ceph/doc/rbd/man/index.rst
@@ -0,0 +1,16 @@
+============================
+ Ceph Block Device Manpages
+============================
+
+.. toctree::
+ :maxdepth: 1
+
+ rbd <../../man/8/rbd>
+ rbd-fuse <../../man/8/rbd-fuse>
+ rbd-nbd <../../man/8/rbd-nbd>
+ rbd-ggate <../../man/8/rbd-ggate>
+ ceph-rbdnamer <../../man/8/ceph-rbdnamer>
+ rbd-replay-prep <../../man/8/rbd-replay-prep>
+ rbd-replay <../../man/8/rbd-replay>
+ rbd-replay-many <../../man/8/rbd-replay-many>
+ rbd-map <../../man/8/rbdmap>
diff --git a/src/ceph/doc/rbd/qemu-rbd.rst b/src/ceph/doc/rbd/qemu-rbd.rst
new file mode 100644
index 0000000..80c5dcc
--- /dev/null
+++ b/src/ceph/doc/rbd/qemu-rbd.rst
@@ -0,0 +1,218 @@
+========================
+ QEMU and Block Devices
+========================
+
+.. index:: Ceph Block Device; QEMU KVM
+
+The most frequent Ceph Block Device use case involves providing block device
+images to virtual machines. For example, a user may create a "golden" image
+with an OS and any relevant software in an ideal configuration. Then, the user
+takes a snapshot of the image. Finally, the user clones the snapshot (usually
+many times). See `Snapshots`_ for details. The ability to make copy-on-write
+clones of a snapshot means that Ceph can provision block device images to
+virtual machines quickly, because the client doesn't have to download an entire
+image each time it spins up a new virtual machine.
+
+
+.. ditaa:: +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+
+Ceph Block Devices can integrate with the QEMU virtual machine. For details on
+QEMU, see `QEMU Open Source Processor Emulator`_. For QEMU documentation, see
+`QEMU Manual`_. For installation details, see `Installation`_.
+
+.. important:: To use Ceph Block Devices with QEMU, you must have access to a
+ running Ceph cluster.
+
+
+Usage
+=====
+
+The QEMU command line expects you to specify the pool name and image name. You
+may also specify a snapshot name.
+
+QEMU will assume that the Ceph configuration file resides in the default
+location (e.g., ``/etc/ceph/$cluster.conf``) and that you are executing
+commands as the default ``client.admin`` user unless you expressly specify
+another Ceph configuration file path or another user. When specifying a user,
+QEMU uses the ``ID`` rather than the full ``TYPE:ID``. See `User Management -
+User`_ for details. Do not prepend the client type (i.e., ``client.``) to the
+beginning of the user ``ID``, or you will receive an authentication error. You
+should have the key for the ``admin`` user or the key of another user you
+specify with the ``:id={user}`` option in a keyring file stored in a default path
+(i.e., ``/etc/ceph`` or the local directory) with appropriate file ownership and
+permissions. Usage takes the following form::
+
+ qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]
+
+For example, specifying the ``id`` and ``conf`` options might look like the following::
+
+ qemu-img {command} [options] rbd:glance-pool/maipo:id=glance:conf=/etc/ceph/ceph.conf
+
+.. tip:: Configuration values containing ``:``, ``@``, or ``=`` can be escaped with a
+ leading ``\`` character.
+
+
+Creating Images with QEMU
+=========================
+
+You can create a block device image from QEMU. You must specify ``rbd``, the
+pool name, and the name of the image you wish to create. You must also specify
+the size of the image. ::
+
+ qemu-img create -f raw rbd:{pool-name}/{image-name} {size}
+
+For example::
+
+ qemu-img create -f raw rbd:data/foo 10G
+
+.. important:: The ``raw`` data format is really the only sensible
+ ``format`` option to use with RBD. Technically, you could use other
+ QEMU-supported formats (such as ``qcow2`` or ``vmdk``), but doing
+ so would add additional overhead, and would also render the volume
+ unsafe for virtual machine live migration when caching (see below)
+ is enabled.
+
+
+Resizing Images with QEMU
+=========================
+
+You can resize a block device image from QEMU. You must specify ``rbd``,
+the pool name, and the name of the image you wish to resize. You must also
+specify the size of the image. ::
+
+ qemu-img resize rbd:{pool-name}/{image-name} {size}
+
+For example::
+
+ qemu-img resize rbd:data/foo 10G
+
+
+Retrieving Image Info with QEMU
+===============================
+
+You can retrieve block device image information from QEMU. You must
+specify ``rbd``, the pool name, and the name of the image. ::
+
+ qemu-img info rbd:{pool-name}/{image-name}
+
+For example::
+
+ qemu-img info rbd:data/foo
+
+
+Running QEMU with RBD
+=====================
+
+QEMU can pass a block device from the host on to a guest, but since
+QEMU 0.15, there's no need to map an image as a block device on
+the host. Instead, QEMU can access an image as a virtual block
+device directly via ``librbd``. This performs better because it avoids
+an additional context switch, and can take advantage of `RBD caching`_.
+
+You can use ``qemu-img`` to convert existing virtual machine images to Ceph
+block device images. For example, if you have a qcow2 image, you could run::
+
+ qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze
+
+To run a virtual machine booting from that image, you could run::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze
+
+`RBD caching`_ can significantly improve performance.
+Since QEMU 1.2, QEMU's cache options control ``librbd`` caching::
+
+ qemu -m 1024 -drive format=rbd,file=rbd:data/squeeze,cache=writeback
+
+If you have an older version of QEMU, you can set the ``librbd`` cache
+configuration (like any Ceph configuration option) as part of the
+'file' parameter::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback
+
+.. important:: If you set rbd_cache=true, you must set cache=writeback
+ or risk data loss. Without cache=writeback, QEMU will not send
+ flush requests to librbd. If QEMU exits uncleanly in this
+ configuration, filesystems on top of rbd can be corrupted.
+
+.. _RBD caching: ../rbd-config-ref/#rbd-cache-config-settings
+
+
+.. index:: Ceph Block Device; discard trim and libvirt
+
+Enabling Discard/TRIM
+=====================
+
+Since Ceph version 0.46 and QEMU version 1.1, Ceph Block Devices support the
+discard operation. This means that a guest can send TRIM requests to let a Ceph
+block device reclaim unused space. This can be enabled in the guest by mounting
+``ext4`` or ``XFS`` with the ``discard`` option.
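+
+For example, inside the guest an ``ext4`` filesystem might be mounted with the
+discard option roughly as follows (the device and mount point are only
+illustrative)::
+
+    sudo mount -t ext4 -o discard /dev/sdb1 /mnt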
+
+For this to be available to the guest, it must be explicitly enabled
+for the block device. To do this, you must specify a
+``discard_granularity`` associated with the drive::
+
+ qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,id=drive1,if=none \
+ -device driver=ide-hd,drive=drive1,discard_granularity=512
+
+Note that this uses the IDE driver. The virtio driver does not
+support discard.
+
+If using libvirt, edit your libvirt domain's configuration file using ``virsh
+edit`` to include the ``xmlns:qemu`` value. Then, add a ``qemu:commandline``
+block as a child of that domain. The following example shows how to set two
+devices with ``qemu id=`` to different ``discard_granularity`` values.
+
+.. code-block:: xml
+
+ <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
+ <qemu:commandline>
+ <qemu:arg value='-set'/>
+ <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/>
+ <qemu:arg value='-set'/>
+ <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/>
+ </qemu:commandline>
+ </domain>
+
+
+.. index:: Ceph Block Device; cache options
+
+QEMU Cache Options
+==================
+
+QEMU's cache options correspond to the following Ceph `RBD Cache`_ settings.
+
+Writeback::
+
+ rbd_cache = true
+
+Writethrough::
+
+ rbd_cache = true
+ rbd_cache_max_dirty = 0
+
+None::
+
+ rbd_cache = false
+
+QEMU's cache settings override Ceph's cache settings (including settings that
+are explicitly set in the Ceph configuration file).
+
+.. note:: Prior to QEMU v2.4.0, if you explicitly set `RBD Cache`_ settings
+ in the Ceph configuration file, your Ceph settings override the QEMU cache
+ settings.
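+
+For example, to run a guest with RBD caching disabled regardless of the
+settings in your Ceph configuration file (assuming QEMU 1.2 or later, as
+above)::
+
+    qemu -m 1024 -drive format=rbd,file=rbd:data/squeeze,cache=none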
+
+.. _QEMU Open Source Processor Emulator: http://wiki.qemu.org/Main_Page
+.. _QEMU Manual: http://wiki.qemu.org/Manual
+.. _RBD Cache: ../rbd-config-ref/
+.. _Snapshots: ../rbd-snapshot/
+.. _Installation: ../../install
+.. _User Management - User: ../../rados/operations/user-management#user
diff --git a/src/ceph/doc/rbd/rados-rbd-cmds.rst b/src/ceph/doc/rbd/rados-rbd-cmds.rst
new file mode 100644
index 0000000..65f7737
--- /dev/null
+++ b/src/ceph/doc/rbd/rados-rbd-cmds.rst
@@ -0,0 +1,223 @@
+=======================
+ Block Device Commands
+=======================
+
+.. index:: Ceph Block Device; image management
+
+The ``rbd`` command enables you to create, list, introspect, and remove block
+device images. You can also use it to clone images, create snapshots, roll
+back an image to a snapshot, view a snapshot, and so on. For details on using
+the ``rbd`` command, see `RBD – Manage RADOS Block Device (RBD) Images`_.
+
+.. important:: To use Ceph Block Device commands, you must have access to
+ a running Ceph cluster.
+
+Create a Block Device Pool
+==========================
+
+#. On the admin node, use the ``ceph`` tool to `create a pool`_.
+
+#. On the admin node, use the ``rbd`` tool to initialize the pool for use by RBD::
+
+ rbd pool init <pool-name>
+
+.. note:: The ``rbd`` tool assumes a default pool name of ``rbd`` when a pool
+   name is not provided.
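+
+For example, the two steps above might look like the following, assuming a
+pool named ``rbd`` and an illustrative placement group count of 128::
+
+    ceph osd pool create rbd 128
+    rbd pool init rbd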
+
+Create a Block Device User
+==========================
+
+Unless specified, the ``rbd`` command will access the Ceph cluster using the ID
+``admin``. This ID allows full administrative access to the cluster. It is
+recommended that you utilize a more restricted user wherever possible.
+
+To `create a Ceph user`_, use the ``ceph auth get-or-create`` command and
+specify the user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]'
+
+For example, to create a user ID named ``qemu`` with read-write access to the
+pool ``vms`` and read-only access to the pool ``images``, execute the
+following::
+
+ ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images'
+
+The output from the ``ceph auth get-or-create`` command will be the keyring for
+the specified user, which can be written to ``/etc/ceph/ceph.client.{ID}.keyring``.
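+
+For example, the keyring for the ``qemu`` user created above could be written
+to the conventional location like this::
+
+    ceph auth get-or-create client.qemu | sudo tee /etc/ceph/ceph.client.qemu.keyring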
+
+.. note:: The user ID can be specified when using the ``rbd`` command by
+ providing the ``--id {id}`` optional argument.
+
+Creating a Block Device Image
+=============================
+
+Before you can add a block device to a node, you must create an image for it in
+the :term:`Ceph Storage Cluster` first. To create a block device image, execute
+the following::
+
+ rbd create --size {megabytes} {pool-name}/{image-name}
+
+For example, to create a 1GB image named ``bar`` that stores information in a
+pool named ``swimmingpool``, execute the following::
+
+ rbd create --size 1024 swimmingpool/bar
+
+If you don't specify a pool when creating an image, it will be stored in the
+default pool ``rbd``. For example, to create a 1GB image named ``foo`` stored in
+the default pool ``rbd``, execute the following::
+
+ rbd create --size 1024 foo
+
+.. note:: You must create a pool before you can specify it as a source. See
+   `Storage Pools`_ for details.
+
+Listing Block Device Images
+===========================
+
+To list block devices in the ``rbd`` pool, execute the following
+(i.e., ``rbd`` is the default pool name)::
+
+ rbd ls
+
+To list block devices in a particular pool, execute the following,
+but replace ``{poolname}`` with the name of the pool::
+
+ rbd ls {poolname}
+
+For example::
+
+ rbd ls swimmingpool
+
+To list deferred delete block devices in the ``rbd`` pool, execute the
+following::
+
+ rbd trash ls
+
+To list deferred delete block devices in a particular pool, execute the
+following, but replace ``{poolname}`` with the name of the pool::
+
+ rbd trash ls {poolname}
+
+For example::
+
+ rbd trash ls swimmingpool
+
+Retrieving Image Information
+============================
+
+To retrieve information from a particular image, execute the following,
+but replace ``{image-name}`` with the name for the image::
+
+ rbd info {image-name}
+
+For example::
+
+ rbd info foo
+
+To retrieve information from an image within a pool, execute the following,
+but replace ``{image-name}`` with the name of the image and replace ``{pool-name}``
+with the name of the pool::
+
+ rbd info {pool-name}/{image-name}
+
+For example::
+
+ rbd info swimmingpool/bar
+
+Resizing a Block Device Image
+=============================
+
+:term:`Ceph Block Device` images are thin provisioned. They don't actually use
+any physical storage until you begin saving data to them. However, they do have
+a maximum capacity that you set with the ``--size`` option. If you want to
+increase (or decrease) the maximum size of a Ceph Block Device image, execute
+the following::
+
+ rbd resize --size 2048 foo (to increase)
+ rbd resize --size 2048 foo --allow-shrink (to decrease)
+
+
+Removing a Block Device Image
+=============================
+
+To remove a block device, execute the following, but replace ``{image-name}``
+with the name of the image you want to remove::
+
+ rbd rm {image-name}
+
+For example::
+
+ rbd rm foo
+
+To remove a block device from a pool, execute the following, but replace
+``{image-name}`` with the name of the image to remove and replace
+``{pool-name}`` with the name of the pool::
+
+ rbd rm {pool-name}/{image-name}
+
+For example::
+
+ rbd rm swimmingpool/bar
+
+To defer delete a block device from a pool, execute the following, but
+replace ``{image-name}`` with the name of the image to move and replace
+``{pool-name}`` with the name of the pool::
+
+ rbd trash mv {pool-name}/{image-name}
+
+For example::
+
+ rbd trash mv swimmingpool/bar
+
+To remove a deferred block device from a pool, execute the following, but
+replace ``{image-id}`` with the id of the image to remove and replace
+``{pool-name}`` with the name of the pool::
+
+ rbd trash rm {pool-name}/{image-id}
+
+For example::
+
+ rbd trash rm swimmingpool/2bf4474b0dc51
+
+.. note::
+
+   * You can move an image to the trash even if it has snapshot(s) or is
+     actively in use by clones, but it cannot be removed from the trash while
+     in that state.
+
+   * You can use ``--delay`` to set the deferment time (the default is 0; see
+     the example below). If the deferment time has not yet expired, the image
+     cannot be removed from the trash unless you force the removal.
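+
+For example, deletion of an image might be deferred as follows (the ``--delay``
+value is assumed here to be in seconds and is only illustrative)::
+
+    rbd trash mv swimmingpool/bar --delay 120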
+
+Restoring a Block Device Image
+==============================
+
+To restore a deferred delete block device in the rbd pool, execute the
+following, but replace ``{image-id}`` with the id of the image::
+
+    rbd trash restore {image-id}
+
+For example::
+
+ rbd trash restore 2bf4474b0dc51
+
+To restore a deferred delete block device in a particular pool, execute
+the following, but replace ``{image-id}`` with the id of the image and
+replace ``{pool-name}`` with the name of the pool::
+
+ rbd trash restore {pool-name}/{image-id}
+
+For example::
+
+ rbd trash restore swimmingpool/2bf4474b0dc51
+
+You can also use ``--image`` to rename the image when restoring it, for
+example::
+
+ rbd trash restore swimmingpool/2bf4474b0dc51 --image new-name
+
+
+.. _create a pool: ../../rados/operations/pools/#create-a-pool
+.. _Storage Pools: ../../rados/operations/pools
+.. _RBD – Manage RADOS Block Device (RBD) Images: ../../man/8/rbd/
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
diff --git a/src/ceph/doc/rbd/rbd-cloudstack.rst b/src/ceph/doc/rbd/rbd-cloudstack.rst
new file mode 100644
index 0000000..f66d6d4
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-cloudstack.rst
@@ -0,0 +1,135 @@
+=============================
+ Block Devices and CloudStack
+=============================
+
+You may use Ceph Block Device images with CloudStack 4.0 and higher through
+``libvirt``, which configures the QEMU interface to ``librbd``. Ceph stripes
+block device images as objects across the cluster, which means that large Ceph
+Block Device images have better performance than a standalone server!
+
+To use Ceph Block Devices with CloudStack 4.0 and higher, you must install QEMU,
+``libvirt``, and CloudStack first. We recommend using a separate physical host
+for your CloudStack installation. CloudStack recommends a minimum of 4GB of RAM
+and a dual-core processor, but more CPU and RAM will perform better. The
+following diagram depicts the CloudStack/Ceph technology stack.
+
+
+.. ditaa:: +---------------------------------------------------+
+ | CloudStack |
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. important:: To use Ceph Block Devices with CloudStack, you must have
+ access to a running Ceph Storage Cluster.
+
+CloudStack integrates with Ceph's block devices to provide CloudStack with a
+back end for CloudStack's Primary Storage. The instructions below detail the
+setup for CloudStack Primary Storage.
+
+.. note:: We recommend installing with Ubuntu 14.04 or later so that
+ you can use package installation instead of having to compile
+ libvirt from source.
+
+Installing and configuring QEMU for use with CloudStack doesn't require any
+special handling. Ensure that you have a running Ceph Storage Cluster. Install
+QEMU and configure it for use with Ceph; then, install ``libvirt`` version
+0.9.13 or higher (you may need to compile from source) and ensure it is running
+with Ceph.
+
+
+.. note:: Ubuntu 14.04 and CentOS 7.2 will have ``libvirt`` with RBD storage
+ pool support enabled by default.
+
+.. index:: pools; CloudStack
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. Create a pool for
+CloudStack NFS Primary Storage. Ensure your Ceph cluster is running, then create
+the pool. ::
+
+ ceph osd pool create cloudstack
+
+See `Create a Pool`_ for details on specifying the number of placement groups
+for your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+A newly created pool must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pool::
+
+ rbd pool init cloudstack
+
+Create a Ceph User
+==================
+
+To access the Ceph cluster, we need a Ceph user that has the correct
+credentials to access the ``cloudstack`` pool we just created. Although we could
+use ``client.admin`` for this, it's recommended to create a user with only
+access to the ``cloudstack`` pool. ::
+
+ ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'
+
+Use the information returned by the command in the next step when adding the
+Primary Storage.
+
+See `User Management`_ for additional details.
+
+Add Primary Storage
+===================
+
+To add primary storage, refer to `Add Primary Storage (4.2.0)`_. To add a Ceph
+block device, the steps include:
+
+#. Log in to the CloudStack UI.
+#. Click **Infrastructure** on the left side navigation bar.
+#. Select the Zone you want to use for Primary Storage.
+#. Click the **Compute** tab.
+#. Select **View All** on the `Primary Storage` node in the diagram.
+#. Click **Add Primary Storage**.
+#. Follow the CloudStack instructions.
+
+ - For **Protocol**, select ``RBD``.
+ - Add cluster information (cephx is supported). Note: Do not include the ``client.`` part of the user.
+ - Add ``rbd`` as a tag.
+
+
+Create a Disk Offering
+======================
+
+To create a new disk offering, refer to `Create a New Disk Offering (4.2.0)`_.
+Create a disk offering so that it matches the ``rbd`` tag.
+The ``StoragePoolAllocator`` will choose the ``rbd``
+pool when searching for a suitable storage pool. If the disk offering doesn't
+match the ``rbd`` tag, the ``StoragePoolAllocator`` may select the pool you
+created (e.g., ``cloudstack``).
+
+
+Limitations
+===========
+
+- CloudStack will only bind to one monitor (you can, however, create a
+  round-robin DNS record over multiple monitors)
+
+
+
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+.. _Install and Configure QEMU: ../qemu-rbd
+.. _Install and Configure libvirt: ../libvirt
+.. _KVM Hypervisor Host Installation: http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Installation_Guide/hypervisor-kvm-install-flow.html
+.. _Add Primary Storage (4.2.0): http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/primary-storage-add.html
+.. _Create a New Disk Offering (4.2.0): http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/compute-disk-service-offerings.html#creating-disk-offerings
+.. _User Management: ../../rados/operations/user-management
diff --git a/src/ceph/doc/rbd/rbd-config-ref.rst b/src/ceph/doc/rbd/rbd-config-ref.rst
new file mode 100644
index 0000000..db942f8
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-config-ref.rst
@@ -0,0 +1,136 @@
+=======================
+ librbd Settings
+=======================
+
+See `Block Device`_ for additional details.
+
+Cache Settings
+=======================
+
+.. sidebar:: Kernel Caching
+
+ The kernel driver for Ceph block devices can use the Linux page cache to
+ improve performance.
+
+The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
+take advantage of the Linux page cache, so it includes its own in-memory
+caching, called "RBD caching." RBD caching behaves just like well-behaved hard
+disk caching. When the OS sends a barrier or a flush request, all dirty data is
+written to the OSDs. This means that using write-back caching is just as safe as
+using a well-behaved physical hard disk with a VM that properly sends flushes
+(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
+algorithm, and in write-back mode it can coalesce contiguous requests for
+better throughput.
+
+.. versionadded:: 0.46
+
+Ceph supports write-back caching for RBD. To enable it, add ``rbd cache =
+true`` to the ``[client]`` section of your ``ceph.conf`` file. By default
+``librbd`` does not perform any caching. Writes and reads go directly to the
+storage cluster, and writes return only when the data is on disk on all
+replicas. With caching enabled, writes return immediately, unless there are more
+than ``rbd cache max dirty`` unflushed bytes. In this case, the write triggers
+writeback and blocks until enough bytes are flushed.
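+
+A minimal ``ceph.conf`` snippet that enables write-back caching might look like
+this (the size values are only illustrative; the options are described below)::
+
+    [client]
+        rbd cache = true
+        rbd cache size = 67108864          # 64 MiB
+        rbd cache max dirty = 50331648     # 48 MiB
+        rbd cache target dirty = 33554432  # 32 MiB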
+
+.. versionadded:: 0.47
+
+Ceph supports write-through caching for RBD. You can set the size of
+the cache, and you can set targets and limits to switch from
+write-back caching to write through caching. To enable write-through
+mode, set ``rbd cache max dirty`` to 0. This means writes return only
+when the data is on disk on all replicas, but reads may come from the
+cache. The cache is in memory on the client, and each RBD image has
+its own. Since the cache is local to the client, there's no coherency
+if there are others accessing the image. Running GFS or OCFS on top of
+RBD will not work with caching enabled.
+
+The ``ceph.conf`` file settings for RBD should be set in the ``[client]``
+section of your configuration file. The settings include:
+
+
+``rbd cache``
+
+:Description: Enable caching for RADOS Block Device (RBD).
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+
+``rbd cache size``
+
+:Description: The RBD cache size in bytes.
+:Type: 64-bit Integer
+:Required: No
+:Default: ``32 MiB``
+
+
+``rbd cache max dirty``
+
+:Description: The ``dirty`` limit in bytes at which the cache triggers write-back. If ``0``, uses write-through caching.
+:Type: 64-bit Integer
+:Required: No
+:Constraint: Must be less than ``rbd cache size``.
+:Default: ``24 MiB``
+
+
+``rbd cache target dirty``
+
+:Description: The ``dirty target`` before the cache begins writing data to the data storage. Does not block writes to the cache.
+:Type: 64-bit Integer
+:Required: No
+:Constraint: Must be less than ``rbd cache max dirty``.
+:Default: ``16 MiB``
+
+
+``rbd cache max dirty age``
+
+:Description: The number of seconds dirty data is in the cache before writeback starts.
+:Type: Float
+:Required: No
+:Default: ``1.0``
+
+.. versionadded:: 0.60
+
+``rbd cache writethrough until flush``
+
+:Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.
+:Type: Boolean
+:Required: No
+:Default: ``true``
+
+.. _Block Device: ../../rbd
+
+
+Read-ahead Settings
+=======================
+
+.. versionadded:: 0.86
+
+RBD supports read-ahead/prefetching to optimize small, sequential reads.
+This should normally be handled by the guest OS in the case of a VM,
+but boot loaders may not issue efficient reads.
+Read-ahead is automatically disabled if caching is disabled.
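+
+For example, read-ahead could be capped at 4 MiB per request and handed back to
+the guest OS after the first 50 MiB read from an image (the values are only
+illustrative; the options are described below)::
+
+    [client]
+        rbd readahead max bytes = 4194304            # 4 MiB
+        rbd readahead disable after bytes = 52428800 # 50 MiB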
+
+
+``rbd readahead trigger requests``
+
+:Description: Number of sequential read requests necessary to trigger read-ahead.
+:Type: Integer
+:Required: No
+:Default: ``10``
+
+
+``rbd readahead max bytes``
+
+:Description: Maximum size of a read-ahead request. If zero, read-ahead is disabled.
+:Type: 64-bit Integer
+:Required: No
+:Default: ``512 KiB``
+
+
+``rbd readahead disable after bytes``
+
+:Description: After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled.
+:Type: 64-bit Integer
+:Required: No
+:Default: ``50 MiB``
diff --git a/src/ceph/doc/rbd/rbd-ko.rst b/src/ceph/doc/rbd/rbd-ko.rst
new file mode 100644
index 0000000..951757c
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-ko.rst
@@ -0,0 +1,59 @@
+==========================
+ Kernel Module Operations
+==========================
+
+.. index:: Ceph Block Device; kernel module
+
+.. important:: To use kernel module operations, you must have a running Ceph cluster.
+
+Get a List of Images
+====================
+
+To mount a block device image, first return a list of the images. ::
+
+ rbd list
+
+Map a Block Device
+==================
+
+Use ``rbd`` to map an image name to a kernel module. You must specify the
+image name, the pool name, and the user name. ``rbd`` will load RBD kernel
+module on your behalf if it's not already loaded. ::
+
+ sudo rbd map {pool-name}/{image-name} --id {user-name}
+
+For example::
+
+ sudo rbd map rbd/myimage --id admin
+
+If you use `cephx`_ authentication, you must also specify a secret. It may come
+from a keyring or a file containing the secret. ::
+
+ sudo rbd map rbd/myimage --id admin --keyring /path/to/keyring
+ sudo rbd map rbd/myimage --id admin --keyfile /path/to/file
+
+
+Show Mapped Block Devices
+=========================
+
+To show block device images mapped to kernel modules with the ``rbd`` command,
+specify the ``showmapped`` option. ::
+
+ rbd showmapped
+
+
+Unmapping a Block Device
+========================
+
+To unmap a block device image with the ``rbd`` command, specify the ``unmap``
+option and the device name (i.e., by convention the same as the block device
+image name). ::
+
+ sudo rbd unmap /dev/rbd/{poolname}/{imagename}
+
+For example::
+
+ sudo rbd unmap /dev/rbd/rbd/foo
+
+
+.. _cephx: ../../rados/operations/user-management/
diff --git a/src/ceph/doc/rbd/rbd-mirroring.rst b/src/ceph/doc/rbd/rbd-mirroring.rst
new file mode 100644
index 0000000..989f1fc
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-mirroring.rst
@@ -0,0 +1,318 @@
+===============
+ RBD Mirroring
+===============
+
+.. index:: Ceph Block Device; mirroring
+
+RBD images can be asynchronously mirrored between two Ceph clusters. This
+capability uses the RBD journaling image feature to ensure crash-consistent
+replication between clusters. Mirroring is configured on a per-pool basis
+within peer clusters and can be configured to automatically mirror all
+images within a pool or only a specific subset of images. Mirroring is
+configured using the ``rbd`` command. The ``rbd-mirror`` daemon is responsible
+for pulling image updates from the remote, peer cluster and applying them to
+the image within the local cluster.
+
+.. note:: RBD mirroring requires the Ceph Jewel release or later.
+
+.. important:: To use RBD mirroring, you must have two Ceph clusters, each
+ running the ``rbd-mirror`` daemon.
+
+Pool Configuration
+==================
+
+The following procedures demonstrate how to perform the basic administrative
+tasks to configure mirroring using the ``rbd`` command. Mirroring is
+configured on a per-pool basis within the Ceph clusters.
+
+The pool configuration steps should be performed on both peer clusters. These
+procedures assume two clusters, named "local" and "remote", are accessible from
+a single host for clarity.
+
+See the `rbd`_ manpage for additional details of how to connect to different
+Ceph clusters.
+
+.. note:: The cluster name in the following examples corresponds to a Ceph
+ configuration file of the same name (e.g. /etc/ceph/remote.conf). See the
+ `ceph-conf`_ documentation for how to configure multiple clusters.
+
+.. note:: Images in a given pool will be mirrored to a pool with the same name
+ on the remote cluster. Images using a separate data-pool will use a data-pool
+ with the same name on the remote cluster. E.g., if an image being mirrored is
+ in the ``rbd`` pool on the local cluster and using a data-pool called
+ ``rbd-ec``, pools called ``rbd`` and ``rbd-ec`` must exist on the remote
+ cluster and will be used for mirroring the image.
+
+Enable Mirroring
+----------------
+
+To enable mirroring on a pool with ``rbd``, specify the ``mirror pool enable``
+command, the pool name, and the mirroring mode::
+
+ rbd mirror pool enable {pool-name} {mode}
+
+The mirroring mode can either be ``pool`` or ``image``:
+
+* **pool**: When configured in ``pool`` mode, all images in the pool with the
+ journaling feature enabled are mirrored.
+* **image**: When configured in ``image`` mode, mirroring needs to be
+ `explicitly enabled`_ on each image.
+
+For example::
+
+ rbd --cluster local mirror pool enable image-pool pool
+ rbd --cluster remote mirror pool enable image-pool pool
+
+Disable Mirroring
+-----------------
+
+To disable mirroring on a pool with ``rbd``, specify the ``mirror pool disable``
+command and the pool name::
+
+ rbd mirror pool disable {pool-name}
+
+When mirroring is disabled on a pool in this way, mirroring will also be
+disabled on any images (within the pool) for which mirroring was enabled
+explicitly.
+
+For example::
+
+ rbd --cluster local mirror pool disable image-pool
+ rbd --cluster remote mirror pool disable image-pool
+
+Add Cluster Peer
+----------------
+
+In order for the ``rbd-mirror`` daemon to discover its peer cluster, the peer
+needs to be registered to the pool. To add a mirroring peer Ceph cluster with
+``rbd``, specify the ``mirror pool peer add`` command, the pool name, and a
+cluster specification::
+
+ rbd mirror pool peer add {pool-name} {client-name}@{cluster-name}
+
+For example::
+
+ rbd --cluster local mirror pool peer add image-pool client.remote@remote
+ rbd --cluster remote mirror pool peer add image-pool client.local@local
+
+Remove Cluster Peer
+-------------------
+
+To remove a mirroring peer Ceph cluster with ``rbd``, specify the
+``mirror pool peer remove`` command, the pool name, and the peer UUID
+(available from the ``rbd mirror pool info`` command)::
+
+ rbd mirror pool peer remove {pool-name} {peer-uuid}
+
+For example::
+
+ rbd --cluster local mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445
+ rbd --cluster remote mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08
+
+Image Configuration
+===================
+
+Unlike pool configuration, image configuration only needs to be performed against
+a single mirroring peer Ceph cluster.
+
+Mirrored RBD images are designated as either primary or non-primary. This is a
+property of the image and not the pool. Images that are designated as
+non-primary cannot be modified.
+
+Images are automatically promoted to primary when mirroring is first enabled on
+an image (either implicitly if the pool mirror mode was **pool** and the image
+has the journaling image feature enabled, or `explicitly enabled`_ by the
+``rbd`` command).
+
+Enable Image Journaling Support
+-------------------------------
+
+RBD mirroring uses the RBD journaling feature to ensure that the replicated
+image always remains crash-consistent. Before an image can be mirrored to
+a peer cluster, the journaling feature must be enabled. The feature can be
+enabled at image creation time by providing the
+``--image-feature exclusive-lock,journaling`` option to the ``rbd`` command.
+
+Alternatively, the journaling feature can be dynamically enabled on
+pre-existing RBD images. To enable journaling with ``rbd``, specify
+the ``feature enable`` command, the pool and image name, and the feature name::
+
+ rbd feature enable {pool-name}/{image-name} {feature-name}
+
+For example::
+
+ rbd --cluster local feature enable image-pool/image-1 journaling
+
+.. note:: The journaling feature is dependent on the exclusive-lock feature. If
+ the exclusive-lock feature is not already enabled, it should be enabled prior
+ to enabling the journaling feature.
+
+.. tip:: You can enable journaling on all new images by default by adding
+ ``rbd default features = 125`` to your Ceph configuration file.
+
+Enable Image Mirroring
+----------------------
+
+If the mirroring is configured in ``image`` mode for the image's pool, then it
+is necessary to explicitly enable mirroring for each image within the pool.
+To enable mirroring for a specific image with ``rbd``, specify the
+``mirror image enable`` command along with the pool and image name::
+
+ rbd mirror image enable {pool-name}/{image-name}
+
+For example::
+
+ rbd --cluster local mirror image enable image-pool/image-1
+
+Disable Image Mirroring
+-----------------------
+
+To disable mirroring for a specific image with ``rbd``, specify the
+``mirror image disable`` command along with the pool and image name::
+
+ rbd mirror image disable {pool-name}/{image-name}
+
+For example::
+
+ rbd --cluster local mirror image disable image-pool/image-1
+
+Image Promotion and Demotion
+----------------------------
+
+In a failover scenario where the primary designation needs to be moved to the
+image in the peer Ceph cluster, you should stop access to the primary image
+(e.g. power down the VM or remove the associated drive from a VM), demote the
+current primary image, promote the new primary image, and resume access to the
+image on the alternate cluster.
+
+.. note:: RBD only provides the necessary tools to facilitate an orderly
+ failover of an image. An external mechanism is required to coordinate the
+ full failover process (e.g. closing the image before demotion).
+
+To demote a specific image to non-primary with ``rbd``, specify the
+``mirror image demote`` command along with the pool and image name::
+
+ rbd mirror image demote {pool-name}/{image-name}
+
+For example::
+
+ rbd --cluster local mirror image demote image-pool/image-1
+
+To demote all primary images within a pool to non-primary with ``rbd``, specify
+the ``mirror pool demote`` command along with the pool name::
+
+ rbd mirror pool demote {pool-name}
+
+For example::
+
+ rbd --cluster local mirror pool demote image-pool
+
+To promote a specific image to primary with ``rbd``, specify the
+``mirror image promote`` command along with the pool and image name::
+
+ rbd mirror image promote [--force] {pool-name}/{image-name}
+
+For example::
+
+ rbd --cluster remote mirror image promote image-pool/image-1
+
+To promote all non-primary images within a pool to primary with ``rbd``, specify
+the ``mirror pool promote`` command along with the pool name::
+
+ rbd mirror pool promote [--force] {pool-name}
+
+For example::
+
+ rbd --cluster local mirror pool promote image-pool
+
+.. tip:: Since the primary / non-primary status is per-image, it is possible to
+ have two clusters split the IO load and stage failover / failback.
+
+.. note:: Promotion can be forced using the ``--force`` option. Forced
+ promotion is needed when the demotion cannot be propagated to the peer
+ Ceph cluster (e.g. Ceph cluster failure, communication outage). This will
+ result in a split-brain scenario between the two peers and the image will no
+ longer be in-sync until a `force resync command`_ is issued.
+
+Force Image Resync
+------------------
+
+If a split-brain event is detected by the ``rbd-mirror`` daemon, it will not
+attempt to mirror the affected image until corrected. To resume mirroring for an
+image, first `demote the image`_ determined to be out-of-date and then request a
+resync to the primary image. To request an image resync with ``rbd``, specify the
+``mirror image resync`` command along with the pool and image name::
+
+ rbd mirror image resync {pool-name}/{image-name}
+
+For example::
+
+ rbd mirror image resync image-pool/image-1
+
+.. note:: The ``rbd`` command only flags the image as requiring a resync. The
+ local cluster's ``rbd-mirror`` daemon process is responsible for performing
+ the resync asynchronously.
+
+Mirror Status
+=============
+
+The peer cluster replication status is stored for every primary mirrored image.
+This status can be retrieved using the ``mirror image status`` and
+``mirror pool status`` commands.
+
+To request the mirror image status with ``rbd``, specify the
+``mirror image status`` command along with the pool and image name::
+
+ rbd mirror image status {pool-name}/{image-name}
+
+For example::
+
+ rbd mirror image status image-pool/image-1
+
+To request the mirror pool summary status with ``rbd``, specify the
+``mirror pool status`` command along with the pool name::
+
+ rbd mirror pool status {pool-name}
+
+For example::
+
+ rbd mirror pool status image-pool
+
+.. note:: Adding ``--verbose`` option to the ``mirror pool status`` command will
+ additionally output status details for every mirroring image in the pool.
+
+rbd-mirror Daemon
+=================
+
+The two ``rbd-mirror`` daemons are responsible for watching image journals on the
+remote, peer cluster and replaying the journal events against the local
+cluster. The RBD image journaling feature records all modifications to the
+image in the order they occur. This ensures that a crash-consistent mirror of
+the remote image is available locally.
+
+The ``rbd-mirror`` daemon is available within the optional ``rbd-mirror``
+distribution package.
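+
+Depending on your distribution, installing it might look roughly like one of
+the following (package names can vary)::
+
+    sudo apt-get install rbd-mirror
+    sudo yum install rbd-mirror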
+
+.. important:: Each ``rbd-mirror`` daemon requires the ability to connect
+ to both clusters simultaneously.
+
+.. warning:: Pre-Luminous releases: only run a single ``rbd-mirror`` daemon per
+ Ceph cluster.
+
+Each ``rbd-mirror`` daemon should use a unique Ceph user ID. To
+`create a Ceph user`_, use the ``ceph auth get-or-create`` command and specify
+the user name, monitor caps, and OSD caps::
+
+ ceph auth get-or-create client.rbd-mirror.{unique id} mon 'profile rbd' osd 'profile rbd'
+
+The ``rbd-mirror`` daemon can be managed by ``systemd`` by specifying the user
+ID as the daemon instance::
+
+ systemctl enable ceph-rbd-mirror@rbd-mirror.{unique id}
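+
+After enabling the unit, it can be started in the same way (assuming a
+systemd-based host)::
+
+    systemctl start ceph-rbd-mirror@rbd-mirror.{unique id}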
+
+.. _rbd: ../../man/8/rbd
+.. _ceph-conf: ../../rados/configuration/ceph-conf/#running-multiple-clusters
+.. _explicitly enabled: #enable-image-mirroring
+.. _force resync command: #force-image-resync
+.. _demote the image: #image-promotion-and-demotion
+.. _create a Ceph user: ../../rados/operations/user-management#add-a-user
+
diff --git a/src/ceph/doc/rbd/rbd-openstack.rst b/src/ceph/doc/rbd/rbd-openstack.rst
new file mode 100644
index 0000000..db52028
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-openstack.rst
@@ -0,0 +1,512 @@
+=============================
+ Block Devices and OpenStack
+=============================
+
+.. index:: Ceph Block Device; OpenStack
+
+You may use Ceph Block Device images with OpenStack through ``libvirt``, which
+configures the QEMU interface to ``librbd``. Ceph stripes block device images as
+objects across the cluster, which means that large Ceph Block Device images have
+better performance than a standalone server!
+
+To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``,
+and OpenStack first. We recommend using a separate physical node for your
+OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a
+quad-core processor. The following diagram depicts the OpenStack/Ceph
+technology stack.
+
+
+.. ditaa:: +---------------------------------------------------+
+ | OpenStack |
+ +---------------------------------------------------+
+ | libvirt |
+ +------------------------+--------------------------+
+ |
+ | configures
+ v
+ +---------------------------------------------------+
+ | QEMU |
+ +---------------------------------------------------+
+ | librbd |
+ +---------------------------------------------------+
+ | librados |
+ +------------------------+-+------------------------+
+ | OSDs | | Monitors |
+ +------------------------+ +------------------------+
+
+.. important:: To use Ceph Block Devices with OpenStack, you must have
+ access to a running Ceph Storage Cluster.
+
+Three parts of OpenStack integrate with Ceph's block devices:
+
+- **Images**: OpenStack Glance manages images for VMs. Images are immutable.
+ OpenStack treats images as binary blobs and downloads them accordingly.
+
+- **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs,
+ or to attach volumes to running VMs. OpenStack manages volumes using
+ Cinder services.
+
+- **Guest Disks**: Guest disks are guest operating system disks. By default,
+ when you boot a virtual machine, its disk appears as a file on the filesystem
+ of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior
+ to OpenStack Havana, the only way to boot a VM in Ceph was to use the
+ boot-from-volume functionality of Cinder. However, now it is possible to boot
+ every virtual machine inside Ceph directly without using Cinder, which is
+ advantageous because it allows you to perform maintenance operations easily
+ with the live-migration process. Additionally, if your hypervisor dies it is
+ also convenient to trigger ``nova evacuate`` and run the virtual machine
+ elsewhere almost seamlessly.
+
+You can use OpenStack Glance to store images in a Ceph Block Device, and you
+can use Cinder to boot a VM using a copy-on-write clone of an image.
+
+The instructions below detail the setup for Glance, Cinder and Nova, although
+they do not have to be used together. You may store images in Ceph block devices
+while running VMs using a local disk, or vice versa.
+
+.. important:: Ceph doesn’t support QCOW2 for hosting a virtual machine disk.
+ Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot
+ from volume), the Glance image format must be ``RAW``.
+
+.. tip:: This document describes using Ceph Block Devices with OpenStack Havana.
+ For earlier versions of OpenStack see
+ `Block Devices and OpenStack (Dumpling)`_.
+
+.. index:: pools; OpenStack
+
+Create a Pool
+=============
+
+By default, Ceph block devices use the ``rbd`` pool. You may use any available
+pool. We recommend creating a pool for Cinder and a pool for Glance. Ensure
+your Ceph cluster is running, then create the pools. ::
+
+ ceph osd pool create volumes 128
+ ceph osd pool create images 128
+ ceph osd pool create backups 128
+ ceph osd pool create vms 128
+
+See `Create a Pool`_ for detail on specifying the number of placement groups for
+your pools, and `Placement Groups`_ for details on the number of placement
+groups you should set for your pools.
+
+Newly created pools must be initialized prior to use. Use the ``rbd`` tool
+to initialize the pools::
+
+ rbd pool init volumes
+ rbd pool init images
+ rbd pool init backups
+ rbd pool init vms
+
+.. _Create a Pool: ../../rados/operations/pools#createpool
+.. _Placement Groups: ../../rados/operations/placement-groups
+
+
+Configure OpenStack Ceph Clients
+================================
+
+The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and
+``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file::
+
+ ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf
+
+
+Install Ceph client packages
+----------------------------
+
+On the ``glance-api`` node, you will need the Python bindings for ``librbd``::
+
+ sudo apt-get install python-rbd
+ sudo yum install python-rbd
+
+On the ``nova-compute``, ``cinder-backup``, and ``cinder-volume`` nodes,
+install both the Python bindings and the client command line tools::
+
+ sudo apt-get install ceph-common
+ sudo yum install ceph-common
+
+
+Setup Ceph Client Authentication
+--------------------------------
+
+If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
+and Glance. Execute the following::
+
+ ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images'
+ ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images'
+ ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups'
+
+Add the keyrings for ``client.cinder``, ``client.glance``, and
+``client.cinder-backup`` to the appropriate nodes and change their ownership::
+
+ ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
+ ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
+ ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
+ ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
+ ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
+ ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
+
+Nodes running ``nova-compute`` need the keyring file for the ``nova-compute``
+process::
+
+ ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
+
+They also need to store the secret key of the ``client.cinder`` user in
+``libvirt``. The libvirt process needs it to access the cluster while attaching
+a block device from Cinder.
+
+Create a temporary copy of the secret key on the nodes running
+``nova-compute``::
+
+ ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
+
+Then, on the compute nodes, add the secret key to ``libvirt`` and remove the
+temporary copy of the key::
+
+ uuidgen
+ 457eb676-33da-42ec-9a8c-9293d545c337
+
+ cat > secret.xml <<EOF
+ <secret ephemeral='no' private='no'>
+ <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
+ <usage type='ceph'>
+ <name>client.cinder secret</name>
+ </usage>
+ </secret>
+ EOF
+ sudo virsh secret-define --file secret.xml
+ Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
+ sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml
+
+Save the uuid of the secret for configuring ``nova-compute`` later.
+
+.. important:: You don't necessarily need the UUID on all the compute nodes.
+ However from a platform consistency perspective, it's better to keep the
+ same UUID.
+
+.. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx
+
+
+Configure OpenStack to use Ceph
+===============================
+
+Configuring Glance
+------------------
+
+Glance can use multiple back ends to store images. To use Ceph block devices by
+default, configure Glance as follows.
+
+Prior to Juno
+~~~~~~~~~~~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[DEFAULT]`` section::
+
+ default_store = rbd
+ rbd_store_user = glance
+ rbd_store_pool = images
+ rbd_store_chunk_size = 8
+
+
+Juno
+~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
+
+ [DEFAULT]
+ ...
+ default_store = rbd
+ ...
+ [glance_store]
+ stores = rbd
+ rbd_store_pool = images
+ rbd_store_user = glance
+ rbd_store_ceph_conf = /etc/ceph/ceph.conf
+ rbd_store_chunk_size = 8
+
+.. important:: Glance has not completely moved to 'store' yet, so the store
+   must still be configured in the ``[DEFAULT]`` section until Kilo.
+
+Kilo and after
+~~~~~~~~~~~~~~
+
+Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section::
+
+ [glance_store]
+ stores = rbd
+ default_store = rbd
+ rbd_store_pool = images
+ rbd_store_user = glance
+ rbd_store_ceph_conf = /etc/ceph/ceph.conf
+ rbd_store_chunk_size = 8
+
+For more information about the configuration options available in Glance please refer to the OpenStack Configuration Reference: http://docs.openstack.org/.
+
+Enable copy-on-write cloning of images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Note that this exposes the back end location via Glance's API, so the endpoint
+with this option enabled should not be publicly accessible.
+
+Any OpenStack version except Mitaka
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section::
+
+ show_image_direct_url = True
+
+For Mitaka only
+^^^^^^^^^^^^^^^
+
+To enable image locations and take advantage of copy-on-write cloning for images, add under the ``[DEFAULT]`` section::
+
+ show_multiple_locations = True
+ show_image_direct_url = True
+
+Disable cache management (any OpenStack version)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``,
+assuming your configuration file has ``flavor = keystone+cachemanagement``::
+
+ [paste_deploy]
+ flavor = keystone
+
+Image properties
+~~~~~~~~~~~~~~~~
+
+We recommend using the following properties for your images (an example of
+applying them follows this list):
+
+- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller to get better
+  performance and support for the discard operation
+- ``hw_disk_bus=scsi``: connect every Cinder block device to that controller
+- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent
+- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest
+  agent
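+
+For example, these properties might be applied to an existing image roughly as
+follows (the image ID is a placeholder and the exact syntax can vary between
+Glance client versions)::
+
+    glance image-update --property hw_scsi_model=virtio-scsi \
+        --property hw_disk_bus=scsi \
+        --property hw_qemu_guest_agent=yes \
+        --property os_require_quiesce=yes {image-id}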
+
+
+Configuring Cinder
+------------------
+
+OpenStack requires a driver to interact with Ceph block devices. You must also
+specify the pool name for the block device. On your OpenStack node, edit
+``/etc/cinder/cinder.conf`` by adding::
+
+ [DEFAULT]
+ ...
+ enabled_backends = ceph
+ ...
+ [ceph]
+ volume_driver = cinder.volume.drivers.rbd.RBDDriver
+ volume_backend_name = ceph
+ rbd_pool = volumes
+ rbd_ceph_conf = /etc/ceph/ceph.conf
+ rbd_flatten_volume_from_snapshot = false
+ rbd_max_clone_depth = 5
+ rbd_store_chunk_size = 4
+ rados_connect_timeout = -1
+ glance_api_version = 2
+
+If you are using `cephx authentication`_, also configure the user and uuid of
+the secret you added to ``libvirt`` as documented earlier::
+
+ [ceph]
+ ...
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+Note that if you are configuring multiple cinder back ends,
+``glance_api_version = 2`` must be in the ``[DEFAULT]`` section.
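+
+For example, with more than one back end the option would sit in ``[DEFAULT]``
+(the second back end shown here is purely illustrative)::
+
+    [DEFAULT]
+    enabled_backends = ceph, lvm
+    glance_api_version = 2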
+
+
+Configuring Cinder Backup
+-------------------------
+
+OpenStack Cinder Backup requires a specific daemon, so don't forget to install it.
+On your Cinder Backup node, edit ``/etc/cinder/cinder.conf`` and add::
+
+ backup_driver = cinder.backup.drivers.ceph
+ backup_ceph_conf = /etc/ceph/ceph.conf
+ backup_ceph_user = cinder-backup
+ backup_ceph_chunk_size = 134217728
+ backup_ceph_pool = backups
+ backup_ceph_stripe_unit = 0
+ backup_ceph_stripe_count = 0
+ restore_discard_excess_bytes = true
+
+
+Configuring Nova to attach Ceph RBD block device
+------------------------------------------------
+
+In order to attach Cinder devices (either a normal block device or a boot
+from volume), you must tell Nova (and libvirt) which user and UUID to refer to
+when attaching the device. libvirt will refer to this user when connecting and
+authenticating with the Ceph cluster. ::
+
+ [libvirt]
+ ...
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+These two flags are also used by the Nova ephemeral backend.
+
+
+Configuring Nova
+----------------
+
+In order to boot all the virtual machines directly into Ceph, you must
+configure the ephemeral backend for Nova.
+
+It is recommended to enable the RBD cache in your Ceph configuration file (it
+is enabled by default since Giant). Moreover, enabling the admin socket brings
+a lot of benefits while troubleshooting. Having one socket per virtual machine
+that uses a Ceph block device helps when investigating performance issues
+and/or misbehavior.
+
+This socket can be accessed like this::
+
+ ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help
+
+Now, on every compute node, edit your Ceph configuration file::
+
+ [client]
+ rbd cache = true
+ rbd cache writethrough until flush = true
+ admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
+ log file = /var/log/qemu/qemu-guest-$pid.log
+ rbd concurrent management ops = 20
+
+Configure the permissions of these paths::
+
+ mkdir -p /var/run/ceph/guests/ /var/log/qemu/
+ chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/
+
+Note that the user ``qemu`` and the group ``libvirtd`` can vary depending on
+your system. The provided example works for Red Hat-based systems.
+
+.. tip:: If your virtual machine is already running, you can simply restart it
+   to get the socket.
+
+
+Havana and Icehouse
+~~~~~~~~~~~~~~~~~~~
+
+Havana and Icehouse require patches to implement copy-on-write cloning and fix
+bugs with image size and live migration of ephemeral disks on rbd. These are
+available in branches based on upstream Nova `stable/havana`_ and
+`stable/icehouse`_. Using them is not mandatory but **highly recommended** in
+order to take advantage of the copy-on-write clone functionality.
+
+On every Compute node, edit ``/etc/nova/nova.conf`` and add::
+
+ libvirt_images_type = rbd
+ libvirt_images_rbd_pool = vms
+ libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
+ disk_cachemodes="network=writeback"
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+
+It is also a good practice to disable file injection. While booting an
+instance, Nova usually attempts to open the rootfs of the virtual machine.
+Then, Nova injects values such as password, ssh keys etc. directly into the
+filesystem. However, it is better to rely on the metadata service and
+``cloud-init``.
+
+On every Compute node, edit ``/etc/nova/nova.conf`` and add::
+
+ libvirt_inject_password = false
+ libvirt_inject_key = false
+ libvirt_inject_partition = -2
+
+To ensure a proper live-migration, use the following flags::
+
+ libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
+
+Juno
+~~~~
+
+In Juno, the Ceph block device settings were moved under the ``[libvirt]`` section.
+On every Compute node, edit ``/etc/nova/nova.conf`` under the ``[libvirt]``
+section and add::
+
+ [libvirt]
+ images_type = rbd
+ images_rbd_pool = vms
+ images_rbd_ceph_conf = /etc/ceph/ceph.conf
+ rbd_user = cinder
+ rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
+ disk_cachemodes="network=writeback"
+
+
+It is also a good practice to disable file injection. While booting an
+instance, Nova usually attempts to open the rootfs of the virtual machine.
+Then, Nova injects values such as password, ssh keys etc. directly into the
+filesystem. However, it is better to rely on the metadata service and
+``cloud-init``.
+
+On every Compute node, edit ``/etc/nova/nova.conf`` and add the following
+under the ``[libvirt]`` section::
+
+ inject_password = false
+ inject_key = false
+ inject_partition = -2
+
+To ensure a proper live-migration, use the following flags (under the ``[libvirt]`` section)::
+
+ live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
+
+Kilo
+~~~~
+
+Enable discard support for virtual machine ephemeral root disk::
+
+ [libvirt]
+ ...
+ ...
+ hw_disk_discard = unmap # enable discard support (be careful of performance)
+
+
+Restart OpenStack
+=================
+
+To activate the Ceph block device driver and load the block device pool name
+into the configuration, you must restart OpenStack. For Debian-based systems,
+execute these commands on the appropriate nodes::
+
+ sudo glance-control api restart
+ sudo service nova-compute restart
+ sudo service cinder-volume restart
+ sudo service cinder-backup restart
+
+For Red Hat-based systems, execute::
+
+ sudo service openstack-glance-api restart
+ sudo service openstack-nova-compute restart
+ sudo service openstack-cinder-volume restart
+ sudo service openstack-cinder-backup restart
+
+Once OpenStack is up and running, you should be able to create a volume
+and boot from it.
+
+
+Booting from a Block Device
+===========================
+
+You can create a volume from an image using the Cinder command line tool::
+
+ cinder create --image-id {id of image} --display-name {name of volume} {size of volume}
+
+Note that the image must be in RAW format. You can use `qemu-img`_ to convert
+from one format to another. For example::
+
+ qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename}
+ qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw
+
+When Glance and Cinder are both using Ceph block devices, the image is a
+copy-on-write clone, so new volumes can be created quickly. In the OpenStack
+dashboard, you can boot from that volume by performing the following steps:
+
+#. Launch a new instance.
+#. Choose the image associated with the copy-on-write clone.
+#. Select 'boot from volume'.
+#. Select the volume you created.
+
+.. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd
+.. _Block Devices and OpenStack (Dumpling): http://docs.ceph.com/docs/dumpling/rbd/rbd-openstack
+.. _stable/havana: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd
+.. _stable/icehouse: https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse
diff --git a/src/ceph/doc/rbd/rbd-replay.rst b/src/ceph/doc/rbd/rbd-replay.rst
new file mode 100644
index 0000000..e1c96b2
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-replay.rst
@@ -0,0 +1,42 @@
+===================
+ RBD Replay
+===================
+
+.. index:: Ceph Block Device; RBD Replay
+
+RBD Replay is a set of tools for capturing and replaying Rados Block Device
+(RBD) workloads. To capture an RBD workload, ``lttng-tools`` must be installed
+on the client, and ``librbd`` on the client must be the v0.87 (Giant) release
+or later. To replay an RBD workload, ``librbd`` on the client must be the Giant
+release or later.
+
+Capture and replay takes three steps:
+
+#. Capture the trace. Make sure to capture ``pthread_id`` context::
+
+ mkdir -p traces
+ lttng create -o traces librbd
+ lttng enable-event -u 'librbd:*'
+ lttng add-context -u -t pthread_id
+ lttng start
+ # run RBD workload here
+ lttng stop
+
+#. Process the trace with `rbd-replay-prep`_::
+
+ rbd-replay-prep traces/ust/uid/*/* replay.bin
+
+#. Replay the trace with `rbd-replay`_. Use read-only until you know
+ it's doing what you want::
+
+ rbd-replay --read-only replay.bin
+
+.. important:: ``rbd-replay`` will destroy data by default. Do not use against
+ an image you wish to keep, unless you use the ``--read-only`` option.
+
+The replayed workload does not have to be against the same RBD image or even the
+same cluster as the captured workload. To account for differences, you may need
+to use the ``--pool`` and ``--map-image`` options of ``rbd-replay``.
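+
+For example, to replay into a different pool and against a differently named
+image, you might run something like the following (the names are placeholders;
+see the `rbd-replay`_ man page for the exact ``--map-image`` rule syntax)::
+
+    rbd-replay --read-only --pool rbd-test --map-image my-image=my-image-copy replay.bin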
+
+.. _rbd-replay: ../../man/8/rbd-replay
+.. _rbd-replay-prep: ../../man/8/rbd-replay-prep
diff --git a/src/ceph/doc/rbd/rbd-snapshot.rst b/src/ceph/doc/rbd/rbd-snapshot.rst
new file mode 100644
index 0000000..2e5af9f
--- /dev/null
+++ b/src/ceph/doc/rbd/rbd-snapshot.rst
@@ -0,0 +1,308 @@
+===========
+ Snapshots
+===========
+
+.. index:: Ceph Block Device; snapshots
+
+A snapshot is a read-only copy of the state of an image at a particular point in
+time. One of the advanced features of Ceph block devices is that you can create
+snapshots of the images to retain a history of an image's state. Ceph also
+supports snapshot layering, which allows you to clone images (e.g., a VM image)
+quickly and easily. Ceph supports block device snapshots using the ``rbd``
+command and many higher level interfaces, including `QEMU`_, `libvirt`_,
+`OpenStack`_ and `CloudStack`_.
+
+.. important:: To use RBD snapshots, you must have a running Ceph cluster.
+
+.. note:: If a snapshot is taken while `I/O` is still in progress on an image,
+   the snapshot might not capture the exact or latest data of the image, and it
+   may have to be cloned to a new image to be mountable. We therefore recommend
+   stopping `I/O` before taking a snapshot of an image. If the image contains a
+   filesystem, the filesystem must be in a consistent state before taking a
+   snapshot. To stop `I/O` you can use the ``fsfreeze`` command. See the
+   `fsfreeze(8)` man page for more details. For virtual machines,
+   `qemu-guest-agent` can be used to automatically freeze filesystems when
+   creating a snapshot.
+
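+For example, if the image holds a filesystem that is mounted on a client host
+(``/mnt/myimage`` is a placeholder mount point), a consistent snapshot can be
+taken along these lines::
+
+    sudo fsfreeze --freeze /mnt/myimage
+    rbd snap create rbd/myimage@snapname
+    sudo fsfreeze --unfreeze /mnt/myimage
+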
+.. ditaa:: +------------+ +-------------+
+ | {s} | | {s} c999 |
+ | Active |<-------*| Snapshot |
+ | Image | | of Image |
+ | (stop i/o) | | (read only) |
+ +------------+ +-------------+
+
+
+Cephx Notes
+===========
+
+When `cephx`_ is enabled (it is by default), you must specify a user name or ID
+and a path to the keyring containing the corresponding key for the user. See
+`User Management`_ for details. You may also set the ``CEPH_ARGS`` environment
+variable to avoid re-entering these parameters. ::
+
+ rbd --id {user-ID} --keyring=/path/to/secret [commands]
+ rbd --name {username} --keyring=/path/to/secret [commands]
+
+For example::
+
+ rbd --id admin --keyring=/etc/ceph/ceph.keyring [commands]
+ rbd --name client.admin --keyring=/etc/ceph/ceph.keyring [commands]
+
+.. tip:: Add the user and secret to the ``CEPH_ARGS`` environment
+ variable so that you don't need to enter them each time.
+
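+For example, in a Bourne-style shell you might set (substitute your own user
+ID and keyring path)::
+
+    export CEPH_ARGS="--id admin --keyring=/etc/ceph/ceph.keyring"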
+
+Snapshot Basics
+===============
+
+The following procedures demonstrate how to create, list, and remove
+snapshots using the ``rbd`` command on the command line.
+
+Create Snapshot
+---------------
+
+To create a snapshot with ``rbd``, specify the ``snap create`` option, the pool
+name, the image name and the snap name. ::
+
+ rbd snap create {pool-name}/{image-name}@{snap-name}
+
+For example::
+
+ rbd snap create rbd/foo@snapname
+
+
+List Snapshots
+--------------
+
+To list snapshots of an image, specify the pool name and the image name. ::
+
+ rbd snap ls {pool-name}/{image-name}
+
+For example::
+
+ rbd snap ls rbd/foo
+
+
+Rollback Snapshot
+-----------------
+
+To roll back to a snapshot with ``rbd``, specify the ``snap rollback`` option,
+the pool name, the image name and the snap name. ::
+
+ rbd snap rollback {pool-name}/{image-name}@{snap-name}
+
+For example::
+
+ rbd snap rollback rbd/foo@snapname
+
+
+.. note:: Rolling back an image to a snapshot means overwriting
+   the current version of the image with data from a snapshot. The
+   time it takes to execute a rollback increases with the size of the
+   image. It is **faster to clone** from a snapshot **than to roll back**
+   an image to a snapshot, and it is the preferred method of returning
+   to a pre-existing state.
+
+
+Delete a Snapshot
+-----------------
+
+To delete a snapshot with ``rbd``, specify the ``snap rm`` option, the pool
+name, the image name and the snap name. ::
+
+ rbd snap rm {pool-name}/{image-name}@{snap-name}
+
+For example::
+
+ rbd snap rm rbd/foo@snapname
+
+
+.. note:: Ceph OSDs delete data asynchronously, so deleting a snapshot
+ doesn't free up the disk space immediately.
+
+Purge Snapshots
+---------------
+
+To delete all snapshots of an image with ``rbd``, specify the ``snap purge``
+option, the pool name and the image name. ::
+
+ rbd snap purge {pool-name}/{image-name}
+
+For example::
+
+ rbd snap purge rbd/foo
+
+
+.. index:: Ceph Block Device; snapshot layering
+
+Layering
+========
+
+Ceph supports the ability to create many copy-on-write (COW) clones of a block
+device snapshot. Snapshot layering enables Ceph block device clients to create
+images very quickly. For example, you might create a block device image with a
+Linux VM written to it; then, snapshot the image, protect the snapshot, and
+create as many copy-on-write clones as you like. A snapshot is read-only,
+so cloning a snapshot simplifies semantics, which makes it possible to create
+clones rapidly.
+
+
+.. ditaa:: +-------------+ +-------------+
+ | {s} c999 | | {s} |
+ | Snapshot | Child refers | COW Clone |
+ | of Image |<------------*| of Snapshot |
+ | | to Parent | |
+ | (read only) | | (writable) |
+ +-------------+ +-------------+
+
+ Parent Child
+
+.. note:: The terms "parent" and "child" mean a Ceph block device snapshot (parent),
+ and the corresponding image cloned from the snapshot (child). These terms are
+ important for the command line usage below.
+
+Each cloned image (child) stores a reference to its parent image, which enables
+the cloned image to open the parent snapshot and read it.
+
+A COW clone of a snapshot behaves exactly like any other Ceph block device
+image. You can read from, write to, clone, and resize cloned images. There are
+no special restrictions with cloned images. However, the copy-on-write clone of
+a snapshot refers to the snapshot, so you **MUST** protect the snapshot before
+you clone it. The following diagram depicts the process.
+
+.. note:: Ceph only supports cloning of format 2 images (i.e., created with
+   ``rbd create --image-format 2``). The kernel client has supported cloned
+   images since kernel 3.10.
+
+Getting Started with Layering
+-----------------------------
+
+Ceph block device layering is a simple process: you must have an image, create
+a snapshot of the image, and protect the snapshot. Once you have performed
+these steps, you can begin cloning the snapshot.
+
+.. ditaa:: +----------------------------+ +-----------------------------+
+ | | | |
+ | Create Block Device Image |------->| Create a Snapshot |
+ | | | |
+ +----------------------------+ +-----------------------------+
+ |
+ +--------------------------------------+
+ |
+ v
+ +----------------------------+ +-----------------------------+
+ | | | |
+ | Protect the Snapshot |------->| Clone the Snapshot |
+ | | | |
+ +----------------------------+ +-----------------------------+
+
+
+The cloned image stores a reference to the parent snapshot that includes the
+pool ID, image ID and snapshot ID. The inclusion of the pool ID means that you
+may clone snapshots from one pool to images in another pool.
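+
+Put together, a minimal layering workflow looks like the following sketch (the
+image and snapshot names are placeholders; each step is described in detail in
+the sections below)::
+
+    rbd create --size 10240 --image-format 2 rbd/golden-image
+    rbd snap create rbd/golden-image@golden-snap
+    rbd snap protect rbd/golden-image@golden-snap
+    rbd clone rbd/golden-image@golden-snap rbd/vm-1-disk
+
+Typical use cases for layering include the following: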
+
+
+#. **Image Template:** A common use case for block device layering is to create
+   a master image and a snapshot that serves as a template for clones. For example,
+ a user may create an image for a Linux distribution (e.g., Ubuntu 12.04), and
+ create a snapshot for it. Periodically, the user may update the image and create
+ a new snapshot (e.g., ``sudo apt-get update``, ``sudo apt-get upgrade``,
+ ``sudo apt-get dist-upgrade`` followed by ``rbd snap create``). As the image
+ matures, the user can clone any one of the snapshots.
+
+#. **Extended Template:** A more advanced use case is extending a template
+   image so that it provides more information than a base image. For example, a user may
+ clone an image (e.g., a VM template) and install other software (e.g., a database,
+ a content management system, an analytics system, etc.) and then snapshot the
+ extended image, which itself may be updated just like the base image.
+
+#. **Template Pool:** One way to use block device layering is to create a
+ pool that contains master images that act as templates, and snapshots of those
+ templates. You may then extend read-only privileges to users so that they
+ may clone the snapshots without the ability to write or execute within the pool.
+
+#. **Image Migration/Recovery:** One way to use block device layering is to migrate
+ or recover data from one pool into another pool.
+
+Protecting a Snapshot
+---------------------
+
+Clones access the parent snapshot. If a user inadvertently deletes the parent
+snapshot, all clones break. To prevent data loss, you **MUST** protect the
+snapshot before you clone it. ::
+
+ rbd snap protect {pool-name}/{image-name}@{snapshot-name}
+
+For example::
+
+ rbd snap protect rbd/my-image@my-snapshot
+
+.. note:: You cannot delete a protected snapshot.
+
+Cloning a Snapshot
+------------------
+
+To clone a snapshot, you need to specify the parent pool, image and snapshot,
+as well as the child pool and image name. You must protect the snapshot
+before you can clone it. ::
+
+ rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}
+
+For example::
+
+ rbd clone rbd/my-image@my-snapshot rbd/new-image
+
+.. note:: You may clone a snapshot from one pool to an image in another pool. For example,
+ you may maintain read-only images and snapshots as templates in one pool, and writeable
+ clones in another pool.
+
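+To verify the relationship between a clone and its parent, inspect the child
+image; for a clone, the ``parent`` field in the output identifies the snapshot
+it was cloned from::
+
+    rbd info rbd/new-image
+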
+Unprotecting a Snapshot
+-----------------------
+
+Before you can delete a snapshot, you must unprotect it. Additionally, you may
+**NOT** delete snapshots that are referenced by clones; you must flatten each
+clone of a snapshot before you can delete the snapshot. ::
+
+ rbd snap unprotect {pool-name}/{image-name}@{snapshot-name}
+
+For example::
+
+ rbd snap unprotect rbd/my-image@my-snapshot
+
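+Putting the pieces together, removing a protected snapshot that still has
+clones looks roughly like this (``rbd children`` and ``rbd flatten`` are
+covered in the following sections; ``rbd/child-image`` stands for whatever
+``rbd children`` reports)::
+
+    rbd children rbd/my-image@my-snapshot
+    rbd flatten rbd/child-image
+    rbd snap unprotect rbd/my-image@my-snapshot
+    rbd snap rm rbd/my-image@my-snapshot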
+
+Listing Children of a Snapshot
+------------------------------
+
+To list the children of a snapshot, execute the following::
+
+ rbd children {pool-name}/{image-name}@{snapshot-name}
+
+For example::
+
+ rbd children rbd/my-image@my-snapshot
+
+
+Flattening a Cloned Image
+-------------------------
+
+Cloned images retain a reference to the parent snapshot. Flattening a clone
+copies the information from the snapshot into the clone and removes the
+reference to the parent, making the clone independent of the snapshot. The time
+it takes to flatten a clone increases with the size of the snapshot. To delete
+a snapshot, you must first flatten the child images. ::
+
+ rbd flatten {pool-name}/{image-name}
+
+For example::
+
+ rbd flatten rbd/my-image
+
+.. note:: Since a flattened image contains all the information from the snapshot,
+   it will take up more storage space than a layered clone.
+
+
+.. _cephx: ../../rados/configuration/auth-config-ref/
+.. _User Management: ../../operations/user-management
+.. _QEMU: ../qemu-rbd/
+.. _OpenStack: ../rbd-openstack/
+.. _CloudStack: ../rbd-cloudstack/
+.. _libvirt: ../libvirt/