Diffstat (limited to 'src/ceph/doc/rbd')
25 files changed, 3427 insertions, 0 deletions
diff --git a/src/ceph/doc/rbd/api/index.rst b/src/ceph/doc/rbd/api/index.rst new file mode 100644 index 0000000..71f6809 --- /dev/null +++ b/src/ceph/doc/rbd/api/index.rst @@ -0,0 +1,8 @@ +======================== + Ceph Block Device APIs +======================== + +.. toctree:: + :maxdepth: 2 + + librados (Python) <librbdpy> diff --git a/src/ceph/doc/rbd/api/librbdpy.rst b/src/ceph/doc/rbd/api/librbdpy.rst new file mode 100644 index 0000000..fa90331 --- /dev/null +++ b/src/ceph/doc/rbd/api/librbdpy.rst @@ -0,0 +1,82 @@ +================ + Librbd (Python) +================ + +.. highlight:: python + +The `rbd` python module provides file-like access to RBD images. + + +Example: Creating and writing to an image +========================================= + +To use `rbd`, you must first connect to RADOS and open an IO +context:: + + cluster = rados.Rados(conffile='my_ceph.conf') + cluster.connect() + ioctx = cluster.open_ioctx('mypool') + +Then you instantiate an :class:rbd.RBD object, which you use to create the +image:: + + rbd_inst = rbd.RBD() + size = 4 * 1024**3 # 4 GiB + rbd_inst.create(ioctx, 'myimage', size) + +To perform I/O on the image, you instantiate an :class:rbd.Image object:: + + image = rbd.Image(ioctx, 'myimage') + data = 'foo' * 200 + image.write(data, 0) + +This writes 'foo' to the first 600 bytes of the image. Note that data +cannot be :type:unicode - `Librbd` does not know how to deal with +characters wider than a :c:type:char. + +In the end, you will want to close the image, the IO context and the connection to RADOS:: + + image.close() + ioctx.close() + cluster.shutdown() + +To be safe, each of these calls would need to be in a separate :finally +block:: + + cluster = rados.Rados(conffile='my_ceph_conf') + try: + ioctx = cluster.open_ioctx('my_pool') + try: + rbd_inst = rbd.RBD() + size = 4 * 1024**3 # 4 GiB + rbd_inst.create(ioctx, 'myimage', size) + image = rbd.Image(ioctx, 'myimage') + try: + data = 'foo' * 200 + image.write(data, 0) + finally: + image.close() + finally: + ioctx.close() + finally: + cluster.shutdown() + +This can be cumbersome, so the :class:`Rados`, :class:`Ioctx`, and +:class:`Image` classes can be used as context managers that close/shutdown +automatically (see :pep:`343`). Using them as context managers, the +above example becomes:: + + with rados.Rados(conffile='my_ceph.conf') as cluster: + with cluster.open_ioctx('mypool') as ioctx: + rbd_inst = rbd.RBD() + size = 4 * 1024**3 # 4 GiB + rbd_inst.create(ioctx, 'myimage', size) + with rbd.Image(ioctx, 'myimage') as image: + data = 'foo' * 200 + image.write(data, 0) + +API Reference +============= + +.. automodule:: rbd + :members: RBD, Image, SnapIterator diff --git a/src/ceph/doc/rbd/disk.conf b/src/ceph/doc/rbd/disk.conf new file mode 100644 index 0000000..3db9b8a --- /dev/null +++ b/src/ceph/doc/rbd/disk.conf @@ -0,0 +1,8 @@ +<disk type='network' device='disk'> + <source protocol='rbd' name='poolname/imagename'> + <host name='{fqdn}' port='6789'/> + <host name='{fqdn}' port='6790'/> + <host name='{fqdn}' port='6791'/> + </source> + <target dev='vda' bus='virtio'/> +</disk> diff --git a/src/ceph/doc/rbd/index.rst b/src/ceph/doc/rbd/index.rst new file mode 100644 index 0000000..c297d0d --- /dev/null +++ b/src/ceph/doc/rbd/index.rst @@ -0,0 +1,72 @@ +=================== + Ceph Block Device +=================== + +.. index:: Ceph Block Device; introduction + +A block is a sequence of bytes (for example, a 512-byte block of data). 
+Block-based storage interfaces are the most common way to store data with +rotating media such as hard disks, CDs, floppy disks, and even traditional +9-track tape. The ubiquity of block device interfaces makes a virtual block +device an ideal candidate to interact with a mass data storage system like Ceph. + +Ceph block devices are thin-provisioned, resizable and store data striped over +multiple OSDs in a Ceph cluster. Ceph block devices leverage +:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` capabilities +such as snapshotting, replication and consistency. Ceph's +:abbr:`RADOS (Reliable Autonomic Distributed Object Store)` Block Devices (RBD) +interact with OSDs using kernel modules or the ``librbd`` library. + +.. ditaa:: +------------------------+ +------------------------+ + | Kernel Module | | librbd | + +------------------------+-+------------------------+ + | RADOS Protocol | + +------------------------+-+------------------------+ + | OSDs | | Monitors | + +------------------------+ +------------------------+ + +.. note:: Kernel modules can use Linux page caching. For ``librbd``-based + applications, Ceph supports `RBD Caching`_. + +Ceph's block devices deliver high performance with infinite scalability to +`kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and +cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on +libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster +to operate the `Ceph RADOS Gateway`_, the `Ceph FS filesystem`_, and Ceph block +devices simultaneously. + +.. important:: To use Ceph Block Devices, you must have access to a running + Ceph cluster. + +.. toctree:: + :maxdepth: 1 + + Commands <rados-rbd-cmds> + Kernel Modules <rbd-ko> + Snapshots<rbd-snapshot> + Mirroring <rbd-mirroring> + iSCSI Gateway <iscsi-overview> + QEMU <qemu-rbd> + libvirt <libvirt> + Cache Settings <rbd-config-ref/> + OpenStack <rbd-openstack> + CloudStack <rbd-cloudstack> + RBD Replay <rbd-replay> + +.. toctree:: + :maxdepth: 2 + + Manpages <man/index> + +.. toctree:: + :maxdepth: 2 + + APIs <api/index> + +.. _RBD Caching: ../rbd-config-ref/ +.. _kernel modules: ../rbd-ko/ +.. _QEMU: ../qemu-rbd/ +.. _OpenStack: ../rbd-openstack +.. _CloudStack: ../rbd-cloudstack +.. _Ceph RADOS Gateway: ../../radosgw/ +.. _Ceph FS filesystem: ../../cephfs/ diff --git a/src/ceph/doc/rbd/iscsi-initiator-esx.rst b/src/ceph/doc/rbd/iscsi-initiator-esx.rst new file mode 100644 index 0000000..18dd583 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-initiator-esx.rst @@ -0,0 +1,36 @@ +---------------------------------- +The iSCSI Initiator for VMware ESX +---------------------------------- + +**Prerequisite:** + +- VMware ESX 6.0 or later + +**iSCSI Discovery and Multipath Device Setup:** + +#. From vSphere, open the Storage Adapters, on the Configuration tab. Right click + on the iSCSI Software Adapter and select Properties. + +#. In the General tab click the "Advanced" button and in the "Advanced Settings" + set RecoveryTimeout to 25. + +#. If CHAP was setup on the iSCSI gateway, in the General tab click the "CHAP…" + button. If CHAP is not being used, skip to step 4. + +#. On the CHAP Credentials windows, select “Do not use CHAP unless required by target”, + and enter the "Name" and "Secret" values used on the initial setup for the iSCSI + gateway, then click on the "OK" button. + +#. On the Dynamic Discovery tab, click the "Add…" button, and enter the IP address + and port of one of the iSCSI target portals. 
Click on the "OK" button. + +#. Close the iSCSI Initiator Properties window. A prompt will ask to rescan the + iSCSI software adapter. Select Yes. + +#. In the Details pane, the LUN on the iSCSI target will be displayed. Right click + on a device and select "Manage Paths". + +#. On the Manage Paths window, select “Most Recently Used (VMware)” for the policy + path selection. Close and repeat for the other disks. + +Now the disks can be used for datastores. diff --git a/src/ceph/doc/rbd/iscsi-initiator-rhel.rst b/src/ceph/doc/rbd/iscsi-initiator-rhel.rst new file mode 100644 index 0000000..51248e4 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-initiator-rhel.rst @@ -0,0 +1,90 @@ +------------------------------------------------ +The iSCSI Initiator for Red Hat Enterprise Linux +------------------------------------------------ + +**Prerequisite:** + +- Package ``iscsi-initiator-utils-6.2.0.873-35`` or newer must be + installed + +- Package ``device-mapper-multipath-0.4.9-99`` or newer must be + installed + +**Installing:** + +Install the iSCSI initiator and multipath tools: + + :: + + # yum install iscsi-initiator-utils + # yum install device-mapper-multipath + +**Configuring:** + +#. Create the default ``/etc/multipath.conf`` file and enable the + ``multiapthd`` service: + + :: + + # mpathconf --enable --with_multipathd y + +#. Add the following to ``/etc/multipath.conf`` file: + + :: + + devices { + device { + vendor "LIO-ORG" + hardware_handler "1 alua" + path_grouping_policy "failover" + path_selector "queue-length 0" + failback 60 + path_checker tur + prio alua + prio_args exclusive_pref_bit + fast_oi_fail_tmo 25 + no_path_retry queue + } + } + +#. Restart the ``multipathd`` service: + + :: + + # systemctl reload multipathd + +**iSCSI Discovery and Setup:** + +#. Discover the target portals: + + :: + + # iscsiadm -m discovery -t -st 192.168.56.101 + 192.168.56.101:3260,1 iqn.2003-01.org.linux-iscsi.rheln1 + 192.168.56.102:3260,2 iqn.2003-01.org.linux-iscsi.rheln1 + +#. Login to target: + + :: + + # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.rheln1 -l + +**Multipath IO Setup:** + +The multipath daemon (``multipathd``), will set up devices automatically +based on the ``multipath.conf`` settings. Running the ``multipath`` +command show devices setup in a failover configuration with a priority +group for each path. + +:: + + # multipath -ll + mpathbt (360014059ca317516a69465c883a29603) dm-1 LIO-ORG ,IBLOCK + size=1.0G features='0' hwhandler='1 alua' wp=rw + |-+- policy='queue-length 0' prio=50 status=active + | `- 28:0:0:1 sde 8:64 active ready running + `-+- policy='queue-length 0' prio=10 status=enabled + `- 29:0:0:1 sdc 8:32 active ready running + +You should now be able to use the RBD image like you would a normal +multipath’d iSCSI disk. diff --git a/src/ceph/doc/rbd/iscsi-initiator-win.rst b/src/ceph/doc/rbd/iscsi-initiator-win.rst new file mode 100644 index 0000000..08a1cfb --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-initiator-win.rst @@ -0,0 +1,100 @@ +----------------------------------------- +The iSCSI Initiator for Microsoft Windows +----------------------------------------- + +**Prerequisite:** + +- Microsoft Windows 2016 + +**iSCSI Initiator, Discovery and Setup:** + +#. Install the iSCSI initiator driver and MPIO tools. + +#. Launch the MPIO program, click on the “Discover Multi-Paths” tab select “Add + support for iSCSI devices”. + +#. On the iSCSI Initiator Properties window, on the "Discovery" tab, add a target + portal. 
Enter the IP address or DNS name and Port of the Ceph iSCSI gateway. + +#. On the “Targets” tab, select the target and click on “Connect”. + +#. On the “Connect To Target” window, select the “Enable multi-path” option, and + click the “Advanced” button. + +#. Under the "Connet using" section, select a “Target portal IP” . Select the + “Enable CHAP login on” and enter the "Name" and "Target secret" values from the + Ceph iSCSI Ansible client credentials section, and click OK. + +#. Repeat steps 5 and 6 for each target portal defined when setting up + the iSCSI gateway. + +**Multipath IO Setup:** + +Configuring the MPIO load balancing policy, setting the timeout and +retry options are using PowerShell with the ``mpclaim`` command. The +reset is done in the MPIO tool. + +.. note:: + It is recommended to increase the ``PDORemovePeriod`` option to 120 + seconds from PowerShell. This value might need to be adjusted based + on the application. When all paths are down, and 120 seconds + expires, the operating system will start failing IO requests. + +:: + + Set-MPIOSetting -NewPDORemovePeriod 120 + +:: + + mpclaim.exe -l -m 1 + +:: + + mpclaim -s -m + MSDSM-wide Load Balance Policy: Fail Over Only + +#. Using the MPIO tool, from the “Targets” tab, click on the + “Devices...” button. + +#. From the Devices window, select a disk and click the + “MPIO...” button. + +#. On the "Device Details" window the paths to each target portal is + displayed. If using the ``ceph-ansible`` setup method, the + iSCSI gateway will use ALUA to tell the iSCSI initiator which path + and iSCSI gateway should be used as the primary path. The Load + Balancing Policy “Fail Over Only” must be selected + +:: + + mpclaim -s -d $MPIO_DISK_ID + +.. note:: + For the ``ceph-ansible`` setup method, there will be one + Active/Optimized path which is the path to the iSCSI gateway node + that owns the LUN, and there will be an Active/Unoptimized path for + each other iSCSI gateway node. + +**Tuning:** + +Consider using the following registry settings: + +- Windows Disk Timeout + + :: + + HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk + + :: + + TimeOutValue = 65 + +- Microsoft iSCSI Initiator Driver + + :: + + HKEY_LOCAL_MACHINE\\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters + + :: + LinkDownTime = 25 + SRBTimeoutDelta = 15 diff --git a/src/ceph/doc/rbd/iscsi-initiators.rst b/src/ceph/doc/rbd/iscsi-initiators.rst new file mode 100644 index 0000000..d3ad633 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-initiators.rst @@ -0,0 +1,10 @@ +-------------------------------- +Configuring the iSCSI Initiators +-------------------------------- + +.. toctree:: + :maxdepth: 1 + + The iSCSI Initiator for Red Hat Enterprise Linux <iscsi-initiator-rhel> + The iSCSI Initiator for Microsoft Windows <iscsi-initiator-win> + The iSCSI Initiator for VMware ESX <iscsi-initiator-esx> diff --git a/src/ceph/doc/rbd/iscsi-monitoring.rst b/src/ceph/doc/rbd/iscsi-monitoring.rst new file mode 100644 index 0000000..d425232 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-monitoring.rst @@ -0,0 +1,103 @@ +----------------------------- +Monitoring the iSCSI gateways +----------------------------- + +Ceph provides an additional tool for iSCSI gateway environments +to monitor performance of exported RADOS Block Device (RBD) images. + +The ``gwtop`` tool is a ``top``-like tool that displays aggregated +performance metrics of RBD images that are exported to clients over +iSCSI. 
The metrics are sourced from a Performance Metrics Domain Agent +(PMDA). Information from the Linux-IO target (LIO) PMDA is used to list +each exported RBD image with the connected client and its associated I/O +metrics. + +**Requirements:** + +- A running Ceph iSCSI gateway + +**Installing:** + +#. As ``root``, install the ``ceph-iscsi-tools`` package on each iSCSI + gateway node: + + :: + + # yum install ceph-iscsi-tools + +#. As ``root``, install the performance co-pilot package on each iSCSI + gateway node: + + :: + + # yum install pcp + +#. As ``root``, install the LIO PMDA package on each iSCSI gateway node: + + :: + + # yum install pcp-pmda-lio + +#. As ``root``, enable and start the performance co-pilot service on + each iSCSI gateway node: + + :: + + # systemctl enable pmcd + # systemctl start pmcd + +#. As ``root``, register the ``pcp-pmda-lio`` agent: + + :: + + cd /var/lib/pcp/pmdas/lio + ./Install + +By default, ``gwtop`` assumes the iSCSI gateway configuration object is +stored in a RADOS object called ``gateway.conf`` in the ``rbd`` pool. +This configuration defines the iSCSI gateways to contact for gathering +the performance statistics. This can be overridden by using either the +``-g`` or ``-c`` flags. See ``gwtop --help`` for more details. + +The LIO configuration determines which type of performance statistics to +extract from performance co-pilot. When ``gwtop`` starts it looks at the +LIO configuration, and if it find user-space disks, then ``gwtop`` +selects the LIO collector automatically. + +**Example ``gwtop`` Outputs** + +For kernel RBD-based devices: + +:: + + gwtop 2/2 Gateways CPU% MIN: 4 MAX: 5 Network Total In: 2M Out: 3M 10:20:09 + Capacity: 8G Disks: 8 IOPS: 500 Clients: 1 Ceph: HEALTH_OK OSDs: 3 + Pool.Image Src Device Size r/s w/s rMB/s wMB/s await r_await w_await Client + iscsi.t1703 rbd0 500M 0 0 0.00 0.00 0.00 0.00 0.00 + iscsi.testme1 rbd5 500M 0 0 0.00 0.00 0.00 0.00 0.00 + iscsi.testme2 rbd2 500M 0 0 0.00 0.00 0.00 0.00 0.00 + iscsi.testme3 rbd3 500M 0 0 0.00 0.00 0.00 0.00 0.00 + iscsi.testme5 rbd1 500M 0 0 0.00 0.00 0.00 0.00 0.00 + rbd.myhost_1 T rbd4 4G 500 0 1.95 0.00 2.37 2.37 0.00 rh460p(CON) + rbd.test_2 rbd6 1G 0 0 0.00 0.00 0.00 0.00 0.00 + rbd.testme rbd7 500M 0 0 0.00 0.00 0.00 0.00 0.00 + +For user backed storage (TCMU) devices: + +:: + + gwtop 2/2 Gateways CPU% MIN: 4 MAX: 5 Network Total In: 2M Out: 3M 10:20:00 + Capacity: 8G Disks: 8 IOPS: 503 Clients: 1 Ceph: HEALTH_OK OSDs: 3 + Pool.Image Src Size iops rMB/s wMB/s Client + iscsi.t1703 500M 0 0.00 0.00 + iscsi.testme1 500M 0 0.00 0.00 + iscsi.testme2 500M 0 0.00 0.00 + iscsi.testme3 500M 0 0.00 0.00 + iscsi.testme5 500M 0 0.00 0.00 + rbd.myhost_1 T 4G 504 1.95 0.00 rh460p(CON) + rbd.test_2 1G 0 0.00 0.00 + rbd.testme 500M 0 0.00 0.00 + +In the *Client* column, ``(CON)`` means the iSCSI initiator (client) is +currently logged into the iSCSI gateway. If ``-multi-`` is displayed, +then multiple clients are mapped to the single RBD image. diff --git a/src/ceph/doc/rbd/iscsi-overview.rst b/src/ceph/doc/rbd/iscsi-overview.rst new file mode 100644 index 0000000..a8c64e2 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-overview.rst @@ -0,0 +1,50 @@ +================== +Ceph iSCSI Gateway +================== + +The iSCSI gateway is integrating Ceph Storage with the iSCSI standard to provide +a Highly Available (HA) iSCSI target that exports RADOS Block Device (RBD) images +as SCSI disks. 
The iSCSI protocol allows clients (initiators) to send SCSI commands +to SCSI storage devices (targets) over a TCP/IP network. This allows for heterogeneous +clients, such as Microsoft Windows, to access the Ceph Storage cluster. + +Each iSCSI gateway runs the Linux IO target kernel subsystem (LIO) to provide the +iSCSI protocol support. LIO utilizes a userspace passthrough (TCMU) to interact +with Ceph's librbd library and expose RBD images to iSCSI clients. With Ceph’s +iSCSI gateway you can effectively run a fully integrated block-storage +infrastructure with all the features and benefits of a conventional Storage Area +Network (SAN). + +.. ditaa:: + Cluster Network + +-------------------------------------------+ + | | | | + +-------+ +-------+ +-------+ +-------+ + | | | | | | | | + | OSD 1 | | OSD 2 | | OSD 3 | | OSD N | + | {s}| | {s}| | {s}| | {s}| + +-------+ +-------+ +-------+ +-------+ + | | | | + +--------->| | +---------+ | |<---------+ + : | | | RBD | | | : + | +----------------| Image |----------------+ | + | Public Network | {d} | | + | +---------+ | + | | + | +-------------------+ | + | +--------------+ | iSCSI Initators | +--------------+ | + | | iSCSI GW | | +-----------+ | | iSCSI GW | | + +-->| RBD Module |<--+ | Various | +-->| RBD Module |<--+ + | | | | Operating | | | | + +--------------+ | | Systems | | +--------------+ + | +-----------+ | + +-------------------+ + + +.. toctree:: + :maxdepth: 1 + + Requirements <iscsi-requirements> + Configuring the iSCSI Target <iscsi-targets> + Configuring the iSCSI Initiator <iscsi-initiators> + Monitoring the iSCSI Gateways <iscsi-monitoring> diff --git a/src/ceph/doc/rbd/iscsi-requirements.rst b/src/ceph/doc/rbd/iscsi-requirements.rst new file mode 100644 index 0000000..1ae19e0 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-requirements.rst @@ -0,0 +1,49 @@ +========================== +iSCSI Gateway Requirements +========================== + +To implement the Ceph iSCSI gateway there are a few requirements. It is recommended +to use two to four iSCSI gateway nodes for a highly available Ceph iSCSI gateway +solution. + +For hardware recommendations, see the `Hardware Recommendation page <http://docs.ceph.com/docs/master/start/hardware-recommendations/>`_ +for more details. + +.. note:: + On the iSCSI gateway nodes, the memory footprint of the RBD images + can grow to a large size. Plan memory requirements accordingly based + off the number RBD images mapped. + +There are no specific iSCSI gateway options for the Ceph Monitors or +OSDs, but it is important to lower the default timers for detecting +down OSDs to reduce the possibility of initiator timeouts. The following +configuration options are suggested for each OSD node in the storage +cluster:: + + [osd] + osd heartbeat grace = 20 + osd heartbeat interval = 5 + +- Online Updating Using the Ceph Monitor + + :: + + ceph tell <daemon_type>.<id> injectargs '--<parameter_name> <new_value>' + + :: + + ceph tell osd.0 injectargs '--osd_heartbeat_grace 20' + ceph tell osd.0 injectargs '--osd_heartbeat_interval 5' + +- Online Updating on the OSD Node + + :: + + ceph daemon <daemon_type>.<id> config set osd_client_watch_timeout 15 + + :: + + ceph daemon osd.0 config set osd_heartbeat_grace 20 + ceph daemon osd.0 config set osd_heartbeat_interval 5 + +For more details on setting Ceph's configuration options, see the `Configuration page <http://docs.ceph.com/docs/master/rados/configuration/>`_. 
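After injecting new values or setting them through the admin socket, you can confirm that an OSD has picked them up. A minimal check, run on the OSD node (``osd.0`` is only an example ID)::

    ceph daemon osd.0 config get osd_heartbeat_grace
    ceph daemon osd.0 config get osd_heartbeat_interval

To make the lowered timers persist across restarts, also keep the ``[osd]`` section shown above in ``ceph.conf`` on each OSD node.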
diff --git a/src/ceph/doc/rbd/iscsi-target-ansible.rst b/src/ceph/doc/rbd/iscsi-target-ansible.rst new file mode 100644 index 0000000..4169a9f --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-target-ansible.rst @@ -0,0 +1,343 @@ +========================================== +Configuring the iSCSI Target using Ansible +========================================== + +The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client +node. The Ceph iSCSI gateway can be a standalone node or be colocated on +a Ceph Object Store Disk (OSD) node. Completing the following steps will +install, and configure the Ceph iSCSI gateway for basic operation. + +**Requirements:** + +- A running Ceph Luminous (12.2.x) cluster or newer + +- RHEL/CentOS 7.4; or Linux kernel v4.14 or newer + +- The ``ceph-iscsi-config`` package installed on all the iSCSI gateway nodes + +**Installing:** + +#. On the Ansible installer node, which could be either the administration node + or a dedicated deployment node, perform the following steps: + + #. As ``root``, install the ``ceph-ansible`` package: + + :: + + # yum install ceph-ansible + + #. Add an entry in ``/etc/ansible/hosts`` file for the gateway group: + + :: + + [ceph-iscsi-gw] + ceph-igw-1 + ceph-igw-2 + +.. note:: + If co-locating the iSCSI gateway with an OSD node, then add the OSD node to the + ``[ceph-iscsi-gw]`` section. + +**Configuring:** + +The ``ceph-ansible`` package places a file in the ``/usr/share/ceph-ansible/group_vars/`` +directory called ``ceph-iscsi-gw.sample``. Create a copy of this sample file named +``ceph-iscsi-gw.yml``. Review the following Ansible variables and descriptions, +and update accordingly. + ++--------------------------------------+--------------------------------------+ +| Variable | Meaning/Purpose | ++======================================+======================================+ +| ``seed_monitor`` | Each gateway needs access to the | +| | ceph cluster for rados and rbd | +| | calls. This means the iSCSI gateway | +| | must have an appropriate | +| | ``/etc/ceph/`` directory defined. | +| | The ``seed_monitor`` host is used to | +| | populate the iSCSI gateway’s | +| | ``/etc/ceph/`` directory. | ++--------------------------------------+--------------------------------------+ +| ``cluster_name`` | Define a custom storage cluster | +| | name. | ++--------------------------------------+--------------------------------------+ +| ``gateway_keyring`` | Define a custom keyring name. | ++--------------------------------------+--------------------------------------+ +| ``deploy_settings`` | If set to ``true``, then deploy the | +| | settings when the playbook is ran. | ++--------------------------------------+--------------------------------------+ +| ``perform_system_checks`` | This is a boolean value that checks | +| | for multipath and lvm configuration | +| | settings on each gateway. It must be | +| | set to true for at least the first | +| | run to ensure multipathd and lvm are | +| | configured properly. | ++--------------------------------------+--------------------------------------+ +| ``gateway_iqn`` | This is the iSCSI IQN that all the | +| | gateways will expose to clients. | +| | This means each client will see the | +| | gateway group as a single subsystem. | ++--------------------------------------+--------------------------------------+ +| ``gateway_ip_list`` | The ip list defines the IP addresses | +| | that will be used on the front end | +| | network for iSCSI traffic. 
This IP | +| | will be bound to the active target | +| | portal group on each node, and is | +| | the access point for iSCSI traffic. | +| | Each IP should correspond to an IP | +| | available on the hosts defined in | +| | the ``ceph-iscsi-gw`` host group in | +| | ``/etc/ansible/hosts``. | ++--------------------------------------+--------------------------------------+ +| ``rbd_devices`` | This section defines the RBD images | +| | that will be controlled and managed | +| | within the iSCSI gateway | +| | configuration. Parameters like | +| | ``pool`` and ``image`` are self | +| | explanatory. Here are the other | +| | parameters: ``size`` = This defines | +| | the size of the RBD. You may | +| | increase the size later, by simply | +| | changing this value, but shrinking | +| | the size of an RBD is not supported | +| | and is ignored. ``host`` = This is | +| | the iSCSI gateway host name that | +| | will be responsible for the rbd | +| | allocation/resize. Every defined | +| | ``rbd_device`` entry must have a | +| | host assigned. ``state`` = This is | +| | typical Ansible syntax for whether | +| | the resource should be defined or | +| | removed. A request with a state of | +| | absent will first be checked to | +| | ensure the rbd is not mapped to any | +| | client. If the RBD is unallocated, | +| | it will be removed from the iSCSI | +| | gateway and deleted from the | +| | configuration. | ++--------------------------------------+--------------------------------------+ +| ``client_connections`` | This section defines the iSCSI | +| | client connection details together | +| | with the LUN (RBD image) masking. | +| | Currently only CHAP is supported as | +| | an authentication mechanism. Each | +| | connection defines an ``image_list`` | +| | which is a comma separated list of | +| | the form | +| | ``pool.rbd_image[,pool.rbd_image]``. | +| | RBD images can be added and removed | +| | from this list, to change the client | +| | masking. Note that there are no | +| | checks done to limit RBD sharing | +| | across client connections. | ++--------------------------------------+--------------------------------------+ + +.. note:: + When using the ``gateway_iqn`` variable, and for Red Hat Enterprise Linux + clients, installing the ``iscsi-initiator-utils`` package is required for + retrieving the gateway’s IQN name. The iSCSI initiator name is located in the + ``/etc/iscsi/initiatorname.iscsi`` file. + +**Deploying:** + +On the Ansible installer node, perform the following steps. + +#. As ``root``, execute the Ansible playbook: + + :: + + # cd /usr/share/ceph-ansible + # ansible-playbook ceph-iscsi-gw.yml + + .. note:: + The Ansible playbook will handle RPM dependencies, RBD creation + and Linux IO configuration. + +#. Verify the configuration from an iSCSI gateway node: + + :: + + # gwcli ls + + .. note:: + For more information on using the ``gwcli`` command to install and configure + a Ceph iSCSI gateaway, see the `Configuring the iSCSI Target using the Command Line Interface`_ + section. + + .. important:: + Attempting to use the ``targetcli`` tool to change the configuration will + result in the following issues, such as ALUA misconfiguration and path failover + problems. There is the potential to corrupt data, to have mismatched + configuration across iSCSI gateways, and to have mismatched WWN information, + which will lead to client multipath problems. 
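Before moving on to service management, it can help to see how the variables described in the table above fit together. The following ``ceph-iscsi-gw.yml`` fragment is only an illustrative sketch: the IQN, IP addresses, pool, image, host, and client names are placeholders, and the exact key layout should be taken from the ``ceph-iscsi-gw.sample`` file shipped with your ``ceph-ansible`` version::

    seed_monitor: mon1
    cluster_name: ceph
    gateway_keyring: ceph.client.admin.keyring
    deploy_settings: true
    perform_system_checks: true
    gateway_iqn: "iqn.2003-01.com.redhat.iscsi-gw:ceph-igw"
    gateway_ip_list: 192.168.122.200,192.168.122.201
    rbd_devices:
      - { pool: 'rbd', image: 'disk_1', size: '50G', host: 'ceph-igw-1', state: 'present' }
    client_connections:
      - { client: 'iqn.1994-05.com.redhat:rh7-client', image_list: 'rbd.disk_1', chap: 'rh7-client/mypassword', state: 'present' }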
+ +**Service Management:** + +The ``ceph-iscsi-config`` package installs the configuration management +logic and a Systemd service called ``rbd-target-gw``. When the Systemd +service is enabled, the ``rbd-target-gw`` will start at boot time and +will restore the Linux IO state. The Ansible playbook disables the +target service during the deployment. Below are the outcomes of when +interacting with the ``rbd-target-gw`` Systemd service. + +:: + + # systemctl <start|stop|restart|reload> rbd-target-gw + +- ``reload`` + + A reload request will force ``rbd-target-gw`` to reread the + configuration and apply it to the current running environment. This + is normally not required, since changes are deployed in parallel from + Ansible to all iSCSI gateway nodes + +- ``stop`` + + A stop request will close the gateway’s portal interfaces, dropping + connections to clients and wipe the current LIO configuration from + the kernel. This returns the iSCSI gateway to a clean state. When + clients are disconnected, active I/O is rescheduled to the other + iSCSI gateways by the client side multipathing layer. + +**Administration:** + +Within the ``/usr/share/ceph-ansible/group_vars/ceph-iscsi-gw`` file +there are a number of operational workflows that the Ansible playbook +supports. + +.. warning:: + Before removing RBD images from the iSCSI gateway configuration, + follow the standard procedures for removing a storage device from + the operating system. + ++--------------------------------------+--------------------------------------+ +| I want to… | Update the ``ceph-iscsi-gw`` file | +| | by… | ++======================================+======================================+ +| Add more RBD images | Adding another entry to the | +| | ``rbd_devices`` section with the new | +| | image. | ++--------------------------------------+--------------------------------------+ +| Resize an existing RBD image | Updating the size parameter within | +| | the ``rbd_devices`` section. Client | +| | side actions are required to pick up | +| | the new size of the disk. | ++--------------------------------------+--------------------------------------+ +| Add a client | Adding an entry to the | +| | ``client_connections`` section. | ++--------------------------------------+--------------------------------------+ +| Add another RBD to a client | Adding the relevant RBD | +| | ``pool.image`` name to the | +| | ``image_list`` variable for the | +| | client. | ++--------------------------------------+--------------------------------------+ +| Remove an RBD from a client | Removing the RBD ``pool.image`` name | +| | from the clients ``image_list`` | +| | variable. | ++--------------------------------------+--------------------------------------+ +| Remove an RBD from the system | Changing the RBD entry state | +| | variable to ``absent``. The RBD | +| | image must be unallocated from the | +| | operating system first for this to | +| | succeed. | ++--------------------------------------+--------------------------------------+ +| Change the clients CHAP credentials | Updating the relevant CHAP details | +| | in ``client_connections``. This will | +| | need to be coordinated with the | +| | clients. For example, the client | +| | issues an iSCSI logout, the | +| | credentials are changed by the | +| | Ansible playbook, the credentials | +| | are changed at the client, then the | +| | client performs an iSCSI login. 
| ++--------------------------------------+--------------------------------------+ +| Remove a client | Updating the relevant | +| | ``client_connections`` item with a | +| | state of ``absent``. Once the | +| | Ansible playbook is ran, the client | +| | will be purged from the system, but | +| | the disks will remain defined to | +| | Linux IO for potential reuse. | ++--------------------------------------+--------------------------------------+ + +Once a change has been made, rerun the Ansible playbook to apply the +change across the iSCSI gateway nodes. + +:: + + # ansible-playbook ceph-iscsi-gw.yml + +**Removing the Configuration:** + +The ``ceph-ansible`` package provides an Ansible playbook to +remove the iSCSI gateway configuration and related RBD images. The +Ansible playbook is ``/usr/share/ceph-ansible/purge_gateways.yml``. When +this Ansible playbook is ran a prompted for the type of purge to +perform: + +*lio* : + +In this mode the LIO configuration is purged on all iSCSI gateways that +are defined. Disks that were created are left untouched within the Ceph +storage cluster. + +*all* : + +When ``all`` is chosen, the LIO configuration is removed together with +**all** RBD images that were defined within the iSCSI gateway +environment, other unrelated RBD images will not be removed. Ensure the +correct mode is chosen, this operation will delete data. + +.. warning:: + A purge operation is destructive action against your iSCSI gateway + environment. + +.. warning:: + A purge operation will fail, if RBD images have snapshots or clones + and are exported through the Ceph iSCSI gateway. + +:: + + [root@rh7-iscsi-client ceph-ansible]# ansible-playbook purge_gateways.yml + Which configuration elements should be purged? (all, lio or abort) [abort]: all + + + PLAY [Confirm removal of the iSCSI gateway configuration] ********************* + + + GATHERING FACTS *************************************************************** + ok: [localhost] + + + TASK: [Exit playbook if user aborted the purge] ******************************* + skipping: [localhost] + + + TASK: [set_fact ] ************************************************************* + ok: [localhost] + + + PLAY [Removing the gateway configuration] ************************************* + + + GATHERING FACTS *************************************************************** + ok: [ceph-igw-1] + ok: [ceph-igw-2] + + + TASK: [igw_purge | purging the gateway configuration] ************************* + changed: [ceph-igw-1] + changed: [ceph-igw-2] + + + TASK: [igw_purge | deleting configured rbd devices] *************************** + changed: [ceph-igw-1] + changed: [ceph-igw-2] + + + PLAY RECAP ******************************************************************** + ceph-igw-1 : ok=3 changed=2 unreachable=0 failed=0 + ceph-igw-2 : ok=3 changed=2 unreachable=0 failed=0 + localhost : ok=2 changed=0 unreachable=0 failed=0 + + +.. _Configuring the iSCSI Target using the Command Line Interface: ../iscsi-target-cli diff --git a/src/ceph/doc/rbd/iscsi-target-cli.rst b/src/ceph/doc/rbd/iscsi-target-cli.rst new file mode 100644 index 0000000..6da6f10 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-target-cli.rst @@ -0,0 +1,163 @@ +============================================================= +Configuring the iSCSI Target using the Command Line Interface +============================================================= + +The Ceph iSCSI gateway is the iSCSI target node and also a Ceph client +node. 
The Ceph iSCSI gateway can be a standalone node or be colocated on +a Ceph Object Store Disk (OSD) node. Completing the following steps will +install, and configure the Ceph iSCSI gateway for basic operation. + +**Requirements:** + +- A running Ceph Luminous or later storage cluster + +- RHEL/CentOS 7.4; or Linux kernel v4.14 or newer + +- The following packages must be installed from your Linux distribution's software repository: + + - ``targetcli-2.1.fb47`` or newer package + + - ``python-rtslib-2.1.fb64`` or newer package + + - ``tcmu-runner-1.3.0`` or newer package + + - ``ceph-iscsi-config-2.3`` or newer package + + - ``ceph-iscsi-cli-2.5`` or newer package + + .. important:: + If previous versions of these packages exist, then they must + be removed first before installing the newer versions. + +Do the following steps on the Ceph iSCSI gateway node before proceeding +to the *Installing* section: + +#. If the Ceph iSCSI gateway is not colocated on an OSD node, then copy + the Ceph configuration files, located in ``/etc/ceph/``, from a + running Ceph node in the storage cluster to the iSCSI Gateway node. + The Ceph configuration files must exist on the iSCSI gateway node + under ``/etc/ceph/``. + +#. Install and configure the `Ceph Command-line + Interface <http://docs.ceph.com/docs/master/start/quick-rbd/#install-ceph>`_ + +#. If needed, open TCP ports 3260 and 5000 on the firewall. + +#. Create a new or use an existing RADOS Block Device (RBD). + +**Installing:** + +#. As ``root``, on all iSCSI gateway nodes, install the + ``ceph-iscsi-cli`` package: + + :: + + # yum install ceph-iscsi-cli + +#. As ``root``, on all iSCSI gateway nodes, install the ``tcmu-runner`` + package: + + :: + + # yum install tcmu-runner + +#. As ``root``, on a iSCSI gateway node, create a file named + ``iscsi-gateway.cfg`` in the ``/etc/ceph/`` directory: + + :: + + # touch /etc/ceph/iscsi-gateway.cfg + + #. Edit the ``iscsi-gateway.cfg`` file and add the following lines: + + :: + + [config] + # Name of the Ceph storage cluster. A suitable Ceph configuration file allowing + # access to the Ceph storage cluster from the gateway node is required, if not + # colocated on an OSD node. + cluster_name = ceph + + # Place a copy of the ceph cluster's admin keyring in the gateway's /etc/ceph + # drectory and reference the filename here + gateway_keyring = ceph.client.admin.keyring + + + # API settings. + # The API supports a number of options that allow you to tailor it to your + # local environment. If you want to run the API under https, you will need to + # create cert/key files that are compatible for each iSCSI gateway node, that is + # not locked to a specific node. SSL cert and key files *must* be called + # 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory + # on *each* gateway node. With the SSL files in place, you can use 'api_secure = true' + # to switch to https mode. + + # To support the API, the bear minimum settings are: + api_secure = false + + # Additional API configuration options are as follows, defaults shown. + # api_user = admin + # api_password = admin + # api_port = 5001 + # trusted_ip_list = 192.168.0.10,192.168.0.11 + + .. important:: + The ``iscsi-gateway.cfg`` file must be identical on all iSCSI gateway nodes. + + #. As ``root``, copy the ``iscsi-gateway.cfg`` file to all iSCSI + gateway nodes. + +#. 
As ``root``, on all iSCSI gateway nodes, enable and start the API + service: + + :: + + # systemctl enable rbd-target-api + # systemctl start rbd-target-api + +**Configuring:** + +#. As ``root``, on a iSCSI gateway node, start the iSCSI gateway + command-line interface: + + :: + + # gwcli + +#. Creating the iSCSI gateways: + + :: + + >/iscsi-target create iqn.2003-01.com.redhat.iscsi-gw:<target_name> + > goto gateways + > create <iscsi_gw_name> <IP_addr_of_gw> + > create <iscsi_gw_name> <IP_addr_of_gw> + +#. Adding a RADOS Block Device (RBD): + + :: + + > cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:<target_name>/disks/ + >/disks/ create pool=<pool_name> image=<image_name> size=<image_size>m|g|t + +#. Creating a client: + + :: + + > goto hosts + > create iqn.1994-05.com.redhat:<client_name> + > auth chap=<user_name>/<password> | nochap + + + .. warning:: + CHAP must always be configured. Without CHAP, the target will + reject any login requests. + +#. Adding disks to a client: + + :: + + >/iscsi-target..eph-igw/hosts> cd iqn.1994-05.com.redhat:<client_name> + > disk add <pool_name>.<image_name> + +The next step is to configure the iSCSI initiators. diff --git a/src/ceph/doc/rbd/iscsi-targets.rst b/src/ceph/doc/rbd/iscsi-targets.rst new file mode 100644 index 0000000..b7dcac7 --- /dev/null +++ b/src/ceph/doc/rbd/iscsi-targets.rst @@ -0,0 +1,27 @@ +============= +iSCSI Targets +============= + +Traditionally, block-level access to a Ceph storage cluster has been +limited to QEMU and ``librbd``, which is a key enabler for adoption +within OpenStack environments. Starting with the Ceph Luminous release, +block-level access is expanding to offer standard iSCSI support allowing +wider platform usage, and potentially opening new use cases. + +- RHEL/CentOS 7.4; or Linux kernel v4.14 or newer + +- A working Ceph Storage cluster, deployed with ``ceph-ansible`` or using the command-line interface + +- iSCSI gateways nodes, which can either be colocated with OSD nodes or on dedicated nodes + +- Separate network subnets for iSCSI front-end traffic and Ceph back-end traffic + +A choice of using Ansible or the command-line interface are the +available deployment methods for installing and configuring the Ceph +iSCSI gateway: + +.. toctree:: + :maxdepth: 1 + + Using Ansible <iscsi-target-ansible> + Using the Command Line Interface <iscsi-target-cli> diff --git a/src/ceph/doc/rbd/libvirt.rst b/src/ceph/doc/rbd/libvirt.rst new file mode 100644 index 0000000..f953b1f --- /dev/null +++ b/src/ceph/doc/rbd/libvirt.rst @@ -0,0 +1,319 @@ +================================= + Using libvirt with Ceph RBD +================================= + +.. index:: Ceph Block Device; livirt + +The ``libvirt`` library creates a virtual machine abstraction layer between +hypervisor interfaces and the software applications that use them. With +``libvirt``, developers and system administrators can focus on a common +management framework, common API, and common shell interface (i.e., ``virsh``) +to many different hypervisors, including: + +- QEMU/KVM +- XEN +- LXC +- VirtualBox +- etc. + +Ceph block devices support QEMU/KVM. You can use Ceph block devices with +software that interfaces with ``libvirt``. The following stack diagram +illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``. + + +.. 
ditaa:: +---------------------------------------------------+ + | libvirt | + +------------------------+--------------------------+ + | + | configures + v + +---------------------------------------------------+ + | QEMU | + +---------------------------------------------------+ + | librbd | + +------------------------+-+------------------------+ + | OSDs | | Monitors | + +------------------------+ +------------------------+ + + +The most common ``libvirt`` use case involves providing Ceph block devices to +cloud solutions like OpenStack or CloudStack. The cloud solution uses +``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block +devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices +and CloudStack`_ for details. See `Installation`_ for installation details. + +You can also use Ceph block devices with ``libvirt``, ``virsh`` and the +``libvirt`` API. See `libvirt Virtualization API`_ for details. + + +To create VMs that use Ceph block devices, use the procedures in the following +sections. In the exemplary embodiment, we have used ``libvirt-pool`` for the pool +name, ``client.libvirt`` for the user name, and ``new-libvirt-image`` for the +image name. You may use any value you like, but ensure you replace those values +when executing commands in the subsequent procedures. + + +Configuring Ceph +================ + +To configure Ceph for use with ``libvirt``, perform the following steps: + +#. `Create a pool`_. The following example uses the + pool name ``libvirt-pool`` with 128 placement groups. :: + + ceph osd pool create libvirt-pool 128 128 + + Verify the pool exists. :: + + ceph osd lspools + +#. Use the ``rbd`` tool to initialize the pool for use by RBD:: + + rbd pool init <pool-name> + +#. `Create a Ceph User`_ (or use ``client.admin`` for version 0.9.7 and + earlier). The following example uses the Ceph user name ``client.libvirt`` + and references ``libvirt-pool``. :: + + ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool' + + Verify the name exists. :: + + ceph auth ls + + **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``, + not the Ceph name ``client.libvirt``. See `User Management - User`_ and + `User Management - CLI`_ for a detailed explanation of the difference + between ID and name. + +#. Use QEMU to `create an image`_ in your RBD pool. + The following example uses the image name ``new-libvirt-image`` + and references ``libvirt-pool``. :: + + qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G + + Verify the image exists. :: + + rbd -p libvirt-pool ls + + **NOTE:** You can also use `rbd create`_ to create an image, but we + recommend ensuring that QEMU is working properly. + +.. tip:: Optionally, if you wish to enable debug logs and the admin socket for + this client, you can add the following section to ``/etc/ceph/ceph.conf``:: + + [client.libvirt] + log file = /var/log/ceph/qemu-guest-$pid.log + admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok + + The ``client.libvirt`` section name should match the cephx user you created + above. If SELinux or AppArmor is enabled, note that this could prevent the + client process (qemu via libvirt) from writing the logs or admin socket to + the destination locations (``/var/log/ceph`` or ``/var/run/ceph``). + + + +Preparing the VM Manager +======================== + +You may use ``libvirt`` without a VM manager, but you may find it simpler to +create your first domain with ``virt-manager``. + +#. 
Install a virtual machine manager. See `KVM/VirtManager`_ for details. :: + + sudo apt-get install virt-manager + +#. Download an OS image (if necessary). + +#. Launch the virtual machine manager. :: + + sudo virt-manager + + + +Creating a VM +============= + +To create a VM with ``virt-manager``, perform the following steps: + +#. Press the **Create New Virtual Machine** button. + +#. Name the new virtual machine domain. In the exemplary embodiment, we + use the name ``libvirt-virtual-machine``. You may use any name you wish, + but ensure you replace ``libvirt-virtual-machine`` with the name you + choose in subsequent commandline and configuration examples. :: + + libvirt-virtual-machine + +#. Import the image. :: + + /path/to/image/recent-linux.img + + **NOTE:** Import a recent image. Some older images may not rescan for + virtual devices properly. + +#. Configure and start the VM. + +#. You may use ``virsh list`` to verify the VM domain exists. :: + + sudo virsh list + +#. Login to the VM (root/root) + +#. Stop the VM before configuring it for use with Ceph. + + +Configuring the VM +================== + +When configuring the VM for use with Ceph, it is important to use ``virsh`` +where appropriate. Additionally, ``virsh`` commands often require root +privileges (i.e., ``sudo``) and will not return appropriate results or notify +you that that root privileges are required. For a reference of ``virsh`` +commands, refer to `Virsh Command Reference`_. + + +#. Open the configuration file with ``virsh edit``. :: + + sudo virsh edit {vm-domain-name} + + Under ``<devices>`` there should be a ``<disk>`` entry. :: + + <devices> + <emulator>/usr/bin/kvm</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='raw'/> + <source file='/path/to/image/recent-linux.img'/> + <target dev='vda' bus='virtio'/> + <address type='drive' controller='0' bus='0' unit='0'/> + </disk> + + + Replace ``/path/to/image/recent-linux.img`` with the path to the OS image. + The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See + `Virtio`_ for details. + + **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit + the configuration file under ``/etc/libvirt/qemu`` with a text editor, + ``libvirt`` may not recognize the change. If there is a discrepancy between + the contents of the XML file under ``/etc/libvirt/qemu`` and the result of + ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work + properly. + + +#. Add the Ceph RBD image you created as a ``<disk>`` entry. :: + + <disk type='network' device='disk'> + <source protocol='rbd' name='libvirt-pool/new-libvirt-image'> + <host name='{monitor-host}' port='6789'/> + </source> + <target dev='vda' bus='virtio'/> + </disk> + + Replace ``{monitor-host}`` with the name of your host, and replace the + pool and/or image name as necessary. You may add multiple ``<host>`` + entries for your Ceph monitors. The ``dev`` attribute is the logical + device name that will appear under the ``/dev`` directory of your + VM. The optional ``bus`` attribute indicates the type of disk device to + emulate. The valid settings are driver specific (e.g., "ide", "scsi", + "virtio", "xen", "usb" or "sata"). + + See `Disks`_ for details of the ``<disk>`` element, and its child elements + and attributes. + +#. Save the file. + +#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by + default), you must generate a secret. 
:: + + cat > secret.xml <<EOF + <secret ephemeral='no' private='no'> + <usage type='ceph'> + <name>client.libvirt secret</name> + </usage> + </secret> + EOF + +#. Define the secret. :: + + sudo virsh secret-define --file secret.xml + <uuid of secret is output here> + +#. Get the ``client.libvirt`` key and save the key string to a file. :: + + ceph auth get-key client.libvirt | sudo tee client.libvirt.key + +#. Set the UUID of the secret. :: + + sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml + + You must also set the secret manually by adding the following ``<auth>`` + entry to the ``<disk>`` element you entered earlier (replacing the + ``uuid`` value with the result from the command line example above). :: + + sudo virsh edit {vm-domain-name} + + Then, add ``<auth></auth>`` element to the domain configuration file:: + + ... + </source> + <auth username='libvirt'> + <secret type='ceph' uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/> + </auth> + <target ... + + + **NOTE:** The exemplary ID is ``libvirt``, not the Ceph name + ``client.libvirt`` as generated at step 2 of `Configuring Ceph`_. Ensure + you use the ID component of the Ceph name you generated. If for some reason + you need to regenerate the secret, you will have to execute + ``sudo virsh secret-undefine {uuid}`` before executing + ``sudo virsh secret-set-value`` again. + + +Summary +======= + +Once you have configured the VM for use with Ceph, you can start the VM. +To verify that the VM and Ceph are communicating, you may perform the +following procedures. + + +#. Check to see if Ceph is running:: + + ceph health + +#. Check to see if the VM is running. :: + + sudo virsh list + +#. Check to see if the VM is communicating with Ceph. Replace + ``{vm-domain-name}`` with the name of your VM domain:: + + sudo virsh qemu-monitor-command --hmp {vm-domain-name} 'info block' + +#. Check to see if the device from ``<target dev='hdb' bus='ide'/>`` appears + under ``/dev`` or under ``proc/partitions``. :: + + ls dev + cat proc/partitions + +If everything looks okay, you may begin using the Ceph block device +within your VM. + + +.. _Installation: ../../install +.. _libvirt Virtualization API: http://www.libvirt.org +.. _Block Devices and OpenStack: ../rbd-openstack +.. _Block Devices and CloudStack: ../rbd-cloudstack +.. _Create a pool: ../../rados/operations/pools#create-a-pool +.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user +.. _create an image: ../qemu-rbd#creating-images-with-qemu +.. _Virsh Command Reference: http://www.libvirt.org/virshcmdref.html +.. _KVM/VirtManager: https://help.ubuntu.com/community/KVM/VirtManager +.. _Ceph Authentication: ../../rados/configuration/auth-config-ref +.. _Disks: http://www.libvirt.org/formatdomain.html#elementsDisks +.. _rbd create: ../rados-rbd-cmds#creating-a-block-device-image +.. _User Management - User: ../../rados/operations/user-management#user +.. _User Management - CLI: ../../rados/operations/user-management#command-line-usage +.. _Virtio: http://www.linux-kvm.org/page/Virtio diff --git a/src/ceph/doc/rbd/man/index.rst b/src/ceph/doc/rbd/man/index.rst new file mode 100644 index 0000000..33a192a --- /dev/null +++ b/src/ceph/doc/rbd/man/index.rst @@ -0,0 +1,16 @@ +============================ + Ceph Block Device Manpages +============================ + +.. 
toctree:: + :maxdepth: 1 + + rbd <../../man/8/rbd> + rbd-fuse <../../man/8/rbd-fuse> + rbd-nbd <../../man/8/rbd-nbd> + rbd-ggate <../../man/8/rbd-ggate> + ceph-rbdnamer <../../man/8/ceph-rbdnamer> + rbd-replay-prep <../../man/8/rbd-replay-prep> + rbd-replay <../../man/8/rbd-replay> + rbd-replay-many <../../man/8/rbd-replay-many> + rbd-map <../../man/8/rbdmap> diff --git a/src/ceph/doc/rbd/qemu-rbd.rst b/src/ceph/doc/rbd/qemu-rbd.rst new file mode 100644 index 0000000..80c5dcc --- /dev/null +++ b/src/ceph/doc/rbd/qemu-rbd.rst @@ -0,0 +1,218 @@ +======================== + QEMU and Block Devices +======================== + +.. index:: Ceph Block Device; QEMU KVM + +The most frequent Ceph Block Device use case involves providing block device +images to virtual machines. For example, a user may create a "golden" image +with an OS and any relevant software in an ideal configuration. Then, the user +takes a snapshot of the image. Finally, the user clones the snapshot (usually +many times). See `Snapshots`_ for details. The ability to make copy-on-write +clones of a snapshot means that Ceph can provision block device images to +virtual machines quickly, because the client doesn't have to download an entire +image each time it spins up a new virtual machine. + + +.. ditaa:: +---------------------------------------------------+ + | QEMU | + +---------------------------------------------------+ + | librbd | + +---------------------------------------------------+ + | librados | + +------------------------+-+------------------------+ + | OSDs | | Monitors | + +------------------------+ +------------------------+ + + +Ceph Block Devices can integrate with the QEMU virtual machine. For details on +QEMU, see `QEMU Open Source Processor Emulator`_. For QEMU documentation, see +`QEMU Manual`_. For installation details, see `Installation`_. + +.. important:: To use Ceph Block Devices with QEMU, you must have access to a + running Ceph cluster. + + +Usage +===== + +The QEMU command line expects you to specify the pool name and image name. You +may also specify a snapshot name. + +QEMU will assume that the Ceph configuration file resides in the default +location (e.g., ``/etc/ceph/$cluster.conf``) and that you are executing +commands as the default ``client.admin`` user unless you expressly specify +another Ceph configuration file path or another user. When specifying a user, +QEMU uses the ``ID`` rather than the full ``TYPE:ID``. See `User Management - +User`_ for details. Do not prepend the client type (i.e., ``client.``) to the +beginning of the user ``ID``, or you will receive an authentication error. You +should have the key for the ``admin`` user or the key of another user you +specify with the ``:id={user}`` option in a keyring file stored in default path +(i.e., ``/etc/ceph`` or the local directory with appropriate file ownership and +permissions. Usage takes the following form:: + + qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...] + +For example, specifying the ``id`` and ``conf`` options might look like the following:: + + qemu-img {command} [options] rbd:glance-pool/maipo:id=glance:conf=/etc/ceph/ceph.conf + +.. tip:: Configuration values containing ``:``, ``@``, or ``=`` can be escaped with a + leading ``\`` character. + + +Creating Images with QEMU +========================= + +You can create a block device image from QEMU. You must specify ``rbd``, the +pool name, and the name of the image you wish to create. 
You must also specify +the size of the image. :: + + qemu-img create -f raw rbd:{pool-name}/{image-name} {size} + +For example:: + + qemu-img create -f raw rbd:data/foo 10G + +.. important:: The ``raw`` data format is really the only sensible + ``format`` option to use with RBD. Technically, you could use other + QEMU-supported formats (such as ``qcow2`` or ``vmdk``), but doing + so would add additional overhead, and would also render the volume + unsafe for virtual machine live migration when caching (see below) + is enabled. + + +Resizing Images with QEMU +========================= + +You can resize a block device image from QEMU. You must specify ``rbd``, +the pool name, and the name of the image you wish to resize. You must also +specify the size of the image. :: + + qemu-img resize rbd:{pool-name}/{image-name} {size} + +For example:: + + qemu-img resize rbd:data/foo 10G + + +Retrieving Image Info with QEMU +=============================== + +You can retrieve block device image information from QEMU. You must +specify ``rbd``, the pool name, and the name of the image. :: + + qemu-img info rbd:{pool-name}/{image-name} + +For example:: + + qemu-img info rbd:data/foo + + +Running QEMU with RBD +===================== + +QEMU can pass a block device from the host on to a guest, but since +QEMU 0.15, there's no need to map an image as a block device on +the host. Instead, QEMU can access an image as a virtual block +device directly via ``librbd``. This performs better because it avoids +an additional context switch, and can take advantage of `RBD caching`_. + +You can use ``qemu-img`` to convert existing virtual machine images to Ceph +block device images. For example, if you have a qcow2 image, you could run:: + + qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze + +To run a virtual machine booting from that image, you could run:: + + qemu -m 1024 -drive format=raw,file=rbd:data/squeeze + +`RBD caching`_ can significantly improve performance. +Since QEMU 1.2, QEMU's cache options control ``librbd`` caching:: + + qemu -m 1024 -drive format=rbd,file=rbd:data/squeeze,cache=writeback + +If you have an older version of QEMU, you can set the ``librbd`` cache +configuration (like any Ceph configuration option) as part of the +'file' parameter:: + + qemu -m 1024 -drive format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback + +.. important:: If you set rbd_cache=true, you must set cache=writeback + or risk data loss. Without cache=writeback, QEMU will not send + flush requests to librbd. If QEMU exits uncleanly in this + configuration, filesystems on top of rbd can be corrupted. + +.. _RBD caching: ../rbd-config-ref/#rbd-cache-config-settings + + +.. index:: Ceph Block Device; discard trim and libvirt + +Enabling Discard/TRIM +===================== + +Since Ceph version 0.46 and QEMU version 1.1, Ceph Block Devices support the +discard operation. This means that a guest can send TRIM requests to let a Ceph +block device reclaim unused space. This can be enabled in the guest by mounting +``ext4`` or ``XFS`` with the ``discard`` option. + +For this to be available to the guest, it must be explicitly enabled +for the block device. To do this, you must specify a +``discard_granularity`` associated with the drive:: + + qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,id=drive1,if=none \ + -device driver=ide-hd,drive=drive1,discard_granularity=512 + +Note that this uses the IDE driver. The virtio driver does not +support discard. 
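Inside the guest, discard support then only needs to be enabled at the filesystem level, as noted above. A minimal check, assuming the RBD-backed disk appears as ``/dev/sda`` in the guest (the device name depends on the emulated bus)::

    mount -o discard /dev/sda1 /mnt
    fstrim -v /mnt

``fstrim -v`` reports how much space was trimmed; if it succeeds, the guest's TRIM requests are reaching the Ceph block device.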
+ +If using libvirt, edit your libvirt domain's configuration file using ``virsh +edit`` to include the ``xmlns:qemu`` value. Then, add a ``qemu:commandline`` +block as a child of that domain. The following example shows how to set two +devices with ``qemu id=`` to different ``discard_granularity`` values. + +.. code-block:: guess + + <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'> + <qemu:commandline> + <qemu:arg value='-set'/> + <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/> + <qemu:arg value='-set'/> + <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/> + </qemu:commandline> + </domain> + + +.. index:: Ceph Block Device; cache options + +QEMU Cache Options +================== + +QEMU's cache options correspond to the following Ceph `RBD Cache`_ settings. + +Writeback:: + + rbd_cache = true + +Writethrough:: + + rbd_cache = true + rbd_cache_max_dirty = 0 + +None:: + + rbd_cache = false + +QEMU's cache settings override Ceph's cache settings (including settings that +are explicitly set in the Ceph configuration file). + +.. note:: Prior to QEMU v2.4.0, if you explicitly set `RBD Cache`_ settings + in the Ceph configuration file, your Ceph settings override the QEMU cache + settings. + +.. _QEMU Open Source Processor Emulator: http://wiki.qemu.org/Main_Page +.. _QEMU Manual: http://wiki.qemu.org/Manual +.. _RBD Cache: ../rbd-config-ref/ +.. _Snapshots: ../rbd-snapshot/ +.. _Installation: ../../install +.. _User Management - User: ../../rados/operations/user-management#user diff --git a/src/ceph/doc/rbd/rados-rbd-cmds.rst b/src/ceph/doc/rbd/rados-rbd-cmds.rst new file mode 100644 index 0000000..65f7737 --- /dev/null +++ b/src/ceph/doc/rbd/rados-rbd-cmds.rst @@ -0,0 +1,223 @@ +======================= + Block Device Commands +======================= + +.. index:: Ceph Block Device; image management + +The ``rbd`` command enables you to create, list, introspect and remove block +device images. You can also use it to clone images, create snapshots, +rollback an image to a snapshot, view a snapshot, etc. For details on using +the ``rbd`` command, see `RBD – Manage RADOS Block Device (RBD) Images`_ for +details. + +.. important:: To use Ceph Block Device commands, you must have access to + a running Ceph cluster. + +Create a Block Device Pool +========================== + +#. On the admin node, use the ``ceph`` tool to `create a pool`_. + +#. On the admin node, use the ``rbd`` tool to initialize the pool for use by RBD:: + + rbd pool init <pool-name> + +.. note:: The ``rbd`` tool assumes a default pool name of 'rbd' when not + provided. + +Create a Block Device User +========================== + +Unless specified, the ``rbd`` command will access the Ceph cluster using the ID +``admin``. This ID allows full administrative access to the cluster. It is +recommended that you utilize a more restricted user wherever possible. 
+ +To `create a Ceph user`_, with ``ceph`` specify the ``auth get-or-create`` +command, user name, monitor caps, and OSD caps:: + + ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' + +For example, to create a user ID named ``qemu`` with read-write access to the +pool ``vms`` and read-only access to the pool ``images``, execute the +following:: + + ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images' + +The output from the ``ceph auth get-or-create`` command will be the keyring for +the specified user, which can be written to ``/etc/ceph/ceph.client.{ID}.keyring``. + +.. note:: The user ID can be specified when using the ``rbd`` command by + providing the ``--id {id}`` optional argument. + +Creating a Block Device Image +============================= + +Before you can add a block device to a node, you must create an image for it in +the :term:`Ceph Storage Cluster` first. To create a block device image, execute +the following:: + + rbd create --size {megabytes} {pool-name}/{image-name} + +For example, to create a 1GB image named ``bar`` that stores information in a +pool named ``swimmingpool``, execute the following:: + + rbd create --size 1024 swimmingpool/bar + +If you don't specify pool when creating an image, it will be stored in the +default pool ``rbd``. For example, to create a 1GB image named ``foo`` stored in +the default pool ``rbd``, execute the following:: + + rbd create --size 1024 foo + +.. note:: You must create a pool first before you can specify it as a + source. See `Storage Pools`_ for details. + +Listing Block Device Images +=========================== + +To list block devices in the ``rbd`` pool, execute the following +(i.e., ``rbd`` is the default pool name):: + + rbd ls + +To list block devices in a particular pool, execute the following, +but replace ``{poolname}`` with the name of the pool:: + + rbd ls {poolname} + +For example:: + + rbd ls swimmingpool + +To list deferred delete block devices in the ``rbd`` pool, execute the +following:: + + rbd trash ls + +To list deferred delete block devices in a particular pool, execute the +following, but replace ``{poolname}`` with the name of the pool:: + + rbd trash ls {poolname} + +For example:: + + rbd trash ls swimmingpool + +Retrieving Image Information +============================ + +To retrieve information from a particular image, execute the following, +but replace ``{image-name}`` with the name for the image:: + + rbd info {image-name} + +For example:: + + rbd info foo + +To retrieve information from an image within a pool, execute the following, +but replace ``{image-name}`` with the name of the image and replace ``{pool-name}`` +with the name of the pool:: + + rbd info {pool-name}/{image-name} + +For example:: + + rbd info swimmingpool/bar + +Resizing a Block Device Image +============================= + +:term:`Ceph Block Device` images are thin provisioned. They don't actually use +any physical storage until you begin saving data to them. However, they do have +a maximum capacity that you set with the ``--size`` option. 
If you want to +increase (or decrease) the maximum size of a Ceph Block Device image, execute +the following:: + + rbd resize --size 2048 foo (to increase) + rbd resize --size 2048 foo --allow-shrink (to decrease) + + +Removing a Block Device Image +============================= + +To remove a block device, execute the following, but replace ``{image-name}`` +with the name of the image you want to remove:: + + rbd rm {image-name} + +For example:: + + rbd rm foo + +To remove a block device from a pool, execute the following, but replace +``{image-name}`` with the name of the image to remove and replace +``{pool-name}`` with the name of the pool:: + + rbd rm {pool-name}/{image-name} + +For example:: + + rbd rm swimmingpool/bar + +To defer delete a block device from a pool, execute the following, but +replace ``{image-name}`` with the name of the image to move and replace +``{pool-name}`` with the name of the pool:: + + rbd trash mv {pool-name}/{image-name} + +For example:: + + rbd trash mv swimmingpool/bar + +To remove a deferred block device from a pool, execute the following, but +replace ``{image-id}`` with the id of the image to remove and replace +``{pool-name}`` with the name of the pool:: + + rbd trash rm {pool-name}/{image-id} + +For example:: + + rbd trash rm swimmingpool/2bf4474b0dc51 + +.. note:: + + * You can move an image to the trash even if it has snapshot(s) or is actively + in use by clones, but in that case it can not be removed from the trash. + + * You can use *--delay* to set the deferment time (the default is 0); until that + time has expired, the image can not be removed from the trash unless you use + the force option. + +Restoring a Block Device Image +============================== + +To restore a deferred delete block device in the rbd pool, execute the +following, but replace ``{image-id}`` with the id of the image:: + + rbd trash restore {image-id} + +For example:: + + rbd trash restore 2bf4474b0dc51 + +To restore a deferred delete block device in a particular pool, execute +the following, but replace ``{image-id}`` with the id of the image and +replace ``{pool-name}`` with the name of the pool:: + + rbd trash restore {pool-name}/{image-id} + +For example:: + + rbd trash restore swimmingpool/2bf4474b0dc51 + +You can also use *--image* to rename the image when restoring it, for +example:: + + rbd trash restore swimmingpool/2bf4474b0dc51 --image new-name + + +.. _create a pool: ../../rados/operations/pools/#create-a-pool +.. _Storage Pools: ../../rados/operations/pools +.. _RBD – Manage RADOS Block Device (RBD) Images: ../../man/8/rbd/ +.. _create a Ceph user: ../../rados/operations/user-management#add-a-user diff --git a/src/ceph/doc/rbd/rbd-cloudstack.rst b/src/ceph/doc/rbd/rbd-cloudstack.rst new file mode 100644 index 0000000..f66d6d4 --- /dev/null +++ b/src/ceph/doc/rbd/rbd-cloudstack.rst @@ -0,0 +1,135 @@ +============================= + Block Devices and CloudStack +============================= + +You may use Ceph Block Device images with CloudStack 4.0 and higher through +``libvirt``, which configures the QEMU interface to ``librbd``. Ceph stripes +block device images as objects across the cluster, which means that large Ceph +Block Device images have better performance than a standalone server! + +To use Ceph Block Devices with CloudStack 4.0 and higher, you must install QEMU, +``libvirt``, and CloudStack first. We recommend using a separate physical host +for your CloudStack installation. CloudStack recommends a minimum of 4GB of RAM +and a dual-core processor, but more CPU and RAM will perform better.
The +following diagram depicts the CloudStack/Ceph technology stack. + + +.. ditaa:: +---------------------------------------------------+ + | CloudStack | + +---------------------------------------------------+ + | libvirt | + +------------------------+--------------------------+ + | + | configures + v + +---------------------------------------------------+ + | QEMU | + +---------------------------------------------------+ + | librbd | + +---------------------------------------------------+ + | librados | + +------------------------+-+------------------------+ + | OSDs | | Monitors | + +------------------------+ +------------------------+ + +.. important:: To use Ceph Block Devices with CloudStack, you must have + access to a running Ceph Storage Cluster. + +CloudStack integrates with Ceph's block devices to provide CloudStack with a +back end for CloudStack's Primary Storage. The instructions below detail the +setup for CloudStack Primary Storage. + +.. note:: We recommend installing with Ubuntu 14.04 or later so that + you can use package installation instead of having to compile + libvirt from source. + +Installing and configuring QEMU for use with CloudStack doesn't require any +special handling. Ensure that you have a running Ceph Storage Cluster. Install +QEMU and configure it for use with Ceph; then, install ``libvirt`` version +0.9.13 or higher (you may need to compile from source) and ensure it is running +with Ceph. + + +.. note:: Ubuntu 14.04 and CentOS 7.2 will have ``libvirt`` with RBD storage + pool support enabled by default. + +.. index:: pools; CloudStack + +Create a Pool +============= + +By default, Ceph block devices use the ``rbd`` pool. Create a pool for +CloudStack NFS Primary Storage. Ensure your Ceph cluster is running, then create +the pool. :: + + ceph osd pool create cloudstack + +See `Create a Pool`_ for details on specifying the number of placement groups +for your pools, and `Placement Groups`_ for details on the number of placement +groups you should set for your pools. + +A newly created pool must initialized prior to use. Use the ``rbd`` tool +to initialize the pool:: + + rbd pool init cloudstack + +Create a Ceph User +================== + +To access the Ceph cluster we require a Ceph user which has the correct +credentials to access the ``cloudstack`` pool we just created. Although we could +use ``client.admin`` for this, it's recommended to create a user with only +access to the ``cloudstack`` pool. :: + + ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack' + +Use the information returned by the command in the next step when adding the +Primary Storage. + +See `User Management`_ for additional details. + +Add Primary Storage +=================== + +To add primary storage, refer to `Add Primary Storage (4.2.0)`_ to add a Ceph block device, the steps +include: + +#. Log in to the CloudStack UI. +#. Click **Infrastructure** on the left side navigation bar. +#. Select the Zone you want to use for Primary Storage. +#. Click the **Compute** tab. +#. Select **View All** on the `Primary Storage` node in the diagram. +#. Click **Add Primary Storage**. +#. Follow the CloudStack instructions. + + - For **Protocol**, select ``RBD``. + - Add cluster information (cephx is supported). Note: Do not include the ``client.`` part of the user. + - Add ``rbd`` as a tag. + + +Create a Disk Offering +====================== + +To create a new disk offering, refer to `Create a New Disk Offering (4.2.0)`_. 
+Create a disk offering so that it matches the ``rbd`` tag. +The ``StoragePoolAllocator`` will choose the ``rbd`` +pool when searching for a suitable storage pool. If the disk offering doesn't +match the ``rbd`` tag, the ``StoragePoolAllocator`` may select the pool you +created (e.g., ``cloudstack``). + + +Limitations +=========== + +- CloudStack will only bind to one monitor (You can however create a Round Robin DNS record over multiple monitors) + + + +.. _Create a Pool: ../../rados/operations/pools#createpool +.. _Placement Groups: ../../rados/operations/placement-groups +.. _Install and Configure QEMU: ../qemu-rbd +.. _Install and Configure libvirt: ../libvirt +.. _KVM Hypervisor Host Installation: http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Installation_Guide/hypervisor-kvm-install-flow.html +.. _Add Primary Storage (4.2.0): http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/primary-storage-add.html +.. _Create a New Disk Offering (4.2.0): http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/compute-disk-service-offerings.html#creating-disk-offerings +.. _User Management: ../../rados/operations/user-management diff --git a/src/ceph/doc/rbd/rbd-config-ref.rst b/src/ceph/doc/rbd/rbd-config-ref.rst new file mode 100644 index 0000000..db942f8 --- /dev/null +++ b/src/ceph/doc/rbd/rbd-config-ref.rst @@ -0,0 +1,136 @@ +======================= + librbd Settings +======================= + +See `Block Device`_ for additional details. + +Cache Settings +======================= + +.. sidebar:: Kernel Caching + + The kernel driver for Ceph block devices can use the Linux page cache to + improve performance. + +The user space implementation of the Ceph block device (i.e., ``librbd``) cannot +take advantage of the Linux page cache, so it includes its own in-memory +caching, called "RBD caching." RBD caching behaves just like well-behaved hard +disk caching. When the OS sends a barrier or a flush request, all dirty data is +written to the OSDs. This means that using write-back caching is just as safe as +using a well-behaved physical hard disk with a VM that properly sends flushes +(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU) +algorithm, and in write-back mode it can coalesce contiguous requests for +better throughput. + +.. versionadded:: 0.46 + +Ceph supports write-back caching for RBD. To enable it, add ``rbd cache = +true`` to the ``[client]`` section of your ``ceph.conf`` file. By default +``librbd`` does not perform any caching. Writes and reads go directly to the +storage cluster, and writes return only when the data is on disk on all +replicas. With caching enabled, writes return immediately, unless there are more +than ``rbd cache max dirty`` unflushed bytes. In this case, the write triggers +writeback and blocks until enough bytes are flushed. + +.. versionadded:: 0.47 + +Ceph supports write-through caching for RBD. You can set the size of +the cache, and you can set targets and limits to switch from +write-back caching to write through caching. To enable write-through +mode, set ``rbd cache max dirty`` to 0. This means writes return only +when the data is on disk on all replicas, but reads may come from the +cache. The cache is in memory on the client, and each RBD image has +its own. Since the cache is local to the client, there's no coherency +if there are others accessing the image. Running GFS or OCFS on top of +RBD will not work with caching enabled. 
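Because these cache options are ordinary Ceph configuration options, a ``librbd``
application can also set them programmatically for its own client instance instead
of editing ``ceph.conf``. The following is only a sketch using the Python bindings;
the pool name and option values are illustrative, not recommendations::

    import rados

    # Sketch: enable RBD write-back caching for this client programmatically.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.conf_set('rbd_cache', 'true')                  # rbd cache = true
    cluster.conf_set('rbd_cache_size', str(32 * 1024**2))  # rbd cache size
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')
        # ... perform librbd I/O here; writes may now be coalesced in the cache ...
        ioctx.close()
    finally:
        cluster.shutdown()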
+ +The ``ceph.conf`` file settings for RBD should be set in the ``[client]`` +section of your configuration file. The settings include: + + +``rbd cache`` + +:Description: Enable caching for RADOS Block Device (RBD). +:Type: Boolean +:Required: No +:Default: ``true`` + + +``rbd cache size`` + +:Description: The RBD cache size in bytes. +:Type: 64-bit Integer +:Required: No +:Default: ``32 MiB`` + + +``rbd cache max dirty`` + +:Description: The ``dirty`` limit in bytes at which the cache triggers write-back. If ``0``, uses write-through caching. +:Type: 64-bit Integer +:Required: No +:Constraint: Must be less than ``rbd cache size``. +:Default: ``24 MiB`` + + +``rbd cache target dirty`` + +:Description: The ``dirty target`` before the cache begins writing data to the data storage. Does not block writes to the cache. +:Type: 64-bit Integer +:Required: No +:Constraint: Must be less than ``rbd cache max dirty``. +:Default: ``16 MiB`` + + +``rbd cache max dirty age`` + +:Description: The number of seconds dirty data is in the cache before writeback starts. +:Type: Float +:Required: No +:Default: ``1.0`` + +.. versionadded:: 0.60 + +``rbd cache writethrough until flush`` + +:Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32. +:Type: Boolean +:Required: No +:Default: ``true`` + +.. _Block Device: ../../rbd + + +Read-ahead Settings +======================= + +.. versionadded:: 0.86 + +RBD supports read-ahead/prefetching to optimize small, sequential reads. +This should normally be handled by the guest OS in the case of a VM, +but boot loaders may not issue efficient reads. +Read-ahead is automatically disabled if caching is disabled. + + +``rbd readahead trigger requests`` + +:Description: Number of sequential read requests necessary to trigger read-ahead. +:Type: Integer +:Required: No +:Default: ``10`` + + +``rbd readahead max bytes`` + +:Description: Maximum size of a read-ahead request. If zero, read-ahead is disabled. +:Type: 64-bit Integer +:Required: No +:Default: ``512 KiB`` + + +``rbd readahead disable after bytes`` + +:Description: After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled. +:Type: 64-bit Integer +:Required: No +:Default: ``50 MiB`` diff --git a/src/ceph/doc/rbd/rbd-ko.rst b/src/ceph/doc/rbd/rbd-ko.rst new file mode 100644 index 0000000..951757c --- /dev/null +++ b/src/ceph/doc/rbd/rbd-ko.rst @@ -0,0 +1,59 @@ +========================== + Kernel Module Operations +========================== + +.. index:: Ceph Block Device; kernel module + +.. important:: To use kernel module operations, you must have a running Ceph cluster. + +Get a List of Images +==================== + +To mount a block device image, first return a list of the images. :: + + rbd list + +Map a Block Device +================== + +Use ``rbd`` to map an image name to a kernel module. You must specify the +image name, the pool name, and the user name. ``rbd`` will load RBD kernel +module on your behalf if it's not already loaded. :: + + sudo rbd map {pool-name}/{image-name} --id {user-name} + +For example:: + + sudo rbd map rbd/myimage --id admin + +If you use `cephx`_ authentication, you must also specify a secret. 
It may come +from a keyring or a file containing the secret. :: + + sudo rbd map rbd/myimage --id admin --keyring /path/to/keyring + sudo rbd map rbd/myimage --id admin --keyfile /path/to/file + + +Show Mapped Block Devices +========================= + +To show block device images mapped to kernel modules with the ``rbd`` command, +specify the ``showmapped`` option. :: + + rbd showmapped + + +Unmapping a Block Device +======================== + +To unmap a block device image with the ``rbd`` command, specify the ``unmap`` +option and the device name (i.e., by convention the same as the block device +image name). :: + + sudo rbd unmap /dev/rbd/{poolname}/{imagename} + +For example:: + + sudo rbd unmap /dev/rbd/rbd/foo + + +.. _cephx: ../../rados/operations/user-management/ diff --git a/src/ceph/doc/rbd/rbd-mirroring.rst b/src/ceph/doc/rbd/rbd-mirroring.rst new file mode 100644 index 0000000..989f1fc --- /dev/null +++ b/src/ceph/doc/rbd/rbd-mirroring.rst @@ -0,0 +1,318 @@ +=============== + RBD Mirroring +=============== + +.. index:: Ceph Block Device; mirroring + +RBD images can be asynchronously mirrored between two Ceph clusters. This +capability uses the RBD journaling image feature to ensure crash-consistent +replication between clusters. Mirroring is configured on a per-pool basis +within peer clusters and can be configured to automatically mirror all +images within a pool or only a specific subset of images. Mirroring is +configured using the ``rbd`` command. The ``rbd-mirror`` daemon is responsible +for pulling image updates from the remote, peer cluster and applying them to +the image within the local cluster. + +.. note:: RBD mirroring requires the Ceph Jewel release or later. + +.. important:: To use RBD mirroring, you must have two Ceph clusters, each + running the ``rbd-mirror`` daemon. + +Pool Configuration +================== + +The following procedures demonstrate how to perform the basic administrative +tasks to configure mirroring using the ``rbd`` command. Mirroring is +configured on a per-pool basis within the Ceph clusters. + +The pool configuration steps should be performed on both peer clusters. These +procedures assume two clusters, named "local" and "remote", are accessible from +a single host for clarity. + +See the `rbd`_ manpage for additional details of how to connect to different +Ceph clusters. + +.. note:: The cluster name in the following examples corresponds to a Ceph + configuration file of the same name (e.g. /etc/ceph/remote.conf). See the + `ceph-conf`_ documentation for how to configure multiple clusters. + +.. note:: Images in a given pool will be mirrored to a pool with the same name + on the remote cluster. Images using a separate data-pool will use a data-pool + with the same name on the remote cluster. E.g., if an image being mirrored is + in the ``rbd`` pool on the local cluster and using a data-pool called + ``rbd-ec``, pools called ``rbd`` and ``rbd-ec`` must exist on the remote + cluster and will be used for mirroring the image. + +Enable Mirroring +---------------- + +To enable mirroring on a pool with ``rbd``, specify the ``mirror pool enable`` +command, the pool name, and the mirroring mode:: + + rbd mirror pool enable {pool-name} {mode} + +The mirroring mode can either be ``pool`` or ``image``: + +* **pool**: When configured in ``pool`` mode, all images in the pool with the + journaling feature enabled are mirrored. +* **image**: When configured in ``image`` mode, mirroring needs to be + `explicitly enabled`_ on each image. 
+ +For example:: + + rbd --cluster local mirror pool enable image-pool pool + rbd --cluster remote mirror pool enable image-pool pool + +Disable Mirroring +----------------- + +To disable mirroring on a pool with ``rbd``, specify the ``mirror pool disable`` +command and the pool name:: + + rbd mirror pool disable {pool-name} + +When mirroring is disabled on a pool in this way, mirroring will also be +disabled on any images (within the pool) for which mirroring was enabled +explicitly. + +For example:: + + rbd --cluster local mirror pool disable image-pool + rbd --cluster remote mirror pool disable image-pool + +Add Cluster Peer +---------------- + +In order for the ``rbd-mirror`` daemon to discover its peer cluster, the peer +needs to be registered to the pool. To add a mirroring peer Ceph cluster with +``rbd``, specify the ``mirror pool peer add`` command, the pool name, and a +cluster specification:: + + rbd mirror pool peer add {pool-name} {client-name}@{cluster-name} + +For example:: + + rbd --cluster local mirror pool peer add image-pool client.remote@remote + rbd --cluster remote mirror pool peer add image-pool client.local@local + +Remove Cluster Peer +------------------- + +To remove a mirroring peer Ceph cluster with ``rbd``, specify the +``mirror pool peer remove`` command, the pool name, and the peer UUID +(available from the ``rbd mirror pool info`` command):: + + rbd mirror pool peer remove {pool-name} {peer-uuid} + +For example:: + + rbd --cluster local mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445 + rbd --cluster remote mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08 + +Image Configuration +=================== + +Unlike pool configuration, image configuration only needs to be performed against +a single mirroring peer Ceph cluster. + +Mirrored RBD images are designated as either primary or non-primary. This is a +property of the image and not the pool. Images that are designated as +non-primary cannot be modified. + +Images are automatically promoted to primary when mirroring is first enabled on +an image (either implicitly if the pool mirror mode was **pool** and the image +has the journaling image feature enabled, or `explicitly enabled`_ by the +``rbd`` command). + +Enable Image Journaling Support +------------------------------- + +RBD mirroring uses the RBD journaling feature to ensure that the replicated +image always remains crash-consistent. Before an image can be mirrored to +a peer cluster, the journaling feature must be enabled. The feature can be +enabled at image creation time by providing the +``--image-feature exclusive-lock,journaling`` option to the ``rbd`` command. + +Alternatively, the journaling feature can be dynamically enabled on +pre-existing RBD images. To enable journaling with ``rbd``, specify +the ``feature enable`` command, the pool and image name, and the feature name:: + + rbd feature enable {pool-name}/{image-name} {feature-name} + +For example:: + + rbd --cluster local feature enable image-pool/image-1 journaling + +.. note:: The journaling feature is dependent on the exclusive-lock feature. If + the exclusive-lock feature is not already enabled, it should be enabled prior + to enabling the journaling feature. + +.. tip:: You can enable journaling on all new images by default by adding + ``rbd default features = 125`` to your Ceph configuration file. 
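The same feature toggle is available through the ``librbd`` Python bindings. As a
sketch (the pool and image names are examples), enabling exclusive-lock and then
journaling on an existing image might look like this::

    import rados
    import rbd

    # Sketch: dynamically enable journaling on an existing image.
    # exclusive-lock is a prerequisite, so enable it first if needed.
    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        with cluster.open_ioctx('image-pool') as ioctx:
            with rbd.Image(ioctx, 'image-1') as image:
                features = image.features()
                if not features & rbd.RBD_FEATURE_EXCLUSIVE_LOCK:
                    image.update_features(rbd.RBD_FEATURE_EXCLUSIVE_LOCK, True)
                if not features & rbd.RBD_FEATURE_JOURNALING:
                    image.update_features(rbd.RBD_FEATURE_JOURNALING, True)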
+ +Enable Image Mirroring +---------------------- + +If the mirroring is configured in ``image`` mode for the image's pool, then it +is necessary to explicitly enable mirroring for each image within the pool. +To enable mirroring for a specific image with ``rbd``, specify the +``mirror image enable`` command along with the pool and image name:: + + rbd mirror image enable {pool-name}/{image-name} + +For example:: + + rbd --cluster local mirror image enable image-pool/image-1 + +Disable Image Mirroring +----------------------- + +To disable mirroring for a specific image with ``rbd``, specify the +``mirror image disable`` command along with the pool and image name:: + + rbd mirror image disable {pool-name}/{image-name} + +For example:: + + rbd --cluster local mirror image disable image-pool/image-1 + +Image Promotion and Demotion +---------------------------- + +In a failover scenario where the primary designation needs to be moved to the +image in the peer Ceph cluster, access to the primary image should be stopped +(e.g. power down the VM or remove the associated drive from a VM), demote the +current primary image, promote the new primary image, and resume access to the +image on the alternate cluster. + +.. note:: RBD only provides the necessary tools to facilitate an orderly + failover of an image. An external mechanism is required to coordinate the + full failover process (e.g. closing the image before demotion). + +To demote a specific image to non-primary with ``rbd``, specify the +``mirror image demote`` command along with the pool and image name:: + + rbd mirror image demote {pool-name}/{image-name} + +For example:: + + rbd --cluster local mirror image demote image-pool/image-1 + +To demote all primary images within a pool to non-primary with ``rbd``, specify +the ``mirror pool demote`` command along with the pool name:: + + rbd mirror pool demote {pool-name} + +For example:: + + rbd --cluster local mirror pool demote image-pool + +To promote a specific image to primary with ``rbd``, specify the +``mirror image promote`` command along with the pool and image name:: + + rbd mirror image promote [--force] {pool-name}/{image-name} + +For example:: + + rbd --cluster remote mirror image promote image-pool/image-1 + +To promote all non-primary images within a pool to primary with ``rbd``, specify +the ``mirror pool promote`` command along with the pool name:: + + rbd mirror pool promote [--force] {pool-name} + +For example:: + + rbd --cluster local mirror pool promote image-pool + +.. tip:: Since the primary / non-primary status is per-image, it is possible to + have two clusters split the IO load and stage failover / failback. + +.. note:: Promotion can be forced using the ``--force`` option. Forced + promotion is needed when the demotion cannot be propagated to the peer + Ceph cluster (e.g. Ceph cluster failure, communication outage). This will + result in a split-brain scenario between the two peers and the image will no + longer be in-sync until a `force resync command`_ is issued. + +Force Image Resync +------------------ + +If a split-brain event is detected by the ``rbd-mirror`` daemon, it will not +attempt to mirror the affected image until corrected. To resume mirroring for an +image, first `demote the image`_ determined to be out-of-date and then request a +resync to the primary image. 
To request an image resync with ``rbd``, specify the +``mirror image resync`` command along with the pool and image name:: + + rbd mirror image resync {pool-name}/{image-name} + +For example:: + + rbd mirror image resync image-pool/image-1 + +.. note:: The ``rbd`` command only flags the image as requiring a resync. The + local cluster's ``rbd-mirror`` daemon process is responsible for performing + the resync asynchronously. + +Mirror Status +============= + +The peer cluster replication status is stored for every primary mirrored image. +This status can be retrieved using the ``mirror image status`` and +``mirror pool status`` commands. + +To request the mirror image status with ``rbd``, specify the +``mirror image status`` command along with the pool and image name:: + + rbd mirror image status {pool-name}/{image-name} + +For example:: + + rbd mirror image status image-pool/image-1 + +To request the mirror pool summary status with ``rbd``, specify the +``mirror pool status`` command along with the pool name:: + + rbd mirror pool status {pool-name} + +For example:: + + rbd mirror pool status image-pool + +.. note:: Adding ``--verbose`` option to the ``mirror pool status`` command will + additionally output status details for every mirroring image in the pool. + +rbd-mirror Daemon +================= + +The two ``rbd-mirror`` daemons are responsible for watching image journals on the +remote, peer cluster and replaying the journal events against the local +cluster. The RBD image journaling feature records all modifications to the +image in the order they occur. This ensures that a crash-consistent mirror of +the remote image is available locally. + +The ``rbd-mirror`` daemon is available within the optional ``rbd-mirror`` +distribution package. + +.. important:: Each ``rbd-mirror`` daemon requires the ability to connect + to both clusters simultaneously. +.. warning:: Pre-Luminous releases: only run a single ``rbd-mirror`` daemon per + Ceph cluster. + +Each ``rbd-mirror`` daemon should use a unique Ceph user ID. To +`create a Ceph user`_, with ``ceph`` specify the ``auth get-or-create`` +command, user name, monitor caps, and OSD caps:: + + ceph auth get-or-create client.rbd-mirror.{unique id} mon 'profile rbd' osd 'profile rbd' + +The ``rbd-mirror`` daemon can be managed by ``systemd`` by specifying the user +ID as the daemon instance:: + + systemctl enable ceph-rbd-mirror@rbd-mirror.{unique id} + +.. _rbd: ../../man/8/rbd +.. _ceph-conf: ../../rados/configuration/ceph-conf/#running-multiple-clusters +.. _explicitly enabled: #enable-image-mirroring +.. _force resync command: #force-image-resync +.. _demote the image: #image-promotion-and-demotion +.. _create a Ceph user: ../../rados/operations/user-management#add-a-user + diff --git a/src/ceph/doc/rbd/rbd-openstack.rst b/src/ceph/doc/rbd/rbd-openstack.rst new file mode 100644 index 0000000..db52028 --- /dev/null +++ b/src/ceph/doc/rbd/rbd-openstack.rst @@ -0,0 +1,512 @@ +============================= + Block Devices and OpenStack +============================= + +.. index:: Ceph Block Device; OpenStack + +You may use Ceph Block Device images with OpenStack through ``libvirt``, which +configures the QEMU interface to ``librbd``. Ceph stripes block device images as +objects across the cluster, which means that large Ceph Block Device images have +better performance than a standalone server! + +To use Ceph Block Devices with OpenStack, you must install QEMU, ``libvirt``, +and OpenStack first. 
We recommend using a separate physical node for your +OpenStack installation. OpenStack recommends a minimum of 8GB of RAM and a +quad-core processor. The following diagram depicts the OpenStack/Ceph +technology stack. + + +.. ditaa:: +---------------------------------------------------+ + | OpenStack | + +---------------------------------------------------+ + | libvirt | + +------------------------+--------------------------+ + | + | configures + v + +---------------------------------------------------+ + | QEMU | + +---------------------------------------------------+ + | librbd | + +---------------------------------------------------+ + | librados | + +------------------------+-+------------------------+ + | OSDs | | Monitors | + +------------------------+ +------------------------+ + +.. important:: To use Ceph Block Devices with OpenStack, you must have + access to a running Ceph Storage Cluster. + +Three parts of OpenStack integrate with Ceph's block devices: + +- **Images**: OpenStack Glance manages images for VMs. Images are immutable. + OpenStack treats images as binary blobs and downloads them accordingly. + +- **Volumes**: Volumes are block devices. OpenStack uses volumes to boot VMs, + or to attach volumes to running VMs. OpenStack manages volumes using + Cinder services. + +- **Guest Disks**: Guest disks are guest operating system disks. By default, + when you boot a virtual machine, its disk appears as a file on the filesystem + of the hypervisor (usually under ``/var/lib/nova/instances/<uuid>/``). Prior + to OpenStack Havana, the only way to boot a VM in Ceph was to use the + boot-from-volume functionality of Cinder. However, now it is possible to boot + every virtual machine inside Ceph directly without using Cinder, which is + advantageous because it allows you to perform maintenance operations easily + with the live-migration process. Additionally, if your hypervisor dies it is + also convenient to trigger ``nova evacuate`` and run the virtual machine + elsewhere almost seamlessly. + +You can use OpenStack Glance to store images in a Ceph Block Device, and you +can use Cinder to boot a VM using a copy-on-write clone of an image. + +The instructions below detail the setup for Glance, Cinder and Nova, although +they do not have to be used together. You may store images in Ceph block devices +while running VMs using a local disk, or vice versa. + +.. important:: Ceph doesn’t support QCOW2 for hosting a virtual machine disk. + Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot + from volume), the Glance image format must be ``RAW``. + +.. tip:: This document describes using Ceph Block Devices with OpenStack Havana. + For earlier versions of OpenStack see + `Block Devices and OpenStack (Dumpling)`_. + +.. index:: pools; OpenStack + +Create a Pool +============= + +By default, Ceph block devices use the ``rbd`` pool. You may use any available +pool. We recommend creating a pool for Cinder and a pool for Glance. Ensure +your Ceph cluster is running, then create the pools. :: + + ceph osd pool create volumes 128 + ceph osd pool create images 128 + ceph osd pool create backups 128 + ceph osd pool create vms 128 + +See `Create a Pool`_ for detail on specifying the number of placement groups for +your pools, and `Placement Groups`_ for details on the number of placement +groups you should set for your pools. + +Newly created pools must initialized prior to use. 
Use the ``rbd`` tool +to initialize the pools:: + + rbd pool init volumes + rbd pool init images + rbd pool init backups + rbd pool init vms + +.. _Create a Pool: ../../rados/operations/pools#createpool +.. _Placement Groups: ../../rados/operations/placement-groups + + +Configure OpenStack Ceph Clients +================================ + +The nodes running ``glance-api``, ``cinder-volume``, ``nova-compute`` and +``cinder-backup`` act as Ceph clients. Each requires the ``ceph.conf`` file:: + + ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf + + +Install Ceph client packages +---------------------------- + +On the ``glance-api`` node, you will need the Python bindings for ``librbd``:: + + sudo apt-get install python-rbd + sudo yum install python-rbd + +On the ``nova-compute``, ``cinder-backup`` and on the ``cinder-volume`` node, +use both the Python bindings and the client command line tools:: + + sudo apt-get install ceph-common + sudo yum install ceph-common + + +Setup Ceph Client Authentication +-------------------------------- + +If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder +and Glance. Execute the following:: + + ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' + ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images' + ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' + +Add the keyrings for ``client.cinder``, ``client.glance``, and +``client.cinder-backup`` to the appropriate nodes and change their ownership:: + + ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring + ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring + ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring + ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring + ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring + ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring + +Nodes running ``nova-compute`` need the keyring file for the ``nova-compute`` +process:: + + ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring + +They also need to store the secret key of the ``client.cinder`` user in +``libvirt``. The libvirt process needs it to access the cluster while attaching +a block device from Cinder. 
+ +Create a temporary copy of the secret key on the nodes running +``nova-compute``:: + + ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key + +Then, on the compute nodes, add the secret key to ``libvirt`` and remove the +temporary copy of the key:: + + uuidgen + 457eb676-33da-42ec-9a8c-9293d545c337 + + cat > secret.xml <<EOF + <secret ephemeral='no' private='no'> + <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid> + <usage type='ceph'> + <name>client.cinder secret</name> + </usage> + </secret> + EOF + sudo virsh secret-define --file secret.xml + Secret 457eb676-33da-42ec-9a8c-9293d545c337 created + sudo virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key) && rm client.cinder.key secret.xml + +Save the uuid of the secret for configuring ``nova-compute`` later. + +.. important:: You don't necessarily need the UUID on all the compute nodes. + However from a platform consistency perspective, it's better to keep the + same UUID. + +.. _cephx authentication: ../../rados/configuration/auth-config-ref/#enabling-disabling-cephx + + +Configure OpenStack to use Ceph +=============================== + +Configuring Glance +------------------ + +Glance can use multiple back ends to store images. To use Ceph block devices by +default, configure Glance like the following. + +Prior to Juno +~~~~~~~~~~~~~~ + +Edit ``/etc/glance/glance-api.conf`` and add under the ``[DEFAULT]`` section:: + + default_store = rbd + rbd_store_user = glance + rbd_store_pool = images + rbd_store_chunk_size = 8 + + +Juno +~~~~ + +Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section:: + + [DEFAULT] + ... + default_store = rbd + ... + [glance_store] + stores = rbd + rbd_store_pool = images + rbd_store_user = glance + rbd_store_ceph_conf = /etc/ceph/ceph.conf + rbd_store_chunk_size = 8 + +.. important:: Glance has not completely moved to 'store' yet. + So we still need to configure the store in the DEFAULT section until Kilo. + +Kilo and after +~~~~~~~~~~~~~~ + +Edit ``/etc/glance/glance-api.conf`` and add under the ``[glance_store]`` section:: + + [glance_store] + stores = rbd + default_store = rbd + rbd_store_pool = images + rbd_store_user = glance + rbd_store_ceph_conf = /etc/ceph/ceph.conf + rbd_store_chunk_size = 8 + +For more information about the configuration options available in Glance please refer to the OpenStack Configuration Reference: http://docs.openstack.org/. + +Enable copy-on-write cloning of images +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Note that this exposes the back end location via Glance's API, so the endpoint +with this option enabled should not be publicly accessible. 
+ +Any OpenStack version except Mitaka +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you want to enable copy-on-write cloning of images, also add under the ``[DEFAULT]`` section:: + + show_image_direct_url = True + +For Mitaka only +^^^^^^^^^^^^^^^ + +To enable image locations and take advantage of copy-on-write cloning for images, add under the ``[DEFAULT]`` section:: + + show_multiple_locations = True + show_image_direct_url = True + +Disable cache management (any OpenStack version) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Disable the Glance cache management to avoid images getting cached under ``/var/lib/glance/image-cache/``, +assuming your configuration file has ``flavor = keystone+cachemanagement``:: + + [paste_deploy] + flavor = keystone + +Image properties +~~~~~~~~~~~~~~~~ + +We recommend to use the following properties for your images: + +- ``hw_scsi_model=virtio-scsi``: add the virtio-scsi controller and get better performance and support for discard operation +- ``hw_disk_bus=scsi``: connect every cinder block devices to that controller +- ``hw_qemu_guest_agent=yes``: enable the QEMU guest agent +- ``os_require_quiesce=yes``: send fs-freeze/thaw calls through the QEMU guest agent + + +Configuring Cinder +------------------ + +OpenStack requires a driver to interact with Ceph block devices. You must also +specify the pool name for the block device. On your OpenStack node, edit +``/etc/cinder/cinder.conf`` by adding:: + + [DEFAULT] + ... + enabled_backends = ceph + ... + [ceph] + volume_driver = cinder.volume.drivers.rbd.RBDDriver + volume_backend_name = ceph + rbd_pool = volumes + rbd_ceph_conf = /etc/ceph/ceph.conf + rbd_flatten_volume_from_snapshot = false + rbd_max_clone_depth = 5 + rbd_store_chunk_size = 4 + rados_connect_timeout = -1 + glance_api_version = 2 + +If you are using `cephx authentication`_, also configure the user and uuid of +the secret you added to ``libvirt`` as documented earlier:: + + [ceph] + ... + rbd_user = cinder + rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337 + +Note that if you are configuring multiple cinder back ends, +``glance_api_version = 2`` must be in the ``[DEFAULT]`` section. + + +Configuring Cinder Backup +------------------------- + +OpenStack Cinder Backup requires a specific daemon so don't forget to install it. +On your Cinder Backup node, edit ``/etc/cinder/cinder.conf`` and add:: + + backup_driver = cinder.backup.drivers.ceph + backup_ceph_conf = /etc/ceph/ceph.conf + backup_ceph_user = cinder-backup + backup_ceph_chunk_size = 134217728 + backup_ceph_pool = backups + backup_ceph_stripe_unit = 0 + backup_ceph_stripe_count = 0 + restore_discard_excess_bytes = true + + +Configuring Nova to attach Ceph RBD block device +------------------------------------------------ + +In order to attach Cinder devices (either normal block or by issuing a boot +from volume), you must tell Nova (and libvirt) which user and UUID to refer to +when attaching the device. libvirt will refer to this user when connecting and +authenticating with the Ceph cluster. :: + + [libvirt] + ... + rbd_user = cinder + rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337 + +These two flags are also used by the Nova ephemeral backend. + + +Configuring Nova +---------------- + +In order to boot all the virtual machines directly into Ceph, you must +configure the ephemeral backend for Nova. + +It is recommended to enable the RBD cache in your Ceph configuration file +(enabled by default since Giant). 
Moreover, enabling the admin socket +brings a lot of benefits while troubleshooting. Having one socket +per virtual machine using a Ceph block device will help investigating performance and/or wrong behaviors. + +This socket can be accessed like this:: + + ceph daemon /var/run/ceph/ceph-client.cinder.19195.32310016.asok help + +Now on every compute nodes edit your Ceph configuration file:: + + [client] + rbd cache = true + rbd cache writethrough until flush = true + admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok + log file = /var/log/qemu/qemu-guest-$pid.log + rbd concurrent management ops = 20 + +Configure the permissions of these paths:: + + mkdir -p /var/run/ceph/guests/ /var/log/qemu/ + chown qemu:libvirtd /var/run/ceph/guests /var/log/qemu/ + +Note that user ``qemu`` and group ``libvirtd`` can vary depending on your system. +The provided example works for RedHat based systems. + +.. tip:: If your virtual machine is already running you can simply restart it to get the socket + + +Havana and Icehouse +~~~~~~~~~~~~~~~~~~~ + +Havana and Icehouse require patches to implement copy-on-write cloning and fix +bugs with image size and live migration of ephemeral disks on rbd. These are +available in branches based on upstream Nova `stable/havana`_ and +`stable/icehouse`_. Using them is not mandatory but **highly recommended** in +order to take advantage of the copy-on-write clone functionality. + +On every Compute node, edit ``/etc/nova/nova.conf`` and add:: + + libvirt_images_type = rbd + libvirt_images_rbd_pool = vms + libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf + disk_cachemodes="network=writeback" + rbd_user = cinder + rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337 + +It is also a good practice to disable file injection. While booting an +instance, Nova usually attempts to open the rootfs of the virtual machine. +Then, Nova injects values such as password, ssh keys etc. directly into the +filesystem. However, it is better to rely on the metadata service and +``cloud-init``. + +On every Compute node, edit ``/etc/nova/nova.conf`` and add:: + + libvirt_inject_password = false + libvirt_inject_key = false + libvirt_inject_partition = -2 + +To ensure a proper live-migration, use the following flags:: + + libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED" + +Juno +~~~~ + +In Juno, Ceph block device was moved under the ``[libvirt]`` section. +On every Compute node, edit ``/etc/nova/nova.conf`` under the ``[libvirt]`` +section and add:: + + [libvirt] + images_type = rbd + images_rbd_pool = vms + images_rbd_ceph_conf = /etc/ceph/ceph.conf + rbd_user = cinder + rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337 + disk_cachemodes="network=writeback" + + +It is also a good practice to disable file injection. While booting an +instance, Nova usually attempts to open the rootfs of the virtual machine. +Then, Nova injects values such as password, ssh keys etc. directly into the +filesystem. However, it is better to rely on the metadata service and +``cloud-init``. 
+ +On every Compute node, edit ``/etc/nova/nova.conf`` and add the following +under the ``[libvirt]`` section:: + + inject_password = false + inject_key = false + inject_partition = -2 + +To ensure a proper live-migration, use the following flags (under the ``[libvirt]`` section):: + + live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED" + +Kilo +~~~~ + +Enable discard support for virtual machine ephemeral root disk:: + + [libvirt] + ... + ... + hw_disk_discard = unmap # enable discard support (be careful of performance) + + +Restart OpenStack +================= + +To activate the Ceph block device driver and load the block device pool name +into the configuration, you must restart OpenStack. Thus, for Debian based +systems execute these commands on the appropriate nodes:: + + sudo glance-control api restart + sudo service nova-compute restart + sudo service cinder-volume restart + sudo service cinder-backup restart + +For Red Hat based systems execute:: + + sudo service openstack-glance-api restart + sudo service openstack-nova-compute restart + sudo service openstack-cinder-volume restart + sudo service openstack-cinder-backup restart + +Once OpenStack is up and running, you should be able to create a volume +and boot from it. + + +Booting from a Block Device +=========================== + +You can create a volume from an image using the Cinder command line tool:: + + cinder create --image-id {id of image} --display-name {name of volume} {size of volume} + +Note that image must be RAW format. You can use `qemu-img`_ to convert +from one format to another. For example:: + + qemu-img convert -f {source-format} -O {output-format} {source-filename} {output-filename} + qemu-img convert -f qcow2 -O raw precise-cloudimg.img precise-cloudimg.raw + +When Glance and Cinder are both using Ceph block devices, the image is a +copy-on-write clone, so it can create a new volume quickly. In the OpenStack +dashboard, you can boot from that volume by performing the following steps: + +#. Launch a new instance. +#. Choose the image associated to the copy-on-write clone. +#. Select 'boot from volume'. +#. Select the volume you created. + +.. _qemu-img: ../qemu-rbd/#running-qemu-with-rbd +.. _Block Devices and OpenStack (Dumpling): http://docs.ceph.com/docs/dumpling/rbd/rbd-openstack +.. _stable/havana: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd +.. _stable/icehouse: https://github.com/angdraug/nova/tree/rbd-ephemeral-clone-stable-icehouse diff --git a/src/ceph/doc/rbd/rbd-replay.rst b/src/ceph/doc/rbd/rbd-replay.rst new file mode 100644 index 0000000..e1c96b2 --- /dev/null +++ b/src/ceph/doc/rbd/rbd-replay.rst @@ -0,0 +1,42 @@ +=================== + RBD Replay +=================== + +.. index:: Ceph Block Device; RBD Replay + +RBD Replay is a set of tools for capturing and replaying Rados Block Device +(RBD) workloads. To capture an RBD workload, ``lttng-tools`` must be installed +on the client, and ``librbd`` on the client must be the v0.87 (Giant) release +or later. To replay an RBD workload, ``librbd`` on the client must be the Giant +release or later. + +Capture and replay takes three steps: + +#. Capture the trace. Make sure to capture ``pthread_id`` context:: + + mkdir -p traces + lttng create -o traces librbd + lttng enable-event -u 'librbd:*' + lttng add-context -u -t pthread_id + lttng start + # run RBD workload here + lttng stop + +#. 
Process the trace with `rbd-replay-prep`_:: + + rbd-replay-prep traces/ust/uid/*/* replay.bin + +#. Replay the trace with `rbd-replay`_. Use read-only until you know + it's doing what you want:: + + rbd-replay --read-only replay.bin + +.. important:: ``rbd-replay`` will destroy data by default. Do not use against + an image you wish to keep, unless you use the ``--read-only`` option. + +The replayed workload does not have to be against the same RBD image or even the +same cluster as the captured workload. To account for differences, you may need +to use the ``--pool`` and ``--map-image`` options of ``rbd-replay``. + +.. _rbd-replay: ../../man/8/rbd-replay +.. _rbd-replay-prep: ../../man/8/rbd-replay-prep diff --git a/src/ceph/doc/rbd/rbd-snapshot.rst b/src/ceph/doc/rbd/rbd-snapshot.rst new file mode 100644 index 0000000..2e5af9f --- /dev/null +++ b/src/ceph/doc/rbd/rbd-snapshot.rst @@ -0,0 +1,308 @@ +=========== + Snapshots +=========== + +.. index:: Ceph Block Device; snapshots + +A snapshot is a read-only copy of the state of an image at a particular point in +time. One of the advanced features of Ceph block devices is that you can create +snapshots of the images to retain a history of an image's state. Ceph also +supports snapshot layering, which allows you to clone images (e.g., a VM image) +quickly and easily. Ceph supports block device snapshots using the ``rbd`` +command and many higher level interfaces, including `QEMU`_, `libvirt`_, +`OpenStack`_ and `CloudStack`_. + +.. important:: To use RBD snapshots, you must have a running Ceph cluster. + +.. note:: If a snapshot is taken while `I/O` is still in progress in an image, the + snapshot might not contain the exact or latest data of the image and the snapshot + may have to be cloned to a new image to be mountable. So, we recommend stopping + `I/O` before taking a snapshot of an image. If the image contains a filesystem, + the filesystem must be in a consistent state before taking a snapshot. To stop + `I/O` you can use the `fsfreeze` command. See the `fsfreeze(8)` man page for more details. + For virtual machines, `qemu-guest-agent` can be used to automatically freeze + filesystems when creating a snapshot. + +.. ditaa:: +------------+ +-------------+ + | {s} | | {s} c999 | + | Active |<-------*| Snapshot | + | Image | | of Image | + | (stop i/o) | | (read only) | + +------------+ +-------------+ + + +Cephx Notes +=========== + +When `cephx`_ is enabled (it is by default), you must specify a user name or ID +and a path to the keyring containing the corresponding key for the user. See +`User Management`_ for details. You may also set the ``CEPH_ARGS`` environment +variable to avoid re-entering the following parameters. :: + + rbd --id {user-ID} --keyring=/path/to/secret [commands] + rbd --name {username} --keyring=/path/to/secret [commands] + +For example:: + + rbd --id admin --keyring=/etc/ceph/ceph.keyring [commands] + rbd --name client.admin --keyring=/etc/ceph/ceph.keyring [commands] + +.. tip:: Add the user and secret to the ``CEPH_ARGS`` environment + variable so that you don't need to enter them each time. + + +Snapshot Basics +=============== + +The following procedures demonstrate how to create, list, and remove +snapshots using the ``rbd`` command on the command line. + +Create Snapshot +--------------- + +To create a snapshot with ``rbd``, specify the ``snap create`` option, the pool +name and the image name.
:: + + rbd snap create {pool-name}/{image-name}@{snap-name} + +For example:: + + rbd snap create rbd/foo@snapname + + +List Snapshots +-------------- + +To list snapshots of an image, specify the pool name and the image name. :: + + rbd snap ls {pool-name}/{image-name} + +For example:: + + rbd snap ls rbd/foo + + +Rollback Snapshot +----------------- + +To roll back to a snapshot with ``rbd``, specify the ``snap rollback`` option, the +pool name, the image name and the snap name. :: + + rbd snap rollback {pool-name}/{image-name}@{snap-name} + +For example:: + + rbd snap rollback rbd/foo@snapname + + +.. note:: Rolling back an image to a snapshot means overwriting + the current version of the image with data from a snapshot. The + time it takes to execute a rollback increases with the size of the + image. It is **faster to clone** from a snapshot **than to roll back** + an image to a snapshot, and it is the preferred method of returning + to a pre-existing state. + + +Delete a Snapshot +----------------- + +To delete a snapshot with ``rbd``, specify the ``snap rm`` option, the pool +name, the image name and the snap name. :: + + rbd snap rm {pool-name}/{image-name}@{snap-name} + +For example:: + + rbd snap rm rbd/foo@snapname + + +.. note:: Ceph OSDs delete data asynchronously, so deleting a snapshot + doesn't free up the disk space immediately. + +Purge Snapshots +--------------- + +To delete all snapshots for an image with ``rbd``, specify the ``snap purge`` +option and the image name. :: + + rbd snap purge {pool-name}/{image-name} + +For example:: + + rbd snap purge rbd/foo + + +.. index:: Ceph Block Device; snapshot layering + +Layering +======== + +Ceph supports the ability to create many copy-on-write (COW) clones of a block +device snapshot. Snapshot layering enables Ceph block device clients to create +images very quickly. For example, you might create a block device image with a +Linux VM written to it; then, snapshot the image, protect the snapshot, and +create as many copy-on-write clones as you like. A snapshot is read-only, +so cloning a snapshot simplifies semantics--making it possible to create +clones rapidly. + + +.. ditaa:: +-------------+ +-------------+ + | {s} c999 | | {s} | + | Snapshot | Child refers | COW Clone | + | of Image |<------------*| of Snapshot | + | | to Parent | | + | (read only) | | (writable) | + +-------------+ +-------------+ + + Parent Child + +.. note:: The terms "parent" and "child" mean a Ceph block device snapshot (parent), + and the corresponding image cloned from the snapshot (child). These terms are + important for the command line usage below. + +Each cloned image (child) stores a reference to its parent image, which enables +the cloned image to open the parent snapshot and read it. + +A COW clone of a snapshot behaves exactly like any other Ceph block device +image. You can read from, write to, clone, and resize cloned images. There are +no special restrictions with cloned images. However, the copy-on-write clone of +a snapshot refers to the snapshot, so you **MUST** protect the snapshot before +you clone it. The following diagram depicts the process. + +.. note:: Ceph only supports cloning for format 2 images (i.e., created with + ``rbd create --image-format 2``). The kernel client supports cloned images + since kernel 3.10. + +Getting Started with Layering +----------------------------- + +Ceph block device layering is a simple process. You must have an image. You must +create a snapshot of the image. You must protect the snapshot.
.. index:: Ceph Block Device; snapshot layering + +Layering +======== + +Ceph supports the ability to create many copy-on-write (COW) clones of a block +device snapshot. Snapshot layering enables Ceph block device clients to create +images very quickly. For example, you might create a block device image with a +Linux VM written to it; then, snapshot the image, protect the snapshot, and +create as many copy-on-write clones as you like. A snapshot is read-only, +so cloning a snapshot simplifies semantics, making it possible to create +clones rapidly. + + +.. ditaa:: +-------------+ +-------------+ + | {s} c999 | | {s} | + | Snapshot | Child refers | COW Clone | + | of Image |<------------*| of Snapshot | + | | to Parent | | + | (read only) | | (writable) | + +-------------+ +-------------+ + + Parent Child + +.. note:: The terms "parent" and "child" mean a Ceph block device snapshot (parent), + and the corresponding image cloned from the snapshot (child). These terms are + important for the command line usage below. + +Each cloned image (child) stores a reference to its parent image, which enables +the cloned image to open the parent snapshot and read it. + +A COW clone of a snapshot behaves exactly like any other Ceph block device +image. You can read from, write to, clone, and resize cloned images. There are +no special restrictions with cloned images. However, the copy-on-write clone of +a snapshot refers to the snapshot, so you **MUST** protect the snapshot before +you clone it. The following diagram depicts the process. + +.. note:: Ceph only supports cloning for format 2 images (i.e., created with + ``rbd create --image-format 2``). The kernel client has supported cloned images + since kernel 3.10. + +Getting Started with Layering +----------------------------- + +Ceph block device layering is a simple process. You must have an image. You must +create a snapshot of the image. You must protect the snapshot. Once you have +performed these steps, you can begin cloning the snapshot. + +.. ditaa:: +----------------------------+ +-----------------------------+ + | | | | + | Create Block Device Image |------->| Create a Snapshot | + | | | | + +----------------------------+ +-----------------------------+ + | + +--------------------------------------+ + | + v + +----------------------------+ +-----------------------------+ + | | | | + | Protect the Snapshot |------->| Clone the Snapshot | + | | | | + +----------------------------+ +-----------------------------+ + + +The cloned image has a reference to the parent snapshot, and includes the pool +ID, image ID and snapshot ID. The inclusion of the pool ID means that you may +clone snapshots from one pool to images in another pool. + + +#. **Image Template:** A common use case for block device layering is to create a + master image and a snapshot that serves as a template for clones. For example, + a user may create an image for a Linux distribution (e.g., Ubuntu 12.04), and + create a snapshot for it. Periodically, the user may update the image and create + a new snapshot (e.g., ``sudo apt-get update``, ``sudo apt-get upgrade``, + ``sudo apt-get dist-upgrade`` followed by ``rbd snap create``). As the image + matures, the user can clone any one of the snapshots. + +#. **Extended Template:** A more advanced use case includes extending a template + image that provides more information than a base image. For example, a user may + clone an image (e.g., a VM template) and install other software (e.g., a database, + a content management system, an analytics system, etc.) and then snapshot the + extended image, which itself may be updated just like the base image. + +#. **Template Pool:** One way to use block device layering is to create a + pool that contains master images that act as templates, and snapshots of those + templates. You may then extend read-only privileges to users so that they + may clone the snapshots without the ability to write or execute within the pool. + +#. **Image Migration/Recovery:** One way to use block device layering is to migrate + or recover data from one pool into another pool. + +Protecting a Snapshot +--------------------- + +Clones access the parent snapshot; all clones would break if a user inadvertently +deleted the parent snapshot. To prevent data loss, you **MUST** protect the +snapshot before you can clone it. :: + + rbd snap protect {pool-name}/{image-name}@{snapshot-name} + +For example:: + + rbd snap protect rbd/my-image@my-snapshot + +.. note:: You cannot delete a protected snapshot. + +Cloning a Snapshot +------------------ + +To clone a snapshot, you need to specify the parent pool, image and +snapshot, as well as the child pool and image name. You must protect the snapshot +before you can clone it. :: + + rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name} + +For example:: + + rbd clone rbd/my-image@my-snapshot rbd/new-image + +.. note:: You may clone a snapshot from one pool to an image in another pool. For example, + you may maintain read-only images and snapshots as templates in one pool, and writeable + clones in another pool.
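The protect-and-clone sequence can likewise be driven from Python. Below is a sketch using the same placeholder names (pool ``rbd``, parent image ``my-image``, snapshot ``my-snapshot``, child image ``new-image``); it assumes the parent is a format 2 image and requests the layering feature for the child explicitly::

    import rados
    import rbd

    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        with cluster.open_ioctx('rbd') as ioctx:
            # Protect the parent snapshot first; cloning an unprotected
            # snapshot fails.
            with rbd.Image(ioctx, 'my-image') as image:
                if not image.is_protected_snap('my-snapshot'):
                    image.protect_snap('my-snapshot')

            # Clone the protected snapshot into a new child image.  The
            # parent and child io contexts may point at different pools;
            # here both use the same 'rbd' pool.
            rbd_inst = rbd.RBD()
            rbd_inst.clone(ioctx, 'my-image', 'my-snapshot',
                           ioctx, 'new-image',
                           features=rbd.RBD_FEATURE_LAYERING)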
Unprotecting a Snapshot +----------------------- + +Before you can delete a snapshot, you must unprotect it. Additionally, +you may *NOT* delete snapshots that have references from clones. You must +flatten each clone of a snapshot before you can delete the snapshot. +:: + + rbd snap unprotect {pool-name}/{image-name}@{snapshot-name} + +For example:: + + rbd snap unprotect rbd/my-image@my-snapshot + + +Listing Children of a Snapshot +------------------------------ + +To list the children of a snapshot, execute the following:: + + rbd children {pool-name}/{image-name}@{snapshot-name} + +For example:: + + rbd children rbd/my-image@my-snapshot + + +Flattening a Cloned Image +------------------------- + +Cloned images retain a reference to the parent snapshot. When you remove the +reference from the child clone to the parent snapshot, you effectively "flatten" +the image by copying the information from the snapshot to the clone. The time +it takes to flatten a clone increases with the size of the snapshot. To delete +a snapshot, you must flatten the child images first. :: + + rbd flatten {pool-name}/{image-name} + +For example:: + + rbd flatten rbd/my-image + +.. note:: Since a flattened image contains all the information from the snapshot, + a flattened image will take up more storage space than a layered clone. + + +.. _cephx: ../../rados/configuration/auth-config-ref/ +.. _User Management: ../../operations/user-management +.. _QEMU: ../qemu-rbd/ +.. _OpenStack: ../rbd-openstack/ +.. _CloudStack: ../rbd-cloudstack/ +.. _libvirt: ../libvirt/
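As a closing illustration, the clone maintenance described above (listing the children of a snapshot, flattening a clone, then unprotecting and removing the parent snapshot) maps onto the Python binding as follows. Again, this is only a sketch with the placeholder names used earlier; ``list_children()`` operates on the snapshot at which the image is opened and returns ``(pool name, image name)`` pairs::

    import rados
    import rbd

    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        with cluster.open_ioctx('rbd') as ioctx:
            # Open the parent image at the snapshot to list its clones
            # (``rbd children rbd/my-image@my-snapshot``).
            with rbd.Image(ioctx, 'my-image', 'my-snapshot') as parent:
                for pool, child in parent.list_children():
                    print(pool + '/' + child)

            # Flatten the clone so it no longer references the parent
            # snapshot (``rbd flatten rbd/new-image``).
            with rbd.Image(ioctx, 'new-image') as child:
                child.flatten()

            # With no children left, the snapshot can be unprotected and
            # removed.
            with rbd.Image(ioctx, 'my-image') as parent:
                parent.unprotect_snap('my-snapshot')
                parent.remove_snap('my-snapshot')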