author    Qiaowei Ren <qiaowei.ren@intel.com>  2018-01-04 13:43:33 +0800
committer Qiaowei Ren <qiaowei.ren@intel.com>  2018-01-05 11:59:39 +0800
commit    812ff6ca9fcd3e629e49d4328905f33eee8ca3f5 (patch)
tree      04ece7b4da00d9d2f98093774594f4057ae561d4 /src/ceph/doc/ceph-volume
parent    15280273faafb77777eab341909a3f495cf248d9 (diff)
initial code repo
This patch creates the initial code repo. For Ceph, the luminous stable release will be used as the base code, and subsequent changes and optimizations for Ceph will be added on top of it. For opensds, any changes can currently be upstreamed into the original opensds repo (https://github.com/opensds/opensds), so stor4nfv will directly clone the opensds code to deploy the stor4nfv environment. The scripts for deployment based on Ceph and opensds will be put into the 'ci' directory.

Change-Id: I46a32218884c75dda2936337604ff03c554648e4
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Diffstat (limited to 'src/ceph/doc/ceph-volume')
-rw-r--r--  src/ceph/doc/ceph-volume/index.rst              64
-rw-r--r--  src/ceph/doc/ceph-volume/intro.rst              19
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/activate.rst       88
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/create.rst         24
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/index.rst          28
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/list.rst          173
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/prepare.rst       241
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/scan.rst            9
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/systemd.rst        28
-rw-r--r--  src/ceph/doc/ceph-volume/lvm/zap.rst            19
-rw-r--r--  src/ceph/doc/ceph-volume/simple/activate.rst    80
-rw-r--r--  src/ceph/doc/ceph-volume/simple/index.rst       19
-rw-r--r--  src/ceph/doc/ceph-volume/simple/scan.rst       158
-rw-r--r--  src/ceph/doc/ceph-volume/simple/systemd.rst     28
-rw-r--r--  src/ceph/doc/ceph-volume/systemd.rst            49
15 files changed, 1027 insertions, 0 deletions
diff --git a/src/ceph/doc/ceph-volume/index.rst b/src/ceph/doc/ceph-volume/index.rst
new file mode 100644
index 0000000..5cf4778
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/index.rst
@@ -0,0 +1,64 @@
+.. _ceph-volume:
+
+ceph-volume
+===========
+Deploy OSDs with different device technologies like lvm or physical disks
+using pluggable tools (:doc:`lvm/index` itself is treated like a plugin),
+following a predictable and robust way of preparing, activating, and starting
+OSDs.
+
+:ref:`Overview <ceph-volume-overview>` |
+:ref:`Plugin Guide <ceph-volume-plugins>`
+
+
+**Command Line Subcommands**
+
+There is currently support for ``lvm``, and plain disks (with GPT partitions)
+that may have been deployed with ``ceph-disk``.
+
+* :ref:`ceph-volume-lvm`
+* :ref:`ceph-volume-simple`
+
+
+Migrating
+---------
+Starting with Ceph version 12.2.2, ``ceph-disk`` is deprecated. Deprecation
+warnings that link to this page will be displayed. It is strongly suggested
+that users start consuming ``ceph-volume``.
+
+New deployments
+^^^^^^^^^^^^^^^
+For new deployments, :ref:`ceph-volume-lvm` is recommended. It can use any
+logical volume as input for data OSDs, or it can set up a minimal/naive
+logical volume from a device.
+
+Existing OSDs
+^^^^^^^^^^^^^
+If the cluster has OSDs that were provisioned with ``ceph-disk``, then
+``ceph-volume`` can take over the management of these with
+:ref:`ceph-volume-simple`. A scan is done on the data device or OSD directory,
+and ``ceph-disk`` is fully disabled.
+
+Encrypted OSDs
+^^^^^^^^^^^^^^
+If using encryption with OSDs, there is currently no support in ``ceph-volume``
+for this scenario (although support for it is coming soon). In this case, it
+is OK to continue to use ``ceph-disk`` until ``ceph-volume`` fully supports
+encryption. This page will be updated when that happens.
+
+.. toctree::
+ :hidden:
+ :maxdepth: 3
+ :caption: Contents:
+
+ intro
+ systemd
+ lvm/index
+ lvm/activate
+ lvm/prepare
+ lvm/scan
+ lvm/systemd
+ lvm/list
+ lvm/zap
+ simple/index
+ simple/activate
+ simple/scan
+ simple/systemd
diff --git a/src/ceph/doc/ceph-volume/intro.rst b/src/ceph/doc/ceph-volume/intro.rst
new file mode 100644
index 0000000..3869148
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/intro.rst
@@ -0,0 +1,19 @@
+.. _ceph-volume-overview:
+
+Overview
+--------
+The ``ceph-volume`` tool aims to be a single purpose command line tool to deploy
+logical volumes as OSDs, trying to maintain a similar API to ``ceph-disk`` when
+preparing, activating, and creating OSDs.
+
+It deviates from ``ceph-disk`` by not interacting with or relying on the udev
+rules that come installed for Ceph. These rules allow automatic detection of
+previously set-up devices, which are in turn fed into ``ceph-disk`` to
+activate them.
+
+
+``ceph-volume lvm``
+-------------------
+By making use of :term:`LVM tags`, the :ref:`ceph-volume-lvm` sub-command is
+able to store and later re-discover and query devices associated with OSDs so
+that they can later be activated.
diff --git a/src/ceph/doc/ceph-volume/lvm/activate.rst b/src/ceph/doc/ceph-volume/lvm/activate.rst
new file mode 100644
index 0000000..956a62a
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/activate.rst
@@ -0,0 +1,88 @@
+.. _ceph-volume-lvm-activate:
+
+``activate``
+============
+Once :ref:`ceph-volume-lvm-prepare` is completed, and all the various steps
+that it entails are done, the volume is ready to get "activated".
+
+This activation process enables a systemd unit that persists the OSD ID and its
+UUID (also called ``fsid`` in Ceph CLI tools), so that at boot time it can
+understand what OSD is enabled and needs to be mounted.
+
+.. note:: The execution of this call is fully idempotent; there are no
+   side effects when it is run multiple times.
+
+New OSDs
+--------
+To activate newly prepared OSDs both the :term:`OSD id` and :term:`OSD uuid`
+need to be supplied. For example::
+
+ ceph-volume lvm activate --bluestore 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8
+
+.. note:: The UUID is stored in the ``osd_fsid`` file in the OSD path, which is
+ generated when :ref:`ceph-volume-lvm-prepare` is used.
+
+requiring uuids
+^^^^^^^^^^^^^^^
+The :term:`OSD uuid` is required as an extra step to ensure that the right
+OSD is being activated. It is entirely possible that a previous OSD with the
+same id exists, and without the uuid the incorrect one could end up being
+activated.
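+
+If the uuid is not at hand, it can be recovered from the ``osd_fsid`` file in
+a mounted OSD directory, or by inspecting the LVM tags with the ``list``
+subcommand (the paths below assume a default cluster name and an OSD id of
+``0``)::
+
+    cat /var/lib/ceph/osd/ceph-0/osd_fsid
+    ceph-volume lvm list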
+
+
+Discovery
+---------
+With either existing OSDs or new ones being activated, a *discovery* process is
+performed using :term:`LVM tags` to enable the systemd units.
+
+The systemd unit will capture the :term:`OSD id` and :term:`OSD uuid` and
+persist them. Internally, the activation will enable the unit like::
+
+    systemctl enable ceph-volume@lvm-$id-$uuid
+
+For example::
+
+    systemctl enable ceph-volume@lvm-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41
+
+Would start the discovery process for the OSD with an id of ``0`` and a UUID of
+``8715BEB4-15C5-49DE-BA6F-401086EC7B41``.
+
+.. note:: for more details on the systemd workflow see :ref:`ceph-volume-lvm-systemd`
+
+The systemd unit will look for the matching OSD device, and by looking at its
+:term:`LVM tags` will proceed to:
+
+#. mount the device in the corresponding location (by convention this is
+   ``/var/lib/ceph/osd/<cluster name>-<osd id>/``)
+
+#. ensure that all required devices are ready for that OSD
+
+#. start the ``ceph-osd@0`` systemd unit
+
+.. note:: The system infers the objectstore type (filestore or bluestore) by
+   inspecting the LVM tags applied to the OSD devices.
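+
+As a rough illustration only, for a filestore OSD with an id of ``0``, the
+unit ends up performing the equivalent of the following (the device path here
+is hypothetical)::
+
+    mount /dev/vg/osd-data /var/lib/ceph/osd/ceph-0
+    systemctl start ceph-osd@0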
+
+Existing OSDs
+-------------
+For existing OSDs that have been deployed with different tooling, the only way
+to port them over to the new mechanism is to prepare them again (losing data).
+See :ref:`ceph-volume-lvm-existing-osds` for details on how to proceed.
+
+Summary
+-------
+To recap the ``activate`` process for :term:`bluestore`:
+
+#. Require both :term:`OSD id` and :term:`OSD uuid`
+#. Enable the systemd unit with the matching id and uuid
+#. Create the ``tmpfs`` mount at the OSD directory in
+   ``/var/lib/ceph/osd/$cluster-$id/``
+#. Recreate all the files needed with ``ceph-bluestore-tool prime-osd-dir``
+   by pointing it to the OSD ``block`` device (see the sketch below)
+#. The systemd unit will ensure all devices are ready and linked
+#. The matching ``ceph-osd`` systemd unit will get started
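+
+A minimal sketch of that ``prime-osd-dir`` step (the device path and OSD id
+are examples only)::
+
+    ceph-bluestore-tool prime-osd-dir \
+        --dev /dev/test_group/data-lv2 \
+        --path /var/lib/ceph/osd/ceph-0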
+
+And for :term:`filestore`:
+
+#. Require both :term:`OSD id` and :term:`OSD uuid`
+#. Enable the systemd unit with the matching id and uuid
+#. The systemd unit will ensure all devices are ready and mounted (if needed)
+#. The matching ``ceph-osd`` systemd unit will get started
diff --git a/src/ceph/doc/ceph-volume/lvm/create.rst b/src/ceph/doc/ceph-volume/lvm/create.rst
new file mode 100644
index 0000000..c90d1f6
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/create.rst
@@ -0,0 +1,24 @@
+.. _ceph-volume-lvm-create:
+
+``create``
+===========
+This subcommand wraps the two-step process to provision a new OSD (calling
+``prepare`` first and then ``activate``) into a single step. The reason to
+prefer ``prepare`` and then ``activate`` is to gradually introduce new OSDs
+into a cluster, avoiding large amounts of data being rebalanced.
+
+The single-call process unifies exactly what :ref:`ceph-volume-lvm-prepare` and
+:ref:`ceph-volume-lvm-activate` do, with the convenience of doing it all at
+once.
+
+There is nothing different about the process except that the OSD will become
+*up* and *in* immediately after completion.
+
+The backing objectstore can be specified with:
+
+* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
+* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
+
+All command line flags and options are the same as ``ceph-volume lvm prepare``.
+Please refer to :ref:`ceph-volume-lvm-prepare` for details.
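+
+A minimal invocation, assuming an existing logical volume ``vg/lv`` to use
+for data (any of the ``prepare`` flags can be appended)::
+
+    ceph-volume lvm create --bluestore --data vg/lv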
diff --git a/src/ceph/doc/ceph-volume/lvm/index.rst b/src/ceph/doc/ceph-volume/lvm/index.rst
new file mode 100644
index 0000000..9a2191f
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/index.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-lvm:
+
+``lvm``
+=======
+Implements the functionality needed to deploy OSDs from the ``lvm`` subcommand:
+``ceph-volume lvm``
+
+**Command Line Subcommands**
+
+* :ref:`ceph-volume-lvm-prepare`
+
+* :ref:`ceph-volume-lvm-activate`
+
+* :ref:`ceph-volume-lvm-create`
+
+* :ref:`ceph-volume-lvm-list`
+
+.. not yet implemented
+.. * :ref:`ceph-volume-lvm-scan`
+
+**Internal functionality**
+
+There are other aspects of the ``lvm`` subcommand that are internal and not
+exposed to the user. These sections explain how those pieces work together,
+clarifying the workflows of the tool.
+
+:ref:`Systemd Units <ceph-volume-lvm-systemd>` |
+:ref:`lvm <ceph-volume-lvm-api>`
diff --git a/src/ceph/doc/ceph-volume/lvm/list.rst b/src/ceph/doc/ceph-volume/lvm/list.rst
new file mode 100644
index 0000000..19e0600
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/list.rst
@@ -0,0 +1,173 @@
+.. _ceph-volume-lvm-list:
+
+``list``
+========
+This subcommand will list any devices (logical and physical) that may be
+associated with a Ceph cluster, as long as they contain enough metadata to
+allow for that discovery.
+
+Output is grouped by the OSD ID associated with the devices, and unlike
+``ceph-disk`` it does not provide any information for devices that aren't
+associated with Ceph.
+
+Command line options:
+
+* ``--format`` Allows a ``json`` or ``pretty`` value. Defaults to ``pretty``
+ which will group the device information in a human-readable format.
+
+Full Reporting
+--------------
+When no positional arguments are used, a full report will be presented. This
+means that all devices and logical volumes found in the system will be
+displayed.
+
+Full ``pretty`` reporting for two OSDs, one with an lv as a journal and
+another one with a physical device, may look similar to::
+
+ # ceph-volume lvm list
+
+
+ ====== osd.1 =======
+
+ [journal] /dev/journals/journal1
+
+ journal uuid C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+ osd id 1
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type journal
+ osd fsid 661b24f8-e062-482b-8110-826ffe7f13fa
+ data uuid SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+ journal device /dev/journals/journal1
+ data device /dev/test_group/data-lv2
+
+ [data] /dev/test_group/data-lv2
+
+ journal uuid C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+ osd id 1
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type data
+ osd fsid 661b24f8-e062-482b-8110-826ffe7f13fa
+ data uuid SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+ journal device /dev/journals/journal1
+ data device /dev/test_group/data-lv2
+
+ ====== osd.0 =======
+
+ [data] /dev/test_group/data-lv1
+
+ journal uuid cd72bd28-002a-48da-bdf6-d5b993e84f3f
+ osd id 0
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type data
+ osd fsid 943949f0-ce37-47ca-a33c-3413d46ee9ec
+ data uuid TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00
+ journal device /dev/sdd1
+ data device /dev/test_group/data-lv1
+
+ [journal] /dev/sdd1
+
+ PARTUUID cd72bd28-002a-48da-bdf6-d5b993e84f3f
+
+.. note:: Tags are displayed in a readable format. The ``osd id`` key is stored
+ as a ``ceph.osd_id`` tag. For more information on lvm tag conventions
+ see :ref:`ceph-volume-lvm-tag-api`
+
+Single Reporting
+----------------
+Single reporting can consume both devices and logical volumes as input
+(positional parameters). For logical volumes, it is required to use the group
+name as well as the logical volume name.
+
+For example, the ``data-lv2`` logical volume in the ``test_group`` volume
+group can be listed in the following way::
+
+ # ceph-volume lvm list test_group/data-lv2
+
+
+ ====== osd.1 =======
+
+ [data] /dev/test_group/data-lv2
+
+ journal uuid C65n7d-B1gy-cqX3-vZKY-ZoE0-IEYM-HnIJzs
+ osd id 1
+ cluster fsid ce454d91-d748-4751-a318-ff7f7aa18ffd
+ type data
+ osd fsid 661b24f8-e062-482b-8110-826ffe7f13fa
+ data uuid SlEgHe-jX1H-QBQk-Sce0-RUls-8KlY-g8HgcZ
+ journal device /dev/journals/journal1
+ data device /dev/test_group/data-lv2
+
+
+.. note:: Tags are displayed in a readable format. The ``osd id`` key is stored
+ as a ``ceph.osd_id`` tag. For more information on lvm tag conventions
+ see :ref:`ceph-volume-lvm-tag-api`
+
+
+For plain disks, the full path to the device is required. For example, for
+a device like ``/dev/sdd1`` it can look like::
+
+
+ # ceph-volume lvm list /dev/sdd1
+
+
+ ====== osd.0 =======
+
+ [journal] /dev/sdd1
+
+ PARTUUID cd72bd28-002a-48da-bdf6-d5b993e84f3f
+
+
+
+``json`` output
+---------------
+All output using ``--format=json`` will show everything the system has stored
+as metadata for the devices, including tags.
+
+No changes for readability are done with ``json`` reporting, and all
+information is presented as-is. Full output as well as single devices can be
+listed.
+
+For brevity, this is how a single logical volume would look with ``json``
+output (note how tags aren't modified)::
+
+ # ceph-volume lvm list --format=json test_group/data-lv1
+ {
+ "0": [
+ {
+ "lv_name": "data-lv1",
+ "lv_path": "/dev/test_group/data-lv1",
+ "lv_tags": "ceph.cluster_fsid=ce454d91-d748-4751-a318-ff7f7aa18ffd,ceph.data_device=/dev/test_group/data-lv1,ceph.data_uuid=TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00,ceph.journal_device=/dev/sdd1,ceph.journal_uuid=cd72bd28-002a-48da-bdf6-d5b993e84f3f,ceph.osd_fsid=943949f0-ce37-47ca-a33c-3413d46ee9ec,ceph.osd_id=0,ceph.type=data",
+ "lv_uuid": "TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00",
+ "name": "data-lv1",
+ "path": "/dev/test_group/data-lv1",
+ "tags": {
+ "ceph.cluster_fsid": "ce454d91-d748-4751-a318-ff7f7aa18ffd",
+ "ceph.data_device": "/dev/test_group/data-lv1",
+ "ceph.data_uuid": "TUpfel-Q5ZT-eFph-bdGW-SiNW-l0ag-f5kh00",
+ "ceph.journal_device": "/dev/sdd1",
+ "ceph.journal_uuid": "cd72bd28-002a-48da-bdf6-d5b993e84f3f",
+ "ceph.osd_fsid": "943949f0-ce37-47ca-a33c-3413d46ee9ec",
+ "ceph.osd_id": "0",
+ "ceph.type": "data"
+ },
+ "type": "data",
+ "vg_name": "test_group"
+ }
+ ]
+ }
+
+
+Synchronized information
+------------------------
+Before any listing type, the lvm API is queried to ensure that physical
+devices that may be in use haven't changed names. It is possible that
+a non-persistent device name like ``/dev/sda1`` could change to ``/dev/sdb1``.
+
+The detection is possible because the ``PARTUUID`` is stored as part of the
+metadata in the logical volume for the data lv. Even in the case of a journal
+that is a physical device, this information is still stored on the data logical
+volume associated with it.
+
+If the name is no longer the same (as reported by ``blkid`` when using the
+``PARTUUID``), the tag will get updated and the report will use the newly
+refreshed information.
diff --git a/src/ceph/doc/ceph-volume/lvm/prepare.rst b/src/ceph/doc/ceph-volume/lvm/prepare.rst
new file mode 100644
index 0000000..27ebb55
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/prepare.rst
@@ -0,0 +1,241 @@
+.. _ceph-volume-lvm-prepare:
+
+``prepare``
+===========
+This subcommand allows a :term:`filestore` or :term:`bluestore` setup. It is
+recommended to pre-provision a logical volume before using it with
+``ceph-volume lvm``.
+
+Logical volumes are not altered except for adding extra metadata.
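+
+A minimal pre-provisioning sketch using plain LVM tooling (the device, volume
+group, and logical volume names are examples only)::
+
+    vgcreate ceph-vg /dev/sdb
+    lvcreate -n osd-data -l 100%FREE ceph-vg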
+
+.. note:: This is part of a two step process to deploy an OSD. If looking for
+ a single-call way, please see :ref:`ceph-volume-lvm-create`
+
+To help identify volumes, the process of preparing a volume (or volumes) to
+work with Ceph will assign a few pieces of metadata using :term:`LVM tags`.
+
+:term:`LVM tags` make volumes easy to discover later, and help identify them
+as part of a Ceph system, and what role they have (journal, filestore,
+bluestore, etc...)
+
+Although :term:`filestore` was initially the supported (and default) backend,
+the backend can be specified with:
+
+
+* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
+* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
+
+.. _ceph-volume-lvm-prepare_filestore:
+
+``filestore``
+-------------
+This is the OSD backend that allows preparation of logical volumes for
+a :term:`filestore` objectstore OSD.
+
+It can use a logical volume for the OSD data and a partitioned physical device
+or logical volume for the journal. No special preparation is needed for these
+volumes other than following the minimum size requirements for data and
+journal.
+
+The API call looks like::
+
+    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal journal
+
+There is flexibility to use a raw device or partition as well for ``--data``,
+which will be converted to a logical volume. This is not ideal in all
+situations, since ``ceph-volume`` is just going to create a unique volume
+group and a logical volume from that device.
+
+When using logical volumes for ``--data``, the value *must* be a volume group
+name and a logical volume name separated by a ``/``. Since logical volume names
+are not enforced for uniqueness, this prevents using the wrong volume. The
+``--journal`` can be either a logical volume *or* a partition.
+
+When using a partition, it *must* contain a ``PARTUUID`` discoverable by
+``blkid``, so that it can later be identified correctly regardless of the
+device name (or path).
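+
+Whether a partition exposes a ``PARTUUID`` can be checked directly with
+``blkid`` (the device and the value shown are examples)::
+
+    blkid -s PARTUUID -o value /dev/sdc1
+    cd72bd28-002a-48da-bdf6-d5b993e84f3f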
+
+When using a partition, this is how it would look for ``/dev/sdc1``::
+
+    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1
+
+For a logical volume, just like for ``--data``, a volume group and logical
+volume name are required::
+
+    ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal volume_group/journal_lv
+
+A generated uuid is used to ask the cluster for a new OSD id. These two
+pieces of information are crucial for identifying an OSD and will later be
+used throughout the :ref:`ceph-volume-lvm-activate` process.
+
+The OSD data directory is created using the following convention::
+
+ /var/lib/ceph/osd/<cluster name>-<osd id>
+
+At this point the data volume is mounted at this location, and the journal
+volume is linked::
+
+ ln -s /path/to/journal /var/lib/ceph/osd/<cluster_name>-<osd-id>/journal
+
+The monmap is fetched using the OSD bootstrap key::
+
+ /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
+ --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
+ mon getmap -o /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap
+
+``ceph-osd`` will be called to populate the OSD directory (which is already
+mounted), re-using all the pieces of information from the initial steps::
+
+ ceph-osd --cluster ceph --mkfs --mkkey -i <osd id> \
+ --monmap /var/lib/ceph/osd/<cluster name>-<osd id>/activate.monmap --osd-data \
+ /var/lib/ceph/osd/<cluster name>-<osd id> --osd-journal /var/lib/ceph/osd/<cluster name>-<osd id>/journal \
+ --osd-uuid <osd uuid> --keyring /var/lib/ceph/osd/<cluster name>-<osd id>/keyring \
+ --setuser ceph --setgroup ceph
+
+.. _ceph-volume-lvm-existing-osds:
+
+Existing OSDs
+-------------
+For existing clusters that want to use this new system and have OSDs that are
+already running, there are a few things to take into account:
+
+.. warning:: this process will forcefully format the data device, destroying
+ existing data, if any.
+
+* OSD paths should follow this convention::
+
+ /var/lib/ceph/osd/<cluster name>-<osd id>
+
+* Preferably, no other mechanisms to mount the volume should exist; any that
+  do (like fstab mount points) should be removed
+* There is currently no support for encrypted volumes
+
+The one-time process for an existing OSD, with an ID of 0 and using
+a ``"ceph"`` cluster name, would look like::
+
+ ceph-volume lvm prepare --filestore --osd-id 0 --osd-fsid E3D291C1-E7BF-4984-9794-B60D9FA139CB
+
+The command line tool will not contact the monitor to generate an OSD ID and
+will format the LVM device in addition to storing the metadata on it so that
+it can later be started (for a detailed metadata description see
+:ref:`ceph-volume-lvm-tags`).
+
+
+.. _ceph-volume-lvm-prepare_bluestore:
+
+``bluestore``
+-------------
+The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
+more flexibility for devices. Bluestore supports the following configurations:
+
+* A block device, a ``block.wal``, and a ``block.db`` device
+* A block device and a ``block.wal`` device
+* A block device and a ``block.db`` device
+* A single block device
+
+It can accept a whole device (or partition), or a logical volume for ``block``.
+If a physical device is provided it will then be turned into a logical volume.
+This allows a simpler approach to using LVM, but at the cost of flexibility:
+there are no options or configurations to change how the LV is created.
+
+The ``block`` is specified with the ``--data`` flag, and in its simplest use
+case it looks like::
+
+ ceph-volume lvm prepare --bluestore --data vg/lv
+
+A raw device can be specified in the same way::
+
+ ceph-volume lvm prepare --bluestore --data /path/to/device
+
+
+If a ``block.db`` or a ``block.wal`` is needed (they are optional for
+bluestore) they can be specified with ``--block.db`` and ``--block.wal``
+accordingly. These can be a physical device (they **must** be a partition) or
+a logical volume.
+
+For both ``block.db`` and ``block.wal``, partitions aren't made into logical
+volumes because they can be used as-is. Logical volumes are also allowed.
+
+While creating the OSD directory, the process will use a ``tmpfs`` mount to
+place all the files needed for the OSD. These files are initially created by
+``ceph-osd --mkfs`` and are fully ephemeral.
+
+A symlink is always created for the ``block`` device, and optionally for
+``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
+id of 0, the directory could look like::
+
+ # ls -l /var/lib/ceph/osd/ceph-0
+ lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
+ lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
+ lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
+ -rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
+ -rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
+ -rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
+ -rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
+ -rw-------. 1 ceph ceph 10 Oct 20 13:05 type
+ -rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
+
+In the above case, a device was used for ``block``, so ``ceph-volume`` created
+a volume group and a logical volume using the following convention:
+
+* volume group name: ``ceph-{cluster fsid}``, or ``ceph-{random uuid}`` if
+  the vg already exists
+
+* logical volume name: ``osd-block-{osd_fsid}``
+
+
+Storing metadata
+----------------
+The following tags will get applied as part of the preparation process
+regardless of the type of volume (journal or data) or OSD objectstore:
+
+* ``cluster_fsid``
+* ``encrypted``
+* ``osd_fsid``
+* ``osd_id``
+
+For :term:`filestore` these tags will be added:
+
+* ``journal_device``
+* ``journal_uuid``
+
+For :term:`bluestore` these tags will be added:
+
+* ``block_device``
+* ``block_uuid``
+* ``db_device``
+* ``db_uuid``
+* ``wal_device``
+* ``wal_uuid``
+
+.. note:: For the complete lvm tag conventions see :ref:`ceph-volume-lvm-tag-api`
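+
+A quick way to inspect these tags on a prepared volume is plain LVM tooling
+(the ``vg/lv`` name is an example)::
+
+    lvs -o lv_tags --noheadings vg/lv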
+
+
+Summary
+-------
+To recap the ``prepare`` process for :term:`bluestore`:
+
+#. Accept a logical volume for block or a raw device (that will get converted
+   to an lv)
+#. Accept partitions or logical volumes for ``block.wal`` or ``block.db``
+#. Generate a UUID for the OSD
+#. Ask the monitor for an OSD ID, reusing the generated UUID
+#. OSD data directory is created on a tmpfs mount
+#. ``block``, ``block.wal``, and ``block.db`` are symlinked if defined
+#. monmap is fetched for activation
+#. Data directory is populated by ``ceph-osd``
+#. Logical volumes are assigned all the Ceph metadata using lvm tags
+
+
+And the ``prepare`` process for :term:`filestore`:
+
+#. Accept only logical volumes for data and journal (both required)
+#. Generate a UUID for the OSD
+#. Ask the monitor for an OSD ID, reusing the generated UUID
+#. OSD data directory is created and data volume mounted
+#. Journal is symlinked from data volume to journal location
+#. monmap is fetched for activation
+#. Devices are mounted and the data directory is populated by ``ceph-osd``
+#. Data and journal volumes are assigned all the Ceph metadata using lvm tags
diff --git a/src/ceph/doc/ceph-volume/lvm/scan.rst b/src/ceph/doc/ceph-volume/lvm/scan.rst
new file mode 100644
index 0000000..96d2719
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/scan.rst
@@ -0,0 +1,9 @@
+``scan``
+========
+This sub-command will allow discovery of Ceph volumes previously set up by
+the tool, by looking at the system's logical volumes and their tags.
+
+As part of the :ref:`ceph-volume-lvm-prepare` process, the logical volumes
+are assigned a few tags with important pieces of information.
+
+.. note:: This sub-command is not yet implemented
diff --git a/src/ceph/doc/ceph-volume/lvm/systemd.rst b/src/ceph/doc/ceph-volume/lvm/systemd.rst
new file mode 100644
index 0000000..30260de
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/systemd.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-lvm-systemd:
+
+systemd
+=======
+Upon startup, it will identify the logical volume using :term:`LVM tags`,
+finding a matching ID and later ensuring it is the right one with
+the :term:`OSD uuid`.
+
+After identifying the correct volume it will then proceed to mount it by using
+the OSD destination conventions, that is::
+
+ /var/lib/ceph/osd/<cluster name>-<osd id>
+
+For our example OSD with an id of ``0``, that means the identified device will
+be mounted at::
+
+
+ /var/lib/ceph/osd/ceph-0
+
+
+Once that process is complete, a call will be made to start the OSD::
+
+ systemctl start ceph-osd@0
+
+The systemd portion of this process is handled by the ``ceph-volume lvm
+trigger`` sub-command, which is only in charge of parsing metadata coming
+from systemd and startup, and then dispatching to ``ceph-volume lvm
+activate``, which will proceed with activation.
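+
+For example, for an OSD with an id of ``0``, the dispatch would amount to
+something like the following (the uuid is illustrative)::
+
+    ceph-volume lvm trigger 0-8715BEB4-15C5-49DE-BA6F-401086EC7B41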
diff --git a/src/ceph/doc/ceph-volume/lvm/zap.rst b/src/ceph/doc/ceph-volume/lvm/zap.rst
new file mode 100644
index 0000000..8d42a90
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/lvm/zap.rst
@@ -0,0 +1,19 @@
+.. _ceph-volume-lvm-zap:
+
+``zap``
+=======
+
+This subcommand is used to zap lvs or partitions that have been used by Ceph
+OSDs so that they may be reused. If given a path to a logical volume, it must
+be in the ``vg/lv`` format. Any filesystems present on the given lv or
+partition will be removed and all data will be purged.
+
+.. note:: The lv or partition will be kept intact.
+
+Zapping a logical volume::
+
+ ceph-volume lvm zap {vg name/lv name}
+
+Zapping a partition::
+
+ ceph-volume lvm zap /dev/sdc1
diff --git a/src/ceph/doc/ceph-volume/simple/activate.rst b/src/ceph/doc/ceph-volume/simple/activate.rst
new file mode 100644
index 0000000..edbb1e3
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/simple/activate.rst
@@ -0,0 +1,80 @@
+.. _ceph-volume-simple-activate:
+
+``activate``
+============
+Once :ref:`ceph-volume-simple-scan` has been completed, and all the metadata
+captured for an OSD has been persisted to ``/etc/ceph/osd/{id}-{uuid}.json``,
+the OSD is ready to get "activated".
+
+This activation process **disables** all ``ceph-disk`` systemd units by masking
+them, to prevent the UDEV/ceph-disk interaction that will attempt to start them
+up at boot time.
+
+The disabling of ``ceph-disk`` units is done only when calling ``ceph-volume
+simple activate`` directly, but it is avoided when being called by systemd
+when the system is booting up.
+
+The activation process requires using both the :term:`OSD id` and
+:term:`OSD uuid`. To activate parsed OSDs::
+
+ ceph-volume simple activate 0 6cc43680-4f6e-4feb-92ff-9c7ba204120e
+
+The above command will assume that a JSON configuration will be found in::
+
+ /etc/ceph/osd/0-6cc43680-4f6e-4feb-92ff-9c7ba204120e.json
+
+Alternatively, using a path to a JSON file directly is also possible::
+
+ ceph-volume simple activate --file /etc/ceph/osd/0-6cc43680-4f6e-4feb-92ff-9c7ba204120e.json
+
+requiring uuids
+^^^^^^^^^^^^^^^
+The :term:`OSD uuid` is required as an extra step to ensure that the right
+OSD is being activated. It is entirely possible that a previous OSD with the
+same id exists, and without the uuid the incorrect one could end up being
+activated.
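+
+If the uuid is not known, the persisted JSON files can be listed to find the
+matching pair of id and uuid (this assumes a prior ``scan`` has written
+them)::
+
+    ls /etc/ceph/osd/
+    0-6cc43680-4f6e-4feb-92ff-9c7ba204120e.json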
+
+
+Discovery
+---------
+With OSDs previously scanned by ``ceph-volume``, a *discovery* process is
+performed using ``blkid`` and ``lvm``. There is currently support only for
+devices with GPT partitions and LVM logical volumes.
+
+The GPT partitions will have a ``PARTUUID`` that can be queried by calling out
+to ``blkid``, and the logical volumes will have a ``lv_uuid`` that can be
+queried against ``lvs`` (the LVM tool to list logical volumes).
+
+This discovery process ensures that devices can be correctly detected even if
+they are repurposed into another system or if their name changes (as in the
+case of non-persistent names like ``/dev/sda1``).
+
+The JSON configuration file used to map what devices go to what OSD will then
+coordinate the mounting and symlinking as part of activation.
+
+To ensure that the symlinks are always correct, the symlinks will be re-done
+if they already exist in the OSD directory.
+
+A systemd unit will capture the :term:`OSD id` and :term:`OSD uuid` and
+persist them. Internally, the activation will enable the unit like::
+
+ systemctl enable ceph-volume@simple-$id-$uuid
+
+For example::
+
+ systemctl enable ceph-volume@simple-0-8715BEB4-15C5-49DE-BA6F-401086EC7B41
+
+Would start the discovery process for the OSD with an id of ``0`` and a UUID of
+``8715BEB4-15C5-49DE-BA6F-401086EC7B41``.
+
+
+The systemd process will call out to activate, passing the information needed
+to identify the OSD and its devices, and it will proceed to:
+
+#. mount the device in the corresponding location (by convention this is
+   ``/var/lib/ceph/osd/<cluster name>-<osd id>/``)
+
+#. ensure that all required devices are ready for that OSD and properly
+   linked, regardless of the objectstore used (filestore or bluestore). The
+   symbolic link will **always** be re-done to ensure that the correct device
+   is linked.
+
+#. start the ``ceph-osd@0`` systemd unit
diff --git a/src/ceph/doc/ceph-volume/simple/index.rst b/src/ceph/doc/ceph-volume/simple/index.rst
new file mode 100644
index 0000000..6f2534a
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/simple/index.rst
@@ -0,0 +1,19 @@
+.. _ceph-volume-simple:
+
+``simple``
+==========
+Implements the functionality needed to manage OSDs from the ``simple`` subcommand:
+``ceph-volume simple``
+
+**Command Line Subcommands**
+
+* :ref:`ceph-volume-simple-scan`
+
+* :ref:`ceph-volume-simple-activate`
+
+* :ref:`ceph-volume-simple-systemd`
+
+
+By *taking over* management, it disables all ``ceph-disk`` systemd units used
+to trigger devices at startup, relying on basic (customizable) JSON
+configuration and systemd for starting up OSDs.
diff --git a/src/ceph/doc/ceph-volume/simple/scan.rst b/src/ceph/doc/ceph-volume/simple/scan.rst
new file mode 100644
index 0000000..afeddab
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/simple/scan.rst
@@ -0,0 +1,158 @@
+.. _ceph-volume-simple-scan:
+
+``scan``
+========
+Scanning allows capturing any important details from an already-deployed OSD
+so that ``ceph-volume`` can manage it without needing any other startup
+workflows or tools (like ``udev`` or ``ceph-disk``).
+
+The command can inspect a running OSD by looking at the directory where the
+OSD data is stored, or by consuming the data partition.
+
+Once scanned, the information will (by default) be persisted as JSON in
+a file in ``/etc/ceph/osd``. This ``JSON`` file uses the naming convention
+``{OSD ID}-{OSD FSID}.json``. For an OSD with an id of 1 and an FSID of
+``86ebd829-1405-43d3-8fd6-4cbc9b6ecf96``, the absolute path of the file would
+be::
+
+ /etc/ceph/osd/1-86ebd829-1405-43d3-8fd6-4cbc9b6ecf96.json
+
+The ``scan`` subcommand will refuse to write to this file if it already exists.
+If overwriting the contents is needed, the ``--force`` flag must be used::
+
+ ceph-volume simple scan --force {path}
+
+If there is no need to persist the ``JSON`` metadata, there is support to send
+the contents to ``stdout`` (no file will be written)::
+
+ ceph-volume simple scan --stdout {path}
+
+
+.. _ceph-volume-simple-scan-directory:
+
+Directory scan
+--------------
+The directory scan will capture OSD file contents from interesting files. There
+are a few files that must exist in order to have a successful scan:
+
+* ``ceph_fsid``
+* ``fsid``
+* ``keyring``
+* ``ready``
+* ``type``
+* ``whoami``
+
+In the case of any other file, as long as it is not a binary or a directory, it
+will also get captured and persisted as part of the JSON object.
+
+The convention for the keys in the JSON object is that any file name will be
+a key, and its contents will be its value. If the contents are a single line
+(as in the case of ``whoami``) the contents are trimmed, and the newline is
+dropped. For example, with an OSD with an id of 1, this is how the JSON entry
+would look::
+
+ "whoami": "1",
+
+For files that may have more than one line, the contents are left as-is, for
+example, a ``keyring`` could look like this::
+
+ "keyring": "[osd.1]\n\tkey = AQBBJ/dZp57NIBAAtnuQS9WOS0hnLVe0rZnE6Q==\n",
+
+For a directory like ``/var/lib/ceph/osd/ceph-1``, the command could look
+like::
+
+    ceph-volume simple scan /var/lib/ceph/osd/ceph-1
+
+
+.. note:: There is no support for encrypted OSDs
+
+
+.. _ceph-volume-simple-scan-device:
+
+Device scan
+-----------
+When an OSD directory is not available (the OSD is not running, or the device
+is not mounted) the ``scan`` command is able to introspect the device to
+capture the required data. Just like :ref:`ceph-volume-simple-scan-directory`,
+a few files must still be present. This means that the device to be scanned
+**must be** the data partition of the OSD.
+
+As long as the data partition of the OSD is being passed in as an argument, the
+sub-command can scan its contents.
+
+In the case where the device is already mounted, the tool can detect this
+scenario and capture file contents from that directory.
+
+If the device is not mounted, a temporary directory will be created, and the
+device will be mounted temporarily just for scanning the contents. Once
+contents are scanned, the device will be unmounted.
+
+For a device like ``/dev/sda1`` which **must** be a data partition, the command
+could look like::
+
+ ceph-volume simple scan /dev/sda1
+
+
+.. note:: There is no support for encrypted OSDs
+
+
+.. _ceph-volume-simple-scan-json:
+
+``JSON`` contents
+-----------------
+The contents of the JSON object are very simple. The scan will not only
+persist information from the special OSD files and their contents, but will
+also validate paths and device UUIDs. Unlike ``ceph-disk``, which stores them
+in ``{device type}_uuid`` files, the tool persists them as part of the device
+type key.
+
+For example, a ``block.db`` device would look something like::
+
+ "block.db": {
+ "path": "/dev/disk/by-partuuid/6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+ "uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e"
+ },
+
+But it will also persist the special file generated by ``ceph-disk``, like so::
+
+ "block.db_uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+
+This duplication is in place because the tool is trying to ensure the
+following:
+
+#. Support OSDs that may not have ceph-disk special files
+#. Check the most up-to-date information on the device, by querying against
+   LVM and ``blkid``
+#. Support both logical volumes and GPT devices
+
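+To pull a single value out of a persisted file, any JSON tool works; for
+example, with ``jq`` (assuming it is installed, and using the sample file
+below)::
+
+    jq -r '.data.uuid' /etc/ceph/osd/3-86ebd829-1405-43d3-8fd6-4cbc9b6ecf96.json
+    86ebd829-1405-43d3-8fd6-4cbc9b6ecf96
+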
+This is sample ``JSON`` metadata from an OSD that is using ``bluestore``::
+
+ {
+ "active": "ok",
+ "block": {
+ "path": "/dev/disk/by-partuuid/40fd0a64-caa5-43a3-9717-1836ac661a12",
+ "uuid": "40fd0a64-caa5-43a3-9717-1836ac661a12"
+ },
+ "block.db": {
+ "path": "/dev/disk/by-partuuid/6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+ "uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e"
+ },
+ "block.db_uuid": "6cc43680-4f6e-4feb-92ff-9c7ba204120e",
+ "block_uuid": "40fd0a64-caa5-43a3-9717-1836ac661a12",
+ "bluefs": "1",
+ "ceph_fsid": "c92fc9eb-0610-4363-aafc-81ddf70aaf1b",
+ "cluster_name": "ceph",
+ "data": {
+ "path": "/dev/sdr1",
+ "uuid": "86ebd829-1405-43d3-8fd6-4cbc9b6ecf96"
+ },
+ "fsid": "86ebd829-1405-43d3-8fd6-4cbc9b6ecf96",
+ "keyring": "[osd.3]\n\tkey = AQBBJ/dZp57NIBAAtnuQS9WOS0hnLVe0rZnE6Q==\n",
+ "kv_backend": "rocksdb",
+ "magic": "ceph osd volume v026",
+ "mkfs_done": "yes",
+ "ready": "ready",
+ "systemd": "",
+ "type": "bluestore",
+ "whoami": "3"
+ }
diff --git a/src/ceph/doc/ceph-volume/simple/systemd.rst b/src/ceph/doc/ceph-volume/simple/systemd.rst
new file mode 100644
index 0000000..aa5bebf
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/simple/systemd.rst
@@ -0,0 +1,28 @@
+.. _ceph-volume-simple-systemd:
+
+systemd
+=======
+Upon startup, it will identify the OSD by loading the JSON file in
+``/etc/ceph/osd/{id}-{uuid}.json`` corresponding to the instance name of the
+systemd unit.
+
+After identifying the correct device it will then proceed to mount it by
+using the OSD destination conventions, that is::
+
+ /var/lib/ceph/osd/{cluster name}-{osd id}
+
+For our example OSD with an id of ``0``, that means the identified device will
+be mounted at::
+
+
+ /var/lib/ceph/osd/ceph-0
+
+
+Once that process is complete, a call will be made to start the OSD::
+
+ systemctl start ceph-osd@0
+
+The systemd portion of this process is handled by the ``ceph-volume simple
+trigger`` sub-command, which is only in charge of parsing metadata coming
+from systemd and startup, and then dispatching to ``ceph-volume simple
+activate``, which will proceed with activation.
diff --git a/src/ceph/doc/ceph-volume/systemd.rst b/src/ceph/doc/ceph-volume/systemd.rst
new file mode 100644
index 0000000..6cbc112
--- /dev/null
+++ b/src/ceph/doc/ceph-volume/systemd.rst
@@ -0,0 +1,49 @@
+.. _ceph-volume-systemd:
+
+systemd
+=======
+As part of the activation process (either with :ref:`ceph-volume-lvm-activate`
+or :ref:`ceph-volume-simple-activate`), systemd units will get enabled that
+will use the OSD id and uuid as part of their name. These units will be run
+when the system boots, and will proceed to activate their corresponding
+volumes via their sub-command implementation.
+
+The API for activation is a bit loose: it only requires two parts, the
+subcommand to use and any extra metadata, separated by a dash. This
+convention makes the units look like::
+
+ ceph-volume@{command}-{extra metadata}
+
+The *extra metadata* can be anything the subcommand implementing the
+processing might need. In the case of :ref:`ceph-volume-lvm` and
+:ref:`ceph-volume-simple`, both look to consume the :term:`OSD id` and
+:term:`OSD uuid`, but this is not a hard requirement; it is just how the
+sub-commands are implemented.
+
+Both the command and the extra metadata get persisted by systemd as part of
+the *"instance name"* of the unit. For example, an OSD with an ID of 0, for
+the ``lvm`` sub-command, would look like::
+
+ systemctl enable ceph-volume@lvm-0-0A3E1ED2-DA8A-4F0E-AA95-61DEC71768D6
+
+The enabled unit is a :term:`systemd oneshot` service, meant to start at boot
+after the local filesystem is ready to be used.
+
+
+Failure and Retries
+-------------------
+It is common to have failures when a system is coming online. Devices are
+sometimes not fully available, and this unpredictable behavior may cause an
+OSD not to be ready to be used.
+
+There are two configurable environment variables used to set the retry
+behavior:
+
+* ``CEPH_VOLUME_SYSTEMD_TRIES``: Defaults to 30
+* ``CEPH_VOLUME_SYSTEMD_INTERVAL``: Defaults to 5
+
+The *"tries"* is a number that sets the maximum amount of times the unit will
+attempt to activate an OSD before giving up.
+
+The *"interval"* is a value in seconds that determines the waiting time before
+initiating another try at activating the OSD.
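+
+One possible way to override these values is a systemd drop-in that sets the
+environment for the unit (the path and values below are illustrative)::
+
+    # /etc/systemd/system/ceph-volume@.service.d/retries.conf
+    [Service]
+    Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
+    Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10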