author | Qiaowei Ren <qiaowei.ren@intel.com> | 2018-01-04 13:43:33 +0800 |
---|---|---|
committer | Qiaowei Ren <qiaowei.ren@intel.com> | 2018-01-05 11:59:39 +0800 |
commit | 812ff6ca9fcd3e629e49d4328905f33eee8ca3f5 (patch) | |
tree | 04ece7b4da00d9d2f98093774594f4057ae561d4 /src/ceph/doc/mgr | |
parent | 15280273faafb77777eab341909a3f495cf248d9 (diff) |
initial code repo
This patch creates initial code repo.
For ceph, luminous stable release will be used for base code,
and next changes and optimization for ceph will be added to it.
For opensds, currently any changes can be upstreamed into original
opensds repo (https://github.com/opensds/opensds), and so stor4nfv
will directly clone opensds code to deploy stor4nfv environment.
And the scripts for deployment based on ceph and opensds will be
put into 'ci' directory.
Change-Id: I46a32218884c75dda2936337604ff03c554648e4
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Diffstat (limited to 'src/ceph/doc/mgr')
-rw-r--r-- | src/ceph/doc/mgr/administrator.rst | 159 | ||||
-rw-r--r-- | src/ceph/doc/mgr/dashboard.rst | 59 | ||||
-rw-r--r-- | src/ceph/doc/mgr/index.rst | 36 | ||||
-rw-r--r-- | src/ceph/doc/mgr/influx.rst | 162 | ||||
-rw-r--r-- | src/ceph/doc/mgr/localpool.rst | 35 | ||||
-rw-r--r-- | src/ceph/doc/mgr/plugins.rst | 215 | ||||
-rw-r--r-- | src/ceph/doc/mgr/prometheus.rst | 219 | ||||
-rw-r--r-- | src/ceph/doc/mgr/restful.rst | 89 | ||||
-rw-r--r-- | src/ceph/doc/mgr/zabbix.rst | 104 |
9 files changed, 1078 insertions, 0 deletions
diff --git a/src/ceph/doc/mgr/administrator.rst b/src/ceph/doc/mgr/administrator.rst new file mode 100644 index 0000000..11daf3e --- /dev/null +++ b/src/ceph/doc/mgr/administrator.rst @@ -0,0 +1,159 @@ +.. _mgr-administrator-guide: + +ceph-mgr administrator's guide +============================== + +Manual setup +------------ + +Usually, you would set up a ceph-mgr daemon using a tool such +as ceph-ansible. These instructions describe how to set up +a ceph-mgr daemon manually. + +First, create an authentication key for your daemon:: + + ceph auth get-or-create mgr.$name mon 'allow profile mgr' osd 'allow *' mds 'allow *' + +Place that key into ``mgr data`` path, which for a cluster "ceph" +and mgr $name "foo" would be ``/var/lib/ceph/mgr/ceph-foo``. + +Start the ceph-mgr daemon:: + + ceph-mgr -i $name + +Check that the mgr has come up by looking at the output +of ``ceph status``, which should now include a mgr status line:: + + mgr active: $name + +Client authentication +--------------------- + +The manager is a new daemon which requires new CephX capabilities. If you upgrade +a cluster from an old version of Ceph, or use the default install/deploy tools, +your admin client should get this capability automatically. If you use tooling from +elsewhere, you may get EACCES errors when invoking certain ceph cluster commands. +To fix that, add a "mgr allow \*" stanza to your client's cephx capabilities by +`Modifying User Capabilities`_. + +High availability +----------------- + +In general, you should set up a ceph-mgr on each of the hosts +running a ceph-mon daemon to achieve the same level of availability. + +By default, whichever ceph-mgr instance comes up first will be made +active by the monitors, and the others will be standbys. There is +no requirement for quorum among the ceph-mgr daemons. + +If the active daemon fails to send a beacon to the monitors for +more than ``mon mgr beacon grace`` (default 30s), then it will be replaced +by a standby. 
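The failover rule described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the monitor's actual code: the names ``MgrState`` and ``pick_active`` are hypothetical, and promoting the first live standby in list order is an assumption (the guide does not say which standby is chosen). The two defaults are the ones quoted in this guide.

```python
from dataclasses import dataclass

# Defaults quoted in this guide; all names below are hypothetical.
MGR_TICK_PERIOD = 5         # seconds between beacons ("mgr tick period")
MON_MGR_BEACON_GRACE = 30   # seconds ("mon mgr beacon grace")


@dataclass
class MgrState:
    name: str
    last_beacon: float  # time the last beacon was seen, in seconds


def pick_active(active, standbys, now, grace=MON_MGR_BEACON_GRACE):
    """Return the daemon that should be active at time `now`, or None."""
    # The active daemon keeps its role while its last beacon is within grace.
    if now - active.last_beacon <= grace:
        return active
    # Otherwise promote the first standby that is still sending beacons
    # (assumption: list order decides; the guide does not specify this).
    for standby in standbys:
        if now - standby.last_beacon <= grace:
            return standby
    return None


active = MgrState("x", last_beacon=100.0)
standby = MgrState("y", last_beacon=128.0)
print(pick_active(active, [standby], now=131.0).name)
```

With the default tick period, an active daemon has roughly six beacon opportunities inside one grace window before it is replaced.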
+ +If you want to pre-empt failover, you can explicitly mark a ceph-mgr +daemon as failed using ``ceph mgr fail <mgr name>``. + +Using modules +------------- + +Use the command ``ceph mgr module ls`` to see which modules are +available, and which are currently enabled. Enable or disable modules +using the commands ``ceph mgr module enable <module>`` and +``ceph mgr module disable <module>`` respectively. + +If a module is *enabled* then the active ceph-mgr daemon will load +and execute it. In the case of modules that provide a service, +such as an HTTP server, the module may publish its address when it +is loaded. To see the addresses of such modules, use the command +``ceph mgr services``. + +Some modules may also implement a special standby mode which runs on +standby ceph-mgr daemons as well as the active daemon. This enables +modules that provide services to redirect their clients to the active +daemon, if the client tries to connect to a standby. + +Consult the documentation pages for individual manager modules for more +information about what functionality each module provides. 
+ +Here is an example of enabling the ``dashboard`` module: + +:: + + $ ceph mgr module ls + { + "enabled_modules": [ + "restful", + "status" + ], + "disabled_modules": [ + "dashboard" + ] + } + + $ ceph mgr module enable dashboard + $ ceph mgr module ls + { + "enabled_modules": [ + "restful", + "status", + "dashboard" + ], + "disabled_modules": [ + ] + } + + $ ceph mgr services + { + "dashboard": "http://myserver.com:7789/", + "restful": "https://myserver.com:8789/" + } + + +Calling module commands +----------------------- + +Where a module implements command line hooks, the commands will +be accessible as ordinary Ceph commands:: + + ceph <command | help> + +If you would like to see the list of commands handled by the +manager (where normal ``ceph help`` would show all mon and mgr commands), +you can send a command directly to the manager daemon:: + + ceph tell mgr help + +Note that it is not necessary to address a particular mgr instance, +simply ``mgr`` will pick the current active daemon. + +Configuration +------------- + +OPTION(mgr_module_path, OPT_STR, CEPH_PKGLIBDIR "/mgr") // where to load python modules from + +``mgr module path`` + +:Description: Path to load modules from +:Type: String +:Default: ``"<library dir>/mgr"`` + +``mgr data`` + +:Description: Path to load daemon data (such as keyring) +:Type: String +:Default: ``"/var/lib/ceph/mgr/$cluster-$id"`` + +``mgr tick period`` + +:Description: How many seconds between mgr beacons to monitors, and other + periodic checks. +:Type: Integer +:Default: ``5`` + +``mon mgr beacon grace`` + +:Description: How long after last beacon should a mgr be considered failed +:Type: Integer +:Default: ``30`` + +.. 
_Modifying User Capabilities: ../../rados/operations/user-management/#modify-user-capabilities diff --git a/src/ceph/doc/mgr/dashboard.rst b/src/ceph/doc/mgr/dashboard.rst new file mode 100644 index 0000000..4c2116b --- /dev/null +++ b/src/ceph/doc/mgr/dashboard.rst @@ -0,0 +1,59 @@ +dashboard plugin +================ + +Dashboard plugin visualizes the statistics of the cluster using a web server +hosted by ``ceph-mgr``. + +Enabling +-------- + +The *dashboard* module is enabled with:: + + ceph mgr module enable dashboard + +Configuration +------------- + +Like most web applications, dashboard binds to a host name and port. +By default, the ``ceph-mgr`` daemon hosting the dashboard (i.e., the +currently active manager) will bind to port 7000 and any available +IPv4 or IPv6 address on the host. + +Since each ``ceph-mgr`` hosts its own instance of dashboard, it may +also be necessary to configure them separately. The hostname and port +can be changed via the configuration key facility:: + + ceph config-key set mgr/dashboard/$name/server_addr $IP + ceph config-key set mgr/dashboard/$name/server_port $PORT + +where ``$name`` is the ID of the ceph-mgr that is hosting this +dashboard web app. + +These settings can also be configured cluster-wide and not manager +specific. For example,:: + + ceph config-key set mgr/dashboard/server_addr $IP + ceph config-key set mgr/dashboard/server_port $PORT + +If the port is not configured, the web app will bind to port ``7000``. +If the address is not configured, the web app will bind to ``::``, +which corresponds to all available IPv4 and IPv6 addresses. + +You can configure a prefix for all URLs:: + + ceph config-key set mgr/dashboard/url_prefix $PREFIX + +so you can access the dashboard at ``http://$IP:$PORT/$PREFIX/``. + + +Load balancer +------------- + +Please note that the dashboard will *only* start on the manager which +is active at that moment. 
Query the Ceph cluster status to see which +manager is active (e.g., ``ceph mgr dump``). In order to make the +dashboard available via a consistent URL regardless of which manager +daemon is currently active, you may want to set up a load balancer +front-end to direct traffic to whichever manager endpoint is +available. If you use a reverse http proxy that forwards a subpath to +the dashboard, you need to configure ``url_prefix`` (see above). diff --git a/src/ceph/doc/mgr/index.rst b/src/ceph/doc/mgr/index.rst new file mode 100644 index 0000000..53844ba --- /dev/null +++ b/src/ceph/doc/mgr/index.rst @@ -0,0 +1,36 @@ + + +=================== +Ceph Manager Daemon +=================== + +The :term:`Ceph Manager` daemon (ceph-mgr) runs alongside monitor daemons, +to provide additional monitoring and interfaces to external monitoring +and management systems. + +Since the 12.x (*luminous*) Ceph release, the ceph-mgr daemon is required for +normal operations. The ceph-mgr daemon is an optional component in +the 11.x (*kraken*) Ceph release. + +By default, the manager daemon requires no additional configuration, beyond +ensuring it is running. If there is no mgr daemon running, you will +see a health warning to that effect, and some of the other information +in the output of `ceph status` will be missing or stale until a mgr is started. + +Use your normal deployment tools, such as ceph-ansible or ceph-deploy, to +set up ceph-mgr daemons on each of your mon nodes. It is not mandatory +to place mgr daemons on the same nodes as mons, but it is almost always +sensible. + +.. 
toctree:: + :maxdepth: 1 + + Installation and Configuration <administrator> + Writing plugins <plugins> + Dashboard plugin <dashboard> + Local pool plugin <localpool> + RESTful plugin <restful> + Zabbix plugin <zabbix> + Prometheus plugin <prometheus> + Influx plugin <influx> + diff --git a/src/ceph/doc/mgr/influx.rst b/src/ceph/doc/mgr/influx.rst new file mode 100644 index 0000000..37aa5cd --- /dev/null +++ b/src/ceph/doc/mgr/influx.rst @@ -0,0 +1,162 @@ +============= +Influx Plugin +============= + +The influx plugin continuously collects and sends time series data to an +InfluxDB database. + +The influx plugin was introduced in the 13.x *Mimic* release. + +-------- +Enabling +-------- + +To enable the module, use the following command: + +:: + + ceph mgr module enable influx + +If you wish to subsequently disable the module, you can use the equivalent +*disable* command: + +:: + + ceph mgr module disable influx + +------------- +Configuration +------------- + +For the influx module to send statistics to an InfluxDB server, it +is necessary to configure the server's address and some authentication +credentials. + +Set configuration values using the following command: + +:: + + ceph config-key set mgr/influx/<key> <value> + + +The most important settings are ``hostname``, ``username`` and ``password``. +For example, a typical configuration might look like this: + +:: + + ceph config-key set mgr/influx/hostname influx.mydomain.com + ceph config-key set mgr/influx/username admin123 + ceph config-key set mgr/influx/password p4ssw0rd + +Additional optional configuration settings are: + +:interval: Time between reports to InfluxDB. Default 5 seconds. +:database: InfluxDB database name. Default "ceph". You will need to create this database and grant write privileges to the configured username or the username must have admin privileges to create it. +:port: InfluxDB server port. 
Default 8086 + + +--------- +Debugging +--------- + +By default, a few debugging statements as well as error statements have been set to print in the log files. Users can add more if necessary. +To make use of the debugging option in the module: + +- Add this to the ceph.conf file.:: + + [mgr] + debug_mgr = 20 + +- Use this command ``ceph tell mgr.<mymonitor> influx self-test``. +- Check the log files. Users may find it easier to filter the log files using *mgr[influx]*. + +-------------------- +Interesting counters +-------------------- + +The following tables describe a subset of the values output by +this module. + +^^^^^ +Pools +^^^^^ + ++---------------+-----------------------------------------------------+ +|Counter | Description | ++===============+=====================================================+ +|bytes_used | Bytes used in the pool not including copies | ++---------------+-----------------------------------------------------+ +|max_avail | Max available number of bytes in the pool | ++---------------+-----------------------------------------------------+ +|objects | Number of objects in the pool | ++---------------+-----------------------------------------------------+ +|wr_bytes | Number of bytes written in the pool | ++---------------+-----------------------------------------------------+ +|dirty | Number of bytes dirty in the pool | ++---------------+-----------------------------------------------------+ +|rd_bytes | Number of bytes read in the pool | ++---------------+-----------------------------------------------------+ +|raw_bytes_used | Bytes used in pool including copies made | ++---------------+-----------------------------------------------------+ + +^^^^ +OSDs +^^^^ + ++------------+------------------------------------+ +|Counter | Description | ++============+====================================+ +|op_w | Client write operations | ++------------+------------------------------------+ +|op_in_bytes | Client operations total write size | 
++------------+------------------------------------+ +|op_r | Client read operations | ++------------+------------------------------------+ +|op_out_bytes| Client operations total read size | ++------------+------------------------------------+ + + ++------------------------+--------------------------------------------------------------------------+ +|Counter | Description | ++========================+==========================================================================+ +|op_wip | Replication operations currently being processed (primary) | ++------------------------+--------------------------------------------------------------------------+ +|op_latency | Latency of client operations (including queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_process_latency | Latency of client operations (excluding queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_prepare_latency | Latency of client operations (excluding queue time and wait for finished)| ++------------------------+--------------------------------------------------------------------------+ +|op_r_latency | Latency of read operation (including queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_r_process_latency | Latency of read operation (excluding queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_w_in_bytes | Client data written | ++------------------------+--------------------------------------------------------------------------+ +|op_w_latency | Latency of write operation (including queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_w_process_latency | Latency of write operation (excluding queue time) | 
++------------------------+--------------------------------------------------------------------------+ +|op_w_prepare_latency | Latency of write operations (excluding queue time and wait for finished) | ++------------------------+--------------------------------------------------------------------------+ +|op_rw | Client read-modify-write operations | ++------------------------+--------------------------------------------------------------------------+ +|op_rw_in_bytes | Client read-modify-write operations write in | ++------------------------+--------------------------------------------------------------------------+ +|op_rw_out_bytes | Client read-modify-write operations read out | ++------------------------+--------------------------------------------------------------------------+ +|op_rw_latency | Latency of read-modify-write operation (including queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_rw_process_latency | Latency of read-modify-write operation (excluding queue time) | ++------------------------+--------------------------------------------------------------------------+ +|op_rw_prepare_latency | Latency of read-modify-write operations (excluding queue time | +| | and wait for finished) | ++------------------------+--------------------------------------------------------------------------+ +|op_before_queue_op_lat | Latency of IO before calling queue (before really queue into ShardedOpWq)| +| | op_before_dequeue_op_lat | ++------------------------+--------------------------------------------------------------------------+ +|op_before_dequeue_op_lat| Latency of IO before calling dequeue_op(already dequeued and get PG lock)| ++------------------------+--------------------------------------------------------------------------+ + +Latency counters are measured in microseconds unless otherwise specified in the description. 
+ diff --git a/src/ceph/doc/mgr/localpool.rst b/src/ceph/doc/mgr/localpool.rst new file mode 100644 index 0000000..5779b7c --- /dev/null +++ b/src/ceph/doc/mgr/localpool.rst @@ -0,0 +1,35 @@ +Local pool plugin +================= + +The *localpool* plugin can automatically create RADOS pools that are +localized to a subset of the overall cluster. For example, by default, it will +create a pool for each distinct rack in the cluster. This can be useful for some +deployments that want to distribute some data locally as well as globally across the cluster. + +Enabling +-------- + +The *localpool* module is enabled with:: + + ceph mgr module enable localpool + +Configuring +----------- + +The *localpool* module understands the following options: + +* **subtree** (default: `rack`): which CRUSH subtree type the module + should create a pool for. +* **failure_domain** (default: `host`): what failure domain we should + separate data replicas across. +* **pg_num** (default: `128`): number of PGs to create for each pool +* **num_rep** (default: `3`): number of replicas for each pool. + (Currently, pools are always replicated.) +* **min_size** (default: none): value to set min_size to (unchanged from Ceph's default if this option is not set) +* **prefix** (default: `by-$subtreetype-`): prefix for the pool name. + +These options are set via the config-key interface. For example, to +change the replication level to 2x with only 64 PGs, :: + + ceph config-key set mgr/localpool/num_rep 2 + ceph config-key set mgr/localpool/pg_num 64 diff --git a/src/ceph/doc/mgr/plugins.rst b/src/ceph/doc/mgr/plugins.rst new file mode 100644 index 0000000..a75c14c --- /dev/null +++ b/src/ceph/doc/mgr/plugins.rst @@ -0,0 +1,215 @@ + +ceph-mgr plugin author guide +============================ + +Creating a plugin +----------------- + +In pybind/mgr/, create a python module. Within your module, create a class +named ``Module`` that inherits from ``MgrModule``. 
+ +The most important methods to override are: + +* a ``serve`` member function for server-type modules. This + function should block forever. +* a ``notify`` member function if your module needs to + take action when new cluster data is available. +* a ``handle_command`` member function if your module + exposes CLI commands. + +Installing a plugin +------------------- + +Once your module is present in the location set by the +``mgr module path`` configuration setting, you can enable it +via the ``ceph mgr module enable`` command:: + + ceph mgr module enable mymodule + +Note that the MgrModule interface is not stable, so any modules maintained +outside of the Ceph tree are liable to break when run against any newer +or older versions of Ceph. + +Logging +------- + +MgrModule instances have a ``log`` property which is a logger instance that +sends log messages into the Ceph logging layer where they will be recorded +in the mgr daemon's log file. + +Use it the same way you would any other python logger. The python +log levels debug, info, warn, err are mapped into the Ceph +severities 20, 4, 1 and 0 respectively. + +Exposing commands +----------------- + +Set the ``COMMANDS`` class attribute of your plugin to a list of dicts +like this:: + + COMMANDS = [ + { + "cmd": "foobar name=myarg,type=CephString", + "desc": "Do something awesome", + "perm": "rw" + } + ] + +The ``cmd`` part of each entry is parsed in the same way as internal +Ceph mon and admin socket commands (see mon/MonCommands.h in +the Ceph source for examples) + +Config settings +--------------- + +Modules have access to a simple key/value store (keys and values are +byte strings) for storing configuration. Don't use this for +storing large amounts of data. + +Config values are stored using the mon's config-key commands. 
+ +Hints for using these: + +* Reads are fast: ceph-mgr keeps a local in-memory copy +* Don't set things by hand with "ceph config-key", the mgr doesn't update + at runtime (only set things from within modules). +* Writes block until the value is persisted, but reads from another + thread will see the new value immediately. + +Any config settings you want to expose to users from your module will +need corresponding hooks in ``COMMANDS`` to expose a setter. + +Accessing cluster data +---------------------- + +Modules have access to the in-memory copies of the Ceph cluster's +state that the mgr maintains. Accessor functions are exposed +as members of MgrModule. + +Calls that access the cluster or daemon state are generally going +from Python into native C++ routines. There is some overhead to this, +but much less than for example calling into a REST API or calling into +an SQL database. + +There are no consistency rules about access to cluster structures or +daemon metadata. For example, an OSD might exist in OSDMap but +have no metadata, or vice versa. On a healthy cluster these +will be very rare transient states, but plugins should be written +to cope with the possibility. + +``get(self, data_name)`` + +Fetch named cluster-wide objects such as the OSDMap. Valid things +to fetch are osd_crush_map_text, osd_map, osd_map_tree, +osd_map_crush, config, mon_map, fs_map, osd_metadata, pg_summary, +df, osd_stats, health, mon_status. + +All these structures have their own JSON representations: experiment +or look at the C++ dump() methods to learn about them. + +``get_server(self, hostname)`` + +Fetch metadata about a particular hostname. This is information +that ceph-mgr has gleaned from the daemon metadata reported +by daemons running on a particular server. + +``list_servers(self)`` + +Like ``get_server``, but gives information about all servers (i.e. 
all +unique hostnames that have been mentioned in daemon metadata) + +``get_metadata(self, svc_type, svc_id)`` + +Fetch the daemon metadata for a particular service. svc_type is one +of osd or mds, and svc_id is a string (convert OSD integer IDs to strings +when calling this). + +``get_counter(self, svc_type, svc_name, path)`` + +Fetch the latest performance counter data for a particular counter. The +path is a period-separated concatenation of the subsystem and the counter +name, for example "mds.inodes". + +A list of two-tuples of (timestamp, value) is returned. This may be +empty if no data is available. + +Sending commands +---------------- + +A non-blocking facility is provided for sending monitor commands +to the cluster. + +``send_command(self, result, command_str, tag)`` + +The ``result`` parameter should be an instance of the CommandResult +class, defined in the same module as MgrModule. This acts as a +completion and stores the output of the command. Use CommandResult.wait() +if you want to block on completion. + +The ``command_str`` parameter is a JSON-serialized command. This +uses the same format as the ceph command line, which is a dictionary +of command arguments, with the extra ``prefix`` key containing the +command name itself. Consult MonCommands.h for available commands +and their expected arguments. + +The ``tag`` parameter is used for nonblocking operation: when +a command completes, the ``notify()`` callback on the MgrModule +instance is triggered, with notify_type set to "command", and +notify_id set to the tag of the command. + +Implementing standby mode +------------------------- + +For some modules, it is useful to run on standby manager daemons as well +as on the active daemon. For example, an HTTP server can usefully +serve HTTP redirect responses from the standby managers so that +the user can point his browser at any of the manager daemons without +having to worry about which one is active. 
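As a brief aside on the ``command_str`` format described under "Sending commands" above, the following sketch shows one way to build such a string. The helper name ``build_command`` is hypothetical, and ``osd tree`` with a ``format`` argument is used only as an illustrative command; consult MonCommands.h for the arguments a given command really accepts.

```python
import json


def build_command(prefix, **args):
    """Serialize a command dictionary for use as ``command_str``.

    `prefix` carries the command name; every other keyword becomes a
    command argument. (Hypothetical helper, not part of MgrModule.)
    """
    cmd = dict(args)
    cmd["prefix"] = prefix
    return json.dumps(cmd, sort_keys=True)


command_str = build_command("osd tree", format="json")
print(command_str)
```

Inside a module, a string like this would be passed as the second argument of ``send_command``, together with a ``CommandResult`` instance and a tag.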
+ +Standby manager daemons look for a class called ``StandbyModule`` +in each module. If the class is not found then the module is not +used at all on standby daemons. If the class is found, then +its ``serve`` method is called. Implementations of ``StandbyModule`` +must inherit from ``mgr_module.MgrStandbyModule``. + +The interface of ``MgrStandbyModule`` is much restricted compared to +``MgrModule`` -- none of the Ceph cluster state is available to +the module. ``serve`` and ``shutdown`` methods are used in the same +way as a normal module class. The ``get_active_uri`` method enables +the standby module to discover the address of its active peer in +order to make redirects. See the ``MgrStandbyModule`` definition +in the Ceph source code for the full list of methods. + +For an example of how to use this interface, look at the source code +of the ``dashboard`` module. + +Logging +------- + +Use your module's ``log`` attribute as your logger. This is a logger +configured to output via the ceph logging framework, to the local ceph-mgr +log files. + +Python log severities are mapped to ceph severities as follows: + +* DEBUG is 20 +* INFO is 4 +* WARN is 1 +* ERR is 0 + +Shutting down cleanly +--------------------- + +If a module implements the ``serve()`` method, it should also implement +the ``shutdown()`` method to shutdown cleanly: misbehaving modules +may otherwise prevent clean shutdown of ceph-mgr. + +Is something missing? +--------------------- + +The ceph-mgr python interface is not set in stone. If you have a need +that is not satisfied by the current interface, please bring it up +on the ceph-devel mailing list. While it is desired to avoid bloating +the interface, it is not generally very hard to expose existing data +to the Python code when there is a good reason. 
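The pieces of the plugin guide above (a ``Module`` class, ``COMMANDS``, ``serve``, ``shutdown`` and ``handle_command``) can be put together in a minimal sketch. Several things here are assumptions rather than facts from the guide: the ``MgrModule`` class below is a stand-in so the example runs outside ceph-mgr (a real module would import it from ``mgr_module``, and its constructor takes arguments supplied by ceph-mgr), the ``hello`` command is hypothetical, and ``handle_command`` is assumed to receive a dict of parsed arguments and return a ``(retval, stdout, stderr)`` tuple.

```python
import errno
import logging
import threading


class MgrModule(object):
    """Stand-in for the real MgrModule so this sketch runs on its own."""
    COMMANDS = []

    def __init__(self):
        self.log = logging.getLogger("mgr")


class Module(MgrModule):
    COMMANDS = [
        {
            "cmd": "hello name=who,type=CephString",  # hypothetical command
            "desc": "Say hello",
            "perm": "r",
        }
    ]

    def __init__(self):
        super(Module, self).__init__()
        self._stop = threading.Event()

    def serve(self):
        # Block until shutdown() is called, waking up periodically.
        while not self._stop.wait(timeout=0.1):
            self.log.debug("tick")

    def shutdown(self):
        # Wake serve() so that it returns and ceph-mgr can exit cleanly.
        self._stop.set()

    def handle_command(self, cmd):
        # Assumed shape: a dict with "prefix" plus the parsed arguments.
        if cmd["prefix"] == "hello":
            return 0, "Hello, {}".format(cmd["who"]), ""
        return -errno.EINVAL, "", "unknown command"
```

Using an event rather than a bare ``while True`` loop is one simple way to satisfy both requirements at once: ``serve`` blocks for the life of the module, and ``shutdown`` can still stop it promptly.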
+ diff --git a/src/ceph/doc/mgr/prometheus.rst b/src/ceph/doc/mgr/prometheus.rst new file mode 100644 index 0000000..5bae6a9 --- /dev/null +++ b/src/ceph/doc/mgr/prometheus.rst @@ -0,0 +1,219 @@ +================= +Prometheus plugin +================= + +Provides a Prometheus exporter to pass on Ceph performance counters +from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport +messages from all MgrClient processes (mons and OSDs, for instance) +with performance counter schema data and actual counter data, and keeps +a circular buffer of the last N samples. This plugin creates an HTTP +endpoint (like all Prometheus exporters) and retrieves the latest sample +of every counter when polled (or "scraped" in Prometheus terminology). +The HTTP path and query parameters are ignored; all extant counters +for all reporting entities are returned in text exposition format. +(See the Prometheus `documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.) + +Enabling prometheus output +========================== + +The *prometheus* module is enabled with:: + + ceph mgr module enable prometheus + +Configuration +------------- + +By default the module will accept HTTP requests on port ``9283`` on all +IPv4 and IPv6 addresses on the host. The port and listen address are both +configurable with ``ceph config-key set``, with keys +``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``. +This port is registered with Prometheus's `registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_. + +Statistic names and labels +========================== + +The names of the stats are exactly as Ceph names them, with +illegal characters ``.``, ``-`` and ``::`` translated to ``_``, +and ``ceph_`` prefixed to all names. + + +All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123" +that identifies the type and ID of the daemon they come from. 
Some +statistics can come from different types of daemon, so when querying +e.g. an OSD's RocksDB stats, you would probably want to filter +on ceph_daemon starting with "osd" to avoid mixing in the monitor +rocksdb stats. + + +The *cluster* statistics (i.e. those global to the Ceph cluster) +have labels appropriate to what they report on. For example, +metrics relating to pools have a ``pool_id`` label. + +Pool and OSD metadata series +---------------------------- + +Special series are output to enable displaying and querying on +certain metadata fields. + +Pools have a ``ceph_pool_metadata`` field like this: + +:: + + ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 0.0 + +OSDs have a ``ceph_osd_metadata`` field like this: + +:: + + ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",id="0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 0.0 + + +Correlating drive statistics with node_exporter +----------------------------------------------- + +The prometheus output from Ceph is designed to be used in conjunction +with the generic host monitoring from the Prometheus node_exporter. + +To enable correlation of Ceph OSD statistics with node_exporter's +drive statistics, special series are output like this: + +:: + + ceph_disk_occupation{ceph_daemon="osd.0",device="sdd",instance="myhost",job="ceph"} + +To use this to get disk statistics by OSD ID, use the ``and on`` syntax +in your prometheus query like this: + +:: + + rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"} + +See the prometheus documentation for more information about constructing +queries. + +Note that for this mechanism to work, Ceph and node_exporter must agree +about the values of the ``instance`` label. See the following section +for guidance about how to set up Prometheus in a way that sets +``instance`` properly. 
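For readers writing queries, the naming rule from "Statistic names and labels" above (``.``, ``-`` and ``::`` become ``_``, and ``ceph_`` is prefixed) can be sketched as follows. This is a literal reading of that sentence, not the module's own code, and the function name ``prometheus_name`` is hypothetical.

```python
def prometheus_name(ceph_name):
    """Translate a Ceph counter name into the exported metric name."""
    # Replace "::" first so that it maps to one underscore, not two.
    for illegal in ("::", ".", "-"):
        ceph_name = ceph_name.replace(illegal, "_")
    return "ceph_" + ceph_name


print(prometheus_name("osd.op_r_latency"))
```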
+ +Configuring Prometheus server +============================= + +See the prometheus documentation for full details of how to add +scrape endpoints: the notes +in this section are tips on how to configure Prometheus to capture +the Ceph statistics in the most usefully-labelled form. + +This configuration is necessary because Ceph is reporting metrics +from many hosts and services via a single endpoint, and some +metrics that relate to no physical host (such as pool statistics). + +honor_labels +------------ + +To enable Ceph to output properly-labelled data relating to any host, +use the ``honor_labels`` setting when adding the ceph-mgr endpoints +to your prometheus configuration. + +Without this setting, any ``instance`` labels that Ceph outputs, such +as those in ``ceph_disk_occupation`` series, will be overridden +by Prometheus. + +Ceph instance label +------------------- + +By default, Prometheus applies an ``instance`` label that includes +the hostname and port of the endpoint that the series came from. Because +Ceph clusters have multiple manager daemons, this results in an ``instance`` +label that changes spuriously when the active manager daemon changes. + +Set a custom ``instance`` label in your Prometheus target configuration: +you might wish to set it to the hostname of your first monitor, or something +completely arbitrary like "ceph_cluster". + +node_exporter instance labels +----------------------------- + +Set your ``instance`` labels to match what appears in Ceph's OSD metadata +in the ``hostname`` field. This is generally the short hostname of the node. + +This is only necessary if you want to correlate Ceph stats with host stats, +but you may find it useful to do it in all cases in case you want to do +the correlation in the future. + +Example configuration +--------------------- + +This example shows a single node configuration running ceph-mgr and +node_exporter on a server called ``senta04``. 
+
+This is just an example: there are other ways to configure prometheus
+scrape targets and label rewrite rules.
+
+prometheus.yml
+~~~~~~~~~~~~~~
+
+::
+
+    global:
+      scrape_interval: 15s
+      evaluation_interval: 15s
+
+    scrape_configs:
+      - job_name: 'node'
+        file_sd_configs:
+          - files:
+            - node_targets.yml
+      - job_name: 'ceph'
+        honor_labels: true
+        file_sd_configs:
+          - files:
+            - ceph_targets.yml
+
+
+ceph_targets.yml
+~~~~~~~~~~~~~~~~
+
+
+::
+
+    [
+        {
+            "targets": [ "senta04.mydomain.com:9283" ],
+            "labels": {
+                "instance": "ceph_cluster"
+            }
+        }
+    ]
+
+
+node_targets.yml
+~~~~~~~~~~~~~~~~
+
+::
+
+    [
+        {
+            "targets": [ "senta04.mydomain.com:9100" ],
+            "labels": {
+                "instance": "senta04"
+            }
+        }
+    ]
+
+
+Notes
+=====
+
+Counters and gauges are exported; currently histograms and long-running
+averages are not. It's possible that Ceph's 2-D histograms could be
+reduced to two separate 1-D histograms, and that long-running averages
+could be exported as Prometheus' Summary type.
+
+Timestamps, as with many Prometheus exporters, are established by
+the server's scrape time (Prometheus expects that it is polling the
+actual counter process synchronously). It is possible to supply a
+timestamp along with the stat report, but the Prometheus team strongly
+advises against this. This means that timestamps will be delayed by
+an unpredictable amount; it's not clear if this will be problematic,
+but it's worth knowing about.
diff --git a/src/ceph/doc/mgr/restful.rst b/src/ceph/doc/mgr/restful.rst
new file mode 100644
index 0000000..e67f2d1
--- /dev/null
+++ b/src/ceph/doc/mgr/restful.rst
@@ -0,0 +1,89 @@
+restful plugin
+==============
+
+The RESTful plugin offers REST API access to the status of the cluster
+over an SSL-secured connection.
+
+Enabling
+--------
+
+The *restful* module is enabled with::
+
+    ceph mgr module enable restful
+
+You will also need to configure an SSL certificate, as described below,
+before the API endpoint is available.
+By default the module will accept HTTPS
+requests on port ``8003`` on all IPv4 and IPv6 addresses on the host.
+
+Securing
+--------
+
+All connections to *restful* are secured with SSL. You can generate a
+self-signed certificate with the command::
+
+    ceph restful create-self-signed-cert
+
+Note that with a self-signed certificate most clients will need a flag
+to allow a connection and/or suppress warning messages. For example,
+if the ``ceph-mgr`` daemon is on the same host::
+
+    curl -k https://localhost:8003/
+
+To properly secure a deployment, a certificate that is signed by the
+organization's certificate authority should be used. For example, a key pair
+can be generated with a command similar to::
+
+    openssl req -new -nodes -x509 \
+      -subj "/O=IT/CN=ceph-mgr-restful" \
+      -days 3650 -keyout restful.key -out restful.crt -extensions v3_ca
+
+The ``restful.crt`` should then be signed by your organization's CA
+(certificate authority). Once that is done, you can set it with::
+
+    ceph config-key set mgr/restful/$name/crt -i restful.crt
+    ceph config-key set mgr/restful/$name/key -i restful.key
+
+where ``$name`` is the name of the ``ceph-mgr`` instance (usually the
+hostname). If all manager instances are to share the same certificate,
+you can leave off the ``$name`` portion::
+
+    ceph config-key set mgr/restful/crt -i restful.crt
+    ceph config-key set mgr/restful/key -i restful.key
+
+
+Configuring IP and port
+-----------------------
+
+Like any other RESTful API endpoint, *restful* binds to an IP and
+port. By default, the currently active ``ceph-mgr`` daemon will bind
+to port 8003 and any available IPv4 or IPv6 address on the host.
+
+Since each ``ceph-mgr`` hosts its own instance of *restful*, it may
+also be necessary to configure them separately.
+The IP and port can be changed via the configuration key facility::
+
+    ceph config-key set mgr/restful/$name/server_addr $IP
+    ceph config-key set mgr/restful/$name/server_port $PORT
+
+where ``$name`` is the ID of the ceph-mgr daemon (usually the hostname).
+
+These settings can also be configured cluster-wide rather than per
+manager. For example::
+
+    ceph config-key set mgr/restful/server_addr $IP
+    ceph config-key set mgr/restful/server_port $PORT
+
+If the port is not configured, *restful* will bind to port ``8003``.
+If the address is not configured, *restful* will bind to ``::``,
+which corresponds to all available IPv4 and IPv6 addresses.
+
+Load balancer
+-------------
+
+Please note that *restful* will *only* start on the manager which
+is active at that moment. Query the Ceph cluster status to see which
+manager is active (e.g., ``ceph mgr dump``). In order to make the
+API available via a consistent URL regardless of which manager
+daemon is currently active, you may want to set up a load balancer
+front-end to direct traffic to whichever manager endpoint is
+available.
diff --git a/src/ceph/doc/mgr/zabbix.rst b/src/ceph/doc/mgr/zabbix.rst
new file mode 100644
index 0000000..d98540e
--- /dev/null
+++ b/src/ceph/doc/mgr/zabbix.rst
@@ -0,0 +1,104 @@
+Zabbix plugin
+=============
+
+The Zabbix plugin actively sends information to a Zabbix server, such as:
+
+- Ceph status
+- I/O operations
+- I/O bandwidth
+- OSD status
+- Storage utilization
+
+Requirements
+------------
+
+The plugin requires that the *zabbix_sender* executable is present on *all*
+machines running ceph-mgr. It can be installed on most distributions using
+the package manager.
+
+Dependencies
+^^^^^^^^^^^^
+Installing zabbix_sender can be done under Ubuntu or Fedora using either apt
+or dnf.
+
+On Ubuntu Xenial:
+
+::
+
+    apt install zabbix-agent
+
+On Fedora:
+
+::
+
+    dnf install zabbix-sender
+
+
+Enabling
+--------
+
+Add this to your ceph.conf on nodes where you run ceph-mgr:
+
+::
+
+    [mgr]
+        mgr modules = zabbix
+
+If you use any other ceph-mgr modules, make sure they're in the list too.
+
+Restart the ceph-mgr daemon after modifying the setting to load the module.
+
+
+Configuration
+-------------
+
+Two configuration keys are mandatory for the module to work:
+
+- mgr/zabbix/zabbix_host
+- mgr/zabbix/identifier
+
+The parameter *zabbix_host* controls the hostname of the Zabbix server to which
+*zabbix_sender* will send the items. This can be an IP address if required by
+your installation.
+
+The *identifier* parameter controls the identifier/hostname to use as source
+when sending items to Zabbix. This should match the name of the *Host* in
+your Zabbix server.
+
+Additional configuration keys which can be configured and their default values:
+
+- mgr/zabbix/zabbix_port: 10051
+- mgr/zabbix/zabbix_sender: /usr/bin/zabbix_sender
+- mgr/zabbix/interval: 60
+
+Configuration keys
+^^^^^^^^^^^^^^^^^^^
+
+Configuration keys can be set on any machine with the proper cephx credentials;
+these are usually Monitors where the *client.admin* key is present.
+
+::
+
+    ceph config-key set <key> <value>
+
+For example:
+
+::
+
+    ceph config-key set mgr/zabbix/zabbix_host zabbix.localdomain
+    ceph config-key set mgr/zabbix/identifier ceph.eu-ams02.local
+
+Debugging
+---------
+
+Should you want to debug the Zabbix module, increase the logging level for
+ceph-mgr and check the logs.
+
+::
+
+    [mgr]
+        debug mgr = 20
+
+With logging set to debug for the manager, the plugin will print various logging
+lines prefixed with *mgr[zabbix]* for easy filtering.
+
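Editorial note on the prometheus.rst hunk: the ``and on (device,instance)`` query only returns data when both exporters emit identical ``device`` and ``instance`` values, which is why the doc insists on ``honor_labels`` and matching ``instance`` labels. The following is a toy Python model of that matching rule, not Prometheus code. The helper ``and_on``, the sample values, and the host ``otherhost`` are hypothetical and exist only for illustration.

```python
# Toy model of PromQL's `A and on (device, instance) B`:
# keep each sample of A whose (device, instance) pair also appears in B.
# `and_on` is an illustrative helper, not part of Prometheus or Ceph.

def and_on(left, right, on):
    """Return samples from `left` that have a partner in `right`
    with equal values for every label named in `on`."""
    # A missing label is treated as the empty string.
    right_keys = {tuple(labels.get(k, "") for k in on) for labels, _ in right}
    return [
        (labels, value)
        for labels, value in left
        if tuple(labels.get(k, "") for k in on) in right_keys
    ]

# Hypothetical node_exporter write rates (bytes/s) per device and host.
node_disk_write_rate = [
    ({"device": "sdd", "instance": "myhost"}, 1024.0),
    ({"device": "sde", "instance": "myhost"}, 2048.0),
    ({"device": "sdd", "instance": "otherhost"}, 512.0),
]

# The ceph_disk_occupation series from the doc, already filtered to osd.0.
osd0_occupation = [
    ({"ceph_daemon": "osd.0", "device": "sdd",
      "instance": "myhost", "job": "ceph"}, 1.0),
]

matched = and_on(node_disk_write_rate, osd0_occupation,
                 on=("device", "instance"))
print(matched)
```

Only the ``sdd`` sample on ``myhost`` survives. If Prometheus had rewritten the Ceph series' ``instance`` label to the manager's ``host:port``, as happens without ``honor_labels``, the result would be empty.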