Diffstat (limited to 'src/ceph/doc/mgr')

 -rw-r--r--  src/ceph/doc/mgr/administrator.rst  159
 -rw-r--r--  src/ceph/doc/mgr/dashboard.rst        59
 -rw-r--r--  src/ceph/doc/mgr/index.rst            36
 -rw-r--r--  src/ceph/doc/mgr/influx.rst          162
 -rw-r--r--  src/ceph/doc/mgr/localpool.rst        35
 -rw-r--r--  src/ceph/doc/mgr/plugins.rst         215
 -rw-r--r--  src/ceph/doc/mgr/prometheus.rst      219
 -rw-r--r--  src/ceph/doc/mgr/restful.rst          89
 -rw-r--r--  src/ceph/doc/mgr/zabbix.rst          104

 9 files changed, 1078 insertions, 0 deletions
diff --git a/src/ceph/doc/mgr/administrator.rst b/src/ceph/doc/mgr/administrator.rst
new file mode 100644
index 0000000..11daf3e

.. _mgr-administrator-guide:

ceph-mgr administrator's guide
==============================

Manual setup
------------

Usually, you would set up a ceph-mgr daemon using a tool such as
ceph-ansible. These instructions describe how to set up a ceph-mgr
daemon manually.

First, create an authentication key for your daemon::

    ceph auth get-or-create mgr.$name mon 'allow profile mgr' osd 'allow *' mds 'allow *'

Place that key in the ``mgr data`` path, which for a cluster named
"ceph" and an mgr ``$name`` of "foo" would be
``/var/lib/ceph/mgr/ceph-foo``.

Start the ceph-mgr daemon::

    ceph-mgr -i $name

Check that the mgr has come up by looking at the output of
``ceph status``, which should now include a mgr status line::

    mgr active: $name

Client authentication
---------------------

The manager is a new daemon that requires new CephX capabilities. If
you upgrade a cluster from an old version of Ceph, or use the default
install/deploy tools, your admin client should get this capability
automatically. If you use tooling from elsewhere, you may get EACCES
errors when invoking certain ceph cluster commands. To fix that, add a
"mgr allow \*" stanza to your client's cephx capabilities by
`Modifying User Capabilities`_.

High availability
-----------------

In general, you should set up a ceph-mgr on each of the hosts running
a ceph-mon daemon to achieve the same level of availability.

By default, whichever ceph-mgr instance comes up first will be made
active by the monitors, and the others will be standbys. There is no
requirement for quorum among the ceph-mgr daemons.

If the active daemon fails to send a beacon to the monitors for more
than ``mon mgr beacon grace`` (default 30s), it will be replaced by a
standby.

If you want to preempt failover, you can explicitly mark a ceph-mgr
daemon as failed using ``ceph mgr fail <mgr name>``.

Using modules
-------------

Use the command ``ceph mgr module ls`` to see which modules are
available and which are currently enabled. Enable or disable modules
using the commands ``ceph mgr module enable <module>`` and
``ceph mgr module disable <module>`` respectively.

If a module is *enabled*, the active ceph-mgr daemon will load and
execute it. In the case of modules that provide a service, such as an
HTTP server, the module may publish its address when it is loaded. To
see the addresses of such modules, use the command
``ceph mgr services``.

Some modules may also implement a special standby mode which runs on
standby ceph-mgr daemons as well as the active daemon. This enables
modules that provide services to redirect their clients to the active
daemon if a client tries to connect to a standby.

Consult the documentation pages for individual manager modules for more
information about what functionality each module provides.
Here is an example of enabling the ``dashboard`` module::

    $ ceph mgr module ls
    {
        "enabled_modules": [
            "restful",
            "status"
        ],
        "disabled_modules": [
            "dashboard"
        ]
    }

    $ ceph mgr module enable dashboard
    $ ceph mgr module ls
    {
        "enabled_modules": [
            "restful",
            "status",
            "dashboard"
        ],
        "disabled_modules": [
        ]
    }

    $ ceph mgr services
    {
        "dashboard": "http://myserver.com:7789/",
        "restful": "https://myserver.com:8789/"
    }

Calling module commands
-----------------------

Where a module implements command line hooks, the commands will be
accessible as ordinary Ceph commands::

    ceph <command | help>

If you would like to see the list of commands handled by the manager
(where normal ``ceph help`` would show all mon and mgr commands), you
can send a command directly to the manager daemon::

    ceph tell mgr help

Note that it is not necessary to address a particular mgr instance;
``mgr`` alone will target the currently active daemon.

Configuration
-------------

``mgr module path``

:Description: Path from which to load modules
:Type: String
:Default: ``"<library dir>/mgr"``

``mgr data``

:Description: Path from which to load daemon data (such as the keyring)
:Type: String
:Default: ``"/var/lib/ceph/mgr/$cluster-$id"``

``mgr tick period``

:Description: How many seconds between mgr beacons to monitors, and other
              periodic checks.
:Type: Integer
:Default: ``5``

``mon mgr beacon grace``

:Description: How long after the last beacon a mgr should be considered failed
:Type: Integer
:Default: ``30``

.. _Modifying User Capabilities: ../../rados/operations/user-management/#modify-user-capabilities

diff --git a/src/ceph/doc/mgr/dashboard.rst b/src/ceph/doc/mgr/dashboard.rst
new file mode 100644
index 0000000..4c2116b

dashboard plugin
================

The *dashboard* plugin visualizes the statistics of the cluster using a
web server hosted by ``ceph-mgr``.

Enabling
--------

The *dashboard* module is enabled with::

    ceph mgr module enable dashboard

Configuration
-------------

Like most web applications, the dashboard binds to a host name and
port. By default, the ``ceph-mgr`` daemon hosting the dashboard (i.e.,
the currently active manager) will bind to port 7000 and any available
IPv4 or IPv6 address on the host.

Since each ``ceph-mgr`` hosts its own instance of the dashboard, it may
also be necessary to configure the instances separately. The hostname
and port can be changed via the configuration key facility::

    ceph config-key set mgr/dashboard/$name/server_addr $IP
    ceph config-key set mgr/dashboard/$name/server_port $PORT

where ``$name`` is the ID of the ceph-mgr that is hosting this
dashboard web app.

These settings can also be configured cluster-wide rather than
per-manager. For example::

    ceph config-key set mgr/dashboard/server_addr $IP
    ceph config-key set mgr/dashboard/server_port $PORT

If the port is not configured, the web app will bind to port ``7000``.
If the address is not configured, the web app will bind to ``::``,
which corresponds to all available IPv4 and IPv6 addresses.

You can configure a prefix for all URLs::

    ceph config-key set mgr/dashboard/url_prefix $PREFIX

so you can access the dashboard at ``http://$IP:$PORT/$PREFIX/``.
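Because the dashboard follows the active manager, scripts should look
up its URL rather than hard-code it. The following minimal sketch
(assuming the ``ceph`` CLI is available and the module is enabled)
reads the address the module publishes via ``ceph mgr services``, whose
output format is shown in the administrator's guide above::

    import json
    import subprocess

    # The active mgr publishes service URLs as JSON
    out = subprocess.check_output(['ceph', 'mgr', 'services'])
    services = json.loads(out.decode('utf-8'))

    url = services.get('dashboard')
    if url:
        print('Dashboard is currently served at %s' % url)
    else:
        print('The dashboard module has not published an address')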
Load balancer
-------------

Please note that the dashboard will *only* start on the manager which
is active at that moment. Query the Ceph cluster status to see which
manager is active (e.g., ``ceph mgr dump``). In order to make the
dashboard available via a consistent URL regardless of which manager
daemon is currently active, you may want to set up a load balancer
front-end to direct traffic to whichever manager endpoint is available.
If you use a reverse HTTP proxy that forwards a subpath to the
dashboard, you need to configure ``url_prefix`` (see above).

diff --git a/src/ceph/doc/mgr/index.rst b/src/ceph/doc/mgr/index.rst
new file mode 100644
index 0000000..53844ba

===================
Ceph Manager Daemon
===================

The :term:`Ceph Manager` daemon (ceph-mgr) runs alongside monitor
daemons to provide additional monitoring and interfaces to external
monitoring and management systems.

Since the 12.x (*luminous*) Ceph release, the ceph-mgr daemon is
required for normal operations. In the 11.x (*kraken*) Ceph release it
was an optional component.

By default, the manager daemon requires no additional configuration
beyond ensuring it is running. If there is no mgr daemon running, you
will see a health warning to that effect, and some of the other
information in the output of ``ceph status`` will be missing or stale
until a mgr is started.

Use your normal deployment tools, such as ceph-ansible or ceph-deploy,
to set up ceph-mgr daemons on each of your mon nodes. It is not
mandatory to place mgr daemons on the same nodes as mons, but it is
almost always sensible.

.. toctree::
   :maxdepth: 1

   Installation and Configuration <administrator>
   Writing plugins <plugins>
   Dashboard plugin <dashboard>
   Local pool plugin <localpool>
   RESTful plugin <restful>
   Zabbix plugin <zabbix>
   Prometheus plugin <prometheus>
   Influx plugin <influx>

diff --git a/src/ceph/doc/mgr/influx.rst b/src/ceph/doc/mgr/influx.rst
new file mode 100644
index 0000000..37aa5cd

=============
Influx Plugin
=============

The influx plugin continuously collects and sends time series data to
an InfluxDB database.

The influx plugin was introduced in the 13.x *Mimic* release.

--------
Enabling
--------

To enable the module, use the following command::

    ceph mgr module enable influx

If you wish to subsequently disable the module, you can use the
equivalent *disable* command::

    ceph mgr module disable influx

-------------
Configuration
-------------

For the influx module to send statistics to an InfluxDB server, it is
necessary to configure the server's address and some authentication
credentials.

Set configuration values using the following command::

    ceph config-key set mgr/influx/<key> <value>

The most important settings are ``hostname``, ``username`` and
``password``. For example, a typical configuration might look like
this::

    ceph config-key set mgr/influx/hostname influx.mydomain.com
    ceph config-key set mgr/influx/username admin123
    ceph config-key set mgr/influx/password p4ssw0rd

Additional optional configuration settings are:

:interval: Time between reports to InfluxDB. Default 5 seconds.
:database: InfluxDB database name. Default "ceph". You will need to
    create this database and grant write privileges to the configured
    username, or the username must have admin privileges so it can
    create the database itself.
:port: InfluxDB server port. Default 8086.
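It can be useful to confirm that the configured server, credentials and
database actually work before relying on the module. Below is a minimal
sketch using the ``influxdb`` Python client package (which must be
installed separately); the hostname, credentials and database are the
example values from above::

    from influxdb import InfluxDBClient

    client = InfluxDBClient(host='influx.mydomain.com', port=8086,
                            username='admin123', password='p4ssw0rd',
                            database='ceph')

    # ping() returns the InfluxDB version string if the server is reachable
    print('InfluxDB version: %s' % client.ping())

    # The configured user must be able to see the target database
    print('Visible databases: %s' % client.get_list_database())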
---------
Debugging
---------

By default, a few debugging statements, as well as error statements,
are printed to the log files. Users can add more if necessary. To make
use of the debugging option in the module:

- Add this to the ceph.conf file::

      [mgr]
      debug_mgr = 20

- Run the command ``ceph tell mgr.<mymonitor> influx self-test``.
- Check the log files. Users may find it easier to filter the log
  files using *mgr[influx]*.

--------------------
Interesting counters
--------------------

The following tables describe a subset of the values output by this
module.

^^^^^
Pools
^^^^^

+---------------+-----------------------------------------------------+
|Counter        | Description                                         |
+===============+=====================================================+
|bytes_used     | Bytes used in the pool, not including copies        |
+---------------+-----------------------------------------------------+
|max_avail      | Max available number of bytes in the pool           |
+---------------+-----------------------------------------------------+
|objects        | Number of objects in the pool                       |
+---------------+-----------------------------------------------------+
|wr_bytes       | Number of bytes written in the pool                 |
+---------------+-----------------------------------------------------+
|dirty          | Number of bytes dirty in the pool                   |
+---------------+-----------------------------------------------------+
|rd_bytes       | Number of bytes read in the pool                    |
+---------------+-----------------------------------------------------+
|raw_bytes_used | Bytes used in the pool, including copies made       |
+---------------+-----------------------------------------------------+

^^^^
OSDs
^^^^

+------------+------------------------------------+
|Counter     | Description                        |
+============+====================================+
|op_w        | Client write operations            |
+------------+------------------------------------+
|op_in_bytes | Client operations total write size |
+------------+------------------------------------+
|op_r        | Client read operations             |
+------------+------------------------------------+
|op_out_bytes| Client operations total read size  |
+------------+------------------------------------+
+------------------------+--------------------------------------------------------------------------+
|Counter                 | Description                                                              |
+========================+==========================================================================+
|op_wip                  | Replication operations currently being processed (primary)              |
+------------------------+--------------------------------------------------------------------------+
|op_latency              | Latency of client operations (including queue time)                     |
+------------------------+--------------------------------------------------------------------------+
|op_process_latency      | Latency of client operations (excluding queue time)                     |
+------------------------+--------------------------------------------------------------------------+
|op_prepare_latency      | Latency of client operations (excluding queue time and wait for finished)|
+------------------------+--------------------------------------------------------------------------+
|op_r_latency            | Latency of read operation (including queue time)                         |
+------------------------+--------------------------------------------------------------------------+
|op_r_process_latency    | Latency of read operation (excluding queue time)                         |
+------------------------+--------------------------------------------------------------------------+
|op_w_in_bytes           | Client data written                                                      |
+------------------------+--------------------------------------------------------------------------+
|op_w_latency            | Latency of write operation (including queue time)                        |
+------------------------+--------------------------------------------------------------------------+
|op_w_process_latency    | Latency of write operation (excluding queue time)                        |
+------------------------+--------------------------------------------------------------------------+
|op_w_prepare_latency    | Latency of write operations (excluding queue time and wait for finished) |
+------------------------+--------------------------------------------------------------------------+
|op_rw                   | Client read-modify-write operations                                      |
+------------------------+--------------------------------------------------------------------------+
|op_rw_in_bytes          | Client read-modify-write operations write in                             |
+------------------------+--------------------------------------------------------------------------+
|op_rw_out_bytes         | Client read-modify-write operations read out                             |
+------------------------+--------------------------------------------------------------------------+
|op_rw_latency           | Latency of read-modify-write operation (including queue time)            |
+------------------------+--------------------------------------------------------------------------+
|op_rw_process_latency   | Latency of read-modify-write operation (excluding queue time)            |
+------------------------+--------------------------------------------------------------------------+
|op_rw_prepare_latency   | Latency of read-modify-write operations (excluding queue time            |
|                        | and wait for finished)                                                   |
+------------------------+--------------------------------------------------------------------------+
|op_before_queue_op_lat  | Latency of IO before calling queue (before it is really queued           |
|                        | into ShardedOpWq)                                                        |
+------------------------+--------------------------------------------------------------------------+
|op_before_dequeue_op_lat| Latency of IO before calling dequeue_op (already dequeued                |
|                        | and PG lock acquired)                                                    |
+------------------------+--------------------------------------------------------------------------+

Latency counters are measured in microseconds unless otherwise
specified in the description.

diff --git a/src/ceph/doc/mgr/localpool.rst b/src/ceph/doc/mgr/localpool.rst
new file mode 100644
index 0000000..5779b7c

Local pool plugin
=================

The *localpool* plugin can automatically create RADOS pools that are
localized to a subset of the overall cluster. For example, by default
it will create a pool for each distinct rack in the cluster. This can
be useful for deployments that want to distribute some data locally as
well as globally across the cluster.

Enabling
--------

The *localpool* module is enabled with::

    ceph mgr module enable localpool

Configuring
-----------

The *localpool* module understands the following options:

* **subtree** (default: `rack`): which CRUSH subtree type the module
  should create a pool for.
* **failure_domain** (default: `host`): the failure domain across
  which data replicas should be separated.
* **pg_num** (default: `128`): number of PGs to create for each pool.
* **num_rep** (default: `3`): number of replicas for each pool.
  (Currently, pools are always replicated.)
* **min_size** (default: none): value to set ``min_size`` to (leave
  unset to keep Ceph's default).
* **prefix** (default: `by-$subtreetype-`): prefix for the pool name.

These options are set via the config-key interface. For example, to
change the replication level to 2x with only 64 PGs::

    ceph config-key set mgr/localpool/num_rep 2
    ceph config-key set mgr/localpool/pg_num 64

diff --git a/src/ceph/doc/mgr/plugins.rst b/src/ceph/doc/mgr/plugins.rst
new file mode 100644
index 0000000..a75c14c

ceph-mgr plugin author guide
============================

Creating a plugin
-----------------

In pybind/mgr/, create a Python module. Within your module, create a
class named ``Module`` that inherits from ``MgrModule``.

The most important methods to override are:

* a ``serve`` member function for server-type modules. This function
  should block forever.
* a ``notify`` member function if your module needs to take action
  when new cluster data is available.
* a ``handle_command`` member function if your module exposes CLI
  commands.

Installing a plugin
-------------------

Once your module is present in the location set by the
``mgr module path`` configuration setting, you can enable it via the
``ceph mgr module enable`` command::

    ceph mgr module enable mymodule

Note that the MgrModule interface is not stable, so any modules
maintained outside of the Ceph tree are liable to break when run
against newer or older versions of Ceph.

Logging
-------

MgrModule instances have a ``log`` property which is a logger instance
that sends log messages into the Ceph logging layer, where they will be
recorded in the mgr daemon's log file.

Use it the same way you would any other Python logger. The Python log
levels debug, info, warn and err are mapped to the Ceph severities 20,
4, 1 and 0 respectively.

Exposing commands
-----------------

Set the ``COMMANDS`` class attribute of your plugin to a list of dicts
like this::

    COMMANDS = [
        {
            "cmd": "foobar name=myarg,type=CephString",
            "desc": "Do something awesome",
            "perm": "rw"
        }
    ]

The ``cmd`` part of each entry is parsed in the same way as internal
Ceph mon and admin socket commands (see mon/MonCommands.h in the Ceph
source for examples).

Config settings
---------------

Modules have access to a simple key/value store (keys and values are
byte strings) for storing configuration. Don't use this for storing
large amounts of data.

Config values are stored using the mon's config-key commands.

Hints for using these:

* Reads are fast: ceph-mgr keeps a local in-memory copy.
* Don't set things by hand with ``ceph config-key``; the mgr does not
  pick up changes made at runtime (only set things from within
  modules).
* Writes block until the value is persisted, but reads from another
  thread will see the new value immediately.

Any config settings you want to expose to users from your module will
need corresponding hooks in ``COMMANDS`` to expose a setter.

Accessing cluster data
----------------------

Modules have access to the in-memory copies of the Ceph cluster's
state that the mgr maintains. Accessor functions are exposed as
members of MgrModule; a minimal module sketch pulling these pieces
together follows below.
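To make the preceding sections concrete, here is a minimal,
hypothetical module sketch. The ``foobar`` command is the example from
the "Exposing commands" section above; note that the exact
``handle_command`` signature and return convention have varied between
Ceph releases, so treat this as an illustration of the overall shape
rather than a drop-in implementation::

    import errno
    import json
    import threading

    from mgr_module import MgrModule


    class Module(MgrModule):
        COMMANDS = [
            {
                "cmd": "foobar name=myarg,type=CephString",
                "desc": "Do something awesome",
                "perm": "rw"
            }
        ]

        def __init__(self, *args, **kwargs):
            super(Module, self).__init__(*args, **kwargs)
            self._event = threading.Event()

        def handle_command(self, cmd):
            # 'prefix' holds the command name; other keys are arguments
            if cmd['prefix'] == 'foobar':
                # Fetch one of the named cluster-wide objects
                osd_map = self.get('osd_map')
                return 0, json.dumps(osd_map, indent=2), ''
            return -errno.EINVAL, '', 'unknown command'

        def notify(self, notify_type, notify_id):
            self.log.debug('notify %s %s', notify_type, notify_id)

        def serve(self):
            # Block until shutdown() is called
            self._event.wait()

        def shutdown(self):
            self._event.set()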
Calls that access the cluster or daemon state are generally going from
Python into native C++ routines. There is some overhead to this, but
much less than, for example, calling into a REST API or an SQL
database.

There are no consistency rules about access to cluster structures or
daemon metadata. For example, an OSD might exist in OSDMap but have no
metadata, or vice versa. On a healthy cluster these will be very rare
transient states, but plugins should be written to cope with the
possibility.

``get(self, data_name)``

Fetch named cluster-wide objects such as the OSDMap. Valid things to
fetch are osd_crush_map_text, osd_map, osd_map_tree, osd_map_crush,
config, mon_map, fs_map, osd_metadata, pg_summary, df, osd_stats,
health, mon_status.

All these structures have their own JSON representations: experiment
or look at the C++ dump() methods to learn about them.

``get_server(self, hostname)``

Fetch metadata about a particular hostname. This is information that
ceph-mgr has gleaned from the daemon metadata reported by daemons
running on a particular server.

``list_servers(self)``

Like ``get_server``, but gives information about all servers (i.e. all
unique hostnames that have been mentioned in daemon metadata).

``get_metadata(self, svc_type, svc_id)``

Fetch the daemon metadata for a particular service. svc_type is one of
osd or mds, and svc_id is a string (convert OSD integer IDs to strings
when calling this).

``get_counter(self, svc_type, svc_name, path)``

Fetch the latest performance counter data for a particular counter.
The path is a period-separated concatenation of the subsystem and the
counter name, for example "mds.inodes".

A list of two-tuples of (timestamp, value) is returned. This may be
empty if no data is available.

Sending commands
----------------

A non-blocking facility is provided for sending monitor commands to
the cluster.

``send_command(self, result, command_str, tag)``

The ``result`` parameter should be an instance of the CommandResult
class, defined in the same module as MgrModule. This acts as a
completion and stores the output of the command. Use
CommandResult.wait() if you want to block on completion.

The ``command_str`` parameter is a JSON-serialized command. This uses
the same format as the ceph command line, which is a dictionary of
command arguments, with the extra ``prefix`` key containing the
command name itself. Consult MonCommands.h for available commands and
their expected arguments.

The ``tag`` parameter is used for non-blocking operation: when a
command completes, the ``notify()`` callback on the MgrModule instance
is triggered, with notify_type set to "command" and notify_id set to
the tag of the command.

Implementing standby mode
-------------------------

For some modules, it is useful to run on standby manager daemons as
well as on the active daemon. For example, an HTTP server can usefully
serve HTTP redirect responses from the standby managers so that users
can point their browsers at any of the manager daemons without having
to worry about which one is active.

Standby manager daemons look for a class called ``StandbyModule`` in
each module. If the class is not found, then the module is not used at
all on standby daemons. If the class is found, its ``serve`` method is
called. Implementations of ``StandbyModule`` must inherit from
``mgr_module.MgrStandbyModule``.
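For illustration, a redirecting standby might look roughly like the
following sketch. This is *not* how the dashboard module is actually
implemented; the listening port is hard-coded purely for brevity, and
the ``get_active_uri`` helper used here is described just below::

    from http.server import BaseHTTPRequestHandler, HTTPServer

    from mgr_module import MgrStandbyModule


    class StandbyModule(MgrStandbyModule):
        def serve(self):
            module = self

            class RedirectHandler(BaseHTTPRequestHandler):
                def do_GET(self):
                    # Ask the mgr framework where the active peer lives
                    active_uri = module.get_active_uri()
                    if active_uri:
                        self.send_response(303)
                        self.send_header('Location', active_uri)
                        self.end_headers()
                    else:
                        self.send_error(503, 'No active mgr known yet')

            # Port 7000 is an arbitrary choice for this sketch
            self._server = HTTPServer(('', 7000), RedirectHandler)
            self._server.serve_forever()

        def shutdown(self):
            if getattr(self, '_server', None):
                self._server.shutdown()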
The interface of ``MgrStandbyModule`` is much more restricted than
that of ``MgrModule`` -- none of the Ceph cluster state is available
to the module. The ``serve`` and ``shutdown`` methods are used in the
same way as in a normal module class. The ``get_active_uri`` method
enables the standby module to discover the address of its active peer
in order to make redirects. See the ``MgrStandbyModule`` definition in
the Ceph source code for the full list of methods.

For a complete example of how to use this interface, look at the
source code of the ``dashboard`` module.

Shutting down cleanly
---------------------

If a module implements the ``serve()`` method, it should also
implement the ``shutdown()`` method to shut down cleanly: misbehaving
modules may otherwise prevent clean shutdown of ceph-mgr.

Is something missing?
---------------------

The ceph-mgr Python interface is not set in stone. If you have a need
that is not satisfied by the current interface, please bring it up on
the ceph-devel mailing list. While interface bloat is to be avoided,
it is generally not very hard to expose existing data to the Python
code when there is a good reason.

diff --git a/src/ceph/doc/mgr/prometheus.rst b/src/ceph/doc/mgr/prometheus.rst
new file mode 100644
index 0000000..5bae6a9

=================
Prometheus plugin
=================

Provides a Prometheus exporter to pass on Ceph performance counters
from the collection point in ceph-mgr. Ceph-mgr receives MMgrReport
messages from all MgrClient processes (mons and OSDs, for instance)
with performance counter schema data and actual counter data, and
keeps a circular buffer of the last N samples. This plugin creates an
HTTP endpoint (like all Prometheus exporters) and retrieves the latest
sample of every counter when polled (or "scraped" in Prometheus
terminology). The HTTP path and query parameters are ignored; all
extant counters for all reporting entities are returned in text
exposition format. (See the Prometheus
`documentation <https://prometheus.io/docs/instrumenting/exposition_formats/#text-format-details>`_.)

Enabling prometheus output
==========================

The *prometheus* module is enabled with::

    ceph mgr module enable prometheus

Configuration
-------------

By default the module will accept HTTP requests on port ``9283`` on
all IPv4 and IPv6 addresses on the host. The port and listen address
are both configurable with ``ceph config-key set``, with keys
``mgr/prometheus/server_addr`` and ``mgr/prometheus/server_port``.
This port is registered with Prometheus's
`registry <https://github.com/prometheus/prometheus/wiki/Default-port-allocations>`_.

Statistic names and labels
==========================

The names of the stats are exactly as Ceph names them, with illegal
characters ``.``, ``-`` and ``::`` translated to ``_``, and ``ceph_``
prefixed to all names.

All *daemon* statistics have a ``ceph_daemon`` label such as "osd.123"
that identifies the type and ID of the daemon they come from. Some
statistics can come from different types of daemon, so when querying
e.g. an OSD's RocksDB stats, you would probably want to filter on
``ceph_daemon`` starting with "osd" to avoid mixing in the monitor
RocksDB stats.

The *cluster* statistics (i.e. those global to the Ceph cluster) have
labels appropriate to what they report on. For example, metrics
relating to pools have a ``pool_id`` label.

Pool and OSD metadata series
----------------------------

Special series are output to enable displaying and querying on certain
metadata fields.

Pools have a ``ceph_pool_metadata`` field like this::

    ceph_pool_metadata{pool_id="2",name="cephfs_metadata_a"} 0.0

OSDs have a ``ceph_osd_metadata`` field like this::

    ceph_osd_metadata{cluster_addr="172.21.9.34:6802/19096",device_class="ssd",id="0",public_addr="172.21.9.34:6801/19096",weight="1.0"} 0.0

Correlating drive statistics with node_exporter
-----------------------------------------------

The prometheus output from Ceph is designed to be used in conjunction
with the generic host monitoring from the Prometheus node_exporter.

To enable correlation of Ceph OSD statistics with node_exporter's
drive statistics, special series are output like this::

    ceph_disk_occupation{ceph_daemon="osd.0",device="sdd",instance="myhost",job="ceph"}

To use this to get disk statistics by OSD ID, use the ``and on``
syntax in your prometheus query like this::

    rate(node_disk_bytes_written[30s]) and on (device,instance) ceph_disk_occupation{ceph_daemon="osd.0"}

See the prometheus documentation for more information about
constructing queries.

Note that for this mechanism to work, Ceph and node_exporter must
agree about the values of the ``instance`` label. See the following
section for guidance on how to set up Prometheus in a way that sets
``instance`` properly.

Configuring Prometheus server
=============================

See the prometheus documentation for full details of how to add scrape
endpoints: the notes in this section are tips on how to configure
Prometheus to capture the Ceph statistics in the most usefully-labelled
form.

This configuration is necessary because Ceph is reporting metrics from
many hosts and services via a single endpoint, and because some
metrics relate to no physical host (such as pool statistics).

honor_labels
------------

To enable Ceph to output properly-labelled data relating to any host,
use the ``honor_labels`` setting when adding the ceph-mgr endpoints to
your prometheus configuration.

Without this setting, any ``instance`` labels that Ceph outputs, such
as those in ``ceph_disk_occupation`` series, will be overridden by
Prometheus.

Ceph instance label
-------------------

By default, Prometheus applies an ``instance`` label that includes the
hostname and port of the endpoint that the series came from. Because
Ceph clusters have multiple manager daemons, this results in an
``instance`` label that changes spuriously when the active manager
daemon changes.

Set a custom ``instance`` label in your Prometheus target
configuration: you might wish to set it to the hostname of your first
monitor, or something completely arbitrary like "ceph_cluster".

node_exporter instance labels
-----------------------------

Set your ``instance`` labels to match what appears in Ceph's OSD
metadata in the ``hostname`` field. This is generally the short
hostname of the node.
This is only necessary if you want to correlate Ceph stats with host
stats, but you may find it useful to do it in all cases in case you
want to do the correlation in the future.

Example configuration
---------------------

This example shows a single node configuration running ceph-mgr and
node_exporter on a server called ``senta04``.

This is just an example: there are other ways to configure prometheus
scrape targets and label rewrite rules.

prometheus.yml
~~~~~~~~~~~~~~

::

    global:
        scrape_interval: 15s
        evaluation_interval: 15s

    scrape_configs:
      - job_name: 'node'
        file_sd_configs:
          - files:
            - node_targets.yml
      - job_name: 'ceph'
        honor_labels: true
        file_sd_configs:
          - files:
            - ceph_targets.yml

ceph_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9283" ],
            "labels": {
                "instance": "ceph_cluster"
            }
        }
    ]

node_targets.yml
~~~~~~~~~~~~~~~~

::

    [
        {
            "targets": [ "senta04.mydomain.com:9100" ],
            "labels": {
                "instance": "senta04"
            }
        }
    ]

Notes
=====

Counters and gauges are exported; currently histograms and
long-running averages are not. It's possible that Ceph's 2-D
histograms could be reduced to two separate 1-D histograms, and that
long-running averages could be exported as Prometheus' Summary type.

Timestamps, as with many Prometheus exporters, are established by the
server's scrape time (Prometheus expects that it is polling the actual
counter process synchronously). It is possible to supply a timestamp
along with the stat report, but the Prometheus team strongly advises
against this. This means that timestamps will be delayed by an
unpredictable amount; it's not clear if this will be problematic, but
it's worth knowing about.
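To sanity-check the exporter before pointing Prometheus at it, you can
fetch the endpoint directly. Below is a minimal sketch, assuming the
module is enabled and listening on ``localhost`` with the default
port; as noted in the introduction, the HTTP path is ignored, so
``/metrics`` is used here only by convention::

    import urllib.request

    # Any path returns the full set of counters in text exposition format
    with urllib.request.urlopen('http://localhost:9283/metrics') as resp:
        text = resp.read().decode('utf-8')

    # Print just the OSD metadata series described earlier
    for line in text.splitlines():
        if line.startswith('ceph_osd_metadata'):
            print(line)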
diff --git a/src/ceph/doc/mgr/restful.rst b/src/ceph/doc/mgr/restful.rst
new file mode 100644
index 0000000..e67f2d1

restful plugin
==============

The *restful* plugin offers REST API access to the status of the
cluster over an SSL-secured connection.

Enabling
--------

The *restful* module is enabled with::

    ceph mgr module enable restful

You will also need to configure an SSL certificate (below) before the
API endpoint is available. By default the module will accept HTTPS
requests on port ``8003`` on all IPv4 and IPv6 addresses on the host.

Securing
--------

All connections to *restful* are secured with SSL. You can generate a
self-signed certificate with the command::

    ceph restful create-self-signed-cert

Note that with a self-signed certificate most clients will need a flag
to allow a connection and/or suppress warning messages. For example,
if the ``ceph-mgr`` daemon is on the same host::

    curl -k https://localhost:8003/

To properly secure a deployment, a certificate that is signed by the
organization's certificate authority should be used. For example, a
key pair can be generated with a command similar to::

    openssl req -new -nodes -x509 \
        -subj "/O=IT/CN=ceph-mgr-restful" \
        -days 3650 -keyout restful.key -out restful.crt -extensions v3_ca

The ``restful.crt`` should then be signed by your organization's CA
(certificate authority). Once that is done, you can set it with::

    ceph config-key set mgr/restful/$name/crt -i restful.crt
    ceph config-key set mgr/restful/$name/key -i restful.key

where ``$name`` is the name of the ``ceph-mgr`` instance (usually the
hostname). If all manager instances are to share the same certificate,
you can leave off the ``$name`` portion::

    ceph config-key set mgr/restful/crt -i restful.crt
    ceph config-key set mgr/restful/key -i restful.key

Configuring IP and port
-----------------------

Like any other RESTful API endpoint, *restful* binds to an IP and
port. By default, the currently active ``ceph-mgr`` daemon will bind
to port 8003 and any available IPv4 or IPv6 address on the host.

Since each ``ceph-mgr`` hosts its own instance of *restful*, it may
also be necessary to configure the instances separately. The IP and
port can be changed via the configuration key facility::

    ceph config-key set mgr/restful/$name/server_addr $IP
    ceph config-key set mgr/restful/$name/server_port $PORT

where ``$name`` is the ID of the ceph-mgr daemon (usually the
hostname).

These settings can also be configured cluster-wide rather than
per-manager. For example::

    ceph config-key set mgr/restful/server_addr $IP
    ceph config-key set mgr/restful/server_port $PORT

If the port is not configured, *restful* will bind to port ``8003``.
If the address is not configured, *restful* will bind to ``::``, which
corresponds to all available IPv4 and IPv6 addresses.

Load balancer
-------------

Please note that *restful* will *only* start on the manager which is
active at that moment. Query the Ceph cluster status to see which
manager is active (e.g., ``ceph mgr dump``). In order to make the API
available via a consistent URL regardless of which manager daemon is
currently active, you may want to set up a load balancer front-end to
direct traffic to whichever manager endpoint is available.
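The ``curl -k`` example above translates to Python as in the sketch
below; like ``-k``, this skips certificate verification, so it is only
appropriate against a test deployment with a self-signed certificate::

    import ssl
    import urllib.request

    # Equivalent of curl's -k: accept the self-signed certificate
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen('https://localhost:8003/',
                                context=ctx) as resp:
        print(resp.status)
        print(resp.read().decode('utf-8'))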
diff --git a/src/ceph/doc/mgr/zabbix.rst b/src/ceph/doc/mgr/zabbix.rst
new file mode 100644
index 0000000..d98540e

Zabbix plugin
=============

The Zabbix plugin actively sends information to a Zabbix server, such
as:

- Ceph status
- I/O operations
- I/O bandwidth
- OSD status
- Storage utilization

Requirements
------------

The plugin requires that the *zabbix_sender* executable is present on
*all* machines running ceph-mgr. It can be installed on most
distributions using the package manager.

Dependencies
^^^^^^^^^^^^

Installing zabbix_sender can be done under Ubuntu or Fedora using
either apt or dnf.

On Ubuntu Xenial::

    apt install zabbix-agent

On Fedora::

    dnf install zabbix-sender

Enabling
--------

Add this to your ceph.conf on nodes where you run ceph-mgr::

    [mgr]
    mgr modules = zabbix

If you use any other ceph-mgr modules, make sure they're in the list
too.

Restart the ceph-mgr daemon after modifying the setting to load the
module.

Configuration
-------------

Two configuration keys are mandatory for the module to work:

- mgr/zabbix/zabbix_host
- mgr/zabbix/identifier

The parameter *zabbix_host* controls the hostname of the Zabbix server
to which *zabbix_sender* will send the items. This can be an IP
address if required by your installation.

The *identifier* parameter controls the identifier/hostname to use as
the source when sending items to Zabbix. This should match the name of
the *Host* in your Zabbix server.

Additional configuration keys which can be configured, and their
default values:

- mgr/zabbix/zabbix_port: 10051
- mgr/zabbix/zabbix_sender: /usr/bin/zabbix_sender
- mgr/zabbix/interval: 60

Configuration keys
^^^^^^^^^^^^^^^^^^

Configuration keys can be set on any machine with the proper cephx
credentials; these are usually Monitors, where the *client.admin* key
is present::

    ceph config-key set <key> <value>

For example::

    ceph config-key set mgr/zabbix/zabbix_host zabbix.localdomain
    ceph config-key set mgr/zabbix/identifier ceph.eu-ams02.local

Debugging
---------

Should you want to debug the Zabbix module, increase the logging level
for ceph-mgr and check the logs::

    [mgr]
    debug mgr = 20

With logging set to debug for the manager, the plugin will print
various logging lines prefixed with *mgr[zabbix]* for easy filtering.
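Conceptually, each report the module makes boils down to invoking
*zabbix_sender* with the configured values, roughly as in the sketch
below. The item key ``ceph.test`` is made up for illustration; the
server and identifier are the example values from above::

    import subprocess

    subprocess.check_call([
        '/usr/bin/zabbix_sender',       # mgr/zabbix/zabbix_sender
        '-z', 'zabbix.localdomain',     # mgr/zabbix/zabbix_host
        '-p', '10051',                  # mgr/zabbix/zabbix_port
        '-s', 'ceph.eu-ams02.local',    # mgr/zabbix/identifier
        '-k', 'ceph.test',              # hypothetical item key
        '-o', '1',                      # value for the item
    ])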