summaryrefslogtreecommitdiffstats
path: root/src/ceph/doc/rados/operations/add-or-rm-mons.rst
blob: 0cdc4313cc2586c9c14e36151d06523db805ce3d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
==========================
 Adding/Removing Monitors
==========================

When you have a cluster up and running, you may add or remove monitors
from the cluster at runtime. To bootstrap a monitor, see `Manual Deployment`_
or `Monitor Bootstrap`_.

Adding Monitors
===============

Ceph monitors are light-weight processes that maintain a master copy of the 
cluster map. You can run a cluster with 1 monitor. We recommend at least 3 
monitors for a production cluster. Ceph monitors use a variation of the
`Paxos`_ protocol to establish consensus about maps and other critical
information across the cluster. Due to the nature of Paxos, Ceph requires
a majority of monitors running to establish a quorum (thus establishing
consensus).

It is advisable to run an odd-number of monitors but not mandatory. An
odd-number of monitors has a higher resiliency to failures than an
even-number of monitors. For instance, on a 2 monitor deployment, no
failures can be tolerated in order to maintain a quorum; with 3 monitors,
one failure can be tolerated; in a 4 monitor deployment, one failure can
be tolerated; with 5 monitors, two failures can be tolerated.  This is
why an odd-number is advisable. Summarizing, Ceph needs a majority of
monitors to be running (and able to communicate with each other), but that
majority can be achieved using a single monitor, or 2 out of 2 monitors,
2 out of 3, 3 out of 4, etc.

For an initial deployment of a multi-node Ceph cluster, it is advisable to
deploy three monitors, increasing the number two at a time if a valid need
for more than three exists.

Since monitors are light-weight, it is possible to run them on the same 
host as an OSD; however, we recommend running them on separate hosts,
because fsync issues with the kernel may impair performance. 

.. note:: A *majority* of monitors in your cluster must be able to 
   reach each other in order to establish a quorum.

Deploy your Hardware
--------------------

If you are adding a new host when adding a new monitor,  see `Hardware
Recommendations`_ for details on minimum recommendations for monitor hardware.
To add a monitor host to your cluster, first make sure you have an up-to-date
version of Linux installed (typically Ubuntu 14.04 or RHEL 7). 

Add your monitor host to a rack in your cluster, connect it to the network
and ensure that it has network connectivity.

.. _Hardware Recommendations: ../../../start/hardware-recommendations

Install the Required Software
-----------------------------

For manually deployed clusters, you must install Ceph packages
manually. See `Installing Packages`_ for details.
You should configure SSH to a user with password-less authentication
and root permissions.

.. _Installing Packages: ../../../install/install-storage-cluster


.. _Adding a Monitor (Manual):

Adding a Monitor (Manual)
-------------------------

This procedure creates a ``ceph-mon`` data directory, retrieves the monitor map
and monitor keyring, and adds a ``ceph-mon`` daemon to your cluster.  If
this results in only two monitor daemons, you may add more monitors by
repeating this procedure until you have a sufficient number of ``ceph-mon`` 
daemons to achieve a quorum.

At this point you should define your monitor's id.  Traditionally, monitors 
have been named with single letters (``a``, ``b``, ``c``, ...), but you are 
free to define the id as you see fit.  For the purpose of this document, 
please take into account that ``{mon-id}`` should be the id you chose, 
without the ``mon.`` prefix (i.e., ``{mon-id}`` should be the ``a`` 
on ``mon.a``).

#. Create the default directory on the machine that will host your 
   new monitor. :: 

	ssh {new-mon-host}
	sudo mkdir /var/lib/ceph/mon/ceph-{mon-id}

#. Create a temporary directory ``{tmp}`` to keep the files needed during 
   this process. This directory should be different from the monitor's default 
   directory created in the previous step, and can be removed after all the 
   steps are executed. :: 

	mkdir {tmp}

#. Retrieve the keyring for your monitors, where ``{tmp}`` is the path to 
   the retrieved keyring, and ``{key-filename}`` is the name of the file 
   containing the retrieved monitor key. :: 

	ceph auth get mon. -o {tmp}/{key-filename}

#. Retrieve the monitor map, where ``{tmp}`` is the path to 
   the retrieved monitor map, and ``{map-filename}`` is the name of the file 
   containing the retrieved monitor monitor map. :: 

	ceph mon getmap -o {tmp}/{map-filename}

#. Prepare the monitor's data directory created in the first step. You must 
   specify the path to the monitor map so that you can retrieve the 
   information about a quorum of monitors and their ``fsid``. You must also 
   specify a path to the monitor keyring:: 

	sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}
	

#. Start the new monitor and it will automatically join the cluster.
   The daemon needs to know which address to bind to, either via
   ``--public-addr {ip:port}`` or by setting ``mon addr`` in the
   appropriate section of ``ceph.conf``.  For example::

	ceph-mon -i {mon-id} --public-addr {ip:port}


Removing Monitors
=================

When you remove monitors from a cluster, consider that Ceph monitors use 
PAXOS to establish consensus about the master cluster map. You must have 
a sufficient number of monitors to establish a quorum for consensus about 
the cluster map.

.. _Removing a Monitor (Manual):

Removing a Monitor (Manual)
---------------------------

This procedure removes a ``ceph-mon`` daemon from your cluster.   If this
procedure results in only two monitor daemons, you may add or remove another
monitor until you have a number of ``ceph-mon`` daemons that can achieve a 
quorum.

#. Stop the monitor. ::

	service ceph -a stop mon.{mon-id}
	
#. Remove the monitor from the cluster. ::

	ceph mon remove {mon-id}
	
#. Remove the monitor entry from ``ceph.conf``. 


Removing Monitors from an Unhealthy Cluster
-------------------------------------------

This procedure removes a ``ceph-mon`` daemon from an unhealthy
cluster, for example a cluster where the monitors cannot form a
quorum.


#. Stop all ``ceph-mon`` daemons on all monitor hosts. ::

	ssh {mon-host}
	service ceph stop mon || stop ceph-mon-all
	# and repeat for all mons

#. Identify a surviving monitor and log in to that host. :: 

	ssh {mon-host}

#. Extract a copy of the monmap file.  ::

        ceph-mon -i {mon-id} --extract-monmap {map-path}
        # in most cases, that's
        ceph-mon -i `hostname` --extract-monmap /tmp/monmap

#. Remove the non-surviving or problematic monitors.  For example, if
   you have three monitors, ``mon.a``, ``mon.b``, and ``mon.c``, where
   only ``mon.a`` will survive, follow the example below::

	monmaptool {map-path} --rm {mon-id}
	# for example,
	monmaptool /tmp/monmap --rm b
	monmaptool /tmp/monmap --rm c
	
#. Inject the surviving map with the removed monitors into the
   surviving monitor(s).  For example, to inject a map into monitor
   ``mon.a``, follow the example below::

	ceph-mon -i {mon-id} --inject-monmap {map-path}
	# for example,
	ceph-mon -i a --inject-monmap /tmp/monmap

#. Start only the surviving monitors.

#. Verify the monitors form a quorum (``ceph -s``).

#. You may wish to archive the removed monitors' data directory in
   ``/var/lib/ceph/mon`` in a safe location, or delete it if you are
   confident the remaining monitors are healthy and are sufficiently
   redundant.

.. _Changing a Monitor's IP address:

Changing a Monitor's IP Address
===============================

.. important:: Existing monitors are not supposed to change their IP addresses.

Monitors are critical components of a Ceph cluster, and they need to maintain a
quorum for the whole system to work properly. To establish a quorum, the
monitors need to discover each other. Ceph has strict requirements for
discovering monitors.

Ceph clients and other Ceph daemons use ``ceph.conf`` to discover monitors.
However, monitors discover each other using the monitor map, not ``ceph.conf``.
For example,  if you refer to `Adding a Monitor (Manual)`_ you will see that you
need to obtain the current monmap for the cluster when creating a new monitor,
as it is one of the required arguments of ``ceph-mon -i {mon-id} --mkfs``. The
following sections explain the consistency requirements for Ceph monitors, and a
few safe ways to change a monitor's IP address.


Consistency Requirements
------------------------

A monitor always refers to the local copy of the monmap  when discovering other
monitors in the cluster.  Using the monmap instead of ``ceph.conf`` avoids
errors that could  break the cluster (e.g., typos in ``ceph.conf`` when
specifying a monitor address or port). Since monitors use monmaps for discovery
and they share monmaps with clients and other Ceph daemons, the monmap provides
monitors with a strict guarantee that their consensus is valid.

Strict consistency also applies to updates to the monmap. As with any other
updates on the monitor, changes to the monmap always run through a distributed
consensus algorithm called `Paxos`_. The monitors must agree on each update to
the monmap, such as adding or removing a monitor, to ensure that each monitor in
the quorum has the same version of the monmap. Updates to the monmap are
incremental so that monitors have the latest agreed upon version, and a set of
previous versions, allowing a monitor that has an older version of the monmap to
catch up with the current state of the cluster.

If monitors discovered each other through the Ceph configuration file instead of
through the monmap, it would introduce additional risks because the Ceph
configuration files are not updated and distributed automatically. Monitors
might inadvertently use an older ``ceph.conf`` file, fail to recognize a
monitor, fall out of a quorum, or develop a situation where `Paxos`_ is not able
to determine the current state of the system accurately. Consequently,  making
changes to an existing monitor's IP address must be done with  great care.


Changing a Monitor's IP address (The Right Way)
-----------------------------------------------

Changing a monitor's IP address in ``ceph.conf`` only is not sufficient to
ensure that other monitors in the cluster will receive the update.  To change a
monitor's IP address, you must add a new monitor with the IP  address you want
to use (as described in `Adding a Monitor (Manual)`_),  ensure that the new
monitor successfully joins the  quorum; then, remove the monitor that uses the
old IP address. Then, update the ``ceph.conf`` file to ensure that clients and
other daemons know the IP address of the new monitor.

For example, lets assume there are three monitors in place, such as :: 

	[mon.a]
		host = host01
		addr = 10.0.0.1:6789
	[mon.b]
		host = host02
		addr = 10.0.0.2:6789
	[mon.c]
		host = host03
		addr = 10.0.0.3:6789

To change ``mon.c`` to ``host04`` with the IP address  ``10.0.0.4``, follow the
steps in `Adding a Monitor (Manual)`_ by adding a  new monitor ``mon.d``. Ensure
that ``mon.d`` is  running before removing ``mon.c``, or it will break the
quorum. Remove ``mon.c`` as described on  `Removing a Monitor (Manual)`_. Moving
all three  monitors would thus require repeating this process as many times as
needed.


Changing a Monitor's IP address (The Messy Way)
-----------------------------------------------

There may come a time when the monitors must be moved to a different network,  a
different part of the datacenter or a different datacenter altogether. While  it
is possible to do it, the process becomes a bit more hazardous.

In such a case, the solution is to generate a new monmap with updated IP
addresses for all the monitors in the cluster, and inject the new map on each
individual monitor.  This is not the most user-friendly approach, but we do not
expect this to be something that needs to be done every other week.  As it is
clearly stated on the top of this section, monitors are not supposed to change
IP addresses.

Using the previous monitor configuration as an example, assume you want to move
all the  monitors from the ``10.0.0.x`` range to ``10.1.0.x``, and these
networks  are unable to communicate.  Use the following procedure:

#. Retrieve the monitor map, where ``{tmp}`` is the path to 
   the retrieved monitor map, and ``{filename}`` is the name of the file 
   containing the retrieved monitor monitor map. :: 

	ceph mon getmap -o {tmp}/{filename}

#. The following example demonstrates the contents of the monmap. ::

	$ monmaptool --print {tmp}/{filename}
	
	monmaptool: monmap file {tmp}/{filename}
	epoch 1
	fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
	last_changed 2012-12-17 02:46:41.591248
	created 2012-12-17 02:46:41.591248
	0: 10.0.0.1:6789/0 mon.a
	1: 10.0.0.2:6789/0 mon.b
	2: 10.0.0.3:6789/0 mon.c

#. Remove the existing monitors. ::

	$ monmaptool --rm a --rm b --rm c {tmp}/{filename}
	
	monmaptool: monmap file {tmp}/{filename}
	monmaptool: removing a
	monmaptool: removing b
	monmaptool: removing c
	monmaptool: writing epoch 1 to {tmp}/{filename} (0 monitors)

#. Add the new monitor locations. ::

	$ monmaptool --add a 10.1.0.1:6789 --add b 10.1.0.2:6789 --add c 10.1.0.3:6789 {tmp}/{filename}
	
	monmaptool: monmap file {tmp}/{filename}
	monmaptool: writing epoch 1 to {tmp}/{filename} (3 monitors)

#. Check new contents. ::

	$ monmaptool --print {tmp}/{filename}
	
	monmaptool: monmap file {tmp}/{filename}
	epoch 1
	fsid 224e376d-c5fe-4504-96bb-ea6332a19e61
	last_changed 2012-12-17 02:46:41.591248
	created 2012-12-17 02:46:41.591248
	0: 10.1.0.1:6789/0 mon.a
	1: 10.1.0.2:6789/0 mon.b
	2: 10.1.0.3:6789/0 mon.c

At this point, we assume the monitors (and stores) are installed at the new
location. The next step is to propagate the modified monmap to the new 
monitors, and inject the modified monmap into each new monitor.

#. First, make sure to stop all your monitors.  Injection must be done while 
   the daemon is not running.

#. Inject the monmap. ::

	ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}

#. Restart the monitors.

After this step, migration to the new location is complete and 
the monitors should operate successfully.


.. _Manual Deployment: ../../../install/manual-deployment
.. _Monitor Bootstrap: ../../../dev/mon-bootstrap
.. _Paxos: http://en.wikipedia.org/wiki/Paxos_(computer_science)