This commit implements composable HA for the pacemaker profiles.
- Every time a pacemaker resource gets included on a node,
that node will add a node cluster property with the name of the resource
(e.g. galera-role=true)
- Add a location rule constraint to force running the resource only
on the nodes that have that property
- We also make sure that any pacemaker resource/property creation has a
predefined number of tries (20 by default). The reason for this is
that within composable HA, it might be possible to get "older CIB"
errors when another node changed the CIB while we were doing an
operation on it. Simply retrying fixes this.
- Also make sure that we use the newly introduced
pacemaker::constraint::order class instead of the older
pacemaker::constraint::base class. The former uses the push_cib()
function and hence behaves correctly in case multiple nodes try
to modify the CIB at the same time.
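The retry behaviour described in the list above can be sketched roughly as follows. This is an illustrative Python sketch, not the Puppet code from this change: `StaleCibError`, `run_with_retries`, `make_flaky_operation` and the `try_sleep` parameter are hypothetical names, and only the default of 20 tries comes from the text.

```python
import time


class StaleCibError(Exception):
    """Hypothetical stand-in for an "older CIB" rejection."""


def run_with_retries(operation, tries=20, try_sleep=0.0):
    """Call operation() until it succeeds or `tries` attempts are used up."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return operation()
        except StaleCibError as err:
            # Another node changed the CIB in the meantime; try again.
            last_error = err
            if attempt < tries:
                time.sleep(try_sleep)
    raise last_error


def make_flaky_operation(failures):
    """Build an operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise StaleCibError("older CIB")
        return state["calls"]

    return operation
```

With these definitions, `run_with_retries(make_flaky_operation(2))` succeeds on the third call, while an operation that keeps failing re-raises the last error once the tries are exhausted.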
Change-Id: I63da4f48da14534fd76265764569e76300534472
Depends-On: Ib931adaff43dbc16220a90fb509845178d696402
Depends-On: I8d78cc1b14f0e18e034b979a826bf3cdb0878bae
Depends-On: Iba1017c33b1cd4d56a3ee8824d851b38cfdbc2d3
|
|
When we create a pacemaker resource it must happen from a single node.
If it happens from multiple nodes an immediate error will be returned by
pcs.
For the pacemaker roles we enforce this by leveraging the recently
introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us
the first hostname per-service, regardless of the role.
(introduced via I03e8685f939e8ae1fcd8b16883b559615042505d)
With this approach if a pacemaker service belongs to two different
roles (say role Controller on node A and role galera on node B), it
will only create the resource from one of the two and not both (which
would return an error).
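The "first hostname per service" idea can be sketched as below. This is a Python illustration under stated assumptions, not the actual implementation: the functions `bootstrap_nodes` and `should_create_resource`, the node names, and the assumption that "first" means first in a given ordered node list are all hypothetical.

```python
def bootstrap_nodes(nodes):
    """Map each service to the first node (in list order) that runs it.

    `nodes` is an ordered list of (hostname, services) pairs; the role a
    node belongs to plays no part in the result.
    """
    first = {}
    for hostname, services in nodes:
        for service in services:
            first.setdefault(service, hostname)
    return first


def should_create_resource(hostname, service, nodes):
    """Only the bootstrap node of a service creates its pacemaker resource."""
    return bootstrap_nodes(nodes).get(service) == hostname


# Hypothetical deployment: galera runs in two different roles.
example_nodes = [
    ("controller-0", ["haproxy", "galera"]),
    ("galera-0", ["galera"]),
]
```

Here only `controller-0` would create the galera resource, even though `galera-0` also runs the service.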
Only setting Partial-Bug for this one, because it addresses the issue
from the pacemaker resource creation POV (which is always affected). But
the issue itself is a race that we're theoretically affected by since
the composable roles work landed. While I have tried to fix the more
general case in previous attempts, I think it is best if we start a
discussion on how to fix it, because each approach has a bunch of
potential drawbacks and is quite invasive on how we do things. A
discussion slot for this has been proposed for the Atlanta PTG.
Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a
Partial-Bug: #1615983
|
|
The Manila ceph driver reads ceph's client configuration
(most importantly the keyring) from the ceph.conf file
(or any other file set by cephfs_conf_path), so ceph.conf
should be updated with the keyring location.
If ceph is deployed by tripleo, the manila ceph key
is also added to ceph and the ceph filesystem is created.
Depends-On: I18436a64fc991b9e697a1d79e369ac110cf8fe20
Change-Id: Iac4a260af6738ed6afd4bcb107221a736d07c1b5
Partial-Bug: #1644784
Closes-Bug: #1646147
|
|
The manila pacemaker manifest (which sets up the manila share service
only) also includes the manila api and scheduler manifests. There is no
reason for this, and it causes the manila api and scheduler services to
be started on whichever node the manila share service runs.
Change-Id: Ia1b39ef36c5bc34813cd6430b69ad9b698acc3cf
Closes-Bug: #1653500
|
|
Aligns the way we check for enabled backends in
pacemaker/manila.pp with what we did in base/manila/api.pp with [1].
The benefit is that we don't need to emit custom hiera from the
templates.
[1] I86ba8b9d5872c0f1a94e74215e97b796ad129bfb
Change-Id: I04e28a95e8d69a24cd3df109bf1802bfcbd941db
|
|
Back in the Mitaka cycle, via the change If6b43982c958f63bc78ad997400bf1279c23df7e,
we made sure that the default start and stop timeouts for pacemaker
systemd resources are 200s (>= twice the default 90s DefaultTimeoutStopSec
in systemd). We did this by setting puppet resource defaults for
the Pacemaker::Resource::Service class:
  Pacemaker::Resource::Service {
    op_params => 'start timeout=200s stop timeout=200s',
  }
The problem is that after the composable services rework, this does not
work anymore and the pacemaker systemd resources that still exist do not
have these timeouts set.
We want to move away from resource defaults for this because their effect
depends on the inclusion order, which in tripleo is no longer guaranteed
(see https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules).
The only services affected in Newton are: cinder-volume,
cinder-backup, manila-share, haproxy. I preferred fixing all the
pacemaker resources because it seems the cleanest and most logical
commit.
Change-Id: If89a95706514e536a7a2949871a0002c79b6046e
Closes-Bug: #1629366
|
|
In puppet-manila it is the api service that performs the db sync, not
the scheduler. This change moves ::manila::db::mysql (which creates
the empty database and users) into the tripleo manila/api profile.
Also moves rabbit config into a general manila base profile as
that would be needed by the scheduler service as well.
Change-Id: I2b537f735b8d1be8f39e8c274be3872b193c1014
|
|
The puppet-tripleo side for manila-cephfs landed without specifying
defaults for all class params [1], so when cephfs isn't enabled
(e.g. only generic is) you get errors for those params. See the
review comments at [2] for reports of this.
This fixes up the manila-cephfs puppet-tripleo side to be more
in line with the tidy-up that adds netapp at [3]. The config is all
moved to the tripleo-heat-templates side. The tht review for this
is at https://review.openstack.org/#/c/358525/ and that will now
depend on this review.
[1] https://review.openstack.org/#/c/354047/
[2] https://review.openstack.org/#/c/354019/
[3] https://review.openstack.org/#/c/354014/
Change-Id: I918f6f23ae0bd3542bcfe1bf0c797d4e6aa8f4d9
|
|
This adds support for the manila-netapp backend. The backend
specific config is set on the tht side, so this change also
tidies up the manila generic config, which is unnecessarily
being duplicated here
(see https://review.openstack.org/#/c/354019/).
Change-Id: Ic6f8e8d27ca20b9badddea5d16550aa18bff8418
|
|
Write a restart flag file for services managed by Pacemaker into the
/var/lib/tripleo/pacemaker-restarts directory. The name of the file must
match the name of the clone resource defined in pacemaker. The
post-puppet restart script will restart each service having a restart
flag file and remove those files.
This approach focuses on $pacemaker_master only (we don't want to
restart the pacemaker services 3 times when we have 3 controllers), so
it relies on the assumption that we're making the matching config
changes across the pacemaker nodes.
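The flag-file handshake described above can be sketched as follows. This is a Python illustration only, not the Puppet code or the post-puppet script: `write_restart_flag`, `process_restart_flags`, the `restart` callback and the clone names are hypothetical, and a temporary directory stands in for /var/lib/tripleo/pacemaker-restarts.

```python
import os
import tempfile


def write_restart_flag(flag_dir, clone_name, pacemaker_master=True):
    """Create an empty flag file named after the clone resource.

    Only the pacemaker master writes flags, so that each service is
    restarted once rather than once per controller.
    """
    if not pacemaker_master:
        return None
    os.makedirs(flag_dir, exist_ok=True)
    path = os.path.join(flag_dir, clone_name)
    open(path, "w").close()
    return path


def process_restart_flags(flag_dir, restart):
    """Restart every flagged service, then remove its flag file."""
    restarted = []
    if not os.path.isdir(flag_dir):
        return restarted
    for name in sorted(os.listdir(flag_dir)):
        restart(name)  # hypothetical callback doing the actual restart
        os.remove(os.path.join(flag_dir, name))
        restarted.append(name)
    return restarted


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "pacemaker-restarts")
    write_restart_flag(demo_dir, "example-b-clone")
    write_restart_flag(demo_dir, "example-a-clone")
    write_restart_flag(demo_dir, "example-c-clone", pacemaker_master=False)
    restarted = process_restart_flags(demo_dir, restart=lambda name: None)
    leftover = os.listdir(demo_dir)
```

In the demonstration, only the two flags written by the master are processed, and the directory is empty afterwards.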
Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
|
|
Change-Id: Idaad75238a2884fe82b1e5fce3ed910d866b98a2
|
|
As we are starting to manually check overcloud services
the first step is to check that the puppet profiles
are all aligned.
Changes applied:
- No logic added or removed in this submission.
- Removed unused parameters.
- Aligned the structure of the header comments.
- All profile parameters sorted as follows:
"Mandatory params first sorted alphabetically
then optional params sorted alphabetically."
Note: Following submissions will check pacemaker,
cinder, mistral and redis services in the base profiles
as some of them have the $pacemaker_master parameter
defaulted to true.
Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
|
|
See discussion at https://review.openstack.org/#/c/342961/8
Change-Id: I571b65a5402c1028418476a573ebeb9450ed00c9
|
|
The tripleo-heat-templates side that uses this is at
https://review.openstack.org/#/c/188137/
Change-Id: I444916d60a67bf730bf4089323dba1c1429e2e71
Implements: blueprint refactor-puppet-manifests
|