Age | Commit message (Collapse) | Author | Files | Lines |
|
When we create a pacemaker resource it must happen from a single node.
If it happens from multiple nodes an immediate error will be returned by
pcs.
For the pacemaker roles we enforce this by leveraging the recently
introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us
the first hostname per-service, regardless of the role.
(introduced via I03e8685f939e8ae1fcd8b16883b559615042505d)
With this approach if a pacemaker service belongs to two different
roles (say role Controller on node A and role galera on node B), it
will only create the resource from one of the two and not both (which
would return an error).
Only setting Partial-Bug for this one, because it addresses the issue
from the pacemaker resource creation POV (which is always affected). But
the issue itself is a race that we're theoretically affected by since
the composable roles work landed. While I have tried to fix the more
general case in previous attempts, I think it is best if we start a
discussion on how to fix it, because each approach has a bunch of
potential drawbacks and is quite invasive on how we do things. A
discussion slot for this has been proposed for the Atlanta PTG.
Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a
Partial-Bug: #1615983
|
|
The current redis file descriptor limit is 4096 because of two reasons:
- It is run via the redis user
- It is not started via systemd which has explicit LimitNOFILE set to
10240 (which matches the default configuration of maximum 10000
clients)
Create an /etc/security/limits.d/redis.conf file in order to increase
the fd limit value With this change we correctly get the following
limits:
[root@overcloud-controller-0 ~]# pcs status |grep -A2 redis
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-2 ]
Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
[root@overcloud-controller-0 ~]# cat /proc/`pgrep redis`/limits | grep open
Max open files 10240 10240 files
Previously this limit was set to 4096.
Change-Id: I7691581bad92ad9442cecd82cf44f5ac78ed169f
Closes-Bug: #1635334
|
|
When we observe the 'stop timeout' values of pacemaker resources:
rabbitmq and redis, they are set to 90s. But for all other services, it
is set to 200s.
The overcloud deployment sometimes fails due to this with the error:
Error: Could not complete shutdown of rabbitmq-clone, 1 resources
remaining
Error performing operation: Timer expired
This patch updates the timeout for Redis and RabbitMQ to avoid this
error.
Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
|
|
Write restart flag file for services managed by Pacemaker into
/var/lib/tripleo/pacemaker-restarts directory. The name of the file must
match the name of the clone resource defined in pacemaker. The
post-puppet restart script will restart each service having a restart
flag file and remove those files.
This approach focuses on $pacemaker_master only (we don't want to
restart the pacemaker services 3 times when we have 3 controllers), so
it relies on the assumption that we're making the matching config
changes across the pacemaker nodes.
Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
|
|
As we are staring to manually check overcloud services
the first step is to check that the puppet profiles
are all aligned.
Changes applied:
No logic added or removed in this submission.
Removed unused parameters.
Align header comments structure.
All profiles parameters sorted following:
"Mandatory params first sorted alphabetically
then optional params sorted alphabetically."
Note: Following submissions will check pacemaker,
cinder, mistral and redis services in the base profiles
as some of them has the $pacemaker_master parameter
defaulted to true.
Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
|
|
Change-Id: I6ba962c682dc2ab8c6ee5238e0c176d9ae05d696
|
|
Implements: blueprint refactor-puppet-manifests
Co-Authored-By: Carlos Camacho <ccamacho@redhat.com>
Change-Id: I60493a3aa64e5136b763e8e2084d728f5f812f8a
|