path: root/manifests/profile/pacemaker/rabbitmq.pp
Age | Commit message | Author | Files | Lines
2017-06-14 | Ensure hiera step value is an integer | Steve Baker | 1 | -1/+1

The step is typically set with the hieradata setting an integer value:

  {"step": 1}

However it would be useful for the value to be a string so that substitutions are possible, for example:

  {"step": "%{::step}"}

This change ensures the step parameter defaults to an integer by calling Integer(hiera('step'))

This change was made by manually removing the undef defaults from fluentd.pp, uchiwa.pp, and sensu.pp then bulk updating with:

  find ./ -type f -print0 |xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"

Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3
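To illustrate what the bulk update does to a single manifest line, here is a small Python sketch of the same literal substitution. The sample line and the function name `wrap_step_lookup` are assumptions for illustration and are not taken from the repository.

```python
import re

# Literal pattern and replacement mirroring the sed expression in the commit message.
PATTERN = re.compile(re.escape("= hiera('step')"))
REPLACEMENT = "= Integer(hiera('step'))"


def wrap_step_lookup(line: str) -> str:
    """Replace the first "= hiera('step')" in a line, like sed's s/// without the g flag."""
    # A callable replacement avoids any backslash or group interpretation.
    return PATTERN.sub(lambda _match: REPLACEMENT, line, count=1)


# Hypothetical manifest line, used only as sample input.
sample = "  $step = hiera('step'),"
print(wrap_step_lookup(sample))
```

Because the pattern requires `= ` directly before `hiera('step')`, a line that has already been rewritten is left unchanged if the substitution is run a second time.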
2017-04-26 | Add a flag to rabbitmq so that we can deploy with ha-mode: all again | Michele Baldessari | 1 | -2/+6

In change Ib62001c03e1e08f58cf0c6e0ba07a8879a584084 we switched the rabbitmq queues HA mode from ha-all to ha-exactly. While this gives us a nice performance boost with rabbitmq, it makes rabbit less resilient to network glitches as we painfully found out via https://bugzilla.redhat.com/show_bug.cgi?id=1441635.

Will propose another THT change to actually change the default to -1 so we get this ha-mode:all by default.

Change-Id: I9a90e71094b8d8d58b5be0a45a2979701b0ac21c
Partial-Bug: #1686337
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: John Eckersberg <jeckersb@redhat.com>
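A minimal sketch of how such a flag could select the queue policy, written in Python for illustration. It assumes a hypothetical integer parameter `ha_queues` where -1 means "mirror to all nodes" and a positive value means "exactly that many copies". The parameter name and the rejection of other values are assumptions; the actual manifest may handle them differently.

```python
import json


def ha_policy(ha_queues: int) -> str:
    """Build the JSON policy body for a given (hypothetical) ha_queues setting."""
    if ha_queues == -1:
        # Flag value: mirror queues across the whole cluster.
        policy = {"ha-mode": "all"}
    elif ha_queues > 0:
        # Explicit number of queue copies.
        policy = {"ha-mode": "exactly", "ha-params": ha_queues}
    else:
        # Assumption of this sketch: anything else is treated as invalid.
        raise ValueError(f"unsupported ha_queues value: {ha_queues}")
    # Compact separators match the policy strings quoted in the commit messages.
    return json.dumps(policy, separators=(",", ":"))


print(ha_policy(-1))
print(ha_policy(2))
```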
2017-01-25 | Composable HA | Michele Baldessari | 1 | -15/+35

This commit implements composable HA for the pacemaker profiles.

- Every time a pacemaker resource gets included on a node, that node will add a node cluster property with the name of the resource (e.g. galera-role=true)
- Add a location rule constraint to force running the resource only on the nodes that have that property
- We also make sure that any pacemaker resource/property creation has a predefined number of tries (20 by default). The reason for this is that within composable HA, it might be possible to get "older CIB" errors when another node changed the CIB while we were doing an operation on it. Simply retrying fixes this.
- Also make sure that we use the newly introduced pacemaker::constraint::order class instead of the older pacemaker::constraint::base class. The former uses the push_cib() function and hence behaves correctly in case multiple nodes try to modify the CIB at the same time.

Change-Id: I63da4f48da14534fd76265764569e76300534472
Depends-On: Ib931adaff43dbc16220a90fb509845178d696402
Depends-On: I8d78cc1b14f0e18e034b979a826bf3cdb0878bae
Depends-On: Iba1017c33b1cd4d56a3ee8824d851b38cfdbc2d3
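The bounded-retry idea described above can be sketched in Python as follows. This is not the puppet-pacemaker implementation: `StaleCibError`, `with_retries`, and the `delay` parameter are hypothetical names introduced for illustration, and only the default of 20 tries comes from the commit message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StaleCibError(Exception):
    """Hypothetical stand-in for an "older CIB" failure reported by the cluster."""


def with_retries(operation: Callable[[], T], tries: int = 20, delay: float = 0.0) -> T:
    """Run operation, retrying on StaleCibError up to `tries` attempts in total."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        try:
            return operation()
        except StaleCibError:
            if attempt == tries:
                # Out of attempts: surface the last error to the caller.
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")


# Demo: an operation that fails twice because "another node changed the CIB".
attempts = {"count": 0}


def create_resource() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise StaleCibError("older CIB")
    return "created"


print(with_retries(create_resource), "after", attempts["count"], "attempts")
```

Only the stale-CIB error is retried in this sketch; any other exception propagates immediately, so genuine configuration errors are not masked by the retry loop.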
2017-01-18 | Do not depend on bootstrap_nodeid for any pacemaker profile | Michele Baldessari | 1 | -2/+2

When we create a pacemaker resource it must happen from a single node. If it happens from multiple nodes an immediate error will be returned by pcs.

For the pacemaker roles we enforce this by leveraging the recently introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us the first hostname per-service, regardless of the role. (introduced via I03e8685f939e8ae1fcd8b16883b559615042505d)

With this approach if a pacemaker service belongs to two different roles (say role Controller on node A and role galera on node B), it will only create the resource from one of the two and not both (which would return an error).

Only setting Partial-Bug for this one, because it addresses the issue from the pacemaker resource creation POV (which is always affected). But the issue itself is a race that we're theoretically affected by since the composable roles work landed. While I have tried to fix the more general case in previous attempts, I think it is best if we start a discussion on how to fix it, because each approach has a bunch of potential drawbacks and is quite invasive on how we do things. A discussion slot for this has been proposed for the Atlanta PTG.

Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a
Partial-Bug: #1615983
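The single-node rule can be pictured with a short Python sketch: each node compares its own short hostname with the per-service bootstrap node name and only the matching node creates the resource. The function name, the case-insensitive comparison, and the way the short name is derived from an FQDN are assumptions of this sketch, not details confirmed by the commit.

```python
def should_create_resource(local_hostname: str, bootstrap_short_node_name: str) -> bool:
    """Return True only on the node designated as the bootstrap node for a service."""
    # Assumption: compare short names (text before the first dot), ignoring case.
    local_short = local_hostname.split(".", 1)[0].lower()
    return local_short == bootstrap_short_node_name.lower()


# Hypothetical two-node example: the service runs on both nodes, but only one
# of them is the bootstrap node for it.
nodes = ["node-a.example.com", "node-b.example.com"]
creators = [n for n in nodes if should_create_resource(n, "node-a")]
print(creators)
```

Because the bootstrap name is chosen per service rather than per role, exactly one node passes the check even when the service is spread over several roles.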
2016-10-12 | pacemaker: increase timeouts for rabbitmq and redis | Emilien Macchi | 1 | -0/+1

When we observe the 'stop timeout' values of pacemaker resources: rabbitmq and redis, they are set to 90s. But for all other services, it is set to 200s.

The overcloud deployment sometimes fails due to this with the error:

  Error: Could not complete shutdown of rabbitmq-clone, 1 resources remaining
  Error performing operation: Timer expired

This patch updates the timeout for Redis and RabbitMQ to avoid this error.

Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
2016-10-05 | Change rabbitmq queues HA mode from ha-all to ha-exactly | Michele Baldessari | 1 | -1/+21

It turns out that reducing the number of rabbitmq queues in the cluster significantly improves performance of the cluster, especially in the case of failover recovery time. Right now the cluster uses ha-all mode for rabbitmq queues. It is best to change this to "ha-exactly" mode and reduce the number of queue copies to ceil(N/2) where N is the number of controllers in the cluster, so in the typical scenario of 3 controllers it would be 2 by default. It does not make much sense to keep the copies of queues over the whole cluster since if the quorum of nodes is lost then the rest of the cluster nodes will be stopped anyway. We let the user override this with a parameter.

I.e. for a 3 node controlplane cluster we will go from this:

  pcs resource show rabbitmq
   Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
    Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"

To this:

  pcs resource show rabbitmq
   Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
    Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"

According to Marian Krcmarik's testing, recovery time from failure was reduced significantly.

Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com>
Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084
Partial-Bug: #1628998
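The ceil(N/2) default from the message above is easy to check with a few lines of Python. The function name is hypothetical, and integer arithmetic is used in place of a floating-point ceiling.

```python
def default_queue_copies(controller_count: int) -> int:
    """Number of queue copies for ha-mode "exactly": ceil(N / 2) for N controllers."""
    if controller_count < 1:
        raise ValueError("controller_count must be at least 1")
    # (N + 1) // 2 equals ceil(N / 2) for positive integers.
    return (controller_count + 1) // 2


# Tabulate the default for a few cluster sizes.
for n in (1, 2, 3, 4, 5):
    print(n, "controllers ->", default_queue_copies(n), "copies")
```

For the 3-controller case this gives 2, matching the `"ha-params":2` value in the policy shown in the commit message.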
2016-08-30 | Write restart flags to restart services only when necessary | Jiri Stransky | 1 | -0/+6

Write restart flag file for services managed by Pacemaker into /var/lib/tripleo/pacemaker-restarts directory. The name of the file must match the name of the clone resource defined in pacemaker. The post-puppet restart script will restart each service having a restart flag file and remove those files.

This approach focuses on $pacemaker_master only (we don't want to restart the pacemaker services 3 times when we have 3 controllers), so it relies on the assumption that we're making the matching config changes across the pacemaker nodes.

Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
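The flag-file mechanism can be sketched in Python as two small helpers: one that writes a flag named after the clone resource, and one that plays the role of the post-puppet restart script. Both function names and the `restart` callback are hypothetical, and the demo uses a temporary directory instead of /var/lib/tripleo/pacemaker-restarts.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def write_restart_flag(flag_dir: Path, clone_resource: str) -> Path:
    """Create an empty flag file whose name matches the pacemaker clone resource."""
    flag_dir.mkdir(parents=True, exist_ok=True)
    flag = flag_dir / clone_resource
    flag.touch()
    return flag


def consume_restart_flags(flag_dir: Path, restart: Callable[[str], None]) -> List[str]:
    """Restart every flagged resource once, then remove its flag file."""
    restarted: List[str] = []
    if not flag_dir.is_dir():
        return restarted
    for flag in sorted(flag_dir.iterdir()):
        if flag.is_file():
            restart(flag.name)  # hypothetical callback; a real script would call the cluster tooling
            flag.unlink()
            restarted.append(flag.name)
    return restarted


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    flags = Path(tmp) / "pacemaker-restarts"
    write_restart_flag(flags, "rabbitmq-clone")
    print(consume_restart_flags(flags, lambda name: print("restarting", name)))
    print(consume_restart_flags(flags, lambda name: print("restarting", name)))
```

The second call finds no flags, which is the point of the design: a service is restarted only when a configuration change has left a flag behind.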
2016-08-08 | Fix parameters and headers inconsistency in the puppet manifests. | Carlos Camacho | 1 | -6/+5

As we are starting to manually check overcloud services the first step is to check that the puppet profiles are all aligned.

Changes applied:

- No logic added or removed in this submission.
- Removed unused parameters.
- Align header comments structure.
- All profiles parameters sorted following: "Mandatory params first sorted alphabetically then optional params sorted alphabetically."

Note: Following submissions will check pacemaker, cinder, mistral and redis services in the base profiles as some of them have the $pacemaker_master parameter defaulted to true.

Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
2016-05-17 | Composable role for RabbitMQ | Emilien Macchi | 1 | -0/+67

Add RabbitMQ composable role, and keep the same logic that we had in THT.

Implements: blueprint refactor-puppet-manifests
Change-Id: I961bdbe1cc6dd1d4a315de616439f9fc77d793ae