path: root/manifests/profile/base/pacemaker.pp
2017-08-23  Use resource collector for the fencing -> stonith ordering  (Michele Baldessari, 1 file, -1/+1)
Change Ifef08033043a4cc90a6261e962d2fdecdf275650 moved the stonith property definition to the pacemaker_master node. This means that the Class['tripleo::fencing'] -> Class['pacemaker::stonith'] ordering breaks on non-bootstrap pacemaker nodes, because the pacemaker::stonith property is no longer defined there.

Let's fix this by simply using a resource collector and setting the ordering on that, instead of adding yet another if statement. Ordering on the enablement of stonith is actually more correct formally.

Tested this on a broken setup successfully.

Closes-Bug: #1712605
Change-Id: I616d340bdf75da9d9eb8b83b2e804dff3d07d58e
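A minimal sketch of the idea, assuming the stonith property is declared through a pacemaker::property defined type (the resource type and parameter names here are assumptions for illustration, not taken from the patch):

  # Old edge: requires Class['pacemaker::stonith'] to be declared on
  # every node, which is no longer true after Ifef08033.
  Class['tripleo::fencing'] -> Class['pacemaker::stonith']

  # New edge: a resource collector only matches where the property
  # resource actually exists, so non-bootstrap nodes are unaffected.
  Class['tripleo::fencing'] -> Pacemaker::Property<| property == 'stonith-enabled' |>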
2017-08-01  Enable encryption of pacemaker traffic by default  (Juan Antonio Osorio Robles, 1 file, -2/+18)
We are already setting a pre-shared key by default for the pacemaker cluster. This was done in order to communicate over TLS-PSK with pacemaker-remote clusters. This key also lets us enable encrypted traffic for the regular cluster traffic, which we now enable by default with this patch.

Change-Id: I349b8bf79eeeaa4ddde1c17b7014603913f184cf
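A hedged sketch of what enabling this could look like in the profile; the enable_encryption parameter and the variable names are assumptions for illustration, not the module's confirmed API:

  class { 'pacemaker::corosync':
    cluster_members   => $pacemaker_node_names,  # hypothetical variable
    enable_encryption => true,                   # assumed parameter name
  }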
2017-07-21  Fix lint issues to upgrade to puppet-lint 2.3  (Carlos Camacho, 1 file, -2/+2)
Fixes puppet-lint 2.3 warnings of the form "WARNING: arrow should be on the right operand's line" at:

  manifests/glance/nfs_mount.pp:65
  manifests/pacemaker/haproxy_with_vip.pp:107
  manifests/pacemaker/haproxy_with_vip.pp:108
  manifests/pacemaker/haproxy_with_vip.pp:109
  manifests/pacemaker/resource_restart_flag.pp:44
  manifests/profile/base/cinder/volume/nfs.pp:72
  manifests/profile/base/docker.pp:188
  manifests/profile/base/docker.pp:210
  manifests/profile/base/logging/fluentd.pp:79
  manifests/profile/base/pacemaker.pp:107
  manifests/profile/base/swift/ringbuilder.pp:97
  manifests/profile/base/swift/ringbuilder.pp:125
  manifests/profile/base/swift/ringbuilder.pp:130
  manifests/profile/pacemaker/ceph/rbdmirror.pp:79
  manifests/profile/pacemaker/cinder/backup.pp:66
  manifests/profile/pacemaker/ovn_northd.pp:96

Change-Id: I9393c5e04310cf84695531df9bb16f33e7e15abb
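For reference, a minimal before/after example of the lint rule being fixed (resource titles are hypothetical):

  # Before: puppet-lint 2.3 flags the trailing chaining arrow.
  Exec['some-task'] ->
  File['/some/path']

  # After: the arrow sits on the right operand's line.
  Exec['some-task']
  -> File['/some/path']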
2017-06-16  Merge "Ensure hiera step value is an integer"  (Jenkins, 1 file, -1/+1)
2017-06-14  Only set the stonith property on the pacemaker_master node  (Michele Baldessari, 1 file, -3/+5)
It makes little sense to enforce the stonith property on remote nodes and/or all cluster nodes. We can just enforce it once on the pacemaker_master node, as it is a cluster-wide property anyway. While the old approach works in general, it creates extra CIB changes for nothing and slows down the deployment.

We can also remove the tripleo::fencing -> pacemaker::stonith constraint in the pacemaker remote profile now, as the fencing stuff happens on step 5 anyway and the property is set at step 1.

Change-Id: Ifef08033043a4cc90a6261e962d2fdecdf275650
Closes-Bug: #1696336
2017-06-14  Ensure hiera step value is an integer  (Steve Baker, 1 file, -1/+1)
The step is typically set in the hieradata with an integer value:

  {"step": 1}

However, it would be useful for the value to be a string, so that substitutions are possible, for example:

  {"step": "%{::step}"}

This change ensures the step parameter defaults to an integer by calling Integer(hiera('step')).

This change was made by manually removing the undef defaults from fluentd.pp, uchiwa.pp, and sensu.pp, then bulk updating with:

  find ./ -type f -print0 | xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"

Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3
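The resulting parameter default looks like this (a sketch of the pattern the sed applies, not the full class):

  class tripleo::profile::base::pacemaker (
    # Integer() casts the hiera value, so both {"step": 1} and
    # {"step": "%{::step}"} resolve to an integer.
    $step = Integer(hiera('step')),
  ) {
    # ...
  }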
2017-05-07  Use verify_on_create when creating pacemaker remote resources  (Michele Baldessari, 1 file, -0/+1)
We currently create remote resources without waiting for their creation. This leads to the following potential race (spotted by Marian Mkrcmari):

- On Step1 the pacemaker bootstrap node creates the resource, but the remote resource is not yet created.
- Step1 completes and Step2 starts.
- On Step2 the remote node sets a property (or calls pcs cib), but the remote is not yet set up, so 'pcs cluster cib' will fail there with:

  (err): Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20170506-15994-1swnk1i failed with code: 1

Note that when verify_on_create is set to true we are not using the cib dump/push mechanism. That is fine, because we create the remotes on step1, and the dump/push mechanism is only needed starting from step2, when multiple nodes set cluster properties at the same time.

Tested by Marian Mkrcmari successfully as well.

Closes-Bug: #1689028
Change-Id: I764526b3f3c06591d477cc92779d83a19802368e
Depends-On: I1db31dcc92b8695ab0522bba91df729b37f34e0f
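A minimal sketch of the fix, assuming the remote is created via a pacemaker::resource::remote defined type (the resource title is hypothetical):

  pacemaker::resource::remote { 'overcloud-novacompute-0':
    # Wait for the remote to actually exist before step1 completes,
    # so step2 'pcs cluster cib' calls on the remote cannot race it.
    verify_on_create => true,
  }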
2017-04-05  Make the cluster-check property configurable  (Michele Baldessari, 1 file, -0/+25)
This change makes the global cluster-recheck-interval property configurable and picks a lower default (60s) in case a pacemaker remote node is deployed.

The cluster-recheck-interval is set by pacemaker to a default of 15 minutes. This value is too high when a pacemaker remote service is deployed: with it, a rebooted pacemaker remote node can be reported as offline by pacemaker for up to 15 minutes.

With this change we do the following:
1) Do nothing in case pacemaker remote is not deployed.
2) When pacemaker remote is deployed and the operator has not specified otherwise, we set the recheck interval to 60s.
3) When the operator specifies the recheck interval, we set that.

Change-Id: I900952b33317b7998a1f26a65f4d70c1726df19c
Closes-Bug: #1679753
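A hedged sketch of the resulting logic; the variable names and the pacemaker::property parameters are assumptions for illustration:

  if $pacemaker_remote_deployed {
    # pick() (from stdlib) falls back to the 60s default when the
    # operator has not set a value; without remotes, nothing is managed.
    pacemaker::property { 'cluster-recheck-interval':
      property => 'cluster-recheck-interval',
      value    => pick($cluster_recheck_interval, '60s'),
    }
  }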
2017-01-24  pacemaker remote profile support  (Michele Baldessari, 1 file, -2/+58)
This adds a base profile called pacemaker_remote, which allows the operator to automatically configure the pacemaker_remote service on such nodes. This manifest also automatically adds any pacemaker_remote nodes to the pacemaker cluster.

Depends-On: I0c01ecb7df1a0f9856fdc866b9d06acf0283fa4f
Depends-On: Ic0488f4fc63e35b9aede60fae1e2cab34b1fbdd5
Change-Id: I92953afcc7d536d387381f08164cae8b52f41605
2017-01-19  Add retries to the ::pacemaker::stonith property  (Michele Baldessari, 1 file, -1/+7)
Let's set a default number of retries also for the stonith property creation, just like we do for most of the composable HA resource creation.

Change-Id: Ie6e19cc838a3f45100f6c98a350bdf6a37d40590
Depends-On: I20098c5d69cde356fe79f6d8dbdc03ae42ecb3ef
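A hedged sketch, assuming the stonith property is managed through a pacemaker::stonith class and the retry knob is exposed as tries (both parameter names are assumptions):

  class { 'pacemaker::stonith':
    disable => !$enable_fencing,
    tries   => 10,  # assumed retry count, matching other HA resources
  }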
2017-01-18  Do not depend on bootstrap_nodeid for any pacemaker profile  (Michele Baldessari, 1 file, -1/+1)
When we create a pacemaker resource, it must happen from a single node; if it happens from multiple nodes, an immediate error will be returned by pcs. For the pacemaker roles we enforce this by leveraging the recently introduced <SERVICE_NAME>_bootstrap_short_node_name key, which gives us the first hostname per service, regardless of the role (introduced via I03e8685f939e8ae1fcd8b16883b559615042505d). With this approach, if a pacemaker service belongs to two different roles (say role Controller on node A and role galera on node B), it will only create the resource from one of the two nodes and not both (which would return an error).

Only setting Partial-Bug for this one, because it addresses the issue from the pacemaker resource creation POV (which is always affected). The issue itself is a race that we have theoretically been affected by since the composable roles work landed. While I have tried to fix the more general case in previous attempts, I think it is best if we start a discussion on how to fix it, because each approach has a bunch of potential drawbacks and is quite invasive in how we do things. A discussion slot for this has been proposed for the Atlanta PTG.

Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a
Partial-Bug: #1615983
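A hedged sketch of the per-service guard; the hiera key follows the <SERVICE_NAME>_bootstrap_short_node_name pattern from the commit message, and the comparison and resource details are assumptions:

  # Only the first node hosting this service creates the resource,
  # whichever role that node belongs to.
  if $::hostname == downcase(hiera('haproxy_bootstrap_short_node_name')) {
    pacemaker::resource::service { 'haproxy':
      clone_params => true,
    }
  }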
2016-12-08  Do not use hardcoded controller_node_names when setting up the cluster  (Michele Baldessari, 1 file, -1/+2)
Currently we choose the pacemaker cluster nodes by simply taking hiera('controller_node_names'). We should instead use the pacemaker_short_node_names array, which is built dynamically from all the nodes that include the pacemaker service.

Change-Id: I0a3e4acaab11e078da5eeb2ef2adde5387785927
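A minimal sketch of the change; the surrounding variable name is an assumption:

  # Before: only nodes from the Controller role.
  $cluster_members = hiera('controller_node_names')

  # After: every node that includes the pacemaker service, whatever
  # its role (join() is from puppet stdlib).
  $cluster_members = join(hiera('pacemaker_short_node_names'), ',')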
2016-10-13  Ensure presence of pacemaker restart directory.  (Sofer Athlan-Guyot, 1 file, -4/+0)
Currently the /var/lib/tripleo/pacemaker-restarts directory is created only when base/pacemaker.pp is included in the manifest, with a notification that ensures precedence order and triggers the touch.

Neither the trigger nor the dependency on base/pacemaker.pp should be required: someone using tripleo::pacemaker::resource_restart_flag would expect the file to be created no matter what. For instance, the Cinder upgrade convergence step has this defined:

  Cinder_config<||> ~> Tripleo::Pacemaker::Resource_restart_flag["${::cinder::params::volume_service}"]

but the convergence step does not include base/pacemaker.pp, so the above trigger fails because the directory is not created. It looks the same for manila.pp.

This patch removes the trigger and ensures the directory is created when needed.

Change-Id: Ic3aa82c818662e9e88e21c8381d657adef5b43ac
Closes-Bug: #1632232
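A hedged sketch of the fix inside the defined type itself, assuming the flag is touched via a refreshonly exec; ensure_resource (from stdlib) keeps multiple flag declarations from colliding on the same directory resource:

  define tripleo::pacemaker::resource_restart_flag {
    # Create the directory wherever the define is used, instead of
    # relying on base/pacemaker.pp having been included.
    ensure_resource('file', '/var/lib/tripleo/pacemaker-restarts', {
      'ensure' => 'directory',
    })
    exec { "pacemaker restart flag ${name}":
      command     => "touch /var/lib/tripleo/pacemaker-restarts/${name}",
      path        => ['/usr/bin', '/bin'],
      refreshonly => true,
      require     => File['/var/lib/tripleo/pacemaker-restarts'],
    }
  }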
2016-10-03  Fix the timeout for pacemaker systemd resources  (Michele Baldessari, 1 file, -7/+0)
Back in the Mitaka cycle, via change If6b43982c958f63bc78ad997400bf1279c23df7e, we made sure that the default start and stop timeouts for pacemaker systemd resources is 200s (>= twice the default 90s DefaultTimeoutStopSec in systemd). We did this by setting puppet resource defaults for the Pacemaker::Resource::Service class:

  Pacemaker::Resource::Service {
    op_params => 'start timeout=200s stop timeout=200s',
  }

The problem is that after the composable services rework this no longer works, and the pacemaker systemd resources that still exist do not have these timeouts set. We want to move away from resource defaults for this, because their results depend on inclusion order, which in tripleo is not guaranteed any longer (https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules).

The only services affected in Newton are: cinder-volume, cinder-backup, manila-share, haproxy. I preferred fixing all the pacemaker resources because it seems the cleanest and most logical commit.

Change-Id: If89a95706514e536a7a2949871a0002c79b6046e
Closes-Bug: #1629366
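The per-resource equivalent looks like this; the service name is taken from the commit message, the exact resource declaration is a sketch:

  pacemaker::resource::service { $::cinder::params::volume_service:
    # Explicit timeouts on each resource instead of scope-dependent
    # resource defaults, whose effect varies with inclusion order.
    op_params => 'start timeout=200s stop timeout=200s',
  }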
2016-08-30  Write restart flags to restart services only when necessary  (Jiri Stransky, 1 file, -0/+4)
Write a restart flag file for services managed by Pacemaker into the /var/lib/tripleo/pacemaker-restarts directory. The name of the file must match the name of the clone resource defined in pacemaker. The post-puppet restart script will restart each service that has a restart flag file, and then remove those files.

This approach focuses on $pacemaker_master only (we don't want to restart the pacemaker services 3 times when we have 3 controllers), so it relies on the assumption that we're making the matching config changes across the pacemaker nodes.

Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
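A hedged usage sketch: the flag is notified by config changes and later consumed by the post-puppet restart script (the clone resource name and the Concat title are hypothetical):

  if $pacemaker_master {
    tripleo::pacemaker::resource_restart_flag { 'haproxy-clone':
      # Only touched when the config file actually changes, so the
      # service is restarted only when necessary.
      subscribe => Concat['/etc/haproxy/haproxy.cfg'],
    }
  }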
2016-08-08  Fix parameters and headers inconsistency in the puppet manifests.  (Carlos Camacho, 1 file, -1/+0)
As we are starting to manually check overcloud services, the first step is to check that the puppet profiles are all aligned.

Changes applied:
- No logic added or removed in this submission.
- Removed unused parameters.
- Aligned header comments structure.
- Sorted all profile parameters as follows: mandatory params first, sorted alphabetically, then optional params sorted alphabetically.

Note: Following submissions will check the pacemaker, cinder, mistral and redis services in the base profiles, as some of them have the $pacemaker_master parameter defaulted to true.

Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
2016-07-27  Remove global openstack-core resource  (Giulio Fidente, 1 file, -6/+0)
The openstack-core resource is not needed by the NG Pacemaker architecture. It was moved into an isolated role by [1] so that it could optionally be enabled when wanting the older architecture. This submission removes the old openstack-core global resource.

1. I74a62973146c0261385ecf5fd3d06db51e079caa

Change-Id: I16a786ce167c57848551c7245f4344c382c55b3d
2016-07-22  use parameter to lookup the step instead of hiera again  (Emilien Macchi, 1 file, -3/+3)
In some profiles, we were looking up the $step by using Hiera again, while we already do it in the parameter definition. When using these classes outside THT, that lookup would fail; with this patch we can just set the $step parameter and the rest of the manifest will work.

Change-Id: I7082f47204fb4e529b164e4c4f1032e7bdd88f02
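A minimal before/after sketch of the pattern; the class name and step threshold are illustrative:

  class tripleo::profile::base::example (
    # The step is already looked up once, here in the parameter list.
    $step = hiera('step'),
  ) {
    # Before: if hiera('step') >= 2 { ... } -- a redundant second
    # lookup that breaks when the class is used outside THT.
    # After: reuse the parameter, which callers can set directly.
    if $step >= 2 {
      notice('step 2 or later logic runs here')
    }
  }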
2016-07-15  openstack-core resource does not have interleave=true  (Michele Baldessari, 1 file, -1/+1)
The dummy openstack-core resource was meant to replace keystone so that restarting keystone would not restart the whole cloud. When this resource was introduced, the parameter interleave=true was mistakenly left out. This causes a simple promote operation on the galera resource to restart openstack-core and its children.

Change-Id: Ic590005a9419be87e6e6ea131b0ac0630c5afc19
Closes-Bug: #1603381
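A hedged sketch of the corrected resource; the defined type and parameter names follow puppet-pacemaker conventions but are assumptions here:

  pacemaker::resource::ocf { 'openstack-core':
    ocf_agent_name => 'heartbeat:Dummy',
    # interleave=true scopes ordering to each node's own clone
    # instance, so a galera promote no longer restarts everything.
    clone_params   => 'interleave=true',
  }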
2016-07-12  Implement Pacemaker service profile  (Emilien Macchi, 1 file, -0/+93)
Change-Id: I46215f82480854b5e04aef1ac1609dd99455181b
Closes-Bug: #1601970