aboutsummaryrefslogtreecommitdiffstats
path: root/manifests/profile/pacemaker
AgeCommit message (Collapse)AuthorFilesLines
2017-06-03Merge "Clustercheck, monitor service for galera containers"Jenkins1-0/+65
2017-06-03Merge "Puppet module to deploy MySQL bundle for HA"Jenkins1-0/+302
2017-06-03Merge "Puppet module to deploy RabbitMQ bundle for HA"Jenkins1-0/+194
2017-06-02Merge "Puppet module to deploy Redis bundle for HA"Jenkins1-0/+178
2017-06-02Puppet module to deploy MySQL bundle for HADamien Ciabrini1-0/+302
This module is used by tripleo-heat-templates to configure and deploy Kolla-based mysql containers managed by pacemaker. We use short-lived containers that call pcs via puppet to create the needed pacemaker resources, properties and constraints. Co-Authored-By: Michele Baldesari <michele@acksyn.org> Partial-Bug: #1692842 Depends-On: I44fbd7f89ab22b72e8d3fc0a0e3fe54a9418a60f Depends-On: Ie9b7e7d2a3cec4b121915a17c1e809e4ec950e7f Change-Id: I3b4d8ad2eec70080419882d5d822f78ebd3721ae
2017-06-02Merge "Puppet module to deploy HAProxy bundle for HA"Jenkins1-0/+196
2017-05-30Puppet module to deploy RabbitMQ bundle for HADamien Ciabrini1-0/+194
This module is used by tripleo-heat-templates to configure and deploy Kolla-based RabbitMQ containers managed by pacemaker. We use short-lived containers that call pcs via puppet to create the needed pacemaker resources, properties and constraints. Co-Authored-By: Michele Baldessari <michele@acksyn.org> Co-Authored-By: John Eckersberg <jeckersb@redhat.com> Partial-Bug: #1692909 Change-Id: I0722e4a4d4716f477e8304cfa1aadd3eef7c2f31 Depends-On: I44fbd7f89ab22b72e8d3fc0a0e3fe54a9418a60f Depends-On: Ie9b7e7d2a3cec4b121915a17c1e809e4ec950e7f
2017-05-25Puppet module to deploy HAProxy bundle for HADamien Ciabrini1-0/+196
This module is used by tripleo-heat-templates to configure and deploy Kolla-based haproxy containers managed by pacemaker. We use short-lived containers that call pcs via puppet to create the needed pacemaker resources, properties and constraints. Co-Authored-By: Michele Baldesari <michele@acksyn.org> Partial-Bug: #1692908 Depends-On: I44fbd7f89ab22b72e8d3fc0a0e3fe54a9418a60f Depends-On: Ie9b7e7d2a3cec4b121915a17c1e809e4ec950e7f Change-Id: Ifcf890a88ef003d3ab754cb677cbf34ba8db9312
2017-05-25Puppet module to deploy Redis bundle for HADamien1-0/+178
This module is used by tripleo-heat-templates to configure and deploy Kolla-based Redis containers managed by pacemaker. We use short-lived containers that call pcs via puppet to create the needed pacemaker resources, properties and constraints. Co-Authored-By: Michele Baldesari <michele@acksyn.org> Partial-Bug: #1692924 Depends-On: I44fbd7f89ab22b72e8d3fc0a0e3fe54a9418a60f Depends-On: Ie9b7e7d2a3cec4b121915a17c1e809e4ec950e7f Change-Id: Ia1131611d15670190b7b6654f72e6290bf7f8b9e
2017-05-23Clustercheck, monitor service for galera containersDamien Ciabrini1-0/+65
In HA overcloud deployments, HAProxy makes use of a helper service called "clustercheck", to check whether galera nodes are available for serving traffic. This change implements a dedicated profile for clustercheck, which was originally part of the pacemaker mysql profile. The profile generates the necessary configuration files for clustercheck and let heat templates manage the associated container's lifecycle. Co-Authored-By: Michele Baldessari <michele@acksyn.org> Partial-Bug: #1692969 Change-Id: I1aabe34fa6a9c8c705a4405f275b66502c313cf2
2017-05-16Composable Role for Neutron LBaaSRyan Hefner1-0/+44
Add composable service interface for Neutron LBaaSv2 service. Change-Id: Ieeb21fafd340fdfbaddbe7633946fe0f05c640c9
2017-05-05Remove limits for redis in /etc/security/limits.dMichele Baldessari1-15/+16
Now that puppet-redis supports ulimit for cluster managed redis (via https://github.com/arioch/puppet-redis/pull/192), we need to remove the file snippet as otherwise we will get a duplicate resource error. We will need to create a THT change that at the very least sets the redis::managed_by_cluster_manager key to true so that /etc/security/limits.d/redis.conf gets created. We also add code to not break backwards compatibility with the old hiera key. Change-Id: I4ffccfe3e3ba862d445476c14c8f2cb267fa108d Partial-Bug: #1688464
2017-04-26Add a flag to rabbitmq so that we can deploy with ha-mode: all againMichele Baldessari1-2/+6
In change Ib62001c03e1e08f58cf0c6e0ba07a8879a584084 we switched the rabbitmq queues HA mode from ha-all to ha-exactly. While this gives us a nice performance boost with rabbitmq, it makes rabbit less resilient to network glitches as we painfully found out via https://bugzilla.redhat.com/show_bug.cgi?id=1441635. Will propose another THT change to actually change the default to -1 so we get this ha-mode:all by default. Change-Id: I9a90e71094b8d8d58b5be0a45a2979701b0ac21c Partial-Bug: #1686337 Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com> Co-Authored-By: John Eckersberg <jeckersb@redhat.com>
2017-04-06Make galera-ready exec refreshonlyAlex Schultz1-2/+3
Previously we were always run the galera-ready exec every step. This change switches it to be refreshonly so we only wait when the service is setup or restarted. Change-Id: I5ff9d49c2590751913b96777bcd72c8a15627a01 Closes-Bug: #1680586
2017-02-17Merge "xinetd: bind only on mysql network"Jenkins1-0/+1
2017-02-16Merge "Fix a typo in mysql.pp"Jenkins1-1/+1
2017-02-03Revert "Revert "set innodb_file_per_table to ON for MySQL / Galera""Alex Schultz1-0/+1
This reverts commit 3f7e74ab24bb43f9ad7e24e0efd4206ac6a3dd4e. After identifying how to workaround the performance issues on the undercloud, let's put this back in. Enabling innodb_file_per_table is important for operators to be able to better manage their databases. Change-Id: I435de381a0f0e3ef221e498f442335cdce3fb818 Depends-On: I77507c638237072e38d9888aff3da884aeff0b59 Closes-Bug: #1660722
2017-02-02Revert "set innodb_file_per_table to ON for MySQL / Galera"Alex Schultz1-1/+0
This reverts commit 621ea892a299d2029348db2b56fea1338bd41c48. We're getting performance problems on SATA disks. Change-Id: I30312fd5ca3405694d57e6a4ff98b490de388b92 Closes-Bug: #1661396 Related-Bug: #1660722
2017-02-01set innodb_file_per_table to ON for MySQL / GaleraMike Bayer1-0/+1
InnoDB uses a single file by default which can grow to be tens/hundreds of gigabytes, and is not shrinkable even if data is deleted from the database. Best practices are that innodb_file_per_table is set to ON which instead stores each database table in its own file, each of which is also shrinkable by the InnoDB engine. Closes-Bug: #1660722 Change-Id: I59ee53f6462a2eeddad72b1d75c77a69322d5de4
2017-01-26Support composable HA for the Ceph rbdmirror daemonGiulio Fidente1-1/+21
Follow up patch for I63da4f48da14534fd76265764569e76300534472 to support composable HA for the Ceph rbdmirror daemon. Change-Id: I3767bee4b1c7849fa85e71bcc57534b393d2d415
2017-01-26Merge "Ensure basic Ceph configuration is performed by RBD mirror"Jenkins1-0/+1
2017-01-25Merge "Composable HA"Jenkins7-47/+204
2017-01-25Composable HAMichele Baldessari7-47/+204
This commit implements composable HA for the pacemaker profiles. - Everytime a pacemaker resource gets included on a node, that node will add a node cluster property with the name of the resource (e.g. galera-role=true) - Add a location rule constraint to force running the resource only on the nodes that have that property - We also make sure that any pacemaker resource/property creation has a predefined number of tries (20 by default). The reason for this is that within composable HA, it might be possible to get "older CIB" errors when another node changed the CIB while we were doing an operation on it. Simply retrying fixes this. - Also make sure that we use the newly introduced pacemaker::constraint::order class instead of the older pacemaker::constraint::base class. The former uses the push_cib() function and hence behaves correctly in case multiple nodes try to modify the CIB at the same time. Change-Id: I63da4f48da14534fd76265764569e76300534472 Depends-On: Ib931adaff43dbc16220a90fb509845178d696402 Depends-On: I8d78cc1b14f0e18e034b979a826bf3cdb0878bae Depends-On: Iba1017c33b1cd4d56a3ee8824d851b38cfdbc2d3
2017-01-25Ensure basic Ceph configuration is performed by RBD mirrorGiulio Fidente1-0/+1
Previously we missed to perform the basic Ceph client configuration on a node where only the RBD mirror service was deployed. Change-Id: Ie6a4284a88714bcee964a38636e12aa88bb95c9d Co-Authored-By: Michele Baldessari <michele@acksyn.org> Related-Bug: #1652177
2017-01-25Fix wrong hiera key in ceph_rbdmirrorMichele Baldessari1-1/+1
There is a typo in the bootstrap check which will lead to: Could not find data item ceph_rbdmirror_bootstrap_short_node_name in any Hiera data file and no default supplied at /etc/puppet/modules/tripleo/manifests/profile/pacemaker/ceph/rbdmirror.pp We need to be using the correct one: $ hiera ceph_rbdmirror_short_bootstrap_node_name overcloud-remote-0 Change-Id: Ic343e5f99e48360bdd2d2989781a4b6ca484e8fc
2017-01-18Add Ceph RBD mirror Pacemaker profileGiulio Fidente1-0/+77
This change adds a profile for the Ceph RBD mirror service, which should be managed by Pacemaker to make sure there is always a single instance running. Change-Id: Ic63dc5cffece38942d305f538f71dd58a5d50789 Partial-Bug: #1652177
2017-01-18Do not depend on bootstrap_nodeid for any pacemaker profileMichele Baldessari7-13/+19
When we create a pacemaker resource it must happen from a single node. If it happens from multiple nodes an immediate error will be returned by pcs. For the pacemaker roles we enforce this by leveraging the recently introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us the first hostname per-service, regardless of the role. (introduced via I03e8685f939e8ae1fcd8b16883b559615042505d) With this approach if a pacemaker service belongs to two different roles (say role Controller on node A and role galera on node B), it will only create the resource from one of the two and not both (which would return an error). Only setting Partial-Bug for this one, because it addresses the issue from the pacemaker resource creation POV (which is always affected). But the issue itself is a race that we're theoretically affected by since the composable roles work landed. While I have tried to fix the more general case in previous attempts, I think it is best if we start a discussion on how to fix it, because each approach has a bunch of potential drawbacks and is quite invasive on how we do things. A discussion slot for this has been proposed for the Atlanta PTG. Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a Partial-Bug: #1615983
2017-01-11Set ceph key when using manila ceph backendJan Provaznik1-1/+36
Manila ceph driver reads ceph's client configuration (keyring is the most important) from ceph.conf file (or any other file set by cephfs_conf_path). ceph.conf should be updated with keyring location. If ceph is deployed by tripleo then also manila ceph key is added into ceph and ceph filesystem is created. Depends-On: I18436a64fc991b9e697a1d79e369ac110cf8fe20 Change-Id: Iac4a260af6738ed6afd4bcb107221a736d07c1b5 Partial-Bug: #1644784 Closes-Bug: #1646147
2017-01-05Fix a typo in mysql.ppCao Xuan Hoang1-1/+1
Removed redundant 'the' Change-Id: Ie2051f35ec1e7010423c46084f5512c02af85f33
2017-01-02Don't include api/scheduler manifests on manila share service set upJan Provaznik1-2/+0
Manila pacemaker manifest (which sets manila share service only) includes also manila api and scheduler manifests. There is no reason for this. Also it causes that on whichever node manila share service runs also manila api and scheduler services are started. Change-Id: Ia1b39ef36c5bc34813cd6430b69ad9b698acc3cf Closes-Bug: #1653500
2016-12-10xinetd: bind only on mysql networkDimitri Savineau1-0/+1
By default galera-monitor xinetd is binding on all the interfaces. That means that the port 9200 is exposed on the external network. Because haproxy is using the same network for the backend and the check we can reuse it for the xinetd binding. Change-Id: If1a50515593e81f46d67309bdeecbe84c1d0ebe4
2016-12-04Merge "Remove unused pacemaker profiles"Jenkins42-2553/+0
2016-11-29pacemaker: create Mysql_user once Galera is ready (puppet4)Emilien Macchi1-1/+2
Puppet 4 ordering make things more strict in catalog, which is good. Resources have to be orchestrated or Puppet will take them in the order they are found in catalog. This patch makes sure we create MySQL users only when Galera is actually ready. Closes-Bug: #1645787 Change-Id: I536a1a128c3a7eca49bcc4f34a1307bcd60b029e
2016-10-27Merge "Set redis file descriptor limit when run via pacemaker"Jenkins1-0/+17
2016-10-25Only restart haproxy services when enable_load_balancer is definedMichele Baldessari1-1/+1
If we upgrade a cloud that was configured with external load balancer the process will fail during convergence step because it will try to restart haproxy which is not configured when an external load balancer is configured. Closes-Bug: #1636527 Change-Id: I6f6caec3e5c96e77437c1c83e625f39649a66c48
2016-10-25Remove unused pacemaker profilesMichele Baldessari42-2553/+0
With the landing of HA NG in Newton we can actually remove the pacemaker profiles we do not need. The only ones that are being used in one form or the other are: $ grep -ir services\/pacemaker environments | awk '{ print $3 }' | sort | uniq ../puppet/services/pacemaker/cinder-backup.yaml ../puppet/services/pacemaker/cinder-volume.yaml ../puppet/services/pacemaker/database/mysql.yaml ../puppet/services/pacemaker/database/redis.yaml ../puppet/services/pacemaker/haproxy.yaml ../puppet/services/pacemaker/manila-share.yaml ../puppet/services/pacemaker/rabbitmq.yaml ../puppet/services/pacemaker.yaml The only exception is profile/pacemaker/database/mongodbvalidator because it is included by profile/base/database/mongodb.pp Change-Id: I80c8559bb2d915385bcc20ae71fe144ddd6591c1
2016-10-25Set redis file descriptor limit when run via pacemakerMichele Baldessari1-0/+17
The current redis file descriptor limit is 4096 because of two reasons: - It is run via the redis user - It is not started via systemd which has explicit LimitNOFILE set to 10240 (which matches the default configuration of maximum 10000 clients) Create an /etc/security/limits.d/redis.conf file in order to increase the fd limit value With this change we correctly get the following limits: [root@overcloud-controller-0 ~]# pcs status |grep -A2 redis Master/Slave Set: redis-master [redis] Masters: [ overcloud-controller-2 ] Slaves: [ overcloud-controller-0 overcloud-controller-1 ] [root@overcloud-controller-0 ~]# cat /proc/`pgrep redis`/limits | grep open Max open files 10240 10240 files Previously this limit was set to 4096. Change-Id: I7691581bad92ad9442cecd82cf44f5ac78ed169f Closes-Bug: #1635334
2016-10-20Merge "pacemaker/mysql: wait step 2 to remove default accounts"Jenkins1-1/+11
2016-10-13pacemaker/mysql: wait step 2 to remove default accountsEmilien Macchi1-1/+11
remove_default_accounts is a mysql::server parameter that, set to True, will execute some MySQL commands to cleanup MySQL defaults accounts created by packaging. In order to successfully run the commands, we need MySQL up and running, which is not the case at step 1 but at step 2. This patch make sure we run the commands at step 2 on pacemaker master only. No change for scenarios without Pacemaker. Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e Closes-Bug: #1633113
2016-10-12pacemaker: increase timeouts for rabbitmq and redisEmilien Macchi2-0/+2
When we observe the 'stop timeout' values of pacemaker resources: rabbitmq and redis, they are set to 90s. But for all other services, it is set to 200s. The overcloud deployment sometimes fails due to this with the error: Error: Could not complete shutdown of rabbitmq-clone, 1 resources remaining Error performing operation: Timer expired This patch updates the timeout for Redis and RabbitMQ to avoid this error. Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
2016-10-07Use Heat role *_enabled hiera to check Manila backendsGiulio Fidente1-8/+20
Aligns the way how we check for enabled backends in pacemaker/manila.pp with what we did in base/manila/api.pp with [1]. The benefit is that we don't need to emit from the templates custom hiera. 1. I86ba8b9d5872c0f1a94e74215e97b796ad129bfb Change-Id: I04e28a95e8d69a24cd3df109bf1802bfcbd941db
2016-10-06Merge "Enable usage of "short names" for Galera cluster"Jenkins1-1/+6
2016-10-05Enable usage of "short names" for Galera clusterJuan Antonio Osorio Robles1-1/+6
We're not able to use FQDNs yet, so to work around this, we give precedence to a "short name" list we'll get from t-h-t. Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8 Related-Bug: #1628521
2016-10-05Change rabbitmq queues HA mode from ha-all to ha-exactlyMichele Baldessari1-1/+21
It turns out that reducing number of rabbitmq queues in cluster significantly improves performance of cluster especially in the case of failover recovery time. Right now the cluster uses ha-all mode for rabbitmq queues. It is best to change this to "ha-exactly" mode and reduce the number of queue copies to ceil(N/2) where N is number of controllers in the cluster - so in typical scenario of 3 controller It would be 2 by default. It does not make much sense to keep the copies of queues over whole cluster since if the quorum of nodes is lost then the rest of cluster nodes will be stopped anyway. We let the user override this with a parameter. I.e. for a 3 node controlplane cluster we will go from this: pcs resource show rabbitmq Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster) Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}" To this: pcs resource show rabbitmq Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster) Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}" According to Marin Krcmarik's testing recovery time from failure was reduced significantly. Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com> Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084 Partial-Bug: #1628998
2016-10-03Fix the timeout for pacemaker systemd resourcesMichele Baldessari31-3/+40
Back in the Mitaka cycle via the change If6b43982c958f63bc78ad997400bf1279c23df7e we made sure that the default start and stop timeouts for pacemaker systemd resources is 200s (>= twice the default 90s DefaultTimeoutStopSec in systemd). We did this change by setting puppet resource defaults for the Pacemaker::Resource::Service class: Pacemaker::Resource::Service { op_params => 'start timeout=200s stop timeout=200s', } The problem is that after the composable services rework, this does not work anymore and the pacemaker systemd resources that still exist do not have these timeouts set. We want to move away from resource defaults for this because its results are dependent on the inclusion order which in tripleo is not guaranteed any longer (https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules) The only services affected in Newton are: cinder-volume, cinder-backup, manila-share, haproxy. I preferred fixing all the pacemaker resources because it seems the cleanest and most logical commit. Change-Id: If89a95706514e536a7a2949871a0002c79b6046e Closes-Bug: #1629366
2016-09-28Merge "Move db syncs into mysql base role"Jenkins1-0/+5
2016-09-27Move db syncs into mysql base roleDan Prince1-0/+5
This patch moves the various DB syncs into the MySQL role. Database creation needs to occur on the MySQL server to avoid permission issues. This patch also moves database creation to step 2 so we can guarantee that all per-service databases exist at this time. This avoids complex ordering needed during step 3 where services, on different hosts, can run their own db sync's in a distributed fashion. Change-Id: I05cc0afa9373429a3197c194c3e8f784ae96de5f Partial-bug: #1620595
2016-09-26Merge "Add pameter for gmcast.listen_addr configuration"Jenkins1-4/+9
2016-09-26Merge "Make mysql bind-address configurable"Jenkins1-2/+7
2016-09-26Add pameter for gmcast.listen_addr configurationJuan Antonio Osorio Robles1-4/+9
having an actual name for that configuration will allow us to pass a more proper name via t-h-t. Change-Id: Iea4bd67074824e5dc6732fd7e408743e693d80b3