aboutsummaryrefslogtreecommitdiffstats
path: root/manifests/profile/pacemaker
AgeCommit message (Collapse)AuthorFilesLines
2017-01-18Do not depend on bootstrap_nodeid for any pacemaker profileMichele Baldessari7-13/+19
When we create a pacemaker resource it must happen from a single node. If it happens from multiple nodes an immediate error will be returned by pcs. For the pacemaker roles we enforce this by leveraging the recently introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us the first hostname per-service, regardless of the role. (introduced via I03e8685f939e8ae1fcd8b16883b559615042505d) With this approach if a pacemaker service belongs to two different roles (say role Controller on node A and role galera on node B), it will only create the resource from one of the two and not both (which would return an error). Only setting Partial-Bug for this one, because it addresses the issue from the pacemaker resource creation POV (which is always affected). But the issue itself is a race that we're theoretically affected by since the composable roles work landed. While I have tried to fix the more general case in previous attempts, I think it is best if we start a discussion on how to fix it, because each approach has a bunch of potential drawbacks and is quite invasive on how we do things. A discussion slot for this has been proposed for the Atlanta PTG. Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a Partial-Bug: #1615983
2017-01-11Set ceph key when using manila ceph backendJan Provaznik1-1/+36
Manila ceph driver reads ceph's client configuration (keyring is the most important) from ceph.conf file (or any other file set by cephfs_conf_path). ceph.conf should be updated with keyring location. If ceph is deployed by tripleo then also manila ceph key is added into ceph and ceph filesystem is created. Depends-On: I18436a64fc991b9e697a1d79e369ac110cf8fe20 Change-Id: Iac4a260af6738ed6afd4bcb107221a736d07c1b5 Partial-Bug: #1644784 Closes-Bug: #1646147
2017-01-02Don't include api/scheduler manifests on manila share service set upJan Provaznik1-2/+0
Manila pacemaker manifest (which sets manila share service only) includes also manila api and scheduler manifests. There is no reason for this. Also it causes that on whichever node manila share service runs also manila api and scheduler services are started. Change-Id: Ia1b39ef36c5bc34813cd6430b69ad9b698acc3cf Closes-Bug: #1653500
2016-12-04Merge "Remove unused pacemaker profiles"Jenkins42-2553/+0
2016-11-29pacemaker: create Mysql_user once Galera is ready (puppet4)Emilien Macchi1-1/+2
Puppet 4 ordering make things more strict in catalog, which is good. Resources have to be orchestrated or Puppet will take them in the order they are found in catalog. This patch makes sure we create MySQL users only when Galera is actually ready. Closes-Bug: #1645787 Change-Id: I536a1a128c3a7eca49bcc4f34a1307bcd60b029e
2016-10-27Merge "Set redis file descriptor limit when run via pacemaker"Jenkins1-0/+17
2016-10-25Only restart haproxy services when enable_load_balancer is definedMichele Baldessari1-1/+1
If we upgrade a cloud that was configured with external load balancer the process will fail during convergence step because it will try to restart haproxy which is not configured when an external load balancer is configured. Closes-Bug: #1636527 Change-Id: I6f6caec3e5c96e77437c1c83e625f39649a66c48
2016-10-25Remove unused pacemaker profilesMichele Baldessari42-2553/+0
With the landing of HA NG in Newton we can actually remove the pacemaker profiles we do not need. The only ones that are being used in one form or the other are: $ grep -ir services\/pacemaker environments | awk '{ print $3 }' | sort | uniq ../puppet/services/pacemaker/cinder-backup.yaml ../puppet/services/pacemaker/cinder-volume.yaml ../puppet/services/pacemaker/database/mysql.yaml ../puppet/services/pacemaker/database/redis.yaml ../puppet/services/pacemaker/haproxy.yaml ../puppet/services/pacemaker/manila-share.yaml ../puppet/services/pacemaker/rabbitmq.yaml ../puppet/services/pacemaker.yaml The only exception is profile/pacemaker/database/mongodbvalidator because it is included by profile/base/database/mongodb.pp Change-Id: I80c8559bb2d915385bcc20ae71fe144ddd6591c1
2016-10-25Set redis file descriptor limit when run via pacemakerMichele Baldessari1-0/+17
The current redis file descriptor limit is 4096 because of two reasons: - It is run via the redis user - It is not started via systemd which has explicit LimitNOFILE set to 10240 (which matches the default configuration of maximum 10000 clients) Create an /etc/security/limits.d/redis.conf file in order to increase the fd limit value With this change we correctly get the following limits: [root@overcloud-controller-0 ~]# pcs status |grep -A2 redis Master/Slave Set: redis-master [redis] Masters: [ overcloud-controller-2 ] Slaves: [ overcloud-controller-0 overcloud-controller-1 ] [root@overcloud-controller-0 ~]# cat /proc/`pgrep redis`/limits | grep open Max open files 10240 10240 files Previously this limit was set to 4096. Change-Id: I7691581bad92ad9442cecd82cf44f5ac78ed169f Closes-Bug: #1635334
2016-10-20Merge "pacemaker/mysql: wait step 2 to remove default accounts"Jenkins1-1/+11
2016-10-13pacemaker/mysql: wait step 2 to remove default accountsEmilien Macchi1-1/+11
remove_default_accounts is a mysql::server parameter that, set to True, will execute some MySQL commands to cleanup MySQL defaults accounts created by packaging. In order to successfully run the commands, we need MySQL up and running, which is not the case at step 1 but at step 2. This patch make sure we run the commands at step 2 on pacemaker master only. No change for scenarios without Pacemaker. Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e Closes-Bug: #1633113
2016-10-12pacemaker: increase timeouts for rabbitmq and redisEmilien Macchi2-0/+2
When we observe the 'stop timeout' values of pacemaker resources: rabbitmq and redis, they are set to 90s. But for all other services, it is set to 200s. The overcloud deployment sometimes fails due to this with the error: Error: Could not complete shutdown of rabbitmq-clone, 1 resources remaining Error performing operation: Timer expired This patch updates the timeout for Redis and RabbitMQ to avoid this error. Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
2016-10-07Use Heat role *_enabled hiera to check Manila backendsGiulio Fidente1-8/+20
Aligns the way how we check for enabled backends in pacemaker/manila.pp with what we did in base/manila/api.pp with [1]. The benefit is that we don't need to emit from the templates custom hiera. 1. I86ba8b9d5872c0f1a94e74215e97b796ad129bfb Change-Id: I04e28a95e8d69a24cd3df109bf1802bfcbd941db
2016-10-06Merge "Enable usage of "short names" for Galera cluster"Jenkins1-1/+6
2016-10-05Enable usage of "short names" for Galera clusterJuan Antonio Osorio Robles1-1/+6
We're not able to use FQDNs yet, so to work around this, we give precedence to a "short name" list we'll get from t-h-t. Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8 Related-Bug: #1628521
2016-10-05Change rabbitmq queues HA mode from ha-all to ha-exactlyMichele Baldessari1-1/+21
It turns out that reducing number of rabbitmq queues in cluster significantly improves performance of cluster especially in the case of failover recovery time. Right now the cluster uses ha-all mode for rabbitmq queues. It is best to change this to "ha-exactly" mode and reduce the number of queue copies to ceil(N/2) where N is number of controllers in the cluster - so in typical scenario of 3 controller It would be 2 by default. It does not make much sense to keep the copies of queues over whole cluster since if the quorum of nodes is lost then the rest of cluster nodes will be stopped anyway. We let the user override this with a parameter. I.e. for a 3 node controlplane cluster we will go from this: pcs resource show rabbitmq Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster) Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}" To this: pcs resource show rabbitmq Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster) Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}" According to Marin Krcmarik's testing recovery time from failure was reduced significantly. Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com> Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084 Partial-Bug: #1628998
2016-10-03Fix the timeout for pacemaker systemd resourcesMichele Baldessari31-3/+40
Back in the Mitaka cycle via the change If6b43982c958f63bc78ad997400bf1279c23df7e we made sure that the default start and stop timeouts for pacemaker systemd resources is 200s (>= twice the default 90s DefaultTimeoutStopSec in systemd). We did this change by setting puppet resource defaults for the Pacemaker::Resource::Service class: Pacemaker::Resource::Service { op_params => 'start timeout=200s stop timeout=200s', } The problem is that after the composable services rework, this does not work anymore and the pacemaker systemd resources that still exist do not have these timeouts set. We want to move away from resource defaults for this because its results are dependent on the inclusion order which in tripleo is not guaranteed any longer (https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules) The only services affected in Newton are: cinder-volume, cinder-backup, manila-share, haproxy. I preferred fixing all the pacemaker resources because it seems the cleanest and most logical commit. Change-Id: If89a95706514e536a7a2949871a0002c79b6046e Closes-Bug: #1629366
2016-09-28Merge "Move db syncs into mysql base role"Jenkins1-0/+5
2016-09-27Move db syncs into mysql base roleDan Prince1-0/+5
This patch moves the various DB syncs into the MySQL role. Database creation needs to occur on the MySQL server to avoid permission issues. This patch also moves database creation to step 2 so we can guarantee that all per-service databases exist at this time. This avoids complex ordering needed during step 3 where services, on different hosts, can run their own db sync's in a distributed fashion. Change-Id: I05cc0afa9373429a3197c194c3e8f784ae96de5f Partial-bug: #1620595
2016-09-26Merge "Add pameter for gmcast.listen_addr configuration"Jenkins1-4/+9
2016-09-26Merge "Make mysql bind-address configurable"Jenkins1-2/+7
2016-09-26Add pameter for gmcast.listen_addr configurationJuan Antonio Osorio Robles1-4/+9
having an actual name for that configuration will allow us to pass a more proper name via t-h-t. Change-Id: Iea4bd67074824e5dc6732fd7e408743e693d80b3
2016-09-24Make mysql bind-address configurableJuan Antonio Osorio Robles1-2/+7
It used to be hardcoded that the bind-address was always coming from the $::hostname fact. This is wrong, as it disregards where we have configured the mysql address. This commit actually makes it configurable, so we'll be able to set it via hieradata. On the other hand, we use the hiera key that we already set 'mysql_bind_host' as a default; if, for some reason, that's unavailable then we fall back to $::hostname. Related-Bug: #1627060 Change-Id: I316acfd514aac63b84890e20283c4ca611ccde8b
2016-09-23Move inclusion of ::manila::db::mysql in manila/api profileGiulio Fidente1-4/+4
In puppet-manila it is the api service performing db sync, not scheduler. This change moves ::manila::db::mysql (which creates the empty database and users) in the tripleo manila/api profile. Also moves rabbit config into a general manila base profile as that would be needed by the scheduler service as well. Change-Id: I2b537f735b8d1be8f39e8c274be3872b193c1014
2016-09-20Fixup manila-cephfs native backend defaultsmarios1-41/+7
The puppet-tripleo side for manila-cephfs landed without specifying defaults for all class params [1] so when cephfs isn't enabled e.g. only generic, then you will get errors for those params. See review comments at [2] for reports of this. This will fixup the manila-cephfs puppet-tripleo side to be more in line with the tidy up adding netapp at [3]. The config is all moved tripleo-heat-templates side. The tht review for this is at https://review.openstack.org/#/c/358525/ and that will now depend on this review. [1] https://review.openstack.org/#/c/354047/ [2] https://review.openstack.org/#/c/354019/ [3] https://review.openstack.org/#/c/354014/ Change-Id: I918f6f23ae0bd3542bcfe1bf0c797d4e6aa8f4d9
2016-09-17Merge "mysql: never add brackets to mysql_bind_host"Jenkins1-1/+1
2016-09-16Add manila-netapp backend to manila class and tidy up genericmarios1-81/+41
This adds support for the manila-netapp backend. The backend specific config is set tht side. So this change also tidies up the manila generic config, which is unnecessarily being duplicated here ( see https://review.openstack.org/#/c/354019/ ) Change-Id: Ic6f8e8d27ca20b9badddea5d16550aa18bff8418
2016-09-16mysql: never add brackets to mysql_bind_hostEmilien Macchi1-1/+1
Don't add brackets on mysql_bind_host parameter in Galera config. Having brackets from this parameter works with old version of Galera but not newest one. So let's remove them at all, so we can safely upgrade Galera in RDO. Change-Id: Ic904d4efda162f18ec8dffb91c2f383f54361f41 Closes-Bug: #1622755
2016-09-01Merge "Write restart flags to restart services only when necessary"Jenkins7-0/+44
2016-08-30Merge "Handle galera_node_names being an array"Jenkins1-1/+10
2016-08-30Write restart flags to restart services only when necessaryJiri Stransky7-0/+44
Write restart flag file for services managed by Pacemaker into /var/lib/tripleo/pacemaker-restarts directory. The name of the file must match the name of the clone resource defined in pacemaker. The post-puppet restart script will restart each service having a restart flag file and remove those files. This approach focuses on $pacemaker_master only (we don't want to restart the pacemaker services 3 times when we have 3 controllers), so it relies on the assumption that we're making the matching config changes across the pacemaker nodes. Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
2016-08-29Merge "Removing WARNING: line has more than 140 characters in puppet-tripleo ↵Jenkins1-1/+5
profiles"
2016-08-29Handle galera_node_names being an arrayJiri Stransky1-1/+10
Prepare the pacemaker mysql manifest that galera_node_names will be an array. Stay backwards compatible to handle comma-delimited-string too and avoid a chicken-and-egg patch problem between t-h-t and puppet-tripleo. Change-Id: Ia0d9d59728c8771974bfbc486f4929b99a38e4fb Partially-Implements: blueprint custom-roles
2016-08-29Merge "Add Manila CephFS backend to manila class"Jenkins1-1/+54
2016-08-25Add Manila CephFS backend to manila classErno Kuvaja1-1/+54
Change-Id: Idaad75238a2884fe82b1e5fce3ed910d866b98a2
2016-08-23Merge "Move ceilometer api to run under apache wsgi"Jenkins2-18/+3
2016-08-17Move ceilometer api to run under apache wsgiPradeep Kilambi2-18/+3
Change-Id: If3feb859b527d08e10c124b5ad2f7f4b1f19156a
2016-08-17Configure galera-monitor on all controller nodesMichele Baldessari1-1/+1
When we implemented the galera composable role we accidentally moved the xinetd.d monitor service on the bootstrap node only. This meant that haproxy believed that galera was down on the non-bootstrap nodes. A shutdown of the bootstrap node meant that galera was effectively down because haproxy would refuse to redirect the traffic to the non-bootstrap node. Fix this by creating the /etc/xinetd.d/galera-monitor on all controller nodes. Change-Id: Ib5a06b3abbc32182476c2b0c81eb77a12821ad6b
2016-08-17Merge "Add cinder-backup profiles"Jenkins1-0/+54
2016-08-11Removing WARNING: line has more than 140 characters in puppet-tripleo profilesCarlos Camacho1-1/+5
Some lint checks are returning: WARNING: line has more than 140 characters in puppet-tripleo profiles This patch will remove those warnings by adding \'s Change-Id: I19b56c93db82948fb0498a4c9851b522c81946f8
2016-08-11Add cinder-backup profilesDan Prince1-0/+54
Adds a Cinder backup profile for Cinder backup service activation (to be used in https://review.openstack.org/#/c/304563). Cinder backup uses Swift as a default. Change-Id: Ib1dfe52b83ab01819fc669312967950e75d8ddf1 Co-Authored-By: Jon Bernard <jobernar@redhat.com> Co-Authored-By: Boris Kreitchman <bkreitch@gmail.com>
2016-08-08Fix parameters and headers inconsistency in the puppet manifests.Carlos Camacho48-229/+162
As we are staring to manually check overcloud services the first step is to check that the puppet profiles are all aligned. Changes applied: No logic added or removed in this submission. Removed unused parameters. Align header comments structure. All profiles parameters sorted following: "Mandatory params first sorted alphabetically then optional params sorted alphabetically." Note: Following submissions will check pacemaker, cinder, mistral and redis services in the base profiles as some of them has the $pacemaker_master parameter defaulted to true. Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
2016-08-05Remove keystone PKI cert generationSteven Hardy1-3/+0
We don't currently offer any parameter interface to enable PKI certs, and these have all been deprecated by keystone, so remove them. Change-Id: I8232262b928c91dcde7bea2f23fa2a7c2660719e
2016-08-04Merge "Next generation HA architecture work"Jenkins2-17/+17
2016-08-03Remove unused parameter in saharaCarlos Camacho1-12/+1
Remove unused parameter in sahara Change-Id: I46c033b410ab850289b798ee93990b6fb10c80ea
2016-08-02Fixup nit in manila pacemaker profile, duplicate variablemarios1-1/+0
See discussion at https://review.openstack.org/#/c/342961/8 Change-Id: I571b65a5402c1028418476a573ebeb9450ed00c9
2016-08-01Next generation HA architecture workMichele Baldessari2-17/+17
This change moves the cinder-volume/cinder-scheduler constraints in the cinder-scheduler profile as these can't be applied by the cinder-volume service when cinder-scheduler isn't managed by Pacemaker. Blueprint: https://blueprints.launchpad.net/tripleo/+spec/ha-lightweight-architecture Change-Id: I5e7585c08675d8a4bd071523b94210d325d79b59 Implements: blueprint ha-lightweight-architecture Co-Author: cmsj@tenshu.net
2016-07-29Move constraints to their respective servicesMichele Baldessari2-0/+36
The openstack-core-then-httpd constraint needs to live in the apache pacemaker manifest and not in the main controller manifest file. The same goes for those specific vsm/cisco neutron resources. Change-Id: I2041d4d163f051427b62eec07b8345ad7006cc1d
2016-07-29Merge "Move nova constraints from controller manifest to each service"Jenkins3-0/+87
2016-07-28Merge "Remove global openstack-core resource"Jenkins1-11/+0