path: root/manifests/profile/pacemaker
Age | Commit message | Author | Files | Lines
2016-10-25 | Set redis file descriptor limit when run via pacemaker | Michele Baldessari | 1 | -0/+17
The current redis file descriptor limit is 4096 because of two reasons:
- It is run via the redis user
- It is not started via systemd which has explicit LimitNOFILE set to 10240 (which matches the default configuration of maximum 10000 clients)
Create an /etc/security/limits.d/redis.conf file in order to increase the fd limit value.
With this change we correctly get the following limits:
    [root@overcloud-controller-0 ~]# pcs status |grep -A2 redis
    Master/Slave Set: redis-master [redis]
    Masters: [ overcloud-controller-2 ]
    Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
    [root@overcloud-controller-0 ~]# cat /proc/`pgrep redis`/limits | grep open
    Max open files 10240 10240 files
Previously this limit was set to 4096.
Change-Id: I7691581bad92ad9442cecd82cf44f5ac78ed169f
Closes-Bug: #1635334
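The manual check in the commit message above (grepping /proc/<pid>/limits for the open-files row) could be automated roughly as follows. This is only a sketch: the function name `parse_open_files_limit` is hypothetical and not part of puppet-tripleo, and it assumes the usual row layout of "Max open files", soft limit, hard limit, units.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a
    /proc/<pid>/limits dump. 'unlimited' is reported as None."""
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: 'Max' 'open' 'files' <soft> <hard> 'files'
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


# Sample row taken from the commit message above.
sample = "Max open files            10240                10240                files"
soft, hard = parse_open_files_limit(sample)
print(soft, hard)
```

On a live node the input would be the contents of /proc/<pid>/limits for the redis process; the sample string here stands in for that file.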
2016-10-20 | Merge "pacemaker/mysql: wait step 2 to remove default accounts" | Jenkins | 1 | -1/+11
2016-10-13 | pacemaker/mysql: wait step 2 to remove default accounts | Emilien Macchi | 1 | -1/+11
remove_default_accounts is a mysql::server parameter that, when set to True, executes some MySQL commands to clean up the default MySQL accounts created by packaging. To run the commands successfully, we need MySQL up and running, which is not the case at step 1 but is at step 2. This patch makes sure we run the commands at step 2, on the pacemaker master only. No change for scenarios without Pacemaker.
Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e
Closes-Bug: #1633113
2016-10-12 | pacemaker: increase timeouts for rabbitmq and redis | Emilien Macchi | 2 | -0/+2
The 'stop timeout' values of the rabbitmq and redis pacemaker resources are set to 90s, while for all other services they are set to 200s. The overcloud deployment sometimes fails because of this, with the error:
    Error: Could not complete shutdown of rabbitmq-clone, 1 resources remaining
    Error performing operation: Timer expired
This patch updates the timeout for Redis and RabbitMQ to avoid this error.
Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
2016-10-07 | Use Heat role *_enabled hiera to check Manila backends | Giulio Fidente | 1 | -8/+20
Aligns the way we check for enabled backends in pacemaker/manila.pp with what we did in base/manila/api.pp with [1]. The benefit is that we don't need to emit custom hiera from the templates.
1. I86ba8b9d5872c0f1a94e74215e97b796ad129bfb
Change-Id: I04e28a95e8d69a24cd3df109bf1802bfcbd941db
2016-10-06 | Merge "Enable usage of "short names" for Galera cluster" | Jenkins | 1 | -1/+6
2016-10-05 | Enable usage of "short names" for Galera cluster | Juan Antonio Osorio Robles | 1 | -1/+6
We're not able to use FQDNs yet, so to work around this, we give precedence to a "short name" list we'll get from t-h-t.
Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8
Related-Bug: #1628521
2016-10-05 | Change rabbitmq queues HA mode from ha-all to ha-exactly | Michele Baldessari | 1 | -1/+21
It turns out that reducing the number of rabbitmq queue copies in the cluster significantly improves cluster performance, especially failover recovery time. Right now the cluster uses ha-all mode for rabbitmq queues. It is best to change this to "ha-exactly" mode and reduce the number of queue copies to ceil(N/2), where N is the number of controllers in the cluster, so in the typical scenario of 3 controllers it would be 2 by default. It does not make much sense to keep copies of the queues across the whole cluster, since if the quorum of nodes is lost the rest of the cluster nodes will be stopped anyway. We let the user override this with a parameter.
I.e. for a 3 node controlplane cluster we will go from this:
    pcs resource show rabbitmq
    Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
    Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
To this:
    pcs resource show rabbitmq
    Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
    Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
According to Marian Krcmarik's testing, recovery time from failure was reduced significantly.
Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com>
Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084
Partial-Bug: #1628998
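The ceil(N/2) rule and the resulting set_policy value can be worked through in a few lines. This is an illustrative sketch, not the manifest's code: the function name `ha_exactly_policy` and its `queue_copies` override argument are hypothetical stand-ins for the user-overridable parameter mentioned above.

```python
import json
import math


def ha_exactly_policy(controller_count, queue_copies=None):
    """Build the set_policy string for ha-exactly mode.

    queue_copies defaults to ceil(N/2); passing a value models the
    user override (hypothetical argument name).
    """
    if queue_copies is None:
        queue_copies = math.ceil(controller_count / 2)
    definition = json.dumps(
        {"ha-mode": "exactly", "ha-params": queue_copies},
        separators=(",", ":"),  # compact form, as in the pcs output
    )
    return r"ha-all ^(?!amq\.).* " + definition


for n in (1, 3, 5):
    print(n, ha_exactly_policy(n))
```

For 3 controllers this yields 2 copies, matching the pcs output quoted in the commit message; 5 controllers would give 3.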
2016-10-03 | Fix the timeout for pacemaker systemd resources | Michele Baldessari | 31 | -3/+40
Back in the Mitaka cycle, via the change If6b43982c958f63bc78ad997400bf1279c23df7e, we made sure that the default start and stop timeouts for pacemaker systemd resources are 200s (>= twice the default 90s DefaultTimeoutStopSec in systemd). We did this by setting puppet resource defaults for the Pacemaker::Resource::Service class:
    Pacemaker::Resource::Service {
      op_params => 'start timeout=200s stop timeout=200s',
    }
The problem is that after the composable services rework this does not work anymore, and the pacemaker systemd resources that still exist do not have these timeouts set. We want to move away from resource defaults for this because their results depend on the inclusion order, which in tripleo is not guaranteed any longer (https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules).
The only services affected in Newton are: cinder-volume, cinder-backup, manila-share, haproxy. I preferred fixing all the pacemaker resources because it seems the cleanest and most logical commit.
Change-Id: If89a95706514e536a7a2949871a0002c79b6046e
Closes-Bug: #1629366
2016-09-28 | Merge "Move db syncs into mysql base role" | Jenkins | 1 | -0/+5
2016-09-27 | Move db syncs into mysql base role | Dan Prince | 1 | -0/+5
This patch moves the various DB syncs into the MySQL role. Database creation needs to occur on the MySQL server to avoid permission issues. This patch also moves database creation to step 2 so we can guarantee that all per-service databases exist at this time. This avoids the complex ordering needed during step 3, where services on different hosts can run their own db syncs in a distributed fashion.
Change-Id: I05cc0afa9373429a3197c194c3e8f784ae96de5f
Partial-bug: #1620595
2016-09-26 | Merge "Add pameter for gmcast.listen_addr configuration" | Jenkins | 1 | -4/+9
2016-09-26 | Merge "Make mysql bind-address configurable" | Jenkins | 1 | -2/+7
2016-09-26 | Add pameter for gmcast.listen_addr configuration | Juan Antonio Osorio Robles | 1 | -4/+9
Having an actual name for that configuration will allow us to pass a more proper name via t-h-t.
Change-Id: Iea4bd67074824e5dc6732fd7e408743e693d80b3
2016-09-24 | Make mysql bind-address configurable | Juan Antonio Osorio Robles | 1 | -2/+7
It used to be hardcoded that the bind-address was always coming from the $::hostname fact. This is wrong, as it disregards where we have configured the mysql address. This commit actually makes it configurable, so we'll be able to set it via hieradata. On the other hand, we use the hiera key that we already set 'mysql_bind_host' as a default; if, for some reason, that's unavailable then we fall back to $::hostname.
Related-Bug: #1627060
Change-Id: I316acfd514aac63b84890e20283c4ca611ccde8b
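The lookup precedence described above (explicit setting, then the 'mysql_bind_host' hiera key, then the hostname fact) can be modelled as below. This is a Python stand-in for behaviour that lives in a Puppet manifest; the function and argument names are hypothetical, and a plain dict is assumed in place of real hiera data.

```python
def resolve_bind_address(bind_address=None, hieradata=None, hostname_fact=None):
    """Pick the MySQL bind address by precedence:
    1. an explicitly configured value,
    2. the 'mysql_bind_host' key from hieradata,
    3. the hostname fact as the last resort.
    """
    if bind_address:
        return bind_address
    hieradata = hieradata or {}
    return hieradata.get("mysql_bind_host") or hostname_fact


# Illustrative values only.
print(resolve_bind_address(
    hieradata={"mysql_bind_host": "172.16.2.5"},
    hostname_fact="overcloud-controller-0",
))
```

The point of the ordering is that the hostname fact is only consulted when nothing more specific has been configured.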
2016-09-23 | Move inclusion of ::manila::db::mysql in manila/api profile | Giulio Fidente | 1 | -4/+4
In puppet-manila it is the api service that performs the db sync, not the scheduler. This change moves ::manila::db::mysql (which creates the empty database and users) into the tripleo manila/api profile. It also moves the rabbit config into a general manila base profile, as that would be needed by the scheduler service as well.
Change-Id: I2b537f735b8d1be8f39e8c274be3872b193c1014
2016-09-20 | Fixup manila-cephfs native backend defaults | marios | 1 | -41/+7
The puppet-tripleo side for manila-cephfs landed without specifying defaults for all class params [1], so when cephfs isn't enabled (e.g. only generic is), you will get errors for those params. See review comments at [2] for reports of this. This will fix up the manila-cephfs puppet-tripleo side to be more in line with the tidy up adding netapp at [3]. The config is all moved to the tripleo-heat-templates side. The tht review for this is at https://review.openstack.org/#/c/358525/ and that will now depend on this review.
[1] https://review.openstack.org/#/c/354047/
[2] https://review.openstack.org/#/c/354019/
[3] https://review.openstack.org/#/c/354014/
Change-Id: I918f6f23ae0bd3542bcfe1bf0c797d4e6aa8f4d9
2016-09-17 | Merge "mysql: never add brackets to mysql_bind_host" | Jenkins | 1 | -1/+1
2016-09-16 | Add manila-netapp backend to manila class and tidy up generic | marios | 1 | -81/+41
This adds support for the manila-netapp backend. The backend-specific config is set on the tht side. This change also tidies up the manila generic config, which is unnecessarily being duplicated here (see https://review.openstack.org/#/c/354019/).
Change-Id: Ic6f8e8d27ca20b9badddea5d16550aa18bff8418
2016-09-16 | mysql: never add brackets to mysql_bind_host | Emilien Macchi | 1 | -1/+1
Don't add brackets to the mysql_bind_host parameter in the Galera config. Brackets in this parameter work with old versions of Galera but not with the newest one, so remove them altogether so that we can safely upgrade Galera in RDO.
Change-Id: Ic904d4efda162f18ec8dffb91c2f383f54361f41
Closes-Bug: #1622755
2016-09-01 | Merge "Write restart flags to restart services only when necessary" | Jenkins | 7 | -0/+44
2016-08-30 | Merge "Handle galera_node_names being an array" | Jenkins | 1 | -1/+10
2016-08-30 | Write restart flags to restart services only when necessary | Jiri Stransky | 7 | -0/+44
Write restart flag file for services managed by Pacemaker into /var/lib/tripleo/pacemaker-restarts directory. The name of the file must match the name of the clone resource defined in pacemaker. The post-puppet restart script will restart each service having a restart flag file and remove those files.
This approach focuses on $pacemaker_master only (we don't want to restart the pacemaker services 3 times when we have 3 controllers), so it relies on the assumption that we're making the matching config changes across the pacemaker nodes.
Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
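The flag-file protocol above (one empty file per clone resource, consumed and removed by the restart script) can be sketched as follows. This is a Python stand-in, not the actual post-puppet restart script; the function names, the `restart` callback, and the resource names used in the demo are all hypothetical, and a temporary directory replaces /var/lib/tripleo/pacemaker-restarts.

```python
import os
import tempfile


def write_restart_flag(flag_dir, clone_name):
    """Create an empty flag file named after the pacemaker clone resource."""
    os.makedirs(flag_dir, exist_ok=True)
    path = os.path.join(flag_dir, clone_name)
    open(path, "w").close()
    return path


def process_restart_flags(flag_dir, restart):
    """Call restart(name) for every flag file, then remove the flag.

    Returns the list of resource names that were restarted.
    """
    restarted = []
    for name in sorted(os.listdir(flag_dir)):
        restart(name)  # in practice this would restart the clone resource
        os.remove(os.path.join(flag_dir, name))
        restarted.append(name)
    return restarted


# Demo in a temporary directory, with a callback that only records calls.
demo_dir = tempfile.mkdtemp()
write_restart_flag(demo_dir, "haproxy-clone")
write_restart_flag(demo_dir, "example-service-clone")
seen = []
print(process_restart_flags(demo_dir, seen.append))
```

Because the flag is removed once handled, running the processing step a second time restarts nothing, which is what keeps restarts limited to services whose configuration actually changed.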
2016-08-29 | Merge "Removing WARNING: line has more than 140 characters in puppet-tripleo profiles" | Jenkins | 1 | -1/+5
2016-08-29 | Handle galera_node_names being an array | Jiri Stransky | 1 | -1/+10
Prepare the pacemaker mysql manifest for galera_node_names becoming an array. Stay backwards compatible by also handling a comma-delimited string, which avoids a chicken-and-egg patch problem between t-h-t and puppet-tripleo.
Change-Id: Ia0d9d59728c8771974bfbc486f4929b99a38e4fb
Partially-Implements: blueprint custom-roles
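The backwards-compatible handling described above amounts to normalizing two input shapes into one. A minimal sketch in Python, assuming the value arrives either as a list or as a comma-delimited string; the function name `normalize_node_names` is hypothetical and the manifest itself is written in Puppet.

```python
def normalize_node_names(galera_node_names):
    """Accept either a list of node names or a comma-delimited string
    and always return a list of names."""
    if isinstance(galera_node_names, str):
        # Legacy form: "node-0,node-1,node-2"
        return [n.strip() for n in galera_node_names.split(",") if n.strip()]
    # New form: already an array
    return list(galera_node_names)


print(normalize_node_names("overcloud-controller-0,overcloud-controller-1"))
print(normalize_node_names(["overcloud-controller-0", "overcloud-controller-1"]))
```

Accepting both shapes lets either repository merge its change first without breaking the other.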
2016-08-29 | Merge "Add Manila CephFS backend to manila class" | Jenkins | 1 | -1/+54
2016-08-25 | Add Manila CephFS backend to manila class | Erno Kuvaja | 1 | -1/+54
Change-Id: Idaad75238a2884fe82b1e5fce3ed910d866b98a2
2016-08-23 | Merge "Move ceilometer api to run under apache wsgi" | Jenkins | 2 | -18/+3
2016-08-17 | Move ceilometer api to run under apache wsgi | Pradeep Kilambi | 2 | -18/+3
Change-Id: If3feb859b527d08e10c124b5ad2f7f4b1f19156a
2016-08-17 | Configure galera-monitor on all controller nodes | Michele Baldessari | 1 | -1/+1
When we implemented the galera composable role we accidentally restricted the xinetd.d monitor service to the bootstrap node only. This meant that haproxy believed that galera was down on the non-bootstrap nodes. A shutdown of the bootstrap node meant that galera was effectively down, because haproxy would refuse to redirect the traffic to the non-bootstrap nodes.
Fix this by creating the /etc/xinetd.d/galera-monitor file on all controller nodes.
Change-Id: Ib5a06b3abbc32182476c2b0c81eb77a12821ad6b
2016-08-17 | Merge "Add cinder-backup profiles" | Jenkins | 1 | -0/+54
2016-08-11 | Removing WARNING: line has more than 140 characters in puppet-tripleo profiles | Carlos Camacho | 1 | -1/+5
Some lint checks are returning:
    WARNING: line has more than 140 characters
in puppet-tripleo profiles. This patch removes those warnings by breaking the long lines with backslashes.
Change-Id: I19b56c93db82948fb0498a4c9851b522c81946f8
2016-08-11 | Add cinder-backup profiles | Dan Prince | 1 | -0/+54
Adds a Cinder backup profile for Cinder backup service activation (to be used in https://review.openstack.org/#/c/304563). Cinder backup uses Swift as a default.
Change-Id: Ib1dfe52b83ab01819fc669312967950e75d8ddf1
Co-Authored-By: Jon Bernard <jobernar@redhat.com>
Co-Authored-By: Boris Kreitchman <bkreitch@gmail.com>
2016-08-08 | Fix parameters and headers inconsistency in the puppet manifests. | Carlos Camacho | 48 | -229/+162
As we are starting to manually check overcloud services, the first step is to check that the puppet profiles are all aligned.
Changes applied:
- No logic added or removed in this submission.
- Removed unused parameters.
- Aligned header comments structure.
- All profile parameters sorted following: "Mandatory params first sorted alphabetically then optional params sorted alphabetically."
Note: Following submissions will check pacemaker, cinder, mistral and redis services in the base profiles, as some of them have the $pacemaker_master parameter defaulted to true.
Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
2016-08-05 | Remove keystone PKI cert generation | Steven Hardy | 1 | -3/+0
We don't currently offer any parameter interface to enable PKI certs, and these have all been deprecated by keystone, so remove them.
Change-Id: I8232262b928c91dcde7bea2f23fa2a7c2660719e
2016-08-04 | Merge "Next generation HA architecture work" | Jenkins | 2 | -17/+17
2016-08-03 | Remove unused parameter in sahara | Carlos Camacho | 1 | -12/+1
Remove unused parameter in sahara
Change-Id: I46c033b410ab850289b798ee93990b6fb10c80ea
2016-08-02 | Fixup nit in manila pacemaker profile, duplicate variable | marios | 1 | -1/+0
See discussion at https://review.openstack.org/#/c/342961/8
Change-Id: I571b65a5402c1028418476a573ebeb9450ed00c9
2016-08-01 | Next generation HA architecture work | Michele Baldessari | 2 | -17/+17
This change moves the cinder-volume/cinder-scheduler constraints into the cinder-scheduler profile, as these can't be applied by the cinder-volume service when cinder-scheduler isn't managed by Pacemaker.
Blueprint: https://blueprints.launchpad.net/tripleo/+spec/ha-lightweight-architecture
Change-Id: I5e7585c08675d8a4bd071523b94210d325d79b59
Implements: blueprint ha-lightweight-architecture
Co-Author: cmsj@tenshu.net
2016-07-29 | Move constraints to their respective services | Michele Baldessari | 2 | -0/+36
The openstack-core-then-httpd constraint needs to live in the apache pacemaker manifest and not in the main controller manifest file. The same goes for those specific vsm/cisco neutron resources.
Change-Id: I2041d4d163f051427b62eec07b8345ad7006cc1d
2016-07-29 | Merge "Move nova constraints from controller manifest to each service" | Jenkins | 3 | -0/+87
2016-07-28 | Merge "Remove global openstack-core resource" | Jenkins | 1 | -11/+0
2016-07-28 | Merge "Create role for the fake openstack-core resource" | Jenkins | 1 | -0/+59
2016-07-27 | Move nova constraints from controller manifest to each service | Michele Baldessari | 3 | -0/+87
Currently we are still creating all the pacemaker constraints for nova in the main overcloud_controller_pacemaker.pp manifest file. Let's move those to each role where they belong. Note that, given that a constraint depends on two separate pacemaker resources, it is a bit arbitrary which file they end up in (the one of the first resource or the second one).
Change-Id: I96a3a313d15fac820b020feae0568437c2cbade3
2016-07-27 | Remove global openstack-core resource | Giulio Fidente | 1 | -11/+0
The openstack-core resource is not needed by the NG Pacemaker architecture. It was moved into an isolated role by [1] so that it could optionally be enabled when wanting the older architecture. This submission removes the old openstack-core global resource.
1. I74a62973146c0261385ecf5fd3d06db51e079caa
Change-Id: I16a786ce167c57848551c7245f4344c382c55b3d
2016-07-27 | Create role for the fake openstack-core resource | Giulio Fidente | 1 | -0/+59
Change-Id: I74a62973146c0261385ecf5fd3d06db51e079caa
2016-07-27 | profile/base/nova: declare nova class and configure cache correctly. | Emilien Macchi | 1 | -1/+8
The Nova {} workaround is not working correctly; we need to merge this patch so we can move ::nova out of THT completely. Also, we need to use nova::cache to configure the memcached parameters.
Co-Authorized-By: Giulio Fidente <gfidente@redhat.com>
Co-Authorized-By: Sven Anderson <sven@redhat.com>
Co-Authorized-By: Emilien Macchi <emilien@redhat.com>
Depends-On: I52d5badb9960124bb8fcb54983db2853c4185e77
Depends-On: I3e400a5f64b85f0d374fc02cc5e4080d19d0f2e4
Depends-On: Iee5f8015cbf40ca0e9a435a7de919ebdb74cf93f
Change-Id: Ie4e72e765f6a8ade48d4b2b766f067872554d1a2
2016-07-25 | Merge "Add base constraint so gnocchi metricd is tied to core-clone" | Jenkins | 1 | -0/+9
2016-07-22 | Merge "use parameter to lookup the step instead of hiera again" | Jenkins | 1 | -3/+3
2016-07-22 | Merge "Remove unused redis_vip parameter" | Jenkins | 1 | -4/+0