|
This is needed because when we run bundles we actually
want to store attributes on a per-node basis and not on a per-bundle
basis. By activating this attribute pacemaker will pass
some extra OCF_RESKEY_CRM_meta attributes that will help us in this
decision.
We can merge this once we have packages for pacemaker and
resource-agents releases that contain the necessary fixes.
Proper pacemaker and resource-agents are now in the repo [1] so
we can merge it and backport it to pike.
[1] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-pike/
Closes-Bug: #1713007
Change-Id: I0dd06e953b4c81f217d0f4199b2337e4c3358086
(cherry picked from commit 6bcb011723ad7b75f18914c887dc4fa4bad4d620)
|
|
This uses the tls_proxy resource in front of the Redis server when
internal TLS is enabled.
bp tls-via-certmonger
Co-Authored-By: Juan Antonio Osorio Robles <jaosorior@redhat.com>
Change-Id: Ia50933da9e59268b17f56db34d01dcc6b6c38147
(cherry picked from commit 2d1d7875aa6f0b68005c84189627bc0716a7693f)
|
|
In non-containerized deployments, Galera can be configured to use TLS
for gcomm group communication when enable_internal_tls is set to true.
Fix the creation of the mysql bundle resource to enable TLS when
configured. The key and cert are passed as other configuration files
and must be copied by Kolla at container startup.
Change-Id: If845baa7b0a437c28148c817b7f94d540ca15814
Partial-Bug: #1708135
|
|
Mistakenly this was set to 3121 which is the same port that pacemaker
remote uses. Move this to 3122 which was the plan all along.
Also fix a wrong port comment in redis and mysql at the same time.
Change-Id: Iccca6a53a769570443091577c7d86f47119d9cbb
|
|
|
|
This solves a problem with bind-mounts when the containers are holding
file descriptors open.
At the same time this makes the template more robust to puppet changes
since new config files will be available in the containers without
needing to update the templates.
Closes-Bug: #1698323
Change-Id: I857c94ba5f7f064d7c58df621ec5d477654b9166
Depends-On: I78dcec741a941dc21adba33ba33a6dc6ff1d217c
|
|
The innodb_flush_log_at_trx_commit flag changes the timing
of when the log buffer is written to disk for writes.
At its default of 1, transactions are written to disk
and the buffer flushed on a per-transaction basis; but when
set to 2, the flush of the buffer proceeds only once per
second. This removes the durability guarantee for the
single node. However the central concept of Galera is
that durability is achieved via the cluster as a whole,
in that transactions are replicated to other nodes before
the commit succeeds (though not necessarily written to disk
unless wsrep_causal_reads is set). In this model,
data would only be lost if all nodes of the Galera cluster
were killed within one second of each other. Percona's
blog post at https://www.percona.com/blog/2014/11/17/typical-misconceptions-on-galera-for-mysql/
recommends that the value of 2 should be considered "safe"
for a Galera cluster unless you are in fact worried that
all three nodes will be powered off simultaneously.
The value here is added as an option only, defaulting
to the usual default of "1", flush per transaction.
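
As an illustration only, the "option with a default of 1" behaviour can
be sketched in Python. The function name render_flush_option is
hypothetical and not part of this change; the accepted values 0, 1 and 2
are the ones MySQL documents for this setting.

```python
# Values MySQL accepts for innodb_flush_log_at_trx_commit.
ALLOWED_FLUSH_VALUES = (0, 1, 2)


def render_flush_option(value: int = 1) -> str:
    """Render the my.cnf line, defaulting to 1 (flush per transaction).

    Hypothetical helper: it only mirrors the idea that the setting is
    optional and keeps the usual default unless an operator overrides it.
    """
    if value not in ALLOWED_FLUSH_VALUES:
        raise ValueError(f"unsupported value: {value!r}")
    return f"innodb_flush_log_at_trx_commit = {value}"


print(render_flush_option())   # default, flush on every commit
print(render_flush_option(2))  # flush roughly once per second
```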
Change-Id: Id5a30f1daf978e094a74db2d284febbc9ae64bb3
|
|
This enables the options so Galera can use TLS for the replication
traffic.
bp tls-via-certmonger
Depends-On: I9252303b92a2805ba83f86a85770db2551a014d3
Change-Id: I2ee3bf4bbda3f65f5b03440ecbc75f14225a2428
|
|
The step is typically set with the hieradata setting an integer value:
{"step": 1}
However it would be useful for the value to be a string so that
substitutions are possible, for example:
{"step": "%{::step}"}
This change ensures the step parameter defaults to an integer by
calling Integer(hiera('step'))
This change was made by manually removing the undef defaults from
fluentd.pp, uchiwa.pp, and sensu.pp then bulk updating with:
find ./ -type f -print0 |xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"
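
For reference, the same substitution can be sketched in Python. This is
an assumed equivalent of the sed expression above, not the tool that was
actually used; wrap_step_lookup is a hypothetical name.

```python
import re

# Matches the exact assignment form targeted by the sed expression.
_STEP_LOOKUP = re.compile(r"= hiera\('step'\)")


def wrap_step_lookup(manifest_text: str) -> str:
    """Rewrite `= hiera('step')` as `= Integer(hiera('step'))`."""
    return _STEP_LOOKUP.sub("= Integer(hiera('step'))", manifest_text)


before = "  $step = hiera('step'),"
print(wrap_step_lookup(before))
```

Running the rewrite twice leaves the text unchanged, because the wrapped
form no longer contains `= hiera(` directly after the equals sign.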
Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3
|
|
|
|
This makes use of the cluster_host_map, which lets us give aliases
to the pacemaker nodes (which are FQDNs) and configure the
cluster using FQDNs.
We need FQDNs in order to request certificates, since the default CA
(FreeIPA) only allows certificates for FQDNs.
Change-Id: I2f146afdd32aef2d11cf25a65fa8d67428f621f5
|
|
|
|
In composable HA we bind resources to nodes that have special
node properties. We need to do this also for bundle resources
otherwise there is a potential race where the bundle might be
started on nodes where it is not supposed to run during a small
window of time.
Tested with the depends-on and correctly obtained a containerized
composable HA deployment:
Docker container set: rabbitmq-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-rabbitmq:latest]
rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-rabbit-0
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-rabbit-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-rabbit-2
Docker container set: galera-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-mariadb:latest]
galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-galera-0
galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-galera-1
galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-galera-2
Docker container set: redis-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-redis:latest]
redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0
redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1
redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2
ip-192.168.24.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-10.0.0.7 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.16.2.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
ip-172.16.2.9 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.16.1.6 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.16.3.7 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
Docker container set: haproxy-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-haproxy:latest]
haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0
haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-1
haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started overcloud-controller-2
Depends-On: I44449861cbfe56304b8829c9ca10fd648353b3ae
Change-Id: I48fb490040497ba08cae19937159c0efdf99e3f8
|
|
|
|
This module is used by tripleo-heat-templates to configure and deploy
Kolla-based mysql containers managed by pacemaker.
We use short-lived containers that call pcs via puppet to create
the needed pacemaker resources, properties and constraints.
Co-Authored-By: Michele Baldesari <michele@acksyn.org>
Partial-Bug: #1692842
Depends-On: I44fbd7f89ab22b72e8d3fc0a0e3fe54a9418a60f
Depends-On: Ie9b7e7d2a3cec4b121915a17c1e809e4ec950e7f
Change-Id: I3b4d8ad2eec70080419882d5d822f78ebd3721ae
|
|
Since galera is configured to use rsync, we ought to make sure the
package is installed. Particularly when using deployed-server, the
package is not always installed by default depending on what was used to
install the servers.
Change-Id: I92ee78f2dd2c0f7fd4d393b104166407d7c654e2
Closes-Bug: #1693003
|
|
This module is used by tripleo-heat-templates to configure and deploy
Kolla-based Redis containers managed by pacemaker.
We use short-lived containers that call pcs via puppet to create
the needed pacemaker resources, properties and constraints.
Co-Authored-By: Michele Baldesari <michele@acksyn.org>
Partial-Bug: #1692924
Depends-On: I44fbd7f89ab22b72e8d3fc0a0e3fe54a9418a60f
Depends-On: Ie9b7e7d2a3cec4b121915a17c1e809e4ec950e7f
Change-Id: Ia1131611d15670190b7b6654f72e6290bf7f8b9e
|
|
Now that puppet-redis supports ulimit for cluster managed redis (via
https://github.com/arioch/puppet-redis/pull/192), we need to remove the
file snippet as otherwise we will get a duplicate resource error.
We will need to create a THT change that at the very least sets the
redis::managed_by_cluster_manager key to true so that
/etc/security/limits.d/redis.conf gets created.
We also add code to not break backwards compatibility with the old hiera
key.
Change-Id: I4ffccfe3e3ba862d445476c14c8f2cb267fa108d
Partial-Bug: #1688464
|
|
Previously we always ran the galera-ready exec at every step. This
change switches it to be refreshonly so we only wait when the service is
set up or restarted.
Change-Id: I5ff9d49c2590751913b96777bcd72c8a15627a01
Closes-Bug: #1680586
|
|
|
|
|
|
This reverts commit 3f7e74ab24bb43f9ad7e24e0efd4206ac6a3dd4e.
After identifying how to work around the performance issues on the
undercloud, let's put this back in. Enabling innodb_file_per_table is
important for operators to be able to better manage their databases.
Change-Id: I435de381a0f0e3ef221e498f442335cdce3fb818
Depends-On: I77507c638237072e38d9888aff3da884aeff0b59
Closes-Bug: #1660722
|
|
This reverts commit 621ea892a299d2029348db2b56fea1338bd41c48.
We're getting performance problems on SATA disks.
Change-Id: I30312fd5ca3405694d57e6a4ff98b490de388b92
Closes-Bug: #1661396
Related-Bug: #1660722
|
|
InnoDB uses a single file by default which can grow to be
tens/hundreds of gigabytes, and is not shrinkable even
if data is deleted from the database.
Best practices are that innodb_file_per_table is set to ON
which instead stores each database table in its own file, each of
which is also shrinkable by the InnoDB engine.
Closes-Bug: #1660722
Change-Id: I59ee53f6462a2eeddad72b1d75c77a69322d5de4
|
|
This commit implements composable HA for the pacemaker profiles.
- Every time a pacemaker resource gets included on a node,
that node will add a node cluster property with the name of the resource
(e.g. galera-role=true)
- Add a location rule constraint to force running the resource only
on the nodes that have that property
- We also make sure that any pacemaker resource/property creation has a
predefined number of tries (20 by default). The reason for this is
that within composable HA, it might be possible to get "older CIB"
errors when another node changed the CIB while we were doing an
operation on it. Simply retrying fixes this.
- Also make sure that we use the newly introduced
pacemaker::constraint::order class instead of the older
pacemaker::constraint::base class. The former uses the push_cib()
function and hence behaves correctly in case multiple nodes try
to modify the CIB at the same time.
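
The retry behaviour described above can be sketched as follows. This is
a Python illustration under assumptions, not the puppet-pacemaker
implementation: StaleCibError and with_retries are hypothetical names,
and only the default of 20 tries comes from this commit.

```python
import time


class StaleCibError(Exception):
    """Hypothetical stand-in for an "older CIB" rejection."""


def with_retries(operation, tries=20, delay=0.0):
    """Call operation() until it succeeds or tries are exhausted.

    Only StaleCibError is retried; any other exception propagates
    immediately. A real caller would likely use a non-zero delay.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for _ in range(tries):
        try:
            return operation()
        except StaleCibError as err:
            last_error = err
            time.sleep(delay)
    raise last_error
```

The point of the sketch is that a stale-CIB failure is treated as
transient: another node changed the CIB in between, so repeating the
same operation against the fresh CIB is expected to succeed.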
Change-Id: I63da4f48da14534fd76265764569e76300534472
Depends-On: Ib931adaff43dbc16220a90fb509845178d696402
Depends-On: I8d78cc1b14f0e18e034b979a826bf3cdb0878bae
Depends-On: Iba1017c33b1cd4d56a3ee8824d851b38cfdbc2d3
|
|
When we create a pacemaker resource it must happen from a single node.
If it happens from multiple nodes an immediate error will be returned by
pcs.
For the pacemaker roles we enforce this by leveraging the recently
introduced <SERVICE_NAME_bootstrap_short_node_name> which gives us
the first hostname per-service, regardless of the role.
(introduced via I03e8685f939e8ae1fcd8b16883b559615042505d)
With this approach if a pacemaker service belongs to two different
roles (say role Controller on node A and role galera on node B), it
will only create the resource from one of the two and not both (which
would return an error).
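
A minimal Python sketch of that guard, assuming the per-service
bootstrap short node name is compared case-insensitively with the local
short hostname (the function name and the case handling are assumptions
made for illustration):

```python
def should_create_resource(local_short_name: str,
                           bootstrap_short_name: str) -> bool:
    """Return True only on the node elected as the service bootstrap.

    Every node evaluates this, but only one of them matches, so the
    pacemaker resource is created from a single node.
    """
    return local_short_name.casefold() == bootstrap_short_name.casefold()


# Node A (role Controller) and node B (role galera) both carry the
# service; only the bootstrap node creates the resource.
for node in ("node-a", "node-b"):
    print(node, should_create_resource(node, "node-a"))
```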
Only setting Partial-Bug for this one, because it addresses the issue
from the pacemaker resource creation POV (which is always affected). But
the issue itself is a race that we're theoretically affected by since
the composable roles work landed. While I have tried to fix the more
general case in previous attempts, I think it is best if we start a
discussion on how to fix it, because each approach has a bunch of
potential drawbacks and is quite invasive on how we do things. A
discussion slot for this has been proposed for the Atlanta PTG.
Change-Id: I662398cab60d523d204b57a5674ca8f5c0f2e68a
Partial-Bug: #1615983
|
|
Removed redundant 'the'
Change-Id: Ie2051f35ec1e7010423c46084f5512c02af85f33
|
|
By default galera-monitor xinetd is binding on all the interfaces.
That means that the port 9200 is exposed on the external network.
Because haproxy is using the same network for the backend and the
check we can reuse it for the xinetd binding.
Change-Id: If1a50515593e81f46d67309bdeecbe84c1d0ebe4
|
|
|
|
Puppet 4 ordering makes things stricter in the catalog, which is good.
Resources have to be orchestrated or Puppet will take them in the order
they are found in the catalog.
This patch makes sure we create MySQL users only when Galera is actually
ready.
Closes-Bug: #1645787
Change-Id: I536a1a128c3a7eca49bcc4f34a1307bcd60b029e
|
|
With the landing of HA NG in Newton we can actually remove the
pacemaker profiles we do not need. The only ones that are being
used in one form or the other are:
$ grep -ir services\/pacemaker environments | awk '{ print $3 }' | sort | uniq
../puppet/services/pacemaker/cinder-backup.yaml
../puppet/services/pacemaker/cinder-volume.yaml
../puppet/services/pacemaker/database/mysql.yaml
../puppet/services/pacemaker/database/redis.yaml
../puppet/services/pacemaker/haproxy.yaml
../puppet/services/pacemaker/manila-share.yaml
../puppet/services/pacemaker/rabbitmq.yaml
../puppet/services/pacemaker.yaml
The only exception is profile/pacemaker/database/mongodbvalidator
because it is included by profile/base/database/mongodb.pp
Change-Id: I80c8559bb2d915385bcc20ae71fe144ddd6591c1
|
|
The current redis file descriptor limit is 4096 because of two reasons:
- It is run via the redis user
- It is not started via systemd which has explicit LimitNOFILE set to
10240 (which matches the default configuration of maximum 10000
clients)
Create an /etc/security/limits.d/redis.conf file in order to increase
the fd limit value. With this change we correctly get the following
limits:
[root@overcloud-controller-0 ~]# pcs status |grep -A2 redis
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-2 ]
Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
[root@overcloud-controller-0 ~]# cat /proc/`pgrep redis`/limits | grep open
Max open files 10240 10240 files
Previously this limit was set to 4096.
Change-Id: I7691581bad92ad9442cecd82cf44f5ac78ed169f
Closes-Bug: #1635334
|
|
|
|
remove_default_accounts is a mysql::server parameter that, when set to
true, executes some MySQL commands to clean up the default MySQL
accounts created by packaging.
In order to successfully run the commands, we need MySQL up and running,
which is not the case at step 1 but is at step 2.
This patch makes sure we run the commands at step 2 on the pacemaker
master only.
No change for scenarios without Pacemaker.
Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e
Closes-Bug: #1633113
|
|
The 'stop timeout' values of the rabbitmq and redis pacemaker resources
are set to 90s, while for all other services the value is set to 200s.
The overcloud deployment sometimes fails due to this with the error:
Error: Could not complete shutdown of rabbitmq-clone, 1 resources
remaining
Error performing operation: Timer expired
This patch updates the timeout for Redis and RabbitMQ to avoid this
error.
Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
|
|
We're not able to use FQDNs yet, so to work around this, we give
precedence to a "short name" list we'll get from t-h-t.
Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8
Related-Bug: #1628521
|
|
|
|
This patch moves the various DB syncs into the MySQL role.
Database creation needs to occur on the MySQL server to
avoid permission issues.
This patch also moves database creation to step 2 so we can
guarantee that all per-service databases exist at this time.
This avoids complex ordering needed during step 3 where
services, on different hosts, can run their own db syncs
in a distributed fashion.
Change-Id: I05cc0afa9373429a3197c194c3e8f784ae96de5f
Partial-bug: #1620595
|
|
Having an actual name for that configuration will allow us to pass a
more proper name via t-h-t.
Change-Id: Iea4bd67074824e5dc6732fd7e408743e693d80b3
|
|
It used to be hardcoded that the bind-address was always coming from
the $::hostname fact. This is wrong, as it disregards where we have
configured the mysql address. This commit actually makes it
configurable, so we'll be able to set it via hieradata.
On the other hand, we use the hiera key that we already set
'mysql_bind_host' as a default; if, for some reason, that's
unavailable then we fall back to $::hostname.
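
The lookup order can be sketched in Python. The function and its
arguments are hypothetical; only the 'mysql_bind_host' key and the
hostname fallback come from the description above.

```python
def resolve_bind_address(explicit, hieradata, hostname):
    """Pick the MySQL bind-address.

    Order: an explicitly configured value, then the existing
    'mysql_bind_host' hiera key, then the node's hostname.
    """
    if explicit:
        return explicit
    return hieradata.get("mysql_bind_host") or hostname


print(resolve_bind_address(None, {"mysql_bind_host": "172.16.2.10"}, "ctrl-0"))
print(resolve_bind_address(None, {}, "ctrl-0"))
```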
Related-Bug: #1627060
Change-Id: I316acfd514aac63b84890e20283c4ca611ccde8b
|
|
Don't add brackets on the mysql_bind_host parameter in Galera config.
Having brackets around this parameter works with old versions of
Galera but not with the newest one.
So let's remove them altogether, so we can safely upgrade Galera in RDO.
Change-Id: Ic904d4efda162f18ec8dffb91c2f383f54361f41
Closes-Bug: #1622755
|
|
|
|
|
|
Write restart flag file for services managed by Pacemaker into
/var/lib/tripleo/pacemaker-restarts directory. The name of the file must
match the name of the clone resource defined in pacemaker. The
post-puppet restart script will restart each service having a restart
flag file and remove those files.
This approach focuses on $pacemaker_master only (we don't want to
restart the pacemaker services 3 times when we have 3 controllers), so
it relies on the assumption that we're making the matching config
changes across the pacemaker nodes.
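
The flag-file handshake can be sketched in Python. This is an
illustration under assumptions, not the actual manifest or post-puppet
script: the function names are hypothetical and the restart action is
passed in as a callable.

```python
import tempfile
from pathlib import Path


def write_restart_flag(flag_dir: Path, clone_resource: str) -> Path:
    """Create an empty flag file named after the pacemaker clone resource."""
    flag_dir.mkdir(parents=True, exist_ok=True)
    flag = flag_dir / clone_resource
    flag.touch()
    return flag


def restart_flagged(flag_dir: Path, restart) -> list:
    """Restart every flagged resource, then remove its flag file."""
    if not flag_dir.is_dir():
        return []
    restarted = []
    for flag in sorted(flag_dir.iterdir()):
        restart(flag.name)  # e.g. a wrapper around a pcs restart
        flag.unlink()
        restarted.append(flag.name)
    return restarted


with tempfile.TemporaryDirectory() as tmp:
    flags = Path(tmp) / "pacemaker-restarts"
    write_restart_flag(flags, "haproxy-clone")
    print(restart_flagged(flags, restart=lambda name: None))
```

Because the flag is removed only after the restart callable returns, a
failed restart leaves the flag in place for the next run.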
Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
|
|
profiles"
|
|
Prepare the pacemaker mysql manifest for galera_node_names becoming an
array. Stay backwards compatible by handling a comma-delimited string
too, which avoids a chicken-and-egg patch problem between t-h-t and
puppet-tripleo.
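
The backwards-compatible handling amounts to normalising both input
shapes to a list. A Python sketch of the idea (normalize_node_names is a
hypothetical name, and trimming whitespace is an assumption):

```python
def normalize_node_names(value):
    """Accept either a list of names or a comma-delimited string."""
    if isinstance(value, str):
        return [name.strip() for name in value.split(",") if name.strip()]
    return list(value)


print(normalize_node_names("ctrl-0,ctrl-1,ctrl-2"))
print(normalize_node_names(["ctrl-0", "ctrl-1", "ctrl-2"]))
```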
Change-Id: Ia0d9d59728c8771974bfbc486f4929b99a38e4fb
Partially-Implements: blueprint custom-roles
|
|
When we implemented the galera composable role we accidentally moved the
xinetd.d monitor service to the bootstrap node only. This meant that
haproxy believed that galera was down on the non-bootstrap nodes. A
shutdown of the bootstrap node meant that galera was effectively down
because haproxy would refuse to redirect the traffic to the
non-bootstrap node. Fix this by creating the
/etc/xinetd.d/galera-monitor on all controller nodes.
Change-Id: Ib5a06b3abbc32182476c2b0c81eb77a12821ad6b
|
|
Some lint checks are returning:
WARNING: line has more than 140 characters in puppet-tripleo profiles
This patch removes those warnings by breaking the long lines with
backslashes (\).
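
To find the offending lines outside of the lint run, a small Python
check can be used. This is a hypothetical helper, not part of
puppet-lint; only the 140-character limit comes from the warning above.

```python
def overlong_lines(text: str, limit: int = 140) -> list:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


sample = "short line\n" + "x" * 141 + "\n" + "y" * 140
print(overlong_lines(sample))
```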
Change-Id: I19b56c93db82948fb0498a4c9851b522c81946f8
|
|
As we are starting to manually check overcloud services
the first step is to check that the puppet profiles
are all aligned.
Changes applied:
No logic added or removed in this submission.
Removed unused parameters.
Align header comments structure.
All profiles parameters sorted following:
"Mandatory params first sorted alphabetically
then optional params sorted alphabetically."
Note: Following submissions will check pacemaker,
cinder, mistral and redis services in the base profiles
as some of them have the $pacemaker_master parameter
defaulted to true.
Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
|
|
The openstack-core resource is not needed by the NG Pacemaker
architecture. It was moved into an isolated role by [1] so that
it could optionally be enabled when wanting the older architecture.
This submission removes the old openstack-core global resource.
1. I74a62973146c0261385ecf5fd3d06db51e079caa
Change-Id: I16a786ce167c57848551c7245f4344c382c55b3d
|