Age | Commit message (Collapse) | Author | Files | Lines |
|
When deploying a composable HA overcloud with a database role split off
to separate nodes we could observe a deployment failure due to galera
never starting up properly.
The reason for this was that instead of having the firewall rules for
the galera bundle applied (i.e. those with the extra control-port for
the bundle), we would see the firewall rules for the BM galera service.
E.g. we would see the following on the host:
tripleo.mysql.firewall_rules: {
104 mysql galera: {
dport: [ 873, 3306, 4444, 4567, 4568, 9200 ]
Instead of the correct mysq bundle firewall rules:
tripleo.mysql.firewall_rules:
104 mysql galera-bundle:
dport: [ 873, 3123, 3306, 4444, 4567, 4568, 9200 ]
The reason for this is the following piece of code in
https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/clustercheck.yaml#L62:
...
MysqlPuppetBase:
type: ../../../puppet/services/pacemaker/database/mysql.yaml
properties:
EndpointMap: {get_param: EndpointMap}
ServiceData: {get_param: ServiceData}
ServiceNetMap: {get_param: ServiceNetMap}
DefaultPasswords: {get_param: DefaultPasswords}
RoleName: {get_param: RoleName}
RoleParameters: {get_param: RoleParameters}
outputs:
role_data:
description: Containerized service clustercheck using composable services.
value:
service_name: clustercheck
config_settings: {get_attr: [MysqlPuppetBase, role_data, config_settings]}
logging_source: {get_attr: [MysqlPuppetBase, role_data, logging_source]}
...
Depending on the ordering of the clustercheck service within the role
(before or after the mysql service), the above code will override the
tripleo.mysql.firewall_rules with the wrong rules because we derive from
puppet/services/... which contain the BM firewall rules.
Let's just switch to derive from the docker service so we do not risk
getting the wrong firewall rules during the map_merge.
Tested this change successfully on a composable HA with split-off DB
nodes.
Change-Id: Ie87b327fe7981d905f8762d3944a0e950dbd0bfa
Closes-Bug: #1728918
(cherry picked from commit 3df6a4204a85b119cd67ccf176d5b72f9e550da6)
|
|
stable/pike
|
|
|
|
Adds update_tasks for the minor update workflow. These will be
collected into playbooks during an initial 'update init' heat
stack update and then invoked later by the operator as ansible
playbooks.
Current understanding/workflow:
Step=1: stop the cluster on the updated node
Step=2: Pull the latest image and retag the it pcmklatest
Step=3: yum upgrade happens on the host
Step=4: Restart the cluster on the node
Step=5: Verification: test pacemaker services are running.
https://etherpad.openstack.org/p/tripleo-pike-updates-upgrades
Related-Bug: 1715557
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: Sofer Athlan-Guyot <sathlang@redhat.com>
Change-Id: I101e0f5d221045fbf94fb9dc11a2f30706843806
(cherry picked from commit a953bda0ae615dc44d3e8a70aa7ab0160e26f3af)
|
|
The services that docker depends on, have logging_sources and logging_groups;
but those are not set on the docker outputs so they are not used when dockers
are deployed.
Added logging_source & logging_groups as docker optional parameters in
tools/yaml-validate.py
Closes-Bug: #1718110
Change-Id: I8795eaf4bd06051e9b94aa50450dee0d8761e526
(cherry picked from commit 5dbe1121e98a794ec6a6387ff56ee34314177567)
|
|
The containerized galera service generates a galera.cnf which uses
short hostname to identify itself rather than the fqdn from the
mysql_network (e.g. overcloud-x.internalapi.cloudname).
This breaks when internal TLS is in use, because the mysql certificate
does not reference this short hostname.
Fix the appropriate hiera parameter to make it behave like the
non-containerized galera service.
Change-Id: I904cde38f2baeddab5178e8ad48d34a0c73629af
Closes-Bug: #1719599
(cherry picked from commit e10aa591dc9155a2746df01279c4ba4f2133fd17)
|
|
stable/pike
|
|
This wrapper binary spawns the HAproxy daemon and implements a
coordinated HAproxy restart on SIGHUP.
From a service's perspective, this allows reloading the HAProxy
configuration with minimal service disruption, i.e. without stopping
and restarting the HAProxy container.
Closes-Bug: #1717521
Change-Id: Ib3ef0c0bcf1a8151e179ff4d7509cf0d6b3ac5a1
(cherry picked from commit 91cd44cd7266c15ce07fafbee9d2e33f226096ba)
|
|
During the bootstrap of the mariadb database, galera replication
must be disabled while the users credentials are being set up. This
is done by setting wsrep-provider=none when starting mysqld_safe.
Icf67fd2fbf520e8a62405b4d49e8d5169ff3925b already disabled it
when the clustercheck credentials are being set up, but Kolla also
start a temporary server for setting up the root password.
Disable the setting directly at the end of the mysql.cnf in the
running container. That way, the default setting from galera.cnf will
be overriden, all mysqld_safe calls will disable WSREP and the setting
will stay ephemeral.
Change-Id: If14e22992b46a35a05a16a9db5ecb360ea13df8f
Closes-Bug: #1717250
(cherry picked from commit b0f50db80b10e9cd6263c4d6b3ca8dd818b658ba)
|
|
This change allows running the major upgrade composable docker
steps multiple times by not trying to delete the pacemaker resources
if they're not reported as started or in master state.
Closes-bug: 1716031
Depends-On: I8da03f5c4a6d442617b81be5793a9724cc8842bf
Change-Id: Ifcf9de8c82550a90a9fb118052d43fdbcdc6ca7e
(cherry picked from commit 64d7be1e3d4552e06cbc53f788572e530cc5c3bb)
|
|
Add a retry when the pacemaker_resource command
wasn't apply correctly, more info here:
https://bugzilla.redhat.com/show_bug.cgi?id=1482116
This is the same approach puppet-pacemaker uses
and provides eventual consistency when multiple
nodes change the cluster CIB concurrently.
This change depends-on :
https://review.gerrithub.io/375982
The return code is not available in the current
ansible-pacemaker package.
Change-Id: I8da03f5c4a6d442617b81be5793a9724cc8842bf
(cherry picked from commit e92430d8d03fc2ce2d0ce192b96209f2c5c04169)
|
|
Redis does not have TLS out of the box. Let's use a proxy container for
TLS termination.
This commit enables redis TLS proxy for the HA deployment.
bp tls-via-certmonger
Change-Id: I45e539872a03878337def33c681c4577c1a5629e
(cherry picked from commit c6d8df01d7aa8b44af9ac152b3bb08f07e2e02b7)
|
|
stable/pike
|
|
It's being mounted on the actual haproxy container, but not the init
one.
Change-Id: I66b69e0bb3642dbfeec767ef5216d515786b5b19
Closes-Bug: #1715132
(cherry picked from commit 03622e89ac3037b4d69d913586823e689b210688)
|
|
Depending on the version of mariadb/galera installed the mysql_bootstrap
command might fail. With the following unrevealing error:
openstack-mariadb-docker:2017-08-28.10 "bash -ec 'if [ -e /v" 3 hours ago Exited (124) 3 hours ago
The timeout is actually due to the fact that the following snippets does
not complete within 60 seconds:
"""
if [ -e /var/lib/mysql/mysql ]; then exit 0; fi
kolla_start
mysqld_safe --skip-networking --wsrep-on=OFF &
timeout ${DB_MAX_TIMEOUT} /bin/bash -c ''until mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" ping 2>/dev/null; do sleep 1; done''
mysql -uroot -p"${DB_ROOT_PASSWORD}" -e "CREATE USER ''clustercheck''@''localhost'' IDENTIFIED BY '${DB_CLUSTERCHECK_PASSWORD}'';"
mysql -uroot -p"${DB_ROOT_PASSWORD}" -e "GRANT PROCESS ON *.* TO ''clustercheck'
"""
The problem is that with older mariadb versions:
galera-25.3.16-3.el7ost.x86_64
mariadb-5.5.56-2.el7.x86_64
The mysqld_safe process starts in galera mode (as opposed as to single
local mode):
170830 17:03:05 [Note] WSREP: Start replication
170830 17:03:05 [Note] WSREP: GMCast version 0
...
170830 17:03:05 [ERROR] WSREP: wsrep::connect() failed: 7
170830 17:03:05 [ERROR] Aborting
That means that even though we specified --wsrep-on=OFF it is still
starting in cluster mode. Let's add the extra --wsrep-provider=none
which older versions required.
Let's also add a '-x' to this transient container as that
would have helped a bit because we would have understood right away
that it was mysqld_safe that was not starting. I tested this
successfully on an environment that showed the problem. The new
option is still accepted by newer DB versions in any case.
Closes-Bug: #1714057
Change-Id: Icf67fd2fbf520e8a62405b4d49e8d5169ff3925b
Co-Authored-By: Mike Bayer <mbayer@redhat.com>
(cherry picked from commit c19968ca852ab608513fe692aab958af25276220)
|
|
ovn-dbs pacemaker bundle resources are created for supporting
Master/Slave HA. puppet-tripleo already supports creating
ovn-dbs bundle resources. The heat template added in this patch
makes use of this.
Closes-bug: #1699085
Change-Id: I23c2d312cfb144f9afc14f0982a92670dc29d74c
(cherry picked from commit 444a39f5983e71e3222b6b7f8f523fce60aeece7)
|
|
Pacemaker puppet module takes care of mounting /etc/ceph into
manila-share container (I23b6890b4cf7f1e6fe84b6be280dde82218275fc).
Change-Id: I1026b2436275b17cfe3ac85192d42c5268f0a630
Related-To: I23b6890b4cf7f1e6fe84b6be280dde82218275fc
(cherry picked from commit 0d8040ca33d42dbb7e06162f2b659ff6cbc0316f)
|
|
We need to tag the HA containers with a special tag so
that the RA definition never changes. We do this step in THT
as opposed to puppet because we need to guarantee
that all images are tagged on all nodes *before* step 2 where the bundle
gets created.
NB: Getting the image name without the tag will require some more
yaql work to get all the cases right. Right now this works only
if we enforce that the image has a ':tag' at the end of the name.
So far this is always the case. If things change we will need to
amend this code.
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: Sofer Athlan-Guyot <sathlang@redhat.com>
Change-Id: I362e6cf26fba77d3f949b7d2fc4b35a3eab9087e
|
|
|
|
|
|
|
|
This service allows configuring and deploying manila-share
containers in a HA overcloud managed by pacemaker.
The containers are managed and run by pacemaker. Pacemaker runs the
standard Kolla image but overrides the initial command so that
it explicitely calls manila-share. This way, we shield ourselves
from any unexpected future change in Kolla.
This container needs to use the 'docker_config' section to invoke
puppet (as opposed to 'docker_puppet_tasks'), because due to the HA
composability each resource creation needs to happen on the bootstrap
node of that service and 'docker_puppet_tasks' will only run on the
controller/primary role.
Based on work done in fdb233e64e3d78014dd7e351abfed5aec5035866
Partial-Bug: #1668922
Change-Id: Ifa94c506db5eb667690a19d594115a93d2a790b2
Depends-On: I797eea2f7788f65411964ccb852b5707e916416f
|
|
|
|
|
|
In non-containerized deployments, Galera can be configured to use TLS
for gcomm group communication when enable_internal_tls is set to true.
Fix the metadata service definition and update the Kolla configuration
to make gcomm use TLS in containers, if configured.
bp tls-via-certmonger-containers
Change-Id: Ibead27be81910f946d64b8e5421bcc41210d7430
Co-Authored-By: Juan Antonio Osorio Robles <jaosorior@redhat.com>
Closes-Bug: #1708135
Depends-On: If845baa7b0a437c28148c817b7f94d540ca15814
|
|
In non-containerized deployments, HAProxy can be configured to use TLS for
proxying internal services.
Fix the creation of the of the haproxy bundle resource to enable TLS when
configured. The keys and certs files are all passed as configuration files and
must be copied by Kolla at container startup.
For the time being, disable the use of the CRL file until we find a means
of restarting the containerized HAProxy service when that file expires.
Change-Id: If307e3357dccb7e96bdb80c9c06d66a09b55f3bd
Depends-On: I4b72739446c63f0f0ac9f859314a4d6746e20255
Closes-Bug: #1709563
|
|
In non-containerized deployments, RabbitMQ can be configured to use TLS for
serving and mirroring traffic.
Fix the creation of the rabbitmq bundle resource to enable TLS when configured.
The key and cert are passed as other configuration files and must be copied by
Kolla at container startup.
Change-Id: I8af63a1cb710e687a593505c0202d717842d5496
Depends-On: Ia64d79462de7012e5bceebf0ffe478a1cccdd6c9
Closes-Bug: #1709558
|
|
|
|
In HA overclouds, the helper script clustercheck is called by HAProxy to poll
the state of the galera cluster. Make sure that a dedicated clustercheck user
is created at deployment, like it is currently done in Ocata.
The creation of the clustercheck user happens on all controller nodes, right
after the database creation. This way, it does not need to wait for the galera
cluster to be up and running.
Partial-Bug: #1707683
Change-Id: If8e0b3f9e4f317fde5328e71115aab87a5fa655f
|
|
Services that access database have to read an extra MySQL configuration file
/etc/my.cnf.d/tripleo.cnf which holds client-only settings, like client bind
address and SSL configuration. The configuration file is thus used by
containerized services, but also by non-containerized services that still
run on the host.
In order to generate that client configuration file appropriately both on the
host and for containers, 1) the MySQLClient service must be included by the
role; 2) every containerized service which uses the database must include the
mysql::client profile in the docker-puppet config generation step.
By including the mysql::client profile in each containerized service, we ensure
that any change in configuration file will be reflected in the service's
/var/lib/config-data/{service}, and that paunch will restart the service's
container automatically.
We now only rely on MySQLClient from puppet/services, to make it possible to
generate /etc/my.cnf.d/tripleo.cnf on the host, and to set the hiera keys that
drive the generation of that config file in containers via docker-puppet.
We include a new YAML validation step to ensure that any service which depends
on MySQL will initialize the mysql::client profile during the docker-puppet
step.
Change-Id: I0dab1dc9caef1e749f1c42cfefeba179caebc8d7
|
|
Once an Ocata overcloud is upgraded to Pike, clustercheck should only be
running in a dedicated container, and xinetd should no longer manage it on
the host. Fix the mysql upgrade_task accordingly.
Change-Id: I01acacc2ff7bcc867760b298fad6ff11742a2afb
Closes-Bug: #1706612
|
|
|
|
This is required when the bundles run on pacemaker remote nodes
otherwise the cluster won't be able to connect to the control-ports
of each bundle. The only services that need this are rabbit, redis and
galera because those run pacemaker_remote inside the container
(A/P resources and haproxy do not)
Change-Id: I6a56d79319ef3d14973a0586dcda4d523adda7aa
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
|
|
Adds upgrade_tasks to remove the pacemaker resources using the
ansible-pacemaker module.
Resources are disabled and removed in step2 (called only on
bootstrap node) and then the cluster stop is moved to step3
The existing systemd/service call is kept but only to disable
services after they are disabled/deleted from the cluster.
Related-Bug: 1701485
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ia597d240ea5834c50a8f6c4fac0b6ed417b8535c
|
|
|
|
This removes the default container names from all the templates
and uses a single environment file to specify the full container
name and registry from which to pull. Also does away with most
of DockerNamespace.
Change-Id: Ieaedac33f0a25a352ab432cdb00b5c888be4ba27
Depends-On: Ibc108871ebc2beb1baae437105b2da1d0123ba60
Co-Authored-By: Dan Prince <dprince@redhat.com>
Co-Authored-By: Steve Baker <sbaker@redhat.com>
|
|
Makes it possible to resolve network subnets within a service
template; the data is transported into a new property ServiceData
wired into every service which hopefully is generic enough to
be extended in the future and transport more data.
Data can be consumed in service templates to set config values
which need to know what is the subnet where a deamon operates (for
example the Ceph Public vs Cluster network).
Change-Id: I28e21c46f1ef609517175f7e7ee19e28d1c0cba2
|
|
|
|
haproxy needs the deployed SSL cert file to function when TLS is
enabled.
It is also required for the docker-puppet haproxy container since the
haproxy puppet module uses a validate_cmd to check the generated config
file is valid that fails when the required SSL cert is not present.
There is no clean way to disable this feature [1] so we need to bind
mount the cert into the container.
This commit applies the same change that was applied in
Id2df144b678769def204961236624091d4e5c457 for the non-ha case.
[1] https://github.com/puppetlabs/puppetlabs-haproxy/blob/4753ea5b2506ee093e9b4c8af6e91201d476d426/manifests/config.pp#L53-L57
Change-Id: I93e1ee86197bcf271f18a62a27c2f350ed3966ea
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
|
|
This solves a problem with bind-mounts when the containers are holding
files descriptors open.
At the same time this makes the template more robust to puppet changes
since new config files will be available in the containers without
needing to update the templates.
Partial-Bug: #1698323
Change-Id: Ia4ad6d77387e3dc354cd131c2f9756939fb8f736
|
|
This commit consistently defines a heat template parameter in the form
of DockerXXXConfigImage where XXX represents the name of the
config_volume that is used by docker-puppet.
The goal is to mitigate hard to debug errors where the templates would
set different defaults for the image docker-puppet.py uses to run, for
the same config_volume name.
This fixes a couple of inconsistencies on the way.
Change-Id: I212020a76622a03521385a6cae4ce73e51ce5b6b
Closes-Bug: #1699791
|
|
|
|
|
|
The containerized HAproxy service can only specify steps to be run in
containers, i.e. it cannot runs the regular puppet steps on bare metal
at the same time. A side effect is that the dedicated HAproxy iptables
rules are no longer generated.
Update the docker_config step to fix the creation of iptables rules
for HAproxy and persist them on-disk as before.
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Closes-Bug: 1697387
Change-Id: Ib5a083ba3299a82645f1a0f9da0d482c6b89ee23
|
|
This service allows configuring and deploying cinder-volume
containers in a HA overcloud managed by pacemaker.
The containers are managed and run by pacemaker. Pacemaker runs the
standard Kolla image but overrides the initial command so that
it explicitely calls cinder-volume. This way, we shield ourselves
from any unexpected future change in Kolla.
This container needs to use the 'docker_config' section to invoke
puppet (as opposed to 'docker_puppet_tasks'), because due to the HA
composability each resource creation needs to happen on the bootstrap
node of that service and 'docker_puppet_tasks' will only run on the
controller/primary role.
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Partial-Bug: #1668920
Depends-On: I95ad4dd89b47396bea672813d87de35e64c04b2d
Change-Id: Ib6396219c3d9484c533f6f9995d565091a197bbb
|
|
This service allows configuring and deploying cinder-backup
containers in a HA overcloud managed by pacemaker.
The containers are managed and run by pacemaker. Pacemaker runs the
standard Kolla image but overrides the initial command so that
it explicitely calls cinder-backup. This way, we shield ourselves
from any unexpected future change in Kolla.
This container needs to use the 'docker_config' section to invoke
puppet (as opposed to 'docker_puppet_tasks'), because due to the HA
composability each resource creation needs to happen on the bootstrap
node of that service and 'docker_puppet_tasks' will only run on the
controller/primary role.
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Partial-Bug: #1668920
Depends-On: If53495ff75d4832cc6be80dc0dc9bd540ab6583b
Change-Id: Ieec823e10667592bd775bb2642f0c3790a83e85f
|
|
|
|
This service allows configuring and deploying Redis containers
in a HA overcloud managed by pacemaker.
The containers are managed and run by pacemaker. Inside there is
pacemaker_remote which will invoke the resource agent managing galera.
The resources themselves are created via puppet-pacemaker inside a
short-lived container used for this purpose (mysql_init_bundle).
This container needs to use the 'docker_config' section to invoke
puppet (as opposed to 'docker_puppet_tasks'), because due to the HA
composability each resource creation needs to happen on the bootstrap
node of that service and 'docker_puppet_tasks' will only run on the
controller/primary role.
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Closes-Bug: #1692924
Depends-On: Ia1131611d15670190b7b6654f72e6290bf7f8b9e
Change-Id: Ie045954fcc86ef2b3e4562b6f012853177f03948
|
|
|
|
|