Age | Commit message (Collapse) | Author | Files | Lines |
|
puppet run on never fails, even when it should, since we moved
to the ansible way of applying it. The reason is the current following code:
- name: Run puppet host configuration for step {{step}}
command: >-
puppet apply
--modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
--logdest syslog --logdest console --color=false
/var/lib/tripleo-config/puppet_step_config.pp
The above is missing the --detailed-exitcodes switch and so puppet will never
really error out on us and the deployment will keep on running all the
steps even though a previous puppet manifest might have failed. This
cause extra hard-to-debug failures.
Initially the issue was observed on the puppet host runs, but this
parameter is missing also from docker-puppet.py, so let's add it there
as well as it makes sense to return proper error codes whenever we call
puppet.
Besides this being a good idea in general, we actually *have* to do it
because puppet does not fail correctly without this option due to the
following puppet bug:
https://tickets.puppetlabs.com/browse/PUP-2754
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ie9df4f520645404560a9635fb66e3af42b966f54
Closes-Bug: #1723163
(cherry picked from commit 11e599d116cfbf7df4dcd0e7670c3405a4224c1a)
|
|
into stable/pike
|
|
|
|
|
|
When deploying a composable HA overcloud with a database role split off
to separate nodes we could observe a deployment failure due to galera
never starting up properly.
The reason for this was that instead of having the firewall rules for
the galera bundle applied (i.e. those with the extra control-port for
the bundle), we would see the firewall rules for the BM galera service.
E.g. we would see the following on the host:
tripleo.mysql.firewall_rules: {
104 mysql galera: {
dport: [ 873, 3306, 4444, 4567, 4568, 9200 ]
Instead of the correct mysq bundle firewall rules:
tripleo.mysql.firewall_rules:
104 mysql galera-bundle:
dport: [ 873, 3123, 3306, 4444, 4567, 4568, 9200 ]
The reason for this is the following piece of code in
https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/clustercheck.yaml#L62:
...
MysqlPuppetBase:
type: ../../../puppet/services/pacemaker/database/mysql.yaml
properties:
EndpointMap: {get_param: EndpointMap}
ServiceData: {get_param: ServiceData}
ServiceNetMap: {get_param: ServiceNetMap}
DefaultPasswords: {get_param: DefaultPasswords}
RoleName: {get_param: RoleName}
RoleParameters: {get_param: RoleParameters}
outputs:
role_data:
description: Containerized service clustercheck using composable services.
value:
service_name: clustercheck
config_settings: {get_attr: [MysqlPuppetBase, role_data, config_settings]}
logging_source: {get_attr: [MysqlPuppetBase, role_data, logging_source]}
...
Depending on the ordering of the clustercheck service within the role
(before or after the mysql service), the above code will override the
tripleo.mysql.firewall_rules with the wrong rules because we derive from
puppet/services/... which contain the BM firewall rules.
Let's just switch to derive from the docker service so we do not risk
getting the wrong firewall rules during the map_merge.
Tested this change successfully on a composable HA with split-off DB
nodes.
Change-Id: Ie87b327fe7981d905f8762d3944a0e950dbd0bfa
Closes-Bug: #1728918
(cherry picked from commit 3df6a4204a85b119cd67ccf176d5b72f9e550da6)
|
|
|
|
stable/pike
|
|
into stable/pike
|
|
In 59e29b17f4a9f5f65b6f8a7b8e82ef6426d8a51 we forgot to
add tags to the Ansible tasks to remove the baremetal
cron jobs at step 2.
(cherry picked from commit 1128271b460b120a2a59eac3df95082c55e554d0)
Change-Id: I23fb134b88336ebc4eb1a97a69a2d73d4ef0edb2
Related-bug: #1708466
|
|
We were relying on the sysconfig options to set the memcached log file,
however, this is not happening, as the redirection is being taken as an
option and ends up being ignored by the memcached command. So instead,
we set the redirection in the container template.
Change-Id: Ic94e3fd7884d518eb9558c53acdc6b294823cd0a
Closes-Bug: #1720183
(cherry picked from commit ca1fc5848661aacbf14b52e33879190c133c8e48)
|
|
|
|
We used to bind-mount /var/log/memcached.log, but this resulted in the
file being createdin the memcached container as a directory, since this
file didn't exist.
This commit takes the approach of other containers and gets the logs to
a memcached directory in /var/log/containers.
Change-Id: I926b65fa557ad56b4faa2be34452b58f7b01247a
Closes-Bug: #1720183
(cherry picked from commit 5020f38301a9a0a70f34878196250e24fc639dec)
|
|
The format which ceph-ansible uses to describe the list of pools
to be created in the cluster is different from the one which
puppet-ceph uses; this commit updates the description and the
the docker templates accordingly.
Change-Id: I1e5b2c3cbf6ae02c19a2275ca119fed6e173319d
Closes-Bug: #1720373
(cherry picked from commit c10aa7a0439fb7d8e8e964e75d73f3cbb54aa9ec)
|
|
Enable Cinder as a backend for Glance by adding 'cinder' to the list of
allowed choices for the GlanceBackend heat parameter.
Update the glance-api docker configuration to allow the feature to work.
This is necessary because the feature uses iSCSI, which requires additional
privileges.
Closes-Bug: #1728409
Depends-On: I850047e32f3608b3ce490e52e2e540695cb1a4ff
Change-Id: I42241747de931103a04aa5ee2ed18fd46197d183
(cherry picked from commit e828e8c7bb2e890b243faa767992226dc270bb6f)
|
|
The mistral-api container image we use doesn't have the necessary
packages to run via wsgi and this cause puppet to error with:
"Notice: /Stage[main]/Mistral::Wsgi::Apache/Openstacklib::Wsgi::Apache[mistral_wsgi]/File[mistral_wsgi]: Dependency File[/var/www/cgi-bin/mistral] has failures: true",
Fallback to eventlet mistral-api for the time being until we get
a usable mistral-api image.
Change-Id: Ic10c579aa3b6d0d6a01f120669be3b5dcc5efcda
Depends-On: I54627f1c5a8867738a55bee42075bb6087830c61
Related-Bug: #1724607
(cherry picked from commit e158acb14c4ed92be1a5b961ff1e8ff99b1a5ae3)
|
|
The Kolla Dockerfile sets the permissions for /etc/openstack-dashboard/
to horizon:horizon. We need this to be readable by the apache user
as the horizon user is not the user in which httpd runs with. We may
want to consider fixing this in the upstream Dockerfile instead, e.g.
checking if we're using centos/rhel and changing the permissions that
way. I'm not sure why it's set to horizon:horizon upstream, and I'm keen
not to break any existing functionality that relies on the horizon based
permissions.
Closes-Bug: #1723125
Change-Id: If5feebae38f7fdfffa60bfaedc4521f676006484
(cherry picked from commit fd657aa4e68de7ad239a88525b5ae343acd3bf80)
|
|
into stable/pike
|
|
configuration" into stable/pike
|
|
Due to missing puppet invocation with --detailed-exitcodes we ignored
a large amount of puppet errors during deploy. Swift storage fails
during the puppet_config step with the following error:
Debug: /Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Package[swift-object]: Not tagged with file, file_line, concat, augeas, cron, swif t_proxy_config, swift_config, swift_container_config, swift_container_sync_realms_config, swift_account_config, swift_object_config, swift_object_expirer_con fig, rsync::server
Debug: /Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Package[swift-object]: Resource is being skipped, unscheduling all events
Debug: Executing: '/usr/bin/systemctl is-active xinetd'
Debug: Executing: '/usr/bin/systemctl is-enabled xinetd'
Debug: Executing: '/usr/bin/systemctl unmask xinetd'
Debug: Executing: '/usr/bin/systemctl start xinetd'
Debug: Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u xinetd --no-pager
Debug: Executing: 'journalctl -n 50 --since '5 minutes ago' -u xinetd --no-pager'
Error: Systemd start for xinetd failed!
The problem is that by using the rsync::server tag we end up including
the xinetd class automatically which will try to start a service inside
a container. By nooping the xinetd class, we're able avoid systemctl
calls and have a successfuly deployment. The resulting swift_rsync
container seems to work correctly:
[root@overcloud-controller-0 ~]# docker exec -it swift_rsync /bin/bash -c "ps -axuwf"
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 10 0.0 0.0 47444 1624 pts/1 Rs+ 18:16 0:00 ps -axuwf
root 1 0.0 0.0 188 4 ? Ss 17:27 0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_start
root 6 0.0 0.0 11036 924 ? Ss 17:27 0:00 /usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf
[root@overcloud-controller-0 ~]# docker logs swift_rsync 2>&1|tail -n4
INFO:__main__:Deleting /etc/rsyncd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/rsyncd.conf to /etc/rsyncd.conf
INFO:__main__:Writing out command to execute
Running command: '/usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf'
Change-Id: I5e43e8fd61e002d2acc56a7de52e6aae64ab60be
Closes-Bug: #1723463
(cherry picked from commit b5eeeab73e12efecc86ea7deebc105eee0739510)
|
|
Some services only mount this directory, not /var/lib/config-data/$service
so handle this case in the docker-puppet code that maps the mounted
volumes to the services when adding the config hash to the container
environment.
Change-Id: I3bdb7609f322458584ac9597ffbfefb057b84646
Closes-Bug: #1720208
(cherry picked from commit 3a932b056914d148fa460b8890fc0e631c817a40)
|
|
We should not pass any hardcoded value for monitor_interface and
rely on monitor_address_block only instead.
Also removes journal_collocation which is not consumed by
newer (and stable) builds of ceph-ansible.
Change-Id: Idf213a1f43a66506f76d07102f122839b5096948
Closes-Bug: #1715246
(cherry picked from commit 3e90ae3df5a7c5491672254733ceac163b34a395)
|
|
Horizon needs write access to its log file and read permissions for all
of its configuration files.
The code that was supposed to set the permissions did it in the wrong
directory.
Closes-Bug: #1719590
Co-Authored-By: Martin Andre <m.andre@redhat.com>
Change-Id: I0c125fac38cd186f98b9bc69bcc570f669eb6de1
(cherry picked from commit 960d7ff1025a568343aa5ae5ef95386306de8cab)
|
|
Since, user ID on host and container differs, image-create
with NFS backend was failing with permission error. But even after
resolving permission error[1] the image was not getting created
on the nfs share as the NFS endpoint is not mounted successfully on
the container via puppet. This will be fixed by [2].
Now, adding two below changes in this patch,
[1]. chown glance:glance /var/lib/glance.
[2]. Proposing this solution to mount NFS endpoint on the host instead
of mounting it on glance container, because mounting in container
does not work as explained in LP Bug.
Closes-Bug: 1708629
Change-Id: Ib60cb0d179e7c117dc26440746154136aa9d163e
(cherry picked from commit
ed11f8ebcfbaf1fbbebb4c83e3201e462fee14ee)
|
|
|
|
stable/pike
|
|
|
|
|
|
|
|
Adds a UpgradeRemoveUnusedPackages param to use
in the ansible when conditional for the removal
Adds package removal to step2 right after a service
is stopped and disabled on step2. Package updates
happen in step3 so ideally remove before that.
The package removal task has ignore_errors true
so dependencies or other issue removing packages will
not fail the upgrade workflow.
Also adds this to the upgrade environment files
for visibility and defaulting false
Change-Id: Ie4e4a2d41f7752c5a13507a7c15c6f68e203cfca
Related-Bug: 1701501
(cherry picked from commit ce0ef2fa207698c1ae61c1620fe3c5e8d1c7bfca)
|
|
Adds update_tasks for the minor update workflow. These will be
collected into playbooks during an initial 'update init' heat
stack update and then invoked later by the operator as ansible
playbooks.
Current understanding/workflow:
Step=1: stop the cluster on the updated node
Step=2: Pull the latest image and retag the it pcmklatest
Step=3: yum upgrade happens on the host
Step=4: Restart the cluster on the node
Step=5: Verification: test pacemaker services are running.
https://etherpad.openstack.org/p/tripleo-pike-updates-upgrades
Related-Bug: 1715557
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: Sofer Athlan-Guyot <sathlang@redhat.com>
Change-Id: I101e0f5d221045fbf94fb9dc11a2f30706843806
(cherry picked from commit a953bda0ae615dc44d3e8a70aa7ab0160e26f3af)
|
|
|
|
Currently the default Sensu check defined in docker/services/sensu-client.yaml
reports only first unhealthy container. This patch changes the check output
to contain list of all unhealthy containers.
Change-Id: I0a934367ef22984d9091d160ec7105092edc8149
Closes-Bug: #1720972
(cherry picked from commit 9b016c9f3fbe9552497737974b9928d1dff4d299)
|
|
Currently health check for mysql container reports unhealthy container
because there is no 'mysql' user created. This patch creates the user
during mysql_bootstrap without any permission, just to allow health
check to connect to DB and run 'select 1'.
Change-Id: Iab26da0d30939b219189d4e7beb2a61d456ab7c3
Closes-Bug: #1718944
(cherry picked from commit 3a9cfaa992e92423461d64f84d701336322bdd10)
|
|
The services that docker depends on, have logging_sources and logging_groups;
but those are not set on the docker outputs so they are not used when dockers
are deployed.
Added logging_source & logging_groups as docker optional parameters in
tools/yaml-validate.py
Closes-Bug: #1718110
Change-Id: I8795eaf4bd06051e9b94aa50450dee0d8761e526
(cherry picked from commit 5dbe1121e98a794ec6a6387ff56ee34314177567)
|
|
Change-Id: Ia350e4899aa499cf27efffd9d2243e7e95fa1d65
Depends-On: I60796063fa9ebe0d98030fb982d22dabe2593ea0
Depends-On: I585b6877074353b5de62e5efaabfbe62432c473d
(cherry picked from commit f37fe4f903f429b43d22b485c29547f576ec7269)
|
|
The containerized galera service generates a galera.cnf which uses
short hostname to identify itself rather than the fqdn from the
mysql_network (e.g. overcloud-x.internalapi.cloudname).
This breaks when internal TLS is in use, because the mysql certificate
does not reference this short hostname.
Fix the appropriate hiera parameter to make it behave like the
non-containerized galera service.
Change-Id: I904cde38f2baeddab5178e8ad48d34a0c73629af
Closes-Bug: #1719599
(cherry picked from commit e10aa591dc9155a2746df01279c4ba4f2133fd17)
|
|
In Ocata all live-migration over ssh is performed on the default ssh port (22).
In Pike the containerized live-migration over ssh is on port 2022 as the
docker host's sshd is using port 22.
To allow live migration during upgrade we need to temporarily pin the Pike
computes to port 22 and in the final converge we can switch over to port 2022.
This also changes the default port to 2022 for baremetal computes in Pike to
enable live-migration between baremetal and containerized computes.
Change-Id: Icb9bfdd9a99dc1dce28eb95c50a9a36bffa621b1
Depends-On: I0b80b81711f683be539939e7d084365ff63546d3
Closes-Bug: 1714171
(cherry picked from commit 17fd16b9f266e1aa67bf03ebdf309e89d668ada2)
|
|
Previously it was mistakenly replacing the contents because we
do not do deep merge.
Change-Id: I145feb0208f135da7c71694ebcecd937244d66b1
Closes-Bug: #1719919
(cherry picked from commit 17416dcfc56c5148ccc9ab40297f99adfdcd085b)
|
|
|
|
We missed to set the pgp_num default in ceph.conf, causing WARNING
messages like:
pool default.rgw.buckets.data pg_num 32 > pgp_num 8
Also increases the default pg_num to 128 which is the recommended
value for less than 5 OSDs [1].
1. http://docs.ceph.com/docs/master/rados/operations/placement-groups/
Change-Id: Ibd9fb23e04576e95e24af58f856663397886a947
Closes-Bug: #1718173
(cherry picked from commit 58e6f6533a04eddd2dc897d890737bbccde4ea7b)
|
|
stable/pike
|
|
stable/pike
|
|
This wrapper binary spawns the HAproxy daemon and implements a
coordinated HAproxy restart on SIGHUP.
From a service's perspective, this allows reloading the HAProxy
configuration with minimal service disruption, i.e. without stopping
and restarting the HAProxy container.
Closes-Bug: #1717521
Change-Id: Ib3ef0c0bcf1a8151e179ff4d7509cf0d6b3ac5a1
(cherry picked from commit 91cd44cd7266c15ce07fafbee9d2e33f226096ba)
|
|
During the bootstrap of the mariadb database, galera replication
must be disabled while the users credentials are being set up. This
is done by setting wsrep-provider=none when starting mysqld_safe.
Icf67fd2fbf520e8a62405b4d49e8d5169ff3925b already disabled it
when the clustercheck credentials are being set up, but Kolla also
start a temporary server for setting up the root password.
Disable the setting directly at the end of the mysql.cnf in the
running container. That way, the default setting from galera.cnf will
be overriden, all mysqld_safe calls will disable WSREP and the setting
will stay ephemeral.
Change-Id: If14e22992b46a35a05a16a9db5ecb360ea13df8f
Closes-Bug: #1717250
(cherry picked from commit b0f50db80b10e9cd6263c4d6b3ca8dd818b658ba)
|
|
Running these daemons at step 5 should avoid seeing error messages in
the gnocchi-statsd log files on startup which starts at step4.
Change-Id: Idb82f864a2e1c623dab7a2a87054443036670453
Closes-bug: #1713182
(cherry picked from commit 9d8e496f3e8a825d48d9eba9aab540001bb780ea)
|
|
This change allows running the major upgrade composable docker
steps multiple times by not trying to delete the pacemaker resources
if they're not reported as started or in master state.
Closes-bug: 1716031
Depends-On: I8da03f5c4a6d442617b81be5793a9724cc8842bf
Change-Id: Ifcf9de8c82550a90a9fb118052d43fdbcdc6ca7e
(cherry picked from commit 64d7be1e3d4552e06cbc53f788572e530cc5c3bb)
|
|
Using the service_ prefix seems incoherent with its use in
service_config_settings (vs config_settings).
Change-Id: Ia39f181415bee0071409dabddfa0c5c312915e1f
(cherry picked from commit 09137304b98a02ed024c0288da907cfe35ca5fe1)
|
|
Add a retry when the pacemaker_resource command
wasn't apply correctly, more info here:
https://bugzilla.redhat.com/show_bug.cgi?id=1482116
This is the same approach puppet-pacemaker uses
and provides eventual consistency when multiple
nodes change the cluster CIB concurrently.
This change depends-on :
https://review.gerrithub.io/375982
The return code is not available in the current
ansible-pacemaker package.
Change-Id: I8da03f5c4a6d442617b81be5793a9724cc8842bf
(cherry picked from commit e92430d8d03fc2ce2d0ce192b96209f2c5c04169)
|
|
|
|
stable/pike
|