path: root/docker
Age | Commit message | Author | Files | Lines
2017-11-08 | Add --detailed-exitcodes when running puppet via ansible | Michele Baldessari | 1 | -3/+11
The puppet run never fails, even when it should, since we moved to the ansible way of applying it. The reason is the following code:
    - name: Run puppet host configuration for step {{step}}
      command: >-
        puppet apply
        --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
        --logdest syslog --logdest console --color=false
        /var/lib/tripleo-config/puppet_step_config.pp
The above is missing the --detailed-exitcodes switch, so puppet never really errors out on us and the deployment keeps running all the steps even though a previous puppet manifest might have failed. This causes extra hard-to-debug failures.
Initially the issue was observed on the puppet host runs, but this parameter is also missing from docker-puppet.py, so let's add it there as well, since it makes sense to return proper error codes whenever we call puppet. Besides this being a good idea in general, we actually *have* to do it, because puppet does not fail correctly without this option due to the following puppet bug: https://tickets.puppetlabs.com/browse/PUP-2754
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ie9df4f520645404560a9635fb66e3af42b966f54
Closes-Bug: #1723163
(cherry picked from commit 11e599d116cfbf7df4dcd0e7670c3405a4224c1a)
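A minimal Python sketch of how a caller such as docker-puppet.py could interpret the exit status once --detailed-exitcodes is passed. It is not the code from this patch: `puppet_run_succeeded` and `apply_manifest` are hypothetical helper names. The exit-code meanings in the comments are the ones Puppet documents for this flag; an Ansible task would likewise need to treat only return codes 0 and 2 as success.

```python
import subprocess

# Exit codes of `puppet apply --detailed-exitcodes`:
#   0 = the run succeeded with no changes
#   2 = the run succeeded and some resources changed
#   4 = there were failures during the transaction
#   6 = there were both changes and failures
#   1 = the run itself failed (for example, a compilation error)
SUCCESS_CODES = frozenset({0, 2})


def puppet_run_succeeded(returncode):
    """Treat only 0 and 2 as success; every other code is a failure."""
    return returncode in SUCCESS_CODES


def apply_manifest(manifest):
    """Run puppet apply and raise if the run reported failures.

    Hypothetical helper for illustration only.
    """
    cmd = ["puppet", "apply", "--detailed-exitcodes",
           "--logdest", "console", manifest]
    rc = subprocess.run(cmd).returncode
    if not puppet_run_succeeded(rc):
        raise RuntimeError("puppet apply failed with exit code %d" % rc)
    return rc
```

Note that 2 must count as success: without that, every run that actually changed something would be reported as a failure.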
2017-11-08 | Merge "Fix /etc/openstack-dashboard/ permissions for access to *policy.json" into stable/pike | Zuul | 1 | -0/+6
2017-11-08 | Merge "Enable Cinder as a backend for Glance" into stable/pike | Zuul | 1 | -1/+15
2017-11-08 | Merge "Add tags to baremetal cron removal tasks" into stable/pike | Zuul | 4 | -0/+4
2017-11-04 | Fix iptables rules override bug in clustercheck docker service | Michele Baldessari | 1 | -1/+4
When deploying a composable HA overcloud with a database role split off to separate nodes, we could observe a deployment failure due to galera never starting up properly. The reason for this was that instead of having the firewall rules for the galera bundle applied (i.e. those with the extra control-port for the bundle), we would see the firewall rules for the BM galera service. E.g. we would see the following on the host:
    tripleo.mysql.firewall_rules:
      104 mysql galera:
        dport: [ 873, 3306, 4444, 4567, 4568, 9200 ]
instead of the correct mysql bundle firewall rules:
    tripleo.mysql.firewall_rules:
      104 mysql galera-bundle:
        dport: [ 873, 3123, 3306, 4444, 4567, 4568, 9200 ]
The reason for this is the following piece of code in https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/clustercheck.yaml#L62:
    ...
      MysqlPuppetBase:
        type: ../../../puppet/services/pacemaker/database/mysql.yaml
        properties:
          EndpointMap: {get_param: EndpointMap}
          ServiceData: {get_param: ServiceData}
          ServiceNetMap: {get_param: ServiceNetMap}
          DefaultPasswords: {get_param: DefaultPasswords}
          RoleName: {get_param: RoleName}
          RoleParameters: {get_param: RoleParameters}
    outputs:
      role_data:
        description: Containerized service clustercheck using composable services.
        value:
          service_name: clustercheck
          config_settings: {get_attr: [MysqlPuppetBase, role_data, config_settings]}
          logging_source: {get_attr: [MysqlPuppetBase, role_data, logging_source]}
    ...
Depending on the ordering of the clustercheck service within the role (before or after the mysql service), the above code will override tripleo.mysql.firewall_rules with the wrong rules, because we derive from puppet/services/..., which contains the BM firewall rules. Let's just switch to deriving from the docker service so we do not risk getting the wrong firewall rules during the map_merge.
Tested this change successfully on a composable HA with split-off DB nodes.
Change-Id: Ie87b327fe7981d905f8762d3944a0e950dbd0bfa
Closes-Bug: #1728918
(cherry picked from commit 3df6a4204a85b119cd67ccf176d5b72f9e550da6)
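To make the ordering problem concrete, here is a simplified Python model. It assumes a role's config_settings are combined with a shallow, last-wins merge in service order, as Heat's map_merge does for duplicate keys. The dictionaries reuse the firewall rules quoted above; `map_merge` here is a local stand-in, not Heat's implementation.

```python
def map_merge(*maps):
    """Shallow merge modelled on Heat's map_merge: for a repeated key, the later map wins."""
    merged = {}
    for m in maps:
        merged.update(m)
    return merged


# config_settings exposed by the containerized mysql service (bundle rules) ...
mysql_bundle = {
    "tripleo.mysql.firewall_rules": {
        "104 mysql galera-bundle": {"dport": [873, 3123, 3306, 4444, 4567, 4568, 9200]},
    },
}
# ... and as re-exported by clustercheck from the baremetal puppet/services template.
clustercheck_from_bm = {
    "tripleo.mysql.firewall_rules": {
        "104 mysql galera": {"dport": [873, 3306, 4444, 4567, 4568, 9200]},
    },
}

# clustercheck listed after mysql: the baremetal rules replace the bundle rules.
after = map_merge(mysql_bundle, clustercheck_from_bm)
# clustercheck listed before mysql: the bundle rules survive.
before = map_merge(clustercheck_from_bm, mysql_bundle)
```

Because the whole `tripleo.mysql.firewall_rules` value is replaced rather than merged, the result depends only on which service comes last, which matches the order-dependent failure described in the commit.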
2017-11-03 | Merge "Run containerized mistral-api eventlet" into stable/pike | Zuul | 1 | -1/+35
2017-11-03 | Merge "Providing required priviledges to the mounted NFS volume" into stable/pike | Zuul | 1 | -0/+23
2017-11-03 | Merge "Update CephPools format in the docker templates to fit ceph-ansible" into stable/pike | Zuul | 1 | -16/+4
2017-11-01 | Add tags to baremetal cron removal tasks | Dan Prince | 4 | -0/+4
In 59e29b17f4a9f5f65b6f8a7b8e82ef6426d8a51 we forgot to add tags to the Ansible tasks that remove the baremetal cron jobs at step 2.
(cherry picked from commit 1128271b460b120a2a59eac3df95082c55e554d0)
Change-Id: I23fb134b88336ebc4eb1a97a69a2d73d4ef0edb2
Related-bug: #1708466
2017-11-01 | Force memcached container log to file | Juan Antonio Osorio Robles | 1 | -1/+9
We were relying on the sysconfig options to set the memcached log file; however, this does not work, because the redirection is passed to the memcached command as an option and ends up being ignored. So instead, we set the redirection in the container template.
Change-Id: Ic94e3fd7884d518eb9558c53acdc6b294823cd0a
Closes-Bug: #1720183
(cherry picked from commit ca1fc5848661aacbf14b52e33879190c133c8e48)
2017-11-01 | Merge "Fix permissions for dockerized horizon" into stable/pike | Zuul | 1 | -1/+1
2017-10-30 | persist memcached logs in /var/log/containers/memcached/memcached.log | Juan Antonio Osorio Robles | 1 | -3/+3
We used to bind-mount /var/log/memcached.log, but since this file didn't exist, it was created in the memcached container as a directory. This commit takes the approach of other containers and writes the logs to a memcached directory in /var/log/containers.
Change-Id: I926b65fa557ad56b4faa2be34452b58f7b01247a
Closes-Bug: #1720183
(cherry picked from commit 5020f38301a9a0a70f34878196250e24fc639dec)
2017-10-30 | Update CephPools format in the docker templates to fit ceph-ansible | Giulio Fidente | 1 | -16/+4
The format which ceph-ansible uses to describe the list of pools to be created in the cluster is different from the one which puppet-ceph uses; this commit updates the description and the docker templates accordingly.
Change-Id: I1e5b2c3cbf6ae02c19a2275ca119fed6e173319d
Closes-Bug: #1720373
(cherry picked from commit c10aa7a0439fb7d8e8e964e75d73f3cbb54aa9ec)
2017-10-29 | Enable Cinder as a backend for Glance | Alan Bishop | 1 | -1/+15
Enable Cinder as a backend for Glance by adding 'cinder' to the list of allowed choices for the GlanceBackend heat parameter. Update the glance-api docker configuration to allow the feature to work. This is necessary because the feature uses iSCSI, which requires additional privileges.
Closes-Bug: #1728409
Depends-On: I850047e32f3608b3ce490e52e2e540695cb1a4ff
Change-Id: I42241747de931103a04aa5ee2ed18fd46197d183
(cherry picked from commit e828e8c7bb2e890b243faa767992226dc270bb6f)
2017-10-26 | Run containerized mistral-api eventlet | Martin André | 1 | -1/+35
The mistral-api container image we use doesn't have the necessary packages to run via wsgi, and this causes puppet to error with:
    "Notice: /Stage[main]/Mistral::Wsgi::Apache/Openstacklib::Wsgi::Apache[mistral_wsgi]/File[mistral_wsgi]: Dependency File[/var/www/cgi-bin/mistral] has failures: true",
Fall back to the eventlet mistral-api for the time being, until we get a usable mistral-api image.
Change-Id: Ic10c579aa3b6d0d6a01f120669be3b5dcc5efcda
Depends-On: I54627f1c5a8867738a55bee42075bb6087830c61
Related-Bug: #1724607
(cherry picked from commit e158acb14c4ed92be1a5b961ff1e8ff99b1a5ae3)
2017-10-25 | Fix /etc/openstack-dashboard/ permissions for access to *policy.json | Rhys Oxenham | 1 | -0/+6
The Kolla Dockerfile sets the ownership of /etc/openstack-dashboard/ to horizon:horizon. We need this to be readable by the apache user, as httpd does not run as the horizon user. We may want to consider fixing this in the upstream Dockerfile instead, e.g. checking if we're using centos/rhel and changing the permissions that way. I'm not sure why it's set to horizon:horizon upstream, and I'm keen not to break any existing functionality that relies on the horizon-based permissions.
Closes-Bug: #1723125
Change-Id: If5feebae38f7fdfffa60bfaedc4521f676006484
(cherry picked from commit fd657aa4e68de7ad239a88525b5ae343acd3bf80)
2017-10-23 | Merge "Also match config volumes for /var/lib/config-data/puppet-generated/" into stable/pike | Zuul | 1 | -5/+7
2017-10-23 | Merge "Disable xinetd class when creating swift-storage puppet configuration" into stable/pike | Zuul | 1 | -1/+4
2017-10-19 | Disable xinetd class when creating swift-storage puppet configuration | Michele Baldessari | 1 | -1/+4
Due to the puppet invocation missing --detailed-exitcodes, we ignored a large number of puppet errors during deploy. Swift storage fails during the puppet_config step with the following error:
    Debug: /Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Package[swift-object]: Not tagged with file, file_line, concat, augeas, cron, swift_proxy_config, swift_config, swift_container_config, swift_container_sync_realms_config, swift_account_config, swift_object_config, swift_object_expirer_config, rsync::server
    Debug: /Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Package[swift-object]: Resource is being skipped, unscheduling all events
    Debug: Executing: '/usr/bin/systemctl is-active xinetd'
    Debug: Executing: '/usr/bin/systemctl is-enabled xinetd'
    Debug: Executing: '/usr/bin/systemctl unmask xinetd'
    Debug: Executing: '/usr/bin/systemctl start xinetd'
    Debug: Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u xinetd --no-pager
    Debug: Executing: 'journalctl -n 50 --since '5 minutes ago' -u xinetd --no-pager'
    Error: Systemd start for xinetd failed!
The problem is that by using the rsync::server tag we end up including the xinetd class automatically, which will try to start a service inside a container. By nooping the xinetd class, we're able to avoid systemctl calls and have a successful deployment. The resulting swift_rsync container seems to work correctly:
    [root@overcloud-controller-0 ~]# docker exec -it swift_rsync /bin/bash -c "ps -axuwf"
    USER  PID %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
    root   10  0.0  0.0 47444 1624 pts/1 Rs+  18:16 0:00 ps -axuwf
    root    1  0.0  0.0   188    4 ?     Ss   17:27 0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_start
    root    6  0.0  0.0 11036  924 ?     Ss   17:27 0:00 /usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf
    [root@overcloud-controller-0 ~]# docker logs swift_rsync 2>&1|tail -n4
    INFO:__main__:Deleting /etc/rsyncd.conf
    INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/rsyncd.conf to /etc/rsyncd.conf
    INFO:__main__:Writing out command to execute
    Running command: '/usr/bin/rsync --daemon --no-detach --config=/etc/rsyncd.conf'
Change-Id: I5e43e8fd61e002d2acc56a7de52e6aae64ab60be
Closes-Bug: #1723463
(cherry picked from commit b5eeeab73e12efecc86ea7deebc105eee0739510)
2017-10-18 | Also match config volumes for /var/lib/config-data/puppet-generated/ | Steven Hardy | 1 | -5/+7
Some services only mount this directory, not /var/lib/config-data/$service, so handle this case in the docker-puppet code that maps the mounted volumes to the services when adding the config hash to the container environment.
Change-Id: I3bdb7609f322458584ac9597ffbfefb057b84646
Closes-Bug: #1720208
(cherry picked from commit 3a932b056914d148fa460b8890fc0e631c817a40)
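A hypothetical sketch of the matching described above. `config_dirs_for` is an illustrative name, not the function in docker-puppet.py, and volumes are assumed to use Docker's `host_path:container_path[:options]` form. A host path is treated as belonging to a service if it sits under either /var/lib/config-data/<service> or /var/lib/config-data/puppet-generated/<service>.

```python
CONFIG_DATA = "/var/lib/config-data"


def config_dirs_for(service, volumes):
    """Return the config-data host paths mounted for `service` (hypothetical helper).

    Matches both /var/lib/config-data/<service> and
    /var/lib/config-data/puppet-generated/<service>, plus anything below them.
    """
    candidates = (
        "%s/%s" % (CONFIG_DATA, service),
        "%s/puppet-generated/%s" % (CONFIG_DATA, service),
    )
    matched = []
    for volume in volumes:
        # Only the host side of "host:container[:options]" matters here.
        host_path = volume.split(":", 1)[0].rstrip("/")
        if any(host_path == c or host_path.startswith(c + "/") for c in candidates):
            matched.append(host_path)
    return matched
```

Comparing whole path components (`c + "/"`) rather than a bare string prefix keeps a service such as `nova` from accidentally matching another service's directory such as `nova_placement`.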
2017-10-14 | Remove monitor_interface from ceph-ansible parameters | Giulio Fidente | 2 | -2/+0
We should not pass any hardcoded value for monitor_interface and should rely only on monitor_address_block instead. This also removes journal_collocation, which is not consumed by newer (and stable) builds of ceph-ansible.
Change-Id: Idf213a1f43a66506f76d07102f122839b5096948
Closes-Bug: #1715246
(cherry picked from commit 3e90ae3df5a7c5491672254733ceac163b34a395)
2017-10-12 | Fix permissions for dockerized horizon | Radomir Dopieralski | 1 | -1/+1
Horizon needs write access to its log file and read permissions for all of its configuration files. The code that was supposed to set the permissions did it in the wrong directory.
Closes-Bug: #1719590
Co-Authored-By: Martin Andre <m.andre@redhat.com>
Change-Id: I0c125fac38cd186f98b9bc69bcc570f669eb6de1
(cherry picked from commit 960d7ff1025a568343aa5ae5ef95386306de8cab)
2017-10-10 | Providing required priviledges to the mounted NFS volume | Pranali Deore | 1 | -0/+23
Since the user ID differs between the host and the container, image-create with the NFS backend was failing with a permission error. But even after resolving the permission error [1], the image was not created on the NFS share, because the NFS endpoint was not mounted successfully in the container via puppet. This is fixed by [2]. This patch therefore adds the following two changes:
[1] chown glance:glance /var/lib/glance.
[2] Mount the NFS endpoint on the host instead of in the glance container, because mounting in the container does not work, as explained in the LP bug.
Closes-Bug: 1708629
Change-Id: Ib60cb0d179e7c117dc26440746154136aa9d163e
(cherry picked from commit ed11f8ebcfbaf1fbbebb4c83e3201e462fee14ee)
2017-10-10 | Merge "Remove package if service stopped and disabled" into stable/pike | Jenkins | 30 | -3/+282
2017-10-10 | Merge "Adds pacemaker update_tasks for Pike minor update workflow" into stable/pike | Jenkins | 10 | -3/+239
2017-10-10 | Merge "Make containerized galera use mysql_network everywhere" into stable/pike | Jenkins | 1 | -0/+6
2017-10-10 | Merge "Create mysql user for non-ha deployments" into stable/pike | Jenkins | 1 | -5/+21
2017-10-10 | Merge "List all unhealthy containers" into stable/pike | Jenkins | 1 | -1/+5
2017-10-09 | Remove package if service stopped and disabled | marios | 30 | -3/+282
Adds an UpgradeRemoveUnusedPackages param to use in the ansible when conditional for the removal.
Adds package removal to step 2, right after a service is stopped and disabled in step 2. Package updates happen in step 3, so ideally packages are removed before that. The package removal task has ignore_errors set to true, so dependency or other issues when removing packages will not fail the upgrade workflow.
Also adds the parameter to the upgrade environment files for visibility, defaulting to false.
Change-Id: Ie4e4a2d41f7752c5a13507a7c15c6f68e203cfca
Related-Bug: 1701501
(cherry picked from commit ce0ef2fa207698c1ae61c1620fe3c5e8d1c7bfca)
2017-10-09 | Adds pacemaker update_tasks for Pike minor update workflow | marios | 10 | -3/+239
Adds update_tasks for the minor update workflow. These will be collected into playbooks during an initial 'update init' heat stack update and then invoked later by the operator as ansible playbooks.
Current understanding/workflow:
    Step=1: stop the cluster on the updated node
    Step=2: pull the latest image and retag it as pcmklatest
    Step=3: yum upgrade happens on the host
    Step=4: restart the cluster on the node
    Step=5: verification: test that pacemaker services are running
https://etherpad.openstack.org/p/tripleo-pike-updates-upgrades
Related-Bug: 1715557
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: Sofer Athlan-Guyot <sathlang@redhat.com>
Change-Id: I101e0f5d221045fbf94fb9dc11a2f30706843806
(cherry picked from commit a953bda0ae615dc44d3e8a70aa7ab0160e26f3af)
2017-10-09 | Merge "docker: add logging(source & groups)" into stable/pike | Jenkins | 82 | -7/+165
2017-10-09 | List all unhealthy containers | Martin Mágr | 1 | -1/+5
Currently the default Sensu check defined in docker/services/sensu-client.yaml reports only the first unhealthy container. This patch changes the check output to contain a list of all unhealthy containers.
Change-Id: I0a934367ef22984d9091d160ec7105092edc8149
Closes-Bug: #1720972
(cherry picked from commit 9b016c9f3fbe9552497737974b9928d1dff4d299)
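A minimal sketch of a check that reports every unhealthy container rather than only the first. It assumes the container list comes from `docker ps --filter health=unhealthy --format '{{.Names}}'` and that the check follows the Nagios-style exit codes Sensu checks commonly use (0 OK, 2 CRITICAL); the real check in sensu-client.yaml may be written differently.

```python
import subprocess


def list_unhealthy(ps_output):
    """Parse one container name per line, as printed by --format '{{.Names}}'."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def check_result(names):
    """Build an (exit status, message) pair; 0 = OK, 2 = CRITICAL (assumed convention)."""
    if not names:
        return 0, "OK: no unhealthy containers"
    # Join every name instead of reporting only names[0].
    return 2, "CRITICAL: unhealthy containers: " + ", ".join(names)


def run_check():
    """Query docker and evaluate the check (requires a local docker daemon)."""
    out = subprocess.run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return check_result(list_unhealthy(out))
```

Keeping parsing and message building separate from the docker call makes the reporting logic testable without a running daemon.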
2017-10-09 | Create mysql user for non-ha deployments | Martin Mágr | 1 | -5/+21
Currently the health check for the mysql container reports it as unhealthy because no 'mysql' user has been created. This patch creates the user during mysql_bootstrap without any privileges, just to allow the health check to connect to the DB and run 'select 1'.
Change-Id: Iab26da0d30939b219189d4e7beb2a61d456ab7c3
Closes-Bug: #1718944
(cherry picked from commit 3a9cfaa992e92423461d64f84d701336322bdd10)
2017-10-09 | docker: add logging(source & groups) | Juan Badia Payno | 82 | -7/+165
The services that the docker templates depend on have logging_source and logging_groups, but those are not set in the docker outputs, so they are not used when the containerized services are deployed. Added logging_source and logging_groups as optional docker parameters in tools/yaml-validate.py.
Closes-Bug: #1718110
Change-Id: I8795eaf4bd06051e9b94aa50450dee0d8761e526
(cherry picked from commit 5dbe1121e98a794ec6a6387ff56ee34314177567)
2017-10-09 | Containerized Fluentd client | Juan Badia Payno | 1 | -0/+121
Change-Id: Ia350e4899aa499cf27efffd9d2243e7e95fa1d65
Depends-On: I60796063fa9ebe0d98030fb982d22dabe2593ea0
Depends-On: I585b6877074353b5de62e5efaabfbe62432c473d
(cherry picked from commit f37fe4f903f429b43d22b485c29547f576ec7269)
2017-10-07 | Make containerized galera use mysql_network everywhere | Damien Ciabrini | 1 | -0/+6
The containerized galera service generates a galera.cnf which uses the short hostname to identify itself rather than the FQDN from the mysql_network (e.g. overcloud-x.internalapi.cloudname). This breaks when internal TLS is in use, because the mysql certificate does not reference this short hostname. Fix the appropriate hiera parameter to make it behave like the non-containerized galera service.
Change-Id: I904cde38f2baeddab5178e8ad48d34a0c73629af
Closes-Bug: #1719599
(cherry picked from commit e10aa591dc9155a2746df01279c4ba4f2133fd17)
2017-10-07 | Support for Ocata-Pike live-migration over ssh | Oliver Walsh | 4 | -6/+104
In Ocata, all live-migration over ssh is performed on the default ssh port (22). In Pike, the containerized live-migration over ssh is on port 2022, as the docker host's sshd is using port 22. To allow live migration during the upgrade, we need to temporarily pin the Pike computes to port 22; in the final converge we can switch over to port 2022. This also changes the default port to 2022 for baremetal computes in Pike to enable live-migration between baremetal and containerized computes.
Change-Id: Icb9bfdd9a99dc1dce28eb95c50a9a36bffa621b1
Depends-On: I0b80b81711f683be539939e7d084365ff63546d3
Closes-Bug: 1714171
(cherry picked from commit 17fd16b9f266e1aa67bf03ebdf309e89d668ada2)
2017-09-28 | Make CephConfigOverrides append to ceph.conf[global] | Giulio Fidente | 1 | -4/+4
Previously CephConfigOverrides was mistakenly replacing the contents of ceph.conf [global], because we do not do a deep merge.
Change-Id: I145feb0208f135da7c71694ebcecd937244d66b1
Closes-Bug: #1719919
(cherry picked from commit 17416dcfc56c5148ccc9ab40297f99adfdcd085b)
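The difference between the two merge styles, sketched in Python. The option names and values are only illustrative, and this does not reproduce the template change itself: with a shallow merge, the overrides' `global` section replaces the template's whole `global` section, while a deep merge adds the new keys to it.

```python
def shallow_merge(base, override):
    """Top-level merge: a section in `override` replaces the whole section in `base`."""
    merged = dict(base)
    merged.update(override)
    return merged


def deep_merge(base, override):
    """Recursive merge: nested dicts are merged key by key; other values are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values only.
template = {"global": {"osd_pool_default_size": 3}}
overrides = {"global": {"max_open_files": 131072}}

replaced = shallow_merge(template, overrides)   # template's [global] keys are lost
appended = deep_merge(template, overrides)      # both keys end up in [global]
```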
2017-09-25 | Merge "Rename service_workflow_tasks into workflow_tasks" into stable/pike | Jenkins | 7 | -7/+7
2017-09-22 | Set Ceph pgp_num after pg_num | Giulio Fidente | 1 | -1/+2
We did not set the pgp_num default in ceph.conf, causing WARNING messages like:
    pool default.rgw.buckets.data pg_num 32 > pgp_num 8
This also increases the default pg_num to 128, which is the recommended value for fewer than 5 OSDs [1].
[1] http://docs.ceph.com/docs/master/rados/operations/placement-groups/
Change-Id: Ibd9fb23e04576e95e24af58f856663397886a947
Closes-Bug: #1718173
(cherry picked from commit 58e6f6533a04eddd2dc897d890737bbccde4ea7b)
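A small Python sketch of the condition behind the warning and of defaults that avoid it. `osd_pool_default_pg_num` and `osd_pool_default_pgp_num` are Ceph options, but whether the template sets exactly these keys is an assumption, as are the helper names.

```python
def pgp_warnings(pools):
    """Return a warning for each pool whose pg_num exceeds its pgp_num.

    `pools` maps pool name -> (pg_num, pgp_num).
    """
    return [
        "pool %s pg_num %d > pgp_num %d" % (name, pg_num, pgp_num)
        for name, (pg_num, pgp_num) in sorted(pools.items())
        if pg_num > pgp_num
    ]


def pool_defaults(pg_num=128):
    """ceph.conf [global] defaults that keep pgp_num in step with pg_num."""
    return {"osd_pool_default_pg_num": pg_num, "osd_pool_default_pgp_num": pg_num}
```

Setting both values from the same number means a newly created pool never starts with pg_num above pgp_num, which is the situation the quoted warning reports.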
2017-09-21 | Merge "Use haproxy-systemd-wrapper as pid1 in containerized Haproxy" into stable/pike | Jenkins | 2 | -6/+4
2017-09-21 | Merge "Disable all uses of wsrep-provider in mysql_bootstrap container" into stable/pike | Jenkins | 1 | -2/+4
2017-09-20 | Use haproxy-systemd-wrapper as pid1 in containerized Haproxy | Damien Ciabrini | 2 | -6/+4
This wrapper binary spawns the HAProxy daemon and implements a coordinated HAProxy restart on SIGHUP. From a service's perspective, this allows reloading the HAProxy configuration with minimal service disruption, i.e. without stopping and restarting the HAProxy container.
Closes-Bug: #1717521
Change-Id: Ib3ef0c0bcf1a8151e179ff4d7509cf0d6b3ac5a1
(cherry picked from commit 91cd44cd7266c15ce07fafbee9d2e33f226096ba)
2017-09-20 | Disable all uses of wsrep-provider in mysql_bootstrap container | Damien Ciabrini | 1 | -2/+4
During the bootstrap of the mariadb database, galera replication must be disabled while the user credentials are being set up. This is done by setting wsrep-provider=none when starting mysqld_safe. Icf67fd2fbf520e8a62405b4d49e8d5169ff3925b already disabled it when the clustercheck credentials are being set up, but Kolla also starts a temporary server for setting up the root password. Disable the setting directly at the end of mysql.cnf in the running container. That way, the default setting from galera.cnf will be overridden, all mysqld_safe calls will disable WSREP, and the setting will stay ephemeral.
Change-Id: If14e22992b46a35a05a16a9db5ecb360ea13df8f
Closes-Bug: #1717250
(cherry picked from commit b0f50db80b10e9cd6263c4d6b3ca8dd818b658ba)
2017-09-19 | Run gnocchi statsd and metrcd at step 5 | Dan Prince | 2 | -2/+2
Running these daemons at step 5 should avoid the error messages seen in the gnocchi-statsd log files when it starts at step 4.
Change-Id: Idb82f864a2e1c623dab7a2a87054443036670453
Closes-bug: #1713182
(cherry picked from commit 9d8e496f3e8a825d48d9eba9aab540001bb780ea)
2017-09-15 | One time delete pacemaker resources during upgrade to containers | Marius Cornea | 4 | -8/+40
This change allows running the major upgrade composable docker steps multiple times by not trying to delete the pacemaker resources if they're not reported as started or in master state.
Closes-bug: 1716031
Depends-On: I8da03f5c4a6d442617b81be5793a9724cc8842bf
Change-Id: Ifcf9de8c82550a90a9fb118052d43fdbcdc6ca7e
(cherry picked from commit 64d7be1e3d4552e06cbc53f788572e530cc5c3bb)
2017-09-14 | Rename service_workflow_tasks into workflow_tasks | Giulio Fidente | 7 | -7/+7
Using the service_ prefix seems inconsistent with its use in service_config_settings (vs config_settings).
Change-Id: Ia39f181415bee0071409dabddfa0c5c312915e1f
(cherry picked from commit 09137304b98a02ed024c0288da907cfe35ca5fe1)
2017-09-14 | Retry if the pacemaker_resource commands failed | Mathieu Bultel | 6 | -0/+36
Add a retry when the pacemaker_resource command was not applied correctly; more info here: https://bugzilla.redhat.com/show_bug.cgi?id=1482116
This is the same approach puppet-pacemaker uses, and it provides eventual consistency when multiple nodes change the cluster CIB concurrently.
This change depends on: https://review.gerrithub.io/375982
The return code is not available in the current ansible-pacemaker package.
Change-Id: I8da03f5c4a6d442617b81be5793a9724cc8842bf
(cherry picked from commit e92430d8d03fc2ce2d0ce192b96209f2c5c04169)
2017-09-13 | Merge "Enable redis TLS proxy in HA deployments" into stable/pike | Jenkins | 1 | -26/+67
2017-09-13 | Merge "Add CephConfigOverrides to allow arbitrary configs in ceph.conf" into stable/pike | Jenkins | 2 | -11/+19