path: root/common
Age | Commit message | Author | Files | Lines
2017-11-10 | Refactor cellv2 host discovery logic to avoid races | Oliver Walsh | 2 | -0/+33
The compute service list is polled until all expected hosts are reported or a timeout occurs (600s).
Adds a cellv2_discovery flag to puppet services. This is used to generate a list of hosts that should have cellv2 host mappings.
Adds a canonical fqdn that should match the fqdn reported by a host.
Adds the ability to upload a config script for docker config instead of using complex bash one-liners.
Closes-bug: 1720821
Change-Id: I33e2f296526c957cb5f96dff19682a4e60c6a0f0
(cherry picked from commit 61fcfca045aeb5be1ee280d8dd9c260fb39b9084)
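The poll-until-complete-or-timeout behaviour described in this commit can be sketched roughly as below. This is an illustration only, not the code from the change: `wait_for_hosts`, the `list_hosts` callable, and the 15-second interval are hypothetical names and values; only the 600s timeout comes from the commit message.

```python
import time


def wait_for_hosts(list_hosts, expected, timeout=600, interval=15,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll list_hosts() until every expected host is reported.

    list_hosts is a hypothetical callable returning the hostnames that the
    compute service list currently reports. Returns the set of hosts still
    missing when polling stops (an empty set means success).
    """
    expected = set(expected)
    deadline = clock() + timeout
    while True:
        missing = expected - set(list_hosts())
        # Stop as soon as everything is reported, or once the deadline passes.
        if not missing or clock() >= deadline:
            return missing
        sleep(interval)
```

Comparing against an explicit list of expected hosts, rather than waiting a fixed time, is what lets a caller tell "all hosts reported" apart from "timed out with some hosts missing".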
2017-11-08 | Add --detailed-exitcodes when running puppet via ansible | Michele Baldessari | 1 | -2/+3
The puppet run never fails, even when it should, since we moved to the ansible way of applying it. The reason is the following code:
    - name: Run puppet host configuration for step {{step}}
      command: >-
        puppet apply
        --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
        --logdest syslog --logdest console --color=false
        /var/lib/tripleo-config/puppet_step_config.pp
The above is missing the --detailed-exitcodes switch, so puppet will never really error out on us and the deployment will keep on running all the steps even though a previous puppet manifest might have failed. This causes extra hard-to-debug failures.
Initially the issue was observed on the puppet host runs, but this parameter is also missing from docker-puppet.py, so let's add it there as well, as it makes sense to return proper error codes whenever we call puppet.
Besides this being a good idea in general, we actually *have* to do it because puppet does not fail correctly without this option due to the following puppet bug: https://tickets.puppetlabs.com/browse/PUP-2754
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ie9df4f520645404560a9635fb66e3af42b966f54
Closes-Bug: #1723163
(cherry picked from commit 11e599d116cfbf7df4dcd0e7670c3405a4224c1a)
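With --detailed-exitcodes, a caller has to treat the exit code as more than zero versus non-zero. A minimal sketch of that interpretation follows, assuming the exit codes documented for `puppet apply` (0 no changes, 1 run failed, 2 changes applied, 4 failures, 6 changes and failures). The helper name `puppet_run_failed` is hypothetical and is not taken from the patch.

```python
def puppet_run_failed(returncode, detailed_exitcodes=True):
    """Return True if a `puppet apply` exit code indicates a failure.

    With --detailed-exitcodes, 0 (no changes) and 2 (changes applied) are
    both successful runs; any other code (1, 4, 6) reports a failure.
    Without the switch, only a non-zero code is treated as a failure.
    """
    if detailed_exitcodes:
        return returncode not in (0, 2)
    return returncode != 0
```

The practical consequence is that exit code 2 must not be treated as an error, otherwise every run that changes something would abort the deployment.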
2017-10-16 | Fix ConfigDebug for puppet host runs | Michele Baldessari | 2 | -1/+11
Before pike we used to be able to add -e environments/config-debug.yaml and that would give us debug logs for puppet. With the move to ansible running puppet we lost this feature. Let's make sure that the old ConfigDebug variable still works with the ansible playbook-based deploy steps.
With this patch and ConfigDebug set to true, we correctly get the puppet debug logs:
    TASK [debug] *******************************************************************
    ok: [localhost] => {
        "(outputs.stderr|default('')).split('\n')|union(outputs.stdout_lines|default([]))": [
            "Warning: Undefined variable 'deploy_config_name'; ",
            " (file & line not available)",
            "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
            " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')",
            "Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
            "Debug: Loading external facts from /etc/puppet/modules/openstacklib/facts.d",
            "Debug: Loading external facts from /var/lib/puppet/facts.d",
            ....
Change-Id: Ia726fb8ca4a6f7bbbd7a1284d76ff42df6825d01
Closes-Bug: #1722752
(cherry picked from commit ecc6ce340aea59faaee4c2a49cd6d6fb90d8ed35)
2017-10-04 | Disable role host_prep_tasks on controlplane upgrade | marios | 1 | -0/+7
During the controlplane upgrade the host_prep_tasks are being executed on the disable_upgrade_deployment roles too. This sets the role-specific host_prep_tasks to an empty list for those roles during an upgrade, as executing them during the controlplane upgrade (during -e major-upgrade-composable-steps-docker.yaml) causes problems.
They will be executed as part of the non-controller upgrade, as they are written to the stack outputs to be used as ansible playbooks (see bug 1708115 for more info on this).
Change-Id: I42c963440b9b1e8222097c3d4e83ffcbe820886c
Closes-Bug: 1719604
(cherry picked from commit 684267a7a4fbff489f6324020289afbdcaaca8f5)
2017-09-25 | Merge "Rename service_workflow_tasks into workflow_tasks" into stable/pike | Jenkins | 2 | -8/+8
2017-09-25 | Merge "Remove deploy_steps_tasks.yaml from upgrade_steps_playbook" into stable/pike | Jenkins | 1 | -5/+0
2017-09-21 | Remove deploy_steps_tasks.yaml from upgrade_steps_playbook | Marius Cornea | 1 | -5/+0
After landing https://review.openstack.org/#/c/503484/ we run the puppet host configuration steps twice. This change removes the deploy_steps_tasks.yaml playbook in order to run the puppet steps only once.
Closes-bug: 1717244
Change-Id: I09461094618124915841c8390c8bce8daf64d029
(cherry picked from commit e471c67aab6a8f91011aa2330b3cf80f4427f443)
2017-09-20 | Adds post_upgrade_tasks for any service post-upgrade ansible tasks | marios | 2 | -0/+48
This adds a new config/deployment per role that will come after any post deploy steps. It drives the same ansible config as the upgrade_tasks but instead collects the post_upgrade_tasks for any service in the given role.
The workflow is upgrade_tasks, then post deploy steps (either puppet/ or docker/ depending on the env), and then the post_upgrade_tasks added here.
This is added to the pacemaker/cinder-volume.yaml service for now; see the bug below for more info.
Change-Id: Iced34fecf02ebddc91df9302de54d2f4c2cab680
Closes-Bug: 1706951
(cherry picked from commit 2e182bffeeb099cb5e0b1747086fb0e0f57b7b5d)
2017-09-14 | Rename service_workflow_tasks into workflow_tasks | Giulio Fidente | 2 | -8/+8
Using the service_ prefix seems incoherent with its use in service_config_settings (vs config_settings).
Change-Id: Ia39f181415bee0071409dabddfa0c5c312915e1f
(cherry picked from commit 09137304b98a02ed024c0288da907cfe35ca5fe1)
2017-09-13 | Add RoleConfig output to major_upgrade_steps.j2.yaml | Steven Hardy | 3 | -16/+29
I96ec09bc788836584c4b39dcce5bf9b80e914c71 added this output to the deploy-steps.j2, but missed adding this to the major upgrade template, which means the overcloud RoleConfig output is broken after the upgrade (until the converge update switches back to the deploy-steps.j2 derived template).
Closes-Bug: #1716404
Change-Id: I331fa18b456ca2d6c124316d513374e3fe5a5007
(cherry picked from commit 27018b4182d77abf612697cfe54a4fc3ceeb6be5)
2017-09-05 | Set mode for ansible written files | Steven Hardy | 2 | -8/+8
Use a more restrictive mode for these files, as some may contain sensitive data which shouldn't be world readable.
Closes-Bug: #1714986
Change-Id: Ib1e79b1d4e25d6e329938402b1ca776bdab81bdd
(cherry picked from commit 94c7752cfae64d96124a32bc36ccd6ec7b4df4a7)
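The idea of writing files that are not world readable can be illustrated outside ansible with a short sketch. The helper name `write_private` and the 0o600 mode are assumptions made for illustration; the commit message does not state which mode the templates actually use.

```python
import os


def write_private(path, data, mode=0o600):
    """Write text to path so that only the owner can read or write it.

    The 0o600 default is an assumed example of a "more restrictive mode".
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "w") as handle:
        # os.open only applies the mode to newly created files (and the
        # umask may alter it), so set it explicitly before writing any data.
        os.chmod(path, mode)
        handle.write(data)
```

Setting the mode before the data is written avoids a window in which sensitive content sits in a file with looser permissions.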
2017-09-02 | Use list_concat in place of yaql | Thomas Herve | 1 | -47/+45
Where applicable, use list_concat instead of yaql to build new lists: it should be more resilient to errors, easier to debug, and less expensive.
Change-Id: I6d3dbc7ee8eac50f46023a35af4ec7f2d378fd87
Related-Bug: #1714005
(cherry picked from commit 8008089de24437757d3ba10299bb1041b4aa627a)
2017-08-31 | Remove puppet run and workarounds from tripleo_upgrade_node.sh | marios | 1 | -27/+0
For bug 1708115 and the O..P upgrade, and for the upgrade of 'non-controllers', we are now generating ansible playbooks from collected service upgrade_tasks and these are executed instead of the legacy tripleo_upgrade_node.sh. To clarify, 'non-controllers' means any node for which the corresponding roles_data.yaml role has the disable_upgrade_deployment flag set to True.
As a first pass, I am removing the workarounds from the script but keeping its delivery mechanism for now in case it is still needed. We can either update here to remove it or keep it until next cycle.
The most important part for now is that we no longer 'manually' run puppet here. Instead the post_deploy_steps are also collected into a playbook and will be executed after the upgrade_tasks (see the bug for discussion of the mechanism and related reviews).
Change-Id: Ib017b0ab435ca9558cf8659d434489cdf01df955
Related-Bug: 1708115
(cherry picked from commit 4c5b9c5c967105536106fa4a7e1ec2352b14b08c)
2017-08-29 | Add DockerPuppetProcessCount defaults to 3 | Dan Prince | 2 | -0/+7
docker-puppet.py is very aggressive about running concurrently. It uses python multiprocessing to run multiple config generating containers at once. This seems to work well in general, but in some cases, perhaps when the registry is slow or under heavy load, it can cause timeouts to occur.
Lately I'm seeing several 'container did not start before the specified timeout' errors that always seem to occur when config files are generated (when docker-puppet.py is initially executed).
A couple of things:
- When config files are generated, this is the first time most of the containers are pulled to each host machine during deployment.
- docker-puppet.py runs many of these processes at once. Some of them run faster, others not.
- The docker daemon's pull limit defaults to 3. This would throttle the above a bit, perhaps contributing to the likelihood of a timeout.
One solution that seems to work for me is to set the PROCESS_COUNT in docker-puppet.py to 3. As this matches the docker daemon's default, it is probably safer at the cost of being slightly slower in some cases.
Change-Id: I17feb3abd9d36fe7c95865a064502ce9902a074e
Closes-bug: #1713188
(cherry picked from commit 949d367ddeb42eff913cdbed733ccf6239b4864b)
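Capping concurrency at a fixed worker count can be sketched as follows. This is not the docker-puppet.py code: the commit says that script uses python multiprocessing, while this sketch uses a thread pool as a simpler stand-in, and both the `run_bounded` helper and the way PROCESS_COUNT is read from the environment are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_bounded(jobs, worker, default_count=3):
    """Run worker(job) for every job with a bounded number running at once.

    The bound is read from a PROCESS_COUNT environment variable
    (hypothetical here) and falls back to 3, the default from the commit.
    Results are returned in the same order as the jobs.
    """
    count = int(os.environ.get("PROCESS_COUNT", default_count))
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(worker, jobs))
```

Matching the pool size to the docker daemon's pull limit means no worker is left waiting on a pull slot while its start timeout is already running, which is the failure mode the commit describes.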
2017-08-23 | Specify the start count to 0 for the update step loop | Mathieu Bultel | 1 | -5/+5
Force the count start to 0 to ensure that the update step loop starts at 0 and executes update step0.
Closes-Bug: #1712498
Change-Id: I71be55c1f56e53e5c565bec281795d63e5845ff6
2017-08-15 | Also write an upgrade_tasks_playbook | marios | 1 | -0/+17
To get this to work, upgrade_tasks need to be rewritten with 'when' statements like the update tasks (in parent review from shardy). So that we don't break the existing upgrades workflow, we add these as part of the config download; see the Depends-On below.
Related-Bug: 1708115
Depends-On: Ief593dc758a2ffe33c1cbcbda9289393fcf023e4
Change-Id: Ib01b96a2c26721747d81d98e3d57c4c388663004
2017-08-12 | Add environment to disable deploy steps | Steven Hardy | 1 | -1/+1
This enables either deploying without configuring any services, or temporarily disabling the deploy steps, such as will be required for minor updates where we want to re-run the rolling update outside of heat.
To deploy directly via ansible-playbook you can do e.g.:
    openstack overcloud config download --config-dir tmpconfig
    cd tmpconfig/tripleo-6b02U7-config
    ansible-playbook -vvv -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
This will run the same ansible steps as we normally run via heat.
Change-Id: I59947b67523dfcc43d454d4ac7d82b06804cf71d
2017-08-12 | Add support for update_tasks | Steven Hardy | 2 | -0/+29
These work the same way as upgrade_tasks *but* they use a step variable instead of tags, so we can iterate over a count/sequence, which isn't possible via a wrapper playbook with tags (we may want to align upgrade tasks with the same approach if this works out well).
Note the tasks can be run via ansible-playbook on the undercloud, like:
    openstack overcloud config download --config-dir tmpconfig
    cd tmpconfig/tripleo-HCrDA6-config
    ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory update_steps_playbook.yaml --limit controller
The above will do a rolling update for the Controller role (note the inconsistent capitalization, we probably need to fix the group naming in tripleo-ansible-inventory) because we specify serial: 1 in the playbook.
You can also trigger an update explicitly on one node like this, which is useful for debugging:
    ansible-playbook -vvv -b -i /usr/bin/tripleo-ansible-inventory update_steps_playbook.yaml --limit overcloud-controller-0
Change-Id: I20bb3e26ab9d9cadf1a31fd304de8a014a901aa9
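The difference between tags and a step variable is that a step variable lets one loop drive every task in sequence. A rough sketch of that iteration follows; the `run_update_steps` helper, the `step` key on each task, and `max_steps` are hypothetical stand-ins for the per-task step condition the commit describes, not the generated playbook itself.

```python
def run_update_steps(tasks, max_steps):
    """Yield (step, task name) pairs in the order a step loop would run them.

    Each task is a dict whose hypothetical 'step' key stands in for a
    condition on the step variable. The loop starts at 0 so that tasks
    assigned to step 0 are not skipped.
    """
    for step in range(0, max_steps):
        for task in tasks:
            if task["step"] == step:
                yield step, task["name"]
```

For example, tasks declared in any order are still executed strictly by step, with ties broken by declaration order.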
2017-08-12 | Add RoleConfig output | Steven Hardy | 1 | -0/+18
This exposes the deploy workflow for all roles from deploy-steps via overcloud.j2.yaml, which means we can write it via the new openstack overcloud config download command and/or run the workflow outside of heat via mistral.
With https://review.openstack.org/#/c/485732/ applied to tripleoclient it becomes possible to do:
    openstack overcloud config download --config-dir tmpconfig
    cd tmpconfig/tripleo-EvEZk0-config
    ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
This runs the deploy steps via ansible-playbook, exactly as they are normally run via heat, for all overcloud nodes (--limit can be used to restrict to specific nodes/roles).
Change-Id: I96ec09bc788836584c4b39dcce5bf9b80e914c71
2017-08-12 | Default docker_puppet_debug to false | Steven Hardy | 1 | -1/+1
This isn't set unless the playbook is run via heat, so default it to false to enable easier use via ansible-playbook combined with tripleo-ansible-inventory.
Change-Id: I9705e4533831a019dd0051e5522d4b7958682506
2017-08-12 | Move deploy-steps-playbook to deploy-steps-tasks | Steven Hardy | 2 | -4/+12
So that we can more easily iterate over an include in an output.
Change-Id: Idd5bb47589e5c37123caafcded1afbff8881aa33
2017-08-11 | Consolidate puppet/docker deployments with one deploy steps workflow | Steven Hardy | 5 | -0/+612
If we consolidate these we can focus on one implementation (the new ansible based one used for docker-steps).
Change-Id: Iec0ad2278d62040bf03613fc9556b199c6a80546
Depends-On: Ifa2afa915e0fee368fb2506c02de75bf5efe82d5
2017-08-02 | Make RoleParameters and key_name descriptions consistent | Ben Nemec | 1 | -1/+1
The key_name default is ignored because the parameter is used in some mutually exclusive environments where the default doesn't need to be the same.
Change-Id: I77c1a1159fae38d03b0e59b80ae6bee491d734d7
Partial-Bug: 1700664
2017-07-24 | Move docker_puppet_tasks calculation into services.yaml | Steven Hardy | 1 | -2/+2
This makes the RolesData output more accurate, and we can rework things so docker-puppet only gets run when there is a non-empty file calculated (e.g. there are tasks to run).
Change-Id: I8cdab3c857977c80fe2e359ab9e05740a838d66b
2017-07-24 | Move services.yaml output calculation into Value resources | Steven Hardy | 1 | -41/+126
This stores the result of the yaql queries etc. for easier debugging, and also so there's no risk we constantly re-evaluate the expensive query, which can happen with some heat versions and configurations.
This also gives a nicer error when things go wrong, as when a query fails you know which resource had an error, and also the validation on resources is currently stricter due to bug #1599114.
We also get some additional type validation from each OS::Heat::Value resource, e.g. it checks if the calculated value is a valid map or list.
The final advantage (and the original motivation for doing this) is that we can easily filter null values for any outputs where this isn't already done, which makes the config data written via openstack overcloud config download cleaner.
Change-Id: Ia6697cf2e47f3f7b727d620536e0873a985c98c4
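The null filtering mentioned as the original motivation can be pictured with a small sketch. In the templates the filtering is done inside heat; the `drop_nulls` helper below is a hypothetical stand-in, and applying it recursively to nested maps and lists is an assumption rather than something the commit states.

```python
def drop_nulls(value):
    """Recursively remove None entries from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value if v is not None]
    # Scalars (strings, numbers, booleans) are returned unchanged.
    return value
```

Filtering at the point where the value is calculated means every consumer of the output, including the downloaded config, sees the same cleaned data.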
2017-07-21 | Move step_config/docker_config calculation into services.yaml | Steven Hardy | 1 | -3/+29
Moving these means we get a more accurate output from the overcloud RoleData output, which more closely reflects what is actually deployed.
Change-Id: I154f36c1597cf4abe29ca0bfe15a54f507433fb1
2017-07-14 | Adds network/cidr mapping into a new service property | Giulio Fidente | 1 | -0/+5
Makes it possible to resolve network subnets within a service template; the data is transported into a new property, ServiceData, wired into every service, which hopefully is generic enough to be extended in the future and transport more data.
Data can be consumed in service templates to set config values which need to know the subnet where a daemon operates (for example the Ceph Public vs Cluster network).
Change-Id: I28e21c46f1ef609517175f7e7ee19e28d1c0cba2
2017-07-13 | Move services.yaml to common directory | Steven Hardy | 2 | -1/+148
This new directory has now been added to the RDO packaging so we can move things common to both puppet/container architecture here, starting with the recently combined services.yaml.
Change-Id: If2ce27188c4c15002b3ad830e8d6eb9504d2f3d2
2017-06-09 | Remove duplicate docker/puppet services.yaml | Steven Hardy | 1 | -0/+1
Moving to one common services.yaml not only reduces the duplication, but should also improve performance for the docker/services.yaml case, because we were creating two ResourceChains with $many services, which we know can be really slow (especially since we seem to be missing concurrent: true on one).
Change-Id: I76f188438bfc6449b152c2861d99738e6eb3c61b