Age | Commit message (Collapse) | Author | Files | Lines |
|
puppet run on never fails, even when it should, since we moved
to the ansible way of applying it. The reason is the current following code:
- name: Run puppet host configuration for step {{step}}
command: >-
puppet apply
--modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
--logdest syslog --logdest console --color=false
/var/lib/tripleo-config/puppet_step_config.pp
The above is missing the --detailed-exitcodes switch and so puppet will never
really error out on us and the deployment will keep on running all the
steps even though a previous puppet manifest might have failed. This
cause extra hard-to-debug failures.
Initially the issue was observed on the puppet host runs, but this
parameter is missing also from docker-puppet.py, so let's add it there
as well as it makes sense to return proper error codes whenever we call
puppet.
Besides this being a good idea in general, we actually *have* to do it
because puppet does not fail correctly without this option due to the
following puppet bug:
https://tickets.puppetlabs.com/browse/PUP-2754
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ie9df4f520645404560a9635fb66e3af42b966f54
Closes-Bug: #1723163
(cherry picked from commit 11e599d116cfbf7df4dcd0e7670c3405a4224c1a)
|
|
Before pike we used to be able to add -e environments/config-debug.yaml
and that would give us debug logs for puppet. With the move to ansible
running puppet we lost this feature.
Let's make sure that the old ConfigDebug variable still works with
the ansible playbook-based deploy steps. With this patch and ConfigDebug
set to true, we correctly get the puppet debug logs:
TASK [debug] *******************************************************************
ok: [localhost] => {
"(outputs.stderr|default('')).split('\n')|union(outputs.stdout_lines|default([]))": [
"Warning: Undefined variable 'deploy_config_name'; ",
" (file & line not available)",
"Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
" (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')",
"Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
"Debug: Loading external facts from /etc/puppet/modules/openstacklib/facts.d",
"Debug: Loading external facts from /var/lib/puppet/facts.d",
....
Change-Id: Ia726fb8ca4a6f7bbbd7a1284d76ff42df6825d01
Closes-Bug: #1722752
(cherry picked from commit ecc6ce340aea59faaee4c2a49cd6d6fb90d8ed35)
|
|
During the controlplane upgrade the host_prep_tasks are being
executed on the disable_upgrade_deployment roles too.
This sets the role specific host_prep_tasks to an empty list for
those roles during an upgrade, as executing them during the
controlplane upgrade (during -e
major-upgrade-composable-steps-docker.yaml) causes problems.
They will be executed as part of the non controller upgrade as they
are written to the stack outputs to be used as ansible playbooks
(see bug 1708115 for more info on this)
Change-Id: I42c963440b9b1e8222097c3d4e83ffcbe820886c
Closes-Bug: 1719604
(cherry picked from commit 684267a7a4fbff489f6324020289afbdcaaca8f5)
|
|
|
|
stable/pike
|
|
After landing https://review.openstack.org/#/c/503484/ we run the
puppet host configuration steps twice. This change removes the
deploy_steps_tasks.yaml playbook in order to run the puppet steps
only once.
Closes-bug: 1717244
Change-Id: I09461094618124915841c8390c8bce8daf64d029
(cherry picked from commit e471c67aab6a8f91011aa2330b3cf80f4427f443)
|
|
This adds a new config/deployment per role that will come after any
post deploy steps. It drives the same ansible config as the
upgrade_tasks but instead collects the post_upgrade_tasks for any
service in the given role.
The workflow is upgrade_tasks, then post deploy steps (either
puppet/ or docker/ depending on the env) and then the
post_upgrade_tasks added here.
This is added to the pacemaker/cinder-volume.yaml service for now
see the bug below for more info
Change-Id: Iced34fecf02ebddc91df9302de54d2f4c2cab680
Closes-Bug: 1706951
(cherry picked from commit 2e182bffeeb099cb5e0b1747086fb0e0f57b7b5d)
|
|
Using the service_ prefix seems incoherent with its use in
service_config_settings (vs config_settings).
Change-Id: Ia39f181415bee0071409dabddfa0c5c312915e1f
(cherry picked from commit 09137304b98a02ed024c0288da907cfe35ca5fe1)
|
|
I96ec09bc788836584c4b39dcce5bf9b80e914c71 added this output to the
deploy-steps.j2, but missed adding this to the major upgrade template
which means the overcloud RoleConfig output is broken after the upgrade
(until the converge update switches back to the deploy-steps.j2 derived
template)
Closes-Bug: #1716404
Change-Id: I331fa18b456ca2d6c124316d513374e3fe5a5007
(cherry picked from commit 27018b4182d77abf612697cfe54a4fc3ceeb6be5)
|
|
Use a more restrictive mode for these files, as some may contain sensitive data
which shouldn't be world readable
Closes-Bug: #1714986
Change-Id: Ib1e79b1d4e25d6e329938402b1ca776bdab81bdd
(cherry picked from commit 94c7752cfae64d96124a32bc36ccd6ec7b4df4a7)
|
|
Where applicable, use list_concat instead of yaql to build new lists: it
should be more resilient to errors, easier to debug, and less expensive.
Change-Id: I6d3dbc7ee8eac50f46023a35af4ec7f2d378fd87
Related-Bug: #1714005
(cherry picked from commit 8008089de24437757d3ba10299bb1041b4aa627a)
|
|
For bug 1708115 and the O..P upgrade, and for the upgrade of
'non-controlers' we are now generating ansible playbooks from
collected service upgrade_tasks and these are executed instead
of the legacy tripleo_upgrade_node.sh.
To clarify, by 'non-controllers' it is meant any node for which
the corresponding roles_data.yaml role has the
disable_upgrade_deployment flag set True.
As a first pass, I am removing the workarounds from the script but
keeping its delivery mechanism for now in case it is needed still.
We can either update here to remove it or keep it until next cycle
The most important part for now is that we no longer 'manually'
run puppet here. Instead the post_deploy_steps are also collected
into a playbook and will be executed after the upgrade_tasks
(see the bug for discussion of the mechanism and related reviews)
Change-Id: Ib017b0ab435ca9558cf8659d434489cdf01df955
Related-Bug: 1708115
(cherry picked from commit 4c5b9c5c967105536106fa4a7e1ec2352b14b08c)
|
|
docker-puppet.py is very aggressive about running concurrently.
It uses python multiprocessing to run multiple config generating
containers at once. This seems to work well in general, but
in some cases... perhaps when the registry is slow or under
heavy load can cause timeouts to occur. Lately I'm seeing
several 'container did not start before the specified timeout'
errors that always seem to occur when config files are generated
(docker-puppet.py is initially executed.
A couple of things:
-when config files are generated this is the first time
most of the containers are pulled to each host machine
during deployment
-docker-puppet.py runs many of these processes at once. Some
of them run faster, other not.
-docker daemon's pull limit defaults to 3. This would throttle
the above a bit perhaps contributing the the likelyhood of a timeout.
One solution that seems to work for me is to set the PROCESS_COUNT
in docker-puppet.py to 3. As this matches docker daemon's default
it is probably safer at the cost of being slightly slower in some
cases.
Change-Id: I17feb3abd9d36fe7c95865a064502ce9902a074e
Closes-bug: #1713188
(cherry picked from commit 949d367ddeb42eff913cdbed733ccf6239b4864b)
|
|
Force the count start to 0 to ensure that the
update step loop will start to 0 and execute the
update step0
Closes-Bug: #1712498
Change-Id: I71be55c1f56e53e5c565bec281795d63e5845ff6
|
|
To get this to work upgrade_tasks need to be rewritten with 'when'
statements like the update tasks (in parent review from shardy).
So that we don't break the existing upgrades workflow, we add these
as part of the config download see the depends on
Related-Bug: 1708115
Depends-On: Ief593dc758a2ffe33c1cbcbda9289393fcf023e4
Change-Id: Ib01b96a2c26721747d81d98e3d57c4c388663004
|
|
This enables either deploying without configuring any services, or
temporarily disabling the deploy steps such as will be required
for minor updates where we want to re-run the rolling update outside
of heat.
To deploy directly via ansible-playbook you can do e.g:
openstack overcloud config download --config-dir tmpconfig
cd tmpconfig/tripleo-6b02U7-config
ansible-playbook -vvv -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
Which will run the same ansible steps as we normally run via heat.
Change-Id: I59947b67523dfcc43d454d4ac7d82b06804cf71d
|
|
These work the same way as upgrade_tasks *but* they use a step variable
instead of tags, so we can iterate over a count/sequence which isn't
possibly via a wrapper playbook with tags (we may want to align upgrade
tasks with the same approach if this works out well).
Note the tasks can be run via ansible-playbook on the undercloud, like:
openstack overcloud config download --config-dir tmpconfig
cd tmpconfig/tripleo-HCrDA6-config
ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory update_steps_playbook.yaml --limit controller
The above will do a rolling update for the Controller role (note the inconsistent
capitalization, we probably need to fix the group naming in tripleo-ansible-inventory)
because we specify serial: 1 in the playbook.
You can also trigger an update explicitly on one node like this, which is useful for debugging:
ansible-playbook -vvv -b -i /usr/bin/tripleo-ansible-inventory update_steps_playbook.yaml --limit overcloud-controller-0
Change-Id: I20bb3e26ab9d9cadf1a31fd304de8a014a901aa9
|
|
This exposes the deploy workflow for all roles from deploy-steps
via overcloud.j2.yaml - which means we can write it via the new
openstack overcloud config download command and/or run the workflow
outside of heat via mistral
With https://review.openstack.org/#/c/485732/ applied to
tripleoclient it becomes possible to do:
openstack overcloud config download --config-dir tmpconfig
cd tmpconfig/tripleo-EvEZk0-config
ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
This runs the deploy steps, exactly the same as normally run via heat
via ansible-playbook for all overcloud nodes (--limit can be used to restrict
to specific nodes/roles).
Change-Id: I96ec09bc788836584c4b39dcce5bf9b80e914c71
|
|
This isn't set unless the playbook is run via heat, so default it to false
to enable easier use via ansible-playbook combined with tripleo-ansible-inventory
Change-Id: I9705e4533831a019dd0051e5522d4b7958682506
|
|
So that we can more easily iterate over an include in an output
Change-Id: Idd5bb47589e5c37123caafcded1afbff8881aa33
|
|
If we consolidate these we can focus on one implementation (the new ansible
based one used for docker-steps)
Change-Id: Iec0ad2278d62040bf03613fc9556b199c6a80546
Depends-On: Ifa2afa915e0fee368fb2506c02de75bf5efe82d5
|
|
The key_name default is ignored because the parameter is used in
some mutually exclusive environments where the default doesn't
need to be the same.
Change-Id: I77c1a1159fae38d03b0e59b80ae6bee491d734d7
Partial-Bug: 1700664
|
|
This makes the RolesData output more accurate, and we can rework
things so docker-puppet only gets run when there is a non-empty
file calculated (e.g there are tasks to run).
Change-Id: I8cdab3c857977c80fe2e359ab9e05740a838d66b
|
|
This stores the result of the yaql queries etc for easier debugging, and
also so there's no risk we constantly re-evaluate the expensive query
which can happen with some heat versions and configurations.
This also gives a nicer error when things go wrong as when a query fails
you know which resource had an error, and also the validation on resources
is currently stricter due to bug #1599114. We also get some additional
type validation from each OS::Heat::Value resource, e.g it checks if the
calculated value is a valid map or list.
The final advantage (and the original motivation for doing this) is that
we can easily filter null values for any outputs where this isn't already
done, which makes the config data written via openstack overcloud config
download cleaner.
Change-Id: Ia6697cf2e47f3f7b727d620536e0873a985c98c4
|
|
Moving these means we get a more accurate output from the overcloud
RoleData output, which more closely reflects what is actually
deployed.
Change-Id: I154f36c1597cf4abe29ca0bfe15a54f507433fb1
|
|
Makes it possible to resolve network subnets within a service
template; the data is transported into a new property ServiceData
wired into every service which hopefully is generic enough to
be extended in the future and transport more data.
Data can be consumed in service templates to set config values
which need to know what is the subnet where a deamon operates (for
example the Ceph Public vs Cluster network).
Change-Id: I28e21c46f1ef609517175f7e7ee19e28d1c0cba2
|
|
This new directory has now been added to the RDO packaging so we
can move things common to both puppet/container architecture here,
starting with the recently combined services.yaml
Change-Id: If2ce27188c4c15002b3ad830e8d6eb9504d2f3d2
|
|
Move to one common services.yaml not only reduces the duplication, but it
should improve performance for the docker/services.yaml case, because we were
creating two ResourceChains with $many services which we know can be really
slow (especially since we seem to be missing concurrent: true on one)
Change-Id: I76f188438bfc6449b152c2861d99738e6eb3c61b
|