apex-tripleo-heat-templates - Unnamed repository

Age	Commit message	Author	Files	Lines
2017-04-02	Add special case upgrade from openvswitch 2.5.0-14	marios	1	-0/+3
	In [1] we removed the previously used special case upgrade code. However we have since discovered that for openvswitch 2.5.0-14 the special case is still required with an extra flag to prevent the restart. This adds the upgrade code back into the minor update and 'manual upgrade' scripts for compute/swift. The review at If998704b3c4199bbae8a1d068c31a71763f5c8a2 is adding this logic for the ansible upgrade steps. Related-Bug: 1669714 [1] https://review.openstack.org/#/q/59e5f9597eb37f69045e470eb457b878728477d7 Change-Id: I3e5899e2d831b89745b2f37e61ff69dbf83ff595 (cherry picked from commit 25983882c2f7a8e8f8fb83bd967a67d008a556a4)
2017-03-30	Run cluster check on nodes configured in wsrep_cluster_address.	Yurii Prokulevych	1	-9/+13
	Attempt to check galera's cluster status fails when galera service is not running on the same node. Change-Id: I27fb0841d85cd0dc86e92ac2e21eedf5f8f863ab Closes-Bug: #1677574 (cherry picked from commit d39c952fd3150d24c9e01c15806181715d0760f8 )
2017-03-22	Don't try to run os-net-config from yum_update.sh	Lukas Bezdicka	1	-11/+0
	The UpdateDeployment already depends on NetworkDeployment. We should not run os-net-config unconditionally before update. Closes-Bug: #1666227 Change-Id: I48cbf5de00d47c6fdad71ff24c00e9db05cec5d5 (cherry picked from commit b19d6306ea582dc31ebfd609475d9ac4e641e278)
2017-03-21	Merge "Disable exit on error for pacemaker commands for update flow" into stable/ocata	Jenkins	1	-1/+4
2017-03-03	Remove the openvswitch special case upgrade code	marios	1	-3/+0
	Removed from the tripleo_upgrade_node.sh (major upgrade) & yum_update.sh (minor update). The workaround is no longer needed and in fact has the opposite effect, killing connectivity to the node. The 'normal' yum update on nodes delivers the latest openvswitch 2.6.1 with no drama. Also adds a 'complete' message, some extra debug echo for logs and removes the python-zaqarclient install, which is no longer needed. Closes-Bug: 1669714 Change-Id: Icd1517bcade36781fa0da21d045ffd9ec68efc38 (cherry picked from commit 9025a3bc23834e31efc5021acaef80b8d0f5de73)
2017-03-01	Disable exit on error for pacemaker commands for update flow	Saravanan KR	1	-1/+4
	Package update fails on compute nodes when yum_update checks the pacemaker status via the systemctl command. This happens because the exit on error (-e) option was enabled recently. Fix it by executing the command only on nodes where pacemaker is enabled. Closes-Bug: #1668266 Change-Id: I2aae4e2fdfec526c835f8967b54e1db3757bca17 (cherry picked from commit e9a2fdc0afd2a3f1242f397c5f164cf6b43c2669)
2017-02-02	Merge "Don't run yum_update.sh inside docker"	Jenkins	1	-0/+5

2017-01-19	Ignore systemctl return code in yum_update.sh	Lukas Bezdicka	1	-1/+1
	We only need to know if pacemaker service is in active state. Change-Id: Id5e16f2bbbe51b8a0c250eb5d35e89e61a7b3383 Resolves: rhbz#1414779 Closes-Bug: #1656980
2016-12-15	Don't run yum_update.sh inside docker	Steve Baker	1	-0/+5
	For now, don't run anything in yum_update.sh when it is run from inside the heat-agents container. A mechanism for doing a yum update on the host can be worked out later, but for now a yum update should never be run inside a container. Change-Id: I73d37578f8b2dc9c3029b968b1ef74ef4894100a
2016-12-14	Make the openvswitch 2.4->2.5 upgrade more robust	marios	1	-12/+1
	In I9b1f0eaa0d36a28e20b507bec6a4e9b3af1781ae and I11fcf688982ceda5eef7afc8904afae44300c2d9 we added a manual step for upgrading openvswitch in order to specify the --nopostun as discussed in the bug below. This change adds a minor update to make this workaround more robust. It removes any existing rpms that may be around from an earlier run, and also checks that the rpms installed are at least newer than the version we are on. This also refactors the code into a common definition in the pacemaker_common_functions.sh which is included even for the heredocs generating upgrade scripts during init. Thanks Sofer Athlan-Guyot and Jirka Stransky for help with that. Change-Id: Idc863de7b5a8c116c990ee8c1472cfe377836d37 Related-Bug: 1635205
2016-11-23	Run os-net-config before restarting cluster on update	Brent Eagles	1	-0/+11
	Running os-net-config before restarting the cluster prevents changes to the interface files caused by changes to implementation from bouncing network interfaces after the cluster has restarted. Closes-Bug: #1644138 Change-Id: I65fb104465ff3d37ddc791634302994334136014
2016-11-22	Fix ovs 2.4 to 2.5 upgrade - minor update non controllers	marios	1	-14/+13
	In I9b1f0eaa0d36a28e20b507bec6a4e9b3af1781ae and I11fcf688982ceda5eef7afc8904afae44300c2d9 we landed a workaround for the openvswitch 2.4 to 2.5 upgrade discussed in the bug below. Unfortunately testing has revealed a problem with the minor update case specifically for non controllers. It seems we would exit before the ovs workaround has had a chance to execute. This moves the block up a few lines to avoid this condition. As with the other two reviews noted here, this will need to go into newton and then mitaka too. Change-Id: If905de82d96302334ebe02de9c43f00faed9b72b Related-Bug: 1635205
2016-11-01	Update openstack-puppet-modules dependencies	Lukas Bezdicka	1	-1/+2
	OPM package is metadata package with unversioned requirements which means that update does not update the dependencies. This leaves us with old puppet modules and old puppet during the puppet run. Change-Id: I80f8a73142a09bb4178bb5a396d256ba81ba98a8 Closes-Bug: #1638266 Resolves: rhbz#1390559
2016-10-27	Add replacepkgs to the manual ovs upgrade workaround and fix a typo	Mathieu Bultel	1	-3/+2
	rpm command will return an exit 1 if ovs package is already there and will exit the step_1.sh script. To get around this force the update with --replacepkgs Also remove the \ just before the $ which cause a syntax error for the ceph storage Change-Id: I11fcf688982ceda5eef7afc8904afae44300c2d9 Closes-bug: 1636748
2016-10-22	Fix the rabbitmq/redis pacemaker resource timeouts on updates	Michele Baldessari	1	-0/+19
	With the following two changes we increased the timeout for redis and rabbit for both starting and stopping to 200s: https://review.openstack.org/386618 newton (merged) https://review.openstack.org/385555 master (merged) We want to also fix that on minor updates on all our supported releases upstream and downstream (newton, mitaka, liberty, kilo). This way we can guarantee that we have a uniform timeout for start and stop for rabbit and redis across all our releases. Change-Id: If59bf3386832ee78d3a654f01077aff2e8be76e8 Closes-Bug: #1634851
2016-10-20	Add special case handling for OVS upgrade in updates and upgrades	marios	1	-0/+15
	This adds special case handling for the openvswitch package as discussed at the related bug below. This is added/handled here for both the minor update and the major mitaka...newton upgrade. Change-Id: I9b1f0eaa0d36a28e20b507bec6a4e9b3af1781ae Closes-Bug: 1635205
2016-04-26	Merge "Update .sh references from openstack-keystone to openstack-core"	Jenkins	1	-103/+0

2016-04-20	Merge "Increase galera sync timeout in yum_update.sh"	Jenkins	1	-1/+1

2016-04-11	Update .sh references from openstack-keystone to openstack-core	Giulio Fidente	1	-103/+0
	The update and upgrade shell scripts were still referencing the old openstack-keystone service which got removed with Ie26908ac9bfc0b84b6b65ae3bda711236b03d9d4 Also removes kilo and liberty specific workarounds and config changes. Change-Id: Icc80904908ee3558930d4639a21812f14b2fd12e
2016-03-20	Deploy Aodh services, replacing Ceilometer Alarm	Pradeep Kilambi	1	-2/+6
	Ceilometer Alarm is deprecated in Liberty by Aodh. This patch: * manage Aodh Keystone resources * deploy Aodh API under WSGI, Notifier, Listener and Evaluator * manage new parameters to customize Aodh deployment * uses ceilometer DB for the upgrade path * pacemaker config * Add migration logic to remove pcs resources Depends-On: I5333faa72e52d2aa2a622ac2d4b60825aadc52b5 Depends-On: Ib6c9c4c35da3fb55e0ca8e2d5a58ebaf4204d792 Co-Authored-By: Emilien Macchi <emilien@redhat.com> Change-Id: Ib47a22884afb032ebc1655e1a4a06bfe70249134
2016-03-04	Revert "Deploy Aodh services, replacing Ceilometer Alarm"	James Slagle	1	-7/+2
	This is just a revert to see if reverting this gets back to a normal CI run time. This reverts commit f72aed85594f223b6f888e6d0af3c880ea581a66. Change-Id: I04a0893f6cf69f547a4db26261005e580e1fc90b
2016-03-03	Deploy Aodh services, replacing Ceilometer Alarm	Emilien Macchi	1	-2/+7
	Ceilometer Alarm is deprecated in Liberty by Aodh. This patch: * manage Aodh Keystone resources * deploy Aodh API under WSGI, Notifier, Listener and Evaluator * manage new parameters to customize Aodh deployment * uses ceilometer DB for the upgrade path * pacemaker config Depends-On: I9e34485285829884d9c954b804e3bdd5d6e31635 Depends-On: I891985da9248a88c6ce2df1dd186881f582605ee Depends-On: Ied8ba5985f43a5c5b3be5b35a091aef6ed86572f Co-Authored-By: Pradeep Kilambi <pkilambi@redhat.com> Change-Id: I58d419173e80d2462accf7324c987c71420fd5f6
2016-02-23	Add meta notify=true to rabbitmq resource	Michele Baldessari	1	-0/+3
	See RHBZ 1311005 and 1247303. In short: sometimes when a controller node gets fenced, rabbitmq is unable to rejoin the cluster. To fix this we need two steps: 1) The fix for the RA in BZ 1247303 2) Add notify=true to the meta parameters of the rabbitmq resource on fresh installs and updates Note that if this change is applied on systems that do not have the fix for the rabbitmq resource agent, no action is taken. So when the resource agent will be updated, the notify operation will start to work as soon as the first monitor action will take place. Fixes RH Bug #1311005 Change-Id: I513daf6d45e1a13d43d3c404cfd6e49d64e51d5a
2016-02-08	Pass -q option to yum	Zane Bitter	1	-2/+2
	The maximum payload size of the return signal from a Heat software deployment is 1MB, and the output of yum starts breaking this limit at ~1000 packages to update - which is not an atypical number. To prevent this, pass the -q (quiet) option to reduce the amount of output to a manageable level. Change-Id: I517271e8465885421a78b73c5af756816c37a977 Resolves-rhbz: #1304878 Closes-Bug: #1543034
2016-01-26	Increase galera sync timeout in yum_update.sh	Jiri Stransky	1	-1/+1
	We've seen the 360 second threshold broken and a failed update because of that, even though Galera eventually synced fine, clusterchecks OK and pcs status clean. This will give Galera more time to perform the sync. Change-Id: I17207ec9b4038fb9540582c9b0b717f9b85a78b9 Closes-Bug: #1538218
2016-01-17	Let Puppet update all packages on non-controllers	James Slagle	1	-8/+5
	With I02f7cf07792765359f19fdf357024d9e48690e42[1] in puppet-tripleo, puppet is capable of updating all packages itself on non controller nodes now. This is a safer mechanism than using the exclude logic in yum_update.sh since that can cause dependency problems across sub packages. [1] https://review.openstack.org/#/c/261041/ Closes-Bug: 1534785 Change-Id: I9075a1bb85baa65a9d0afc5d0fd31a1f99a98819
2016-01-05	Bump the pacemaker service op_params to 200s for start and stop	marios	1	-2/+2
	Based on observed timeouts during updates bump the stop and start timeouts for pacemaker service resources (via op_params) to 200. This is based on the reasoning that the full timeout may be as long as two elapsed timeout intervals. After an initial timeout, the sigterm that follows is then allowed another DefaultTimeoutStopSec seconds. The 200s is produced by allowing this 2xDefaultTimeoutStopSec (@90s for systemd) and some scheduling delta. Many thanks to Michele Baldessari. Closes-Bug: 1531204 Change-Id: If6b43982c958f63bc78ad997400bf1279c23df7e
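The arithmetic behind the 200s figure can be sketched as below. This is an illustration only, not code from the change: the function name is hypothetical, and the 20s scheduling margin is an assumption chosen so that the numbers match the value stated in the commit message.

```python
# Illustrative only: the function name and the 20 s margin are assumptions.
DEFAULT_TIMEOUT_STOP_SEC = 90  # systemd default cited in the commit message


def pacemaker_op_timeout(stop_sec=DEFAULT_TIMEOUT_STOP_SEC, margin=20):
    """Two full stop intervals (the initial timeout, then the interval
    granted after SIGTERM) plus a scheduling margin."""
    return 2 * stop_sec + margin


print(pacemaker_op_timeout())  # 2 * 90 + 20 = 200
```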
2016-01-04	Wait for cluster to settle in yum_update.sh	Jiri Stransky	1	-1/+11
	Occasionally we hit "Error: unable to push cib" during update. This is probably due to the fact that when we try to replace cib in yum_update.sh, services on the previous updated controller are still coming up and changing cib, and racing/conflicting with the cib push from yum_update.sh. This commit adds waiting for the cluster to settle before exiting from yum_update.sh, to avoid this kind of conflict. Also a check for cib-push success is added, to make the update fail properly instead of hanging indefinitely as we've observed with this issue. Change-Id: I953087e0e565474ac553fd57bea2459d2e3a6081 Closes-Bug: #1527644
2015-12-03	Merge "Add pcmk constraints against haproxy-clone only if applicable"	Jenkins	1	-32/+34

2015-12-03	Merge "Apply mongod timeout via cib-push"	Jenkins	1	-1/+1

2015-12-02	Add pcmk constraints against haproxy-clone only if applicable	Giulio Fidente	1	-32/+34
	When the Overcloud does not host an instance of haproxy, pcmk will not have any resource named haproxy-clone so we should not add any constraint relying on it. Change-Id: I801f07b7570f3805aa71c22998fec6b6f192b350
2015-11-25	Merge "Update: clean keepalived and radvd instances after pcs cluster stop"	Jenkins	1	-0/+7

2015-11-25	Apply mongod timeout via cib-push	Giulio Fidente	1	-1/+1
	We forgot to apply the mongod timeout in the cib dump first, to apply it later in a single cib-push step. Change-Id: Ib104e51782c6d3f646907cdb06c74fd4cbf9028c
2015-11-24	Update: clean keepalived and radvd instances after pcs cluster stop	Jiri Stransky	1	-0/+7
	Older neutron versions have a bug which makes them leave keepalived and radvd running even after all neutron services are stopped, preventing neutron router failover from happening. Router can then get stuck on the inactive node, like this:
	[stack@instack ~]$ neutron l3-agent-list-hosting-router default_router
	+--------------------------------------+------------------------------------+----------------+-------+----------+
	| id                                   | host                               | admin_state_up | alive | ha_state |
	+--------------------------------------+------------------------------------+----------------+-------+----------+
	| 48ca9477-b93b-4305-9e6d-9f1c5d3388f0 | overcloud-controller-1.localdomain | True           | :-)   | standby  |
	| eba0575c-654f-4da6-b1cd-f7fdf1cd3726 | overcloud-controller-2.localdomain | True           | :-)   | standby  |
	| 68815390-251f-4425-a5f8-38bdbf3bdb90 | overcloud-controller-0.localdomain | True           | xxx   | active   |
	+--------------------------------------+------------------------------------+----------------+-------+----------+
	We need to kill the leftover processes manually to prevent the state described above from happening. See https://review.gerrithub.io/#/c/248931 Change-Id: I2deaa176222983daa0c33ab52a6aa5dbe7365302
2015-11-23	Fixup neutron constraints in older overclouds before updating	marios	1	-0/+10
	The neutron pcs constraints were reworked in https://review.openstack.org/#/c/229466/ For overclouds deployed with older tripleo-heat-templates the current pcs ordering constraints will not have those changes, meaning that the behaviour discussed at https://bugs.launchpad.net/tripleo/+bug/1501378 is likely given we will stop and restart all services. This review applies those, in short, remove the ovs-cleanup after neutron-server and add openvswitch-agent instead. Detail in the bug report and linked BZ. Change-Id: I45822c5fe9029f11635400b7fbd386880ac80a4e Related-Bug: 1501378
2015-11-19	Add constraints and timeouts from file in single step	Giulio Fidente	1	-78/+50
	To avoid pcmk reconfiguring the resources on each config change, we want to apply the constraints and timeouts from file. We also do not want to alter the timeouts for a few ocf resources which are rabbitmq, neutron-netns-cleanup and neutron-ovs-cleanup Change-Id: I6875f19e1f34f0fdcf0928421f49b61d857ca7c8 Co-Authored-By: Andrew Beekhof <abeekhof@redhat.com>
2015-11-17	Verify galera is sync'd in yum_update.sh	James Slagle	1	-0/+12
	When the cluster is brought back online after a yum update in yum_update.sh, we should verify that galera is fully sync'd before moving on. This ensures the sync is complete before moving on to update any other nodes in the cluster. Change-Id: Ie8fc2c5d5214deacea94ca658ac75359b318ced1
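A sync check of this kind is usually a poll with a deadline. The following is a minimal sketch of that pattern, not the logic in yum_update.sh: `wait_for_sync`, `is_synced`, and the interval are hypothetical names and values, and the actual Galera check is left to the caller.

```python
import time


def wait_for_sync(is_synced, timeout, interval=2.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll is_synced() until it returns True or `timeout` seconds pass.

    is_synced is a caller-supplied callable (hypothetical here); clock and
    sleep are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_synced():
            return True
        if clock() >= deadline:
            return False  # caller should fail the update at this point
        sleep(interval)
```

Returning False rather than looping forever lets the calling script fail the update explicitly when the sync never completes.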
2015-11-13	Set start/stop pacemaker resource timeouts for updates	Jiri Stransky	1	-0/+72
	This matches change I6fc18f1ad876c5a25723710a3b20d8ec9519dcba, but we need it to set it before attempting the cluster stop - yum update - cluster start cycle, to make sure this cycle doesn't hit the low timeout limits. This can be removed once updates from deployments made prior to I6fc18f1ad876c5a25723710a3b20d8ec9519dcba are no longer supported. Change-Id: I587136d8d045d213875c657ea5a405074f80c8ad
2015-11-11	Add missing constraints in yum_update.sh	James Slagle	1	-0/+30
	Some missing pacemaker constraints were added in the following commits: https://review.openstack.org/#/c/219770/ https://review.openstack.org/#/c/219665/ https://review.openstack.org/#/c/218931/ https://review.openstack.org/#/c/218930/ Overclouds that were deployed prior to these constraints being added to tripleo-heat-templates still have the constraints missing. During an update, stopping and starting the cluster can fail without these constraints in place. As a workaround, conditionally add these constraints in yum_update.sh so that we're sure they're always present before updating. Change-Id: Id46c85dbbe5e85d362279661091b17ce1b697fe0
2015-10-01	Force stop a single node pacemaker on yum update	Steve Baker	1	-1/+7
	Currently package updates won't occur on a single node non-HA pacemaker managed Controller because stopping the node loses the quorum of 1. This change gets the count of current nodes in the cluster and if the count is 1 then specify --force when doing a pcs cluster stop. Change-Id: I0de2488e24f1ef53a935dbc90ec6de6142bb4264
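The decision described here can be sketched as a small helper that builds the stop command from the node count. The function name is hypothetical and this is not the shell code from the change; only `pcs cluster stop` and `--force` come from the commit message.

```python
def cluster_stop_command(node_count):
    """Return the pcs argument list for stopping the local cluster node."""
    cmd = ["pcs", "cluster", "stop"]
    if node_count == 1:
        # Stopping the only node loses the quorum of 1, so force the stop.
        cmd.append("--force")
    return cmd
```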
2015-10-01	Make package upgrade pacemaker-aware	Steve Baker	1	-7/+45
	This change adds alternative logic for handling package updates on a pacemaker managed node. "yum list updates" is now run and this script exits early if there are no packages to update. If the pacemaker service is not running then the previous puppet logic remains, so a package update is performed which excludes packages managed by puppet, and a flag is set to indicate that puppet should perform an ensure=>latest on all packages it manages. However if the pacemaker service is running, the following occurs: - pcs cluster stop is run for this node - a full yum update is performed - pcs cluster start is run for this node - pcs status is run until the hostname for this node appears in the Online list This means that puppet is not involved in the package update process when the node is managed by pacemaker. Change-Id: I5ad118552d053dbda280978751167d9fd9da9874
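The pacemaker branch of this flow can be sketched as follows. It is an illustration under assumptions, not the script itself: `run` and `get_online_nodes` are hypothetical callables supplied by the caller (for example a subprocess wrapper and a parser for the Online list in `pcs status` output), and the `-y` flag, retry count, and delay are assumed values.

```python
import time


def wait_until_online(hostname, get_online_nodes, tries=30, delay=2.0,
                      sleep=time.sleep):
    """Poll until hostname shows up in the cluster's Online list."""
    for _ in range(tries):
        if hostname in get_online_nodes():
            return True
        sleep(delay)
    return False


def pacemaker_aware_update(hostname, run, get_online_nodes, sleep=time.sleep):
    """Stop the local cluster node, update packages, restart, then wait."""
    run(["pcs", "cluster", "stop"])
    run(["yum", "-y", "update"])  # exact yum flags are an assumption
    run(["pcs", "cluster", "start"])
    if not wait_until_online(hostname, get_online_nodes, sleep=sleep):
        raise RuntimeError(f"{hostname} did not come back online")
```

Passing the command runner in as a parameter keeps the ordering of the steps visible and lets the sequence be checked without touching a real cluster.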
2015-10-01	Ensure present/latest for puppet driven package updates	Steve Baker	1	-0/+9
	This change updates yum_update.sh so that we set a boolean output when "managed" packages should get updated. The output is named 'update_managed_packages' and for the puppet implementation it is wired up so that it directly sets tripleo::packages::enable_upgrade to control whether packages are updated. It also modifies yum_update.sh to build a yum update excludes list for packages managed by puppet. The exclude lists are being generated via puppet-tripleo as well via the new 'write_package_names' function that is now wired into all the role manifests. This change does not actually trigger the puppet apply. The fix for Related-Bug: #1463092 will be used to trigger the puppet run when the hiera changes. As a minor tweak to this logic we append the UpdateIdentifier to the config_identifier so that we ensure puppet gets executed on an update where other (non-related) hiera changes also occur. Co-Authored-By: Dan Prince <dprince@redhat.com> Change-Id: I343c3959517eae38bbcd43648ed56f610272864d
2015-06-08	Config & deployments to update overcloud packages	Steve Baker	1	-0/+41
	This change adds config and deployment resources to trigger package updates on nodes. The deployments are triggered by doing a stack-update and setting one of the parameters to a unique value. The intent is that rolling update will be controlled by setting breakpoints on all of the UpdateDeployment resources inside the role resource groups. Change-Id: I56bbf944ecd6cbdbf116021b8a53f9f9111c134f