summaryrefslogtreecommitdiffstats
path: root/extraconfig/tasks/yum_update.sh
AgeCommit message (Collapse)AuthorFilesLines
2016-11-23Run os-net-config before restarting cluster on updateBrent Eagles1-0/+11
Running os-net-config before restarting the cluster prevents changes to the interface files caused by changes to implementation from bouncing network interfaces after the cluster has restarted. Closes-Bug: #1644138 Change-Id: I65fb104465ff3d37ddc791634302994334136014
2016-11-22Fix ovs 2.4 to 2.5 upgrade - minor update non controllersmarios1-14/+13
In I9b1f0eaa0d36a28e20b507bec6a4e9b3af1781ae and I11fcf688982ceda5eef7afc8904afae44300c2d9 we landed a workaround for the openvswitch 2.4 to 2.5 upgrade discussed in the bug below. Unfortunately testing has revealed a problem with the minor update case specifically for non controllers. It seems we would exit before the ovs workaround has had a chance to execute. This moves the block up a few lines to avoid this condition. As with the other two reviews noted here, this will need to go into newton and then mitaka too. Change-Id: If905de82d96302334ebe02de9c43f00faed9b72b Related-Bug: 1635205
2016-11-01Update openstack-puppet-modules dependenciesLukas Bezdicka1-1/+2
OPM package is metadata package with unversioned requirements which means that update does not update the dependencies. This leaves us with old puppet modules and old puppet during the puppet run. Change-Id: I80f8a73142a09bb4178bb5a396d256ba81ba98a8 Closes-Bug: #1638266 Resolves: rhbz#1390559
2016-10-27Add replacepkgs to the manual ovs upgrade workaround and fix a typoMathieu Bultel1-3/+2
rpm command will return an exit 1 if ovs package is already there and will exit the step_1.sh script. To get around this force the update with --replacepkgs Also remove the \ just before the $ which cause a syntax error for the ceph storage Change-Id: I11fcf688982ceda5eef7afc8904afae44300c2d9 Closes-bug: 1636748
2016-10-22Fix the rabbitmq/redis pacemaker resource timeouts on updatesMichele Baldessari1-0/+19
With the following two changes we increased the timeout for redis and rabbit for both starting and stopping to 200s: https://review.openstack.org/386618 newton (merged) https://review.openstack.org/385555 master (merged) We want to also fix that on minor updates on all our supported releases upstream and downstream (newton, mitaka, liberty, kilo). This way we can guarantee that we have a uniform timeout for sart and stop for rabbit and redis across all our releases. Change-Id: If59bf3386832ee78d3a654f01077aff2e8be76e8 Closes-Bug: #1634851
2016-10-20Add special case handling for OVS upgrade in updates and upgradesmarios1-0/+15
This adds a special case handling for the opensvswitch package as discussed at the related bug below. This is added/handled here for both the minor update and the major mitaka...newton upgrade. Change-Id: I9b1f0eaa0d36a28e20b507bec6a4e9b3af1781ae Closes-Bug: 1635205
2016-04-26Merge "Update .sh references from openstack-keystone to openstack-core"Jenkins1-103/+0
2016-04-20Merge "Increase galera sync timeout in yum_update.sh"Jenkins1-1/+1
2016-04-11Update .sh references from openstack-keystone to openstack-coreGiulio Fidente1-103/+0
The update and upgrade shell scripts were still referencing the old openstack-keystone service which got removed with Ie26908ac9bfc0b84b6b65ae3bda711236b03d9d4 Also removes kilo and liberty specific workarounds and config changes. Change-Id: Icc80904908ee3558930d4639a21812f14b2fd12e
2016-03-20Deploy Aodh services, replacing Ceilometer AlarmPradeep Kilambi1-2/+6
Ceilometer Alarm is deprecated in Liberty by Aodh. This patch: * manage Aodh Keystone resources * deploy Aodh API under WSGI, Notifier, Listener and Evaluator * manage new parameters to customize Aodh deployment * uses ceilometer DB for the upgrade path * pacemaker config * Add migration logic to remove pcs resources Depends-On: I5333faa72e52d2aa2a622ac2d4b60825aadc52b5 Depends-On: Ib6c9c4c35da3fb55e0ca8e2d5a58ebaf4204d792 Co-Authored-By: Emilien Macchi <emilien@redhat.com> Change-Id: Ib47a22884afb032ebc1655e1a4a06bfe70249134
2016-03-04Revert "Deploy Aodh services, replacing Ceilometer Alarm"James Slagle1-7/+2
This just a revert to see if reverting this gets back to a normal CI run time. This reverts commit f72aed85594f223b6f888e6d0af3c880ea581a66. Change-Id: I04a0893f6cf69f547a4db26261005e580e1fc90b
2016-03-03Deploy Aodh services, replacing Ceilometer AlarmEmilien Macchi1-2/+7
Ceilometer Alarm is deprecated in Liberty by Aodh. This patch: * manage Aodh Keystone resources * deploy Aodh API under WSGI, Notifier, Listener and Evaluator * manage new parameters to customize Aodh deployment * uses ceilometer DB for the upgrade path * pacemaker config Depends-On: I9e34485285829884d9c954b804e3bdd5d6e31635 Depends-On: I891985da9248a88c6ce2df1dd186881f582605ee Depends-On: Ied8ba5985f43a5c5b3be5b35a091aef6ed86572f Co-Authored-By: Pradeep Kilambi <pkilambi@redhat.com> Change-Id: I58d419173e80d2462accf7324c987c71420fd5f6
2016-02-23Add meta notify=true to rabbitmq resourceMichele Baldessari1-0/+3
See RHBZ 1311005 and 1247303. In short: sometimes when a controller node gets fenced, rabbitmq is unable to rejoin the cluster. To fix this we need two steps: 1) The fix for the RA in BZ 1247303 2) Add notify=true to the meta parameters of the rabbitmq resource on fresh installs and updates Note that if this change is applied on systems that do not have the fix for the rabbitmq resource agent, no action is taken. So when the resource agent will be updated, the notify operation will start to work as soon as the first monitor action will take place. Fixes RH Bug #1311005 Change-Id: I513daf6d45e1a13d43d3c404cfd6e49d64e51d5a
2016-02-08Pass -q option to yumZane Bitter1-2/+2
The maximum payload size of the return signal from a Heat software deployment is 1MB, and the output of yum starts breaking this limit at ~1000 packages to update - which is not an atypical number. To prevent this, pass the -q (quiet) option to reduce the amount of output to a manageable level. Change-Id: I517271e8465885421a78b73c5af756816c37a977 Resolves-rhbz: #1304878 Closes-Bug: #1543034
2016-01-26Increase galera sync timeout in yum_update.shJiri Stransky1-1/+1
We've seen the 360 second threshold broken and a failed update because of that, even though Galera eventually synced fine, clusterchecks OK and pcs status clean. This will give Galera more time to perform the sync. Change-Id: I17207ec9b4038fb9540582c9b0b717f9b85a78b9 Closes-Bug: #1538218
2016-01-17Let Puppet update all packages on non-controllersJames Slagle1-8/+5
With I02f7cf07792765359f19fdf357024d9e48690e42[1] in puppet-tripleo, puppet is capable of updating all packages itself on non controller nodes now. This is a safer mechanism than using the exclude logic in yum_update.sh since that can cause depdency problems across sub packages. [1] https://review.openstack.org/#/c/261041/ Closes-Bug: 1534785 Change-Id: I9075a1bb85baa65a9d0afc5d0fd31a1f99a98819
2016-01-05Bump the pacemaker service op_params to 200s for start and stopmarios1-2/+2
Based on observed timeouts during updates bump the stop and start timeouts for pacemaker service resources (via op_params) to 200. This is based on the reasoning that the full timeout may be as long as two elapsed timeout intervals. After an initial timeout, the sigterm that follows is then allowed another DefaultTimeoutStopSec seconds. The 200s is produced by allowing this 2xDefaultTimeoutStopSec (@90s for systemd) and some scheduling delta. Many thanks to Michele Baldessari. Closes-Bug: 1531204 Change-Id: If6b43982c958f63bc78ad997400bf1279c23df7e
2016-01-04Wait for cluster to settle in yum_update.shJiri Stransky1-1/+11
Occasionally we hit "Error: unable to push cib" during update. This is probably due to the fact that when we try to replace cib in yum_update.sh, services on the previous updated controller are still coming up and changing cib, and racing/conflicting with the cib push from yum_update.sh. This commit adds waiting for the cluster to settle before exiting from yum_update.sh, to avoid this kind of conflict. Also a check for cib-push success is added, to make the update fail properly instead of hanging indefinitely as we've observed with this issue. Change-Id: I953087e0e565474ac553fd57bea2459d2e3a6081 Closes-Bug: #1527644
2015-12-03Merge "Add pcmk constraints against haproxy-clone only if applicable"Jenkins1-32/+34
2015-12-03Merge "Apply mongod timeout via cib-push"Jenkins1-1/+1
2015-12-02Add pcmk constraints against haproxy-clone only if applicableGiulio Fidente1-32/+34
When the Overcloud does not host an instance of haproxy, pcmk will not have any resource named haproxy-clone so we should not add any constraint relying on it. Change-Id: I801f07b7570f3805aa71c22998fec6b6f192b350
2015-11-25Merge "Update: clean keepalived and radvd instances after pcs cluster stop"Jenkins1-0/+7
2015-11-25Apply mongod timeout via cib-pushGiulio Fidente1-1/+1
We forgot to apply the mongod timeout in the cib dump first, to apply it later in a single cib-push step. Change-Id: Ib104e51782c6d3f646907cdb06c74fd4cbf9028c
2015-11-24Update: clean keepalived and radvd instances after pcs cluster stopJiri Stransky1-0/+7
Older neutron versions have a bug which makes them leave keepalived and radvd running even after all neutron services are stopped, preventing neutron router failover from happening. Router can then get stuck on the inactive node, like this: [stack@instack ~]$ neutron l3-agent-list-hosting-router default_router +--------------------------------------+------------------------------------+----------------+-------+----------+ | id | host | admin_state_up | alive | ha_state | +--------------------------------------+------------------------------------+----------------+-------+----------+ | 48ca9477-b93b-4305-9e6d-9f1c5d3388f0 | overcloud-controller-1.localdomain | True | :-) | standby | | eba0575c-654f-4da6-b1cd-f7fdf1cd3726 | overcloud-controller-2.localdomain | True | :-) | standby | | 68815390-251f-4425-a5f8-38bdbf3bdb90 | overcloud-controller-0.localdomain | True | xxx | active | +--------------------------------------+------------------------------------+----------------+-------+----------+ We need to kill the leftover processes manually to prevent the state described above from happening. See https://review.gerrithub.io/#/c/248931 Change-Id: I2deaa176222983daa0c33ab52a6aa5dbe7365302
2015-11-23Fixup neutron constraints in older overclouds before updatingmarios1-0/+10
The neutron pcs constraints were reworked in https://review.openstack.org/#/c/229466/ For overclouds deployed with older tripleo-heat-templates the current pcs ordering constraints will not have those changes, meaning that the behaviour discussed at https://bugs.launchpad.net/tripleo/+bug/1501378 is likely given we will stop and restart all services. This review applies those, in short, remove the ovs-cleanup after neutron-server and add openvswitch-agent instead. Detail in the bug report and linked BZ. Change-Id: I45822c5fe9029f11635400b7fbd386880ac80a4e Related-Bug: 1501378
2015-11-19Add constraints and timeouts from file in single stepGiulio Fidente1-78/+50
To avoid pcmk reconfiguring the resources on each config change, we want to apply the constraints and timeouts from file. We also *do not* want to alter the timeouts for a few ocf resources which are rabbitmq, neutron-netns-cleanup and neutron-ovs-cleanup Change-Id: I6875f19e1f34f0fdcf0928421f49b61d857ca7c8 Co-Authored-By: Andrew Beekhof <abeekhof@redhat.com>
2015-11-17Verify galera is sync'd in yum_update.shJames Slagle1-0/+12
When the cluster is brought back online after a yum update in yum_update.sh, we should verify that galera is fully sync'd before moving on. This ensures the sync is complete before moving on to update any other nodes in the cluster. Change-Id: Ie8fc2c5d5214deacea94ca658ac75359b318ced1
2015-11-13Set start/stop pacemaker resource timeouts for updatesJiri Stransky1-0/+72
This matches change I6fc18f1ad876c5a25723710a3b20d8ec9519dcba, but we need it to set it before attempting the cluster stop - yum update - cluster start cycle, to make sure this cycle doesn't hit the low timeout limits. This can be removed once updates from deployments made prior to I6fc18f1ad876c5a25723710a3b20d8ec9519dcba are no longer supported. Change-Id: I587136d8d045d213875c657ea5a405074f80c8ad
2015-11-11Add missing constraints in yum_update.shJames Slagle1-0/+30
Some missing pacemaker constraints were added in the following commits: https://review.openstack.org/#/c/219770/ https://review.openstack.org/#/c/219665/ https://review.openstack.org/#/c/218931/ https://review.openstack.org/#/c/218930/ Overclouds that were deployed prior to these constraints being added to tripleo-heat-templates still have the constraints missing. During an update, stopping and starting the cluster can fail without these constraints in place. As a workaround, conditionally add these contraints in yum_update.sh so that we're sure they're always present before updating. Change-Id: Id46c85dbbe5e85d362279661091b17ce1b697fe0
2015-10-01Force stop a single node pacemaker on yum updateSteve Baker1-1/+7
Currently package updates won't occur on a single node non-HA pacemaker managed Controller because stopping the node loses the quorum of 1. This change gets the count of current nodes in the cluster and if the count is 1 then specify --force when doing a pcs cluster stop. Change-Id: I0de2488e24f1ef53a935dbc90ec6de6142bb4264
2015-10-01Make package upgrade pacemaker-awareSteve Baker1-7/+45
This change adds alternative logic for handling package updates on a pacemaker managed node. "yum list updates" is now run and this script exits early if there are no packages to update. If the pacemaker service is not running then the previous puppet logic remains, so a package update is performed which excludes packages managed by puppet, and a flag is set to indicate that puppet should perform an ensure=>latest on all packages it manages. However if the pacemaker service is running, the following occurs: - pcs cluster stop is run for this node - a full yum update is performed - pcs cluster start is run for this node - pcs status is run until the hostname for this node appears in the Online list This means that puppet is not involved in the package update process when the node is managed by pacemaker. Change-Id: I5ad118552d053dbda280978751167d9fd9da9874
2015-10-01Ensure present/latest for puppet driven package updatesSteve Baker1-0/+9
This change updates yum_update.sh so that we set set a boolean output when "managed" packages should get updated. The output is named 'update_managed_packages' and for the puppet implementation it is wired up so that it directly sets tripleo::packages::enable_upgrade to control whether packages are updated. It also modifies yum_update.sh to build a yum update excludes list for packages managed by puppet. The exclude lists are being generated via puppet-tripleo as well via the new 'write_package_names' function that is now wired into all the role manifests. This change does not actually trigger the puppet apply. The fix for Related-Bug: #1463092 will be used to trigger the puppet run when the hiera changes. As a minor tweak to this logic we append the UpdateIdentifier to the config_identifier so that we ensure puppet gets executed on an update where other (non-related) hiera changes also occur. Co-Authored-By: Dan Prince <dprince@redhat.com> Change-Id: I343c3959517eae38bbcd43648ed56f610272864d
2015-06-08Config & deployments to update overcloud packagesSteve Baker1-0/+41
This change adds config and deployment resources to trigger package updates on nodes. The deployments are triggered by doing a stack-update and setting one of the parameters to a unique value. The intent is that rolling update will be controlled by setting breakpoints on all of the UpdateDeployment resources inside the role resource groups. Change-Id: I56bbf944ecd6cbdbf116021b8a53f9f9111c134f