path: root/extraconfig/tasks
Age | Commit message | Author | Files | Lines
2016-09-19 | Add a function to upgrade from full HA to NG HA | Michele Baldessari | 3 | -16/+137
This is the initial work to have a function that migrates a full HA architecture as deployed in Mitaka to the HA architecture as deployed in Newton, where only a few resources are managed by pacemaker. The sequence is the following:
1) We remove the desired services from pacemaker's control. At this point the services are still running normally via the systemd service as invoked by pacemaker.
2) We do a "systemctl stop <service>" on all controllers for all the services that were removed from pacemaker's control. We do this to make sure that during the yum upgrade, the %post sections that call "systemctl try-restart" do not take ages, because at this point during the upgrade rabbit is down. The only exceptions are "openstack-core" and "delay", which are dummy pacemaker resources that do not exist on the system.
3) We do a "systemctl start <service>" on all nodes for all the services mentioned above.
We should probably merge this patch only when newton has branched, as it is very specific to the M/N upgrade. Closes-Bug: 1617520 Change-Id: I4c409ce58c1a57b6e0decc3cf168b62698b32e39
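Steps 2 and 3 above amount to running the same systemctl action for every service taken out of pacemaker's control, minus the two dummy resources. A minimal Python sketch under that assumption (the function name `systemctl_commands` and the example service names are hypothetical; the real upgrade code is shell and also fans the commands out to every controller) only builds the command lines:

```python
# Hypothetical sketch: build the per-node systemctl commands for steps 2 and 3.
# "openstack-core" and "delay" are dummy pacemaker resources with no systemd
# unit behind them, so they are skipped.
DUMMY_RESOURCES = {"openstack-core", "delay"}


def systemctl_commands(services, action):
    """Return one "systemctl <action> <service>" line per real service."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action!r}")
    return [f"systemctl {action} {svc}" for svc in services if svc not in DUMMY_RESOURCES]


if __name__ == "__main__":
    # Illustrative service list, not the one used by the actual upgrade.
    removed = ["openstack-core", "openstack-nova-api", "delay", "openstack-glance-api"]
    for cmd in systemctl_commands(removed, "stop"):
        print(cmd)
```

Keeping command construction separate from remote execution makes the skip list easy to check on its own.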
2016-09-17 | M/N upgrade sahara-api fails to restart. | Sofer Athlan-Guyot | 1 | -0/+2
Change-Id: I7a041dab8b1b1edc9c80248e1eef3ce7ab272292 Closes-Bug: 1615056
2016-09-17 | Rework the pacemaker_common_functions for M..N upgrades | marios | 4 | -61/+290
For N we cannot assume services are managed by pacemaker. This adds functions to check whether a service is systemd- or pcmk-managed and to start/stop it accordingly. For example, pcmk services are only stopped/disabled on the bootstrap node, whereas systemd services are stopped/started on all controllers. There is also an equivalent change to the check_resource function, which has been reworked to handle both pcmk and systemd. Implements: blueprint overcloud-upgrades-workflow-mitaka-to-newton Change-Id: Ic8252736781dc906b3aef8fc756eb8b2f3bb1f02
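The dispatch rule can be sketched in Python as below. This is an illustration only, assuming the management type has already been detected; the function name and the "pcmk"/"systemd" labels are hypothetical, and `pcs resource disable` stands in as a plausible pacemaker-side action rather than the exact command the shell functions run.

```python
# Hypothetical sketch of the systemd-vs-pacemaker dispatch described above.
# The "pcmk"/"systemd" labels and the function name are assumptions.


def stop_commands(service, managed_by, is_bootstrap_node):
    """Return the commands this node should run to stop `service`."""
    if managed_by == "pcmk":
        # A pacemaker resource is cluster-wide: disable it once, from the bootstrap node.
        return [f"pcs resource disable {service}"] if is_bootstrap_node else []
    if managed_by == "systemd":
        # A plain systemd unit is local, so every controller stops its own copy.
        return [f"systemctl stop {service}"]
    raise ValueError(f"unknown manager for {service}: {managed_by!r}")
```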
2016-09-16 | Merge "Refactor upgrade checks." | Jenkins | 3 | -62/+111
2016-09-16 | Merge "Convert UpdateWorkflow to support composable roles" | Jenkins | 3 | -84/+24
2016-09-16 | Merge "Fix use of batch_create in CephMon major upgrade template" | Jenkins | 1 | -1/+2
2016-09-16 | Fix use of batch_create in CephMon major upgrade template | Mathieu Bultel | 1 | -1/+2
The batch_create and rolling_update keys were incorrectly defined as properties of the resource instead of update policies. Change-Id: I19261adc78e4cdc3616f16221e85490a6b48d47b Closes-Bug: 1623506
2016-09-16 | Fixes the Ceph upgrade scripts | Giulio Fidente | 2 | -5/+5
The Ceph upgrade scripts were failing on the following:
1. a syntax error in an if condition
2. an attempt to read a possibly unbound variable
3. an attempt to chown a directory which might not exist
This change aims to fix all of the above. Closes-Bug: 1623942 Change-Id: I9e9d63d4ab7626893aaf2a25dccfcafbb97ccbdf
2016-09-16 | Convert UpdateWorkflow to support composable roles | Steven Hardy | 3 | -84/+24
We need to remove the hard-coded roles from overcloud.j2.yaml, as it's now valid to e.g. remove BlockStorage completely. The previous behavior for the per-role upgrade scripts is maintained, but we'll need to rework this for newton->ocata upgrades, where we can no longer be sure the servers mapping will contain all roles. Change-Id: I25e6c84757e3c00fba2aae834cd8206c62e44acf Partially-Implements: blueprint custom-roles
2016-09-12 | Refactor upgrade checks. | Sofer Athlan-Guyot | 3 | -62/+111
We make it clear that recoverable checks happen before the upgrade starts, so that the upgrade can be run once the offending error has been corrected manually. Add a new check for the pcsd cluster status. Add a new check for the galera password file: BZ 1357112 Closes-Bug: 1614907 Change-Id: If736c79121e1ffe0eaeb814bdb73ccbc0b64edcd
2016-09-09 | Merge "Add Ceph cluster health validation on upgrade" | Jenkins | 2 | -4/+32
2016-09-02 | Merge "Upgrade ceph-osd" | Jenkins | 1 | -10/+67
2016-09-01 | Merge "Restart only services that need it" | Jenkins | 1 | -8/+16
2016-08-31 | Restart only services that need it | Jiri Stransky | 1 | -8/+16
With the new pacemaker architecture, Puppet handles restarts of most services. Several services are still managed by pacemaker, and they need special restart handling that uses pacemaker and its resource agents. The counterpart in puppet-tripleo requests restarts for individual pacemaker-managed services by writing out "restart flag" files, and the pacemaker_resource_restart.sh script then performs the restarts. Change-Id: Ia4e6a9f88181f1981993f046cf415dbbcdc9570e Closes-Bug: #1614967 Closes-Bug: #1587015 Depends-On: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
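The flag-file handshake can be sketched as follows. The flag directory, the `<service>.restart` naming, and both function names are hypothetical; the actual file locations are defined by puppet-tripleo and pacemaker_resource_restart.sh, and the real restart goes through pacemaker rather than a Python callback.

```python
# Hypothetical sketch of a "restart flag" consumer.
# Assumes one empty flag file per service, named "<service>.restart".
from pathlib import Path


def services_to_restart(flag_dir):
    """Return the sorted names of services whose flag file exists in flag_dir."""
    return sorted(p.stem for p in Path(flag_dir).glob("*.restart"))


def consume_flags(flag_dir, restart):
    """Call restart(service) for each flagged service, then remove its flag."""
    for service in services_to_restart(flag_dir):
        restart(service)
        # Delete the flag only after the restart call returned, so a failure
        # leaves the request in place for the next run.
        (Path(flag_dir) / f"{service}.restart").unlink()
```

Removing each flag after its restart means a later run only restarts services that have been flagged again.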
2016-08-30 | Merge "M/N upgrade fix galera restart." | Jenkins | 1 | -11/+16
2016-08-30 | Add Ceph cluster health validation on upgrade | Giulio Fidente | 2 | -4/+32
This will prevent the Ceph Mon upgrade script from starting if the Ceph cluster is in error state. It also adds a parameter to ignore warning states, useful when performing an upgrade of a cluster where the number of healthy OSDs does not guarantee the desired replica size. Closes-Bug: 1618533 Change-Id: I1beb8ad0812f19b1018ba19b5a9fc85fa132d7f7
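The gate can be expressed as a small decision function. This Python sketch assumes the script inspects the status keyword that `ceph health` prints first (HEALTH_OK, HEALTH_WARN or HEALTH_ERR); the function name and the `ignore_warnings` argument stand in for the real script logic and template parameter, whose names are not given here.

```python
# Hypothetical sketch of the pre-upgrade Ceph health gate.


def ceph_health_allows_upgrade(health_output, ignore_warnings=False):
    """Decide whether the Ceph Mon upgrade may start.

    health_output is the text printed by `ceph health`; its first token
    is the overall status.
    """
    tokens = health_output.split()
    status = tokens[0] if tokens else ""
    if status == "HEALTH_OK":
        return True
    if status == "HEALTH_WARN":
        # Warnings (e.g. too few healthy OSDs for the replica size) may be waived.
        return ignore_warnings
    # HEALTH_ERR, empty output, or anything unrecognised blocks the upgrade.
    return False
```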
2016-08-30 | Upgrade ceph-osd | Giulio Fidente | 1 | -10/+67
Upgrades the ceph-osd daemon from Hammer to Jewel Change-Id: Idfa90fdc0052c53f448401c85c5d13a2ba68acd1
2016-08-30 | Merge "Upgrade ceph-mon" | Jenkins | 2 | -0/+81
2016-08-30 | Merge "Fix check of rpm-python." | Jenkins | 1 | -1/+1
2016-08-29 | Upgrade ceph-mon | Giulio Fidente | 2 | -0/+81
Adds a pre-requisite software deployment to the pacemaker scenario upgrade which, before the openstack services are upgraded, upgrades the ceph-mon daemon from Hammer to Jewel. Change-Id: I9855d80a6ae156b4a9e0409c3c927068b9db95a0
2016-08-29 | Merge "M/N upgrade set scheduler_host_manager right." | Jenkins | 1 | -0/+2
2016-08-26 | Merge "M/N upgrade fail to restart nova-scheduler." | Jenkins | 1 | -0/+1
2016-08-26 | M/N upgrade fix galera restart. | Sofer Athlan-Guyot | 1 | -11/+16
We have to recreate the /var/lib/mysql directory on all controller nodes, not just the bootstrap node, for the cluster to be able to restart. Also add a warning that these scripts are local and know nothing about the upgrade state of the other nodes. Closes-Bug: 1612642 Change-Id: I48e2812d7df80bbf2db53a8b71dc434d4209a160
2016-08-26 | Fix check of rpm-python. | Sofer Athlan-Guyot | 1 | -1/+1
There is a typo in the code that makes this test always succeed. Closes-Bug: 1614437 Change-Id: Ia6b0b156294de9fcb8f66fc46aa8801555775a56
2016-08-26 | M/N upgrade set scheduler_host_manager right. | Sofer Athlan-Guyot | 1 | -0/+2
scheduler_host_manager no longer accepts nova.scheduler.host_manager.HostManager as a value. This fixes the value before restarting the service. Change-Id: Ia9adcfd5a898f0c712b4a37ae33db88a44630f0d Closes-Bug: 1615035
2016-08-26 | Merge "Update pacemaker_resource_restart.sh for new HA arch" | Jenkins | 1 | -25/+8
2016-08-25 | Merge "Clean up old functions" | Jenkins | 1 | -61/+0
2016-08-24 | Clean up old functions | Pradeep Kilambi | 1 | -61/+0
These are not needed any more as they were specific to mitaka upgrades. Change-Id: I0d421b942e620403f88374e1c82105747d8d84c9
2016-08-24 | M/N upgrade fail to restart nova-scheduler. | Sofer Athlan-Guyot | 1 | -0/+1
The nova api db needs to be synchronized as well. Change-Id: I2628b24ff1153c84cbf388455666ae42570cb10f Closes-Bug: 1615042
2016-08-24 | Fix check for MariaDB upgrade manual switch off | Jiri Stransky | 1 | -1/+1
The MySqlMajorUpgrade parameter has validation that allows only the values yes/no/auto; however, the script checked for '0' instead of 'no', so the only effective values were yes/auto. This is now fixed so that the migration can be switched off. Change-Id: I5d64734894c6bfd9003ad643f3747e34e62465cc Closes-Bug: #1616429
2016-08-22 | Don't trigger mariadb upgrade dump/restore when not needed | Jiri Stransky | 1 | -2/+2
When upgrading from mariadb X.Y.Z to mariadb X.Y.Ž (X.Y part stays the same), the dump/restore of mariadb shouldn't be necessary. Therefore we now only check for up to the first 2 fields of the version string when determining if we should trigger the dump/restore operation. Closes-Bug: #1615721 Change-Id: Ib7af8bfb121f5c83184d51b3c6dc657108c25973
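Comparing only the leading fields is a one-line check. A Python sketch, assuming plain dotted version strings such as "10.1.18" (the function name is hypothetical, and the actual script does this in shell):

```python
# Hypothetical sketch: decide whether a mariadb dump/restore is needed by
# comparing only the first `fields` components of the version strings.


def needs_dump_restore(installed, candidate, fields=2):
    """True if the X.Y prefixes (by default) of the two versions differ."""
    return installed.split(".")[:fields] != candidate.split(".")[:fields]


if __name__ == "__main__":
    print(needs_dump_restore("10.1.18", "10.1.20"))  # same X.Y: no dump needed
    print(needs_dump_restore("5.5.47", "10.1.18"))   # X.Y changed: dump needed
```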
2016-08-17 | Upgrade scripts to migrate aodh alarm data | Pradeep Kilambi | 2 | -0/+52
In Newton, Aodh will use its own MySQL DB rather than ceilometer's mongo instance. This means we need to migrate any existing alarm and alarm history data from the ceilometer DB to the aodh MySQL DB. Upstream aodh provides an aodh-data-migration utility. We need to invoke it during the mitaka->newton upgrade procedure so that data is migrated as expected and the aodh MySQL backend takes over. Closes-bug: #1611794 Change-Id: I17888b57ecf98cd83e92af2f9cdbead066b03aa3
2016-08-16 | Update pacemaker_resource_restart.sh for new HA arch | James Slagle | 1 | -25/+8
Given the new HA architecture with fewer pacemaker-managed resources, we need to update this script to reflect those changes. Without these changes, stack-updates using the exact same templates will fail, since this script is always executed on update. Change-Id: I2ce1681d19d4a24a7561e3dd9c5efdae40d030b7 Closes-Bug: #1612667
2016-07-28 | Merge "Allow to manually disable post-puppet restarts" | Jenkins | 2 | -13/+30
2016-07-04 | Merge "Dump and restore galera db during major upgrades" | Jenkins | 3 | -0/+184
2016-06-29 | Dump and restore galera db during major upgrades | Michele Baldessari | 3 | -0/+184
When the overcloud is upgraded we do a yum update of the packages. This step might introduce a newer galera version, in which case we need to dump the db and restore it. The high-level workflow is the following:
1) During the main upgrade step, before shutting down the cluster, we dump the db
2) We upgrade the packages
3) We briefly start mysql on a single node, with /root/.my.cnf temporarily moved out of the way (because it contains a password), and import the data. After the import we shut down this mysql instance
4) We let the cluster start up normally
The above steps take place in the following scenarios. Given a locally installed mariadb version X.Y.Z and release R, we dump and restore the DB under the following conditions:
A) The MySqlMajorUpgrade template parameter is set to 'auto' and the upgraded package differs in X, Y *or* Z. We basically don't dump automatically if only the release field changes.
B) The MySqlMajorUpgrade template parameter is set to 'yes'.
When MySqlMajorUpgrade is set to 'no', no dumping is performed. Note that this will give a non-functional upgrade if a major mariadb upgrade is taking place. Partial-Bug: #1587449 Co-Author: Damien Ciabrin <dciabrin@redhat.com> Co-Author: Mike Bayer <mbayer@redhat.com> Depends-On: I8cb4cb3193e6b823aad48ad7dbbbb227364d2a58 Depends-On: I38dcacfabc44539aab1f7da85168fe44a1b43a51 Change-Id: I374628547aed091129d0deaa29764bfc998d76ea
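Conditions A and B, together with the 'no' case, reduce to a small decision function. In this Python sketch the function name is hypothetical, and the installed and upgraded packages are assumed to be given as "X.Y.Z-R" strings whose release part after the first "-" is ignored. A later commit in this log narrows the 'auto' comparison to the X.Y fields only.

```python
# Hypothetical sketch of the MySqlMajorUpgrade decision as described in this commit.


def should_dump(mode, installed, upgraded):
    """Return True if the galera DB should be dumped and restored.

    installed/upgraded: "X.Y.Z-R" strings; the release R is ignored.
    """
    if mode == "yes":
        return True
    if mode == "no":
        # No dump at all: a major mariadb upgrade then gives a non-functional upgrade.
        return False
    if mode == "auto":
        # Compare only X.Y.Z, i.e. everything before the first "-".
        return installed.split("-", 1)[0] != upgraded.split("-", 1)[0]
    raise ValueError(f"MySqlMajorUpgrade must be yes/no/auto, got {mode!r}")
```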
2016-06-29 | Increase cluster sync timeout for M->N major upgrades | Damien Ciabrini | 1 | -1/+1
Since the Liberty release, the number of services managed by pacemaker on an HA Overcloud has increased. This has an impact on major_upgrade_controller_pacemaker_1.sh, where the cluster sync timeout value tuned for older releases is now too low. Raise the cluster sync timeout value to a sensible limit to give pacemaker enough time to stop the cluster during a major upgrade. Change-Id: I821d354ba30ce39134982ba12a82c429faa3ce62 Closes-Bug: #1597506
2016-06-24 | Disable stonith temporarily during upgrades | Michele Baldessari | 1 | -0/+14
It is best to disable stonith, if a cluster has it configured and on, before we call "pcs cluster stop --all", because should a service fail to stop for whatever reason, pacemaker will fence the node where that happened. This is something we are unlikely to want during an upgrade, as it would make things worse. Once the cluster is stopped, we can re-enable stonith in the CIB (if it was enabled to start with) while the cluster is shut down. Closes-Bug: #1596065 Change-Id: I38dcacfabc44539aab1f7da85168fe44a1b43a51
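The ordering matters more than the individual commands, so a sketch that only builds the step list is enough to show it. This Python version is an illustration: the function name is hypothetical, `pcs property set stonith-enabled=false` is the usual way to switch stonith off, and the final re-enable step is left as a placeholder because the commit does not say how the offline CIB is edited.

```python
# Hypothetical sketch: order of operations around "pcs cluster stop --all".


def cluster_stop_plan(stonith_was_enabled):
    """Return the ordered steps for stopping the cluster without fencing."""
    steps = []
    if stonith_was_enabled:
        # Switch fencing off first, so a service that fails to stop
        # does not get its node fenced mid-upgrade.
        steps.append("pcs property set stonith-enabled=false")
    steps.append("pcs cluster stop --all")
    if stonith_was_enabled:
        # Placeholder: restore the setting in the CIB while the cluster is down.
        steps.append("<re-enable stonith in the offline CIB>")
    return steps
```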
2016-06-14 | Allow to manually disable post-puppet restarts | Jiri Stransky | 2 | -13/+30
Restarting services after Puppet is vital to ensure that config changes get applied. However, it can sometimes be desirable to prevent these restarts to avoid downtime, if the operator is sure that no config changes need applying. This can be the case e.g. when scaling compute nodes. Passing the puppet-pacemaker-no-restart.yaml environment file *in addition* to puppet-pacemaker.yaml should allow this. This is a stop-gap solution until we have proper communication between Puppet and Pacemaker to allow selective restarts. Change-Id: I9c3c5c10ed6ecd5489a59d7e320c3c69af9e19f4
2016-05-04 | Merge "Disable VIPs before stopping cluster during version upgrade" | Jenkins | 2 | -0/+11
2016-05-04 | Merge "Fix distinguishing between stack-create and stack-update" | Jenkins | 1 | -1/+1
2016-05-02 | Disable VIPs before stopping cluster during version upgrade | Ian Pilcher | 2 | -0/+11
If "pcs cluster stop --all" is executed on a controller that happens to have a VIP on the internal network, pcs may use the VIP as the source address for communication with another cluster node. When pacemaker is stopped this VIP goes away, and pcs never receives a response from the other node. This causes pcs to hang indefinitely; eventually the upgrade times out and fails. Disabling the VIPs before stopping the cluster avoids this situation. Change-Id: I6bc59120211af28456018640033ce3763c373bbb Closes-Bug: 1577570
2016-04-26 | Merge "Update .sh references from openstack-keystone to openstack-core" | Jenkins | 4 | -115/+8
2016-04-20 | Merge "Increase galera sync timeout in yum_update.sh" | Jenkins | 1 | -1/+1
2016-04-13 | Make sure openstack services are dependent on openstack-core | Jiri Stransky | 1 | -0/+36
Previously ceilometer-notification, aodh-listener and sahara-engine didn't have constraints that would anchor them under openstack-core dummy resource. Such constraints are added now. (sahara-engine starting after sahara-api, aodh-listener after aodh-evaluator, and ceilometer-notification after openstack-core.) Openstack-core -> heat-api constraint has been removed because heat-api depends on ceilometer-notification, so there's a transitive dependency on openstack-core already. Change-Id: Ided7321ebbf2c3556726343b4bb466fd8759b43a Closes-Bug: #1569444
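The transitive-dependency argument can be checked mechanically by walking the ordering constraints backwards from a resource. This Python sketch encodes only the constraints named in this commit as (first, then) pairs; the data layout and the function name are hypothetical, and real constraints would come from the cluster configuration.

```python
# Hypothetical sketch: is a resource anchored (transitively) under openstack-core?
from collections import deque

# (first, then) pairs: "then" starts after "first". Only edges named in the commit.
CONSTRAINTS = [
    ("openstack-core", "ceilometer-notification"),
    ("ceilometer-notification", "heat-api"),
    ("sahara-api", "sahara-engine"),
    ("aodh-evaluator", "aodh-listener"),
]


def depends_on(resource, anchor, constraints):
    """True if `resource` is ordered after `anchor`, directly or transitively."""
    parents = {}
    for first, then in constraints:
        parents.setdefault(then, []).append(first)
    queue, seen = deque([resource]), {resource}
    while queue:  # breadth-first walk towards the roots
        for parent in parents.get(queue.popleft(), []):
            if parent == anchor:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False
```

With these edges alone, heat-api already reaches openstack-core through ceilometer-notification, which is why a direct openstack-core -> heat-api constraint is redundant. sahara-engine only reaches openstack-core once sahara-api's own constraint to openstack-core (assumed here, not listed in the commit) is added to the list.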
2016-04-11 | Fix distinguishing between stack-create and stack-update | Jiri Stransky | 1 | -1/+1
Previously we tried to use UpdateIdentifier for two different things: tell whether to perform package update, and also to tell whether the top-level stack is being created or updated (which was incorrect and resulted in bug 1567384, and an attempt to work around that bug resulted in bug 1567385). We cannot use Heat's "action" conditionals in some cases, because they refer to the direct parent stack, which can yield undesirable results when introducing new nested stacks or temporarily no-opping something and then adding it back (in both these cases, "action" would be considered "CREATE", even though the top-level stack is in "UPDATE"). So tripleoclient passes a new parameter StackAction to tell whether the top-level stack is being created or updated, and we make use of that. (It seems there's no better way of getting this info from within the nested Heat stacks.) Change-Id: Ie14ddbff15e7ed21aaa3fcdacf36e0040f912382 Depends-On: I9dc3b4cd8a6a71df34d8babf0e4c6505041f5311 Closes-Bug: #1567384 Related-Bug: #1567385
2016-04-11 | Update .sh references from openstack-keystone to openstack-core | Giulio Fidente | 4 | -115/+8
The update and upgrade shell scripts were still referencing the old openstack-keystone service which got removed with Ie26908ac9bfc0b84b6b65ae3bda711236b03d9d4 Also removes kilo and liberty specific workarounds and config changes. Change-Id: Icc80904908ee3558930d4639a21812f14b2fd12e
2016-04-08 | Replace extraconfig/tasks/noop.yaml w/ Heat::None | Dan Prince | 1 | -26/+0
Removes the old noop nested stack template for extraconfig tasks and instead uses OS::Heat::None. This should avoid a few extra resource checks on create and update. Change-Id: I5a42fc78ece2553e86385236e214aa1e3c91cd85
2016-04-07 | Add removal of the /etc/resolv.conf.save file for +bug/1567004 | marios | 1 | -0/+3
The change at https://review.openstack.org/#/c/302352/ should stop the ifup/ifdown scripts from making changes to resolv.conf, as discussed in that review and the related bug below. However, during upgrades we are moving from a version of the ifcfg-vlanXX files that doesn't have the PEERDNS=no added by /#/c/302352, so the ifup script will restore /etc/resolv.conf.save to /etc/resolv.conf and overwrite it. This removes the .save file during the upgrade init command, which gets delivered to all nodes as the first stage of a major upgrade. Change-Id: I91dd139f43be4912c20d8661691bee2b662964d4 Related-Bug: 1567004
2016-04-04 | Filter for local nodes in check_resource function | Raoul Scarazzini | 1 | -1/+2
When a TripleO-deployed Pacemaker environment has extra customizations (say, instance HA with pacemaker_remoted, or an external arbitrator configured for something), the status of the resources on remote nodes is "Stopped". This leads to failures when, for example, scaling up. This fixes the way the status is checked by filtering for local nodes only. Co-Authored-By: Giulio Fidente <gfidente@redhat.com> Change-Id: I8dc25f5d7031c265858afd5a266fda5315ae37a0
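The fix boils down to ignoring remote nodes before evaluating a resource's state. A Python sketch, assuming the per-node statuses have already been parsed into a dict (the parsing, the data shape, the node names and the function name are all hypothetical; the real check_resource is a shell function):

```python
# Hypothetical sketch of a check_resource-style test restricted to local nodes.


def resource_in_state(statuses, local_nodes, wanted):
    """True if the resource is in state `wanted` on every local node.

    statuses maps node name -> status string, e.g. "Started" or "Stopped".
    Nodes outside local_nodes (such as pacemaker_remoted nodes) are ignored.
    """
    local = [status for node, status in statuses.items() if node in local_nodes]
    # Require at least one local entry so an empty match is not a false success.
    return bool(local) and all(status == wanted for status in local)


if __name__ == "__main__":
    statuses = {"controller-0": "Started", "controller-1": "Started", "compute-0": "Stopped"}
    print(resource_in_state(statuses, {"controller-0", "controller-1"}, "Started"))
```

Without the filter, the "Stopped" entry for the remote node would make the same check fail.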