This is the initial work on a function that migrates a full HA
architecture as deployed in Mitaka to the HA architecture as deployed in
Newton, where only a few resources are managed by pacemaker.
The sequence is the following:
1) We remove the desired services from pacemaker's control. At this
point the services are still running normally, via the systemd units
that pacemaker had been invoking.
2) We do a "systemctl stop <service>" on all controllers for all the
services that were removed from pacemaker's control. We do this to make
sure that during the yum upgrade, the %post sections that call
"systemctl try-restart" do not take ages, because at this point during
the upgrade rabbit is down. The only exceptions are "openstack-core"
and "delay", which are dummy pacemaker resources that do not exist on
the system.
3) We do a "systemctl start <service>" on all nodes for all the services
mentioned above.
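A minimal sketch of that sequence (the resource and service names here
are illustrative, not the exact list used by the script):

    # 1) Take the services out of pacemaker's control; the underlying
    #    systemd services keep running after the resource is deleted.
    for resource in openstack-nova-api openstack-heat-engine; do
      pcs resource delete "$resource"
    done
    # 2) Stop the systemd services on all controllers so the %post
    #    "systemctl try-restart" scriptlets return quickly while rabbit
    #    is down ("openstack-core" and "delay" are skipped since they
    #    have no systemd unit).
    for service in openstack-nova-api openstack-heat-engine; do
      systemctl stop "$service"
    done
    # ... yum upgrade happens here ...
    # 3) Start the services again on all nodes.
    for service in openstack-nova-api openstack-heat-engine; do
      systemctl start "$service"
    done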
We should probably merge this patch only once Newton has branched, as it
is very specific to the M/N upgrade.
Closes-Bug: 1617520
Change-Id: I4c409ce58c1a57b6e0decc3cf168b62698b32e39
|
Change-Id: I7a041dab8b1b1edc9c80248e1eef3ce7ab272292
Closes-Bug: 1615056
|
For N we cannot assume services are managed by pacemaker.
This adds functions to check whether a service is systemd- or
pcmk-managed, and starts/stops it accordingly. For pcmk, for example,
stop/disable happens only on the bootstrap node, whereas systemd
services are stopped/started on all controllers.
There is also an equivalent change to check_resource, which has been
reworked to handle both pcmk and systemd.
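A rough sketch of the distinction (function and resource names are
illustrative):

    # True if the service exists as a pacemaker resource.
    is_pacemaker_managed() {
      pcs status resources 2>/dev/null | grep -q "$1"
    }

    stop_service() {
      if is_pacemaker_managed "$1"; then
        pcs resource disable "$1"   # pcmk: run on the bootstrap node only
      else
        systemctl stop "$1"         # systemd: run on every controller
      fi
    }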
Implements: blueprint overcloud-upgrades-workflow-mitaka-to-newton
Change-Id: Ic8252736781dc906b3aef8fc756eb8b2f3bb1f02
|
We make it clear that recoverable checks happen before starting the
upgrade, so that the upgrade can be re-run after the offending error has
been manually corrected.
Add a new check for the pcsd cluster status.
Add a new check for the galera password file (BZ 1357112).
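A hedged sketch of such pre-upgrade checks; the exact pcs invocation
and the password file path are assumptions, not necessarily what the
script uses:

    # Fail early, while the operator can still fix things and re-run.
    if ! pcs status > /dev/null 2>&1; then
      echo "ERROR: pcsd/cluster status check failed" >&2
      exit 1
    fi
    # BZ 1357112: the galera checks need their password file in place.
    if [ ! -f /etc/sysconfig/clustercheck ]; then
      echo "ERROR: galera password file is missing" >&2
      exit 1
    fi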
Closes-Bug: 1614907
Change-Id: If736c79121e1ffe0eaeb814bdb73ccbc0b64edcd
|
We have to recreate the /var/lib/mysql directory on all controller
nodes, not just the bootstrap node, for the cluster to be able to
restart.
Also add a warning that these scripts are local and know nothing about
the upgrade state of the other nodes.
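Roughly, each controller now does something like this (the ownership
and SELinux handling are assumptions):

    # Must run on every controller, not just the bootstrap node,
    # or galera cannot form a cluster again after the upgrade.
    mkdir -p /var/lib/mysql
    chown mysql:mysql /var/lib/mysql
    restorecon -R /var/lib/mysql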
Closes-Bug: 1612642
Change-Id: I48e2812d7df80bbf2db53a8b71dc434d4209a160
|
There is a typo in the code which makes this test always succeed.
Closes-Bug: 1614437
Change-Id: Ia6b0b156294de9fcb8f66fc46aa8801555775a56
|
scheduler_host_manager no longer accepts
nova.scheduler.host_manager.HostManager as a value. This fixes it
before restarting the service.
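A minimal sketch of the fix, assuming crudini is available and
'host_manager' is the new expected value:

    # Rewrite the deprecated class path before nova is restarted.
    crudini --set /etc/nova/nova.conf DEFAULT \
      scheduler_host_manager host_manager
    systemctl restart openstack-nova-scheduler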
Change-Id: Ia9adcfd5a898f0c712b4a37ae33db88a44630f0d
Closes-Bug: 1615035
|
The MySqlMajorUpgrade parameter has validation allowing only the
values yes/no/auto, but in the script we checked for '0' instead of
'no', which means the only effective values were yes/auto. This is now
fixed so the migration can actually be switched off.
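In other words, the corrected check looks roughly like this (variable
names are illustrative):

    # $MYSQL_MAJOR_UPGRADE is validated upstream to be yes/no/auto.
    if [ "$MYSQL_MAJOR_UPGRADE" = "no" ]; then  # previously compared to '0'
      DO_MYSQL_DUMP=0
    fi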
Change-Id: I5d64734894c6bfd9003ad643f3747e34e62465cc
Closes-Bug: #1616429
|
When the overcloud is upgraded we do a yum update of the packages.
This step might introduce a newer galera version. In such a situation
we need to dump the db and restore it. The high-level workflow should
be the following:
1) During the main upgrade step, before shutting down the cluster
we need to dump the db
2) We upgrade the packages
3) We briefly start mysql on a single node, making sure that
/root/.my.cnf is temporarily moved out of the way (because it contains
a password), and import the data. After the import we shut down this
mysql instance
4) We let the cluster start up normally
The above steps will take place in the following scenarios.
Given a locally installed mariadb version X.Y.Z and release R,
we will dump and restore the DB under the following conditions:
A) The MySqlMajorUpgrade template parameter is set to 'auto' and
the upgraded package differs in X, Y *or* Z. We deliberately do not
dump when only the release field changes.
B) MySqlMajorUpgrade template parameter is set to 'yes'
When MySqlMajorUpgrade is set to 'no', no dumping will be performed.
Note that this will yield a non-functional upgrade if a major mariadb
upgrade is taking place.
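A hedged sketch of the 'auto' decision (variable names and the backup
path are illustrative):

    installed=$(rpm -q --qf '%{VERSION}' mariadb-server)      # X.Y.Z on disk
    candidate=$(repoquery --qf '%{version}' mariadb-server)   # X.Y.Z in repos
    case "$MYSQL_MAJOR_UPGRADE" in
      yes)  do_dump=1 ;;
      no)   do_dump=0 ;;
      auto) # Dump only when X, Y or Z changes; a release-only bump is ignored.
            if [ "$installed" != "$candidate" ]; then
              do_dump=1
            else
              do_dump=0
            fi ;;
    esac
    if [ "$do_dump" = 1 ]; then
      mysqldump --all-databases > /root/tripleo-mysql-backup.sql
    fi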
Partial-Bug: #1587449
Co-Author: Damien Ciabrin <dciabrin@redhat.com>
Co-Author: Mike Bayer <mbayer@redhat.com>
Depends-On: I8cb4cb3193e6b823aad48ad7dbbbb227364d2a58
Depends-On: I38dcacfabc44539aab1f7da85168fe44a1b43a51
Change-Id: I374628547aed091129d0deaa29764bfc998d76ea
|
Since the Liberty release, the number of services managed by pacemaker
on the HA Overcloud has increased. This has an impact on
major_upgrade_controller_pacemaker_1.sh, where the cluster sync timeout
value tuned for older releases has become too low.
Raise the cluster sync timeout to a sensible limit to give pacemaker
enough time to stop the cluster during the major upgrade.
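For instance, the script can wait for pacemaker to settle under an
explicit, generous timeout (the value and exact command here are a
sketch):

    cluster_sync_timeout=1800
    # crm_resource --wait blocks until the cluster has no pending actions.
    if ! timeout -k 10 $cluster_sync_timeout crm_resource --wait; then
      echo "ERROR: cluster failed to sync within ${cluster_sync_timeout}s" >&2
      exit 1
    fi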
Change-Id: I821d354ba30ce39134982ba12a82c429faa3ce62
Closes-Bug: #1597506
|
It is best to disable stonith, if a cluster has it configured and on,
before we call "pcs cluster stop --all", because should a service fail
to stop for whatever reason, pacemaker will fence the node where that
happened. This is something we definitely do not want during an upgrade,
as it will make things worse.
Once the cluster is stopped we can re-enable stonith (if it was enabled
to start with) in the CIB while the cluster is shut down.
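A minimal sketch of that dance (the CIB file path is an assumption):

    # Remember the current setting, then disable fencing for the stop.
    stonith=$(pcs property show stonith-enabled | awk '/stonith-enabled/ {print $2}')
    if [ "$stonith" = "true" ]; then
      pcs property set stonith-enabled=false
    fi
    pcs cluster stop --all
    # ... upgrade steps run while the cluster is down ...
    if [ "$stonith" = "true" ]; then
      # pacemaker is not running, so write straight to the CIB file.
      pcs -f /var/lib/pacemaker/cib/cib.xml property set stonith-enabled=true
    fi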
Closes-Bug: #1596065
Change-Id: I38dcacfabc44539aab1f7da85168fe44a1b43a51
|
If "pcs cluster stop --all" is executed on a controller that
happens to have a VIP on the internal network, pcs may use the
VIP as the source address for communication with another cluster
node. When pacemaker is stopped this VIP goes away, and pcs never
receives a response from the other node. This causes pcs to hang
indefinitely; eventually the upgrade times out and fails.
Disabling the VIPs before stopping the cluster avoids this
situation.
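For example (the resource matching is illustrative; the VIPs are
IPaddr2 resources in this setup):

    # Take the VIPs down first so pcs never talks over an address
    # that will vanish when pacemaker stops.
    for vip in $(pcs resource show | awk '/IPaddr2/ {print $1}'); do
      pcs resource disable "$vip"
    done
    pcs cluster stop --all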
Change-Id: I6bc59120211af28456018640033ce3763c373bbb
Closes-Bug: 1577570
|
The update and upgrade shell scripts were still referencing the
old openstack-keystone service, which was removed with
Ie26908ac9bfc0b84b6b65ae3bda711236b03d9d4.
This also removes Kilo- and Liberty-specific workarounds and config
changes.
Change-Id: Icc80904908ee3558930d4639a21812f14b2fd12e
|
Since swift isn't managed by pacemaker, we need to stop and start the
swift services manually via systemctl. This moves the duplicated
start/stop blocks into a common function (we already include
pacemaker_common_functions.sh here, so it may as well live there).
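Sketch of such a helper (the function name and unit list are
illustrative):

    systemctl_swift() {
      action=$1   # "start" or "stop"
      for unit in openstack-swift-account openstack-swift-container \
                  openstack-swift-object openstack-swift-proxy; do
        systemctl "$action" "$unit"
      done
    }

    systemctl_swift stop
    # ... yum update ...
    systemctl_swift start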
Change-Id: Ic4f23212594c1bf9edc39143bf60c7f6d648fd1d
|
Old overcloud images don't have python-zaqarclient installed, while new
overclouds' os-collect-config is configured with Zaqar support. Together
this means that on upgrade we need to install python-zaqarclient,
otherwise os-collect-config will be restarted during the yum update and
crash trying to import the missing Python module from zaqarclient.
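Roughly:

    # Install the module first so the os-collect-config restart
    # triggered by the update does not crash on a missing import.
    yum -q -y install python-zaqarclient
    yum -q -y update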
Change-Id: I3e875e14cb60b1b78aec0d9ddc412ccf865abd01
|
Quiet down yum during major upgrades to reduce the output size. This is
consistent with what was introduced into minor updates in change
I517271e8465885421a78b73c5af756816c37a977.
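I.e. roughly:

    yum -q -y update    # -q keeps the upgrade log output manageable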
Change-Id: Ie6b470e383fdf42870ac6f60ca43e44b4c446ebe
|
This parameter can be used for pinning (and later unpinning) the Nova
Compute RPC version.
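A hedged sketch of what pinning looks like on a compute node (assuming
the value lands in nova.conf's [upgrade_levels] section):

    # Pin compute RPC to the old release for the rolling upgrade...
    crudini --set /etc/nova/nova.conf upgrade_levels compute mitaka
    systemctl restart openstack-nova-compute
    # ...and unpin once every node runs the new code.
    crudini --del /etc/nova/nova.conf upgrade_levels compute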
Change-Id: I2f181f3b01f0b8059566d01db0152a12bbbd1c3e
|
Change-Id: I7226070aa87416e79f25625647f8e3076c9e2c9a
|