This commit does the following:
1. We now explicitly disable/stop and then remove the resources that are
moving to systemd. We do this because we want to make sure they are all
stopped before doing a yum upgrade, which otherwise would take ages due
to rabbitmq and galera being down. It is best if we do this via pcs
while we do the HA Full -> HA NG migration because it is simpler to make
sure all the services are stopped at that stage. For extra safety we can
still do a check by hand. By doing it via pacemaker we have the
guarantee that all the migrated services are down already when we stop
the cluster (which happens to be a synchronization point between all
controller nodes). That way we can be certain that they are all down on
all nodes before starting the yum upgrade process.
2. We actually need to start the systemd services in
major_upgrade_controller_pacemaker_2.sh and not stop them.
3. We need to use the proper bash variable name.
4. Use is_bootstrap_node everywhere to make the code more consistent.
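A minimal sketch of the ordering described in point 1, assuming the caller
passes in the resource names. disable_and_remove_resources is a hypothetical
helper, not the function from this patch; it relies only on
"pcs resource disable" and "pcs resource delete".

```shell
# Hypothetical helper: the name and the way resources are passed in are
# illustrative assumptions, not taken from the patch.
disable_and_remove_resources() {
    local res
    # First pass: ask pacemaker to stop every resource that moves to systemd.
    for res in "$@"; do
        pcs resource disable "$res" || return 1
    done
    # Second pass: remove the resources from the cluster configuration.
    for res in "$@"; do
        pcs resource delete "$res" || return 1
    done
}

# Usage (resource names are placeholders):
#   disable_and_remove_resources some-resource another-resource
```

Because pcs changes apply to the whole cluster, a real script would
presumably run this on one node only, before the cluster is stopped.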
Change-Id: Ic565c781b80357bed9483df45a4a94ec0423487c
Closes-Bug: #1627490
|
|
Currently we do not disable openstack-cinder-volume during our
major-upgrade-pacemaker step. This leads to the following scenario. In
major_upgrade_controller_pacemaker_2.sh we do:
    start_or_enable_service galera
    check_resource galera started 600
    ....
    if [[ -n $(is_bootstrap_node) ]]; then
    ...
        cinder-manage db sync
    ...
What happens here is that since openstack-cinder-volume was never
disabled it will already be started by pacemaker before we call
cinder-manage and this will give us the following errors during the
start:
06:05:21.861 19482 ERROR cinder.cmd.volume DBError:
(pymysql.err.InternalError) (1054, u"Unknown column 'services.cluster_name' in 'field list'")
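One way to picture the intended ordering is sketched below. Both function
names are hypothetical, and the CINDER_MANAGE variable exists only so the
command can be swapped out when experimenting; the patch itself may be
structured differently.

```shell
# Step 1 (before the cluster is stopped): mark the resource as disabled so
# pacemaker does not start it when the cluster comes back up.
pre_upgrade_disable_cinder_volume() {
    pcs resource disable openstack-cinder-volume
}

# Step 2 (after galera is up): update the schema first, and only then let
# pacemaker start the service again.
sync_then_enable_cinder_volume() {
    # CINDER_MANAGE is an illustrative override; it defaults to cinder-manage.
    "${CINDER_MANAGE:-cinder-manage}" db sync || return 1
    pcs resource enable openstack-cinder-volume
}
```

With this ordering the service never runs new code against the old schema,
which is what triggers the "Unknown column" error above.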
Change-Id: I01b2daf956c30b9a4985ea62cbf4c941ec66dcdf
Closes-Bug: #1627470
|
|
Currently, major_upgrade_controller_pacemaker_2.sh calls
ceilometer-dbsync before mongod is actually started (only galera is
started at this point). This makes the dbsync hang indefinitely
until the heat stack times out.
Starting mongod before the dbsync call should be okay, but note that when
we start mongod via systemctl we are not guaranteed that it will be up on
all nodes before we call ceilometer-dbsync. This *should* be okay because
ceilometer-dbsync keeps retrying and eventually one of the nodes will
be available. A completely clean fix would be to add another
step in heat that guarantees all mongo servers are up and
running before the dbsync call.
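As a rough illustration of the caveat, a script could at least wait for the
local mongod before running the dbsync. wait_for_active is a hypothetical
helper and the 600 second default is an arbitrary assumption; it checks only
the node it runs on, so it does not provide the cross-node guarantee that an
extra heat step would.

```shell
# Hypothetical helper: poll a systemd unit until it is active or the
# timeout (in seconds) expires. Only the local node is checked.
wait_for_active() {
    local service="$1"
    local timeout="${2:-600}"
    local elapsed=0
    until systemctl is-active --quiet "$service"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "ERROR: $service not active after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Usage:
#   systemctl start mongod && wait_for_active mongod 600 && ceilometer-dbsync
```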
Change-Id: I10c960b1e0efdeb1e55d77c25aebf1e3e67f17ca
Closes-Bug: #1627453
|
|
With commit fb25385d34e604d2f670cebe3e03fd57c14fa6be
"Rework the pacemaker_common_functions for M..N upgrades" we
accidentally removed some lines that fixed M/N upgrade issues.
Namely:
extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh
-# https://bugzilla.redhat.com/show_bug.cgi?id=1284047
-# Change-Id: Ib3f6c12ff5471e1f017f28b16b1e6496a4a4b435
-crudini --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend rabbit
-# https://bugzilla.redhat.com/show_bug.cgi?id=1284058
-# Ifd1861e3df46fad0e44ff9b5cbd58711bbc87c97 Swift Ceilometer middleware no longer exists
-crudini --set /etc/swift/proxy-server.conf pipeline:main pipeline "catch_errors healthcheck cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server"
-# LP: 1615035, required only for M/N upgrade.
-crudini --set /etc/nova/nova.conf DEFAULT scheduler_host_manager host_manager
extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
nova-manage db sync
- nova-manage api_db sync
This patch simply puts that code back without reverting the
whole commit that broke things, because the rest of that commit is
still needed.
Closes-Bug: #1627448
Change-Id: I89124ead8928ff33e6b6907a7c2178169e91f4e6
|
|
This is the initial work to have a function that migrates a full HA
architecture as deployed in Mitaka to the HA architecture as deployed in
Newton where only a few resources are managed by pacemaker.
The sequence is the following:
1) We remove the desired services from pacemaker's control. The services
at this point are still running normally via the systemd service as
invoked by pacemaker
2) We do a "systemctl stop <service>" on all controllers for all the
services that were removed from pacemaker's control. We do this to make
sure that during the yum upgrade, the %post sections that call
"systemctl try-restart" do not take ages, because at this point during
the upgrade rabbit is down. The only exceptions are "openstack-core"
and "delay" which are dummy pacemaker resources that do not exist on
the system
3) We do a "systemctl start <service>" on all nodes for all the services
mentioned above.
We should probably merge this patch only when newton has branched as it
is very specific to the M/N upgrade.
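Steps 2 and 3 can be sketched as follows. real_services and systemctl_each
are hypothetical names, and the service list is supplied by the caller; the
only detail taken from the description above is that "openstack-core" and
"delay" are skipped because they have no systemd unit.

```shell
# Print only the names that correspond to real systemd units.
real_services() {
    local svc
    for svc in "$@"; do
        case "$svc" in
            openstack-core|delay) ;;   # dummy pacemaker resources, skip them
            *) echo "$svc" ;;
        esac
    done
}

# Run "systemctl <action>" (stop or start) for every real service.
systemctl_each() {
    local action="$1" svc
    shift
    for svc in $(real_services "$@"); do
        systemctl "$action" "$svc" || return 1
    done
}

# Usage, on every controller (service names are placeholders):
#   systemctl_each stop  openstack-core some-service delay   # before yum upgrade
#   systemctl_each start openstack-core some-service delay   # after the upgrade
```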
Closes-Bug: 1617520
Change-Id: I4c409ce58c1a57b6e0decc3cf168b62698b32e39
|
|
For N we cannot assume services are managed by pacemaker.
This adds functions that check whether a service is systemd or
pcmk managed and start/stop it accordingly. For example, a pcmk
resource is stopped/disabled only on the bootstrap node, whereas
a systemd service is stopped/started on all controllers.
There is also an equivalent change to check_resource,
which has been reworked to handle both pcmk and systemd.
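The dispatch could look roughly like the sketch below. Both function names
are hypothetical, and treating "pcs resource show <name>" succeeding as proof
that a service is pacemaker-managed is an assumption, not necessarily how the
patch detects it. is_bootstrap_node is assumed to be provided by the existing
common functions and to print a non-empty string only on the bootstrap node.

```shell
# Assumption: a name that "pcs resource show" knows about is pcmk managed.
is_pacemaker_managed() {
    pcs resource show "$1" >/dev/null 2>&1
}

# Hypothetical dispatcher: start a service through whichever manager owns it.
start_service() {
    local service="$1"
    if is_pacemaker_managed "$service"; then
        # pcs changes are cluster-wide, so only the bootstrap node issues them.
        if [ -n "$(is_bootstrap_node)" ]; then
            pcs resource enable "$service"
        fi
    else
        # systemd units are per node, so every controller starts its own.
        systemctl start "$service"
    fi
}
```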
Implements: blueprint overcloud-upgrades-workflow-mitaka-to-newton
Change-Id: Ic8252736781dc906b3aef8fc756eb8b2f3bb1f02
|
|
The nova api db needs to be synchronized as well.
Change-Id: I2628b24ff1153c84cbf388455666ae42570cb10f
Closes-Bug: 1615042
|
|
If "pcs cluster stop --all" is executed on a controller that
happens to have a VIP on the internal network, pcs may use the
VIP as the source address for communication with another cluster
node. When pacemaker is stopped this VIP goes away, and pcs never
receives a response from the other node. This causes pcs to hang
indefinitely; eventually the upgrade times out and fails.
Disabling the VIPs before stopping the cluster avoids this
situation.
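A minimal sketch of that ordering, assuming the VIP resource names are known
to the caller. stop_cluster_without_vips is a hypothetical helper; how the
patch actually discovers the VIPs is not shown here.

```shell
# Hypothetical helper: disable the given VIP resources, then stop the
# cluster on all nodes.
stop_cluster_without_vips() {
    local vip
    for vip in "$@"; do
        pcs resource disable "$vip" || return 1
    done
    pcs cluster stop --all
}

# Usage (VIP resource names are placeholders):
#   stop_cluster_without_vips some-vip-resource another-vip-resource
```

A real script would probably also confirm that each VIP has stopped before
calling "pcs cluster stop --all", since a disable request can return before
the resource is actually down.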
Change-Id: I6bc59120211af28456018640033ce3763c373bbb
Closes-Bug: 1577570
|
|
The update and upgrade shell scripts were still referencing the
old openstack-keystone service, which was removed with
Ie26908ac9bfc0b84b6b65ae3bda711236b03d9d4.
Also removes kilo and liberty specific workarounds and config changes.
Change-Id: Icc80904908ee3558930d4639a21812f14b2fd12e
|
|
Since swift isn't managed by pacemaker we need to manually (systemctl)
stop and start the swift services. This moves the duplicate blocks for
start/stop into a common function (we already include
pacemaker_common_functions.sh here, so we may as well).
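Such a common function might look like the sketch below. The function name
and the list of swift units are illustrative assumptions; the real function
may select the services differently.

```shell
# Hypothetical common function: apply one systemctl action to the swift units.
systemctl_swift() {
    local action="$1" svc
    case "$action" in
        start|stop) ;;
        *) echo "usage: systemctl_swift start|stop" >&2; return 1 ;;
    esac
    # Assumed unit names; adjust to the units actually installed.
    for svc in openstack-swift-proxy openstack-swift-account \
               openstack-swift-container openstack-swift-object; do
        systemctl "$action" "$svc" || return 1
    done
}

# Usage:
#   systemctl_swift stop     # before the package upgrade
#   systemctl_swift start    # afterwards
```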
Change-Id: Ic4f23212594c1bf9edc39143bf60c7f6d648fd1d
|
|
Change-Id: I7226070aa87416e79f25625647f8e3076c9e2c9a
|