author | Jiri Stransky <jistr@redhat.com> | 2015-12-17 14:40:15 +0100 |
---|---|---|
committer | Jiri Stransky <jistr@redhat.com> | 2016-01-04 18:15:15 +0100 |
commit | ac7467bacddb98fbb9fc9716ba56cad0d6f282de | |
tree | 121845068503cb6e2d06f99fa8cd1846c159e43a | |
parent | bce5f65f2e036f6ef5232fdfc8025d7f864faa56 | |
Wait for cluster to settle in yum_update.sh

Occasionally we hit "Error: unable to push cib" during update. This is
probably because, when we try to replace the CIB in yum_update.sh,
services on the previously updated controller are still coming up and
changing the CIB, which races and conflicts with the cib push from
yum_update.sh.

This commit adds waiting for the cluster to settle before exiting from
yum_update.sh, to avoid this kind of conflict.

Also, a check for cib-push success is added, so that the update fails
properly instead of hanging indefinitely, as we've observed with this
issue.

Change-Id: I953087e0e565474ac553fd57bea2459d2e3a6081
Closes-Bug: #1527644
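
The success check described in the message can be sketched in isolation. This is a minimal illustration, not the script itself: `push_cib` is a hypothetical stand-in for `pcs cluster cib-push` that always fails, and `apply_config` and the file path are likewise made up for the example.

```shell
#!/bin/bash

# Hypothetical stand-in for `pcs cluster cib-push`. It always fails, to
# simulate a push that conflicts with concurrent CIB changes.
push_cib() {
    echo "Error: unable to push cib" >&2
    return 1
}

# Run the push and turn a non-zero exit status into an explicit error,
# instead of carrying on as if the new config had been applied.
apply_config() {
    if ! push_cib "$1"; then
        echo "ERROR failed to apply new pacemaker config"
        return 1
    fi
    echo "config applied"
}

# The path is a placeholder; the stub never reads it.
apply_config /tmp/example-cib.xml || echo "update would abort with status $?"
```

In the real script the failure path calls `exit 1`, so the update stops as soon as the push is rejected.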
-rwxr-xr-x | extraconfig/tasks/yum_update.sh | 12 |
1 files changed, 11 insertions, 1 deletions
diff --git a/extraconfig/tasks/yum_update.sh b/extraconfig/tasks/yum_update.sh
index e32369e1..2d6b8cc2 100755
--- a/extraconfig/tasks/yum_update.sh
+++ b/extraconfig/tasks/yum_update.sh
@@ -24,6 +24,7 @@ update_identifier=${update_identifier//[^a-zA-Z0-9-_]/}
 # seconds to wait for this node to rejoin the cluster after update
 cluster_start_timeout=600
 galera_sync_timeout=360
+cluster_settle_timeout=1800
 
 timestamp_file="$timestamp_dir/$update_identifier"
 if [[ -a "$timestamp_file" ]]; then
@@ -128,7 +129,10 @@ openstack-nova-scheduler"
     pcs -f $pacemaker_dumpfile resource update mongod op stop timeout=100s
 
     echo "Applying new Pacemaker config"
-    pcs cluster cib-push $pacemaker_dumpfile
+    if ! pcs cluster cib-push $pacemaker_dumpfile; then
+        echo "ERROR failed to apply new pacemaker config"
+        exit 1
+    fi
 
     echo "Pacemaker running, stopping cluster node and doing full package update"
     node_count=$(pcs status xml | grep -o "<nodes_configured.*/>" | grep -o 'number="[0-9]*"' | grep -o "[0-9]*")
@@ -188,6 +192,12 @@ if [[ "$pacemaker_status" == "active" ]] ; then
         fi
     done
 
+    echo "Waiting for pacemaker cluster to settle"
+    if ! timeout -k 10 $cluster_settle_timeout crm_resource --wait; then
+        echo "ERROR timed out while waiting for the cluster to settle"
+        exit 1
+    fi
+
     pcs status
 
 else
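
The bounded wait in the last hunk can be tried without a Pacemaker cluster. In this sketch `sleep` stands in for `crm_resource --wait`, and the `wait_bounded` helper and the one-second limit are hypothetical, chosen only so the example finishes quickly (the commit uses 1800 seconds). It assumes GNU coreutils `timeout`.

```shell
#!/bin/bash

# Hypothetical short limit so the sketch finishes quickly.
settle_timeout=1

# Run the given command under `timeout`. -k 10 sends SIGKILL if the
# command is still alive 10 seconds after the initial SIGTERM.
wait_bounded() {
    local limit=$1
    shift
    if ! timeout -k 10 "$limit" "$@"; then
        echo "ERROR timed out while waiting for the cluster to settle"
        return 1
    fi
    echo "cluster settled"
}

# `sleep` stands in for `crm_resource --wait`.
wait_bounded "$settle_timeout" sleep 0
wait_bounded "$settle_timeout" sleep 3 || echo "wait failed with status $?"
```

As in the diff, the `if !` form reports any non-zero status as a timeout. GNU `timeout` exits with 124 when the limit is actually reached, so a script that needs to tell a timeout apart from another failure of the wrapped command could inspect that status.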