Age | Commit message (Collapse) | Author | Files | Lines |
|
Older neutron versions have a bug which makes them leave keepalived and
radvd running even after all neutron services are stopped, preventing
neutron router failover from happening. Router can then get stuck on the
inactive node, like this:
[stack@instack ~]$ neutron l3-agent-list-hosting-router default_router
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| 48ca9477-b93b-4305-9e6d-9f1c5d3388f0 | overcloud-controller-1.localdomain | True | :-) | standby |
| eba0575c-654f-4da6-b1cd-f7fdf1cd3726 | overcloud-controller-2.localdomain | True | :-) | standby |
| 68815390-251f-4425-a5f8-38bdbf3bdb90 | overcloud-controller-0.localdomain | True | xxx | active |
+--------------------------------------+------------------------------------+----------------+-------+----------+
We need to kill the leftover processes manually to prevent the state
described above from happening.
See https://review.gerrithub.io/#/c/248931
Change-Id: I2deaa176222983daa0c33ab52a6aa5dbe7365302
|
|
The neutron pcs constraints were reworked in
https://review.openstack.org/#/c/229466/
For overclouds deployed with older tripleo-heat-templates the
current pcs ordering constraints will not have those changes,
meaning that the behaviour discussed at
https://bugs.launchpad.net/tripleo/+bug/1501378 is likely
given we will stop and restart all services. This review
applies those, in short, remove the ovs-cleanup after
neutron-server and add openvswitch-agent instead. Detail in
the bug report and linked BZ.
Change-Id: I45822c5fe9029f11635400b7fbd386880ac80a4e
Related-Bug: 1501378
|
|
To avoid pcmk reconfiguring the resources on each config change,
we want to apply the constraints and timeouts from file.
We also *do not* want to alter the timeouts for a few ocf resources
which are rabbitmq, neutron-netns-cleanup and neutron-ovs-cleanup
Change-Id: I6875f19e1f34f0fdcf0928421f49b61d857ca7c8
Co-Authored-By: Andrew Beekhof <abeekhof@redhat.com>
|
|
When the cluster is brought back online after a yum update in
yum_update.sh, we should verify that galera is fully sync'd before
moving on. This ensures the sync is complete before moving on to update
any other nodes in the cluster.
Change-Id: Ie8fc2c5d5214deacea94ca658ac75359b318ced1
|
|
This matches change I6fc18f1ad876c5a25723710a3b20d8ec9519dcba, but we
need it to set it before attempting the cluster stop - yum update -
cluster start cycle, to make sure this cycle doesn't hit the low timeout
limits.
This can be removed once updates from deployments made prior to
I6fc18f1ad876c5a25723710a3b20d8ec9519dcba are no longer supported.
Change-Id: I587136d8d045d213875c657ea5a405074f80c8ad
|
|
Some missing pacemaker constraints were added in the following commits:
https://review.openstack.org/#/c/219770/
https://review.openstack.org/#/c/219665/
https://review.openstack.org/#/c/218931/
https://review.openstack.org/#/c/218930/
Overclouds that were deployed prior to these constraints being added to
tripleo-heat-templates still have the constraints missing. During an
update, stopping and starting the cluster can fail without these
constraints in place. As a workaround, conditionally add these
contraints in yum_update.sh so that we're sure they're always present
before updating.
Change-Id: Id46c85dbbe5e85d362279661091b17ce1b697fe0
|
|
|
|
|
|
Currently, we have a problem because the unregistration happens in the
"post deploy" phase, which works fine when the top-level stack is being
deleted, but not when the ResourceGroup of servers is being scaled down,
because then the normal "post deploy" update ordering is respected and
we try to unregister after the corresponding server has been deleted.
So, instead, register/unregister each node inside the unit of scale,
e.g the role template being scaled down, which is possible via the new
NodesExtraConfig interface, which means unregistration will take
place at the right time both on stack delete and on scale-down.
Change-Id: I8f117a49fd128f268659525dd03ad46ba3daa1bc
|
|
Currently package updates won't occur on a single node
non-HA pacemaker managed Controller because stopping
the node loses the quorum of 1.
This change gets the count of current nodes in the cluster and
if the count is 1 then specify --force when doing a pcs cluster stop.
Change-Id: I0de2488e24f1ef53a935dbc90ec6de6142bb4264
|
|
This change adds alternative logic for handling package updates
on a pacemaker managed node.
"yum list updates" is now run and this script exits early if
there are no packages to update.
If the pacemaker service is not running then the previous puppet
logic remains, so a package update is performed which excludes packages
managed by puppet, and a flag is set to indicate that puppet should
perform an ensure=>latest on all packages it manages.
However if the pacemaker service is running, the following occurs:
- pcs cluster stop is run for this node
- a full yum update is performed
- pcs cluster start is run for this node
- pcs status is run until the hostname for this node appears in the
Online list
This means that puppet is not involved in the package update process when
the node is managed by pacemaker.
Change-Id: I5ad118552d053dbda280978751167d9fd9da9874
|
|
This change updates yum_update.sh so that we set set a boolean
output when "managed" packages should get updated. The
output is named 'update_managed_packages' and for the
puppet implementation it is wired up so that it
directly sets tripleo::packages::enable_upgrade to
control whether packages are updated.
It also modifies yum_update.sh to build a yum update excludes list for
packages managed by puppet. The exclude lists are being
generated via puppet-tripleo as well via the new 'write_package_names'
function that is now wired into all the role manifests.
This change does not actually trigger the puppet apply. The fix for
Related-Bug: #1463092 will be used to trigger the puppet run when the
hiera changes. As a minor tweak to this logic we append the
UpdateIdentifier to the config_identifier so that we ensure
puppet gets executed on an update where other (non-related)
hiera changes also occur.
Co-Authored-By: Dan Prince <dprince@redhat.com>
Change-Id: I343c3959517eae38bbcd43648ed56f610272864d
|
|
Adds hook to enable additional "AllNodes" config to be performed prior
to applying puppet - this is useful when you need to build
configuration data which requires knowledge of all nodes in a cluster,
or of the entire deployment.
As an example, there is a sample config template which collects the
hostname and mac addresses for all nodes in the deployment then writes
the data to all Controller nodes. Something similar to this may be
required to enable creation of the nexus_config in
https://review.openstack.org/#/c/198754/
There's also another, simpler, example which shows how you could share
the output of an OS::Heat::RandomString between nodes.
Change-Id: I8342a238f50142d8c7426f2b96f4ef1635775509
|
|
In the case of using portal registration with an
activation key, the RHEL registration script is still
executing a `subscription-manager attach` command. This
should not happen if an activation key is provided. This
is because an activation key already provides the
subscriptions to attach.
Change-Id: I2907bede28a9b7bef71cedeea69c876eb4949df0
|
|
The recently added cinder-netapp extraconfig contains some additional
hieradata which needs to be applied during the initial pre-deployment
phase, e.g in controller-puppet.yaml (before the manifests are applied)
so wire in a new OS::TripleO::ControllerExtraConfigPre provider resource
which allows passing in a nested stack (empty by default) which contains
any required "pre deployment" extraconfig, such as applying this hieradata.
Some changes were required to the cinder-netapp extraconfig and environment
such that now the hieradata is actually applied, and the parameter_defaults
specified will be correctly mapped into the StructuredDeployment.
Change-Id: I8838a71db9447466cc84283b0b257bdb70353ffd
|
|
Currently we've got a mix of SoftwareConfig resource with
StructuredDeployments resources - while this will work it's
inconsistent and normally using the corresponding
SoftwareDeployments resouce is encourgaged instead.
Change-Id: I308d62d4ff491c073e3e8650fd4c2c65bf96d14a
|
|
|
|
This change adds config and deployment resources to trigger package
updates on nodes. The deployments are triggered by doing a stack-update
and setting one of the parameters to a unique value.
The intent is that rolling update will be controlled by setting
breakpoints on all of the UpdateDeployment resources inside the
role resource groups.
Change-Id: I56bbf944ecd6cbdbf116021b8a53f9f9111c134f
|
|
Enables support for configuring Cinder with a NetApp backend.
This change adds all relevant parameters for:
- Clustered Data ONTAP (NFS, iSCSI, FC)
- Data ONTAP 7-Mode (NFS, iSCSI, FC)
- E-Series (iSCSI)
Change-Id: If6c6e511ef2d26c4794e3b37c61e5318485ff4db
|
|
Adds a potential usage of the post-deploy hooks to register a server
with RHN or a satellite.
Note this requires some additional parameters, which can be specified in
environment_rhel_reg.yaml, and this must be passed into the call to heat
via another -e parameter. An alternative may be to have a global
extraconfig_env.yaml at the top level, which the scripts always pass, or
to use the global environment (/etc/heat/environment.d/default.yaml) on
the seed.
Co-Authored-By: James Slagle <jslagle@redhat.com>
Change-Id: Ia6fd270122cbc2e51beb672654e5e1ebd3bd2966
|
|
Adds optional hooks which can run operator defined additional config on
nodes after the application deployment has completed.
Change-Id: I3f99e648efad82ce2cd51e2d5168c716f0cee8fe
|