When deploying via ipv6, rabbitmqctl commands have the following
issues:
- `rabbitmqctl cluster_status` shows nodedown alerts
- list_queues / list_connections hang
- `rabbitmqctl node_health_check` fails with an error.
* There are no issues when performing activities on the RHOS setup
  (from horizon/CLI), i.e. the RHOS environment is functioning as
  expected.
For example:
    sudo rabbitmqctl node_health_check -n rabbit@node1
    Checking health of node 'rabbit@node1' ...
    Heath check failed:
    health check of node 'rabbit@node1' fails: nodedown
The problem is that we are missing the following in
/etc/rabbitmq/rabbitmq-env.conf:
    RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
Fix these by setting the appropriate RABBITMQ_CTL_ERL_ARGS when
deploying ipv6.
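As a rough illustration of the fix, the sketch below emits the extra
rabbitmq-env.conf line only for IPv6 deployments. The function name
rabbitmq_env_lines and the rule "the bind address parses as IPv6" are
assumptions made for this example; the actual manifest may decide
differently.

```python
import ipaddress


def rabbitmq_env_lines(bind_address):
    """Return extra rabbitmq-env.conf lines for the given bind address.

    Hypothetical helper: the deployment is treated as IPv6 when the
    address parses as an IPv6 address.
    """
    if ipaddress.ip_address(bind_address).version == 6:
        # The setting named in the commit message above.
        return ['RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"']
    return []


if __name__ == "__main__":
    for address in ("fd00::10", "192.0.2.10"):
        print(address, rabbitmq_env_lines(address))
```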
Closes-Bug: #1633693
Change-Id: I53f4e76e687b3966fbb74fd0c2d83f05176630de
|
|
proxy for the UI
Change-Id: I74eac4bbfc16720eeb6e2bf0ee251689dde3bafc
Implements: enable-communication-ui-undercloud
|
|
We use the rabbit_hosts configuration for most of our services but we
haven't been adding the configured port. This patch appends the port
provided to the service's heat template to each IP in the list.
Note: while we could use the value set for the rabbitmq server in
rabbitmq::port, it doesn't allow for dealing with SSL. This change is
also backwards compatible with the RabbitClientPort parameters used in
the heat templates.
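A minimal sketch of the transformation, assuming a plain list of IPv4
addresses and a single port. The helper name with_port is hypothetical,
and bracketing of IPv6 literals is deliberately not handled here.

```python
def with_port(hosts, port):
    """Append the given port to every host in the list.

    Hypothetical helper for illustration only; IPv6 literals would
    need extra handling that is not shown.
    """
    return ["{}:{}".format(host, port) for host in hosts]


if __name__ == "__main__":
    # 5672 stands in for whatever port the heat template provides.
    print(with_port(["192.0.2.5", "192.0.2.6"], 5672))
```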
Change-Id: I0000f039144a6b0e98c0a148dc69324f60db3d8b
Closes-Bug: #1633580
|
|
Since moving to composable services/roles, some logic here was still
relying on a variable to enable ODL rather than letting the service
itself decide where ODL is enabled. Now that ODL and ODL OVS
configuration are split into 2 different services we can make these
truly composable.
Partial-Bug: 1633625
Change-Id: Ia55c05e12d5d434111a13e1ed795da530e3ff4a5
Signed-off-by: Tim Rozet <trozet@redhat.com>
|
|
|
|
Change-Id: Ie215289a7be681a2b1aa5495d3f965c005d62f52
Depends-On: Ia863b38bbac1aceabe6b7deb6939c9db693ff16d
|
|
This adds the necessary resources to the manifest to run cinder
over httpd. The service name will be moved to t-h-t in a
subsequent commit, but since this patch depends on t-h-t, we try to
avoid circular dependencies of repos.
Change-Id: I950257e3b5d8db071752e53557115429574e98e2
Depends-On: Ic1967a6f4f60a273965811516f33121115d518b4
|
|
The patch making nova run over httpd added migration logic to
stop nova-api. However, this doesn't work since nova-metadata is
running in the same process. Now, the fact that it was running
seems to be just luck, since the systemctl runs, then we start the
service via the nova::api resource. So this is fragile in its
current state.
This then removes the exec, as we don't need it for the migration.
Change-Id: I4603b81d30a704b07eef461b3cdbfe164614b04f
|
|
By enabling the statistics socket we allow the collection
of statistics over time for haproxy.
This socket is set to "user" level, so this socket is limited
to read-only. The "stats timeout" line is optional, but since the
default timeout of the stats socket is 10s, we set this higher.
Change-Id: I22d3ab771e981be0d2c74b60443d276973bc1639
|
|
|
|
Instead of using an operator to make sure we upgrade packages before any
service, which causes dependency cycles with the iptables puppet module,
let's take another approach where we upgrade rpms in the 'setup' stage,
which is a stage that runs before configuring and running services.
That way, we remove the dependency cycles and make sure packages are
upgraded before configuring and running TripleO services.
Change-Id: I1be83f88be1959885c980ab4f428477d412751f7
|
|
|
|
This needs to happen on the node running keystone, or things break
when you try to deploy e.g. the heat_engine service on a non-Controller
role. We check the enabled flag for heat engine so this only happens
if the heat_engine service is running on some (any) role.
Partial-Bug: #1631130
Change-Id: Ib088a572b384b479f51d56555734d78ab840a1f3
|
|
We can now get this parameter from t-h-t, so it's not needed here.
Change-Id: I014e7b3a6feb5609ace2e8ef1e4df11448b0a0cc
Depends-On: Ic229182cc5c887b57f6182c3db1bac8bed330f7c
|
|
|
|
|
|
|
|
Change-Id: I78049105adf52226d47cc6764b1ba6c2c06e91e5
Related-Bug: 1631926
|
|
|
|
remove_default_accounts is a mysql::server parameter that, when set to
True, executes some MySQL commands to clean up the default MySQL accounts
created by packaging.
In order to successfully run the commands, we need MySQL up and running,
which is not the case at step 1 but is at step 2.
This patch makes sure we run the commands at step 2 on the pacemaker
master only.
No change for scenarios without Pacemaker.
Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e
Closes-Bug: #1633113
|
|
Currently the /var/lib/tripleo/pacemaker-restarts directory is created
only when the base/pacemaker.pp file is included in the manifest. There
is a notification that ensures precedence order and triggers the touch.
The trigger and the dependency on base/pacemaker.pp should not be
required, as someone using tripleo::pacemaker::resource_restart_flag
would expect the file to be created no matter what.
For instance, the Cinder upgrade in the convergence step has this
defined:
    Cinder_config<||> ~> Tripleo::Pacemaker::Resource_restart_flag["${::cinder::params::volume_service}"]
but in the convergence step, base/pacemaker.pp is not included and
the above trigger fails as the directory is not created.
The same appears to apply to manila.pp.
This patch removes the trigger and ensures the directory is created when
needed.
Change-Id: Ic3aa82c818662e9e88e21c8381d657adef5b43ac
Closes-Bug: #1632232
|
|
This adds the necessary resources to the manifest to migrate nova
to run over httpd. The service name will be moved to t-h-t in a
subsequent commit, but since this patch depends on t-h-t, we try to
avoid circular dependencies of repos.
Change-Id: I91d430a3871672f90b0f885736f067ddae3c238c
Depends-On: I57fb20cf0d58b3376243ba4aeb04e995e7152ce3
|
|
|
|
|
|
The 'stop timeout' values of the rabbitmq and redis pacemaker resources
are set to 90s, while for all other services they are set to 200s.
The overcloud deployment sometimes fails due to this with the error:
    Error: Could not complete shutdown of rabbitmq-clone, 1 resources
    remaining
    Error performing operation: Timer expired
This patch updates the timeout for Redis and RabbitMQ to avoid this
error.
Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
|
|
Tempest expects object versioning to be enabled by default in Swift;
if not it has to be disabled explicitly in the Tempest config.
This is a commonly used middleware, therefore it should be enabled
in the overcloud proxy nodes as well.
Closes-Bug: 1632215
Change-Id: I07a206473ff7939749e3eba1dfe3ea8c4526eb5c
|
|
|
|
The hiera key generated by THT is eqlx_chap_password and not
eql_san_password.
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/extraconfig/pre_deploy/controller/cinder-eqlx.yaml#L63
Change-Id: Ic062d9060f0ce437336e2bd6aaca3887fc33c8cf
Closes-Bug: #1631527
|
|
The ceilometer::db::sync is included by default in ceilometer::db but we
only want it to run on the bootstrap node. This change passes the
sync_db parameter to ceilometer::db to manage the db sync process rather
than trying to manage the inclusion of ceilometer::db::sync within the
profile class.
Change-Id: Ib56db1a90dd6fbfe7582fc57b7728df81942cce2
Closes-Bug: #1629373
|
|
This change makes the Swift API and Storage roles include the
::swift::config class, as we do for the other OpenStack services,
which is useful to push arbitrary config settings into Swift.
Change-Id: Iaf2c2f0f0103fe9264ce875099a1578b353a5558
|
|
Aligns the way we check for enabled backends in
pacemaker/manila.pp with what we did in base/manila/api.pp with [1].
The benefit is that we don't need to emit custom hiera from the
templates.
1. I86ba8b9d5872c0f1a94e74215e97b796ad129bfb
Change-Id: I04e28a95e8d69a24cd3df109bf1802bfcbd941db
|
|
When deploying manila with cephfs, share creation fails because
'enabled_share_protocols' sticks to NFS,CIFS and does not get updated
with CEPHFS. This change aims at fixing it by building the list of
enabled protocols based on the list of enabled backends.
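The idea of deriving the protocol list from the enabled backends can be
sketched as follows. The backend names and the backend-to-protocol
mapping in BACKEND_PROTOCOLS are assumptions for illustration and are
not taken from the manifest.

```python
# Hypothetical mapping from backend name to the share protocols it serves.
BACKEND_PROTOCOLS = {
    "generic": ["NFS", "CIFS"],
    "netapp": ["NFS", "CIFS"],
    "cephfs": ["CEPHFS"],
}


def enabled_share_protocols(enabled_backends):
    """Build the comma-separated protocol list from the enabled backends."""
    protocols = []
    for backend in enabled_backends:
        for protocol in BACKEND_PROTOCOLS.get(backend, []):
            # Keep first-seen order and avoid duplicates.
            if protocol not in protocols:
                protocols.append(protocol)
    return ",".join(protocols)


if __name__ == "__main__":
    print(enabled_share_protocols(["generic", "cephfs"]))
```

With only cephfs enabled, this sketch yields CEPHFS alone rather than
the fixed NFS,CIFS value described above.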
Co-Authored-By: Tom Barron <tbarron@redhat.com>
Closes-Bug: 1630564
Change-Id: I86ba8b9d5872c0f1a94e74215e97b796ad129bfb
|
|
|
|
|
|
|
|
|
|
The service profile in HAProxy has the capability of creating
certificates based on a map. The idea is to standardize this, as
some of those certificates should match certain networks the services
are listening on (with the exception of the external network which is
handled differently and the tenant network which doesn't need a
certificate). So, based on which network a certain service is
listening on, we fetch the appropriate certificate.
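A rough sketch of the lookup described above, assuming the map is keyed
directly by network name. The function name, the map layout and the
file names are hypothetical; the real profile may structure this
differently.

```python
# Networks that do not get a certificate from the map: the external
# network is handled separately and the tenant network needs none.
NETWORKS_WITHOUT_MAPPED_CERT = ("external", "tenant")


def certificate_for(network, certificates_by_network):
    """Return the certificate entry for the network a service listens on."""
    if network in NETWORKS_WITHOUT_MAPPED_CERT:
        return None
    return certificates_by_network.get(network)


if __name__ == "__main__":
    # Hypothetical map contents, for illustration only.
    certs = {"internal_api": "internal_api.pem", "storage": "storage.pem"}
    print(certificate_for("internal_api", certs))
    print(certificate_for("tenant", certs))
```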
bp tls-via-certmonger
Change-Id: I89001ae32f46c9682aecc118753ef6cd647baa62
|
|
|
|
We're not able to use FQDNs yet, so to work around this, we give
precedence to a "short name" list we'll get from t-h-t. We can
migrate to using FQDNs in the next cycle.
Change-Id: Ic6fec1057439ed9122d44ef294be890d3ff8a8ee
Related-Bug: #1628521
|
|
|
|
The UI expects a Keystone endpoint URL that includes the version
(without it, it is not possible to log in). Looking at the
dist/tripleo_ui_config.js.sample configuration sample in the tripleo-ui
repository, the current expectation is a v2.0 URL so let's use that for
now.
Change-Id: I4ca04b16251fbee264cd4ce5e5433c2c1cb6d2f0
Closes-Bug: #1630546
|
|
Right now we're hardcoding the server names for the services to be
the controllers. This is problematic if we start using custom roles
for services, which listen on nodes that are not controllers.
We already have the server names for each service, so using this
mapping instead fixes the issue.
Change-Id: Ic4b65edb3dc1b75abbc3421a87cab97425b058c4
Closes-Bug: #1629098
|
|
We're not able to use FQDNs yet, so to work around this, we give
precedence to a "short name" list we'll get from t-h-t.
Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8
Related-Bug: #1628521
|
|
It turns out that reducing the number of rabbitmq queue copies in the
cluster significantly improves cluster performance, especially
failover recovery time. Right now the cluster uses ha-all mode for
rabbitmq queues.
It is best to change this to "ha-exactly" mode and reduce the number
of queue copies to ceil(N/2) where N is the number of controllers in the
cluster, so in the typical scenario of 3 controllers it would be 2 by
default.
It does not make much sense to keep copies of queues across the whole
cluster since if the quorum of nodes is lost then the rest of the cluster
nodes will be stopped anyway. We let the user override this with a
parameter.
I.e. for a 3 node controlplane cluster we will go from this:
    pcs resource show rabbitmq
    Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
    Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
To this:
    pcs resource show rabbitmq
    Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
    Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
According to Marian Krcmarik's testing, recovery time from failure was
reduced significantly.
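The ceil(N/2) rule and the resulting policy body can be checked with a
small sketch. The function name ha_policy is hypothetical; only the
formula and the JSON keys come from the message above.

```python
import json
import math


def ha_policy(controller_count):
    """Return the policy JSON for ceil(N/2) queue copies."""
    copies = math.ceil(controller_count / 2)
    # Compact separators match the form shown in the pcs output.
    return json.dumps(
        {"ha-mode": "exactly", "ha-params": copies},
        separators=(",", ":"),
    )


if __name__ == "__main__":
    for count in (1, 3, 5):
        print(count, ha_policy(count))
```

For 3 controllers this gives 2 copies, matching the default described
in the message.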
Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com>
Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084
Partial-Bug: #1628998
|
|
We added code in t-h-t to strip empty services from the service_names
list. (These are often the result of a service set to OS::Heat::None).
As such we can now drop this puppet reject statement.
Change-Id: Ie66f14f183de7e44a1f69af862f7d4be9a14c904
|
|
|
|
When updating the package with yum directly, a new httpd config file is
created with a different name than the one used by Puppet, causing
httpd to fail. Cleaning out the package config file and keeping it
around means it won't get overwritten on update, and is the way other
projects such as puppet-horizon handle this.
Change-Id: I539729ce4cd0898f8b0f3f26266e4e6d55b99e37
Closes-Bug: #1628983
|
|
|
|
Back in the Mitaka cycle via the change If6b43982c958f63bc78ad997400bf1279c23df7e
we made sure that the default start and stop timeouts for pacemaker
systemd resources are 200s (>= twice the default 90s DefaultTimeoutStopSec
in systemd). We made this change by setting puppet resource defaults for
the Pacemaker::Resource::Service class:
    Pacemaker::Resource::Service {
      op_params => 'start timeout=200s stop timeout=200s',
    }
The problem is that after the composable services rework, this does not
work anymore and the pacemaker systemd resources that still exist do not
have these timeouts set.
We want to move away from resource defaults for this because their
results depend on the inclusion order, which in tripleo is no longer
guaranteed (https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules).
The only services affected in Newton are: cinder-volume,
cinder-backup, manila-share, haproxy. I preferred fixing all the
pacemaker resources because it seems the cleanest and most logical
commit.
Change-Id: If89a95706514e536a7a2949871a0002c79b6046e
Closes-Bug: #1629366
|