If we upgrade a cloud that was configured with an external load balancer,
the process fails during the convergence step because it tries to
restart haproxy, which is not configured when an external load balancer
is in use.
Closes-Bug: #1636527
Change-Id: I6f6caec3e5c96e77437c1c83e625f39649a66c48
|
|
The current redis file descriptor limit is 4096 for two reasons:
- It runs as the redis user
- It is not started via systemd, where an explicit LimitNOFILE of
10240 is set (matching the default configuration of a maximum of 10000
clients)
Create an /etc/security/limits.d/redis.conf file in order to increase
the fd limit value. With this change we correctly get the following
limits:
[root@overcloud-controller-0 ~]# pcs status |grep -A2 redis
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-2 ]
Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
[root@overcloud-controller-0 ~]# cat /proc/`pgrep redis`/limits | grep open
Max open files 10240 10240 files
Previously this limit was set to 4096.
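A minimal Python sketch of what the drop-in file and the check above amount to. The commit does not show the contents of redis.conf: the soft/hard `nofile` lines below are an assumption based on the standard limits.conf format (`<domain> <type> <item> <value>`), and both helper names are hypothetical.

```python
from typing import Tuple


def render_limits_conf(user: str, nofile: int) -> str:
    """Render an assumed limits.d drop-in raising the open-file limit for `user`."""
    # One line per limit type, in limits.conf order: domain, type, item, value.
    return "".join(f"{user} {kind} nofile {nofile}\n" for kind in ("soft", "hard"))


def parse_max_open_files(limits_text: str) -> Tuple[int, int]:
    """Return (soft, hard) from the "Max open files" row of /proc/<pid>/limits."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return int(fields[0]), int(fields[1])
    raise ValueError("no 'Max open files' row found")
```

For example, `parse_max_open_files("Max open files 10240 10240 files")` yields the `(10240, 10240)` pair reported above, which is what the `grep open` check verifies by eye.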
Change-Id: I7691581bad92ad9442cecd82cf44f5ac78ed169f
Closes-Bug: #1635334
|
|
|
|
remove_default_accounts is a mysql::server parameter that, when set to true,
executes some MySQL commands to clean up the default MySQL accounts
created by packaging.
In order to run the commands successfully, MySQL must be up and running,
which is the case at step 2 but not at step 1.
This patch makes sure we run the commands at step 2 on the pacemaker master
only.
No change for scenarios without Pacemaker.
Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e
Closes-Bug: #1633113
|
|
The 'stop timeout' values of the rabbitmq and redis pacemaker resources
are set to 90s, while for all other services they are set to 200s.
Because of this, the overcloud deployment sometimes fails with the error:
Error: Could not complete shutdown of rabbitmq-clone, 1 resources remaining
Error performing operation: Timer expired
This patch updates the stop timeout for Redis and RabbitMQ to avoid this
error.
Change-Id: I8a3b3951a896ee3e8e5e09778e8ea4717e76a1b4
|
|
Aligns the way we check for enabled backends in
pacemaker/manila.pp with what [1] did in base/manila/api.pp.
The benefit is that the templates no longer need to emit
custom hiera.
1. I86ba8b9d5872c0f1a94e74215e97b796ad129bfb
Change-Id: I04e28a95e8d69a24cd3df109bf1802bfcbd941db
|
|
|
|
We're not able to use FQDNs yet, so to work around this, we give
precedence to a "short name" list we'll get from t-h-t.
Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8
Related-Bug: #1628521
|
|
It turns out that reducing the number of rabbitmq queue copies in the
cluster significantly improves cluster performance, especially failover
recovery time. Right now the cluster uses ha-all mode for rabbitmq
queues.
It is best to change this to "ha-exactly" mode and reduce the number
of queue copies to ceil(N/2), where N is the number of controllers in the
cluster, so in the typical scenario of 3 controllers it would be 2 by
default.
It makes little sense to keep copies of the queues across the whole
cluster, since if quorum is lost, the remaining cluster nodes will be
stopped anyway. We let the user override this with a
parameter.
For example, for a 3-node control plane cluster we go from this:
pcs resource show rabbitmq
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
To this:
pcs resource show rabbitmq
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
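The ceil(N/2) rule and the resulting policy string can be sketched in Python. This is an illustration, not the Puppet code from the change: the function names and the `override` argument (standing in for the user-facing parameter) are hypothetical, and the policy name is kept as "ha-all" to match the output above.

```python
import json
import math
from typing import Optional


def ha_exactly_copies(num_controllers: int) -> int:
    """Number of queue copies for "exactly" mode: ceil(N/2) for N controllers."""
    if num_controllers < 1:
        raise ValueError("need at least one controller")
    return math.ceil(num_controllers / 2)


def rabbitmq_policy(num_controllers: int, override: Optional[int] = None) -> str:
    """Render the set_policy value; `override` mimics the user-facing parameter."""
    copies = override if override is not None else ha_exactly_copies(num_controllers)
    # Compact separators reproduce the {"ha-mode":"exactly","ha-params":2} form.
    definition = json.dumps({"ha-mode": "exactly", "ha-params": copies},
                            separators=(",", ":"))
    return f"ha-all ^(?!amq\\.).* {definition}"
```

With 3 controllers, `rabbitmq_policy(3)` renders the same value as the "to this" output above; with 5 controllers the default becomes 3 copies.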
According to Marian Krcmarik's testing, recovery time after a failure was
reduced significantly.
Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com>
Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084
Partial-Bug: #1628998
|
|
Back in the Mitaka cycle, via the change If6b43982c958f63bc78ad997400bf1279c23df7e,
we made sure that the default start and stop timeouts for pacemaker
systemd resources are 200s (>= twice the default 90s DefaultTimeoutStopSec
in systemd). We did this by setting puppet resource defaults for
the Pacemaker::Resource::Service class:
Pacemaker::Resource::Service {
op_params => 'start timeout=200s stop timeout=200s',
}
The problem is that after the composable services rework, this no longer
works and the pacemaker systemd resources that still exist do not
have these timeouts set.
We want to move away from resource defaults for this because their results
depend on the inclusion order, which is no longer guaranteed in tripleo
(https://docs.puppet.com/puppet/latest/reference/lang_scope.html#scope-lookup-rules)
The only services affected in Newton are: cinder-volume,
cinder-backup, manila-share, haproxy. I preferred fixing all the
pacemaker resources because it seems the cleanest and most logical
commit.
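Instead of relying on resource defaults, each resource passes its timeouts explicitly. The hypothetical Python helper below builds the same `op_params` string shown above and enforces the ">= twice DefaultTimeoutStopSec" rule stated in this message. The function name is illustrative and not part of puppet-tripleo.

```python
# systemd's default DefaultTimeoutStopSec, as cited in the commit message.
SYSTEMD_DEFAULT_TIMEOUT_STOP_SEC = 90


def service_op_params(start_timeout: int = 200, stop_timeout: int = 200) -> str:
    """Build an explicit op_params string for a pacemaker systemd resource."""
    # Keep the stop timeout at least twice systemd's default stop timeout.
    if stop_timeout < 2 * SYSTEMD_DEFAULT_TIMEOUT_STOP_SEC:
        raise ValueError("stop timeout should be >= 2x DefaultTimeoutStopSec")
    return f"start timeout={start_timeout}s stop timeout={stop_timeout}s"
```

Calling `service_op_params()` with the defaults returns `start timeout=200s stop timeout=200s`, the value previously injected through the resource default.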
Change-Id: If89a95706514e536a7a2949871a0002c79b6046e
Closes-Bug: #1629366
|
|
|
|
This patch moves the various DB syncs into the MySQL role.
Database creation needs to occur on the MySQL server to
avoid permission issues.
This patch also moves database creation to step 2 so we can
guarantee that all per-service databases exist at this time.
This avoids complex ordering needed during step 3, where
services on different hosts can run their own db syncs
in a distributed fashion.
Change-Id: I05cc0afa9373429a3197c194c3e8f784ae96de5f
Partial-bug: #1620595
|
|
|
|
|
|
Having an actual name for that configuration will allow us to pass a
more appropriate name via t-h-t.
Change-Id: Iea4bd67074824e5dc6732fd7e408743e693d80b3
|
|
The bind-address used to be hardcoded to the $::hostname fact.
This is wrong, as it ignores the address the mysql service has been
configured to use. This commit makes it configurable, so it can be
set via hieradata.
By default, we use the hiera key we already set, 'mysql_bind_host';
if for some reason that is unavailable, we fall back to $::hostname.
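The lookup order described above can be sketched in Python. This only models the precedence; the real change is Puppet code, and the function and argument names here are hypothetical.

```python
from typing import Mapping, Optional


def mysql_bind_address(
    bind_address: Optional[str],
    hieradata: Mapping[str, str],
    facts: Mapping[str, str],
) -> str:
    """Resolve the MySQL bind-address.

    Order: an explicitly configured value, then the existing
    'mysql_bind_host' hiera key, then the hostname fact.
    """
    if bind_address:
        return bind_address
    hiera_value = hieradata.get("mysql_bind_host")
    if hiera_value:
        return hiera_value
    # Last resort: the previous hardcoded behaviour.
    return facts["hostname"]
```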
Related-Bug: #1627060
Change-Id: I316acfd514aac63b84890e20283c4ca611ccde8b
|
|
In puppet-manila it is the api service that performs the db sync, not the
scheduler. This change moves ::manila::db::mysql (which creates
the empty database and users) into the tripleo manila/api profile.
It also moves the rabbit config into a general manila base profile, as
the scheduler service needs it as well.
Change-Id: I2b537f735b8d1be8f39e8c274be3872b193c1014
|
|
The puppet-tripleo side for manila-cephfs landed without specifying
defaults for all class params [1], so when cephfs isn't enabled
(e.g. only generic is), you get errors for those params. See
review comments at [2] for reports of this.
This fixes up the manila-cephfs puppet-tripleo side to be more
in line with the tidy-up that added netapp at [3]. All the config is
moved to the tripleo-heat-templates side. The tht review for this
is at https://review.openstack.org/#/c/358525/ and that will now
depend on this review.
[1] https://review.openstack.org/#/c/354047/
[2] https://review.openstack.org/#/c/354019/
[3] https://review.openstack.org/#/c/354014/
Change-Id: I918f6f23ae0bd3542bcfe1bf0c797d4e6aa8f4d9
|
|
|
|
This adds support for the manila-netapp backend. The backend-specific
config is set on the tht side, so this change also
tidies up the manila generic config, which was unnecessarily
being duplicated here
(see https://review.openstack.org/#/c/354019/)
Change-Id: Ic6f8e8d27ca20b9badddea5d16550aa18bff8418
|
|
Don't add brackets to the mysql_bind_host parameter in the Galera config.
Brackets in this parameter work with older versions of
Galera but not with the newest one.
So let's remove them entirely, so we can safely upgrade Galera in RDO.
Change-Id: Ic904d4efda162f18ec8dffb91c2f383f54361f41
Closes-Bug: #1622755
|
|
|
|
|
|
Write restart flag file for services managed by Pacemaker into
/var/lib/tripleo/pacemaker-restarts directory. The name of the file must
match the name of the clone resource defined in pacemaker. The
post-puppet restart script will restart each service having a restart
flag file and remove those files.
This approach focuses on $pacemaker_master only (we don't want to
restart the pacemaker services 3 times when we have 3 controllers), so
it relies on the assumption that we're making the matching config
changes across the pacemaker nodes.
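The flag-file handshake can be sketched in Python as below. The directory path comes from this message; the function names, the `is_pacemaker_master` argument and the `restart` callback are hypothetical stand-ins for the Puppet code and the post-puppet restart script, neither of which is shown here.

```python
from pathlib import Path
from typing import Callable, List

RESTART_DIR = Path("/var/lib/tripleo/pacemaker-restarts")


def flag_restart(clone_resource: str, is_pacemaker_master: bool,
                 restart_dir: Path = RESTART_DIR) -> None:
    """Puppet side: drop a flag file named after the pacemaker clone resource."""
    if not is_pacemaker_master:
        return  # only the master writes flags, so each service restarts once
    restart_dir.mkdir(parents=True, exist_ok=True)
    (restart_dir / clone_resource).touch()


def process_restarts(restart: Callable[[str], None],
                     restart_dir: Path = RESTART_DIR) -> List[str]:
    """Post-puppet side: restart every flagged resource, then remove its flag."""
    done: List[str] = []
    if not restart_dir.is_dir():
        return done
    for flag in sorted(restart_dir.iterdir()):
        restart(flag.name)  # hypothetical callback performing the actual restart
        flag.unlink()
        done.append(flag.name)
    return done
```

Keying the file name on the clone resource name means the restart script needs no separate mapping from services to pacemaker resources.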
Change-Id: I6369ab0c82dbf3c8f21043f8aa9ab810744ddc12
|
|
profiles"
|
|
Prepare the pacemaker mysql manifest for galera_node_names becoming an
array. Stay backwards compatible by also handling a comma-delimited string,
to avoid a chicken-and-egg patch problem between t-h-t and
puppet-tripleo.
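The backwards-compatible handling can be sketched in Python: accept either the new array form or the legacy comma-delimited string and normalise both to a list. The function name is hypothetical, and trimming whitespace and empty entries is an assumption about what "handling" the string should mean.

```python
from typing import List, Sequence, Union


def normalize_node_names(galera_node_names: Union[str, Sequence[str]]) -> List[str]:
    """Accept either an array of node names or a comma-delimited string."""
    if isinstance(galera_node_names, str):
        names = galera_node_names.split(",")
    else:
        names = list(galera_node_names)
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    return [name.strip() for name in names if name.strip()]
```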
Change-Id: Ia0d9d59728c8771974bfbc486f4929b99a38e4fb
Partially-Implements: blueprint custom-roles
|
|
|
|
Change-Id: Idaad75238a2884fe82b1e5fce3ed910d866b98a2
|
|
|
|
Change-Id: If3feb859b527d08e10c124b5ad2f7f4b1f19156a
|
|
When we implemented the galera composable role, we accidentally moved the
xinetd.d monitor service to the bootstrap node only. This meant that
haproxy believed that galera was down on the non-bootstrap nodes. A
shutdown of the bootstrap node meant that galera was effectively down,
because haproxy would refuse to redirect traffic to the
non-bootstrap nodes. Fix this by creating
/etc/xinetd.d/galera-monitor on all controller nodes.
Change-Id: Ib5a06b3abbc32182476c2b0c81eb77a12821ad6b
|
|
|
|
Some lint checks are returning:
WARNING: line has more than 140 characters in puppet-tripleo profiles
This patch removes those warnings by splitting the long lines with backslashes (\).
Change-Id: I19b56c93db82948fb0498a4c9851b522c81946f8
|
|
Adds a Cinder backup profile for Cinder backup service activation
(to be used in https://review.openstack.org/#/c/304563).
Cinder backup uses Swift as a default.
Change-Id: Ib1dfe52b83ab01819fc669312967950e75d8ddf1
Co-Authored-By: Jon Bernard <jobernar@redhat.com>
Co-Authored-By: Boris Kreitchman <bkreitch@gmail.com>
|
|
As we are starting to manually check overcloud services,
the first step is to check that the puppet profiles
are all aligned.
Changes applied:
No logic added or removed in this submission.
Removed unused parameters.
Align header comments structure.
All profile parameters sorted as follows:
"Mandatory params first sorted alphabetically
then optional params sorted alphabetically."
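The ordering rule above maps onto a two-part sort key. A minimal Python sketch, assuming each parameter is modelled only by its name and whether it has a default; the `Param` type, the function and the parameter names in the example are hypothetical.

```python
from typing import List, NamedTuple


class Param(NamedTuple):
    name: str
    has_default: bool  # False means the parameter is mandatory


def sort_profile_params(params: List[Param]) -> List[Param]:
    """Mandatory params first, alphabetically; then optional params, alphabetically."""
    # False sorts before True, so mandatory params come first.
    return sorted(params, key=lambda p: (p.has_default, p.name))
```

For instance, mandatory `certificates` and `pacemaker_master` would come before optional `bootstrap_node` and `step`, each group in alphabetical order.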
Note: Subsequent submissions will check the pacemaker,
cinder, mistral and redis services in the base profiles,
as some of them have the $pacemaker_master parameter
defaulted to true.
Change-Id: I2f91c3f6baa33f74b5625789eec83233179a9655
|
|
We don't currently offer any parameter interface to enable
PKI certs, and these have all been deprecated by keystone, so
remove them.
Change-Id: I8232262b928c91dcde7bea2f23fa2a7c2660719e
|
|
|
|
Remove unused parameter in sahara
Change-Id: I46c033b410ab850289b798ee93990b6fb10c80ea
|
|
See discussion at https://review.openstack.org/#/c/342961/8
Change-Id: I571b65a5402c1028418476a573ebeb9450ed00c9
|
|
This change moves the cinder-volume/cinder-scheduler constraints into the
cinder-scheduler profile, as these can't be applied by the cinder-volume
service when cinder-scheduler isn't managed by Pacemaker.
Blueprint:
https://blueprints.launchpad.net/tripleo/+spec/ha-lightweight-architecture
Change-Id: I5e7585c08675d8a4bd071523b94210d325d79b59
Implements: blueprint ha-lightweight-architecture
Co-Author: cmsj@tenshu.net
|
|
The openstack-core-then-httpd constraint needs to live in the apache
pacemaker manifest and not in the main controller manifest file.
The same goes for those specific vsm/cisco neutron resources.
Change-Id: I2041d4d163f051427b62eec07b8345ad7006cc1d
|
|
|
|
|
|
|
|
Currently we are still creating all the pacemaker constraints for nova
in the main overcloud_controller_pacemaker.pp manifest file.
Let's move those to each role where they belong. Note that, given
that a constraint depends on two separate pacemaker resources, it is
a bit arbitrary which file they end up in (the one of the first
resource or the second one).
Change-Id: I96a3a313d15fac820b020feae0568437c2cbade3
|
|
The openstack-core resource is not needed by the NG Pacemaker
architecture. It was moved into an isolated role by [1] so that
it could optionally be enabled when wanting the older architecture.
This submission removes the old openstack-core global resource.
1. I74a62973146c0261385ecf5fd3d06db51e079caa
Change-Id: I16a786ce167c57848551c7245f4344c382c55b3d
|
|
Change-Id: I74a62973146c0261385ecf5fd3d06db51e079caa
|
|
The Nova {} workaround is not working correctly; we need to merge this patch
so we can move ::nova out of THT completely.
We also need to use nova::cache to configure the memcached parameters.
Co-Authorized-By: Giulio Fidente <gfidente@redhat.com>
Co-Authorized-By: Sven Anderson <sven@redhat.com>
Co-Authorized-By: Emilien Macchi <emilien@redhat.com>
Depends-On: I52d5badb9960124bb8fcb54983db2853c4185e77
Depends-On: I3e400a5f64b85f0d374fc02cc5e4080d19d0f2e4
Depends-On: Iee5f8015cbf40ca0e9a435a7de919ebdb74cf93f
Change-Id: Ie4e72e765f6a8ade48d4b2b766f067872554d1a2