Age | Commit message | Author | Files | Lines |
|
Some nodes fail the automatic testing MaaS performs during commissioning,
although manually re-running the test suites works.
For now, just override all 'failed testing' nodes unconditionally.
JIRA: FUEL-333
Change-Id: I13d3ee3d82550524480aa53aa8752ab90aa940cd
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
The previous commit, which replaced explicit loops with `wait_for`,
failed to properly escape a nested variable, leading to deploy failure.
The logic was also flawed: it did not break for offline nodes,
rendering the whole barrier check useless.
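The escaping pitfall can be illustrated generically. This sketch assumes the helper `eval`s a command string on each attempt, which the message above does not state; the variable names are hypothetical. An unescaped variable is expanded once, when the string is built, so a retry loop never sees its new value:

```bash
#!/bin/bash
# Illustrative only: a command string later passed to an eval-based helper.
node_state=offline
cmd_unescaped="echo ${node_state}"   # expanded now: 'echo offline'
cmd_escaped="echo \${node_state}"    # kept literal: expanded on each eval
node_state=online
eval "${cmd_unescaped}"              # prints: offline
eval "${cmd_escaped}"                # prints: online
```

Only the escaped form observes state changes between attempts.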
Fixes: 1a0e8e7e
Change-Id: I038dbf90fb53c6b61da2e5c9b6867e31d78867af
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Change-Id: I7b583c354843f0116a65b3a31f3be4589087b8a5
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
- apply `linux` state on cfg01 first, so PXE/admin IP is added and
FN VM minions are available;
- add barrier and wait for all FN VMs to register with cfg01;
- use batch-mode execution while applying `linux.network` on FN VMs;
- retry all states executed via <salt.sh> on FN VMs;
JIRA: FUEL-310
Change-Id: I72e1c565370072500df1d486fe76e6315f583c75
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
The `wait_for` function should also be able to check for minions that
did not return or did not respond, in addition to the return code.
To keep it backwards compatible, enable the new check only when the
max attempt number is specified in decimal format (e.g. '10.0' instead
of the old '10').
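A minimal sketch of such a `wait_for`, not the actual lib.sh code. It assumes the wrapped command runs Salt and that unresponsive minions show up in its output as 'Not connected' or 'No response' (the wording of Salt's 'Minion did not return' messages); the `WAIT_FOR_DELAY` variable is hypothetical:

```bash
#!/bin/bash
# Hypothetical wait_for sketch, not the actual lib.sh implementation.
# Usage: wait_for <max_attempts> <command> [args...]
#   '10'   -> legacy mode: success means the command returned 0
#   '10.0' -> also fail if the output reports unresponsive minions
function wait_for {
  local total_attempts=$1; shift
  local check_minions=0 attempt out rc
  # A decimal point in the attempt count enables the minion check
  if [[ "${total_attempts}" == *.* ]]; then
    check_minions=1
  fi
  total_attempts=${total_attempts%%.*}
  for ((attempt = 1; attempt <= total_attempts; attempt++)); do
    out=$("$@" 2>&1); rc=$?
    echo "${out}"
    if [ "${rc}" -eq 0 ]; then
      # Assumed Salt wording for minions that did not return
      if [ "${check_minions}" -eq 0 ] || \
         ! grep -Eq 'Not connected|No response' <<< "${out}"; then
        return 0
      fi
    fi
    sleep "${WAIT_FOR_DELAY:-10}"
  done
  return 1
}
```

For example, `wait_for 10.0 salt '*' test.ping` would keep retrying while any minion is reported as not responding, whereas `wait_for 10 ...` keeps the old return-code-only behavior.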
Change-Id: If2512cf9121cdd795638efe7362ef0485d4e8d91
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Isolate networks by retiring NAT on mas01; this also cuts direct
internet access from cluster nodes that do not face the public
network (unlike prx and cmp, which have public IPs).
NOTE: Since we are removing mas01 NAT, VCP VMs (except prx, which have
public IPs) and kvm nodes (cmp nodes also have public IPs) will no
longer have direct internet connectivity.
Cluster deployment and operations will work without it, but if it is
required for other reasons, the MaaS proxy can be enabled by
uncommenting the /etc/environment section in:
- cluster.baremetal-mcp-pike-common-ha.include.proxy.yml
JIRA: FUEL-317
Change-Id: I5ed8b420296b27df34a54ec1ebd7b7cf58041425
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Change-Id: I9dbb51ce2387450e4ae19f8b3444f5e52cfdc71d
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
`maas_fixup` is already re-entrant, so we can execute it more than
once during a commissioning/deploy cycle. Reduce the timeout when
waiting for all nodes to reach a stable state, so nodes stuck in
'Ready' state instead of reaching 'Deploying' are dealt with sooner
(~5 min vs. the old 30 min).
While at it, let `maas_fixup` handle machine deploy as well, so we
can catch nodes stuck in 'Ready' state and re-trigger the deploy.
Change-Id: Id24cc97b17489835c5846288639a9a6032bd320a
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Use PXE/admin network for salt traffic from/to all minions
except cfg01, mas01.
This allows us to drop the route to admin net from cfg01.
Change-Id: Ic2526f1ff77afe5d92ced900971f4c8f78d2d8a2
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Running `ci/deploy.sh -EE` should also perform a UEFI boot option
cleanup, otherwise we risk booting the previously installed OS.
While at it, reduce the delay between node removals and fix a rare
failure of `-EE` when no nodes are defined in MaaS.
Change-Id: I789ffd3e22545921216f7d5ee3509c76354542eb
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Currently, Xenial repos provide MaaS 2.2.x, while the PPA bumped it
to 2.3.x. Since we switched to 2.3, we have observed a rare, erroneous
state transition from 'Deploying' back to 'Ready'.
Drop the PPA, falling back to 2.2 from the mainline distro repos.
JIRA: FUEL-312
Change-Id: I3daa118059f37cbeca076da685661c28f3a28a97
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
NOTE: In order to undefine VCP VMs with NVRAM (e.g. AArch64 VMs
using AAVMF), an additional parameter should be passed to libvirt
by the Salt virt core module (equivalent to `virsh undefine --nvram`).
While at it, pass the CI_DEBUG and ERASE_ENV environment variables to
state execution, and stop force-applying patches.
Also refactor the rsync between the foundation node and Salt master,
so the whole git repo is copied as </root/opnfv>, and </root/fuel>
becomes a link to it; this is useful for Armband, where 'fuel' is a
git submodule. Fix .git paths after rsync, so git submodules work as
expected in cfg01 repos.
JIRA: FUEL-307
Change-Id: Ic62f03e786581c019168c50ccc50107238021d7f
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Change-Id: Icc30d27951abb1e231c9269c6293782a39e08fb6
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
Change-Id: I7a21c30d49aecca948f45535fec164c2f643450e
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
After MaaS reports baremetal provisioning finished successfully,
check that all nodes are online before attempting a `sync_all`.
Change-Id: I6ba4b3e4ba5b5258ace4da8c39e0fc77354885e3
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Change-Id: Ib4aa3f2cb4fc7129d502b4332cd7fedd83a0e1fe
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
While applying scenario states, break on error, and retry a failed
state up to 5 times. Apply the same behavior to `salt.sh`.
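A minimal sketch of the break-on-error, retry-up-to-5-times behavior. The runner function passed as the first argument (for instance a wrapper around a Salt state apply) is a hypothetical placeholder, not the actual deploy code:

```bash
#!/bin/bash
# Hypothetical sketch: apply states in order, retrying each failed state
# up to 5 times and stopping at the first state that keeps failing.
# Usage: apply_states_with_retry <runner_function> <state> [state...]
function apply_states_with_retry {
  local runner=$1; shift
  local state attempt
  for state in "$@"; do
    for attempt in 1 2 3 4 5; do
      if "${runner}" "${state}"; then
        break
      fi
      if [ "${attempt}" -eq 5 ]; then
        echo "[ERROR] ${state} failed after ${attempt} attempts" >&2
        return 1    # break on error: skip the remaining states
      fi
    done
  done
  return 0
}
```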
Add a new deploy parameter, '-D', backed by the 'CI_DEBUG' env var,
which gates logging (set -x) in the deploy shell scripts.
Also extend the '-f' deploy parameter, allowing it to be specified
more than once; the first occurrence skips infra VM creation, but
still syncs reclass & other config from the local repo, while a
second occurrence also disables config sync.
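A minimal sketch of how '-D' and repeated '-f' flags could be handled with bash's `getopts`, counting '-f' occurrences. The function name and the printed step names are hypothetical, not the actual ci/deploy.sh code:

```bash
#!/bin/bash
# Hypothetical sketch: prints which deploy steps would run.
function deploy_plan {
  local OPTIND=1 opt
  local use_existing=0 debug=${CI_DEBUG:-0}
  while getopts "Df" opt; do
    case "${opt}" in
      D) debug=1 ;;                             # same as CI_DEBUG=1
      f) use_existing=$((use_existing + 1)) ;;  # count occurrences
      *) return 1 ;;
    esac
  done
  echo "debug=${debug}"   # in deploy.sh, debug=1 would enable 'set -x'
  if [ "${use_existing}" -lt 1 ]; then echo "create-infra-vms"; fi
  if [ "${use_existing}" -lt 2 ]; then echo "sync-config"; fi
}
```

Counting occurrences keeps `-f -f` and `-ff` equivalent, since `getopts` splits grouped options.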
To prevent the glusterfs client state from failing due to a non-existent
nova user/group, move it after nova:compute's nova state is applied.
Change-Id: I234e126e16be0e133d878957bd88fed946955de8
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
While at it, fold 'set' options into the bash shebang where possible
and add a `make patches-copyright` target to simplify adding patch
license headers.
Change-Id: I0c841de72e5709e5eef915a52c5ec4a7fc0f7c37
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
While at it, fix some shellcheck warnings, and s/fgrep/grep -F/g.
Change-Id: I093b7b4c196731b1ecc0c27a4111955b2e412762
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
We should eventually also support baremetal deploys without a
virtualized control plane (VCP), so decouple MaaS provisioning
from VCP provisioning.
While at it, move the "wait_for" bash function from the maas state to
the common library file, lib.sh.
Change-Id: I32c33135655cb6aceae901a5f92b51265a8c84b4
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Change-Id: If7cb8473f5c290d1d5f22fce5567f7b8da24fd9f
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
Change-Id: I53ac0be519df1bb39a6a56e236285fce95228bd4
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
Change-Id: I7fe8d0c77a1d62e2214fb1089651a639303dd20e
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
pkg.upgrade was enabled for all salt minions, including VCP VMs,
which take longer to perform the operation, probably due to an
older set of packages in the Ubuntu disk image we use.
One way to work around this is to switch to the UCA Xenial image and
let Salt pre-provision the salt minion on it, but that adds deploy
time and has caused issues in the past (should be ok now).
Alternatively, we can retry the pkg.upgrade until all minions
respond, before moving on with the state execution. This prevents
silently skipping the next salt calls (e.g. installing keepalived).
Note that the issue did not manifest for OVS-DPDK, where, after
pkg.upgrade, DPDK is installed, giving VCP VMs enough time to return.
While at it, retry the 'salt.control' state apply too (non-critical,
but it fails every once in a while).
Fixes: 87310fb
Change-Id: I97acc2b23206a55d72f7e6583ca42127fdbacc16
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Occasionally, MaaS fails to provision/deploy some nodes, in which
case we mark them as broken, then fixed (to put them back in
'Ready' state), before re-attempting the MaaS deploy operation.
However, this leads to 'Error: Internal server error' when the deploy
function is called right after transitioning the node to 'Ready'
state.
Add a delay of 30 seconds before re-attempting the failed operation.
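A minimal sketch of the recovery sequence with the added delay. `mark_broken`, `mark_fixed` and `deploy_node` are hypothetical placeholders for the actual MaaS calls, and `REDEPLOY_DELAY` is a hypothetical override of the 30 second default:

```bash
#!/bin/bash
# Hypothetical sketch; the three helpers must be provided by the caller
# (e.g. wrappers around the real MaaS operations).
function redeploy_failed_node {
  local node=$1
  mark_broken "${node}" || return 1
  mark_fixed "${node}" || return 1   # node should be back in 'Ready'
  # Deploying right after the transition may yield
  # 'Error: Internal server error', so wait before retrying.
  sleep "${REDEPLOY_DELAY:-30}"
  deploy_node "${node}"
}
```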
Change-Id: Ia9ecec67639387e4a29feab3114e1741c554a2cb
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
* [baremetal] add memory to controllers & salt master
* tune up sysctl vm.dirty* for compute nodes
* upgrade packages to get the latest versions
(https://bugs.launchpad.net/cinder/+bug/1641312)
Change-Id: I9ad22206f2f3f11e1da3f93c7a0931c592adf1cf
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
The mcpcontrol virsh network, as well as the MaaS PXE network, are
installer-specific, not POD-specific.
Therefore, these should be easily parametrized without the PDF,
using only installer inputs (e.g. env vars passed via Jenkins).
- add new <all-mcp-ocata-common.opnfv.runtime> reclass class;
- parametrize at runtime new reclass class based on global vars;
- factor out MaaS deploy address / config using new mechanism;
- parametrize at runtime virsh network definitions based on template;
- add new "maas.pxe_route" sls for configuring routing on cfg01;
- replace env vars with the new sls in "maas" state;
NOTE: baremetal parametrization will be handled later.
Change-Id: Ifd61143d818fb088b3f4395388ba769bbc49156e
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Use INSTALLER_IP Jenkins param instead of SALT_MASTER_IP, allowing
us to drop SALT_MASTER_IP completely from releng.
mcpcontrol IP changes:
- 192.168.10.100 becomes 10.20.0.2 (align with legacy Fuel master);
- 192.168.10.3 becomes 10.20.0.3 (baremetal MaaS address);
JIRA: FUEL-285
Change-Id: I6e2d44c3a8b43846196bd64191735214167a76ce
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Fix/silence all shellcheck errors, except for scripts in
<prototypes/sfc_tacker>.
Change-Id: Idc317cdba0f69b78299f2d3665e72ffc19dd8af5
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
JIRA: FUEL-283
Change-Id: Ie85af8c12163fac28cb8826aa8902a4ff3dec623
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
* add "ubuntu" user so that functest gets cluster credentials
* reduce cpu resources for vcp nodes in nofeature scenario
* tune salt targets for maas state
* specify ntp servers
Change-Id: I433a1de1cd2c69c6747c62c3359f5485dee3bfa4
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
While at it, parametrize the max attempt number in the maas state's
"wait_for", and reduce the retry count for certain simpler tasks.
Change-Id: I3ac2877719cdd32613bcf41186ebbb9f3f3aee93
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Since we don't `set -e` in state files, applying a state always
succeeds unless the last command in the state fails.
Make this uniform by always succeeding when applying the state.
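To illustrate the point, the functions below stand in for state files run without `set -e` (they are not actual state files); each returns the exit status of its last command:

```bash
#!/bin/bash
# Illustrative only: stand-ins for state files run without 'set -e'.
state_early_failure() {
  false            # this failure is ignored...
  echo "applied"   # ...because the last command succeeds (status 0)
}

state_late_failure() {
  echo "applied"
  false            # the last command fails, so the status is 1
}

state_uniform() {
  echo "applied"
  false
  true             # always succeed, regardless of earlier commands
}
```

Ending a state with `true` (or `exit 0` in a standalone script) is one way to make the result uniform.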
While at it, enable bash debugging logs, for better readability
of deploy log files.
Change-Id: I3cf4886f6d73c6fd1380df1a4e1413334bec1701
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Add our mcp.rsa.pub RSA key to all nodes, including VCP VMs.
This is required for functest to be able to fetch openrc.
While at it, add retry wrappers for more VCP VM state.sls calls.
Change-Id: I34f79848c52e36de8d981055880321a081420874
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Signed-off-by: Guillermo Herrero <Guillermo.Herrero@enea.com>
|
|
* shift VCP node interfaces, since their names start from ens2
* add extra salt sync before VCP start-up
* run rabbitmq state on the 1st node first, then on the rest
Change-Id: Ic2c174c288a5e89f2f28c0d9aa573340190a61d3
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
Running a heavy state like `linux` on all nodes (including VCP VMs)
might time out the first time on slower systems.
Change-Id: I21a3ad380afafa833f59e14da86aff92e254e9c7
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Remove keys left over from the previous deployment
to avoid interfering with the new ones.
Change-Id: I0dfa9782cbce9a8e8b7c1efe5954c8ffe85996f9
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
Change-Id: I86bb27b323152440e8a885dbf867da433a288dae
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
Change-Id: Ic47a9dd2d5a4cccc9c4330509d81aba82f777084
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
Previous changes attempted to add 'noifupdown' support, but failed
to spell it correctly. Fix the typo and also edit the 'maas' state
to use simple `salt state.apply` instead of `cmd.run 'salt-call'`.
Change-Id: If9889dee896fa100febe0372fe2c4173fc223ee3
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|
|
* re-assign IP from interface to bridge
- install bridge utils
- reboot right after network config
* change image source for vcp
Change-Id: I34506ee161337b5d3a4088cfdf3c082d99ccb695
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
|
|
- ci/deploy.sh: fail if default scenario file is missing;
- start by copying reclass/classes/cluster/virtual-mcp-ocata-ovs as
classes/cluster/baremetal-mcp-ocata-ovs;
- add new state (maas) that will handle MaaS configuration;
- Split PXE network in two for baremetal:
* rename old "pxe" virtual network to "mcpcontrol", make it
non-configurable and identical for baremetal/virtual deploys;
* new "pxebr" bridge is dedicated for MaaS fabric network, which
comes with its own DHCP, TFTP etc.;
- Drop hardcoded PXE gateway & static IP for MaaS node, since
"mcpcontrol" remains a NAT-ed virtual network, with its own DHCP;
- Keep internet access available on first interfaces for cfg01/mas01;
- Align MaaS IP addrs (all x.y.z.3), add public IP for easy debug
via MaaS dashboard;
- Add static IP in new network segment (192.168.11.3/24) on MaaS
node's PXE interface;
- Set MaaS PXE interface MTU 1500 (weird network errors with jumbo);
- MaaS node: Add NAT iptables traffic forward from "mcpcontrol" to
"pxebr" interfaces;
- MaaS: Add hardcoded lf-pod2 machine info (fixed indentation in v6);
- Switch our targeted scenario to HA;
* scenario: s/os-nosdn-nofeature-noha/os-nosdn-nofeature-ha/
- maas region: Use mcp.rsa.pub from ~ubuntu/.ssh/authorized_keys;
- add route for 192.168.11.0/24 via mas01 on cfg01;
- fix race condition on kvm nodes network setup:
* add "noifupdown" support in salt formula for linux.network;
* keep primary eth/br-mgmt unconfigured till reboot;
TODO:
- Read all this info from PDF (Pod Descriptor File) later;
- investigate leftover references to eno2, eth3;
- add public network interfaces config, IPs;
- improve wait conditions for MaaS commissioning/deploy;
- report upstream breakage in system.single;
Change-Id: Ie8dd584b140991d2bd992acdfe47f5644bf51409
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
Signed-off-by: Guillermo Herrero <Guillermo.Herrero@enea.com>
Signed-off-by: Charalampos Kominos <Charalampos.Kominos@enea.com>
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
|