path: root/mcp/config/states/maas
Age | Commit message | Author | Files | Lines
2018-11-21 | [state] maas: Retry first state apply on mas01 | Alexandru Avadanii | 1 | -1/+1
Change-Id: I6d2fab853b25d2f235e27c83a355ebc2c520771c Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-08-03 | [lib.sh] Reset virtual nodes after MaaS install | Alexandru Avadanii | 1 | -1/+1
For hybrid PODs (e.g. x86_64 jumpserver + control nodes, aarch64 baremetal compute nodes), the virtual nodes rely on MaaS DHCP to be up when the OS boots, so issue a `virsh reset` accordingly. Instead of checking for online nodes using `test.ping`, use `saltutil.sync_all` to also sync Salt state modules to the virtual nodes (usually handled by baremetal_init state in HA deploys). JIRA: FUEL-338 Change-Id: If689d057dc4438102c3a7428a97b9638e21bfdc5 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-05-24 | [maas] Adopt maas.machines.storage | ting wu | 1 | -10/+1
Replace MAAS CLI set_disk_layout with the new maas.machines.storage state JIRA: FUEL-364 Change-Id: I4d8cd9f473c5386ee7b32ad378ca1e02989233ca Signed-off-by: ting wu <ting.wu@enea.com>
2018-03-08 | Revert "[baremetal] Retire mas01 NAT" | Alexandru Avadanii | 1 | -0/+1
Bring back public internet access to all cluster nodes via NAT on mas01 node, required for NTP syncing. NOTE: Both mcpcontrol and PXE/admin networks are currently hard wired to using /24 netmask, so we leverage that in pxe_nat.sls. JIRA: FUEL-348 This reverts commit 9a6e655e0b851ff6e449027c01ac1a66188b0064. Change-Id: I7bab385f95f8c6d92cadc4e2149c2cd56e10c506 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-02-17 | [MaaS] Add maas.machines.set_storage_layout sls | Alexandru Avadanii | 1 | -0/+10
On cmp nodes, allocate only 30GB (fixed for now) for / partition. The rest of the disk(s) can later be allocated via salt-formula-linux. JIRA: FUEL-330 Change-Id: Ie11c78791e60801719cd33475ff91fc003df5ffa Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-02-17 | [MaaS] Override failed testing by default | Alexandru Avadanii | 1 | -1/+8
Some nodes fail automatic testing done by MaaS during commissioning, although running the testing suites one more time manually works. For now, just override all 'failed testing' nodes unconditionally. JIRA: FUEL-333 Change-Id: I13d3ee3d82550524480aa53aa8752ab90aa940cd Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-02-07 | [states] Fix broken online check for bm, vcp nodes | Alexandru Avadanii | 1 | -3/+3
Previous commit replacing explicit loops with `wait_for` failed to properly escape a nested variable, leading to deploy failure. Also, the logic was flawed, not breaking for offline nodes, rendering the whole barrier check useless. Fixes: 1a0e8e7e Change-Id: I038dbf90fb53c6b61da2e5c9b6867e31d78867af Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-02-06 | [states] maas, vcp: Use `wait_for` in online check | Alexandru Avadanii | 1 | -14/+7
Change-Id: I7b583c354843f0116a65b3a31f3be4589087b8a5 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-02-05 | [FN VM] Reboot VMs on jump, wait for all online | Alexandru Avadanii | 1 | -1/+1
- apply `linux` state on cfg01 first, so PXE/admin IP is added and FN VM minions are available;
- add barrier and wait for all FN VMs to register with cfg01;
- use batch-mode execution while applying `linux.network` on FN VMs;
- retry all states executed via <salt.sh> on FN VMs;
JIRA: FUEL-310 Change-Id: I72e1c565370072500df1d486fe76e6315f583c75 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2018-01-07 | lib.sh: Extend wait_for function to catch no resp | Alexandru Avadanii | 1 | -1/+1
wait_for function should be able to also check for minions that did not return or not respond, in addition to the return code. To keep it backwards compatible, condition the new check on the max attempt number being specified in decimal format (e.g. '10.0' unlike old '10'). Change-Id: If2512cf9121cdd795638efe7362ef0485d4e8d91 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
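The backwards-compatible switch described in this commit message can be sketched roughly as below. This is an illustration only, not the code from lib.sh: the function name `wait_for_sketch`, the `WAIT_SLEEP` variable and the marker strings matched in the output are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the actual lib.sh implementation.
# Usage: wait_for_sketch <max_attempts> <command...>
# An integer limit ('10') only checks the return code; a decimal limit
# ('10.0') additionally scans the output for no-response markers.
wait_for_sketch() {
  local total_attempts="$1"; shift
  local cmdstr="$*"
  local max="${total_attempts%.*}"       # integer part, e.g. '10.0' -> '10'
  local sleep_time="${WAIT_SLEEP:-10}"   # assumed delay between attempts
  local attempt output rc
  for attempt in $(seq "${max}"); do
    output=$(eval "${cmdstr}" 2>&1); rc=$?
    if [ "${max}" = "${total_attempts}" ]; then
      # Old behavior: success is decided by the return code alone
      [ "${rc}" -eq 0 ] && return 0
    else
      # New behavior: also fail when a minion did not answer.
      # The marker strings below are assumptions for illustration.
      if [ "${rc}" -eq 0 ] && \
         ! printf '%s\n' "${output}" | grep -Eq '(Not connected|No response)'; then
        return 0
      fi
    fi
    sleep "${sleep_time}"
  done
  return 1
}
```

With this shape, existing callers that pass `10` keep the old semantics, while callers that pass `10.0` opt in to the stricter check without a new argument.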
2018-01-03 | [baremetal] Retire mas01 NAT | Alexandru Avadanii | 1 | -1/+0
Isolate networks by retiring NAT on mas01; also cutting direct internet access from cluster nodes that are not facing the public network (prx, cmp). NOTE: Since we are removing mas01 NAT, VCP VMs (except prx which have public IPs) and kvm nodes (cmp also have public IPs) will no longer have direct internet connectivity. Cluster deployment and operations will work without it, but if it is required for different reasons, the MaaS proxy could be enabled by uncommenting the /etc/enviroment section in: - cluster.baremetal-mcp-pike-common-ha.include.proxy.yml JIRA: FUEL-317 Change-Id: I5ed8b420296b27df34a54ec1ebd7b7cf58041425 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-12-20 | [maas] Adjust deployment order/timeouts | Michael Polenchuk | 1 | -3/+7
Change-Id: I9dbb51ce2387450e4ae19f8b3444f5e52cfdc71d Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-12-19 | [baremetal] MaaS: Reduce timeout values | Alexandru Avadanii | 1 | -9/+8
`maas_fixup` is already re-entrant, so we can execute it more than once during a commissioning/deploy cycle. Reduce the timeout waiting for all nodes to reach a stable state, so nodes stuck in 'Ready' state instead of reaching 'Deploying' get dealt with sooner (~5 min vs old 30 min). While at it, let `maas_fixup` handle machine deploy as well, so we can catch nodes stuck in 'Ready' state and re-trigger the deploy. Change-Id: Id24cc97b17489835c5846288639a9a6032bd320a Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-12-18 | Merge "[baremetal] Move salt master IP to PXE/admin" | Alexandru Avadanii | 1 | -2/+0
2017-12-18 | [baremetal] Move salt master IP to PXE/admin | Alexandru Avadanii | 1 | -2/+0
Use PXE/admin network for salt traffic from/to all minions except cfg01, mas01. This allows us to drop the route to admin net from cfg01. Change-Id: Ic2526f1ff77afe5d92ced900971f4c8f78d2d8a2 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-12-15 | ci/deploy.sh: maas: cleanup_uefi on env erase | Alexandru Avadanii | 1 | -8/+6
Running `ci/deploy.sh -EE` should also perform a UEFI boot option cleanup, otherwise we risk booting the previously installed OS. While at it, reduce delay between nodes removal and fix a rare failure for `-EE` when no nodes are defined in MaaS. Change-Id: I789ffd3e22545921216f7d5ee3509c76354542eb Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-12-10 | states: maas: Stop using maas-stable PPA | Alexandru Avadanii | 1 | -2/+0
Currently, Xenial repos provide MaaS 2.2.x, while the PPA bumped it to 2.3.x. Since we switched to 2.3, we observed a rare wrongful state transition from 'Deploying' back to 'Ready'. Drop the PPA, falling back to 2.2 from mainline distro repos. JIRA: FUEL-312 Change-Id: I3daa118059f37cbeca076da685661c28f3a28a97 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-11-21 | ci/deploy.sh: Add new `-E` arg for env erase | Alexandru Avadanii | 1 | -0/+12
NOTE: In order to undefine VCP VMs with NVRAM (e.g. AArch64 VMs using AAVMF), an additional parameter should be passed to libvirt by Salt virt core module (equivalent to `virsh undefine --nvram`). While at it, pass CI_DEBUG, ERASE_ENV environment variables to state execution, and stop force-applying patches. Also refactor the rsync between foundation node and Salt master, so the whole git repo is copied as </root/opnfv>, and <root/fuel> becomes a link to it; useful for Armband, where 'fuel' is a git submodule. Fix .git paths after rsync, so git submodules work as expected in cfg01 repos. JIRA: FUEL-307 Change-Id: Ic62f03e786581c019168c50ccc50107238021d7f Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-11-07 | [maas] Conform regex to machines status output | Michael Polenchuk | 1 | -3/+3
Change-Id: Icc30d27951abb1e231c9269c6293782a39e08fb6 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-10-19 | [baremetal] Remove infinite loops from node checks | Alexandru Avadanii | 1 | -4/+6
Change-Id: I7a21c30d49aecca948f45535fec164c2f643450e Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-10-19 | [baremetal] maas state: Wait for all nodes online | Alexandru Avadanii | 1 | -0/+13
After MaaS reports baremetal provisioning finished successfully, check that all nodes are online before attempting a `sync_all`. Change-Id: I6ba4b3e4ba5b5258ace4da8c39e0fc77354885e3 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-10-19 | [baremetal] maas state: Retry sync_all on failure | Alexandru Avadanii | 1 | -1/+1
Change-Id: Ib4aa3f2cb4fc7129d502b4332cd7fedd83a0e1fe Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-10-15 | states: Break on error, retry states up to 5 times | Alexandru Avadanii | 1 | -1/+3
While applying scenario states, break on error, and retry failed state up to 5 times. Apply the same behavior for `salt.sh`. Add new deploy parameter, '-D', backed up by 'CI_DEBUG' env var, which gates deploy sh scripts logging (set -x). Also extend '-f' deploy parameter, allowing it to be specified more than once; the first occurrence will skip infra VM creation, but still sync reclass & other config from local repo, while a second occurrence will also disable config sync. To prevent glusterfs client state from failing due to non-existent nova user/group, move it after nova:compute's nova state is applied. Change-Id: I234e126e16be0e133d878957bd88fed946955de8 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
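A minimal sketch of the "break on error, retry up to 5 times" behavior, under stated assumptions: `apply_states`, `run_state` and `MAX_RETRIES` are hypothetical names chosen for the example, and `run_state` merely stands in for whatever executes a single scenario state.

```shell
#!/bin/sh
# Hypothetical sketch; names are illustrative, not taken from the repository.
MAX_RETRIES=5

# Stand-in for applying one state; here it simply runs a script path.
run_state() { sh "$1"; }

# Apply states in order; retry each failed state up to MAX_RETRIES times,
# and stop at the first state that never succeeds.
apply_states() {
  local state attempt
  for state in "$@"; do
    for attempt in $(seq "${MAX_RETRIES}"); do
      if run_state "${state}"; then
        continue 2     # state applied, move on to the next one
      fi
      echo "state '${state}' failed (attempt ${attempt}/${MAX_RETRIES})" >&2
    done
    return 1           # break on error: later states are not applied
  done
  return 0
}
```

Bounding the retries keeps a persistently failing state from blocking the deploy forever, while still absorbing transient failures.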
2017-10-14 | Add license headers where missing | Alexandru Avadanii | 1 | -2/+8
While at it, compact 'set' into bash shebang where possible and add `make patches-copyright` target to simplify adding patch license headers. Change-Id: I0c841de72e5709e5eef915a52c5ec4a7fc0f7c37 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-10-14 | Drop infinite loops in favor of finite wait_for | Alexandru Avadanii | 1 | -1/+1
While at it, fix some shellcheck warnings, and s/fgrep/grep -F/g. Change-Id: I093b7b4c196731b1ecc0c27a4111955b2e412762 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-10-14 | states: Split virtual_control_plane from maas | Alexandru Avadanii | 1 | -48/+2
We should eventually also support baremetal deploys without a virtualized control plane (VCP), so decouple MaaS provisioning from VCP provisioning. While at it, move "wait_for" bash function from maas state to common library file, lib.sh. Change-Id: I32c33135655cb6aceae901a5f92b51265a8c84b4 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-10-02 | Align salt version & repos | Michael Polenchuk | 1 | -4/+0
Change-Id: If7cb8473f5c290d1d5f22fce5567f7b8da24fd9f Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-09-25 | Merge "states: maas: Dynamic node count in stop condition" | Alexandru Avadanii | 1 | -1/+4
2017-09-25 | Run packages upgrade on openstack nodes only | Michael Polenchuk | 1 | -2/+2
Change-Id: I53ac0be519df1bb39a6a56e236285fce95228bd4 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-09-24 | states: maas: Dynamic node count in stop condition | Michael Polenchuk | 1 | -1/+4
Change-Id: I7fe8d0c77a1d62e2214fb1089651a639303dd20e Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com> Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-09-23 | Workaround VCP minions timeout post pkg.upgrade | Alexandru Avadanii | 1 | -2/+4
pkg.upgrade was enabled for all salt minions, including VCP VMs, which take longer to perform the operation, probably due to an older set of packages in the Ubuntu disk image we use. One way to work around this is to switch to UCA Xenial image, and let Salt pre-provision salt minion on it, but that adds deploy time delay and has caused issues in the past (should be ok now). Alternatively, we can retry the pkg.upgrade until all minions respond, before moving on with the state execution. This prevents silently skipping the next salt calls (e.g. installing keepalived). Note that the issue did not manifest for OVS-DPDK, where after pkg.upgrade, DPDK is installed, giving VCP VMs enough time to return. While at it, retry 'salt.control' state apply too (non-critical, but it fails every once in a while). Fixes: 87310fb Change-Id: I97acc2b23206a55d72f7e6583ca42127fdbacc16 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-09-22 | states: maas: Add 30s delay in re-deploy attempt | Alexandru Avadanii | 1 | -0/+2
Occasionally, MaaS fails to provision/deploy some nodes, in which case we try marking them as broken, then fixed (to put them again in 'ready' state), before re-attempting the MaaS deploy operation. However, this leads to 'Error: Internal server error' when deploy function is called right after transitioning the node to 'ready' state. Add a delay of 30 seconds before re-attempting the failed operation. Change-Id: Ia9ecec67639387e4a29feab3114e1741c554a2cb Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
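The recovery sequence in this commit message (broken, then fixed, then a pause, then deploy) could look roughly like the sketch below. It assumes the MAAS CLI is driven directly, which may not be how the maas state does it; the function name, the `admin` profile and the three variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; profile name, variables and direct CLI use are assumptions.
MAAS_CMD="${MAAS_CMD:-maas}"
MAAS_PROFILE="${MAAS_PROFILE:-admin}"
REDEPLOY_DELAY="${REDEPLOY_DELAY:-30}"   # seconds to wait before deploying again

redeploy_machine() {
  local system_id="$1"
  # Cycle the node through 'Broken' to bring it back to 'Ready'
  "${MAAS_CMD}" "${MAAS_PROFILE}" machine mark-broken "${system_id}" || return 1
  "${MAAS_CMD}" "${MAAS_PROFILE}" machine mark-fixed "${system_id}" || return 1
  # Deploying right after the transition was reported to fail with
  # 'Internal server error', so pause first
  sleep "${REDEPLOY_DELAY}"
  "${MAAS_CMD}" "${MAAS_PROFILE}" machine deploy "${system_id}"
}
```

Keeping the delay in a variable makes it easy to tune if 30 seconds turns out to be too short or too long on a given POD.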
2017-09-22 | Adjust memory allocation size | Michael Polenchuk | 1 | -0/+3
* [baremetal] add memory to controllers & salt master
* tune up sysctl vm.dirty* for compute nodes
* upgrade packages to get the latest versions (https://bugs.launchpad.net/cinder/+bug/1641312)
Change-Id: I9ad22206f2f3f11e1da3f93c7a0931c592adf1cf Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-09-12 | reclass, states: Parametrize runtime configuration | Alexandru Avadanii | 1 | -2/+1
mcpcontrol virsh network, as well as MaaS PXE network are installer specific, and not POD specific. Therefore, these should be easily parametrized without the PDF, using only installer inputs (e.g. env vars passed via Jenkins).
- add new <all-mcp-ocata-common.opnfv.runtime> reclass class;
- parametrize at runtime new reclass class based on global vars;
- factor out MaaS deploy address / config using new mechanism;
- parametrize at runtime virsh network definitions based on template;
- add new "maas.pxe_route" sls for configuring routing on cfg01;
- replace env vars with the new sls in "maas" state;
NOTE: baremetal parametrization will be handled later. Change-Id: Ifd61143d818fb088b3f4395388ba769bbc49156e Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-09-11 | salt master, maas: Move mcpcontrol to 10.20.0.0/24 | Alexandru Avadanii | 1 | -1/+1
Use INSTALLER_IP Jenkins param instead of SALT_MASTER_IP, allowing us to drop SALT_MASTER_IP completely from releng. mcpcontrol IP changes:
- 192.168.10.100 becomes 10.20.0.2 (align with legacy Fuel master);
- 192.168.10.3 becomes 10.20.0.3 (baremetal MaaS address);
JIRA: FUEL-285 Change-Id: I6e2d44c3a8b43846196bd64191735214167a76ce Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-09-08 | bash scripts: Fix remaining shellcheck warn/errs | Alexandru Avadanii | 1 | -2/+7
Fix/silence all shellcheck errors, except for scripts in <prototypes/sfc_tacker>. Change-Id: Idc317cdba0f69b78299f2d3665e72ffc19dd8af5 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-08-30 | states: maas: Retry linux state if no response | Alexandru Avadanii | 1 | -1/+2
JIRA: FUEL-283 Change-Id: Ie85af8c12163fac28cb8826aa8902a4ff3dec623 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-08-28 | [baremetal] Add required user on vcp nodes | Michael Polenchuk | 1 | -6/+6
* add user of "ubuntu" so that functest gets cluster credentials
* reduce cpu resources for vcp nodes in nofeature scenario
* tune salt targets for maas state
* specify ntp servers
Change-Id: I433a1de1cd2c69c6747c62c3359f5485dee3bfa4 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-08-24 | Merge "MaaS: commissioning/deployment retry" | Michael Polenchuk | 1 | -17/+46
2017-08-23 | MaaS: commissioning/deployment retry | Alexandru Avadanii | 1 | -17/+46
While at it, parametrize max attempt number in maas state's "wait_for", and reduce retries count for certain simpler tasks. Change-Id: I3ac2877719cdd32613bcf41186ebbb9f3f3aee93 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-08-23 | ci/deploy.sh, states: bash debug, continue on err | Alexandru Avadanii | 1 | -0/+2
Since we don't `set -e` in state files, applying each state will always succeed unless the last instruction in the state fails. Make this uniform by always succeeding in applying the state. While at it, enable bash debugging logs, for better readability of deploy log files. Change-Id: I3cf4886f6d73c6fd1380df1a4e1413334bec1701 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
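The exit-status behavior this commit message relies on can be reproduced with a few one-liners. The sketch below is illustrative only; the function names are made up, and the explicit `exit 0` is just one assumed way of "always succeeding", not necessarily what the commit added.

```shell
#!/bin/sh
# Each function runs a tiny script in a child shell without `set -e`.

# A failure in the middle is masked: the script exits with status 0
middle_fails() { sh -c 'false; true'; }

# Only the last command decides the outcome: the script exits with status 1
last_fails() { sh -c 'true; false'; }

# Assumption: an explicit final `exit 0` makes the outcome uniform
always_ok() { sh -c 'false; exit 0'; }
```

Calling `middle_fails; echo $?` and `last_fails; echo $?` shows why, without `set -e`, whether a state "fails" depends only on where the failing command happens to sit.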
2017-08-23 | states/maas: Add mcp.rsa.pub to authorized_keys | Alexandru Avadanii | 1 | -1/+4
Add our mcp.rsa.pub RSA key to all nodes, including VCP VMs. This is required for functest to be able to fetch openrc. While at it, add retry wrappers for more VCP VM state.sls calls. Change-Id: I34f79848c52e36de8d981055880321a081420874 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com> Signed-off-by: Guillermo Herrero <Guillermo.Herrero@enea.com>
2017-08-22 | Shift vcp nodes interfaces | Michael Polenchuk | 1 | -1/+3
* shift vcp nodes interfaces since names started from ens2
* add extra salt sync before vcp start up
* run rabbitmq state on 1st node beforehand then the rest
Change-Id: Ic2c174c288a5e89f2f28c0d9aa573340190a61d3 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-08-22 | states: maas: Retry applying VCP VMs linux state | Alexandru Avadanii | 1 | -1/+1
Running a heavy state like `linux` on all nodes (including VCP VMs) might time out the first time on slower systems. Change-Id: I21a3ad380afafa833f59e14da86aff92e254e9c7 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-08-21 | Cleanup outdated salt keys | Michael Polenchuk | 1 | -0/+3
Remove keys that are left over from the previous deployment to avoid interfering with the new ones. Change-Id: I0dfa9782cbce9a8e8b7c1efe5954c8ffe85996f9 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-08-21 | Check out all vcp nodes are available | Michael Polenchuk | 1 | -0/+11
Change-Id: I86bb27b323152440e8a885dbf867da433a288dae Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-08-19 | maas state: Add debug output to grep query loops | Alexandru Avadanii | 1 | -3/+3
Change-Id: Ic47a9dd2d5a4cccc9c4330509d81aba82f777084 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-08-18 | linux.network: Fix noifupdown in linux/map.jinja | Alexandru Avadanii | 1 | -2/+3
Previous changes attempted to add 'noifupdown' support, but failed to spell it correctly. Fix the typo and also edit the 'maas' state to use simple `salt state.apply` instead of `cmd.run 'salt-call'`. Change-Id: If9889dee896fa100febe0372fe2c4173fc223ee3 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2017-08-18 | Apply network config on kvm nodes | Michael Polenchuk | 1 | -4/+5
* re-assign ip from interface to bridge
  - install bridge utils
  - make a reboot straight away after network config
* change image source for vcp
Change-Id: I34506ee161337b5d3a4088cfdf3c082d99ccb695 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2017-08-17 | Bring in baremetal support | Alexandru Avadanii | 1 | -0/+56
- ci/deploy.sh: fail if default scenario file is missing;
- start by copying reclass/classes/cluster/virtual-mcp-ocata-ovs as classes/cluster/baremetal-mcp-ocata-ovs;
- add new state (maas) that will handle MaaS configuration;
- Split PXE network in two for baremetal:
  * rename old "pxe" virtual network to "mcpcontrol", make it non-configurable and identical for baremetal/virtual deploys;
  * new "pxebr" bridge is dedicated for MaaS fabric network, which comes with its own DHCP, TFTP etc.;
- Drop hardcoded PXE gateway & static IP for MaaS node, since "mcpcontrol" remains a NAT-ed virtual network, with its own DHCP;
- Keep internet access available on first interfaces for cfg01/mas01;
- Align MaaS IP addrs (all x.y.z.3), add public IP for easy debug via MaaS dashboard;
- Add static IP in new network segment (192.168.11.3/24) on MaaS node's PXE interface;
- Set MaaS PXE interface MTU 1500 (weird network errors with jumbo);
- MaaS node: Add NAT iptables traffic forward from "mcpcontrol" to "pxebr" interfaces;
- MaaS: Add hardcoded lf-pod2 machine info (fixed indentation in v6);
- Switch our targeted scenario to HA;
  * scenario: s/os-nosdn-nofeature-noha/os-nosdn-nofeature-ha/
- maas region: Use mcp.rsa.pub from ~ubuntu/.ssh/authorized_keys;
- add route for 192.168.11.0/24 via mas01 on cfg01;
- fix race condition on kvm nodes network setup:
  * add "noifupdown" support in salt formula for linux.network;
  * keep primary eth/br-mgmt unconfigured till reboot;
TODO:
- Read all this info from PDF (Pod Descriptor File) later;
- investigate leftover references to eno2, eth3;
- add public network interfaces config, IPs;
- improve wait conditions for MaaS commission/deploy;
- report upstream breakage in system.single;
Change-Id: Ie8dd584b140991d2bd992acdfe47f5644bf51409 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com> Signed-off-by: Guillermo Herrero <Guillermo.Herrero@enea.com> Signed-off-by: Charalampos Kominos <Charalampos.Kominos@enea.com> Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>