path: root/mcp
Age | Commit message | Author | Files | Lines
2019-05-31 | Merge "Revert "Patch dhcp agent to avoid unwanted resync"" | Michael Polenchuk | 5 | -36/+0
2019-05-29 | Revert "Patch dhcp agent to avoid unwanted resync" | Michael Polenchuk | 5 | -36/+0
This reverts commit 7522bdb0e898144da2b6dc361dbdd549b39bc025. The original patch has been merged (https://review.opendev.org/661011) Change-Id: I9a1c04590145800523d546e36e9462fa7074922c Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-05-29 | Disable block migration explicitly | Michael Polenchuk | 2 | -0/+2
Functest enabled block migration by default recently but it can't be used with shared storage. Change-Id: I15fd5459df91cece02e87cda9d1ed6e575194667 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-05-10 | [maas] Fix permissions on (partial) redeploy | Alexandru Avadanii | 2 | -4/+6
When redeploying a cluster only (keeping the infrastructure containers from a previous deploy), some things need to be adjusted:
- /entrypoint.sh exec permission;
- /etc/maas uid/gid re-align on new (fresh) deploy;
- account for different location of /usr/sbin/tcpdump apparmor profile for CentOS jumpservers;
Change-Id: If51db0bc95eff1a497e1df5d457e26a7b902aa5a
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-05-09 | [fdio] Bump compute RAM defaults for virtual PODs | Alexandru Avadanii | 6 | -19/+24
Hugepage count has been recently bumped for virtual PODs via IDF changes in Pharos, so align our FDio scenarios with the new RAM requirements. While at it, fix wrong pod_config template evaluation by moving it after the templated scenario files are expanded, since pod_config relies on scenario node definition. Also, configure VPP to use decimal interface names by default to align with Pharos macro for the VPP interface name string. Change-Id: Ib3a89c294a3a2755567fdbe07e3be2b8ca1a5714 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-05-07 | Merge "[dpdk] Get back to shared memory model" | Michael Polenchuk | 3 | -5/+5
2019-05-06 | Merge "[virtual] Parameterize scenarios based on PDF/IDF" | Alexandru Avadanii | 21 | -83/+115
2019-05-06 | [dpdk] Get back to shared memory model | Michael Polenchuk | 3 | -5/+5
The per port model potentially requires more memory (which is limited by labs) to support the same number of ports and configuration as the shared memory model. To enable per port mempool support for DPDK devices, set linux:network:openvswitch:per_port_memory explicitly to true. Change-Id: I130885afc50e7a047f8835113d370840827ad718 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-25 | Patch dhcp agent to avoid unwanted rescheduling | Michael Polenchuk | 7 | -6/+36
Change-Id: Id49f26a2615e2fc06e94eeaf2e9200e83625e6c9 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-24 | [ha] Decouple openstack services by roles | Michael Polenchuk | 1 | -10/+18
Deploy the OpenStack API services based on roles to prevent issues with absent database tables since db_sync runs only on the nodes with primary role. Change-Id: I04cf3ce0dd59afd93b8a0dfcf060fbd7e7411c82 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-23 | [iec] Copy full contents of IEC git repo | Alexandru Avadanii | 1 | -2/+2
Previously we only synced the scripts subdir, but going forward we will need the full contents of the IEC repo on all cluster nodes. Change-Id: I88edd4885875048d50d28c1eac9fd413dc2b6ffb Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-04-18 | mcpcontrol: Avoid duplicate ip rules | Alexandru Avadanii | 1 | -1/+2
Executing deploy.sh multiple times led to duplicating the ip rules. Change-Id: Iad5886a851970f166996226fa3d115a93113c6db Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
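One way to keep repeated runs from duplicating a rule is to check the existing `ip rule` output before adding it. The sketch below is only an illustration of that idempotency check, not the change made in this commit; the function names, the subnet 172.17.0.0/16 and the table number 200 are assumptions chosen for the example.

```python
def contains_tokens(tokens, wanted):
    """Return True if `wanted` appears as a contiguous run inside `tokens`."""
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def rule_present(ip_rule_output, selector):
    """Check whether some line of `ip rule` output already holds `selector`.

    Each output line looks like "<priority>:<tab>from ... lookup ...", so the
    priority prefix is dropped before comparing whitespace-separated tokens.
    """
    wanted = selector.split()
    for line in ip_rule_output.splitlines():
        _, _, body = line.partition(":")
        if contains_tokens(body.split(), wanted):
            return True
    return False


# Hypothetical sample output; subnet and table number are assumptions.
sample_output = (
    "0:\tfrom all lookup local\n"
    "32765:\tfrom 172.17.0.0/16 lookup 200\n"
    "32766:\tfrom all lookup main\n"
)
already_there = rule_present(sample_output, "from 172.17.0.0/16 lookup 200")
```

A caller would add the rule only when `rule_present` returns False, so running the deploy script several times leaves a single rule in place.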
2019-04-15 | mcpcontrol: policy based routing for INSTALLER_IP | Alexandru Avadanii | 1 | -1/+2
To bypass Docker 'bridge'-backed network isolation, we previously added an extra routing hop, which broke access from inside the 'mcpcontrol' Docker network (typically 10.20.0.0/24) to its bridge address (10.20.0.1), leading to DNS issues on Salt Master. This change leverages policy based routing to only add the extra routing hop for connections originating from the default Docker bridge network ('docker0'). Note that other Docker networks using the 'bridge' driver are still isolated from 'mcpcontrol'. Fixes: d9b44acb Change-Id: Ib92901c3278ae9b815f28f26d4c26f82bcadacd6 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-04-12 | Merge "[odl] Disable timeout for learnt flows of snat" | Michael Polenchuk | 4 | -5/+9
2019-04-12 | [baremetal] Tune up dpdk options | Michael Polenchuk | 2 | -10/+10
Optimized for LF-POD2, as the NIC assigned to the private/dpdk interface and the pinned cores reside on NUMA node #0. Core #11 is for DPDK, the remaining four cores are for PMDs. Change-Id: Icca701bc1a66f3672b8511e0245c82ca29788a8b Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-12 | [odl] Disable timeout for learnt flows of snat | Michael Polenchuk | 4 | -5/+9
Set timeout value for snat punts to zero to turn off the rate limiting and installation of learnt flows. Change-Id: I79dad8fd0f925bfc11d7dc1678c3a414dc35fa56 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-12 | Merge "route mcpcontrol via PXE br to bypass isolation" | Michael Polenchuk | 1 | -1/+2
2019-04-11 | route mcpcontrol via PXE br to bypass isolation | Alexandru Avadanii | 1 | -1/+2
Recent virsh/Docker network rework changed mcpcontrol (previously a virsh-managed network) into a Docker-controlled network using the 'bridge' driver. As a consequence, Docker now isolates traffic from 'mcpcontrol' network from the default Docker bridge network ('docker0') using iptables rules that check input/output interfaces. Yardstick (and any other Docker container hooked via 'docker0') will not be able to ssh into Salt master due to this isolation. One possible workaround would be to explicitly ACCEPT traffic from 'docker0' going to Salt master. However, this is only properly supported starting with Docker 17.06, while most CI hosts and end users are still using 17.05 or older. In older Docker releases, the DOCKER-USER iptables chain was not available, so injecting custom iptables rules and making them persistent is not only complicated, it's also prone to subtle errors. Another way to bypass the iptables rules is to route the packets coming from our new Docker network via another bridge before letting them find their way into 'docker0'. This change adds a new route for the Salt master host (note that MaaS container will not benefit from this) via the PXE bridge on the jumphost (which can be either a real Linux bridge for baremetal deployments or a virsh-managed network), adding one extra network hop for each packet going between our 'mcpcontrol' Docker network and 'docker0', effectively bypassing the Docker-enforced iptables DROP. Change-Id: Id8ac7a638c778887b361c9b64c320664c88f59fd Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-04-11 | [ha] Take out class with backports repo | Michael Polenchuk | 4 | -6/+10
* update system reclass * rectify telemetry redis options Change-Id: I6dca1ae52e7f7d73a90e53fceddca8e86872651b Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-10 | Merge "Setup repository with backports" | Michael Polenchuk | 12 | -11/+33
2019-04-09 | Merge "[VCP VMs] AArch64: Switch seeding back to qemu-nbd" | Alexandru Avadanii | 1 | -0/+2
2019-04-08 | Setup repository with backports | Michael Polenchuk | 12 | -11/+33
Change-Id: I791436f512dea6c6bc61133c4122ac872950af8e Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-08 | [VCP VMs] AArch64: Switch seeding back to qemu-nbd | Alexandru Avadanii | 1 | -0/+2
Upstream change [1] switched from old qemu-nbd preseeding of VCP VMs to using a cloud-init + configuration drive. This breaks on AArch64 with "IDE controllers are unsupported for this QEMU binary or machine type", so switch back to using qemu-nbd. [1] https://github.com/Mirantis/reclass-system-salt-model/commit/c0e4807 Change-Id: I0dfeb638d408343c76a73fafa503048a79ce1f6e Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-04-08 | [virtual] Parameterize scenarios based on PDF/IDF | Alexandru Avadanii | 21 | -83/+115
NOTE: only os-nosdn-nofeature-noha is parameterized for now.
- move config drive & disk creation from prepare_vms to create_vms;
- make default disk size(s) configurable based on scenario defaults and vPDF;
  * compute nodes require 2 disks to be defined in vPDF, since the pillar reclass model assumes /dev/vdb is reserved for cinder;
  * if multiple disks are defined in vPDF, they are created and attached accordingly (only ctl01 and cmp nodes are parameterized in this change; only for the os-nosdn-nofeature-noha scenario);
- vCPU specifications are deduced based on vPDF (sockets, cores);
  * threads/core is hard set to 2 since vPDF does not have a key for it;
  * NUMA resources are distributed evenly based on the number of sockets configured in PDF;
  * no less than the minimum requirement for a scenario is allocated (e.g. if PDF specifies 2 cores, but the scenario requires at least 4 cores, the larger value will be used);
- RAM is deduced based on PDF (but no less than the minimum requirement is allocated, e.g. if PDF specifies 2GB RAM for computes, but the scenario requires at least 8GB, the larger value will be used);
Change-Id: I97188aa2a1006865b8429eb6483e10c76795f7d2
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
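The "larger of PDF value and scenario minimum" rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this change: the function name, the dictionary keys (`sockets`, `cores`, `ram_gb`) and the per-NUMA-cell split of RAM are hypothetical and do not mirror the real PDF schema.

```python
THREADS_PER_CORE = 2  # hard set, since vPDF has no key for it


def deduce_node_resources(pdf_node, scenario_min):
    """Combine PDF values with scenario minimums, keeping the larger one.

    Both arguments are plain dicts with hypothetical keys; missing PDF
    values fall back to the scenario minimum.
    """
    sockets = max(1, pdf_node.get("sockets", 1))
    cores = max(pdf_node.get("cores", 0), scenario_min["cores"])
    ram_gb = max(pdf_node.get("ram_gb", 0), scenario_min["ram_gb"])
    return {
        "sockets": sockets,
        "cores": cores,
        "threads": THREADS_PER_CORE,
        "ram_gb": ram_gb,
        # Assumption: RAM is split evenly, one NUMA cell per socket.
        "ram_gb_per_numa_cell": ram_gb / sockets,
    }


# The two examples from the commit message: 2 cores vs. a minimum of 4,
# and 2GB RAM vs. a minimum of 8GB.
example = deduce_node_resources(
    {"sockets": 2, "cores": 2, "ram_gb": 2},
    {"cores": 4, "ram_gb": 8},
)
```

With these inputs the scenario minimums win for both cores and RAM, and a PDF that exceeds the minimums would be used unchanged.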
2019-04-05 | [dpdk] Rise up available memory on computes | Michael Polenchuk | 2 | -4/+4
There is not enough memory (default 4k pages) for services like libvirt, which then cannot fork child processes. Change-Id: I44d8efd7cafb52a7c823c02738c1d321017aa7a3 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-04 | Define stub for cinder service in keystone | Michael Polenchuk | 2 | -0/+8
Required only for Rally validation in cinder scenarios; it provides no useful functionality for the cluster itself. Change-Id: Idc4d62cbbc9974972e9d492b5a419342077e3d9a Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-03 | [noha] Deploy dhcp/metadata agents on computes | Michael Polenchuk | 2 | -0/+6
Sometimes an instance doesn't get an IP address from the DHCP server, which resides only on the gateway node, so run additional dhcp/metadata agents on compute nodes to handle tenant networks in place. Change-Id: If1d74af665cf8db64b09f846fac7192f76abdb25 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-04-02 | [dpdk] Enable per port memory model | Michael Polenchuk | 6 | -22/+25
The per port memory model provides a more transparent memory usage model and avoids pool exhaustion due to competing memory requirements for interfaces. (http://docs.openvswitch.org/en/latest/topics/dpdk/memory/) Change-Id: I5add0f49cdcdf2fc3d24affee10a275abe3ca46a Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-03-29 | [akraino] Add IEC K8-calico scenarios | Alexandru Avadanii | 23 | -16/+492
- bump Pharos git submodule to allow PODs with fewer nodes;
- add `k8-calico-iec-noha` scenario definition for Akraino IEC basic configuration;
- add `k8-calico-iec-vcp-noha` scenario definition for Akraino IEC nested (virtualized control plane) configuration;
- add `akraino_iec` state, which will leverage the Akraino IEC bootstrap scripts from [1];
- replace system.reboot salt call with cmd.run 'reboot' as it's more reliable;
- use kernel 4.15 for AArch64 K8 IEC scenarios;
NOTE: These scenarios will not be released in OPNFV since they don't rely on Salt formulas but instead on Akraino IEC scripts to install K8s.
[1] https://gerrit.akraino.org/r/#/q/project:iec
Change-Id: I4e538e0563d724cd3fd5c4d462ddc22d0c739402
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-03-29 | Bring in kubernetes scenario | Michael Polenchuk | 13 | -0/+483
Change-Id: I2b41ce2e275bb053fa2590654ea7fa432b0c857f Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-03-27 | Rectify system reclass after update | Michael Polenchuk | 9 | -1/+23
* add opendaylight password (removed from system level)
* get updated ovn system class w/o mysql settings
* re-enable ceilometer user (removed along with outdated service/endpoints)
* adjust check interval of haproxy for noha scenarios, since there is only one backend for services, i.e. failover is not expected
Change-Id: Iedee290e1cfcf838998bd44dc09a729d143974ac
Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-03-27 | Merge "[fdio] salt-formula-neutron: Fix VPP support patch" | Michael Polenchuk | 1 | -25/+24
2019-03-25 | [fdio] salt-formula-neutron: Fix VPP support patch | Alexandru Avadanii | 1 | -25/+24
After Rocky support was added upstream to salt-formula-neutron, our FDIO patch continued to be applied only for Queens, so refresh the patch by switching to Rocky. Change-Id: If0bbb9c4ec674d386ceade00ef8fe936482fb49c Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-03-25 | Update system reclass | Michael Polenchuk | 14 | -14/+14
Change-Id: I745a838b1f2f294b6c455700509ddf4b0264446f Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-03-19 | Revert "Fix race condition with nova privsep utime" | Michael Polenchuk | 3 | -16/+0
This reverts commit ac56d7b14f46b05f497b3dca4b6a4b0bfedd83e2. The original patch has been merged (https://review.openstack.org/643011) Change-Id: I3a7cd825f371e375d36256143b4b8c91f90ee26e Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-03-18 | [lib] nbd: Explicitly map partitions | Alexandru Avadanii | 1 | -1/+5
Certain kernels (e.g. 4.4.0-101+ in Ubuntu) no longer automatically ack the partition table update after `kpartx -a /dev/nbdX`, see [1]. To avoid another dependency on `parted` packages, use `partx` from `util-linux`, which is already installed as a dependency of e2fsprogs. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1743026 Change-Id: Ibd993fe210c1a11814e89a66759568d4d117d613 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-03-14 | Smooth down telemetry services | Michael Polenchuk | 11 | -33/+2
* update gnocchi to 4.3 * remove outdated ceilometer api Change-Id: I7adaf3ddc76d93531b6b0997b684672b80f2992f Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-03-06 | [lib] Create veths using systemd opnfv-fuel units | Alexandru Avadanii | 2 | -9/+43
Create 2 systemd services on the jumphost that will handle veth pairs creation, respectively adding them to virsh/real bridges. This allows us to set docker containers restart policy to 'always', enabling persistent Salt Master/MaaS containers across jumphost reboots. NOTE: libvirt creates virtual networks async, hence the need for retrying hooking veths to them. JIRA: FUEL-406 Change-Id: I1ca033cb5eb854b577b57bb2387a58bd9605a5bb Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
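Because the target bridge may not exist yet when the service starts, attaching a veth has to be retried. The sketch below shows a generic bounded retry loop of that kind; it is an illustration only, and the `retry` helper, its parameters and the `flaky_attach` stand-in are hypothetical rather than taken from the systemd units added here.

```python
import time


def retry(action, attempts=5, delay=1.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` tries are used up.

    `sleep` is injectable so the loop can be exercised without waiting.
    The last exception is re-raised when every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: any failure is retried
            last_error = error
            if attempt < attempts:
                sleep(delay)
    raise last_error


# Stand-in for "hook the veth to a bridge that libvirt has not created yet":
# it fails twice, then succeeds.
calls = []


def flaky_attach():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("bridge not ready")
    return "attached"


result = retry(flaky_attach, attempts=5, delay=0, sleep=lambda seconds: None)
```

Bounding the number of attempts keeps a permanently missing bridge from blocking the unit forever, while still tolerating the asynchronous network creation mentioned in the NOTE.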
2019-03-04 | Turn off meltdown/spectre patches | Michael Polenchuk | 4 | -0/+12
Change-Id: Id75ffe4db808a4ec250ba8b86c5d49f1206c3784 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-02-28 | Tune up nova/neutron intervals | Michael Polenchuk | 17 | -144/+28
Also re-align resources for virtual scenarios. Change-Id: Id0d55407fd5b1720a24e30c364219f8b08e89d06 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-02-26 | Fix race condition with nova privsep utime | Michael Polenchuk | 3 | -0/+16
Bug: https://bugs.launchpad.net/nova/+bug/1809123 Change-Id: I14622c21826aeeddac6ea7bf7f9d116cd3e68cfb Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-02-26 | Merge "[cfg01] Reduce mine_interval to 15 min" | Michael Polenchuk | 1 | -1/+1
2019-02-22 | [lib] Add fatal validation of old kernel on Ubuntu | Alexandru Avadanii | 1 | -0/+8
As reported in [1], kernel 4.4 seems to break nested virtualization, so add a fatal check against it. [1] https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1797332 Change-Id: I0aef8a7340dd82bfeb2e58c9642623b9ec13dca5 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
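A fatal check of this kind can be sketched as a small version test on the running kernel release. This is a minimal illustration assuming the check targets the 4.4 series only; the function names and the error text are hypothetical and not copied from the actual change.

```python
import platform
import re


def is_broken_kernel(release):
    """Return True when `release` (e.g. "4.4.0-101-generic") is a 4.4 kernel."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False  # unknown format: do not block the deploy
    return (int(match.group(1)), int(match.group(2))) == (4, 4)


def check_kernel(release=None):
    """Abort with a non-zero exit status if the kernel is in the 4.4 series."""
    release = release or platform.release()
    if is_broken_kernel(release):
        raise SystemExit(
            f"Kernel {release} is reported to break nested virtualization"
        )
```

Comparing the parsed major and minor numbers, rather than matching a string prefix, avoids treating a release such as 4.40 as a 4.4 kernel.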
2019-02-22 | [cfg01] Reduce mine_interval to 15 min | Alexandru Avadanii | 1 | -1/+1
Some PODs are fast enough to get past installing, syncing and using MaaS to provision the OS on the baremetal nodes before the 1h mine refresh. Since mine.update operation is fast enough to go unnoticed and we only collect IP addresses, grains and pem entries, schedule it every 15 minutes. Due to reclass class inheritance, we can't easily override this via pillar data, so handle it via entrypoint.sh. Change-Id: I0d8ed2da838ad09c94e9327d0131d3e239de4f08 Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-02-22 | Install missing gnocchi dependencies | Michael Polenchuk | 3 | -0/+13
Change-Id: Ifc4fff90551344c69295990b220f0778967887a4 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>
2019-02-19 | Merge "[baremetal] Containerize MaaS" | Alexandru Avadanii | 28 | -179/+183
2019-02-15 | Merge "[cfg01] Schedule x509.get_pem_entries mine update" | Alexandru Avadanii | 1 | -0/+4
2019-02-15 | [cfg01] Schedule x509.get_pem_entries mine update | Alexandru Avadanii | 1 | -0/+4
Previously, Salt Master CA mine was only sent once, during salt.minion.ca state execution at cfg01 bringup / bootstrap. This causes possible issues with:
- Salt Master container restart (mine data is lost);
- UNH Lab deployment (unknown root cause, might be related to XFS and overlay2 being used with Docker on CentOS);
To bypass this issue, make x509.get_pem_entries module send mine data at the default mine interval (60 minutes).
Change-Id: I5f6334ae18f5af6cbe0a164791603b67f0a3668f
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-02-14 | [baremetal] Containerize MaaS | Alexandru Avadanii | 28 | -179/+183
- replace mas01 VM with a Docker container;
- drop `mcpcontrol` virsh-managed network, including special handling previously required for it across all scripts;
- drop infrastructure VMs handling from scripts, the only VMs we still handle are cluster VMs for virtual and/or hybrid deployments;
- drop SSH server from mas01;
- stop running linux state on mas01, as all prerequisites are properly handled during Docker build or via entrypoint.sh (for completeness, we still keep pillar data in sync with the actual contents of mas01 configuration, so running the state manually would still work);
- make port 5240 available on the jumpserver for MaaS dashboard access;
- docs: update diagrams and text to reflect the new changes;
Change-Id: I6d9424995e9a90c530fd7577edf401d552bab929
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
2019-02-14 | Rise up salt's gather job timeout | Michael Polenchuk | 1 | -1/+2
While the minions are working on their jobs, the CLI waits for the initial timeout period (timeout) to elapse. When that happens, the CLI sends the first "find_job" query, which kicks off the gather_job_timeout timer. Sometimes a minion doesn't respond to the request within the gather_job_timeout period (default is 10s), so raise this value to give the minion a chance to report its actual status. Change-Id: Ic3756b82fdeb17718870ab30e9578263d25309f7 Signed-off-by: Michael Polenchuk <mpolenchuk@mirantis.com>