Age | Commit message | Author | Files | Lines |
|
We used to try to probe all the cores that the VNF was using
and only dump CPU stats for those cores.
We can't really detect those cores accurately and we would rather
dump all the core information and let influxdb and grafana filter
the information.
We do end up with excessive KPI output, especially on systems with 88
cores, but this is manageable.
The core logic was partially removed; this finishes the removal.
Change-Id: I5cbb694fd982519e2df54db49a21ed5948e13537
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
switch from hardcoded path to dynamic path
based on bin_path
also enable proxy for install_collectd
add barometer settings for virt and ovs_stats
Change-Id: Id138aef548332a3e3fcb3963b746e7c9f10c0948
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
The PROX tests were hanging in the duration
runner.
These are fixes for various errors:
raise error in collect_kpi if VNF is down
move prox dpdk_rebind after collectd stop
fix dpdk nicbind rebind to group by drivers
prox: raise error in collect_kpi if the VNF is down
prox: add VNF_TYPE for consistency
sample_vnf: debug and fix kill_vnf
pkill is not matching some executable names,
add some debug process dumps and try switching
back to killall until we can find the issue
sample_vnf: add default timeout, so we can override
default 3600 SSH timeout
collect_kpi is the point at which we check
the VNFs and TGs for failures or exits
queues are the problem: make sure we aren't silently blocking on
non-empty queues by canceling the join thread in the subprocess
fixup duration runner to close queues
and other attempts to stop the duration runner
from hanging
VnfdHelper: memoize port_num
resource: fail if ssh can't connect
at the end of 3600 second test our ssh connection
is dead, so we can't actually stop collectd
unless we reconnect
fix stop() logic to ignore ssh errors
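
The queue fix above can be illustrated with a minimal sketch. It assumes
Python 3 and the standard multiprocessing module; drain_and_close is a
hypothetical helper name, not a function from this change. A process that
has put items on a multiprocessing.Queue waits at exit for the queue's
feeder thread to flush, which can block if nobody reads the queue.
Calling cancel_join_thread() before close() removes that wait.

```python
import multiprocessing
from queue import Empty


def drain_and_close(result_queue, timeout=0.5):
    """Read whatever is left on the queue, then close it without joining.

    Hypothetical helper for illustration only.
    """
    items = []
    while True:
        try:
            # A short timeout gives the feeder thread time to flush.
            items.append(result_queue.get(timeout=timeout))
        except Empty:
            break
    # Do not wait for the feeder thread when this process exits.
    result_queue.cancel_join_thread()
    result_queue.close()
    return items
```

Note that cancel_join_thread() can lose data that has not been flushed
yet, so it only fits cases where unread results may be discarded.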
Change-Id: I6c8e682a80cb9d00362e2fef4a46df080f304e55
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
allow manually adding collectd nodes using Node context.
if a node is present with a collectd config dict then
we can create a ResourceProfile object for it
and connect to collectd.
example
nodes:
-
    name: compute_0
    role: Compute
    ip: 1.1.1.1
    user: root
    password: r00t
    collectd:
      interval: 5
      plugins:
        ovs_stats: {}
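
A rough sketch of the selection rule described above, assuming the node
list has already been loaded into Python dicts. nodes_with_collectd is a
hypothetical name, and the real code creates a ResourceProfile rather
than returning the raw config.

```python
def nodes_with_collectd(nodes):
    """Map node name to collectd config for nodes that define one.

    Hypothetical helper: nodes without a collectd dict are skipped.
    """
    selected = {}
    for node in nodes:
        config = node.get("collectd")
        if isinstance(config, dict):
            selected[node["name"]] = config
    return selected


example_nodes = [
    {"name": "compute_0", "role": "Compute", "ip": "1.1.1.1",
     "collectd": {"interval": 5, "plugins": {"ovs_stats": {}}}},
    {"name": "controller_0", "role": "Controller", "ip": "1.1.1.2"},
]
collectd_configs = nodes_with_collectd(example_nodes)
```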
Change-Id: Ie0c00fdb58373206071daa1fb13faf175c4313e0
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
We have the collectd.conf inside the python package
so instead of copying it from various places,
write the template directly to the remote system.
collectd: read collectd.conf template with pkg_resources
read the collectd.conf file as a string directly
and upload without creating temp file
use Jinja2 template, disable failing plugins
use proper Jinja2 template, disable the plugins that
were failing to load and blocking startup
add support for per-testcase collectd.conf config
using YAML
add support for custom interval, default is 25 seconds
Change-Id: Id904f7b7c9f41a9dd7adf5dfa06c064d65c25d2d
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
Add a new PortPair class to resolve the
topology into list of public and private ports.
Before we were calculating public/private in multiple
locations and using different conventions.
In addition, for all the DPDK tests we need to use the DPDK
port number and not rely on interface ordering or interface naming
conventions.
We used to use xe0 -> 0, xe1 -> 1, etc. This is not the DPDK port
number.
Use the new dpdknicbind_helper class to parse the output of
dpdk-devbind.py to find the actual DPDK port number at runtime.
We then use this DPDK port number to correctly calculate the
port_mask_hex.
The port mask maps the DPDK port num (PMD ID) to the LINK ID
used in the pipeline config
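
As an illustration of the mask calculation, here is a minimal sketch that
assumes the mask sets one bit per DPDK port number. The function below is
a hypothetical stand-in, not the code from this change.

```python
def port_mask_hex(dpdk_port_nums):
    """Return a hex bitmask with one bit set per DPDK port number.

    Assumption for illustration: bit N corresponds to DPDK port N.
    """
    mask = 0
    for port_num in dpdk_port_nums:
        mask |= 1 << port_num
    return hex(mask)
```

With this assumption, ports 0 and 1 give 0x3, while ports 0 and 2 give
0x5, which is why deriving the numbers from interface names (xe0 -> 0,
xe1 -> 1) can produce the wrong mask.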
We also need to make sure we only use the interfaces matched to the
topology and not use all the interfaces, because in some cases we will
have unused interfaces. In particular TRex always requires an even
number of interfaces, so for single port TRex tests we have to create
the second port and not use it.
Thus we had to modify the traffic generator stats code to only dump
stats for used ports and not unused ports.
Ixia was using interface ordering to map to Ixia ports, instead we use
the dpdk_port_num which must be hardcoded for Ixia.
Renamed traffic_profile.execute to traffic_profile.execute_traffic so
we can trace the code more easily.
We pass the port used by the traffic profile to generate_samples so we
don't get stats for unused ports.
Fixed up vPE config creation and bring up issues.
Fixed up CGNAPT and UDP_Replay to work correctly.
Tested with 4-port scale-out
Change-Id: I2e4f328bff2904108081e92a4bf712333fa73869
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Edward MacGillivray <edward.s.macgillivray@intel.com>
|
|
Change-Id: I81ff3d43d209e98188855c8b2eb302835bb5d417
Signed-off-by: Neha Vadnere <neha.r.vadnere@intel.com>
Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
Change-Id: I3f7b9ca17164564b11517116e7e73b47f42243b9
Signed-off-by: Deepak S <deepak.s@linux.intel.com>
|
|
Change-Id: I15e4ac38b347a08350b71c68469e2793eeed92ab
Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Edward MacGillivray <edward.s.macgillivray@intel.com>
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
we need to follow the default paramiko rules:
first use pkey, then key_filenames (autodetecting ~/.ssh/ keys),
then password
We have too much redundant boilerplate code everywhere, we need
to standardize on a factory function that takes a node dict.
Using Python3 ChainMap we can layer overrides and defaults.
VNF descriptors have to default key_filename, password to Python None.
The only way to do this is to omit the keys if the variable is not
defined. This way the dict will not have the value and it will
default to Python None
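
The layering idea can be sketched as follows, assuming Python 3's
collections.ChainMap. DEFAULTS, ssh_kwargs and the ssh_port key are
hypothetical names for illustration; this is not the ssh.SSH.from_node
implementation.

```python
from collections import ChainMap

# Hypothetical defaults, for illustration only.
DEFAULTS = {"user": "root", "ssh_port": 22}


def ssh_kwargs(node, overrides=None):
    """Layer overrides over the node dict over the defaults.

    Keys missing from every layer come back as None via .get(), which
    is why the node dict must omit undefined values instead of setting
    them to empty strings.
    """
    layered = ChainMap(overrides or {}, node, DEFAULTS)
    return {
        "user": layered["user"],
        "host": layered["ip"],
        "port": layered["ssh_port"],
        "key_filename": layered.get("key_filename"),
        "password": layered.get("password"),
    }
```

ChainMap looks up each key in the first mapping that contains it, so an
override wins over the node dict, and the node dict wins over defaults.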
Add python2 chainmap backport
Updated unittest mocking to use ssh.SSH.from_node
Change-Id: I80b0cb606e593b33e317c9e5e8ed0b74da591514
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
Failures:
test_connect (network_services.nfvi.test_collectd.TestAmqpConsumer) ... ERROR:pika.adapters.base_connection:Connection to 1.1.1.1:5672 failed: timeout
WARNING:pika.connection:Could not connect, 0 attempts left
ERROR:pika.callback:Calling <bound method SelectConnection._on_connection_error of <pika.adapters.select_connection.SelectConnection object at 0x7fe7e2333710>> for "0:_on_connection_error" failed
Traceback (most recent call last):
File "/home/jenkins/opnfv/slave_root/workspace/yardstick-verify-master/.tox/py27/local/lib/python2.7/site-packages/pika/callback.py", line 236, in process
callback(*args, **keywords)
File "/home/jenkins/opnfv/slave_root/workspace/yardstick-verify-master/.tox/py27/local/lib/python2.7/site-packages/pika/connection.py", line 1265, in _on_connection_error
self.params.connection_attempts)
AMQPConnectionError: Connection to 1.1.1.1:5672 failed: timeout
ok
Firstly, 1.1.1.1 is not an appropriate fake address; use 127.0.0.1 so we don't try
to connect to anything external.
But 127.0.0.1 won't work anyway, so disable test_connect
replace 152.16.0.0 with 172.16.0.0
Remove network_services.nfvi.test_resource.TestResourceProfile since it
also fails due to same error
Remove test_amqp_collect_nfvi_kpi_exception
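
The address choices above can be checked with the standard ipaddress
module (Python 3). stays_local is a hypothetical helper name used only
for this sketch.

```python
import ipaddress


def stays_local(addr):
    """True if addr is a loopback or private-range address.

    Hypothetical helper: such addresses are not routed on the public
    Internet, so tests using them cannot reach an external host.
    """
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback or ip.is_private
```

Here 1.1.1.1 and 152.16.0.0 are public addresses, while 172.16.0.0 falls
in the private 172.16.0.0/12 range and 127.0.0.1 is loopback.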
Change-Id: I00bb1729658e18b4651129661ad9dd9c0dedcf37
Signed-off-by: Ross Brattain <ross.b.brattain@intel.com>
|
|
This patch adds a common function to collect NFVi KPIs for the given use cases
- Core KPIs like memory/LLC/IPC etc
- OVS stats
- memory stats etc.
JIRA: YARDSTICK-488
Change-Id: Iab41146392efc47b7313b1846a67728a44d0f1d6
Signed-off-by: Deepak S <deepak.s@linux.intel.com>
|