author Maryam Tahhan <maryam.tahhan@intel.com> 2017-02-16 14:28:05 +0000
committer Maryam Tahhan <maryam.tahhan@intel.com> 2017-02-17 09:35:25 +0000
commit 2bb8c4857689cabe69d3d2d3d54dffa78d8f4a9f
tree 75c0b7fdeb5167588fe42d02702fb7d5b354725a /docs/requirements
parent 47ccd41d789085a2186fc1fb86364d93a20783ef
docs: moving to new doc structure
Change-Id: I91188deec2bd4e8aa405a9e023acde42b3fb31f7
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Diffstat (limited to 'docs/requirements')
-rw-r--r-- docs/requirements/01-intro.rst 183
-rwxr-xr-x docs/requirements/02-collectd.rst 103
-rw-r--r-- docs/requirements/03-dpdk.rst 170
-rwxr-xr-x docs/requirements/barometer_scope.png bin 39958 -> 0 bytes
-rw-r--r-- docs/requirements/dpdk_ka.png bin 100808 -> 0 bytes
-rw-r--r-- docs/requirements/index.rst 14
-rw-r--r-- docs/requirements/stats_and_timestamps.png bin 52193 -> 0 bytes
7 files changed, 0 insertions, 470 deletions
diff --git a/docs/requirements/01-intro.rst b/docs/requirements/01-intro.rst
deleted file mode 100644
index bc0e9ba0..00000000
--- a/docs/requirements/01-intro.rst
+++ /dev/null
@@ -1,183 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-Introduction
-============
-Barometer is the project that renames Software Fastpath service Quality Metrics
-(SFQM) and broadens its scope, which was previously networking centric.
-
-The goal of SFQM was to develop the utilities and libraries in DPDK to
-support:
-
-* Measuring Telco Traffic and Performance KPIs, including:
-
- * Packet Delay Variation (by enabling TX and RX time stamping).
- * Packet loss (by exposing extended NIC stats).
-
-* Performance Monitoring of the DPDK interfaces (by exposing
- extended NIC stats + collectd Plugin).
-* Detecting and reporting violations that can be consumed by VNFs
- and higher level management systems (through DPDK Keep Alive).
-
-With Barometer the scope is extended to monitoring the NFVI. The ability to
-monitor the Network Function Virtualization Infrastructure (NFVI) where VNFs
-are in operation will be a key part of Service Assurance within an NFV
-environment, in order to enforce SLAs or to detect violations, faults or
-degradation in the performance of NFVI resources so that events and relevant
-metrics are reported to higher level fault management systems.
-If physical appliances are going to be replaced by virtualized appliances,
-the service levels, manageability and service assurance need to remain
-consistent with, or improve on, what is available today. As such, the NFVI
-needs to support:
-
-* Traffic monitoring and performance monitoring of the components that provide
- networking functionality to the VNF, including: physical interfaces, virtual
- switch interfaces and flows, as well as the virtual interfaces themselves and
- their status, etc.
-* Platform monitoring including: CPU, memory, load, cache, thermals, fan speeds,
- voltages and machine check exceptions, etc.
-
-All of the statistics and events gathered must be collected in-service and must
-be capable of being reported by standard Telco mechanisms (e.g. SNMP), for
-potential enforcement or correction actions. In addition, this information
-could be fed to analytics systems to enable failure prediction, and can also be
-used for intelligent workload placement.
-
-
-All developed features will be upstreamed to Open Source projects relevant to
-telemetry such as `collectd`_ and `Ceilometer`_.
-
-The OPNFV project wiki can be found at `Barometer`_.
-
-Problem Statement
-==================
-Providing carrier grade Service Assurance is critical in the network
-transformation to a software defined and virtualized network (NFV).
-Medium- and large-scale cloud environments comprise anywhere from hundreds to
-hundreds of thousands of infrastructure systems. It is vital to monitor these
-systems for malfunctions that could disrupt users' application services, and
-to react promptly to such fault events in order to improve overall system
-performance. As the size of the infrastructure and the number of virtual
-resources grow, so does the effort of monitoring back-ends. SFQM aims to
-expose as much useful information as possible off the platform so that faults
-and errors in the NFVI can be detected promptly and reported to the
-appropriate fault management entity.
-
-The OPNFV platform (NFVI) requires functionality to:
-
-* Create a low latency, high performance packet processing path (fast path)
- through the NFVI that VNFs can take advantage of;
-* Measure Telco Traffic and Performance KPIs through that fast path;
-* Detect and report violations that can be consumed by VNFs and higher level
- EMS/OSS systems.
-
-Examples of locally measurable QoS factors for Traffic Monitoring which impact
-both Quality of Experience and five-nines availability would be (using Metro
-Ethernet Forum Guidelines as reference):
-
-* Packet loss
-* Packet Delay Variation
-* Uni-directional frame delay
-
-Other KPIs such as Call drops, Call Setup Success Rate, Call Setup time etc. are
-measured by the VNF.
-
-In addition to Traffic Monitoring, the NFVI must also support Performance
-Monitoring of the physical interfaces themselves (e.g. NICs), i.e. an ability to
-monitor and trace errors on the physical interfaces and report them.
-
-All these traffic statistics for Traffic and Performance Monitoring must be
-measured in-service and must be capable of being reported by standard Telco
-mechanisms (e.g. SNMP traps), for potential enforcement actions.
-
-Barometer updated scope
-=======================
-The scope of the project is to provide interfaces to support monitoring of the
-NFVI. The project will develop plugins for telemetry frameworks to enable the
-collection of platform stats and events and relay gathered information to fault
-management applications or the VIM. The scope is limited to
-collecting/gathering the events and stats and relaying them to a relevant
-endpoint. The project will not enforce or take any actions based on the
-gathered information.
-
-.. image:: barometer_scope.png
-
-Scope of SFQM
-=============
-**NOTE:** The SFQM project has been replaced by Barometer.
-The output of the project will provide interfaces and functions to support
-monitoring of Packet Latency and Network Interfaces while the VNF is in service.
-
-The DPDK interface/API will be updated to support:
-
-* Exposure of NIC MAC/PHY Level Counters
-* Interface for Time stamp on RX
-* Interface for Time stamp on TX
-* Exposure of DPDK events
-
-collectd will be updated to support the exposure of DPDK metrics and events.
-
-Specific testing and integration will be carried out to cover:
-
-* Unit/Integration Test plans: A sample application provided to demonstrate packet
- latency monitoring and interface monitoring
-
-The following list of features and functionality will be developed:
-
-* DPDK APIs and functions for latency and interface monitoring
-* A sample application to demonstrate usage
-* collectd plugins
-
-The scope of the project involves developing the relevant DPDK APIs, OVS APIs,
-sample applications, as well as the utilities in collectd to export all the
-relevant information to a telemetry and events consumer.
-
-VNF specific processing, Traffic Monitoring, Performance Monitoring and
-Management Agent are out of scope.
-
-The Proposed Interface counters include:
-
-* Packet RX
-* Packet TX
-* Packet loss
-* Interface errors + other stats
-
-The Proposed Packet Latency Monitor includes:
-
-* Cycle accurate stamping on ingress
-* Supports latency measurements on egress
-
-Support for failover of DPDK enabled cores is also out of scope of the current
-proposal. However, this is an important requirement and must-have functionality
-for any DPDK enabled framework in the NFVI. To that end, a second phase of this
-project will be to implement DPDK Keep Alive functionality that would address
-this and would report to a VNF-level Failover and High Availability mechanism
-that would then determine what actions, including failover, may be triggered.
-
-Consumption Models
-===================
-In reality many VNFs will have an existing performance or traffic monitoring
-utility used to monitor VNF behavior and report statistics, counters, etc.
-
-The consumption of performance and traffic related information/events provided
-by this project should be a logical extension of any existing VNF/NFVI monitoring
-framework. It should not require a new framework to be developed. We do not
-expect the metrics and events gathered by Barometer to require major
-additional effort for monitoring frameworks to consume; the project aims to
-fit in with existing monitoring frameworks. The intention is that this project
-represents an interface for NFVI monitoring to be used by higher level fault
-management entities (see below).
-
-Allowing the Barometer metrics and events to be handled within existing
-telemetry frameworks simplifies overall interfacing with higher
-level management components in the VIM, MANO and OSS/BSS. The Barometer
-proposal would be complementary to the Doctor project, which addresses NFVI Fault
-Management support in the VIM, and the VES project, which addresses the
-integration of VNF telemetry-related data into automated VNF management
-systems. To that end, the project committers and contributors for the Barometer
-project wish to collaborate with the Doctor and VES projects to facilitate this.
-
-.. _Barometer: https://wiki.opnfv.org/display/fastpath
-.. _collectd: http://collectd.org/
-.. _Ceilometer: https://wiki.openstack.org/wiki/Telemetry
diff --git a/docs/requirements/02-collectd.rst b/docs/requirements/02-collectd.rst
deleted file mode 100755
index 2303fadc..00000000
--- a/docs/requirements/02-collectd.rst
+++ /dev/null
@@ -1,103 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-collectd
-~~~~~~~~
-collectd is a daemon which collects system performance statistics periodically
-and provides a variety of mechanisms to publish the collected metrics. It
-supports more than 90 different input and output plugins. Input plugins retrieve
-metrics and publish them to the collectd daemon, while output plugins publish
-the data they receive to an end point. collectd also has infrastructure to
-support thresholding and notification.
-
-collectd statistics and Notifications
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Within collectd, notifications and performance data are dispatched in the same
-way. There are producer plugins (plugins that create notifications/metrics),
-and consumer plugins (plugins that receive notifications/metrics and do
-something with them).
-
-Statistics in collectd consist of a value list. A value list includes:
-
-* Values, can be one of:
-
- * Derive: used for values where the change in the value since it was last
- read is of interest. Can be used to calculate and store a rate.
-
- * Counter: similar to derive values, but takes the possibility of a counter
- wrap-around into account.
-
- * Gauge: used for values that are stored as is.
-
- * Absolute: used for counters that are reset after reading.
-
-* Value length: the number of values in the data set.
-
-* Time: timestamp at which the value was collected.
-
-* Interval: interval at which to expect a new value.
-
-* Host: used to identify the host.
-
-* Plugin: used to identify the plugin.
-
-* Plugin instance (optional): used to group a set of values together, for
- example values belonging to a DPDK interface.
-
-* Type: the unit used to measure a value. In other words, it refers to the
- data set that describes the value.
-
-* Type instance (optional): used to distinguish between values that have an
- identical type.
-
-* Meta data: an opaque data structure that enables the passing of additional
- information about a value list. "Meta data in the global cache can be used to
- store arbitrary information about an identifier" [7].
-
-Host, plugin, plugin instance, type and type instance uniquely identify a
-collectd value.
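-
-The difference between derive and counter values can be illustrated with a
-short sketch. It is not taken from the collectd sources: the function names
-and the fixed 32-bit counter width are assumptions made for illustration only.
-
-.. code-block:: c
-
-   #include <stdint.h>
-
-   /* Hypothetical helper: rate of a DERIVE-style value, i.e. the plain
-    * change since the last read divided by the interval in seconds. */
-   static double derive_rate(int64_t prev, int64_t curr, double interval_s)
-   {
-       return (double)(curr - prev) / interval_s;
-   }
-
-   /* Hypothetical helper: rate of a COUNTER-style value. A reading that is
-    * smaller than the previous one is treated as a single wrap-around; a
-    * 32-bit counter is assumed here. */
-   static double counter_rate32(uint32_t prev, uint32_t curr, double interval_s)
-   {
-       /* Conversion to uint32_t is modulo 2^32, which absorbs one wrap. */
-       uint32_t delta = (uint32_t)(curr - prev);
-       return (double)delta / interval_s;
-   }
-
-With these definitions, a counter that moves from 4294967290 to 4 over a
-10 second interval has advanced by 10 and yields a rate of 1 per second,
-whereas treating the same two readings as a derive value gives a large
-negative rate.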
-
-Value lists are often accompanied by data sets that describe the values in more
-detail. Data sets consist of:
-
-* A type: a name which uniquely identifies a data set.
-
-* One or more data sources (entries in a data set) which include:
-
- * The name of the data source. If there is only a single data source this is
- set to "value".
-
- * The type of the data source, one of: counter, gauge, absolute or derive.
-
- * A min and a max value.
-
-Types in collectd are defined in types.db. Examples of types in types.db:
-
-.. code-block:: console
-
- bitrate value:GAUGE:0:4294967295
- counter value:COUNTER:U:U
- if_octets rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295
-
-In the example above, if_octets has two data sources: rx and tx.
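-
-Each data source specification has the form name:TYPE:min:max. As a rough
-illustration, one such specification can be split into its four fields as
-sketched below. This is not collectd's own parser; the struct and function
-names are hypothetical, and min and max are kept as text because ``U`` marks
-an unbounded limit.
-
-.. code-block:: c
-
-   #include <stdio.h>
-
-   /* Hypothetical container for one data source of a types.db entry. */
-   struct data_source {
-       char name[64];
-       char type[16]; /* COUNTER, GAUGE, ABSOLUTE or DERIVE */
-       char min[32];  /* a number, or "U" for unbounded */
-       char max[32];  /* a number, or "U" for unbounded */
-   };
-
-   /* Parse one "name:TYPE:min:max" specification.
-    * Returns 0 on success and -1 if a field is missing. */
-   static int parse_data_source(const char *spec, struct data_source *ds)
-   {
-       /* Leading blanks are skipped; the last field stops at a comma or
-        * whitespace so that one entry of a comma-separated list works. */
-       int n = sscanf(spec, " %63[^:]:%15[^:]:%31[^:]:%31[^, \t\n]",
-                      ds->name, ds->type, ds->min, ds->max);
-       return n == 4 ? 0 : -1;
-   }
-
-Calling parse_data_source() on ``rx:COUNTER:0:4294967295`` fills in the name
-``rx``, the type ``COUNTER`` and the limits ``0`` and ``4294967295``.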
-
-Notifications in collectd are generic messages containing:
-
-* An associated severity, which can be one of OKAY, WARNING, and FAILURE.
-
-* A time.
-
-* A message.
-
-* A host.
-
-* A plugin.
-
-* A plugin instance (optional).
-
-* A type.
-
-* A type instance (optional).
-
-* Meta-data.
diff --git a/docs/requirements/03-dpdk.rst b/docs/requirements/03-dpdk.rst
deleted file mode 100644
index ad7c8c78..00000000
--- a/docs/requirements/03-dpdk.rst
+++ /dev/null
@@ -1,170 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-DPDK Enhancements
-==================
-This section discusses the Barometer features that were integrated with DPDK.
-
-Measuring Telco Traffic and Performance KPIs
---------------------------------------------
-This section discusses the Barometer features that enable the measurement of
-Telco traffic and performance KPIs.
-
-.. Figure:: stats_and_timestamps.png
-
- Measuring Telco Traffic and Performance KPIs
-
-* The very first thing Barometer enabled was a call-back API in DPDK and an
- associated application that used the API to demonstrate how to timestamp
- packets and measure packet latency in DPDK (the sample app is called
- rxtx_callbacks). This was upstreamed to DPDK 2.0 and is represented by
- the interfaces 1 and 2 in Figure 1.2.
-
-* The second thing Barometer implemented in DPDK is the extended NIC statistics API,
- which exposes NIC stats including error stats to the DPDK user by reading the
- registers on the NIC. This is represented by interface 3 in Figure 1.2.
-
- * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
- in association with a sample application that runs as a DPDK secondary
- process and retrieves the extended NIC stats.
-
- * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
- Functions (VFs) for all drivers.
-
- * For DPDK 16.07 the API migrated from using string value pairs to using id
- value pairs, improving the overall performance of the API.
-
-Monitoring DPDK interfaces
---------------------------
-With the features Barometer added to DPDK for measuring Telco traffic and
-performance KPIs, we can now retrieve NIC statistics, including error stats,
-and relay them to a DPDK user. The next step is to enable monitoring of the
-DPDK interfaces based on those stats, by relaying the information to a higher
-level Fault Management entity. To this end, Barometer has been developing a
-number of plugins for collectd.
-
-DPDK Keep Alive description
----------------------------
-SFQM aims to enable fault detection within DPDK. The first feature to meet
-this goal is the DPDK Keep Alive sample app that is part of DPDK 2.2.
-
-DPDK Keep Alive (KA) is a sample application that acts as a heartbeat/watchdog
-for DPDK packet processing cores, to detect application thread failure. The
-application supports the detection of 'failed' DPDK cores and notification to a
-HA/SA middleware. The purpose is to detect packet processing core failures
-(e.g. an infinite loop) and ensure that the failure of a core does not result
-in a fault that is not detectable by a management entity.
-
-.. Figure:: dpdk_ka.png
-
- DPDK Keep Alive Sample Application
-
-Essentially the app demonstrates how to detect 'silent outages' on DPDK packet
-processing cores. The application can be decomposed into two specific parts:
-detection and notification.
-
-* The detection period is programmable/configurable but defaults to 5ms if no
- timeout is specified.
-* Notification support is provided through a hook function, which can act as
- call-back support for a fault management application with a compliant
- heartbeat mechanism.
-
-DPDK Keep Alive Sample App Internals
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This section provides some explanation of the Keep-Alive/'Liveliness'
-conceptual scheme as well as the DPDK Keep Alive App. The initialization and
-run-time paths are very similar to those of the L2 forwarding application (see
-`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
-information).
-
-There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
-and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
-will supervise worker cores and report any failure (2 successive missed pings).
-The Keep-Alive/'Liveliness' conceptual scheme is:
-
-* DPDK worker cores mark their liveliness as they forward traffic.
-* A Keep Alive Monitor Agent Core runs a function every N milliseconds to
- inspect worker core liveliness.
-* If the Keep Alive Monitor Agent Core detects time-outs, it notifies the
- fault management entity through a call-back function.
-
-**Note:** Only the state of the worker cores is monitored. There is no
-mechanism or agent to monitor the Keep Alive Monitor Agent Core.
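-
-The scheme above can be sketched in plain C without any DPDK dependency. The
-names below (ka_monitor, ka_mark_alive, ka_check, record_failure) are
-hypothetical and are not the DPDK API; the sketch only mirrors the described
-behaviour, including the failure report after 2 successive missed pings. It
-is single-threaded, so a real multi-core version would also need suitable
-memory ordering between workers and the monitor.
-
-.. code:: c
-
-   #include <stdbool.h>
-
-   #define KA_MAX_CORES    4
-   #define KA_MISSED_LIMIT 2 /* successive missed pings before a report */
-
-   struct ka_monitor {
-       bool alive[KA_MAX_CORES];   /* set by workers, cleared by the monitor */
-       int missed[KA_MAX_CORES];   /* successive checks without a mark */
-       void (*on_failure)(int core); /* hook for the fault management entity */
-   };
-
-   /* Example hook: remembers which cores were reported as failed. */
-   static bool failed_core[KA_MAX_CORES];
-
-   static void record_failure(int core)
-   {
-       failed_core[core] = true;
-   }
-
-   /* Worker side: called from the packet processing loop. */
-   static void ka_mark_alive(struct ka_monitor *m, int core)
-   {
-       m->alive[core] = true;
-   }
-
-   /* Monitor side: called once per check period. */
-   static void ka_check(struct ka_monitor *m)
-   {
-       for (int core = 0; core < KA_MAX_CORES; core++) {
-           if (m->alive[core]) {
-               m->alive[core] = false; /* worker must mark itself again */
-               m->missed[core] = 0;
-           } else if (m->missed[core] < KA_MISSED_LIMIT &&
-                      ++m->missed[core] == KA_MISSED_LIMIT) {
-               m->on_failure(core);    /* reported once, on the second miss */
-           }
-       }
-   }
-
-A monitor initialised with ``{ .on_failure = record_failure }`` reports a core
-on the second consecutive ka_check() call in which that core has not called
-ka_mark_alive().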
-
-DPDK Keep Alive Sample App Code Internals
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The following section provides some explanation of the code aspects that are
-specific to the Keep Alive sample application.
-
-The heartbeat functionality is initialized with a struct rte_heartbeat and the
-callback function to invoke in the case of a timeout.
-
-.. code:: c
-
- rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
- if (rte_global_keepalive_info == NULL)
-     rte_exit(EXIT_FAILURE, "keepalive_create() failed");
-
-The function that issues the pings, hbeat_dispatch_pings(), is configured to
-run every check_period milliseconds.
-
-.. code:: c
-
- if (rte_timer_reset(&hb_timer,
-         (check_period * rte_get_timer_hz()) / 1000,
-         PERIODICAL,
-         rte_lcore_id(),
-         &hbeat_dispatch_pings, rte_global_keepalive_info
-         ) != 0 )
-     rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
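-
-The second argument to rte_timer_reset() converts check_period from
-milliseconds to timer ticks. The arithmetic can be sketched on its own; the
-helper name is hypothetical and timer_hz stands in for the value returned by
-rte_get_timer_hz().
-
-.. code:: c
-
-   #include <stdint.h>
-
-   /* Hypothetical helper mirroring the expression used above:
-    * ticks = period_ms * timer_hz / 1000. */
-   static uint64_t ms_to_ticks(uint64_t period_ms, uint64_t timer_hz)
-   {
-       return (period_ms * timer_hz) / 1000;
-   }
-
-For example, assuming a 2 GHz timer, a 5 ms check period corresponds to
-ms_to_ticks(5, 2000000000), which is 10000000 ticks.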
-
-The rest of the initialization and run-time path follows the same path as the
-L2 forwarding application. The only additions to the main processing loop are
-the mark alive functionality and the example random failures.
-
-.. code:: c
-
- rte_keepalive_mark_alive(rte_global_keepalive_info);
- cur_tsc = rte_rdtsc();
-
- /* Die randomly within 7 secs for demo purposes.. */
- if (cur_tsc - tsc_initial > tsc_lifetime)
-     break;
-
-The rte_keepalive_mark_alive() function simply sets the core state to alive.
-
-.. code:: c
-
- static inline void
- rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
- {
-     keepcfg->state_flags[rte_lcore_id()] = 1;
- }
-
-Keep Alive Monitor Agent Core Monitoring Options
-The application can run on either a host or a guest. As such there are a number
-of options for monitoring the Keep Alive Monitor Agent Core through a Local
-Agent on the compute node:
-
- ====================== ========== =============
- Application Location   DPDK KA    LOCAL AGENT
- ====================== ========== =============
- HOST                   X          HOST/GUEST
- GUEST                  X          HOST/GUEST
- ====================== ========== =============
-
-
-For the first implementation of a Local Agent, SFQM will enable:
-
- ====================== ========== =============
- Application Location   DPDK KA    LOCAL AGENT
- ====================== ========== =============
- HOST                   X          HOST
- ====================== ========== =============
-
-This will be achieved by extending the dpdkstat plugin for collectd with KA
-functionality, and integrating the extended plugin with Monasca for high
-performing, resilient, and scalable fault detection.
-
-.. _L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html
diff --git a/docs/requirements/barometer_scope.png b/docs/requirements/barometer_scope.png
deleted file mode 100755
index 03783bde..00000000
--- a/docs/requirements/barometer_scope.png
+++ /dev/null
Binary files differ
diff --git a/docs/requirements/dpdk_ka.png b/docs/requirements/dpdk_ka.png
deleted file mode 100644
index 4a45e10c..00000000
--- a/docs/requirements/dpdk_ka.png
+++ /dev/null
Binary files differ
diff --git a/docs/requirements/index.rst b/docs/requirements/index.rst
deleted file mode 100644
index e5d04896..00000000
--- a/docs/requirements/index.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-**********************
-Barometer Requirements
-**********************
-.. toctree::
- :maxdepth: 3
- :numbered:
-
- 01-intro.rst
- 02-collectd.rst
- 03-dpdk.rst
diff --git a/docs/requirements/stats_and_timestamps.png b/docs/requirements/stats_and_timestamps.png
deleted file mode 100644
index 84aef726..00000000
--- a/docs/requirements/stats_and_timestamps.png
+++ /dev/null
Binary files differ