author | Maryam Tahhan <maryam.tahhan@intel.com> | 2017-02-16 14:28:05 +0000 |
---|---|---|
committer | Maryam Tahhan <maryam.tahhan@intel.com> | 2017-02-17 09:35:25 +0000 |
commit | 2bb8c4857689cabe69d3d2d3d54dffa78d8f4a9f (patch) | |
tree | 75c0b7fdeb5167588fe42d02702fb7d5b354725a /docs/requirements | |
parent | 47ccd41d789085a2186fc1fb86364d93a20783ef (diff) |
docs: moving to new doc structure
Change-Id: I91188deec2bd4e8aa405a9e023acde42b3fb31f7
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Diffstat (limited to 'docs/requirements')
-rw-r--r-- | docs/requirements/01-intro.rst | 183 |
-rwxr-xr-x | docs/requirements/02-collectd.rst | 103 |
-rw-r--r-- | docs/requirements/03-dpdk.rst | 170 |
-rwxr-xr-x | docs/requirements/barometer_scope.png | bin | 39958 -> 0 bytes |
-rw-r--r-- | docs/requirements/dpdk_ka.png | bin | 100808 -> 0 bytes |
-rw-r--r-- | docs/requirements/index.rst | 14 |
-rw-r--r-- | docs/requirements/stats_and_timestamps.png | bin | 52193 -> 0 bytes |
7 files changed, 0 insertions, 470 deletions
diff --git a/docs/requirements/01-intro.rst b/docs/requirements/01-intro.rst
deleted file mode 100644
index bc0e9ba0..00000000
--- a/docs/requirements/01-intro.rst
+++ /dev/null
@@ -1,183 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-Introduction
-============
-Barometer is the project that renames Software Fastpath service Quality Metrics
-(SFQM) and updates its scope which was networking centric.
-
-The goal of SFQM was to develop the utilities and libraries in DPDK to
-support:
-
-* Measuring Telco Traffic and Performance KPIs. Including:
-
-  * Packet Delay Variation (by enabling TX and RX time stamping).
-  * Packet loss (by exposing extended NIC stats).
-
-* Performance Monitoring of the DPDK interfaces (by exposing
-  extended NIC stats + collectd Plugin).
-* Detecting and reporting violations that can be consumed by VNFs
-  and higher level management systems (through DPDK Keep Alive).
-
-With Barometer the scope is extended to monitoring the NFVI. The ability to
-monitor the Network Function Virtualization Infrastructure (NFVI) where VNFs
-are in operation will be a key part of Service Assurance within an NFV
-environment, in order to enforce SLAs or to detect violations, faults or
-degradation in the performance of NFVI resources so that events and relevant
-metrics are reported to higher level fault management systems.
-If physical appliances are going to be replaced by virtualized appliances
-the service levels, manageability and service assurance needs to remain
-consistent or improve on what is available today. As such, the NFVI needs to
-support the ability to monitor:
-
-* Traffic monitoring and performance monitoring of the components that provide
-  networking functionality to the VNF, including: physical interfaces, virtual
-  switch interfaces and flows, as well as the virtual interfaces themselves and
-  their status, etc.
-* Platform monitoring including: CPU, memory, load, cache, themals, fan speeds,
-  voltages and machine check exceptions, etc.
-
-All of the statistics and events gathered must be collected in-service and must
-be capable of being reported by standard Telco mechanisms (e.g. SNMP), for
-potential enforcement or correction actions. In addition, this information
-could be fed to analytics systems to enable failure prediction, and can also be
-used for intelligent workload placement.
-
-
-All developed features will be upstreamed to Open Source projects relevant to
-telemetry such as `collectd`_ and `Ceilometer`_.
-
-The OPNFV project wiki can be found @ `Barometer`_
-
-Problem Statement
-==================
-Providing carrier grade Service Assurance is critical in the network
-transformation to a software defined and virtualized network (NFV).
-Medium-/large-scale cloud environments account for between hundreds and
-hundreds of thousands of infrastructure systems. It is vital to monitor
-systems for malfunctions that could lead to users application service
-disruption and promptly react to these fault events to facilitate improving
-overall system performance. As the size of infrastructure and virtual resources
-grow, so does the effort of monitoring back-ends. SFQM aims to expose as much
-useful information as possible off the platform so that faults and errors in
-the NFVI can be detected promptly and reported to the appropriate fault
-management entity.
-
-The OPNFV platform (NFVI) requires functionality to:
-
-* Create a low latency, high performance packet processing path (fast path)
-  through the NFVI that VNFs can take advantage of;
-* Measure Telco Traffic and Performance KPIs through that fast path;
-* Detect and report violations that can be consumed by VNFs and higher level
-  EMS/OSS systems
-
-Examples of local measurable QoS factors for Traffic Monitoring which impact
-both Quality of Experience and five 9's availability would be (using Metro Ethernet
-Forum Guidelines as reference):
-
-* Packet loss
-* Packet Delay Variation
-* Uni-directional frame delay
-
-Other KPIs such as Call drops, Call Setup Success Rate, Call Setup time etc. are
-measured by the VNF.
-
-In addition to Traffic Monitoring, the NFVI must also support Performance
-Monitoring of the physical interfaces themselves (e.g. NICs), i.e. an ability to
-monitor and trace errors on the physical interfaces and report them.
-
-All these traffic statistics for Traffic and Performance Monitoring must be
-measured in-service and must be capable of being reported by standard Telco
-mechanisms (e.g. SNMP traps), for potential enforcement actions.
-
-Barometer updated scope
-=======================
-The scope of the project is to provide interfaces to support monitoring of the
-NFVI. The project will develop plugins for telemetry frameworks to enable the
-collection of platform stats and events and relay gathered information to fault
-management applications or the VIM. The scope is limited to
-collecting/gathering the events and stats and relaying them to a relevant
-endpoint. The project will not enforce or take any actions based on the
-gathered information.
-
-.. image: barometer_scope.png
-
-Scope of SFQM
-=============
-**NOTE:** The SFQM project has been replaced by Barometer.
-The output of the project will provide interfaces and functions to support
-monitoring of Packet Latency and Network Interfaces while the VNF is in service.
-
-The DPDK interface/API will be updated to support:
-
-* Exposure of NIC MAC/PHY Level Counters
-* Interface for Time stamp on RX
-* Interface for Time stamp on TX
-* Exposure of DPDK events
-
-collectd will be updated to support the exposure of DPDK metrics and events.
-
-Specific testing and integration will be carried out to cover:
-
-* Unit/Integration Test plans: A sample application provided to demonstrate packet
-  latency monitoring and interface monitoring
-
-The following list of features and functionality will be developed:
-
-* DPDK APIs and functions for latency and interface monitoring
-* A sample application to demonstrate usage
-* collectd plugins
-
-The scope of the project involves developing the relavant DPDK APIs, OVS APIs,
-sample applications, as well as the utilities in collectd to export all the
-relavent information to a telemetry and events consumer.
-
-VNF specific processing, Traffic Monitoring, Performance Monitoring and
-Management Agent are out of scope.
-
-The Proposed Interface counters include:
-
-* Packet RX
-* Packet TX
-* Packet loss
-* Interface errors + other stats
-
-The Proposed Packet Latency Monitor include:
-
-* Cycle accurate stamping on ingress
-* Supports latency measurements on egress
-
-Support for failover of DPDK enabled cores is also out of scope of the current
-proposal. However, this is an important requirement and must-have functionality
-for any DPDK enabled framework in the NFVI. To that end, a second phase of this
-project will be to implement DPDK Keep Alive functionality that would address
-this and would report to a VNF-level Failover and High Availability mechanism
-that would then determine what actions, including failover, may be triggered.
-
-Consumption Models
-===================
-In reality many VNFs will have an existing performance or traffic monitoring
-utility used to monitor VNF behavior and report statistics, counters, etc.
-
-The consumption of performance and traffic related information/events provided
-by this project should be a logical extension of any existing VNF/NFVI monitoring
-framework. It should not require a new framework to be developed. We do not see
-the Barometer gathered metrics and evetns as major additional effort for
-monitoring frameworks to consume; this project would be sympathetic to existing
-monitoring frameworks. The intention is that this project represents an
-interface for NFVI monitoring to be used by higher level fault management
-entities (see below).
-
-Allowing the Barometer metrics and events to be handled within existing
-telemetry frameoworks makes it simpler for overall interfacing with higher
-level management components in the VIM, MANO and OSS/BSS. The Barometer
-proposal would be complementary to the Doctor project, which addresses NFVI Fault
-Management support in the VIM, and the VES project, which addresses the
-integration of VNF telemetry-related data into automated VNF management
-systems. To that end, the project committers and contributors for the Barometer
-project wish to collaborate with the Doctor and VES projects to facilitate this.
-
-.. _Barometer: https://wiki.opnfv.org/display/fastpath
-.. _collectd: http://collectd.org/
-.. _Ceilometer: https://wiki.openstack.org/wiki/Telemetry
diff --git a/docs/requirements/02-collectd.rst b/docs/requirements/02-collectd.rst
deleted file mode 100755
index 2303fadc..00000000
--- a/docs/requirements/02-collectd.rst
+++ /dev/null
@@ -1,103 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-collectd
-~~~~~~~~
-collectd is a daemon which collects system performance statistics periodically
-and provides a variety of mechanisms to publish the collected metrics. It
-supports more than 90 different input and output plugins. Input plugins retrieve
-metrics and publish them to the collectd deamon, while output plugins publish
-the data they receive to an end point. collectd also has infrastructure to
-support thresholding and notification.
-
-collectd statistics and Notifications
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Within collectd notifications and performance data are dispatched in the same
-way. There are producer plugins (plugins that create notifications/metrics),
-and consumer plugins (plugins that receive notifications/metrics and do
-something with them).
-
-Statistics in collectd consist of a value list. A value list includes:
-
-* Values, can be one of:
-
-  * Derive: used for values where a change in the value since it's last been
-    read is of interest. Can be used to calculate and store a rate.
-
-  * Counter: similar to derive values, but take the possibility of a counter
-    wrap around into consideration.
-
-  * Gauge: used for values that are stored as is.
-
-  * Absolute: used for counters that are reset after reading.
-
-* Value length: the number of values in the data set.
-
-* Time: timestamp at which the value was collected.
-
-* Interval: interval at which to expect a new value.
-
-* Host: used to identify the host.
-
-* Plugin: used to identify the plugin.
-
-* Plugin instance (optional): used to group a set of values together. For e.g.
-  values belonging to a DPDK interface.
-
-* Type: unit used to measure a value. In other words used to refer to a data
-  set.
-
-* Type instance (optional): used to distinguish between values that have an
-  identical type.
-
-* meta data: an opaque data structure that enables the passing of additional
-  information about a value list. "Meta data in the global cache can be used to
-  store arbitrary information about an identifier" [7].
-
-Host, plugin, plugin instance, type and type instance uniquely identify a
-collectd value.
-
-Values lists are often accompanied by data sets that describe the values in more
-detail. Data sets consist of:
-
-* A type: a name which uniquely identifies a data set.
-
-* One or more data sources (entries in a data set) which include:
-
-  * The name of the data source. If there is only a single data source this is
-    set to "value".
-
-  * The type of the data source, one of: counter, gauge, absolute or derive.
-
-  * A min and a max value.
-
-Types in collectd are defined in types.db. Examples of types in types.db:
-
-.. code-block:: console
-
-  bitrate    value:GAUGE:0:4294967295
-  counter    value:COUNTER:U:U
-  if_octets  rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295
-
-In the example above if_octets has two data sources: tx and rx.
-
-Notifications in collectd are generic messages containing:
-
-* An associated severity, which can be one of OKAY, WARNING, and FAILURE.
-
-* A time.
-
-* A Message
-
-* A host.
-
-* A plugin.
-
-* A plugin instance (optional).
-
-* A type.
-
-* A types instance (optional).
-
-* Meta-data.
diff --git a/docs/requirements/03-dpdk.rst b/docs/requirements/03-dpdk.rst
deleted file mode 100644
index ad7c8c78..00000000
--- a/docs/requirements/03-dpdk.rst
+++ /dev/null
@@ -1,170 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-DPDK Enhancements
-==================
-This section will discuss the Barometer features that were integrated with DPDK.
-
-Measuring Telco Traffic and Performance KPIs
---------------------------------------------
-This section will discuss the Barometer features that enable Measuring Telco Traffic
-and Performance KPIs.
-
-.. Figure:: stats_and_timestamps.png
-
-   Measuring Telco Traffic and Performance KPIs
-
-* The very first thing Barometer enabled was a call-back API in DPDK and an
-  associated application that used the API to demonstrate how to timestamp
-  packets and measure packet latency in DPDK (the sample app is called
-  rxtx_callbacks). This was upstreamed to DPDK 2.0 and is represented by
-  the interfaces 1 and 2 in Figure 1.2.
-
-* The second thing Barometer implemented in DPDK is the extended NIC statistics API,
-  which exposes NIC stats including error stats to the DPDK user by reading the
-  registers on the NIC. This is represented by interface 3 in Figure 1.2.
-
-  * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
-    in association with a sample application that runs as a DPDK secondary
-    process and retrieves the extended NIC stats.
-
-  * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
-    Functions (VFs) for all drivers.
-
-  * For DPDK 16.07 the API migrated from using string value pairs to using id
-    value pairs, improving the overall performance of the API.
-
-Monitoring DPDK interfaces
---------------------------
-With the features Barometer enabled in DPDK to enable measuring Telco traffic and
-performance KPIs, we can now retrieve NIC statistics including error stats and
-relay them to a DPDK user. The next step is to enable monitoring of the DPDK
-interfaces based on the stats that we are retrieving from the NICs, by relaying
-the information to a higher level Fault Management entity. To enable this Barometer
-has been enabling a number of plugins for collectd.
-
-DPDK Keep Alive description
----------------------------
-SFQM aims to enable fault detection within DPDK, the very first feature to
-meet this goal is the DPDK Keep Alive Sample app that is part of DPDK 2.2.
-
-DPDK Keep Alive or KA is a sample application that acts as a heartbeat/watchdog
-for DPDK packet processing cores, to detect application thread failure. The
-application supports the detection of ‘failed’ DPDK cores and notification to a
-HA/SA middleware. The purpose is to detect Packet Processing Core fails (e.g.
-infinite loop) and ensure the failure of the core does not result in a fault
-that is not detectable by a management entity.
-
-.. Figure:: dpdk_ka.png
-
-   DPDK Keep Alive Sample Application
-
-Essentially the app demonstrates how to detect 'silent outages' on DPDK packet
-processing cores. The application can be decomposed into two specific parts:
-detection and notification.
-
-* The detection period is programmable/configurable but defaults to 5ms if no
-  timeout is specified.
-* The Notification support is enabled by simply having a hook function that where this
-  can be 'call back support' for a fault management application with a compliant
-  heartbeat mechanism.
-
-DPDK Keep Alive Sample App Internals
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This section provides some explanation of the The Keep-Alive/'Liveliness'
-conceptual scheme as well as the DPDK Keep Alive App. The initialization and
-run-time paths are very similar to those of the L2 forwarding application (see
-`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
-information).
-
-There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
-and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
-will supervise worker cores and report any failure (2 successive missed pings).
-The Keep-Alive/'Liveliness' conceptual scheme is:
-
-* DPDK worker cores mark their liveliness as they forward traffic.
-* A Keep Alive Monitor Agent Core runs a function every N Milliseconds to
-  inspect worker core liveliness.
-* If keep-alive agent detects time-outs, it notifies the fault management
-  entity through a call-back function.
-
-**Note:** Only the worker cores state is monitored. There is no mechanism or agent
-to monitor the Keep Alive Monitor Agent Core.
-
-DPDK Keep Alive Sample App Code Internals
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The following section provides some explanation of the code aspects that are
-specific to the Keep Alive sample application.
-
-The heartbeat functionality is initialized with a struct rte_heartbeat and the
-callback function to invoke in the case of a timeout.
-
-.. code:: c
-
-    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
-    if (rte_global_hbeat_info == NULL)
-        rte_exit(EXIT_FAILURE, "keepalive_create() failed");
-
-The function that issues the pings hbeat_dispatch_pings() is configured to run
-every check_period milliseconds.
-
-.. code:: c
-
-    if (rte_timer_reset(&hb_timer,
-            (check_period * rte_get_timer_hz()) / 1000,
-            PERIODICAL,
-            rte_lcore_id(),
-            &hbeat_dispatch_pings, rte_global_keepalive_info
-            ) != 0 )
-        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
-
-The rest of the initialization and run-time path follows the same paths as the
-the L2 forwarding application. The only addition to the main processing loop is
-the mark alive functionality and the example random failures.
-
-.. code:: c
-
-    rte_keepalive_mark_alive(&rte_global_hbeat_info);
-    cur_tsc = rte_rdtsc();
-
-    /* Die randomly within 7 secs for demo purposes.. */
-    if (cur_tsc - tsc_initial > tsc_lifetime)
-        break;
-
-The rte_keepalive_mark_alive() function simply sets the core state to alive.
-
-.. code:: c
-
-    static inline void
-    rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
-    {
-        keepcfg->state_flags[rte_lcore_id()] = 1;
-    }
-
-Keep Alive Monitor Agent Core Monitoring Options
-The application can run on either a host or a guest. As such there are a number
-of options for monitoring the Keep Alive Monitor Agent Core through a Local
-Agent on the compute node:
-
-  ======================  ==========  =============
-  Application Location    DPDK KA     LOCAL AGENT
-  ======================  ==========  =============
-  HOST                    X           HOST/GUEST
-  GUEST                   X           HOST/GUEST
-  ======================  ==========  =============
-
-
-For the first implementation of a Local Agent SFQM will enable:
-
-  ======================  ==========  =============
-  Application Location    DPDK KA     LOCAL AGENT
-  ======================  ==========  =============
-  HOST                    X           HOST
-  ======================  ==========  =============
-
-Through extending the dpdkstat plugin for collectd with KA functionality, and
-integrating the extended plugin with Monasca for high performing, resilient,
-and scalable fault detection.
-
-.. _L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html
diff --git a/docs/requirements/barometer_scope.png b/docs/requirements/barometer_scope.png
Binary files differ
deleted file mode 100755
index 03783bde..00000000
--- a/docs/requirements/barometer_scope.png
+++ /dev/null
diff --git a/docs/requirements/dpdk_ka.png b/docs/requirements/dpdk_ka.png
Binary files differ
deleted file mode 100644
index 4a45e10c..00000000
--- a/docs/requirements/dpdk_ka.png
+++ /dev/null
diff --git a/docs/requirements/index.rst b/docs/requirements/index.rst
deleted file mode 100644
index e5d04896..00000000
--- a/docs/requirements/index.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-**********************
-Barometer Requirements
-**********************
-.. toctree::
-   :maxdepth: 3
-   :numbered:
-
-   01-intro.rst
-   02-collectd.rst
-   03-dpdk.rst
diff --git a/docs/requirements/stats_and_timestamps.png b/docs/requirements/stats_and_timestamps.png
Binary files differ
deleted file mode 100644
index 84aef726..00000000
--- a/docs/requirements/stats_and_timestamps.png
+++ /dev/null
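
Reviewer note on the removed 03-dpdk.rst: the Keep-Alive/'Liveliness' scheme it describes (worker cores mark themselves alive, a monitor core inspects them periodically and reports a failure after 2 successive missed pings through a call-back) can be illustrated with a minimal, DPDK-free sketch. This is not the DPDK sample application or the `rte_keepalive` API. Every name below (`ka_monitor`, `ka_init`, `ka_register_core`, `ka_mark_alive`, `ka_dispatch_pings`, `ka_record_failure`, `KA_MAX_CORES`, `KA_MISS_LIMIT`) is hypothetical, and the sketch is single-threaded, so it ignores the atomicity a real multi-core implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define KA_MAX_CORES  8   /* hypothetical limit chosen for this sketch */
#define KA_MISS_LIMIT 2   /* "2 successive missed pings" from the text */

/* Call-back invoked once when a core is declared failed. */
typedef void (*ka_failure_cb)(void *ctx, int core_id);

struct ka_monitor {
    bool active[KA_MAX_CORES];   /* cores under supervision */
    bool alive[KA_MAX_CORES];    /* set by workers, cleared by the monitor */
    int  missed[KA_MAX_CORES];   /* successive checks without a mark */
    ka_failure_cb on_failure;
    void *cb_ctx;
};

static void ka_init(struct ka_monitor *m, ka_failure_cb cb, void *ctx)
{
    memset(m, 0, sizeof(*m));
    m->on_failure = cb;
    m->cb_ctx = ctx;
}

static void ka_register_core(struct ka_monitor *m, int core)
{
    assert(core >= 0 && core < KA_MAX_CORES);
    m->active[core] = true;
}

/* Called by a worker from its main processing loop. */
static void ka_mark_alive(struct ka_monitor *m, int core)
{
    assert(core >= 0 && core < KA_MAX_CORES);
    m->alive[core] = true;
}

/* Called by the monitor core once per check period. */
static void ka_dispatch_pings(struct ka_monitor *m)
{
    for (int core = 0; core < KA_MAX_CORES; core++) {
        if (!m->active[core])
            continue;
        if (m->alive[core]) {
            /* Worker checked in since the last ping: reset its state. */
            m->alive[core] = false;
            m->missed[core] = 0;
        } else if (m->missed[core] < KA_MISS_LIMIT &&
                   ++m->missed[core] == KA_MISS_LIMIT &&
                   m->on_failure != NULL) {
            /* Report exactly once, when the miss limit is first reached. */
            m->on_failure(m->cb_ctx, core);
        }
    }
}

/* Example call-back: records the failed core id in an int supplied as ctx. */
static void ka_record_failure(void *ctx, int core_id)
{
    *(int *)ctx = core_id;
}
```

In use, each worker would call `ka_mark_alive()` from its forwarding loop while the monitor core calls `ka_dispatch_pings()` from a periodic timer. As in the removed text, nothing here supervises the monitor itself.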