From 32a7428ae0c68d02db255e42a921445d2d440ff4 Mon Sep 17 00:00:00 2001
From: Maryam Tahhan
Date: Sun, 14 Aug 2016 12:21:34 +0100
Subject: docs: add userguide

Add a userguide that describes the SFQM features.

JIRA: DOCS-106

Change-Id: Icd57e7353bc813ed42fa295dee907b3c67f4fb93
Signed-off-by: Maryam Tahhan
---
 docs/index.rst                                     |   1 +
 ...easuring_telco_traffic_and_performance_KPIs.rst | 195 --------------------
 docs/requirements/03-dpdk_ka.rst                   | 128 -------------
 docs/requirements/dpdk_ka.png                      | Bin 100808 -> 0 bytes
 docs/requirements/index.rst                        |   3 -
 docs/requirements/monitoring_interfaces.png        | Bin 94097 -> 0 bytes
 docs/requirements/stats_and_timestamps.png         | Bin 52193 -> 0 bytes
 docs/userguide/collectd.userguide.rst              | 202 +++++++++++++++++++++
 docs/userguide/dpdk_ka.png                         | Bin 0 -> 100808 bytes
 docs/userguide/index.rst                           |  19 ++
 docs/userguide/keepalive.userguide.rst             | 128 +++++++++++++
 docs/userguide/monitoring_interfaces.png           | Bin 0 -> 94097 bytes
 docs/userguide/stats_and_timestamps.png            | Bin 0 -> 52193 bytes
 13 files changed, 350 insertions(+), 326 deletions(-)
 delete mode 100644 docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst
 delete mode 100644 docs/requirements/03-dpdk_ka.rst
 delete mode 100644 docs/requirements/dpdk_ka.png
 delete mode 100644 docs/requirements/monitoring_interfaces.png
 delete mode 100644 docs/requirements/stats_and_timestamps.png
 create mode 100644 docs/userguide/collectd.userguide.rst
 create mode 100644 docs/userguide/dpdk_ka.png
 create mode 100644 docs/userguide/index.rst
 create mode 100644 docs/userguide/keepalive.userguide.rst
 create mode 100644 docs/userguide/monitoring_interfaces.png
 create mode 100644 docs/userguide/stats_and_timestamps.png

diff --git a/docs/index.rst b/docs/index.rst
index d9d557f5..5f0aae49 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -25,4 +25,5 @@ Table of Contents:
    :numbered:
 
    requirements/index.rst
+   userguide/index.rst
    release/index.rst
diff --git
a/docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst b/docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst
deleted file mode 100644
index 7f0d4861..00000000
--- a/docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst
+++ /dev/null
@@ -1,195 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-Measuring Telco Traffic and Performance KPIs
-============================================
-This section will discuss the SFQM features that enable Measuring Telco Traffic
-and Performance KPIs.
-
-.. Figure:: stats_and_timestamps.png
-
-   Measuring Telco Traffic and Performance KPIs
-
-* The very first thing SFQM enabled was a call-back API in DPDK and an
-  associated application that used the API to demonstrate how to timestamp
-  packets and measure packet latency in DPDK (the sample app is called
-  rxtx_callbacks). This was upstreamed to DPDK 2.0 and is represented by
-  the interfaces 1 and 2 in Figure 1.2.
-
-* The second thing SFQM implemented in DPDK is the extended NIC statistics API,
-  which exposes NIC stats including error stats to the DPDK user by reading the
-  registers on the NIC. This is represented by interface 3 in Figure 1.2.
-
-  * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
-    in association with a sample application that runs as a DPDK secondary
-    process and retrieves the extended NIC stats.
-
-  * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
-    Functions (VFs) for all drivers.
-
-  * For DPDK 16.07 the API migrated from using string value pairs to using id
-    value pairs, improving the overall performance of the API.
-
-Monitoring DPDK interfaces
-===========================
-With the features SFQM enabled in DPDK to enable measuring Telco traffic and
-performance KPIs, we can now retrieve NIC statistics including error stats and
-relay them to a DPDK user. The next step is to enable monitoring of the DPDK
-interfaces based on the stats that we are retrieving from the NICs, by relaying
-the information to a higher level Fault Management entity. To enable this SFQM
-has been enabling a number of plugins for collectd.
-
-collectd
----------
-collectd is a daemon which collects system performance statistics periodically
-and provides a variety of mechanisms to publish the collected metrics. It
-supports more than 90 different input and output plugins. Input plugins retrieve
-metrics and publish them to the collectd deamon, while output plugins publish
-the data they receive to an end point. collectd also has infrastructure to
-support thresholding and notification.
-
-Statistics and Notifications
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Within collectd notifications and performance data are dispatched in the same
-way. There are producer plugins (plugins that create notifications/metrics),
-and consumer plugins (plugins that receive notifications/metrics and do
-something with them).
-
-Statistics in collectd consist of a value list. A value list includes:
-
-* Values, can be one of:
-
-  * Derive: used for values where a change in the value since it's last been
-    read is of interest. Can be used to calculate and store a rate.
-
-  * Counter: similar to derive values, but take the possibility of a counter
-    wrap around into consideration.
-
-  * Gauge: used for values that are stored as is.
-
-  * Absolute: used for counters that are reset after reading.
-
-* Value length: the number of values in the data set.
-
-* Time: timestamp at which the value was collected.
-
-* Interval: interval at which to expect a new value.
-
-* Host: used to identify the host.
-
-* Plugin: used to identify the plugin.
-
-* Plugin instance (optional): used to group a set of values together. For e.g.
-  values belonging to a DPDK interface.
-
-* Type: unit used to measure a value. In other words used to refer to a data
-  set.
-
-* Type instance (optional): used to distinguish between values that have an
-  identical type.
-
-* meta data: an opaque data structure that enables the passing of additional
-  information about a value list. "Meta data in the global cache can be used to
-  store arbitrary information about an identifier" [7].
-
-Host, plugin, plugin instance, type and type instance uniquely identify a
-collectd value.
-
-Values lists are often accompanied by data sets that describe the values in more
-detail. Data sets consist of:
-
-* A type: a name which uniquely identifies a data set.
-
-* One or more data sources (entries in a data set) which include:
-
-  * The name of the data source. If there is only a single data source this is
-    set to "value".
-
-  * The type of the data source, one of: counter, gauge, absolute or derive.
-
-  * A min and a max value.
-
-Types in collectd are defined in types.db. Examples of types in types.db:
-
-.. code-block:: console
-
-    bitrate    value:GAUGE:0:4294967295
-    counter    value:COUNTER:U:U
-    if_octets  rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295
-
-In the example above if_octets has two data sources: tx and rx.
-
-Notifications in collectd are generic messages containing:
-
-* An associated severity, which can be one of OKAY, WARNING, and FAILURE.
-
-* A time.
-
-* A Message
-
-* A host.
-
-* A plugin.
-
-* A plugin instance (optional).
-
-* A type.
-
-* A types instance (optional).
-
-* Meta-data.
-
-collectd plugins
-----------------
-SFQM has enabled three collectd plugins to date:
-
-* `dpdkstat plugin`_: A read plugin that retrieve stats from the DPDK extended
-  NIC stats API.
-
-* `ceilometer plugin`_: A write plugin that pushes the retrieved stats to
-  Ceilometer.
It's capable of pushing any stats read through collectd to
-  Ceilometer, not just the DPDK stats.
-
-* `hugepages plugin`_: A read plugin that retrieves the number of available
-  and free hugepages on a platform as well as what is available in terms of
-  hugepages per socket.
-
-Other plugins in progress:
-
-* dpdkevents: A read plugin that retrieves DPDK link status and DPDK
-  forwarding cores liveliness status (DPDK Keep Alive).
-
-* Open vSwitch stats Plugin: A read plugin that retrieve flow and interface
-  stats from OVS.
-
-* Open vSwitch events Plugin: A read plugin that retrieves events from OVS.
-
-
-Monitoring Interfaces and Openstack Support
--------------------------------------------
-.. Figure:: monitoring_interfaces.png
-
-   Monitoring Interfaces and Openstack Support
-
-The figure above shows the DPDK L2 forwarding application running on a compute
-node, sending and receiving traffic. collectd is also running on this compute
-node retrieving the stats periodically from DPDK through the dpdkstat plugin
-and publishing the retrieved stats to Ceilometer through the ceilometer plugin.
-
-To see this demo in action please checkout: `SFQM OPNFV Summit demo`_
-
-References
-----------
-[1] https://collectd.org/wiki/index.php/Naming_schema
-[2] https://github.com/collectd/collectd/blob/master/src/daemon/plugin.h
-[3] https://collectd.org/wiki/index.php/Value_list_t
-[4] https://collectd.org/wiki/index.php/Data_set
-[5] https://collectd.org/documentation/manpages/types.db.5.shtml
-[6] https://collectd.org/wiki/index.php/Data_source
-[7] https://collectd.org/wiki/index.php/Meta_Data_Interface
-
-.. _SFQM OPNFV Summit demo: https://prezi.com/kjv6o8ixs6se/software-fastpath-service-quality-metrics-demo/
-.. _dpdkstat plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/dpdkstat
-.. _ceilometer plugin: https://github.com/openstack/collectd-ceilometer-plugin/tree/stable/mitaka
-..
_hugepages plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/hugepages
diff --git a/docs/requirements/03-dpdk_ka.rst b/docs/requirements/03-dpdk_ka.rst
deleted file mode 100644
index ce3e7e49..00000000
--- a/docs/requirements/03-dpdk_ka.rst
+++ /dev/null
@@ -1,128 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-DPDK Keep Alive Overview
-=========================
-SFQM aims to enable fault detection within DPDK, the very first feature to
-meet this goal is the DPDK Keep Alive Sample app that is part of DPDK 2.2.
-
-DPDK Keep Alive or KA is a sample application that acts as a heartbeat/watchdog
-for DPDK packet processing cores, to detect application thread failure. The
-application supports the detection of ‘failed’ DPDK cores and notification to a
-HA/SA middleware. The purpose is to detect Packet Processing Core fails (e.g.
-infinite loop) and ensure the failure of the core does not result in a fault
-that is not detectable by a management entity.
-
-.. Figure:: dpdk_ka.png
-
-   DPDK Keep Alive Sample Application
-
-Essentially the app demonstrates how to detect 'silent outages' on DPDK packet
-processing cores. The application can be decomposed into two specific parts:
-detection and notification.
-
-* The detection period is programmable/configurable but defaults to 5ms if no
-  timeout is specified.
-* The Notification support is enabled by simply having a hook function that where this
-  can be 'call back support' for a fault management application with a compliant
-  heartbeat mechanism.
-
-DPDK Keep Alive Sample App Internals
-====================================
-This section provides some explanation of the The Keep-Alive/'Liveliness'
-conceptual scheme as well as the DPDK Keep Alive App.
The initialization and
-run-time paths are very similar to those of the L2 forwarding application (see
-`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
-information).
-
-There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
-and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
-will supervise worker cores and report any failure (2 successive missed pings).
-The Keep-Alive/'Liveliness' conceptual scheme is:
-
-* DPDK worker cores mark their liveliness as they forward traffic.
-* A Keep Alive Monitor Agent Core runs a function every N Milliseconds to
-  inspect worker core liveliness.
-* If keep-alive agent detects time-outs, it notifies the fault management
-  entity through a call-back function.
-
-**Note:** Only the worker cores state is monitored. There is no mechanism or agent
-to monitor the Keep Alive Monitor Agent Core.
-
-DPDK Keep Alive Sample App Code Internals
-=========================================
-The following section provides some explanation of the code aspects that are
-specific to the Keep Alive sample application.
-
-The heartbeat functionality is initialized with a struct rte_heartbeat and the
-callback function to invoke in the case of a timeout.
-
-.. code:: c
-
-    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
-    if (rte_global_hbeat_info == NULL)
-        rte_exit(EXIT_FAILURE, "keepalive_create() failed");
-
-The function that issues the pings hbeat_dispatch_pings() is configured to run
-every check_period milliseconds.
-
-.. code:: c
-
-    if (rte_timer_reset(&hb_timer,
-            (check_period * rte_get_timer_hz()) / 1000,
-            PERIODICAL,
-            rte_lcore_id(),
-            &hbeat_dispatch_pings, rte_global_keepalive_info
-            ) != 0 )
-        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
-
-The rest of the initialization and run-time path follows the same paths as the
-the L2 forwarding application.
The only addition to the main processing loop is
-the mark alive functionality and the example random failures.
-
-.. code:: c
-
-    rte_keepalive_mark_alive(&rte_global_hbeat_info);
-    cur_tsc = rte_rdtsc();
-
-    /* Die randomly within 7 secs for demo purposes.. */
-    if (cur_tsc - tsc_initial > tsc_lifetime)
-        break;
-
-The rte_keepalive_mark_alive() function simply sets the core state to alive.
-
-.. code:: c
-
-    static inline void
-    rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
-    {
-        keepcfg->state_flags[rte_lcore_id()] = 1;
-    }
-
-Keep Alive Monitor Agent Core Monitoring Options
-The application can run on either a host or a guest. As such there are a number
-of options for monitoring the Keep Alive Monitor Agent Core through a Local
-Agent on the compute node:
-
-  ====================== ========== =============
-  Application Location   DPDK KA    LOCAL AGENT
-  ====================== ========== =============
-  HOST                   X          HOST/GUEST
-  GUEST                  X          HOST/GUEST
-  ====================== ========== =============
-
-
-For the first implementation of a Local Agent SFQM will enable:
-
-  ====================== ========== =============
-  Application Location   DPDK KA    LOCAL AGENT
-  ====================== ========== =============
-  HOST                   X          HOST
-  ====================== ========== =============
-
-Through extending the dpdkstat plugin for collectd with KA functionality, and
-integrating the extended plugin with Monasca for high performing, resilient,
-and scalable fault detection.
-
-..
_L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html
diff --git a/docs/requirements/dpdk_ka.png b/docs/requirements/dpdk_ka.png
deleted file mode 100644
index 4a45e10c..00000000
Binary files a/docs/requirements/dpdk_ka.png and /dev/null differ
diff --git a/docs/requirements/index.rst b/docs/requirements/index.rst
index a58b103b..a9be153d 100644
--- a/docs/requirements/index.rst
+++ b/docs/requirements/index.rst
@@ -7,6 +7,3 @@
    :numbered:
 
    01-intro.rst
-   02-measuring_telco_traffic_and_performance_KPIs.rst
-   03-dpdk_ka.rst
-   04-release_b.rst
diff --git a/docs/requirements/monitoring_interfaces.png b/docs/requirements/monitoring_interfaces.png
deleted file mode 100644
index e57c4aa1..00000000
Binary files a/docs/requirements/monitoring_interfaces.png and /dev/null differ
diff --git a/docs/requirements/stats_and_timestamps.png b/docs/requirements/stats_and_timestamps.png
deleted file mode 100644
index 84aef726..00000000
Binary files a/docs/requirements/stats_and_timestamps.png and /dev/null differ
diff --git a/docs/userguide/collectd.userguide.rst b/docs/userguide/collectd.userguide.rst
new file mode 100644
index 00000000..0755fdf5
--- /dev/null
+++ b/docs/userguide/collectd.userguide.rst
@@ -0,0 +1,202 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) OPNFV, Intel Corporation and others.
+
+collectd plugins description
+============================
+The SFQM collectd plugins enable the ability to monitor DPDK interfaces by
+exposing stats and the relevant events to higher level telemetry and fault
+management applications. The following sections will discuss the SFQM features
+in detail.
+
+Measuring Telco Traffic and Performance KPIs
+--------------------------------------------
+This section will discuss the SFQM features that enable Measuring Telco Traffic
+and Performance KPIs.
+
+.. Figure:: stats_and_timestamps.png
+
+   Measuring Telco Traffic and Performance KPIs
+
+* The very first thing SFQM enabled was a call-back API in DPDK and an
+  associated application that used the API to demonstrate how to timestamp
+  packets and measure packet latency in DPDK (the sample app is called
+  rxtx_callbacks). This was upstreamed to DPDK 2.0 and is represented by
+  the interfaces 1 and 2 in Figure 1.2.
+
+* The second thing SFQM implemented in DPDK is the extended NIC statistics API,
+  which exposes NIC stats including error stats to the DPDK user by reading the
+  registers on the NIC. This is represented by interface 3 in Figure 1.2.
+
+  * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
+    in association with a sample application that runs as a DPDK secondary
+    process and retrieves the extended NIC stats.
+
+  * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
+    Functions (VFs) for all drivers.
+
+  * For DPDK 16.07 the API migrated from using string value pairs to using id
+    value pairs, improving the overall performance of the API.
+
+Monitoring DPDK interfaces
+--------------------------
+With the features SFQM enabled in DPDK to enable measuring Telco traffic and
+performance KPIs, we can now retrieve NIC statistics including error stats and
+relay them to a DPDK user. The next step is to enable monitoring of the DPDK
+interfaces based on the stats that we are retrieving from the NICs, by relaying
+the information to a higher level Fault Management entity. To enable this SFQM
+has been enabling a number of plugins for collectd.
+
+collectd
+~~~~~~~~
+collectd is a daemon which collects system performance statistics periodically
+and provides a variety of mechanisms to publish the collected metrics. It
+supports more than 90 different input and output plugins.
Input plugins retrieve
+metrics and publish them to the collectd deamon, while output plugins publish
+the data they receive to an end point. collectd also has infrastructure to
+support thresholding and notification.
+
+collectd statistics and Notifications
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Within collectd notifications and performance data are dispatched in the same
+way. There are producer plugins (plugins that create notifications/metrics),
+and consumer plugins (plugins that receive notifications/metrics and do
+something with them).
+
+Statistics in collectd consist of a value list. A value list includes:
+
+* Values, can be one of:
+
+  * Derive: used for values where a change in the value since it's last been
+    read is of interest. Can be used to calculate and store a rate.
+
+  * Counter: similar to derive values, but take the possibility of a counter
+    wrap around into consideration.
+
+  * Gauge: used for values that are stored as is.
+
+  * Absolute: used for counters that are reset after reading.
+
+* Value length: the number of values in the data set.
+
+* Time: timestamp at which the value was collected.
+
+* Interval: interval at which to expect a new value.
+
+* Host: used to identify the host.
+
+* Plugin: used to identify the plugin.
+
+* Plugin instance (optional): used to group a set of values together. For e.g.
+  values belonging to a DPDK interface.
+
+* Type: unit used to measure a value. In other words used to refer to a data
+  set.
+
+* Type instance (optional): used to distinguish between values that have an
+  identical type.
+
+* meta data: an opaque data structure that enables the passing of additional
+  information about a value list. "Meta data in the global cache can be used to
+  store arbitrary information about an identifier" [7].
+
+Host, plugin, plugin instance, type and type instance uniquely identify a
+collectd value.
+
+Values lists are often accompanied by data sets that describe the values in more
+detail.
Data sets consist of:
+
+* A type: a name which uniquely identifies a data set.
+
+* One or more data sources (entries in a data set) which include:
+
+  * The name of the data source. If there is only a single data source this is
+    set to "value".
+
+  * The type of the data source, one of: counter, gauge, absolute or derive.
+
+  * A min and a max value.
+
+Types in collectd are defined in types.db. Examples of types in types.db:
+
+.. code-block:: console
+
+    bitrate    value:GAUGE:0:4294967295
+    counter    value:COUNTER:U:U
+    if_octets  rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295
+
+In the example above if_octets has two data sources: tx and rx.
+
+Notifications in collectd are generic messages containing:
+
+* An associated severity, which can be one of OKAY, WARNING, and FAILURE.
+
+* A time.
+
+* A Message
+
+* A host.
+
+* A plugin.
+
+* A plugin instance (optional).
+
+* A type.
+
+* A types instance (optional).
+
+* Meta-data.
+
+collectd plugins
+----------------
+SFQM has enabled three collectd plugins to date:
+
+* `dpdkstat plugin`_: A read plugin that retrieve stats from the DPDK extended
+  NIC stats API.
+
+* `ceilometer plugin`_: A write plugin that pushes the retrieved stats to
+  Ceilometer. It's capable of pushing any stats read through collectd to
+  Ceilometer, not just the DPDK stats.
+
+* `hugepages plugin`_: A read plugin that retrieves the number of available
+  and free hugepages on a platform as well as what is available in terms of
+  hugepages per socket.
+
+Other plugins in progress:
+
+* dpdkevents: A read plugin that retrieves DPDK link status and DPDK
+  forwarding cores liveliness status (DPDK Keep Alive).
+
+* Open vSwitch stats Plugin: A read plugin that retrieve flow and interface
+  stats from OVS.
+
+* Open vSwitch events Plugin: A read plugin that retrieves events from OVS.
+
+
+Monitoring Interfaces and Openstack Support
+-------------------------------------------
+..
Figure:: monitoring_interfaces.png
+
+   Monitoring Interfaces and Openstack Support
+
+The figure above shows the DPDK L2 forwarding application running on a compute
+node, sending and receiving traffic. collectd is also running on this compute
+node retrieving the stats periodically from DPDK through the dpdkstat plugin
+and publishing the retrieved stats to Ceilometer through the ceilometer plugin.
+
+To see this demo in action please checkout: `SFQM OPNFV Summit demo`_
+
+References
+----------
+[1] https://collectd.org/wiki/index.php/Naming_schema
+[2] https://github.com/collectd/collectd/blob/master/src/daemon/plugin.h
+[3] https://collectd.org/wiki/index.php/Value_list_t
+[4] https://collectd.org/wiki/index.php/Data_set
+[5] https://collectd.org/documentation/manpages/types.db.5.shtml
+[6] https://collectd.org/wiki/index.php/Data_source
+[7] https://collectd.org/wiki/index.php/Meta_Data_Interface
+
+.. _SFQM OPNFV Summit demo: https://prezi.com/kjv6o8ixs6se/software-fastpath-service-quality-metrics-demo/
+.. _dpdkstat plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/dpdkstat
+.. _ceilometer plugin: https://github.com/openstack/collectd-ceilometer-plugin/tree/stable/mitaka
+.. _hugepages plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/hugepages
diff --git a/docs/userguide/dpdk_ka.png b/docs/userguide/dpdk_ka.png
new file mode 100644
index 00000000..4a45e10c
Binary files /dev/null and b/docs/userguide/dpdk_ka.png differ
diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
new file mode 100644
index 00000000..994f63c8
--- /dev/null
+++ b/docs/userguide/index.rst
@@ -0,0 +1,19 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+ .. http://creativecommons.org/licenses/by/4.0
+ .. (c)
+
+====================
+SFQM user guide
+====================
+
+.. The feature user guide should provide an OPNFV user with enough information to
+ ..
use the features provided by the feature project in the supported scenarios.
+ .. This guide should walk a user through the usage of the features once a scenario
+ .. has been deployed and is active according to the installation guide provided
+ .. by the installer project.
+
+.. toctree::
+   :maxdepth: 3
+
+   collectd.userguide.rst
+   keepalive.userguide.rst
diff --git a/docs/userguide/keepalive.userguide.rst b/docs/userguide/keepalive.userguide.rst
new file mode 100644
index 00000000..4b6e990d
--- /dev/null
+++ b/docs/userguide/keepalive.userguide.rst
@@ -0,0 +1,128 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) OPNFV, Intel Corporation and others.
+
+DPDK Keep Alive description
+===========================
+SFQM aims to enable fault detection within DPDK, the very first feature to
+meet this goal is the DPDK Keep Alive Sample app that is part of DPDK 2.2.
+
+DPDK Keep Alive or KA is a sample application that acts as a heartbeat/watchdog
+for DPDK packet processing cores, to detect application thread failure. The
+application supports the detection of ‘failed’ DPDK cores and notification to a
+HA/SA middleware. The purpose is to detect Packet Processing Core fails (e.g.
+infinite loop) and ensure the failure of the core does not result in a fault
+that is not detectable by a management entity.
+
+.. Figure:: dpdk_ka.png
+
+   DPDK Keep Alive Sample Application
+
+Essentially the app demonstrates how to detect 'silent outages' on DPDK packet
+processing cores. The application can be decomposed into two specific parts:
+detection and notification.
+
+* The detection period is programmable/configurable but defaults to 5ms if no
+  timeout is specified.
+* The Notification support is enabled by simply having a hook function that where this
+  can be 'call back support' for a fault management application with a compliant
+  heartbeat mechanism.
+
+DPDK Keep Alive Sample App Internals
+------------------------------------
+This section provides some explanation of the The Keep-Alive/'Liveliness'
+conceptual scheme as well as the DPDK Keep Alive App. The initialization and
+run-time paths are very similar to those of the L2 forwarding application (see
+`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
+information).
+
+There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
+and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
+will supervise worker cores and report any failure (2 successive missed pings).
+The Keep-Alive/'Liveliness' conceptual scheme is:
+
+* DPDK worker cores mark their liveliness as they forward traffic.
+* A Keep Alive Monitor Agent Core runs a function every N Milliseconds to
+  inspect worker core liveliness.
+* If keep-alive agent detects time-outs, it notifies the fault management
+  entity through a call-back function.
+
+**Note:** Only the worker cores state is monitored. There is no mechanism or agent
+to monitor the Keep Alive Monitor Agent Core.
+
+DPDK Keep Alive Sample App Code Internals
+-----------------------------------------
+The following section provides some explanation of the code aspects that are
+specific to the Keep Alive sample application.
+
+The heartbeat functionality is initialized with a struct rte_heartbeat and the
+callback function to invoke in the case of a timeout.
+
+.. code:: c
+
+    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
+    if (rte_global_hbeat_info == NULL)
+        rte_exit(EXIT_FAILURE, "keepalive_create() failed");
+
+The function that issues the pings hbeat_dispatch_pings() is configured to run
+every check_period milliseconds.
+
+..
code:: c
+
+    if (rte_timer_reset(&hb_timer,
+            (check_period * rte_get_timer_hz()) / 1000,
+            PERIODICAL,
+            rte_lcore_id(),
+            &hbeat_dispatch_pings, rte_global_keepalive_info
+            ) != 0 )
+        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
+
+The rest of the initialization and run-time path follows the same paths as the
+the L2 forwarding application. The only addition to the main processing loop is
+the mark alive functionality and the example random failures.
+
+.. code:: c
+
+    rte_keepalive_mark_alive(&rte_global_hbeat_info);
+    cur_tsc = rte_rdtsc();
+
+    /* Die randomly within 7 secs for demo purposes.. */
+    if (cur_tsc - tsc_initial > tsc_lifetime)
+        break;
+
+The rte_keepalive_mark_alive() function simply sets the core state to alive.
+
+.. code:: c
+
+    static inline void
+    rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
+    {
+        keepcfg->state_flags[rte_lcore_id()] = 1;
+    }
+
+Keep Alive Monitor Agent Core Monitoring Options
+The application can run on either a host or a guest. As such there are a number
+of options for monitoring the Keep Alive Monitor Agent Core through a Local
+Agent on the compute node:
+
+  ====================== ========== =============
+  Application Location   DPDK KA    LOCAL AGENT
+  ====================== ========== =============
+  HOST                   X          HOST/GUEST
+  GUEST                  X          HOST/GUEST
+  ====================== ========== =============
+
+
+For the first implementation of a Local Agent SFQM will enable:
+
+  ====================== ========== =============
+  Application Location   DPDK KA    LOCAL AGENT
+  ====================== ========== =============
+  HOST                   X          HOST
+  ====================== ========== =============
+
+Through extending the dpdkstat plugin for collectd with KA functionality, and
+integrating the extended plugin with Monasca for high performing, resilient,
+and scalable fault detection.
+
+..
_L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html
diff --git a/docs/userguide/monitoring_interfaces.png b/docs/userguide/monitoring_interfaces.png
new file mode 100644
index 00000000..e57c4aa1
Binary files /dev/null and b/docs/userguide/monitoring_interfaces.png differ
diff --git a/docs/userguide/stats_and_timestamps.png b/docs/userguide/stats_and_timestamps.png
new file mode 100644
index 00000000..84aef726
Binary files /dev/null and b/docs/userguide/stats_and_timestamps.png differ