From 32a7428ae0c68d02db255e42a921445d2d440ff4 Mon Sep 17 00:00:00 2001
From: Maryam Tahhan <maryam.tahhan@intel.com>
Date: Sun, 14 Aug 2016 12:21:34 +0100
Subject: docs: add userguide

Add a userguide that describes the SFQM features.

JIRA: DOCS-106
Change-Id: Icd57e7353bc813ed42fa295dee907b3c67f4fb93
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
---
 docs/index.rst                                     |   1 +
 ...easuring_telco_traffic_and_performance_KPIs.rst | 195 --------------------
 docs/requirements/03-dpdk_ka.rst                   | 128 -------------
 docs/requirements/dpdk_ka.png                      | Bin 100808 -> 0 bytes
 docs/requirements/index.rst                        |   3 -
 docs/requirements/monitoring_interfaces.png        | Bin 94097 -> 0 bytes
 docs/requirements/stats_and_timestamps.png         | Bin 52193 -> 0 bytes
 docs/userguide/collectd.userguide.rst              | 202 +++++++++++++++++++++
 docs/userguide/dpdk_ka.png                         | Bin 0 -> 100808 bytes
 docs/userguide/index.rst                           |  19 ++
 docs/userguide/keepalive.userguide.rst             | 128 +++++++++++++
 docs/userguide/monitoring_interfaces.png           | Bin 0 -> 94097 bytes
 docs/userguide/stats_and_timestamps.png            | Bin 0 -> 52193 bytes
 13 files changed, 350 insertions(+), 326 deletions(-)
 delete mode 100644 docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst
 delete mode 100644 docs/requirements/03-dpdk_ka.rst
 delete mode 100644 docs/requirements/dpdk_ka.png
 delete mode 100644 docs/requirements/monitoring_interfaces.png
 delete mode 100644 docs/requirements/stats_and_timestamps.png
 create mode 100644 docs/userguide/collectd.userguide.rst
 create mode 100644 docs/userguide/dpdk_ka.png
 create mode 100644 docs/userguide/index.rst
 create mode 100644 docs/userguide/keepalive.userguide.rst
 create mode 100644 docs/userguide/monitoring_interfaces.png
 create mode 100644 docs/userguide/stats_and_timestamps.png

(limited to 'docs')

diff --git a/docs/index.rst b/docs/index.rst
index d9d557f..5f0aae4 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -25,4 +25,5 @@ Table of Contents:
      :numbered:
 
      requirements/index.rst
+     userguide/index.rst
      release/index.rst
diff --git a/docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst b/docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst
deleted file mode 100644
index 7f0d486..0000000
--- a/docs/requirements/02-measuring_telco_traffic_and_performance_KPIs.rst
+++ /dev/null
@@ -1,195 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-Measuring Telco Traffic and Performance KPIs
-============================================
-This section will discuss the SFQM features that enable Measuring Telco Traffic
-and Performance KPIs.
-
-.. Figure:: stats_and_timestamps.png
-
-   Measuring Telco Traffic and Performance KPIs
-
-* The very first thing SFQM enabled was a call-back API in DPDK and an
-  associated application that used the API to demonstrate how to timestamp
-  packets and measure packet latency in DPDK (the sample app is called
-  rxtx_callbacks). This was upstreamed to DPDK 2.0 and is represented by
-  the interfaces 1 and 2 in Figure 1.2.
-
-* The second thing SFQM implemented in DPDK is the extended NIC statistics API,
-  which exposes NIC stats including error stats to the DPDK user by reading the
-  registers on the NIC. This is represented by interface 3 in Figure 1.2.
-
-  * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
-    in association with a sample application that runs as a DPDK secondary
-    process and retrieves the extended NIC stats.
-
-  * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
-    Functions (VFs) for all drivers.
-
-  * For DPDK 16.07 the API migrated from using string value pairs to using id
-    value pairs, improving the overall performance of the API.
-
-Monitoring DPDK interfaces
-===========================
-With the features SFQM enabled in DPDK to enable measuring Telco traffic and
-performance KPIs, we can now retrieve NIC statistics including error stats and
-relay them to a DPDK user. The next step is to enable monitoring of the DPDK
-interfaces based on the stats that we are retrieving from the NICs, by relaying
-the information to a higher level Fault Management entity. To enable this SFQM
-has been enabling a number of plugins for collectd.
-
-collectd
----------
-collectd is a daemon which collects system performance statistics periodically
-and provides a variety of mechanisms to publish the collected metrics. It
-supports more than 90 different input and output plugins. Input plugins retrieve
-metrics and publish them to the collectd deamon, while output plugins publish
-the data they receive to an end point. collectd also has infrastructure to
-support thresholding and notification.
-
-Statistics and Notifications
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Within collectd notifications and performance data are dispatched in the same
-way. There are producer plugins (plugins that create notifications/metrics),
-and consumer plugins (plugins that receive notifications/metrics and do
-something with them).
-
-Statistics in collectd consist of a value list. A value list includes:
-
-* Values, can be one of:
-
-  * Derive: used for values where a change in the value since it's last been
-    read is of interest. Can be used to calculate and store a rate.
-
-  * Counter: similar to derive values, but take the possibility of a counter
-    wrap around into consideration.
-
-  * Gauge: used for values that are stored as is.
-
-  * Absolute: used for counters that are reset after reading.
-
-* Value length: the number of values in the data set.
-
-* Time: timestamp at which the value was collected.
-
-* Interval: interval at which to expect a new value.
-
-* Host: used to identify the host.
-
-* Plugin: used to identify the plugin.
-
-* Plugin instance (optional): used to group a set of values together. For e.g.
-  values belonging to a DPDK interface.
-
-* Type: unit used to measure a value. In other words used to refer to a data
-  set.
-
-* Type instance (optional): used to distinguish between values that have an
-  identical type.
-
-* meta data: an opaque data structure that enables the passing of additional
-  information about a value list. "Meta data in the global cache can be used to
-  store arbitrary information about an identifier" [7].
-
-Host, plugin, plugin instance, type and type instance uniquely identify a
-collectd value.
-
-Values lists are often accompanied by data sets that describe the values in more
-detail. Data sets consist of:
-
-* A type: a name which uniquely identifies a data set.
-
-* One or more data sources (entries in a data set) which include:
-
-  * The name of the data source. If there is only a single data source this is
-    set to "value".
-
-  * The type of the data source, one of: counter, gauge, absolute or derive.
-
-  * A min and a max value.
-
-Types in collectd are defined in types.db. Examples of types in types.db:
-
-.. code-block:: console
-
-    bitrate    value:GAUGE:0:4294967295
-    counter    value:COUNTER:U:U
-    if_octets  rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295
-
-In the example above if_octets has two data sources: tx and rx.
-
-Notifications in collectd are generic messages containing:
-
-* An associated severity, which can be one of OKAY, WARNING, and FAILURE.
-
-* A time.
-
-* A Message
-
-* A host.
-
-* A plugin.
-
-* A plugin instance (optional).
-
-* A type.
-
-* A types instance (optional).
-
-* Meta-data.
-
-collectd plugins
-----------------
-SFQM has enabled three collectd plugins to date:
-
-* `dpdkstat plugin`_: A read plugin that retrieve stats from the DPDK extended
-   NIC stats API.
-
-* `ceilometer plugin`_: A write plugin that pushes the retrieved stats to
-  Ceilometer. It's capable of pushing any stats read through collectd to
-  Ceilometer, not just the DPDK stats.
-
-* `hugepages plugin`_:  A read plugin that retrieves the number of available
-  and free hugepages on a platform as well as what is available in terms of
-  hugepages per socket.
-
-Other plugins in progress:
-
-* dpdkevents:  A read plugin that retrieves DPDK link status and DPDK
-  forwarding cores liveliness status (DPDK Keep Alive).
-
-* Open vSwitch stats Plugin: A read plugin that retrieve flow and interface
-  stats from OVS.
-
-* Open vSwitch events Plugin: A read plugin that retrieves events from OVS.
-
-
-Monitoring Interfaces and Openstack Support
--------------------------------------------
-.. Figure:: monitoring_interfaces.png
-
-   Monitoring Interfaces and Openstack Support
-
-The figure above shows the DPDK L2 forwarding application running on a compute
-node, sending and receiving traffic. collectd is also running on this compute
-node retrieving the stats periodically from DPDK through the dpdkstat plugin
-and publishing the retrieved stats to Ceilometer through the ceilometer plugin.
-
-To see this demo in action please checkout: `SFQM OPNFV Summit demo`_
-
-References
-----------
-[1] https://collectd.org/wiki/index.php/Naming_schema
-[2] https://github.com/collectd/collectd/blob/master/src/daemon/plugin.h
-[3] https://collectd.org/wiki/index.php/Value_list_t
-[4] https://collectd.org/wiki/index.php/Data_set
-[5] https://collectd.org/documentation/manpages/types.db.5.shtml
-[6] https://collectd.org/wiki/index.php/Data_source
-[7] https://collectd.org/wiki/index.php/Meta_Data_Interface
-
-.. _SFQM OPNFV Summit demo: https://prezi.com/kjv6o8ixs6se/software-fastpath-service-quality-metrics-demo/
-.. _dpdkstat plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/dpdkstat
-.. _ceilometer plugin: https://github.com/openstack/collectd-ceilometer-plugin/tree/stable/mitaka
-.. _hugepages plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/hugepages
diff --git a/docs/requirements/03-dpdk_ka.rst b/docs/requirements/03-dpdk_ka.rst
deleted file mode 100644
index ce3e7e4..0000000
--- a/docs/requirements/03-dpdk_ka.rst
+++ /dev/null
@@ -1,128 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-.. (c) OPNFV, Intel Corporation and others.
-
-DPDK Keep Alive Overview
-=========================
-SFQM aims to enable fault detection within DPDK, the very first feature to
-meet this goal is the DPDK Keep Alive Sample app that is part of DPDK 2.2.
-
-DPDK Keep Alive or KA is a sample application that acts as a heartbeat/watchdog
-for DPDK packet processing cores, to detect application thread failure. The
-application supports the detection of ‘failed’ DPDK cores and notification to a
-HA/SA middleware. The purpose is to detect Packet Processing Core fails (e.g.
-infinite loop) and ensure the failure of the core does not result in a fault
-that is not detectable by a management entity.
-
-.. Figure:: dpdk_ka.png
-
-   DPDK Keep Alive Sample Application
-
-Essentially the app demonstrates how to detect 'silent outages' on DPDK packet
-processing cores. The application can be decomposed into two specific parts:
-detection and notification.
-
-* The detection period is programmable/configurable but defaults to 5ms if no
-  timeout is specified.
-* The Notification support is enabled by simply having a hook function that where this
-  can be 'call back support' for a fault management application with a compliant
-  heartbeat mechanism.
-
-DPDK Keep Alive Sample App Internals
-====================================
-This section provides some explanation of the The Keep-Alive/'Liveliness'
-conceptual scheme as well as the DPDK Keep Alive App. The initialization and
-run-time paths are very similar to those of the L2 forwarding application (see
-`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
-information).
-
-There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
-and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
-will supervise worker cores and report any failure (2 successive missed pings).
-The Keep-Alive/'Liveliness' conceptual scheme is:
-
-* DPDK worker cores mark their liveliness as they forward traffic.
-* A Keep Alive Monitor Agent Core runs a function every N Milliseconds to
-  inspect worker core liveliness.
-* If keep-alive agent detects time-outs, it notifies the fault management
-  entity through a call-back function.
-
-**Note:**  Only the worker cores state is monitored. There is no mechanism or agent
-to monitor the Keep Alive Monitor Agent Core.
-
-DPDK Keep Alive Sample App Code Internals
-=========================================
-The following section provides some explanation of the code aspects that are
-specific to the Keep Alive sample application.
-
-The heartbeat functionality is initialized with a struct rte_heartbeat and the
-callback function to invoke in the case of a timeout.
-
-.. code:: c
-
-    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
-    if (rte_global_hbeat_info == NULL)
-        rte_exit(EXIT_FAILURE, "keepalive_create() failed");
-
-The function that issues the pings hbeat_dispatch_pings() is configured to run
-every check_period milliseconds.
-
-.. code:: c
-
-    if (rte_timer_reset(&hb_timer,
-            (check_period * rte_get_timer_hz()) / 1000,
-            PERIODICAL,
-            rte_lcore_id(),
-            &hbeat_dispatch_pings, rte_global_keepalive_info
-            ) != 0 )
-        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
-
-The rest of the initialization and run-time path follows the same paths as the
-the L2 forwarding application. The only addition to the main processing loop is
-the mark alive functionality and the example random failures.
-
-.. code:: c
-
-    rte_keepalive_mark_alive(&rte_global_hbeat_info);
-    cur_tsc = rte_rdtsc();
-
-    /* Die randomly within 7 secs for demo purposes.. */
-    if (cur_tsc - tsc_initial > tsc_lifetime)
-    break;
-
-The rte_keepalive_mark_alive() function simply sets the core state to alive.
-
-.. code:: c
-
-    static inline void
-    rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
-    {
-        keepcfg->state_flags[rte_lcore_id()] = 1;
-    }
-
-Keep Alive Monitor Agent Core Monitoring Options
-The application can run on either a host or a guest. As such there are a number
-of options for monitoring the Keep Alive Monitor Agent Core through a Local
-Agent on the compute node:
-
-         ======================  ==========  =============
-          Application Location     DPDK KA     LOCAL AGENT
-         ======================  ==========  =============
-                  HOST               X        HOST/GUEST
-                  GUEST              X        HOST/GUEST
-         ======================  ==========  =============
-
-
-For the first implementation of a Local Agent SFQM will enable:
-
-         ======================  ==========  =============
-          Application Location     DPDK KA     LOCAL AGENT
-         ======================  ==========  =============
-                  HOST               X           HOST
-         ======================  ==========  =============
-
-Through extending the dpdkstat plugin for collectd with KA functionality, and
-integrating the extended plugin with Monasca for high performing, resilient,
-and scalable fault detection.
-
-.. _L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html
diff --git a/docs/requirements/dpdk_ka.png b/docs/requirements/dpdk_ka.png
deleted file mode 100644
index 4a45e10..0000000
Binary files a/docs/requirements/dpdk_ka.png and /dev/null differ
diff --git a/docs/requirements/index.rst b/docs/requirements/index.rst
index a58b103..a9be153 100644
--- a/docs/requirements/index.rst
+++ b/docs/requirements/index.rst
@@ -7,6 +7,3 @@
      :numbered:
 
      01-intro.rst
-     02-measuring_telco_traffic_and_performance_KPIs.rst
-     03-dpdk_ka.rst
-     04-release_b.rst
diff --git a/docs/requirements/monitoring_interfaces.png b/docs/requirements/monitoring_interfaces.png
deleted file mode 100644
index e57c4aa..0000000
Binary files a/docs/requirements/monitoring_interfaces.png and /dev/null differ
diff --git a/docs/requirements/stats_and_timestamps.png b/docs/requirements/stats_and_timestamps.png
deleted file mode 100644
index 84aef72..0000000
Binary files a/docs/requirements/stats_and_timestamps.png and /dev/null differ
diff --git a/docs/userguide/collectd.userguide.rst b/docs/userguide/collectd.userguide.rst
new file mode 100644
index 0000000..0755fdf
--- /dev/null
+++ b/docs/userguide/collectd.userguide.rst
@@ -0,0 +1,202 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) OPNFV, Intel Corporation and others.
+
+collectd plugins description
+============================
+The SFQM collectd plugins make it possible to monitor DPDK interfaces by
+exposing stats and the relevant events to higher-level telemetry and fault
+management applications. The following sections discuss the SFQM features
+in detail.
+
+Measuring Telco Traffic and Performance KPIs
+--------------------------------------------
+This section discusses the SFQM features that enable measuring Telco traffic
+and performance KPIs.
+
+.. Figure:: stats_and_timestamps.png
+
+   Measuring Telco Traffic and Performance KPIs
+
+* The very first thing SFQM enabled was a call-back API in DPDK and an
+  associated application that used the API to demonstrate how to timestamp
+  packets and measure packet latency in DPDK (the sample app is called
+  rxtx_callbacks). This was upstreamed to DPDK 2.0 and is represented by
+  the interfaces 1 and 2 in Figure 1.2.
+
+* The second thing SFQM implemented in DPDK is the extended NIC statistics API,
+  which exposes NIC stats including error stats to the DPDK user by reading the
+  registers on the NIC. This is represented by interface 3 in Figure 1.2.
+
+  * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
+    in association with a sample application that runs as a DPDK secondary
+    process and retrieves the extended NIC stats.
+
+  * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
+    Functions (VFs) for all drivers.
+
+  * For DPDK 16.07 the API migrated from using string value pairs to using id
+    value pairs, improving the overall performance of the API.
+
+Monitoring DPDK interfaces
+--------------------------
+With the features SFQM added to DPDK for measuring Telco traffic and
+performance KPIs, we can now retrieve NIC statistics, including error stats, and
+relay them to a DPDK user. The next step is to enable monitoring of the DPDK
+interfaces based on the stats that we are retrieving from the NICs, by relaying
+the information to a higher-level Fault Management entity. To support this, SFQM
+has been enabling a number of plugins for collectd.
+
+collectd
+~~~~~~~~
+collectd is a daemon which collects system performance statistics periodically
+and provides a variety of mechanisms to publish the collected metrics. It
+supports more than 90 different input and output plugins. Input plugins retrieve
+metrics and publish them to the collectd daemon, while output plugins publish
+the data they receive to an end point. collectd also has infrastructure to
+support thresholding and notification.
+
+collectd statistics and Notifications
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Within collectd, notifications and performance data are dispatched in the same
+way. There are producer plugins (plugins that create notifications/metrics),
+and consumer plugins (plugins that receive notifications/metrics and do
+something with them).
+
+Statistics in collectd consist of a value list. A value list includes:
+
+* Values, each of which can be one of:
+
+  * Derive: used for values where the change in the value since it was last
+    read is of interest. Can be used to calculate and store a rate.
+
+  * Counter: similar to derive values, but takes the possibility of a counter
+    wrap-around into consideration.
+
+  * Gauge: used for values that are stored as is.
+
+  * Absolute: used for counters that are reset after reading.
+
+* Value length: the number of values in the data set.
+
+* Time: timestamp at which the value was collected.
+
+* Interval: interval at which to expect a new value.
+
+* Host: used to identify the host.
+
+* Plugin: used to identify the plugin.
+
+* Plugin instance (optional): used to group a set of values together, for
+  example values belonging to a DPDK interface.
+
+* Type: unit used to measure a value. In other words, it refers to a data
+  set.
+
+* Type instance (optional): used to distinguish between values that have an
+  identical type.
+
+* Meta data: an opaque data structure that enables the passing of additional
+  information about a value list. "Meta data in the global cache can be used to
+  store arbitrary information about an identifier" [7].
+
+Host, plugin, plugin instance, type and type instance uniquely identify a
+collectd value.
+
+Value lists are often accompanied by data sets that describe the values in more
+detail. Data sets consist of:
+
+* A type: a name which uniquely identifies a data set.
+
+* One or more data sources (entries in a data set) which include:
+
+  * The name of the data source. If there is only a single data source this is
+    set to "value".
+
+  * The type of the data source, one of: counter, gauge, absolute or derive.
+
+  * A min and a max value.
+
+Types in collectd are defined in types.db. Examples of types in types.db:
+
+.. code-block:: console
+
+    bitrate    value:GAUGE:0:4294967295
+    counter    value:COUNTER:U:U
+    if_octets  rx:COUNTER:0:4294967295, tx:COUNTER:0:4294967295
+
+In the example above, if_octets has two data sources: rx and tx.
+
+Notifications in collectd are generic messages containing:
+
+* An associated severity, which can be one of OKAY, WARNING, and FAILURE.
+
+* A time.
+
+* A message.
+
+* A host.
+
+* A plugin.
+
+* A plugin instance (optional).
+
+* A type.
+
+* A type instance (optional).
+
+* Meta-data.
+
+collectd plugins
+----------------
+SFQM has enabled three collectd plugins to date:
+
+* `dpdkstat plugin`_: A read plugin that retrieves stats from the DPDK extended
+  NIC stats API.
+
+* `ceilometer plugin`_: A write plugin that pushes the retrieved stats to
+  Ceilometer. It's capable of pushing any stats read through collectd to
+  Ceilometer, not just the DPDK stats.
+
+* `hugepages plugin`_: A read plugin that retrieves the number of available
+  and free hugepages on a platform as well as what is available in terms of
+  hugepages per socket.
+
+Other plugins in progress:
+
+* dpdkevents: A read plugin that retrieves DPDK link status and DPDK
+  forwarding cores liveliness status (DPDK Keep Alive).
+
+* Open vSwitch stats Plugin: A read plugin that retrieves flow and interface
+  stats from OVS.
+
+* Open vSwitch events Plugin: A read plugin that retrieves events from OVS.
+
+
+Monitoring Interfaces and OpenStack Support
+-------------------------------------------
+.. Figure:: monitoring_interfaces.png
+
+   Monitoring Interfaces and OpenStack Support
+
+The figure above shows the DPDK L2 forwarding application running on a compute
+node, sending and receiving traffic. collectd is also running on this compute
+node retrieving the stats periodically from DPDK through the dpdkstat plugin
+and publishing the retrieved stats to Ceilometer through the ceilometer plugin.
+
+To see this demo in action, please check out: `SFQM OPNFV Summit demo`_
+
+References
+----------
+[1] https://collectd.org/wiki/index.php/Naming_schema
+[2] https://github.com/collectd/collectd/blob/master/src/daemon/plugin.h
+[3] https://collectd.org/wiki/index.php/Value_list_t
+[4] https://collectd.org/wiki/index.php/Data_set
+[5] https://collectd.org/documentation/manpages/types.db.5.shtml
+[6] https://collectd.org/wiki/index.php/Data_source
+[7] https://collectd.org/wiki/index.php/Meta_Data_Interface
+
+.. _SFQM OPNFV Summit demo: https://prezi.com/kjv6o8ixs6se/software-fastpath-service-quality-metrics-demo/
+.. _dpdkstat plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/dpdkstat
+.. _ceilometer plugin: https://github.com/openstack/collectd-ceilometer-plugin/tree/stable/mitaka
+.. _hugepages plugin: https://github.com/maryamtahhan/collectd-with-DPDK/tree/hugepages
diff --git a/docs/userguide/dpdk_ka.png b/docs/userguide/dpdk_ka.png
new file mode 100644
index 0000000..4a45e10
Binary files /dev/null and b/docs/userguide/dpdk_ka.png differ
diff --git a/docs/userguide/index.rst b/docs/userguide/index.rst
new file mode 100644
index 0000000..994f63c
--- /dev/null
+++ b/docs/userguide/index.rst
@@ -0,0 +1,19 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+   .. http://creativecommons.org/licenses/by/4.0
+      .. (c) <optionally add copywriters name>
+
+====================
+SFQM user guide
+====================
+
+.. The feature user guide should provide an OPNFV user with enough information to
+   .. use the features provided by the feature project in the supported scenarios.
+      .. This guide should walk a user through the usage of the features once a scenario
+         .. has been deployed and is active according to the installation guide provided
+            .. by the installer project.
+
+.. toctree::
+     :maxdepth: 3
+
+     collectd.userguide.rst
+     keepalive.userguide.rst
diff --git a/docs/userguide/keepalive.userguide.rst b/docs/userguide/keepalive.userguide.rst
new file mode 100644
index 0000000..4b6e990
--- /dev/null
+++ b/docs/userguide/keepalive.userguide.rst
@@ -0,0 +1,128 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) OPNFV, Intel Corporation and others.
+
+DPDK Keep Alive description
+===========================
+SFQM aims to enable fault detection within DPDK. The very first feature to
+meet this goal is the DPDK Keep Alive sample app that is part of DPDK 2.2.
+
+DPDK Keep Alive or KA is a sample application that acts as a heartbeat/watchdog
+for DPDK packet processing cores, to detect application thread failure. The
+application supports the detection of ‘failed’ DPDK cores and notification to a
+HA/SA middleware. The purpose is to detect packet processing core failures (e.g.
+an infinite loop) and ensure the failure of the core does not result in a fault
+that is not detectable by a management entity.
+
+.. Figure:: dpdk_ka.png
+
+   DPDK Keep Alive Sample Application
+
+Essentially the app demonstrates how to detect 'silent outages' on DPDK packet
+processing cores. The application can be decomposed into two specific parts:
+detection and notification.
+
+* The detection period is programmable/configurable but defaults to 5ms if no
+  timeout is specified.
+* Notification is supported through a hook function, which can serve as the
+  call-back into a fault management application with a compliant heartbeat
+  mechanism.
+
+DPDK Keep Alive Sample App Internals
+------------------------------------
+This section provides some explanation of the Keep-Alive/'Liveliness'
+conceptual scheme as well as the DPDK Keep Alive App. The initialization and
+run-time paths are very similar to those of the L2 forwarding application (see
+`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
+information).
+
+There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
+and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
+will supervise worker cores and report any failure (2 successive missed pings).
+The Keep-Alive/'Liveliness' conceptual scheme is:
+
+* DPDK worker cores mark their liveliness as they forward traffic.
+* A Keep Alive Monitor Agent Core runs a function every N milliseconds to
+  inspect worker core liveliness.
+* If the keep-alive agent detects time-outs, it notifies the fault management
+  entity through a call-back function.
+
+**Note:**  Only the worker cores' state is monitored. There is no mechanism or agent
+to monitor the Keep Alive Monitor Agent Core.
+
+DPDK Keep Alive Sample App Code Internals
+-----------------------------------------
+The following section provides some explanation of the code aspects that are
+specific to the Keep Alive sample application.
+
+The heartbeat functionality is initialized with a struct rte_heartbeat and the
+callback function to invoke in the case of a timeout.
+
+.. code:: c
+
+    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
+    if (rte_global_keepalive_info == NULL)
+        rte_exit(EXIT_FAILURE, "keepalive_create() failed");
+
+The function that issues the pings, hbeat_dispatch_pings(), is configured to run
+every check_period milliseconds.
+
+.. code:: c
+
+    if (rte_timer_reset(&hb_timer,
+            (check_period * rte_get_timer_hz()) / 1000,
+            PERIODICAL,
+            rte_lcore_id(),
+            &hbeat_dispatch_pings, rte_global_keepalive_info
+            ) != 0 )
+        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
+
+The rest of the initialization and run-time path follows the same paths as
+the L2 forwarding application. The only addition to the main processing loop is
+the mark alive functionality and the example random failures.
+
+.. code:: c
+
+    rte_keepalive_mark_alive(rte_global_keepalive_info);
+    cur_tsc = rte_rdtsc();
+
+    /* Die randomly within 7 secs for demo purposes.. */
+    if (cur_tsc - tsc_initial > tsc_lifetime)
+        break;
+
+The rte_keepalive_mark_alive() function simply sets the core state to alive.
+
+.. code:: c
+
+    static inline void
+    rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
+    {
+        keepcfg->state_flags[rte_lcore_id()] = 1;
+    }
+
+Keep Alive Monitor Agent Core Monitoring Options
+------------------------------------------------
+The application can run on a host or in a guest, which gives several options for
+monitoring the Keep Alive Monitor Agent Core via a Local Agent on the compute node:
+
+         ======================  ==========  =============
+          Application Location     DPDK KA     LOCAL AGENT
+         ======================  ==========  =============
+                  HOST               X        HOST/GUEST
+                  GUEST              X        HOST/GUEST
+         ======================  ==========  =============
+
+
+For the first implementation of a Local Agent, SFQM will enable:
+
+         ======================  ==========  =============
+          Application Location     DPDK KA     LOCAL AGENT
+         ======================  ==========  =============
+                  HOST               X           HOST
+         ======================  ==========  =============
+
+This will be done by extending the dpdkstat plugin for collectd with KA
+functionality and integrating the extended plugin with Monasca for
+high-performing, resilient, and scalable fault detection.
+
+.. _L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html
diff --git a/docs/userguide/monitoring_interfaces.png b/docs/userguide/monitoring_interfaces.png
new file mode 100644
index 0000000..e57c4aa
Binary files /dev/null and b/docs/userguide/monitoring_interfaces.png differ
diff --git a/docs/userguide/stats_and_timestamps.png b/docs/userguide/stats_and_timestamps.png
new file mode 100644
index 0000000..84aef72
Binary files /dev/null and b/docs/userguide/stats_and_timestamps.png differ
-- 
cgit