5 files changed, 152 insertions, 15 deletions
diff --git a/README.rst b/README.rst
index 0a9295d..0ed92c8 100644
--- a/README.rst
+++ b/README.rst
@@ -1,14 +1,27 @@
 NFVbench: A Network Performance Benchmarking Tool for NFVi Full Stacks
 **********************************************************************
 
-The NFVbench tool provides an automated way to measure the network performance for the most common data plane packet flows on any OpenStack based NFVi system viewed as a black box (NFVi Full Stack).
+The NFVbench tool provides an automated way to measure the network performance for the most common data plane packet flows
+on any NFVi system viewed as a black box (NFVi Full Stack).
 An NFVi full stack exposes the following interfaces:
-- an OpenStack API
+- an OpenStack API for those NFVi platforms based on OpenStack
 - an interface to send and receive packets on the data plane (typically through top of rack switches)
 
-The NFVi full stack does not have to be supported by the OPNFV ecosystem and can be any functional OpenStack system that provides the aboce interfaces. NFVbench can be installed standalone (in the form of a single Docker container) and is fully functional without the need to install any other OPNFV framework.
+The NFVi full stack does not have to be supported by the OPNFV ecosystem and can be any functional OpenStack system that provides
+the above interfaces.
+NFVbench can also be used without OpenStack on any bare system that can handle L2 forwarding or L3 routing.
 
-It is designed to be easy to install and easy to use by non experts (no need to be an expert in traffic generators and data plane performance testing).
+NFVbench can be installed standalone (in the form of a single Docker container) and is fully functional without
+the need to install any other OPNFV framework.
+
+It is designed to be easy to install and easy to use by non experts (no need to be an expert in traffic generators and data plane
+performance benchmarking).
+
+Online Documentation
+--------------------
+The latest version of the NFVbench documentation is available online at:
+
+http://docs.opnfv.org/en/latest/submodules/nfvbench/docs/testing/user/userguide/index.html
 
 
 Contact Information
@@ -16,4 +29,3 @@ Contact Information
 Inquiries and questions: send an email to opnfv-tech-discuss@lists.opnfv.org with a Subject line starting with "[nfvbench]".
 
 Open issues or submit an issue or enhancement request: https://jira.opnfv.org/projects/NFVBENCH/issues (this requires an OPNFV Linux Foundation login).
-
diff --git a/docs/development/design/index.rst b/docs/development/design/index.rst
index c54888a..0500ca2 100644
--- a/docs/development/design/index.rst
+++ b/docs/development/design/index.rst
@@ -12,4 +12,5 @@ OPNFV NFVbench Euphrates Design
 
    design
    versioning
+   traffic_desc
    ndrpdr
diff --git a/docs/development/design/ndrpdr.rst b/docs/development/design/ndrpdr.rst
index 5361174..e34e8ba 100644
--- a/docs/development/design/ndrpdr.rst
+++ b/docs/development/design/ndrpdr.rst
@@ -6,11 +6,15 @@
 NDR/PDR Binary Search
 =====================
 
+The NDR/PDR binary search algorithm used by NFVbench is based on the algorithm used by the
+FD.io CSIT project, with some additional optimizations.
+
 Algorithm Outline
 -----------------
 
-The ServiceChain class is responsible for calculating the NDR/PDR for all frame sizes requested in the configuration.
-Calculation for 1 frame size is delegated to the TrafficClient class.
+The ServiceChain class (nfvbench/service_chain.py) is responsible for calculating the NDR/PDR
+or all frame sizes requested in the configuration.
+Calculation for 1 frame size is delegated to the TrafficClient class (nfvbench/traffic_client.py)
 
 Call chain for calculating the NDR-PDR for a list of frame sizes:
 
@@ -22,23 +26,58 @@ Call chain for calculating the NDR-PDR for a list of frame sizes:
                     - TrafficClient.__range_search() recursive binary search
 
 The search range is delimited by a left and right rate (expressed as a % of line rate per direction).
+The search always start at line rate per port, e.g. in the case of 2x10Gbps, the first iteration
+will send 10Gbps of traffic on each port.
 
 The load_epsilon configuration parameter defines the accuracy of the result as a % of line rate.
 The default value of 0.1 indicates for example that the measured NDR and PDR are within 0.1% of line rate of the
 actual NDR/PDR (e.g. 0.1% of 10Gbps is 10Mbps). It also determines how small the search range must be in the binary search.
+Smaller values of load_epsilon will result in more iterations and will take more time but may not
+always be beneficial if the absolute value falls below the precision level of the measurement.
+For example a value of 0.01% would translate to an absolute value of 1Mbps (for a 10Gbps port) or
+around 10kpps (at 64 byte size) which might be too fine grain.
 
 The recursion narrows down the range by half and stops when:
 
 - the range is smaller than the configured load_epsilon value
 - or when the search hits 100% or 0% of line rate
 
+Optimization
+------------
+
+Binary search algorithms assume that the drop rate curve is monotonically increasing with the Tx rate.
+To save time, the algorithm used by NFVbench is capable of calculating the optimal Tx rate for an
+arbitrary list of target maximum drop rates in one pass instead of the usual 1 pass per target maximum drop rate.
+This saves time linearly to the number target drop rates.
+For example, a typical NDR/PDR search will have 2 target maximum drop rates:
+
+- NDR = 0.001%
+- PDR = 0.1%
+
+The binary search will then start with a sorted list of 2 target drop rates: [0.1, 0.001].
+The first part of the binary search will then focus on finding the optimal rate for the first target
+drop rate (0.1%). When found, the current target drop rate is removed from the list and
+iteration continues with the next target drop rate in the list but this time
+starting from the upper/lower range of the previous target drop rate, which saves significant time.
+The binary search continues until the target maximum drop rate list is empty.
+
+Results Granularity
+-------------------
+The binary search results contain per direction stats (forward and reverse).
+In the case of multi-chaining, results contain per chain stats.
+The current code only reports aggregated stats (forward + reverse for all chains) but could be enhanced
+to report per chain stats.
+
+
+CPU Limitations
+---------------
 One particularity of using a software traffic generator is that the requested Tx rate may not always be met due to
 resource limitations (e.g. CPU is not fast enough to generate a very high load). The algorithm should take this into
 consideration:
 
-- always monitor the actual Tx rate achieved
+- always monitor the actual Tx rate achieved as reported back by the traffic generator
 - actual Tx rate is always <= requested Tx rate
 - the measured drop rate should always be relative to the actual Tx rate
-- if the actual Tx rate is < requested Tx rate and the measured drop rate is already within threshold (<NDR/PDR threshold) then the binary search must stop with proper warning
-
-
+- if the actual Tx rate is < requested Tx rate and the measured drop rate is already within threshold
+ (<NDR/PDR threshold) then the binary search must stop with proper warning because the actual NDR/PDR
+ might probably be higher than the reported values
diff --git a/docs/development/design/traffic_desc.rst b/docs/development/design/traffic_desc.rst
new file mode 100644
index 0000000..2a40b6a
--- /dev/null
+++ b/docs/development/design/traffic_desc.rst
@@ -0,0 +1,85 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International
+.. License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) Cisco Systems, Inc
+
+Traffic Description
+===================
+
+The general packet path model followed by NFVbench requires injecting traffic into an arbitrary
+number of service chains, where each service chain is identified by 2 edge networks (left and right).
+In the current multi-chaining model:
+
+- all service chains share the same left and right edge networks
+- each port associated to the traffic generator is dedicated to send traffic to one edge network
+
+In an OpenStack deployment, this corresponds to all chains sharing the same 2 neutron networks.
+If VLAN encapsulation is used, all traffic sent to a port will have the same VLAN id.
+
+Basic Packet Description
+------------------------
+
+The code to create the UDP packet is located in TRex.create_pkt() (nfvbench/traffic_gen/trex.py).
+
+NFVbench always generates UDP packets (even when doing L2 forwarding).
+The final size of the frame containing each UDP packet will be based on the requested L2 frame size.
+When taking into account the minimum payload size requirements from the traffic generator for
+the latency streams, the minimum L2 frame size is 64 byte (no vlan tagging) or
+68 bytes (with vlan tagging).
+
+Flows Specification
+-------------------
+
+Mac Addresses
+.............
+The source MAC address is always the local port MAC address (for each port).
+The destination MAC address is based on the configuration and can be:
+
+- the traffic generator peer port MAC address in the case of L2 loopback at the switch level
+  or when using a loopback cable
+- the dest MAC as specified by the configuration file (EXT chain no ARP)
+- the dest MAC as discovered by ARP (EXT chain)
+- the VM MAC as dicovered from Neutron API (PVP, PVVP chains)
+
+NFVbench does not currently range on the MAC addresses.
+
+IP addresses
+............
+The source IP address is fixed per chain.
+The destination IP address is variable within a distinct range per chain.
+
+UDP ports
+.........
+The source and destination ports are fixed for all packets and can be set in the configuratoon
+file (default is 53).
+
+Payload User Data
+.................
+The length of the user data is based on the requested L2 frame size and takes into account the
+size of the L2 header - including the VLAN tag if applicable.
+
+
+IMIX Support
+------------
+In the case of IMIX, each direction is made of 4 streams:
+- 1 latency stream
+- 1 stream for each IMIX frame size
+
+The IMIX ratio is encoded into the number of consecutive packets sent by each stream in turn.
+
+Service Chains and Streams
+--------------------------
+A stream identifies one "stream" of packets with same characteristics such as rate and destination address.
+NFVbench will create 2 streams per service chain per direction:
+
+- 1 latency stream set to 1000pps
+- 1 main traffic stream set to the requested Tx rate less the latency stream rate (1000pps)
+
+For example, a benchmark with 1 chain (fixed rate) will result in a total of 4 streams.
+A benchmark with 20 chains will results in a total of 80 streams (fixed rate, it is more with IMIX).
+
+The overall flows are split equally between the number of chains by using the appropriate destination
+MAC address.
+
+For example, in the case of 10 chains, 1M flows and fixed rate, there will be a total of 40 streams.
+Each of the 20 non-latency stream will generate packets corresponding to 50,000 flows (unique src/dest address tuples).
diff --git a/docs/testing/user/userguide/readme.rst b/docs/testing/user/userguide/readme.rst
index 785ffed..1cb8d6b 100644
--- a/docs/testing/user/userguide/readme.rst
+++ b/docs/testing/user/userguide/readme.rst
@@ -179,7 +179,7 @@ NFVbench is agnostic of the virtual switch implementation and has been tested wi
 
 
 
-
-
-
-
+Limitations
+***********
+NFVbench only supports VLAN with OpenStack.
+NFVbench does not support VxLAN overlays.