3 files changed, 131 insertions, 6 deletions
diff --git a/docs/development/design/index.rst b/docs/development/design/index.rst
index c54888a..0500ca2 100644
--- a/docs/development/design/index.rst
+++ b/docs/development/design/index.rst
@@ -12,4 +12,5 @@ OPNFV NFVbench Euphrates Design
 
    design
    versioning
+   traffic_desc
    ndrpdr
diff --git a/docs/development/design/ndrpdr.rst b/docs/development/design/ndrpdr.rst
index 5361174..e34e8ba 100644
--- a/docs/development/design/ndrpdr.rst
+++ b/docs/development/design/ndrpdr.rst
@@ -6,11 +6,15 @@
 NDR/PDR Binary Search
 =====================
 
+The NDR/PDR binary search algorithm used by NFVbench is based on the algorithm used by the
+FD.io CSIT project, with some additional optimizations.
+
 Algorithm Outline
 -----------------
 
-The ServiceChain class is responsible for calculating the NDR/PDR for all frame sizes requested in the configuration.
-Calculation for 1 frame size is delegated to the TrafficClient class.
+The ServiceChain class (nfvbench/service_chain.py) is responsible for calculating the NDR/PDR
+for all frame sizes requested in the configuration.
+Calculation for 1 frame size is delegated to the TrafficClient class (nfvbench/traffic_client.py).
 
 Call chain for calculating the NDR-PDR for a list of frame sizes:
 
@@ -22,23 +26,58 @@ Call chain for calculating the NDR-PDR for a list of frame sizes:
                     - TrafficClient.__range_search() recursive binary search
 
 The search range is delimited by a left and right rate (expressed as a % of line rate per direction).
+The search always starts at line rate per port, e.g. in the case of 2x10Gbps, the first iteration
+will send 10Gbps of traffic on each port.
 
 The load_epsilon configuration parameter defines the accuracy of the result as a % of line rate.
 The default value of 0.1 indicates for example that the measured NDR and PDR are within 0.1% of line rate of the
 actual NDR/PDR (e.g. 0.1% of 10Gbps is 10Mbps). It also determines how small the search range must be in the binary search.
+Smaller values of load_epsilon will result in more iterations and will take more time but may not
+always be beneficial if the absolute value falls below the precision level of the measurement.
+For example, a value of 0.01% would translate to an absolute value of 1Mbps (for a 10Gbps port) or
+around 1.5kpps (at 64-byte frame size), which might be too fine-grained.
 
 The recursion narrows down the range by half and stops when:
 
 - the range is smaller than the configured load_epsilon value
 - or when the search hits 100% or 0% of line rate
 
+Optimization
+------------
+
+Binary search algorithms assume that the drop rate curve is monotonically increasing with the Tx rate.
+To save time, the algorithm used by NFVbench is capable of calculating the optimal Tx rate for an
+arbitrary list of target maximum drop rates in one pass instead of the usual 1 pass per target maximum drop rate.
+The time saved grows linearly with the number of target drop rates.
+For example, a typical NDR/PDR search will have 2 target maximum drop rates:
+
+- NDR = 0.001%
+- PDR = 0.1%
+
+The binary search will then start with a sorted list of 2 target drop rates: [0.1, 0.001].
+The first part of the binary search will then focus on finding the optimal rate for the first target
+drop rate (0.1%). When found, the current target drop rate is removed from the list and
+iteration continues with the next target drop rate in the list but this time
+starting from the upper/lower range of the previous target drop rate, which saves significant time.
+The binary search continues until the target maximum drop rate list is empty.
+
+Results Granularity
+-------------------
+The binary search results contain per direction stats (forward and reverse).
+In the case of multi-chaining, results contain per chain stats.
+The current code only reports aggregated stats (forward + reverse for all chains) but could be enhanced
+to report per chain stats.
+
+
+CPU Limitations
+---------------
 One particularity of using a software traffic generator is that the requested Tx rate may not always be met due to
 resource limitations (e.g. CPU is not fast enough to generate a very high load). The algorithm should take this into
 consideration:
 
-- always monitor the actual Tx rate achieved
+- always monitor the actual Tx rate achieved as reported back by the traffic generator
 - actual Tx rate is always <= requested Tx rate
 - the measured drop rate should always be relative to the actual Tx rate
-- if the actual Tx rate is < requested Tx rate and the measured drop rate is already within threshold (<NDR/PDR threshold) then the binary search must stop with proper warning
-
-
+- if the actual Tx rate is < requested Tx rate and the measured drop rate is already within threshold
+  (<NDR/PDR threshold) then the binary search must stop with a proper warning because the actual
+  NDR/PDR may be higher than the reported values
diff --git a/docs/development/design/traffic_desc.rst b/docs/development/design/traffic_desc.rst
new file mode 100644
index 0000000..2a40b6a
--- /dev/null
+++ b/docs/development/design/traffic_desc.rst
@@ -0,0 +1,85 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International
+.. License.
+.. http://creativecommons.org/licenses/by/4.0
+.. (c) Cisco Systems, Inc
+
+Traffic Description
+===================
+
+The general packet path model followed by NFVbench requires injecting traffic into an arbitrary
+number of service chains, where each service chain is identified by 2 edge networks (left and right).
+In the current multi-chaining model:
+
+- all service chains share the same left and right edge networks
+- each port associated to the traffic generator is dedicated to send traffic to one edge network
+
+In an OpenStack deployment, this corresponds to all chains sharing the same 2 neutron networks.
+If VLAN encapsulation is used, all traffic sent to a port will have the same VLAN id.
+
+Basic Packet Description
+------------------------
+
+The code to create the UDP packet is located in TRex.create_pkt() (nfvbench/traffic_gen/trex.py).
+
+NFVbench always generates UDP packets (even when doing L2 forwarding).
+The final size of the frame containing each UDP packet will be based on the requested L2 frame size.
+When taking into account the minimum payload size requirements from the traffic generator for
+the latency streams, the minimum L2 frame size is 64 bytes (no VLAN tagging) or
+68 bytes (with VLAN tagging).
+
+Flows Specification
+-------------------
+
+MAC Addresses
+.............
+The source MAC address is always the local port MAC address (for each port).
+The destination MAC address is based on the configuration and can be:
+
+- the traffic generator peer port MAC address in the case of L2 loopback at the switch level
+  or when using a loopback cable
+- the dest MAC as specified by the configuration file (EXT chain no ARP)
+- the dest MAC as discovered by ARP (EXT chain)
+- the VM MAC as discovered from the Neutron API (PVP, PVVP chains)
+
+NFVbench does not currently range on the MAC addresses.
+
+IP addresses
+............
+The source IP address is fixed per chain.
+The destination IP address is variable within a distinct range per chain.
+
+UDP ports
+.........
+The source and destination ports are fixed for all packets and can be set in the configuration
+file (default is 53).
+
+Payload User Data
+.................
+The length of the user data is based on the requested L2 frame size and takes into account the
+size of the L2 header - including the VLAN tag if applicable.
+
+
+IMIX Support
+------------
+In the case of IMIX, each direction is made of 4 streams:
+1 latency stream and 1 stream for each IMIX frame size
+(3 frame sizes in total).
+
+The IMIX ratio is encoded into the number of consecutive packets sent by each stream in turn.
+
+Service Chains and Streams
+--------------------------
+A stream identifies a sequence of packets with the same characteristics such as rate and destination address.
+NFVbench will create 2 streams per service chain per direction:
+
+- 1 latency stream set to 1000pps
+- 1 main traffic stream set to the requested Tx rate less the latency stream rate (1000pps)
+
+For example, a benchmark with 1 chain (fixed rate) will result in a total of 4 streams.
+A benchmark with 20 chains will result in a total of 80 streams (fixed rate; IMIX requires more).
+
+The total number of flows is split equally among the chains by using the appropriate destination
+MAC address.
+
+For example, in the case of 10 chains, 1M flows and fixed rate, there will be a total of 40 streams.
+Each of the 20 non-latency streams will generate packets corresponding to 50,000 flows (unique src/dest address tuples).