authorZhijiang Hu <hu.zhijiang@zte.com.cn>2017-07-31 19:10:57 +0800
committerZhijiang Hu <hu.zhijiang@zte.com.cn>2017-08-01 12:22:52 +0800
commit49829982ba955fea91ad56cc05fd526eab106f14 (patch)
tree24e640b8566886950ce378b1d9474dff9cc8c28d /docs/developer
parentb35baf61e48a1d2aff88e7c3e100aa996422d849 (diff)
Jira: DAISY-36 Update doc structure
This PS updates the doc structure according to [1]. Note: this PS also adds content to the
doc describing the method for mapping roles to discovered nodes.

[1] http://docs.opnfv.org/en/stable-danube/how-to-use-docs/documentation-guide.html?highlight=templates#document-structure-and-contribution

Change-Id: I7b2ef916753cddd8cd845abae8c7d5865c49e1ac
Signed-off-by: Zhijiang Hu <hu.zhijiang@zte.com.cn>
Diffstat (limited to 'docs/developer')
-rw-r--r--  docs/developer/design/index.rst      16
-rw-r--r--  docs/developer/design/multicast.rst  278
-rw-r--r--  docs/developer/spec/multicast.rst    190
3 files changed, 0 insertions, 484 deletions
diff --git a/docs/developer/design/index.rst b/docs/developer/design/index.rst
deleted file mode 100644
index bc5e9f40..00000000
--- a/docs/developer/design/index.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-.. This document is protected/licensed under the following conditions
-.. (c) Sun Jing (ZTE corporation)
-.. Licensed under a Creative Commons Attribution 4.0 International License.
-.. You should have received a copy of the license along with this work.
-.. If not, see <http://creativecommons.org/licenses/by/4.0/>.
-
-*************************
-Design Docs for Daisy4nfv
-*************************
-
-.. toctree::
- :numbered:
- :maxdepth: 2
-
- multicast.rst
-
diff --git a/docs/developer/design/multicast.rst b/docs/developer/design/multicast.rst
deleted file mode 100644
index 89422fe6..00000000
--- a/docs/developer/design/multicast.rst
+++ /dev/null
@@ -1,278 +0,0 @@
-Detailed Design
-===============
-
-Protocol Design
----------------
-
-1. All protocol headers are 1 byte long or aligned to 4 bytes.
-2. Packet size, including the UDP/IP header, should not exceed 1500 (MTU) bytes and should
-be aligned to 4 bytes. In the future, the MTU may be set larger than 1500 (jumbo frames)
-through a cmdline option to increase data throughput.
-
-/* Packet header definition (align to 4 bytes) */
-struct packet_ctl {
-    uint32_t seq;       // packet seq number, starting from 0; unique within the server's life cycle.
-    uint32_t crc;       // checksum
-    uint32_t data_size; // payload length
-    uint8_t  data[0];
-};
-
-/* Buffer info definition (align to 4 bytes) */
-struct buffer_ctl {
-    uint32_t buffer_id;      // buffer seq number, starting from 0; unique within the server's life cycle.
-    uint32_t buffer_size;    // total payload length of a buffer
-    uint32_t packet_id_base; // seq number of the first packet in this buffer.
-    uint32_t pkt_count;      // number of packets in this buffer; 0 means EOF.
-};
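-
-For illustration, filling one packet could look like the sketch below, assuming zlib's
-crc32() as the checksum (the design does not mandate a specific checksum algorithm, and
-packet_fill() is a hypothetical helper):
-
-#include <stdint.h>
-#include <string.h>
-#include <zlib.h> /* crc32(); an assumed checksum choice for illustration */
-
-/* Sketch: fill a packet header and payload; here crc covers the payload
- * only (an assumption, not a spec requirement). */
-static void packet_fill(struct packet_ctl *pkt, uint32_t seq,
-                        const void *payload, uint32_t len)
-{
-    pkt->seq = seq;
-    pkt->data_size = len;
-    memcpy(pkt->data, payload, len);
-    pkt->crc = crc32(0L, (const Bytef *)pkt->data, len);
-}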
-
-
-3. 1-byte-long header definition
-
-Signals such as the four below are 1 byte long to simplify the receive process (a 1-byte
-message cannot be split across reads).
-
-#define CLIENT_READY 0x1
-#define CLIENT_REQ 0x2
-#define CLIENT_DONE 0x4
-#define SERVER_SENT 0x8
-
-Note: Please see the collaboration diagram for their meanings.
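-
-Receiving such a signal is therefore a single 1-byte read, with no reassembly logic. A
-minimal sketch, assuming a connected TCP socket:
-
-#include <stdint.h>
-#include <unistd.h>
-
-/* Read one 1-byte control signal from a connected socket.
- * Returns the signal value, or -1 on error/EOF. */
-static int read_signal(int fd)
-{
-    uint8_t sig;
-    return (read(fd, &sig, 1) == 1) ? (int)sig : -1;
-}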
-
-4. Retransmission Request Header
-
-/* Retransmission Request Header (align to 4 bytes) */
-struct request_ctl {
-    uint32_t req_count; // How many seqs follow.
-    uint32_t seqs[0];   // packet seqs.
-};
-
-5. Buffer operations
-
-void buffer_init();        // Init the buffer_ctl structure and all (say 1024)
-                           // packet_ctl structures; allocate buffer memory.
-long buffer_fill(int fd);  // Fill a buffer from fd, such as stdin.
-long buffer_flush(int fd); // Flush a buffer to fd, say stdout.
-struct packet_ctl *packet_put(struct packet_ctl *new_pkt);
-                           // Put a packet into the buffer and return a free
-                           // memory slot for the next packet.
-struct packet_ctl *packet_get(uint32_t seq);
-                           // Get a packet's data from the buffer by packet seq.
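-
-As an illustrative sketch (not part of the original design text), the UDP send path could
-drive these operations as below, assuming the structs above and a multicast socket prepared
-elsewhere:
-
-#include <stdint.h>
-#include <sys/socket.h>
-
-/* Hypothetical sketch: multicast every packet of one filled buffer, using
- * the packet id range recorded in struct buffer_ctl. */
-static void send_one_buffer(int mcast_fd, struct sockaddr *addr, socklen_t alen,
-                            const struct buffer_ctl *bctl)
-{
-    for (uint32_t i = 0; i < bctl->pkt_count; i++) {
-        struct packet_ctl *pkt = packet_get(bctl->packet_id_base + i);
-        sendto(mcast_fd, pkt, sizeof(*pkt) + pkt->data_size, 0, addr, alen);
-    }
-}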
-
-
-How to sync between server threads
-----------------------------------
-
-If a child's aaa() operation needs to wait for its parent's init() to be done, then do it
-literally like this:
-
- UDP Server
- TCP Server1 = spawn( )----> TCP Server1
- init()
- TCP Server2 = spawn( )-----> TCP Server2
- V(sem)----------------------> P(sem) // No child any more
- V(sem)---------------------> P(sem)
- aaa() // No need to V(sem), for no child
- aaa()
-
-If the parent's send() operation needs to wait for the children's ready() to be done, then do
-it literally too, but in the reverse direction:
-
- UDP Server TCP Server1 TCP Server2
- // No child any more
- ready() ready()
- P(sem) <--------------------- V(sem)
- P(sem) <------------------ V(sem)
- send()
-
-Note that the aaa() and ready() operations above run in parallel. If that is not possible
-due to a race condition, the sequence above can be modified into the one below:
-
- UDP Server TCP Server1 TCP Server2
- // No child any more
- ready()
- P(sem) <--------------------- V(sem)
- ready()
- P(sem) <------------------- V(sem)
- send()
-
-
-In order to implement such a chained/zipper sync pattern, a pair of semaphores is needed
-between the parent and the child: one is used by the child to wait for the parent, the other
-is used by the parent to wait for the child. The semaphore pair can be allocated by the
-parent, which passes the pointer to the child through the spawn() operation, such as
-pthread_create().
-
-/* semaphore pair definition */
-struct semaphores {
- sem_t wait_parent;
- sem_t wait_child;
-};
-
-Then the semaphore pair can be recorded by each thread using the semlink struct below:
-
-struct semlink {
-    struct semaphores *this;   /* used by the parent to point to the struct
-                                  semaphores which it created when spawning
-                                  the child. */
-    struct semaphores *parent; /* used by the child to point to the struct
-                                  semaphores which was created by its parent. */
-};
-
-Chained/zipper sync API:
-
-void sl_wait_child(struct semlink *sl);
-void sl_release_child(struct semlink *sl);
-void sl_wait_parent(struct semlink *sl);
-void sl_release_parent(struct semlink *sl);
-
-API usage is as follows.
-
-Thread1(root parent) Thread2(child) Thread3(grandchild)
-sl_wait_parent(noop op)
-sl_release_child
- +---------->sl_wait_parent
- sl_release_child
- +-----------> sl_wait_parent
- sl_release_child(noop op)
- ...
- sl_wait_child(noop op)
- + sl_release_parent
- sl_wait_child <-------------
- + sl_release_parent
-sl_wait_child <------------
-sl_release_parent(noop op)
-
-API implementation (P() and V() stand for sem_wait() and sem_post(), respectively):
-
-void sl_wait_child(struct semlink *sl)
-{
- if (sl->this) {
- P(sl->this->wait_child);
- }
-}
-
-void sl_release_child(struct semlink *sl)
-{
- if (sl->this) {
- V(sl->this->wait_parent);
- }
-}
-
-void sl_wait_parent(struct semlink *sl)
-{
- if (sl->parent) {
- P(sl->parent->wait_parent);
- }
-}
-
-void sl_release_parent(struct semlink *sl)
-{
- if (sl->parent) {
- V(sl->parent->wait_child);
- }
-}
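-
-A minimal sketch of how a parent could allocate the pair and hand it to a child over
-pthread_create() (child_main and the omitted error handling are illustrative, not part of
-the design):
-
-#include <pthread.h>
-#include <semaphore.h>
-#include <stdlib.h>
-
-/* Sketch: parent allocates a semaphore pair, links it as "this", and passes
- * it to the spawned child, which stores it as its "parent" pair. */
-static void spawn_child(struct semlink *sl, void *(*child_main)(void *))
-{
-    struct semaphores *pair = malloc(sizeof(*pair));
-
-    sem_init(&pair->wait_parent, 0, 0); /* child blocks here until released */
-    sem_init(&pair->wait_child, 0, 0);  /* parent blocks here until released */
-    sl->this = pair;
-
-    pthread_t tid;
-    pthread_create(&tid, NULL, child_main, pair);
-}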
-
-Client flow chart
------------------
-See Collaboration Diagram
-
-UDP thread flow chart
----------------------
-See Collaboration Diagram
-
-TCP thread flow chart
----------------------
-
-
-S_INIT --- (UDP initialized) ---> S_ACCEPT --- (accept clients) --+
-                                                                  |
-  /---------------------------------------------------------------/
-  V
-S_PREP --- (UDP prepared a buffer)
-  ^          |
-  |          \--> S_SYNC --- (clients CLIENT_READY)
-  |                  |
-  |                  \--> S_SEND --- (clients CLIENT_DONE)
-  |                          |
-  |                          V
-  \---------------(bufferctl.pkt_count != 0)----------------------+
-                                                                   |
-                                                                   V
-                                exit() <--- (bufferctl.pkt_count == 0)
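-
-A hedged sketch of how this state machine could be coded (state names follow the diagram;
-the helper functions and the shared bufferctl declaration are illustrative assumptions, not
-actual APIs):
-
-#include <stdlib.h>
-
-enum server_state { S_INIT, S_ACCEPT, S_PREP, S_SYNC, S_SEND };
-
-extern struct buffer_ctl bufferctl; /* shared with the UDP thread (assumption) */
-
-void tcp_thread_loop(void)
-{
-    enum server_state state = S_INIT;
-
-    for (;;) {
-        switch (state) {
-        case S_INIT:   wait_udp_initialized();  state = S_ACCEPT; break;
-        case S_ACCEPT: accept_clients();        state = S_PREP;   break;
-        case S_PREP:   wait_udp_buffer_ready(); state = S_SYNC;   break;
-        case S_SYNC:   wait_clients_ready();    state = S_SEND;   break;
-        case S_SEND:
-            wait_clients_done();
-            if (bufferctl.pkt_count == 0)
-                exit(0);        /* EOF buffer: shut down */
-            state = S_PREP;
-            break;
-        }
-    }
-}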
-
-
-TCP using poll and message queue
---------------------------------
-
-The TCP threads use poll() to sync with the clients' events as well as output events from
-themselves, so that non-blocking socket operations can be used to reduce latency. POLLIN
-means there are messages from a client, and POLLOUT means we are ready to send messages or
-retransmission packets to a client.
-
-Poll main loop pseudo code:
-
-void check_clients(struct server_status_data *sdata)
-{
-    /* ds[0] is presumably the listen socket; poll only the connected clients */
-    poll_events = poll(&(sdata->ds[1]), sdata->ccount - 1, timeout);
-
-    /* check all connected clients */
-    for (sdata->cindex = 1; sdata->cindex < sdata->ccount; sdata->cindex++) {
-        ds = &(sdata->ds[sdata->cindex]);
-        if (!ds->revents) {
-            continue;
-        }
-
-        if (ds->revents & (POLLERR|POLLHUP|POLLNVAL)) {
-            handle_error_event(sdata);
-        } else if (ds->revents & (POLLIN|POLLPRI)) {
-            handle_pullin_event(sdata);  // may set POLLOUT in ds->events to
-                                         // trigger handle_pullout_event().
-        } else if (ds->revents & POLLOUT) {
-            handle_pullout_event(sdata);
-        }
-    }
-}
-
-For TCP, since a message from a client may arrive incomplete, and sending data may also be
-interrupted due to the non-blocking fashion, there should be one send message queue and one
-receive message queue on the server side for each client (clients do not use non-blocking
-operations).
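-
-Putting a client socket into non-blocking mode on the server side is standard fcntl() usage,
-for example:
-
-#include <fcntl.h>
-
-/* Make a connected client socket non-blocking (server side only). */
-static int set_nonblocking(int fd)
-{
-    int flags = fcntl(fd, F_GETFL, 0);
-
-    if (flags < 0)
-        return -1;
-    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
-}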
-
-TCP message queue definition:
-
-struct tcpq {
- struct qmsg *head, *tail;
- long count; /* message count in a queue */
- long size; /* Total data size of a queue */
-};
-
-TCP message queue item definition:
-
-struct qmsg {
- struct qmsg *next;
- void *data;
- long size;
-};
-
-TCP message queue API:
-
-// Allocate and init a queue.
-struct tcpq * tcpq_queue_init(void);
-
-// Free a queue.
-void tcpq_queue_free(struct tcpq *q);
-
-// Return the total data size of a queue.
-long tcpq_queue_dsize(struct tcpq *q);
-
-// Queue a new message at the tail.
-void tcpq_queue_tail(struct tcpq *q, void *data, long size);
-
-// Queue a message that cannot be sent currently back to the queue head.
-void tcpq_queue_head(struct tcpq *q, void *data, long size);
-
-// Get one piece from the queue head.
-void * tcpq_dequeue_head(struct tcpq *q, long *size);
-
-// Serialize all pieces of a queue and move them out of the queue, to ease
-// further operations on the result.
-void * tcpq_dqueue_flat(struct tcpq *q, long *size);
-
-// Serialize all pieces of a queue without moving them out of the queue, to
-// ease further operations on the result.
-void * tcpq_queue_flat_peek(struct tcpq *q, long *size);
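-
-As a hedged sketch of the queue mechanics (assuming the struct definitions above; not the
-actual implementation), the tail-enqueue and head-dequeue operations could look like:
-
-#include <stdlib.h>
-
-void tcpq_queue_tail(struct tcpq *q, void *data, long size)
-{
-    struct qmsg *m = malloc(sizeof(*m));
-
-    m->next = NULL;
-    m->data = data;
-    m->size = size;
-    if (q->tail)
-        q->tail->next = m;   /* append after current tail */
-    else
-        q->head = m;         /* queue was empty */
-    q->tail = m;
-    q->count++;
-    q->size += size;
-}
-
-void * tcpq_dequeue_head(struct tcpq *q, long *size)
-{
-    struct qmsg *m = q->head;
-    void *data;
-
-    if (!m)
-        return NULL;
-    q->head = m->next;
-    if (!q->head)
-        q->tail = NULL;      /* queue became empty */
-    q->count--;
-    q->size -= m->size;
-    *size = m->size;
-    data = m->data;
-    free(m);
-    return data;
-}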
diff --git a/docs/developer/spec/multicast.rst b/docs/developer/spec/multicast.rst
deleted file mode 100644
index ba314d3a..00000000
--- a/docs/developer/spec/multicast.rst
+++ /dev/null
@@ -1,190 +0,0 @@
-Requirement
-===========
-1. When deploying a large OPNFV/OpenStack cluster, we would like to take advantage of UDP
-multicast to prevent the network bottleneck that arises when distributing the Kolla container
-image from one installer server to all target hosts over unicast.
-
-2. When it comes to auto scaling (extension) of compute nodes, using unicast is acceptable,
-since the number of nodes in this case is usually small.
-
-The basic steps to introduce multicast into deployment are:
-a. Still set up the monopolistic docker registry server on the Daisy server as a failsafe.
-b. The Daisy server, as the multicast server, prepares the image file to be transmitted and
-counts how many target hosts (the multicast clients) should receive the image file
-simultaneously.
-c. The multicast clients tell the multicast server that they are ready to receive the image.
-d. The multicast server transmits the image over the UDP multicast channel.
-e. The multicast clients report success after receiving the whole image.
-f. Set up a docker registry server on each target host based upon the received docker image.
-g. Set up Kolla ansible to use 127.0.0.1 as the registry server IP, so that the real docker
-container retrieval network activity only takes place inside each target host.
-
-
-Design
-======
-
-Methods to achieve
-------------------
-
-TIPC
-++++
-
-TIPC, or a wrapper of it such as ZeroMQ, is good at multicast, but it is not suitable for an
-installer:
-
-1. The default TIPC kernel module shipped with CentOS7 (kernel version 3.10) is NOT stable,
-especially in L3 multicast (we could use L2 multicast instead, but that would limit the
-network to L2). If errors happen, it is hard for us to recover a node from a kernel panic.
-
-2. TIPC's design assumes a stable node cluster environment, especially a lossless Ethernet.
-The real environment is generally not like that. When multicast is broken, the installer
-should switch to unicast, but TIPC currently does not have such a capability.
-
-Top level design
-----------------
-1. There are two kinds of threads on the server side: one UDP multicast thread and one or
-more TCP sync/retransmit threads. There will be more than one TCP thread, since one TCP
-thread can only serve a limited number of clients (say 64~128) in order to bound the CPU
-load and the unicast retransmission network usage.
-
-2. There is only one thread on the client side.
-
-3. All the packets that a client lost during UDP multicast will be requested by the client
-from a TCP thread and resent over TCP unicast. If unicast still cannot deliver the packets
-successfully, the client will fall back to using the monopolistic docker registry server on
-the Daisy server as a failsafe option.
-
-4. Each packet needs a checksum.
-
-
-UDP Server Design (runs on Daisy Server)
-----------------------------------------
-
-1. The multicast group IP and port should be configurable, as well as the interface that
-will be used as the egress for the multicast packets. The user passes the interface's IP as
-the handle to identify the egress (see the socket sketch after this list).
-
-2. The image data to be sent is passed to the server through stdin.
-
-3. Considering the size of the image is large (x GB), the server cannot pre-allocate a
-buffer to hold the whole image at once. Besides, since the data comes from stdin, the actual
-length is unpredictable. So the server should split the data into small buffers and send
-them to the clients one by one. Furthermore, each buffer shall be divided into packets whose
-size is at most the MTU, including the UDP/IP header. The buffer size can then be, for
-example, 1024 * MTU including the UDP/IP headers (see the sizing sketch after this list).
-
-4. After sending one buffer to the clients, the server should stop and get feedback from the
-clients to see if all of them have received all packets in that buffer. If any client lost
-any packets, it should request the server to resend them in a more reliable way (TCP).
-
-5. When EOF is reached on stdin, the server should send a buffer whose size is 0 as an EOF
-signal to the clients, to let them know that sending has ended.
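-
-A minimal sketch of the egress selection from item 1 and the packet sizing from item 3,
-assuming IPv4 and plain BSD sockets (the constants and names are illustrative, not fixed by
-the design):
-
-#include <arpa/inet.h>
-#include <netinet/in.h>
-#include <sys/socket.h>
-
-#define MTU         1500
-#define IP_UDP_HDRS (20 + 8)                 /* IPv4 + UDP headers              */
-#define PKT_PAYLOAD (MTU - IP_UDP_HDRS - 12) /* minus 12-byte struct packet_ctl */
-#define BUF_PACKETS 1024                     /* packets per buffer              */
-
-/* Create a multicast send socket whose egress is selected by local_ip. */
-static int mcast_send_socket(const char *local_ip)
-{
-    int fd = socket(AF_INET, SOCK_DGRAM, 0);
-    struct in_addr egress;
-
-    inet_pton(AF_INET, local_ip, &egress);
-    /* Choose the outgoing interface for multicast by its IP address. */
-    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &egress, sizeof(egress));
-    return fd;
-}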
-
-
-TCP Server Design (runs on Daisy Server)
-----------------------------------------
-
-1. All TCP server threads and the single UDP thread share one process. The UDP thread is the
-root parent, the first TCP thread is its child, the second TCP thread is the grandchild, and
-so on. Thus, each TCP thread has exactly one parent and at most one child.
-
-2. The TCP threads accept connect requests from clients. The total number of clients is
-predefined by a server cmdline parameter. Each TCP thread connects with at most, say, 64
-clients; if there are more clients to be connected, a child TCP thread is spawned by the
-parent.
-
-3. Before the UDP thread sends any buffer to the clients, all TCP threads should send the
-UDP multicast IP/port information to their clients.
-
-4. During each buffer sending cycle, the TCP threads send a special protocol message to tell
-the clients the size/id of the buffer and the id of each packet in it. After getting
-acknowledgements from all clients, the TCP threads signal the UDP thread to start
-multicasting the buffer over UDP. After multicasting finishes, the TCP threads notify the
-clients that the multicast is done, and wait for acknowledgements from the clients again. If
-a client requests retransmission, it is the responsibility of its TCP thread to resend the
-packets over unicast. If no retransmission is needed, the clients signal the TCP threads
-that they are ready for the next buffer.
-
-5. Repeat step 4 if the buffer size was not 0 in the last round; otherwise, the TCP server
-shuts down the connections and exits.
-
-
-Server cmdline usage example
-----------------------------
-
-./server <local_ip> <number_of_clients> [port] < kolla_image.tgz
-
-<local_ip> is used here to specify the multicast egress interface. Which interface will be
-used by TCP is left to the route table to decide.
-<number_of_clients> indicates the number of clients, and thus the number of target hosts
-that need to receive the image.
-[port] is the port that will be used by both UDP and TCP. A default value is used if the
-user does not provide it.
-
-
-Client Design (Target Host side)
---------------------------------
-
-1. Each target host has only one client process.
-
-2. The client connects to the TCP server according to the cmdline parameters right after
-startup.
-
-3. After connecting to the TCP server, the client first reads the multicast group
-information from the TCP server, which is then used to create the multicast receive socket
-(see the sketch after this list).
-
-4. During each buffer receiving cycle, the client first reads the buffer info from the TCP
-server, prepares the receive buffer, and acknowledges to the TCP server that it is ready to
-receive. Then the client receives the buffer from the multicast socket until the TCP server
-notifies the end of the multicast. By comparing the buffer info with the received packets,
-the client knows whether to send a retransmission request and whether to wait for
-retransmitted packets. After all packets are received over UDP/TCP, the client finally
-flushes the buffer to stdout and tells the TCP server it is ready for the next buffer.
-
-5. Repeat step 4 if the buffer size was not 0 in the last round; otherwise, the client shuts
-down the connection and exits.
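-
-A minimal sketch of creating the multicast receive socket from step 3, assuming IPv4 and
-that the group IP and port were read from the TCP server (names and omitted error handling
-are illustrative):
-
-#include <arpa/inet.h>
-#include <netinet/in.h>
-#include <string.h>
-#include <sys/socket.h>
-
-/* Bind to the multicast port and join the group, receiving on the
- * interface identified by local_ip. */
-static int mcast_recv_socket(const char *group_ip, const char *local_ip,
-                             unsigned short port)
-{
-    int fd = socket(AF_INET, SOCK_DGRAM, 0);
-    struct sockaddr_in sa;
-    struct ip_mreq mreq;
-
-    memset(&sa, 0, sizeof(sa));
-    sa.sin_family = AF_INET;
-    sa.sin_addr.s_addr = htonl(INADDR_ANY);
-    sa.sin_port = htons(port);
-    bind(fd, (struct sockaddr *)&sa, sizeof(sa));
-
-    inet_pton(AF_INET, group_ip, &mreq.imr_multiaddr); /* group to join     */
-    inet_pton(AF_INET, local_ip, &mreq.imr_interface); /* ingress interface */
-    setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
-    return fd;
-}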
-
-Client cmdline usage example
-----------------------------
-
-./client <local_ip> <server_ip> [port] > kolla_image.tgz
-
-<local_ip> is used here to specify the multicast ingress interface. Which interface will be
-used by TCP is left to the route table to decide.
-<server_ip> indicates the TCP server IP to connect to.
-[port] is the port that will be used both to connect to the TCP server and to receive
-multicast data.
-
-
-Collaboration diagram among the UDP server, the TCP server (only one TCP thread illustrated),
-and the clients:
-
-
-UDP Server TCP Server Client
- | | |
-init mcast group
-init mcast send socket
- ---------------------------------->
- accept clients
- <------------------------connect-----------------
- --------------------send mcast group info------->
- <----------------------------------
- state = PREP
-do {
-read data from stdin
-prepare one buffer
- ----------------------------------->
- state = SYNC
- -------------------send buffer info-------------->
- <----------------------send CLIENT_READY-----------
- <----------------------------------
- state = SEND
-
- ================================================send buffer over UDP multicast======>
- ----------------------------------->
- -----------------------send SERVER_SENT----------->
- [<-------------------send CLIENT_REQ--------------]
- [--------------send buffer over TCP unicast------>]
- flush buffer to stdout
- <-------------------send CLIENT_DONE---------------
- <----------------------------------
- state = PREP
-while (buffer.len != 0)