author | Zhijiang Hu <hu.zhijiang@zte.com.cn> | 2017-04-07 01:36:45 +0000 |
---|---|---|
committer | Gerrit Code Review <gerrit@opnfv.org> | 2017-04-07 01:36:45 +0000 |
commit | d24bd81e5689c5d8531581fb8e3604d6851cc094 (patch) | |
tree | 5701dd2eed93a794bd2457a144a367981930880d /docs/developer/spec/multicast.rst | |
parent | 8f52576ce2063e1e742e1e6fafd66f6979bb2d10 (diff) | |
parent | a051fc3bf0ea4cd589b663e974517633563a4ed8 (diff) |
Merge "Add multicast spec"
Diffstat (limited to 'docs/developer/spec/multicast.rst')
-rw-r--r-- | docs/developer/spec/multicast.rst | 190 |
1 files changed, 190 insertions, 0 deletions
diff --git a/docs/developer/spec/multicast.rst b/docs/developer/spec/multicast.rst
new file mode 100644
index 00000000..ba314d3a
--- /dev/null
+++ b/docs/developer/spec/multicast.rst
@@ -0,0 +1,190 @@
+Requirement
+===========
+1. When deploying a large OPNFV/OpenStack cluster, we would like to take advantage of UDP
+multicast to avoid the network bottleneck that arises when Kolla container images are
+distributed from one Installer Server to all target hosts by unicast.
+
+2. When it comes to auto scaling (extension) of compute nodes, using unicast is acceptable,
+since the number of nodes in this case is usually small.
+
+The basic steps to introduce multicast to deployment are:
+a. Still set up the monopolistic docker registry server on the Daisy server as a failsafe.
+b. The Daisy server, as the multicast server, prepares the image file to be transmitted and
+counts how many target hosts (as the multicast clients) should receive the image file
+simultaneously.
+c. Multicast clients tell the multicast server that they are ready to receive the image.
+d. The multicast server transmits the image over the UDP multicast channel.
+e. Multicast clients report success after receiving the whole image.
+f. Set up a docker registry server on each target host based upon the received docker image.
+g. Set up Kolla ansible to use 127.0.0.1 as the registry server IP so that the actual docker
+container retrieval traffic stays inside the target hosts.
+
+
+Design
+======
+
+Methods to achieve
+------------------
+
+TIPC
+++++
+
+TIPC, or a wrapper around it such as ZeroMQ, is good at multicast, but it is not suitable for
+an installer:
+1. The default TIPC kernel module shipped with CentOS 7 (kernel version 3.10) is NOT stable,
+especially in L3 multicast (we could use L2 multicast, but then the network would be limited
+to L2). If errors happen, it is hard for us to recover a node from a kernel panic.
+
+2. TIPC's design is based on a stable node cluster environment, esp. in Lossless Ethernet.
+But the real environment is generally not like that. When multicast is broken, the installer
+should switch to unicast, but TIPC currently does not have such a capability.
+
+Top level design
+----------------
+1. There are two kinds of threads on the server side: one is the UDP multicast thread, the
+other is the TCP sync/retransmit thread. There will be more than one TCP thread, since one TCP
+thread can only serve a limited number of clients (say 64~128) in order to limit the CPU load
+and the unicast retransmit network usage.
+
+2. There is only one thread on the client side.
+
+3. All the packets that a client lost during UDP multicast will be requested by the client from
+the TCP thread and resent using TCP unicast. If unicast still cannot deliver the packets
+successfully, the client will fall back to using the monopolistic docker registry server on the
+Daisy server as a failsafe option.
+
+4. Each packet needs a checksum.
+
+
+UDP Server Design (runs on Daisy Server)
+----------------------------------------
+
+1. The multicast group IP and port should be configurable, as well as the interface that will
+be used as the egress of the multicast packets. The user will pass the interface's IP as the
+handle to find the egress.
+
+2. Image data to be sent is passed to the server through stdin.
+
+3. Considering that the image is large (xGB), the server cannot pre-allocate a whole buffer to
+hold the entire image at once. Besides, since the data comes from stdin, the actual length is
+unpredictable. So the server should split the data into small buffers and send them to the
+clients one by one. Furthermore, each buffer shall be divided into packets whose size is the
+MTU including the UDP/IP header. The buffer size can then be, for example, 1024 * MTU including
+the UDP/IP header.
+
+4. After sending one buffer to the clients, the server should stop and get feedback from them
+to see whether all clients have received all packets in that buffer.
+If any client lost packets, it should ask the server to resend them in a more reliable way (TCP).
+
+5. When it gets EOF from stdin, the server should send a buffer whose size is 0 as an EOF signal
+to the clients to let them know that sending has ended.
+
+
+TCP Server Design (runs on Daisy Server)
+----------------------------------------
+
+1. All TCP server threads and the single UDP thread share one process. The UDP thread is the
+parent thread, the first TCP thread is its child, the second TCP thread is the
+grandchild, and so on. Thus, each TCP thread has exactly one parent and at most one
+child.
+
+2. Each TCP thread accepts connection requests from clients. The number of clients is predefined
+by a server cmdline parameter. Each TCP thread connects with at most, say, 64 clients; if there
+are more clients to be connected, a child TCP thread is spawned by the parent.
+
+3. Before the UDP thread sends any buffer to the clients, all TCP threads should send the UDP
+multicast IP/port information to their clients.
+
+4. During each buffer sending cycle, TCP threads send a special protocol message to tell
+clients the size/id of the buffer and the id of each packet in it. After getting
+acknowledgements from all clients, the TCP threads signal the UDP thread to start
+multicasting the buffer over UDP. After multicasting has finished, the TCP threads notify clients
+that multicast is done and wait for acknowledgements from clients again. If clients request
+retransmission, it is the responsibility of the TCP threads to resend packets over unicast.
+If no retransmission is needed, clients should signal the TCP threads that they are ready for
+the next buffer to come.
+
+5. Repeat step 4 if the buffer size was not 0 in the last round; otherwise, the TCP server shuts
+down the connection and exits.
+
+
+Server cmdline usage example
+----------------------------
+
+./server <local_ip> <number_of_clients> [port] < kolla_image.tgz
+
+<local_ip> is used here to specify the multicast egress interface. Which interface will be
+used by TCP is left to the route table to decide.
+<number_of_clients> indicates the number of clients, that is, the number of target hosts which
+need to receive the image.
+[port] is the port that will be used by both UDP and TCP. A default value is used if the user
+does not provide it.
+
+
+Client Design (Target Host side)
+--------------------------------
+
+1. Each target host has only one client process.
+
+2. The client connects to the TCP server according to the cmdline parameters right after startup.
+
+3. After connecting to the TCP server, the client first reads the multicast group information
+from it, which is then used to create the multicast receive socket.
+
+4. During each buffer receiving cycle, the client first reads the buffer info from the TCP
+server, prepares the receive buffer, and tells the TCP server that it is ready to receive. Then
+the client receives the buffer from the multicast socket until the TCP server signals the end
+of multicast. By comparing the buffer info with the received packets, the client knows whether
+to send a retransmission request and whether to wait for retransmitted packets.
+After all packets are received from UDP/TCP, the client finally flushes the buffer to stdout
+and tells the TCP server that it is ready to receive the next buffer.
+
+5. Repeat step 4 if the buffer size was not 0 in the last round; otherwise, the client shuts
+down the connection and exits.
+
+Client cmdline usage example
+----------------------------
+
+./client <local_ip> <server_ip> [port] > kolla_image.tgz
+
+<local_ip> is used here to specify the multicast ingress interface. Which interface
+will be used by TCP is left to the route table to decide.
+<server_ip> indicates the TCP server IP to connect to.
+[port] is the port that will be used both to connect to the TCP server and to receive multicast
+data.
+
+
+Collaboration diagram among the UDP Server, the TCP Server (only one TCP thread is illustrated)
+and the Clients:
+
+
+UDP Server                        TCP Server                                 Client
+    |                                  |                                       |
+init mcast group
+init mcast send socket
+    ---------------------------------->
+                                  accept clients
+                                       <------------------------connect-----------------
+                                       --------------------send mcast group info------->
+    <----------------------------------
+                                  state = PREP
+do {
+read data from stdin
+prepare one buffer
+    ----------------------------------->
+                                  state = SYNC
+                                       -------------------send buffer info-------------->
+                                       <----------------------send CLIENT_READY-----------
+    <----------------------------------
+                                  state = SEND
+
+    ================================================send buffer over UDP multicast======>
+    ----------------------------------->
+                                       -----------------------send SERVER_SENT----------->
+                                       [<-------------------send CLIENT_REQUEST----------]
+                                       [--------------send buffer over TCP unicast------>]
+                                                                               flush buffer to stdout
+                                       <-------------------send CLIENT_DONE---------------
+    <----------------------------------
+                                  state = PREP
+} while (buffer.len != 0)
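
The per-buffer cycle described in the spec above (announce buffer info, multicast the buffer, request lost packets, resend them over unicast, flush) can be sketched roughly as follows. This is a minimal in-memory Python simulation and not the Daisy implementation: no sockets are opened, and the function names, the tiny packet sizes, the use of CRC32 as the per-packet checksum, and the idea of carrying the checksums inside the buffer info are all assumptions made for illustration.

```python
import io
import zlib
from typing import BinaryIO, Dict, FrozenSet, Iterator, List


def split_into_buffers(stream: BinaryIO, packet_size: int,
                       packets_per_buffer: int) -> Iterator[List[bytes]]:
    """Yield buffers as lists of packets; a final empty list is the EOF signal."""
    while True:
        chunk = stream.read(packet_size * packets_per_buffer)
        packets = [chunk[i:i + packet_size]
                   for i in range(0, len(chunk), packet_size)]
        yield packets
        if not packets:  # size-0 buffer: tells the receiver that sending is over
            return


def missing_packet_ids(expected_crcs: List[int],
                       received: Dict[int, bytes]) -> List[int]:
    """Return ids of packets that never arrived or whose checksum does not match."""
    return [pid for pid, crc in enumerate(expected_crcs)
            if pid not in received or zlib.crc32(received[pid]) != crc]


def simulate_transfer(data: bytes,
                      lost_ids: FrozenSet[int] = frozenset({1}),
                      packet_size: int = 4,
                      packets_per_buffer: int = 3) -> bytes:
    """Simulate one client receiving `data`, losing packets `lost_ids` in every buffer."""
    out = bytearray()
    source = io.BytesIO(data)  # stands in for the server's stdin
    for packets in split_into_buffers(source, packet_size, packets_per_buffer):
        if not packets:
            break
        # "Buffer info" sent over TCP before multicast (assumed to carry checksums).
        crcs = [zlib.crc32(p) for p in packets]
        # UDP multicast phase: some packets are dropped on the way.
        received = {pid: p for pid, p in enumerate(packets)
                    if pid not in lost_ids}
        # CLIENT_REQUEST phase: the client asks for what it lacks,
        # and the server resends those packets over TCP unicast.
        for pid in missing_packet_ids(crcs, received):
            received[pid] = packets[pid]
        # The buffer is complete: flush it (to stdout in the real client).
        out += b"".join(received[pid] for pid in range(len(packets)))
    return bytes(out)
```

With these assumptions, `simulate_transfer(b"hello multicast world")` returns the input unchanged even though packet 1 of every buffer is dropped during the simulated multicast phase, because each dropped packet is detected against the announced buffer info and recovered over the unicast path.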