In this project, we will build a system that selects the best replica of a given piece of data to maximize download performance.
In many distributed systems, such as cloud storage systems, the same data may be stored on multiple servers in multiple geographical locations. Such copies of the data are called replicas. Replicas provide redundancy and reliability to a storage system: if one copy of the data is lost, another copy remains. We will not explore this aspect of replication in this project. Instead, we will explore the use of replicas to improve download performance.
In this project, we will create multiple replicas of a given piece of data. Our system will perform measurements to determine the best replica from which our client should download the data. We are primarily interested in two metrics: throughput and latency.
The system supports two modes of operation. In the first, a client connects to a "registry" server to retrieve a list of replicas. The client then performs a series of measurements to determine the best replica. Finally, the client connects to the best replica and downloads the file. We call this mode of operation "selection=client".
Alternatively, a client connects to the registry server with a request for the best replica. The server performs a series of measurements and estimates to determine the best replica for the client, then sends that replica's address to the client. The client downloads the file from that replica. We call this mode of operation "selection=network".
The client is a socket program that interacts with the registry in either mode to determine the best replica. It then connects to the selected replica and downloads the file. In the "selection=client" mode, the client itself must perform the latency and throughput measurements. Here is an example execution of the client:
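Whichever side makes the decision, the last step reduces to choosing one replica from a set of measurements. Below is a minimal Python sketch, assuming each probe result is held in a hypothetical `Measurement` record with latency in milliseconds and throughput in MB/s; optimizing latency takes the smallest latency, and optimizing throughput takes the largest throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One probe result for a replica (hypothetical record, not part of the spec)."""
    host: str
    port: int
    latency_ms: float
    throughput_mb_per_s: float


def best_replica(measurements, optimize):
    """Return the measurement of the best replica for the chosen metric.

    optimize is "latency" (lower is better) or "throughput" (higher is better).
    """
    if not measurements:
        raise ValueError("no replicas to choose from")
    if optimize == "latency":
        return min(measurements, key=lambda m: m.latency_ms)
    if optimize == "throughput":
        return max(measurements, key=lambda m: m.throughput_mb_per_s)
    raise ValueError(f"unknown metric: {optimize!r}")


# Values taken from the sample client run.
samples = [
    Measurement("1.2.3.4", 10000, 12, 0.7),
    Measurement("1.2.3.4", 10001, 14, 0.8),
    Measurement("1.2.3.4", 10002, 16, 1.0),
]
print(best_replica(samples, "latency").port)     # 10000
print(best_replica(samples, "throughput").port)  # 10002
```

In network mode the registry would apply the same comparison, but to values it has measured or estimated on the client's behalf.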
    ./client -ra 1.2.3.4 -rp 9999 -selection=client -optimize=latency -file=out.png
    registry address: 1.2.3.4
    registry port: 9999
    selection mode: client
    metric to optimize: latency
    filename: out.png
    connecting to registry at 1.2.3.4 port 9999
    connected
    requesting a list of replicas
    replicas: (1.2.3.4,10000), (1.2.3.4,10001), (1.2.3.4,10002)
    performing measurements
    replica 1.2.3.4 (10000) latency 12ms throughput 0.7MB/s
    replica 1.2.3.4 (10001) latency 14ms throughput 0.8MB/s
    replica 1.2.3.4 (10002) latency 16ms throughput 1.0MB/s
    best replica: 1.2.3.4 (10000)
    connecting to replica at 1.2.3.4 (10000)
    downloading out.png
    saved out.png
The client supports the -ra and -rp flags, which specify the address and port of the registry server to which the client should connect. The -selection flag can be either client or network, choosing one of the two modes of replica selection. The -optimize flag can be either latency or throughput, the metric to optimize when selecting a replica. The -file flag gives the name of the file requested from the replica and also the name under which the downloaded data is saved. The client terminates after downloading and saving the file.
The registry listens for incoming connections from replicas and clients. It keeps track of all active replicas using timeouts. In the network selection mode, the registry performs measurements and estimation to determine the best replica for the client. Here is an example execution of the registry:
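The flags in the sample run mix a separate value (`-ra 1.2.3.4`) with an attached one (`-selection=client`). Python's standard `argparse` accepts both forms for single-dash options, so a minimal sketch of the client's flag parsing might look like this; the function name `parse_client_args` is illustrative, not part of the assignment.

```python
import argparse


def parse_client_args(argv):
    """Parse the client's flags as they appear in the sample run."""
    # allow_abbrev=False stops argparse from accepting prefixes such as "-sel".
    parser = argparse.ArgumentParser(prog="client", allow_abbrev=False)
    parser.add_argument("-ra", required=True, help="registry IP address")
    parser.add_argument("-rp", required=True, type=int, help="registry port")
    parser.add_argument("-selection", required=True, choices=["client", "network"])
    parser.add_argument("-optimize", required=True, choices=["latency", "throughput"])
    parser.add_argument("-file", required=True, help="file to request and save as")
    return parser.parse_args(argv)


# Demo with the arguments from the sample run (a real client would pass sys.argv[1:]).
args = parse_client_args(["-ra", "1.2.3.4", "-rp", "9999", "-selection=client",
                          "-optimize=latency", "-file=out.png"])
print("registry address:", args.ra)
print("registry port:", args.rp)
print("selection mode:", args.selection)
print("metric to optimize:", args.optimize)
print("filename:", args.file)
```

The registry (`-p`, `-selection`, `-metric`, `-timeout`) and replica (`-ra`, `-rp`, `-keepalive`) flags can be parsed the same way.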
    ./registry -p 9999 -selection=client -timeout=30
    listening on: 9999
    selection mode: client
    registration from replica (1.2.3.4,10000)
    registration from replica (1.2.3.4,10001)
    registration from replica (1.2.3.4,10002)
    (... the registration message repeated several times ...)
    received query from client
    sending list of replicas to the client: (1.2.3.4,10000), (1.2.3.4,10001), (1.2.3.4,10002)
    (... the registration message repeated several times ...)
Here is an example execution in the network selection mode:
    ./registry -p 9999 -selection=network -metric=throughput -timeout=30
    listening on: 9999
    selection mode: network
    metric to optimize: throughput
    registration from replica (1.2.3.4,10000)
    registration from replica (1.2.3.4,10001)
    registration from replica (1.2.3.4,10002)
    (... the registration message repeated several times ...)
    received query from client
    (... display messages depending on the measurements you perform ...)
    send the name of the best replica to the client (1.2.3.4, 10002)
    (... the registration message repeated several times ...)
    timeout for replica (1.2.3.4,10001)
If we terminate one of the replicas, the registry must eventually detect that termination and remove it from the list of replicas to be considered for selection.
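One way to implement this detection is a table of last-seen times: every registration or keepalive refreshes a replica's timestamp, and any replica silent for longer than the `-timeout` value is dropped. The sketch below assumes this design; the class `ReplicaTable` is hypothetical, and its clock is injectable so the logic can be checked without actually waiting 30 seconds.

```python
import time


class ReplicaTable:
    """Registry-side table of live replicas, expired by timeout (a sketch)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable for testing
        self.last_seen = {}     # (host, port) -> time of last message

    def refresh(self, host, port):
        """Record a registration or keepalive from a replica."""
        self.last_seen[(host, port)] = self.clock()

    def expire(self):
        """Remove and return replicas silent for longer than the timeout."""
        now = self.clock()
        dead = [r for r, t in self.last_seen.items() if now - t > self.timeout]
        for host, port in dead:
            del self.last_seen[(host, port)]
            print(f"timeout for replica ({host},{port})")
        return dead

    def active(self):
        """Replicas eligible for selection, in a stable order."""
        self.expire()
        return sorted(self.last_seen)


# Demo with a fake clock: 10001 stops sending keepalives after t=0.
fake_now = [0.0]
table = ReplicaTable(timeout=30, clock=lambda: fake_now[0])
table.refresh("1.2.3.4", 10000)
table.refresh("1.2.3.4", 10001)
fake_now[0] = 20.0
table.refresh("1.2.3.4", 10000)  # keepalive from 10000 only
fake_now[0] = 40.0
print(table.active())  # [('1.2.3.4', 10000)]
```

Calling `expire()` lazily from `active()` means a dead replica can never be handed to a client; a real registry might also call it periodically so that the timeout message appears promptly.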
The replica manager (we will call this program "replica") stores files and sends them to clients upon request. A client must discover a replica before the replica can transfer a file to it, so the replica registers with the registry. Here is an example execution of the replica:
    ./replica -ra 1.2.3.4 -rp 9999 -keepalive 20
    connecting to registry at 1.2.3.4 (9999)
    sending keepalive
    sending keepalive
    (... repeats ...)
    request for file out.png from client
    sending out.png
    sent out.png
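The repeated "sending keepalive" lines suggest a periodic loop. A minimal sketch, assuming the `-keepalive` value is an interval in seconds and that the actual transmission is done by a caller-supplied `send` function (hypothetical; its wire format depends on your protocol):

```python
import threading


def keepalive_loop(send, interval, stop, max_count=None):
    """Call send() every `interval` seconds until `stop` (a threading.Event) is set.

    max_count optionally caps the number of keepalives, which is handy for testing.
    Returns the number of keepalives sent.
    """
    sent = 0
    while not stop.is_set() and (max_count is None or sent < max_count):
        print("sending keepalive")
        send()
        sent += 1
        # Event.wait returns as soon as stop is set, so shutdown is prompt.
        stop.wait(interval)
    return sent


# Demo: collect three keepalives in a list instead of sending them.
sent_messages = []
count = keepalive_loop(lambda: sent_messages.append(b"KEEPALIVE 10000\n"),
                       interval=0, stop=threading.Event(), max_count=3)
```

In a real replica this loop would run in a background thread, for example `threading.Thread(target=keepalive_loop, args=(send, 20, stop), daemon=True)`, so the main thread stays free to serve file requests. The registry's timeout should exceed the keepalive interval: with `-keepalive 20` and `-timeout=30`, a single keepalive can arrive up to 10 seconds late without the replica being dropped.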
The replica looks in its current directory for the requested file. It is a good idea to launch the replica from a separate directory so that the file downloaded by the client does not overwrite the original. You can use the diff command to check whether the downloaded file is identical to the original.
The best way to test these programs is to run each one in a separate terminal and check the output of each to make sure they are working correctly.
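Because the file name comes from the network, the replica should make sure it cannot be used to read files outside its directory. A minimal sketch, assuming requests carry a bare file name; `is_safe_name` and `resolve_requested_file` are hypothetical helpers, and the check simply rejects any name with a directory component.

```python
import os
import tempfile


def is_safe_name(name):
    """True if name is a plain file name with no directory component."""
    return (bool(name) and name not in (".", "..")
            and "/" not in name and "\\" not in name and "\0" not in name)


def resolve_requested_file(name, base_dir="."):
    """Map a client-supplied name to a path inside base_dir, or raise."""
    if not is_safe_name(name):
        raise ValueError(f"refusing unsafe file name: {name!r}")
    path = os.path.join(base_dir, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


# Demo in a scratch directory standing in for the replica's folder.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "out.png"), "wb") as f:
    f.write(b"not really a PNG")
served = resolve_requested_file("out.png", demo_dir)
```

To compare the downloaded file with the original from Python instead of diff, `filecmp.cmp(a, b, shallow=False)` compares file contents.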
We need to measure latency and throughput between different components. To enable these measurements, the client, registry, and replica all support a "download" command. When one of these components receives a download command, it sends the specified number of bytes of synthetic data to the requesting entity. An entity may receive a measurement request at any time. The console output at the requesting entity should look like this:
    time now is: xxxxx
    sending download request for 10 bytes
    received 10 bytes
    time now is: yyyyy
    latency: aaaaa
    throughput: bbbbb
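The requesting side can be sketched as follows. This assumes a hypothetical text request `DOWNLOAD <n>` ending in a newline, answered by exactly n raw bytes, and it makes one possible choice of definitions: latency is the time until the first byte arrives, and throughput is total bytes divided by total elapsed time. Intervals use `time.monotonic()`, which is unaffected by clock adjustments, while the "time now is" lines print wall-clock time.

```python
import socket
import threading
import time


def measure_download(sock, nbytes):
    """Request nbytes of synthetic data on a connected socket and time it.

    Returns (latency in seconds, throughput in bytes per second).
    """
    if nbytes < 1:
        raise ValueError("need at least one byte to measure")
    print(f"time now is: {time.time():.6f}")
    print(f"sending download request for {nbytes} bytes")
    start = time.monotonic()
    sock.sendall(f"DOWNLOAD {nbytes}\n".encode())
    received, first_byte = 0, None
    while received < nbytes:
        chunk = sock.recv(min(65536, nbytes - received))
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        if first_byte is None:
            first_byte = time.monotonic()
        received += len(chunk)
    end = time.monotonic()
    print(f"received {received} bytes")
    print(f"time now is: {time.time():.6f}")
    latency = first_byte - start
    throughput = received / max(end - start, 1e-9)
    print(f"latency: {latency * 1000:.3f} ms")
    print(f"throughput: {throughput:.0f} B/s")
    return latency, throughput


# Demo over a local socket pair, with a thread playing the responding entity.
a, b = socket.socketpair()


def fake_responder():
    request = b.recv(64)  # e.g. b"DOWNLOAD 10\n"
    b.sendall(b"x" * int(request.split()[1]))


t = threading.Thread(target=fake_responder)
t.start()
latency, throughput = measure_download(a, 10)
t.join()
a.close()
b.close()
```

A very small download such as 10 bytes mostly reflects round-trip delay, so a separate, much larger download gives a more meaningful throughput figure.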
The entity that receives the download request may display the following on its stdout:
    received download request for 10 bytes
    sending 10 bytes
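On the responding side, the content of the synthetic data does not matter, only its size. A minimal sketch, assuming the same hypothetical `DOWNLOAD <n>` request format; it sends from a fixed zero-filled buffer in chunks so a large request does not allocate n bytes at once.

```python
import socket


def parse_download_request(line):
    """Parse a hypothetical "DOWNLOAD <n>" request and return n."""
    parts = line.strip().split()
    if len(parts) != 2 or parts[0] != "DOWNLOAD":
        raise ValueError(f"not a download request: {line!r}")
    n = int(parts[1])
    if n < 0:
        raise ValueError("byte count must be non-negative")
    return n


def serve_download(conn, line, chunk_size=65536):
    """Answer a download request by sending n bytes of synthetic data."""
    n = parse_download_request(line)
    print(f"received download request for {n} bytes")
    print(f"sending {n} bytes")
    payload = memoryview(b"\0" * chunk_size)  # reused for every chunk
    remaining = n
    while remaining > 0:
        size = min(remaining, chunk_size)
        conn.sendall(payload[:size])
        remaining -= size
    return n


# Demo: serve 10 bytes over a local socket pair and read them back.
a, b = socket.socketpair()
sent = serve_download(a, "DOWNLOAD 10\n")
data = b""
while len(data) < sent:
    data += b.recv(sent - len(data))
a.close()
b.close()
```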
The three types of entities exchange several types of messages. You should fully specify the protocol before you start the implementation.
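As a starting point for such a specification, here is a sketch of one possible newline-delimited text protocol. Every command name and argument count is a hypothetical choice, not something the assignment prescribes. The key design point is framing: TCP delivers a byte stream rather than messages, so each message ends with a newline, and the reader buffers until it sees one.

```python
import io

# Hypothetical message set: command -> number of arguments (None = variable).
COMMANDS = {
    "REGISTER": 1,     # replica -> registry: REGISTER <port>
    "KEEPALIVE": 1,    # replica -> registry: KEEPALIVE <port>
    "LIST": 0,         # client -> registry (selection=client)
    "REPLICAS": None,  # registry -> client: REPLICAS <host:port> ...
    "BEST": 1,         # client -> registry (selection=network): BEST <metric>
    "REPLICA": 2,      # registry -> client: REPLICA <host> <port>
    "GET": 1,          # client -> replica: GET <filename>
    "OK": 1,           # replica -> client: OK <size>, then <size> raw bytes
    "ERR": None,       # any -> any: ERR <reason words ...>
    "DOWNLOAD": 1,     # any -> any: DOWNLOAD <n>, answered by n raw bytes
}


def encode(command, *args):
    """Build one newline-terminated message."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    words = [command, *map(str, args)]
    if any(not w or any(c.isspace() for c in w) for w in words):
        raise ValueError("arguments must be non-empty and contain no whitespace")
    return (" ".join(words) + "\n").encode()


def decode(line):
    """Split one received line into (command, args) and check the arity."""
    words = line.decode().split()
    if not words or words[0] not in COMMANDS:
        raise ValueError(f"malformed message: {line!r}")
    command, args = words[0], words[1:]
    expected = COMMANDS[command]
    if expected is not None and len(args) != expected:
        raise ValueError(f"{command} expects {expected} argument(s), got {len(args)}")
    return command, args


def read_message(stream):
    """Read one message from a binary file-like object, e.g. sock.makefile("rb")."""
    line = stream.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return decode(line)


# Demo: two messages arriving back to back in one buffer.
stream = io.BytesIO(encode("LIST") + encode("GET", "out.png"))
first = read_message(stream)
second = read_message(stream)
```

When a header such as `OK <size>` is followed by raw bytes, read those bytes from the same buffered file object (for example with `stream.read(size)`) rather than calling `recv` on the socket directly, because `readline()` may already have buffered part of the payload.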
Your submission should be a .tar.gz of the "uhemailprefix_p2" directory containing your source code and a Makefile. If your UH email address is abc@uh.edu, the directory is abc_p2. Also include a text file called README with a short description of how to run the programs, any limitations, and a description of your protocol in the format used by IETF documents. Submit your .tar.gz on Moodle.
To grade your submission, we will copy your .tar.gz to bayou, untar it, change to that directory, and type "make". This should generate three executables: client, registry, and replica.