COSC 6377 : Computer Networks

Spring 2014

MW 1-230pm at SEC 202

Project 2: Replica Selection

Due: 5/3/2014

In this project, we will build a system that selects the best replica of a given piece of data to maximize download performance.

Replica

In many distributed systems, such as storage systems in the cloud, the same data might be located on multiple servers in multiple geographical locations. Such copies of data are called replicas. Replicas provide redundancy and reliability to a storage system, i.e., if one copy of the data is lost, we still have another copy. We will not explore this aspect of replicas in this project. Instead, we will explore the use of replicas to improve data download performance.

In this project, we will create multiple replicas of a given piece of data. Our system will perform measurements to determine the best replica from which our client should download the data. We are primarily interested in two metrics: throughput and latency.

Approach 1: Selection by the Client

A client connects to a "registry" server to retrieve a list of replicas. Then the client performs a series of measurements to determine the best replica. Finally, the client connects to the best replica and downloads the file. We call this mode of operation "selection=client".

Approach 2: Selection by the Network

A client connects to a "registry" server with a request for the best replica. The server performs a series of measurements and estimates and determines the best replica for the client. The server sends the address of that replica to the client. The client downloads the file from that replica. We call this mode of operation "selection=network".

System Components

client

The client is a socket program that interacts with the registry in either mode to determine the best replica. Then it connects to the selected replica and downloads the file. In the "selection=client" mode, the client must perform latency and throughput measurements. Here is an example execution of the client:

./client -ra 1.2.3.4 -rp 9999 -selection=client -optimize=latency -file=out.png

registry address: 1.2.3.4
registry port: 9999
selection mode: client
metric to optimize: latency
filename: out.png

connecting to registry at 1.2.3.4 port 9999
connected

requesting a list of replicas
replicas: (1.2.3.4,10000), (1.2.3.4,10001), (1.2.3.4,10002)

performing measurements
replica 1.2.3.4 (10000) latency 12ms throughput 0.7MB/s
replica 1.2.3.4 (10001) latency 14ms throughput 0.8MB/s
replica 1.2.3.4 (10002) latency 16ms throughput 1.0MB/s

best replica: 1.2.3.4 (10000)

connecting to replica at 1.2.3.4 (10000)
downloading out.png
saved out.png

The client supports the -ra and -rp flags, which specify the address and port of the registry server to which the client should connect. The -selection flag can be either client or network for the two modes of replica selection. The -optimize flag can be either latency or throughput for the metric we wish to optimize when we perform replica selection. The -file flag is used in the request to the replica and also as the name of the file to which the downloaded data should be saved. The client terminates after performing the download and saving the file.
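Once the measurements are in hand, the selection step itself is small. The following is a minimal sketch in Python, assuming the measurements have already been collected; the Measurement type and the best_replica function are hypothetical names used only for illustration, and your implementation may be in any language.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    """One measured replica (hypothetical record type for illustration)."""
    host: str
    port: int
    latency_ms: float       # lower is better
    throughput_mbs: float   # MB/s, higher is better


def best_replica(measurements, optimize):
    """Return the measurement of the best replica for the chosen metric."""
    if not measurements:
        raise ValueError("no replicas to choose from")
    if optimize == "latency":
        return min(measurements, key=lambda m: m.latency_ms)
    if optimize == "throughput":
        return max(measurements, key=lambda m: m.throughput_mbs)
    raise ValueError(f"unknown metric: {optimize}")


# The three replicas from the example client output above.
measurements = [
    Measurement("1.2.3.4", 10000, 12, 0.7),
    Measurement("1.2.3.4", 10001, 14, 0.8),
    Measurement("1.2.3.4", 10002, 16, 1.0),
]
```

With these numbers, optimizing for latency selects the replica on port 10000, while optimizing for throughput selects the replica on port 10002.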

registry

The registry listens for incoming connections from the replicas and the clients. The registry keeps track of all the active replicas using timeouts. In the network mode of selection, the registry performs measurements and estimates to determine the best replica for the client. Here is an example execution of the registry:

./registry -p 9999 -selection=client -timeout=30

listening on: 9999
selection mode: client

registration from replica (1.2.3.4,10000)
registration from replica (1.2.3.4,10001)
registration from replica (1.2.3.4,10002)

(... the registration message repeated several times ...)

received query from client

sending list of replicas to the client: (1.2.3.4,10000), (1.2.3.4,10001), (1.2.3.4,10002)

(... the registration message repeated several times ...)

Here is an example execution in the network selection mode:

./registry -p 9999 -selection=network -metric=throughput -timeout=30

listening on: 9999
selection mode: network
metric to optimize: throughput

registration from replica (1.2.3.4,10000)
registration from replica (1.2.3.4,10001)
registration from replica (1.2.3.4,10002)

(... the registration message repeated several times ...)

received query from client

(... display messages depending on the measurements you perform ...)

send the name of the best replica to the client (1.2.3.4, 10002)

(... the registration message repeated several times ...)

timeout for replica (1.2.3.4,10001)

If we terminate one of the replicas, the registry must eventually detect that termination and remove it from the list of replicas to be considered for selection.
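One way to implement this detection is to record the time of the most recent registration from each replica and drop any replica that has been silent for longer than the timeout. The sketch below is only an illustration under that assumption; the ReplicaTable class and its method names are hypothetical, and the clock is passed in so the logic can be tested without waiting.

```python
import time


class ReplicaTable:
    """Tracks active replicas by the time of their last registration."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence before removal
        self.clock = clock        # injectable clock, for testing
        self.last_seen = {}       # (host, port) -> time of last registration

    def register(self, host, port):
        """Record a registration (or keepalive) from a replica."""
        self.last_seen[(host, port)] = self.clock()

    def expire(self):
        """Remove and return replicas silent for longer than the timeout."""
        now = self.clock()
        expired = [r for r, seen in self.last_seen.items()
                   if now - seen > self.timeout]
        for replica in expired:
            del self.last_seen[replica]
        return expired

    def active(self):
        """Return the sorted list of replicas that have not timed out."""
        self.expire()
        return sorted(self.last_seen)
```

For this to work, each replica must repeat its registration more often than the registry timeout, as in the examples here (keepalive of 20 seconds against a timeout of 30 seconds).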

replica

The Replica Manager (we will call this program "replica") stores files and sends them to the client upon receiving a request. The replica must be discovered by a client before it can transfer a file to the client. To make itself discoverable, the replica registers with the registry. Here is an example execution of replica:

./replica -ra 1.2.3.4 -rp 9999 -keepalive 20

connecting to registry at 1.2.3.4 (9999)
sending keepalive
sending keepalive
(... repeats ...)

request for file out.png from client
sending out.png
sent out.png
      

The replica looks in the current directory for the requested file. It is a good idea to launch replica from a separate folder so the file downloaded by the client does not overwrite the original file. You can use the diff command to determine if the downloaded file is identical to the original file.
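The repeated "sending keepalive" lines in the replica output above suggest a background loop that runs alongside the file-serving code. The following is a minimal sketch of such a loop, assuming the actual message is sent by a caller-supplied function; keepalive_loop and demo are hypothetical names, and the short intervals in demo are only there to make the example quick to run.

```python
import threading
import time


def keepalive_loop(send, interval, stop):
    """Call send() every `interval` seconds until the `stop` event is set."""
    while not stop.is_set():
        send()
        # wait() returns early if stop is set, so shutdown is prompt.
        stop.wait(interval)


def demo(duration=0.1, interval=0.01):
    """Run the loop briefly and return how many keepalives were sent."""
    sent = []
    stop = threading.Event()
    worker = threading.Thread(
        target=keepalive_loop,
        args=(lambda: sent.append("keepalive"), interval, stop),
    )
    worker.start()
    time.sleep(duration)
    stop.set()
    worker.join()
    return len(sent)
```

In a real replica, send would write the registration message to the registry, and interval would come from the -keepalive flag.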

The best way to test these programs is to run them in different terminals and look at the output from each program to make sure they are working correctly.

Measurements

We need to measure latency and throughput between different components. To enable these measurements, the client, registry, and replica will support a "download" command. When one of these components receives the download command, it sends the specified number of bytes of synthetic data to the requesting entity. An entity may receive a measurement request at any time. This is what the console output should look like at the requesting entity:

time now is: xxxxx
sending download request for 10 bytes
received 10 bytes
time now is: yyyyy
latency: aaaaa
throughput: bbbbb
      

The entity that receives the download request may display the following on its stdout:

received download request for 10 bytes
sending 10 bytes
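The exchange above can be sketched as follows. This is a minimal illustration, not a required design: the "DOWNLOAD <n>" text request is a hypothetical wire format, and taking the elapsed time of the whole exchange as latency and bytes divided by elapsed time as throughput is an assumption. You may prefer a small request for latency and a large one for throughput.

```python
import socket
import threading
import time


def read_line(sock):
    """Read bytes up to and including a newline; return the stripped text."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        byte = sock.recv(1)
        if not byte:
            raise ConnectionError("peer closed before end of line")
        buf += byte
    return buf.decode("ascii").strip()


def recv_exact(sock, nbytes):
    """Read exactly nbytes from the socket or raise if the peer closes."""
    chunks, remaining = [], nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve_one_download(listener):
    """Answer one 'DOWNLOAD <n>' request with n bytes of synthetic data."""
    conn, _ = listener.accept()
    with conn:
        command, count = read_line(conn).split()
        if command == "DOWNLOAD":
            conn.sendall(b"x" * int(count))


def measure(host, port, nbytes):
    """Return (elapsed seconds, bytes per second, bytes received)."""
    with socket.create_connection((host, port)) as sock:
        start = time.perf_counter()
        sock.sendall(f"DOWNLOAD {nbytes}\n".encode("ascii"))
        data = recv_exact(sock, nbytes)
        elapsed = time.perf_counter() - start
    throughput = len(data) / elapsed if elapsed > 0 else float("inf")
    return elapsed, throughput, len(data)


def demo(nbytes=10):
    """Run one measurement against a responder on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_download, args=(listener,))
    worker.start()
    result = measure("127.0.0.1", port, nbytes)
    worker.join()
    listener.close()
    return result
```

Note that recv may return fewer bytes than requested, so the requester loops until it has received the full count before stopping the timer.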
      

Protocol

The three types of entities exchange various types of messages. You should fully specify the protocol before you start implementation.
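As an illustration of what specifying a message involves, here is a sketch of one possible text encoding for the list of replicas that the registry returns to the client. The "REPLICAS" keyword and the host:port fields are hypothetical choices made for this example only; you are free to design a different format.

```python
def format_replica_list(replicas):
    """Encode [(host, port), ...] as one newline-terminated text line."""
    fields = [f"{host}:{port}" for host, port in replicas]
    return " ".join(["REPLICAS"] + fields) + "\n"


def parse_replica_list(line):
    """Decode a line produced by format_replica_list."""
    parts = line.split()
    if not parts or parts[0] != "REPLICAS":
        raise ValueError("not a REPLICAS message")
    replicas = []
    for field in parts[1:]:
        # rpartition splits on the last ':' so the port is always last.
        host, _, port = field.rpartition(":")
        replicas.append((host, int(port)))
    return replicas
```

Whatever format you choose, the specification should state how each message is delimited, which fields it carries, and how a receiver reacts to a malformed message.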

Submission

Your submission should be a .tar.gz of the "uhemailprefix_p2" directory, which contains your source code and a Makefile. If your UH email address is abc@uh.edu, then the directory is abc_p2. Also include a text file called README with a short description of how to run the programs, what limitations exist (if any), and a description of your protocol in the format used by IETF documents. You will submit your .tar.gz on Moodle.

To grade your submission, we will copy your .tar.gz to bayou, untar it, change to that directory, and type "make". This should generate three executables: client, registry, and replica.