COSC 6377 - Computer Networks

Fall 2011

MW 2:30-4:00pm at AH301

Instructor: Omprakash Gnawali

Project - 2

Due: December 4, 2011

Web Proxy with a Twist

In this project, we will build an HTTP proxy but we will add custom enhancements.

Web proxies allow caching of webpages so that many local clients can access the same webpage from a nearby server without having to contact the remote server. For example, we have clients A, B, and C in our local network. A accesses a webpage: http://www2.cs.uh.edu/~gnawali/. If node A was connected to the Internet through a proxy server, the request first goes to the proxy server. The proxy server then fetches the page, caches it, and sends it to node A. Later, when node B and node C request the same page, if they are also using the same proxy server, the proxy server will return the cached page. Thus, we were able to satisfy three requests using a single remote fetch.

Using a web browser with a proxy

The steps to configure a proxy differ from browser to browser. You should find these settings somewhere in Preferences / Connection Settings. There are many types of proxy; you want to use an HTTP proxy. You need to specify two pieces of information: hostname and port. Once you configure the proxy server, every time you type a URL in your browser, the request will go to the proxy server, not to the server named in the URL. Your proxy server should run at that hostname and listen for incoming connections on that port.

We will test your proxy server using Chrome. For your own testing, you should be able to use any web browser as long as it can use a standards-compliant proxy server.

Components of a proxy server

Here are the various tasks performed by a proxy server:

A reasonable way to start this project is by building the components to perform these tasks and testing them separately. For example, the first task is to listen for an incoming HTTP request from a client. For this task, you need to write a socket server. Then, you can print whatever data the server receives from the client. If it looks like your server is receiving a correct HTTP request from the client, you are done with the first task.
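
As a starting point for that first task, here is a minimal sketch of a socket server in Python. The function names open_listener and receive_one_request are made up for this illustration, and the sketch assumes the whole request arrives in a single recv call, which is good enough for looking at what the browser sends but not for a finished proxy.

```python
import socket


def open_listener(port, host="127.0.0.1"):
    """Create a TCP socket that listens on the given port (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts while experimenting.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    return server


def receive_one_request(server, max_bytes=4096):
    """Accept one client and return the first chunk of bytes it sends.

    Assumption: the request fits in one recv call. A real proxy would keep
    reading until it sees the blank line that ends the HTTP headers.
    """
    conn, _client_address = server.accept()
    with conn:
        return conn.recv(max_bytes)

# Example use: listen on port 8080, point the browser at it, and print
# the request the browser sends.
#   server = open_listener(8080)
#   print(receive_one_request(server).decode("iso-8859-1"))
```

If the printed text starts with a request line such as GET followed by a URL and an HTTP version, the listening part is working.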

Caching

When you fetch content from the remote server, you should cache the content before sending it to the client. When a new request for the same content is received, you should reply from the cache. There are exceptions to this flow. You need to design persistent storage for caching. Most likely you will need to store some metadata and the actual content. If your proxy crashes and restarts, the cache should remain intact.
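
One possible way to get a cache that survives a restart is to keep each entry in files on disk. The sketch below is only an illustration of that idea: the DiskCache class, the two-files-per-URL layout, and the JSON metadata format are all assumptions made here, not a required design.

```python
import hashlib
import json
import os


class DiskCache:
    """Hypothetical on-disk cache: one metadata file and one body file per URL."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _paths(self, url):
        # Hash the URL so that any URL maps to a safe file name.
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        base = os.path.join(self.directory, key)
        return base + ".meta", base + ".body"

    def put(self, url, body, metadata):
        """Store the response body (bytes) and a dict of metadata."""
        meta_path, body_path = self._paths(url)
        with open(body_path, "wb") as f:
            f.write(body)
        with open(meta_path, "w", encoding="utf-8") as f:
            json.dump({"url": url, **metadata}, f)

    def get(self, url):
        """Return (metadata, body) if the URL is cached, otherwise None."""
        meta_path, body_path = self._paths(url)
        if not (os.path.exists(meta_path) and os.path.exists(body_path)):
            return None
        with open(meta_path, encoding="utf-8") as f:
            metadata = json.load(f)
        with open(body_path, "rb") as f:
            return metadata, f.read()
```

Because the entries live in files, a new DiskCache object created after a restart on the same directory sees everything stored before the crash. Deciding when an entry may be served or must be refetched is left out of this sketch.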

Useful Tools

You will find netcat useful for this project. You can use netcat to create a quick server to see the data sent by the client. You can also send data to the client interactively using netcat.

You might also find telnet useful. For example, you can send an HTTP request to a web server using telnet.

Experimenting with HTTP using these tools will speed up your exploration of the essential parts of the HTTP protocol that you will need to understand concretely for this project.

HTTP

Your browser, proxy, and the web server communicate with each other using HTTP. We can just implement the most basic part of HTTP version 1.0. RFC 1945 describes the protocol. It is adequate if your proxy server can handle the basic GET request. The remote server uses the HTTP header to send hints to the proxy server on whether the content is cacheable and, if it is, when the content expires. You need to find what these header fields are, their correct values, and how to interpret them. Here is one reference that gives an overview of HTTP [local copy].
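
To make the basic GET handling concrete, the sketch below pulls apart the first line of a request as a browser would send it to a proxy, where the request line carries the full URL. The function name parse_proxy_request and the shape of its return value are assumptions for this illustration. It does not look at the caching headers.

```python
from urllib.parse import urlsplit


def parse_proxy_request(raw):
    """Split the request line of a proxy-style HTTP request (hypothetical helper).

    raw is the bytes received from the client. Returns a tuple
    (method, url, host, port, path, version).
    """
    request_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    method, url, version = request_line.split(" ")
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # default HTTP port when none is given
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return method, url, host, port, path, version
```

The host and port tell the proxy where to open the outgoing connection, and the full URL is a natural key for the cache.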

Extensions

We will develop some extensions to this protocol.

Proxy Chaining. Your HTTP proxy must allow proxy chaining. That is, we should be able to configure your proxy to use another proxy upstream.

Data Encryption. Encrypt the communication between your proxies. Do this with the encryption algorithm supported by the downstream proxy. In this project, there is only one encryption algorithm. We will use a substitution cipher. You compute the ciphertext by adding k to each byte, with a wrap-around at the byte boundary. The value k is the listening port of the downstream proxy (the proxy closer to the web browser client) modulo 10, and it is negotiated with the upstream proxy.
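
The cipher described above can be sketched in a few lines of Python. The function names cipher_key, encrypt, and decrypt are chosen for this illustration, and the sketch assumes that "wrap around at the byte boundary" means arithmetic modulo 256.

```python
def cipher_key(downstream_listen_port):
    """k is the downstream proxy's listening port modulo 10."""
    return downstream_listen_port % 10


def encrypt(data, k):
    """Add k to every byte, wrapping around at 256 (assumed byte boundary)."""
    return bytes((b + k) % 256 for b in data)


def decrypt(data, k):
    """Undo encrypt by subtracting k from every byte, again modulo 256."""
    return bytes((b - k) % 256 for b in data)
```

For a downstream proxy listening on port 9003, k is 3, so the byte 255 becomes 2 and the letter a becomes d.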

Data Compression. Compress web pages between your proxies. Do this with the compression algorithm supported by the downstream proxy. In this project, there is only one compression algorithm. We will use run-length encoding. For a given input byte stream, you compute the output by tagging each byte with the number of times that byte appears consecutively in the input stream. The count is limited to 255 per tag, so a longer run is split across several tags, as the second example shows. Examples:

Input                     Output
abbc                      1a2b1c
a (repeated 500 times)    255a245a

Note that in the examples above, the number preceding the character is a single byte, not multiple bytes. For example, when we write the output for the second input above, we will write four bytes to the output: the first byte with the value 255, the second byte with the value ord(a), the third byte with the value 245, and finally the fourth byte with the value ord(a). That is four bytes, not eight bytes as the text representation might suggest.
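
The two examples can be reproduced with the following sketch. The function names rle_encode and rle_decode are made up here, and the sketch assumes that a run longer than 255 is split into as many count/byte pairs as needed, which is what the second example suggests.

```python
def rle_encode(data):
    """Run-length encode bytes as (count, value) pairs, one byte each."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the same byte repeats, up to 255 per pair.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.append(run)
        out.append(data[i])
        i += run
    return bytes(out)


def rle_decode(encoded):
    """Expand (count, value) pairs back into the original bytes."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded data must consist of count/value pairs")
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out.extend(bytes([value]) * count)
    return bytes(out)
```

Encoding 500 copies of a yields exactly four bytes: 255, ord(a), 245, ord(a).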

Routing. Route requests depending on the URL. Choose a different upstream proxy depending on the URL being requested. If the length of the URL is even, we forward the request to the upstream proxy that has an even port number. If the length of the URL is odd, we forward the request to the upstream proxy that has an odd port number. To test this feature, we will use a scenario that has only two upstream proxies, one on an even port and the other on an odd port.
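
The routing rule can be sketched as below. The function name choose_upstream is hypothetical, and the sketch assumes that the length of the URL means the number of characters in the URL exactly as it appears in the request line.

```python
def choose_upstream(url, upstream_ports):
    """Pick the upstream port whose parity matches the URL length's parity."""
    parity = len(url) % 2  # 0 for an even length, 1 for an odd length
    for port in upstream_ports:
        if port % 2 == parity:
            return port
    raise ValueError("no upstream proxy with matching port parity")
```

For example, with upstream proxies on ports 9001 and 9002, the 13-character URL http://a.com/ goes to 9001 and the 14-character URL http://ab.com/ goes to 9002.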

Load Balancing. Load balance among multiple upstream proxies.

We will standardize the messages between the proxy servers to request and use these extensions. Standardization of messages allows us to mix and match proxy servers written by multiple teams to create a chain or tree of proxies.

Launching your proxy server

We will use command line arguments to configure your proxy when we execute it. Here is the syntax:
./proxy -listen port -supports plain|compress|encrypt [-server port1 port2 ... portn -use compress|routing|balance|encrypt]

Example 1:

./proxy -listen 8080 -supports plain

Example 2:
./proxy -listen 9001 -supports compress encrypt plain
./proxy -listen 9002 -supports compress plain
./proxy -listen 9003 -supports compress plain
./proxy -listen 9004 -supports compress plain
./proxy -listen 8080 -supports plain -server 9001 9002 9003 9004 -use compress routing

The part starting with -server is optional. The first example starts a standalone proxy server at port 8080. You can then use your web browser to connect to this proxy with the hostname of the machine where you launched this proxy and the port specified in the command line. The second example starts a series of proxy servers. It starts four proxy servers at ports 9001..9004 with different capabilities. Then it starts a proxy server at 8080 that supports plain HTTP requests. This proxy does not send the HTTP request directly to the server in the URL; instead it routes the request to one of the proxy servers 9001..9004 and expects the HTTP responses to come back in compressed form. As shown in the examples, -supports and -use may have multiple values.
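
If you write your proxy in Python, the command line syntax above could be parsed with the standard argparse module, roughly as follows. This is only a sketch: the function name build_parser is made up, and treating -listen and -supports as required options is an assumption based on the syntax line.

```python
import argparse


def build_parser():
    """Build a parser for the proxy command line (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="proxy")
    parser.add_argument("-listen", type=int, required=True, metavar="port",
                        help="port this proxy listens on")
    parser.add_argument("-supports", nargs="+", required=True,
                        choices=["plain", "compress", "encrypt"],
                        help="what this proxy accepts from downstream")
    # The -server / -use part is optional, so both default to empty lists.
    parser.add_argument("-server", nargs="+", type=int, default=[],
                        metavar="port", help="upstream proxy ports")
    parser.add_argument("-use", nargs="+", default=[],
                        choices=["compress", "routing", "balance", "encrypt"],
                        help="extensions to use with the upstream proxies")
    return parser

# Example use inside the proxy program:
#   args = build_parser().parse_args()
#   print(args.listen, args.supports, args.server, args.use)
```

With the last command of Example 2, this gives a listen port of 8080, the supported list ["plain"], the upstream ports 9001 through 9004, and the extensions ["compress", "routing"].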

Project Submission

Create a zip of your code and submit it through Moodle. In your zip file, there should be a README that lists the members of your team and what part of the code each member contributed. Please give full instructions on how to run your code. If your code is not complete, please also describe the parts of the project you completed. Each group should have one member submit the project on Moodle.