COSC 6377 - Computer Networks

Fall 2012

MW 1:00-2:30pm at PGH376

Instructor

Omprakash Gnawali

Project - 1

Due: October 24, 2012

Web Proxy You Don't Want to Use

In this project, we will build an HTTP proxy and add custom enhancements to it.

Web proxies allow caching of webpages so that many local clients can access the same webpage from a nearby server without having to contact the remote server. For example, suppose we have clients A, B, and C in our local network, and A accesses a webpage: http://www2.cs.uh.edu/~gnawali/. If node A is connected to the Internet through a proxy server, the request first goes to the proxy server. The proxy server then fetches the page, caches it, and sends it to node A. Later, when node B and node C request the same page, if they are also using the same proxy server, the proxy server returns the cached page. Thus, three requests are satisfied with a single remote fetch.
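The flow in this example can be sketched as a toy in-memory model. This is only an illustration of the idea, not the proxy you will submit: the names make_proxy, handle, and fetch_remote are hypothetical, and a real proxy must also honor expiry and cacheability and keep its cache on disk.

```python
def make_proxy(fetch_remote):
    """Return a request handler that contacts the remote server only on a cache miss.

    fetch_remote is a hypothetical callable that takes a URL and returns the page.
    """
    cache = {}

    def handle(url):
        if url not in cache:
            # Cache miss: one remote fetch, then remember the result.
            cache[url] = fetch_remote(url)
        # Cache hit (or freshly filled entry): reply from the cache.
        return cache[url]

    return handle
```

If clients A, B, and C each call handle with the same URL, fetch_remote runs only once.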

Using a web browser with a proxy

The steps to configure a browser to use a proxy differ from browser to browser. You should find these settings somewhere in Preferences / Connection Settings. There are many types of proxy; you want to use an HTTP proxy. You need to specify two pieces of information: hostname and port. Your proxy server should run at that hostname and listen for incoming connections on that port. Once you configure the proxy server, every time you type a URL in your browser, the request will go to the proxy server, not to the server named in the URL.

We will test your proxy server using Chrome, Firefox, and a mobile browser of your choice. In theory, we should be able to use any web browser as long as it can use a standard-compliant proxy server.

Components of a proxy server

Here are the various tasks performed by a proxy server:

Listens for incoming HTTP requests from a web browser.
Sends the HTTP request to a remote server and receives the HTTP reply from the remote server.
Caches the content and metadata.
Replies with an HTTP response to the client.

A reasonable way to start this project is by building the components to perform these tasks and testing them separately. For example, the first task is to listen for an incoming HTTP request from a client. For this task, you need to write a socket server. Then, you can print whatever data the server receives from the client. If it looks like your server is receiving a correct HTTP request from the client, you are done with the first task.
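As a starting point for the first task, here is a minimal sketch of such a socket server in Python. The function names open_listener and read_one_request are hypothetical, and the sketch reads only the first chunk of data from each client, which is enough to eyeball a request but not enough for a complete proxy.

```python
import socket

def open_listener(port, host="127.0.0.1"):
    """Create a TCP socket listening on host:port (port 0 picks any free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for the old socket to time out.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv

def read_one_request(srv):
    """Accept one client and return the first chunk of bytes it sends."""
    conn, _addr = srv.accept()
    with conn:
        return conn.recv(65536)
```

To try it, call open_listener(8080), point your browser's HTTP proxy setting at that port, and print the result of read_one_request(listener).decode("latin-1") in a loop.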

Caching

When you fetch content from the remote server, you should cache the content before sending it to the client. When a new request for the same content is received, you should reply from the cache. There are exceptions to this flow:

The content has expired.
The content is marked non-cacheable.
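The two exceptions above can be sketched as a single check. This is a sketch under stated assumptions, not a definitive rule set: it assumes the relevant HTTP/1.0 fields are Expires and Pragma: no-cache, that header names have already been lower-cased into a dict, and that content without an Expires field may keep being served. The function name may_serve_from_cache is hypothetical, and you should confirm the fields and their meaning against RFC 1945.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def may_serve_from_cache(headers, now=None):
    """Decide whether a cached reply may be reused.

    headers: dict mapping lower-cased header names to their values.
    """
    now = now or datetime.now(timezone.utc)
    # Assumption: a "Pragma: no-cache" reply is treated as non-cacheable.
    if "no-cache" in headers.get("pragma", "").lower():
        return False
    expires = headers.get("expires")
    if expires is None:
        # Assumption: no Expires field means the entry does not expire.
        return True
    try:
        when = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        # Unparseable dates such as "0" are treated as already expired.
        return False
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when > now
```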

You need to design persistent storage for caching. Most likely you will need to store some metadata and the actual content. If your proxy crashes and restarts, the cache should remain intact. Your proxy should save the data to the directory from which it is executed.
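One possible layout, shown here only as a sketch, stores each cached URL as two files under the current working directory: a JSON file with the metadata and a binary file with the content. The directory name cache, the file suffixes, and the functions cache_put and cache_get are all hypothetical choices; any design that survives a restart meets the requirement.

```python
import hashlib
import json
import os

CACHE_DIR = "cache"  # hypothetical name; relative to where the proxy is executed

def _paths(url):
    # Hash the URL so that it becomes a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return (os.path.join(CACHE_DIR, key + ".meta"),
            os.path.join(CACHE_DIR, key + ".body"))

def cache_put(url, headers, body):
    """Store the reply headers (a dict) and the body (bytes) on disk."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    meta_path, body_path = _paths(url)
    with open(body_path, "wb") as f:
        f.write(body)
    # Write metadata last so a crash mid-write leaves no metadata without a body.
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump({"url": url, "headers": headers}, f)

def cache_get(url):
    """Return (headers, body) for url, or None if it is not cached."""
    meta_path, body_path = _paths(url)
    try:
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)
        with open(body_path, "rb") as f:
            return meta["headers"], f.read()
    except FileNotFoundError:
        return None
```

Because nothing is held only in memory, a restarted proxy finds the same entries with cache_get.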

Useful Tools

You will find netcat useful for this project. You can use netcat to create a quick server to see the data sent by the client. You can also send data to the client interactively using netcat.

You might also find telnet useful. For example, you can send an HTTP request to a web server using telnet.

Experimenting with HTTP using these tools will speed up your exploration of the essential parts of HTTP that you will need to understand concretely for this project.

HTTP

Your browser, proxy, and the web server communicate with each other using HTTP. We only need to implement the most basic part of HTTP version 1.0, which RFC 1945 describes. It is adequate if your proxy server can handle the basic GET request. The remote server uses the HTTP header to send hints to the proxy server on whether the content is cacheable and, if it is, when the content expires. You need to find what these header fields are, what their correct values are, and how to interpret them. Here is one reference that gives an overview of HTTP [local copy].
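When a browser talks to an HTTP proxy, the request line of a basic GET carries the full URL rather than just the path, so the proxy has to extract the remote host and port before it can connect. The sketch below shows one way to do that; the function name parse_proxy_request_line is hypothetical, and the code assumes a well-formed request line with exactly three space-separated parts.

```python
from urllib.parse import urlsplit

def parse_proxy_request_line(line):
    """Split 'GET http://host[:port]/path HTTP/1.0' into its parts.

    Returns (method, host, port, path, version); port defaults to 80.
    """
    method, url, version = line.strip().split(" ")
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return method, parts.hostname, parts.port or 80, path, version
```

The proxy can then connect to (host, port) and forward a request line built from method, path, and version.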

Extensions

We will develop three extensions to this proxy.

Serving Ads. Change your proxy so it serves ads on the pages served to the client. One of the challenges in serving ads is deciding which ads to serve to which clients. We will use simple logic. If a client accesses any page within uh.edu through your proxy, it should insert the following ad:

Buy your textbooks at
www.cheapuhbooks.com.

The ads should be prominently displayed somewhere on the webpage and must be inside a table or a box separate from the text on the page.

If a client accesses any page within rice.edu through your proxy, it should insert the following ad:

Buy your textbooks at
www.cheapricebooks.com.
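The ad selection logic can be sketched as a lookup on the requested host. The names AD_SITES and ad_html are hypothetical, and the table markup is only one way to satisfy the "table or box" requirement. Where in the page you insert the result is a separate design decision that this sketch does not cover.

```python
# Domain suffix -> advertised site, taken from the two ads in this handout.
AD_SITES = {
    "uh.edu": "www.cheapuhbooks.com",
    "rice.edu": "www.cheapricebooks.com",
}

def ad_html(host):
    """Return the ad markup for a requested host, or None if no ad applies."""
    host = host.lower().rstrip(".")
    for domain, site in AD_SITES.items():
        # Match the domain itself and any host inside it (e.g. www2.cs.uh.edu).
        if host == domain or host.endswith("." + domain):
            return ('<table border="1"><tr><td>Buy your textbooks at<br>'
                    + site + '.</td></tr></table>')
    return None
```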

Censorship. Your proxy should censor any mention of any fruit on a webpage. If we have a sentence "I bought an apple." on a webpage, we should replace that sentence with "I bought an xxxxx."
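A minimal sketch of the replacement step follows. The fruit list is a short, hypothetical sample that you would need to extend, and the function name censor is also hypothetical. The sketch works on plain text; applying it directly to raw HTML could also alter tag attributes and URLs, so a real proxy would likely restrict it to the visible text of the page.

```python
import re

# Hypothetical sample list; the assignment says "any fruit", so extend it.
FRUITS = ["apple", "banana", "orange", "grape", "mango"]

# Whole words only, optional plural "s", any letter case.
_FRUIT_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FRUITS)) + r")s?\b",
    re.IGNORECASE,
)

def censor(text):
    """Replace each fruit word with the same number of 'x' characters."""
    return _FRUIT_RE.sub(lambda m: "x" * len(m.group(0)), text)
```

With this list, censor("I bought an apple.") returns "I bought an xxxxx.", matching the example above.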

Customization. Your proxy should scale large images down to a smaller size (maximum height or width of 400 pixels) when a mobile browser requests an image with larger dimensions.
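The target size can be computed with a few lines of arithmetic. The sketch below assumes that the aspect ratio should be preserved, which the requirement does not state explicitly, and the function name scaled_size is hypothetical. Decoding and resizing the image itself is not shown.

```python
def scaled_size(width, height, limit=400):
    """Return (width, height) scaled so that neither side exceeds limit.

    Images already within the limit are returned unchanged.
    Assumption: the aspect ratio is preserved.
    """
    longest = max(width, height)
    if longest <= limit:
        return width, height
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * limit // longest)
    new_height = max(1, height * limit // longest)
    return new_width, new_height
```

For example, a 1600 by 1200 image becomes 400 by 300, while a 300 by 200 image is left alone.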

Launching your proxy server

We will use command line arguments to configure your proxy when we execute it. Here is the syntax:

./proxy -listen port

Example:

./proxy -listen 8080
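If you write the proxy in Python, the -listen argument could be parsed as sketched below. The function name parse_args and the attribute name port are hypothetical. Note that make must still produce an executable named proxy, for example a small launcher script, which this sketch does not show.

```python
import argparse

def parse_args(argv=None):
    """Parse './proxy -listen port' and return a namespace with .port set."""
    parser = argparse.ArgumentParser(prog="proxy")
    parser.add_argument(
        "-listen",
        dest="port",
        type=int,
        required=True,
        metavar="port",
        help="TCP port on which the proxy listens",
    )
    return parser.parse_args(argv)
```

Calling parse_args() with no argument reads the real command line, so ./proxy -listen 8080 yields a port value of 8080.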

Project Submission

Put your source code into a folder with the name: uhid_p1, where uhid is the prefix of your .uh.edu email address. There should be a single Makefile in that directory. When we run make in that directory, it should produce one executable named proxy. Put a README.txt in the directory describing anything unusual (e.g., limitations) about your implementation. Then, zip the directory and upload the zip file using Blackboard.