COSC 4377 - Introduction to Computer Networks

Spring 2012

MW 1:00-2:30pm at PGH347

Instructor: Omprakash Gnawali

Homework 2: HTTP Client

Due: midnight February 1, 2012

In this assignment, we will write an HTTP client.

We will write a program that can download files from the web using the HTTP protocol. Our HTTP client should be able to download both text and binary data. The user specifies the URL of the file to download as a command-line argument.

Here is how we might use the tool:

./http_client -u url -o output_file

Example:

./http_client -u http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/test1.txt -o mytest1.txt

In the example above, the user downloads the file test1.txt at the specified URL and saves its content to mytest1.txt.
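One possible way to read the -u and -o arguments is the POSIX getopt function, which is part of the C library and not an HTTP library, so it should not conflict with the rule below. The sketch below is only an illustration: the function name parse_args is hypothetical, and it simply stores pointers to the URL and output file name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* getopt, optarg, optind */

/* Hypothetical helper: parse "-u url -o output_file".
 * On success, stores pointers into argv in *url and *out and returns 0;
 * otherwise prints a usage message and returns -1. */
int parse_args(int argc, char *argv[], const char **url, const char **out)
{
    int opt, bad = 0;

    *url = NULL;
    *out = NULL;
    while ((opt = getopt(argc, argv, "u:o:")) != -1) {
        switch (opt) {
        case 'u': *url = optarg; break;
        case 'o': *out = optarg; break;
        default:  bad = 1; break;   /* getopt already printed a warning */
        }
    }
    if (bad || *url == NULL || *out == NULL) {
        fprintf(stderr, "usage: %s -u url -o output_file\n", argv[0]);
        return -1;
    }
    return 0;
}
```

In main, the program could call parse_args(argc, argv, &url, &out) and exit with a nonzero status if it returns -1.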

Implementation Guide

Web servers accept requests using a protocol called HTTP. When we type a URL into a web browser, the request to the web server is sent using the HTTP protocol. The program that you write in this assignment will pretend to be a web browser and send an HTTP request to the web server.
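Before the program can contact the server, it needs the host name, the port, and the path from the URL. A minimal sketch, assuming URLs of the form http://host[:port][/path] only (no https, user names, or IPv6 literals); the function name parse_url is hypothetical:

```c
#include <string.h>

/* Hypothetical helper: split "http://host[:port][/path]" into parts.
 * Port defaults to "80" and path defaults to "/".
 * Returns 0 on success, -1 if the URL is not http:// or a part is too long. */
int parse_url(const char *url, char *host, size_t hostlen,
              char *port, size_t portlen, char *path, size_t pathlen)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);

    if (strncmp(url, prefix, plen) != 0)
        return -1;

    const char *start = url + plen;                 /* first host char */
    const char *slash = strchr(start, '/');         /* start of path, if any */
    const char *end = slash ? slash : start + strlen(start);
    const char *colon = memchr(start, ':', (size_t)(end - start));

    /* Host: everything up to ':' or '/' (or end of string). */
    const char *host_end = colon ? colon : end;
    size_t n = (size_t)(host_end - start);
    if (n == 0 || n >= hostlen)
        return -1;
    memcpy(host, start, n);
    host[n] = '\0';

    /* Port: digits after ':' if present, otherwise the HTTP default. */
    if (colon) {
        n = (size_t)(end - colon - 1);
        if (n == 0 || n >= portlen)
            return -1;
        memcpy(port, colon + 1, n);
        port[n] = '\0';
    } else {
        if (portlen < 3)
            return -1;
        strcpy(port, "80");
    }

    /* Path: the rest of the URL, or "/" if it is empty. */
    const char *p = slash ? slash : "/";
    if (strlen(p) >= pathlen)
        return -1;
    strcpy(path, p);
    return 0;
}
```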

When the program is launched from the command line, it should open a socket to the server and write a valid HTTP request on that socket. The server will reply with an HTTP response, which you can read from the same socket. An HTTP response has a specific message format, and you need to parse it correctly; otherwise you won't know where the response header ends and the file content begins. For debugging, it helps to printf all inbound and outbound messages.
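The steps above (open a socket, write a request) can be sketched with the POSIX socket calls getaddrinfo, socket, connect, and write, assuming the host, port, and path have already been extracted from the URL. The helper names build_request, open_connection, and send_all are hypothetical. Lines in an HTTP header end with "\r\n", and an empty line ends the header. The Host field is not defined by RFC 1945 (it comes from HTTP/1.1), but including it is an assumption that tends to help with servers hosting several sites on one address.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Format a minimal HTTP/1.0 GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t buflen, const char *host, const char *path)
{
    int n = snprintf(buf, buflen,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",              /* blank line ends the header */
                     path, host);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}

/* Resolve host:port and return a connected TCP socket, or -1. */
int open_connection(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Write all len bytes, retrying after short writes. Returns 0 or -1. */
int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Printing the request buffer with printf before sending it is an easy way to check its format while debugging.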

You are not allowed to use any HTTP, curl, or file-download libraries for this assignment, because our goal is to learn how to read and write valid HTTP messages.

Testing your program

We will test your program by comparing the file saved by your program with the file at the original URL. Here is how to do it: download the test files using a browser, then download the same files a second time using your program. If your program downloads the files correctly, there should be no difference between the two copies. You can use the Linux command "diff" to check whether the files differ. We are providing ten test cases:
  1. Small text file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/test1.txt
  2. Small binary file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/test2.png
  3. Empty file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/test3.txt
  4. An executable at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/minigzip
  5. An object file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/inffast.o
  6. Makefile at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/Makefile
  7. PDF file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/cosc4377-s12-l01.pdf
  8. PPT file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/Presentation1.pptx
  9. Large binary file at http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles/test4.tif
  10. Nonexistent file. Your client should not crash; it should print "Page not found" to stdout.
Make sure your program works for all ten test cases. If your code handles files differently based on their names (for example, hard-coded file types), we reserve the right to test it against other test cases.
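For test case 10, the program has to look at the status line, the first line of the response (for example "HTTP/1.0 404 Not Found"). A minimal sketch using sscanf; the helper names are hypothetical, and treating every non-2xx code as "do not save" is an assumption, since the assignment only specifies the output for a missing page.

```c
#include <stdio.h>

/* Extract the numeric status code from a status line such as
 * "HTTP/1.0 404 Not Found". Returns the code, or -1 if malformed. */
int parse_status(const char *resp)
{
    int major, minor, code;

    if (sscanf(resp, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Hypothetical policy: print "Page not found" for 404 (test case 10).
 * Returns 1 if the body should be saved (a 2xx status), 0 otherwise. */
int should_save_body(const char *resp)
{
    int code = parse_status(resp);

    if (code == 404) {
        printf("Page not found\n");
        return 0;
    }
    return code >= 200 && code <= 299;
}
```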

Your program does not need to handle redirects, persistent connections, etc. If it can download and save static files using HTTP/1.0, that is good enough.
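Without persistent connections, an HTTP/1.0 server closes the connection after sending the response, so the client can read until read() returns 0 (end of file). The sketch below, with hypothetical helpers read_all, find_body, and save_body, buffers the whole response, locates the blank line ("\r\n\r\n") that separates header and body, and writes the body in binary mode. It uses memcmp and fwrite rather than string functions, because binary files (test cases 2, 4, 5, 9) may contain '\0' bytes. Buffering everything in memory is a simplification; streaming the body to the file as it arrives would use less memory for large files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read from fd until EOF, growing the buffer as needed.
 * Returns a malloc'd buffer (caller frees) and stores its size in *len. */
char *read_all(int fd, size_t *len)
{
    size_t cap = 8192, used = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        if (used == cap) {                  /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + used, cap - used);
        if (n < 0) { free(buf); return NULL; }
        if (n == 0)
            break;                          /* EOF: server closed connection */
        used += (size_t)n;
    }
    *len = used;
    return buf;
}

/* Return the offset of the first body byte (just past "\r\n\r\n"),
 * or -1 if the header terminator is missing. Binary-safe. */
long find_body(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    return -1;
}

/* Write len body bytes to path in binary mode. An empty body still
 * creates an empty file (test case 3). Returns 0 or -1. */
int save_body(const char *path, const char *body, size_t len)
{
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    size_t written = fwrite(body, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}
```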

We have packaged all these test cases into a single tarball, which you can download at this link: http://www2.cs.uh.edu/~gnawali/courses/cosc4377-s12/hw2/testfiles.tar.gz

In the tarball, you will also find grade.sh, which will test your code against the first 9 test cases and tell you your grade out of 90. We will test the last case manually.

HTTP

Our HTTP client will use the most basic parts of HTTP version 1.0, which is described in RFC 1945. You need to find out what the HTTP header fields are, what values they can take, and how to interpret them. Here is one reference that gives an overview of HTTP [local copy].
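RFC 1945 specifies that header field names are case-insensitive, so a lookup such as Content-Length should not depend on capitalization. A minimal sketch, assuming the header has been copied into a NUL-terminated string whose lines end in "\r\n"; the function name header_value is hypothetical. With HTTP/1.0 and no persistent connection, Content-Length is optional, so at most it can be used to sanity-check the number of body bytes received.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Look up a header field case-insensitively in a NUL-terminated header
 * block (status line first, lines ending in "\r\n"). Copies the value,
 * without leading spaces or tabs, into out. Returns 0 if found, else -1. */
int header_value(const char *hdr, const char *name, char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *line = strstr(hdr, "\r\n");     /* skip the status line */

    while (line != NULL) {
        line += 2;                               /* start of next line */
        if (strncasecmp(line, name, nlen) == 0 && line[nlen] == ':') {
            const char *v = line + nlen + 1;
            while (*v == ' ' || *v == '\t')
                v++;
            size_t vlen = strcspn(v, "\r\n");    /* value ends at CRLF */
            if (vlen >= outlen)
                return -1;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 0;
        }
        line = strstr(line, "\r\n");
    }
    return -1;
}
```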

Compiling and Running

It is perfectly fine for you to work on your own Linux machine. However, regardless of where you develop it, your assignment must compile and run on bayou.

Questions

Please provide answers to these questions in the README.txt file.
  1. What is the difference between http_client (what you wrote), curl, and wget? Use those tools and read about them online before you answer this question.
  2. Some web sites allow only certain browsers (e.g., IE) to download content from them. When an HTTP request comes in, those servers check what browser is sending the request. How can you write http_client so it is able to download that content despite that check?
  3. If you were running such a website that needed to restrict its content to a certain browser, how can you prevent programs like http_client from downloading the content from your site?

Submission

Put your source code into a folder named uhid_hw2, where uhid is the prefix of your .uh.edu email address. There should be a single Makefile in that directory; running make in that directory should produce a single executable named http_client. Also include a README.txt in the directory that describes anything unusual about your implementation (e.g., limitations) and answers the questions above. Then, zip the directory and upload the zip file to Blackboard.