COSC 6377: Computer Networks

Fall 2017

TR 1:00-2:30 pm at PGH 232

Homework 1: To CDN or Not

Due: 9/1/2017

In this homework, we will write a Python program "cdnornot.py" that analyzes file download performance and recommends which files should be cached for faster access over the Internet.

Your program should read a URL list file whose name is provided as a command-line argument. The URL list file contains URLs of image files, one URL per line.
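A minimal sketch of reading the URL list, assuming Python 3 and assuming blank lines in the file should be skipped. The helper names `read_url_list` and `urls_from_argv` are hypothetical, not part of any required interface.

```python
import sys


def read_url_list(path):
    """Return the URLs in the list file, one per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def urls_from_argv(argv):
    """Read the URL list named by the first command-line argument.

    `argv` is expected to look like sys.argv,
    e.g. ["cdnornot.py", "myurllist"].
    """
    if len(argv) != 2:
        sys.exit("usage: python cdnornot.py URLLISTFILE")
    return read_url_list(argv[1])
```

Inside `cdnornot.py` this would be called as `urls = urls_from_argv(sys.argv)`.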

Your program should download these images over the Internet. Feel free to use urllib or other built-in Python libraries to download the files. As you download each file, you need to "record" how long the download took, with millisecond (ms) granularity.
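One way to time each download with the standard library, assuming Python 3 (`urllib.request` and `time.perf_counter`; Python 2 has `urllib2` instead, so check which `python` runs on bayou). `timed_download` is a hypothetical helper. The timer here brackets only the request and the body read, not the disk write, and naming the local file after the last path component of the URL is an assumption that will clash if two URLs end in the same file name.

```python
import os
import time
import urllib.request
from urllib.parse import urlparse


def timed_download(url, dest_dir="."):
    """Download `url` into `dest_dir`.

    Returns (local_path, size_in_bytes, latency_ms). The timer covers
    opening the connection and reading the whole body, but not writing
    the file to disk.
    """
    # Assumed naming rule: last path component of the URL, or
    # "download" if the URL path ends in "/".
    name = os.path.basename(urlparse(url).path) or "download"

    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    latency_ms = round((time.perf_counter() - start) * 1000)

    path = os.path.join(dest_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path, len(data), latency_ms
```

Measuring wall-clock time around `urlopen` and `read()` includes name resolution, connection setup, and the transfer itself, which is roughly what a user waiting for the image experiences.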

The files that took a long time to download are good candidates for caching, or for distribution over a Content Distribution Network (CDN). In this homework, we will use a simple rule to decide which files should be distributed using a CDN.

Sort the files by download time, fastest first. The files in the bottom half of this list (the slowest 50%) are candidates for distribution using a CDN. Create a folder called "SlowDownloaded" as a subdirectory of the current directory and move these slowest 50% of the files into it. Create another folder called "FastDownloaded" and move the remaining files into that folder.
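The sort, split, and move steps might look like the sketch below. It assumes each download is recorded as a `(url, local_path, size_bytes, latency_ms)` tuple, and the function names are hypothetical. With an odd number of files the 50% cut is ambiguous, so this sketch puts the extra middle file in FastDownloaded, keeping SlowDownloaded at exactly `n // 2` files.

```python
import os
import shutil


def split_by_latency(records):
    """Split download records into (fast, slow) lists.

    Each record is a (url, local_path, size_bytes, latency_ms) tuple.
    Records are sorted fastest first; the slowest n // 2 form the slow
    half and the rest (including the middle file when n is odd) are fast.
    """
    ordered = sorted(records, key=lambda r: r[3])
    cut = len(ordered) - len(ordered) // 2
    return ordered[:cut], ordered[cut:]


def move_into(records, folder):
    """Move each record's file into `folder` (created if missing).

    Returns new records whose local_path points at the moved file.
    """
    os.makedirs(folder, exist_ok=True)
    moved = []
    for url, path, size, ms in records:
        dest = os.path.join(folder, os.path.basename(path))
        moved.append((url, shutil.move(path, dest), size, ms))
    return moved
```

In `cdnornot.py` this would be called as `move_into(slow, "SlowDownloaded")` and `move_into(fast, "FastDownloaded")`, so both folders are created under the current directory. Because `sorted` is stable, files with equal latency keep their order from the URL list.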

Your program should then print a report with two sections: first the FastDownloaded files, followed by the SlowDownloaded files. Each line should have three columns: the URL, the size of the image in bytes, and the download latency in ms.

The end result of running this program:
- the two-section report printed on the terminal
- a new folder SlowDownloaded with the files that were slow to download
- a new folder FastDownloaded with the files that were quick to download

How to run the program?


python cdnornot.py myurllist

A sample output is shown below.

Fast downloaded files
URL, Size(bytes), Download time(ms)

http://myurl/, 1024, 55
http://myotherurl/, 1025, 56

----------------

Slow downloaded files
URL, Size(bytes), Download time(ms)

http://myurlslow/, 1024, 555
http://myotherurlslow/, 1025, 556
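A sketch of building the report text in the layout of the sample output above, again assuming `(url, local_path, size_bytes, latency_ms)` records; `format_report` is a hypothetical name.

```python
def format_report(fast, slow):
    """Build the two-section report from (url, path, size, latency_ms) records."""
    lines = []
    for title, records in (("Fast downloaded files", fast),
                           ("Slow downloaded files", slow)):
        # Separate the second section from the first with a dashed rule.
        if lines:
            lines += ["", "----------------", ""]
        lines += [title, "URL, Size(bytes), Download time(ms)", ""]
        for url, _path, size, ms in records:
            lines.append("%s, %d, %d" % (url, size, ms))
    return "\n".join(lines)
```

`print(format_report(fast, slow))` then writes the report to the terminal.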


Run the program at least three times. Are the reports from these runs identical? Explain the similarities and differences across the runs as seen in the reports. Submit this write-up as a file called "hw1report.pdf" in your git repository. It should be at most one page long and may contain graphs where appropriate, but no more than 150 words of text in total.
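One possible way to compare runs (an illustration, not a required part of the report) is to save the set of URLs classified as slow in each run and check which URLs were slow every time versus only in some runs. `compare_runs` is a hypothetical helper.

```python
def compare_runs(slow_sets):
    """Given one collection of slow-classified URLs per run, return
    (always_slow, sometimes_slow) as sets."""
    always = set(slow_sets[0]).intersection(*slow_sets[1:])
    ever = set(slow_sets[0]).union(*slow_sets[1:])
    # URLs that switched between fast and slow across runs.
    return always, ever - always
```

A large `sometimes_slow` set suggests the classification is sensitive to run-to-run variation in network conditions rather than to stable properties of the files.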

Submission

Please upload your code using the git invitation for hw1. Please include a README that gives the author's name and contact information, a short description of the software, and the limitations of your implementation.

To grade your submission, we will run your code on bayou, so make sure your code runs on bayou. We will git clone your repository and type "python cdnornot.py urllistfile" to run your program.