COSC 4377 - Introduction to Computer Networks

Spring 2012

MW 1:00-2:30pm at PGH347

Instructor: Omprakash Gnawali

Homework 5 : Networked Service Analysis and Exam Review

Due: midnight February 22, 2012

Question 1 : HTTP Server Analysis

We will survey a list of popular websites and find out which web servers are used by those sites. Grab this list (provided by Google) of top 1000 most-visited websites:

http://www.google.com/adplanner/static/top1000/

For each site on the list, find out which web server it uses. To do this manually, you would connect to the website on port 80, issue an HTTP HEAD request, and parse the appropriate header field. If details such as the version number are available for the server, include them in the server name. One challenge in this assignment is automating this discovery process. Feel free to use shell scripts and any programming language you want, as long as your code runs on bayou without installing any external or additional libraries. You will also find that automating the last few cases requires significant effort. Use your judgment to decide whether to automate those cases or to determine manually which server those sites use. You should submit the following:

  1. Any scripts and programs you used to automate determining the web server used by the sites. How did you grab the list of URLs from the webpage above? Include instructions on how to use your scripts. Which cases broke your automation, and how did you handle them?

    The instructions and explanations regarding automation and manual handling should be in a file called README.txt.

  2. A text file that lists each site and its server, separated by a comma. The format should be:
    websiteurl,server
    

    Here are some example lines:

    www.mysite.com,Apache 2.1
    www.yoursite.com,Apache 2.1.1
    www.coolsite.com,AOLServer 2.5.1.2
    
    Because there are 1000 sites, you should have 1000 lines in this text file.

    The name of the file should be siteservers.txt

  3. Include a PDF (not MS Word or another format) with a description, less than half a page long, of which web server seems most popular among the top 1000 sites and why that might be the case. Also include two bar graphs in the same PDF. The first bar graph should list each unique server and the number of sites that use it. The second bar graph should present aggregate information: each vendor/organization and the number of sites that use a server sold or developed by that vendor/organization. For example, you will group Apache 2.1, Apache 2.2, etc. as Apache.

    The name of this file should be q1.pdf.
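
The two tallies behind the bar graphs in item 3 could be computed along the following lines. This is only a sketch in Python 3: the helper names vendor_of and count_servers are hypothetical, and the rule of treating the leading run of letters and hyphens as the vendor is an assumption that will not group every server name correctly.

```python
import re
from collections import Counter


def vendor_of(server):
    """Strip version details, e.g. 'Apache 2.1' and 'Apache/2.2.3' -> 'Apache'.

    Assumption: the vendor/product name is the leading run of letters and hyphens.
    """
    match = re.match(r"[A-Za-z][A-Za-z\-]*", server.strip())
    return match.group(0) if match else "unknown"


def count_servers(lines):
    """Count sites per exact server string and per aggregated vendor name."""
    per_server, per_vendor = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Each line is "websiteurl,server"; split on the first comma only.
        _site, server = line.split(",", 1)
        per_server[server] += 1
        per_vendor[vendor_of(server)] += 1
    return per_server, per_vendor
```

Passing the lines of siteservers.txt to count_servers yields the counts for the first graph (per_server) and the second graph (per_vendor). Names that the simple rule groups badly can be corrected by hand.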

All files submitted in response to Question 1 should be inside a folder called q1 inside the uhid_hw5 folder.
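
The manual procedure described in Question 1 (connect on port 80, issue a HEAD request, read the header) might be automated roughly as follows. This is a minimal sketch using only the Python 3 standard library, not a complete solution: the function names are hypothetical, it assumes the relevant header is Server, and it does not handle redirects, timeouts, or sites that omit the header.

```python
import http.client


def parse_server_header(raw_response):
    """Return the Server header value from a raw HTTP response, or 'unknown'."""
    # Headers end at the first blank line; the first line is the status line.
    head = raw_response.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return "unknown"


def fetch_server(host, timeout=5.0):
    """Issue a HEAD request to host on port 80 and return its Server header."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.getheader("Server", "unknown")
    finally:
        conn.close()
```

fetch_server covers the common case. parse_server_header is useful when the response was captured by hand, for example from a telnet session, and only the raw text is available.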

Question 2 : Moving Averages

Linux provides a way to probe various statistics about the networking stack using special files under /proc/net. To get a count of the number of bytes transmitted and received on an interface, we can run the following command:
$ cat /proc/net/dev
which will produce the following output:
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 22343841    6372    0    0    0     0          0         0 22343841    6372    0    0    0     0       0          0
  eth0: 3889365741 12749462    0   50    0     0          0         0 1407816270 3806761    0    0    0     0       0          0

The output reports, for each interface, the number of bytes received (first column), the number of packets received (second column), and so on.
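
As an illustration of how these columns can be read programmatically, the sketch below extracts the received and transmitted byte counters for each interface from text in the format shown above. It is written in Python 3, the function name parse_proc_net_dev is hypothetical, and it assumes the layout shown here: eight receive counters followed by eight transmit counters.

```python
def parse_proc_net_dev(text):
    """Map interface name -> (bytes received, bytes transmitted)."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        # Split on the colon first, since some kernels print "eth0:123"
        # with no space between the name and the first counter.
        name, fields = line.split(":", 1)
        values = fields.split()
        # Index 0 is receive bytes; index 8 is transmit bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters
```

On a Linux machine, the function can be given the contents of the file directly, for example parse_proc_net_dev(open("/proc/net/dev").read()).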

We ran a script that took a snapshot of these counters approximately every 100 ms and extracted just the bytes received and transmitted into two files: devsnap1.txt and devsnap2.txt. Each line in these files has a row number, bytes received, and bytes transmitted. Each row corresponds to one snapshot.

G1: Draw a line graph that shows time on the x-axis and the moving average with a window of 1s on the y-axis. This graph will have 30 data points because we have 30 seconds of data.

G2: Same as G1, except the y-axis should show the exponentially weighted moving average (EWMA) with a window of 1s and alpha of 0.8.

G3: Same as G2, except use alpha of 0.2.

Please put G1, G2, and G3 as three different lines on a single graph. Label your lines as G1, G2, and G3. You will have two graphs corresponding to the two data sets. Put the two graphs on a single page.
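
One possible way to compute the three series is sketched below in Python 3. It rests on several assumptions that the assignment leaves open: the snapshot values are cumulative counters, so per-interval byte counts are obtained by differencing; a 1s window means 10 consecutive, non-overlapping 100 ms samples; and alpha is the weight given to the newest value. All function names are hypothetical.

```python
def deltas(cumulative):
    """Bytes transferred in each interval, from cumulative counter snapshots."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def window_means(samples, window=10):
    """Mean of each consecutive, non-overlapping block of `window` samples.

    A trailing partial block is dropped.
    """
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def ewma(values, alpha):
    """EWMA seeded with the first value.

    Assumed convention: new = alpha * value + (1 - alpha) * previous.
    """
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```

Under these assumptions, G1 would be window_means(deltas(received)), and G2 and G3 would be ewma applied to that series with alpha set to 0.8 and 0.2. If your course notes define alpha as the weight on the previous average instead, swap the two weights.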

Please answer the following questions:

  1. How did you compute these averages?
  2. Which of the three averages works better for the data?
  3. When we were taking the snapshots described above, there were some file transfers happening in the background. When did the file transfer start and end? Please annotate these times on the graph. What data rate was used for the transfers? How many total bytes were received and transmitted? How many transfers took place during the experiment?
  4. Comment on the received and transmitted bytes you see. What type of file transfer activity was it?

Your solution to Question 2 should be in a single 2-page PDF called q2.pdf, with a font size no smaller than 12 point. The PDF should include your graphs. Do not submit the graphs separately.

Submission

Put all your files in a single folder with the name: uhid_hw5, where uhid is the prefix of your .uh.edu email address. Then, zip the directory and upload the zip file using Blackboard.