COSC 6377 - Computer Networks

Fall 2012

MW 1:00-2:30pm at PGH376

Instructor: Omprakash Gnawali

Homework 2: Internet Measurement

Due: November 7, 2012

In this assignment, we learn how Internet performance measurements are done. The first step is data collection. We will not write software to collect data ourselves. Instead, we will use data already collected by an existing measurement project. To know how to use the data, we need to understand what the measurement software does. Once we download the data, we will process it to derive conclusions about the performance of the network.

Performance Measurement Project

A study was published in IMC 2011 that showed one way to measure ISP traffic shaping. You should read the following paper thoroughly to understand the measurement technique:

End-to-end Detection of ISP Traffic Shaping using Active and Passive Methods by Partha Kanuparthy and Constantine Dovrolis.

We will process the data collected in the project above, so it is important that you understand the measurement methodology. Appendix A goes into the details of the implementation, but to make sense of it you will have to read the rest of the paper.

Measurement Infrastructure

The project used a network performance measurement infrastructure called the Measurement Lab. Please visit their site at www.measurementlab.net to learn more about the infrastructure. Pay attention to how the measurements are done and how the data are stored.

Downloading The Data

The Measurement Lab website describes different ways to download the data. The easiest way might be to use a tool called gsutil. Do not try to use curl or wget with the public URL method: the URLs require Google authentication, which a plain wget or curl fetch does not handle.

We are interested in data collected by the shaperprobe project. That information will help you compose the URL to be used with gsutil.

Data Analysis

Q1: How many measurements were performed during July 2011 - October 2012? How many unique IP addresses were seen per day during the same period? Downloading the complete data for this period would take a long time, so we will take a shortcut: download the data for a single day of each month, the day given by the sum of the last two digits of your ID, and use it as representative data (a bad one!) for that month. For example, if the last two digits of your ID are 11, you will download the data for the 2nd day of each month. You will have two data points (number of measurements and number of unique IP addresses) per month. Present your result in a graph similar to Figure 1 in the paper. In your report, put Figure 1 from the paper and your figure side by side and comment on the differences or similarities you see. Also explain why the number of IP addresses is larger or smaller than the number of runs.
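The day selection for Q1 can be sketched in a few lines of Python. This is only an illustration: the function names `last_two_digits` and `q1_dates` are made up for this example, and the handout does not say what to do if the digit sum is 0 (an ID ending in 00), so the sketch simply lets the date constructor reject that case.

```python
from datetime import date


def last_two_digits(student_id: str) -> tuple[int, int]:
    """Return the second-to-last and last digit of an ID as integers."""
    digits = [int(c) for c in student_id if c.isdigit()]
    if len(digits) < 2:
        raise ValueError("student ID must contain at least two digits")
    return digits[-2], digits[-1]


def q1_dates(student_id: str) -> list[date]:
    """One date per month from July 2011 through October 2012.

    The day of the month is the sum of the last two ID digits,
    e.g. an ID ending in 11 gives day 1 + 1 = 2.
    A sum of 0 is not a valid day; date() raises ValueError then.
    """
    tens, ones = last_two_digits(student_id)
    day = tens + ones
    months = [(2011, m) for m in range(7, 13)] + [(2012, m) for m in range(1, 11)]
    return [date(year, month, day) for year, month in months]
```

Calling `q1_dates` with an ID ending in 11 yields 16 dates, from 2011-07-02 to 2012-10-02, one per month to download.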

Q2: Compute the aggregate packet rate (not data rate) using the first 50-packet train in the file and the 5-s packet train (which starts after "### MEAS ###" in the data) for all the nodes during the designated days. Plot the 5-s rate (x-axis) vs. the 50-packet rate (y-axis) as a scatter plot. You may compute the rate using either the client (first column) or the server (second column) timestamp, but indicate which one you are using and use the same one for both the 50-packet and 5-s trains.
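As a rough sketch of the Q2 computation, a packet rate can be taken as the number of inter-packet intervals divided by the time between the first and last packet of a train. The code below rests on several assumptions that should be checked against the real files and the paper: that each packet is one whitespace-separated line with a numeric timestamp in seconds in the chosen column, that every numeric line before the "### MEAS ###" marker belongs to 50-packet trains, and that every numeric line after it belongs to the 5-s train. The names `packet_rate`, `read_column`, and `train_rates` are hypothetical.

```python
MEAS_MARKER = "### MEAS ###"


def packet_rate(timestamps):
    """Packets per second: (n - 1) intervals over the train duration."""
    if len(timestamps) < 2:
        raise ValueError("need at least two packets to compute a rate")
    duration = timestamps[-1] - timestamps[0]
    if duration <= 0:
        raise ValueError("train duration must be positive")
    return (len(timestamps) - 1) / duration


def read_column(lines, column):
    """Collect the numeric values found in one column, skipping other lines."""
    values = []
    for line in lines:
        fields = line.split()
        try:
            values.append(float(fields[column]))
        except (IndexError, ValueError):
            continue  # header, marker, or blank line (assumed non-numeric)
    return values


def train_rates(text, column=0):
    """Return (50-packet rate, 5-s rate) for one measurement file.

    column=0 uses the client timestamp, column=1 the server timestamp.
    """
    lines = text.splitlines()
    marker_at = None
    for i, line in enumerate(lines):
        if line.strip() == MEAS_MARKER:
            marker_at = i
            break
    if marker_at is None:
        raise ValueError("marker line not found")
    first_train = read_column(lines[:marker_at], column)[:50]
    meas_train = read_column(lines[marker_at + 1:], column)
    return packet_rate(first_train), packet_rate(meas_train)
```

Each file then contributes one (5-s rate, 50-packet rate) point to the scatter plot. Dividing by n - 1 rather than n is a choice made for this sketch; whichever convention is used should be applied to both trains.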

Q3: We can assume that the server connectivity is well provisioned, but we cannot say the same about the clients that perform the test. Could their performance be different at different times of the day? We do not have the luxury of asking the same client to perform the test at different times of the day. We can still try to understand the performance as a function of the hour of the day. Compute the 5-s packet rate for all the nodes. The first column in the data file is the sender timestamp in Unix time. Group all the rates by hour of the day and plot a graph that shows the packet rates (y-axis) at different hours of the day (x-axis). Use a boxplot to show the distribution of the rate for each hour.
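The grouping step for Q3 might look like the following sketch. It assumes each measurement has already been reduced to a pair of (Unix timestamp, 5-s packet rate) and that the hour is taken in UTC; the handout does not say which time zone to use, so that choice should be stated in the report. The name `group_rates_by_hour` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_rates_by_hour(samples):
    """Group packet rates by the UTC hour (0-23) of their Unix timestamp.

    samples: iterable of (unix_timestamp_seconds, packet_rate) pairs.
    Returns a dict mapping hour -> list of rates seen in that hour.
    """
    by_hour = defaultdict(list)
    for unix_ts, rate in samples:
        hour = datetime.fromtimestamp(unix_ts, tz=timezone.utc).hour
        by_hour[hour].append(rate)
    return dict(by_hour)
```

A list such as `[groups.get(h, []) for h in range(24)]` gives one group per hour in order, which is the shape that boxplot functions in common plotting libraries (for example matplotlib's `boxplot`) accept.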

We will use the last two digits of your ID to determine the portion of the data to analyze for Q2 and Q3. The second-to-last digit of your ID gives the month, and the last digit + 1 gives the day. The year is always 2012. For example, if the last two digits of your ID are 23, you should analyze the shaperprobe data collected on February 4, 2012.
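The mapping from ID digits to the Q2/Q3 date can be written as a short Python sketch. The function name `q2_q3_date` is made up for this example. The handout does not cover an ID whose second-to-last digit is 0 (month 0), so the sketch lets the date constructor raise an error in that case rather than guessing.

```python
from datetime import date


def q2_q3_date(student_id: str) -> date:
    """Date of the shaperprobe data to analyze for Q2 and Q3.

    month = second-to-last ID digit, day = last ID digit + 1, year = 2012.
    A month digit of 0 is not a valid month; date() raises ValueError then.
    """
    digits = [int(c) for c in student_id if c.isdigit()]
    if len(digits) < 2:
        raise ValueError("student ID must contain at least two digits")
    month = digits[-2]
    day = digits[-1] + 1
    return date(2012, month, day)
```

For an ID ending in 23 this returns 2012-02-04, matching the example in the text.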

Submission

Your submission consists of two parts. The first part is the graphs and writeup for the three questions. State the last two digits of your ID and the month and day they correspond to. The second part is the scripts, code, and commands that you used to download and process the data. Include all of them in the PDF, with a description of how to run your scripts and code. Upload this single PDF to Blackboard.