Instructor: Omprakash Gnawali
Write a program that encodes a source route as a bloom filter. You will be given an input file that represents the path, with the nodes along the path separated by spaces. For example, if the file contains the following:
1 2 3 4 5
then the path starts at node 1 and ends at node 5 with 2, 3, and 4 as intermediate nodes.
Compute the bloom filter as follows. First, compute an index into the filter. Typically, a hash function is used to compute the index; in this assignment, we will use a special hash function:
str = "node1,node2"
hashval = asciivalue(md5sumhex(str)) mod n
md5sumhex() returns a string of length 32: the MD5 message digest of its argument in hexadecimal digits. asciivalue() returns the sum of the ASCII values of all the characters in its string argument. For example, asciivalue("abc") returns 294 (97 + 98 + 99).
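A minimal Python sketch of this hash function follows. It assumes the hex digest uses lowercase letters, which is what Python's hashlib and the md5sum tool produce; an uppercase digest would change the ASCII sum and therefore every index. The helper names mirror the pseudocode above; passing n as a parameter to hashval is our own choice.

```python
import hashlib


def asciivalue(s: str) -> int:
    """Return the sum of the ASCII codes of all characters in s."""
    return sum(ord(c) for c in s)


def md5sumhex(s: str) -> str:
    """Return the 32-character MD5 digest of s as lowercase hex (assumed case)."""
    return hashlib.md5(s.encode("ascii")).hexdigest()


def hashval(s: str, n: int) -> int:
    """Map a string such as the edge "node1,node2" to an index in 0..n-1."""
    return asciivalue(md5sumhex(s)) % n


if __name__ == "__main__":
    print(asciivalue("abc"))      # 294, as in the example above
    print(hashval("1,2", 40))     # some index in 0..39
```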
This hash function converts each edge in the path to an index (hashval) in the range 0..n-1. For example, if the path is n1->n2->n3, compute the hash of "n1,n2" and of "n2,n3". Maintain a bit array of n bits and set the bit at each computed index: if the hashes of "n1,n2" and "n2,n3" are 2 and 5, set the bits at indices 2 and 5 to 1. The resulting bit array is our bloom filter.
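The edge-to-bit step can be sketched as follows, keeping the filter as a plain list of 0/1 values (the byte-array form suggested for debugging). path_edges and build_filter are hypothetical helper names, and the digest is again assumed to be lowercase hex.

```python
import hashlib


def hashval(s: str, n: int) -> int:
    """ASCII sum of the lowercase-hex MD5 digest of s, mod n."""
    digest = hashlib.md5(s.encode("ascii")).hexdigest()
    return sum(ord(c) for c in digest) % n


def path_edges(path: list[str]) -> list[str]:
    """Turn consecutive nodes into edge strings: ["1","2","3"] -> ["1,2", "2,3"]."""
    return [f"{a},{b}" for a, b in zip(path, path[1:])]


def build_filter(path: list[str], n: int) -> list[int]:
    """Return an n-element list of 0/1 values with the bit at each edge's hash set."""
    bits = [0] * n
    for edge in path_edges(path):
        bits[hashval(edge, n)] = 1    # two edges may collide on the same index
    return bits


if __name__ == "__main__":
    path = "1 2 3 4 5".split()
    print(path_edges(path))           # ['1,2', '2,3', '3,4', '4,5']
    print(build_filter(path, 40))
```

Because different edges can hash to the same index, a path with k edges sets at most k bits, not exactly k.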
Your job is to convert the path given in the input file to a bloom filter of n bits. We will run your encoder like this:
./write-route route_filename output_filename filter_size
where filter_size is the size of the filter in bits and output_filename is the file to which you write your bloom filter. Write the size of the filter as the first byte of the file, followed by the filter starting at the second byte. For example, if your filter is 5 bytes long, the output file contains six bytes: 5, A, B, C, D, E, where A..E is the bit array representing your filter. All of these bytes are written one after another in binary format.
For debugging, you may store the filter as a byte array and write it to the file as human-readable text, but your final version must write the bit array in binary format.
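A sketch of the binary encoder in Python, under several assumptions that the handout does not fix. The first byte holds filter_size in bits (so it must be at most 255); the handout's example could also be read as a byte count, but the decoder needs n for the mod, and Q4 uses sizes such as 5 and 10 bits that are not whole bytes. Bit i is stored at position i % 8 (least significant bit first) of filter byte i // 8, and the filter occupies ceil(n / 8) bytes. encode_route and main are hypothetical names.

```python
#!/usr/bin/env python3
import hashlib
import sys


def hashval(s: str, n: int) -> int:
    """ASCII sum of the lowercase-hex MD5 digest of s, mod n."""
    digest = hashlib.md5(s.encode("ascii")).hexdigest()
    return sum(ord(c) for c in digest) % n


def encode_route(path: list[str], n: int) -> bytes:
    """Return the output file contents: one size byte, then the packed filter."""
    if not 1 <= n <= 255:
        raise ValueError("filter_size must fit in a single byte (1..255)")
    packed = bytearray((n + 7) // 8)          # ceil(n / 8) zero bytes
    for a, b in zip(path, path[1:]):
        i = hashval(f"{a},{b}", n)
        packed[i // 8] |= 1 << (i % 8)        # assumed LSB-first bit order
    return bytes([n]) + bytes(packed)


def main(argv: list[str]) -> None:
    route_filename, output_filename = argv[1], argv[2]
    filter_size = int(argv[3])
    with open(route_filename) as f:
        path = f.read().split()               # nodes separated by whitespace
    with open(output_filename, "wb") as f:    # binary mode: raw bytes, no text
        f.write(encode_route(path, filter_size))


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv)
    else:
        print("usage: write-route route_filename output_filename filter_size",
              file=sys.stderr)
```

Saved as an executable script named write-route, it would be invoked exactly as shown above: ./write-route route_filename output_filename filter_size.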
If we were building a network protocol, the filter would be put on a packet and forwarded to the next hop. In our example, we are going to write the filter to a file and write another program that will decode the filter stored in the file. With this exercise, we are essentially simulating sending a packet (writing to a file), receiving a packet (reading from a file and decoding the information), and the process by which an intermediate router will decide the next hop.
For this part, assume an input file specifies the topology of the network. Each line describes one node: the first column is the node, and the remaining columns are that node's neighbors. For example, if the input file is:
1 2 3
5 900 901
then the network has 6 nodes. Node 1 has links to nodes 2 and 3, and node 5 has links to nodes 900 and 901. Links work in both directions: nodes 2 and 3 are not neighbors of each other, and each has exactly one neighbor, node 1.
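Reading the topology file into a neighbor map can be sketched in Python as below. It treats each listed link as bidirectional, as the example above implies, and keeps node IDs as strings so they can go straight into the "node1,node2" edge string. read_topology is a hypothetical helper name.

```python
def read_topology(text: str) -> dict[str, set[str]]:
    """Map every node to its set of neighbors, treating each link as bidirectional."""
    neighbors: dict[str, set[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                                      # skip blank lines
        node, rest = fields[0], fields[1:]
        neighbors.setdefault(node, set())                 # node may list no neighbors
        for other in rest:
            neighbors[node].add(other)
            neighbors.setdefault(other, set()).add(node)  # link works both ways
    return neighbors


if __name__ == "__main__":
    topo = read_topology("1 2 3\n5 900 901\n")
    print(len(topo))            # 6
    print(sorted(topo["2"]))    # ['1']
```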
Your decoder program reads the filter (written by write-route) and the topology file, and decides the next hop for nodeid. For example:
./next-hop filter_filename topology_filename nodeid
The program then displays the next hop(s) for nodeid. First, read the topology file to find all the neighbors of nodeid. Then, for each edge from nodeid to a neighbor, look up that edge in the filter using the hash function above.
Let's say the route is 1->2->3->4 and node 2 has nodes 3, 5, and 10 as neighbors. If we call next-hop for nodeid 2, it should display 3 as the next hop (assuming no false positives). If there are false positives, it should display all the possible next hops.
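The lookup can be sketched in Python as follows. It assumes the same file layout as a bit-count first byte followed by LSB-first packed bits; if your encoder packs bits differently, decode_filter must change to match. decode_filter and next_hops are hypothetical names, and neighbors is assumed to be a dict mapping each node ID string to a set of neighbor ID strings built from the topology file.

```python
import hashlib


def hashval(s: str, n: int) -> int:
    """Same hash as the encoder: ASCII sum of the lowercase-hex MD5 digest, mod n."""
    digest = hashlib.md5(s.encode("ascii")).hexdigest()
    return sum(ord(c) for c in digest) % n


def decode_filter(data: bytes) -> tuple[int, list[int]]:
    """Split file contents into (n, bits).

    Assumes byte 0 holds n in bits and bit i sits at position i % 8 of byte
    1 + i // 8 (least significant bit first).
    """
    n = data[0]
    bits = [(data[1 + i // 8] >> (i % 8)) & 1 for i in range(n)]
    return n, bits


def next_hops(data: bytes, neighbors: dict[str, set[str]], nodeid: str) -> list[str]:
    """Neighbors of nodeid whose edge "nodeid,neighbor" tests positive in the filter."""
    n, bits = decode_filter(data)
    return sorted(nb for nb in neighbors.get(nodeid, set())
                  if bits[hashval(f"{nodeid},{nb}", n)])
```

With data read via open(filter_filename, "rb").read(), next_hops returns every candidate. A bloom filter has no false negatives, so the true next hop is always included; more than one entry means at least one false positive.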
Q 1 Let's say our path has 50 hops and we use a filter of 40 bits. What is the probability of a false positive, assuming uniform hashing?
Q 2 What happens to our routing/forwarding mechanism if there is a false positive? Did you encounter any false positives? If so, provide the command-line arguments, route input file, and topology file so we can reproduce them.
Q 3 Plot the CDF of the index values that our hash function computes for the strings "1" through "10000", for filters of 50 and 100 bits. Is our hash function biased?
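One way to produce the data for this plot, sketched in Python under the same lowercase-hex assumption: count how many strings land on each index and accumulate the counts into an empirical CDF. index_cdf is a hypothetical helper; the resulting list can be passed to any plotting tool (for example, matplotlib's step plot) against the indices 0..n-1. The printed value is the largest gap between this CDF and the straight line (i + 1) / n that a perfectly uniform hash would give.

```python
import hashlib
from itertools import accumulate


def hashval(s: str, n: int) -> int:
    """ASCII sum of the lowercase-hex MD5 digest of s, mod n."""
    digest = hashlib.md5(s.encode("ascii")).hexdigest()
    return sum(ord(c) for c in digest) % n


def index_cdf(num_strings: int, n: int) -> list[float]:
    """Empirical CDF over indices 0..n-1 for the strings "1".."num_strings"."""
    counts = [0] * n
    for k in range(1, num_strings + 1):
        counts[hashval(str(k), n)] += 1
    return [c / num_strings for c in accumulate(counts)]


if __name__ == "__main__":
    for n in (50, 100):
        cdf = index_cdf(10000, n)
        worst = max(abs(cdf[i] - (i + 1) / n) for i in range(n))
        print(n, round(worst, 4))   # max gap from the uniform CDF
```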
Q 4 Complete this table experimentally for each of these input sets: "1".."20", "1".."40", "1".."60", "1".."80", and "1".."100". Here "1".."20" represents the twenty strings "1", "2", "3", ..., "20"; the argument to our hash function is the integer formatted as a string, not the "node1,node2" edge string used above.
| Filter Size (bits) | Compression Ratio | False Positive Rate |
|---|---|---|
| 5 | | |
| 10 | | |
| 20 | | |
| 30 | | |
| 40 | | |
Then draw one graph with five lines, one for each input set, with filter size on the x-axis and false positive rate on the y-axis.
Q 5 Why does the encoder need to put the size of the filter as the first byte in the output file?