The goal of the project is to design and implement a distributed sorting algorithm using UDP transport layer. The input will be a text file containing a list of integer keys separated by blanks. The output should have exactly the same format, except that the numbers should be sorted in increasing order. Only one processor should read the input and write to the output file. The general idea is that one node will read input and distribute the keys among nodes. All the nodes will then combine to perform the sort. Finally, one node will output the sorted keys in a file.
You have a lot of freedom regarding how you would implement this project - basically any reasonable approach is fine. You can choose any sorting algorithm, but it should scale. This means that, as the number of processors is increased, the work done by each processor should decrease proportionally (approximately). Obviously, a single processor should not sort the entire list, or do any operation on all keys that is not linear in the number of keys. But a single processor may do a linear operation on the entire list, e.g., it could merge two arrays or lists of sorted keys, or it could separate a list of keys into buckets, where each bucket is an array or list of numbers in a range of values. You should also avoid unnecessary communication. For example, it is not desirable to broadcast the entire array of keys to all processors.
You can develop any scheme to overcome messages lost or reordered by the UDP transport layer. (You may want to write a version of your socket routines to randomly lose a packet with a fixed probability, for testing). However, you should keep in mind that the purpose of this scheme is simply to get sorting done reliably, and not a general purpose reliable transport like TCP.
You can make reasonable assumptions. Here are some examples:
Some of the assumptions you can make about the keys.
- The keys are integers in a range, say, 1..10,000
- There is an upper bound on the total number of keys, say 10,000.
Do not assume that the number of processors is a constant, but you may make other reasonable assumptions. For instance, you may assume:
- The maximum number of processors cannot be more than, say, 64.
- The number of processors is even, or a power of 2 or a perfect square¡K
- The number of keys is a multiple of the number of processors.
You have to take into account that messages may get lost or reordered in transit when using UDP. However, if it helps, you may assume:
- Messages rarely get reordered.
- The fraction of lost messages is relatively small.
- Performance is of less important if messages do get lost.
You may make other reasonable assumptions. But please make sure you list them in your design document and the final report. Also, you probably will not need to some of the assumptions above. Don't think you are missing the solution if you don't need some or all of the above sample assumptions.
April 2 (in class): Submit a plan of how you will implement the project. It should contain the list of person(s) in the team, what kind of algorithm you will use for sorting, how you will implement it in the sockets framework, and how you will recover from any loss and reordering of packets done by UDP. This would typically be about one page. The main purpose of this document is to make sure that you have a good plan, so don't waste time formatting and styling.
April 20 (10 PM): Submission of a final project report. The report should contain the following:
- Your project design and implementation scheme.
- Execution time of you program on some examples, e.g,, for sorting 1000 and 10,000 keys on 2 and 4 nodes
- Location of your code on CS pegasus file system. Do not modify your code after the submission date. Please do not append your code or example files here.
- This report should not exceed approximately 1000 words, excluding any figures or charts.
You will be asked to pick an appointment slot and demonstrate your project the week following the submission.
You are free to discuss your design with other students and exchange example test inputs etc. But you must write your reports and develop your code independently. We are very eager to catch cheating with our toolset to find commonalities in documents and code. Don't fall in the trap.
10 points per day out of a total of 100 points for the final project submission.
5 points per day for late submission of your design document.
Sample UDP Client/Server Programs:
Here are sample UDP programs.
Here is sample code for timer which
show you how to use "gettimeofday".
Test Data:
Here is the source (testdata.c) for
generating test data. The default setting will generate 500 numbers
in the range of [0 .. 9999]. Change the program to fit your needs.