Reading list for COSC 7397 for Spring 2008.

Italicized papers are supplementary material.

DATE / TOPICS

PAPERS

Jan 15

Course Intro (Jaspal)

SETI@home: An Experiment in Public-Resource Computing: Anderson, Cobb, Korpela, Lebofsky, Werthimer (pdf)

Jan 22

General Topics

How to give a talk (Jaspal)

How to write a critique (Rong)

Jan 29

Global Computing Systems

BOINC: A System for Public-Resource Computing and Storage: Anderson (pdf)

Entropia: Architecture and Performance of an Enterprise Desktop Grid System: Chien, Calder, Elbert, Bhatia (pdf)

The Entropia Virtual Machine for Desktop Grids: Calder, Chien, Wang, Yang (pdf)

Characterizing and Evaluating Desktop Grids: Kondo, Taufer, Brooks, Casanova, Chien (pdf)

The Computation and Storage Potential of Volunteer Computing: Anderson, Fedak (pdf)

Feb 5

Global Computing Applications

Distributed Computing in Practice: The Condor Experience: Thain, Tannenbaum, Livny (pdf)

Condor -- A Hunter of Idle Workstations: Litzkow, Livny, Mutka (pdf)

Experiences Implementing PlanetLab: Peterson, Bavier, Fiuczynski, Muir (pdf)

The Google File System: Ghemawat, Gobioff, Leung (SOSP 2003) (pdf)

Bigtable: A Distributed Storage System for Structured Data: Chang, Dean, et al. (OSDI 2006) (pdf)

Feb 12

Networks and Grids

SmartSockets: Solving the Connectivity Problems in Grid Computing: Maassen, Bal (pdf)

Automatic Construction and Evaluation of Performance Skeletons: Sodhi, Subhlok (pdf) (presented by Xu)

Feb 19

Fault-Tolerant MPI

Intro to MPI: Edgar Gabriel (pdf)

The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI: Hursey, Squyres, Mattox, Lumsdaine (pdf)

MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes: Bosilca, Bouteiller, Cappello, Djilali, Fedak, Germain, Herault, Lemarinier, Lodygensky, Magniette, Neri, Selikhov (pdf)

Feb 26

Virtualization

Xen and the Art of Virtualization: Barham, Dragovic, Fraser, Hand, Harris, Ho, Neugebauer, Pratt, Warfield (SOSP 2003) (pdf)

A Case for Grid Computing on Virtual Machines: Figueiredo, Dinda, Fortes (pdf)

Mar 4

Virtualization and MPI

Towards Virtual Networks for Virtual Machine Grid Computing: Sundararaj, Dinda (pdf)

Executing MPI Programs on Virtual Machines in an Internet Sharing System: Pan, Ren, Eigenmann, Xu (pdf)

Amazon Web Services (link). Focus on the Elastic Compute Cloud, Simple Storage Service, and Simple Queue Service.

Google/IBM Cloud (link1) (link2)

Mar 11

Project proposal presentation

Mar 18

Midterm break

Mar 25

Distributed Mobility

Persistent Personal Names for Globally Connected Mobile Devices: Ford, Strauss, Lesniewski-Laas, Rhea, Kaashoek, Morris (OSDI 2006) (pdf)

CarTel: A Distributed Mobile Sensor Computing System: Hull, Bychkovsky, Chen, Goraczko, Miu, Shih, Zhang, Balakrishnan, Madden (ACM SenSys 2006) (pdf)

April 1

Project iteration presentation 1

April 8

Network Evaluation

The Flexlab Approach to Realistic Evaluation of Networked Systems: Ricci, Duerig, Sanaga, Gebhardt, Hibler, Atkinson, Zhang, Kasera, Lepreau (NSDI 2007) (pdf)

An Experimentation Workbench for Replayable Networking Research: Eide, Stoller, Lepreau (NSDI 2007) (pdf)

April 15

Project iteration presentation 2

April 22

Open for project development

April 29

Project final presentation

Additional Papers

Resource Policing to Support Fine-Grain Cycle Stealing in Networks of Workstations: Ryu, Hollingsworth (pdf)

Exploiting Fine-Grained Idle Periods in Networks of Workstations: Ryu, Hollingsworth (pdf)

Design and Implementation Tradeoffs for Wide Area Resource Discovery: Oppenheimer, Albrecht, Patterson, Vahdat (pdf)

Predicting Internet Network Distance with Coordinates-Based Approaches: Ng, Zhang (pdf)

Automatic Clustering of Grid Nodes: Xu, Subhlok (pdf)

A survey of rollback-recovery protocols in message-passing systems: Elnozahy, Alvisi, Wang, Johnson (pdf)

A Worldwide Flock of Condors: Load Sharing among Workstation Clusters: Epema, Livny, van Dantzig, Evers, Pruyn (pdf)

Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers: Gioiosa, Sancho, Jiang, Petrini, Davis (pdf)

The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing: Sankaran, Squyres, Barrett, Lumsdaine, Duell, Hargrove, Roman (pdf)

MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Message-Passing Middleware for Performance-Portable Parallel Computing: Batchu, Neelamegam, Cui, Beddhu, Skjellum, Dandass, Apte (pdf)

Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing: Fagg, Gabriel, Chen, Angskun, Bosilca, Pjesivac-Grbovic, Dongarra (pdf)

Transparent Network Services via a Virtual Traffic Layer for Virtual Machines: Lange, Dinda (pdf)