High Performance Systems Lab University of Houston Department of CS at UH

Performance Skeletons

The performance skeleton of an application is a short-running program whose performance in any scenario reflects the performance of the application it represents. Such a skeleton can be used to quickly estimate the performance of a large application in a new and unpredictable environment.
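As a rough illustration of how a skeleton might be used for estimation, the sketch below assumes that the ratio between application and skeleton run time measured on a reference system carries over to the new environment. The function name `predict_runtime`, its parameters, and the numbers are hypothetical and are not taken from the project's tools.

```python
def predict_runtime(app_time_ref, skel_time_ref, skel_time_new):
    """Estimate application run time in a new environment.

    Assumption (not from the project's code): the application-to-skeleton
    time ratio observed on the reference system also holds elsewhere.
    """
    if skel_time_ref <= 0:
        raise ValueError("reference skeleton time must be positive")
    scaling = app_time_ref / skel_time_ref  # how much longer the application runs
    return scaling * skel_time_new


# Hypothetical numbers: the application took 600 s and the skeleton 6 s on the
# reference system; the skeleton takes 9 s in the new environment.
estimate = predict_runtime(app_time_ref=600.0, skel_time_ref=6.0, skel_time_new=9.0)
print(estimate)  # 900.0
```

Only the short skeleton run is needed in the new environment, which is what makes the estimate quick to obtain.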

Skeleton Construction
The approach is based on capturing the execution behavior of an application and automatically generating a synthetic skeleton program that reflects that execution behavior.

Illustration of our Approach

  1. Record application's execution trace: The application is executed on a controlled test bed and its execution activity, specifically CPU usage and message exchanges, is recorded. This is the execution trace.
  2. Trace logicalization: The communication pattern is identified by analyzing the communication traffic in the execution traces. The family of traces from the parallel execution is then converted to a single trace, with communication between physical neighbors converted to communication between logical neighbors. Pictures of the communication patterns of the NAS benchmarks are available.
  3. Compress execution trace into an execution signature: The repeated patterns in the recorded execution trace are identified and used to generate a compact representation of the trace by introducing a "loop structure". The new compact representation is the execution signature.
  4. Generate performance skeleton program from the execution signature: The application execution signature is converted to a computer program which generates execution activity that is similar to the recorded execution signature but with execution time scaled down by a given factor K. This is the performance skeleton.
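Steps 3 and 4 above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's algorithm: the trace is assumed to be already logicalized, event durations are assumed to be quantized so that repeated iterations compare equal, only a single loop level is detected (no loop nests), and scaling by K is done by dividing loop counts and stand-alone compute times. The event format and the names `compress` and `scale_down` are hypothetical.

```python
def compress(trace, max_len=8):
    """Greedily fold immediately repeated blocks into ("loop", count, body) nodes."""
    signature = []
    i = 0
    while i < len(trace):
        best_len, best_reps = 1, 1
        # Try candidate loop bodies of increasing length starting at position i.
        for length in range(1, min(max_len, (len(trace) - i) // 2) + 1):
            body = trace[i:i + length]
            reps = 1
            while trace[i + reps * length:i + (reps + 1) * length] == body:
                reps += 1
            # Keep the candidate covering the most events; ties favor shorter bodies.
            if reps >= 2 and reps * length > best_len * best_reps:
                best_len, best_reps = length, reps
        if best_reps >= 2:
            signature.append(("loop", best_reps, trace[i:i + best_len]))
        else:
            signature.append(trace[i])
        i += best_len * best_reps
    return signature


def scale_down(signature, k):
    """Shrink a signature by roughly a factor k (one possible scaling policy)."""
    skeleton = []
    for node in signature:
        if node[0] == "loop":
            _, count, body = node
            skeleton.append(("loop", max(1, count // k), body))  # fewer iterations
        elif node[0] == "compute":
            skeleton.append(("compute", node[1] / k))  # shorter compute phase
        else:
            skeleton.append(node)  # communication outside loops is kept as-is
    return skeleton


# Hypothetical logicalized trace: setup, 100 identical iterations, wrap-up.
iteration = [("compute", 0.5), ("send", "right", 1024), ("recv", "left", 1024)]
trace = [("compute", 2.0)] + iteration * 100 + [("compute", 1.0)]

signature = compress(trace)
skeleton = scale_down(signature, k=10)
print(len(trace), len(signature))  # 302 3
print(skeleton[1][:2])             # ('loop', 10)
```

In this sketch the 302-event trace collapses to a three-node signature, and the skeleton repeats the loop body 10 times instead of 100. A real skeleton generator would then emit a program (for example, MPI code) that performs the corresponding compute and communication operations.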


  • J. Subhlok and Q. Xu, Automatic Construction of Coordinated Performance Skeletons, The NSF Next Generation Software Workshop at IPDPS 2008, Miami, FL, April 2008, pdf, slides

These three publications comprehensively cover the major recent results from this project:

  • Q. Xu and J. Subhlok, Construction and Evaluation of Coordinated Performance Skeletons, Technical Report UH-CS-08-09, University of Houston, May 2008, pdf
  • Q. Xu and J. Subhlok, Efficient Discovery of Loop Nests in Communication Traces of Parallel Programs, Technical Report UH-CS-08-08, University of Houston, May 2008, pdf
  • Q. Xu, R. Prithivathi, J. Subhlok, and R. Zheng, Logicalization of MPI Communication Traces, Technical Report UH-CS-08-07, University of Houston, May 2008, pdf

The following paper covers the general problem of skeleton construction, basic techniques, and a wider set of results:

  • S. Sodhi, Q. Xu, and J. Subhlok, Performance Prediction with Skeletons, Cluster Computing: The Journal of Networks, Software Tools and Applications, Volume 11, No. 2, June 2008, source, pdf

A full list of publications relating to this project is included in Subhlok's papers.

This work is supported by the National Science Foundation under Grant No. ACI-0234328 and Grant No. CNS-0410797.


For questions, please send email to Qiang Xu at Qiang.Xu[AT]mail[DOT]uh[DOT]edu or Jaspal Subhlok at jaspal[AT]uh[DOT]edu.