OpenUH: A Portable and Optimizing OpenMP Compiler
We are pleased to announce the release of the OpenUH compiler by the High Performance Computing Tools (HPCTools) group of the University of Houston. OpenUH is a robust, optimizing, portable OpenMP compiler that translates OpenMP 2.0 (www.openmp.org) directives in C, C++, and Fortran 77/90 programs, with partial support for Fortran 95. OpenUH is available either as stand-alone software or integrated with the Eclipse IDE. Our release also includes Dragon, a tool for browsing an application's source code, call graph, flow graph, profile information, and more.
OpenUH is based on SGI's open source Pro64 compiler (later released as Open64), which targets the IA-64 Linux platform. The compiler's major optimization components are the interprocedural analyzer, the loop nest optimizer, and the global optimizer. To achieve portability while preserving most of these optimizations, we have enhanced the suite's IR-to-source translators so that they emit compilable source code immediately before the code generation phase; this code can then be compiled by a native compiler on platforms other than IA-64. To achieve greater stability, we merge work from the two major branches of Open64 (ORC and PathScale) to take advantage of their upgrades and bug fixes.
We work closely with multiple partners in research and industry on our compiler and tools. We have installed OpenUH at the National Center for Supercomputing Applications (NCSA) and NASA Ames for a pilot evaluation. We are using this compiler for research into OpenMP language extensions and novel translation techniques. OpenUH can be downloaded from www.cs.uh.edu/~openuh.
OpenMP Language Extensions for Multi-Core Architectures
OpenMP was designed for flat shared-memory systems, yet even some current SMPs do not provide uniform memory access costs. Multi-core platforms may be hierarchical: they may also exploit simultaneous or interleaved multithreading, and subsets of threads may share substantial resources such as caches. As a result, the way in which computation is mapped to the hardware can have a major impact on performance. OpenMP provides features for assigning work to user-level threads, but not for grouping those threads into subsets, mapping them to the hardware, or placing data. Nor does it provide point-to-point synchronization.
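To make the last point concrete: in OpenMP 2.0 a barrier always involves the whole team, so a pairwise handoff between two threads has to be built by hand from a shared flag and the flush directive. The following is a minimal C sketch of that workaround, assuming a two-section parallel sections region; the function name handoff is hypothetical, and the pattern is illustrative rather than anything OpenUH provides.

```c
/* Hypothetical sketch: a one-way producer/consumer handoff between two
 * OpenMP sections, synchronized only through a shared flag and flushes.
 * Assumption: the producer section eventually runs, which holds when each
 * section gets its own thread or sections are dispensed in order (as in
 * common implementations). Without OpenMP the pragmas are ignored and the
 * two sections simply run in sequence. */
int handoff(int value)
{
    int data = 0;    /* payload written by the producer */
    int ready = 0;   /* flag: set to 1 once data has been published */
    int result = -1; /* value computed by the consumer */

    #pragma omp parallel sections shared(data, ready, result)
    {
        #pragma omp section
        {
            /* Producer: publish the data, then raise the flag. */
            data = value * 2;
            #pragma omp flush
            ready = 1;
            #pragma omp flush
        }
        #pragma omp section
        {
            /* Consumer: spin, re-reading the flag after every flush. */
            for (;;) {
                #pragma omp flush
                if (ready)
                    break;
            }
            /* Flush again so the read of data is not ordered before the flag. */
            #pragma omp flush
            result = data + 1;
        }
    }
    return result;
}
```

A dedicated point-to-point construct would replace this spin loop and its easily misplaced flushes.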
We are currently working within the OpenMP Architecture Review Board (ARB) to explore potential new features for OpenMP. There is a tension between enabling the highest performance for programmers who require it and keeping OpenMP as simple and straightforward as possible. We have proposed language extensions that define, shape, and exploit subteams of threads and that permit finer-grained control over thread synchronization and data locality. These ideas can be used to parallelize multi-dimensional loop nests for large thread counts, to describe a variety of execution scenarios such as pipelining, and to assign work flexibly across a system with non-uniform resource sharing. We have successfully implemented and tested the subteam concept in the OpenUH compiler.
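The proposed subteam syntax is not reproduced here. As a rough illustration of what subteams express, the following C sketch partitions an ordinary OpenMP 2.0 team by hand: threads are grouped into nsub subteams, each subteam takes a block of rows of a two-dimensional loop nest, and the members of a subteam split that block's columns. The helper names block_range and subteam_sum are hypothetical, and the round-robin assignment of threads to subteams is an arbitrary choice; this manual bookkeeping is the kind of detail that language-level subteams aim to hide.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: split [0, n) into `parts` contiguous, nearly equal
 * blocks and return block k as the half-open range [*lo, *hi). */
void block_range(int n, int parts, int k, int *lo, int *hi)
{
    assert(parts > 0 && k >= 0 && k < parts);
    int base = n / parts, extra = n % parts;
    *lo = k * base + (k < extra ? k : extra);
    *hi = *lo + base + (k < extra ? 1 : 0);
}

/* Hypothetical example: sum a rows x cols row-major matrix with hand-built
 * subteams. Thread t joins subteam t % nteams; subteams own blocks of rows,
 * and the members of a subteam divide that block's columns among themselves,
 * so every element is visited exactly once. */
double subteam_sum(const double *a, int rows, int cols, int nsub)
{
    double total = 0.0;
    assert(nsub > 0);
    #pragma omp parallel shared(total)
    {
        int tid = omp_get_thread_num();
        int nthr = omp_get_num_threads();
        int nteams = nsub < nthr ? nsub : nthr;         /* no empty subteams */
        int team = tid % nteams;                        /* subteam id */
        int rank = tid / nteams;                        /* rank inside subteam */
        int size = (nthr - team + nteams - 1) / nteams; /* subteam size */
        int rlo, rhi, clo, chi;
        double local = 0.0;

        block_range(rows, nteams, team, &rlo, &rhi);
        block_range(cols, size, rank, &clo, &chi);
        for (int i = rlo; i < rhi; i++)
            for (int j = clo; j < chi; j++)
                local += a[i * cols + j];

        /* Combine per-thread partial sums. */
        #pragma omp atomic
        total += local;
    }
    return total;
}
```

Because each subteam works on its own block of rows, a placement-aware runtime could keep a subteam's threads on cores that share resources, which is one motivation for making subteams a first-class concept.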
Cluster-Enabled OpenMP
Given the importance of clusters, we are evaluating approaches to supporting OpenMP on them. The traditional approach relies on software distributed shared memory, which incurs high overheads and, unless it is integrated with a compiler, is not amenable to important code optimizations. An alternative, translating OpenMP to MPI, is hard to implement. We chose instead to explore a translation to Global Arrays (http://www.emsl.pnl.gov/docs/global/), which is simpler and permits a variety of compile-time optimizations. The translation has been specified and an implementation is under way; because optimization work is ongoing, it is not part of the current OpenUH release.
COPPER: COmpilation and OPtimization with PERformance feedback
In collaboration with colleagues at NCSA, the University of Tennessee (UTK), and Virginia Tech (VT), we are designing and implementing interfaces and strategies for interaction between compilers and performance tools. The goal is to better support the process of developing and tuning high-performance MPI and OpenMP codes. HPCTools contributes OpenUH and the Dragon tool to this effort, NCSA contributes PerfSuite, UTK provides KOJAK and PAPI, and VT provides applications for developing and evaluating the new environment. We also aim to determine what additional support for tools the OpenMP standard should offer, particularly for profiling (KOJAK) and hardware counter information (PerfSuite and PAPI). We are enhancing PDB (Program Database Toolkit), KOJAK, and PAPI through selective instrumentation and our extended F90 front end.
Modeling of Hybrid MPI+OpenMP Code
We have also worked to create a framework for performance modeling of hybrid OpenMP and MPI applications. The OpenUH compiler determines an application signature statically, and a parallelization overhead measurement benchmark, realized with Sphinx and PerfSuite, collects system profiles. Based on these, we have proposed a performance evaluation system that identifies the efficiency of communication and computation.
The Dragon Analysis Tool
The Dragon Analysis Tool, which supports OpenMP application development, is built on top of the OpenUH compiler. In addition to collecting and displaying the results of traditional static program analyses (e.g., call graph, control flow graph, and dependence graph), Dragon can instrument a program to gather and display dynamic execution details. A module is being added to generate OpenMP directives automatically. Other ongoing work includes gathering precise information on thread-specific accesses to shared data at run time, and integrating several tools with a compiler to provide a complete environment for the application development life cycle.
OpenMP Validation Suite
In addition, we have released the first public OpenMP validation suite, the result of a collaboration between the HPCTools group and the High Performance Computing Center Stuttgart (HLRS), Germany. The suite conforms to OpenMP specifications 1.0 and 2.0, is complete for both Fortran and C, and allows tests to be executed individually or collectively. It can be downloaded from www.cs.uh.edu/~openuh.
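To illustrate what running tests individually or collectively can look like, here is a minimal C sketch in the style of a validation harness. The two tests, the test_case registry, and run_tests are hypothetical and are not taken from the released suite. Each test checks one construct against a known result and is repeated, since a faulty implementation may fail only intermittently. The sketch builds with or without an OpenMP flag such as GCC's -fopenmp; without it the pragmas are ignored and the tests run serially.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical validation-style tests; each returns 1 on pass, 0 on failure. */
static int test_parallel_for_reduction(void)
{
    const int n = 1000;
    long sum = 0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++)
        sum += i;
    return sum == (long)n * (n + 1) / 2; /* closed form: 500500 */
}

static int test_critical(void)
{
    int count = 0, i;
    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        /* Without mutual exclusion, concurrent increments could be lost. */
        #pragma omp critical
        count++;
    }
    return count == 1000;
}

/* Hypothetical registry pairing each test with a name. */
struct test_case {
    const char *name;
    int (*run)(void);
};

static const struct test_case tests[] = {
    { "parallel_for_reduction", test_parallel_for_reduction },
    { "critical", test_critical },
};

/* Run the test called `name`, or every test when `name` is NULL.
 * Returns the number of failed runs, or -1 if no test matched. */
int run_tests(const char *name, int repetitions)
{
    int matched = 0, failures = 0;
    size_t ntests = sizeof tests / sizeof tests[0];
    for (size_t t = 0; t < ntests; t++) {
        if (name != NULL && strcmp(name, tests[t].name) != 0)
            continue;
        matched++;
        int failed = 0;
        for (int r = 0; r < repetitions; r++)
            failed += !tests[t].run();
        printf("%-24s %s (%d/%d runs failed)\n", tests[t].name,
               failed ? "FAILED" : "passed", failed, repetitions);
        failures += failed;
    }
    return matched ? failures : -1;
}
```

For example, run_tests(NULL, 5) runs the whole collection five times, while run_tests("critical", 3) runs a single test.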
