OpenUH: A Portable and Optimizing OpenMP Compiler
The OpenUH compiler is a branch of the Open64 compiler, maintained by the High Performance Computing Tools (HPCTools) group of the University of Houston. OpenUH is a robust, optimizing, and portable OpenMP compiler that translates OpenMP 2.5 (www.openmp.org) directives in conjunction with C, C++, and FORTRAN 77/90 (and some FORTRAN 95).
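
As a rough illustration of the kind of directive-based C code an OpenMP 2.5 compiler translates, the sketch below parallelizes a dot product with a reduction. The function name `dot_product` is hypothetical and chosen only for this example; nothing here is specific to OpenUH.

```c
#include <assert.h>

/* Dot product of two vectors of length n.
 * The directive asks an OpenMP compiler to divide the loop iterations
 * among threads and to combine the per-thread partial sums into `sum`.
 * A compiler without OpenMP enabled ignores the pragma and runs the
 * loop serially, producing the same result. */
double dot_product(const double *x, const double *y, int n)
{
    double sum = 0.0;
    int i; /* the loop variable of a parallel for is private to each thread */

    assert(n >= 0);

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC, for example, OpenMP translation is enabled with `-fopenmp`; other compilers use their own flag.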

OpenUH is available as stand-alone software or with the Eclipse GUI integrated development environment. Our release also includes Dragon, a tool for browsing an application's source code, callgraph, flowgraph, profile information and more. We are currently working on additional support for UPC, Co-Array Fortran, and SHMEM.

To learn more about OpenUH, please visit this tab or its main website.


Coarray Fortran (CAF) Project
Coarray Fortran (CAF) comprises a set of proposed language extensions to Fortran that are expected to be adopted as part of the Fortran 2008 standard. Its features promise to simplify the task of creating parallel programs by providing syntax with which a programmer can express communication at a high level. For this project, we are developing a Coarray Fortran reference implementation based upon the OpenUH branch of the Open64 compiler suite. We are exploring the potential for a variety of optimizations to Coarray Fortran codes, and considering specific enhancements to CAF in collaboration with our project partners.

To learn more about Coarray Fortran, click here.

This work is supported by Total SA.

Heterogeneous OpenMP
The goal of this project is to implement OpenMP for heterogeneous systems, which may consist of combinations of general-purpose multicore processors, graphics processing units (GPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), and special purpose accelerators. The needs of both high-performance computing and embedded systems are taken into consideration. As part of our project work, we are studying how to provide standard, portable runtime support for heterogeneous OpenMP codes across multiple devices. We have joined and are working with its members to help define appropriate efficient, low-level interfaces. To perform this work, we are collaborating with who are providing applications and hardware to support our efforts. is also supporting this effort with applications and hardware.
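
Device offloading was later standardized in OpenMP 4.0 and subsequent versions through the `target` family of directives. The sketch below uses that standard syntax to suggest what heterogeneous OpenMP code can look like; it is not the interface developed in this project, and the function name `saxpy_offload` is hypothetical.

```c
#include <assert.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * With an offloading-capable OpenMP compiler, the loop may run on an
 * attached device: x is copied to the device, y is copied there and back.
 * Without a device (or without OpenMP enabled) the loop runs on the host. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    int i;

    assert(n >= 0);

    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The `map` clauses make the data movement between host and device memory explicit, which is one of the central concerns when a single program spans several kinds of processors.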
This work is supported by the under grant # CCF-0917285. Additional support comes from the . Related links to this project include:

eXtreme OpenMP
The eXtreme OpenMP project is an effort to enhance OpenMP with productive, portable, and efficient extreme-scale programming, a single programming model from heterogeneous multicore through large-scale distributed systems, a minimal yet powerful set of language extensions, and sophisticated implementation technology. Language research includes expression of parallelism, program locality, thread synchronization, parallel I/O, and program adaptivity. Expressions of parallelism include implementing long-lived nested regions, naming of parallel regions, and mapping of parallel regions to hardware. The problem of locality will be addressed by providing affinity of data and threads, thread subteams, and data mapping hints. We intend to provide point-to-point synchronization, transactions, and synchronization attributes, as well as parallel I/O via thread-collective I/O interfaces and hints. We will also provide OpenMP with adaptivity through asynchronous tasks and dynamically adjustable thread teams.
This work is supported by the under grant # CCF-0833201.
For more information, please visit the project site.

OpenSHMEM Project
SHMEM is a library that implements 1-sided point-to-point and collective data transfers. It is intended to run on a variety of single-node and distributed systems, but there is particular emphasis on clusters with Infiniband interconnects. Infiniband allows remote writes to memory using the Direct Memory Access (DMA) features of hardware. Such writes do not interrupt the remote processor, thus allowing better overlap of communication and computation.
The project involves developing SHMEM on a variety of platforms including clusters with an Infiniband interconnect. We will support the SHMEM community by producing documentation, training material, and wikis, and by developing a conference and workshop presence for future enhancements to SHMEM.

This work is supported by the under grant # DE-AC05-00OR22725. This work involves collaboration with the Oak Ridge National Laboratory.


DARWIN - Dynamic Adaptive Runtime Infrastructure
In this project, we exploit existing compiler technology for automatic parallelization and OpenMP translation in order to facilitate application development for multicore systems. To do so, we are extending our robust, open source OpenUH compiler for Fortran 95/C/C++ and OpenMP to enable it to combine automatic parallelization and conventional OpenMP translation strategies. We are implementing the new tasking features of OpenMP to increase the usefulness of this programming model and are considering new static optimization strategies.
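
To give a sense of the tasking model introduced in OpenMP 3.0, here is a minimal sketch of a recursive array sum in which each half of the range becomes a task. The names `sum_range`, `task_sum`, and `TASK_CUTOFF` are hypothetical, and the cutoff value is arbitrary; this is not code from the project.

```c
#include <assert.h>

/* Below this many elements, sum serially instead of spawning tasks
 * (illustrative value, not tuned). */
#define TASK_CUTOFF 1000

/* Sums a[lo..hi) by splitting the range in two and summing each half
 * in its own task. Without OpenMP enabled the pragmas are ignored and
 * the recursion simply runs serially. */
static long sum_range(const int *a, int lo, int hi)
{
    long left = 0, right = 0;
    int mid, i;

    if (hi - lo <= TASK_CUTOFF) {
        long s = 0;
        for (i = lo; i < hi; i++) {
            s += a[i];
        }
        return s;
    }

    mid = lo + (hi - lo) / 2;

    /* left and right must be shared so the parent sees the results */
    #pragma omp task shared(left)
    left = sum_range(a, lo, mid);

    #pragma omp task shared(right)
    right = sum_range(a, mid, hi);

    /* wait for both child tasks before combining their results */
    #pragma omp taskwait
    return left + right;
}

/* Entry point: one thread creates the root of the task tree while the
 * rest of the team executes tasks as they become available. */
long task_sum(const int *a, int n)
{
    long total = 0;

    assert(n >= 0);

    #pragma omp parallel
    {
        #pragma omp single
        total = sum_range(a, 0, n);
    }
    return total;
}
```

Tasks suit irregular or recursive work of this kind, where the amount of parallelism is not known from a loop bound in advance.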

The Dragon Analysis Tool
The Dragon Analysis Tool, which supports OpenMP application development, is built on top of the OpenUH compiler. In addition to collecting and displaying the results of traditional static program analyses (e.g. Callgraph, Control Flow Graph and Dependence Graph), Dragon is able to instrument a program to gather and display dynamic execution details. A module is being added to automatically generate OpenMP directives. Other ongoing work includes the attempt to gather precise information on thread-specific access to shared data at run time, and the integration of several tools and a compiler to provide a complete environment for the application development life cycle.

OpenMP Validation Suite
This validation suite is the result of a collaborative effort between the group and the High Performance Computing Center at Stuttgart, Germany. The test suite is in conformance with OpenMP specifications 1.0 and 2.0, is complete for both Fortran and C, and permits tests to be individually or collectively executed. The test suite can be downloaded at: www.cs.uh.edu/~openuh.

PModels and PModels2 Project
The Center for Programming Models for Scalable Parallel Computing is focused on research and development in the area of programming models for scalable parallel computing. Work carried out in this Center advances the state of the art in the understanding, definition, implementation, and use of models expressed in libraries, languages, and annotations.
Apart from our group at the University of Houston, this project involves participants from Argonne National Laboratory, Ohio State University, Pacific Northwest National Laboratory, Rice University, UC at Berkeley and University of Illinois.
The HPCTools research group has joined PModels2, the continuation of the successful PModels project. This work is supported by the under grant # DE-FC02-06ER25759.

OpenMP Language for Multi-Core Architecture
OpenMP was designed for flat systems. But even some current SMPs do not provide equal cost of access to memory. Multi-core platforms may be hierarchical, as they may also exploit simultaneous or interleaved multithreading, and subsets of threads may share substantial resources. As a result, the way in which computation is mapped to the hardware may have a major performance impact. OpenMP provides features for assigning work to user-level threads, but not for the subgrouping of these threads, the mapping of them to the hardware, or for data placement. It has no point-to-point synchronization.

We are currently working within the OpenMP ARB to explore potential new features for OpenMP. There remains a tension between the need to enable highest performance for those programmers who require it, and the desire to keep OpenMP as simple and straightforward as possible. We have proposed language extensions that define, shape and exploit sub-teams of threads and permit a finer degree of thread synchronization and data locality. These ideas can be used to parallelize multi-dimensional loop nests for large thread counts, to describe a variety of execution scenarios including pipelining, as well as to assign work flexibly across a system with non-uniform resource sharing. We have successfully implemented and tested the subteam concept in the OpenUH compiler.
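
The subteam extensions described above are not part of standard OpenMP, so they cannot be shown in portable code. For comparison, the sketch below uses the standard `collapse` clause (OpenMP 3.0), which addresses one of the same use cases: exposing enough iterations from a multi-dimensional loop nest to occupy a large number of threads. The function name `scale_matrix` is hypothetical.

```c
#include <assert.h>

/* Multiplies every element of a row-major rows x cols matrix by factor.
 * collapse(2) merges the two loops into a single iteration space of
 * rows * cols iterations, so the work can be spread over more threads
 * than there are rows. Without OpenMP enabled the pragma is ignored. */
void scale_matrix(double *m, int rows, int cols, double factor)
{
    int i, j; /* both loop variables are private under collapse(2) */

    assert(rows >= 0 && cols >= 0);

    #pragma omp parallel for collapse(2)
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            m[(long)i * cols + j] *= factor;
        }
    }
}
```

Unlike subteams, `collapse` says nothing about which threads cooperate or where they run; it only enlarges the pool of iterations handed to a single flat team.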


Embedded high-level programming model and Medical Imaging Project
The goal of this project is to implement medical ultrasound on the BeagleBoard. The USB-powered BeagleBoard delivers laptop-like performance and expansion. To discover more about the BeagleBoard, click the link below. We are collaborating with to achieve this project. Texas Instruments (TI) is a global analog and digital semiconductor IC design and manufacturing company. In addition to analog technologies, digital signal processing (DSP) and microcontroller (MCU) semiconductors, TI designs and manufactures semiconductor solutions for analog and digital embedded and application processing. Click this link to view a demo of the Medical Ultrasound Project.

Cluster-Enabled OpenMP
Given the importance of clusters, we are evaluating approaches to providing OpenMP on clusters. The traditional approach relies on software distributed shared memory, which incurs high overheads and (unless it is integrated with a compiler) is not amenable to important code optimizations. An alternative solution involving translation to MPI is hard to implement. We chose instead to explore a translation using Global Arrays. Our solution is both simpler and permits a variety of compile-time improvements. This translation has been specified and an implementation is under way outside the current OpenUH compiler release, since optimization work is on-going.

COPPER: COmpilation and OPtimization with PERformance feedback
In a collaboration with colleagues at NCSA, the University of Tennessee (UTK), and Virginia Tech (VT), we are designing and implementing interfaces and strategies for interaction between compilers and performance tools. The goal is to better support the process of developing and tuning high performance codes in MPI and OpenMP. HPCTools contributes OpenUH and the Dragon tool to this effort, NCSA contributes PerfSuite, UTK provides KOJAK and PAPI, and VT implements applications to be used towards the development and evaluation of the new environment. We also aim to explore the need for additional support by the OpenMP standard for tools, particularly for profiling (KOJAK) and hardware counter information (PerfSuite and PAPI). We enhance the PDB (program database toolkit, KOJAK and PAPI) via selective instrumentation and our extended F90 front end.
This work is funded by under contract CCF-0444468.

Modeling of Hybrid MPI+OpenMP Code

We have also worked to create a framework for performance modeling of hybrid OpenMP and MPI applications. The OpenUH compiler determines an application signature statically, and a parallelization overhead measurement benchmark, realized by Sphinx and PerfSuite, collects system profiles. Based upon these, we have proposed a performance evaluation measurement system to identify communication and computation efficiency.

Our approach has the advantage that it does not need to execute the program. Our methodology is not only able to identify parallelization efficiency, it can also predict application performance. It can also be extended to support other programming models such as UPC and Global Arrays.
The work is funded by under contract CCF-0444468 and under contract DE-FC03-01ER25502.

GLASS BOX

Currently, we see a rapid increase in the number of cores on modern computers. This trend imposes scalability demands not only on applications but also on the software tools used for their development. It also makes the optimization of parallel codes more difficult, creating a great need for scalable performance analysis technology. However, such a technology can hardly be delivered by a single tool; it requires a higher degree of collaboration between different HPC performance analysis tools. The main goal of this project is to develop a framework that enhances collaboration between several performance analysis and optimization tools to ensure scalability, interoperability and a reduction in maintenance cost.

The project involves the Georgia Institute of Technology, the University of Houston and the University of Oregon. The project is funded by the National Science Foundation (NSF).