Recruiting PhD students supported by TA/RA! Please send your resume to pwu7@uh.edu
Bio: I'm currently an associate professor in CS@UH. I joined CS@UH in 2018 as an assistant professor. Before that, I was a post-doctoral research associate in the Innovative Computing Laboratory (ICL) at the University of Tennessee, led by Prof. Jack Dongarra. Earlier, I did my Ph.D. in computer science at the University of California, Riverside, advised by Prof. Zizhong Chen. Even earlier, I did my B.S. in mathematics at the University of Science and Technology of China (USTC).
Teaching
Fall 2023: COSC3320 Algorithms and Data Structures
Research Interests
- High performance computing
- Numerical algorithms and software
- High performance big data analytics
- Parallel and distributed computing
- Fault tolerant systems and algorithms
More detailed descriptions of my research interests are on the Research Page.
Projects:
Selected Recent Publications
More publications with PDF downloads.
PPoPP'23 | Zhang S, Shah R, Ootomo H, Yokota R, Wu P. Fast Symmetric Eigenvalue Decomposition via WY Representation on Tensor Core. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, February 25-27, 2023 (pp. 301-312) |
ICPP'21 | Zhang S, Wu P. Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization. Proceedings of the 50th International Conference on Parallel Processing, August 9-11, 2021 |
SOCC'20 | Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing. The 11th ACM Symposium on Cloud Computing, October 19-21, 2020 (virtual event due to COVID-19) |
ICS'20 | TensorSVM: Accelerating Kernel Machines with Tensor Engine. The 34th ACM International Conference on Supercomputing, June 29 - July 2, 2020, Barcelona, Spain. Acceptance Rate: 30% (40/132). PDF from ACM |
HPDC'20 (Best Paper Nominee) | High Accuracy Matrix Computations on Neural Engines: a Study of QR Factorization and its Applications. The 20th ACM International Symposium on High-Performance Parallel and Distributed Computing. PDF/Web/Video from ACM DL |
BigData'19 | xSVM: Scalable Distributed Kernel Support Vector Machine Training. IEEE International Conference on Big Data, 2019 |
ACM TOMS'19 | PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP. ACM Transactions on Mathematical Software, Volume 45(2): 16:1-16:35 (2019). PDF from ACM |
SC'18 | Fault Tolerant One-sided Matrix Decompositions on Heterogeneous Systems with GPUs. Proceedings of the 30th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis |
ICCS'18 | The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques. International Conference on Computational Science, ICCS 2018 |
TPDS'18 | Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architecture. IEEE Transactions on Parallel and Distributed Systems |
Last updated: 11/6/2023