Scalable and Accelerated Numerical Optimization for Data-Driven Science
and Engineering
Project: LibKernel
This project aims to provide a high-performance, scalable, and
accelerated framework for large-scale kernel machines. It tackles
one of the most severe obstacles in kernel machines---the prohibitively
expensive O(n^3) computational cost, where n is the number of training
points---through state-of-the-art kernel matrix approximation and
second-order optimization algorithms.
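For intuition, the sketch below shows one such kernel matrix approximation, the classical Nystrom method, which replaces the dense n x n kernel matrix with a rank-m factorization built from m sampled landmark points, at roughly O(n m^2) cost instead of O(n^3). All function names and parameters here are illustrative, not LibKernel's API:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Dense Gaussian RBF kernel matrix between rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_factor(X, m, gamma=0.5, seed=None):
    """Rank-m Nystrom factor Z with K ~ Z @ Z.T, built from m uniformly
    sampled landmark points: O(n m^2) work instead of O(n^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m cross-kernel block
    W = C[idx]                         # m x m landmark kernel block
    # Z = C @ W^{-1/2} via a symmetric eigendecomposition of W,
    # dropping numerically zero eigenvalues for stability
    evals, evecs = np.linalg.eigh(W)
    keep = evals > 1e-10 * evals.max()
    return C @ (evecs[:, keep] / np.sqrt(evals[keep]))

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 2))
K = rbf_kernel(X, X)                   # exact n x n kernel matrix
Z = nystrom_factor(X, m=100, seed=1)
err = np.linalg.norm(K - Z @ Z.T) / np.linalg.norm(K)  # small relative error
```

Uniform landmark sampling is the simplest choice; leverage-score or greedy pivoted sampling typically gives better accuracy for the same m.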
It provides the following features:
Mehrotra's interior-point method for smooth constrained models;
trust-region methods for unconstrained models.
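As an illustrative sketch, not LibKernel's actual implementation, a basic trust-region Newton loop fits in a few dozen lines; the Cauchy-point step below stands in for the dogleg or Steihaug-CG subproblem solvers a production library would use:

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x0, delta=1.0, max_iter=200, tol=1e-8):
    """Toy trust-region Newton loop using the Cauchy point as the
    model-subproblem solution (real solvers use dogleg or Steihaug-CG)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Cauchy point: minimize the quadratic model along -g within radius
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, gnorm**3 / (delta * gHg))
        p = -(tau * delta / gnorm) * g
        # Accept/reject the step based on actual vs. predicted reduction
        predicted = -(g @ p + 0.5 * p @ H @ p)
        rho = (f(x) - f(x + p)) / predicted if predicted > 0 else 0.0
        if rho > 0.25:
            x = x + p
        delta = 0.5 * delta if rho < 0.25 else min(2.0 * delta, 1e6)
    return x

# Usage on a small convex quadratic f(x) = 0.5 x'Ax - b'x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = trust_region_minimize(lambda x: 0.5 * x @ A @ x - b @ x,
                               lambda x: A @ x - b,
                               lambda x: A,
                               x0=np.zeros(2))
```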
The backbone of machine learning and data-driven science and engineering
is often mathematical optimization.
At large scale, first-order methods have been the most commonly
used and successful algorithms for solving optimization problems.
However, except for some models such as deep neural networks, for which
the back-propagation training algorithm can be cast as matrix-matrix
multiplication, models trained with first-order methods exhibit
performance bounded by memory bandwidth, which can be two orders of
magnitude slower than the processor, with the gap continuing to widen.
On the other hand, second-order methods, which are favored by the
traditional optimization community, are challenging to adopt for
large-scale machine learning models, largely due to their
complexity and high cost per iteration. However, second-order methods
have at least three promising advantages:
Superlinear convergence that is insensitive to
conditioning (often ~100 iterations to convergence vs. millions)
High arithmetic intensity, making them compute bound rather than
memory bound
High scalability to massive parallelism
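The conditioning advantage is easy to demonstrate: on an ill-conditioned quadratic, fixed-step gradient descent barely moves along the flat directions even after a thousand iterations, while a single Newton step solves the problem to machine precision. The following is a toy experiment, not LibKernel code:

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 x'Ax with condition number 1e6.
n = 50
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.logspace(0, 6, n)) @ Q.T   # eigenvalues from 1 to 1e6
x0 = rng.standard_normal(n)                   # the minimizer is x = 0

# Gradient descent with the optimal fixed step size 2/(L + mu)
x_gd, step = x0.copy(), 2.0 / (1e6 + 1.0)
for _ in range(1000):
    x_gd -= step * (A @ x_gd)

# Newton: one step solves a quadratic problem exactly
x_nt = x0 - np.linalg.solve(A, A @ x0)

gd_dist = np.linalg.norm(x_gd)   # still O(1): stalled on flat directions
nt_dist = np.linalg.norm(x_nt)   # down at roundoff level
```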
To realize the potential of second-order methods for large-scale
machine learning, we identify and investigate three techniques
to address these challenges:
High-performance linear algebra algorithms on accelerators
and distributed-memory systems to scale and accelerate
per-iteration operations;
Pass-efficient randomized numerical linear algebra (RandNLA)
to reduce computational complexity without compromising
prediction accuracy;
Stable mixed-precision matrix computations and optimization algorithms
to exploit blazingly fast tensor processing units such as NVIDIA
Tensor Cores, Google TPUs, etc.
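A minimal sketch of the mixed-precision idea is classic iterative refinement: do the expensive solves in low precision while accumulating residuals in high precision. Here float32 stands in for a tensor-core format, and the stability caveat is that the matrix must not be too ill-conditioned for the low precision:

```python
import numpy as np

def refine_solve(A, b, sweeps=5):
    """Mixed-precision iterative refinement for Ax = b: solves run in
    float32 (a stand-in for a tensor-core format) while residuals are
    accumulated in float64.  A real implementation would compute the
    float32 LU factors once and reuse them in every sweep."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(sweeps):
        r = b - A @ x                                   # float64 residual
        x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    return x

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = Q @ np.diag(np.linspace(1.0, 10.0, 200)) @ Q.T   # SPD, kappa = 10
b = rng.standard_normal(200)
x = refine_solve(A, b)
rel_res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)  # ~ float64 roundoff
```

Each sweep gains roughly the digits of the low-precision solve, so a handful of cheap sweeps recovers full float64 accuracy.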
Error-Correcting Codes for Matrix Computations
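This direction is commonly realized as algorithm-based fault tolerance (ABFT) in the style of Huang and Abraham: encode a matrix product with checksum rows and columns so that a silent fault in the result can be detected by re-summing. The sketch below is illustrative, not this project's code:

```python
import numpy as np

def abft_matmul(A, B):
    """Checksum-encoded matrix multiply (Huang-Abraham style ABFT sketch).
    Append a column-sum row to A and a row-sum column to B; the product
    then carries its own row and column checksums."""
    Ac = np.vstack([A, A.sum(axis=0)])                  # extra checksum row
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # extra checksum col
    return Ac @ Br                                      # encoded product

def checksums_ok(Cf, tol=1e-8):
    """Verify the row and column checksums of an encoded product."""
    C = Cf[:-1, :-1]
    row_ok = np.allclose(C.sum(axis=0), Cf[-1, :-1], atol=tol)
    col_ok = np.allclose(C.sum(axis=1), Cf[:-1, -1], atol=tol)
    return row_ok and col_ok

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 5)), rng.standard_normal((5, 3))
Cf = abft_matmul(A, B)
ok_before = checksums_ok(Cf)   # True: checksums consistent
Cf[1, 2] += 1.0                # inject a silent fault into the result
ok_after = checksums_ok(Cf)    # False: the fault breaks both checksums
```

Because the fault violates one row checksum and one column checksum, their intersection also locates the corrupted entry, which is what makes correction (not just detection) possible.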
Acknowledgements: "This material is based upon work supported by the National Science Foundation under Grant No. 2146509. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."