Scalable and Accelerated Numerical Optimization for Data-Driven Science
and Engineering
Project: LibKernel
This project aims to provide a high-performance, scalable, and
accelerated framework for large-scale kernel machines. It tackles
one of the most severe obstacles in kernel machines---the prohibitively
expensive O(n^3) computational cost, where n is the number of training
points---through state-of-the-art kernel matrix approximation and
second-order optimization algorithms.
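For intuition, the sketch below shows one such kernel matrix approximation, the classical Nystrom method, which replaces the dense n x n kernel matrix with a rank-m factorization built from m sampled landmark points, at roughly O(n m^2) cost instead of O(n^3). All function names and parameters here are illustrative, not LibKernel's API:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Dense Gaussian RBF kernel matrix between rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_factor(X, m, gamma=0.5, seed=None):
    """Rank-m Nystrom factor Z with K ~ Z @ Z.T, built from m uniformly
    sampled landmark points: O(n m^2) work instead of O(n^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m cross-kernel block
    W = C[idx]                         # m x m landmark kernel block
    # Z = C @ W^{-1/2} via a symmetric eigendecomposition of W,
    # dropping numerically zero eigenvalues for stability
    evals, evecs = np.linalg.eigh(W)
    keep = evals > 1e-10 * evals.max()
    return C @ (evecs[:, keep] / np.sqrt(evals[keep]))

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 2))
K = rbf_kernel(X, X)                   # exact n x n kernel matrix
Z = nystrom_factor(X, m=100, seed=1)
err = np.linalg.norm(K - Z @ Z.T) / np.linalg.norm(K)  # small relative error
```

Uniform landmark sampling is the simplest choice; leverage-score or greedy pivoted sampling typically gives better accuracy for the same m.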
It provides the following features:
Mehrotra's interior-point method for smooth constrained models;
trust-region methods for unconstrained models.
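As an illustrative sketch, not LibKernel's actual implementation, a basic trust-region Newton loop fits in a few dozen lines; the Cauchy-point step below stands in for the dogleg or Steihaug-CG subproblem solvers a production library would use:

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x0, delta=1.0, max_iter=200, tol=1e-8):
    """Toy trust-region Newton loop using the Cauchy point as the
    model-subproblem solution (real solvers use dogleg or Steihaug-CG)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Cauchy point: minimize the quadratic model along -g within radius
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, gnorm**3 / (delta * gHg))
        p = -(tau * delta / gnorm) * g
        # Accept/reject the step based on actual vs. predicted reduction
        predicted = -(g @ p + 0.5 * p @ H @ p)
        rho = (f(x) - f(x + p)) / predicted if predicted > 0 else 0.0
        if rho > 0.25:
            x = x + p
        delta = 0.5 * delta if rho < 0.25 else min(2.0 * delta, 1e6)
    return x

# Usage on a small convex quadratic f(x) = 0.5 x'Ax - b'x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = trust_region_minimize(lambda x: 0.5 * x @ A @ x - b @ x,
                               lambda x: A @ x - b,
                               lambda x: A,
                               x0=np.zeros(2))
```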
The backbone of machine learning and data-driven science and engineering
is often mathematical optimization.
At large scale, first-order methods have been the most commonly
used and successful algorithms for solving optimization problems.
However, except for some models such as deep neural networks, for which
the back-propagation training algorithm can be cast as matrix-matrix
multiplication, models trained with first-order methods exhibit
performance bounded by memory bandwidth, which can be two orders of
magnitude slower than the processor, with the gap continuing to widen.
On the other hand, second-order methods, which are favored by the
traditional optimization community, are challenging to adopt for
large-scale machine learning models, largely due to their
complexity and high cost per iteration. However, second-order methods
have at least three promising advantages:
Superlinear convergence that is insensitive to
conditioning (often ~100 iterations to convergence vs. millions)
High arithmetic intensity, making them compute bound rather than
memory bound
High scalability to massive parallelism
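The conditioning advantage is easy to demonstrate: on an ill-conditioned quadratic, fixed-step gradient descent barely moves along the flat directions even after a thousand iterations, while a single Newton step solves the problem to machine precision. The following is a toy experiment, not LibKernel code:

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 x'Ax with condition number 1e6.
n = 50
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.logspace(0, 6, n)) @ Q.T   # eigenvalues from 1 to 1e6
x0 = rng.standard_normal(n)                   # the minimizer is x = 0

# Gradient descent with the optimal fixed step size 2/(L + mu)
x_gd, step = x0.copy(), 2.0 / (1e6 + 1.0)
for _ in range(1000):
    x_gd -= step * (A @ x_gd)

# Newton: one step solves a quadratic problem exactly
x_nt = x0 - np.linalg.solve(A, A @ x0)

gd_dist = np.linalg.norm(x_gd)   # still O(1): stalled on flat directions
nt_dist = np.linalg.norm(x_nt)   # down at roundoff level
```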
To realize the potential of second-order methods for large-scale
machine learning, we identify and investigate three techniques
to address these challenges:
High-performance linear algebra algorithms on accelerators
and distributed-memory systems to scale and accelerate
per-iteration operations;
Pass-efficient randomized numerical linear algebra (RandNLA)
to reduce computational complexity without compromising
prediction accuracy;
Stable mixed-precision matrix computations and optimization algorithms
to exploit blazingly fast tensor processing units such as NVIDIA
Tensor Cores, Google TPUs, etc.
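A minimal sketch of the mixed-precision idea is classic iterative refinement: do the expensive solves in low precision while accumulating residuals in high precision. Here float32 stands in for a tensor-core format, and the stability caveat is that the matrix must not be too ill-conditioned for the low precision:

```python
import numpy as np

def refine_solve(A, b, sweeps=5):
    """Mixed-precision iterative refinement for Ax = b: solves run in
    float32 (a stand-in for a tensor-core format) while residuals are
    accumulated in float64.  A real implementation would compute the
    float32 LU factors once and reuse them in every sweep."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(sweeps):
        r = b - A @ x                                   # float64 residual
        x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    return x

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = Q @ np.diag(np.linspace(1.0, 10.0, 200)) @ Q.T   # SPD, kappa = 10
b = rng.standard_normal(200)
x = refine_solve(A, b)
rel_res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)  # ~ float64 roundoff
```

Each sweep gains roughly the digits of the low-precision solve, so a handful of cheap sweeps recovers full float64 accuracy.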
Error-Correcting Codes for Matrix Computations
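This direction is commonly realized as algorithm-based fault tolerance (ABFT) in the style of Huang and Abraham: encode a matrix product with checksum rows and columns so that a silent fault in the result can be detected by re-summing. The sketch below is illustrative, not this project's code:

```python
import numpy as np

def abft_matmul(A, B):
    """Checksum-encoded matrix multiply (Huang-Abraham style ABFT sketch).
    Append a column-sum row to A and a row-sum column to B; the product
    then carries its own row and column checksums."""
    Ac = np.vstack([A, A.sum(axis=0)])                  # extra checksum row
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # extra checksum col
    return Ac @ Br                                      # encoded product

def checksums_ok(Cf, tol=1e-8):
    """Verify the row and column checksums of an encoded product."""
    C = Cf[:-1, :-1]
    row_ok = np.allclose(C.sum(axis=0), Cf[-1, :-1], atol=tol)
    col_ok = np.allclose(C.sum(axis=1), Cf[:-1, -1], atol=tol)
    return row_ok and col_ok

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 5)), rng.standard_normal((5, 3))
Cf = abft_matmul(A, B)
ok_before = checksums_ok(Cf)   # True: checksums consistent
Cf[1, 2] += 1.0                # inject a silent fault into the result
ok_after = checksums_ok(Cf)    # False: the fault breaks both checksums
```

Because the fault violates one row checksum and one column checksum, their intersection also locates the corrupted entry, which is what makes correction (not just detection) possible.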
Acknowledgements: "This material is based upon work supported by the National Science Foundation under Grant No. 2146509. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."