Abstract
One of the truisms of supercomputing is that promised (aka peak) performance always dwarfs realized performance. This gap has generally widened over time, most notably since the reemergence of ML/AI. To some degree, this is to be expected. Vendors are motivated to put their best foot forward, and as it was once put to me, vendor-published benchmarks should be considered the asymptotic limit of tuning. No matter how hard customers try, they will never completely replicate a vendor result – Achilles never catches the tortoise. At best, he gets within a given epsilon.
Even though Newton refuted the Achilles argument, it remains the case that the performance gap will always exist. The size of the gap, however, is a different matter. Jack Dongarra noted in the 1980s that to achieve what he called “supercomputer performance” on the Cray-1, computations had to be organized to reuse data. Without data reuse, performance on the Cray was essentially scalar speed. Compilers and vector computers of the 1980s by and large delivered supercomputer performance, a fact that unfortunately no longer holds today.
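Dongarra's data-reuse observation can be made concrete with a small sketch (an illustrative example, not taken from the talk): two loop orderings of the same matrix multiply. The ikj ("SAXPY") order holds one operand in a register across an entire unit-stride inner loop, which is exactly the reuse pattern vector hardware rewards.

```c
#define N 64

/* Naive ijk order: each A[i][k] is re-read for every j, and B is
   walked with stride N in the inner loop -- little reuse. */
void matmul_ijk(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];   /* stride-N access to B */
            C[i][j] = sum;
        }
}

/* ikj ("SAXPY") order: A[i][k] is loaded once into a register and
   reused across a unit-stride sweep of B's row -- the data-reuse
   shape that lets a vectorizing compiler deliver vector speed. */
void matmul_ikj(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) C[i][j] = 0.0;
        for (int k = 0; k < N; k++) {
            double a = A[i][k];             /* loaded once, reused N times */
            for (int j = 0; j < N; j++)
                C[i][j] += a * B[k][j];     /* unit-stride, vectorizable */
        }
    }
}
```

Both orderings perform the same additions in the same per-element order, so they produce identical results; only the memory behavior differs.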
This talk discusses how to achieve supercomputer performance. While the compiler is essential to a successful strategy, achieving supercomputer performance requires more: an appropriate compiler, a balanced architecture, and an informed user. The talk covers three broad topics from the user, compiler, and architecture perspectives:
- A quick review of the compiler theory behind reordering transformations such as vectorization and parallelization, demonstrating how that theory can produce supercomputer performance on vector architectures. As a general rule, understanding how compilers work is invaluable in helping application developers produce more effective programs.
- An examination of some common architectures used for ML/AI acceleration, discussing how reordering theory is necessary for their success, why their performance gap is so large, and the challenges that must be overcome to reach the “supercomputer” mark.
- Speculation, from a compiler point of view, as to the architectures that can be effective for ML/AI acceleration. As in the days of CISC vs. RISC, striking the right balance between compiler and architecture is essential to a successful approach.
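One classic reordering transformation from that theory is loop distribution. The sketch below (an illustrative example of my own, not drawn from the talk) shows a loop whose loop-carried dependence blocks vectorization as written, and the distributed form a compiler can legally produce, in which each loop is independently vectorizable.

```c
#define N 1024

/* Fused loop: c[i] reads a[i-1], which was written one iteration
   earlier -- a loop-carried true dependence from the first
   statement to the second, so the loop cannot be vectorized
   as written. */
void fused(const double *b, double *a, double *c) {
    for (int i = 1; i < N; i++) {
        a[i] = b[i] + 1.0;
        c[i] = a[i - 1] * 2.0;
    }
}

/* Loop distribution: splitting the statements into two loops
   preserves the dependence (every a[] value is written before any
   is read back) while leaving each loop free of carried
   dependences -- and hence vectorizable. */
void distributed(const double *b, double *a, double *c) {
    for (int i = 1; i < N; i++)
        a[i] = b[i] + 1.0;
    for (int i = 1; i < N; i++)
        c[i] = a[i - 1] * 2.0;
}
```

Distribution is legal here because the dependence points "forward" (from the first statement to the second); reordering theory exists precisely to decide when such transformations preserve the program's meaning.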
Speakers
Host: Teresa Cervero, Leading Researcher, Technical Management HW Engineering – Computer Sciences Department, BSC