System Tools and Advanced Runtimes (STAR)
The System Tools and Advanced Runtimes (STAR) group focuses on research that spans multiple software layers, from the OS, runtimes and low-level APIs to programming models, tools and applications. Our goal is to improve the system software stack so that it supports increasingly complex workloads, such as traditional HPC, AI and Data Analytics, on massively parallel HPC and Cloud platforms. We are involved in several European and industrial projects.
Objectives
Data-flow programming models
We develop and maintain OmpSs-2, a versatile data-flow programming model for building HPC applications. Its main targets are multi-core, heterogeneous and distributed systems. The OmpSs-2 programming model is implemented by an LLVM-based compiler and the Nanos6 runtime system.
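To illustrate the data-flow style, the following is a minimal sketch of a blocked computation annotated with OmpSs-2 task directives. The function names (`scale`, `accumulate`, `blocked_axpy`) and the blocking scheme are hypothetical and chosen only for illustration. The `in`/`inout` clauses declare what each task reads and writes, so a data-flow runtime can derive the execution order from the data dependencies. A regular C compiler ignores the `oss` pragmas and runs the code sequentially, with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale every element of x by alpha, in place. */
static void scale(double *x, size_t n, double alpha)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

/* Hypothetical helper: add x to y element-wise. */
static void accumulate(const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += x[i];
}

/*
 * Hypothetical driver: for each block, scale x by alpha and then add it to y.
 * Note that x is overwritten with alpha * x.
 */
void blocked_axpy(double *x, double *y, size_t n, size_t block, double alpha)
{
    assert(block > 0); /* a zero block size would never advance the loop */

    for (size_t start = 0; start < n; start += block) {
        size_t len = (n - start < block) ? n - start : block;
        double *xb = x + start;
        double *yb = y + start;

        /* Task 1: reads and writes one block of x. */
        #pragma oss task inout(xb[0;len])
        scale(xb, len, alpha);

        /* Task 2: reads the same block of x, so it must run after task 1. */
        #pragma oss task in(xb[0;len]) inout(yb[0;len])
        accumulate(xb, yb, len);
    }

    /* Wait for all tasks created above before returning. */
    #pragma oss taskwait
}
```

In this sketch, tasks that work on different blocks share no data, so they could run concurrently, while the two tasks of each block are ordered by their dependency on the same block of `x`.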
Compilers and Domain-Specific Languages (DSLs)
The OmpSs-2 compiler for the C and C++ languages leverages the LLVM infrastructure to generate high-performance code. We also develop Saiph, a domain-specific language (DSL) that targets the Computational Fluid Dynamics (CFD) domain. Saiph is a research vehicle to demonstrate the feasibility of DSLs on HPC systems, and it relies on many OmpSs-2 features to run on them.
OS and Runtime Systems
Runtime systems are a crucial component for exploiting current multi-core and heterogeneous systems. Our research on runtime systems focuses on Nanos6, the reference runtime implementation of the OmpSs-2 programming model. Nanos6 provides state-of-the-art task scheduling, dependency management, monitoring infrastructure and support for accelerators such as GPUs and FPGAs. We also work on the Linux kernel, extending it to better support advanced runtime systems. Finally, we have augmented the LLVM OpenMP runtime with the low-level APIs required to support the Task-Aware (TA) libraries described below.
Interoperability with other programming models and APIs
Our Nanos6 and LLVM OpenMP runtime systems provide several features to integrate other programming models and low-level APIs with the data-flow execution model of OmpSs-2 and OpenMP tasks. On top of these low-level APIs we have developed several Task-Aware (TA) libraries that ease the orchestration of complex hybrid and heterogeneous applications. The TAMPI and TAGASPI libraries support the MPI and GASPI message-passing APIs, respectively, while TA-CUDA, TA-OpenCL and TACL leverage the CUDA, OpenCL and AscendCL APIs to exploit heterogeneous systems. Finally, the TASIO library facilitates the integration of the Linux io_uring high-performance asynchronous storage API with OmpSs-2 and OpenMP.
Heterogeneous Computing
OmpSs-2 provides high-level support for developing applications on heterogeneous systems composed of multi-cores, GPUs and FPGAs. To that end, OmpSs-2 offers a high-level abstraction to leverage kernels developed with CUDA C, Xilinx HLS and OpenACC. These kernels can be used as simple tasks in an OmpSs-2 program thanks to a transparent directory/cache that automatically manages all data transfers between the host and the accelerators. This research line is conducted in close collaboration with the Programming Models and AccelCom groups.
Benchmarks, libraries and tools
The Garlic application suite comprises several mini-apps and benchmarks that represent common computational patterns found in several scientific domains. Each application is implemented using several programming models, such as MPI, OpenMP and OmpSs-2. The TVM library provides optimized kernels to perform the Tensor Vector Multiply operation using the OpenMP fork-join model. Finally, the ovni instrumentation library is used to obtain detailed execution traces, facilitating the analysis and optimization of applications and programming models.
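As an illustration of the Tensor Vector Multiply operation, the sketch below contracts the last mode of an order-3 tensor with a vector using an OpenMP fork-join loop. It is a naive reference kernel, not the TVM library's API or its optimized implementation. The function name `tvm_mode3`, the row-major layout and the choice of mode are assumptions made for the example. Without OpenMP enabled, the pragma is ignored and the loop runs sequentially.

```c
#include <stddef.h>

/*
 * Hypothetical reference kernel (not the TVM library's interface).
 * Computes out[i][j] = sum over k of tensor[i][j][k] * vec[k]
 * for a contiguous row-major tensor of shape ni x nj x nk.
 */
void tvm_mode3(const double *tensor, const double *vec, double *out,
               size_t ni, size_t nj, size_t nk)
{
    /* Each (i, j) pair owns one contiguous fiber of nk elements. */
    const size_t fibers = ni * nj;

    /* Fork-join parallelism: fibers are independent, so split them across threads. */
    #pragma omp parallel for
    for (size_t f = 0; f < fibers; f++) {
        const double *fiber = tensor + f * nk;
        double sum = 0.0;
        for (size_t k = 0; k < nk; k++)
            sum += fiber[k] * vec[k];
        out[f] = sum;
    }
}
```

Every output element is written by exactly one loop iteration, so the parallel loop needs no synchronization beyond the implicit barrier at its end.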