The DLB library improves the load balance of the outer level of parallelism (e.g. MPI) by redistributing the computational resources at the inner level of parallelism (e.g. OpenMP). This readjustment of resources is done dynamically at runtime.
This dynamism allows DLB to react to different sources of imbalance: algorithm, data, hardware architecture, system variability, and resource availability, among others.
How does DLB work?
DLB uses the malleability of the inner level of parallelism to change the number of threads of the different processes running on the same node. Several load balancing algorithms are implemented within DLB. They all rely on this main idea, but they target different types of applications or situations.
Who can use DLB?
Any application written in C, C++ or Fortran using any of the supported parallel programming models. The currently supported parallel programming models are the following:
- MPI+OpenMP
- MPI+OmpSs
- OmpSs (Multiple Applications)
We are open to adding support for more programming models at both the inner and outer levels of parallelism.
Technical Requirements:
- Shared Memory between Processes: DLB needs a shared-memory node with more than one process running on the same node.
- Preload Mechanism: The system must provide a preload mechanism (e.g. LD_PRELOAD) to intercept MPI calls. (Not necessary if the Nanos++ runtime is used and MPI calls do not need to be intercepted.)
- Parallel Regions in OpenMP: If using the OpenMP model, DLB needs the application to open and close different parallel regions in order to change the number of threads (the OpenMP standard only allows changing the number of threads outside a parallel region).
- Non-busy waiting mode for MPI calls: To use the CPU that is waiting in communication, DLB needs the MPI calls to be non-busy waiting (blocking rather than spinning). The different MPI implementations usually offer a way of obtaining this behaviour, but it is not enabled by default. DLB also offers a mode where the CPU executing the MPI call is simply not used, at the cost of some performance.
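As an illustration, the following environment settings are commonly used to request non-busy waiting with two widespread MPI implementations. The exact variable names and their availability vary by implementation and version, so treat these as examples and check your MPI implementation's documentation:

```shell
# Intel MPI: enable wait mode so ranks block instead of spinning in MPI calls
export I_MPI_WAIT_MODE=1

# Open MPI: make idle ranks yield the CPU while waiting in MPI calls
export OMPI_MCA_mpi_yield_when_idle=1
```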