While running synthetic and real application benchmarks, we observed
that their performance shows high variability on MareNostrum. Colleagues
also reported that they obtained more stable timings with other
applications, such as FALL3D, on clusters like JUWELS. We therefore
decided to investigate the origin of this “system noise” and the
different ways to limit it, as it affects the stability of application
performance and degrades scalability, mainly in applications with
synchronizing MPI calls in their kernel.
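The effect of synchronization on scalability can be illustrated with a toy model. The sketch below is not part of our benchmarks: the function names, the per-rank hit probability `p_hit`, and the `base` and `delay` values are illustrative assumptions. It only shows that, when every iteration ends in a synchronizing call, the iteration lasts as long as the slowest rank, so a rare per-rank delay becomes a frequent per-iteration delay as the rank count grows.

```python
import random


def prob_iteration_delayed(p_hit: float, n_ranks: int) -> float:
    """Probability that at least one of n_ranks is delayed in an iteration,
    assuming independent delays with per-rank probability p_hit."""
    return 1.0 - (1.0 - p_hit) ** n_ranks


def simulate_sync_iterations(n_ranks, n_iters, base=1.0, delay=0.5,
                             p_hit=0.001, seed=0):
    """Toy model: per-iteration time of a fully synchronized application.

    Each rank needs `base` time units, plus `delay` if it is hit by a
    noise event. The iteration ends when the slowest rank finishes.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_iters):
        slowest = max(
            base + (delay if rng.random() < p_hit else 0.0)
            for _ in range(n_ranks)
        )
        times.append(slowest)
    return times
```

Under these assumptions, a delay that hits a given rank in only 0.1% of the iterations affects about 63% of the iterations at 1000 ranks, since `prob_iteration_delayed(0.001, 1000)` evaluates to 1 - 0.999^1000.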
Using a synthetic benchmark that we developed, we observed periodic
events during the simulation that affect the time of a small number of
iterations. We concluded that these events come from the system and that
they preempt the simulation. To avoid these preemptions, we ran the same
benchmarks leaving 1 or 2 cores per node empty. These 1 or 2 empty cores
are then available to run the periodic events, so the other cores avoid
the possible preemption and the applications running on them can achieve
a more stable time per iteration.
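The kind of measurement described above can be sketched as follows. This is a minimal illustration and not the benchmark used in this work: the function names, the amount of work per iteration, and the 1.5x-median threshold are assumptions chosen for the example. The idea is to repeat an identical piece of CPU-bound work, record the time of each iteration, and check whether the slow iterations appear at regular intervals.

```python
import statistics
import time


def time_iterations(n_iters=2000, work=20000):
    """Time n_iters iterations of identical CPU-bound work (seconds)."""
    times = []
    for _ in range(n_iters):
        start = time.perf_counter()
        acc = 0
        for i in range(work):
            acc += i * i  # fixed amount of work per iteration
        times.append(time.perf_counter() - start)
    return times


def find_slow_iterations(times, factor=1.5):
    """Indices of iterations slower than `factor` times the median."""
    median = statistics.median(times)
    return [i for i, t in enumerate(times) if t > factor * median]


def spacing(indices):
    """Gaps between consecutive slow iterations.

    Near-constant gaps suggest a periodic source of interference.
    """
    return [b - a for a, b in zip(indices, indices[1:])]
```

Running `spacing(find_slow_iterations(time_iterations()))` on a fully occupied node and again with 1 or 2 cores left empty gives a simple way to compare how many iterations are disturbed in each configuration.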
In addition, during the PRACE UEABS activity, we noticed that some
applications perform better on SkyLake systems with HyperThreading
enabled, such as the JUWELS cluster (beyond the performance improvement
gained from the higher frequencies). This is because systems with
HyperThreading handle preemption better and their context switches are
faster.
Taking these points into account, we decided to test multiple
applications in multiple configurations, including with HyperThreading
enabled and without limiting the frequency to 2.1 GHz, as is the case on
other SkyLake clusters.