MareNostrum 5
NEW: essential changes!
This information is provisional and will be available only during the pre-production period.
HPC user account management
Users will now have a unique username associated with their (institutional) email address:
- Your username can now have resource assignments for multiple projects (BSC, RES, EuroHPC, etc.).
- Your username belongs to a primary Unix group (typically corresponding to your institution but without any resource allocation) and will have an associated secondary group per project with resource allocation.
- You must therefore use newgrp (Linux) and the account option (Slurm) with the secondary group to manage your projects' data and jobs (see the sketch after this list).
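For example, assuming a hypothetical secondary group bsc99 (the group name and script name are placeholders), managing project data and jobs could look like this:
newgrp bsc99                   # make the project group your active Unix group for newly created files
sbatch --account=bsc99 job.sh  # charge the job to the same project in Slurm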
A slight modification is applied to existing BSC staff usernames:
- bscXXYYY → bsc0XXYYY
A new bsc_command, called bsc_project, has been developed to let you easily switch between your projects.
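As an illustrative sketch (the subcommand names below are assumptions; check the command's built-in help for the actual interface):
bsc_project list           # assumed: show the projects linked to your username
bsc_project switch bsc99   # assumed: switch your active project (group name is a placeholder)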
Submitting jobs
When submitting a job, it is now mandatory to specify both the account (which will be the same as the secondary group associated with your project) and the Slurm queue.
By specifying the queue, you can send jobs from any login node to any partition.
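A minimal job script header might therefore look like this (account and queue names are placeholders, assuming the queue is selected with --qos; use the values assigned to your project):
#!/bin/bash
#SBATCH --account=bsc99   # your project's secondary group (placeholder)
#SBATCH --qos=<queue>     # the Slurm queue for the target partition (placeholder)
#SBATCH -n 1
./my_binary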
Available storage spaces
- New (empty) filesystems:
/gpfs/home #one per user account (username)
/gpfs/projects #one per project (secondary group)
/gpfs/scratch #one per project (secondary group)
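To locate your spaces, you can list the Unix groups your username belongs to; aside from the primary group, each one corresponds to a project (paths are illustrative):
id -Gn                                   # primary group first, then one secondary group per project
ls -ld /gpfs/projects/<secondary_group>  # project space named after that group
ls -ld /gpfs/scratch/<secondary_group>   # matching scratch space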
We keep incremental backups of /gpfs/home, /gpfs/apps and /gpfs/projects; the backup frequency depends on the amount of data. That said, it is your responsibility as a user of our facilities to back up all your critical data.
| Filesystem | Time to complete copy |
|---|---|
| /gpfs/home | ~1 day |
| /gpfs/apps | ~1 day |
| /gpfs/projects | ~3-4 days |
MareNostrum 4 and (old) Storage filesystems
- The final location for your old MN4-Storage data is as follows:
/gpfs/home/<PRIMARY_GROUP>/$USER/MN4/<MN4_USER>
/gpfs/projects/<GROUP>/MN4/<GROUP>
/gpfs/scratch/<GROUP>/MN4/<GROUP>/<MN4_USER>
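For instance, inspecting and retrieving migrated home data could look like this sketch (the paths follow the pattern above; <PRIMARY_GROUP>, <MN4_USER> and the file name are placeholders):
ls /gpfs/home/<PRIMARY_GROUP>/$USER/MN4/<MN4_USER>              # inspect your migrated MN4 home data
cp /gpfs/home/<PRIMARY_GROUP>/$USER/MN4/<MN4_USER>/input.dat .  # copy a file into the current directory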
Slurm changes with performance implications
Due to modifications in the current Slurm version, srun no longer considers SLURM_CPUS_PER_TASK and does not inherit the --cpus-per-task option from sbatch. Therefore, you must either specify --cpus-per-task explicitly in your srun commands or set the environment variable SRUN_CPUS_PER_TASK instead, for example:
Example 1:
[...]
#SBATCH -n 1
#SBATCH -c 2
srun --cpus-per-task=2 ./openmp_binary
Example 2:
[...]
#SBATCH -n 1
#SBATCH -c 2
export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}
srun ./openmp_binary
- This only applies to srun, not mpirun.
- This becomes crucial when executing with more than one thread per process.
- If this is omitted, thread pinning (thread affinity) will be adversely affected, resulting in threads overlapping on the same cores (hardware threads).
- This will have a direct impact on the application's performance.
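To verify that pinning is applied as expected, you can ask both Slurm and the OpenMP runtime to report the bindings they use (both options are standard; the binary name is illustrative):
export OMP_DISPLAY_AFFINITY=true   # the OpenMP runtime prints each thread's binding
srun --cpus-per-task=${SLURM_CPUS_PER_TASK} --cpu-bind=verbose ./openmp_binary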
When using mpirun instead of srun, the SLURM_CPU_BIND variable must be set to "none":
export SLURM_CPU_BIND=none
When using the Nvidia HPC SDK in the accelerated partition, MPI binaries must be run with mpirun rather than srun, because the Slurm support bundled inside the Nvidia SDK is not entirely compatible with MareNostrum 5's Slurm configuration and causes srun launches to fail.
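Putting the two points together, a hybrid launch with mpirun might look like this sketch (process count and binary name are illustrative):
[...]
#SBATCH -n 4
#SBATCH -c 2
export SLURM_CPU_BIND=none   # let the MPI runtime handle binding instead of Slurm
mpirun -np 4 ./hybrid_binary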
Other considerations
Remote Operation Error
If you run into an error similar to this one:
[gs15r1b68:2016180:0:2016180] ib_mlx5_log.c:179 Remote operation error on mlx5_0:1/IB (synd 0x14 vend 0x89 hw_synd 0/0)
[gs15r1b68:2016180:0:2016180] ib_mlx5_log.c:179 DCI QP 0x1b270 wqe[106]: SEND s-e [rqpn 0xce03 rlid 4285] [va 0x7f072bdf5400 len 65 lkey 0xb300f5]
Check that you are using a UCX module, as this error comes from a known bug in the system-wide installation of UCX. Running this command should fix the issue:
module load ucx
Floating-Point Exception Error
Another error you might encounter is a floating-point exception, which appears as:
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
This error could also be related to the UCX module. To address it, try loading the UCX module with:
module load ucx
By loading the UCX module, you should be able to resolve both types of errors.
Hyper-Threading
All nodes in MareNostrum 5 come with Hyper-Threading capability. Unless you explicitly request to run with SMT, you don't need to be concerned about it and can continue configuring your jobs just as you did on MN4.
We'll soon provide guidance on effective utilization for those interested in leveraging this new functionality.