About SGI cpusets

An SGI cpuset is a named set of CPUs. The processes attached to a cpuset can only run on the CPUs belonging to that cpuset.

How LSF uses cpusets

LSF uses two types of cpusets:

  • Dynamic cpusets: Jobs are attached to a cpuset that LSF creates dynamically. The cpuset is deleted when the job finishes or exits. If no cpuset type is specified, the default is dynamic.

  • Static cpusets: Jobs are attached to a static cpuset specified by users at job submission. This cpuset is not deleted when the job finishes or exits. Specifying a cpuset name at job submission implies that the cpuset type is static. If the static cpuset does not exist, the job will remain pending until LSF detects a static cpuset with the specified name.
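The type rules above can be sketched as a small Python model. `CpusetRequest`, `resolve_cpuset_type`, `job_state`, and `deleted_when_job_ends` are hypothetical names used only for illustration. They are not part of any LSF interface, and the state strings are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class CpusetRequest:
    """Hypothetical model of a job's cpuset request (not an LSF API)."""
    cpuset_name: Optional[str] = None  # static cpuset name given at submission
    cpuset_type: Optional[str] = None  # "dynamic", "static", or None if unspecified


def resolve_cpuset_type(req: CpusetRequest) -> str:
    # Naming a cpuset at submission implies a static cpuset.
    if req.cpuset_name is not None:
        return "static"
    # Otherwise use the requested type, defaulting to dynamic.
    return req.cpuset_type or "dynamic"


def job_state(req: CpusetRequest, existing_static: Set[str]) -> str:
    # A job naming a static cpuset stays pending until LSF detects that cpuset.
    if resolve_cpuset_type(req) == "static" and req.cpuset_name not in existing_static:
        return "PEND"
    return "ELIGIBLE"


def deleted_when_job_ends(cpuset_type: str) -> bool:
    # Dynamic cpusets are deleted when the job finishes or exits; static ones persist.
    return cpuset_type == "dynamic"


req = CpusetRequest(cpuset_name="myset")
print(resolve_cpuset_type(req), job_state(req, set()))  # static PEND
```

In this model a request with no name and no type resolves to a dynamic cpuset, and a named request stays pending until the named cpuset appears in the set of known static cpusets.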

The following diagram shows the system architecture:

Cpusets can be created and deallocated dynamically out of available machine resources. A cpuset provides both containment, so that a job requiring a specific number of CPUs runs only on those CPUs, and reservation, so that the required CPUs are guaranteed to be available only to the job they are allocated to.

LSF can be configured to use SGI cpusets to enforce processor limits for LSF jobs. When a submitted job is scheduled, LSF creates a cpuset and attaches the job to it. After the job finishes, LSF deallocates the cpuset. If no host meets the CPU requirements, the job remains pending until enough processors become available to allocate the cpuset.
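Containment and reservation can be illustrated with a toy per-host allocator, assuming a single host and ignoring slots, memory nodes, and the real SGI cpuset interface. `HostCpus`, `schedule`, and `finish` are hypothetical names: a job either receives a dedicated set of CPUs that no other job can use, or receives nothing and stays pending; finishing the job returns its CPUs to the pool.

```python
from typing import Dict, List, Optional, Set


class HostCpus:
    """Hypothetical per-host CPU pool used to illustrate cpuset reservation."""

    def __init__(self, ncpus: int) -> None:
        self.free: Set[int] = set(range(ncpus))
        self.cpusets: Dict[int, List[int]] = {}  # job id -> CPUs reserved for it

    def schedule(self, job_id: int, ncpus: int) -> Optional[List[int]]:
        """Reserve ncpus CPUs for the job, or return None (job stays pending)."""
        if ncpus > len(self.free):
            return None
        cpus = sorted(self.free)[:ncpus]
        # Reserved CPUs leave the free pool, so no other job can be placed on them.
        self.free.difference_update(cpus)
        self.cpusets[job_id] = cpus
        return cpus

    def finish(self, job_id: int) -> None:
        """Deallocate the job's cpuset and return its CPUs to the free pool."""
        self.free.update(self.cpusets.pop(job_id))


host = HostCpus(4)
print(host.schedule(1, 3))  # [0, 1, 2]
print(host.schedule(2, 2))  # None: only one CPU is free, so job 2 pends
host.finish(1)
print(host.schedule(2, 2))  # [0, 1]
```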

Assumptions and limitations

  • When using cpusets, LSF schedules jobs based on the number of slots assigned to the hosts instead of the number of CPUs. The lsb.params parameter setting PARALLEL_SCHED_BY_SLOTS=N has no effect.

  • Preemptable queue preference is not supported.

  • Before upgrading from a previous version, clusters must be drained of all running jobs (especially on cpuset hosts).

  • The new cpuset integration cannot coexist with the old integration within the same cluster.

  • Under the MultiCluster lease model, both clusters must use the same version of the cpuset integration.

  • Since backfill and slot reservation are based on an entire host, they may not work correctly if your cluster contains hosts that use both static and dynamic cpusets or multiple static cpusets.

  • Jobs submitted to a chunk job queue are not chunked together, but run as individual LSF jobs inside a dynamic cpuset.

  • When LSF selects cpuset jobs to preempt, specialized preemption preferences (such as MINI_JOB and LEAST_RUN_TIME in the PREEMPT_FOR parameter in lsb.params) are ignored when slot preemption is required.

  • Job pre-execution programs run within the job cpuset, since they are part of the job. By default, post-execution programs run outside of the job cpuset.

  • If JOB_INCLUDE_POSTPROC=Y is specified in lsb.applications, post-execution processing is not attached to the job cpuset, and Platform LSF does not release the cpuset until post-execution processing has finished.

  • Suspended jobs (for example, jobs suspended with bstop) release their cpusets.

  • Jobs running in a cpuset cannot be resized.
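The pre-execution and post-execution rules above can be summarized as a small timeline model. `cpuset_timeline` is a hypothetical helper, and the model assumes that, without JOB_INCLUDE_POSTPROC=Y, the cpuset is released as soon as the job itself ends.

```python
from typing import List, Tuple


def cpuset_timeline(job_include_postproc: bool) -> List[Tuple[str, bool, bool]]:
    """Return (stage, runs_inside_job_cpuset, cpuset_still_allocated) per stage.

    Illustrative model only; assumes the cpuset is released when the job
    itself ends if JOB_INCLUDE_POSTPROC is not set to Y.
    """
    return [
        # Pre-execution is part of the job, so it runs inside the cpuset.
        ("pre_exec", True, True),
        ("job", True, True),
        # Post-execution is never attached to the job cpuset, but with
        # JOB_INCLUDE_POSTPROC=Y the cpuset is held until it finishes.
        ("post_exec", False, job_include_postproc),
    ]


for stage, inside, held in cpuset_timeline(job_include_postproc=True):
    print(f"{stage:10} inside={inside} held={held}")
```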

SGI MPI jobs

To run multihost MPI applications, you must also enable rsh without password prompts between hosts:

  • The remote host must be defined in the arrayd configuration.

  • Configure .rhosts so that rsh does not require a password.
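A minimal sketch for generating a `.rhosts` file, assuming the common `hostname username` entry format and placeholder host names. `render_rhosts` and `write_rhosts` are hypothetical helpers. The file is written with owner-only permissions because many rsh implementations ignore a `.rhosts` file that other users can write. The arrayd configuration is not covered here.

```python
import os
from typing import Iterable


def render_rhosts(hosts: Iterable[str], user: str) -> str:
    """Build .rhosts content: one "hostname username" entry per trusted host."""
    return "".join(f"{host} {user}\n" for host in hosts)


def write_rhosts(path: str, content: str) -> None:
    """Write the file and restrict it to the owner (mode 0600)."""
    with open(path, "w") as fh:
        fh.write(content)
    os.chmod(path, 0o600)


# Placeholder hosts; in practice, list every host the MPI job spans.
print(render_rhosts(["hostA", "hostB"], "alice"), end="")
```

To install it, a user would call `write_rhosts(os.path.expanduser("~/.rhosts"), ...)` on each host, then check that `rsh` to every other host succeeds without a password prompt.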

Forcing a cpuset job to run

The administrator must use brun -c to force a cpuset job to run. If the job is forced to run on non-cpuset hosts, or if any host in the host list specified with -m is not a cpuset host, -extsched cpuset options are ignored and the job runs with no cpusets allocated.

If the job is forced to run on a cpuset host:

  • For dynamic cpusets: LSF allocates a dynamic cpuset without any cpuset options and runs the job inside the dynamic cpuset.

  • For static cpusets: LSF runs the job in the static cpuset. If the specified static cpuset does not exist, the job is requeued.
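The outcomes of forcing a cpuset job with brun -c can be summarized as a small decision function. `forced_run_outcome` and its result strings are hypothetical and only mirror the rules described in this section; they do not reflect how LSF implements them.

```python
from typing import Iterable, Optional, Set


def forced_run_outcome(
    target_hosts: Iterable[str],
    cpuset_hosts: Set[str],
    cpuset_type: str,
    static_name: Optional[str] = None,
    existing_static: Optional[Set[str]] = None,
) -> str:
    """Hypothetical model of what happens when brun -c forces a cpuset job."""
    # Any non-cpuset host in the target list means -extsched cpuset options
    # are ignored and no cpuset is allocated.
    if any(host not in cpuset_hosts for host in target_hosts):
        return "run without cpuset"
    if cpuset_type == "dynamic":
        # A dynamic cpuset is allocated, but without any cpuset options.
        return "run in dynamic cpuset"
    # Static cpuset: run in it if it exists, otherwise requeue the job.
    if static_name is not None and static_name in (existing_static or set()):
        return "run in static cpuset"
    return "requeue"


print(forced_run_outcome(["hostA", "hostB"], {"hostA"}, "dynamic"))  # run without cpuset
```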