The LSF system uses built-in and configured resources to track resource availability and usage. Jobs are scheduled according to the resources available on individual hosts.
Jobs submitted through the LSF system will have the resources they use monitored while they are running. This information is used to enforce resource limits and load thresholds as well as fairshare scheduling.
LSF collects information such as:
Total CPU time consumed by all processes in the job
Total resident memory usage in KB of all currently running processes in a job
Total virtual memory usage in KB of all currently running processes in a job
Currently active process group ID in a job
Currently active processes in a job
On UNIX, job-level resource usage is collected through PIM.
lsinfo — View the resources available in your cluster
bjobs -l — View current resource usage of a job
SBD_SLEEP_TIME in lsb.params — Configures how often resource usage information is sampled by PIM, collected by sbatchd, and sent to mbatchd
Load indices measure the availability of dynamic, non-shared resources on hosts in the cluster. Load indices built into the LIM are updated at fixed time intervals.
lsload -l — View all load indices
bhosts -l — View load levels on a host
Defined and configured by the LSF administrator and collected by an External Load Information Manager (ELIM) program. The ELIM also updates LIM when new values are received.
lsinfo — View external load indices
Built-in resources that represent host information that does not change over time, such as the maximum RAM available to user processes or the number of processors in a machine. Most static resources are determined by the LIM at startup.
Static resources can be used to select appropriate hosts for particular jobs based on binary architecture, relative CPU speed, and system configuration.
Two types of load thresholds can be configured by your LSF administrator to schedule jobs in queues. Each load threshold specifies a load index value:
loadSched determines the load condition for dispatching pending jobs. If a host’s load is beyond any defined loadSched, a job will not be started on the host. This threshold is also used as the condition for resuming suspended jobs.
loadStop determines when running jobs should be suspended.
To schedule a job on a host, the load levels on that host must satisfy both the thresholds configured for that host and the thresholds for the queue from which the job is being dispatched.
The value of a load index may either increase or decrease with load, depending on the meaning of the specific load index. Therefore, when comparing the host load conditions with the threshold values, you need to use either greater than (>) or less than (<), depending on the load index.
bhosts -l — View suspending conditions for hosts
bqueues -l — View suspending conditions for queues
bjobs -l — View suspending conditions for a particular job and the scheduling thresholds that control when a job is resumed
lsb.hosts — Configure thresholds for hosts
lsb.queues — Configure thresholds for queues
Limit the use of resources while a job is running. Jobs that consume more than the specified amount of a resource are signalled or have their priority lowered.
lsb.queues — Configure resource usage limits for queues
Resource limits specified at the queue level are hard limits while those specified with job submission are soft limits. See setrlimit(2) man page for concepts of hard and soft limits.
Restrict the amount of a given resource that must be available during job scheduling for different classes of jobs to start, and which resource consumers the limits apply to. If all of the resource has been consumed, no more jobs can be started until some of the resource is released.
lsb.resources — Configure queue-level resource allocation limits for hosts, users, queues, and projects
Restrict which hosts the job can run on. Hosts that match the resource requirements are the candidate hosts. When LSF schedules a job, it collects the load index values of all the candidate hosts and compares them to the scheduling conditions. Jobs are only dispatched to a host if all load values are within the scheduling thresholds.
bsub -R — Specify resource requirement string for a job
lsb.queues — Configure resource requirements for queues