lsb.queues

The lsb.queues file defines batch queues. Numerous controls are available at the queue level to allow cluster administrators to customize site policies.

This file is optional; if no queues are configured, LSF creates a queue named default, with all parameters set to default values.

This file is installed by default in LSB_CONFDIR/cluster_name/configdir.

Changing lsb.queues configuration

After making any changes to lsb.queues, run badmin reconfig to reconfigure mbatchd.

Some parameters such as run window and run time limit do not take effect immediately for running jobs unless you run mbatchd restart or sbatchd restart on the job execution host.
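For example, a typical sequence after editing the file might look like the following (hostA is a placeholder for a job execution host whose running jobs must pick up a changed run window or run time limit):
badmin ckconfig
badmin reconfig
badmin hrestart hostA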

lsb.queues structure

Each queue definition begins with the line Begin Queue and ends with the line End Queue. The queue name must be specified; all other parameters are optional.
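For illustration, a minimal queue definition might look like the following (the queue name, priority, and description are examples only):
Begin Queue
QUEUE_NAME  = normal
PRIORITY    = 30
DESCRIPTION = Example queue for normal jobs.
End Queue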

ADMINISTRATORS

Syntax

ADMINISTRATORS=user_name | user_group ...

Description

List of queue administrators. To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name or DOMAIN_NAME\user_group).

Queue administrators can perform operations on any user’s job in the queue, as well as on the queue itself.
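
Example

The following makes user1 and the members of group lsfadmins administrators of the queue (both names are placeholders):
ADMINISTRATORS=user1 lsfadmins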

Default

Not defined. You must be a cluster administrator to operate on this queue.

APS_PRIORITY

Syntax

APS_PRIORITY=WEIGHT[[factor, value] [subfactor, value]...]...] LIMIT[[factor, value] [subfactor, value]...]...] GRACE_PERIOD[[factor, value] [subfactor, value]...]...]

Description

Specifies calculation factors for absolute priority scheduling (APS). Pending jobs in the queue are ordered according to the calculated APS value.

If the weight of a subfactor is defined but the weight of the parent factor is not, the parent factor weight is set to 1.

The WEIGHT and LIMIT factors are floating-point values. Specify a value for GRACE_PERIOD in seconds (values), minutes (valuem), or hours (valueh).

The default unit for grace period is hours.

For example, the following sets a grace period of 10 hours for the MEM factor, 10 minutes for the JPRIORITY factor, 10 seconds for the QPRIORITY factor, and 10 hours (default) for the RSRC factor:
GRACE_PERIOD[[MEM,10h] [JPRIORITY, 10m] [QPRIORITY,10s] [RSRC, 10]]

You cannot specify zero (0) for the WEIGHT, LIMIT, and GRACE_PERIOD of any factor or subfactor.

APS queues cannot configure cross-queue fairshare (FAIRSHARE_QUEUES). The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.

Suspended (bstop) jobs and migrated jobs (bmig) are always scheduled before pending jobs. For migrated jobs, LSF keeps the existing job priority information.

If LSB_REQUEUE_TO_BOTTOM and LSB_MIG2PEND are configured in lsf.conf, migrated jobs keep their APS information and must compete with other pending jobs based on the APS value. If you want to reset the APS value, use brequeue, not bmig.
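
Example

A sketch using factor and subfactor names that appear in the grace period example above; the weights and limits are arbitrary illustrations, not recommended values:
APS_PRIORITY=WEIGHT[[QPRIORITY, 10] [MEM, 0.5]] LIMIT[[MEM, 100]] GRACE_PERIOD[[MEM, 10m]]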

Default

Not defined

BACKFILL

Syntax

BACKFILL=Y | N

Description

If Y, enables backfill scheduling for the queue.

A possible conflict exists if BACKFILL and PREEMPTION are specified together. If PREEMPT_JOBTYPE=BACKFILL is set in the lsb.params file, a backfill queue can be preemptable; otherwise, a backfill queue cannot be preemptable. If BACKFILL is enabled, do not also specify PREEMPTION=PREEMPTABLE.

BACKFILL is required for interruptible backfill queues (INTERRUPTIBLE_BACKFILL=seconds).

When MAX_SLOTS_IN_POOL, SLOT_RESERVE, and BACKFILL are defined for the same queue, jobs in the queue cannot backfill using slots reserved by other jobs in the same queue.
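
Example

A sketch of a low-priority backfill queue; the queue name, priority, and run limit are examples only, and assume that backfill scheduling uses the run limit to decide whether a job fits into reserved slots:
Begin Queue
QUEUE_NAME = short
PRIORITY   = 20
BACKFILL   = Y
RUNLIMIT   = 10
End Queue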

Default

Not defined. No backfilling.

CHKPNT

Syntax

CHKPNT=chkpnt_dir [chkpnt_period]

Description

Enables automatic checkpointing for the queue. All jobs submitted to the queue are checkpointable.

The checkpoint directory is the directory where the checkpoint files are created. Specify an absolute path or a path relative to the current working directory; do not use environment variables.

Specify the optional checkpoint period in minutes.

Only running members of a chunk job can be checkpointed.

If checkpoint-related configuration is specified in both the queue and an application profile, the application profile setting overrides queue level configuration.

If checkpoint-related configuration is specified in the queue, application profile, and at job level:
  • Application-level and job-level parameters are merged. If the same parameter is defined at both job-level and in the application profile, the job-level value overrides the application profile value.
  • The merged result of job-level and application profile settings overrides queue-level configuration.

To enable checkpointing of MultiCluster jobs, define a checkpoint directory in both the send-jobs and receive-jobs queues (CHKPNT in lsb.queues), or in an application profile (CHKPNT_DIR, CHKPNT_PERIOD, CHKPNT_INITPERIOD, CHKPNT_METHOD in lsb.applications) of both submission cluster and execution cluster. LSF uses the directory specified in the execution cluster.

To make a MultiCluster job checkpointable, both submission and execution queues must enable checkpointing, and the application profile or queue setting on the execution cluster determines the checkpoint directory. Checkpointing is not supported if a job runs on a leased host.

The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.
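
Example

A sketch that enables checkpointing with a 30-minute checkpoint period (the directory path is an example only):
CHKPNT=/share/lsf/chkpnt 30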

Default

Not defined

CHUNK_JOB_SIZE

Syntax

CHUNK_JOB_SIZE=integer

Description

Chunk jobs only. Enables job chunking and specifies the maximum number of jobs allowed to be dispatched together in a chunk. Specify a positive integer greater than 1.

The ideal candidates for job chunking are jobs that have the same host and resource requirements and typically take 1 to 2 minutes to run.

Job chunking can have the following advantages:
  • Reduces communication between sbatchd and mbatchd and reduces scheduling overhead in mbschd.
  • Increases job throughput in mbatchd and CPU utilization on the execution hosts.

However, throughput can deteriorate if the chunk job size is too big. Performance may decrease on queues with CHUNK_JOB_SIZE greater than 30. You should evaluate the chunk job size on your own systems for best performance.

With MultiCluster job forwarding model, this parameter does not affect MultiCluster jobs that are forwarded to a remote cluster.

Compatibility

This parameter is ignored in the following kinds of queues and applications:
  • Interactive (INTERACTIVE=ONLY parameter)
  • CPU limit greater than 30 minutes (CPULIMIT parameter)
  • Run limit greater than 30 minutes (RUNLIMIT parameter)
  • Runtime estimate greater than 30 minutes (RUNTIME parameter in lsb.applications only)

If CHUNK_JOB_DURATION is set in lsb.params, chunk jobs are accepted regardless of the value of CPULIMIT, RUNLIMIT or RUNTIME.

Example

The following configures a queue named chunk, which dispatches up to 4 jobs in a chunk:
Begin Queue
QUEUE_NAME     = chunk 
PRIORITY       = 50 
CHUNK_JOB_SIZE = 4 
End Queue

Default

Not defined

COMMITTED_RUN_TIME_FACTOR

Syntax

COMMITTED_RUN_TIME_FACTOR=number

Description

Used only with fairshare scheduling. Committed run time weighting factor.

In the calculation of a user’s dynamic priority, this factor determines the relative importance of the committed run time. If the -W option of bsub is not specified at job submission and a RUNLIMIT has not been set for the queue, the committed run time is not considered.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Valid values

Any positive number between 0.0 and 1.0

Default

Not defined.

CORELIMIT

Syntax

CORELIMIT=integer

Description

The per-process (hard) core file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).

Default

Unlimited


CPU_FREQUENCY

Syntax

CPU_FREQUENCY=[float_number][unit]

Description

Specifies the CPU frequency for a queue. All jobs submitted to the queue require the specified CPU frequency. The value is a positive float number with units (GHz, MHz, or KHz). If no unit is specified, the default is GHz.

This value can also be set using the command bsub -freq.

The submission value overrides the application profile value, and the application profile value overrides the queue value.
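
Example

A sketch (the frequency value is an example only):
CPU_FREQUENCY=2.5GHz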

Default

Not defined (Nominal CPU frequency is used)


CPULIMIT

Syntax

CPULIMIT=[default_limit] maximum_limit

where default_limit and maximum_limit are:

[hour:]minute[/host_name | /host_model]

Description

Maximum normalized CPU time and optionally, the default normalized CPU time allowed for all processes of a job running in this queue. The name of a host or host model specifies the CPU time normalization host to use.

Limits the total CPU time the job can use. This parameter is useful for preventing runaway jobs or jobs that use up too many resources.

When the total CPU time for the whole job has reached the limit, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.

If a job dynamically spawns processes, the CPU time used by these processes is accumulated over the life of the job.

Processes that exist for fewer than 30 seconds may be ignored.

By default, if a default CPU limit is specified, jobs submitted to the queue without a job-level CPU limit are killed when the default CPU limit is reached.

If you specify only one limit, it is the maximum, or hard, CPU limit. If you specify two limits, the first one is the default, or soft, CPU limit, and the second one is the maximum CPU limit. The number of minutes may be greater than 59. Therefore, three and a half hours can be specified either as 3:30 or 210.

If no host or host model is given with the CPU time, LSF uses the default CPU time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, it uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, it uses the host with the largest CPU factor (the fastest host in the cluster).

On Windows, a job that runs under a CPU time limit may exceed that limit by up to SBD_SLEEP_TIME. This is because sbatchd periodically checks if the limit has been exceeded.

On UNIX systems, the CPU limit can be enforced by the operating system at the process level.

You can define whether the CPU limit is a per-process limit enforced by the OS or a per-job limit enforced by LSF with LSB_JOB_CPULIMIT in lsf.conf.

Jobs submitted to a chunk job queue are not chunked if CPULIMIT is greater than 30 minutes.
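
Example

The following sets a default (soft) limit of one hour and a maximum (hard) limit of three and a half hours, normalized to the default CPU time normalization host:
CPULIMIT=1:00 3:30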

Default

Unlimited

CPU_TIME_FACTOR

Syntax

CPU_TIME_FACTOR=number

Description

Used only with fairshare scheduling. CPU time weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the cumulative CPU time used by a user’s jobs.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

0.7

DATALIMIT

Syntax

DATALIMIT=[default_limit] maximum_limit

Description

The per-process data segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).

By default, if a default data limit is specified, jobs submitted to the queue without a job-level data limit are killed when the default data limit is reached.

If you specify only one limit, it is the maximum, or hard, data limit. If you specify two limits, the first one is the default, or soft, data limit, and the second one is the maximum data limit.
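
Example

The following sets a default (soft) data limit of 100000 KB and a maximum (hard) data limit of 200000 KB:
DATALIMIT=100000 200000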

Default

Unlimited

DEFAULT_EXTSCHED

Syntax

DEFAULT_EXTSCHED=external_scheduler_options

Description

Specifies default external scheduling options for the queue.

-extsched options on the bsub command are merged with DEFAULT_EXTSCHED options, and -extsched options override any conflicting queue-level options set by DEFAULT_EXTSCHED.

Default

Not defined

DEFAULT_HOST_SPEC

Syntax

DEFAULT_HOST_SPEC=host_name | host_model

Description

The default CPU time normalization host for the queue.

The CPU factor of the specified host or host model is used to normalize the CPU time limit of all jobs in the queue, unless the CPU time normalization host is specified at the job level.
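
Example

A sketch (hostA is a placeholder for a host or host model defined in your cluster):
DEFAULT_HOST_SPEC=hostA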

Default

Not defined. The queue uses the DEFAULT_HOST_SPEC defined in lsb.params. If DEFAULT_HOST_SPEC is not defined in either file, LSF uses the fastest host in the cluster.

DESCRIPTION

Syntax

DESCRIPTION=text

Description

Description of the job queue displayed by bqueues -l.

This description should clearly describe the service features of this queue, to help users select the proper queue for each job.

The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 512 characters.
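
Example

The following spans two lines using the backslash continuation described above:
DESCRIPTION=For normal low priority jobs, running only if hosts are \
lightly loaded.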

DISPATCH_BY_QUEUE

Syntax

DISPATCH_BY_QUEUE=Y|y|N|n

Description

Set this parameter to increase queue responsiveness. The scheduling decision for the specified queue will be published without waiting for the whole scheduling session to finish. The scheduling decision for the jobs in the specified queue is final and these jobs cannot be preempted within the same scheduling cycle.

Tip:

Only set this parameter for your highest priority queue (such as for an interactive queue) to ensure that this queue has the highest responsiveness.

Default

N

DISPATCH_ORDER

Syntax

DISPATCH_ORDER=QUEUE

Description

Defines an ordered cross-queue fairshare set. DISPATCH_ORDER indicates that jobs are dispatched according to the order of queue priorities first, then user fairshare priority.

By default, a user has the same priority across the master and slave queues. If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs the user has submitted across the master-slave set.

If DISPATCH_ORDER=QUEUE is set in the master queue, jobs are dispatched according to queue priorities first, then user priority. Jobs from users with lower fairshare priorities who have pending jobs in higher priority queues are dispatched before jobs in lower priority queues. This avoids having users with higher fairshare priority getting jobs dispatched from low-priority queues.

Jobs in queues having the same priority are dispatched according to user priority.

Queues that are not part of the cross-queue fairshare can have any priority; they are not required to fall outside of the priority range of the cross-queue fairshare queues.

Default

Not defined

DISPATCH_WINDOW

Syntax

DISPATCH_WINDOW=time_window ...

Description

The time windows in which jobs from this queue are dispatched. Once dispatched, jobs are no longer affected by the dispatch window.
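
Example

A sketch that dispatches jobs from this queue only overnight, between 7:00 PM and 7:00 AM (the window is an example only):
DISPATCH_WINDOW=19:00-7:00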

Default

Not defined. Dispatch window is always open.

ENABLE_HIST_RUN_TIME

Syntax

ENABLE_HIST_RUN_TIME=y | Y | n | N

Description

Used only with fairshare scheduling. If set, enables the use of historical run time in the calculation of fairshare scheduling priority.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

EXCLUSIVE

Syntax

EXCLUSIVE=Y | N | CU[cu_type]

Description

If Y, specifies an exclusive queue.

If CU, CU[], or CU[cu_type], specifies an exclusive queue as well as a queue exclusive to compute units of type cu_type (as defined in lsb.params). If no type is specified, the default compute unit type is used.

Jobs submitted to an exclusive queue with bsub -x are only dispatched to a host that has no other LSF jobs running. Jobs submitted to a compute unit exclusive queue with bsub -R "cu[excl]" only run on a compute unit that has no other jobs running.

For hosts shared under the MultiCluster resource leasing model, jobs are not dispatched to a host that has LSF jobs running, even if the jobs are from another cluster.

Note: EXCLUSIVE=Y or EXCLUSIVE=CU[cu_type] must be configured to enable affinity jobs to use CPUs exclusively, when the alljobs scope is specified in the exclusive option of an affinity[] resource requirement string.
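
Example

A sketch: with the following set in a queue named excl (a placeholder name), a user could submit an exclusive job with bsub -x -q excl myjob:
EXCLUSIVE=Y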

Default

N

FAIRSHARE

Syntax

FAIRSHARE=USER_SHARES[[user, number_shares] ...]
  • Specify at least one user share assignment.
  • Enclose the list in square brackets, as shown.
  • Enclose each user share assignment in square brackets, as shown.
  • user: Specify users who are also configured to use the queue. You can assign the shares to:
    • A single user (specify user_name). To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).
    • Users in a group, individually (specify group_name@) or collectively (specify group_name). To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\group_name).
    • Users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others)
      • By default, when resources are assigned collectively to a group, the group members compete for the resources on a first-come, first-served (FCFS) basis. You can use hierarchical fairshare to further divide the shares among the group members.
      • When resources are assigned to members of a group individually, the share assignment is recursive. Members of the group and of all subgroups always compete for the resources according to FCFS scheduling, regardless of hierarchical fairshare policies.
  • number_shares
    • Specify a positive integer representing the number of shares of the cluster resources assigned to the user.
    • The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.

Description

Enables queue-level user-based fairshare and specifies share assignments. Only users with share assignments can submit jobs to the queue.

Compatibility

Do not configure hosts in a cluster to use fairshare at both queue and host levels. However, you can configure user-based fairshare and queue-based fairshare together.
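
Example

A sketch assigning shares to a single user, a group collectively, and everyone else (all names are placeholders):
FAIRSHARE=USER_SHARES[[user1, 100] [groupA, 50] [others, 10]]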

Default

Not defined. No fairshare.

FAIRSHARE_ADJUSTMENT_FACTOR

Syntax

FAIRSHARE_ADJUSTMENT_FACTOR=number

Description

Used only with fairshare scheduling. Fairshare adjustment plugin weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the user-defined adjustment made in the fairshare plugin (libfairshareadjust.*).

A positive float number both enables the fairshare plugin and acts as a weighting factor.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

FAIRSHARE_QUEUES

Syntax

FAIRSHARE_QUEUES=queue_name [queue_name ...]

Description

Defines cross-queue fairshare. When this parameter is defined:
  • The queue in which this parameter is defined becomes the “master queue”.
  • Queues listed with this parameter are slave queues and inherit the fairshare policy of the master queue.
  • A user has the same priority across the master and slave queues. If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs the user has submitted across the master-slave set.

Notes

  • By default, the PRIORITY range defined for queues in cross-queue fairshare cannot be used with any other queues. For example, you have 4 queues: queue1, queue2, queue3, queue4. You configure cross-queue fairshare for queue1, queue2, queue3 and assign priorities of 30, 40, 50 respectively.
  • By default, the priority of queue4 (which is not part of the cross-queue fairshare) cannot fall between the priority range of the cross-queue fairshare queues (30-50). It can be any number up to 29 or higher than 50. It does not matter if queue4 is a fairshare queue or FCFS queue. If DISPATCH_ORDER=QUEUE is set in the master queue, the priority of queue4 (which is not part of the cross-queue fairshare) can be any number, including a priority falling between the priority range of the cross-queue fairshare queues (30-50).
  • FAIRSHARE must be defined in the master queue. If it is also defined in the queues listed in FAIRSHARE_QUEUES, it is ignored.
  • Cross-queue fairshare can be defined more than once within lsb.queues. You can define several sets of master-slave queues. However, a queue cannot belong to more than one master-slave set. For example, you can define:
    • In queue normal: FAIRSHARE_QUEUES=short
    • In queue priority: FAIRSHARE_QUEUES=night owners
      Restriction: You cannot, however, define night, owners, or priority as slaves in the queue normal; or normal and short as slaves in the priority queue; or short, night, owners as master queues of their own.
  • Cross-queue fairshare cannot be used with host partition fairshare. It is part of queue-level fairshare.
  • Cross-queue fairshare cannot be used with absolute priority scheduling.
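
Example

A sketch of a master queue sharing its fairshare policy with two slave queues (all queue and user names are placeholders; FAIRSHARE must be defined here, in the master queue):
Begin Queue
QUEUE_NAME       = normal
PRIORITY         = 40
FAIRSHARE        = USER_SHARES[[user1, 100] [others, 10]]
FAIRSHARE_QUEUES = short night
End Queue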

Default

Not defined

FILELIMIT

Syntax

FILELIMIT=integer

Description

The per-process (hard) file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).

Default

Unlimited

HIST_HOURS

Syntax

HIST_HOURS=hours

Description

Used only with fairshare scheduling. Determines a rate of decay for cumulative CPU time, run time, and historical run time.

To calculate dynamic user priority, LSF scales the actual CPU time and the run time using a decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.

To calculate dynamic user priority with decayed run time and historical run time, LSF scales the accumulated run time of finished jobs and run time of running jobs using the same decay factor, so that 1 hour of recently-used time is equivalent to 0.1 hours after the specified number of hours has elapsed.

When HIST_HOURS=0, CPU time and run time accumulated by running jobs is not decayed.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

HJOB_LIMIT

Syntax

HJOB_LIMIT=integer

Description

Per-host job slot limit.

Maximum number of job slots that this queue can use on any host. This limit is configured per host, regardless of the number of processors it may have.

Example

The following runs a maximum of one job on each of hostA, hostB, and hostC:
Begin Queue 
... 
HJOB_LIMIT = 1 
HOSTS=hostA hostB hostC 
... 
End Queue

Default

Unlimited


HOST_POST_EXEC

Syntax

HOST_POST_EXEC=command

Description

Enables host-based post-execution processing at the queue level. The HOST_POST_EXEC command runs on all execution hosts after the job finishes. If a job-based post-execution command (POST_EXEC) is defined at the queue, application, or job level, the HOST_POST_EXEC command runs after the POST_EXEC of any level.

Host-based post-execution commands can be configured at the queue and application level, and run in the following order:
  1. The application-level command
  2. The queue-level command.

The supported command rule is the same as the existing POST_EXEC for the queue section. See the POST_EXEC topic for details.

Note:

The host-based post-execution command cannot be executed on Windows platforms. This parameter cannot be used to configure job-based post-execution processing.

Default

Not defined.


HOST_PRE_EXEC

Syntax

HOST_PRE_EXEC=command

Description

Enables host-based pre-execution processing at the queue level. The HOST_PRE_EXEC command runs on all execution hosts before the job starts. If a job-based pre-execution command (PRE_EXEC) is defined at the queue, application, or job level, the HOST_PRE_EXEC command runs before the PRE_EXEC of any level.

Host-based pre-execution commands can be configured at the queue and application level, and run in the following order:
  1. The queue-level command
  2. The application-level command.

The supported command rule is the same as the existing PRE_EXEC for the queue section. See the PRE_EXEC topic for details.

Note:

The host-based pre-execution command cannot be executed on Windows platforms. This parameter cannot be used to configure job-based pre-execution processing.

Default

Not defined.


HOSTLIMIT_PER_JOB

Syntax

HOSTLIMIT_PER_JOB=integer

Description

Per-job host limit.

The maximum number of hosts that a job in this queue can use. LSF verifies the host limit during the allocation phase of scheduling. If the number of hosts requested for a parallel job exceeds this limit and LSF cannot satisfy the minimum number of requested slots, the parallel job pends. However, for resumed parallel jobs, this parameter does not stop the job from resuming, even if the job's host allocation exceeds the per-job host limit specified in this parameter.
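
Example

The following limits each job in the queue to at most 4 hosts:
HOSTLIMIT_PER_JOB=4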

Default

Unlimited


HOSTS

Syntax

HOSTS=host_list | none
  • host_list is a space-separated list of the following items:
    • host_name[@cluster_name][[!] | +pref_level]
    • host_partition[+pref_level]
    • host_group[[!] | +pref_level]
    • compute_unit[[!] | +pref_level]
    • [~]host_name
    • [~]host_group
    • [~]compute_unit
  • The list can include the following items only once:
    • all@cluster_name
    • others[+pref_level]
    • all
    • allremote
  • The none keyword is only used with the MultiCluster job forwarding model, to specify a remote-only queue.

Description

A space-separated list of hosts on which jobs from this queue can be run.

If compute units, host groups, or host partitions are included in the list, the job can run on any host in the unit, group, or partition. All the members of the host list should either belong to a single host partition or not belong to any host partition. Otherwise, job scheduling may be affected.

Some items can be followed by a plus sign (+) and a positive number to indicate the preference for dispatching a job to that host. A higher number indicates a higher preference. If a host preference is not given, it is assumed to be 0. If there are multiple candidate hosts, LSF dispatches the job to the host with the highest preference; hosts at the same level of preference are ordered by load.

If compute units, host groups, or host partitions are assigned a preference, each host in the unit, group, or partition has the same preference.

Use the keyword others to include all hosts not explicitly listed.

Use the keyword all to include all hosts not explicitly excluded.

Use the keyword all@cluster_name hostgroup_name or allremote hostgroup_name to include lease-in hosts.

Use the not operator (~) to exclude hosts from the all specification in the queue. This is useful if you have a large cluster but only want to exclude a few hosts from the queue definition.

The not operator can only be used with the all keyword. It is not valid with the keywords others and none.

The not operator (~) can be used to exclude host groups.

For parallel jobs, specify first execution host candidates when you want to ensure that a host has the required resources or runtime environment to handle processes that run on the first execution host.

To specify one or more hosts, host groups, or compute units as first execution host candidates, add the exclamation point (!) symbol after the name.

Follow these guidelines when you specify first execution host candidates:
  • If you specify a compute unit or host group, you must first define the unit or group in the file lsb.hosts.
  • Do not specify a dynamic host group as a first execution host.
  • Do not specify all, allremote, or others, or a host partition as a first execution host.
  • Do not specify a preference (+) for a host identified by (!) as a first execution host candidate.
  • For each parallel job, specify enough regular hosts to satisfy the CPU requirement for the job. Once LSF selects a first execution host for the current job, the other first execution host candidates
    • Become unavailable to the current job
    • Remain available to other jobs as either regular or first execution hosts
  • You cannot specify first execution host candidates when you use the brun command.
Restriction: If you have enabled EGO, host groups and compute units are not honored.

With MultiCluster resource leasing model, use the format host_name@cluster_name to specify a borrowed host. LSF does not validate the names of remote hosts. The keyword others indicates all local hosts not explicitly listed. The keyword all indicates all local hosts not explicitly excluded. Use the keyword allremote to specify all hosts borrowed from all remote clusters. Use all@cluster_name to specify the group of all hosts borrowed from one remote cluster. You cannot specify a host group or partition that includes remote resources, unless it uses the keyword allremote to include all remote hosts. You cannot specify a compute unit that includes remote resources.

With MultiCluster resource leasing model, the not operator (~) can be used to exclude local hosts or host groups. You cannot use the not operator (~) with remote hosts.

Restriction: Hosts that participate in queue-based fairshare cannot be in a host partition.

Behavior with host intersection

Host preferences specified by bsub -m combine intelligently with the queue specification and advance reservation hosts. The jobs run on the hosts that are both specified at job submission and belong to the queue or have advance reservation.

Example 1

HOSTS=hostA+1 hostB hostC+1 hostD+3

This example defines three levels of preferences: run jobs on hostD as much as possible, otherwise run on either hostA or hostC if possible, otherwise run on hostB. Jobs should not run on hostB unless all other hosts are too busy to accept more jobs.

Example 2

HOSTS=hostD+1 others

Run jobs on hostD as much as possible, otherwise run jobs on the least-loaded host available.

With MultiCluster resource leasing model, this queue does not use borrowed hosts.

Example 3

HOSTS=all ~hostA

Run jobs on all hosts in the cluster, except for hostA.

With MultiCluster resource leasing model, this queue does not use borrowed hosts.

Example 4

HOSTS=Group1 ~hostA hostB hostC

Run jobs on hostB, hostC, and all hosts in Group1 except for hostA.

With MultiCluster resource leasing model, this queue uses borrowed hosts if Group1 uses the keyword allremote.

Example 5

HOSTS=hostA! hostB+ hostC hostgroup1!

Runs parallel jobs using either hostA or a host defined in hostgroup1 as the first execution host. If the first execution host cannot run the entire job due to resource requirements, runs the rest of the job on hostB. If hostB is too busy to accept the job, or if hostB does not have enough resources to run the entire job, runs the rest of the job on hostC.

Example 6

HOSTS=computeunit1! hostB hostC

Runs parallel jobs using a host in computeunit1 as the first execution host. If the first execution host cannot run the entire job due to resource requirements, runs the rest of the job on other hosts in computeunit1 followed by hostB and finally hostC.

Example 7

HOSTS=hostgroup1! computeunitA computeunitB computeunitC

Runs parallel jobs using a host in hostgroup1 as the first execution host. If additional hosts are required, runs the rest of the job on other hosts in the same compute unit as the first execution host, followed by hosts in the remaining compute units in the order they are defined in the lsb.hosts ComputeUnit section.

Default

all (the queue can use all hosts in the cluster, and every host has equal preference)

With MultiCluster resource leasing model, this queue can use all local hosts, but no borrowed hosts.

IGNORE_DEADLINE

Syntax

IGNORE_DEADLINE=Y

Description

If Y, disables deadline constraint scheduling (starts all jobs regardless of deadline constraints).

IMPT_JOBBKLG

Syntax

IMPT_JOBBKLG=integer | infinit

Description

MultiCluster job forwarding model only.

Specifies the MultiCluster pending job limit for a receive-jobs queue. This represents the maximum number of MultiCluster jobs that can be pending in the queue; once the limit has been reached, the queue stops accepting jobs from remote clusters.

Use the keyword infinit to make the queue accept an unlimited number of pending MultiCluster jobs.

Default

50

IMPT_SLOTBKLG

Syntax

IMPT_SLOTBKLG=integer | infinit

Description

MultiCluster job forwarding model only.

Specifies the MultiCluster pending job slot limit for a receive-jobs queue. In the submission cluster, if the total of requested job slots and the number of imported pending slots in the receiving queue is greater than IMPT_SLOTBKLG, the queue stops accepting jobs from remote clusters, and the job is not forwarded to the receiving queue.

Specify an integer between 0 and 2147483646 for the number of slots.

Use the keyword infinit to make the queue accept an unlimited number of pending MultiCluster job slots.

Set IMPT_SLOTBKLG to 0 to prevent any jobs from being forwarded to the receiving queue.

Default

infinit (the queue accepts an unlimited number of pending MultiCluster job slots)

INTERACTIVE

Syntax

INTERACTIVE=YES | NO | ONLY

Description

YES causes the queue to accept both interactive and non-interactive batch jobs, NO causes the queue to reject interactive batch jobs, and ONLY causes the queue to accept interactive batch jobs and reject non-interactive batch jobs.

Interactive batch jobs are submitted via bsub -I.

Default

YES. The queue accepts both interactive and non-interactive jobs.

INTERRUPTIBLE_BACKFILL

Syntax

INTERRUPTIBLE_BACKFILL=seconds

Description

Configures interruptible backfill scheduling policy, which allows reserved job slots to be used by low priority small jobs that are terminated when the higher priority large jobs are about to start.

There can only be one interruptible backfill queue. It should be the lowest priority queue in the cluster.

Specify the minimum number of seconds for the job to be considered for backfilling. This minimal time slice depends on the specific job properties; it must be longer than at least one useful iteration of the job. Multiple queues may be created if a site has jobs of distinctly different classes.

An interruptible backfill job:
  • Starts as a regular job and is killed when it exceeds the queue runtime limit, or
  • Is started for backfill whenever there is a backfill time slice longer than the specified minimal time, and killed before the slot-reservation job is about to start

The queue RUNLIMIT corresponds to a maximum time slice for backfill, and should be configured so that the wait period for the new jobs submitted to the queue is acceptable to users. 10 minutes of runtime is a common value.

You should configure REQUEUE_EXIT_VALUES for interruptible backfill queues.

BACKFILL and RUNLIMIT must be configured in the queue. The queue is disabled if BACKFILL and RUNLIMIT are not configured.

Assumptions and limitations:

  • The interruptible backfill job holds the slot-reserving job start until its calculated start time, in the same way as a regular backfill job. The interruptible backfill job is not preempted in any way other than being killed when its time comes.
  • While the queue is checked for the consistency of interruptible backfill, backfill, and runtime specifications, the requeue exit value clause is not verified, nor executed automatically. Configure requeue exit values according to your site policies.
  • The interruptible backfill job must be able to do at least one unit of useful calculations and save its data within the minimal time slice, and be able to continue its calculations after it has been restarted.
  • The interruptible backfill paradigm does not explicitly prohibit running parallel jobs, distributed across multiple nodes; however, the chance of success of such a job is close to zero.

Default

Not defined. No interruptible backfilling.

JOB_ACCEPT_INTERVAL

Syntax

JOB_ACCEPT_INTERVAL=integer

Description

The number you specify is multiplied by the value of lsb.params MBD_SLEEP_TIME (60 seconds by default). The result of the calculation is the number of seconds to wait after dispatching a job to a host, before dispatching a second job to the same host.

If 0 (zero), a host may accept more than one job in each dispatch turn. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. This can overload your system to the point that it is unable to create any more processes. It is not recommended to set this parameter to 0.

JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).

Note:

The parameter JOB_ACCEPT_INTERVAL only applies when there are running jobs on a host. A host running a short job which finishes before JOB_ACCEPT_INTERVAL has elapsed is free to accept a new job without waiting.
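
Example

With the default MBD_SLEEP_TIME of 60 seconds, the following makes LSF wait 2 * 60 = 120 seconds after dispatching a job to a host before dispatching a second job to the same host:
JOB_ACCEPT_INTERVAL=2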

Default

Not defined. The queue uses JOB_ACCEPT_INTERVAL defined in lsb.params, which has a default value of 1.

JOB_ACTION_WARNING_TIME

Syntax

JOB_ACTION_WARNING_TIME=[hour:]minute

Description

Specifies the amount of time before a job control action occurs that a job warning action is to be taken. For example, 2 minutes before the job reaches runtime limit or termination deadline, or the queue's run window is closed, an URG signal is sent to the job.

Job action warning time is not normalized.

A job action warning time must be specified with a job warning action in order for job warning to take effect.

The warning time specified by the bsub -wt option overrides JOB_ACTION_WARNING_TIME in the queue. JOB_ACTION_WARNING_TIME is used as the default when no command line option is specified.

Example

JOB_ACTION_WARNING_TIME=2

Default

Not defined

JOB_CONTROLS

Syntax

JOB_CONTROLS=SUSPEND[signal | command | CHKPNT] RESUME[signal | command] TERMINATE[signal | command | CHKPNT]
  • signal is a UNIX signal name (for example, SIGTSTP or SIGTERM). The specified signal is sent to the job. The same set of signals is not supported on all UNIX systems. To display a list of the symbolic names of the signals (without the SIG prefix) supported on your system, use the kill -l command.
  • command specifies a /bin/sh command line to be invoked.
    Restriction:

    Do not quote the command line inside an action definition. Do not specify a signal followed by an action that triggers the same signal. For example, do not specify JOB_CONTROLS=TERMINATE[bkill] or JOB_CONTROLS=TERMINATE[brequeue]. This causes a deadlock between the signal and the action.

  • CHKPNT is a special action, which causes the system to checkpoint the job. Only valid for SUSPEND and TERMINATE actions:
    • If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped by sending the SIGSTOP signal to the job automatically.
    • If the TERMINATE action is CHKPNT, then the job is checkpointed and killed automatically.

Description

Changes the behavior of the SUSPEND, RESUME, and TERMINATE actions in LSF.
  • The contents of the configuration line for the action are run with /bin/sh -c so you can use shell features in the command.
  • The standard input, output, and error of the command are redirected to the NULL device, so you cannot tell directly whether the command runs correctly. The default null device on UNIX is /dev/null.
  • The command is run as the user of the job.
  • All environment variables set for the job are also set for the command action. The following additional environment variables are set:
    • LSB_JOBPGIDS: a list of current process group IDs of the job
    • LSB_JOBPIDS: a list of current process IDs of the job
  • For the SUSPEND action command, the following environment variables are also set:
    • LSB_SUSP_REASONS - an integer representing a bitmap of suspending reasons as defined in lsbatch.h. The suspending reason can allow the command to take different actions based on the reason for suspending the job.
    • LSB_SUSP_SUBREASONS - an integer representing the load index that caused the job to be suspended. When the suspending reason SUSP_LOAD_REASON (suspended by load) is set in LSB_SUSP_REASONS, LSB_SUSP_SUBREASONS is set to one of the load index values defined in lsf.h. Use LSB_SUSP_REASONS and LSB_SUSP_SUBREASONS together in your custom job control to determine the exact load threshold that caused a job to be suspended.
  • If an additional action is necessary for the SUSPEND command, that action should also send the appropriate signal to the application. Otherwise, a job can continue to run even after being suspended by LSF. For example, JOB_CONTROLS=SUSPEND[kill $LSB_JOBPIDS; command]
  • If you set preemption with the signal SIGTSTP and you use IBM Platform License Scheduler, define LIC_SCHED_PREEMPT_STOP=Y in lsf.conf for License Scheduler preemption to work.
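
Example

A sketch that suspends jobs with SIGTSTP, resumes them with SIGCONT, and checkpoints jobs before killing them on termination (all three actions use forms described above):
JOB_CONTROLS=SUSPEND[SIGTSTP] RESUME[SIGCONT] TERMINATE[CHKPNT]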

Default

On UNIX, by default, SUSPEND sends SIGTSTP for parallel or interactive jobs and SIGSTOP for other jobs. RESUME sends SIGCONT. TERMINATE sends SIGINT, SIGTERM and SIGKILL in that order.

On Windows, actions equivalent to the UNIX signals have been implemented to do the default job control actions. Job control messages replace the SIGINT and SIGTERM signals, but only customized applications are able to process them. Termination is implemented by the TerminateProcess() system call.

JOB_IDLE

Syntax

JOB_IDLE=number

Description

Specifies a threshold for idle job exception handling. The value should be a number between 0.0 and 1.0 representing CPU time/runtime. If the job idle factor is less than the specified threshold, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job idle exception.

The minimum job run time before mbatchd reports that the job is idle is defined as DETECT_IDLE_JOB_AFTER in lsb.params.

Valid values

Any positive number between 0.0 and 1.0

Example

JOB_IDLE=0.10

A job idle exception is triggered for jobs with an idle value (CPU time/runtime) less than 0.10.

Default

Not defined. No job idle exceptions are detected.

JOB_OVERRUN

Syntax

JOB_OVERRUN=run_time

Description

Specifies a threshold for job overrun exception handling. If a job runs longer than the specified run time, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job overrun exception.

Example

JOB_OVERRUN=5

A job overrun exception is triggered for jobs running longer than 5 minutes.

Default

Not defined. No job overrun exceptions are detected.

JOB_STARTER

Syntax

JOB_STARTER=starter [starter] ["%USRCMD"] [starter]

Description

Creates a specific environment for submitted jobs prior to execution.

starter is any executable that can be used to start the job (i.e., can accept the job as an input argument). Optionally, additional strings can be specified.

By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user’s job in the job starter command line. The %USRCMD string and any additional commands must be enclosed in quotation marks (" ").

If your job starter script runs on a Windows execution host and includes symbols (like & or |), you can use the JOB_STARTER_EXTEND=preservestarter parameter in lsf.conf and set JOB_STARTER=preservestarter in lsb.queues. A customized userstarter can also be used.

Example

JOB_STARTER=csh -c "%USRCMD;sleep 10"
In this case, if a user submits a job
% bsub myjob arguments
the command that actually runs is:
% csh -c "myjob arguments;sleep 10"

Default

Not defined. No job starter is used.

JOB_UNDERRUN

Syntax

JOB_UNDERRUN=run_time

Description

Specifies a threshold for job underrun exception handling. If a job exits before the specified number of minutes, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job underrun exception.

Example

JOB_UNDERRUN=2

A job underrun exception is triggered for jobs running less than 2 minutes.

Default

Not defined. No job underrun exceptions are detected.

JOB_WARNING_ACTION

Syntax

JOB_WARNING_ACTION=signal

Description

Specifies the job action to be taken before a job control action occurs. For example, 2 minutes before the job reaches runtime limit or termination deadline, or the queue's run window is closed, an URG signal is sent to the job.

A job warning action must be specified with a job action warning time in order for job warning to take effect.

If JOB_WARNING_ACTION is specified, LSF sends the warning action to the job before the actual control action is taken. This allows the job time to save its result before being terminated by the job control action.

The warning action specified by the bsub -wa option overrides JOB_WARNING_ACTION in the queue. JOB_WARNING_ACTION is used as the default when no command line option is specified.

Example

JOB_WARNING_ACTION=URG

Default

Not defined

load_index

Syntax

load_index=loadSched[/loadStop]

Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index. In a horizontal queue definition, specify multiple lines to configure thresholds for multiple load indices; in a vertical definition, specify each load index as a column and use multiple columns for multiple load indices.

Description

Scheduling and suspending thresholds for the specified dynamic load index.

The loadSched condition must be satisfied before a job is dispatched to the host. If a RESUME_COND is not specified, the loadSched condition must also be satisfied before a suspended job can be resumed.

If the loadStop condition is satisfied, a job on the host is suspended.

The loadSched and loadStop thresholds permit the specification of conditions using simple AND/OR logic. Any load index that does not have a configured threshold has no effect on job scheduling.

LSF does not suspend a job if the job is the only batch job running on the host and the machine is interactively idle (it>0).

The r15s, r1m, and r15m CPU run queue length conditions are compared to the effective queue length as reported by lsload -E, which is normalized for multiprocessor hosts. Thresholds for these parameters should be set at appropriate levels for single processor hosts.

Example

MEM=100/10 
SWAP=200/30
These two lines translate into a loadSched condition of
mem>=100 && swap>=200 
and a loadStop condition of
mem < 10 || swap < 30

Default

Not defined

LOCAL_MAX_PREEXEC_RETRY

Syntax

LOCAL_MAX_PREEXEC_RETRY=integer

Description

The maximum number of times to attempt the pre-execution command of a job on the local cluster.

Valid values

0 < LOCAL_MAX_PREEXEC_RETRY < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

Not defined. The number of preexec retry times is unlimited.

MANDATORY_EXTSCHED

Syntax

MANDATORY_EXTSCHED=external_scheduler_options

Description

Specifies mandatory external scheduling options for the queue.

-extsched options on the bsub command are merged with MANDATORY_EXTSCHED options, and MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched.

Default

Not defined

MAX_JOB_PREEMPT

Syntax

MAX_JOB_PREEMPT=integer

Description

The maximum number of times a job can be preempted. Applies to queue-based preemption only.

Valid values

0 < MAX_JOB_PREEMPT < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

Not defined. The number of preemption times is unlimited.

MAX_JOB_REQUEUE

Syntax

MAX_JOB_REQUEUE=integer

Description

The maximum number of times to requeue a job automatically.
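
Example

The following allows each job to be requeued automatically at most 3 times (automatic requeue itself is driven by settings such as REQUEUE_EXIT_VALUES):
MAX_JOB_REQUEUE=3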

Valid values

0 < MAX_JOB_REQUEUE < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

Not defined. The number of requeue times is unlimited.

MAX_PREEXEC_RETRY

Syntax

MAX_PREEXEC_RETRY=integer

Description

Use REMOTE_MAX_PREEXEC_RETRY instead. This parameter is maintained for backwards compatibility.

MultiCluster job forwarding model only. The maximum number of times to attempt the pre-execution command of a job from a remote cluster.

If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.

Valid values

0 < MAX_PREEXEC_RETRY < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

5

MAX_PROTOCOL_INSTANCES

Syntax

MAX_PROTOCOL_INSTANCES=integer

Description

For LSF IBM Parallel Environment (PE) integration. Specify the number of parallel communication paths (windows) available to the protocol on each network. If the number of windows specified for the job (with the instances option of bsub -network, or the NETWORK_REQ parameter in lsb.queues or lsb.applications) is greater than the specified maximum value, LSF rejects the job.

Specify MAX_PROTOCOL_INSTANCES in a queue (lsb.queues) or cluster-wide in lsb.params. The value specified in a queue overrides the value specified in lsb.params.

LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for MAX_PROTOCOL_INSTANCES to take effect and for LSF to run PE jobs. If LSF_PE_NETWORK_NUM is not defined or is set to 0, the value of MAX_PROTOCOL_INSTANCES is ignored with a warning message.

For best performance, set MAX_PROTOCOL_INSTANCES so that the communication subsystem uses every available adapter before it reuses any of the adapters.

Default

No default value

MAX_RSCHED_TIME

Syntax

MAX_RSCHED_TIME=integer | infinit

Description

MultiCluster job forwarding model only. Determines how long a MultiCluster job stays pending in the execution cluster before returning to the submission cluster. The remote timeout limit in seconds is:
timeout = MAX_RSCHED_TIME * MBD_SLEEP_TIME

Specify infinit to disable remote timeout (jobs always get dispatched in the correct FCFS order because MultiCluster jobs never get rescheduled, but MultiCluster jobs can be pending in the receive-jobs queue forever instead of being rescheduled to a better queue).

Note:

This parameter applies to the queue in the submission cluster only. It is ignored by the receiving queue.

The remote timeout limit never affects advance reservation jobs: jobs that use an advance reservation always behave as if remote timeout is disabled.
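
Example

With the default MBD_SLEEP_TIME of 60 seconds, the following returns a forwarded job to the submission cluster after it has been pending remotely for 60 * 60 = 3600 seconds (one hour):
MAX_RSCHED_TIME=60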

Default

20 (with the default MBD_SLEEP_TIME of 60 seconds, a remote timeout of 20 minutes)

MAX_SLOTS_IN_POOL

Syntax

MAX_SLOTS_IN_POOL=integer

Description

Queue-based fairshare only. Maximum number of job slots available in the slot pool the queue belongs to for queue based fairshare.

Defined in the first queue of the slot pool. Definitions in subsequent queues have no effect.

When defined together with other slot limits (QJOB_LIMIT, HJOB_LIMIT or UJOB_LIMIT in lsb.queues or queue limits in lsb.resources) the lowest limit defined applies.

When MAX_SLOTS_IN_POOL, SLOT_RESERVE, and BACKFILL are defined for the same queue, jobs in the queue cannot backfill using slots reserved by other jobs in the same queue.

Valid values

MAX_SLOTS_IN_POOL can be any number from 0 to INFINIT_INT, where INFINIT_INT is defined in lsf.h.

Default

Not defined

MAX_TOTAL_TIME_PREEMPT

Syntax

MAX_TOTAL_TIME_PREEMPT=integer

Description

The accumulated preemption time in minutes after which a job cannot be preempted again, where minutes is wall-clock time, not normalized time.

Setting the parameter of the same name in lsb.applications overrides this parameter; setting this parameter overrides the parameter of the same name in lsb.params.

Valid values

Any positive integer greater than or equal to one (1)

Default

Unlimited

MEMLIMIT

Syntax

MEMLIMIT=[default_limit] maximum_limit

Description

The per-process (hard) process resident set size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).

Sets the maximum amount of physical memory (resident set size, RSS) that may be allocated to a process.

By default, if a default memory limit is specified, jobs submitted to the queue without a job-level memory limit are killed when the default memory limit is reached.

If you specify only one limit, it is the maximum, or hard, memory limit. If you specify two limits, the first one is the default, or soft, memory limit, and the second one is the maximum memory limit.

LSF has two methods of enforcing memory usage:
  • OS Memory Limit Enforcement
  • LSF Memory Limit Enforcement

OS memory limit enforcement

OS memory limit enforcement is the default MEMLIMIT behavior and does not require further configuration. OS enforcement usually allows the process to eventually run to completion. LSF passes MEMLIMIT to the OS that uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from and lowers the scheduling priority (re-nice) of a process that has exceeded its declared MEMLIMIT. Only available on systems that support RLIMIT_RSS for setrlimit().

Not supported on:
  • Sun Solaris 2.x
  • Windows

LSF memory limit enforcement

To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past MEMLIMIT.

You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y. The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by LSF is enabled. The per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS are enabled.

Available for all systems on which LSF collects total memory usage.

Example

The following configuration defines a queue with a memory limit of 5000 KB:
Begin Queue 
QUEUE_NAME  = default 
DESCRIPTION = Queue with memory limit of 5000 kbytes 
MEMLIMIT    = 5000 
End Queue

Default

Unlimited

MIG

Syntax

MIG=minutes

Description

Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs, in minutes.

LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes. Specify a value of 0 to migrate jobs immediately upon suspension. The migration threshold applies to all jobs running on the host.

Job-level command line migration threshold overrides threshold configuration in application profile and queue. Application profile configuration overrides queue level configuration.

When a host migration threshold is specified and is lower than the value for the job, the queue, or the application, the host value is used.

Members of a chunk job can be migrated. Chunk jobs in WAIT state are removed from the job chunk and put into PEND state.

Does not affect MultiCluster jobs that are forwarded to a remote cluster.
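
Example

The following automatically migrates checkpointable or rerunnable jobs that have been in the SSUSP state for more than 10 minutes:
MIG=10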

Default

Not defined. LSF does not migrate checkpointable or rerunnable jobs automatically.

NETWORK_REQ

Syntax

NETWORK_REQ="network_res_req"

network_res_req has the following syntax:

[type=sn_all | sn_single] [:protocol=protocol_name[(protocol_number)][,protocol_name[(protocol_number)]] [:mode=US | IP] [:usage=dedicated | shared] [:instances=positive_integer]

Description

For LSF IBM Parallel Environment (PE) integration. Specifies the network resource requirements for a PE job.

If any network resource requirement is specified in the job, queue, or application profile, the job is treated as a PE job. PE jobs can only run on hosts where IBM PE pnsd daemon is running.

The network resource requirement string network_res_req has the same syntax as the bsub -network option.

The -network bsub option overrides the value of NETWORK_REQ defined in lsb.queues or lsb.applications. The value of NETWORK_REQ defined in lsb.applications overrides queue-level NETWORK_REQ defined in lsb.queues.
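
Example

A sketch built only from the options described below; the values are illustrations, not recommended settings:
NETWORK_REQ="type=sn_all:protocol=mpi:mode=US:usage=shared"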

The following IBM LoadLeveler job command file options are not supported in LSF:
  • collective_groups
  • imm_send_buffers
  • rcxtblocks
The following network resource requirement options are supported:
type=sn_all | sn_single
Specifies the adapter device type to use for message passing: either sn_all or sn_single.
sn_single

When used for switch adapters, specifies that all windows are on a single network

sn_all

Specifies that one or more windows are on each network, and that striped communication should be used over all available switch networks. The networks specified must be accessible by all hosts selected to run the PE job. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about submitting jobs that use striping.

If mode is IP and type is specified as sn_all or sn_single, the job will only run on InfiniBand (IB) adapters (IPoIB). If mode is IP and type is not specified, the job will only run on Ethernet adapters (IPoEth). For IPoEth jobs, LSF ensures the job is running on hosts where pnsd is installed and running. For IPoIB jobs, LSF ensures the job is running on hosts where pnsd is installed and running, and that IB networks are up. Because IP jobs do not consume network windows, LSF does not check if all network windows are used up or the network is already occupied by a dedicated PE job.

Equivalent to the PE MP_EUIDEVICE environment variable and -euidevice PE flag. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information. Only sn_all or sn_single are supported by LSF. The other types supported by PE are not supported for LSF jobs.

protocol=protocol_name[(protocol_number)]
Network communication protocol for the PE job, indicating which message passing API is being used by the application. The following protocols are supported by LSF:
mpi

The application makes only MPI calls. This value applies to any MPI job regardless of the library that it was compiled with (PE MPI, MPICH2).

pami

The application makes only PAMI calls.

lapi

The application makes only LAPI calls.

shmem

The application makes only OpenSHMEM calls.

user_defined_parallel_api

The application makes only calls from a parallel API that you define. For example: protocol=myAPI or protocol=charm.

The default value is mpi.

LSF also supports an optional protocol_number (for example, mpi(2)), which specifies the number of contexts (endpoints) per parallel API instance. The number must be a power of 2, but no greater than 128 (1, 2, 4, 8, 16, 32, 64, 128). LSF passes the communication protocols to PE without any change, and reserves network windows for each protocol.

When you specify multiple parallel API protocols, you cannot make calls to both LAPI and PAMI (lapi, pami) or LAPI and OpenSHMEM (lapi, shmem) in the same application. Protocols can be specified in any order.

See the MP_MSG_API and MP_ENDPOINTS environment variables and the -msg_api and -endpoints PE flags in the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about the communication protocols that are supported by IBM Parallel Environment.

mode=US | IP

The network communication system mode used by the specified communication protocol: US (User Space) or IP (Internet Protocol). A US job can only run with adapters that support user space communications, such as the IB adapter. IP jobs can run with either Ethernet adapters or IB adapters. When IP mode is specified, the instance number cannot be specified, and network usage must be unspecified or shared.

Each instance of US mode requested by a task running on switch adapters requires an adapter window. For example, if a task requests both the MPI and LAPI protocols such that both protocol instances require US mode, two adapter windows are used.

The default value is US.

usage=dedicated | shared

Specifies whether the adapter can be shared with tasks of other job steps: dedicated or shared. Multiple tasks of the same job can share one network even if usage is dedicated.

The default usage is shared.

instance=positive_integer

The number of parallel communication paths (windows) per task made available to the protocol on each network. The number actually used depends on the implementation of the protocol subsystem.

The default value is 1.

If the specified value is greater than MAX_PROTOCOL_INSTANCES in lsb.params or lsb.queues, LSF rejects the job.

LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for NETWORK_REQ to take effect. If LSF_PE_NETWORK_NUM is not defined or is set to 0, NETWORK_REQ is ignored with a warning message.

Example

The following network resource requirement string specifies the requirements for an sn_all job (one or more windows are on each network, and striped communication is used over all available switch networks). The PE job uses MPI API calls (protocol), runs in user-space network communication system mode, and requires one parallel communication path (window) per task.

NETWORK_REQ = "protocol=mpi:mode=us:instance=1:type=sn_all"

Default

No default value. However, if you specify an empty value (NETWORK_REQ=""), the job uses the following queue default: protocol=mpi:mode=US:usage=shared:instance=1.

NEW_JOB_SCHED_DELAY

Syntax

NEW_JOB_SCHED_DELAY=seconds

Description

The number of seconds that a new job waits before being scheduled. A value of zero (0) means the job is scheduled without any delay; the scheduler still periodically fetches jobs from mbatchd, and schedules them as soon as it gets them. This can speed up job scheduling slightly, but it also generates some communication overhead. Therefore, you should only set it to 0 for high-priority, urgent, or interactive queues with small workloads.

If NEW_JOB_SCHED_DELAY is set to a non-zero value, the scheduler periodically fetches new jobs from mbatchd, and then sets the job scheduling time to the job submission time plus NEW_JOB_SCHED_DELAY.
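
Example

As an illustrative sketch (the queue name and priority are arbitrary), an interactive queue with a small workload might disable the delay entirely:
Begin Queue
QUEUE_NAME          = interactive
NEW_JOB_SCHED_DELAY = 0
PRIORITY            = 80
End Queue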

Default

0 seconds

NICE

Syntax

NICE=integer

Description

Adjusts the UNIX scheduling priority at which jobs from this queue execute.

The default value of 0 (zero) maintains the default scheduling priority for UNIX interactive jobs. This value adjusts the run-time priorities for batch jobs on a queue-by-queue basis, to control their effect on other batch or interactive jobs. See the nice(1) manual page for more details.

On Windows, this value is mapped to Windows process priority classes as follows:
  • nice>=0 corresponds to a priority class of IDLE
  • nice<0 corresponds to a priority class of NORMAL

LSF on Windows does not support HIGH or REAL-TIME priority classes.

This value is overwritten by the NICE setting in lsb.applications, if defined.
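
Example

For example, a hypothetical low-impact queue (the queue name and nice value are illustrative) might run its jobs at a reduced UNIX priority so they yield to interactive work:
Begin Queue
QUEUE_NAME = night_batch
NICE       = 10
End Queue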

Default

0 (zero)

NO_PREEMPT_INTERVAL

Syntax

NO_PREEMPT_INTERVAL=minutes

Description

Prevents preemption of jobs for the specified number of minutes of uninterrupted run time, where minutes is wall-clock time, not normalized time. NO_PREEMPT_INTERVAL=0 allows immediate preemption of jobs as soon as they start or resume running.

Setting the parameter of the same name in lsb.applications overrides this parameter; setting this parameter overrides the parameter of the same name in lsb.params.
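
Example

The following sketch (the value is illustrative) protects jobs in the queue from preemption during their first 10 minutes of uninterrupted run time:
NO_PREEMPT_INTERVAL=10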

Default

0

PJOB_LIMIT

Syntax

PJOB_LIMIT=float

Description

Per-processor job slot limit for the queue.

Maximum number of job slots that this queue can use on any processor. This limit is configured per processor, so that multiprocessor hosts automatically run more jobs.
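
Example

For example, with the following illustrative setting, the queue can use up to 2 job slots per processor, so an 8-processor host can run up to 16 jobs from this queue:
PJOB_LIMIT=2.0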

Default

Unlimited

POST_EXEC

Syntax

POST_EXEC=command

Description

Enables post-execution processing at the queue level. The POST_EXEC command runs on the execution host after the job finishes. Post-execution commands can be configured at the application and queue levels. Application-level post-execution commands run before queue-level post-execution commands.

The POST_EXEC command uses the same environment variable values as the job, and, by default, runs under the user account of the user who submits the job. To run post-execution commands under a different user account (such as root for privileged operations), configure the parameter LSB_PRE_POST_EXEC_USER in lsf.sudoers.

When a job exits with one of the queue’s REQUEUE_EXIT_VALUES, LSF requeues the job and sets the environment variable LSB_JOBPEND. The post-execution command runs after the requeued job finishes.

When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. If the execution environment for the job cannot be set up, LSB_JOBEXIT_STAT is set to 0 (zero).

The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

For UNIX:
  • The pre- and post-execution commands run in the /tmp directory under /bin/sh -c, which allows the use of shell features in the commands. The following example shows valid configuration lines:
    PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
    POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Hey!"
  • LSF sets the PATH environment variable to
    PATH='/bin /usr/bin /sbin /usr/sbin'
  • The stdin, stdout, and stderr are set to /dev/null
  • To allow UNIX users to define their own post-execution commands, an LSF administrator specifies the environment variable $USER_POSTEXEC as the POST_EXEC command. A user then defines the post-execution command:
    setenv USER_POSTEXEC /path_name
    Note: The path name for the post-execution command must be an absolute path. Do not define POST_EXEC=$USER_POSTEXEC when LSB_PRE_POST_EXEC_USER=root. This parameter cannot be used to configure host-based post-execution processing.
For Windows:
  • The pre- and post-execution commands run under cmd.exe /c
  • The standard input, standard output, and standard error are set to NULL
  • The PATH is determined by the setup of the LSF Service
Note:

For post-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe.

Default

Not defined. No post-execution commands are associated with the queue.

PRE_EXEC

Syntax

PRE_EXEC=command

Description

Enables pre-execution processing at the queue level. The PRE_EXEC command runs on the execution host before the job starts. If the PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.

Pre-execution commands can be configured at the queue, application, and job levels and run in the following order:
  1. The queue-level command
  2. The application-level or job-level command. If you specify a command at both the application and job levels, the job-level command overrides the application-level command; the application-level command is ignored.

The PRE_EXEC command uses the same environment variable values as the job, and runs under the user account of the user who submits the job. To run pre-execution commands under a different user account (such as root for privileged operations), configure the parameter LSB_PRE_POST_EXEC_USER in lsf.sudoers.

The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

For UNIX:
  • The pre- and post-execution commands run in the /tmp directory under /bin/sh -c, which allows the use of shell features in the commands. The following example shows valid configuration lines:
    PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
    POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Hey!"
  • LSF sets the PATH environment variable to
    PATH='/bin /usr/bin /sbin /usr/sbin'
  • The stdin, stdout, and stderr are set to /dev/null
For Windows:
  • The pre- and post-execution commands run under cmd.exe /c
  • The standard input, standard output, and standard error are set to NULL
  • The PATH is determined by the setup of the LSF Service
Note:

For pre-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe. This parameter cannot be used to configure host-based pre-execution processing.
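
Example

The following sketch (the queue name and script path are illustrative) stages data before each job starts; if the staging script exits with a non-zero code, LSF requeues the job to the front of the queue:
Begin Queue
QUEUE_NAME = data_jobs
PRE_EXEC   = /usr/local/lsf/scripts/stage_in.sh >> /tmp/stage.log
End Queue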

Default

Not defined. No pre-execution commands are associated with the queue.

PREEMPTION

Syntax

PREEMPTION=PREEMPTIVE[[low_queue_name[+pref_level]...]]

PREEMPTION=PREEMPTABLE[[hi_queue_name...]]

PREEMPTION=PREEMPTIVE[[low_queue_name[+pref_level]...]] PREEMPTABLE[[hi_queue_name...]]

Description

PREEMPTIVE

Enables preemptive scheduling and defines this queue as preemptive. Jobs in this queue preempt jobs from the specified lower-priority queues or from all lower-priority queues if the parameter is specified with no queue names. PREEMPTIVE can be combined with PREEMPTABLE to specify that jobs in this queue can preempt jobs in lower-priority queues, and can be preempted by jobs in higher-priority queues.

PREEMPTABLE

Enables preemptive scheduling and defines this queue as preemptable. Jobs in this queue can be preempted by jobs from specified higher-priority queues, or from all higher-priority queues, even if the higher-priority queues are not preemptive. PREEMPTABLE can be combined with PREEMPTIVE to specify that jobs in this queue can be preempted by jobs in higher-priority queues, and can preempt jobs in lower-priority queues.

low_queue_name

Specifies the names of lower-priority queues that can be preempted.

To specify multiple queues, separate the queue names with a space, and enclose the list in a single set of square brackets.

+pref_level

Specifies that this queue should be preempted before other queues. When preference levels are specified for multiple queues, they indicate an order of preference: queues with higher relative preference levels are preempted before queues with lower relative preference levels.

hi_queue_name

Specifies the names of higher-priority queues that can preempt jobs in this queue.

To specify multiple queues, separate the queue names with a space and enclose the list in a single set of square brackets.

Example: configure selective, ordered preemption across queues

The following example defines four queues, as follows:
  • high
    • Has the highest relative priority of 99
    • Jobs from this queue can preempt jobs from all other queues
  • medium
    • Has the second-highest relative priority at 10
    • Jobs from this queue can preempt jobs from normal and low queues, beginning with jobs from low, as indicated by the preference (+1)
  • normal
    • Has the second-lowest relative priority, at 5
    • Jobs from this queue can preempt jobs from low, and can be preempted by jobs from both high and medium queues
  • low
    • Has the lowest relative priority, which is also the default priority, at 1
    • Jobs from this queue can be preempted by jobs from all preemptive queues, even though it does not have the PREEMPTABLE keyword set
Begin Queue
QUEUE_NAME=high 
PREEMPTION=PREEMPTIVE 
PRIORITY=99 
End Queue
Begin Queue
QUEUE_NAME=medium 
PREEMPTION=PREEMPTIVE[normal low+1] 
PRIORITY=10 
End Queue
Begin Queue
QUEUE_NAME=normal
PREEMPTION=PREEMPTIVE[low] PREEMPTABLE[high medium]
PRIORITY=5 
End Queue
Begin Queue 
QUEUE_NAME=low 
PRIORITY=1 
End Queue

PREEMPT_DELAY

Syntax

PREEMPT_DELAY=seconds

Description

Preemptive jobs wait the specified number of seconds from their submission time before preempting any lower-priority preemptable jobs. During this grace period, preemption is not triggered, but the job can be scheduled and dispatched by other scheduling policies.

This feature provides flexibility to tune the system to reduce the number of preemptions and improve performance and job throughput. When low-priority jobs are short, allowing high-priority jobs to wait briefly for them to finish avoids preemption and improves cluster performance. If the job is still pending after the grace period has expired, preemption is triggered.

The waiting time applies only to preemptive jobs in pending status; it does not affect preemptive jobs that are suspended.

The time is counted from the submission time of the job. The submission time is the time mbatchd accepts a job, which includes newly submitted jobs, restarted jobs (with brestart), and jobs forwarded from a remote cluster.

When the preemptive job is waiting, the pending reason is:

The preemptive job is allowing a grace period before preemption.

If you use an older version of bjobs, the pending reason is:

Unknown pending reason code <6701>;

The parameter is defined in lsb.params, lsb.queues (overrides lsb.params), and lsb.applications (overrides both lsb.params and lsb.queues).

Run badmin reconfig to make your changes take effect.
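
Example

As an illustrative sketch, the following setting makes preemptive jobs in the queue wait 5 minutes (300 seconds) from submission before they can trigger preemption:
PREEMPT_DELAY=300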

Default

Not defined (if the parameter is not defined anywhere, preemption is immediate).

PRIORITY

Syntax

PRIORITY=integer

Description

Specifies the relative queue priority for dispatching jobs. A higher value indicates a higher job-dispatching priority, relative to other queues.

LSF schedules jobs from one queue at a time, starting with the highest-priority queue. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.

LSF queue priority is independent of the UNIX scheduler priority system for time-sharing processes. In LSF, the NICE parameter is used to set the UNIX time-sharing priority for batch jobs.

integer

Specify a number greater than or equal to 1, where 1 is the lowest priority.

Default

1

PROCESSLIMIT

Syntax

PROCESSLIMIT=[default_limit] maximum_limit

Description

Limits the number of concurrent processes that can be part of a job.

By default, if a default process limit is specified, jobs submitted to the queue without a job-level process limit are killed when the default process limit is reached.

If you specify only one limit, it is the maximum, or hard, process limit. If you specify two limits, the first one is the default, or soft, process limit, and the second one is the maximum process limit.
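
Example

For example, with the following illustrative limits, jobs submitted without a job-level process limit default to 5 concurrent processes, and no job in the queue can exceed 20:
PROCESSLIMIT=5 20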

Default

Unlimited

PROCLIMIT

Syntax

PROCLIMIT=[minimum_limit [default_limit]] maximum_limit

Description

Maximum number of slots that can be allocated to a job. For parallel jobs, the maximum number of processors that can be allocated to the job.

Queue-level PROCLIMIT takes precedence over application-level and job-level PROCLIMIT, and application-level PROCLIMIT takes precedence over job-level PROCLIMIT. Job-level limits must fall within the maximum and minimum limits of the application profile and the queue.

Optionally specifies the minimum and default number of job slots.

All limits must be positive numbers greater than or equal to 1 that satisfy the following relationship:

1 <= minimum <= default <= maximum

If RES_REQ in a queue is defined as a compound resource requirement with a block size (span[block=value]), the default value for PROCLIMIT should be a multiple of the block size.

For example, this configuration would be accepted:

Queue-level RES_REQ="1*{type==any } + {type==local span[block=4]}"

PROCLIMIT = 5 9 13

The following configuration, however, would not be accepted; an error message appears when you run badmin reconfig:

Queue-level RES_REQ="1*{type==any } + {type==local span[block=4]}"

PROCLIMIT = 4 10 12

Default

Unlimited, the default number of slots is 1

QJOB_LIMIT

Syntax

QJOB_LIMIT=integer

Description

Job slot limit for the queue. Total number of job slots that this queue can use.

Default

Unlimited

QUEUE_GROUP

Syntax

QUEUE_GROUP=queue1, queue2 ...

Description

Configures absolute priority scheduling (APS) across multiple queues.

When APS is enabled in the queue with APS_PRIORITY, the FAIRSHARE_QUEUES parameter is ignored. The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.
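
Example

The following sketch (queue names, priority, and factor weights are illustrative) applies one APS ordering across a group of queues:
Begin Queue
QUEUE_NAME   = normal
PRIORITY     = 30
APS_PRIORITY = WEIGHT[[QPRIORITY, 10] [MEM, 20]]
QUEUE_GROUP  = short, license
End Queue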

Default

Not defined

QUEUE_NAME

Syntax

QUEUE_NAME=string

Description

Required. Name of the queue.

Specify any ASCII string up to 59 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces. You cannot specify the reserved name default.

Default

You must specify this parameter to define a queue. The default queue automatically created by LSF is named default.

RCVJOBS_FROM

Syntax

RCVJOBS_FROM=cluster_name ... | allclusters

Description

MultiCluster only. Defines a MultiCluster receive-jobs queue.

Specify cluster names, separated by a space. The administrator of each remote cluster determines which queues in that cluster forward jobs to the local cluster.

Use the keyword allclusters to specify any remote cluster.

Example

RCVJOBS_FROM=cluster2 cluster4 cluster6

This queue accepts remote jobs from clusters 2, 4, and 6.

REMOTE_MAX_PREEXEC_RETRY

Syntax

REMOTE_MAX_PREEXEC_RETRY=integer

Description

MultiCluster job forwarding model only. Applies to the execution cluster. Defines the maximum number of times to attempt the pre-execution command of a job from the remote cluster.

Valid values

0 - INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

5

REQUEUE_EXIT_VALUES

Syntax

REQUEUE_EXIT_VALUES=[exit_code ...] [EXCLUDE(exit_code ...)]

Description

Enables automatic job requeue and sets the LSB_EXIT_REQUEUE environment variable. Use spaces to separate multiple exit codes. Application-level exit values override queue-level values. Job-level exit values (bsub -Q) override application-level and queue-level values.

exit_code has the following form:
"[all] [~number ...] | [number ...]"

The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.

Jobs are requeued to the head of the queue. The output from the failed run is not saved, and the user is not notified by LSF.

Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue, ensuring the job does not rerun on the same host. Exclusive job requeue does not work for parallel jobs.

For MultiCluster jobs forwarded to a remote execution cluster, the exit values specified in the submission cluster with the EXCLUDE keyword are treated as if they were non-exclusive.

You can also requeue a job if the job is terminated by a signal.

If a job is killed by a signal, the exit value is 128+signal_value. The sum of 128 and the signal value can be used as the exit code in the parameter REQUEUE_EXIT_VALUES.

For example, if you want a job to rerun if it is killed with signal 9 (SIGKILL), the exit value would be 128+9=137. You can configure the following requeue exit value to allow a job to be requeued if it was killed by signal 9:

REQUEUE_EXIT_VALUES=137

On Windows, if a job is killed by a signal, the exit value is signal_value. The signal value can be used as the exit code in the parameter REQUEUE_EXIT_VALUES.

For example, if you want to rerun a job after it was killed with signal 7 (SIGKILL), the exit value would be 7. You can configure the following requeue exit value to allow a job to requeue after it was killed by signal 7:

REQUEUE_EXIT_VALUES=7

You can configure the following requeue exit value to allow a job to requeue for both Linux and Windows after it was killed:

REQUEUE_EXIT_VALUES=137 7

If mbatchd is restarted, it does not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.

You should configure REQUEUE_EXIT_VALUES for interruptible backfill queues (INTERRUPTIBLE_BACKFILL=seconds).

Example

REQUEUE_EXIT_VALUES=30 EXCLUDE(20)

means that jobs with exit code 30 are requeued, jobs with exit code 20 are requeued exclusively, and jobs with any other exit code are not requeued.

Default

Not defined. Jobs are not requeued.

RERUNNABLE

Syntax

RERUNNABLE=yes | no

Description

If yes, enables automatic job rerun (restart).

Rerun is disabled when RERUNNABLE is set to no. The yes and no arguments are not case sensitive.

For MultiCluster jobs, the setting in the submission queue is used, and the setting in the execution queue is ignored.

Members of a chunk job can be rerunnable. If the execution host becomes unavailable, rerunnable chunk job members are removed from the job chunk and dispatched to a different execution host.

Default

no

RESOURCE_RESERVE

Syntax

RESOURCE_RESERVE=MAX_RESERVE_TIME[integer]

Description

Enables processor reservation and memory reservation for pending jobs for the queue. Specifies the number of dispatch turns (MAX_RESERVE_TIME) over which a job can reserve job slots and memory.

Overrides the SLOT_RESERVE parameter. If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, an error is displayed when the cluster is reconfigured, and SLOT_RESERVE is ignored. Job slot reservation for parallel jobs is enabled by RESOURCE_RESERVE if the LSF scheduler plugin module names for both resource reservation and parallel batch jobs (schmod_parallel and schmod_reserve) are configured in the lsb.modules file: The schmod_parallel name must come before schmod_reserve in lsb.modules.

If a job has not accumulated enough memory or job slots to start by the time MAX_RESERVE_TIME expires, it releases all its reserved job slots or memory so that other pending jobs can run. After the reservation time expires, the job cannot reserve memory or slots for one scheduling session, so other jobs have a chance to be dispatched. After one scheduling session, the job can reserve available memory and job slots again for another period specified by MAX_RESERVE_TIME.

If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill jobs can use the accumulated memory reserved by the other jobs in the queue, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.

Unlike slot reservation, which only applies to parallel jobs, memory reservation and backfill on memory apply to sequential and parallel jobs.

Example

RESOURCE_RESERVE=MAX_RESERVE_TIME[5]

This example specifies that jobs have up to 5 dispatch turns to reserve sufficient job slots or memory (equal to 5 minutes, by default).

Default

Not defined. No job slots or memory is reserved.

RES_REQ

Syntax

RES_REQ=res_req

Description

Resource requirements used to determine eligible hosts. Specify a resource requirement string as usual. The resource requirement string lets you specify conditions in a more flexible manner than using the load thresholds. Resource requirement strings can be simple (applying to the entire job), compound (applying to the specified number of slots), or can contain alternative resources (a choice between two or more simple or compound requirements). For alternative resources, if no resource can be found that satisfies the first resource requirement, the next resource requirement is tried, and so on, until a requirement is satisfied.

Compound and alternative resource requirements follow the same set of rules for determining how resource requirements are merged between the job, application, and queue levels. For more detail on merge rules, see Administering IBM Platform LSF.

When a compound or alternative resource requirement is set for a queue, it will be ignored unless it is the only resource requirement specified (no resource requirements are set at the job-level or application-level).

When a simple resource requirement is set for a queue and a compound resource requirement is set at the job-level or application-level, the queue-level requirements merge as they do for simple resource requirements. However, any job-based resources defined in the queue only apply to the first term of the merged compound resource requirements.

When LSF_STRICT_RESREQ=Y is configured in lsf.conf, resource requirement strings in select sections must conform to a more strict syntax. The strict resource requirement syntax only applies to the select section. It does not apply to the other resource requirement sections (order, rusage, same, span, cu or affinity). When LSF_STRICT_RESREQ=Y in lsf.conf, LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.

For simple resource requirements, the select sections from all levels must be satisfied and the same sections from all levels are combined. cu, order, and span sections at the job-level overwrite those at the application-level which overwrite those at the queue-level. Multiple rusage definitions are merged, with the job-level rusage taking precedence over the application-level, and application-level taking precedence over the queue-level.

The simple resource requirement rusage section can specify additional requests. To do this, use the OR (||) operator to separate additional rusage strings. Multiple -R options cannot be used with multi-phase rusage resource requirements.

For simple resource requirements the job-level affinity section overrides the application-level, and the application-level affinity section overrides the queue-level.

Note:

Compound and alternative resource requirements do not support use of the || operator within rusage sections or the cu section.

The RES_REQ consumable resource requirements must satisfy any limits set by the parameter RESRSV_LIMIT in lsb.queues, or the RES_REQ will be ignored.

When both the RES_REQ and RESRSV_LIMIT are set in lsb.queues for a consumable resource, the queue-level RES_REQ no longer acts as a hard limit for the merged RES_REQ rusage values from the job and application levels. In this case only the limits set by RESRSV_LIMIT must be satisfied, and the queue-level RES_REQ acts as a default value.

For example:
Queue-level RES_REQ:
RES_REQ=rusage[mem=200:lic=1] ...
For the job submission:
bsub -R'rusage[mem=100]' ...
the resulting requirement for the job is
rusage[mem=100:lic=1]

where mem=100 specified by the job overrides mem=200 specified by the queue. However, lic=1 from the queue is kept, since the job does not specify it.

Queue-level RES_REQ threshold:
RES_REQ = rusage[bwidth=2:threshold=5] ...
For the job submission:
bsub -R "rusage[bwidth=1:threshold=6]" ...

the resulting requirement for the job is

rusage[bwidth=1:threshold=6]
Queue-level RES_REQ with decay and duration defined:
RES_REQ=rusage[mem=200:duration=20:decay=1] ...
For a job submission with no decay or duration:
bsub -R'rusage[mem=100]' ...
the resulting requirement for the job is:
rusage[mem=100:duration=20:decay=1]

Queue-level duration and decay are merged with the job-level specification, and mem=100 for the job overrides mem=200 specified by the queue. However, duration=20 and decay=1 from the queue are kept, since the job does not specify them.

Queue-level RES_REQ with multi-phase job-level rusage:
RES_REQ=rusage[mem=200:duration=20:decay=1] ...
For a job submission with no decay or duration:
bsub -R'rusage[mem=(300 200 100):duration=(10 10 10)]' ...
the resulting requirement for the job is:
rusage[mem=(300 200 100):duration=(10 10 10)]
Multi-phase rusage values in the job submission override the single phase specified by the queue.
  • If RESRSV_LIMIT is defined in lsb.queues and has a maximum memory limit of 300 MB or greater, this job will be accepted.
  • If RESRSV_LIMIT is defined in lsb.queues and has a maximum memory limit of less than 300 MB, this job will be rejected.
  • If RESRSV_LIMIT is not defined in lsb.queues, the queue-level RES_REQ value of 200 MB acts as a ceiling, and this job will be rejected.
Queue-level multi-phase rusage RES_REQ:
RES_REQ=rusage[mem=(350 200):duration=(20):decay=(1)] ...
For a single phase job submission with no decay or duration:
bsub -q q_name -R'rusage[mem=100:swap=150]' ...
the resulting requirement for the job is:
rusage[mem=100:swap=150]

The job-level rusage string overrides the queue-level multi-phase rusage string.

The order section defined at the job level overwrites any resource requirements specified at the application level or queue level. The order section defined at the application level overwrites any resource requirements specified at the queue level. The default order string is r15s:pg.

If RES_REQ is defined at the queue level and there are no load thresholds defined, the pending reasons for each individual load index are not displayed by bjobs.

The span section defined at the queue level is ignored if the span section is also defined at the job level or in an application profile.

Note: Define span[hosts=-1] in the application profile or bsub -R resource requirement string to override the span section setting in the queue.

Default

select[type==local] order[r15s:pg]. If this parameter is defined and a host model or Boolean resource is specified, the default type is any.

RESRSV_LIMIT

Syntax

RESRSV_LIMIT=[res1={min1,} max1] [res2={min2,} max2]...

Where res is a consumable resource name, min is an optional minimum value, and max is the maximum allowed value. Both max and min must be floating-point numbers between 0 and 2147483647, and min cannot be greater than max.

Description

Sets a range of allowed values for RES_REQ resources.

Queue-level RES_REQ rusage values (set in lsb.queues) must be in the range set by RESRSV_LIMIT, or the queue-level RES_REQ is ignored. Merged RES_REQ rusage values from the job and application levels must be in the range of RESRSV_LIMIT, or the job is rejected.

Changes made to the rusage values of running jobs using bmod -R cannot exceed the maximum values of RESRSV_LIMIT, but can be lower than the minimum values.

When both the RES_REQ and RESRSV_LIMIT are set in lsb.queues for a consumable resource, the queue-level RES_REQ no longer acts as a hard limit for the merged RES_REQ rusage values from the job and application levels. In this case only the limits set by RESRSV_LIMIT must be satisfied, and the queue-level RES_REQ acts as a default value.

For MultiCluster, jobs must satisfy the RESRSV_LIMIT range set for the send-jobs queue in the submission cluster. After the job is forwarded the resource requirements are also checked against the RESRSV_LIMIT range set for the receive-jobs queue in the execution cluster.

Note:

Only consumable resource limits can be set in RESRSV_LIMIT. Other resources will be ignored.
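
Example

The following illustrative sketch restricts rusage memory requests to the range 10 to 500 and requests for a hypothetical consumable license resource named lic to at most 9:
RESRSV_LIMIT=[mem=10,500] [lic=9]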

Default

Not defined.

If max is defined and optional min is not, the default for min is 0.

RESUME_COND

Syntax

RESUME_COND=res_req

Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.

Description

LSF automatically resumes a suspended (SSUSP) job in this queue if the load on the host satisfies the specified conditions.

If RESUME_COND is not defined, the loadSched thresholds are used to control the resuming of jobs. If RESUME_COND is defined, the loadSched thresholds are ignored when resuming jobs.
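
Example

As an illustrative sketch, the following condition resumes suspended jobs only when the host has been interactively idle for more than 10 minutes and the paging rate is low:
RESUME_COND=select[it > 10 && pg < 1]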

Default

Not defined. The loadSched thresholds are used to control resuming of jobs.

RUN_JOB_FACTOR

Syntax

RUN_JOB_FACTOR=number

Description

Used only with fairshare scheduling. Job slots weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the number of job slots reserved and in use by a user.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

RUN_TIME_DECAY

Syntax

RUN_TIME_DECAY=Y | y | N | n

Description

Used only with fairshare scheduling. Enables decay for run time at the same rate as the decay set by HIST_HOURS for cumulative CPU time and historical run time.

In the calculation of a user’s dynamic share priority, this factor determines whether run time is decayed.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Restrictions

Running badmin reconfig or restarting mbatchd during a job's run time results in the decayed run time being recalculated.

When a suspended job using run time decay is resumed, the decay time is based on the elapsed time.

Default

Not defined

RUN_TIME_FACTOR

Syntax

RUN_TIME_FACTOR=number

Description

Used only with fairshare scheduling. Run time weighting factor.

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the total run time of a user’s running jobs.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

RUN_WINDOW

Syntax

RUN_WINDOW=time_window ...

Description

Time periods during which jobs in the queue are allowed to run.

When the window closes, LSF suspends jobs running in the queue and stops dispatching jobs from the queue. When the window reopens, LSF resumes the suspended jobs and begins dispatching additional jobs.
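
Example

For example, the following illustrative windows allow jobs in the queue to run only overnight (20:00 to 8:30) and all weekend (day 5, Friday, at 18:00 to day 1, Monday, at 8:30):
RUN_WINDOW=20:00-8:30 5:18:00-1:8:30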

Default

Not defined. Queue is always active.

RUNLIMIT

Syntax

RUNLIMIT=[default_limit] maximum_limit

where default_limit and maximum_limit are:

[hour:]minute[/host_name | /host_model]

Description

The maximum run limit and optionally the default run limit. The name of a host or host model specifies the runtime normalization host to use.

By default, jobs that are in the RUN state for longer than the specified maximum run limit are killed by LSF. You can optionally provide your own termination job action to override this default.

Jobs submitted with a job-level run limit (bsub -W) that is less than the maximum run limit are killed when their job-level run limit is reached. Jobs submitted with a run limit greater than the maximum run limit are rejected by the queue.

If a default run limit is specified, jobs submitted to the queue without a job-level run limit are killed when the default run limit is reached. The default run limit is used with backfill scheduling of parallel jobs.
Note:

If you want to provide an estimated run time for scheduling purposes without killing jobs that exceed the estimate, define the RUNTIME parameter in an application profile instead of a run limit (see lsb.applications for details).

If you specify only one limit, it is the maximum, or hard, run limit. If you specify two limits, the first one is the default, or soft, run limit, and the second one is the maximum run limit.

The run limit is in the form [hour:]minute. The minutes can be specified as a number greater than 59; for example, three and a half hours can be specified either as 3:30 or as 210.

The run limit you specify is the normalized run time. This is done so that the job does approximately the same amount of processing, even if it is sent to a host with a faster or slower CPU. Whenever a normalized run time is given, the actual time on the execution host is the specified time multiplied by the CPU factor of the normalization host, then divided by the CPU factor of the execution host.

If ABS_RUNLIMIT=Y is defined in lsb.params, the runtime limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted to a queue with a run limit configured.

Optionally, you can supply a host name or a host model name defined in LSF. You must insert ‘/’ between the run limit and the host name or model name. (See lsinfo(1) to get host model information.)

If no host or host model is given, LSF uses the default runtime normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, LSF uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, the host with the largest CPU factor (the fastest host in the cluster).

For MultiCluster jobs, if no other CPU time normalization host is defined and information about the submission host is not available, LSF uses the host with the largest CPU factor (the fastest host in the cluster).

Jobs submitted to a chunk job queue are not chunked if RUNLIMIT is greater than 30 minutes.

RUNLIMIT is required for queues configured with INTERRUPTIBLE_BACKFILL.
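
Example

For example, the following illustrative limits give jobs a default run limit of 60 minutes and a maximum run limit of 480 minutes (eight hours):
RUNLIMIT=60 480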

Default

Unlimited

SLA_GUARANTEES_IGNORE

Syntax

SLA_GUARANTEES_IGNORE=Y | y | N | n

Description

Applies to SLA guarantees only.

SLA_GUARANTEES_IGNORE=Y allows jobs in the queue access to all guaranteed resources. As a result, some guarantees might not be honored. If a queue does not have this parameter set, jobs in this queue cannot trigger preemption of an SLA job. If an SLA job is suspended (e.g. by a bstop), jobs in queues without the parameter set can still make use of the slots released by the suspended job.

Note:

Using SLA_GUARANTEES_IGNORE=Y defeats the purpose of guaranteeing resources. This should be used sparingly for low traffic queues only.

Default

Not defined (N). The queue must honor resource guarantees when dispatching jobs.

SLOT_POOL

Syntax

SLOT_POOL=pool_name

Description

Name of the pool of job slots the queue belongs to for queue-based fairshare. A queue can only belong to one pool. All queues in the pool must share the same set of hosts.

Valid values

Specify any ASCII string up to 60 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces.

Default

Not defined. The queue does not belong to a slot pool.

SLOT_RESERVE

Syntax

SLOT_RESERVE=MAX_RESERVE_TIME[integer]

Description

Enables processor reservation for the queue and specifies the reservation time. Specify the keyword MAX_RESERVE_TIME and, in square brackets, the number of MBD_SLEEP_TIME cycles over which a job can reserve job slots. MBD_SLEEP_TIME is defined in lsb.params; the default value is 60 seconds.

If a job has not accumulated enough job slots to start before the reservation expires, it releases all its reserved job slots so that other jobs can run. Then, the job cannot reserve slots for one scheduling session, so other jobs have a chance to be dispatched. After one scheduling session, the job can reserve job slots again for another period specified by SLOT_RESERVE.

SLOT_RESERVE is overridden by the RESOURCE_RESERVE parameter.

If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, job slot reservation and memory reservation are enabled and an error is displayed when the cluster is reconfigured. SLOT_RESERVE is ignored.

Job slot reservation for parallel jobs is enabled by RESOURCE_RESERVE if the LSF scheduler plugin module names for both resource reservation and parallel batch jobs (schmod_parallel and schmod_reserve) are configured in the lsb.modules file: The schmod_parallel name must come before schmod_reserve in lsb.modules.

If BACKFILL is configured in a queue, and a run limit is specified at the job level (bsub -W), application level (RUNLIMIT in lsb.applications), or queue level (RUNLIMIT in lsb.queues), or if an estimated run time is specified at the application level (RUNTIME in lsb.applications), backfill parallel jobs can use job slots reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.

Unlike memory reservation, which applies both to sequential and parallel jobs, slot reservation applies only to parallel jobs.

Example

SLOT_RESERVE=MAX_RESERVE_TIME[5]

This example specifies that parallel jobs have up to 5 cycles of MBD_SLEEP_TIME (5 minutes, by default) to reserve sufficient job slots to start.

Default

Not defined. No job slots are reserved.

SLOT_SHARE

Syntax

SLOT_SHARE=integer

Description

Share of job slots for queue-based fairshare. Represents the percentage of running jobs (job slots) in use from the queue. SLOT_SHARE must be greater than zero (0) and less than or equal to 100.

The sum of SLOT_SHARE for all queues in the pool does not need to be 100%. It can be more or less, depending on your needs.
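
Example

The following sketch (pool and queue names, shares, and priorities are illustrative) divides one pool of slots between two queues, giving the short queue a 50% share and the normal queue a 33% share:
Begin Queue
QUEUE_NAME = short
PRIORITY   = 50
SLOT_POOL  = poolA
SLOT_SHARE = 50
End Queue
Begin Queue
QUEUE_NAME = normal
PRIORITY   = 40
SLOT_POOL  = poolA
SLOT_SHARE = 33
End Queue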

Default

Not defined

SNDJOBS_TO

Syntax

SNDJOBS_TO=[queue@]cluster_name[+preference] ...

Description

Defines a MultiCluster send-jobs queue.

Specify remote queue names, in the form queue_name@cluster_name[+preference], separated by a space.

This parameter is ignored if lsb.queues HOSTS specifies remote (borrowed) resources.

Queue preference is defined at the queue level in SNDJOBS_TO (lsb.queues) of the submission cluster for each corresponding execution cluster queue receiving forwarded jobs.

Example

SNDJOBS_TO=queue2@cluster2+1 queue3@cluster2+2

STACKLIMIT

Syntax

STACKLIMIT=integer

Description

The per-process (hard) stack segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).

Default

Unlimited

STOP_COND

Syntax

STOP_COND=res_req

Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.

Description

LSF automatically suspends a running job in this queue if the load on the host satisfies the specified conditions.
  • LSF does not suspend the only job running on the host if the machine is interactively idle (it > 0).
  • LSF does not suspend a forced job (brun -f).
  • LSF does not suspend a job because of paging rate if the machine is interactively idle.

If STOP_COND is specified in the queue and there are no load thresholds, the suspending reasons for each individual load index are not displayed by bjobs.

Example

STOP_COND= select[((!cs && it < 5) || (cs && mem < 15 && swp < 50))]

In this example, assume “cs” is a Boolean resource indicating that the host is a compute server. The stop condition for jobs running on compute servers is based on the availability of swap memory. The stop condition for jobs running on other kinds of hosts is based on the idle time.

SUCCESS_EXIT_VALUES

Syntax

SUCCESS_EXIT_VALUES=[exit_code ...]

Description

Use this parameter to specify exit values used by LSF to determine if the job was done successfully. Application-level success exit values defined with SUCCESS_EXIT_VALUES in lsb.applications override the configuration defined in lsb.queues. Job-level success exit values specified with the LSB_SUCCESS_EXIT_VALUES environment variable override the configuration in lsb.queues and lsb.applications.

Use SUCCESS_EXIT_VALUES for queues whose jobs successfully exit with non-zero values, so that LSF does not interpret those non-zero exit codes as job failure.

If the same exit code is defined in SUCCESS_EXIT_VALUES and REQUEUE_EXIT_VALUES, any job with this exit code is requeued instead of being marked as DONE because sbatchd processes requeue exit values before success exit values.

In MultiCluster job forwarding mode, LSF uses the SUCCESS_EXIT_VALUES from the remote cluster.

In a MultiCluster resource leasing environment, LSF uses the SUCCESS_EXIT_VALUES from the consumer cluster.

exit_code should be a value between 0 and 255. Use spaces to separate multiple exit codes.

Any changes you make to SUCCESS_EXIT_VALUES will not affect running jobs. Only pending jobs will use the new SUCCESS_EXIT_VALUES definitions, even if you run badmin reconfig and mbatchd restart to apply your changes.
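
Example

For example, if applications submitted to the queue conventionally exit with code 230 or 222 on success (illustrative values), the following setting prevents LSF from treating those jobs as failed:
SUCCESS_EXIT_VALUES=230 222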

Default

Not defined.

SWAPLIMIT

Syntax

SWAPLIMIT=integer

Description

The total virtual memory limit (in KB) for a job from this queue.

This limit applies to the whole job, no matter how many processes the job may contain.

The action taken when a job exceeds its SWAPLIMIT or PROCESSLIMIT is to send SIGQUIT, SIGINT, SIGTERM, and SIGKILL in sequence. For CPULIMIT, SIGXCPU is sent before SIGINT, SIGTERM, and SIGKILL.

Default

Unlimited

TERMINATE_WHEN

Syntax

TERMINATE_WHEN=[LOAD] [PREEMPT] [WINDOW]

Description

Configures the queue to invoke the TERMINATE action instead of the SUSPEND action in the specified circumstance.
  • LOAD: kills jobs when the load exceeds the suspending thresholds.
  • PREEMPT: kills jobs that are being preempted.
  • WINDOW: kills jobs if the run window closes.

If the TERMINATE_WHEN job control action is applied to a chunk job, sbatchd kills the chunk job element that is running and puts the rest of the waiting elements into pending state to be rescheduled later.

Example

Set TERMINATE_WHEN to WINDOW to define a night queue that kills jobs if the run window closes:
Begin Queue 
QUEUE_NAME     = night 
RUN_WINDOW     = 20:00-08:00 
TERMINATE_WHEN = WINDOW 
JOB_CONTROLS   = TERMINATE[kill -KILL $LS_JOBPGIDS; mail -s "job $LSB_JOBID killed by queue run window" $USER < /dev/null]
End Queue

THREADLIMIT

Syntax

THREADLIMIT=[default_limit] maximum_limit

Description

Limits the number of concurrent threads that can be part of a job. Exceeding the limit causes the job to terminate. The system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.

By default, if a default thread limit is specified, jobs submitted to the queue without a job-level thread limit are killed when the default thread limit is reached.

If you specify only one limit, it is the maximum, or hard, thread limit. If you specify two limits, the first one is the default, or soft, thread limit, and the second one is the maximum thread limit.

Both the default and the maximum limits must be positive integers. The default limit must be less than the maximum limit. The default limit is ignored if it is greater than the maximum limit.

Examples

THREADLIMIT=6

No default thread limit is specified. The value 6 is the default and maximum thread limit.

THREADLIMIT=6 8

The first value (6) is the default thread limit. The second value (8) is the maximum thread limit.

Default

Unlimited

UJOB_LIMIT

Syntax

UJOB_LIMIT=integer

Description

Per-user job slot limit for the queue. Maximum number of job slots that each user can use in this queue.

UJOB_LIMIT must be within or greater than the range set by PROCLIMIT or bsub -n (if either is used), or jobs are rejected.

Default

Unlimited

USE_PAM_CREDS

Syntax

USE_PAM_CREDS=y | n

Description

If USE_PAM_CREDS=y, applies PAM limits to a queue when its job is dispatched to a Linux host using PAM. PAM limits are system resource limits defined in limits.conf.

When USE_PAM_CREDS is enabled, PAM limits override other limits. For example, the PAM limit is used even if the queue-level soft limit is less than the PAM limit. However, a PAM limit still cannot exceed the queue's hard limit.

If the execution host does not have PAM configured and this parameter is enabled, the job fails.

For parallel jobs, this parameter only takes effect on the first execution host.

USE_PAM_CREDS only applies on the following platforms:
  • linux2.6-glibc2.3-ia64
  • linux2.6-glibc2.3-ppc64
  • linux2.6-glibc2.3-sn-ipf
  • linux2.6-glibc2.3-x86
  • linux2.6-glibc2.3-x86_64

Overrides MEMLIMIT_TYPE=Process.

Overridden (for CPU limit only) by LSB_JOB_CPULIMIT=y.

Overridden (for memory limits only) by LSB_JOB_MEMLIMIT=y.

Default

n

USE_PRIORITY_IN_POOL

Syntax

USE_PRIORITY_IN_POOL= y | Y | n | N

Description

Queue-based fairshare only. After job scheduling occurs for each queue, this parameter enables LSF to dispatch jobs to any remaining slots in the pool in first-come first-served order across queues.

Default

N

USERS

Syntax

USERS=all [~user_name ...] [~user_group ...] | [user_name ...] [user_group [~user_group ...] ...]

Description

A space-separated list of user names or user groups that can submit jobs to the queue. LSF cluster administrators are automatically included in the list of users. LSF cluster administrators can submit jobs to this queue, or switch (bswitch) any user’s jobs into this queue.

If user groups are specified, each user in the group can submit jobs to this queue. If FAIRSHARE is also defined in this queue, only users defined by both parameters can submit jobs, so LSF administrators cannot use the queue if they are not included in the share assignments.

User names must be valid login names. To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).

User group names can be LSF user groups or UNIX and Windows user groups. To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\user_group).

Use the keyword all to specify all users or user groups in a cluster.

Use the not operator (~) to exclude users from the all specification or from user groups. This is useful if you have a large number of users but only want to exclude a few users or groups from the queue definition.

The not operator (~) can only be used with the all keyword or to exclude users from user groups.

CAUTION:
The not operator does not exclude LSF administrators from the queue definition.

Default

all (all users can submit jobs to the queue)

Examples

  • USERS=user1 user2
  • USERS=all ~user1 ~user2
  • USERS=all ~ugroup1
  • USERS=groupA ~user3 ~user4

Automatic time-based configuration

Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.queues by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command.

The expressions are evaluated by LSF every 10 minutes based on mbatchd start time. When an expression evaluates true, LSF dynamically changes the configuration based on the associated configuration statements. Reconfiguration is done in real time without restarting mbatchd, providing continuous system availability.

Example

Begin Queue
 ... 
#if time(8:30-18:30)
INTERACTIVE = ONLY  # interactive only during day shift
#endif
 ... 
End Queue
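
The if-else constructs also support an #else branch. The following sketch (the priority values are illustrative) raises the queue priority outside the day shift:
Begin Queue
 ... 
#if time(8:30-18:30)
PRIORITY = 30
#else
PRIORITY = 60   # higher priority outside day shift
#endif
 ... 
End Queue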