The lsb.applications file defines application profiles. Use application profiles to define common parameters for the same type of jobs, including the execution requirements of the applications, the resources they require, and how they should be run and managed.
This file is optional. Use the DEFAULT_APPLICATION parameter in lsb.params to specify a default application profile for all jobs. LSF does not automatically assign a default application profile.
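For example, a default profile could be assigned in lsb.params while users select a different profile explicitly at submission time with bsub -app (the profile name fluent here is hypothetical):
DEFAULT_APPLICATION=catia
bsub -app fluent ./myjob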
This file is installed by default in LSB_CONFDIR/cluster_name/configdir.
After making any changes to lsb.applications, run badmin reconfig to reconfigure mbatchd. Configuration changes apply to pending jobs only. Running jobs are not affected.
Each application profile definition begins with the line Begin Application and ends with the line End Application. The application name must be specified. All other parameters are optional.
Begin Application
NAME = catia
DESCRIPTION = CATIA V5
CPULIMIT = 24:0/hostA # 24 hours on host hostA
FILELIMIT = 20000
DATALIMIT = 20000 # job's data segment limit
CORELIMIT = 20000
PROCLIMIT = 5 # job processor limit
REQUEUE_EXIT_VALUES = 55 34 78
End Application
See the lsb.applications template file for additional application profile examples.
#INCLUDE "path-to-file"
A MultiCluster environment allows common configurations to be shared by all clusters. Use #INCLUDE to centralize the configuration work for groups of clusters when they all need to share a common configuration. Using #INCLUDE lets you avoid having to manually merge these common configurations into each local cluster's configuration files.
To make the new configuration active, use badmin reconfig, then use bapp to confirm the changes. After configuration, both common resources and local resources will take effect on the local cluster.
#INCLUDE "/scratch/Shared/lsf.applications.common.g"
#INCLUDE "/scratch/Shared/lsf.applications.common.o"
Begin Application
...
Time-based configuration also supports shared configuration for groups of clusters: you can include a common configuration file by using the time-based feature in local configuration files. To use the time-based function with an include file, place the time-based #include before all sections. For example:
#if time(11:00-20:00)
#include "/scratch/Shared/lsf.applications.common.grape"
#endif
All #include lines must be inserted at the beginning of the local configuration file. If placed within or after any other sections, LSF reports an error.
Not defined.
ABS_RUNLIMIT=y | Y
The runtime estimates and limits are not normalized by the host CPU factor.
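For example, a minimal profile sketch (the name is hypothetical) that enforces a 2-hour wall-clock run limit regardless of host CPU factors:
Begin Application
NAME = abs_wallclock
ABS_RUNLIMIT = Y
RUNLIMIT = 2:0
End Application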
Not defined. Run limit and runtime estimate are normalized.
BIND_JOB=NONE | BALANCE | PACK | ANY | USER | USER_CPU_LIST
Specifies the processor binding policy for sequential and parallel job processes that run on a single host. On Linux execution hosts that support this feature, job processes are hard bound to selected processors.
If the processor binding feature is not configured with the BIND_JOB parameter in an application profile in lsb.applications, the LSF_BIND_JOB configuration setting in lsf.conf takes effect. The application profile configuration for processor binding overrides the lsf.conf configuration.
Linux with kernel version 2.6 or higher
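For example, a one-line sketch selecting one of the policy values listed above:
BIND_JOB = BALANCE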
Not defined. Processor binding is disabled.
CHKPNT_DIR=chkpnt_dir
Specifies the checkpoint directory for automatic checkpointing for the application. To enable automatic checkpoint for the application profile, administrators must specify a checkpoint directory in the configuration of the application profile.
If CHKPNT_PERIOD, CHKPNT_INITPERIOD, or CHKPNT_METHOD is set in an application profile but CHKPNT_DIR is not set, a warning message is issued and those settings are ignored.
The checkpoint directory is the directory where the checkpoint files are created. Specify an absolute path or a path relative to the current working directory for the job. Do not use environment variables in the directory path.
If checkpoint-related configuration is specified in both the queue and an application profile, the application profile setting overrides queue level configuration.
To enable checkpointing of MultiCluster jobs, define a checkpoint directory in an application profile (CHKPNT_DIR, CHKPNT_PERIOD, CHKPNT_INITPERIOD, CHKPNT_METHOD in lsb.applications) of both submission cluster and execution cluster. LSF uses the directory specified in the execution cluster.
Checkpointing is not supported if a job runs on a leased host.
The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.
Not defined
CHKPNT_INITPERIOD=init_chkpnt_period
Specifies the initial checkpoint period in minutes. CHKPNT_DIR must be set in the application profile for this parameter to take effect. The periodic checkpoint specified by CHKPNT_PERIOD does not happen until the initial period has elapsed.
Specify a positive integer.
Job-level command line values override the application profile configuration.
If administrators specify an initial checkpoint period and do not specify a checkpoint period (CHKPNT_PERIOD), the job will only checkpoint once.
If the initial checkpoint period of a job is specified, and you run bchkpnt to checkpoint the job before the initial checkpoint period has elapsed, the initial checkpoint period is not changed by bchkpnt. The first automatic checkpoint still happens after the specified number of minutes.
Not defined
CHKPNT_PERIOD=chkpnt_period
Specifies the checkpoint period for the application in minutes. CHKPNT_DIR must be set in the application profile for this parameter to take effect. The running job is checkpointed automatically every checkpoint period.
Specify a positive integer.
Job-level command line values override the application profile and queue level configurations. Application profile level configuration overrides the queue level configuration.
Not defined
CHKPNT_METHOD=chkpnt_method
Specifies the checkpoint method. CHKPNT_DIR must be set in the application profile for this parameter to take effect. Job-level command line values override the application profile configuration.
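A sketch combining the checkpoint parameters described above (the directory path and the method name myMethod are hypothetical; a custom method implies corresponding echkpnt.myMethod and erestart.myMethod executables):
Begin Application
NAME = chkpnt_app
CHKPNT_DIR = /share/chkpnt    # checkpoint files are created here
CHKPNT_INITPERIOD = 120       # first automatic checkpoint after 2 hours
CHKPNT_PERIOD = 60            # subsequent checkpoints every hour
CHKPNT_METHOD = myMethod      # hypothetical custom checkpoint method
End Application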
Not defined
CHUNK_JOB_SIZE=integer
Chunk jobs only. Allows jobs submitted to the same application profile to be chunked together and specifies the maximum number of jobs allowed to be dispatched together in a chunk. Specify a positive integer greater than or equal to 1.
All of the jobs in the chunk are scheduled and dispatched as a unit, rather than individually.
Specify CHUNK_JOB_SIZE=1 to disable job chunking for the application. This value overrides chunk job dispatch configured in the queue.
Use the CHUNK_JOB_SIZE parameter to configure application profiles that chunk small, short-running jobs. The ideal candidates for job chunking are jobs that have the same host and resource requirements and typically take 1 to 2 minutes to run.
However, throughput can deteriorate if the chunk job size is too big. Performance may decrease on profiles with CHUNK_JOB_SIZE greater than 30. You should evaluate the chunk job size on your own systems for best performance.
With MultiCluster job forwarding model, this parameter does not affect MultiCluster jobs that are forwarded to a remote cluster.
If CHUNK_JOB_DURATION is set in lsb.params, chunk jobs are accepted regardless of the value of CPULIMIT, RUNLIMIT or RUNTIME.
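For example, a sketch of a profile (hypothetical name) that dispatches short jobs in chunks of up to four:
Begin Application
NAME = short_jobs
CHUNK_JOB_SIZE = 4
End Application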
Not defined
CORELIMIT=integer
The per-process (soft) core file size limit for all of the processes belonging to a job from this application profile (see getrlimit(2)). Application-level limits override any default limit specified in the queue, but must be less than the hard limit of the submission queue. Job-level core limit (bsub -C) overrides queue-level and application-level limits.
By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).
Unlimited
CPU_FREQUENCY=float_number[unit]
Specifies the CPU frequency for an application profile. All jobs submitted to the application profile require the specified CPU frequency. The value is a positive float number with units (GHz, MHz, or KHz). If no unit is set, the default is GHz.
This value can also be set using the bsub -freq command option.
The submission value overrides the application profile value, and the application profile value overrides the queue value.
Not defined (Nominal CPU frequency is used)
CPULIMIT=[hour:]minute[/host_name | /host_model]
Normalized CPU time allowed for all processes of a job running in the application profile. The name of a host or host model specifies the CPU time normalization host to use.
Limits the total CPU time the job can use. This parameter is useful for preventing runaway jobs or jobs that use up too many resources.
When the total CPU time for the whole job has reached the limit, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.
If a job dynamically spawns processes, the CPU time used by these processes is accumulated over the life of the job.
Processes that exist for fewer than 30 seconds may be ignored.
By default, jobs submitted to the application profile without a job-level CPU limit (bsub -c) are killed when the CPU limit is reached. Application-level limits override any default limit specified in the queue.
The number of minutes may be greater than 59. For example, three and a half hours can be specified either as 3:30 or 210.
If no host or host model is given with the CPU time, LSF uses the default CPU time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured, otherwise uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured, otherwise uses the host with the largest CPU factor (the fastest host in the cluster).
On Windows, a job that runs under a CPU time limit may exceed that limit by up to SBD_SLEEP_TIME. This is because sbatchd periodically checks if the limit has been exceeded.
On UNIX systems, the CPU limit can be enforced by the operating system at the process level.
You can define whether the CPU limit is a per-process limit enforced by the OS or a per-job limit enforced by LSF with LSB_JOB_CPULIMIT in lsf.conf.
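To illustrate the normalization arithmetic with hypothetical CPU factors: given CPULIMIT=10:0/hostA, if hostA has CPU factor 8 and the job runs on an execution host with CPU factor 4, the limit actually enforced on the execution host is 10 hours × 8 / 4 = 20 hours of CPU time.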
Unlimited
DATALIMIT=integer
The per-process (soft) data segment size limit (in KB) for all of the processes belonging to a job running in the application profile (see getrlimit(2)).
By default, jobs submitted to the application profile without a job-level data limit (bsub -D) are killed when the data limit is reached. Application-level limits override any default limit specified in the queue, but must be less than the hard limit of the submission queue.
Unlimited
DESCRIPTION=text
Description of the application profile. The description is displayed by bapp -l.
The description should clearly describe the service features of the application profile to help users select the proper profile for each job.
The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 512 characters.
DJOB_COMMFAIL_ACTION="KILL_TASKS|IGNORE_COMMFAIL"
Defines the action LSF should take if it detects a communication failure with one or more remote parallel or distributed tasks. If defined with "KILL_TASKS", LSF tries to kill all the current tasks of a parallel or distributed job associated with the communication failure. If defined with "IGNORE_COMMFAIL", failures will be ignored and the job continues. If not defined, LSF terminates all tasks and shuts down the entire job.
This parameter only applies to the blaunch distributed application framework.
When defined in an application profile, the LSB_DJOB_COMMFAIL_ACTION variable is set when running bsub -app for the specified application.
Not defined. Terminate all tasks, and shut down the entire job.
DJOB_DISABLED=Y | N
Disables the blaunch distributed application framework.
Not defined. Distributed application framework is enabled.
DJOB_ENV_SCRIPT=script_name
Defines the name of a user-defined script for setting and cleaning up the parallel or distributed job environment.
The specified script must support a setup argument and a cleanup argument. The script is executed by LSF with the setup argument before launching a parallel or distributed job, and with argument cleanup after the job is finished.
The script runs as the user, and is part of the job.
If a full path is specified, LSF uses the path name for the execution. Otherwise, LSF looks for the executable from $LSF_BINDIR.
This parameter only applies to the blaunch distributed application framework.
When defined in an application profile, the LSB_DJOB_ENV_SCRIPT variable is set when running bsub -app for the specified application.
The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
If DJOB_ENV_SCRIPT=openmpi_rankfile.sh is set in lsb.applications, LSF creates a host rank file and sets the environment variable LSB_RANK_HOSTFILE.
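A minimal sketch of such a script, assuming a hypothetical per-job scratch directory; LSF calls it with setup before launching the tasks and with cleanup after the job finishes:
#!/bin/sh
# Hypothetical DJOB_ENV_SCRIPT: stage and remove per-job scratch space.
case "$1" in
setup)
    mkdir -p /tmp/scratch.$LSB_JOBID    # create scratch space before tasks launch
    ;;
cleanup)
    rm -rf /tmp/scratch.$LSB_JOBID      # remove it after the job finishes
    ;;
esac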
Not defined.
DJOB_HB_INTERVAL=seconds
Value in seconds used to calculate the heartbeat interval between the task RES and job RES of a parallel or distributed job.
This parameter only applies to the blaunch distributed application framework.
When DJOB_HB_INTERVAL is specified, the interval is scaled according to the number of tasks in the job:
max(DJOB_HB_INTERVAL, 10) + host_factor
where
host_factor = 0.01 * number of hosts allocated for the job
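For example, with hypothetical values DJOB_HB_INTERVAL=30 and 200 hosts allocated to the job, the heartbeat interval is max(30, 10) + 0.01 × 200 = 32 seconds.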
Not defined. Interval is the default value of LSB_DJOB_HB_INTERVAL.
DJOB_RESIZE_GRACE_PERIOD=seconds
When a resizable job releases resources, the LSF distributed parallel job framework terminates running tasks if a host has been completely removed. DJOB_RESIZE_GRACE_PERIOD defines a grace period in seconds for the application to clean up tasks itself before LSF forcibly terminates them.
No grace period.
DJOB_RU_INTERVAL=seconds
Value in seconds used to calculate the resource usage update interval for the tasks of a parallel or distributed job.
This parameter only applies to the blaunch distributed application framework.
When DJOB_RU_INTERVAL is specified, the interval is scaled according to the number of tasks in the job:
max(DJOB_RU_INTERVAL, 10) + host_factor
where
host_factor = 0.01 * number of hosts allocated for the job
Not defined. Interval is the default value of LSB_DJOB_RU_INTERVAL.
DJOB_TASK_BIND=Y | N
For CPU and memory affinity scheduling jobs launched with the blaunch distributed application framework.
To enable LSF to bind each task to the proper CPUs or NUMA nodes, you must use blaunch to start tasks, and you must set DJOB_TASK_BIND=Y in lsb.applications or LSB_DJOB_TASK_BIND=Y in the submission environment before submitting the job. When set, only the CPU and memory bindings allocated to the task itself are set in each task's environment.
If DJOB_TASK_BIND=N or LSB_DJOB_TASK_BIND=N, or neither is set, each task has the same CPU or NUMA node binding on one host.
If you do not use blaunch to start tasks, and use another MPI mechanism such as IBM Platform MPI or IBM Parallel Environment, you should not set DJOB_TASK_BIND or set it to N.
N
ENV_VARS="name='value'[,name1='value1'] [,name2='value2',... ]"
ENV_VARS defines application-specific environment variables that will be used by jobs for the application. Use this parameter to define name/value pairs as environment variables. These environment variables are also used in the pre/post-execution environment.
You can include spaces within the single quotation marks when defining a value. Commas and double quotation marks are reserved by LSF and cannot be used as part of the environment variable name or value. If the same environment variable is named multiple times in ENV_VARS and given different values, the last value in the list will be the one which takes effect. LSF does not allow environment variables to contain other environment variables to be expanded on the execution side. Do not redefine LSF environment variables in ENV_VARS.
To define a NULL environment variable, use single quotes with nothing inside. For example:
ENV_VARS="TEST_CAR=''"
Any variable set in the user’s environment will overwrite the value in ENV_VARS. The application profile value will overwrite the execution host environment value.
After changing the value of this parameter, run badmin reconfig to have the changes take effect. The changes apply to pending jobs only. Running jobs are not affected.
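For example, a sketch with two hypothetical variables; note that each value is wrapped in single quotation marks and the second value contains a space:
ENV_VARS="MY_LICENSE_SERVER='1700@hostA',MY_SCRATCH_DIR='/var/tmp/app scratch'"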
Not defined.
FILELIMIT=integer
The per-process (soft) file size limit (in KB) for all of the processes belonging to a job running in the application profile (see getrlimit(2)). Application-level limits override any default limit specified in the queue, but must be less than the hard limit of the submission queue.
Unlimited
HOST_POST_EXEC=command
Enables host-based post-execution processing at the application level. The HOST_POST_EXEC command runs on all execution hosts after the job finishes. If job-based post-execution (POST_EXEC) is defined at the queue, application, or job level, the HOST_POST_EXEC command runs after POST_EXEC at any level.
The supported command rule is the same as the existing POST_EXEC for the queue section. See the POST_EXEC topic for details.
The host-based post-execution command cannot be executed on Windows platforms. This parameter cannot be used to configure job-based post-execution processing.
Not defined.
HOST_PRE_EXEC=command
Enables host-based pre-execution processing at the application level. The HOST_PRE_EXEC command runs on all execution hosts before the job starts. If job-based pre-execution (PRE_EXEC) is defined at the queue, application, or job level, the HOST_PRE_EXEC command runs before PRE_EXEC at any level.
The supported command rule is the same as the existing PRE_EXEC for the queue section. See the PRE_EXEC topic for details.
The host-based pre-execution command cannot be executed on Windows platforms. This parameter cannot be used to configure job-based pre-execution processing.
Not defined.
JOB_CWD=directory
Current working directory (CWD) for the job in the application profile. The path can be absolute or relative to the submission directory, and can include dynamic patterns, which are case sensitive. Unsupported patterns are treated as text.
If this parameter is changed, then any newly submitted jobs with the -app option will use the new value for CWD if bsub -cwd is not defined.
JOB_CWD supports all LSF path conventions, such as UNIX, UNC, and Windows formats. In a mixed UNIX/Windows cluster, the path can be specified with one value for UNIX and another value for Windows, separated by a pipe character (|).
JOB_CWD=unix_path|windows_path
The first part of the path must be for UNIX and the second part must be for Windows. Both paths must be full paths.
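For example, a mixed-cluster sketch with hypothetical paths, using the %J (job_ID) dynamic pattern:
JOB_CWD=/scratch/jobs/%J|\\fileserver\jobs\%J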
Not defined.
JOB_CWD_TTL=hours
Specifies the time-to-live (TTL), in hours, for the current working directory (CWD) of a job. LSF cleans up created CWD directories after a job finishes, based on the TTL value. LSF deletes the CWD for the job only if LSF created that directory for the job.
The system checks the directory list every 5 minutes for cleanup and deletes only the last directory of the path, to avoid conflicts when multiple jobs share parent directories. The TTL is calculated after the post-execution script finishes. When LSF (sbatchd) starts, it checks the directory list file and deletes expired CWDs.
If the value for this parameter is not set in the application profile, LSF checks to see if it is set at the cluster-wide level. If neither is set, the default value is used.
Not defined. The value of 2147483647 is used, meaning the CWD is not deleted.
JOB_INCLUDE_POSTPROC=Y | N
Specifies whether LSF includes the post-execution processing of the job as part of the job. The variable LSB_JOB_INCLUDE_POSTPROC in the user environment overrides the value of JOB_INCLUDE_POSTPROC in an application profile in lsb.applications. JOB_INCLUDE_POSTPROC in an application profile in lsb.applications overrides the value of JOB_INCLUDE_POSTPROC in lsb.params.
For CPU and memory affinity jobs, if JOB_INCLUDE_POSTPROC=Y, LSF does not release affinity resources until post-execution processing has finished, since slots are still occupied by the job during post-execution processing.
For SGI cpusets, if JOB_INCLUDE_POSTPROC=Y, LSF does not release the cpuset until post-execution processing has finished, even though post-execution processes are not attached to the cpuset.
N. Post-execution processing is not included as part of the job, and a new job can start on the execution host before post-execution processing finishes.
JOB_POSTPROC_TIMEOUT=minutes
Specifies a timeout in minutes for job post-execution processing. The specified timeout must be greater than zero.
If post-execution processing takes longer than the timeout, sbatchd reports that post-execution has failed (POST_ERR status). On UNIX and Linux, it kills the entire process group of the job's post-execution processes. On Windows, only the parent process of the post-execution command is killed when the timeout expires; the child processes of the post-execution command are not killed.
If JOB_INCLUDE_POSTPROC=Y, and sbatchd kills the post-execution processes because the timeout has been reached, the CPU time of the post-execution processing is set to 0, and the job’s CPU time does not include the CPU time of post-execution processing.
JOB_POSTPROC_TIMEOUT defined in an application profile in lsb.applications overrides the value in lsb.params. JOB_POSTPROC_TIMEOUT cannot be defined in user environment.
When running host-based post execution processing, set JOB_POSTPROC_TIMEOUT to a value that gives the process enough time to run.
Not defined. Post-execution processing does not time out.
JOB_PREPROC_TIMEOUT=minutes
Specifies a timeout in minutes for job pre-execution processing. The specified timeout must be an integer greater than zero. If the job's pre-execution processing takes longer than the timeout, LSF kills the job's pre-execution processes, kills the job with a pre-defined exit value of 98, and then requeues the job to the head of the queue. However, if the number of pre-execution retries has reached the limit, LSF suspends the job with PSUSP status instead of requeuing it.
JOB_PREPROC_TIMEOUT defined in an application profile in lsb.applications overrides the value in lsb.params. JOB_PREPROC_TIMEOUT cannot be defined in the user environment.
On UNIX and Linux, sbatchd kills the entire process group of the job's pre-execution processes.
On Windows, only the parent process of the pre-execution command is killed when the timeout expires, the child processes of the pre-execution command are not killed.
Not defined. Pre-execution processing does not time out. However, when running host-based pre-execution processing, you cannot rely on an infinite timeout or the processing will fail; you must configure a reasonable value.
JOB_STARTER=starter [starter] ["%USRCMD"] [starter]
Creates a specific environment for submitted jobs prior to execution. An application-level job starter overrides a queue-level job starter.
starter is any executable that can be used to start the job (i.e., can accept the job as an input argument). Optionally, additional strings can be specified.
By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user’s job in the job starter command line. The %USRCMD string and any additional commands must be enclosed in quotation marks (" ").
JOB_STARTER=csh -c "%USRCMD;sleep 10"
With this job starter, a job submitted as
bsub myjob arguments
actually runs as
csh -c "myjob arguments;sleep 10"
Not defined. No job starter is used.
LOCAL_MAX_PREEXEC_RETRY=integer
The maximum number of times to attempt the pre-execution command of a job on the local cluster.
0 < LOCAL_MAX_PREEXEC_RETRY < INFINIT_INT
INFINIT_INT is defined in lsf.h.
Not defined. The number of pre-execution retries is unlimited.
MAX_JOB_PREEMPT=integer
The maximum number of times a job can be preempted. Applies to queue-based preemption only.
0 < MAX_JOB_PREEMPT < INFINIT_INT
INFINIT_INT is defined in lsf.h.
Not defined. The number of times a job can be preempted is unlimited.
MAX_JOB_REQUEUE=integer
The maximum number of times to requeue a job automatically.
0 < MAX_JOB_REQUEUE < INFINIT_INT
INFINIT_INT is defined in lsf.h.
Not defined. The number of times a job can be requeued is unlimited.
MAX_PREEXEC_RETRY=integer
Use REMOTE_MAX_PREEXEC_RETRY instead. This parameter is only maintained for backwards compatibility.
MultiCluster job forwarding model only. The maximum number of times to attempt the pre-execution command of a job from a remote cluster.
If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.
0 < MAX_PREEXEC_RETRY < INFINIT_INT
INFINIT_INT is defined in lsf.h.
5
MAX_TOTAL_TIME_PREEMPT=integer
The accumulated preemption time in minutes after which a job cannot be preempted again, where minutes is wall-clock time, not normalized time.
Setting this parameter in lsb.applications overrides the parameter of the same name in lsb.queues and in lsb.params.
Any positive integer greater than or equal to one (1)
Unlimited
MEMLIMIT=integer
The per-process (soft) process resident set size limit for all of the processes belonging to a job running in the application profile.
Sets the maximum amount of physical memory (resident set size, RSS) that may be allocated to a process.
By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).
By default, jobs submitted to the application profile without a job-level memory limit are killed when the memory limit is reached. Application-level limits override any default limit specified in the queue, but must be less than the hard limit of the submission queue.
OS memory limit enforcement is the default MEMLIMIT behavior and does not require further configuration. OS enforcement usually allows the process to eventually run to completion. LSF passes MEMLIMIT to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from and lowers the scheduling priority (re-nice) of a process that has exceeded its declared MEMLIMIT. Only available on systems that support RLIMIT_RSS for setrlimit().
To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past MEMLIMIT.
You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y. The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by LSF is enabled. The per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS are enabled.
Available for all systems on which LSF collects total memory usage.
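For example, to cap each process at roughly 2 GB of resident memory (the value is in KB unless LSF_UNIT_FOR_LIMITS changes the unit):
MEMLIMIT = 2097152    # 2 GB expressed in KB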
Unlimited
MEMLIMIT_TYPE=JOB [PROCESS] [TASK]
MEMLIMIT_TYPE=PROCESS [JOB] [TASK]
MEMLIMIT_TYPE=TASK [PROCESS] [JOB]
A memory limit is the maximum amount of memory a job is allowed to consume. Jobs that exceed the level are killed. You can specify different types of memory limits to enforce. Use any combination of JOB, PROCESS, and TASK.
By specifying a value in the application profile, you override these three parameters: LSB_JOB_MEMLIMIT, LSB_MEMLIMIT_ENFORCE, and LSF_HPC_EXTENSIONS (TASK_MEMLIMIT).
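For example, a sketch enforcing both a per-job and a per-task memory limit:
MEMLIMIT_TYPE = JOB TASK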
Not defined. The memory limit level is still controlled by LSF_HPC_EXTENSIONS=TASK_MEMLIMIT, LSB_JOB_MEMLIMIT, and LSB_MEMLIMIT_ENFORCE.
MIG=minutes
Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs, in minutes.
LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes. A value of 0 specifies that a suspended job is migrated immediately. The migration threshold applies to all jobs running on the host.
Job-level command line migration threshold overrides threshold configuration in application profile and queue. Application profile configuration overrides queue level configuration.
When a host migration threshold is specified, and is lower than the value for the job, the queue, or the application, the host value is used.
Members of a chunk job can be migrated. Chunk jobs in WAIT state are removed from the job chunk and put into PEND state.
Does not affect MultiCluster jobs that are forwarded to a remote cluster.
Not defined. LSF does not migrate checkpointable or rerunnable jobs automatically.
NAME=string
Required. Unique name for the application profile.
NAME=myapp 1.0
You must specify this parameter to define an application profile. LSF does not automatically assign a default application profile name.
NETWORK_REQ="network_res_req"
network_res_req has the following syntax:
[type=sn_all | sn_single] [:protocol=protocol_name[(protocol_number)][,protocol_name[(protocol_number)]...]] [:mode=US | IP] [:usage=dedicated | shared] [:instance=positive_integer]
For LSF IBM Parallel Environment (PE) integration. Specifies the network resource requirements for a PE job.
If any network resource requirement is specified in the job, queue, or application profile, the job is treated as a PE job. PE jobs can only run on hosts where the IBM PE pnsd daemon is running.
The network resource requirement string network_res_req has the same syntax as the bsub -network option.
The -network bsub option overrides the value of NETWORK_REQ defined in lsb.queues or lsb.applications. The value of NETWORK_REQ defined in lsb.applications overrides queue-level NETWORK_REQ defined in lsb.queues.
Some IBM LoadLeveler job command file options are not supported in LSF.
type=sn_all | sn_single
sn_single: When used for switch adapters, specifies that all windows are on a single network.
sn_all: Specifies that one or more windows are on each network, and that striped communication should be used over all available switch networks. The networks specified must be accessible by all hosts selected to run the PE job. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about submitting jobs that use striping.
If mode is IP and type is specified as sn_all or sn_single, the job will only run on InfiniBand (IB) adapters (IPoIB). If mode is IP and type is not specified, the job will only run on Ethernet adapters (IPoEth). For IPoEth jobs, LSF ensures the job is running on hosts where pnsd is installed and running. For IPoIB jobs, LSF ensures the job is running on hosts where pnsd is installed and running, and that IB networks are up. Because IP jobs do not consume network windows, LSF does not check if all network windows are used up or the network is already occupied by a dedicated PE job.
Equivalent to the PE MP_EUIDEVICE environment variable and the -euidevice PE flag. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information. Only sn_all and sn_single are supported by LSF. The other types supported by PE are not supported for LSF jobs.
protocol=protocol_name[(protocol_number)]
mpi: The application makes only MPI calls. This value applies to any MPI job regardless of the library that it was compiled with (PE MPI, MPICH2).
pami: The application makes only PAMI calls.
lapi: The application makes only LAPI calls.
shmem: The application makes only OpenSHMEM calls.
user_defined_parallel_api: The application makes only calls from a parallel API that you define. For example: protocol=myAPI or protocol=charm.
The default value is mpi.
LSF also supports an optional protocol_number (for example, mpi(2)), which specifies the number of contexts (endpoints) per parallel API instance. The number must be a power of 2, but no greater than 128 (1, 2, 4, 8, 16, 32, 64, 128). LSF passes the communication protocols to PE without any change, and reserves network windows for each protocol.
When you specify multiple parallel API protocols, you cannot make calls to both LAPI and PAMI (lapi, pami) or LAPI and OpenSHMEM (lapi, shmem) in the same application. Protocols can be specified in any order.
See the MP_MSG_API and MP_ENDPOINTS environment variables and the -msg_api and -endpoints PE flags in the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about the communication protocols that are supported by IBM PE.
mode=US | IP
The network communication system mode used by the specified communication protocol: US (User Space) or IP (Internet Protocol). A US job can only run with adapters that support user space communications, such as the IB adapter. IP jobs can run with either Ethernet adapters or IB adapters. When IP mode is specified, the instance number cannot be specified, and network usage must be unspecified or shared.
Each instance of US mode requested by a task running on switch adapters requires an adapter window. For example, if a task requests both the MPI and LAPI protocols such that both protocol instances require US mode, two adapter windows are used.
The default value is US.
usage=dedicated | shared
Specifies whether the adapter can be shared with tasks of other job steps: dedicated or shared. Multiple tasks of the same job can share one network even if usage is dedicated.
The default usage is shared.
instance=positive_integer
The number of parallel communication paths (windows) per task made available to the protocol on each network. The number actually used depends on the implementation of the protocol subsystem.
The default value is 1.
If the specified value is greater than MAX_PROTOCOL_INSTANCES in lsb.params or lsb.queues, LSF rejects the job.
LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for NETWORK_REQ to take effect. If LSF_PE_NETWORK_NUM is not defined or is set to 0, NETWORK_REQ is ignored with a warning message.
The following network resource requirement string specifies the requirements for an sn_all job (one or more windows on each network, and striped communication over all available switch networks). The PE job uses MPI API calls (protocol), runs in user-space network communication system mode, and requires one parallel communication path (window) per task.
NETWORK_REQ = "protocol=mpi:mode=us:instance=1:type=sn_all"
No default value, but if you specify no value (NETWORK_REQ=""), the job uses the following: protocol=mpi:mode=US:usage=shared:instance=1 in the application profile.
NICE=integer
Adjusts the UNIX scheduling priority at which jobs from the application execute.
A value of 0 (zero) maintains the default scheduling priority for UNIX interactive jobs. This value adjusts the run-time priorities for batch jobs to control their effect on other batch or interactive jobs. See the nice(1) manual page for more details.
LSF on Windows does not support HIGH or REAL-TIME priority classes.
When set, this value overrides NICE set at the queue level in lsb.queues.
Not defined.
NO_PREEMPT_INTERVAL=minutes
Prevents preemption of jobs for the specified number of minutes of uninterrupted run time, where minutes is wall-clock time, not normalized time. NO_PREEMPT_INTERVAL=0 allows immediate preemption of jobs as soon as they start or resume running.
Setting this parameter in lsb.applications overrides the parameter of the same name in lsb.queues and in lsb.params.
0
NO_PREEMPT_FINISH_TIME=minutes | percentage
Prevents preemption of jobs that are due to finish within the specified number of minutes or the specified percentage of the estimated run time or run limit, where minutes is wall-clock time, not normalized time. The percentage must be greater than 0 and less than 100 (that is, between 1% and 99%).
For example, if the job run limit is 60 minutes and NO_PREEMPT_FINISH_TIME=10%, the job cannot be preempted after it runs 54 minutes or longer.
If you specify a percentage for NO_PREEMPT_FINISH_TIME, the job requires a runtime estimate (bsub -We or RUNTIME in lsb.applications) or a run limit (bsub -W, RUNLIMIT in lsb.queues, or RUNLIMIT in lsb.applications).
NO_PREEMPT_RUN_TIME=minutes | percentage
Prevents preemption of jobs that have been running for the specified number of minutes or the specified percentage of the estimated run time or run limit, where minutes is wall-clock time, not normalized time. The percentage must be greater than 0 and less than 100 (that is, between 1% and 99%).
For example, if the job run limit is 60 minutes and NO_PREEMPT_RUN_TIME=50%, the job cannot be preempted after it has been running for 30 minutes or longer.
If you specify a percentage for NO_PREEMPT_RUN_TIME, the job requires a runtime estimate (bsub -We or RUNTIME in lsb.applications) or a run limit (bsub -W, RUNLIMIT in lsb.queues, or RUNLIMIT in lsb.applications).
PERSISTENT_HOST_ORDER=Y | yes | N | no
Applies when migrating parallel jobs in a MultiCluster environment. Setting PERSISTENT_HOST_ORDER=Y ensures that jobs are restarted on hosts based on alphabetical names of the hosts, preventing them from being restarted on the same hosts that they ran on before migration.
PERSISTENT_HOST_ORDER=N. Migrated jobs in a MultiCluster environment could run on the same hosts that they ran on before.
POST_EXEC=command
Enables post-execution processing at the application level. The POST_EXEC command runs on the execution host after the job finishes. Post-execution commands can be configured at the job, application, and queue levels.
If both application-level (POST_EXEC in lsb.applications) and job-level post-execution commands are specified, job level post-execution overrides application-level post-execution commands. Queue-level post-execution commands (POST_EXEC in lsb.queues) run after application-level post-execution and job-level post-execution commands.
The POST_EXEC command uses the same environment variable values as the job, and runs under the user account of the user who submits the job.
When a job exits with one of the application profile’s REQUEUE_EXIT_VALUES, LSF requeues the job and sets the environment variable LSB_JOBPEND. The post-execution command runs after the requeued job finishes.
When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. If the execution environment for the job cannot be set up, LSB_JOBEXIT_STAT is set to 0 (zero).
The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
The following example shows valid configuration lines:
PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Hey!"
When the post-execution command is run, LSF sets the PATH environment variable to:
PATH="/bin /usr/bin /sbin /usr/sbin"
Users can define their own post-execution command by setting the USER_POSTEXEC environment variable, for example:
setenv USER_POSTEXEC /path_name
For post-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe.
Not defined. No post-execution commands are associated with the application profile.
PREEMPT_DELAY=seconds
Preemptive jobs will wait the specified number of seconds from the submission time before preempting any lower-priority preemptable jobs. During the grace period, preemption is not triggered, but the job can be scheduled and dispatched by other scheduling policies.
This feature provides flexibility to tune the system to reduce the number of preemptions, which improves performance and job throughput. When the low-priority jobs are short, preemption can be avoided entirely if the high-priority jobs can wait a while for the low-priority jobs to finish. If the job is still pending after the grace period has expired, preemption is triggered.
The waiting time is for preemptive jobs in the pending status only. It will not impact the preemptive jobs that are suspended.
The time is counted from the submission time of the jobs. The submission time means the time mbatchd accepts a job, which includes newly submitted jobs, restarted jobs (by brestart), and jobs forwarded from a remote cluster.
When the preemptive job is waiting, the pending reason is:
The preemptive job is allowing a grace period before preemption.
If you use an older version of bjobs, the pending reason is:
Unknown pending reason code <6701>;
The parameter is defined in lsb.params, lsb.queues (overrides lsb.params), and lsb.applications (overrides both lsb.params and lsb.queues).
Run badmin reconfig to make your changes take effect.
Not defined (if the parameter is not defined anywhere, preemption is immediate).
PRE_EXEC=command
Enables pre-execution processing at the application level. The PRE_EXEC command runs on the execution host before the job starts. If the PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.
The PRE_EXEC command uses the same environment variable values as the job, and runs under the user account of the user who submits the job.
The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
The following example shows valid configuration lines:
PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Hey!"
When the pre-execution command is run, LSF sets the PATH environment variable to:
PATH="/bin /usr/bin /sbin /usr/sbin"
For pre-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe. This parameter cannot be used to configure host-based pre-execution processing.
Not defined. No pre-execution commands are associated with the application profile.
PROCESSLIMIT=integer
Limits the number of concurrent processes that can be part of a job.
By default, jobs submitted to the application profile without a job-level process limit are killed when the process limit is reached. Application-level limits override any default limit specified in the queue.
SIGINT, SIGTERM, and SIGKILL are sent to the job in sequence when the limit is reached.
Unlimited
PROCLIMIT=[minimum_limit [default_limit]] maximum_limit
Maximum number of slots that can be allocated to a job. For parallel jobs, the maximum number of processors that can be allocated to the job.
Optionally specifies the minimum and default number of job slots. All limits must be positive integers greater than or equal to 1 that satisfy the following relationship:
1 <= minimum <= default <= maximum
Job-level processor limits (bsub -n) override application-level PROCLIMIT, which overrides queue-level PROCLIMIT. Job-level limits must fall within the maximum and minimum limits of the application profile and the queue.
Jobs that request fewer slots than the minimum PROCLIMIT or more slots than the maximum PROCLIMIT cannot use the application profile and are rejected. If the job requests minimum and maximum job slots, the maximum slots requested cannot be less than the minimum PROCLIMIT, and the minimum slots requested cannot be more than the maximum PROCLIMIT.
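For example, with the following sketch a job can use between 2 and 16 slots, and a job that does not request a slot count receives the default of 4:
PROCLIMIT = 2 4 16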
Unlimited, the default number of slots is 1
REMOTE_MAX_PREEXEC_RETRY=integer
MultiCluster job forwarding model only. The maximum number of times to attempt the pre-execution command of a job from a remote cluster.
If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.
0 < REMOTE_MAX_PREEXEC_RETRY < INFINIT_INT
INFINIT_INT is defined in lsf.h.
5
REQUEUE_EXIT_VALUES=[exit_code ...] [EXCLUDE(exit_code ...)]
Enables automatic job requeue and sets the LSB_EXIT_REQUEUE environment variable. Use spaces to separate multiple exit codes. Application-level exit values override queue-level values. Job-level exit values (bsub -Q) override application-level and queue-level values.
"[all] [~number ...] | [number ...]"
The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.
Jobs are requeued to the head of the queue. The output from the failed run is not saved, and the user is not notified by LSF.
Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue, ensuring the job does not rerun on the same host. Exclusive job requeue does not work for parallel jobs.
For MultiCluster jobs forwarded to a remote execution cluster, the exit values specified in the submission cluster with the EXCLUDE keyword are treated as if they were non-exclusive.
You can also requeue a job if the job is terminated by a signal.
If a job is killed by a signal, the exit value is 128+signal_value. The sum of 128 and the signal value can be used as the exit code in the parameter REQUEUE_EXIT_VALUES.
For example, if you want a job to rerun if it is killed with signal 9 (SIGKILL), the exit value would be 128+9=137. You can configure the following requeue exit value to allow a job to be requeued if it was killed by signal 9:
REQUEUE_EXIT_VALUES=137
In Windows, if a job is killed by a signal, the exit value is signal_value. The signal value can be used as the exit code in the parameter REQUEUE_EXIT_VALUES.
For example, if you want to rerun a job after it was killed with signal 7 (SIGKILL), the exit value would be 7. You can configure the following requeue exit value to allow a job to be requeued after it was killed by signal 7:
REQUEUE_EXIT_VALUES=7
You can configure the following requeue exit values to allow a job to be requeued on both Linux and Windows after it is killed:
REQUEUE_EXIT_VALUES=137 7
If mbatchd is restarted, it does not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.
You should configure REQUEUE_EXIT_VALUES for interruptible backfill queues (INTERRUPTIBLE_BACKFILL=seconds).
REQUEUE_EXIT_VALUES=30 EXCLUDE(20)
means that jobs with exit code 30 are requeued, jobs with exit code 20 are requeued exclusively, and jobs with any other exit code are not requeued.
Not defined. Jobs are not requeued.
RERUNNABLE=yes | no
If yes, enables automatic job rerun (restart) for any job associated with the application profile.
Rerun is disabled when RERUNNABLE is set to no. The yes and no arguments are not case-sensitive.
Members of a chunk job can be rerunnable. If the execution host becomes unavailable, rerunnable chunk job members are removed from the job chunk and dispatched to a different execution host.
Job-level rerun (bsub -r) overrides the RERUNNABLE value specified in the application profile, which overrides the queue specification. Using bmod -rn to make rerunnable jobs non-rerunnable overrides both the application profile and the queue.
Not defined.
RES_REQ=res_req
Resource requirements used to determine eligible hosts. Specify a resource requirement string as usual. The resource requirement string lets you specify conditions in a more flexible manner than using the load thresholds.
Resource requirement strings can be simple (applying to the entire job), compound (applying to the specified number of slots), or can contain alternative resources (alternatives between 2 or more simple and/or compound). When a compound resource requirement is set at the application-level, it will be ignored if any job-level resource requirements (simple or compound) are defined.
Compound and alternative resource requirements follow the same set of rules for determining how resource requirements are going to be merged between job, application, and queue level. In the event no job-level resource requirements are set, the compound or alternative application-level requirements interact with queue-level resource requirement strings in the following ways:
section | compound/alternative application and simple queue behavior
---|---
select | both levels satisfied; queue requirement applies to all terms
same | queue level ignored
order, span | application-level section overwrites queue-level section (if a given level is present); queue requirement (if used) applies to all terms
rusage | rusage definitions from both levels are merged; if conflicts occur, the application-level definition takes precedence. For example: if the application-level requirement is num1*{rusage[R1]} + num2*{rusage[R2]} and the queue-level requirement is rusage[RQ], where RQ is a job resource, the merged requirement is num1*{rusage[merge(R1,RQ)]} + num2*{rusage[R2]}
Compound or alternative resource requirements do not support the cu section, or the || operator within the rusage section.
Alternative resource strings use the || operator as a separator for each alternative resource.
Multiple -R strings cannot be used with multi-phase rusage resource requirements.
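For illustration, a hypothetical compound requirement (two blocks of slots with different rusage) and a hypothetical alternative requirement (two interchangeable simple strings separated by ||); the host types here are assumptions:
RES_REQ = "2*{select[type==any] rusage[mem=4000]} + 4*{rusage[mem=1000]}"
RES_REQ = "{select[type==X86_64]} || {select[type==PPC64]}"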
For internal load indices and duration, jobs are rejected if they specify resource reservation requirements at the job or application level that exceed the requirements specified in the queue.
By default, memory (mem) and swap (swp) limits in select[] and rusage[] sections are specified in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for these limits (GB, TB, PB, or EB).
When LSF_STRICT_RESREQ=Y is configured in lsf.conf, resource requirement strings in select sections must conform to a more strict syntax. The strict resource requirement syntax only applies to the select section. It does not apply to the other resource requirement sections (order, rusage, same, span, cu, or affinity). When LSF_STRICT_RESREQ=Y in lsf.conf, LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.
For simple resource requirements, the select section defined at the application, queue, and job level must all be satisfied.
The rusage section can specify additional requests. To do this, use the OR (||) operator to separate additional rusage strings. The job-level rusage section takes precedence.
Compound resource requirements do not support use of the || operator within the component rusage simple resource requirements. Multiple rusage strings cannot be used with multi-phase rusage resource requirements.
When both job-level and application-level rusage sections are defined using simple resource requirement strings, the rusage section defined for the job overrides the rusage section defined in the application profile. The rusage definitions are merged, with the job-level rusage taking precedence. Any queue-level requirements are then merged with that result.
For example, if the application profile defines:
RES_REQ=rusage[mem=200:lic=1] ...
and a job is submitted with:
bsub -R "rusage[mem=100]" ...
the resulting requirement for the job is
rusage[mem=100:lic=1]
where mem=100 specified by the job overrides mem=200 specified by the application profile. However, lic=1 from the application profile is kept, since the job does not specify it.
Similarly, if the configuration defines:
RES_REQ=rusage[bwidth=2:threshold=5] ...
and a job is submitted with:
bsub -R "rusage[bwidth=1:threshold=6]" ...
the resulting requirement for the job is
rusage[bwidth=1:threshold=6]
With duration and decay, if the application profile defines:
RES_REQ=rusage[mem=200:duration=20:decay=1] ...
and a job is submitted with:
bsub -R "rusage[mem=100]" ...
the resulting requirement for the job is
rusage[mem=100:duration=20:decay=1]
Application-level duration and decay are merged with the job-level specification, and mem=100 for the job overrides mem=200 specified by the application profile. However, duration=20 and decay=1 from application profile are kept, since job does not specify them.
For multi-phase rusage strings, if the application profile defines:
RES_REQ=rusage[mem=(200 150):duration=(10 10):decay=(1),swap=100] ...
and a job is submitted with:
bsub -app app_name -R "rusage[mem=(600 350):duration=(20 10):decay=(0 1)]" ...
the resulting requirement for the job is
rusage[mem=(600 350):duration=(20 10):decay=(0 1),swap=100]
The job-level values for mem, duration and decay override the application-level values. However, swap=100 from the application profile is kept, since the job does not specify swap.
If the application profile defines a multi-phase rusage string and the job specifies a single-phase string:
RES_REQ=rusage[mem=(200 150):duration=(10 10):decay=(1)] ...
bsub -app app_name -R "rusage[mem=200:duration=15:decay=0]" ...
the resulting requirement for the job is
rusage[mem=200:duration=15:decay=0]
Job-level values override the application-level multi-phase rusage string.
For simple resource requirements, the order section defined at the job level overrides any application-level order section. An application-level order section overrides queue-level specification. The order section defined at the application level is ignored if any resource requirements are specified at the job level. If no resource requirements include an order section, the default order r15s:pg is used.
The command syntax is:
[!][-]resource_name [: [-]resource_name]
For example:
bsub -R "order[!ncpus:mem]" myjob
"!" only works with consumable resources because resources can be specified in the rusage[] section and their value may be changed in schedule cycle (for example, slot or memory). In LSF scheduler, slots under RUN, SSUSP, USUP and RSV may be freed in different scheduling phases. Therefore, the slot value may change in different scheduling cycles.
For simple resource requirements the span section defined at the job-level overrides an application-level span section, which overrides a queue-level span section.
For simple resource requirements all same sections defined at the job-level, application-level, and queue-level are combined before the job is dispatched.
For simple resource requirements the job-level cu section overrides the application-level, and the application-level cu section overrides the queue-level.
For simple resource requirements the job-level affinity section overrides the application-level, and the application-level affinity section overrides the queue-level.
select[type==local] order[r15s:pg]
If this parameter is defined and a host model or Boolean resource is specified, the default type is any.
RESIZABLE_JOBS = [Y|N|auto]
N|n: The resizable job feature is disabled in the application profile. Under this setting, all jobs attached to this application profile are not resizable. All bresize and bsub -ar commands will be rejected with a proper error message.
Y|y: Resize is enabled in the application profile and all jobs belonging to the application are resizable by default. Under this setting, users can run bresize commands to cancel pending resource allocation requests for the job or release resources from an existing job allocation, or use bsub -ar to submit an autoresizable job.
auto: All jobs belonging to the application will be autoresizable.
Resizable jobs must be submitted with an application profile that defines RESIZABLE_JOBS as either auto or Y. If the application profile defines RESIZABLE_JOBS=auto, but the administrator changes it to N and reconfigures LSF, jobs without a job-level autoresizable attribute are no longer autoresizable. For running jobs that are in the middle of the notification stage, LSF lets the current notification complete and stops further scheduling. Changing the RESIZABLE_JOBS configuration does not affect jobs with a job-level autoresizable attribute. (This behavior is the same as for exclusive jobs with bsub -x and the queue-level EXCLUSIVE parameter.)
Auto-resizable jobs cannot be submitted with compute unit resource requirements. In the event a bswitch call or queue reconfiguration results in an auto-resizable job running in a queue with compute unit resource requirements, the job will no longer be auto-resizable.
Resizable jobs cannot have compound resource requirements.
If the parameter is undefined, the default value is N.
RESIZE_NOTIFY_CMD=command
Defines an executable command to be invoked on the first execution host of a job when a resize event occurs. The maximum length of the notification command is 4 KB.
Not defined. No resize notification command is invoked.
The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
RTASK_GONE_ACTION="[KILLJOB_TASKDONE | KILLJOB_TASKEXIT] [IGNORE_TASKCRASH]"
Defines the actions LSF should take if it detects that a remote task of a parallel or distributed job is gone.
This parameter only applies to the blaunch distributed application framework.
IGNORE_TASKCRASH: A remote task crashes. LSF does nothing. The job continues to launch the next task.
KILLJOB_TASKDONE: A remote task exits with zero value. LSF terminates all tasks in the job.
KILLJOB_TASKEXIT: A remote task exits with non-zero value. LSF terminates all tasks in the job.
When defined in an application profile, the LSB_DJOB_RTASK_GONE_ACTION variable is set when running bsub -app for the specified application.
You can also use the environment variable LSB_DJOB_RTASK_GONE_ACTION to override the value set in the application profile.
For example:
RTASK_GONE_ACTION="IGNORE_TASKCRASH KILLJOB_TASKEXIT"
Not defined. LSF does nothing.
RUNLIMIT=[hour:]minute[/host_name | /host_model]
The default run limit. The name of a host or host model specifies the runtime normalization host to use.
By default, jobs that are in the RUN state for longer than the specified run limit are killed by LSF. You can optionally provide your own termination job action to override this default.
If you want to provide an estimated run time for scheduling purposes without killing jobs that exceed the estimate, define the RUNTIME parameter in the application profile, or submit the job with -We instead of a run limit.
The run limit is in the form [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can be specified either as 3:30 or as 210.
The run limit you specify is the normalized run time. This is done so that the job does approximately the same amount of processing, even if it is sent to a host with a faster or slower CPU. Whenever a normalized run time is given, the actual time on the execution host is the specified time multiplied by the CPU factor of the normalization host and then divided by the CPU factor of the execution host.
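For example, suppose RUNLIMIT=60 is normalized to a host with CPU factor 2.0 and the job runs on an execution host with CPU factor 1.0 (all values illustrative). The limit enforced on the execution host is 60 * 2.0 / 1.0 = 120 minutes.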
If ABS_RUNLIMIT=Y is defined in lsb.params or in the application profile, the runtime limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted to an application profile with a run limit configured.
Optionally, you can supply a host name or a host model name defined in LSF. You must insert a slash (/) between the run limit and the host name or model name. (See lsinfo(1) to get host model information.)
If no host or host model is given, LSF uses the default runtime normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, LSF uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, the host with the largest CPU factor (the fastest host in the cluster).
For MultiCluster jobs, if no other CPU time normalization host is defined and information about the submission host is not available, LSF uses the host with the largest CPU factor (the fastest host in the cluster).
Jobs submitted to a chunk job queue are not chunked if RUNLIMIT is greater than 30 minutes.
Unlimited
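For example, to set a default run limit of three and a half hours normalized to hostA (hostA is illustrative):
RUNLIMIT = 3:30/hostA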
RUNTIME=[hour:]minute[/host_name | /host_model]
The RUNTIME parameter specifies an estimated run time for jobs associated with an application. LSF uses the RUNTIME value for scheduling purposes only, and does not kill jobs that exceed this value unless the jobs also exceed a defined RUNLIMIT. The format of the runtime estimate is the same as that of the RUNLIMIT parameter.
The job-level runtime estimate specified by bsub -We overrides the RUNTIME setting in an application profile.
Not defined
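For example, to give the scheduler a one-hour estimate while still killing jobs that run longer than two hours (values illustrative):
RUNTIME = 60
RUNLIMIT = 120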
STACKLIMIT=integer
The per-process (soft) stack segment size limit for all of the processes belonging to a job from this application profile (see getrlimit(2)). Application-level limits override any default limit specified in the queue, but must be less than the hard limit of the submission queue.
By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).
Unlimited
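For example, to limit each process stack to 8 MB when LSF_UNIT_FOR_LIMITS is KB, the default (the value is illustrative):
STACKLIMIT = 8192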
SUCCESS_EXIT_VALUES=[exit_code …]
Specifies exit values that LSF uses to determine whether the job completed successfully. Use spaces to separate multiple exit codes. Job-level success exit values specified with the LSB_SUCCESS_EXIT_VALUES environment variable override the configuration in the application profile.
Use SUCCESS_EXIT_VALUES for applications that successfully exit with non-zero values so that LSF does not interpret non-zero exit codes as job failure.
exit_code must be a value between 0 and 255.
If the same exit code is defined in both SUCCESS_EXIT_VALUES and REQUEUE_EXIT_VALUES, REQUEUE_EXIT_VALUES takes precedence: the job is set to PEND state and requeued.
0
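For example, for an application that signals success with exit code 230 or 222 (the codes are illustrative):
SUCCESS_EXIT_VALUES = 230 222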
When the suspending reason SUSP_LOAD_REASON (suspended by load) is set in LSB_SUSP_REASONS, LSB_SUSP_SUBREASONS is set to one of the load index values defined in lsf.h.
Use LSB_SUSP_REASONS and LSB_SUSP_SUBREASONS together in your custom job control to determine the exact load threshold that caused a job to be suspended.
The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
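For example, if a job is suspended because its paging rate exceeded the threshold, SUSP_LOAD_REASON is set in LSB_SUSP_REASONS and LSB_SUSP_SUBREASONS is set to the pg load index value from lsf.h, so a custom job control can distinguish a paging-triggered suspension from one caused by another load index.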
SWAPLIMIT=integer
Limits the total amount of virtual memory available to the job.
This limit applies to the whole job, no matter how many processes the job may contain. Application-level limits override any default limit specified in the queue.
The action taken when a job exceeds its SWAPLIMIT or PROCESSLIMIT is to send SIGQUIT, SIGINT, SIGTERM, and SIGKILL in sequence. For CPULIMIT, SIGXCPU is sent before SIGINT, SIGTERM, and SIGKILL.
By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).
Unlimited
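For example, to cap a job's total virtual memory at 1 GB when LSF_UNIT_FOR_LIMITS is KB, the default (the value is illustrative):
SWAPLIMIT = 1048576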
THREADLIMIT=integer
Limits the number of concurrent threads that can be part of a job. Exceeding the limit causes the job to terminate. The system sends SIGINT, SIGTERM, and SIGKILL in sequence to all processes belonging to the job.
By default, jobs submitted to the queue without a job-level thread limit are killed when the thread limit is reached. Application-level limits override any default limit specified in the queue.
The limit must be a positive integer.
Unlimited
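For example, to terminate any job in the profile that spawns more than 16 concurrent threads (the value is illustrative):
THREADLIMIT = 16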
USE_PAM_CREDS=y | n
If USE_PAM_CREDS=y, LSF applies PAM limits to the application when its job is dispatched to a Linux host using PAM. PAM limits are system resource limits defined in limits.conf.
When USE_PAM_CREDS is enabled, PAM limits override other resource limits.
If the execution host does not have PAM configured and this parameter is enabled, the job fails.
For parallel jobs, USE_PAM_CREDS takes effect only on the first execution host.
Overrides MEMLIMIT_TYPE=Process.
Overridden (for CPU limit only) by LSB_JOB_CPULIMIT=y.
Overridden (for memory limits only) by LSB_JOB_MEMLIMIT=y.
n
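For example (the profile name is illustrative):
Begin Application
NAME = pamapp
USE_PAM_CREDS = y
End Application
Jobs dispatched through this profile to a Linux host with PAM configured run under the resource limits defined in limits.conf.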
Use if-else constructs and time expressions to define time windows in the file. Configuration defined within a time window applies only during the specified time period; configuration defined outside of any time window applies at all times. After editing the file, run badmin reconfig to reconfigure the cluster.
Time expressions in the file are evaluated by LSF every 10 minutes, based on mbatchd start time. When an expression evaluates true, LSF changes the configuration in real time, without restarting mbatchd, providing continuous system availability.
Time-based configuration also supports shared configuration for groups of clusters in a MultiCluster environment (using the #INCLUDE directive). That means you can include a common configuration file by using the time-based feature in local configuration files.
Begin Application
NAME=app1
#if time(16:00-18:00)
CPULIMIT=180/hostA
#else
CPULIMIT=60/hostA
#endif
End Application
In this example, for two hours every day (16:00-18:00), the configuration is the following:
Begin Application
NAME=app1
CPULIMIT=180/hostA
End Application
The rest of the time, the configuration is the following:
Begin Application
NAME=app1
CPULIMIT=60/hostA
End Application