Most MPI implementations and many distributed applications use rsh and ssh as their task launching mechanism. The blaunch command provides a drop-in replacement for rsh and ssh as a transparent method for launching parallel and distributed applications within LSF.
Similar to the LSF lsrun command, blaunch transparently connects directly to the RES/SBD on the remote host, creates and tracks the remote tasks, and provides the connection back to LSF. There is no need to insert pam or taskstarter into the rsh or ssh calling sequence, or to configure any wrapper scripts.
blaunch supports the same core command-line syntax as rsh and ssh:
rsh host_name command
ssh host_name command
While the host name value for rsh and ssh can only be a single host name, you can use the -z option to specify a space-delimited list of hosts on which tasks are started in parallel. All other rsh and ssh options are silently ignored.
You cannot run blaunch directly from the command line as a standalone command. blaunch only works within an LSF job; it can only be used to launch tasks on remote hosts that are part of a job allocation. On success, blaunch exits with 0.
blaunch is supported on Windows 2000 or later with the following exceptions:
Only the following signals are supported: SIGKILL, SIGSTOP, SIGCONT.
The -n option is not supported.
CMD.EXE /C <user command line> is used as an intermediate command shell when -no-shell is not specified.
CMD.EXE /C is not used when -no-shell is specified.
Windows Vista User Account Control must be configured correctly to run jobs.
LSF provides the following APIs for programming your own applications to use the blaunch distributed application framework:
lsb_launch(): A synchronous API call that allows source-level integration with vendor MPI implementations. This API launches the specified command (argv) on the remote nodes in parallel. LSF must be installed before integrating your MPI implementation with lsb_launch(). The lsb_launch() API requires the full set of liblsf.so and libbat.so (or liblsf.a and libbat.a).
lsb_getalloc(): Allocates memory for a host list to be used for launching parallel tasks through blaunch and the lsb_launch() API. It is the responsibility of the caller to free the host list when it is no longer needed. On success, the host list is a list of strings. Before freeing the host list, the individual elements must be freed. An application using the lsb_getalloc() API is assumed to be part of an LSF job, with LSB_MCPU_HOSTS set in the environment.
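For illustration, the following minimal shell sketch mirrors the per-slot expansion that lsb_getalloc() performs on LSB_MCPU_HOSTS; it sketches the data transformation only, not the C API itself:
#!/bin/sh
# Expand LSB_MCPU_HOSTS, for example "hostA 2 hostB 1",
# into one host name per task slot: hostA hostA hostB
set -- $LSB_MCPU_HOSTS
while [ $# -ge 2 ]; do
    host=$1; count=$2; shift 2
    i=0
    while [ $i -lt $count ]; do
        echo $host
        i=`expr $i + 1`
    done
done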
blaunch determines from the job environment what job it is running under, and what the allocation for the job is. These can be determined by examining the environment variables LSB_JOBID, LSB_JOBINDEX, and LSB_MCPU_HOSTS. If any of these variables do not exist, blaunch exits with a non-zero value. Similarly, if blaunch is used to start a task on a host not listed in LSB_MCPU_HOSTS, the command exits with a non-zero value.
The job submission script contains the blaunch command in place of rsh or ssh. The blaunch command does sanity checking of the environment to check for LSB_JOBID and LSB_MCPU_HOSTS. The blaunch command contacts the job RES to validate the information determined from the job environment. When the job RES receives the validation request from blaunch, it registers with the root sbatchd to handle signals for the job.
The job RES periodically requests resource usage for the remote tasks. This message also acts as a heartbeat for the job. If a resource usage request is not made within a certain period of time, LSF assumes the job is gone and shuts down the remote tasks. This timeout is configurable in an application profile in lsb.applications.
The blaunch command also honors the parameters LSB_CMD_LOG_MASK, LSB_DEBUG_CMD, and LSB_CMD_LOGDIR when defined in lsf.conf or as environment variables. The environment variables take precedence over the values in lsf.conf.
To ensure that no other users can run jobs on hosts allocated to tasks launched by blaunch, set LSF_DISABLE_LSRUN=Y in lsf.conf. When LSF_DISABLE_LSRUN=Y is defined, RES refuses remote connections from lsrun and lsgrun unless the user is either an LSF administrator or root. LSF_ROOT_REX must be defined for remote execution by root. Other remote execution commands, such as ch and lsmake, are not affected.
By default, LSF creates a temporary directory for a job only on the first execution host. If LSF_TMPDIR is set in lsf.conf, the path of the job temporary directory on the first execution host is set to LSF_TMPDIR/job_ID.tmpdir.
If LSB_SET_TMPDIR=Y, the environment variable TMPDIR is set to the path specified by LSF_TMPDIR. This value for TMPDIR overrides any value that might be set in the submission environment.
Tasks launched through the blaunch distributed application framework make use of the LSF temporary directory specified by LSF_TMPDIR:
When the environment variable TMPDIR is set on the first execution host, the blaunch framework propagates this environment variable to all execution hosts when launching remote tasks.
Before starting the job, the job RES or the task RES creates the directory specified by TMPDIR if it does not already exist.
The directory created by the job RES or task RES has permission 0700 and is owned by the execution user.
If the TMPDIR directory was created by the task RES, LSF deletes the temporary directory and its contents when the task is complete.
If the TMPDIR directory was created by the job RES, LSF deletes the temporary directory and its contents when the job is done.
If the TMPDIR directory is on a shared file system, it is assumed to be shared by all the hosts allocated to the blaunch job, so LSF does not remove TMPDIR directories created by the job RES or task RES.
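For example, with the following settings in lsf.conf (the path /work/tmp is illustrative):
LSF_TMPDIR=/work/tmp
LSB_SET_TMPDIR=Y
a blaunch job with ID 1234 gets /work/tmp/1234.tmpdir as its temporary directory on the first execution host, and the TMPDIR environment variable is propagated to the remote tasks launched on the other execution hosts.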
LSF automatically places the allocated hosts for a job into the $LSB_HOSTS and $LSB_MCPU_HOSTS environment variables. Since most MPI implementations and parallel applications expect to read the allocated hosts from a file, LSF creates a host file in the default job output directory $HOME/.lsbatch on the execution host before the job runs, and deletes it after the job has finished running. The name of the host file created has the format:
.lsb.<jobid>.hostfile
The host file contains one host per line. For example, if LSB_MCPU_HOSTS="hostA 2 hostB 2 hostC 1", the host file contains:
hostA
hostA
hostB
hostB
hostC
LSF publishes the full path to the host file by setting the environment variable LSB_DJOB_HOSTFILE.
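For example, a job script can read the host file to drive task launching. The following minimal sketch (mytask is a placeholder command) starts one task on each distinct host in the allocation and waits for all of them to finish:
#!/bin/sh
# Start one task on each unique host listed in the LSF host file.
for host in `sort -u $LSB_DJOB_HOSTFILE`; do
    blaunch $host mytask &
done
# Wait for all remote tasks to complete.
wait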
You can configure an application profile in lsb.applications to control the behavior of a parallel or distributed application when a remote task exits. Specify a value for RTASK_GONE_ACTION in the application profile to define what LSF does when a remote task exits. The default behavior is as follows:
When a task exits with a zero value, LSF does nothing.
When a task exits with a non-zero value, LSF does nothing.
When a task crashes, LSF shuts down the entire job.
RTASK_GONE_ACTION has the following syntax:
RTASK_GONE_ACTION="[KILLJOB_TASKDONE | KILLJOB_TASKEXIT]
[IGNORE_TASKCRASH]"
Where:
IGNORE_TASKCRASH: If a remote task crashes, LSF does nothing; the job continues to launch the next task.
KILLJOB_TASKDONE: If a remote task exits with a zero value, LSF terminates all tasks in the job.
KILLJOB_TASKEXIT: If a remote task exits with a non-zero value, LSF terminates all tasks in the job.
For example:
RTASK_GONE_ACTION="IGNORE_TASKCRASH KILLJOB_TASKEXIT"
RTASK_GONE_ACTION only applies to the blaunch distributed application framework. When defined in an application profile, the LSB_DJOB_RTASK_GONE_ACTION variable is set when running bsub -app for the specified application. You can also use the environment variable LSB_DJOB_RTASK_GONE_ACTION to override the value set in the application profile.
RTASK_GONE_ACTION=IGNORE_TASKCRASH has no effect on PE jobs: When a user application is killed, POE triggers the job to quit.
By default, LSF shuts down the entire job if the connection with the task RES is lost, or on a validation or heartbeat timeout. You can configure an application profile in lsb.applications so that only the current tasks are shut down, not the entire job.
Use DJOB_COMMFAIL_ACTION="KILL_TASKS" to define the behavior of LSF when it detects a communication failure between itself and one or more tasks. If not defined, LSF terminates all tasks, and shuts down the job. If set to KILL_TASKS, LSF tries to kill all the current tasks of a parallel or distributed job associated with the communication failure.
DJOB_COMMFAIL_ACTION only applies to the blaunch distributed application framework. When defined in an application profile, the LSB_DJOB_COMMFAIL_ACTION environment variable is set when running bsub -app for the specified application.
LSF can run a script that is responsible for setup and cleanup of the job launching environment. You can specify the name of this script in an application profile in lsb.applications.
Use DJOB_ENV_SCRIPT to define the path to a script that sets the environment for the parallel or distributed job launcher. The script runs as the user and is part of the job. DJOB_ENV_SCRIPT only applies to the blaunch distributed application framework. If a full path is specified, LSF uses that path name for execution. If a full path is not specified, LSF looks for the script in LSF_BINDIR.
The specified script must support a setup argument and a cleanup argument. LSF invokes the script with the setup argument before launching the actual job to set up the environment, and with the cleanup argument after the job is finished.
LSF assumes that if setup cannot be performed, the environment to run the job does not exist. If the script returns a non-zero value at setup, an error is printed to stderr of the job, and the job exits. Regardless of the return value of the script at cleanup, the real job exit value is used. If the return value of the script is non-zero, an error message is printed to stderr of the job.
When defined in an application profile, the LSB_DJOB_ENV_SCRIPT variable is set when running bsub -app for the specified application. For example, if DJOB_ENV_SCRIPT=mpich.script, LSF runs $LSF_BINDIR/mpich.script setup to set up the environment to run an MPICH job. After the job completes, LSF runs $LSF_BINDIR/mpich.script cleanup.
On cleanup, the mpich.script file could, for example, remove any temporary files and release resources used by the job. Changes to the LSB_DJOB_ENV_SCRIPT environment variable made by the script are visible to the job.
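The following is a minimal sketch of such a script, assuming a hypothetical per-job scratch directory; it implements the setup and cleanup contract described above:
#!/bin/sh
# Hypothetical DJOB_ENV_SCRIPT: LSF invokes it as "script setup" before
# launching the job and as "script cleanup" after the job finishes.
case "$1" in
setup)
    # Create a per-job scratch area; a non-zero exit here aborts the job.
    mkdir -p /tmp/myapp.$LSB_JOBID || exit 1
    ;;
cleanup)
    # Remove the scratch area; the job's real exit value is still used.
    rm -rf /tmp/myapp.$LSB_JOBID
    ;;
esac
exit 0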
Use DJOB_HB_INTERVAL in an application profile in lsb.applications to configure an interval in seconds used to update the heartbeat between LSF and the tasks of a parallel or distributed job. DJOB_HB_INTERVAL only applies to the blaunch distributed application framework. When DJOB_HB_INTERVAL is specified, the interval is scaled according to the number of tasks in the job:
max(DJOB_HB_INTERVAL, 10) + host_factor
where host_factor = 0.01 * number of hosts allocated for the job.
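For example, with DJOB_HB_INTERVAL=25 and 100 hosts allocated to the job, the effective interval is max(25, 10) + 0.01 * 100 = 26 seconds.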
When defined in an application profile, the LSB_DJOB_HB_INTERVAL variable is set in the parallel or distributed job environment. You should not manually change the value of LSB_DJOB_HB_INTERVAL.
By default, the interval is equal to SBD_SLEEP_TIME in lsb.params, where the default value of SBD_SLEEP_TIME is 30 seconds.
The current support for task geometry in LSF requires the user submitting a job to specify the desired task geometry by setting the environment variable LSB_TASK_GEOMETRY in the submission environment before job submission. LSF checks for LSB_TASK_GEOMETRY and modifies LSB_MCPU_HOSTS appropriately.
The environment variable LSB_TASK_GEOMETRY is checked for all parallel jobs. If LSB_TASK_GEOMETRY is set when users submit a parallel job (a job that requests more than one slot), LSF attempts to shape LSB_MCPU_HOSTS accordingly.
LSB_TASK_GEOMETRY was introduced to replace LSB_PJL_TASK_GEOMETRY, which is kept for compatibility with earlier versions. However, task geometry does not work using blaunch alone; it works with the PE/blaunch integration.
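For example, assuming the parenthesized task-group syntax used by LSB_PJL_TASK_GEOMETRY, the following setting asks LSF to place tasks 2, 5, and 7 on the first node, tasks 0 and 6 on the second, tasks 1 and 3 on the third, and task 4 on the fourth:
export LSB_TASK_GEOMETRY="{(2,5,7)(0,6)(1,3)(4)}"
LSF then shapes LSB_MCPU_HOSTS so that each group of task IDs is placed on the same host.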
Parallel and distributed jobs are typically launched with a job script. If your job script runs multiple commands, you can ensure that resource usage is collected correctly for all commands in a job script by configuring LSF_HPC_EXTENSIONS=CUMULATIVE_RUSAGE in lsf.conf. Resource usage is collected for jobs in the job script, rather than being overwritten when each command is executed.
Because a resizable job can be resized at any time, the blaunch framework is aware of newly added resources (hosts) and released resources. When a validation request arrives with additional resources, the blaunch framework accepts the request and launches the remote tasks accordingly. When part of an allocation is released, the blaunch framework ensures that no remote tasks are running on the released resources, terminating remote tasks on the released hosts if there are any. Any further validation requests with those released resources are rejected.
The blaunch framework provides the following functionality for resizable jobs:
The blaunch command and the lsb_getalloc() API call access the up-to-date resource allocation through the LSB_DJOB_HOSTFILE environment variable.
Validation request (to launch remote tasks) with the additional resources succeeds.
Validation request (to launch remote tasks) with the released resources fails.
Remote tasks on released resources are terminated; the blaunch framework terminates tasks on a host when the host has been completely removed from the allocation.
When releasing resources, LSF allows a configurable grace period (DJOB_RESIZE_GRACE_PERIOD in lsb.applications) for tasks to clean up and exit on their own. By default, there is no grace period.
When remote tasks are launched on new additional hosts but the notification command fails, those remote tasks are terminated.
Use bsub to call blaunch, or to invoke an execution script that calls blaunch. The blaunch command assumes that bsub -n implies one task per job slot.
Submit a job:
bsub -n 4 blaunch myjob
Submit a job to launch tasks on a specific host:
bsub -n 4 blaunch hostA myjob
Submit a job with a host list:
bsub -n 4 blaunch -z "hostA hostB" myjob
Submit a job with a host file:
bsub -n 4 blaunch -u ./hostfile myjob
Submit a job to an application profile:
bsub -n 4 -app djob blaunch myjob
To launch an ANSYS job through LSF using the blaunch framework, substitute the path to rsh or ssh with the path to blaunch. For example:
#BSUB -o stdout.txt
#BSUB -e stderr.txt
# Note: This case statement should be used to set up any
# environment variables needed to run the different versions
# of Ansys. All versions in this case statement that have the
# string "version list entry" on the same line will appear as
# choices in the Ansys service submission page.
case $VERSION in
10.0) #version list entry
export ANSYS_DIR=/usr/share/app/ansys_inc/v100/Ansys
export ANSYSLMD_LICENSE_FILE=1051@licserver.company.com
export MPI_REMSH=/opt/lsf/bin/blaunch
program=${ANSYS_DIR}/bin/ansys100
;;
*)
echo "Invalid version ($VERSION) specified"
exit 1
;;
esac
if [ -z "$JOBNAME" ]; then
export JOBNAME=ANSYS-$$
fi
if [ $CPUS -eq 1 ]; then
${program} -p ansys -j $JOBNAME -s read -l en-us -b -i $INPUT $OPTS
else
if [ "$MEMORY_ARCH" = "Distributed" ]; then
HOSTLIST=`echo $LSB_HOSTS | sed s/" "/":1:"/g`
${program} -j $JOBNAME -p ansys -pp -dis -machines \
${HOSTLIST}:1 -i $INPUT $OPTS
else
${program} -j $JOBNAME -p ansys -pp -dis -np $CPUS \
-i $INPUT $OPTS
fi
fi
The blaunch application framework uses the following parameters:
RTASK_GONE_ACTION (lsb.applications)
DJOB_COMMFAIL_ACTION (lsb.applications)
DJOB_ENV_SCRIPT (lsb.applications)
DJOB_HB_INTERVAL (lsb.applications)
DJOB_RESIZE_GRACE_PERIOD (lsb.applications)
LSB_TASK_GEOMETRY environment variable (replaces LSB_PJL_TASK_GEOMETRY)
For details on these parameters, see the IBM Platform LSF Configuration Reference.