Running and monitoring Session Scheduler jobs

Create a Session Scheduler session and run tasks

Procedure

  1. Create a task definition file.

    For example:

    cat my.tasks
    sleep 10
    hostname
    uname
    ls
  2. Use bsub with the ssched application profile to submit a Session Scheduler job with the task definition.
    bsub -app ssched bsub_options ssched [task_options] [-tasks task_definition_file]
     [command [arguments]]
    For example:
    bsub -app ssched ssched -tasks my.tasks
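The two steps above can be sketched as one small script. This is only an illustration: the file name my.tasks and the four commands come from the example above, while the check for bsub on the PATH is an added assumption so that the script does something sensible on a machine without LSF.

```shell
#!/bin/sh
# Write the four example tasks, one command per line.
tasks_file=my.tasks
cat > "$tasks_file" <<'EOF'
sleep 10
hostname
uname
ls
EOF

# Count the task lines so the submission can be sanity-checked by eye.
task_count=$(wc -l < "$tasks_file" | tr -d ' ')
echo "Wrote $task_count tasks to $tasks_file"

# Submit only where LSF is installed; otherwise show the command that would run.
if command -v bsub >/dev/null 2>&1; then
    bsub -app ssched ssched -tasks "$tasks_file"
else
    echo "bsub not found; would run: bsub -app ssched ssched -tasks $tasks_file"
fi
```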

Results

When all tasks finish, Session Scheduler exits, all temporary files are deleted, the session job is cleaned from the system, and Session Scheduler output is captured and included in the standard LSF job e-mail.

You can also submit a Session Scheduler job without a task definition file to specify a single task.

Note:

The submission directory path can contain up to 4094 characters.

See the ssched command reference for detailed information about all task options.

Submit a Session Scheduler job as a parallel Platform LSF job

Procedure

Use the -n option of bsub to submit a Session Scheduler job as a parallel LSF job.
bsub -app ssched -n num_hosts ssched [task_options] [-tasks task_definition_file]
 [command [arguments]]
For example:
bsub -app ssched -n 2 ssched -tasks my.tasks

Submit task array jobs

Procedure

Use the -J option to submit a task array from the command line. No task definition file is needed:
-J task_name[index_list]

The index list must be enclosed in square brackets. The index list is a comma-separated list whose elements have the syntax start[-end[:step]] where start, end and step are positive integers. If the step is omitted, a step of one (1) is assumed. The task array index starts at one (1).

All tasks in the array share the same option parameters. Each element of the array is distinguished by its array index.
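To make the index list syntax concrete, the following sketch expands a list into the individual array indices it describes. The function expand_index_list is a hypothetical helper written for this illustration; it is not part of LSF and only mirrors the start[-end[:step]] rules stated above.

```shell
#!/bin/sh
# expand_index_list: hypothetical helper, not an LSF command.
# Expands a comma-separated index list such as "1-5:2,8" into its indices.
expand_index_list() {
    list=$1
    out=""
    old_ifs=$IFS
    IFS=,
    for element in $list; do
        IFS=$old_ifs
        case $element in
            *-*)
                start=${element%%-*}
                rest=${element#*-}
                case $rest in
                    *:*) end=${rest%%:*}; step=${rest#*:} ;;
                    *)   end=$rest; step=1 ;;   # omitted step defaults to 1
                esac
                ;;
            *)
                start=$element; end=$element; step=1 ;;
        esac
        # start, end and step must be positive integers.
        if [ "$start" -lt 1 ] || [ "$step" -lt 1 ]; then
            echo "invalid element: $element" >&2
            return 1
        fi
        i=$start
        while [ "$i" -le "$end" ]; do
            out="$out $i"
            i=$((i + step))
        done
    done
    IFS=$old_ifs
    printf '%s\n' "${out# }"
}

# Example: every second index from 1 to 5, then 8, then 10 and 11.
expand_index_list "1-5:2,8,10-11"
```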

See the ssched command reference for detailed information about all task options.

Submit tasks with automatic task requeue

Procedure

Use the -Q option to specify requeue exit values for the tasks:
-Q "exit_code ..."

-Q enables automatic task requeue and sets the LSB_EXIT_REQUEUE environment variable. Use spaces to separate multiple exit codes. LSF does not save the output from the failed task, and does not notify the user that the task failed.

If a job is killed by a signal, the exit value is 128+signal_value. Use the sum of 128 and the signal value as the exit code in the parameter. For example, if you want a task to rerun when it is killed by signal 9 (SIGKILL), the exit value is 128+9=137.
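The 128+signal_value rule can be checked with a short experiment outside LSF. The sketch below assumes a Bourne-style shell such as bash or dash, where a child killed by a signal reports 128 plus the signal number; the variable names are illustrative only.

```shell
#!/bin/sh
# Compute the requeue exit value for a signal: 128 + signal number.
signal=9                          # SIGKILL
exit_value=$((128 + signal))

# Confirm it empirically: a child shell that kills itself with that signal.
sh -c 'kill -9 $$'
observed=$?

echo "computed=$exit_value observed=$observed"
# The value to pass as a task option would then be:
echo "-Q \"$exit_value\""
```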

The SSCHED_REQUEUE_LIMIT setting limits the number of times a task can be requeued.

See the ssched command reference for detailed information about all task options.

Integrate Session Scheduler with bsub

Integrate Session Scheduler with bsub to make the execution of Session Scheduler jobs transparent. You can then use bsub to submit Session Scheduler jobs without specifying the Session Scheduler application profile and options.

The bsub command recognizes two environment variables to support Session Scheduler job submission: LSB_TASK_LIST (the task definition file) and LSB_BSUB_MODE (the current bsub mode). If LSB_BSUB_MODE is "ssched", running bsub does not submit a job to mbatchd. Instead, bsub opens the task definition file (LSB_TASK_LIST) and inserts the submitted job into it as a task.

This integration supports the following bsub options: -E, -Ep, -e, -i, -J, -j, -o, -M, -Q, and -W.

Other bsub options are ignored.

Set up the integrated execution environment

Create the script files necessary for setting up the execution environment to integrate Session Scheduler with bsub.

Procedure

  1. Create the begin_ssched.sh script, which creates a Session Scheduler job and sets the necessary environment variables.
    #!/bin/sh -x
    
    TMPDIR=~/.ssched
    
    LSB_TASKLIST=$TMPDIR/task.lst.$$
    export LSB_TASKLIST
    
    
    if [ ! -d $TMPDIR ] 
    then
        mkdir -p $TMPDIR
    fi
    
    #
    # make sure no two sessions conflict with each other
    #
    i=0
    while [ -f $LSB_TASKLIST ]
    do
        i=$((i + 1))    # POSIX arithmetic; "let" is not available in every /bin/sh
        LSB_TASKLIST=$TMPDIR/task.lst.$$.$i
        export LSB_TASKLIST
    done
    
    JID=`bsub -H -Ep "rm -f $LSB_TASKLIST" "$@" ssched -tasks $LSB_TASKLIST | cut -f2 -d'<' | cut -f1 -d'>'`
    export JID
    
    LSB_BSUB_MODE=ssched
    export LSB_BSUB_MODE
  2. Create the end_ssched.sh script, to schedule and execute the Session Scheduler job.
    #!/bin/sh
    
    bresume $JID > /dev/null 2>&1
    
    unset LSB_BSUB_MODE
    unset LSB_TASKLIST
  3. Copy the two script files into the LSF_BINDIR directory.
  4. Set the file permissions of the two script files to be executable for all users.
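Steps 3 and 4 can be sketched as a small helper. The function install_ssched_scripts is hypothetical and written only for this illustration; it assumes that both scripts sit in one source directory and that mode 755 is an acceptable way to make them executable for all users.

```shell
#!/bin/sh
# install_ssched_scripts: hypothetical helper, not an LSF command.
# Copies both scripts into a target directory and makes them executable.
install_ssched_scripts() {
    src_dir=$1
    dest_dir=$2
    for script in begin_ssched.sh end_ssched.sh; do
        cp "$src_dir/$script" "$dest_dir/$script" || return 1
        chmod 755 "$dest_dir/$script" || return 1
    done
}

# Typical use, assuming LSF_BINDIR is set by your LSF environment:
#   install_ssched_scripts . "$LSF_BINDIR"
```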

Use the integrated execution environment

Use bsub to submit Session Scheduler jobs without specifying the Session Scheduler application profile and options.

Procedure

  1. Run the begin_ssched.sh script to create a Session Scheduler job and set up the environment variables.

    Standard bsub options that you pass to begin_ssched.sh apply to the session.

    For example, to create a session job with two slots and send the output to a.out:

    . begin_ssched.sh -n2 -o a.out

  2. Run bsub for each batch job you want to include in the session.

    You can run bsub with the following options: -E, -Ep, -e, -i, -J, -j, -o, -M, -Q, and -W.

  3. Run the end_ssched.sh script to have LSF schedule and run the Session Scheduler job and to unset the environment variables.

    . end_ssched.sh

    The task definition file is automatically deleted after the Session Scheduler job is complete.

What to do next

You can also run these commands entirely from a script. For example:

#!/bin/sh

. begin_ssched.sh -n2

bsub task1
bsub task2

. end_ssched.sh

Monitor Session Scheduler jobs

Procedure

  1. Run bjobs -ss to get summary information for Session Scheduler jobs and tasks.
    JOBID OWNER    JOB_NAME NTASKS PEND DONE RUN EXIT
    1     lsfadmin job1     10     4    4    2   0
    2     lsfadmin job2     10     10   0    0   0
    3     lsfadmin job3     10     10   0    0   0

    The output shows the following information for each Session Scheduler job: the job ID, the owner, the job name, the total number of tasks, and the number of tasks in each of the PEND, DONE, RUN, and EXIT states.

  2. Use bjobs -l -ss or bread to track the progress of the Session Scheduler job.
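The summary columns lend themselves to simple post-processing. The sketch below assumes the column order shown in the bjobs -ss example above (JOBID, OWNER, JOB_NAME, NTASKS, PEND, DONE, RUN, EXIT) and job names without spaces; summarize_ss is a hypothetical helper, and the sample input is copied from that example rather than produced by a live cluster.

```shell
#!/bin/sh
# summarize_ss: hypothetical helper that reads bjobs -ss style output on stdin
# and prints the share of finished tasks per session job.
summarize_ss() {
    awk 'NR > 1 && $4 > 0 {
        printf "job %s (%s): %d/%d tasks done (%d%%)\n", $1, $3, $6, $4, ($6 * 100) / $4
    }'
}

# Demonstration with the sample output from this section.
summarize_ss <<'EOF'
JOBID OWNER JOB_NAME NTASKS PEND DONE  RUN EXIT
1   lsfadmin job1   10     4    4    2    0
2   lsfadmin job2   10    10    0    0    0
3   lsfadmin job3   10    10    0    0    0
EOF
```

On a system with LSF, the same function could be fed directly with bjobs -ss | summarize_ss.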

Kill a Session Scheduler session

Procedure

Use bkill to kill the Session Scheduler session. All temporary files are deleted, and the session job is cleaned from the system.

Check your job submission

Procedure

Use the -C option to sanity-check all parameters and the task definition file.

ssched exits after the check is complete. An exit code of 0 indicates no errors were found. A non-zero exit code indicates errors. You can run ssched -C outside of LSF.

See the ssched command reference for detailed information about all task options.

Example output of ssched -C:

ssched -C -tasks my.tasks
Error in tasks file line 1: -XXX 123 sleep 0
Unsupported option: -XXX
Error in tasks file line 2: -o my.out
A command must be specified

Results

Only the ssched parameters are checked, not the ssched task command itself. The task command must exist and be executable. ssched -C cannot detect whether the task command exists or is executable. To check a task definition file, remember to specify the -tasks option.
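Because ssched -C signals its result through the exit code, the check can gate the submission. The following is a minimal sketch under that assumption; check_then_submit is a hypothetical wrapper, and it uses only the ssched -C -tasks and bsub -app ssched forms shown in this document.

```shell
#!/bin/sh
# check_then_submit: hypothetical wrapper, not an LSF command.
# Submits the session job only if ssched -C finds no errors (exit code 0).
check_then_submit() {
    tasks_file=$1
    if ssched -C -tasks "$tasks_file"; then
        bsub -app ssched ssched -tasks "$tasks_file"
    else
        echo "ssched -C reported errors in $tasks_file; not submitting" >&2
        return 1
    fi
}

# Typical use on a host where ssched and bsub are installed:
#   check_then_submit my.tasks
```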

Enable recoverable Session Scheduler sessions

About this task

By default, Session Scheduler sessions are unrecoverable. In the event of a system crash, the session job must be resubmitted and all tasks are resubmitted and rerun.

However, the Session Scheduler supports application-level checkpoint/restart using Platform LSF's existing facilities. If the user specifies a checkpoint directory when submitting the session job, the job can be restarted using brestart. After a restart, only those tasks that have not yet completed are resubmitted and run.

Procedure

To enable recoverable sessions, when submitting the session job:
  1. Provide a writable directory on a shared file system.
  2. Specify the ssched checkpoint method with the bsub -k option.

Results

You do not need to call bchkpnt. The Session Scheduler automatically checkpoints itself after each task completes.

Example

For example:
bsub -app ssched -k "/share/scratch method=ssched" -n 8 ssched -tasks simpton.tasks
Job <123> is submitted to default queue <normal>.
...
brestart /share/scratch 123
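In a script, the job ID needed by brestart can be taken from the bsub submission message. The sketch below assumes the message format shown in the example above ("Job <123> is submitted to ..."); parse_job_id is a hypothetical helper introduced for this illustration.

```shell
#!/bin/sh
# parse_job_id: hypothetical helper that extracts the numeric job ID
# from a bsub submission message read on stdin.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Demonstration with the message from the example above.
jid=$(echo 'Job <123> is submitted to default queue <normal>.' | parse_job_id)
echo "brestart /share/scratch $jid"
```

With LSF available, the echo lines would be replaced by the real bsub and brestart commands, keeping the same checkpoint directory for both.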