How Session Scheduler Runs Tasks

Once a Session Scheduler session job has been dispatched and starts running, Session Scheduler parses the task definition file specified on the ssched command. Each line of the task definition file is one task. Tasks run on the hosts in the allocation in any order. Dependencies between tasks are not supported.

Session Scheduler status is posted to the Session Scheduler session job through the LSF bpost command. Use bread or bjobs -l to view Session Scheduler status. The status includes the current number of pending, running and completed tasks. LSF administrators can configure how often the status is updated.

When all tasks are completed, the Session Scheduler exits normally.

ssched runs under the submission user account. Any processes it creates, either locally or remotely, also run under the submission user account. Session Scheduler does not require any privileges beyond those normally granted to a user.

Session Scheduler job sessions

The Session Scheduler session job is compatible with all currently supported LSF job submission and execution parameters, including pre-execution, post-execution, job starters, I/O redirection, and queue and application profile configuration.

Run limits are interpreted and enforced as they are for normal LSF parallel jobs. Application-level checkpointing is also supported. Job chunking is not relevant to Session Scheduler jobs, because a single Session Scheduler session is generally long running and should not be chunked.

If the Session Scheduler session is killed (bkill) or requeued (brequeue), the Session Scheduler kills all running tasks, execution agents, and any other processes it has started, both local and remote. The Session Scheduler also cleans up any temporary files it created and then exits. If the Session Scheduler session is then requeued and restarted, all tasks are rerun.

If the Session Scheduler session is suspended (bstop), the Session Scheduler and all local and remote components will be stopped until the session is resumed (bresume).

Session Scheduler tasks

ssched and the sservice and sschild execution agents ensure that the user submission environment variables are set correctly for each task. To minimize the load on LSF, mbatchd does not have any knowledge of individual tasks.

Task definition file format

The task definition file is an ASCII file. Each line represents one task or an array of tasks, and has the following format:
[task_options] command [arguments]
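
As an illustration of the one-task-per-line layout, the following minimal sketch generates a task definition file from a list of commands. The helper name write_task_file and the example commands are hypothetical, and no task options are shown because this section does not list them.

```python
import tempfile
from pathlib import Path


def write_task_file(commands, path):
    """Write one command per line and return the number of tasks written."""
    lines = []
    for cmd in commands:
        cmd = cmd.strip()
        if not cmd:
            continue  # skip blank entries so no empty line is emitted
        if "\n" in cmd:
            raise ValueError("a task must fit on a single line: %r" % cmd)
        lines.append(cmd)
    # encoding="ascii" raises an error on non-ASCII text, since the
    # task definition file is described as an ASCII file.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="ascii")
    return len(lines)


# Hypothetical commands, used only for illustration.
example_tasks = ["sleep 10", "hostname", "echo done"]

with tempfile.TemporaryDirectory() as tmp:
    task_file = Path(tmp) / "tasks.txt"
    task_count = write_task_file(example_tasks, task_file)
    written = task_file.read_text(encoding="ascii")
```

Because dependencies between tasks are not supported and tasks may run in any order, the order of the lines in such a file should not be relied on.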

Session and task accounting

Jobs corresponding to the Session Scheduler session have one record in lsb.acct. This record represents the aggregate resource usage of all tasks in the allocation.

If task accounting is enabled with SSCHED_ACCT_DIR in lsb.params, Session Scheduler creates a task accounting file for each Session Scheduler session job and appends accounting records to the end of the file. Each record follows a format similar to that of the LSF accounting file lsb.acct, but with additional fields.

The accounting file is named jobID.ssched.acct. If no directory is specified, accounting records are not written.

The Session Scheduler accounting directory must be accessible and writable from all hosts in the cluster. Each Session Scheduler session (each ssched instance) creates one accounting file. Each file contains one accounting entry for each task. Each completed task index has one line in the file. Each line records the resource usage of one task.
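
The naming rule and the one-line-per-task layout can be sketched as follows. This is a minimal illustration only: the function names, the job ID 1234, and the placeholder file contents are hypothetical, and the directory passed in stands for whatever SSCHED_ACCT_DIR is set to.

```python
import tempfile
from pathlib import Path


def acct_file_path(acct_dir, job_id):
    """Build the accounting file path: <acct_dir>/<jobID>.ssched.acct."""
    return Path(acct_dir) / f"{job_id}.ssched.acct"


def count_task_records(acct_dir, job_id):
    """Count non-empty lines, on the basis that each completed task has one line."""
    path = acct_file_path(acct_dir, job_id)
    if not path.exists():
        return 0  # no file, for example when task accounting is not enabled
    with path.open(encoding="utf-8", errors="replace") as fh:
        return sum(1 for line in fh if line.strip())


# Demonstration with a temporary directory standing in for SSCHED_ACCT_DIR.
with tempfile.TemporaryDirectory() as acct_dir:
    demo_path = acct_file_path(acct_dir, 1234)
    # Placeholder lines, not real accounting records.
    demo_path.write_text("record for task 1\nrecord for task 2\n", encoding="utf-8")
    demo_name = demo_path.name
    demo_count = count_task_records(acct_dir, 1234)
    missing_count = count_task_records(acct_dir, 9999)
```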

Task accounting file format

Task accounting records have a format similar to that of the lsb.acct JOB_FINISH event record. See the Platform LSF Configuration Reference for more information about JOB_FINISH event fields.

Field                     Description
Event type (%s)           TASK_FINISH
Version number (%s)       9.1.2
Event time (%d)           Time the event was logged (in seconds since the epoch)
jobId (%d)                ID for the job
userId (%d)               UNIX user ID of the submitter
options (%d)              Always 0
numProcessors (%d)        Always 1
submitTime (%d)           Task enqueue time
beginTime (%d)            Always 0
termTime (%d)             Always 0
startTime (%d)            Task start time
userName (%s)             User name of the submitter
queue (%s)                Always empty
resReq (%s)               Always empty
dependCond (%s)           Always empty
preExecCmd (%s)           Task pre-execution command
fromHost (%s)             Submission host name
cwd (%s)                  Execution host current working directory (up to 4094 characters)
inFile (%s)               Task input file name (up to 4094 characters)
outFile (%s)              Task output file name (up to 4094 characters)
errFile (%s)              Task error output file name (up to 4094 characters)
jobFile (%s)              Task script file name
numAskedHosts (%d)        Always 0
askedHosts (%s)           Always empty
numExHosts (%d)           Always 1
execHosts (%s)            Name of task execution host
jStatus (%d)              64 indicates that the task completed normally; 32 indicates that the task exited abnormally
hostFactor (%f)           CPU factor of the task execution host
jobName (%s)              Always empty
command (%s)              Complete batch task command specified by the user (up to 4094 characters)
lsfRusage (%f)            All rusage fields contain resource usage information for the task
mailUser (%s)             Always empty
projectName (%s)          Always empty
exitStatus (%d)           UNIX exit status of the task
maxNumProcessors (%d)     Always 1
loginShell (%s)           Always empty
timeEvent (%s)            Always empty
idx (%d)                  Session job index
maxRMem (%d)              Always 0
maxRSwap (%d)             Always 0
inFileSpool (%s)          Always empty
commandSpool (%s)         Always empty
rsvId (%s)                Always empty
sla (%s)                  Always empty
exceptMask (%d)           Always 0
additionalInfo (%s)       Always empty
exitInfo (%d)             Always 0
warningAction (%s)        Always empty
warningTimePeriod (%d)    Always 0
chargedSAAP (%s)          Always empty
licenseProject (%s)       Always empty
options3 (%d)             Always 0
app (%s)                  Always empty
taskID (%d)               Task ID
taskIdx (%d)              Task index
taskName (%s)             Task name
taskOptions (%d)          Bit mask of task options:
                            • TASK_IN_FILE (0x01): specify input file
                            • TASK_OUT_FILE (0x02): specify output file
                            • TASK_ERR_FILE (0x04): specify error file
                            • TASK_PRE_EXEC (0x08): specify pre-exec command
                            • TASK_POST_EXEC (0x10): specify post-exec command
                            • TASK_NAME (0x20): specify task name
taskExitReason (%d)       Task exit reason:
                            • TASK_EXIT_NORMAL (0): normal exit
                            • TASK_EXIT_INIT (1): generic task initialization failure
                            • TASK_EXIT_PATH (2): failed to initialize path
                            • TASK_EXIT_NO_FILE (3): failed to create task file
                            • TASK_EXIT_PRE_EXEC (4): task pre-exec failed
                            • TASK_EXIT_NO_PROCESS (5): fork failed
                            • TASK_EXIT_XDR (6): xdr communication error
                            • TASK_EXIT_NOMEM (7): no memory
                            • TASK_EXIT_SYS (8): system call failed
                            • TASK_EXIT_TSCHILD_EXEC (9): failed to run sschild
                            • TASK_EXIT_RUNLIMIT (10): task reaches run limit
                            • TASK_EXIT_IO (11): I/O failure
                            • TASK_EXIT_RSRC_LIMIT (12): set task resource limit failed
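
The coded fields above can be decoded with a short script. The following is a minimal sketch, not a reference parser. It assumes that records are space-separated with string fields in double quotes, as in lsb.acct, and it reads only the first twelve fields, because later positions depend on variable-length fields such as the rusage values. The class and function names are hypothetical, and the sample record is made up for illustration.

```python
import shlex
from enum import IntEnum, IntFlag


class TaskOption(IntFlag):
    """Bits of the taskOptions field."""
    IN_FILE = 0x01
    OUT_FILE = 0x02
    ERR_FILE = 0x04
    PRE_EXEC = 0x08
    POST_EXEC = 0x10
    NAME = 0x20


class TaskExitReason(IntEnum):
    """Values of the taskExitReason field."""
    NORMAL = 0
    INIT = 1
    PATH = 2
    NO_FILE = 3
    PRE_EXEC = 4
    NO_PROCESS = 5
    XDR = 6
    NOMEM = 7
    SYS = 8
    TSCHILD_EXEC = 9
    RUNLIMIT = 10
    IO = 11
    RSRC_LIMIT = 12


# jStatus values listed in the field table.
JSTATUS = {64: "completed normally", 32: "exited abnormally"}

# Leading fields in table order, each with the converter for its type.
LEADING_FIELDS = [
    ("eventType", str), ("version", str), ("eventTime", int),
    ("jobId", int), ("userId", int), ("options", int),
    ("numProcessors", int), ("submitTime", int), ("beginTime", int),
    ("termTime", int), ("startTime", int), ("userName", str),
]


def option_names(mask):
    """Return the names of the option bits set in a taskOptions value."""
    return [opt.name for opt in TaskOption if mask & opt]


def exit_reason_name(value):
    """Return the exit reason name, or a marker for values not in the table."""
    try:
        return TaskExitReason(value).name
    except ValueError:
        return f"UNKNOWN({value})"


def parse_leading_fields(line):
    """Parse the first twelve fields of a record; keep the rest as raw tokens."""
    tokens = shlex.split(line)  # assumes lsb.acct-style double-quoted strings
    if len(tokens) < len(LEADING_FIELDS):
        raise ValueError("record has fewer fields than expected")
    record = {name: conv(tok) for (name, conv), tok in zip(LEADING_FIELDS, tokens)}
    record["rest"] = tokens[len(LEADING_FIELDS):]
    return record


# Made-up record fragment, for illustration only.
sample = '"TASK_FINISH" "9.1.2" 1700000000 1234 501 0 1 1699999990 0 0 1699999995 "alice" "" "" ""'
rec = parse_leading_fields(sample)
queue_wait_seconds = rec["startTime"] - rec["submitTime"]
```

Under these assumptions, the difference between startTime and submitTime gives the time a task waited between enqueue and start.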