lsb.events

The LSF batch event log file lsb.events is used to display LSF batch event history and for mbatchd failure recovery.

Whenever a host, job, or queue changes status, a record is appended to the event log file. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid. See mbatchd(8) for the description of LSB_SHAREDIR.
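
The location rule above can be sketched in a few lines of Python. This is an illustration only: the helper name event_log_path and the example directory and cluster name are hypothetical, and in practice the values would come from lsf.conf and lsid.

```python
from pathlib import PurePosixPath


def event_log_path(lsb_sharedir: str, cluster_name: str) -> PurePosixPath:
    """Return LSB_SHAREDIR/cluster_name/logdir/lsb.events as a path object."""
    return PurePosixPath(lsb_sharedir) / cluster_name / "logdir" / "lsb.events"


# Hypothetical values, for illustration only.
path = event_log_path("/usr/share/lsf/work", "cluster1")
```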

The bhist command searches the most current lsb.events file for its output.

lsb.events structure

The event log file is an ASCII file with one record per line. For the lsb.events file, the first line has the format # history_seek_position, which indicates the file position of the first history event after a log switch. For the lsb.events.# file, the first line has the format # timestamp_most_recent_event, which gives the timestamp of the most recent event in the file.
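
A minimal sketch of reading that first line, assuming it consists of a "#" followed by a single integer with optional surrounding whitespace. The exact spacing is not specified here, so the hypothetical helper parse_header tolerates either form and raises ValueError for anything else.

```python
def parse_header(first_line: str) -> int:
    """Parse a '# <number>' header line and return the number.

    For lsb.events the number is the history seek position; for
    lsb.events.# it is the timestamp of the most recent event.
    """
    text = first_line.strip()
    if not text.startswith("#"):
        raise ValueError("not a header line: %r" % first_line)
    # int() raises ValueError if anything other than an integer follows '#'.
    return int(text[1:].strip())
```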

Limiting the size of lsb.events

Use MAX_JOB_NUM in lsb.params to set the maximum number of finished jobs whose events are to be stored in the lsb.events log file.

Once the limit is reached, mbatchd starts a new event log file. The old event log file is saved as lsb.events.n, with subsequent sequence number suffixes incremented by 1 each time a new log file is started. Event logging continues in the new lsb.events file.
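
When several event log files exist in logdir, a script usually needs them in suffix order. The sketch below only sorts names numerically by suffix, with the active lsb.events first; the helper name order_event_logs is hypothetical, and which end of the sequence holds the oldest events should be confirmed against the header timestamps.

```python
import re

# Matches "lsb.events" and "lsb.events.<n>", capturing the numeric suffix.
_EVENT_LOG = re.compile(r"^lsb\.events(?:\.(\d+))?$")


def order_event_logs(names):
    """Return event log names: lsb.events first, then suffixes in numeric order."""
    found = []
    for name in names:
        match = _EVENT_LOG.match(name)
        if match:
            suffix = int(match.group(1)) if match.group(1) else 0
            found.append((suffix, name))
    return [name for _, name in sorted(found)]
```

Numeric sorting matters because a plain string sort would place lsb.events.10 before lsb.events.2.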

Records and fields

The fields of a record are separated by blanks. The first string of an event record indicates its type. The following types of events are recorded:

  • JOB_NEW

  • JOB_FORWARD

  • JOB_ACCEPT

  • JOB_ACCEPTACK

  • JOB_CHKPNT

  • JOB_START

  • JOB_START_ACCEPT

  • JOB_STATUS

  • JOB_SWITCH

  • JOB_SWITCH2

  • JOB_MOVE

  • QUEUE_CTRL

  • HOST_CTRL

  • MBD_START

  • MBD_DIE

  • UNFULFILL

  • LOAD_INDEX

  • JOB_SIGACT

  • MIG

  • JOB_MODIFY2

  • JOB_SIGNAL

  • JOB_EXECUTE

  • JOB_REQUEUE

  • JOB_CLEAN

  • JOB_EXCEPTION

  • JOB_EXT_MSG

  • JOB_ATTA_DATA

  • JOB_CHUNK

  • SBD_UNREPORTED_STATUS

  • PRE_EXEC_START

  • JOB_FORCE

  • GRP_ADD

  • GRP_MOD

  • LOG_SWITCH

  • JOB_RESIZE_NOTIFY_START

  • JOB_RESIZE_NOTIFY_ACCEPT

  • JOB_RESIZE_NOTIFY_DONE

  • JOB_RESIZE_RELEASE

  • JOB_RESIZE_CANCEL

  • HOST_POWER_STATUS

  • JOB_PROV_HOST
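
As a rough illustration of the record layout described at the start of this section, the sketch below splits one line into its event type and remaining fields. It assumes that string (%s) fields are written in double quotes so that embedded blanks survive, which is not stated in this section; the sample record and the helper name split_record are hypothetical.

```python
import shlex


def split_record(line: str):
    """Split one event record into (event_type, remaining_fields).

    shlex.split handles double-quoted fields; all values are returned
    as strings and must be converted according to the field formats.
    """
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty record")
    return tokens[0], tokens[1:]


# Hypothetical record, for illustration only.
event_type, fields = split_record('"JOB_CLEAN" "9.1" 1400000000 101 0')
```

shlex also interprets backslashes and single quotes, so a production parser would likely need its own tokenizer.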

JOB_NEW

A new job has been submitted. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

userId (%d)

UNIX user ID of the submitter

options (%d)

Bit flags for job processing

numProcessors (%d)

Number of processors requested for execution

submitTime (%d)

Job submission time

beginTime (%d)

Start time – the job should be started on or after this time

termTime (%d)

Termination deadline – the job should be terminated by this time

sigValue (%d)

Signal value

chkpntPeriod (%d)

Checkpointing period

restartPid (%d)

Restart process ID

userName (%s)

User name

rLimits

Soft CPU time limit (%d), see getrlimit(2)

rLimits

Soft file size limit (%d), see getrlimit(2)

rLimits

Soft data segment size limit (%d), see getrlimit(2)

rLimits

Soft stack segment size limit (%d), see getrlimit(2)

rLimits

Soft core file size limit (%d), see getrlimit(2)

rLimits

Soft memory size limit (%d), see getrlimit(2)

rLimits

Reserved (%d)

rLimits

Reserved (%d)

rLimits

Reserved (%d)

rLimits

Soft run time limit (%d), see getrlimit(2)

rLimits

Reserved (%d)

hostSpec (%s)

Model or host name for normalizing CPU time and run time

hostFactor (%f)

CPU factor of the above host

umask (%d)

File creation mask for this job

queue (%s)

Name of job queue to which the job was submitted

resReq (%s)

Resource requirements

fromHost (%s)

Submission host name

cwd (%s)

Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)

chkpntDir (%s)

Checkpoint directory

inFile (%s)

Input file name (up to 4094 characters for UNIX or 255 characters for Windows)

outFile (%s)

Output file name (up to 4094 characters for UNIX or 255 characters for Windows)

errFile (%s)

Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)

subHomeDir (%s)

Submitter’s home directory

jobFile (%s)

Job file name

numAskedHosts (%d)

Number of candidate host names

askedHosts (%s)

List of names of candidate hosts for job dispatching

dependCond (%s)

Job dependency condition

preExecCmd (%s)

Job pre-execution command

jobName (%s)

Job name (up to 4094 characters)

command (%s)

Job command (up to 4094 characters for UNIX or 255 characters for Windows)

nxf (%d)

Number of files to transfer

xf (%s)

List of file transfer specifications

mailUser (%s)

Mail user name

projectName (%s)

Project name

niosPort (%d)

Callback port if batch interactive job

maxNumProcessors (%d)

Maximum number of processors

schedHostType (%s)

Execution host type

loginShell (%s)

Login shell

timeEvent (%d)

Time Event, for job dependency condition; specifies when time event ended

userGroup (%s)

User group

exceptList (%s)

Exception handlers for the job

options2 (%d)

Bit flags for job processing

idx (%d)

Job array index

inFileSpool (%s)

Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)

commandSpool (%s)

Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)

jobSpoolDir (%s)

Job spool directory (up to 4094 characters for UNIX or 255 characters for Windows)

userPriority (%d)

User priority

rsvId (%s)

Advance reservation ID; for example, "user2#0"

jobGroup (%s)

The job group under which the job runs

sla (%s)

SLA service class name under which the job runs

rLimits

Thread number limit

extsched (%s)

External scheduling options

warningAction (%s)

Job warning action

warningTimePeriod (%d)

Job warning time period in seconds

SLArunLimit (%d)

Absolute run time limit of the job for SLA service classes

licenseProject (%s)

IBM Platform License Scheduler project name

options3 (%d)

Bit flags for job processing

app (%s)

Application profile name

postExecCmd (%s)

Post-execution command to run on the execution host after the job finishes

runtimeEstimation (%d)

Estimated run time for the job

requeueEValues (%s)

Job exit values for automatic job requeue

resizeNotifyCmd (%s)

Resize notification command to run on the first execution host to inform job of a resize event.

jobDescription (%s)

Job description (up to 4094 characters).

submitEXT
Submission extension field, reserved for internal use.
Num (%d)

Number of elements (key-value pairs) in the structure.

key (%s)

Reserved for internal use.

value (%s)

Reserved for internal use.

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

network (%s)

Network requirements for IBM Parallel Environment (PE) jobs.

cpu_frequency (%d)

CPU frequency at which the job runs.

JOB_FORWARD

A job has been forwarded to a remote cluster (IBM Platform MultiCluster only).

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.

The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

numReserHosts (%d)

Number of reserved hosts in the remote cluster

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the reserHosts field.

cluster (%s)

Remote cluster name

reserHosts (%s)

List of names of the reserved hosts in the remote cluster

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

idx (%d)

Job array index

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

effectiveResReq (%s)

The runtime resource requirements used for the job.

JOB_ACCEPT

A job from a remote cluster has been accepted by this cluster. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID at the accepting cluster

remoteJid (%d)

Job ID at the submission cluster

cluster (%s)

Job submission cluster name

idx (%d)

Job array index

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_ACCEPTACK

Contains remote and local job ID mapping information. The default number for the ID is -1 (which means that this is not applicable to the job), and the default value for the cluster name is "" (empty string). The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

The ID number of the job at the execution cluster

idx (%d)

The job array index

jobRmtAttr (%d)

Remote job attributes from:

  • Remote batch job on the submission side

  • Lease job on the submission side

  • Remote batch job on the execution side

  • Lease job on the execution side

  • Lease job re-synchronization during restart

  • Remote batch job re-running on the execution cluster

srcCluster (%s)

The name of the submission cluster

srcJobId (%d)

The submission cluster job ID

dstCluster (%s)

The name of the execution cluster

dstJobId (%d)

The execution cluster job ID

JOB_CHKPNT

Contains job checkpoint information. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

The ID number of the job at the execution cluster

period (%d)

The new checkpointing period

jobPid (%d)

The process ID of the checkpointing process, which is a child sbatchd

ok (%d)
  • 0 means the checkpoint started
  • 1 means the checkpoint succeeded
flags (%d)

Checkpoint flags, see <lsf/lsbatch.h>:

  • LSB_CHKPNT_KILL: Kill the process if checkpoint is successful
  • LSB_CHKPNT_FORCE: Force checkpoint even if non-checkpointable conditions exist
  • LSB_CHKPNT_MIG: Checkpoint for the purpose of migration
idx (%d)

Job array index (must be 0 in JOB_NEW)

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_START

A job has been dispatched.

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.

The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

jStatus (%d)

Job status (4, indicating the RUN status of the job)

jobPid (%d)

Job process ID

jobPGid (%d)

Job process group ID

hostFactor (%f)

CPU factor of the first execution host

numExHosts (%d)

Number of processors used for execution

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.

execHosts (%s)

List of execution host names

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

queuePreCmd (%s)

Pre-execution command

queuePostCmd (%s)

Post-execution command

jFlags (%d)

Job processing flags

userGroup (%s)

User group name

idx (%d)

Job array index

additionalInfo (%s)

Placement information of HPC jobs

preemptBackfill (%d)
How long a backfilled job can run. Used for preemption backfill jobs.
jFlags2 (%d)
Job flags
srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

effectiveResReq (%s)

The runtime resource requirements used for the job.

num_network (%d)

The number of the allocated network for IBM Parallel Environment (PE) jobs.

networkID (%s)

Network ID of the allocated network for IBM Parallel Environment (PE) jobs.

num_window (%d)

Number of allocated windows for IBM Parallel Environment (PE) jobs.

cpu_frequency (%d)

CPU frequency at which the job runs.

JOB_START_ACCEPT

A job has started on the execution host(s). The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

jobPid (%d)

Job process ID

jobPGid (%d)

Job process group ID

idx (%d)

Job array index

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_STATUS

The status of a job changed after dispatch. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

jStatus (%d)

New status, see <lsf/lsbatch.h>

For JOB_STAT_EXIT (32) and JOB_STAT_DONE (64), host-based resource usage information is appended to the JOB_STATUS record in the fields numHostRusage and hostRusage.

reason (%d)

Pending or suspended reason code, see <lsf/lsbatch.h>

subreasons (%d)

Pending or suspended subreason code, see <lsf/lsbatch.h>

cpuTime (%f)

CPU time consumed so far

endTime (%d)

Job completion time

ru (%d)

Resource usage flag

lsfRusage (%s)

Resource usage statistics, see <lsf/lsf.h>

exitStatus (%d)

Exit status of the job, see <lsf/lsbatch.h>

idx (%d)

Job array index

exitInfo (%d)

Job termination reason, see <lsf/lsbatch.h>

duration4PreemptBackfill

How long a backfilled job can run. Used for preemption backfill jobs

numHostRusage(%d)

For a jStatus of JOB_STAT_EXIT (32) or JOB_STAT_DONE (64), this field contains the number of host-based resource usage entries (hostRusage) that follow. The value is 0 unless LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.

hostRusage
For a jStatus of JOB_STAT_EXIT (32) or JOB_STAT_DONE (64), these fields contain host-based resource usage information for parallel jobs when LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.
hostname (%s)

Name of the host.

mem(%d)

Total resident memory usage of all processes in the job running on this host.

swap(%d)

Total virtual memory usage of all processes in the job running on this host.

utime(%d)

User time used on this host.

stime(%d)

System time used on this host.

hHostExtendInfo(%d)

Number of following key-value pairs containing extended host information (PGIDs and PIDs). Set to 0 in lsb.events, lsb.acct, and lsb.stream files.

maxMem

Peak memory usage (in Mbytes)

avgMem

Average memory usage (in Mbytes)

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_SWITCH

A job switched from one queue to another (bswitch). The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

userId (%d)

UNIX user ID of the user invoking the command

jobId (%d)

Job ID

queue (%s)

Target queue name

idx (%d)

Job array index

userName (%s)

Name of the job submitter

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_SWITCH2

A job array switched from one queue to another (bswitch). The fields are:
Version number (%s)

The version number

Event time (%d)

The time of the event

userId (%d)

UNIX user ID of the user invoking the command

jobId (%d)

Job ID

queue (%s)

Target queue name

userName (%s)

Name of the job submitter

indexRangeCnt (%s)

The number of ranges indicating successfully switched elements

indexRangeStart1 (%d)

The start of the first index range

indexRangeEnd1 (%d)

The end of the first index range

indexRangeStep1 (%d)

The step of the first index range

indexRangeStart2 (%d)

The start of the second index range

indexRangeEnd2 (%d)

The end of the second index range

indexRangeStep2 (%d)

The step of the second index range

indexRangeStartN (%d)

The start of the last index range

indexRangeEndN (%d)

The end of the last index range

indexRangeStepN (%d)

The step of the last index range

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

rmtCluster (%d)

The destination cluster to which the remote jobs belong

rmtJobCtrlId (%d)

Unique identifier for the remote job control session in the MultiCluster.

numSuccJobId (%d)

The number of jobs that were successful during this remote control operation.

succJobIdArray (%d)

Contains IDs for all the jobs that were successful during this remote control operation.

numFailJobId (%d)

The number of jobs which failed during this remote control session.

failJobIdArray (%d)

Contains IDs for all the jobs that failed during this remote control operation.

failReason (%d)

Contains the failure code and reason for each failed job in the failJobIdArray.

To prevent JOB_SWITCH2 from getting too long, the number of index ranges is limited to 500 per JOB_SWITCH2 event log. Therefore, if switching a large job array, several JOB_SWITCH2 events may be generated.
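
The index range fields can be expanded into individual array indices along the following lines. This is a sketch under two assumptions that are not stated above: the fields have already been converted to integers, and each range end is inclusive. The helper name expand_index_ranges is hypothetical.

```python
def expand_index_ranges(fields):
    """Expand [count, start1, end1, step1, ..., startN, endN, stepN].

    Returns the list of job array indices covered by all ranges.
    Assumes inclusive range ends and positive steps.
    """
    count = fields[0]
    if len(fields) < 1 + 3 * count:
        raise ValueError("expected %d index ranges" % count)
    indices = []
    for i in range(count):
        start, end, step = fields[1 + 3 * i:4 + 3 * i]
        indices.extend(range(start, end + 1, step))
    return indices


# Two hypothetical ranges: 1..5 step 2, and 10..12 step 1.
switched = expand_index_ranges([2, 1, 5, 2, 10, 12, 1])
```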

JOB_MOVE

A job moved toward the top or bottom of its queue (bbot or btop). The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

userId (%d)

UNIX user ID of the user invoking the command

jobId (%d)

Job ID

position (%d)

Position number

base (%d)

Operation code (TO_TOP or TO_BOTTOM), see <lsf/lsbatch.h>

idx (%d)

Job array index

userName (%s)

Name of the job submitter

QUEUE_CTRL

A job queue has been altered. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

opCode (%d)

Operation code, see <lsf/lsbatch.h>

queue (%s)

Queue name

userId (%d)

UNIX user ID of the user invoking the command

userName (%s)

Name of the user

ctrlComments (%s)

Administrator comment text from the -C option of badmin queue control commands qclose, qopen, qact, and qinact

HOST_CTRL

A batch server host changed status. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

opCode (%d)

Operation code, see <lsf/lsbatch.h>

host (%s)

Host name

userId (%d)

UNIX user ID of the user invoking the command

userName (%s)

Name of the user

ctrlComments (%s)

Administrator comment text from the -C option of badmin host control commands hclose and hopen

MBD_START

The mbatchd has started. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

master (%s)

Master host name

cluster (%s)

Cluster name

numHosts (%d)

Number of hosts in the cluster

numQueues (%d)

Number of queues in the cluster

MBD_DIE

The mbatchd died. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

master (%s)

Master host name

numRemoveJobs (%d)

Number of finished jobs that have been removed from the system and logged in the current event file

exitCode (%d)

Exit code from mbatchd

ctrlComments (%s)

Administrator comment text from the -C option of badmin mbdrestart

UNFULFILL

Actions that were not taken because the mbatchd was unable to contact the sbatchd on the job execution host. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

notSwitched (%d)

Not switched: the mbatchd has switched the job to a new queue, but the sbatchd has not been informed of the switch

sig (%d)

Signal: this signal has not been sent to the job

sig1 (%d)

Checkpoint signal: the job has not been sent this signal to checkpoint itself

sig1Flags (%d)

Checkpoint flags, see <lsf/lsbatch.h>

chkPeriod (%d)

New checkpoint period for job

notModified (%s)

If set to true, then parameters for the job cannot be modified.

idx (%d)

Job array index

LOAD_INDEX

mbatchd restarted with these load index names (see lsf.cluster(5)). The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

nIdx (%d)

Number of index names

name (%s)

List of index names
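
Several records pair a count field with a list of that many values (nIdx and name here, numAskedHosts and askedHosts elsewhere). A minimal sketch of reading such a count-prefixed list from an already tokenized LOAD_INDEX record follows; the helper name parse_load_index, the dictionary keys, and the sample values are hypothetical.

```python
def parse_load_index(fields):
    """Parse the fields that follow the LOAD_INDEX event type.

    fields: [version, event_time, nIdx, name1, ..., nameN] as strings.
    """
    version = fields[0]
    event_time = int(fields[1])
    n_idx = int(fields[2])
    names = fields[3:3 + n_idx]
    if len(names) != n_idx:
        raise ValueError("expected %d index names, got %d" % (n_idx, len(names)))
    return {"version": version, "eventTime": event_time, "names": names}


record = parse_load_index(["9.1", "1400000000", "3", "r15s", "r1m", "ut"])
```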

JOB_SIGACT

An action on a job has been taken. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

period (%d)

Action period

pid (%d)

Process ID of the child sbatchd that initiated the action

jstatus (%d)

Job status

reasons (%d)

Job pending reasons

flags (%d)

Action flags, see <lsf/lsbatch.h>

actStatus (%d)

Action status:

1: Action started

2: One action preempted other actions

3: Action succeeded

4: Action Failed

signalSymbol (%s)

Action name, accompanied by actFlags

idx (%d)

Job array index
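
For reporting, the actStatus codes listed above can be turned into text with a small lookup table. The helper name describe_act_status is hypothetical, and the fallback for codes outside 1 to 4 is an assumption of this sketch.

```python
# actStatus codes as listed for the JOB_SIGACT record.
ACT_STATUS = {
    1: "Action started",
    2: "One action preempted other actions",
    3: "Action succeeded",
    4: "Action failed",
}


def describe_act_status(code: int) -> str:
    """Return the description for an actStatus code, or a marker if unknown."""
    return ACT_STATUS.get(code, "unknown (%d)" % code)
```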

MIG

A job has been migrated (bmig). The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

numAskedHosts (%d)

Number of candidate hosts for migration

askedHosts (%s)

List of names of candidate hosts

userId (%d)

UNIX user ID of the user invoking the command

idx (%d)

Job array index

userName (%s)

Name of the job submitter

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_MODIFY2

This is created when the mbatchd modifies a previously submitted job with bmod. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobIdStr (%s)

Job ID

options (%d)

Bit flags for job modification options processing

options2 (%d)

Bit flags for job modification options processing

delOptions (%d)

Delete options for the options field

userId (%d)

UNIX user ID of the submitter

userName (%s)

User name

submitTime (%d)

Job submission time

umask (%d)

File creation mask for this job

numProcessors (%d)

Number of processors requested for execution. The value 2147483646 means the number of processors is undefined.

beginTime (%d)

Start time – the job should be started on or after this time

termTime (%d)

Termination deadline – the job should be terminated by this time

sigValue (%d)

Signal value

restartPid (%d)

Restart process ID for the original job

jobName (%s)

Job name (up to 4094 characters)

queue (%s)

Name of job queue to which the job was submitted

numAskedHosts (%d)

Number of candidate host names

askedHosts (%s)

List of names of candidate hosts for job dispatching; blank if the last field value is 0. If there is more than one host name, then each additional host name will be returned in its own field

resReq (%s)

Resource requirements

rLimits

Soft CPU time limit (%d), see getrlimit(2)

rLimits

Soft file size limit (%d), see getrlimit(2)

rLimits

Soft data segment size limit (%d), see getrlimit(2)

rLimits

Soft stack segment size limit (%d), see getrlimit(2)

rLimits

Soft core file size limit (%d), see getrlimit(2)

rLimits

Soft memory size limit (%d), see getrlimit(2)

rLimits

Reserved (%d)

rLimits

Reserved (%d)

rLimits

Reserved (%d)

rLimits

Soft run time limit (%d), see getrlimit(2)

rLimits

Reserved (%d)

hostSpec (%s)

Model or host name for normalizing CPU time and run time

dependCond (%s)

Job dependency condition

timeEvent (%d)

Time Event, for job dependency condition; specifies when time event ended

subHomeDir (%s)

Submitter’s home directory

inFile (%s)

Input file name (up to 4094 characters for UNIX or 255 characters for Windows)

outFile (%s)

Output file name (up to 4094 characters for UNIX or 255 characters for Windows)

errFile (%s)

Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)

command (%s)

Job command (up to 4094 characters for UNIX or 255 characters for Windows)

chkpntPeriod (%d)

Checkpointing period

chkpntDir (%s)

Checkpoint directory

nxf (%d)

Number of files to transfer

xf (%s)

List of file transfer specifications

jobFile (%s)

Job file name

fromHost (%s)

Submission host name

cwd (%s)

Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)

preExecCmd (%s)

Job pre-execution command

mailUser (%s)

Mail user name

projectName (%s)

Project name

niosPort (%d)

Callback port if batch interactive job

maxNumProcessors (%d)

Maximum number of processors. The value 2147483646 means the maximum number of processors is undefined.

loginShell (%s)

Login shell

schedHostType (%s)

Execution host type

userGroup (%s)

User group

exceptList (%s)

Exception handlers for the job

delOptions2 (%d)

Delete options for the options2 field

inFileSpool (%s)

Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)

commandSpool (%s)

Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)

userPriority (%d)

User priority

rsvId (%s)

Advance reservation ID; for example, "user2#0"

extsched (%s)

External scheduling options

warningTimePeriod (%d)

Job warning time period in seconds

warningAction (%s)

Job warning action

jobGroup (%s)

The job group to which the job is attached

sla (%s)

SLA service class name that the job is to be attached to

licenseProject (%s)

IBM Platform License Scheduler project name

options3 (%d)

Bit flags for job processing

delOption3 (%d)

Delete options for the options3 field

app (%s)

Application profile name

apsString (%s)

Absolute priority scheduling (APS) value set by administrator

postExecCmd (%s)

Post-execution command to run on the execution host after the job finishes

runtimeEstimation (%d)

Estimated run time for the job

requeueEValues (%s)

Job exit values for automatic job requeue

resizeNotifyCmd (%s)

Resize notification command to run on the first execution host to inform job of a resize event.

jobdescription (%s)

Job description (up to 4094 characters).

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

network (%s)

Network requirements for IBM Parallel Environment (PE) jobs.

cpu_frequency (%d)

CPU frequency at which the job runs.
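
In JOB_MODIFY2 records, numProcessors and maxNumProcessors use the value 2147483646 to mean "undefined". A reader of the log may want to map that sentinel to an explicit missing value, as in this sketch; the constant and helper names are hypothetical.

```python
# Sentinel documented for numProcessors and maxNumProcessors in JOB_MODIFY2.
UNDEFINED_PROCESSORS = 2147483646


def processors_or_none(value: int):
    """Return None when the field holds the 'undefined' sentinel."""
    return None if value == UNDEFINED_PROCESSORS else value
```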

JOB_SIGNAL

This is created when a job is signaled with bkill or deleted with bdel. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

userId (%d)

UNIX user ID of the user invoking the command

runCount (%d)

Number of runs

signalSymbol (%s)

Signal name

idx (%d)

Job array index

userName (%s)

Name of the job submitter

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_EXECUTE

This is created when a job is actually running on an execution host. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

execUid (%d)

Mapped UNIX user ID on execution host

jobPGid (%d)

Job process group ID

execCwd (%s)

Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)

execHome (%s)

Home directory job used on execution host

execUsername (%s)

Mapped user name on execution host

jobPid (%d)

Job process ID

idx (%d)

Job array index

additionalInfo (%s)

Placement information of HPC jobs

SLAscaledRunLimit (%d)

Run time limit for the job scaled by the execution host

execRusage

An internal field used by LSF.

Position

An internal field used by LSF.

duration4PreemptBackfill

How long a backfilled job can run; used for preemption backfill jobs

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

JOB_REQUEUE

This is created when a job has ended and been requeued by mbatchd. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

idx (%d)

Job array index

JOB_CLEAN

This is created when a job is removed from the mbatchd memory. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

idx (%d)

Job array index
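
Short records such as JOB_CLEAN map naturally onto a small typed structure. The sketch below converts the four fields in their documented order; the class and function names are hypothetical, and the input is assumed to be the token list that follows the event type.

```python
from typing import NamedTuple


class JobClean(NamedTuple):
    version: str      # Version number (%s)
    event_time: int   # Event time (%d)
    job_id: int       # jobId (%d)
    idx: int          # idx (%d), job array index


def parse_job_clean(fields):
    """Build a JobClean from the fields that follow the event type."""
    if len(fields) < 4:
        raise ValueError("JOB_CLEAN needs 4 fields, got %d" % len(fields))
    return JobClean(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))


# Hypothetical values, for illustration only.
rec = parse_job_clean(["9.1", "1400000000", "101", "0"])
```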

JOB_EXCEPTION

This is created when an exception condition is detected for a job. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

exceptMask (%d)

Exception Id

0x01: missched

0x02: overrun

0x04: underrun

0x08: abend

0x10: cantrun

0x20: hostfail

0x40: startfail

0x100: runtime_est_exceeded

actMask (%d)

Action Id

0x01: kill

0x02: alarm

0x04: rerun

0x08: setexcept

timeEvent (%d)

Time event; for the missched exception, specifies when the time event ended.

exceptInfo (%d)

Exception information: the pending reason for a missched or cantrun exception, the exit code of the job for an abend exception, otherwise 0.

idx (%d)

Job array index
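
Because exceptMask and actMask are bit masks, a single value can carry several exceptions or actions at once. The sketch below decodes a mask into the names listed above; the bit values come from this section, while the table and function names are hypothetical.

```python
# Bit values as listed for exceptMask and actMask.
EXCEPT_BITS = {
    0x01: "missched",
    0x02: "overrun",
    0x04: "underrun",
    0x08: "abend",
    0x10: "cantrun",
    0x20: "hostfail",
    0x40: "startfail",
    0x100: "runtime_est_exceeded",
}
ACT_BITS = {0x01: "kill", 0x02: "alarm", 0x04: "rerun", 0x08: "setexcept"}


def decode_mask(mask: int, table: dict) -> list:
    """Return the names of all bits from table that are set in mask."""
    return [name for bit, name in sorted(table.items()) if mask & bit]


# 0x0A has the overrun (0x02) and abend (0x08) bits set.
names = decode_mask(0x0A, EXCEPT_BITS)
```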

JOB_EXT_MSG

An external message has been sent to a job. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

idx (%d)

Job array index

msgIdx (%d)

Index in the list

userId (%d)

Unique user ID of the user invoking the command

dataSize (%ld)

Size of the data if it has any, otherwise 0

postTime (%ld)

Message sending time

dataStatus (%d)

Status of the attached data

desc (%s)

Text description of the message

userName (%s)

Name of the author of the message

Flags (%d)

Used for internal flow control

JOB_ATTA_DATA

An update on the data status of a message for a job has been sent. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

idx (%d)

Job array index

msgIdx (%d)

Index in the list

dataSize (%ld)

Size of the data if it has any, otherwise 0

dataStatus (%d)

Status of the attached data

fileName (%s)

File name of the attached data

JOB_CHUNK

This is created when a job is inserted into a chunk.

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.

The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

membSize (%ld)

Size of array membJobId

membJobId (%ld)

Job IDs of jobs in the chunk

numExHosts (%ld)

Number of execution hosts

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.

execHosts (%s)

Execution host name array

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

SBD_UNREPORTED_STATUS

This is created when an unreported status change occurs. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

actPid (%d)

Acting process ID

jobPid (%d)

Job process ID

jobPGid (%d)

Job process group ID

newStatus (%d)

New status of the job

reason (%d)

Pending or suspending reason code, see <lsf/lsbatch.h>

suspreason (%d)

Pending or suspending subreason code, see <lsf/lsbatch.h>

lsfRusage
The following fields contain resource usage information for the job (see getrusage(2)). If the value of a field is unavailable (because the job has exited or because of differences among operating systems), -1 is logged. Times are measured in seconds, and sizes are measured in KB.
ru_utime (%f)

User time used

ru_stime (%f)

System time used

ru_maxrss (%f)

Maximum shared text size

ru_ixrss (%f)

Integral of the shared text size over time (in KB seconds)

ru_ismrss (%f)

Integral of the shared memory size over time (valid only on Ultrix)

ru_idrss (%f)

Integral of the unshared data size over time

ru_isrss (%f)

Integral of the unshared stack size over time

ru_minflt (%f)

Number of page reclaims

ru_majflt (%f)

Number of page faults

ru_nswap (%f)

Number of times the process was swapped out

ru_inblock (%f)

Number of block input operations

ru_oublock (%f)

Number of block output operations

ru_ioch (%f)

Number of characters read and written (valid only on HP-UX)

ru_msgsnd (%f)

Number of System V IPC messages sent

ru_msgrcv (%f)

Number of messages received

ru_nsignals (%f)

Number of signals received

ru_nvcsw (%f)

Number of voluntary context switches

ru_nivcsw (%f)

Number of involuntary context switches

ru_exutime (%f)

Exact user time used (valid only on ConvexOS)

exitStatus (%d)

Exit status of the job, see <lsf/lsbatch.h>

execCwd (%s)

Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)

execHome (%s)

Home directory job used on execution host

execUsername (%s)

Mapped user name on execution host

msgId (%d)

ID of the message

actStatus (%d)

Action status

1: Action started

2: One action preempted other actions

3: Action succeeded

4: Action failed

sigValue (%d)

Signal value

seq (%d)

Sequence status of the job

idx (%d)

Job array index

jRusage
The following fields contain resource usage information for the job. If the value of a field is unavailable (because the job has exited or because of differences among operating systems), -1 is logged. Times are measured in seconds, and sizes are measured in KB.
mem (%d)

Total resident memory usage in KB of all currently running processes in a given process group

swap (%d)

Total virtual memory usage in KB of all currently running processes in given process groups

utime (%d)

Cumulative total user time in seconds

stime (%d)

Cumulative total system time in seconds

npids (%d)

Number of currently active processes in given process groups. This entry has four sub-fields:

pid (%d)

Process ID of the child sbatchd that initiated the action

ppid (%d)

Parent process ID

pgid (%d)

Process group ID

jobId (%d)

Process Job ID

npgids (%d)

Number of currently active process groups

exitInfo (%d)

Job termination reason, see <lsf/lsbatch.h>
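
The lsfRusage block above logs -1 for any value that is unavailable. The following Python sketch shows one way a reader of the log might turn those 19 values into a dictionary in which unavailable values become None rather than being mistaken for real measurements. The function name parse_lsf_rusage and the sample values are hypothetical. The sketch assumes the caller has already isolated the 19 rusage tokens from the rest of the record.

```python
# Names of the lsfRusage fields, in the order of occurrence listed above.
RUSAGE_FIELDS = [
    "ru_utime", "ru_stime", "ru_maxrss", "ru_ixrss", "ru_ismrss",
    "ru_idrss", "ru_isrss", "ru_minflt", "ru_majflt", "ru_nswap",
    "ru_inblock", "ru_oublock", "ru_ioch", "ru_msgsnd", "ru_msgrcv",
    "ru_nsignals", "ru_nvcsw", "ru_nivcsw", "ru_exutime",
]

def parse_lsf_rusage(tokens):
    """Map 19 rusage tokens to field names; -1 (unavailable) becomes None."""
    if len(tokens) != len(RUSAGE_FIELDS):
        raise ValueError("expected %d rusage values, got %d"
                         % (len(RUSAGE_FIELDS), len(tokens)))
    usage = {}
    for name, raw in zip(RUSAGE_FIELDS, tokens):
        value = float(raw)  # all lsfRusage fields are logged with %f
        usage[name] = None if value == -1 else value
    return usage

# Hypothetical values, made up for illustration only.
sample = "0.25 0.10 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 12 3 -1".split()
usage = parse_lsf_rusage(sample)
print(usage["ru_utime"], usage["ru_maxrss"], usage["ru_nvcsw"])
```

Keeping None distinct from 0 matters when aggregating: a sum or average over jobs should skip unavailable values instead of counting them as -1.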

PRE_EXEC_START

A pre-execution command has been started.

The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

jStatus (%d)

Job status (4, indicating the RUN status of the job)

jobPid (%d)

Job process ID

jobPGid (%d)

Job process group ID

hostFactor (%f)

CPU factor of the first execution host

numExHosts (%d)

Number of processors used for execution

execHosts (%s)

List of execution host names

queuePreCmd (%s)

Pre-execution command

queuePostCmd (%s)

Post-execution command

jFlags (%d)

Job processing flags

userGroup (%s)

User group name

idx (%d)

Job array index

additionalInfo (%s)

Placement information of HPC jobs

effectiveResReq (%s)

The runtime resource requirements used for the job.

JOB_FORCE

A job has been forced to run with brun. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

userId (%d)

UNIX user ID of the user invoking the command

idx (%d)

Job array index

options (%d)

Bit flags for job processing

numExecHosts (%ld)

Number of execution hosts

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.

execHosts (%s)

Execution host name array

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

userName (%s)

Name of the user

queue (%s)

Name of queue if a remote brun job ran; otherwise, this field is empty. For MultiCluster this is the name of the receive queue at the execution cluster.

GRP_ADD

This is created when a job group is added. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

userId (%d)

UNIX user ID of the job group owner

submitTime (%d)

Job submission time

userName (%s)

User name of the job group owner

depCond (%s)

Job dependency condition

timeEvent (%d)

Time Event, for job dependency condition; specifies when time event ended

groupSpec (%s)

Job group name

delOptions (%d)

Delete options for the options field

delOptions2 (%d)

Delete options for the options2 field

sla (%s)

SLA service class name that the job group is to be attached to

maxJLimit (%d)

Job group limit set by bgadd -L

groupType (%d)
Job group creation method:
  • 0x01 - job group was created explicitly

  • 0x02 - job group was created implicitly
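
The groupType values above can be decoded with a small lookup. The Python sketch below is illustrative only. It treats 0x01 and 0x02 as bit values because they are written in hexadecimal, which is an assumption; the function name describe_group_type is hypothetical.

```python
# Creation-method values for groupType, as listed above.
GROUP_TYPE_BITS = {
    0x01: "created explicitly",
    0x02: "created implicitly",
}

def describe_group_type(value):
    """Return the labels of all known bits set in a groupType value."""
    # Assumption: the values are bit flags, so each bit is tested separately.
    labels = [label for bit, label in GROUP_TYPE_BITS.items() if value & bit]
    return labels if labels else ["unknown"]

print(describe_group_type(0x01))
print(describe_group_type(0x02))
```

A value with no known bit set is reported as unknown rather than raising an error, so that records written by a newer version can still be read.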

GRP_MOD

This is created when a job group is modified. The fields in order of occurrence are:
Version number (%s)

The version number

Event time (%d)

The time of the event

userId (%d)

UNIX user ID of the job group owner

submitTime (%d)

Job submission time

userName (%s)

User name of the job group owner

depCond (%s)

Job dependency condition

timeEvent (%d)

Time Event, for job dependency condition; specifies when time event ended

groupSpec (%s)

Job group name

delOptions (%d)

Delete options for the options field

delOptions2 (%d)

Delete options for the options2 field

sla (%s)

SLA service class name that the job group is to be attached to

maxJLimit (%d)

Job group limit set by bgmod -L

LOG_SWITCH

This is created when switching the event file lsb.events. The fields in order of occurrence are:

Version number (%s)

The version number

Event time (%d)

The time of the event

jobId (%d)

Job ID

JOB_RESIZE_NOTIFY_START

LSF logs this event when a resize (shrink or grow) request has been sent to the first execution host. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

jobId (%d)

The job ID.

idx (%d)

Job array index.

notifyId (%d)

Identifier or handle for notification.

numResizeHosts (%d)

Number of processors used for execution. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.

resizeHosts (%s)

List of execution host names. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

JOB_RESIZE_NOTIFY_ACCEPT

LSF logs this event when a resize request has been accepted from the first execution host of a job. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

jobId (%d)

The job ID.

idx (%d)

Job array index.

notifyId (%d)

Identifier or handle for notification.

resizeNotifyCmdPid (%d)

Resize notification executable process ID. If no resize notification executable is defined, this field will be set to 0.

resizeNotifyCmdPGid (%d)

Resize notification executable process group ID. If no resize notification executable is defined, this field will be set to 0.

status (%d)

Status field used to indicate possible errors: 0 indicates success, 1 indicates failure.

JOB_RESIZE_NOTIFY_DONE

LSF logs this event when the resize notification command completes. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

jobId (%d)

The job ID.

idx (%d)

Job array index.

notifyId (%d)

Identifier or handle for notification.

status (%d)

Resize notification exit value: 0 indicates success, 1 indicates failure, and 2 indicates failure with the request canceled.

JOB_RESIZE_RELEASE

LSF logs this event when it receives a resource release request from a client. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

jobId (%d)

The job ID.

idx (%d)

Job array index.

reqid (%d)

Request Identifier or handle.

options (%d)

Release options.

userId (%d)

UNIX user ID of the user invoking the command.

userName (%s)

User name of the submitter.

resizeNotifyCmd (%s)

Resize notification command to run on the first execution host to inform job of a resize event.

numResizeHosts (%d)

Number of processors used for execution during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.

resizeHosts (%s)

List of execution host names during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

JOB_RESIZE_CANCEL

LSF logs this event when it receives a cancel request from a client. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

jobId (%d)

The job ID.

idx (%d)

Job array index.

userId (%d)

UNIX user ID of the user invoking the command.

userName (%s)

User name of the submitter.

HOST_POWER_STATUS

LSF logs this event when the power status of a host changes, whether the change is triggered by a power policy, by a job, or by the command badmin hpower. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

Request Id (%d)

The power operation request ID to identify a power operation.

Op Code (%d)

Power operation type.

Trigger (%d)

The power operation trigger: power policy, job, or badmin hpower.

Status (%d)

The power operation status.

Trigger Name (%s)

If the operation is triggered by power policy, this is the power policy name. If the operation is triggered by an administrator, this is the administrator user name.

Number (%d)

Number of hosts on which the power operation occurred.

Hosts (%s)

The hosts on which the power operation occurred.

JOB_PROV_HOST

When a job is dispatched to one or more power-saved hosts, it triggers a power state change for those hosts and the job enters the PROV state. This event logs those PROV cases. The fields in order of occurrence are:
Version number (%s)

The version number.

Event time (%d)

The time of the event.

jobId (%d)

The job ID.

idx (%d)

Job array index.

status (%d)

Indicates whether the provisioning has started, is done, or has failed.

num (%d)

Number of hosts that need to be provisioned.

hostNameList (%d)

Names of hosts that need to be provisioned.

hostStatusList (%d)

Host status for provisioning result.
