The LSF batch event log file lsb.events is used to display LSF batch event history and for mbatchd failure recovery.
Whenever a host, job, or queue changes status, a record is appended to the event log file. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid. See mbatchd(8) for the description of LSB_SHAREDIR.
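As an illustration of the location rule above, the path of the current event log can be assembled from the two values. This is a minimal sketch: the function name and the sample directory and cluster name are hypothetical, and the values of LSB_SHAREDIR and the cluster name are assumed to have been read already from lsf.conf and lsid.

```python
import posixpath


def event_log_path(lsb_sharedir: str, cluster_name: str) -> str:
    """Build LSB_SHAREDIR/cluster_name/logdir/lsb.events (UNIX-style path)."""
    return posixpath.join(lsb_sharedir, cluster_name, "logdir", "lsb.events")


# Hypothetical values, for illustration only.
path = event_log_path("/usr/share/lsf/work", "cluster1")
```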
The bhist command searches the most recent lsb.events file for its output.
The event log file is an ASCII file with one record per line. For the lsb.events file, the first line has the format # history_seek_position, which indicates the file position of the first history event after a log switch. For the lsb.events.# file, the first line has the format # timestamp_most_recent_event, which gives the timestamp of the most recent event in the file.
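The first-line convention can be read with a few lines of code. The sketch below assumes the value after the # is a plain decimal integer and that white space around it is optional; the function name is hypothetical. The same function serves both file kinds, and the caller decides whether the number is a seek position (lsb.events) or a timestamp (lsb.events.#).

```python
def parse_header(first_line: str) -> int:
    """Return the integer that follows the leading '#' on the first line."""
    text = first_line.strip()
    if not text.startswith("#"):
        raise ValueError("not a header line: %r" % first_line)
    # Assumption: the remainder of the line is a decimal integer.
    return int(text[1:].strip())
```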
Use MAX_JOB_NUM in lsb.params to set the maximum number of finished jobs whose events are to be stored in the lsb.events log file.
Once the limit is reached, mbatchd starts a new event log file. The old event log file is saved as lsb.events.n, with subsequent sequence number suffixes incremented by 1 each time a new log file is started. Event logging continues in the new lsb.events file.
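When a script collects the saved log files, the numeric suffix should be compared as a number, because a plain string sort places lsb.events.10 before lsb.events.2. The following is a minimal sketch with a hypothetical helper name; it only orders the names and makes no claim about which suffix holds the newest events.

```python
import re

# Matches saved logs such as lsb.events.1, lsb.events.2, ...
_ARCHIVE = re.compile(r"^lsb\.events\.(\d+)$")


def archived_logs(names):
    """Return saved event log names ordered by their numeric suffix."""
    numbered = []
    for name in names:
        match = _ARCHIVE.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

The current lsb.events file and unrelated files in the same directory are ignored by the pattern.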
The fields of a record are separated by blanks. The first string of an event record indicates its type. The following types of events are recorded:
JOB_NEW
JOB_FORWARD
JOB_ACCEPT
JOB_ACCEPTACK
JOB_CHKPNT
JOB_START
JOB_START_ACCEPT
JOB_STATUS
JOB_SWITCH
JOB_SWITCH2
JOB_MOVE
QUEUE_CTRL
HOST_CTRL
MBD_START
MBD_DIE
UNFULFILL
LOAD_INDEX
JOB_SIGACT
MIG
JOB_MODIFY2
JOB_SIGNAL
JOB_EXECUTE
JOB_REQUEUE
JOB_CLEAN
JOB_EXCEPTION
JOB_EXT_MSG
JOB_ATTA_DATA
JOB_CHUNK
SBD_UNREPORTED_STATUS
PRE_EXEC_START
JOB_FORCE
GRP_ADD
GRP_MOD
LOG_SWITCH
JOB_RESIZE_NOTIFY_START
JOB_RESIZE_NOTIFY_ACCEPT
JOB_RESIZE_NOTIFY_DONE
JOB_RESIZE_RELEASE
JOB_RESIZE_CANCEL
HOST_POWER_STATUS
JOB_PROV_HOST
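A record can be split into its fields and its type read from the first field. The sketch below rests on an assumption that this text does not state: string fields are enclosed in double quotes, with a literal quote inside a string written as two quotes. The function names and the sample line are hypothetical, and the sample is not a complete record of any type.

```python
import re

# A token is either a double-quoted string (with "" as an embedded quote)
# or a run of non-blank characters. The quoting rule is an assumption.
_TOKEN = re.compile(r'"(?:[^"]|"")*"|\S+')


def split_record(line):
    """Split one event record into a list of field strings."""
    fields = []
    for token in _TOKEN.findall(line):
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            token = token[1:-1].replace('""', '"')
        fields.append(token)
    return fields


def event_type(line):
    """Return the first field of a record, or None for a blank line."""
    fields = split_record(line)
    return fields[0] if fields else None


# Hypothetical, truncated sample line.
sample = '"JOB_NEW" "10.1" 1700000000 101 "say ""hi"""'
fields = split_record(sample)
```

All fields are returned as strings; converting numeric fields is left to the caller, since the field order differs for each event type.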
The version number
The time of the event
Job ID
UNIX user ID of the submitter
Bit flags for job processing
Number of processors requested for execution
Job submission time
Start time – the job should be started on or after this time
Termination deadline – the job should be terminated by this time (%d)
Signal value
Checkpointing period
Restart process ID
User name
Soft CPU time limit (%d), see getrlimit(2)
Soft file size limit (%d), see getrlimit(2)
Soft data segment size limit (%d), see getrlimit(2)
Soft stack segment size limit (%d), see getrlimit(2)
Soft core file size limit (%d), see getrlimit(2)
Soft memory size limit (%d), see getrlimit(2)
Reserved (%d)
Reserved (%d)
Reserved (%d)
Soft run time limit (%d), see getrlimit(2)
Reserved (%d)
Model or host name for normalizing CPU time and run time
CPU factor of the above host
File creation mask for this job
Name of job queue to which the job was submitted
Resource requirements
Submission host name
Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)
Checkpoint directory
Input file name (up to 4094 characters for UNIX or 255 characters for Windows)
Output file name (up to 4094 characters for UNIX or 255 characters for Windows)
Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)
Submitter’s home directory
Job file name
Number of candidate host names
List of names of candidate hosts for job dispatching
Job dependency condition
Job pre-execution command
Job name (up to 4094 characters)
Job command (up to 4094 characters for UNIX or 255 characters for Windows)
Number of files to transfer (%d)
List of file transfer specifications
Mail user name
Project name
Callback port if batch interactive job
Maximum number of processors
Execution host type
Login shell
Time Event, for job dependency condition; specifies when time event ended
User group
Exception handlers for the job
Bit flags for job processing
Job array index
Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)
Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)
Job spool directory (up to 4094 characters for UNIX or 255 characters for Windows)
User priority
Advance reservation ID; for example, "user2#0"
The job group under which the job runs
SLA service class name under which the job runs
Thread number limit
External scheduling options
Job warning action
Job warning time period in seconds
Absolute run time limit of the job for SLA service classes
IBM Platform License Scheduler project name
Bit flags for job processing
Application profile name
Post-execution command to run on the execution host after the job finishes
Estimated run time for the job
Job exit values for automatic job requeue
Resize notification command to run on the first execution host to inform job of a resize event.
Job description (up to 4094 characters).
Number of elements (key-value pairs) in the structure.
Reserved for internal use.
Reserved for internal use.
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
Network requirements for IBM Parallel Environment (PE) jobs.
CPU frequency at which the job runs.
A job has been forwarded to a remote cluster (IBM Platform MultiCluster only).
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
The version number
The time of the event
Job ID
Number of reserved hosts in the remote cluster
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the reserHosts field.
Remote cluster name
List of names of the reserved hosts in the remote cluster
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
Job array index
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The runtime resource requirements used for the job.
The version number
The time of the event
Job ID at the accepting cluster
Job ID at the submission cluster
Job submission cluster name
Job array index
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
The ID number of the job at the execution cluster
The job array index
Remote job attributes from:
Remote batch job on the submission side
Lease job on the submission side
Remote batch job on the execution side
Lease job on the execution side
Lease job re-synchronization during restart
Remote batch job re-running on the execution cluster
The name of the submission cluster
The submission cluster job ID
The name of the execution cluster
The execution cluster job ID
The version number
The time of the event
The ID number of the job at the execution cluster
The new checkpointing period
The process ID of the checkpointing process, which is a child sbatchd
Checkpoint flags, see <lsf/lsbatch.h>
Job array index (must be 0 in JOB_NEW)
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
A job has been dispatched.
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
The version number
The time of the event
Job ID
Job status (4, indicating the RUN status of the job)
Job process ID
Job process group ID
CPU factor of the first execution host
Number of processors used for execution
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
List of execution host names
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
Pre-execution command
Post-execution command
Job processing flags
User group name
Job array index
Placement information of HPC jobs
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The runtime resource requirements used for the job.
The number of the allocated network for IBM Parallel Environment (PE) jobs.
Network ID of the allocated network for IBM Parallel Environment (PE) jobs.
Number of allocated windows for IBM Parallel Environment (PE) jobs.
CPU frequency at which the job runs.
The version number
The time of the event
Job ID
Job process ID
Job process group ID
Job array index
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
Job ID
New status, see <lsf/lsbatch.h>
For JOB_STAT_EXIT (32) and JOB_STAT_DONE (64), host-based resource usage information is appended to the JOB_STATUS record in the fields numHostRusage and hostRusage.
Pending or suspended reason code, see <lsf/lsbatch.h>
Pending or suspended subreason code, see <lsf/lsbatch.h>
CPU time consumed so far
Job completion time
Resource usage flag
Resource usage statistics, see <lsf/lsf.h>
Exit status of the job, see <lsf/lsbatch.h>
Job array index
Job termination reason, see <lsf/lsbatch.h>
How long a backfilled job can run; used for preemption backfill jobs
For a jStatus of JOB_STAT_EXIT (32) or JOB_STAT_DONE (64), this field contains the number of host-based resource usage entries (hostRusage) that follow. 0 unless LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.
Name of the host.
Total resident memory usage of all processes in the job running on this host.
Total virtual memory usage of all processes in the job running on this host.
User time used on this host.
System time used on this host.
Number of following key-value pairs containing extended host information (PGIDs and PIDs). Set to 0 in lsb.events, lsb.acct, and lsb.stream files.
Peak memory usage (in Mbytes)
Average memory usage (in Mbytes)
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
UNIX user ID of the user invoking the command
Job ID
Target queue name
Job array index
Name of the job submitter
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
UNIX user ID of the user invoking the command
Job ID
Target queue name
Name of the job submitter
The number of ranges indicating successfully switched elements
The start of the first index range
The end of the first index range
The step of the first index range
The start of the second index range
The end of the second index range
The step of the second index range
The start of the last index range
The end of the last index range
The step of the last index range
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The destination cluster to which the remote jobs belong
Unique identifier for the remote job control session in the MultiCluster.
The number of jobs that were successful during this remote control operation.
Contains IDs for all the jobs that were successful during this remote control operation.
The number of jobs which failed during this remote control session.
Contains IDs for all the jobs that failed during this remote control operation.
Contains the failure code and reason for each failed job in the failJobIdArray.
To prevent JOB_SWITCH2 from getting too long, the number of index ranges is limited to 500 per JOB_SWITCH2 event log. Therefore, if switching a large job array, several JOB_SWITCH2 events may be generated.
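The index ranges and the 500-range limit can be illustrated as follows. This is a sketch under stated assumptions: each range is taken to be a (start, end, step) triple with an inclusive end, and the helper names are hypothetical rather than part of LSF.

```python
MAX_RANGES_PER_EVENT = 500  # limit stated for one JOB_SWITCH2 event


def expand_ranges(ranges):
    """Expand (start, end, step) triples into a flat list of array indices.

    Assumption: the end of each range is inclusive.
    """
    indices = []
    for start, end, step in ranges:
        if step <= 0:
            raise ValueError("step must be positive")
        indices.extend(range(start, end + 1, step))
    return indices


def split_into_events(ranges, limit=MAX_RANGES_PER_EVENT):
    """Group ranges so that no group holds more than `limit` ranges."""
    return [ranges[i:i + limit] for i in range(0, len(ranges), limit)]
```

With 1200 ranges, split_into_events yields three groups of 500, 500, and 200 ranges, which corresponds to the several JOB_SWITCH2 events described for a large job array.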
The version number
The time of the event
UNIX user ID of the user invoking the command
Job ID
Position number
Operation code (TO_TOP or TO_BOTTOM), see <lsf/lsbatch.h>
Job array index
Name of the job submitter
The version number
The time of the event
Operation code, see <lsf/lsbatch.h>
Queue name
UNIX user ID of the user invoking the command
Name of the user
Administrator comment text from the -C option of badmin queue control commands qclose, qopen, qact, and qinact
The version number
The time of the event
Operation code, see <lsf/lsbatch.h>
Host name
UNIX user ID of the user invoking the command
Name of the user
Administrator comment text from the -C option of badmin host control commands hclose and hopen
The version number
The time of the event
Master host name
Cluster name
Number of hosts in the cluster
Number of queues in the cluster
The version number
The time of the event
Master host name
Number of finished jobs that have been removed from the system and logged in the current event file
Exit code from mbatchd
Administrator comment text from the -C option of badmin mbdrestart
The version number
The time of the event
Job ID
Not switched: the mbatchd has switched the job to a new queue, but the sbatchd has not been informed of the switch
Signal: this signal has not been sent to the job
Checkpoint signal: the job has not been sent this signal to checkpoint itself
Checkpoint flags, see <lsf/lsbatch.h>
New checkpoint period for job
If set to true, then parameters for the job cannot be modified.
Job array index
The version number
The time of the event
Number of index names
List of index names
The version number
The time of the event
Job ID
Action period
Process ID of the child sbatchd that initiated the action
Job status
Job pending reasons
Action flags, see <lsf/lsbatch.h>
Action status:
1: Action started
2: One action preempted other actions
3: Action succeeded
4: Action failed
Action name, accompanied by actFlags
Job array index
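The action status codes in this record (1 to 4) can be turned into readable labels when reporting on the log. In the sketch below the enumeration and member names are hypothetical; only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class ActionStatus(IntEnum):
    """Action status values; member names are illustrative only."""
    STARTED = 1      # Action started
    PREEMPTED = 2    # One action preempted other actions
    SUCCEEDED = 3    # Action succeeded
    FAILED = 4       # Action failed


def describe_action_status(value):
    """Return a label for a status value, tolerating unknown codes."""
    try:
        return ActionStatus(value).name
    except ValueError:
        return "UNKNOWN(%d)" % value
```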
The version number
The time of the event
Job ID
Number of candidate hosts for migration
List of names of candidate hosts
UNIX user ID of the user invoking the command
Job array index
Name of the job submitter
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
Job ID
Bit flags for job modification options processing
Bit flags for job modification options processing
Delete options for the options field
UNIX user ID of the submitter
User name
Job submission time
File creation mask for this job
Number of processors requested for execution. The value 2147483646 means the number of processors is undefined.
Start time – the job should be started on or after this time
Termination deadline – the job should be terminated by this time
Signal value
Restart process ID for the original job
Job name (up to 4094 characters)
Name of job queue to which the job was submitted
Number of candidate host names
List of names of candidate hosts for job dispatching; blank if the last field value is 0. If there is more than one host name, then each additional host name will be returned in its own field
Resource requirements
Soft CPU time limit (%d), see getrlimit(2)
Soft file size limit (%d), see getrlimit(2)
Soft data segment size limit (%d), see getrlimit(2)
Soft stack segment size limit (%d), see getrlimit(2)
Soft core file size limit (%d), see getrlimit(2)
Soft memory size limit (%d), see getrlimit(2)
Reserved (%d)
Reserved (%d)
Reserved (%d)
Soft run time limit (%d), see getrlimit(2)
Reserved (%d)
Model or host name for normalizing CPU time and run time
Job dependency condition
Time Event, for job dependency condition; specifies when time event ended
Submitter’s home directory
Input file name (up to 4094 characters for UNIX or 255 characters for Windows)
Output file name (up to 4094 characters for UNIX or 255 characters for Windows)
Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)
Job command (up to 4094 characters for UNIX or 255 characters for Windows)
Checkpointing period
Checkpoint directory
Number of files to transfer
List of file transfer specifications
Job file name
Submission host name
Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)
Job pre-execution command
Mail user name
Project name
Callback port if batch interactive job
Maximum number of processors. The value 2147483646 means the maximum number of processors is undefined.
Login shell
Execution host type
User group
Exception handlers for the job
Delete options for the options2 field
Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)
Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)
User priority
Advance reservation ID; for example, "user2#0"
External scheduling options
Job warning time period in seconds
Job warning action
The job group to which the job is attached
SLA service class name that the job is to be attached to
IBM Platform License Scheduler project name
Bit flags for job processing
Delete options for the options3 field
Application profile name
Absolute priority scheduling (APS) value set by administrator
Post-execution command to run on the execution host after the job finishes
Estimated run time for the job
Job exit values for automatic job requeue
Resize notification command to run on the first execution host to inform job of a resize event.
Job description (up to 4094 characters).
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
Network requirements for IBM Parallel Environment (PE) jobs.
CPU frequency at which the job runs.
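The two processor-count fields of this record use 2147483646 to mean that the number is undefined. A reader of the log may want to map that sentinel to an explicit "no value". This is a minimal sketch with hypothetical names, assuming the field arrives as a decimal string.

```python
UNDEFINED_PROCESSORS = 2147483646  # sentinel value named in the field list


def processor_count(raw):
    """Convert a processor-count field to int, or None when undefined."""
    value = int(raw)
    return None if value == UNDEFINED_PROCESSORS else value
```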
The version number
The time of the event
Job ID
UNIX user ID of the user invoking the command
Number of runs
Signal name
Job array index
Name of the job submitter
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
Job ID
Mapped UNIX user ID on execution host
Job process group ID
Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)
Home directory job used on execution host
Mapped user name on execution host
Job process ID
Job array index
Placement information of HPC jobs
Run time limit for the job scaled by the execution host
An internal field used by LSF.
An internal field used by LSF.
How long a backfilled job can run; used for preemption backfill jobs
The submission cluster job ID
The name of the submission cluster
The execution cluster job ID
The name of the execution cluster
The version number
The time of the event
Job ID
Job array index
The version number
The time of the event
Job ID
Job array index
The version number
The time of the event
Job ID
Exception Id
0x01: missched
0x02: overrun
0x04: underrun
0x08: abend
0x10: cantrun
0x20: hostfail
0x40: startfail
0x100: runtime_est_exceeded
Action Id
0x01: kill
0x02: alarm
0x04: rerun
0x08: setexcept
Time Event; for the missched exception, specifies when the time event ended.
Except Info: the pending reason for a missched or cantrun exception, the exit code of the job for an abend exception, otherwise 0.
Job array index
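The Exception Id and Action Id values in this record are powers of two, which suggests that they can be combined into a single mask. The sketch below assumes exactly that (the text does not say so explicitly) and decodes a mask into the names listed above; the table and function names are hypothetical.

```python
EXCEPTION_BITS = {
    0x01: "missched",
    0x02: "overrun",
    0x04: "underrun",
    0x08: "abend",
    0x10: "cantrun",
    0x20: "hostfail",
    0x40: "startfail",
    0x100: "runtime_est_exceeded",
}

ACTION_BITS = {
    0x01: "kill",
    0x02: "alarm",
    0x04: "rerun",
    0x08: "setexcept",
}


def decode_mask(mask, table):
    """Return the names of all bits in `table` that are set in `mask`."""
    return [name for bit, name in sorted(table.items()) if mask & bit]
```

For example, decode_mask(0x0A, EXCEPTION_BITS) reports both overrun and abend, while a value with a single bit set yields a one-element list.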
The version number
The time of the event
Job ID
Job array index
Index in the list
Unique user ID of the user invoking the command
Size of the data if it has any, otherwise 0
Message sending time
Status of the attached data
Text description of the message
Name of the author of the message
Used for internal flow control
The version number
The time of the event
Job ID
Job array index
Index in the list
Size of the data if it has any, otherwise 0
Status of the attached data
File name of the attached data
This is created when a job is inserted into a chunk.
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
The version number
The time of the event
Size of array membJobId
Job IDs of jobs in the chunk
Number of execution hosts
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
Execution host name array
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
The version number
The time of the event
Job ID
Acting processing ID
Job process ID
Job process group ID
New status of the job
Pending or suspending reason code, see <lsf/lsbatch.h>
Pending or suspending subreason code, see <lsf/lsbatch.h>
User time used
System time used
Maximum shared text size
Integral of the shared text size over time (in KB seconds)
Integral of the shared memory size over time (valid only on Ultrix)
Integral of the unshared data size over time
Integral of the unshared stack size over time
Number of page reclaims
Number of page faults
Number of times the process was swapped out
Number of block input operations
Number of block output operations
Number of characters read and written (valid only on HP-UX)
Number of System V IPC messages sent
Number of messages received
Number of signals received
Number of voluntary context switches
Number of involuntary context switches
Exact user time used (valid only on ConvexOS)
Exit status of the job, see <lsf/lsbatch.h>
Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)
Home directory job used on execution host
Mapped user name on execution host
ID of the message
Action status
1: Action started
2: One action preempted other actions
3: Action succeeded
4: Action failed
Signal value
Sequence status of the job
Job array index
Total resident memory usage in KB of all currently running processes in a given process group
Total virtual memory usage in KB of all currently running processes in the given process groups
Cumulative total user time in seconds
Cumulative total system time in seconds
Number of currently active processes in the given process groups. This entry has four sub-fields:
Process ID of the child sbatchd that initiated the action
Parent process ID
Process group ID
Process Job ID
Number of currently active process groups
Job termination reason, see <lsf/lsbatch.h>
A pre-execution command has been started.
The version number
The time of the event
Job ID
Job status (4, indicating the RUN status of the job)
Job process ID
Job process group ID
CPU factor of the first execution host
Number of processors used for execution
List of execution host names
Pre-execution command
Post-execution command
Job processing flags
User group name
Job array index
Placement information of HPC jobs
The runtime resource requirements used for the job.
The version number
The time of the event
Job ID
UNIX user ID of the user invoking the command
Job array index
Bit flags for job processing
Number of execution hosts
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
Execution host name array
If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
Name of the user
Name of queue if a remote brun job ran; otherwise, this field is empty. For MultiCluster this is the name of the receive queue at the execution cluster.
The version number
The time of the event
UNIX user ID of the job group owner
Job submission time
User name of the job group owner
Job dependency condition
Time Event, for job dependency condition; specifies when time event ended
Job group name
Delete options for the options field
Delete options for the options2 field
SLA service class name that the job group is to be attached to
Job group limit set by bgadd -L
0x01 - job group was created explicitly
0x02 - job group was created implicitly
The version number
The time of the event
UNIX user ID of the job group owner
Job submission time
User name of the job group owner
Job dependency condition
Time Event, for job dependency condition; specifies when time event ended
Job group name
Delete options for the options field
Delete options for the options2 field
SLA service class name that the job group is to be attached to
Job group limit set by bgmod -L
This is created when switching the event file lsb.events. The fields in order of occurrence are:
The version number
The time of the event
Job ID
The version number.
The time of the event.
The job ID.
Job array index.
Identifier or handle for notification.
Number of processors used for execution. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.
List of execution host names. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
The version number.
The time of the event.
The job ID.
Job array index.
Identifier or handle for notification.
Resize notification executable process ID. If no resize notification executable is defined, this field will be set to 0.
Resize notification executable process group ID. If no resize notification executable is defined, this field will be set to 0.
Status field used to indicate possible errors: 0 for success, 1 for failure.
The version number.
The time of the event.
The job ID.
Job array index.
Identifier or handle for notification.
Resize notification exit value: 0 for success, 1 for failure, 2 for failure but cancel request.
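The three resize notification exit values can be mapped to labels when summarizing these records. This is a minimal sketch; the table and function names are hypothetical, and the label for value 2 repeats the wording of the field description without interpreting it further.

```python
RESIZE_NOTIFY_EXIT = {
    0: "success",
    1: "failure",
    2: "failure but cancel request",
}


def describe_resize_exit(value):
    """Return a label for a resize notification exit value."""
    return RESIZE_NOTIFY_EXIT.get(value, "unknown (%d)" % value)
```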
The version number.
The time of the event.
The job ID.
Job array index.
Request Identifier or handle.
Release options.
UNIX user ID of the user invoking the command.
User name of the submitter.
Resize notification command to run on the first execution host to inform job of a resize event.
Number of processors used for execution during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.
List of execution host names during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
The version number.
The time of the event.
The job ID.
Job array index.
UNIX user ID of the user invoking the command.
User name of the submitter.
The version number.
The time of the event.
The power operation request ID to identify a power operation.
Power operation type.
The power operation trigger: power policy, job, or badmin hpower.
The power operation status.
If the operation is triggered by power policy, this is the power policy name. If the operation is triggered by an administrator, this is the administrator user name.
Number of hosts on which the power operation occurred.
The hosts on which the power operation occurred.
The version number.
The time of the event.
The job ID.
Job array index.
Indicates whether provisioning has started, is done, or has failed.
Number of hosts that need to be provisioned.
Names of hosts that need to be provisioned.
Host status for provisioning result.