lsb.acct

The lsb.acct file is the batch job log file of LSF. The master batch daemon (see mbatchd(8)) generates a record for each job completion or failure. The record is appended to the job log file lsb.acct.

The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid(1). See mbatchd(8) for the description of LSB_SHAREDIR.

The bacct command uses the current lsb.acct file for its output.

lsb.acct structure

The job log file is an ASCII file with one record per line. The fields of a record are separated by blanks. If the value of a field is unavailable, a pair of double quotation marks ("") is logged for a character string, 0 for a time or number, and -1 for a resource usage value.
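Because fields are separated by blanks but an unavailable string is logged as "", a reader needs a tokenizer that understands quoted strings rather than a plain split on blanks. The following minimal Python sketch makes two assumptions that this page does not state: string fields are always enclosed in double quotes, so a string containing blanks stays one field, and an embedded quote is written as two quotes. Check both against a real lsb.acct file. The function name tokenize_record is hypothetical.

```python
def tokenize_record(line: str) -> list[str]:
    """Split one lsb.acct record line into its fields.

    Assumptions (not confirmed by the LSF documentation): string fields are
    enclosed in double quotes, and an embedded quote is doubled ("").
    Unquoted tokens, such as the event type and numbers, end at a blank.
    """
    fields: list[str] = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1  # skip field separators
            continue
        if ch == '"':
            # Quoted string field; "" on its own is the empty (unavailable) string.
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')  # doubled quote inside a string (assumed escape)
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:
            start = i
            while i < n and not line[i].isspace():
                i += 1
            fields.append(line[start:i])
    return fields
```

For example, `tokenize_record('JOB_FINISH "10.1" 1700000000 42 "" "normal"')` yields the empty string for the unavailable field, not a missing field. Numeric fields are still returned as strings, so the caller can convert each one according to its documented %d or %f type.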

Configuring automatic archiving

The following parameters in lsb.params affect how records are logged to lsb.acct:
ACCT_ARCHIVE_AGE=days

Enables automatic archiving of LSF accounting log files, and specifies the archive interval. LSF archives the current log file if the time elapsed since its creation exceeds the specified number of days.

By default, there is no limit to the age of lsb.acct.

ACCT_ARCHIVE_SIZE=kilobytes

Enables automatic archiving of LSF accounting log files, and specifies the archive threshold. LSF archives the current log file if its size exceeds the specified number of kilobytes.

By default, there is no limit to the size of lsb.acct.

ACCT_ARCHIVE_TIME=hh:mm

Enables automatic archiving of LSF accounting log file lsb.acct, and specifies the time of day to archive the current log file.

By default, no time is set for archiving lsb.acct.

MAX_ACCT_ARCHIVE_FILE=integer

Enables automatic deletion of archived LSF accounting log files and specifies the archive limit.

By default, lsb.acct.n files are not automatically deleted.
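ACCT_ARCHIVE_AGE and ACCT_ARCHIVE_SIZE each describe a threshold that triggers archiving when it is exceeded. If a parameter is not set, it imposes no limit. The following Python sketch models only these two documented conditions and does not reproduce LSF's implementation. It assumes the two conditions act as independent triggers when both are set. ACCT_ARCHIVE_TIME and MAX_ACCT_ARCHIVE_FILE are not modeled. The function should_archive is hypothetical.

```python
from typing import Optional


def should_archive(age_days: float, size_kb: float,
                   archive_age: Optional[int] = None,
                   archive_size: Optional[int] = None) -> bool:
    """Illustrative model of the ACCT_ARCHIVE_AGE / ACCT_ARCHIVE_SIZE rules.

    archive_age / archive_size of None means the parameter is not set in
    lsb.params (no limit). Treating the two thresholds as independent
    triggers is an assumption, not documented LSF behavior.
    """
    # "exceeds" is read as strictly greater than the configured value
    if archive_age is not None and age_days > archive_age:
        return True
    if archive_size is not None and size_kb > archive_size:
        return True
    return False
```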

Records and fields

The first string of each record indicates its event type. The following types of events are recorded:
  • JOB_FINISH

  • EVENT_ADRSV_FINISH

  • JOB_RESIZE

JOB_FINISH

A job has finished.

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.acct file format.

The fields in order of occurrence are:
Event type (%s)

Which is JOB_FINISH

Version Number (%s)

Version number of the log file format

Event Time (%d)

Time the event was logged (in seconds since the epoch)

jobId (%d)

ID for the job

userId (%d)

UNIX user ID of the submitter

options (%d)

Bit flags for job processing

numProcessors (%d)

Number of processors initially requested for execution

submitTime (%d)

Job submission time

beginTime (%d)

Job start time – the job should be started at or after this time

termTime (%d)

Job termination deadline – the job should be terminated by this time

startTime (%d)

Job dispatch time – the time the job was dispatched for execution

userName (%s)

User name of the submitter

queue (%s)

Name of the job queue to which the job was submitted

resReq (%s)

Resource requirement specified by the user

dependCond (%s)

Job dependency condition specified by the user

preExecCmd (%s)

Pre-execution command specified by the user

fromHost (%s)

Submission host name

cwd (%s)

Current working directory (up to 4094 characters for UNIX or 512 characters for Windows)

subcwd (%s)

Current working directory specified by bsub -cwd.

inFile (%s)

Input file name (up to 4094 characters for UNIX or 512 characters for Windows)

outFile (%s)

Output file name (up to 4094 characters for UNIX or 512 characters for Windows)

errFile (%s)

Error output file name (up to 4094 characters for UNIX or 512 characters for Windows)

jobFile (%s)

Job script file name

numAskedHosts (%d)

Number of host names to which job dispatching will be limited

askedHosts (%s)

List of host names to which job dispatching will be limited (%s for each). Each host name is logged in its own field; nothing is logged for this field if numAskedHosts is 0.
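Pairs such as numAskedHosts/askedHosts and numExHosts/execHosts form a variable-length list: a count, followed by that many string fields, with nothing logged when the count is 0. As a result, every field after such a list sits at a variable offset, and a parser has to track its position instead of using fixed field indexes. The following minimal Python sketch assumes the record has already been split into a list of field strings. The helper name read_counted_list is hypothetical.

```python
def read_counted_list(fields: list[str], pos: int) -> tuple[list[str], int]:
    """Read a count field at fields[pos] followed by that many string fields.

    Returns the list entries and the index of the first field after the list.
    When the count is 0, only the count field itself is consumed.
    """
    count = int(fields[pos])
    pos += 1
    items = fields[pos:pos + count]
    if len(items) != count:
        raise ValueError(f"expected {count} entries at field {pos}, got {len(items)}")
    return items, pos + count
```

A caller reads numAskedHosts/askedHosts first, then passes the returned index on to the next field read, and does the same for numExHosts/execHosts.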

numExHosts (%d)

Number of processors used for execution

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.

The logged value reflects the allocation at job finish time.

execHosts (%s)

List of execution host names (%s for each); nothing is logged for this field if numExHosts is 0.

If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.

The logged value reflects the allocation at job finish time.

jStatus (%d)

Job status. The number 32 represents EXIT, and 64 represents DONE.

hostFactor (%f)

CPU factor of the first execution host.

jobName (%s)

Job name (up to 4094 characters).

command (%s)

Complete batch job command specified by the user (up to 4094 characters for UNIX or 512 characters for Windows).

lsfRusage (%f)
The following fields contain resource usage information for the job (see getrusage(2)). If the value of a field is unavailable (because the job exited or because of differences among operating systems), -1 is logged. Times are measured in seconds, and sizes are measured in KB.
ru_utime (%f)

User time used

ru_stime (%f)

System time used

ru_maxrss (%f)

Maximum resident set size

ru_ixrss (%f)

Integral of the shared text size over time (in KB seconds)

ru_ismrss (%f)

Integral of the shared memory size over time (valid only on Ultrix)

ru_idrss (%f)

Integral of the unshared data size over time

ru_isrss (%f)

Integral of the unshared stack size over time

ru_minflt (%f)

Number of page reclaims

ru_majflt (%f)

Number of page faults

ru_nswap (%f)

Number of times the process was swapped out

ru_inblock (%f)

Number of block input operations

ru_oublock (%f)

Number of block output operations

ru_ioch (%f)

Number of characters read and written (valid only on HP-UX)

ru_msgsnd (%f)

Number of System V IPC messages sent

ru_msgrcv (%f)

Number of System V IPC messages received

ru_nsignals (%f)

Number of signals received

ru_nvcsw (%f)

Number of voluntary context switches

ru_nivcsw (%f)

Number of involuntary context switches

ru_exutime (%f)

Exact user time used (valid only on ConvexOS)
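Because -1 marks an unavailable resource usage value, any arithmetic on these fields should check for it first. Otherwise a missing value silently lowers the total. For example, total CPU time is commonly computed as ru_utime + ru_stime. That is the getrusage(2) convention; this page does not say how bacct derives its own CPU time. The following Python sketch uses a hypothetical function name, total_cpu_seconds.

```python
from typing import Optional


def total_cpu_seconds(ru_utime: float, ru_stime: float) -> Optional[float]:
    """Return user + system CPU time in seconds, or None if either is unavailable.

    lsb.acct logs -1 for an unavailable resource usage value; any negative
    value is treated as unavailable here.
    """
    if ru_utime < 0 or ru_stime < 0:
        return None
    return ru_utime + ru_stime
```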

mailUser (%s)

Name of the user to whom job-related mail was sent

projectName (%s)

LSF project name

exitStatus (%d)

UNIX exit status of the job

maxNumProcessors (%d)

Maximum number of processors specified for the job

loginShell (%s)

Login shell used for the job

timeEvent (%s)

Time event string for the job (Platform Process Manager only)

idx (%d)

Job array index

maxRMem (%d)

Maximum resident memory usage in KB of all processes in the job

maxRSwap (%d)

Maximum virtual memory usage in KB of all processes in the job

inFileSpool (%s)

Spool input file (up to 4094 characters for UNIX or 512 characters for Windows)

commandSpool (%s)

Spool command file (up to 4094 characters for UNIX or 512 characters for Windows)

rsvId (%s)

Advance reservation ID for a user group name less than 120 characters long; for example, "user2#0"

If the advance reservation user group name is longer than 120 characters, the rsvId field output appears last.

sla (%s)

SLA service class name under which the job runs

exceptMask (%d)

Job exception handling

Values:
  • J_EXCEPT_OVERRUN 0x02

  • J_EXCEPT_UNDERUN 0x04

  • J_EXCEPT_IDLE 0x80
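The exceptMask values above are distinct bit values, which suggests that the field is a mask and can record more than one exception for a job. That reading is an assumption. The following Python sketch lists the names of the set bits, using only the three values documented here, and ignores any other bits. The names EXCEPT_FLAGS and decode_except_mask are hypothetical.

```python
# Bit values as listed for the exceptMask field (other bits are ignored).
EXCEPT_FLAGS = {
    0x02: "J_EXCEPT_OVERRUN",
    0x04: "J_EXCEPT_UNDERUN",
    0x80: "J_EXCEPT_IDLE",
}


def decode_except_mask(mask: int) -> list[str]:
    """Return the names of the documented exception bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(EXCEPT_FLAGS.items()) if mask & bit]
```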

additionalInfo (%s)

Placement information of HPC jobs

exitInfo (%d)

Job termination reason, mapped to corresponding termination keyword displayed by bacct.

warningAction (%s)

Job warning action

warningTimePeriod (%d)

Job warning time period in seconds

chargedSAAP (%s)

SAAP charged to a job

licenseProject (%s)

Platform License Scheduler project name

app (%s)

Application profile name

postExecCmd (%s)

Post-execution command to run on the execution host after the job finishes

runtimeEstimation (%d)

Estimated run time for the job, calculated as the CPU factor of the submission host multiplied by the runtime estimate (in seconds).

jobGroupName (%s)

Job group name

requeueEvalues (%s)

Requeue exit value

options2 (%d)

Bit flags for job processing

resizeNotifyCmd (%s)

Resize notification command to be invoked on the first execution host upon a resize request.

lastResizeTime (%d)

Last resize time: the most recent wall-clock time at which the job allocation changed.

rsvId (%s)

Advance reservation ID for a user group name more than 120 characters long.

If the advance reservation user group name is longer than 120 characters, the rsvId field output appears last.

jobDescription (%s)

Job description (up to 4094 characters).

submitEXT
Submission extension field, reserved for internal use.
Num (%d)

Number of elements (key-value pairs) in the structure.

key (%s)

Reserved for internal use.

value (%s)

Reserved for internal use.

options3 (%d)

Bit flags for job processing

bsub -W (%s)

Job submission runtime limit

numHostRusage (%d)

The number of host-based resource usage entries (hostRusage) that follow. 0 unless LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.

hostRusage
The following fields contain host-based resource usage information for the job. They appear only for parallel jobs when LSF_HPC_EXTENSIONS="HOST_RUSAGE" is set in lsf.conf.
hostname (%s)

Name of the host.

mem (%d)

Total resident memory usage of all processes in the job running on this host.

swap (%d)

The total virtual memory usage of all processes in the job running on this host.

utime (%d)

User time used on this host.

stime (%d)

System time used on this host.

hHostExtendInfo (%d)

Number of following key-value pairs containing extended host information (PGIDs and PIDs). Set to 0 in lsb.events, lsb.acct, and lsb.stream files.

srcJobId (%d)

The submission cluster job ID

srcCluster (%s)

The name of the submission cluster

dstJobId (%d)

The execution cluster job ID

dstCluster (%s)

The name of the execution cluster

effectiveResReq (%s)

The runtime resource requirements used for the job.

network (%s)

Network requirements for IBM Parallel Environment (PE) jobs.

totalProvisionTime (%d)

Platform Dynamic Cluster only: time in seconds that the job has been in the provisioning (PROV) state.

runTime (%d)

Time in seconds that the job has been in the run state. runTime includes the totalProvisionTime.
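Because runTime includes totalProvisionTime, subtracting one from the other gives the time the job ran after provisioning. The following Python sketch shows that arithmetic. It assumes that totalProvisionTime is 0 for jobs that are not Platform Dynamic Cluster jobs, which this page does not state. The function name is hypothetical.

```python
def run_time_excluding_provisioning(run_time: int, total_provision_time: int) -> int:
    """Seconds in the run state after provisioning.

    runTime includes totalProvisionTime, so the difference is the
    non-provisioning run time. totalProvisionTime is assumed to be 0 for
    jobs outside Platform Dynamic Cluster.
    """
    if total_provision_time > run_time:
        raise ValueError("totalProvisionTime cannot exceed runTime")
    return run_time - total_provision_time
```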
cpu_frequency (%d)

CPU frequency at which the job ran.

EVENT_ADRSV_FINISH

An advance reservation has expired. The fields in order of occurrence are:
Event type (%s)

Which is EVENT_ADRSV_FINISH

Version Number (%s)

Version number of the log file format

Event Logging Time (%d)

Time the event was logged (in seconds since the epoch); for example, 1038942015

Reservation Creation Time (%d)

Time the advance reservation was created (in seconds since the epoch); for example, 1038938898

Reservation Type (%d)
Type of advance reservation request:
  • User reservation (RSV_OPTION_USER, defined as 0x001)

  • User group reservation (RSV_OPTION_GROUP, defined as 0x002)

  • System reservation (RSV_OPTION_SYSTEM, defined as 0x004)

  • Recurring reservation (RSV_OPTION_RECUR, defined as 0x008)

For example, 9 (RSV_OPTION_USER + RSV_OPTION_RECUR) is a recurring reservation created for a user.
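As the value 9 shows, the reservation type combines the flags above (9 = 0x001 | 0x008). The following Python sketch lists the flags set in a Reservation Type value, using only the four documented values. The names RSV_OPTIONS and decode_rsv_type are hypothetical.

```python
# Flag values as documented for the Reservation Type field.
RSV_OPTIONS = {
    0x001: "RSV_OPTION_USER",
    0x002: "RSV_OPTION_GROUP",
    0x004: "RSV_OPTION_SYSTEM",
    0x008: "RSV_OPTION_RECUR",
}


def decode_rsv_type(value: int) -> list[str]:
    """Return the names of the documented reservation flags set in value."""
    return [name for bit, name in sorted(RSV_OPTIONS.items()) if value & bit]
```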

Creator ID (%d)

UNIX user ID of the reservation creator; for example, 30408

Reservation ID (rsvId %s)

For example, user2#0

User Name (%s)

User name of the reservation user; for example, user2

Time Window (%s)
Time window of the reservation:
  • One-time reservation in seconds since the epoch; for example, 1033761000-1033761600

  • Recurring reservation; for example, 17:50-18:00
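The two Time Window forms can be told apart by the presence of a colon. The following Python sketch handles only the forms in the examples above: two epoch times for a one-time reservation, and colon-separated clock times for a recurring one. Each side of a recurring window is returned as a tuple of integers. Any other format is outside what this page documents. The function name parse_time_window is hypothetical.

```python
from typing import Union

Window = tuple[str, Union[int, tuple[int, ...]], Union[int, tuple[int, ...]]]


def parse_time_window(window: str) -> Window:
    """Classify a Time Window value as one-time or recurring.

    Assumes "<start>-<end>" in seconds since the epoch for a one-time
    reservation and colon-separated clock times such as "17:50-18:00"
    for a recurring reservation.
    """
    start, end = window.split("-", 1)
    if ":" in start:
        return ("recurring",
                tuple(int(x) for x in start.split(":")),
                tuple(int(x) for x in end.split(":")))
    return ("one-time", int(start), int(end))
```

For a one-time window, end minus start gives the reservation length in seconds. For the example 1033761000-1033761600, that is 600 seconds.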

Creator Name (%s)

User name of the reservation creator; for example, user1

Duration (%d)

Duration of the reservation, in hours, minutes, seconds; for example, 600 is 6 hours, 0 minutes, 0 seconds

Number of Resources (%d)

Number of reserved resource pairs in the resource list; for example, 2 indicates two resource pairs (hostA 1 hostB 1)

Host Name (%s)

Reservation host name; for example, hostA

Number of CPUs (%d)

Number of reserved CPUs; for example, 1
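Number of Resources is followed by that many (Host Name, Number of CPUs) pairs, as in the example "2 hostA 1 hostB 1". The following Python sketch reads that resource list from a record that has already been split into field strings, starting at the Number of Resources field. It returns the pairs and the index of the next field. The function name parse_resource_pairs is hypothetical.

```python
def parse_resource_pairs(fields: list[str], pos: int) -> tuple[list[tuple[str, int]], int]:
    """Read Number of Resources at fields[pos], then that many
    (Host Name, Number of CPUs) pairs.

    Returns the pairs and the index of the first field after the list.
    """
    count = int(fields[pos])
    pos += 1
    pairs: list[tuple[str, int]] = []
    for _ in range(count):
        host, ncpus = fields[pos], int(fields[pos + 1])  # one host/CPU pair
        pairs.append((host, ncpus))
        pos += 2
    return pairs, pos
```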

JOB_RESIZE

When there is an allocation change, LSF logs this event after mbatchd receives a JOB_RESIZE_NOTIFY_DONE event. You can calculate the duration of the previous job allocation from lastResizeTime and Event Time. The fields in order of occurrence are:
Version number (%s)

The version number.

Event Time (%d)

Time the event was logged (in seconds since the epoch).

jobId (%d)

ID for the job.

idx (%d)

Job array index.

startTime (%d)

The start time of the running job.

userId (%d)

UNIX user ID of the user invoking the command

userName (%s)

User name of the submitter

resizeType (%d)

Resize event type: 0 for grow, 1 for shrink.

lastResizeTime (%d)

The wall-clock time of the previous allocation change. For the first resize event, lastResizeTime is the job start time.

numExecHosts (%d)

The number of execution hosts before the allocation change. Supports LSF_HPC_EXTENSIONS="SHORT_EVENTFILE".

execHosts (%s)

Execution host list before the allocation change. Supports LSF_HPC_EXTENSIONS="SHORT_EVENTFILE".

numResizeHosts (%d)

Number of processors used for execution during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in short format.

resizeHosts (%s)

List of execution host names during resize. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
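As noted at the start of the JOB_RESIZE description, you can compute the length of the allocation that a resize closes from Event Time and lastResizeTime. For a job's first resize, lastResizeTime is the job start time. The following Python sketch shows the subtraction. Because Event Time is when the record was logged, the result may differ slightly from the moment the allocation actually changed. The function name is hypothetical.

```python
def previous_allocation_seconds(event_time: int, last_resize_time: int) -> int:
    """Duration, in seconds, of the allocation that a JOB_RESIZE record closes.

    Both arguments are seconds since the epoch. For the first resize of a
    job, last_resize_time is the job start time.
    """
    if event_time < last_resize_time:
        raise ValueError("Event Time precedes lastResizeTime")
    return event_time - last_resize_time
```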