Load indices

Load indices are built-in resources that measure the availability of static or dynamic, non-shared resources on hosts in the LSF cluster.

Load indices that are built into the LIM are updated at fixed time intervals.

External load indices are defined and configured by the LSF administrator, who writes an external load information manager (elim) executable. The elim collects the values of the external load indices and sends these values to the LIM.
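As a rough illustration of how an elim hands values to the LIM, the sketch below builds one report line. It assumes a report format of an index count followed by name and value pairs written to standard output, which this section does not spell out; check the ELIM documentation for your LSF version. The index names scratch and licenses are hypothetical.

```python
import sys


def format_elim_report(indices):
    """Build one report line: '<count> <name1> <value1> <name2> <value2> ...'.

    Assumption: this count-plus-pairs layout is the format the LIM expects.
    """
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.append(name)
        parts.append(str(value))
    return " ".join(parts)


def emit_report(indices, stream=sys.stdout):
    """Write a single report line and flush so the reader sees it promptly."""
    stream.write(format_elim_report(indices) + "\n")
    stream.flush()


# Hypothetical site-defined indices; a real elim would measure these values
# and call emit_report() repeatedly at its own update interval.
example = {"scratch": 120, "licenses": 3}
```

A real elim would call emit_report in a loop, sleeping between reports.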

Load indices collected by LIM

Index   Measures                                             Units                            Direction   Averaged over  Update Interval
status  host status                                          string                           -           -              15 seconds
r15s    run queue length                                     processes                        increasing  15 seconds     15 seconds
r1m     run queue length                                     processes                        increasing  1 minute       15 seconds
r15m    run queue length                                     processes                        increasing  15 minutes     15 seconds
ut      CPU utilization                                      percent                          increasing  1 minute       15 seconds
pg      paging activity                                      pages in + pages out per second  increasing  1 minute       15 seconds
ls      logins                                               users                            increasing  N/A            30 seconds
it      idle time                                            minutes                          decreasing  N/A            30 seconds
swp     available swap space                                 MB                               decreasing  N/A            15 seconds
mem     available memory                                     MB                               decreasing  N/A            15 seconds
tmp     available space in temporary file system             MB                               decreasing  N/A            120 seconds
io      disk I/O (shown by lsload -l)                        KB per second                    increasing  1 minute       15 seconds
name    external load index configured by LSF administrator  -                                -           -              site-defined

Status

The status index is a string indicating the current status of the host. This status applies to the LIM and RES.

The possible values for status are:

Status   Description
ok       The host is available to accept remote jobs. The LIM can select the host for remote execution.
-ok      When the status of a host is preceded by a dash (-), it means that LIM is available but RES is not running on that host or is not responding.
busy     The host is overloaded (busy) because a load index exceeded a configured threshold. An asterisk (*) marks the offending index. LIM will not select the host for interactive jobs.
lockW    The host is locked by its run window. Use lshosts to display run windows.
lockU    The host is locked by an LSF administrator or root.
unavail  The host is down or the LIM on the host is not running or is not responding.

Note:

The term available is frequently used in command output titles and headings. Available means that a host is in any state except unavail, so an available host could be locked, busy, or ok.
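The status rules above can be sketched as a small helper that separates the leading dash from the state and applies the definition of available given in the note. The function name and the keys of the returned dictionary are hypothetical, chosen only for illustration.

```python
def describe_status(status):
    """Split a host status string such as 'ok', '-ok', or 'unavail'.

    A leading dash means LIM is available but RES is not running or is
    not responding. A host counts as available in any state but unavail.
    """
    res_down = status.startswith("-")
    state = status[1:] if res_down else status
    return {
        "state": state,
        "res_down": res_down,
        "available": state != "unavail",
    }
```

For example, describe_status("-ok") reports the state ok with res_down set, and the host still counts as available.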

CPU run queue lengths (r15s, r1m, r15m)

The r15s, r1m and r15m load indices are the 15-second, 1-minute, and 15-minute average CPU run queue lengths. This is the average number of processes ready to use the CPU during the given interval.

On UNIX, run queue length indices are not necessarily the same as the load averages printed by the uptime(1) command; uptime load averages on some platforms also include processes that are in short-term wait states (such as paging or disk I/O).

Effective run queue length

On multiprocessor systems, more than one process can execute at a time. LSF scales the run queue value on multiprocessor systems to make the CPU load of uniprocessors and multiprocessors comparable. The scaled value is called the effective run queue length.

Use lsload -E to view the effective run queue length.

Normalized run queue length

LSF also adjusts the CPU run queue length based on the relative speeds of the processors (the CPU factor). The normalized run queue length is adjusted for both number of processors and CPU speed. The host with the lowest normalized run queue length runs a CPU-intensive job the fastest.

Use lsload -N to view the normalized CPU run queue lengths.
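As a minimal sketch of the selection rule above, the following helper picks the host with the lowest normalized run queue length. It assumes the values were already copied from lsload -N output into a dictionary; the host names and numbers are hypothetical.

```python
def fastest_cpu_host(normalized, index="r1m"):
    """Return the host with the lowest normalized run queue length.

    normalized maps host name -> {index name -> normalized value}.
    """
    return min(normalized, key=lambda host: normalized[host][index])


# Hypothetical normalized values, as they might be read from lsload -N.
hosts = {
    "hostA": {"r15s": 0.4, "r1m": 0.5, "r15m": 0.6},
    "hostB": {"r15s": 0.1, "r1m": 0.2, "r15m": 0.9},
}
```

Choosing r1m rather than r15s or r15m trades responsiveness against stability; which interval suits a given workload is a site decision.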

CPU utilization (ut)

The ut index measures CPU utilization, which is the percentage of time spent running system and user code. A host with no process running has a ut value of 0 percent; a host on which the CPU is completely loaded has a ut of 100 percent.

Paging rate (pg)

The pg index gives the virtual memory paging rate in pages per second. This index is closely tied to the amount of available RAM and the total size of the processes running on a host; if there is not enough RAM to satisfy all processes, the paging rate is high. Paging rate is a good measure of how a machine responds to interactive use; a machine that is paging heavily feels very slow.

Login sessions (ls)

The ls index gives the number of users logged in. Each user is counted once, no matter how many times they have logged into the host.
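The counting rule can be illustrated with a short sketch: several sessions owned by the same user contribute one to the index. The session list and its (user, terminal) layout are hypothetical and do not reflect how LIM gathers the data.

```python
def login_index(sessions):
    """Count distinct users across (user, terminal) login sessions."""
    return len({user for user, _terminal in sessions})


# Hypothetical sessions: alice is logged in twice but is counted once.
sessions = [("alice", "pts/0"), ("alice", "pts/1"), ("bob", "pts/2")]
```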

Interactive idle time (it)

On UNIX, the it index is the interactive idle time of the host, in minutes. Idle time is measured from the last input or output on a directly attached terminal or a network pseudo-terminal supporting a login session. This does not include activity directly through the X server such as CAD applications or emacs windows, except on Solaris and HP-UX systems.

On Windows, the it index is based on the time a screen saver has been active on a particular host.

Temporary directories (tmp)

The tmp index is the space available (in MB, or in the units set by LSF_UNIT_FOR_LIMITS in lsf.conf) on the file system that contains the temporary directory:

  • /tmp on UNIX

  • C:\temp on Windows

Swap space (swp)

The swp index gives the currently available virtual memory (swap space), in MB or in the units set by LSF_UNIT_FOR_LIMITS in lsf.conf. This represents the largest process that can be started on the host.

Memory (mem)

The mem index is an estimate of the real memory currently available to user processes, measured in MB or in the units set by LSF_UNIT_FOR_LIMITS in lsf.conf. This represents the approximate size of the largest process that could be started on a host without causing the host to start paging.

LIM reports the amount of free memory available. LSF calculates free memory as a sum of physical free memory, cached memory, buffered memory, and an adjustment value. The command vmstat also reports free memory but displays these values separately. There may be a difference between the free memory reported by LIM and the free memory reported by vmstat because of virtual memory behavior variations among operating systems. You can write an ELIM that overrides the free memory values that are returned by LIM.
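The sum described above can be sketched as follows. The adjustment value is not specified here, so the sketch takes it as a parameter that defaults to zero, which is an assumption; the function name and all numbers are hypothetical.

```python
def lim_free_memory_mb(physical_free_mb, cached_mb, buffered_mb, adjustment_mb=0):
    """Sum the components that make up the free memory figure, in MB.

    adjustment_mb stands in for the unspecified adjustment value.
    """
    return physical_free_mb + cached_mb + buffered_mb + adjustment_mb


# Hypothetical readings: a tool that reports only physical free memory
# would show 200 MB here, while the combined figure is 800 MB.
combined = lim_free_memory_mb(200, 500, 100)
```

The gap between the two figures is one reason the value reported by LIM can differ from the free column of vmstat.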

I/O rate (io)

The io index measures I/O throughput to disks attached directly to this host, in KB per second. It does not include I/O to disks that are mounted from other hosts.

View information about load indices

lsinfo -l

The lsinfo -l command displays all information available about load indices in the system. You can also specify load indices on the command line to display information about selected indices:

lsinfo -l swp
RESOURCE_NAME:  swp
DESCRIPTION: Available swap space (Mbytes) (alias: swap)
TYPE      ORDER   INTERVAL  BUILTIN  DYNAMIC  RELEASE
Numeric     Dec         60      Yes      Yes       NO

lsload -l

The lsload -l command displays the values of all load indices. External load indices are configured by your LSF administrator. The following example shows the output of lsload without -l, which omits the io index:

lsload
HOST_NAME  status  r15s  r1m  r15m  ut   pg   ls  it  tmp  swp   mem
hostN      ok      0.0   0.0  0.1   1%   0.0  1   224 43M  67M   3M
hostK      -ok     0.0   0.0  0.0   3%   0.0  3   0   38M  40M   7M
hostF      busy    0.1   0.1  0.3   7%   *17  6   0   9M   23M   28M
hostG      busy    *6.2  6.9  9.5   85%  1.1  30  0   5M   400M  385M
hostV      unavail
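Output in this layout can be post-processed with a few lines of code. The sketch below is one possible parser, not an LSF tool: it splits each row on whitespace, records which indices carry the asterisk that marks an exceeded threshold, and tolerates rows such as unavail hosts that list no index values. It assumes the column layout shown above and keeps values as strings.

```python
SAMPLE = """\
HOST_NAME  status  r15s  r1m  r15m  ut   pg   ls  it  tmp  swp   mem
hostN      ok      0.0   0.0  0.1   1%   0.0  1   224 43M  67M   3M
hostK      -ok     0.0   0.0  0.0   3%   0.0  3   0   38M  40M   7M
hostF      busy    0.1   0.1  0.3   7%   *17  6   0   9M   23M   28M
hostG      busy    *6.2  6.9  9.5   85%  1.1  30  0   5M   400M  385M
hostV      unavail
"""


def parse_lsload(text):
    """Parse lsload-style output into one dictionary per host."""
    lines = [line for line in text.splitlines() if line.strip()]
    index_names = lines[0].split()[2:]  # columns after HOST_NAME and status
    rows = []
    for line in lines[1:]:
        fields = line.split()
        row = {"host": fields[0], "status": fields[1],
               "indices": {}, "over_threshold": []}
        # zip stops early when a row (for example unavail) has no values.
        for name, raw in zip(index_names, fields[2:]):
            if raw.startswith("*"):
                row["over_threshold"].append(name)
                raw = raw[1:]
            row["indices"][name] = raw  # kept as text, units included
        rows.append(row)
    return rows


rows = parse_lsload(SAMPLE)
```

With the sample above, the hostF entry lists pg as its over-threshold index and the hostG entry lists r15s.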