Load indices are built-in resources that measure the availability of static or dynamic, non-shared resources on hosts in the LSF cluster.
Load indices that are built into the LIM are updated at fixed time intervals.
External load indices are defined and configured by the LSF administrator, who writes an external load information manager (elim) executable. The elim collects the values of the external load indices and sends these values to the LIM.
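To make the elim's role more concrete, here is a minimal sketch of how one report could be formatted. It assumes the reporting layout described in the LSF elim documentation: the number of indices, followed by name and value pairs on a single line. The index names `scratch` and `licenses` and their values are hypothetical. A real elim would measure the values itself and write a fresh report to standard output at the site's chosen interval.

```python
def format_elim_report(indices):
    """Return one report line: '<count> <name> <value> <name> <value> ...'."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.append(name)
        parts.append(str(value))
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical site-defined indices and values, for illustration only.
    print(format_elim_report({"scratch": 512, "licenses": 3}), flush=True)
```

The output is flushed explicitly because the report is meant to be read by another process as soon as it is written.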
Index | Measures | Units | Direction | Averaged over | Update Interval |
---|---|---|---|---|---|
status | host status | string | | | 15 seconds |
r15s | run queue length | processes | increasing | 15 seconds | 15 seconds |
r1m | run queue length | processes | increasing | 1 minute | 15 seconds |
r15m | run queue length | processes | increasing | 15 minutes | 15 seconds |
ut | CPU utilization | percent | increasing | 1 minute | 15 seconds |
pg | paging activity | pages in + pages out per second | increasing | 1 minute | 15 seconds |
ls | logins | users | increasing | N/A | 30 seconds |
it | idle time | minutes | decreasing | N/A | 30 seconds |
swp | available swap space | MB | decreasing | N/A | 15 seconds |
mem | available memory | MB | decreasing | N/A | 15 seconds |
tmp | available space in temporary file system | MB | decreasing | N/A | 120 seconds |
io | disk I/O (shown by lsload -l) | KB per second | increasing | 1 minute | 15 seconds |
name | external load index configured by LSF administrator | | | | site-defined |
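The Direction column tells you which way an index moves as a host becomes more loaded: an increasing index rises with load, while a decreasing index falls. The sketch below shows a threshold check that respects this direction. The `DIRECTION` mapping is copied from the table; the function `exceeds_threshold` and any thresholds passed to it are hypothetical and are not part of LSF.

```python
# Direction of each built-in numeric index, as listed in the table above.
DIRECTION = {
    "r15s": "increasing", "r1m": "increasing", "r15m": "increasing",
    "ut": "increasing", "pg": "increasing", "ls": "increasing",
    "io": "increasing",
    "it": "decreasing", "swp": "decreasing", "mem": "decreasing",
    "tmp": "decreasing",
}


def exceeds_threshold(index, value, threshold):
    """Return True if value lies on the 'more loaded' side of threshold."""
    if DIRECTION[index] == "increasing":
        return value > threshold
    return value < threshold
```

For example, `exceeds_threshold("mem", 50, 100)` is true because available memory has dropped below the threshold, whereas for `r1m` the check is true only when the value rises above it.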
The status index is a string indicating the current status of the host. This status applies to the LIM and RES.
The possible values for status are:
Status | Description |
---|---|
ok | The host is available to accept remote jobs. The LIM can select the host for remote execution. |
-ok | When the status of a host is preceded by a dash (-), it means that LIM is available but RES is not running on that host or is not responding. |
busy | The host is overloaded (busy) because a load index exceeded a configured threshold. An asterisk (*) marks the offending index. LIM will not select the host for interactive jobs. |
lockW | The host is locked by its run window. Use lshosts to display run windows. |
lockU | The host is locked by an LSF administrator or root. |
unavail | The host is down or the LIM on the host is not running or is not responding. |
The term available is frequently used in command output titles and headings. Available means that a host is in any state except unavail. An available host can therefore be locked, busy, or ok.
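The status rules above can be sketched as a small helper that splits a status string into its parts. The function name `describe_status` and the keys of the returned dictionary are hypothetical; only the meaning of the leading dash and of unavail comes from the text.

```python
def describe_status(status):
    """Interpret a host status string such as 'ok', '-ok', or 'unavail'."""
    lim_state = status.lstrip("-")
    available = lim_state != "unavail"
    return {
        "lim_state": lim_state,
        # Any state except unavail counts as available.
        "available": available,
        # A leading dash means LIM is up but RES is not responding.
        # For an unavail host nothing is known, so report False.
        "res_responding": available and not status.startswith("-"),
    }
```

For instance, `describe_status("-ok")` reports an available host whose RES is not responding.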
The r15s, r1m and r15m load indices are the 15-second, 1-minute, and 15-minute average CPU run queue lengths. This is the average number of processes ready to use the CPU during the given interval.
On UNIX, run queue length indices are not necessarily the same as the load averages printed by the uptime(1) command; uptime load averages on some platforms also include processes that are in short-term wait states (such as paging or disk I/O).
On multiprocessor systems, more than one process can execute at a time. LSF scales the run queue value on multiprocessor systems to make the CPU load of uniprocessors and multiprocessors comparable. The scaled value is called the effective run queue length.
Use lsload -E to view the effective run queue length.
LSF also adjusts the CPU run queue length based on the relative speeds of the processors (the CPU factor). The normalized run queue length is adjusted for both the number of processors and CPU speed. The host with the lowest normalized run queue length runs a CPU-intensive job the fastest.
Use lsload -N to view the normalized CPU run queue lengths.
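The exact scaling formulas are not given here, so the following is only a sketch under a simplifying assumption: the effective run queue length is the raw length divided by the number of processors, and the normalized length divides that result by the CPU factor. LSF's actual computation may differ, and both function names are hypothetical.

```python
def effective_run_queue(raw_length, num_processors):
    """Assumed scaling: spread the raw run queue across all processors."""
    return raw_length / num_processors


def normalized_run_queue(raw_length, num_processors, cpu_factor):
    """Assumed scaling: effective length further divided by CPU factor."""
    return effective_run_queue(raw_length, num_processors) / cpu_factor


if __name__ == "__main__":
    # Two hypothetical hosts with the same raw run queue length.
    slow_uniprocessor = normalized_run_queue(4.0, 1, 1.0)
    fast_quad = normalized_run_queue(4.0, 4, 2.0)
    print(slow_uniprocessor, fast_quad)
```

Under these assumptions the four-processor host with the higher CPU factor ends up with the lower normalized value, which matches the idea that it would run a CPU-intensive job faster.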
The ut index measures CPU utilization, which is the percentage of time spent running system and user code. A host with no process running has a ut value of 0 percent; a host on which the CPU is completely loaded has a ut of 100 percent.
The pg index gives the virtual memory paging rate in pages per second. This index is closely tied to the amount of available RAM and the total size of the processes running on a host; if there is not enough RAM to satisfy all processes, the paging rate is high. Paging rate is a good measure of how a machine responds to interactive use; a machine that is paging heavily feels very slow.
The ls index gives the number of users logged in. Each user is counted once, no matter how many times they have logged into the host.
On UNIX, the it index is the interactive idle time of the host, in minutes. Idle time is measured from the last input or output on a directly attached terminal or a network pseudo-terminal supporting a login session. This does not include activity directly through the X server such as CAD applications or emacs windows, except on Solaris and HP-UX systems.
On Windows, the it index is based on the time a screen saver has been active on a particular host.
The tmp index is the space available, in MB (or in units set in LSF_UNIT_FOR_LIMITS in lsf.conf), on the file system that contains the temporary directory:
/tmp on UNIX
C:\temp on Windows
The swp index gives the currently available virtual memory (swap space), in MB (or in units set in LSF_UNIT_FOR_LIMITS in lsf.conf). This represents the largest process that can be started on the host.
The mem index is an estimate of the real memory currently available to user processes, measured in MB (or in units set in LSF_UNIT_FOR_LIMITS in lsf.conf). This represents the approximate size of the largest process that could be started on a host without causing the host to start paging.
LIM reports the amount of free memory available. LSF calculates free memory as a sum of physical free memory, cached memory, buffered memory, and an adjustment value. The command vmstat also reports free memory but displays these values separately. There may be a difference between the free memory reported by LIM and the free memory reported by vmstat because of virtual memory behavior variations among operating systems. You can write an ELIM that overrides the free memory values that are returned by LIM.
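The sum described above can be written out as a short sketch, which also shows why the LIM figure can exceed the free column of vmstat alone. The function name and parameter names are hypothetical, and because the adjustment value is not specified here, it defaults to 0 as a placeholder.

```python
def lim_free_memory_mb(physical_free, cached, buffered, adjustment=0):
    """Sum the components that make up free memory (all in MB)."""
    return physical_free + cached + buffered + adjustment


if __name__ == "__main__":
    # Hypothetical values: vmstat would list these components separately.
    print(lim_free_memory_mb(physical_free=120, cached=300, buffered=40))
```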
The io index measures I/O throughput to disks attached directly to this host, in KB per second. It does not include I/O to disks that are mounted from other hosts.
The lsinfo -l command displays all information available about load indices in the system. You can also specify load indices on the command line to display information about selected indices:
lsinfo -l swp
RESOURCE_NAME: swp
DESCRIPTION: Available swap space (Mbytes) (alias: swap)
TYPE ORDER INTERVAL BUILTIN DYNAMIC RELEASE
Numeric Dec 60 Yes Yes NO
The lsload -l command displays the values of all load indices, including io and any external load indices configured by your LSF administrator. Without the -l option, lsload displays the default set of indices:
lsload
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostN ok 0.0 0.0 0.1 1% 0.0 1 224 43M 67M 3M
hostK -ok 0.0 0.0 0.0 3% 0.0 3 0 38M 40M 7M
hostF busy 0.1 0.1 0.3 7% *17 6 0 9M 23M 28M
hostG busy *6.2 6.9 9.5 85% 1.1 30 0 5M 400M 385M
hostV unavail
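Output like the above is straightforward to post-process. The following sketch parses the sample into one dictionary per host and collects the indices flagged with an asterisk. It assumes whitespace-separated columns exactly as in the sample; the function name `parse_lsload` and the `over_threshold` key are hypothetical, and real output may need more careful handling.

```python
SAMPLE = """\
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostN ok 0.0 0.0 0.1 1% 0.0 1 224 43M 67M 3M
hostK -ok 0.0 0.0 0.0 3% 0.0 3 0 38M 40M 7M
hostF busy 0.1 0.1 0.3 7% *17 6 0 9M 23M 28M
hostG busy *6.2 6.9 9.5 85% 1.1 30 0 5M 400M 385M
hostV unavail
"""


def parse_lsload(text):
    """Map each host name to its column values (kept as strings)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    hosts = {}
    for line in lines[1:]:
        fields = line.split()
        # Rows for unavail hosts carry only the status column.
        row = dict(zip(header[1:], fields[1:]))
        # An asterisk marks an index that exceeded its threshold.
        row["over_threshold"] = [k for k, v in row.items() if v.startswith("*")]
        hosts[fields[0]] = row
    return hosts


if __name__ == "__main__":
    for host, row in parse_lsload(SAMPLE).items():
        print(host, row["status"], row["over_threshold"])
```

Values are left as strings on purpose, since they carry unit suffixes such as M and % that a caller may want to interpret in its own way.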