The lsb.hosts file contains host-related configuration information for the server hosts in the cluster. It is also used to define host groups, host partitions, and compute units.
This file is optional. All sections are optional.
By default, this file is installed in LSB_CONFDIR/cluster_name/configdir.
After making any changes to lsb.hosts, run badmin reconfig to reconfigure mbatchd.
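For example, a typical edit-and-apply cycle run from an LSF administrator account might look like the following; badmin ckconfig checks the batch configuration files for errors before the changes are applied:

badmin ckconfig
badmin reconfig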
Optional. Defines the hosts, host types, and host models used as server hosts, and contains per-host configuration information. If this section is not configured, LSF uses all hosts in the cluster (the hosts listed in lsf.cluster.cluster_name) as server hosts.
Use this section to:
Limit the maximum number of jobs run in total
Limit the maximum number of jobs run by each user
Run jobs only under specific load conditions
Run jobs only within specific time windows
The entries in a line for a host override the entries in a line for its model or type.
When you modify the cluster by adding or removing hosts, no changes are made to lsb.hosts. This does not affect the default configuration, but if hosts, host models, or host types are specified in this file, you should check this file whenever you make changes to the cluster and update it manually if necessary.
The first line consists of keywords identifying the load indices that you wish to configure on a per-host basis. The keyword HOST_NAME must be used; the others are optional. Load indices not listed on the keyword line do not affect scheduling decisions.
Each subsequent line describes the configuration information for one host, host model or host type. Each line must contain one entry for each keyword. Use empty parentheses ( ) or a dash (-) to specify the default value for an entry.
Required. Specify the name, model, or type of a host, or the keyword default.
The name of a host defined in lsf.cluster.cluster_name.
A host model defined in lsf.shared.
A host type defined in lsf.shared.
The reserved host name default indicates all hosts in the cluster not otherwise referenced in the section (by name or by listing its model or type).
If C, checkpoint copy is enabled. With checkpoint copy, all opened files are automatically copied to the checkpoint directory by the operating system when a process is checkpointed.
HOST_NAME    CHKPNT
hostA        C
Checkpoint copy is only supported on Cray systems.
No checkpoint copy
The time windows in which jobs from this host, host model, or host type are dispatched. Once dispatched, jobs are no longer affected by the dispatch window.
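For example, a minimal sketch of a dispatch window configuration (the host names are hypothetical): hostA dispatches jobs only overnight, and hostB only from Friday evening to Monday morning:

HOST_NAME    DISPATCH_WINDOW           # Keywords
hostA        (18:00-8:00)
hostB        (5:18:00-1:8:00)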
Not defined (always open)
Specifies a threshold for exited jobs. Specify a number of jobs. If the number of jobs that exit over a period of time specified by JOB_EXIT_RATE_DURATION in lsb.params (5 minutes by default) exceeds the number of jobs you specify as the threshold in this parameter, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.
EXIT_RATE for a specific host overrides a default GLOBAL_EXIT_RATE specified in lsb.params.
The following Host section defines a job exit rate of 20 jobs for all hosts, and an exit rate of 10 jobs on hostA.
Begin Host
HOST_NAME MXJ EXIT_RATE # Keywords
default ! 20
hostA ! 10
End Host
Not defined
Per-user job slot limit for the host. Maximum number of job slots that each user can use on this host.
HOST_NAME JL/U
hostA 2
Unlimited
Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs, in minutes.
LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes. Specify a value of 0 to migrate jobs immediately upon suspension. The migration threshold applies to all jobs running on the host.
A job-level migration threshold specified on the command line overrides the threshold configured in the application profile and the queue, and the application profile configuration overrides the queue-level configuration. When a host migration threshold is also specified and is lower than the job, queue, or application profile value, the host value is used.
Does not affect MultiCluster jobs that are forwarded to a remote cluster.
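For example, a minimal sketch (the host names are hypothetical): checkpointable or rerunnable jobs that stay suspended on hostA for more than 30 minutes are migrated automatically, while suspended jobs on hostB are migrated immediately:

HOST_NAME    MXJ    MIG    # Keywords
hostA        !      30
hostB        !      0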
Not defined. LSF does not migrate checkpointable or rerunnable jobs automatically.
The number of job slots on the host.
With the MultiCluster resource leasing model, this is the number of job slots on the host that are available to the local cluster.
Use ! to make the number of job slots equal to the number of CPUs on a host.
For the reserved host name default, ! makes the number of job slots equal to the number of CPUs on all hosts in the cluster not otherwise referenced in the section.
By default, the number of running and suspended jobs on a host cannot exceed the number of job slots. If preemptive scheduling is used, the suspended jobs are not counted as using a job slot.
On multiprocessor hosts, to fully use the CPU resource, make the number of job slots equal to or greater than the number of processors.
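For example, a minimal sketch (the host names are hypothetical): hostA gets one job slot per CPU, hostB is limited to 8 job slots, and all other hosts get 2 job slots each:

HOST_NAME    MXJ    # Keywords
hostA        !
hostB        8
default      2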
Unlimited
load_index loadSched[/loadStop]
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared (host based) dynamic custom external load index as a column. Specify multiple columns to configure thresholds for multiple load indices.
Scheduling and suspending thresholds for dynamic load indices supported by LIM, including external load indices.
Each load index column must contain either the default entry or two numbers separated by a slash (/), with no white space. The first number is the scheduling threshold for the load index; the second number is the suspending threshold.
Queue-level scheduling and suspending thresholds are defined in lsb.queues. If both files specify thresholds for an index, those that apply are the most restrictive ones.
HOST_NAME mem swp
hostA 100/10 200/30
These thresholds translate into a loadSched condition of
mem>=100 && swp>=200
and a loadStop condition of
mem < 10 || swp < 30
Not defined
Specifies whether the host can be used to run affinity jobs, and if so which CPUs are eligible to do so. The syntax accepts Y, N, a list of CPUs, or a CPU range.
HOST_NAME MXJ r1m AFFINITY
hostA ! () (Y)
HOST_NAME MXJ r1m AFFINITY
hostA ! () (CPU_LIST="1,3,5,7-10")
This configuration enables affinity scheduling on hostA and tells LSF to use only CPUs 1, 3, 5, and 7-10 to run affinity jobs.
HOST_NAME MXJ r1m AFFINITY
hostA ! () (N)
Not defined. Affinity scheduling is not enabled.
Begin Host
HOST_NAME MXJ JL/U r1m pg DISPATCH_WINDOW
hostA 1 - 0.6/1.6 10/20 (5:19:00-1:8:30 20:00-8:30)
Linux 1 - 0.5/2.5 - 23:00-8:00
default 2 1 0.6/1.6 20/40 ()
End Host
Linux is a host type defined in lsf.shared. This example Host section configures one host and one host type explicitly and configures default values for all other load-sharing hosts.
HostA runs one batch job at a time. A job will only be started on hostA if the r1m index is below 0.6 and the pg index is below 10; the running job is stopped if the r1m index goes above 1.6 or the pg index goes above 20. HostA only accepts batch jobs from 19:00 on Friday evening until 8:30 Monday morning and overnight from 20:00 to 8:30 on all other days.
For hosts of type Linux, the pg index does not have host-specific thresholds and such hosts are only available overnight from 23:00 to 8:00.
The entry with host name default applies to each of the other hosts in the cluster. Each host can run up to two jobs at the same time, with at most one job from each user. These hosts are available to run jobs at all times. Jobs may be started if the r1m index is below 0.6 and the pg index is below 20.
Optional. Defines host groups.
The name of the host group can then be used in other host group, host partition, and queue definitions, as well as on the command line. Specifying the name of a host group has exactly the same effect as listing the names of all the hosts in the group.
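For example, assuming a host group named groupA is defined later in this file, a job can be restricted to the hosts in that group directly on the command line (my_job is a placeholder for the actual command):

bsub -m "groupA" my_job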
Host groups are specified in the same format as user groups in lsb.users.
The first line consists of two mandatory keywords, GROUP_NAME and GROUP_MEMBER, as well as optional keywords, CONDENSE and GROUP_ADMIN. Subsequent lines name a group and list its membership.
The sum of all host groups, compute groups, and host partitions cannot be more than 1024.
An alphanumeric string representing the name of the host group.
You cannot use the reserved name all, and group names must not conflict with host names.
Optional. Defines condensed host groups.
Condensed host groups are displayed in a condensed output format for the bhosts and bjobs commands.
If you configure a host to belong to more than one condensed host group, bjobs can display any of the host groups as execution host name.
Y or N.
N (the specified host group is not condensed)
A space-delimited list of host names or previously defined host group names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of hosts and host groups can appear on multiple lines because hosts can belong to multiple groups. The reserved name all specifies all hosts in the cluster. An exclamation mark (!) indicates an externally-defined host group, which the egroup executable retrieves.
You can use string literals and special characters when defining host group members. Each entry cannot contain any spaces, as the list itself is space delimited.
When a leased-in host joins the cluster, the host name is in the form of host@cluster. For these hosts, only the host part of the host name is subject to pattern definitions.
Use a tilde (~) to exclude specified hosts or host groups from the list.
Use an asterisk (*) as a wildcard character to represent any number of characters.
Use square brackets with a hyphen ([integer1 - integer2]) to define a range of non-negative integers at the end of a host name. The first integer must be less than the second integer.
Use square brackets with commas ([integer1, integer2 ...]) to define individual non-negative integers at the end of a host name.
Use square brackets with commas and hyphens (for example, [integer1 - integer2, integer3, integer4 - integer5]) to define different ranges of non-negative integers at the end of a host name.
... (hostA[1-10]B[1-20] hostC[101-120])
... (hostA[1-20] hostC[101-120])
You cannot define subgroups that contain wildcards and special characters.
Host group administrators have the ability to open or close the member hosts for the group they are administering.
The GROUP_ADMIN field is a space-delimited list of user names or previously defined user group names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of users and user groups can appear on multiple lines because users can belong to and administer multiple groups.
Host group administrator rights are inherited. For example, if the user admin2 is an administrator for host group hg1 and host group hg2 is a member of hg1, admin2 is also an administrator for host group hg2.
When host group administrators (who are not also cluster administrators) open or close a host, they must specify a comment with the -C option.
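For example, a host group administrator might close a member host for repairs and reopen it afterward, supplying the required comment each time (the host name and comment text are illustrative):

badmin hclose -C "disk replacement" hostA
badmin hopen -C "repairs complete" hostA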
Any existing user or user group can be specified. A user group whose membership is defined by an external list is also allowed; however, specify the previously defined user group name here rather than the external list indicator (!) itself.
You cannot specify any wildcards or special characters (for example: *, !, $, #, &, ~).
You cannot specify an external group (egroup).
You cannot use the keyword ALL and you cannot administer any group that has ALL as its members.
User names and user group names cannot have spaces.
Begin HostGroup
GROUP_NAME GROUP_MEMBER GROUP_ADMIN
groupA (hostA hostD) (user1 user10)
groupB (hostF groupA hostK) ()
groupC (!) ()
End HostGroup
groupA includes hostA and hostD and can be administered by user1 and user10.
groupB includes hostF and hostK, along with all hosts in groupA. It has no administrators (only the cluster administrator can control the member hosts).
The group membership of groupC is defined externally and retrieved by the egroup executable.
Begin HostGroup
GROUP_NAME GROUP_MEMBER GROUP_ADMIN
groupA (all) ()
groupB (groupA ~hostA ~hostB) (user11 user14)
groupC (hostX hostY hostZ) ()
groupD (groupC ~hostX) (usergroupB)
groupE (all ~groupC ~hostB) ()
groupF (hostF groupC hostK) ()
End HostGroup
groupA contains all hosts in the cluster and is administered by the cluster administrator.
groupB contains all the hosts in the cluster except for hostA and hostB and is administered by user11 and user14.
groupC contains only hostX, hostY, and hostZ and is administered by the cluster administrator.
groupD contains the hosts in groupC except for hostX. Note that hostX must be a member of host group groupC to be excluded from groupD. usergroupB is the administrator for groupD.
groupE contains all hosts in the cluster excluding the hosts in groupC and hostB and is administered by the cluster administrator.
groupF contains hostF, hostK, and the 3 hosts in groupC and is administered by the cluster administrator.
Begin HostGroup
GROUP_NAME CONDENSE GROUP_MEMBER GROUP_ADMIN
groupA N (all) ()
groupB N (hostA hostB) (usergroupC user1)
groupC Y (all) ()
End HostGroup
groupA shows uncondensed output, contains all hosts in the cluster, and is administered by the cluster administrator.
groupB shows uncondensed output, and contains hostA and hostB. It is administered by all members of usergroupC and user1.
groupC shows condensed output, contains all hosts in the cluster, and is administered by the cluster administrator.
Begin HostGroup
GROUP_NAME CONDENSE GROUP_MEMBER GROUP_ADMIN
groupA Y (host*) (user7)
groupB N (*A) ()
groupC N (hostB* ~hostB[1-50]) ()
groupD Y (hostC[1-50] hostC[101-150]) (usergroupJ)
groupE N (hostC[51-100] hostC[151-200]) ()
groupF Y (hostD[1,3] hostD[5-10]) ()
groupG N (hostD[11-50] ~hostD[15,20,25] hostD2) ()
End HostGroup
groupA shows condensed output, and contains all hosts starting with the string host. It is administered by user7.
groupB shows uncondensed output, and contains all hosts ending with the string A, such as hostA, and is administered by the cluster administrator.
groupC shows uncondensed output, and contains all hosts starting with the string hostB except for the hosts from hostB1 to hostB50 and is administered by the cluster administrator.
groupD shows condensed output, and contains all hosts from hostC1 to hostC50 and all hosts from hostC101 to hostC150, and is administered by the members of usergroupJ.
groupE shows uncondensed output, and contains all hosts from hostC51 to hostC100 and all hosts from hostC151 to hostC200 and is administered by the cluster administrator.
groupF shows condensed output, and contains hostD1, hostD3, and all hosts from hostD5 to hostD10 and is administered by the cluster administrator.
groupG shows uncondensed output, and contains all hosts from hostD11 to hostD50 except for hostD15, hostD20, and hostD25. groupG also includes hostD2. It is administered by the cluster administrator.
Optional. Used with host partition user-based fairshare scheduling. Defines a host partition, which defines a user-based fairshare policy at the host level.
Configure multiple sections to define multiple partitions.
The members of a host partition form a host group with the same name as the host partition.
If you configure a host partition, you cannot configure fairshare at the queue level.
Jobs in the queue may sometimes be dispatched to the host partition even when hosts that do not belong to any host partition have a lighter load.
If some hosts belong to one host partition and some hosts belong to another, only the priorities of one host partition are used when dispatching a parallel job to hosts from more than one host partition.
If a resource is shared among hosts included in host partitions and hosts that are not included in any host partition, jobs in queues that use the host partitions will always get the shared resource first, regardless of queue priority.
If a resource is shared among host partitions, jobs in queues that use the host partitions listed first in the HostPartition section of lsb.hosts will always have priority to get the shared resource first. To allocate shared resources among host partitions, LSF considers host partitions in the order they are listed in lsb.hosts.
Each host partition always consists of 3 lines, defining the name of the partition, the hosts included in the partition, and the user share assignments.
HPART_NAME=partition_name
Specifies the name of the partition. The name must be 59 characters or less.
HOSTS=[[~]host_name | [~]host_group | all]...
Specifies the hosts in the partition, in a space-separated list.
A host cannot belong to multiple partitions.
A host group cannot be empty.
Hosts that are not included in any host partition are controlled by the FCFS scheduling policy instead of the fairshare scheduling policy.
Optionally, use the reserved host name all to configure a single partition that applies to all hosts in a cluster.
Optionally, use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition.
HOSTS=all ~hostK ~hostM
The partition includes all the hosts in the cluster except for hostK and hostM.
HOSTS=groupA ~hostL
The partition includes all the hosts in host group groupA except for hostL.
USER_SHARES=[user, number_shares]...
Specify at least one user share assignment.
Enclose each user share assignment in square brackets, as shown.
Separate a list of multiple share assignments with a space between the square brackets.
You can assign shares:
To a single user (specify user_name). To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).
To users in a group, individually (specify group_name@) or collectively (specify group_name). To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\group_name).
To users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others).
By default, when resources are assigned collectively to a group, the group members compete for the resources according to FCFS scheduling. You can use hierarchical fairshare to further divide the shares among the group members.
Specify a positive integer representing the number of shares of the cluster resources assigned to the user.
The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.
Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB
USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1]
End HostPartition
Optional. Defines compute units.
Once defined, the compute unit can be used in other compute unit and queue definitions, as well as in the command line. Specifying the name of a compute unit has the same effect as listing the names of all the hosts in the compute unit.
Compute units are similar to host groups, with the added feature of granularity allowing the construction of structures that mimic the network architecture. Job scheduling using compute unit resource requirements effectively spreads jobs over the cluster based on the configured compute units.
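For example, once compute unit types are configured, a parallel job can ask the scheduler to pack its tasks into as few compute units of a given type as possible by using the cu[] resource requirement string; the sketch below (job command and values are illustrative) requests 32 slots contained within a single rack:

bsub -n 32 -R "cu[type=rack:maxcus=1]" ./parallel_job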
To enforce consistency, compute unit configuration has the following requirements:
Hosts and host groups appear in the finest granularity compute unit type, and nowhere else.
Hosts appear in only one compute unit of the finest granularity.
All compute units of the same type have the same type of compute units (or hosts) as members.
Compute units are specified in the same format as host groups in lsb.hosts.
The first line consists of three mandatory keywords, NAME, MEMBER, and TYPE, as well as optional keywords CONDENSE and ADMIN. Subsequent lines name a compute unit and list its membership.
The sum of all host groups, compute groups, and host partitions cannot be more than 1024.
An alphanumeric string representing the name of the compute unit.
You cannot use the reserved names all, allremote, others, and default. Compute unit names must not conflict with host names, host partitions, or host group names.
Optional. Defines condensed compute units.
Condensed compute units are displayed in a condensed output format for the bhosts and bjobs commands. The condensed compute unit format includes the slot usage for each compute unit.
Y or N.
N (the specified compute unit is not condensed)
A space-delimited list of host names or previously defined compute unit names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of hosts and host groups can appear only once, and only in a compute unit type of the finest granularity.
An exclamation mark (!) indicates an externally-defined host group, which the egroup executable retrieves.
You can use string literals and special characters when defining compute unit members. Each entry cannot contain any spaces, as the list itself is space delimited.
Use a tilde (~) to exclude specified hosts or host groups from the list.
Use an asterisk (*) as a wildcard character to represent any number of characters.
Use square brackets with a hyphen ([integer1 - integer2]) to define a range of non-negative integers at the end of a host name. The first integer must be less than the second integer.
Use square brackets with commas ([integer1, integer2...]) to define individual non-negative integers at the end of a host name.
Use square brackets with commas and hyphens (for example, [integer1 - integer2, integer3, integer4 - integer5]) to define different ranges of non-negative integers at the end of a host name.
... (enclA[1-10]B[1-20] enclC[101-120])
... (enclA[1-20] enclC[101-120])
Compute unit names cannot be used in compute units of the finest granularity.
You cannot include host or host group names except in compute units of the finest granularity.
You must not skip levels of granularity. For example:
If lsb.params contains COMPUTE_UNIT_TYPES=enclosure rack cabinet then a compute unit of type cabinet can contain compute units of type rack, but not of type enclosure.
The keywords all, allremote, all@cluster, others, and default cannot be used when defining compute units.
The type of the compute unit, as defined in the COMPUTE_UNIT_TYPES parameter of lsb.params.
Compute unit administrators have the ability to open or close the member hosts for the compute unit they are administering.
The ADMIN field is a space-delimited list of user names or previously defined user group names, enclosed in one pair of parentheses.
You cannot use more than one pair of parentheses to define the list.
The names of users and user groups can appear on multiple lines because users can belong to and administer multiple compute units.
Compute unit administrator rights are inherited. For example, if the user admin2 is an administrator for compute unit cu1 and compute unit cu2 is a member of cu1, admin2 is also an administrator for compute unit cu2.
When compute unit administrators (who are not also cluster administrators) open or close a host, they must specify a comment with the -C option.
Any existing user or user group can be specified. A user group whose membership is defined by an external list is also allowed; however, specify the previously defined user group name here rather than the external list indicator (!) itself.
You cannot specify any wildcards or special characters (for example: *, !, $, #, &, ~).
You cannot specify an external group (egroup).
You cannot use the keyword ALL and you cannot administer any group that has ALL as its members.
User names and user group names cannot have spaces.
Begin ComputeUnit
NAME MEMBER TYPE
encl1 (host1 host2) enclosure
encl2 (host3 host4) enclosure
encl3 (host5 host6) enclosure
encl4 (host7 host8) enclosure
rack1 (encl1 encl2) rack
rack2 (encl3 encl4) rack
cbnt1 (rack1 rack2) cabinet
End ComputeUnit
encl1, encl2, encl3, and encl4 are the finest granularity, and each contains two hosts.
rack1 is of coarser granularity and contains two levels. At the enclosure level rack1 contains encl1 and encl2. At the lowest level rack1 contains host1, host2, host3, and host4.
rack2 has the same structure as rack1, and contains encl3 and encl4.
cbnt1 contains two racks (rack1 and rack2), four enclosures (encl1, encl2, encl3, and encl4) and all eight hosts. Compute unit cbnt1 is the coarsest granularity in this example.
Begin ComputeUnit
NAME CONDENSE MEMBER TYPE ADMIN
encl1 Y (hg123 ~hostA ~hostB) enclosure (user11 user14)
encl2 Y (hg456) enclosure ()
encl3 N (hostA hostB) enclosure (usergroupB)
encl4 N (hgroupX ~hostB) enclosure ()
encl5 Y (hostC* ~hostC[101-150]) enclosure (usergroupJ)
encl6 N (hostC[101-150]) enclosure ()
rack1 Y (encl1 encl2 encl3) rack ()
rack2 N (encl4 encl5) rack (usergroupJ)
rack3 N (encl6) rack ()
cbnt1 Y (rack1 rack2) cabinet ()
cbnt2 N (rack3) cabinet (user14)
End ComputeUnit
All six enclosures (finest granularity) contain only hosts and host groups. All three racks contain only enclosures. Both cabinets (coarsest granularity) contain only racks.
encl1 contains all the hosts in host group hg123 except for hostA and hostB and is administered by user11 and user14. Note that hostA and hostB must be members of host group hg123 to be excluded from encl1. encl1 shows condensed output.
encl2 contains host group hg456 and is administered by the cluster administrator. encl2 shows condensed output.
encl3 contains hostA and hostB. usergroupB is the administrator for encl3. encl3 shows uncondensed output.
encl4 contains host group hgroupX except for hostB. Since each host can appear in only one enclosure and hostB is already in encl3, it cannot be in encl4. encl4 is administered by the cluster administrator. encl4 shows uncondensed output.
encl5 contains all hosts starting with the string hostC except for hosts hostC101 to hostC150, and is administered by usergroupJ. encl5 shows condensed output.
rack1 contains encl1, encl2, and encl3. rack1 shows condensed output.
rack2 contains encl4 and encl5. rack2 shows uncondensed output.
rack3 contains encl6. rack3 shows uncondensed output.
cbnt1 contains rack1 and rack2. cbnt1 shows condensed output.
cbnt2 contains rack3. Even though rack3 only contains encl6, cbnt2 cannot contain encl6 directly because that would mean skipping the level associated with compute unit type rack. cbnt2 shows uncondensed output.
Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.hosts by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command.
The expressions are evaluated by LSF every 10 minutes based on mbatchd start time. When an expression evaluates true, LSF dynamically changes the configuration based on the associated configuration statements. Reconfiguration is done in real time without restarting mbatchd, providing continuous system availability.
Begin Host
HOST_NAME r15s r1m pg
host1 3/5 3/5 12/20
#if time(5:16:30-1:8:30 20:00-8:30)
host2 3/5 3/5 12/20
#else
host2 2/3 2/3 10/12
#endif
host3 3/5 3/5 12/20
End Host