lsf.shared

The lsf.shared file contains common definitions that are shared by all load sharing clusters defined by lsf.cluster.cluster_name files. This includes lists of cluster names, host types, host models, the special resources available, and external load indices, including indices required to submit jobs using JSDL files.

This file is installed by default in the directory defined by LSF_CONFDIR.

Changing lsf.shared configuration

After making any changes to lsf.shared, run the following commands:
  • lsadmin reconfig to reconfigure LIM

  • badmin mbdrestart to restart mbatchd

Cluster section

(Required) Lists the cluster names recognized by the LSF system.

Cluster section structure

The first line must contain the mandatory keyword ClusterName. The keyword Servers is otherwise optional, but it must also appear on the first line in a MultiCluster environment.

Each subsequent line defines one cluster.

Example Cluster section

Begin Cluster
ClusterName  Servers
cluster1     hostA
cluster2     hostB
End Cluster

ClusterName

Defines all cluster names recognized by the LSF system.

All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.

By default, if MultiCluster is installed, all clusters listed in this section participate in the same MultiCluster environment. However, individual clusters can restrict their MultiCluster participation by specifying a subset of clusters at the cluster level (lsf.cluster.cluster_name RemoteClusters section).

Servers

MultiCluster only. List of hosts in this cluster that LIMs in remote clusters can connect to and obtain information from.

For other clusters to work with this cluster, one of these hosts must be running mbatchd.
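The structure described above (a keyword line followed by one line per cluster) can be read with a few lines of Python. This is a minimal sketch, not LSF's own parser: the function name parse_section is hypothetical, it assumes that "#" starts a comment, and it only handles columns whose values are single whitespace-free tokens.

```python
def parse_section(text, name):
    """Return the rows of one 'Begin <name> ... End <name>' section as dicts."""
    keywords, rows, inside = None, [], False
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment and can be dropped.
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if fields == ["Begin", name]:
            inside = True
        elif fields == ["End", name]:
            break
        elif inside and keywords is None:
            keywords = fields  # first line of the section: the keywords
        elif inside:
            rows.append(dict(zip(keywords, fields)))  # one row per cluster
    return rows


SAMPLE = """\
Begin Cluster
ClusterName  Servers
cluster1     hostA
cluster2     hostB
End Cluster
"""

clusters = parse_section(SAMPLE, "Cluster")
```

Here clusters holds one dictionary per cluster line, keyed by the keywords ClusterName and Servers.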

MultiCluster shared configuration

A MultiCluster environment allows common configurations to be shared by all clusters. Use #INCLUDE to centralize the configuration work for groups of clusters when they all need to share a common configuration. Using #INCLUDE lets you avoid having to manually merge these common configurations into each local cluster's configuration files.

Local administrators for each cluster open their local configuration files (lsf.shared and lsb.applications) and add #INCLUDE "path_to_file" lines to them. All #INCLUDE lines must be inserted at the beginning of the local configuration file.

For example:

#INCLUDE "/Shared/lsf.shared.common.a"
#INCLUDE "/Shared/lsf.shared.common.c"
Begin Cluster
ClusterName      Servers
...

To make the new configuration active in lsf.shared, restart LSF with the lsfrestart command. Both common resources and local resources will take effect on the local cluster. Once LSF is running again, use the lsinfo command to check whether the configuration is active.
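Because all #INCLUDE lines must come at the beginning of the file, a quick pre-check before restarting LSF can catch misplaced directives. The sketch below is illustrative only: misplaced_includes is a hypothetical helper, and it assumes that blank lines and ordinary "#" comments may appear before the #INCLUDE lines without counting as configuration content.

```python
def misplaced_includes(text):
    """Return 1-based line numbers of #INCLUDE lines that follow other content."""
    seen_content = False
    misplaced = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.upper().startswith("#INCLUDE"):
            if seen_content:
                misplaced.append(number)
        elif not line.startswith("#"):
            # Assumption: plain comments do not count as configuration content.
            seen_content = True
    return misplaced


GOOD = '#INCLUDE "/Shared/lsf.shared.common.a"\nBegin Cluster\nClusterName\n'
BAD = 'Begin Cluster\nClusterName\n#INCLUDE "/Shared/lsf.shared.common.a"\n'
```

An empty result for a file means every #INCLUDE line precedes the first section.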

HostType section

(Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type.

CAUTION:

If you remove NTX86, NTX64, or NTIA64 from the HostType section, the functionality of lspasswd.exe is affected. The lspasswd command registers a password for a Windows user account.

HostType section structure

The first line consists of the mandatory keyword TYPENAME.

Subsequent lines name valid host types.

Example HostType section

Begin HostType
TYPENAME
SOL64
SOLSPARC
LINUX86
LINUXPPC
LINUX64
NTX86
NTX64
NTIA64
End HostType

TYPENAME

Host type names are usually based on a combination of the hardware name and operating system. If your site already has a system for naming host types, you can use the same names for LSF.

HostModel section

(Required) Lists models of machines and gives the relative CPU scaling factor for each model. All hosts of the same relative speed are assigned the same host model.

LSF uses the relative CPU scaling factor to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The CPU factor affects the calculation of job execution time limits and accounting. Using large or inaccurate values for the CPU factor can cause confusing results when CPU time limits or accounting are used.

HostModel section structure

The first line consists of the mandatory keywords MODELNAME, CPUFACTOR, and ARCHITECTURE.

Subsequent lines define a model and its CPU factor.

Example HostModel section

Begin HostModel
MODELNAME    CPUFACTOR   ARCHITECTURE
PC400        13.0        (i86pc_400 i686_400)
PC450        13.2        (i86pc_450 i686_450)
Sparc5F       3.0        (SUNWSPARCstation5_170_sparc)
Sparc20       4.7        (SUNWSPARCstation20_151_sparc)
Ultra5S      10.3        (SUNWUltra5_270_sparcv9 SUNWUltra510_270_sparcv9)
End HostModel

ARCHITECTURE

(Reserved for system use only) Indicates automatically detected host models that correspond to the model names.

CPUFACTOR

Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.
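One way to arrive at such relative factors is to time the same benchmark on each model and scale against the slowest one. The sketch below assumes that approach; the function name cpu_factors, the model names, and the timings are all hypothetical.

```python
def cpu_factors(runtimes):
    """Map model name -> CPU factor, with the slowest model fixed at 1.0.

    runtimes: seconds each model needs for the same benchmark (hypothetical data).
    """
    slowest = max(runtimes.values())
    return {model: round(slowest / seconds, 2) for model, seconds in runtimes.items()}


# Hypothetical timings: ModelB finishes in half the time of ModelA.
factors = cpu_factors({"ModelA": 120.0, "ModelB": 60.0, "ModelC": 40.0})
```

With these timings, the slowest model gets 1.0 and a model that runs twice as fast gets 2.0, matching the guidance above.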

MODELNAME

Generally, you first identify the distinct host types in your system, such as MIPS and SPARC, and then the machine models within each, such as SparcIPC, Sparc1, Sparc2, and Sparc10.

About automatically detected host models and types

When you first install LSF, you do not necessarily need to assign models and types to hosts in lsf.cluster.cluster_name. If you do not assign models and types to hosts in lsf.cluster.cluster_name, LIM automatically detects the model and type for the host.

If you have versions earlier than LSF 4.0, you may already have host models and types assigned to hosts. You can still take advantage of automatic detection of host model and type.

Automatic detection of host model and type is useful because you no longer need to make changes in the configuration files when you upgrade the operating system or hardware of a host and reconfigure the cluster. LSF will automatically detect the change.

Mapping to CPU factors

Automatically detected models are mapped to the short model names in lsf.shared in the ARCHITECTURE column. Model strings in the ARCHITECTURE column are only used for mapping to the short model names.

Example lsf.shared file:
Begin HostModel
MODELNAME   CPUFACTOR     ARCHITECTURE
SparcU5     5.0           (SUNWUltra510_270_sparcv9)
PC486       2.0           (i486_33 i486_66)
PowerPC     3.0           (PowerPC12 PowerPC16 PowerPC31)
End HostModel

If an automatically detected host model cannot be matched with the short model name, it is matched to the best partial match and a warning message is generated.

If a host model cannot be detected or is not supported, it is assigned the DEFAULT model name and an error message is generated.
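The mapping from a detected model string to a short model name and CPU factor can be pictured as a table lookup. The sketch below reuses the example HostModel section above and covers only exact matches and the DEFAULT fallback. Partial matching is left out because its rules are not described here, and the CPU factor of DEFAULT is returned as None because it is not stated either. The names HOST_MODELS and lookup_model are hypothetical.

```python
# MODELNAME -> (CPUFACTOR, ARCHITECTURE strings), copied from the example section.
HOST_MODELS = {
    "SparcU5": (5.0, ["SUNWUltra510_270_sparcv9"]),
    "PC486": (2.0, ["i486_33", "i486_66"]),
    "PowerPC": (3.0, ["PowerPC12", "PowerPC16", "PowerPC31"]),
}


def lookup_model(detected, models=HOST_MODELS):
    """Return (model name, CPU factor) for an automatically detected model string."""
    for name, (factor, architectures) in models.items():
        if detected in architectures:
            return name, factor
    # Not matched: fall back to DEFAULT. Its factor is not given in this
    # document, so None is used as a placeholder (assumption).
    return "DEFAULT", None
```

For example, lookup_model("i486_66") yields the PC486 entry, while an unknown string falls through to DEFAULT.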

Naming convention

Models that are automatically detected are named according to the following convention:
hardware_platform [_processor_speed[_processor_type]]
where:
  • hardware_platform is the only mandatory component

  • processor_speed is the optional clock speed and is used to differentiate computers within a single platform

  • processor_type is the optional processor manufacturer used to differentiate processors with the same speed

  • Underscores (_) between hardware_platform, processor_speed, and processor_type are mandatory.
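The convention can be illustrated by splitting a detected model string on its underscores. This is a sketch under one explicit assumption: hardware_platform itself contains no underscore, so at most two splits are needed. DetectedModel and split_model_string are hypothetical names.

```python
from typing import NamedTuple, Optional


class DetectedModel(NamedTuple):
    hardware_platform: str
    processor_speed: Optional[str]
    processor_type: Optional[str]


def split_model_string(model):
    """Split 'platform[_speed[_type]]' into its three components."""
    # Assumption: the platform part has no underscore of its own.
    parts = model.split("_", 2)
    parts += [None] * (3 - len(parts))  # optional components default to None
    return DetectedModel(*parts)


full = split_model_string("SUNWUltra510_270_sparcv9")
speed_only = split_model_string("i486_33")
platform_only = split_model_string("PowerPC12")
```

The three sample strings come from the ARCHITECTURE columns shown earlier and cover all three forms the convention allows.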

Resource section

Optional. Defines resources (must be done by the LSF administrator).

Resource section structure

The first line consists of the keywords. RESOURCENAME and DESCRIPTION are mandatory. The other keywords are optional. Subsequent lines define resources.

Example Resource section

Begin Resource
RESOURCENAME  TYPE    INTERVAL INCREASING CONSUMABLE DESCRIPTION  # Keywords
   patchrev   Numeric  ()        Y         ()         (Patch revision)
   specman    Numeric  ()        N         ()         (Specman)
   switch     Numeric  ()        Y         N          (Network Switch)
   rack       String   ()        ()        ()         (Server room rack)
   owner      String   ()        ()        ()         (Owner of the host)
   elimres    Numeric  10        Y         ()         (elim generated index)
   ostype     String   ()        ()        ()         (Operating system and version)
   lmhostid   String   ()        ()        ()         (FlexLM's lmhostid)
   limversion String   ()        ()        ()         (Version of LIM binary)
End Resource 

RESOURCENAME

The name you assign to the new resource. An arbitrary character string.
  • A resource name cannot begin with a number.

  • A resource name cannot contain any of the following characters:

    :  .  (  )  [  +  - *  /  !  &  | <  >  @  =
  • A resource name cannot be any of the following reserved names:

    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it 
    mem ncpus define_ncpus_cores define_ncpus_procs 
    define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
  • To avoid conflict with the inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (uppercase or lowercase). Resource requirement strings such as -R "infra" or -R "nano" cause an error. Use -R "defined(infxx)" or -R "defined(nanxx)" to specify these resource names.

  • Resource names are case sensitive.

  • Resource names can be up to 39 characters in length.

  • For Solaris machines, the keyword int is reserved and cannot be used.
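The naming rules above lend themselves to a small checker that can be run before editing lsf.shared. The following is a sketch, not an LSF tool: resource_name_problems is a hypothetical helper, the reserved-name comparison is assumed to be case sensitive like resource names themselves, and the Solaris-only int restriction is not checked.

```python
FORBIDDEN_CHARS = set(":.()[+-*/!&|<>@=")
RESERVED_NAMES = set("""
    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it
    mem ncpus define_ncpus_cores define_ncpus_procs
    define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
""".split())
MAX_LENGTH = 39


def resource_name_problems(name):
    """Return a list of rule violations for a proposed resource name."""
    if not name:
        return ["name is empty"]
    problems = []
    if name[0].isdigit():
        problems.append("begins with a number")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    if name in RESERVED_NAMES:  # assumption: case-sensitive comparison
        problems.append("is a reserved name")
    if name.lower().startswith(("inf", "nan")):
        problems.append("should not begin with inf or nan")
    if len(name) > MAX_LENGTH:
        problems.append("is longer than 39 characters")
    return problems
```

A name such as patchrev passes with an empty list, while names such as 9lives, my-res, mem, or infra each return one problem.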

TYPE

The type of resource:
  • Boolean: Resources that have a value of 1 on hosts that have the resource and 0 otherwise.

  • Numeric: Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor.

  • String: Resources that take string values, such as host type, host model, host status.

Default

If TYPE is not given, the default type is Boolean.

INTERVAL

Optional. Applies to dynamic resources only.

Defines the time interval (in seconds) at which the resource is sampled by the ELIM.

If INTERVAL is defined for a numeric resource, it becomes an external load index.

Default

If INTERVAL is not given, the resource is considered static.
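The TYPE and INTERVAL defaults combine as follows. This sketch assumes that an unspecified column is written as (), as in the example Resource section; classify_resource is a hypothetical name.

```python
UNSET = "()"  # assumption: placeholder for a column that is not given


def classify_resource(type_field=UNSET, interval_field=UNSET):
    """Apply the TYPE and INTERVAL defaults to one Resource section row."""
    rtype = "Boolean" if type_field == UNSET else type_field  # default type
    dynamic = interval_field != UNSET  # no INTERVAL means static
    return {
        "type": rtype,
        "dynamic": dynamic,
        # A numeric resource with an INTERVAL becomes an external load index.
        "external_load_index": dynamic and rtype == "Numeric",
    }


elimres = classify_resource("Numeric", "10")  # row from the example section
rack = classify_resource("String", "()")      # row from the example section
```

In the example section, elimres is the only row with an INTERVAL, so it is the only external load index.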

INCREASING

Applies to numeric resources only.

If a larger value means greater load, INCREASING should be defined as Y. If a smaller value means greater load, INCREASING should be defined as N.

CONSUMABLE

Explicitly controls whether a resource is consumable. Applies to static or dynamic numeric resources.

CONSUMABLE is optional. The defaults for the consumable attribute are:
  • Built-in indices:
    • The following are consumable: r15s, r1m, r15m, ut, pg, io, ls, it, tmp, swp, mem.

    • All other built-in static resources are not consumable (for example, ncpus, ndisks, maxmem, maxswp, maxtmp, cpuf, type, model, status, rexpri, server, hname).

  • External shared resources:
    • All numeric resources are consumable.

    • String and boolean resources are not consumable.

You should only specify consumable resources in the rusage section of a resource requirement string. Non-consumable resources are ignored in rusage sections.

A non-consumable resource should not be releasable. Non-consumable numeric resources can be used in the order, select, and same sections of a resource requirement string.

When LSF_STRICT_RESREQ=Y in lsf.conf, LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.
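The defaults and the rusage behavior described in this section can be summarized in code. This is a simplified model rather than LSF's implementation: the function names are hypothetical, every resource that is not built in is treated as an external shared resource, and an explicit Y or N is honored only for numeric resources.

```python
CONSUMABLE_BUILTINS = {"r15s", "r1m", "r15m", "ut", "pg", "io",
                       "ls", "it", "tmp", "swp", "mem"}


def is_consumable(name, rtype, builtin=False, consumable_field="()"):
    """Resolve the consumable attribute of a resource."""
    if rtype == "Numeric" and consumable_field in ("Y", "N"):
        return consumable_field == "Y"   # explicit CONSUMABLE setting
    if builtin:
        return name in CONSUMABLE_BUILTINS
    return rtype == "Numeric"            # default for external shared resources


def filter_rusage(requested, consumable, strict=False):
    """Return the resources an rusage section effectively uses.

    consumable: dict of resource name -> bool.
    strict models LSF_STRICT_RESREQ=Y: non-consumable resources are rejected
    instead of being ignored.
    """
    kept = []
    for name in requested:
        if consumable[name]:
            kept.append(name)
        elif strict:
            raise ValueError(f"non-consumable resource in rusage: {name}")
    return kept
```

For the example Resource section, switch resolves to non-consumable because of its explicit N, while patchrev falls back to the numeric default and is consumable.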

DESCRIPTION

Brief description of the resource.

The information defined here will be returned by the ls_info() API call or printed out by the lsinfo command as an explanation of the meaning of the resource.

RELEASE

Applies to numeric shared resources only.

Controls whether LSF releases the resource when a job using the resource is suspended. When a job using a shared resource is suspended, the resource is held or released by the job depending on the configuration of this parameter.

Specify N to hold the resource, or specify Y to release the resource.

Default

N