Enable LSF HPC Features

HPC features are installed on UNIX or Linux hosts as part of the PARALLEL template. The installation makes some configuration changes automatically. After installation, you should add the appropriate resource names under the RESOURCES column of the Host section of lsf.cluster.cluster_name.

The HPC feature installation automatically configures the following files:

  • lsb.modules

  • lsb.resources

  • lsb.queues

  • lsf.cluster.cluster_name

  • lsf.conf

  • lsf.shared

lsb.modules

  • Adds the external scheduler plugin module names to the PluginModule section of lsb.modules:

Begin PluginModule
SCH_PLUGIN          RB_PLUGIN   SCH_DISABLE_PHASES 
schmod_default        ()                 ()
schmod_fcfs           ()                 ()
schmod_fairshare      ()                 ()
schmod_limit          ()                 ()
schmod_parallel       ()                 ()
schmod_reserve        ()                 ()
schmod_mc             ()                 ()
schmod_preemption     ()                 ()
schmod_advrsv         ()                 ()
schmod_ps             ()                 ()
schmod_affinity       ()                 ()
#schmod_dc            ()                 ()
schmod_aps            ()                 ()
schmod_cpuset         ()                 ()
End PluginModule
Note:

The HPC plugin names must be configured after the standard LSF plugin names in the PluginModule list.
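
The ordering rule in the note can be checked mechanically. The following is a minimal Python sketch, not an LSF tool: it reads the PluginModule section from the text of lsb.modules and reports any standard plugin listed after the first HPC plugin. The function names are hypothetical, and the set of names treated as HPC plugins (here only schmod_cpuset) is an assumption made for illustration; supply the set that applies to your installation.

```python
def plugin_names(lsb_modules_text):
    """Return plugin names from the PluginModule section, in file order."""
    names = []
    in_section = False
    for raw in lsb_modules_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and commented-out rows
        if not line:
            continue
        if line.lower() == "begin pluginmodule":
            in_section = True
        elif line.lower() == "end pluginmodule":
            in_section = False
        elif in_section and not line.startswith("SCH_PLUGIN"):
            names.append(line.split()[0])  # first column is the plugin name
    return names


def misplaced_standard_plugins(names, hpc_plugins):
    """Return standard plugins that appear after the first HPC plugin."""
    first_hpc = next((i for i, n in enumerate(names) if n in hpc_plugins), None)
    if first_hpc is None:
        return []
    return [n for n in names[first_hpc:] if n not in hpc_plugins]


# Deliberately mis-ordered sample: schmod_fcfs follows an HPC plugin.
SAMPLE = """\
Begin PluginModule
SCH_PLUGIN          RB_PLUGIN   SCH_DISABLE_PHASES
schmod_default        ()                 ()
schmod_cpuset         ()                 ()
schmod_fcfs           ()                 ()
#schmod_dc            ()                 ()
End PluginModule
"""

HPC_PLUGINS = {"schmod_cpuset"}  # assumption: which names count as HPC plugins

print(misplaced_standard_plugins(plugin_names(SAMPLE), HPC_PLUGINS))
# prints ['schmod_fcfs']
```

An empty list means every standard plugin precedes the HPC plugins, which is the order the note requires.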

lsb.resources

For IBM POE jobs, lsfinstall configures the ReservationUsage section in lsb.resources to reserve HPS resources on a per-slot basis.

Resource usage defined in the ReservationUsage section overrides the cluster-wide RESOURCE_RESERVE_PER_SLOT parameter defined in lsb.params if it also exists.

Begin ReservationUsage
RESOURCE           METHOD
adapter_windows    PER_SLOT
ntbl_windows       PER_SLOT
End ReservationUsage
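
The override rule described above can be illustrated with a short Python sketch. It parses a ReservationUsage section and resolves the reservation method for a resource: an entry in ReservationUsage wins, and the cluster-wide RESOURCE_RESERVE_PER_SLOT setting applies only when there is no entry. The function names are hypothetical, and the fallback label PER_HOST for RESOURCE_RESERVE_PER_SLOT=N is an assumption of this sketch rather than something stated in this section.

```python
def parse_reservation_usage(lsb_resources_text):
    """Map each RESOURCE in the ReservationUsage section to its METHOD."""
    usage = {}
    in_section = False
    for raw in lsb_resources_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if not fields:
            continue
        joined = " ".join(fields).lower()
        if joined == "begin reservationusage":
            in_section = True
        elif joined == "end reservationusage":
            in_section = False
        elif in_section and fields[0] != "RESOURCE" and len(fields) >= 2:
            usage[fields[0]] = fields[1]
    return usage


def effective_method(resource, usage, resource_reserve_per_slot):
    """ReservationUsage overrides the cluster-wide lsb.params setting."""
    if resource in usage:
        return usage[resource]
    # Assumption: without an entry, the cluster-wide flag decides.
    return "PER_SLOT" if resource_reserve_per_slot else "PER_HOST"


SAMPLE = """\
Begin ReservationUsage
RESOURCE           METHOD
adapter_windows    PER_SLOT
End ReservationUsage
"""

usage = parse_reservation_usage(SAMPLE)
# adapter_windows stays PER_SLOT even if the cluster-wide flag is off.
print(effective_method("adapter_windows", usage, resource_reserve_per_slot=False))
# prints PER_SLOT
```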

lsb.queues

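Configures the hpc_linux and hpc_linux_tv queues for Linux jobs, the hpc_ibm queue for IBM POE jobs, and the hpc_ibm_tv queue for debugging IBM POE jobs. Parameters that begin with # are commented out and have no effect until you uncomment them: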

Begin Queue
QUEUE_NAME   = hpc_linux
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0   # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000     # jobs data segment limit
#CORELIMIT    = 20000
#PROCLIMIT    = 5         # job processor limit
#USERS        = all       # users who can submit jobs to this queue
#HOSTS        = all       # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
DESCRIPTION  = IBM Platform LSF 9.1 for linux.
End Queue
 
Begin Queue
QUEUE_NAME   = hpc_linux_tv
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0    # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000     # jobs data segment limit
#CORELIMIT    = 20000
#PROCLIMIT    = 5         # job processor limit
#USERS        = all       # users who can submit jobs to this queue
#HOSTS        = all       # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE = NO
INTERACTIVE = NO
DESCRIPTION  = IBM Platform LSF 9.1 for linux debug queue.
End Queue
 
Begin Queue
QUEUE_NAME   = hpc_ibm
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0    # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA  # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000      # jobs data segment limit
#CORELIMIT    = 20000
#PROCLIMIT    = 5          # job processor limit
#USERS        = all        # users who can submit jobs to this queue
#HOSTS        = all        # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
RES_REQ = select[ poe > 0 ]
EXCLUSIVE = Y
REQUEUE_EXIT_VALUES = 133 134 135
DESCRIPTION  = IBM Platform LSF 9.1 for IBM. This queue is to run POE jobs ONLY.
End Queue
 
Begin Queue
QUEUE_NAME   = hpc_ibm_tv
PRIORITY     = 30
NICE         = 20
#RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
#r1m         = 0.7/2.0    # loadSched/loadStop
#r15m         = 1.0/2.5
#pg           = 4.0/8
#ut           = 0.2
#io           = 50/240
#CPULIMIT     = 180/hostA  # 3 hours of host hostA
#FILELIMIT    = 20000
#DATALIMIT    = 20000      # jobs data segment limit
#CORELIMIT    = 20000
#PROCLIMIT    = 5          # job processor limit
#USERS        = all        # users who can submit jobs to this queue
#HOSTS        = all        # hosts on which jobs in this queue can run
#PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v Hey
RES_REQ = select[ poe > 0 ]
REQUEUE_EXIT_VALUES = 133 134 135
TERMINATE_WHEN = LOAD PREEMPT WINDOW
RERUNNABLE = NO
INTERACTIVE = NO
DESCRIPTION  = IBM Platform LSF 9.1 for IBM debug queue. This queue is to run POE jobs ONLY.
End Queue
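
Because most parameters in these queue definitions are commented out, it can help to list only the settings that are actually active. The following Python sketch is one way to do that, assuming the lsb.queues text is already in a string. The function name is hypothetical, and the parser is deliberately simple: it treats everything after a # as a comment, so it would mishandle a value that legitimately contains that character.

```python
def parse_queues(lsb_queues_text):
    """Return one dict of active (uncommented) parameters per Queue section."""
    queues = []
    current = None
    for raw in lsb_queues_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and commented-out lines
        if not line:
            continue
        if line.lower() == "begin queue":
            current = {}
        elif line.lower() == "end queue":
            if current is not None:
                queues.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return queues


SAMPLE = """\
Begin Queue
QUEUE_NAME   = hpc_ibm
PRIORITY     = 30
#PROCLIMIT    = 5          # job processor limit
RES_REQ = select[ poe > 0 ]
EXCLUSIVE = Y
End Queue
"""

for queue in parse_queues(SAMPLE):
    print(queue["QUEUE_NAME"], sorted(queue))
# prints hpc_ibm ['EXCLUSIVE', 'PRIORITY', 'QUEUE_NAME', 'RES_REQ']
```

In the sample, PROCLIMIT does not appear in the result because its line is commented out.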

lsf.cluster.cluster_name

For IBM POE jobs, configures the ResourceMap section of lsf.cluster.cluster_name to map the following shared resources for POE jobs to all hosts in the cluster:

Begin ResourceMap
RESOURCENAME        LOCATION
adapter_windows     [default]
ntbl_windows        [default]
poe                 [default]
dedicated_tasks     (0@[default])
ip_tasks            (0@[default])
us_tasks            (0@[default])
End ResourceMap

lsf.conf

  • LSB_SUB_COMMANDNAME=Y: Enables the LSF_SUB_COMMANDLINE environment variable required by esub.

  • LSF_ENABLE_EXTSCHEDULER=Y: LSF uses an external scheduler for topology-aware external scheduling.

  • LSB_CPUSET_BESTCPUS=Y: LSF schedules jobs based on the shortest CPU radius in the processor topology using a best-fit algorithm.

  • LSF_VPLUGIN="/opt/mpi/lib/pa1.1/libmpirm.sl": On HP-UX hosts, sets the full path to the HP vendor MPI library libmpirm.sl.

  • LSB_RLA_PORT=port_number, where port_number is the TCP port used for communication between the LSF HPC topology adapter (RLA) and sbatchd. The default port number is 6883.

  • LSB_SHORT_HOSTLIST=1: Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the format processes*hostA.
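
To make the processes*hostA notation concrete, the following Python sketch condenses a per-process host list into that form. It only illustrates the notation and is not how bjobs or bhist produce their output. The function name is hypothetical, and two details are assumptions of the sketch: entries are separated by a space, and a host running a single process is shown without a count.

```python
from collections import Counter


def short_hostlist(hosts):
    """Condense a per-process host list into processes*host notation."""
    counts = Counter(hosts)  # keeps first-seen order of hosts (Python 3.7+)
    parts = []
    for host, n in counts.items():
        # Assumption: a single process on a host is shown as the bare host name.
        parts.append(f"{n}*{host}" if n > 1 else host)
    return " ".join(parts)


print(short_hostlist(["hostA", "hostA", "hostA", "hostB"]))
# prints 3*hostA hostB
```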

lsf.shared

Defines the following shared resources required by HPC features in lsf.shared:

Begin Resource
RESOURCENAME    TYPE    INTERVAL INCREASING  DESCRIPTION       # Keywords
slurm           Boolean    ()    ()          (SLURM)
cpuset          Boolean    ()    ()          (CPUSET)
mpich_gm        Boolean    ()    ()          (MPICH GM MPI)
lammpi          Boolean    ()    ()          (LAM MPI)
mpichp4         Boolean    ()    ()          (MPICH P4 MPI)
mvapich         Boolean    ()    ()          (Infiniband MPI)
sca_mpimon      Boolean    ()    ()          (SCALI MPI)
ibmmpi          Boolean    ()    ()          (IBM POE MPI)
hpmpi           Boolean    ()    ()          (HP MPI)
intelmpi        Boolean    ()    ()          (Intel MPI)
crayxt3         Boolean    ()    ()          (Cray XT3 MPI)
crayx1          Boolean    ()    ()          (Cray X1 MPI)
fluent          Boolean    ()    ()          (fluent availability)
ls_dyna         Boolean    ()    ()          (ls_dyna availability)
nastran         Boolean    ()    ()          (nastran availability)
pvm             Boolean    ()    ()          (pvm availability)
openmp          Boolean    ()    ()          (openmp availability)
ansys           Boolean    ()    ()          (ansys availability)
blast           Boolean    ()    ()          (blast availability)
gaussian        Boolean    ()    ()          (gaussian availability)
lion            Boolean    ()    ()          (lion availability)
scitegic        Boolean    ()    ()          (scitegic availability)
schroedinger    Boolean    ()    ()          (schroedinger availability)
hmmer           Boolean    ()    ()          (hmmer availability)
adapter_windows Numeric    30    N           (free adapter windows on css0 on IBM SP)
ntbl_windows    Numeric    30    N           (free ntbl windows on IBM HPS)
poe             Numeric    30    N           (poe availability)
css0            Numeric    30    N           (free adapter windows on css0 on IBM SP)
csss            Numeric    30    N           (free adapter windows on csss on IBM SP)
dedicated_tasks Numeric    ()    Y           (running dedicated tasks)
ip_tasks        Numeric    ()    Y           (running IP tasks)
us_tasks        Numeric    ()    Y           (running US tasks)
End Resource
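
Every resource mapped in the ResourceMap section of lsf.cluster.cluster_name has to be defined in the Resource section of lsf.shared. The following Python sketch shows one way to cross-check the two files, assuming their contents are already in strings. The function names are hypothetical, and the parser assumes that the first non-comment row of each section is its column header.

```python
def section_names(text, section):
    """Return the first-column names from a Begin/End <section> block."""
    names = []
    inside = False
    header_seen = False
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if not fields:
            continue
        lowered = [f.lower() for f in fields]
        if lowered == ["begin", section.lower()]:
            inside, header_seen = True, False
        elif lowered == ["end", section.lower()]:
            inside = False
        elif inside:
            if not header_seen:
                header_seen = True  # assumption: first row is the column header
            else:
                names.append(fields[0])
    return names


def undefined_resources(lsf_shared_text, lsf_cluster_text):
    """Return ResourceMap names that lsf.shared does not define."""
    defined = set(section_names(lsf_shared_text, "Resource"))
    mapped = section_names(lsf_cluster_text, "ResourceMap")
    return [name for name in mapped if name not in defined]


SHARED = """\
Begin Resource
RESOURCENAME    TYPE    INTERVAL INCREASING  DESCRIPTION       # Keywords
poe             Numeric    30    N    (poe availability)
adapter_windows Numeric    30    N    (free adapter windows on css0 on IBM SP)
End Resource
"""

CLUSTER = """\
Begin ResourceMap
RESOURCENAME        LOCATION
adapter_windows     [default]
poe                 [default]
us_tasks            (0@[default])
End ResourceMap
"""

print(undefined_resources(SHARED, CLUSTER))
# prints ['us_tasks']
```

The sample lsf.shared text omits us_tasks on purpose so that the check has something to report. With the full Resource section shown above, all six mapped resources are defined.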