
LSF Integration with Cray Linux

This topic applies to LSF 8.0 or later integration with Cray Linux Environment 4.0 or later. You must have LSF Standard (LSF must not be running in Express mode).

Download and Installation

  1. Download the installation package and the distribution tar file for the LSF/Cray Linux integration (on Cray XT/XE/XC). For example, in an LSF Version <version> release, the following files are needed:

    • lsf<version>_lnx26-lib23-x64-cray.tar.Z

    • lsf<version>_lsfinstall.tar.Z

      If you install on a Linux host, you can download lsf<version>_lsfinstall_linux_x86_64.tar.Z. If you install LSF 9.1.2 on a Linux host, you can download lsf<version>_no_jre_lsfinstall.tar.Z. These two installation packages are smaller because the first includes only the Linux version of the JRE package and the second does not include the JRE package at all.
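
      For example, a minimal sketch of preparing the installer (the /tmp/lsf_distrib directory is only an example location; file names depend on your LSF version):

      # Assumption: both packages were downloaded to /tmp/lsf_distrib
      cd /tmp/lsf_distrib
      zcat lsf<version>_lsfinstall.tar.Z | tar xvf -
      cd lsf<version>_lsfinstall
      # Keep the distribution tar file (for example,
      # lsf<version>_lnx26-lib23-x64-cray.tar.Z) compressed in the same
      # directory as the installer package; if needed, point LSF_TARDIR in
      # install.config at its location.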

  2. Before running the installation, confirm the Cray Linux system is working:

    1. On CLE 4.0 or above, confirm the existence of /opt/cray/rca/default/bin/rca-helper, /etc/xthostname and /etc/opt/cray/sdb/node_classes. Otherwise, confirm that the xtuname and xthostname commands exist and are in the $PATH.

    2. Confirm that all compute PEs are in batch mode. If not, switch all compute PEs to batch mode and restart ALPS services on the boot node:

      • xtprocadmin -k m batch

      • /etc/init.d/alps restart (optional)

      • apstat -rn (optional)

  3. Follow the standard LSF installation procedure to install LSF on the boot nodes:

    1. Run the xtopview command to switch to a shared root file system.

    2. Edit the install.config file:

      • LSF_TOP=/software/lsf

      • LSF_CLUSTER_NAME=<crayxt machine name>

      • LSF_MASTER_LIST=<master host candidates> # login nodes or service nodes

      • EGO_DAEMON_CONTROL=N

      • ENABLE_DYNAMIC_HOSTS=N

      • LSF_ADD_SERVERS=<service or login nodes>

      • ENABLE_HPC_CONFIG=Y # if you are installing LSF 9.1.1 or earlier versions

      • CONFIGURATION_TEMPLATE=PARALLEL # if you are installing LSF 9.1.2 or later versions

      LSF_MASTER_LIST and LSF_ADD_SERVERS should only include login nodes or service nodes.

      The startup/shutdown script for LSF daemons can be found in $LSF_SERVERDIR/lsf_daemons.
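
      A minimal sketch of running the installer with this configuration (run it from the extracted lsf<version>_lsfinstall directory, inside the xtopview shared-root session):

      # install.config contains the parameters listed above
      ./lsfinstall -f install.config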

    3. If you want to join the Cray Linux machine to an existing cluster, refer to the Upgrade/Migration instructions.

  4. As LSF administrator:

    1. Add the following to /opt/xt-boot/default/etc/serv_cmd:

      • service_cmd_info='LSF-HPC',service_num=XXX,heartbeat=null

      • start_cmd='<$LSF_SERVERDIR>/lsf_daemons start'

      • stop_cmd='<$LSF_SERVERDIR>/lsf_daemons stop'

      • restart_cmd='<$LSF_SERVERDIR>/lsf_daemons restart'

      • fail_cmd='<$LSF_SERVERDIR>/lsf_daemons stop'

    2. Create a service command: xtservcmd2db -f /opt/xt-boot/default/etc/serv_cmd.

    3. Assign the LSF-HPC service to the login node class: xtservconfig -c login add LSF-HPC.

    4. Exit xtopview and access a login node:

      • Make sure /ufs is shared among all login/service nodes and that root and the LSF administrators have write permission.

      • Create sub-directories under /ufs that mirror /opt/xt-lsfhpc/log and /opt/xt-lsfhpc/work (see the "File Structure" section for details).

      • Make sure the directory ownership and permission modes are preserved (you can use the cp -r command), and that root and the LSF administrators have write permission to the sub-directories under /ufs/lsfhpc.
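
      A minimal sketch of this setup, assuming the /opt/xt-lsfhpc layout referenced above and an LSF administrator account named lsfadmin (both are assumptions):

      mkdir -p /ufs/lsfhpc
      # -p preserves ownership and permission modes in addition to -r
      cp -rp /opt/xt-lsfhpc/log  /ufs/lsfhpc/log
      cp -rp /opt/xt-lsfhpc/work /ufs/lsfhpc/work
      chown -R lsfadmin /ufs/lsfhpc/log /ufs/lsfhpc/work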

  5. Use the module command to set the LSF environment variables: module load xt-lsfhpc
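
    A quick sanity check after loading the module (a sketch; lsid and LSF_ENVDIR are standard LSF names):

    module load xt-lsfhpc
    which lsid            # should resolve to a binary under the LSF installation
    echo $LSF_ENVDIR      # should point at the LSF configuration directory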

Configuration

  1. Modify $LSF_ENVDIR/lsf.conf (some of the parameters may have been added by LSF installer):

    • LSB_SHAREDIR=/ufs/lsfhpc/work # A shared file system that is accessible by root and LSF admin on both master hosts and Cray Linux login/service nodes.

    • LSF_LOGDIR=/ufs/lsfhpc/log # A shared file system that is accessible by root and LSF admin on both master hosts and Cray Linux login/service nodes.

    • LSF_LIVE_CONFDIR=/ufs/lsfhpc/work/<cluster_name>/live_confdir # A shared file system that is accessible by root and LSF admin on both master hosts and Cray Linux login/service nodes.

    • LSB_RLA_PORT=21787 # a unique port

    • LSB_SHORT_HOSTLIST=1

    • LSF_ENABLE_EXTSCHEDULER=Y

    • LSB_SUB_COMMANDNAME=Y

    • LSF_CRAY_PS_CLIENT=/usr/bin/apbasil

    • LSF_LIMSIM_PLUGIN="liblimsim_craylinux"

    • LSF_CRAYLINUX_FRONT_NODES="nid00060 nid00062" # A list of Cray Linux login/service nodes with LSF daemons started and running.

    • LSF_CRAYLINUX_FRONT_NODES_POLL_INTERVAL=120 # Interval at which the master LIM polls RLA for compute node status and configuration information. The default value is 120 seconds; any value less than 120 seconds is reset to the default.

    • LSB_MIG2PEND=1

  2. Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.modules. Make sure schmod_craylinux is the last plug-in module and schmod_crayxt3 is commented out. If you do not use the MultiCluster feature or the CPUSET integration, comment out schmod_mc and schmod_cpuset as well.

    Begin PluginModule
    SCH_PLUGIN        RB_PLUGIN   SCH_DISABLE_PHASES
    schmod_default     ()         ()
    schmod_fcfs        ()         ()
    schmod_fairshare   ()         ()
    schmod_limit       ()         ()
    schmod_parallel    ()         ()
    schmod_reserve     ()         ()
    #schmod_mc         ()         ()
    schmod_preemption  ()         ()
    schmod_advrsv      ()         ()
    schmod_ps          ()         ()
    schmod_aps         ()         ()
    #schmod_cpuset     ()         ()
    #schmod_crayxt3    ()         ()
    schmod_craylinux   ()         ()
    End PluginModule
  3. From a login node, run $LSF_BINDIR/genVnodeConf. This command generates a list of compute nodes in BATCH mode. Add the compute nodes to the Host section in $LSF_ENVDIR/lsf.cluster.<clustername>.

    HOSTNAME  model   type   server    r1m  mem  swp  RESOURCES
    nid00038     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00039     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00040     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00041     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00042     !     !       1       3.5   ()  ()   (craylinux vnode gpu)
    nid00043     !     !       1       3.5   ()  ()   (craylinux vnode gpu)
    nid00044     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00045     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00046     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00047     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00048     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00049     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00050     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00051     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00052     !     !       1       3.5   ()  ()   (craylinux vnode gpu)
    nid00053     !     !       1       3.5   ()  ()   (craylinux vnode gpu)
    nid00054     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00055     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00056     !     !       1       3.5   ()  ()   (craylinux vnode)
    nid00057     !     !       1       3.5   ()  ()   (craylinux vnode)
  4. Configure $LSF_ENVDIR/hosts. Make sure the IP addresses of the compute nodes do not conflict with any IP addresses already in use.
    cat $LSF_ENVDIR/hosts
     
    172.25.235.55  amd07.lsf.platform.com amd07
    10.128.0.34    nid00033     c0-0c1s0n3  sdb001  sdb002
    10.128.0.61    nid00060     c0-0c1s1n0  login   login1  castor-p2
    10.128.0.36    nid00035     c0-0c1s1n3
    10.128.0.59    nid00058     c0-0c1s2n0
    10.128.0.38    nid00037     c0-0c1s2n3
    10.128.0.57    nid00056     c0-0c1s3n0
    10.128.0.58    nid00057     c0-0c1s3n1
    10.128.0.39    nid00038     c0-0c1s3n2
    10.128.0.40    nid00039     c0-0c1s3n3
    10.128.0.55    nid00054     c0-0c1s4n0
    10.128.0.56    nid00055     c0-0c1s4n1
    10.128.0.41    nid00040     c0-0c1s4n2
    10.128.0.42    nid00041     c0-0c1s4n3
    10.128.0.53    nid00052     c0-0c1s5n0
    10.128.0.54    nid00053     c0-0c1s5n1
    10.128.0.43    nid00042     c0-0c1s5n2
    10.128.0.44    nid00043     c0-0c1s5n3
    10.128.0.51    nid00050     c0-0c1s6n0
    10.128.0.52    nid00051     c0-0c1s6n1
    10.128.0.45    nid00044     c0-0c1s6n2
    10.128.0.46    nid00045     c0-0c1s6n3
    10.128.0.49    nid00048     c0-0c1s7n0
    10.128.0.50    nid00049     c0-0c1s7n1
    10.128.0.47    nid00046     c0-0c1s7n2
    10.128.0.48    nid00047     c0-0c1s7n3
    10.131.255.251 sdb sdb-p2 syslog ufs
  5. Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.hosts. Make sure Cray Linux login/service nodes that are also LSF server hosts have a large number set in the MXJ column (larger than the total number of PEs).

    Begin Host
    HOST_NAME   MXJ     r1m  pg  ls  tmp  DISPATCH_WINDOW   # Keywords
     nid00060   9999    ()   ()  ()  ()   ()                # Example
     nid00062   9999    ()   ()  ()  ()   ()                # Example
     default     !      ()   ()  ()  ()   ()                # Example
    End Host

    In LSF 9.1.2 or above, you need to disable AFFINITY on Cray compute nodes.
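
    A minimal sketch of one way to do this, assuming the AFFINITY column in the Host section of lsb.hosts is used to turn affinity off on the compute nodes (verify the exact keyword against the lsb.hosts reference for your LSF version):

    Begin Host
    HOST_NAME   MXJ     AFFINITY    # Keywords
     nid00060   9999    (Y)         # login/service node
     nid00062   9999    (Y)         # login/service node
     default     !      (N)         # compute nodes: affinity disabled
    End Host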

  6. Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.queues.

    • JOB_CONTROLS and RERUNNABLE are required.

    • Comment out all loadSched/loadStop lines.

    • DEF_EXTSCHED and MANDATORY_EXTSCHED are optional.

    • PRE_EXEC and POST_EXEC are required to run CCM jobs.

    • Refer to the Cray guide to find the scripts (a sketch of a CCM queue follows the queue examples below).

    Begin Queue
      QUEUE_NAME   = normal
      PRIORITY     = 30
      NICE         = 20
      PREEMPTION   = PREEMPTABLE
      JOB_CONTROLS = SUSPEND[bmig $LSB_BATCH_JID]
      RERUNNABLE   = Y
      #RUN_WINDOW  = 5:19:00-1:8:30 20:00-8:30
      #r1m         = 0.7/2.0   # loadSched/loadStop
      #r15m        = 1.0/2.5
      #pg          = 4.0/8
      #ut          = 0.2
      #io          = 50/240
      #CPULIMIT    = 180/hostA  # 3 hours of hostA
      #FILELIMIT   = 20000
      #DATALIMIT   = 20000  # jobs data segment limit
      #CORELIMIT   = 20000
      #PROCLIMIT   = 5   # job processor limit
      #USERS       = all   # users who can submit jobs to this queue
      #HOSTS       = all   # hosts on which jobs in this queue can run
      #PRE_EXEC    = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
      #POST_EXEC   = /usr/local/lsf/misc/testq_post |grep -v "Hey"
      #REQUEUE_EXIT_VALUES = 55 34 78
      #APS_PRIORITY = WEIGHT[[RSRC, 10.0] [MEM, 20.0] [PROC, 2.5] [QPRIORITY, 2.0]] \
      #LIMIT[[RSRC, 3.5] [QPRIORITY, 5.5]] \
      #GRACE_PERIOD[[QPRIORITY, 200s] [MEM, 10m] [PROC, 2h]]
      DESCRIPTION  = For normal low priority jobs, running only if hosts are lightly loaded.
    End Queue
     
    Begin Queue
      QUEUE_NAME    = owners
      PRIORITY      = 43
      JOB_CONTROLS  = SUSPEND[bmig $LSB_BATCH_JID]
      RERUNNABLE    = YES
      PREEMPTION    = PREEMPTIVE
      NICE          = 10
      #RUN_WINDOW   = 5:19:00-1:8:30 20:00-8:30
      r1m           = 1.2/2.6
      #r15m         = 1.0/2.6
      #r15s         = 1.0/2.6
      pg            = 4/15
      io            = 30/200
      swp           = 4/1
      tmp           = 1/0
      #CPULIMIT     = 24:0/hostA  # 24 hours of hostA
      #FILELIMIT    = 20000
      #DATALIMIT    = 20000  # jobs data segment limit
      #CORELIMIT    = 20000
      #PROCLIMIT    = 5   # job processor limit
      #USERS        = user1 user2
      #HOSTS        = hostA hostB
      #ADMINISTRATORS = user1 user2
      #PRE_EXEC     = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
      #POST_EXEC    = /usr/local/lsf/misc/testq_post |grep -v "Hey"
      #REQUEUE_EXIT_VALUES = 55 34 78
      DESCRIPTION   = For owners of some machines. Only users listed in the USERS section can submit jobs to this queue.
    End Queue
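
    For CCM jobs, a queue with PRE_EXEC and POST_EXEC might look like the following sketch. The script paths are placeholders only; substitute the CCM prologue/epilogue scripts documented in the Cray guide.

    Begin Queue
      QUEUE_NAME   = ccm
      PRIORITY     = 30
      JOB_CONTROLS = SUSPEND[bmig $LSB_BATCH_JID]
      RERUNNABLE   = Y
      # Placeholder paths -- replace with the CCM prologue/epilogue scripts
      # from the Cray guide.
      PRE_EXEC     = /path/to/ccm_prologue
      POST_EXEC    = /path/to/ccm_epilogue
      DESCRIPTION  = Queue for CCM jobs.
    End Queue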
  7. Modify $LSF_ENVDIR/lsf.shared. Make sure the following Boolean resources are defined in the RESOURCE section:

    vnode      Boolean () () (sim node)
    gpu        Boolean () () (gpu)
    frontnode  Boolean () () (login/service node)
    craylinux  Boolean () () (Cray XT/XE MPI)
  8. By default, Comprehensive System Accounting (CSA) is enabled. If CSA is not installed in your environment, you must disable CSA by setting LSF_ENABLE_CSA=N in lsf.conf.

  9. Use the service command to start and stop the LSF services as needed:

    • service LSF-HPC start

    • service LSF-HPC stop

File Structure

LSF is installed in LSF_TOP (for example, /software/lsf/). After installation, the shared directories under /ufs are laid out as follows:

/ufs
`-- lsfhpc
    |-- log
    |   
    `-- work
        `-- <cluster_name>
            |-- craylinux
            |-- logdir
            |-- lsf_cmddir
            |-- live_confdir
            `-- lsf_indir

There are eight directories and three files in /software/lsf/:

|--<version>
|   |-- include
|   |   `-- lsf
|   |-- install
|   |   |-- instlib
|   |   |-- patchlib
|   |   `-- scripts
|   |-- linux2.6-glibc2.3-x86_64-cray
|   |   |-- bin
|   |   |-- etc
|   |   |   `-- scripts
|   |   `-- lib
|   |-- man
|   |   |-- man1
|   |   |-- man3
|   |   |-- man5
|   |   `-- man8
|   |-- misc
|   |   |-- conf_tmpl
|   |   |   |-- eservice
|   |   |   |   |-- esc
|   |   |   |   |   `-- conf
|   |   |   |   |       `-- services
|   |   |   |   `-- esd
|   |   |   |       `-- conf
|   |   |   |           `-- named
|   |   |   |               |-- conf
|   |   |   |               `-- namedb
|   |   |   `-- kernel
|   |   |       |-- conf
|   |   |       |   `-- mibs
|   |   |       |-- log
|   |   |       `-- work
|   |   |-- config
|   |   |-- examples
|   |   |   |-- blastparallel
|   |   |   |-- blogin
|   |   |   |-- dr
|   |   |   |-- eevent
|   |   |   |-- external_plugin
|   |   |   |-- extsched
|   |   |   |-- reselim
|   |   |   |-- web-lsf
|   |   |   |   |-- cgi-bin
|   |   |   |   |-- doc
|   |   |   |   `-- lsf_html
|   |   |   `-- xelim
|   |   |-- lsmake
|   |   |-- lstcsh
|   |   `-- src
|   |-- schema
|   |   `-- samples
|   `-- scripts
|-- conf
|   |-- ego
|   |   `-- <cluster_name>
|   |       |-- eservice
|   |       |   |-- esc
|   |       |   |   `-- conf
|   |       |   |       `-- services
|   |       |   `-- esd
|   |       |       `-- conf
|   |       |           `-- named
|   |       |               |-- conf
|   |       |               `-- namedb
|   |       `-- kernel
|   |           `-- mibs
|   `-- lsbatch
|       `-- <cluster_name>
|           `-- configdir
|-- log
|-- patch
|   |-- backup
|   |-- lock
|   `-- patchdb
|       `-- PackageInfo_LSF<version>_linux2.6-glibc2.3-x86_64-cray
`-- work
    `-- <cluster_name>
        |-- ego
        |-- live_confdir
        |-- logdir
        |-- lsf_cmddir
        `-- lsf_indir

Submit and Run Parallel Jobs

Before you submit jobs to the cluster, be aware that CLE 4.0 does not support multiple jobs running on one compute node. All ALPS reservations created by LSF have the "mode=EXCLUSIVE" attribute. You can define a limit to make sure LSF does not dispatch jobs to compute nodes where a job is already running.

Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.resources:

Begin Limit
  NAME     = COMPUTE_NODES_LIMIT
  USERS    = all
  PER_HOST = list_of_compute_nodes # This limit applies to compute nodes only.
  JOBS     = 1
End Limit
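
One way to maintain list_of_compute_nodes is to define a host group in the HostGroup section of lsb.hosts and reference the group name in the limit. The group name compute_nodes and the member list below are examples only:

Begin HostGroup
GROUP_NAME      GROUP_MEMBER
compute_nodes   (nid00038 nid00039 nid00040 nid00041)   # example subset of compute nodes
End HostGroup

The limit can then use PER_HOST = compute_nodes.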

The following rules apply when you submit and run jobs:

  1. To submit a job that requires Cray Linux reservations (e.g., aprun job, CCM job), compound resource requirements must be used:

    bsub -extsched "CRAYLINUX[]" -R "1*{select[craylinux && \!vnode]} + n*{select[vnode && craylinux] span[ptile=q*d]}" aprun -n y -d p -N q a.out

    n must be greater than or equal to MAX(y*p, p*q) (y, p, and q default to 1).
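
    For example (treating d as the same depth value passed with aprun -d, which is an assumption of this sketch), a job with y=8 PEs, depth p=d=2, and q=4 PEs per node needs n >= MAX(8*2, 2*4) = 16 and ptile=q*d=8:

    bsub -extsched "CRAYLINUX[]" -R "1*{select[craylinux && \!vnode]} + 16*{select[vnode && craylinux] span[ptile=8]}" aprun -n 8 -d 2 -N 4 a.out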

  2. To submit a job that requires Cray Linux reservations with GPU (e.g., aprun job, CCM job):

    bsub -extsched "CRAYLINUX[GPU]" -R "1*{select[craylinux && \!vnode]} + n*{select[vnode && craylinux && gpu] span[ptile=q*d] rusage[jobcnt=1]}" aprun -n y -d p -N q a.out

    n must be greater than or equal to MAX(y*p, p*q) (y, p, and q default to 1).

  3. To submit a job that runs on Cray service/login nodes without creating Cray Linux reservations:

    bsub -R "select[craylinux && frontnode]" hostname

  4. Jobs with incorrect resource requirements are detected and put into a pending state:

    • Jobs that ask for vnode without CRAYLINUX[] specified. The pending reason is that the job cannot run on hosts with vnode.

    • Jobs with CRAYLINUX[] specified whose LSF allocation does not contain at least one front node and at least one vnode. The pending reason is: Cannot create/confirm a reservation by apbasil/catnip.

  5. To create an advance reservation, complete the following steps:

    1. Create AR on compute nodes (hosts with craylinux && vnode).

    2. Add slots on front nodes (hosts with craylinux && \!vnode).

    3. Submit jobs and specify the Advance Reservation for the job as usual.
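
    A minimal sketch with brsvadd and bsub -U (host names, user, slot count, and times are examples only; check the brsvadd reference for your LSF version):

    # Reserve slots on compute nodes (craylinux && vnode) and a front node:
    brsvadd -n 32 -m "nid00038 nid00039 nid00060" -u user1 -b 8:00 -e 17:00
    # Note the reservation ID that brsvadd reports (for example, user1#0),
    # then submit the job against it:
    bsub -U user1#0 -extsched "CRAYLINUX[]" -R "1*{select[craylinux && \!vnode]} + 32*{select[vnode && craylinux]}" aprun -n 32 a.out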

Command Description

The bjobs, bhist, and bacct commands display the reservation_id under additionalInfo.

Assumptions and Limitations

After the patch has been installed and configured, advance reservation, preemption, and reservation scheduling policies are supported with the following limitations:
  • Not all scheduling policies behave the same way or automatically support the same features as standard LSF. ALPS in CLE 4.0 supports only node-exclusive reservations (no two jobs can run on the same node). Slot and resource reservations in LSF are affected: jobs that reserved slots may not be able to run because of this ALPS limitation.

  • Only one Cray Linux machine per cluster is allowed.
