This topic applies to the integration of LSF 8.0 or later with the Cray Linux Environment (CLE) 4.0 or later. You must have LSF Standard Edition (LSF must not be running in Express mode).
Download the installation package and the distribution tar file for the LSF/Cray Linux (Cray XT/XE/XC) integration. For example, for the LSF <version> release, the following files are needed:
lsf<version>_lnx26-lib23-x64-cray.tar.Z
lsf<version>_lsfinstall.tar.Z
If you install on a Linux host, you can download lsf<version>_lsfinstall_linux_x86_64.tar.Z. If you install LSF 9.1.2 on a Linux host, you can download lsf<version>_no_jre_lsfinstall.tar.Z. These two installation packages are smaller because the first includes only the Linux version of the JRE and the second does not include a JRE at all.
Before running the installation, confirm the Cray Linux system is working:
On CLE 4.0 or later, confirm that /opt/cray/rca/default/bin/rca-helper, /etc/xthostname, and /etc/opt/cray/sdb/node_classes exist. Otherwise, confirm that the xtuname and xthostname commands exist and are in the $PATH.
Confirm that all compute PEs are in batch mode. If they are not, switch all compute PEs to batch mode and restart the ALPS services on the boot node (a verification sketch follows these commands):
xtprocadmin -k m batch
/etc/init.d/alps restart (optional)
apstat -rn (optional)
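The following is a minimal verification sketch. It assumes that xtprocadmin reports each node's mode (batch or interactive) in the last column of its output, which is the typical CLE format; adjust the field number if your output differs.
# Report any compute nodes that are still in interactive mode
if xtprocadmin | awk '$NF == "interactive" {found=1} END {exit !found}'; then
    echo "Some compute nodes are still in interactive mode"
else
    echo "All compute nodes are in batch mode"
fi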
Follow the standard LSF installation procedure to install LSF on the boot nodes (an installer invocation sketch follows the install.config example):
Run the xtopview command to switch to a shared root file system.
Edit the install.config file:
LSF_TOP=/software/lsf
LSF_CLUSTER_NAME=<crayxt machine name>
LSF_MASTER_LIST=<master host candidates> # login nodes or service nodes
EGO_DAEMON_CONTROL=N
ENABLE_DYNAMIC_HOSTS=N
LSF_ADD_SERVERS=<service or login nodes>
ENABLE_HPC_CONFIG=Y # if you are installing LSF 9.1.1 or earlier versions
CONFIGURATION_TEMPLATE=PARALLEL # if you are installing LSF 9.1.2 or later versions
LSF_MASTER_LIST and LSF_ADD_SERVERS should only include login nodes or service nodes.
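After you edit install.config, run the installer in the usual way from within the shared root file system. The following sketch assumes that the installation packages are in the current directory; the name of the extracted directory depends on the exact version.
zcat lsf<version>_lsfinstall.tar.Z | tar xvf -
cd lsf<version>_lsfinstall
./lsfinstall -f install.config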
The startup/shutdown script for LSF daemons can be found in $LSF_SERVERDIR/lsf_daemons.
If you want to add the Cray Linux machine to an existing cluster, refer to the upgrade and migration instructions.
As LSF administrator:
Add the following to /opt/xt-boot/default/etc/serv_cmd:
service_cmd_info='LSF-HPC',service_num=XXX,heartbeat=null
start_cmd='<$LSF_SERVERDIR>/lsf_daemons start'
stop_cmd='<$LSF_SERVERDIR>/lsf_daemons stop'
restart_cmd='<$LSF_SERVERDIR>/lsf_daemons restart'
fail_cmd='<$LSF_SERVERDIR>/lsf_daemons stop'
Create a service command: xtservcmd2db -f /opt/xt-boot/default/etc/serv_cmd.
Assign the LSF-HPC service to the login node class: xtservconfig -c login add LSF-HPC.
Exit xtopview and access a login node:
Make sure that /ufs is shared among all login/service nodes, and that root and the LSF administrators have write permission to it.
Set up subdirectories under /ufs that mirror /opt/xt-lsfhpc/log and /opt/xt-lsfhpc/work (see the "File Structure" section for details).
Make sure the directory ownership and permission mode are preserved (for example, use cp -rp), and that root and the LSF administrators have write permission to the subdirectories under /ufs/lsfhpc.
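The following is a minimal sketch, assuming that the xt-lsfhpc package installed LSF under /opt/xt-lsfhpc; substitute your own paths.
mkdir -p /ufs/lsfhpc
cp -rp /opt/xt-lsfhpc/log /ufs/lsfhpc/log     # -p preserves ownership and permission mode
cp -rp /opt/xt-lsfhpc/work /ufs/lsfhpc/work
ls -ld /ufs/lsfhpc/log /ufs/lsfhpc/work       # verify that root and the LSF administrators can write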
Use the module command to set the LSF environment variables: module load xt-lsfhpc
Modify $LSF_ENVDIR/lsf.conf (some of the parameters may have been added by the LSF installer):
LSB_SHAREDIR=/ufs/lsfhpc/work # A shared file system that is accessible by root and LSF admin on both master hosts and Cray Linux login/service nodes.
LSF_LOGDIR=/ufs/lsfhpc/log # A shared file system that is accessible by root and LSF admin on both master hosts and Cray Linux login/service nodes.
LSF_LIVE_CONFDIR=/ufs/lsfhpc/work/<cluster_name>/live_confdir # A shared file system that is accessible by root and LSF admin on both master hosts and Cray Linux login/service nodes.
LSB_RLA_PORT=21787 # a unique port
LSB_SHORT_HOSTLIST=1
LSF_ENABLE_EXTSCHEDULER=Y
LSB_SUB_COMMANDNAME=Y
LSF_CRAY_PS_CLIENT=/usr/bin/apbasil
LSF_LIMSIM_PLUGIN="liblimsim_craylinux"
LSF_CRAYLINUX_FRONT_NODES="nid00060 nid00062" # A list of Cray Linux login/service nodes with LSF daemons started and running.
LSF_CRAYLINUX_FRONT_NODES_POLL_INTERVAL=120 # Interval at which the master LIM polls RLA for compute node status and configuration information. The default value is 120 seconds. Any value less than 120 seconds is reset to the default.
LSB_MIG2PEND=1
Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.modules. Make sure that schmod_craylinux is the last plug-in module and that schmod_crayxt3 is commented out. If you do not use the MultiCluster feature or the CPUSET integration, comment out schmod_mc and schmod_cpuset as well.
Begin PluginModule
SCH_PLUGIN RB_PLUGIN SCH_DISABLE_PHASES
schmod_default () ()
schmod_fcfs () ()
schmod_fairshare () ()
schmod_limit () ()
schmod_parallel () ()
schmod_reserve () ()
#schmod_mc () ()
schmod_preemption () ()
schmod_advrsv () ()
schmod_ps () ()
schmod_aps () ()
#schmod_cpuset () ()
#schmod_crayxt3 () ()
schmod_craylinux () ()
End PluginModule
From a login node, run $LSF_BINDIR/genVnodeConf. This command generates a list of the compute nodes that are in batch mode. Add the compute nodes to the Host section in $LSF_ENVDIR/lsf.cluster.<clustername>, for example (a sketch for capturing the generated list follows this example):
HOSTNAME model type server r1m mem swp RESOURCES
nid00038 ! ! 1 3.5 () () (craylinux vnode)
nid00039 ! ! 1 3.5 () () (craylinux vnode)
nid00040 ! ! 1 3.5 () () (craylinux vnode)
nid00041 ! ! 1 3.5 () () (craylinux vnode)
nid00042 ! ! 1 3.5 () () (craylinux vnode gpu)
nid00043 ! ! 1 3.5 () () (craylinux vnode gpu)
nid00044 ! ! 1 3.5 () () (craylinux vnode)
nid00045 ! ! 1 3.5 () () (craylinux vnode)
nid00046 ! ! 1 3.5 () () (craylinux vnode)
nid00047 ! ! 1 3.5 () () (craylinux vnode)
nid00048 ! ! 1 3.5 () () (craylinux vnode)
nid00049 ! ! 1 3.5 () () (craylinux vnode)
nid00050 ! ! 1 3.5 () () (craylinux vnode)
nid00051 ! ! 1 3.5 () () (craylinux vnode)
nid00052 ! ! 1 3.5 () () (craylinux vnode gpu)
nid00053 ! ! 1 3.5 () () (craylinux vnode gpu)
nid00054 ! ! 1 3.5 () () (craylinux vnode)
nid00055 ! ! 1 3.5 () () (craylinux vnode)
nid00056 ! ! 1 3.5 () () (craylinux vnode)
nid00057 ! ! 1 3.5 () () (craylinux vnode)
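A sketch for capturing the generated node list so that you can review it before editing the cluster file (the output file name is an example):
$LSF_BINDIR/genVnodeConf > /tmp/vnode_hosts
cat /tmp/vnode_hosts    # review, then add the corresponding entries to the Host section of lsf.cluster.<clustername>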
The $LSF_ENVDIR/hosts file maps IP addresses to Cray node IDs and their aliases, for example:
cat $LSF_ENVDIR/hosts
172.25.235.55 amd07.lsf.platform.com amd07
10.128.0.34 nid00033 c0-0c1s0n3 sdb001 sdb002
10.128.0.61 nid00060 c0-0c1s1n0 login login1 castor-p2
10.128.0.36 nid00035 c0-0c1s1n3
10.128.0.59 nid00058 c0-0c1s2n0
10.128.0.38 nid00037 c0-0c1s2n3
10.128.0.57 nid00056 c0-0c1s3n0
10.128.0.58 nid00057 c0-0c1s3n1
10.128.0.39 nid00038 c0-0c1s3n2
10.128.0.40 nid00039 c0-0c1s3n3
10.128.0.55 nid00054 c0-0c1s4n0
10.128.0.56 nid00055 c0-0c1s4n1
10.128.0.41 nid00040 c0-0c1s4n2
10.128.0.42 nid00041 c0-0c1s4n3
10.128.0.53 nid00052 c0-0c1s5n0
10.128.0.54 nid00053 c0-0c1s5n1
10.128.0.43 nid00042 c0-0c1s5n2
10.128.0.44 nid00043 c0-0c1s5n3
10.128.0.51 nid00050 c0-0c1s6n0
10.128.0.52 nid00051 c0-0c1s6n1
10.128.0.45 nid00044 c0-0c1s6n2
10.128.0.46 nid00045 c0-0c1s6n3
10.128.0.49 nid00048 c0-0c1s7n0
10.128.0.50 nid00049 c0-0c1s7n1
10.128.0.47 nid00046 c0-0c1s7n2
10.128.0.48 nid00047 c0-0c1s7n3
10.131.255.251 sdb sdb-p2 syslog ufs
Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.hosts. Make sure Cray Linux login/service nodes that are also LSF server hosts have a large number set in the MXJ column (larger than the total number of PEs).
Begin Host
HOST_NAME MXJ r1m pg ls tmp DISPATCH_WINDOW # Keywords
nid00060 9999 () () () () () # Example
nid00062 9999 () () () () () # Example
default ! () () () () () # Example
End Host
In LSF 9.1.2 or later, you must disable AFFINITY on Cray compute nodes.
Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.queues.
JOB_CONTROLS and RERUNNABLE are required.
Comment out all loadSched/loadStop lines.
DEF_EXTSCHED and MANDATORY_EXTSCHED are optional.
PRE_EXEC and POST_EXEC are required to run CCM jobs.
Refer to the Cray documentation to find the scripts.
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
NICE = 20
PREEMPTION = PREEMPTABLE
JOB_CONTROLS = SUSPEND[bmig $LSB_BATCH_JID]
RERUNNABLE = Y
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#PROCLIMIT = 5 # job processor limit
#USERS = all # users who can submit jobs to this queue
#HOSTS = all # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v "Hey"
#REQUEUE_EXIT_VALUES = 55 34 78
#APS_PRIORITY = WEIGHT[[RSRC, 10.0] [MEM, 20.0] [PROC, 2.5] [QPRIORITY, 2.0]] \
#LIMIT[[RSRC, 3.5] [QPRIORITY, 5.5]] \
#GRACE_PERIOD[[QPRIORITY, 200s] [MEM, 10m] [PROC, 2h]]
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
End Queue
Begin Queue
QUEUE_NAME = owners
PRIORITY = 43
JOB_CONTROLS = SUSPEND[bmig $LSB_BATCH_JID]
RERUNNABLE = YES
PREEMPTION = PREEMPTIVE
NICE = 10
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 1.2/2.6
#r15m = 1.0/2.6
#r15s = 1.0/2.6
#pg = 4/15
#io = 30/200
#swp = 4/1
#tmp = 1/0
#CPULIMIT = 24:0/hostA # 24 hours of hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#PROCLIMIT = 5 # job processor limit
#USERS = user1 user2
#HOSTS = hostA hostB
#ADMINISTRATORS = user1 user2
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v "Hey"
#REQUEUE_EXIT_VALUES = 55 34 78
DESCRIPTION = For owners of some machines, only users listed in the USERS\
section can submit jobs to this queue.
End Queue
Modify $LSF_ENVDIR/lsf.shared. Make sure that the following Boolean resources are defined in the Resource section:
vnode Boolean () () (sim node)
gpu Boolean () () (gpu)
frontnode Boolean () () (login/service node)
craylinux Boolean () () (Cray XT/XE MPI)
By default, Comprehensive System Accounting (CSA) is enabled. If CSA is not installed in your environment, you must disable CSA by setting LSF_ENABLE_CSA=N in lsf.conf.
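After you finish editing the configuration files, you can check them with the standard LSF configuration checks (run them as the LSF administrator with the xt-lsfhpc environment loaded):
lsadmin ckconfig -v    # checks lsf.conf, lsf.shared, and lsf.cluster.<clustername>
badmin ckconfig -v     # checks the lsbatch files (lsb.modules, lsb.hosts, lsb.queues, lsb.resources)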
Use the service command to start and stop the LSF services as needed:
service LSF-HPC start
service LSF-HPC stop
LSF is installed in LSF_TOP (for example, /software/lsf/). After installation, the shared /ufs directory has the following layout:
/ufs
`-- lsfhpc
|-- log
|
`-- work
`-- <cluster_name>
|-- craylinux
|-- logdir
|-- lsf_cmddir
|-- live_confdir
`-- lsf_indir
The directory layout under /software/lsf/ is as follows:
|--<version>
| |-- include
| | `-- lsf
| |-- install
| | |-- instlib
| | |-- patchlib
| | `-- scripts
| |-- linux2.6-glibc2.3-x86_64-cray
| | |-- bin
| | |-- etc
| | | `-- scripts
| | `-- lib
| |-- man
| | |-- man1
| | |-- man3
| | |-- man5
| | `-- man8
| |-- misc
| | |-- conf_tmpl
| | | |-- eservice
| | | | |-- esc
| | | | | `-- conf
| | | | | `-- services
| | | | `-- esd
| | | | `-- conf
| | | | `-- named
| | | | |-- conf
| | | | `-- namedb
| | | `-- kernel
| | | |-- conf
| | | | `-- mibs
| | | |-- log
| | | `-- work
| | |-- config
| | |-- examples
| | | |-- blastparallel
| | | |-- blogin
| | | |-- dr
| | | |-- eevent
| | | |-- external_plugin
| | | |-- extsched
| | | |-- reselim
| | | |-- web-lsf
| | | | |-- cgi-bin
| | | | |-- doc
| | | | `-- lsf_html
| | | `-- xelim
| | |-- lsmake
| | |-- lstcsh
| | `-- src
| |-- schema
| | `-- samples
| `-- scripts
|-- conf
| |-- ego
| | `-- <cluster_name>
| | |-- eservice
| | | |-- esc
| | | | `-- conf
| | | | `-- services
| | | `-- esd
| | | `-- conf
| | | `-- named
| | | |-- conf
| | | `-- namedb
| | `-- kernel
| | `-- mibs
| `-- lsbatch
| `-- <cluster_name>
| `-- configdir
|-- log
|-- patch
| |-- backup
| |-- lock
| `-- patchdb
| `-- PackageInfo_LSF<version>_linux2.6-glibc2.3-x86_64-cray
`-- work
`-- <cluster_name>
|-- ego
|-- live_confdir
|-- logdir
|-- lsf_cmddir
`-- lsf_indir
Before you submit jobs to the cluster, be aware that CLE 4.0 does not support multiple jobs running on one compute node. All ALPS reservations created by LSF have the "mode=EXCLUSIVE" attribute. You can define a limit to make sure that LSF does not dispatch jobs to compute nodes where a job is already running.
Modify $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.resources:
Begin Limit
NAME = COMPUTE_NODES_LIMIT
USERS = all
PER_HOST = list_of_compute_nodes #This limit applies to compute nodes only.
JOBS = 1
End Limit
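After you add the limit, reconfigure and display it; a sketch using standard LSF commands:
badmin reconfig    # re-read the lsbatch configuration files
blimits -c         # display the limits configured in lsb.resources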
There are other ways in LSF to enforce this limitation for ALPS.
To submit a job that requires Cray Linux reservations (e.g., aprun job, CCM job), compound resource requirements must be used:
bsub -extsched "CRAYLINUX[]" -R "1*{select[craylinux && \!vnode]} +
n*{select[vnode && craylinux] span[ptile=q*d]}" aprun -n y -d p -N q a.out
n must be greater than or equal to MAX(y*p, p*q) (the default value of y, p, and q is 1).
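For example, for aprun -n 8 -d 2 -N 4 (that is, y=8, p=2, and q=4), n must be at least MAX(8*2, 2*4) = 16.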
To submit a job that requires Cray Linux reservations with GPU (e.g., aprun job, CCM job):
bsub -extsched "CRAYLINUX[GPU]" -R "1*{select[craylinux && \!vnode]} + n*{select[vnode
&& craylinux && gpu] span[ptile=q*d] rusage[jobcnt=1]}" aprun -n y -d p -N q a.out
n must be greater than or equal to MAX(y*p, p*q) (the default value of y, p, and q is 1).
To submit a job that runs on Cray service/login nodes without creating Cray Linux reservations:
bsub -R "select[craylinux && frontnode]" hostname
Jobs with the following incorrect resource requirements are detected and put into the pending state:
Jobs that ask for vnode hosts but do not specify CRAYLINUX[]. The pending reason is that the job cannot run on hosts with the vnode resource.
Jobs that specify CRAYLINUX[], but whose LSF allocation does not contain at least one front node and at least one vnode. The pending reason is: Cannot create/confirm a reservation by apbasil/catnip.
To create an advance reservation, complete the following steps (a sketch follows these steps):
Create AR on compute nodes (hosts with craylinux && vnode).
Add slots on front nodes (hosts with craylinux && \!vnode).
Submit jobs and specify the Advance Reservation for the job as usual.
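A minimal sketch with a hypothetical user, reservation window, and reservation ID; the node names match the example hosts above, and the exact slot counts depend on your policy:
brsvadd -n 5 -m "nid00038 nid00039 nid00060" -u user1 -b 10:00 -e 12:00   # compute nodes plus one front node
bsub -U user1#0 -extsched "CRAYLINUX[]" -R "1*{select[craylinux && \!vnode]} + 4*{select[vnode && craylinux] span[ptile=1]}" aprun -n 4 ./a.out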
The bjobs, bhist, and bacct commands display reservation_id under additionalInfo.
Not all scheduling policies behave the same way or automatically support the same features as standard LSF. ALPS in CLE 4.0 supports only node-exclusive reservations (no two jobs can run on the same node). LSF slot and resource reservations are affected: jobs that reserved slots might not be able to run because of this ALPS limitation.
Only one Cray Linux machine per cluster is allowed.