If the ResourceMap section contains even one resource mapped as default, and if there are multiple elim executables in LSF_SERVERDIR, the MELIM starts all of the elim executables in LSF_SERVERDIR on all hosts in the cluster. Not all of the elim executables continue to run, however. Those that use a checking header could exit with ELIM_ABORT_VALUE if they are not programmed to report values for the resources listed in LSF_RESOURCES.
Restarts an elim if the elim exits. To prevent system-wide problems in case of a fatal error in the elim, the maximum restart frequency is once every 90 seconds. The MELIM does not restart any elim that exits with ELIM_ABORT_VALUE.
Collects the load information reported by the elim executables.
Checks the syntax of load update strings before sending the information to the LIM.
Merges the load reports from each elim and sends the merged load information to the LIM. If there is more than one value reported for a single resource, the MELIM reports the latest value.
Logs its activities and data into the log file LSF_LOGDIR/melim.log.host_name
Increases system reliability by buffering output from multiple elim executables; failure of one elim does not affect other elim executables running on the same host.
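The syntax check on load update strings can be pictured with a small shell function. This is only a sketch: the function name is_valid_load_update is hypothetical, and the checks that the MELIM actually performs are not described here. It assumes the format number_indices index1_name index1_value ..., and verifies only that the leading count is an integer that matches the number of name/value pairs.

```shell
#!/bin/sh
# Hypothetical helper, not part of LSF: succeeds (exit status 0) when the
# argument looks like "number_indices name1 value1 name2 value2 ...".
is_valid_load_update() {
    # Split the string on whitespace into positional parameters.
    # Globbing is disabled during the split so that "*" is not expanded.
    set -f
    set -- $1
    set +f

    # An empty string has no count field at all.
    [ "$#" -ge 1 ] || return 1
    count=$1
    shift

    # The count must be a plain non-negative integer without leading zeros.
    case $count in
        ''|*[!0-9]*|0?*) return 1 ;;
    esac

    # Each index contributes exactly one name and one value.
    [ "$#" -eq "$((count * 2))" ]
}

# Example: a well-formed string for one resource named "myrsc".
if is_valid_load_update "1 myrsc 1"; then
    echo "load update accepted"
fi
```

A string such as "2 myrsc 1" is rejected by this sketch because the count announces two indices while only one name/value pair follows.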
You must map external resource names to locations in lsf.cluster.cluster_name.
Optionally, you can use the environment variables LSF_RESOURCES, LSF_MASTER, and ELIM_ABORT_VALUE in your elim executables.
| If the specified LOCATION is … | Then the elim executables start on … |
|---|---|
If you use the default keyword for any external resource in lsf.cluster.cluster_name, all elim executables in LSF_SERVERDIR run on all hosts in the cluster. You can control the hosts on which your elim executables run by using the environment variables LSF_MASTER, LSF_RESOURCES, and ELIM_ABORT_VALUE. These environment variables provide a way to ensure that elim executables run only when they are programmed to report the values for resources expected on a host.
LSF_MASTER: You can program your elim to check the value of the LSF_MASTER environment variable. The value is Y on the master host and N on all other hosts. An elim executable can use this variable to determine whether it is currently running on the master host.
LSF_RESOURCES: When the LIM starts an MELIM on a host, the LIM checks the resource mapping defined in the ResourceMap section of lsf.cluster.cluster_name. Based on the mapping location (default, all, or a host list), the LIM sets LSF_RESOURCES to the list of resources expected on the host.
When the location of the resource is defined as default, the resource is listed in LSF_RESOURCES on the server hosts. When the location of the resource is defined as all, the resource is listed in LSF_RESOURCES only on the master host.
Use LSF_RESOURCES in a checking header to verify that an elim is programmed to collect values for at least one of the resources listed in LSF_RESOURCES.
ELIM_ABORT_VALUE: An elim should exit with ELIM_ABORT_VALUE if it is not programmed to collect values for at least one of the resources listed in LSF_RESOURCES. The MELIM does not restart an elim that exits with ELIM_ABORT_VALUE. The default value is 97.
#!/bin/sh
# List the resource that this elim can report to the LIM.
my_resource="myrsc"

# Run the check only when the LIM has defined $LSF_RESOURCES.
if [ -n "$LSF_RESOURCES" ]; then
    # Check whether the resource this elim reports is listed in $LSF_RESOURCES.
    # The padding spaces make grep match whole resource names only.
    if ! echo " $LSF_RESOURCES " | grep -q " $my_resource "; then
        # Exit with $ELIM_ABORT_VALUE so that the MELIM does not restart this
        # elim; fall back to the default value of 97 if the variable is unset.
        exit "${ELIM_ABORT_VALUE:-97}"
    fi
fi

while true; do
    # Set the value for resource "myrsc".
    val="1"
    # Create an output string in the format:
    #   number_indices index1_name index1_value ...
    reportStr="1 $my_resource $val"
    echo "$reportStr"
    # Wait 30 seconds before reporting again.
    sleep 30
done
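LSF_MASTER can be used in much the same way as the checking header above. The following is a minimal sketch, assuming a shared resource that should be reported from the master host only; the resource name mylicenses and the function names are hypothetical. Exiting with ELIM_ABORT_VALUE on the other hosts is an assumption made here so that the MELIM does not restart the elim, and the fallback to 97 applies the documented default when the variable is unset.

```shell
#!/bin/sh
# Sketch only: "mylicenses" is a hypothetical shared resource name.
my_resource="mylicenses"

# Succeeds (exit status 0) only where LSF_MASTER is set to Y.
# An unset LSF_MASTER is treated as "not the master host".
on_master_host() {
    [ "${LSF_MASTER:-N}" = "Y" ]
}

# Main entry point: leave on non-master hosts, otherwise report forever.
elim_main() {
    if ! on_master_host; then
        # Assumption: use the abort value so the MELIM does not restart us.
        exit "${ELIM_ABORT_VALUE:-97}"
    fi
    while true; do
        # Placeholder value; replace with a real measurement.
        val="1"
        # Output format: number_indices index1_name index1_value ...
        echo "1 $my_resource $val"
        # Wait 30 seconds before reporting again.
        sleep 30
    done
}

# Call elim_main as the last line when installing this script as an elim:
# elim_main
```

Keeping the host check in its own function makes it easy to combine with the LSF_RESOURCES check, so that an elim starts reporting only when both conditions hold.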