To control all daemons in the cluster, you must:

- Be logged on as root or as a user listed in the /etc/lsf.sudoers file. See the LSF Configuration Reference for details on configuring lsf.sudoers.
- Be able to run the rsh or ssh command across all LSF hosts without entering a password. See your operating system documentation for information about configuring rsh and ssh. If LSF_RSH is set in lsf.conf, the shell command it specifies is tried before rsh.
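As a rough pre-flight check for the first prerequisite, the sketch below decides whether a user is root or listed in lsf.sudoers. It assumes that the relevant list is the LSF_STARTUP_USERS parameter, written as a quoted, space-separated list of names. Special values and other lsf.sudoers parameters are not handled, and the function names are hypothetical. The sketch does not test the second prerequisite, passwordless rsh or ssh.

```python
def startup_users(sudoers_text: str) -> set[str]:
    """Return the names listed in LSF_STARTUP_USERS (assumed space-separated)."""
    users: set[str] = set()
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LSF_STARTUP_USERS":
            # Drop surrounding double quotes, then split the list on whitespace.
            users.update(value.strip().strip('"').split())
    return users


def may_control_all_daemons(user: str, sudoers_text: str) -> bool:
    """First prerequisite only: user is root or listed in LSF_STARTUP_USERS."""
    return user == "root" or user in startup_users(sudoers_text)
```

For example, `may_control_all_daemons(getpass.getuser(), Path("/etc/lsf.sudoers").read_text())` would check the current user, given imports of `getpass` and `pathlib.Path`.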
| Daemon | Action | Command | Permissions |
|---|---|---|---|
| All in cluster | Start | lsfstartup | Must be root or a user listed in lsf.sudoers for all these commands |
| | Shut down | lsfshutdown | |
| sbatchd | Start | badmin hstartup [host_name ...\|all] | Must be root or a user listed in lsf.sudoers for the startup command |
| | Restart | badmin hrestart [host_name ...\|all] | Must be root or the LSF administrator for other commands |
| | Shut down | badmin hshutdown [host_name ...\|all] | |
| mbatchd, mbschd | Restart | badmin mbdrestart | Must be root or the LSF administrator for these commands |
| | Shut down | | |
| | Reconfigure | badmin reconfig | |
| RES | Start | lsadmin resstartup [host_name ...\|all] | Must be root or a user listed in lsf.sudoers for the startup command |
| | Shut down | lsadmin resshutdown [host_name ...\|all] | Must be the LSF administrator for other commands |
| | Restart | lsadmin resrestart [host_name ...\|all] | |
| LIM | Start | lsadmin limstartup [host_name ...\|all] | Must be root or a user listed in lsf.sudoers for the startup command |
| | Shut down | lsadmin limshutdown [host_name ...\|all] | Must be the LSF administrator for other commands |
| | Restart | lsadmin limrestart [host_name ...\|all] | |
| | Restart all in cluster | lsadmin reconfig | |
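When scripting these operations, the table can be treated as a lookup from a (daemon, action) pair to a command line. The sketch below encodes the table's commands as argument lists and appends host names only for commands documented with [host_name ...|all]. The names COMMANDS, HOST_SCOPED, and build_command, along with the lowercase keys, are hypothetical. The mbatchd shutdown entry is left out because the table gives no command for it.

```python
# Hypothetical helper: maps (daemon, action) from the table to argv lists.
COMMANDS: dict[tuple[str, str], list[str]] = {
    ("cluster", "start"): ["lsfstartup"],
    ("cluster", "shutdown"): ["lsfshutdown"],
    ("sbatchd", "start"): ["badmin", "hstartup"],
    ("sbatchd", "restart"): ["badmin", "hrestart"],
    ("sbatchd", "shutdown"): ["badmin", "hshutdown"],
    ("mbatchd", "restart"): ["badmin", "mbdrestart"],
    ("mbatchd", "reconfigure"): ["badmin", "reconfig"],
    ("res", "start"): ["lsadmin", "resstartup"],
    ("res", "shutdown"): ["lsadmin", "resshutdown"],
    ("res", "restart"): ["lsadmin", "resrestart"],
    ("lim", "start"): ["lsadmin", "limstartup"],
    ("lim", "shutdown"): ["lsadmin", "limshutdown"],
    ("lim", "restart"): ["lsadmin", "limrestart"],
    ("lim", "restart_all"): ["lsadmin", "reconfig"],
}

# Commands documented with the [host_name ...|all] argument.
HOST_SCOPED = {
    ("sbatchd", "start"), ("sbatchd", "restart"), ("sbatchd", "shutdown"),
    ("res", "start"), ("res", "shutdown"), ("res", "restart"),
    ("lim", "start"), ("lim", "shutdown"), ("lim", "restart"),
}


def build_command(daemon, action, hosts=None):
    """Return the argv list for a daemon action, optionally scoped to hosts."""
    key = (daemon, action)
    argv = list(COMMANDS[key])  # copy so callers cannot mutate the table
    if hosts is None:
        return argv  # no host arguments appended
    if key not in HOST_SCOPED:
        raise ValueError(f"{daemon} {action} does not take host arguments")
    if isinstance(hosts, str):
        hosts = [hosts]  # accept "all" or a single host name
    return argv + list(hosts)
```

A caller could then run, for example, `subprocess.run(build_command("sbatchd", "restart", ["hostA", "hostB"]), check=True)`. The host names here are placeholders. Passing argument lists instead of a shell string avoids quoting problems with host names.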
Restarting sbatchd on a host does not affect jobs that are running on that host.
If sbatchd is shut down, the host is not available to run new jobs. Existing jobs running on that host continue, but the results are not sent to the user until sbatchd is restarted.
Jobs running on the host are not affected by restarting the daemons.
If a daemon is not responding to network connections, lsadmin displays an error message with the host name. In this case, you must kill and restart the daemon manually.
If the LIM and the other daemons on the current master host shut down, another host automatically takes over as master.
If the RES is shut down while remote interactive tasks are running on the host, the running tasks continue but no new tasks are accepted.
The following LSF daemons are protected from being killed on systems that support the out-of-memory (OOM) killer:
When these daemons start or restart, LSF automatically sets oom_adj to -17 or oom_score_adj to -1000. As a result, the OOM killer spares the LSF daemons, while user jobs remain eligible to be killed.
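On Linux, this protection amounts to writing a value to a file under /proc for the daemon's process. The sketch below illustrates the mechanism and is not LSF's own code. It writes -1000 to /proc/&lt;pid&gt;/oom_score_adj when that file exists, and otherwise writes -17 to the legacy /proc/&lt;pid&gt;/oom_adj. The function name protect_from_oom and the proc_root parameter are hypothetical; proc_root exists only so the function can be exercised against a scratch directory. Lowering these values on a real system generally requires root privileges.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # oom_score_adj value that exempts a process
OOM_DISABLE = -17          # equivalent value for the legacy oom_adj file


def protect_from_oom(pid: int, proc_root: str = "/proc") -> str:
    """Exempt pid from the OOM killer and return a log-style message."""
    pid_dir = Path(proc_root) / str(pid)
    score_adj = pid_dir / "oom_score_adj"
    if score_adj.exists():
        # Preferred interface on kernels that provide oom_score_adj.
        score_adj.write_text(f"{OOM_SCORE_ADJ_MIN}\n")
        return f"Set oom_score_adj to {OOM_SCORE_ADJ_MIN}."
    # Fall back to the legacy interface.
    (pid_dir / "oom_adj").write_text(f"{OOM_DISABLE}\n")
    return f"Set oom_adj to {OOM_DISABLE}."
```

The returned strings mirror the DEBUG messages that the daemons log, which makes the sketch easy to compare against real log output.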
When a daemon sets oom_adj or oom_score_adj, it logs one of the following messages at DEBUG level: “Set oom_adj to -17.” or “Set oom_score_adj to -1000.”
Root RES, root LIM, root sbatchd, pim, melim, and mbatchd actively protect themselves and log these messages.
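To confirm that a daemon protected itself, search its log for these messages. The sketch below matches only the quoted message text and ignores any timestamp or other fields in the line, since the full log line format is not described here. The names OOM_LOG_RE and find_oom_protection are hypothetical. Daemon logs are written to the directory set by LSF_LOGDIR in lsf.conf.

```python
import re

# Matches the two DEBUG messages; other fields on the log line are ignored.
OOM_LOG_RE = re.compile(r"Set (oom_adj|oom_score_adj) to (-?\d+)\.")


def find_oom_protection(lines):
    """Return (setting, value) pairs for every OOM-protection message found."""
    hits = []
    for line in lines:
        match = OOM_LOG_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

An open log file can be passed directly, because iterating over a file yields its lines.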
For these messages to appear in the logs:

- Set LSF_LOG_MASK=LOG_DEBUG in lsf.conf.
- For LIM, also set LSF_DEBUG_LIM="LC_TRACE" in lsf.conf.
- When EGO is enabled, set EGO_LOG_MASK=LOG_DEBUG in ego.conf.
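Before looking for the messages, you can check the required settings. The sketch below parses simple KEY=value lines from the contents of lsf.conf and, optionally, ego.conf, and reports what is missing. It assumes that LSF_DEBUG_LIM holds a space-separated list of log classes, so LC_TRACE only has to be one of them. The function names are hypothetical, and the parser ignores any lsf.conf syntax beyond plain assignments and comments.

```python
def parse_conf(text):
    """Parse KEY=value lines, skipping comments and stripping double quotes."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip().strip('"')
    return conf


def missing_debug_settings(lsf_conf, ego_conf=None):
    """List settings that must change before the OOM messages are logged."""
    problems = []
    conf = parse_conf(lsf_conf)
    if conf.get("LSF_LOG_MASK") != "LOG_DEBUG":
        problems.append("lsf.conf: set LSF_LOG_MASK=LOG_DEBUG")
    # LSF_DEBUG_LIM is assumed to list classes separated by spaces.
    if "LC_TRACE" not in conf.get("LSF_DEBUG_LIM", "").split():
        problems.append("lsf.conf: add LC_TRACE to LSF_DEBUG_LIM")
    # ego.conf is checked only when EGO is enabled and its text is supplied.
    if ego_conf is not None and parse_conf(ego_conf).get("EGO_LOG_MASK") != "LOG_DEBUG":
        problems.append("ego.conf: set EGO_LOG_MASK=LOG_DEBUG")
    return problems
```

An empty result means the three settings listed above are in place.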