MAX_SBD_FAIL=integer
The maximum number of retries for reaching a non-responding slave batch daemon, sbatchd.
The minimum interval between retries is MBD_SLEEP_TIME/10. After mbatchd has tried MAX_SBD_FAIL times to reach a host without success, it reports the host status as unavailable or unreachable.
When a host becomes unavailable, mbatchd assumes that all jobs running on that host have exited, and all rerunnable jobs (jobs submitted with the bsub -r option) are scheduled to be rerun on another host.
Default: 3
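The retry timing described above can be worked through with a small calculation. The following Python sketch is illustrative only: the function names are hypothetical, the MBD_SLEEP_TIME value of 20 seconds is an example and not a statement of the configured value, and the lower bound assumes that at least one minimum interval elapses before each retry. Actual timing depends on mbatchd scheduling and may be longer.

```python
def min_retry_interval(mbd_sleep_time: float) -> float:
    """Minimum interval between retries, in seconds (MBD_SLEEP_TIME/10)."""
    return mbd_sleep_time / 10


def min_retry_window(mbd_sleep_time: float, max_sbd_fail: int) -> float:
    """Rough lower bound, in seconds, on the time mbatchd spends retrying
    before it reports the host as unavailable or unreachable.

    Assumption (not from the product documentation): one minimum interval
    elapses before each of the MAX_SBD_FAIL retries.
    """
    return min_retry_interval(mbd_sleep_time) * max_sbd_fail


if __name__ == "__main__":
    # Example values only: MBD_SLEEP_TIME=20, MAX_SBD_FAIL=3
    print(min_retry_interval(20))
    print(min_retry_window(20, 3))
```

Under these assumptions, raising MAX_SBD_FAIL or MBD_SLEEP_TIME lengthens the time before a non-responding host is declared unavailable, which delays the rerun of rerunnable jobs but makes the cluster more tolerant of short network interruptions.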