LSF base system

The diagram below shows the components of the LSF base and their relationship:

LSF base consists of the LSF base library (LSLIB) and two server daemons, the Load Information Manager (LIM) and the Remote Execution Server (RES).

LSLIB: LSLIB, the LSF API, is the direct user interface to the LSF base system. LSF APIs provide easy access to the services of LSF servers. An LSF server host runs load-shared jobs. A LIM and a RES run on every LSF server host. They interface with the host’s operating system to give users a uniform, host-independent environment.
Cluster: A cluster is a collection of hosts running LSF. A LIM on one of the hosts in a cluster acts as the master LIM for the cluster. The master LIM is chosen among all the LIMs running in the cluster based on configuration file settings. If the master LIM becomes unavailable, the LIM on the next configured host will automatically become the new master LIM.
LIM: The LIM on each host monitors its host's load and reports load information to the master LIM. The master LIM collects information from all hosts and provides that information to the applications.
RES: The RES on each server host accepts remote execution requests and provides fast, transparent, and secure remote execution of tasks.
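
The master LIM failover rule described above can be illustrated with a minimal sketch. It assumes the configuration reduces to an ordered list of host names and that "next configured host" means the first host in that order whose LIM is still available. The function name elect_master and the host names are hypothetical and are not part of LSF.

```python
from typing import Optional, Sequence, Set


def elect_master(configured_hosts: Sequence[str], available: Set[str]) -> Optional[str]:
    """Return the first host, in configured order, whose LIM is available.

    Assumption: configuration order alone decides who becomes master.
    Returns None when no configured host has a running LIM.
    """
    for host in configured_hosts:
        if host in available:
            return host
    return None


# Hypothetical cluster: hostA is listed first, so it is the initial master.
hosts = ["hostA", "hostB", "hostC"]
print(elect_master(hosts, {"hostA", "hostB", "hostC"}))  # hostA
# hostA becomes unavailable, so the next configured host takes over.
print(elect_master(hosts, {"hostB", "hostC"}))  # hostB
```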

Application and LSF base interactions

The following diagram shows how an application interacts with LSF base. All of the transactions take place transparently to the programmer:

LSF base executes tasks by sending user requests between the submission, master, and execution hosts. A user sends a task into the LSF base system from the submission host. The master host determines the best execution host to run the task. The execution host runs the task.

lsrun submits a task to LSF for execution.
The submitted task proceeds through the LSF base library (LSLIB).
The submission host’s LIM communicates the task’s information to the cluster’s master LIM. Periodically, the LIM on each host gathers its 12 built-in load indices and forwards them to the master LIM.
The master LIM determines the best host to run the task and sends this information back to the submission host’s LIM.
Information about the chosen execution host is passed through the LSF base library.
Information about the host to execute the task is passed back to lsrun.
lsrun creates a NIOS (Network I/O Server), which is the communication pipe that talks to the RES on the execution host.
Task execution information is passed from the NIOS to the RES on the execution host.
The RES creates a child RES and passes the task execution information to the child RES.
The child RES creates the execution environment and runs the task.
The child RES receives completed task information.
The child RES sends the completed task information to the RES.
The output is sent from the RES to the NIOS. The child RES and the execution environment are destroyed by the RES.
The NIOS sends the output to standard output.
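
The placement part of this flow, in which each LIM reports load and the master LIM picks an execution host, can be sketched roughly as follows. This is a simplification under stated assumptions: the class MasterLIM, its methods, and the host names are hypothetical, and the choice is made on a single load index (r1m, the 1-minute run queue length), whereas the real master LIM weighs several indices together with the task's resource requirements.

```python
from typing import Dict, Mapping, Optional


class MasterLIM:
    """Toy stand-in for the master LIM: stores the latest load per host."""

    def __init__(self) -> None:
        self.loads: Dict[str, Dict[str, float]] = {}

    def report(self, host: str, indices: Mapping[str, float]) -> None:
        # Called periodically by each host's LIM; newer reports replace older ones.
        self.loads[host] = dict(indices)

    def best_host(self, index: str = "r1m") -> Optional[str]:
        # Assumption: the lowest value of one index means "least loaded".
        candidates = {h: v[index] for h, v in self.loads.items() if index in v}
        if not candidates:
            return None
        return min(candidates, key=candidates.get)


master = MasterLIM()
master.report("hostA", {"r1m": 2.5})
master.report("hostB", {"r1m": 0.3})
master.report("hostC", {"r1m": 1.1})
print(master.best_host())  # hostB
```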

To run a task remotely or to perform a file operation remotely, an application calls the remote execution or remote file operation service functions in LSLIB, which then contact the RES to get the services.

The same NIOS is shared by all remote tasks running on different hosts started by the same instance of LSLIB. The LSLIB contacts multiple Remote Execution Servers (RES) and they all call back to the same NIOS. The sharing of the NIOS is restricted to within the same application.
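
The NIOS sharing rule can be pictured with a small model. It is only an illustration, not how LSLIB is implemented: the classes Nios and Application and the method run_remote are hypothetical, and the remote execution is faked by calling back into the NIOS directly.

```python
from typing import List, Optional, Tuple


class Nios:
    """Toy NIOS: collects output that each RES sends back."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, str]] = []

    def callback(self, host: str, output: str) -> None:
        self.received.append((host, output))


class Application:
    """Toy application holding one LSLIB instance, and so at most one NIOS."""

    def __init__(self) -> None:
        self._nios: Optional[Nios] = None

    def run_remote(self, host: str, task: str) -> Nios:
        # The NIOS is created on first use and reused for every later task.
        if self._nios is None:
            self._nios = Nios()
        # Stand-in for the RES on `host` calling back with the task's output.
        self._nios.callback(host, f"output of {task}")
        return self._nios


app = Application()
first = app.run_remote("hostA", "task1")
second = app.run_remote("hostB", "task2")
print(first is second)  # True: both tasks share one NIOS
print(Application().run_remote("hostA", "task3") is first)  # False: another application
```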

Remotely executed tasks behave as if they were executing locally. The local execution environment passed to the RES is re-established on the remote host, and the task’s status and resource usage are passed back to the client. Terminal I/O is transparent, so even applications such as vi that do complicated terminal manipulation run transparently on remote hosts. UNIX signals are supported across machines, so remote tasks receive signals as if they were running locally. Job control is also handled transparently. This level of transparency is maintained between heterogeneous hosts.