Using SGI Cpusets with ULDB

The SGI user limits database (ULDB) allows user-specific limits for jobs. If no ULDB is defined, job limits are the same for all jobs. If you use ULDB, you can configure LSF so that jobs submitted to a host with the SGI job limits package installed are subject to the job limits configured in the ULDB.

Set the ULDB domain

Set LSF_ULDB_DOMAIN=domain_name in lsf.conf to specify the name of the LSF domain in the ULDB domain directive. A domain definition of name domain_name must be configured in the jlimit.in input file.

The ULDB contains job limit information that system administrators use to control access to a host on a per-user basis. The job limits in the ULDB override the system default values for both job limits and process limits. When a ULDB domain is configured, the limits are enforced as SGI job limits.

If the ULDB domain specified in LSF_ULDB_DOMAIN is not valid or does not exist, LSF uses the limits defined in the domain named batch. If the batch domain does not exist, then the system default limits are set. When an LSF job is submitted, an SGI job is created, and the job limits in the ULDB are applied.
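
The fallback order described above can be sketched as a small lookup. This is an illustrative Python model only, not LSF or SGI code: the function name resolve_uldb_domain and the set of defined domain names are hypothetical, and the sketch assumes LSF_ULDB_DOMAIN names a single domain.

```python
from typing import Optional, Set


def resolve_uldb_domain(configured: str, defined: Set[str]) -> Optional[str]:
    """Return the ULDB domain whose limits apply, or None for system defaults.

    configured: value of LSF_ULDB_DOMAIN (assumed to be one domain name).
    defined:    names of the domains that exist in the ULDB (hypothetical input).
    """
    # 1. Use the domain named in LSF_ULDB_DOMAIN if it exists in the ULDB.
    if configured in defined:
        return configured
    # 2. Otherwise fall back to the domain named "batch".
    if "batch" in defined:
        return "batch"
    # 3. Neither exists: the system default limits are set.
    return None
```

For example, with only a batch domain defined, resolve_uldb_domain("LSF", {"batch"}) returns "batch"; with no domains defined it returns None, which stands for the system defaults.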

Next, LSF resource usage limits are enforced for the SGI job under which the LSF job is running. LSF limits override the corresponding SGI job limits. The ULDB limits are used for any LSF limits that are not defined. If the job reaches the SGI job limits, the action defined in the SGI system is used. SGI job limits in the ULDB apply only to batch jobs.
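
The precedence rule above (a defined LSF limit overrides the corresponding SGI job limit, and the ULDB value is used where no LSF limit is defined) can be modeled as a dictionary merge. This is a minimal sketch under stated assumptions, not how LSF implements it: the function effective_limits is hypothetical, limits are keyed by SGI job limit name, the numeric values are arbitrary, and None stands for an LSF limit that is not defined.

```python
from typing import Dict, Optional


def effective_limits(
    lsf_limits: Dict[str, Optional[int]],
    uldb_limits: Dict[str, int],
) -> Dict[str, int]:
    """Combine limits: a defined LSF limit wins, otherwise the ULDB value applies."""
    effective = dict(uldb_limits)  # start from the ULDB job limits
    for name, value in lsf_limits.items():
        if value is not None:  # None models an LSF limit that is not defined
            effective[name] = value
    return effective


# Illustrative values only.
uldb = {"JLIMIT_CPU": 80, "JLIMIT_NUMPROC": 64}
lsf = {"JLIMIT_CPU": 60, "JLIMIT_NUMPROC": None}
result = effective_limits(lsf, uldb)
```

Here result keeps the LSF value for JLIMIT_CPU and the ULDB value for JLIMIT_NUMPROC.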

You can also define resource limits (rlimits) in the ULDB domain. One advantage of defining rlimits in the ULDB rather than in LSF is that the ULDB lets you define rlimits per user and per domain, whereas LSF enforces limits per queue or per job.

LSF resource usage limits controlled by ULDB job limits

The following are the LSF resource usage limits controlled by ULDB job limits:

  • PROCESSLIMIT: Corresponds to SGI JLIMIT_NUMPROC; fork(2) fails, but the existing processes continue to run.

  • MEMLIMIT: Corresponds to JLIMIT_RSS; resident pages above the limit become prime swap candidates.

  • DATALIMIT: Corresponds to JLIMIT_DATA; malloc(3) calls in the job fail with errno set to ENOMEM.

  • CPULIMIT: Corresponds to JLIMIT_CPU; a SIGXCPU signal is sent to the job, then after the grace period expires, SIGINT, SIGTERM, and SIGKILL are sent.

  • FILELIMIT: No corresponding limit; use process limit RLIMIT_FSIZE.

  • STACKLIMIT: No corresponding limit; use process limit RLIMIT_STACK.

  • CORELIMIT: No corresponding limit; use process limit RLIMIT_CORE.

  • SWAPLIMIT: Corresponds to JLIMIT_VMEM; use process limit RLIMIT_VMEM.

In some predefined LSF queues, such as normal, the default MEMLIMIT is set to 5000 (5 MB). However, if ULDB is enabled (that is, LSF_ULDB_DOMAIN is defined), set MEMLIMIT to a value greater than 8000 in lsb.queues.

ULDB domain configuration

The following steps are an example of how to enable the ULDB domain LSF for user user1:

  1. Define the LSF_ULDB_DOMAIN parameter in lsf.conf:

    ...
    LSF_ULDB_DOMAIN=LSF
    ...

    You can set LSF_ULDB_DOMAIN to include more than one domain, separated by colons. For example: LSF_ULDB_DOMAIN="lsf:batch:system"

  2. Configure the domain directive LSF in the jlimit.in file:

    domain <LSF> {           # domain for LSF
          jlimit_numproc_cur = unlimited
          jlimit_numproc_max = unlimited   # JLIMIT_NUMPROC 
          jlimit_nofile_cur = unlimited
          jlimit_nofile_max = unlimited    # JLIMIT_NOFILE 
          jlimit_rss_cur = unlimited
          jlimit_rss_max = unlimited       # JLIMIT_RSS 
          jlimit_vmem_cur = 128M
          jlimit_vmem_max = 256M           # JLIMIT_VMEM 
          jlimit_data_cur = unlimited
          jlimit_data_max = unlimited      # JLIMIT_DATA 
          jlimit_cpu_cur = 80
          jlimit_cpu_max = 160             # JLIMIT_CPU 
    }

  3. Configure the user limit directive for user1 in the jlimit.in file:

    user user1 { 
            LSF { 
               jlimit_data_cur = 128M 
               jlimit_data_max = 256M 
             } 
    }

  4. Use the genlimits or equivalent command to create the user limits database:

    genlimits -l -v