Support Knowledge Center

Barcelona Supercomputing Center

GREASY User's Guide

Greasy User Guide

Introduction

Greasy is a tool designed to make easier the deployment of embarrassingly parallel simulations in any environment. It is able to run in parallel a list of different tasks, schedule them and run them using the available resources. It is the perfect tool to use, for example, when your application is a serial program, and you need to run a large number of instances with different parameters. Greasy packs all these separate runs and uses the resources granted to run as many tasks as possible in parallel. As this tasks finish, Greasy will continue starting the tasks that were waiting for resources.

Since one of the main principles of Greasy is to keep it simple for the user, the list of tasks is just that: a list of tasks in a text file. Then, each line in the file becomes a task to be run by Greasy. It is able to manage dependencies between tasks, or to rerun a task in case of failure if desired.

Greasy can be easily configured by default with a configuration file, and can be also customized for each particular execution using environment variables. It also provides a log system where all greasy actions will be recorded to keep track of what is the progress of your run.

Installation

How to install Greasy

Greasy can be installed in any Linux machine or cluster. However, additional requirements may be necessary depending on the features desired:

If none of these packages is available and usable, Greasy will only be able to run tasks using the local node. The standard installation procedure is:

tar xvzf greasy-X.Y.tar.gz
cd greasy-X.Y
./configure --prefix=<path-to-greasy-installation-dir>
make
make install

Additional configuration options may be specified to configure in order to enable the additional engines to the greasy runtime. See ./configure –help for details.

Installation structure

We will call GREASY_HOME to the root directory of the installation. Once installed, a Greasy installation will contain the following files under this directory structure:

Greasy Usage

Writing the task file

The task file is the most important element, as here the user defines the tasks to be executed. As mentioned before, each line of the file becomes a task from Greasy’s point of view. This way, each line contains exactly what you want to execute, for example, the path to the binary followed by the arguments with the necessary redirections. The line number corresponds to what we call the task ID. This task ID is useful to understand the log file and when dependencies are needed.

Lines starting with # are treated as comments, and are not processed, but still reserve an ID. That means, for example, if you have the first line commented and the second line contains a valid task, then this task will have ID=2. This way, it is easy to use an editor to show the file line numbers (ie: vim), and address any task through its task ID just jumping into the corresponding file line number.

Working directory is specified at the beginning of the line, preceding the task command and between [@ and @]. This syntax allows to execute the task in the directory specified. As an example:

task 1 # executed on the directory which greasy has been launched from
[@ /home/user/greasy @] task 2 # executes “cd /home/user/greasy && task 2”

Dependencies are specified at the beginning of the line, right before the task and between [# and #]. All tasks that must end before the current task must be included here, and more than one task may be included as a dependency. The best way to understand dependency syntax is through an example. This is an example task file with some dependencies among tasks:

task 1
[# 1 #] task 2
[# -2, 2 #] task 3
task 4
[# 2-4 #] task 5

In this case, tasks 1 and 4 do not have any dependencies. Task 2 depends on task 1, and task 3 depends on tasks 1 and 2. Note that with the “-2” we specify a relative dependency: the parent task is the one 2 lines above, which is task 1. And finally, task 5 depends on tasks 2, 3 and 4. In this case, we specified the dependencies as a range. Remember that you can add more dependencies separating the elements with commas, as in task 3.

It is important to point out that only backward dependencies are allowed. That means that you can only add dependencies to tasks with ID less than the current task ID. In other words, you can only add dependencies to tasks defined in previous lines in the task file.

In summary, when writing a Greasy task file, just keep in mind these simple rules:

    # this line is a comment
    /bin/sleep 2
    # the following task is 4. Tasks IDs 1 and 3 do not exist.
    /bin/sleep 4
    /bin/sleep 5
    /bin/sleep 6
    /bin/sleep 7
    # the following task will be run after completion of the "sleep 5"
    [# 5 #] /bin/sleep 9
    # the following task will be run after completion of the "sleep 9"
    [# -2 #] /bin/sleep 11
    # the following task is invalid because tasks 1 and 3 do not exist
    [#1-3#] /bin/sleep 13
    # the following task will be run after completion of tasks 2, 5, 6 and 7
    [#2, 5 - 7 #] /bin/sleep 15
    # the following task will be executed on the directory /tmp/scratch
    [@ /tmp/scratch @] pwd
    # it is possible to combine dependencies and working directory for a task
    [@ /tmp/scratch @][# -2 #] echo “It works!”

Configuring Greasy

Greasy is configured using two different methods when it is executed: the configuration file and environment variables. It is mandatory to have a valid configuration file with the basic parameters set, and is located at GREASY_HOME/etc/greasy.conf.

In addition to the file, these parameters can be defined or overridden by the environment variables. That means that a parameter defined both in the configuration file and the environment will take the value of the latter.

Here is a brief summary of the different parameters:

There are some considerations regarding the configuration of Greasy:

Running Greasy

There is an example of script to run greasy under example directory in GREASY_HOME. The best way to get started with Greasy is copying this example directory to the place where you want to work, and then use the provided templates to set up your Greasy execution.

Greasy can be run interactively or using a batch system. There is an example script called bsc_greasy.job, which can be adapted to work in both ways. If run interactively, the following changes are needed before launching the script:

As mentioned above, the script is also prepared to be submitted as a BSC / RES job using mnsubmit. In this case, some additional modifications to the script are necessary:

Understanding the log file

While Greasy is still running or after its execution, it is a good idea to check the log. The log provides information of what is being done at any time, when and where the tasks are executed, and possible errors or problems occurred during the execution. LogLevel 3 (Information mode) is recommended as it generates only the minimum useful information. Here is an example of log records extracted from an execution of the short-example.txt with 3 workers:

[2012-02-14 16:50:15] Start greasing short-example.txt
[2012-02-14 16:50:15] INFO: BASIC engine is ready to run with 3 workers
[2012-02-14 16:50:15] INFO: Allocating task 1
[2012-02-14 16:50:15] INFO: Allocating task 2
[2012-02-14 16:50:15] INFO: Allocating task 3
[2012-02-14 16:50:20] INFO: Task 3 completed successfully on node lnx.site. Elapsed: 00:00:05
[2012-02-14 16:50:25] INFO: Task 2 completed successfully on node lnx.site. Elapsed: 00:00:10
[2012-02-14 16:50:35] INFO: Task 1 completed successfully on node lnx.site. Elapsed: 00:00:20
[2012-02-14 16:50:35] INFO: BASIC engine finished
[2012-02-14 16:50:35] INFO: Summary of 3 tasks: 3 OK, 0 FAILED, 0 CANCELLED, 0 INVALID.
[2012-02-14 16:50:35] INFO: Total time: 00:00:20
[2012-02-14 16:50:35] INFO: Resource Utilization: 58.33%
[2012-02-14 16:50:35] Finished greasing short-example.txt

As you can see, every record comes with a timestamp as a prefix, making easier to understand the behaviour of Greasy and the task scheduling throughout the whole execution. Having a quick look at the log, we see that the program started at 16:50:15 and finished at 16:50:35. We can also observe that the engine used is the default basic and that it is configured with 3 workers.

Since there are only 3 tasks to allocate and 3 workers, Greasy allocates all the tasks to all workers at the beginning. Later, Greasy records tasks completions reporting success or error, the node where the task was launched and the time it took to run.

Finally, when all tasks have been executed, Greasy shows some statistics about the execution, such as the tasks completed successfully, the failures or the canceled tasks. It is also very useful to check the global elapsed time and the resource utilization. The higher this latter value is, the more efficiently Greasy used the resources available to run the tasks. If you feel that this number is too low, consider changing the number of workers in order to have them busy the maximum time possible.

Something went wrong: the restart file

Sometimes things do not work as expected, and it is possible that some tasks or even the entire Greasy execution finishes abnormally. When that happens, Greasy generates a restart file containing those tasks that have failed, have been canceled or have just not been executed. The restart file is a valid task file for Greasy, so it can be used to run the tasks that could not run properly the first time. Here is an example of the restart file, running example.txt:

# 
# Greasy restart file generated at 2012-02-14 16:53:50
# Original task file: example.txt
# Log file: greasy.log
#
# Warning: Task 2 failed
/usr/bin/hostname
# Warning: Task 13 was cancelled due to a dependency failure
[# 8 #] /bin/sleep 13
# Warning: Task 15 failed
/usr/bin/hostname
# Warning: Task 22 failed
[ 1 ] /bin/sleep 22
# Warning: Task 24 failed
[ 1 #] /bin/sleep 24
# Invalid tasks were found. Check these lines on example.txt:
# 23, 26, 27, 29, 31, 32
# End of restart file

There is a little header which provides information about the restart itself: when it was created, the task file that was being executed and the log file of the execution. This information is very useful to link the tasks in the restart with the original execution and figure out what happened.

When a task was failed or cancelled, Greasy adds a comment identifying the task in the original file and telling the reason why it is in the restart. If the task was not able to run because Greasy was told to quit before, then there will be no comment added. Finally, At the end of the restart file, if there were invalid tasks in the original task file, because there were syntax or semantic errors, they are listed with their corresponding IDs.

Support & Contact

For any question, doubt or bug report about Greasy, please send an e-mail to:

Greasy is in continuous process of development, so if you feel that you have an idea to improve greasy with new features, or just want to provide some feedback, do not hesitate to contact to the address above. We will be delighted to receive your comments and suggestions to make Greasy even more useful and powerful.