Autosubmit

Distributed Computing Climate BSC Group: Earth Sciences Software
Autosubmit is a Python-based workflow manager to create, manage and monitor complex tasks involving different substeps, such as scientific computational experiments. These workflows may involve multiple computing systems for their completion, from HPCs to post-processing clusters or workstations. Autosubmit can orchestrate all the tasks that make up the workflow by managing their dependencies, interfacing with all the platforms involved, and handling any errors that occur.
Software Author: 
Daniel Beltran, Miguel Castrillo, Kim Serradell, Francisco Javier Doblas Reyes, Domingo Manubens, Oriol Mula, Wilmer Uruchi
Research Lines:
License: 

GPL License (Version 3.0)

GPL License (Version 3.0) (Latest Version)

Multi-threaded wrappers were introduced in this version.
Multiple hosts can be specified for the same platform (in a list), which makes it more robust against connection issues and login failures.
In general, big experiments with many startdates/members, or featuring very big wrappers, run much more efficiently with 3.13.0.
A completely new implementation of remote dependencies (PreSubmission) was introduced in this version. It helps to speed up the jobs on a Slurm platform by sending the next 10 Waiting jobs to the queues in advance.
Workflows are more flexible thanks to a new way to define dependencies for specific chunks.
Changes were made to the algorithm that handles the maximum active jobs by platform. From this version, wrapped jobs count as a single job for Autosubmit, and the maximum number of inner jobs can be defined with new wrapper parameters.
A new POLICY option tunes the behaviour for creating wrapper jobs (more greedy, more conservative, or more balanced).
Wrappers have a new option, QUEUE, that allows putting the wrapper job in a different queue than the single jobs.
There is a new recovery system for logs (.err, .out, COMPLETED, STAT files) that retries, in background threads, the transfer of the log files from the remote platforms in case of failure.
The user can specify a datetime or time to trigger the experiment start by passing the -st flag (plus the right format) to the autosubmit run command.
The user can specify an experiment dependency by passing the -sa flag (plus the right expid format) to the autosubmit run command. The experiment will start when the experiment specified in the -sa flag finishes.
When the user quits Autosubmit with CTRL+C, Autosubmit will make sure all threads are finished correctly before closing.
Job lifecycle information is stored in an external database that will allow users to visualize job historical information. This information is gathered in a way that does not interfere with the normal workflow (even if the information gathering, or any of its components, fails). Furthermore, threading is implemented to prevent unnecessary delays.
Specific members can be selected to run by using the -rm flag with autosubmit run. Autosubmit will only run jobs belonging to the specified members. Jobs already running will be monitored and properly completed.
The git clone operation (autosubmit create) now implements a backup procedure that will prevent loss of information in case of wrong configuration or network error.
Security is improved: all commands that could change the workflow are now locked by an owner-only mechanism, for example create, refresh and run.
A new autosubmit dbfix expid command allows users to fix the "database malformed" error.
A custom shebang (the header of the script templates) can be set, so it is possible to use Python or R templates with a specific Python/R version dependency.
Only the create and run commands can update the workflow configuration and structure information. In the case of run, they will only be updated if a change is detected before the start of the main run loop.
Increased robustness: Autosubmit will try to prevent as many errors as possible at the beginning of the run and will handle other delicate operations before run time.
A list of jobs can be prioritized to run before the rest of the workflow, via the Two_step_start variable set in expdef.conf.
Jobs of the same section can be skipped if their last queuing member/chunk is higher than that of others in queuing/waiting/ready status.
Reworked migrate command, with improvements in robustness and security.
New pklfix command to restore a corrupted local database.
New updatedescrip command to modify the experiment's description.
Added Nord3 support.
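
The log recovery behaviour mentioned above (retrying a failed log transfer in a background thread so the main workflow is not delayed) can be pictured with a minimal Python sketch. This is not Autosubmit's actual implementation: the function name recover_logs_in_background, its parameters, and the transfer callable are all hypothetical and exist only for illustration.

```python
import threading
import time


def recover_logs_in_background(transfer, max_retries=5, delay_seconds=0.0):
    """Retry transfer() in a background thread until it succeeds.

    `transfer` is a hypothetical callable that copies the log files and
    raises IOError/OSError on failure. Returns (thread, result); the
    result dictionary is filled in by the worker thread.
    """
    result = {"ok": False, "attempts": 0}

    def worker():
        for attempt in range(1, max_retries + 1):
            result["attempts"] = attempt
            try:
                transfer()
                result["ok"] = True
                return
            except (IOError, OSError):
                # Wait before the next attempt, but not after the last one.
                if attempt < max_retries:
                    time.sleep(delay_seconds)

    thread = threading.Thread(target=worker)
    thread.start()  # the caller continues immediately
    return thread, result
```

The caller keeps the returned thread and joins it before exiting, which mirrors the idea of making sure all threads finish correctly before the program closes.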

Release Notes

Prerequisites: The following packages must be locally available: bash, python2, sqlite3, git-scm > 1.8.2, subversion, dialog, curl, python-tk, python2-dev, graphviz >= 2.41, pip2. The following packages must be available at the Python runtime: argparse, python-dateutil, pyparsing, numpy, pydotplus, matplotlib, paramiko, python2-pythondialog, portalocker, requests, typing, six >= 1.10, tkinter. The machine needs to be able to access HPC platforms via password-less ssh.
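
A quick way to see which of the listed prerequisites are missing is a small check script. The sketch below is not part of Autosubmit; the helper names are hypothetical, and the import and executable names are assumptions where they differ from the package names (for example python-dateutil imports as dateutil, and subversion is assumed to provide svn). It does not check version constraints such as six >= 1.10, and tkinter is left out because its module name differs between Python 2 and Python 3.

```python
from __future__ import print_function

import os


def missing_modules(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def missing_executables(program_names, search_path=None):
    """Return the programs not found as executables on the search path."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    directories = [d for d in search_path.split(os.pathsep) if d]
    missing = []
    for program in program_names:
        found = any(
            os.path.isfile(os.path.join(d, program))
            and os.access(os.path.join(d, program), os.X_OK)
            for d in directories
        )
        if not found:
            missing.append(program)
    return missing


# Assumed import names for the Python packages listed above.
PYTHON_MODULES = ["argparse", "dateutil", "pyparsing", "numpy", "pydotplus",
                  "matplotlib", "paramiko", "dialog", "portalocker",
                  "requests", "typing", "six"]
# Assumed executable names (subversion -> svn, graphviz -> dot).
PROGRAMS = ["bash", "sqlite3", "git", "svn", "dialog", "curl", "dot"]
```

Calling missing_modules(PYTHON_MODULES) and missing_executables(PROGRAMS) from the interpreter that will run Autosubmit returns the names that still need to be installed; an empty list means everything in that group was found.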