Biological data integration and management

Bioinformatics today faces exponential growth of data. Genomics, clinical records, and simulation data accumulate terabytes of data that require new approaches to storage and data transmission, the adaptation of analytical tools, and efficient ways to present results.

Summary

Data management is a key step in current biological projects. The amount of data generated in genomics studies, especially in the biomedical context, and in the field of biomolecular simulations requires initiatives focused on data transmission, security, storage, and efficient distribution. Our group is developing several initiatives in these directions.

As part of the Spanish node of the ELIXIR European infrastructure, our group collaborates in defining guidelines and generating tools to ease biological data management. We provide data management support to a number of large-scale genomics projects: ImidKit; the study of the genomics of Chronic Lymphocytic Leukemia, integrated in the International Cancer Genome Consortium (ICGC); the PanCancer Analysis of Whole Genomes (PCAWG); and Blueprint.

Objectives

Develop strategies for the efficient use of biological ontologies in the generation of bioinformatics tools
Develop new models of web-service registries
Develop strategies for automated verification and benchmarking of bioinformatics tools
Interface bioinformatics pipelines with HPC and cloud infrastructures