Alberto Labarga Gutierrez

Biography

With a background in Biomedical Engineering and Bioinformatics, I have spent the last twenty years working on biomedical data science projects and systems administration, both at universities and research centers and professionally at several start-up companies that I helped create.

Most of my experience comes from projects managing health data. The most recent are ICTUSENSOPT, where I develop algorithms to improve both the diagnosis and management of stroke patients, and NAGEN, a major genome sequencing initiative in Navarra that spun off several projects, such as NAGEN 1000, consisting of WGS for rare disease diagnosis, and PharmaNAGEN, involving WES for pharma. In the context of NAGEN I led the setup of the bioinformatics infrastructure, software and sequence analysis pipelines, in close collaboration with NASERTIC (supercomputing center in Navarra) and CNAG (sequencing facilities). I am also familiar with European-wide initiatives such as Elixir and EGA. I worked for three years at the European Bioinformatics Institute and I am still in contact with many groups there. I was responsible for the web and API access to most of the EBI databases and services, and I know the importance of standardized public availability of data and tools for the development of open science. I have also been committed to the development of interoperability tools and standards, as well as to the FAIR principles, having participated six times in the NBDC/DBCLS Biohackathon (now also co-organized by Elixir).

Before joining BSC, I led the Data Engineering team at IOMED, a health tech startup applying artificial intelligence and natural language processing to electronic health records. We processed and normalized electronic health records from hospitals across Europe, using language models to extract information from text and build a federated search engine to speed up clinical studies, and we built one of the biggest federated networks of OMOP-CDM clinical databases in the world. We processed thousands of clinical records daily, applying the latest NLP technologies in a cloud-neutral, containerized environment using technologies such as Python, SQL, PostgreSQL, MongoDB, Docker, Kubernetes and Airflow.

My current work at BSC involves the integration of genomics data within the OMOP-CDM, synthetic data generation, clinical beacons, etc.

Research

I will be setting up the Health Data Unit, where we will develop projects and tools related to:

  • standardization of EHR to OMOP-CDM
  • integration of genomics and clinical data
  • extraction of clinically relevant information from text using Natural Language Processing
  • synthetic health data generation