Aitor Gonzalez Aguirre

Biography

Aitor Gonzalez-Agirre is the team leader of Language Modeling in the Language Technologies Unit of the Barcelona Supercomputing Center - Centro Nacional de Supercomputación. This team is responsible for the development of Large Language Models and their applications in Natural Language Processing tasks.

He received his M.S. degrees and his Ph.D. in Natural Language Processing from the University of the Basque Country (UPV/EHU). He was awarded the 2017 Best Thesis Award at the SEPLN Conference (Sociedad Española para el Procesamiento del Lenguaje Natural), held in Sevilla from 19 to 21 September 2018.

He participates in the ALIA project, funded by the Spanish Ministry of Digitalization, and contributes to the AINA project aimed at enhancing language infrastructures in Spanish, Catalan, and other co-official languages. He was previously part of the Plan de Impulso de las Tecnologías del Lenguaje (PlanTL) project, which promoted language infrastructures.

Recently, he has concentrated on the development of Large Language Models. His team is behind the creation of the Salamandra language models, which focus on enhancing performance in multiple languages. He also developed the MarIA models, for which he received the Archiletras Innovation Award (2022) for the work "MarIA: Spanish Language Models". Additionally, he received an award for Best Dataset Paper for his work on VeritasQA, a benchmark dataset for evaluating question answering systems based on natural language understanding.

He has worked in the biomedical and legal domains as a member of the Text Mining Unit of the BSC and has experience organizing popular tasks, including Semantic Textual Similarity (STS), Biomedical Abbreviation Recognition and Resolution 2nd Edition (BARR2), Medical Document Anonymization task (MEDDOCAN), Pharmacological Substances, Compounds and Proteins and Named Entity Recognition track (PharmaCoNER), MESINESP and CodiEsp, among others.

Aitor has also contributed to the creation and enrichment of multilingual lexical knowledge bases, including the Multilingual Central Repository 3.0 (MCR), the eXtended WordNet Domains (XWD), the MeSpEN resource, and Medical and Legal Word Embeddings for Spanish and Catalan.

Education

  • Ph.D. in Computer Science, University of the Basque Country (July 2017).
  • M.S. in Language Analysis and Processing (September 2012).
  • M.S. in Advanced Computer Systems (September 2011).
  • B.S. in Computer Science (June 2010).

Main Research Lines