On January 13, at the Faculty of Medicine of the University of Barcelona, a key advance in the application of Artificial Intelligence to health and natural language processing will be presented: CARMEN-I (Corpus of Anonymized Records for Medical information Extraction), the first anonymized corpus of different types of clinical reports in Spanish. CARMEN-I was developed within the Plan for the Promotion of Language Technologies (TL Plan), promoted by the Secretary of State for Digitalization and Artificial Intelligence (SEDIA), and under the agreement between the Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS) and the Hospital Clínic de Barcelona. In addition to the CARMEN-I corpus, researchers will showcase pre-trained models that facilitate the detection of sensitive data and clinical concepts (diseases, findings and procedures, among others). Documentation on the anonymization and annotation processes will also be made available.
About CARMEN-I: applying Artificial Intelligence to medical records
CARMEN-I is a first step towards creating properly documented, evaluated and licensed clinical Natural Language Processing (NLP) components. NLP is key to improving the quality of care and clinical research: both health professionals and researchers can benefit from access to medical records that have been properly processed, validated and adapted for study or analysis.
The quality of text processing systems has improved in recent years thanks to advances in Artificial Intelligence, deep learning techniques and the use of language models. CARMEN-I is intended to serve as a freely accessible health dataset that enables the application of Artificial Intelligence in health. Martin Krallinger, lead researcher of the Text Mining group at the BSC, describes its objective: "CARMEN-I aims not only to promote the technological development of clinical NLP systems, but also to serve as a technical basis for creating anonymized data nationally and internationally, especially for Latin America and for countries with data in Romance languages."
CARMEN-I will be accessible to researchers and health professionals, together with the anonymization protocol and guidelines, in order to promote the development of language and AI technologies applied to clinical data and to provide standards for anonymizing sensitive data.
The debate: anonymizing data and processing natural language in the field of health
The event on January 13 will address two main topics: how an anonymized dataset such as CARMEN-I was created, and how the automatic extraction of clinical information from texts works. Martin Krallinger will explain the process of automatically extracting clinical data and the system developed to anonymize the clinical reports in CARMEN-I.
The event also aims to contribute to training and dissemination in language technologies applied to the health sector, for industry, academia, and health experts and researchers alike.
In addition, the event will feature a talk by Jesús Pinilla, Secretary of State for Digitalization and Artificial Intelligence, on the new language economy and its applications in health and biomedicine. The event will be closed by Alfonso Valencia, director of Life Sciences at the BSC; Xavier Pastor, physician at the Hospital Clínic de Barcelona; and Jesús Pinilla.