A Computer-implemented and Reference-free Method for Identifying Variants in Nucleic Acid Sequences

 Status:
Granted
 Publication number:
WO2018007034
 Priority date:
 Inventor:
David Carrera Perez, Jordà Polo, Nicola Cadenelli, David Torrents Arenales, Mercè Planas Felix
 Applicant:
Barcelona Supercomputing Center - Centro Nacional De Supercomputacion (BSC-CNS), Institució Catalana de Recerca I Estudis Avançats (ICREA), Universitat Politècnica de Catalunya (UPC)

Abstract

A computer-implemented method is provided for identifying nucleic acid variants between two cells, such as a normal cell and a pathological cell from the same patient, or a cell at two different stages of development. The method is reference-free: it does not depend on alignment to a reference genome, and instead generates and compares polymorphic k-mers derived from the nucleotide sequence reads of both biological states. The invention accurately identifies genetic variants of all kinds, from single nucleotide variants (SNVs) to large structural variants, with high sensitivity and specificity. As a major novelty, it also identifies non-human insertions, such as those derived from retroviruses. The method can also be integrated with specific hardware architectures to greatly accelerate execution.
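
To make the idea of comparing k-mers between two biological states more concrete, the following is a minimal sketch in Python. It is not the patented method: it only illustrates the general principle of counting k-mers in the reads of each state and keeping those that occur in one state and not the other, with no reference genome involved. The function names, the `min_count` threshold, and the toy reads are all assumptions made for illustration. Reverse complements, sequencing errors, and variant reconstruction are deliberately ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def state_specific_kmers(control_reads, case_reads, k, min_count=2):
    """Return k-mers seen at least min_count times in the case reads
    and never in the control reads (hypothetical helper, illustration only)."""
    control = count_kmers(control_reads, k)
    case = count_kmers(case_reads, k)
    return {
        kmer
        for kmer, n in case.items()
        if n >= min_count and kmer not in control
    }


# Toy reads (invented): the case reads carry a single A>T substitution
# at the fifth base relative to the control reads.
control_reads = ["ACGTACGGA", "ACGTACGGA"]
case_reads = ["ACGTTCGGA", "ACGTTCGGA"]

candidates = state_specific_kmers(control_reads, case_reads, k=5)
print(sorted(candidates))
```

In this toy case every 5-mer overlapping the substituted base appears only in the case reads, so a single substitution shows up as a cluster of overlapping state-specific k-mers. The `min_count` threshold is a simple, assumed way to discard k-mers that occur only once, which in real data are often sequencing errors.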
 
The project leading to this patent has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595).