Description
Long non-coding RNAs (lncRNAs) are abundant in mammalian transcriptomes. However, it remains unclear how many of them are functional and how they carry out their functions. LncRNAs appear to be poorly conserved at the sequence level, but some share conserved structural elements and occupy syntenic genomic positions in different species.
A recent study revealed that secondary structure constrains sequence variation in lncRNAs: polymorphisms are depleted in low-accessibility regions and tend to be neutral with respect to structural stability. This contrasts with previous analyses that dismissed relationships between structure and sequence evolution in lncRNAs. A crucial difference is that the structural feature considered in the recent study, accessibility, is computed from an ensemble of thermodynamically stable structures rather than from a single structure. Moreover, high-throughput structure probing shows that many lncRNA sites give positive signals for both single- and double-strand-specific enzymes, suggesting that several structures may coexist.
Based on this, I argue that the difficulty of identifying links between sequence and structure in lncRNAs results in part from the limitations imposed by assuming a single, stable structure. I therefore propose to consider ensembles of coexisting structures in lncRNAs and to develop a new computational framework that enables this. Using this new paradigm, I will study lncRNAs from animals and fungi by coupling experimental data from RNA structure probing with novel computational approaches that overcome current limitations. Overall, this multidisciplinary approach will profoundly impact our understanding of the evolution of lncRNAs. Furthermore, my project should help to bridge the gap between the structure and function of lncRNAs in different species. Finally, as many lncRNAs are involved in a variety of human diseases, these results may provide insights towards novel clinical applications.
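
To illustrate what an ensemble-based accessibility means in practice, the following minimal Python sketch computes a per-nucleotide unpaired probability from a small set of coexisting structures. It is an illustration under stated assumptions, not the framework proposed here nor the method of the cited study: the dot-bracket structures and free energies are invented toy values, the function names are hypothetical, and accessibility is simplified to the Boltzmann-weighted probability that a single position is unpaired (published definitions often consider windows of several nucleotides).

```python
import math

# Thermal energy in kcal/mol at 310.15 K (37 degrees C); gas constant in kcal/(mol*K).
RT = 0.0019872 * 310.15


def boltzmann_weights(energies, rt=RT):
    """Return the normalized Boltzmann weight of each structure in the ensemble."""
    # Shift by the minimum energy so the exponentials stay numerically stable.
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def accessibility(structures, energies, rt=RT):
    """Per-position probability of being unpaired across the ensemble.

    structures: dot-bracket strings of equal length ('.' means unpaired).
    energies:   free energy of each structure in kcal/mol (hypothetical inputs).
    """
    if not structures or len(structures) != len(energies):
        raise ValueError("need one energy per structure")
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structures must have the same length")

    weights = boltzmann_weights(energies, rt)
    profile = [0.0] * length
    for structure, weight in zip(structures, weights):
        for i, symbol in enumerate(structure):
            if symbol == ".":
                profile[i] += weight
    return profile


# Toy ensemble (assumed values): three alternative structures of one 12-nt RNA.
structures = ["((((....))))", "..((....))..", "............"]
energies = [-3.0, -2.5, 0.0]
profile = accessibility(structures, energies)
```

In this toy ensemble the loop positions are unpaired in every structure and therefore fully accessible, whereas stem positions receive intermediate values that depend on how the weight is shared among the alternative structures. A single minimum-free-energy structure would instead label each position as strictly paired or unpaired, which is the simplification the project argues against.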