Bioinformatics has become the bottleneck of all high-throughput-based biological studies. Next-generation sequencing (NGS) platforms produce millions of sequences (reads) in a short amount of time and at low cost. A major problem is handling and analyzing these large-scale data sets in an efficient and systematic way. Bioinformatics methods make it possible to analyze high-throughput sequencing data computationally and thereby help to address biological questions.
This thesis approaches computational challenges and biological questions that arise when investigating microRNA genes (miRNAs) in nematodes using NGS technologies (ABI SOLiD, Illumina GA II, and HiSeq). On the one hand, bioinformatics methods and computational strategies were identified and developed to analyze experimental large-scale small RNA data. These data sets were generated in-house, provided by collaborators, or obtained from public sources.
On the other hand, this work addresses the question of whether miRNA genes impact developmental arrest and long-term survival in dauer larvae of two free-living nematodes (Caenorhabditis elegans (C. elegans) and Pristionchus pacificus (P. pacificus)) and the infective stage of parasites (Strongyloides ratti (S. ratti)). In particular, I address the long-standing hypothesis that dauer and infective larvae share a common origin. This investigation focuses specifically on determining whether these two larval stages exhibit similar miRNA expression signatures.
In the first part of this study, I developed a bioinformatics workflow that characterizes the miRNA gene complement in C. elegans, P. pacificus, and S. ratti and investigates their expression levels. Additionally, this workflow infers miRNA gene families and integrates the observed phylogenetic relationships with measured expression level changes. As part of this study, I was involved in the development of FLEXBAR (published in 2012 in the special issue “Next-Generation Sequencing Approaches in Biology”, Biology), a program that I applied to preprocess our small RNA sequencing data.
FLEXBAR is a versatile solution for three critical preprocessing steps in any next-generation processing pipeline: (i) basic clipping and quality filtering, (ii) barcode recognition and processing, and (iii) adapter recognition and removal. Importantly, all of these steps can be performed in one program call and executed in parallel. In removing adapters from Illumina reads (benchmark I), FLEXBAR performs slightly better than FASTX, which is widely considered to be the best of the selected competitors. Furthermore, FLEXBAR covers a large range of sequencing platform applications, formats, and features and provides detailed output statistics, e.g. graphical output of read alignments.
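The three preprocessing steps named above can be illustrated with a minimal Python sketch. This is not FLEXBAR's implementation or its command-line interface: the function names, the thresholds, the order of the steps, the simple mismatch counting (no insertions or deletions), and the example barcodes and adapter are all assumptions made for illustration only.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def quality_trim(seq, quals, min_q=20):
    """Step (i): clip 3' bases while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def assign_barcode(seq, barcodes, max_mismatches=1):
    """Step (ii): match a 5' barcode; return (sample, remaining read)."""
    best = None
    for sample, bc in barcodes.items():
        if len(seq) < len(bc):
            continue
        d = hamming(seq[:len(bc)], bc)
        if d <= max_mismatches and (best is None or d < best[0]):
            best = (d, sample, len(bc))
    if best is None:
        return None, seq  # no barcode close enough
    return best[1], seq[best[2]:]


def remove_adapter(seq, adapter, min_overlap=3, max_error_rate=0.1):
    """Step (iii): cut the read at the leftmost 3' adapter match."""
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(seq) - start, len(adapter))
        mismatches = hamming(seq[start:start + overlap], adapter[:overlap])
        if mismatches <= max_error_rate * overlap:
            return seq[:start]
    return seq  # adapter not found


def preprocess(seq, quals, barcodes, adapter):
    """Run the three steps in sequence (order is an assumption)."""
    seq, quals = quality_trim(seq, quals)
    sample, seq = assign_barcode(seq, barcodes)
    return sample, remove_adapter(seq, adapter)


# Hypothetical barcodes, adapter, and read, for illustration only.
BARCODES = {"s1": "ACGT", "s2": "TTAA"}
ADAPTER = "CTGTAGGCACC"
INSERT = "TGAGGTAGTAGGTTGTATAGTT"
READ = "ACGT" + INSERT + "CTGTAGG"
QUALS = [30] * (len(READ) - 2) + [5, 5]

sample, trimmed = preprocess(READ, QUALS, BARCODES, ADAPTER)
```

In this toy example the read is assigned to the hypothetical sample s1 and the 22-nucleotide insert is recovered. A very short minimum overlap, as used here, will occasionally clip read ends that resemble the adapter by chance, so a real pipeline would need to tune that trade-off.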
In the second part of this study, I applied the bioinformatics workflow to address the question of whether miRNAs impact developmental arrest and long-term survival in dauer and infective larvae of nematodes (published in 2013 in Genome Biology and Evolution). This study extends the number of described miRNA genes to 257 for C. elegans, triples the known gene set for P. pacificus to 362 miRNAs, and reports the first miRNAs in a Strongyloides parasite, i.e. 106 miRNAs in S. ratti. Although our data suggest that miRNA gene sets diverged rapidly in nematodes, my in-depth assessment of miRNAs in free-living and parasitic nematodes revealed conserved miRNA gene families with similar expression signatures in dauer and infective larvae. This finding suggests that common post-transcriptional regulatory mechanisms are at work and that the same miRNA families play important roles in developmental arrest and long-term survival in free-living and parasitic nematodes. Moreover, this result supports the hypothesis that dauer and infective larvae share a common origin.
Taken together, this thesis describes an extensive set of bioinformatics tools and strategies for the analysis of miRNA genes in free-living and parasitic nematodes and constitutes a valuable resource for researchers studying miRNA evolution and, in particular, any aspect of developmental arrest. The starting point of this work was the identification of miRNAs in high-throughput small RNA sequencing data profiled by two distinct sequencing platforms. In this context, I provided bioinformatics solutions to analyze small RNA sequencing data sets and to address the aforementioned questions computationally.