Bioinformatics tools for metagenomics analysis of microbial communities

Goussarov Gleb


Vandamme Peter,

SCK•CEN Mentor

Monsieurs Pieter
+32 14 33 21 08

Expert group


PhD started


Short project description

In-depth studies of microbial communities previously relied on cultivating the bacteria under lab conditions. However, culturing is complicated and in most cases even impossible (1). In this context, the development of new DNA sequencing technologies has led to a revolution in microbial ecology, as it allows microbial communities to be scrutinized in detail without the need to culture them in the lab. This methodology, called metagenomics, offers a unique approach to identifying the potential of microbes surviving in extreme environments, such as the cooling circuits of the BR2 (2, 3) or the clay water of the underground waste disposal repository (4). Nowadays the most widely used strategy is to amplify and sequence one specific phylogenetic marker gene, which enables microbiologists to identify the types of bacteria present in the studied community. While powerful, this approach has certain limitations, the most important being that it only gives insight into the taxonomic classification and not into the biological functions that the community can perform. Owing to a decrease in sequencing costs, it is now feasible to sequence the complete DNA of a microbial community (i.e. shotgun metagenomics) rather than focusing on a specific marker gene. This gives access to the complete metabolic potential of the community and to the biological functions encoded within the bacterial genomes. However, this approach comes with a dramatic increase in the complexity of the data analysis and therefore requires the development of new bioinformatics tools.

References: (1) Oliver JD. 2005. The viable but nonculturable state in bacteria. J Microbiol 43 Spec No:93-100. (2) Props R, Kerckhof FM, Rubbens P, De Vrieze J, Hernandez Sanabria E, Waegeman W, Monsieurs P, Hammes F, Boon N. 2016. Absolute quantification of microbial taxon abundances. ISME J doi:10.1038/ismej.2016.117. (3) Props R, Monsieurs P, Mysara M, Clement L, Boon N. 2016. Measuring the biodiversity of microbial communities by flow cytometry. Methods Ecol Evol doi:10.1111/2041-210x.12607. (4) Wouters K, Moors H, Boven P, Leys N. 2013. Evidence and characteristics of a diverse and metabolically active microbial community in deep subsurface clay borehole water. FEMS Microbiol Ecol 86:458-473.


While the analysis of 16S rRNA amplicon sequencing data has already proven to be computationally challenging (5-7), this is even more the case for shotgun metagenomics. Although the first strategies for analyzing shotgun metagenomics data have already been developed, it is still unclear which one is best suited to studying less intensively explored environments. Selecting the optimal strategy for those extreme environments will therefore be a laborious task, requiring an extensive benchmark study. For example, it will be essential to determine whether to choose a methodology based mainly on publicly available reference databases or rather to implement methods that are more independent of such reference data sets. Based on those results, a dedicated software pipeline will need to be developed, either built on existing tools or by developing our own software modules. The ultimate goal of those bioinformatics applications will be to obtain the complete bacterial genomes of the most abundant species within the environmental sample. As with the bioinformatics tools developed for 16S rRNA amplicon sequencing, the algorithms implemented within this PhD are highly relevant for different SCK•CEN life sciences research lines, such as the identification of microbial communities surviving within the cooling circuits of the BR2 reactor, the effect of radiation on the human gut microbiome, the impact of microbes on the safe disposal of nuclear waste in clay layers, and the interaction of microbes with plants. Validation of the developed tool(s) in other (nuclear or radiobiology related) environments will therefore be essential within the context of this PhD.

References: (5) Mysara M, Leys N, Raes J, Monsieurs P. 2016. IPED: a highly efficient denoising tool for Illumina MiSeq Paired-end 16S rRNA gene amplicon sequencing data. BMC Bioinformatics 17:192. (6) Mysara M, Leys N, Raes J, Monsieurs P. 2015. NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads. BMC Bioinformatics 16:88. (7) Mysara M, Saeys Y, Leys N, Raes J, Monsieurs P. 2015. CATCh, an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Appl Environ Microbiol 81:1573-1584.