OCToPUS version 2: Comparative analysis of 16S metagenomics clustering approaches

SCK•CEN Mentor

Ahmed Mohamed Mysara, mahmed@sckcen.be, +32 (0)14 33 28 36

Expert group


SCK•CEN Co-mentor

Monsieurs Pieter, pmonsieu@sckcen.be, +32 (0)14 33 21 08


The development of high-throughput sequencing technologies has given microbial ecologists an efficient way to assess bacterial diversity at unprecedented depth, particularly with recent advances in the Illumina MiSeq sequencing platform. However, analyzing such high-throughput data poses important computational challenges and requires specialized bioinformatics solutions at different stages of the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and grouping of the reads into meaningful taxa. There are several approaches for grouping the reads: a reference-based approach (commonly referred to as phylotyping or the closed-reference approach), a reference-independent approach (commonly referred to as OTU clustering), or a hybrid approach (the open-reference approach). Another concept, named Oligotyping, has recently been introduced. It helps microbial ecologists investigate concealed diversity within their operational taxonomic units (OTUs) at an extremely precise level by exploiting very subtle variations among 16S ribosomal RNA gene sequences. By discriminating between highly similar sequences, it might be possible to unravel the microbial composition at a previously unattainable depth.
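
To make the difference between the grouping strategies concrete, the following is a minimal Python sketch, not the OCToPUS implementation. It assumes reads are already aligned to the same length, uses a simple per-position identity instead of a real alignment score, and takes 97% as an illustrative threshold. The function names (`identity`, `closed_reference`, `greedy_otu`, `position_entropy`) are hypothetical. The per-column Shannon entropy stands in for the idea behind Oligotyping, which looks for highly variable positions inside a group of otherwise similar reads.

```python
from collections import Counter
from math import log2


def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, references, threshold=0.97):
    """Assign each read to its best-matching reference, or None if below threshold."""
    assignments = []
    for read in reads:
        best_name, best_seq = max(
            references.items(), key=lambda kv: identity(read, kv[1])
        )
        hit = identity(read, best_seq) >= threshold
        assignments.append(best_name if hit else None)
    return assignments


def greedy_otu(reads, threshold=0.97):
    """De novo clustering: reads, most abundant first, join the first centroid
    within the threshold or become a new centroid themselves."""
    otus = {}  # centroid sequence -> number of reads in the OTU
    for seq, count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count
    return otus


def position_entropy(reads):
    """Shannon entropy (in bits) of each alignment column."""
    entropies = []
    for column in zip(*reads):
        total = len(column)
        counts = Counter(column)
        entropies.append(-sum((k / total) * log2(k / total) for k in counts.values()))
    return entropies
```

In this sketch, a variant that differs from an abundant sequence by a single base is absorbed into that sequence's OTU by `greedy_otu`, while `position_entropy` still reports a non-zero value at the differing column. That is the kind of signal an entropy-based method can use to split the OTU further.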



This work will focus on the bioinformatics side of data handling. The expected outputs are:

  • Design a challenging simulated dataset that accounts for chimeras, closely related species, sequencing errors and PCR amplification bias.
  • Carry out a comparative study of the various approaches for grouping sequencing reads (OTU clustering, Oligotyping, closed-reference and open-reference).
  • Incorporate those findings into a second version of our in-house developed 16S metagenomics pipeline (OCToPUS).
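
As an illustration of the first deliverable, the sketch below shows one possible way to generate labeled simulated reads in Python. It is an assumption-laden toy model, not a prescribed design. Closely related species are modeled as templates differing by a few substitutions, PCR bias as a per-template weight multiplied with abundance, chimeras as a single-breakpoint join of two templates, and sequencing errors as independent substitutions at a fixed rate. All function and parameter names are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, n_subs, rng):
    """Return a closely related variant differing at exactly n_subs positions."""
    seq = list(seq)
    for pos in rng.sample(range(len(seq)), n_subs):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def add_errors(seq, error_rate, rng):
    """Substitute each base independently with probability error_rate."""
    return "".join(
        rng.choice([b for b in BASES if b != base])
        if rng.random() < error_rate
        else base
        for base in seq
    )


def make_chimera(parent_a, parent_b, rng):
    """Join the start of parent_a to the end of parent_b at a random cut point."""
    cut = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:cut] + parent_b[cut:]


def simulate(templates, abundances, pcr_bias, n_reads,
             chimera_rate, error_rate, seed=0):
    """Return a list of (label, read) pairs; the label records the true origin."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    names = list(templates)
    # Observed sampling weight = true abundance x amplification efficiency.
    weights = [abundances[n] * pcr_bias[n] for n in names]
    reads = []
    for _ in range(n_reads):
        if rng.random() < chimera_rate:
            a, b = rng.sample(names, 2)
            label = f"chimera:{a}/{b}"
            seq = make_chimera(templates[a], templates[b], rng)
        else:
            label = rng.choices(names, weights=weights)[0]
            seq = templates[label]
        reads.append((label, add_errors(seq, error_rate, rng)))
    return reads
```

Because every read carries its true origin as a label, the output of any grouping approach can be scored against a known ground truth, which is what a comparative study needs.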

The minimum diploma level of the candidate needs to be

Academic bachelor

The candidate needs to have a background in

Bio-engineering, Biology, Informatics

Estimated duration

6-9 months