Binning 16S rRNA sequences into Operational Taxonomic Units (OTUs) is a crucial initial step in analyzing the large sequence datasets generated to determine microbial community composition in various environments, including the human gut. Various methods are currently used for binning, but most are either inaccurate or unable to handle the millions of sequences generated in current studies. Furthermore, existing binning methods usually require a priori decisions about binning parameters, such as the distance level that defines an OTU.
In this study, we develop a new modularity-based clustering method for OTU picking of 16S rRNA sequences. The algorithm does not require a predetermined cut-off level, and our simulation studies suggest that it outperforms existing methods that require a specified distance or variance level to define OTUs.
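To illustrate why modularity-based clustering needs no distance cut-off, the following is a minimal sketch, not the method developed in this study. It assumes equal-length aligned sequences, a complete weighted graph whose edge weights are pairwise identity fractions, and a greedy Louvain-style local-moving step that maximizes Newman modularity. All function names (`identity`, `similarity_graph`, `modularity`, `local_moving`) are hypothetical. The number of clusters is whatever partition maximizes modularity, rather than being fixed by a threshold.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    assert len(a) == len(b), "sketch assumes pre-aligned, equal-length sequences"
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_graph(seqs):
    """Complete weighted graph (assumed weighting): w[i][j] = pairwise identity, no self-loops."""
    n = len(seqs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = identity(seqs[i], seqs[j])
    return w


def modularity(w, labels):
    """Newman modularity Q = (1/2m) * sum_ij [w_ij - k_i*k_j/2m] * delta(c_i, c_j)."""
    n = len(w)
    k = [sum(row) for row in w]  # weighted degree of each node
    two_m = sum(k)               # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += w[i][j] - k[i] * k[j] / two_m
    return q / two_m


def local_moving(w, max_sweeps=100):
    """Start from singletons; repeatedly move each node to the community that most
    increases Q; stop when a full sweep makes no move. Brute-force Q recomputation
    keeps the sketch short but is only practical for toy inputs."""
    n = len(w)
    labels = list(range(n))
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            best_label, best_q = labels[i], modularity(w, labels)
            for c in sorted(set(labels)):
                if c == labels[i]:
                    continue
                trial = labels.copy()
                trial[i] = c
                q = modularity(w, trial)
                if q > best_q + 1e-12:  # strict improvement only
                    best_label, best_q = c, q
            if best_label != labels[i]:
                labels[i] = best_label
                improved = True
        if not improved:
            break
    return labels


if __name__ == "__main__":
    # Toy data: two groups of closely related sequences (made up for illustration).
    seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA",
            "AAACCCCCCC", "AAACCCCCCG", "AAACCCCCGC"]
    w = similarity_graph(seqs)
    labels = local_moving(w)
    print(labels, round(modularity(w, labels), 4))
```

On this toy input the greedy moves recover the two groups without any identity threshold, because merging dissimilar sequences lowers modularity. A practical implementation would need sparse graphs and incremental modularity updates to scale to millions of sequences.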