ESPRIT: Estimating Species Richness Using Large Collections of 16S rRNA Pyrosequences

Y. Sun, Y. Cai, L. Liu, F. Yu, M. L. Farrell, W. McKendree, and W. Farmerie

Nucleic Acids Research, vol. 37, no. 10 e76, May 2009.


Recent metagenomics studies of environmental samples suggest that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, classifying large collections of 16S ribosomal sequences poses a serious computational challenge for existing algorithms. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational limitations of prior methods. We developed two versions of ESPRIT, one for personal computers and one for computer clusters. The personal-computer version is intended for small and medium-scale datasets and can process several tens of thousands of sequences within a few minutes, while the computer-cluster version is intended for large-scale problems and can analyze several hundred thousand sequences within one day. The ESPRIT algorithm is available upon request. Please send an email to Dr. Yijun Sun with your name, email address and affiliation.



o   ESPRIT is a standard implementation of the complete-linkage based hierarchical clustering method. It can comfortably process several tens of thousands of sequences on a desktop computer. We have used the algorithm to process 1.1M human gut sequences on a small computer cluster consisting of 100 nodes. Here is the paper.

o   Y. Sun, Y. Cai, V. Mai, W. Farmerie, F. Yu, J. Li, and S. Goodison, Advanced Computational Algorithms for Microbial Community Analysis Using Massive 16S rRNA Sequence Data, Nucleic Acids Research, vol. 38, no. 22, e205, 2010.
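
To make the term "complete linkage" concrete, here is a minimal sketch of complete-linkage agglomerative clustering on a small, precomputed distance matrix. It is a naive textbook version written for illustration only, not ESPRIT's implementation; the function name complete_linkage, the threshold parameter, and the toy distances are all assumptions.

```python
from itertools import combinations


def complete_linkage(dist, threshold):
    """Naive complete-linkage clustering (illustrative, not ESPRIT's code).

    dist: symmetric list-of-lists matrix of pairwise distances.
    threshold: stop once the closest pair of clusters is farther apart than this.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: cluster distance is the LARGEST member-to-member distance.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


# Hypothetical distances between four sequences (made-up numbers).
toy = [
    [0.00, 0.01, 0.02, 0.20],
    [0.01, 0.00, 0.04, 0.20],
    [0.02, 0.04, 0.00, 0.20],
    [0.20, 0.20, 0.20, 0.00],
]
print(complete_linkage(toy, 0.03))  # [[0, 1], [2], [3]]
print(complete_linkage(toy, 0.05))  # [[0, 1, 2], [3]]
```

At a threshold of 0.03, item 2 stays separate even though it is only 0.02 away from item 0, because complete linkage judges a merge by the farthest pair (0.04 between items 1 and 2). This sketch recomputes linkages on every pass and is far slower than a practical implementation.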


o   Many existing algorithms, though widely used by the biology community, have not yet been fully or properly benchmarked. They vary widely in their outputs, which makes it difficult to interpret and compare research findings from different research groups. We conducted a large-scale benchmark study to evaluate the performance of each algorithm. One of the reviewers commented that “every graduate student and PI using pyrosequencing technology for microbial community analysis should read this paper”. We hope you find the paper useful.

o   Y. Sun, Y. Cai, S. Huse, R. Knight, W. Farmerie, X. Wang and V. Mai, A Large-scale Benchmark Study of Existing Algorithms for Taxonomy-Independent Microbial Community Analysis, Briefings in Bioinformatics, in press, 2011.


o   ESPRIT has been implemented on the Novo-G system, the world's most powerful reconfigurable computer for research. Here is the paper.

o   C. Pascoe, A. Lawande, H. Lam, A. George, Y. Sun, W. Farmerie, and H. Martin, Reconfigurable Supercomputing with Scalable Systolic Arrays and In-Stream Control for Wavefront Genomics Processing. in Proc 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC10), July 2010.


o   ESPRIT is an O(N^2) algorithm, with quadratic computational and space complexity. We are developing a more powerful algorithm capable of handling several tens of millions of 16S rRNA pyrosequences. A preliminary study showed that the new algorithm has close-to-linear computational and space complexities, and runs about 500-1000 times faster than ESPRIT. This approach is also useful for other types of biological sequence clustering (e.g., identification of orthologs).

o   Y. Cai and Y. Sun, ESPRIT-Tree: Hierarchical Clustering Analysis of Millions of 16S rRNA Pyrosequences in Quasilinear Time, Nucleic Acids Research, 39 (14): e95, 2011.
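
The quadratic growth mentioned above can be illustrated with simple counting. The sketch below computes how many distinct sequence pairs N sequences produce and how much memory a dense table of those distances would need. The function names and the 4 bytes per distance are assumptions, and the figure is a back-of-the-envelope upper bound that ignores any filtering or sparse storage a real implementation may use.

```python
def pair_count(n):
    """Number of distinct unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def dense_matrix_bytes(n, bytes_per_distance=4):
    """Bytes needed to store every pairwise distance once (assumed 4 bytes each)."""
    return pair_count(n) * bytes_per_distance


n = 1_100_000  # the 1.1M-sequence dataset size mentioned on this page
print(pair_count(n))                 # 604999450000
print(dense_matrix_bytes(n) / 1e12)  # about 2.42 (terabytes)
```

Doubling the number of sequences roughly quadruples both numbers, which is why a quasilinear method matters once datasets reach millions of sequences.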


o   ESPRIT and ESPRIT-Tree use pairwise sequence alignment, instead of multiple sequence alignment, to compute pairwise distances. This choice has been criticized on the grounds that pairwise alignment ignores the secondary structure information of the 16S rRNA gene. We performed a simulation study showing that including secondary structure information does not improve OTU picking performance, but does significantly increase computational complexity.

o   X. Wang, Y. Cai, Y. Sun, R. Knight, and V. Mai, Secondary Structure Information Does not Improve OTU Picking for 16S rRNA Sequences, The ISME Journal, in press.
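
As an illustration of how a distance can be derived from a pairwise alignment, the sketch below aligns two sequences globally with a textbook Needleman-Wunsch recurrence and reports the fraction of alignment columns that differ. This page does not state which alignment routine, scoring scheme, or distance definition ESPRIT uses, so the scores (match=1, mismatch=-1, gap=-2), the linear gap penalty, and the function names are all assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pairwise_distance(a, b):
    """Fraction of aligned columns that differ, counting gap columns as differences."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    return sum(1 for p, q in zip(x, y) if p != q) / len(x)


print(global_align("ACGT", "AGT"))        # ('ACGT', 'A-GT')
print(pairwise_distance("ACGT", "AGT"))   # 0.25
print(pairwise_distance("ACGT", "ACCT"))  # 0.25
```

Each pair is aligned independently of every other sequence, which is what distinguishes this from a multiple sequence alignment, where all sequences share one set of columns.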


o   While parallel computing is generally not a viable solution to scaling up O(N^2) algorithms, the quasilinear space and computational complexities of the ESPRIT-Tree algorithm make it computationally tractable to process tens of millions of sequences by using a small computer cluster.

o   Y. Cai and Y. Sun, ESPRIT-Forest: Taxonomy Independent Analysis of Tens of Millions of 16S rRNA Pyrosequences Using Parallel Computing, to be submitted to Nucleic Acids Research, 2011.



o   Manuscript and supplement accepted by Nucleic Acids Research

o   User guide


Update history:

o   Version 1.0 released on January 30, 2009

o   Version 1.1 released on May 13, 2009

o   Version 1.2 released on July 21, 2009

o   Version 1.3 released on July 7, 2010

o   Version 1.4 released on February 17, 2011



ESPRIT has been used at the following research institutes:

o   MIT-Broad Institute

o   Stanford University

o   Technical University of Catalonia, Spain.

o   Genome Oriented Bioinformatics, Technical University, Munich, Germany

o   Department of Biology, Portland State University

o   Max Planck Institute for Marine Microbiology, Germany

o   School of Biological Sciences, Seoul National University, South Korea

o   Institute for Genomic Biology, University of Illinois at Urbana-Champaign

o   Dept. of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida

o   Instituto de Ecologia, Universidad Nacional Autonoma de Mexico, Mexico

o   BioInformatics, Nestle Research Center, Switzerland

o   Department of Epidemiology, School of Public Health, Columbia University

o   Department of Microbiology & Immunology, University of Maryland School of Medicine

o   Department of Biology, West Virginia State University

o   CMBI Bacterial Genomics, NIZO food research B.V. Top Institute Food and Nutrition, The Netherlands

o   Department of Biology, Hong Kong University of Science and Tech, China

o   Invitrogen, Inc.

o   INRA - Unite Mathematique, Informatique et Genome, France

o   King's College London, UK

o   Molecular and Environmental Plant Sciences, Texas A&M University

o   Computer Science Department, UNC Charlotte

o   Centre for Geobiology, University of Bergen, Norway

o   Department of Soil and Crop Sciences, Texas A&M University

o   Medical School of Zhejiang University, China

o   Synthetic Genomics, Inc.

o   Josephine Bay Paul Center, Marine Biological Laboratory, USA

o   Department of Microbiology, University of Washington, Seattle

o   Department of Civil Engineering, University of Glasgow, UK

o   Institute for Genome Sciences, University of Maryland School of Medicine

o   Bioinformatics Center, Biotechnology Institute - National University of Colombia, Colombia

o   Department of Biological Sciences, Florida Institute of Technology

o   US Air Force

o   School of Biosciences, Division of Microbiology, Human Microbiomics Lab, Cardiff University, UK

o   Midwest Research Institute, USA

o   Universidad de Buenos Aires, Ciudad Universitaria, Argentina

o   Rowett Institute of Nutrition and Health, University of Aberdeen, Scotland

o   Institute of Molecular and Cell Biology, University of Tartu, Estonia

o   Department of Food and Animal Biotechnology, Seoul National University, Seoul, Korea

o   Department of Animal Sciences, Ohio State University

o   Department Bodenokologie, Helmholtz Centre for Environmental Research, Germany

o   Department of Biological Sciences, Inha University, South Korea

o   Sun Center of Excellence for Visual Genomics, Faculty of Medicine, University of Calgary, Canada

o   Lawrence Berkeley National Labs, USA

o   Laboratoire d'Ecologie Alpine, France

o   The Spanish National Center of Biotechnology

o   Center of Microbial Ecology, Michigan State University

o   Centro de Bioinformatica, Instituto de Biotecnologia - Universidad Nacional de Colombia

o   Bharathiar University, India

o   Microbial Habitat Group, Max Planck Institute for Marine Microbiology

o   University of Groningen, Netherlands

o   Research Institute for Microbial Diseases, Osaka University, Japan

o   McGill University and Genome Quebec Innovation Center, Canada

o   Department of Microbiology & Immunology, University of Michigan

o   J. Craig Venter Institute, USA

o   Australian National University

o   Department of Biological Sciences, University of Idaho

o   Coastal Marine Laboratory, The Hong Kong University of Science and Technology

o   Graduate School of Oceanography, University of Rhode Island

o   Center for Ecology, Evolution and Conservation, University of East Anglia, UK

o   French National Institute for Agricultural Research (INRA-French)

o   The University of Chicago

o   Institut fur Populationsgenetik, Veterinarmedizinische Universitat Wien, Vienna

o   Dalhousie University, Canada

o   Center for Public Health Research, Spain

o   Laboratory of Animal Diversity and Systematics, Catholic University of Louvain, Belgium

o   Human and Molecular Genetics Center, Medical College of Wisconsin

o   Dept of Zoology and Animal Biology, University of Geneva, Switzerland

o   Genopole de Institut Pasteur, France

o   Gladstone Institute for Cardiovascular Disease, University of California, San Francisco

o   ALMARAI, Kingdom of Saudi Arabia

o   National Dairy Research Institute, India

o   Molecular Microbial Ecology Group, Laboratory of Microbiology, Wageningen University, The Netherlands

o   University of Southern California

o   Nanyang Technological University, Singapore

o   The University of York, UK

o   Jet Propulsion Laboratory, California Institute of Technology

o   Institute for Genome Sciences, University of Maryland at Baltimore

o   University of Arizona

o   University of Padua (Italy)

o   University of Ljubljana (Slovenia)

o   Universite Nice (France)

o   Bangor University (UK)

o   University of Oslo (Norway)

o   Department of Neuroscience, University of Arizona

o   Aberdeen University (UK)

o   National Institute of Advanced Industrial Science and Technology, Japan

o   University of Edinburgh (UK)

o   University of South Florida

o   University of Auckland (New Zealand)

o   Embrapa Recursos Geneticos e Biotecnologia (Brazil)

o   Stephen F. Austin State University

o   Carnegie Mellon University

o   The University of Algarve (Portugal)

o   University of Queensland (Australia)

o   Laboratory of Animal Diversity and Systematics, Katholieke Universiteit Leuven (Belgium)

o   Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel (Germany)

o   Laboratory for Microbial Oceanography, French National Center for Scientific Research

o   Northwestern Polytechnical University (China)

o   University of Guelph (Canada)

o   University of the Andes (Colombia)

o   Atmosphere and Ocean Research Institute, The University of Tokyo (Japan)

o   Department of Microbiology, University of Pennsylvania School of Medicine

o   University of Amsterdam and VU University Amsterdam (the Netherlands)

o   The Hannover Medical School (Germany)

o   Centre Bioengineering, Russian Academy of Sciences (Russia)

o   The Wellcome Trust Sanger Institute (UK)

o   National University of Singapore

o   School of Biotechnology and Biomolecular Sciences, The University of New South Wales (Australia)

o   University of California-Davis

o   University of North Texas

o   University of Nice (France)

o   Aberystwyth University (UK)

o   National Museum of Natural History, Smithsonian Institution

o   University of New Mexico

o   Southern Medical University of China

o   Tuskegee University

o   National Center for Biotechnology-CSIC (Spain)

o   Emory University

o   University of Tennessee

o   Oak Ridge National Laboratory

o   Murdoch Childrens Research Institute (Australia)

o   University of California- San Francisco

o   UNC-Chapel Hill

o   Uppsala University, Sweden

o   Corpogen-GeBiX (Colombia)

o   Albert Einstein College of Medicine of Yeshiva University

o  Scottish Association for Marine Science

o  Tsinghua University (China)

o  University of Oregon

o  Center for Environmental Sciences, University of Maryland

o  University of Nebraska-Lincoln

o  The Rockefeller University

o  University of Connecticut

o  Alfred Wegener Institute for Polar and Marine Research (Germany)

o  Montana State University

o  Max Planck Institute for Plant Breeding Research (Germany)

o  Nankai University (China)

o  Beijing Genomics Institute (China)

o  Universite de Rennes 1 (France)

o  Indiana University

o  Wuhan University (China)

o  University of New Hampshire

o  Bielefeld University (Germany)

o  Free University of Berlin (Germany)

o  University of Kaiserslautern (Germany)

o  Michigan State University

o  University of Birmingham (UK)

o  Novartis

o  Virginia Commonwealth University

o  University of California - San Diego

o  National Institute of Animal Science (South Korea)

o  The Methodist Hospital Research Institute

o  University of Manitoba (Canada)

o  Universidade Federal do Pampa (Brazil)

o  University of Natural Resources and Life Sciences (Austria)

o  King Abdullah University of Science and Technology (Saudi Arabia)

o  Deutsches Forschungszentrum für Gesundheit und Umwelt (Germany)

o  Duke University

o  The Genome Analysis Centre (UK)

o  European Molecular Biology Laboratory

o  Northwestern Polytechnic University (China)

o  University of Toronto (Canada)

o  University of Massachusetts – Boston

o  IASMA - Fondazione Edmund Mach (Italy)

o  St. Jude Children's Research Hospital

o  Dragon Genomics Center (Japan)

o  Warsaw University (Poland)

o  University of Waikato (New Zealand)

o  Arizona State University

o  The Commonwealth Scientific and Industrial Research Organisation (Australia)

o  New York University School of Medicine

o  The University of Arizona

o  The Station biologique de Roscoff (France)

o  CorpoGen (Colombia)

o  North Carolina State University

o  The Federal University of Pampa (Brazil)

o  National Laboratory for Scientific Computing (Brazil)

o  CSIRO Plant Industry (Australia)

o  University of Bari (Italy)

o  Integrative Biology group (Italy)

o  University of Western Sydney (Australia)

o  University of Florence (Italy)

o  George Mason University

o  University of British Columbia (Canada)

o  National Cheng Kung University

o  Agricultural Research Organization (ARO) (Israel)

o  University of Lethbridge (Canada)

o  Volcani Research Center (Israel)

















