ESPRIT: Estimating Species Richness Using Large Collections of 16S rRNA Pyrosequences
Y. Sun, Y. Cai, L. Liu, F. Yu, M. L. Farrell, W. McKendree, and W. Farmerie
Nucleic Acids Research, vol. 37, no. 10, e76, May 2009.
Recent metagenomics studies of environmental samples suggest that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, classifying large collections of 16S ribosomal sequences poses a serious computational challenge for existing algorithms. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational limitations of prior methods. We developed two versions of ESPRIT, one for personal computers and one for computer clusters. The personal-computer version is intended for small and medium-scale datasets and can process several tens of thousands of sequences within a few minutes, while the computer-cluster version is intended for large-scale problems and can analyze several hundred thousand sequences within one day. The ESPRIT algorithm is available upon request. Please send an email to Dr. Yijun Sun at sunyijun@ufl.edu with your name, email address and affiliation.
Notes:
o ESPRIT is a standard implementation of the complete-linkage based hierarchical clustering method. It can comfortably process several tens of thousands of sequences using a desktop computer. We have used the algorithm to process 1.1M human gut sequences using a small computer cluster consisting of 100 nodes. Here is the paper.
o Y. Sun, Y. Cai, V. Mai, W. Farmerie, F. Yu, J. Li, and S. Goodison, Advanced Computational Algorithms for Microbial Community Analysis Using Massive 16S rRNA Sequence Data, Nucleic Acids Research, vol. 38, no. 22, e205, 2010.
o Many existing algorithms, though widely used by the biology community, have not yet been fully or properly benchmarked. They vary widely in their outputs, which makes it difficult to interpret and compare research findings from different research groups. We conducted a large-scale benchmark study to evaluate the performance of each algorithm. One of the reviewers commented that “every graduate student and PI using pyrosequencing technology for microbial community analysis should read this paper”. We hope you find the paper useful.
o Y. Sun, Y. Cai, S. Huse, R. Knight, W. Farmerie, X. Wang and V. Mai, A Large-scale Benchmark Study of Existing Algorithms for Taxonomy-Independent Microbial Community Analysis, Briefings in Bioinformatics, in press, 2011.
o ESPRIT has been implemented on the Novo-G system, the world's most powerful reconfigurable computer for research. Here is the paper.
o C. Pascoe, A. Lawande, H. Lam, A. George, Y. Sun, W. Farmerie, and H. Martin, Reconfigurable Supercomputing with Scalable Systolic Arrays and In-Stream Control for Wavefront Genomics Processing, in Proc. 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC10), July 2010.
o ESPRIT is an O(N^2) algorithm: both its running time and its memory use grow quadratically with the number of sequences N. We are developing a more powerful algorithm capable of handling several tens of millions of 16S rRNA pyrosequences. A preliminary study showed that the new algorithm has close-to-linear computational and space complexities and runs about 500-1000 times faster than ESPRIT. This approach is also useful for other types of biological sequence clustering (e.g., identification of orthologs).
o Y. Cai and Y. Sun, ESPRIT-Tree: Hierarchical Clustering Analysis of Millions of 16S rRNA Pyrosequences in Quasilinear Time, Nucleic Acids Research, vol. 39, no. 14, e95, 2011.
o ESPRIT and ESPRIT-Tree use pairwise sequence alignment, instead of multiple sequence alignment, to compute pairwise distances. This choice has been criticized because pairwise sequence alignment ignores the secondary structure information of the 16S rRNA gene. We performed a simulation study showing that including secondary structure information does not improve OTU picking performance but significantly increases computational complexity.
o X. Wang, Y. Cai, Y. Sun, R. Knight, and V. Mai, Secondary Structure Information Does not Improve OTU Picking for 16S rRNA Sequences, The ISME Journal, in press.
o While parallel computing is generally not a viable solution to scaling up O(N^2) algorithms, the quasilinear space and computational complexities of the ESPRIT-Tree algorithm make it computationally tractable to process tens of millions of sequences by using a small computer cluster.
o Y. Cai and Y. Sun, ESPRIT-Forest: Taxonomy Independent Analysis of Tens of Millions of 16S rRNA Pyrosequences Using Parallel Computing, to be submitted to Nucleic Acids Research, 2011.
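The notes above describe clustering sequences by complete linkage over distances obtained from pairwise alignment. The following is a minimal Python sketch of that general idea, not the ESPRIT code itself. Everything in it is an assumption made for illustration: the function names, the use of unit-cost edit distance as a stand-in for a pairwise global alignment, the normalization by the longer sequence length, and the naive merge loop. It also makes the quadratic cost visible, since the full N x N distance matrix is computed and stored.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost global alignment (Levenshtein) distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def pairwise_distances(seqs):
    """Full N x N matrix of normalized distances (O(N^2) time and memory).

    Assumes every sequence is non-empty.
    """
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = edit_distance(seqs[i], seqs[j]) / max(len(seqs[i]), len(seqs[j]))
        dist[i][j] = dist[j][i] = d
    return dist


def complete_linkage(dist, threshold):
    """Merge clusters while the largest cross-cluster distance stays <= threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the maximum pairwise distance.
            link = max(dist[i][j] for i in clusters[x] for j in clusters[y])
            if best is None or link < best[0]:
                best = (link, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
    otus = complete_linkage(pairwise_distances(reads), threshold=0.2)
    print(len(otus), otus)
```

With the four toy reads and a distance threshold of 0.2, the two near-identical pairs are grouped and the number of resulting clusters serves as the OTU count at that threshold. The merge loop here is deliberately simple and slower than a production implementation would be; it is meant only to show what complete linkage computes.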
Documents:
o Manuscript and supplement accepted by Nucleic Acids Research
Update history:
o Version 1.0 released on January 30, 2009
o Version 1.1 released on May 13, 2009
o Version 1.2 released on July 21, 2009
o Version 1.3 released on July 7, 2010
o Version 1.4 released on February 17, 2011
ESPRIT has been used in the following research institutes:
o MIT-Broad Institute
o Stanford University
o Technical University of Catalonia, Spain
o Genome Oriented Bioinformatics, Technical University, Munich, Germany
o Department of Biology, Portland State University
o Max Planck Institute for Marine Microbiology, Germany
o School of Biological Sciences, Seoul National University, South Korea
o Institute for Genomic Biology, University of Illinois at Urbana-Champaign
o Dept. of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida
o Instituto de Ecologia, Universidad Nacional Autonoma de Mexico, Mexico
o BioInformatics, Nestle Research Center, Switzerland
o Department of Epidemiology, School of Public Health, Columbia University
o Department of Microbiology & Immunology, University of Maryland School of Medicine
o Department of Biology, West Virginia State University
o CMBI Bacterial Genomics, NIZO food research B.V. Top Institute Food and Nutrition, The Netherlands
o Department of Biology, Hong Kong University of Science and Tech, China
o Invitrogen, Inc.
o INRA - Unite Mathematique, Informatique et Genome, France
o King's College London, UK
o Molecular and Environmental Plant Sciences, Texas A&M University
o Computer Science Department, UNC Charlotte
o Centre for Geobiology, University of Bergen, Norway
o Department of Soil and Crop Sciences, Texas A&M University
o Medical School of Zhejiang University, China
o Synthetic Genomics, Inc.
o Josephine Bay Paul Center, Marine Biological Laboratory, USA
o Department of Microbiology, University of Washington, Seattle
o Department of Civil Engineering, University of Glasgow, UK
o Institute for Genome Sciences, University of Maryland School of Medicine
o Bioinformatics Center, Biotechnology Institute - National University of Colombia, Colombia
o Department of Biological Sciences, Florida Institute of Technology
o US Air Force
o School of Biosciences, Division of Microbiology, Human Microbiomics Lab, Cardiff University, UK
o Midwest Research Institute, USA
o Universidad de Buenos Aires, Ciudad Universitaria, Argentina
o Rowett Institute of Nutrition and Health, University of Aberdeen, Scotland
o Institute of Molecular and Cell Biology, University of Tartu, Estonia
o Department of Food and Animal Biotechnology, Seoul National University, Seoul, Korea
o Department of Animal Sciences, Ohio State University
o Department Bodenokologie, Helmholtz Centre for Environmental Research, Germany
o Department of Biological Sciences, Inha University, South Korea
o Sun Center of Excellence for Visual Genomics, Faculty of Medicine, University of Calgary, Canada
o Lawrence Berkeley National Labs, USA
o Laboratoire d'Ecologie Alpine, France
o The Spanish National Center of Biotechnology
o Center of Microbial Ecology, Michigan State University
o Centro de Bioinformatica, Instituto de Biotecnologia - Universidad Nacional de Colombia
o Bharathiar University, India
o Microbial Habitat Group, Max Planck Institute for Marine Microbiology
o University of Groningen, Netherlands
o Research Institute for Microbial Diseases, Osaka University, Japan
o McGill University and Genome Quebec Innovation Center, Canada
o Department of Microbiology & Immunology, University of Michigan
o J. Craig Venter Institute, USA
o Australian National University
o Department of Biological Sciences, University of Idaho
o Coastal Marine Laboratory, The Hong Kong University of Science and Technology
o Graduate School of Oceanography, University of Rhode Island
o Center for Ecology, Evolution and Conservation, University of East Anglia, UK
o French National Institute for Agricultural Research (INRA-French)
o The University of Chicago
o Institut fur Populationsgenetik, Veterinarmedizinische Universitat Wien, Vienna
o Dalhousie University, Canada
o Center for Public Health Research, Spain
o Laboratory of Animal Diversity and Systematics, Catholic University of Louvain, Belgium
o Human and Molecular Genetics Center, Medical College of Wisconsin
o Dept of Zoology and Animal Biology, University of Geneva, Switzerland
o Genopole de Institut Pasteur, France
o Gladstone Institute for Cardiovascular Disease, University of California, San Francisco
o ALMARAI, Kingdom of Saudi Arabia
o National Dairy Research Institute, India
o Molecular Microbial Ecology Group, Laboratory of Microbiology, Wageningen University, The Netherlands
o University of Southern California
o Nanyang Technological University, Singapore
o The University of York, UK
o Jet Propulsion Laboratory, California Institute of Technology
o Institute for Genome Sciences, University of Maryland at Baltimore
o University of Arizona
o University of Padua (Italy)
o University of Ljubljana (Slovenia)
o Universite Nice (France)
o Bangor University (UK)
o University of Oslo (Norway)
o Department of Neuroscience, University of Arizona
o Aberdeen University (UK)
o National Institute of Advanced Industrial Science and Technology, Japan
o University of Edinburgh (UK)
o University of South Florida
o University of Auckland (New Zealand)
o Embrapa Recursos Geneticos e Biotecnologia (Brazil)
o Stephen F. Austin State University
o Carnegie Mellon University
o The University of Algarve (Portugal)
o University of Queensland (Australia)
o Laboratory of Animal Diversity and Systematics, Katholieke Universiteit Leuven (Belgium)
o Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel (Germany)
o Laboratory for Microbial Oceanography, French National Center for Scientific Research
o Northwestern Polytechnical University (China)
o University of Guelph (Canada)
o University of the Andes (Colombia)
o Atmosphere and Ocean Research Institute, The University of Tokyo (Japan)
o Department of Microbiology, University of Pennsylvania School of Medicine
o University of Amsterdam and VU University Amsterdam (the Netherlands)
o The Hannover Medical School (Germany)
o Centre Bioengineering, Russian Academy of Sciences (Russia)
o The Wellcome Trust Sanger Institute (UK)
o National University of Singapore
o School of Biotechnology and Biomolecular Sciences, The University of New South Wales (Australia)
o University of California-Davis
o University of North Texas
o University of Nice (France)
o Aberystwyth University (UK)
o National Museum of Natural History, Smithsonian Institution
o University of New Mexico
o Southern Medical University of China
o Tuskegee University
o National Center for Biotechnology-CSIC (Spain)
o Emory University
o University of Tennessee
o Oak Ridge National Laboratory
o Murdoch Childrens Research Institute (Australia)
o University of California- San Francisco
o UNC-Chapel Hill
o Uppsala University, Sweden
o Corpogen-GeBiX (Colombia)
o Albert Einstein College of Medicine of Yeshiva University
o University of Arizona
o Scottish Association for Marine Science
o Tsinghua University (China)
o University of Oregon
o Center for Environmental Sciences, University of Maryland
o University of Nebraska-Lincoln
o The Rockefeller University
o University of Connecticut
o Alfred Wegener Institute for Polar and Marine Research (Germany)
o Montana State University
o Max Planck Institute for Plant Breeding Research (Germany)
o Nankai University (China)
o Beijing Genomics Institute (China)
o Universite de Rennes 1 (France)
o Indiana University
o Wuhan University (China)
o University of New Hampshire
o Bielefeld University (Germany)
o Free University of Berlin (Germany)
o University of Kaiserslautern (Germany)
o Michigan State University
o University of Birmingham (UK)
o Novartis
o Virginia Commonwealth University
o University of California - San Diego
o National Institute of Animal Science (South Korea)
o The Methodist Hospital Research Institute
o University of Manitoba (Canada)
o Universidade Federal do Pampa (Brazil)
o University of Natural Resources and Life Sciences (Austria)
o King Abdullah University of Science and Technology (Saudi Arabia)
o Deutsches Forschungszentrum für Gesundheit und Umwelt (Germany)
o Duke University
o The Genome Analysis Centre (UK)
o European Molecular Biology Laboratory
o Northwestern Polytechnic University (China)
o University of Toronto (Canada)
o University of Massachusetts – Boston
o IASMA - Fondazione Edmund Mach (Italy)
o St. Jude Children's Research Hospital
o Dragon Genomics Center (Japan)
o University of Connecticut
o Warsaw University (Poland)
o University of Waikato (New Zealand)
o Arizona State University
o The Commonwealth Scientific and Industrial Research Organisation (Australia)
o New York University School of Medicine
o The University of Arizona
o The Station biologique de Roscoff (France)
o CorpoGen (Colombia)
o North Carolina State University
o The Federal University of Pampa (Brazil)
o National Laboratory for Scientific Computing (Brazil)
o CSIRO Plant Industry (Australia)
o University of Bari (Italy)
o Integrative Biology group (Italy)
o University of Western Sydney (Australia)
o University of Florence (Italy)
o George Mason University
o University of British Columbia (Canada)
o National Cheng Kung University
o Agricultural Research Organization (ARO) (Israel)
o University of Lethbridge (Canada)
o Volcani Research Center (Israel)