MGnify Proteins Resource

Description of the MGnify Proteins resource and related services
Published: October 3, 2024


Introduction

The MGnify Protein Database comprises protein sequences predicted from assemblies of publicly available metagenomic datasets. Since its initial release in August 2017, which contained just under 50 million sequences, it has grown to over 2.4 billion sequences. All sequences have stable accessions prefixed with MGYP, such as MGYP000261684433. Because of the dataset’s size, sequences are clustered at 90% identity using MMseqs2/Linclust. Even after clustering, the sequences still capture the biological complexity inherent in metagenomic data.
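As an illustration of the accession format described above, the following minimal sketch checks whether a string looks like an MGYP accession and extracts its numeric part. The function name `parse_mgyp` and the pattern are hypothetical and not part of any MGnify tool. The sketch assumes only that an accession is the prefix MGYP followed by digits. The example in the text has 12 digits, but a fixed width is not stated, so none is enforced.

```python
import re
from typing import Optional

# Assumption: an accession is the literal prefix "MGYP" followed by digits.
# The documentation gives one example (MGYP000261684433, 12 digits); the
# width is not specified there, so any non-empty run of digits is accepted.
MGYP_PATTERN = re.compile(r"MGYP(\d+)")


def parse_mgyp(accession: str) -> Optional[int]:
    """Return the numeric part of an MGYP accession, or None if malformed."""
    match = MGYP_PATTERN.fullmatch(accession.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_mgyp("MGYP000261684433"))  # 261684433
    print(parse_mgyp("UPI0000000001"))     # None
```

A check like this can be useful for filtering identifiers before querying a service, but it only tests the textual shape of an accession, not whether the accession exists in the database.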

The dataset is accessible via several platforms:

[Figure: Schematic of the MGnify Proteins resource]

License

The data is available for both academic and commercial use under a CC0 1.0 Universal License.

If you make use of the MGnify Protein Database, please cite the following paper:

  • Richardson, L., Allen, B., Baldi, G., Beracochea, M., Bileschi, M. L., Burdett, T., Burgin, J., Caballero-Pérez, J., Cochrane, G., Colwell, L. J., Curtis, T., Escobar-Zepeda, A., Gurbich, T. A., Kale, V., Korobeynikov, A., Raj, S., Rogers, A. B., Sakharova, E., Sanchez, S., Wilkinson, D. J., Finn, R. D. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research (2023). https://doi.org/10.1093/nar/gkac1080

Citation

BibTeX citation:
@online{mgnify2024,
  author = {{MGnify}},
  title = {MGnify {Proteins} {Resource}},
  date = {2024-10-03},
  url = {https://docs.mgnify.org/src/docs/mgnify-proteins.html},
  langid = {en}
}
For attribution, please cite this work as:
MGnify. 2024. “MGnify Proteins Resource.” October 3, 2024. https://docs.mgnify.org/src/docs/mgnify-proteins.html.