MGnify Proteins Resource

Description of the MGnify Proteins resource and related services

Author: MGnify

Affiliation: EMBL-EBI

Published: July 31, 2025

Introduction

The MGnify Protein Database comprises sequences predicted from assemblies generated from publicly available metagenomic datasets. Since its initial release in August 2017, which contained just under 50 million sequences, it has grown to over 2.4 billion sequences. All sequences have stable accessions prefixed with MGYP, such as MGYP000261684433. Due to the dataset’s size, sequences are clustered at 90% identity using MMseqs2/Linclust. Despite clustering, the sequences still capture the biological complexity inherent in metagenomic data.
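As a small illustration of the accession format described above, the Python sketch below validates and parses MGYP accessions. It assumes that an accession is the prefix MGYP followed by exactly 12 zero-padded digits. That width is inferred only from the example MGYP000261684433, not from a published specification. The helper names parse_mgyp and format_mgyp are hypothetical and are not part of any MGnify tool.

```python
import re

# Assumption: "MGYP" followed by exactly 12 digits. The width is inferred
# from the single example accession in the text, not from a specification.
MGYP_WIDTH = 12
MGYP_PATTERN = re.compile(rf"MGYP(\d{{{MGYP_WIDTH}}})")


def parse_mgyp(accession: str) -> int:
    """Return the numeric part of an MGYP accession, or raise ValueError."""
    match = MGYP_PATTERN.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a recognised MGYP accession: {accession!r}")
    return int(match.group(1))


def format_mgyp(number: int) -> str:
    """Build an accession string from its numeric part (hypothetical helper)."""
    if number < 0 or number >= 10 ** MGYP_WIDTH:
        raise ValueError(f"number out of range for an MGYP accession: {number}")
    return f"MGYP{number:0{MGYP_WIDTH}d}"


if __name__ == "__main__":
    number = parse_mgyp("MGYP000261684433")
    print(number)
    print(format_mgyp(number))
```

Checking accessions in this way before passing them to a search or download step can catch truncated or mistyped identifiers early. The pattern should be relaxed if accessions of a different length turn out to exist.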

The dataset is accessible via several platforms:

FTP Server: Available for download from our FTP server.
HMMER Sequence Search Webservice: Accessible through our Sequence Search service.
MGnify Proteins Portal: Explore the data on the MGnify Proteins web portal.
Google Cloud Public Dataset: Available as a BigQuery public dataset on Google Cloud.

License

The data is available for both academic and commercial use under a CC0 1.0 Universal License.

If you make use of the MGnify Protein Database, please cite the following paper:

Richardson, L., Allen, B., Baldi, G., Beracochea, M., Bileschi, M. L., Burdett, T., Burgin, J., Caballero-Pérez, J., Cochrane, G., Colwell, L. J., Curtis, T., Escobar-Zepeda, A., Gurbich, T. A., Kale, V., Korobeynikov, A., Raj, S., Rogers, A. B., Sakharova, E., Sanchez, S., Wilkinson, D. J., Finn, R. D. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research (2023). https://doi.org/10.1093/nar/gkac1080

Citation

BibTeX citation:

@online{2025,
  author = {{MGnify}},
  title = {MGnify {Proteins} {Resource}},
  date = {2025-07-31},
  url = {https://docs.mgnify.org/src/docs/mgnify-proteins.html},
  langid = {en}
}

For attribution, please cite this work as:

MGnify. 2025. “MGnify Proteins Resource.” July 31, 2025. https://docs.mgnify.org/src/docs/mgnify-proteins.html.