You can run and edit these examples interactively on Galaxy
Search for MGnify Studies or Samples, using MGnifyR
The MGnify API returns data and relationships as JSON. MGnifyR is a package to help you read MGnify data into your R analyses.
This example shows you how to search MGnify Studies or Samples.
You can find all of the other “API endpoints” using the Browsable API interface in your web browser. This interface also lets you inspect the kinds of Filters that can be created for each list.
This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.
MGnifyR is an R package that provides a convenient way for R users to access data from the MGnify API.
Detailed help for each function is available in R using the standard ?function_name command (e.g. typing ?mgnify_query will bring up the built-in help for the mgnify_query command).
A vignette is available containing a reasonably verbose overview of the main functionality. It can be read either within R with the vignette("MGnifyR") command, or in the development repository.
MGnifyR Command cheat sheet
The following key functions are a starting point for finding relevant documentation.
mgnify_client() : Create the client object required for all other functions.
mgnify_query() : Search the whole MGnify database.
mgnify_analyses_from_xxx() : Convert xxx accessions to analysis accessions, where xxx is either samples or studies.
mgnify_get_analyses_metadata() : Retrieve all study, sample and analysis metadata for given analyses.
mgnify_get_analyses_phyloseq() : Convert abundance, taxonomic, and sample metadata into a single phyloseq object.
mgnify_get_analyses_results() : Get functional annotation results for a set of analyses.
mgnify_download() : Download raw results files from MGnify.
mgnify_retrieve_json() : Low level API access helper function.
In these examples we set maxhits=1 to retrieve only the first page of results. You can change the limit, or set it to -1 to retrieve all results matching the query.
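As a rough illustration of how such a search might be wrapped, here is a minimal sketch built on mgnify_client() and mgnify_query() from the cheat sheet above. The helper name search_mgnify_by_biome is hypothetical, and the argument names qtype, biome_name, usecache and cache_dir are assumptions based on the legacy development version of MGnifyR; check ?mgnify_client and ?mgnify_query in your installed version, since they may differ.

```r
# Hypothetical helper (not part of MGnifyR): search studies or samples by biome.
# ASSUMPTION: the argument names qtype, biome_name, usecache and cache_dir
# follow the legacy development version of MGnifyR and may differ elsewhere.
search_mgnify_by_biome <- function(qtype = "studies",
                                   biome = "wastewater",
                                   maxhits = 1,
                                   cache_dir = file.path(tempdir(), "mgnify_cache")) {
  if (!requireNamespace("MGnifyR", quietly = TRUE)) {
    stop("The MGnifyR package is required for this sketch.")
  }
  # The client object is required by all other MGnifyR functions.
  client <- MGnifyR::mgnify_client(usecache = TRUE, cache_dir = cache_dir)
  # maxhits = 1 keeps only the first page; -1 retrieves every match.
  MGnifyR::mgnify_query(client, qtype = qtype, biome_name = biome, maxhits = maxhits)
}

# Example call (needs network access and MGnifyR installed):
# wastewater_studies <- search_mgnify_by_biome("studies", "wastewater", maxhits = 1)
# head(wastewater_studies)
```

Wrapping the call in a function keeps the limit explicit, so raising maxhits or switching to -1 is a one-argument change.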
The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJNA593593, and was assembled with metaSPAdes v3.15.2. This project includes samples from the following biomes: root:Engineered:Wastewater.
EMG produced TPA metagenomics assembly of PRJNA593593 data set (Sewage microbial communities from Oakland, California, United States - Biofuel Metagenome 10).
SUBMITTED | 2022-03-11T21:49:39 | studies | studies
MGYS00005997 | PRJEB45727 | 1 | MGYS00005997 | FALSE | ERP129875 | EMG
The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJNA593594, and was assembled with metaSPAdes v3.15.2. This project includes samples from the following biomes: root:Engineered:Wastewater.
EMG produced TPA metagenomics assembly of PRJNA593594 data set (Sewage microbial communities from Oakland, California, United States - Biofuel Metagenome 11).
Sewage microbial communities from Oakland, California, United States - Biofuel Metagenome 10
HARVESTED | 2022-02-28T14:04:08 | studies | studies
MGYS00002316 | PRJEB24109 | 1 | MGYS00002316 | FALSE | ERP105914 | EMBL-EBI
The activated sludge metagenome Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set: PRJNA340752. This project includes samples from the following biomes: Engineered, Wastewater, Activated Sludge.
EMG produced TPA metagenomics assembly of the Active sludge microbial communities of municipal wastewater-treating anaerobic digesters from China - AD_SCU002_MetaG metagenome (activated sludge metagenome) data set.
SUBMITTED | 2022-02-03T15:58:54 | studies | studies
MGYS00005846 | PRJEB47494 | 110 | MGYS00005846 | FALSE | ERP131768 | EMG
The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJEB27054, and was assembled with metaSPAdes v3.12.0. This project includes samples from the following biomes: root:Engineered:Wastewater:Water and sludge.
EMG produced TPA metagenomics assembly of PRJEB27054 data set (Global surveillance of antimicrobial resistance).
SUBMITTED | 2021-11-18T06:32:39 | studies | studies
MGYS00005847 | PRJEB27054 | 109 | MGYS00005847 | FALSE | ERP109094 | DTU-GE
Antimicrobial resistance (AMR) is one of the most serious global public health threats, however, obtaining representative data on AMR for healthy human populations is difficult. We characterized the bacterial resistome from untreated sewage from 79 sites in 60 countries. We found systematic differences in abundance and diversity of AMR genes between Europe/North-America/Oceania and Africa/Asia/South-America. Antimicrobial use data only explained a minor part of the AMR variation and no evidence for cross-selection between antimicrobial classes nor effect of travel by flight between sites were found. However, AMR abundance was strongly correlated with socio-economic, health and environmental factors, which we used to predict AMR abundances in all countries in the world. Our findings suggest that the global AMR gene diversity and abundance varies by region and are caused by national circumstances. Improving sanitation and health could potentially limit the global burden of AMR. We propose to use sewage for an ethically acceptable and economically feasible continuous global surveillance and prediction of AMR.
To find metadata_keys and values, it is best to browse the interactive API Browser, and use the Filters button to construct queries interactively at first.
Example: adding additional filters to the data frame
First, fetch some samples from the Lentic biome. We can specify the entire Biome lineage, too.
Now, also filter by depth within the returned results, using normal R syntax.
# MGnifyR returns all values as strings, so convert depth to numeric first.
depth_numeric <- as.numeric(lentic_samples$depth)
# If depth data is missing, assume the sample is surface-level.
depth_numeric[is.na(depth_numeric)] <- 0.0
# Keep samples collected between 25 m and 50 m down.
lentic_subset <- lentic_samples[depth_numeric >= 25 & depth_numeric <= 50, ]
lentic_subset
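To try the depth filter without network access, the same logic can be run on a small stand-in data frame. This is a sketch only: mock_samples, its accessions, and the helper filter_by_depth are hypothetical and do not come from MGnify. Depths are stored as strings to mirror what the query returns.

```r
# Hypothetical stand-in for the data frame returned by a samples query.
# Depth values are strings, and one is missing, as can happen with real metadata.
mock_samples <- data.frame(
  accession = c("SAMPLE_A", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D", "SAMPLE_E"),
  depth = c("10", "25", "37.5", NA, "60"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose depth lies in [min_depth, max_depth].
# Missing or unparseable depths are replaced by missing_depth (0 = surface).
filter_by_depth <- function(samples, min_depth, max_depth, missing_depth = 0.0) {
  # Values that cannot be parsed become NA; the conversion warning is suppressed.
  depth_numeric <- suppressWarnings(as.numeric(samples$depth))
  depth_numeric[is.na(depth_numeric)] <- missing_depth
  samples[depth_numeric >= min_depth & depth_numeric <= max_depth, , drop = FALSE]
}

depth_subset <- filter_by_depth(mock_samples, 25, 50)
print(depth_subset$accession)
```

With these mock values, only SAMPLE_B and SAMPLE_C fall inside the 25 m to 50 m range. The sample with missing depth is treated as surface-level and excluded, which is the same assumption the original filter makes.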