Load Analyses for a MGnify Study


Sandy Rogers

MGnify team at EMBL-EBI

This is a static preview

You can run and edit these examples interactively on Galaxy

Load a Study from the MGnify API and fetch its Analyses

The MGnify API returns JSON data. The jsonapi_client package can help you load this data into Python, e.g. into a Pandas dataframe.

This example shows you how to load a MGnify Study’s Analyses from the MGnify API

You can find all of the other “API endpoints” using the Browsable API interface in your web browser. The URL you see in the browsable API is exactly the same as the one you can use in this code.

This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.

Select a Study

Pick a particular Study of interest. If you followed a link to this notebook, we might already know the Study Accession. Otherwise, you can enter one or use the example:

from lib.variable_utils import get_variable_from_link_or_input

# You can also just directly set the accession variable in code, like this:
# accession = "MGYS00005292"
accession = get_variable_from_link_or_input('MGYS', 'Study Accession', 'MGYS00005292')

Using Study Accession MGYS00005292 from the link you followed.

Using "MGYS00005292" as Study Accession

Fetch data

Fetch Analyses for this study from the MGnify API, into a Pandas dataframe

from jsonapi_client import Session
import pandas as pd

with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    analyses = map(lambda r: r.json, mgnify.iterate(f'studies/{accession}/analyses'))
    analyses = pd.json_normalize(analyses)

Inspect the data

The .head() method prints the first few rows of the table

type id attributes.pipeline-version attributes.analysis-status attributes.experiment-type attributes.accession attributes.analysis-summary attributes.is-private attributes.last-update attributes.complete-time attributes.instrument-platform attributes.instrument-model relationships.sample.data.id relationships.sample.data.type relationships.run.data.id relationships.run.data.type relationships.study.data.id relationships.study.data.type
0 analysis-jobs MGYA00448077 4.1 completed amplicon MGYA00448077 [{'key': 'Submitted nucleotide sequences', 'va... False 2024-01-29T15:29:19.757516 2020-01-31T08:26:49 ILLUMINA Illumina HiSeq 2500 SRS2065862 samples SRR6132556 runs MGYS00005292 studies
1 analysis-jobs MGYA00448078 4.1 completed amplicon MGYA00448078 [{'key': 'Submitted nucleotide sequences', 'va... False 2024-01-29T15:29:19.757516 2020-01-31T08:27:25 ILLUMINA Illumina HiSeq 2500 SRS2065861 samples SRR6132555 runs MGYS00005292 studies
2 analysis-jobs MGYA00448079 4.1 completed amplicon MGYA00448079 [{'key': 'Submitted nucleotide sequences', 'va... False 2024-01-29T15:29:19.757516 2020-01-31T08:28:04 ILLUMINA Illumina HiSeq 2500 SRS2065860 samples SRR6132554 runs MGYS00005292 studies
3 analysis-jobs MGYA00448080 4.1 completed amplicon MGYA00448080 [{'key': 'Submitted nucleotide sequences', 'va... False 2024-01-29T15:29:19.757516 2020-01-31T08:28:42 ILLUMINA Illumina HiSeq 2500 SRS2065859 samples SRR6132553 runs MGYS00005292 studies
4 analysis-jobs MGYA00448081 4.1 completed amplicon MGYA00448081 [{'key': 'Submitted nucleotide sequences', 'va... False 2024-01-29T15:29:19.757516 2020-01-31T08:29:18 ILLUMINA Illumina HiSeq 2500 SRS2065858 samples SRR6132552 runs MGYS00005292 studies

Example: distribution of instruments used for the Analysed Samples

import matplotlib.pyplot as plt
plt.title('Number of Analysed Samples by instrument type');