Load Analyses for a MGnify Study


Sandy R (MGnify team)

This is a static preview

You can run and edit these examples interactively on Galaxy

Load a Study from the MGnify API and fetch its Analyses

The MGnify API returns JSON data. The jsonapi_client package can help you load this data into Python, e.g. into a Pandas dataframe.

This example shows you how to load a MGnify Study’s Analyses from the MGnify API

You can find all of the other “API endpoints” using the Browsable API interface in your web browser. The URL you see in the browsable API is exactly the same as the one you can use in this code.

This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.

Select a Study

Pick a particular Study of interest. If you followed a link to this notebook, we might already know the Study Accession. Otherwise, you can enter one or use the example:

from lib.variable_utils import get_variable_from_link_or_input

# You can also just directly set the accession variable in code, like this:
# accession = "MGYS00005292"
accession = get_variable_from_link_or_input('MGYS', 'Study Accession', 'MGYS00005292')

Using Study Accession MGYS00005292 from the link you followed.

Using "MGYS00005292" as Study Accession

Fetch data

Fetch Analyses for this study from the MGnify API, into a Pandas dataframe

from jsonapi_client import Session
import pandas as pd

with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    analyses = map(lambda r: r.json, mgnify.iterate(f'studies/{accession}/analyses'))
    analyses = pd.json_normalize(analyses)

Inspect the data

The .head() method prints the first few rows of the table

type id attributes.accession attributes.experiment-type attributes.analysis-status attributes.pipeline-version attributes.analysis-summary attributes.is-private attributes.complete-time attributes.instrument-platform attributes.instrument-model relationships.run.data.id relationships.run.data.type relationships.study.data.id relationships.study.data.type relationships.sample.data.id relationships.sample.data.type
0 analysis-jobs MGYA00448077 MGYA00448077 amplicon completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... False 2020-01-31T08:26:49 ILLUMINA Illumina HiSeq 2500 SRR6132556 runs MGYS00005292 studies SRS2065862 samples
1 analysis-jobs MGYA00448078 MGYA00448078 amplicon completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... False 2020-01-31T08:27:25 ILLUMINA Illumina HiSeq 2500 SRR6132555 runs MGYS00005292 studies SRS2065861 samples
2 analysis-jobs MGYA00448079 MGYA00448079 amplicon completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... False 2020-01-31T08:28:04 ILLUMINA Illumina HiSeq 2500 SRR6132554 runs MGYS00005292 studies SRS2065860 samples
3 analysis-jobs MGYA00448080 MGYA00448080 amplicon completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... False 2020-01-31T08:28:42 ILLUMINA Illumina HiSeq 2500 SRR6132553 runs MGYS00005292 studies SRS2065859 samples
4 analysis-jobs MGYA00448081 MGYA00448081 amplicon completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... False 2020-01-31T08:29:18 ILLUMINA Illumina HiSeq 2500 SRR6132552 runs MGYS00005292 studies SRS2065858 samples

Example: distribution of instruments used for the Analysed Samples

import matplotlib.pyplot as plt
plt.title('Number of Analysed Samples by instrument type');