Load Analyses for a MGnify Study

Python

Author: Sandy Rogers

Affiliation: MGnify team at EMBL-EBI

This is a static preview

You can run and edit these examples interactively on Galaxy

Load a Study from the MGnify API and fetch its Analyses

The MGnify API returns JSON data. The jsonapi_client package can help you load this data into Python, for example into a pandas DataFrame.

This example shows you how to load a MGnify Study’s Analyses from the MGnify API.

You can find all of the other “API endpoints” using the Browsable API interface in your web browser. The URL you see in the browsable API is exactly the same as the one you can use in this code.
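As a rough illustration of how a URL seen in the browsable API relates to the path used in code, here is a minimal sketch. The helper name `endpoint_url` is hypothetical (it is not part of jsonapi_client); the base URL is the one used in the fetch step of this notebook.

```python
# Base URL of the MGnify API, as used elsewhere in this notebook.
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def endpoint_url(path: str, base: str = API_BASE) -> str:
    """Join the API base URL and a relative endpoint path (hypothetical helper)."""
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


# The same address you would visit in the browsable API:
print(endpoint_url("studies/MGYS00005292/analyses"))
# → https://www.ebi.ac.uk/metagenomics/api/v1/studies/MGYS00005292/analyses
```

When using a jsonapi_client Session opened on the base URL, only the relative part (here `studies/MGYS00005292/analyses`) is passed to the session.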

This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.


Select a Study

Pick a particular Study of interest. If you followed a link to this notebook, we might already know the Study Accession. Otherwise, you can enter one or use the example:

from lib.variable_utils import get_variable_from_link_or_input

# You can also just directly set the accession variable in code, like this:
# accession = "MGYS00005292"
accession = get_variable_from_link_or_input('MGYS', 'Study Accession', 'MGYS00005292')

Using Study Accession MGYS00005292 from the link you followed.

Using "MGYS00005292" as Study Accession
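If you set the accession by hand, a quick format check can catch typos before any request is sent. The sketch below assumes that Study accessions look like "MGYS" followed by eight digits, as in the example above; that pattern and the function name `looks_like_study_accession` are assumptions for illustration, not something the MGnify API enforces on your behalf.

```python
import re

# Assumption: a Study accession is "MGYS" plus eight digits, e.g. "MGYS00005292".
STUDY_ACCESSION_PATTERN = re.compile(r"MGYS\d{8}")


def looks_like_study_accession(value: str) -> bool:
    """Return True if the value matches the assumed Study accession pattern."""
    return STUDY_ACCESSION_PATTERN.fullmatch(value.strip()) is not None


print(looks_like_study_accession("MGYS00005292"))  # → True
print(looks_like_study_accession("MGYA00448077"))  # → False (an Analysis accession)
```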

Fetch data

Fetch the Analyses for this Study from the MGnify API and load them into a pandas DataFrame.

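from jsonapi_client import Session
import pandas as pd

# Open a session against the MGnify API; it is closed when the block exits.
with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    # Collect the raw JSON of every Analysis returned for this Study.
    analyses = [r.json for r in mgnify.iterate(f'studies/{accession}/analyses')]
    # Flatten the nested JSON into one row per Analysis, with dotted column names.
    analyses = pd.json_normalize(analyses)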
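To see where column names such as `attributes.instrument-model` come from, the following offline sketch runs `pd.json_normalize` on a single hand-written record. The record is a trimmed, illustrative stand-in for one API resource (its values are copied from the first row of the output shown in this notebook), not a complete API response.

```python
import pandas as pd

# Hand-written, trimmed stand-in for one JSON:API resource (illustrative only).
record = {
    "type": "analysis-jobs",
    "id": "MGYA00448077",
    "attributes": {
        "experiment-type": "amplicon",
        "instrument-model": "Illumina HiSeq 2500",
    },
    "relationships": {
        "study": {"data": {"id": "MGYS00005292", "type": "studies"}},
    },
}

# Nested dictionaries become columns whose names join the keys with ".".
flat = pd.json_normalize([record])
print(sorted(flat.columns))
print(flat.loc[0, "relationships.study.data.id"])
```

Each level of nesting adds one dotted segment to the column name, which is why the relationship columns are the longest.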

Inspect the data

The .head() method returns the first rows of the table (five by default), which the notebook then displays.

analyses.head()
| | type | id | attributes.analysis-summary | attributes.pipeline-version | attributes.accession | attributes.analysis-status | attributes.experiment-type | attributes.is-private | attributes.complete-time | attributes.instrument-platform | attributes.instrument-model | relationships.study.data.id | relationships.study.data.type | relationships.run.data.id | relationships.run.data.type | relationships.sample.data.id | relationships.sample.data.type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | analysis-jobs | MGYA00448077 | [{'key': 'Submitted nucleotide sequences', 'va... | 4.1 | MGYA00448077 | completed | amplicon | False | 2020-01-31T08:26:49 | ILLUMINA | Illumina HiSeq 2500 | MGYS00005292 | studies | SRR6132556 | runs | SRS2065862 | samples |
| 1 | analysis-jobs | MGYA00448078 | [{'key': 'Submitted nucleotide sequences', 'va... | 4.1 | MGYA00448078 | completed | amplicon | False | 2020-01-31T08:27:25 | ILLUMINA | Illumina HiSeq 2500 | MGYS00005292 | studies | SRR6132555 | runs | SRS2065861 | samples |
| 2 | analysis-jobs | MGYA00448079 | [{'key': 'Submitted nucleotide sequences', 'va... | 4.1 | MGYA00448079 | completed | amplicon | False | 2020-01-31T08:28:04 | ILLUMINA | Illumina HiSeq 2500 | MGYS00005292 | studies | SRR6132554 | runs | SRS2065860 | samples |
| 3 | analysis-jobs | MGYA00448080 | [{'key': 'Submitted nucleotide sequences', 'va... | 4.1 | MGYA00448080 | completed | amplicon | False | 2020-01-31T08:28:42 | ILLUMINA | Illumina HiSeq 2500 | MGYS00005292 | studies | SRR6132553 | runs | SRS2065859 | samples |
| 4 | analysis-jobs | MGYA00448081 | [{'key': 'Submitted nucleotide sequences', 'va... | 4.1 | MGYA00448081 | completed | amplicon | False | 2020-01-31T08:29:18 | ILLUMINA | Illumina HiSeq 2500 | MGYS00005292 | studies | SRR6132552 | runs | SRS2065858 | samples |
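The completion times in the table are plain strings. As a small sketch of a typical next inspection step, the example below converts such a column to datetimes with `pd.to_datetime` so that it can be sorted or subtracted. It uses a tiny hand-built table with three of the values shown above rather than the live `analyses` DataFrame.

```python
import pandas as pd

# Three rows copied from the output above, so the example runs offline.
sample = pd.DataFrame({
    "id": ["MGYA00448077", "MGYA00448078", "MGYA00448079"],
    "attributes.complete-time": [
        "2020-01-31T08:26:49",
        "2020-01-31T08:27:25",
        "2020-01-31T08:28:04",
    ],
})

# Parse the ISO 8601 strings into pandas datetimes.
sample["attributes.complete-time"] = pd.to_datetime(sample["attributes.complete-time"])

# With real datetimes, arithmetic such as "latest minus earliest" works directly.
span = sample["attributes.complete-time"].max() - sample["attributes.complete-time"].min()
print(span)  # → 0 days 00:01:15
```

The same conversion should apply to the full table by replacing `sample` with `analyses`, assuming the column holds the same string format throughout.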

Example: distribution of instruments used for the Analysed Samples

import matplotlib.pyplot as plt
analyses.groupby('attributes.instrument-model').size().plot(kind='pie')
plt.title('Number of Analysed Samples by instrument type');
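A pie chart shows proportions but not exact numbers. As a complementary sketch, `value_counts` gives the same grouping as a table of counts, and `normalize=True` gives shares. Note that each row of the table is one Analysis, so these are counts of Analyses. The instrument names below are made-up placeholders so the example runs without the API; with real data you would use the `analyses` DataFrame instead.

```python
import pandas as pd

# Placeholder data: "Instrument A" and "Instrument B" are not real model names.
sample = pd.DataFrame({
    "attributes.instrument-model": ["Instrument A", "Instrument A", "Instrument B"],
})

# Count how many rows (Analyses) there are per instrument model.
counts = sample["attributes.instrument-model"].value_counts()

# The same grouping expressed as fractions of the total.
shares = sample["attributes.instrument-model"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```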