Load Analyses for a MGnify Study

Python
Author
Affiliation

Sandy Rogers

MGnify team at EMBL-EBI

This is a static preview

You can run and edit these examples interactively on Galaxy

Load a Study from the MGnify API and fetch its Analyses

The MGnify API returns JSON data. The jsonapi_client package can help you load this data into Python, e.g. into a Pandas dataframe.

This example shows you how to load a MGnify Study’s Analyses from the MGnify API

You can find all of the other “API endpoints” using the Browsable API interface in your web browser. The URL you see in the browsable API is exactly the same as the one you can use in this code.

This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.


Select a Study

Pick a particular Study of interest. If you followed a link to this notebook, we might already know the Study Accession. Otherwise, you can enter one or use the example:

from lib.variable_utils import get_variable_from_link_or_input

# You can also just directly set the accession variable in code, like this:
# accession = "MGYS00005292"
accession = get_variable_from_link_or_input('MGYS', 'Study Accession', 'MGYS00005292')

Using Study Accession MGYS00005292 from the link you followed.

Using "MGYS00005292" as Study Accession

Fetch data

Fetch Analyses for this study from the MGnify API, into a Pandas dataframe

from jsonapi_client import Session
import pandas as pd

with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    analyses = map(lambda r: r.json, mgnify.iterate(f'studies/{accession}/analyses'))
    analyses = pd.json_normalize(analyses)

Inspect the data

The .head() method prints the first few rows of the table

analyses.head()
type id attributes.accession attributes.analysis-status attributes.pipeline-version attributes.analysis-summary attributes.experiment-type attributes.is-private attributes.complete-time attributes.instrument-platform attributes.instrument-model relationships.study.data.id relationships.study.data.type relationships.run.data.id relationships.run.data.type relationships.sample.data.id relationships.sample.data.type
0 analysis-jobs MGYA00448077 MGYA00448077 completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... amplicon False 2020-01-31T08:26:49 ILLUMINA Illumina HiSeq 2500 MGYS00005292 studies SRR6132556 runs SRS2065862 samples
1 analysis-jobs MGYA00448078 MGYA00448078 completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... amplicon False 2020-01-31T08:27:25 ILLUMINA Illumina HiSeq 2500 MGYS00005292 studies SRR6132555 runs SRS2065861 samples
2 analysis-jobs MGYA00448079 MGYA00448079 completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... amplicon False 2020-01-31T08:28:04 ILLUMINA Illumina HiSeq 2500 MGYS00005292 studies SRR6132554 runs SRS2065860 samples
3 analysis-jobs MGYA00448080 MGYA00448080 completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... amplicon False 2020-01-31T08:28:42 ILLUMINA Illumina HiSeq 2500 MGYS00005292 studies SRR6132553 runs SRS2065859 samples
4 analysis-jobs MGYA00448081 MGYA00448081 completed 4.1 [{'key': 'Submitted nucleotide sequences', 'va... amplicon False 2020-01-31T08:29:18 ILLUMINA Illumina HiSeq 2500 MGYS00005292 studies SRR6132552 runs SRS2065858 samples

Example: distribution of instruments used for the Analysed Samples

import matplotlib.pyplot as plt
analyses.groupby('attributes.instrument-model').size().plot(kind='pie')
plt.title('Number of Analysed Samples by instrument type');