MGnify Notebooks (previews)
The quantity and richness of metagenomics-derived data in MGnify grow every day. The MGnify website is the best place to start exploring and searching the MGnify database, and it lets users download modest query results as CSV tables.
For larger queries, or more complex requirements such as fetching metadata from samples across multiple studies, programmatic access is a far better fit.
Programmatic access (fetching data from MGnify with a terminal command or a script) uses the MGnify API (Application Programming Interface). The API provides access to every data type in MGnify, including Studies, Samples, Analyses, Annotations and MAGs; it is what lies behind the MGnify website. Using the API lets you fetch more data than is possible via the website, and helps you write reproducible analysis scripts.
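As a rough illustration of what programmatic access looks like, the sketch below builds a request URL for one MGnify data type and fetches a single page of results with Python's standard library. The resource name `studies`, the `page_size` query parameter, and the helper names `build_url` and `fetch_page` are assumptions chosen for illustration, so check them in the API Browser before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Public root of the MGnify API (version 1).
API_ROOT = "https://www.ebi.ac.uk/metagenomics/api/v1"


def build_url(resource, **params):
    """Return the URL for a resource list, e.g. 'studies' or 'samples'.

    The resource name and query parameters are passed through unchecked;
    which ones the API accepts is an assumption to verify in the API Browser.
    """
    url = f"{API_ROOT}/{resource}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_page(url, timeout=30):
    """Download one page of results and parse the JSON body into a dict."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (needs network access, so it is left commented out):
# page = fetch_page(build_url("studies", page_size=5))
```

Keeping URL construction separate from the download makes it possible to inspect or log the exact request before any data is fetched, which helps when writing reproducible scripts.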
The API can be explored interactively online using the API Browser. Using it in practice, however, raises two hurdles. First, it requires knowledge of, and often installation of, tools on your computer. This might range from a command line tool like cURL, to learning R and setting up the RStudio application, to setting up a Python environment and installing a suite of data analysis packages. Second, the API returns most data in JSON format: this is standard on the web, but less familiar to bioinformaticians used to TSVs and dataframes.
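To make the JSON point concrete, here is a minimal sketch that flattens a nested JSON response into a TSV table using only the Python standard library. The payload is a hand-written, hypothetical example and not real MGnify output. The assumed layout (a top-level `data` list whose items carry an `id` and an `attributes` object) should be checked against an actual API response.

```python
import csv
import io
import json

# Hypothetical payload for illustration only: not real MGnify data.
EXAMPLE_JSON = """
{
  "data": [
    {"type": "samples", "id": "SAMPLE_A", "attributes": {"biome": "soil", "depth": 10}},
    {"type": "samples", "id": "SAMPLE_B", "attributes": {"biome": "marine"}}
  ]
}
"""


def to_rows(payload):
    """Turn each record into a flat dict: its id plus its attributes."""
    return [{"id": item["id"], **item.get("attributes", {})}
            for item in payload.get("data", [])]


def to_tsv(rows):
    """Write rows as TSV; columns are the union of keys in first-seen order."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_rows(json.loads(EXAMPLE_JSON))
tsv = to_tsv(rows)
print(tsv)
```

Records that lack an attribute get an empty cell, so the table stays rectangular even when samples carry different metadata fields.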
The MGnify Notebook Servers at EMBL and Galaxy, and the MGnifyR package, are designed to bridge these gaps. On the Notebook Servers, users can launch an online R and Python coding environment in their browser without installing anything. The environment already includes the main libraries needed for communicating with the MGnify API, analysing data, and making plots. It uses the popular JupyterLab software, which means you can code inside Notebooks: interactive code documents.
There are example Notebooks written in both R and Python, so users can pick whichever they’re more familiar with.