MetaGraph is a tool for scalable construction of annotated genome graphs and sequence-to-graph alignment.
The default index representations in MetaGraph are extremely scalable and support building graphs with trillions of nodes and millions of annotation labels. At the same time, the provided workflows and their careful implementation, combined with low-level optimizations of the core data structures, enable exceptional query and alignment performance.
Online documentation is available at https://metagraph.ethz.ch/static/docs/index.html. Offline sources are here.
Install the latest release on Linux or Mac OS X with Anaconda:
conda install -c bioconda -c conda-forge metagraph
If Docker is available on the system, you can get started immediately with
docker pull ghcr.io/ratschlab/metagraph:master
docker run -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master \
    build -v -k 10 -o /mnt/transcripts_1000 /mnt/transcripts_1000.fa
Replace ${HOME} with the directory on the host system that should be mapped under /mnt in the container.
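To make the mount explicit, the command above can be wrapped in a small helper. This is a minimal sketch: the function name `metagraph_docker_cmd`, the placeholder path `/data/transcripts`, and the choice to print the command instead of running it are illustrative assumptions, not part of MetaGraph. Only the image name and the `-v <host_dir>:/mnt` mapping come from the example above.

```shell
# Hypothetical helper (not part of MetaGraph): print the docker command that
# maps a host directory under /mnt and passes the remaining arguments to the
# container's default metagraph entrypoint.
metagraph_docker_cmd() {
    host_dir=$1
    shift
    # "$*" joins the remaining arguments with spaces, for display only.
    echo "docker run -v ${host_dir}:/mnt ghcr.io/ratschlab/metagraph:master $*"
}

# Print the command for an assumed data directory; review it before running.
metagraph_docker_cmd /data/transcripts build -v -k 10 \
    -o /mnt/transcripts_1000 /mnt/transcripts_1000.fa
```

Because the helper only prints, it is safe to try on a machine without Docker. Paths containing spaces would need extra quoting before the printed command is executed.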
To run the binary compiled for the Protein alphabet, add --entrypoint metagraph_Protein:
docker run -v ${HOME}:/mnt --entrypoint metagraph_Protein ghcr.io/ratschlab/metagraph:master \
    build -v -k 10 -o /mnt/graph /mnt/protein.fa
Running MetaGraph from Docker containers is straightforward. The following command (or a similar one) is handy for checking which directory is mounted in the container, or for otherwise debugging a command:
docker run -v ${HOME}:/mnt --entrypoint ls ghcr.io/ratschlab/metagraph:master /mnt
All different versions of the container image are listed here.
To compile from source (e.g., for builds with a custom alphabet or other configurations), see the online documentation.
./metagraph build
./metagraph annotate
./metagraph transform_anno
./metagraph query
DATA="../tests/data/transcripts_1000.fa"
./metagraph build -k 12 -o transcripts_1000 $DATA
./metagraph annotate -i transcripts_1000.dbg --anno-filename -o transcripts_1000 $DATA
./metagraph query -i transcripts_1000.dbg -a transcripts_1000.column.annodbg $DATA
./metagraph stats -a transcripts_1000.column.annodbg transcripts_1000.dbg
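The file names in the commands above suggest a naming convention: the prefix passed to `build -o` yields `<prefix>.dbg`, and the prefix passed to `annotate -o` yields `<prefix>.column.annodbg`. The sketch below chains the same four commands for an arbitrary prefix. The function name `run_workflow` and the `METAGRAPH` override variable are hypothetical and not part of MetaGraph.

```shell
# Hypothetical wrapper (not part of MetaGraph): run build, annotate, query and
# stats for one input file, stopping at the first command that fails.
# Set METAGRAPH to use a binary other than ./metagraph.
run_workflow() {
    base=$1   # output prefix, e.g. transcripts_1000
    data=$2   # input FASTA file
    mg=${METAGRAPH:-./metagraph}

    "$mg" build -k 12 -o "$base" "$data" &&
    "$mg" annotate -i "$base.dbg" --anno-filename -o "$base" "$data" &&
    "$mg" query -i "$base.dbg" -a "$base.column.annodbg" "$data" &&
    "$mg" stats -a "$base.column.annodbg" "$base.dbg"
}

# Usage (requires a compiled ./metagraph in the current directory):
#   run_workflow transcripts_1000 ../tests/data/transcripts_1000.fa
```

Setting `METAGRAPH=echo` before calling the function prints the argument lists without running anything, which is a quick way to check the derived file names.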
./metagraph
./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 \
    -o <GRAPH_DIR>/graph <DATA_DIR>/*.fasta.gz \
    2>&1 | tee <LOG_DIR>/log.txt
./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 --disk-swap <GRAPH_DIR> \
    -o <GRAPH_DIR>/graph <DATA_DIR>/*.fasta.gz \
    2>&1 | tee <LOG_DIR>/log.txt
K=20
./KMC/kmc -ci5 -t4 -k$K -m5 -fm <FILE>.fasta.gz <FILE>.cutoff_5 ./KMC
./metagraph build -v -p 4 -k $K --mem-cap-gb 10 -o graph <FILE>.cutoff_5.kmc_pre
./metagraph annotate -v --anno-type row --fasta-anno \
    -i primates.dbg \
    -o primates \
    ~/fasta_zurich/refs_chimpanzee_primates.fa
./metagraph transform_anno -v --linkage --greedy \
    -o linkage.txt \
    --subsample R \
    -p NCORES \
    primates.column.annodbg
Requires N*R/8 + 6*N^2 bytes of RAM, where N is the number of columns and R is the number of rows subsampled.
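As a worked example of this estimate, the formula can be evaluated with shell arithmetic before choosing a value for `--subsample`. The function name `linkage_ram_bytes` and the sample values (1000 columns, 1,000,000 subsampled rows) are assumptions chosen for illustration.

```shell
# Estimate RAM in bytes for the linkage step: N*R/8 + 6*N^2,
# where N is the number of columns and R the number of subsampled rows.
# Hypothetical helper, not part of MetaGraph. Uses integer division.
linkage_ram_bytes() {
    n=$1
    r=$2
    echo $(( n * r / 8 + 6 * n * n ))
}

# Assumed example: 1000 columns, 1,000,000 subsampled rows.
linkage_ram_bytes 1000 1000000   # prints 131000000 (about 131 MB)
```

With these numbers the N*R/8 term (125,000,000 bytes) dominates. The quadratic term 6*N^2 grows faster as the number of columns increases.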
./metagraph transform_anno -v -p NCORES --anno-type brwt \
    --linkage-file linkage.txt \
    -o primates \
    --parallel-nodes V \
    primates.column.annodbg
Requires M*V/8 + Size(BRWT) bytes of RAM, where M is the number of rows in the annotation and V is the number of nodes merged concurrently.
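The same kind of back-of-the-envelope calculation helps when picking `--parallel-nodes`. In the sketch below, the function name `brwt_ram_bytes` is hypothetical, and Size(BRWT) is passed in as a plain byte count because it is not known before the transform finishes. All three sample values are assumptions.

```shell
# Estimate RAM in bytes for the BRWT transform: M*V/8 + Size(BRWT),
# where M is the number of annotation rows, V the number of nodes merged
# concurrently, and brwt_size an assumed size of the BRWT in bytes.
# Hypothetical helper, not part of MetaGraph. Uses integer division.
brwt_ram_bytes() {
    m=$1
    v=$2
    brwt_size=$3
    echo $(( m * v / 8 + brwt_size ))
}

# Assumed example: 1e9 rows, 16 nodes merged at once, 2 GB BRWT.
brwt_ram_bytes 1000000000 16 2000000000   # prints 4000000000 (about 4 GB)
```

Since the first term scales linearly with V, halving `--parallel-nodes` halves that part of the estimate.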
./metagraph query -v -i <GRAPH_DIR>/graph.dbg \
    -a <GRAPH_DIR>/annotation.column.annodbg \
    --min-kmers-fraction-label 0.8 --labels-delimiter ", " \
    query_seq.fa
./metagraph align -v -i <GRAPH_DIR>/graph.dbg query_seq.fa
./metagraph assemble -v <GRAPH_DIR>/graph.dbg \
    -o assembled.fa \
    --unitigs
./metagraph assemble -v <GRAPH_DIR>/graph.dbg \
    --unitigs \
    -a <GRAPH_DIR>/annotation.column.annodbg \
    --diff-assembly-rules diff_assembly_rules.json \
    -o diff_assembled.fa
See metagraph/tests/data/example.diff.json and metagraph/tests/data/example_simple.diff.json for sample files.
Stats for graph
./metagraph stats graph.dbg
Stats for annotation
./metagraph stats -a annotation.column.annodbg
Stats for both
./metagraph stats -a annotation.column.annodbg graph.dbg
The Makefile in the top-level source directory can be used to build and test metagraph more conveniently. The following arguments are supported:
env: environment in which to compile/run ("": on the host, docker: in a docker container)
alphabet: compile metagraph for a certain alphabet (e.g. DNA or Protein, default DNA)
additional_cmake_args: additional arguments to pass to cmake.
Examples:
# compiles metagraph in a docker container for the `DNA` alphabet
make build-metagraph env=docker alphabet=DNA
Creating a new version release is done in three steps:
Metagraph is distributed under the GPLv3 License (see LICENSE). Please find further information in the AUTHORS and COPYRIGHTS files.