Using GraphBin2
You can see the usage options of GraphBin2 by typing ./graphbin2 -h on the command line. For example,
Usage: graphbin2 [OPTIONS]
GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using
Assembly Graphs GraphBin2 is a tool which refines the binning results
obtained from existing tools and, is able to assign contigs to multiple
bins. GraphBin2 uses the connectivity and coverage information from
assembly graphs to adjust existing binning results on contigs and to infer
contigs shared by multiple species.
Options:
--assembler [spades|megahit|sga|flye]
name of the assembler used. (Supports
SPAdes, SGA and Flye) [required]
--graph PATH path to the assembly graph file [required]
--contigs PATH path to the contigs file [required]
--paths PATH path to the contigs.paths (metaSPAdes) or
assembly.info (metaFlye) file
--abundance PATH path to the abundance file [required]
--binned PATH path to the .csv file with the initial
binning output from an existing toole
[required]
--output PATH path to the output folder [required]
--prefix TEXT prefix for the output file
--depth INTEGER maximum depth for the breadth-first-search.
[default: 5]
--threshold FLOAT threshold for determining inconsistent
vertices. [default: 1.5]
--delimiter [,|;|$'\t'|" "] delimiter for output results. Supports a
comma (,), a semicolon (;), a tab ($'\t'), a
space (" ") and a pipe (|) . [default: ,]
--nthreads INTEGER number of threads to use. [default: 8]
-v, --version Show the version and exit.
--help Show this message and exit.
Input Format
The SPAdes version of graphbin2 takes in 4 files as inputs (required).
* Contigs file (in .fasta format)
* Assembly graph file (in .gfa format)
* Paths of contigs (in .paths format)
* Binning output from an existing tool (in .csv format)
The SGA version of graphbin2 takes in 4 files as inputs (required).
* Contigs file (in .fasta format)
* Abundance file (tab separated file with contig ID and coverage in each line)
* Assembly graph file (in .asqg format)
* Binning output from an existing tool (in .csv format)
The MEGAHIT version of graphbin2 takes in 4 files as inputs (required).
* Contigs file (in .fasta format)
* Abundance file (tab separated file with contig ID and coverage in each line)
* Assembly graph file (in .gfa format)
* Binning output from an existing tool (in .csv format)
The Flye version of graphbin2 takes in 4 files as inputs (required).
* Contigs file (in .fasta format)
* Abundance file (tab separated file with contig ID and coverage in each line)
* Assembly graph file (in .gfa format)
* Binning output from an existing tool (in .csv format)
Note: The abundance file (e.g., abundance.abund) is a tab separated file with contig ID and the coverage for each contig in the assembly. metaSPAdes provides the coverage of each contig in the contig identifier of the final assembly. We can directly extract these values to create the abundance.abund file. However, no such information is provided for contigs produced by SGA. Hence, reads should be mapped back to the assembled contigs in order to determine the coverage of SGA contigs.
Note: Make sure that the initial binning result consists of contigs belonging to only one bin. GraphBin2 is designed to handle initial contigs which belong to only one bin.
Note: You can specify the delimiter for the initial binning result file and the final output file using the delimiter paramter. Enter the following values for different delimiters; , for a comma, ; for a semicolon, $'\t' for a tab, " " for a space and | for a pipe.
Example Usage
# metaSPAdes assembly
graphbin2 --assembler spades --contigs /path/to/contigs.fasta --paths /path/to/paths_file.paths --graph /path/to/graph_file.gfa --binned /path/to/binning_result.csv --abundance /path/to/abundance.tsv --output /path/to/output_folder
# SGA assembly
graphbin2 --assembler sga --contigs /path/to/contigs.fa --graph /path/to/graph_file.asqg --binned /path/to/binning_result.csv --abundance /path/to/abundance.tsv --output /path/to/output_folder
# MEGAHIT version
graphbin2 --assembler megahit --graph /path/to/final.gfa --contigs /path/to/final.contigs.fa --binned /path/to/binning_result.csv --abundance /path/to/abundance.tsv --output /path/to/output_folder
# metaFlye assembly
graphbin2 --assembler flye --contigs /path/to/assembly.fasta --paths /path/to/assembly_info.txt --graph /path/to/graph_file.gfa --binned /path/to/binning_result.csv --abundance /path/to/abundance.tsv --output /path/to/output_folder