genecovr
is an R
package that provides plotting functions that summarize gene transcript to genome alignments. The main purpose is to assess the effect of polishing and scaffolding operations has on the quality of a genome assembly. The gene transcript set is a large sequence set consisting of assembled transcripts from RNA-seq data generated in relation to a genome assembly project. Therefore, genecovr
serves as a complement to software such as BUSCO, which evaluates genome assembly quality using a smaller set of well-defined single-copy orthologs.
Installation
You can install the released version of genecovr from NBIS GitHub with:
# If necessary, uncomment to install devtools
# install.packages("devtools")
devtools::install_github("NBISweden/genecovr")
Usage
genecovr script quick start
There is a helper script for generating basic plots located in PACKAGE_DIR/bin/genecovr. Create a data input csv-delimited file with columns
- data label
- mapping file (supported formats: psl)
- assembly file (fasta or fasta index)
- transcript file (fasta or fasta index)
Columns 3 and 4 can be set to missing value (NA) in which case sequence sizes will be inferred from the alignment files. Then run the script to generate plots:
PACKAGE_DIR/bin/genecovr indata.csv
Example
There are example files located in PACKAGE_DIR/inst/extdata consisting of two psl alignment files containing gmap alignments and fasta indices for the transcript sequences and two for different assembly versions:
- nonpolished.fai - fasta index for raw assembly
- polished.fai - fasta index for polished assembly
- transcripts.fai - fasta index for transcript sequences
- transcripts2nonpolished.psl - gmap alignments, transcripts to raw assembly
- transcripts2polished.psl - gmap alignments, transcripts to polished assembly
Using these files and the labels non
and pol
for the different assemblies, a genecovr
input file (called e.g., assemblies.csv
) would look as follows:
nonpol,transcripts2nonpolished.psl,nonpolished.fai,transcripts.fai pol,transcripts2polished.psl,polished.fai,transcripts.fai
and the command to run would be:
genecovr assemblies.csv
genecovr options
To list genecovr script options, type ’genecovr -h`:
: genecovr [-h] [-v] [-p number]
usage-d OUTPUT_DIRECTORY] [--height HEIGHT]
[--width WIDTH]
[
csvfile
:
positional arguments-delimited file with columns
csvfile csv1. data label
2. mapping file (supported formats: psl)
3. assembly file (fasta or fasta index)
4. transcript file (fasta or fasta index)
:
optional arguments-h, --help show this help message and exit
-v, --verbose print extra output
-p number, --cpus number
1]
number of cpus [default -d OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
output directory--height HEIGHT figure height in inches [default 6.0]
--width WIDTH figure width in inches [default 6.0]
R package vignette
Alternatively, import the library in an R script and use the package functions. See Get started or run vignette("genecovr")
for a minimum working example.