agat_sp_fix_features_locations_duplicated.pl#
DESCRIPTION#
The script aims to modify/remove feature with duplicated locations. Even if it not an error by itself in a gtf/gff file, it becomes problematic when submitting the file to ENA (after convertion). To modify locations, AGAT modify the UTRs (when available) by shortening them by 1 bp (and consequently the Parent features and the exons accordingly)
- Case1: When isoforms have identical exon structures, AGAT removes duplicates by keeping the one with longest CDS;
- Case2: When l2 (e.g. mRNA) from different gene identifier have identical exon but no CDS at all, AGAT removes one duplicate);
- Case3: When l2 (e.g. mRNA) from different gene identifier have identical exon and CDS structures, AGAT removes duplicates by keeping the one with longest CDS);
- Case4: When l2 (e.g. mRNA) from different gene identifier have identical exon structures and different CDS structures, AGAT reshapes UTRs to modify mRNA and gene locations);
- Case5: When l2 (e.g. mRNA) from different gene identifier overlap but have different exon structure. In that case AGAT modified the gene locations by clipping UTRs;
SYNOPSIS#
agat_sp_fix_features_locations_duplicated.pl --gff infile [-o outfile]
agat_sp_fix_features_locations_duplicated.pl --help
OPTIONS#
-
-f, --file, --gff3 or --gff
Input GTF/GFF file.
-
-m or --model
To select cases you want to fix. By default all are used. To select specific cases write e.g. --model 1,4,5
Case1: When isoforms have identical exon structures AGAT removes duplicates by keeping the one with longest CDS; Case2: When l2 (e.g. mRNA) from different gene identifier have identical exon but no CDS at all (AGAT removes one duplicate); Case3: When l2 (e.g. mRNA) from different gene identifier have identical exon and CDS structures (AGAT removes duplicates by keeping the one with longest CDS); Case4: When l2 (e.g. mRNA) from different gene identifier have identical exon structures and different CDS structures (AGAT reshapes UTRs to modify mRNA and gene locations); Case5: When l2 (e.g. mRNA) from different gene identifier overlap but have different exon structure. In that case AGAT modified the gene locations by clipping UTRs;
-
-v or verbose
Add verbosity.
-
-o, --out, --output or --outfile
Output file. If none given, will be display in standard output.
-
-c or --config
String - Input agat config file. By default AGAT takes as input agat_config.yaml file from the working directory if any, otherwise it takes the orignal agat_config.yaml shipped with AGAT. To get the agat_config.yaml locally type: "agat config --expose". The --config option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
-
--help or -h
Display this helpful text.