For this exercise you need to be logged in to Uppmax.
Set up the folder structure:
source ~/git/GAAS/profiles/activate_rackham_env
export data=/proj/g2019006/nobackup/$USER/data
export work_with_gff=/proj/g2019006/nobackup/$USER/work_with_gff
mkdir -p $work_with_gff
The GFF and GTF formats are very similar and can be difficult to tell apart. For a complete overview of the formats, have a look at the cheat sheet section.
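One practical way to tell the two apart is to look at the 9th column: GFF3 uses tag=value pairs, while GTF (and the older GFF2) uses tag value pairs. The sketch below is only a heuristic under that assumption; `guess_gxf_format` is a hypothetical helper written for this page, not a GAAS script, and the two demo files are made up.

```shell
# Hypothetical helper (not part of GAAS): guess the flavour of a GFF/GTF file
# from the 9th column of its first feature line.
guess_gxf_format() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }                          # skip comments and short lines
    $9 ~ /^[A-Za-z_]+=/ { print "GFF3-like"; exit }  # tag=value, e.g. ID=g1
    $9 ~ /^[A-Za-z_]+ / { print "GTF-like";  exit }  # tag value, e.g. gene_id "g1";
    { print "unknown"; exit }
  ' "$1"
}

# Demo on two made-up one-line files
demo=$(mktemp -d)
printf '4\tAUGUSTUS\tgene\t386\t13142\t0.01\t+\t.\tgene_id "g1";\n' > "$demo/a.gtf"
printf '4\tAUGUSTUS\tgene\t386\t13142\t0.01\t+\t.\tID=g1\n' > "$demo/a.gff3"
guess_gxf_format "$demo/a.gtf"    # prints GTF-like
guess_gxf_format "$demo/a.gff3"   # prints GFF3-like
```

Only the first feature line is inspected, so a file that mixes both styles would need a closer look.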
cd $work_with_gff
cp $data/annotation/augustus.xxx .
less augustus.xxx

Now edit the file to fix the 9th column, which must contain tag value pairs:
  chmod +w augustus.xxx
  nano augustus.xxx
 Solution:
    4	AUGUSTUS        gene    386     13142   0.01    +	.	gene_id g1;
    4	AUGUSTUS        transcript	386     13142   0.01    +	.	transcript_id g1.t1;
  
Now your file has at least a correct structure!
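If you want to double-check the structure after editing, a quick test is that every feature line has exactly 9 tab-separated columns. The following is a minimal sketch; `check_nine_columns` is a hypothetical helper, and the demo file deliberately contains one broken line.

```shell
# Hypothetical helper: report feature lines without exactly 9 tab-separated columns.
# Exit status is 0 when every feature line is fine, 1 otherwise.
check_nine_columns() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }    # ignore comments and empty lines
    NF != 9 { printf "line %d: %d columns\n", NR, NF; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Demo: the second line is missing its 8th column on purpose
demo=$(mktemp)
printf '4\tAUGUSTUS\tgene\t386\t13142\t0.01\t+\t.\tgene_id g1;\n' > "$demo"
printf '4\tAUGUSTUS\ttranscript\t386\t13142\t0.01\t+\ttranscript_id g1.t1;\n' >> "$demo"
check_nine_columns "$demo" || echo "structure problems found"
```

On your own file you would run `check_nine_columns augustus.xxx`; no output means every line has 9 columns. This says nothing about whether the content of each column is valid.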
Let’s convert it to GFF3 format:
  gxf_to_gff3.pl --gff augustus.xxx -o augustus.gff3 
The script gxf_to_gff3.pl can be your friend when dealing with GFF/GTF files. It can handle any flavour of GFF/GTF (even mixed formats) as well as errors, and lets you create a standardized GFF3 file.
The GFF format was designed to be easy to parse and process with a variety of programs and languages (e.g. Unix tools such as grep and sort, Perl, awk). For this reason, each feature is described on a single line.
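Because each feature sits on one tab-separated line, standard Unix tools can be chained directly. Here is a small sketch of that idea; `features_on_seq` is a hypothetical helper and the four-line GFF3 file is invented for the demo.

```shell
# Hypothetical helper: print the features of one sequence, sorted by start position.
features_on_seq() {
  grep -v '^#' "$1" |
    awk -F'\t' -v s="$2" '$1 == s' |
    sort -t "$(printf '\t')" -k4,4n
}

# Demo on a made-up GFF3 file
demo=$(mktemp)
printf '##gff-version 3\n' > "$demo"
printf '1\tdemo\tgene\t500\t900\t.\t+\t.\tID=g2\n' >> "$demo"
printf '1\tdemo\tgene\t100\t400\t.\t+\t.\tID=g1\n' >> "$demo"
printf '2\tdemo\tgene\t50\t80\t.\t-\t.\tID=g3\n' >> "$demo"
# Start and attributes of the two genes on sequence 1, lowest start first
features_on_seq "$demo" 1 | cut -f4,9
```

grep drops the comment lines, awk keeps the rows of the requested sequence, and sort orders them numerically on column 4.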
Download the human GFF3 annotation (release 96) from Ensembl:
 wget ftp://ftp.ensembl.org/pub/release-96/gff3/homo_sapiens/Homo_sapiens.GRCh38.96.chr.gff3.gz
 What is the size of this file?
Now uncompress it:
 gunzip Homo_sapiens.GRCh38.96.chr.gff3.gz 
 What is the size of the uncompressed file?
GFF/GTF files have a good compression ratio.
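To put a number on this, you can divide the uncompressed size by the gzip-compressed size. The sketch below assumes `gzip` and `wc` are available; `compression_ratio` is a hypothetical helper and the demo file is a generated toy file, not the Ensembl annotation.

```shell
# Hypothetical helper: uncompressed size divided by gzip-compressed size.
compression_ratio() {
  awk -v raw="$(wc -c < "$1")" -v packed="$(gzip -c "$1" | wc -c)" \
    'BEGIN { printf "%.1f\n", raw / packed }'
}

# Demo: build a repetitive toy GFF-like file of 1000 lines
demo=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
  printf '1\tdemo\tgene\t%d\t%d\t.\t+\t.\tID=gene%d\n' "$i" "$((i + 100))" "$i"
  i=$((i + 1))
done > "$demo"
compression_ratio "$demo"   # prints a ratio greater than 1 for this toy file
```

On the uncompressed annotation you would run `compression_ratio Homo_sapiens.GRCh38.96.chr.gff3`; `gzip -c` writes to standard output, so the file itself is left untouched.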
Let’s now compute some statistics on this file.

 Solution:
  # total number of lines
  wc -l Homo_sapiens.GRCh38.96.chr.gff3
  # number of gene features
  awk '{if($3=="gene") print $0}' Homo_sapiens.GRCh38.96.chr.gff3 | wc -l
  # number of mRNA features
  awk '{if($3=="mRNA") print $0}' Homo_sapiens.GRCh38.96.chr.gff3 | wc -l
  # number of gene features on chromosome 1
  awk '{if($3=="gene" && $1=="1") print $0}' Homo_sapiens.GRCh38.96.chr.gff3 | wc -l
  # list of distinct feature types (3rd column)
  awk '{if($0 !~ /^#/)print $3}' Homo_sapiens.GRCh38.96.chr.gff3 | sort -u
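The per-type counts can also be collected in a single pass over the file instead of one awk call per feature type. This is a minimal sketch of that variant; `count_feature_types` is a hypothetical helper and the demo file is invented.

```shell
# Hypothetical helper: count how many features of each type (3rd column) a file holds.
count_feature_types() {
  awk -F'\t' '
    /^#/ || NF < 3 { next }     # skip comments and malformed lines
    { n[$3]++ }                 # one counter per feature type
    END { for (t in n) printf "%s\t%d\n", t, n[t] }
  ' "$1" | sort
}

# Demo on a made-up GFF3 file with one gene, one mRNA and two exons
demo=$(mktemp)
printf '##gff-version 3\n' > "$demo"
printf '1\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1\n' >> "$demo"
printf '1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1\n' >> "$demo"
printf '1\tdemo\texon\t100\t400\t.\t+\t.\tParent=t1\n' >> "$demo"
printf '1\tdemo\texon\t600\t900\t.\t+\t.\tParent=t1\n' >> "$demo"
count_feature_types "$demo"
```

Using an explicit tab separator (`-F'\t'`) keeps the columns aligned even if an attribute value contains spaces.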