Reproducible research repository for NBIS project G Arnqvist 1305
View the Project on GitHub NBISweden/ReprRes_G_Arnqvist_1305
This is a collaborative project between the Göran Arnqvist group, Animal Ecology, Dept. of Ecology and Genetics, Uppsala University and the NBIS long-term support (a.k.a. WABI), performed during 2015-2018 (NBIS LT project name: G_Arnqvist_1305).
As part of the NBIS reproducible research policy, this repository provides the data files and scripts necessary to perform the underlying analyses and to create the tables and plots used in Sayadi et al., "The genomic footprint of sexual conflict" (manuscript in prep; link not yet available).
Several scripts are provided to reproduce most parts of the Pool-Seq analysis. Please refer to the readme file for more details.
Note! The repository is provided for reproducibility reasons only; no further development will be made.
The content of this repository can be obtained in two ways:
1) Download Zip File or Download TAR Ball (on the github.io page)
2) View on GitHub
Data:
In this folder you will find several data files.
README.md
This file.
Scripts (in order of execution)
Required software: Python, bowtie2, samtools, bcftools, bwa, java, picard.
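Before running anything, it can help to confirm that the listed tools are installed. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, the executable names are assumed to match the tool names listed above, and picard is left out because it is often distributed as a .jar file rather than as an executable.

```shell
# Print every tool from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of this repository.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: the executables carry the same names as the tools listed above.
# On some systems "python" is installed as "python3".
missing=$(missing_tools python bowtie2 samtools bcftools bwa java)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all six executables were found on your PATH.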
The raw data files need to be downloaded first (genome assembly file and raw sequencing data).
The annotated genome assembly, along with sequence data, is available from the European Nucleotide Archive (ENA) under accession PRJEB30475.
All Pool-seq raw sequencing data have been deposited at the NCBI Sequence Read Archive (SRA) under accession number PRJNA503561.
Please modify the scripts to match your folder organisation.
Several folders need to be created before running the script (place your shell script in the same directory):
mkdir ref # place genome assembly file in this folder
mkdir reads # place raw reads in this folder
mkdir trimmed.reads
mkdir mapping
bash PoPoolationPart1.sh
bash PoPoolationPart2.sh
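The folder setup above can also be wrapped in a small function so that it is safe to run more than once. This is only a sketch: the function name setup_dirs is hypothetical, and it assumes the raw-read folder is named reads.

```shell
# Create the four working folders under a base directory
# (default: the current directory). mkdir -p does not fail
# if a folder already exists.
# (setup_dirs is a hypothetical helper, not part of this repository.)
setup_dirs() {
    base=${1:-.}
    for d in ref reads trimmed.reads mapping; do
        mkdir -p "$base/$d" || return 1
    done
}

# Usage (run from the directory that holds the shell scripts):
#   setup_dirs . && bash PoPoolationPart1.sh && bash PoPoolationPart2.sh
```

Chaining the commands with && stops the run as soon as one step fails.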
Running PoPoolationPart1.sh produces the Bra.Ca.Yem.idf.mpileup file as its output.
PoPoolationPart2.sh uses the mpileup file to produce the sync file, the cmh file and the fst file.
Please refer to the PoPoolation manual to familiarise yourself with the output files.
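Since the second script depends on the mpileup file written by the first, it may be worth checking that file before continuing. A minimal sketch, assuming the mpileup file ends up in the directory you run from (this depends on how you adapted the scripts); the helper name require_nonempty is hypothetical.

```shell
# Succeed only if the given file exists and is not empty.
# (require_nonempty is a hypothetical helper, not part of this repository.)
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "Expected output not found or empty: $1" >&2
        return 1
    fi
}

# Usage: stop before part 2 if part 1 did not produce the mpileup file.
#   bash PoPoolationPart1.sh \
#     && require_nonempty Bra.Ca.Yem.idf.mpileup \
#     && bash PoPoolationPart2.sh
```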
gff.to.SNPs.pl:
This script extracts, for each list of CDSs, the corresponding list of SNPs.
perl gff.to.SNPs.pl
SNPs.to.cmh.allCDSs.pl:
To get the cmh value of each SNP in the CDS regions, you can run this script.
perl SNPs.to.cmh.allCDSs.pl
SNPs.to.cmh.pl:
To get the cmh value of each SNP in the CDS regions for each list of transcripts (SFP, Digestive enzymes, Abdomen, Head&Thorax), you can run this script.
perl SNPs.to.cmh.pl
Polymorphic.CDSs.pl:
This script lists the SNPs that are present in all 3 populations, in both samples, at a proportion of at least 30/70.
perl Polymorphic.CDSs.pl > Polymorphic.SNPs.CDSs.txt
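To illustrate the 30/70 criterion, here is a sketch of one possible reading: a site passes if the second most common allele makes up at least 30% of the counted bases. This is an assumption and may not match what Polymorphic.CDSs.pl actually does. The helper name passes_30_70 is hypothetical, and the input is assumed to be a single count column in the A:T:C:G:N:del layout used by sync files.

```shell
# Succeed if the second most common allele among A, T, C and G accounts
# for at least 30% of the A+T+C+G counts (N and deletions are ignored).
# (passes_30_70 is a hypothetical helper; the threshold logic is an
# assumption, not taken from Polymorphic.CDSs.pl.)
passes_30_70() {
    printf '%s\n' "$1" | awk -F: '
        {
            total = $1 + $2 + $3 + $4
            max1 = 0; max2 = 0
            for (i = 1; i <= 4; i++) {
                c = $i + 0
                if (c > max1) { max2 = max1; max1 = c }
                else if (c > max2) { max2 = c }
            }
            # Integer comparison avoids floating-point rounding at exactly 30%.
            ok = (total > 0 && max2 * 10 >= total * 3)
        }
        END { exit(ok ? 0 : 1) }'
}

# Usage:
#   passes_30_70 "12:28:0:0:0:0" && echo "kept"
```

Under this reading, a SNP would be kept only if every population column in both samples passes the check.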
SNPsLogFC.pl:
To get the LogFC value for each SNP in the Abdomen and Head&Thorax transcript lists, run this script. You need to modify the first line of the script to select the input file.
perl SNPsLogFC.pl > output.txt
Go.terms.enrichment.R:
GO term enrichment is calculated using this script.
Please modify the read.table line to specify the input file.