Contamination is not always easy to detect at the read level. Let’s assess the assemblies again using Kraken2, but this time at the assembly level.
One could also classify the sequences using Blast; however, this is a time-consuming process. Running a single assembly on the resources allocated to you can take up to an hour.
Copy the results we have already provided and inspect them.
The results were generated in the following way:
Run Blobtools on each assembly. Blobtools requires both a BAM file and Blast output as input for the classification step. What do the resulting plots show?
Bandage is a great tool to visualise assembly graphs. Load and draw some of the assembly graphs.
The assembly graphs for some of the assemblies can be found here:
Bandage is loaded through the conda environment GA2018.
Use the Blast results to create a label CSV file to load into Bandage and identify scaffolds. Have a go at labelling only 5 sequences.
Look at the workshop wiki for a brief description of the GFA file format, and the Bandage webpage for information on how to construct the CSV file.
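To see which node names a graph contains before labelling them, it can help to list the segment records in the GFA file. The following is a minimal Python sketch, assuming a GFA version 1 file in which each segment is a tab-separated line starting with `S`, followed by the segment name and its sequence. The function name `gfa_segments` and the tiny example graph are made up for illustration; they are not part of the workshop material.

```python
import io

def gfa_segments(handle):
    """Yield (name, length) for every segment (S) line of a GFA1 file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Only segment lines describe nodes; skip headers (H), links (L), etc.
        if fields[0] != "S" or len(fields) < 3:
            continue
        name, seq = fields[1], fields[2]
        # A '*' sequence means the bases are not stored in the file.
        length = len(seq) if seq != "*" else None
        # An optional LN:i:<n> tag gives the segment length explicitly.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                length = int(tag[5:])
        yield name, length

# Hypothetical example graph: two segments joined by one link.
example_gfa = io.StringIO(
    "H\tVN:Z:1.0\n"
    "S\t1\tACGTACGT\n"
    "S\t2\t*\tLN:i:120\n"
    "L\t1\t+\t2\t-\t0M\n"
)

segments = list(gfa_segments(example_gfa))
for name, length in segments:
    print(name, length)
```

To run this on a real graph, replace `example_gfa` with `open("your_graph.gfa")`, where the file name is a placeholder for one of the assembly graphs.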
Hint: Use nano, a Unix text editor, to write the CSV file.
Optional: Use Unix command line tools such as grep, sort, cut, and join to manipulate the data into CSV.
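As an alternative to the command line tools, the labelling step can be sketched in Python. This sketch rests on several assumptions that you should check against your own files: that the Blast results are in 12-column tabular format (query ID in column 1, subject ID in column 2, bit score in column 12), that the query IDs match the node names shown in Bandage, and that Bandage reads a CSV file with a header row and the node name in the first column. The column names `Node` and `BestHit`, the helper names, and the example hits are all hypothetical.

```python
import csv
import io

def best_hits(blast_handle):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bit score per query."""
    best = {}
    for line in blast_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: column 1 = query, column 2 = subject, column 12 = bit score.
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def write_labels(best, out_handle, limit=5):
    """Write a header row plus one 'node,label' row for up to `limit` queries."""
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(["Node", "BestHit"])  # hypothetical column names
    for query in sorted(best)[:limit]:
        writer.writerow([query, best[query][0]])

# Made-up Blast hits for two sequences; sequence 1 has two competing hits.
example_blast = io.StringIO(
    "1\tsubjA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "1\tsubjB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120\n"
    "2\tsubjC\t95.0\t80\t4\t0\t1\t80\t1\t80\t1e-35\t140\n"
)

out = io.StringIO()
write_labels(best_hits(example_blast), out, limit=5)
labels_csv = out.getvalue()
print(labels_csv)
```

Keeping only the best-scoring hit per sequence gives one label per node; the `limit` argument mirrors the suggestion to label only 5 sequences.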