Step-by-Step Guide: Variant Calling
Author: Dr. Itunuoluwa Isewon
Email: itunu.isewon@covenantuniversity.edu.ng
Dataset: Download the file here, then import the history accordingly (change the history name to ACEMFS_your_name)
Quality control
N.B.: Skip "Obtain Fastq files" if you click the dataset link
Obtain Fastq files:
- In the tool bar, click on Get Data
- Choose "Download and Extract Reads in FASTA/Q format from NCBI SRA"
- Change the select input type to "List of SRA accession", then choose your sample ID file and run the tool
- In this tutorial we'll use six datasets.
Sample | Condition |
---|---|
SRR15044361 | test |
SRR15044360 | test |
SRR15044359 | test |
SRR15044358 | control |
SRR15044357 | control |
SRR15044356 | control |
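The sample table can also be kept as a small script, so that the accession list uploaded to Galaxy and the condition labels stay in sync. This is only a minimal sketch: the accessions and conditions are copied from the table above, while the function names are illustrative and not part of any Galaxy tool.

```python
# Accession -> condition, copied from the sample table above.
SAMPLES = {
    "SRR15044361": "test",
    "SRR15044360": "test",
    "SRR15044359": "test",
    "SRR15044358": "control",
    "SRR15044357": "control",
    "SRR15044356": "control",
}


def accession_list_text(samples):
    """Return one accession per line, the usual layout for an accession list file."""
    return "\n".join(samples) + "\n"


def group_by_condition(samples):
    """Group accessions under their condition label, keeping table order."""
    groups = {}
    for accession, condition in samples.items():
        groups.setdefault(condition, []).append(accession)
    return groups


if __name__ == "__main__":
    print(accession_list_text(SAMPLES), end="")
    print(group_by_condition(SAMPLES))
```

Saving the printed accession list to a text file gives a sample ID file that can be uploaded to the history.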
Perform QC:
- On the search bar, type fastqc
- Choose the desired fastq files (paired end) in the raw read tab.
- Leave all other tabs unchanged
- Once it runs, two files are generated: a raw data file and a webpage file
- View the result by clicking on the webpage file produced
- Repeat for the second dataset and compare the results.
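The quality plots in the FastQC report are built from the Phred scores stored in the fourth line of every FASTQ record. As a rough illustration of what is being measured (not a replacement for FastQC), the sketch below decodes quality strings. It assumes the Phred+33 encoding, and the read shown is made up.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Average Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from FASTQ lines, four lines per record."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "II5!"]  # made-up record
    for read_id, seq, qual in parse_fastq(example):
        print(read_id, seq, phred_scores(qual), mean_quality(qual))
```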
Multiqc
Why: it helps us to obtain a more intuitive comparison
- On the search bar, type multiqc
- On the "Which tool was used generate logs?" tab, choose FastQC
- Then click on "Insert FastQC output"
- Type of output is raw data
- Add the raw data files generated earlier
- Leave all other parameters at default
- Run tool
- View the result by clicking on the webpage file produced
Variant calling
Mapping
- Search for Map with BWA-MEM in the tool search bar and choose the option for longer reads
- We will use a built-in genome
- Choose Aspergillus flavus NRRL3357 as the reference genome
- Leave other parameters as default
Descriptive statistics
- Search for Samtools flagstat in the tool search bar
- Select the file generated from the BWA-MEM and leave the output format as txt
- Run tool
- View results
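When viewing the flagstat result, the main figure to check is the share of reads that mapped. The sketch below shows one way to pull that number out of the txt output. The example report is made up and shortened, and the parser assumes the usual `N + M category` line layout that flagstat prints.

```python
import re

# Made-up, shortened flagstat report used only for illustration.
EXAMPLE_FLAGSTAT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
1000 + 0 paired in sequencing
"""

LINE_PATTERN = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*\))?$")


def parse_flagstat(text):
    """Map each category name to its QC-passed read count."""
    stats = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            passed, _failed, category = match.groups()
            stats[category] = int(passed)
    return stats


def mapped_percent(stats):
    """Percentage of QC-passed reads that mapped to the reference."""
    return 100 * stats["mapped"] / stats["in total"]


if __name__ == "__main__":
    stats = parse_flagstat(EXAMPLE_FLAGSTAT)
    print(stats)
    print(f"mapped: {mapped_percent(stats):.2f}%")
```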
Generate genotype likelihoods
- Search for bcftools mpileup in the tool search bar
- We are using single BAM alignment input
- Select the file generated from the BWA-MEM
- Reference genome is Aspergillus flavus NRRL3357
- Output format is uncompressed VCF
- Run tool
Variant calling
- Search for bcftools call in the tool search bar
- Select the file generated from the bcftools mpileup
- Leave all other parameters default
- Output format is uncompressed VCF
- Run tool
- View result
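For readers who want to see how the Galaxy steps from mapping to variant calling relate to the underlying command-line tools, the sketch below only builds and prints the commands; it does not run them. All file names are hypothetical. The `-m` (multiallelic caller) choice for `bcftools call` is an assumption and may differ from the Galaxy default, the `-o` option of `bwa mem` assumes a reasonably recent bwa release, and the reference is assumed to be already indexed with `bwa index`.

```python
import shlex


def pipeline_commands(sample, reference, reads_1, reads_2):
    """Return argv lists mirroring the steps: map, sort, flagstat, mpileup, call."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    pileup = f"{sample}.mpileup.vcf"
    calls = f"{sample}.calls.vcf"
    return [
        # Map paired-end reads against the indexed reference.
        ["bwa", "mem", "-o", sam, reference, reads_1, reads_2],
        # Coordinate-sort the alignments and write BAM.
        ["samtools", "sort", "-o", bam, sam],
        # Descriptive statistics on the alignments.
        ["samtools", "flagstat", bam],
        # Genotype likelihoods as uncompressed VCF (-O v).
        ["bcftools", "mpileup", "-f", reference, "-O", "v", "-o", pileup, bam],
        # Variant calling; -m is an assumed caller choice.
        ["bcftools", "call", "-m", "-O", "v", "-o", calls, pileup],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for argv in pipeline_commands("sample1", "reference.fa", "sample1_1.fastq", "sample1_2.fastq"):
        print(shlex.join(argv))
```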
Remove homozygous-reference variants and variants with missing genotype
- Search for Filter data on any column using simple expressions in the tool search bar
- Select the file generated from the bcftools call
- Supply the condition c10 != '0/0'. Sample genotype information is in the tenth column, != means "not equal to", and 0/0 denotes a homozygous-reference genotype (a site that does not differ from the reference)
- Run tool
- View result
- Search for Filter data on any column using simple expressions in the tool search bar
- Select the file generated from the last step
- Supply the condition c10 != './.' (./. denotes missing data)
- Run tool
- View result
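The two filter conditions can be read as a single test applied to every VCF record. Below is a minimal sketch of that logic outside Galaxy, using made-up records. One assumption to note: the sample column written by bcftools often carries extra colon-separated fields after the genotype (for example `0/1:255,0,255`), so this sketch compares only the leading GT subfield, whereas the Galaxy expression compares the whole column.

```python
# Made-up records for illustration; columns are tab-separated as in a VCF file.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "ctg1\t10\t.\tA\tG\t50\t.\t.\tGT:PL\t0/1:255,0,255",
    "ctg1\t20\t.\tC\tT\t5\t.\t.\tGT:PL\t0/0:0,30,255",
    "ctg1\t30\t.\tG\tA\t7\t.\t.\tGT:PL\t./.:0,0,0",
    "ctg1\t40\t.\tT\tC\t90\t.\t.\tGT:PL\t1/1:255,60,0",
]


def genotype(sample_field):
    """Return the GT subfield, which comes first in a VCF sample column."""
    return sample_field.split(":")[0]


def keep_record(line):
    """True when the genotype is neither homozygous reference nor missing."""
    columns = line.rstrip("\n").split("\t")
    gt = genotype(columns[9])  # c10 in Galaxy is index 9 here
    return gt not in ("0/0", "./.")


def filter_vcf(lines):
    """Yield header lines unchanged and only the records that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line


if __name__ == "__main__":
    for line in filter_vcf(EXAMPLE_VCF):
        print(line)
```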
Sorting
- Find sort in the search bar, choose "Sort data in ascending or descending order"
- Sort on column 6: Quality
- Keep every other parameter as default.
- Variants with high quality are now on top.
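The sorting step amounts to ordering records by the numeric value in column 6 (QUAL), highest first. A minimal sketch of the same idea, using three rows taken from the variant list later in this guide; treating a missing QUAL (`.`) as the lowest value is an assumption of this sketch.

```python
def quality(line):
    """Numeric QUAL value of a tab-separated VCF record; '.' sorts last."""
    field = line.split("\t")[5]
    return float("-inf") if field == "." else float(field)


def sort_by_quality(lines):
    """Return records ordered from highest to lowest QUAL."""
    return sorted(lines, key=quality, reverse=True)


if __name__ == "__main__":
    records = [
        "AAIH03000093.1\t2709654\t.\tG\tA\t3.02336",
        "AAIH03000072.1\t3410716\t.\tC\tT\t3.08215",
        "AAIH03000235.1\t829594\t.\tG\tA\t3.04327",
    ]
    for line in sort_by_quality(records):
        print(line)
```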
Variant Annotation with Ensembl Fungi VEP
- Search for Variant Effect Predictor (VEP) in the Ensembl tools search bar.
- Select the input file: choose the VCF file generated from the sorting step.
- Species selection: Set species to your organism (e.g., Aspergillus flavus, Saccharomyces cerevisiae, etc.) from the Ensembl Fungi database.
- Input format: keep as VCF.
- Output options: keep the default output as tab-delimited text, or select VCF with annotations if you prefer an annotated VCF.
- Annotations to include (keep defaults, but you can also enable if available):
  - Gene symbol
  - Consequence terms (missense, synonymous, stop gained, etc.)
  - Protein domains (Pfam, InterPro)
  - SIFT/PolyPhen predictions (if available for your species)
  - Transcript ID and biotype
In the meantime, a list of variants has been provided for you to use. Run each of these using the "RUN VEP! For this line" option.
CHROM | POS | ID | REF | ALT | QUAL |
---|---|---|---|---|---|
AAIH03000093.1 | 2709654 | . | G | A | 3.02336 |
AAIH03000170.1 | 1814273 | . | T | A | 3.02336 |
AAIH03000282.1 | 926657 | . | G | T | 3.02336 |
AAIH03000072.1 | 3023139 | . | T | C | 3.02501 |
AAIH03000170.1 | 1417818 | . | C | T | 3.02996 |
AAIH03000103.1 | 103700 | . | taaaaaaa | taaaaaaa | 3.03091 |
AAIH03000103.1 | 598760 | . | T | C | 3.03539 |
AAIH03000235.1 | 829594 | . | G | A | 3.04327 |
AAIH03000072.1 | 2160575 | . | A | C | 3.04541 |
AAIH03000226.1 | 603606 | . | T | A | 3.05565 |
AAIH03000226.1 | 3304340 | . | T | A | 3.06291 |
AAIH03000011.1 | 401846 | . | G | A | 3.07192 |
AAIH03000072.1 | 4101557 | . | T | C | 3.03091 |
AAIH03000173.1 | 1293019 | . | tc | t | 3.07736 |
AAIH03000072.1 | 3410716 | . | C | T | 3.08215 |
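To paste these variants into VEP one line at a time, each row can be written as a VCF-style line holding the first five columns. The sketch below does that for three of the rows above and also labels each variant by comparing the REF and ALT alleles. It assumes VEP is given VCF-style input, and the helper names are illustrative. Rows whose REF and ALT are identical as listed are flagged rather than corrected.

```python
# Three rows copied from the table above: (CHROM, POS, ID, REF, ALT, QUAL).
ROWS = [
    ("AAIH03000093.1", 2709654, ".", "G", "A", 3.02336),
    ("AAIH03000103.1", 103700, ".", "taaaaaaa", "taaaaaaa", 3.03091),
    ("AAIH03000173.1", 1293019, ".", "tc", "t", 3.07736),
]


def to_vcf_line(row):
    """Join CHROM, POS, ID, REF and ALT with tabs; QUAL is left out."""
    chrom, pos, variant_id, ref, alt, _qual = row
    return "\t".join([chrom, str(pos), variant_id, ref, alt])


def variant_type(ref, alt):
    """Classify a variant by comparing allele strings, ignoring case."""
    if ref.upper() == alt.upper():
        return "no change"  # REF and ALT identical as listed
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "substitution"


if __name__ == "__main__":
    for row in ROWS:
        print(to_vcf_line(row), "#", variant_type(row[3], row[4]))
```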