|
|
@@ -1,15 +1,21 @@ |
|
|
|
# Quality control of germline variant calling results using a Chinese Quartet family |
|
|
|
|
|
|
|
> Author: Ren Luyao |
|
|
|
> Author: Chen Haonan |
|
|
|
> |
|
|
|
> E-mail: 18110700050@fudan.edu.cn |
|
|
|
> E-mail: haonanchen0815@163.com |
|
|
|
> |
|
|
|
> Git: http://47.103.223.233/renluyao/quartet_dna_quality_control_wgs_big_pipeline |
|
|
|
> Git: http://choppy.3steps.cn/chenhaonan/quartet_dna_quality_control_wgs_big_pipeline.git |
|
|
|
> |
|
|
|
> Last Updates: 2022/4/26 |
|
|
|
> Last Updates: 2023/7/20 |
|
|
|
|
|
|
|
## Install |
|
|
|
|
|
|
|
``` |
|
|
|
open-choppy-env |
|
|
|
choppy install renluyao/quartet_dna_quality_control_big_pipeline |
|
|
|
``` |
|
|
|
## Usage |
|
|
|
|
|
|
|
``` |
|
|
|
open-choppy-env |
|
|
|
choppy install renluyao/quartet_dna_quality_control_big_pipeline |
|
|
@@ -69,37 +75,54 @@ fastq_screen --aligner <aligner> --conf <config_file> --top <number_of_reads> -- |
|
|
|
|
|
|
|
### 2. Genome alignment |
|
|
|
|
|
|
|
#### [sentieon-genomics](https://support.sentieon.com/manual/):v2019.11.28 |
|
|
|
Reads were mapped to the human reference genome GRCh38 using BWA-MEM. SAMTools was used for SAM/BAM conversion and BAM file sorting. |
|
|
|
|
|
|
|
#### [BWA-MEM](https://github.com/lh3/bwa):v0.7.17 |
|
|
|
|
|
|
|
|
|
|
|
Reads were mapped to the human reference genome GRCh38 using Sentieon BWA. |
|
|
|
|
|
|
|
```bash |
|
|
|
${SENTIEON_INSTALL_DIR}/bin/bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $nt -K 10000000 ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} | ${SENTIEON_INSTALL_DIR}/bin/sentieon util sort -o ${sample}.sorted.bam -t $nt --sam2bam -i - |
|
|
|
# Mapping to reference genome, converting sam to bam, sorting bam file |
|
|
|
bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $(nproc) -K 10000000 ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} | samtools view -bS -@ $(nproc) - | samtools sort -@ $(nproc) -o ${user_define_name}_${project}_${sample}.sorted.bam - |
|
|
|
``` |
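
The `-R` argument in the command above packs the read-group fields into a single string. As a minimal sketch, the string can be built separately so it is easy to inspect before alignment. The helper name `build_read_group` and the example values are hypothetical and not part of the pipeline.

```bash
# Hypothetical helper (not part of the pipeline): build the @RG string for `bwa mem -R`.
# The output keeps literal "\t" sequences; bwa converts them to tabs in the SAM header.
build_read_group() {
  local group="$1" sample="$2" platform="$3"
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s' "$group" "$sample" "$platform"
}

# Placeholder values for illustration only.
rg="$(build_read_group group1 sample1 ILLUMINA)"
printf '%s\n' "$rg"
```

The resulting string would be passed as `-R "$rg"` in place of the inline literal.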
|
|
|
|
|
|
|
#### [SAMTools](https://github.com/samtools/samtools):v1.17 |
|
|
|
|
|
|
|
```bash |
|
|
|
# Building an index for sorted bam file |
|
|
|
samtools index -@ $(nproc) -o ${user_define_name}_${project}_${sample}.sorted.bam.bai ${user_define_name}_${project}_${sample}.sorted.bam |
|
|
|
``` |
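
Before moving on to QC, it can help to confirm that the sorted BAM and its index were actually written. The following is a minimal sketch under that assumption. `require_nonempty` is a hypothetical helper, not a pipeline command.

```bash
# Hypothetical helper: return non-zero if any listed file is missing or empty.
require_nonempty() {
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
}

# Example call, using the output names from the steps above:
# require_nonempty "${user_define_name}_${project}_${sample}.sorted.bam" \
#                  "${user_define_name}_${project}_${sample}.sorted.bam.bai"
```

This only checks that the files exist and are non-empty; it does not validate their contents.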
|
|
|
|
|
|
|
### 3. Post-alignment QC |
|
|
|
|
|
|
|
Qualimap and Picard Tools (implemented by Sentieon) are used to check the quality of BAM files. Deduplicated BAM files are used in this step. |
|
|
|
Qualimap and Picard Tools are used to check the quality of BAM files. Deduplicated BAM files are used in this step. |
|
|
|
|
|
|
|
#### [Qualimap](<http://qualimap.bioinfo.cipf.es/>) 2.0.0 |
|
|
|
|
|
|
|
```bash |
|
|
|
# BAM QC by qualimap |
|
|
|
qualimap bamqc -bam <bam_file> -outformat PDF:HTML -nt <threads> -outdir <output_directory> --java-mem-size=32G |
|
|
|
``` |
|
|
|
|
|
|
|
#### [Sentieon-genomics](https://support.sentieon.com/manual/):v2019.11.28 |
|
|
|
#### [Picard](https://github.com/broadinstitute/picard/):v3.0.0 |
|
|
|
|
|
|
|
``` |
|
|
|
${SENTIEON_INSTALL_DIR}/bin/sentieon driver -r ${ref_dir}/${fasta} -t $nt -i ${Dedup_bam} --algo CoverageMetrics --omit_base_output ${sample}_deduped_coverage_metrics --algo MeanQualityByCycle ${sample}_deduped_mq_metrics.txt --algo QualDistribution ${sample}_deduped_qd_metrics.txt --algo GCBias --summary ${sample}_deduped_gc_summary.txt ${sample}_deduped_gc_metrics.txt --algo AlignmentStat ${sample}_deduped_aln_metrics.txt --algo InsertSizeMetricAlgo ${sample}_deduped_is_metrics.txt --algo QualityYield ${sample}_deduped_QualityYield.txt --algo WgsMetricsAlgo ${sample}_deduped_WgsMetricsAlgo.txt |
|
|
|
```bash |
|
|
|
# Remove duplicates |
|
|
|
java -jar /usr/local/picard.jar MarkDuplicates -I ${sorted_bam} -O ${sample}.sorted.deduped.bam -M ${sample}_dedup_metrics.txt --REMOVE_DUPLICATES |
|
|
|
# Building an index for the sorted and deduplicated bam file |
|
|
|
samtools index -@ $(nproc) -o ${sample}.sorted.deduped.bam.bai ${sample}.sorted.deduped.bam |
|
|
|
``` |
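
The duplication rate can be read back from the metrics file written by `MarkDuplicates`. The sketch below assumes the usual Picard metrics layout: comment lines starting with `#`, a tab-separated header row containing a `PERCENT_DUPLICATION` column, then one row per library. The function name `percent_duplication` is hypothetical.

```bash
# Hypothetical helper: print PERCENT_DUPLICATION for the first library
# listed in a Picard MarkDuplicates metrics file.
percent_duplication() {
  awk -F '\t' '
    /^#/ || NF == 0 { next }   # skip comment and blank lines
    col == 0 {                 # first remaining line is the header row
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      if (col == 0) exit 1     # column not found
      next
    }
    { print $col; exit }       # first data row only
  ' "$1"
}

# Example call, using the metrics file name from the step above:
# percent_duplication "${sample}_dedup_metrics.txt"
```

If the BAM contains several libraries, only the first row is reported; extending the last rule to print every row before the blank line would cover that case.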
|
|
|
|
|
|
|
### 4. Germline variant calling |
|
|
|
|
|
|
|
Haplotyper, Sentieon's implementation of HaplotypeCaller, is used to identify germline variants. |
|
|
|
Google DeepVariant is used to identify germline variants. |
|
|
|
|
|
|
|
```bash |
|
|
|
${SENTIEON_INSTALL_DIR}/bin/sentieon driver -r ${ref_dir}/${fasta} -t $nt -i ${recaled_bam} --algo Haplotyper ${sample}_hc.vcf |
|
|
|
# Calling variant |
|
|
|
deepvariant/bin/run_deepvariant --model_type=WGS --ref=${ref_dir}/${fasta} --reads=${recaled_bam} --output_vcf=${sample}_hc.vcf --num_shards=$(nproc) |
|
|
|
# Building an index for the vcf file |
|
|
|
gatk IndexFeatureFile -I ${sample}_hc.vcf -O ${sample}_hc.vcf.idx |
|
|
|
``` |
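
As a quick sanity check on the caller output, records can be tallied by their FILTER value (column 7 of a VCF), for example `PASS` or `RefCall`. This is a minimal sketch that assumes an uncompressed VCF. `count_by_filter` is a hypothetical helper, not part of the pipeline.

```bash
# Hypothetical helper: count VCF records per FILTER value (column 7).
# Assumes an uncompressed, tab-separated VCF; header lines start with "#".
count_by_filter() {
  awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Example call, using the VCF name from the step above:
# count_by_filter "${sample}_hc.vcf"
```

Each output line holds a FILTER value and its record count, separated by a tab.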
|
|
|
|
|
|
|
### 5. Variant Calling QC |