
test14

tags/v0.3.0
Haonan917 2 years ago
parent
commit
41eb8a2f83
1 changed file with 36 additions and 13 deletions

README.md

@@ -1,15 +1,21 @@
# Quality control of germline variants calling results using a Chinese Quartet family

> Author: Run Luyao
> Author: Chen Haonan
>
> E-mail:18110700050@fudan.edu.cn
> E-mail:haonanchen0815@163.com
>
> Git: http://47.103.223.233/renluyao/quartet_dna_quality_control_wgs_big_pipeline
> Git: http://choppy.3steps.cn/chenhaonan/quartet_dna_quality_control_wgs_big_pipeline.git
>
> Last Updates: 2022/4/26
> Last Updates: 2023/7/20

## Install

```
open-choppy-env
choppy install renluyao/quartet_dna_quality_control_big_pipeline
```
## Usage

```
open-choppy-env
choppy install renluyao/quartet_dna_quality_control_big_pipeline
@@ -69,37 +75,54 @@ fastq_screen --aligner <aligner> --conf <config_file> --top <number_of_reads> --

### 2. Genome alignment

#### [sentieon-genomics](https://support.sentieon.com/manual/):v2019.11.28
Reads were mapped to the human reference genome GRCh38 using BWA-MEM. SAMtools is used for SAM/BAM file conversion and BAM file sorting.

#### [BWA-MEM](https://github.com/lh3/bwa):v0.7.17


Reads were mapped to the human reference genome GRCh38 using Sentieon BWA.

```bash
${SENTIEON_INSTALL_DIR}/bin/bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $nt -K 10000000 ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} | ${SENTIEON_INSTALL_DIR}/bin/sentieon util sort -o ${sample}.sorted.bam -t $nt --sam2bam -i -
# Mapping to reference genome, converting sam to bam, sorting bam file
bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $(nproc) -K 10000000 ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} \
  | samtools view -bS -@ $(nproc) - \
  | samtools sort -@ $(nproc) -o ${user_define_name}_${project}_${sample}.sorted.bam -
```
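The read-group string passed to `bwa mem -R` is easy to get wrong when one of the variables is empty. The following is a minimal sketch of a guard for that, assuming a hypothetical helper named `build_read_group` that is not part of this pipeline; the values in the example call are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): assemble the @RG header
# string for `bwa mem -R`. The fields are separated by the two-character
# sequence backslash-t, which bwa converts to a real tab in the SAM header.
build_read_group() {
  local group="$1" sample="$2" platform="$3"
  # Fail early on an empty field instead of writing a malformed header.
  if [ -z "$group" ] || [ -z "$sample" ] || [ -z "$platform" ]; then
    echo "build_read_group: group, sample and platform are required" >&2
    return 1
  fi
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$group" "$sample" "$platform"
}

# Example call with placeholder values.
build_read_group "group1" "sample1" "ILLUMINA"
```

The result could then be passed as `-R "$(build_read_group "${group}" "${sample}" "${pl}")"` in place of the inline string.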

#### [SAMTools](https://github.com/samtools/samtools):v1.17

```bash
# Building an index for sorted bam file
samtools index -@ $(nproc) -o ${user_define_name}_${project}_${sample}.sorted.bam.bai ${user_define_name}_${project}_${sample}.sorted.bam
```
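Downstream tools may misbehave if the `.bai` file is missing or older than the BAM it describes. A minimal sketch of a freshness check follows, assuming a hypothetical helper `index_is_fresh` and the `<bam>.bai` naming used above; it only compares file modification times and does not validate the index contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <bam>.bai exists and is not older
# than the BAM file itself (compared by modification time).
index_is_fresh() {
  local bam="$1"
  local bai="${bam}.bai"
  [ -f "$bam" ] || return 1
  [ -f "$bai" ] || return 1
  # -ot is true when the left file is older than the right file.
  if [ "$bai" -ot "$bam" ]; then
    return 1
  fi
  return 0
}

# Usage idea: rebuild the index only when it is missing or stale.
#   index_is_fresh sample.sorted.bam || samtools index sample.sorted.bam
```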

### 3. Post-alignment QC

Qualimap and Picard Tools (implemented by Sentieon) are used to check the quality of BAM files. Deduplicated BAM files are used in this step.
Qualimap and Picard Tools are used to check the quality of BAM files. Deduplicated BAM files are used in this step.

#### [Qualimap](<http://qualimap.bioinfo.cipf.es/>) 2.0.0

```bash
# BAM QC by qualimap
qualimap bamqc -bam <bam_file> -outformat PDF:HTML -nt <threads> -outdir <output_directory> --java-mem-size=32G
```

#### [Sentieon-genomics](https://support.sentieon.com/manual/):v2019.11.28
#### [Picard](https://github.com/broadinstitute/picard/):v3.0.0

```
${SENTIEON_INSTALL_DIR}/bin/sentieon driver -r ${ref_dir}/${fasta} -t $nt -i ${Dedup_bam} --algo CoverageMetrics --omit_base_output ${sample}_deduped_coverage_metrics --algo MeanQualityByCycle ${sample}_deduped_mq_metrics.txt --algo QualDistribution ${sample}_deduped_qd_metrics.txt --algo GCBias --summary ${sample}_deduped_gc_summary.txt ${sample}_deduped_gc_metrics.txt --algo AlignmentStat ${sample}_deduped_aln_metrics.txt --algo InsertSizeMetricAlgo ${sample}_deduped_is_metrics.txt --algo QualityYield ${sample}_deduped_QualityYield.txt --algo WgsMetricsAlgo ${sample}_deduped_WgsMetricsAlgo.txt
```bash
# Remove duplicates
java -jar /usr/local/picard.jar MarkDuplicates -I ${sorted_bam} -O ${sample}.sorted.deduped.bam -M ${sample}_dedup_metrics.txt --REMOVE_DUPLICATES
# Building an index for the sorted and deduplicated bam file
samtools index -@ $(nproc) -o ${sample}.sorted.deduped.bam.bai ${sample}.sorted.deduped.bam
```
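The metrics file written by `MarkDuplicates` (`-M`) is a tab-separated table preceded by `#` comment lines. As a rough sketch, the duplication rate could be pulled out of it as shown below. The helper name `percent_duplication` is hypothetical, and the function reports only the first data row, so a file with several libraries would need a loop instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print PERCENT_DUPLICATION from the first data row of
# a Picard MarkDuplicates metrics file. The column is located by name, so
# the function does not depend on the column order.
percent_duplication() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }     # skip comment lines and blank lines
    col == 0 {                   # first remaining line is the table header
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      if (col == 0) exit 1       # header lacks the expected column
      next
    }
    { print $col; exit }         # first data row, then stop before the histogram
  ' "$1"
}

# Usage idea:
#   percent_duplication "${sample}_dedup_metrics.txt"
```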

### 4. Germline variant calling

HaplotypeCaller implemented by Sentieon is used to identify germline variants.
Google DeepVariant is used to identify germline variants.

```bash
${SENTIEON_INSTALL_DIR}/bin/sentieon driver -r ${ref_dir}/${fasta} -t $nt -i ${recaled_bam} --algo Haplotyper ${sample}_hc.vcf
# Calling variant
deepvariant/bin/run_deepvariant --model_type=WGS --ref=${ref_dir}/${fasta} --reads=${recaled_bam} --output_vcf=${sample}_hc.vcf --num_shards=$(nproc)
# Building an index for the vcf file
gatk IndexFeatureFile -I ${sample}_hc.vcf -O ${sample}_hc.vcf.idx
```
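A quick sanity check after calling is to count the records whose FILTER column is `PASS`, since a DeepVariant VCF can also contain non-PASS records (for example `RefCall`). The sketch below assumes an uncompressed VCF and a hypothetical helper name `count_pass_variants`; it is a convenience check, not part of the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count VCF records whose FILTER field (column 7) is
# exactly PASS. Header lines starting with '#' are ignored.
count_pass_variants() {
  awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Usage idea:
#   count_pass_variants "${sample}_hc.vcf"
```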

### 5. Variants Calling QC
