test14

tags/v0.3.0
Haonan917 2 years ago
Parent
Commit
41eb8a2f83
1 changed file with 36 additions and 13 deletions
README.md: +36, -13

# Quality control of germline variants calling results using a Chinese Quartet family


> Author: Run Luyao
> Author: Chen Haonan
>
> E-mail:18110700050@fudan.edu.cn
> E-mail:haonanchen0815@163.com
>
> Git: http://47.103.223.233/renluyao/quartet_dna_quality_control_wgs_big_pipeline
> Git: http://choppy.3steps.cn/chenhaonan/quartet_dna_quality_control_wgs_big_pipeline.git
>
> Last Updates: 2022/4/26
> Last Updates: 2023/7/20


## Install


```
open-choppy-env
choppy install renluyao/quartet_dna_quality_control_big_pipeline
```
## Usage

```
open-choppy-env
choppy install renluyao/quartet_dna_quality_control_big_pipeline
```


### 2. Genome alignment


#### [sentieon-genomics](https://support.sentieon.com/manual/):v2019.11.28
Reads were mapped to the human reference genome GRCh38 using BWA-MEM. SAMTools is used for SAM/BAM file conversion and BAM file sorting.

#### [BWA-MEM](https://github.com/lh3/bwa):v0.7.17



Reads were mapped to the human reference genome GRCh38 using Sentieon BWA.


```bash
${SENTIEON_INSTALL_DIR}/bin/bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $nt -K 10000000 ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} | ${SENTIEON_INSTALL_DIR}/bin/sentieon util sort -o ${sample}.sorted.bam -t $nt --sam2bam -i -
# Mapping to reference genome, converting sam to bam, sorting bam file
bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $(nproc) -K 10000000 ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} | samtools view -bS -@ $(nproc) - |
samtools sort -@ $(nproc) -o ${user_define_name}_${project}_${sample}.sorted.bam -
```
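
The `-R` read-group string and the output file name in the alignment commands are assembled from shell variables. A minimal sketch of that assembly, assuming hypothetical helper names (`build_read_group`, `sorted_bam_name`) and placeholder values that are not part of the pipeline:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: assemble the @RG header line passed to `bwa mem -R`.
# The fields are separated by a literal backslash followed by "t";
# bwa converts that escape into a real tab when it writes the SAM header.
build_read_group() {
    local group="$1" sample="$2" platform="$3"
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$group" "$sample" "$platform"
}

# Hypothetical helper: derive the sorted BAM name following the
# ${user_define_name}_${project}_${sample}.sorted.bam pattern.
sorted_bam_name() {
    local user_define_name="$1" project="$2" sample="$3"
    printf '%s_%s_%s.sorted.bam\n' "$user_define_name" "$project" "$sample"
}

# Placeholder values for illustration only.
build_read_group "group1" "sample1" "ILLUMINA"
sorted_bam_name "demo" "project1" "sample1"
```

Building the strings in one place keeps the BAM header, the BAM file name, and the later index name consistent with each other.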

#### [SAMTools](https://github.com/samtools/samtools):v1.17


```bash
# Building an index for sorted bam file
samtools index -@ $(nproc) -o ${user_define_name}_${project}_${sample}.sorted.bam.bai ${user_define_name}_${project}_${sample}.sorted.bam
```


### 3. Post-alignment QC


Qualimap and Picard Tools (implemented by Sentieon) are used to check the quality of BAM files. Deduplicated BAM files are used in this step.
Qualimap and Picard Tools are used to check the quality of BAM files. Deduplicated BAM files are used in this step.


#### [Qualimap](<http://qualimap.bioinfo.cipf.es/>) 2.0.0


```bash
# BAM QC by qualimap
qualimap bamqc -bam <bam_file> -outformat PDF:HTML -nt <threads> -outdir <output_directory> --java-mem-size=32G
```


#### [Sentieon-genomics](https://support.sentieon.com/manual/):v2019.11.28
#### [Picard](https://github.com/broadinstitute/picard/):v3.0.0


```
${SENTIEON_INSTALL_DIR}/bin/sentieon driver -r ${ref_dir}/${fasta} -t $nt -i ${Dedup_bam} --algo CoverageMetrics --omit_base_output ${sample}_deduped_coverage_metrics --algo MeanQualityByCycle ${sample}_deduped_mq_metrics.txt --algo QualDistribution ${sample}_deduped_qd_metrics.txt --algo GCBias --summary ${sample}_deduped_gc_summary.txt ${sample}_deduped_gc_metrics.txt --algo AlignmentStat ${sample}_deduped_aln_metrics.txt --algo InsertSizeMetricAlgo ${sample}_deduped_is_metrics.txt --algo QualityYield ${sample}_deduped_QualityYield.txt --algo WgsMetricsAlgo ${sample}_deduped_WgsMetricsAlgo.txt
```
```bash
# Remove duplicates
java -jar /usr/local/picard.jar MarkDuplicates -I ${sorted_bam} -O ${sample}.sorted.deduped.bam -M ${sample}_dedup_metrics.txt --REMOVE_DUPLICATES
# Building an index for the sorted and deduplicated bam file
samtools index -@ $(nproc) -o ${sample}.sorted.deduped.bam.bai ${sample}.sorted.deduped.bam
```


### 4. Germline variant calling


Haplotyper implemented by Sentieon is used to identify germline variants.
Google DeepVariant is used to identify germline variants.


```bash
${SENTIEON_INSTALL_DIR}/bin/sentieon driver -r ${ref_dir}/${fasta} -t $nt -i ${recaled_bam} --algo Haplotyper ${sample}_hc.vcf
# Calling variant
deepvariant/bin/run_deepvariant --model_type=WGS --ref=${ref_dir}/${fasta} --reads=${recaled_bam} --output_vcf=${sample}_hc.vcf --num_shards=$(nproc)
# Building an index for the vcf file
gatk IndexFeatureFile -I ${sample}_hc.vcf -O ${sample}_hc.vcf.idx
```
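
Before the VCF is handed to the QC step, it can be useful to confirm that the caller actually wrote variant records. A minimal sketch, assuming an uncompressed VCF and a hypothetical helper name `count_vcf_records` that is not part of the pipeline:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count data records in an uncompressed VCF,
# i.e. every line that does not start with '#'.
count_vcf_records() {
    local vcf="$1"
    # grep exits with status 1 when the count is zero, so guard it
    # to keep `set -e` from aborting on an empty call set.
    grep -vc '^#' "$vcf" || true
}

# Usage (the file name follows the ${sample}_hc.vcf pattern):
#   count_vcf_records "${sample}_hc.vcf"
```

A count of 0 usually points to a problem upstream, such as a wrong reference or an empty BAM, rather than a sample with no variants.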


### 5. Variants Calling QC
