
debug

tags/v0.1.1
LUYAO REN 3 years ago
parent
commit
515e80d0d6
7 changed files with 238 additions and 200 deletions
  1. +18
    -43
      README.md
  2. +16
    -2
      defaults
  3. +18
    -4
      inputs
  4. +1
    -1
      tasks/deduped_Metrics.wdl
  5. +46
    -0
      tasks/get_variants_in_del.wdl
  6. +1
    -0
      tasks/qualimap.wdl
  7. +138
    -150
      workflow.wdl

+ 18
- 43
README.md View File

@@ -1,23 +1,21 @@
# WGS-germline Small Variants Quality Control Pipeline(Start from FASTQ files)
# Quality control of germline variants calling results using a Chinese Quartet family

> Author: Ren Luyao
>
> E-mail:18110700050@fudan.edu.cn
>
> Git: http://choppy.3steps.cn/renluyao/WGS_germline_datapotal.git
> Git: http://47.103.223.233/renluyao/quartet_dna_quality_control_big_pipeline.git
>
> Last Updates: 2020/11/25
> Last Updates: 2021/7/5

## Installation guide
## Install

```
# Activate the choppy environment
open-choppy-env
# Install the app
choppy install renluyao/WGS_germline_datapotal
choppy install renluyao/quartet_dna_quality_control_big_pipeline
```

## App overview: introduction to the Chinese Quartet No. 1 reference materials
## Introduction of Chinese Quartet DNA reference materials

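Establishing key technologies for biometrology and quality control of high-throughput whole-genome sequencing is essential for ensuring that sequencing data are comparable across platforms and laboratories, that research results are reproducible, and that data can be shared. Establishing national genomic reference materials and benchmark datasets, and advancing genomic biometrology, is a necessary step in translating sequencing technology into clinical applications; internationally, this remains a gap. The National Institute of Metrology of China, together with Fudan University and the Fudan University Taizhou Institute of Health Sciences, jointly developed the human Chinese Quartet No. 1 genomic reference materials (**Quartet, a set of four samples numbered LCL5, LCL6, LCL7 and LCL8, where LCL5 and LCL6 are monozygotic twin daughters, LCL7 is the father and LCL8 is the mother**), together with the corresponding whole-genome sequencing benchmark datasets ("reference values"). These provide a "ruler" for measuring whether sequence detection is accurate and serve as a national benchmark for the reliability of sequencing data. The reference materials derive from a family with monozygotic twins in the Taizhou cohort. Their genetic structure reflects the population structure at the boundary between northern and southern China, and the family design also provides a genetic basis for determining the "reference values".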

@@ -31,13 +29,11 @@ choppy install renluyao/WGS_germline_datapotal

![workflow](./pictures/workflow.png)

![](./pictures/table.png)

### 1. Raw data quality control
### 1. Pre-alignment QC

#### [Fastqc](<https://www.bioinformatics.babraham.ac.uk/projects/fastqc/>) v0.11.5

FastQC is a widely used quality-control tool for raw sequencing data. It mainly comprises 12 modules; for details, see [FastQC module details](<https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/>).
[FastQC](<https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/>) is used to investigate the quality of FASTQ files.

```bash
fastqc -t <threads> -o <output_directory> <fastq_file>
@@ -45,63 +41,48 @@ fastqc -t <threads> -o <output_directory> <fastq_file>

#### [Fastq Screen](<https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/>) 0.12.0

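Fastq Screen checks whether raw sequencing data contain reads from other species, or contamination such as adapters and primers. For example, if the sequenced sample is human, we expect more than 99% of reads to map to the human genome and about 10% of reads to map to the mouse genome, which is highly homologous to the human genome. If too many reads map to E. coli or yeast, consider whether the cell line was contaminated during cell culture or the library was contaminated during library preparation.
Fastq Screen is used to inspect whether the libraries were contaminated. For example, we expect 99% of reads to align to the human genome and about 10% of reads to align to the mouse genome, which is partly homologous to the human genome. If too many reads align to E. coli or yeast, the libraries or cell lines are probably contaminated.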

```bash
fastq_screen --aligner <aligner> --conf <config_file> --top <number_of_reads> --threads <threads> <fastq_file>
```

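`--conf`: the config file lists the paths to the FASTA files of multiple species; you can download FASTA files for other species and add them to the analysis as needed.

`--top`: it is generally unnecessary to search the whole FASTQ file; the first 100000 reads are used.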
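
The interpretation rule described above (high host mapping, little mapping to unrelated species) can be sketched as a small check. This is only an illustration: the function name `screen_flags`, the dictionary layout, the species labels and both thresholds are hypothetical and are not taken from Fastq Screen or from this pipeline.

```python
from typing import Dict, List, Sequence


def screen_flags(
    pct_mapped: Dict[str, float],
    host: str = "Human",
    min_host_pct: float = 99.0,          # hypothetical threshold
    max_contaminant_pct: float = 1.0,    # hypothetical threshold
    contaminants: Sequence[str] = ("Ecoli", "Yeast"),
) -> List[str]:
    """Return human-readable warnings for one sample's screening percentages."""
    flags = []
    host_pct = pct_mapped.get(host, 0.0)
    if host_pct < min_host_pct:
        flags.append(f"low {host} mapping: {host_pct:.1f}%")
    # Mouse is deliberately not in `contaminants`: some mapping is expected
    # because of homology with the human genome.
    for name in contaminants:
        pct = pct_mapped.get(name, 0.0)
        if pct > max_contaminant_pct:
            flags.append(f"possible {name} contamination: {pct:.1f}%")
    return flags


example = {"Human": 99.4, "Mouse": 9.8, "Ecoli": 0.1, "Yeast": 3.2}
flags = screen_flags(example)
print(flags)
```

In practice the thresholds would need to be chosen per project and per library type; the values here are placeholders.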

### 2. Post-alignment data quality control
### 2. Post-alignment QC

#### [Qualimap](<http://qualimap.bioinfo.cipf.es/>) 2.0.0

Qualimap is an alignment quality-control tool. This step also covers the results of Picard MarkDuplicates and the QC results of the Sentieon metrics.
Qualimap is used to check the quality of BAM files.

```bash
qualimap bamqc -bam <bam_file> -outformat PDF:HTML -nt <threads> -outdir <output_directory> --java-mem-size=32G
```

### 3. Variant calling quality control

The variant quality-control workflow is as follows:
### 3. Variant Calling QC

![performance](./pictures/performance.png)

#### 3.1 Quality control based on reference datasets
#### 3.1 Performance assessment based on reference datasets

#### [Hap.py](<https://github.com/Illumina/hap.py>) v0.3.9

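hap.py compares the VCF under test against the benchmark and computes precision and recall. It accounts for the [diversity of variant representations](<https://genome.sph.umich.edu/wiki/Variant_Normalization>) in VCF files and normalizes them before comparison.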

```bash
hap.py <truth_vcf> <query_vcf> -f <bed_file> --threads <threads> -o <output_filename>
```
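
As a reminder of what the reported metrics mean, precision and recall follow from counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below uses the standard definitions only; the function name and the example counts are made up, and hap.py's own counting after variant normalization is more involved than this.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts, not real benchmark results.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```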

#### 3.2 Quality control based on the inheritance patterns of the Quartet family of four
#### 3.2 Performance assessment based on Quartet genetic built-in truth

#### Reproducibility (in-house python script)
#### [Mendelian Concordance Rate](https://github.com/sbg/VBT-TrioAnalysis) (vbt v1.1)

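The reference datasets were built by integrating multiple platforms and methods and filtering out false-positive variants that were not reproducibly detected or did not follow Mendelian inheritance. They can assess the relative quality of data generation and analysis methods, but they have limitations because they exclude many difficult-to-sequence genomic regions. We can assess the whole genome by comparing the concordance of variant calls between the monozygotic twins.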
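
One simple way to express twin concordance is the fraction of shared calls among all calls made in either twin (a Jaccard index). The in-house script is not shown here, so this definition, the function name and the variant key layout are assumptions for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical variant key: (chrom, pos, ref, alt, genotype)
Variant = Tuple[str, int, str, str, str]


def twin_concordance(d5_calls: Iterable[Variant], d6_calls: Iterable[Variant]) -> float:
    """Shared calls divided by all calls seen in either twin."""
    d5, d6 = set(d5_calls), set(d6_calls)
    union = d5 | d6
    if not union:
        return 0.0
    return len(d5 & d6) / len(union)


d5 = [("chr1", 100, "A", "G", "0/1"), ("chr1", 200, "C", "T", "1/1"), ("chr2", 50, "G", "A", "0/1")]
d6 = [("chr1", 100, "A", "G", "0/1"), ("chr1", 200, "C", "T", "1/1"), ("chr3", 75, "T", "C", "0/1")]
rate = twin_concordance(d5, d6)
print(rate)
```

Including the genotype in the key means that a site called heterozygous in one twin and homozygous in the other counts as discordant.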

#### [Mendelian Concordance Ratio](https://github.com/sbg/VBT-TrioAnalysis) (vbt v1.1)

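We first split the family of four into two trios for Mendelian inheritance analysis. A variant is considered to follow the Quartet Mendelian inheritance pattern when it is concordant between the sisters and consistent with Mendelian inheritance from the parents. The Mendelian concordance rate is the proportion of all variants detected in the four reference samples that follow Mendelian inheritance.
We split the Quartet family into two trios (F7, M8, D5 and F7, M8, D6) and then perform the Mendelian analysis. A Quartet Mendelian-concordant variant is one that is identical between the twins (D5 and D6) and follows Mendelian inheritance from the parents (F7 and M8). The Mendelian concordance rate is the number of Mendelian-concordant variants divided by the total number of variants detected in a Quartet family.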

```bash
vbt mendelian -ref <fasta_file> -mother <family_merged_vcf> -father <family_merged_vcf> -child <family_merged_vcf> -pedigree <ped_file> -outDir <output_directory> -out-prefix <output_directory_prefix> --output-violation-regions -thread-count <threads>
```
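
The definition above can be sketched on toy genotypes. This is a simplified illustration, not what vbt does internally: genotypes are assumed to be unphased diploid allele pairs, and the function names and the per-site dictionary layout are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # unphased diploid alleles, e.g. (0, 1)


def trio_is_mendelian(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child can be formed from one paternal and one maternal allele."""
    return any(sorted((a, b)) == sorted(child) for a, b in product(father, mother))


def quartet_concordance_rate(sites: List[Dict[str, Genotype]]) -> float:
    """Concordant sites divided by all sites detected in the family."""
    if not sites:
        return 0.0
    concordant = sum(
        1
        for s in sites
        if sorted(s["D5"]) == sorted(s["D6"])                   # twins agree
        and trio_is_mendelian(s["F7"], s["M8"], s["D5"])        # trio F7, M8, D5
        and trio_is_mendelian(s["F7"], s["M8"], s["D6"])        # trio F7, M8, D6
    )
    return concordant / len(sites)


sites = [
    {"F7": (0, 1), "M8": (0, 0), "D5": (0, 1), "D6": (0, 1)},  # concordant
    {"F7": (0, 0), "M8": (0, 0), "D5": (0, 1), "D6": (0, 1)},  # Mendelian violation
    {"F7": (0, 1), "M8": (0, 1), "D5": (1, 1), "D6": (0, 1)},  # twins differ
    {"F7": (1, 1), "M8": (0, 1), "D5": (1, 1), "D6": (1, 1)},  # concordant
]
mcr = quartet_concordance_rate(sites)
print(mcr)
```

The third site shows why the twin check matters: each trio is Mendelian on its own, but the site is not Quartet-concordant because the twins disagree.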

## App input files
## Input files

```bash
choppy samples WGS_germline_datapotal-latest --output samples
choppy samples renluyao/quartet_dna_quality_control_big_pipeline-latest --output samples
```

#### The samples file includes the following inputs
@@ -188,12 +169,6 @@ quartet_indel_aver-std.txt

quartet_snv_aver-std.txt

#### 3. D5_D6.WDL

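If the user has not supplied a complete family but has data for both D5 and D6, we can compute the concordance of variants detected in the monozygotic twins. This output is not yet integrated into the report.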

${project}.sister.txt

## Results and interpretation

#### 1. Raw data quality control

+ 16
- 2
defaults View File

@@ -1,24 +1,38 @@
{
"benchmarking_dir": "oss://pgx-result/renluyao/manuscript_v3.0/reference_datasets_v202103/",
"vcf_F7": "",
"SENTIEON_INSTALL_DIR": "/opt/sentieon-genomics",
"SENTIEON_LICENSE": "192.168.0.55:8990",
"vcf_M8": "",
"fasta": "GRCh38.d1.vd1.fa",
"BENCHMARKdocker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/rtg-hap:latest",
"vcf_D6": "",
"dbsnp_dir": "oss://pgx-reference-data/GRCh38.d1.vd1/",
"BEDTOOLSdocker": "registry-internal.cn-shanghai.aliyuncs.com/pgx-docker-registry/bedtools:v2.27.1",
"disk_size": "500",
"FASTQCdocker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastqc:0.11.8",
"MULTIQCdocker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/multiqc:v1.8",
"fastq_2_M8": "",
"fastq_1_M8": "",
"SMALLcluster_config": "OnDemand bcs.ps.g.xlarge img-ubuntu-vpc",
"screen_ref_dir": "oss://pgx-reference-data/fastq_screen_reference/",
"fastq_1_D5": "",
"dbmills_dir": "oss://pgx-reference-data/GRCh38.d1.vd1/",
"BIGcluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
"fastq_screen_conf": "oss://pgx-reference-data/fastq_screen_reference/fastq_screen.conf",
"MULTIQCdocker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/multiqc:v1.8",
"fastq_2_D5": "",
"FASTQSCREENdocker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastqscreen:0.12.0",
"fastq_2_F7": "",
"SENTIEON_LICENSE": "192.168.0.55:8990",
"fastq_1_D6": "",
"fastq_1_F7": "",
"SENTIEONdocker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/sentieon-genomics:v2019.11.28",
"QUALIMAPdocker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/qualimap:2.0.0",
"vcf_D5": "",
"benchmark_region": "oss://pgx-result/renluyao/manuscript_v3.0/reference_datasets_v202103/Quartet.high.confidence.region.v202103.bed",
"db_mills": "Mills_and_1000G_gold_standard.indels.hg38.vcf",
"dbsnp": "dbsnp_146.hg38.vcf",
"MENDELIANdocker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/vbt:v1.1",
"fastq_2_D6": "",
"DIYdocker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/high_confidence_call_manuscript:v1.4",
"ref_dir": "oss://pgx-reference-data/GRCh38.d1.vd1/"
}

+ 18
- 4
inputs View File

@@ -1,26 +1,40 @@
{
"{{ project_name }}.benchmarking_dir": "{{ benchmarking_dir }}",
"{{ project_name }}.vcf_F7": "{{ vcf_F7 }}",
"{{ project_name }}.SENTIEON_INSTALL_DIR": "{{ SENTIEON_INSTALL_DIR }}",
"{{ project_name }}.SENTIEON_LICENSE": "{{ SENTIEON_LICENSE }}",
"{{ project_name }}.vcf_M8": "{{ vcf_M8 }}",
"{{ project_name }}.fasta": "{{ fasta }}",
"{{ project_name }}.BENCHMARKdocker": "{{ BENCHMARKdocker }}",
"{{ project_name }}.vcf_D6": "{{ vcf_D6 }}",
"{{ project_name }}.dbsnp_dir": "{{ dbsnp_dir }}",
"{{ project_name }}.BEDTOOLSdocker": "{{ BEDTOOLSdocker }}",
"{{ project_name }}.disk_size": "{{ disk_size }}",
"{{ project_name }}.inputSamplesFile": "{{ inputSamplesFile }}",
"{{ project_name }}.FASTQCdocker": "{{ FASTQCdocker }}",
"{{ project_name }}.MULTIQCdocker": "{{ MULTIQCdocker }}",
"{{ project_name }}.fastq_2_M8": "{{ fastq_2_M8 }}",
"{{ project_name }}.project": "{{ project }}",
"{{ project_name }}.fastq_1_M8": "{{ fastq_1_M8 }}",
"{{ project_name }}.SMALLcluster_config": "{{ SMALLcluster_config }}",
"{{ project_name }}.screen_ref_dir": "{{ screen_ref_dir }}",
"{{ project_name }}.bed": "{{ bed }}",
"{{ project_name }}.fastq_1_D5": "{{ fastq_1_D5 }}",
"{{ project_name }}.dbmills_dir": "{{ dbmills_dir }}",
"{{ project_name }}.BIGcluster_config": "{{ BIGcluster_config }}",
"{{ project_name }}.fastq_screen_conf": "{{ fastq_screen_conf }}",
"{{ project_name }}.fastq_2_D5": "{{ fastq_2_D5 }}",
"{{ project_name }}.FASTQSCREENdocker": "{{ FASTQSCREENdocker }}",
"{{ project_name }}.fastq_2_F7": "{{ fastq_2_F7 }}",
"{{ project_name }}.SENTIEON_LICENSE": "{{ SENTIEON_LICENSE }}",
"{{ project_name }}.fastq_1_D6": "{{ fastq_1_D6 }}",
"{{ project_name }}.fastq_1_F7": "{{ fastq_1_F7 }}",
"{{ project_name }}.SENTIEONdocker": "{{ SENTIEONdocker }}",
"{{ project_name }}.QUALIMAPdocker": "{{ QUALIMAPdocker }}",
"{{ project_name }}.QUALIMAPdocker": "{{ QUALIMAPdocke }}",
"{{ project_name }}.vcf_D5": "{{ vcf_D5 }}",
"{{ project_name }}.benchmark_region": "{{ benchmark_region }}",
"{{ project_name }}.db_mills": "{{ db_mills }}",
"{{ project_name }}.dbsnp": "{{ dbsnp }}",
"{{ project_name }}.MENDELIANdocker": "{{ MENDELIANdocker }}",
"{{ project_name }}.fastq_2_D6": "{{ fastq_2_D6 }}",
"{{ project_name }}.DIYdocker": "{{ DIYdocker }}",
"{{ project_name }}.ref_dir": "{{ ref_dir }}"
}
}

+ 1
- 1
tasks/deduped_Metrics.wdl View File

@@ -1,7 +1,7 @@
task deduped_Metrics {

File ref_dir
File bed
String SENTIEON_INSTALL_DIR
String sample
String fasta

+ 46
- 0
tasks/get_variants_in_del.wdl View File

@@ -0,0 +1,46 @@
task adjust_mendelian_status_del {
File D5_trio_vcf
File D6_trio_vcf
File del_bed
String family_name = basename(D5_trio_vcf,".D5.vcf")
String docker
String cluster_config
String disk_size
command <<<
export LD_LIBRARY_PATH=/opt/htslib-1.9
nt=$(nproc)

echo -e "${family_name}\tM8\t0\t0\t2\t-9\n${family_name}\tF7\t0\t0\t1\t-9\n${family_name}\tD5\tF7\tM8\t2\t-9" > ${family_name}.D5.ped

mkdir VBT_D5
/opt/VBT-TrioAnalysis/vbt mendelian -ref ${ref_dir}/${fasta} -mother ${family_vcf} -father ${family_vcf} -child ${family_vcf} -pedigree ${family_name}.D5.ped -outDir VBT_D5 -out-prefix ${family_name}.D5 --output-violation-regions -thread-count $nt

cat VBT_D5/${family_name}.D5_trio.vcf > ${family_name}.D5.vcf

echo -e "${family_name}\tM8\t0\t0\t2\t-9\n${family_name}\tF7\t0\t0\t1\t-9\n${family_name}\tD6\tF7\tM8\t2\t-9" > ${family_name}.D6.ped

mkdir VBT_D6
/opt/VBT-TrioAnalysis/vbt mendelian -ref ${ref_dir}/${fasta} -mother ${family_vcf} -father ${family_vcf} -child ${family_vcf} -pedigree ${family_name}.D6.ped -outDir VBT_D6 -out-prefix ${family_name}.D6 --output-violation-regions -thread-count $nt

cat VBT_D6/${family_name}.D6_trio.vcf > ${family_name}.D6.vcf
>>>

runtime {
docker:docker
cluster: cluster_config
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd " + disk_size + " /cromwell_root/"
}
output {
File D5_ped = "${family_name}.D5.ped"
File D6_ped = "${family_name}.D6.ped"
Array[File] D5_mendelian = glob("VBT_D5/*")
Array[File] D6_mendelian = glob("VBT_D6/*")
File D5_trio_vcf = "${family_name}.D5.vcf"
File D6_trio_vcf = "${family_name}.D6.vcf"
}
}




+ 1
- 0
tasks/qualimap.wdl View File

@@ -1,6 +1,7 @@
task qualimap {
File bam
File bai
File bed
String bamname = basename(bam,".bam")
String docker
String cluster_config

+ 138
- 150
workflow.wdl View File

@@ -15,11 +15,11 @@ import "./tasks/merge_mendelian.wdl" as merge_mendelian
import "./tasks/quartet_mendelian.wdl" as quartet_mendelian
import "./tasks/fastqc.wdl" as fastqc
import "./tasks/fastqscreen.wdl" as fastqscreen
import "./tasks/D5_D6.wdl" as D5_D6
import "./tasks/merge_family.wdl" as merge_family
import "./tasks/filter_vcf_bed.wdl" as filter_vcf_bed


workflow {{ project_name }} {
workflow project_name {

File? fastq_1_D5
File? fastq_1_D6
@@ -48,6 +48,7 @@ workflow {{ project_name }} {
String MENDELIANdocker
String DIYdocker
String MULTIQCdocker
String BEDTOOLSdocker

String fasta
File ref_dir
@@ -59,6 +60,7 @@ workflow {{ project_name }} {
File screen_ref_dir
File fastq_screen_conf
File benchmarking_dir
File benchmark_region

String project

@@ -68,11 +70,6 @@ workflow {{ project_name }} {

if (fastq_1_D5!= "") {

#######################
### D5 fastq to vcf ###
#######################

call mapping.mapping as mapping_D5 {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
@@ -112,8 +109,8 @@ workflow {{ project_name }} {
call Dedup.Dedup as Dedup_D5 {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
sorted_bam=mapping.sorted_bam,
sorted_bam_index=mapping.sorted_bam_index,
sorted_bam=mapping_D5.sorted_bam,
sorted_bam_index=mapping_D5.sorted_bam_index,
sample="D5",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -122,8 +119,9 @@ workflow {{ project_name }} {

call qualimap.qualimap as qualimap_D5 {
input:
bam=Dedup.Dedup_bam,
bai=Dedup.Dedup_bam_index,
bed=bed,
bam=Dedup_D5.Dedup_bam,
bai=Dedup_D5.Dedup_bam_index,
docker=QUALIMAPdocker,
disk_size=disk_size,
cluster_config=BIGcluster_config
@@ -133,9 +131,10 @@ workflow {{ project_name }} {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
bed=bed,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
Dedup_bam=Dedup_D5.Dedup_bam,
Dedup_bam_index=Dedup_D5.Dedup_bam_index,
sample="D5",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -144,10 +143,10 @@ workflow {{ project_name }} {

call sentieon.sentieon as sentieon_D5 {
input:
quality_yield=deduped_Metrics.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics.deduped_wgsmetrics,
aln_metrics=deduped_Metrics.dedeuped_aln_metrics,
is_metrics=deduped_Metrics.deduped_is_metrics,
quality_yield=deduped_Metrics_D5.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics_D5.deduped_wgsmetrics,
aln_metrics=deduped_Metrics_D5.dedeuped_aln_metrics,
is_metrics=deduped_Metrics_D5.deduped_is_metrics,
sample="D5",
docker=SENTIEONdocker,
cluster_config=SMALLcluster_config,
@@ -159,8 +158,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
Dedup_bam=Dedup_D5.Dedup_bam,
Dedup_bam_index=Dedup_D5.Dedup_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
sample="D5",
@@ -174,8 +173,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
realigned_bam=Realigner.realigner_bam,
realigned_bam_index=Realigner.realigner_bam_index,
realigned_bam=Realigner_D5.realigner_bam,
realigned_bam_index=Realigner_D5.realigner_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
dbsnp=dbsnp,
@@ -191,8 +190,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
recaled_bam=BQSR.recaled_bam,
recaled_bam_index=BQSR.recaled_bam_index,
recaled_bam=BQSR_D5.recaled_bam,
recaled_bam_index=BQSR_D5.recaled_bam_index,
sample="D5",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -205,14 +204,14 @@ workflow {{ project_name }} {
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_D5 {
input:
vcf=filter_vcf_bed_D5.filterd_vcf,
vcf=filter_vcf_bed_D5.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_D5.filtered_bed,
@@ -222,10 +221,6 @@ workflow {{ project_name }} {
disk_size=disk_size
}

#######################
### D6 fastq to vcf ###
#######################
call mapping.mapping as mapping_D6 {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
@@ -265,8 +260,8 @@ workflow {{ project_name }} {
call Dedup.Dedup as Dedup_D6 {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
sorted_bam=mapping.sorted_bam,
sorted_bam_index=mapping.sorted_bam_index,
sorted_bam=mapping_D6.sorted_bam,
sorted_bam_index=mapping_D6.sorted_bam_index,
sample="D6",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -275,8 +270,9 @@ workflow {{ project_name }} {

call qualimap.qualimap as qualimap_D6 {
input:
bam=Dedup.Dedup_bam,
bai=Dedup.Dedup_bam_index,
bed=bed,
bam=Dedup_D6.Dedup_bam,
bai=Dedup_D6.Dedup_bam_index,
docker=QUALIMAPdocker,
disk_size=disk_size,
cluster_config=BIGcluster_config
@@ -286,9 +282,10 @@ workflow {{ project_name }} {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
bed=bed,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
Dedup_bam=Dedup_D6.Dedup_bam,
Dedup_bam_index=Dedup_D6.Dedup_bam_index,
sample="D6",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -297,10 +294,10 @@ workflow {{ project_name }} {

call sentieon.sentieon as sentieon_D6 {
input:
quality_yield=deduped_Metrics.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics.deduped_wgsmetrics,
aln_metrics=deduped_Metrics.dedeuped_aln_metrics,
is_metrics=deduped_Metrics.deduped_is_metrics,
quality_yield=deduped_Metrics_D6.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics_D6.deduped_wgsmetrics,
aln_metrics=deduped_Metrics_D6.dedeuped_aln_metrics,
is_metrics=deduped_Metrics_D6.deduped_is_metrics,
sample="D6",
docker=SENTIEONdocker,
cluster_config=SMALLcluster_config,
@@ -312,8 +309,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
Dedup_bam=Dedup_D6.Dedup_bam,
Dedup_bam_index=Dedup_D6.Dedup_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
sample="D6",
@@ -327,8 +324,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
realigned_bam=Realigner.realigner_bam,
realigned_bam_index=Realigner.realigner_bam_index,
realigned_bam=Realigner_D6.realigner_bam,
realigned_bam_index=Realigner_D6.realigner_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
dbsnp=dbsnp,
@@ -344,8 +341,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
recaled_bam=BQSR.recaled_bam,
recaled_bam_index=BQSR.recaled_bam_index,
recaled_bam=BQSR_D6.recaled_bam,
recaled_bam_index=BQSR_D6.recaled_bam_index,
sample="D6",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -358,14 +355,14 @@ workflow {{ project_name }} {
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_D6 {
input:
vcf=filter_vcf_bed_D6.filterd_vcf,
vcf=filter_vcf_bed_D6.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_D6.filtered_bed,
@@ -374,11 +371,6 @@ workflow {{ project_name }} {
cluster_config=BIGcluster_config,
disk_size=disk_size
}


#######################
### F7 fastq to vcf ###
#######################
call mapping.mapping as mapping_F7 {
input:
@@ -419,8 +411,8 @@ workflow {{ project_name }} {
call Dedup.Dedup as Dedup_F7 {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
sorted_bam=mapping.sorted_bam,
sorted_bam_index=mapping.sorted_bam_index,
sorted_bam=mapping_F7.sorted_bam,
sorted_bam_index=mapping_F7.sorted_bam_index,
sample="F7",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -429,8 +421,9 @@ workflow {{ project_name }} {

call qualimap.qualimap as qualimap_F7 {
input:
bam=Dedup.Dedup_bam,
bai=Dedup.Dedup_bam_index,
bed=bed,
bam=Dedup_F7.Dedup_bam,
bai=Dedup_F7.Dedup_bam_index,
docker=QUALIMAPdocker,
disk_size=disk_size,
cluster_config=BIGcluster_config
@@ -441,8 +434,9 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
bed=bed,
Dedup_bam=Dedup_F7.Dedup_bam,
Dedup_bam_index=Dedup_F7.Dedup_bam_index,
sample="F7",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -451,10 +445,10 @@ workflow {{ project_name }} {

call sentieon.sentieon as sentieon_F7 {
input:
quality_yield=deduped_Metrics.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics.deduped_wgsmetrics,
aln_metrics=deduped_Metrics.dedeuped_aln_metrics,
is_metrics=deduped_Metrics.deduped_is_metrics,
quality_yield=deduped_Metrics_F7.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics_F7.deduped_wgsmetrics,
aln_metrics=deduped_Metrics_F7.dedeuped_aln_metrics,
is_metrics=deduped_Metrics_F7.deduped_is_metrics,
sample="F7",
docker=SENTIEONdocker,
cluster_config=SMALLcluster_config,
@@ -466,8 +460,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
Dedup_bam=Dedup_F7.Dedup_bam,
Dedup_bam_index=Dedup_F7.Dedup_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
sample="F7",
@@ -481,8 +475,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
realigned_bam=Realigner.realigner_bam,
realigned_bam_index=Realigner.realigner_bam_index,
realigned_bam=Realigner_F7.realigner_bam,
realigned_bam_index=Realigner_F7.realigner_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
dbsnp=dbsnp,
@@ -498,8 +492,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
recaled_bam=BQSR.recaled_bam,
recaled_bam_index=BQSR.recaled_bam_index,
recaled_bam=BQSR_F7.recaled_bam,
recaled_bam_index=BQSR_F7.recaled_bam_index,
sample="F7",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -512,14 +506,14 @@ workflow {{ project_name }} {
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_F7 {
input:
vcf=filter_vcf_bed_F7.filterd_vcf,
vcf=filter_vcf_bed_F7.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_F7.filtered_bed,
@@ -528,10 +522,6 @@ workflow {{ project_name }} {
cluster_config=BIGcluster_config,
disk_size=disk_size
}

#######################
### M8 fastq to vcf ###
#######################
call mapping.mapping as mapping_M8 {
input:
@@ -572,8 +562,8 @@ workflow {{ project_name }} {
call Dedup.Dedup as Dedup_M8 {
input:
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
sorted_bam=mapping.sorted_bam,
sorted_bam_index=mapping.sorted_bam_index,
sorted_bam=mapping_M8.sorted_bam,
sorted_bam_index=mapping_M8.sorted_bam_index,
sample="M8",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -582,8 +572,9 @@ workflow {{ project_name }} {

call qualimap.qualimap as qualimap_M8 {
input:
bam=Dedup.Dedup_bam,
bai=Dedup.Dedup_bam_index,
bed=bed,
bam=Dedup_M8.Dedup_bam,
bai=Dedup_M8.Dedup_bam_index,
docker=QUALIMAPdocker,
disk_size=disk_size,
cluster_config=BIGcluster_config
@@ -594,8 +585,9 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
bed=bed,
Dedup_bam=Dedup_M8.Dedup_bam,
Dedup_bam_index=Dedup_M8.Dedup_bam_index,
sample="M8",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -604,10 +596,10 @@ workflow {{ project_name }} {

call sentieon.sentieon as sentieon_M8 {
input:
quality_yield=deduped_Metrics.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics.deduped_wgsmetrics,
aln_metrics=deduped_Metrics.dedeuped_aln_metrics,
is_metrics=deduped_Metrics.deduped_is_metrics,
quality_yield=deduped_Metrics_M8.deduped_QualityYield,
wgs_metrics_algo=deduped_Metrics_M8.deduped_wgsmetrics,
aln_metrics=deduped_Metrics_M8.dedeuped_aln_metrics,
is_metrics=deduped_Metrics_M8.deduped_is_metrics,
sample="M8",
docker=SENTIEONdocker,
cluster_config=SMALLcluster_config,
@@ -619,8 +611,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
Dedup_bam=Dedup.Dedup_bam,
Dedup_bam_index=Dedup.Dedup_bam_index,
Dedup_bam=Dedup_M8.Dedup_bam,
Dedup_bam_index=Dedup_M8.Dedup_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
sample="M8",
@@ -634,8 +626,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
realigned_bam=Realigner.realigner_bam,
realigned_bam_index=Realigner.realigner_bam_index,
realigned_bam=Realigner_M8.realigner_bam,
realigned_bam_index=Realigner_M8.realigner_bam_index,
db_mills=db_mills,
dbmills_dir=dbmills_dir,
dbsnp=dbsnp,
@@ -651,8 +643,8 @@ workflow {{ project_name }} {
SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
fasta=fasta,
ref_dir=ref_dir,
recaled_bam=BQSR.recaled_bam,
recaled_bam_index=BQSR.recaled_bam_index,
recaled_bam=BQSR_M8.recaled_bam,
recaled_bam_index=BQSR_M8.recaled_bam_index,
sample="M8",
docker=SENTIEONdocker,
disk_size=disk_size,
@@ -665,14 +657,14 @@ workflow {{ project_name }} {
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_M8 {
input:
vcf=filter_vcf_bed_M8.filterd_vcf,
vcf=filter_vcf_bed_M8.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_M8.filtered_bed,
@@ -681,11 +673,7 @@ workflow {{ project_name }} {
cluster_config=BIGcluster_config,
disk_size=disk_size
}
}

#######################
### merge qc ###
#######################

Array[File] fastqc_read1_zip = [fastqc_D5.read1_zip, fastqc_D6.read1_zip, fastqc_F7.read1_zip, fastqc_M8.read1_zip]

@@ -700,7 +688,7 @@ workflow {{ project_name }} {
Array[File] qualimap_zip = [qualimap_D5.zip, qualimap_D6.zip, qualimap_F7.zip, qualimap_M8.zip]


call multiqc.multiqc as multiqc {
call multiqc.multiqc as multiqc_big {
input:
read1_zip=fastqc_read1_zip,
read2_zip=fastqc_read2_zip,
@@ -746,15 +734,15 @@ workflow {{ project_name }} {
disk_size=disk_size
}

call extract_tables.extract_tables as extract_tables {
call extract_tables.extract_tables as extract_tables_big {
input:
quality_yield_summary=merge_sentieon_metrics.quality_yield_summary,
wgs_metrics_summary=merge_sentieon_metrics.wgs_metrics_summary,
aln_metrics_summary=merge_sentieon_metrics.aln_metrics_summary,
is_metrics_summary=merge_sentieon_metrics.is_metrics_summary,
fastqc=multiqc.fastqc,
fastqscreen=multiqc.fastqscreen,
hap=multiqc.hap,
fastqc=multiqc_big.fastqc,
fastqscreen=multiqc_big.fastqscreen,
hap=multiqc_big.hap,
project=project,
docker=DIYdocker,
cluster_config=SMALLcluster_config,
@@ -762,24 +750,21 @@ workflow {{ project_name }} {
}
}

############################
## vcf input preprocess ##
############################
if (vcf_D5!= "") {
call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_D5 {
call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_D5_vcf {
input:
vcf=Haplotyper_D5.vcf,
vcf=vcf_D5,
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_D5 {
call benchmark.benchmark as benchmark_D5_vcf {
input:
vcf=filter_vcf_bed_D5.filterd_vcf,
vcf=filter_vcf_bed_D5_vcf.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_D5.filtered_bed,
@@ -787,22 +772,22 @@ workflow {{ project_name }} {
docker=BENCHMARKdocker,
cluster_config=BIGcluster_config,
disk_size=disk_size
}
}

call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_D6 {
call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_D6_vcf {
input:
vcf=Haplotyper_D6.vcf,
vcf=vcf_D6,
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_D6 {
call benchmark.benchmark as benchmark_D6_vcf {
input:
vcf=filter_vcf_bed_D6.filterd_vcf,
vcf=filter_vcf_bed_D6.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_D6.filtered_bed,
@@ -812,20 +797,20 @@ workflow {{ project_name }} {
disk_size=disk_size
}

call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_F7 {
call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_F7_vcf {
input:
vcf=Haplotyper_F7.vcf,
vcf=vcf_F7,
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_F7 {
call benchmark.benchmark as benchmark_F7_vcf {
input:
vcf=filter_vcf_bed_F7.filterd_vcf,
vcf=filter_vcf_bed_F7_vcf.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_F7.filtered_bed,
@@ -835,20 +820,20 @@ workflow {{ project_name }} {
disk_size=disk_size
}

call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_M8 {
call filter_vcf_bed.filter_vcf_bed as filter_vcf_bed_M8_vcf {
input:
vcf=Haplotyper_M8.vcf,
vcf=vcf_M8,
bed=bed,
benchmark_region=benchmark_region,
project=project,
docker=docker,
cluster_config=cluster_config,
docker=BEDTOOLSdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

call benchmark.benchmark as benchmark_M8 {
call benchmark.benchmark as benchmark_M8_vcf {
input:
vcf=filter_vcf_bed_M8.filterd_vcf,
vcf=filter_vcf_bed_M8_vcf.filtered_vcf,
benchmarking_dir=benchmarking_dir,
ref_dir=ref_dir,
qc_bed=filter_vcf_bed_M8.filtered_bed,
@@ -858,23 +843,30 @@ workflow {{ project_name }} {
disk_size=disk_size
}

Array[File] benchmark_summary = [benchmark_D5.summary, benchmark_D6.summary, benchmark_F7.summary, benchmark_M8.summary]
Array[File] benchmark_summary_hap = [benchmark_D5_vcf.summary, benchmark_D6_vcf.summary, benchmark_F7_vcf.summary, benchmark_M8_vcf.summary]

#### multiqc

call multiqc.multiqc as multiqc {
call multiqc.multiqc as multiqc_small {
input:
summary=benchmark_summary,
read1_zip="",
read2_zip="",
txt1="",
txt2="",
zip="",
summary=benchmark_summary_hap,
docker=MULTIQCdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
}

#### extract table
}

call extract_tables.extract_tables as extract_tables {
call extract_tables.extract_tables as extract_tables_small {
input:
hap=multiqc.hap,
quality_yield_summary="",
wgs_metrics_summary="",
aln_metrics_summary="",
is_metrics_summary="",
fastqc="",
fastqscreen="",
hap=multiqc_small.hap,
project=project,
docker=DIYdocker,
cluster_config=SMALLcluster_config,
@@ -882,10 +874,6 @@ workflow {{ project_name }} {
}
}

########################
### mendelian ###
########################

call merge_family.merge_family as merge_family {
input:
D5_vcf=benchmark_D5.rtg_vcf,
@@ -904,7 +892,7 @@ workflow {{ project_name }} {

call mendelian.mendelian as mendelian {
input:
family_vcf=family_vcfs[idx],
family_vcf=merge_family.family_vcf,
ref_dir=ref_dir,
fasta=fasta,
docker=MENDELIANdocker,
@@ -916,7 +904,7 @@ workflow {{ project_name }} {
input:
D5_trio_vcf=mendelian.D5_trio_vcf,
D6_trio_vcf=mendelian.D6_trio_vcf,
family_vcf=family_vcfs[idx],
family_vcf=merge_family.family_vcf,
docker=DIYdocker,
cluster_config=SMALLcluster_config,
disk_size=disk_size
