
first commit

master
Zhihui 4 years ago
commit
ed9701e342
15 changed files with 753 additions and 0 deletions
  1. .DS_Store (BIN)
  2. README.md (+120 −0)
  3. defaults (+60 −0)
  4. inputs (+61 −0)
  5. tasks/.DS_Store (BIN)
  6. tasks/fastp.wdl (+67 −0)
  7. tasks/fastqc.wdl (+28 −0)
  8. tasks/fastqscreen.wdl (+36 −0)
  9. tasks/hisat2.wdl (+34 −0)
  10. tasks/multiqc.wdl (+47 −0)
  11. tasks/qualimapBAMqc.wdl (+26 −0)
  12. tasks/qualimapRNAseq.wdl (+27 −0)
  13. tasks/samtools.wdl (+38 −0)
  14. tasks/stringtie.wdl (+31 −0)
  15. workflow.wdl (+178 −0)

BIN
.DS_Store


+ 120
- 0
README.md

@@ -0,0 +1,120 @@
# RNA Sequencing Quality Control Pipeline

> Author: Li Zhihui
>
> E-mail: 18210700119@fudan.edu.cn
>
> Git:
>
> Last Updates: 2020/07/13

## Installation

```
# Activate the choppy environment
source activate choppy
# Install the app
choppy install lizhihui/test_dataportol1
```

## App Overview: The Quartet (Chinese Family No. 1) Reference Materials

Establishing a key technical system for biological metrology and quality control of high-throughput whole-genome sequencing is essential to make sequencing data comparable across platforms and laboratories, to make research results reproducible, and to enable data sharing. Building national genomic reference materials and benchmark datasets, and mastering the biological metrology of genomics, is a necessary step in translating sequencing technology into clinical applications; internationally, this remains a gap. The National Institute of Metrology of China, together with Fudan University and the Fudan University Taizhou Institute of Health Sciences, developed the Quartet human genomic reference materials (**a set of four samples, numbered LCL5, LCL6, LCL7, and LCL8; LCL5 and LCL6 are monozygotic twin daughters, LCL7 is the father, and LCL8 is the mother**) and the corresponding whole-genome sequencing benchmark datasets ("reference values"). These provide a "ruler" for judging whether genomic sequence measurements are accurate, and serve as a national benchmark for the reliability of sequencing data. The Quartet reference materials come from a monozygotic-twin family in the Taizhou cohort; genetically, they reflect the population structure at the junction of northern and southern China, and the family design also provides a genetic basis for establishing the reference values.

This Quality_control app assesses the quality of RNA sequencing (RNA-Seq) data, covering raw-data QC, alignment QC, and gene-expression QC.

## Pipeline and Parameters

![image-20200713083634120](https://tva1.sinaimg.cn/large/007S8ZIlgy1ggp1q2qstej31330u0wol.jpg)

## App Input Files
inputSamplesFile

```
#read1 #read2 #sample_id #adapter_sequence #adapter_sequence_r2
```

read1 is the Alibaba Cloud OSS path of the fastq read1 file.

read2 is the Alibaba Cloud OSS path of the fastq read2 file.

sample_id is the name given to the sample.

adapter_sequence is the adapter to trim from the R1 end.

adapter_sequence_r2 is the adapter to trim from the R2 end.

All uploaded files should follow a consistent naming convention.
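For illustration, one data line of an inputSamplesFile might look like the following (tab-separated, one line per sample; the oss:// bucket path and sample name are hypothetical examples, while the two adapter sequences are the defaults shipped with this app):

```
oss://my-bucket/LCL5_R1.fastq.gz	oss://my-bucket/LCL5_R2.fastq.gz	LCL5_1	AGATCGGAAGAGCACACGTCTGAACTCCAGTCA	AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
```

The workflow reads this file with `read_tsv` and scatters over its rows, so the column order must match the header shown above.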

## App Output Files
1. Upstream QC metrics

| Column | Description | Range |
| -------------------------- | ----------- | ----- |
| SampleID | | |
| #Date | | |
| #LibraryPrep | | |
| Replicate | | |
| Sample | | |
| #SequenceMachine | | |
| #SequenceSite | | |
| #SequenceTech | | |
| raw reads | | |
| Total_Reads_After_Trimming | | |
| GC_content | * | |
| Human.percentage | | |
| #ERCC.percentage | | |
| EColi.percentage | | |
| Adapter.percentage | | |
| #Vector.percentage | | |
| rRNA.percentage | | |
| Virus.percentage | | |
| Yeast.percentage | | |
| Mitoch.percentage | | |
| Phix.percentage | | |
| No.hits.percentage | | |
| GC_content_bamqc | | |
| Mapping_Ratio | * | |
| Insert_size_median | * | |
| Insert_size_peak | * | |
| error rate | | |
| average length | | |
| 3'5' gene cover | | |
| duplication | | |
| strand bias | | |


2. Downstream QC metrics

| Quality metrics | Category | Description | Reference value |
| ----------------------------------------- | ----------- | ------------------------------------------------------------ | --------------- |
| Number of detected genes | One group | This metric estimates the detection abundance of one sample. | (**, 58,395] |
| Detection Jaccard index (JI) | One group | Detection JI is the ratio of the number of genes detected in both replicates to the number of genes detected in either replicate. This metric estimates the repeatability of gene detection across replicates of one sample. | [0.8, 1] |
| Coefficient of variation (CV) | One group | CV is calculated for each gene from the normalized expression levels in all 3 replicates of one sample. This metric estimates the repeatability of expression levels across replicates of one sample. | [0, 0.2] |
| Correlation of technical replicates (CTR) | One group | CTR is calculated as the correlation of one sample's expression levels between replicates. | [0.95, 1] |
| Signal-to-noise Ratio (SNR) | More groups | Signal is defined as the average distance on PCA plots between libraries from different samples, and noise as the average distance between libraries from the same sample. SNR assesses the ability to distinguish technical replicates of different biological samples. | [5, inf) |
| Sensitivity of detection | One group / Reference dependent | Sensitivity is the proportion of "true" detected genes in the reference dataset that are correctly detected by the test set. | [0.96, 1] |
| Specificity of detection | One group / Reference dependent | Specificity is the proportion of "true" non-detected genes in the reference dataset that are correctly not detected by the test set. | [0.94, 1] |
| Consistency ratio of relative expression | Two groups / Reference dependent | Proportion of genes whose relative ratio (log2FC) falls into the reference range (mean ± 2-fold SD). | [0.82, 1] |
| Correlation of relative log2FC | Two groups / Reference dependent | Pearson correlation between the mean reference relative ratio and that of the test site. | [0.96, 1] |
| Sensitivity of DEGs | Two groups / Reference dependent | Sensitivity is the proportion of "true" DEGs in the reference dataset that are correctly identified as DEGs by the test set. | [0.80, 1] |
| Specificity of DEGs | Two groups / Reference dependent | Specificity is the proportion of "true" non-DEGs in the reference dataset that are correctly identified as non-DEGs by the test set. | [0.95, 1] |
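As an illustration of how two of the one-group metrics above can be computed, here is a minimal sketch in Python with NumPy. It is not part of this app; the function names and the detection threshold of 0 are assumptions for the example.

```python
import numpy as np

def detection_jaccard(rep_a, rep_b, threshold=0.0):
    # A gene counts as "detected" when its expression exceeds the threshold.
    # JI = |detected in both replicates| / |detected in either replicate|.
    a = np.asarray(rep_a) > threshold
    b = np.asarray(rep_b) > threshold
    return (a & b).sum() / (a | b).sum()

def coefficient_of_variation(expr):
    # expr: genes x replicates matrix of normalized expression levels.
    # Per-gene CV = sample SD / mean across the replicates.
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)
    return np.where(mean > 0, sd / mean, np.nan)
```

In practice these would be applied per sample, comparing each pair of replicates for JI and using all three replicates for CV, then checked against the reference ranges in the table.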





## Results and Interpretation






+ 60
- 0
defaults

@@ -0,0 +1,60 @@
{
"fastp_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastp:0.19.6",
"fastp_cluster": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"trim_front1": "0",
"trim_tail1": "0",
"max_len1": "0",
"trim_front2": "0",
"trim_tail2": "0",
"max_len2": "0",
"adapter_sequence": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
"adapter_sequence_r2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
"disable_adapter_trimming": "0",
"length_required": "50",
"length_required1": "20",
"UMI": "0",
"umi_len": "0",
"umi_loc": "umi_loc",
"qualified_quality_phred": "20",
"disable_quality_filtering": "1",
"hisat2_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/hisat2:v2.1.0-2",
"hisat2_cluster": "OnDemand bcs.a2.3xlarge img-ubuntu-vpc",
"idx_prefix": "genome_snp_tran",
"idx": "oss://pgx-reference-data/reference/hisat2/grch38_snp_tran/",
"fasta": "GRCh38.d1.vd1.fa",
"pen_cansplice":"0",
"pen_noncansplice":"3",
"pen_intronlen":"G,-8,1",
"min_intronlen":"30",
"max_intronlen":"500000",
"maxins":"500",
"minins":"0",
"samtools_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/samtools:v1.3.1",
"samtools_cluster": "OnDemand bcs.a2.large img-ubuntu-vpc",
"insert_size":"8000",
"gtf": "oss://pgx-reference-data/reference/annotation/Homo_sapiens.GRCh38.93.gtf",
"stringtie_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/stringtie:v1.3.4",
"stringtie_cluster": "OnDemand bcs.a2.large img-ubuntu-vpc",
"minimum_length_allowed_for_the_predicted_transcripts":"200",
"minimum_isoform_abundance":"0.01",
"Junctions_no_spliced_reads":"10",
"maximum_fraction_of_muliplelocationmapped_reads":"0.95",
"fastqc_cluster_config": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"fastqc_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastqc:v0.11.5",
"fastqc_disk_size": "150",
"qualimapBAMqc_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/qualimap:2.0.0",
"qualimapBAMqc_cluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
"qualimapBAMqc_disk_size": "500",
"qualimapRNAseqqc_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/qualimap:2.0.0",
"qualimapRNAseqqc_cluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
"qualimapRNAseqqc_disk_size": "500",
"fastqscreen_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastqscreen:0.12.0",
"fastqscreen_cluster_config": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"screen_ref_dir": "oss://pgx-reference-data/fastq_screen_reference/",
"fastq_screen_conf": "oss://pgx-reference-data/fastq_screen_reference/fastq_screen.conf",
"fastqscreen_disk_size": "100",
"ref_dir": "oss://chinese-quartet/quartet-storage-data/reference_data/",
"multiqc_cluster_config": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"multiqc_docker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/multiqc:v1.8",
"multiqc_disk_size": "100"
}

+ 61
- 0
inputs

@@ -0,0 +1,61 @@
{
"{{ project_name }}.inputSamplesFile": "{{ inputSamplesFile }}",
"{{ project_name }}.fastp_docker": "{{ fastp_docker }}",
"{{ project_name }}.fastp_cluster": "{{ fastp_cluster }}",
"{{ project_name }}.trim_front1": "{{ trim_front1 }}",
"{{ project_name }}.trim_tail1": "{{ trim_tail1 }}",
"{{ project_name }}.max_len1": "{{ max_len1 }}",
"{{ project_name }}.trim_front2": "{{ trim_front2 }}",
"{{ project_name }}.trim_tail2": "{{ trim_tail2 }}",
"{{ project_name }}.max_len2": "{{ max_len2 }}",
"{{ project_name }}.adapter_sequence": "{{ adapter_sequence }}",
"{{ project_name }}.adapter_sequence_r2": "{{ adapter_sequence_r2 }}",
"{{ project_name }}.disable_adapter_trimming": "{{ disable_adapter_trimming }}",
"{{ project_name }}.length_required": "{{ length_required }}",
"{{ project_name }}.UMI": "{{ UMI }}",
"{{ project_name }}.umi_loc": "{{ umi_loc }}",
"{{ project_name }}.umi_len": "{{ umi_len }}",
"{{ project_name }}.length_required1": "{{ length_required1 }}",
"{{ project_name }}.qualified_quality_phred": "{{ qualified_quality_phred }}",
"{{ project_name }}.disable_quality_filtering": "{{ disable_quality_filtering }}",
"{{ project_name }}.hisat2_docker": "{{ hisat2_docker }}",
"{{ project_name }}.hisat2_cluster": "{{ hisat2_cluster }}",
"{{ project_name }}.idx_prefix": "{{ idx_prefix }}",
"{{ project_name }}.idx": "{{ idx }}",
"{{ project_name }}.fasta": "{{ fasta }}",
"{{ project_name }}.pen_cansplice": "{{ pen_cansplice }}",
"{{ project_name }}.pen_noncansplice": "{{ pen_noncansplice }}",
"{{ project_name }}.pen_intronlen": "{{ pen_intronlen }}",
"{{ project_name }}.min_intronlen": "{{ min_intronlen }}",
"{{ project_name }}.max_intronlen": "{{ max_intronlen }}",
"{{ project_name }}.maxins": "{{ maxins }}",
"{{ project_name }}.minins": "{{ minins }}",
"{{ project_name }}.samtools_docker": "{{ samtools_docker }}",
"{{ project_name }}.samtools_cluster": "{{ samtools_cluster }}",
"{{ project_name }}.insert_size": "{{ insert_size }}",
"{{ project_name }}.gtf": "{{ gtf }}",
"{{ project_name }}.stringtie_docker": "{{ stringtie_docker }}",
"{{ project_name }}.stringtie_cluster": "{{ stringtie_cluster }}",
"{{ project_name }}.minimum_length_allowed_for_the_predicted_transcripts": "{{ minimum_length_allowed_for_the_predicted_transcripts }}",
"{{ project_name }}.minimum_isoform_abundance": "{{ minimum_isoform_abundance }}",
"{{ project_name }}.Junctions_no_spliced_reads": "{{ Junctions_no_spliced_reads }}",
"{{ project_name }}.maximum_fraction_of_muliplelocationmapped_reads": "{{ maximum_fraction_of_muliplelocationmapped_reads }}",
"{{ project_name }}.fastqc_cluster_config": "{{ fastqc_cluster_config }}",
"{{ project_name }}.fastqc_docker": "{{ fastqc_docker }}",
"{{ project_name }}.fastqc_disk_size": "{{ fastqc_disk_size }}",
"{{ project_name }}.qualimapBAMqc_docker": "{{ qualimapBAMqc_docker }}",
"{{ project_name }}.qualimapBAMqc_cluster_config": "{{ qualimapBAMqc_cluster_config }}",
"{{ project_name }}.qualimapBAMqc_disk_size": "{{ qualimapBAMqc_disk_size }}",
"{{ project_name }}.qualimapRNAseqqc_docker": "{{ qualimapRNAseqqc_docker }}",
"{{ project_name }}.qualimapRNAseqqc_cluster_config": "{{ qualimapRNAseqqc_cluster_config }}",
"{{ project_name }}.qualimapRNAseqqc_disk_size": "{{ qualimapRNAseqqc_disk_size }}",
"{{ project_name }}.fastqscreen_docker": "{{ fastqscreen_docker }}",
"{{ project_name }}.fastqscreen_cluster_config": "{{ fastqscreen_cluster_config }}",
"{{ project_name }}.screen_ref_dir": "{{ screen_ref_dir }}",
"{{ project_name }}.fastq_screen_conf": "{{ fastq_screen_conf }}",
"{{ project_name }}.fastqscreen_disk_size": "{{ fastqscreen_disk_size }}",
"{{ project_name }}.ref_dir": "{{ ref_dir }}",
"{{ project_name }}.multiqc_cluster_config": "{{ multiqc_cluster_config }}",
"{{ project_name }}.multiqc_docker": "{{ multiqc_docker }}",
"{{ project_name }}.multiqc_disk_size": "{{ multiqc_disk_size }}"
}

BIN
tasks/.DS_Store


+ 67
- 0
tasks/fastp.wdl

@@ -0,0 +1,67 @@
task fastp {
String sample_id
File read1
File read2
String adapter_sequence
String adapter_sequence_r2
String docker
String cluster
String umi_loc
Int trim_front1
Int trim_tail1
Int max_len1
Int trim_front2
Int trim_tail2
Int max_len2
Int disable_adapter_trimming
Int length_required
Int umi_len
Int UMI
Int qualified_quality_phred
Int length_required1
Int disable_quality_filtering
command <<<
mkdir -p /cromwell_root/tmp/fastp/
##1.Disable_quality_filtering
if [ "${disable_quality_filtering}" == 0 ]
then
cp ${read1} /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz
cp ${read2} /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz
else
fastp --thread 4 --trim_front1 ${trim_front1} --trim_tail1 ${trim_tail1} --max_len1 ${max_len1} --trim_front2 ${trim_front2} --trim_tail2 ${trim_tail2} --max_len2 ${max_len2} -i ${read1} -I ${read2} -o /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz -O /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz -j ${sample_id}.json -h ${sample_id}.html
fi

##2.UMI
if [ "${UMI}" == 0 ]
then
cp /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz
cp /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz
else
fastp --thread 4 -U --umi_loc=${umi_loc} --umi_len=${umi_len} --trim_front1 ${trim_front1} --trim_tail1 ${trim_tail1} --max_len1 ${max_len1} --trim_front2 ${trim_front2} --trim_tail2 ${trim_tail2} --max_len2 ${max_len2} -i /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz -I /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz -o /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz -O /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz -j ${sample_id}.json -h ${sample_id}.html
fi

##3.Trim
if [ "${disable_adapter_trimming}" == 0 ]
then
fastp --thread 4 -l ${length_required} -q ${qualified_quality_phred} -u ${length_required1} --adapter_sequence ${adapter_sequence} --adapter_sequence_r2 ${adapter_sequence_r2} --detect_adapter_for_pe --trim_front1 ${trim_front1} --trim_tail1 ${trim_tail1} --max_len1 ${max_len1} --trim_front2 ${trim_front2} --trim_tail2 ${trim_tail2} --max_len2 ${max_len2} -i /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz -I /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz -o ${sample_id}_R1.fastq.gz -O ${sample_id}_R2.fastq.gz -j ${sample_id}.json -h ${sample_id}.html
else
cp /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz ${sample_id}_R1.fastq.gz
cp /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz ${sample_id}_R2.fastq.gz
fi
>>>
runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 200 /cromwell_root/"
}

output {
File json = "${sample_id}.json"
File report = "${sample_id}.html"
File Trim_R1 = "${sample_id}_R1.fastq.gz"
File Trim_R2 = "${sample_id}_R2.fastq.gz"
}
}

+ 28
- 0
tasks/fastqc.wdl

@@ -0,0 +1,28 @@
task fastqc {
File read1
File read2
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
fastqc -t $nt -o ./ ${read1}
fastqc -t $nt -o ./ ${read2}
>>>

runtime {
docker:docker
cluster: cluster_config
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd " + disk_size + " /cromwell_root/"
}
output {
File read1_html = sub(basename(read1), "\\.(fastq|fq)\\.gz$", "_fastqc.html")
File read1_zip = sub(basename(read1), "\\.(fastq|fq)\\.gz$", "_fastqc.zip")
File read2_html = sub(basename(read2), "\\.(fastq|fq)\\.gz$", "_fastqc.html")
File read2_zip = sub(basename(read2), "\\.(fastq|fq)\\.gz$", "_fastqc.zip")
}
}

+ 36
- 0
tasks/fastqscreen.wdl

@@ -0,0 +1,36 @@
task fastq_screen {
File read1
File read2
File screen_ref_dir
File fastq_screen_conf
String read1name = basename(read1,".fastq.gz")
String read2name = basename(read2,".fastq.gz")
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
mkdir -p /cromwell_root/tmp
cp -r ${screen_ref_dir} /cromwell_root/tmp/
fastq_screen --aligner bowtie2 --conf ${fastq_screen_conf} --top 100000 --threads $nt ${read1}
fastq_screen --aligner bowtie2 --conf ${fastq_screen_conf} --top 100000 --threads $nt ${read2}
>>>

runtime {
docker:docker
cluster: cluster_config
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd " + disk_size + " /cromwell_root/"
}
output {
File png1 = "${read1name}_screen.png"
File txt1 = "${read1name}_screen.txt"
File html1 = "${read1name}_screen.html"
File png2 = "${read2name}_screen.png"
File txt2 = "${read2name}_screen.txt"
File html2 = "${read2name}_screen.html"
}
}

+ 34
- 0
tasks/hisat2.wdl

@@ -0,0 +1,34 @@
task hisat2 {
File idx
File Trim_R1
File Trim_R2
String idx_prefix
String sample_id
String docker
String cluster
String pen_intronlen
Int pen_cansplice
Int pen_noncansplice
Int min_intronlen
Int max_intronlen
Int maxins
Int minins
command <<<
nt=$(nproc)
hisat2 -t -p $nt -x ${idx}/${idx_prefix} --pen-cansplice ${pen_cansplice} --pen-noncansplice ${pen_noncansplice} --pen-intronlen ${pen_intronlen} --min-intronlen ${min_intronlen} --max-intronlen ${max_intronlen} --maxins ${maxins} --minins ${minins} --un-conc-gz ${sample_id}_un.fq.gz -1 ${Trim_R1} -2 ${Trim_R2} -S ${sample_id}.sam
>>>
runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 200 /cromwell_root/"
}

output {
File sam = "${sample_id}.sam"
File unmapread_1p = "${sample_id}_un.fq.1.gz"
File unmapread_2p = "${sample_id}_un.fq.2.gz"
}
}

+ 47
- 0
tasks/multiqc.wdl

@@ -0,0 +1,47 @@
task multiqc {

Array[File] read1_zip
Array[File] read2_zip

Array[File] txt1
Array[File] txt2

Array[File] bamqc_zip

Array[File] RNAseqqc_zip

String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
mkdir -p /cromwell_root/tmp/fastqc
mkdir -p /cromwell_root/tmp/fastqscreen
mkdir -p /cromwell_root/tmp/bamqc
mkdir -p /cromwell_root/tmp/rnaseq

cp ${sep=" " read1_zip} ${sep=" " read2_zip} /cromwell_root/tmp/fastqc
cp ${sep=" " txt1} ${sep=" " txt2} /cromwell_root/tmp/fastqscreen
## Unpack the qualimap bamqc archives so MultiQC can parse them
for i in ${sep=" " bamqc_zip}
do
tar -zxvf $i -C /cromwell_root/tmp/bamqc
done
## Unpack the qualimap RNA-seq archives into the rnaseq directory
for i in ${sep=" " RNAseqqc_zip}
do
tar -zxvf $i -C /cromwell_root/tmp/rnaseq
done

multiqc /cromwell_root/tmp/
>>>

runtime {
docker:docker
cluster:cluster_config
systemDisk:"cloud_ssd 40"
dataDisk:"cloud_ssd " + disk_size + " /cromwell_root/"
}

output {
File multiqc_html = "multiqc_report.html"
Array[File] multiqc_txt = glob("multiqc_data/*")
}
}

+ 26
- 0
tasks/qualimapBAMqc.wdl

@@ -0,0 +1,26 @@
task qualimapBAMqc {
File bam
String bamname = basename(bam,".bam")
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
/opt/qualimap/qualimap bamqc -bam ${bam} -outformat PDF:HTML -nt $nt -outdir ${bamname}_bamqc --java-mem-size=32G
tar -zcvf ${bamname}_bamqc_qualimap.zip ${bamname}_bamqc
>>>

runtime {
docker:docker
cluster:cluster_config
systemDisk:"cloud_ssd 40"
dataDisk:"cloud_ssd " + disk_size + " /cromwell_root/"
}

output {
File bamqc_zip = "${bamname}_bamqc_qualimap.zip"
}
}

+ 27
- 0
tasks/qualimapRNAseq.wdl

@@ -0,0 +1,27 @@
task qualimapRNAseq {
File bam
File gtf
String bamname = basename(bam,".bam")
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
/opt/qualimap/qualimap rnaseq -bam ${bam} -outformat HTML -outdir ${bamname}_RNAseq -gtf ${gtf} -pe --java-mem-size=10G
tar -zcvf ${bamname}_RNAseq_qualimap.zip ${bamname}_RNAseq
>>>

runtime {
docker:docker
cluster:cluster_config
systemDisk:"cloud_ssd 40"
dataDisk:"cloud_ssd " + disk_size + " /cromwell_root/"
}

output {
File rnaseq_zip = "${bamname}_RNAseq_qualimap.zip"
}
}

+ 38
- 0
tasks/samtools.wdl

@@ -0,0 +1,38 @@
task samtools {
File sam
String sample_id
String bam = sample_id + ".bam"
String sorted_bam = sample_id + ".sorted.bam"
String percent_bam = sample_id + ".10percent.bam"
String sorted_bam_index = sample_id + ".sorted.bam.bai"
String ins_size = sample_id + ".ins_size"
String docker
String cluster
Int insert_size

command <<<
set -o pipefail
set -e
/opt/conda/bin/samtools view -bS ${sam} > ${bam}
/opt/conda/bin/samtools view -bs 42.1 ${bam} > ${percent_bam}
/opt/conda/bin/samtools sort -m 1000000000 ${bam} -o ${sorted_bam}
/opt/conda/bin/samtools index ${sorted_bam}
/opt/conda/bin/samtools stats -i ${insert_size} ${sorted_bam} |grep ^IS|cut -f 2- > ${sample_id}.ins_size
>>>

runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 200 /cromwell_root/"
}

output {
File out_bam = sorted_bam
File out_percent = percent_bam
File out_bam_index = sorted_bam_index
File out_ins_size = ins_size
}

}


+ 31
- 0
tasks/stringtie.wdl

@@ -0,0 +1,31 @@
task stringtie {
File bam
File gtf
String docker
String sample_id
String cluster
Int minimum_length_allowed_for_the_predicted_transcripts
Int Junctions_no_spliced_reads
Float minimum_isoform_abundance
Float maximum_fraction_of_muliplelocationmapped_reads

command <<<
nt=$(nproc)
mkdir ballgown
/opt/conda/bin/stringtie -e -B -p $nt -f ${minimum_isoform_abundance} -m ${minimum_length_allowed_for_the_predicted_transcripts} -a ${Junctions_no_spliced_reads} -M ${maximum_fraction_of_muliplelocationmapped_reads} -G ${gtf} -o ballgown/${sample_id}/${sample_id}.gtf -C ${sample_id}.cov.ref.gtf -A ${sample_id}.gene.abundance.txt ${bam} -g ${sample_id}_genecount.csv
>>>
runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 150 /cromwell_root/"
}
output {
File covered_transcripts = "${sample_id}.cov.ref.gtf"
File gene_abundance = "${sample_id}.gene.abundance.txt"
Array[File] ballgown = ["ballgown/${sample_id}/${sample_id}.gtf", "ballgown/${sample_id}/e2t.ctab", "ballgown/${sample_id}/e_data.ctab", "ballgown/${sample_id}/i2t.ctab", "ballgown/${sample_id}/i_data.ctab", "ballgown/${sample_id}/t_data.ctab"]
File genecount = "${sample_id}_genecount.csv"
}
}

+ 178
- 0
workflow.wdl

@@ -0,0 +1,178 @@
import "./tasks/fastp.wdl" as fastp
import "./tasks/hisat2.wdl" as hisat2
import "./tasks/samtools.wdl" as samtools
import "./tasks/stringtie.wdl" as stringtie
import "./tasks/fastqc.wdl" as fastqc
import "./tasks/fastqscreen.wdl" as fastqscreen
import "./tasks/qualimapBAMqc.wdl" as qualimapBAMqc
import "./tasks/qualimapRNAseq.wdl" as qualimapRNAseq
import "./tasks/multiqc.wdl" as multiqc

workflow {{ project_name }} {
File inputSamplesFile
File idx
File screen_ref_dir
File fastq_screen_conf
File gtf
Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
String fastp_docker
String adapter_sequence
String adapter_sequence_r2
String fastp_cluster
String umi_loc
String idx_prefix
String pen_intronlen
String fastqc_cluster_config
String fastqc_docker
String fastqscreen_docker
String fastqscreen_cluster_config
String hisat2_docker
String hisat2_cluster
String qualimapBAMqc_docker
String qualimapBAMqc_cluster_config
String qualimapRNAseqqc_docker
String qualimapRNAseqqc_cluster_config
String samtools_docker
String samtools_cluster
String stringtie_docker
String stringtie_cluster
Int trim_front1
Int trim_tail1
Int max_len1
Int trim_front2
Int trim_tail2
Int max_len2
Int disable_adapter_trimming
Int length_required
Int umi_len
Int UMI
Int qualified_quality_phred
Int length_required1
Int disable_quality_filtering
Int pen_cansplice
Int pen_noncansplice
Int min_intronlen
Int max_intronlen
Int maxins
Int minins
Int fastqc_disk_size
Int fastqscreen_disk_size
Int qualimapBAMqc_disk_size
Int qualimapRNAseqqc_disk_size
Int insert_size
Int minimum_length_allowed_for_the_predicted_transcripts
Int Junctions_no_spliced_reads
Float minimum_isoform_abundance
Float maximum_fraction_of_muliplelocationmapped_reads

scatter (quartet in inputSamples){

call fastp.fastp as fastp {
input:
sample_id= quartet[2],
read1= quartet[0],
read2= quartet[1],
docker = fastp_docker,
cluster = fastp_cluster,
adapter_sequence = adapter_sequence,
adapter_sequence_r2 = adapter_sequence_r2,
umi_loc = umi_loc,
trim_front1 = trim_front1,
trim_tail1 = trim_tail1,
max_len1 = max_len1,
trim_front2 = trim_front2,
trim_tail2 = trim_tail2,
max_len2 = max_len2,
disable_adapter_trimming = disable_adapter_trimming,
length_required = length_required,
umi_len = umi_len,
UMI = UMI,
qualified_quality_phred = qualified_quality_phred,
length_required1 = length_required1,
disable_quality_filtering = disable_quality_filtering
}

call fastqc.fastqc as fastqc {
input:
read1=fastp.Trim_R1,
read2=fastp.Trim_R2,
docker = fastqc_docker,
cluster_config = fastqc_cluster_config,
disk_size = fastqc_disk_size
}

call fastqscreen.fastq_screen as fastqscreen {
input:
read1=fastp.Trim_R1,
read2=fastp.Trim_R2,
screen_ref_dir=screen_ref_dir,
fastq_screen_conf=fastq_screen_conf,
docker = fastqscreen_docker,
cluster_config = fastqscreen_cluster_config,
disk_size = fastqscreen_disk_size
}

call hisat2.hisat2 as hisat2 {
input:
sample_id = quartet[2],
idx = idx,
idx_prefix = idx_prefix,
Trim_R1 = fastp.Trim_R1,
Trim_R2 = fastp.Trim_R2,
docker = hisat2_docker,
cluster = hisat2_cluster,
pen_intronlen = pen_intronlen,
pen_cansplice = pen_cansplice,
pen_noncansplice = pen_noncansplice,
min_intronlen = min_intronlen,
max_intronlen = max_intronlen,
maxins = maxins,
minins = minins
}

call samtools.samtools as samtools {
input:
sample_id = quartet[2],
sam = hisat2.sam,
docker = samtools_docker,
cluster = samtools_cluster,
insert_size = insert_size
}
call qualimapBAMqc.qualimapBAMqc as qualimapBAMqc {
input:
bam = samtools.out_percent,
docker = qualimapBAMqc_docker,
cluster = qualimapBAMqc_cluster_config,
disk_size = qualimapBAMqc_disk_size
}

call qualimapRNAseq.qualimapRNAseq as qualimapRNAseq {
input:
bam = samtools.out_percent,
docker = qualimapRNAseqqc_docker,
cluster = qualimapRNAseqqc_cluster_config,
disk_size = qualimapRNAseqqc_disk_size,
gtf = gtf
}

call stringtie.stringtie as stringtie {
input:
sample_id = quartet[2],
gtf = gtf,
bam = samtools.out_bam,
docker = stringtie_docker,
cluster = stringtie_cluster,
minimum_length_allowed_for_the_predicted_transcripts = minimum_length_allowed_for_the_predicted_transcripts,
Junctions_no_spliced_reads = Junctions_no_spliced_reads,
minimum_isoform_abundance = minimum_isoform_abundance,
maximum_fraction_of_muliplelocationmapped_reads = maximum_fraction_of_muliplelocationmapped_reads
}
}


call multiqc.multiqc as multiqc {
input:
read1_zip = fastqc.read1_zip,
read2_zip = fastqc.read2_zip,
txt1 = fastqscreen.txt1,
txt2 = fastqscreen.txt2,
bamqc_zip = qualimapBAMqc.bamqc_zip,
RNAseqqc_zip = qualimapRNAseq.rnaseq_zip
}
}
