
first commit

master
Zhihui 4 years ago
commit
ed9701e342
15 changed files with 753 additions and 0 deletions
  1. .DS_Store (BIN)
  2. README.md (+120 −0)
  3. defaults (+60 −0)
  4. inputs (+61 −0)
  5. tasks/.DS_Store (BIN)
  6. tasks/fastp.wdl (+67 −0)
  7. tasks/fastqc.wdl (+28 −0)
  8. tasks/fastqscreen.wdl (+36 −0)
  9. tasks/hisat2.wdl (+34 −0)
  10. tasks/multiqc.wdl (+47 −0)
  11. tasks/qualimapBAMqc.wdl (+26 −0)
  12. tasks/qualimapRNAseq.wdl (+27 −0)
  13. tasks/samtools.wdl (+38 −0)
  14. tasks/stringtie.wdl (+31 −0)
  15. workflow.wdl (+178 −0)

BIN
.DS_Store


+ 120
- 0
README.md

@@ -0,0 +1,120 @@
# RNA Sequencing Quality Control Pipeline

> Author: Li Zhihui
>
> E-mail: 18210700119@fudan.edu.cn
>
> Git:
>
> Last Updates: 2020/07/13

## Installation

```
# Activate the choppy environment
source activate choppy
# Install the app
choppy install lizhihui/test_dataportol1
```

## App Overview: The Quartet (Chinese Family No. 1) Reference Materials

Establishing a key technical system for biological metrology and quality control of high-throughput whole-genome sequencing is essential to make sequencing data comparable across platforms and laboratories, to make research results reproducible, and to enable data sharing. Building national genomic reference materials and benchmark datasets, and mastering the biological metrology of genomics, is a necessary step in translating sequencing technology into clinical applications; internationally, this remains a gap. The National Institute of Metrology of China, together with Fudan University and the Fudan University Taizhou Institute of Health Sciences, developed the Quartet human genomic reference materials (**a set of four samples, numbered LCL5, LCL6, LCL7, and LCL8; LCL5 and LCL6 are monozygotic twin daughters, LCL7 is the father, and LCL8 is the mother**) and the corresponding whole-genome sequencing benchmark datasets ("reference values"). These provide a "ruler" for judging whether genomic sequence measurements are accurate, and serve as a national benchmark for the reliability of sequencing data. The Quartet reference materials come from a monozygotic-twin family in the Taizhou cohort; genetically, they reflect the population structure at the junction of northern and southern China, and the family design also provides a genetic basis for establishing the reference values.

This Quality_control app assesses the quality of RNA sequencing (RNA-Seq) data, covering raw-data QC, alignment QC, and gene-expression QC.

## Pipeline and Parameters

![image-20200713083634120](https://tva1.sinaimg.cn/large/007S8ZIlgy1ggp1q2qstej31330u0wol.jpg)

## App Input Files
inputSamplesFile

```
#read1 #read2 #sample_id #adapter_sequence #adapter_sequence_r2
```

read1 is the Alibaba Cloud OSS path of the fastq read1 file.

read2 is the Alibaba Cloud OSS path of the fastq read2 file.

sample_id is the name given to the sample.

adapter_sequence is the adapter to trim from the R1 end.

adapter_sequence_r2 is the adapter to trim from the R2 end.

All uploaded files should follow a consistent naming convention.
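For illustration, one data line of an inputSamplesFile might look like the following (tab-separated, one line per sample; the oss:// bucket path and sample name are hypothetical examples, while the two adapter sequences are the defaults shipped with this app):

```
oss://my-bucket/LCL5_R1.fastq.gz	oss://my-bucket/LCL5_R2.fastq.gz	LCL5_1	AGATCGGAAGAGCACACGTCTGAACTCCAGTCA	AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
```

The workflow reads this file with `read_tsv` and scatters over its rows, so the column order must match the header shown above.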

## App Output Files
1. Upstream QC metrics

| Column | Description | Range |
| -------------------------- | ----------- | ----- |
| SampleID | | |
| #Date | | |
| #LibraryPrep | | |
| Replicate | | |
| Sample | | |
| #SequenceMachine | | |
| #SequenceSite | | |
| #SequenceTech | | |
| raw reads | | |
| Total_Reads_After_Trimming | | |
| GC_content | * | |
| Human.percentage | | |
| #ERCC.percentage | | |
| EColi.percentage | | |
| Adapter.percentage | | |
| #Vector.percentage | | |
| rRNA.percentage | | |
| Virus.percentage | | |
| Yeast.percentage | | |
| Mitoch.percentage | | |
| Phix.percentage | | |
| No.hits.percentage | | |
| GC_content_bamqc | | |
| Mapping_Ratio | * | |
| Insert_size_median | * | |
| Insert_size_peak | * | |
| error rate | | |
| average length | | |
| 3'5' gene cover | | |
| duplication | | |
| strand bias | | |


2. Downstream QC metrics

| Quality metrics | Category | Description | Reference value |
| ----------------------------------------- | ----------- | ------------------------------------------------------------ | --------------- |
| Number of detected genes | One group | This metric estimates the detection abundance of one sample. | (**, 58,395] |
| Detection Jaccard index (JI) | One group | Detection JI is the ratio of the number of genes detected in both replicates to the number of genes detected in either replicate. This metric estimates the repeatability of gene detection across replicates of one sample. | [0.8, 1] |
| Coefficient of variation (CV) | One group | CV is calculated for each gene from the normalized expression levels in all 3 replicates of one sample. This metric estimates the repeatability of expression levels across replicates of one sample. | [0, 0.2] |
| Correlation of technical replicates (CTR) | One group | CTR is calculated as the correlation of one sample's expression levels between replicates. | [0.95, 1] |
| Signal-to-noise Ratio (SNR) | More groups | Signal is defined as the average distance on PCA plots between libraries from different samples, and noise as the average distance between libraries from the same sample. SNR assesses the ability to distinguish technical replicates of different biological samples. | [5, inf) |
| Sensitivity of detection | One group / Reference dependent | Sensitivity is the proportion of "true" detected genes in the reference dataset that are correctly detected by the test set. | [0.96, 1] |
| Specificity of detection | One group / Reference dependent | Specificity is the proportion of "true" non-detected genes in the reference dataset that are correctly not detected by the test set. | [0.94, 1] |
| Consistency ratio of relative expression | Two groups / Reference dependent | Proportion of genes whose relative ratio (log2FC) falls into the reference range (mean ± 2-fold SD). | [0.82, 1] |
| Correlation of relative log2FC | Two groups / Reference dependent | Pearson correlation between the mean reference relative ratio and that of the test site. | [0.96, 1] |
| Sensitivity of DEGs | Two groups / Reference dependent | Sensitivity is the proportion of "true" DEGs in the reference dataset that are correctly identified as DEGs by the test set. | [0.80, 1] |
| Specificity of DEGs | Two groups / Reference dependent | Specificity is the proportion of "true" non-DEGs in the reference dataset that are correctly identified as non-DEGs by the test set. | [0.95, 1] |
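As an illustration of how two of the one-group metrics above can be computed, here is a minimal sketch in Python with NumPy. It is not part of this app; the function names and the detection threshold of 0 are assumptions for the example.

```python
import numpy as np

def detection_jaccard(rep_a, rep_b, threshold=0.0):
    # A gene counts as "detected" when its expression exceeds the threshold.
    # JI = |detected in both replicates| / |detected in either replicate|.
    a = np.asarray(rep_a) > threshold
    b = np.asarray(rep_b) > threshold
    return (a & b).sum() / (a | b).sum()

def coefficient_of_variation(expr):
    # expr: genes x replicates matrix of normalized expression levels.
    # Per-gene CV = sample SD / mean across the replicates.
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)
    return np.where(mean > 0, sd / mean, np.nan)
```

In practice these would be applied per sample, comparing each pair of replicates for JI and using all three replicates for CV, then checked against the reference ranges in the table.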





## Results and Interpretation






+ 60
- 0
defaults

@@ -0,0 +1,60 @@
{
"fastp_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastp:0.19.6",
"fastp_cluster": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"trim_front1": "0",
"trim_tail1": "0",
"max_len1": "0",
"trim_front2": "0",
"trim_tail2": "0",
"max_len2": "0",
"adapter_sequence": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
"adapter_sequence_r2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
"disable_adapter_trimming": "0",
"length_required": "50",
"length_required1": "20",
"UMI": "0",
"umi_len": "0",
"umi_loc": "umi_loc",
"qualified_quality_phred": "20",
"disable_quality_filtering": "1",
"hisat2_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/hisat2:v2.1.0-2",
"hisat2_cluster": "OnDemand bcs.a2.3xlarge img-ubuntu-vpc",
"idx_prefix": "genome_snp_tran",
"idx": "oss://pgx-reference-data/reference/hisat2/grch38_snp_tran/",
"fasta": "GRCh38.d1.vd1.fa",
"pen_cansplice":"0",
"pen_noncansplice":"3",
"pen_intronlen":"G,-8,1",
"min_intronlen":"30",
"max_intronlen":"500000",
"maxins":"500",
"minins":"0",
"samtools_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/samtools:v1.3.1",
"samtools_cluster": "OnDemand bcs.a2.large img-ubuntu-vpc",
"insert_size":"8000",
"gtf": "oss://pgx-reference-data/reference/annotation/Homo_sapiens.GRCh38.93.gtf",
"stringtie_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/stringtie:v1.3.4",
"stringtie_cluster": "OnDemand bcs.a2.large img-ubuntu-vpc",
"minimum_length_allowed_for_the_predicted_transcripts":"200",
"minimum_isoform_abundance":"0.01",
"Junctions_no_spliced_reads":"10",
"maximum_fraction_of_muliplelocationmapped_reads":"0.95",
"fastqc_cluster_config": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"fastqc_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastqc:v0.11.5",
"fastqc_disk_size": "150",
"qualimapBAMqc_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/qualimap:2.0.0",
"qualimapBAMqc_cluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
"qualimapBAMqc_disk_size": "500",
"qualimapRNAseqqc_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/qualimap:2.0.0",
"qualimapRNAseqqc_cluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
"qualimapRNAseqqc_disk_size": "500",
"fastqscreen_docker": "registry.cn-shanghai.aliyuncs.com/pgx-docker-registry/fastqscreen:0.12.0",
"fastqscreen_cluster_config": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"screen_ref_dir": "oss://pgx-reference-data/fastq_screen_reference/",
"fastq_screen_conf": "oss://pgx-reference-data/fastq_screen_reference/fastq_screen.conf",
"fastqscreen_disk_size": "100",
"ref_dir": "oss://chinese-quartet/quartet-storage-data/reference_data/",
"multiqc_cluster_config": "OnDemand bcs.b2.3xlarge img-ubuntu-vpc",
"multiqc_docker": "registry-vpc.cn-shanghai.aliyuncs.com/pgx-docker-registry/multiqc:v1.8",
"multiqc_disk_size": "100"
}

+ 61
- 0
inputs

@@ -0,0 +1,61 @@
{
"{{ project_name }}.inputSamplesFile": "{{ inputSamplesFile }}",
"{{ project_name }}.fastp_docker": "{{ fastp_docker }}",
"{{ project_name }}.fastp_cluster": "{{ fastp_cluster }}",
"{{ project_name }}.trim_front1": "{{ trim_front1 }}",
"{{ project_name }}.trim_tail1": "{{ trim_tail1 }}",
"{{ project_name }}.max_len1": "{{ max_len1 }}",
"{{ project_name }}.trim_front2": "{{ trim_front2 }}",
"{{ project_name }}.trim_tail2": "{{ trim_tail2 }}",
"{{ project_name }}.max_len2": "{{ max_len2 }}",
"{{ project_name }}.adapter_sequence": "{{ adapter_sequence }}",
"{{ project_name }}.adapter_sequence_r2": "{{ adapter_sequence_r2 }}",
"{{ project_name }}.disable_adapter_trimming": "{{ disable_adapter_trimming }}",
"{{ project_name }}.length_required": "{{ length_required }}",
"{{ project_name }}.UMI": "{{ UMI }}",
"{{ project_name }}.umi_loc": "{{ umi_loc }}",
"{{ project_name }}.umi_len": "{{ umi_len }}",
"{{ project_name }}.length_required1": "{{ length_required1 }}",
"{{ project_name }}.qualified_quality_phred": "{{ qualified_quality_phred }}",
"{{ project_name }}.disable_quality_filtering": "{{ disable_quality_filtering }}",
"{{ project_name }}.hisat2_docker": "{{ hisat2_docker }}",
"{{ project_name }}.hisat2_cluster": "{{ hisat2_cluster }}",
"{{ project_name }}.idx_prefix": "{{ idx_prefix }}",
"{{ project_name }}.idx": "{{ idx }}",
"{{ project_name }}.fasta": "{{ fasta }}",
"{{ project_name }}.pen_cansplice": "{{ pen_cansplice }}",
"{{ project_name }}.pen_noncansplice": "{{ pen_noncansplice }}",
"{{ project_name }}.pen_intronlen": "{{ pen_intronlen }}",
"{{ project_name }}.min_intronlen": "{{ min_intronlen }}",
"{{ project_name }}.max_intronlen": "{{ max_intronlen }}",
"{{ project_name }}.maxins": "{{ maxins }}",
"{{ project_name }}.minins": "{{ minins }}",
"{{ project_name }}.samtools_docker": "{{ samtools_docker }}",
"{{ project_name }}.samtools_cluster": "{{ samtools_cluster }}",
"{{ project_name }}.insert_size": "{{ insert_size }}",
"{{ project_name }}.gtf": "{{ gtf }}",
"{{ project_name }}.stringtie_docker": "{{ stringtie_docker }}",
"{{ project_name }}.stringtie_cluster": "{{ stringtie_cluster }}",
"{{ project_name }}.minimum_length_allowed_for_the_predicted_transcripts": "{{ minimum_length_allowed_for_the_predicted_transcripts }}",
"{{ project_name }}.minimum_isoform_abundance": "{{ minimum_isoform_abundance }}",
"{{ project_name }}.Junctions_no_spliced_reads": "{{ Junctions_no_spliced_reads }}",
"{{ project_name }}.maximum_fraction_of_muliplelocationmapped_reads": "{{ maximum_fraction_of_muliplelocationmapped_reads }}",
"{{ project_name }}.fastqc_cluster_config": "{{ fastqc_cluster_config }}",
"{{ project_name }}.fastqc_docker": "{{ fastqc_docker }}",
"{{ project_name }}.fastqc_disk_size": "{{ fastqc_disk_size }}",
"{{ project_name }}.qualimapBAMqc_docker": "{{ qualimapBAMqc_docker }}",
"{{ project_name }}.qualimapBAMqc_cluster_config": "{{ qualimapBAMqc_cluster_config }}",
"{{ project_name }}.qualimapBAMqc_disk_size": "{{ qualimapBAMqc_disk_size }}",
"{{ project_name }}.qualimapRNAseqqc_docker": "{{ qualimapRNAseqqc_docker }}",
"{{ project_name }}.qualimapRNAseqqc_cluster_config": "{{ qualimapRNAseqqc_cluster_config }}",
"{{ project_name }}.qualimapRNAseqqc_disk_size": "{{ qualimapRNAseqqc_disk_size }}",
"{{ project_name }}.fastqscreen_docker": "{{ fastqscreen_docker }}",
"{{ project_name }}.fastqscreen_cluster_config": "{{ fastqscreen_cluster_config }}",
"{{ project_name }}.screen_ref_dir": "{{ screen_ref_dir }}",
"{{ project_name }}.fastq_screen_conf": "{{ fastq_screen_conf }}",
"{{ project_name }}.fastqscreen_disk_size": "{{ fastqscreen_disk_size }}",
"{{ project_name }}.ref_dir": "{{ ref_dir }}",
"{{ project_name }}.multiqc_cluster_config": "{{ multiqc_cluster_config }}",
"{{ project_name }}.multiqc_docker": "{{ multiqc_docker }}",
"{{ project_name }}.multiqc_disk_size": "{{ multiqc_disk_size }}"
}

BIN
tasks/.DS_Store


+ 67
- 0
tasks/fastp.wdl

@@ -0,0 +1,67 @@
task fastp {
String sample_id
File read1
File read2
String adapter_sequence
String adapter_sequence_r2
String docker
String cluster
String umi_loc
Int trim_front1
Int trim_tail1
Int max_len1
Int trim_front2
Int trim_tail2
Int max_len2
Int disable_adapter_trimming
Int length_required
Int umi_len
Int UMI
Int qualified_quality_phred
Int length_required1
Int disable_quality_filtering
command <<<
mkdir -p /cromwell_root/tmp/fastp/
##1.Disable_quality_filtering
if [ "${disable_quality_filtering}" == 0 ]
then
cp ${read1} /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz
cp ${read2} /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz
else
fastp --thread 4 --trim_front1 ${trim_front1} --trim_tail1 ${trim_tail1} --max_len1 ${max_len1} --trim_front2 ${trim_front2} --trim_tail2 ${trim_tail2} --max_len2 ${max_len2} -i ${read1} -I ${read2} -o /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz -O /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz -j ${sample_id}.json -h ${sample_id}.html
fi

##2.UMI
if [ "${UMI}" == 0 ]
then
cp /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz
cp /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz
else
fastp --thread 4 -U --umi_loc=${umi_loc} --umi_len=${umi_len} --trim_front1 ${trim_front1} --trim_tail1 ${trim_tail1} --max_len1 ${max_len1} --trim_front2 ${trim_front2} --trim_tail2 ${trim_tail2} --max_len2 ${max_len2} -i /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp1.gz -I /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp1.gz -o /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz -O /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz -j ${sample_id}.json -h ${sample_id}.html
fi

##3.Trim
if [ "${disable_adapter_trimming}" == 0 ]
then
fastp --thread 4 -l ${length_required} -q ${qualified_quality_phred} -u ${length_required1} --adapter_sequence ${adapter_sequence} --adapter_sequence_r2 ${adapter_sequence_r2} --detect_adapter_for_pe --trim_front1 ${trim_front1} --trim_tail1 ${trim_tail1} --max_len1 ${max_len1} --trim_front2 ${trim_front2} --trim_tail2 ${trim_tail2} --max_len2 ${max_len2} -i /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz -I /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz -o ${sample_id}_R1.fastq.gz -O ${sample_id}_R2.fastq.gz -j ${sample_id}.json -h ${sample_id}.html
else
cp /cromwell_root/tmp/fastp/${sample_id}_R1.fastq.tmp2.gz ${sample_id}_R1.fastq.gz
cp /cromwell_root/tmp/fastp/${sample_id}_R2.fastq.tmp2.gz ${sample_id}_R2.fastq.gz
fi
>>>
runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 200 /cromwell_root/"
}

output {
File json = "${sample_id}.json"
File report = "${sample_id}.html"
File Trim_R1 = "${sample_id}_R1.fastq.gz"
File Trim_R2 = "${sample_id}_R2.fastq.gz"
}
}

+ 28
- 0
tasks/fastqc.wdl

@@ -0,0 +1,28 @@
task fastqc {
File read1
File read2
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
fastqc -t $nt -o ./ ${read1}
fastqc -t $nt -o ./ ${read2}
>>>

runtime {
docker:docker
cluster: cluster_config
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd " + disk_size + " /cromwell_root/"
}
output {
File read1_html = sub(basename(read1), "\\.(fastq|fq)\\.gz$", "_fastqc.html")
File read1_zip = sub(basename(read1), "\\.(fastq|fq)\\.gz$", "_fastqc.zip")
File read2_html = sub(basename(read2), "\\.(fastq|fq)\\.gz$", "_fastqc.html")
File read2_zip = sub(basename(read2), "\\.(fastq|fq)\\.gz$", "_fastqc.zip")
}
}

+ 36
- 0
tasks/fastqscreen.wdl

@@ -0,0 +1,36 @@
task fastq_screen {
File read1
File read2
File screen_ref_dir
File fastq_screen_conf
String read1name = basename(read1,".fastq.gz")
String read2name = basename(read2,".fastq.gz")
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
mkdir -p /cromwell_root/tmp
cp -r ${screen_ref_dir} /cromwell_root/tmp/
fastq_screen --aligner bowtie2 --conf ${fastq_screen_conf} --top 100000 --threads $nt ${read1}
fastq_screen --aligner bowtie2 --conf ${fastq_screen_conf} --top 100000 --threads $nt ${read2}
>>>

runtime {
docker:docker
cluster: cluster_config
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd " + disk_size + " /cromwell_root/"
}
output {
File png1 = "${read1name}_screen.png"
File txt1 = "${read1name}_screen.txt"
File html1 = "${read1name}_screen.html"
File png2 = "${read2name}_screen.png"
File txt2 = "${read2name}_screen.txt"
File html2 = "${read2name}_screen.html"
}
}

+ 34
- 0
tasks/hisat2.wdl

@@ -0,0 +1,34 @@
task hisat2 {
File idx
File Trim_R1
File Trim_R2
String idx_prefix
String sample_id
String docker
String cluster
String pen_intronlen
Int pen_cansplice
Int pen_noncansplice
Int min_intronlen
Int max_intronlen
Int maxins
Int minins
command <<<
nt=$(nproc)
hisat2 -t -p $nt -x ${idx}/${idx_prefix} --pen-cansplice ${pen_cansplice} --pen-noncansplice ${pen_noncansplice} --pen-intronlen ${pen_intronlen} --min-intronlen ${min_intronlen} --max-intronlen ${max_intronlen} --maxins ${maxins} --minins ${minins} --un-conc-gz ${sample_id}_un.fq.gz -1 ${Trim_R1} -2 ${Trim_R2} -S ${sample_id}.sam
>>>
runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 200 /cromwell_root/"
}

output {
File sam = "${sample_id}.sam"
File unmapread_1p = "${sample_id}_un.fq.1.gz"
File unmapread_2p = "${sample_id}_un.fq.2.gz"
}
}

+ 47
- 0
tasks/multiqc.wdl

@@ -0,0 +1,47 @@
task multiqc {

Array[File] read1_zip
Array[File] read2_zip

Array[File] txt1
Array[File] txt2

Array[File] bamqc_zip

Array[File] RNAseqqc_zip

String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
mkdir -p /cromwell_root/tmp/fastqc
mkdir -p /cromwell_root/tmp/fastqscreen
mkdir -p /cromwell_root/tmp/bamqc
mkdir -p /cromwell_root/tmp/rnaseq

cp ${sep=" " read1_zip} ${sep=" " read2_zip} /cromwell_root/tmp/fastqc
cp ${sep=" " txt1} ${sep=" " txt2} /cromwell_root/tmp/fastqscreen
## Unpack the qualimap bamqc archives so MultiQC can parse them
for i in ${sep=" " bamqc_zip}
do
tar -zxvf $i -C /cromwell_root/tmp/bamqc
done
## Unpack the qualimap RNA-seq archives into the rnaseq directory
for i in ${sep=" " RNAseqqc_zip}
do
tar -zxvf $i -C /cromwell_root/tmp/rnaseq
done

multiqc /cromwell_root/tmp/
>>>

runtime {
docker:docker
cluster:cluster_config
systemDisk:"cloud_ssd 40"
dataDisk:"cloud_ssd " + disk_size + " /cromwell_root/"
}

output {
File multiqc_html = "multiqc_report.html"
Array[File] multiqc_txt = glob("multiqc_data/*")
}
}

+ 26
- 0
tasks/qualimapBAMqc.wdl

@@ -0,0 +1,26 @@
task qualimapBAMqc {
File bam
String bamname = basename(bam,".bam")
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
/opt/qualimap/qualimap bamqc -bam ${bam} -outformat PDF:HTML -nt $nt -outdir ${bamname}_bamqc --java-mem-size=32G
tar -zcvf ${bamname}_bamqc_qualimap.zip ${bamname}_bamqc
>>>

runtime {
docker:docker
cluster:cluster_config
systemDisk:"cloud_ssd 40"
dataDisk:"cloud_ssd " + disk_size + " /cromwell_root/"
}

output {
File bamqc_zip = "${bamname}_bamqc_qualimap.zip"
}
}

+ 27
- 0
tasks/qualimapRNAseq.wdl

@@ -0,0 +1,27 @@
task qualimapRNAseq {
File bam
File gtf
String bamname = basename(bam,".bam")
String docker
String cluster_config
String disk_size

command <<<
set -o pipefail
set -e
nt=$(nproc)
/opt/qualimap/qualimap rnaseq -bam ${bam} -outformat HTML -outdir ${bamname}_RNAseq -gtf ${gtf} -pe --java-mem-size=10G
tar -zcvf ${bamname}_RNAseq_qualimap.zip ${bamname}_RNAseq
>>>

runtime {
docker:docker
cluster:cluster_config
systemDisk:"cloud_ssd 40"
dataDisk:"cloud_ssd " + disk_size + " /cromwell_root/"
}

output {
File rnaseq_zip = "${bamname}_RNAseq_qualimap.zip"
}
}

+ 38
- 0
tasks/samtools.wdl

@@ -0,0 +1,38 @@
task samtools {
File sam
String sample_id
String bam = sample_id + ".bam"
String sorted_bam = sample_id + ".sorted.bam"
String percent_bam = sample_id + ".10percent.bam"
String sorted_bam_index = sample_id + ".sorted.bam.bai"
String ins_size = sample_id + ".ins_size"
String docker
String cluster
Int insert_size

command <<<
set -o pipefail
set -e
/opt/conda/bin/samtools view -bS ${sam} > ${bam}
/opt/conda/bin/samtools view -bs 42.1 ${bam} > ${percent_bam}
/opt/conda/bin/samtools sort -m 1000000000 ${bam} -o ${sorted_bam}
/opt/conda/bin/samtools index ${sorted_bam}
/opt/conda/bin/samtools stats -i ${insert_size} ${sorted_bam} |grep ^IS|cut -f 2- > ${sample_id}.ins_size
>>>

runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 200 /cromwell_root/"
}

output {
File out_bam = sorted_bam
File out_percent = percent_bam
File out_bam_index = sorted_bam_index
File out_ins_size = ins_size
}

}


+ 31
- 0
tasks/stringtie.wdl

@@ -0,0 +1,31 @@
task stringtie {
File bam
File gtf
String docker
String sample_id
String cluster
Int minimum_length_allowed_for_the_predicted_transcripts
Int Junctions_no_spliced_reads
Float minimum_isoform_abundance
Float maximum_fraction_of_muliplelocationmapped_reads

command <<<
nt=$(nproc)
mkdir ballgown
/opt/conda/bin/stringtie -e -B -p $nt -f ${minimum_isoform_abundance} -m ${minimum_length_allowed_for_the_predicted_transcripts} -a ${Junctions_no_spliced_reads} -M ${maximum_fraction_of_muliplelocationmapped_reads} -G ${gtf} -o ballgown/${sample_id}/${sample_id}.gtf -C ${sample_id}.cov.ref.gtf -A ${sample_id}.gene.abundance.txt ${bam} -g ${sample_id}_genecount.csv
>>>
runtime {
docker: docker
cluster: cluster
systemDisk: "cloud_ssd 40"
dataDisk: "cloud_ssd 150 /cromwell_root/"
}
output {
File covered_transcripts = "${sample_id}.cov.ref.gtf"
File gene_abundance = "${sample_id}.gene.abundance.txt"
Array[File] ballgown = ["ballgown/${sample_id}/${sample_id}.gtf", "ballgown/${sample_id}/e2t.ctab", "ballgown/${sample_id}/e_data.ctab", "ballgown/${sample_id}/i2t.ctab", "ballgown/${sample_id}/i_data.ctab", "ballgown/${sample_id}/t_data.ctab"]
File genecount = "${sample_id}_genecount.csv"
}
}

+ 178
- 0
workflow.wdl

@@ -0,0 +1,178 @@
import "./tasks/fastp.wdl" as fastp
import "./tasks/hisat2.wdl" as hisat2
import "./tasks/samtools.wdl" as samtools
import "./tasks/stringtie.wdl" as stringtie
import "./tasks/fastqc.wdl" as fastqc
import "./tasks/fastqscreen.wdl" as fastqscreen
import "./tasks/qualimapBAMqc.wdl" as qualimapBAMqc
import "./tasks/qualimapRNAseq.wdl" as qualimapRNAseq
import "./tasks/multiqc.wdl" as multiqc

workflow {{ project_name }} {
File inputSamplesFile
File idx
File screen_ref_dir
File fastq_screen_conf
File gtf
Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
String fastp_docker
String adapter_sequence
String adapter_sequence_r2
String fastp_cluster
String umi_loc
String idx_prefix
String pen_intronlen
String fastqc_cluster_config
String fastqc_docker
String fastqscreen_docker
String fastqscreen_cluster_config
String hisat2_docker
String hisat2_cluster
String qualimapBAMqc_docker
String qualimapBAMqc_cluster_config
String qualimapRNAseqqc_docker
String qualimapRNAseqqc_cluster_config
String samtools_docker
String samtools_cluster
String stringtie_docker
String stringtie_cluster
Int trim_front1
Int trim_tail1
Int max_len1
Int trim_front2
Int trim_tail2
Int max_len2
Int disable_adapter_trimming
Int length_required
Int umi_len
Int UMI
Int qualified_quality_phred
Int length_required1
Int disable_quality_filtering
Int pen_cansplice
Int pen_noncansplice
Int min_intronlen
Int max_intronlen
Int maxins
Int minins
Int fastqc_disk_size
Int fastqscreen_disk_size
Int qualimapBAMqc_disk_size
Int qualimapRNAseqqc_disk_size
Int insert_size
Int minimum_length_allowed_for_the_predicted_transcripts
Int Junctions_no_spliced_reads
Float minimum_isoform_abundance
Float maximum_fraction_of_muliplelocationmapped_reads

scatter (quartet in inputSamples){

call fastp.fastp as fastp {
input:
sample_id= quartet[2],
read1= quartet[0],
read2= quartet[1],
docker = fastp_docker,
cluster = fastp_cluster,
adapter_sequence = adapter_sequence,
adapter_sequence_r2 = adapter_sequence_r2,
umi_loc = umi_loc,
trim_front1 = trim_front1,
trim_tail1 = trim_tail1,
max_len1 = max_len1,
trim_front2 = trim_front2,
trim_tail2 = trim_tail2,
max_len2 = max_len2,
disable_adapter_trimming = disable_adapter_trimming,
length_required = length_required,
umi_len = umi_len,
UMI = UMI,
qualified_quality_phred = qualified_quality_phred,
length_required1 = length_required1,
disable_quality_filtering = disable_quality_filtering
}

call fastqc.fastqc as fastqc {
input:
read1=fastp.Trim_R1,
read2=fastp.Trim_R2,
docker = fastqc_docker,
cluster_config = fastqc_cluster_config,
disk_size = fastqc_disk_size
}

call fastqscreen.fastq_screen as fastqscreen {
input:
read1=fastp.Trim_R1,
read2=fastp.Trim_R2,
screen_ref_dir=screen_ref_dir,
fastq_screen_conf=fastq_screen_conf,
docker = fastqscreen_docker,
cluster_config = fastqscreen_cluster_config,
disk_size = fastqscreen_disk_size
}

call hisat2.hisat2 as hisat2 {
input:
sample_id = quartet[2],
idx = idx,
idx_prefix = idx_prefix,
Trim_R1 = fastp.Trim_R1,
Trim_R2 = fastp.Trim_R2,
docker = hisat2_docker,
cluster = hisat2_cluster,
pen_intronlen = pen_intronlen,
pen_cansplice = pen_cansplice,
pen_noncansplice = pen_noncansplice,
min_intronlen = min_intronlen,
max_intronlen = max_intronlen,
maxins = maxins,
minins = minins
}

call samtools.samtools as samtools {
input:
sample_id = quartet[2],
sam = hisat2.sam,
docker = samtools_docker,
cluster = samtools_cluster,
insert_size = insert_size
}
call qualimapBAMqc.qualimapBAMqc as qualimapBAMqc {
input:
bam = samtools.out_percent,
docker = qualimapBAMqc_docker,
cluster = qualimapBAMqc_cluster_config,
disk_size = qualimapBAMqc_disk_size
}

call qualimapRNAseq.qualimapRNAseq as qualimapRNAseq {
input:
bam = samtools.out_percent,
docker = qualimapRNAseqqc_docker,
cluster = qualimapRNAseqqc_cluster_config,
disk_size = qualimapRNAseqqc_disk_size,
gtf = gtf
}

call stringtie.stringtie as stringtie {
input:
sample_id = quartet[2],
gtf = gtf,
bam = samtools.out_bam,
docker = stringtie_docker,
cluster = stringtie_cluster,
minimum_length_allowed_for_the_predicted_transcripts = minimum_length_allowed_for_the_predicted_transcripts,
Junctions_no_spliced_reads = Junctions_no_spliced_reads,
minimum_isoform_abundance = minimum_isoform_abundance,
maximum_fraction_of_muliplelocationmapped_reads = maximum_fraction_of_muliplelocationmapped_reads
}
}


call multiqc.multiqc as multiqc {
input:
read1_zip = fastqc.read1_zip,
read2_zip = fastqc.read2_zip,
txt1 = fastqscreen.txt1,
txt2 = fastqscreen.txt2,
bamqc_zip = qualimapBAMqc.bamqc_zip,
RNAseqqc_zip = qualimapRNAseq.rnaseq_zip
}
}
