Author: Huang Yechao
E-mail: 17210700095@fudan.edu.cn
Git: http://choppy.3steps.cn/huangyechao/wes-germline.git
Last updated: 16/1/2019
Description
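This app implements a germline analysis pipeline for whole-exome sequencing (WES) data from next-generation sequencing. It uses Sentieon (*A fast and accurate solution to variant calling from next-generation sequence data*). The pipeline is written in the workflow language WDL and packaged as an app for the Choppy platform. The workflow diagram is shown below:

[Figure: workflow diagram]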
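The input data should be the paired-end FASTQ files R1 and R2; in addition, the BED file of the capture regions used for sequencing must be provided.

```shell
# Activate the choppy environment
source activate choppy-latest
# Install the app
choppy install huangyechao/wes-germline:<version>
```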
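`samples.csv` is the input file used when submitting jobs. Its columns correspond to the information defined in the `inputs` file, and it can also be generated with Choppy's `samples` command:

```shell
choppy samples wes-germline --output samples.csv
```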
#### samples.csv
```csv
read1,read2,regions,sample_name,cluster,disk_size,sample_id
```
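Here `sample_id` is the index of the sample being analyzed and is used to generate the job information when that sample is submitted. It must not contain underscores (`_`); otherwise an error will occur.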
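Because an underscore in `sample_id` makes submission fail, it can help to check the file before running `choppy batch`. The following is a minimal Python sketch, not part of Choppy or of this app: `write_samples_csv` and `validate_sample_ids` are hypothetical helpers, and the bucket paths and disk size in the example row are placeholders.

```python
import csv
import io

# Column order of samples.csv as listed in this README.
HEADER = ["read1", "read2", "regions", "sample_name",
          "cluster", "disk_size", "sample_id"]


def validate_sample_ids(rows):
    """Return every sample_id that contains '_' (these would make submission fail)."""
    return [row["sample_id"] for row in rows if "_" in row["sample_id"]]


def write_samples_csv(rows, out):
    """Write rows (dicts keyed by HEADER) to the file-like object `out` as CSV.

    Raises ValueError if any sample_id contains an underscore.
    """
    bad = validate_sample_ids(rows)
    if bad:
        raise ValueError(f"sample_id must not contain '_': {bad}")
    writer = csv.DictWriter(out, fieldnames=HEADER, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical example row: bucket paths and disk size are placeholders.
    rows = [{
        "read1": "oss://my-bucket/S1.R1.fastq.gz",
        "read2": "oss://my-bucket/S1.R2.fastq.gz",
        "regions": "oss://my-bucket/capture.bed",
        "sample_name": "S1",
        "cluster": "",          # left empty: the inputs template falls back to its default
        "disk_size": "200",
        "sample_id": "S1-01",   # no underscore
    }]
    buf = io.StringIO()
    write_samples_csv(rows, buf)
    print(buf.getvalue(), end="")
```

To write a real file, open it with `open("samples.csv", "w", newline="")` and pass the handle as `out`.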
Then submit the jobs in batch:

```shell
choppy batch wes-germline samples.csv --project-name your_project
```
The `tasks` directory contains the WDL file for each step of the pipeline. For example, `mapping.wdl` is shown below:
```wdl
task mapping {
  String fasta
  File ref_dir
  File fastq_1
  File fastq_2
  String SENTIEON_INSTALL_DIR
  String group
  String sample
  String pl
  String docker
  String cluster_config
  String disk_size

  command <<<
    set -o pipefail
    set -e
    # Sentieon license server
    export SENTIEON_LICENSE=192.168.0.55:8990
    nt=$(nproc)
    # Align paired-end reads with BWA-MEM (read group built from group/sample/pl),
    # then sort and convert SAM to BAM with Sentieon util sort.
    ${SENTIEON_INSTALL_DIR}/bin/bwa mem -M -R "@RG\tID:${group}\tSM:${sample}\tPL:${pl}" -t $nt ${ref_dir}/${fasta} ${fastq_1} ${fastq_2} | ${SENTIEON_INSTALL_DIR}/bin/sentieon util sort -o ${sample}.sorted.bam -t $nt --sam2bam -i -
  >>>

  # Docker image, cluster specification and disks for the job
  runtime {
    dockerTag: docker
    cluster: cluster_config
    systemDisk: "cloud_ssd 40"
    dataDisk: "cloud_ssd " + disk_size + " /cromwell_root/"
  }

  output {
    File sorted_bam = "${sample}.sorted.bam"
    File sorted_bam_index = "${sample}.sorted.bam.bai"
  }
}
```
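To check how the task's inputs end up on the command line (for example, the read-group string passed to `bwa mem -R`), the pipeline string can be rebuilt outside WDL. This is a hypothetical Python sketch for inspection only: `read_group` and `mapping_command` are illustrative names that mirror the `command` block of the mapping task, they are not used by the app, and the paths and thread count in the usage are placeholders.

```python
def read_group(group: str, sample: str, pl: str) -> str:
    """Read-group header for `bwa mem -R`.

    The separators are a literal backslash followed by 't'; bwa converts
    them to real tabs itself, just as in the task's command block.
    """
    return f"@RG\\tID:{group}\\tSM:{sample}\\tPL:{pl}"


def mapping_command(install_dir: str, ref_dir: str, fasta: str,
                    fastq_1: str, fastq_2: str, group: str, sample: str,
                    pl: str, nt: int) -> str:
    """Rebuild the bwa | sentieon util sort pipeline run by the mapping task."""
    bwa = (f'{install_dir}/bin/bwa mem -M -R "{read_group(group, sample, pl)}" '
           f"-t {nt} {ref_dir}/{fasta} {fastq_1} {fastq_2}")
    sort = (f"{install_dir}/bin/sentieon util sort -o {sample}.sorted.bam "
            f"-t {nt} --sam2bam -i -")
    return f"{bwa} | {sort}"


if __name__ == "__main__":
    # Placeholder values; inside the task, ref_dir and the FASTQs are localized paths.
    print(mapping_command("/opt/sentieon-genomics", "/ref", "GRCh38.d1.vd1.fa",
                          "S1.R1.fastq.gz", "S1.R2.fastq.gz",
                          "S1", "S1", "ILLUMINA", 16))
```

Keeping the read group in one helper makes it easy to confirm that `ID`, `SM` and `PL` come from the intended workflow inputs before a job is submitted.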
`workflow.wdl` defines the inputs of each step and the dependencies between the steps:
```wdl
import "./tasks/mapping.wdl" as mapping
import "./tasks/Metrics.wdl" as Metrics
import "./tasks/Dedup.wdl" as Dedup
import "./tasks/deduped_Metrics.wdl" as deduped_Metrics
import "./tasks/Realigner.wdl" as Realigner
import "./tasks/BQSR.wdl" as BQSR
import "./tasks/Haplotyper.wdl" as Haplotyper

workflow {{ project_name }} {
  # Workflow inputs, filled in from the inputs file
  File fastq_1
  File fastq_2
  String SENTIEON_INSTALL_DIR
  String sample
  String docker
  String fasta
  File ref_dir
  File dbmills_dir
  String db_mills
  File dbsnp_dir
  File regions
  String dbsnp
  String disk_size
  String cluster_config

  call mapping.mapping as mapping {
    input:
      SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
      group=sample,
      sample=sample,
      # Platform tag of the read group; the SAM specification value is ILLUMINA
      pl="ILLUMINA",
      fasta=fasta,
      ref_dir=ref_dir,
      fastq_1=fastq_1,
      fastq_2=fastq_2,
      docker=docker,
      disk_size=disk_size,
      cluster_config=cluster_config
  }

  # Consuming mapping's outputs makes Metrics run after mapping
  call Metrics.Metrics as Metrics {
    input:
      SENTIEON_INSTALL_DIR=SENTIEON_INSTALL_DIR,
      fasta=fasta,
      ref_dir=ref_dir,
      sorted_bam=mapping.sorted_bam,
      sorted_bam_index=mapping.sorted_bam_index,
      sample=sample,
      docker=docker,
      disk_size=disk_size,
      cluster_config=cluster_config
  }

  ......
  ......
}
```
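The `import` statements at the top of the file specify the task files to be used; the `File`/`String xxx` declarations in the middle define the variables the workflow needs, together with their types; and the `call` blocks declare each step of the pipeline and the dependencies between steps. (See the WDL documentation for details.)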
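A WDL engine such as Cromwell (the `/cromwell_root/` mount in the mapping task suggests it is used here) starts a call once the calls it depends on have finished, so the `call` graph determines which steps can run in parallel. The sketch below models this with Python's `graphlib.TopologicalSorter` (Python 3.9+). Only the `mapping` to `Metrics` dependency is visible in the excerpt above; every other edge is an assumption based on a typical Sentieon germline pipeline and may differ from the real `workflow.wdl`. `DEPENDENCIES` and `parallel_waves` are illustrative names.

```python
from graphlib import TopologicalSorter

# step -> steps it waits for.
# Only Metrics <- mapping appears in the workflow.wdl excerpt; every edge
# marked "assumed" is a guess based on a typical Sentieon germline pipeline.
DEPENDENCIES = {
    "mapping": set(),
    "Metrics": {"mapping"},
    "Dedup": {"mapping"},                   # assumed
    "deduped_Metrics": {"Dedup"},           # assumed
    "Realigner": {"Dedup"},                 # assumed
    "BQSR": {"Realigner"},                  # assumed
    "Haplotyper": {"Realigner", "BQSR"},    # assumed
}


def parallel_waves(deps):
    """Group steps into waves; steps in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable printout
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(parallel_waves(DEPENDENCIES), start=1):
        print(i, wave)
```

The waves are a simplification: the engine does not wait for a whole wave to finish, it starts each step as soon as its own dependencies are done. Under these assumed edges, `Metrics` and `Dedup` can run concurrently right after `mapping`.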
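The `inputs` file holds the parameters passed to the app at run time. Parameters that can be fixed are given directly in the `inputs` file; parameters that need to vary are referenced with `{{ }}`, which makes them appear as columns in the samples file. `project_name` is the name of the job being run and must be specified when the job is submitted.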
```json
{
  "{{ project_name }}.fasta": "GRCh38.d1.vd1.fa",
  "{{ project_name }}.ref_dir": "oss://pgx-reference-data/GRCh38.d1.vd1/",
  "{{ project_name }}.dbsnp": "dbsnp_146.hg38.vcf",
  "{{ project_name }}.fastq_1": "{{ read1 }}",
  "{{ project_name }}.SENTIEON_INSTALL_DIR": "/opt/sentieon-genomics",
  "{{ project_name }}.dbmills_dir": "oss://pgx-reference-data/GRCh38.d1.vd1/",
  "{{ project_name }}.db_mills": "Mills_and_1000G_gold_standard.indels.hg38.vcf",
  "{{ project_name }}.cluster_config": "{{ cluster if cluster != '' else 'OnDemand ecs.sn1ne.4xlarge img-ubuntu-vpc' }}",
  "{{ project_name }}.docker": "localhost:5000/sentieon-genomics:v2018.08.01 oss://pgx-docker-images/dockers",
  "{{ project_name }}.dbsnp_dir": "oss://pgx-reference-data/GRCh38.d1.vd1/",
  "{{ project_name }}.sample": "{{ sample_name }}",
  "{{ project_name }}.disk_size": "{{ disk_size }}",
  "{{ project_name }}.regions": "{{ regions }}",
  "{{ project_name }}.fastq_2": "{{ read2 }}"
}
```
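How `{{ }}` placeholders turn into `samples.csv` columns can be illustrated by scanning the template for placeholder names. This is a hypothetical sketch, not Choppy's actual implementation: `sample_columns` takes the first identifier inside each `{{ }}` and skips `project_name`, which is supplied at submission time. Applied to the full template above, it yields `read1`, `cluster`, `sample_name`, `disk_size`, `regions` and `read2`, which are the `samples.csv` columns apart from `sample_id`.

```python
import re

# First identifier after "{{": "read1" in "{{ read1 }}",
# "cluster" in "{{ cluster if cluster != '' else '...' }}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)")


def sample_columns(template_text: str) -> list:
    """Placeholder names in order of first appearance, excluding project_name."""
    columns = []
    for name in PLACEHOLDER.findall(template_text):
        if name != "project_name" and name not in columns:
            columns.append(name)
    return columns


if __name__ == "__main__":
    # A shortened excerpt of the inputs file shown above.
    template = """{
      "{{ project_name }}.fasta": "GRCh38.d1.vd1.fa",
      "{{ project_name }}.fastq_1": "{{ read1 }}",
      "{{ project_name }}.cluster_config": "{{ cluster if cluster != '' else 'OnDemand ecs.sn1ne.4xlarge img-ubuntu-vpc' }}",
      "{{ project_name }}.fastq_2": "{{ read2 }}"
    }"""
    print(sample_columns(template))  # ['read1', 'cluster', 'read2']
```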
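`{{ cluster if cluster != '' else 'OnDemand ecs.sn1ne.4xlarge img-ubuntu-vpc' }}` means that when no `cluster` configuration is specified, an on-demand `ecs.sn1ne.4xlarge` instance is used by default (see the Choppy documentation for details).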
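The default expression behaves like a plain conditional on the `cluster` column. A minimal Python equivalent, assuming only the empty-string check shown in the template (`resolve_cluster` and `DEFAULT_CLUSTER` are illustrative names):

```python
# Default taken from the inputs template above.
DEFAULT_CLUSTER = "OnDemand ecs.sn1ne.4xlarge img-ubuntu-vpc"


def resolve_cluster(cluster: str) -> str:
    """Python equivalent of {{ cluster if cluster != '' else '...' }}."""
    return cluster if cluster != "" else DEFAULT_CLUSTER


if __name__ == "__main__":
    print(resolve_cluster(""))  # OnDemand ecs.sn1ne.4xlarge img-ubuntu-vpc
    # Hypothetical custom value from a samples.csv row is passed through unchanged.
    print(resolve_cluster("OnDemand ecs.sn1ne.8xlarge img-ubuntu-vpc"))
```

As with the template's `!= ''` test, a value consisting only of spaces is not treated as empty, so the `cluster` column should be left truly blank to get the default.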