YaqingLiu/NGSCheckMate_parallel: Generates VAF files from FASTQ files in parallel, then reads the resulting set of VAF files with vaf_ncm.py.


NGSCheckMate

A C program, ngscheckmate_fastq, can be called directly to generate a VAF file from one FASTQ file (single-end sequencing) or two FASTQ files (paired-end sequencing).

Another script, vaf_ncm.py, then reads the set of VAF files to complete the downstream analysis. When you need to analyze many FASTQ files, the first step (VAF file generation with ngscheckmate_fastq) can be parallelized, because each sample is processed independently.
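The parallel step can be sketched in Python as follows. This is a minimal sketch, not the WDL tasks this repository actually runs. The helper names (`build_command`, `run_one`, `run_all`) are hypothetical, and the command-line layout (`-1`, `-2`, a trailing pattern file, VAF written to standard output) is an assumption based on the upstream NGSCheckMate usage text, so check it against your installed version.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(fastq1, pattern_file, fastq2=None, binary="ngscheckmate_fastq"):
    """Build the argument list for one sample (assumed CLI layout)."""
    cmd = [binary, "-1", fastq1]
    if fastq2 is not None:  # paired-end sequencing
        cmd += ["-2", fastq2]
    cmd.append(pattern_file)
    return cmd


def run_one(job):
    """Run one command and redirect its standard output to the VAF file."""
    cmd, vaf_path = job
    with open(vaf_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return vaf_path


def run_all(jobs, max_workers=4, runner=run_one):
    """Run independent jobs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, jobs))
```

Threads are sufficient here because the heavy work happens in the external process, not in the Python interpreter. Each job is a `(command, output_path)` pair, for example `(build_command("a_R1.fq", "SNP.pt", "a_R2.fq"), "a.vaf")`, where the file names are placeholders.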

If you want to analyze the correlation of samples across multiple runs, I suggest saving the historical VAF files, downloading NGSCheckMate from https://github.com/parklab/NGSCheckMate, and running vaf_ncm.py locally.
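To give a feel for what comparing samples by VAF means, the sketch below computes a Pearson correlation between two VAF vectors measured at the same SNP positions. It is only an illustration under that assumption: the function name and the example values are made up, and vaf_ncm.py applies its own filtering and cutoffs, which are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists of VAF values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical VAFs at five shared SNP positions
run1 = [0.0, 0.5, 1.0, 0.5, 0.0]
run2 = [0.02, 0.48, 0.97, 0.51, 0.0]
print(round(pearson(run1, run2), 3))
```

A value close to 1 suggests the two VAF files come from the same individual. Where to draw the line is exactly what the cutoffs in vaf_ncm.py decide.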

Getting Started

We recommend using the choppy system together with the Aliyun OSS service. The commands look like this:

# Activate the choppy environment
$ open-choppy-env

# Install the APP
$ choppy install YaqingLiu/NGSCheckMate_parallel-latest [-f]

# List the parameters
$ choppy samples YaqingLiu/NGSCheckMate_parallel-latest [--no-default]

# Submit your task with the `samples.json` file and the project name
$ choppy batch YaqingLiu/NGSCheckMate_parallel-latest samples.json -p Project [-l project:Label]

# Query the status of all tasks in the project
$ choppy query -L project:Label | grep "status"

samples.json

{
  "sample_id": "test", 
  "fastq1": ["fq1_1", "fq1_2", ..., "fq1_n"],
  "fastq2": ["fq2_1", "fq2_2", ..., "fq2_n"],
  "output_id": ["out_id1", "out_id2", ..., "out_idn"]
}
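A file of this shape can be generated with a short script. The sketch below assumes the three lists are parallel, so that entry i of `fastq1`, `fastq2` and `output_id` describes the same sample. This is inferred from the shared index n in the template rather than stated by the workflow. The helper names and the example paths are hypothetical.

```python
import json


def build_samples(sample_id, fastq1, fastq2, output_id):
    """Assemble the samples.json content, checking the lists line up."""
    if not (len(fastq1) == len(fastq2) == len(output_id)):
        raise ValueError("fastq1, fastq2 and output_id must have equal length")
    return {
        "sample_id": sample_id,
        "fastq1": list(fastq1),
        "fastq2": list(fastq2),
        "output_id": list(output_id),
    }


def write_samples(samples, path="samples.json"):
    """Write the dictionary as indented JSON."""
    with open(path, "w") as handle:
        json.dump(samples, handle, indent=2)


samples = build_samples(
    "test",
    ["a_R1.fastq.gz", "b_R1.fastq.gz"],  # placeholder paths
    ["a_R2.fastq.gz", "b_R2.fastq.gz"],
    ["a", "b"],
)
print(json.dumps(samples, indent=2))
```

Checking the lengths up front catches a missing mate file before the batch is submitted.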

Other parameters

subsampling_rate: The default subsampling rate is 1, meaning all reads are used. Processing is reasonably fast even at this rate.
-f in vaf_ncm.wdl: Use strict VAF correlation cutoffs. Recommended when your data may include related individuals (parent-child, siblings).