add segment-method

master
YaqingLiu 4 years ago
parent
commit
d20a43e2bd
5 changed files with 28 additions and 2 deletions
  1. README.md (+20, -0)
  2. defaults (+3, -1)
  3. inputs (+1, -0)
  4. tasks/batch.wdl (+2, -1)
  5. workflow.wdl (+2, -0)

README.md (+20, -0)

 ***-s min_gap_size***


 Minimum gap size between accessible sequence regions. Regions separated by less than this distance will be joined together. [Default: 5000]

+***segment method***
+Segmentation methods
+The following segmentation algorithms can be specified with the -m option of cnvkit.py segment, which this workflow passes to cnvkit.py batch as --segment-method:
+
+cbs – the default, circular binary segmentation (CBS). This method performed best in our benchmarking on mid-size target panels and exomes. Requires the R package DNAcopy.
+flasso – Fused Lasso, reported by some users to perform best on exomes, whole genomes, and some target panels. Sometimes faster than CBS, but the current implementation cannot be parallelized over multiple CPUs. Beyond identifying breakpoints, it also performs significance testing to distinguish CNAs from regions of neutral copy number, so large swathes of the output may have log2 values of exactly 0. Requires the R package cghFLasso.
+
+haar – a pure-Python implementation of HaarSeg, a wavelet-based method. Very fast and performs reasonably well on small panels, but tends to over-segment large datasets.
+hmm (experimental) – a 3-state Hidden Markov Model suitable for most samples. Faster than CBS, and slower but more accurate than Haar. Requires the Python package hmmlearn, as do the next two methods.
+
+hmm-tumor (experimental) – a 5-state HMM suitable for finer-grained segmentation of good-quality tumor samples. In particular, this method can detect focal amplifications within a larger-scale, smaller-amplitude copy number gain, or focal deep deletions within a larger-scale hemizygous loss. Training this model takes a bit more CPU time than the simpler hmm method.
+
+hmm-germline (experimental) – a 3-state HMM with fixed amplitude for the loss, neutral, and gain states corresponding to absolute copy numbers of 1, 2, and 3. Suitable for germline samples and single-cell sequencing of samples with mostly-diploid genomes that are not overly aneuploid.
+
+none – simply calculate the weighted mean log2 value of each chromosome arm. Useful for testing or debugging, or as a baseline for benchmarking other methods.
+The first two methods (cbs and flasso) use R internally, so they need R and the R package dependencies (DNAcopy and cghFLasso, respectively) to be installed. If you installed CNVkit with conda as recommended, these should have been installed for you automatically. If you installed the R packages in a non-default location, you can specify the Rscript executable you want to use with --rscript-path.
+
+The HMM methods hmm, hmm-tumor and hmm-germline were introduced provisionally in CNVkit v0.9.2, and may change in future releases.

 ## Output
 1. *.cnn/cns of each sample.
 2. A whole-genome copy ratio profile as a PDF scatter plot.
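
As a quick reference for the method list added to README.md above, the sketch below records each method's external dependency as stated there and rejects unknown names before a run starts. `SEGMENT_METHODS` and `check_segment_method` are hypothetical helpers for illustration; they are not part of CNVkit or of this workflow.

```python
# Hypothetical helper: method names and dependencies are copied from the
# README text in this commit; nothing here calls CNVkit itself.
SEGMENT_METHODS = {
    "cbs": "R package DNAcopy",
    "flasso": "R package cghFLasso",
    "haar": None,
    "hmm": "Python package hmmlearn",
    "hmm-tumor": "Python package hmmlearn",
    "hmm-germline": "Python package hmmlearn",
    "none": None,
}


def check_segment_method(name):
    """Return the dependency the README lists for `name` (None if none is listed)."""
    if name not in SEGMENT_METHODS:
        valid = ", ".join(sorted(SEGMENT_METHODS))
        raise ValueError(f"unknown segment method {name!r}; expected one of: {valid}")
    return SEGMENT_METHODS[name]


if __name__ == "__main__":
    for method in ("cbs", "haar"):
        print(method, "->", check_segment_method(method) or "no extra dependency listed")
```

A check like this could run before the task is submitted, so that a typo in `segment_method` fails early rather than inside the container.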

defaults (+3, -1)

 "cluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
 "bed": "oss://pgx-reference-data/reference/wes_bedfiles/agilent_v6/SureSelect_Human_All_Exon_V6_r2.bed",
 "ref_flat": "oss://pgx-reference-data/GRCh38.d1.vd1/refFlat.txt",
-"min_gap_size": "5000"
+"min_gap_size": "5000",
+"method": "hybrid",
+"segment_method": "cbs"
 }
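
The keys in `defaults` appear to supply values for the `{{ name }}` placeholders used in the `inputs` file. The rendering tool is not shown in this commit, so the sketch below only illustrates the substitution using the standard library. `render` is a hypothetical function and `my_project` is a made-up project name.

```python
import re

# Matches "{{ name }}" with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{ name }} with values[name]; a missing name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    line = '"{{ project_name }}.segment_method": "{{ segment_method }}",'
    values = {"project_name": "my_project", "segment_method": "cbs"}  # example values
    print(render(line, values))
```

Raising on a missing key, rather than leaving the placeholder in place, makes it obvious when a new input such as `segment_method` was added to the template but not to `defaults`.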

inputs (+1, -0)

 "{{ project_name }}.faidx": "{{ faidx }}",
 "{{ project_name }}.ref_flat": "{{ ref_flat }}",
 "{{ project_name }}.method": "{{ method }}",
+"{{ project_name }}.segment_method": "{{ segment_method }}",
 "{{ project_name }}.reference": "{{ reference }}",
 "{{ project_name }}.docker": "{{ docker }}",
 "{{ project_name }}.bed": "{{ bed }}",

tasks/batch.wdl (+2, -1)

 File access_bed
 File? reference
 String method
+String segment_method
 String docker
 String cluster_config
 String disk_size

 mkdir results
 cnvkit.py batch ${sep=' ' tumor_bam} --normal ${sep=' ' normal_bam} \
---method ${method} \
+--method ${method} --segment-method ${segment_method} \
 --targets amplicon.bed ${access_opt} ${annotate_opt} \
 --fasta hg38.fa ${reference_opt} \
 --output-reference ~/${sample_id}.reference.cnn \
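
To see how the new input reaches the command line, the following sketch assembles only the part of the `cnvkit.py batch` call touched by this change. `batch_args` is a hypothetical helper, the BAM file names are placeholders, and the remaining options from the task (targets, fasta, output reference, and so on) are left out.

```python
import shlex


def batch_args(tumor_bams, normal_bams, method, segment_method):
    """Build a partial argument list mirroring the command in tasks/batch.wdl."""
    return [
        "cnvkit.py", "batch", *tumor_bams,
        "--normal", *normal_bams,
        "--method", method,
        "--segment-method", segment_method,
    ]


if __name__ == "__main__":
    # Placeholder inputs; "hybrid" and "cbs" are the values set in `defaults`.
    args = batch_args(["tumor.bam"], ["normal.bam"], "hybrid", "cbs")
    print(shlex.join(args))
```

Keeping the arguments as a list until the final `shlex.join` avoids quoting problems if a file name contains spaces.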

workflow.wdl (+2, -0)

 File? reference
 String min_gap_size
 String method
+String segment_method
 String docker
 String cluster_config
 String disk_size

 faidx = faidx,
 ref_flat = ref_flat,
 method = method,
+segment_method = segment_method,
 reference = reference,
 tumor_bam = tumor_bam,
 tumor_bai = tumor_bai,
