4 years ago · d20a43e2bd
--- a/README.md
+++ b/README.md
 ***-s min_gap_size***
 Minimum gap size between accessible sequence regions. Regions separated by less than this distance will be joined together. [Default: 5000]
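 ***segment_method***
 Segmentation method. [Default: cbs]
 In this workflow the algorithm is set with the segment_method input, which is passed to cnvkit.py batch as --segment-method (the -m option selects the algorithm only in the standalone cnvkit.py segment command). The following segmentation algorithms are available: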
 cbs – the default, circular binary segmentation (CBS). This method performed best in our benchmarking on mid-size target panels and exomes. Requires the R package DNAcopy.
 flasso – Fused Lasso, reported by some users to perform best on exomes, whole genomes, and some target panels. Sometimes faster than CBS, but the current implementation cannot be parallelized over multiple CPUs. Beyond identifying breakpoints, additionally performs significance testing to distinguish CNAs from regions of neutral copy number, so large swathes of the output may have log2 values of exactly 0. Requires the R package cghFLasso.
 haar – a pure-Python implementation of HaarSeg, a wavelet-based method. Very fast and performs reasonably well on small panels, but tends to over-segment large datasets.
 hmm (experimental) – a 3-state Hidden Markov Model suitable for most samples. Faster than CBS, and slower but more accurate than Haar. Requires the Python package hmmlearn, as do the next two methods.
 hmm-tumor (experimental) – a 5-state HMM suitable for finer-grained segmentation of good-quality tumor samples. In particular, this method can detect focal amplifications within a larger-scale, smaller-amplitude copy number gain, or focal deep deletions within a larger-scale hemizygous loss. Training this model takes a bit more CPU time than the simpler hmm method.
 hmm-germline (experimental) – a 3-state HMM with fixed amplitudes for the loss, neutral, and gain states, corresponding to absolute copy numbers of 1, 2, and 3. Suitable for germline samples and for single-cell sequencing of samples with mostly diploid genomes.
 none – simply calculate the weighted mean log2 value of each chromosome arm. Useful for testing or debugging, or as a baseline for benchmarking other methods.
 The first two methods (cbs and flasso) run R internally, so they need R and the corresponding R packages (DNAcopy and cghFLasso, respectively). If you installed CNVkit with conda as recommended, these should have been installed automatically. If the R packages are installed in a non-default location, use --rscript-path to point to the Rscript executable that should be used.
 The HMM methods (hmm, hmm-tumor, and hmm-germline) were introduced provisionally in CNVkit v0.9.2 and may change in future releases.
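
The dependency notes above can be condensed into a small lookup. The following is a minimal sketch rather than part of CNVkit or this workflow: the names `SEGMENT_METHOD_REQUIREMENTS`, `check_segment_method`, and `needs_r` are hypothetical, and the table only transcribes what the descriptions above state.

```python
# Hypothetical helper: external requirement per segmentation method,
# transcribed from the method descriptions in this README.
SEGMENT_METHOD_REQUIREMENTS = {
    "cbs": ("R package", "DNAcopy"),
    "flasso": ("R package", "cghFLasso"),
    "haar": None,  # pure-Python implementation
    "hmm": ("Python package", "hmmlearn"),
    "hmm-tumor": ("Python package", "hmmlearn"),
    "hmm-germline": ("Python package", "hmmlearn"),
    "none": None,  # no segmentation algorithm is run
}


def check_segment_method(name):
    """Return the external requirement for `name`, or raise ValueError."""
    if name not in SEGMENT_METHOD_REQUIREMENTS:
        valid = ", ".join(sorted(SEGMENT_METHOD_REQUIREMENTS))
        raise ValueError(f"unknown segment_method {name!r}; expected one of: {valid}")
    return SEGMENT_METHOD_REQUIREMENTS[name]


def needs_r(name):
    """True if the method relies on R (and therefore on Rscript)."""
    requirement = check_segment_method(name)
    return requirement is not None and requirement[0] == "R package"
```

A check like this could run before submitting the workflow, so that a misspelled `segment_method` fails early instead of inside the container.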
 ## Output
 1. *.cnn and *.cns files for each sample.
 2. A whole-genome copy ratio profile as a PDF scatter plot.
--- a/defaults
+++ b/defaults
    "cluster_config": "OnDemand bcs.a2.7xlarge img-ubuntu-vpc",
    "bed": "oss://pgx-reference-data/reference/wes_bedfiles/agilent_v6/SureSelect_Human_All_Exon_V6_r2.bed",
    "ref_flat": "oss://pgx-reference-data/GRCh38.d1.vd1/refFlat.txt",
    "min_gap_size": "5000"
    "min_gap_size": "5000",
    "method": "hybrid",
    "segment_method": "cbs"
 }
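
To illustrate how the two new defaults reach the command line, here is a minimal sketch, assuming the defaults are plain JSON and that `method` and `segment_method` map onto the `--method` and `--segment-method` flags used in `tasks/batch.wdl`. The function `batch_method_args` is hypothetical and only mirrors that mapping; the embedded JSON reproduces just the three keys relevant here.

```python
import json

# Subset of the defaults file after this change (other keys omitted).
DEFAULTS_JSON = """
{
    "min_gap_size": "5000",
    "method": "hybrid",
    "segment_method": "cbs"
}
"""


def batch_method_args(defaults):
    """Build the method-related arguments for `cnvkit.py batch` (hypothetical helper)."""
    return [
        "--method", defaults["method"],
        "--segment-method", defaults["segment_method"],
    ]


defaults = json.loads(DEFAULTS_JSON)
args = batch_method_args(defaults)
print(" ".join(args))  # --method hybrid --segment-method cbs
```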
--- a/inputs
+++ b/inputs
  "{{ project_name }}.faidx": "{{ faidx }}",
  "{{ project_name }}.ref_flat": "{{ ref_flat }}",
  "{{ project_name }}.method": "{{ method }}",
  "{{ project_name }}.segment_method": "{{ segment_method }}",
  "{{ project_name }}.reference": "{{ reference }}",
  "{{ project_name }}.docker": "{{ docker }}",
  "{{ project_name }}.bed": "{{ bed }}",
--- a/tasks/batch.wdl
+++ b/tasks/batch.wdl
    File access_bed
    File? reference
    String method
    String segment_method
    String docker
    String cluster_config
    String disk_size
        mkdir results
        cnvkit.py batch ${sep=' ' tumor_bam} --normal ${sep=' ' normal_bam} \
-       --method ${method} \
+       --method ${method} --segment-method ${segment_method} \
        --targets amplicon.bed ${access_opt} ${annotate_opt} \
        --fasta hg38.fa ${reference_opt} \
        --output-reference ~/${sample_id}.reference.cnn \
--- a/workflow.wdl
+++ b/workflow.wdl
    File? reference
    String min_gap_size
    String method
    String segment_method
    String docker
    String cluster_config
    String disk_size
        faidx = faidx,
        ref_flat = ref_flat,
        method = method,
        segment_method = segment_method,
        reference = reference,
        tumor_bam = tumor_bam,
        tumor_bai = tumor_bai,