FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
FreeBayes is haplotype-based, in the sense that it calls variants based on the literal sequences of reads aligned to a particular target, not their precise alignment. This model is a straightforward generalization of previous ones (e.g. PolyBayes, samtools, GATK) which detect or report variants based on alignments. This method avoids one of the core problems with alignment-based variant detection: identical sequences may have multiple possible alignments.
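The alignment ambiguity described above can be illustrated with a small sketch. The reference string and the deletion placements are made up for illustration. The same 2 bp deletion is placed at three different positions inside a short repeat, and every placement yields the same read sequence.

```python
def apply_deletion(reference: str, start: int, length: int) -> str:
    """Return the sequence obtained by deleting `length` bases at 0-based `start`."""
    return reference[:start] + reference[start + length:]


# Hypothetical reference containing a CA repeat.
reference = "GGCACACACATT"

# Three different ways an aligner could place the same 2 bp deletion.
placements = [2, 4, 6]
haplotypes = {start: apply_deletion(reference, start, 2) for start in placements}

for start, sequence in haplotypes.items():
    print(f"deletion at {start}: {sequence}")

# All placements describe one and the same literal sequence.
distinct_haplotypes = set(haplotypes.values())
print(len(distinct_haplotypes))
```

A caller that compares literal sequences sees one haplotype here, while a caller that compares alignments could see three different deletions.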
FreeBayes uses short-read alignments for any number of individuals from a population and a reference genome to determine the most-likely combination of genotypes for the population at each position in the reference. It reports positions which it finds putatively polymorphic in variant call file format. It can also use an input set of variants (VCF) as a source of prior information, and a copy number variant map (BED) to define non-uniform ploidy variation across the samples under analysis. [1]
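As a rough intuition for "most-likely genotype", the sketch below scores three diploid genotypes for a single sample from reference and alternate read counts. It is a toy binomial model with an assumed 1% error rate, not FreeBayes's actual Bayesian population model, and all function names are hypothetical.

```python
import math


def genotype_log_likelihoods(ref_count: int, alt_count: int, error_rate: float = 0.01) -> dict:
    """Log-likelihood of the read counts under each diploid genotype (toy model)."""
    # Probability that a single read shows the alternate allele under each genotype.
    alt_probability = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    # The binomial coefficient is the same for every genotype, so it is omitted.
    return {
        genotype: alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for genotype, p in alt_probability.items()
    }


def most_likely_genotype(ref_count: int, alt_count: int) -> str:
    """Return the genotype with the highest log-likelihood."""
    log_likelihoods = genotype_log_likelihoods(ref_count, alt_count)
    return max(log_likelihoods, key=log_likelihoods.get)


print(most_likely_genotype(20, 0))   # only reference reads
print(most_likely_genotype(11, 9))   # roughly balanced reads
print(most_likely_genotype(1, 19))   # almost only alternate reads
```

FreeBayes goes further than this: it evaluates genotype combinations across all samples jointly and works on haplotype observations rather than simple allele counts.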
- `-C`, `--min-alternate-count` 2: require that at least 2 observations (reads) in a single sample support an allele in order to consider it.
- `-F`, `--min-alternate-fraction` 0.2: and also require that at least 20% of the reads from a single sample support it.
- `--max-complex-gap`: FreeBayes is capable of calling variant haplotypes shorter than a read length where multiple polymorphisms segregate on the same read. This parameter determines the maximum distance between polymorphisms phased in this way, which defaults to 3bp. In practice, this can comfortably be set to half the read length.
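The two thresholds can be read as a pre-filter on candidate alleles. The sketch below is a minimal illustration and not FreeBayes code. It assumes that both thresholds must be met within the same sample, and the function name and the input layout are hypothetical.

```python
def passes_alternate_thresholds(samples, min_alternate_count=2, min_alternate_fraction=0.2):
    """Return True if at least one sample supports the allele strongly enough.

    `samples` is a list of (alternate_observations, total_observations) pairs,
    one pair per sample.
    """
    for alternate, total in samples:
        if total == 0:
            continue  # no coverage in this sample
        # Assumption: count and fraction must both pass in the same sample.
        if alternate >= min_alternate_count and alternate / total >= min_alternate_fraction:
            return True
    return False


print(passes_alternate_thresholds([(2, 10)]))           # count and fraction both pass
print(passes_alternate_thresholds([(1, 3)]))            # only one supporting read
print(passes_alternate_thresholds([(2, 30)]))           # fraction too low
print(passes_alternate_thresholds([(1, 30), (5, 20)]))  # second sample passes
```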
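FreeBayes incorporates a number of features in order to reduce the complexity of variant detection for researchers and developers. In particular, it handles indel realignment internally and does not need base quality score recalibration (BQSR). So we use Dedup.bam from Sentieon directly, without doing indel realignment and BQSR.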
- `-F` 0.05: at least 5% of the reads from a single sample must support the variant.
- `-m`, `--min-mapping-quality` 0: exclude alignments from analysis if their mapping quality is below this value (the default is 1). A mapping quality of zero means that the read maps equally well to multiple locations and that the mapper has picked one of these positions at random.
- `--genotype-qualities`: calculate the marginal probability of genotypes and report it as GQ in each sample field in the VCF output.
For now, I am not quite sure why they set min mapping quality to 0, so I will use default settings.
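To make the difference between a minimum mapping quality of 0 and the default of 1 concrete, here is a small sketch. The `Alignment` type and the read names are hypothetical; a real pipeline would read mapping qualities from the BAM file.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    read_name: str
    mapping_quality: int


def filter_by_mapping_quality(alignments, min_mapping_quality=1):
    """Keep alignments whose mapping quality is at least the threshold."""
    return [a for a in alignments if a.mapping_quality >= min_mapping_quality]


alignments = [
    Alignment("read_a", 60),  # uniquely mapped
    Alignment("read_b", 0),   # multi-mapped, position picked at random
    Alignment("read_c", 13),
]

kept_default = filter_by_mapping_quality(alignments)  # threshold 1
kept_zero = filter_by_mapping_quality(alignments, 0)  # threshold 0

print([a.read_name for a in kept_default])
print([a.read_name for a in kept_zero])
```

With a threshold of 0 the multi-mapped read is kept, so ambiguously placed reads can contribute evidence; with the default they are dropped.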
Other settings can be found with:
freebayes --help
Basic usage:
freebayes -f ref.fa aln.bam >var.vcf
Command line used in this APP:
FreeBayes is very slow on a single thread, so this app uses the scripts/freebayes-parallel script.
freebayes-parallel <(fasta_generate_regions.py ref.fa.fai 100000) 36 \
--genotype-qualities --max-complex-gap 75 -f ref.fa aln.bam >var.vcf
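In the command above, `freebayes-parallel` takes a list of regions, the number of parallel processes (36 here), and then the usual FreeBayes arguments. The sketch below approximates how a FASTA index (`.fai`) can be cut into fixed-size regions. It is not the actual `fasta_generate_regions.py`; the `chrom:start-end` output with a 0-based start is an assumption, and the index lines are made up.

```python
def generate_regions(fai_lines, region_size):
    """Split each sequence listed in a .fai index into fixed-size regions."""
    regions = []
    for line in fai_lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, length = fields[0], int(fields[1])
        start = 0
        while start < length:
            end = min(start + region_size, length)
            regions.append(f"{chrom}:{start}-{end}")
            start = end
    return regions


# Hypothetical .fai content: name, length, offset, line bases, line width.
fai_lines = [
    "chr1\t250000\t6\t60\t61\n",
    "chrM\t16569\t254179\t60\t61\n",
]

regions = generate_regions(fai_lines, 100000)
for region in regions:
    print(region)
```

Each region can be called independently, which is why more, smaller regions spread the work more evenly across the available processes.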
Settings:
Disk size: 400
Cluster: OnDemand ecs.sn1ne.8xlarge img-ubuntu-vpc
Sample: 30x WGS
Run time: 3h