Infer and visualize copy number from high-throughput DNA sequencing data.
# CNVkit

> Author: Yaqing Liu
>
> E-mail: yaqing.liu@outlook.com

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Official documentation: https://cnvkit.readthedocs.io/en/stable/index.html

## Install

```
# activate choppy environment
open-choppy-env
# install app
choppy install YaqingLiu/CNVkit
```

## Copy number calling pipeline

![image](https://cnvkit.readthedocs.io/en/stable/_images/workflow.png)

## Input

```json
{
  "tumor_bam": [
    "oss://choppy-cromwell-result/...bam",
    "oss://choppy-cromwell-result/...bam",
    "oss://choppy-cromwell-result/...bam"
  ],
  "tumor_bai": [
    "oss://choppy-cromwell-result/...bai",
    "oss://choppy-cromwell-result/...bai",
    "oss://choppy-cromwell-result/...bai"
  ],
  "normal_bam": [
    "oss://choppy-cromwell-result/...bam",
    "oss://choppy-cromwell-result/...bam",
    "oss://choppy-cromwell-result/...bam"
  ],
  "normal_bai": [
    "oss://choppy-cromwell-result/...bai",
    "oss://choppy-cromwell-result/...bai",
    "oss://choppy-cromwell-result/...bai"
  ],
  "sample_id": "...",
  "method": "...",
  "reference": "..."
}
```

The `reference` parameter is optional.

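Before submitting a job, it can help to sanity-check the input file. The following is a minimal sketch and not part of the app: the `validate_inputs` helper and the file names are hypothetical, and it assumes that each BAM list is paired one-to-one with its index list and that `method` takes one of the three values accepted by CNVkit's `-m` option.

```python
ALLOWED_METHODS = {"hybrid", "amplicon", "wgs"}  # values of CNVkit's -m option


def validate_inputs(inputs):
    """Return a list of human-readable problems found in the inputs mapping."""
    problems = []
    for group in ("tumor", "normal"):
        bams = inputs.get(group + "_bam", [])
        bais = inputs.get(group + "_bai", [])
        # Assumption: BAM and index lists are paired one-to-one.
        if len(bams) != len(bais):
            problems.append(
                "%s: %d BAM files but %d index files" % (group, len(bams), len(bais))
            )
    if not inputs.get("sample_id"):
        problems.append("sample_id is missing or empty")
    if inputs.get("method") not in ALLOWED_METHODS:
        problems.append("method must be one of %s" % sorted(ALLOWED_METHODS))
    return problems


# Hypothetical example with two deliberate mistakes.
example = {
    "tumor_bam": ["tumor1.bam", "tumor2.bam"],
    "tumor_bai": ["tumor1.bam.bai"],   # one index is missing
    "normal_bam": ["normal1.bam"],
    "normal_bai": ["normal1.bam.bai"],
    "sample_id": "demo",
    "method": "exome",                 # not an accepted value
}
for problem in validate_inputs(example):
    print(problem)
```

An empty list means none of these checks failed; the optional `reference` key is deliberately not checked.
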
## Note

<font color=darkred>***-m {hybrid,amplicon,wgs}, --seq-method {hybrid,amplicon,wgs}, --method {hybrid,amplicon,wgs}***</font>

Sequencing assay type: hybridization capture ('hybrid'), targeted amplicon sequencing ('amplicon'), or whole genome sequencing ('wgs'). This setting determines whether and how antitarget bins are used.

<font color=darkred>***sequencing-accessible regions***</font>

Many fully sequenced genomes, including the human genome, contain large regions of DNA that are inaccessible to sequencing. (These are mainly the centromeres, telomeres, and highly repetitive regions.) In the FASTA genome sequence these regions are filled in with large stretches of N characters. These regions cannot be mapped by resequencing, so we can avoid them when calculating the antitarget locations by passing the locations of the accessible sequence regions with the -g or --access option.

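To make the idea of accessible regions concrete, here is a small sketch that finds the stretches of a sequence that are not N. It is an illustration only, not CNVkit's own implementation; the `non_n_regions` helper and the toy sequence are made up for this example.

```python
import re


def non_n_regions(sequence):
    """Return 0-based, half-open (start, end) intervals that contain no N."""
    return [(m.start(), m.end()) for m in re.finditer(r"[^Nn]+", sequence)]


# Toy "chromosome": N-filled ends and a short N gap in the middle.
toy = "NNNNACGTACGTNNACGTNNNN"
for start, end in non_n_regions(toy):
    # Print in BED-like form: chromosome, start, end.
    print("toy_chr\t%d\t%d" % (start, end))
```

For the toy sequence this yields the intervals (4, 12) and (14, 18), which is the kind of coordinate list an access BED file contains.
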
To use CNVkit on **amplicon sequencing data** instead of hybrid capture – **although this is not recommended** – you can exclude all off-target regions from the analysis by passing the target BED file as the "access" file as well:

```shell
cnvkit.py batch ... -t Tiled.bed -g Tiled.bed ...
```

This results in empty ".antitarget.cnn" files, which CNVkit will handle safely from version 0.3.4 onward. **However, this approach does not collect any copy number information between targeted regions, so it should only be used if you have in fact prepared your samples with a targeted amplicon sequencing protocol.**

***To reuse an existing reference or create a new one:***

*-r REFERENCE, --reference REFERENCE*

Copy number reference file (.cnn).

*--output-reference FILENAME*

Output filename/path for the new reference file being created. (If given, ignores the -o/--output-dir option and will write the file to the given path. Otherwise, "reference.cnn" will be created in the current directory or specified output directory.)

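The rule for where the new reference file ends up can be written out as a few lines of Python. This is only a sketch of the behaviour described above, not CNVkit's code; the function name and the example paths are hypothetical.

```python
import os


def reference_output_path(output_reference=None, output_dir=None):
    """Resolve where a newly built reference would be written."""
    if output_reference:
        # --output-reference wins and -o/--output-dir is ignored.
        return output_reference
    # Otherwise: reference.cnn in the output directory, or the current one.
    return os.path.join(output_dir or ".", "reference.cnn")


print(reference_output_path())                        # current directory
print(reference_output_path(output_dir="results"))    # inside results
print(reference_output_path("refs/panel.cnn", "results"))
```
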
***--annotate***

The gene annotations file (refFlat.txt) is useful to apply gene names to your baits BED file, if the BED file does not already have short, informative names for each bait interval. This file can be used in the next step.

If the BED file already has a gene name in the fourth column, like this:

> chr1 1508981 1509154 SSU72
>
> chr1 2407978 2408183 PLCH2
>
> chr1 2409866 2410095 PLCH2

Then you don’t need refFlat.txt.

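A quick way to decide whether refFlat.txt is needed is to check that every interval in the BED file has a name column. The sketch below is a hypothetical helper, not part of CNVkit; treating `-` and `.` as "no name" is an assumption made for this example.

```python
def bed_has_names(lines):
    """Return True if every BED data line has a usable 4th (name) column."""
    seen_data = False
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        seen_data = True
        fields = line.split()
        # Assumption: "-" and "." are placeholders, not real names.
        if len(fields) < 4 or fields[3] in {"-", "."}:
            return False
    return seen_data


named = [
    "chr1 1508981 1509154 SSU72",
    "chr1 2407978 2408183 PLCH2",
    "chr1 2409866 2410095 PLCH2",
]
unnamed = ["chr1 1508981 1509154", "chr1 2407978 2408183"]

print(bed_has_names(named))    # names present, refFlat.txt not needed
print(bed_has_names(unnamed))  # no names, annotate with refFlat.txt
```
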
***index files***

If you have prebuilt the index files (.bai for BAM, .fai for FASTA), make sure each index has a later timestamp than the BAM or FASTA file it belongs to.

CNVkit will automatically index the file if needed – that is, if the .bai/.fai file is missing, or if its timestamp is older than that of the corresponding .bam/.fa file.

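The timestamp rule can be checked ahead of time with a few lines of Python. This is a sketch of the rule as described above, not CNVkit's implementation; `index_is_current` and the temporary file names are hypothetical.

```python
import os
import tempfile


def index_is_current(data_path, index_path):
    """True if the index exists and is not older than the file it indexes."""
    if not os.path.exists(index_path):
        return False
    return os.path.getmtime(index_path) >= os.path.getmtime(data_path)


# Demonstration with empty placeholder files and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    bam = os.path.join(tmp, "sample.bam")
    bai = bam + ".bai"
    open(bam, "w").close()
    open(bai, "w").close()

    os.utime(bam, (1000, 1000))
    os.utime(bai, (2000, 2000))      # index newer than BAM
    fresh = index_is_current(bam, bai)

    os.utime(bai, (500, 500))        # index older than BAM
    stale = index_is_current(bam, bai)

print(fresh, stale)
```

If an index turns out to be stale, updating its modification time (for example with `touch`) or rebuilding it avoids an unnecessary re-indexing step.
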
***-s min_gap_size***

Minimum gap size between accessible sequence regions. Regions separated by less than this distance will be joined together. [Default: 5000]

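The joining rule can be illustrated with a short sketch. It assumes all regions lie on a single chromosome and are given as half-open (start, end) pairs; `join_regions` is a hypothetical helper, not CNVkit's own function.

```python
def join_regions(regions, min_gap_size=5000):
    """Join regions whose separating gap is smaller than min_gap_size."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < min_gap_size:
            # Gap to the previous region is too small: extend that region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


regions = [(0, 1000), (3000, 8000), (20000, 25000)]
print(join_regions(regions))
# The 2000 bp gap is bridged; the 12000 bp gap is kept:
# [(0, 8000), (20000, 25000)]
```
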
## Output

1. `.cnn` and `.cns` files for each sample.
2. A whole-genome copy ratio profile as a PDF scatter plot.
3. An ideogram of copy ratios on chromosomes as a PDF.
4. A segment file which can be imported into IGV.