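# RNAseqDownstream2report

[TOC]

Downstream RNA-seq data analysis, from ballgown output to report. The code is mainly Rscript. It connects to the existing PGx RNA-seq choppy pipeline and generates the rds and csv files needed for the RNA-seq analysis report.

## Overall workflow

The workflow consists of the following files: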
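1. **RNAseq_1_ballgown.R**: converts the ballgown directory into a gene expression table (one column per sample, one row per gene).
2. **RNAseq_2_pca.R**: computes PCA.
3. **RNAseq_3_cor.R**: computes correlations and writes the rds and csv files that choppy report needs for the heatmap.
4. **RNAseq_4_pwDEG.R**: computes pairwise differential expression according to the group information.
5. **RNAseq_5_pwGSEA.R**: performs GSEA-based pathway analysis on the gene expression levels.
6. **RNAseq_6_enrichfunc.R**: performs GO and KEGG pathway analysis on the differentially expressed genes.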
```mermaid
graph LR;
    A(input: ballgown dir)-->B[RNAseq_1_ballgown.R];
    B-->C[RNAseq_2_pca.R];
    B-->D[RNAseq_3_cor.R];
    B-->E[RNAseq_4_pwDEG.R];
    B-->F[RNAseq_5_pwGSEA.R];
    E-->G[RNAseq_6_enrichfunc.R];
```
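The dependencies in the flowchart can be sketched as a small driver for steps 2 to 6. This is only an illustration: the functions `run` and `run_downstream` and the `DRY_RUN` switch are hypothetical and not part of the repository, the script names follow the usage examples in this README, and the name of the step 4 output passed to step 6 is assumed from the step 4 examples.

```shell
# Hypothetical driver for steps 2-6 (not part of the repository).
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_downstream() {
    expr_file="$1"          # expression table written by RNAseq_1_ballgown.R
    group_file="$2"         # tab-delimited sample group file
    out_dir="${3:-.}"       # output directory passed to every step
    code="${4:-rnaseq}"     # project code, used as the output prefix

    # Steps 2-5 all read the expression table (edges B-->C, D, E, F).
    run Rscript RNAseq_2_pca.R    -o "$out_dir" -i "$expr_file" -g "$group_file" -p "$code" || return 1
    run Rscript RNAseq_3_cor.R    -o "$out_dir" -i "$expr_file" -g "$group_file" -p "$code" || return 1
    run Rscript RNAseq_4_pwDEG.R  -o "$out_dir" -i "$expr_file" -g "$group_file" -p "$code" || return 1
    run Rscript RNAseq_5_pwGSEA.R -o "$out_dir" -i "$expr_file" -g "$group_file" -p "$code" || return 1
    # Step 6 reads the across-groups DEG list from step 4 (edge E-->G);
    # the file name pattern below is an assumption.
    run Rscript RNAseq_6_enrichfunc.R -o "$out_dir" -i "$out_dir/${code}_degs_acrossgroups.csv" -p "$code"
}
```

Setting `DRY_RUN=1` before calling `run_downstream expr.txt summary_group.txt ./out test` prints the five commands without running them, which is a cheap way to review the arguments first.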
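# Quick start

1. Prepare the input files:
    1. The ballgown directory
    2. summary_group: the sample group information
2. Check the following:
    1. The required R/Bioconductor packages are installed on the machine (see the library section). They are already installed on server 10.157.72.53.
    2. The machine has network access (RNAseq_6_enrichfunc.R needs it for its calculations). The PGx server has to be reconnected to the network every day.
    3. The code, rdata and rds files are all in the working directory:
        1. Code: the RNAseq_1 to RNAseq_6 `*.R` scripts
        2. rds: ID_convert_table.rds
        3. rdata: human_c2_v5p2.rdata, human_c5_v5p2.rdata
3. Run the following commands: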
```shell
Rscript RNAseq_1_ballgown.R -i ./ballgown/
Rscript RNAseq_2_pca.R -i ballgown_geneexp_log2fpkm_floor0p01_c3r58395_2019-04-29.txt -g summary_group.txt
Rscript RNAseq_3_cor.R -o ./ -i ballgown_geneexp_log2fpkm_floor0p01_c3r58395_2019-04-29.txt -g group2.txt
```
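Before running the commands, the checklist above can be verified with a short pre-flight check. This is a sketch under assumptions: `check_inputs` is a hypothetical helper, the ballgown directory is assumed to be named `ballgown`, and the scripts are matched only by their `RNAseq_<n>_` prefix.

```shell
# Hypothetical pre-flight check (not part of the repository).
# Prints one line per missing item and returns non-zero if anything is missing.
# The body runs in a subshell, so its variables do not leak into the caller.
check_inputs() (
    dir="${1:-.}"
    missing=0
    # Reference data named in the checklist.
    for f in ID_convert_table.rds human_c2_v5p2.rdata human_c5_v5p2.rdata; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $f"
            missing=$((missing + 1))
        fi
    done
    # One script per step, matched by prefix only.
    for n in 1 2 3 4 5 6; do
        if ! ls "$dir"/RNAseq_"$n"_*.R >/dev/null 2>&1; then
            echo "missing script: RNAseq_${n}_*.R"
            missing=$((missing + 1))
        fi
    done
    # The ballgown output directory; the name "ballgown" is an assumption.
    if [ ! -d "$dir/ballgown" ]; then
        echo "missing directory: ballgown"
        missing=$((missing + 1))
    fi
    [ "$missing" -eq 0 ]
)
```

Calling `check_inputs .` in the working directory prints nothing and returns 0 when everything is in place. It does not check the installed R packages or the network connection.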
## RNAseq_1_ballgown.R

### Overview

Converts the ballgown directory into a gene expression table (one column per sample, one row per gene).

### Parameters

```shell
Usage: Rscript RNAseq_1_ballgown.R [options]

Options:
    -o OUT_DIR, --out_dir=OUT_DIR
        The output directory [default ~]
    -i INPUT, --input=INPUT
        The directory input of expression files. It is output from ballgown software.
    -f NUMBER, --floor_value=NUMBER
        A number to add to each value before log2 transformation to avoid infinite value.[default: 0.01]
    -l TRUE, --log2_norm=TRUE
        Perform log2 transformation on FPKM value. [default: TRUE]
    -p PROJECT_CODE, --project_code=PROJECT_CODE
        Project code, which is used as prefix of output file. [default: rnaseq]
    -h, --help
        Show this help message and exit
```
Parameter descriptions

| Parameter | Type | Description | Example |
| -------------------------------------------- | ---------- | ------------------------------------------------------------ | ----------- |
| -o OUT_DIR, --out_dir=OUT_DIR | character | Output directory, default `./`. A trailing `/` is optional. | ./ |
| -i INPUT, --input=INPUT | character | Input path: the ballgown directory. **Required.** The files in the directory must be laid out as described at <http://www.bioconductor.org/packages/release/bioc/vignettes/ballgown/inst/doc/ballgown.html>. | ./ballgown/ |
| -f NUMBER, --floor_value=NUMBER | number | Floor value added to each value before the log2 transformation, so that zeros do not become infinite. Default 0.01. | 0.01 |
| -l TRUE, --log2_norm=TRUE | TRUE/FALSE | Whether to apply the log2 transformation | TRUE |
| -p PROJECT_CODE, --project_code=PROJECT_CODE | character | Project code, used as the prefix of the output files. Default rnaseq. | rnaseq |
| -h, --help | | Show the help message and exit | -h |
### Output

A tab-delimited gene expression table, with a file name such as rnaseq_geneexp_fpkm_c4r58395_2019-04-30.txt.

Example content:

> Gene P1 P2 P3
> ENSG00000000003 2.951 5.085 3.592
> ENSG00000000005 -6.644 -4.248 -3.085
> ENSG00000000419 4.966 6.197 5.332
> ENSG00000000457 0.854 1.838 0.665
> ENSG00000000460 -0.19 1.693 0.145
> ENSG00000000938 -6.644 -6.148 -6.644
> ENSG00000000971 -5.919 -6.644 -6.644
> ENSG00000001036 5.134 5.47 4.998
> ENSG00000001084 2.676 2.303 2.638
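The value -6.644 that recurs in the example is what an FPKM of 0 becomes with the default floor: log2(0 + 0.01) ≈ -6.644. The one-liner below reproduces that arithmetic. `log2_floor` is a hypothetical helper written for this illustration, and it assumes the transformation is log2(value + floor), as the `--floor_value` help text describes.

```shell
# log2(value + floor), rounded to three decimals.
# $1 = FPKM value, $2 = floor value (defaults to 0.01, the script default).
log2_floor() {
    awk -v x="$1" -v f="${2:-0.01}" 'BEGIN { printf "%.3f\n", log(x + f) / log(2) }'
}

log2_floor 0        # prints -6.644
```

Any gene with zero FPKM in a sample therefore shows up as -6.644 in the table, unless `-f` is changed or `-l FALSE` is used.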
### Usage examples

```shell
# Minimal input
Rscript RNAseq_1_ballgown.R -i ./ballgown/
# With additional options
Rscript RNAseq_1_ballgown.R -o /home/yuying/rnaseqreport_test -i ./ballgown/ -l FALSE -p test
```
## RNAseq_2_pca.R

### Overview

Computes PCA and writes the rds and csv files that choppy report needs for the scatter plot.

### Parameters

```shell
Usage: Rscript RNAseq_2_pca.R [options]

Options:
    -o OUT_DIR, --out_dir=OUT_DIR
        The output directory [default ~]
    -i INPUT, --input=INPUT
        The input expression files. required!
    -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP
        File for sample group infomation.The input file containing sample name and group infomation. note colname must be like: sample group1 group2...
    -p PROJECT_CODE, --project_code=PROJECT_CODE
        Project code, which is used as prefix of output file. [default: rnaseq]
    -h, --help
        Show this help message and exit
```
| Parameter | Type | Description | Example |
| -------------------------------------------- | --------- | ------------------------------------------------------------ | ----------- |
| -o OUT_DIR, --out_dir=OUT_DIR | character | Output directory, default `./`. A trailing `/` is optional. | ./ |
| -i INPUT, --input=INPUT | character | Input file name. **Required.** The expression table must be log-scaled and tab-delimited; the output file of RNAseq_1_ballgown.R can be used. | example.txt |
| -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP | character | Tab-delimited sample group information: one row per sample, one column per grouping. Several group columns are allowed. Optional. | group.txt |
| -p PROJECT_CODE, --project_code=PROJECT_CODE | character | Project code, used as the prefix of the output files. Default rnaseq. | rnaseq |
| -h, --help | | Show the help message and exit | -h |
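The layout of the group file can be checked before it is passed to `-g`. The sketch below is a hypothetical helper, not part of the repository. It reads the help text ("colname must be like: sample group1 group2...") strictly, which is an assumption: the first column must be named `sample`, the other columns `group` followed by a number, and every row must have as many tab-separated fields as the header.

```shell
# Hypothetical validator for the tab-delimited sample group file.
# Prints one message per problem and returns non-zero if any was found.
check_group_file() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NR == 1 {
            ncol = NF
            if ($1 != "sample") { print "first column must be named sample"; bad = 1 }
            for (i = 2; i <= NF; i++)
                if ($i !~ /^group[0-9]+$/) { print "unexpected column name: " $i; bad = 1 }
            next
        }
        NF != ncol { print "line " NR ": expected " ncol " fields, found " NF; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

A file with the header `sample`, `group1`, `group2` and one complete row per sample passes silently. If the scripts accept other column names, the pattern in the loop should be relaxed accordingly.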
### Output

The PC values of every sample, written as the rds and csv files that choppy report needs for the scatter plot (only the rds file is used for plotting; the csv file is for inspection only):

- rnaseq_pca.csv: comma-delimited file
- rnaseq_pca.rds: R object

Content:

> "","PC1","PC2","PC3","sample","group1","group2"
> "P1",-135.940,-151.769,-4.017e-15,"P1","A","test1"
> "P2",259.848,-4.758,-1.906e-13,"P2","B","test2"
> "P3",-123.908,156.528,2.464e-13,"P3","B","test1"
### Usage examples

```shell
# Minimal input
Rscript RNAseq_2_pca.R -i ballgown_geneexp_log2fpkm_floor0p01_c3r58395_2019-04-29.txt
# With additional options
Rscript RNAseq_2_pca.R -o ./ -i ballgown_geneexp_log2fpkm_floor0p01_c3r58395_2019-04-29.txt -g group2.txt -p test
```

### choppy report

With group information:

@scatter-plot(dataFile='/*yourdir*/data/rnaseq_pca.rds', dataType='rds', xAxis='PC1', xTitle="",yAxis='PC2',yTitle="",colorAttr="group1")

Without group information:

@scatter-plot(dataFile='/*yourdir*/data/rnaseq_pca.rds', dataType='rds', xAxis='PC1', xTitle="",yAxis='PC2',yTitle="")
## RNAseq_3_cor.R

### Overview

Computes correlations and writes the rds and csv files that choppy report needs for the heatmap.

### Parameters

```shell
Usage: Rscript RNAseq_3_cor.R [options]

Options:
    -o OUT_DIR, --out_dir=OUT_DIR
        The output directory [default ./]
    -i INPUT, --input=INPUT
        The input expression files. required!
    -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP
        File for sample group infomation.The input file containing sample name and group infomation. note colname must be like: sample group1 group2...
    -p PROJECT_CODE, --project_code=PROJECT_CODE
        Project code, which is used as prefix of output file. [default: rnaseq]
    -h, --help
        Show this help message and exit
```
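| Parameter | Type | Description | Example |
| -------------------------------------------- | --------- | ------------------------------------------------------------ | ----------- |
| -o OUT_DIR, --out_dir=OUT_DIR | character | Output directory, default `./`. A trailing `/` is optional. | ./ |
| -i INPUT, --input=INPUT | character | Input file name. **Required.** The expression table must be log-scaled and tab-delimited; the output file of RNAseq_1_ballgown.R can be used. | example.txt |
| -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP | character | Tab-delimited sample group information: one row per sample, one column per grouping. Several group columns are allowed. This input does not affect the output of this script; the parameter exists only so that the scripts share the same interface when run in sequence. | group.txt |
| -p PROJECT_CODE, --project_code=PROJECT_CODE | character | Project code, used as the prefix of the output files. Default rnaseq. | rnaseq |
| -h, --help | | Show the help message and exit | -h |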
### Output

The Pearson correlation matrix between every pair of samples, written as the rds and csv files that choppy report needs for the heatmap (only the rds file is used for plotting; the csv file is for inspection only):

- rnaseq_cor.csv: comma-delimited file
- rnaseq_cor.rds: R object

Content:

> "","P1","P2","P3"
> "P1",1,0.898,0.944
> "P2",0.898,1,0.901
> "P3",0.944,0.901,1

### Usage examples

```shell
# Minimal input
Rscript RNAseq_3_cor.R -i ballgown_geneexp_log2fpkm_floor0p01_c3r58395_2019-04-29.txt
# With additional options
Rscript RNAseq_3_cor.R -o ./ -i ballgown_geneexp_log2fpkm_floor0p01_c3r58395_2019-04-29.txt -g group2.txt -p test
```

### choppy report

@heatmap-d3(dataFile='/*yourdir*/data/rnaseq_cor.rds', dataType='rds', labCol='TRUE')
## RNAseq_4_pwDEG.R

### Overview

Computes differentially expressed genes (DEGs) for every pair of groups defined in sample_group, and writes the DEG lists and the files needed for the volcano plot in the choppy report system. DEG cutoff: t-test p < 0.05, and log2 fold change >= 1 or <= -1.

### Parameters

```shell
Usage: Rscript RNAseq_4_pwDEG.R [options]

Options:
    -o OUT_DIR, --out_dir=OUT_DIR
        The output directory [default ./]
    -i INPUT, --input=INPUT
        The input expression files. required!
    -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP
        File for sample group infomation.The input file containing sample name and group infomation. note colname must be like: sample group1 group2...
    -p PROJECT_CODE, --project_code=PROJECT_CODE
        Project code, which is used as prefix of output file. [default: rnaseq]
    -a FALSE, --output_all_genes=FALSE
        Output rds files for choppy contains all genes. By default, only DEGs are listed in the output rds and csv for report. NOTE choppy report may not be availble to display correctly if too many points exit. [default: FALSE]
    -b FALSE, --low_expr_filter=FALSE
        Conduct low expression filtering before DEG analysis. [default: FALSE]
    -f NUMBER, --low_expr_filter_cutoff=NUMBER
        Genes across all samples with expreesion lower than this value will be filtered out [default: 0]
    -h, --help
        Show this help message and exit
```
| Parameter | Type | Description | Example |
| -------------------------------------------- | ---------- | ------------------------------------------------------------ | ----------- |
| -o OUT_DIR, --out_dir=OUT_DIR | character | Output directory, default `./`. A trailing `/` is optional. | ./ |
| -i INPUT, --input=INPUT | character | Input file name. **Required.** The expression table must be log-scaled and tab-delimited; the output file of RNAseq_1_ballgown.R can be used. | example.txt |
| -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP | character | Tab-delimited sample group information. **Required.** One row per sample, one column per grouping. Several group columns are allowed. | group.txt |
| -p PROJECT_CODE, --project_code=PROJECT_CODE | character | Project code, used as the prefix of the output files. Default rnaseq. | rnaseq |
| -a FALSE, --output_all_genes=FALSE | TRUE/FALSE | Whether to output all genes | FALSE |
| -b FALSE, --low_expr_filter=FALSE | TRUE/FALSE | Whether to filter out low-expression genes before the DEG analysis | FALSE |
| -f NUMBER, --low_expr_filter_cutoff=NUMBER | number | Genes whose expression is below this value in all samples are filtered out as low-expression genes | 0 |
| -h, --help | | Show the help message and exit | -h |
### Output

The script writes a series of output files:

1. PROJECT_CODE_GROUPversus_degs.csv (rds)

    The groups are compared pairwise. Each comparison writes one PROJECT_CODE_GROUPversus_degs.csv and one PROJECT_CODE_GROUPversus_degs.rds, so the number of files is twice the number of comparisons. These are the rds and csv files that choppy report needs for the volcano plot (only the rds file is used for plotting; the csv file is for inspection only).

    For example:

    - rnaseq_AvsB_degs.csv
    - rnaseq_AvsB_degs.rds

    Each file lists, for one comparison, the genes, the compared groups, log2FC, -log10 p value, and whether the gene is a DEG. By default only DEGs are written. With `-a TRUE`, all genes are written and the sigene column marks each gene as nonDEG or DEG.

    > "gene","versus","logfc","log10p","sigene"
    > "ENSG00000004776","A vs B",-2.82273809523809,1.50125858539637,"DEG"
    > "ENSG00000007171","A vs B",3.06833333333333,1.62715851425345,"DEG"
    > "ENSG00000011347","A vs B",-1.10492857142857,1.31435980565864,"DEG"
    > "ENSG00000011677","A vs B",-2.19445238095238,1.55184587515544,"DEG"

2. PROJECT_CODE_degs_acrossgroups.csv

    All DEGs from item 1 (DEGs only) collected into one file, used to display the DEG list table.

    > "gene","versus","pvalue","log2FC"
    > "ENSG00000004776","A vs B",0.03153,-2.823
    > "ENSG00000007171","A vs B",0.0236,3.068
    > "ENSG00000011347","A vs B",0.04849,-1.105
    > "ENSG00000011677","A vs B",0.02806,-2.194
    > "ENSG00000012504","A vs B",0.04607,3.357

3. PROJECT_CODE_degs_stats.csv (rds)

    DEG counts, used for the barplot in the choppy report system.

    > "number","type","versus"
    > 65,"upregulated","A vs B"
    > 138,"downregulated","A vs B"
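The counts in the stats file can be cross-checked against the across-groups DEG list with a few lines of awk. This is a sketch, not the script's own logic: `count_degs` is a hypothetical helper, it assumes the four-column layout shown in item 2, and it assumes that a positive log2FC is what the script labels "upregulated".

```shell
# Count up- and down-regulated genes per comparison from the across-groups DEG list.
# Output columns mirror the stats file: number,type,versus.
count_degs() {
    awk -F',' 'NR > 1 {
        gsub(/"/, "", $2)                          # drop the quotes around the comparison label
        type = ($4 + 0 > 0) ? "upregulated" : "downregulated"
        n[type "," $2]++
    }
    END { for (k in n) print n[k] "," k }' "$1" | sort
}
```

Run on the five example rows of item 2, it reports 2 upregulated and 3 downregulated genes for "A vs B".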
### Usage examples

```shell
# Minimal input
Rscript RNAseq_4_pwDEG.R -i example_geneexp_log2fpkm_floor0p01_c13r58395_2019-04-30.txt -g group13_1.txt
# With additional options
Rscript RNAseq_4_pwDEG.R -i example_geneexp_log2fpkm_floor0p01_c13r58395_2019-04-30.txt -g group13_1.txt -p example_2 --low_expr_filter=TRUE -f -1
```

### choppy report

1. Volcano plot

    @scatter-plot(dataFile='/*yourdir*/data/rnaseq_AvsB_degs.rds', dataType='rds', xAxis='logfc', xTitle="log2FC",yAxis='log10p',yTitle="-log10 (p)", colorAttr="sigene")

2. DEG list table

    @data-table-js(dataUrl='/*yourdir*/data/rnaseq_degs_acrossgroups.csv')

3. Overall number of DEGs

    @stack-barplot-r(dataFile='/*yourdir*/data/rnaseq_degs_stats.rds', dataType='rds', xAxis='versus', yAxis='number',labelAttr='type',barPos='stack')
## RNAseq_5_pwGSEA.R

### Overview

Uses the fgsea package to run GSEA pathway analysis on the genes of each comparison. For the principle behind GSEA, see <https://www.pnas.org/content/102/43/15545>.

### Parameters

```shell
Usage: Rscript RNAseq_5_pwGSEA.R [options]

Options:
    -o OUT_DIR, --out_dir=OUT_DIR
        The output directory [default ./]
    -i INPUT, --input=INPUT
        The input expression files. Required!
    -e TYPE_GENE_ID, --type_gene_id=TYPE_GENE_ID
        The type of gene symbol. Could be either of EnsemblGID/EntrezID/GeneSymbol [default: EnsemblGID]
    -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP
        File for sample group infomation.The input file containing sample name and group infomation. note colname must be like: sample group1 group2... Required!
    -f NUMBER, --padjvalueCutoff=NUMBER
        Cutoff value of adjusted p value. [default: 0.2]
    -p PROJECT_CODE, --project_code=PROJECT_CODE
        Project code, which is used as prefix of output file. [default: rnaseq]
    -h, --help
        Show this help message and exit
```
| Parameter | Type | Description | Example |
| -------------------------------------------- | --------- | ------------------------------------------------------------ | ----------- |
| -o OUT_DIR, --out_dir=OUT_DIR | character | Output directory, default `./`. A trailing `/` is optional. | ./ |
| -i INPUT, --input=INPUT | character | Input file name. **Required.** The expression table must be log-scaled and tab-delimited; the output file of RNAseq_1_ballgown.R can be used. | example.txt |
| -e TYPE_GENE_ID, --type_gene_id=TYPE_GENE_ID | character | Gene ID type: Ensembl gene ID (EnsemblGID), Entrez Gene ID (EntrezID) or Gene Symbol (GeneSymbol). Default EnsemblGID. | EnsemblGID |
| -g SAMPLE_GROUP, --sample_group=SAMPLE_GROUP | character | Tab-delimited sample group information. **Required.** One row per sample, one column per grouping. Several group columns are allowed. | group.txt |
| -f NUMBER, --padjvalueCutoff=NUMBER | number | Adjusted p value cutoff for the enrichment analysis | 0.2 |
| -p PROJECT_CODE, --project_code=PROJECT_CODE | character | Project code, used as the prefix of the output files. Default rnaseq. | rnaseq |
| -h, --help | | Show the help message and exit | -h |
### Usage examples

```shell
Rscript RNAseq_5_pwGSEA.R -o /home/yuying/rnaseqreport_test -i example_geneexp_log2fpkm_floor0p01_c13r58395_2019-04-30.txt -g group13_1.txt
```

### Output

1. rnaseq_gsea_curatedgenesets.csv: GSEA enrichment results based on curated gene sets.
2. rnaseq_gsea_go.csv: GSEA enrichment results based on GO terms.
  291. > "","versus","pathway","pval","padj","ES","NES","nMoreExtreme","size","leadingEdge"
  292. > "1","test1 vs test2","GO_REGULATION_OF_CELL_ACTIVATION",0.001007,0.1232,-0.4512,-1.458,0,458,"2302, 3127, 3123, 6352, 3119, 912, 972, 2625, 3122, 958, 8808, 53833, 284021, 10451, 154, 301, 55024, 3109, 923, 348, 8456, 84433, 51237, 4323, 7409, 919, 11005, 3606, 727897, 3929, 7056, 114548, 857, 10673, 695, 163747, 3120, 683, 124912, 29126, 114771, 2150, 3113, 3956, 22914, 4773, 83639, 634, 1029, 1236, 29108, 90865, 441478, 3600, 5896, 51083, 89780, 10148, 3965, 9173, 2323, 84106, 51744, 282618, 2852, 2056, 1269, 5592, 5724, 3623, 3567, 11314, 23529, 7474, 558, 10461, 283234, 11148, 5341, 3273, 9466, 22890, 22806, 917, 8832, 5588, 3952, 282616, 2207, 3111, 3574, 84807, 2268, 282617, 6869, 3659, 6441, 8772, 3575, 8546, 1948, 246778, 1178, 5585, 84959, 2064"
  293. > "2","test1 vs test2","GO_IMMUNE_EFFECTOR_PROCESS",0.001007,0.1232,-0.4688,-1.515,0,455,"5473, 6374, 3127, 1380, 3123, 8519, 3119, 972, 2625, 10581, 958, 722, 284021, 3437, 3627, 8284, 56892, 10451, 3075, 1755, 3428, 51191, 4353, 644150, 28984, 4939, 55061, 114836, 717, 117157, 9245, 8809, 566, 7409, 919, 154064, 3929, 23705, 340061, 55601, 114548, 4599, 9844, 730, 2633, 3433, 10410, 3764, 2150, 7098, 91607, 60489, 3956, 3439, 3426, 29108, 90865, 10584, 725, 10417, 3078, 715, 3383, 84871, 3434, 10964, 51744, 64218, 1621, 282618, 23586, 3665, 4600, 78989, 710, 733, 3815, 3936, 1636"
Meaning of each column:

A table with GSEA results. Each row corresponds to a tested pathway. The columns are the following:

- versus – compared groups;
- pathway – name of the pathway as in 'names(pathway)';
- pval – an enrichment p-value;
- padj – a BH-adjusted p-value;
- ES – enrichment score, same as in Broad GSEA implementation;
- NES – enrichment score normalized to mean enrichment of random samples of the same size;
- nMoreExtreme – a number of times a random gene set had a more extreme enrichment score value;
- size – size of the pathway after removing genes not present in 'names(stats)';
- leadingEdge – vector with indexes of leading edge genes that drive the enrichment, see <http://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm#_Running_a_Leading>.

### choppy report

@data-table-js(dataUrl='/*yourdir*/data/rnaseq_gsea_curatedgenesets.csv')

@data-table-js(dataUrl='/*yourdir*/data/rnaseq_gsea_go.csv')
## RNAseq_6_enrichfunc.R

### Overview

Uses the clusterProfiler package to run GO and KEGG pathway analysis on the DEGs of each comparison.

Input: the DEG list table written by RNAseq_4_pwDEG.R (PROJECT_CODE_degs_acrossgroups.csv).

*Network access is required.*

This step is slow: the analysis of each comparison takes about 5 minutes.

### Parameters

```shell
Usage: Rscript RNAseq_6_enrichfunc.R [options]

Options:
    -o OUT_DIR, --out_dir=OUT_DIR
        The output directory [default ./]
    -i INPUT, --input=INPUT
        The input DEG list in csv format. The first column: gene; second column: group. Required!
    -e TYPE_GENE_ID, --type_gene_id=TYPE_GENE_ID
        The type of gene symbol. Could be either of EnsemblGID/EntrezID/GeneSymbol [default: EnsemblGID]
    -f NUMBER, --pvalueCutoff=NUMBER
        Cutoff value of p value. [default: 0.05]
    -m PADJUSTMETHOD, --pAdjustMethod=PADJUSTMETHOD
        Method of adjust p value. One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". [default: BH]
    -q NUMBER, --qvalueCutoff=NUMBER
        Cutoff value of q value. [default: 0.2]
    -p PROJECT_CODE, --project_code=PROJECT_CODE
        Project code, which is used as prefix of output file. [default: rnaseq]
    -h, --help
        Show this help message and exit
```
| Parameter | Type | Description | Example |
| ----------------------------------------------- | --------- | ------------------------------------------------------------ | ----------- |
| -o OUT_DIR, --out_dir=OUT_DIR | character | Output directory, default `./`. A trailing `/` is optional. | ./ |
| -i INPUT, --input=INPUT | character | Input file name. **Required.** A DEG list in csv format with the gene in the first column and the comparison group in the second; the DEG list table written by RNAseq_4_pwDEG.R can be used. | rnaseq_degs_acrossgroups.csv |
| -e TYPE_GENE_ID, --type_gene_id=TYPE_GENE_ID | character | Gene ID type: Ensembl gene ID (EnsemblGID), Entrez Gene ID (EntrezID) or Gene Symbol (GeneSymbol). Default EnsemblGID. | EnsemblGID |
| -f NUMBER, --pvalueCutoff=NUMBER | number | p value cutoff for the enrichment analysis | 0.05 |
| -m PADJUSTMETHOD, --pAdjustMethod=PADJUSTMETHOD | character | p value adjustment method | BH |
| -q NUMBER, --qvalueCutoff=NUMBER | number | q value cutoff for the enrichment analysis | 0.2 |
| -p PROJECT_CODE, --project_code=PROJECT_CODE | character | Project code, used as the prefix of the output files. Default rnaseq. | rnaseq |
| -h, --help | | Show the help message and exit | -h |
### Output

GO and KEGG pathway enrichment results.

Content:
  348. > "","versus","ID","Description","GeneRatio","BgRatio","pvalue","p.adjust","qvalue","geneID","Count"
  349. > "1","A vs B","hsa05168","Herpes simplex virus 1 infection","39/185","492/7847",9.625e-12,2.117e-09,2.057e-09,"256051/684/7568/10172/84765/80095/162963/84436/55762/55769/55786/90594/148268/342908/30832/348327/100129543/81931/390927/147837/57573/388566/91120/113835/163059/84671/65251/79973/126017/147949/374900/7594/3111/3135/728927/100129842/59348/26974/7772",39
### Usage examples

```shell
# Minimal input
Rscript RNAseq_6_enrichfunc.R -i rnaseq_degs_acrossgroups.csv
```

### choppy report

@data-table-js(dataUrl='/*yourdir*/data/rnaseq_GOenrich.csv')

@data-table-js(dataUrl='/*yourdir*/data/rnaseq_KEGGenrich.csv')