GSEA是非常常見的富集分析方式,以前我們做GSEA需要用依賴java的GSEA軟件,那個(gè)時(shí)候準(zhǔn)備分析的文件可能要花上很長(zhǎng)時(shí)間,報(bào)錯(cuò)還不知道如何處理?,F(xiàn)在我們來(lái)學(xué)習(xí)一下R語(yǔ)言進(jìn)行GSEA分析。
加載R包
rm(list = ls())
library(ReactomePA)
library(tidyverse)
library(data.table)
library(org.Hs.eg.db)
library(clusterProfiler)
library(biomaRt)
library(enrichplot)
導(dǎo)入文件
導(dǎo)入的文件是兩組間差異分析的結(jié)果,有基因名和logFC
genelist_input <- fread(file="GENE.txt", header = T, sep='\t', data.table = F)
genename <- as.character(genelist_input[,1]) #提取第一列基因名
基因名轉(zhuǎn)換
將基因名轉(zhuǎn)換成ENTREZID
gene_map <- select(org.Hs.eg.db, keys=genename, keytype="SYMBOL", columns=c("ENTREZID"))
colnames(gene_map)[1]<-"Gene"
write.csv(as.data.frame(gene_map),"基因轉(zhuǎn)換.csv",row.names =F)#導(dǎo)出結(jié)果至默認(rèn)路徑下
將ENTREZID與logFC結(jié)合,并根據(jù)logFC的值降序排列
aaa<-inner_join(gene_map,genelist_input,by = "Gene")
aaa<-aaa[,-1]
aaa<-na.omit(aaa)
aaa$logFC<-sort(aaa$logFC,decreasing = T)
GSEA文件準(zhǔn)備
整理成GSEA分析的格式
geneList = aaa[,2]
names(geneList) = as.character(aaa[,1])
geneList
開始分析
接下來(lái)可以進(jìn)行富集分析了,并保存結(jié)果文件
#GSEA分析——GO
Go_gseresult <- gseGO(geneList, 'org.Hs.eg.db', keyType = "ENTREZID", ont="all", nPerm = 1000, minGSSize = 10, maxGSSize = 1000, pvalueCutoff=1)
#GSEA分析——KEGG
KEGG_gseresult <- gseKEGG(geneList, nPerm = 1000, minGSSize = 10, maxGSSize = 1000, pvalueCutoff=1)
#GSEA分析——Reactome
Go_Reactomeresult <- gsePathway(geneList, nPerm = 1000, minGSSize = 10, maxGSSize = 1000, pvalueCutoff=1)
#保存文件
write.table (Go_gseresult, file ="Go_gseresult.csv", sep =",", row.names =TRUE)
write.table (KEGG_gseresult, file ="KEGG_gseresult.csv", sep =",", row.names =TRUE)
write.table (Go_Reactomeresult, file ="Go_Reactomeresult.csv", sep =",", row.names =TRUE)
可視化
用波浪圖展示GO的前10個(gè)結(jié)果
#波浪圖
ridgeplot(Go_gseresult,10) #輸出前十個(gè)結(jié)果
單個(gè)GSEA結(jié)果的展示
方法1
#富集曲線圖類型1:
gseaplot(Go_Reactomeresult,1,pvalue_table = TRUE) #輸出第1個(gè)結(jié)果
方法2
#富集曲線圖類型2:
gseaplot2(Go_Reactomeresult,212,pvalue_table = TRUE)#輸出第212個(gè)結(jié)果
方法3 同時(shí)展示多個(gè)結(jié)果
#gseaplot2還可以同時(shí)顯示復(fù)數(shù)個(gè)功能組的富集曲線,并標(biāo)記P值:
gseaplot2(Go_Reactomeresult, 1:4, pvalue_table = TRUE)
本文的示例文件及全部代碼
聯(lián)系客服