久久精品一区二区国产,天天久久,人人爱天天做夜夜爽

??專注R語言在??生物醫(yī)學(xué)中的使用

設(shè)為“星標(biāo)”，精彩不錯過

藥物敏感性分析是生信數(shù)據(jù)挖掘常用的技能之一，目前做藥敏分析最常見的就是兩個R包：pRRophetic和oncoPredict。

這兩個包的作者都是同一個人，oncoPredict可以看做是pRRophetic的升級版。兩個R包的使用基本上是一樣的思路，只不過使用的訓(xùn)練數(shù)據(jù)集不同而已。

在介紹R包的使用之前，需要大家先了解一下常用的藥物敏感性數(shù)據(jù)庫，最好是去到這些數(shù)據(jù)庫的主頁看看或者讀一讀相關(guān)的文獻，對這些數(shù)據(jù)庫有一個大致的了解。

常用藥敏數(shù)據(jù)庫
pRRophetic方法學(xué)介紹
安裝
預(yù)測不同組別患者對化療藥物的敏感性
其他示例

pRRopheticCV
使用CCLE示例數(shù)據(jù)
自定義訓(xùn)練集

自帶數(shù)據(jù)探索
預(yù)測全的藥物的敏感性
參考

常用藥敏數(shù)據(jù)庫

藥敏數(shù)據(jù)庫非常多，但最常用的無非就是GDSC/CTRP/CCLE等，在珠江腫瘤公眾號中早就介紹過這些數(shù)據(jù)庫了，所以我就不重復(fù)了，大家可以參考以下鏈接。

以下鏈接介紹了GDSC、CTRP、CCLE、NCI-60、DepMap、Pharmacodb等數(shù)據(jù)庫，是非常棒的參考資料：

腫瘤藥敏多組學(xué)數(shù)據(jù)庫(GDSC)概覽
腫瘤藥敏多組學(xué)數(shù)據(jù)庫(GDSC)的數(shù)據(jù)介紹和獲取
GDSC與其他藥敏多組學(xué)數(shù)據(jù)庫
GDSC與CELL數(shù)據(jù)庫的藥物基因組學(xué)一致性
靶點表達水平可作為靶向藥物敏感性的指標(biāo)

pRRophetic方法學(xué)介紹

這個R包的思路其實很簡單，就是根據(jù)已知的細胞系表達矩陣和藥物敏感性信息作為訓(xùn)練集建立模型，然后對新的表達矩陣進行預(yù)測。已知的信息就是從直接從上面介紹的數(shù)據(jù)庫下載的，pRRophetic包使用的是CGP和CCLE的數(shù)據(jù)，但是CCLE的藥敏數(shù)據(jù)只有24種藥物和500多個細胞系，數(shù)據(jù)量比較少，所以通常大家使用的都是CGP的數(shù)據(jù)。

作者專門發(fā)了一篇文章，詳細介紹該包背后的方法和原理：Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines。

作者把上面這篇文獻中的方法變成了pRRophetic包，又發(fā)了一篇文章，非常妙：pRRophetic: An R Package for Prediction of Clinical Chemotherapeutic Response from Tumor Gene Expression Levels

其中有一個簡化版的方法學(xué)介紹，我截圖如下，如果想要詳細了解，建議閱讀原文獻哦：

簡單來說，使用的表達矩陣是芯片數(shù)據(jù)，訓(xùn)練集和測試集分別進行quantile-normalization, 去除方差低的基因，用每個基因作為預(yù)測變量，藥物的IC50作為結(jié)果變量擬合嶺回歸，然后使用擬合好的模型對新的數(shù)據(jù)進行預(yù)測。

安裝

這個R包非常古老，雖然文章里說會不斷更新，但是很明顯沒有更新過。

需要首先安裝依賴包，然后通過以下鏈接下載pRRophetic_0.5.tar.gz這個壓縮包，進行本地安裝。

鏈接：https://osf.io/dwzce/?action=download

#安裝依賴包
BiocManager::install(c("car", "ridge", "preprocessCore", "genefilter", "sva"))
#本地安裝
install.packages("E:/R/R包/pRRophetic_0.5.tar.gz", repos = NULL, type = "source")

這個包太老了，有些版本比較新的R可能安裝不了，我使用的R4.2.0安裝沒有任何問題。

但是安裝之后還是不能使用，因為它太古老了，可能會遇到以下報錯：

# 報錯：
Error in if (class(exprMatUnique) == "numeric") { :
the condition has length > 1

# 或者
Error in if (class(testExprData) != "matrix") stop("ERROR: \"testExprData\" must be a matrix.") : the condition has length > 1

遇到了不要驚慌，畢竟果子老師已經(jīng)幫我們解決這個問題，按照果子老師的介紹重新安裝即可：

基因表達量預(yù)測藥物反應(yīng)的R包pRRophetic近期報錯解決方案

預(yù)測不同組別患者對化療藥物的敏感性

在包的github中作者給了一個使用示例：https://github.com/paulgeeleher/pRRophetic/blob/master/vignetteOutline.pdf

下面我們結(jié)合這個示例簡單介紹下這個包的使用。

通常我們會根據(jù)某種方法把所有樣本分為不同的組（比如最常見的高風(fēng)險組/低風(fēng)險組，或者不同的分子亞型等），然后想看看不同的組對某種藥物的敏感性。

這個包就可以幫你做這樣的事情，而且只需要你提供自己的表達矩陣即可，它默認會使用cgp2014的數(shù)據(jù)作為訓(xùn)練集建立模型，然后對你的表達矩陣進行預(yù)測，這樣你就可以得到每個樣本的IC50值。

除了cgp2014，你還可以選擇cgp2016作為訓(xùn)練數(shù)據(jù)，cgp2016有251種藥物，cgp2014只有138種。

前面介紹過的GDSC（Genomics of Drug Sensitivity in Cancer），是CGP項目（Cancer Genome Project）的一部分。CGP的官網(wǎng)：https://www.cancerrxgene.org/。

加載R包：

library(pRRophetic)

在預(yù)測對某個藥物的敏感性前，最好先評估數(shù)據(jù)的正態(tài)性，因為CGP中的許多藥物的IC50并不是呈正態(tài)分布的，此時是不適合使用線性模型的。

用R包自帶的硼替佐米數(shù)據(jù)做個演示，先看下硼替佐米這個藥的IC50是不是符合正態(tài)分布：

data("bortezomibData")
pRRopheticQQplot("Bortezomib")

從這個QQ圖來看其實不是非常符合，但還算可以，我們就認為它符合吧。

然后就可以用pRRopheticPredict預(yù)測對這個藥物的敏感性了，這也是這個包最重要的函數(shù)。

我們這里用的是示例表達矩陣，你用的時候只需要換成自己的表達矩陣即可。

exprDataBortezomib是一個標(biāo)準(zhǔn)的表達矩陣，行是基因，列是樣本：

dim(exprDataBortezomib) #22283個基因，264個樣本
## [1] 22283   264
exprDataBortezomib[1:4,1:4]
##       GSM246523 GSM246524 GSM246525 GSM246526
## <NA>   235.5230  498.2220  309.2070  307.5690
## RFC2    41.4470   69.0219   69.3994   36.9310
## HSPA6   84.8689   56.8352   49.4388   54.6669
## PAX8   530.4490  457.9310  536.1780  325.3630

預(yù)測：

predictedPtype <- pRRopheticPredict(testMatrix = exprDataBortezomib, #表達矩陣
                                    drug = "Bortezomib", # 藥物
                                    tissueType = "blood", 
                                    batchCorrect = "eb", #訓(xùn)練集和測試集數(shù)據(jù)整合方法，默認eb，即使用combat
                                    powerTransformPhenotype = T, # 是否進行冪轉(zhuǎn)換
                                    selection=1, # 遇到名字重復(fù)的基因取平均值
                                    dataset = "cgp2014")
## 
##  11683  gene identifiers overlap between the supplied expression matrices... 
## 
## Found2batches
## Adjusting for0covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## 
##  2324 low variabilty genes filtered.
## Fitting Ridge Regression model... Done
## 
## Calculating predicted phenotype...Done

tissueType指定你想用CGP細胞系中的哪些類型腫瘤作為訓(xùn)練集，默認是all；

結(jié)果是一個命名向量，就是每個樣本的IC50值：

head(predictedPtype)
## GSM246523 GSM246524 GSM246525 GSM246526 GSM246527 GSM246528 
## -6.808324 -5.557028 -5.382334 -3.999054 -6.330220 -4.751816

這個示例數(shù)據(jù)中所有的樣本可以被分為兩組，一組是NR組，另一組是R組，通常你的表達矩陣也會分組的，比如根據(jù)某個方法分成高風(fēng)險組和低風(fēng)險組，一樣的意思。

我們就可以對兩組的IC50做個t檢驗：

t.test(predictedPtype[((studyResponse == "PGx_Responder = NR") & bortIndex)],
       predictedPtype[((studyResponse == "PGx_Responder = R") & bortIndex)],
       alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  predictedPtype[((studyResponse == "PGx_Responder = NR") & bortIndex)] and predictedPtype[((studyResponse == "PGx_Responder = R") & bortIndex)]
## t = 4.1204, df = 165.24, p-value = 2.984e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.3975589       Inf
## sample estimates:
## mean of x mean of y 
## -4.372173 -5.036370

還可以畫個圖展示：

library(ggplot2)
library(ggpubr)

df <- stack(list(NR=predictedPtype[((studyResponse == "PGx_Responder = NR") & bortIndex)], 
                 R=predictedPtype[((studyResponse == "PGx_Responder = R") & bortIndex)]))

ggboxplot(df, x="ind",y="values",fill="ind",alpha=0.3,palette = "lancet",
          ylab="Predicted Bortezomib Sensitivity",
          xlab="Clinical Response"
          )+
  stat_compare_means(method = "t.test")+
  theme(legend.position = "none")

這張圖就是文獻中最常見的圖了。

下面再展示下預(yù)測對厄洛替尼的敏感性，這個藥物的IC50明顯不符合正態(tài)分布：

pRRopheticQQplot("Erlotinib")

所以此時我們使用pRRopheticLogisticPredict函數(shù)預(yù)測樣本的IC50值：

predictedPtype_erlotinib <- pRRopheticLogisticPredict(exprDataBortezomib,
                                                      "Erlotinib",
                                                      selection=1)
## 
##  11683  gene identifiers overlap between the supplied expression matrices... 
## 
## Found2batches
## Adjusting for0covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## Fitting model, may take some time....

后面的分析就都是一樣的了~

其他示例

pRRopheticCV

為了說明這個包的預(yù)測結(jié)果的準(zhǔn)確性，還可以使用pRRopheticCV函數(shù)查看預(yù)測結(jié)果和真實結(jié)果的一致性，使用5折交叉驗證：

cvOut <- pRRopheticCV("Bortezomib", cvFold=5, testExprData=exprDataBortezomib)
## 
##  11683  gene identifiers overlap between the supplied expression matrices... 
## 
## Found2batches
## Adjusting for0covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## 
## 1 of 5 iterations complete.
## 2 of 5 iterations complete.
## 3 of 5 iterations complete.
## 4 of 5 iterations complete.
## 5 of 5 iterations complete.

查看結(jié)果：

summary(cvOut)
## 
## Summary of cross-validation results:
## 
## Pearsons correlation: 0.44 , P =  6.37210297840637e-15 
## R-squared value: 0.2
## Estimated 95% confidence intervals: -4.21, 3.36
## Mean prediction error: 1.61

真實結(jié)果和預(yù)測結(jié)果的相關(guān)性只有0.42，還給出了P值、R^2、預(yù)測錯誤率等信息，可以畫個圖展示下真實結(jié)果和預(yù)測結(jié)果：

plot(cvOut)

使用CCLE示例數(shù)據(jù)

CCLE中只有24個藥物，500+細胞系，用的很少，數(shù)據(jù)量比CGP少太多了。該包自帶了一個CCLE數(shù)據(jù)ccleData，其使用方法和上面完全一樣，就不重復(fù)介紹了。

自定義訓(xùn)練集

指定訓(xùn)練用的表達矩陣和對應(yīng)的樣本類別，再提供一個表達矩陣，就可以預(yù)測該表達矩陣每個樣本對藥物的敏感性。也就是說這個方法可以讓你能夠使用自己的訓(xùn)練數(shù)據(jù)~但是我好像并沒有見到這么做的，如果大家有見過的，歡迎告訴我~

下面我們繼續(xù)用硼替佐米數(shù)據(jù)作為示例進行演示。

我們先從exprDataBortezomib這個完整的表達矩陣提取一部分數(shù)據(jù)作為訓(xùn)練用的表達矩陣，并且也提取這部分樣本的類別（有5個類別：CR、PR、MR、NC、PD）。

然后再提取一部分表達矩陣作為測試用表達矩陣，來預(yù)測這部分樣本對硼替佐米的敏感性。

準(zhǔn)備訓(xùn)練數(shù)據(jù)和測試數(shù)據(jù)：

# 訓(xùn)練用表達矩陣
trainExpr <- exprDataBortezomib[, (detailedResponse %in% c(1,2,3,4,5)) & studyIndex %in% c("studyCode = 25", "studyCode = 40")]

# 訓(xùn)練用樣本類型
trainPtype <- detailedResponse[(detailedResponse %in% c(1,2,3,4,5)) & studyIndex %in% c("studyCode = 25", "studyCode = 40")]

# 預(yù)測用表達矩陣
testExpr <- exprDataBortezomib[, (detailedResponse %in% c(1,2,3,4,5)) & studyIndex %in% c("studyCode = 39")]
dim(testExpr) # 141個樣本
## [1] 22283   141

下面就可以預(yù)測樣本對藥物的敏感性了：

ptypeOut <- calcPhenotype(trainExpr, trainPtype, testExpr, selection=1)
## 
##  22283  gene identifiers overlap between the supplied expression matrices... 
## 
## Found2batches
## Adjusting for0covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
## 
##  2500 low variabilty genes filtered.
## Fitting Ridge Regression model... Done
## 
## Calculating predicted phenotype...Done

這個結(jié)果就是141個樣本的預(yù)測敏感性：

length(ptypeOut)
## [1] 141
head(ptypeOut)
## GSM246530 GSM246536 GSM246537 GSM246539 GSM246540 GSM246544 
##  2.990402  2.615408  3.314234  2.718105  2.578793  2.823383

有了這個預(yù)測的結(jié)果，我們可以與真實的結(jié)果做一個相關(guān)性分析：

# 提取真實結(jié)果
testPtype <- detailedResponse[(detailedResponse %in% c(1,2,3,4,5)) & studyIndex %in% c("studyCode = 39")]

# 相關(guān)性分析
cor.test(testPtype, ptypeOut, alternative="greater")
## 
##  Pearson's product-moment correlation
## 
## data:  testPtype and ptypeOut
## t = 2.1512, df = 139, p-value = 0.01659
## alternative hypothesis: true correlation is greater than 0
## 95 percent confidence interval:
##  0.04142448 1.00000000
## sample estimates:
##       cor 
## 0.1795014

還可以做t檢驗：

t.test(ptypeOut[testPtype %in% c(3,4,5)], ptypeOut[testPtype %in% c(1,2)],
       alternative="greater")
## 
##  Welch Two Sample t-test
## 
## data:  ptypeOut[testPtype %in% c(3, 4, 5)] and ptypeOut[testPtype %in% c(1, 2)]
## t = 2.0182, df = 137.43, p-value = 0.02276
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.02533599        Inf
## sample estimates:
## mean of x mean of y 
##  2.646449  2.505269

自帶數(shù)據(jù)探索

這個包自帶的所有數(shù)據(jù)可以在包的安裝目錄中查看，就是這幾個：

rm(list = ls())
library(pRRophetic)

加載數(shù)據(jù)查看一下：

data(cgp2016ExprRma) 
dim(cgp2016ExprRma)
## [1] 17419  1018
cgp2016ExprRma[1:4,1:4]
##          CAL-120   DMS-114   CAL-51    H2869
## TSPAN6  7.632023  7.548671 8.712338 7.797142
## TNMD    2.964585  2.777716 2.643508 2.817923
## DPM1   10.379553 11.807341 9.880733 9.883471
## SCYL3   3.614794  4.066887 3.956230 4.063701

這個是cgp2016版本的表達矩陣，其中行是基因，列是細胞系，一共17419個基因，1018個細胞系。

data(PANCANCER_IC_Tue_Aug_9_15_28_57_2016)
length(unique(drugData2016$Drug.name))
## [1] 251
head(unique(drugData2016$Drug.name),30)
##  [1] "Erlotinib"           "Rapamycin"           "Sunitinib"          
##  [4] "PHA-665752"          "MG-132"              "Paclitaxel"         
##  [7] "Cyclopamine"         "AZ628"               "Sorafenib"          
## [10] "VX-680"              "Imatinib"            "TAE684"             
## [13] "Crizotinib"          "Saracatinib"         "S-Trityl-L-cysteine"
## [16] "Z-LLNle-CHO"         "Dasatinib"           "GNF-2"              
## [19] "CGP-60474"           "CGP-082996"          "A-770041"           
## [22] "WH-4-023"            "WZ-1-84"             "BI-2536"            
## [25] "BMS-536924"          "BMS-509744"          "CMK"                
## [28] "Pyrimethamine"       "JW-7-52-1"           "A-443654"

上面這個是cgp2016版本的細胞系和藥物敏感性信息，包含了每種細胞系對每種藥物的IC50等信息，可以看到其中一共有251種藥物，cgp2014只有138種藥物（可以通過?pRRopheticPredict查看幫助文檔確定）。

data(drugAndPhenoCgp)

這里面是一些原始文件，貌似平常用不到，大家感興趣可以自己探索下。

可以看到其中還有一個ccleData，其實和上面用到的硼替佐米數(shù)據(jù)是一樣的，只不過一個來自于CGP，另一個來自于CCLE而已，就不展示了。

預(yù)測全部藥物的敏感性

假如我們要對自己的表達矩陣預(yù)測所有藥物的敏感性，只需要把所有的藥物提取出來，寫個循環(huán)即可，這里以cgp2016的藥物為例。

以下這段代碼來自生信技能樹：藥物預(yù)測R包之pRRophetic

耗時巨長??！

library(parallel)
library(pRRophetic)

# 加載cgp2016的藥敏信息
data(PANCANCER_IC_Tue_Aug_9_15_28_57_2016) # drugData2016
data(cgp2016ExprRma) # cgp2016ExprRma
data(bortezomibData)
#提取cgp2016的所有藥物名字
possibleDrugs2016 <- unique( drugData2016$Drug.name)
#possibleDrugs2016
# 用system.time來返回計算所需時間
#head(possibleDrugs2016)
system.time({ 
  cl <- makeCluster(8)  
  results <- parLapply(cl,possibleDrugs2016[1:10],#只用前10個測試，用全部時間太長
                       function(x){
                         library(pRRophetic) 
                         data(bortezomibData)
                         predictedPtype=pRRopheticPredict(
                           testMatrix=exprDataBortezomib,#換成你自己的表達矩陣
                           drug=x,
                           tissueType = "all", 
                           batchCorrect = "eb",
                           selection=1,
                           dataset = "cgp2016")
                         return(predictedPtype)
                       }) # lapply的并行版本
  res.df <- do.call('rbind',results) # 整合結(jié)果
  stopCluster(cl) # 關(guān)閉集群
})

畫個圖看看，畫圖之前需要一些數(shù)據(jù)格式的轉(zhuǎn)換，就是常規(guī)的長寬轉(zhuǎn)換，加名字即可。

然后使用ggplot系列包畫圖、添加顯著性，一氣呵成，非常簡單，所以R語言基礎(chǔ)是非常有必要的。

library(tidyr)
library(dplyr)
library(ggplot2)
library(ggpubr)
library(ggsci)

plot_df <- res.df %>% 
  as.data.frame() %>% 
  t() %>% 
  bind_cols(studyResponse) %>% 
  bind_cols(bortIndex) %>% 
  filter(!studyResponse == "PGx_Responder = IE", bortIndex == TRUE)
names(plot_df) <- c(possibleDrugs2016[1:10],"studyResponse","bortIndex")  

plot_df %>% 
  pivot_longer(1:10,names_to = "drugs",values_to = "ic50") %>% 
  ggplot(., aes(studyResponse,ic50))+
  geom_boxplot(aes(fill=studyResponse))+
  scale_fill_jama()+
  theme(axis.text.x = element_text(angle = 45,hjust = 1),
        axis.title.x = element_blank(),
        legend.position = "none")+
  facet_wrap(vars(drugs),scales = "free_y",nrow = 2)+
  stat_compare_means()

easy！這次內(nèi)容挺多的，下次再介紹oncoPredict吧。

參考

生信技能樹：藥物預(yù)測R包之pRRophetic

本站僅提供存儲服務(wù)，所有內(nèi)容均由用戶發(fā)布，如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請點擊舉報。

九色国产,午夜在线视频,新黄色网址,九九色综合,天天做夜夜做久久做狠狠,天天躁夜夜躁狠狠躁2021a,久久不卡一区二区三区