Limma Moderated and Ordinary t-statistics

When analyzing large amounts of genetic and genomic data, the first line of analysis is usually some sort of univariate test. That is, conduct a statistical test for each SNP or CpG site or Gene and then correct for multiple testing. The limma package on Bioconductor is a popular method for computing moderated t-statistics using a combination of the limma::lmFit and limma::eBayes functions. In this post, I show how to calculate the ordinary t-statistics from limma output.

First we load the required packages

# clear workspace
rm(list=ls())

if (!requireNamespace("pacman", quietly = TRUE)) 
  install.packages("pacman")

# this will install (if you dont already have it) and load the package
pacman::p_load("minfi")
pacman::p_load("minfiData")
pacman::p_load("limma")
pacman::p_load("CpGassoc")

# check the data that is available in the minfiData package
pacman::p_data("minfiData")
##   Data        Description                                                                           
## 1 MsetEx      An example dataset for Illumina's Human Methylation 450k dataset, after preprocessing.
## 2 MsetEx.sub  An example dataset for Illumina's Human Methylation 450k dataset, after preprocessing.
## 3 RGsetEx     An example dataset for Illumina's Human Methylation 450k dataset.                     
## 4 RGsetEx.sub An example dataset for Illumina's Human Methylation 450k dataset.

Next, we extract some sample data and create a covariate of interest

# get the M-values for the sample data
DT <- minfi::getM(MsetEx.sub)

dim(DT)
## [1] 600   6
# rows are CpGs and columns are samples
head(DT)
##            5723646052_R02C02 5723646052_R04C01 5723646052_R05C02
## cg00050873          3.502348         0.4414491          4.340695
## cg00212031         -3.273751         0.9234662         -2.614777
## cg00213748          2.076816        -0.1309465          1.260995
## cg00214611         -3.438838         1.7463950         -2.270551
## cg00455876          1.839010        -0.9927320          1.619479
## cg01707559         -3.303987        -0.6433201         -3.540887
##            5723646053_R04C02 5723646053_R05C02 5723646053_R06C02
## cg00050873        0.24458355        -0.3219281         0.2744392
## cg00212031       -0.21052257        -0.6861413        -0.1397595
## cg00213748       -1.10373279        -1.6616553        -0.1270869
## cg00214611        0.29029649        -0.2103599        -0.6138630
## cg00455876       -0.09504721        -0.2854655         0.6361273
## cg01707559       -0.74835377        -0.4678048        -1.1345421
# create a fake covariate. 3 intact and 3 degraded samples
tissue <- factor(c(rep("Intact",3), rep("Degraded",3)))
design <- model.matrix(~ tissue)

Then we calculate the moderated and ordinary t-statistics and compare them:

# limma fit
fit <- lmFit(DT, design)
fit_ebayes <- eBayes(fit)

# ordinary t-statistic
ordinary.t <- fit$coefficients[,"tissueIntact"] / 
  fit$stdev.unscaled[,"tissueIntact"] / 
  fit$sigma

plot(x = ordinary.t, 
     y = fit_ebayes$t[,"tissueIntact"], 
     xlab = "ordinary t-statistic", 
     ylab = "moderated t-statistic")
abline(a = 0, b = 1, col = "red")

plot of chunk unnamed-chunk-2

We can calculate the corresponding p-values from the ordinary t-statistics. This is given by

# t-distribution with n-p-1 degrees of freedom 
# (n: number of samples, p:number of covariates excluding intercept)
ordinary.pvalue <- pt(q = abs(ordinary.t), 
                      df = dim(DT)[2]-(ncol(design)-1)-1, 
                      lower.tail = F) * 2

We can also use the CpGassoc package to calculate ordinary t-statistics and compare the result to our manual calculations:

# compare result with CpGassoc package (regular t-test)
t_test_mvalues <- CpGassoc::cpg.assoc(
  beta.val = minfi::getBeta(MsetEx.sub),
  indep = tissue,
  fdr.cutoff = 0.05,
  logit.transform = TRUE)

# these should be equal
plot(t_test_mvalues$results$P.value, ordinary.pvalue)
abline(a = 0, b = 1, col = "red")

plot of chunk unnamed-chunk-4

# t-statistic is the squared F.statistic (in certain cases) 
plot(t_test_mvalues$results$F.statistic, ordinary.t^2)
abline(a = 0, b = 1, col = "red")

plot of chunk unnamed-chunk-4

Session Info

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## attached base packages:
## [1] stats4    parallel  methods   stats     graphics  grDevices
## [7] utils     datasets  base     
## 
## other attached packages:
##  [1] CpGassoc_2.55                                     
##  [2] nlme_3.1-128                                      
##  [3] limma_3.30.10                                     
##  [4] minfiData_0.20.0                                  
##  [5] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
##  [6] IlluminaHumanMethylation450kmanifest_0.4.0        
##  [7] minfi_1.20.2                                      
##  [8] bumphunter_1.14.0                                 
##  [9] locfit_1.5-9.1                                    
## [10] iterators_1.0.8                                   
## [11] foreach_1.4.3                                     
## [12] Biostrings_2.42.1                                 
## [13] XVector_0.14.0                                    
## [14] SummarizedExperiment_1.4.0                        
## [15] GenomicRanges_1.26.2                              
## [16] GenomeInfoDb_1.10.2                               
## [17] IRanges_2.8.1                                     
## [18] S4Vectors_0.12.1                                  
## [19] Biobase_2.34.0                                    
## [20] BiocGenerics_0.20.0                               
## [21] knitr_1.15.1                                      
## 
## loaded via a namespace (and not attached):
##  [1] mclust_5.2.2             base64_2.0              
##  [3] Rcpp_0.12.9              lattice_0.20-34         
##  [5] Rsamtools_1.26.1         digest_0.6.12           
##  [7] R6_2.2.0                 plyr_1.8.4              
##  [9] RSQLite_1.1-2            evaluate_0.10           
## [11] highr_0.6                httr_1.2.1              
## [13] zlibbioc_1.20.0          GenomicFeatures_1.26.2  
## [15] data.table_1.10.4        annotate_1.52.1         
## [17] Matrix_1.2-7.1           preprocessCore_1.36.0   
## [19] splines_3.3.2            BiocParallel_1.8.1      
## [21] stringr_1.2.0            RCurl_1.95-4.8          
## [23] biomaRt_2.30.0           rtracklayer_1.34.1      
## [25] multtest_2.30.0          pkgmaker_0.22           
## [27] openssl_0.9.6            GEOquery_2.40.0         
## [29] quadprog_1.5-5           codetools_0.2-15        
## [31] matrixStats_0.51.0       XML_3.98-1.5            
## [33] reshape_0.8.6            GenomicAlignments_1.10.0
## [35] MASS_7.3-45              bitops_1.0-6            
## [37] grid_3.3.2               xtable_1.8-2            
## [39] registry_0.3             DBI_0.5-1               
## [41] pacman_0.4.1             magrittr_1.5            
## [43] stringi_1.1.2            genefilter_1.56.0       
## [45] doRNG_1.6                nor1mix_1.2-2           
## [47] RColorBrewer_1.1-2       siggenes_1.48.0         
## [49] tools_3.3.2              illuminaio_0.16.0       
## [51] rngtools_1.2.4           survival_2.40-1         
## [53] AnnotationDbi_1.36.2     beanplot_1.2            
## [55] memoise_1.0.0
If you have any questions or comments, please post them below.
comments powered by Disqus