Limma Moderated and Ordinary t-statistics
07 Feb 2017
When analyzing large amounts of genetic and genomic data, the first line of analysis is usually some sort of univariate test. That is, conduct a statistical test for each SNP or CpG site or Gene and then correct for multiple testing. The limma package on Bioconductor is a popular method for computing moderated t-statistics using a combination of the `limma::lmFit`

and `limma::eBayes`

functions. In this post, I show how to calculate the ordinary t-statistics from `limma`

output.

First we load the required packages

# clear workspace
rm ( list = ls ())
if ( ! requireNamespace ( "pacman" , quietly = TRUE ))
install.packages ( "pacman" )
# this will install (if you dont already have it) and load the package
pacman :: p_load ( "minfi" )
pacman :: p_load ( "minfiData" )
pacman :: p_load ( "limma" )
pacman :: p_load ( "CpGassoc" )
# check the data that is available in the minfiData package
pacman :: p_data ( "minfiData" )

```
## Data Description
## 1 MsetEx An example dataset for Illumina's Human Methylation 450k dataset, after preprocessing.
## 2 MsetEx.sub An example dataset for Illumina's Human Methylation 450k dataset, after preprocessing.
## 3 RGsetEx An example dataset for Illumina's Human Methylation 450k dataset.
## 4 RGsetEx.sub An example dataset for Illumina's Human Methylation 450k dataset.
```

Next, we extract some sample data and create a covariate of interest

# get the M-values for the sample data
DT <- minfi :: getM ( MsetEx.sub )
dim ( DT )

`## [1] 600 6`

# rows are CpGs and columns are samples
head ( DT )

```
## 5723646052_R02C02 5723646052_R04C01 5723646052_R05C02
## cg00050873 3.502348 0.4414491 4.340695
## cg00212031 -3.273751 0.9234662 -2.614777
## cg00213748 2.076816 -0.1309465 1.260995
## cg00214611 -3.438838 1.7463950 -2.270551
## cg00455876 1.839010 -0.9927320 1.619479
## cg01707559 -3.303987 -0.6433201 -3.540887
## 5723646053_R04C02 5723646053_R05C02 5723646053_R06C02
## cg00050873 0.24458355 -0.3219281 0.2744392
## cg00212031 -0.21052257 -0.6861413 -0.1397595
## cg00213748 -1.10373279 -1.6616553 -0.1270869
## cg00214611 0.29029649 -0.2103599 -0.6138630
## cg00455876 -0.09504721 -0.2854655 0.6361273
## cg01707559 -0.74835377 -0.4678048 -1.1345421
```

# create a fake covariate. 3 intact and 3 degraded samples
tissue <- factor ( c ( rep ( "Intact" , 3 ), rep ( "Degraded" , 3 )))
design <- model.matrix ( ~ tissue )

Then we calculate the moderated and ordinary t-statistics and compare them:

# limma fit
fit <- lmFit ( DT , design )
fit_ebayes <- eBayes ( fit )
# ordinary t-statistic
ordinary.t <- fit $ coefficients [, "tissueIntact" ] /
fit $ stdev.unscaled [, "tissueIntact" ] /
fit $ sigma
plot ( x = ordinary.t ,
y = fit_ebayes $ t [, "tissueIntact" ],
xlab = "ordinary t-statistic" ,
ylab = "moderated t-statistic" )
abline ( a = 0 , b = 1 , col = "red" )

We can calculate the corresponding p-values from the ordinary t-statistics. This is given by

# t-distribution with n-p-1 degrees of freedom
# (n: number of samples, p:number of covariates excluding intercept)
ordinary.pvalue <- pt ( q = abs ( ordinary.t ),
df = dim ( DT )[ 2 ] - ( ncol ( design ) -1 ) -1 ,
lower.tail = F ) * 2

We can also use the `CpGassoc`

package to calculate ordinary t-statistics and compare the result to our manual calculations:

# compare result with CpGassoc package (regular t-test)
t_test_mvalues <- CpGassoc :: cpg.assoc (
beta.val = minfi :: getBeta ( MsetEx.sub ),
indep = tissue ,
fdr.cutoff = 0.05 ,
logit.transform = TRUE )
# these should be equal
plot ( t_test_mvalues $ results $ P.value , ordinary.pvalue )
abline ( a = 0 , b = 1 , col = "red" )

# t-statistic is the squared F.statistic (in certain cases)
plot ( t_test_mvalues $ results $ F .statistic , ordinary.t ^ 2 )
abline ( a = 0 , b = 1 , col = "red" )

Session Info
```
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## attached base packages:
## [1] stats4 parallel methods stats graphics grDevices
## [7] utils datasets base
##
## other attached packages:
## [1] CpGassoc_2.55
## [2] nlme_3.1-128
## [3] limma_3.30.10
## [4] minfiData_0.20.0
## [5] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
## [6] IlluminaHumanMethylation450kmanifest_0.4.0
## [7] minfi_1.20.2
## [8] bumphunter_1.14.0
## [9] locfit_1.5-9.1
## [10] iterators_1.0.8
## [11] foreach_1.4.3
## [12] Biostrings_2.42.1
## [13] XVector_0.14.0
## [14] SummarizedExperiment_1.4.0
## [15] GenomicRanges_1.26.2
## [16] GenomeInfoDb_1.10.2
## [17] IRanges_2.8.1
## [18] S4Vectors_0.12.1
## [19] Biobase_2.34.0
## [20] BiocGenerics_0.20.0
## [21] knitr_1.15.1
##
## loaded via a namespace (and not attached):
## [1] mclust_5.2.2 base64_2.0
## [3] Rcpp_0.12.9 lattice_0.20-34
## [5] Rsamtools_1.26.1 digest_0.6.12
## [7] R6_2.2.0 plyr_1.8.4
## [9] RSQLite_1.1-2 evaluate_0.10
## [11] highr_0.6 httr_1.2.1
## [13] zlibbioc_1.20.0 GenomicFeatures_1.26.2
## [15] data.table_1.10.4 annotate_1.52.1
## [17] Matrix_1.2-7.1 preprocessCore_1.36.0
## [19] splines_3.3.2 BiocParallel_1.8.1
## [21] stringr_1.2.0 RCurl_1.95-4.8
## [23] biomaRt_2.30.0 rtracklayer_1.34.1
## [25] multtest_2.30.0 pkgmaker_0.22
## [27] openssl_0.9.6 GEOquery_2.40.0
## [29] quadprog_1.5-5 codetools_0.2-15
## [31] matrixStats_0.51.0 XML_3.98-1.5
## [33] reshape_0.8.6 GenomicAlignments_1.10.0
## [35] MASS_7.3-45 bitops_1.0-6
## [37] grid_3.3.2 xtable_1.8-2
## [39] registry_0.3 DBI_0.5-1
## [41] pacman_0.4.1 magrittr_1.5
## [43] stringi_1.1.2 genefilter_1.56.0
## [45] doRNG_1.6 nor1mix_1.2-2
## [47] RColorBrewer_1.1-2 siggenes_1.48.0
## [49] tools_3.3.2 illuminaio_0.16.0
## [51] rngtools_1.2.4 survival_2.40-1
## [53] AnnotationDbi_1.36.2 beanplot_1.2
## [55] memoise_1.0.0
```

If you have any questions or comments, please post them below.