Statistical Power in t-tests with Unequal Group Sizes

When using Student’s t-test to compare the difference in means between two groups, it is useful to examine how unequal sample sizes in the comparison groups affect power. Designs with large imbalances generally lack adequate statistical power to detect even large effect sizes associated with a factor, leading to a high Type II error rate, as shown in the figure below:

plot of chunk unnamed-chunk-2
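
The loss of power under imbalance can be checked analytically. The following is a minimal sketch, not necessarily the method used in the full post: it assumes a two-sided, equal-variance two-sample t-test and computes power from the noncentral t distribution. The function name power_t_unequal, the total sample size of 100, and the effect size d = 0.8 are illustrative assumptions.

```r
# Power of a two-sided, equal-variance two-sample t-test with group sizes
# n1 and n2. d is the standardised mean difference (Cohen's d).
# power_t_unequal is a hypothetical helper name, not from the original post.
power_t_unequal <- function(n1, n2, d, alpha = 0.05) {
  df   <- n1 + n2 - 2
  ncp  <- d * sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
  crit <- qt(1 - alpha / 2, df)          # two-sided critical value
  # Probability of rejecting in either tail under the alternative
  pt(crit, df, ncp = ncp, lower.tail = FALSE) + pt(-crit, df, ncp = ncp)
}

# Hold the total sample size fixed (assumed: 100) and vary the split
n_total <- 100
n1 <- c(50, 40, 30, 20, 10, 5)
power <- sapply(n1, function(n) power_t_unequal(n, n_total - n, d = 0.8))
print(data.frame(n1 = n1, n2 = n_total - n1, power = round(power, 3)))
```

With the total held fixed, the noncentrality parameter depends on the product n1 * n2, which is largest for an even split, so power falls as the split becomes more lopsided.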


Math Expressions with Facets in ggplot2

In this post I show how we can use math expressions to label the panels in facets to produce the following plot:

plot of chunk unnamed-chunk-1
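
One common way to get math expressions into facet labels is to store plotmath strings in the facetting variable and pass labeller = label_parsed. The sketch below shows that approach; it may differ from what the full post does, and the data frame df and its columns are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: the facetting variable holds plotmath strings
df <- data.frame(
  x = rep(1:10, times = 2),
  y = c((1:10)^2, sqrt(1:10)),
  panel = rep(c("y == x^2", "y == sqrt(x)"), each = 10)
)

# label_parsed parses each strip label as a plotmath expression
p <- ggplot(df, aes(x, y)) +
  geom_line() +
  facet_wrap(~ panel, scales = "free_y", labeller = label_parsed)
print(p)
```

Because the labels are parsed as R expressions, each string must be valid plotmath syntax; for example, spaces between words need to be written as `~` or `*`.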


Heatmaps in R

In any statistical analysis, the first step should be to visualise the data before any modeling. In microarray studies, a common visualisation is a heatmap of gene expression data. In this post I simulate some gene expression data and visualise it using the pheatmap function from the pheatmap package in R. You will also need the mvrnorm function from the MASS package to simulate from a multivariate normal distribution, and the brewer.pal function from the RColorBrewer package for easier customization of colors.
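
As a rough sketch of how the three functions fit together (the dimensions, seed, covariance structure, and palette are assumptions for illustration, not the settings from the full post):

```r
library(MASS)          # mvrnorm
library(pheatmap)      # pheatmap
library(RColorBrewer)  # brewer.pal

set.seed(123)  # assumed seed, only for reproducibility of this sketch
n_genes   <- 20
n_samples <- 10

# Assumed covariance: two blocks of five correlated samples
Sigma <- matrix(0, n_samples, n_samples)
Sigma[1:5, 1:5]   <- 0.7
Sigma[6:10, 6:10] <- 0.7
diag(Sigma) <- 1

# Each row is one multivariate normal draw, i.e. one gene across all samples
expr <- mvrnorm(n = n_genes, mu = rep(0, n_samples), Sigma = Sigma)
rownames(expr) <- paste0("gene", seq_len(n_genes))
colnames(expr) <- paste0("sample", seq_len(n_samples))

# Interpolate a diverging ColorBrewer palette to 100 shades
pal <- colorRampPalette(rev(brewer.pal(n = 9, name = "RdBu")))(100)
pheatmap(expr, color = pal)
```

By default pheatmap clusters both rows and columns, so samples drawn from the same correlated block tend to end up next to each other in the plot.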


Contrasts in R

In this post I discuss how to create custom contrasts for factor variables in R. First, let's simulate some data and code Disease status as a factor:

# Five groups with five subjects each
Disease <- c(rep("RA", 5), rep("SLE", 5), rep("Scleroderma", 5),
             rep("Myositis", 5), rep("Control", 5))
sex <- rbinom(25, 1, 0.5)  # binary indicator
age <- rnorm(25, 40, 5)
y <- rnorm(25, 0.5, 0.12)  # simulated response
data <- data.frame(y, sex, age, Disease = factor(Disease))
str(data)
## 'data.frame':	25 obs. of  4 variables:
##  $ y      : num  0.506 0.323 0.552 0.492 0.513 ...
##  $ sex    : int  0 1 1 1 1 1 0 0 1 1 ...
##  $ age    : num  44.4 46.9 31.6 36.9 40.1 ...
##  $ Disease: Factor w/ 5 levels "Control","Myositis",..: 3 3 3 3 3 5 5 5 5 5 ...

We want the following contrasts:

  • Control versus all 4 diseases combined
  • RA versus the combination of (SLE, Scleroderma, Myositis), leaving out the Controls
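
One way to obtain these two contrasts, sketched here under assumptions and not necessarily the approach taken in the full post, is to write each hypothesis as a row of weights on the five group means, complete the set with two filler contrasts, and invert the resulting matrix to get the coding matrix for contrasts(). The seed, the contrast names, the two filler rows, and the Disease-only model are all assumptions made for illustration.

```r
set.seed(42)  # assumed seed, only so this sketch is reproducible

# Fix the level order explicitly so the weights below line up with it
lev <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")
Disease <- factor(rep(lev, each = 5), levels = lev)
y <- rnorm(25, 0.5, 0.12)

# Each row is a hypothesis about the five group means, in the order of lev
H <- rbind(
  Intercept         = rep(1 / 5, 5),
  ControlVsDiseases = c(1, -1 / 4, -1 / 4, -1 / 4, -1 / 4),
  RAvsOtherDiseases = c(0, -1 / 3, 1, -1 / 3, -1 / 3),
  MyoVsSclSLE       = c(0, 1, 0, -1 / 2, -1 / 2),  # filler to reach full rank
  SclVsSLE          = c(0, 0, 0, 1, -1)            # filler to reach full rank
)

# The coding matrix is the inverse of the hypothesis matrix
# with the intercept column dropped
C <- solve(H)[, -1]
rownames(C) <- lev
contrasts(Disease) <- C

fit <- lm(y ~ Disease)
print(round(coef(fit), 4))
```

In this Disease-only model, the coefficient labelled ControlVsDiseases is the Control mean minus the average of the four disease-group means, and RAvsOtherDiseases is the RA mean minus the average of the SLE, Scleroderma, and Myositis means.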

Testing RMarkdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00