When performing Student’s t-test to compare the means of two groups, it is a useful exercise to determine how unequal sample sizes in the comparison groups affect power. For a fixed total sample size, power falls as the allocation between the groups becomes more unbalanced, and a severely imbalanced design may lack adequate statistical power to detect even a large effect associated with a factor. This leads to a high Type II error rate, as shown in the figure below:
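The effect of imbalance can be checked with a quick Monte Carlo simulation. The sketch below is not the code behind the figure. It assumes normally distributed data with unit variance in both groups, a fixed total of 100 observations, and a standardised effect size of d = 0.8 (conventionally a "large" effect). The helper `sim_power` is a hypothetical name introduced for illustration. It repeatedly draws two samples, runs Student's t-test via `t.test(..., var.equal = TRUE)`, and reports the proportion of rejections at α = 0.05.

```r
# Hypothetical helper: Monte Carlo power of Student's t-test (equal variances)
# for group sizes n1 and n2 and a standardised mean difference d.
sim_power <- function(n1, n2, d, alpha = 0.05, n_sim = 2000) {
  rejections <- replicate(n_sim, {
    x <- rnorm(n1, mean = 0, sd = 1)   # group 1
    y <- rnorm(n2, mean = d, sd = 1)   # group 2, shifted by d
    t.test(x, y, var.equal = TRUE)$p.value < alpha
  })
  mean(rejections)                     # proportion of simulated tests that reject H0
}

set.seed(1)
total <- 100                           # assumed fixed total sample size
n1_values <- c(50, 30, 20, 10, 5)      # from balanced to heavily imbalanced
power <- sapply(n1_values, function(n1) sim_power(n1, total - n1, d = 0.8))

print(data.frame(n1 = n1_values, n2 = total - n1_values, power = power))
```

The balanced 50/50 split should give the highest estimated power, with estimates dropping as `n1` shrinks. This is because the standard error of the mean difference grows with 1/n1 + 1/n2, which is smallest when the groups are equal in size.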
In every statistical analysis, the first thing one should do is to visualise the data before fitting any model. In microarray studies, a common visualisation is a heatmap of gene expression data. In this post I simulate some gene expression data and visualise it using the pheatmap function from the pheatmap package in R. You will also need the mvrnorm function from the MASS package to simulate from a multivariate normal distribution, and the brewer.pal function from the RColorBrewer package for easier customisation of colours.
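A minimal sketch of that workflow follows. It is not necessarily the author's simulation. The dimensions (50 genes, 20 samples), the compound-symmetric correlation of 0.3 between genes, the shift of +2 applied to the first 10 genes in a "treatment" group, and the variable names are all hypothetical choices made for illustration. `mvrnorm()` draws one row per sample, so the result is transposed to the usual genes-by-samples layout before plotting.

```r
library(MASS)          # mvrnorm(): multivariate normal draws
library(pheatmap)      # pheatmap(): annotated, clustered heatmaps
library(RColorBrewer)  # brewer.pal(): ColorBrewer palettes

set.seed(42)
n_genes   <- 50                        # hypothetical number of genes
n_samples <- 20                        # hypothetical number of arrays
group <- factor(rep(c("control", "treatment"), each = n_samples / 2))

# Assumed covariance: every pair of genes has correlation 0.3, unit variances
Sigma <- matrix(0.3, n_genes, n_genes)
diag(Sigma) <- 1

# mvrnorm returns a samples-by-genes matrix; transpose so genes are rows
expr <- t(mvrnorm(n = n_samples, mu = rep(0, n_genes), Sigma = Sigma))

# Hypothetical signal: raise the first 10 genes in the treatment samples
de_genes <- 1:10
expr[de_genes, group == "treatment"] <- expr[de_genes, group == "treatment"] + 2

rownames(expr) <- paste0("gene", seq_len(n_genes))
colnames(expr) <- paste0("sample", seq_len(n_samples))

# Diverging palette, reversed so that high expression is shown in red
palette <- colorRampPalette(rev(brewer.pal(n = 9, name = "RdBu")))(100)

# Column annotation: row names must match the column names of expr
annotation <- data.frame(group = group, row.names = colnames(expr))

pheatmap(expr,
         color = palette,
         annotation_col = annotation,
         scale = "row",                # centre and scale each gene
         show_rownames = FALSE)
```

Scaling by row makes genes with different baseline levels comparable on one colour scale. The shifted genes should then appear as a block that separates the two groups when the columns are clustered.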