In every statistical analysis, the first thing one should do is try to visualise the data before any modeling. In microarray studies, a common visualisation is a heatmap of gene expression data. In this post I simulate some gene expression data and visualise it using the pheatmap function from the pheatmap package in R. You will also need the mvrnorm function from the MASS library to simulate from a multivariate normal distribution, and the brewer.pal function from the RColorBrewer library for easier customization of colors.
Components of a Heatmap
There are four main components that should be considered when drawing a heatmap: the data, the color palette, the annotations, and the clustering.
First I simulate some gene expression data, based on a function that I created, for genes which are correlated conditional on an exposure status (the function definition is given at the end of this post):
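The simulation function itself is defined at the end of the post. As a rough stand-in, and not the function used here, the following minimal sketch draws correlated expression values with MASS::mvrnorm. The helper name sim_expr, the sample sizes, and the correlation values are all assumptions chosen for illustration:

```r
library(MASS)  # provides mvrnorm

set.seed(123)

# Hypothetical helper: simulate expression for n subjects and p genes,
# where every pair of genes has the same correlation rho
sim_expr <- function(n, p, rho, mu = 0) {
  sigma <- matrix(rho, nrow = p, ncol = p)
  diag(sigma) <- 1
  mvrnorm(n = n, mu = rep(mu, p), Sigma = sigma)
}

n_each <- 20  # subjects per exposure group (assumed)
p <- 50       # number of genes (assumed)

# Assumption: genes are more strongly correlated among the exposed
expr_unexposed <- sim_expr(n_each, p, rho = 0.2)
expr_exposed   <- sim_expr(n_each, p, rho = 0.8)

# Rows are subjects, columns are genes
genes <- rbind(expr_unexposed, expr_exposed)
dim(genes)
```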
To label the heatmap properly, we must first give the gene expression matrix row names (subject IDs) and column names (gene IDs):
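A minimal sketch of the labelling step, using a random stand-in matrix with subjects in rows and genes in columns. The ID formats "Subject1" and "Gene1" are assumptions, not necessarily the labels used in this post:

```r
set.seed(1)

# Stand-in data: 40 subjects (rows) by 50 genes (columns)
genes <- matrix(rnorm(40 * 50), nrow = 40, ncol = 50)

# Subject IDs as row names, gene IDs as column names
rownames(genes) <- paste0("Subject", seq_len(nrow(genes)))
colnames(genes) <- paste0("Gene", seq_len(ncol(genes)))

# Inspect a corner of the labelled matrix
genes[1:3, 1:4]
```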
There are three types of palettes: sequential, diverging, and qualitative.
- Sequential palettes are suited to ordered data that progress from low to high. Lightness steps dominate the look of these schemes, with light colors for low data values to dark colors for high data values.
- Diverging palettes put equal emphasis on mid-range critical values and extremes at both ends of the data range. The critical class or break in the middle of the legend is emphasized with light colors and low and high extremes are emphasized with dark colors that have contrasting hues.
- Qualitative palettes do not imply magnitude differences between legend classes, and hues are used to create the primary visual differences between classes. Qualitative schemes are best suited to representing nominal or categorical data.
To see the palettes available for coloring the heatmap:
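One way to do this, assuming the RColorBrewer package is installed, is to plot all palettes and to consult the package's summary table:

```r
library(RColorBrewer)

# Plot every available palette, grouped by type
display.brewer.all()

# The same information as a table: maximum number of colors and
# palette category (seq, div, or qual) for a few palettes
brewer.pal.info[c("Reds", "RdBu", "Set1"), ]
```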
You need to provide the RColorBrewer::brewer.pal function with two arguments: the number of colors, and the name of the palette as shown in the figure above. Sequential palettes are available with 3 to 9 colors, and diverging palettes with 3 to 11. We will use the Reds palette, a sequential palette with a maximum of 9 colors:
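A short sketch of the call. The variable names col_pal and col_ramp are assumptions for illustration, and the interpolation step is optional, for cases where more than 9 shades are wanted:

```r
library(RColorBrewer)

# Nine shades of red, from light to dark, as hex color codes
col_pal <- brewer.pal(n = 9, name = "Reds")
col_pal

# Optional: interpolate between the nine shades for a smoother gradient
col_ramp <- colorRampPalette(col_pal)(100)
length(col_ramp)
```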
If the subjects can be contrasted, it is useful to display this information on the heatmap, e.g. case/control status or exposed vs. unexposed. To do so, we first need to create a separate data frame which contains that information. This data frame can contain one column or many. Note that the row names of this data frame need to correspond to the row names (i.e. the subject IDs) of the gene expression data created above. In this example we create a data frame which has exposure status and tumor type for each subject:
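A minimal sketch of such a data frame. The subject IDs, the group sizes, and the tumor type labels are made up for illustration, and the name annotation_col is an assumption:

```r
set.seed(1)

# Subject IDs; these must match the subject IDs of the expression matrix
subject_ids <- paste0("Subject", 1:40)

annotation_col <- data.frame(
  # First 20 subjects unexposed, last 20 exposed (assumed layout)
  Exposure = factor(rep(c("Unexposed", "Exposed"), each = 20)),
  # Hypothetical tumor types assigned at random
  Type = factor(sample(c("T-cell", "B-cell"), 40, replace = TRUE)),
  row.names = subject_ids
)

head(annotation_col)
```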
We also want to annotate information on the genes, such as pathway membership. To do so, we create another data frame which has the gene annotations. Note once again that the row names of this data frame need to correspond to the column names (i.e. the gene IDs) of the gene expression data created above.
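A sketch of the gene annotation, together with a quick check that its row names line up with the gene IDs. The pathway labels and the stand-in matrix are assumptions for illustration:

```r
set.seed(1)

# Stand-in expression matrix: subjects in rows, genes in columns
genes <- matrix(rnorm(40 * 50), nrow = 40, ncol = 50,
                dimnames = list(paste0("Subject", 1:40), paste0("Gene", 1:50)))

# Hypothetical pathway membership: first 25 genes in one pathway, the rest in another
annotation_row <- data.frame(
  Pathway = factor(rep(c("Pathway A", "Pathway B"), each = 25)),
  row.names = colnames(genes)
)

# Every gene in the matrix should have an annotation row
all(colnames(genes) %in% rownames(annotation_row))
```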
You need to decide if it's important to cluster the rows and/or columns of your heatmap. If you decide to cluster, you must then choose the distance metric and the clustering method.
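To make the two choices concrete, here is a sketch in base R that clusters genes under two different combinations of distance and linkage. The particular combinations are examples only, not a recommendation, and the data are a random stand-in:

```r
set.seed(1)

# Stand-in expression matrix: subjects in rows, genes in columns
genes <- matrix(rnorm(40 * 50), nrow = 40, ncol = 50,
                dimnames = list(paste0("Subject", 1:40), paste0("Gene", 1:50)))

# Choice 1: Euclidean distance between genes, complete linkage.
# dist() works on rows, so transpose to put genes in rows.
d_euc <- dist(t(genes), method = "euclidean")
hc_euc <- hclust(d_euc, method = "complete")

# Choice 2: correlation-based distance, average linkage.
# cor() works on columns, so no transpose is needed here.
d_cor <- as.dist(1 - cor(genes))
hc_cor <- hclust(d_cor, method = "average")

# The two choices generally give different gene orderings
head(hc_euc$labels[hc_euc$order])
head(hc_cor$labels[hc_cor$order])
```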
The pheatmap function comes with lots of customizations (see the help page for a complete list of options). In this example I only want to cluster the genes (i.e. the rows), and place a gap between the subjects who were exposed and those who were unexposed. Note that we must pass the transpose of the matrix to the pheatmap function, which is not the case for other functions such as
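Putting the pieces together, a pheatmap call along these lines should produce such a heatmap. This is a sketch on random stand-in data, not the exact call used for the figure in this post; the object names, the annotation values, and the gap position (after the 20th subject) are assumptions:

```r
library(pheatmap)
library(RColorBrewer)

set.seed(1)

# Stand-in expression matrix: subjects in rows, genes in columns
genes <- matrix(rnorm(40 * 50), nrow = 40, ncol = 50,
                dimnames = list(paste0("Subject", 1:40), paste0("Gene", 1:50)))

# Subject annotation (columns of the plotted matrix)
annotation_col <- data.frame(
  Exposure = factor(rep(c("Unexposed", "Exposed"), each = 20)),
  row.names = rownames(genes)
)

# Gene annotation (rows of the plotted matrix)
annotation_row <- data.frame(
  Pathway = factor(rep(c("Pathway A", "Pathway B"), each = 25)),
  row.names = colnames(genes)
)

hm <- pheatmap(
  t(genes),                        # transpose: genes in rows, subjects in columns
  color = brewer.pal(9, "Reds"),
  cluster_rows = TRUE,             # cluster the genes
  cluster_cols = FALSE,            # keep subjects in their original order
  clustering_distance_rows = "euclidean",
  clustering_method = "complete",
  annotation_col = annotation_col,
  annotation_row = annotation_row,
  gaps_col = 20,                   # gap between unexposed and exposed subjects
  show_rownames = FALSE,
  show_colnames = FALSE
)
```

The gaps_col argument only takes effect when the columns are not clustered, which is why cluster_cols is set to FALSE here.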
Update: June 25, 2015
Interactive Heatmaps using
It is also possible to create interactive heatmaps (in the sense that you can see the actual values by hovering your mouse over the plot) using the d3heatmap package available on GitHub:
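A sketch of what this could look like on stand-in data. The repository location in the comment is an assumption and may have moved since this update was written, and the call is skipped if the package is not installed:

```r
# Installation from GitHub (repository location is an assumption):
# devtools::install_github("rstudio/d3heatmap")

set.seed(1)

# Stand-in expression matrix: subjects in rows, genes in columns
genes <- matrix(rnorm(40 * 50), nrow = 40, ncol = 50,
                dimnames = list(paste0("Subject", 1:40), paste0("Gene", 1:50)))

if (requireNamespace("d3heatmap", quietly = TRUE)) {
  d3heatmap::d3heatmap(
    t(genes),            # genes in rows, subjects in columns
    colors = "Reds",     # RColorBrewer palette name
    dendrogram = "row"   # cluster the genes only
  )
}
```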
This is useful if you are producing markdown reports. The syntax is standard, though it does not allow for multiple annotations as in
For some reason, this map is not showing up on this website, but it should work when compiling Rmarkdown scripts and viewing the resulting HTML document in your browser or within RStudio.
Update: August 7, 2015
Interactive Heatmaps using
After some user setup (see the plotly help page), the following code creates an interactive heatmap:
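The setup requirements may have changed since this update was written, as recent versions of the plotly R package can render plots locally. A heatmap along the following lines should work with such a version. This is a sketch on random stand-in data rather than the original code, and the object names are assumptions:

```r
library(plotly)

set.seed(1)

# Stand-in expression matrix: subjects in rows, genes in columns
genes <- matrix(rnorm(40 * 50), nrow = 40, ncol = 50,
                dimnames = list(paste0("Subject", 1:40), paste0("Gene", 1:50)))

# Genes in rows, subjects in columns, as in the static heatmap
mat <- t(genes)

fig <- plot_ly(
  x = colnames(mat),   # subjects along the x axis
  y = rownames(mat),   # genes along the y axis
  z = mat,             # expression values
  type = "heatmap",
  colors = "Reds"      # RColorBrewer palette name
)
fig
```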