In this post I discuss how to create custom contrasts for factor variables in R. First, let's simulate some data and code Disease status as a factor.
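A minimal sketch of what such a simulated data set might look like. The sample size, the response name `y`, the group means, and the `age` and `sex` distributions are all assumptions made for illustration; only the five Disease groups come from the post. The levels are listed explicitly so that their order does not depend on the locale's sort order.

```r
set.seed(123)  # arbitrary seed, only so the example is reproducible

# The five groups discussed in this post, in the order used by the tables
groups <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")
n_per_group <- 20  # assumed; equal group sizes keep the later algebra exact

dat <- data.frame(
  Disease = factor(rep(groups, each = n_per_group), levels = groups),
  age     = round(runif(5 * n_per_group, min = 20, max = 80)),
  sex     = factor(sample(c("F", "M"), 5 * n_per_group, replace = TRUE))
)

# Hypothetical response: each group gets a different (arbitrary) mean
dat$y <- rnorm(nrow(dat), mean = as.integer(dat$Disease))

levels(dat$Disease)
table(dat$Disease)
```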

We want the following contrasts:

Control versus all 4 diseases combined

RA versus the combination of (SLE, Scleroderma, Myositis), leaving out the Controls

Default settings

Let $x_1, x_2, x_3, x_4$ be the indicators for Myositis, RA, Scleroderma and SLE, respectively. The standard linear model R will fit is given below (for simplicity I am ignoring age and sex, but it won't make a difference when you add them to the model):

$$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$$

This is the default contrast matrix with unordered factor variables:

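|             | Myositis | RA | Scleroderma | SLE |
|-------------|----------|----|-------------|-----|
| Control     | 0        | 0  | 0           | 0   |
| Myositis    | 1        | 0  | 0           | 0   |
| RA          | 0        | 1  | 0           | 0   |
| Scleroderma | 0        | 0  | 1           | 0   |
| SLE         | 0        | 0  | 0           | 1   |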

This compares the mean of the response for the Controls to the mean of the response for Myositis, RA, Scleroderma, and SLE separately. The table is read column by column: the numbers in each column are the weights that multiply the corresponding regression coefficient for each group, e.g. in the first column Myositis is being compared to Control.
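The default matrix can be inspected directly with `contrasts()`. The sketch below assumes R's default option for unordered factors (`contr.treatment`) has not been changed, and uses a toy factor with one observation per group, which is enough to show the coding.

```r
groups <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")

# Toy factor: one observation per level, Control first so it is the reference
Disease <- factor(groups, levels = groups)

# Treatment (dummy) coding: one indicator column per non-reference level
default_contrasts <- contrasts(Disease)
default_contrasts
```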

Custom Contrasts

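Since we want only two contrasts, we want R to fit the following model:

$$E[Y] = \beta_0 + \beta_1 c_1 + \beta_2 c_2$$

where $c_1$ and $c_2$ are the two contrast-coded variables (one per column of the contrast matrix), $\beta_1$ represents the contrast estimate for the comparison between Controls and all other diseases, and $\beta_2$ represents the contrast estimate of RA versus the combination of SLE, Scleroderma and Myositis.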

To create custom contrasts, we must specify the contrast matrix as follows:

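|             | Control_vs_All | RA_vs_Myos_Scle_SLE |
|-------------|----------------|---------------------|
| Control     | 0.8            | 0.0000000           |
| Myositis    | -0.2           | -0.3333333          |
| RA          | -0.2           | 1.0000000           |
| Scleroderma | -0.2           | -0.3333333          |
| SLE         | -0.2           | -0.3333333          |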

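Again, we read the above table column by column. The groups we want to contrast should have opposite signs, and each column should sum to 0. This contrast matrix leads to the following mean response equations for each of the groups:

$$
\begin{aligned}
\mu_{\text{Control}} &= \beta_0 + 0.8\,\beta_1 \\
\mu_{\text{Myositis}} &= \beta_0 - 0.2\,\beta_1 - \tfrac{1}{3}\beta_2 \\
\mu_{\text{RA}} &= \beta_0 - 0.2\,\beta_1 + \beta_2 \\
\mu_{\text{Scleroderma}} &= \beta_0 - 0.2\,\beta_1 - \tfrac{1}{3}\beta_2 \\
\mu_{\text{SLE}} &= \beta_0 - 0.2\,\beta_1 - \tfrac{1}{3}\beta_2
\end{aligned}
$$

To solve for $\beta_0$ we can add up all five equations. Because each column of the contrast matrix sums to 0, the $\beta_1$ and $\beta_2$ terms cancel, and we get

$$\beta_0 = \tfrac{1}{5}\left(\mu_{\text{Control}} + \mu_{\text{Myositis}} + \mu_{\text{RA}} + \mu_{\text{Scleroderma}} + \mu_{\text{SLE}}\right)$$

To solve for $\beta_1$ we subtract the combined mean of $\mu_{\text{Myositis}}$, $\mu_{\text{RA}}$, $\mu_{\text{Scleroderma}}$ and $\mu_{\text{SLE}}$ from $\mu_{\text{Control}}$, which gives:

$$\beta_1 = \mu_{\text{Control}} - \tfrac{1}{4}\left(\mu_{\text{Myositis}} + \mu_{\text{RA}} + \mu_{\text{Scleroderma}} + \mu_{\text{SLE}}\right)$$

To solve for $\beta_2$ we subtract the combined mean of $\mu_{\text{Myositis}}$, $\mu_{\text{Scleroderma}}$ and $\mu_{\text{SLE}}$ from $\mu_{\text{RA}}$, which gives:

$$\mu_{\text{RA}} - \tfrac{1}{3}\left(\mu_{\text{Myositis}} + \mu_{\text{Scleroderma}} + \mu_{\text{SLE}}\right) = \tfrac{4}{3}\beta_2 \quad\Longrightarrow\quad \beta_2 = \tfrac{3}{4}\left[\mu_{\text{RA}} - \tfrac{1}{3}\left(\mu_{\text{Myositis}} + \mu_{\text{Scleroderma}} + \mu_{\text{SLE}}\right)\right]$$

Note that with these weights $\beta_2$ is three quarters of the difference between the RA mean and the combined mean of the other three diseases, not the difference itself.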
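As a quick numerical sanity check of the algebra, one can pick arbitrary coefficient values, generate the five group means implied by the contrast weights, and confirm that the three formulas recover the coefficients. The values in `beta` are made up for illustration, and `cm`, `mu` and `recovered` are hypothetical names.

```r
groups <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")

# Contrast weights from the table above, one column per contrast
cm <- cbind(Control_vs_All      = c(0.8, -0.2, -0.2, -0.2, -0.2),
            RA_vs_Myos_Scle_SLE = c(0, -1/3, 1, -1/3, -1/3))
rownames(cm) <- groups

beta <- c(10, 2, -1.5)            # arbitrary (beta0, beta1, beta2)
mu <- drop(cbind(1, cm) %*% beta) # implied group means, named by group

diseases <- c("Myositis", "RA", "Scleroderma", "SLE")
others   <- c("Myositis", "Scleroderma", "SLE")

recovered <- c(
  mean(mu),                                   # beta0: grand mean of group means
  mu[["Control"]] - mean(mu[diseases]),       # beta1: Control vs. all diseases
  0.75 * (mu[["RA"]] - mean(mu[others]))      # beta2: 3/4 of RA vs. the other three
)

all.equal(recovered, beta)
```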

First we create the contrast matrix with appropriate row and column names for clarity:
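One way to build this matrix, as a sketch: the object name `contrast_mat` is my own choice, while the row and column names follow the table above. Writing the weights as fractions avoids typing rounded decimals such as -0.3333333.

```r
groups <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")

# matrix() fills column by column: first the Control-vs-all weights,
# then the RA-vs-(Myositis, Scleroderma, SLE) weights
contrast_mat <- matrix(
  c(4/5, -1/5, -1/5, -1/5, -1/5,
    0,   -1/3,  1,   -1/3, -1/3),
  ncol = 2,
  dimnames = list(groups, c("Control_vs_All", "RA_vs_Myos_Scle_SLE"))
)

contrast_mat
colSums(contrast_mat)  # each column sums to zero, up to floating-point error
```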

Then we assign the matrix as the contrasts attribute of the Disease variable. The how.many argument specifies how many contrasts we want, so it should equal the number of columns in the contrast matrix.
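A sketch of the assignment on a toy factor (three observations per group, an arbitrary choice). One detail worth knowing: `contrasts<-` matches the rows of the matrix to the factor levels by position and overwrites the row names with `levels(x)`, so the rows must be in the same order as the levels. The names `Disease` and `contrast_mat` are illustrative.

```r
groups <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")
Disease <- factor(rep(groups, each = 3), levels = groups)

contrast_mat <- cbind(Control_vs_All      = c(4/5, -1/5, -1/5, -1/5, -1/5),
                      RA_vs_Myos_Scle_SLE = c(0, -1/3, 1, -1/3, -1/3))
rownames(contrast_mat) <- groups  # same order as levels(Disease)

# Keep exactly two contrast columns; with how.many left at its default,
# R would pad the matrix out to nlevels - 1 = 4 columns
contrasts(Disease, how.many = 2) <- contrast_mat

contrasts(Disease)
```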

Here we check to make sure that the lm fit is giving the same result as the formulas derived above:
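A self-contained sketch of such a check, under stated assumptions: the data are simulated with made-up group means, the response is called `y`, and the design is balanced (equal group sizes). Balance matters here, because with only two contrasts the model is a reduced one, and the least-squares estimates coincide with the formulas applied to the sample group means when the group sizes are equal.

```r
set.seed(42)  # arbitrary seed

groups <- c("Control", "Myositis", "RA", "Scleroderma", "SLE")
n_per_group <- 30  # assumed; balanced design

dat <- data.frame(
  Disease = factor(rep(groups, each = n_per_group), levels = groups)
)
# Hypothetical true group means, in the order of `groups`
dat$y <- rnorm(nrow(dat), mean = c(5, 7, 9, 6, 8)[as.integer(dat$Disease)])

contrast_mat <- cbind(Control_vs_All      = c(4/5, -1/5, -1/5, -1/5, -1/5),
                      RA_vs_Myos_Scle_SLE = c(0, -1/3, 1, -1/3, -1/3))
contrasts(dat$Disease, how.many = 2) <- contrast_mat

fit <- lm(y ~ Disease, data = dat)
coef(fit)

# The same three quantities computed by hand from the sample group means
m <- sapply(split(dat$y, dat$Disease), mean)
by_hand <- c(
  mean(m),
  m[["Control"]] - mean(m[c("Myositis", "RA", "Scleroderma", "SLE")]),
  0.75 * (m[["RA"]] - mean(m[c("Myositis", "Scleroderma", "SLE")]))
)

all.equal(unname(coef(fit)), by_hand)
```

The coefficient names combine the variable name with the column names of the contrast matrix, which is why descriptive column names are useful.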