In this short post, I show how to calculate polygenic risk scores (PRS) using the data.table package in R. The example uses a small dataset, but the same approach extends easily to much larger ones. The PRS for individual $i$ based on $p$ SNPs is given by:

$$PRS_i = \sum_{j=1}^{p} \beta_j \, SNP_{ij}$$

where $\beta_j$ is the beta coefficient for the $j^{th}$ SNP, and $SNP_{ij}$ is the value of the $j^{th}$ SNP for the $i^{th}$ individual.

First we load the data.table package:
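A minimal version of this step, assuming data.table is already installed (if not, `install.packages("data.table")` should take care of it):

```r
# Attach data.table so that data.table(), := and .SD are available
library(data.table)
```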

Next we create some sample data, where the rows are SNPs and the columns are individuals. This data.table also contains a column with the beta coefficients for each SNP.
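The table might look like the following sketch. The SNP identifiers, the beta values, the genotypes (coded here as 0, 1 or 2 copies of the effect allele) and the column names `snp`, `beta` and `ind1` to `ind3` are all made up purely for illustration.

```r
library(data.table)

# Hypothetical sample data: one row per SNP, one column per individual
DT <- data.table(
  snp  = c("rs1", "rs2", "rs3", "rs4"),  # made-up SNP identifiers
  beta = c(0.10, -0.20, 0.30, 0.05),     # made-up effect sizes
  ind1 = c(0L, 1L, 2L, 1L),              # genotypes of individual 1
  ind2 = c(1L, 1L, 0L, 2L),              # genotypes of individual 2
  ind3 = c(2L, 0L, 1L, 0L)               # genotypes of individual 3
)

print(DT)
```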

In situations where there are many individuals, this calculation can be tedious, because the standard approach requires typing out the name of every column to be multiplied by beta. The .SDcols argument greatly simplifies this by letting the columns be selected programmatically.

We first create two character vectors: one with the names of the columns that we want to multiply by beta, and one with the names of the new columns that will store the results:
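As a sketch, assuming a hypothetical table with a `snp` column, a `beta` column and one column per individual, both vectors can be derived from the column names instead of being typed by hand. The names `ind_cols` and `prs_cols` and the `_beta` suffix are arbitrary choices for this example.

```r
library(data.table)

# Hypothetical sample data, rebuilt here so the snippet runs on its own
DT <- data.table(
  snp  = c("rs1", "rs2", "rs3", "rs4"),
  beta = c(0.10, -0.20, 0.30, 0.05),
  ind1 = c(0L, 1L, 2L, 1L),
  ind2 = c(1L, 1L, 0L, 2L),
  ind3 = c(2L, 0L, 1L, 0L)
)

# Every column other than the SNP id and beta is assumed to be an individual
ind_cols <- setdiff(names(DT), c("snp", "beta"))

# Names of the new columns that will hold beta * genotype
prs_cols <- paste0(ind_cols, "_beta")
```

With thousands of individuals, the two vectors are built in exactly the same way.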

Now with a simple command we get the desired multiplication:
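One way to write that command is sketched below on the same hypothetical table. `.SDcols` restricts `.SD` to the individual columns, `lapply` multiplies each of them by `beta`, and `:=` adds the results to the table by reference.

```r
library(data.table)

# Hypothetical sample data, rebuilt here so the snippet runs on its own
DT <- data.table(
  snp  = c("rs1", "rs2", "rs3", "rs4"),
  beta = c(0.10, -0.20, 0.30, 0.05),
  ind1 = c(0L, 1L, 2L, 1L),
  ind2 = c(1L, 1L, 0L, 2L),
  ind3 = c(2L, 0L, 1L, 0L)
)
ind_cols <- setdiff(names(DT), c("snp", "beta"))  # columns to multiply
prs_cols <- paste0(ind_cols, "_beta")             # columns to create

# Multiply each individual's genotype column by beta, row by row
DT[, (prs_cols) := lapply(.SD, function(x) x * beta), .SDcols = ind_cols]
```

The parentheses around `prs_cols` on the left-hand side tell data.table to use the contents of the vector as column names, rather than creating a single column called `prs_cols`.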

The PRS for each individual can be calculated using the colSums function in base R:
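Putting the steps together on the hypothetical sample data gives the sketch below. Summing each weighted column over the SNPs (the rows) corresponds to the sum over $j$ in the PRS formula.

```r
library(data.table)

# Hypothetical sample data, rebuilt here so the snippet runs on its own
DT <- data.table(
  snp  = c("rs1", "rs2", "rs3", "rs4"),
  beta = c(0.10, -0.20, 0.30, 0.05),
  ind1 = c(0L, 1L, 2L, 1L),
  ind2 = c(1L, 1L, 0L, 2L),
  ind3 = c(2L, 0L, 1L, 0L)
)
ind_cols <- setdiff(names(DT), c("snp", "beta"))
prs_cols <- paste0(ind_cols, "_beta")

# Weighted genotypes: beta_j * SNP_ij for every SNP j and individual i
DT[, (prs_cols) := lapply(.SD, function(x) x * beta), .SDcols = ind_cols]

# Sum the weighted columns over SNPs to get one score per individual
prs <- colSums(DT[, ..prs_cols])
print(prs)
```

For these made-up values, the scores work out to 0.45, 0 and 0.5 for the three individuals. The result is a named numeric vector, with names taken from the `_beta` columns.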