Sahir's blog
https://sahirbhatnagar.github.io/blog/
Recent content on Sahir's blog

Variable Selection and Prediction in Competing Risk Regression
https://sahirbhatnagar.github.io/blog/2019/04/16/variable-selection-prediction-competing-risk-regression/
Tue, 16 Apr 2019 00:00:00 +0000
Objective
The main goals of this post are to:
Show one way of performing variable selection in a competing risks regression model
Evaluate the predictive performance for a list of models using resampling methods
What the data looks like
We will use the bmtcrr dataset from the casebase package available on CRAN. Here is what the data looks like:
pacman::p_load(casebase)
head(bmtcrr)
##   Sex   D   Phase Age Status Source ftime
## 1   M ALL Relapse  48      2  BM+PB 0.

Jekyll website with Hugo blog with blogdown
https://sahirbhatnagar.github.io/blog/2019/04/12/jekyll-hugo-blogdown/
Fri, 12 Apr 2019 00:00:00 +0000
I currently have a Jekyll-based academic website hosted on GitHub using the al-folio theme.
What I like about having a Jekyll-based site
Jekyll has been around for a long time and thus has extensive documentation and support online. I also like the Jekyll-based al-folio theme because:
It automatically generates publications from a BibTeX file using jekyll-scholar.
Information can be spread across multiple pages (e.

Getting Travis to Auto Push to GitHub
https://sahirbhatnagar.github.io/blog/2018/08/23/getting-travis-to-auto-push-to-github/
Thu, 23 Aug 2018 15:09:00 +0000
Following the advice given on the bookdown site, I use Travis-CI to automatically build my bookdown book and push it to gh-pages. I always forget how to do this, so I’m writing these notes to supplement what is already written there.
Follow these instructions on GitHub to create a personal access token (PAT)
Copy the generated PAT to your clipboard
From the command line and within the root directory of your repository hosting the source of the bookdown, enter the following command:

Documenting R Packages
https://sahirbhatnagar.github.io/blog/2018/04/03/documenting-r-packages/
Tue, 03 Apr 2018 15:09:00 +0000
Some brief notes on my R package documentation workflow.
styler
I first use the styler package for pretty-printing of R source code
pacman::p_load_gh("r-lib/styler")
# run style_dir on R directory
styler::style_dir("./R")
prefixer
Next I use the prefixer package to prefix all my functions with their NAMESPACE
pacman::p_load_gh("dreamRs/prefixer")
# launch the addin via RStudio's Addins menu
sinew
In a third step, I use the sinew package to generate a roxygen2 skeleton on all my R source code files

I Am Color Blind
https://sahirbhatnagar.github.io/blog/2018/04/03/i-am-color-blind/
Tue, 03 Apr 2018 15:09:00 +0000
I recently gave a mini-course on regression trees. As I talked about the red region in the following figure:
one of the audience members said they had no clue what I was referring to. It turns out they were color blind and could not identify the red region. Yikes!
Here is a colorblind-friendly palette courtesy of Cookbook for R
cbbPalette

Mixed Models with Kinship in R
https://sahirbhatnagar.github.io/blog/2017/10/12/mixed-models-with-kinship-in-r/
Thu, 12 Oct 2017 15:09:00 +0000
In this post, I describe how to estimate a kinship matrix and subsequently fit a mixed model using that estimated kinship. In particular, I show how this can be done on an arbitrary matrix of genotype data, which is not stored in plink format. I also show how to deal with missing genotypes.
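As a rough illustration of the idea (not the gaston-based code from the post), a kinship matrix can be estimated from any numeric genotype matrix in base R. The sketch below assumes genotypes coded 0/1/2, fills missing genotypes with the per-SNP mean, standardizes each SNP by its estimated allele frequency, and averages the cross-products. All sizes, object names, and simulated values are hypothetical.

```r
set.seed(1)
n <- 20    # subjects (hypothetical, kept small for illustration)
p <- 200   # SNPs (hypothetical)

# Simulate an n x p genotype matrix coded 0/1/2, then knock out some entries
maf <- runif(p, 0.1, 0.5)
X <- sapply(maf, function(f) rbinom(n, size = 2, prob = f))
X[sample(length(X), 50)] <- NA

# Estimated allele frequency per SNP, ignoring missing genotypes
freq <- colMeans(X, na.rm = TRUE) / 2

# Drop SNPs that are monomorphic (or entirely missing) in this sample
keep <- !is.na(freq) & freq > 0 & freq < 1
X <- X[, keep, drop = FALSE]
freq <- freq[keep]

# Mean imputation: replace each missing genotype by 2 * allele frequency
miss <- which(is.na(X), arr.ind = TRUE)
X[miss] <- 2 * freq[miss[, "col"]]

# Standardize each SNP and average the cross-products over SNPs
Z <- sweep(X, 2, 2 * freq, "-")
Z <- sweep(Z, 2, sqrt(2 * freq * (1 - freq)), "/")
K <- tcrossprod(Z) / ncol(Z)

dim(K)
```

The resulting K is an n x n symmetric matrix that could then be supplied to a mixed-model routine as the covariance structure of the random effect.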
Load Packages
# install.packages("gaston")
library(gaston)
Set Simulation Parameters
# number of subjects
n <- 1000
# number of genotypes
p <- 1e4
# number of causal genotypes
p_causal <- 50
# Signal to noise ratio
signal_to_noise_ratio <- 2
# vector of allele frequencies from which to sample
probs <- c(0.

Polygenic Risks Scores with data.table in R
https://sahirbhatnagar.github.io/blog/2017/08/11/polygenic-risks-scores-with-data.table-in-r/
Fri, 11 Aug 2017 15:09:00 +0000
<p>In this short post, I show how to calculate <a href="https://en.wikipedia.org/wiki/Polygenic_score">polygenic risk scores</a> (PRS) using the <a href="https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html"><code>data.table</code></a> package in <code>R</code>. I show an example on a small dataset, but the approach extends easily to much larger datasets. The PRS based on <code>$p$</code> SNPs is given by:</p>
<p><code>$$
PRS_i = \sum_{j=1}^{p}\beta_j \times SNP_{ij}
$$</code></p>
<p>where <code>$\beta_j$</code> is the beta coefficient for the <code>$j^{th}$</code> SNP, and <code>$SNP_{ij}$</code> is the value of <code>$j^{th}$</code> SNP for the <code>$i^{th}$</code> individual.</p>

Bayesian Non-Linear Multilevel Models
https://sahirbhatnagar.github.io/blog/2017/02/23/bayesian-non-linear-multilevel-models/
Thu, 23 Feb 2017 15:09:00 +0000
<p>Consider the following repeated measures model:</p>
<p><code>$$y_{ij} =\beta_0 + \beta_1 a_{ij} + \beta_1^2 b_{ij} + \mu_i + \varepsilon_{ij}$$</code></p>
<p>for <code>$i = 1, \ldots, n$</code>, <code>$j = 1, 2$</code> where <code>$n$</code> is the sample size, <code>$j$</code> represents the index of the repeated measure, i.e., each subject has two measurements, <code>$\mu_i$</code> is a normally distributed random effect, <code>$\varepsilon_{ij}$</code> is a normally distributed error term, <code>$y_{ij}$</code> is the continuous response, and <code>$a_{ij}, b_{ij}$</code> are covariates. This is a multilevel model because of the nested structure of the data, and it is also non-linear in the <code>$\beta_1$</code> parameter. In this post I simulate some data under this model and try to leverage Bayesian computation techniques to estimate the parameters using the <a href="https://github.com/paul-buerkner/brms">brms</a> package, an interface for fitting Bayesian generalized (non-)linear multilevel models using <a href="http://mc-stan.org/">Stan</a>.</p>
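<p>A minimal sketch of simulating data under the model above, using base R only. The parameter values, standard deviations, covariate distributions, and object names are assumptions chosen for illustration and are not taken from the post.</p>

```r
set.seed(123)

n      <- 100   # number of subjects (assumed)
beta0  <- 1     # intercept (assumed)
beta1  <- 0.5   # enters linearly through a and squared through b (assumed value)
sd_mu  <- 1     # SD of the subject-level random effect (assumed)
sd_eps <- 0.5   # SD of the error term (assumed)

# Two repeated measures per subject: j = 1, 2
id <- rep(seq_len(n), each = 2)
j  <- rep(1:2, times = n)

# Covariates a_ij and b_ij (standard normal, an arbitrary choice)
a <- rnorm(2 * n)
b <- rnorm(2 * n)

# Subject-level random effect mu_i, shared by both measurements of a subject
mu <- rnorm(n, mean = 0, sd = sd_mu)

# y_ij = beta0 + beta1 * a_ij + beta1^2 * b_ij + mu_i + eps_ij
y <- beta0 + beta1 * a + beta1^2 * b + mu[id] + rnorm(2 * n, mean = 0, sd = sd_eps)

sim <- data.frame(id = id, j = j, a = a, b = b, y = y)
head(sim)
```

<p>Because the coefficient on <code>b</code> is the square of the coefficient on <code>a</code>, an ordinary linear mixed model would estimate two unrelated slopes; a non-linear formulation is what ties them to a single parameter.</p>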
<p><img src="http://i.imgur.com/r0IXN1w.jpg" alt="" /></p>

Limma Moderated and Ordinary t-statistics
https://sahirbhatnagar.github.io/blog/2017/02/07/limma-moderated-and-ordinary-t-statistics/
Tue, 07 Feb 2017 15:09:00 +0000
<p>When analyzing large amounts of genetic and genomic data, the first line of analysis is usually some sort of univariate test. That is, conduct a statistical test for each SNP, CpG site, or gene and then correct for multiple testing. The <a href="https://bioconductor.org/packages/release/bioc/html/limma.html">limma</a> package on Bioconductor is a popular tool for computing <em>moderated</em> t-statistics using a combination of the <code>limma::lmFit</code> and <code>limma::eBayes</code> functions. In this post, I show how to calculate the <em>ordinary</em> t-statistics from <code>limma</code> output.</p>

A Plain Markdown Post
https://sahirbhatnagar.github.io/blog/2016/12/30/a-plain-markdown-post/
Fri, 30 Dec 2016 21:49:57 -0700
This is a post written in plain Markdown (*.md) instead of R Markdown (*.Rmd). The major differences are:
You cannot run any R code in a plain Markdown document, whereas in an R Markdown document, you can embed R code chunks (```{r});
A plain Markdown post is rendered through Blackfriday, and an R Markdown document is compiled by rmarkdown and Pandoc.
There are many differences in syntax between Blackfriday’s Markdown and Pandoc’s Markdown.

About
https://sahirbhatnagar.github.io/blog/about/
Thu, 05 May 2016 21:48:51 -0700
This is a “hello world” example website for the blogdown package. The theme was forked from @jrutheiser/hugo-lithium-theme and modified by Yihui Xie.

Statistical Power in t tests with Unequal Group Sizes
https://sahirbhatnagar.github.io/blog/2016/02/25/statistical-power-in-t-tests-with-unequal-group-sizes/
Thu, 25 Feb 2016 15:09:00 +0000
<p>When performing <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">Student’s t-test</a> to compare the difference in means between two groups, it is a useful exercise to determine the effect of unequal sample sizes in the comparison groups on power. Designs with large imbalances generally lack adequate statistical power to detect even large effect sizes associated with a factor, leading to a high Type II error rate as shown in the figure below:</p>
<p><img src="https://sahirbhatnagar.github.io/blog/figure/posts/2016-02-25-power_ttest_sample_size/unnamed-chunk-2-1.png" alt="plot of chunk unnamed-chunk-2" /></p>

Math Expressions with Facets in ggplot2
https://sahirbhatnagar.github.io/blog/2016/02/08/math-expressions-with-facets-in-ggplot2/
Mon, 08 Feb 2016 15:09:00 +0000
<p>In this post I show how we can use <code>$\LaTeX$</code> math expressions to label the panels in facets to produce the following plot:</p>
<p><img src="https://sahirbhatnagar.github.io/blog/figure/posts/2016-02-08-facet_wrap_labels/unnamed-chunk-1-1.png" title="plot of chunk unnamed-chunk-1" alt="plot of chunk unnamed-chunk-1" style="display: block; margin: auto;" /></p>

Hello R Markdown
https://sahirbhatnagar.github.io/blog/2015/07/23/hello-r-markdown/
Thu, 23 Jul 2015 21:13:14 -0500
R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.

Heatmaps in R
https://sahirbhatnagar.github.io/blog/2015/06/10/heatmaps-in-r/
Wed, 10 Jun 2015 15:09:00 +0000
<p>In every statistical analysis, the first thing one should do is try to visualise the data before any modeling. In microarray studies, a common visualisation is a heatmap of gene expression data. In this post I simulate some gene expression data and visualise it using the <code>pheatmap</code> function from the <a href="http://cran.r-project.org/web/packages/pheatmap/">pheatmap</a> package in <code>R</code>. You will also need the <code>mvrnorm</code> function from the <a href="http://cran.r-project.org/web/packages/MASS/index.html">MASS</a> library to simulate from a multivariate normal distribution, and the <code>brewer.pal</code> function from the <a href="http://cran.r-project.org/web/packages/RColorBrewer/index.html">RColorBrewer</a> library for easier customization of colors.</p>

Contrasts in R
https://sahirbhatnagar.github.io/blog/2015/03/04/contrasts-in-r/
Wed, 04 Mar 2015 15:09:00 +0000
<p>In this post I discuss how to create custom contrasts for factor variables in <code>R</code>. First, let's create some simulated data and encode disease status as a factor:</p>
<pre class="r"><code>
Disease <- c(rep("RA", 5), rep("SLE", 5), rep("Scleroderma", 5),
rep("Myositis", 5), rep("Control", 5))
set.seed(1234)
sex <- rbinom(25,1, 0.5)
age <- rnorm(25, 40, 5)
y <- rnorm(25, 0.5, 0.12)
data <- data.frame(y,sex,age,Disease=factor(Disease))
str(data)
</code></pre>
<pre class="r"><code>
## 'data.frame': 25 obs. of 4 variables:
## $ y : num 0.506 0.323 0.552 0.492 0.513 ...
## $ sex : int 0 1 1 1 1 1 0 0 1 1 ...
## $ age : num 44.4 46.9 31.6 36.9 40.1 ...
## $ Disease: Factor w/ 5 levels "Control","Myositis",..: 3 3 3 3 3 5 5 5 5 5 ...
</code></pre>
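<p>Before defining custom contrasts, it can help to look at the coding <code>R</code> applies by default. The sketch below rebuilds only the <code>Disease</code> factor and prints its default treatment contrasts. The level order is set explicitly to match the output above, since alphabetical sorting of "SLE" and "Scleroderma" can differ between locales; the object name <code>default_contrasts</code> is introduced here for illustration.</p>

```r
# Rebuild the Disease factor with an explicit level order
Disease <- factor(
  rep(c("RA", "SLE", "Scleroderma", "Myositis", "Control"), each = 5),
  levels = c("Control", "Myositis", "RA", "Scleroderma", "SLE")
)

levels(Disease)

# Default coding for an unordered factor: treatment (dummy) contrasts,
# with the first level ("Control") as the reference category
default_contrasts <- contrasts(Disease)
default_contrasts
```

<p>Each column compares one disease with the reference level, so none of the default columns corresponds to a comparison against several groups combined.</p>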
<p>We want the following contrasts:</p>
<ul>
<li>Control versus all 4 diseases combined</li>
<li>RA versus the combination of (SLE, Scleroderma, Myositis), leaving out the Controls</li>
</ul>

Gradient Descent
https://sahirbhatnagar.github.io/blog/2014/11/15/gradient-descent/
Sat, 15 Nov 2014 15:09:00 +0000
<p>I am taking the Machine Learning course on <a href="https://class.coursera.org/ml-007/lecture">Coursera</a> being taught by Andrew Ng. It is turning out to be useful so far, and he has presented the material clearly. It’s a nice introduction to the Machine Learning/Computer Science language, since I come from a statistics background.</p>
<p>I learned about gradient descent today for simple linear regression. The following is my code in R and I compare it to the <em>lm</em> function in base <em>R</em>.</p>

CDPATH in Bash
https://sahirbhatnagar.github.io/blog/2014/07/04/cdpath-in-bash/
Fri, 04 Jul 2014 11:03:16 -0400
<p>Instead of constantly typing the full path when using the <code>cd</code> command, <code>BASH</code> has a built-in feature called <code>CDPATH</code>. Thanks to <em>lhunath</em> who explained in this <a href="http://stackoverflow.com/questions/670488/how-to-manage-long-paths-in-bash">SO Post</a> how to use this feature.</p>