Gradient Descent
15 Nov 2014
I am taking the Machine Learning course on Coursera being taught by Andrew Ng. It is turning out to be useful so far, and he has presented the material clearly. Itâ€™s a nice introduction to the Machine Learning/Computer Science language, since I come from a statistics background.

I learned about gradient descent today for simple linear regression. The following is my code in R and I compare it to the lm function in base R .
I am using the Prostate dataset from the lasso2 package. The model I am fitting is:

#prostate cancer data set
library ( lasso2 )

```
## R Package to solve regression problems while imposing
## an L1 constraint on the parameters. Based on S-plus Release 2.1
## Copyright (C) 1998, 1999
## Justin Lokhorst <jlokhors@stats.adelaide.edu.au>
## Berwin A. Turlach <bturlach@stats.adelaide.edu.au>
## Bill Venables <wvenable@stats.adelaide.edu.au>
##
## Copyright (C) 2002
## Martin Maechler <maechler@stat.math.ethz.ch>
```

data ( Prostate )
# hypothesis
hypothesis <- function ( x , theta0 , theta1 ){
h <- theta0 + theta1 * x
return ( h )
}
# Jacobian
deriv <- function ( x , y , theta0 , theta1 ){
dt0 <- ( length ( x )) ^ ( -1 ) * sum (( hypothesis ( x , theta0 , theta1 ) - y ))
dt1 <- ( length ( x )) ^ ( -1 ) * t ( x ) %*% ( hypothesis ( x , theta0 , theta1 ) - y )
return ( c ( dt0 , dt1 ))
}
theta <- c ( 0 , 0 )
alpha <- 0.5
X <- Prostate $ lcavol
Y <- Prostate $ lpsa
i = 1
#
theta.star <- deriv ( Prostate $ lcavol , Prostate $ lpsa , theta [ 1 ], theta [ 2 ])
# set convergence threshold
threshold <- 1e-7
# logical to check if threshold has been achieved
continue = TRUE
while ( continue ){
theta [ 1 ] <- theta.star [ 1 ] - alpha * deriv ( x = X , y = Y , theta.star [ 1 ], theta.star [ 2 ])[ 1 ]
theta [ 2 ] <- theta.star [ 2 ] - alpha * deriv ( x = X , y = Y , theta.star [ 1 ], theta.star [ 2 ])[ 2 ]
continue <- ( abs (( theta.star - theta )[ 1 ]) > threshold & abs (( theta.star - theta )[ 2 ]) > threshold )
theta.star [ 1 ] <- theta [ 1 ]
theta.star [ 2 ] <- theta [ 2 ]
i = i +1
}
# number of iterations
i

`## [1] 214`

# beta0 and beta1
theta.star

`## [1] 1.5072975 0.7193205`

# compare to lm
fit <- lm ( lpsa ~ lcavol , data = Prostate )
summary ( fit )

```
##
## Call:
## lm(formula = lpsa ~ lcavol, data = Prostate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.67624 -0.41648 0.09859 0.50709 1.89672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50730 0.12194 12.36 <2e-16 ***
## lcavol 0.71932 0.06819 10.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7875 on 95 degrees of freedom
## Multiple R-squared: 0.5394, Adjusted R-squared: 0.5346
## F-statistic: 111.3 on 1 and 95 DF, p-value: < 2.2e-16
```

If you have any questions or comments, please post them below.