Bootstrap

In this exercise, I compare a Montecarlo estimator with a bootstrap one.

Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of \(X\) and \(Y\), respectively, where \(X\) and \(Y\) are random quantities. We will invest a fraction \(\alpha\) of our money in \(X\), and will invest the remaining \(1-\alpha\) in \(Y\) . Since there is variability associated with the returns on these two assets, we wish to choose \(\alpha\) to minimize the total risk, or variance, of our investment. In other words, we want to minimize \(Var(\alpha X + (1 − \alpha)Y )\). One can show that the value that minimizes the risk is given by

\[\begin{equation} \alpha=\frac{\sigma^2_Y-\sigma_{XY}}{\sigma^2_X+\sigma^2_Y-2\sigma_{XY}}\label{eq::1} \end{equation}\]

where \(\sigma^2_X = Var(X)\), \(\sigma^2_Y = Var(Y)\), and \(\sigma^2_{XY} = Cov(X,Y)\). The interest is to estimate \(\alpha\), based on the bootstrap technique.

Simulate \(100\) observations with set.seed(123), such that

\[X,Y\sim N_2\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1&0.5\\ 0.5 & 1.25 \end{pmatrix} \right]\]

Call this data set original. Compute \(\hat{\alpha}\) with \(s^2_X\), \(s^2_Y\) and \(s_{XY}\)

library(mnormt)
mean=c(0,0)
varcov=matrix(c(1,0.5,0.5,1.25),2,2)

set.seed(123)
original=rmnorm(n=100,mean=mean,varcov=varcov)

vX=var(original[,1])
vY=var(original[,2])
covXY=cov(original[,1],original[,2])

alpha.or=(vY-covXY)/(vX+vY-2*covXY)
vX

## [1] 0.8090975

vY

## [1] 1.030543

covXY

## [1] 0.2542461

alpha.or

## [1] 0.5831784

Using the set.seed(Sys.time()) function, remove set.seed(123) . By using the lapply function, make a list called data.simulation and keep \(1000\) data sets with the same previous characteristics.

set.seed(Sys.time())
n.dataset=1000
data.simulation=lapply(1:n.dataset, 
                       function(a) rmnorm(n=100,mean=mean,varcov=varcov))
class(data.simulation[[1]])

## [1] "matrix"

Using both the lapply and the apply functions compute \(s^2_X\), \(s^2_Y\) and \(s_{XY}\) from each element of data.simulation. Now, compute \(\hat{\alpha}\) from each element of data.simulation. Keep them in only one data set of \(\hat{\alpha}\).

varXY.sim=lapply(1:n.dataset, 
                 function(a) apply(data.simulation[[a]],2,var))

head(varXY.sim)

## [[1]]
## [1] 1.078819 1.363634
## 
## [[2]]
## [1] 1.039702 1.331717
## 
## [[3]]
## [1] 0.920765 1.033606
## 
## [[4]]
## [1] 1.097335 1.259797
## 
## [[5]]
## [1] 1.091336 1.250979
## 
## [[6]]
## [1] 1.236733 1.240761

covXY.sim=lapply(1:n.dataset, 
                 function(a) cov(data.simulation[[a]][,1],data.simulation[[a]][,2]))

head(covXY.sim)

## [[1]]
## [1] 0.5511685
## 
## [[2]]
## [1] 0.4647191
## 
## [[3]]
## [1] 0.2827962
## 
## [[4]]
## [1] 0.2983015
## 
## [[5]]
## [1] 0.3985677
## 
## [[6]]
## [1] 0.5235005

alpha.sim=lapply(1:n.dataset, 
                 function(a) 
                 (varXY.sim[[a]][2]-covXY.sim[[a]])/(varXY.sim[[a]][1]+varXY.sim[[a]][2]-2*covXY.sim[[a]]))

head(alpha.sim)

## [[1]]
## [1] 0.6062652
## 
## [[2]]
## [1] 0.6012547
## 
## [[3]]
## [1] 0.5406261
## 
## [[4]]
## [1] 0.5461401
## 
## [[5]]
## [1] 0.5516581
## 
## [[6]]
## [1] 0.5014081

alpha=do.call('rbind',alpha.sim)
head(alpha)

##           [,1]
## [1,] 0.6062652
## [2,] 0.6012547
## [3,] 0.5406261
## [4,] 0.5461401
## [5,] 0.5516581
## [6,] 0.5014081

Make a histogram of \(\hat{\alpha}\) with a vertical red line in \(\hat{\alpha}=0.6\) (the true value of \(\alpha\)). Compute the mean and s.d of \(\hat{\alpha}\). This is the \(\alpha\) Montecarlo Estimator and it standard error.

hist(alpha)
abline(v=0.6,col="red")

mean(alpha);sd(alpha)

## [1] 0.5995154

## [1] 0.07850607

Bootstrap function: Make a function called alpha.function that returns \(\hat{\alpha}\) in any data set based on applying the \(\alpha\) function to the observations indexed by the argument index. Apply your function in order to compute \(\hat{\alpha}\) from data set using all the observations. Now, repeat this \(1000\) times using the sample() function to randomly select \(100\) observations from the range \(1\) to \(100\), with replacement. Make a histogram of \(\hat{\alpha}\) with a vertical red line in \(\hat{\alpha}=0.6\) (the true value of \(\alpha\)). Compute the mean and s.d of \(\hat{\alpha}\). Compare the results with them obtained in 4.

alpha.function=function(data,index)
{
 X=data[index,1]
 Y=data[index,2]
 return((var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)))
}
alpha.function(original,1:dim(original)[1])

## [1] 0.5831784

## This value must equal to alpha.or

alpha.sample=do.call('rbind', 
                     lapply(1:1000, 
                            function(a) alpha.function(original,sample(1:100,100,replace=TRUE))))
head(alpha.sample)

##           [,1]
## [1,] 0.5967613
## [2,] 0.6441368
## [3,] 0.4665356
## [4,] 0.4673642
## [5,] 0.5233725
## [6,] 0.5335732

par(mfrow=c(1,2))
hist(alpha.sample)
abline(v=0.6,col="red")
hist(alpha)
abline(v=0.6,col="red")

mean(alpha);sd(alpha) ## results from 4.)

## [1] 0.5995154

## [1] 0.07850607

mean(alpha.sample);sd(alpha.sample) ## bootstrap results

## [1] 0.5844732

## [1] 0.06534835

From the boot library use the boot function with \(R=1000\) (The number of bootstrap replicates) bootstrap estimates for \(\alpha\). Compare on the results obtained in 5.

library(boot)
boot(original,alpha.function,R=1000)

## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = original, statistic = alpha.function, R = 1000)
## 
## 
## Bootstrap Statistics :
##      original       bias    std. error
## t1* 0.5831784 -0.001275389  0.06517734

mean(alpha.sample);sd(alpha.sample) ## results from 5.)

## [1] 0.5844732

## [1] 0.06534835

Bootstrap

Example based on Chapter 5.2 to An Introduction to Statistical Learning. With applications in R

Eduardo Alarcón-Bustamante

December 10, 2021