Title: | Bias Corrected Bootstrap Confidence Intervals |
---|---|
Description: | Computation of bootstrap confidence intervals in an almost automatic fashion as described in Efron and Narasimhan (2020, <doi:10.1080/10618600.2020.1714633>). |
Authors: | Bradley Efron [aut], Balasubramanian Narasimhan [aut, cre] |
Maintainer: | Balasubramanian Narasimhan <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2-3 |
Built: | 2024-10-27 05:50:20 UTC |
Source: | https://github.com/bnaras/bcaboot |
Bootstrap confidence intervals depend on three elements: (a) the cumulative distribution of the bootstrap replications, (b) the bias-correction, and (c) the acceleration number that measures the rate of change in the standard deviation of the estimate as the data changes. The first two of these depend only on the bootstrap distribution, and not how it is generated: parametrically or non-parametrically. Therefore, the only difference in a parametric bca analysis would lie in the nonparametric estimation of the acceleration, often a negligible error.
This routine computes nonparametric confidence intervals for bootstrap estimates. For reproducibility, save or set the random number state before calling this routine.
bcajack( x, B, func, ..., m = nrow(x), mr = 5, K = 2, J = 10, alpha = c(0.025, 0.05, 0.1, 0.16), verbose = TRUE )
bcajack( x, B, func, ..., m = nrow(x), mr = 5, K = 2, J = 10, alpha = c(0.025, 0.05, 0.1, 0.16), verbose = TRUE )
x |
an |
B |
number of bootstrap replications. It can also be a vector
of |
func |
function |
... |
additional arguments for |
m |
an integer less than or equal to |
mr |
if |
K |
a non-negative integer. If |
J |
the number of groups into which the bootstrap replications are split |
alpha |
percentiles desired for the bca confidence limits. One
only needs to provide |
verbose |
logical for verbose progress messages |
Bootstrap confidence intervals depend on three elements:
the cdf of the bootstrap replications
,
the bias-correction number
where
is the original estimate
the acceleration number that measures the rate of
change in
as
, the data changes.
The first two of these depend only on the bootstrap distribution,
and not how it is generated: parametrically or
non-parametrically. Program bcajack can be used in a hybrid fashion
in which the vector tt
of B bootstrap replications is first
generated from a parametric model.
So, in the diabetes example below, we might first draw bootstrap
samples where
and
were obtained from
lm(y~X)
; each would then provide a bootstrap
replication
tstar = rfun(cbind(X, ystar))
. Then we could get bca
intervals from bcajack(Xy, tt, rfun ....)
with tt
,
the vector of B tstar
values. The only difference from a full
parametric bca analysis would lie in the nonparametric estimation
of , often a negligible error.
a named list of several items
lims : first column shows the estimated bca confidence limits
at the requested alpha percentiles. These can be compared with
the standard limits , third column. The second column
jacksd
gives the internal standard errors for the bca limits,
quite small in the example. Column 4, pct
, gives the
percentiles of the ordered B bootstrap replications
corresponding to the bca limits, eg the 897th largest
replication equalling the .975 bca limit .557.
stats : top line of stats shows 5 estimates: theta is
, original point estimate of the parameter of
interest;
sdboot
is its bootstrap estimate of standard error;
z0
is the bca bias correction value, in this case quite
negative; a
is the acceleration, a component of the bca
limits (nearly zero here); sdjack
is the jackknife estimate
of standard error for theta. Bottom line gives the internal
standard errors for the five quantities above. This is
substantial for z0
above.
B.mean : bootstrap sample size B, and the mean of the B
bootstrap replications
ustats : The bias-corrected estimator 2 * t0 - mean(tt)
,
and an estimate sdu
of its sampling error
seed : The random number state for reproducibility
DiCiccio T and Efron B (1996). Bootstrap confidence intervals. Statistical Science 11, 189-228
Efron B (1987). Better bootstrap confidence intervals. JASA 82 171-200
B. Efron and B. Narasimhan. Automatic Construction of Bootstrap Confidence Intervals, 2018.
data(diabetes, package = "bcaboot") Xy <- cbind(diabetes$x, diabetes$y) rfun <- function(Xy) { y <- Xy[, 11] X <- Xy[, 1:10] summary(lm(y~X) )$adj.r.squared } set.seed(1234) ## n = 442 = 34 * 13 bcajack(x = Xy, B = 1000, func = rfun, m = 34, verbose = FALSE)
data(diabetes, package = "bcaboot") Xy <- cbind(diabetes$x, diabetes$y) rfun <- function(Xy) { y <- Xy[, 11] X <- Xy[, 1:10] summary(lm(y~X) )$adj.r.squared } set.seed(1234) ## n = 442 = 34 * 13 bcajack(x = Xy, B = 1000, func = rfun, m = 34, verbose = FALSE)
This function is a version of bcajack
that allows
all the recomputations of the original statistic function
to be carried out separately. This is an advantage
if
is time-consuming, in which case the B
replications for the nonparametric bca calculations might need
to be done on a distributed basis.
To use bcajack2
in this mode, we first compute a list Blist
via
Blist <- list(Y = Y, tt = tt, t0 = t0)
. Here tt
is a vector of
length B
having i-th entry tt[i] <- func(x[Ii,], ...)
, where x
is the data matrix and
Ii
is a bootstrap vector
of (observation) indices. Y
is a B
by count matrix,
whose i-th row is the counts corresponding to
Ii
. For example if
n = 5 and Ii = (2, 5, 2, 1, 4)
, then Yi = (1, 2, 0, 1, 1)
. Having computed Blist
, bcajack2
is invoked as
bcajack2(Blist)
without need to enter the function .
bcajack2( x, B, func, ..., m = nrow(x), mr, pct = 0.333, K = 2, J = 12, alpha = c(0.025, 0.05, 0.1, 0.16), verbose = TRUE )
bcajack2( x, B, func, ..., m = nrow(x), mr, pct = 0.333, K = 2, J = 12, alpha = c(0.025, 0.05, 0.1, 0.16), verbose = TRUE )
x |
an |
B |
number of bootstrap replications. |
func |
function |
... |
additional arguments for |
m |
an integer less than or equal to |
mr |
if |
pct |
|
K |
a non-negative integer. If |
J |
the number of groups into which the bootstrap replications are split |
alpha |
percentiles desired for the bca confidence limits. One
only needs to provide |
verbose |
logical for verbose progress messages |
a named list of several items
lims : first column shows the estimated bca confidence limits
at the requested alpha percentiles. These can be compared with
the standard limits , third column. The second column
jacksd
gives the internal standard errors for the bca limits,
quite small in the example. Column 4, pct
, gives the
percentiles of the ordered B bootstrap replications
corresponding to the bca limits, eg the 897th largest
replication equalling the .975 bca limit .557.
stats : top line of stats shows 5 estimates: theta is
, original point estimate of the parameter of
interest;
sdboot
is its bootstrap estimate of standard error;
z0
is the bca bias correction value, in this case quite
negative; a
is the acceleration, a component of the bca
limits (nearly zero here); sdjack
is the jackknife estimate
of standard error for theta. Bottom line gives the internal
standard errors for the five quantities above. This is
substantial for z0
above.
B.mean : bootstrap sample size B, and the mean of the B
bootstrap replications
ustats : The bias-corrected estimator 2 * t0 - mean(tt)
,
and an estimate sdu
of its sampling error
seed : The random number state for reproducibility
data(diabetes, package = "bcaboot") Xy <- cbind(diabetes$x, diabetes$y) rfun <- function(Xy) { y <- Xy[, 11] X <- Xy[, 1:10] summary(lm(y~X) )$adj.r.squared } set.seed(1234) bcajack2(x = Xy, B = 1000, func = rfun, m = 40, verbose = FALSE)
data(diabetes, package = "bcaboot") Xy <- cbind(diabetes$x, diabetes$y) rfun <- function(Xy) { y <- Xy[, 11] X <- Xy[, 1:10] summary(lm(y~X) )$adj.r.squared } set.seed(1234) bcajack2(x = Xy, B = 1000, func = rfun, m = 40, verbose = FALSE)
bcapar computes parametric bootstrap confidence intervals for a real-valued parameter theta in a p-parameter exponential family. It is described in Section 4 of the reference below.
bcapar( t0, tt, bb, alpha = c(0.025, 0.05, 0.1, 0.16), J = 10, K = 6, trun = 0.001, pct = 0.333, cd = 0, func )
bcapar( t0, tt, bb, alpha = c(0.025, 0.05, 0.1, 0.16), J = 10, K = 6, trun = 0.001, pct = 0.333, cd = 0, func )
t0 |
Observed estimate of theta, usually by maximum likelihood. |
tt |
A vector of parametric bootstrap replications of theta of
length |
bb |
A |
alpha |
percentiles desired for the bca confidence limits. One
only needs to provide |
J , K
|
Parameters controlling the jackknife estimates of Monte
Carlo error: |
trun |
Truncation parameter used in the calculation of the
acceleration |
pct |
Proportion of "nearby" b vectors used in the calculation
of |
cd |
If cd is 1 the bca confidence density is also returned; see Section 11.6 in reference Efron and Hastie (2016) below |
func |
Function |
a named list of several items:
lims : Bca confidence limits (first column) and the standard
limits (fourth column). Also the abc limits (fifth column) if
func
is provided. The second column, jacksd
, are the
jackknife estimates of Monte Carlo error; pct
, the third
column are the proportion of the replicates tt
less than each
bcalim
value
stats : Estimates and their jackknife Monte Carlo errors:
theta
= ;
sd
, the bootstrap standard deviation
for ;
a
the acceleration estimate; az
another
acceleration estimate that depends less on extreme values of tt
;
z0
the bias-correction estimate; A
the big-A measure of raw
acceleration; sdd
delta method estimate for standard deviation of
;
mean
the average of tt
abcstats : The abc estimates of a
and z0
, returned if func
was provided
ustats : The bias-corrected estimator 2 * t0 - mean(tt)
. ustats
gives ustat
, an estimate sdu
of its sampling error, and jackknife
estimates of monte carlo error for both ustat
and sdu
. Also given
is B
, the number of bootstrap replications
seed : The random number state for reproducibility
DiCiccio T and Efron B (1996). Bootstrap confidence intervals. Statistical Science 11, 189-228
T. DiCiccio and B. Efron. More accurate confidence intervals in exponential families. Biometrika (1992) p231-245.
Efron B (1987). Better bootstrap confidence intervals. JASA 82, 171-200
B. Efron and T. Hastie. Computer Age Statistical Inference. Cambridge University Press, 2016.
B. Efron and B. Narasimhan. Automatic Construction of Bootstrap Confidence Intervals, 2018.
data(diabetes, package = "bcaboot") X <- diabetes$x y <- scale(diabetes$y, center = TRUE, scale = FALSE) lm.model <- lm(y ~ X - 1) mu.hat <- lm.model$fitted.values sigma.hat <- stats::sd(lm.model$residuals) t0 <- summary(lm.model)$adj.r.squared y.star <- sapply(mu.hat, rnorm, n = 1000, sd = sigma.hat) tt <- apply(y.star, 1, function(y) summary(lm(y ~ X - 1))$adj.r.squared) b.star <- y.star %*% X set.seed(1234) bcapar(t0 = t0, tt = tt, bb = b.star)
data(diabetes, package = "bcaboot") X <- diabetes$x y <- scale(diabetes$y, center = TRUE, scale = FALSE) lm.model <- lm(y ~ X - 1) mu.hat <- lm.model$fitted.values sigma.hat <- stats::sd(lm.model$residuals) t0 <- summary(lm.model)$adj.r.squared y.star <- sapply(mu.hat, rnorm, n = 1000, sd = sigma.hat) tt <- apply(y.star, 1, function(y) summary(lm(y ~ X - 1))$adj.r.squared) b.star <- y.star %*% X set.seed(1234) bcapar(t0 = t0, tt = tt, bb = b.star)
bcaplot
uses the output of bcajack
,
bcajack2
, or bcapar
to plot bca and standard
confidence limits for the parameter of interest.
bcaplot( vl, main = " ", xlab = "coverage", ylab = "limits", alpha = c(0.025, 0.05, 0.1, 0.16), ylim, xlim, add = 0, sub = "black=bca, green=standard", sw = 1, ... )
bcaplot( vl, main = " ", xlab = "coverage", ylab = "limits", alpha = c(0.025, 0.05, 0.1, 0.16), ylim, xlim, add = 0, sub = "black=bca, green=standard", sw = 1, ... )
vl |
output of |
main |
The main caption (can be empty) |
xlab |
The x axis label (supplied if not specified) |
ylab |
The y axis labels (supplied if not specified) |
alpha |
Coverages are |
ylim |
y axis plot limits set automatically if not provided |
xlim |
x axis plot limits set automatically if not provided |
add |
|
sub |
subtitle (can be empty) |
sw |
|
... |
further args for plot |
confidence interval endpoints are plotted vertically versus
two-sided coverages . Bca limits in black,
Standard limits in green (dashed.). If
vl$lims
includes
the column "jacksd" of jackknife internal standard deviations
then these are indicated by vertical red bars centered at the
bca limit points.
The diabetes
data frame has 442 rows and 3 columns. These are the
data used in the Efron et al "Least Angle Regression" paper.
This data frame contains the following columns:
x a matrix with 10 columns
y a numeric vector
x2 a matrix with 64 columns
The x matrix has been standardized to have unit L2 norm in each column and zero mean. The matrix x2 consists of x plus certain interactions.
https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
Efron, Hastie, Johnstone and Tibshirani (2003) "Least Angle Regression" (with discussion) Annals of Statistics