The University of Montana

Department of Mathematical Sciences

Technical report #21/2008

Exact bootstrap *k*-nearest neighbor learners

**Brian M. Steele**

Dept. of Mathematical Sciences

The University of Montana

Missoula MT 59812

**Abstract**

Bootstrap aggregation, or bagging, is a method of reducing the prediction
error of a statistical learner. The goal of bagging is to construct a new learner that is
the expectation of the original learner with respect to the empirical distribution function. In nearly all cases, the expectation cannot be computed analytically, and bootstrap
sampling is used to produce an approximation. The *k*-nearest neighbor learners are
exceptions to this generalization, and exact bagging of many *k*-nearest neighbor learners
is straightforward. This article presents computationally simple and fast formulae for exact bagging of *k*-nearest neighbor learners and extends exact bagging methods from the
conventional bootstrap sampling (sampling *n* observations with replacement from a set
of *n* observations) to bootstrap *sub*-sampling schemes (with and without replacement).
In addition, a *partially* exact *k*-nearest neighbor regression learner is developed. The
article also compares the prediction error of elementary and exact bagged
*k*-nearest neighbor learners and several other ensemble methods using a suite of publicly
available data sets.
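As an illustration of why exact bagging is tractable for nearest-neighbor methods (a sketch of the general idea, not necessarily the report's formulae): under conventional bootstrap sampling, the bagged 1-nearest neighbor prediction at a query point is a weighted vote over the training points ordered by distance, where the weight on the *i*-th nearest neighbor is the probability that it is the closest training point appearing in a bootstrap sample. That probability has a closed form, so no resampling is needed.

```python
def exact_bagged_1nn_weights(n):
    """Weight p_i that the i-th nearest neighbor (1-indexed) of a query
    point is the nearest training point present in a bootstrap sample of
    n draws with replacement from n observations.

    p_i = P(neighbors 1..i-1 all absent from the sample, neighbor i present)
        = ((n - i + 1) / n)**n - ((n - i) / n)**n
    """
    return [((n - i + 1) / n) ** n - ((n - i) / n) ** n
            for i in range(1, n + 1)]

w = exact_bagged_1nn_weights(10)
# The weights form a probability distribution over the n ordered neighbors,
# so the exact bagged learner is a single weighted-vote pass over the data.
assert abs(sum(w) - 1.0) < 1e-12
```

The weights decay with rank, so in practice only the first few neighbors carry appreciable weight, which is what makes exact bagging of *k*-nearest neighbor learners computationally simple and fast.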

**Keywords:** bagging, k-nearest neighbor, classification, regression, ensemble methods

**AMS Subject Classification:**