An Introduction to Ensemble Methods for Data Analysis
Richard A. Berk (UCLA)
ABSTRACT
There are a growing number of new statistical procedures that Leo
Breiman (2001b) has called "algorithmic." Coming from work primarily
in statistics, applied mathematics, and computer science,
these techniques are sometimes linked to "data mining,"
"machine learning," and "statistical learning." A key idea
behind algorithmic methods is that there is no statistical
model in the usual sense; no effort is made to represent how
the data were generated. And no apologies are made for the
absence of a model. Rather, there is some practical data
analysis problem to solve that is attacked directly with
procedures designed specifically for that purpose. If, for
example, the goal is to determine which prison inmates are
likely to engage in some form of serious misconduct while in
prison (Berk and Baek, 2003), there is a classification
problem to be addressed. Should the goal be to minimize
some function of classification errors, procedures are
applied with that minimization problem paramount. There is
no need to represent how the data were generated if it is
possible to accurately classify inmates by other means.
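
As a minimal sketch of what attacking a classification problem directly can look like, the Python fragment below chooses a cutoff on a risk score by minimizing a weighted count of classification errors. Nothing in it represents how the data were generated. The scores, outcomes, cost weights, and function names are all hypothetical and chosen only for illustration; they are not taken from Berk and Baek (2003) or from any procedure described in this paper.

```python
# Hypothetical toy data, made up for illustration only:
# one risk score per inmate, and 1 if serious misconduct occurred, else 0.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]


def weighted_error_cost(scores, labels, threshold, fn_cost=5.0, fp_cost=1.0):
    """Total cost of classification errors when score >= threshold predicts 1.

    fn_cost and fp_cost are assumed, illustrative weights for false
    negatives and false positives.
    """
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 0 and label == 1:
            cost += fn_cost  # missed a case of misconduct
        elif predicted == 1 and label == 0:
            cost += fp_cost  # flagged an inmate who did not offend
    return cost


def best_threshold(scores, labels, fn_cost=5.0, fp_cost=1.0):
    """Search the observed scores (plus one cutoff above them all) for the
    threshold with the smallest weighted error cost."""
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return min(
        candidates,
        key=lambda t: weighted_error_cost(scores, labels, t, fn_cost, fp_cost),
    )


if __name__ == "__main__":
    print(best_threshold(scores, labels))                 # unequal costs
    print(best_threshold(scores, labels, 1.0, 1.0))       # equal costs
```

In this toy data, costing a false negative at five times a false positive moves the chosen cutoff down to 0.35, whereas equal costs give 0.7. The point is only that the function of classification errors being minimized, and not a model of the data, drives the result.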