Statistical Properties of Large Margin Classifiers

Peter Bartlett
BIOwulf Technologies
(visiting UC Berkeley)
 

Many successful pattern classification algorithms can be viewed as large margin classifiers: they use thresholded real-valued functions, and choose these functions by minimizing a cost function involving the margin (the amount by which the function's value lies to the correct side of the threshold).  Voting methods such as adaboost, kernel methods such as support vector machines, neural network classifiers, and logistic regression can all be viewed in this way.  This talk will present recent results on the convergence of the risk (misclassification probability) of large margin classifiers.

Finding a classifier from some class with risk near the minimal value over that class is computationally infeasible for many interesting classes.  As a consequence of our results, we show that large margin classification algorithms can be viewed as computationally feasible approaches to approximately minimizing risk, where the notion of approximation error depends on the margin cost function.

We also show that under Tsybakov's low noise assumption (for instance, that the conditional probability of one class is bounded away from 1/2), the risk converges surprisingly quickly: as quickly as the rates exhibited by Tsybakov using the computationally infeasible approach of empirical risk minimization.

(joint work with Jon McAuliffe and Mike Jordan)