Massive data discrimination via linear support vector machines

1 January 2000

journal article
research article
Published by Taylor & Francis in Optimization Methods and Software

Vol. 13 (1), 1-10
https://doi.org/10.1080/10556780008805771

Abstract

A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimen-sional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which containing a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to an optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problems

Keywords

This publication has 7 references indexed in Scilit:

A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery, 1998
Breast Cancer Diagnosis and Prognosis Via Linear Programming
Operations Research, 1995
The Nature of Statistical Learning Theory
Published by Springer Nature ,1995
Robust linear programming discrimination of two linearly inseparable sets
Optimization Methods and Software, 1992
Linear and Nonlinear Separation of Patterns by Linear Programming
Operations Research, 1965
A Linear Programming Approach to the Cutting-Stock Problem
Operations Research, 1961
Decomposition Principle for Linear Programs
Operations Research, 1960

Cited by 125 articles