Massive data discrimination via linear support vector machines
- 1 January 2000
- journal article
- research article
- Published by Taylor & Francis in Optimization Methods and Software
- Vol. 13 (1), 1-10
- https://doi.org/10.1080/10556780008805771
Abstract
A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimen-sional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which containing a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to an optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problemsKeywords
This publication has 7 references indexed in Scilit:
- A Tutorial on Support Vector Machines for Pattern RecognitionData Mining and Knowledge Discovery, 1998
- Breast Cancer Diagnosis and Prognosis Via Linear ProgrammingOperations Research, 1995
- The Nature of Statistical Learning TheoryPublished by Springer Nature ,1995
- Robust linear programming discrimination of two linearly inseparable setsOptimization Methods and Software, 1992
- Linear and Nonlinear Separation of Patterns by Linear ProgrammingOperations Research, 1965
- A Linear Programming Approach to the Cutting-Stock ProblemOperations Research, 1961
- Decomposition Principle for Linear ProgramsOperations Research, 1960