Support Vector Machines for Dyadic Data
- 1 June 2006
- journal article
- Published by MIT Press in Neural Computation
- Vol. 18 (6), 1472-1510
- https://doi.org/10.1162/neco.2006.18.6.1472
Abstract
We describe a new technique for the analysis of dyadic data, where two sets of objects (row and column objects) are characterized by a matrix of numerical values that describe their mutual relationships. The new technique, called potential support vector machine (P-SVM), is a large-margin method for the construction of classifiers and regression functions for the column objects. Contrary to standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM method leads to a usually sparse expansion of the classification and regression functions in terms of the row rather than the column objects and can handle data and kernel matrices that are neither positive definite nor square. We then describe two complementary regularization schemes. The first scheme improves generalization performance for classification and regression tasks; the second scheme leads to the selection of a small, informative set of row support objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection tasks are performed with toy data as well as with several real-world data sets. The results show that the new method is at least competitive with but often performs better than the benchmarked standard methods for standard vectorial as well as true dyadic data sets. In addition, a theoretical justification is provided for the new approach.
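As a rough illustration of the data layout the abstract describes, the sketch below builds a small non-square dyadic matrix and evaluates a decision function for each column object as a sparse weighted sum over row objects. It is not the P-SVM training procedure: the matrix `K`, the coefficients `alpha`, and the offset `b` are made-up values chosen for illustration, and the function name `decision_values` is hypothetical.

```python
# Hypothetical dyadic data: K[i][j] relates row object i to column object j.
# 3 row objects and 5 column objects, so the matrix is not square.
K = [
    [0.9, -0.2, 0.1, 0.8, -0.5],
    [-0.3, 0.7, -0.6, 0.0, 0.4],
    [0.2, 0.1, 0.3, -0.1, 0.6],
]

# Assumed coefficients over the ROW objects (illustrative, not learned here).
# A zero entry means that row object does not contribute to the expansion.
alpha = [1.5, 0.0, -2.0]
b = 0.1  # assumed offset


def decision_values(K, alpha, b):
    """Return f_j = sum_i alpha[i] * K[i][j] + b for every column object j."""
    n_rows = len(K)
    n_cols = len(K[0])
    return [
        sum(alpha[i] * K[i][j] for i in range(n_rows)) + b
        for j in range(n_cols)
    ]


scores = decision_values(K, alpha, b)

# Classify each column object by the sign of its decision value.
labels = [1 if s >= 0 else -1 for s in scores]

# Row objects with nonzero coefficients play the role of "support objects".
support_rows = [i for i, a in enumerate(alpha) if a != 0.0]
```

In this toy setup only two of the three row objects carry nonzero weight, which mirrors the sparse expansion over row objects mentioned in the abstract. How such coefficients are actually obtained is the subject of the article itself.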