Support Vector Machines for Dyadic Data

1 June 2006

journal article
Published by MIT Press in Neural Computation

Vol. 18 (6), 1472-1510
https://doi.org/10.1162/neco.2006.18.6.1472

Abstract

We describe a new technique for the analysis of dyadic data, where two sets of objects (row and column objects) are characterized by a matrix of numerical values that describe their mutual relationships. The new technique, called the potential support vector machine (P-SVM), is a large-margin method for the construction of classifiers and regression functions for the column objects. In contrast to standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM method leads to a usually sparse expansion of the classification and regression functions in terms of the row objects rather than the column objects, and it can handle data and kernel matrices that are neither positive definite nor square. We then describe two complementary regularization schemes. The first scheme improves generalization performance for classification and regression tasks; the second scheme leads to the selection of a small, informative set of row support objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection tasks are performed with toy data as well as with several real-world data sets. The results show that the new method is at least competitive with, and often performs better than, the benchmarked standard methods, both for standard vectorial data sets and for true dyadic data sets. In addition, a theoretical justification is provided for the new approach.
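
As a rough illustration of the setting described in the abstract, the sketch below builds a non-square dyadic matrix (column objects by row objects) and fits a predictor for the column objects that is expanded in terms of the row objects. It is not the P-SVM optimization: ridge-regularized least squares is swapped in as a simple stand-in, so the result is neither large-margin nor sparse. The function names `fit_row_expansion` and `predict`, the `ridge` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_row_expansion(K, y, ridge=1e-2):
    """Fit alpha (one weight per row object) and a bias b.

    K has shape (n_column_objects, n_row_objects) and does not need to be
    square or positive definite. Assumption: ridge least squares is used
    here in place of the P-SVM objective and constraints.
    """
    n_cols, n_rows = K.shape
    # Append a constant column so the bias is fitted together with alpha.
    A = np.hstack([K, np.ones((n_cols, 1))])
    reg = ridge * np.eye(n_rows + 1)
    reg[-1, -1] = 0.0  # leave the bias unpenalized
    coef = np.linalg.solve(A.T @ A + reg, A.T @ y)
    return coef[:-1], coef[-1]


def predict(K, alpha, b):
    """Classify column objects from their relations to the row objects."""
    return np.where(K @ alpha + b >= 0, 1, -1)


# Synthetic dyadic data: 40 column objects described by 7 row objects.
rng = np.random.default_rng(0)
K = rng.normal(size=(40, 7))
alpha_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0])
y = np.where(K @ alpha_true >= 0, 1, -1)

alpha, b = fit_row_expansion(K, y)
pred = predict(K, alpha, b)
accuracy = float(np.mean(pred == y))
print("weights per row object:", np.round(alpha, 2))
print("training accuracy:", accuracy)
```

The point of the sketch is the shape of the model: the weight vector has one entry per row object, so a new column object is classified only from its relations to the row objects. In the P-SVM as described in the abstract, many of these weights would typically be zero, leaving a small set of row support objects.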

Cited by 64 articles