Temporal sequence learning and data reduction for anomaly detection

1 August 1999

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Information and System Security

Vol. 2 (3), 295-331
https://doi.org/10.1145/322510.322526

Abstract

The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of an individual, system, or network in terms of temporal sequences of discrete data. We present an approach based on instance-based learning (IBL) techniques. To cast the anomaly-detection task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, unordered observations into a metric space via a similarity measure that encodes intra-attribute dependencies. Classification boundaries are selected from an a posteriori characterization of valid user behaviors, coupled with a domain heuristic. An empirical evaluation of the approach on user command data demonstrates that we can accurately differentiate the profiled user from alternative users when the available features encode sufficient information. Furthermore, we demonstrate that the system detects anomalous conditions quickly, an important quality for reducing potential damage by a malicious user. We present several techniques for reducing data storage requirements of the user profile, including instance-selection methods and clustering. An empirical evaluation shows that a new greedy clustering algorithm reduces the size of the user model by 70%, with only a small loss in accuracy.
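
The instance-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (runs of adjacent matching tokens earn progressively more credit), the function names, the sample command sequences, and the threshold value are all assumptions chosen for illustration. A new sequence is compared with every stored instance in the user profile, and it is flagged as anomalous when its best similarity falls below a threshold.

```python
from typing import List, Sequence


def similarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Score two equal-length token sequences.

    Each matching position adds the length of the current run of
    adjacent matches, so contiguous agreement counts for more than
    scattered agreement. This weighting is an illustrative assumption.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    score = 0
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0  # a mismatch resets the run
        score += run
    return score


def profile_similarity(seq: Sequence[str], profile: List[Sequence[str]]) -> int:
    """Similarity of a sequence to the profile: its best match among stored instances."""
    return max(similarity(seq, instance) for instance in profile)


def is_anomalous(seq: Sequence[str], profile: List[Sequence[str]], threshold: int) -> bool:
    """Flag the sequence when its best similarity falls below the threshold."""
    return profile_similarity(seq, profile) < threshold


# Hypothetical profile of shell-command sequences for one user.
profile = [
    ["cd", "ls", "vi", "make"],
    ["ls", "cd", "ls", "cat"],
]

familiar = ["cd", "ls", "vi", "make"]
unusual = ["cd", "ls", "rm", "make"]

print(is_anomalous(familiar, profile, threshold=5))
print(is_anomalous(unusual, profile, threshold=5))
```

In this sketch the profile stores every observed sequence, which is why the storage-reduction techniques mentioned in the abstract (instance selection and clustering) matter in practice: each classification costs one comparison per stored instance.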


Cited by 241 articles