Temporal sequence learning and data reduction for anomaly detection
- 1 August 1999
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information and System Security
- Vol. 2 (3), 295-331
- https://doi.org/10.1145/322510.322526
Abstract
The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of an individual, system, or network in terms of temporal sequences of discrete data. We present an approach based on instance-based learning (IBL) techniques. To cast the anomaly-detection task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, unordered observations into a metric space via a similarity measure that encodes intra-attribute dependencies. Classification boundaries are selected from an a posteriori characterization of valid user behaviors, coupled with a domain heuristic. An empirical evaluation of the approach on user command data demonstrates that we can accurately differentiate the profiled user from alternative users when the available features encode sufficient information. Furthermore, we demonstrate that the system detects anomalous conditions quickly, an important quality for reducing potential damage by a malicious user. We present several techniques for reducing data storage requirements of the user profile, including instance-selection methods and clustering. An empirical evaluation shows that a new greedy clustering algorithm reduces the size of the user model by 70%, with only a small loss in accuracy.
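The abstract does not spell out the similarity measure or the classification rule, so the following is only a minimal sketch of the general idea, under stated assumptions: fixed-length windows of command tokens are compared position by position, runs of adjacent matches are weighted more heavily (one plausible way to encode dependencies between neighboring tokens), and a window is flagged when its best match against the stored profile falls below a threshold. The function names, the run-length weighting, and the threshold are all illustrative assumptions, not the method as published.

```python
from typing import List, Sequence


def similarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Score two equal-length token sequences (assumed measure).

    Each matching position contributes the length of the current run of
    consecutive matches, so adjacent matches count more than isolated ones.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    total = 0
    run = 0  # length of the current run of consecutive matches
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        total += run
    return total


def profile_similarity(window: Sequence[str],
                       profile: List[Sequence[str]]) -> int:
    """Similarity of a window to its nearest stored instance."""
    return max(similarity(window, instance) for instance in profile)


def is_anomalous(window: Sequence[str],
                 profile: List[Sequence[str]],
                 threshold: int) -> bool:
    """Flag the window when its best match is below the threshold.

    The threshold is a hypothetical parameter; the abstract only says that
    boundaries come from a characterization of valid user behavior.
    """
    return profile_similarity(window, profile) < threshold


# Hypothetical profile built from a user's past command windows.
profile = [("ls", "cd", "vi"), ("cd", "ls", "make")]
print(is_anomalous(("ls", "cd", "rm"), profile, threshold=4))
```

In a sketch like this, the storage cost grows with the number of stored windows, which is the motivation for the instance-selection and clustering techniques the abstract mentions.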