Hidden Markov Models for Automated Protocol Learning

Sean Whalen, Matt Bishop, and James P. Crutchfield

Department of Computer Science
Complexity Sciences Center and Physics Department
University of California at Davis
Davis, CA 95616

ABSTRACT: Hidden Markov Models (HMMs) have applications in several areas of computer security. One drawback of HMMs is the difficulty of selecting appropriate model parameters: the choice is often ad hoc or requires domain-specific knowledge. While algorithms exist to find local optima for some parameters, the number of states must always be specified in advance, and it directly affects the accuracy and generality of the model. In addition, domain knowledge is not always available, or it may rest on assumptions that prove incorrect or suboptimal.

We apply the ε-machine—a special type of HMM—to the task of constructing network protocol models solely from network traffic. Unlike previous approaches, ε-machine reconstruction infers the minimal HMM architecture directly from data and is well suited to applications such as anomaly detection. We draw distinctions between our approach and previous research, and discuss the benefits and challenges of ε-machines for protocol model inference.
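To illustrate what inferring the architecture directly from data can look like, the following is a minimal sketch and not the reconstruction procedure used in the paper. It assumes a toy binary source (the "golden mean" process, in which a 1 is never followed by another 1), a fixed history length of 2, and a simple tolerance test for comparing distributions. All function names and parameters are hypothetical. Histories that predict the same next-symbol distribution are grouped together, and the number of groups plays the role of the number of states, which is recovered from the data rather than specified in advance.

```python
import random
from collections import Counter, defaultdict


def golden_mean_sequence(n, seed=0):
    """Generate n symbols from a toy source in which '1' is never repeated."""
    rng = random.Random(seed)
    symbols = []
    prev_was_one = False
    for _ in range(n):
        # After a '1' the source must emit '0'; otherwise it flips a fair coin.
        s = "0" if prev_was_one else rng.choice("01")
        symbols.append(s)
        prev_was_one = (s == "1")
    return "".join(symbols)


def next_symbol_distributions(seq, history_len):
    """Estimate P(next symbol | preceding history) for each observed history."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(seq)):
        counts[seq[i - history_len:i]][seq[i]] += 1
    alphabet = sorted(set(seq))
    return {
        h: tuple(c[a] / sum(c.values()) for a in alphabet)
        for h, c in counts.items()
    }


def group_histories(dists, tol=0.1):
    """Group histories whose next-symbol distributions agree within tol.

    The tolerance test is an assumption made for this sketch; a real
    reconstruction algorithm would use a statistical hypothesis test.
    """
    states = []  # each entry: (representative distribution, member histories)
    for h in sorted(dists):
        d = dists[h]
        for rep, members in states:
            if max(abs(x - y) for x, y in zip(rep, d)) <= tol:
                members.append(h)
                break
        else:
            states.append((d, [h]))
    return [members for _, members in states]


seq = golden_mean_sequence(20000)
states = group_histories(next_symbol_distributions(seq, history_len=2))
print(len(states), states)
```

In this sketch the histories "00" and "10" are merged because both leave the next symbol undetermined, while "01" forms its own group because it forces a 0. The grouping therefore yields two states without the state count ever being supplied.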

S. Whalen, M. Bishop, and J. P. Crutchfield, "Hidden Markov Models for Automated Protocol Learning", S. Jajodia and J. Zhou (Eds.), SecureComm 2010, LNICST 50 (2010) 415-42
Santa Fe Institute Working Paper 09-11-XXX.
arXiv:0911.XXXX [cs.NI].