Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models

D. R. Upper
Mathematics Department
University of California
Berkeley, California 94720 USA

ABSTRACT: Hidden Markov Models (HMMs) are widely used in pattern recognition applications, most notably speech recognition. However, they have been studied primarily on a practical level, with the HMM matrices as the fundamental objects and without considering the viewpoint of an observer trying to accurately predict the future output. This dissertation is a study of the processes represented by HMMs using the concepts and techniques of stochastic automata derived from the study of dynamical systems and their complexity. The goal is to understand these processes in the language of stochastic automata. Along the way, certain ideas of stochastic automata are characterized in a measure-theoretic manner.

We begin by defining a process to be a stationary measure space of bi-infinite sequences. We define a process state to be a conditional distribution on the future of a process, corresponding to the state of knowledge held by an observer who has seen some or all of the process's history. This definition is similar in spirit to ideas used in dynamical systems, and it formalizes the notion of a "deterministic state" used in automata theory. We describe the process states of HMM processes, and we give a necessary condition for a process to have an HMM representation.

Following [Balasubramanian] and [Ito et al.], we define Generalized Hidden Markov Models (GHMMs). These are structurally and operationally the same as HMMs, except that parameters which are interpreted as probabilities in defining HMMs are allowed to be negative in GHMMs. We describe necessary and sufficient conditions for two GHMMs (and thus two HMMs) to represent the same process, and we give a method for finding the smallest possible GHMM equivalent to a given one.
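Both HMMs and GHMMs assign a probability to a word w = x1...xn by the standard product formula P(w) = pi T_{x1} ... T_{xn} 1, where pi is the initial distribution, T_x is the per-symbol matrix, and 1 is a column of ones. A minimal sketch of this computation follows; the matrices and names here are illustrative, not taken from the dissertation, and for simplicity the instance shown is an ordinary HMM (a GHMM would allow negative entries in pi and the T_x, provided every word probability still lands in [0, 1]).

```python
import numpy as np

# Illustrative two-state model over the alphabet {0, 1}: an initial
# distribution pi and one sub-matrix T[x] per symbol x.  The symbol
# matrices sum to a stochastic transition matrix.
pi = np.array([0.5, 0.5])
T = {
    "0": np.array([[0.4, 0.1],
                   [0.2, 0.3]]),
    "1": np.array([[0.1, 0.4],
                   [0.3, 0.2]]),
}
ones = np.ones(2)

def word_prob(word):
    """P(w) = pi @ T[x1] @ ... @ T[xn] @ 1."""
    v = pi
    for x in word:
        v = v @ T[x]
    return float(v @ ones)

print(word_prob("0"))   # 0.5
print(word_prob("01"))  # 0.25
```

Note that the probabilities of all words of a fixed length sum to 1, since the symbol matrices sum to a stochastic matrix; this is the consistency property that equivalence of two GHMMs is judged against.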

Going further, we give an algorithm for constructing a GHMM that represents a process from the probabilities that process assigns to words. We prove that, for every process, either the algorithm constructs a GHMM that represents the given process or no such representation exists. This characterizes the set of processes representable by GHMMs. Finally, we describe an implementation of this algorithm which constructs GHMMs from sample sequences.
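The implementation mentioned above must first turn a sample sequence into estimated word probabilities. A natural estimator, sketched below under the assumption of stationarity, takes the frequency of each word among the overlapping windows of the sample; the function name and interface are hypothetical, not the dissertation's.

```python
from collections import Counter

def empirical_word_probs(sample, max_len):
    """Estimate P(w) for every word w of length 1..max_len as the
    frequency of w among the overlapping windows of the sample."""
    probs = {}
    for n in range(1, max_len + 1):
        windows = [sample[i:i + n] for i in range(len(sample) - n + 1)]
        counts = Counter(windows)
        total = len(windows)
        for w, c in counts.items():
            probs[w] = c / total
    return probs

probs = empirical_word_probs("0110100110", 2)
```

For a stationary ergodic process these frequencies converge to the true word probabilities as the sample grows, which is what licenses feeding them to the construction algorithm.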

D. R. Upper, "Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models", Ph.D. Dissertation, Mathematics Department, University of California, Berkeley (February 1997). [ps.gz]= 895kb zipped [pdf]= 1,672kb

In parts:
Chapters 1-2: [ps.gz]= 182kb zipped [pdf]= 284kb
Chapter 3: [ps]= 298kb [pdf]= 487kb
Chapter 4: [ps]= 205kb [pdf]= 419kb
Chapters 5-6: [ps]= 160kb [pdf]= 290kb
End: [ps]= 112kb [pdf]= 213kb