Pattern Discovery in Time Series, Part II: Implementation, Evaluation, and Comparison

Cosma Rohilla Shalizi
Santa Fe Institute
1399 Hyde Park Rd.
Santa Fe, NM 87501, USA
and
Center for the Study of Complex Systems
University of Michigan
Ann Arbor, MI 48109, USA
Kristina Lisa Shalizi
Santa Fe Institute
1399 Hyde Park Rd.
Santa Fe, NM 87501, USA
and
Physics Department
University of San Francisco
2130 Fulton Street
San Francisco, CA 94117, USA
James P. Crutchfield
Santa Fe Institute
1399 Hyde Park Rd.
Santa Fe, NM 87501, USA

ABSTRACT: We present a new algorithm for discovering patterns in time series and other sequential data. In the companion work, Part I, we reviewed the underlying theory, detailed the algorithm, and established its asymptotic reliability, along with various estimates of its asymptotic rate of convergence as a function of data size. Here, in Part II, we outline the algorithm's implementation, illustrate its behavior and its ability to discover even "difficult" patterns, demonstrate its superiority over alternative algorithms, and discuss its possible applications in the natural sciences and in data mining.


Kristina Lisa Shalizi, Cosma Rohilla Shalizi, and James P. Crutchfield, "Pattern Discovery in Time Series, Part II: Implementation, Evaluation, and Comparison", Journal of Machine Learning Research (2002) to be submitted.
Santa Fe Institute Working Paper 02-10-XXX. arXiv.org/abs/cs.LG/02XXXXX.