What does it mean for a natural system to be structured? How can scientific insight be extracted from complex data sets? My research interests lie at the intersection of these two questions.

Complex behaviors arise from an intricate intertwining of interactions and influence among system components. Unlike equilibrium and near-equilibrium systems, far-from-equilibrium systems have no general theory that predicts what patterns and structures may emerge. Each system behaves differently; details and history matter. The complex behaviors that emerge cannot be explicitly described mathematically, nor can they be directly deduced from the governing equations (e.g., what is the mathematical expression for a hurricane, and how can you derive it from the equations of a general circulation climate model?).

Though we don't know how to analytically derive complex emergent behaviors from their governing equations, we can numerically solve the equations to simulate the behaviors. While practically very useful, this does not generally help in elucidating the physical and causal mechanisms underlying these behaviors. Just because we can simulate hurricanes with complicated climate models does not mean we now fully understand hurricanes. More notoriously, turbulence remains a persistent mystery despite decades of numerical studies. The tangled web of interactions in a computer simulation can be just as impenetrable to us as it is in a natural system. *Data, in and of itself, is not understanding.* And science is now inundated with data. With it comes an enormous opportunity for scientific discovery, but also a great challenge. How do we gain scientific insight and understanding from all this data?

A new data-driven paradigm is taking shape, centered around machine learning (ML), as a means of circumventing the difficulties complex systems pose to traditional scientific inquiry. However, the requirements for scientific discovery and understanding are quite different from those for commercial applications, where ML has been remarkably successful. Current ML models and techniques cannot simply be ported over to tackle outstanding scientific problems without incorporating physical insights. For instance, ML has been most successful in the supervised setting, where the model learns from ground-truth labels in a training data set. Many scientific problems, however, such as coherent structure discovery, simply do not have a ground truth, and thus supervised approaches cannot even be applied.

Computational mechanics is a behavior-driven theory of complex emergent behavior, and provides an unsupervised ML methodology for scientific data analysis. Based on a notion of *intrinsic computation*, computational mechanics synthesizes statistical mechanics with computation theory to formalize pattern as generalized symmetry. For a model to optimally predict a system's behavior with minimal computational resources, that model must capture pattern and structure in the system's behaviors. Computational mechanics makes this idea operational through the *causal equivalence relation* that uniquely defines the minimal model for optimal prediction directly from observed data without reference to the governing equations.
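
For readers who prefer symbols, the causal equivalence relation can be written as follows. The notation is an assumption introduced here for illustration: $\overleftarrow{x}$ and $\overleftarrow{x}'$ denote two observed pasts, $\overrightarrow{X}$ the random variable over futures, and $\sim_\epsilon$ the equivalence relation. Two pasts are equivalent when they yield the same conditional distribution over futures:

```latex
\overleftarrow{x} \sim_\epsilon \overleftarrow{x}'
\iff
\Pr\left(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x}\right)
=
\Pr\left(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x}'\right)
```

The equivalence classes of this relation are the causal states: each class collects all pasts that are interchangeable for the purpose of prediction.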

My research has focused on the computational mechanics of spatiotemporal systems. Local models are built using *lightcones* as local notions of pasts and futures, which define a local causal equivalence relation. The resulting equivalence classes, the **local causal states**, are assigned to each point in spacetime and as such create a latent spacetime field that shares a coordinate geometry with the associated observable field. Symmetries in the latent field correspond to generalized symmetries in the observable field. From this, *coherent structures* can be formally defined as locally broken (generalized) symmetries in spacetime. I have applied this methodology to discover coherent structures in cellular automata models and Lagrangian coherent structures in complex fluid flows. Application to fluid flows requires high-performance computing. Through collaboration with Berkeley Lab and Intel, we created an HPC implementation in Python, known as Project DisCo (**Dis**covery of **Co**herent Structures). This work was chosen for an HPC Innovation Excellence Award.
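
To make the lightcone construction concrete, here is a minimal Python sketch on an elementary cellular automaton. It is an illustration under stated assumptions, not the Project DisCo implementation: the function names (`run_eca`, `lightcone`, `local_causal_states`), the lightcone depths, the propagation speed of one cell per time step, and the greedy tolerance-based grouping of past lightcones are all hypothetical choices made for brevity. In particular, grouping by a tolerance is only a finite-data stand-in for exact equality of conditional distributions and is not a true equivalence relation.

```python
import numpy as np
from collections import Counter, defaultdict

def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary cellular automaton with periodic boundaries."""
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    field = np.zeros((steps, width), dtype=np.uint8)
    field[0] = rng.integers(0, 2, size=width)
    for t in range(1, steps):
        row = field[t - 1]
        # Neighborhood index: 4*left + 2*center + right.
        idx = 4 * np.roll(row, 1) + 2 * row + np.roll(row, -1)
        field[t] = table[idx]
    return field

def lightcone(field, t, x, depth, direction):
    """Flatten the lightcone at (t, x), assuming speed 1 cell per step.

    direction=-1 gives the past lightcone (including the present site);
    direction=+1 gives the future lightcone (excluding it).
    """
    width = field.shape[1]
    start = 0 if direction < 0 else 1
    cells = []
    for d in range(start, depth + 1):
        cols = np.arange(x - d, x + d + 1) % width
        cells.extend(field[t + direction * d, cols].tolist())
    return tuple(cells)

def local_causal_states(field, past_depth=2, future_depth=1, tol=0.05):
    """Label each spacetime point by an approximate local causal state."""
    steps, width = field.shape
    counts = defaultdict(Counter)  # past lightcone -> future lightcone counts
    pasts = {}
    for t in range(past_depth, steps - future_depth):
        for x in range(width):
            p = lightcone(field, t, x, past_depth, -1)
            f = lightcone(field, t, x, future_depth, +1)
            counts[p][f] += 1
            pasts[(t, x)] = p
    futures = sorted({f for c in counts.values() for f in c})

    def dist(p):
        c = counts[p]
        n = sum(c.values())
        return np.array([c[f] / n for f in futures])

    # Greedy grouping (an assumption): a past joins the first group whose
    # representative distribution over futures is within `tol` of its own.
    reps, label = [], {}
    for p in sorted(counts):
        d = dist(p)
        for k, r in enumerate(reps):
            if np.abs(d - r).max() <= tol:
                label[p] = k
                break
        else:
            reps.append(d)
            label[p] = len(reps) - 1

    # Latent field on the same coordinates as the observable field;
    # -1 marks margins where a full lightcone is unavailable.
    states = np.full((steps, width), -1, dtype=int)
    for (t, x), p in pasts.items():
        states[t, x] = label[p]
    return states

field = run_eca(rule=110, width=64, steps=64)
states = local_causal_states(field)
```

The array `states` plays the role of the latent spacetime field described above: it has the same shape as `field`, so structure in the labels can be compared point by point with structure in the observable field.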

For more details, see my research page.