Data Mechanics and Integrative Pattern Inferences on Complex Systems

Physicist and Nobel laureate Murray Gell-Mann was quoted in Coveney & Highfield's 1994 book "Frontiers of Complexity: The Search for Order in a Chaotic World" (p. 8) as saying:

We must get away from the idea that serious work is restricted to “beating to death a well-defined problem in a narrow discipline, while broadly integrative thinking is relegated to cocktail parties.”

This vision is more pertinent now than ever before. As information technology advances, researchers are able to study many new systems, and to revisit even more old ones, in human society and in nature. These systems are receiving tremendous new research attention because of brand-new technologies for collecting large amounts of data. Data in various formats, whether high-dimensional point clouds, high-frequency time series, or networks of many types and sizes, are collected and explored to shed new light on, and yield authentic understanding of, the systems of interest.

Integrative thinking is desperately needed now because all these data formats contain complex dependence structures among, and between, sampled subjects and measured features. Such structural dependence consists of two closely coupled systemic characteristics: deterministic structures and inherent randomness. This concept is well known in statistical physics, particularly in the field of complex systems. It prescribes that scientific endeavors in complex systems consist of formulating and discovering patterns of deterministic structure and mechanisms of inherent randomness.

In contrast, these two systemic characteristics clearly defy the majority of statistical principles, concepts, models, and bootstrapping approaches, which rest on unrealistic independence and stationarity assumptions. Statistical investigations are therefore commonly conducted while ignoring systemic sensitivity. This is why statistical results for complex systems are likely to be unrealistic or even illusory.

The primary achievement of my research group over the past five years is an integrative foundation for discovering systemic characteristics, built by combining principles from three perspectives: 1) statistical physics: I quantify the deterministic structure through the lowest-energy macrostate with respect to a suitably chosen Hamiltonian; 2) computational geometry: I develop a new computing paradigm, called Data Mechanics, that builds two highly coupled ultrametric trees on the row and column axes of any data matrix to bring out its multiscale block patterns as the optimal resolution for the macrostate; 3) information theory: I take the multiscale block patterns as the minimum sufficient statistics of Kolmogorov's algorithmic statistics, and construct algorithms for capturing the mechanisms of inherent randomness within all of the involved blocks.
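To make the statistical-physics component concrete, the sketch below computes a stand-in Hamiltonian for a block partition of a data matrix: the energy is taken as the total within-block variance, so a partition aligned with the matrix's true block pattern attains lower energy than one that ignores it. The function name and this particular choice of Hamiltonian are illustrative assumptions, not the exact energy used in Data Mechanics.

```python
from statistics import pvariance

def block_energy(matrix, row_groups, col_groups):
    """Stand-in Hamiltonian: total within-block variance of a matrix
    under a given partition of its rows and columns. A lower energy
    means the partition better matches the matrix's block structure."""
    energy = 0.0
    for rows in row_groups:
        for cols in col_groups:
            block = [matrix[r][c] for r in rows for c in cols]
            if len(block) > 1:
                energy += pvariance(block)
    return energy

# A 4x4 matrix with an obvious 2x2 block pattern.
M = [[1, 1, 9, 9],
     [1, 1, 9, 9],
     [5, 5, 3, 3],
     [5, 5, 3, 3]]

aligned  = block_energy(M, [[0, 1], [2, 3]], [[0, 1], [2, 3]])  # matches the blocks
shuffled = block_energy(M, [[0, 2], [1, 3]], [[0, 2], [1, 3]])  # ignores the blocks
print(aligned, shuffled)  # -> 0.0 35.0
```

Searching over row and column partitions for the minimum of such an energy is one simple way to read "lowest-energy macrostate" operationally; Data Mechanics finds this resolution through the coupled ultrametric trees rather than by exhaustive search.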

The computed deterministic structures and inherent randomness together constitute what is called a coupling geometry on a data matrix. Such a coupling geometry indeed embodies a matrix version of Kolmogorov complexity. Kolmogorov's two-part coding scheme then becomes the integrative principle for mimicking, or bootstrapping, a matrix while retaining its systemic characteristics. This is why a coupling geometry serves as a new foundation for studying a complex system at a single phase, where the phase is approximated by a network, a high-dimensional data cloud, or a time series.
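One minimal reading of the two-part coding bootstrap is sketched below: the discovered block partition is held fixed (the deterministic part of the code), while entries are resampled only within each block (the inherent-randomness part), producing a mimicry that retains the block-level structure. The function name and the within-block permutation scheme are assumptions for illustration; the actual algorithm operates on the multiscale blocks produced by the coupled ultrametric trees.

```python
import random

def block_bootstrap(matrix, row_groups, col_groups, rng=None):
    """Generate one mimicry of `matrix`: entries are permuted within each
    block defined by the row/column partitions, so block-level structure
    is retained while within-block randomness is resampled."""
    rng = rng or random.Random()
    mimic = [row[:] for row in matrix]
    for rows in row_groups:
        for cols in col_groups:
            cells = [(r, c) for r in rows for c in cols]
            values = [matrix[r][c] for r, c in cells]
            rng.shuffle(values)
            for (r, c), v in zip(cells, values):
                mimic[r][c] = v
    return mimic

M = [[1, 2, 9, 8],
     [2, 1, 8, 9],
     [5, 6, 3, 4],
     [6, 5, 4, 3]]
copy = block_bootstrap(M, [[0, 1], [2, 3]], [[0, 1], [2, 3]],
                       rng=random.Random(0))
# Each block of `copy` holds the same multiset of values as the
# corresponding block of M, so the two-part "structure" code is intact.
```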

For the study of an entire system, a partial coupling geometry computation is developed to compute the causal and predictive associations linking two adjacent phases. In this fashion, an evolution of phases can be linked sequentially and studied to build systemic understanding.

Network bootstrapping ensembles are derived as technical devices that carry structural information at various scales, and in turn give rise to an energy distribution profile. This energy distribution profile becomes the basis for inferring and testing hypotheses regarding systemic structure. A mutual energy derived from partial coupling geometries then plays the role that mutual information plays in information theory: evaluating the degree of association between two adjacent phases. These computational developments fall under one unified theme: Learning-From-Data.
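One way to read the energy-distribution idea is sketched below: compute the energy of the observed matrix under its discovered partition, then compare it against an ensemble of globally permuted matrices whose structure has been destroyed. The fraction of ensemble energies at or below the observed one acts as an empirical p-value for the presence of block structure. The variance-based energy and all names here are illustrative assumptions, not the exact ensemble construction used in the actual work.

```python
import random
from statistics import pvariance

def block_energy(matrix, row_groups, col_groups):
    """Stand-in Hamiltonian: total within-block variance."""
    return sum(
        pvariance([matrix[r][c] for r in rows for c in cols])
        for rows in row_groups for cols in col_groups
    )

def energy_profile(matrix, row_groups, col_groups, n=500, seed=0):
    """Energies of an ensemble of globally permuted matrices: a null
    distribution against which the observed energy can be judged."""
    rng = random.Random(seed)
    flat = [v for row in matrix for v in row]
    ncols = len(matrix[0])
    profile = []
    for _ in range(n):
        rng.shuffle(flat)
        permuted = [flat[i * ncols:(i + 1) * ncols]
                    for i in range(len(matrix))]
        profile.append(block_energy(permuted, row_groups, col_groups))
    return profile

M = [[1, 1, 9, 9], [1, 1, 9, 9], [5, 5, 3, 3], [5, 5, 3, 3]]
parts = ([[0, 1], [2, 3]], [[0, 1], [2, 3]])
observed = block_energy(M, *parts)      # 0.0: the partition fits perfectly
profile = energy_profile(M, *parts)
p_value = sum(e <= observed for e in profile) / len(profile)
# A small p_value says the observed energy is lower than expected
# under the no-structure null, supporting the block hypothesis.
```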

Under this theme, the complexity of a dependence structure is not only computable, but visible. This visualization surely enhances researchers' understanding of the system as a whole. I envision coupling geometry computing as the foundation of Data Science, and as critical to any scientific computing.

Here I list the real-world systems investigated by my research group. They include: the NCAA football league, with focuses on nonlinear ranking hierarchy and systemic robustness; Rhesus macaque monkey society, with focuses on nonlinear ranking hierarchy and behavioral network interactions; a winemaking system, with focuses on the propagating effects of water stress on grapes through to bottled wine as the ending phase; ecological and biogeographic systems, with focuses on mutualism and phylogenetic effects; the stock market, with a focus on networking among and along many high-dimensional high-frequency time series; and, last but not least, the system of English words explored through Lewis Carroll's word game called Doublets, or Word Ladder.

The take-home message is: via Data Mechanics and its coupling geometry, systemic knowledge is computable and visible.

[Selected publications on computing]

II Data Mechanics and Data Cloud Geometry (DCG) Algorithms