Data Mechanics and Integrative Pattern Inferences on Complex Systems                  

The physicist and Nobel Prize winner Murray Gell-Mann was quoted in Coveney & Highfield’s 1994 book “Frontiers of Complexity: The Search for Order in a Chaotic World” (p. 8) as saying:

We must get away from the idea that serious work is restricted to “beating to death a well-defined problem in a narrow discipline, while broadly integrative thinking is relegated to cocktail parties.”

This vision is more pertinent now than ever. As information technology advances, researchers are able to create many new systems, and to revisit even more old ones, in human society and in nature. These systems, new and old, are receiving tremendous research attention because new technologies make it possible to collect large amounts of data. Data in various formats, such as high-dimensional point clouds, high-frequency time series, and networks of many types and sizes, are collected and explored to shed new light on the systems of interest and to build an authentic understanding of them.

Integrative thinking is desperately needed now because all these data formats contain complex dependence structures among and between sampled subjects and measured features. Such structural dependence consists of two closely coupled systemic characteristics: deterministic structures and inherent randomness. This concept is well known in statistical physics, particularly in the field of complex systems. It prescribes scientific endeavors in complex systems as discovering and formulating patterns of deterministic structure and mechanisms of inherent randomness.

In contrast, these two systemic characteristics clearly defy the majority of statistical principles, concepts, modeling and bootstrapping approaches, which rest on unrealistic assumptions of independence and stationarity. Statistical investigations are therefore commonly conducted while ignoring systemic sensitivity. This is why statistical results for complex systems are likely to be unrealistic or even illusory.

The primary achievement of my research group in the past five years is an integrative foundation for discovering systemic characteristics, built by combining principles from three perspectives. 1) Statistical physics: I quantify the deterministic structure through the lowest-energy macrostate with respect to a suitably chosen Hamiltonian. 2) Computational geometry: I develop a new computing paradigm, called Data Mechanics, which builds two highly coupled ultrametric trees on the row and column axes of any data matrix to bring out its multiscale block patterns as the optimal resolution for the macrostate. 3) Information theory: I take the multiscale block patterns as the minimal sufficient statistics of Kolmogorov’s algorithmic statistics, and construct algorithms that capture the mechanisms of inherent randomness within all the blocks involved.
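As a rough illustration of what an ultrametric on each axis of a data matrix looks like, the sketch below computes the subdominant ultrametric (the minimax-path distance, which equals the cophenetic distance of single-linkage clustering) separately for the rows and for the columns of a small binary matrix. This is not the Data Mechanics algorithm: the coupling between the two trees is left out, and the Hamming distance, the function names and the toy matrix are all assumptions made for the example.

```python
def hamming(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def subdominant_ultrametric(dist):
    """Minimax-path closure of a symmetric distance matrix.

    The result satisfies the strong triangle inequality
    u[i][j] <= max(u[i][k], u[k][j]) for all i, j, k.
    """
    n = len(dist)
    u = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def axis_ultrametrics(matrix):
    """Return (row_ultrametric, column_ultrametric) for a data matrix."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    row_d = [[hamming(a, b) for b in rows] for a in rows]
    col_d = [[hamming(a, b) for b in cols] for a in cols]
    return subdominant_ultrametric(row_d), subdominant_ultrametric(col_d)


# Toy binary matrix with two row groups and two column groups (illustrative data).
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
row_u, col_u = axis_ultrametrics(toy)
```

In this toy case rows 0 and 1 merge at a lower level than rows 0 and 2, which is the kind of nested grouping a tree on the row axis encodes. A coupled version would feed the column grouping back into the row distances and iterate.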

The computed deterministic structures and inherent randomness are together called a coupling geometry on a data matrix. Such a coupling geometry embodies a matrix version of Kolmogorov complexity. Kolmogorov’s two-part coding scheme then becomes the integrative principle for mimicking or bootstrapping a matrix while retaining its systemic characteristics. That is why the coupling geometry serves as a new foundation for studying a complex system at a single phase, which is approximated by a network, a high-dimensional data cloud, or a time series.
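The two-part idea can be sketched on a binary matrix as follows. Part one records the block structure as the density of 1s in each block; part two regenerates the entries inside each block at random. The sketch assumes that the row and column block labels are already given (here they are written by hand) and that entries within a block are independent Bernoulli draws. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import random


def block_densities(matrix, row_labels, col_labels):
    """Part one: the fraction of 1s in each (row block, column block)."""
    ones, cells = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones[key] = ones.get(key, 0) + value
            cells[key] = cells.get(key, 0) + 1
    return {key: ones[key] / cells[key] for key in cells}


def mimic_matrix(matrix, row_labels, col_labels, rng):
    """Part two: redraw every entry as Bernoulli(density of its block)."""
    dens = block_densities(matrix, row_labels, col_labels)
    return [
        [int(rng.random() < dens[(row_labels[i], col_labels[j])])
         for j in range(len(row))]
        for i, row in enumerate(matrix)
    ]


# Illustrative data: a 4 x 4 binary matrix with hand-assigned block labels.
observed = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
row_labels = ["A", "A", "B", "B"]
col_labels = ["x", "x", "y", "y"]

# A small ensemble of mimicked matrices, one per seed.
ensemble = [mimic_matrix(observed, row_labels, col_labels, random.Random(seed))
            for seed in range(5)]
```

Blocks that are entirely 0 or entirely 1 are reproduced exactly in every mimicked matrix, while the mixed block (A, y) varies from draw to draw. In this way the block pattern is retained and only the within-block randomness is resampled.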

For the study of an entire system, a partial coupling geometry computation is developed to compute the causal and predictive associations linking two adjacent phases. In such a fashion, an evolution of phases can be sequentially linked and studied to manifest systemic understanding.

Network bootstrapping ensembles are derived as technical devices that carry structural information at various scales, and they give rise to an energy distribution profile. This profile becomes the basis for inferring and testing hypotheses regarding systemic structure. A mutual energy derived from partial coupling geometries then plays the role that mutual information plays in information theory: it evaluates the degree of association between two adjacent phases. These computational developments fall under one unified theme: Learning-From-Data.
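The way an energy distribution profile can support a test may be sketched as follows. The energy function used here (a count of unequal neighboring entries) and the null ensemble (random shuffles of all entries) are stand-ins chosen for brevity. They are assumptions, not the Hamiltonian or the bootstrapping ensembles of the publications listed below, and all names are hypothetical.

```python
import random


def energy(matrix):
    """Hypothetical energy: number of horizontally or vertically
    adjacent pairs of entries that are unequal."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    e = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if j + 1 < n_cols and matrix[i][j] != matrix[i][j + 1]:
                e += 1
            if i + 1 < n_rows and matrix[i][j] != matrix[i + 1][j]:
                e += 1
    return e


def shuffled_copy(matrix, rng):
    """Stand-in ensemble member: permute all entries, keeping the count of 1s."""
    flat = [v for row in matrix for v in row]
    rng.shuffle(flat)
    n_cols = len(matrix[0])
    return [flat[k:k + n_cols] for k in range(0, len(flat), n_cols)]


def energy_profile(matrix, n_draws, seed=0):
    """Sorted energies of n_draws ensemble members."""
    rng = random.Random(seed)
    return sorted(energy(shuffled_copy(matrix, rng)) for _ in range(n_draws))


def lower_tail_p(observed_energy, profile):
    """Share of ensemble energies at or below the observed one (add-one corrected)."""
    hits = sum(e <= observed_energy for e in profile)
    return (hits + 1) / (len(profile) + 1)


# Illustrative data: a matrix with a clear two-by-two block pattern.
observed = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
profile = energy_profile(observed, n_draws=200)
p_value = lower_tail_p(energy(observed), profile)
```

A small `p_value` would indicate that the observed matrix has lower energy, that is, more block structure, than is typical of the ensemble it is compared against.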

Under this theme, the complexity of a dependence structure is not only computable but also visible. This visualization surely enhances researchers’ understanding of the system as a whole. I envision coupling geometry computing as the foundation of Data Science, and as critical in any scientific computing.

The real-world systems investigated in my research group are listed here. They include the NCAA Football League, with a focus on nonlinear ranking hierarchy and systemic robustness; Rhesus Macaque monkey society, with a focus on nonlinear ranking hierarchy and behavioral network interactions; the winemaking system, with a focus on how the effects of water stress on grapes propagate to bottled wine as the ending phase; ecological and biogeographic systems, with a focus on mutualism and phylogenetic effects; the stock market, with a focus on networking among and along many-dimensional high-frequency time series; and, last but not least, the system of English words, studied through Lewis Carroll’s word game called Doublets, or Word-Ladder.

The take-home message is: via Data Mechanics and its coupling geometry, systemic knowledge is computable and visible.

 

[Selected publications on computing]

I  Hierarchical Factor Segmentation (HFS) Algorithm

  1. Hsieh Fushing, C.-R. Hwang, H.-C. Lee, Y.-C. Lan and S.-B. Horng (2006) Testing and mapping non-stationarity in animal behavioral processes: a case study on an individual female bean weevil. J. of Theoretical Biology, 238, 805-816.

  2. Hsieh Fushing, Shu-Chun Chen and How-Jing Lee (2009) Computing Circadian Rhythmic Patterns and Beyond: A New Non-Fourier Analysis. Computational Statistics, 24, 409-430.

  3. Hsieh Fushing, Emilio Ferrer, Shuchun Chen and Sy-Miin Chow (2010) Dynamics of dyadic interaction I: Exploring non-stationarity of intra- and inter-individual affective processes via hierarchical segmentation and stochastic small-world networks. Psychometrika, 75, 351-372.

  4. Hsieh Fushing, Shu-Chun Chen and Katherine S. Pollard (2009) A nearly exhaustive search for CpG island on whole chromosome. Inter. J. Biostatistics, 5, Article 14.

  5. Hsieh Fushing, Shu-Chun Chen and How-Jing Lee (2010) Statistical computations on biological rhythms I: dissecting variable cycles and measuring phase shifts in activity event time series. J. of Computational and Graphical Statistics, 19, 221-239.

  6. Hsieh Fushing, Shu-Chun Chen and Chii-Ruey Hwang (2012) Discovering stock dynamics through multidimensional volatility-phases. Quantitative Finance, 12, 213–230. doi:10.1080/14697681003743040.

  7. Hsieh Fushing, Shu-Chen Chen and Chii-Ruey Hwang (2010) Non-parametric decoding on discrete time series and its application in bioinformatics. Statistics in Bioscience, 2, 18-40.

 

  8. Hsieh Fushing, Emilio Ferrer, Shuchun Chen, Iris B. Mauss and James J. (2011) Examining coherence in emotion response system through network structure and signal transmission. Psychometrika, 76, 124-152. doi:10.1007/S11336-010-9194-0.

  9. Chang, Lo-Bin, Geman, Stuart, Hsieh, Fushing and Hwang, Chii-Ruey (2013) Invariance in the recurrence of large returns and the validation of models of price dynamics. Phys. Rev. E, 88, 022116.

  10. Hsieh Fushing, Shu-Chen Chen and Chii-Ruey Hwang (2013) Discovering focal regions of slightly-aggregated sparse signals. Computational Statistics, 28, 2295-2308.

  11. Hsieh Fushing, Shu-Chen Chen and Chii-Ruey Hwang (2014) Single stock dynamics on high-frequency data: From a compressed coding perspective. PLoS One, 9(2): e85018. doi:10.1371/journal.pone.0085018.

 

II  Data Mechanics and Data Cloud Geometry (DCG) Algorithms

  1. Hsieh Fushing and Michael P. McAssey (2010) Time, temperature and data cloud geometry. Physical Review E, 82, 061110-10.

  2. Hsieh Fushing, Michael P. McAssey, Brianne Beisner and Brenda McCowan (2011) Ranking network of captive Rhesus Macaque society: A sophisticated corporative kingdom. PLoS One, 6, e17817.

  3. Hsieh Fushing, Michael P. McAssey and Brenda McCowan (2011) Computing a ranking network with confidence bounds from a graph-based Beta random field. Proceedings of the Royal Society A, published online. doi:10.1098/rspa.2011.0268.

  4. Chan, S., Fushing, H., Beisner, B. and McCowan, B. (2013) Joint modeling of multiple social networks to elucidate primate social dynamics: I. Maximum entropy principle and network-based interactions. PLoS One, 8(2): e51903. doi:10.1371.

  5. Wang, H., Chen Chen and Hsieh Fushing (2012) Extracting multiscale pattern information of fMRI based functional brain connectivity for diagnosis of autism spectrum disorders. PLoS One, 7(10): e45502. doi:10.1371.

  6. Chen Chen and Hsieh Fushing (2012) Multi-scale community geometry in network and its application. Physical Review E, 86, 041120.

  7. Hsieh Fushing, Wang, H., Van der Waal, K., McCowan, B. and Koehl, P. (2013) Multi-scale clustering by building a robust and self-correcting ultrametric topology on data points. PLoS One, 8(2): e56259. doi:10.1371.

  8. Chen*, C-P., Hsieh Fushing, Atwill, R. and Koehl, P. (2014) biDCG: A new method for discovering global features of DNA microarray data via an iterative re-clustering procedure. PLoS One, 9(7). doi:10.1371/journal.pone.0102445.

  9. Hsieh Fushing, Chen, C., Liu, S.-Y. and Koehl, P. (2014) Bootstrapping on undirected binary network via statistical mechanics. J. of Statistical Physics, 156, 823-842.

  10. Hsieh Fushing and Chen, C. (2014) Data mechanics and coupling geometry on binary bipartite network. PLoS One, 9(8): e106154. doi:10.1371/journal.pone.0106154.

  11. Shev, A., Fujii, K., Fushing Hsieh and McCowan, B. (2014) Systemic testing on Bradley-Terry Model against nonlinear ranking hierarchy. PLoS One, 9(12): e115367. doi:10.1371/journal.pone.0115367.

  12. Hsieh Fushing, Chen, C., Hsieh, Y.-C. and Farrell, P. (2014) Lewis Carroll’s Doublets net of English words: network heterogeneity in a complex system. PLoS One, 9(12): e114177. doi:10.1371/journal.pone.0114177.

  13. Hsieh Fushing, Hseuh, C.-H., Heitkamp, C. and Matthews, M. A. (2015) Integrative inferences on the pattern geometries of grapes grown under water stress and their resulting wines. (Revised for PLoS One)

  14. Hsieh Fushing and Fujii, K. (2015) Bootstrapping directed binary network via statistical mechanics. (Revised for J. of Statistical Physics)