Peer-reviewed article
Abstract: As data-intensive techniques proliferate across many scientific disciplines, new criteria for more objective interpretation and a priori evaluation are required to reconcile data-driven results with an understanding of the underlying physics. In combustion science, researchers use many unsupervised tools to simplify models, speed up calculations, and discover hidden patterns in data. The settings for such data-driven tools, particularly in unsupervised learning, are selected primarily through heuristic criteria and rules of thumb. This can lead to the choice of sub-optimal models, which can be difficult to interpret. For this reason, the present study aims to provide new guidelines for evaluating and interpreting problems in which clustering and dimensionality reduction techniques are used in conjunction. In particular, the Vector Quantization Principal Component Analysis (VQPCA) algorithm combines both techniques and has demonstrated its effectiveness in various combustion applications. However, more objective criteria are needed for the comparative evaluation of different unsupervised solutions for a given test case. Such criteria can reduce the level of user expertise required in the hyperparameter selection process. In this study, a novel definition of a case-independent, projection-based index for the comparative evaluation of low-dimensional manifold projections is presented. The proposed index was tested on a hierarchy of datasets, from simple synthetic data with a “known answer” to more complex combustion-related datasets, namely experimental piloted flames at different Reynolds numbers, a Direct Numerical Simulation (DNS) of n-heptane in Homogeneous Charge Compression Ignition (HCCI) conditions, and finally a DNS of a turbulent lifted hydrogen flame in heated coflow. Results demonstrate the effectiveness of the index in automatically choosing solutions that exhibit optimal trade-offs between model complexity and performance. Furthermore, the index was able to assist the user in distinguishing between physically meaningful solutions and redundant or unexplainable ones.

Novelty and significance statement: The novelty of this work is a new index for comparing unsupervised learning solutions involving clustering and dimensionality reduction. The index enables the selection of physically relevant, interpretable, and well-performing models, and guides hyperparameter selection. This represents an important step towards adaptive simulation approaches for reduced-order combustion simulations.
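As a rough illustration of how VQPCA couples clustering with dimensionality reduction, the following minimal sketch assigns each sample to the cluster whose local PCA basis reconstructs it with the smallest squared error, then refits the local bases and repeats. It is not the implementation used in the study: the function name `vqpca`, its parameters, the random initial partition, and the stopping rule are all assumptions made for illustration, and the projection-based index proposed in the paper is not reproduced here.

```python
import numpy as np


def vqpca(X, n_clusters, n_components, n_iter=50, seed=0):
    """Toy VQPCA sketch (hypothetical helper, not the authors' code).

    Each sample is assigned to the cluster whose local PCA basis
    reconstructs it with the smallest squared error.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Assumption: start from a random partition of the samples.
    labels = rng.integers(0, n_clusters, size=n_samples)

    for _ in range(n_iter):
        errors = np.full((n_samples, n_clusters), np.inf)
        for k in range(n_clusters):
            members = X[labels == k]
            if members.shape[0] <= n_components:
                continue  # too few points to fit a local basis; cluster is skipped
            centroid = members.mean(axis=0)
            # Rows of vt are the principal directions of the centered cluster.
            _, _, vt = np.linalg.svd(members - centroid, full_matrices=False)
            basis = vt[:n_components].T  # shape (n_features, n_components)
            centered = X - centroid
            reconstructed = centered @ basis @ basis.T
            errors[:, k] = np.sum((centered - reconstructed) ** 2, axis=1)
        new_labels = np.argmin(errors, axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels

    # Squared reconstruction error of each sample under its assigned local basis.
    return labels, errors[np.arange(n_samples), labels]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))  # synthetic data, for demonstration only
    labels, err = vqpca(X, n_clusters=2, n_components=1)
    print(labels.shape, float(err.mean()))
```

In this sketch, `n_clusters` and `n_components` play the role of the hyperparameters whose selection the abstract describes as largely heuristic. Comparing solutions obtained with different values of the two is the kind of task the proposed index is meant to support.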