Doctoral thesis
Abstract: A growing number of industrial assets are continuously monitored and generate vast amounts of event logs and sensor data. Data mining is the field concerned with the exploration and exploitation of such data. Although data mining has been researched for decades, event log data remain underexploited in most data mining workflows, even though they represent the internal processes of an asset and could therefore provide valuable insights into its behavior. Exploiting event log data is challenging, however, mainly because: 1) event labels are not consistent across manufacturers; 2) assets report vast amounts of data, of which only a small part may be relevant; 3) textual event logs and numerical sensor data are usually processed by methods dedicated to one type of data or the other, and methods combining both types are still missing; 4) industrial data are rarely labelled, i.e. there is no indication of the actual performance of the asset, which has to be derived from other sources; 5) the meaning of an event may vary depending on the events sent before or after it. Concretely, this thesis is concerned with the design and validation of an integrated data processing framework for scalable performance assessment of industrial asset portfolios.
This framework is composed of several advanced methodologies facilitating the exploitation of both event logs and time series sensor data: 1) an ontology model describing the photovoltaic (PV) event system, PV being the validation domain, which allows the integration of heterogeneous events generated by various manufacturers; 2) a novel and computationally scalable methodology enabling automatic calculation of an event relevancy score without any prior knowledge; 3) a semantically enriched multi-level pattern mining methodology enabling data exploration and hypothesis building across heterogeneous assets; 4) an advanced workflow extracting performance profiles by combining textual event logs and numerical sensor values; 5) a scalable methodology allowing rapid annotation of new asset runs with a known performance label based only on the event log data. The framework has been exhaustively validated on real-world data from PV plants, provided by our industrial partner 3E. However, the framework has been designed to be domain agnostic and can be adapted to other industrial assets reporting event logs and sensor data.