HYPPO: Using Equivalences to Optimize Pipelines in Exploratory Machine Learning

Kontaxakis, Antonios; Sacharidis, Dimitris; Simitsis, Alkis; Abelló, Alberto; Nadal, Sergi

doi:10.1109/ICDE60146.2024.00024

Reference: Proceedings - International Conference on Data Engineering, pages 221-234
Publication: Published, 2024-09-01

Peer-reviewed article

Abstract:

We present HYPPO, a novel system to optimize pipelines encountered in exploratory machine learning. HYPPO exploits alternative computational paths of artifacts from past executions to derive better execution plans while reusing materialized artifacts. Adding alternative computations introduces new challenges for exploratory machine learning regarding workload representation, system architecture, and optimal execution plan generation. To this end, we present a novel workload representation based on directed hypergraphs, and we formulate the problem of discovering the optimal execution plan as a search problem over directed hypergraphs and that of selecting artifacts to materialize as an optimization problem. A thorough experimental evaluation shows that HYPPO results in plans that are typically one order (up to two orders) of magnitude faster and cheaper than the non-optimized pipeline and considerably (up to one order of magnitude) faster and cheaper than plans generated by the state of the art when materializing artifacts is possible. Lastly, our evaluation reveals that HYPPO reduces the cost by 3-4× even when materialization cannot be exploited.
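
To make the hypergraph formulation in the abstract more concrete, the following is a minimal sketch and not HYPPO's actual algorithm. It assumes that artifacts are nodes, that each task is a hyperedge consuming a set of input artifacts to produce one output artifact, and that materialized artifacts can be loaded at a known cost. The names `Task` and `cheapest_plan`, the example tasks, and all cost values are hypothetical. The search is a Dijkstra-style best-first traversal with a simple additive cost model, which charges a shared input once per use and therefore only approximates the optimization problem the paper describes.

```python
import heapq
from itertools import count
from typing import NamedTuple


class Task(NamedTuple):
    """A hyperedge: consumes every artifact in `inputs` to produce `output`."""
    name: str
    inputs: frozenset
    output: str
    cost: float


def cheapest_plan(tasks, materialized, target):
    """Best-first search over a directed hypergraph (illustrative only).

    `materialized` maps artifact name -> cost of loading it.
    Returns (total_cost, ordered task names), or None if unreachable.
    Assumption: cost(output) = task cost + sum of input costs.
    """
    tie = count()  # unique tie-breaker so heap entries never compare tasks
    heap = [(c, next(tie), a, None) for a, c in materialized.items()]
    # Tasks with no inputs can run immediately.
    heap += [(t.cost, next(tie), t.output, t) for t in tasks if not t.inputs]
    heapq.heapify(heap)
    best, via = {}, {}  # finalized cost per artifact, and the task that made it
    while heap:
        cost, _, artifact, task = heapq.heappop(heap)
        if artifact in best:
            continue  # a cheaper derivation was already finalized
        best[artifact], via[artifact] = cost, task
        if artifact == target:
            break
        # A task becomes runnable once its last missing input is finalized.
        for t in tasks:
            if (artifact in t.inputs and t.output not in best
                    and all(i in best for i in t.inputs)):
                total = t.cost + sum(best[i] for i in t.inputs)
                heapq.heappush(heap, (total, next(tie), t.output, t))
    if target not in best:
        return None

    plan, done = [], set()

    def visit(artifact):
        t = via[artifact]
        if t is None or t.name in done:
            return  # loaded from storage, or already scheduled
        for i in sorted(t.inputs):
            visit(i)
        done.add(t.name)
        plan.append(t.name)

    visit(target)
    return best[target], plan


# Two equivalent ways to compute "scaled" (hypothetical costs).
TASKS = [
    Task("scale_a", frozenset({"raw"}), "scaled", 10.0),
    Task("scale_b", frozenset({"raw"}), "scaled", 4.0),
    Task("join", frozenset({"scaled", "labels"}), "train_set", 2.0),
    Task("fit", frozenset({"train_set"}), "model", 20.0),
]

# Without reuse, the cheaper equivalent task is chosen.
no_reuse = cheapest_plan(TASKS, {"raw": 1.0, "labels": 1.0}, "model")
# With "train_set" materialized from a past run, the plan shrinks.
with_reuse = cheapest_plan(
    TASKS, {"raw": 1.0, "labels": 1.0, "train_set": 3.0}, "model"
)
```

In this toy workload, `scale_a` and `scale_b` stand in for equivalent computations of the same artifact, and the second call shows how loading a previously materialized artifact can replace part of the pipeline.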

