by Awiti, Judith
Reference: Communications in Computer and Information Science, 1064, pages 539-545
Publication: Published, 2019-06-01
Peer-reviewed article
Abstract: ETL processes are responsible for extracting, transforming and loading data from data sources into a data warehouse. Managing ETL workflows currently poses two challenges. First, each ETL tool has its own model for specifying ETL processes. This makes it difficult to specify ETL processes that are beyond the capabilities of a chosen tool, or to switch between ETL tools without redesigning the entire ETL workflow. Second, a change in the structure of a data source leaves ETL workflows that can no longer be executed and that yield errors. Therefore, we propose a logical model for ETL processes that makes it feasible to (semi-)automatically repair ETL workflows. Our first approach is to specify ETL processes using Relational Algebra extended with update operations. This way, ETL processes can be automatically translated into SQL queries that can be executed on any relational database management system. Later, we will consider expressing ETL tasks by means of the Extensible Markup Language (XML) and other programming languages. We also propose the Extended Evolving-ETL (E3TL) framework, in which we will develop algorithms for (semi-)automatic repair of ETL workflows upon data source schema changes.
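To illustrate the idea of specifying an ETL step in Relational Algebra extended with an update operation and translating it to SQL, here is a minimal sketch. It is not the paper's model or the E3TL framework: the classes `Relation`, `Select`, `Project` and `Insert`, the function `to_sql`, and the table names `src_customer` and `dw_customer` are all hypothetical, and conditions are passed through as plain SQL text without validation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """A base table in a source or in the warehouse (hypothetical name)."""
    name: str


@dataclass(frozen=True)
class Select:
    """Relational selection: keep the rows satisfying a condition."""
    source: "Expr"
    condition: str  # assumed to be a valid SQL predicate


@dataclass(frozen=True)
class Project:
    """Relational projection: keep only the listed columns."""
    source: "Expr"
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Insert:
    """Update operation: load the result of an expression into a target table."""
    target: str
    source: "Expr"


Expr = Union[Relation, Select, Project]


def to_sql(expr) -> str:
    """Translate an algebra expression into a (deliberately naive) SQL string."""
    if isinstance(expr, Relation):
        return f"SELECT * FROM {expr.name}"
    if isinstance(expr, Select):
        return f"SELECT * FROM ({to_sql(expr.source)}) AS t WHERE {expr.condition}"
    if isinstance(expr, Project):
        cols = ", ".join(expr.columns)
        return f"SELECT {cols} FROM ({to_sql(expr.source)}) AS t"
    if isinstance(expr, Insert):
        return f"INSERT INTO {expr.target} {to_sql(expr.source)}"
    raise TypeError(f"unsupported expression: {expr!r}")


# Example ETL step: load the id and name of Belgian customers into the warehouse.
etl_step = Insert(
    "dw_customer",
    Project(Select(Relation("src_customer"), "country = 'BE'"), ("id", "name")),
)
print(to_sql(etl_step))
```

Because the ETL step is held as a tool-independent expression tree rather than as SQL text, a schema change in a source (for example, a renamed column) could in principle be handled by rewriting the tree and regenerating the SQL, which is the kind of repair the abstract has in mind.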