Abstract: In this work, we predict the likelihood that a retail customer starts buying groceries online. We rely on customer demographics and on behavioral variables derived from transactional data to understand which elements are critical in leading a customer to start using this service.

Based on personal data that customers provide when enrolling in a loyalty program, and on the historical shopping activity associated with these loyalty cards, we can identify which customers started using e-commerce and which reasons lead them to adopt this shopping and delivery option. This information is critical to Delhaize in supporting its long-term strategy of growing the e-commerce segment by promoting this behavior through targeted communications and campaigns.

Following the business objectives set by Delhaize, we approached this challenge as a supervised classification problem. Several machine learning algorithms were tested early in the process on the imbalanced data, with little success. We then tested logistic regression, random forest and a boosting algorithm (AdaBoost) combined with different sampling techniques, as well as ensemble algorithms. We compared their predictive power using classification metrics such as AUC and lift, the latter measuring the potential success of a targeted marketing campaign compared with a random campaign.

However, before the predictors could be applied, the class imbalance in the data required some preprocessing.
Different over-sampling, under-sampling and ensemble techniques were applied to overcome this challenge. In total, 16 sampling techniques were evaluated. Their number was narrowed down in phases, testing with different dataset sizes, until only the 3 best-performing sampling techniques remained. These were then tested with 3 different learning algorithms and 3 ensemble algorithms.

Feature selection and feature engineering are also critical steps in this process. Working closely with the subject matter experts of the Delhaize e-commerce team was instrumental in identifying which elements bring value to the predictors. Combining knowledge from subject matter experts, research in the e-commerce domain and machine learning techniques for identifying features with high predictive power was fundamental to achieving reasonable scores on the metrics for the learning algorithms.

We discuss the results obtained for this specific dataset with the best sampling techniques and the learning algorithms, using validation processes. Finally, we recommend the best combination of sampling technique and learning algorithm, and we comment on the business benefits of the model's outcome and on potential future work to tackle this problem in a broader scope.
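The abstract does not specify which of the 16 sampling techniques were retained or how lift was computed. As a hedged illustration only, the sketch below shows two generic building blocks of such a pipeline. The first is random under-sampling of the majority class, one of the simplest sampling techniques and not necessarily one of those selected in this work. The second is lift at a top fraction of customers ranked by model score. The function names, the 10% default cut-off and the label encoding (1 = customer started buying online) are hypothetical assumptions, not the thesis's implementation.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Randomly drop majority-class rows until both classes have equal counts.

    Hypothetical helper; assumes binary labels where 1 = started buying
    online and 0 = did not.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # The shorter index list is the minority class.
    minority, majority = sorted([positives, negatives], key=len)
    kept = minority + rng.sample(majority, len(minority))
    kept.sort()  # preserve the original row order
    return [features[i] for i in kept], [labels[i] for i in kept]


def lift_at_fraction(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.1
) -> float:
    """Response rate among the top-scored `fraction` of customers divided by
    the overall response rate (targeted campaign vs. random campaign).

    Hypothetical helper; the 10% default cut-off is an assumption.
    """
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    n_top = max(1, int(len(scores) * fraction))
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_rate = sum(labels[i] for i in ranked[:n_top]) / n_top
    return top_rate / base_rate
```

In a setup like this, under-sampling would be applied to the training split only, so that AUC and lift are measured on held-out data that keeps the original class distribution. A lift of 1 means the model's top-ranked customers respond no better than a random selection.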