Abstract: Despite the strong prognostic stratification provided by circulating tumor cell (CTC) enumeration in metastatic breast cancer (MBC), current clinical trials usually do not include a baseline CTC count in their design. This study aimed to generate a classifier that simulates CTC-based prognostic status in existing datasets, for hypothesis generation in patients with MBC. A K-nearest neighbor machine learning algorithm was trained on a pooled dataset comprising 2436 individual MBC patients from the European Pooled Analysis Consortium and the MD Anderson Cancer Center to identify patients likely to have CTCs ≥ 5/7 mL blood (Stage IV aggressive vs Stage IV indolent). The model had an accuracy of 65.1%, and its prognostic impact corresponded to a hazard ratio (HR) of 1.89 (Simulated aggressive vs Simulated indolent; P < .001), similar to that of patients with actual CTC enumeration (HR 2.76; P < .001). The classifier's performance was then tested on an independent retrospective database comprising 446 consecutive hormone receptor-positive, HER2-negative MBC patients. The model further stratified clinical subgroups usually considered prognostically homogeneous, such as patients with bone-only disease or liver metastases. Patients with bone-only disease classified as Simulated aggressive had significantly worse overall survival (OS; P < .0001), while patients with liver metastases classified as Simulated indolent had a significantly better prognosis (P < .0001). Consistent results were observed for patients who had undergone CTC enumeration in the pooled population. The differential prognostic impact of endocrine therapy (ET) and chemotherapy (CT) was explored across the simulated subgroups. No significant differences were observed between ET and CT in the overall population in terms of either progression-free survival (PFS) or OS. In contrast, a statistically significant difference favoring CT over ET was observed among Simulated aggressive patients (HR: 0.62; P = .030 and HR: 0.60; P = .037 for PFS and OS, respectively).
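For readers unfamiliar with the approach, the following is a minimal sketch of how a K-nearest neighbor classifier assigns a simulated CTC status and how accuracy is computed. It is not the study's model: the two features (number of metastatic sites, visceral involvement), the toy patients, the value of k, and the function names `knn_predict` and `accuracy` are all hypothetical and chosen only for illustration, since the abstract does not state which clinical variables were used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to `query`."""
    # Euclidean distance from the query to every training patient.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test patients whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)

# Hypothetical training data: (number of metastatic sites, visceral involvement 0/1).
# Labels stand in for measured CTC status (aggressive: >= 5 CTCs; indolent: < 5 CTCs).
train_X = [(1, 0), (1, 0), (2, 0), (3, 1), (4, 1), (3, 1)]
train_y = ["indolent", "indolent", "indolent",
           "aggressive", "aggressive", "aggressive"]

# Simulated status for two new, equally hypothetical patients.
print(knn_predict(train_X, train_y, (4, 1)))  # aggressive
print(knn_predict(train_X, train_y, (1, 0)))  # indolent
```

In a real analysis the features would need to be scaled to comparable ranges before computing distances, and k would be chosen by cross-validation; both steps are omitted here for brevity.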