By Ge, Wendong; Jing, Jin; An, Sungtae; Herlopian, Aline; Ng, Marcus; Struck, Aaron A.F.; Appavu, Brian; Johnson, Emily E.L.; Osman, Gamaleldin; Haider, Hiba Arif; Karakis, Ioannis; Kim, Jennifer Ahjin; Halford, Jonathan J.J.; Dhakar, Monica M.B.; Sarkis, Rani; Swisher, Christa C.B.; Schmitt, S.E.; Lee, Jongwoo J.W.; Tabaeizadeh, Mohammad; Rodriguez, Andres; Gaspard, Nicolas; Gilmore, Emily Jean; Herman, Susan S.T.; Kaplan, Peter P.W.; Pathmanathan, Jay; Hong, Shenda; Rosenthal, Eric E.S.; Zafar, Sahar S.F.; Sun, Jimeng; Westover, Michael Brandon
Reference: Journal of Neuroscience Methods, 351, 108966
Publication: Published, 2021-03-01
Peer-reviewed article
Abstract: Objectives: Seizures and seizure-like electroencephalography (EEG) patterns, collectively referred to as “ictal interictal injury continuum” (IIIC) patterns, are commonly encountered in critically ill patients. Automated detection is important for patient care and to enable research. However, training accurate detectors requires a large labeled dataset. Active Learning (AL) may help select informative examples to label, but the optimal AL approach remains unclear. Methods: We assembled >200,000 h of EEG from 1,454 hospitalized patients. From these, we collected 9,808 labeled and 120,000 unlabeled 10-second EEG segments. Labels included 6 IIIC patterns. In each AL iteration, a Dense-Net Convolutional Neural Network (CNN) learned vector representations for EEG segments using available labels, which were used to create a 2D embedding map. Nearest-neighbor label spreading within the embedding map was used to create additional pseudo-labeled data. A second Dense-Net was trained using real- and pseudo-labels. We evaluated several strategies for selecting candidate points for experts to label next. Finally, we compared two methods for class balancing within queries: standard balanced-based querying (SBBQ), and high confidence spread-based balanced querying (HCSBBQ). Results: Our results show: 1) Label spreading increased convergence speed for AL. 2) All query criteria produced similar results to random sampling. 3) HCSBBQ query balancing performed best. Using label spreading and HCSBBQ query balancing, we were able to train models approaching expert-level performance across all pattern categories after obtaining ∼7000 expert labels. Conclusion: Our results provide guidance regarding the use of AL to efficiently label large EEG datasets in critically ill patients.
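Illustrative note: the abstract mentions nearest-neighbor label spreading within a 2D embedding map but gives no implementation details. The following is a minimal sketch of one plausible reading, not the authors' method. The function name `spread_labels`, the `-1` marker for unlabeled points, and the `max_dist` cutoff are all assumptions introduced here for illustration.

```python
import numpy as np


def spread_labels(embedding, labels, max_dist):
    """Give each unlabeled point the label of its nearest labeled neighbor.

    embedding: (n, 2) array of 2D map coordinates.
    labels:    (n,) integer array; -1 marks an unlabeled point (assumption).
    max_dist:  hypothetical cutoff; points farther than this from every
               labeled point stay unlabeled.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    labeled = np.flatnonzero(labels >= 0)
    unlabeled = np.flatnonzero(labels < 0)
    pseudo = labels.copy()
    if labeled.size == 0 or unlabeled.size == 0:
        return pseudo

    # Euclidean distances from every unlabeled point to every labeled point.
    diffs = embedding[unlabeled, None, :] - embedding[None, labeled, :]
    dists = np.linalg.norm(diffs, axis=2)

    # Index of the nearest labeled point, and the distance to it.
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(unlabeled.size), nearest]

    # Copy the neighbor's label only when it is close enough.
    close = nearest_dist <= max_dist
    pseudo[unlabeled[close]] = labels[labeled[nearest[close]]]
    return pseudo


# Toy example: two labeled points, two nearby unlabeled points, one outlier.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [20.0, 20.0]])
labels = np.array([0, -1, 1, -1, -1])
pseudo = spread_labels(points, labels, max_dist=1.0)
```

In this toy example the two unlabeled points near a labeled neighbor inherit its label, while the distant outlier stays unlabeled. The brute-force distance matrix is only practical for small inputs; at the scale described in the abstract (120,000 unlabeled segments) a spatial index would likely be needed.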