by Aichinger, Philipp; Hagmuller, Martin; Schneider-Stickler, Berit; Schoentgen, Jean; Pernkopf, Franz
Reference: IEEE/ACM Transactions on Audio Speech and Language Processing, 26, 2, pages 330-341
Publication: Published, 2018
Peer-reviewed article
Abstract: Diplophonia is a type of pathological voice in which two fundamental frequencies (fo) are present simultaneously. Specialized audio analyzers that can handle up to two fos in diplophonic voices are in their infancy. We propose tracking up to two fos in diplophonic voices by audio waveform modeling (AWM), which involves obtaining candidates by repeated execution of the Viterbi algorithm, followed by waveform Fourier synthesis and heuristic candidate selection with majority voting. Our approach is evaluated against reference fo tracks obtained from laryngeal high-speed videos of 29 sustained phonations and compared to state-of-the-art tracking algorithms for multiple fos. An accurate variant and a fast variant of our algorithm are tested. The median error rate of the accurate variant is 6.52%, whereas the most accurate benchmark achieves 11.11%. The fast variant is more than twice as fast as the fastest relevant benchmark, with a median error rate of 9.52%. Furthermore, illustrative results of connected speech analysis are reported. Our approach may help to improve detection and analysis of diplophonia in clinical research and practice, as well as to advance synthesis of disordered voices.
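
Illustrative sketch (not the authors' implementation): the abstract describes choosing among fo candidates by synthesizing a waveform from each candidate and comparing it with the audio. The toy Python example below shows that general idea only. It assumes harmonic partials with fixed 1/k amplitudes, a relative RMS error as the fit measure, and a hand-written candidate list. The function names `synthesize`, `relative_rms_error`, and `select_candidate` are hypothetical, and the Viterbi candidate generation and majority voting mentioned in the abstract are not reproduced.

```python
import math


def synthesize(fos, n_samples, fs, n_harmonics=3):
    """Fourier synthesis: sum of cosine harmonics for each fo (in Hz) in `fos`.

    Assumption for illustration: the k-th harmonic has amplitude 1/k and zero phase.
    """
    signal = []
    for n in range(n_samples):
        t = n / fs
        value = 0.0
        for fo in fos:
            for k in range(1, n_harmonics + 1):
                value += math.cos(2.0 * math.pi * k * fo * t) / k
        signal.append(value)
    return signal


def relative_rms_error(target, model):
    """Root of the ratio between residual energy and target energy."""
    residual_energy = sum((a - b) ** 2 for a, b in zip(target, model))
    target_energy = sum(a ** 2 for a in target)
    return math.sqrt(residual_energy / target_energy)


def select_candidate(target, candidates, fs):
    """Return the candidate fo tuple whose synthesized waveform fits the target best."""
    return min(
        candidates,
        key=lambda fos: relative_rms_error(target, synthesize(fos, len(target), fs)),
    )


# Toy "diplophonic" target with two simultaneous fos (values chosen arbitrarily).
fs = 8000
target = synthesize((150.0, 210.0), 800, fs)

# Hypothetical candidates: single-fo and two-fo hypotheses.
candidates = [(150.0,), (210.0,), (150.0, 210.0), (150.0, 300.0)]
best = select_candidate(target, candidates, fs)
print(best)
```

In this toy setting the two-fo hypothesis that matches the target reproduces it exactly, so it has the lowest error. Real recordings would require estimating partial amplitudes and phases rather than fixing them.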