Doctoral thesis
Abstract: Within the context of the assessment of laryngeal function, acoustic analysis has an important place, because the speech signal can be recorded non-invasively and because it is the basis on which the perceptual assessment of voice rests. Given the limitations of perceptual ratings, researchers have sought acoustic cues of disordered voices that are clinically relevant, that summarize properties of the speech signal, and that report on a speaker's phonation in general and voice in particular. Ideally, these acoustic descriptors should also correlate with auditory-perceptual ratings of voice. Generally speaking, the goal of acoustic analysis is to quantify the severity of a voice disorder and to monitor the evolution of the voice of dysphonic speakers.

The first part of this thesis is devoted to the analysis of disordered connected speech, with the aim of identifying vocal cues that are clinically relevant and correlated with auditory-perceptual ratings. Two approaches are investigated. The first is a variogram-based method in the time domain. The second works in the cepstral domain; in particular, the amplitude of the first rahmonic is used as an acoustic cue of voice quality. A multidimensional approach combining temporal and spectral aspects is also investigated, to check whether cues from the two domains carry complementary information when predicting perceptual scores.
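The abstract does not spell out how the first rahmonic amplitude is computed. A common construction, assumed here, is to take the real cepstrum of a windowed frame (the inverse Fourier transform of its log-magnitude spectrum) and read off the largest peak within the quefrency range of plausible fundamental periods. The helper names (`harmonic_frame`, `first_rahmonic`), the 60 to 400 Hz search range and the Hann window are illustrative choices, not the thesis's settings; the raw peak is used, without the regression-line baseline that cepstral peak prominence subtracts. A naive DFT keeps the sketch dependency-free.

```python
import cmath
import math
import random


def harmonic_frame(fs, f0, n, noise_std=0.0, seed=0):
    """Hypothetical test frame: harmonics of f0 below Nyquist with 1/h amplitudes,
    plus optional white Gaussian noise."""
    rng = random.Random(seed)
    n_harm = int(fs / 2 // f0) - 1
    return [sum(math.sin(2 * math.pi * h * f0 * t / fs) / h for h in range(1, n_harm + 1))
            + rng.gauss(0.0, noise_std) for t in range(n)]


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for one short frame."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the natural-log magnitude spectrum."""
    n = len(frame)
    log_mag = [math.log(abs(s) + 1e-12) for s in dft(frame)]
    # log_mag is real and even-symmetric, so its inverse DFT equals its forward DFT / n
    return [c.real / n for c in dft(log_mag)]


def first_rahmonic(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (quefrency in samples, amplitude) of the largest cepstral peak whose
    quefrency corresponds to a fundamental frequency between f0_min and f0_max."""
    n = len(frame)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    ceps = real_cepstrum([x * w for x, w in zip(frame, hann)])
    q_lo = int(fs / f0_max)
    q_hi = min(int(fs / f0_min), n // 2)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return q_peak, ceps[q_peak]


if __name__ == "__main__":
    fs, f0 = 8000, 200.0
    for noise_std in (0.0, 1.0):
        q, amp = first_rahmonic(harmonic_frame(fs, f0, 256, noise_std), fs)
        print(f"noise_std={noise_std}: quefrency {q} samples ({fs / q:.1f} Hz), amplitude {amp:.3f}")
```

For a clean harmonic frame the peak should sit at the pitch period (fs / f0 samples), and added noise fills the spectral valleys between harmonics, which lowers the rahmonic amplitude.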

Both methods are first tested on a corpus of synthetic stimuli generated with a synthesizer of disordered voices. The purpose is to study the relationship between the signal properties (set by the synthesis parameters) and the acoustic cues.
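The idea of linking a known synthesis parameter to a cue can be sketched with an empirical variogram. This is a minimal illustration under stated assumptions: the synthesizer is replaced by a hypothetical harmonic frame whose only "synthesis parameter" is an additive white-noise level, and the cue (smallest variogram value over plausible pitch-period lags, normalized by the signal variance) is a hypothetical one, not the thesis's exact variogram-based descriptor.

```python
import math
import random


def synthetic_frame(fs, f0, n, noise_std, seed=0):
    """Hypothetical stand-in for a synthesized stimulus: harmonics of f0 (1/h amplitudes)
    below Nyquist, plus white Gaussian noise whose level plays the role of a synthesis parameter."""
    rng = random.Random(seed)
    n_harm = int(fs / 2 // f0) - 1
    return [sum(math.sin(2 * math.pi * h * f0 * t / fs) / h for h in range(1, n_harm + 1))
            + rng.gauss(0.0, noise_std) for t in range(n)]


def variogram(x, max_lag):
    """Empirical (semi-)variogram: gamma(h) = sum((x[i+h] - x[i])^2) / (2 * (N - h)),
    returned for h = 1 .. max_lag (list index h - 1)."""
    n = len(x)
    return [sum((x[i + h] - x[i]) ** 2 for i in range(n - h)) / (2 * (n - h))
            for h in range(1, max_lag + 1)]


def variogram_cue(x, fs, f0_min=60.0, f0_max=400.0):
    """Hypothetical cue: smallest variogram value over lags of plausible pitch periods,
    divided by the signal variance. Close to 0 for a strictly periodic signal and
    near 1 for white noise."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    lag_lo, lag_hi = int(fs / f0_max), int(fs / f0_min)
    gamma = variogram(x, lag_hi)
    best_lag = min(range(lag_lo, lag_hi + 1), key=lambda h: gamma[h - 1])
    return best_lag, gamma[best_lag - 1] / var


if __name__ == "__main__":
    fs, f0, n = 8000, 200.0, 800
    for noise_std in (0.0, 0.3, 1.0):
        lag, cue = variogram_cue(synthetic_frame(fs, f0, n, noise_std), fs)
        print(f"noise_std={noise_std}: best lag {lag} samples, cue {cue:.3f}")
```

Because the noise level is fixed by construction, the cue can be checked against it directly: for a stationary signal the normalized variogram equals one minus the autocorrelation at that lag, so the cue grows as the periodic component is masked.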

The study also draws on two large corpora of natural speech, one of which has been perceptually rated.

The final part of the thesis is devoted to the automatic classification of voices with regard to perceived voice quality. Many studies have proposed a binary (normal/pathological) classification of voice samples. An automatic categorization according to perceived degrees of hoarseness, however, appears more attractive to both clinicians and technologists, and more likely to be clinically relevant. Inter-rater variability of auditory-perceptual evaluation can be reduced by having several experts rate each sample and averaging their scores, but having several judges rate a whole corpus is laborious, time-consuming and costly. Automating this perceptual evaluation is therefore desirable.

The aim here is to use the support vector machine (SVM) classifier, which has become a popular classification tool in recent years, to categorize voices according to perceived degrees of hoarseness.
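The abstract does not state the kernel, the input features or the multi-class scheme. The sketch below assumes a linear SVM (hinge loss with L2 regularization) trained by the Pegasos stochastic sub-gradient method, combined one-versus-one with majority voting, which is the multi-class strategy LIBSVM uses. The two-feature corpus and the hoarseness grades 0, 1 and 2 are hypothetical toy data, not results from the thesis.

```python
import random
from itertools import combinations


def train_pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos; y holds +1/-1 labels. A constant 1.0 feature
    is appended so the bias is learned (and, as a simplification, regularized) with w."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (regularizer gradient), then add the hinge sub-gradient if the margin is violated
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_one(X, grades, **kw):
    """One binary SVM per pair of grades (a, b); label +1 means grade a."""
    models = {}
    for a, b in combinations(sorted(set(grades)), 2):
        pair = [(x, 1 if g == a else -1) for x, g in zip(X, grades) if g in (a, b)]
        models[(a, b)] = train_pegasos([p[0] for p in pair], [p[1] for p in pair], **kw)
    return models


def predict_grade(models, x):
    """Majority vote over the pairwise classifiers; ties go to the lowest grade."""
    xb = list(x) + [1.0]
    votes = {}
    for (a, b), w in models.items():
        winner = a if sum(wi * xi for wi, xi in zip(w, xb)) >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=lambda g: votes[g])


def toy_corpus(n_per_grade=30, seed=1):
    """Hypothetical corpus: two acoustic cues per voice sample, clustered by grade."""
    rng = random.Random(seed)
    centres = {0: (1.0, 3.0), 1: (2.0, 2.0), 2: (3.0, 1.0)}
    X, grades = [], []
    for g, (cx, cy) in centres.items():
        for _ in range(n_per_grade):
            X.append((rng.gauss(cx, 0.15), rng.gauss(cy, 0.15)))
            grades.append(g)
    return X, grades


if __name__ == "__main__":
    X, grades = toy_corpus()
    models = train_one_vs_one(X, grades)
    print(predict_grade(models, (2.0, 2.0)))
```

One-versus-one is used rather than one-versus-rest because ordinal grades often lie along a single direction in feature space, where a middle grade cannot be separated from both neighbours by one hyperplane, whereas every pair of grades can be.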