by Saerens, Marco
Reference: Speech Communication, 12, 4, pages 321-333
Publication: Published, 1993-08
Peer-reviewed article
Abstract: When using hidden Markov models for speech recognition, it is usually assumed that the probability that a particular acoustic vector is emitted at a given time depends only on the current state and the current acoustic vector observed. In this paper, we introduce another idea: we assume that, in a given state, the acoustic vectors are generated by a continuous Markov process. Indeed, the time evolution of the acoustic vector is inherently dynamic and continuous, and sampling only occurs for the purpose of computation. This allows us to assign a probability density to the time trajectory of the acoustic vector inside the state, reflecting the probability that this particular path has been generated by the continuous Markov process associated with this state. Roughly speaking, it measures the "adequacy" of the observed trajectory with respect to an ideal trajectory, which is modelled by a vectorial linear differential equation. This model is introduced in order to describe the dynamic behaviour of the acoustic vector inside a state. Once the segmentation is fixed, reestimation formulae for the parameters of the continuous Markov process are derived for the Viterbi algorithm. As usual, the segmentation can be obtained by sampling the continuous process and applying dynamic programming to find the best path over all possible sequences of states and all possible durations. Finally, we sketch a possible generalization to path mixtures, in which several different trajectories are available in each state. However, we have to stress that no experimental results are available at present, since we did not have the opportunity to test the algorithm on real speech. We are aware that the assumptions we made may not be appropriate for the modelling of speech. © 1993.
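
The idea of scoring a sampled trajectory against an ideal one can be illustrated with a minimal sketch. It is not taken from the paper: it assumes, purely for illustration, that the within-state process is the linear stochastic differential equation dx = (A x + b) dt + sigma dW with isotropic noise, discretized by a simple Euler scheme. The names `trajectory_log_density`, `ideal_trajectory`, `A`, `b`, `sigma` and `dt` are hypothetical, and the paper's own density and reestimation formulae may differ.

```python
import math


def trajectory_log_density(x, A, b, sigma, dt):
    """Log-density of a sampled trajectory x (list of d-dimensional points),
    conditioned on its first point, under the ASSUMED Euler-discretized
    linear model x[t+1] = x[t] + (A x[t] + b) dt + sigma * sqrt(dt) * noise."""
    d = len(x[0])
    var = sigma ** 2 * dt  # variance of each increment component
    log_p = 0.0
    for prev, cur in zip(x, x[1:]):
        for i in range(d):
            # Increment predicted by the ideal (noise-free) trajectory.
            drift_i = (sum(A[i][j] * prev[j] for j in range(d)) + b[i]) * dt
            # Deviation of the observed increment from the predicted one.
            resid = (cur[i] - prev[i]) - drift_i
            log_p += -0.5 * resid ** 2 / var - 0.5 * math.log(2 * math.pi * var)
    return log_p


def ideal_trajectory(x0, A, b, dt, steps):
    """Noise-free Euler integration of dx/dt = A x + b starting from x0."""
    d = len(x0)
    path = [list(x0)]
    for _ in range(steps):
        prev = path[-1]
        path.append([
            prev[i] + (sum(A[i][j] * prev[j] for j in range(d)) + b[i]) * dt
            for i in range(d)
        ])
    return path
```

Under these assumptions, a trajectory that follows the state's differential equation exactly receives the highest log-density, and the score decreases as the observed path departs from it. In a segmental Viterbi search, a score of this kind would stand in for the usual per-frame emission probabilities of a state.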