Vowel duration as temporal regulator of syllable

 

Pierre Durand, Anna Durand-Deska

Université de Provence

 

In the framework of Polish text-to-speech (TTS) synthesis, we investigate components of the speech wave in order to supply the MBROLA synthesizer with the data it requires. The paper addresses the problem of the durational exponents of the sentence in connected Polish speech. Contrary to claims made in studies based on single words, or on short phrases devoid of situational context, the analysis of natural and synthesised speech samples performed in this study suggests that duration does play a role in the perception of stress in connected speech in Polish and can, furthermore, have important pragmatic effects. The data analysed are extracted from sentences of various lengths included in 40 short monologues from the Polish version of the BABEL database. The 60 speakers covered a range of female and male voices as well as vocal strategies. They were asked to read the monologues in a dramatised fashion in order to introduce context-sensitive cues and to avoid those specific to reading. The questions to be produced by the subjects were placed at the beginning of the monologues, or were preceded by an assertion or another question. This ensured that a wider range of context-sensitive prosodic cues would be used. From this corpus, 13 interrogative sentences with “Czy...?” were extracted. Although a reading task can yield “laboratory speech”, the recording rules and the selection of speakers from the 60 recorded ensure that the selected items are representative of modern casual Polish. The drawback of such a database remains the number of parameters to take into account, even though the selection of speakers introduces a kind of normalization. Whereas assertive sentences can be uttered without active participation of the hearer, interrogative ones cannot be spoken without a listener from whom the speaker intends to get a verbal or non-verbal answer. They therefore have more natural prosodic characteristics, with durational cues specific to spoken communication. The analysed questions are extracted from monologues composed of five or six sentences.
Each monologue constitutes a single unit from the semantic and communicative point of view. Its sentences interact on the semantic, prosodic, rhythmic and pragmatic levels.

 

In this paper, we investigate how speakers use duration in sentences of various lengths, at the syllabic and segmental levels, and bring to light the relations between these durations. Evidence of the importance of segment duration is provided by synthesis (Durand et al. 2003). The analysis of the sentences shows a strong tendency to divide the sequence into a string of long and short syllables, starting from the position of the lexical stress. Vowel durations show considerable variation, whereas consonant duration is mainly a function of the components of the syllable onset and coda. Given the various levels of constraints on the peripheral realization, variation in vowel duration allows the syllable to be long or short, even if this “alternation” is a consequence of articulatory constraints linked to word or focal stress. To highlight this “vocalic temporal regulation”, two normalizations can be taken into account. The first is speech rate, which allows comparison between sentences produced at different tempos. The second is the intersyllabic duration difference, since each syllable is heard in comparison with the preceding one.
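
The two normalizations mentioned above can be illustrated with a minimal sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: speech rate is approximated by the mean syllable duration of the sentence, the intersyllabic difference is taken as the current normalized duration minus the preceding one, and the function names and the sample durations are hypothetical rather than taken from the corpus.

```python
from typing import List


def rate_normalize(durations_ms: List[float]) -> List[float]:
    """Divide each syllable duration by the sentence mean.

    Assumption: the mean syllable duration stands in for speech rate,
    so sentences spoken at different tempos become comparable.
    """
    if not durations_ms:
        return []
    mean = sum(durations_ms) / len(durations_ms)
    return [d / mean for d in durations_ms]


def intersyllabic_differences(values: List[float]) -> List[float]:
    """Difference between each syllable and the one before it.

    Assumption: a plain subtraction (current minus preceding) models
    the idea that a syllable is heard relative to the preceding one.
    """
    return [cur - prev for prev, cur in zip(values, values[1:])]


def long_short_pattern(diffs: List[float]) -> List[str]:
    """Label each syllable (from the second on) as longer 'L' or not 'S'."""
    return ["L" if d > 0 else "S" for d in diffs]


# Hypothetical syllable durations in milliseconds (not corpus data).
slow = [100.0, 200.0, 100.0, 200.0]
fast = [50.0, 100.0, 50.0, 100.0]

slow_norm = rate_normalize(slow)
fast_norm = rate_normalize(fast)
pattern = long_short_pattern(intersyllabic_differences(slow_norm))
```

Under these assumptions, the same utterance at two tempos yields the same normalized values, and the sign of each intersyllabic difference gives a simple long-short labelling relative to the preceding syllable.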

 
