Deliberate mispronunciation in EFL e-dictionaries: integrating PDI with TTS

Włodzimierz Sobkowiak

PLM'2007

 

 

Contents:

1. TTS for EFL?

      (a) feasible?

      (b) examples

      (c) simulating a foreign accent with TTS?

2. PDI – predicted difficulty

3. The 20 words – perceived difficulty

4. The 20 words – attested difficulty

5. PDI TTS in EFL MRDs

6. Synthesizing examples/definitions?

 

Abstract

The Phonetic Difficulty Index (PDI) is a quantitative/qualitative measure of word pronouncing difficulty to L1 learners of a given L2.  Specifically, in its current implementation (see http://ifa.amu.edu.pl/~swlodek/public.htm for bibliography), it assigns numerical (0-10 range) and difficulty (57 pronouncing problems) Polglish-sensitive tags to an English word-list or text.  The range of applications of the current version of PDI extends from evaluation of pedagogical materials, such as texts, word-lists, dictionaries, etc., in terms of phonetic difficulty, to generation of word-lists meeting user-specified phonetic criteria for teaching, learning, testing and materials preparation.

One application of PDI which has not been considered so far is modeling learners' pronunciation of English lexical items by deliberately mispronouncing e-dictionary entries in ways characteristic of the given L1, in this case Polish, or, more accurately, Polglish, i.e. the Polish-English interlanguage of Polish learners of English as a foreign language (EFL).  The rationale of this project is as follows.  EFL learners often have problems perceiving the phonetic difference between their 'accented' pronunciation of a given lexical item and the native speaker model.  Contemporary e‑dictionaries let learners record their own pronunciation and compare it, by ear or visually, with the recorded native model, but this technique may not work in such a situation.  Demonstrating an actual Polglish mispronunciation of the word alongside the correct native version, spoken in the same voice and keeping all the other phonetic variables constant, might be more useful.  This has not been feasible so far in e‑dictionaries: no professional native English speaker could be expected to mimic Polglish mispronunciation persuasively, not to mention the cost of such a procedure.  With PDI and Text-to-Speech synthesis (TTS) we have the two key technologies to make such believable mispronunciations possible.  PDI identifies the expected Polglish mispronunciations for each lexical entry in an e‑dictionary, generates a mispronounced phonetic representation in orthographic or transcriptional form, and passes it on to the TTS module for conversion into audio.  Both the model and the mispronunciation can then be generated as audio on the fly, with no need for prior recording of human speakers.
The exact mispronunciation can be controlled down to minute phonetic detail to suit the proficiency level and phonetic idiosyncrasies of the user (as constructed by the user-modeling component of the dictionary) or the pedagogical agenda of the learner/teacher (for example, the amount of final obstruent voicing in English can be exaggerated).

 

1. TTS for EFL?

"Not only is (top-quality) synthesized speech intelligible and natural, but it can also actually function as a model of pronunciation.  For example, Filoglossia, a CALL package with (modern) Greek as a foreign language, already employs TTS synthesis: http://www.ilsp.gr/filoglossia_plus_eng.html, and WordPilot from http://www.compulang.com/, also has this feature" (Sobkowiak 2003).

"ScanSoft's RealSpeak™ Word uses a ground-breaking approach to text-to-speech to achieve superb quality speech output from a dictionary of words and idioms, allowing language learners to hear how words should be accurately pronounced" (http://www.nuance.com/realspeak/word/).

"TTS applications can render many benefits to EFL students while making teachers job easier. I have found that my students have improved their pronunciation since I started using them in my classes, not to mention that they have become more autonomous" (González 2007).

 

 

Examples of ScanSoft and IvoSoftware synthesis (web downloads):

My name is Radek.  I welcome all present in hall C-1 at professor Sobkoviak's lecture.  My task is convincing you about the high level of * speech synthesis.

"My name is Radek" from ScanSoft (http://www.scansoft.com/realspeak/demo)

"My name is Radek" from IvoSoftware (http://www.ivosoftware.com/ivonaonline.php)

"Nazywam się Radek" from ScanSoft (http://www.scansoft.com/realspeak/demo)

"Nazywam się Radek" from IvoSoftware (http://www.ivosoftware.com/ivonaonline.php)

 

 

"Simulating a foreign accent of English by computer for didactic purposes is not a new idea.  In 1997 Hyouk-Keun Kim created his Korean Accented English Pronunciation Simulator (http://english-korean.net/kaeps/index.html), rightly noticing that "Most adult ESL/EFL learners [...] do not recognize the problems of their English pronunciation", and that it might be a good idea to demonstrate these under computer control.  Eventually a rule-based KAEPS system was set up, simulating "three types of English pronunciations in the IPA symbols: 1) a phoneme-based English pronunciation, 2) a desirable allophone-based American English pronunciation, and 3) a possible Korean accented English pronunciation".  While Kim's system has never advanced beyond accented graphemic (i.e. IPA) representation, it would be easy enough to attach the IPA-to-speech engine to it.  After all, most TTS systems use phonetic transcription at some stage of the synthesis process [...] An L1-sensitive TTS system would be able to dynamically adjust its parameters to realistically simulate spoken Polglish at these various stages of proficiency" (Sobkowiak 2003).

 

 

Proviso: "Deliberate mispronunciation" is ambiguous:

·     There are 798 Google hits for this phrase, two of them in the title: my paper and the following: "Try saying words in a way which will help you remember the way they are spelt. E.g. Wednesday say as Wed-nes-day, friend say as fry-end, people say as pea-op-le.  If there is a word you always struggle with, try this method!" (http://www.school-portal.co.uk/GroupDownloadFile.asp?GroupId=81173&ResourceId=454160).

·     Gahmen - Deliberate mispronunciation of the word "government". Used as a substitute for the actual word especially when criticising the government in written form to prevent possible sanctions against the author (http://en.wikipedia.org/wiki/Singlish_vocabulary).

·     Verbage - /ver'b*j/ n. A deliberate misspelling and mispronunciation of {verbiage} that assimilates it to the word 'garbage'. More pejorative than 'verbiage' (http://www.anvari.org/fortune/Fortune_Big_T/2528_verbage-ver-b-j-n-a-deliberate-misspelling-and-mispronunciation-of-verbiage-that-assimilates-it-to-the-word-garbage.html).

 

 

2. PDI – predicted difficulty

The Phonetic Difficulty Index (PDI) is a quantitative/qualitative measure of word pronouncing difficulty to L1 learners of a given L2.  Specifically, in its current implementation (see http://ifa.amu.edu.pl/~swlodek/public.htm for bibliography), it assigns numerical (0-10 range) and difficulty (57 pronouncing problems) Polglish-sensitive tags to an English word-list or text.

 

Table 1. Example PDI codes with likely Polglish errors

phonetic difficulty code (PDI code)                | likely Polglish error
b – <ur> in word                                   | schwa, r?
s – <age_> in stem and not eɪdʒ_                   | eɪdʒ
w – <ey_> in stem and not eɪ_                      | eɪ
B – eə                                             | j breaking, smoothing, schwa
E – ʌ                                              | Polish a
J – short schwa                                    | schwa quality
L – voiced apico-dental                            | d, z, v
N – final voiced obstruent                         | devoicing
O – pre-voiced dɪs- or mɪs-                        | z
Q – vowel overnasalization                         | Polish-like fully nasal vowels
V – glottal fricative h                            | Polish velar fricative x
X – word-final syllabic sonorants                  | schwa insertion
2 – more than 5 syllables                          | stress and articulation problems
7 – <ary_>/<ory_>/<ery_> in bisyllabic-plus stems  | stress, vowel quality
9 – proper noun                                    | graphophonemically irregular
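
To make the tagging idea concrete, here is a minimal Python sketch of how a handful of the Table 1 codes might be assigned from a word's spelling and its phonemic transcription.  It is not the actual PDI implementation: the function name pdi_codes, the choice of six codes, and the simplification of "in stem" to "at the end of the word" are all assumptions made for illustration, and the 0-10 numerical score is not modeled at all.

```python
# Illustrative sketch only: a tiny subset of the Table 1 codes,
# NOT the real PDI tagger. All names here are hypothetical.

# Voiced obstruents; "dʒ" is covered because it ends in "ʒ".
VOICED_OBSTRUENTS = ("b", "d", "g", "v", "ð", "z", "ʒ")


def pdi_codes(spelling: str, transcription: str) -> set:
    """Return the set of (sketched) PDI codes that apply to one word."""
    spelling = spelling.lower()
    codes = set()
    if "ur" in spelling:                      # b: <ur> in word
        codes.add("b")
    # s: <age_> and not eɪdʒ_ (assumption: checked word-finally only)
    if spelling.endswith("age") and not transcription.endswith("eɪdʒ"):
        codes.add("s")
    if "ʌ" in transcription:                  # E: ʌ
        codes.add("E")
    if "ð" in transcription:                  # L: voiced apico-dental
        codes.add("L")
    if transcription.endswith(VOICED_OBSTRUENTS):  # N: final voiced obstruent
        codes.add("N")
    if "h" in transcription:                  # V: glottal fricative h
        codes.add("V")
    return codes


if __name__ == "__main__":
    for word, ipa in [("mother", "'mʌðə"), ("survive", "sə'vaɪv"), ("taxi", "'tæksɪ")]:
        print(word, sorted(pdi_codes(word, ipa)))
```

A full tagger would need all 57 problem codes, a syllable counter for code 2, and stem analysis; the sketch only shows that each code can be read as a simple test on spelling and/or transcription.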

 

3. The 20 words – perceived difficulty (Sobkowiak  2000)

In the study originally written in 2000, and published on the web (http://ifa.amu.edu.pl/~swlodek/diffind2.pdf), 208 Polish students of English philology filled in a questionnaire concerning the perceived phonetic difficulty of twenty English words stratified on two dimensions: (a) a-priori rule-based assessment of phonetic difficulty and (b) word frequency rank.  The words were, alphabetically: almost, appear, author, awkward, belief, carry, coloured, debate, defect, dissolve, kingdom, mother, oblige, relax, server, southern, survive, taxi, tired, youngster.  A two-way ANOVA confirmed the significance of both main effects and their synergetic interaction, i.e. the perceived difficulty rating was affected by both the word's rule-based difficulty index and its frequency independently, as well as by their product.

 

4. The 20 words – attested difficulty (Sobkowiak and Ferlacka 2006)

In a recent study, Sobkowiak and Ferlacka (2006) tried to "calibrate the Phonetic Difficulty Index" empirically. The twenty English words of Sobkowiak (2000) were read in carrier sentences by 38 Polish learners of English aged 17-18.  The sentences were definitions taken from the Macmillan English Dictionary for Advanced Learners, first edition (MEDAL1).  A total of 617 word-readings yielded 1211 errors, for a grand mean of 1.96 phonetic errors per reader per word.  The primary aim of that experiment was to verify empirically the intuitively arrived at lexico-phonetic difficulty judgements encapsulated in the PDI.  Predictably, the PDI phonolapsological intuitions turned out to be taken from the academic EFL context, and as such showed little correlation with the actual errors made by Polish schoolchildren.
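
The grand mean quoted above is simply the total number of errors divided by the number of word-readings.  A quick check in Python (the variable names are illustrative only):

```python
# Figures taken from the paragraph above; names are hypothetical.
word_readings = 617
errors = 1211

# Mean number of phonetic errors per reader per word.
mean_errors = errors / word_readings
print(f"{mean_errors:.2f}")
```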

A sample of 5 sentences/definitions from one learner (keywords bolded):

·       Melt - if you melt into or against someone you relax as they hold you close in a romantic way.

·       Hail - to signal a taxi or bus so that it stops for you.

·       Defect - a fault in someone or something.

 

5. PDI TTS in EFL MRDs

Speech synthesis mechanisms can be tweaked to produce human-sounding audio output of an arbitrary phonemic/allophonic string, including deliberate mispronunciations illustrating selected interlanguage features.  These can then be offered to the EFL e-dictionary user, suitably adjusted to their needs and wants.  In Table 2 two mispronunciation versions are given, one containing the PDI-predicted error(s), the other showing the most common of the errors actually attested in the Sobkowiak & Ferlacka (2006) study (the actual transcription coding was done by Ferlacka).  I am grateful to Mr. Dawid Pietrala for tweaking the Festival speech synthesis system to obtain phonolapsologically accented 'Polglish' speech.  All sounds in the 'error' columns are to be interpreted as having basically Polish qualities, e.g. /a/ is Polish /a/, and similarly for other vowels and consonants, e.g. /dʒ/ or /tʃ/.  Spelling pronunciation and heavy phonetic transfer from Polish are obvious.  The most phonetically interesting examples are shaded.

 

Table 2. 20 English words: correct and in two Polglish versions

 

no. | word      | correct (Festival) | PDI-predicted error | most common attested error
1.  | almost    | 'oːlməʊst          | 'ælməʊst            | 'alməst
2.  | appear    | ə'pɪə              | ə'pijə              | a'pir
3.  | author    | 'θə                | 'aʊtə               | 'tor
4.  | awkward   | 'kwəd              | 'kwət               | 'afkwart
5.  | belief    | bɪ'lif             | ---                 | be'lif
6.  | carry     | 'kærɪ              | ---                 | 'keri
7.  | coloured  | 'kʌləd             | 'kaləʊrt            | 'koloret
8.  | debate    | dɪ'beɪt            | ---                 | de'beɪt
9.  | defect    | 'difekt            | ---                 | 'defekt
10. | dissolve  | dɪ'zɔlv            | dɪ'zolf             | dɪ'solf
11. | kingdom   | 'kɪŋdəm            | 'kɪŋgdom            | 'kiŋgdom
12. | mother    | 'mʌðə              | 'madə               | 'mader
13. | oblige    | ə'blaɪdʒ           | o'blaɪtʃ            | o'blik
14. | relax     | rɪ'læks            | ---                 | 'relaks
15. | server    | 'səːvə             | 'sevə               | 'səːrver
16. | southern  | 'sʌðən             | 'szən               | 'stern
17. | survive   | sə'vaɪv            | sə'vaɪf             | sur'vaɪf
18. | taxi      | 'tæksɪ             | ---                 | 'taksi
19. | tired     | 'taɪəd             | 'taɪət              | 'taɪret
20. | youngster | 'jʌŋstə            | 'jaŋkstə            | 'joŋkster
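
How a PDI-predicted error might be derived from the correct transcription and handed to a synthesizer can be sketched in Python as below.  This is a rough illustration under stated assumptions, not the procedure actually used with Festival: the four rewrite rules are my own simplifications of PDI codes E, L, N and V from Table 1 (with ð always replaced by d), and synthesize stands for any hypothetical transcription-to-audio backend supplied by the caller.

```python
from typing import Callable, Iterable, Tuple

# Word-final devoicing pairs (simplified; "dʒ" is handled separately).
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k", "v": "f",
                   "ð": "θ", "z": "s", "ʒ": "ʃ"}


def _strut_to_a(t: str) -> str:        # code E: ʌ -> Polish a
    return t.replace("ʌ", "a")


def _dental_to_d(t: str) -> str:       # code L: ð -> d (one of d, z, v)
    return t.replace("ð", "d")


def _devoice_final(t: str) -> str:     # code N: final obstruent devoicing
    if t.endswith("dʒ"):
        return t[:-2] + "tʃ"
    if t and t[-1] in FINAL_DEVOICING:
        return t[:-1] + FINAL_DEVOICING[t[-1]]
    return t


def _h_to_x(t: str) -> str:            # code V: h -> Polish velar x
    return t.replace("h", "x")


# Rules are applied in this fixed order (an arbitrary design choice).
RULES = (("E", _strut_to_a), ("L", _dental_to_d),
         ("N", _devoice_final), ("V", _h_to_x))


def polglish(transcription: str,
             active_codes: Iterable[str] = ("E", "L", "N", "V")) -> str:
    """Apply the selected Polglish rewrite rules to a transcription."""
    active = set(active_codes)
    for code, rule in RULES:
        if code in active:
            transcription = rule(transcription)
    return transcription


def speak_pair(transcription: str,
               synthesize: Callable[[str], bytes],
               active_codes: Iterable[str] = ("E", "L", "N", "V")
               ) -> Tuple[bytes, bytes]:
    """Return (model audio, accented audio) from the same synthetic voice."""
    accented = polglish(transcription, active_codes)
    return synthesize(transcription), synthesize(accented)
```

For the rows these four rules cover, the output agrees with the PDI-predicted column of Table 2: polglish("'mʌðə") gives 'madə, polglish("sə'vaɪv") gives sə'vaɪf, and polglish("'taɪəd") gives 'taɪət.  Restricting active_codes (e.g. to {"N"} only) is one simple way to model the learner- or teacher-controlled selection of errors mentioned in the abstract.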

 

 

Figure 1. LDOCE3 audio playback screen: "Play Polglish pronunciation"?

 

6. Synthesizing examples/definitions?

"The first ideas to audiolize EFL e‑dictionary examples (but not definitions!), for instance, appeared long ago.  In an overview of electronic learners' dictionaries, published in 1997, Perry dreamed: "Not only could the pronunciation of headwords and derivatives be given, but also the use of sound could be extended to cover some of the usage examples".  With the recent introduction of recorded audio example sentences in LDOCE4 (http://www.longman.com/ldoce/about.html) there may be a distant glimmer of hope" (Sobkowiak 2006:81).

Some ScanSoft examples of synthesized MEDAL1 definitions:

·     Youngster - a child or a young person.

·     Run-down - so tired that you do not feel well.

·     Oblige – to help someone by doing something that they have asked you to do.

·     Debate - if people debate a subject, they discuss it formally before making a decision, usually by voting.

·     Double cream - thick cream that becomes almost solid when you mix it quickly.

·     Hail - to signal a taxi or bus so that it stops for you.

·     Pallbearer - someone who helps to carry a coffin at a funeral.

 

Bibliography

González, D. 2007. "Text-to-speech applications used in EFL contexts to enhance pronunciation".  TESL-EJ 11.2.

Sobkowiak, W. 2000. "Rule-based and empirical rating of perceived phonetic difficulty of English words according to Polish learners: does frequency matter?" [published electronically: http://ifa.amu.edu.pl/~swlodek/diffind2.pdf].

Sobkowiak, W. 2003. "TTS in EFL CALL - some pedagogical considerations". Teaching English with Technology 3.4.

Sobkowiak, W. 2006. Phonetics of EFL dictionary definitions. Poznań: Wydawnictwo Poznańskie.

Sobkowiak, W. & W. Ferlacka. 2006. "Calibrating the Phonetic Difficulty Index". In W. Sobkowiak & E. Waniek-Klimczak (eds). 2006. Dydaktyka fonetyki języka obcego w Polsce. Konin: PWSZ w Koninie. 173-187. (Proceedings of the Phonetics in FLT 6 Conference in Mikorzyn, 8-10.5.2006)