Automatic phonetic annotation of corpora for EFL purposes

 

Włodzimierz Sobkowiak

Adam Mickiewicz University, Poznań

 

Corpora of English text (both native and non-native) are now taken for granted as a resource in teaching and learning English as a Foreign Language (EFL). So far, however, they have been exploited mostly on the lexical, morpho-syntactic and stylistic levels. The phonetic potential of raw-text corpora (as opposed to the few expensive acoustically treated and annotated ones) has yet to be explored.

 

This contribution presents a method of automatically annotating raw-text corpora with EFL phonetic tags drawn from a suitably treated electronic word-list. The focus is on phonetic lapsology, i.e. on annotating English text for probable Polglish (Polish-English interlanguage) pronunciation problems and errors, as well as for the overall level of pronunciation difficulty. Two examples from my research are presented and discussed: (1) a phono-lapsological analysis of definitions in the Macmillan English Dictionary for Advanced Learners on CD-ROM (MEDAL; Sobkowiak, forthcoming) and (2) my current work on TIMIT sentences in the context of the Boulder-Poznań CSLR Colorado Literacy Tutor project. It is demonstrated that, on top of automatic phonetic transcription of raw text, which is now conceptually and technologically rather trivial, a sophisticated L1-sensitive automatic phonetic annotation is feasible. Such annotation serves a variety of EFL-related functions, in particular text/sentence selection (e.g. TIMIT) and evaluation (e.g. MEDAL) for lexicographic, pedagogical and research (Sobkowiak, unpublished) purposes.
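The general idea of word-list-driven annotation can be illustrated with a minimal Python sketch. It is not the implementation used in the research reported here: the word-list entries, the tag names (TH, ASH, FINAL_VOICED, LONG_V), the numeric difficulty scores and the function names are all hypothetical, invented only to show how a tagged word-list might be looked up token by token and how a mean difficulty score might then be used to rank sentences.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical tagged word-list. Every transcription, tag name and
# difficulty score here is invented for illustration only.
WORD_LIST = {
    "the": ("ðə", {"TH"}, 2),
    "thought": ("θɔːt", {"TH", "LONG_V"}, 4),
    "bed": ("bed", {"FINAL_VOICED"}, 2),
    "cat": ("kæt", {"ASH"}, 1),
    "sat": ("sæt", {"ASH"}, 1),
    "on": ("ɒn", set(), 0),
}


@dataclass
class Token:
    word: str
    transcription: Optional[str]  # None when the word is not in the list
    tags: frozenset               # L1-specific problem tags (assumed labels)
    difficulty: int               # per-word difficulty (assumed scale)


def annotate(text: str) -> List[Token]:
    """Look up each word of raw text in the tagged word-list."""
    tokens = []
    for word in re.findall(r"[a-z']+", text.lower()):
        entry = WORD_LIST.get(word)
        if entry is None:
            # Out-of-list words are kept but left unannotated.
            tokens.append(Token(word, None, frozenset(), 0))
        else:
            transcription, tags, difficulty = entry
            tokens.append(Token(word, transcription, frozenset(tags), difficulty))
    return tokens


def mean_difficulty(tokens: List[Token]) -> float:
    """Average difficulty over the words found in the word-list."""
    known = [t for t in tokens if t.transcription is not None]
    if not known:
        return 0.0
    return sum(t.difficulty for t in known) / len(known)


def rank_sentences(sentences: List[str]) -> List[str]:
    """Order sentences from easiest to hardest by mean difficulty."""
    return sorted(sentences, key=lambda s: mean_difficulty(annotate(s)))
```

Under these assumptions, rank_sentences could support sentence selection of the kind mentioned for TIMIT, while mean_difficulty applied to a dictionary definition could support evaluation of the kind mentioned for MEDAL. A real system would need a full-coverage word-list and an empirically grounded difficulty measure.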

 

 

References:

 

Sobkowiak, W. (forthcoming). "Phonetically controlled definitions?". Poster to be presented at the 11th Euralex International Congress, Lorient, France, 6-10 July 2004 [abstract here: http://elex.amu.edu.pl/~sobkow/abstract.htm#ABS33].

 

Sobkowiak, W. (unpublished). "Rule-based and empirical rating of perceived phonetic difficulty of English words according to Polish learners: does frequency matter?" [full paper here: http://elex.amu.edu.pl/~sobkow/diffind2.doc].

Keywords: corpus, annotation, phonetic, EFL, lapsology

 
