Automatic phonetic annotation of corpora for EFL purposes


Włodzimierz Sobkowiak

Adam Mickiewicz University, Poznań


Corpora of English text (both native and non-native) are now taken for granted as a resource in teaching and learning English as a Foreign Language (EFL). So far, however, they have been exploited mostly at the lexical, morphosyntactic and stylistic levels. The phonetic potential of raw-text corpora (as opposed to the few expensive, acoustically processed and annotated ones) has yet to be explored.


This contribution presents a method for automatically annotating raw-text corpora with EFL phonetic tags drawn from a suitably prepared electronic word-list. The presentation focuses on phonetic lapsology, i.e. on annotating English text for probable Polglish (Polish-English interlanguage) pronunciation problems and errors, as well as for its overall level of pronunciation difficulty. Two examples from my research are presented and discussed: (1) a phono-lapsological analysis of the definitions in the Macmillan English Dictionary for Advanced Learners on CD-ROM (MEDAL; Sobkowiak, forthcoming) and (2) my current work on TIMIT sentences in the context of the Boulder-Poznań CSLR Colorado Literacy Tutor project. Automatic phonetic transcription of raw text is by now conceptually and technologically rather trivial. It is demonstrated that, on top of such transcription, a sophisticated L1-sensitive automatic phonetic annotation is feasible and serves a variety of EFL-related functions, in particular text/sentence selection (e.g. TIMIT) and evaluation (e.g. MEDAL) for lexicographic, pedagogical and research (Sobkowiak, unpublished) purposes.
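The general idea of lexicon-driven annotation can be illustrated with a minimal sketch. It assumes a hypothetical electronic word-list in which each word form carries a transcription and a set of Polglish problem tags; the tag names (TH, FINAL_VOICED, etc.), their weights, and the scoring formula (mean tag weight per word) are illustrative assumptions, not the actual tagset or rating method used in the MEDAL and TIMIT work. Sentences are annotated by simple word-list lookup, scored, and ranked, mirroring the selection and evaluation functions mentioned above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical EFL word-list: word form -> (transcription, likely Polglish problem tags).
# Entries, tag names and weights are illustrative only.
LEXICON = {
    "the": ("ðə", ["TH"]),
    "think": ("θɪŋk", ["TH", "NG"]),
    "this": ("ðɪs", ["TH"]),
    "is": ("ɪz", ["FINAL_VOICED"]),
    "a": ("ə", ["SCHWA"]),
    "good": ("ɡʊd", ["FINAL_VOICED"]),
    "idea": ("aɪˈdɪə", []),
    "we": ("wiː", []),
    "can": ("kæn", ["AE"]),
    "go": ("ɡəʊ", ["DIPHTHONG"]),
}

# Hypothetical difficulty weight for each problem tag.
TAG_WEIGHTS = {"TH": 2.0, "NG": 1.0, "FINAL_VOICED": 1.5,
               "SCHWA": 0.5, "AE": 1.5, "DIPHTHONG": 1.0}


@dataclass
class Token:
    word: str
    transcription: Optional[str]  # None when the word is not in the word-list
    tags: List[str] = field(default_factory=list)


def annotate(sentence: str) -> List[Token]:
    """Tokenize raw text and attach transcription and tags from the word-list."""
    tokens = []
    for word in re.findall(r"[a-z']+", sentence.lower()):
        entry = LEXICON.get(word)
        if entry is None:
            tokens.append(Token(word, None, ["OOV"]))  # out-of-vocabulary marker
        else:
            transcription, tags = entry
            tokens.append(Token(word, transcription, list(tags)))
    return tokens


def difficulty(tokens: List[Token]) -> float:
    """Mean tag weight per word; tags without a weight (e.g. OOV) count as 0."""
    if not tokens:
        return 0.0
    total = sum(TAG_WEIGHTS.get(tag, 0.0) for tok in tokens for tag in tok.tags)
    return total / len(tokens)


def select(sentences: List[str], k: int) -> List[str]:
    """Return the k sentences with the highest difficulty score."""
    return sorted(sentences, key=lambda s: difficulty(annotate(s)), reverse=True)[:k]
```

For example, `select(["This is a good idea.", "We can go.", "I think the idea is good"], 2)` ranks the third sentence first, since it packs more weighted problem tags per word. A real system would replace the toy dictionary with a full pronouncing word-list and derive tags from its transcriptions by L1-specific rules.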





Sobkowiak, W. (forthcoming). "Phonetically controlled definitions?" Poster to be presented at the 11th Euralex International Congress, Lorient, France, 6-10 July 2004.


Sobkowiak, W. (unpublished). "Rule-based and empirical rating of perceived phonetic difficulty of English words according to Polish learners: does frequency matter?"

Keywords: corpus, annotation, phonetic, EFL, lapsology


