What are dictionary definitions good for?

Włodzimierz Sobkowiak

 

 

0. Assumptions

1. Phonetic Difficulty Index (PDI)

2. PDI annotation of:

(a) lexica

(b) text corpora

3. PDI annotation of MEDAL definitions

4. Applications:

(a) phonolexicographic analysis

(b) didactic use

 

0. Assumptions

(a) Dictionary definitions are actually read in (monolingual) dictionary lookup

(b) "Inner [...] pronunciation [...] is a constituent part of reading by far the most of people" (Gibson and Levin 1975:342)

(c) Even more so in FL learners

(d) Phonetically difficult definitions hinder subvocal reading

(e) EFL dictionary definitions should be 'user-friendly'

(f) It is possible to measure phonetic difficulty

 

Gibson, E.J. & H. Levin. 1975. The psychology of reading. Cambridge, Mass.: The MIT Press.

 

 

1. Phonetic Difficulty Index (PDI)

The PDI algorithm was run over the Oxford Advanced Learner's Dictionary of Current English (OALDCE) word-list, which currently counts 85430 wordforms and 25264 lemmas. Each phonetic difficulty encountered in a word was counted as one point. The resulting PDI values range from 0 (easy) to 10 (hard), with a mean of 2.45 and a standard deviation of 1.56. Apart from measuring the overall phonetic difficulty of a lexical item, the algorithm also tags it with codes drawn from a set of 57 Polglish difficulty codes.

 

Table 1. Some examples of PDI codes with their lexical frequency and sources of likely errors

PDI code | frequency | source of likely Polglish error
a – compound | 11148 | stress, geminates
g – <ou> in word | 3992 | many phonetic realizations
r – <gh_> or <ght_> in stem | 534 | many phonetic realizations
A – linking /r/ | 4787 | /r/ or not? (BrE), trilled?
B – /eə/ | 1129 | /j/ breaking, smoothing, schwa
H – velar nasal | 10044 | /Ng/, /Nk/, /n/
J – short schwa | 32192 | schwa quality
N – final voiced obstruent | 31427 | devoicing
U – post-alveolar affricates | 7631 | Polish apical substitutes
1 – British≠American | 31710 | accent confusion
2 – more than 5 syllables | 750 | stress and articulation problems
3 – secondary stress | 10351 | reduced to unstressed
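A minimal sketch of how a few of the codes in Table 1 might be assigned, one point per difficulty. The rules below are illustrative assumptions only, not the actual PDI algorithm: the function name `tag_word` is hypothetical, `N` and `@` are taken to mark the velar nasal and schwa in the ASCII transcription, and syllables are counted as the number of `V` symbols in the syllable structure.

```python
def tag_word(spelling: str, transcription: str, syllable_structure: str) -> str:
    """Return a string of PDI-style codes for one word (illustrative subset only)."""
    codes = []
    if "ou" in spelling.lower():            # g: <ou> in word
        codes.append("g")
    if "N" in transcription:                # H: velar nasal (assumed symbol N)
        codes.append("H")
    if "@" in transcription:                # J: short schwa (assumed symbol @)
        codes.append("J")
    if syllable_structure.count("V") > 5:   # 2: more than 5 syllables
        codes.append("2")
    return "".join(codes)


def pdi_value(codes: str) -> int:
    """One point per difficulty code."""
    return len(codes)


for word, trans, struct in [
    ("boggling", "'b0glIN", "'CVCCVC"),
    ("bogy", "'b5gI", "'CVCV"),
    ("bohemians", "b5'himI@nz", "CV'CVCVVCC"),
]:
    codes = tag_word(word, trans, struct)
    print(word, codes or "-", pdi_value(codes))
```

Because only four of the 57 codes are sketched, the output covers only part of what the full algorithm would assign to the same words.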

 

 

2a. PDI annotation of lexica

 

Table 2. A sample of a PDI-annotated lexicon (OALDCE word-list)

word | stem | British transcription | syllable structure | POS | syllable number | PDI value | PDI code
boggling | boggle | 'b0glIN | 'CVCCVC | Ib% | 2 | 2 | H1
bogy | bogy | 'b5gI | 'CVCV | K8$ | 2 | 0 |
bohemians | bohemian | b5'himI@nz | CV'CVCVVCC | Kj% | 4 | 5 | CJNQV

 

 

PDI code | Polglish difficulty | incidence in the OALDCE word-list | incidence in MEDAL definitions (records)
H | velar nasal | 10044 (11.8%) | 49759 (56.2%)
1 | British≠American | 31710 (37.1%) | 81387 (92.0%)
C | /ɪə/ | 3337 (3.9%) | 10205 (11.5%)
J | short schwa | 32192 (37.7%) | 83506 (94.4%)
N | final voiced obstruent | 31427 (36.8%) | 75014 (84.8%)
Q | vowel over-nasalization | 7612 (8.9%) | 24477 (27.7%)
V | glottal fricative /h/ | 4267 (5.0%) | 26507 (30.0%)

 

 

2b. PDI annotation of text corpora

 

Table 3. Some example PDI-tagged sentences from TIMIT

TIMIT sentence | phonetic transcription | PDI coding | mean PDI | word # | global PDI
Theocracy reconsidered | /TI'0kr@sI ,rik@n'sId@d/ | dJM1 JNQ13 | 4.5 | 2 | 9
There were other farmhouses nearby | D7 w9R 'VD@ 'fAmh2zIz 'n6b1 | ABL1 AK1 AEJL1 agNV1 C1 | 3.8 | 5 | 19
We can die, too, we can die like real people | wi k&n d1 tu wi k&n d1 l1k r6l 'pipl | * * * * * * * * C dX | 0.3 | 10 | 3

 

 

3. PDI annotation of MEDAL definitions

 

Table 4. An example of PDI-tagged MEDAL definition (taster)

Definition | a small amount of something that is offered so that you can experience it and decide whether you like it or not
Transcription | @ smOl @'m2nt 0v 'sVmTIN D&t Iz '0f@d s5 D&t ju k&n Ik'sp6r6ns It &nd dI's1d 'weD@ ju l1k It O n0t
PDI codes | J 1 gJ N1 EHM L N JN1 * L g * C * N NO AJL1 g * * A1 1
Mean PDI | 1.3
Number of words | 22
Global PDI | 28

 

The mean word-weighted PDI counted over the 88495 MEDAL definitions equals 1.52 (s.d. = 0.42). As a tentative comparison, a randomly selected short text (1698 words) downloaded from the internet has a mean PDI of 1.92.
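The figures in Table 4 can be reproduced from the PDI code string alone. The sketch below assumes that each code character counts one point and that `*` marks a word with no tagged difficulty, which is what the arithmetic in Tables 3 and 4 suggests; the function name `pdi_summary` is hypothetical.

```python
def pdi_summary(code_string: str) -> tuple[int, int, float]:
    """Return (global PDI, number of words, mean PDI) for a tagged text.

    code_string holds one whitespace-separated token per word;
    "*" is assumed to mean "no difficulty" (0 points).
    """
    tokens = code_string.split()
    points = [0 if tok == "*" else len(tok) for tok in tokens]
    global_pdi = sum(points)
    n_words = len(tokens)
    mean_pdi = global_pdi / n_words if n_words else 0.0
    return global_pdi, n_words, mean_pdi


# PDI codes of the MEDAL definition of "taster" (Table 4)
TASTER_CODES = "J 1 gJ N1 EHM L N JN1 * L g * C * N NO AJL1 g * * A1 1"

global_pdi, n_words, mean_pdi = pdi_summary(TASTER_CODES)
print(global_pdi, n_words, round(mean_pdi, 1))  # 28 22 1.3
```

The same function applied to the TIMIT codings in Table 3 gives their global and mean PDI values.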

 

 

4a. Applications: phonolexicographic analysis

 

Table 5. Cross-dictionary comparisons

 

dictionary | COBUILD3 | LDOCE4 | OALD7 | CALD=CIDE2 | MEDAL
definition sample (N=433) | 85 | 88 | 90 | 83 | 87
mean PDI (per word) | 1.4 | 1.5 | 1.5 | 1.5 | 1.5
mean PDI (per definition) | 28 | 21 | 22 | 22 | 21
mean # words | 19 | 14 | 15 | 14 | 14

 

Definition phonetic difficulty

A word should be defined using words simpler than itself (Ayto 1984). Twelve of the 87 headwords in the MEDAL sample have a definition PDI at least 1 point higher than the PDI of the headword: candy, foot, grease, intensity, keel, mail, oozy, recess, requisite, snip, tramp, vaccinate. For one definition the excess over the headword PDI is more than 2 points: "necessary for a particular purpose".
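The comparison can be sketched as follows. This is an assumption-laden illustration: "definition PDI" is taken here to be the mean per-word PDI of the definition, the function names are hypothetical, and the code strings in the usage lines are made up for the example, not taken from MEDAL.

```python
def word_points(token: str) -> int:
    """One point per code character; "*" or an empty string means 0 points."""
    return 0 if token in ("", "*") else len(token)


def definition_excess(headword_codes: str, definition_codes: str) -> float:
    """Mean per-word PDI of the definition minus the PDI of the headword."""
    tokens = definition_codes.split()
    definition_pdi = sum(word_points(t) for t in tokens) / len(tokens)
    return definition_pdi - word_points(headword_codes)


def is_harder_than_headword(headword_codes: str, definition_codes: str,
                            margin: float = 1.0) -> bool:
    """True if the definition exceeds the headword by at least `margin` points."""
    return definition_excess(headword_codes, definition_codes) >= margin


# Made-up code strings, for illustration only
print(is_harder_than_headword("*", "J N1 * C"))   # True: 1.0 - 0 >= 1
print(is_harder_than_headword("H1", "* * J"))     # False: 1/3 - 2 < 1
```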

 

Ayto, J.R. 1984. "The vocabulary of definition". In D. Goetz & T. Herbst (eds). 1984. Theoretische und praktische Probleme der Lexikographie. München: Max Hueber Verlag. 50-60.

 

 

4b. Applications: didactic

·       dynamically adjusting definitions to the learner's needs and requirements, also in terms of pronunciation

·       (semi)automatic creation of language tasks and exercises in an electronic dictionary

·       offering the user a corpus-like resource within the dictionary

 

MEDAL phonolapsological query examples:

1. /t+j/ coalescence; PDI<0.6:

bedroom: a room that you sleep in

cone: a cone shape that you put ice cream in and eat

green: not yet ready to be eaten

payphone: a telephone in a public place that you pay to use

 

2. Linking /r/; PDI<0.6:

chasm/crevasse: a very deep crack in rock or ice

exactly: in every way or every detail

intense: very great or extreme

severe/ly: very strict or extreme

to have one foot in the grave: to be very old or ill and likely to die soon

 

3. Schwa-less definitions; PDI<0.6:

creep by: if time creeps by it passes very slowly

lean (adj): lean meat has very little fat in it

lean (n): meat that has very little fat in it

not a moment too soon: so late that it is almost too late

tied up: if traffic is tied up it is not moving very quickly

 

4. Schwa-heavy definitions

client-server: used for referring to a network (=group of computers) in which each computer is either a client or a server. Clients are the individual computers that run programs or the equipment connected to them such as printers, and servers are the powerful computers that supply the information that makes them work (30 schwas in 51 words, PDI=2.0)
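Queries of this kind amount to filtering PDI-tagged records by the presence or absence of a code and by a ceiling on mean PDI. A minimal sketch follows, using the tagged texts from Tables 3 and 4 as data. The class and function names are hypothetical, `*` is assumed to mean "no difficulty", and the actual MEDAL query mechanism may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class TaggedText:
    text: str
    codes: str  # one whitespace-separated PDI token per word; "*" = none

    @property
    def mean_pdi(self) -> float:
        tokens = self.codes.split()
        total = sum(0 if t == "*" else len(t) for t in tokens)
        return total / len(tokens) if tokens else 0.0

    def has_code(self, code: str) -> bool:
        return any(code in t for t in self.codes.split() if t != "*")


def query(records, require=None, exclude=None, max_mean=None):
    """Keep records that contain `require`, lack `exclude`, and have mean PDI below `max_mean`."""
    hits = []
    for rec in records:
        if require is not None and not rec.has_code(require):
            continue
        if exclude is not None and rec.has_code(exclude):
            continue
        if max_mean is not None and not rec.mean_pdi < max_mean:
            continue
        hits.append(rec)
    return hits


RECORDS = [
    TaggedText("a small amount of something that is offered ...",
               "J 1 gJ N1 EHM L N JN1 * L g * C * N NO AJL1 g * * A1 1"),
    TaggedText("We can die, too, we can die like real people",
               "* * * * * * * * C dX"),
    TaggedText("There were other farmhouses nearby",
               "ABL1 AK1 AEJL1 agNV1 C1"),
]

# Schwa-less (no code J) and phonetically easy (mean PDI < 0.6)
for rec in query(RECORDS, exclude="J", max_mean=0.6):
    print(rec.text)
```

Swapping `exclude="J"` for `require="A"` selects texts tagged for linking /r/ in the same way.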