The English corpora

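| Corpus label(s) used | Variable 1: native/non-native | Variable 2: Learner/non-learner | Variable 3: (Presumed) Proficiency Level (label) | Brief description | Words (tokens) in corpus* |
| --- | --- | --- | --- | --- | --- |
| PLLC | Non-native English | ‘apprentice’ corpus | 1. Interm. | Polish intermediate EFL | 92,712 |
| SPAN | Non-native English | ‘apprentice’ corpus | 2. Upp-Int. | Spanish (upper-)intermediate EFL | 94,965 |
| FREN | Non-native English | ‘apprentice’ corpus | 3. Advanced | Belgian-French advanced EFL | 101,442 |
| IFA-PICLE | Non-native English | ‘apprentice’ corpus | 3. Advanced | Polish advanced EFL | 107,990 |
| LOCNESS | Native English | ‘apprentice’ corpus | 4. College | British and American college learner English | 106,255 |
| MCONC | Native English | ‘expert’ corpus | 5. Professional | British academic writing | 97,914 |
| LOB&BROWN | Native English | ‘expert’ corpus | 5. Professional | British and American quality press | 94,421 |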

 

The native Polish corpora

| Corpus label | Variable 2: Learner/non-learner | Variable 3: Proficiency | Brief description | Tokens |
| --- | --- | --- | --- | --- |
| POL-STUD | ‘apprentice’ corpus | college | college compositions | 103,382 |
| POL-EXP | ‘expert’ corpus | professional | academic papers + quality-press articles | 101,348 |

* Counts were taken with the WordList facility of the WordSmith Tools 3.0 package. Hyphenated words were set to count as one word.
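The counting convention in the footnote can be illustrated with a minimal sketch. It is not WordSmith's own algorithm: the pattern `TOKEN_RE` and the function `count_tokens` are hypothetical names, and the definition of a token (a run of letters or digits, optionally joined by internal hyphens) is an assumption made only to show how hyphenated words can be counted as one word. Apostrophes and other punctuation get no special treatment here.

```python
import re

# Assumed token definition (not taken from WordSmith): one or more
# letters/digits, optionally followed by hyphen-joined letter/digit runs.
# "[^\W_]" matches any Unicode letter or digit, so Polish diacritics work.
TOKEN_RE = re.compile(r"[^\W_]+(?:-[^\W_]+)*")


def count_tokens(text: str) -> int:
    """Count tokens, treating a hyphenated word as a single token."""
    return len(TOKEN_RE.findall(text))


sample = "Quality-press articles and well-known academic papers."
print(count_tokens(sample))
```

Under this definition "Quality-press" and "well-known" each contribute one token, whereas a tokenizer that splits on hyphens would count them as two, so totals obtained with different settings are not directly comparable.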