- IFA-PICLE —
an extract of the PICLE
corpus, consisting almost entirely of writings
contributed by Poznań School of English (IFA) students;
contains argumentative/expository essays with a sample size
of 500-1,000 words; [Comparison
table]
- PLLC — a manually
edited extract from the Polish part (in excess of 500,000
tokens in toto) of the 10-million-word Longman Learner
Corpus (http://www.longman-elt.com/dictionaries/corpus/lclearn.html);
PLLC includes short essay writings, some of which
feature personal rather than argumentative discourse (hobbies,
interests, plans for the future, etc.); the sample sizes
are varied, with short texts of a couple of hundred words
prevailing; only non-beginner level texts (assessed
impressionistically) were selected, but the attested
proficiency is not very homogeneous; [Comparison
table]
- SPAN — an extract from the Spanish sub-corpus of
the International Corpus of Learner English:
argumentative essays of 500-1,000-word sample size; as in
the case of PLLC, SPAN includes texts of
varied quality, a great many of which can be regarded as being of
lower-than-advanced standard; [Comparison
table]
- FREN — an extract from the Belgian-French sub-corpus
of the International Corpus of Learner English:
argumentative essays of 500-1,000-word sample size; [Comparison
table]
- LOCNESS — a selection of argumentative essays written
by English and American secondary school and college
students (the name stands for Louvain Corpus of Native English eSSays);
LOCNESS is the primary native control corpus within the ICLE
family and is arguably more comparable with the non-native
learner data than professionally written text samples;
it contains argumentative essays (including a few on
literature topics) of 500-1,000-word sample size; [Comparison
table]
- MCONC — a collection of manually extracted UK
academic texts (predominantly textbooks and introductory books)
taken from samples included in the MicroConcord
text collection B. Academic texts (1993); the samples
are generally longer (up to 2,000 words), but
care was taken to filter out technical discourse and
concentrate on more accessible expository or
argumentative texts; [Comparison
table]
- LOB&BROWN — a collection of UK and US quality
press editorials & some excerpts from popular science
books, retrieved from the LOB and Brown
corpora (ICAME Language Corpora 1991), exclusively
from Category B (‘Press: Editorial’) and Category F (‘Popular
Lore’) texts; the collection excludes short press
reports, but includes analyses of political events,
popular science articles, columns, etc.; the sample sizes
vary, though in general they approximate those found in the
learner data; [Comparison
table]
All the English corpora were POS-tagged (part-of-speech
tagged) and lemmatised, using a DOS tagger-lemmatiser called TOSCA-ICLE
Tagging Unit 1.0, specially designed for the ICLE Project.
- POL-STUD — a corpus collected by P.
Kaszubski, consisting mainly of argumentative essays
produced by senior-year secondary school pupils and first-year
university students of English at various institutions in
the Poznań area. The sample-size range is comparable to
that in ICLE; [Comparison
table]
- POL-EXP — a collection of Polish quality-press
articles and academic papers (humanities subjects,
popular science), supplied by contributors approached privately
and by Prof. Ireneusz Bobrowski, Polish
Academy of Sciences, Kraków; compiled by P. Kaszubski; [Comparison
table]
No topic homogeneity could be enforced across the
English and Polish corpora, but efforts were made to give priority
to themes typically represented in PICLE and the
other ICLE learner corpora (e.g. youth and social problems such
as violence, drugs and TV addiction).