Descriptions of PK's comparable corpora

IFA-PICLE — an extract of the PICLE corpus, consisting almost entirely of writings contributed by Poznań School of English (IFA) students; contains argumentative/expository essays of a 500-1,000-word sample size; [Comparison table]
PLLC — a manually edited extract from the Polish part (in excess of 500,000 tokens in toto) of the 10-million-word Longman Learner Corpus (http://www.longman-elt.com/dictionaries/corpus/lclearn.html); PLLC includes short essays, some of which feature personal rather than argumentative discourse (hobbies, interests, plans for the future, etc.); the sample sizes vary, with short texts of a couple of hundred words prevailing; only non-beginner-level texts (assessed impressionistically) were selected, but the attested proficiency is not very homogeneous; [Comparison table]
SPAN — an extract from the Spanish sub-corpus of the International Corpus of Learner English: argumentative essays of 500-1,000-word sample size; as in the case of PLLC, SPAN includes texts of varied quality, a great many of which can be regarded as being of lower-than-advanced standard; [Comparison table]
FREN — an extract from the Belgian-French sub-corpus of the International Corpus of Learner English: argumentative essays of 500-1,000-word sample size; [Comparison table]
LOCNESS — the Louvain Corpus of Native English eSSays: a selection of argumentative essays written by English and American secondary school and college students; the primary native control corpus within the ICLE family, which is arguably more comparable with the non-native learner data than professionally written text samples; contains argumentative essays (including a few on literature topics) of 500-1,000-word sample size; [Comparison table]
MCONC — a collection of manually extracted UK academic texts (predominantly textbooks and introductory books) taken from samples included in the MicroConcord text collection B. Academic texts (1993); the samples are generally longer (up to 2,000 words), but care was taken to filter out technical discourse and concentrate on more accessible expository or argumentative texts; [Comparison table]
LOB&BROWN — a collection of UK and US quality-press editorials and some excerpts from popular science books, retrieved from the LOB and Brown corpora (ICAME Language Corpora 1991), exclusively from Category B (‘Press: Editorial’) and Category F (‘Popular Lore’) texts; the collection excludes short press reports, but includes analyses of political events, popular science articles, columns, etc.; the sample sizes vary, though in general they approximate those found in the learner data; [Comparison table]

All the English corpora were POS-tagged (part-of-speech tagged) and lemmatised, using a DOS tagger-lemmatiser called TOSCA-ICLE Tagging Unit 1.0, specially designed for the ICLE Project.

POL-STUD — a corpus collected by P. Kaszubski, consisting mainly of argumentative essays produced by senior-year secondary school pupils and first-year university students of English at various institutions in the Poznań area. The sample-size range is comparable to that in ICLE; [Comparison table]
POL-EXP — a collection of Polish quality-press articles and academic papers (humanities subjects, popular science), supplied by privately contacted contributors and by Prof. Ireneusz Bobrowski, Polish Academy of Sciences, Kraków; compiled by P. Kaszubski; [Comparison table]

No topic homogeneity could be enforced across the English and Polish corpora, but efforts were made to include, first and foremost, themes typically represented in PICLE and the other ICLE learner corpora (e.g. youth and social problems, such as violence, drugs, TV addiction, etc.).
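
Several of the descriptions above quote a 500-1,000-word sample size. As a rough illustration only, a check of that kind could be sketched as follows. The function names, the tokenisation rule and the default bounds are assumptions made for this example; they are not the procedure actually used to compile these corpora.

```python
import re

# Hypothetical tokenisation rule: a word is a run of letters/digits,
# optionally joined by an apostrophe or hyphen (e.g. "don't", "TV-addiction").
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


def word_count(text: str) -> int:
    """Count words in a text using the illustrative rule above."""
    return len(WORD_RE.findall(text))


def in_sample_range(text: str, low: int = 500, high: int = 1000) -> bool:
    """Return True if the text's word count lies within [low, high]."""
    return low <= word_count(text) <= high
```

Different tokenisation rules (for example, treating hyphenated compounds as two words) would give slightly different counts, so any such threshold should be read as approximate.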