All the English corpora were part-of-speech (POS) tagged and lemmatised using TOSCA-ICLE Tagging Unit 1.0, a DOS-based tagger-lemmatiser specially designed for the ICLE Project.

Topic homogeneity could not be enforced in the English and Polish corpora, but efforts were made to include primarily those themes typically represented in PICLE and the other ICLE learner corpora (e.g. youth and social problems such as violence, drugs and TV addiction).