Rough set approach in formal learning theory

Wojciech Buszkowski (Adam Mickiewicz University, Poznań)

Formal learning theory is understood as a branch of symbolic learning, following the Gold paradigm of learning from positive data and identification in the limit. We focus on learning algorithms based on unification, which have been elaborated for categorial grammars by Buszkowski & Penn (1990), Kanazawa (1994), Marciniec (1994), Foret (2001), and others.

Rough set theory is an approach to incomplete information, elaborated by Z. Pawlak (1991); also see Demri & Or³owska (2002), Polkowski (2002). A set of objects X is represented by a pair of two sets: the lower approximation and the upper approximation. The former consists of those objects which necessarily belong to X on the basis of our knowledge, and the latter consists of objects which possibly belong to X on the basis of our knowledge.

A standard model of rough sets is a finite information system (relational database) in which objects are characterized by attributes and their values. Two objects are indiscernible with respect to a set A of attributes if, for any attribute from A, the value of this attribute for x equals the value of this attribute for y. Now, for a set X, of objects, the lower approximation is the join of indiscernibility classes contained in X, and the upper approximation is the join of indiscernibility classes not disjoint with X.

A central idea of formal learning theory is to discover a grammar for a language L on the basis of a finite sample from L (a finite set D contained in L). If L is from a class K, then the lower approximation of L determined by D is defined here as  the meet of all languages from K containing D, and the upper approximation as the join of all languages from K containing D. We study several connections between these notions and basic notions of formal learning theory (learnability of a class of grammars, finite elasticity, outcomes of unification algorithms).