His Majesty, Her Majesty, and Burkina Fasos:

Strange multi-word lexical units from a computational perspective


Magdalena Derwojedowa and Michał Rudolf (Warsaw)


Our aim is to tackle lexical units such as:


1. Zauważyli, że tekst ,,Cosi fan tutte'' zapisano po włosku.

(Notice_3,pl,past that libretto_nom,sg “Cosi fan tutte”_gen,sg write_impers in Italian.)

2. Czekamy jeszcze na jego królewską mość.

(Wait_1,pl,pres still for his royal majesty.)


i.e., a certain kind of multi-word unit with the following properties:


a. They are composed of any number of strings of letters separated by spaces, whose content and order are fixed, e.g.:


3. Grać na cztery ręce z waszą hrabiowską mością to sama rozkosz.

(Play_infinit four hand_acc,pl with your comtal majesty_instr,pl, be_3,sg,pl real pleasure.)

4. Nakupili ciastek do diabła i trochę.

(Buy_3,pl,mhum,past a hell lot of and some cookie_gen,pl.)


b. They are “pre-syntactic”, i.e., “one-constituent” from the syntactic point of view; e.g.:


5. Po prostu//Zwyczajnie pianista fałszuje.

(Simply pianist_nom,sg play_3,sg,pres false.)

6. Orkiestra zgrała się do cna//doszczętnie w kasynie.

(Orchestra_fem,nom,sg lose_3,sg,fem,past all the money in casino_loc,sg.)


c. Consequently, as a whole they are unequivocally interpretable in morphological terms; e.g.:


PO PROSTU (‘simply’) – particle,

MIMO ŻE (‘although’) – conjunction.


d. They are continuous; no other constituent can be inserted between their elements.


We will call such structures BF-type lexical units, in contrast to multi-word constructions such as the following:


7. Niech dyrygent się tak bardzo nie dziwi, że nikt nie przyszedł.

(The conductor shouldn’t be so surprised that nobody came.)


BF-type units are not syntactic constructions; rather, they belong to the lexicon. This means that each constituent of such a unit should be treated uniformly, its morphosyntactic status being irrelevant or, in other words, undecidable. In this way we avoid the problem of the differing internal and external agreement patterns in which BF-type units are involved.
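The lexical treatment can be illustrated with a minimal sketch. It assumes a hypothetical mini-lexicon (`BF_LEXICON`) and a hypothetical helper (`merge_bf_units`), neither of which comes from the implementation discussed here. Each unit is stored as a fixed sequence of opaque strings with a single tag for the whole, and it is recognised only when its constituents are adjacent and in the listed order.

```python
# Hypothetical mini-lexicon: every BF-type unit is a fixed sequence of
# strings carrying ONE tag for the whole unit (tags taken from property c).
BF_LEXICON = {
    ("po", "prostu"): "particle",
    ("mimo", "że"): "conjunction",
}

# Length of the longest unit, used to bound the look-ahead window.
MAX_LEN = max(len(key) for key in BF_LEXICON)


def merge_bf_units(tokens):
    """Greedy longest-match merge of continuous BF-type units.

    Returns (text, tag) pairs; tag is None for tokens outside the lexicon.
    The constituents themselves are never analysed morphologically.
    """
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, down to two-token units.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in BF_LEXICON:
                result.append((" ".join(tokens[i:i + n]), BF_LEXICON[window]))
                i += n
                break
        else:
            # No unit starts here: keep the token as it is.
            result.append((tokens[i], None))
            i += 1
    return result


print(merge_bf_units("Po prostu pianista fałszuje".split()))
# [('Po prostu', 'particle'), ('pianista', None), ('fałszuje', None)]
```

Because the match is made on adjacent tokens only, inserting any other word between the constituents prevents recognition, which mirrors property d.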

The mechanism we use to generate BF-type lexical units can also help us account for some semi-morphological discontinuous structures, as well as for discontinuous syntactic constructions, e.g.:


8. Jaś pnie się w górę.

(Jaś climb_3,sg,pres up.)

9. Pierwsze skrzypce wciąż się kłócą z drugimi.

(First violin quarrel_3,pl,pres with the second.)

10. Może soprany wreszcie będą czytać nuty.

(Maybe soprano_nom,sg finally read_3,pl,fut score_acc.)


The repertory of classes of BF-type units covers almost all parts of speech. Some of these classes seem so regular that, for implementation reasons, we can keep the lists of constituent word-forms short by supplying rules that generate the whole “paradigm”.
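As an illustration of such a rule, the sketch below generates the singular forms of titles of the type "jego królewska mość" from a pronoun and an adjective stem instead of listing every form. The function name `title_paradigm`, the case labels, and the ending tables are our assumptions, and the adjective endings cover only stems in -sk (królewsk-, hrabiowsk-); this is not the rule set of the implementation itself.

```python
# Assumed case labels and feminine singular endings (valid for stems in -sk).
CASES = ["nom", "gen", "dat", "acc", "instr", "loc", "voc"]
ADJ_ENDINGS = dict(zip(CASES, ["a", "iej", "iej", "ą", "ą", "iej", "a"]))
# Endings for the declinable pronoun stem "wasz-" (wasza, waszej, waszą ...).
PRON_ENDINGS = dict(zip(CASES, ["a", "ej", "ej", "ą", "ą", "ej", "a"]))
# The noun "mość" is listed form by form.
NOUN_FORMS = dict(zip(
    CASES, ["mość", "mości", "mości", "mość", "mością", "mości", "mości"]
))
# Pronouns that stay the same in every case.
INDECLINABLE = {"jego", "jej"}


def title_paradigm(pronoun, adj_stem):
    """Return a dict mapping each case to the full three-word title form."""
    forms = {}
    for case in CASES:
        if pronoun in INDECLINABLE:
            pron = pronoun
        else:
            pron = pronoun + PRON_ENDINGS[case]
        forms[case] = f"{pron} {adj_stem}{ADJ_ENDINGS[case]} {NOUN_FORMS[case]}"
    return forms


print(title_paradigm("jego", "królewsk")["acc"])     # jego królewską mość
print(title_paradigm("wasz", "hrabiowsk")["instr"])  # waszą hrabiowską mością
```

The two printed forms are the ones that occur in examples 2 and 3; two short tables and one noun list replace the explicit enumeration of every pronoun-adjective-noun combination.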


In our analysis, we also touch on the issue of the “natural” vs. “grammatical” interpretation of some features (i.e., sex and gender):


11. Kompozytor długo rozmawiał ze zmęczonym głową kościoła.

(Composer talk_3,sg,past with tired_mhum,instr,sg head_(fem,instr,sg) (of the) church_mhum,instr,sg.)


as well as some more sophisticated syntactic mechanisms, such as haplology, shared (common) constituents, and so forth:


12. ?Zwiedziłam Rio [Grande i de Janeiro] oraz Santa [Barbara i Fé].

(Visit_1,sg,fem,past Rio [Grande and de Janeiro] as well as Santa [Barbara and Fé].)

13. Ich [królewska i hrabiowska] mości grały na cztery ręce.

(Their [royal and comtal] majesty_pl play_3,nonmaschum,pl,past four hand_acc,pl.)
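The shared-constituent reading of example 12 can be made explicit with a small sketch. It assumes that the coordinated span has already been marked with square brackets, as in the notation above, and that the conjunction is "i"; the helper `expand_shared` is hypothetical. It simply distributes the shared first element over both conjuncts.

```python
import re

# A shared head followed by a bracketed two-member coordination with "i".
PATTERN = re.compile(r"(\w+) \[(.+?) i (.+?)\]")


def expand_shared(text):
    """Distribute the shared head over both bracketed conjuncts."""
    return [
        f"{head} {part}"
        for head, first, second in PATTERN.findall(text)
        for part in (first, second)
    ]


print(expand_shared("Rio [Grande i de Janeiro] oraz Santa [Barbara i Fé]"))
# ['Rio Grande', 'Rio de Janeiro', 'Santa Barbara', 'Santa Fé']
```

Plain distribution is not sufficient for example 13, where the shared noun appears in the plural ("mości") and each reconstructed unit would need its singular form.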


We also provide some results of automatic syntactic analysis of BF-type units.