His Majesty, Her Majesty, and Burkina Fasos: Strange multi-word lexical units from computational perspective

Magdalena Derwojedowa and Michał Rudolf (Warsaw)
Our aim is to tackle lexical units such as:
1. Zauważyli, że tekst „Cosi fan tutte” zapisano po włosku.
(Noticed_{3,pl,past} that libretto_{nom,sg} “Cosi fan tutte”_{gen,sg} write_{Impers} in Italian.)
2. Czekamy jeszcze na jego królewską mość.
(Wait_{1,pl,pres} still for his royal highness.)
i.e., a certain kind of multi-word unit with the following properties:
a. They are composed of any number of strings of letters separated by spaces, with fixed content and arrangement, e.g.:
3. Grać na cztery ręce z waszą hrabiowską mością to sama rozkosz.
(Play_{infinit} four hand_{acc,pl} with your comtal majesty_{instr,pl}, be_{3,sg,pl} real pleasure.)
4. Nakupili ciastek do diabła i trochę.
(Buy_{3,pl,mhum,past} a hell lot of and some cookie_{gen,pl}.)
b. They are “pre-syntactic”, i.e., “one-constituent” from the syntactic point of view; e.g.:
5. Po prostu//Zwyczajnie pianista fałszuje.
(Simply pianist_{nom,sg} play_{3,sg,pres} false.)
6. Orkiestra zgrała się do cna//doszczętnie w kasynie.
(Orchestra_{fem,nom,sg} lose_{3,sg,fem,past} all the money in casino_{loc,sg}.)
c. Consequently, they are, as a whole, unequivocally interpretable in morphological terms; e.g.:
PO PROSTU – particle
GÓRNA WOLTA, BURKINA FASO, JEJ KRÓLEWSKA MOŚĆ – nouns (feminine)
DO DIABŁA I TROCHĘ, STO DWADZIEŚCIA SIEDEM – numerals
MIMO ŻE – conjunction
d. They are continuous; no other constituent can be inserted in between.
We will call such structures BF-type lexical units, in contrast to multi-word constructions such as the one below:
7. Niech dyrygent się tak bardzo nie dziwi, że nikt nie przyszedł.
(The conductor shouldn’t be so surprised that nobody came.)
BF-type units are not syntactic constructions; rather, they belong to the lexicon. This means that each constituent of such a unit should be treated uniformly, its morphosyntactic status being completely unimportant or, in other words, undecidable. This way we can avoid the problem of the differing internal and external agreement patterns in which BF-type units are involved.
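The lexicon-based treatment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary `BF_LEXICON`, the function `merge_bf_units`, and the tag labels are hypothetical names introduced here. The sketch assumes a greedy longest-match lookup over adjacent tokens, so that a unit is recognised only when its constituents are contiguous and in the fixed order, and it receives a single tag as a whole while its constituents stay unanalysed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-lexicon: a fixed token sequence maps to ONE tag
# for the whole unit; the constituents themselves carry no analysis.
BF_LEXICON: Dict[Tuple[str, ...], str] = {
    ("po", "prostu"): "particle",
    ("mimo", "że"): "conjunction",
    ("do", "diabła", "i", "trochę"): "numeral",
    ("burkina", "faso"): "noun",
}
MAX_LEN = max(len(key) for key in BF_LEXICON)


def merge_bf_units(tokens: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Greedy longest-match merge of contiguous tokens into BF-type units.

    Tokens outside any unit are returned with the tag None, i.e. they are
    left for ordinary morphological analysis.
    """
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first; units have at least two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in BF_LEXICON:
                result.append((" ".join(tokens[i:i + n]), BF_LEXICON[key]))
                i += n
                break
        else:
            # No unit starts at position i.
            result.append((tokens[i], None))
            i += 1
    return result
```

For example, `merge_bf_units("Po prostu pianista fałszuje".split())` treats "Po prostu" as one particle, whereas a sequence with an intervening token, such as "po zwyczajnie prostu", is not merged, which mirrors the continuity property (d).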
The mechanism we use to generate BF-type lexical units can also help us account for some semi-morphological discontinuous structures, as well as for discontinuous syntactic constructions, e.g.:
8. Jaś pnie się w górę.
(Jaś climb_{3,sg,pres} up.)
9. Pierwsze skrzypce wciąż się kłócą z drugimi.
(First violin quarrel_{3,pl,pres} with the second.)
10. Może soprany wreszcie będą czytać nuty.
(Maybe soprano_{nom,sg} finally read_{3,pl,fut} score_{acc}.)
The repertory of classes of BF-type units covers almost all parts of speech. Some of those classes seem so highly regular that, for implementation reasons, we can keep the lists of constituent word-forms shorter by supplying rules that generate the whole “paradigm”.
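As an illustration of how such rules might shorten the stored lists, the following sketch generates the singular paradigm of honorific units of the JEJ KRÓLEWSKA MOŚĆ kind from a few short lists. The rule format is an assumption made here; all names (`POSSESSIVES`, `ADJECTIVE_STEMS`, `generate_paradigm`) are hypothetical, and only the indeclinable possessives "jego" and "jej" and six singular cases are covered.

```python
from itertools import product
from typing import Dict, Tuple

# Short constituent lists (assumed for illustration only).
POSSESSIVES = ("jego", "jej")             # indeclinable possessives
ADJECTIVE_STEMS = ("królew", "hrabiow")   # królewska, hrabiowska
ADJECTIVE_ENDINGS = {                     # feminine singular endings
    "nom": "ska", "gen": "skiej", "dat": "skiej",
    "acc": "ską", "instr": "ską", "loc": "skiej",
}
HEAD_FORMS = {                            # singular forms of "mość"
    "nom": "mość", "gen": "mości", "dat": "mości",
    "acc": "mość", "instr": "mością", "loc": "mości",
}


def generate_paradigm() -> Dict[Tuple[str, str], str]:
    """Map (lemma, case) to the full multi-word form of the unit."""
    paradigm: Dict[Tuple[str, str], str] = {}
    for possessive, stem in product(POSSESSIVES, ADJECTIVE_STEMS):
        # The nominative form serves as the lemma of the whole unit.
        lemma = f"{possessive} {stem}{ADJECTIVE_ENDINGS['nom']} {HEAD_FORMS['nom']}"
        for case, ending in ADJECTIVE_ENDINGS.items():
            paradigm[(lemma, case)] = f"{possessive} {stem}{ending} {HEAD_FORMS[case]}"
    return paradigm
```

Under these assumptions, four lists with sixteen entries in total yield 24 multi-word forms; for instance, the accusative of "jego królewska mość" comes out as "jego królewską mość", the form found in example 2.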
In our analysis, we also touch on the issue of the “natural” vs. “grammatical” interpretation of some features (i.e., sex and gender):
11. Kompozytor długo rozmawiał ze zmęczonym głową kościoła.
(Composer talk_{3,sg,past} with tired_{mhum,instr,sg} head_{(fem,instr,sg)} (of the) church_{mhum,instr,sg}.)
as well as some more sophisticated syntactic mechanisms, such as haplology, shared (common) constituents, and so forth:
12. ?Zwiedziłam Rio [Grande i de Janeiro] oraz Santa [Barbara i Fé].
(Visit_{1,sg,fem,past} Rio [Grande and de Janeiro] as well as Santa [Barbara and Fé].)
13. Ich [królewska i hrabiowska] mości grały na cztery ręce.
(Their [royal and comtal] majesty_{pl} play_{3,nonmaschum,pl,past} four hand_{acc,pl}.)
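The bracket notation in examples 12 and 13 marks coordinated material that shares a common constituent. A small sketch of how such an annotated phrase could be expanded into its full readings follows; it only illustrates the notation and is not the authors' analysis. The function name `expand_shared` is hypothetical, only the first bracketed group is expanded, and the conjunction is assumed to be Polish "i".

```python
import re
from typing import List

# One bracketed group without nested brackets, e.g. "[Grande i de Janeiro]".
BRACKET = re.compile(r"\[([^\[\]]+)\]")


def expand_shared(phrase: str, conjunction: str = "i") -> List[str]:
    """Expand the first bracketed coordination into separate full units."""
    match = BRACKET.search(phrase)
    if match is None:
        return [phrase]
    prefix, suffix = phrase[:match.start()], phrase[match.end():]
    # Split only on the conjunction as a separate word, so that an "i"
    # inside a word (as in "hrabiowska") is left untouched.
    alternatives = re.split(rf"\s+{re.escape(conjunction)}\s+", match.group(1))
    # Rejoin on single spaces to normalise whitespace.
    return [" ".join((prefix + alt + suffix).split()) for alt in alternatives]
```

With these assumptions, `expand_shared("Rio [Grande i de Janeiro]")` gives the two units "Rio Grande" and "Rio de Janeiro", each of which could then be looked up in the lexicon as a whole.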
We also provide some results of automatic syntactic analysis of BF-type units.