Recognising senses of way for the purpose of machine translation

Tomasz Stępień (Wrocław University)

One of the main tasks in machine translation (MT) is to correctly recognise the senses of words in a given context. In this paper we present the results of an analysis of the noun way, based on the Penn Treebank corpus. The senses of way are identified, checked against the data presented in dictionaries, and then associated with specific syntactic patterns and semantic features of the contexts in which the word occurs. This is followed by the generalisations needed to arrive at formal criteria of sense distinction that can be used in MT systems.

Initial research reveals that four main nominal senses of way can be distinguished: (i) 'manner of doing something', (ii) 'road', (iii) 'direction' and (iv) 'custom, manner'. Additionally, way occurs frequently in idioms. The analysis shows that some syntactic patterns are associated exclusively with certain senses; e.g. the pattern way + of + gerund always carries sense (i):

1. This is a way of getting to school ...

2. Housewives are finding literally hundreds of ways of getting the maximum use out of traditional designs ... .
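The pattern way + of + gerund can be stated as a purely syntactic test. The following is a minimal sketch, assuming each sentence is available as Penn Treebank-style (word, tag) pairs, where VBG marks the gerund/present participle; the function names and the sense label "i" are hypothetical and do not come from any existing MT system.

```python
from typing import List, Optional, Tuple

# A token is (word, Penn Treebank POS tag), e.g. ("getting", "VBG").
Token = Tuple[str, str]

SENSE_MANNER = "i"  # 'manner of doing something'


def is_way(token: Token) -> bool:
    """True if the token is the noun 'way' or 'ways' (tagged NN/NNS)."""
    word, tag = token
    return word.lower() in ("way", "ways") and tag in ("NN", "NNS")


def of_gerund_rule(tokens: List[Token], i: int) -> Optional[str]:
    """Return sense (i) if tokens[i] is 'way' followed by 'of' + a gerund (VBG).

    Returns None when the pattern does not apply, so other rules can be tried.
    """
    if i + 2 >= len(tokens) or not is_way(tokens[i]):
        return None
    next_word, _ = tokens[i + 1]
    _, gerund_tag = tokens[i + 2]
    if next_word.lower() == "of" and gerund_tag == "VBG":
        return SENSE_MANNER
    return None


# Example (1): "This is a way of getting to school"
ex1 = [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("way", "NN"),
       ("of", "IN"), ("getting", "VBG"), ("to", "TO"), ("school", "NN")]
print(of_gerund_rule(ex1, 3))  # i
```

Returning None rather than a default sense keeps the rule composable: when the pattern is absent, the decision is left to the semantic criteria discussed below.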

In some other cases, semantics plays the distinguishing role. When way is an argument or a modifier of a verb, the meaning of the verb is decisive; cf. open and come (sense (ii)):

3. ... a way has been opened for strengthening budgeting procedures ...

4. ... it prevents late-comers from missing some of the people they have come a long way to hear ...

and head and shift (sense (iii)):

5. Buster would solve that quarterback problem just as we head that way.

6. Mr. Khrushchev is convinced that the balance of world power is shifting his way ... .
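The verb-governed criterion amounts to a lexical lookup keyed on the verb that takes way as an argument or modifier. A minimal sketch follows, assuming the governing verb has already been identified and lemmatised upstream (e.g. from a Treebank parse, so that "shifting" arrives as "shift"); the table is hypothetical and holds only the four verbs cited in examples (3) to (6).

```python
from typing import Dict, Optional

SENSE_ROAD = "ii"        # 'road'
SENSE_DIRECTION = "iii"  # 'direction'

# Hypothetical lexicon: only the verbs cited in examples (3)-(6).
# A usable system would need a much larger, corpus-derived list.
VERB_SENSE: Dict[str, str] = {
    "open": SENSE_ROAD,
    "come": SENSE_ROAD,
    "head": SENSE_DIRECTION,
    "shift": SENSE_DIRECTION,
}


def verb_rule(governing_verb_lemma: str) -> Optional[str]:
    """Return the sense of 'way' implied by the verb that takes it as an
    argument or modifier, or None if the verb is not in the lexicon."""
    return VERB_SENSE.get(governing_verb_lemma.lower())


print(verb_rule("come"), verb_rule("shift"), verb_rule("find"))  # ii iii None
```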

Even more interesting are cases in which a structure normally associated with one sense actually conveys another because of a semantic factor. Consider the following examples:

7. ... a personal confrontation with Mr. Khrushchev might be the only way to prevent catastrophe.

8. There are four rather obvious ways to reduce or eliminate the vulnerability of aircraft on the ground.

9. ... the address text still had "quite a way to go" toward completion.

All of the above sentences contain an infinitive, but in (7) and (8) way means 'manner of doing something', whereas in (9) it has the sense 'road'. The reason is obviously the presence of the verb to go. Another issue addressed in this paper is compiling a list of similar verbs and relating it to the WordNet hierarchy in order to arrive at the necessary generalisations.
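The interaction described here, a syntactic default overridden by a semantic class, could be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: input is Penn Treebank-style (word, tag) pairs, and a small hand-made hypernym table stands in for WordNet. Its entries (including "walk" and "drive" as verbs similar to go, and the class node "travel") are hypothetical and are not copied from WordNet.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

SENSE_MANNER = "i"  # 'manner of doing something'
SENSE_ROAD = "ii"   # 'road'

# Toy hypernym links (child -> parent) standing in for the WordNet verb
# hierarchy. These entries are illustrative assumptions, not WordNet data.
HYPERNYM: Dict[str, str] = {
    "go": "travel",
    "walk": "travel",
    "drive": "travel",
    "travel": "move",
    "reduce": "change",
    "eliminate": "remove",
    "prevent": "act",
}
MOTION_ROOT = "travel"  # hypothetical class node for 'road'-triggering verbs


def is_motion_verb(lemma: str) -> bool:
    """Walk up the toy hierarchy; True if MOTION_ROOT is reached."""
    node: Optional[str] = lemma.lower()
    seen = set()
    while node is not None and node not in seen:
        if node == MOTION_ROOT:
            return True
        seen.add(node)
        node = HYPERNYM.get(node)
    return False


def is_way(token: Token) -> bool:
    word, tag = token
    return word.lower() in ("way", "ways") and tag in ("NN", "NNS")


def infinitive_rule(tokens: List[Token], i: int) -> Optional[str]:
    """way + to + base-form verb: sense (i) by default, but sense (ii)
    when the verb falls under the motion class (as 'go' in example 9)."""
    if i + 2 >= len(tokens) or not is_way(tokens[i]):
        return None
    to_word, to_tag = tokens[i + 1]
    verb, verb_tag = tokens[i + 2]
    if to_word.lower() != "to" or to_tag != "TO" or verb_tag != "VB":
        return None
    return SENSE_ROAD if is_motion_verb(verb) else SENSE_MANNER


ex7 = [("the", "DT"), ("only", "JJ"), ("way", "NN"), ("to", "TO"),
       ("prevent", "VB"), ("catastrophe", "NN")]
ex9 = [("quite", "PDT"), ("a", "DT"), ("way", "NN"), ("to", "TO"), ("go", "VB")]
print(infinitive_rule(ex7, 2), infinitive_rule(ex9, 2))  # i ii
```

Walking up a hierarchy rather than listing verbs one by one is what allows a generalisation: any verb placed under the motion node inherits the override without a separate entry.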

We focus mainly on the role of semantics and on the interaction between syntax and semantics, devoting less space to other problems, such as the treatment of idiomatic expressions containing way. The results of the research should not only help to produce more accurate machine translations but also offer some insight into how humans recognise the appropriate senses of words.

References

Fellbaum, Christiane (ed.), 1998. WordNet: An Electronic Lexical Database. Cambridge, Massachusetts: MIT Press.

"A Lexicalized Tree Adjoining Grammar for English", 2001. ftp://ftp.cis.upenn.edu/pub/xtag/release-2.24.2001/tech-report.pdf (24th November 2005).

Longman Dictionary of Contemporary English, 2003. New York: Longman.

Nunberg, Geoffrey, Ivan Sag and Thomas Wasow, 1994. "Idioms". Language 70, 491-538.

O'Grady, William, 1998. "The Syntax of Idioms". Natural Language and Linguistic Theory 16, 279-312.

The Penn Treebank corpus. http://www.cis.upenn.edu/~treebank/ (24th November 2005).

Wielki słownik angielsko-polski PWN-Oxford, 2004. Warszawa: Wydawnictwo Naukowe PWN.

WordNet 2.1. http://wordnet.princeton.edu