Linguistic Classes Example Code

Manipulating Items

Adding basic information to an EST_Item

An item such as

is constructed as follows. (Note that the attributes are in capitals by linguistic convention only: attribute names are case sensitive and can be upper or lower case.)

EST_Item p;
p.set("POS", "Noun");
p.set("NAME", "example");
p.set("FOCUS", "+");
p.set("DURATION", 2.76);
p.set("STRESS", 2);

The type of the values in features is the EST_Val class, which is a union that can store ints, floats, EST_Strings, void pointers, and EST_Features. The overloaded function facility of C++ means that set() can be used for all of these.
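To make the role of overloading concrete, here is a minimal stand-alone sketch. It is not the Speech Tools implementation: Val, Features and all their members are hypothetical stand-ins for EST_Val and EST_Features. It only shows how a single set() name can accept several value types while the stored value remembers which type it holds.

```cpp
#include <map>
#include <string>

// Hypothetical tagged value (NOT the real EST_Val): records which kind
// of data was stored alongside the data itself.
struct Val {
    enum Kind { Int, Float, String };
    Kind kind;
    int i;
    float f;
    std::string s;
};

// Hypothetical feature set (NOT the real EST_Features) with one set()
// overload per value type; the compiler picks the overload from the
// type of the second argument.
class Features {
public:
    void set(const std::string &name, int v)         { store(name, Val::Int, v, 0.0f, ""); }
    void set(const std::string &name, double v)      { store(name, Val::Float, 0, static_cast<float>(v), ""); }
    void set(const std::string &name, const char *v) { store(name, Val::String, 0, 0.0f, v); }

    // Typed accessors; std::map::at throws std::out_of_range if the
    // feature does not exist.
    int I(const std::string &name) const         { return vals_.at(name).i; }
    float F(const std::string &name) const       { return vals_.at(name).f; }
    std::string S(const std::string &name) const { return vals_.at(name).s; }
    Val::Kind kind_of(const std::string &name) const { return vals_.at(name).kind; }

private:
    void store(const std::string &name, Val::Kind k, int i, float f,
               const std::string &s) {
        Val v = {k, i, f, s};
        vals_[name] = v;
    }
    std::map<std::string, Val> vals_;
};
```

In this sketch, set("STRESS", 2) selects the int overload, set("DURATION", 2.76) the double overload, and set("POS", "Noun") the const char * overload, without the caller naming the type.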

Accessing basic information in an Item

When accessing the features, the type must be specified. This is done most easily by using a series of functions whose type is coded by a capital letter:

F()
return value as a float
I()
return value as an integer
S()
return value as an EST_String
A()
return value as an EST_Features

cout << "Part of speech for p is " << p.S("POS") << endl;
cout << "Duration for p is " << p.F("DURATION") << endl;
cout << "Stress value for p is " << p.I("STRESS") << endl;


An optional default value can be given if a result is always desired:

cout << "Part of speech for p is " << p.S("POS") << endl;
cout << "Syntactic Category for p is " << p.S("CAT", "Noun") << endl; // no error
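The difference between the two forms of access can be pictured with the following stand-alone sketch. It assumes, as the "no error" comment suggests, that looking up a missing feature without a default is an error; here that is modelled with an exception. FeatureMap and both S() functions are hypothetical and are not part of Speech Tools.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical feature store holding string values only.
typedef std::map<std::string, std::string> FeatureMap;

// Strict lookup: a missing feature is reported as an error.
std::string S(const FeatureMap &f, const std::string &name) {
    FeatureMap::const_iterator it = f.find(name);
    if (it == f.end())
        throw std::runtime_error("no feature named " + name);
    return it->second;
}

// Lenient lookup: a missing feature yields the caller's default, so a
// result is always returned.
std::string S(const FeatureMap &f, const std::string &name,
              const std::string &def) {
    FeatureMap::const_iterator it = f.find(name);
    return it == f.end() ? def : it->second;
}
```

The default only applies when the feature is absent; a feature that is present is returned unchanged by both forms.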

Nested feature structures in items

Nested feature structures such as Example 6-1

can be created in a number of ways:

p.set("NAME", "d");
p.set("VOICE", "+");
p.set("CONTINUANT", "-");
p.set("SONORANT", "-");
EST_Features f;
p.set("PLACE", f); // copy in empty feature set here
p.A("PLACE").set("CORONAL", "+");
p.A("PLACE").set("ANTERIOR", "+");

or by filling the values in an EST_Features object and copying it in:

EST_Features f2;
f2.set("CORONAL", "+");
f2.set("ANTERIOR", "+");
p.set("PLACE", f2);

Nested features can be accessed by multiple calls to the accessing commands:

cout << "Anterior value is: " << p.A("PLACE").S("ANTERIOR");
cout << "Coronal value is: " << p.A("PLACE").S("CORONAL");

The first command is A() because PLACE is a feature structure, and the second command is S() because it returns a string (the value of ANTERIOR or CORONAL). A shorthand is provided to extract the value in a single statement:

cout << "Anterior value is: " << p.S("PLACE.ANTERIOR");
cout << "Coronal value is: " << p.S("PLACE.CORONAL");

Again, as the last value to be returned is a string, S() must be used. This shorthand can also be used to set the features:

p.set("PLACE.CORONAL", "+");
p.set("PLACE.ANTERIOR", "+");
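One way to picture how a dotted name such as "PLACE.CORONAL" is resolved is sketched below. This is a stand-alone model, not the Speech Tools code: FeatureNode, split_path(), set_path() and get_path() are hypothetical names, and the choice to create missing intermediate feature structures when setting is an assumption of the sketch.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical nested feature structure: string values plus named
// nested structures.
struct FeatureNode {
    std::map<std::string, std::string> values;
    std::map<std::string, std::unique_ptr<FeatureNode> > nested;
};

// Split "PLACE.CORONAL" into {"PLACE", "CORONAL"}.
std::vector<std::string> split_path(const std::string &path) {
    std::vector<std::string> parts;
    std::stringstream ss(path);
    std::string part;
    while (std::getline(ss, part, '.'))
        parts.push_back(part);
    return parts;
}

// Walk every component but the last as a nested structure (creating it
// if needed), then store the value under the last component.
void set_path(FeatureNode &root, const std::string &path,
              const std::string &value) {
    std::vector<std::string> parts = split_path(path);
    if (parts.empty()) throw std::invalid_argument("empty path");
    FeatureNode *node = &root;
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
        std::unique_ptr<FeatureNode> &child = node->nested[parts[i]];
        if (!child) child.reset(new FeatureNode);
        node = child.get();
    }
    node->values[parts.back()] = value;
}

// Same walk for reading; any missing component is an error.
std::string get_path(const FeatureNode &root, const std::string &path) {
    std::vector<std::string> parts = split_path(path);
    if (parts.empty()) throw std::invalid_argument("empty path");
    const FeatureNode *node = &root;
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
        auto it = node->nested.find(parts[i]);
        if (it == node->nested.end())
            throw std::out_of_range("no nested feature " + parts[i]);
        node = it->second.get();
    }
    auto it = node->values.find(parts.back());
    if (it == node->values.end())
        throw std::out_of_range("no feature " + parts.back());
    return it->second;
}
```

A name without a dot is simply a path of length one, so the same two functions cover both the plain and the nested case.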

Adding arbitrary classes into items

As well as the built-in types of int, float and string, EST_Items can hold any other type, for example a waveform or track. Only pointers to classes can be added; the EST_Val class takes care of garbage collection.

First we must add the ability to have a certain type as a val. This is done by placing the following macro in the appropriate .h file:

VAL_REGISTER_CLASS_DCLS(wave,EST_Wave)

The first argument (wave) is an arbitrary name which is used to form a function for casting from the val. To add a waveform as a value, the est_val macro must be called to make a value out of the object:

EST_Wave *sig;
sig = new EST_Wave;
p.set_val("waveform", est_val(sig));

To get a waveform back, the wave casting function must be used:

Utility functions for items

The presence of an attribute can be checked using f_present(), which returns true if the attribute is in the item:

cout << "This is true: " << p.f_present("PLACE");
cout << "This is false: " << p.f_present("MANNER");

An attribute can be removed with f_remove():

p.f_remove("PLACE");

Manipulating relations

Building a linear list relation

It is standard to store the phones for an utterance as a linear list in an EST_Relation object. Each phone is represented by one EST_Item, whereas the complete list is stored as an EST_Relation.

The easiest way to build a linear list is by using EST_Relation.append(), which, when called without arguments, makes a new empty EST_Item, adds it onto the end of the relation and returns a pointer to it. The information relevant to that phone can then be added to the returned item.

EST_Relation phones;
EST_Item *a;
a = phones.append();
a->set("NAME", "f");
a->set("TYPE", "consonant");
a = phones.append();
a->set("NAME", "o");
a->set("TYPE", "vowel");
a = phones.append();
a->set("NAME", "r");
a->set("TYPE", "consonant");

Note that the -> operator is used because the EST_Item a is a pointer here. The same pointer variable can be used multiple times because every time append() is called it allocates a new item and returns a pointer to it.

If you already have an EST_Item pointer and want to add it to a relation, you can give it as an argument to append(), but this is generally inadvisable as it involves some unnecessary copying, and you also have to allocate the memory for the next EST_Item pointer yourself every time (if you don't, you will overwrite the previous one):

a = new EST_Item;
a->set("NAME", "m");
a->set("TYPE", "consonant");
phones.append(a);
a = new EST_Item;
a->set("NAME", "ei");
a->set("TYPE", "vowel");
phones.append(a);

Items can be prepended in exactly the same way:

a = phones.prepend();
a->set("NAME", "n");
a->set("TYPE", "consonant");
a = phones.prepend();
a->set("NAME", "i");
a->set("TYPE", "vowel");

Iterating through a linear list relation

Iteration over lists is performed with next() and prev(), using an EST_Item pointer as the iteration variable.

EST_Item *s;
for (s = phones.head(); s != 0; s = next(s))
    cout << s->S("NAME") << endl;

for (s = phones.tail(); s != 0; s = prev(s))
    cout << s->S("NAME") << endl;


head() and tail() return EST_Item pointers to the start and end of the list. next() and prev() return the next or previous item in the list, and return 0 when the end or start of the list is reached. Hence checking for 0 is a useful termination condition for the iteration. Taking advantage of C shorthand allows us to write:

for (s = phones.head(); s; s = next(s))
    cout << s->S("NAME") << endl;

Building a tree relation

It is standard to store information such as syntax as a tree in an EST_Relation object. Each tree node is represented by one EST_Item, whereas the complete tree is stored as an EST_Relation.

The easiest way to build a tree is by using append_daughter(), which, when called without arguments, makes a new empty EST_Item, adds it as a daughter to an existing item and returns a pointer to it. The information relevant to that node can then be added to the returned item. The root node of the tree must be added directly to the EST_Relation.

Example 6-2. Example prog01

EST_Relation tree;
EST_Item *r, *np, *vp, *n;
r = tree.append();
r->set("CAT", "S");
np = append_daughter(r);
np->set("CAT", "NP");
n = append_daughter(np);
n->set("CAT", "PRO");
n = append_daughter(n);
n->set("NAME", "John");
vp = append_daughter(r);
vp->set("CAT", "VP");
n = append_daughter(vp);
n->set("CAT", "VERB");
n = append_daughter(n);
n->set("NAME", "loves");
np = append_daughter(vp);
np->set("CAT", "NP");
n = append_daughter(np);
n->set("CAT", "DET");
n = append_daughter(n);
n->set("NAME", "the");
n = append_daughter(np);
n->set("CAT", "NOUN");
n = append_daughter(n);
n->set("NAME", "woman");
cout << tree;


Using recursive functions to build trees is tidier and would eliminate the need for the large number of temporary variables used in the above example.

Iterating through a tree relation

Iteration in trees is done with daughter1(), daughter2(), daughtern() and parent(). Pre-order traversal can be achieved iteratively as follows:

n = tree.head();                // initialise iteration variable to head of tree
while (n)
{
    cout << *n;                 // visit the current node first (pre-order)
    if (daughter1(n) != 0)      // if daughter exists, make n its daughter
        n = daughter1(n);
    else if (next(n) != 0)      // otherwise visit its sisters
        n = next(n);
    else                        // if no sisters are left, go back up the tree
    {                           // until a sister to a parent is found
        bool found = FALSE;
        for (EST_Item *pp = parent(n); pp != 0; pp = parent(pp))
            if (next(pp))
            {
                n = next(pp);
                found = TRUE;
                break;
            }
        if (!found)
            n = 0;              // nothing left to visit: ends the loop
    }
}
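The same traversal can be expressed as a "successor" function that, given a node, returns the node that follows it in pre-order. The sketch below is stand-alone and uses a hypothetical Node and Tree rather than EST_Item and EST_Relation; the fields parent, first_daughter and next_sister play the roles of parent(), daughter1() and next() in the text.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical tree node; not EST_Item.
struct Node {
    std::string label;
    Node *parent;
    Node *first_daughter;
    Node *next_sister;
};

// Owns the nodes; std::deque keeps addresses stable as nodes are added.
struct Tree {
    std::deque<Node> nodes;
    Node *add(const std::string &label, Node *parent) {
        nodes.push_back(Node{label, parent, nullptr, nullptr});
        Node *n = &nodes.back();
        if (parent) {
            Node **slot = &parent->first_daughter;
            while (*slot) slot = &(*slot)->next_sister;
            *slot = n;                  // link as the last daughter
        }
        return n;
    }
};

// Pre-order successor: first daughter if there is one, otherwise the
// nearest sister of this node or of one of its ancestors.
Node *next_preorder(Node *n) {
    if (n->first_daughter) return n->first_daughter;
    for (; n; n = n->parent)
        if (n->next_sister) return n->next_sister;
    return nullptr;                     // traversal finished
}

// Collect labels in pre-order, starting with the root itself.
std::vector<std::string> preorder_labels(Node *root) {
    std::vector<std::string> out;
    for (Node *n = root; n; n = next_preorder(n))
        out.push_back(n->label);
    return out;
}
```

Keeping the step in one function means the loop body only has to deal with the node being visited, and the root is visited like any other node.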

A special set of iterators is available for traversal of the leaf (terminal) nodes of a tree:

Example 6-3. Leaf iteration

for (s = first_leaf(tree.head()); s != last_leaf(tree.head()); s = next_leaf(s))
    cout << s->S("NAME") << endl;
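Note that a loop of this form tests s against last_leaf() before its body runs, so the final leaf itself is not visited. The stand-alone sketch below illustrates the difference between that bound and iterating until there is no next leaf. Node, Tree and the three leaf functions here are hypothetical models; in particular, that next_leaf() yields a null pointer after the last leaf is a property of this sketch and is assumed, not confirmed, for Speech Tools.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical tree node; not EST_Item.
struct Node {
    std::string name;
    Node *parent;
    Node *first_daughter;
    Node *next_sister;
};

// Owns the nodes; std::deque keeps addresses stable as nodes are added.
struct Tree {
    std::deque<Node> nodes;
    Node *add(const std::string &name, Node *parent) {
        nodes.push_back(Node{name, parent, nullptr, nullptr});
        Node *n = &nodes.back();
        if (parent) {
            Node **slot = &parent->first_daughter;
            while (*slot) slot = &(*slot)->next_sister;
            *slot = n;                  // link as the last daughter
        }
        return n;
    }
};

// Leftmost leaf below n.
Node *first_leaf(Node *n) {
    while (n->first_daughter) n = n->first_daughter;
    return n;
}

// Rightmost leaf below n.
Node *last_leaf(Node *n) {
    while (n->first_daughter) {
        n = n->first_daughter;
        while (n->next_sister) n = n->next_sister;
    }
    return n;
}

// Next leaf to the right, or nullptr after the last one.
Node *next_leaf(Node *n) {
    for (; n; n = n->parent)
        if (n->next_sister) return first_leaf(n->next_sister);
    return nullptr;
}

// Runs until next_leaf() gives nullptr: every leaf is visited.
std::vector<std::string> all_leaves(Node *root) {
    std::vector<std::string> out;
    for (Node *s = first_leaf(root); s; s = next_leaf(s))
        out.push_back(s->name);
    return out;
}

// Same bound as the loop in the text: stops on reaching the last leaf.
std::vector<std::string> leaves_before_last(Node *root) {
    std::vector<std::string> out;
    for (Node *s = first_leaf(root); s != last_leaf(root); s = next_leaf(s))
        out.push_back(s->name);
    return out;
}
```

For a tree whose leaves are John, loves and Mary, all_leaves() returns all three names, while leaves_before_last() returns only John and loves.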

Building a multi-linear relation

This is not yet fully implemented.

Iterating through a multi-linear relation

This is not yet fully implemented.

Relations in Utterances

The EST_Utterance class is used to store all the items and relations relevant to a single utterance. (Here utterance is used as a general linguistic entity; it doesn't have to relate to a well-formed complete linguistic unit such as a sentence or phrase.)

Instead of storing relations separately, they are stored in utterances:

EST_Utterance utt;
utt.create_relation("Word");
utt.create_relation("Syntax");

EST_Relations can be accessed through the utterance object either directly or by use of a temporary EST_Relation pointer:

EST_Relation *word, *syntax;
word = utt.relation("Word");
syntax = utt.relation("Syntax");

The contents of the relation can be filled by the methods described above.

Adding items into multiple relations

A major aspect of this system is that an item can be in two relations at once, as shown in Figure 6-2.

The following example, using the syntax relation already created in Example 6-2, shows how to put the terminal nodes of this tree into a word relation:

Example 6-4. Adding existing items to a new relation

word = utt.relation("Word");
syntax = utt.relation("Syntax");
for (s = first_leaf(syntax->head()); s != last_leaf(syntax->head()); s = next_leaf(s))
    word->append(s);

Thus the terminal nodes in the syntax relation are now stored as a linear list in the word relation. Hence

cout << *utt.relation("Syntax") << "\n";

produces


whereas

cout << *utt.relation("Word") << "\n";

produces


Changing the relation an item is in

Even if an item is in more than one relation, it always has the idea of a "current" relation. If the traversal functions (next, prev, parent, etc.) are called, traversal always occurs with respect to the current relation. An item's current relation can be changed as follows:

s = utt.relation("Word")->head(); // set s to first word
s = next(s);                      // get next word: s = parent(s) would throw an error as there
                                  // is no parent to s in the word relation.
s = prev(s);                      // get previous word
s = s->as_relation("Syntax");     // change relation.
s = parent(s);                    // get parent of s in syntax relation
s = daughter1(s);                 // get first daughter of s: s = next(s) would throw an
                                  // error as there is no next to s in the syntax relation.

While s is still the same item, the current relation is now "Syntax". The current relation is returned by the relation() function:

cout << "Name of current relation: " << s->relation()->name() << endl;

If you aren't sure whether an item is in a relation, you can check with in_relation(). This will return true if an item is in the requested relation regardless of what the current relation is.

cout << "s is in the Word relation: " << s->in_relation("Word") << endl;
cout << "Relations: " << s->relations() << endl;
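The idea of one item appearing in several relations, each with its own neighbours, can be modelled as shared contents plus one "view" per relation. The stand-alone sketch below is a simplified illustration under that assumption and should not be read as the Speech Tools implementation: Content, Item, Utterance and the free functions in_relation() and as_relation() are hypothetical, and only linear relations are modelled.

```cpp
#include <deque>
#include <map>
#include <string>

struct Item;

// Features shared by every view of the same linguistic item.
struct Content {
    std::map<std::string, std::string> features;
    std::map<std::string, Item *> views;    // relation name -> view in it
};

// One view of a Content inside one relation, with that relation's links.
struct Item {
    Content *content;
    std::string relation;                   // the "current" relation
    Item *prev;
    Item *next;
};

// Owns contents and views; std::deque keeps addresses stable.
struct Utterance {
    std::deque<Content> contents;
    std::deque<Item> items;
    std::map<std::string, Item *> tails;    // last view of each relation

    Content *new_content(const std::string &name) {
        contents.push_back(Content());
        contents.back().features["NAME"] = name;
        return &contents.back();
    }

    // Append a view of c to the end of the named linear relation.
    Item *append(const std::string &relation, Content *c) {
        Item *tail = tails[relation];       // null if the relation is empty
        items.push_back(Item{c, relation, tail, nullptr});
        Item *it = &items.back();
        if (tail) tail->next = it;
        tails[relation] = it;
        c->views[relation] = it;
        return it;
    }
};

// True if the item has a view in the relation, whatever its current one.
bool in_relation(const Item *i, const std::string &relation) {
    return i->content->views.count(relation) != 0;
}

// The same item seen from another relation, or nullptr if it is not there.
Item *as_relation(const Item *i, const std::string &relation) {
    std::map<std::string, Item *>::const_iterator f =
        i->content->views.find(relation);
    return f == i->content->views.end() ? nullptr : f->second;
}
```

In this model, following next from a view only ever moves within that view's relation; switching relation means fetching the other view of the same contents, which is the role as_relation() plays in the example above.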

Feature functions

evaluate functions setting functions