Preparing a Text

Note: There is a CheatSheet available for download containing the essentials of this page.

Terms and meanings

Element: An element of a phrase. For example, "Tom" is an element of the phrase "Tom plays soccer".
Identifier: The word that names an element in a phrase. Note that this is not limited to nouns: an adjective or a verb also has an identifier in a phrase. The identifier is used to build a node in the internal discourse model.
Attribute, Multiplicity: Used in the same way as in the UML.

Basic Idea

To process a functional requirements specification (FRS) in SENSE, you have to annotate the FRS using SALE. The SALE compiler then transforms the annotated FRS into an internal discourse model, which is processed with a graph rewriting system. All you need to know at this point is that elements of the FRS (phrases, words, comments) are mapped to nodes. For every phrase there will be a node (of the phrase type), and for every word of that phrase there will be a node (of the word type). These nodes are linked together with edges that

  • connect words to their phrases and
  • contain additional information (such as roles).
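The mapping of phrases and words to nodes and edges can be pictured with a small Python sketch. It is not the SALE compiler's actual data structure: the class names Node, Edge and DiscourseModel, the method add_phrase, and the type labels "phrase" and "word" are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str   # "phrase" or "word" (labels assumed for this sketch)
    name: str


@dataclass(frozen=True)
class Edge:
    phrase: Node
    word: Node
    role: str = ""   # additional information carried by the edge, e.g. a role


@dataclass
class DiscourseModel:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_phrase(self, name, words):
        """Create one phrase node plus one word node per word and link them."""
        phrase = Node("phrase", name)
        self.nodes.append(phrase)
        for word in words:
            node = Node("word", word)
            self.nodes.append(node)
            self.edges.append(Edge(phrase, node))
        return phrase


model = DiscourseModel()
model.add_phrase("p1", "The game of chess is played between two opponents".split())
print(len(model.nodes), len(model.edges))
```

In this sketch, one phrase with nine words yields ten nodes and nine edges. The role field on each edge stays empty until roles are annotated.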

The sections below describe the annotation process. To depict the process in detail, we use the following phrases:

The game of chess is played between two opponents.
They move their pieces alternately on a square board called a chessboard.

First, we lay these phrases out more readably:

 The game of chess is played between two opponents.

 They move their pieces alternately 
   on a square board called a chessboard.

Phrases and Sub-Phrases

First of all, you have to delineate phrases using square brackets ([ and ]). Top-level phrases end with a dot (.).

 [ The game of chess is played between two opponents ].

 [ They move their pieces alternately 
     on a square board called a chessboard ].

In SALE, a nested relation (like a square board called a chessboard in the second phrase) can itself take a role in its outer relation. However, if an inner object of the nested relation is marked with a caret (^), that object is interpreted as the head of the clause (as in ANTLR grammars). The head is then bound to the outer relation with the role that is appended to the nested relation.

 [ The game of chess is played between two opponents ].

 [ They move their pieces $alternately 
     on a [ square ^board called a chessboard ] ].
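To make the nesting and the head marker concrete, here is a minimal Python sketch that reads the bracket structure into nested lists and looks up the word marked with a caret. It is not the SALE compiler: the function names parse_phrases and head_of are hypothetical, and the sketch assumes that tokens are separated by whitespace.

```python
def parse_phrases(text):
    """Parse '[ ... ]' nesting into nested Python lists of tokens."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]
    for token in tokens:
        if token == "[":
            stack.append([])
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            inner = stack.pop()
            stack[-1].append(inner)
        elif token != ".":          # the dot only terminates a top-level phrase
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return stack[0]


def head_of(phrase):
    """Return the word marked with '^' in a phrase, or None if there is none."""
    for item in phrase:
        if isinstance(item, str) and item.startswith("^"):
            return item[1:]
    return None


text = ("[ They move their pieces $alternately "
        "on a [ square ^board called a chessboard ] ].")
top = parse_phrases(text)[0]
nested = top[-1]
print(head_of(nested))
```

Here the nested relation is the last item of the outer phrase, and head_of picks out board as its head. The outer phrase has no caret, so head_of returns None for it.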

Wipe out unnecessary Elements (Comments in SALE)

Using # you can comment out unnecessary identifiers in a phrase:

 [ #The game of chess #is played #between two opponents ].

Using #{ ... } you can comment out several words at once:

 [ They move their pieces $alternately 
     #{on a} [ square ^board called #a chessboard ] ].
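The effect of both comment forms can be approximated with two regular expressions. This is only a sketch under assumptions: the function name strip_comments is hypothetical, comment groups are assumed not to contain nested braces, and a single-word comment is assumed to end at the next whitespace or square bracket.

```python
import re


def strip_comments(phrase):
    """Remove '#{ ... }' groups first, then single '#word' comments."""
    without_groups = re.sub(r"#\{[^}]*\}", " ", phrase)
    without_words = re.sub(r"#[^\s\[\]]+", " ", without_groups)
    # Collapse the whitespace left behind by the removed comments.
    return " ".join(without_words.split())


first = strip_comments("[ #The game of chess #is played #between two opponents ].")
second = strip_comments(
    "[ They move their pieces $alternately "
    "#{on a} [ square ^board called #a chessboard ] ]."
)
print(first)
print(second)
```

Removing the groups before the single words matters: otherwise the pattern for single words would consume the opening "#{on" and leave a dangling "a}".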

Connect Elements that consist of multiple words

If an element consists of multiple words, such as the game of chess, you should connect the words using the underscore (_).

 [ #The game_of_chess #is played #between two opponents ].

 [ They move their pieces $alternately 
     #{on a} [ square ^board called #a chessboard ] ].

Identify Attributes and Multiplicities

Every element in SALE can be attributed using the dollar sign $, for example a $red car. Multiplicities are attributes too, but they are denoted with a star *, for example *seven apples.

Attributes always apply to the following element, unless they are explicitly reassigned using the movement operators << and >>, for example the apple is $red<<2.

 [ #The game_of_chess #is played #between *two opponents ].

 [ They move their pieces $alternately<<3 
     #{on a} [ $square ^board called #a chessboard ] ].
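The following Python sketch shows one way to work out which element an attribute or multiplicity applies to. It rests on an assumption inferred from the examples on this page, not on a documented rule: the number after << or >> is taken to count tokens to the left or right of the attribute. The names ATTRIBUTE and attribute_targets are hypothetical.

```python
import re

# Sigil ($ or *), the attribute name, and an optional movement operator.
ATTRIBUTE = re.compile(r"^([$*])(.+?)(?:(<<|>>)(\d+))?$")


def attribute_targets(tokens):
    """Map each attribute or multiplicity to the element it applies to.

    Assumption: the number after '<<' or '>>' counts tokens to the left or
    right of the attribute; without an operator the next token is the target.
    """
    result = []
    for index, token in enumerate(tokens):
        match = ATTRIBUTE.match(token)
        if not match:
            continue
        sigil, name, operator, distance = match.groups()
        if operator == "<<":
            target = index - int(distance)
        elif operator == ">>":
            target = index + int(distance)
        else:
            target = index + 1
        if not 0 <= target < len(tokens):
            raise ValueError(f"{token!r} points outside the phrase")
        kind = "attribute" if sigil == "$" else "multiplicity"
        result.append((kind, name, tokens[target]))
    return result


moved = attribute_targets("They move their pieces $alternately<<3".split())
default = attribute_targets("played between *two opponents".split())
print(moved, default)
```

Under this reading, $alternately<<3 is bound to move, and *two is bound to the following element opponents.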

Identify Thematic Roles

Identify and name the roles of the elements: every element must either play a role or be commented out. Use the suffix |ROLE to add a role to an element; you can add multiple roles to an element using curly brackets: |{ROLE1, ROLE2}.

[ #The game_of_chess|PAT #is played|ACT #between *two opponents|AG ] . 

[ They|AG move|ACT their|POSS pieces|{HAB,PAT} $alternately<<3 #on 
  [ #a $square ^board|{PAT,FIN} called|ACT #a chessboard|FIC ]|LOC_POS ] .

Note: The inner relation a square board called a chessboard plays a role but the element board is lifted using the caret operator. This way, the board plays the role LOC_POS in the outer relation.
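The role suffix and the rule that every element needs a role can be sketched in a few lines of Python. The function names split_roles and missing_roles are hypothetical, and the sketch assumes that attributes and multiplicities (prefixed with $ or *) do not need a role of their own.

```python
def split_roles(token):
    """Split 'word|ROLE' or 'word|{R1,R2}' into (word, [roles])."""
    word, separator, suffix = token.partition("|")
    if not separator:
        return word, []
    if suffix.startswith("{") and suffix.endswith("}"):
        roles = [role.strip() for role in suffix[1:-1].split(",")]
    else:
        roles = [suffix]
    return word, roles


def missing_roles(tokens):
    """List elements that carry no role and are neither comments nor attributes."""
    return [
        token for token in tokens
        if not token.startswith(("#", "$", "*")) and not split_roles(token)[1]
    ]


tokens = "#The game_of_chess|PAT #is played|ACT #between *two opponents|AG".split()
print(split_roles("pieces|{HAB,PAT}"))
print(missing_roles(tokens))
```

A check like missing_roles gives a quick way to spot elements that still need a role or a comment marker before the text is processed further.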

A list of thematic roles with examples can be found in the article about thematic roles.

Coreference Analysis

If an instance of an object is named several times in a text, one can mark the second and following occurrences as references to the first one. This way, there will be only one node in the internal discourse model instead of multiple nodes for the same instance.

To achieve this, there are two possibilities:

Simple References

Using simple references, you can connect two identifiers: [ Tom|AG plays_soccer|ACT ]. [ @Tom|AG plays_tennis|ACT ]. Note that an element prefixed with @ always refers to the first preceding element with the same identifier. The internal discourse model will then contain a single node named "Tom" that plays two roles, one in each of the two phrases.
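How an @ reference collapses two occurrences into one node can be sketched as follows. The function name build_nodes and the dictionary layout are hypothetical, roles are assumed to have been stripped already, and "first preceding element" is read here as the nearest earlier element with the same identifier.

```python
def build_nodes(phrases):
    """Create one node per identifier, reusing the node for '@' references."""
    nodes = []     # each node: {"name": ..., "phrases": [phrase indexes]}
    latest = {}    # identifier -> nearest preceding node with that name
    for index, phrase in enumerate(phrases):
        for identifier in phrase:
            if identifier.startswith("@"):
                name = identifier[1:]
                if name not in latest:
                    raise ValueError(f"@{name} has no preceding element")
                latest[name]["phrases"].append(index)
            else:
                node = {"name": identifier, "phrases": [index]}
                nodes.append(node)
                latest[identifier] = node
    return nodes


with_reference = build_nodes([["Tom", "plays_soccer"], ["@Tom", "plays_tennis"]])
without_reference = build_nodes([["Tom", "plays_soccer"], ["Tom", "plays_tennis"]])
print(len(with_reference), len(without_reference))
```

With the reference, the sketch produces three nodes and the node Tom is linked into both phrases. Without it, there are four nodes, two of them named Tom.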


Assertions

Sometimes, natural language avoids using the same identifier twice, as in The opponent whose king has been checkmated has lost the game. There, the identifier whose refers to the same element as the identifier opponent.

Sometimes, the author of a text wants to avoid repetition and uses different words for the same instance: Michael_Schumacher has won the grand_prix of Malaysia. The ferrari_pilot has now 75 championship_points.

In both cases, you cannot use @ to refer to the first element, because the elements have different identifiers. To obtain a similar effect, you can declare assertions. Simply add an assertion phrase after the annotated text, as in the following two examples:

[ #The @opponent|AG [ whose|POSS king|{HAB, STATII} #has #been checkmated|STAT ]|SUM #has lost|ACT #the @game|PAT ].

# Assertions
[ @opponent|EQK @whose|EQD ].

[ Michael_Schumacher|AG has_won|ACT #the grand_prix|PAT #of Malaysia|LOC_POS ].
[ #The ferrari_pilot|POSS #has #now *75 championship_points|HAB ].

# Assertions
[ @Michael_Schumacher|EQK @ferrari_pilot|EQD ].

With assertions, the result differs from that of simple references: there will be two nodes with different names, but these nodes are connected by a helper phrase that "knows" which of the two identifiers will be kept in the discourse model (the one with EQK) and which one can be deleted (the one with EQD).
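One way to picture what the helper phrases encode is a lookup from each EQD identifier to the EQK identifier that is kept. The sketch below is only an illustration: the function name canonical_names is hypothetical, and following chains of assertions is an assumption, since this page does not say how chained assertions are treated.

```python
def canonical_names(assertions):
    """Map every EQD identifier to the EQK identifier that replaces it.

    Each assertion is a (keep, drop) pair, i.e. (EQK identifier, EQD identifier).
    """
    replaced_by = {}
    for keep, drop in assertions:
        replaced_by[drop] = keep

    def resolve(name):
        seen = set()   # guards against circular assertions
        while name in replaced_by and name not in seen:
            seen.add(name)
            name = replaced_by[name]
        return name

    return {drop: resolve(drop) for drop in replaced_by}


names = canonical_names([("Michael_Schumacher", "ferrari_pilot")])
print(names)
```

For the example above, ferrari_pilot maps to Michael_Schumacher, so a later processing step could attach the roles of ferrari_pilot to the kept node.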

Assertions for Sets

Sometimes you have to combine several elements into one element: [ Tom and Mathias play_soccer ]. [ They win ]. Now you have to tell SALE that They is a set consisting of the elements Tom and Mathias. Having read the section about assertions, you will probably agree that using assertions here is straightforward.

Using the roles EQS to denote the set and EQB to denote the elements, you end up with the following annotation:

[ {Tom AND Mathias}|AG play_soccer|ACT ].
[ They|AG win|ACT ].

# Assertions
[ @They|EQS @Tom|EQB ].
[ @They|EQS @Mathias|EQB ]. 
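The set assertions above can be collected into a membership table with a short Python sketch. The names ASSERTION and set_members are hypothetical, and the sketch assumes that each assertion phrase contains exactly one EQS element and one EQB element.

```python
import re
from collections import defaultdict

# Matches '@identifier|EQS' or '@identifier|EQB'.
ASSERTION = re.compile(r"@(\w+)\|(EQS|EQB)")


def set_members(lines):
    """Collect, for every EQS identifier, the EQB identifiers asserted for it."""
    members = defaultdict(list)
    for line in lines:
        roles = {role: name for name, role in ASSERTION.findall(line)}
        if "EQS" in roles and "EQB" in roles:
            members[roles["EQS"]].append(roles["EQB"])
    return dict(members)


table = set_members([
    "[ @They|EQS @Tom|EQB ].",
    "[ @They|EQS @Mathias|EQB ].",
])
print(table)
```

The result lists Tom and Mathias as the members of They, in the order in which the assertions were written.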

Grouping Elements using Sets

SALE also supports the grouping of elements using sets. Sets are annotated using curly brackets ({ and }). This way, a set of elements can play a role in a relation: I like Apples and Oranges will be annotated as follows: [ I|AG like|ACT {Apples AND Oranges}|PAT ].

SALE supports and-sets as well as or-sets:

  • {Apples, Oranges AND Bananas}|ROLE
  • {Apples, Oranges OR Bananas}|ROLE
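The set notation can be taken apart with a small Python sketch. It handles only flat sets with a single kind of operator, as in the two examples above; nested sets are not covered. The names SET and parse_set are hypothetical.

```python
import re

# A set body in curly brackets, optionally followed by '|ROLE'.
SET = re.compile(r"^\{(.*)\}(?:\|(\w+))?$")


def parse_set(text):
    """Split '{A, B AND C}|ROLE' into (operator, members, role)."""
    match = SET.match(text.strip())
    if not match:
        raise ValueError("not a set")
    body, role = match.groups()
    # The capturing group keeps the operators at the odd positions.
    parts = re.split(r"\s+(AND|OR)\s+", body)
    operators = set(parts[1::2])
    if len(operators) != 1:
        raise ValueError("expected exactly one kind of operator")
    members = []
    for chunk in parts[0::2]:
        members.extend(m.strip() for m in chunk.split(",") if m.strip())
    return operators.pop(), members, role


and_set = parse_set("{Apples, Oranges AND Bananas}|ROLE")
or_set = parse_set("{Apples, Oranges OR Bananas}|ROLE")
print(and_set, or_set)
```

Both examples yield the same three members and the same role; only the operator differs.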


Further examples can be found in the examples section.

New Insights

  • A (non-annotated) member inherits its function from the set that contains this member.
  • Attributesets: ${red, hot, AND expensive}
  • Attributesets: ${red, $very hot, AND $very expensive}
  • Attributesets: ${red AND $very {hot AND expensive}}
  • Attribute-sets are possible too, as the following example shows.
 [ #The ${challenging, strategic AND dynamic} game_of_chess #is played between *two opponents ].

Of course, you can use multiplicity-sets as well and you can also combine them.

 [ They move their *{16 $black <<1 AND 16 $white <<1} pieces alternately 
     on a square board called a chessboard ].

Note: There is currently no support for sets that use Boolean algebra, e.g. [ An apple is ${ {red AND juicy} OR {green AND sour} } ] .

  • XOR-sets : {Apples, Oranges XOR Bananas}|ROLE
  • Constituentsets: [Leonhard|AG {sits AND talks}|ACT].
  • Constituentsets: [Leonhard|AG {sits AND talks $loudly<<1}|ACT].
  • Constituentsets: [Leonhard|AG {talks $loudly<<1 AND [^goes|ACT home|LOC_DEST]}|ACT].
  • Comments: #{ bla bla, bla bla }
  • Multiplicities: *{one OR two} cars
  • Identifier for roles: Mathias|AG(play) plays|ACT(play)


Last modified on Jul 7, 2011 9:14:16 AM