SALE MX - Model Extraction from Natural Language Texts

SALE MX aims to extract UML models from natural language (NL) text. To avoid error-prone NLP, SALE MX starts from a (currently manual) annotation of the NL text. The annotation explicitly marks the semantics of the text, thereby documenting a common understanding of the requirements (see Preparing a Text for details).

The basis of the entire process is SENSE, the Software Engineer's Natural Language Semantics Encoding. SENSE describes how semantics can be encoded and used to process NL texts. SALE (the SENSE Annotation Language for English) is one possible realization of the SENSE process and provides a set of thematic roles with which you can explicitly encode the semantics of texts. Although designed for English, SALE can also be used for other languages such as German, French and Hungarian. See the examples section for further information.

SALE also comes with an ANTLR-based compiler that transforms the annotated text into a graph representation, which can be loaded into GrGen.NET. This graph is the internal discourse model of the text and is the central artifact of our process. Graph rewriting rules of varying complexity are then used to evaluate the structure of the semantics. We also use graph rewriting rules to produce an internal graph representation of a UML document, which can be saved as an XMI document for further processing.
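
To illustrate the idea of turning a discourse graph into a UML graph by rewriting, here is a minimal sketch in Python. It is not the GrGen.NET rule set used by SALE MX: the `Graph` class, the role names `AGENT` and `PATIENT`, the node types and the rule itself are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny typed graph: nodes carry a type and a label, edges carry a role."""
    nodes: dict = field(default_factory=dict)   # node id -> (type, label)
    edges: list = field(default_factory=list)   # (source id, role, target id)

    def add_node(self, node_id, node_type, label):
        self.nodes[node_id] = (node_type, label)

    def add_edge(self, source, role, target):
        self.edges.append((source, role, target))


def rewrite_agents_to_classes(discourse: Graph) -> Graph:
    """Hypothetical rule: an entity that fills the AGENT role of an action
    becomes a UML class, and the action becomes one of its operations."""
    uml = Graph()
    for source, role, target in discourse.edges:
        if role != "AGENT":
            continue  # this rule only matches AGENT edges
        action_type, action_label = discourse.nodes[source]
        entity_type, entity_label = discourse.nodes[target]
        if action_type == "Action" and entity_type == "Entity":
            class_id = f"class:{entity_label}"
            op_id = f"op:{entity_label}.{action_label}"
            uml.add_node(class_id, "Class", entity_label)
            uml.add_node(op_id, "Operation", action_label)
            uml.add_edge(class_id, "ownedOperation", op_id)
    return uml


# Assumed discourse model for the sentence "The customer orders a book."
discourse = Graph()
discourse.add_node("n1", "Action", "orders")
discourse.add_node("n2", "Entity", "Customer")
discourse.add_node("n3", "Entity", "Book")
discourse.add_edge("n1", "AGENT", "n2")
discourse.add_edge("n1", "PATIENT", "n3")

uml = rewrite_agents_to_classes(discourse)
```

In this sketch the rule builds a separate UML graph instead of modifying the discourse graph in place, so the discourse model stays available for further rules.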

Apart from the annotation process, the system works without user interaction and produces UML diagrams. The annotation process can be time-consuming and is currently the bottleneck of our system. We therefore aim to provide a supporting tool for annotators and to (pre-)annotate texts automatically (see AutoAnnotator for details).

SALE MX - System overview

Future components are linked with red arrows; implemented components have a blue background.

Base system and subprojects

Overview (I) and related subprojects

  • Component SUMOX
    A framework that identifies associated SALE constructs and proposes the UML elements they could be converted to.
  • Component UML - Consists of two subprojects:
    • SALE2UML
      Extracts the UML model, based on MOF v. 2.0 and UML v. 2.1, from the SALE document.
    • UML2XMI
      An exporter ruleset for GrGen.NET that exports UML graphs into a readable XMI document.
  • Component GRS2SALE
    Disassembles a SALE graph instance back to a SALE document file.
  • Component UML Feedback
    Synchronizes changes made in the UML tool with the original XMI document.
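
As a rough illustration of the kind of output the export step produces, the following Python sketch serializes a small class model to XMI. It is not the UML2XMI ruleset (which is written for GrGen.NET): the function `export_xmi`, the id scheme, and the choice of the XMI 2.1 and UML 2.1 namespace URIs are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed here: XMI 2.1 and UML 2.1.
XMI_NS = "http://schema.omg.org/spec/XMI/2.1"
UML_NS = "http://schema.omg.org/spec/UML/2.1"


def export_xmi(model_name, classes):
    """Serialize {class name: [operation names]} to an XMI string (hypothetical helper)."""
    ET.register_namespace("xmi", XMI_NS)
    ET.register_namespace("uml", UML_NS)
    root = ET.Element(f"{{{XMI_NS}}}XMI", {f"{{{XMI_NS}}}version": "2.1"})
    model = ET.SubElement(
        root,
        f"{{{UML_NS}}}Model",
        {f"{{{XMI_NS}}}id": "model", "name": model_name},
    )
    for class_name, operations in classes.items():
        cls = ET.SubElement(
            model,
            "packagedElement",
            {
                f"{{{XMI_NS}}}type": "uml:Class",
                f"{{{XMI_NS}}}id": f"class_{class_name}",
                "name": class_name,
            },
        )
        for op in operations:
            ET.SubElement(
                cls,
                "ownedOperation",
                {f"{{{XMI_NS}}}id": f"op_{class_name}_{op}", "name": op},
            )
    return ET.tostring(root, encoding="unicode")


xmi_text = export_xmi("Shop", {"Customer": ["orders"]})
```

The resulting string can be written to a file and, under the assumptions above, opened in a UML tool that reads XMI 2.1.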

Overview (II) and related subprojects

Related Publications



Last modified on Jul 7, 2011 9:05:11 AM