Stein, C.; Arndt, S.:
Controlled Language with Terminology-Ontologies: How to Automatically Check Comprehensibility of Texts.
LaRC 2013 - International Workshop on Terminology, Language and Content Resources, Pretoria, South Africa, June 2013.
Controlled language is needed to make sure a message gets across. It reduces ambiguity and complexity by narrowing a wide range of possible interpretations to an essential subset, one that is assumed to be understood by everybody in the same way. Complex grammatical constructions, specialized words and references to an implicit context all open up room for the misunderstandings we want to avoid. On the other hand, our texts usually deal with very complex topics and need to be terminologically precise. A text that paraphrases a technical term may help the layperson but will confuse the professional. Moreover, technical editors and translators are usually not experts in the topic they are writing about, so they may blur the meaning for a specialized target group instead of clarifying it.
Considering this, we should look for a tighter connection between author, text and reader. At present a text usually has to be general enough for all kinds of readers, but readers are not all alike. They have different backgrounds and knowledge, and of course they use different terminologies. The questions are: How can we assist the author in writing a text that exactly fits the target group? How can we help the author anticipate the target group's wording? How can we make the corresponding definitions available in documents? And finally, how can we ensure mutual understanding between different target groups, and what does this imply for oral communication? Terminological ontologies can make a major contribution to answering these questions.
Many people are currently talking about ontologies and about the next generation of the web, the semantic web. In fact, many different types of ontologies and semantically enriched data have to be distinguished. An ontology can be seen as the next-generation paradigm of data management. It can also be seen as a way to let computers understand and process semantics better. Some people say that ontologies are the only way to actually build a knowledge management system. Ontologies can be seen as an easy way to merge and combine data from different sources, or simply as a necessity for improving the search engine ranking of content. Terminological ontologies, though, focus less on data than on natural languages: how we can structure them and how we can deal properly with their ambiguities. They are much more than a simple list of terms and definitions. They are a semantically structured network of these terms, and the network itself forms the meaning. This network can be navigated, new information can be derived by reasoning across its relations, and with its help we can improve current controlled language tools.
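The idea of a navigable term network with reasoning across relations can be illustrated with a minimal sketch. It is not the ontology model used in our work: the class `TermNetwork`, its methods and the example concepts are hypothetical, and only a single "is a" relation is modeled.

```python
from collections import defaultdict


class TermNetwork:
    """Hypothetical toy network: concepts linked by 'is a' relations."""

    def __init__(self):
        # concept -> set of directly broader concepts
        self.broader = defaultdict(set)

    def add_is_a(self, narrower, broader):
        """Record that `narrower` is a kind of `broader`."""
        self.broader[narrower].add(broader)

    def ancestors(self, concept):
        """Derive all broader concepts by following 'is a' links transitively."""
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            for parent in self.broader.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Example data (invented for illustration only).
net = TermNetwork()
net.add_is_a("rear_end_collision", "traffic_accident")
net.add_is_a("traffic_accident", "accident")

# Only two direct links were stated; the link from
# "rear_end_collision" to "accident" is inferred.
print(net.ancestors("rear_end_collision"))
```

The inferred relation is the kind of new information that a flat list of terms and definitions cannot provide.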
What is the basic idea behind this? Understanding can only be achieved when the grammatical structure and the terms associated with certain concepts are well known in the target community. Different communities of practice may refer to the same phenomena of reality but verbalize them differently, based on perspective, customs and experience. Lawyers, for example, will describe a traffic accident in entirely different words than engineers, politicians or the press. Nearly every larger company today has a corporate language with its own terminology, but each one is different. Even in the standards we find many different definitions for the same term and many terms for the same concept, each belonging to a different community. Does that mean standardization does not work? No, it means that every community has its own language and has a right to keep it. But as nobody wants to fall back into a time without terminology management, we need to keep in mind who understands what in our text. In many cases misunderstandings go undetected because the meanings are similar enough for the differences to be missed!
To address this, we did some research on the use of terminological ontologies with language checkers for controlled language. Because Microsoft Word is widely used by our industry partners, we are currently building a Word plugin that connects to an ontology and checks the text while it is being written. It does not only check whether a text uses deprecated denominations; it also checks which terms are common in which community and how they are understood there. If the author provides information about his or her own community background and the target community of the text, the tool automatically checks where misunderstandings are likely to occur. The necessary information is provided by the terminology. In real life we do not use one single terminology but several terminologies in parallel. The tool allows this to be modeled by including a hierarchy of terminologies in use. For example, a user can include a general lexicon at the base level, add a company glossary above that, and place a project glossary and an individual glossary on top. The tool also embeds the corresponding definitions directly in the docx file or generates a printable document-specific glossary. This way it stays clear which meaning the author associated with the terms while writing the text. The tool also shows the author the gaps in his or her glossaries and makes it possible to close them.
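The layered lookup and the community check described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: the names `Entry`, `resolve` and `check_text`, the glossary contents and the community labels are all hypothetical, and tokenization is reduced to whitespace splitting.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical glossary entry for one term."""
    concept: str             # identifier of the concept the term denotes
    definition: str
    communities: frozenset   # communities in which the term is common
    deprecated: bool = False


def resolve(term, layers):
    """Return the entry from the most specific layer that defines `term`.

    `layers` is ordered from most general to most specific,
    so more specific glossaries override more general ones.
    """
    for glossary in reversed(layers):
        if term in glossary:
            return glossary[term]
    return None


def check_text(words, layers, target_community):
    """Flag deprecated terms and terms uncommon in the target community."""
    findings = []
    for word in words:
        entry = resolve(word.lower(), layers)
        if entry is None:
            continue  # not a known term; a real tool might report a glossary gap
        if entry.deprecated:
            findings.append((word, "deprecated"))
        elif target_community not in entry.communities:
            findings.append((word, "unfamiliar to target community"))
    return findings


# Example glossaries (invented for illustration only).
general = {
    "accident": Entry("accident", "An unplanned event that causes damage.",
                      frozenset({"legal", "engineering", "press"})),
}
company = {
    "incident": Entry("accident", "Company term for an unplanned damaging event.",
                      frozenset({"engineering"})),
    "mishap": Entry("accident", "Former company term, no longer to be used.",
                    frozenset({"engineering"}), deprecated=True),
}
project = {
    "incident": Entry("near_miss", "In this project: an event that almost caused damage.",
                      frozenset({"engineering"})),
}
layers = [general, company, project]

findings = check_text("The incident followed an earlier mishap".split(),
                      layers, target_community="legal")
print(findings)
```

In this example the project glossary overrides the company meaning of "incident", which is exactly the situation in which a reader from another community could misread the text without noticing.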
This way implicit knowledge can be made explicit without being overly restrictive or universal. Moreover, from a terminological and ontological perspective, controlled language can be handled much more flexibly and tied directly to the author and the reader. In the presentation I would like to show how this works in general and the basic ideas behind it, in order to discuss the possibilities that terminological ontologies open up for controlled language. I would also like to make a short digression on the use of controlled language in requirements engineering, which is another related focus of our research.