What are ontologies and knowledge representation?

Background knowledge: from glossary to ontology

Semantic models for knowledge representation, especially ontologies, allow knowledge to be represented formally in a way that both humans and machines can interpret (cf. Ehrig, Hartmann and Schmitz, 2004). Such models are used in a wide variety of scientific disciplines; here they are examined with regard to their application in information technology. Their expressive power, or semantic richness, increases in the following order: Glossary - Taxonomy - Thesaurus - Topic Map - Ontology (see Fig. 11).

Figure 11: Semantic stairs; Source: Blumauer, Andreas; Pellegrini, Tassilo (2006). Semantic Web and Semantic Technologies. Central terms and distinctions. In: Pellegrini, Tassilo; Blumauer, Andreas (eds.): Semantic Web. Paths to a networked knowledge society. Berlin: Springer Verlag, pp. 9-27.

A glossary is a simple list of terms with accompanying explanations. Relationships to other terms are not formally recorded.
A taxonomy is a hierarchy of terms that arranges elements in a superordinate/subordinate order. Apart from this hierarchical structure, no relationships between elements can be defined (see Ullrich, Maier and Angele, 2003, p. 3). The bookmarks of a web browser are an example: creating folders creates categories in which the bookmarks are stored. Relationships between bookmarks within a category (e.g. "is English version of") cannot be expressed, and a bookmark that belongs to several categories must be saved once in each of them.
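
The bookmark example can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name Folder, the folder names and the choice of bookmark are hypothetical and do not come from the cited sources. The sketch shows that "folder contains item" is the only relation available and that a bookmark relevant to two categories ends up stored twice.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """A node in a hypothetical bookmark taxonomy: folders nest, bookmarks are leaves."""
    name: str
    subfolders: list = field(default_factory=list)
    bookmarks: list = field(default_factory=list)

    def paths(self, prefix=""):
        """Yield (folder path, url) for every bookmark at or below this folder."""
        here = f"{prefix}/{self.name}"
        for url in self.bookmarks:
            yield here, url
        for sub in self.subfolders:
            yield from sub.paths(here)


WORDNET = "http://wordnet.princeton.edu/"

# The same bookmark is relevant to two categories, so it is saved twice.
root = Folder("Bookmarks", subfolders=[
    Folder("Research", bookmarks=[WORDNET]),
    Folder("Linguistics", bookmarks=[WORDNET]),
])

# The hierarchy is the only relation; nothing links the two copies to each other.
locations = [path for path, url in root.paths() if url == WORDNET]
print(locations)  # ['/Bookmarks/Research', '/Bookmarks/Linguistics']
```
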
The thesaurus extends the taxonomy model with two fixed relationships between objects: the similarity relation and the synonym relation. Synonymous terms can be declared as such, and objects with similar properties can be related to one another (see Ullrich, Maier and Angele, 2003, p. 4). Examples: Thesaurus, Topic Map and Ontology are "similar"; "Data Mining" and "KDD" are synonyms. The WordNet project (http://wordnet.princeton.edu/), a general-purpose linguistic thesaurus for the English language, is worth mentioning in this context. In addition to synonym relationships, it models more complex relationships between terms (e.g. hypernyms and idioms).
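
As a rough sketch of the thesaurus model, the following Python class keeps the hierarchy of a taxonomy and adds exactly the two fixed relations named above. The class and method names, the parent term "Semantic model" and the idea of using synonyms to widen a search term are illustrative assumptions, not part of any thesaurus standard.

```python
from collections import defaultdict


class Thesaurus:
    """Taxonomy (broader term) plus two fixed, symmetric relations."""

    def __init__(self):
        self.broader = {}                 # term -> parent term (the taxonomy part)
        self.synonyms = defaultdict(set)  # synonym relation
        self.similar = defaultdict(set)   # similarity relation

    def add_synonym(self, a, b):
        # Store both directions so the relation is symmetric.
        self.synonyms[a].add(b)
        self.synonyms[b].add(a)

    def add_similar(self, a, b):
        self.similar[a].add(b)
        self.similar[b].add(a)

    def expand(self, term):
        """Return the term together with its declared synonyms."""
        return {term} | self.synonyms.get(term, set())


t = Thesaurus()
t.broader["Thesaurus"] = "Semantic model"  # hypothetical parent term
t.add_synonym("Data Mining", "KDD")
t.add_similar("Thesaurus", "Topic Map")
t.add_similar("Thesaurus", "Ontology")
t.add_similar("Topic Map", "Ontology")

print(sorted(t.expand("KDD")))         # ['Data Mining', 'KDD']
print(sorted(t.similar["Thesaurus"]))  # ['Ontology', 'Topic Map']
```

No relation type other than these two can be added without changing the class itself, which is the limitation that topic maps and ontologies remove.
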
The Topic Map is an ISO standard based on XML (http://www.isotopicmaps.org/sam/). It consists of topics (abstract things), associations, scopes (areas of validity for topics) and occurrences (assigned documents outside the topic map) (see Ullrich, Maier and Angele, 2003, p. 5). Unlike in a thesaurus, associations between objects can be defined freely by the modeler.
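
The four building blocks can be pictured with a small in-memory model. The sketch below is a simplification under assumptions: it is not the XML serialization of the standard, the class and method names are hypothetical, and scope is reduced to a single label per topic. Its point is that the association type is an arbitrary string chosen by the modeler.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    scope: str = "unconstrained"                     # area of validity (simplified)
    occurrences: list = field(default_factory=list)  # documents outside the map


@dataclass
class Association:
    kind: str       # freely chosen by the modeler
    members: tuple  # names of the participating topics


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, topic):
        self.topics[topic.name] = topic

    def associate(self, kind, *names):
        """Link existing topics with an association of a user-defined kind."""
        missing = [n for n in names if n not in self.topics]
        if missing:
            raise KeyError(f"unknown topics: {missing}")
        self.associations.append(Association(kind, names))

    def related(self, name, kind):
        """Names of topics linked to `name` by an association of the given kind."""
        return sorted(
            m
            for a in self.associations
            if a.kind == kind and name in a.members
            for m in a.members
            if m != name
        )


tm = TopicMap()
tm.add_topic(Topic("WordNet", scope="English",
                   occurrences=["http://wordnet.princeton.edu/"]))
tm.add_topic(Topic("Thesaurus"))
tm.associate("is example of", "WordNet", "Thesaurus")

print(tm.related("WordNet", "is example of"))  # ['Thesaurus']
```
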
In philosophy, an ontology is a theory about the essence of being. Researchers in the fields of artificial intelligence and the web have adopted the term into their jargon to mean a document or file that formally defines the relationships between terms. The typical kind of ontology for the web has a taxonomy and a set of inference rules (cf. Berners-Lee, Hendler and Lassila, 2001). In the context of the “Semantic Web” (cf. Berners-Lee, Hendler and Lassila, 2001) and of knowledge sharing and knowledge reuse, the term is often defined as “an explicit specification of a (common) conceptualization” (cf. Blumauer and Pellegrini, 2006, p. 12; cf. also Ullrich, Maier and Angele, 2003, p. 6).

A powerful set of rules makes it possible to express relationships between objects of the ontology (and of other ontologies) using "if-then" relationships, assignments, logical operators and other functions. In addition, ontologies make it possible to separate schema (data model) from content (see Ullrich, Maier and Angele, 2003, p. 7). The following typology, based on Blumauer and Pellegrini (2006, p. 16), shows the range of ways in which the term "ontology" is used:

  • Domain ontologies model knowledge within a domain, e.g. tourism or bioinformatics.
  • Metadata ontologies serve as vocabularies for describing information sources or types, e.g. Dublin Core (http://dublincore.org/) and RSS (http://web.resource.org/rss/1.0/spec).
  • Generic ontologies can form the basis for a variety of more specific domain ontologies.
  • So-called lightweight ontologies can be expressed by, among other things, the Topic Map standard.
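
The "if-then" rules and the separation of schema and content described above can be illustrated with a minimal sketch for a tourism domain. Everything in it is an assumption made for illustration: the class names, the instance "GrandHotel", the triple layout and the two rules are hypothetical and do not reproduce any particular ontology language or reasoner.

```python
# Schema (data model): a hypothetical class hierarchy for tourism.
SCHEMA = {
    ("Hotel", "subClassOf", "Accommodation"),
    ("Accommodation", "subClassOf", "TouristService"),
}

# Content (instance data), kept apart from the schema.
CONTENT = {
    ("GrandHotel", "type", "Hotel"),
}


def infer(schema, content):
    """Apply two if-then rules repeatedly until no new fact appears.

    Rule 1: if A subClassOf B and B subClassOf C, then A subClassOf C.
    Rule 2: if x type A and A subClassOf B, then x type B.
    """
    facts = set(schema) | set(content)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                # Both rules need a second fact "o subClassOf o2".
                if p2 != "subClassOf" or o != s2:
                    continue
                if p == "subClassOf":
                    new.add((s, "subClassOf", o2))  # Rule 1
                elif p == "type":
                    new.add((s, "type", o2))        # Rule 2
        if new <= facts:
            return facts
        facts |= new


facts = infer(SCHEMA, CONTENT)
print(("GrandHotel", "type", "TouristService") in facts)  # True
```

Because the rules only refer to the schema vocabulary, new instance data can be added to the content without touching the schema or the rules.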

Topic maps and ontologies are suitable for improving search systems, navigation and visualization. Ontologies additionally allow heterogeneous data sources to be integrated and are considered future-proof, since they cover the capabilities of taxonomies, thesauri and topic maps (cf. Ullrich, Maier and Angele, 2003, pp. 9-10).

How are the documents prepared for text mining?