Jan Portisch has defended his PhD thesis

Jan Portisch has successfully defended his PhD thesis titled “Exploiting General-Purpose Background Knowledge for Automated Schema Matching” on August 25th.

In his thesis, Jan examined how large-scale, structured, general-purpose data sources such as WordNet, DBpedia, Wiktionary, or WebIsALOD can be exploited to find correspondences between data schemas and ontologies. To this end, he explored both symbolic and embedding-based methods. The thesis was sponsored by SAP and, besides its research contributions, also offers insights into applying these research methods in industrial settings.

The examination committee consisted of Prof. Catia Pesquita (University of Lisbon, Portugal), Prof. Han van der Aa, Prof. Christian Bizer, and Prof. Heiko Paulheim.

Abstract of the thesis:

The schema matching task is an integral part of the data integration process and is usually the first step in integrating data. Schema matching is typically very complex and time-consuming; it is therefore largely carried out by humans.
One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present in the schemas. Overcoming this lack of background knowledge is a core challenge in automating the data integration process.

Part I of this dissertation investigates in depth the task of matching semantic models, so-called ontologies, with the help of external background knowledge. Throughout the thesis, the focus lies on large, general-purpose resources, since domain-specific resources are unavailable for most domains. Besides new knowledge resources, the thesis also explores new strategies for exploiting such resources.

Part II presents a technical foundation for developing and comparing matching systems. The framework introduced there enables simple, modular matcher development (including the use of background knowledge sources) as well as extensive evaluations of matching systems.

Knowledge graphs, which have grown significantly in size in recent years, are among the largest structured sources of general-purpose background knowledge. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented.

Part IV presents numerous concrete matching systems that exploit general-purpose background knowledge, and analyzes and compares exploitation strategies and resources. The dissertation closes with a perspective on real-world applications.
