When developing complex intelligent systems, such as a hybrid semantic search engine that presents users with a completely new search paradigm, we face the problem of making these systems easy for the user to interpret. Therefore, I extended my research interests to human-machine interaction. With growing experience, I have become a specialist in user-group-specific interaction design for software and hardware interfaces. My main interest is the design of interaction with environmental sensors and actuators, especially for people without technical experience.
For more information see publications.
The subject of my doctoral thesis is semantic search in the context of today's information management systems. These systems include intranets and Web 3.0 applications, as well as many web portals that contain information in heterogeneous formats and structures. On the one hand, they contain data in a structured form; on the other hand, they contain documents that are related to this data. However, these documents are usually only partially structured or completely unstructured. For example, travel portals describe the period, destination, and cost of a trip through structured data, while additional information, such as descriptions of the hotel, destination, and excursions, is unstructured.
Today's semantic search engines focus on finding knowledge either in structured form (also called fact retrieval) or in semi-structured or unstructured form, which is commonly referred to as semantic document retrieval. Only a few search engines try to close the gap between these two approaches. Although they search structured and unstructured data simultaneously, the results are either analyzed independently or the search possibilities are highly limited: for example, they might support only specific question patterns. Accordingly, the information available in the system is not fully exploited, and the relationships between individual pieces of content in the respective information systems, as well as complementary information, cannot reach the user.
To close this gap, this thesis develops and evaluates a new hybrid semantic search approach that combines structured and semi-structured or unstructured content throughout the entire search process. This approach not only finds facts and documents, it also uses the relationships that exist between the different items of structured data at every stage of the search and integrates them into the search results. If the answer to a query is neither completely structured (like a fact) nor completely unstructured (like a document), this approach provides a query-specific combination of both. However, considering structured as well as semi-structured or unstructured content throughout the entire search process poses a special challenge to the search engine. The engine must be able to browse facts and documents independently, to combine them, and to rank the differently structured results in an appropriate order. Furthermore, the complexity of the data should not be apparent to the end user. Rather, the presentation of the contents must be understandable and easy to interpret, both when formulating the query and when presenting the results.
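The idea of combining structured facts and unstructured text in a single ranking can be illustrated with a minimal sketch. This is not the algorithm developed in the thesis; the `Trip` record, the `hybrid_search` function, the term-overlap measure, and the `weight` parameter are all assumptions made up for illustration, based on the travel portal example.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    """Hypothetical record: structured fields plus linked free text."""
    trip_id: str
    destination: str
    price: int
    description: str


def text_score(query_terms, text):
    """Fraction of query terms that occur in the text (toy relevance measure)."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)


def hybrid_search(trips, destination=None, max_price=None, keywords=(), weight=0.5):
    """Rank trips by a weighted sum of structured and textual evidence."""
    terms = [k.lower() for k in keywords]
    results = []
    for trip in trips:
        # Structured part: share of the given constraints that the record satisfies.
        checks = []
        if destination is not None:
            checks.append(trip.destination == destination)
        if max_price is not None:
            checks.append(trip.price <= max_price)
        fact = sum(checks) / len(checks) if checks else 0.0
        # Unstructured part: keyword overlap with the linked description.
        doc = text_score(terms, trip.description)
        score = weight * fact + (1 - weight) * doc
        if score > 0:
            results.append((score, trip.trip_id))
    # Highest score first; ties broken by identifier for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


trips = [
    Trip("t1", "Crete", 900, "quiet hotel near the beach with boat excursions"),
    Trip("t2", "Crete", 1500, "city hotel close to museums"),
    Trip("t3", "Mallorca", 700, "family hotel near the beach"),
]
ranking = hybrid_search(trips, destination="Crete", max_price=1000,
                        keywords=["beach", "excursions"])
```

In this sketch, a record that satisfies only part of the structured constraints can still rank above one that matches the destination but has no relevant text, which is the kind of query-specific combination described above. A real engine would also follow relationships between structured items, which this toy example leaves out.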
The central question of this thesis is whether a hybrid approach can answer the queries on a given database better than a semantic document search or fact-finding alone, or any other hybrid search that does not combine these approaches during the search process. The evaluations from the perspective of the system and users show that the hybrid semantic search solution developed in this thesis provides better answers than the methods above by combining structured and unstructured content in the search process, and therefore gives an advantage over previous approaches. A survey of users shows that the hybrid semantic search is perceived as understandable and preferable for heterogeneously structured datasets.
For more information see publications.
In my opinion, the Semantic Web and Web 2.0 form the foundation of the next generation of the World Wide Web, the so-called Web 3.0. The Semantic Web ('Web of Data', 'Linked Data') is based on a formal description (RDF, RDFS, OWL) of resources, i.e., data and services, which allows them to be uniquely identified and defines the relations between them. Furthermore, such a formal description is machine-readable and machine-interpretable, thus enabling the development of learning methods. Web 1.0 and legacy IT content, i.e., linked documents (HTML, XHTML), databases, plain-text files, and multimedia files, can also be integrated and linked using such formal descriptions. Web 2.0 has a less technical focus and describes a social phenomenon of activities on the Web. Web 2.0 is about linked people and linked social services, i.e., social media sharing platforms (mostly with folksonomies), blogs (Technorati), and wikis (Wikipedia).
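To make the notion of a formal, machine-readable resource description more concrete, the following minimal sketch represents RDF-style statements as subject-predicate-object triples in plain Python. The `http://example.org/` namespace, the resource names, and the `match` helper are hypothetical and chosen only for illustration; only the `rdf:type` URI is taken from the RDF vocabulary. A real application would use an RDF library and a query language such as SPARQL.

```python
# Hypothetical namespace for the example resources.
EX = "http://example.org/"
# rdf:type as defined in the RDF vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "hotel1", RDF_TYPE, EX + "Hotel"),
    (EX + "hotel1", EX + "locatedIn", EX + "crete"),
    (EX + "crete", RDF_TYPE, EX + "Destination"),
    (EX + "trip1", EX + "includesHotel", EX + "hotel1"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All resources typed as hotels, and everything stated about hotel1.
hotels = match(triples, p=RDF_TYPE, o=EX + "Hotel")
about_hotel1 = match(triples, s=EX + "hotel1")
```

Because every resource is identified by a URI, statements from different sources about the same resource can be merged simply by taking the union of their triple sets.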
The ideas of the Semantic Web and Web 2.0 are increasingly interweaving, and Social Semantic Web applications are being developed. Such applications include, e.g., semantic wikis (Semantic MediaWiki, Kaukolu), semantic blogs (SemBlog), social semantic networks (PeopleAggregator), and social semantic information spaces (NEPOMUK).
Information retrieval (IR) deals with the representation and organization of, and access to, information items such as text, sound, images, and data in various collections such as documents, databases, metadata, and hypertext.
I became acquainted with information retrieval during my diploma thesis.
I developed, implemented, and evaluated four methods for supervised word sense disambiguation,
one of the most challenging tasks in information retrieval. The approaches are based on statistical analysis methods
such as singular value decomposition and co-occurrence matrices, and on machine learning algorithms such as the naive Bayes
classifier and k-means clustering.
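As an illustration of the general technique, here is a minimal sketch of supervised word sense disambiguation with a naive Bayes classifier over context words. It is not a reconstruction of the four methods from the diploma thesis; the `train` and `classify` functions, the add-one smoothing, and the toy training data for the ambiguous word "bank" are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from (sense, context_words) training pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify(model, context_words):
    """Return the sense with the highest posterior log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_logp = None, -math.inf
    for sense in sorted(sense_counts):
        # Log prior of the sense.
        logp = math.log(sense_counts[sense] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_words:
            if word in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[sense][word] + 1) / denom)
        if logp > best_logp:
            best_sense, best_logp = sense, logp
    return best_sense


# Toy sense-annotated contexts for the ambiguous word "bank".
examples = [
    ("finance", ["money", "loan", "account"]),
    ("finance", ["deposit", "money", "interest"]),
    ("river", ["water", "shore", "fishing"]),
    ("river", ["river", "water", "boat"]),
]
model = train(examples)
sense = classify(model, ["money", "account"])
```

The classifier treats the context words as conditionally independent given the sense, which is the simplifying assumption that gives naive Bayes its name.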
Machine Learning (ML) is a subfield of artificial intelligence. It deals with the development of algorithms
that allow the computer to learn from data.
I'm interested in machine learning algorithms, their characteristics, applications and limits.