My research focuses on semantic search, the topic of my PhD, and is motivated by the way the World Wide Web is evolving.
With the advent of Linked Data, the Semantic Web is finally gaining momentum, while at the same time more and more Web 2.0 applications are being deployed. The two trends are increasingly interwoven and form the basis of the next-generation Web, Web 3.0, in which conventional websites and data sources (e.g., databases, XML, HTML, plain text, multimedia) should be integrated and linked as well. The result is a plethora of information in various representation forms, which can be mapped to an information pool composed of a knowledge base (in RDF/S) and a text index. Searching such data effectively requires an engine that can explore linked content with varying degrees of formality. For comprehensive search support, all kinds of queries should be allowed: free-text queries, formal queries, and hybrid queries that combine free text with formal parts. Furthermore, various kinds of answers should be delivered (e.g., facts, documents, and documents with relevant facts). Therefore, we propose a search method that combines a triple-based fact retrieval approach with a semantic document retrieval approach based on spreading activation.
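To give a rough idea of how spreading activation can rank documents linked to query concepts, here is a minimal sketch in Python. It is a generic illustration, not the method from my publications: the graph, the node names, the edge weights, and the `decay`, `threshold`, and `max_steps` parameters are all assumptions made up for this example.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """Propagate activation from seed nodes through a weighted graph.

    graph: dict mapping a node to a list of (neighbor, edge_weight) pairs.
    seeds: dict mapping a node to its initial activation.
    Returns a dict mapping each reached node to its accumulated activation.
    """
    activation = defaultdict(float)
    activation.update(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, []):
                passed = energy * weight * decay
                # Drop contributions that are too weak to matter.
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


# Hypothetical toy graph: concepts from a knowledge base linked to documents.
graph = {
    "concept:SemanticSearch": [("doc:A", 1.0), ("concept:RDF", 0.5)],
    "concept:RDF": [("doc:B", 1.0)],
}

# The query is assumed to have been mapped to one concept already.
scores = spread_activation(graph, {"concept:SemanticSearch": 1.0})

# Rank only the document nodes by their accumulated activation.
ranking = sorted(
    (node for node in scores if node.startswith("doc:")),
    key=lambda node: scores[node],
    reverse=True,
)
print(ranking)
```

In this toy setup a document directly attached to the query concept ends up with more activation than one that is only reachable through a related concept, which is the intuition behind using the knowledge base to rank documents.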
For more information see publications.
In my opinion, the Semantic Web and Web 2.0 provide the basic principles of the next generation of the
World Wide Web, the so-called Web 3.0. The Semantic Web ('Web of Data', 'Linked Data') is
based on a formal description (RDF, RDFS, OWL) of resources, i.e., data and services, which allows
them to be uniquely identified and defines the relations between them. Furthermore, such a formal
description is machine-readable and machine-interpretable, thus enabling the development of learning
methods. Web 1.0 and legacy IT content, i.e., linked documents (HTML, XHTML), databases, plain-text
and multimedia files, can also be integrated and linked using such formal descriptions. Web 2.0 has a
less technical focus and describes a social phenomenon of activities on the Web. Web 2.0 is about
linked people and linked social services, such as social media sharing platforms (mostly with
folksonomies), blogs (Technorati), and wikis (Wikipedia).
The ideas of the Semantic Web and Web 2.0 are increasingly interwoven, and Social Semantic Web
applications are being developed. Examples include semantic wikis (Semantic MediaWiki, Kaukolu),
semantic blogs (SemBlog), social semantic networks (PeopleAggregator), and
social semantic information spaces (NEPOMUK).
The Semantic Desktop is a means for personal knowledge management.
It brings the Semantic Web to desktop computers through the consistent application of Semantic Web standards
such as the Resource Description Framework (RDF) and RDF Schema (RDFS). Documents, e-mails, contacts, and multimedia files are identified by
URIs across application boundaries. The user can annotate, classify, and relate these resources, and
can represent abstract concepts such as projects, events, and groups, expressing their own view in a Personal Information Model (PIMO).
On a full-featured Semantic Desktop many data sources are integrated, and a personal categorization system of
topics, projects, people, events, etc. is established. The text and metadata of all documents are indexed and categorized.
Together with the collected metadata about these concepts, a critical mass of information becomes available
to the user, both browsable and searchable.
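As a small illustration of how desktop resources identified by URIs can be related to a personal concept and then retrieved, the following Python sketch stores statements as plain (subject, predicate, object) tuples and answers a simple pattern query. It does not use a real RDF library or the actual PIMO vocabulary: the `ex:` identifiers, the file path, and the e-mail address are hypothetical.

```python
# All identifiers below are made up for illustration; "ex:" stands for a
# hypothetical namespace, not the real PIMO ontology.
triples = {
    ("ex:ProjectX", "rdf:type", "ex:Project"),
    ("file:///home/user/report.pdf", "ex:relatedTo", "ex:ProjectX"),
    ("mailto:alice@example.org", "ex:relatedTo", "ex:ProjectX"),
    ("mailto:alice@example.org", "rdf:type", "ex:Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which resources has the user related to the project, regardless of
# which application they come from?
related = [s for s, _, _ in match(triples, p="ex:relatedTo", o="ex:ProjectX")]
print(related)
```

Because files and contacts share one identifier scheme and one statement format, a single query covers resources from different applications.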
For more information visit
Machine Learning (ML) is a subfield of artificial intelligence. It deals with the development of algorithms
that allow the computer to learn from data.
I'm interested in machine learning algorithms, their characteristics, applications and limits.
Information retrieval (IR) deals with the representation and organization of, and access to, information items
such as text, sound, images, and data in collections such as documents, databases, metadata, and hypertext.
I became acquainted with information retrieval during my diploma thesis.
I have developed, implemented, and evaluated four methods for supervised word sense disambiguation,
one of the most challenging tasks of information retrieval. The approaches are based on methods of statistical analysis
such as Singular Value Decomposition and co-occurrence matrices, and on machine learning algorithms such as the Naive Bayes
classifier and k-means clustering.
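To illustrate the general idea of Naive Bayes word sense disambiguation, here is a toy sketch in Python. It is not the implementation from my thesis: the training examples for the ambiguous word "bank", the sense labels, and the use of add-one (Laplace) smoothing are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from (context_words, sense) training pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify(model, words):
    """Pick the sense with the highest log-probability for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in words:
            if word in vocab:  # words unseen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical labeled contexts for the ambiguous word "bank".
examples = [
    (["money", "deposit", "account"], "finance"),
    (["loan", "money", "interest"], "finance"),
    (["river", "water", "shore"], "river"),
    (["fishing", "river", "mud"], "river"),
]
model = train(examples)

print(classify(model, ["money", "loan"]))
print(classify(model, ["water", "fishing"]))
```

The classifier treats the context words as independent given the sense, which is a strong simplification but keeps training down to counting.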