From Hypermedia Information Retrieval to
Knowledge Management in Enterprises

Andreas Abecker, Michael Sintek, and Holger Wirtz
German Research Center for Artificial Intelligence (DFKI) GmbH

To appear in: IFMIP-98, First International Forum on Multimedia & Image Processing. Anchorage, Alaska, USA. May 1998


We present a scenario for storage and exploitation of corporate knowledge assets. At the heart of an intelligent information infrastructure, the Organizational Memory (OM) holds manifold kinds of multi- and hypermedia documents, as well as formal and non-formal knowledge, and puts them to beneficial use. The scenario is grounded in advanced information retrieval (IR) technology: knowledge-rich methods for logic-based retrieval and conceptual indexing are the basis for its design. We focus on information modeling issues and introduce a comprehensive information-source description schema. The schema characterizes a knowledge item along the dimensions form (logical and layout structure, described in the Information Source Ontology), content (conceptual structure, described in the enterprise's Product Domain Ontology), and meta-content (context factors in terms of the Enterprise Ontology).

KEYWORDS: information retrieval, information modeling, organizational memory, knowledge management


Organizational knowledge management aims at better capitalizing on existing knowledge assets and at facilitating the creation of new knowledge. An OM which captures, stores, disseminates, and eases context-dependent utilization of valuable corporate knowledge is a central prerequisite for the information-technology part of knowledge management. Contributions from Artificial Intelligence mostly focus on formal knowledge representations for capturing individual expertise, e.g., in expert systems or case bases. However, empirical research finds only a few systems based on formal knowledge bases in operational use in industrial practice [4], mainly because such systems are difficult to maintain. Our own industrial experience revealed further problems (cf. [8]). First, supporting knowledge workers means jointly handling and synergistically using manifold heterogeneous sources of knowledge, data, and (multimedia) documents. Moreover, valuable tacit knowledge is often hard or impossible to formalize because much of it is implicit. Even where formalization is possible in principle, it does not necessarily pay off under a rigorous cost-benefit analysis. In an enterprise, we thus face knowledge of different degrees of formality and in manifold representations:

Organizational knowledge of a formal nature, like business rules, design guidelines, etc., often receives insufficient attention; it is hard for all employees to know and remember. Frequent changes are a further problem. Where possible with reasonable effort, such knowledge should be formally represented to ensure its automatic utilization.

Individual or group experiences are often tacit knowledge which is not sufficiently documented and shared with other employees. First steps toward documenting and sharing them are lessons-learned archives, best-practice databases, etc. Because such experiences are typically very hard to formalize, they should be recorded as semi-structured electronic memos that still rely on natural language but are indexed with workflow-specific terms to enable their sharing and reuse.

Knowledge contained in (multi- and hypermedia) documents and in databases, e.g., technical documentation, hypertext manuals, product data, video tapes, images, office letters, old workflow instances etc. is often hard to find, exploit, and utilize. It is buried in large file cabinets and dispersed over several data and document bases. Here, formal meta-information can ease its retrieval and exploitation.

In order to effectively cope with the heterogeneity of information, we outline an OM's role as follows: The OM's objective is intelligent assistance to the user rather than automatic problem solving. This assistance is achieved by actively providing information useful in the current workflow context. Formal knowledge is primarily used for detecting an actual information need, finding potentially useful existing information sources, and determining their relevance for the task at hand. In order to keep the effort at a minimum for up-front knowledge engineering during system development, existing formal knowledge structures should be integrated as far as possible.

An OM designed along this outline essentially acts as an active multimedia IR system whose content consists of the existing information sources. Formal knowledge, integrated wherever reasonable, partly automates problem solving, but mainly helps to find more informal knowledge documents. Interoperability of representations is achieved by describing all sources within a common information space. In this paper, we concentrate on information modeling issues, i.e., on the question of how this common information space can be designed. To this end, we first review recent developments in multi- and hypermedia IR and then adapt and extend these ideas to our scenario.


Logic-Based Information Retrieval understands retrieval as the task of finding all documents d for a given query q which are likely to imply q, i.e., for which d -> q holds. Retrieval is seen as logical inference which can profit from different sources of background knowledge. The inference works on formal representations of both documents d and query q. Since a query typically specifies the user's real information need only rather vaguely, and since the content of documents can be modeled only to a certain extent, a considerable amount of vagueness and uncertainty is intrinsic to the inference process. This is reflected by probabilistic inferences which aim at computing the probability P(d -> q) that d implies q.
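
As a minimal sketch of this probabilistic view, assume that each document is represented by a set of index terms, each of which holds with some probability, that a query is a conjunction of terms, and that the term events are independent. None of these assumptions is taken from the systems cited here, and all names and values below are hypothetical. Under these assumptions, P(d -> q) reduces to a product of term probabilities, and retrieval becomes ranking by that value.

```python
from math import prod


def implication_probability(doc_terms, query_terms):
    """Estimate P(d -> q) for a conjunctive query.

    doc_terms maps an index term to the probability that the document
    is about that term; terms missing from the document count as 0.
    Assumes, for illustration only, that term events are independent.
    """
    return prod(doc_terms.get(term, 0.0) for term in query_terms)


def rank(documents, query_terms):
    """Return (doc_id, score) pairs sorted by descending P(d -> q)."""
    scored = [(doc_id, implication_probability(terms, query_terms))
              for doc_id, terms in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection.
documents = {
    "memo-1": {"pump": 0.9, "quality": 0.8},
    "manual-7": {"pump": 1.0, "installation": 0.7},
}
ranking = rank(documents, ["pump", "quality"])
```

A real logic-based engine would replace the independence assumption with inference over background knowledge; the sketch only shows how uncertainty in the document representation carries over into the ranking.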

Dimensions of Document Models: Usually, document modeling in logic-based IR is concerned with three dimensions of document description [9]:

  1. the logical structure, e.g., of a proceedings volume with sections as parts, articles as the sections' parts, and title, abstract, and text body as the articles' parts,
  2. the layout structure, e.g., of a business letter with a rectangular bold-faced region in the upper left corner of the sheet, and
  3. the conceptual structure, e.g., of a technical memo which describes the content of a document making, for instance, statements about a product's quality.

Most IR systems also use some factual knowledge about the document, e.g., the author's name, the publisher, etc. We will refer to these document-extrinsic features as document meta-content or document contextual structure.

The most interesting advantage of such a comprehensive modeling of documents (and of any other information source) is the possibility to attach additional background knowledge to each of the modeled dimensions and to let these knowledge bases interact. The most important example is to build a sophisticated model of the domain the documents talk about and to index documents with pointers into this domain model. This conceptual indexing approach for sophisticated content representation allows, e.g., the formulation of domain-specific search heuristics [1] or a more precise query formulation [12]. It is not only a way of indexing non-text documents (e.g., video tapes or images) [6], but also a natural means of integrating information from different sources with different vocabularies [2].

Document Modeling Languages: Having identified the dimensions useful for describing information sources, we must clarify what language is needed to express them. From the examples above (especially the detailed domain modeling) it is already clear that we need at least the basic abstraction mechanisms which constitute a structurally object-oriented formalism: classification of objects into classes, generalization of classes to superclasses, aggregation to express ``part-of'' relationships, and attribute-value assertions to specify certain class instances. We also need inferential capabilities for formulating search heuristics or for following links in hypermedia documents. In order to provide the needed expressiveness plus a means of coping with the uncertainty of the inference mentioned above, typical approaches are based on probabilistic extensions of, e.g., terminological logics or deductive databases [10,11]. In the following section, we will show how these mechanisms can be employed for supporting corporate knowledge management.
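
To make the role of generalization in conceptual indexing concrete, the following sketch indexes documents with classes of a small domain model and lets a query for a class also retrieve documents indexed with its subclasses. The domain model, the document identifiers, and the function names are hypothetical; aggregation, attribute-value assertions, and uncertainty are left out for brevity.

```python
# Hypothetical domain model: each class points to its direct superclass.
SUPERCLASS = {
    "centrifugal_pump": "pump",
    "piston_pump": "pump",
    "pump": "product",
    "valve": "product",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)  # None once the root has been passed
    return False


# Conceptual index: document id -> domain classes the document is about.
INDEX = {
    "memo-1": {"centrifugal_pump"},
    "manual-7": {"valve"},
    "video-3": {"piston_pump", "valve"},
}


def retrieve(concept):
    """Sorted ids of documents indexed with the concept or a subclass of it."""
    return sorted(doc for doc, classes in INDEX.items()
                  if any(is_a(c, concept) for c in classes))
```

With this index, `retrieve("pump")` finds the memo and the video even though neither is indexed with the class `pump` itself, which is the kind of inference a purely keyword-based index cannot offer.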


   Figure 1: Overview of an Organizational Memory

Figure 1 sketches our approach to moving from information retrieval up to knowledge management. We start with heterogeneous multi- and hypermedia information sources which are to work together to provide comprehensive problem-solving support. To this end, all sources are described according to a homogeneous, comprehensive description schema (knowledge item descriptions, KIDs). Recent investigations on OM organization principles [13] show that the following factors are essential for determining the knowledge which is useful to support an activity: the task to be performed, the role the actor plays for this task, and the domain the task is done within. Hofer-Alfeis & Klabunde [7] concretize these factors in enterprise terminology as business process activity, organizational role, and product to be processed. This view gives us a first specialization of the general IR scenario for the enterprise knowledge management problem:

  1. conceptual structure: the topics a knowledge item deals with are expressed in terms of the enterprise's product models. Of course, a useful product domain ontology will also define associated concepts like suppliers, buyers, etc., and
  2. contextual structure: meta-content like the document-creation context or possible application areas is stated in terms of the enterprise ontology, whose main part consists of business process models.

Using these formal structures for indexing knowledge items has the advantage that already existing formalizations can be reused. In contrast to conventional IR approaches, we consider the context dimension very important. Celentano et al. [3] show how rich knowledge about business processes, started process instances, and dependencies between documents in different business process activities can be employed for powerful search and retrieval of office letters. We adopt this view, but extend it from office letters to all information sources used in a business process.

While the conceptual and contextual structure mainly help to find the appropriate sources (what can be found?), the representational ontology helps to extract and access knowledge from an information source and to find the appropriate piece of knowledge within a document (how is it found?). Though standard in office-letter processing, this idea must also be generalized to arbitrary information sources (e.g., a database has no layout, and its logical structure is typically identical to its conceptual one).
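
One possible way to picture a knowledge item description is as a record with one slot per dimension. The sketch below is an illustration under our own simplifying assumptions, not the schema itself: the field names, the example sources, and the exact-match test are hypothetical, and a real KID would point into the three ontologies rather than hold plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItemDescription:
    """One KID: a uniform description of an arbitrary information source."""
    source: str                                  # locator of the described item
    form: dict = field(default_factory=dict)     # logical and layout structure
    content: set = field(default_factory=set)    # product-domain concepts
    context: dict = field(default_factory=dict)  # enterprise-ontology terms


def matches(kid, concept, **context):
    """True if the KID covers the concept and agrees with every context factor."""
    return concept in kid.content and all(
        kid.context.get(key) == value for key, value in context.items())


# Hypothetical descriptions of two very different sources.
kids = [
    KnowledgeItemDescription(
        source="letters/offer-17",
        form={"type": "business_letter", "parts": ["address", "subject", "body"]},
        content={"pump", "supplier"},
        context={"process": "purchasing", "activity": "select_supplier"},
    ),
    KnowledgeItemDescription(
        source="db/product_data",
        form={"type": "database"},  # no layout structure to describe
        content={"pump"},
        context={"process": "design"},
    ),
]
hits = [k.source for k in kids if matches(k, "pump", process="purchasing")]
```

Both sources are about pumps, but only the letter was created in the purchasing process, so the context dimension narrows the result to `letters/offer-17`.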

Having described the information sources, we can provide active support within the work processes. To this end, we set up an extended business process model which attaches to each activity its information goals: these are expressed as variables from the domain ontology which the respective task has to fill (e.g., the goal of a decision activity within a purchasing process may be to determine which supplier should be engaged). Queries that reflect actual information needs must then be sent to the OM automatically. They are interpreted on the basis of the current business process activity, the goal to be reached, the business context factors of the activity (e.g., the person performing the task and her department), and the goals already achieved (e.g., the already determined features of the product to be bought). The query evaluation is done as a knowledge-intensive retrieval process with a number of background-knowledge sources, the explanation of which is beyond the scope of this paper.
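
The following sketch illustrates how such a query might be assembled from the workflow context without any user input. The dictionary layout, the key names, and the purchasing example values are hypothetical; the actual query language and its evaluation are not described in this paper.

```python
def build_query(activity, actor, achieved):
    """Assemble a retrieval request from the current workflow context.

    activity: the activity name, its process, and its open information
              goal (a variable from the domain ontology)
    actor:    business context factors of the person performing the task
    achieved: goals already filled earlier in the process instance
    """
    return {
        "goal": activity["goal"],
        "process": activity["process"],
        "activity": activity["name"],
        "context": dict(actor),          # copied so the query is self-contained
        "constraints": dict(achieved),
    }


# Hypothetical purchasing example following the text.
activity = {"name": "select_supplier", "process": "purchasing", "goal": "supplier"}
actor = {"person": "j.doe", "department": "procurement"}
achieved = {"product_type": "pump", "max_price": 5000}
query = build_query(activity, actor, achieved)
```

The point of the design is that every part of the query is derived from information the workflow system already holds, so the OM can offer support before the user asks for it.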


Due to space limitations, we could give only a glimpse of our comprehensive Organizational Memory model. At the core of an active support system for knowledge-intensive problems stands a meta-information system comprising heterogeneous sources of data, formal knowledge, and documents. These sources are described in a schema which is derived from state-of-the-art information modeling in hyper- and multimedia IR. Beyond that state of the art, we emphasize the importance of knowledge-item context and try to reuse existing formal structures wherever possible. Further, we try to add as much formal knowledge as reasonable to ease exact retrieval and to enable formal inferences where parts of the problem can be solved automatically. The main goal is not to accompany or replace existing information systems, but to exploit them fully and easily by adding meta-knowledge and complementary knowledge.

The core of our retrieval engine is currently being implemented as a structurally object-oriented description formalism mainly along the ideas of [11]. As needed for conceptual indexing, it allows for class instances as well as whole classes to be used as attribute values. Attribute-value assertions can be equipped with a probability. Uncertain assertions can be declared as disjoint or independent events. The system is designed as an object-oriented interface on top of a probabilistic relational algebra which in turn is mapped onto the conventional relational data model. The implementation is done in Java with JDBC.
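
The distinction between disjoint and independent events matters because it determines how the probabilities of uncertain assertions combine. The sketch below shows the two standard combination rules from elementary probability; it is an illustration in Python with hypothetical function names, not an excerpt from the Java implementation or from the probabilistic relational algebra.

```python
from math import prod


def p_any(probabilities, relation):
    """Probability that at least one of several uncertain assertions holds.

    "disjoint":    the events exclude each other, so probabilities add up
    "independent": one minus the product of the complements
    """
    if relation == "disjoint":
        total = sum(probabilities)
        if total > 1.0 + 1e-12:
            raise ValueError("disjoint events cannot sum to more than 1")
        return total
    if relation == "independent":
        return 1.0 - prod(1.0 - p for p in probabilities)
    raise ValueError(f"unknown relation: {relation}")


def p_all(probabilities, relation):
    """Probability that all of the assertions hold together."""
    if relation == "disjoint":
        # Two or more mutually exclusive events can never co-occur.
        return 0.0 if len(probabilities) > 1 else prod(probabilities)
    if relation == "independent":
        return prod(probabilities)
    raise ValueError(f"unknown relation: {relation}")
```

For two assertions with probabilities 0.5 and 0.3, the disjunction has probability 0.8 if they are disjoint but only 0.65 if they are independent, so declaring the wrong relation would distort the resulting ranking.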


[1] C. Baudin, J. Gevins, V. Baya, and A. Mabogunje.
    DEDAL: using domain concepts to index engineering design information.
    In Proc. of the Meeting of the Cognitive Science Society, Bloomington, Indiana, 1992.
[2] C. Kindermann, T. Hoppe et al.
    The MIHMA demonstrator application: bmt line.
    Technical report, Non Standard Logics S.A., Paris and TU Berlin, April 1996.
[3] A. Celentano, M.G. Fugini, and S. Pozzi.
    Knowledge-based document retrieval in office environments: The Kabiria system.
    ACM Transactions on Information Systems, 13(3), 1995.
[4] Th. Davenport, S.L. Jarvenpaa, and M.C. Beers.
    Improving knowledge work processes.
    Sloan Management Review, Reprint Series 37(4), Summer, 1996.
[5] H.-P. Frei, D. Harman, P. Schäuble, and R. Wilkinson, editors.
    Proc. of the 19th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval.
    August 1996.
[6] A.S. Gordon and E.A. Domeshek.
    Conceptual indexing for video retrieval.
    In M. Maybury, editor, Intelligent Multimedia Information Retrieval, IJCAI-Workshop, Montreal, August 1995.
[7] J. Hofer-Alfeis and St. Klabunde.
    Approaches to managing the lessons learned cycle.
    In M. Wolf and U. Reimer, editors, Proc. PAKM'96, Basel, October 1996.
[8] O. Kühn and A. Abecker.
    Corporate memories for knowledge management in industrial practice: Prospects and challenges.
    Journal of Universal Computer Science, 3(8), 1997.
[9] C. Meghini, F. Rabitti, and C. Thanos.
    Conceptual modeling of multimedia documents.
    IEEE Computer, October 1991.
[10] C. Meghini and U. Straccia.
    A relevance terminological logic for information retrieval.
    In [5], 1996.
[11] Th. Rölleke and N. Fuhr.
    Retrieval of complex objects using a four-valued logic.
    In [5], 1996.
[12] B. van Bakel, R.T. Boon, N.J. Mars, J. Nijhuis, E. Oltmans, and P.E. van der Vet.
    Condorcet annual report.
    Report UT-KBS-96-12, University of Twente, September 1996.
[13] G. van Heijst, R. van der Spek, and E. Kruizinga.
    Organizing corporate memories.
    In R. Dieng and J. Vanwelkenhuysen, editors, KAW'96, Special Track on Corporate Memory and Enterprise Modeling, November 1996.

The LaTeX-to-HTML translation of this document was initiated by Andreas Abecker on 11/20/1997.