Andreas Abecker, Michael Sintek,
and Holger Wirtz
German Research Center for Artificial Intelligence (DFKI) GmbH
To appear in: IFMIP-98, First International Forum on Multimedia & Image Processing. Anchorage, Alaska, USA. May 1998
We present a scenario for storage and exploitation of corporate knowledge assets. At the heart of an intelligent information infrastructure, the Organizational Memory (OM) holds and puts to beneficial use manifold kinds of multi- and hypermedia documents, as well as formal and non-formal knowledge. The scenario is grounded in advanced information retrieval (IR) technology: knowledge-rich methods for logic-based retrieval and conceptual indexing form the basis of its design. We focus on information modeling issues and introduce a comprehensive information-source description schema. The schema characterizes a knowledge item along the dimensions form (logical and layout structure, described in the Information Source Ontology), content (conceptual structure, described in the enterprise's Product Domain Ontology), and meta-content (context factors in terms of the Enterprise Ontology).
KEYWORDS: information retrieval, information modeling, organizational memory, knowledge management
INTRODUCTION: CORPORATE KNOWLEDGE MANAGEMENT
Organizational knowledge management aims at improving the capitalization on existing knowledge assets and at facilitating the creation of new knowledge. An OM which captures, stores, disseminates, and eases context-dependent utilization of valuable corporate knowledge is a central prerequisite for the information-technology part of knowledge management. Contributions from Artificial Intelligence mostly focus on formal knowledge representations for capturing individual expertise, e.g., in expert systems or case bases. However, empirical research shows that only few systems based on formal knowledge bases are operational in industrial practice , mainly because they are difficult to maintain. Our own industrial experience revealed further problems (cp. ): First, supporting knowledge workers means jointly handling and synergetically using manifold heterogeneous sources of knowledge, data, and (multimedia) documents. Second, valuable tacit knowledge is often hard or impossible to formalize because much of it remains implicit. Third, even where formalization is possible in principle, it is not necessarily profitable under a rigorous cost-benefit analysis. In an enterprise, we thus face knowledge of different degrees of formality and in manifold representations:
Organizational knowledge of a formal nature, such as business rules and design guidelines, is often insufficiently observed because it is hard for all employees to know and remember. Frequent changes are a further problem. Where possible with reasonable effort, such knowledge should be formally represented to enable its automatic utilization.
Individual or group experiences are often tacit knowledge which is not sufficiently documented and shared with other employees. First steps toward this are lessons-learned archives, best-practice databases, etc. Because such experiences are typically very hard to formalize, they should be recorded as semi-structured electronic memos which still rely on natural language but are indexed with workflow-specific terms to enable their sharing and reuse.
Knowledge contained in (multi- and hypermedia) documents and in databases, e.g., technical documentation, hypertext manuals, product data, video tapes, images, office letters, and old workflow instances, is often hard to find, exploit, and utilize. It is buried in large file cabinets and dispersed over several data and document bases. Here, formal meta-information can ease its retrieval and exploitation.
In order to cope effectively with the heterogeneity of information, we outline an OM's role as follows: The OM's objective is intelligent assistance to the user rather than automatic problem solving. This assistance is achieved by actively providing information useful in the current workflow context. Formal knowledge is primarily used for detecting a current information need, finding potentially useful existing information sources, and determining their relevance for the task at hand. In order to minimize the up-front knowledge engineering effort during system development, existing formal knowledge structures should be integrated as far as possible.
An OM designed along this outline essentially acts as an active multimedia IR system which comprises existing information sources as its content. Formal knowledge, integrated wherever reasonable, partly automates problem solving but mainly helps to find more informal knowledge documents. Interoperability of representations is achieved by description within a common information space. In this paper, we concentrate on information modeling issues, i.e., on the question of how this common information space can be designed. To this end, we first review recent developments in multi- and hypermedia IR and then adapt and extend these ideas to our scenario.
INFORMATION MODELING IN INFORMATION RETRIEVAL
Logic-Based Information Retrieval understands retrieval as the task of finding, for a given query q, all documents d which are likely to imply q, i.e., for which d -> q holds. Retrieval is seen as logical inference which can profit from different sources of background knowledge. The inference works on formal representations of both documents d and query q. Since a user's real information need is typically specified only vaguely in a query and, on the other hand, the content of documents can only be modeled to a certain extent, considerable vagueness and uncertainty are intrinsic to the inference process. This is reflected by probabilistic inferences which aim at computing the probability P(d -> q) that d implies q.
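As a minimal sketch of how P(d -> q) can order documents, assume that each document is represented by index terms, each carrying a probability that the term correctly describes the document, that a query is a conjunction of terms, and that the term events are independent. These assumptions, the sample data, and all names below are illustrative only and are not taken from any of the systems discussed here.

```python
from math import prod

# Hypothetical document representations: term -> probability that the
# term correctly describes the document (illustrative values only).
DOCS = {
    "manual": {"pump": 0.9, "maintenance": 0.8},
    "memo": {"pump": 0.6, "supplier": 0.7},
    "letter": {"supplier": 0.9},
}


def implication_probability(doc_terms, query_terms):
    """Estimate P(d -> q) for a conjunctive query, assuming independent terms.

    A query term missing from the document contributes probability 0.
    """
    return prod(doc_terms.get(term, 0.0) for term in query_terms)


def rank(docs, query_terms):
    """Return (document id, probability) pairs with nonzero probability, best first."""
    scored = [(doc_id, implication_probability(terms, query_terms))
              for doc_id, terms in docs.items()]
    return sorted((s for s in scored if s[1] > 0),
                  key=lambda s: s[1], reverse=True)


ranking = rank(DOCS, ["pump"])
```

Under the independence assumption, a two-term query simply multiplies the per-term probabilities; a richer model would replace this product with inference over background knowledge.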
Dimensions of Document Models: Usually, document modeling in logic-based IR is concerned with three dimensions of document description :
Most IR systems also use some factual knowledge about the document, e.g., the author's name, the publisher, etc. We will refer to these document-extrinsic features as document meta-content or document contextual structure.
The most interesting advantage of such a comprehensive modeling of documents (and any other information source) is the possibility to attach additional background knowledge to each of the modeled dimensions and to let these knowledge bases interact. The most important example of this is to have a sophisticated model of the domain the documents talk about and to index documents with pointers into this domain model. This conceptual indexing approach to sophisticated content representation allows, e.g., the formulation of domain-specific search heuristics  or a more precise query formulation . It is not only a way of indexing non-text documents (e.g., video tapes or images) , but also a natural means of integrating information from different sources with different vocabularies .
Document Modeling Languages: Having identified the dimensions useful for describing information sources, we have to clarify what language is needed. From the examples above (especially the detailed domain modeling) it is already clear that at least the basic abstraction mechanisms which constitute a structurally object-oriented formalism are needed: classification of objects into classes, generalization of classes to superclasses, aggregation to express ``part-of'' relationships, and attribute-value assertions to specify certain class instances. We also need inferential capabilities for the formulation of search heuristics or for following links in hypermedia documents. In order to provide the needed expressiveness plus a means of coping with the mentioned uncertainty of the inference, typical approaches are based on probabilistic extensions of, e.g., terminological logics or deductive databases [10,11]. In the following section, we will show how these mechanisms can be employed for supporting corporate knowledge management.
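The conceptual indexing idea can be illustrated with a small sketch: documents of any medium are indexed with concepts from a domain model, and a query for a concept also retrieves documents indexed with its subconcepts. The concept hierarchy, the document names, and the function names are hypothetical and serve only as an illustration.

```python
# Hypothetical domain model: concept -> direct superconcept (None for the root).
SUPERCONCEPT = {
    "product": None,
    "pump": "product",
    "centrifugal_pump": "pump",
    "valve": "product",
}

# Hypothetical conceptual index: document -> concepts it is about.
INDEX = {
    "video_tape_7": {"centrifugal_pump"},
    "hypertext_manual": {"pump"},
    "office_letter_12": {"valve"},
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is a (transitive) subconcept of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUPERCONCEPT.get(concept)
    return False


def retrieve(query_concept):
    """Return, sorted by name, all documents indexed with the concept or a subconcept."""
    return sorted(doc for doc, concepts in INDEX.items()
                  if any(is_a(c, query_concept) for c in concepts))
```

Because the index refers to concepts rather than words, the video tape is found by a query for "pump" although it contains no text at all.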
INFORMATION MODELING FOR CORPORATE MEMORIES
Figure 1: Overview of an Organizational Memory
Figure 1 sketches our approach to moving up from information retrieval to knowledge management. We start with heterogeneous, multi- and hypermedia information sources which are to work together to provide comprehensive problem-solving support. To this end, all sources are described according to a homogeneous, comprehensive description schema (knowledge item descriptions, KIDs). Recent investigations on OM organization principles  show the following factors to be essential for determining the knowledge which is useful to support an activity: the task to be performed, the role the actor plays for this task, and the domain the task is done within. Hofer-Alfeis & Klabunde  concretize these factors in enterprise terminology as business process activity, organisational role, and product to be processed. This view gives us a first specialization of the general IR scenario for the enterprise knowledge management problem:
Using these formal structures for indexing knowledge items has the advantage that already existing formalizations can be reused. Compared to conventional IR approaches, we consider the context dimension very important.  show how rich knowledge about business processes, started process instances, and dependencies between documents in different business process activities can be employed for powerful search and retrieval of office letters. We adopt this view, but extend it from office letters to all information sources used in a business process.
While the conceptual and contextual structure mainly help finding the appropriate sources (what can be found?), the representational ontology helps to extract and access knowledge from an information source and to find the appropriate piece of knowledge within a document (how is it found?). Though standard in office-letter processing, this idea must also be lifted to arbitrary information sources (e.g., a database has no layout and typically a logical structure identical with the conceptual one).
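A knowledge item description along the three dimensions form, content, and context might be sketched as a simple record. The field names, the sample entries, and the matching function are assumptions made for illustration; they are not the schema itself.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItemDescription:
    """Hypothetical KID record; field names are illustrative only."""
    source: str                                  # locator of the described item
    form: dict = field(default_factory=dict)     # Information Source Ontology terms
    content: set = field(default_factory=set)    # Product Domain Ontology concepts
    context: dict = field(default_factory=dict)  # Enterprise Ontology terms


def matches(kid, concept, **context):
    """True if the KID is about `concept` and agrees with every given context factor."""
    return concept in kid.content and all(
        kid.context.get(key) == value for key, value in context.items())


kids = [
    KnowledgeItemDescription(
        source="letters/offer-17",
        form={"type": "office_letter", "logical_parts": ["sender", "body"]},
        content={"pump"},
        context={"activity": "select_supplier", "role": "purchaser"}),
    KnowledgeItemDescription(
        source="db/product_data",
        form={"type": "database"},  # no layout structure for a database
        content={"pump", "valve"},
        context={"activity": "design", "role": "engineer"}),
]

hits = [k.source for k in kids if matches(k, "pump", activity="select_supplier")]
```

Content and context decide which items are found; the form entry would then tell an access component how to extract the relevant part from the item.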
Having described the information sources, we provide active support within the work processes. To this end, we set up an extended business process model which attaches to each activity its information goals: these are expressed as variables from the domain ontology which the respective task has to fill (e.g., the goal of a decision activity within a purchasing process may be to determine which supplier should be engaged). Queries supporting current information needs must then be sent to the OM automatically. They are interpreted on the basis of the current business process activity, the goal to be reached, business context factors of the activity (e.g., the person performing the task and her department), and goals already achieved (e.g., the already determined features of the product to be bought). Query evaluation is a knowledge-intensive retrieval process drawing on a number of background-knowledge sources, the explanation of which is beyond the scope of this paper.
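How a query could be assembled from an activity, its information goals, and the business context is sketched below. The class, the function, the dictionary layout of the query, and the sample values are hypothetical; the actual query evaluation involves background knowledge that is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical business process activity with attached information goals."""
    name: str
    information_goals: list  # domain-ontology variables this activity must fill


def build_query(activity, actor, achieved):
    """Assemble an OM query from the activity, its open goals, and the context.

    `achieved` maps already filled goal variables to their values.
    """
    open_goals = [g for g in activity.information_goals if g not in achieved]
    return {
        "activity": activity.name,
        "goals": open_goals,
        "context": dict(actor),
        "known": dict(achieved),
    }


decide = Activity("decide_supplier", ["product_features", "supplier"])
query = build_query(
    decide,
    actor={"person": "A. Example", "department": "purchasing"},
    achieved={"product_features": {"type": "pump"}},
)
```

In this sketch only the goal that is still open ("supplier") is queried, while the already determined product features travel along as known facts that can narrow the search.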
Due to space limitations, we could give only a glimpse of our comprehensive Organizational Memory model. At the core of an active support system for knowledge-intensive problems stands a meta-information system comprising heterogeneous sources of data, formal knowledge, and documents. These sources are described in a schema which is derived from state-of-the-art information modeling in hyper- and multimedia IR. However, we emphasize the importance of knowledge-item context and try to reuse existing formal structures wherever possible. Further, we try to add as much formal knowledge as reasonable to ease exact retrieval and to enable formal inferences if parts of the problem can be solved automatically. The main goal is not to accompany or replace existing information systems, but to enable their full and easy exploitation by adding meta- and complementary knowledge.
The core of our retrieval engine is currently being implemented as a structurally object-oriented description formalism mainly along the ideas of . As needed for conceptual indexing, it allows for class instances as well as whole classes to be used as attribute values. Attribute-value assertions can be equipped with a probability. Uncertain assertions can be declared as disjoint or independent events. The system is designed as an object-oriented interface on top of a probabilistic relational algebra which in turn is mapped onto the conventional relational data model. The implementation is done in Java with JDBC.
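Why it matters whether uncertain assertions are declared disjoint or independent can be seen when several assertions supporting the same fact are merged. The following sketch applies the standard probability rules for the two cases. It is written in Python purely for illustration, whereas the system itself is implemented in Java; the function names and sample assertions are hypothetical and do not reflect the system's interface.

```python
from math import prod


def combine_independent(probabilities):
    """P(at least one event holds) for independent events: 1 - prod(1 - p)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def combine_disjoint(probabilities):
    """P(at least one event holds) for disjoint events: the sum of the p values."""
    total = sum(probabilities)
    if total > 1.0 + 1e-9:
        raise ValueError("probabilities of disjoint events cannot sum above 1")
    return min(total, 1.0)


# Hypothetical uncertain assertions: (object, attribute, value, probability).
assertions = [
    ("doc42", "topic", "pump", 0.5),
    ("doc42", "topic", "pump", 0.4),  # same fact supported by a second source
]
support = [p for (_, _, _, p) in assertions]
```

For the two sample assertions, the independent combination gives 1 - 0.5 * 0.6 = 0.7, while treating them as disjoint gives 0.5 + 0.4 = 0.9, so the declaration directly changes the resulting ranking weight.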
From Hypermedia Information Retrieval to Knowledge Management in Enterprises
The translation was initiated by Andreas Abecker on 11/20/1997