1. Subsumption Semantics for XML Tags*


Version: Apr 12, 2000

First Version Prepared for:
Dagstuhl Seminar 00121, Semantics for the Web
March 19-24, 2000


Harold Boley
DFKI GmbH







* "Practice what you preach": XML source of these slides at subtag.xml (subtag.xml.txt);
transformed to HTML via Michael Sintek's SliML stylesheet at slides.xsl


2. The Ontological XML Imperative

Problem of Semantics for the Web:

One, Standardized Semantics

Would Boost the Web Further Than

Many, Non-Standardized Semantics



Proposed Solution Step:

Incorporate Subsumption Semantics

Right Into the Web's XML x.0 Tags (x>1):

Build Taxonomy Into DTDs, Leave Axioms To Schemas!

(Ontology = Taxonomy + Axioms)


3. Attacking a Problem with XML

  • Main building blocks of XML DTDs are chaining (whole-part; inversely, part-of) relations between a parent element and an (ordered) sequence of child elements
  • XML DTDs cannot specify subsumption (inversely, isa or kind-of) relations between a tag and an (unordered) set of subtags, as needed for the taxonomy backbone of ontologies
  • Subsumption of element tags is thus handled ad hoc or in non-DTD schema languages
    • RDF Schema: subClassOf specifies subsumption (not specific for tags)
    • XML Schema (Part 1): derivedBy="extension" right-appends children (no subsumption specification)
  • Various non-DTD tag-subsumption and -inheritance semantics thus lead to diverging XML uses


4. An XML Sample

Consider this DTD for a sales database table:


<!ELEMENT sales          (company, item, quantity) >

<!ELEMENT company        (#PCDATA) >
<!ELEMENT item           (#PCDATA) >
<!ELEMENT quantity       (#PCDATA) >

Also, one element for a corresponding tuple:


<sales>
  <company> Onoffbook </company>
  <item> XML4You </item>
  <quantity> 12417 </quantity>
</sales>


5. Differentiating Parent Subtags: From Copying to Subsumption

Now, for the E-commerce era, let us differentiate online-sales from offline-sales tags, which should both `inherit' their children from the neutral sales element:


<!ELEMENT sales          (company, item, quantity) >
<!ELEMENT online-sales   (company, item, quantity) >
<!ELEMENT offline-sales  (company, item, quantity) >

<!ELEMENT company        (#PCDATA) >
<!ELEMENT item           (#PCDATA) >
<!ELEMENT quantity       (#PCDATA) >

Instead of such copied child declarations or corresponding ENTITY declarations, we propose a new XML version (NEXML) with a true tag-inheritance construct SUBSUMES for DTDs, shortening the first three lines to:


<!ELEMENT sales          (company, item, quantity) >

<!SUBSUMES sales online-sales >
<!SUBSUMES sales offline-sales >
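
The reduction from SUBSUMES back to copied XML 1.0 declarations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names expand_parent_subsumes and to_dtd are hypothetical, content models are simplified to plain child sequences, and each subtag has a single supertag.

```python
# Hypothetical sketch: expand parent SUBSUMES declarations into copied
# XML 1.0 ELEMENT declarations. Content models are plain child sequences.

def expand_parent_subsumes(elements, subsumes):
    """elements: dict mapping a tag to its list of child tags.
    subsumes: list of (supertag, subtag) pairs.
    Returns a new dict in which every subtag inherits its supertag's children."""
    expanded = dict(elements)
    pending = list(subsumes)
    # Repeat until all pairs are resolved, so that chains of subtags also work.
    while pending:
        remaining = []
        for supertag, subtag in pending:
            if supertag in expanded:
                expanded[subtag] = list(expanded[supertag])
            else:
                remaining.append((supertag, subtag))
        if len(remaining) == len(pending):
            raise ValueError(f"unresolvable supertags: {remaining}")
        pending = remaining
    return expanded


def to_dtd(elements):
    """Render the dict as XML 1.0 ELEMENT declarations, one per line."""
    return "\n".join(
        f"<!ELEMENT {tag} ({', '.join(children)}) >"
        for tag, children in elements.items()
    )


elements = {"sales": ["company", "item", "quantity"]}
subsumes = [("sales", "online-sales"), ("sales", "offline-sales")]
expanded = expand_parent_subsumes(elements, subsumes)
print(to_dtd(expanded))
```

The printed declarations correspond to the three copied ELEMENT lines at the top of this slide (without the column alignment).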


6. How NEXML Uses SUBSUMES: Child Inheritance

Element instances with online-sales and offline-sales tags thus obtain children with sales-declared company, item, and quantity tags (note the 329 `unclassified-rest' copies while Onoffbook is heading towards its offline/online break-even point for the book "XML4You"):


<!SUBSUMES sales online-sales >
<!SUBSUMES sales offline-sales >

<!ELEMENT sales          (company, item, quantity) >

<!ELEMENT company        (#PCDATA) >
<!ELEMENT item           (#PCDATA) >
<!ELEMENT quantity       (#PCDATA) >

<sales>
  <company> Onoffbook </company>
  <item> XML4You </item>
  <quantity> 329 </quantity>
</sales>

<online-sales>
  <company> Onoffbook </company>
  <item> XML4You </item>
  <quantity> 12417 </quantity>
</online-sales>

<offline-sales>
  <company> Onoffbook </company>
  <item> XML4You </item>
  <quantity> 15182 </quantity>
</offline-sales>


7. Element Attributes vs. Element Children

  • (NE)XML elements, with attributes and children, correspond to frame/OOP instances; their DTDs correspond to frame/OOP classes
  • Children are ordered, with inheritance for DTD declarations only
  • Attributes are unordered; hence (inspired by frame/OOP) inheritance is
    • in DTDs: performed for declarations
    • in processors: permitted for values
  • "#REQUIRED" in NEXML, by attribute inheritance, means "required for all descendant leaf elements"


8. How NEXML Uses SUBSUMES: Attribute Inheritance

Element instances with online-sales and offline-sales tags also obtain sales-declared attributes such as year and price (the DTD inherits both attribute declarations; an element processor should inherit the year attribute's 2000 value):


<!SUBSUMES sales online-sales >
<!SUBSUMES sales offline-sales >

<!ATTLIST sales year  CDATA #REQUIRED >
<!ATTLIST sales price CDATA #REQUIRED >

<sales        year="2000">
  ...             |
</sales>          |
                  v
<online-sales             price="27.50">
  ...
</online-sales>

<offline-sales            price="22.50">
  ...
</offline-sales>
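
Both kinds of attribute inheritance can be sketched in Python. This is a minimal illustration, not a NEXML processor: the names inherited_attributes and inherit_values are hypothetical, the supertag relation is assumed to be acyclic with one supertag per tag, and the rule that an element's own value overrides an inherited one is an assumption not stated on the slides.

```python
# Hypothetical sketch of attribute inheritance along the SUBSUMES hierarchy.
# Assumes single, acyclic supertag inheritance.

def inherited_attributes(tag, supertag_of, attlists):
    """DTD side: collect the attribute declarations that apply to `tag`,
    walking from the most general supertag down to the tag itself."""
    chain = []
    current = tag
    while current is not None:
        chain.append(current)
        current = supertag_of.get(current)
    declarations = {}
    for t in reversed(chain):
        declarations.update(attlists.get(t, {}))
    return declarations


def inherit_values(super_values, own_values):
    """Processor side: inherit attribute values from a supertag element.
    Assumption: a value given on the element itself takes precedence."""
    merged = dict(super_values)
    merged.update(own_values)
    return merged


supertag_of = {"online-sales": "sales", "offline-sales": "sales"}
attlists = {"sales": {"year": "CDATA #REQUIRED", "price": "CDATA #REQUIRED"}}

online_decls = inherited_attributes("online-sales", supertag_of, attlists)
online_values = inherit_values({"year": "2000"}, {"price": "27.50"})
print(online_decls)
print(online_values)
```

Here online-sales obtains both declarations from sales, and its instance ends up with year="2000" in addition to its own price="27.50".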


9. Differentiating Child Subtags: From Copying to Subsumption

Let us now differentiate household-item from business-item tags, which should both `inherit' their child context (siblings) from the neutral item element:


<!ELEMENT sales          (company, item, quantity) >
<!ELEMENT sales          (company, household-item, quantity) >
<!ELEMENT sales          (company, business-item, quantity) >

<!ELEMENT company        (#PCDATA) >
<!ELEMENT item           (#PCDATA) >
<!ELEMENT household-item (#PCDATA) >
<!ELEMENT business-item  (#PCDATA) >
<!ELEMENT quantity       (#PCDATA) >

Instead of such copied sibling declarations we can use the XML choice (|) construct, shortening the first three lines to:


<!ELEMENT sales          (company,
                          (item | household-item | business-item),
                          quantity) >

However, to express the subsumption semantics hidden within this choice, we again use our tag-inheritance construct SUBSUMES for NEXML DTDs, obtaining:


<!ELEMENT sales          (company, item, quantity) >

<!SUBSUMES item household-item >
<!SUBSUMES item business-item >
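
Going the other way, from child SUBSUMES declarations to the XML choice form, might look as follows in Python. The function name expand_child_subsumes is hypothetical, and the sketch handles only plain child sequences without occurrence labels (*, +, ?).

```python
# Hypothetical sketch: rewrite a child sequence so that every child with
# subtags becomes an XML choice group over the child and all its subtags.

def expand_child_subsumes(children, subsumes):
    """children: list of child tags; subsumes: list of (supertag, subtag) pairs.
    Returns the content model as a string in XML 1.0 DTD syntax."""
    subtags = {}
    for supertag, subtag in subsumes:
        subtags.setdefault(supertag, []).append(subtag)

    def alternatives(tag):
        # The tag itself, followed by all its direct and indirect subtags.
        result = [tag]
        for sub in subtags.get(tag, []):
            result.extend(alternatives(sub))
        return result

    parts = []
    for child in children:
        alts = alternatives(child)
        parts.append(alts[0] if len(alts) == 1 else "(" + " | ".join(alts) + ")")
    return "(" + ", ".join(parts) + ")"


model = expand_child_subsumes(
    ["company", "item", "quantity"],
    [("item", "household-item"), ("item", "business-item")],
)
print(f"<!ELEMENT sales {model} >")
```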


10. Combining Parent and Child Subtags: "Multiplying Out" Subsumption

We can also combine the parent differentiation online-sales/offline-sales with a child differentiation such as household-item/business-item, obtaining (#PCDATA declarations omitted):


<!ELEMENT sales          (company, item, quantity) >
<!ELEMENT sales          (company, household-item, quantity) >
<!ELEMENT sales          (company, business-item, quantity) >

<!ELEMENT online-sales   (company, item, quantity) >
<!ELEMENT online-sales   (company, household-item, quantity) >
<!ELEMENT online-sales   (company, business-item, quantity) >

<!ELEMENT offline-sales  (company, item, quantity) >
<!ELEMENT offline-sales  (company, household-item, quantity) >
<!ELEMENT offline-sales  (company, business-item, quantity) >

As in the child-differentiation example, this can be regarded as the result of "multiplying out" choices from a shorter DTD:


<!ELEMENT sales          (company,
                          (item | household-item | business-item),
                          quantity) >

<!ELEMENT online-sales   (company,
                          (item | household-item | business-item),
                          quantity) >

<!ELEMENT offline-sales  (company,
                          (item | household-item | business-item),
                          quantity) >

As in the parent- and child-differentiation examples, it can also be regarded as the XML result of "multiplying out" SUBSUMES from a more semantic NEXML DTD:


<!ELEMENT sales          (company, item, quantity) >

<!SUBSUMES sales online-sales >
<!SUBSUMES sales offline-sales >

<!SUBSUMES item household-item >
<!SUBSUMES item business-item >
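
The "multiplying out" can be made concrete with a Cartesian product. In this Python sketch the helper with_subtags is a hypothetical name, and only direct subtags are considered.

```python
# Hypothetical sketch: multiply out parent and child SUBSUMES declarations
# into the fully copied XML 1.0 form shown at the top of this slide.
from itertools import product


def with_subtags(tag, subsumes):
    """The tag itself plus its direct subtags, in declaration order."""
    return [tag] + [sub for sup, sub in subsumes if sup == tag]


subsumes = [
    ("sales", "online-sales"),
    ("sales", "offline-sales"),
    ("item", "household-item"),
    ("item", "business-item"),
]

parents = with_subtags("sales", subsumes)
items = with_subtags("item", subsumes)

declarations = [
    f"<!ELEMENT {parent} (company, {item}, quantity) >"
    for parent, item in product(parents, items)
]
print("\n".join(declarations))
```

Four SUBSUMES lines plus one ELEMENT line thus stand for 3 x 3 = 9 copied declarations; each further differentiated tag multiplies that count again.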


11. Left and Right Child Extensions in NEXML DTDs

  1. Child chaining and tag subsumption can alternate arbitrarily
  2. Since children are ordered, a subtag can extend inherited child content to the left and/or right
  3. This is indicated in NEXML DTDs by using ELEMENT declarations with two (possibly empty) parenthesized child sequences for subtags: the left and right extensions
  4. We do not employ multiple supertag inheritance simply because a specification corresponding to item 3. would become much harder when multiple child sequences have to be merged


12. A Refined NEXML DTD Example

This is the full DTD of the original NEXML sample, refined by subtags of online-sales and by children of item, one of which (product) roots the top level of a product taxonomy. The sales subtags online-sales and offline-sales are extended by children on both sides; the online-sales subtag web-sales is extended only to the left, and email-sales only to the right:


<!SUBSUMES sales online-sales >
<!SUBSUMES sales offline-sales >
<!SUBSUMES online-sales web-sales >
<!SUBSUMES online-sales email-sales >
<!SUBSUMES product book >
<!SUBSUMES product cd >
<!SUBSUMES product video >

<!ATTLIST sales year  CDATA #REQUIRED >
<!ATTLIST sales price CDATA #REQUIRED >
<!ATTLIST online-sales weight   CDATA #IMPLIED >
<!ATTLIST online-sales oversize CDATA #IMPLIED >
<!ATTLIST offline-sales stock CDATA #REQUIRED >
<!ATTLIST web-sales href CDATA #REQUIRED >
<!ATTLIST email-sales mailto CDATA #REQUIRED >
<!ATTLIST product code CDATA #REQUIRED >

<!ELEMENT sales          (company, item, quantity) >
<!ELEMENT online-sales   (portal, delivery) (authentization) >
<!ELEMENT offline-sales  (store) (location) >
<!ELEMENT web-sales      (header) () >
<!ELEMENT email-sales    () (user, subject) >
<!ELEMENT item           (wrapper, product) >

<!ELEMENT company        (#PCDATA) >
<!ELEMENT quantity       (#PCDATA) >
<!ELEMENT portal         (#PCDATA) >
<!ELEMENT delivery       (#PCDATA) >
<!ELEMENT authentization (#PCDATA) >
<!ELEMENT store          (#PCDATA) >
<!ELEMENT location       (#PCDATA) >
<!ELEMENT header         (#PCDATA) >
<!ELEMENT user           (#PCDATA) >
<!ELEMENT subject        (#PCDATA) >
<!ELEMENT wrapper        (#PCDATA) >
<!ELEMENT product        (#PCDATA) >
<!ELEMENT book           (#PCDATA) >
<!ELEMENT cd             (#PCDATA) >
<!ELEMENT video          (#PCDATA) >
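
How the left and right extensions combine with inherited children can be sketched in Python. This assumes one plausible reading of the two-sequence ELEMENT declarations: a subtag's children are its left extension, then the children inherited from its supertag, then its right extension. The function name effective_children and the dictionary layout are hypothetical.

```python
# Hypothetical sketch: compute the full child sequence of a subtag from the
# DTD above. Assumption: left extension + inherited children + right extension.

supertag_of = {
    "online-sales": "sales",
    "offline-sales": "sales",
    "web-sales": "online-sales",
    "email-sales": "online-sales",
}

# Ordinary ELEMENT declaration of the root tag.
base_children = {"sales": ["company", "item", "quantity"]}

# Two-sequence ELEMENT declarations: (left extension, right extension).
extensions = {
    "online-sales": (["portal", "delivery"], ["authentization"]),
    "offline-sales": (["store"], ["location"]),
    "web-sales": (["header"], []),
    "email-sales": ([], ["user", "subject"]),
}


def effective_children(tag):
    """Return the ordered child tags of `tag` after inheritance."""
    if tag in base_children:
        return list(base_children[tag])
    if tag not in supertag_of:
        raise KeyError(f"no declaration for {tag}")
    left, right = extensions.get(tag, ([], []))
    return left + effective_children(supertag_of[tag]) + right


web_sales_children = effective_children("web-sales")
email_sales_children = effective_children("email-sales")
print(web_sales_children)
print(email_sales_children)
```

Under this reading, web-sales elements start with header, followed by the online-sales sequence, while email-sales elements end with user and subject.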


13. A Tree-Like Diagram Form of NEXML DTDs

  • Chained (ordered) children branch horizontally, in green; subsumed (unordered) tags branch vertically, in red
  • Left and right child extensions are shown via geometric branch positions
  • Attributes are written (in italics) directly below their elements' tags
  • For repeated tag occurrences, the tree-like structure needs repeated tag nodes, which could be expanded/contracted via "+"/"-"-buttons or FlexDAGs
  • Further EBNF-like DTD syntax can be accommodated by labels (*, +, ?) on branches and a new kind of branch (|)


14. The Refined NEXML DTD Diagram



15. Implementation Approaches

  • Two approaches for implementing NEXML in validators, browsers, stylesheet processors, and other tools:
    1. DTD preprocessor: Reduce NEXML DTDs to XML 1.0 DTDs by "multiplying out" the SUBSUMES hierarchy as shown in the examples
    2. XML successor: Develop NEXML, with direct support for the SUBSUMES hierarchy, into a new XML x.0 for W3C standardization
  • Approach 1. can explode the size of generated DTDs, which in certain cases can be avoided by XML choices (|) and ENTITY declarations
  • Still, approach 2. should be attempted, utilizing efficient subsumption algorithms, e.g. from description logics
  • The remaining XML 1.0 features (ENTITY, NOTATION, etc.) can be carried over to XML x.0, making XML 1.0 DTDs upward compatible


16. Conclusions

  • NEXML augments XML 1.0 by SUBSUMES declarations and enhanced ELEMENT declarations for subsumption DTDs
  • A corresponding XML x.0 version would be immediately useful by enabling a standardized subsumption semantics for the information on the Web
  • Further refinements could, e.g., comprise overlapping vs. disjoint subsumptions (special integrity constraints)
  • Subsumption DTDs constitute a taxonomy that can also be accessed by axioms (e.g., general ICs) in schemas, giving us the full power of ontologies
  • By moving expressivity from RDF Schema and XML Schema into XML x.0, these separate standards could be reconciled (cf. The Cambridge Communique)