Semantic Technologies for Busy People

Author: Markus Schröder, 24.01.2022


Semantic technologies are technologies to work with semantics, in our case, meaning (message, signification) in the context of knowledge. This tutorial briefly introduces several of its aspects and refers to more material for interested readers.


When you would like to tell someone about something you know, you usually formulate a sentence, like for example:

Douglas Adams created The Hitchhiker's Guide to the Galaxy (sometimes referred to as HG2G) finally in the United Kingdom in 1979.

The sentence uses traditional grammar to structure various pieces of information:

Grammatical Part | Grammatical Function | Example
Subject | person or thing about whom the statement is made | Douglas Adams
Predicate | a property that a subject has or is characterized by | created
Object | is any of several types of arguments | The Hitchhiker's Guide to the Galaxy
Parentheses | set off extra information within a sentence | (sometimes referred to as HG2G)
Manner | describes how and in what way the action of a verb is carried out | finally
Place | where it happens | in the United Kingdom
Time | when it happens | in 1979

To enable an artificial intelligence (AI) to also "know" what is mainly stated in the sentence, we should store only the three most important grammatical parts:
subject, predicate and object
Because statements such as ("Douglas Adams", "created", "The Hitchhiker's Guide to the Galaxy") always have three parts, we also call them triples (or triple statements). We often abbreviate the three parts with their first letters (s, p, o) and sometimes refer to them as SPO triples.
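As a minimal sketch in plain Python (no RDF library yet), such a triple can be modelled as a three-part tuple. The class name Triple and its field names are chosen here for illustration only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement consisting of subject (s), predicate (p) and object (o)."""
    s: str
    p: str
    o: str


# The core of the example sentence, reduced to its three main grammatical parts.
statement = Triple("Douglas Adams", "created", "The Hitchhiker's Guide to the Galaxy")

# A triple always unpacks into exactly three parts.
s, p, o = statement
print(f"subject={s!r} predicate={p!r} object={o!r}")
```

Note that the three parts are still plain strings here, so the ambiguity discussed next is not solved yet.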

You as a human with all your background knowledge and natural reasoning should be able to understand the sentence without much ambiguity. However, our AI is not that smart and would wonder what exactly you mean (ambiguity, vagueness). In fact, there are several people named Douglas Adams, such as an American engineer or an American cricketer. Moreover, does "created" mean creating by artistic means or causing something to be or to become? And is "The Hitchhiker's Guide to the Galaxy" the novel or the film? Therefore, instead of just mentioning things, we should refer specifically to what we mean, without any ambiguity for the AI. There are many possibilities to implement this.

Better Use a Standardized Way

Since we would like to transfer knowledge from one AI system to another, we should do it in a way everybody is familiar with. To allow a group of people to commit to a certain way, technical standards are usually defined. For example, this webpage was sent to you with the HyperText Transfer Protocol (HTTP), which both the server and your browser understand. To transfer knowledge in the form of statements as shown above, it would be helpful to have a standardized way, too. Fortunately, the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web, already recommends several standards to give you and other researchers one standard way of working with knowledge.

The standards come from the idea that the Web should not only be human-readable, but also machine-readable by adding semantics (= meaning) to webpages. This idea was named the Semantic Web. The following comparison illustrates the difference:

Web: "Hypertext Markup Language (HTML) is enough to present it to humans."

<div id="42">Douglas Adams created The Hitchhiker's Guide to the Galaxy <i>(sometimes referred to as <b>HG2G</b>)</i> finally in the United Kingdom in 1979.</div>

Semantic Web: "why not also send statements to enable machines to process it?"

(Douglas Adams, created, The Hitchhiker's Guide to the Galaxy)

To prevent everybody from sending statements like the one above in different ways, the W3C defined a standard for it. Since the statements typically describe a web resource in an essential supporting structure (known as a framework), it was named Resource Description Framework (RDF). Using it, we can express statements as RDF triples.

To also prevent everybody from referring to resources in different ways, the standard defines a uniform (consistent, systematic) way to identify resources. Therefore, you have to use Uniform Resource Identifiers (URIs), which define how your identifiers (character strings) have to be structured.

URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
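To see the components of this grammar in practice, the following sketch splits a URI with Python's standard library. The URI itself is a hypothetical example chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI (not taken from the tutorial) that uses every component.
uri = "https://example.org/wiki/Douglas_Adams?lang=en#Works"

parts = urlsplit(uri)
print("scheme:   ", parts.scheme)
print("authority:", parts.netloc)
print("path:     ", parts.path)
print("query:    ", parts.query)
print("fragment: ", parts.fragment)
```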

There are related identifiers you may come in contact with, which are summarized in the following table.

Identifier | Abbrev. | Function | Example
Uniform Resource Name | URN | you can name the resource, but not locate it |
Uniform Resource Locator | URL | you can locate the resource |
Uniform Resource Identifier | URI | generalizes URN and URL |
Internationalized Resource Identifier | IRI | generalizes URI, expanding the set of permitted characters | Ῥόδος

Since URIs can get rather long and often share the same prefix (a string attached to the beginning of another string), there is the idea to have a compact representation of them: Compact URI (CURIE). This way, we define a prefix named wiki: only once for the shared beginning of the URIs and can from then on write short versions, e.g. wiki:Douglas_Adams.
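The expansion and compaction of CURIEs can be sketched in a few lines of plain Python. The namespace URI bound to wiki: below is a hypothetical placeholder, and the function names expand and compact are made up for this sketch.

```python
# Hypothetical prefix mapping; the namespace URI is a placeholder.
PREFIXES = {"wiki": "http://example.org/wiki/"}


def expand(curie: str, prefixes: dict) -> str:
    """Turn a CURIE such as 'wiki:Douglas_Adams' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes: dict) -> str:
    """Turn a full URI into a CURIE if a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no matching prefix: keep the full URI


full = expand("wiki:Douglas_Adams", PREFIXES)
print(full)
print(compact(full, PREFIXES))
```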

Hands-on RDF

There are software libraries to develop with RDF. In this document, we support two programming languages:

Java 8 developers can use the Jena library.
Add the dependency to your Maven POM project.


Python 3 developers can use the rdflib library.
Install the dependency with the package installer for Python (pip).

$ pip install rdflib

For educational purposes, the code is more explicit and rather long. The libraries often provide shortcut methods which can be used by experienced developers. The reader is always encouraged to use the autocomplete feature of the IDE to discover more methods (functions).



To store RDF statements, also known as triples, you need a triplestore. Just consider the data model as a mathematical set of triple entries (i.e. without duplicates) where you can add and remove triple statements (atoms). Since their subjects and objects can also be interpreted as nodes in a directed edge-labelled graph, it is also named an RDF graph. Because the statements express some knowledge, it is also called a knowledge graph and, because the data is interlinked, it is also called linked data.

Java:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

Model model = ModelFactory.createDefaultModel();

Python:

from rdflib import Graph

g = Graph()

Because we have to use URIs, they need to be created first. At the predicate position only a certain kind of resource is considered: a resource describing a property.

Java:

import org.apache.jena.rdf.model.ResourceFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Property;

Resource subject = ResourceFactory.createResource(""); //or model.createResource
Property predicate = ResourceFactory.createProperty(""); //or model.createProperty
Resource object = ResourceFactory.createResource("");

Python:

from rdflib import URIRef

subject = URIRef('')
predicate = URIRef('')
obj = URIRef('')

If you really do not have resolvable (web) addresses, but need ad hoc URIs for your resources, consider creating URNs with Universally Unique Identifiers (UUIDs). Do not worry about clashes: with random UUIDs they are so unlikely that they can be ignored in practice.

Java:

import java.util.UUID;

Resource res = ResourceFactory.createResource("urn:uuid:" + UUID.randomUUID().toString());

Python:

import uuid

res = URIRef('urn:uuid:' + str(uuid.uuid4()))

Now, we can add and remove triples in the graph. If you would like to change only subject, predicate or object in a triple, you have to remove the triple completely and add the changed one, since triples are atomic.

Java:

model.add(subject, predicate, object);
model.remove(subject, predicate, object);

Python:

g.add((subject, predicate, obj))
g.remove((subject, predicate, obj))
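The set-based data model and the remove-then-add way of changing a triple can also be illustrated without any RDF library. The following is only a sketch: the ex: identifiers and the helper replace_object are hypothetical and stand in for real URIs and library calls.

```python
# A graph modelled as a plain Python set of (s, p, o) tuples.
graph = set()

triple = ("ex:Douglas_Adams", "ex:created", "ex:HG2G_film")
graph.add(triple)
graph.add(triple)  # adding the same triple again has no effect (set semantics)
size_after_duplicate = len(graph)


def replace_object(graph: set, triple: tuple, new_object: str) -> tuple:
    """Change the object of a triple by removing it and adding a new triple."""
    s, p, _ = triple
    graph.remove(triple)          # triples are atomic: remove the old one completely
    changed = (s, p, new_object)
    graph.add(changed)            # then add the changed one
    return changed


changed = replace_object(graph, triple, "ex:HG2G_novel")
print(size_after_duplicate, sorted(graph))
```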

OK, but how can we store data values like names, numbers and dates? Only at the object position are you also allowed to store any kind of character string. Since this is not a resource, you cannot make statements about it, which is why we call it a literal (cf. terminal symbols in formal grammars).

Java:

import org.apache.jena.rdf.model.Literal;

Property label = ResourceFactory.createProperty("");
Literal name = ResourceFactory.createPlainLiteral("Douglas Adams"); //or model.createLiteral
model.add(subject, label, name);

Python:

from rdflib import Literal

label = URIRef('')
name = Literal('Douglas Adams')
g.add((subject, label, name))

Literals can carry additional metadata, namely a datatype or a language tag:

Java:

import org.apache.jena.datatypes.xsd.XSDDatatype;

Literal publicationDate = ResourceFactory.createTypedLiteral("1979-10-12", XSDDatatype.XSDdate); //or model.createTypedLiteral
Literal japaneseName = ResourceFactory.createLangLiteral("ダグラス・アダムズ", "ja"); //or model.createLiteral

Python:

from rdflib.namespace import XSD

publicationDate = Literal("1979-10-12", datatype=XSD.date)
japaneseName = Literal('ダグラス・アダムズ', lang='ja')


We can iterate over statements and optionally fix the subject, the predicate or the object (or a combination of them). Such a triple pattern is the building block of so-called basic graph patterns (BGPs).

Java:

import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Statement;

for(Statement stmt : model.listStatements().toSet()) {
    System.out.println(stmt);
}

for(Statement stmt : model.listStatements(subject, null, (RDFNode) null).toSet()) {
    System.out.println(stmt);
}

Python:

for s, p, o in g:
    print(s, p, o)

for s, p, o in g.triples((subject, None, None)):
    print(s, p, o)
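What such a lookup does internally can be sketched in plain Python, with None acting as a wildcard just as in the library calls. The function name triples mirrors the rdflib method only by analogy; the ex: identifiers are hypothetical.

```python
def triples(graph, pattern):
    """Yield all triples that match the pattern; None matches anything."""
    ps, pp, po = pattern
    for s, p, o in graph:
        if (ps is None or ps == s) and (pp is None or pp == p) and (po is None or po == o):
            yield (s, p, o)


# Hypothetical example data.
graph = {
    ("ex:Douglas_Adams", "ex:created", "ex:HG2G_novel"),
    ("ex:Douglas_Adams", "ex:label", "Douglas Adams"),
    ("ex:HG2G_novel", "ex:label", "The Hitchhiker's Guide to the Galaxy"),
}

# Everything that is directly connected to one subject (one hop).
about_adams = sorted(triples(graph, ("ex:Douglas_Adams", None, None)))
# All labels, regardless of the subject.
labels = sorted(triples(graph, (None, "ex:label", None)))
print(about_adams)
print(labels)
```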

This is useful when we would simply like to look up what is connected within one hop. However, for more complex queries, we should use a dedicated query language (like SQL for relational databases). Fortunately, the W3C standardized an RDF query language called SPARQL.

SELECT ?s ?p ?o
WHERE {
    ?s ?p ?o
}

The basic idea is that ?-prefixed variables are iteratively bound with values for all possible matches in the graph (subgraph matching). Learning all aspects of SPARQL takes some time, but it is worthwhile for working with RDF: almost all triplestores implement SPARQL as the main interface to query and manipulate RDF data. It is not uncommon that complex use cases result in long and complex queries. With the libraries we can parse and execute SPARQL queries.

Java:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;

// queryStr holds the SPARQL query as a string
QueryExecution qe = QueryExecutionFactory.create(queryStr, model);
ResultSet rs = qe.execSelect();
for(QuerySolution qs : ResultSetFormatter.toList(rs)) {
    System.out.println(qs);
}
//or ResultSetFormatter.out(rs)
qe.close();

Python:

# queryStr holds the SPARQL query as a string
rs = g.query(queryStr)

for qs in rs:
    print(qs)
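The idea of binding ?-prefixed variables by subgraph matching can be sketched in plain Python. This is a naive illustration and not how a SPARQL engine is actually implemented; the function names and the ex: identifiers are hypothetical.

```python
def unify(pattern, triple, binding):
    """Extend the binding so that the pattern equals the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable is bound once and must keep its value afterwards.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(graph, patterns):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


# Hypothetical example data; all terms are plain strings in this sketch.
graph = {
    ("ex:Douglas_Adams", "ex:created", "ex:HG2G_novel"),
    ("ex:HG2G_novel", "ex:publishedIn", "1979"),
    ("ex:Douglas_Adams", "ex:label", "Douglas Adams"),
}

# "Who created something, and when was it published?"
results = query(graph, [
    ("?author", "ex:created", "?work"),
    ("?work", "ex:publishedIn", "?year"),
])
print(results)
```

Because ?work occurs in both patterns, only bindings in which both patterns agree on its value survive.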

Reading and Writing RDF (I/O)

Whenever you would like to transfer RDF data from one system to another, you can serialize and parse RDF documents in a certain format.

Since URIs can get rather long and often share the same prefix, we should define some prefix mappings first. Note that an empty prefix name "" is also allowed. Frequently used prefixes can be looked up in public prefix registries.

Java:

import org.apache.jena.shared.PrefixMapping;
import org.apache.jena.vocabulary.XSD;

model.setNsPrefix("wiki", "");
model.setNsPrefix("wn", "");
model.setNsPrefix("xsd", XSD.NS);

//contains frequently used prefixes
PrefixMapping standard = PrefixMapping.Standard;

Python:

g.bind("wiki", URIRef(""))
g.bind("wn", URIRef(""))
g.bind("xsd", XSD)

The following table lists frequently used serialization formats, ordered by decreasing importance for us.

Language | Abbrev. | Extended from | Main Features | Mime-Type | File Ext. | Jena | rdflib
Terse RDF Triple Language | Turtle, TTL | N-Triples | Human-friendly and human-readable due to the following features: @prefix directive; ";" repeats the subject; "," repeats subject and predicate; (…) lists; boolean, decimal, double, integer literals | text/turtle | *.ttl | "TTL" | 'ttl'
N-Triples | NT | | Line-based: one line for one triple; "." indicates full stop | application/n-triples | *.nt | "NT" | 'nt'
JavaScript Object Notation for Linked Data | JSON-LD | JSON | Uses JSON to encode triples; useful in JavaScript/Web context; drawback: triples are not clearly visible anymore | application/ld+json | *.jsonld | "JSON-LD" | 'json-ld'
Notation3 | N3 | Turtle | Adds more features to Turtle, but we almost never need them | text/n3;charset=utf-8 | *.n3 | "N3" | 'n3'
RDF/XML | RDF/XML | XML | Uses XML to express RDF data; not human-friendly: not particularly readable and writable for humans | application/rdf+xml | *.rdf | "RDF/XML" | 'pretty-xml'

The libraries allow you to write and read these formats (always use UTF-8 encoding).

Java:

import java.io.File;
import java.io.FileReader;
import java.io.StringWriter;

StringWriter sw = new StringWriter();
model.write(sw, "TTL"); //or "NT", "JSON-LD", "N3", "RDF/XML"
System.out.println(sw.toString());

model.read(new FileReader(new File("data.ttl")), null, "TTL");

Python:

print(g.serialize(format='ttl')) # or 'nt', 'json-ld', 'n3', 'pretty-xml'

g.parse('data.ttl', format='ttl')
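To get a feeling for what a serializer does, here is a deliberately simplified N-Triples writer in plain Python. It is a sketch under stated assumptions: URIs are plain strings, literals are (text, language) tuples (a convention of this sketch only), datatypes are ignored and only backslashes and double quotes are escaped. The example.org URIs are hypothetical.

```python
def nt_term(term):
    """Serialize one term: tuples are literals, plain strings are URIs."""
    if isinstance(term, tuple):
        text, lang = term
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"' + (f"@{lang}" if lang else "")
    return f"<{term}>"


def to_ntriples(graph):
    """One line per triple, each terminated by ' .' (sorted for stable output)."""
    lines = [f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in graph]
    return "\n".join(sorted(lines))


# Hypothetical example data.
graph = {
    ("http://example.org/Douglas_Adams", "http://example.org/label", ("Douglas Adams", None)),
    ("http://example.org/Douglas_Adams", "http://example.org/label", ("ダグラス・アダムズ", "ja")),
}

document = to_ntriples(graph)
print(document)
```

In practice, the serializers of Jena and rdflib should be used, since they cover the complete syntax of each format.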

Below, all the formats represent the exact same RDF data (triples), however in very different syntaxes. You can always convert between them.

Turtle:

@prefix wiki:  <> .
@prefix wn:    <> .
@prefix xsd:   <> .

wiki:Douglas_Adams  <>
                "Douglas Adams" , "ダグラス・アダムズ"@ja ;
        wn:01643749-v  wiki:The_Hitchhikers_Guide_to_the_Galaxy_novel .

                "1979-10-12"^^xsd:date .

N-Triples:

<> <> "Douglas Adams" .
<> <> "ダグラス・アダムズ"@ja .
<> <> <> .
<> <> "1979-10-12"^^<> .

JSON-LD:

{
  "@context" : {
    "dateSubmitted" : {
      "@id" : "",
      "@type" : ""
    },
    "label" : {
      "@id" : ""
    },
    "01643749-v" : {
      "@id" : "",
      "@type" : "@id"
    },
    "wiki" : "",
    "wn" : "",
    "xsd" : ""
  },
  "@graph" : [ {
    "@id" : "wiki:Douglas_Adams",
    "01643749-v" : "wiki:The_Hitchhikers_Guide_to_the_Galaxy_novel",
    "label" : [
      "Douglas Adams",
      {
        "@language" : "ja",
        "@value" : "ダグラス・アダムズ"
      }
    ]
  }, {
    "@id" : "wiki:The_Hitchhikers_Guide_to_the_Galaxy_novel",
    "dateSubmitted" : "1979-10-12"
  } ]
}

N3:

@prefix wiki:  <> .
@prefix wn:    <> .
@prefix xsd:   <> .

wiki:Douglas_Adams  <>
                "Douglas Adams" , "ダグラス・アダムズ"@ja ;
        wn:01643749-v  wiki:The_Hitchhikers_Guide_to_the_Galaxy_novel .

                "1979-10-12"^^xsd:date .

RDF/XML:

  <rdf:Description rdf:about="">
    <rdfs:label xml:lang="ja">ダグラス・アダムズ</rdfs:label>
    <rdfs:label>Douglas Adams</rdfs:label>
      <rdf:Description rdf:about="">
        <j.0:dateSubmitted rdf:datatype=""


So far, we have only made statements (facts) about persons or things with assertions (also known as the ABox part). However, we can also use RDF to make statements about a terminology (the vocabulary of a domain of interest, also known as the TBox part). Since this defines what can possibly exist and how it relates in a domain, we call it an ontology (taken from philosophy: the study of concepts such as existence, being, becoming, and reality).

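In their simplest form, ontologies define two aspects: classes (the kinds of things that can exist) and properties (how these things can be related and described).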

The following table lists frequently used ontologies, ordered by decreasing importance for us.

Ontology | Abbrev. / Prefix | Reference | Freq. Used
Resource Description Framework (using RDF we describe the terminology of RDF) | RDF / rdf | Jena: import org.apache.jena.vocabulary.RDF; rdflib: from rdflib.namespace import RDF |
Resource Description Framework Schema (using RDF we describe a schema terminology) | RDFS / rdfs | Jena: import org.apache.jena.vocabulary.RDFS; rdflib: from rdflib.namespace import RDFS |
XML Schema Definition | XSD / xsd | Jena: import org.apache.jena.vocabulary.XSD; rdflib: from rdflib.namespace import XSD |
Nepomuk Shared Desktop Ontologies (we often use the Personal Information Model Ontology) | PIMO / pimo | Jena: (not available); rdflib: PIMO = Namespace("") |
 | SDO / sdo | Jena: (not available); rdflib: SDO = Namespace("") |
Friend of a Friend | FOAF / foaf | Jena: import org.apache.jena.sparql.vocabulary.FOAF; rdflib: from rdflib.namespace import FOAF |
Simple Knowledge Organization System | SKOS / skos | Jena: import org.apache.jena.vocabulary.SKOS; rdflib: from rdflib.namespace import SKOS |
Dublin Core Metadata Initiative | DC / dc | Jena: import org.apache.jena.vocabulary.DC; rdflib: from rdflib.namespace import DC |
Web Ontology Language | OWL / owl | Jena: import org.apache.jena.vocabulary.OWL2; rdflib: from rdflib.namespace import OWL |

When making assertions, we encourage you to reuse (refer to) as many classes and properties as possible from existing ontologies. This way, your statements can be easily interpreted by looking up (dereferencing) the already defined meanings. Use the search in Linked Open Vocabularies to find more classes and properties in public ontologies. Ontology prefixes can be looked up in a public prefix registry.

Java:

Resource douglasAdams = ResourceFactory.createResource("");

model.add(douglasAdams, RDF.type, FOAF.Person);
model.add(douglasAdams, RDF.type, OWL2.NamedIndividual);
model.add(douglasAdams, RDFS.label, "Douglas Adams");
model.add(douglasAdams, RDFS.comment, "An English author, screenwriter, essayist, humorist, satirist and dramatist.");
model.add(douglasAdams, DC.title, "Douglas Adams");

Python:

douglasAdams = URIRef("")

g.add((douglasAdams, RDF.type, FOAF.Person))
g.add((douglasAdams, RDF.type, OWL.NamedIndividual))
g.add((douglasAdams, RDFS.label, Literal("Douglas Adams")))
g.add((douglasAdams, RDFS.comment, Literal("An English author, screenwriter, essayist, humorist, satirist and dramatist.")))
g.add((douglasAdams, DC.title, Literal("Douglas Adams")))

Modelling an ontology from a domain of interest is called ontology engineering, while learning it from text more automatically is called ontology learning. Both are considered knowledge acquisition activities. This tutorial does not cover how to model an ontology from a domain of interest.

Knowledge Graphs

Besides ontologies for certain domains, you can also make use of assertions from publicly available knowledge graphs. Since this linked data is usually open (i.e. freely available), such RDF datasets are summarized under the umbrella term Linked Open Data (Cloud). The domains include for example geography, government, life sciences, linguistics and media.

A notable knowledge graph is the RDF version of Wikipedia which is DBpedia. Its main idea is the mapping of infobox tables to RDF statements using its ontology.

@prefix dbo:	<> .
@prefix dbr:	<> .
dbr:Douglas_Adams dbo:birthName "Douglas Noel Adams"@en ;
    dbo:birthDate	"1952-03-11"^^xsd:date ;
    dbo:birthPlace	dbr:Cambridge , dbr:Cambridgeshire ;

    dbo:deathDate	"2001-05-11"^^xsd:date ;
    dbo:deathPlace	<,_California> ;

    dbo:restingPlace	dbr:Highgate_Cemetery , dbr:London ;

    dbo:occupation	dbr:Author , dbr:Screenwriter , dbr:Essay ,
        dbr:List_of_humorists , dbr:List_of_satirists_and_satires , dbr:Playwright .

Another notable knowledge graph is Wikidata which is, in contrast to DBpedia, manually created by contributors. It can be queried with SPARQL or downloaded (see truthy dumps).

Usually, such knowledge graphs cover rather common knowledge of a certain domain, which is why they are less useful for personal knowledge graph scenarios.

Statements about Statements

"Douglas Adams created The Hitchhiker's Guide to the Galaxy", claims Mark Carwardine.

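This sentence has two statements: the statement that Douglas Adams created the novel, and the statement that Mark Carwardine claims this.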

Such a statement about a statement can be expressed in RDF with so-called RDF-star (also written RDF*), which introduces quoted triples. They are written with double angle brackets << ... >>.

@prefix wiki:  <> .
@prefix wn:    <> .

<< wiki:Douglas_Adams wn:01643749-v wiki:The_Hitchhikers_Guide_to_the_Galaxy_novel >> wn:00758383-v wiki:Mark_Carwardine .

Note that we see only one statement here, which is the 'claim' statement. The quoted triple is not part of the RDF graph since it is hidden in the (subject) resource. It has to be repeated in unquoted form to be in the RDF graph as well. Call getStmtTerm() on the resource to get the statement (i.e. the quoted triple).
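The difference between quoting and asserting a triple can be illustrated in plain Python with nested tuples. This is only an analogy to RDF-star, not its implementation; the prefixed names are used as plain strings.

```python
# The quoted triple is an ordinary tuple used as the subject of another triple.
quoted = (
    "wiki:Douglas_Adams",
    "wn:01643749-v",
    "wiki:The_Hitchhikers_Guide_to_the_Galaxy_novel",
)
claim = (quoted, "wn:00758383-v", "wiki:Mark_Carwardine")

graph = {claim}

# Only the claim is in the graph; the quoted triple itself is not asserted.
asserted_before = quoted in graph

# To assert it as well, it has to be added in unquoted form.
graph.add(quoted)
asserted_after = quoted in graph

print(len(graph), asserted_before, asserted_after)
```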

Jena supports RDF-star. You need the Jena ARQ library with its RDF I/O technology (RIOT) to parse and write RDF-star. It is supported in the following formats:

  • Turtle-star, just use "TTL"
  • N-Triples-star, just use "NT"

Unfortunately, Jena version 3.17.0 does not support the {| ... |} annotation syntax.

StringWriter sw = new StringWriter();
model.write(sw, "TTL"); //or "NT"
System.out.println(sw.toString());

model.read(new FileReader(new File("data.ttl")), null, "TTL");

In rdflib there seems to be an ongoing implementation.


Since we are working with statements, we can also apply forms of logical reasoning in RDF. This includes two parts: rules, usually in the form of [if premises then conclusions] (modus ponens), and facts in the form of statements (premises and conclusions, depending on their occurrence in rules). We distinguish the following three forms of reasoning.

Reasoning | What we have | What we get | Usage for us
Deduction (inference), a.k.a. forward chaining (data-driven) | Premises, Rules | Conclusions | We often infer new statements from existing statements
Induction ("(machine) learning") | Premises, Conclusions | Rules | We sometimes train machine learning models from knowledge graphs
Abduction ("reasoning of detectives"), a.k.a. backward chaining (goal-driven) | Conclusions, Rules | Premises | We sometimes would like to check a deduction tree

A rule engine can be used to perform forward and backward chaining.
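Forward chaining can be sketched in plain Python as a loop that applies all rules until no new fact appears. This is a naive illustration, not the algorithm of any particular rule engine; the rule representation (a list of premise patterns plus one conclusion pattern) and the ex: identifier are assumptions of this sketch.

```python
def unify(pattern, fact, binding):
    """Extend the binding so that the pattern equals the fact, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_all(premises, facts):
    """Return every variable binding that satisfies all premises."""
    bindings = [{}]
    for premise in premises:
        bindings = [
            extended
            for binding in bindings
            for fact in facts
            for extended in [unify(premise, fact, binding)]
            if extended is not None
        ]
    return bindings


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    facts = set(facts)
    while True:
        derived = set()
        for premises, conclusion in rules:
            for binding in match_all(premises, facts):
                # Replace variables in the conclusion by their bound values.
                derived.add(tuple(binding.get(term, term) for term in conclusion))
        if derived <= facts:
            return facts
        facts |= derived


# Rule: every person is also a resource.
rules = [([("?s", "rdf:type", "foaf:Person")], ("?s", "rdf:type", "rdfs:Resource"))]
facts = {("ex:Douglas_Adams", "rdf:type", "foaf:Person")}

closure = forward_chain(facts, rules)
print(sorted(closure - facts))
```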

Jena provides a general purpose rule engine. For working with inferred statements use the InfModel interface. The rule language in Jena follows a certain syntax. In the example, we apply the rule that every person is also a resource in RDF, yet more complex rules can easily be defined, also including builtin functions.

import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.rdf.model.InfModel;

String rules =
    "@prefix rdf: <>.\n" +
    "@prefix rdfs: <>.\n" +
    "@prefix foaf: <>.\n" +
    "[ruleA: (?s rdf:type foaf:Person) -> (?s rdf:type rdfs:Resource)]";
GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
InfModel infModel = ModelFactory.createInfModel(reasoner, model);
Model raw = infModel.getRawModel();
Model deduced = infModel.getDeductionsModel();

In rdflib there seems to be no rule engine implementation. As an alternative in Python, you may consider PyKE (Python Knowledge Engine). However, this requires mediating between RDF and PyKE.

If logging is enabled, we can reproduce how certain statements are derived.

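import java.util.Iterator;
import org.apache.jena.reasoner.Derivation;
import org.apache.jena.reasoner.rulesys.RuleDerivation;

// Assumption: derivation logging is switched on before the InfModel is created,
// e.g. with reasoner.setDerivationLogging(true).

// Assumption: as an example, we ask how the conclusion of ruleA was derived for the subject.
Iterator<Derivation> iter = infModel.getDerivation(ResourceFactory.createStatement(subject, RDF.type, RDFS.Resource));
while(iter.hasNext()) {
    RuleDerivation ruleDerivation = (RuleDerivation) iter.next();
    System.out.println(
            ruleDerivation.getRule() + " " +
            ruleDerivation.getMatches() + " " +
            ruleDerivation.getConclusion());
}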

Besides defining our own rule set, we can rely on already defined logic systems like description logic (DL). Fortunately, as soon as we use RDFS and an RDFS reasoner, we get a DL variant. This can also be included in a rule set with @include <RDFS>.

InfModel rdfsInfModel = ModelFactory.createRDFSModel(model);

This results in the following RDFS entailment rule applications.

Name | If | Then
"Everything is a resource, but properties are on predicate position" | (x, p, y) | (x, rdf:type, rdfs:Resource), (y, rdf:type, rdfs:Resource)
"Everything is a resource, but properties are on predicate position" | (x, p, y) | (p, rdf:type, rdf:Property)
"A class is a specific resource" | (c, rdf:type, rdfs:Class) | (c, rdfs:subClassOf, rdfs:Resource)
Class Hierarchy | (c, rdf:type, rdfs:Class) | (c, rdfs:subClassOf, c)
Class Hierarchy | (c, rdfs:subClassOf, d), (d, rdfs:subClassOf, e) | (c, rdfs:subClassOf, e)
Class Hierarchy | (x, rdf:type, c), (c, rdfs:subClassOf, d) | (x, rdf:type, d)
Property Hierarchy | (p, rdf:type, rdf:Property) | (p, rdfs:subPropertyOf, p)
Property Hierarchy | (p, rdfs:subPropertyOf, q), (q, rdfs:subPropertyOf, r) | (p, rdfs:subPropertyOf, r)
Property Hierarchy | (x, p, y), (p, rdfs:subPropertyOf, q) | (x, q, y)
Domain & Range | (x, p, y), (p, rdfs:domain, d) | (x, rdf:type, d)
Domain & Range | (x, p, y), (p, rdfs:range, r) | (y, rdf:type, r)
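A few of these entailment rules (class hierarchy, domain and range) can be sketched in plain Python as a fixed-point computation. This covers only a subset of RDFS and is not how an RDFS reasoner is implemented in practice; the ex: identifiers are hypothetical.

```python
def rdfs_closure(triples):
    """Apply a subset of the RDFS entailment rules until nothing new is derived."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # (c subClassOf d), (d subClassOf e) -> (c subClassOf e)
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # (x type c), (c subClassOf d) -> (x type d)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # (x p y), (p domain d) -> (x type d)
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # (x p y), (p range r) -> (y type r)
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical terminology (TBox) and one assertion (ABox).
facts = {
    ("ex:knows", "rdfs:domain", "ex:Person"),
    ("ex:knows", "rdfs:range", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Arthur", "ex:knows", "ex:Ford"),
}

inferred = rdfs_closure(facts) - facts
print(sorted(inferred))
```

From a single assertion, the types of both persons are inferred through domain and range, and the class hierarchy adds the more general type.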

More expressive reasoning can be done with OWL (Lite < DL < Full); however, we seldom use it due to its higher time complexity.

Mapping to RDF

Often we have to work with semi-structured data sources such as CSV, XML or JSON which are not expressed in RDF. In such cases, we can map data to RDF statements in order to work with it using semantic technologies. Using a programming language to implement the mapping can take much effort and time. Therefore, a mapping language can be useful to define how sources shall be mapped to RDF. A popular one is the RDF Mapping Language (RML) in combination with an RML engine (RML processor, RML mapper).

RML Mapper is a Java implementation of an RML engine. Since it includes old Jena dependencies, we exclude them.


In rdflib there seems to be no RML engine implementation. As an alternative in Python, you may consider pyrml.

RML is also expressed in RDF (there is also a YAML syntax and an editor). Since RML is an extension of R2RML (for mapping relational databases), its concepts are often reused.

@prefix rr: <>.
@prefix rml: <>.
@prefix ql: <>.
@prefix xsd: <>.
@prefix foaf: <>.

RML's basic idea is to refer to logical sources and define how subjects and their predicate-object pairs shall be mapped. Use templates with variable references {…} or direct references. We use the blank node syntax […] to define resources without giving them URIs.

HG2G.csv Input Data


The first row is usually the header. Use the column names to refer to the columns.

RML Mapping File

@prefix : <file://data/HG2G-csv.rml.ttl#>.

:mapping a rr:TriplesMap;
    rml:logicalSource [
        rml:source "HG2G.csv" ;
        rml:referenceFormulation ql:CSV
    ];

    rr:subjectMap [
        rr:template "file://data/HG2G.csv.ttl#{id}";
        rr:class foaf:Person
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:firstName;
        rr:objectMap [
            rml:reference "fn"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:lastName;
        rr:objectMap [
            rml:reference "ln"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:age;
        rr:objectMap [
            rml:reference "age";
            rr:datatype xsd:int
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:knows;
        rr:objectMap [
            rr:template "file://data/HG2G.csv.ttl#{knows}"
        ]
    ].
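To make clear what such a mapping produces, the following plain-Python sketch performs the same kind of mapping by hand. It is not an RML engine. The CSV content is a hypothetical stand-in (the column names follow the references in the mapping, the values are assumptions), and predicates are written as prefixed names for brevity.

```python
import csv
import io

# Hypothetical input data with the columns referenced in the mapping.
CSV_TEXT = """id,fn,ln,age,knows
1,Arthur,Dent,35,2
2,Ford,Prefect,38,1
"""

BASE = "file://data/HG2G.csv.ttl#"


def map_rows(text):
    """Create triples per row, mirroring the subject and predicate-object maps."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["id"]                                    # rr:template "...#{id}"
        triples.append((subject, "rdf:type", "foaf:Person"))          # rr:class
        triples.append((subject, "foaf:firstName", row["fn"]))        # rml:reference "fn"
        triples.append((subject, "foaf:lastName", row["ln"]))         # rml:reference "ln"
        triples.append((subject, "foaf:age", int(row["age"])))        # rr:datatype xsd:int
        triples.append((subject, "foaf:knows", BASE + row["knows"]))  # rr:template "...#{knows}"
    return triples


triples = map_rows(CSV_TEXT)
for triple in triples:
    print(triple)
```

A mapping language states the same rules declaratively, so that they do not have to be reimplemented for every data source.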

HG2G.xml Input Data

    <person id="1" fn="Arthur" ln="Dent" age="35">
    <person id="2" fn="Ford" ln="Prefect" age="38">
    <person id="3" fn="Marvin"  age="4274936">

Use XPath to iterate and refer to XML elements and attribute values.

RML Mapping File

@prefix : <file://data/HG2G-xml.rml.ttl#>.

:mapping a rr:TriplesMap;
    rml:logicalSource [
        rml:source "HG2G.xml" ;
        rml:referenceFormulation ql:XPath;
        rml:iterator "/hg2g/person";
    ];
    rr:subjectMap [
        rr:template "file://data/HG2G.xml.ttl#{@id}";
        rr:class foaf:Person
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:firstName;
        rr:objectMap [
            rml:reference "@fn"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:lastName;
        rr:objectMap [
            rml:reference "@ln"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:age;
        rr:objectMap [
            rml:reference "@age";
            rr:datatype xsd:int
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:knows;
        rr:objectMap [
            rr:template "file://data/HG2G.xml.ttl#{knows}"
        ]
    ].

HG2G.json Input Data

[
    {
        "id": "1",
        "fn": "Arthur",
        "ln": "Dent",
        "age": 35,
        "knows": [{ "id": "2" }, { "id": "3" }]
    },
    {
        "id": "2",
        "fn": "Ford",
        "ln": "Prefect",
        "age": 38,
        "knows": [{ "id": "1" }, { "id": "3" }]
    },
    {
        "id": "3",
        "fn": "Marvin",
        "ln": "",
        "age": 4274936,
        "knows": [{ "id": "1" }, { "id": "2" }]
    }
]

Use JSONPath to iterate and refer to JSON elements and values.

RML Mapping File

@prefix : <file://data/HG2G-json.rml.ttl#>.

:mapping a rr:TriplesMap;
    rml:logicalSource [
        rml:source "HG2G.json" ;
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$.*";
    ];
    rr:subjectMap [
        rr:template "file://data/HG2G.json.ttl#{id}";
        rr:class foaf:Person
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:firstName;
        rr:objectMap [
            rml:reference "fn"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:lastName;
        rr:objectMap [
            rml:reference "ln"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:age;
        rr:objectMap [
            rml:reference "age";
            rr:datatype xsd:int
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:knows;
        rr:objectMap [
            rr:template "file://data/HG2G.json.ttl#{knows.*.id}"
        ]
    ].

In Java we use RML Mapper to perform the mapping.

import be.ugent.rml.Executor;
import be.ugent.rml.functions.FunctionLoader;
import be.ugent.rml.records.RecordsFactory;
import be.ugent.rml.store.QuadStore;
import be.ugent.rml.store.RDF4JStore;
import org.eclipse.rdf4j.rio.RDFFormat;

// load the mapping file into a store
RDF4JStore rmlStore = new RDF4JStore();
rmlStore.read(new FileInputStream(mappingFile), null, RDFFormat.TURTLE);
// input files are resolved relative to the folder of the mapping file
RecordsFactory factory = new RecordsFactory(mappingFile.getParentFile().getAbsolutePath());
QuadStore outputStore = new RDF4JStore();
FunctionLoader functionLoader = new FunctionLoader();
Executor executor = new Executor(rmlStore, factory, functionLoader, outputStore, null);
QuadStore qs = executor.execute(null);
qs.write(new FileWriterWithEncoding(outFile, StandardCharsets.UTF_8), "turtle");

However, often we have to transform (convert) data, since it is not in the shape we need it. Therefore, RML allows us to invoke functions from a programming language during the mapping.

First, we implement a static method in Java in a separate project and build a JAR file functions-1.0.0-SNAPSHOT.jar.

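package de.uni.kl.dfki.mschroeder.tutorial.semtec;

import java.time.LocalDateTime;

public class Functions {

    // estimates the year of birth from the given age (assumption: based on the current year)
    public static Integer getYear(Integer age) {
        return LocalDateTime.now().getYear() - age;
    }

}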

Second, we have to describe the function using the Function Ontology (FnO). This definition can be generated automatically using Java reflection.

@prefix fnoi:  <> .
@prefix fno:   <> .
@prefix fnom:  <> .
@prefix fnml:  <> .
@prefix xsd:   <> .
@prefix rdfs:  <> .

<java:de.uni.kl.dfki.mschroeder.tutorial.semtec.Functions.getYear>
        a            fno:Function ;
        rdfs:label   "getYear" ;
        fno:expects  ( <java:parameter.integer.0> ) ;
        fno:returns  ( <java:return.integer> ) .

<java:de.uni.kl.dfki.mschroeder.tutorial.semtec.Functions>
        a                fnoi:JavaClass ;
                "functions/target/functions-1.0.0-SNAPSHOT.jar" ; # note: has to be an absolute path
        fnoi:class-name  "de.uni.kl.dfki.mschroeder.tutorial.semtec.Functions" .

[ a                   fno:Mapping ;
  fno:function        <java:de.uni.kl.dfki.mschroeder.tutorial.semtec.Functions.getYear> ;
  fno:implementation  <java:de.uni.kl.dfki.mschroeder.tutorial.semtec.Functions> ;
  fno:methodMapping   [ a                 fnom:StringMethodMapping ;
                        fnom:method-name  "getYear"
                      ]
] .

<java:parameter.integer.0>
        a              fno:Parameter ;
        rdfs:label     "Input Integer 0" ;
        fno:predicate  <java:parameter.predicate.integer.0> ;
        fno:required   true ;
        fno:type       xsd:int .

<java:return.integer>
        a              fno:Output ;
        rdfs:label     "Output Integer" ;
        fno:predicate  <java:return.predicate.integer> ;
        fno:type       xsd:int .

The function definition has to be loaded.

QuadStore fnoQuadStore = new RDF4JStore();
fnoQuadStore.read(new FileInputStream(fnoFile), null, RDFFormat.TURTLE);
FunctionLoader functionLoader = new FunctionLoader(fnoQuadStore);

This way we can use it in the mapping.

@prefix dbo: <> .
@prefix :    <file://data/HG2G-csv-fno.rml.ttl#>.

:mapping a rr:TriplesMap;
    rml:logicalSource [
        rml:source "HG2G.csv" ;
        rml:referenceFormulation ql:CSV
    ];
    rr:subjectMap [
        rr:template "file://data/HG2G.csv.ttl#{id}";
        rr:class foaf:Person
    ];
    rr:predicateObjectMap [
        rr:predicate dbo:birthYear;
        rr:objectMap [
            a fnml:FunctionMap ;
            fnml:functionValue  [
                rr:predicateObjectMap  [
                    rr:predicate  fno:executes ;
                    rr:object     <java:de.uni.kl.dfki.mschroeder.tutorial.semtec.Functions.getYear>
                ] ;
                rr:predicateObjectMap [
                    rr:predicate  <java:parameter.predicate.integer.0> ;
                    rr:objectMap  [ rml:reference  "age" ]
                ]
            ] ;
            rr:datatype xsd:int
        ]
    ].


A patch consists of changes which usually update, fix or improve computer programs. However, patches can also be used to update datasets, in our case, RDF graphs.

The open source library RDF Delta implements a patching mechanism for RDF in its RDF Patch module. Since it includes old Jena dependencies, we exclude them. Version 0.8.2 is the latest version compatible with Java 8.


For Python's rdflib there seems to be no library which supports patching RDF.

Suppose a knowledge engineer (KE) downloads the following subset of an RDF graph in order to edit it locally.

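@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .

<pimo:1549008225287:24>
        skos:prefLabel              "Graph Embedings" ;
        <pimo:thing#hasSuperTopic>  <pimo:1452006681033:5> .

<pimo:1342797660296:20>
        skos:prefLabel  "Artificial Intelligence" .

<pimo:1452006681033:5>
        skos:prefLabel  "Peer-to-Peer" .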

To observe which changes happen to the RDF model, an implementation of the RDFChanges interface is necessary (in our case RDFChangesCollector). RDF patches are able to store meta data in the form of header information. During the session, the KE corrects the typo in Graph Embeddings and states the correct super topic relation. Instead of Java code, this could also be done in a GUI.

import org.seaborne.patch.changes.RDFChangesCollector;
import org.seaborne.patch.RDFPatchOps;
import org.seaborne.patch.RDFPatch;
import org.seaborne.patch.items.ChangeItem;
import org.apache.jena.graph.NodeFactory;

RDFChangesCollector collector = new RDFChangesCollector();
Graph observedGraph = RDFPatchOps.changes(model.getGraph(), collector);
Model observedModel = ModelFactory.createModelForGraph(observedGraph);

collector.header("authKey", NodeFactory.createLiteral("43538d81-4027-4106-9a5f-063405827d82"));

observedModel.removeAll(graphEmbeddings, hasSuperTopic, null);
observedModel.add(graphEmbeddings, hasSuperTopic, artificialIntelligence);
observedModel.removeAll(graphEmbeddings, SKOS.prefLabel, null);
observedModel.add(graphEmbeddings, SKOS.prefLabel, "Graph Embeddings");

String patchSyntax = RDFPatchOps.str(collector.getRDFPatch());
RDFPatchStored patch = (RDFPatchStored) RDFPatchOps.read(new ByteArrayInputStream(patchSyntax.getBytes()));
List<ChangeItem> changeItems = patch.getActions();

The patch can be serialized for storing in a document or transferring to another system. Conversely, serialized patches can be parsed and accessed in-memory using Java Beans. A patch document lists header entries (H) and which triples are added (A) and which are deleted (D).

H authkey "43538d81-4027-4106-9a5f-063405827d82" .
D <pimo:1549008225287:24> <pimo:thing#hasSuperTopic> <pimo:1452006681033:5> .
A <pimo:1549008225287:24> <pimo:thing#hasSuperTopic> <pimo:1342797660296:20> .
D <pimo:1549008225287:24> <http://www.w3.org/2004/02/skos/core#prefLabel> "Graph Embedings" .
A <pimo:1549008225287:24> <http://www.w3.org/2004/02/skos/core#prefLabel> "Graph Embeddings" .
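
Because the patch document is line-based, its structure can be illustrated without RDF Delta. The following is only a minimal sketch using the Java standard library: the class PatchLineSketch and its fields are hypothetical names introduced for illustration, and the code merely groups lines by their leading operation code instead of parsing RDF terms. For real use, the RDF Patch module shown above should be preferred.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of RDF Delta): groups the lines of a
// patch document by their leading operation code H, A or D.
final class PatchLineSketch {
    final List<String> headers = new ArrayList<>();
    final List<String> added = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    static PatchLineSketch read(String patchText) {
        PatchLineSketch result = new PatchLineSketch();
        for (String rawLine : patchText.split("\\R")) {
            String line = rawLine.trim();
            // only single-letter codes followed by a space are handled here
            if (line.length() < 2 || line.charAt(1) != ' ') {
                continue;
            }
            // drop the operation code and the terminating dot
            String body = line.substring(2).trim();
            if (body.endsWith(".")) {
                body = body.substring(0, body.length() - 1).trim();
            }
            switch (line.charAt(0)) {
                case 'H': result.headers.add(body); break;
                case 'A': result.added.add(body); break;
                case 'D': result.deleted.add(body); break;
                default: break; // any other code is ignored in this sketch
            }
        }
        return result;
    }
}
```

Calling PatchLineSketch.read(patchSyntax) on the serialized patch would then give separate lists for the header entries, the added triples and the deleted triples.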

Linked Data Fragments

Performing complex SPARQL queries on a single server can lead to high server costs, low availability and bandwidth. To redistribute the load between clients and servers, one solution is to only allow querying fragments (isolated, incomplete parts of an RDF graph), so-called Linked Data Fragments. The basic idea is that the server only evaluates single triple patterns, the building blocks of Basic Graph Patterns (BGP), while the client filters, joins, etc. the results. A triple pattern has the form (s, p, o), where each of the three positions is either a concrete RDF term or a variable.


To implement a Triple Pattern Fragments server, developers can consult the unofficial draft of the Triple Pattern Fragments specification. Given a dataset hosted on a server, a (possibly paginated) fragment is an RDF document response to a triple pattern request by a client. A fragment response basically consists of two parts:

Meta Data
Meta data consists of information about the dataset (e.g. how it is queried) and the fragment (e.g. name, size, pagination). To state this, the vocabulary of Hypermedia-Driven Web APIs (Hydra) [References in Java] and the Vocabulary of Interlinked Datasets (VoID) are used.
Data
The (possibly paginated) actual triples which match a given triple pattern.
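
The division of labour between server and client can be illustrated without any RDF library. The following is a minimal sketch under stated assumptions, not the server of this tutorial: the class TriplePatternSketch, the toy dataset and all ex: identifiers are made up, triples are plain string arrays, and null stands for a variable. The "server" method only answers single triple patterns, while the "client" method joins two fragments itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TriplePatternSketch {
    // toy dataset; all identifiers are invented for illustration
    static final List<String[]> TRIPLES = Arrays.asList(
        new String[] {"ex:adams", "ex:created", "ex:hg2g"},
        new String[] {"ex:hg2g", "ex:type", "ex:Novel"},
        new String[] {"ex:someone", "ex:created", "ex:something"}
    );

    // "server" side: evaluates exactly one triple pattern, null means variable
    static List<String[]> fragment(String s, String p, String o) {
        List<String[]> result = new ArrayList<>();
        for (String[] t : TRIPLES) {
            boolean matches = (s == null || s.equals(t[0]))
                           && (p == null || p.equals(t[1]))
                           && (o == null || o.equals(t[2]));
            if (matches) {
                result.add(t);
            }
        }
        return result;
    }

    // "client" side: joins the patterns (?x ex:created ?work) and
    // (?work ex:type ex:Novel) by requesting one fragment per binding of ?work
    static List<String> creatorsOfNovels() {
        List<String> creators = new ArrayList<>();
        for (String[] created : fragment(null, "ex:created", null)) {
            if (!fragment(created[2], "ex:type", "ex:Novel").isEmpty()) {
                creators.add(created[0]);
            }
        }
        return creators;
    }
}
```

The join needs one request for the first pattern plus one request per intermediate result, which is the price the client pays for keeping the server simple.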

To implement a server, the micro framework Spark is used in the example below. Of course, any HttpServlet implementation can be used alternatively, for example Jetty.


A Triple Pattern Fragments server has to provide only one HTTP GET route to let clients query fragments. The major part of the source code is about forming the fragment's meta data. Since meta data is passed in a separate RDF graph (data in default graph), the RDF serialization format TriG is used. Optional authentication information can be passed in the request's header.

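import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.jena.graph.Node;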
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.sparql.util.NodeFactoryExtra;
import org.apache.jena.sparql.vocabulary.FOAF;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;
import org.apache.jena.vocabulary.VOID;
import spark.Request;
import spark.Response;
import spark.Spark;

public static void main(String[] args) {
    Spark.get("/fragment", TriplePatternFragments::getFragment);
}

private static Object getFragment(Request req, Response resp) {
    //meta data
    Model metadata = ModelFactory.createDefaultModel();
    Resource metadataGraph = metadata.createResource(req.url() + "#metadata");
    Resource fragment = metadata.createResource(req.url());
    Resource hydraSearch = metadata.createResource();
    Map<String, Property> hydraMap = new HashMap<>();
    hydraMap.put("s", RDF.subject);
    hydraMap.put("p", RDF.predicate);
    hydraMap.put("o", RDF.object);

    //the primary topic of the meta data graph is the fragment
    metadata.add(metadataGraph, FOAF.primaryTopic, fragment);

    Resource dataset = metadata.createResource("http://localhost:8081/fragment#dataset");
    metadata.add(dataset, RDF.type, VOID.Dataset);
    metadata.add(dataset, RDF.type, Hydra.Collection);
    //the fragment is a subset of the whole dataset
    metadata.add(dataset, VOID.subset, fragment);
    //explain how the dataset can be searched
    metadata.add(dataset, Hydra.search, hydraSearch);
    //explain the template to search
    metadata.add(hydraSearch, Hydra.template, "http://localhost:8081/fragment{?s,p,o}");
    //map URL query param keys to RDF properties
    for(Entry<String, Property> entry : hydraMap.entrySet()) {
        Resource mapping = metadata.createResource();
        metadata.add(hydraSearch, Hydra.mapping, mapping);
        metadata.add(mapping, Hydra.variable, entry.getKey());
        metadata.add(mapping, Hydra.property, entry.getValue());
    }

    String s = req.queryParams("s");
    String p = req.queryParams("p");
    String o = req.queryParams("o");
    Model dataModel = evaluateTriplePattern(s, p, o);

    //meta data about the fragment
    metadata.add(fragment, RDF.type, Hydra.PartialCollectionView);
    //a fragment's subset is itself
    metadata.add(fragment, VOID.subset, fragment);
    long totalTriples = dataModel.size();
    metadata.addLiteral(fragment, VOID.triples, totalTriples);
    metadata.addLiteral(fragment, Hydra.totalItems, totalTriples);

    Dataset ds = DatasetFactory.create();
    ds.getDefaultModel().setNsPrefix("hydra", Hydra.NS);
    ds.getDefaultModel().setNsPrefix("rdf", RDF.uri);
    ds.getDefaultModel().setNsPrefix("void", VOID.NS);
    //graph about meta data
    ds.addNamedModel(metadataGraph.getURI(), metadata);
    //default graph is data graph
    ds.getDefaultModel().add(dataModel);

    //write in trig format
    StringWriter trigSW = new StringWriter();
    RDFDataMgr.write(trigSW, ds, Lang.TRIG);
    resp.header("Access-Control-Allow-Origin", "*");
    return trigSW.toString();
}

The evaluateTriplePattern method is the procedure which actually evaluates a triple pattern (s,p,o) and returns matching RDF triples. Since object o can also be a literal (indicated by a starting quote), we have to parse it using a regular expression.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.jena.rdf.model.RDFNode;

private static final Pattern STRINGPATTERN = Pattern.compile("^\"(.*)\"(?:@(.*)|\\^\\^<?([^<>]*)>?)?$");

private static Model evaluateTriplePattern(String s, String p, String o) {
    Model dataModel = ModelFactory.createDefaultModel();

    //a null value means that the position is a variable
    Resource subject = null;
    Property predicate = null;
    RDFNode object = null;
    if(s != null) {
        subject = dataModel.createResource(s);
    }
    if(p != null) {
        predicate = dataModel.createProperty(p);
    }
    if(o != null) {
        if(o.startsWith("\"")) {
            Matcher matcher = STRINGPATTERN.matcher(o);
            if (matcher.matches()) {
                //groups: 1 = lexical form, 2 = language tag, 3 = datatype URI
                String label   = matcher.group(1);
                String langTag = matcher.group(2);
                String typeURI = matcher.group(3);

                if (langTag != null)
                    object = dataModel.createLiteral(label, langTag);
                else if (typeURI != null)
                    object = dataModel.createTypedLiteral(label, typeURI);
                else
                    object = dataModel.createLiteral(label);

            } else {
                throw new RuntimeException("Literal parsing failed for " + o);
            }
        } else {
            object = dataModel.createResource(o);
        }
    }

    //TODO match subject, predicate, object and fill dataModel

    return dataModel;
}
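
To see what the regular expression captures, the following minimal sketch applies the same pattern to three kinds of literals using only the Java standard library. The class LiteralParsingSketch and its parse method are hypothetical names for illustration. The method returns the lexical form, the language tag and the datatype URI, where the latter two are null if absent.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralParsingSketch {
    // same expression as STRINGPATTERN in the listing above
    static final Pattern STRINGPATTERN =
        Pattern.compile("^\"(.*)\"(?:@(.*)|\\^\\^<?([^<>]*)>?)?$");

    // returns {lexical form, language tag or null, datatype URI or null}
    static String[] parse(String o) {
        Matcher matcher = STRINGPATTERN.matcher(o);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Literal parsing failed for " + o);
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "\"plain\"",                                       // plain literal
            "\"Graph Embeddings\"@en",                         // language-tagged literal
            "\"42\"^^<http://www.w3.org/2001/XMLSchema#int>"   // typed literal
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + Arrays.toString(parse(input)));
        }
    }
}
```

Note that the expression does not handle escaped quotes inside the lexical form in any special way; it simply takes everything up to the last quote.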

A similar implementation can be achieved with flask and rdflib. There is also a Python implementation on GitHub.

An empty response looks like the following (see also a DBpedia example).

@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
@prefix void:  <http://rdfs.org/ns/void#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .

# extra graph for meta data
<http://localhost:8081/fragment#metadata> {

# the primary topic of the meta data graph is the fragment
<http://localhost:8081/fragment#metadata>
        foaf:primaryTopic  <http://localhost:8081/fragment> .

# information about the dataset
<http://localhost:8081/fragment#dataset>
        a            void:Dataset , hydra:Collection ;
        void:subset  <http://localhost:8081/fragment> ;
        # how the dataset is queried
        hydra:search      [
                            hydra:template  "http://localhost:8081/fragment{?s,p,o}" ;
                            hydra:mapping   [ hydra:property  rdf:subject ;
                                              hydra:variable  "s"
                                            ] ;
                            hydra:mapping   [ hydra:property  rdf:predicate ;
                                              hydra:variable  "p"
                                            ] ;
                            hydra:mapping   [ hydra:property  rdf:object ;
                                              hydra:variable  "o"
                                            ]
                          ] .

# information about the fragment
<http://localhost:8081/fragment>
        a                 hydra:PartialCollectionView ;
        void:subset       <http://localhost:8081/fragment> ;
        void:triples      "0"^^xsd:int ;
        hydra:totalItems  "0"^^xsd:int .

}



A JavaScript-based client is Comunica which can be tried online (e.g. to query the localhost server).
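
To illustrate what such a client does with the hydra:template from the meta data, the following minimal sketch builds a request URL for a triple pattern. It is an assumption-laden illustration, not Comunica's implementation: the class FragmentUrlSketch and its method are hypothetical, only the form-style template {?s,p,o} is supported, and values are encoded with java.net.URLEncoder, which is close to but not identical to the percent-encoding of URI templates (for example, spaces become +).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FragmentUrlSketch {
    // expands base{?s,p,o}: bound positions become query parameters,
    // null positions (variables) are left out
    static String fragmentUrl(String base, String s, String p, String o) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("s", s);
        params.put("p", p);
        params.put("o", o);

        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (entry.getValue() == null) {
                continue;
            }
            url.append(separator)
               .append(entry.getKey())
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

For example, fragmentUrl("http://localhost:8081/fragment", null, "http://example.org/p", null) asks for all triples with the (made-up) predicate http://example.org/p, while passing null for all three positions requests the whole dataset.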