Data Sprout
Dataset Generation Patterns for Evaluating Knowledge Graph Construction


Confidentiality hinders the publication of authentic, labeled datasets of personal and enterprise data, although such datasets would be useful for evaluating knowledge graph construction approaches in industrial scenarios. We therefore plan to generate such data synthetically so that it appears as authentic as possible. We assume that knowledge workers follow certain habits when they produce or manage data, so generation patterns can be discovered and used by data generators to imitate real datasets. We derived such generation patterns from real industrial spreadsheets and demonstrate a suitable generator, Data Sprout (Code@GitHub), that is able to reproduce them.

YouTube Video: ESWC 2021 Demo Paper Teaser



The publicly available online demo has been offline since 2023-09-15. If you would like to try it anyway, contact me at markus [dot] schroeder [at] dfki [dot] de.


The source code of the generator can be found on GitHub.

You can also download pre-generated datasets.