String Analyzer






Description



String Analyzer provides analytic components for processing strings from arbitrary data sources. The library is designed to analyze short character strings (text snippets), and its features have been used on several occasions in knowledge graph construction tasks.

Authors


Documentation

String Entity

A string entity is like java.lang.String (a character string), but additionally carries an id.
//entity with id and value
StringEntity mschroeder = new StringEntity("https://www.dfki.uni-kl.de/~mschroeder", "Markus Schröder");

//entity with UUID as id
StringEntity se = StringEntity.withRandomUUID("a value");
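
To illustrate why an id next to the value is useful, here is a minimal sketch of such an id/value pair. The class EntitySketch is a hypothetical stand-in written for this example and is not the library's StringEntity; only the idea of withRandomUUID (generating an id when the data source provides none) is borrowed from the snippet above.

```java
import java.util.UUID;

// Hypothetical stand-in for illustration; NOT the library's StringEntity class.
final class EntitySketch {
    private final String id;
    private final String value;

    EntitySketch(String id, String value) {
        this.id = id;
        this.value = value;
    }

    // Mirrors the idea of withRandomUUID: generate an id when the source has none.
    static EntitySketch withRandomUUID(String value) {
        return new EntitySketch(UUID.randomUUID().toString(), value);
    }

    String getId() { return id; }
    String getValue() { return value; }

    public static void main(String[] args) {
        EntitySketch a = EntitySketch.withRandomUUID("a value");
        EntitySketch b = EntitySketch.withRandomUUID("a value");
        // Same value, but the ids keep the two occurrences apart.
        System.out.println(a.getValue().equals(b.getValue())); // true
        System.out.println(a.getId().equals(b.getId()));       // false
    }
}
```

With plain strings the two occurrences of "a value" would be indistinguishable; with an id each occurrence can be traced back to where it came from.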

Own Component

To create your own String Analyzer (SA) component, extend StringAnalyzerComponent and override add to process each entity.
public class SAExampleComponent extends StringAnalyzerComponent {
    @Override
    public void add(StringEntity entity) {
        String id = entity.getId();
        String value = entity.getValue();

        //process entity here
    }
}
Register the component with the string analyzer, add some strings, and retrieve the component by its class.
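public static void main(String[] args) {
    //build the analyzer, register the component and feed it plain strings
    StringAnalyzer sa = StringAnalyzer
            .analyze()
            .with(new SAExampleComponent())
            .add(
                "text",
                "other text"
            );

    //retrieve the registered component by its class
    SAExampleComponent comp = sa.getComponent(SAExampleComponent.class);
}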
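
As an idea of what could replace the "process entity here" placeholder, the following sketch keeps track of the entity with the longest value. LongestValueSketch is a hypothetical class that deliberately does not depend on the library, so it takes id and value directly; inside a real component the same logic would presumably sit in the overridden add(StringEntity).

```java
// Hypothetical processing logic; in the library it would sit inside the
// overridden add(StringEntity) of a component such as SAExampleComponent.
final class LongestValueSketch {
    private String longestId;
    private String longestValue;

    // Called once per entity; remembers the entity with the longest non-null value.
    void add(String id, String value) {
        if (value == null) {
            return; // values may be null, so guard before calling length()
        }
        if (longestValue == null || value.length() > longestValue.length()) {
            longestId = id;
            longestValue = value;
        }
    }

    String getLongestId() { return longestId; }
    String getLongestValue() { return longestValue; }

    public static void main(String[] args) {
        LongestValueSketch comp = new LongestValueSketch();
        comp.add("1", "text");
        comp.add("2", "other text");
        comp.add("3", null);
        System.out.println(comp.getLongestId() + " -> " + comp.getLongestValue());
        // prints: 2 -> other text
    }
}
```

The component holds its own state, so the result can be read from it after all strings have been added.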

String Statistics

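Enable the statistics component with withStatistics(), add some values, and read the aggregated statistics afterwards.
StringAnalyzer sa = StringAnalyzer.analyze().withStatistics();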

sa.add(StringEntity.withRandomUUID(null));
sa.add("");
sa.add("string");
sa.add("text");
sa.add("string");
sa.add(" \n ");
SAStatistics statistics = sa.getStatistics();
statistics.getCount();
statistics.getCountEmpty();
statistics.getCountNull();
statistics.getCountTrimEmpty();
statistics.getDistinctCount();
statistics.getDistinctCountOrdered();
statistics.getDistinctness();
statistics.getLengthStat();
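
The getters above suggest simple counters over the added values. The following self-contained sketch recomputes a few of them for the same six inputs. The definitions are assumptions made for this example (null counts as one distinct value, an empty string also counts as trim-empty, distinctness is distinct count divided by total count) and are not taken from the library's SAStatistics; StatisticsSketch is a hypothetical class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical re-computation of the counters; the definitions are assumptions,
// not taken from the library's SAStatistics implementation.
final class StatisticsSketch {
    final int count;
    final int countNull;
    final int countEmpty;
    final int countTrimEmpty;
    final int distinctCount;

    StatisticsSketch(List<String> values) {
        int nulls = 0;
        int empty = 0;
        int trimEmpty = 0;
        Set<String> distinct = new HashSet<>(); // HashSet permits a null element
        for (String v : values) {
            if (v == null) {
                nulls++;
            } else {
                if (v.isEmpty()) empty++;            // exactly ""
                if (v.trim().isEmpty()) trimEmpty++; // "" or whitespace only
            }
            distinct.add(v);
        }
        this.count = values.size();
        this.countNull = nulls;
        this.countEmpty = empty;
        this.countTrimEmpty = trimEmpty;
        this.distinctCount = distinct.size();
    }

    // Assumed definition: share of distinct values among all values.
    double getDistinctness() {
        return count == 0 ? 0.0 : (double) distinctCount / count;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        StatisticsSketch s = new StatisticsSketch(
                Arrays.asList(null, "", "string", "text", "string", " \n "));
        System.out.println("count=" + s.count + " null=" + s.countNull
                + " empty=" + s.countEmpty + " trimEmpty=" + s.countTrimEmpty
                + " distinct=" + s.distinctCount);
        // prints: count=6 null=1 empty=1 trimEmpty=2 distinct=5
    }
}
```

Under these assumptions the distinctness for the six inputs is 5/6; the library may define the counters differently, for example by excluding null from the distinct values.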