Texana

Intranet Service

A Texana server should be running on pc-4336 (pc-4336.kl.dfki.de, port 8483). To perform Named Entity Recognition (NER) with the Named Entity Classifier for Wikidata (NECKAr) dataset, send a POST request to its search endpoint:

curl --request POST \
  --url http://pc-4336.kl.dfki.de:8483/neckar/search \
  --header 'content-type: application/json' \
  --data '{
	"body": "Douglas Adams and George Carlin"
}'

Another example is given under Perform Named Entity Recognition (NER).

Build

Building the projects produces texana-server.jar in the server project.

Configure

Put a config.json file in the working directory of the server. In this file, the Finite State Transducers (FSTs) are configured in a JSON array named fst. Every FST needs an id that identifies it. The file is read when the server starts.

{
    "fst": [
    ]
}
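Because the file is only read at start-up, it can help to check it before launching the server. The following is a minimal sketch, not part of Texana: the function name validate_config is hypothetical, and treating ids as unique is an assumption derived from "every FST should have an id to identify it".

```python
import json


def validate_config(text):
    """Parse config.json text and return the FST ids it declares.

    Raises ValueError if "fst" is not an array, if an entry has no id,
    or if an id occurs twice (uniqueness is an assumption, see above).
    """
    config = json.loads(text)
    fsts = config.get("fst")
    if not isinstance(fsts, list):
        raise ValueError('"fst" must be a JSON array')
    ids = []
    for index, fst in enumerate(fsts):
        fst_id = fst.get("id") if isinstance(fst, dict) else None
        if not fst_id:
            raise ValueError(f"FST entry {index} has no id")
        if fst_id in ids:
            raise ValueError(f"duplicate FST id: {fst_id}")
        ids.append(fst_id)
    return ids
```

Calling validate_config(open("config.json").read()) in the server's working directory returns the list of configured ids, or raises before the server is ever started.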

Named Entity Recognition (NER)

We use data from the Named Entity Classifier for Wikidata (NECKAr) to load named entities. Download WikidataNE_20170320_NECKAR_1_0.json_.gz and add the FST to config.json. Loading it also creates a neckar.sqlite database that stores metadata about the loaded entities.

{
    "fst": [
        {
            "id": "neckar",
            "reader": "NECKArMultiFST",
            "path": "WikidataNE_20170320_NECKAR_1_0.json_.gz",
            "serializationFile": "WikidataNE_20170320_NECKAR_1_0.serial"
        }
    ]
}

Option	Description
`path`	Location of the gzipped JSON file from NECKAr.
`serializationFile`	Location where the serialization file is stored. It speeds up loading the FST. Delete this file to force a reload.
`max`	Maximum number of entities to read from the gzipped NECKAr JSON file. Use a small value for testing (e.g. 5000).
`bulkSize`	Bulk size for inserting into the database.
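For a quick test run, a reduced configuration can be generated with a small script. This is only a sketch: the helper neckar_fst_entry is hypothetical, and it assumes that `max` and `bulkSize` are numeric keys placed in the same FST object as `path`, which the table above does not state explicitly.

```python
import json


def neckar_fst_entry(path, serialization_file, max_entities=None, bulk_size=None):
    """Build one entry for the "fst" array; optional keys are left out when unset."""
    entry = {
        "id": "neckar",
        "reader": "NECKArMultiFST",
        "path": path,
        "serializationFile": serialization_file,
    }
    if max_entities is not None:
        entry["max"] = max_entities  # assumed to sit next to "path"
    if bulk_size is not None:
        entry["bulkSize"] = bulk_size  # assumed to sit next to "path"
    return entry


config = {
    "fst": [
        neckar_fst_entry(
            "WikidataNE_20170320_NECKAR_1_0.json_.gz",
            "WikidataNE_20170320_NECKAR_1_0.serial",
            max_entities=5000,  # small value for testing, as suggested above
        )
    ]
}

# Print the result; redirect it into config.json in the server's working directory.
print(json.dumps(config, indent=4))
```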

Run

Start the server with the following command:

java -jar texana-server.jar

The server runs at http://localhost:8483. The config.json file is read on start-up. Wait until the console outputs `[id] is ready` for the FST you want to query.
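When the server is started from a script, the wait can be automated by watching the console output. The sketch below assumes that the ready message contains the FST id followed by "is ready" on a single line; the exact log format is not documented here, and both function names are hypothetical.

```python
import subprocess


def wait_until_ready(lines, fst_id):
    """Consume console lines until one reports that fst_id is ready.

    Returns True when such a line is seen, False if the output ends first.
    """
    for line in lines:
        if fst_id in line and "is ready" in line:
            return True
    return False


def start_server(jar="texana-server.jar", fst_id="neckar"):
    """Start the server and block until the given FST reports ready."""
    process = subprocess.Popen(
        ["java", "-jar", jar],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if not wait_until_ready(process.stdout, fst_id):
        process.wait()
        raise RuntimeError("server exited before reporting ready")
    # The caller should keep reading process.stdout afterwards,
    # otherwise the server may block once the pipe buffer fills up.
    return process
```

Loading the full NECKAr file can take a while on the first start, so no timeout is applied here; add one if the script must not hang.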

Perform Named Entity Recognition (NER)

curl --request POST \
  --url http://localhost:8483/neckar/search \
  --header 'content-type: application/json' \
  --data '{
	"body": "Bill Maher is an american stand-up comedian"
}'

Option	Description
`body`	The text in which named entities are searched.
`metadata`	If true, adds metadata from Wikidata to the result, such as occupation, gender, alias, etc.
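The same request can be sent from Python using only the standard library. This is a sketch under two assumptions: that the path segment before /search is the FST id from config.json, and that `metadata` is passed as a JSON boolean in the same object as `body`. The function names are hypothetical.

```python
import json
import urllib.request


def build_search_request(body, metadata=False,
                         base_url="http://localhost:8483", fst_id="neckar"):
    """Build (but do not send) the POST request for the search endpoint."""
    payload = {"body": body}
    if metadata:
        payload["metadata"] = True  # assumed to be a JSON boolean
    return urllib.request.Request(
        url=f"{base_url}/{fst_id}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(body, metadata=False):
    """Send the request to a running server and return the decoded JSON result."""
    request = build_search_request(body, metadata)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, search("Bill Maher is an american stand-up comedian", metadata=True) performs the same call as the curl command above, with metadata switched on.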

Output

{
  "size": 1,
  "resources": [
    {
      "coveredText": "Bill Maher",
      "from": 0,
      "id": 489,
      "to": 10,
      "lang": "de",
      "type": "PER"
    }
  ],
  "body": "Bill Maher is an american stand-up comedian"
}
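To work with the result, the resources can be mapped back onto the input text. The sketch below assumes that `from` is an inclusive and `to` an exclusive character offset into `body`, which matches the example above ("Bill Maher" spans 0 to 10); the function name extract_entities is hypothetical.

```python
import json


def extract_entities(response):
    """Return (coveredText, type, from, to) tuples from a search result.

    Raises ValueError if an offset pair does not reproduce coveredText,
    which would indicate that the offset assumption does not hold.
    """
    body = response["body"]
    entities = []
    for resource in response["resources"]:
        start, end = resource["from"], resource["to"]
        if body[start:end] != resource["coveredText"]:
            raise ValueError(f"offsets {start}-{end} do not match coveredText")
        entities.append((resource["coveredText"], resource["type"], start, end))
    return entities


# The example output from above, used as sample input.
sample = json.loads("""
{
  "size": 1,
  "resources": [
    {"coveredText": "Bill Maher", "from": 0, "id": 489,
     "to": 10, "lang": "de", "type": "PER"}
  ],
  "body": "Bill Maher is an american stand-up comedian"
}
""")

print(extract_entities(sample))
```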
