Text Indexing

Support

Text indexing has been available since version 1.0.4. It is reasonably well tested and is used in production in a few systems, but bugs may well remain.

Configuration

To configure text indexing, write triples to a graph called <system:config>, for example with:

4s-import $KB -m system:config path/to/config/file

An example config is:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix text: <http://4store.org/fulltext#> .
@prefix ex: <http://example.org/text#> .

rdfs:label text:index text:dmetaphone .
ex:token text:index text:token .
ex:stem text:index text:stem .

This means that objects of the predicate rdfs:label will be indexed with double metaphones, objects of ex:token will be indexed as plain text (lowercase) tokens, and objects of ex:stem will be indexed as stemmed tokens.
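If the set of indexed predicates is managed by a script, the config file can be generated rather than written by hand. The following is a minimal sketch, not part of 4store: the helper names build_config and import_command, the KB name "demo" and the file name text-config.ttl are all hypothetical. It only produces the Turtle shown above and the 4s-import command line from the Configuration section.

```python
import shlex

TEXT_NS = "http://4store.org/fulltext#"

# Index kinds that appear in the example config above.
INDEX_KINDS = {"dmetaphone", "token", "stem"}


def build_config(predicates):
    """Return Turtle that maps each predicate IRI to a text index kind."""
    lines = [f"@prefix text: <{TEXT_NS}> .", ""]
    for iri, kind in predicates.items():
        if kind not in INDEX_KINDS:
            raise ValueError(f"unknown index kind: {kind}")
        lines.append(f"<{iri}> text:index text:{kind} .")
    return "\n".join(lines) + "\n"


def import_command(kb, config_path):
    """Build the argument list that loads the config into <system:config>."""
    return ["4s-import", kb, "-m", "system:config", config_path]


config = build_config({
    "http://www.w3.org/2000/01/rdf-schema#label": "dmetaphone",
    "http://example.org/text#token": "token",
    "http://example.org/text#stem": "stem",
})
print(config)

# "demo" and "text-config.ttl" are hypothetical names (assumptions).
print(shlex.join(import_command("demo", "text-config.ttl")))
```

Writing the returned text to a file and running the printed command should have the same effect as importing a hand-written config. The sketch deliberately stops short of invoking 4s-import itself.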

The language tag on a literal selects which language's stemming algorithm is used, e.g.:

<> ex:stem "Alle Menschen sind frei und gleich an Würde und Rechten geboren. Sie sind mit Vernunft und Gewissen begabt und sollen einander im Geist der Brüderlichkeit begegnen."@de .

will be stemmed using a German stemming algorithm.
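When the data to be indexed is produced by a program, the language tag has to be attached to each literal at write time. Below is a small sketch of one way to emit such triples in Turtle. The helper names and the subject IRIs are hypothetical, and the assumption that an "en" tag selects an English stemmer is inferred from the German example above rather than taken from the 4store documentation.

```python
def turtle_literal(text, lang=None):
    """Format text as a Turtle string literal, optionally language-tagged."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal


def stem_triple(subject_iri, text, lang):
    """Return one Turtle triple using the ex:stem predicate from the config."""
    predicate = "<http://example.org/text#stem>"
    return f"<{subject_iri}> {predicate} {turtle_literal(text, lang)} ."


# Hypothetical subjects; the tag on each literal is what would pick the stemmer.
print(stem_triple("http://example.org/doc/1", "Alle Menschen sind frei", "de"))
print(stem_triple("http://example.org/doc/2", "All human beings are born free", "en"))
```

The resulting lines can be saved to a file and loaded with 4s-import like any other Turtle data.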

Some examples of text indexing, and of how to query it, are shown here: http://theno23.livejournal.com/17658.html