Ticket #102 (new enhancement)

Opened 4 years ago

Last modified 4 years ago

Optimisation for freetext search using regexs

Reported by: kjetil@… Owned by:
Priority: major Milestone:
Component: Unknown Version: 1.0
Keywords: Cc:

Description

I strongly advocated freetext search as a deliverable for the SPARQL WG, but there wasn't enough consensus to get it in. Also, my requirements were pretty simple and did not extend to the full XPath/XQuery requirements that many others have. I summarised my requirements in the following advocacy message:

http://lists.w3.org/Archives/Public/public-rdf-dawg/2009AprJun/0174.html

The WG (amongst them, a certain 4store developer :-) ) noted that these requirements could be met by transforming the search string to a regular expression in the application layer. Indeed they can. However, no engines are optimized for this, and it can potentially be much slower than a built-in free text index.

My feature request is thus that the engine optimizes for certain regular expressions.

If the user's search term is "dahut", then the regular expression boils down to:

\bdahut\S*

I hope this is doable.
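
For illustration, the application-layer transformation mentioned above might look roughly like the following sketch. The function name term_to_regex is hypothetical, and Python's re module is used only to demonstrate the pattern; the regex flavour an actual SPARQL engine accepts may differ.

```python
import re


def term_to_regex(term: str) -> str:
    """Turn a plain search term into a word-prefix regular expression.

    Hypothetical helper, not part of 4store or any SPARQL library.
    """
    # Escape regex metacharacters so user input is matched literally.
    return r"\b" + re.escape(term) + r"\S*"


pattern = term_to_regex("dahut")
print(pattern)  # \bdahut\S*

# The pattern matches words that start with the search term.
match = re.search(pattern, "Two dahuts were seen", re.IGNORECASE)
print(match.group(0) if match else None)
```

Because the pattern always has the shape word boundary, literal prefix, non-space tail, an engine could in principle recognise it and answer it from a word index instead of scanning every literal.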

Change History

comment:1 Changed 4 years ago by swh

What we're considering adding is something that will break up the values of certain predicates into words, e.g. if you configure 4store to index rdfs:label as words, and you write:

<x> rdfs:label "foo bar baz" .

Then we will generate:

<x> 4store:word "foo", "bar", "baz" .

so you can do a search like:

SELECT ?x WHERE {
  ?x 4store:word "foo" .
}
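
As a rough sketch of the word generation described above (not the actual 4store code), the following assumes the literal is simply split on whitespace; the real tokenisation rules are not specified here, and the function name word_triples is hypothetical.

```python
def word_triples(subject, literal, predicate="4store:word"):
    """Generate one (subject, predicate, word) triple per word of a literal.

    Assumption: words are separated by whitespace. A real indexer would
    probably also normalise case and strip punctuation.
    """
    return [(subject, predicate, word) for word in literal.split()]


for triple in word_triples("<x>", "foo bar baz"):
    print(triple)
```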

We'll probably add the option to generate metaphones and stemmed words too.

Would that be good enough for you?
