Ticket #102 (new enhancement)
Optimisation for freetext search using regexes
| Reported by: | kjetil@… | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Unknown | Version: | 1.0 |
| Keywords: | | Cc: | |
Description
I strongly advocated freetext search as a deliverable for the SPARQL WG, but there wasn't enough consensus to get it in. Also, my requirements were pretty simple and did not need the full XPath/XQuery feature set that many others require. I summarised my requirements in the following advocacy message:
http://lists.w3.org/Archives/Public/public-rdf-dawg/2009AprJun/0174.html
The WG (amongst them, a certain 4store developer :-) ) noted that these requirements could be met by transforming the search string to a regular expression in the application layer. Indeed they can. However, no engines are optimized for this, and it can potentially be much slower than the built-in free text index.
My feature request is thus that the engine optimizes for certain regular expressions.
If the user search term is "dahut", then the regular expression boils down to
\bdahut\S*
I hope this is doable.
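For illustration, the application-layer transformation mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `term_to_regex` and `sparql_regex_filter` and the variable name `label` are hypothetical, and whether `\b` is accepted depends on the regex flavour the engine uses.

```python
import re


def term_to_regex(term: str) -> str:
    """Turn a single user search term into the pattern \\bTERM\\S*.

    Hypothetical helper: the term is escaped so that any regex
    metacharacters typed by the user are matched literally.
    """
    return r"\b" + re.escape(term) + r"\S*"


def sparql_regex_filter(var: str, term: str) -> str:
    """Build a SPARQL FILTER clause applying the pattern to ?var.

    Backslashes and double quotes are escaped again because the
    pattern is embedded in a SPARQL string literal.
    """
    pattern = term_to_regex(term).replace("\\", "\\\\").replace('"', '\\"')
    return f'FILTER regex(str(?{var}), "{pattern}")'


if __name__ == "__main__":
    # Pattern for the example term used in this ticket.
    print(term_to_regex("dahut"))
    # The same pattern wrapped in a FILTER for a hypothetical ?label variable.
    print(sparql_regex_filter("label", "dahut"))
```

An engine that recognises patterns of this shape (word boundary, literal term, trailing non-whitespace) could in principle answer them from a word index instead of scanning every literal, which is the optimisation requested here.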

What we're considering adding is something that will break the values of certain predicates into words, e.g. if you configure 4store to index rdfs:label as words, and you write:
Then we will generate:
so you can do a search like:
SELECT ?x WHERE { ?x 4store:word "foo" . }
We'll probably add the option to generate metaphones and stemmed words too.
Would that be good enough for you?