Lucene oddity
1.3.0dev-rev9849
This is the first time I have worked with Lucene, so I don't know if (or how) this worked in previous versions of eXist.
I have the following configuration:
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:atom="http://www.w3.org/2005/Atom"
           xmlns:html="http://www.w3.org/1999/xhtml"
           xmlns:wiki="http://exist-db.org/xquery/wiki">
        <!-- Disable the standard full text index -->
        <fulltext default="none" attributes="no"/>
        <!-- Lucene index is configured below -->
        <lucene>
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
            <text match="*|@*"/>
        </lucene>
    </index>
</collection>
I run this XQuery:
import module namespace lucene = 'http://exist-db.org/xquery/lucene';
//SPEECH[lucene:query(., 'lord')]
And as you would expect, I get a bunch of <SPEECH/> elements back. Great! So far, so good.
However, if I change to this (and reindex):
<text match="//SPEECH//*"/>
I get nothing. Nada. Zip. Yes, I am reindexing consistently. I also had Paul Ryan look at it, and he's stumped (he's better with the deep eXist configuration than I am).
I'm pulling in both "Much Ado about Nothing" and "A Comedy of Errors". The text match element is straight from your documentation for Lucene, so I would expect it to work.
Any ideas on different things I could try to figure this out? Am I using it wrong?
Jason Smith
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open
Re: Lucene oddity
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:ifp="http://www.ifactory.com/press"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
        <fulltext default="none" attributes="no"/>
        <lucene>
            <text qname="SPEECH"/>
        </lucene>
    </index>
</collection>
Re: Lucene oddity
> <text match="//SPEECH//*"/>
The pattern syntax for match looks like XPath, but it is not. In particular, // is a bit counter-intuitive: match="//SPEECH//*" includes all descendant nodes of SPEECH, but not SPEECH itself. I will think about changing this, but I'm not yet sure how (maybe we should choose other separators than /, so it doesn't look like XPath).
Anyway, with your configuration,
//SPEECH[lucene:query(LINE, 'lord')]
should return matches while
//SPEECH[lucene:query(., 'lord')]
does not since SPEECH itself has no index. If you create an index on SPEECH, you can query SPEECH, but not its child LINE, and vice versa.
Wolfgang
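The rule described above (an index exists only on the nodes the pattern selects, and a query only finds hits on a context node that carries an index) can be illustrated with a small toy model. This is only a sketch in Python, not eXist or Lucene code: the sample `PLAY` data is made up, and `index_descendants_of`, `index_qname`, and `speeches_matching` are hypothetical helpers that mimic the behaviour as explained in this post, with a simple word split standing in for a real analyzer.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample data (not taken from the plays mentioned in the thread).
PLAY = ET.fromstring(
    "<SCENE>"
    "<SPEECH><SPEAKER>FIRST</SPEAKER><LINE>Good morrow, my lord.</LINE></SPEECH>"
    "<SPEECH><SPEAKER>SECOND</SPEAKER><LINE>Good morrow.</LINE></SPEECH>"
    "</SCENE>"
)


def tokens(node):
    """Lower-cased words of the node's text (a crude stand-in for an analyzer)."""
    return set(re.findall(r"\w+", " ".join(node.itertext()).lower()))


def index_descendants_of(root, name):
    """Toy model of match="//name//*": index every descendant element
    of each <name>, but not <name> itself."""
    index = {}
    for anchor in root.iter(name):
        for node in anchor.iter():
            if node is not anchor:
                index[id(node)] = tokens(node)
    return index


def index_qname(root, name):
    """Toy model of qname="name": index each <name> element itself."""
    return {id(node): tokens(node) for node in root.iter(name)}


def speeches_matching(root, index, term, child=None):
    """Toy model of //SPEECH[lucene:query(CONTEXT, term)], where CONTEXT
    is '.' (child=None) or the name of a child element."""
    hits = []
    for speech in root.iter("SPEECH"):
        context = [speech] if child is None else speech.findall(child)
        # A context node without an index entry can never produce a hit.
        if any(term in index.get(id(node), ()) for node in context):
            hits.append(speech)
    return hits


by_match = index_descendants_of(PLAY, "SPEECH")
by_qname = index_qname(PLAY, "SPEECH")

print(len(speeches_matching(PLAY, by_match, "lord")))                # context '.'
print(len(speeches_matching(PLAY, by_match, "lord", child="LINE")))  # context LINE
print(len(speeches_matching(PLAY, by_qname, "lord")))                # context '.'
print(len(speeches_matching(PLAY, by_qname, "lord", child="LINE")))  # context LINE
```

With the descendant pattern, only the LINE context finds the speech containing 'lord'; with an index on SPEECH itself, only the '.' context does. The real index is of course built and queried inside eXist, so treat this purely as a way to reason about which context node a query needs.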
Re: Lucene oddity
> I think it's recommended to use qname indexes now.
We recommend using a qname instead of a "path" for the range and old full text indexes. The new indexes (Lucene, N-gram) don't accept the old type of "path" definition. Background: for performance reasons, it is best if eXist knows at compile time which indexes are available. This wasn't possible with the old indexes on "path".
However, since people started complaining, I reintroduced a somewhat similar feature for the Lucene index (though it works differently behind the scenes), which now accepts a "match" attribute with a path. I'm not sure if we should keep that. Maybe it would be better to make the semantics more explicit, e.g.
<text qname="SPEECH" descend="yes"/>
which would index SPEECH and all descendants below it.
Wolfgang
Re: Lucene oddity
I'm not sure if the path syntax allows you to do more than you can with
<text qname="SPEECH" descend="yes|no"/>
but if it doesn't, this seems better. I think this syntax is probably easier to understand; the other one seems as if it could easily mislead one into thinking XPath is available, as you said.
Also: it's been a little while since I worked on this, but I thought I remembered the default being descend="yes", effectively, with the option of specifying <ignore> elements to exclude some content - is that right?
-Mike
Re: Lucene oddity
> Also: it's been a little while since I worked on this, but I thought I remembered the default being descend="yes", effectively, with the option of specifying <ignore> elements to exclude some content - is that right?
No, <text qname="SPEECH"/> creates an index ONLY on SPEECH. What is passed to Lucene is the string value of SPEECH, which includes the text of all its descendant text nodes, *except* those filtered out by an optional <ignore>. For example, consider the fragment:
<SPEECH>
    <SPEAKER>Second Witch</SPEAKER>
    <LINE>Fillet of a fenny snake,</LINE>
    <LINE>In the cauldron boil and bake;</LINE>
</SPEECH>
If you have an index on SPEECH, Lucene will create a "document" with the text "Second Witch Fillet of a fenny snake, In the cauldron boil and bake;" and index it. eXist internally links this Lucene document to the SPEECH node, but Lucene has no knowledge of that (it doesn't know anything about XML nodes). The query:
//SPEECH[ft:query(., 'cauldron')]
searches the index and finds the "document" containing the SPEECH text, which eXist can trace back to the SPEECH node in the XML document. However, you must use the same context (SPEECH) for creating and querying the index. The query:
//SPEECH[ft:query(LINE, 'cauldron')]
will not return anything, even though LINE is a child of SPEECH and 'cauldron' was indexed. This particular 'cauldron' is linked to its ancestor SPEECH node, not its parent LINE.
However, you are free to give the user both options, i.e. use SPEECH and LINE as context at the same time. How? Simply define a second index on LINE:
<text qname="SPEECH"/>
<text qname="LINE"/>
Concerning <ignore> and <inline>: every text string is passed through Lucene's analyzer before it is indexed. eXist's <ignore> and <inline> configuration tags simply allow you to slightly modify the text before Lucene sees it. The config
<text qname="SPEECH">
    <ignore qname="SPEAKER"/>
</text>
removes the SPEAKER part, so Lucene will only see "Fillet of a fenny snake, In the cauldron boil and bake;".
I hope this helps to clarify the issue. I think I will add this explanation to the documentation.
Wolfgang
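To make the explanation above concrete, here is a rough Python model of the two ideas: the text handed to the indexer is the element's string value (minus anything under an ignored element), and each indexed "document" is linked to exactly one context node. Everything here is an assumption made for illustration: `indexed_text`, `build_index`, and `model_ft_query` are hypothetical names, the word split is not Lucene's analyzer, and joining text nodes with a single space is a simplification that happens to reproduce the strings quoted in the post.

```python
import re
import xml.etree.ElementTree as ET

# The fragment quoted in the post above.
SPEECH = ET.fromstring(
    "<SPEECH>"
    "<SPEAKER>Second Witch</SPEAKER>"
    "<LINE>Fillet of a fenny snake,</LINE>"
    "<LINE>In the cauldron boil and bake;</LINE>"
    "</SPEECH>"
)


def indexed_text(node, ignore=()):
    """Join the descendant text of node, skipping subtrees whose tag
    is listed in ignore (toy version of <ignore qname="..."/>)."""
    parts = []

    def walk(el):
        if el is not node and el.tag in ignore:
            return  # drop the whole ignored subtree
        if el.text and el.text.strip():
            parts.append(el.text.strip())
        for child in el:
            walk(child)
            # Text following a child still belongs to the parent element.
            if child.tail and child.tail.strip():
                parts.append(child.tail.strip())

    walk(node)
    return " ".join(parts)


def build_index(root, qnames, ignore=()):
    """Map id(node) -> set of lower-cased words, with one entry per
    element whose name is in qnames (toy version of <text qname="..."/>)."""
    index = {}
    for el in root.iter():
        if el.tag in qnames:
            words = re.findall(r"\w+", indexed_text(el, ignore).lower())
            index[id(el)] = set(words)
    return index


def model_ft_query(index, nodes, term):
    """Return the context nodes that carry an index entry containing term."""
    return [n for n in nodes if term.lower() in index.get(id(n), set())]


print(indexed_text(SPEECH))
print(indexed_text(SPEECH, ignore=("SPEAKER",)))

speech_only = build_index(SPEECH, {"SPEECH"})
print(len(model_ft_query(speech_only, [SPEECH], "cauldron")))
print(len(model_ft_query(speech_only, SPEECH.findall("LINE"), "cauldron")))

both = build_index(SPEECH, {"SPEECH", "LINE"})
print(len(model_ft_query(both, SPEECH.findall("LINE"), "cauldron")))
```

In this model, an index on SPEECH alone finds 'cauldron' only when SPEECH is the context node; adding LINE to the indexed names makes the LINE context work as well, which corresponds to defining both <text qname="SPEECH"/> and <text qname="LINE"/>.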
Re: Lucene oddity
So I'm trying to do a Lucene full-text index on an attribute value. The following works fine:
<text match="//@id"/>
But the documentation says that the "match" syntax is experimental and may be removed. I tried the following using the qname syntax, but neither worked:
<text qname="@id"/>
or
<text qname="id"/>
How do you index an attribute value using the qname syntax?
Re: Lucene oddity
<text qname="@id"/>
Did you reindex?
-- Roy
Re: Lucene oddity
Well, I thought I did, but I just tried it and now it works. I wasn't sure I had the right syntax as the documentation doesn't show adding an attribute as an index. Thanks for getting me to try it again.
Paul