indexing xml messages

Hi, the following junit test fails on 3 out of the 6 searches:

    @Test
    public void indexXML() throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        Document doc = new Document();
        String xml = FileHelper.readFileContent("lucene_work/myxml.xml");
        doc.add(new Field("myxml", xml, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir, true); // only searching, so read-only=true
        Searcher searcher = new IndexSearcher(reader);
        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "123AB")), 1).totalHits);
        Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "reference")), 1).totalHits);
        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "operationImpact")), 1).totalHits);
        Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "data")), 1).totalHits);
        // Assert.assertEquals(1, searcher.search(new TermQuery(new Term("myxml", "EFG")), 1).totalHits);
        searcher.close();
        reader.close();
    }

given this xml message:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <operationImpact>
        <reference value="123AB"/>
        <data>EFG</data>
    </operationImpact>

How do I get this to work? My goal is to be able to do full text search on XML documents. This includes tags, attribute values and tag values.

Thanks,
vince
Re: indexing xml messages

StandardAnalyzer will, amongst other things, convert everything to lowercase, which means that term queries on mixed or upper case text will fail to match.

There is some info on indexing XML docs in the FAQ
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_XML_documents.3F
and I'm sure that Google would find loads more stuff. And Luke is invaluable for seeing what your index really holds.

--
Ian.

On Tue, Nov 3, 2009 at 7:40 AM, vsevel <v.sevel@...> wrote:
> Hi, the following junit test fails on 3 out of the 6 searches:
> [...]
> How do I get this to work? My goal is to be able to do full text search on
> XML documents. This includes tags, attribute values and tag values.
>
> Thanks,
> vince
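
To illustrate the lowercase point: the three assertions that are commented out in the test are exactly the ones that query upper or mixed case text ("123AB", "operationImpact", "EFG"). The following is only a rough JDK-only sketch, not Lucene code and not the FAQ's recipe. It pulls tag names, attribute values and tag values out of the XML with the standard DOM parser and lowercases them, so that a lookup only matches when the query term is lowercased the same way. The class and method names (XmlTerms, extractTerms, normalize) are made up for this example, and the actual indexing step (for instance one Lucene field per kind of term) is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only; not part of Lucene.
class XmlTerms {

    /** Returns element names, attribute values and text content, lowercased, in document order. */
    static List<String> extractTerms(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            org.w3c.dom.Document dom = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> terms = new ArrayList<>();
            collect(dom.getDocumentElement(), terms);
            return terms;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static void collect(Node node, List<String> terms) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            terms.add(normalize(node.getNodeName()));            // tag name
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                terms.add(normalize(attrs.item(i).getNodeValue())); // attribute value
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Assumption: the whole text node is kept as a single term;
            // a real analyzer would split it into words.
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                terms.add(normalize(text));                      // tag value
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), terms);
        }
    }

    /** Same normalization must be applied to indexed terms and to query terms. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<operationImpact><reference value=\"123AB\"/><data>EFG</data></operationImpact>";
        List<String> terms = extractTerms(xml);
        System.out.println(terms);                              // [operationimpact, reference, 123ab, data, efg]
        System.out.println(terms.contains("EFG"));              // false: raw query term, case differs
        System.out.println(terms.contains(normalize("EFG")));   // true: query term lowercased first
    }
}
```

With Lucene itself the equivalent choice is either to lowercase the text passed to TermQuery, or to build the query through the same analyzer that was used at index time, so that both sides are normalized identically.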