Re: Preparing the ground for a real multilang index

by Jan Høydahl

Michael, you're of course right: copyField copies the raw source value,
not the analyzed tokens.
The lack of built-in language awareness in Solr is unfortunate :(
I have not tried Lucid's BasisTech lemmatizer implementation, but check
with them whether they can support multiple languages in the same field.
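
To make the copyField point concrete, here is a minimal Python sketch. It is
a toy model, not Solr code: the analyzers, the `ANALYZERS` table, and
`index_document` are all hypothetical names made up for illustration. It only
shows that the destination field receives the raw text and then applies its
own (here language-neutral) analyzer, so the stemming done on text_en is lost
in text.

```python
# Toy model of copyField semantics (assumption: not real Solr behaviour in
# every detail, just the "raw value is copied, then re-analyzed" idea).

def analyze_plain(text):
    """Language-neutral analysis: lowercase and split on whitespace."""
    return text.lower().split()

def analyze_en(text):
    """Hypothetical English analyzer with a toy stemmer (strips a final 's')."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t
            for t in analyze_plain(text)]

# Assumed schema: field name -> analyzer used for that field.
ANALYZERS = {"text_en": analyze_en, "text": analyze_plain}
# Assumed copy rule: (source field, destination field).
COPY_FIELDS = [("text_en", "text")]

def index_document(doc):
    """Copy raw source values first, then analyze every field separately."""
    raw = dict(doc)
    for source, dest in COPY_FIELDS:
        if source in doc:
            raw[dest] = (raw.get(dest, "") + " " + doc[source]).strip()
    return {field: ANALYZERS[field](value) for field, value in raw.items()}

indexed = index_document({"text_en": "Running dogs"})
print(indexed["text_en"])  # stemmed by the English analyzer
print(indexed["text"])     # unstemmed copy of the same raw text
```

A query for "dog" would match text_en in this model but not text, which is
the missed-hits problem Michael describes.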

--
Jan Høydahl
On 8 July 2009, at 16:32, Paul Libbrecht wrote:

> Can't the copy field use a different analyzer?
> Both for query and indexing?
> Otherwise you need to craft your own analyzer which reads the language
> from the field name... there are several classes ready for this.
>
> paul
>
> On 8 July 2009, at 02:36, Michael Lackhoff wrote:
>
>> On 08.07.2009 00:50 Jan Høydahl wrote:
>>
>>> itself and do not need to know the query language. You may then want
>>> to do a copyfield from all your text_<lang> -> text for convenient
>>> one-field-to-rule-them-all search.
>>
>> Would that really help? As I understand it, copyfield takes the raw, not
>> yet analyzed field value. I cannot see yet the advantage of this
>> "text"-field over the current situation with no text_<lang> fields at all.
>> The copied-to text field has to be language agnostic with no stemming at
>> all, so it would miss many hits. Or is there a way to combine many
>> differently stemmed variants into one field to be able to search against
>> all of them at once? That would be great indeed!
>>
>> -Michael
>
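
Paul's suggestion above, an analyzer that reads the language from the field
name, can be sketched roughly as follows. This is a hedged Python
illustration only: the text_<lang> naming comes from the thread, but the
`STEMMERS` table, the toy stemmers, and the `analyze` function are
assumptions, not classes from Solr or Lucene.

```python
# Toy analyzer dispatch keyed on the language suffix of the field name.
# Assumption: fields are named text_<lang>, as in the quoted discussion.

def stem_en(token):
    """Toy English stemmer: strips a final 's' from longer tokens."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def stem_de(token):
    """Toy German stemmer: strips a final 'en' from longer tokens."""
    return token[:-2] if token.endswith("en") and len(token) > 4 else token

# Hypothetical registry: language code -> stemming function.
STEMMERS = {"en": stem_en, "de": stem_de}

def analyze(field_name, text):
    """Pick the stemmer from the field-name suffix; unknown suffix means no stemming."""
    lang = field_name.rpartition("_")[2]
    stem = STEMMERS.get(lang, lambda token: token)
    return [stem(token) for token in text.lower().split()]

print(analyze("text_en", "The Cats"))
print(analyze("text_de", "Die Katzen"))
print(analyze("text", "The Cats"))  # no language suffix: tokens left unstemmed
```

The same function can serve both indexing and querying, as long as the query
is run against the field whose suffix matches the query language.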