|
View:
New views
9 Messages
—
Rating Filter:
Alert me
|
|
|
Accent-insensitive searchesI'm working on an database application. Items in the database are all in spanish language. It's mandatory that searches are accent-insensitive, meaning that, for example, a search for the word 'electrico' (no accent) must return entrances containing 'eléctrico' (with accent).
Searching the web for a solution, I find I must set these two properties when creating the database: territory=es_ES collation=TERRITORY_BASED But it still doesn't work this way. Looks like the default collation for es_ES is still accent-sensitive. So I try to use a custom collator that will behave as I need to. I find some instructions for this in the following blog: http://blogs.sun.com/kah/entry/user_defined_collation_in_apache In brief, I define a new CollatorProvider and register it with the JVM. Here's the code for this class: public class IgnoraAcentosCollatorProvider extends java.text.spi.CollatorProvider { @Override public Collator getInstance(Locale locale) { if (!locale.equals(new Locale("es","ES","accentinsensitive"))){ throw new IllegalArgumentException("Solo acepta es_ES_accentinsensitive"); } Collator c=Collator.getInstance(new Locale("es","ES")); c.setStrength(Collator.PRIMARY); return c; } @Override public Locale[] getAvailableLocales() { return new Locale[]{ new Locale("es","ES","accentinsensitive") }; } } It simply takes the default es_ES Collator and changes strength to PRIMARY. This makes the collator return 0 when comparing 'electrico' and 'eléctrico'. After making sure this new Collator is available for the JVM, I re-start Derby and make a new database, now setting territory=es_ES_accentinsensitive The database is created without errors (meaning Derby reaches my Collator), but searches are still accent-sensitive (no matter if I use = or LIKE operators). Any clue? I made intensive searches about this issue but I found no solution. I can avoid the problem simply using MySQL (the default spanish configuration has already the desired behaviour) but I would like to keep on using Derby if possible. I'm using JavaDB-Derby 10.4.2.1 Thanks. |
|
|
Re: Accent-insensitive searchesHi Josu,
And sorry for the late reply. I've tried the exact same thing as you yesterday - except I have used Locale.FRENCH as /base/ locale: > Collator c=Collator.getInstance(Locale.FRENCH); It works like a charm (both with = and LIKE). I am using Apache Derby 10.5.1.1 and Sun JDK 1.6.0_12. Maybe you should try with a more recent version of Derby (?) Otherwise, check that you've specified your custom collator at DB creation time. Not when booting the DB: since even if you specify an other collator when you boot an existing DB, it's still the collator specified at creation time that is used. Finally, the most doubtful one: accent difference might be a PRIMARY difference for the default es_ES collator (??). But, that would be really surprising... Let us know if you find the answer! Best regards, Sylvain josu a écrit : > > I'm working on an database application. Items in the database are all in > spanish language. It's mandatory that searches are accent-insensitive, > meaning that, for example, a search for the word 'electrico' (no accent) > must return entrances containing 'eléctrico' (with accent). > > Searching the web for a solution, I find I must set these two properties > when creating the database: > > territory=es_ES > collation=TERRITORY_BASED > > But it still doesn't work this way. Looks like the default collation for > es_ES is still accent-sensitive. > > So I try to use a custom collator that will behave as I need to. I find some > instructions for this in the following blog: > > http://blogs.sun.com/kah/entry/user_defined_collation_in_apache > http://blogs.sun.com/kah/entry/user_defined_collation_in_apache > > In brief, I define a new CollatorProvider and register it with the JVM. > Here's the code for this class: > > > public class IgnoraAcentosCollatorProvider extends > java.text.spi.CollatorProvider { > > @Override > public Collator getInstance(Locale locale) { > if (!locale.equals(new Locale("es","ES","accentinsensitive"))){ > throw new IllegalArgumentException("Solo acepta > es_ES_accentinsensitive"); > } > Collator c=Collator.getInstance(new Locale("es","ES")); > c.setStrength(Collator.PRIMARY); > return c; > } > > @Override > public Locale[] getAvailableLocales() { > return new Locale[]{ > new Locale("es","ES","accentinsensitive") > }; > } > > } > > > It simply takes the default es_ES Collator and changes strength to PRIMARY. > This makes the collator return 0 when comparing 'electrico' and 'eléctrico'. > > After making sure this new Collator is available for the JVM, I re-start > Derby and make a new database, now setting territory=es_ES_accentinsensitive > > The database is created without errors (meaning Derby reaches my Collator), > but searches are still accent-sensitive (no matter if I use = or LIKE > operators). > > Any clue? I made intensive searches about this issue but I found no > solution. I can avoid the problem simply using MySQL (the default spanish > configuration has already the desired behaviour) but I would like to keep on > using Derby if possible. > > I'm using JavaDB-Derby 10.4.2.1 > > Thanks. -- Website: http://www.chicoree.fr |
|
|
Re: Accent-insensitive searchesjosu <josu.eh.mondo@...> writes:
> I'm working on an database application. Items in the database are all in > spanish language. It's mandatory that searches are accent-insensitive, > meaning that, for example, a search for the word 'electrico' (no accent) > must return entrances containing 'eléctrico' (with accent). > > Searching the web for a solution, I find I must set these two properties > when creating the database: > > territory=es_ES > collation=TERRITORY_BASED > > But it still doesn't work this way. Looks like the default collation for > es_ES is still accent-sensitive. > > So I try to use a custom collator that will behave as I need to. I find some > instructions for this in the following blog: > > http://blogs.sun.com/kah/entry/user_defined_collation_in_apache > http://blogs.sun.com/kah/entry/user_defined_collation_in_apache > > In brief, I define a new CollatorProvider and register it with the JVM. > Here's the code for this class: > > > public class IgnoraAcentosCollatorProvider extends > java.text.spi.CollatorProvider { > > @Override > public Collator getInstance(Locale locale) { > if (!locale.equals(new Locale("es","ES","accentinsensitive"))){ > throw new IllegalArgumentException("Solo acepta > es_ES_accentinsensitive"); > } > Collator c=Collator.getInstance(new Locale("es","ES")); > c.setStrength(Collator.PRIMARY); > return c; > } > > @Override > public Locale[] getAvailableLocales() { > return new Locale[]{ > new Locale("es","ES","accentinsensitive") > }; > } > > } > > > It simply takes the default es_ES Collator and changes strength to PRIMARY. > This makes the collator return 0 when comparing 'electrico' and 'eléctrico'. > > After making sure this new Collator is available for the JVM, I re-start > Derby and make a new database, now setting territory=es_ES_accentinsensitive > > The database is created without errors (meaning Derby reaches my Collator), > but searches are still accent-sensitive (no matter if I use = or LIKE > operators). > > Any clue? I made intensive searches about this issue but I found no > solution. I can avoid the problem simply using MySQL (the default spanish > configuration has already the desired behaviour) but I would like to keep on > using Derby if possible. > > I'm using JavaDB-Derby 10.4.2.1 Hi, It seems to work in my environment. I used the collator provider class that you posted and performed the steps below. (Note that when you set the strength of the collator to PRIMARY, it will be case-insensitive in addition to being accent-insensitive. Making it accent-insensitive and at the same time case-sensitive is probably more work, but I think it should be doable.) kah@tecra:/tmp/coll % javac IgnoraAcentosCollatorProvider.java kah@tecra:/tmp/coll % mkdir -p META-INF/services kah@tecra:/tmp/coll % echo IgnoraAcentosCollatorProvider > META-INF/services/java.text.spi.CollatorProvider kah@tecra:/tmp/coll % jar cf coll.jar IgnoraAcentosCollatorProvider* META-INF kah@tecra:/tmp/coll % java -version java version "1.6.0_15" Java(TM) SE Runtime Environment (build 1.6.0_15-b03) Java HotSpot(TM) Server VM (build 14.1-b02, mixed mode) kah@tecra:/tmp/coll % java -Djava.ext.dirs=. -jar /code/derby/oldreleases/10.4.2.0/derbyrun.jar ij ij version 10.4 ij> connect 'jdbc:derby:mydb;create=true;territory=es_ES_accentinsensitive;collation=TERRITORY_BASED'; ij> create table t(x varchar(20)); 0 rows inserted/updated/deleted ij> insert into t values 'electrico','eléctrico','Electrico'; 3 rows inserted/updated/deleted ij> select * from t where x='electrico'; X -------------------- electrico eléctrico Electrico 3 rows selected ij> You can verify that the database was created with the correct territory and collation by evaluating this in IJ and see that "TERRITORY_BASED" is returned: ij> values syscs_util.syscs_get_database_property('derby.database.collation'); 1 ------------------------- TERRITORY_BASED 1 row selected And also check that service.properties in the database directory contains the following line: derby.serviceLocale=es_ES_accentinsensitive Hope this helps, -- Knut Anders |
|
|
Re: Accent-insensitive searchesBonsoir Sylvain,
I was just starting to look for the way to implement case insensitive and accent insensitive SELECT in Derby when I received this mail. I tried what you suggest but I can't make it working. I kown, it is not a purely Derby question but I guess that those who are not familiar with spi may have the same problem, so I post on this mailing list. I have defined my jar with the META-INF file and ServiceLoader.load( java.text.spi.CollatorProvider.class ) call is returning my class correctly. Anyway Derby throw ERROR XBM04 as it can't find my provider. I search into the java API source and found that ServiceLoader.loadInstalled() was called by sun.util.LocaleServiceProviderPool (Locale.getAvailableLocales()) And javadoc says : "This method is intended for use when only installed providers are desired. The resulting service will only find and load providers that have been installed into the current Java virtual machine; providers on the application's class path will be ignored." Does that mean I have to deploy my jar CollatorProvider somewhere else than in the application class path ? It should be a deployment problem for me... What means "installed into the current Java virtual machine" ? It is a shame that DERBY doesn't implement an easier way to do such frequent action. A case insensitive, accent insensitive behaviour option **should be** a standard DERBY feature (as SQL Server has and ORACLE don't have) ! Thanks jylaxx ----- Original Message ----- From: "Sylvain Leroux" <sl20@...> To: "Derby Discussion" <derby-user@...> Sent: Friday, September 11, 2009 10:33 AM Subject: Re: Accent-insensitive searches > Hi Josu, > > And sorry for the late reply. > > > I've tried the exact same thing as you yesterday - except I have used > Locale.FRENCH as /base/ locale: > > Collator c=Collator.getInstance(Locale.FRENCH); > > It works like a charm (both with = and LIKE). > I am using Apache Derby 10.5.1.1 and Sun JDK 1.6.0_12. > > > Maybe you should try with a more recent version of Derby (?) > > > Otherwise, check that you've specified your custom collator at DB creation > time. Not when booting the DB: since even if you specify an other collator > when you boot an existing DB, it's still the collator specified at > creation time that is used. > > > Finally, the most doubtful one: accent difference might be a PRIMARY > difference for the default es_ES collator (??). But, that would be really > surprising... > > > Let us know if you find the answer! > > Best regards, > Sylvain > > > josu a écrit : >> >> I'm working on an database application. Items in the database are all in >> spanish language. It's mandatory that searches are accent-insensitive, >> meaning that, for example, a search for the word 'electrico' (no accent) >> must return entrances containing 'eléctrico' (with accent). >> >> Searching the web for a solution, I find I must set these two properties >> when creating the database: territory=es_ES >> collation=TERRITORY_BASED >> >> But it still doesn't work this way. Looks like the default collation for >> es_ES is still accent-sensitive. >> >> So I try to use a custom collator that will behave as I need to. I find >> some >> instructions for this in the following blog: >> >> http://blogs.sun.com/kah/entry/user_defined_collation_in_apache >> http://blogs.sun.com/kah/entry/user_defined_collation_in_apache In brief, >> I define a new CollatorProvider and register it with the JVM. >> Here's the code for this class: >> >> >> public class IgnoraAcentosCollatorProvider extends >> java.text.spi.CollatorProvider { >> >> @Override >> public Collator getInstance(Locale locale) { >> if (!locale.equals(new Locale("es","ES","accentinsensitive"))){ >> throw new IllegalArgumentException("Solo acepta >> es_ES_accentinsensitive"); >> } >> Collator c=Collator.getInstance(new Locale("es","ES")); >> c.setStrength(Collator.PRIMARY); >> return c; >> } >> >> @Override >> public Locale[] getAvailableLocales() { >> return new Locale[]{ >> new Locale("es","ES","accentinsensitive") >> }; >> } >> >> } >> >> >> It simply takes the default es_ES Collator and changes strength to >> PRIMARY. >> This makes the collator return 0 when comparing 'electrico' and >> 'eléctrico'. >> >> After making sure this new Collator is available for the JVM, I re-start >> Derby and make a new database, now setting >> territory=es_ES_accentinsensitive >> >> The database is created without errors (meaning Derby reaches my >> Collator), >> but searches are still accent-sensitive (no matter if I use = or LIKE >> operators). >> >> Any clue? I made intensive searches about this issue but I found no >> solution. I can avoid the problem simply using MySQL (the default spanish >> configuration has already the desired behaviour) but I would like to keep >> on >> using Derby if possible. >> >> I'm using JavaDB-Derby 10.4.2.1 >> >> Thanks. > > > -- > Website: http://www.chicoree.fr > > > |
|
|
Re: Accent-insensitive searchesWell I found part of the answer by my own.
So the jar need to be in the jre lib/ext directory. Is that compulsory ? I deploy all my application in only one directory, it is really a problem to have to put something into the jre/lib/ext !! jylaxx ----- Original Message ----- From: "dev@xx" <dev@...> To: "Derby Discussion" <derby-user@...> Sent: Friday, September 11, 2009 6:57 PM Subject: Re: Accent-insensitive searches > Bonsoir Sylvain, > > I was just starting to look for the way to implement case insensitive and > accent insensitive SELECT in Derby when I received this mail. > > I tried what you suggest but I can't make it working. I kown, it is not a > purely Derby question but I guess that those who are not familiar with spi > may have the same problem, so I post on this mailing list. > > I have defined my jar with the META-INF file and ServiceLoader.load( > java.text.spi.CollatorProvider.class ) call is returning my class > correctly. > Anyway Derby throw ERROR XBM04 as it can't find my provider. > I search into the java API source and found that > ServiceLoader.loadInstalled() was called by > sun.util.LocaleServiceProviderPool (Locale.getAvailableLocales()) > And javadoc says : "This method is intended for use when only installed > providers are desired. The resulting service will only find and load > providers that have been installed into the current Java virtual machine; > providers on the application's class path will be ignored." > > Does that mean I have to deploy my jar CollatorProvider somewhere else > than in the application class path ? It should be a deployment problem for > me... > What means "installed into the current Java virtual machine" ? > > It is a shame that DERBY doesn't implement an easier way to do such > frequent action. A case insensitive, accent insensitive behaviour option > **should be** a standard DERBY feature (as SQL Server has and ORACLE don't > have) ! > > Thanks > > jylaxx > > > ----- Original Message ----- > From: "Sylvain Leroux" <sl20@...> > To: "Derby Discussion" <derby-user@...> > Sent: Friday, September 11, 2009 10:33 AM > Subject: Re: Accent-insensitive searches > > >> Hi Josu, >> >> And sorry for the late reply. >> >> >> I've tried the exact same thing as you yesterday - except I have used >> Locale.FRENCH as /base/ locale: >> > Collator c=Collator.getInstance(Locale.FRENCH); >> >> It works like a charm (both with = and LIKE). >> I am using Apache Derby 10.5.1.1 and Sun JDK 1.6.0_12. >> >> >> Maybe you should try with a more recent version of Derby (?) >> >> >> Otherwise, check that you've specified your custom collator at DB >> creation time. Not when booting the DB: since even if you specify an >> other collator when you boot an existing DB, it's still the collator >> specified at creation time that is used. >> >> >> Finally, the most doubtful one: accent difference might be a PRIMARY >> difference for the default es_ES collator (??). But, that would be really >> surprising... >> >> >> Let us know if you find the answer! >> >> Best regards, >> Sylvain >> >> >> josu a écrit : >>> >>> I'm working on an database application. Items in the database are all in >>> spanish language. It's mandatory that searches are accent-insensitive, >>> meaning that, for example, a search for the word 'electrico' (no accent) >>> must return entrances containing 'eléctrico' (with accent). >>> >>> Searching the web for a solution, I find I must set these two properties >>> when creating the database: territory=es_ES >>> collation=TERRITORY_BASED >>> >>> But it still doesn't work this way. Looks like the default collation for >>> es_ES is still accent-sensitive. >>> >>> So I try to use a custom collator that will behave as I need to. I find >>> some >>> instructions for this in the following blog: >>> >>> http://blogs.sun.com/kah/entry/user_defined_collation_in_apache >>> http://blogs.sun.com/kah/entry/user_defined_collation_in_apache In >>> brief, I define a new CollatorProvider and register it with the JVM. >>> Here's the code for this class: >>> >>> >>> public class IgnoraAcentosCollatorProvider extends >>> java.text.spi.CollatorProvider { >>> >>> @Override >>> public Collator getInstance(Locale locale) { >>> if (!locale.equals(new Locale("es","ES","accentinsensitive"))){ >>> throw new IllegalArgumentException("Solo acepta >>> es_ES_accentinsensitive"); >>> } >>> Collator c=Collator.getInstance(new Locale("es","ES")); >>> c.setStrength(Collator.PRIMARY); >>> return c; >>> } >>> >>> @Override >>> public Locale[] getAvailableLocales() { >>> return new Locale[]{ >>> new Locale("es","ES","accentinsensitive") >>> }; >>> } >>> >>> } >>> >>> >>> It simply takes the default es_ES Collator and changes strength to >>> PRIMARY. >>> This makes the collator return 0 when comparing 'electrico' and >>> 'eléctrico'. >>> >>> After making sure this new Collator is available for the JVM, I re-start >>> Derby and make a new database, now setting >>> territory=es_ES_accentinsensitive >>> >>> The database is created without errors (meaning Derby reaches my >>> Collator), >>> but searches are still accent-sensitive (no matter if I use = or LIKE >>> operators). >>> >>> Any clue? I made intensive searches about this issue but I found no >>> solution. I can avoid the problem simply using MySQL (the default >>> spanish >>> configuration has already the desired behaviour) but I would like to >>> keep on >>> using Derby if possible. >>> >>> I'm using JavaDB-Derby 10.4.2.1 >>> >>> Thanks. >> >> >> -- >> Website: http://www.chicoree.fr >> >> >> > > |
|
|
Re: Accent-insensitive searchesSalut Jylaxx,
dev@xx a écrit : > > Well I found part of the answer by my own. > So the jar need to be in the jre lib/ext directory. > Is that compulsory ? I deploy all my application in only one directory, > it is really a problem to have to put something into the jre/lib/ext !! > > jylaxx You may override the standard Java property java.ext.dirs to specify an other directory for your Java extensions: java -Djava.ext.dirs=/path/to/my/ext ... Starting with JDK6 (I think), you may even specify several directories: java -Djava.ext.dirs=/path/to/my/ext:/usr/local/lib/java/ext ... See: http://java.sun.com/docs/books/tutorial/ext/basics/install.html Kind regards, Sylvain -- Website: http://www.chicoree.fr |
|
|
Re: Accent-insensitive searchesThanks Sylvain and Knut for your answers.
Looks like Knut tried and succeeded using version 10.4.2.0... That's the way I'm doing it I already checked that the collator works right when setting strength to PRIMARY I checked both, and they're ok. But the result of the queries is still accent-sensitive. I'm working with Windows XP and it seems you're working with Linux, could that be the reason of the different behaviour? Thanks again for your help, Josu Larrea |
|
|
Re: Accent-insensitive searchesjosu <josu.eh.mondo@...> writes:
> Knut Anders Hatlen wrote: >> >> You can verify that the database was created with the correct territory >> and collation by evaluating this in IJ and see that "TERRITORY_BASED" is >> returned: >> ... >> And also check that service.properties in the database directory >> contains the following line: >> >> derby.serviceLocale=es_ES_accentinsensitive >> > > I checked both, and they're ok. But the result of the queries is still > accent-sensitive. I'm working with Windows XP and it seems you're working > with Linux, could that be the reason of the different behaviour? Maybe. I've only tried it on Solaris, and I don't know if there are any differences on other platforms. To check that the Java VM picks up your collator correctly, you could try this code and see what it prints: Locale locale = new Locale("es", "ES", "accentinsensitive"); Collator c = Collator.getInstance(locale); System.out.println(c.compare("electrico", "eléctrico")); "0" should be printed if your custom collator has been correctly installed. -- Knut Anders |
|
|
Re: Accent-insensitive searchesThanks again. I already checked that, and it's properly working. So it seems to be Derby who isn't using my collator when making comparisons, even if I set it at the databases collator...
|
| Free embeddable forum powered by Nabble | Forum Help |