|
View:
New views
1 Messages
—
Rating Filter:
Alert me
|
|
|
[FB-Tracker] Created: (CORE-2721) Issue with SIMILAR TO and UTF8 on 2.5 Beta 2 (and 1)Issue with SIMILAR TO and UTF8 on 2.5 Beta 2 (and 1)
---------------------------------------------------- Key: CORE-2721 URL: http://tracker.firebirdsql.org/browse/CORE-2721 Project: Firebird Core Issue Type: Bug Components: Charsets/Collation, Engine Affects Versions: 2.5 Beta 2, 2.5 Beta 1 Reporter: Adriano dos Santos Fernandes Reported in fb-devel: I've encountered a problem using SIMILAR TO syntax on UTF-8 fields. I searched through the tracker but couldn't find this issue. What seems to be happening is that when using SIMILAR TO on UTF-8 fields doesn't get the right pattern length. I've made a testcase with a bit of data in it that you can get from: http://www.fantasycomic.com/t/testsimilarto.zip (both 2.5beta2 database as gbak backup are in there). In that testcase you'll find a table created with: CREATE TABLE TEST ( ANSIFIELD Varchar(100), UTF8FIELD Varchar(100) CHARACTER SET UTF8, UNICODECIFIELD Varchar(100) CHARACTER SET UTF8 COLLATE UNICODE_CI ); This table was then filled with a number of records, with the same value going into each three fields. The default character set for the database is ISO8859_1. I have done all my tests using flamerobin 0.8.6.1652 Unicode, connection charset was ISO8859_1. Now on that table if you do SELECT * FROM Test WHERE Utf8Field SIMILAR TO 'DELL %' ; You'll get 16 results. Amongst those results are incorrect matches like for example 'DE SINGEL'. All 16 matches start with 'DE' Change the query to: SELECT * FROM Test WHERE Utf8Field SIMILAR TO 'DE %' ; and you'll get 39 results, all starting with "D" but some incorrect matches like 'DHR. F. BERENDE'. Now, these same queries on AnsieField (which has the same data but different character set) will give 3 and 6 results respectively and all correct. If you change the query to use LIKE instead of SIMILAR TO on Utf8Field then you get the correct results. To me this looks like some of the substring matching in the regexp engine is using the wrong character set to calculate string sizes, so instead of 'DELL %' it sees 'DE%' (because the 5 character 'DELL ' literal string is seen as a 2-char one). I've tested this issue in Firebird 2.5 Beta 1 and Beta 2 running as superserver on windows vista, 32bit. If you wish I can run the same tests on xp32 bit and xp 64 bit. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://tracker.firebirdsql.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira ------------------------------------------------------------------------------ Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel |
| Free embeddable forum powered by Nabble | Forum Help |