utf-8 characters delivered as question marks involuntary

Dear listmates,
Please let me start with an illustration to keep it short:

    input => output
    A: 3313 => 3313;
    B: 3313 => 3?13;
    -------------------------
    3 -> a variable-length character;
    1 -> a one-byte ASCII character;
    ? -> a one-byte ASCII question mark (literally)

That means that when I output a constant (a string of variable-length characters) with echo(), I normally get everything displayed correctly in the browser window. That is case A.

Sometimes (probability 1/(2-20)) I get a question mark in place of one particular variable-length character, while the other variable-length characters on the same page are displayed correctly. That is case B.

Can someone please tell me what could cause this? Personally I can imagine anything from faulty hardware, from the memory down to the NIC, to a software error anywhere but the drivers, and I would like to narrow the scope.

Thanks
Nash

Some more details for now:
- Only characters read from a UTF-8 encoded file with file_get_contents() seem to be displayed correctly at all times;
- Characters that are read by php.exe (please excuse me here) do fail once in a while;
- An interesting observation is that when a failure occurs, the 'mbstring.internal_encoding' value reported by phpinfo() changes from undefined to ISO;
- An even more interesting observation is that setting 'mbstring.internal_encoding' to UTF-8 locally does not improve things.

A glimpse of the PHP settings:

    mb_internal_encoding("UTF-8");
    ini_set('mbstring.internal_encoding', 'UTF-8');
    ini_set('iconv.input_encoding', 'UTF-8');
    ini_set('iconv.internal_encoding', 'UTF-8');
    ini_set('iconv.output_encoding', 'UTF-8');
    ini_set('default_charset', 'UTF-8');
    ini_set('detect_unicode', 'Off');
    ini_set('display_startup_errors', 'On');
    ini_set('output_buffering', 'On');
    ini_set('zlib.output_compression', 'Off');
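The observation that 'mbstring.internal_encoding' shows ISO on failing requests points at one mechanism worth ruling out. What follows is only a minimal sketch, assuming the mbstring extension is loaded; the sample string and variable names are made up for illustration and are not taken from the affected application. It shows that converting UTF-8 text to ISO-8859-1 replaces each character that has no ISO-8859-1 equivalent with mbstring's substitute character, which is set explicitly to a literal "?" here.

```php
<?php
// Hypothetical sample following the "3313" pattern: characters of
// 3, 3, 1 and 3 bytes, written as raw bytes so that the encoding of
// this source file does not matter.
$utf8 = "\xE6\x97\xA5" . "\xE6\x9C\xAC" . "1" . "\xE8\xAA\x9E";

// Make the substitute character explicit (0x3F is "?").
mb_substitute_character(0x3F);

// Convert as if some layer believed the output charset were ISO-8859-1.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Hex dumps show whether a "?" is a real 0x3F byte in the data
// or only a rendering problem in the browser.
echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";
```

This does not reproduce case B exactly, because here every non-ASCII character is replaced rather than a single one. A hex dump of a failing response would still show whether the question mark arrives as a genuine 0x3F byte.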
Re: utf-8 characters delivered as question marks involuntary

> input => output
> A: 3313 => 3313;
> B: 3313 => 3?13;
> -------------------------
> 3 -> a variable-length character;
> 1 -> a one byte ascii character;
> ? -> a one byte ascii question mark (literally)
> ...
> Sometimes (probability 1/(2-20)) i get a question mark instead of a
> particular variable-length character when the other variable-length
> characters are displayed correctly within the same page.

A question mark is often substituted for characters that have no code point in the target charset. So, what is the code point of the character that turns into a question mark? Does that same code point output okay some of the time?

Of course, this explanation is a bit dubious if the input is UTF-8, the internal encoding is UTF-8 and the output is UTF-8.

> More details for an instant: - Only characters read from a utf-8
> encoded file with a function file_get_contents() seem to be displayed
> correctly at all times; - Characters that are read by the php.exe
> (please excuse me here) do fail once in a while

Ah, could you be reading a partial multi-byte character and processing it? To take an extreme example: if you read 8 bytes, process them, then repeat with the next 8 bytes, and the data is an equal mix of characters 1, 2 and 3 bytes long, then there is a high probability that you will be processing garbage.

Darren

--
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
                        open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
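Darren's extreme example of reading 8 bytes at a time can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: the sample string, the 8-byte chunk size and the variable names are hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical sample mixing 1-, 2- and 3-byte UTF-8 characters (13 bytes).
$text = "\xE6\x97\xA5" . "a" . "\xC3\xA9" . "\xE6\x9C\xAC" . "b" . "\xE8\xAA\x9E";

// Split on a fixed BYTE count, ignoring character boundaries.
$chunks = str_split($text, 8);

// The cut lands inside the 3-byte sequence E6 9C AC, so the first chunk
// ends with a truncated character and the second starts with a stray
// continuation byte. Neither chunk is valid UTF-8 on its own.
$chunkValidity = array();
foreach ($chunks as $chunk) {
    $chunkValidity[] = mb_check_encoding($chunk, 'UTF-8');
}

// Processing each chunk separately (here: a UTF-8 to UTF-8 clean-up)
// turns broken fragments into the substitute character.
mb_substitute_character(0x3F);
$processed = '';
foreach ($chunks as $chunk) {
    $processed .= mb_convert_encoding($chunk, 'UTF-8', 'UTF-8');
}

// Reassembling the bytes BEFORE processing keeps the text intact.
$rejoined = implode('', $chunks);
$rejoinedValid = mb_check_encoding($rejoined, 'UTF-8');

var_dump($chunkValidity, $rejoinedValid, $processed !== $text);
```

If the failing code path does any fixed-size reads or byte-based substring operations before a multibyte function sees the data, checking each intermediate buffer with mb_check_encoding() in this way may help locate where the sequence gets cut.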
Re: utf-8 characters delivered as question marks involuntary

> > > Do you think i should try to read from a
> > > better sector?
> >
> > Unless you think the hardware is particularly old and suspect (e.g.
> > unstable on all applications) I think the chance of it being a software
> > bug or configuration issue is 99%.
> >
> > Darren
>
> What i'm doing now is reading the file in a bad sector
> with the function file_get_contents() and sending it to the
> screen. It does not fail from the file_get_contents() side
> when it does fail from the php.exe side.
>
> Nash

I have moved to a different virtual server with a different executable version, on the same hard drive. phpinfo() now clearly states that Multibyte Support is enabled, and there is no optimization engine here. Let's wait until the question marks start getting into the databases.

Nash
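Since the thread hinges on what phpinfo() reports on each server, the same values can also be read at run time and logged next to a failing response. A minimal sketch, assuming a standard PHP build with the json extension; the function name encoding_snapshot() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: collect the encoding-related settings that are
// actually in effect for the current request.
function encoding_snapshot()
{
    $snapshot = array(
        'php_version'       => PHP_VERSION,
        'sapi'              => PHP_SAPI,
        'default_charset'   => ini_get('default_charset'),
        'mbstring_loaded'   => extension_loaded('mbstring'),
        'internal_encoding' => null,
        'http_output'       => null,
    );
    if ($snapshot['mbstring_loaded']) {
        // Called without arguments, these return the current values.
        $snapshot['internal_encoding'] = mb_internal_encoding();
        $snapshot['http_output']       = mb_http_output();
    }
    return $snapshot;
}

// Example: write the snapshot to the error log on every request.
error_log(json_encode(encoding_snapshot()));
```

Comparing snapshots taken during a good and a bad response would show whether the internal encoding really differs between requests, or only between the two servers.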