PHP support of Unicode?

PHP support of Unicode?

by Gunnar Vestergaard-2

When using PHP to write content in my local language and those of
neighbouring countries, ISO 8859-1 has been sufficient as a character
encoding. But is it possible to use PHP with other languages? I mean,
does PHP support Unicode at the present time? As I understand it, the
following statement is true:
"PHP supports Unicode only as long as it is encoded as UTF-8"

Is that correct, or does PHP also support UTF-16?

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: PHP support of Unicode?

by Rasmus Lerdorf

Gunnar Vestergaard wrote:
> When using PHP, writing content in my local language and my neighbouring
> countries' language, ISO 8859-1 has been sufficient as a character
> encoding. But using PHP with other languages, is that possible? I mean,
> does PHP support Unicode at present time? As I understand it, the
> following statement is true:
> "PHP supports Unicode only as long as it is encoded as UTF-8"
>
> Is that correct, or does PHP also support UTF-16?

It depends on what you are doing.  PCRE, our regex library, only speaks
UTF-8, and there are functions like json_encode() that assume UTF-8 as
well.  If you are just doing pass-through stuff, you can use whatever
you want.  It is only if you want to manipulate the text in some manner
that you need to worry about the encoding.
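
A minimal sketch of that distinction, assuming a reasonably current PHP (the variable names and the sample word are illustrative, not from the thread). It stores the same word as raw bytes in UTF-8 and in ISO 8859-1, then shows that pass-through keeps both intact while PCRE's /u modifier and json_encode() only accept the UTF-8 form.

```php
<?php
// The same word, "café", as raw bytes in two encodings. Byte escapes are
// used so the result does not depend on how this file itself is saved.
$utf8   = "caf\xC3\xA9"; // UTF-8: "é" is two bytes
$latin1 = "caf\xE9";     // ISO 8859-1: "é" is one byte

// Pass-through: PHP strings are byte sequences, so both survive untouched.
$bytesUtf8   = strlen($utf8);   // 5
$bytesLatin1 = strlen($latin1); // 4

// PCRE: with the /u modifier "." matches one UTF-8 character;
// without it "." matches a single byte.
$withU    = preg_match('/^caf.$/u', $utf8); // 1
$withoutU = preg_match('/^caf.$/', $utf8);  // 0

// Once UTF-8 is assumed, input in another encoding is rejected.
$badRegex = preg_match('/^caf.$/u', $latin1); // false
$goodJson = json_encode($utf8);               // "caf\u00e9" including the quotes
$badJson  = json_encode($latin1);             // false on current PHP versions
```

After a false return, preg_last_error() and json_last_error() report that malformed UTF-8 was the cause.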

-Rasmus



RE: PHP support of Unicode?

by tex-4

Rasmus, Gunnar,

When Rasmus says "manipulate", I read it as "modify". Perhaps that is not
what Rasmus meant.

However, you may also need to be aware of the encoding when you are testing,
comparing, or searching values, etc.
For example, a case-insensitive search needs to know the encoding in order
to use the right values for upper and lower case.
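
As a rough illustration of the case-insensitive search point, the sketch below compares the byte-oriented stripos() with mb_stripos(), which is told the encoding. It assumes the mbstring extension is loaded; the sample strings are made up for the example.

```php
<?php
// Requires the mbstring extension. Strings are given as UTF-8 byte escapes.
$haystack = "SM\xC3\x98RREBR\xC3\x98D"; // "SMØRREBRØD"
$needle   = "sm\xC3\xB8rrebr\xC3\xB8d"; // "smørrebrød"

// stripos() works on bytes and only folds case for single-byte characters,
// so the two-byte "Ø" never matches the two-byte "ø".
$byteSearch = stripos($haystack, $needle); // false

// mb_stripos() knows the encoding and folds case per character.
$charSearch = mb_stripos($haystack, $needle, 0, 'UTF-8'); // 0
```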

More generally you should be aware of the encoding, label it properly, and
potentially convert encodings appropriately to/from processes or I/O that
may require another encoding.
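
One way to follow this advice is to convert at the boundaries and keep a single encoding inside the application. The sketch below assumes the mbstring extension; to_utf8() is a hypothetical helper written for this example, not a PHP built-in.

```php
<?php
// Hypothetical helper: bring input with a known, declared encoding to UTF-8.
function to_utf8(string $bytes, string $declaredEncoding): string
{
    if ($declaredEncoding === 'UTF-8') {
        // Trust the label only after checking it.
        if (!mb_check_encoding($bytes, 'UTF-8')) {
            throw new InvalidArgumentException('Input is labelled UTF-8 but is not valid UTF-8');
        }
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $declaredEncoding);
}

// Input side: a value arriving from an ISO 8859-1 source.
$fromLegacy = to_utf8("K\xF8benhavn", 'ISO-8859-1'); // "København" in UTF-8

// Output side: convert back only for a consumer that requires ISO 8859-1.
$roundTrip = mb_convert_encoding($fromLegacy, 'ISO-8859-1', 'UTF-8');
```

Labelling matters as much as converting: output sent as UTF-8 should also be declared as UTF-8, for example in the Content-Type header.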

My answer to Gunnar's question is that UTF-8 is a perfectly valid form of
Unicode (UTF-16 and UTF-32 being others), and you don't need to favor UTF-16.
PHP 5.3 has more functions for internationalization that are UTF-8 and
locale based. You might look into those.

tex


-----Original Message-----
From: Rasmus Lerdorf [mailto:rasmus@...]
Sent: Saturday, October 03, 2009 9:59 AM
To: Gunnar Vestergaard
Cc: php-i18n@...
Subject: Re: [PHP-I18N] PHP support of Unicode?

Gunnar Vestergaard wrote:
> When using PHP, writing content in my local language and my neighbouring
> countries' language, ISO 8859-1 has been sufficient as a character
> encoding. But using PHP with other languages, is that possible? I mean,
> does PHP support Unicode at present time? As I understand it, the
> following statement is true:
> "PHP supports Unicode only as long as it is encoded as UTF-8"
>
> Is that correct, or does PHP also support UTF-16?

It depends on what you are doing.  PCRE, our regex library, only speaks
UTF-8, and there are functions like json_encode() that assume UTF-8 as
well.  If you are just doing pass-through stuff, you can use whatever
you want.  It is only if you want to manipulate the text in some manner
that you need to worry about the encoding.

-Rasmus



Re: PHP support of Unicode?

by Darren Cook

> encoding. But using PHP with other languages, is that possible? I
> mean, does PHP support Unicode at present time?

Yes, PHP is an excellent choice for Unicode work, both for websites and
command-line utilities.

> As I understand it, the following statement is true: "PHP supports
> Unicode only as long as it is encoded as UTF-8"

UTF-8 is usually what you want. Browser, tool, and editor support is
most widespread for UTF-8 (especially on Linux).

Even using a Microsoft SQL Server database, which I believe was using
UCS-2LE internally, I give it the data in UTF-8, and the database
connection handles the conversion behind the scenes for me.

Be aware that if you go down the UTF-16 route, you have to start caring
about LE vs. BE, and you also need to understand the difference between
UTF-16 and UCS-2.
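
To make the LE vs. BE point concrete, here is a small sketch that prints the bytes mbstring produces for each byte order (it assumes the mbstring extension; the sample characters are chosen for illustration). The second character lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it, which is exactly what UCS-2 lacks.

```php
<?php
// Requires the mbstring extension.
$a = "A"; // U+0041

// Same character, same encoding family, different byte order.
$aBE = bin2hex(mb_convert_encoding($a, 'UTF-16BE', 'UTF-8')); // "0041"
$aLE = bin2hex(mb_convert_encoding($a, 'UTF-16LE', 'UTF-8')); // "4100"

// U+1D11E MUSICAL SYMBOL G CLEF, given as UTF-8 bytes.
$clef = "\xF0\x9D\x84\x9E";

// UTF-16 encodes it as two 16-bit units (a surrogate pair), four bytes in all.
$clefBE    = bin2hex(mb_convert_encoding($clef, 'UTF-16BE', 'UTF-8')); // "d834dd1e"
$clefBytes = strlen(mb_convert_encoding($clef, 'UTF-16BE', 'UTF-8'));  // 4
```

UCS-2 is limited to a single 16-bit unit per character, so it cannot represent characters such as this one.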

> Is that correct, or does PHP also support UTF-16?

If using the mbstring extension then UTF-16 is supported (including for
regexes apparently):
 http://jp.php.net/manual/en/mbstring.supported-encodings.php

mbstring has been on all shared hosts I've used, so it is usually fine
to rely on it.
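
A short sketch of working on UTF-16 data with mbstring, assuming the extension is present (the check at the top makes that assumption explicit; the sample word is illustrative). Each function is passed the encoding name so it counts and transforms characters rather than bytes.

```php
<?php
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('This sketch needs the mbstring extension');
}

// "blåbær" as UTF-8 bytes, converted to UTF-16LE for the demonstration.
$utf8  = "bl\xC3\xA5b\xC3\xA6r";
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

$bytes = strlen($utf16);                // 12 bytes
$chars = mb_strlen($utf16, 'UTF-16LE'); // 6 characters

// Operate on the UTF-16 string directly, then convert back for comparison.
$upper  = mb_strtoupper($utf16, 'UTF-16LE');
$prefix = mb_substr($utf16, 0, 3, 'UTF-16LE');

$upperUtf8  = mb_convert_encoding($upper, 'UTF-8', 'UTF-16LE');  // "BLÅBÆR"
$prefixUtf8 = mb_convert_encoding($prefix, 'UTF-8', 'UTF-16LE'); // "blå"
```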

Darren

--
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
http://dcook.org/mlsn/ (Multilingual open source semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)



RE: PHP support of Unicode?

by Andi Gutmans

You may also want to check out http://pecl.php.net/package/intl
It works with PHP 5.2 and PHP 5.3 and delivers a lot of functionality
(it assumes it receives UTF-8).
There's also an intro article about it on Zend's devzone:
http://devzone.zend.com/article/4799-Internationalization-in-PHP-5.3
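
The message does not say which intl features are meant, so the sketch below simply picks three for illustration: grapheme_strlen(), Normalizer, and Collator. It assumes the intl extension is installed and, as noted above, feeds it UTF-8.

```php
<?php
// Two UTF-8 spellings of the same visible character "é".
$decomposed = "e\xCC\x81"; // "e" followed by U+0301 COMBINING ACUTE ACCENT
$composed   = "\xC3\xA9";  // the single code point U+00E9

if (extension_loaded('intl')) {
    // Counts user-perceived characters, not bytes or code points.
    $graphemes = grapheme_strlen($decomposed); // 1

    // Normalization form C composes the pair into one code point.
    $normalized = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    $sameText   = ($normalized === $composed); // true

    // Locale-aware sorting; the resulting order depends on the ICU data installed.
    $words    = ["zebra", "\xC3\xA6ble", "\xC3\xA5ben"];
    $collator = new Collator('da_DK');
    $collator->sort($words);
}
```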

Andi

