SimpleXML - UTF8
I hope there is an easy answer to this:
I am using a remote XML service that, about 1 time in 100, returns XML with invalid UTF-8 bytes. I don't have any control over the remote service, but SimpleXML pukes when I pass malformed UTF-8 to it. Does anyone know of a simple way to clean up bad UTF-8 bytes, e.g. replace the invalid bytes with a '?'?

Rgds,
John Campbell
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk
http://www.nyphp.org/Show-Participation
Re: SimpleXML - UTF8
A simple str_replace can replace the invalid characters, but you have to know what they are first.
----- Original Message -----
From: "John Campbell" <jcampbell1@...>
To: "NYPHP Talk" <talk@...>
Sent: Saturday, October 17, 2009 4:59:53 PM
Subject: [nyphp-talk] SimpleXML - UTF8

I hope there is an easy answer to this: I am using a remote XML service, that about 1 in 100 times returns XML with invalid UTF-8 bytes. I don't have any control over the remote service, but simpleXML pukes when I pass malformed UTF-8 to it. Does anyone know of a simple way to cleanup bad UTF-8 bytes, e.g. replace the invalid bytes with a '?'.

Rgds,
John Campbell
Re: SimpleXML - UTF8
I have this handy function I pulled from somewhere else. Does it help?
Apologies if the actual characters don't come across in the email.

/**
 * This function was created to scrub additional html entities that are not in the PHP get_html_translation_table
 * Currently bug #34577 in the bugs.php.net database.
 * a1 is a list of current html entities that are commonly appearing in the listing description that are not escaped
 * a2 is most of the entities to either an accepted format, correct html-entity, or with a blank space
 *
 * @param string $string string to scrub
 * @return string $string clean string
 */
public static function xmlStringScrub($string)
{
    $a1 = array("�","�","�","�", "�","�", "�", "�", "�", "�", "�","�","�","�","�", "�", "�");
    $a2 = array(".","-","•","", "'","'", '"', '"', "-", "-", ",", "^",",","","€", "®", "™");
    $string = htmlentities($string, ENT_QUOTES);
    $string = str_replace($a1, $a2, $string);
    $string = utf8_encode($string);
    return $string;
}
Re: SimpleXML - UTF8
Hi, I believe php.net/mb_convert_encoding will use "?" or just avoid printing a character when it can't convert. There's also php.net/iconv
On Sat, Oct 17, 2009 at 5:59 PM, John Campbell <jcampbell1@...> wrote:
> I hope there is an easy answer to this:
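For anyone trying the mb_convert_encoding route, here is a minimal sketch, assuming the mbstring extension is loaded. The function name mbScrub is made up for illustration. It sets the substitute character to '?' explicitly instead of relying on the configured default, and the number of '?' emitted per bad sequence may differ between PHP versions.

```php
<?php
// Hypothetical helper (not from the thread): re-encode UTF-8 to UTF-8 so that
// mbstring substitutes whatever it cannot decode.
function mbScrub(string $text): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character(0x3F);           // 0x3F is the code point of '?'
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);      // restore the caller's setting
    return $clean;
}

// "\xE9" is a Latin-1 e-acute, which is not a valid UTF-8 sequence on its own.
$scrubbed = mbScrub("caf\xE9 ok");
```

Valid input should pass through unchanged, so it should be safe to run every response through this rather than only the ones that fail.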
Re: SimpleXML - UTF8
John Campbell wrote:
> I am using a remote XML service, that about 1 in 100 times returns XML
> with invalid UTF-8 bytes. I don't have any control over the remote
> service, but simpleXML pukes when I pass malformed UTF-8 to it. Does
> anyone know of a simple way to cleanup bad UTF-8 bytes, e.g. replace
> the invalid bytes with a '?'.

Try:

$text = @iconv('UTF-8','UTF-8//TRANSLIT',$text);

Dan
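A slightly more defensive variant of the same idea, as a sketch only. It assumes the iconv extension is available and uses the //IGNORE suffix, which the PHP manual describes as silently discarding characters that cannot be represented. How iconv treats invalid input depends on the system's iconv implementation, so the return value is checked instead of being trusted. The name iconvScrub is made up for illustration.

```php
<?php
// Hypothetical wrapper (not from the thread): returns the cleaned string,
// or null when iconv() reports failure on this platform.
function iconvScrub(string $text): ?string
{
    // "@" hides the notice or warning iconv raises on bad input; because of
    // that, the false return value is the only failure signal left.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $text);
    return $clean === false ? null : $clean;
}

$clean = iconvScrub("caf\xE9 ok");
if ($clean === null) {
    // iconv could not convert here; fall back to another approach.
}
```

Without the check, a false result silently becomes an empty string in later string operations, which is easy to miss when only 1 response in 100 is bad.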
Re: SimpleXML - UTF8
On Mon, Oct 19, 2009 at 7:32 AM, Dan Cech <dcech@...> wrote:
> Try:
>
> $text = @iconv('UTF-8','UTF-8//TRANSLIT',$text);

Thanks Dan,

I knew there had to be something simple. It looks like mb_convert_encoding($txt,'UTF-8','UTF-8') will work similarly, but just deletes the offending bytes.

Regards,
John Campbell
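If the goal is literally "replace each invalid byte with a '?'" regardless of which extensions are installed, a byte-level regular expression is another option. The sketch below is not from the thread and the name replaceInvalidUtf8 is made up for illustration. It walks the input with PCRE in non-Unicode mode, keeps every well-formed UTF-8 sequence, and swaps any other single byte for the replacement.

```php
<?php
// Hypothetical helper: keep well-formed UTF-8 sequences, replace stray bytes.
function replaceInvalidUtf8(string $bytes, string $replacement = '?'): string
{
    // Group 1 matches one well-formed UTF-8 sequence; group 2 matches any
    // single byte that could not start one at this position.
    $pattern = '~
        (
            [\x00-\x7F]                          # ASCII
          | [\xC2-\xDF][\x80-\xBF]               # 2-byte sequences
          | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, no overlongs
          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3-byte, general
          | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, no surrogates
          | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4-byte, no overlongs
          | [\xF1-\xF3][\x80-\xBF]{3}            # 4-byte, general
          | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4-byte, up to U+10FFFF
        )
      | (.)
    ~xs';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($replacement): string {
            // Group 2 is only present when the byte was not valid UTF-8.
            return isset($m[2]) ? $replacement : $m[1];
        },
        $bytes
    );

    // preg_replace_callback() returns null on an internal PCRE error.
    return $result ?? $bytes;
}

$clean = replaceInvalidUtf8("caf\xE9 ok");   // the stray byte becomes "?"
```

The cleaned string can then be handed to simplexml_load_string() as usual. Note that this only addresses the encoding: a response can be valid UTF-8 and still not be well-formed XML.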