preg_replace with UTF-8

by SleePy-4

I seem to be having a minor issue with preg_replace not working as
expected on UTF-8 strings. So far I have found that \w does not seem
to match non-ASCII characters, even with the u modifier set.

This is my test php file:
<?php
$data = 'ooooooooooooooooooooooo';
echo 'Data before: ', $data, '<br />';

$data = preg_replace('~([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $data;

// UTF-8 Test
$data = 'ффффффффффффффффффффффф';
echo '<hr />Data before: ', $data, '<br />';

$data = preg_replace('~([\w\.]{6})~u', '$1 < >', $data);
echo 'Data After: ', $data;

?>


I would expect it to be:
Data before: ooooooooooooooooooooooo
Data After: oooooo < >oooooo < >oooooo < >ooooo
---
Data before: ффффффффффффффффффффффф
Data After: фффффф < >фффффф < >фффффф < >ффффф

But what I get is:
Data before: ooooooooooooooooooooooo
Data After: oooooo < >oooooo < >oooooo < >ooooo
---
Data before: ффффффффффффффффффффффф
Data After: ффффффффффффффффффффффф

Am I going about this the wrong way, or is this a bug in PHP itself?
I tested this in PHP 5.3, 5.2.9 and 6.0 (a snapshot from a couple of
weeks ago) and got the same results in each.
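
For comparison, here is a variant of the test that spells the character class out with Unicode properties instead of relying on \w. This is only a sketch: it assumes PCRE was built with Unicode property support, and break_long_words() is a made-up helper name, not something from my real code.

```php
<?php
// Hypothetical helper (name is illustrative only): appends ' < >' after
// every run of six letters, digits, underscores or dots.
function break_long_words($text)
{
    // \p{L} matches any Unicode letter, \p{N} any Unicode number.
    // Assumption: the u modifier is available and PCRE has Unicode
    // property support compiled in.
    return preg_replace('~([\p{L}\p{N}_.]{6})~u', '$1 < >', $text);
}

// Same inputs as the test file above: 23 ASCII and 23 Cyrillic characters.
echo break_long_words(str_repeat('o', 23)), "\n";
echo break_long_words(str_repeat('ф', 23)), "\n";
```

If both lines come out split into groups of six, that would suggest the problem is limited to how \w is interpreted rather than to preg_replace or the u modifier as a whole.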


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
