UTF-8 combine_path() bug

View: New views
8 Messages — Rating Filter:   Alert me  

UTF-8 combine_path() bug

by Pontus Rodling :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Something crashes hard when using combine_path() with a UTF-8 filename.
"subfolder" contains UTF-8 characters, "path" contains no UTF-8 characters:

string x = combine_path(path, subfolder); // This crashes
combine_path(path, subfolder); // This does not crash

I'm running Windows 7 RC 64-bit and Pike 7.8.352.

=== Crash details ===
Faulting application name: pike.exe, version: 0.0.0.0, time stamp:
0x4aba8624
Faulting module name: MSVCR90.dll, version: 9.0.30729.4918, time stamp:
0x49d43da7
Exception code: 0xc0000005
Fault offset: 0x0003c15e
Faulting process id: 0xf50
Faulting application start time: 0x01ca533c3efca9cc
Faulting application path: C:\Program Files (x86)\Pike\bin\pike.exe
Faulting module path:
C:\Windows\WinSxS\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4918_none_508da958bcbd2845\MSVCR90.dll


// Pontus Rodling


Re: UTF-8 combine_path() bug

by Martin Stjernholm :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Pontus Rodling <pontus@...> wrote:

> Something crashes hard when using combine_path() with a UTF-8
> filename.

Can you please provide a complete test program?


Re: UTF-8 combine_path() bug

by Pontus Rodling :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Martin Stjernholm skrev:

> Pontus Rodling <pontus@...> wrote:
>
>  
>> Something crashes hard when using combine_path() with a UTF-8
>> filename.
>>    
>
> Can you please provide a complete test program?
>
>  
This is the code I was working on when i triggered the bug.
Simply put a folder with a UTF-8 name somewhere in c:/test and it should
cause it to crash.
As an example, c:/test/パイク crashes Pike.

// Pontus
---
int foldercount;

array folders = ({ });

int main() {
folders += ({
Folder("C:/test"),
});

foreach (folders, Folder f) {
write("Indexing: %s\n", f->path);
f->deepindex();
}

write("Folders: %d\n", foldercount);

write("Press enter to quit... ");
Stdio.stdin->read(1);
}

class Folder {
string name;
Folder parent;

string path;

private array subfolders;
private array files;

void create(string n, Folder|void p) {
name = n;
parent = p;

if (parent) path = combine_path(parent->path, name);
else path = name;

foldercount++;
}

void deepindex() {
array a = get_subfolders();
foreach (a, Folder f) {
f->deepindex();
}
}

array get_subfolders() {
if (!subfolders) {
subfolders = ({ });
array a = sort(get_dir(path));
foreach (a, string b) {
// CRASHES HERE
if (Stdio.is_dir(combine_path(path, b))) {
subfolders += ({ Folder(b, this) });
}
}
}

return copy_value(subfolders);
}
}


Re: UTF-8 combine_path() bug

by Martin Stjernholm :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Pontus Rodling <pontus@...> wrote:

> This is the code I was working on when i triggered the bug.
> Simply put a folder with a UTF-8 name somewhere in c:/test and it
> should cause it to crash.

Getting a better minimized test case would have been nice. I reduced
it to this:

  #charset utf-8
  int main()
  {
    werror ("%O\n", combine_path_nt ("test", "€uro"));
  }

Annoyingly enough it doesn't fail on unix (neither 32- nor 64-bit) and
not in windbg. Where is Valgrind4Windows? :P


Re: UTF-8 combine_path() bug

by Pontus Rodling :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Martin Stjernholm skrev:

> Pontus Rodling <pontus@...> wrote:
>
>  
>> This is the code I was working on when i triggered the bug.
>> Simply put a folder with a UTF-8 name somewhere in c:/test and it
>> should cause it to crash.
>>    
>
> Getting a better minimized test case would have been nice. I reduced
> it to this:
>
>   #charset utf-8
>   int main()
>   {
>     werror ("%O\n", combine_path_nt ("test", "€uro"));
>   }
>
> Annoyingly enough it doesn't fail on unix (neither 32- nor 64-bit) and
> not in windbg. Where is Valgrind4Windows? :P
>  
That indeed is smaller and crashes, yea :P
I should also learn to put code as an attachment and not inline, doh!
Anyway, a custom combine_path function will have to do for now.

// Pontus


Re: UTF-8 combine_path() bug

by Martin Stjernholm :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Martin Stjernholm <mast@...> wrote:

>   #charset utf-8
>   int main()
>   {
>     werror ("%O\n", combine_path_nt ("test", "€uro"));
>   }
>
> Annoyingly enough it doesn't fail on unix (neither 32- nor 64-bit) and
> not in windbg. Where is Valgrind4Windows? :P

By pure luck I managed to catch it in a debugger. It looks like an
out-of-bounds memory access in isalpha() in the microsoft CRT when
it's given a 16-bit value (crashes only if the memory isn't mapped,
which depends on loaded dll's, load order, sizes of compilation units,
and similar phase-of-the-moon factors).

The bug only occurs if the first char in the second arg is wide. I've
checked in a workaround, which is to simply avoid isalpha().

Thanks for the report.


Re: UTF-8 combine_path() bug

by Mirar @ Pike importmöte för mailinglistan :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Is this a problem in main.c also?


Re: UTF-8 combine_path() bug

by Martin Stjernholm :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

"Martin Nilsson (Opera Mini - AFK!) @ Pike (-) importmöte för mailinglistan" <6341@...> wrote:

> Is this a problem in main.c also?

I looked at that one as well, but given the type of the variable and
how it's used elsewhere in the function, it doesn't look like it could
ever be a widestring.

I imagine it isn't unlikely that the problem affects the other is*
functions though, and they are used here and there. I've no idea in
how many places they might be fed wide chars.