I have ported ICU 4.2 to a new platform, but when I run the code it exits abnormally.
The failure happens in the applyPropertyPattern function in uniset_props.cpp.
The applyPropertyPattern function is as follows:
UnicodeSet& UnicodeSet::applyPropertyPattern(const UnicodeString& pattern,
                                             ParsePosition& ppos,
                                             UErrorCode &ec)
......
    // Look for an '=' sign. If this is present, we will parse a
    // medium \p{gc=Cf} or long \p{GeneralCategory=Format}
    // pattern.
    int32_t equals = pattern.indexOf(EQUALS, pos);
    UnicodeString propName, valueName;
    if (equals >= 0 && equals < close && !isName) {
        // Equals seen; parse medium/long pattern
        pattern.extractBetween(pos, equals, propName);
        pattern.extractBetween(equals+1, close, valueName);
    }
    else {
        // Handle case where no '=' is seen, and \N{}
        pattern.extractBetween(pos, close, propName);
        // Handle \N{name}
        if (isName) {
            // This is a little inefficient since it means we have to
            // parse NAME_PROP back to UCHAR_NAME even though we already
            // know it's UCHAR_NAME. If we refactor the API to
            // support args of (UProperty, char*) then we can remove
            // NAME_PROP and make this a little more efficient.
            valueName = propName;
            propName = UnicodeString(NAME_PROP, NAME_PROP_LENGTH, US_INV);
        }
    }
    applyPropertyAlias(propName, valueName, ec);
......
The pattern being applied is:
static const UChar gIsWordPattern[] = {
    // [ \ p { A l p h a b e t i c }
    0x5b, 0x5c, 0x70, 0x7b, 0x61, 0x6c, 0x70, 0x68, 0x61, 0x62, 0x65, 0x74, 0x69, 0x63, 0x7d,
    // \ p { M } Mark
    0x5c, 0x70, 0x7b, 0x4d, 0x7d,
    // \ p { N d } Digit_Numeric
    0x5c, 0x70, 0x7b, 0x4e, 0x64, 0x7d,
    // \ p { P c } ] Connector_Punctuation
    0x5c, 0x70, 0x7b, 0x50, 0x63, 0x7d, 0x5d, 0};
The problem seems to be that the lines marked in bold in the function above are never entered.
The function is first reached from this line of regexst.cpp, in RegexStaticSets::RegexStaticSets(UErrorCode *status):
fPropSets[URX_ISWORD_SET] = new UnicodeSet(UnicodeString(TRUE, gIsWordPattern, -1), *status);
I think execution is taking the wrong branch. What could be the cause?
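To narrow down which branch should run for this pattern, here is a minimal standalone sketch that mimics only the '=' split shown above. It is not ICU code: std::u16string stands in for UnicodeString, find/substr stand in for indexOf/extractBetween, the \N{} (isName) case is left out, and PropertySpec, splitProperty and scanProperties are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same code units as gIsWordPattern: [\p{alphabetic}\p{M}\p{Nd}\p{Pc}]
static const char16_t kIsWordPattern[] = {
    0x5b, 0x5c, 0x70, 0x7b, 0x61, 0x6c, 0x70, 0x68, 0x61, 0x62, 0x65, 0x74, 0x69, 0x63, 0x7d,
    0x5c, 0x70, 0x7b, 0x4d, 0x7d,
    0x5c, 0x70, 0x7b, 0x4e, 0x64, 0x7d,
    0x5c, 0x70, 0x7b, 0x50, 0x63, 0x7d, 0x5d, 0};

// Hypothetical result type: what would be handed to applyPropertyAlias.
struct PropertySpec {
    std::u16string propName;
    std::u16string valueName;
    bool sawEquals = false;  // true when the "Equals seen" branch is taken
};

// Hypothetical helper: splits the text between pos and close on '=',
// following the if/else in applyPropertyPattern (without the isName case).
PropertySpec splitProperty(const std::u16string& pattern,
                           std::size_t pos, std::size_t close) {
    assert(pos <= close && close <= pattern.size());
    PropertySpec spec;
    const std::size_t equals = pattern.find(u'=', pos);
    spec.sawEquals = (equals != std::u16string::npos && equals < close);
    if (spec.sawEquals) {
        // Medium/long form, e.g. \p{gc=Cf}
        spec.propName = pattern.substr(pos, equals - pos);
        spec.valueName = pattern.substr(equals + 1, close - equals - 1);
    } else {
        // Short form, e.g. \p{Nd}: valueName stays empty
        spec.propName = pattern.substr(pos, close - pos);
    }
    return spec;
}

// Hypothetical driver: finds every "\p{...}" in the pattern and splits it.
std::vector<PropertySpec> scanProperties(const std::u16string& pattern) {
    std::vector<PropertySpec> out;
    const std::u16string open = u"\\p{";
    std::size_t start = 0;
    while ((start = pattern.find(open, start)) != std::u16string::npos) {
        const std::size_t pos = start + open.size();
        const std::size_t close = pattern.find(u'}', pos);
        if (close == std::u16string::npos) break;  // unterminated \p{
        out.push_back(splitProperty(pattern, pos, close));
        start = close + 1;
    }
    return out;
}
```

Calling scanProperties(std::u16string(kIsWordPattern)) yields four entries, each with sawEquals false and an empty valueName, because the pattern contains no '=' at all. If the sketch matches the real logic, the "Equals seen" branch is never expected to run for gIsWordPattern, and the names reach applyPropertyAlias through the else branch with an empty valueName. So the value of ec after applyPropertyAlias returns may be the more useful thing to check on the ported platform.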