segfault after regexp

Using the file string.txt I get

octave:1> load "string.txt"
octave:2> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', "lineanchors") ;
Segmentation fault

This happens for the following two systems (using octave-3.0.2):
Operating System: Linux 2.6.18-6-amd64 #1 SMP Thu May 8 06:49:39 UTC 2008 x86_64
Operating System: Linux 2.6.27-rc6 #10 SMP Mon Sep 15 18:46:53 CEST 2008 i686

It doesn't happen for
Operating System: Linux 2.4.18-nec3.4p1.045 #1 SMP Mon Apr 9 16:57:17 JST 2007 ia64

G.

Re: segfault after regexp
On Thursday, 18.09.2008, 04:26 -0700, G.. wrote:
> Using the file http://www.nabble.com/file/p19550666/string.txt string.txt I
> get
>
> octave:1> load "string.txt"
> octave:2> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', "lineanchors") ;
> Segmentation fault
>
> This happens for the following two systems (using octave-3.0.2):
> Operating System: Linux 2.6.18-6-amd64 #1 SMP Thu May 8 06:49:39 UTC 2008
> x86_64
> Operating System: Linux 2.6.27-rc6 #10 SMP Mon Sep 15 18:46:53 CEST 2008
> i686
>
> It doesn't happen for
> Operating System: Linux 2.4.18-nec3.4p1.045 #1 SMP Mon Apr 9 16:57:17 JST
> 2007 ia64

Which version of pcre? On all systems, please.

Thomas
_______________________________________________
Bug-octave mailing list
Bug-octave@...
https://www-old.cae.wisc.edu/mailman/listinfo/bug-octave

segfault after regexp
On 18-Sep-2008, G.. wrote:
| octave:1> load "string.txt"
| octave:2> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', "lineanchors") ;
| Segmentation fault

Running Octave under gdb and trying this example, I see:

octave:1> load string.txt
octave:2> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', "lineanchors") ;

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4a34bd76f0 (LWP 7894)]
0x00007f4a2bfa51f8 in ?? () from /usr/lib/libpcre.so.3
(gdb) bt
#0  0x00007f4a2bfa51f8 in ?? () from /usr/lib/libpcre.so.3
#1  0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#2  0x00007f4a2bf9f998 in ?? () from /usr/lib/libpcre.so.3
#3  0x00007f4a2bfa39ec in ?? () from /usr/lib/libpcre.so.3
#4  0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#5  0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#6  0x00007f4a2bfa758c in ?? () from /usr/lib/libpcre.so.3
#7  0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#8  0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#9  0x00007f4a2bf9f998 in ?? () from /usr/lib/libpcre.so.3
#10 0x00007f4a2bfa39ec in ?? () from /usr/lib/libpcre.so.3
[...]
#2174 0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#2175 0x00007f4a2bfa5214 in ?? () from /usr/lib/libpcre.so.3
#2176 0x00007f4a2bfa758c in ?? () from /usr/lib/libpcre.so.3
[...]
[etc.]
[etc.]
[etc.]

So this appears to be an infinite recursion bug in the PCRE library.

jwe

Re: segfault after regexp
On 18-Sep-2008, Thomas Weber wrote:
| Which version of pcre? On all systems, please.

For me, it is

segfault:379> dpkg -l *pcre*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  libpcre3       7.6-2.1        Perl 5 Compatible Regular Expression Library
ii  libpcre3-dev   7.6-2.1        Perl 5 Compatible Regular Expression Library
ii  libpcrecpp0    7.6-2.1        Perl 5 Compatible Regular Expression Library

on a Debian AMD64 system.

jwe

Re: segfault after regexp

pcre 7.8 for a) and c), and 7.4 for b) (where it was actually octave-3.0.0 from Ubuntu Hardy).

Re: segfault after regexp
--- On Thu, 9/18/08, G.. <gail@...> wrote:
> octave:1> load "string.txt"
> octave:2> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', "lineanchors") ;
> Segmentation fault
>
> This happens for the following two systems (using octave-3.0.2):

I confirm - happens on both 3.0.1 and 3.0.2 - self built octave and its dependencies.

Regards,
Sergei.

Re: segfault after regexp
On Thu, Sep 18, 2008 at 04:26:27AM -0700, G.. wrote:
> Using the file http://www.nabble.com/file/p19550666/string.txt string.txt I
> get
>
> octave:1> load "string.txt"
> octave:2> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', "lineanchors") ;
> Segmentation fault

I suspect a stack overflow here. G., you did read Philip's comments about
nested unlimited repeats, didn't you?
http://lists.exim.org/lurker/message/20080918.084230.037a0008.en.html

This works on my system if I increase the stack size limit (w.m contains
your commands):

$ ulimit -s 8192
$ ./run-octave -q
octave:1> w
Segmentation fault

$ ulimit -s 16000
$ ./run-octave -q
octave:1> w
ans = 1

> This happens for the following two systems (using octave-3.0.2):
> Operating System: Linux 2.6.18-6-amd64 #1 SMP Thu May 8 06:49:39 UTC 2008
> x86_64
> Operating System: Linux 2.6.27-rc6 #10 SMP Mon Sep 15 18:46:53 CEST 2008
> i686
>
> It doesn't happen for
> Operating System: Linux 2.4.18-nec3.4p1.045 #1 SMP Mon Apr 9 16:57:17 JST
> 2007 ia64

I suspect the Itanium has a bigger default stack size.

Thomas

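Editorial aside: the limit that 'ulimit -s' changes can also be inspected from inside a program. The following is a minimal sketch, not code from this thread, assuming a POSIX system that provides getrlimit(). The helper names format_limit, query_stack_limit and print_stack_limits are made up for illustration.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK, rlim_t (POSIX)
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render a limit roughly the way `ulimit -s` does,
// i.e. in KiB, or as "unlimited" when no limit is set.
std::string format_limit(rlim_t bytes)
{
  if (bytes == RLIM_INFINITY)
    return "unlimited";
  return std::to_string(static_cast<unsigned long long>(bytes / 1024)) + " KiB";
}

// Hypothetical helper: read the soft and hard stack limits of this process.
// Returns false if getrlimit() fails.
bool query_stack_limit(struct rlimit &out)
{
  return getrlimit(RLIMIT_STACK, &out) == 0;
}

// Hypothetical helper: print both limits to stdout.
void print_stack_limits()
{
  struct rlimit lim;
  if (!query_stack_limit(lim))
    {
      std::perror("getrlimit");
      return;
    }
  std::printf("soft stack limit: %s\n", format_limit(lim.rlim_cur).c_str());
  std::printf("hard stack limit: %s\n", format_limit(lim.rlim_max).c_str());
}
```

Calling print_stack_limits() on each of the reported machines would show whether the default soft limit really differs between them, which is only a suspicion in the message above.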
Re: segfault after regexp

That works. However, for larger data (e.g. by simply doubling the input string) it doesn't.

G.

Re: segfault after regexp
On Sun, Sep 28, 2008 at 02:15:47AM -0700, G.. wrote:
> Thomas Weber-8 wrote:
> >
> > $ ulimit -s 16000
> > $ ./run-octave -q
> > octave:1> w
> > ans = 1
> >
>
> That works. However, for larger data (e.g. by simply doubling the input
> string) it doesn't.

You are running into a constraint/protection by your operating system.
Increase the size even more or (better): check if you can't come up with
better regexps.

Thomas

Re: segfault after regexp

a) this may be ignorant, but summarizing this means that - on the same
machine with ulimit 16000 - I get:

octave:> regexp(s, '^(\s*-*\d+[.]*\d*\s*)+$', 'lineanchors')
Segmentation fault

matlab:> regexp(repmat(s,1,1000), '^(\s*-*\d+[.]*\d*\s*)+$')

ans =

     1

After enlarging the input string to its limits, matlab finally gives a
message that the array is too large, but never segfaults.

b) w.r.t. the regexp, I really don't see how to describe a sequence of
spaces and numbers more simply. And the data (the input string) is pretty
tame.

G.

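Editorial aside: if the goal is only to check that a whole string consists of spaces and numbers, one way to sidestep the recursion is to not use a backtracking matcher at all. The sketch below is not from this thread; it is a hand-written single-pass scanner that, as far as I can tell, accepts the same strings as a whole-string match of ^(\s*-*\d+[.]*\d*\s*)+$ (without the per-line "lineanchors" behaviour, and assuming \s means space, tab, newline, carriage return and form feed). The name is_number_list is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed whitespace set for \s: space, tab, newline, carriage return, form feed.
static bool is_ws(char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

static bool is_digit(char c)
{
  return c >= '0' && c <= '9';
}

// Hypothetical helper: true if `s` is a non-empty sequence of groups of the
// form  \s* -* \d+ [.]* \d* \s*  covering the whole string.
// The rules checked in one left-to-right pass, using constant stack space:
//   - only whitespace, '-', '.', and digits may appear;
//   - every '-' must be followed directly by another '-' or by a digit;
//   - every '.' must be preceded directly by another '.' or by a digit;
//   - at least one digit must be present.
bool is_number_list(const std::string &s)
{
  bool seen_digit = false;
  for (std::size_t i = 0; i < s.size(); ++i)
    {
      const char c = s[i];
      if (is_digit(c))
        seen_digit = true;
      else if (is_ws(c))
        continue;
      else if (c == '-')
        {
          if (i + 1 >= s.size() || !(s[i + 1] == '-' || is_digit(s[i + 1])))
            return false;
        }
      else if (c == '.')
        {
          if (i == 0 || !(s[i - 1] == '.' || is_digit(s[i - 1])))
            return false;
        }
      else
        return false;
    }
  return seen_digit;
}
```

Because the loop never recurses, its stack use does not grow with the input length, unlike the pcre_exec() call discussed in this thread.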
Re: segfault after regexp
On 29-Sep-2008, G.. wrote:
| After enlarging the input string to its limits, matlab finally gives a
| message that the array is too large, but never segfaults.

I think it has been mentioned before that Matlab uses its own regexp
library that has its own set of bugs.

Since Octave just sets up a call to the PCRE library, I think the bug
is probably in the PCRE library, and the right place to fix it is
there. Unless maybe there is a way now to have PCRE detect this
problem and return an error code instead of infinitely recursing and
causing a segfault.

Hmm. I expected to be able to also generate a segfault with the
pcretest program, but I couldn't make that happen. So maybe Octave is
calling the PCRE functions incorrectly? I'm not sure, and not an
expert here, so it would be helpful if someone could help debug the
problem, or verify that there is a bug in the PCRE library that should
be fixed.

Thanks,

jwe

Re: segfault after regexp
On Mon, Sep 29, 2008 at 02:19:41PM -0400, John W. Eaton wrote:
> Since Octave just sets up a call to the PCRE library, I think the bug
> is probably in the PCRE library, and the right place to fix it is
> there. Unless maybe there is a way now to have PCRE detect this
> problem and return an error code instead of infinitely recursing and
> causing a segfault.

Actually it's a SIGSEGV.

It might be possible to change regexp.cc to set the soft limit for stack
recursion (the equivalent of the above 'ulimit -s' command) to the hard
limit.

I don't know however what kind of consequences this has for the system
in question.

Thomas

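Editorial aside: the idea of raising the soft stack limit to the hard limit can be sketched as follows. This is an illustration under assumptions, not the change that was eventually proposed for regexp.cc: it assumes POSIX getrlimit()/setrlimit(), the function names choose_soft_limit and raise_stack_soft_limit are hypothetical, and the cap used when the hard limit is unlimited is a caller-supplied guess. Whether a limit raised at run time gives an already-running thread more usable stack depends on the platform.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_STACK (POSIX)
#include <cassert>

// Hypothetical helper: decide the new soft limit.
// Uses the hard limit, or `cap_bytes` if the hard limit is unlimited,
// and never lowers a soft limit that is already at least that large.
rlim_t choose_soft_limit(rlim_t soft, rlim_t hard, rlim_t cap_bytes)
{
  const rlim_t target = (hard == RLIM_INFINITY) ? cap_bytes : hard;
  if (soft == RLIM_INFINITY || soft >= target)
    return soft;
  return target;
}

// Hypothetical helper: apply the choice above to the current process.
// Returns false if either system call fails.
bool raise_stack_soft_limit(rlim_t cap_bytes)
{
  struct rlimit lim;
  if (getrlimit(RLIMIT_STACK, &lim) != 0)
    return false;
  lim.rlim_cur = choose_soft_limit(lim.rlim_cur, lim.rlim_max, cap_bytes);
  return setrlimit(RLIMIT_STACK, &lim) == 0;
}
```

Keeping the decision in a separate pure function makes the policy (how high to go, and when to leave the limit alone) easy to check without touching the real process limits.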
Re: segfault after regexp
On 29-Sep-2008, Thomas Weber wrote:
| Actually it's a SIGSEGV.

There's a difference?

| It might be possible to change regexp.cc to set the soft limit for stack
| recursion (the equivalent of the above 'ulimit -s' command) to the hard
| limit.
|
| I don't know however what kind of consequences this has for the system
| in question.

I think we should first find out why this is going into an apparently
infinite recursion. If it is an error in the way that we are using
the PCRE functions, then maybe we can fix it. Otherwise, I think the
bug should be fixed in PCRE.

jwe

Re: segfault after regexp
On Monday, 29.09.2008, 17:07 -0400, John W. Eaton wrote:
> | Actually it's a SIGSEGV.
>
> There's a difference?

I thought so, but I was wrong. Okay, forget that.

Anyway, we could catch the SIGSEGV from the pcre library and proceed
accordingly.

> I think we should first find out why this is going into an apparently
> infinite recursion. If it is an error in the way that we are using
> the PCRE functions, then maybe we can fix it. Otherwise, I think the
> bug should be fixed in PCRE.

I don't think it's an infinite recursion (it works when given enough
space, so it's definitely finite).

There are already several options in the PCRE library, including a
different implementation for regexps like this.

Thomas

Re: segfault after regexp
On 30-Sep-2008, Thomas Weber wrote:
| Anyway, we could catch the SIGSEGV from the pcre library and proceed
| accordingly.

There is already a handler installed for SIGSEGV, but I think it fails
in this instance. I'm not certain why that happens, but my guess is
that calling the signal handler fails if there is no more stack space.

| I don't think it's an infinite recursion (it works when given enough
| space, so it's definitely finite).

OK, but it seems like a very large number of recursive calls for what
seems to be a relatively simple regexp operating on what also seems to
be a small amount of data.

| There are already several options in the PCRE library, including a
| different implementation for regexps like this.

So should we be trying to recognize the characteristics of the regexp
and set some options before calling PCRE? It seems to me that job
should be handled by PCRE itself. We are just users of the library.
How are we supposed to know what kinds of regexps will cause trouble?

jwe

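Editorial aside: POSIX offers sigaltstack(2) for the situation guessed at above, where a SIGSEGV handler cannot run because the normal stack is exhausted. The sketch below is not Octave's handler and not code from this thread; it only shows the general shape, assuming a POSIX system. The names alt_stack_buffer, on_segv and install_segv_handler_on_alt_stack are hypothetical, and the 64 KiB buffer size is an arbitrary choice.

```cpp
#include <signal.h>   // sigaction, sigaltstack, SA_ONSTACK (POSIX)
#include <unistd.h>   // write, _exit
#include <cassert>
#include <cstring>

// Memory reserved for running signal handlers. A fixed size is used
// because SIGSTKSZ is not a compile-time constant on every system.
static char alt_stack_buffer[64 * 1024];

// Runs on the alternate stack, so it still works when the normal stack
// is full. Only async-signal-safe functions are called here.
static void on_segv(int)
{
  static const char msg[] = "error: SIGSEGV (possibly stack exhaustion)\n";
  ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
  (void) ignored;
  _exit(1);
}

// Hypothetical helper: register the alternate stack, then install a
// SIGSEGV handler that is told (via SA_ONSTACK) to use it.
bool install_segv_handler_on_alt_stack()
{
  stack_t ss;
  std::memset(&ss, 0, sizeof ss);
  ss.ss_sp = alt_stack_buffer;
  ss.ss_size = sizeof alt_stack_buffer;
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0)
    return false;

  struct sigaction sa;
  std::memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_segv;
  sa.sa_flags = SA_ONSTACK;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

This handler only reports and exits. Recovering and continuing after a stack overflow inside a third-party library is considerably harder and is not attempted here.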
Re: segfault after regexp
On Tue, Sep 30, 2008 at 04:16:29PM -0400, John W. Eaton wrote:
> There is already a handler installed for SIGSEGV, but I think it fails
> in this instance. I'm not certain why that happens, but my guess is
> that, calling the signal handler fails if there is no more stack space.

Yes, according to sigaltstack(2), that's the problem. (I'm mentioning
sigaltstack here mostly for reference, so I don't have to search the net
again). I fear however that this will turn into a very system-specific
solution.

> OK, but it seems like a very large number of recursive calls for what
> seems to be a relatively simple regexp operating on what also seems to
> be a small amount of data.

According to pcrestack(3), that's a problem that might happen with
nested, unlimited regexps.

> | There are already several options in the PCRE library, including a
> | different implementation for regexps like this.
>
> So should we be trying to recognize the characteristics of the regexp
> and set some options before calling PCRE?

I don't think we will have much luck in recognizing the characteristics
of a regexp. If the data is trivial, even the most complicated regexp
will work; vice versa, with enough data, even simple regexp's might run
into this.

> It seems to me that job
> should be handled by PCRE itself. We are just users of the library.
> How are we supposed to know what kinds of regexps will cause trouble?

Well, quoting pcrestack's man page:
"As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other
hand, can support around 128000 recursions. The pcretest test program
has a command line option (-S) that can be used to increase the size of
its stack."

So, we have some estimates, with a security factor of (say) 2, we should
be alright.

This doesn't address the important question though: what kind of memory
limit do we pose on the stack?

Thomas

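Editorial aside: the rule of thumb quoted from pcrestack's man page is plain integer arithmetic, shown here as a small sketch. The function name max_recursions is hypothetical, and the defaults (500 bytes per recursion, safety factor 2) are simply the figures mentioned in the message above, not values taken from PCRE or Octave source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimate how many recursions fit into a stack of
// `stack_bytes`, given a rough per-recursion cost and a safety factor.
// The result is rounded down.
std::uint64_t max_recursions(std::uint64_t stack_bytes,
                             std::uint64_t bytes_per_recursion = 500,
                             std::uint64_t safety_factor = 2)
{
  assert(bytes_per_recursion > 0 && safety_factor > 0);
  return stack_bytes / (bytes_per_recursion * safety_factor);
}
```

With a safety factor of 1, max_recursions(8000000, 500, 1) gives 16000 and max_recursions(64000000, 500, 1) gives 128000, matching the man page's figures if "8Mb" and "64Mb" are read as decimal megabytes. With the suggested safety factor of 2, a stack of 8 * 1024 * 1024 bytes allows 8388 recursions.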
Re: segfault after regexp
On Sat, Oct 04, 2008 at 11:40:04AM +0200, Thomas Weber wrote:
> So, we have some estimates, with a security factor of (say) 2, we should
> be alright.
>
> This doesn't address the important question though: what kind of memory
> limit do we pose on the stack?

Patch attached. I assume a maximum of 500MB on the stack (if there's no
hard limit), with a safety factor of 2.

Thomas

# HG changeset patch
# User Thomas Weber <thomas.weber.mail@...>
# Date 1223729321 -7200
# Node ID f89e3a3bf4d106ee3d297243e42768d1b5213703
# Parent a10397d26114998bca6c7c5570eb2feb40f77b91
Set a sensible limit on stack usage

diff --git a/src/DLD-FUNCTIONS/regexp.cc b/src/DLD-FUNCTIONS/regexp.cc
--- a/src/DLD-FUNCTIONS/regexp.cc
+++ b/src/DLD-FUNCTIONS/regexp.cc
@@ -52,9 +52,10 @@
 #include <regex.h>
 #endif
 
-// Define the maximum number of retries for a pattern that
-// possibly results in an infinite recursion.
-#define PCRE_MATCHLIMIT_MAX 10
+// Define a safety factor for PCRE's estimated stack usage
+// Used to protect against stack overflow and to estimate the maximum
+// number of recursions.
+#define PCRE_STACK_SAFETY_FACTOR 2
 
 // The regexp is constructed as a linked list to avoid resizing the
 // return values in arrays at each new match.
@@ -384,31 +385,58 @@
         {
           OCTAVE_QUIT;
 
-          int matches = pcre_exec(re, 0, buffer.c_str(),
+          // pcre_exec uses recursion aggressively, therefore we may
+          // run out of stack in the call to pcre_exec() if we use the
+          // platform's default values.
+
+          // If the hard limit from getrlimit() is unlimited, we set
+          // the stack limit to 500MB and set the number of recursions
+          // accordingly: one recursion needs approximately 500 bytes
+          // according to pcrestack(3) and we use a safety factor of
+          // PCRE_STACK_SAFETY_FACTOR.
+
+          // If the hard limit from getrlimit() is lower, we set the
+          // limit on the number of function executions to a suitable
+          // value.
+
+          // query the limits
+          pcre_extra pe;
+          pcre_config(PCRE_CONFIG_MATCH_LIMIT, static_cast <void *> (&pe.match_limit));
+          pe.flags = PCRE_EXTRA_MATCH_LIMIT;
+
+          struct rlimit rlim;
+          getrlimit(RLIMIT_STACK, &rlim);
+
+          if (rlim.rlim_max == RLIM_INFINITY)
+            // no hard limit, so we limit ourselves to 500 MB
+            rlim.rlim_cur = static_cast <rlim_t> (500 * 1024 * 1024);
+          else
+            // set soft limit to hard limit
+            rlim.rlim_cur = rlim.rlim_max;
+
+          pe.match_limit = rlim.rlim_cur / (500 * PCRE_STACK_SAFETY_FACTOR);
+
+          if (setrlimit(RLIMIT_STACK, &rlim) != 0)
+            {
+              error ("%s: increasing stack limit for PCRE usage failed", nm.c_str());
+              pcre_free(re);
+              return 0;
+            }
+
+          int matches = pcre_exec(re, &pe, buffer.c_str(),
                                   buffer.length(), idx,
                                   (idx ? PCRE_NOTBOL : 0),
                                   ovector, (subpatterns+1)*3);
+
 
           if (matches == PCRE_ERROR_MATCHLIMIT)
             {
-              // try harder; start with default value for MATCH_LIMIT and increase it
-              warning("Your pattern caused PCRE to hit its MATCH_LIMIT.\nTrying harder now, but this will be slow.");
-              pcre_extra pe;
-              pcre_config(PCRE_CONFIG_MATCH_LIMIT, static_cast <void *> (&pe.match_limit));
-              pe.flags = PCRE_EXTRA_MATCH_LIMIT;
-
-              int i = 0;
-              while (matches == PCRE_ERROR_MATCHLIMIT &&
-                     i++ < PCRE_MATCHLIMIT_MAX)
-                {
-                  OCTAVE_QUIT;
-
-                  pe.match_limit *= 10;
-                  matches = pcre_exec(re, &pe, buffer.c_str(),
-                                      buffer.length(), idx,
-                                      (idx ? PCRE_NOTBOL : 0),
-                                      ovector, (subpatterns+1)*3);
-                }
+              // we've hit the match limit, despite setting it to a
+              // big value above. Inform the user and fail gracefully
+              error("%s: Your pattern caused PCRE to hit its MATCH_LIMIT. "
+                    "If you have a hard limit on stack usage set, try to set it higher.", nm.c_str());
+              pcre_free(re);
+              return 0;
             }
 
           if (matches < 0 && matches != PCRE_ERROR_NOMATCH)

Re: segfault after regexp
On 11-Oct-2008, Thomas Weber wrote:
| Patch attached. I assume a maximum of 500MB on the stack (if there's no
| hard limit), with a safety factor of 2.

I don't think getrlimit and setrlimit are portable, so at a minimum,
you'll need a configure check and only use this method if these
functions are available.

But is this really the right place for the fix? Or even the right
approach to take? Octave is not the only program using PCRE that
might run into this problem.

It seems to me that it would be better to fix it in PCRE itself,
preferably by using a different algorithm that doesn't suffer from
these problems.

Modifying the stack limit does not seem like a real fix to the actual
problem. Instead, you are just hiding it. The problem still exists,
and will still bite for larger problems or more complex data.

jwe

Re: segfault after regexp
On Sat, Oct 11, 2008 at 09:09:24AM -0400, John W. Eaton wrote:
> I don't think getrlimit and setrlimit are portable, so at a minimum,
> you'll need a configure check and only use this method if thse
> functions are available.

I actually thought they are in POSIX. Can people with different systems
comment? For that matter, does the original crash happen on Windows or
Mac?

> But is this really the right place for the fix? Or even the right
> approach to take? Octave is not the only program using PCRE that
> might run into this problem.

Eh, yes:
PHP: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=476419
PCRE itself: http://bugs.exim.org/show_bug.cgi?id=704

> It seems to me that it would be better to fix it in PCRE itself,
> preferably by using a different algorithm that doesn't suffer from
> these problems.

There is a different algorithm already implemented, pcre_dfa_exec().
It's not Perl compatible, though.

Reading through its documentation, we will just hit a different problem,
though: it needs a workspace for saving the number of different possible
matches. So we would need to choose how many partial matches we would
like to track (man pcreapi for details).

> Modifying the stack limit does not seem like a real fix to the actual
> problem. Instead, you are just hiding it. The problem still exists,
> and will still bite for larger problems or more complex data.

Sorry, but with enough data, your RAM won't handle that, either. There's
a limit on how much we can cater:

1) When compiling PCRE, the user has chosen a far too large recursion
   limit.
2) The soft limit in his shell on stack usage is too low for the value
   from 1).
3) There comes Octave, simply using what it is told to use and not
   working with it.

But now Octave should overcome 1) and 2)?

I'd say if it was trivial to overcome, PCRE would handle it itself.
PCRE's default usage means aggressive recursion, how should we change
that?

Thomas