[PATCH] tsearch parser inefficiency if text includes urls or emails

View: New views
10 Messages — Rating Filter:   Alert me  

[PATCH] tsearch parser inefficiency if text includes urls or emails

by Andres Freund :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

While playing around/evaluating tsearch I notices that to_tsvector is
obscenely slow for some files. After some profiling I found that this is due
using a seperate TSParser in p_ishost/p_isURLPath in wparser_def.c.
If a multibyte encoding is in use TParserInit copies the whole remaining input
and converts it to wchar_t or pg_wchar - for every email or protocol prefixed
url in the the document. Which obviously is bad.

I solved the issue by having a seperate TParserCopyInit/TParserCopyClose which
reuses the the already converted strings of the original TParser - only at
different offsets.

Another approach would be to get rid of the separate parser invocations -
requiring a bunch of additional states. This seemed more complex to me, so I
wanted to get some feedback first.

Without patch:
andres=# SELECT to_tsvector('english', document) FROM document WHERE filename =
'/usr/share/doc/libdrm-nouveau1/changelog';
                                                                                                     
 ─────────────────────────────────────────────────────────────────────────────────────────────────────
...
 (1 row)
 
Time: 5835.676 ms

With patch:
andres=# SELECT to_tsvector('english', document) FROM document WHERE filename =
'/usr/share/doc/libdrm-nouveau1/changelog';
                                                                                                     
 ─────────────────────────────────────────────────────────────────────────────────────────────────────
...
 (1 row)
 
Time: 395.341 ms

Ill cleanup the patch if it seems like a sensible solution...

Is this backpatch-worthy?

Andres


PS: I let the additional define in for the moment so that its easier to see the
performance differences.

[reuse-strings-in-tparser-recursion.patch]

diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 301c1eb..14757d9 100644
*** a/src/backend/tsearch/wparser_def.c
--- b/src/backend/tsearch/wparser_def.c
***************
*** 24,29 ****
--- 24,30 ----
 
  /* Define me to enable tracing of parser behavior */
  /* #define WPARSER_TRACE */
+ #define WPARSER_REUSE
 
 
  /* Output token categories */
*************** TParserInit(char *str, int len)
*** 328,333 ****
--- 329,358 ----
  return prs;
  }
 
+ static TParser *
+ TParserCopyInit(const TParser const* orig, size_t offset, int len)
+ {
+ TParser    *prs = (TParser *) palloc0(sizeof(TParser));
+
+ prs->charmaxlen = orig->charmaxlen;
+ prs->usewide = orig->usewide;
+ prs->lenstr = len;
+
+ prs->str = orig->str + offset;
+ prs->pgwstr = orig->pgwstr + offset;
+ prs->wstr = orig->wstr + offset;
+
+
+ prs->state = newTParserPosition(NULL);
+ prs->state->state = TPS_Base;
+
+ #ifdef WPARSER_TRACE
+ fprintf(stderr, "parsing \"%.*s\"\n", len, orig->str + offset);
+ #endif
+
+ return prs;
+ }
+
  static void
  TParserClose(TParser *prs)
  {
*************** TParserClose(TParser *prs)
*** 345,351 ****
--- 370,388 ----
  if (prs->pgwstr)
  pfree(prs->pgwstr);
  #endif
+ pfree(prs);
+ }
 
+ static void
+ TParserCopyClose(TParser *prs)
+ {
+ while (prs->state)
+ {
+ TParserPosition *ptr = prs->state->prev;
+
+ pfree(prs->state);
+ prs->state = ptr;
+ }
  pfree(prs);
  }
 
*************** p_isignore(TParser *prs)
*** 617,623 ****
  static int
  p_ishost(TParser *prs)
  {
! TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
  int res = 0;
 
  tmpprs->wanthost = true;
--- 654,664 ----
  static int
  p_ishost(TParser *prs)
  {
! #ifdef WPARSER_REUSE
! TParser *tmpprs = TParserCopyInit(prs, prs->state->posbyte, prs->lenstr - prs->state->posbyte);
! #else
! TParser *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
! #endif
  int res = 0;
 
  tmpprs->wanthost = true;
*************** p_ishost(TParser *prs)
*** 631,637 ****
--- 672,682 ----
  prs->state->charlen = tmpprs->state->charlen;
  res = 1;
  }
+ #ifdef WPARSER_REUSE
+ TParserCopyClose(tmpprs);
+ #else
  TParserClose(tmpprs);
+ #endif
 
  return res;
  }
*************** p_ishost(TParser *prs)
*** 639,645 ****
  static int
  p_isURLPath(TParser *prs)
  {
! TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
  int res = 0;
 
  tmpprs->state = newTParserPosition(tmpprs->state);
--- 684,694 ----
  static int
  p_isURLPath(TParser *prs)
  {
! #ifdef WPARSER_REUSE
! TParser *tmpprs = TParserCopyInit(prs, prs->state->posbyte, prs->lenstr - prs->state->posbyte);
! #else
! TParser *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
! #endif
  int res = 0;
 
  tmpprs->state = newTParserPosition(tmpprs->state);
*************** p_isURLPath(TParser *prs)
*** 654,660 ****
--- 703,713 ----
  prs->state->charlen = tmpprs->state->charlen;
  res = 1;
  }
+ #ifdef WPARSER_REUSE
+ TParserCopyClose(tmpprs);
+ #else
  TParserClose(tmpprs);
+ #endif
 
  return res;
  }



--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] tsearch parser inefficiency if text includes urls or emails - new version

by Andres Freund :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sunday 01 November 2009 16:19:43 Andres Freund wrote:

> While playing around/evaluating tsearch I notices that to_tsvector is
> obscenely slow for some files. After some profiling I found that this is
>  due using a seperate TSParser in p_ishost/p_isURLPath in wparser_def.c. If
>  a multibyte encoding is in use TParserInit copies the whole remaining
>  input and converts it to wchar_t or pg_wchar - for every email or protocol
>  prefixed url in the the document. Which obviously is bad.
>
> I solved the issue by having a seperate TParserCopyInit/TParserCopyClose
>  which reuses the the already converted strings of the original TParser -
>  only at different offsets.
>
> Another approach would be to get rid of the separate parser invocations -
> requiring a bunch of additional states. This seemed more complex to me, so
>  I wanted to get some feedback first.
>
> Without patch:
> andres=# SELECT to_tsvector('english', document) FROM document WHERE
>  filename = '/usr/share/doc/libdrm-nouveau1/changelog';
>
>  ──────────────────────────────────────────────────────────────────────────
> ─────────────────────────── ...
>  (1 row)
>
> Time: 5835.676 ms
>
> With patch:
> andres=# SELECT to_tsvector('english', document) FROM document WHERE
>  filename = '/usr/share/doc/libdrm-nouveau1/changelog';
>
>  ──────────────────────────────────────────────────────────────────────────
> ─────────────────────────── ...
>  (1 row)
>
> Time: 395.341 ms
>
> Ill cleanup the patch if it seems like a sensible solution...
As nobody commented here is a corrected (stupid thinko) and cleaned up
version. Anyone cares to comment whether I am the only one thinking this is an
issue?

Andres

[0001-Fix-TSearch-inefficiency-because-of-repeated-copying.patch]

From cbdeb0bb636f3b7619d0a3019854809ea5565dac Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@...>
Date: Sun, 8 Nov 2009 16:30:42 +0100
Subject: [PATCH] Fix TSearch inefficiency because of repeated copying of strings

---
 src/backend/tsearch/wparser_def.c |   63 ++++++++++++++++++++++++++++++++++--
 1 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 301c1eb..7bbd826 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -328,6 +328,41 @@ TParserInit(char *str, int len)
  return prs;
 }
 
+/*
+ * As an alternative to a full TParserInit one can create a
+ * TParserCopy which basically is a normally TParser without a private
+ * copy of the string - instead it uses the one from another TParser.
+ * This is usefull because at some places TParsers are created
+ * recursively and the repeated copying around of the strings can
+ * cause major inefficiency.
+ * Obviously one may not close the original TParser before the copy.
+ */
+static TParser *
+TParserCopyInit(const TParser const* orig)
+{
+ TParser    *prs = (TParser *) palloc0(sizeof(TParser));
+
+ prs->charmaxlen = orig->charmaxlen;
+ prs->usewide = orig->usewide;
+ prs->lenstr = orig->lenstr - orig->state->posbyte;
+
+ prs->str = orig->str + orig->state->posbyte;
+ if(orig->pgwstr)
+ prs->pgwstr = orig->pgwstr + orig->state->poschar;
+ if(orig->wstr)
+ prs->wstr = orig->wstr + orig->state->poschar;
+
+ prs->state = newTParserPosition(NULL);
+ prs->state->state = TPS_Base;
+
+#ifdef WPARSER_TRACE
+ fprintf(stderr, "parsing copy \"%.*s\"\n", len, str);
+#endif
+
+ return prs;
+}
+
+
 static void
 TParserClose(TParser *prs)
 {
@@ -350,6 +385,26 @@ TParserClose(TParser *prs)
 }
 
 /*
+ * See TParserCopyInit
+ */
+static void
+TParserCopyClose(TParser *prs)
+{
+ while (prs->state)
+ {
+ TParserPosition *ptr = prs->state->prev;
+
+ pfree(prs->state);
+ prs->state = ptr;
+ }
+#ifdef WPARSER_TRACE
+ fprintf(stderr, "closing parser copy");
+#endif
+ pfree(prs);
+}
+
+
+/*
  * Character-type support functions, equivalent to is* macros, but
  * working with any possible encodings and locales. Notes:
  *  - with multibyte encoding and C-locale isw* function may fail
@@ -617,7 +672,7 @@ p_isignore(TParser *prs)
 static int
 p_ishost(TParser *prs)
 {
- TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
+ TParser *tmpprs = TParserCopyInit(prs);
  int res = 0;
 
  tmpprs->wanthost = true;
@@ -631,7 +686,7 @@ p_ishost(TParser *prs)
  prs->state->charlen = tmpprs->state->charlen;
  res = 1;
  }
- TParserClose(tmpprs);
+ TParserCopyClose(tmpprs);
 
  return res;
 }
@@ -639,7 +694,7 @@ p_ishost(TParser *prs)
 static int
 p_isURLPath(TParser *prs)
 {
- TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
+ TParser *tmpprs = TParserCopyInit(prs);
  int res = 0;
 
  tmpprs->state = newTParserPosition(tmpprs->state);
@@ -654,7 +709,7 @@ p_isURLPath(TParser *prs)
  prs->state->charlen = tmpprs->state->charlen;
  res = 1;
  }
- TParserClose(tmpprs);
+ TParserCopyClose(tmpprs);
 
  return res;
 }
--
1.6.5.12.gd65df24




--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] tsearch parser inefficiency if text includes urls or emails - new version

by Kenneth Marshall-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, Nov 08, 2009 at 05:00:53PM +0100, Andres Freund wrote:

> On Sunday 01 November 2009 16:19:43 Andres Freund wrote:
> > While playing around/evaluating tsearch I notices that to_tsvector is
> > obscenely slow for some files. After some profiling I found that this is
> >  due using a seperate TSParser in p_ishost/p_isURLPath in wparser_def.c. If
> >  a multibyte encoding is in use TParserInit copies the whole remaining
> >  input and converts it to wchar_t or pg_wchar - for every email or protocol
> >  prefixed url in the the document. Which obviously is bad.
> >
> > I solved the issue by having a seperate TParserCopyInit/TParserCopyClose
> >  which reuses the the already converted strings of the original TParser -
> >  only at different offsets.
> >
> > Another approach would be to get rid of the separate parser invocations -
> > requiring a bunch of additional states. This seemed more complex to me, so
> >  I wanted to get some feedback first.
> >
> > Without patch:
> > andres=# SELECT to_tsvector('english', document) FROM document WHERE
> >  filename = '/usr/share/doc/libdrm-nouveau1/changelog';
> >
> >  ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
> > ????????????????????????????????????????????????????????????????????????????????? ...
> >  (1 row)
> >
> > Time: 5835.676 ms
> >
> > With patch:
> > andres=# SELECT to_tsvector('english', document) FROM document WHERE
> >  filename = '/usr/share/doc/libdrm-nouveau1/changelog';
> >
> >  ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
> > ????????????????????????????????????????????????????????????????????????????????? ...
> >  (1 row)
> >
> > Time: 395.341 ms
> >
> > Ill cleanup the patch if it seems like a sensible solution...
> As nobody commented here is a corrected (stupid thinko) and cleaned up
> version. Anyone cares to comment whether I am the only one thinking this is an
> issue?
>
> Andres

+1

As a user of tsearch, I can certainly appreciate to speed-up in parsing --
more CPU for everyone else.

Regards,
Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] tsearch parser inefficiency if text includes urls or emails - new version

by Andres Freund :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sunday 08 November 2009 17:41:15 Kenneth Marshall wrote:
> On Sun, Nov 08, 2009 at 05:00:53PM +0100, Andres Freund wrote:
> > As nobody commented here is a corrected (stupid thinko) and cleaned up
> > version. Anyone cares to comment whether I am the only one thinking this
> > is an issue?
> > Andres
> +1
> As a user of tsearch, I can certainly appreciate to speed-up in parsing --
> more CPU for everyone else.
Please note that this is mostly an issue when using rather long documents
including either email addresses (ie. an @) or links with protocol prefixes
(like ?+://) - so it might not give you personally a benefit :-(

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: tsearch parser inefficiency if text includes urls or emails - new version

by Kevin Grittner :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andres Freund <andres@...> wrote:

> On Sunday 01 November 2009 16:19:43 Andres Freund wrote:
>> While playing around/evaluating tsearch I notices that to_tsvector
>> is obscenely slow for some files. After some profiling I found that
>> this is due using a seperate TSParser in p_ishost/p_isURLPath in
>> wparser_def.c. If a multibyte encoding is in use TParserInit copies
>> the whole remaining input and converts it to wchar_t or pg_wchar -
>> for every email or protocol prefixed url in the the document. Which
>> obviously is bad.
>>
>> I solved the issue by having a seperate TParserCopyInit/
>> TParserCopyClose which reuses the the already converted strings of
>> the original TParser - only at different offsets.
>>
>> Another approach would be to get rid of the separate parser
>> invocations - requiring a bunch of additional states. This seemed
>> more complex to me, so I wanted to get some feedback first.
 
>> Without patch:
 
>> Time: 5835.676 ms
 
>> With patch:
 
>> Time: 395.341 ms
 
> As nobody commented here is a corrected (stupid thinko) and cleaned
> up version.
 
I've taken this one for review, and have taken a preliminary look.
 
It is in context format, applies cleanly, and passes "make check".
Next I read through the code, and have the same question that Andres
posed 12 days ago.  His patch massively reduces the cost of the parser
recursively calling itself for some cases, and it seems like the least
invasive way to modify the parser to solve this performance problem;
but it does beg the question of why a state machine like this should
recursively call itself when it hits certain states.
 
The patch is mitigating the penalty for what seems to me to be an
existing ugly kludge.  Is it acceptable to address the problem in this
way, or should the much more invasive work be done to eliminate the
kludge?  (Note: I personally would much rather see the performance
penalty addressed this way, and a TODO added for the more invasive
work, than to leave this alone for the next release if there's nobody
willing to tackle the problem at a more fundamental level.)
 
If nobody has a contrary opinion, I'll proceed with the review of this
patch and add something to the TODO page for the more invasive work.
 
-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: tsearch parser inefficiency if text includes urls or emails - new version

by Alvaro Herrera-7 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Kevin Grittner wrote:

> (Note: I personally would much rather see the performance
> penalty addressed this way, and a TODO added for the more invasive
> work, than to leave this alone for the next release if there's nobody
> willing to tackle the problem at a more fundamental level.)

+1

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: tsearch parser inefficiency if text includes urls or emails - new version

by Andres Freund :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote:
> It is in context format, applies cleanly, and passes "make check".
Unfortunately the latter is not saying much - I had a bug there and it was not
found by the regression tests. Perhaps I should take a stab and add at least
some more...

> It is in context format, applies cleanly, and passes "make check".
> Next I read through the code, and have the same question that Andres
> posed 12 days ago.  His patch massively reduces the cost of the parser
> recursively calling itself for some cases, and it seems like the least
> invasive way to modify the parser to solve this performance problem;
> but it does beg the question of why a state machine like this should
> recursively call itself when it hits certain states.
I was wondering about that as well. I am not completely sure but to me it
looks like its just done to reduce the amount of rules and states.

I have to say that that code is not exactly clear and well documented...

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Parent Message unknown Re: tsearch parser inefficiency if text includes urls or emails - new version

by Kevin Grittner :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andres Freund  wrote:
On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote:
>> It is in context format, applies cleanly, and passes "make check".
> Unfortunately the latter is not saying much - I had a bug there and
> it was not found by the regression tests. Perhaps I should take a
> stab and add at least some more...
 
Sounds like a good idea.  The one thing to avoid is anything with a
long enough run time to annoy those that run it many times a day.
 

>> It is in context format, applies cleanly, and passes "make check".
>> Next I read through the code, and have the same question that
>> Andres posed 12 days ago. His patch massively reduces the cost of
>> the parser recursively calling itself for some cases, and it seems
>> like the least invasive way to modify the parser to solve this
>> performance problem; but it does beg the question of why a state
>> machine like this should recursively call itself when it hits
>> certain states.
> I was wondering about that as well. I am not completely sure but to
> me it looks like its just done to reduce the amount of rules and
> states.
 
I'm assuming that's the reason, but didn't dig deep enough to be sure.
I suspect to be really sure, I'd have to set it up without the
recursion and see what breaks.  I can't imagine it would be anything
which couldn't be fixed by adding enough states; but perhaps they ran
into something where these types would require so many new states that
the recursion seemed like the lesser of evils.
 
> I have to say that that code is not exactly clear and well
> documented...
 
Yeah.  I was happy with the level of documentation that you added with
your new code, but what was there before is mighty thin.  If you
gleaned enough information while working on it to feel comfortable
adding documentation anywhere else, that would be a good thing.
 
So far the only vote is to proceed with the mitigation, which was my
preference, and apparently yours -- so I guess we're at 3 to 0 in
favor of that.  I'll mark the patch as "Waiting on Author" so you can
add any comments and regression tests you feel are appropriate.
 
By the way, I found one typo in the comments -- it should by useful,
not usefull.
 
I liked what I saw so far, but need to spend more time desk-checking
for correctness, testing to confirm that it doesn't change results,
and confirming the performance improvement.
 
-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: tsearch parser inefficiency if text includes urls or emails - new version

by Andres Freund :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote:

> Andres Freund  wrote:
>
> On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote:
> >> It is in context format, applies cleanly, and passes "make check".
> >
> > Unfortunately the latter is not saying much - I had a bug there and
> > it was not found by the regression tests. Perhaps I should take a
> > stab and add at least some more...
>
> Sounds like a good idea.  The one thing to avoid is anything with a
> long enough run time to annoy those that run it many times a day.
Hm. There actually are tests excercising the part where I had a bug...
Strange.
It was a bug involving uninitialized data so probably the regression tests
where just "lucky".

> >> It is in context format, applies cleanly, and passes "make check".
> >> Next I read through the code, and have the same question that
> >> Andres posed 12 days ago. His patch massively reduces the cost of
> >> the parser recursively calling itself for some cases, and it seems
> >> like the least invasive way to modify the parser to solve this
> >> performance problem; but it does beg the question of why a state
> >> machine like this should recursively call itself when it hits
> >> certain states.
> > I was wondering about that as well. I am not completely sure but to
> > me it looks like its just done to reduce the amount of rules and
> > states.
> I'm assuming that's the reason, but didn't dig deep enough to be sure.
> I suspect to be really sure, I'd have to set it up without the
> recursion and see what breaks.  I can't imagine it would be anything
> which couldn't be fixed by adding enough states; but perhaps they ran
> into something where these types would require so many new states that
> the recursion seemed like the lesser of evils.
This is similar to my understanding...

> > I have to say that that code is not exactly clear and well
> > documented...
> Yeah.  I was happy with the level of documentation that you added with
> your new code, but what was there before is mighty thin.  If you
> gleaned enough information while working on it to feel comfortable
> adding documentation anywhere else, that would be a good thing.
It definitely would be a good thing. But that would definitely be seperate
patch. But I fear my current leel of knowledge is sufficient and also I am not
sure if I can make myself interested enough in that part.

> So far the only vote is to proceed with the mitigation, which was my
> preference, and apparently yours -- so I guess we're at 3 to 0 in
> favor of that.  I'll mark the patch as "Waiting on Author" so you can
> add any comments and regression tests you feel are appropriate.
>
> By the way, I found one typo in the comments -- it should by useful,
> not usefull.
Ok, will update.

> I liked what I saw so far, but need to spend more time desk-checking
> for correctness, testing to confirm that it doesn't change results,
> and confirming the performance improvement.
Thanks again for your reviewing!


Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: tsearch parser inefficiency if text includes urls or emails - new version

by Kevin Grittner :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andres Freund <andres@...> wrote:
> On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote:
>> Andres Freund  wrote:
 
>> > I had a bug there and it was not found by the regression tests.
>> > Perhaps I should take a stab and add at least some more...
>>
>> Sounds like a good idea.
 
> Hm. There actually are tests excercising the part where I had a
> bug...   Strange.  It was a bug involving uninitialized data so
> probably the regression tests where just "lucky".
 
OK.  I won't be looking for extra tests.
 

>> > I have to say that that code is not exactly clear and well
>> > documented...
>> Yeah.  I was happy with the level of documentation that you added
>> with your new code, but what was there before is mighty thin.  If
>> you gleaned enough information while working on it to feel
>> comfortable adding documentation anywhere else, that would be a
>> good thing.
> It definitely would be a good thing. But that would definitely be
> seperate patch. But I fear my current leel of knowledge is
> sufficient and also I am not sure if I can make myself interested
> enough in that part.
 
Fair enough.  I won't be looking for new comments for the old code.
 
>> By the way, I found one typo in the comments -- it should by
>> useful, not usefull.
> Ok, will update.
 
Given how trivial that is, I'm putting this back in "Needs Review"
status, and resuming my review work.  Barring surprises, I should wrap
this up whenever I can free up a two or three hours.
 
-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@...)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers