[approved] [sr #105905] Occasionally, produces a huge xml output with contents: a rel="&gt;&lt;

by gry-3


URL:
  <http://savannah.nongnu.org/support/?105905>

                 Summary: Occasionally, produces a huge xml output with
contents: a rel="&gt;&lt;
                 Project: MHonArc
            Submitted by: mgirod
            Submitted on: Tuesday 06/19/2007 at 16:01
                Category: None
                Priority: 5 - Normal
                Severity: 3 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email:
             Open/Closed: Open
         Discussion Lock: Any
        Operating System: GNU/Linux

    _______________________________________________________

Details:

Hello,

I took over the maintenance of our internal web site.
There we use mhonarc to archive mailing lists, and generate rss feeds.
We use it from a python wrapper, named mailapp, which I believe was written
in-house.

Occasionally (Feb 21, June 18), one process starts taking up all the host
resources (80% of the memory, significant CPU share).
The number of mailapp processes rises; my understanding is that the topmost
one is not releasing a lock, and new ones just keep piling up.

I can then find two files of huge sizes (20M):
- rawmsgs.txt
- rss20.xml

Renaming them away, and killing the processes solves the problem
temporarily.

Looking at the contents of the xml file, I found a repeated pattern beyond
the first 100k:

  a rel="&gt;&lt;

eng-artix-merge> head -c 101100 rss20.xml.away | tail -c 100; echo
 -0,0 +1,99 @@&lt;br /&gt;+&lt;a rel="&gt;&lt;a rel="&gt;&lt;a rel="&gt;&lt;a
rel="&gt;&lt;a rel="&g
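The manual head/tail probing above can also be scripted. What follows is a minimal sketch, not part of the original report: it streams a file and returns the byte offset at which a given byte pattern first repeats back to back. The function name `first_run_offset`, its parameters, and the choice of pattern are assumptions made for illustration.

```python
def first_run_offset(path, pattern, min_repeats=3, chunk_size=1 << 20):
    """Return the byte offset where `pattern` first occurs `min_repeats`
    times in a row, or -1 if it never does. Reads the file in chunks so
    that a very large file does not have to fit in memory."""
    if not pattern or min_repeats < 1:
        raise ValueError("pattern must be non-empty and min_repeats >= 1")
    needle = pattern * min_repeats
    overlap = len(needle) - 1   # bytes carried over so matches spanning chunks are seen
    offset = 0                  # file offset of the first byte currently in buf
    buf = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return -1
            buf += chunk
            pos = buf.find(needle)
            if pos != -1:
                return offset + pos
            # Keep only the tail that could still be the start of a match.
            keep = buf[-overlap:] if overlap else b""
            offset += len(buf) - len(keep)
            buf = keep
```

For example, `first_run_offset("rss20.xml.away", b'&lt;a rel="&gt;')` would report where the repeated block begins, which may help locate the feed item that precedes it.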

I am not sure which is the cause and which is the effect.
I couldn't find an existing report which would clearly describe the same
problem.
Any insights are welcome.

Marc




    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/support/?105905>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/


Follow-up Comment #2, sr #105905 (project mhonarc):

It's hard to provide any insight w/o knowing how mhonarc
is actually invoked and what resource configurations are
being applied.

Knowing the version of mhonarc in use may help also.

Some resource settings can cause a performance hit, and
there have been issues in the past where quirks in perl's
regex engine have been exposed, causing problems (like
infinite loops or consuming memory until things crash).

Your comments imply that archive updates are probably
not done in the most resource efficient manner.  If
things are configured to update archives right when messages
arrive, this is generally bad since it does not scale well,
and can lead to bottlenecks as mhonarc processes can queue
up waiting for a lock to release.

I favor a batch model where new messages are processed
on a periodic basis (maybe via cron).
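One way to picture the batch model: delivery only appends each message to a spool mailbox, and a periodic job hands the accumulated spool to a single mhonarc run. The sketch below is a hypothetical illustration, not mharc's code and not the existing wrapper. The function names, the `.work` suffix, the default mhonarc path, and the `run` parameter are assumptions; it relies on Python's standard `mailbox` module and on mhonarc accepting a mailbox file argument together with -add.

```python
import mailbox
import os
import subprocess


def spool_message(spool_path, raw_message):
    """Append one incoming message (bytes) to the spool mbox.
    Meant to be called once per delivery; it does no archiving itself."""
    box = mailbox.mbox(spool_path)
    box.lock()
    try:
        box.add(raw_message)
        box.flush()
    finally:
        box.unlock()
        box.close()


def process_batch(spool_path, outdir, rcfile, listname,
                  mhonarc="/usr/local/bin/mhonarc", run=subprocess.call):
    """Run mhonarc once over everything spooled so far (e.g. from cron).
    Returns the exit status, or None if there was nothing to process."""
    work_path = spool_path + ".work"
    if not os.path.exists(work_path):
        # No leftover from a failed run: take over the current spool.
        if not os.path.exists(spool_path) or os.path.getsize(spool_path) == 0:
            return None
        box = mailbox.mbox(spool_path)
        box.lock()
        try:
            os.rename(spool_path, work_path)
        finally:
            box.unlock()
            box.close()
    cmd = [mhonarc, "-quiet", "-add", "-outdir", outdir, "-rcfile", rcfile,
           "-definevar", "MAILLISTNAME=%s" % listname, work_path]
    status = run(cmd)
    if status == 0:
        os.remove(work_path)  # keep the work file on failure for a retry
    return status
```

With this shape only one mhonarc process runs per archive at a time, however many messages arrive. The sketch does not re-check the spool path after acquiring the lock, so a delivery racing with the rename could still append to the work file; a production version would need to close that gap.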

If you can, see if you can isolate which archive and/or
messages cause the problem.  This way one can try to
see if the problem can be replicated outside of your
environment.

    _______________________________________________________


Follow-up Comment #3, sr #105905 (project mhonarc):

Thanks for your feedback and comments!
I will try to answer your questions (1-3), and then add an update (4).

1. mhonarc is indeed invoked as you guessed, on reception of each and every
mail:

~> grep mailapp /etc/aliases
eng-mailapp: "| /usr/bin/python /x1/eng-mailapp/bin/mailapp"
~> grep mhonarc /x1/eng-mailapp/bin/mailapp
mhonarc = '/usr/local/bin/mhonarc'
mhonarc_cmd = '%s -quiet -outdir %s -add -rcfile /x1/eng-mailapp/conf/mhonarc.mrc -definevar MAILLISTNAME=%s'
            cmd = mhonarc_cmd % (mhonarc, dir, elist)
                log.exception('problem with mhonarc child process: %s', str(e))
                    log.error('%s exited with status %d', mhonarc, rt)
                        log.error('output from %s was:', mhonarc)
~> wc -l /x1/eng-mailapp/conf/mhonarc.mrc
483 /x1/eng-mailapp/conf/mhonarc.mrc

I attached this mrc file.

Of course, I am open to suggestions to switch to another configuration.
As far as I can tell, the mailapp script dispatches the incoming mails into
distinct mailboxes, and runs mhonarc on the relevant one.

2.
~> /usr/local/bin/mhonarc -v
  MHonArc v2.6.16 (Perl 5.008005 linux)
  Copyright (C) 1995-2005  Earl Hood, mhonarc@...
  MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only
  under the terms of the GNU General Public License, which may be found in
  the MHonArc distribution.

3. Isolate an archive... all the errors detected were for the same one
(eng-artix-merge, which I mentioned), which also has the highest load.
However, I could not isolate a message or a pattern. This was the intent of
my convoluted 'head' commands, but I found different patterns in different
cases.

4. We may have a general resource problem on this host. At least one other
functionality (generation of rss feeds from wiki pages) is also affected.
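Independently of how often mhonarc runs, the wrapper could bound each run so that a stuck child does not hold the archive lock indefinitely. The following is a minimal sketch, assuming Python 3's `subprocess.run`; `build_mhonarc_argv` and `run_with_timeout` are hypothetical names, the option list simply mirrors the `mhonarc_cmd` string from item 1, and the 300-second default is an arbitrary placeholder.

```python
import subprocess


def build_mhonarc_argv(mhonarc, outdir, rcfile, listname):
    """Build the argument list instead of a shell string, so that paths
    and list names never go through shell quoting."""
    return [mhonarc, "-quiet", "-outdir", outdir, "-add",
            "-rcfile", rcfile,
            "-definevar", "MAILLISTNAME=%s" % listname]


def run_with_timeout(argv, message_bytes, timeout_seconds=300):
    """Feed one message to the child on stdin and wait at most
    timeout_seconds. Returns (exit_status, combined_output); the status
    is None if the child had to be killed for running too long."""
    try:
        proc = subprocess.run(argv, input=message_bytes,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return None, b""
    return proc.returncode, proc.stdout
```

A timeout only limits the damage of a runaway process; it does not explain why the process ran away in the first place.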

Thanks again.

(file #13241)
    _______________________________________________________

Additional Item Attachment:



    _______________________________________________________


Follow-up Comment #4, sr #105905 (project mhonarc):

Actually, I forgot to mention that right now, the generation of rss feeds
(and only it) for mail archives is completely stopped, including in the
eng-artix-merge archive, and has been since June 20.
[I was away for one week in the meantime.]
My last move had been to rename away the huge xml files, and to kill the
processes with an open file descriptor on them.
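On GNU/Linux, the processes holding a given file open can be listed by walking /proc, which is roughly the information tools such as lsof or fuser report. The following is a minimal sketch, assuming a Linux /proc layout and enough permissions to read other users' fd directories; `pids_with_open_file` is a hypothetical helper, not part of mailapp or mhonarc.

```python
import os


def pids_with_open_file(target_path, proc_root="/proc"):
    """Return the sorted PIDs that have target_path open, as far as the
    caller is allowed to see. Linux-specific: relies on /proc/<pid>/fd."""
    target = os.path.realpath(target_path)
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if link == target:
                holders.append(int(entry))
                break
    return sorted(holders)
```

Because each fd link follows the file across a rename, the lookup should be done with the file's current name (here, the renamed-away one).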


    _______________________________________________________


Follow-up Comment #5, sr #105905 (project mhonarc):

Looking at the mrc file you provided, I saw nothing in
it that indicates mhonarc is responsible for creating
the RSS XML file that seems to be causing you problems.

Are you sure that mhonarc is creating the XML file or
could it be the wrapper/caller script instead?

Is the wrapper calling mhonarc with -otherindexes?

As for alternative configurations, you can see how
mharc (http://www.mhonarc.org/mharc/) is designed
to process mail.  The Install doc provides info on
how and why mharc is set up the way it is.

Note, I do not know if mharc is apropos for what you
have, but the design model of it may give you ideas on how
to change things on your system to make it more scalable
and robust.

    _______________________________________________________


Follow-up Comment #6, sr #105905 (project mhonarc):

Thanks for your comments.
I now understand that you are right, and that the production of rss feeds
must be done otherwise.
My wrapper does not invoke mhonarc with -otherindexes.
I'll consider mharc.
Thanks again!
Marc
