profiling results

In the never-ending quest for speed, I read up on how to profile Perl programs and ran a profile on mhonarc. Here are the results, and what to read if you want to try the same thing.

http://www.perl.com/pub/a/2004/06/25/profiling.html

Does this look reasonable to people? Anything obviously weird?

Jeff

Total Elapsed Time = 7.334524 Seconds
  User+System Time = 3.794524 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c  Name
 20.3   0.773  1.455       7   0.1104 0.2079  mhonarc::sort_messages
 18.6   0.707  0.707  446811   0.0000 0.0000  mhonarc::get_time_from_index
 14.7   0.558  0.558    4805   0.0001 0.0001  MHonArc::RFC822::tokenise
 14.4   0.548  2.264   13800   0.0000 0.0002  mhonarc::replace_li_var
 5.09   0.193  0.193   13037   0.0000 0.0000  mhonarc::compute_msg_pos
 4.77   0.181  0.561    9538   0.0000 0.0001  MHonArc::UTF8::Encode::clip
 4.48   0.170  0.319       1   0.1700 0.3193  mhonarc::get_resources
 3.93   0.149  0.149   42008   0.0000 0.0000  mhonarc::escape_str
 3.85   0.146  0.295    3020   0.0000 0.0001  mhonarc::print_var
 3.45   0.131  0.283   19077   0.0000 0.0000  Encode::decode
 3.00   0.114  1.971       3   0.0380 0.6570  mhonarc::write_main_index
 2.77   0.105  0.663    4757   0.0000 0.0001  mhonarc::extract_email_name
 2.66   0.101  0.177   28617   0.0000 0.0000  Encode::find_encoding
 2.00   0.076  0.076    7782   0.0000 0.0000  mhonarc::fmt_msgnum
 2.00   0.076  0.076   28617   0.0000 0.0000  Encode::getEncoding
Re: profiling results

(I'm back from out-of-town and catching up on email)

On April 12, 2007 at 20:34, "Jeff Breidenbach" wrote:

> Does this look reasonable to people? Anything obviously
> weird?

> Total Elapsed Time = 7.334524 Seconds
> User+System Time = 3.794524 Seconds
> Exclusive Times
> %Time ExclSec CumulS #Calls sec/call Csec/c Name
> 20.3 0.773 1.455 7 0.1104 0.2079 mhonarc::sort_messages

Sorting does not surprise me. MHonArc does not keep a persistent sorted data structure, so it re-sorts every time new messages are added (under the assumption that messages may come in in arbitrary order). This can definitely be painful if one updates an archive on the fly rather than using a queuing-batch model. In the latter, multiple messages may be added in a single invocation, avoiding a re-sort for each message added.

Do you invoke mhonarc for each new message for a list, or do you queue up messages for a given list (over a specified period) before invoking mhonarc for the list?

Note that sorting includes thread sorting, which is the most complicated part. Some speed increase may be possible by disabling SUBJECTTHREADS (this is mentioned in the Performance Tips doc). However, disabling SUBJECTTHREADS may have a usability impact for messages that fail to define the proper reference headers.

For large-scale usage, a (robust) persistent data structure is needed. However, such a structure would require a redesign of mhonarc internals.

> 18.6 0.707 0.707 446811 0.0000 0.0000 mhonarc::get_time_from_index

This is due to the Perl 4 legacy code base. The unique index for each message also contains the date-time stamp applicable to the message. It may be possible to add a new hash that just maintains the date-time information, to avoid the split() operation each time get_time_from_index is invoked. This will cause an increase in the database size (and in memory size), but it may be negligible in the grand scheme of things.

I think when mhonarc was first written (and it was not called mhonarc then), I favored reducing the number of hashes used over performance gains (performance was not a real issue, since I did not foresee mhonarc being used at such a large scale).

> 14.7 0.558 0.558 4805 0.0001 0.0001 MHonArc::RFC822::tokenise

This code is non-trivial since it does full RFC-822 parsing. Older versions of mhonarc used a simpler parsing routine, but a more robust routine was required as mhonarc evolved (and to address bugs in email name and address extraction).

> 14.4 0.548 2.264 13800 0.0000 0.0002 mhonarc::replace_li_var

Minimizing variable usage in resource files is the main way to reduce the calls to this routine. However, resource file maintenance concerns may trump any performance gain.

> 5.09 0.193 0.193 13037 0.0000 0.0000 mhonarc::compute_msg_pos

This is part of resource variable resolution. See <http://www.mhonarc.org/MHonArc/doc/guides/performance.html#mesg_spec> on how to minimize the performance impact of this routine.

> 4.77 0.181 0.561 9538 0.0000 0.0001 MHonArc::UTF8::Encode::clip

This is actually more efficient than using the default CHARSETCONVERTERS model, i.e. encoding everything to UTF-8 is more efficient (assuming proper resource settings). In MHonArc's default configuration, charset conversion can be very costly when dealing with non-ASCII messages. I discovered this years ago while doing my own profiling tests on MHonArc, after performance complaints were raised when more extensive charset routines were added.

> 4.48 0.170 0.319 1 0.1700 0.3193 mhonarc::get_resources

This loads in the resource file(s).

--ewh
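
To illustrate the get_time_from_index suggestion in the reply above (an extra hash that keeps the date-time per index so split() is not repeated), here is a minimal Perl sketch. It is not mhonarc's actual code: the index layout, the separator character, and the names time_by_split, time_by_hash and %index_time are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Assumed index layout: "<epoch-seconds><SEP><message-number>".
# The separator character and field order are illustrative guesses.
my $SEP = "\034";

my $split_calls = 0;

# Baseline: parse the time out of the index on every call.
sub time_by_split {
    my ($index) = @_;
    $split_calls++;
    my ($time) = split /\Q$SEP\E/, $index, 2;
    return $time;
}

# Alternative: remember the time per index in a side hash (hypothetical name).
my %index_time;
sub time_by_hash {
    my ($index) = @_;
    $index_time{$index} = time_by_split($index)
        unless exists $index_time{$index};
    return $index_time{$index};
}

# Five made-up indexes, sorted newest first through the cached lookup.
my @indexes = map { join $SEP, 1176400000 + $_, $_ } 1 .. 5;
my @newest_first = sort { time_by_hash($b) <=> time_by_hash($a) } @indexes;

printf "%d indexes, %d split() calls\n", scalar @indexes, $split_calls;
```

With the side hash, split() runs once per distinct index no matter how many comparisons the sort makes. The cost is the extra memory (and database size, if the hash is persisted) mentioned in the reply.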
Re: profiling results

> Do you invoke mhonarc for each new message for a list or do you
> queue up messages for a given list (over a specified period) before
> invoking mhonarc for the list?

Neither. The algorithm is to receive mail for a bunch of archives. When mhonarc is invoked, it batches as many messages (for a given archive) as are available. A single loop invokes mhonarc again and again. This approach scales very gracefully because a big message backlog leads to more batch processing.

Wednesday I invoked mhonarc 12352 times, of which 7932 runs were for singleton messages. Average latency from message reception to archival hovered around two hours. I'm interested in reducing this latency, and even a small speed increase for mhonarc would help.

Jeff
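
The "batch whatever is available per archive" loop described above might be sketched in Perl roughly as follows. This is only a guess at the shape of such a driver, not the actual script: the queue contents, batch_by_archive and run_archiver are hypothetical, and run_archiver merely reports the batch where a real driver would invoke mhonarc.

```perl
use strict;
use warnings;

# Hypothetical pending queue: [archive name, message file] pairs in arrival order.
my @pending = (
    [ 'list-a', 'msg1' ],
    [ 'list-b', 'msg2' ],
    [ 'list-a', 'msg3' ],
    [ 'list-a', 'msg4' ],
);

# Group everything currently waiting by archive, keeping first-arrival order.
sub batch_by_archive {
    my @queue = @_;
    my (%batch, @order);
    for my $item (@queue) {
        my ($archive, $file) = @$item;
        push @order, $archive unless exists $batch{$archive};
        push @{ $batch{$archive} }, $file;
    }
    return map { [ $_, $batch{$_} ] } @order;
}

# Placeholder for the real archiver call; here it only reports the batch size.
sub run_archiver {
    my ($archive, $files) = @_;
    printf "%s: %d message(s)\n", $archive, scalar @$files;
    return scalar @$files;
}

# One archiver run per archive that has mail waiting.
my @batches = batch_by_archive(@pending);
run_archiver(@$_) for @batches;
```

With the sample queue this makes two runs (three messages for list-a, one for list-b) instead of four, which is why a backlog leads to fewer, larger invocations.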
Re: profiling results

The call counts look pretty interesting. For example, there are almost five thousand calls to the RFC822 tokenizer when adding a single message to an archive.

Anyway, it sounds like the easiest big programmatic gains might come from get_time_from_index, and possibly from reviewing some of the call counts. Another possible win is to review the resource files and see whether there are unnecessary variables that can be trimmed out. I'll go ahead and do the latter since it is super easy.

Thanks for the thorough analysis.

Jeff