|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 | Next > |
|
|
[scala] Scala Actors - no speedup?Hi,
I tried to implement Tim Bray's Wide Finder proposal (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in Scala, using Actors. The code is here: http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/ Basically, it's about parsing a log file using regular expressions. In my code, I have one coordinator who reads the log file and sends work chunks to several analyzers, and the combines their work again. As I write there, I don't see any speedup when running the code using Actors in comparison with the serial version. Any ideas? Am I doing something wrong? Is there any way to specify the number of threads that should be used for Actor execution? Regards, Martin |
|
|
Re: [scala] Scala Actors - no speedup?How many cores in your machine?
Martin Probst wrote: > Hi, > > I tried to implement Tim Bray's Wide Finder proposal > (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in Scala, > using Actors. The code is here: > http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/ > > Basically, it's about parsing a log file using regular expressions. In my > code, I have one coordinator who reads the log file and sends work chunks > to several analyzers, and the combines their work again. > > As I write there, I don't see any speedup when running the code using > Actors in comparison with the serial version. Any ideas? Am I doing > something wrong? > > Is there any way to specify the number of threads that should be used for > Actor execution? > > Regards, > Martin > > |
|
|
Re: [scala] Scala Actors - no speedup?I'll offer one observation and a speculation,
1) java.util.regex.Matcher is not thread-safe, so you'll likely run into problems if you use it in different actors unless you make it actor-specific 2) when sending a message to an actor, the processing happens in the same thread unless/until the actor blocks, in which case the thread is released and the caller/callee are disjoined. (My understanding is this is a [default] optimization to balance fine/coarse-grained processing to avoid costly context-switches) I'm guessing everything is happening in the same thread because of #2, and it's probably why you haven't witnessed #1 yet. I'm not sure exactly how to go about forcing a different thread to be used for the processing. alex On 9/25/07, Martin Probst <mail@...> wrote: Hi, |
|
|
Re: [scala] Scala Actors - no speedup?Sorry... should have read your blog post...
The first problem is your hardware... the hard drive in the MacBook Pro is a dog. The second problem is your OS... the Mach Kernel is notoriously bad for doing IO... Linus has an occasional rant about this. My MacBook Pro has something like 1/2 the disk read performance of my ThinkPad T41 running Ubuntu (same disk rotation speed and the TPad has an IDE drive, not SATA.) My guess is that you're spending nearly a whole CPU doing disk IO... you might try a larger file and see how much of your time is spent in System vs. User. David Pollak wrote: > How many cores in your machine? > > Martin Probst wrote: >> Hi, >> >> I tried to implement Tim Bray's Wide Finder proposal >> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in >> Scala, >> using Actors. The code is here: >> http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/ >> >> Basically, it's about parsing a log file using regular expressions. >> In my >> code, I have one coordinator who reads the log file and sends work >> chunks >> to several analyzers, and the combines their work again. >> >> As I write there, I don't see any speedup when running the code using >> Actors in comparison with the serial version. Any ideas? Am I doing >> something wrong? >> >> Is there any way to specify the number of threads that should be used >> for >> Actor execution? >> >> Regards, >> Martin >> >> > |
|
|
Re: [scala] Scala Actors - no speedup?Thanks for the responses.
So the machine is indeed a dual core MacBook Pro 2.0 GHz, forgot to mention that. When I run it using 'time' on ~250 MB, it says that I spend nearly no time in sys, all just user. Looking at the actual CPU utilization (using MenuMeters) doesn't really tell me anything. In the (supposedly) single threaded case, one CPU gets some limited use, and one is pegged. In the Actors case, both CPUs appear to be busy - but I don't really trust that tool, to be honest. I hoped someone could tell me how/where to find out how many threads are being used. I'm using 'react' and not 'receive', and 'react' is - according to the actors tutorial - not forcing the actor to use a single thread. So maybe they are using none at all? On the other hand, increasing the number of actors beyond my number of cores, e.g. to 32, doesn't harm performance significantly, which might mean it's not working at all or the actor-based concurrency has very limited overhead, which is of course good :-) Regards, Martin On Tue, 25 Sep 2007 09:05:11 -0700, David Pollak <dpp@...> wrote: > Sorry... should have read your blog post... > > The first problem is your hardware... the hard drive in the MacBook Pro > is a dog. > > The second problem is your OS... the Mach Kernel is notoriously bad for > doing IO... Linus has an occasional rant about this. My MacBook Pro has > something like 1/2 the disk read performance of my ThinkPad T41 running > Ubuntu (same disk rotation speed and the TPad has an IDE drive, not SATA.) > > My guess is that you're spending nearly a whole CPU doing disk IO... you > might try a larger file and see how much of your time is spent in System > vs. User. > > David Pollak wrote: >> How many cores in your machine? >> >> Martin Probst wrote: >>> Hi, >>> >>> I tried to implement Tim Bray's Wide Finder proposal >>> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in >>> Scala, >>> using Actors. The code is here: >>> http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/ >>> >>> Basically, it's about parsing a log file using regular expressions. >>> In my >>> code, I have one coordinator who reads the log file and sends work >>> chunks >>> to several analyzers, and the combines their work again. >>> >>> As I write there, I don't see any speedup when running the code using >>> Actors in comparison with the serial version. Any ideas? Am I doing >>> something wrong? >>> >>> Is there any way to specify the number of threads that should be used >>> for >>> Actor execution? >>> >>> Regards, >>> Martin >>> >>> >> |
|
|
Re: [scala] Scala Actors - no speedup?> 1) java.util.regex.Matcher is not thread-safe, so you'll likely run into
> problems if you use it in different actors unless you make it > actor-specific I'm aware of that, I only share the pattern between Actors. Also, you have to create a new matcher for each line anyways. > 2) when sending a message to an actor, the processing happens in the same > thread unless/until the actor blocks, in which case the thread is released > and the caller/callee are disjoined. (My understanding is this is a > [default] optimization to balance fine/coarse-grained processing to avoid > costly context-switches) What exactly is blocking in this context? Waiting for another message send? As said, my code uses a coordinator that dispatches work to multiple actors. That might be executed in a single thread, as it dispatches the work one after the other, and actors simply run through the work, so there is no complex message sending that might block actors. Martin |
|
|
Re: [scala] Scala Actors - no speedup?Martin,
On Tuesday 25 September 2007 09:23, Martin Probst wrote: > ... > > ... Also, you have to create a new matcher for each line anyways. Not true. Check out the JavaDoc API for Matcher.reset(java.lang.CharSequence). > ... > > Martin Randall Schulz |
|
|
Re: [scala] Scala Actors - no speedup?As a sidenote on the topic, is there any skeleton implementations of dispatcher-based actor code?
I'm doing some testing and could surely use some demo-code. (No re-inventing of the round thing used for transportation of stuff) Best regards /Viktor On 9/25/07, Martin Probst <mail@...> wrote: > 1) java.util.regex.Matcher is not thread-safe, so you'll likely run into |
|
|
Re: [scala] Scala Actors - no speedup?Actually, both of my comments where wrong. I had misread how you used Pattern and it turns out actors send messages via a thread pool so sending is non-blocking by default (I just tested this).
Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset. So I'm now leaning towards blaming the JIT compiler for the anomaly. Microbenchmarks and all... alex On 9/25/07, Martin Probst <mail@...> wrote: > 1) java.util.regex.Matcher is not thread-safe, so you'll likely run into |
|
|
Re: [scala] Scala Actors - no speedup?Viktor Klang wrote:
> As a sidenote on the topic, is there any skeleton implementations of > dispatcher-based actor code? Not that I am aware of. This seems to be a common pattern, though. For example, we have a parallel test runner to test for regressions in the Scala compiler that uses a similar pattern. I think an actor-based implementation of MapReduce (http://labs.google.com/papers/mapreduce.html), or one that interfaces nicely with actors would be a very useful thing. Philipp |
|
|
Re: [scala] Scala Actors - no speedup?On 9/25/07, Alex Boisvert <boisvert@...> wrote:
> Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset. So > I'm now leaning towards blaming the JIT compiler for the anomaly. > Microbenchmarks and all... Umm, jumping the gun ? I personally would look at other places first exhaustively. Same JIT compiler is being used in large number of parallel applications world over, if your statement is really representative, these guys won't be bothering with multithreading & parallel processing in Java/JVM. I too shall try to reimplement it, and let you know about the results. - Akhilesh |
|
|
Re: [scala] Scala Actors - no speedup?From Tim's blog comment:
http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder In short, unless you bork the regex to make it cpu bound, you ain't gonna see much improvement in Java/Scala -----------------snip------------------ From: CB (Sep 25 2007, at 04:53) In Java processing 1 million lines in one thread takes about 4 seconds. Just the IO to read in those lines takes about 3 seconds. This suggests that this problem is not worth parallelizing. I tried the simplest multithreaded approach in Java. The main thread sticks lines on a queue and a another thread pulls them off and the regexp match etc. The performance of this approach (on a 4 cpu box) is almost twice as slow as the single threaded solution. By collecting batches of lines and putting them on the queue the performance gets to around the same level as the single thread. This is not because Java's concurrency support is bad. The util concurrent blocking queues are very efficient (probably as efficient as the message queues for Erlang processes). The amount of work done for each line is just so little that the overhead of locking and queue operations are just not worth it. The only sensible concurrency that you could get for this problem is by storing different day's logs on different disks and spawning a thread/task/process for each log file, finally combining the maps at the end. --------------snip--------------- On 9/26/07, Akhilesh Mritunjai <mritun@...> wrote: > On 9/25/07, Alex Boisvert <boisvert@...> wrote: > > Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset. So > > I'm now leaning towards blaming the JIT compiler for the anomaly. > > Microbenchmarks and all... > > Umm, jumping the gun ? > > I personally would look at other places first exhaustively. Same JIT > compiler is being used in large number of parallel applications world > over, if your statement is really representative, these guys won't be > bothering with multithreading & parallel processing in Java/JVM. > > I too shall try to reimplement it, and let you know about the results. > > - Akhilesh > |
|
|
Re: [scala] Scala Actors - no speedup?On Tuesday 25 September 2007 13:34, Akhilesh Mritunjai wrote:
> From Tim's blog comment: > http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder > > ... > > -----------------snip------------------ > From: CB (Sep 25 2007, at 04:53) > > ... The amount of work done for > each line is just so little that the > overhead of locking and queue > operations are just not worth it. > > ... > --------------snip--------------- Tim's comments make me wonder if the "biased locking" option of receent Sun JVM's might be advantageous for this particular algorithm. See section 4.2.5 of <http://java.sun.com/performance/reference/whitepapers/tuning.html> and <http://java.sun.com/javase/technologies/hotspot/publications/biasedlocking_oopsla2006_wp.pdf> Randall Schulz |
|
|
Re: [scala] Scala Actors - no speedup?On 9/26/07, Randall R Schulz <rschulz@...> wrote:
> > Tim's comments make me wonder if the "biased locking" option of receent > Sun JVM's might be advantageous for this particular algorithm. > Those are not Tim's observations, rather a reader's comment on his blog. I've just performed another micro-benchmark to estimate the maximum theroretical regexp performance of Java 6 regexp library for his case. Result: time: 29212ms. 1003 MB/s, 100000000 matches JVM: Sun Java 6 update 2 Platform: Windows XP Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache So *even* if the whole thing is done in a single thread, the performance is around 1 gig/s... so there is absolutely no point parallelizing this thing. Even my 3 yr old processor (let alone a more modern Core2Duo@2GHz) would easily saturate ANY potential data stream you can supply... a typical HDD doesn't go over 80 MB/s. Storm in a tea-cup :-P - Akhilesh |
|
|
Re: [scala] Scala Actors - no speedup?On 9/25/07, Akhilesh Mritunjai <mritun@...> wrote: On 9/26/07, Randall R Schulz <rschulz@...> wrote: Not to be a pest about it, but the Dothan revision of the Banias core only sported 2MBs of L2 cache. So *even* if the whole thing is done in a single thread, the |
|
|
Re: [scala] Scala Actors - no speedup?On Tuesday 25 September 2007 14:13, Akhilesh Mritunjai wrote:
> ... > > So *even* if the whole thing is done in a single thread, the > performance is around 1 gig/s... so there is absolutely no point > parallelizing this thing. Even my 3 yr old processor (let alone a > more modern Core2Duo@2GHz) would easily saturate ANY potential data > stream you can supply... a typical HDD doesn't go over 80 MB/s. "Typical" personal / desktop computers, maybe. But servers today often use 15,000 RPM drives connected by Ultra-320 SCSI busses. The last workstation I put together for my office uses one 15,000 RPM Ultra-320 drive and on 10,000 RPM SATA drive. Furthermore, the first flash-RAM-based mass storage devices (as in disk replacements) are now becoming available. I don't know what their performance characteristics are, but it's time for software engineers to start preparing for computing hardware that is, mercifully, free of rotating electromechanical mass storage. Hallelujah! > ... > - Akhilesh Randall Schulz |
|
|
Re: [scala] Scala Actors - no speedup?> I've just performed another micro-benchmark to estimate the maximum
> theroretical regexp performance of Java 6 regexp library for his case. > > Result: time: 29212ms. 1003 MB/s, 100000000 matches > > JVM: Sun Java 6 update 2 > Platform: Windows XP > Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache > > So *even* if the whole thing is done in a single thread, the > performance is around 1 gig/s... so there is absolutely no point > parallelizing this thing. Even my 3 yr old processor (let alone a more > modern Core2Duo@2GHz) would easily saturate ANY potential data stream > you can supply... a typical HDD doesn't go over 80 MB/s. Interesting. But on my mac, judging from the CPU monitor, the program is indeed CPU bound, or at least it fills up the CPU. A trivial program just reading in the file line-by-line take about 12 seconds. With regular expressions, in a single threaded case, it's about 18 seconds. So the regular expression matching does have an impact on the programs execution time. My use case are 12000000 lines, of which 10% match the regular expression, which is a 270 MB file. Of course, this is with "warm" buffers, i.e. after reading the file several times. Regards, Martin |
|
|
Re: [scala] Scala Actors - no speedup?On 9/25/07, Akhilesh Mritunjai <mritun@...> wrote:
On 9/25/07, Alex Boisvert <boisvert@...> wrote: I wasn't saying the JIT didn't work properly... I meant that it takes different times to warm up on different apps. If you're using Actors, you're loading more classes and you're JIT-ing more classes. In 1.3 second the JIT doesn't really have time to kick-in completely. So it's a micro-benchmark and it's difficult to compare given the startup overhead of the JVM. alex |
|
|
Re: [scala] Scala Actors - no speedup?On 9/26/07, Randall R Schulz <rschulz@...> wrote:
> On Tuesday 25 September 2007 14:13, Akhilesh Mritunjai wrote: > > ... > > > > So *even* if the whole thing is done in a single thread, the > > performance is around 1 gig/s... so there is absolutely no point > > parallelizing this thing. Even my 3 yr old processor (let alone a > > more modern Core2Duo@2GHz) would easily saturate ANY potential data > > stream you can supply... a typical HDD doesn't go over 80 MB/s. > > "Typical" personal / desktop computers, maybe. > > But servers today often use 15,000 RPM drives connected by Ultra-320 > SCSI busses. Their typical performance is around 100 MB/s. > Furthermore, the first flash-RAM-based mass storage devices (as in disk > replacements) are now becoming available. I don't know what their > performance characteristics are, but it's time for software engineers Their typical performance is even less than rotating magnetic disks and max out around 80 MB/s. Both of them contribute nothing to streaming data bandwidth, both only lower the disk lantencies (2ms typical for 15k rpm disks, few uS for flash). For more bandwidth, massively parallel setups are used. Sun's x4500 (aka Thumper) can sustain around 2GB/s from its 48 disks. I haven't seen anything (commerically available) that can sustain more. Even at that level, a "single" simple awk script can saturate the IO for this purpose because- 1. AWK regexp is the fastest out there 2. Thumper has got better processor (2X Opteron dual core). So unfortunately, IO bottlenecks ain't going anywhere soon. - Akhilesh |
|
|
Re: [scala] Scala Actors - no speedup?Hi Martin
On 9/26/07, Martin Probst <mail@...> wrote: > > I've just performed another micro-benchmark to estimate the maximum > > theroretical regexp performance of Java 6 regexp library for his case. > > > > Result: time: 29212ms. 1003 MB/s, 100000000 matches > > > > JVM: Sun Java 6 update 2 > > Platform: Windows XP > > Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache > > > > So *even* if the whole thing is done in a single thread, the > > performance is around 1 gig/s... so there is absolutely no point > > parallelizing this thing. Even my 3 yr old processor (let alone a more > > modern Core2Duo@2GHz) would easily saturate ANY potential data stream > > you can supply... a typical HDD doesn't go over 80 MB/s. > > Interesting. But on my mac, judging from the CPU monitor, the program > is indeed CPU bound, or at least it fills up the CPU. Can you just whip up a sample code in Scala that just reads the file (and throws away) ? I/O, specially in higher level languages, does consume CPU cycles because fetching the data itself requires CPU cycles. That is why Sun x4500 (Thumper) has 2X dual core opteron CPUs in there even though it's supposed to be a file server... sustaining 2GB/s (max it is capable of) requires serious CPU horse power. Your observation of introducing the regexp processing increasing the time can also be explained with this as regexp is basically sharing CPU with IO operations... which themselves require CPU. The thing to really worry about in this benchmark is that IO in Scala *can* potentially have so much overhead !! This is something I want to explore further. - Akhilesh |
| < Prev | 1 - 2 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |