[scala] Scala Actors - no speedup?

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 | Next >

[scala] Scala Actors - no speedup?

by Martin Probst-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

I tried to implement Tim Bray's Wide Finder proposal
(http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in Scala,
using Actors. The code is here:
http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/

Basically, it's about parsing a log file using regular expressions. In my
code, I have one coordinator who reads the log file and sends work chunks
to several analyzers, and the combines their work again.

As I write there, I don't see any speedup when running the code using
Actors in comparison with the serial version. Any ideas? Am I doing
something wrong?

Is there any way to specify the number of threads that should be used for
Actor execution?

Regards,
Martin


Re: [scala] Scala Actors - no speedup?

by David Pollak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

How many cores in your machine?

Martin Probst wrote:

> Hi,
>
> I tried to implement Tim Bray's Wide Finder proposal
> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in Scala,
> using Actors. The code is here:
> http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/
>
> Basically, it's about parsing a log file using regular expressions. In my
> code, I have one coordinator who reads the log file and sends work chunks
> to several analyzers, and the combines their work again.
>
> As I write there, I don't see any speedup when running the code using
> Actors in comparison with the serial version. Any ideas? Am I doing
> something wrong?
>
> Is there any way to specify the number of threads that should be used for
> Actor execution?
>
> Regards,
> Martin
>
>  


Re: [scala] Scala Actors - no speedup?

by Alex Boisvert-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'll offer one observation and a speculation,

1) java.util.regex.Matcher is not thread-safe, so you'll likely run into problems if you use it in different actors unless you make it actor-specific

2) when sending a message to an actor, the processing happens in the same thread unless/until the actor blocks, in which case the thread is released and the caller/callee are disjoined.  (My understanding is this is a [default] optimization to balance fine/coarse-grained processing to avoid costly context-switches)

I'm guessing everything is happening in the same thread because of #2, and it's probably why you haven't witnessed #1 yet.  I'm not sure exactly how to go about forcing a different thread to be used for the processing.

alex


On 9/25/07, Martin Probst <mail@...> wrote:
Hi,

I tried to implement Tim Bray's Wide Finder proposal
(http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in Scala,
using Actors. The code is here:
http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/

Basically, it's about parsing a log file using regular expressions. In my
code, I have one coordinator who reads the log file and sends work chunks
to several analyzers, and the combines their work again.

As I write there, I don't see any speedup when running the code using
Actors in comparison with the serial version. Any ideas? Am I doing
something wrong?

Is there any way to specify the number of threads that should be used for
Actor execution?

Regards,
Martin



Re: [scala] Scala Actors - no speedup?

by David Pollak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Sorry... should have read your blog post...

The first problem is your hardware... the hard drive in the MacBook Pro
is a dog.

The second problem is your OS... the Mach Kernel is notoriously bad for
doing IO... Linus has an occasional rant about this.  My MacBook Pro has
something like 1/2 the disk read performance of my ThinkPad T41 running
Ubuntu (same disk rotation speed and the TPad has an IDE drive, not SATA.)

My guess is that you're spending nearly a whole CPU doing disk IO... you
might try a larger file and see how much of your time is spent in System
vs. User.

David Pollak wrote:

> How many cores in your machine?
>
> Martin Probst wrote:
>> Hi,
>>
>> I tried to implement Tim Bray's Wide Finder proposal
>> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in
>> Scala,
>> using Actors. The code is here:
>> http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/
>>
>> Basically, it's about parsing a log file using regular expressions.
>> In my
>> code, I have one coordinator who reads the log file and sends work
>> chunks
>> to several analyzers, and the combines their work again.
>>
>> As I write there, I don't see any speedup when running the code using
>> Actors in comparison with the serial version. Any ideas? Am I doing
>> something wrong?
>>
>> Is there any way to specify the number of threads that should be used
>> for
>> Actor execution?
>>
>> Regards,
>> Martin
>>
>>  
>


Re: [scala] Scala Actors - no speedup?

by Martin Probst-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks for the responses.

So the machine is indeed a dual core MacBook Pro 2.0 GHz, forgot to mention
that.

When I run it using 'time' on ~250 MB, it says that I spend nearly no time
in sys, all just user. Looking at the actual CPU utilization (using
MenuMeters) doesn't really tell me anything. In the (supposedly) single
threaded case, one CPU gets some limited use, and one is pegged. In the
Actors case, both CPUs appear to be busy - but I don't really trust that
tool, to be honest.

I hoped someone could tell me how/where to find out how many threads are
being used. I'm using 'react' and not 'receive', and 'react' is - according
to the actors tutorial - not forcing the actor to use a single thread. So
maybe they are using none at all?

On the other hand, increasing the number of actors beyond my number of
cores, e.g. to 32, doesn't harm performance significantly, which might mean
it's not working at all or the actor-based concurrency has very limited
overhead, which is of course good :-)

Regards,
Martin

On Tue, 25 Sep 2007 09:05:11 -0700, David Pollak <dpp@...> wrote:
> Sorry... should have read your blog post...
>
> The first problem is your hardware... the hard drive in the MacBook Pro
> is a dog.
>
> The second problem is your OS... the Mach Kernel is notoriously bad for
> doing IO... Linus has an occasional rant about this.  My MacBook Pro has
> something like 1/2 the disk read performance of my ThinkPad T41 running
> Ubuntu (same disk rotation speed and the TPad has an IDE drive, not
SATA.)

>
> My guess is that you're spending nearly a whole CPU doing disk IO... you
> might try a larger file and see how much of your time is spent in System
> vs. User.
>
> David Pollak wrote:
>> How many cores in your machine?
>>
>> Martin Probst wrote:
>>> Hi,
>>>
>>> I tried to implement Tim Bray's Wide Finder proposal
>>> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in
>>> Scala,
>>> using Actors. The code is here:
>>> http://www.martin-probst.com/2007/09/24/wide-finder-in-scala/
>>>
>>> Basically, it's about parsing a log file using regular expressions.
>>> In my
>>> code, I have one coordinator who reads the log file and sends work
>>> chunks
>>> to several analyzers, and the combines their work again.
>>>
>>> As I write there, I don't see any speedup when running the code using
>>> Actors in comparison with the serial version. Any ideas? Am I doing
>>> something wrong?
>>>
>>> Is there any way to specify the number of threads that should be used
>>> for
>>> Actor execution?
>>>
>>> Regards,
>>> Martin
>>>
>>>
>>


Re: [scala] Scala Actors - no speedup?

by Martin Probst-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> 1) java.util.regex.Matcher is not thread-safe, so you'll likely run into
> problems if you use it in different actors unless you make it
> actor-specific

I'm aware of that, I only share the pattern between Actors. Also, you have
to create a new matcher for each line anyways.

> 2) when sending a message to an actor, the processing happens in the same
> thread unless/until the actor blocks, in which case the thread is
released
> and the caller/callee are disjoined.  (My understanding is this is a
> [default] optimization to balance fine/coarse-grained processing to avoid
> costly context-switches)

What exactly is blocking in this context? Waiting for another message send?

As said, my code uses a coordinator that dispatches work to multiple
actors. That might be executed in a single thread, as it dispatches the
work one after the other, and actors simply run through the work, so there
is no complex message sending that might block actors.

Martin


Re: [scala] Scala Actors - no speedup?

by Randall Schulz :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Martin,

On Tuesday 25 September 2007 09:23, Martin Probst wrote:
> ...
>
> ... Also, you have to create a new matcher for each line anyways.

Not true. Check out the JavaDoc API for
Matcher.reset(java.lang.CharSequence).


> ...
>
> Martin


Randall Schulz

Re: [scala] Scala Actors - no speedup?

by Viktor Klang :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

As a sidenote on the topic, is there any skeleton implementations of dispatcher-based actor code?

I'm doing some testing and could surely use some demo-code. (No re-inventing of the round thing used for transportation of stuff)

Best regards
/Viktor

On 9/25/07, Martin Probst <mail@...> wrote:
> 1) java.util.regex.Matcher is not thread-safe, so you'll likely run into
> problems if you use it in different actors unless you make it
> actor-specific

I'm aware of that, I only share the pattern between Actors. Also, you have
to create a new matcher for each line anyways.

> 2) when sending a message to an actor, the processing happens in the same
> thread unless/until the actor blocks, in which case the thread is
released
> and the caller/callee are disjoined.  (My understanding is this is a
> [default] optimization to balance fine/coarse-grained processing to avoid
> costly context-switches)

What exactly is blocking in this context? Waiting for another message send?

As said, my code uses a coordinator that dispatches work to multiple
actors. That might be executed in a single thread, as it dispatches the
work one after the other, and actors simply run through the work, so there
is no complex message sending that might block actors.

Martin



Re: [scala] Scala Actors - no speedup?

by Alex Boisvert-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Actually, both of my comments where wrong.  I had misread how you used Pattern and it turns out actors send messages via a thread pool so sending is non-blocking by default (I just tested this).

Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset.   So I'm now leaning towards blaming the JIT compiler for the anomaly.  Microbenchmarks and all...

alex

On 9/25/07, Martin Probst <mail@...> wrote:
> 1) java.util.regex.Matcher is not thread-safe, so you'll likely run into
> problems if you use it in different actors unless you make it
> actor-specific

I'm aware of that, I only share the pattern between Actors. Also, you have
to create a new matcher for each line anyways.

> 2) when sending a message to an actor, the processing happens in the same
> thread unless/until the actor blocks, in which case the thread is
released
> and the caller/callee are disjoined.  (My understanding is this is a
> [default] optimization to balance fine/coarse-grained processing to avoid
> costly context-switches)

What exactly is blocking in this context? Waiting for another message send?

As said, my code uses a coordinator that dispatches work to multiple
actors. That might be executed in a single thread, as it dispatches the
work one after the other, and actors simply run through the work, so there
is no complex message sending that might block actors.

Martin



Re: [scala] Scala Actors - no speedup?

by Philipp Haller-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Viktor Klang wrote:
> As a sidenote on the topic, is there any skeleton implementations of
> dispatcher-based actor code?

Not that I am aware of. This seems to be a common pattern, though. For
example, we have a parallel test runner to test for regressions in the
Scala compiler that uses a similar pattern.

I think an actor-based implementation of MapReduce
(http://labs.google.com/papers/mapreduce.html), or one that interfaces
nicely with actors would be a very useful thing.

    Philipp


Re: [scala] Scala Actors - no speedup?

by Akhilesh Mritunjai :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 9/25/07, Alex Boisvert <boisvert@...> wrote:
> Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset.   So
> I'm now leaning towards blaming the JIT compiler for the anomaly.
> Microbenchmarks and all...

Umm, jumping the gun ?

I personally would look at other places first exhaustively. Same JIT
compiler is being used in large number of parallel applications world
over, if your statement is really representative, these guys won't be
bothering with multithreading & parallel processing in Java/JVM.

I too shall try to reimplement it, and let you know about the results.

- Akhilesh

Re: [scala] Scala Actors - no speedup?

by Akhilesh Mritunjai :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

From Tim's blog comment:
http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder

In short, unless you bork the regex to make it cpu bound, you ain't
gonna see much improvement in Java/Scala

-----------------snip------------------
From: CB (Sep 25 2007, at 04:53)

In Java processing 1 million lines in one thread takes about 4
seconds. Just the IO to read in those lines takes about 3 seconds.
This suggests that this problem is not worth parallelizing.

I tried the simplest multithreaded approach in Java. The main thread
sticks lines on a queue and a another thread pulls them off and the
regexp match etc. The performance of this approach (on a 4 cpu box) is
almost twice as slow as the single threaded solution. By collecting
batches of lines and putting them on the queue the performance gets to
around the same level as the single thread. This is not because Java's
concurrency support is bad. The util concurrent blocking queues are
very efficient (probably as efficient as the message queues for Erlang
processes). The amount of work done for each line is just so little
that the overhead of locking and queue operations are just not worth
it.

The only sensible concurrency that you could get for this problem is
by storing different day's logs on different disks and spawning a
thread/task/process for each log file, finally combining the maps at
the end.
--------------snip---------------


On 9/26/07, Akhilesh Mritunjai <mritun@...> wrote:

> On 9/25/07, Alex Boisvert <boisvert@...> wrote:
> > Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset.   So
> > I'm now leaning towards blaming the JIT compiler for the anomaly.
> > Microbenchmarks and all...
>
> Umm, jumping the gun ?
>
> I personally would look at other places first exhaustively. Same JIT
> compiler is being used in large number of parallel applications world
> over, if your statement is really representative, these guys won't be
> bothering with multithreading & parallel processing in Java/JVM.
>
> I too shall try to reimplement it, and let you know about the results.
>
> - Akhilesh
>

Re: [scala] Scala Actors - no speedup?

by Randall Schulz :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 25 September 2007 13:34, Akhilesh Mritunjai wrote:

> From Tim's blog comment:
> http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
>
> ...
>
> -----------------snip------------------
> From: CB (Sep 25 2007, at 04:53)
>
> ... The amount of work done for
> each line is just so little that the
> overhead of locking and queue
> operations are just not worth it.
>
> ...
> --------------snip---------------

Tim's comments make me wonder if the "biased locking" option of receent
Sun JVM's might be advantageous for this particular algorithm.

See section 4.2.5 of
<http://java.sun.com/performance/reference/whitepapers/tuning.html> and
<http://java.sun.com/javase/technologies/hotspot/publications/biasedlocking_oopsla2006_wp.pdf>


Randall Schulz

Re: [scala] Scala Actors - no speedup?

by Akhilesh Mritunjai :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 9/26/07, Randall R Schulz <rschulz@...> wrote:
>
> Tim's comments make me wonder if the "biased locking" option of receent
> Sun JVM's might be advantageous for this particular algorithm.
>

Those are not Tim's observations, rather a reader's comment on his blog.

I've just performed another micro-benchmark to estimate the maximum
theroretical regexp performance of Java 6 regexp library for his case.

Result: time: 29212ms. 1003 MB/s, 100000000 matches

JVM: Sun Java 6 update 2
Platform: Windows XP
Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache

So *even* if the whole thing is done in a single thread, the
performance is around 1 gig/s... so there is absolutely no point
parallelizing this thing. Even my 3 yr old processor (let alone a more
modern Core2Duo@2GHz) would easily saturate ANY potential data stream
you can supply... a typical HDD doesn't go over 80 MB/s.

Storm in a tea-cup :-P

- Akhilesh

Re: [scala] Scala Actors - no speedup?

by Viktor Klang :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On 9/25/07, Akhilesh Mritunjai <mritun@...> wrote:
On 9/26/07, Randall R Schulz <rschulz@...> wrote:
>
> Tim's comments make me wonder if the "biased locking" option of receent
> Sun JVM's might be advantageous for this particular algorithm.
>

Those are not Tim's observations, rather a reader's comment on his blog.

I've just performed another micro-benchmark to estimate the maximum
theroretical regexp performance of Java 6 regexp library for his case.

Result: time: 29212ms. 1003 MB/s, 100000000 matches

JVM: Sun Java 6 update 2
Platform: Windows XP
Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache


Not to be a pest about it, but the Dothan revision of the Banias core only sported 2MBs of L2 cache.
 

So *even* if the whole thing is done in a single thread, the
performance is around 1 gig/s... so there is absolutely no point
parallelizing this thing. Even my 3 yr old processor (let alone a more
modern Core2Duo@2GHz) would easily saturate ANY potential data stream
you can supply... a typical HDD doesn't go over 80 MB/s.

Storm in a tea-cup :-P

- Akhilesh


Re: [scala] Scala Actors - no speedup?

by Randall Schulz :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 25 September 2007 14:13, Akhilesh Mritunjai wrote:
> ...
>
> So *even* if the whole thing is done in a single thread, the
> performance is around 1 gig/s... so there is absolutely no point
> parallelizing this thing. Even my 3 yr old processor (let alone a
> more modern Core2Duo@2GHz) would easily saturate ANY potential data
> stream you can supply... a typical HDD doesn't go over 80 MB/s.

"Typical" personal / desktop computers, maybe.

But servers today often use 15,000 RPM drives connected by Ultra-320
SCSI busses.

The last workstation I put together for my office uses one 15,000 RPM
Ultra-320 drive and on 10,000 RPM SATA drive.


Furthermore, the first flash-RAM-based mass storage devices (as in disk
replacements) are now becoming available. I don't know what their
performance characteristics are, but it's time for software engineers
to start preparing for computing hardware that is, mercifully, free of
rotating electromechanical mass storage. Hallelujah!


> ...
> - Akhilesh


Randall Schulz

Re: [scala] Scala Actors - no speedup?

by Martin Probst-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> I've just performed another micro-benchmark to estimate the maximum
> theroretical regexp performance of Java 6 regexp library for his case.
>
> Result: time: 29212ms. 1003 MB/s, 100000000 matches
>
> JVM: Sun Java 6 update 2
> Platform: Windows XP
> Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache
>
> So *even* if the whole thing is done in a single thread, the
> performance is around 1 gig/s... so there is absolutely no point
> parallelizing this thing. Even my 3 yr old processor (let alone a more
> modern Core2Duo@2GHz) would easily saturate ANY potential data stream
> you can supply... a typical HDD doesn't go over 80 MB/s.

Interesting. But on my mac, judging from the CPU monitor, the program  
is indeed CPU bound, or at least it fills up the CPU.

A trivial program just reading in the file line-by-line take about 12  
seconds. With regular expressions, in a single threaded case, it's  
about 18 seconds. So the regular expression matching does have an  
impact on the programs execution time. My use case are 12000000  
lines, of which 10% match the regular expression, which is a 270 MB  
file. Of course, this is with "warm" buffers, i.e. after reading the  
file several times.

Regards,
Martin

Re: [scala] Scala Actors - no speedup?

by Alex Boisvert-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 9/25/07, Akhilesh Mritunjai <mritun@...> wrote:
On 9/25/07, Alex Boisvert <boisvert@...> wrote:
> Interestingly, the SerialAnalyser runs faster on my machine on a 15M dataset.   So
> I'm now leaning towards blaming the JIT compiler for the anomaly.
> Microbenchmarks and all...

Umm, jumping the gun ?

I personally would look at other places first exhaustively. Same JIT
compiler is being used in large number of parallel applications world
over, if your statement is really representative, these guys won't be
bothering with multithreading & parallel processing in Java/JVM.

I too shall try to reimplement it, and let you know about the results.


I wasn't saying the JIT didn't work properly... I meant that it takes different times to warm up on different apps.  If you're using Actors, you're loading more classes and you're JIT-ing more classes.   In 1.3 second the JIT doesn't really have time to kick-in completely.   So it's a micro-benchmark and it's difficult to compare given the startup overhead of the JVM.

alex



Re: [scala] Scala Actors - no speedup?

by Akhilesh Mritunjai :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 9/26/07, Randall R Schulz <rschulz@...> wrote:

> On Tuesday 25 September 2007 14:13, Akhilesh Mritunjai wrote:
> > ...
> >
> > So *even* if the whole thing is done in a single thread, the
> > performance is around 1 gig/s... so there is absolutely no point
> > parallelizing this thing. Even my 3 yr old processor (let alone a
> > more modern Core2Duo@2GHz) would easily saturate ANY potential data
> > stream you can supply... a typical HDD doesn't go over 80 MB/s.
>
> "Typical" personal / desktop computers, maybe.
>
> But servers today often use 15,000 RPM drives connected by Ultra-320
> SCSI busses.

Their typical performance is around 100 MB/s.

> Furthermore, the first flash-RAM-based mass storage devices (as in disk
> replacements) are now becoming available. I don't know what their
> performance characteristics are, but it's time for software engineers

Their typical performance is even less than rotating magnetic disks
and max out around 80 MB/s.

Both of them contribute nothing to streaming data bandwidth, both only
lower the disk lantencies (2ms typical for 15k rpm disks, few uS for
flash).

For more bandwidth, massively parallel setups are used. Sun's x4500
(aka Thumper) can sustain around 2GB/s from its 48 disks. I haven't
seen anything (commerically available) that can sustain more. Even at
that level, a "single" simple awk script can saturate the IO for this
purpose because-
1. AWK regexp is the fastest out there
2. Thumper has got better processor (2X Opteron dual core).

So unfortunately, IO bottlenecks ain't going anywhere soon.

- Akhilesh

Re: [scala] Scala Actors - no speedup?

by Akhilesh Mritunjai :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Martin

On 9/26/07, Martin Probst <mail@...> wrote:

> > I've just performed another micro-benchmark to estimate the maximum
> > theroretical regexp performance of Java 6 regexp library for his case.
> >
> > Result: time: 29212ms. 1003 MB/s, 100000000 matches
> >
> > JVM: Sun Java 6 update 2
> > Platform: Windows XP
> > Processor: Pentium -M 1.6 GHz single core (Dothan) 4MB cache
> >
> > So *even* if the whole thing is done in a single thread, the
> > performance is around 1 gig/s... so there is absolutely no point
> > parallelizing this thing. Even my 3 yr old processor (let alone a more
> > modern Core2Duo@2GHz) would easily saturate ANY potential data stream
> > you can supply... a typical HDD doesn't go over 80 MB/s.
>
> Interesting. But on my mac, judging from the CPU monitor, the program
> is indeed CPU bound, or at least it fills up the CPU.

Can you just whip up a sample code in Scala that just reads the file
(and throws away) ?

I/O, specially in higher level languages, does consume CPU cycles
because fetching the data itself requires CPU cycles. That is why Sun
x4500 (Thumper) has 2X dual core opteron CPUs in there even though
it's supposed to be a file server... sustaining 2GB/s (max it is
capable of) requires serious CPU horse power.

Your observation of introducing the regexp processing increasing the
time can also be explained with this as regexp is basically sharing
CPU with IO operations... which themselves require CPU.

The thing to really worry about in this benchmark is that IO in Scala
*can* potentially have so much overhead !! This is something I want to
explore further.

- Akhilesh
< Prev | 1 - 2 | Next >