Re: [scala-user] Arrays, Lists and parallelism

View: New views
6 Messages — Rating Filter:   Alert me  

Parent Message unknown Re: [scala-user] Arrays, Lists and parallelism

by Daniel Sobral :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Nope, not dreaming. Yep, I'm seeing that too.
 
Scala is not a non-strict by default language. You have to be quite specific to get a non-strict behavior, and most newcomers to the language are quite unaware such a possibility even exists. The non-strictness is carefully segregated into things like Stream, projection and lazy val.
 
Except, of course, for the ranges, which are basic tools for programmers everywhere.
 
I agree with you. Ranges ought to be strict by default, and the sooner that is done, the better. In fact, since Ranges were reworked for Scala 2.8 -- see the Range.Inclusive discussion, for instance -- then this would be a very good time to do it.

On Wed, Oct 14, 2009 at 2:20 PM, Paul Phillips <paulp@...> wrote:
On Wed, Oct 14, 2009 at 06:10:08PM +0100, Russel Winder wrote:
> I just forwarded my reply to Brian to the list.  It has a trio of code
> fragments that indicate the problem I am seeing.  Basically a for
> comprehension delivers a List of Futures but it is treated
> sequentially. If I toArray it that List to get an Array of Futures
> then I get parallelism.

Am I dreaming here? It's the exact same issue as in the "yields the
best of me..." thread! It has nothing to do with List vs. Array and
everything to do with ".toArray" vs. leaving it in a Range.

I guess I couldn't ask for clearer evidence that this is going to be an
ongoing problem.


>           val partialSums = for ( index <- 0 until numberOfThreads ) yield
>             Futures.future {
>             }
>
> runs sequentially!  The fragment:
>
>           val partialSums = ( for ( index <- 0 until numberOfThreads ) yield
>             Futures.future {
>               . . .
>             } ).toArray
> runs in parallel.  So when it comes to parallel computations, Arrays
> rule and Lists suck :-)
>

--
Paul Phillips      | Adultery is the application of democracy to love.
Caged Spirit       |     -- H. L. Mencken
Empiricist         |
pp: i haul pills   |----------* http://www.improving.org/paulp/ *----------



--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.

Re: [scala-user] Arrays, Lists and parallelism

by Razvan Cojocaru :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I don't believe that ranges are the basic tools of programmers everywhere. If you count the times you use for to iterate over all kinds of collection vs over a range, in JDK6, the obvious winner is the iterator version simply because in life, except the unhappy DOM.NodeList, nothing is really fixed and known up-front, but dynamic, i.e. obtained from a DB query or received via an API and it wouldn't be fixed size...

I'm talking about real life programs, not numbers theory and the usual examples found in books and academia...

Before JDK 6, yes, everyone used the range options simply because it was more concise than using iterators.

JDK5 iteration:
   Iterator i = mylist.iterator();
   while (i.hasNext()) dostuff (i.next) // i is also consumed so you normally assign it to a local val: 3 more lines

JDK 5 ("smart" iteration as in less code)
   for (int i = 0; i < mylist.size(); i++) dostuff (mylist.get(i))   // old jdk5 iteration

vs JDK6 iteration
   for (Element e : mylist) dostuff (e)

vs scala iteration, where there be dragons that yield lazy non-debuggables...

Maybe I'm wrong and most programmers didn't wake up yet to the iterative version in JDK6...

One thing I love about Scala is how Array has a .foreach :) now I can really iterate over anything with the same code...well until you hit the for-yield dragons :)

Bottom line: I don't consider Range to be an issue, as is implemented. I consider the for-yield to be an issue since it's presented as a for loop, which is not!

Daniel Sobral wrote:
Nope, not dreaming. Yep, I'm seeing that too.

Scala is not a non-strict by default language. You have to be quite specific
to get a non-strict behavior, and most newcomers to the language are quite
unaware such a possibility even exists. The non-strictness is carefully
segregated into things like Stream, projection and lazy val.

Except, of course, for the ranges, which are basic tools for programmers
everywhere.

I agree with you. Ranges ought to be strict by default, and the sooner that
is done, the better. In fact, since Ranges were reworked for Scala 2.8 --
see the Range.Inclusive discussion, for instance -- then this would be a
very good time to do it.

On Wed, Oct 14, 2009 at 2:20 PM, Paul Phillips <paulp@improving.org> wrote:

> On Wed, Oct 14, 2009 at 06:10:08PM +0100, Russel Winder wrote:
> > I just forwarded my reply to Brian to the list.  It has a trio of code
> > fragments that indicate the problem I am seeing.  Basically a for
> > comprehension delivers a List of Futures but it is treated
> > sequentially. If I toArray it that List to get an Array of Futures
> > then I get parallelism.
>
> Am I dreaming here? It's the exact same issue as in the "yields the
> best of me..." thread! It has nothing to do with List vs. Array and
> everything to do with ".toArray" vs. leaving it in a Range.
>
> I guess I couldn't ask for clearer evidence that this is going to be an
> ongoing problem.
>
>
> >           val partialSums = for ( index <- 0 until numberOfThreads )
> yield
> >             Futures.future {
> >             }
> >
> > runs sequentially!  The fragment:
> >
> >           val partialSums = ( for ( index <- 0 until numberOfThreads )
> yield
> >             Futures.future {
> >               . . .
> >             } ).toArray
>   > runs in parallel.  So when it comes to parallel computations, Arrays
> > rule and Lists suck :-)
> >
>
> --
> Paul Phillips      | Adultery is the application of democracy to love.
> Caged Spirit       |     -- H. L. Mencken
> Empiricist         |
> pp: i haul pills   |----------* http://www.improving.org/paulp/*----------
>



--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews:
review by name, review by reference and review by value.

Re: [scala-user] Arrays, Lists and parallelism

by Razvan Cojocaru :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

And here's the proof

script:
echo `find . -name "*.java" | wc -l` " files"
echo `find . -name "*.java" -exec grep "for *(" {} \; | wc -l` " for loops"
echo `find . -name "*.java" -exec grep "for *(.*[+][+].*" {} \; | wc -l` " old style"

Result:
/host/home/razvanc/trunk/frameworks/src/com/sigma/hframe\> . x
218  files
269  for loops
100  old style

- please double check the script - it seems to work but maybe not.

NOTE this is ran on some older frameworks and most of the 100 old style are in fact the "simplified" iterations like:
            for (int i = 0; i < propertyNodes.getLength(); i++) {

I'm now running it on one of our product's code base, more than 1 milion LOCs and will report back when it's done...right now it's killing my laptop :)

Razvan Cojocaru wrote:
I don't believe that ranges are the basic tools of programmers everywhere. If you count the times you use for to iterate over all kinds of collection vs over a range, in JDK6, the obvious winner is the iterator version simply because in life, except the unhappy DOM.NodeList, nothing is really fixed and known up-front, but dynamic, i.e. obtained from a DB query or received via an API and it wouldn't be fixed size...

I'm talking about real life programs, not numbers theory and the usual examples found in books and academia...
...

Re: [scala-user] Arrays, Lists and parallelism

by Razvan Cojocaru :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

REport:

/host/home/razvanc/trunk/smpcore/src/com/sigma\> . /host/home/razvanc/x
6966  files
10253  for loops
3484  old style

Q.E.D.

Re: [scala-user] Arrays, Lists and parallelism

by Razvan Cojocaru :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Interestingly enough - I ran this on anotehr of our products, which is less than 2 years old, so written entirely on JDK6 and the ratio holds - maybe I was bolder than I should've been.

/host/home/razvanc/sis\> . /host/home/razvanc/x
859  files
868  for loops
309  old style

still Q.E.D. though

P.S on an unrelated note, I wonder if the lower ratio of for per java file is indicative of the web ui vs swing ui in the other product? That's the major difference. This newer product also has a very skinny domain model - so less relationships - that may explain actually the lower ratio of FOR loops.

What that tells you though is that as the domain model grows in larger apps, the relationships grow usually geometrically, so the ratio and number of for loops that yield something grows as well. Thus the for-yield is even more important to fix than the Range!

Razvan Cojocaru wrote:
REport:

/host/home/razvanc/trunk/smpcore/src/com/sigma\> . /host/home/razvanc/x
6966  files
10253  for loops
3484  old style

Q.E.D.

Re: [scala-user] Arrays, Lists and parallelism

by Razvan Cojocaru :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Razvan Cojocaru wrote:
Interestingly enough - I ran this on anotehr of our products, which is less than 2 years old, so written entirely on JDK6 and the ratio holds - maybe I was bolder than I should've been.

/host/home/razvanc/sis\> . /host/home/razvanc/x
859  files
868  for loops
309  old style

still Q.E.D. though

P.S on an unrelated note, I wonder if the lower ratio of for per java file is indicative of the web ui vs swing ui in the other product? That's the major difference. This newer product also has a very skinny domain model - so less relationships - that may explain actually the lower ratio of FOR loops.

What that tells you though is that as the domain model grows in larger apps, the relationships grow usually geometrically, so the ratio and number of for loops that yield something grows as well. Thus the for-yield is even more important to fix than the Range!
[later edit]
 HEH - about a third of the fors are like this:
            for (int i = 0; i < oPartitionNodeList.getLength(); i++)
so that's thanks to whomever designed the DOM.NodeList

I hate to be right but when I'm right I'm right.