Simple FAST lazy functional primes

View: New views
19 Messages — Rating Filter:   Alert me  

Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


First, here it is:

primes   = 2: 3: sieve 0 primes' 5
primes'  = tail primes
sieve k (p:ps) x
         = [x | x<-[x,x+2..p*p-2], and [(x`rem`p)/=0 | p<-take k primes']]
           ++ sieve (k+1) ps (p*p+2)

(thanks to Leon P.Smith for his brilliant idea of directly generating
the spans of odds instead of calling 'span' on the infinite odds list).

This code is faster than PriorityQueue-based sieve (granted, using my own
ad-hoc implementation, so YMMV) in producing the first million primes
(testing  was done running the ghc -O3 compiled code inside GHCi).
The relative performance vs the PQ-version is:

  100,000   300,000   1,000,000  primes produced
   0.6       0.75      0.97

I've recently came to the Melissa O'Neill's article (I know, I know) and
she makes the incredible claim there that the square-of-prime optimization
doesn't count if we want to improve the old and "lazy"

uprimes = 2: sieve [3,5..] where
  sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]

Her article gave me the strong impression that she claims that the only way
to fix this code is by using Priority Queue based optimization, and then
goes on to present astronomical gains in speed by implementing it.

Well, I find this claim incredible. First of all, the "naive" code fires up
its nested filters much too early, when in fact each must be started only
when the prime's square is reached (not only have its work started there -
be started there itself!), so that filters are delayed:

dprimes  = 2: 3: sieve (tail dprimes) [5,7..]  where
  sieve (p:ps) xs
         = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
           where (h,t)=span (< p*p) xs

This code right there is on par with the PQ-base code, only x3-4 slower at
generating the first million primes.

Second, the implicit control structure of linearly nested filters, piling up
in front of the supply stream, each with its prime-number-it-filters-by
hidden inside it, needs to be explicated into a data structure of
primes-to-filter-by (which is in fact the prefix of the primes list
we're building itself), so the filtering could be done by one function call
instead of many nested calls:

xprimes  = 2: 3: sieve 0 primes' [5,7..]  where
  primes' = tail xprimes
  sieve k (p:ps) xs
           = noDivs k h ++ sieve (k+1) ps (tail t)
             where (h,t)=span (< p*p) xs
  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))

>From here, using the brilliant idea of Leon P. Smith's of fusing the span
and the infinite odds supply, we arrive at the final version,

kprimes  = 2: 3: sieve 0 primes' 5  where
  primes' = tail kprimes
  sieve k (p:ps) x
           = noDivs k h ++ sieve (k+1) ps (t+2)
             where (h,t)=([x,x+2..t-2], p*p)
  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))

Using the list comprehension syntax, it turns out, when compiled with
ghc -O3, gives it another 5-10% speedup itself.

So I take it to disprove the central premise of the article, and to show
that simple lazy functional FAST primes code does in fact exist, and
that the PQ optimization - which value of course no-one can dispute - is
a far-end optimization.


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Bulat Ziganshin-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Will,

Monday, November 2, 2009, 10:41:15 AM, you wrote:

> (testing  was done running the ghc -O3 compiled code inside GHCi).

afaik, -O2 should be used


--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@...

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Sjoerd Visscher-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

You can remove the "take k" step by passing along the list of primes  
smaller than p instead of k:

primes = 2 : 3 : 5 : 7 : sieve [3, 2] (drop 2 primes)
sieve qs@(q:_) (p:ps)
         = [x | x<-[q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p<-qs]]
           ++ sieve (p:qs) ps

I also removed the "x" parameter by generating it from the previous  
prime. But this also means you have to start at p=5 and q=3, because  
you want q*q+2 to be odd.

Sjoerd

On Nov 2, 2009, at 8:41 AM, Will Ness wrote:

>
> First, here it is:
>
> primes   = 2: 3: sieve 0 primes' 5
> primes'  = tail primes
> sieve k (p:ps) x
>         = [x | x<-[x,x+2..p*p-2], and [(x`rem`p)/=0 | p<-take k  
> primes']]
>           ++ sieve (k+1) ps (p*p+2)
>
> (thanks to Leon P.Smith for his brilliant idea of directly generating
> the spans of odds instead of calling 'span' on the infinite odds  
> list).
>
> This code is faster than PriorityQueue-based sieve (granted, using  
> my own
> ad-hoc implementation, so YMMV) in producing the first million primes
> (testing  was done running the ghc -O3 compiled code inside GHCi).
> The relative performance vs the PQ-version is:
>
>  100,000   300,000   1,000,000  primes produced
>   0.6       0.75      0.97
>
> I've recently came to the Melissa O'Neill's article (I know, I know)  
> and
> she makes the incredible claim there that the square-of-prime  
> optimization
> doesn't count if we want to improve the old and "lazy"
>
> uprimes = 2: sieve [3,5..] where
>  sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]
>
> Her article gave me the strong impression that she claims that the  
> only way
> to fix this code is by using Priority Queue based optimization, and  
> then
> goes on to present astronomical gains in speed by implementing it.
>
> Well, I find this claim incredible. First of all, the "naive" code  
> fires up
> its nested filters much too early, when in fact each must be started  
> only
> when the prime's square is reached (not only have its work started  
> there -
> be started there itself!), so that filters are delayed:
>
> dprimes  = 2: 3: sieve (tail dprimes) [5,7..]  where
>  sieve (p:ps) xs
>         = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
>           where (h,t)=span (< p*p) xs
>
> This code right there is on par with the PQ-base code, only x3-4  
> slower at
> generating the first million primes.
>
> Second, the implicit control structure of linearly nested filters,  
> piling up
> in front of the supply stream, each with its prime-number-it-filters-
> by
> hidden inside it, needs to be explicated into a data structure of
> primes-to-filter-by (which is in fact the prefix of the primes list
> we're building itself), so the filtering could be done by one  
> function call
> instead of many nested calls:
>
> xprimes  = 2: 3: sieve 0 primes' [5,7..]  where
>  primes' = tail xprimes
>  sieve k (p:ps) xs
>           = noDivs k h ++ sieve (k+1) ps (tail t)
>             where (h,t)=span (< p*p) xs
>  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
>
>> From here, using the brilliant idea of Leon P. Smith's of fusing  
>> the span
> and the infinite odds supply, we arrive at the final version,
>
> kprimes  = 2: 3: sieve 0 primes' 5  where
>  primes' = tail kprimes
>  sieve k (p:ps) x
>           = noDivs k h ++ sieve (k+1) ps (t+2)
>             where (h,t)=([x,x+2..t-2], p*p)
>  noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
>
> Using the list comprehension syntax, it turns out, when compiled with
> ghc -O3, gives it another 5-10% speedup itself.
>
> So I take it to disprove the central premise of the article, and to  
> show
> that simple lazy functional FAST primes code does in fact exist, and
> that the PQ optimization - which value of course no-one can dispute  
> - is
> a far-end optimization.
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@...
> http://www.haskell.org/mailman/listinfo/haskell-cafe

--
Sjoerd Visscher
sjoerd@...



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Sjoerd Visscher-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Excuse me, 2 doesn't have to be in the list of smaller primes, as  
we're only generating odd numbers:

primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
sieve qs@(q:_) (p:ps)
        = [x | x<-[q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p<-qs]]
          ++ sieve (p:qs) ps

Sjoerd

On Nov 2, 2009, at 2:07 PM, Sjoerd Visscher wrote:

> You can remove the "take k" step by passing along the list of primes  
> smaller than p instead of k:
>
> primes = 2 : 3 : 5 : 7 : sieve [3, 2] (drop 2 primes)
> sieve qs@(q:_) (p:ps)
>        = [x | x<-[q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p<-qs]]
>          ++ sieve (p:qs) ps
>
> I also removed the "x" parameter by generating it from the previous  
> prime. But this also means you have to start at p=5 and q=3, because  
> you want q*q+2 to be odd.
>
> Sjoerd
>
> On Nov 2, 2009, at 8:41 AM, Will Ness wrote:
>
>>
>> First, here it is:
>>
>> primes   = 2: 3: sieve 0 primes' 5
>> primes'  = tail primes
>> sieve k (p:ps) x
>>        = [x | x<-[x,x+2..p*p-2], and [(x`rem`p)/=0 | p<-take k  
>> primes']]
>>          ++ sieve (k+1) ps (p*p+2)
>>
>> (thanks to Leon P.Smith for his brilliant idea of directly generating
>> the spans of odds instead of calling 'span' on the infinite odds  
>> list).
>>
>> This code is faster than PriorityQueue-based sieve (granted, using  
>> my own
>> ad-hoc implementation, so YMMV) in producing the first million primes
>> (testing  was done running the ghc -O3 compiled code inside GHCi).
>> The relative performance vs the PQ-version is:
>>
>> 100,000   300,000   1,000,000  primes produced
>>  0.6       0.75      0.97
>>
>> I've recently came to the Melissa O'Neill's article (I know, I  
>> know) and
>> she makes the incredible claim there that the square-of-prime  
>> optimization
>> doesn't count if we want to improve the old and "lazy"
>>
>> uprimes = 2: sieve [3,5..] where
>> sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]
>>
>> Her article gave me the strong impression that she claims that the  
>> only way
>> to fix this code is by using Priority Queue based optimization, and  
>> then
>> goes on to present astronomical gains in speed by implementing it.
>>
>> Well, I find this claim incredible. First of all, the "naive" code  
>> fires up
>> its nested filters much too early, when in fact each must be  
>> started only
>> when the prime's square is reached (not only have its work started  
>> there -
>> be started there itself!), so that filters are delayed:
>>
>> dprimes  = 2: 3: sieve (tail dprimes) [5,7..]  where
>> sieve (p:ps) xs
>>        = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
>>          where (h,t)=span (< p*p) xs
>>
>> This code right there is on par with the PQ-base code, only x3-4  
>> slower at
>> generating the first million primes.
>>
>> Second, the implicit control structure of linearly nested filters,  
>> piling up
>> in front of the supply stream, each with its prime-number-it-
>> filters-by
>> hidden inside it, needs to be explicated into a data structure of
>> primes-to-filter-by (which is in fact the prefix of the primes list
>> we're building itself), so the filtering could be done by one  
>> function call
>> instead of many nested calls:
>>
>> xprimes  = 2: 3: sieve 0 primes' [5,7..]  where
>> primes' = tail xprimes
>> sieve k (p:ps) xs
>>          = noDivs k h ++ sieve (k+1) ps (tail t)
>>            where (h,t)=span (< p*p) xs
>> noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
>>
>>> From here, using the brilliant idea of Leon P. Smith's of fusing  
>>> the span
>> and the infinite odds supply, we arrive at the final version,
>>
>> kprimes  = 2: 3: sieve 0 primes' 5  where
>> primes' = tail kprimes
>> sieve k (p:ps) x
>>          = noDivs k h ++ sieve (k+1) ps (t+2)
>>            where (h,t)=([x,x+2..t-2], p*p)
>> noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
>>
>> Using the list comprehension syntax, it turns out, when compiled with
>> ghc -O3, gives it another 5-10% speedup itself.
>>
>> So I take it to disprove the central premise of the article, and to  
>> show
>> that simple lazy functional FAST primes code does in fact exist, and
>> that the PQ optimization - which value of course no-one can dispute  
>> - is
>> a far-end optimization.
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe@...
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> --
> Sjoerd Visscher
> sjoerd@...
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@...
> http://www.haskell.org/mailman/listinfo/haskell-cafe

--
Sjoerd Visscher
sjoerd@...



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Sjoerd Visscher <sjoerd <at> w3future.com> writes:

> [...] 2 doesn't have to be in the list of smaller primes, as  
> we're only generating odd numbers:
>
> primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
> sieve qs@(q:_) (p:ps)
>         = [x | x<-[q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p<-qs]]
>           ++ sieve (p:qs) ps
>
> Sjoerd

Thanks!

I haven't tested your code's speed yet, but have few points:

i wouldn't expect eliminating a parameter to have any effect on performance in
compiled code (I haven't a clue whether -O2 or -O3 should be used BTW) if it
doesn't eliminate any superfluous computations.

second, you prepend bigger primes in front of a list (building the prefix in
reverse order); this will cause them to be tested against first. I think (1)
we're supposed to test composites against smaller factors first, for the tests
to stop sooner. IMO it'll slow the code down. I think (2) you can safely append
them at the end instead, for a compiler should be able to just maintain a tail
pointer there. But then you loose immediate access to the last found prime; so
will need to return the 'x' parameter back. Then you end up with my code exactly
:) (only replicating the prefix):

primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes) 9
sieve pfx (p:ps) x
       = [x | x<-[x+2,x+4..p*p-2], and [(x`rem`p)/=0 | p<-pfx]]
         ++ sieve (pfx++[p]) ps (p*p)

it'll be interesting to test my hypothesis (two of them :) ) and see if this has
in fact better time than your code.

thirdly, (take k) could be compiled out completely into a tail pointer too,
maintained and advanced after each step. I would hope a smart compiler to do
just that. Again, some more testing is in order. Although, I tested the two
approaches on some previous incarnation of this code, and the (take k) vs
(pfx++[p]) had exactly same run time (or was better).

What I'm after mostly here, is for the code to be clear and simple, and
reasonably efficient. Right now it corresponds very nicely to the description of
the sieve as filtering all the odds testable by a known prefix of primes, and
then going on to proceed with the next batch of them with a prefix that was
grown by one more prime. And so on.

But more importantly I want it to be known that there's a lot that can be done
here, in a natural functional lazy kind of way, before resorting to priority
queues and mutable arrays. We could always just use C too. ;)

Thanks for the feedback! :)


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Will Ness <will_n48 <at> yahoo.com> writes:

> But more importantly I want it to be known that there's a lot that can be done
> here, in a natural functional lazy kind of way, before resorting to priority
> queues and mutable arrays. We could always just use C too. ;)

I mean it as an introductory code that's nevertheless good for producing the
first million primes or so - not the best sieve ever, but the best (is it?)
simple clear functional introductory sieve, instead. That's another reason to
using (take k primes') - it practically says it in English, "the first k odd
primes".

:)



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Sjoerd Visscher <sjoerd <at> w3future.com> writes:

>
> Excuse me, 2 doesn't have to be in the list of smaller primes, as  
> we're only generating odd numbers:
>
> primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
> sieve qs@(q:_) (p:ps)
>         = [x | x<-[q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p<-qs]]
>           ++ sieve (p:qs) ps
>
> Sjoerd
>

Hi,

I've run it now. In producing 100,000 primes, your above code takes x3.5 more
time than the one I posted. The code modified as I suggested with (qs++[p])
takes exactly the same time as mine. :)

Cheers,


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: Simple FAST lazy functional primes

by Sjoerd Visscher-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Nov 2, 2009, at 5:11 PM, Will Ness wrote:

> Sjoerd Visscher <sjoerd <at> w3future.com> writes:
>
>>
>> Excuse me, 2 doesn't have to be in the list of smaller primes, as
>> we're only generating odd numbers:
>>
>> primes = 2 : 3 : 5 : 7 : sieve [3] (drop 2 primes)
>> sieve qs@(q:_) (p:ps)
>>        = [x | x<-[q*q+2,q*q+4..p*p-2], and [(x`rem`p)/=0 | p<-qs]]
>>          ++ sieve (p:qs) ps
>>
>> Sjoerd
>>
>
> Hi,
>
> I've run it now. In producing 100,000 primes, your above code takes  
> x3.5 more
> time than the one I posted.

Too bad. The order of the primes when filtering matters quite a lot!

> The code modified as I suggested with (qs++[p])
> takes exactly the same time as mine. :)

Yes, append and take are both O(n).

Here's another variation which I rather like:

primes = 2 : 3 : sieve (tail primes) (notDivides 3) 5
sieve (p:ps) test from
         = [x | x <- [from, from + 2 .. p * p - 2], test x]
           ++ sieve ps (\x -> test x && notDivides p x) (p * p + 2)
notDivides d x = (x `rem` d) /= 0

I'm curious if GHC can compile this to the same speed as your code.

--
Sjoerd Visscher
sjoerd@...



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Sjoerd Visscher <sjoerd <at> w3future.com> writes:

> On Nov 2, 2009, at 5:11 PM, Will Ness wrote:
>
> > Hi,
> >
> > I've run it now. In producing 100,000 primes, your above code takes  
> > x3.5 more
> > time than the one I posted.
>
> Too bad. The order of the primes when filtering matters quite a lot!

Yep. Most of the composites will have mostly small prime factors, most of the
time. :) An English paraphrase of the analysis from the paper, when she proves
that whatever you do with composites, doesn't matter (in the _long_ run!) if you
test primes for divisibility - because for primes we end up testing against
_all_ of the primes in the prefix. This is exactly where her PQ code achieves
its coup - it gets its primes for free when it checks its composites and sees
the gap there.

So the reason not to have these nested functions is to have all the logic
represented in the open, in the data structure - here the list, still. PQ is a
natural next step. Having a lot of linearly nested functions, there isn't a lot
we can do with them. :)

> > The code modified as I suggested with (qs++[p])
> > takes exactly the same time as mine. :)
>
> Yes, append and take are both O(n).

I would hope 'append' to be O(1), and 'take' just translated into an upper limit
of the testing loop in 'and'.

>
> Here's another variation which I rather like:
>
> primes = 2 : 3 : sieve (tail primes) (notDivides 3) 5
> sieve (p:ps) test from
>          = [x | x <- [from, from + 2 .. p * p - 2], test x]
>            ++ sieve ps (\x -> test x && notDivides p x) (p * p + 2)
> notDivides d x = (x `rem` d) /= 0
>
> I'm curious if GHC can compile this to the same speed as your code.

It depends on whether it is as good at traversing function call frames at run
time, as elements of a list. Or whether it uses the call frames at all, or is
able to compile away all of it.

:)



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Will Ness <will_n48 <at> yahoo.com> writes:

> primes   = 2: 3: sieve 0 primes' 5
> primes'  = tail primes
> sieve k (p:ps) x
>          = [x | x <- [x,x+2..p*p-2],
>                  and [(x`rem`p)/=0 | p <- take k primes']]
>            ++ sieve (k+1) ps (p*p+2)
>
> (thanks to Leon P.Smith for his brilliant idea of directly generating
> the spans of odds instead of calling 'span' on the infinite odds list).
>
> The relative performance vs the PQ-version is:
>
>   100,000   300,000   1,000,000  primes produced
>    0.6       0.75      0.97
>

One _crucial_ tidbit I've left out: _type_signature_.

Adding (:: [Int]) speeds this code up more than TWICE!

:) :)

'sieve' can also be used in e.g.

primesFrom m = sieve (length h) ps m where
  (h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes

to get few big primes even faster.

:)


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: Simple FAST lazy functional primes

by Jason Dagit-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Mon, Nov 2, 2009 at 1:59 PM, Will Ness <will_n48@...> wrote:
Will Ness <will_n48 <at> yahoo.com> writes:

> primes   = 2: 3: sieve 0 primes' 5
> primes'  = tail primes
> sieve k (p:ps) x
>          = [x | x <- [x,x+2..p*p-2],
>                  and [(x`rem`p)/=0 | p <- take k primes']]
>            ++ sieve (k+1) ps (p*p+2)
>
> (thanks to Leon P.Smith for his brilliant idea of directly generating
> the spans of odds instead of calling 'span' on the infinite odds list).
>
> The relative performance vs the PQ-version is:
>
>   100,000   300,000   1,000,000  primes produced
>    0.6       0.75      0.97
>

One _crucial_ tidbit I've left out: _type_signature_.

Adding (:: [Int]) speeds this code up more than TWICE!

:) :)

If you are okay with Int, then maybe you're also happy with Int32 or Word32.  If so, why don't you use template haskell to build the list at compile time?  If you do that, then getting the kth prime at run-time is O(k).  Take that AKS!  :)

Jason

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jason Dagit <dagit <at> codersbase.com> writes:

> On Mon, Nov 2, 2009 at 1:59 PM, Will Ness <will_n48 <at> yahoo.com> wrote:
> Will Ness <will_n48 <at> yahoo.com> writes:
>
> One _crucial_ tidbit I've left out: _type_signature_.
> Adding (:: [Int]) speeds this code up more than TWICE!
> :) :)
>
>
> If you are okay with Int, then maybe you're also happy with Int32 or Word32.
 If so, why don't you use template haskell to build the list at compile time?
 If you do that, then getting the kth prime at run-time is O(k).  Take that
 AKS!  :)
>


O(k), I've removed it since the post actually. Wasn't thinking clearly for a
moment, having seen the double speedup!

I've found Matthew Brecknell's fast code in old Melissa O'Neill's article here
from 2007-02-19 18:14:23 GMT. Without the type signature, it's twice slower than
my code.

I think it is a fairly faithful translation now of what the sieve is all about,
except that it tests its candidate numbers whereas sieve counts over them (and
thus is able to skip over primes without testing them). The usual functional
approach has it working with each number in isolation, so tests it (to recreate
counting state in effect), thus overworks much on primes.

Next logical step is to start counting!


:)



> Jason
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: Simple FAST lazy functional primes

by Felipe Lessa :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Nov 02, 2009 at 02:10:12PM -0700, Jason Dagit wrote:
> If you are okay with Int, then maybe you're also happy with Int32 or Word32.
>  If so, why don't you use template haskell to build the list at compile
> time?  If you do that, then getting the kth prime at run-time is O(k).  Take
> that AKS!  :)

Silly you!  If he's into TH than he may just create an -- god
forbid -- array and get the kth prime in O(1).  :)

--
Felipe.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: Simple FAST lazy functional primes

by Jason Dagit-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Mon, Nov 2, 2009 at 2:40 PM, Will Ness <will_n48@...> wrote:
Jason Dagit <dagit <at> codersbase.com> writes:

> On Mon, Nov 2, 2009 at 1:59 PM, Will Ness <will_n48 <at> yahoo.com> wrote:
> Will Ness <will_n48 <at> yahoo.com> writes:
>
> One _crucial_ tidbit I've left out: _type_signature_.
> Adding (:: [Int]) speeds this code up more than TWICE!
> :) :)
>
>
> If you are okay with Int, then maybe you're also happy with Int32 or Word32.
 If so, why don't you use template haskell to build the list at compile time?
 If you do that, then getting the kth prime at run-time is O(k).  Take that
 AKS!  :)
>


O(k), I've removed it since the post actually. Wasn't thinking clearly for a
moment, having seen the double speedup!

By the way, do you understand where the speedup with Int is coming from?  As I understand it, there are two main places.  One is that the type class dictionary passing can be removed (GHC might figure this out already, I'd need to check the core to be sure).  The other is that GHC is likely unboxing to it's primitive Int# type.

If none of that made sense, or you'd like to know how you can double check what I'm talking about, then Real-World Haskell has a chapter with an example:

Good luck!
Jason

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Parent Message unknown Re: Simple FAST lazy functional primes

by Steven Chaplin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Will,

I had previously tested the Melissa O'Neil prime function (using the
primes 0.1.1 package) and found it to be fast (for a functional
approach), but with high memory use.

To fully test a prime function, I think you should:
1. Test for more than the first 10^6 primes.
Generating all primes to 10^6 is quite easy for modern computers. Most
prime generating functions will run in less than 1 sec and look fast.
Try generating all primes to 10^7 and 10^8 then you will see how 'fast'
these lazy functional methods really are.
2. Measure memory use.
As you move above 10^6 and test primes up to 10^7, 10^8, and 10^9,
memory use becomes very important. A prime function with excessive
memory use will soon consume all of the system's memory.

Here's another primes function which performs quite well.

primes :: [Int]
primes = 2 : filter (isPrime primes) [3,5..]  where
  isPrime (p:ps) n
    | mod n p == 0 = False
    | p*p > n      = True
    | otherwise    = isPrime ps n
  isPrime [] _ = False -- never used, but avoids compiler warning

Here's some results from my PC, for generating primes to 10^8.

         10**6      10**7       10**8
      secs MiB   secs MiB   secs  MiB
      -------------------------------
1     0.01   0    0.1   2      2   14
2     0.56   7   11.1  43    270  306
3     0.61   7   11.8  44    260  342
4     0.43  36    5.4 345    900 not finished

1 using a Haskell ST Array
2 your primes function
3 my primes function, listed above
4 Melissa O'Neil method from primes 0.1.1 package

To summarise the results from the tests I've run:
- if you want fast functional primes, forget it, its not possible.
- if you just want fast primes, then use C, or a Haskell array.
- if performance does not matter and slow primes are acceptable, then
use the purely functional approach.

Regards,
Steve


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jason Dagit <dagit <at> codersbase.com> writes:

> By the way, do you understand where the speedup with Int is coming from?  As I
understand it, there are two main places.  One is that the type class dictionary
passing can be removed (GHC might figure this out already, I'd need to check the
core to be sure).  The other is that GHC is likely unboxing to it's primitive
Int# type.
> [...]
> Good luck!
> Jason


Thanks!

Writing the super-fast sieve wasn't my objective here though. It rather was
writing the fastest possible simple functional lazy code true to the sieve's
definition, and understanding it better that way (that's the added bonus).

As it stands now, this code seems a rather faithful description of what _is_
sieve, except that it tests each number in isolation instead of counting over a
bunch of them at once (skipping over primes, getting them for free). THAT's the
crucial difference, which the article seems trying to explain but never quite
gets it in such simple terms. All the extra activity is kept to absolute minimum
here, and _now_ the main thing can be dealt with further, if so desired - like
turning to using the PQ thing, etc.

Then if we were to compare them, it wouldn't be like comparing apples with
orange juice. :)  That's what it felt like, seeing the PQ code compared with the
classic "naive" version in that article. I'm reasonably sure that PQ-augmented,
this code will be even faster, not slower, even for the first million primes.

This whole experience proves it that the clearest code can also be the fastest
(and may be necessarily so). Seeing it described in that article as if clarity
must be paid for with efficiency (and vice versa), didn't seem right to me.


Cheers,
 


> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Hi Steve,

Steve <stevech1097 <at> yahoo.com.au> writes:

>
> Hi Will,
>
> I had previously tested the Melissa O'Neil prime function (using the
> primes 0.1.1 package) and found it to be fast (for a functional
> approach), but with high memory use.
>
> To fully test a prime function, I think you should:
> 1. Test for more than the first 10^6 primes.
> Generating all primes to 10^6 is quite easy for modern computers. Most
> prime generating functions will run in less than 1 sec and look fast.
> Try generating all primes to 10^7 and 10^8 then you will see how 'fast'
> these lazy functional methods really are.
> 2. Measure memory use.
> As you move above 10^6 and test primes up to 10^7, 10^8, and 10^9,
> memory use becomes very important. A prime function with excessive
> memory use will soon consume all of the system's memory.
>
> Here's another primes function which performs quite well.
>
> primes :: [Int]
> primes = 2 : filter (isPrime primes) [3,5..]  where
>   isPrime (p:ps) n
>     | mod n p == 0 = False
>     | p*p > n      = True
>     | otherwise    = isPrime ps n
>   isPrime [] _ = False -- never used, but avoids compiler warning
>
> Here's some results from my PC, for generating primes to 10^8.
>
>          10**6      10**7       10**8
>       secs MiB   secs MiB   secs  MiB
>       -------------------------------
> 1     0.01   0    0.1   2      2   14
> 2     0.56   7   11.1  43    270  306
> 3     0.61   7   11.8  44    260  342
> 4     0.43  36    5.4 345    900 not finished
>
> 1 using a Haskell ST Array
> 2 your primes function
> 3 my primes function, listed above
> 4 Melissa O'Neil method from primes 0.1.1 package
>
> To summarise the results from the tests I've run:
> - if you want fast functional primes, forget it, its not possible.
> - if you just want fast primes, then use C, or a Haskell array.
> - if performance does not matter and slow primes are acceptable, then
> use the purely functional approach.
>
> Regards,
> Steve
>


you just have a fast PC that's all, :) so a million is not enough for you. My
old laptop is 50 times slower. :)

Seriously though, your conclusions seem entirely plausible to me. My goal here
was to have a Haskell code for primes, that is decent. Nothing more.

The reason your code is slightly slower is of course that in effect it
recalculates the needed primes prefix for each new candidate. If you could
somehow thread its length through, it might have sped it up some more.

I've just tried it and it was twice slower than mine. (?) I didn't use the [Int]
signature in both. Maybe it's not real issues we're dealing with here, but
compiler/OS/CPU issues? (or have you've forgotten to put the [Int] signature
into my code too, when tested? It runs twice as fast with it). Although your
code has an advantage that it is very easy to add the wheel optimization to it.

BTW I don't know about the code in the package, but the one in the article makes
terrible faux-pas of adding each prime into the queue as soon as it is produced;
this could easily account for a memory blow-up you're seeing. What's really
needed, is to plug a decent PQ implementation into my framework, which does
absolute minimum of repeated calculations it seems.

What I have now, is this:

 qprimes   = 2: 3: sieve emptyQ primes' 5  where
  primes'  = tail qprimes        
  sieve q (p:ps) x                  
           = h ++ sieve (addQ q' (2*p) (p*p+2*p)) ps (p*p+2)  
             where
               (h,q') = noComps q [x,x+2..p*p-2]
  ......

The main deficiency of list-based sieves, as I've now came to realize and
formulate in simple enough terms for myself to understand, is that they work
with each number in isolation, and thus must test primes as well as composites.
Testing composites on average is cheap, because most of them will have small
factors; testing primes is costly.

Imperative sieves avoid that by working over spans of numbers at once, so they
get their primes for free, when they "see" gaps in produced/marked composites (I
repeat myself here, but am not sure if you've read this my explanation in other
posts). What counts here, is the direct access to random memory - or the numbers
written out on a blackboard. Melissa's PQ approach tries to emulate that. Not
crossing them off, but seeing the gaps in between.

One is tempted to treat Haskell as high-level executable definition, hoping for
a compiler to turn it into the fastest possible low-level code. I know I can
translate it into C by hand fairly well; why wouldn't the compiler? It seems
Haskell compilers have much room for improvement. :)





_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Parent Message unknown Re: Simple FAST lazy functional primes

by Steven Chaplin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, 2009-11-03 at 10:52 -0500, Will wrote:

> I've just tried it and it was twice slower than mine. (?) I didn't use
> the [Int]
> signature in both. Maybe it's not real issues we're dealing with here,
> but
> compiler/OS/CPU issues? (or have you've forgotten to put the [Int]
> signature
> into my code too, when tested? It runs twice as fast with it).
> Although your
> code has an advantage that it is very easy to add the wheel
> optimization to it.

I used [Int] for both. If you didn't use [Int] then it defaults to
[Integer] and you are probably testing the speed of GMP integer
operations (rather than the speed of Haskell Int operations) which could
give differing conclusions.

I had looked into using a wheel before. Its nice in theory, but not so
useful in practice. At least that's my experience where using a wheel
made the primes function slower.

> What I have now, is this:
>
>  qprimes   = 2: 3: sieve emptyQ primes' 5  where
>   primes'  = tail qprimes        
>   sieve q (p:ps) x                  
>            = h ++ sieve (addQ q' (2*p) (p*p+2*p)) ps (p*p+2)  
>              where
>                (h,q') = noComps q [x,x+2..p*p-2]
>   ......

Adding your code and conclusions to "Prime numbers" on the Haskell wiki,
could be useful.
http://www.haskell.org/haskellwiki/Prime_numbers
I've just updated the page to split it into "Finding Primes" and
"Testing Primality".

Steve

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Simple FAST lazy functional primes

by Will Ness-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Steve <stevech1097 <at> yahoo.com.au> writes:

>
> On Tue, 2009-11-03 at 10:52 -0500, Will wrote:
> > I've just tried it and it was twice slower than mine. (?) I didn't use
> > the [Int]
> > signature in both. [...] It runs twice as fast with it).
> > Although your
> > code has an advantage that it is very easy to add the wheel
> > optimization to it.
>
> I used [Int] for both. If you didn't use [Int] then it defaults to
> [Integer] and you are probably testing the speed of GMP integer
> operations (rather than the speed of Haskell Int operations) which could
> give differing conclusions.

When both are declared [Int], the ratios are [1.50, 1.40, 1.35, 1.26, 1.26] for
speed of your version vs mine in producing 100,000, 300,000, 1, 2 and 3 million
primes, on my laptop, with memory consistently at 120% (as reported by GHCi
with :s +s switch). Haven't tried any above that, as it's old and slow. But
clearly the two versions are on par with each  other. The reason I prefer my
version is that it's in a form easy to be tweaked with further. :)


> I had looked into using a wheel before. Its nice in theory, but not so
> useful in practice. At least that's my experience where using a wheel
> made the primes function slower.

interesting.


> Adding your code and conclusions to "Prime numbers" on the Haskell wiki,
> could be useful.
> http://www.haskell.org/haskellwiki/Prime_numbers

Thank you for pointing me to that page. I've just done that. :) The short
description that I've arrived at in the end, is that my version explicitly uses
successive prefixes of the primes list to test batches of odds between
successive squares of primes. Now it's short and clear.

Thanks!


> I've just updated the page to split it into "Finding Primes" and
> "Testing Primality".
>
> Steve
>




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe