(flawed?) benchmark : sort

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 - 4 - 5 - 6 | Next >

(flawed?) benchmark : sort

by Krzysztof Skrzętnicki :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi

I was playing with various versions of sorting algorithms. I know it's very easy to create flawed benchmark and I don't claim those are good ones. However, it really seems strange to me, that sort - library function - is actually the worse measured function. I can hardly belive it, and I'd rather say I have made a mistake preparing it.

The overall winner seems to be qsort_iv - which is nothing less but old sort replaced by mergesort now.

Any clues?

Regards
Christopher Skrzętnicki

--- cut here ---
[Tener@laptener haskell]$ ghc -O2 --make qsort.hs && ./qsort +RTS -sstderr -RTS > /dev/null
[1 of 1] Compiling Main             ( qsort.hs, qsort.o )
Linking qsort ...
./qsort +RTS -sstderr
(1.0,"iv")
(1.1896770400256864,"v")
(1.3091609772011856,"treeSort")
(1.592515715933153,"vii")
(1.5953543402198838,"vi")
(1.5961286512637272,"iii")
(1.8175480563244177,"i")
(1.8771604568641642,"ii")
(2.453160634439497,"mergeSort")
(2.6627090768870216,"sort")
26,094,674,624 bytes allocated in the heap
12,716,656,224 bytes copied during GC (scavenged)
2,021,104,592 bytes copied during GC (not scavenged)
107,225,088 bytes maximum residency (140 sample(s))

      49773 collections in generation 0 ( 21.76s)
        140 collections in generation 1 ( 23.61s)

        305 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   20.39s  ( 20.74s elapsed)
  GC    time   45.37s  ( 46.22s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   65.76s  ( 66.96s elapsed)

  %GC time      69.0%  (69.0% elapsed)

  Alloc rate    1,279,723,644 bytes per MUT second

  Productivity  31.0% of total user, 30.5% of total elapsed


--- cut here ---

{-# OPTIONS_GHC -O2 #-}
module Main where

import System.CPUTime
import System.IO
import System.Environment
import System.Random
import Data.List( partition, sort )

data Tree a =
    Node (Tree a) a (Tree a)
        | Leaf


qsort_i []  = []
qsort_i (x:xs) = qsort_i (filter (< x) xs) ++ [x] ++ qsort_i (filter (>= x) xs)

qsort_ii [] = []
qsort_ii (x:xs) = let (ls,gt) = partition (< x) xs in qsort_ii ls ++ [x] ++ qsort_ii gt

qsort_iii xs = qsort_iii' [] xs
qsort_iii' acc [] = acc
qsort_iii' acc (x:xs) =
    let (ls,gt) = partition (< x) xs in
    let acc' = (x:(qsort_iii' acc gt)) in qsort_iii' acc' ls

qsort_v [] = []
qsort_v (x:xs) = let (xlt, xgt ) = foldl (\ (lt,gt) el -> case compare x el of
                                                            GT -> (el:lt, gt)
                                                            _  -> (lt, el:gt) ) ([],[]) xs
                 in qsort_v xlt ++ [x] ++ qsort_v xgt

-- zmodyfikowany i
qsort_vi [] = []
qsort_vi (x:xs) = qsort_vi (filter (\y-> compare x y == GT) xs) ++ [x] ++ qsort_vi (filter (\y-> compare x y /= GT) xs)


-- zmodyfikowany iii
qsort_vii xs = qsort_vii' [] xs
qsort_vii' acc [] = acc
qsort_vii' acc (x:xs) =
    let (ls,gt) = partition (\y-> compare x y == GT) xs in
    let acc' = (x:(qsort_vii' acc gt)) in qsort_vii' acc' ls



-- qsort is stable and does not concatenate.
qsort_iv xs = qsort_iv' (compare) xs []

qsort_iv' _   []     r = r
qsort_iv' _   [x]    r = x:r
qsort_iv' cmp (x:xs) r = qpart cmp x xs [] [] r

-- qpart partitions and sorts the sublists
qpart cmp x [] rlt rge r =
    -- rlt and rge are in reverse order and must be sorted with an
    -- anti-stable sorting
    rqsort_iv' cmp rlt (x:rqsort_iv' cmp rge r)
qpart cmp x (y:ys) rlt rge r =
    case cmp x y of
    GT -> qpart cmp x ys (y:rlt) rge r
        _  -> qpart cmp x ys rlt (y:rge) r

-- rqsort is as qsort but anti-stable, i.e. reverses equal elements
rqsort_iv' _   []     r = r
rqsort_iv' _   [x]    r = x:r
rqsort_iv' cmp (x:xs) r = rqpart cmp x xs [] [] r

rqpart cmp x [] rle rgt r =
    qsort_iv' cmp rle (x:qsort_iv' cmp rgt r)
rqpart cmp x (y:ys) rle rgt r =
    case cmp y x of
    GT -> rqpart cmp x ys rle (y:rgt) r
        _  -> rqpart cmp x ys (y:rle) rgt r


-- code by Orcus

-- Zadanie 9 - merge sort
mergeSort :: Ord a => [a] -> [a]
mergeSort []    = []
mergeSort [x]   = [x]
mergeSort xs    = let(l, r) = splitAt (length xs `quot` 2) xs
                  in mergeSortP (mergeSort l) (mergeSort r)

-- funkcja pomocnicza scalajÄ…ca dwie listy uporzÄ…dkowane w jednÄ…
mergeSortP :: Ord a => [a] -> [a] -> [a]
mergeSortP xs []            = xs
mergeSortP [] ys            = ys
mergeSortP (x:xs) (y:ys)
    | x <= y = x:(mergeSortP xs (y:ys))
    | otherwise = y:(mergeSortP (x:xs) ys)

-- Zadanie 10 - tree sort
treeSort :: Ord a => [a] -> [a]
-- pointless po raz drugi
treeSort = (treeSortInorder . treeSortToTree)
   
treeSortToTree :: Ord a => [a] -> Tree a
treeSortToTree []       = Leaf
treeSortToTree (x:xs)   = let (xlt, xgt) = foldl (\ (lt,gt) el -> case compare x el of
                                                            GT -> (el:lt, gt)
                                                            _  -> (lt, el:gt) ) ([],[]) xs
                          in Node (treeSortToTree xlt) x (treeSortToTree xgt)

treeSortInorder :: Ord a => Tree a -> [a]
treeSortInorder Leaf            = []
treeSortInorder (Node l x r)    = (treeSortInorder l) ++ [x] ++ (treeSortInorder r)

-- end code by Orcus



--
big_number = 1000000 :: Int


main = do
  gen <- getStdGen
  let xs' = randomRs (1::Int, big_number) gen
  xs <- return (take big_number xs')
  t1 <- getCPUTime
  print (qsort_i xs) -- i
  t2 <- getCPUTime
  print (qsort_ii xs) -- ii
  t3 <- getCPUTime
  print (qsort_iii xs) -- iii
  t4 <- getCPUTime
  print (qsort_iv xs) -- iv
  t5 <- getCPUTime
  print (qsort_v xs) -- v
  t6 <- getCPUTime
  print (qsort_vi xs) -- vi
  t7 <- getCPUTime
  print (qsort_vii xs) -- vii
  t8 <- getCPUTime
  print (sort xs) -- sort
  t9 <- getCPUTime
  print (mergeSort xs) -- mergeSort
  t10 <- getCPUTime
  print (treeSort xs) -- treeSort
  t11 <- getCPUTime
  let getTimes xs = zipWith (-) (tail xs) xs
  let timers = [t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11]
  let times = getTimes timers
  let table = zip times ["i","ii","iii","iv", "v", "vi", "vii", "sort","mergeSort","treeSort"]
  let sorted = sort table
  let scaled = map (\(x,n) -> (((fromIntegral x / (fromIntegral $ fst (head sorted)))::Double),n)) sorted
  let toShow = concatMap (\x-> show x ++ "\n") scaled
  hPutStr stderr toShow

main_small = do
  gen <- getStdGen
  let xs' = randomRs (1::Int, 100000) gen
  xs <- return (take big_number xs')
  t1 <- getCPUTime
  print (qsort_iv xs) -- iv
  t2 <- getCPUTime
  print (sort xs) -- sort
  t3 <- getCPUTime
  print (mergeSort xs) -- mergeSort
  t4 <- getCPUTime
  print (treeSort xs) -- treeSort
  t5 <- getCPUTime
  let getTimes xs = zipWith (-) (tail xs) xs
  let timers = [t1,t2,t3,t4,t5]
  let times = getTimes timers
  let table = zip times ["iv", "sort","mergeSort","treeSort"]
  let sorted = sort table
  let scaled = map (\(x,n) -> (((fromIntegral x / (fromIntegral $ fst (head sorted)))::Double),n)) sorted
  let toShow = concatMap (\x-> show x ++ "\n") scaled
  hPutStr stderr toShow
  hPrint stderr times

--- cut here ---

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Chaddaï Fouché-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2008/3/4, Krzysztof Skrzętnicki <gtener@...>:

> Hi
>
> I was playing with various versions of sorting algorithms. I know it's very
> easy to create flawed benchmark and I don't claim those are good ones.
> However, it really seems strange to me, that sort - library function - is
> actually the worse measured function. I can hardly belive it, and I'd rather
> say I have made a mistake preparing it.
>
> The overall winner seems to be qsort_iv - which is nothing less but old sort
> replaced by mergesort now.
>
> Any clues?
Part of what you may be missing :
-- cut here --
module Main where

import Control.Parallel.Strategies
import Control.Arrow
import System.CPUTime
import System.IO
import System.Environment
import System.Random
import Data.List( partition, sort )

data Tree a =
    Node (Tree a) a (Tree a)
        | Leaf


qsort_i []  = []
qsort_i (x:xs) = qsort_i (filter (< x) xs) ++ [x] ++ qsort_i (filter (>= x) xs)

qsort_ii [] = []
qsort_ii (x:xs) = let (ls,gt) = partition (< x) xs in qsort_ii ls ++
[x] ++ qsort_ii gt

qsort_iii xs = qsort_iii' [] xs
qsort_iii' acc [] = acc
qsort_iii' acc (x:xs) =
    let (ls,gt) = partition (< x) xs in
    let acc' = (x:(qsort_iii' acc gt)) in qsort_iii' acc' ls

qsort_v [] = []
qsort_v (x:xs) = let (xlt, xgt ) = foldl (\ (lt,gt) el -> case compare x el of
                                                            GT -> (el:lt, gt)
                                                            _  -> (lt,
el:gt) ) ([],[]) xs
                 in qsort_v xlt ++ [x] ++ qsort_v xgt

-- zmodyfikowany i
qsort_vi [] = []
qsort_vi (x:xs) = qsort_vi (filter (\y-> compare x y == GT) xs) ++ [x]
++ qsort_vi (filter (\y-> compare x y /= GT) xs)


-- zmodyfikowany iii
qsort_vii xs = qsort_vii' [] xs
qsort_vii' acc [] = acc
qsort_vii' acc (x:xs) =
    let (ls,gt) = partition (\y-> compare x y == GT) xs in
    let acc' = (x:(qsort_vii' acc gt)) in qsort_vii' acc' ls



-- qsort is stable and does not concatenate.
qsort_iv xs = qsort_iv' (compare) xs []

qsort_iv' _   []     r = r
qsort_iv' _   [x]    r = x:r
qsort_iv' cmp (x:xs) r = qpart cmp x xs [] [] r

-- qpart partitions and sorts the sublists
qpart cmp x [] rlt rge r =
    -- rlt and rge are in reverse order and must be sorted with an
    -- anti-stable sorting
    rqsort_iv' cmp rlt (x:rqsort_iv' cmp rge r)
qpart cmp x (y:ys) rlt rge r =
    case cmp x y of
    GT -> qpart cmp x ys (y:rlt) rge r
    _  -> qpart cmp x ys rlt (y:rge) r

-- rqsort is as qsort but anti-stable, i.e. reverses equal elements
rqsort_iv' _   []     r = r
rqsort_iv' _   [x]    r = x:r
rqsort_iv' cmp (x:xs) r = rqpart cmp x xs [] [] r

rqpart cmp x [] rle rgt r =
    qsort_iv' cmp rle (x:qsort_iv' cmp rgt r)
rqpart cmp x (y:ys) rle rgt r =
    case cmp y x of
    GT -> rqpart cmp x ys rle (y:rgt) r
    _  -> rqpart cmp x ys (y:rle) rgt r


-- code by Orcus

-- Zadanie 9 - merge sort
mergeSort :: Ord a => [a] -> [a]
mergeSort []    = []
mergeSort [x]   = [x]
mergeSort xs    = let(l, r) = splitAt (length xs `quot` 2) xs
                  in mergeSortP (mergeSort l) (mergeSort r)

-- funkcja pomocnicza scalajÄ…ca dwie listy uporzÄ…dkowane w jednÄ…
mergeSortP :: Ord a => [a] -> [a] -> [a]
mergeSortP xs []            = xs
mergeSortP [] ys            = ys
mergeSortP (x:xs) (y:ys)
    | x <= y = x:(mergeSortP xs (y:ys))
    | otherwise = y:(mergeSortP (x:xs) ys)

-- Zadanie 10 - tree sort
treeSort :: Ord a => [a] -> [a]
-- pointless po raz drugi
treeSort = (treeSortInorder . treeSortToTree)

treeSortToTree :: Ord a => [a] -> Tree a
treeSortToTree []       = Leaf
treeSortToTree (x:xs)   = let (xlt, xgt) = foldl (\ (lt,gt) el -> case
compare x el of
                                                            GT -> (el:lt, gt)
                                                            _  -> (lt,
el:gt) ) ([],[]) xs
                          in Node (treeSortToTree xlt) x (treeSortToTree xgt)

treeSortInorder :: Ord a => Tree a -> [a]
treeSortInorder Leaf            = []
treeSortInorder (Node l x r)    = (treeSortInorder l) ++ [x] ++
(treeSortInorder r)

-- end code by Orcus

-- begin benchmark making code

makeBenchs benchs xs = do
  let (funcNames, funcs) = unzip benchs
  tBegin <- getCPUTime
  timers <- mapM (\f-> print (f xs) >> getCPUTime) funcs
  let times = zipWith (-) timers (tBegin:timers)
      sortedResults = sort $ zip times funcNames
      minT = fromIntegral $ fst $ head sortedResults
      scaled = map (((/minT) . fromIntegral) *** id) sortedResults
  hPutStr stderr $ unlines $ map show scaled

onRandom eltCnt = do
  gen <- getStdGen
  let xs = take eltCnt (randomRs (1::Int, bigNum) gen) `using` rnf
  xs `seq` return xs

onSorted eltCnt = do
  gen <- getStdGen
  let xs = take eltCnt (randomRs (1::Int, bigNum) gen) `using` rnf
      sxs = sort xs `using` rnf
  xs `seq` sxs `seq` return sxs

bigNum = 1000000 :: Int

-- end of benchmark making code

main = makeBenchs [("i",qsort_i),
                   ("ii",qsort_ii),
                   ("iii",qsort_iii),
                   ("iv",qsort_iv),
                   ("v",qsort_v),
                   ("vi",qsort_vi),
                   ("vii",qsort_vii),
                   ("sort",sort),
                   ("mergeSort",mergeSort),
                   ("treeSort",treeSort)]
            =<< onSorted . read . head =<< getArgs

-- cut here --

It could probably be improved (with classics solution (better
selection of the pivot...)), but the mergesort is only 3 times slower
in worse case, and much more regular, if someone needs a faster sort
in a specific case, it isn't hard to code.

--
Jedaï

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi

>  -- Zadanie 9 - merge sort
>  mergeSort :: Ord a => [a] -> [a]
>  mergeSort []    = []
>  mergeSort [x]   = [x]
>  mergeSort xs    = let(l, r) = splitAt (length xs `quot` 2) xs
>                   in mergeSortP (mergeSort l) (mergeSort r)

splitAt is not a particularly good way to split a list, since you
recurse over the list twice. Try instead:

split (x:xs) = (x:b,a)
    where (a,b) = split xs
split [] = ([], [])

Perhaps adding some strictness annotations and turning the where into a case.

Also remember that a standard benchmark for sorting is an
ordered/reverse ordered list, as that causes quicksort to go to O(n^2)
given a bad pivot choice.

If the sort in the standard libraries can be improved on, it should be replaced.

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Krzysztof Skrzętnicki :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks for improved code. My point was to measure which programming patterns are faster than the others so I can learn which ones I should use. However, the thing that is really bad is the fact, that even oneliner qsort_i is faster than library sort. Which is very different from what I've expected. My intuition is only best and fastest code goes to library, to the point that people can learn from it. It seems I was mislead.


It could probably be improved (with classics solution (better
selection of the pivot...)), but the mergesort is only 3 times slower
in worse case, and much more regular, if someone needs a faster sort
in a specific case, it isn't hard to code.

--
Jedaï


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi

> My intuition is only best and fastest code goes to library, to the point
> that people can learn from it. It seems I was mislead.

The compilers change over time - meaning that the fastest code may
change over time. There is also the chance that the original code was
not the best - for example the words function in the standard library
performs two additional isSpace tests per word. The original code
specifies an interface, its thanks to people trying to beat the
performance that things improve.

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Chaddaï Fouché-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2008/3/4, Krzysztof Skrzętnicki <gtener@...>:
> Thanks for improved code. My point was to measure which programming patterns
> are faster than the others so I can learn which ones I should use. However,
> the thing that is really bad is the fact, that even oneliner qsort_i is
> faster than library sort. Which is very different from what I've expected.
> My intuition is only best and fastest code goes to library, to the point
> that people can learn from it. It seems I was mislead.

I think you did not correctly got the point of my and Neil Mitchell's
message : you benchmarked those function on a completely random
sequences so qsort was at his best, but in the real world, most
sequences would have bias, and it is not infrequent at all to sort a
partially sorted (or reverse sorted) list... In this case the
performance of all your qsort are abysmal... Which is the reason the
old sort was replaced by the actual mergesort in the library. Try my
code (with 10000 elements for example), you'll see that sort is the
best on a sorted list, and that qsort is at least 60 times slower (on
10000, in fact it is degenerating in O(n^2)).

In the real world, the library maintainers decided it was ok to pay a
slight overhead in the case where the list to sort is really randomly
distributed since mergesort won so hugely over qsort in the pretty
frequent case (in programs) of lists which present regularities.

There is no sort which is ideal in all situations, but we can try to
get a sort that works well in all situations, and don't trash in
situations not so infrequent.

(On the other hand, don't expect libraries functions to always be the
best to use in your particular situation, they tend to be all-purpose
as we just saw and the maintainers prefer simple generic
implementations rather than complicated ones who could be slightly (or
even significantly) faster in some case)

--
Jedaï

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Krzysztof Skrzętnicki :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I get it now, thanks. Also, I guess it is possible to find a better algorithm for standard library sort.

Christopher Skrzętnicki

On Wed, Mar 5, 2008 at 12:04 AM, Chaddaï Fouché <chaddai.fouche@...> wrote:
2008/3/4, Krzysztof Skrzętnicki <gtener@...>:
> Thanks for improved code. My point was to measure which programming patterns
> are faster than the others so I can learn which ones I should use. However,
> the thing that is really bad is the fact, that even oneliner qsort_i is
> faster than library sort. Which is very different from what I've expected.
> My intuition is only best and fastest code goes to library, to the point
> that people can learn from it. It seems I was mislead.

I think you did not correctly got the point of my and Neil Mitchell's
message : you benchmarked those function on a completely random
sequences so qsort was at his best, but in the real world, most
sequences would have bias, and it is not infrequent at all to sort a
partially sorted (or reverse sorted) list... In this case the
performance of all your qsort are abysmal... Which is the reason the
old sort was replaced by the actual mergesort in the library. Try my
code (with 10000 elements for example), you'll see that sort is the
best on a sorted list, and that qsort is at least 60 times slower (on
10000, in fact it is degenerating in O(n^2)).

In the real world, the library maintainers decided it was ok to pay a
slight overhead in the case where the list to sort is really randomly
distributed since mergesort won so hugely over qsort in the pretty
frequent case (in programs) of lists which present regularities.

There is no sort which is ideal in all situations, but we can try to
get a sort that works well in all situations, and don't trash in
situations not so infrequent.

(On the other hand, don't expect libraries functions to always be the
best to use in your particular situation, they tend to be all-purpose
as we just saw and the maintainers prefer simple generic
implementations rather than complicated ones who could be slightly (or
even significantly) faster in some case)

--
Jedaï


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: (flawed?) benchmark : sort

by Krzysztof Skrzętnicki :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ok, I did some search and found Data.Map, which can be used to implement pretty fast sorting:

import qualified Data.Map as Map

treeSort :: Ord a => [a] -> [a]
treeSort = map (\(x,_) -> x ) . Map.toAscList . Map.fromList . map (\x->(x,()))

In fact It is likely to behave like sort, with the exception that it is 23% faster. I did not hovever check the memory consumption. It works well on random, sorted and reverse-sorted inputs, and the speed difference is always about the same. I belive I could take Data.Map and get datatype isomorphic to specialized Data.Map a () of it, so that treeSort will became Map.toAscList . Map.fromList. This may also bring some speedup.

What do you think about this particular function?

On Tue, Mar 4, 2008 at 1:45 AM, Krzysztof Skrzętnicki <gtener@...> wrote:
Hi

I was playing with various versions of sorting algorithms. I know it's very easy to create flawed benchmark and I don't claim those are good ones. However, it really seems strange to me, that sort - library function - is actually the worse measured function. I can hardly belive it, and I'd rather say I have made a mistake preparing it.

The overall winner seems to be qsort_iv - which is nothing less but old sort replaced by mergesort now.

Any clues?

Regards
Christopher Skrzętnicki

--- cut here ---
[Tener@laptener haskell]$ ghc -O2 --make qsort.hs && ./qsort +RTS -sstderr -RTS > /dev/null
[1 of 1] Compiling Main             ( qsort.hs, qsort.o )
Linking qsort ...
./qsort +RTS -sstderr
(1.0,"iv")
(1.1896770400256864,"v")
(1.3091609772011856,"treeSort")
(1.592515715933153,"vii")
(1.5953543402198838,"vi")
(1.5961286512637272,"iii")
(1.8175480563244177,"i")
(1.8771604568641642,"ii")
(2.453160634439497,"mergeSort")
(2.6627090768870216,"sort")
26,094,674,624 bytes allocated in the heap
12,716,656,224 bytes copied during GC (scavenged)
2,021,104,592 bytes copied during GC (not scavenged)
107,225,088 bytes maximum residency (140 sample(s))

      49773 collections in generation 0 ( 21.76s)
        140 collections in generation 1 ( 23.61s)

        305 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   20.39s  ( 20.74s elapsed)
  GC    time   45.37s  ( 46.22s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   65.76s  ( 66.96s elapsed)

  %GC time      69.0%  (69.0% elapsed)

  Alloc rate    1,279,723,644 bytes per MUT second

  Productivity  31.0% of total user, 30.5% of total elapsed


--- cut here ---

{-# OPTIONS_GHC -O2 #-}
module Main where

import System.CPUTime
import System.IO
import System.Environment
import System.Random
import Data.List( partition, sort )

data Tree a =
    Node (Tree a) a (Tree a)
        | Leaf


qsort_i []  = []
qsort_i (x:xs) = qsort_i (filter (< x) xs) ++ [x] ++ qsort_i (filter (>= x) xs)

qsort_ii [] = []
qsort_ii (x:xs) = let (ls,gt) = partition (< x) xs in qsort_ii ls ++ [x] ++ qsort_ii gt

qsort_iii xs = qsort_iii' [] xs
qsort_iii' acc [] = acc
qsort_iii' acc (x:xs) =
    let (ls,gt) = partition (< x) xs in
    let acc' = (x:(qsort_iii' acc gt)) in qsort_iii' acc' ls

qsort_v [] = []
qsort_v (x:xs) = let (xlt, xgt ) = foldl (\ (lt,gt) el -> case compare x el of
                                                            GT -> (el:lt, gt)
                                                            _  -> (lt, el:gt) ) ([],[]) xs
                 in qsort_v xlt ++ [x] ++ qsort_v xgt

-- zmodyfikowany i
qsort_vi [] = []
qsort_vi (x:xs) = qsort_vi (filter (\y-> compare x y == GT) xs) ++ [x] ++ qsort_vi (filter (\y-> compare x y /= GT) xs)


-- zmodyfikowany iii
qsort_vii xs = qsort_vii' [] xs
qsort_vii' acc [] = acc
qsort_vii' acc (x:xs) =
    let (ls,gt) = partition (\y-> compare x y == GT) xs in
    let acc' = (x:(qsort_vii' acc gt)) in qsort_vii' acc' ls



-- qsort is stable and does not concatenate.
qsort_iv xs = qsort_iv' (compare) xs []

qsort_iv' _   []     r = r
qsort_iv' _   [x]    r = x:r
qsort_iv' cmp (x:xs) r = qpart cmp x xs [] [] r

-- qpart partitions and sorts the sublists
qpart cmp x [] rlt rge r =
    -- rlt and rge are in reverse order and must be sorted with an
    -- anti-stable sorting
    rqsort_iv' cmp rlt (x:rqsort_iv' cmp rge r)
qpart cmp x (y:ys) rlt rge r =
    case cmp x y of
    GT -> qpart cmp x ys (y:rlt) rge r
        _  -> qpart cmp x ys rlt (y:rge) r

-- rqsort is as qsort but anti-stable, i.e. reverses equal elements
rqsort_iv' _   []     r = r
rqsort_iv' _   [x]    r = x:r
rqsort_iv' cmp (x:xs) r = rqpart cmp x xs [] [] r

rqpart cmp x [] rle rgt r =
    qsort_iv' cmp rle (x:qsort_iv' cmp rgt r)
rqpart cmp x (y:ys) rle rgt r =
    case cmp y x of
    GT -> rqpart cmp x ys rle (y:rgt) r
        _  -> rqpart cmp x ys (y:rle) rgt r


-- code by Orcus

-- Zadanie 9 - merge sort
mergeSort :: Ord a => [a] -> [a]
mergeSort []    = []
mergeSort [x]   = [x]
mergeSort xs    = let(l, r) = splitAt (length xs `quot` 2) xs
                  in mergeSortP (mergeSort l) (mergeSort r)

-- funkcja pomocnicza scalajÄ…ca dwie listy uporzÄ…dkowane w jednÄ…
mergeSortP :: Ord a => [a] -> [a] -> [a]
mergeSortP xs []            = xs
mergeSortP [] ys            = ys
mergeSortP (x:xs) (y:ys)
    | x <= y = x:(mergeSortP xs (y:ys))
    | otherwise = y:(mergeSortP (x:xs) ys)

-- Zadanie 10 - tree sort
treeSort :: Ord a => [a] -> [a]
-- pointless po raz drugi
treeSort = (treeSortInorder . treeSortToTree)
   
treeSortToTree :: Ord a => [a] -> Tree a
treeSortToTree []       = Leaf
treeSortToTree (x:xs)   = let (xlt, xgt) = foldl (\ (lt,gt) el -> case compare x el of
                                                            GT -> (el:lt, gt)
                                                            _  -> (lt, el:gt) ) ([],[]) xs
                          in Node (treeSortToTree xlt) x (treeSortToTree xgt)

treeSortInorder :: Ord a => Tree a -> [a]
treeSortInorder Leaf            = []
treeSortInorder (Node l x r)    = (treeSortInorder l) ++ [x] ++ (treeSortInorder r)

-- end code by Orcus



--
big_number = 1000000 :: Int


main = do
  gen <- getStdGen
  let xs' = randomRs (1::Int, big_number) gen
  xs <- return (take big_number xs')
  t1 <- getCPUTime
  print (qsort_i xs) -- i
  t2 <- getCPUTime
  print (qsort_ii xs) -- ii
  t3 <- getCPUTime
  print (qsort_iii xs) -- iii
  t4 <- getCPUTime
  print (qsort_iv xs) -- iv
  t5 <- getCPUTime
  print (qsort_v xs) -- v
  t6 <- getCPUTime
  print (qsort_vi xs) -- vi
  t7 <- getCPUTime
  print (qsort_vii xs) -- vii
  t8 <- getCPUTime
  print (sort xs) -- sort
  t9 <- getCPUTime
  print (mergeSort xs) -- mergeSort
  t10 <- getCPUTime
  print (treeSort xs) -- treeSort
  t11 <- getCPUTime
  let getTimes xs = zipWith (-) (tail xs) xs
  let timers = [t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11]
  let times = getTimes timers
  let table = zip times ["i","ii","iii","iv", "v", "vi", "vii", "sort","mergeSort","treeSort"]
  let sorted = sort table
  let scaled = map (\(x,n) -> (((fromIntegral x / (fromIntegral $ fst (head sorted)))::Double),n)) sorted
  let toShow = concatMap (\x-> show x ++ "\n") scaled
  hPutStr stderr toShow

main_small = do
  gen <- getStdGen
  let xs' = randomRs (1::Int, 100000) gen
  xs <- return (take big_number xs')
  t1 <- getCPUTime
  print (qsort_iv xs) -- iv
  t2 <- getCPUTime
  print (sort xs) -- sort
  t3 <- getCPUTime
  print (mergeSort xs) -- mergeSort
  t4 <- getCPUTime
  print (treeSort xs) -- treeSort
  t5 <- getCPUTime
  let getTimes xs = zipWith (-) (tail xs) xs
  let timers = [t1,t2,t3,t4,t5]
  let times = getTimes timers
  let table = zip times ["iv", "sort","mergeSort","treeSort"]
  let sorted = sort table
  let scaled = map (\(x,n) -> (((fromIntegral x / (fromIntegral $ fst (head sorted)))::Double),n)) sorted
  let toShow = concatMap (\x-> show x ++ "\n") scaled
  hPutStr stderr toShow
  hPrint stderr times

--- cut here ---


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Dan Doel :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sunday 09 March 2008, Krzysztof Skrzętnicki wrote:

> Ok, I did some search and found Data.Map, which can be used to implement
> pretty fast sorting:
>
> import qualified Data.Map as Map
>
> treeSort :: Ord a => [a] -> [a]
> treeSort = map (\(x,_) -> x ) . Map.toAscList . Map.fromList . map
> (\x->(x,()))
>
> In fact It is likely to behave like sort, with the exception that it is 23%
> faster. I did not hovever check the memory consumption. It works well on
> random, sorted and reverse-sorted inputs, and the speed difference is
> always about the same. I belive I could take Data.Map and get datatype
> isomorphic to specialized *Data.Map a ()* of it, so that treeSort will
> became Map.toAscList . Map.fromList. This may also bring some speedup.
>
> What do you think about this particular function?

Some thoughts:

1) To get your function specifically, you could just use Data.Set.Set a
instead of Map a ().

2) What does it do with duplicate elements in the list? I expect it deletes
them. To avoid this, you'd need to use something like fromListWith, keeping
track of how many duplicates there are, and expanding at the end.

3) I imagine the time taken to get any output is always O(n*log n). Various
lazy sorts can be written (and I'd guess the standard library sort is written
this way, although I don't know for sure) such that 'head (sort l)' is O(n),
or O(n + k*log n) for getting the first k elements. However, Map, being a
balanced binary tree, doesn't (I think) have the right characteristics for
this.

At the very least, you'll probably want to test with a function that doesn't
delete duplicate elements. Something like this:

  treeSort = concatMap (\(x,n) -> replicate n x)
              . Map.toAscList . Map.fromListWith (+) . map (\x -> (x,1))

-- Dan
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Don Stewart-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

dan.doel:

> On Sunday 09 March 2008, Krzysztof Skrzętnicki wrote:
> > Ok, I did some search and found Data.Map, which can be used to implement
> > pretty fast sorting:
> >
> > import qualified Data.Map as Map
> >
> > treeSort :: Ord a => [a] -> [a]
> > treeSort = map (\(x,_) -> x ) . Map.toAscList . Map.fromList . map
> > (\x->(x,()))
> >
> > In fact It is likely to behave like sort, with the exception that it is 23%
> > faster. I did not hovever check the memory consumption. It works well on
> > random, sorted and reverse-sorted inputs, and the speed difference is
> > always about the same. I belive I could take Data.Map and get datatype
> > isomorphic to specialized *Data.Map a ()* of it, so that treeSort will
> > became Map.toAscList . Map.fromList. This may also bring some speedup.
> >
> > What do you think about this particular function?
>
> Some thoughts:
>
> 1) To get your function specifically, you could just use Data.Set.Set a
> instead of Map a ().
>
> 2) What does it do with duplicate elements in the list? I expect it deletes
> them. To avoid this, you'd need to use something like fromListWith, keeping
> track of how many duplicates there are, and expanding at the end.

And a little QuickCheck to help things along:

    import qualified Data.Map as Map
    import Data.List
    import Test.QuickCheck

    treeSort :: Ord a => [a] -> [a]
    treeSort = map (\(x,_) -> x ) . Map.toAscList . Map.fromList . map (\x->(x,()))

    main = quickCheck prop_sort

    prop_sort xs = sort xs == treeSort xs
        where _ = xs :: [Int]

Running:

    $ runhaskell A.hs
    Falsifiable, after 11 tests:
    [-2,-2,5]
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Duncan Coutts :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Sun, 2008-03-09 at 23:04 -0400, Dan Doel wrote:
> On Sunday 09 March 2008, Krzysztof Skrzętnicki wrote:

> > What do you think about this particular function?
>
> Some thoughts:
>
> 1) To get your function specifically, you could just use Data.Set.Set a
> instead of Map a ().
>
> 2) What does it do with duplicate elements in the list? I expect it deletes
> them. To avoid this, you'd need to use something like fromListWith, keeping
> track of how many duplicates there are, and expanding at the end.
>
> 3) I imagine the time taken to get any output is always O(n*log n). Various
> lazy sorts can be written (and I'd guess the standard library sort is written
> this way, although I don't know for sure) such that 'head (sort l)' is O(n),
> or O(n + k*log n) for getting the first k elements. However, Map, being a
> balanced binary tree, doesn't (I think) have the right characteristics for
> this.

Sounds to me like we should try a heap sort. As a data structure it
should have similar constant factors to Data.Map (or .Set) but a heap is
less ordered than a search tree and gives the O(n + k*log n) property.

Duncan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi

>  2) What does it do with duplicate elements in the list? I expect it deletes
>  them. To avoid this, you'd need to use something like fromListWith, keeping
>  track of how many duplicates there are, and expanding at the end.

That would be wrong. Consider:

data Foo = Foo Int Int

instance Ord Foo where
    compare (Foo a _) (Foo b _) = compare a b

sort [Foo 1 2, Foo 1 -2] must return the original list back, in that
order. You cannot delete things and duplicate them later. To guarantee
the ordering you must also do other stuff.

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi

Can whoever picks this up please try the sort code from Yhc in their
comparisons. In my benchmarks it ran up to twice as fast as the GHC
code. http://darcs.haskell.org/yhc/src/packages/yhc-base-1.0/Data/List.hs

I think what we really need is first quickCheck and timing framework
for measuring sorts. After we have decided what makes one sort
faster/better than another, then is the time to start deciding what
sort is the best one. Ian did some initial work on this:
http://www.haskell.org/pipermail/glasgow-haskell-users/2002-May/003376.html

Until the sort-check package is uploaded to hackage I think most
people will find it hard to be convinced of one sort over another.

Thanks

Neil


On Mon, Mar 10, 2008 at 8:27 AM, Neil Mitchell <ndmitchell@...> wrote:

> Hi
>
>
>  >  2) What does it do with duplicate elements in the list? I expect it deletes
>  >  them. To avoid this, you'd need to use something like fromListWith, keeping
>  >  track of how many duplicates there are, and expanding at the end.
>
>  That would be wrong. Consider:
>
>  data Foo = Foo Int Int
>
>  instance Ord Foo where
>     compare (Foo a _) (Foo b _) = compare a b
>
>  sort [Foo 1 2, Foo 1 -2] must return the original list back, in that
>  order. You cannot delete things and duplicate them later. To guarantee
>  the ordering you must also do other stuff.
>
>  Thanks
>
>  Neil
>
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Malcolm Wallace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On 10 Mar 2008, at 08:36, Neil Mitchell wrote:
> Can whoever picks this up please try the sort code from Yhc in their
> comparisons. In my benchmarks it ran up to twice as fast as the GHC
> code. http://darcs.haskell.org/yhc/src/packages/yhc-base-1.0/Data/List.hs

I believe the Yhc sort implementation is faster because Lennart did  
some extensive performance tuning of sorting with hbc, about ten years  
ago, and contributed the resulting winner to nhc98 way back then.

Regards,
     Malcolm

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Adrian Hey :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Neil Mitchell wrote:

>>  2) What does it do with duplicate elements in the list? I expect it deletes
>>  them. To avoid this, you'd need to use something like fromListWith, keeping
>>  track of how many duplicates there are, and expanding at the end.
>
> That would be wrong. Consider:
>
> data Foo = Foo Int Int
>
> instance Ord Foo where
>     compare (Foo a _) (Foo b _) = compare a b

I would consider such an Ord instance to be hopelessly broken, and I
don't think it's the responsibility of authors of functions with Ord
constraints in their sigs (such as sort) to consider such possibilities
or specify and control the behaviour of their behaviour for such
instances. Trying to do this is what leads to horrors such as the "left
biasing" of Data.Map (for example).

Unfortunately the Haskell standards don't currently specify sane laws
for Eq and Ord class instances, but they should. Otherwise knowing a
type is an instance of Ord tells me nothing that I can rely on.

Regards
--
Adrian Hey

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Adrian Hey :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Adrian Hey wrote:

> or specify and control the behaviour of their behaviour for such
> instances.

Urk, sorry for the gibberish. I guess I should get into the habit of
reading what I write before posting :-)

Regards
--
Adrian Hey

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Luke Palmer-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Mar 10, 2008 at 11:00 AM, Adrian Hey <ahey@...> wrote:

> Neil Mitchell wrote:
>  >>  2) What does it do with duplicate elements in the list? I expect it deletes
>  >>  them. To avoid this, you'd need to use something like fromListWith, keeping
>  >>  track of how many duplicates there are, and expanding at the end.
>  >
>  > That would be wrong. Consider:
>  >
>  > data Foo = Foo Int Int
>  >
>  > instance Ord Foo where
>  >     compare (Foo a _) (Foo b _) = compare a b
>
>  I would consider such an Ord instance to be hopelessly broken, and I
>  don't think it's the responsibility of authors of functions with Ord
>  constraints in their sigs (such as sort) to consider such possibilities
>  or specify and control the behaviour of their behaviour for such
>  instances. Trying to do this is what leads to horrors such as the "left
>  biasing" of Data.Map (for example).

It could be.  I actually don't know what Haskell specifies here.  If a
type has an Eq instance and x == y, must x and y be observationally
equivalent?  Or is it reasonable to allow some wiggle room?  I'd say
(==) definitely has to be an equivalence relation, but beyond that,
it's open to the implementor, since if an algorithm only depends on
(Eq a), it can't tell the difference between observational equality
and any other equivalence relation.  But that's just one argument ("by
example", in a way).  That is, an argument that this is hopelessly
broken isn't trivial, it needs to be defended.

There is nonetheless a need to handle duplicates gracefully, that is
keeping a count won't cut it, because of sortBy.

Luke
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Neil Mitchell :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi

>  > instance Ord Foo where
>  >     compare (Foo a _) (Foo b _) = compare a b
>
>  I would consider such an Ord instance to be hopelessly broken, and I
>  don't think it's the responsibility of authors of functions with Ord
>  constraints in their sigs (such as sort) to consider such possibilities
>  or specify and control the behaviour of their behaviour for such
>  instances. Trying to do this is what leads to horrors such as the "left
>  biasing" of Data.Map (for example).

The sort in Haskell is specified to be "stable". What that means is
that the above ord relation is fine. The above ordering observes all
the necessary mathematical definitions of ordering, assuming an Eq of:

instance Eq Foo where
    (==) (Foo a _) (Foo b _) = (==) a b

>  Unfortunately the Haskell standards don't currently specify sane laws
>  for Eq and Ord class instances, but they should. Otherwise knowing a
>  type is an instance of Ord tells me nothing that I can rely on.

Please give the sane law that this ordering violates. I can't see any!

What if I had made the definition of Foo:

data Foo = Foo Int (Int -> Int)

Now, is the only possible answer that any Ord instance for Foo is wrong?

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re[2]: Re: (flawed?) benchmark : sort

by Bulat Ziganshin-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Adrian,

Monday, March 10, 2008, 2:00:18 PM, you wrote:

>> instance Ord Foo where
>>     compare (Foo a _) (Foo b _) = compare a b

> I would consider such an Ord instance to be hopelessly broken, and I

hmmmm. for example, imagine files in file manager sorted by size or date


--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@...

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: (flawed?) benchmark : sort

by Adrian Hey :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Neil Mitchell wrote:

> Hi
>
>>  > instance Ord Foo where
>>  >     compare (Foo a _) (Foo b _) = compare a b
>>
>>  I would consider such an Ord instance to be hopelessly broken, and I
>>  don't think it's the responsibility of authors of functions with Ord
>>  constraints in their sigs (such as sort) to consider such possibilities
>>  or specify and control the behaviour of their behaviour for such
>>  instances. Trying to do this is what leads to horrors such as the "left
>>  biasing" of Data.Map (for example).
>
> The sort in Haskell is specified to be "stable". What that means is
> that the above ord relation is fine. The above ordering observes all
> the necessary mathematical definitions of ordering, assuming an Eq of:
>
> instance Eq Foo where
>     (==) (Foo a _) (Foo b _) = (==) a b
>
>>  Unfortunately the Haskell standards don't currently specify sane laws
>>  for Eq and Ord class instances, but they should. Otherwise knowing a
>>  type is an instance of Ord tells me nothing that I can rely on.
>
> Please give the sane law that this ordering violates. I can't see any!

The Eq instance you've given violates the law that (x == y) = True
implies x = y. Of course the Haskell standard doesn't specify this law,
but it should.

The Haskell standard doen't even specify that compare x y = EQ implies
(x == y) = True, but again it should (what's the purpose of the Eq
constraint on Ord class otherwise).

> What if I had made the definition of Foo:
>
> data Foo = Foo Int (Int -> Int)
>
> Now, is the only possible answer that any Ord instance for Foo is wrong?

Yes, if the Foo constuctor is exported. If it's scope confined to one
module then you could maintain the invariant that the same function is
always associated with a given Int. However, if this is the case then
the issue you raise wrt sort behaviour is irrelevant.

Regards
--
Adrian Hey

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe
< Prev | 1 - 2 - 3 - 4 - 5 - 6 | Next >