excercise - a completely lazy sorting algorithm

View: New views
19 Messages — Rating Filter:   Alert me  

excercise - a completely lazy sorting algorithm

by Petr Pudlak (Debian) :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all,

about a month ago, we were discussing sorting in Haskell with a friend. We
realized a nice property of lazy merge-sort (which is AFAIK the implementation
of Data.List.sort): Getting the first element of the sorted list actually
requires O(n) time, not O(n * log n) as in imperative languages. And
immediately we asked: Would it be possible to create a lazy selection/sorting
algorithm so that getting any element of the sorted list/array by its index
would require just O(n) time, and getting all the elements would still be in
O(n * log n)?

More precisely: The result of sorting an n-element list or array should be a
structure that allows to ask i-th element of the result (for example, a lazy
array). Asking k arbitrary elements should take no more than
  O(min(n * log n, k * n))

I believe that this could be done, but so far I wasn't able to implement and
show it myself. I think the solution would be somewhat modified lazy quicksort
(or "Median of Medians algorithm" if we want to be sure that we get good
pivots).

Perhaps somebody else would also like to give it a try? Or perhaps explain me
why it's not possible?

    Best regards,
    Petr
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Interesting problem.  I have been toying with the same problem for some time.

To solve the problem in theory, I'd concentrate on getting the number
of comparisons into the required O(n) resp. O(n log n) ranges.
Afterwards we can think about building the infrastructure to keep the
number of all operations (book keeping..) in those bounds, too.

Anyway, I'll give a solution to the problem using a randomized
quicksort, soon.  Later we can replace the randomized pivote-picking
with a deteministic linear-median algorithm.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Andrew Hunter-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Jul 6, 2009 at 12:26 PM, Petr Pudlak<deb@...> wrote:
> More precisely: The result of sorting an n-element list or array should be a
> structure that allows to ask i-th element of the result (for example, a lazy
> array). Asking k arbitrary elements should take no more than
>  O(min(n * log n, k * n))
>

I'd argue this is not really a true use of "laziness" and more just
amortized cost.  That said, yes, it can be done!

I'm going to use an imperative data structure because I hate you
all</sarcasm> and it makes the analysis/presentation slightly easier.

To wit, we have a sorted array of unsorted bags of inputs.  For
example, (suppose we start with an array of numbers 1..10, scrambled),
we might start with:

(10,{1,5,6,8,3,4,2,7,9,10})

And after some operations, it might look like:

(3,{1,3,2});(4,{6,7,5,4});(3:{8,9,10})

And we're trying to (eventually) hit the state:

(1,{1});(2,{2});...etc.

(It's not hard to see how to in constant time maintain an array of
pointers into these bags to allow finding elements.)

Now, using a linear-time selection algorithm like median-of-medians or
Chazelle's, it's easy to see how to find the k-th largest element in
such a data structure: find the bag containing the k-th element, apply
the selection algorithm.  Furthermore, still in linear time, we can
(while we're working) split the bag around that element we found.  So,
given the data state:

(10,{1,5,6,8,3,4,2,7,9,10})

if we're asked for the 4th largest element, we'll update to:

(3,{1,3,2});(1,{4});(6,{5,6,8,7,9,10})

So far so good, right?  Now we just apply one more change: after each
lookup, we walk across the list of bags, and split each in half (as if
we were asked for the median element of each bag.)  In my running
example, we'd go to:

(1,{1});{1,{2});(1,{3});(1,{4});(2,{5,6});(1,{7});(3,{8,9,10})

This is still linear time--we run a linear-time algorithm on disjoint
subsets of the input.  Now, note: after k lookups to this structure,
the largest bag is at most n/2^k elements long!  Couple with a slight
optimization that overwrites the lookup table into the bags with just
the correct results once the bags are small enough, and this matches
your time bounds quite nicely.

Now, I doubt it'll be fast, but that's not what you asked.  Plus, come
up with a persuasive use case for a system where you /really need/ the
4th, 27th, and 957th largest elements /now/ and none of the rest, and
I promise I'll make something practically efficient. :)

If someone can translate my algorithm into a non-side-effecting one,
I'd appreciate it, but I think that like disjoint set/union, this is
probably inherently side-effecting.  Yes, yes, we could use functional
arrays, but short of that I don't see a way to avoid side effects to
take care of my amortized work.

AHH
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> If someone can translate my algorithm into a non-side-effecting one,
> I'd appreciate it, but I think that like disjoint set/union, this is
> probably inherently side-effecting.  Yes, yes, we could use functional
> arrays, but short of that I don't see a way to avoid side effects to
> take care of my amortized work.

I am just working on a side-effect-free version.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Mads Lindstrøm :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Petr,

Maybe this will give inspiration
http://en.wikipedia.org/wiki/Selection_algorithm

It seems to me, that you just need a selection algorithm which works in
O(n * k) time for k arbitrary elements. If you combine O(n*k) selection
algorithm with any O(n * lg n) sort, you furfil your time constrain.


Regards,

Mads

> Hi all,
>
> about a month ago, we were discussing sorting in Haskell with a friend. We
> realized a nice property of lazy merge-sort (which is AFAIK the implementation
> of Data.List.sort): Getting the first element of the sorted list actually
> requires O(n) time, not O(n * log n) as in imperative languages. And
> immediately we asked: Would it be possible to create a lazy selection/sorting
> algorithm so that getting any element of the sorted list/array by its index
> would require just O(n) time, and getting all the elements would still be in
> O(n * log n)?
>
> More precisely: The result of sorting an n-element list or array should be a
> structure that allows to ask i-th element of the result (for example, a lazy
> array). Asking k arbitrary elements should take no more than
>   O(min(n * log n, k * n))
>
> I believe that this could be done, but so far I wasn't able to implement and
> show it myself. I think the solution would be somewhat modified lazy quicksort
> (or "Median of Medians algorithm" if we want to be sure that we get good
> pivots).
>
> Perhaps somebody else would also like to give it a try? Or perhaps explain me
> why it's not possible?
>
>     Best regards,
>     Petr
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@...
> http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (204 bytes) Download Attachment

Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> It seems to me, that you just need a selection algorithm which works in
> O(n * k) time for k arbitrary elements. If you combine O(n*k) selection
> algorithm with any O(n * lg n) sort, you furfil your time constrain.

I guess, we also want the list to be sorted in O(1) after having
selected every element.

Matthias.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Andrew Hunter-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Jul 6, 2009 at 4:32 PM, Matthias
Görgens<matthias.goergens@...> wrote:
>> It seems to me, that you just need a selection algorithm which works in
>> O(n * k) time for k arbitrary elements. If you combine O(n*k) selection
>> algorithm with any O(n * lg n) sort, you furfil your time constrain.
>
> I guess, we also want the list to be sorted in O(1) after having
> selected every element.
>

I think he's suggesting something along the lines of: for the first
\log n requests, use a selection.  On the \log n + 1th request, just
sort the whole thing.  This obviously isn't the spirit of what's
wanted, but does in fact meet the time bounds.

AHH
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The "sorted array of bags of unsorted input" is a nice idea.  However,
you have to use the data structure in a single-threaded [1] fashion to
obtain the claimed bounds.

Here's a pure solution that uses amortization and laziness.

> import qualified Data.Sequence as S
> import Data.Sequence ((><))
> import Data.Foldable
> import Data.Monoid

Suppose we have a function to find the the median of a list, and
partition it into three sublists: Smaller than the median, equal to
the media, larger than the median.  That function should run in linear
time.

> partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

where the following data structure holds the sublists and some
bookkeeping information:

> data BTreeRaw a m = Leaf
>                   | Node {cmp::(a->Ordering)
>                          , lN :: Int
>                          , less::m
>                          , eq :: (S.Seq a)
>                          , gN :: Int
>                          , greater::m
>                          }

where 'lN' and 'gN' are the length of 'less' and 'greater'.

We can make BTreeRaw a functor:

> instance Functor (BTreeRaw a) where
>     fmap f Leaf = Leaf
>     fmap f (Node c lN l e gN g) = Node c lN (f l) e gN (f g)

Now using a fixed-point construction we can bootstrap a sorting
algorithm from partitionOnMedian:

> data Fix m = Fix {unfix :: (m (Fix m))}
> type BTree a = Fix (BTreeRaw a)

> treeSort :: forall a. (Ord a) => S.Seq a -> BTree a
> treeSort = Fix . helper . partitionOnMedian
>     where helper = fmap (Fix . helper . partitionOnMedian)

Now treeSort produces the thunk of a balanced binary search tree.  Of
course we can get a sorted list out of it (forcing the whole
structure):

> flatten :: BTree a -> S.Seq a
> flatten (Fix Leaf) = S.empty
> flatten (Fix (Node _ lN l e gN g)) = flatten l >< e >< flatten g

> mySort = flatten . treeSort

But we can also get elements efficently, forcing only a linear amount
of comparisions in the worst case:

> index :: BTree a -> Int -> a
> index (Fix Leaf) _ = error "tried to get an element of Leaf"
> index (Fix (Node lN l e gN g)) i | i < lN
>                                      = index l i
>                                  | i - lN < S.length e
>                                      = S.index e (i-lN)
>                                  | i - lN - S.length e < gN
>                                      = index g (i - lN - S.length e)
>                                  | i - lN - S.length e - gN >= 0
>                                      = error "index out of bounds"

Although we do have to force comparisions only once every time we
touch the same element in the tree, we do still have to traverse the
tree (in logarithmic time).

If you want linear time access on first touch of an element and
constant time access afterwards us toArray:

> toArray :: (IA.IArray a t) => Fix (BTreeRaw t) -> a Int t
> toArray tree = IA.listArray (0,maxI) (map (index tree) [0..maxI])
>     where size (Fix Leaf) = 0
>           size (Fix (Node lN _ e gN _)) = lN + S.length e + gN
>           maxI = size tree - 1

[1] Single-Threaded in the sense of Okasaki's use of the word.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Heinrich Apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Petr Pudlak wrote:
> about a month ago, we were discussing sorting in Haskell with a friend. We
> realized a nice property of lazy merge-sort (which is AFAIK the implementation
> of Data.List.sort): Getting the first element of the sorted list actually
> requires O(n) time, not O(n * log n) as in imperative languages.

Similarly, taking the first  k  elements will take  O(n + k log n)
time. And this also works for the standard lazy quicksort. See also

   http://apfelmus.nfshost.com/quicksearch.html

> And immediately we asked: Would it be possible to create a lazy
> selection/sorting algorithm so that getting any element of the sorted
> list/array by its index would require just O(n) time, and getting all the
> elements would still be in O(n * log n)?
>
> More precisely: The result of sorting an n-element list or array should be a
> structure that allows to ask i-th element of the result (for example, a lazy
> array). Asking k arbitrary elements should take no more than
>   O(min(n * log n, k * n))

If you want to take elements only from the beginning, then using a list
as result type is enough. But you want random access, and you rightly
note that this requires some other data structure to store the sorted
result, simply because

   xs !! k    is at least  O(k)  time

So, a tree like Matthias implements it is the way to go. Basically, it
reifies the recursive calls of quicksort as a lazy data struture which
can be evaluated piecemeal.

(If you don't care about the  O(k)  inherent in  xs !! k  and only ask
about the number of comparisons it takes to evaluate  xs !! k , then it
is possible to make the standard  quicksort  slightly lazier so that
this works, too. Details in the link given above.)



Regards,
apfelmus

--
http://apfelmus.nfshost.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> So, a tree like Matthias implements it is the way to go. Basically, it
> reifies the recursive calls of quicksort as a lazy data struture which
> can be evaluated piecemeal.

Yes.  I wonder if it is possible to use a standard (randomized
quicksort) and employ some type class magic (like continuations) to
make the reification [1] transparent to the code.

Matthias.
[1] I reified reify.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Heinrich Apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Matthias Görgens wrote:

>> So, a tree like Matthias implements it is the way to go. Basically, it
>> reifies the recursive calls of quicksort as a lazy data struture which
>> can be evaluated piecemeal.
>
> Yes.  I wonder if it is possible to use a standard (randomized
> quicksort) and employ some type class magic (like continuations) to
> make the reification [1] transparent to the code.
>
> Matthias.
> [1] I reified reify.

Well, you can perform an abstraction similar to what leads to the
definition of  fold .

Starting with say

   sum []     = 0
   sum (x:xs) = x + sum xs

we can replace the special values 0 and + by variables, leading to a
function

   (fold f z) []     = z
   (fold f z) (x:xs) = x `f` (fold f z) xs

In other words, if we specialize the variables to 0 and + again, we get

   fold (+) 0 = sum


Similarly, we can start with quicksort

   quicksort :: Ord a => [a] -> [a]
   quicksort []     = []
   quicksort (x:xs) = quicksort ls ++ [x] ++ quicksort rs
       where
       ls = filter (<= x) xs
       rs = filter (>  x) xs

and replace the operations on the return type with variables

   quicksort :: Ord a => (a -> b -> b -> b) -> b -> [a] -> b
   quicksort f z []     = z
   quicksort f z (x:xs) = f x (quicksort f z ls) (quicksort f z rs)
       where
       ls = filter (<= x) xs
       rs = filter (>  x) xs

Note however that this is not quite enough to handle the case of a
random access tree like you wrote it, because the latter depends on the
fact that quicksorts *preserves* the length of the list. What I mean is
that we have to keep track of the lengths of the lists, for instance
like this:

   quicksort :: Ord a =>
                (a -> (Int,b) -> (Int,b) -> b) -> b -> [a] -> (Int,b)
   quicksort f z []     = (0,z)
   quicksort f z (x:xs) =
       (length xs + 1, f x (quicksort f z ls) (quicksort f z rs))
       where
       ls = filter (<= x) xs
       rs = filter (>  x) xs

And we can implement a random access tree

   type Sized a = (Int, a)
   size = fst

   data Tree a  = Empty
                | Branch a (Sized (Tree a)) (Sized (Tree a))

   mySort :: [a] -> Sized (Tree a)
   mySort = quicksort Branch Empty

   index :: Sized (Tree a) -> Int -> Maybe a
   index (0,Empty)        _       = Nothing
   index (n,Branch x a b) k
       | 1 <= k && k <= size a    = index a k
       | k == size a + 1          = Just x
       | size a + 1 < k && k <= n = index b (k - size a - 1)
       | otherwise                = Nothing

or an ordinary sort

   qsort :: Ord a => [a] -> [a]
   qsort = quicksort (\x a b -> snd a ++ [x] ++ snd b) []



Regards,
apfelmus

--
http://apfelmus.nfshost.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Interesting.  Can you make the definition of quicksort non-recursive,
too?  Perhaps with help of a bootstrapping combinator like the one
implicit in the approach I have given earlier?

> treeSort = bootstrap partitionOnMedian
> bootstrap f = Fix . helper . f
>     where helper = fmap (Fix . helper . f)

> partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

Extra points if you can give 'bootstrap' or an equivalent in terms of
existing Haskell combinators.

Matthias.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Heinrich Apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Matthias Görgens wrote:

> Interesting.  Can you make the definition of quicksort non-recursive,
> too?  Perhaps with help of a bootstrapping combinator like the one
> implicit in the approach I have given earlier?
>
>> treeSort = bootstrap partitionOnMedian
>> bootstrap f = Fix . helper . f
>>     where helper = fmap (Fix . helper . f)
>
>> partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)
>
> Extra points if you can give 'bootstrap' or an equivalent in terms of
> existing Haskell combinators.

Sure, no problem.

Of course, some part of algorithm has to be recursive, but this can be
outsourced to a general recursion scheme, like the hylomorphism

    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
    hylo f g = g . fmap (hylo f g) . f

This scheme is a combination of the anamorphism which builds the tree of
recursive calls

    data Fix f = In { out :: f (Fix f) }

    ana :: Functor f => (a -> f a) -> a -> Fix f
    ana f = In . fmap (ana f) . f

and a catamorphism which combines the results

    cata :: Functor f => (f b -> b) -> Fix f -> b
    cata g = g . fmap (cata g) . out

so we could also write

    hylo f g = cata g . ana f



For quicksort, the call tree is the fixed point of the functor

    type Size    = Int
    data Q a b   = Empty | Merge Size a b b

    instance Functor (Q a) where
        fmap f (Merge n x b b') = Merge n x (f b) (f b')
        fmap f Empty            = Empty

The algorithm

    quicksort = hylo partition merge

proceeds in the two well-known phases: first, the input list is
partitioned into two parts separated by a pivot

    partition []     = Empty
    partition (x:xs) = Merge (length xs + 1) x ls xs
        where
        ls = filter (<= x) xs
        rs = filter (>  x) xs

and then the sorted results are merged

    merge Empty           = []
    merge (Merge _ x a b) = a ++ [x] ++ b

The random access tree implementation can be obtained by using a
different  merge  function. In particular, the quicksort from my
previous post was parametrized on  merge  (with a slightly different way
to indicate the list lengths); any function of type

    (Size -> a -> b -> b -> b) -> b -> c

is equivalent to a function of type

    (Q a b -> b) -> c


Incidentally, the random access tree *is* the call tree for quicksort,
so we'd have

   merge = In

and we can shorten the whole thing to

   quicksort = ana partition

which is essentially what you coded.



To read about  hylo f g = cata g . ana f  with quicksort as example
again in a slightly different light, see also the following blog post by
Ulisses Costa

 http://ulissesaraujo.wordpress.com/2009/04/09/hylomorphisms-in-haskell/



Regards,
apfelmus

--
http://apfelmus.nfshost.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: excercise - a completely lazy sorting algorithm

by Matthias Görgens-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks.  I heard about the hylo-, ana- and catamorphisms before, but
never explicitly used them.  Time to get started.

And yet another question: One can get the median in deterministic
linear time.  For quicksort choosing the median as pivot keeps the O(n
log n) average running time and brings down the expected worst case,
too.  Do you know of a (natural) way to combine selecting the median
and doing the quicksort, so that you don't compare unnecessarily?

The way to de-randomize quickselect is to calculate medians of
medians.  I.e. solve the problem for smaller instances first.  I
suspect if we follow this strategy with the reified quicksort
call-trees, the de-randomized quicksort will look a lot like
mergesort.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Heinrich Apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Matthias Görgens wrote:
> Thanks.  I heard about the hylo-, ana- and catamorphisms before, but
> never explicitly used them.  Time to get started.

You did use them explicitly :) , namely in

  treeSort = bootstrap partitionOnMedian
  bootstrap f = Fix . helper . f
      where helper = fmap (Fix . helper . f)

  partitionOnMedian :: (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)

since

  bootstrap f = ana f

In particular, we have this slight reformulation

  bootstrap f = helper'
     where
     helper  = fmap helper'
     helper' = Fix . helper . f

and hence

  bootstrap f = helper'
     where helper' = Fix . fmap helper' . f

and thus

  bootstrap f = Fix . fmap (bootstrap f) . f

which is the definition of  ana f .


> And yet another question: One can get the median in deterministic
> linear time.  For quicksort choosing the median as pivot keeps the O(n
> log n) average running time and brings down the expected worst case,
> too.  Do you know of a (natural) way to combine selecting the median
> and doing the quicksort, so that you don't compare unnecessarily?
>
> The way to de-randomize quickselect is to calculate medians of
> medians.  I.e. solve the problem for smaller instances first.  I
> suspect if we follow this strategy with the reified quicksort
> call-trees, the de-randomized quicksort will look a lot like
> mergesort.

Interesting question! (Of which I don't know the answer of. ;) )

I never understood the median of median method because I always thought
that it calculates an approximate median of approximate medians, which I
now realize is not the case. I concur that there should be something
better than "recursively wasting" comparisons in quickselect just for
finding the median pivot in quicksort, especially since both are the
same algorithm modulo lazy evaluation.

Concerning quicksort looking like mergesort, it seems like a good idea
to somehow merge the groups of 5 from the median-of-medians calculation.
However, there is an important difference between quicksort and
mergesort, namely lazy quicksort is able to produce the batch of the
first k smallest elements in

  O(n + k log k) total time [1]

while lazy mergesort needs

  O(n + k log n) total time [2]

There is something about quicksort that makes it "better" than mergesort.


[1] Proof goes like this: First, quicksort chooses a sublist of smallest
items recursively until a list of the smallest k items remains, this
takes O(n) time. Then the list consisting of the smallest k items is
sorted in Θ(k log k).

[2] Mergesort builds a tournament tree in Θ(n) time and then performs k
tournaments that take O(log n) time each, so the the second phase is O(k
log n).

I need to think about this.



Regards,
apfelmus

--
http://apfelmus.nfshost.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Bertram Felgenhauer-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Petr Pudlak wrote:
> Would it be possible to create a lazy selection/sorting
> algorithm so that getting any element of the sorted list/array by its index
> would require just O(n) time, and getting all the elements would still be in
> O(n * log n)?

The (merge) sorting algorithm provided by Data.List has this property,
providing the first k elements in O(n + k log(n)) time. (source at [1])

As a mental model, you can think of suspended 'merge' evaluations as
internal nodes in a tree, with the two arguments as subtrees. In that
model, the algorithm turns into heap sort: It builds a balanced binary
tree with n external nodes, taking O(n) time -- this job is done by
merge_pairs -- and then repeatedly extracts the minimum element, taking
O(log(n)) time for each one.

regards,

Bertram

[1] http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-List.html#mergesort
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Re: excercise - a completely lazy sorting algorithm

by Raynor Vliegendhart :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 7/9/09, Heinrich Apfelmus <apfelmus@...> wrote:
>
> Of course, some part of algorithm has to be recursive, but this can be
> outsourced to a general recursion scheme, like the hylomorphism
>
>    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
>    hylo f g = g . fmap (hylo f g) . f
>

Is that definition of hylo actually usuable? A few on IRC tried to use
that definition for a few examples, but the examples failed to
terminate or blew up the stack.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Heinrich Apfelmus :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Raynor Vliegendhart wrote:

> On 7/9/09, Heinrich Apfelmus <apfelmus@...> wrote:
>> Of course, some part of algorithm has to be recursive, but this can be
>> outsourced to a general recursion scheme, like the hylomorphism
>>
>>    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
>>    hylo f g = g . fmap (hylo f g) . f
>>
>
> Is that definition of hylo actually usable? A few on IRC tried to use
> that definition for a few examples, but the examples failed to
> terminate or blew up the stack.

The implementation of  quicksort  with  hylo  works fine for me, given
medium sized inputs like for example  quicksort (reverse [1..1000]) .

What were the examples you tried?


Regards,
apfelmus

--
http://apfelmus.nfshost.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: excercise - a completely lazy sorting algorithm

by Petr Pudlak (Debian) :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all,

I apologize that I didn't react to your posts, I was on a vacation. (BTW, if
you ever come to Slovakia, I strongly recommend visiting Mala (Lesser) Fatra
mountains. IMHO it's more beautiful than more-known Tatra mountains.)

Thanks for your interest and many intriguing ideas. Especially, I like
cata-/ana-/hylo-morphisms, it looks to me as a very useful concept to learn.

I hope I'll manage to create my own version of the sorting algorithm based on
your advices. Maybe I'll also try to do some real benchmarks, if I have time.

-Petr

On Tue, Jul 07, 2009 at 02:49:08AM +0200, Matthias Görgens wrote:

> The "sorted array of bags of unsorted input" is a nice idea.  However,
> you have to use the data structure in a single-threaded [1] fashion to
> obtain the claimed bounds.
>
> Here's a pure solution that uses amortization and laziness.
>
> > import qualified Data.Sequence as S
> > import Data.Sequence ((><))
> > import Data.Foldable
> > import Data.Monoid
>
> Suppose we have a function to find the the median of a list, and
> partition it into three sublists: Smaller than the median, equal to
> the media, larger than the median.  That function should run in linear
> time.
>
> > partitionOnMedian :: forall a. (Ord a) => (S.Seq a) -> BTreeRaw a (S.Seq a)
>
> where the following data structure holds the sublists and some
> bookkeeping information:
>
> > data BTreeRaw a m = Leaf
> >                   | Node {cmp::(a->Ordering)
> >                          , lN :: Int
> >                          , less::m
> >                          , eq :: (S.Seq a)
> >                          , gN :: Int
> >                          , greater::m
> >                          }
>
> where 'lN' and 'gN' are the length of 'less' and 'greater'.
>
> We can make BTreeRaw a functor:
>
> > instance Functor (BTreeRaw a) where
> >     fmap f Leaf = Leaf
> >     fmap f (Node c lN l e gN g) = Node c lN (f l) e gN (f g)
>
> Now using a fixed-point construction we can bootstrap a sorting
> algorithm from partitionOnMedian:
>
> > data Fix m = Fix {unfix :: (m (Fix m))}
> > type BTree a = Fix (BTreeRaw a)
>
> > treeSort :: forall a. (Ord a) => S.Seq a -> BTree a
> > treeSort = Fix . helper . partitionOnMedian
> >     where helper = fmap (Fix . helper . partitionOnMedian)
>
> Now treeSort produces the thunk of a balanced binary search tree.  Of
> course we can get a sorted list out of it (forcing the whole
> structure):
>
> > flatten :: BTree a -> S.Seq a
> > flatten (Fix Leaf) = S.empty
> > flatten (Fix (Node _ lN l e gN g)) = flatten l >< e >< flatten g
>
> > mySort = flatten . treeSort
>
> But we can also get elements efficently, forcing only a linear amount
> of comparisions in the worst case:
>
> > index :: BTree a -> Int -> a
> > index (Fix Leaf) _ = error "tried to get an element of Leaf"
> > index (Fix (Node lN l e gN g)) i | i < lN
> >                                      = index l i
> >                                  | i - lN < S.length e
> >                                      = S.index e (i-lN)
> >                                  | i - lN - S.length e < gN
> >                                      = index g (i - lN - S.length e)
> >                                  | i - lN - S.length e - gN >= 0
> >                                      = error "index out of bounds"
>
> Although we do have to force comparisions only once every time we
> touch the same element in the tree, we do still have to traverse the
> tree (in logarithmic time).
>
> If you want linear time access on first touch of an element and
> constant time access afterwards us toArray:
>
> > toArray :: (IA.IArray a t) => Fix (BTreeRaw t) -> a Int t
> > toArray tree = IA.listArray (0,maxI) (map (index tree) [0..maxI])
> >     where size (Fix Leaf) = 0
> >           size (Fix (Node lN _ e gN _)) = lN + S.length e + gN
> >           maxI = size tree - 1
>
> [1] Single-Threaded in the sense of Okasaki's use of the word.
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@...
> http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe