Request for eachFileRecursive

Request for eachFileRecursive

by Jesse Eichar-3

Hi,

I have a request for the FileExtras class (I'm willing to write the
code and the tests if my proposal is accepted). I keep finding myself
writing, over and over, a method that recurses through a directory
tree and does something to many or all of the files. For example, I
might be deleting all .svn directories.

The reason is probably that I am introducing Scala at my company
through scripts and maybe a tool for automated builds (although I'm
less convinced that is a good idea).

Feel free to shoot me down but it does seem like a useful method for  
the FileExtras class.

The obvious way to implement it is as a pre-order traversal (each
directory is visited before its contents). Something like the following:

> import java.io.File
>
> def eachChildRecursive( parent:File,  function:File=>Unit ):Unit = {
>   parent.listFiles.foreach { child =>
>     function(child)
>     if(child.isDirectory())
>       eachChildRecursive(child,function)
>   }
> }
>
> val here = new File(".")
>
> eachChildRecursive( here, println _ )

However, by calling the function after the if statement we could
perform a post-order traversal instead, visiting a directory's
contents before the directory itself.
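
A minimal sketch of the two orderings, assuming a current Scala (2.13 or later). Both variants are depth-first; they differ only in whether the function sees a directory before its contents (pre-order) or after them (post-order). The object and method names are made up for illustration, and the null check on listFiles is an addition that is not in the snippet above.

```scala
import java.io.File

object Traversal {
  // File.listFiles returns null for non-directories and for directories
  // that cannot be read, so treat null as "no children".
  private def childrenOf(dir: File): List[File] =
    Option(dir.listFiles()).map(_.toList).getOrElse(Nil)

  // Pre-order: apply f to each entry before descending into it.
  def eachChildPreOrder(parent: File)(f: File => Unit): Unit =
    childrenOf(parent).foreach { child =>
      f(child)
      if (child.isDirectory()) eachChildPreOrder(child)(f)
    }

  // Post-order: descend first, then apply f, so a directory is seen
  // only after everything inside it.
  def eachChildPostOrder(parent: File)(f: File => Unit): Unit =
    childrenOf(parent).foreach { child =>
      if (child.isDirectory()) eachChildPostOrder(child)(f)
      f(child)
    }
}
```

Post-order is the one to reach for when deleting, because File.delete only removes a directory once it is empty.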

Is there any desire by anyone else to see this in the library?  I have  
already written it at least 5 times which means it will go into my  
private library otherwise.  Seeing that ScalaX has a much wider  
audience I thought to offer my services in writing for scalax.

Jesse


Re: Request for eachFileRecursive

by Stepan Koltsov

I think it would be better to have

def childrenRecursively: Iterable[File]

method that returns all children recursively, but lazily, i.e. it is
evaluated only as its elements are accessed.
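
One way such a lazy method could look today, as a sketch only: it assumes Scala 2.13 or later, where LazyList (which is an Iterable) computes its elements on demand. The enclosing object and the explicit parent parameter are assumptions made to keep the example self-contained; in FileExtras the file would presumably be implicit.

```scala
import java.io.File

object LazyChildren {
  // Only the immediate children of `parent` are listed when this is called;
  // each subdirectory is listed when the traversal actually reaches it.
  def childrenRecursively(parent: File): LazyList[File] = {
    // listFiles returns null for unreadable directories: treat as empty
    val children = Option(parent.listFiles()).map(_.toList).getOrElse(Nil)
    children.to(LazyList).flatMap { child =>
      // The right-hand side of #:: is evaluated lazily
      child #:: (if (child.isDirectory()) childrenRecursively(child)
                 else LazyList.empty[File])
    }
  }
}
```

Something like LazyChildren.childrenRecursively(new File(".")).find(_.getName == ".svn") would then stop listing directories as soon as the first match turns up.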

S.



Re: Request for eachFileRecursive

by Jesse Eichar-3

I like that.  That would allow me to do my usual case but also the  
other operations I was thinking about like find, filter, etc...

Cool.  I will get on this.




Re: Request for eachFileRecursive

by Andrew O'Malley

Hi all,

FYI, below is the implementation I use. You are free to use it or
discard it as you see fit.

I named it listRecursively to match the deleteRecursively method  
already in FileExtras.

However, that's pretty verbose for a commonly used function. How about  
recurse?

And I agree completely with Stepan that an Iterable approach is  
better. You can then use it in for comprehensions such as:

for (file <- "/Users/andrew".toFile.recurse if file.getName == ".svn") {
        ...
}

Cheers,
Andrew

   // Needs java.io.File and scala.collection.mutable.Queue in scope;
   // `file` is presumably the File wrapped by FileExtras.
   def listRecursively(): Iterator[File] = new Iterator[File] {
     // Separate files and dirs so that files can be processed first,
     // to minimise the queue size
     val files = new Queue[File]()
     val dirs = new Queue[File]()
     enqueue(file)

     // Queue the children of dir, then hand back dir itself
     def enqueue(dir: File): File = {
       for (child <- dir.listFiles())
         if (child.isDirectory()) dirs.enqueue(child) else files.enqueue(child)
       dir
     }

     def hasNext: Boolean = files.length != 0 || dirs.length != 0

     def next(): File = {
       if (!hasNext) throw new NoSuchElementException("EOF")
       if (files.length != 0) files.dequeue() else enqueue(dirs.dequeue())
     }
   }




Re: Request for eachFileRecursive

by Jamie Webb-2

On 2008-07-25 08:14:31 Andrew O'Malley wrote:

> Hi all,
>
> FYI, below is the implementation I use. You are free use it or
> discard it as you see fit.
>
> I named it listRecursively to match the deleteRecursively method  
> already in FileExtras.
>
> However, that's pretty verbose for a commonly used function. How
> about recurse?
>
> And I agree completely with Stepan that an Iterable approach is  
> better. You can then use it in for comprehensions such as:
>
> for (file <- "/Users/andrew".toFile.recurse if file.getName ==
> ".svn") { ...
> }

Hi Andrew. This looks very useful. Do you mind sending me a CLA so that
we can include it in the library?

  http://scalax.scalaforge.org/cla-i.html

Thanks

/J


Re: Request for eachFileRecursive

by Andrew O'Malley

Hi all,

FYI, I submitted the CLA to Jamie, so presumably file recursion will  
be added to scalax by Jamie soon.

Apologies to Jesse if you've spent time on this already.

Cheers,
Andrew





Re: Request for eachFileRecursive

by Jesse Eichar-2

Not a problem.  It was a good solution.  I'm glad to finally see the  
method making it in.  I am sick of writing it :)

Jesse



Re: Request for eachFileRecursive

by Jamie Webb-2

On 2008-07-26 16:07:49 Jamie Webb wrote:

> On 2008-07-25 08:14:31 Andrew O'Malley wrote:
> > Hi all,
> >
> > FYI, below is the implementation I use. You are free use it or
> > discard it as you see fit.
> >
> > I named it listRecursively to match the deleteRecursively method  
> > already in FileExtras.
> >
> > However, that's pretty verbose for a commonly used function. How
> > about recurse?
> >
> > And I agree completely with Stepan that an Iterable approach is  
> > better. You can then use it in for comprehensions such as:
> >
> > for (file <- "/Users/andrew".toFile.recurse if file.getName ==
> > ".svn") { ...
> > }
>
> Hi Andrew. This looks very useful. Do you mind sending me a CLA so
> that we can include it in the library?
>
>   http://scalax.scalaforge.org/cla-i.html

I'm about to add this, but first a question about naming:

We've got two possibilities given here: 'listRecursively' and
'recurse'. However, IIRC Python calls this function 'walk', which I
quite like. In particular, 'recurse' seems a strange choice when in
fact we're unrolling the recursion by providing an iterator.

Any opinions?

/J


Re: Request for eachFileRecursive

by Eric Willigers

On Tue, Jul 29, 2008 at 7:49 AM, Jamie Webb <j@...> wrote:
 
> Any opinions?

'walk' or 'listRecursively', not 'recurse'.


Re: Request for eachFileRecursive

by Stepan Koltsov

On Tue, Jul 29, 2008 at 01:49, Jamie Webb <j@...> wrote:

> We've got two possibilities given here: 'listRecursively' and
> 'recurse'. However, IIRC Python calls this function 'walk', which I
> quite like. In particular, 'recurse' seems a strange choice when in
> fact we're unrolling the recursion by providing an iterator.
>
> Any opinions?

I like "walk". It should return an Iterator, not the Iterable I suggested before.

S.


Re: Request for eachFileRecursive

by James Iry-2

Here's a crazy idea: make it a tree.  It's a much more useful structure, it matches the domain better, and it's just as easy to "iterate":

for (file <- "/Users/andrew".toFile.tree) {doSomething(file)}





Re: Request for eachFileRecursive

by Andrew O'Malley

I'd favour something short as I use it frequently in scripting, so  
that would rule out listRecursively.
Tree or walk is fine by me (with tree feeling a little more intuitive).

Cheers,
Andrew





Re: Request for eachFileRecursive

by David MacIver

On Mon, Jul 28, 2008 at 11:46 PM, Stepan Koltsov <yozh@...> wrote:

> On Tue, Jul 29, 2008 at 01:49, Jamie Webb <j@...> wrote:
>
>> We've got two possibilities given here: 'listRecursively' and
>> 'recurse'. However, IIRC Python calls this function 'walk', which I
>> quite like. In particular, 'recurse' seems a strange choice when in
>> fact we're unrolling the recursion by providing an iterator.
>>
>> Any opinions?
>
> I like "walk". It should return Iterator, not Iterable, as I suggested before.

I feel fairly strongly that returning an Iterator from anything that's
not the elements method of an Iterable is a bad design decision (Yes,
I know the collections library does it in various places. I think it's
a bad design decision there too). Among other things it makes it
harder to compose it with other collections and prevents the
implementation of a more efficient foreach method.

As a name, I like "descendants". Of those suggested, "walk" seems the
most reasonable.
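
To make the Iterable-versus-Iterator point concrete, here is a small sketch assuming Scala 2.13 or later. FileTree is a hypothetical name, not the scalax API. Because the walk is exposed as an Iterable, every call to iterator starts a fresh traversal, so the same value can be used in several comprehensions, whereas a bare Iterator is spent after one pass.

```scala
import java.io.File

// Hypothetical wrapper: an Iterable whose iterator walks the tree lazily.
final class FileTree(root: File) extends Iterable[File] {
  def iterator: Iterator[File] = new Iterator[File] {
    // Files and directories still waiting to be returned
    private var pending: List[File] = List(root)

    def hasNext: Boolean = pending.nonEmpty

    def next(): File = {
      if (!hasNext) throw new NoSuchElementException("no more files")
      val current = pending.head
      // null (non-directory or unreadable directory) means no children
      val children = Option(current.listFiles()).map(_.toList).getOrElse(Nil)
      pending = children ::: pending.tail
      current
    }
  }
}
```

This version returns the root itself as the first element. An Iterable could also override foreach with a plain recursive walk, which is the efficiency argument made above; that is left out here to keep the sketch short.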


Re: Request for eachFileRecursive

by Jesse Eichar-2


On 29-Jul-08, at 9:24 AM, David MacIver wrote:

> On Mon, Jul 28, 2008 at 11:46 PM, Stepan Koltsov <yozh@...> wrote:
>> On Tue, Jul 29, 2008 at 01:49, Jamie Webb <j@...>  
>> wrote:
>>
>>> We've got two possibilities given here: 'listRecursively' and
>>> 'recurse'. However, IIRC Python calls this function 'walk', which I
>>> quite like. In particular, 'recurse' seems a strange choice when in
>>> fact we're unrolling the recursion by providing an iterator.
>>>
>>> Any opinions?
>>
>> I like "walk". It should return Iterator, not Iterable, as I  
>> suggested before.
>
> I feel fairly strongly that returning an Iterator from anything that's
> not the elements method of an Iterable is a bad design decision (Yes,
> I know the collections library does it in various places. I think it's
> a bad design decision there too). Among other things it makes it
> harder to compose it with other collections and prevents the
> implementation of a more efficient foreach method.
>
> As a name, I like "descendants". Of those suggested, "walk" seems the
> most reasonable.
>


I like descendants as well.  Walk might be short but it does not imply  
a collection to me.  It sounds like we should be passing a function to  
the method.

My 2 cents.

Jesse



Re: Request for eachFileRecursive

by Jamie Webb-2

On 2008-07-28 16:57:34 James Iry wrote:
> Here's a crazy idea: make it a tree.  It's a much more useful
> structure, it matches the domain better, and it's just as easy to
> "iterate"
>
> for (file <- "/Users/andrew".toFile.tree) {doSomething(file)}

Not sure what you mean here. What interface does this tree have?

/J


Re: Request for eachFileRecursive

by Matt Hellige

All...

The following implementation does something unexpected if there are
any unreadable directories in the hierarchy (at least on Unix). If dir
is a directory for which I don't have proper permissions, then
dir.listFiles() returns null, which results in a NullPointerException.
I don't have an opinion about whether this method should ignore
unreadable directories or report an error, but probably it should not
just NPE.

The same NPE occurs if the argument is not a directory, but maybe it's
ok to consider this user error? Actually, probably this should be
detected as well.
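
A sketch of how both cases could be detected instead of surfacing as an NPE, assuming Scala 2.13 or later. The helper name listChildren and the choice of Either are assumptions for illustration; the one fact relied on is that java.io.File.listFiles returns null when the path is not a directory or an I/O error occurs.

```scala
import java.io.{File, IOException}

object SafeListing {
  // Right(children) when dir could be listed, Left(error) otherwise
  def listChildren(dir: File): Either[IOException, List[File]] =
    if (!dir.isDirectory())
      Left(new IOException("Not a directory: " + dir))
    else
      Option(dir.listFiles()) match {
        case Some(children) => Right(children.toList)
        case None           => Left(new IOException("Cannot read directory: " + dir))
      }
}
```

The caller can then decide whether a Left should be skipped, logged, or thrown.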

Thanks...
Matt


On Thu, Jul 24, 2008 at 5:14 PM, Andrew O'Malley <andrew@...> wrote:

> Hi all,
>
> FYI, below is the implementation I use. You are free use it or discard it as
> you see fit.
>
> I named it listRecursively to match the deleteRecursively method already in
> FileExtras.
>
> However, that's pretty verbose for a commonly used function. How about
> recurse?
>
> And I agree completely with Stepan that an Iterable approach is better. You
> can then use it in for comprehensions such as:
>
> for (file <- "/Users/andrew".toFile.recurse if file.getName == ".svn") {
>        ...
> }
>
> Cheers,
> Andrew
>
>  def listRecursively() = new Iterator[File] {
>    // Separate files and dirs so that files can be processed first to minimise the queue size
>    val files = new Queue[File]();
>    val dirs = new Queue[File]();
>    enqueue(file)
>
>    def enqueue(dir: File) = {
>      for (file <- dir.listFiles()) if (file.isDirectory()) dirs.enqueue(file) else files.enqueue(file)
>      dir
>    }
>
>    def hasNext = files.length != 0 || dirs.length != 0
>
>    def next() = {
>      if (!hasNext) throw new NoSuchElementException("EOF")
>      if (files.length != 0) files.dequeue else enqueue(dirs.dequeue)
>    }
>  }



--
Matt Hellige / matt@...
http://matt.immute.net


Re: Request for eachFileRecursive

by Jamie Webb-2

On 2008-07-29 10:26:51 Jesse Eichar wrote:

>
> On 29-Jul-08, at 9:24 AM, David MacIver wrote:
>
> > On Mon, Jul 28, 2008 at 11:46 PM, Stepan Koltsov <yozh@...>
> > wrote:
> >> On Tue, Jul 29, 2008 at 01:49, Jamie Webb <j@...>  
> >> wrote:
> >>
> >>> We've got two possibilities given here: 'listRecursively' and
> >>> 'recurse'. However, IIRC Python calls this function 'walk', which
> >>> I quite like. In particular, 'recurse' seems a strange choice
> >>> when in fact we're unrolling the recursion by providing an
> >>> iterator.
> >>>
> >>> Any opinions?
> >>
> >> I like "walk". It should return Iterator, not Iterable, as I  
> >> suggested before.
> >
> > I feel fairly strongly that returning an Iterator from anything
> > that's not the elements method of an Iterable is a bad design
> > decision (Yes, I know the collections library does it in various
> > places. I think it's a bad design decision there too). Among other
> > things it makes it harder to compose it with other collections and
> > prevents the implementation of a more efficient foreach method.
> >
> > As a name, I like "descendants". Of those suggested, "walk" seems
> > the most reasonable.
> >
>
>
> I like descendants as well.  Walk might be short but it does not
> imply a collection to me.  It sounds like we should be passing a
> function to the method.

I don't think descendants is a good name, not least because it's
technically incorrect: Andrew's implementation (rightly, IMO) returns
the file itself as well as its descendents. (Oh look, I spelt it two
different ways.)

I think you're right though, David, that an Iterable is preferable.

/J


Re: Request for eachFileRecursive

by Jamie Webb-2

On 2008-07-29 14:16:54 Matt Hellige wrote:

> All...
>
> The following implementation does something unexpected if there are
> any unreadable directories in the hierarchy (at least on Unix). If dir
> is a directory for which I don't have proper permissions, then
> dir.listFiles() returns null, which results in a NullPointerException.
> I don't have an opinion about whether this method should ignore
> unreadable directories or report an error, but probably it should not
> just NPE.
>
> The same NPE occurs if the argument is not a directory, but maybe it's
> ok to consider this user error? Actually, probably this should be
> detected as well.

Good point. I can't imagine wanting to throw in the middle of an
iteration, but I can imagine wanting to know if errors occurred...

How about an extra method on the iterator?

  val iter = myFile.tree.elements
  for (f <- iter) {
      if (iter.wasUnreadable) log.error("Couldn't read " + f)
      else ...
  }

Slightly strange, but it's there if you need it and you get the
ignoring behaviour if you don't.

I would expect calling this on a non-directory to throw an IOException.
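
A rough sketch of what such an iterator might look like, assuming Scala 2.13 or later. Only wasUnreadable comes from the proposal above; the class name TreeIterator and everything else are assumptions. Unreadable directories are returned like any other entry and simply contribute no children, while a non-directory root is rejected up front with an IOException.

```scala
import java.io.{File, IOException}

// Hypothetical sketch; only the wasUnreadable name is taken from the thread.
final class TreeIterator(root: File) extends Iterator[File] {
  if (!root.isDirectory()) throw new IOException("Not a directory: " + root)

  private var pending: List[File] = List(root)
  private var lastUnreadable = false

  // True if the entry most recently returned by next() was a directory
  // whose contents could not be listed
  def wasUnreadable: Boolean = lastUnreadable

  def hasNext: Boolean = pending.nonEmpty

  def next(): File = {
    if (!hasNext) throw new NoSuchElementException("no more files")
    val current = pending.head
    // Plain files have no children; a directory yields None when listFiles is null
    val listing: Option[Array[File]] =
      if (current.isDirectory()) Option(current.listFiles())
      else Some(Array.empty[File])
    lastUnreadable = listing.isEmpty
    pending = listing.map(_.toList).getOrElse(Nil) ::: pending.tail
    current
  }
}
```

The flag is only meaningful when the iterator is consumed directly, as in a plain for loop. A guard or filter in the comprehension wraps the iterator and may read ahead, so the flag could then describe a different entry.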

/J


Re: Request for eachFileRecursive

by David MacIver

On Tue, Jul 29, 2008 at 8:21 PM, Jamie Webb <j@...> wrote:

>> > I feel fairly strongly that returning an Iterator from anything
>> > that's not the elements method of an Iterable is a bad design
>> > decision (Yes, I know the collections library does it in various
>> > places. I think it's a bad design decision there too). Among other
>> > things it makes it harder to compose it with other collections and
>> > prevents the implementation of a more efficient foreach method.
>> >
>> > As a name, I like "descendants". Of those suggested, "walk" seems
>> > the most reasonable.
>> >
>>
>>
>> I like descendants as well.  Walk might be short but it does not
>> imply a collection to me.  It sounds like we should be passing a
>> function to the method.
>
> I don't think descendants is a good name, not least because it's
> technically incorrect: Andrew's implementation (rightly, IMO) returns
> the file itself as well as its descendents. (Oh look, I spelt it two
> different ways.)

Good point. I withdraw the suggestion.


Re: Request for eachFileRecursive

by James Iry-2

An n-ary tree, perhaps a lazy one. Call the class NTree[+A], so this would be File.tree: NTree[File].  It would have children: Seq[NTree[A]].  At the very least it would have map(f: A => B): NTree[B] and foreach(f: A => Unit): Unit.  It should probably have folds and such too.  filter and flatMap need thought.
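
A minimal sketch of that interface, assuming Scala 2.13 or later. NTree, children, map and foreach follow the description above; the value field, the by-name constructor argument, foldLeft and the ofFile helper are assumptions added to make the example complete, and none of it is an existing scalax class.

```scala
import java.io.File

// Hypothetical lazy n-ary tree
final class NTree[+A](val value: A, kids: => Seq[NTree[A]]) {
  // Computed on first access, so subtrees are only built when visited
  lazy val children: Seq[NTree[A]] = kids

  def map[B](f: A => B): NTree[B] =
    new NTree[B](f(value), children.map(_.map(f)))

  // Pre-order: the node itself, then each subtree in turn
  def foreach(f: A => Unit): Unit = {
    f(value)
    children.foreach(_.foreach(f))
  }

  // Left fold over the same pre-order sequence
  def foldLeft[B](z: B)(op: (B, A) => B): B =
    children.foldLeft(op(z, value))((acc, child) => child.foldLeft(acc)(op))
}

object NTree {
  // Build a tree from a file; listFiles returning null means no children
  def ofFile(file: File): NTree[File] =
    new NTree[File](
      file,
      Option(file.listFiles()).map(_.toSeq).getOrElse(Seq.empty).map(ofFile)
    )
}
```

Since foreach is defined, for (file <- NTree.ofFile(dir)) { doSomething(file) } works as in the earlier message. filter is left out because dropping a node raises the question of what happens to its children, which is presumably the part that needs thought.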

On Tue, Jul 29, 2008 at 12:13 PM, Jamie Webb <j@...> wrote:
On 2008-07-28 16:57:34 James Iry wrote:
> Here's a crazy idea: make it a tree.  It's a much more useful
> structure, it matches the domain better, and it's just as easy to
> "iterate"
>
> for (file <- "/Users/andrew".toFile.tree) {doSomething(file)}

Not sure what you mean here. What interface does this tree have?

/J