Task Optimization

Task Optimization

by Steve Appling

I am interested in ways to short circuit task execution for the purpose of
optimization.  I would love to see some of this in 0.7 and would be glad to
contribute.

Here are some ideas:
1) Add an "onlyIf" method to Task that is given a closure. The closure would be
executed before the first action of the task and would cancel execution of the
task (with appropriate lifecycle message) if it returned false.  This closure
would have as a delegate an optimization container with some helper methods that
would provide more convenient access to change detection (among other things).
Then you could do:
   mytask.onlyIf {
     timestampChanged 'src/main/mysrc'
     // or contentsChanged 'src/main/mysrc'
   }

2) Running a clean should probably remove the change detection state information
for a project (or at least the clean task should be able to be configured to do
this conveniently).

3) I would like some general way for tasks to indicate whether they did any work.
Perhaps task.getDidWork().  BTW, I figured out how to do this for Gradle's use
of ant.javac and can now tell if it really compiled anything.

4) I would like to be able to specify that a chain of dependent tasks only
execute a task if Task.didWork is true for all of its dependents.  Note that
this is not always desired, so you need to be able to turn this on and off.  I'm
not sure of the best way to configure this.  If we use the onlyIf method
suggested above, it might take another closure to check this that would be
returned from a  "needed" method.  This would look like:
   myTask.onlyIf(needed())

This probably should be the default for tests, but perhaps not for all Tasks.

Javac is already checking to see if the source files are out of date with the
classes, so I don't think that the javac task needs to use the new change
detection.  This would, however, let you stop other tasks in the chain
(like test) if nothing needed to be compiled.  (Unrelated: I would also like to
see an option on compile to use Ant's depend task.  I think the current
dependencyTracking option doesn't work with the modern compiler.)

Other types of tasks could make good use of Tom's change detection.

5) We probably want a command line option to be able to disable all of these
optimizations.  Sometimes you really want to force a build with no optimizations
(without running clean).
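
Ideas 1 and 3 can be sketched in plain Java. This is only an illustration under stated assumptions: SketchTask, its onlyIf and getDidWork methods, and OnlyIfDemo are hypothetical names, not Gradle's API or the proposed implementation. The action reports whether it changed anything, and the condition is checked before the action runs.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a task; this is not Gradle's Task interface.
class SketchTask {
    private final String name;
    private final BooleanSupplier action;        // returns true if it changed anything
    private BooleanSupplier onlyIf = () -> true; // run unless told otherwise
    private boolean didWork = false;

    SketchTask(String name, BooleanSupplier action) {
        this.name = name;
        this.action = action;
    }

    // Replaces the condition that is checked before the action runs.
    void onlyIf(BooleanSupplier condition) {
        this.onlyIf = condition;
    }

    boolean getDidWork() {
        return didWork;
    }

    // Evaluates the condition first; the action runs only when it holds.
    void execute() {
        if (!onlyIf.getAsBoolean()) {
            System.out.println(name + " SKIPPED");
            return;
        }
        didWork = action.getAsBoolean();
    }
}

class OnlyIfDemo {
    public static void main(String[] args) {
        SketchTask compile = new SketchTask("compile", () -> false); // nothing was out of date
        SketchTask test = new SketchTask("test", () -> true);
        test.onlyIf(compile::getDidWork); // run tests only if compile did work
        compile.execute();
        test.execute(); // prints "test SKIPPED"
    }
}
```

Letting the action return a flag keeps "executed" separate from "did work", which matters for tasks that run but find nothing to do.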


In the race for speed, Gradle will probably never catch Ant in a clean build (at
least while you are delegating most of the expensive stuff to Ant).  However,
most of the time developers are making incremental changes to existing systems
and not running clean.  In this case, if Gradle can support features to
conveniently bypass unneeded steps, it can be much faster.  Also, Gradle has the
huge advantage of a more maintainable and modular build specification.

--
Steve Appling
Automated Logic Research Team

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: Task Optimization

by Steve Appling

I have a proof of concept implementation of some of this at
git://github.com/sappling/gradle.git in the "opt" branch.

This includes:
1) A new onlyIf method on Task
2) A new didWork method on Task
3) Implementations of didWork for Compile and GroovyCompile.  I don't think that
we can handle Ant's Copy task in this same way.  We may have to use a
replacement, but this has other consequences.
4) Changes to src/samples/java/quickstart/build.gradle to demo didWork and onlyIf.
5) A start at OptimizationHelper.isNeeded method.  This will require some
additional dependency management features, so I stopped development until I got
some feedback on this whole approach.

--
Steve Appling
Automated Logic Research Team


Re: Task Optimization

by hdockter


On Jun 19, 2009, at 9:44 PM, Steve Appling wrote:

> I am interested in ways to short circuit task execution for the  
> purpose of optimization.  I would love to see some of this in 0.7  
> and would be glad to contribute.
>
> Here are some ideas:
> 1) Add an "onlyIf" method to Task that is given a closure. The  
> closure would be executed before the first action of the task and  
> would cancel execution of the task (with appropriate lifecycle  
> message) if it returned false.  This closure would have as a  
> delegate an optimization container with some helper methods that  
> would provide more convenient access to change detection (among  
> other things). Then you could do:
>  mytask.onlyIf {
>    timestampChanged 'src/main/mysrc'
>    // or contentsChanged 'src/main/mysrc'
>  }

I like the syntax. I'm also thinking about the following use cases:

I want to _add_ a custom onlyIf condition.
I want to remove a condition.

To add a condition, you can do this:

oldOnlyIf = myTask.onlyIf
myTask.onlyIf {
    value == 5 && oldOnlyIf.call()
}

It is not very nice, but it works, so I think that should be good
enough for 0.7. Later we might add a spec-like API.

Removing and replacing is obviously easy.
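
One way a spec-like API could avoid the manual chaining is to keep the conditions in a list and require all of them to hold. The Java sketch below is an assumption for illustration only; RunConditions and its methods are hypothetical names and not part of Gradle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical holder for a task's run conditions; all names are illustrative.
class RunConditions {
    private final List<BooleanSupplier> conditions = new ArrayList<>();

    // Adds a condition; the task should run only if every condition holds.
    void add(BooleanSupplier condition) {
        conditions.add(condition);
    }

    // Removes a previously added condition (matched by object identity for lambdas).
    boolean remove(BooleanSupplier condition) {
        return conditions.remove(condition);
    }

    // True when there are no conditions or when all of them return true.
    boolean allSatisfied() {
        for (BooleanSupplier condition : conditions) {
            if (!condition.getAsBoolean()) {
                return false;
            }
        }
        return true;
    }
}

class RunConditionsDemo {
    public static void main(String[] args) {
        int value = 5;
        RunConditions conditions = new RunConditions();
        BooleanSupplier sourcesChanged = () -> false; // placeholder for a change check
        conditions.add(() -> value == 5);
        conditions.add(sourcesChanged);
        System.out.println(conditions.allSatisfied()); // false: one condition fails
        conditions.remove(sourcesChanged);
        System.out.println(conditions.allSatisfied()); // true: only value == 5 remains
    }
}
```

Adding then no longer needs a reference to the old closure, and removing a single condition leaves the others in place.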

>
> 2) Running a clean should probably remove the change detection state  
> information for a project (or at least the clean task should be able  
> to be configured to do this conveniently).

That would be important.

>
> 3) I would like some general way for tasks to indicate that they did  
> anything. Perhaps task.getDidWork().  BTW, I figured out how to do  
> this for gradle's use of ant.javac and can now tell if it really  
> compiled anything.

This makes sense.

>
> 4) I would like to be able to specify that a chain of dependent  
> tasks only execute a task if Task.didWork is true for all of its  
> dependents.  Note that this is not always desired, so you need to be  
> able to turn this on and off.  I'm not sure of the best way to  
> configure this.  If we use the onlyIf method suggested above, it  
> might take another closure to check this that would be returned from  
> a  "needed" method.  This would look like:
>  myTask.onlyIf(needed())



>
> This probably should be the default for tests, but perhaps not for  
> all Tasks.
>
> Javac is already checking to see if the source files are out of date  
> with the classes, so I don't think that the javac task needs to use  
> the new changedetection.

Right. But we can set the didWork flag.

> This would, however let you stop other tasks in the chain (like  
> test) if nothing needed to be compiled.

Right.

>  (unrelated: I would also like to see an option on compile to use  
> Ant's depend task.  I think the current dependencyTracking option  
> doesn't work with the modern compiler. )

Interesting. This deserves a discussion on its own. I think this is an  
important topic.

>
> Other types of tasks could make good use of Tom's change detection.
>
> 5) We probably want a command line option to be able to disable all  
> of these optimizations.  Sometimes you really want to force a build  
> with no optimizations (without running clean).

Right. A contrived example: you want the tests to run even if
nothing needs to be compiled, because your tests depend on some dynamic
properties retrieved from the network. Adam has come up with the idea
of introducing the notion of a build type. We should discuss this now
in more detail.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org



Re: Task Optimization

by hdockter

Cool. I'm keen to get this into 0.7.

On Jun 22, 2009, at 9:03 PM, Steve Appling wrote:

> I have a proof of concept implementation of some of this at
> git://github.com/sappling/gradle.git in the "opt" branch.
>
> This includes:
> 1) A new onlyIf method on Task
> 2) A new didWork method on Task
> 3) Implementations of didWork for Compile and GroovyCompile.

Amazing. For Ant 1.7.1 you probably could also use the new  
updatedProperty. But no such thing exists for Groovyc. Excellent. And  
I think we should expose all the information you gather. The public  
API of the Compile task could return a list with compiled files.

> I don't think that we can handle Ant's Copy task in this same way.  
> We may have to use a replacement, but this has other consequences.

I guess the problem is that as long as the Copy task is not able to
tell whether it did any work, we can't decide whether to skip the tests
(unless we check the binary dir). Is that right?


> 5) A start at OptimizationHelper.isNeeded method.  This will require  
> some additional dependency management features, so I stopped  
> development until I got some feedback on this whole approach.

I have to think about the whole 'needed' thing. I will give feedback  
later.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org



Re: Task Optimization

by hdockter

<snip>

> 4) I would like to be able to specify that a chain of dependent  
> tasks only execute a task if Task.didWork is true for all of its  
> dependents.

I don't fully understand this. Could you explain this a bit more?

<snip>

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org



Re: Task Optimization

by Steve Appling



Hans Dockter wrote:

> <snip>
>
>> 4) I would like to be able to specify that a chain of dependent tasks
>> only execute a task if Task.didWork is true for all of its dependents.
>
> I don't fully understand this. Could you explain this a bit more?
>
> <snip>
>
> - Hans
>

Sure - I did not express that very well at all.  I also wrote it before I
attempted an implementation, so I think I have a better idea of what might be
needed now.

In the syntax that I implemented, you could say:
test.onlyIf { isNeeded() }

I wanted this to be able to look at the TaskDependencies for the test task and
only execute if Task.didWork was true for one of them. I was not able to figure
out how to use TaskDependencies to accomplish this.
task.getTaskDependencies(task) only returns the tasks that are explicitly added
using dependsOn and doesn't seem to take into account the tasks needed to build
the artifacts in the configurations that are contained in the TaskDependencies
object.

In this case (a Test task), I would like the isNeeded method to return true if
didWork() is true for either compile or compileTests, or for any of the tasks
needed to build artifacts in the testRuntime configuration.
Currently it does not check the tasks that might be derived from the configuration.

I was hoping that a general purpose isNeeded helper could do this for all tasks
in the same way, but it is possible that certain subclasses of Task just need
their own specific implementations.
--
Steve Appling
Automated Logic Research Team


Re: Task Optimization

by Steve Appling



Hans Dockter wrote:

> Cool. I'm keen to get this into 0.7.
>
> On Jun 22, 2009, at 9:03 PM, Steve Appling wrote:
>
>> I have a proof of concept implementation of some of this at
>> git://github.com/sappling/gradle.git in the "opt" branch.
>>
>> This includes:
>> 1) A new onlyIf method on Task
>> 2) A new didWork method on Task
>> 3) Implementations of didWork for Compile and GroovyCompile.
>
> Amazing. For Ant 1.7.1 you probably could also use the new
> updatedProperty. But no such thing exists for Groovyc. Excellent. And I
> think we should expose all the information you gather. The public API of
> the Compile task could return a list with compiled files.
>
>> I don't think that we can handle Ant's Copy task in this same way.  We
>> may have to use a replacement, but this has other consequences.
>
> I guess the problem is that as long as the Copy task is not able to tell
> if it did work, we can't decide whether to skip the tests (unless we
> check the binary dir). Isn't it?
>
The Ant copy task keeps a list of the files to copy, but clears it at the end of
the execute method (a comment says this is to clean up so that a single instance
can be reused).

We need to know if the copy did anything (for tasks like processResources) so
that processResources.didWork can be part of the onlyIf closure for test.

I have a replacement implementation of Copy that doesn't use Ant, but I would
want to give it a closer look before making it public.  It is not exactly the
same syntax as the current Copy task, but has some nice extra features including
file renaming based on regular expressions and filtering content during a copy.
It can also track if any files were actually copied.
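
To make the idea concrete, here is a minimal Java sketch of a copy step that renames files with a regular expression and reports which files it actually copied. It is not the replacement implementation mentioned above, which is not shown in this thread; TrackingCopy and its copy method are hypothetical names, and the sketch only handles regular files directly under the source directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical copy helper; not Ant's Copy task and not Gradle's.
class TrackingCopy {
    // Copies regular files from srcDir to destDir, renaming each file by
    // replacing matches of 'pattern' with 'replacement'. Files whose target
    // is at least as new as the source are skipped. Returns the targets
    // that were actually written, so an empty list means "did no work".
    static List<Path> copy(Path srcDir, Path destDir, String pattern, String replacement)
            throws IOException {
        Files.createDirectories(destDir);
        List<Path> copied = new ArrayList<>();
        try (Stream<Path> files = Files.list(srcDir)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(src)) {
                    continue;
                }
                String newName = src.getFileName().toString().replaceAll(pattern, replacement);
                Path dest = destDir.resolve(newName);
                boolean upToDate = Files.exists(dest)
                        && Files.getLastModifiedTime(dest)
                                .compareTo(Files.getLastModifiedTime(src)) >= 0;
                if (upToDate) {
                    continue;
                }
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                copied.add(dest);
            }
        }
        return copied;
    }
}
```

A task built on this could set didWork to !copied.isEmpty(), which is the information the Ant task discards at the end of execute.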

>
>> 5) A start at OptimizationHelper.isNeeded method.  This will require
>> some additional dependency management features, so I stopped
>> development until I got some feedback on this whole approach.
>
> I have to think about the whole 'needed' thing. I will give feedback later.
>
> - Hans
>
> --
> Hans Dockter
> Gradle Project Manager
> http://www.gradle.org
>


--
Steve Appling
Automated Logic Research Team


Re: Task Optimization

by hdockter


On Jun 24, 2009, at 10:42 PM, Steve Appling wrote:

>
>
> Hans Dockter wrote:
>> <snip>
>>> 4) I would like to be able to specify that a chain of dependent  
>>> tasks only execute a task if Task.didWork is true for all of its  
>>> dependents.
>> I don't fully understand this. Could you explain this a bit more?
>> <snip>
>> - Hans
>
> Sure - I did not express that very well at all.  I also wrote it  
> before I attempted an implementation, so I think I have a better  
> idea of what might be needed now.
>
> In the syntax that I implemented, you could say:
> test.onlyIf { isNeeded() }
>
> I wanted this to be able to look at the TaskDependencies for the  
> test task and only execute if Task.didWork was true for one of them.  
> I was not able to figure out how to use TaskDependencies to  
> accomplish this. task.getTaskDependencies(task) only returns the  
> tasks that are explicitly added using dependsOn and doesn't seem to  
> take into account the tasks needed to build the artifacts in the  
> configurations that are contained in the TaskDependencies object.

At the moment our task execution graph does not provide this
information, nor does it have a data model for it. What should be
straightforward to do is to add a method to the execution graph that
computes this on the fly for a given task.
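
Such an on-the-fly computation could look roughly like the Java sketch below. GraphTask, ExecutionGraphSketch, allDependencies and isNeeded are hypothetical names used only for illustration, the graph is assumed to be acyclic, and dependencies derived from configurations are assumed to have been added to dependsOn already.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical task node; not Gradle's Task or its execution graph.
class GraphTask {
    final String name;
    final List<GraphTask> dependsOn = new ArrayList<>();
    boolean didWork;

    GraphTask(String name) {
        this.name = name;
    }
}

class ExecutionGraphSketch {
    // Collects every task reachable through dependsOn, direct or indirect.
    // Assumes an acyclic graph, so the task itself is never in the result.
    static Set<GraphTask> allDependencies(GraphTask task) {
        Set<GraphTask> seen = new LinkedHashSet<>();
        Deque<GraphTask> pending = new ArrayDeque<>(task.dependsOn);
        while (!pending.isEmpty()) {
            GraphTask next = pending.pop();
            if (seen.add(next)) {
                pending.addAll(next.dependsOn);
            }
        }
        return seen;
    }

    // A task is "needed" if any task it depends on did work.
    static boolean isNeeded(GraphTask task) {
        for (GraphTask dependency : allDependencies(task)) {
            if (dependency.didWork) {
                return true;
            }
        }
        return false;
    }
}
```

Because isNeeded reads didWork, it only gives a meaningful answer at execution time, after the dependencies have run.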

>
> In this case (a Test task), I would like the isNeeded method to  
> return true if either compile or compileTests didWork() is true or  
> if any of the tasks needed to build artifacts in the testRuntime  
> configuration didWork() was true. Currently it does not check the  
> tasks that might be derived from the configuration.

It is similar to the idea of smart exclusion except that this needs to  
be done at execution time.

>
> I was hoping that a general purpose isNeeded helper could do this  
> for all tasks in the same way, but it is possible that certain  
> subclasses of Task just need their own specific implementations.

Right.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org



Re: Task Optimization

by hdockter


On Jun 24, 2009, at 11:09 PM, Steve Appling wrote:

>
>
> Hans Dockter wrote:
>> Cool. I'm keen to get this into 0.7.
>> On Jun 22, 2009, at 9:03 PM, Steve Appling wrote:
>>> I have a proof of concept implementation of some of this at
>>> git://github.com/sappling/gradle.git in the "opt" branch.
>>>
>>> This includes:
>>> 1) A new onlyIf method on Task
>>> 2) A new didWork method on Task
>>> 3) Implementations of didWork for Compile and GroovyCompile.
>> Amazing. For Ant 1.7.1 you probably could also use the new  
>> updatedProperty. But no such thing exists for Groovyc. Excellent.  
>> And I think we should expose all the information you gather. The  
>> public API of the Compile task could return a list with compiled  
>> files.
>>> I don't think that we can handle Ant's Copy task in this same  
>>> way.  We may have to use a replacement, but this has other  
>>> consequences.
>> I guess the problem is that as long as the Copy task is not able to  
>> tell if it did work, we can't decide whether to skip the tests  
>> (unless we check the binary dir). Isn't it?
> The Ant copy task keeps a list of the files to copy, but clears it  
> at the end of  the execute method (comment says this is to clean up  
> so a single instance can be reused).
>
> We need to know if the copy did anything (for tasks like  
> processResources) so that processResources.didWork can be part of  
> the onlyIf closure for test.
>
> I have a replacement implementation of Copy that doesn't use Ant,  
> but I would want to give it a closer look before making it public.  
> It is not exactly the same syntax as the current Copy task, but has  
> some nice extra features including file renaming based on regular  
> expressions and filtering content during a copy. It can also track  
> if any files were actually copied.

I'm very happy to switch the Copy implementation even if it introduces  
some breaking changes. I'm very interested to have a look at your  
implementation.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org



Re: Task Optimization

by Adam Murdoch



Steve Appling wrote:

> I am interested in ways to short circuit task execution for the
> purpose of optimization.  I would love to see some of this in 0.7 and
> would be glad to contribute.
>
> Here are some ideas:
> 1) Add an "onlyIf" method to Task that is given a closure. The closure
> would be executed before the first action of the task and would cancel
> execution of the task (with appropriate lifecycle message) if it
> returned false.  This closure would have as a delegate an optimization
> container with some helper methods that would provide more convenient
> access to change detection (among other things). Then you could do:
>   mytask.onlyIf {
>     timestampChanged 'src/main/mysrc'
>     // or contentsChanged 'src/main/mysrc'
>   }
>

I think this is a good idea.

> 2) Running a clean should probably remove the change detection state
> information for a project (or at least the clean task should be able
> to be configured to do this conveniently).
>

I think the change detection mechanism should figure out that the output
artifacts don't exist any more instead.

One thing that clean should arguably get rid of is the internal
repository in $rootDir/.gradle. I wonder if it should also clean the
buildSrc project?

> 3) I would like some general way for tasks to indicate that they did
> anything. Perhaps task.getDidWork().  BTW, I figured out how to do
> this for gradle's use of ant.javac and can now tell if it really
> compiled anything.
>

When you say 'it really compiled anything' do you mean you can tell
whether the task decided to invoke javac or not?

I think it would be better if Gradle could figure out whether a task did
anything, rather than require the task writer to do anything. I think we
could assume that if a task executes any task action, it has done work.
If a task wants to do any short-circuiting, it would need to use an
onlyIf() predicate. In addition, if we provided an easy way for a task
to declare its output artifacts, then Gradle could additionally
automatically apply change detection to these output artifacts in order
to decide whether the task did any work.

So, instead of adding a Task.didWork property, perhaps we should merge
this concept with the existing Task.executed property into a single
read-only Task.state property with an enum with values something like:
created, executed, or skipped.
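
A rough Java sketch of such a read-only state property follows. The enum values mirror the ones suggested above; TaskStateSketch and StatefulTask are hypothetical names, not existing Gradle types.

```java
import java.util.function.BooleanSupplier;

// Values taken from the suggestion above; the type name is hypothetical.
enum TaskStateSketch { CREATED, EXECUTED, SKIPPED }

// Hypothetical task whose state can be read but not set from outside.
class StatefulTask {
    private final BooleanSupplier onlyIf;
    private final Runnable action;
    private TaskStateSketch state = TaskStateSketch.CREATED;

    StatefulTask(BooleanSupplier onlyIf, Runnable action) {
        this.onlyIf = onlyIf;
        this.action = action;
    }

    TaskStateSketch getState() {
        return state;
    }

    // Runs the action if the predicate allows it and records the outcome.
    void execute() {
        if (onlyIf.getAsBoolean()) {
            action.run();
            state = TaskStateSketch.EXECUTED;
        } else {
            state = TaskStateSketch.SKIPPED;
        }
    }
}
```

With only these three values, a task that ran its action but changed nothing is still reported as EXECUTED.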

> 4) I would like to be able to specify that a chain of dependent tasks
> only execute a task if Task.didWork is true for all of its
> dependents.  Note that this is not always desired, so you need to be
> able to turn this on and off.  I'm not sure of the best way to
> configure this.  If we use the onlyIf method suggested above, it might
> take another closure to check this that would be returned from a  
> "needed" method.  This would look like:
>   myTask.onlyIf(needed())
>
> This probably should be the default for tests, but perhaps not for all
> Tasks.
>

I'm not sure about this approach.

The tests should run if either the test classes or the classes under
test have changed since the last time we successfully ran the tests.
Arguably a change to the test runtime classpath should also cause the
tests to run. In other words, the tests should be run only if the input
artifacts have changed since the last time we ran the tests. Checking
whether all the dependencies of the test task have executed or not is
only an approximation of this, and not a general solution. For example,
if I assemble my classes under test using, say, 2 independent Compile
tasks, then the test task should run if either task has done something.
Or, I may assemble my classes using some other build tool, so that
there's no task which we can use to check whether or not the classes
have changed.

To me, the key to task optimisation is to base it on the input and
output artifacts of a task. If we make it easy to declare both the input
and output artifacts of a task, we make the model much richer, and from
this we get a lot of goodness.

For example, if we know what the input artifacts for a task are, Gradle
can apply change detection to those input artifacts on the task's
behalf. If we also know which tasks produce those artifacts, then Gradle
can optimise the change detection. Gradle could, for example, when it
knows which task produces a given artifact, simply use the fact that the
producer task executed an action or not to decide whether the input
artifacts have changed, and only fall back to hashing or timestamps or a
Java 7 file watcher or whatever when it doesn't know how the artifact is
produced. Similarly, it could use the fact that a Jar was downloaded by
the dependency management system to decide whether the input artifacts
have changed.
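
The hashing fallback can be sketched in Java as follows. This is an illustration under assumptions, not Gradle's change detection: InputChangeDetector is a hypothetical name, the recorded hashes live only in memory (a real build would have to persist them between runs), and inputs are plain files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that detects changes to a task's input files by hashing.
class InputChangeDetector {
    // Task name -> hash of its inputs as of the last check (in memory only).
    private final Map<String, String> lastSeen = new HashMap<>();

    // Returns true if the inputs differ from the previous check for this task,
    // or if the task has never been checked. Records the new hash either way.
    boolean inputsChanged(String taskName, List<Path> inputs) throws IOException {
        String current = hash(inputs);
        String previous = lastSeen.put(taskName, current);
        return !current.equals(previous);
    }

    // Hashes file paths and contents in sorted order so list order is irrelevant.
    static String hash(List<Path> inputs) throws IOException {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            List<Path> sorted = new ArrayList<>(inputs);
            Collections.sort(sorted);
            for (Path input : sorted) {
                digest.update(input.toString().getBytes(StandardCharsets.UTF_8));
                digest.update(Files.readAllBytes(input));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

When the producer of an input is known, a check like this could be skipped entirely in favour of asking whether that producer did anything.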

Adding input and output artifacts to the model also lets us use this
information to build the DAG, and to be smart about skipping tasks. For
example, if the test task were to declare that it uses the tests classes
directory and the test runtime configuration as input artifacts, then
Gradle would be able to automatically add the tasks that produce these
(if any) to the task dependencies of the test task.
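
Deriving task dependencies from declared artifacts might look like the Java sketch below. ArtifactTask and DependencyInference are hypothetical names, and artifacts are identified by plain strings purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical task that declares the artifacts it consumes and produces.
class ArtifactTask {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    ArtifactTask(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

class DependencyInference {
    // For each task, finds the tasks that produce any of its input artifacts.
    // Inputs with no known producer simply contribute no dependency.
    static Map<String, Set<String>> infer(List<ArtifactTask> tasks) {
        Map<String, Set<String>> producers = new HashMap<>(); // artifact -> producer names
        for (ArtifactTask task : tasks) {
            for (String output : task.outputs) {
                producers.computeIfAbsent(output, key -> new TreeSet<>()).add(task.name);
            }
        }
        Map<String, Set<String>> dependencies = new LinkedHashMap<>();
        for (ArtifactTask task : tasks) {
            Set<String> found = new TreeSet<>();
            for (String input : task.inputs) {
                found.addAll(producers.getOrDefault(input, Collections.<String>emptySet()));
            }
            found.remove(task.name); // a task never depends on itself
            dependencies.put(task.name, found);
        }
        return dependencies;
    }
}
```

The same artifact-to-producer map is what the concurrency constraints would be read from: two tasks that list the same output would both appear under one key.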

Knowing which tasks produce and consume a given artifact also allows us
to extract concurrency constraints from the model. If 2 tasks both
contribute to the production of the same artifact (classes dir, say),
they should not run concurrently. Or if 2 tasks both consume the same
artifact, they should not run concurrently. And obviously a producer and
consumer task for a given artifact should not run concurrently.

Extending this further, if we know the input and output artifacts of a
task, or subgraph of tasks, we can distribute the work to remote machines.

> Javac is already checking to see if the source files are out of date
> with the classes, so I don't think that the javac task needs to use
> the new changedetection.  This would, however let you stop other tasks
> in the chain (like test) if nothing needed to be compiled.  
> (unrelated: I would also like to see an option on compile to use Ant's
> depend task.  I think the current dependencyTracking option doesn't
> work with the modern compiler. )
>
> Other types of tasks could make good use of Tom's change detection.
>
> 5) We probably want a command line option to be able to disable all of
> these optimizations.  Sometimes you really want to force a build with
> no optimizations (without running clean).
>
>
> In the race for speed, Gradle will probably never catch Ant in a clean
> build (at least while you are delegating most of the expensive stuff
> to ant).

I wonder. The richer our model, the more scope we have to optimise
without the build script author or task author having to do anything
special. We can automatically extract parallelism. We can inline and
batch tasks. We can distribute bits of the build. We can reuse work that
other machines have already done.


Adam



Re: Task Optimization

by Steve Appling



Adam Murdoch wrote:

>
>
> Steve Appling wrote:
>> I am interested in ways to short circuit task execution for the
>> purpose of optimization.  I would love to see some of this in 0.7 and
>> would be glad to contribute.
>>
>> Here are some ideas:
>> 1) Add an "onlyIf" method to Task that is given a closure. The closure
>> would be executed before the first action of the task and would cancel
>> execution of the task (with appropriate lifecycle message) if it
>> returned false.  This closure would have as a delegate an optimization
>> container with some helper methods that would provide more convenient
>> access to change detection (among other things). Then you could do:
>>   mytask.onlyIf {
>>     timestampChanged 'src/main/mysrc'
>>     // or contentsChanged 'src/main/mysrc'
>>   }
>>
>
> I think this is a good idea.
>
>> 2) Running a clean should probably remove the change detection state
>> information for a project (or at least the clean task should be able
>> to be configured to do this conveniently).
>>
>
> I think the change detection mechanism should figure out that the output
> artifacts don't exist any more instead.
>
> One thing that clean should arguably get rid of is the internal
> repository in $rootDir/.gradle. I wonder if it should also clean the
> buildSrc project?
>
>> 3) I would like some general way for tasks to indicate that they did
>> anything. Perhaps task.getDidWork().  BTW, I figured out how to do
>> this for gradle's use of ant.javac and can now tell if it really
>> compiled anything.
>>
>
> When you say 'it really compiled anything' do you mean you can tell
> whether the task decided to invoke javac or not?

Ant's javac scans the source and class files itself to see if any source files
are newer than the corresponding class files.  If so, it then calls Java's javac
with this list of outdated files.  After executing the Gradle task, I can
determine which files were actually passed to Java's javac by Ant.  For several
types of tasks (compile, groovycompile, copy, directory, zip, jar, tar), the
task is already doing its own optimization by comparing source timestamps to
some target during execution.  It is possible to execute the task without it
having any side effects.  Since most of them have the information about what
they actually did, it seems better (and faster) to use this information instead
of scanning source / output a second time externally to see what changed.

>
> I think it would be better if Gradle could figure out whether a task did
> anything, rather than require the task writer to do anything.
I would like this, but I'm not sure how to accomplish it in the general case.
Tasks may have input/output other than just a set of files (like network
operations, web services calls, deploy over webdav).  Even tasks like copy may
do the work in a way that makes it hard to see what happened after the fact.  I
know that we have several tasks which have output that is put into the same
directory with the output from other tasks.  It would not be sufficient to just
scan the output directories after each execution since they would also include
the results from other tasks.  If you never allow parallel execution, then you
could scan the output directories both before and after a task's execution, but
this seems expensive.  If the task already knows what it did, why not make use
of that information?
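
For comparison, the before/after scan mentioned above could look roughly like the following sketch. The snapshot is modelled as a map from path to last-modified time; in a real build it would have to be produced by walking the output directory twice, which is where the cost comes from. The class and method names are invented for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; a snapshot is just path -> last-modified time.
class OutputSnapshotDiff {
    /** Paths that were added, removed, or re-timestamped between two snapshots. */
    static Set<String> changedPaths(Map<String, Long> before, Map<String, Long> after) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            Long old = before.get(entry.getKey());
            if (old == null || !old.equals(entry.getValue())) {
                changed.add(entry.getKey()); // new or modified file
            }
        }
        for (String path : before.keySet()) {
            if (!after.containsKey(path)) {
                changed.add(path); // deleted file
            }
        }
        return changed;
    }
}
```

Note that this only attributes changes to the right task if nothing else writes to the same directory between the two snapshots.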

For custom tasks (instances of DefaultTask), it seems simpler for a build writer
to set some state to indicate if they did anything than to specify the set of
files to check.  If this check is best done by comparing files, then we should
provide easy ways to call into the change detection code to set this state.

> I think we could assume that if a task executes any task action, it has done work.
I don't think this is true.  As I discussed above, there are many tasks (like
compile) that execute their task action, but decide during execution to not
cause any side effects.

> If a task wants to do any short-circuiting, it would need to use an
> onlyIf() predicate. In addition, if we provided any easy way for a task
> to declare its output artifacts, then Gradle can additionally
> automatically apply change detection to these output artifacts in order
> to decide whether the task did any work.
>
> So, instead of adding a Task.didWork property, perhaps we should merge
> this concept with the existing Task.executed property into a single
> read-only Task.state property with an enum with values something like:
> created, executed, or skipped.
>

I think you should be able to distinguish "executed and did something" from
"executed and didn't do anything".
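
One way to reconcile the two suggestions would be a single state value that still keeps the distinction. The enum below is only a sketch of that idea; the constant and accessor names are hypothetical and do not correspond to anything in Gradle.

```java
// Hypothetical state model; not Gradle's actual Task API.
enum SketchTaskState {
    CREATED(false, false),          // task exists but has not been run
    SKIPPED(false, false),          // an onlyIf-style check vetoed execution
    EXECUTED_NO_WORK(true, false),  // actions ran but changed nothing
    EXECUTED_DID_WORK(true, true);  // actions ran and had side effects

    private final boolean executed;
    private final boolean didWork;

    SketchTaskState(boolean executed, boolean didWork) {
        this.executed = executed;
        this.didWork = didWork;
    }

    boolean isExecuted() { return executed; }

    boolean getDidWork() { return didWork; }
}
```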

>> 4) I would like to be able to specify that a chain of dependent tasks
>> only execute a task if Task.didWork is true for all of its
>> dependents.  Note that this is not always desired, so you need to be
>> able to turn this on and off.  I'm not sure of the best way to
>> configure this.  If we use the onlyIf method suggested above, it might
>> take another closure to check this that would be returned from a  
>> "needed" method.  This would look like:
>>   myTask.onlyIf(needed())
>>
>> This probably should be the default for tests, but perhaps not for all
>> Tasks.
>>
>
> I'm not sure about this approach.
After trying to implement some of this, I no longer like all of this approach
either.  I don't think there is anything appropriate to do "for a chain of
dependent tasks".  I do still like the general idea of onlyIf { isNeeded() }.  I
think that isNeeded may be a good place to contain any mechanism for Gradle to
automatically determine if artifacts it depends on changed or tasks it depends on
did work.
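
A minimal sketch of the "tasks it depends on did work" half of isNeeded, in plain Java. Everything here is an assumption made for illustration: the class name, the public fields, the rule that one dependency having done work is enough, and the choice to treat a task without dependencies as always needed. The artifact-change half is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical task model; not Gradle's Task or TaskDependency types.
class NeededSketchTask {
    final String name;
    final List<NeededSketchTask> dependencies = new ArrayList<>();
    boolean didWork; // set by the task itself after it has executed

    NeededSketchTask(String name) {
        this.name = name;
    }

    /** True when at least one direct dependency reported that it did work. */
    boolean isNeeded() {
        if (dependencies.isEmpty()) {
            return true; // assumption: nothing to base a skip decision on
        }
        for (NeededSketchTask dependency : dependencies) {
            if (dependency.didWork) {
                return true;
            }
        }
        return false;
    }
}
```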

>
> The tests should run if either the test classes or the classes under
> test have changed since last time we successfully ran the tests.
> Arguably a change to the test runtime classpath should also cause the
> tests to run. In other words, the tests should be run only if the input
> artifacts have changed since last time we ran the tests. Checking
> whether all the dependencies of the test task have executed or not is
> only an approximation of this, and not a general solution. For example,
> if I assemble my classes under test using, say, 2 independent Compile
> tasks, then the test task should run if either task has done something.
> Or, I may assemble my classes using some other build tool, so that
> there's no task which we can use to check whether or not the classes
> have changed.
>
> To me, the key to task optimisation is to base it on the input and
> output artifacts of a task. If we make it easy to declare both the input
> and output artifacts of a task, we make the model much richer, and from
> this we get a lot of goodness.
>
> For example, if we know what the input artifacts for a task are, Gradle
> can apply change detection to those input artifacts on the task's
> behalf. If we also know which tasks produce those artifacts, then Gradle
> can optimise the change detection. Gradle could, for example, when it
> knows which task produces a given artifact, simply use the fact that the
> producer task executed an action or not to decide whether the input
> artifacts have changed, and only fall back to hashing or timestamps or a
> Java 7 file watcher or whatever when it doesn't know how the artifact is
> produced. Similarly, it could use the fact that a Jar was downloaded by
> the dependency management system to decide whether the input artifacts
> have changed.
>
> Adding input and output artifacts to the model also lets us use this
> information to build the DAG, and to be smart about skipping tasks. For
> example, if the test task were to declare that it uses the tests classes
> directory and the test runtime configuration as input artifacts, then
> Gradle would be able to automatically add the tasks that produce these
> (if any) to the task dependencies of the test task.
>
> Knowing which tasks produce and consume a given artifact also allows us
> to extract concurrency constraints from the model. If 2 tasks both
> contribute to the production of the same artifact (classes dir, say),
> they should not run concurrently. Or if 2 tasks both consume the same
> artifact, they should not run concurrently. And obviously a producer and
> consumer task for a given artifact should not run concurrently.
>
> Extending this further, if we know the input and output artifacts of a
> task, or subgraph of tasks, we can distribute the work to remote machines.
>
I think it might be a good approach to first add support for the onlyIf clause
and some helpers to allow manual use of optimization and then investigate
techniques to allow Gradle to be smarter about this and do more automatically.
If Gradle just adds optimization rules to tasks in the built in plugins and
doesn't provide automated optimization for custom tasks you will still get a lot
of benefit.

I generally like the idea of a richer model that has information about what each
task consumes and produces, but I'm not clear exactly how this would be
specified.  I don't want to require the build writer to duplicate information
about what the task inputs / outputs are.  I would love to see some examples of
how this would work for general tasks.
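
To make the question concrete, here is one possible shape for such a model, sketched in plain Java. It is not a proposal for the actual syntax: the class names, the consumes/produces methods, and the use of strings as artifact identifiers are all invented for the example. Each task declares its inputs and outputs once, and the task dependencies are then derived from who produces what.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a task that declares what it consumes and produces.
class ArtifactTask {
    final String name;
    final Set<String> inputs = new LinkedHashSet<>();
    final Set<String> outputs = new LinkedHashSet<>();

    ArtifactTask(String name) { this.name = name; }

    ArtifactTask consumes(String artifact) { inputs.add(artifact); return this; }

    ArtifactTask produces(String artifact) { outputs.add(artifact); return this; }
}

class ArtifactGraph {
    /** For each task, the names of the tasks that produce at least one of its inputs. */
    static Map<String, Set<String>> inferDependencies(List<ArtifactTask> tasks) {
        // Index: artifact -> names of the tasks that produce it.
        Map<String, Set<String>> producers = new HashMap<>();
        for (ArtifactTask task : tasks) {
            for (String output : task.outputs) {
                producers.computeIfAbsent(output, key -> new LinkedHashSet<>()).add(task.name);
            }
        }
        Map<String, Set<String>> dependencies = new LinkedHashMap<>();
        for (ArtifactTask task : tasks) {
            Set<String> found = new LinkedHashSet<>();
            for (String input : task.inputs) {
                Set<String> producing = producers.get(input);
                if (producing != null) {
                    found.addAll(producing); // inputs nobody produces add no dependency
                }
            }
            found.remove(task.name); // a task never depends on itself
            dependencies.put(task.name, found);
        }
        return dependencies;
    }
}
```

In this sketch an input with no known producer (for example a source directory) simply adds no dependency, which is the case where change detection on the files themselves would have to take over.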

>> Javac is already checking to see if the source files are out of date
>> with the classes, so I don't think that the javac task needs to use
>> the new changedetection.  This would, however let you stop other tasks
>> in the chain (like test) if nothing needed to be compiled.  
>> (unrelated: I would also like to see an option on compile to use Ant's
>> depend task.  I think the current dependencyTracking option doesn't
>> work with the modern compiler. )
>>
>> Other types of tasks could make good use of Tom's change detection.
>>
>> 5) We probably want a command line option to be able to disable all of
>> these optimizations.  Sometimes you really want to force a build with
>> no optimizations (without running clean).
>>
>>
>> In the race for speed, Gradle will probably never catch Ant in a clean
>> build (at least while you are delegating most of the expensive stuff
>> to ant).
>
> I wonder. The richer our model, the more scope we have to optimise
> without the build script author or task author having to do anything
> special. We can automatically extract parallelism. We can inline and
> batch tasks. We can distribute bits of the build. We can reuse work that
> other machines have already done.
>
>
> Adam
>

--
Steve Appling
Automated Logic Research Team

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: Task Optimization

by Steve Appling



Hans Dockter wrote:
>
> I'm very happy to switch the Copy implementation even if it introduces
> some breaking changes. I'm very interested to have a look at your
> implementation.
>
> - Hans
>

I'll put my changes in a public repo in a few days and start another thread here
with information about the syntax changes and new features.

--
Steve Appling
Automated Logic Research Team




Re: Task Optimization

by hdockter


>
> One thing that clean should arguably get rid of is the internal  
> repository in $rootDir/.gradle.

That's where we have our cached stuff. I'm wondering if clean should
really affect the cache. Many people always do a clean when they do a
build. That means buildSrc would always be built and the build
script would always be compiled. That's usually not what they want, I
think.

> I wonder if it should also clean the buildSrc project?

The buildSrc jar is cached. If the cached jar is not available (e.g.
deleted by -C rebuild) or is out of date, the buildSrc project is
rebuilt with a clean.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org





Re: Task Optimization

by hdockter


>
>> 4) I would like to be able to specify that a chain of dependent  
>> tasks only execute a task if Task.didWork is true for all of its  
>> dependents.  Note that this is not always desired, so you need to  
>> be able to turn this on and off.  I'm not sure of the best way to  
>> configure this.  If we use the onlyIf method suggested above, it  
>> might take another closure to check this that would be returned  
>> from a  "needed" method.  This would look like:
>>  myTask.onlyIf(needed())
>>
>> This probably should be the default for tests, but perhaps not for  
>> all Tasks.
>>
>
> I'm not sure about this approach.
>
> The tests should run if either the test classes or the classes under  
> test have changed since last time we successfully ran the tests.  
> Arguably a change to the test runtime classpath should also cause  
> the tests to run. In other words, the tests should be run only if
> the input artifacts have changed since last time we ran the
> tests. Checking whether all the dependencies of the test task have  
> executed or not is only an approximation of this, and not a general  
> solution. For example, if I assemble my classes under test using,  
> say, 2 independent Compile tasks, then the test task should run if  
> either task has done something. Or, I may assemble my classes using  
> some other build tool, so that there's no task which we can use to  
> check whether or not the classes have changed.
>
> To me, the key to task optimisation is to base it on the input and  
> output artifacts of a task. If we make it easy to declare both the  
> input and output artifacts of a task, we make the model much richer,  
> and from this we get a lot of goodness.
>
> For example, if we know what the input artifacts for a task are,  
> Gradle can apply change detection to those input artifacts on the  
> task's behalf. If we also know which tasks produce those artifacts,  
> then Gradle can optimise the change detection. Gradle could, for  
> example, when it knows which task produces a given artifact, simply  
> use the fact that the producer task executed an action or not to  
> decide whether the input artifacts have changed, and only fall back  
> to hashing or timestamps or a Java 7 file watcher or whatever when  
> it doesn't know how the artifact is produced. Similarly, it could  
> use the fact that a Jar was downloaded by the dependency management  
> system to decide whether the input artifacts have changed.
>

This is very interesting. I'm just trying to play a little with some  
terminology. There are output-affecting input values (e.g. classpaths,  
src dirs, compiler options, ...) and also some non-output-affecting  
input values like log level. The output-affecting input values can be
subdivided into belonging to something like an Outputter and something  
like plain input values. Outputters can tell if they did some work,  
for plain input values the task needs its own history and change  
detection management. By providing a rich domain model important types  
of plain input values can be turned into outputters (e.g. SourceDir).  
And for a subset of the remaining range of input value types we should  
be able to provide a nice toolkit that makes it easy to define change  
detection.

With the above model, the default behavior of onlyIf is  
inputValues.haveChanged == true
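
A minimal sketch of what "the task needs its own history" could mean for plain input values, written in plain Java. The class, its methods, and the idea of flattening every output-affecting value to a string are assumptions for the example; the history is kept in memory only, whereas a real implementation would need to persist it between builds. Non-output-affecting values such as the log level are simply left out of the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical history of output-affecting input values, kept per task.
class InputValueHistory {
    private final Map<String, Map<String, String>> lastSeen = new HashMap<>();

    /** True if the task has no history yet or any value differs from the last record. */
    boolean haveChanged(String taskName, Map<String, String> currentValues) {
        Map<String, String> previous = lastSeen.get(taskName);
        return previous == null || !previous.equals(currentValues);
    }

    /** Call after a successful execution to remember the values that were used. */
    void record(String taskName, Map<String, String> currentValues) {
        lastSeen.put(taskName, new HashMap<>(currentValues)); // defensive copy
    }
}
```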

There are also scenarios like: This task should not be executed on  
Friday. I think they don't fit into the input value model. So we still  
need to accommodate custom onlyIf rules.
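
The following sketch shows how a default rule and a custom rule such as the Friday example might sit side by side. It is plain Java with invented names, and it assumes that a task runs only when every registered rule returns true; the thread does not settle how several rules should combine. The current day is passed in as a supplier so the rule can be exercised without depending on the real calendar.

```java
import java.time.DayOfWeek;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical rule holder; not part of Gradle.
class OnlyIfRules {
    private final List<BooleanSupplier> rules = new ArrayList<>();

    OnlyIfRules onlyIf(BooleanSupplier rule) {
        rules.add(rule);
        return this;
    }

    /** Assumption: the task runs only when every rule agrees. */
    boolean shouldRun() {
        for (BooleanSupplier rule : rules) {
            if (!rule.getAsBoolean()) {
                return false;
            }
        }
        return true;
    }

    /** Custom rule: veto execution on one day of the week. */
    static BooleanSupplier notOn(DayOfWeek blocked, Supplier<DayOfWeek> today) {
        return () -> today.get() != blocked;
    }
}
```

In a build this might be wired up as onlyIf(() -> inputsChanged) followed by onlyIf(OnlyIfRules.notOn(DayOfWeek.FRIDAY, () -> LocalDate.now().getDayOfWeek())), where inputsChanged stands for whatever the default input-value check reports.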

One of the interesting issues is to make it easy to write such tasks.

> Adding input and output artifacts to the model also lets us use this  
> information to build the DAG, and to be smart about skipping tasks.  
> For example, if the test task were to declare that it uses the tests  
> classes directory and the test runtime configuration as input  
> artifacts, then Gradle would be able to automatically add the tasks  
> that produce these (if any) to the task dependencies of the test task.

One thing that comes to mind is a scenario where two tasks output
into classesDir, but a third task only wants to depend on one
of those other tasks. Yet I see your point. It is a very interesting
question how to integrate the concepts of the input/output model with
the DAG model. Again, a richer domain model can help. If the test
task declares that it uses, for example, a SourceDir object as an input
value, my scenario from above could easily be solved. But you could
ask: why not declare a dependsOn relation from test to SourceDir? I
think this is basically what we do with this new input model, with the  
difference that it is more specific. Instead of just providing a  
dependsOn method, an input value of type SourceDir could be translated  
into: This is a dependsOn for the purpose of having the classpath of  
the production code in the runtime classpath of the tests.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org





Re: Task Optimization

by Adam Murdoch-2



Steve Appling wrote:
> I have a proof of concept implementation of some of this at
> git://github.com/sappling/gradle.git in the "opt" branch.
>

This looks pretty good. Some comments below.

> This includes:
> 1) A new onlyIf method on Task

We will need an overload of onlyIf() which takes a TaskAction, so that
build logic implemented in Java (eg the Java plugin) can make use of
this too.

> 2) A new didWork method on Task

The didWork property should default to false, until the execute() method
first executes a task action, when it should change to true.
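
The semantics described in these two comments can be sketched as follows. This is a standalone model in plain Java, not the code from the "opt" branch: the class name is invented, and java.util.function.Predicate stands in for whatever interface the Java-friendly overload of onlyIf() would actually take. didWork starts as false, flips to true just before the first action runs, and an action may set it back to false if it finds it had nothing to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a task; names do not match Gradle's real API.
class OnlyIfTask {
    private final List<Runnable> actions = new ArrayList<>();
    private Predicate<OnlyIfTask> onlyIf = task -> true; // run by default
    private boolean didWork = false;                      // nothing done yet

    OnlyIfTask doLast(Runnable action) {
        actions.add(action);
        return this;
    }

    // A plain interface, so build logic written in Java can supply it too.
    OnlyIfTask onlyIf(Predicate<OnlyIfTask> predicate) {
        this.onlyIf = predicate;
        return this;
    }

    boolean getDidWork() { return didWork; }

    // Lets an action report that it turned out to be a no-op.
    void setDidWork(boolean didWork) { this.didWork = didWork; }

    void execute() {
        if (!onlyIf.test(this)) {
            return; // skipped: didWork stays false
        }
        if (!actions.isEmpty()) {
            didWork = true; // flips just before the first action runs
        }
        for (Runnable action : actions) {
            action.run();
        }
    }
}
```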

> 3) Implementations of didWork for Compile and GroovyCompile.

Could you add some unit test coverage for this?

>   I don't think that we can handle Ant's Copy task in this same way.  
> We may have to use a replacement, but this has other consequences.
> 4) Changes to src/samples/java/quickstart/build.gradle to demo didWork
> and onlyIf.

I don't think these belong in the quickstart sample. It would be
unfortunate if users really had to think about optimisation when being
introduced to their very first Gradle build. A better place for this
would be in the Java plugin.

Some integration test coverage would be good too.

> 5) A start at OptimizationHelper.isNeeded method.  This will require
> some additional dependency management features, so I stopped
> development until I got some feedback on this whole approach.
>

I'm still not convinced about this one. I'd rather get the above bits
into the 0.7 release and leave this one out until we have a better idea
of how it should work (which may also be in time for 0.7). If we have
Task.onlyIf(), one can very easily add an equivalent of isNeeded() in
their build script.





Re: Task Optimization

by Steve Appling



Adam Murdoch wrote:

>
>
> Steve Appling wrote:
>> I have a proof of concept implementation of some of this at
>> git://github.com/sappling/gradle.git in the "opt" branch.
>>
>
> This looks pretty good. Some comments below.
>
>> This includes:
>> 1) A new onlyIf method on Task
>
> We will need an overload of onlyIf() which takes a TaskAction, so that
> build logic implemented in Java (eg the Java plugin) can make use of
> this too.
That's a good point, I had not thought of that.

>
>> 2) A new didWork method on Task
>
> The didWork property should default to false, until the execute() method
> first executes a task action, when it should change to true.
>
>> 3) Implementations of didWork for Compile and GroovyCompile.
>
> Could you add some unit test coverage for this?
>
As I said in my post, this was just a proof of concept, not a final
implementation.  I wanted to get feedback about this general approach.  I'll add
unit/integration tests and more javadoc comments.

>>   I don't think that we can handle Ant's Copy task in this same way.  
>> We may have to use a replacement, but this has other consequences.
>> 4) Changes to src/samples/java/quickstart/build.gradle to demo didWork
>> and onlyIf.
>
> I don't think these belong in the quickstart sample. It would be
> unfortunate if users really had to think about optimisation when being
> introduced to their very first Gradle build. A better place for this
> would be in the Java plugin.

This was not intended to ship as part of the quickstart samples, but was a
convenient place to show the syntax for comment.

>
> Some integration test coverage would be good too.
>
>> 5) A start at OptimizationHelper.isNeeded method.  This will require
>> some additional dependency management features, so I stopped
>> development until I got some feedback on this whole approach.
>>
>
> I'm still not convinced about this one. I'd rather get the above bits
> into the 0.7 release and leave this one out until we have a better idea
> of how it should work (which may also be in time for 0.7). If we have
> Task.onlyIf(), one can very easily add an equivalent of isNeeded() in
> their build script.
>
I agree.


--
Steve Appling
Automated Logic Research Team




Re: Task Optimization

by Adam Murdoch-2



Steve Appling wrote:

>
>
> Hans Dockter wrote:
>> <snip>
>>
>>> 4) I would like to be able to specify that a chain of dependent
>>> tasks only execute a task if Task.didWork is true for all of its
>>> dependents.
>>
>> I don't fully understand this. Could you explain this a bit more?
>>
>> <snip>
>>
>> - Hans
>>
>
> Sure - I did not express that very well at all.  I also wrote it
> before I attempted an implementation, so I think I have a better idea
> of what might be needed now.
>
> In the syntax that I implemented, you could say:
> test.onlyIf { isNeeded() }
>
> I wanted this to be able to look at the TaskDependencies for the test
> task and only execute if Task.didWork was true for one of them. I was
> not able to figure out how to use TaskDependencies to accomplish this.
> task.getTaskDependencies(task) only returns the tasks that are
> explicitly added using dependsOn and doesn't seem to take into account
> the tasks needed to build the artifacts in the configurations that are
> contained in the TaskDependencies object.
>

It should return both types of tasks. However, it will only start doing
this once all the projects have been evaluated, which is fine for
implementing isNeeded()


Adam





Re: Task Optimization

by Steve Appling

I've made a Jira issue for this as a new feature and submitted a patch.  See
http://jira.codehaus.org/browse/GRADLE-533

This is just a starting point to enable users to manually specify optimization
conditions.  We need to add some built-in support for optimizing the tasks added
by the built-in plugins.  I had problems last week when I was trying to do this.
I'll try to work on this more tomorrow and start another thread with a
description of the issues I encountered.
--
Steve Appling
Automated Logic Research Team
