Task Optimization

Hi,

I have implemented task optimization functionality that we might put into 0.8. I have uploaded my branch to: http://github.com/hansd/gradle/tree/optim

A couple of comments:

1.) The task history is now stored in the Gradle user home, under a hash that relates it to the actual project. The base for the hash is the path of the root dir. We might have issues if a subproject takes part in multiple multi-project builds and its output is sensitive to the respective multi-project build. The only way I see to solve such a problem would be to have multiple output dirs.

2.) Each task now has a doesOutputExists() method which defaults to false. So far all archive tasks have a custom implementation which checks for the existence of the archive. The test task also has a custom implementation which checks for at least one test results file. I hope that we find a way to automate this in 0.9 by introducing a generic notion of task output.

3.) So far there are onlyIf implementations only for the test and the jar task provided by the Java plugin. Tomorrow I will add an onlyIf modification for the test task for the case where the Groovy plugin is applied. For 0.9 we want to automate the onlyIf statements based on the information we have on the input arguments of a task.

4.) What about the other tasks? For Java compile, the Ant javac task has its own optimization that checks for changed files. I'm not sure about groovyc, I need to check. The Ant Javadoc/Groovydoc tasks do not check for changed files. To optimize them we would need to check for changed source files. The same is true for the code quality stuff. I'm not sure whether I will have time to get this done before 0.8. I would use Tom's change detection stuff. I haven't had a look at that yet. For 0.9 I guess the SourceSets will be a good place for source change detection. For 0.8 it might already be good enough to distinguish between "no changes, do nothing" and "do the full thing".

5.) The onlyIf optimization needs to be disabled if any build.gradle which is part of the multi-project build, the settings.gradle, or an init.gradle changes. Therefore a ScriptSource object now has a method hasChanged which defaults to true. The DefaultScriptCompilerFactory sets it to false if a script is read from the cache. I'm not very happy about the latter mechanism. To me this looks like a hint that the ScriptSource should be responsible for the compilation, instead of the compile class having a side effect on the state of ScriptSource. I will think about this in more detail tomorrow.

6.) The GradleInternal class now exposes the settings and the init script ScriptSource objects. It also provides a convenience method to check whether any ScriptSource object has changed. To get hold of the settings object it registers as a BuildListener. I think there should be a better way. I will think more about this tomorrow.

I'm not completely sure whether we want to push this into 0.8 or not. Feedback is welcome.

@Steve, John, Mike: Would the state of change detection described above be helpful for your large enterprise build?

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org

---------------------------------------------------------------------
To unsubscribe from this list, please visit:
http://xircles.codehaus.org/manage_email
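Editorial sketch of point 1.), written in Python purely for illustration (it is not code from the branch): the history location is derived from a hash of the root dir path. The directory name `taskHistory`, the function name, and the choice of SHA-1 are assumptions; the message only says that "some hash" of the root dir path is used.

```python
import hashlib
from pathlib import Path


def history_dir(gradle_user_home: Path, root_dir: Path) -> Path:
    """Return a per-build history directory under the user home.

    Hypothetical layout: <user home>/taskHistory/<sha1 of root dir path>.
    """
    # Hash the absolute root dir path so that each build gets its own store.
    digest = hashlib.sha1(str(root_dir.resolve()).encode("utf-8")).hexdigest()
    return gradle_user_home / "taskHistory" / digest
```

Under this scheme two builds with different root dirs never share history, but one root dir that is built in several variants (for example with different output dirs) maps to a single history directory.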
|
|
Re: Task Optimization

On Thu, Sep 24, 2009 at 4:47 PM, Hans Dockter <mail@...> wrote:
Hi,

We have an even worse case than that: the same multi-project build is used to produce many different outputs (based on some flag). Think about a project that gets built with some subtle and some not-so-subtle changes for different customers. That's what we are doing. We handle this (in part) by redirecting the build output to a different directory for each case. This means that storing the history in the user home will not work well for us. Maybe storing it in the project's build directory would be better? However, it's not build output but state, which is what the .gradle directories used to be for. But I'm really happy with those having been removed from my subproject directories! I'm not sure what to do here. I'll want to think about it and maybe try it out tomorrow before commenting further.

--
John Murph
Automated Logic Research Team
|
|
Re: Task Optimization

Hi,

It sounds to me like the generic solution might actually be easier than the hard-coded solution, once you chase down all the edge cases, and will also end up more accurate and reusable. Given that we want to throw away the hard-coded solution as soon as 0.8 is out and replace it with a generic solution, I wonder if it's worth pursuing the hard-coded solution at all.

Hans Dockter wrote:
> Hi,
>
> I have implemented a task optimization functionality that we might put
> into 0.8. I have uploaded my branch to:
> http://github.com/hansd/gradle/tree/optim
>
> A couple of comments:
>
> 1.) The task history is now stored in gradle user home with some hash
> that relates it to the actual project. The base for the hash is the
> path of the root dir. We might have issues if a subproject takes part
> in multiple multi-project builds, if the output is sensitive to the
> respective multi-project build. The only way I see to solve such a
> problem, would be to have multiple output dirs.

We want a unique identifier for the build, not for the project. At this stage, the settings dir path would do. Or the project dir of the root project.

>
> 2.) Each task has a now doesOutputExists() method which defaults to
> false. So far all archive tasks have a custom implementation which
> checks for the existence of the archive. The test task also has a
> custom implementation which checks for at least one test results file.
> I hope that we find a way to automate this in 0.9 by introducing a
> generic notion of task output.

We already have the notion to some degree: properties can be marked with @OutputFile and @OutputDirectory. The default doesOutputExists() could make use of these.

>
> 3.) So far there are onlyIf implementations only for the test and the
> jar task provided by the Java plugin. I will add an onlyIf
> modification for the test task when the Groovy plugin is applied
> tomorrow. For 0.9 we want to automate the onlyIf statements based on
> the information we have on the input arguments of a task.
>
> 4.) What about the other tasks? For java compile the Ant javac task
> has its own optimization checking for changed files. I'm not sure
> about groovyc, I need to check. The Ant Javadoc/Groovydoc tasks do not
> check for changed files. To optimize them we would need to check for
> changed source files. The same is true for the code quality stuff. I'm
> not sure whether I will have time to get this done before 0.8. I would
> use Tom's change detection stuff. I haven't had a look at that yet.
> For 0.9 I guess the SourceSet's will be a good place for source change
> detection. For 0.8 it might be already good enough to distinguish
> between no changes/do nothing and do the full thing.
>

I think you can pretty quickly do something general for all tasks with file inputs:

- In the onlyIf predicate, calculate the set of (file path, timestamp) for all input files in the history. You could create a hash from this.

- In the onlyIf predicate, skip the task if the input files hash == the input files hash from last successful execution and task.doesOutputExists()

- execute the task

- store the input files hash in the history.

> 5.) The onlyIf optimization needs to be disabled if any build.gradle
> which is part of the multi-project build, the settings.grade or an
> init.gradle changes. Therefore a ScriptSource object now has a method
> hasChanged which defaults to true. The DefaultScriptCompilerFactory
> sets it to false if a script is read from the cache. I'm not very
> happy about the latter mechanism. To me this looks like a hint that
> the ScriptSource should be responsible for the compilation, instead of
> the compile class having a side effect on the state of ScriptSource. I
> will think about this in more detail tomorrow.
>

I think a better approach is to use the properties of the task. This is more accurate, in that it catches changes to the task configuration that aren't the result of changes to the build/init/settings scripts. Some types of changes we don't catch by checking if the scripts have changed:

* Task is configured using -PsomeProperty=value, and that value is different to last execution.
* Task is configured using a system property, and that value is different to last execution.
* Task is configured based on the DAG, and the DAG contains different tasks to last execution.
* Task is configured by a 3rd party plugin, and that plugin has changed since last execution.
* Task is configured by buildSrc code, and that code has changed since last execution.
* Task is configured using properties from an imported build.xml, and that build.xml has changed.
* Task is configured using properties from gradle.properties, ...
* ... you get the idea ...

So, checking whether the scripts have changed since last execution doesn't come close to accurately detecting if we need to re-execute a task. It also means we unnecessarily re-execute tasks when an unrelated change has been made to the build script.

I think accuracy is really important with this stuff. It absolutely must be reliable, or people will just run clean all the time to get a reliable build. We want to avoid this.

I would suggest instead that we add an @Input annotation which one can use to mark up the properties of a task which contribute in some significant way to the output of the task. The input of a task is stored in the history, and the set of input files is simply treated as one piece of input.

> 6.) The GradleInternal class exposes now the settings and the init
> script ScriptSource objects. It also provides a convenience method to
> check whether any ScriptSource object has changed. To get hold of the
> settings object it registers as a BuildListener. I think there should
> be a better way. I will think more about this tomorrow.
>

Remove the settings file, perhaps? :)

> I'm not completely sure whether we want to push this into 0.8 or not.
> Feedback is welcome.
>

I don't think it will be reliable enough.

Adam
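Editorial sketch of the four-step file-input check proposed in the message above, written in Python for illustration only. It assumes a plain dict as the history store and a callable standing in for task.doesOutputExists(); the names `input_files_hash` and `run_if_needed` are hypothetical and do not come from Gradle.

```python
import hashlib
import os
from typing import Callable, Dict, Iterable


def input_files_hash(paths: Iterable[str]) -> str:
    """Hash the set of (file path, timestamp) pairs of the input files."""
    h = hashlib.sha1()
    for path in sorted(paths):
        mtime = os.stat(path).st_mtime_ns
        h.update(f"{path}\0{mtime}\n".encode("utf-8"))
    return h.hexdigest()


def run_if_needed(
    task_path: str,
    input_files: Iterable[str],
    output_exists: Callable[[], bool],
    action: Callable[[], None],
    history: Dict[str, str],
) -> bool:
    """Execute the task unless inputs are unchanged and the output exists.

    Returns True if the task was executed, False if it was skipped.
    """
    current = input_files_hash(input_files)
    # The "onlyIf" check: same hash as the last successful run, and output exists.
    if history.get(task_path) == current and output_exists():
        return False
    action()
    # Only reached if action() did not raise, i.e. the execution succeeded.
    history[task_path] = current
    return True
```

Timestamps are cheap to compare but miss edits that preserve the modification time; hashing file contents would be a stricter (and slower) variant of the same idea.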
|
|
Re: Task OptimizationOn Sep 25, 2009, at 2:10 AM, Adam Murdoch wrote: > Hi, > > It sounds to me like the generic solution might actually be easier > than the hard-coded solution, once you chase down all the edge > cases, and will also end up more accurate and reusable. Given that > we want to throw away the hard-coded solution as soon as 0.8 is out > and replace it with a generic solution, I wonder if it's worth > pursuing the hard-coded solution at all. > > > Hans Dockter wrote: >> Hi, >> >> I have implemented a task optimization functionality that we might >> put into 0.8. I have uploaded my branch to: http://github.com/hansd/gradle/tree/optim >> >> A couple of comments: >> >> 1.) The task history is now stored in gradle user home with some >> hash that relates it to the actual project. The base for the hash >> is the path of the root dir. We might have issues if a subproject >> takes part in multiple multi-project builds, if the output is >> sensitive to the respective multi-project build. The only way I see >> to solve such a problem, would be to have multiple output dirs. > > We want a unique identifier for the build, not for the project. At > this stage, the settings dir path would do. Or the project dir of > the root project. That's the way it is done (I was not precise enough, when I said 'actual project' above. It is the build.). The base for the hash is the path of the root dir. > >> >> 2.) Each task has a now doesOutputExists() method which defaults to >> false. So far all archive tasks have a custom implementation which >> checks for the existence of the archive. The test task also has a >> custom implementation which checks for at least one test results >> file. I hope that we find a way to automate this in 0.9 by >> introducing a generic notion of task output. > > We already have the notion to some degree: properties can be marked > with @OutputFile and @OutputDirectory. The default doesOutputExists > () could make use of these. Right. > >> >> 3.) 
So far there are onlyIf implementations only for the test and >> the jar task provided by the Java plugin. I will add an onlyIf >> modification for the test task when the Groovy plugin is applied >> tomorrow. For 0.9 we want to automate the onlyIf statements based >> on the information we have on the input arguments of a task. >> >> 4.) What about the other tasks? For java compile the Ant javac task >> has its own optimization checking for changed files. I'm not sure >> about groovyc, I need to check. The Ant Javadoc/Groovydoc tasks do >> not check for changed files. To optimize them we would need to >> check for changed source files. The same is true for the code >> quality stuff. I'm not sure whether I will have time to get this >> done before 0.8. I would use Tom's change detection stuff. I >> haven't had a look at that yet. For 0.9 I guess the SourceSet's >> will be a good place for source change detection. For 0.8 it might >> be already good enough to distinguish between no changes/do nothing >> and do the full thing. >> > > I think you can pretty quickly do something general for all tasks > with file inputs: > > - In the onlyIf predicate, calculate the set of (file path, > timestamp) for all input files in the history. You could create a > hash from this. > > - In the onlyIf predicate, skip the task if the input files hash == > the input files hash from last successful execution and > task.doesOutputExists() > > - execute the task > > - store the input files hash in the history. Yes. > >> 5.) The onlyIf optimization needs to be disabled if any >> build.gradle which is part of the multi-project build, the >> settings.grade or an init.gradle changes. Therefore a ScriptSource >> object now has a method hasChanged which defaults to true. The >> DefaultScriptCompilerFactory sets it to false if a script is read >> from the cache. I'm not very happy about the latter mechanism. 
To >> me this looks like a hint that the ScriptSource should be >> responsible for the compilation, instead of the compile class >> having a side effect on the state of ScriptSource. I will think >> about this in more detail tomorrow. >> > > I think a better approach is to use the properties of the task. This > is more accurate, in that it catches changes to the task > configuration that aren't the result of changes to the build/init/ > settings scripts. Some types of changes we don't catch by checking > if the scripts has changed: > * Task is configured using -PsomeProperty=value, and that value is > different to last execution. > * Task is configured using system property, and that value is > different to last execution. > * Task is configured based on the DAG, and the DAG contains > different tasks to last execution. > * Task is configured by a 3rd party plugin, and that plugin has > changed since last execution > * Task is configured by buildSrc code, and that code has changed > since last execution > * Task is configured using properties from an imported build.xml, > and that build.xml has changed > * Task is configured using properties from gradle.properties, ... > * ... you get the idea ... > > So, checking whether the scripts have changed since last execution > doesn't come close to accurately detecting if we need to re-execute > a task. It also means we unnecessarily re-execute tasks when an > unrelated change has been made to the build script. > > I think accuracy is really important with this stuff. It absolutely > must be reliable, or people will just run clean all the time to get > a reliable build. We want to avoid this. > > I would suggest instead that we add an @Input annotation which one > can use to mark up the properties of a task which contribute in some > significant way to the output of the task. The input of a task is > stored in the history, and the set of input files is simply treated > as one piece of input. I agree. 
I guess that is what we should do. And with the annotations it looks rather straightforward to implement.

>
>> 6.) The GradleInternal class exposes now the settings and the init
>> script ScriptSource objects. It also provides a convenience method
>> to check whether any ScriptSource object has changed. To get hold
>> of the settings object it registers as a BuildListener. I think
>> there should be a better way. I will think more about this tomorrow.
>>
>
> Remove the settings file, perhaps? :)
>
>> I'm not completely sure whether we want to push this into 0.8 or
>> not. Feedback is welcome.
>>
>
> I don't think it will be reliable enough.

I also think we should leave it out for 0.8.

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org
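Editorial sketch of the annotation idea agreed on above, in Python for illustration only. The `task_input` decorator is a hypothetical stand-in for the proposed @Input annotation, and `JarTask` is an invented example class; neither is Gradle API. Only properties marked as inputs contribute to the snapshot that would be stored in the history.

```python
import hashlib
import json


def task_input(func):
    """Mark a property as task input (stand-in for the proposed @Input)."""
    func._is_task_input = True
    return property(func)


def input_snapshot(task) -> str:
    """Hash the values of all properties marked with task_input.

    Only properties declared directly on the task's class are considered.
    """
    values = {}
    for name, attr in vars(type(task)).items():
        if isinstance(attr, property) and getattr(attr.fget, "_is_task_input", False):
            values[name] = getattr(task, name)
    encoded = json.dumps(values, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


class JarTask:
    """Invented example task with two inputs and one non-input property."""

    def __init__(self, base_name, compress, description=""):
        self._base_name = base_name
        self._compress = compress
        self._description = description

    @task_input
    def base_name(self):
        return self._base_name

    @task_input
    def compress(self):
        return self._compress

    @property
    def description(self):  # not an input: does not affect the output
        return self._description
```

A value that arrives via -P, a system property, or a plugin changes the snapshot just as an edit to the build script would, while a change to an unmarked property such as `description` leaves it untouched.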
|
|
Re: Task Optimization

Hans Dockter wrote:
> Hi,
>
> I have implemented a task optimization functionality that we might put
> into 0.8. I have uploaded my branch to:
> http://github.com/hansd/gradle/tree/optim
>
> A couple of comments:
>
> 1.) The task history is now stored in gradle user home with some hash
> that relates it to the actual project. The base for the hash is the path
> of the root dir. We might have issues if a subproject takes part in
> multiple multi-project builds, if the output is sensitive to the
> respective multi-project build. The only way I see to solve such a
> problem, would be to have multiple output dirs.
>

subprojects that effectively participate in different multi-project builds that have different output directories. I think this will be an issue for us.

<clipped remaining>

--
Steve Appling
Automated Logic Research Team
|
|
Re: Task OptimizationAdam Murdoch wrote: > Hi, > > It sounds to me like the generic solution might actually be easier than > the hard-coded solution, once you chase down all the edge cases, and > will also end up more accurate and reusable. Given that we want to throw > away the hard-coded solution as soon as 0.8 is out and replace it with a > generic solution, I wonder if it's worth pursuing the hard-coded > solution at all. > > > Hans Dockter wrote: >> Hi, >> >> I have implemented a task optimization functionality that we might put >> into 0.8. I have uploaded my branch to: >> http://github.com/hansd/gradle/tree/optim >> >> A couple of comments: >> >> 1.) The task history is now stored in gradle user home with some hash >> that relates it to the actual project. The base for the hash is the >> path of the root dir. We might have issues if a subproject takes part >> in multiple multi-project builds, if the output is sensitive to the >> respective multi-project build. The only way I see to solve such a >> problem, would be to have multiple output dirs. > > We want a unique identifier for the build, not for the project. At this > stage, the settings dir path would do. Or the project dir of the root > project. > effectively build different products in the same suite from the same collection of sub-projects. For this to not cause problems for us, I think we would need the task history to actually go somewhere under the build directory. This would have the added "benefit" that the task history would be removed when you did a clean, so you would no longer need the doesOutputExists() method - which I think is just there to handle cleans after successful task execution. On a related topic, I really don't like all of the script cache information to be stored under the user home directory. It seems that putting this under a .gradle in the root project would be better. That way the script caches go away when a project directory is deleted. 
I currently have 745 directories directly under my home/scriptCache directory. >> >> 2.) Each task has a now doesOutputExists() method which defaults to >> false. So far all archive tasks have a custom implementation which >> checks for the existence of the archive. The test task also has a >> custom implementation which checks for at least one test results file. >> I hope that we find a way to automate this in 0.9 by introducing a >> generic notion of task output. > > We already have the notion to some degree: properties can be marked with > @OutputFile and @OutputDirectory. The default doesOutputExists() could > make use of these. > >> >> 3.) So far there are onlyIf implementations only for the test and the >> jar task provided by the Java plugin. I will add an onlyIf >> modification for the test task when the Groovy plugin is applied >> tomorrow. For 0.9 we want to automate the onlyIf statements based on >> the information we have on the input arguments of a task. >> >> 4.) What about the other tasks? For java compile the Ant javac task >> has its own optimization checking for changed files. I'm not sure >> about groovyc, I need to check. The Ant Javadoc/Groovydoc tasks do not >> check for changed files. To optimize them we would need to check for >> changed source files. The same is true for the code quality stuff. I'm >> not sure whether I will have time to get this done before 0.8. I would >> use Tom's change detection stuff. I haven't had a look at that yet. >> For 0.9 I guess the SourceSet's will be a good place for source change >> detection. For 0.8 it might be already good enough to distinguish >> between no changes/do nothing and do the full thing. >> > > I think you can pretty quickly do something general for all tasks with > file inputs: > > - In the onlyIf predicate, calculate the set of (file path, timestamp) > for all input files in the history. You could create a hash from this. 
> > - In the onlyIf predicate, skip the task if the input files hash == the > input files hash from last successful execution and task.doesOutputExists() > > - execute the task > > - store the input files hash in the history. > >> 5.) The onlyIf optimization needs to be disabled if any build.gradle >> which is part of the multi-project build, the settings.grade or an >> init.gradle changes. Therefore a ScriptSource object now has a method >> hasChanged which defaults to true. The DefaultScriptCompilerFactory >> sets it to false if a script is read from the cache. I'm not very >> happy about the latter mechanism. To me this looks like a hint that >> the ScriptSource should be responsible for the compilation, instead of >> the compile class having a side effect on the state of ScriptSource. I >> will think about this in more detail tomorrow. >> > > I think a better approach is to use the properties of the task. This is > more accurate, in that it catches changes to the task configuration that > aren't the result of changes to the build/init/settings scripts. Some > types of changes we don't catch by checking if the scripts has changed: > * Task is configured using -PsomeProperty=value, and that value is > different to last execution. > * Task is configured using system property, and that value is different > to last execution. > * Task is configured based on the DAG, and the DAG contains different > tasks to last execution. > * Task is configured by a 3rd party plugin, and that plugin has changed > since last execution > * Task is configured by buildSrc code, and that code has changed since > last execution > * Task is configured using properties from an imported build.xml, and > that build.xml has changed > * Task is configured using properties from gradle.properties, ... > * ... you get the idea ... > > So, checking whether the scripts have changed since last execution > doesn't come close to accurately detecting if we need to re-execute a > task. 
It also means we unnecessarily re-execute tasks when an unrelated > change has been made to the build script. > > I think accuracy is really important with this stuff. It absolutely must > be reliable, or people will just run clean all the time to get a > reliable build. We want to avoid this. > > I would suggest instead that we add an @Input annotation which one can > use to mark up the properties of a task which contribute in some > significant way to the output of the task. The input of a task is stored > in the history, and the set of input files is simply treated as one > piece of input. > (for the reliability reasons given above). >> 6.) The GradleInternal class exposes now the settings and the init >> script ScriptSource objects. It also provides a convenience method to >> check whether any ScriptSource object has changed. To get hold of the >> settings object it registers as a BuildListener. I think there should >> be a better way. I will think more about this tomorrow. >> > > Remove the settings file, perhaps? :) > >> I'm not completely sure whether we want to push this into 0.8 or not. >> Feedback is welcome. >> > > I don't think it will be reliable enough. > > > Adam > Overall I like this approach and think that it can really help. I am concerned about introducing this at the last minute, however. I also think that it needs something like the @Input annotation that Adam suggested before it is really useful. If you want to implement some of these suggested changes and delay (yet again) to test this some more, then we will be glad to try it out in our project and give you more feedback. I think it would probably be wiser to move this to early 0.9. -- Steve Appling Automated Logic Research Team --------------------------------------------------------------------- To unsubscribe from this list, please visit: http://xircles.codehaus.org/manage_email |
|
|
Re: Task OptimizationSteve Appling wrote: > > > Adam Murdoch wrote: >> Hi, >> >> It sounds to me like the generic solution might actually be easier >> than the hard-coded solution, once you chase down all the edge cases, >> and will also end up more accurate and reusable. Given that we want >> to throw away the hard-coded solution as soon as 0.8 is out and >> replace it with a generic solution, I wonder if it's worth pursuing >> the hard-coded solution at all. >> >> >> Hans Dockter wrote: >>> Hi, >>> >>> I have implemented a task optimization functionality that we might >>> put into 0.8. I have uploaded my branch to: >>> http://github.com/hansd/gradle/tree/optim >>> >>> A couple of comments: >>> >>> 1.) The task history is now stored in gradle user home with some >>> hash that relates it to the actual project. The base for the hash is >>> the path of the root dir. We might have issues if a subproject takes >>> part in multiple multi-project builds, if the output is sensitive to >>> the respective multi-project build. The only way I see to solve such >>> a problem, would be to have multiple output dirs. >> >> We want a unique identifier for the build, not for the project. At >> this stage, the settings dir path would do. Or the project dir of the >> root project. >> > We change the build directories for a project based off of several > conditions to effectively build different products in the same suite > from the same collection of sub-projects. For this to not cause > problems for us, I think we would need the task history to actually go > somewhere under the build directory. This would have the added > "benefit" that the task history would be removed when you did a clean, > so you would no longer need the doesOutputExists() method - which I > think is just there to handle cleans after successful task execution. 
>

There are a couple of problems with storing the state under the build directory and using its existence to decide whether to rebuild or not:

- It doesn't work for tasks that generate output outside the build directory. For example, in Gradle's build the install task generates its output in the $gradle_installPath directory. If you do a clean, then next time install is executed, it will reinstall the distribution, regardless of whether anything has changed since last install. Or, if you install, then delete the install directory, the install task will not reinstall the distribution without a clean being executed.

- It loses history. I'd like to collect profiling information in the history, so we can use it for things like reporting, and task scheduling, and providing better execution feedback on the various UIs. Storing this in the build directory isn't going to work.

I think your problem is better solved instead by making the artifacts the first-class citizens of the history store, rather than tasks. That is, for a given output file/directory we store the identifier of the task which produced it, plus the input which that task used. Then, we skip the execution of a task if its output files were most recently built by that task with the same input it has now.

The task identifier is some combination of build identifier + task path. The input is some aggregate of the task's input properties and files.

> On a related topic, I really don't like all of the script cache
> information to be stored under the user home directory. It seems that
> putting this under a .gradle in the root project would be better.
> That way the script caches go away when a project directory is
> deleted. I currently have 745 directories directly under my
> home/scriptCache directory.
>

Is it the fact that the scripts are cached under ~/.gradle that you don't like, or the fact that they aren't being cleaned up when they are no longer needed?

I think we have a similar problem under ~/.gradle/wrapper and ~/.gradle/cache.

There are a few problems with moving the scripts to the root project dir:

- It doesn't solve the problem for ~/.gradle/wrapper and ~/.gradle/cache.

- It doesn't solve the problem for scripts which are compiled before we know the root project dir, such as init scripts.

- It doesn't work for read-only workspaces.

There may not be quite as many files under ~/.gradle/wrapper and ~/.gradle/cache, but they take up much more space. It would be nice to come up with a solution which cleaned up everything we cache.

Some possible solutions:

- A task or command-line option which garbage collects ~/.gradle.

- The gradle command periodically garbage collects ~/.gradle, based on some threshold. This could be number of invocations since last garbage collect, time since last garbage collect, total size of ~/.gradle, or free disk space.

- We garbage collect a cache whenever we write to it (no more than once per build).

- Don't cache anything under ~/.gradle. For example, store everything under the root project dir, including the ivy cache. For those things where we don't know the root project dir, store in a .gradle dir in the directory containing the thing.

We could probably combine some of these.

>
> Overall I like this approach and think that it can really help. I am
> concerned about introducing this at the last minute, however. I also
> think that it needs something like the @Input annotation that Adam
> suggested before it is really useful. If you want to implement some
> of these suggested changes and delay (yet again) to test this some
> more, then we will be glad to try it out in our project and give you
> more feedback. I think it would probably be wiser to move this to
> early 0.9.
>

I'm keen to get started on this as soon as 0.8 is out.

Adam
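Editorial sketch of the artifact-centric history store described in the message above, in Python for illustration only. The class and method names are hypothetical, the store is an in-memory dict, and the check that the output files still exist on disk is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Producer:
    """Who built an output last, and from which input."""
    task_id: str     # some combination of build identifier + task path
    input_hash: str  # aggregate of the task's input properties and files


class ArtifactHistory:
    """History keyed by output file/directory rather than by task."""

    def __init__(self) -> None:
        self._by_output: Dict[str, Producer] = {}

    def is_up_to_date(self, task_id: str, input_hash: str,
                      outputs: Iterable[str]) -> bool:
        """True if every output was most recently built by this task
        with the same input it has now."""
        outputs = list(outputs)
        wanted = Producer(task_id, input_hash)
        return bool(outputs) and all(
            self._by_output.get(out) == wanted for out in outputs
        )

    def record(self, task_id: str, input_hash: str,
               outputs: Iterable[str]) -> None:
        """Remember that the task just produced these outputs."""
        for out in outputs:
            self._by_output[out] = Producer(task_id, input_hash)
```

Because the lookup starts from the output, a second build variant that overwrites the same file invalidates the first variant's record, and redirecting output to a different directory simply creates a separate record.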
|
|
Re: Task OptimizationAdam Murdoch wrote: > > > Steve Appling wrote: >> >> >> Adam Murdoch wrote: >>> Hi, >>> >>> It sounds to me like the generic solution might actually be easier >>> than the hard-coded solution, once you chase down all the edge cases, >>> and will also end up more accurate and reusable. Given that we want >>> to throw away the hard-coded solution as soon as 0.8 is out and >>> replace it with a generic solution, I wonder if it's worth pursuing >>> the hard-coded solution at all. >>> >>> >>> Hans Dockter wrote: >>>> Hi, >>>> >>>> I have implemented a task optimization functionality that we might >>>> put into 0.8. I have uploaded my branch to: >>>> http://github.com/hansd/gradle/tree/optim >>>> >>>> A couple of comments: >>>> >>>> 1.) The task history is now stored in gradle user home with some >>>> hash that relates it to the actual project. The base for the hash is >>>> the path of the root dir. We might have issues if a subproject takes >>>> part in multiple multi-project builds, if the output is sensitive to >>>> the respective multi-project build. The only way I see to solve such >>>> a problem, would be to have multiple output dirs. >>> >>> We want a unique identifier for the build, not for the project. At >>> this stage, the settings dir path would do. Or the project dir of the >>> root project. >>> >> We change the build directories for a project based off of several >> conditions to effectively build different products in the same suite >> from the same collection of sub-projects. For this to not cause >> problems for us, I think we would need the task history to actually go >> somewhere under the build directory. This would have the added >> "benefit" that the task history would be removed when you did a clean, >> so you would no longer need the doesOutputExists() method - which I >> think is just there to handle cleans after successful task execution. 
>> > > There's a couple of problems with storing the state under the build > directory and using its existence to decide whether to rebuild or not: > > - It doesn't work for tasks that generate output outside the build > directory. For example, in Gradle's build the install task generates its > output in the $gradle_installPath directory. If you do a clean, then > next time install is executed, it will reinstall the distribution, > regardless of whether anything has changed since last install. Or, if > you install, then delete the install directory, the install task will > not reinstall the distribution without a clean being executed. > > - It loses history. I'd like to collect profiling information in the > history, so we can use it for things like reporting, and task > scheduling, and providing better execution feedback on the various UIs. > Storing this in the build directory isn't going to work. > > I think your problem is better solved instead by making the artifacts > the first-class citizens of the history store, rather than tasks. That > is, for a given output file/directory we store the identifier of the > task which produced it, plus the input which that task used. Then, we > skip the execution of a task if its output files were most recently > built by that task with the same input it has now. > The task identifier is some combination of build identifier + task path. > The input is some aggregate of the tasks input properties and files. >

I generally like this solution, but we may have another wrinkle. We have some tasks in different sub-projects that contribute to the same output directory. As long as you are matching both the task and the output directory (and allow the history to contain multiple tasks with the same output directory and multiple output directories for a single task) I think this will work.

>> On a related topic, I really don't like all of the script cache >> information to be stored under the user home directory.
It seems that >> putting this under a .gradle in the root project would be better. >> That way the script caches go away when a project directory is >> deleted. I currently have 745 directories directly under my >> home/scriptCache directory. >> > > Is it the fact that the scripts are cached under ~/.gradle that you > don't like, or the fact that they aren't being cleaned up when they are > no longer needed? > > I think we have a similar problem under ~/.gradle/wrapper and > ~/.gradle/cache. > > There's a few problems with moving the scripts to the root project dir: > > - It doesn't solve the problem for ~/.gradle/wrapper and ~/.gradle/cache. > > - It doesn't solve the problem for scripts which are compiled before we > know the root project dir, such as init scripts. > > - It doesn't work for read-only workspaces. > > There may not be quite as many files under ~/.gradle/wrapper and > ~/.gradle/cache, but they take up much more space. It would be nice to > come up with a solution which cleaned up every thing we cache. > > Some possible solutions: > > - A task or command-line option which garbage collects ~/.gradle. > > - The gradle command periodically garbage collects ~/.gradle, based on > some threshold. This could be number of invocations since last garbage > collect, time since last garbage collect, total size of ~/.gradle, or > free disk space. > > - We garbage collect a cache whenever we write to it (no more than once > per build). > > - Don't cache anything under ~/.gradle. For example, store everything > under the root project dir, including the ivy cache. For those things > where we don't know the root project dir, store in a .gradle dir in the > directory containing the thing.

I would have said that I prefer this, but it doesn't handle read only workspaces or init scripts. I don't know how important this is. This solution also will duplicate the downloaded ivy files for different projects, which is in line with my desire to keep project information together, but will slow things down in general :(.
All things considered, I guess I would vote for a task or command line option to garbage collect everything (perhaps it can get rid of the silly temp/groovy-generated directories as well). I don't want to take the time to do this on each build - it's slow enough already. All of this is a very minor concern - we can certainly do this later. Thanks for explaining some of the reasons behind it.

> > We could probably combine some of these. > >> >> Overall I like this approach and think that it can really help. I am >> concerned about introducing this at the last minute, however. I also >> think that it needs something like the @Input annotation that Adam >> suggested before it is really useful. If you want to implement some >> of these suggested changes and delay (yet again) to test this some >> more, then we will be glad to try it out in our project and give you >> more feedback. I think it would probably be wiser to move this to >> early 0.9. >> > > I'm keen to get started on this as soon as 0.8 is out. > > > Adam >

--
Steve Appling
Automated Logic Research Team
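The artifact-keyed history discussed in this message can be illustrated with a minimal Python sketch. Everything in it is hypothetical and is not Gradle code: the `HistoryStore` class, its in-memory dict, and the `input_hash` helper are stand-ins invented for the example. It assumes the variant Steve asks for, where a record is keyed on the pair (task identifier, output path), so several tasks may contribute to one output directory and one task may own several outputs.

```python
import hashlib
import os


def input_hash(properties, files):
    """Aggregate a task's input properties and input file contents into one digest."""
    digest = hashlib.sha256()
    for key in sorted(properties):
        digest.update(f"{key}={properties[key]}\n".encode())
    for path in sorted(files):
        digest.update(path.encode())
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


class HistoryStore:
    """Hypothetical history keyed on (task id, output path), not on the task alone."""

    def __init__(self):
        self._records = {}

    def record(self, task_id, outputs, inputs):
        # Remember which inputs this task used to produce each output.
        for output in outputs:
            self._records[(task_id, output)] = inputs

    def is_up_to_date(self, task_id, outputs, inputs):
        # A task can be skipped only if it declares outputs, every output
        # still exists, and each was last recorded for this task with the
        # same aggregated inputs.
        return bool(outputs) and all(
            os.path.exists(output)
            and self._records.get((task_id, output)) == inputs
            for output in outputs
        )
```

Here the task identifier would be something like build identifier plus task path, as suggested in the quoted text. Because the existence check is on the output itself, deleting an install directory outside the build directory forces a rerun without needing a clean.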
Re: Task Optimization

Steve Appling wrote: >>> On a related topic, I really don't like all of the script cache >>> information to be stored under the user home directory. It seems >>> that putting this under a .gradle in the root project would be >>> better. That way the script caches go away when a project directory >>> is deleted. I currently have 745 directories directly under my >>> home/scriptCache directory. >>> >> >> Is it the fact that the scripts are cached under ~/.gradle that you >> don't like, or the fact that they aren't being cleaned up when they >> are no longer needed? > It is really just that they are never cleaned up that bothers me. > >> >> I think we have a similar problem under ~/.gradle/wrapper and >> ~/.gradle/cache. >> >> There's a few problems with moving the scripts to the root project dir: >> >> - It doesn't solve the problem for ~/.gradle/wrapper and >> ~/.gradle/cache. >> >> - It doesn't solve the problem for scripts which are compiled before >> we know the root project dir, such as init scripts. >> >> - It doesn't work for read-only workspaces. >> >> There may not be quite as many files under ~/.gradle/wrapper and >> ~/.gradle/cache, but they take up much more space. It would be nice >> to come up with a solution which cleaned up every thing we cache. >> >> Some possible solutions: >> >> - A task or command-line option which garbage collects ~/.gradle. >> >> - The gradle command periodically garbage collects ~/.gradle, based >> on some threshold. This could be number of invocations since last >> garbage collect, time since last garbage collect, total size of >> ~/.gradle, or free disk space. >> >> - We garbage collect a cache whenever we write to it (no more than >> once per build). >> >> - Don't cache anything under ~/.gradle. For example, store everything >> under the root project dir, including the ivy cache. For those things >> where we don't know the root project dir, store in a .gradle dir in >> the directory containing the thing.
> I would have said that I prefer this, but it doesn't handle read only > workspaces or init scripts. I don't know how important this is.

I'm not sure either. Maybe we don't care about this. We could always add a command-line/init script option to let you specify the cache dir, for this situation.

> This solution also will duplicate the downloaded ivy files for > different projects, which is in line with my desire to keep project > information together, but will slow things down in general :(. >

You would pay a one-off cost for each dependency per build, which might not be too bad. Moving the ivy cache to $rootProjectDir/.gradle has some advantages, such as it gets rid of some ivy weirdness when multiple builds share the same cache.

Adam
Re: Task Optimization

On Oct 1, 2009, at 9:00 AM, Adam Murdoch wrote: > > > Steve Appling wrote: >>>> On a related topic, I really don't like all of the script cache >>>> information to be stored under the user home directory. It seems >>>> that putting this under a .gradle in the root project would be >>>> better. That way the script caches go away when a project >>>> directory is deleted. I currently have 745 directories directly >>>> under my home/scriptCache directory. >>>> >>> >>> Is it the fact that the scripts are cached under ~/.gradle that >>> you don't like, or the fact that they aren't being cleaned up when >>> they are no longer needed? >> It is really just that they are never cleaned up that bothers me. >> >>> >>> I think we have a similar problem under ~/.gradle/wrapper and >>> ~/.gradle/cache. >>> >>> There's a few problems with moving the scripts to the root project >>> dir: >>> >>> - It doesn't solve the problem for ~/.gradle/wrapper and ~/.gradle/ >>> cache. >>> >>> - It doesn't solve the problem for scripts which are compiled >>> before we know the root project dir, such as init scripts. >>> >>> - It doesn't work for read-only workspaces. >>> >>> There may not be quite as many files under ~/.gradle/wrapper and >>> ~/.gradle/cache, but they take up much more space. It would be >>> nice to come up with a solution which cleaned up every thing we >>> cache. >>> >>> Some possible solutions: >>> >>> - A task or command-line option which garbage collects ~/.gradle. >>> >>> - The gradle command periodically garbage collects ~/.gradle, >>> based on some threshold. This could be number of invocations since >>> last garbage collect, time since last garbage collect, total size >>> of ~/.gradle, or free disk space.

I think we should provide both, 1 and 2.

>>> >>> - We garbage collect a cache whenever we write to it (no more than >>> once per build).
What would be the criteria for cache elements being garbage collected (cached build scripts, wrapper distributions)? Would we write some timestamp file when the cached script or the wrapper distribution is being used and based on that define a GC policy?

>>> >>> - Don't cache anything under ~/.gradle. For example, store >>> everything under the root project dir, including the ivy cache. >>> For those things where we don't know the root project dir, store >>> in a .gradle dir in the directory containing the thing. >> I would have said that I prefer this, but it doesn't handle read >> only workspaces or init scripts. I don't know how important this is. > > I'm not sure either. Maybe we don't care about this. We could always > add a command-line/init script option to let you specify the cache > dir, for this situation.

One other argument for using ~/.gradle as the location for metadata is that some people have complained about having yet another metadata directory in their project. They perceive it as pollution.

> >> This solution also will duplicate the downloaded ivy files for >> different projects, which is in line with my desire to keep project >> information together, but will slow things down in general :(. >> > > You would pay a one-off cost for each dependency per build, which > might not be too bad. Moving the ivy cache to > $rootProjectDir/.gradle has some advantages,

One advantage of this is to be able to delete a project-specific ivy cache vs. deleting the whole ivy cache.

> such as it gets rid of some ivy weirdness when multiple builds share > the same cache.

It would be better if Ivy could solve those problems (and it might have improved during the last releases, I'm not sure).

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org
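The threshold-based trigger from the quoted list (invocations since the last collection, time since the last collection, total size of `~/.gradle`, free disk space) might look roughly like the Python sketch below. It is only an illustration: the `GcThresholds` fields, their default values, and `should_collect` are invented for the example and do not correspond to any Gradle option.

```python
import shutil
import time
from dataclasses import dataclass


@dataclass
class GcThresholds:
    """Hypothetical limits; the default values are arbitrary placeholders."""
    max_invocations: int = 100
    max_age_seconds: float = 7 * 24 * 3600
    max_cache_bytes: int = 2 * 1024**3
    min_free_bytes: int = 1 * 1024**3


def should_collect(invocations, last_gc_time, cache_bytes, free_bytes,
                   limits=GcThresholds(), now=None):
    """Return True as soon as any one of the four thresholds is crossed."""
    now = time.time() if now is None else now
    return (
        invocations >= limits.max_invocations
        or now - last_gc_time >= limits.max_age_seconds
        or cache_bytes >= limits.max_cache_bytes
        or free_bytes <= limits.min_free_bytes
    )


def free_bytes_for(path):
    """Free space on the file system that holds `path`."""
    return shutil.disk_usage(path).free
```

An explicit garbage-collect task or command-line option would bypass `should_collect` and run the collection directly, so both of the first two proposals can share the same collector.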
Re: Task Optimization

Adam Murdoch wrote: > > > Steve Appling wrote: >>>> On a related topic, I really don't like all of the script cache >>>> information to be stored under the user home directory. It seems >>>> that putting this under a .gradle in the root project would be >>>> better. That way the script caches go away when a project directory >>>> is deleted. I currently have 745 directories directly under my >>>> home/scriptCache directory. >>>> >>> >>> Is it the fact that the scripts are cached under ~/.gradle that you >>> don't like, or the fact that they aren't being cleaned up when they >>> are no longer needed? >> It is really just that they are never cleaned up that bothers me. >> >>> >>> I think we have a similar problem under ~/.gradle/wrapper and >>> ~/.gradle/cache. >>> >>> There's a few problems with moving the scripts to the root project dir: >>> >>> - It doesn't solve the problem for ~/.gradle/wrapper and >>> ~/.gradle/cache. >>> >>> - It doesn't solve the problem for scripts which are compiled before >>> we know the root project dir, such as init scripts. >>> >>> - It doesn't work for read-only workspaces. >>> >>> There may not be quite as many files under ~/.gradle/wrapper and >>> ~/.gradle/cache, but they take up much more space. It would be nice >>> to come up with a solution which cleaned up every thing we cache. >>> >>> Some possible solutions: >>> >>> - A task or command-line option which garbage collects ~/.gradle. >>> >>> - The gradle command periodically garbage collects ~/.gradle, based >>> on some threshold. This could be number of invocations since last >>> garbage collect, time since last garbage collect, total size of >>> ~/.gradle, or free disk space. >>> >>> - We garbage collect a cache whenever we write to it (no more than >>> once per build). >>> >>> - Don't cache anything under ~/.gradle. For example, store everything >>> under the root project dir, including the ivy cache.
For those things >>> where we don't know the root project dir, store in a .gradle dir in >>> the directory containing the thing. >> I would have said that I prefer this, but it doesn't handle read only >> workspaces or init scripts. I don't know how important this is. > > I'm not sure either. Maybe we don't care about this. We could always add > a command-line/init script option to let you specify the cache dir, for > this situation. > >> This solution also will duplicate the downloaded ivy files for >> different projects, which is in line with my desire to keep project >> information together, but will slow things down in general :(. >> > > You would pay a one-off cost for each dependency per build, which might > not be too bad. Moving the ivy cache to $rootProjectDir/.gradle has some > advantages, such as it gets rid of some ivy weirdness when multiple > builds share the same cache. >

That's true - I currently get Ivy warnings whenever I switch between working on Gradle or working on our own project because Ivy stores the name of the repository in its cache. Moving that to the root project would fix the problem. It also would let you "clean up" the cache by just deleting old projects.

--
Steve Appling
Automated Logic Research Team
Re: Task Optimization

On Oct 1, 2009, at 2:04 PM, Steve Appling wrote: > > > Adam Murdoch wrote: >> Steve Appling wrote: >>>>> On a related topic, I really don't like all of the script cache >>>>> information to be stored under the user home directory. It >>>>> seems that putting this under a .gradle in the root project >>>>> would be better. That way the script caches go away when a >>>>> project directory is deleted. I currently have 745 directories >>>>> directly under my home/scriptCache directory. >>>>> >>>> >>>> Is it the fact that the scripts are cached under ~/.gradle that >>>> you don't like, or the fact that they aren't being cleaned up >>>> when they are no longer needed? >>> It is really just that they are never cleaned up that bothers me. >>> >>>> >>>> I think we have a similar problem under ~/.gradle/wrapper and >>>> ~/.gradle/cache. >>>> >>>> There's a few problems with moving the scripts to the root >>>> project dir: >>>> >>>> - It doesn't solve the problem for ~/.gradle/wrapper and >>>> ~/.gradle/cache. >>>> >>>> - It doesn't solve the problem for scripts which are compiled >>>> before we know the root project dir, such as init scripts. >>>> >>>> - It doesn't work for read-only workspaces. >>>> >>>> There may not be quite as many files under ~/.gradle/wrapper and >>>> ~/.gradle/cache, but they take up much more space. It would be >>>> nice to come up with a solution which cleaned up every thing we >>>> cache. >>>> >>>> Some possible solutions: >>>> >>>> - A task or command-line option which garbage collects ~/.gradle. >>>> >>>> - The gradle command periodically garbage collects ~/.gradle, >>>> based on some threshold. This could be number of invocations >>>> since last garbage collect, time since last garbage collect, >>>> total size of ~/.gradle, or free disk space. >>>> >>>> - We garbage collect a cache whenever we write to it (no more >>>> than once per build). >>>> >>>> - Don't cache anything under ~/.gradle.
For example, store >>>> everything under the root project dir, including the ivy cache. >>>> For those things where we don't know the root project dir, store >>>> in a .gradle dir in the directory containing the thing. >>> I would have said that I prefer this, but it doesn't handle read >>> only workspaces or init scripts. I don't know how important this >>> is. >> I'm not sure either. Maybe we don't care about this. We could >> always add a command-line/init script option to let you specify the >> cache dir, for this situation. >>> This solution also will duplicate the downloaded ivy files for >>> different projects, which is in line with my desire to keep >>> project information together, but will slow things down in >>> general :(. >>> >> You would pay a one-off cost for each dependency per build, which >> might not be too bad. Moving the ivy cache to >> $rootProjectDir/.gradle has some advantages, such as it gets rid of >> some ivy weirdness when multiple builds share the same cache. > That's true - I currently get Ivy warnings whenever I switch between > working on Gradle or working on our own project because Ivy stores > the name of the repository in its cache.

This is a truly annoying Ivy bug. I have just commented on it again. More votes might help (I know that you have voted): https://issues.apache.org/jira/browse/IVY-758

- Hans

--
Hans Dockter
Gradle Project Manager
http://www.gradle.org
Re: Task Optimization

Hans Dockter wrote: > > On Oct 1, 2009, at 9:00 AM, Adam Murdoch wrote: > >> >> >> Steve Appling wrote: >>>>> On a related topic, I really don't like all of the script cache >>>>> information to be stored under the user home directory. It seems >>>>> that putting this under a .gradle in the root project would be >>>>> better. That way the script caches go away when a project >>>>> directory is deleted. I currently have 745 directories directly >>>>> under my home/scriptCache directory. >>>>> >>>> >>>> Is it the fact that the scripts are cached under ~/.gradle that you >>>> don't like, or the fact that they aren't being cleaned up when they >>>> are no longer needed? >>> It is really just that they are never cleaned up that bothers me. >>> >>>> >>>> I think we have a similar problem under ~/.gradle/wrapper and >>>> ~/.gradle/cache. >>>> >>>> There's a few problems with moving the scripts to the root project >>>> dir: >>>> >>>> - It doesn't solve the problem for ~/.gradle/wrapper and >>>> ~/.gradle/cache. >>>> >>>> - It doesn't solve the problem for scripts which are compiled >>>> before we know the root project dir, such as init scripts. >>>> >>>> - It doesn't work for read-only workspaces. >>>> >>>> There may not be quite as many files under ~/.gradle/wrapper and >>>> ~/.gradle/cache, but they take up much more space. It would be nice >>>> to come up with a solution which cleaned up every thing we cache. >>>> >>>> Some possible solutions: >>>> >>>> - A task or command-line option which garbage collects ~/.gradle. >>>> >>>> - The gradle command periodically garbage collects ~/.gradle, based >>>> on some threshold. This could be number of invocations since last >>>> garbage collect, time since last garbage collect, total size of >>>> ~/.gradle, or free disk space. > > I think we should provide both, 1 and 2. > >>>> >>>> - We garbage collect a cache whenever we write to it (no more than >>>> once per build).
> > What would be the criteria for cache elements being garbage collected > (cached build scripts, wrapper distributions)? Would we write some > timestamp file when the cached script or the wrapper distribution > is being used and based on that define a GC policy? >

Some options:

1. Record when each thing is used, and when garbage collecting discard all things which have not been used within a certain threshold.

2. Record which builds use each thing, and when garbage collecting discard all things not used by any build or whose builds no longer exist.

Option 2 has the benefit of only requiring an update of this meta-info when the cache changes, whereas option 1 requires an update on each access.

>>>> >>>> - Don't cache anything under ~/.gradle. For example, store >>>> everything under the root project dir, including the ivy cache. For >>>> those things where we don't know the root project dir, store in a >>>> .gradle dir in the directory containing the thing. >>> I would have said that I prefer this, but it doesn't handle read >>> only workspaces or init scripts. I don't know how important this is. >> >> I'm not sure either. Maybe we don't care about this. We could always >> add a command-line/init script option to let you specify the cache >> dir, for this situation. > > One other argument for using ~/.gradle as the location for metadata is > that some people have complained about having yet another metadata > directory in their project. They perceive it as pollution. >

True. I can't see a really strong argument either way for ~/.gradle or for $rootProjectDir/.gradle. Provided there aren't .gradle directories scattered all through my source tree, I don't really care either way.

>> >>> This solution also will duplicate the downloaded ivy files for >>> different projects, which is in line with my desire to keep project >>> information together, but will slow things down in general :(.
>>> >> >> You would pay a one-off cost for each dependency per build, which >> might not be too bad. Moving the ivy cache to $rootProjectDir/.gradle >> has some advantages, > > One advantage of this is to be able to delete a project specific ivy > cache vs. deleting the whole ivy cache. >

This is appealing for automated builds, such as CI builds. If the ivy cache were in $rootProjectDir/.gradle, we could easily extend '--cache rebuild' to discard the ivy cache as well.

Adam
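The two collection policies listed in this message ("Record when each thing is used" and "Record which builds use each thing") can be compared with a small Python sketch. All names here (`CacheEntry`, `collect_by_age`, `collect_by_builds`) are hypothetical, and "builds no longer exist" is approximated by checking whether the build's root directory is still present, which is an assumption of the example rather than anything stated in the thread.

```python
import os
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Hypothetical metadata for one cached thing (compiled script, wrapper dist, ...)."""
    name: str
    last_used: float = 0.0                      # option 1: rewritten on every access
    used_by: set = field(default_factory=set)   # option 2: root dirs of builds using it


def collect_by_age(entries, now, max_idle_seconds):
    """Option 1: select entries that have not been used within the threshold."""
    return [e for e in entries if now - e.last_used > max_idle_seconds]


def collect_by_builds(entries, build_exists=os.path.isdir):
    """Option 2: select entries used by no build, or only by builds that are gone."""
    return [e for e in entries if not any(build_exists(b) for b in e.used_by)]
```

The trade-off noted above shows up in the fields: `last_used` has to be rewritten on each access, while `used_by` changes only when a build first starts using an entry.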