Re: Gathering Artifact repository discovery requirements

by brettporter

On 11/05/2009, at 11:12 AM, Brian Fox wrote:

> It's time to start looking at the problems with the current 2.x  
> resolution
> scheme as it specifically relates to repository declaration and  
> discovery.

Sorry for the delay in responding to this, I'm still catching up on May.

I think the first few sections are accurate and complete.

For requirements:

> 1. maintain the ability for a user to checkout your code and run mvn  
> install and have it work with no prior setup on their part.


+1

> 2. be able to depend on some jar and not worry about any  
> repositories required for transitive resolution (ie discover the  
> repositories transitively as dependencies are processed) (this is  
> controversial and may be eliminated. First it contributes to the  
> Problem #4 above in that SAT can't be done on a bounded list of  
> repositories. It also doesn't work normally behind a repository  
> manager because the list of repos is usually controlled in the repo  
> manager and thus autodiscovery is intentionally blocked, usually via  
> a mirrorOf * to circumvent the repos maven finds in the poms.)


I think we can achieve this in a way that is compatible with repo
managers, depending on the solution (see below).

If we have this though, we need to add a new requirement:
5. builds should be able to add their own alternative versions for  
artifacts (eg, see xwiki's build that provides a lot of custom  
versions of standard things), without affecting other builds. So in  
this case, they would use a custom version to ensure within their  
build it can override others and contribute to ranges, but its  
existence in a local repository shouldn't affect other builds.

> 3. be able to separate the dependencies needed by maven plugins from  
> those needed by the build. This means not only where they are  
> resolved from, but also how they are stored locally to prevent cross-
> contamination.

I think I would reword this. I can understand wanting to locate
plugins separately, and for their repos/deps not to affect the rest of
the build, but I'm not sure why local storage matters. A dependency
junit:junit:3.8.1 used in a plugin should be the same as that used in
a project. Perhaps an alternate/additional requirement is "3. a given
artifact coordinate must always resolve to an identical artifact across
a build".
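
To illustrate what I mean by the reworded (3), here is a rough Python sketch. It is not how Maven works today; the BuildSession class, its record method and the checksum strings are all made up for illustration. The idea is only that the first artifact seen for a coordinate wins, and anything different for the same coordinate is an error, whether it came from a plugin or from the project.

```python
class BuildSession:
    """Hypothetical per-build record of which artifact each coordinate resolved to."""

    def __init__(self):
        # Maps (groupId, artifactId, version) -> checksum of the artifact used.
        self._seen = {}

    def record(self, coordinate, checksum):
        """Remember the artifact used for a coordinate; fail if it ever differs."""
        previous = self._seen.setdefault(coordinate, checksum)
        if previous != checksum:
            raise ValueError(
                "%s:%s:%s resolved to two different artifacts in one build"
                % coordinate
            )
        return previous


session = BuildSession()
junit = ("junit", "junit", "3.8.1")
# Placeholder checksum, not the real one for junit 3.8.1.
session.record(junit, "checksum-a")  # used by the project
session.record(junit, "checksum-a")  # used by a plugin: same artifact, fine
```

A second record call with a different checksum for the same coordinate would raise, which is the behaviour the requirement asks for regardless of how things are stored locally.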

> 4. Repository identification: at this point we are pretty much in  
> agreement that the URL should be the unique identifier for a  
> repository. People who care about what they are publishing either  
> need to use canonical repositories like Maven central or need to  
> guarantee the existence of the repositories or have decent pointers.  
> In a fully distributed system the relocation mechanism we have does  
> not work in a fully distributed system without a master to manage  
> relocations.


This is a solution, not a requirement :) I think it's clear we need a
unique identifier. A URI is a good way to do that, but we need to
accommodate the fact that repositories will move too (this was a
problem listed earlier). Depending on how we solve the above, it may
become less of an issue. So perhaps reword as "repositories must be
uniquely identifiable and able to be relocated to a new location over
time without affecting existing builds".

I'd then break out artifact relocation as a separate requirement:
6. relocating an artifact to a different coordinate must be possible
even if that is on a different repository

Stemming from the location I'd add:
7. repositories must be able to be mirrored to different locations, and
the user must be able to select a closer, identical repository of
their choice.

Also, probably implied but worth stating:
8. all discovery must be possible without a repository manager  
installed (though using one can improve the ability to route requests  
differently)

And finally, maybe implied but worth being explicit about:
9. must work for locating parent projects (this will start giving us  
better ways to deal with the chicken/egg problem and auto-versioning)

Turning to solutions since it has been a while now... here are some
starting points.

I'm tossing around two alternatives in my head:
1) using the repository as the start of the namespace (ie,
http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.jar is
different to
http://repo.otherproject.com/junit/junit/3.8.1/junit-3.8.1.jar), where
the repository contributes to the "version" of the artifact, but is
considered the same group/artifact ID for the purpose of resolution.
Note that this is just for identification, location needs to be
separate.
2) considering group/artifact ID to be globally unique and repository
can be derived from that

I'm leaning towards (2) as it's a shorter notation and easier to
understand. Under (1), we'd probably need to be able to add the
repository to a dependency element (perhaps with a shorthand notation
defined in the pom or its parent).

Either way, the resolution mechanism should not be affected by the  
repositories used. For a given set of artifacts, that should always  
resolve the same way. The versions available to a range calculation  
will alter depending on the available repositories, but these should  
all be known up front in the build. I don't think we need to deal with  
how version ranges are calculated / made reproducible here (that's  
being separately dealt with), as long as the above requirements are  
met with respect to the repositories used for it.

To accommodate this, I think the repositories in the POM should become  
constrained to locating metadata for a certain set of artifacts, so  
they can be used to expand reach through resolution, but do not affect  
anything already encountered, and do not affect resolution outside the  
current project. As long as the revised (3) above holds, this will be  
reproducible.

Given 1), 2), 3), and 5), I think a delegating structure for locating  
an artifact is the way to go. That is, specifying *only* the  
<dependency> element is enough for a build to locate an artifact, and  
always get the same one. The advantages are significant: less  
configuration/easier set up for new repositories, simpler resolution  
logic, faster resolution as it never needs to search multiple  
repositories. The delegation needs to go right down to the version  
level (snapshots in one repo, releases in another). The downside  
is loss of control (if we point javax to the download.java.net repos  
automatically, we have to live with that doing dodgy stuff in that  
namespace like bad POMs or changing released artifacts, or just being  
down).
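
As a rough sketch of the delegation (Python, purely illustrative): the ROUTES table, the locate function, the example.org URLs and the exact download.java.net path are assumptions of mine, not anything that exists in Maven. The point is that a coordinate maps to exactly one repository, chosen by the most specific group prefix and by whether the version is a snapshot or a release, so nothing ever has to search several repositories.

```python
# Hypothetical routing table: (group ID prefix, kind) -> repository URL.
# The entries are placeholders that only illustrate the delegation idea.
ROUTES = {
    ("javax", "release"): "http://download.java.net/maven/2",
    ("org.example", "release"): "http://repo.example.org/releases",
    ("org.example", "snapshot"): "http://repo.example.org/snapshots",
}
DEFAULT_REPO = "http://repo1.maven.org/maven2"


def locate(group_id, artifact_id, version):
    """Return the single repository responsible for this coordinate."""
    # Delegation goes down to the version level: snapshots and releases
    # of the same group may live in different repositories.
    kind = "snapshot" if version.endswith("-SNAPSHOT") else "release"
    parts = group_id.split(".")
    # Try the most specific group prefix first, then shorter ones.
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        repo = ROUTES.get((prefix, kind))
        if repo is not None:
            return repo
    # Nothing delegated for this group: fall back to the default repository.
    return DEFAULT_REPO


print(locate("junit", "junit", "3.8.1"))
print(locate("org.example.tools", "widget", "1.0-SNAPSHOT"))
```

Here artifact_id is unused, but a real table could route on it as well. The loss of control mentioned above shows up in the "javax" entry: whoever maintains the table decides where that namespace resolves from.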

I think this can be overcome by layers of routing rules. So, if  
central becomes the source of pointers to artifacts, then a project  
can add a repository to locate *missing* ones (not override existing)  
as described above, then a user can *alter* routes from their  
settings.xml. A common one for this will be * -> repository manager,  
but you could have others whether you are using a repo manager or not.
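
The layering could look something like the following Python sketch. Again this is only an assumption about how it might work: resolve_route and the three dictionaries are hypothetical, and repository names stand in for URLs. Central's pointers come first, a project's repositories only fill in what central doesn't know, and user routes from settings.xml are applied last, with "*" matching everything.

```python
def resolve_route(group_id, central_pointers, project_repos, user_routes):
    """Pick a repository for group_id using three layers of rules."""
    # Layer 1: central is the source of pointers to artifacts.
    repo = central_pointers.get(group_id)
    # Layer 2: a project may add repositories for *missing* groups only;
    # it cannot override a pointer that central already provides.
    if repo is None:
        repo = project_repos.get(group_id)
    # Layer 3: the user may *alter* any route; an exact group match is
    # preferred, then the "*" wildcard (eg, * -> repository manager).
    override = user_routes.get(group_id, user_routes.get("*"))
    return override if override is not None else repo


central = {"junit": "central"}
project = {"junit": "project-repo", "com.xwiki": "xwiki-repo"}

# Without user routes, the project cannot hijack junit but can add com.xwiki.
print(resolve_route("junit", central, project, {}))
print(resolve_route("com.xwiki", central, project, {}))
# With a wildcard user route, everything goes to the repository manager.
print(resolve_route("junit", central, project, {"*": "repo-manager"}))
```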

As for local storage, which was mentioned in the requirements, I'm
still in favour of this or similar:
http://docs.codehaus.org/display/MAVEN/Local+repository+separation
The important part here is that metadata is separated from artifacts
and local installations are only used when you intended them to be.

Anyway, this is just a starting point for discussion. If we can agree
on some of the fundamentals I'm sure we can build up a more complete
solution.

Cheers,
Brett


