A Few Ideas

by ibaker

Good evening, I would appreciate any feedback on the following.

Our initial patch submission for implementing GSS-API mutual authentication
used an environment variable ($DISTCC_AUTH) and as such was a global
option for all hosts.

We have another version where this is now a per host option along the
lines of lzo or cpp (,gssapi).  This enables a client to use mixed
authenticating/non-authenticating build hosts, and allows for easier
deployment for users as there are fewer environment variables to set up.

We also currently use DNS aliases to identify a build cluster or
sub-cluster and replace its host definition with a host definition for
each of its IP addresses if the host option of ,exp is specified (this is
to avoid a performance cap imposed by the lockfile names and to utilise
the randomisation done by DNS).

These modifications involve extending the host definition structure and
the host option parser, so in this respect they're slightly more
intrusive in terms of code modification.  However, this approach is
consistent with the existing per-host, comma-separated options.
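
For illustration only, here is a minimal Python sketch of how a per-host,
comma-separated option such as ,gssapi could be read from a host
specification.  It is not the distcc parser (which is written in C): the
HostDef class, the function names and the hostnames buildA/buildB are
hypothetical, and port and SSH forms of the host syntax are ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HostDef:
    """Illustrative stand-in for a host definition (not distcc's real struct)."""
    hostname: str
    slots: Optional[int] = None          # the /LIMIT part, if present
    options: FrozenSet[str] = frozenset()  # e.g. {"lzo", "gssapi"}


def parse_host_spec(spec: str) -> HostDef:
    """Parse 'HOST[/LIMIT][,OPTION...]' into a HostDef."""
    head, *opts = spec.strip().split(",")
    hostname, _, limit = head.partition("/")
    if not hostname:
        raise ValueError(f"missing hostname in {spec!r}")
    slots = int(limit) if limit else None
    return HostDef(hostname, slots, frozenset(o for o in opts if o))


def needs_auth(host: HostDef) -> bool:
    """True if this host was marked with the proposed ,gssapi option."""
    return "gssapi" in host.options


if __name__ == "__main__":
    # Mixed authenticating / non-authenticating hosts in one list.
    specs = "buildA/4,lzo,gssapi buildB/4,lzo".split()
    hosts = [parse_host_spec(s) for s in specs]
    print([h.hostname for h in hosts if needs_auth(h)])  # ['buildA']
```

Because the flag travels with each host entry, no global environment
variable is needed to decide which hosts authenticate.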

Would this be an acceptable solution, or can anyone foresee any issues or
problems?


Thanks.

Ian
__
distcc mailing list            http://distcc.samba.org/
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/distcc

Re: A Few Ideas

by Fergus Henderson

On Fri, Apr 24, 2009 at 8:52 AM, <ibaker@...> wrote:
> Good evening, I would appreciate any feedback on the following.
>
> Our initial patch submission for implementing GSS-API mutual authentication
> used an environment variable ($DISTCC_AUTH) and as such was a global option
> for all hosts.
>
> We have another version where this is now a per host option along the lines
> of lzo or cpp (,gssapi).  This enables a client to use mixed
> authenticating/non-authenticating build hosts, and allows for easier
> deployment for users as there are fewer environment variables to set up.

That sounds reasonable.

Is there a simple way for the client to detect when a server requires
authentication, so that this can be done automatically by "lsdistcc"?

> We also currently use DNS aliases to identify a build cluster or sub-cluster
> and replace its host definition with a host definition for each of its IP
> addresses if the host option of ,exp is specified (this is to avoid a
> performance cap imposed by the lockfile names and to utilise the
> randomisation done by DNS).

Could you explain that in more detail?

Cheers,
  Fergus.
--
Fergus Henderson <fergus@...>


Re: A Few Ideas

by ibaker

On Fri, 24 Apr 2009, Fergus Henderson wrote:

> On Fri, Apr 24, 2009 at 8:52 AM, <ibaker@...> wrote:
>
>> Good evening, I would appreciate any feedback on the following.
>>
>> Our initial patch submission for implementing GSS-API mutual authentication
>> used an environment variable ($DISTCC_AUTH) and as such was a global option
>> for all hosts.
>>
>> We have another version where this is now a per host option along the lines
>> of lzo or cpp (,gssapi).  This enables a client to use mixed
>> authenticating/non-authenticating build hosts, and allows for easier
>> deployment for users as there are fewer environment variables to set up.
>
>
> That sounds reasonable.
>
> Is there a simple way for the client to detect when a server requires
> authentication,
> so that this can be done automatically by "lsdistcc"?

The simplest way to determine if a server requires authentication is
to transmit the handshake character and see if one gets transmitted back.
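
A rough Python sketch of that probe follows.  The handshake byte used
here (b"*") is a placeholder, since the actual character is not given in
this thread; the function name is likewise made up.  Port 3632 is the
usual distccd port.

```python
import socket


def server_answers_handshake(host, port=3632, handshake=b"*", timeout=2.0):
    """Send one handshake byte and report whether the server sends one back.

    `handshake` is a placeholder value, not the real protocol character.
    Returns False on connection failure, timeout, or if the peer closes
    the connection without replying.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(handshake)
            return len(sock.recv(1)) == 1
    except OSError:  # includes refused connections and timeouts
        return False
```

A tool that builds the host list could call something like this once per
host and append ,gssapi only to the hosts that answer.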

>
>> We also currently use DNS aliases to identify a build cluster or sub-cluster
>> and replace its host definition with a host definition for each of its IP
>> addresses if the host option of ,exp is specified (this is to avoid a
>> performance cap imposed by the lockfile names and to utilise the
>> randomisation done by DNS).
>
>
> Could you explain that in more detail?
>

Sure.  Currently we have a hostname of the form:

distcc-slc4-amd64.cern.ch/8

The set of IP addresses for this hostname represents the build cluster for
this particular platform.  Our patch replaces the dcc_hostdef list item
for distcc-slc4-amd64.cern.ch with a dcc_hostdef for each one of the IP
addresses in h_addr_list, replacing the hostname in the new list entry
with an IP address.

Randomisation as per --randomize takes place prior to this expansion so
that any randomisation done by DNS is preserved.
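
The expansion step can be sketched in Python as below.  This is an
illustration of the idea, not the patch itself (which works on
dcc_hostdef entries in C): the tuple layout, the function name, and the
choice to drop the exp option from the expanded entries are assumptions.
The resolver is injectable so the example can run without DNS.

```python
import socket


def expand_hosts(hosts, resolve=socket.gethostbyname_ex):
    """Replace each host carrying the 'exp' option with one entry per IP.

    `hosts` is a list of (name, slots, options) tuples.  Entries without
    'exp' pass through unchanged.  Addresses are kept in the order the
    resolver returned them, so any ordering done by DNS is preserved.
    """
    expanded = []
    for name, slots, options in hosts:
        if "exp" not in options:
            expanded.append((name, slots, options))
            continue
        _canonical, _aliases, addresses = resolve(name)
        remaining = tuple(o for o in options if o != "exp")
        expanded.extend((ip, slots, remaining) for ip in addresses)
    return expanded


if __name__ == "__main__":
    # Fake resolver standing in for DNS; the addresses are documentation IPs.
    fake = lambda name: (name, [], ["192.0.2.1", "192.0.2.2"])
    hosts = [("distcc-slc4-amd64.cern.ch", 8, ("exp",))]
    for entry in expand_hosts(hosts, resolve=fake):
        print(entry)
```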

The rationale behind this patch is primarily to overcome the performance
ceiling imposed by the lockfile names.  Specifically, using the above
hostname, when we have 8 lockfiles created all other processes become
blocked.
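
To make the ceiling concrete, here is a toy Python model of the behaviour
described above.  It only assumes that job slots are keyed by the host
name as written in the host list; it does not reproduce distcc's actual
lock code or lockfile naming.

```python
def max_concurrent_jobs(hosts):
    """Upper bound on simultaneous jobs when slots are keyed by host name.

    `hosts` is an iterable of (name, slots).  Entries sharing a name share
    one pool of slots, however many machines sit behind that name.
    """
    pools = {}
    for name, slots in hosts:
        pools[name] = max(slots, pools.get(name, 0))
    return sum(pools.values())


# One alias with /8: at most 8 jobs, regardless of cluster size.
print(max_concurrent_jobs([("distcc-slc4-amd64.cern.ch", 8)]))  # 8

# The same alias expanded to three addresses, each with /8.
ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
print(max_concurrent_jobs([(ip, 8) for ip in ips]))  # 24
```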

Overcoming this performance ceiling while also gaining the additional
randomisation performed by DNS seemed like a good alternative to changing
the lockfile naming mechanism.


Ian

Re: A Few Ideas

by Fergus Henderson



On Mon, Apr 27, 2009 at 7:49 AM, <ibaker@...> wrote:
> On Fri, 24 Apr 2009, Fergus Henderson wrote:
>
>> On Fri, Apr 24, 2009 at 8:52 AM, <ibaker@...> wrote:
>>
>>> Good evening, I would appreciate any feedback on the following.
>>>
>>> Our initial patch submission for implementing GSS-API mutual authentication
>>> used an environment variable ($DISTCC_AUTH) and as such was a global option
>>> for all hosts.
>>>
>>> We have another version where this is now a per host option along the lines
>>> of lzo or cpp (,gssapi).  This enables a client to use mixed
>>> authenticating/non-authenticating build hosts, and allows for easier
>>> deployment for users as there are fewer environment variables to set up.
>>
>> That sounds reasonable.
>>
>> Is there a simple way for the client to detect when a server requires
>> authentication,
>> so that this can be done automatically by "lsdistcc"?
>
> The simplest way to determine if a server requires authentication is to
> transmit the handshake character and see if one gets transmitted back.

OK, that's straightforward.

>>> We also currently use DNS aliases to identify a build cluster or sub-cluster
>>> and replace its host definition with a host definition for each of its IP
>>> addresses if the host option of ,exp is specified (this is to avoid a
>>> performance cap imposed by the lockfile names and to utilise the
>>> randomisation done by DNS).
>>
>> Could you explain that in more detail?
>
> Sure.  Currently we have a hostname of the form:
>
> distcc-slc4-amd64.cern.ch/8
>
> The set of IP addresses for this hostname represents the build cluster for
> this particular platform.  Our patch replaces the dcc_hostdef list item
> for distcc-slc4-amd64.cern.ch with a dcc_hostdef for each one of the IP
> addresses in h_addr_list, replacing the hostname in the new list entry
> with an IP address.
>
> Randomisation as per --randomize takes place prior to this expansion so
> that any randomisation done by DNS is preserved.
>
> The rationale behind this patch is primarily to overcome the performance
> ceiling imposed by the lockfile names.  Specifically, using the above
> hostname, when we have 8 lockfiles created all other processes become
> blocked.
>
> Overcoming this performance ceiling while also gaining the additional
> randomisation performed by DNS seemed like a good alternative to changing
> the lockfile naming mechanism.

An alternative approach would be to use host names such as

       distcc1.slc4-amd64.cern.ch/1
       distcc2.slc4-amd64.cern.ch/1
       distcc3.slc4-amd64.cern.ch/1
       distcc4.slc4-amd64.cern.ch/1
       distcc5.slc4-amd64.cern.ch/4

for the distcc cluster, and then use "lsdistcc" (perhaps with the "-n" option) at the start of each build (which can be done automatically by the "pump" script if you set DISTCC_POTENTIAL_HOSTS).
This approach has some significant advantages, I believe.
  • It deals much better with the situation when some of the server hosts are down. With the approach that you suggest, distcc would fall back to local execution for any server that was down, resulting in a slow build due to (a) repeated timeouts trying to contact remote hosts that are down and (b) doing too much work locally.  Using lsdistcc, down hosts are quickly detected at the start of the build, and will not be included in DISTCC_HOSTS.
  • I think it can deal better with the situation where some of the server hosts are more powerful than others (e.g. by using /4 for distcc5 in the example above).  With the approach that you suggest, it seems that distcc would have to assume that each IP address returned was equally powerful; I don't see any way for DNS to indicate to distcc that certain IP addresses have more computing power than others.
It does have one significant disadvantage, which is that there is some additional admin work needed to avoid "holes" in the DNS namespace for distcc1..distccN when hosts are added/removed from the distcc cluster.
But this seems worth it for the advantages mentioned above.
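
As a rough Python illustration of the numbered-hostname approach (not of
how lsdistcc itself is implemented), the sketch below probes a list of
candidate hosts and builds a DISTCC_HOSTS-style string from the reachable
ones, keeping a per-host slot limit.  The function names and the simple
TCP connect probe are assumptions; port 3632 is the usual distccd port.

```python
import socket


def is_up(host, port=3632, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_distcc_hosts(candidates, probe=is_up):
    """Build a host list string from (hostname, slots) pairs.

    Hosts for which `probe` returns False are left out, so a build never
    waits on timeouts for machines that were already down at the start.
    """
    return " ".join(f"{host}/{slots}" for host, slots in candidates if probe(host))


if __name__ == "__main__":
    candidates = [
        ("distcc1.slc4-amd64.cern.ch", 1),
        ("distcc2.slc4-amd64.cern.ch", 1),
        ("distcc5.slc4-amd64.cern.ch", 4),  # a more powerful machine
    ]
    # Pretend distcc2 is down, without touching the network.
    fake_probe = lambda host: not host.startswith("distcc2.")
    print(build_distcc_hosts(candidates, probe=fake_probe))
```

The per-host limit is what lets a more powerful server take a larger
share of the work, which a plain list of DNS addresses cannot express.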

Have you considered that approach?

Supporting multiple ways to achieve the same effect would add complexity to distcc without any additional compensating functionality, so unless there is a major benefit to your proposed approach, I think it would make more sense to consolidate support for the current lsdistcc-based approach.

Cheers,
  Fergus.

--
Fergus Henderson <fergus.henderson@...>
