[geni-dev] CF Requirements: 2) Identity Vocabulary



by Max Ott-2


OK, let me start.

On 14/03/2009, at 8:59 AM, Harry Mussman wrote:
>

> 2a)  During the discussion, Larry Lannom of CNRI made the point that a
> system like GENI needs a precise vocabulary or ontology,  that is
> shared by all suites. (This is absolutely essential when multiple GENI
> suites that are federated together, as expected.) This will apply to
> principals, aggregates and slices.

I fully agree and have been arguing that a taxonomic approach, as
taken in RSpec, is insufficient. We need an ontology not only to have
a precise way of describing things (the first basic requirement for
achieving repeatability) but also to describe RELATIONSHIPS and
constraints among them. Now, you can shoehorn all that into a taxonomy
or, to be less strict, into a tree structure with refids like we have
in XML, but it will be messy to define the underlying vocabulary.

Now, one of the biggest, and valid, criticisms is that it is very hard
to create an all-encompassing ontology. Getting a sizeable group with
diverse interests to agree on all aspects can take years; just check
the progress on some OASIS standards.

But we don't need that in order to get going. Namespaces allow us to
easily extend an existing ontology or add a new one. Obviously, it
won't help us if everyone has their own version, but there are already
a few good starting points (including RSpec for an initial set of
topics/nouns), and the various groups in GENI could add the things
they care about. There has been tremendous progress in the Semantic
Web community on automatically mapping related ontologies to each
other, and if we use their basic technologies, such as OWL, we can
leverage a lot.

Ilia Baldine is using NDL in ORCA, and I have been trying to at least
convert RSpec into an ontology (all my tools fail to parse the
current spec).


> 2b)  The current DRAFT states:
> "Each principal (also aggregate, component, slice) shall have a
> globally-unique name and/or a globally unique numerical identifier."

I would actually broaden that to something like 'artifact'. In Orbit,
besides all that, every experiment, every measurement set, every
experiment description file, configuration prototype, application, ...
has its unique identifier. How else would we be able to describe what
we want to do, what we did, and how everything is related?

Coming back to the relationships mentioned above, there are some
interesting 'complications'. Let's pick a simple resource, such as a
computer. That obviously should have an identifier. At some stage we
replace the disk. Does the resource get a new identifier? It's not the
same anymore; its performance and capabilities may have changed. So
our inventory ontology (or database schema) breaks this down into
related resources which together make up another one. (Is a computer
now an aggregate, as it aggregates such "atomic" resources as
motherboard, memory, disk, ...?)
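
A minimal sketch of such a breakdown, assuming one possible policy
(the container keeps its identifier, each part carries its own). The
Resource class and its field names are invented for this example and
do not describe the actual Orbit inventory schema:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Resource:
    """Hypothetical inventory entry: an identifier plus named sub-resources."""
    kind: str
    identifier: uuid.UUID = field(default_factory=uuid.uuid4)
    parts: Dict[str, "Resource"] = field(default_factory=dict)

    def replace_part(self, role: str, new_part: "Resource") -> "Resource":
        """Swap the part filling `role` and return the old one.

        The container's own identifier is left untouched; that is the
        policy assumed here, not a settled answer to the question above.
        """
        old_part = self.parts[role]
        self.parts[role] = new_part
        return old_part


computer = Resource("computer", parts={
    "disk": Resource("disk"),
    "memory": Resource("memory"),
})
computer_id = computer.identifier

old_disk = computer.replace_part("disk", Resource("disk"))

# The computer is still the same entry; only the disk relationship changed.
print(computer.identifier == computer_id)
print(computer.parts["disk"].identifier != old_disk.identifier)
```

Under the opposite policy (any change produces a new entity),
replace_part would return a new Resource with a fresh identifier and a
link back to its predecessor.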

>
>
> 2c)  Discussion:
> Current prototype implementations use a UUID as a unique identifier,
> which is a long "random number" that is (with a very high probability)
> unique within one suite, and also among all suites.

First of all, is the use of a UUID as defined in RFC 4122 a core tenet
of the architecture? There are other well-defined ways to accomplish
that. There is an efficiency argument to be made, but what else was
behind this? What prevents us from using generic URNs? You can always
get to a UUID by specifying a hash function and a mapping namespace,
something most UUID libraries provide.
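
As an illustration of that last point, here is a small sketch using
the name-based (version 5, SHA-1) UUIDs of RFC 4122 from Python's
standard uuid module. The "geni.example" namespace and the URN layout
are assumptions made up for the example, not anything GENI defines:

```python
import uuid

# Hypothetical project namespace, itself derived from a DNS name.
# "geni.example" is a placeholder, not a real GENI namespace.
GENI_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "geni.example")

# Hypothetical URN for a component; the layout is invented for illustration.
urn = "urn:example:geni:suite-a:component:node17"

# Version 5 UUID: SHA-1 over the namespace UUID followed by the name.
derived = uuid.uuid5(GENI_NAMESPACE, urn)

print(derived)          # the same URN always maps to the same UUID
print(derived.version)  # 5
```

The mapping is one-way: anyone holding the URN can recompute the UUID,
but the UUID alone does not give the URN back.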

>
> However, there is no way to take a UUID and decide which suite it is
> in, and thus there is no way to find a UUID in a suite registry
> without checking the registries of all suites.


Anyway, there is a clear trade-off between the ease of creating a
unique identifier and the ease of finding information about it. But I
disagree that the only way to find a UUID is to check all registries.
We have very robust DHT technologies which can easily be used for
that. In fact, this is the route we are taking (with an interesting
twist, though).
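
A rough sketch of the idea, assuming consistent hashing as the
placement rule (the rule many DHTs build on). This is not a full DHT
(no routing, replication, or churn handling) and not the scheme
alluded to above; the registry names are placeholders:

```python
import bisect
import hashlib
import uuid
from typing import Iterable


def ring_position(value: str) -> int:
    """Map a string to a point on the hash ring (SHA-1 digest as an integer)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class RegistryRing:
    """Assigns each identifier to the first registry clockwise on the ring."""

    def __init__(self, registries: Iterable[str]) -> None:
        self._points = sorted((ring_position(name), name) for name in registries)
        self._positions = [position for position, _ in self._points]

    def responsible_for(self, identifier: uuid.UUID) -> str:
        position = ring_position(str(identifier))
        # First registry at or after the key; wrap around past the last one.
        index = bisect.bisect_right(self._positions, position) % len(self._points)
        return self._points[index][1]


# Hypothetical registries, one per suite.
ring = RegistryRing([
    "registry-a.example",
    "registry-b.example",
    "registry-c.example",
])

some_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(ring.responsible_for(some_id))
```

Every participant that knows the registry list computes the same
answer, so a lookup goes to one registry instead of all of them.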
>

> 2d)  A proposed solution is to have the requirements read:
> "Each principal (also aggregate, component, slice) shall have a
> globally-unique name and/or a globally unique numerical identifier,
> where part of the name and/or numerical identifier directly specifies
> the identity of the GENI suite."

Not sure if this fundamentally solves the problem. How do we ensure  
uniqueness of the Suite ID (another UUID) and how do we initially find  
all the entry points to the various suites?  If we assume that in  
order to bootstrap the system we need a way to find out about all the  
registries first, or have a hierarchical structure where everyone  
knows THE registry and it knows (indirectly) every available suite,  
then obviously we start with the relevant knowledge.

I guess, if nothing else, it limits the number of identifiers we are
looking for, and it's a rather stable set. Any gossiping scheme would
work very well.

Now, one solution for a hierarchical naming scheme, and one which
makes my networking colleagues squirm, is using IPv6 addresses for the
identifiers and the DNS infrastructure for lookups. We have a
well-established way to assign address spaces; the SRV record (RFC
2782), for instance, is used by XMPP to find the relevant XMPP server
for a domain (and that's how we currently implement federation), ...
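
To make the shape of this concrete, a small sketch with Python's
standard ipaddress module. The suite prefixes come from the IPv6
documentation range 2001:db8::/32 and are purely hypothetical, as are
the suite names; no DNS query is actually sent:

```python
import ipaddress
from typing import Optional

# Hypothetical per-suite address blocks (documentation prefix, not real).
SUITE_PREFIXES = {
    "suite-a": ipaddress.ip_network("2001:db8:a::/48"),
    "suite-b": ipaddress.ip_network("2001:db8:b::/48"),
}


def suite_of(identifier: str) -> Optional[str]:
    """Return the suite whose prefix contains the identifier, if any."""
    address = ipaddress.ip_address(identifier)
    for suite, prefix in SUITE_PREFIXES.items():
        if address in prefix:
            return suite
    return None


def srv_query_name(service: str, protocol: str, domain: str) -> str:
    """Build the owner name of an SRV record: _service._proto.domain (RFC 2782)."""
    return f"_{service}._{protocol}.{domain}"


component_id = "2001:db8:a::17"
print(suite_of(component_id))                                # suite-a
print(srv_query_name("xmpp-server", "tcp", "example.org"))   # _xmpp-server._tcp.example.org

# Name under ip6.arpa that a reverse DNS lookup of this identifier would use.
print(ipaddress.ip_address(component_id).reverse_pointer)
```

The prefix match is what "part of the identifier specifies the suite"
would look like here; the SRV and ip6.arpa names are what a resolver
would then be asked for.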

As this is supposed to be a discussion, I'd better end on a slightly
controversial note :)

Cheers,

-max


_______________________________________________
control-wg mailing list
control-wg@...
http://lists.geni.net/mailman/listinfo/control-wg

Re: [geni-dev] CF Requirements: 2) Identity Vocabulary

by Giridhar Manepalli-2



(Since I prepared my write-up before Max sent out his summary about  
identifiers, I decided to send out mine as well. However, Max  
presented a few interesting ideas, so feel free to consider one of our  
emails as an anchoring one and pull-in the other one into the context.  
Thanks.)


We need identifiers to reference/identify entities such as principals,
aggregates, components, slices, etc. We expect the users, i.e., anyone
(anything) who (that) has an identifier in hand and wants to proceed
further with that identifier, to get some additional information
associated with the entity referenced by that identifier. This
additional information is key to enabling requirements such as getting
metadata about the entity, setting up slices, interoperating across
multiple organizations within a suite, interoperating across multiple
suites, etc.

In other words,

Identifiers must

(1) be unique across all organizations in a given suite.

(2) reveal some information, by either storing the information in the  
identifier or by associating that information somewhere else, about  
the owner suite or metadata about the entity being referenced etc.

(3) honor pt. 1 and 2 across all suites, to support federation and  
interoperability.

Pt. 1: Guaranteeing uniqueness is not a technical problem (there are a
number of GUID and UUID generators that create unique identifiers); it
is an organizational problem. Can all participating organizations use
the same generator? If so, the problem is resolved. Can all
organizations agree to provide a mapping between organizational
identifiers and suite-wide accepted identifiers? If so, the problem is
resolved.

That is, there is no magic in achieving our goal here. All
participants must sit together and agree with one another: either
create a spec and ask everyone to implement it, or at least *map*
internal implementations to that spec.

Pt. 2: Associating information with identifiers has been common in the
computing and networking world for ages. However, there are two
primary variants of those associations: (a) associating information by
storing it *in* the identifiers or (b) associating information with
the help of a common service.

Storing information in the identifiers, e.g. an entity identifier that
looks like suiteID.somehashID, is fragile unless we all expect the
entity never to be part of a different suite. By extension, putting
semantics into identifiers is almost always a bad idea, unless we can
guarantee that those semantics never change. A parallel problem with
putting semantics into identifiers is where to draw the line, i.e. how
do we know which semantics are relevant enough to be part of the
identifier and which are not? In my example, why is it important to
put *only* the suiteID in the identifier? Why not also put other
information, such as access control, into the identifier?

The solution is to associate information with the identifiers by some
means. For instance, the Handle System (RFC 3650), a distributed
system, allows creating unique identifiers, associating records (of
information) with identifiers, resolving associated records, and
administering the identifiers individually. In a nutshell, it is a
naming system for entities, just like what DNS is for machines, but
with improved administrative and distributed capabilities. The point
of mentioning the Handle System is to emphasize that there exists *a*
solution that addresses our requirements.
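
The difference between variants (a) and (b) can be sketched in a few
lines. The ResolutionService class below is a toy in-memory stand-in
invented for this example; it is not the Handle System API, and the
identifiers and suite names are placeholders:

```python
from typing import Dict


class ResolutionService:
    """Toy in-memory stand-in for a shared resolution service (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}

    def register(self, identifier: str, **info: str) -> None:
        self._records[identifier] = dict(info)

    def update(self, identifier: str, **info: str) -> None:
        self._records[identifier].update(info)

    def resolve(self, identifier: str) -> Dict[str, str]:
        # Return a copy so callers cannot edit the stored record by accident.
        return dict(self._records[identifier])


# Variant (a): the suite is baked into the identifier.
embedded_id = "suite-a.9f2c41"
suite_from_name = embedded_id.split(".", 1)[0]

# Variant (b): an opaque identifier, with the suite kept in a record.
service = ResolutionService()
opaque_id = "9f2c41"
service.register(opaque_id, suite="suite-a")

# The entity later becomes part of a different suite.
service.update(opaque_id, suite="suite-b")

print(suite_from_name)                      # suite-a (stale unless the id is reissued)
print(service.resolve(opaque_id)["suite"])  # suite-b (identifier unchanged)
```

With (a), every reference to the old identifier either breaks or
misleads after the move; with (b), only the record changes.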

Pt. 3: If we have solved pt. 1 and pt. 2, then we are much closer to
solving the interoperability problem, if we have not solved it
already. I have no doubt that we need to study the interoperability
problem thoroughly, but if we have dealt with the other two problems
successfully, then we should be in a better position to address this
one.

-Giridhar


On Mar 18, 2009, at 9:07 PM, Max Ott wrote:

> [...]



Re: [geni-dev] CF Requirements: 2) Identity Vocabulary

by Larry Lannom

I'm supposed to be starting some other discussion, but I'll add a bit  
to this one first.

Semantics in identifiers - we have both a lot of hands-on experience
with this and a lot of debate scars. As a general rule, it's easy to
say that semantics are a bad idea. But, of course, there is some
nuance to it. If you care at all about persistence, then the problem
is deriving the identifier from some attribute of the identified
entity where that attribute may change over time. Location and
ownership are the obvious examples of this problem. If, however, you
define the identifier/entity relationship such that any change
whatsoever in a given attribute would result in a new entity with a
new identifier, then deriving the identifier from that attribute
should be OK. But your crystal ball has to be in good shape if you are
working in some area where the identified entity is of interest 5, 10,
or more years from now. Our approach has been to keep meaning out of
identifiers and rely instead on a resolution system.

Resolving identifiers - having unique and persistent ids doesn't do
you much good, in most circumstances, if you don't have a well-known
way to get from the id to current state data on the identified entity.
In network environments, location is the most common piece of
desirable data, but not the only one. If you don't have access to the
entity but want to request access, you need to know how. Perhaps you
do have access but need to know how to talk to it, how big it is, etc.
Our approach (the Handle System) is to resolve identifiers to one or
more type/value pairs, with an extensible typing system. The design
choice that developers have faced over the years of using this system
is how much data to put into those values and, as a result, how many
levels of indirection to put into their application. By and large,
people have come down on the side of putting small amounts of data in
the resolution system. This is also the DNS model. So handle values
tend to be pointers (IP addresses, URLs, small chunks of XML
containing multiple URLs plus decision rules) and end-point data such
as public keys or hashes.
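
A small sketch of what resolving to typed values can look like. The
record layout, the type names, and the identifier below are all
invented for illustration; this is not the Handle System wire format
or client API:

```python
from typing import Dict, List, NamedTuple, Optional


class TypedValue(NamedTuple):
    """One type/value pair attached to an identifier."""
    type: str
    value: str


# Hypothetical record: small pointer-like values, not the data itself.
RECORDS: Dict[str, List[TypedValue]] = {
    "example-prefix/node17": [
        TypedValue("URL", "https://registry.example.org/components/node17"),
        TypedValue("IP_ADDRESS", "2001:db8::17"),
        TypedValue("KEY_FINGERPRINT", "ab:cd:ef:01"),
    ],
}


def resolve(identifier: str, wanted_type: Optional[str] = None) -> List[TypedValue]:
    """Return all values for an identifier, or only those of one type."""
    values = RECORDS.get(identifier, [])
    if wanted_type is None:
        return list(values)
    return [item for item in values if item.type == wanted_type]


# One level of indirection: the record points at where the real data lives.
for item in resolve("example-prefix/node17", "URL"):
    print(item.value)
```

New kinds of information are added by introducing a new type string;
clients that do not know the type simply skip it.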

Larry

On Mar 19, 2009, at 2:15 AM, Giridhar Manepalli wrote:

> [...]

