Re: [geni-dev] CF Requirements: 2) Identity Vocabulary

(Since I prepared my write-up before Max sent out his summary about identifiers, I decided to send out mine as well. However, Max presented a few interesting ideas, so feel free to treat either of our emails as the anchor and pull the other one into context. Thanks.)

We need identifiers to reference entities such as principals, aggregates, components, slices, etc. We expect users, i.e., anyone (or anything) that has an identifier in hand and wants to proceed further with it, to be able to get additional information associated with the entity that the identifier references. This additional information is key to enabling requirements such as getting metadata about the entity, setting up slices, interoperating across multiple organizations within a suite, and interoperating across multiple suites.

In other words, identifiers must

(1) be unique across all organizations in a given suite;

(2) reveal some information about the owner suite, or metadata about the referenced entity, either by storing that information in the identifier or by associating it somewhere else;

(3) honor pts. 1 and 2 across all suites, to support federation and interoperability.

Pt. 1: Guaranteeing uniqueness is not a technical problem (there are a number of GUID and UUID generators that create unique identifiers); it is an organizational problem. Can all participating organizations use the same generator? If so, the problem is resolved. Can all organizations agree to provide a mapping between organizational identifiers and suite-wide accepted identifiers? If so, the problem is resolved.

That is, there is no magic in achieving our goal here. All participants must sit together and agree with one another. Either create a spec and ask everyone to implement it, or at least *map* internal implementations to that spec.

Pt. 2: Associating information with identifiers has been common in the computing and networking world for ages. However, there are two primary variants of those associations: (a) storing the information *in* the identifiers, or (b) associating the information with the help of a common service.

Storing information in the identifiers, e.g. an entity identifier that looks like suiteID.somehashID, is fragile unless we all expect the entity never to be part of a different suite. By extension, putting semantics into identifiers is almost always a bad idea, unless we can guarantee that those semantics never change. A parallel problem with putting semantics in is where to draw the line, i.e. how do we know which semantics belong in the identifier and which do not? In my example, why is it important to put *only* the suiteID in the identifier? Why not put in other information, such as access control?

The solution is to associate information with the identifiers by some other means. For instance, the Handle System (RFC 3650), a distributed system, allows creating unique identifiers, associating records (of information) with identifiers, resolving the associated records, and administering the identifiers individually. In a nutshell, it is a naming system for entities, just like what DNS is for machines - but with improved administrative and distributed capabilities. The point of mentioning the Handle System is to emphasize that there exists *a* solution that addresses our requirements.

Pt. 3: If we have solved pt. 1 and pt. 2, then we are much closer to solving the interoperability problem, if we have not solved it already. I have no doubt that we need to study the interoperability problem thoroughly - but if we have dealt with the other two problems successfully, then we should be in a better position to address this one.

-Giridhar

On Mar 18, 2009, at 9:07 PM, Max Ott wrote:

> OK, let me start.
>
> On 14/03/2009, at 8:59 AM, Harry Mussman wrote:
>
>> 2a) During the discussion, Larry Lannom of CNRI made the point that a system like GENI needs a precise vocabulary or ontology that is shared by all suites. (This is absolutely essential when multiple GENI suites are federated together, as expected.) This will apply to principals, aggregates and slices.
>
> I fully agree and have been making the argument that a taxonomic approach as taken in RSpec is insufficient. We need an ontology not only to have a precise way of describing things (the first basic requirement for achieving repeatability) but also to describe RELATIONSHIPS and constraints among them. Now you can shoehorn all that into a taxonomy or, to be less strict, into a tree structure with refids like we have in XML, but it will be messy to define the underlying vocabulary.
>
> Now one of the biggest and most valid criticisms is that it is very hard to create an all-encompassing ontology. Getting a sizeable group with diverse interests to agree on all aspects can take years; just check the progress on some OASIS standards.
>
> But we don't need that in order to get going. Namespaces allow us to easily extend an existing ontology or add a new one. Obviously, it won't help us if everyone has their own version, but there are already a few good starting points (including RSpec for an initial set of topics/nouns) and the various groups in GENI could add the things they care about. There has been tremendous progress in the Semantic Web community on automatically mapping related ontologies to each other, and if we use their basic technologies, such as OWL, we can leverage a lot.
>
> Ilia Baldine is using NDL in ORCA and I have been trying to at least convert RSpec into an ontology (all my tools fail to parse the current spec).
>
>> 2b) The current DRAFT states:
>> "Each principal (also aggregate, component, slice) shall have a globally-unique name and/or a globally unique numerical identifier."
>
> I would actually broaden that to something like 'artifact'. In Orbit, besides all that, every experiment, every measurement set, every experiment description file, configuration prototype, application, ... has its unique identifier. How else would we be able to describe what we want to do, what we did, and how everything is related?
>
> Coming back to the relationships mentioned above, there are some interesting 'complications'. Let's pick a simple resource, such as a computer. That obviously should have an identifier. At some stage we replace the disk. Does the resource get a new identifier? It's not the same anymore; its performance and capabilities may have changed. So our inventory ontology (or database schema) breaks this down into related resources which make up another one. (Is a computer now an aggregate, as it aggregates such "atomic" resources as motherboard, memory, disk, ...?)
>
>> 2c) Discussion:
>> Current prototype implementations use a UUID as a unique identifier, which is a long "random number" that is (with a very high probability) unique within one suite, and also among all suites.
>
> First of all, is the use of a UUID as defined in RFC 4122 a core tenet of the architecture? There are other well-defined ways to accomplish that. There is an efficiency argument to be made, but what else was behind this? What prevents us from using generic URNs? You can always get to a UUID by specifying a hash function and a mapping namespace, something most UUID libraries provide.
>
>> However, there is no way to take a UUID and decide which suite it is in, and thus there is no way to find a UUID in a suite registry without checking the registries of all suites.
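
To illustrate the point above that a UUID can be derived from any name once a hash function and a namespace are fixed, here is a minimal sketch using the name-based (version 5, SHA-1) UUIDs of RFC 4122 as exposed by Python's standard `uuid` module. The URN strings, the DNS name, and `GENI_NAMESPACE` are made up for illustration; nothing in this thread fixes a URN scheme.

```python
import uuid

# Hypothetical namespace for GENI names, derived here from a made-up DNS name.
GENI_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "geni.example.org")

def urn_to_uuid(urn: str) -> uuid.UUID:
    """Map a URN to a version-5 (SHA-1, name-based) UUID, deterministically."""
    return uuid.uuid5(GENI_NAMESPACE, urn)

a = urn_to_uuid("urn:example:suite-a:slice:demo")
b = urn_to_uuid("urn:example:suite-a:slice:demo")
c = urn_to_uuid("urn:example:suite-b:slice:demo")

print(a == b)     # True: the same name always yields the same UUID
print(a == c)     # False: a different name yields a different UUID
print(a.version)  # 5
```

Note that the mapping is one-way: the URN cannot be recovered from the UUID, so the lookup question raised in 2c still has to be answered separately.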
>
> Anyway, there is a clear trade-off between the ease of creating a unique identifier and the ease of finding information about it. But I disagree that the only way to find a UUID is by checking all registries. We have very robust DHT technologies which can easily be used for that. In fact, this is the route we are taking (with an interesting twist, though).
>
>> 2d) A proposed solution is to have the requirements read:
>> "Each principal (also aggregate, component, slice) shall have a globally-unique name and/or a globally unique numerical identifier, where part of the name and/or numerical identifier directly specifies the identity of the GENI suite."
>
> Not sure if this fundamentally solves the problem. How do we ensure uniqueness of the Suite ID (another UUID), and how do we initially find all the entry points to the various suites? If we assume that in order to bootstrap the system we need a way to find out about all the registries first, or have a hierarchical structure where everyone knows THE registry and it knows (indirectly) every available suite, then obviously we start with the relevant knowledge.
>
> I guess, if nothing else, it limits the number of identifiers we are looking for, and it's a rather stable set. Any gossiping scheme would work very well.
>
> Now, one solution for a hierarchical naming scheme, and one which makes my networking colleagues squirm, is using IPv6 addresses for the identifiers and the DNS infrastructure for lookups. We have a well-established way to assign address spaces; the SRV record (RFC 2782), for instance, is used by XMPP to find the relevant XMPP server for a domain (and that's how we currently implement federation), ...
>
> As this is supposed to be a discussion, I better end on a slightly controversial note :)
>
> Cheers,
>
> -max

_______________________________________________
control-wg mailing list
control-wg@...
http://lists.geni.net/mailman/listinfo/control-wg
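
As a rough illustration of the DHT route mentioned in the reply to 2c, the sketch below uses consistent hashing, the placement rule underlying many DHTs, to decide which registry is responsible for a given UUID, so that a lookup goes to one registry instead of all of them. It is not the scheme Max's group uses (the thread does not describe it); the registry names, the `RegistryRing` class, and the choice of SHA-1 are assumptions, and a real DHT would add routing, replication, and handling of membership changes.

```python
import bisect
import hashlib
import uuid

def _position(key: str) -> int:
    """Hash a string to a point on the ring (SHA-1 chosen arbitrarily)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")

class RegistryRing:
    """Toy consistent-hash ring: a key belongs to the next registry clockwise."""

    def __init__(self, registries):
        if not registries:
            raise ValueError("at least one registry is required")
        # Place every registry on the ring and keep the points sorted.
        self._ring = sorted((_position(name), name) for name in registries)
        self._points = [point for point, _ in self._ring]

    def responsible_for(self, identifier: uuid.UUID) -> str:
        point = _position(str(identifier))
        # First registry at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._points, point) % len(self._ring)
        return self._ring[idx][1]

# Hypothetical registry names, one per suite.
ring = RegistryRing([
    "registry.suite-a.example",
    "registry.suite-b.example",
    "registry.suite-c.example",
])
some_id = uuid.uuid4()
print(ring.responsible_for(some_id))
```

Every participant holding the same registry list computes the same answer without a central index. The list itself still has to be learned somehow, which is the bootstrap question raised in the reply to 2d.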
Re: [geni-dev] CF Requirements: 2) Identity Vocabulary

I'm supposed to be starting some other discussion, but I'll add a bit to this one first.

Semantics in identifiers - we have both a lot of hands-on experience with this and a lot of debate scars. As a general rule it's easy to say that semantics are a bad idea. But, of course, there is some nuance to it. If you care at all about persistence, then the problem is deriving the identifier from some attribute of the identified entity where that attribute may change over time. Location and ownership are the obvious examples of this problem. If, however, you define the identifier/entity relationship such that any change whatsoever in a given attribute would result in a new entity with a new identifier, then deriving the identifier from that attribute should be ok. But your crystal ball has to be in good shape if you are working in some area where the identified entity is of interest 5, 10, or more years from now. Our approach has been to keep meaning out of identifiers and rely instead on a resolution system.

Resolving identifiers - having unique and persistent ids doesn't do you much good, in most circumstances, if you don't have a well-known way to get from the id to current state data on the identified entity. In network environments location is the most common piece of desirable data, but not the only one. If you don't have access to the entity, but want to request access, you need to know how. Perhaps you do have access but need to know how to talk to it, how big it is, etc. Our approach (the Handle System) is to resolve identifiers to one or more type/value pairs, with an extensible typing system. The design choice that developers have faced over the years of using this system is how much data to put into those values and, as a result of that, how many levels of indirection to put into their application. By and large, people have come down on the side of putting small amounts of data in the resolution system. This is also the DNS model.
So handle values tend to be pointers (IP addresses, URLs, small chunks of XML containing multiple URLs plus decision rules) and end point data such as public keys or hashes.

Larry

On Mar 19, 2009, at 2:15 AM, Giridhar Manepalli wrote:

> Storing information in the identifiers, e.g. an entity identifier that looks like suiteID.somehashID, is fragile unless we all expect the entity to never be part of a different suite. By extension, putting semantics into identifiers is almost always a bad idea, unless we can guarantee that those semantics never change.
>
> The solution is to associate information with the identifiers by some means. For instance, the Handle System (RFC 3650), a distributed one, allows creating unique identifiers, associating records (of information) with identifiers, resolving associated records, and administering the identifiers individually.
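
The resolution model described in this message, in which an opaque identifier resolves to one or more type/value pairs and the values stay small, can be sketched as follows. This is a toy in-memory stand-in, not the Handle System API or wire protocol; the `Resolver` class, the identifier string, the type names `URL` and `PUBKEY`, and the placeholder key are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

class Resolver:
    """Toy resolver: identifier -> list of (type, value) pairs."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Tuple[str, str]]] = {}

    def add_value(self, identifier: str, value_type: str, value: str) -> None:
        # Types are plain strings, so a new type needs no schema change.
        self._records.setdefault(identifier, []).append((value_type, value))

    def resolve(self, identifier: str,
                value_type: Optional[str] = None) -> List[Tuple[str, str]]:
        """Return all pairs for an identifier, optionally filtered by type."""
        pairs = self._records.get(identifier, [])
        if value_type is None:
            return list(pairs)
        return [(t, v) for t, v in pairs if t == value_type]

resolver = Resolver()
ident = "example-prefix/component-42"  # opaque: no meaning is parsed from it
resolver.add_value(ident, "URL", "https://registry.example.org/component-42")
resolver.add_value(ident, "PUBKEY", "ssh-ed25519 AAAA...")  # placeholder key

print(resolver.resolve(ident, "URL"))
# [('URL', 'https://registry.example.org/component-42')]
```

If the component moves, only the stored `URL` value would change; the identifier that users hold stays the same, which is the persistence argument made above.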