
|
URIs and Unique IDs
I'm resending this message to the semantic web discussion group for the record. On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on how/whether to migrate to new URIs when a new version of an ontology is published. This is largely due to the fact that there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue. Versioning information is meant to be placed on a version annotation.
However the current situation is like the wild West, and everyone will be doing different things, resulting in a mess.
Wordnet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like Wordnet. Another is to keep the exact same name space, and change the semantics of a small number of terms while keeping the same URI. A third is to keep the same URI for the unchanged terms, and mint new URIs for the terms with different semantics.
This is a problem because they have no guidelines; they are basically stumbling along in the dark.
I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.
In the current state of semantic web use, it may not matter too much which option the SKOS team chooses. This is mainly because relatively few applications will be impacted, which may be due to the fact that the applications are not driven by the ontologies.
However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that we have no good solutions yet, do we continue to throw our hands up and punt? Absolutely not. It is essential that a good precedent is set ASAP that is based on sound principles.
Here is how.
We should imagine a future where ontology versioning is handled properly and do things that are going to make things easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.
For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, they will break if you change the semantics in mid-stream. Imagine an application that relies on the semantics of broader as it was originally specified, with transitivity. It loads data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible if the same SKOS URI is used for versions with different semantics. They are different beasts, and thus MUST have different URIs.
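To make the transitivity scenario concrete, here is a minimal sketch in Python. The two URIs are hypothetical (they are not the real SKOS URIs), and the set of transitive predicates is something the application is assumed to maintain itself. The point is only that the application can apply transitive closure selectively when the two meanings carry different URIs.

```python
from collections import defaultdict

# Hypothetical URIs: one for the original (transitive) meaning of broader,
# one minted for the later (non-transitive) meaning.
BROADER_V1 = "http://example.org/skos/v1#broader"
BROADER_V2 = "http://example.org/skos/v2#broader"
TRANSITIVE = {BROADER_V1}  # assumption: the application tracks this itself

def infer(triples):
    """Return the triples plus the transitive closure for transitive predicates only."""
    result = set(triples)
    by_pred = defaultdict(set)
    for s, p, o in triples:
        by_pred[p].add((s, o))
    for p in TRANSITIVE:
        edges = set(by_pred.get(p, ()))
        changed = True
        while changed:  # naive fixed-point iteration, fine for a sketch
            changed = False
            for a, b in list(edges):
                for c, d in list(edges):
                    if b == c and (a, d) not in edges:
                        edges.add((a, d))
                        changed = True
        result |= {(s, p, o) for s, o in edges}
    return result

old_data = [("cat", BROADER_V1, "mammal"), ("mammal", BROADER_V1, "animal")]
new_data = [("oak", BROADER_V2, "tree"), ("tree", BROADER_V2, "plant")]
inferred = infer(old_data + new_data)
```

Here the closure adds ("cat", broader, "animal") for the old data but nothing for the new data. If both datasets used one URI, the application would have no basis for that distinction.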
Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URIs and datasets with the new URIs. This means that the datasets will not be linked as they should be: the application will treat the two different URIs for the same thing as being different. If one wanted to go into OWL-Full, one could use owl:sameAs, but this is not very practical. The only reasonable solution is to have the same URI for things with the same semantics.
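The linking problem can be illustrated with a small Python sketch. The URIs and data are made up, and the hand-maintained mapping stands in for whatever bridging mechanism (owl:sameAs or otherwise) an application would need if every URI changed.

```python
# Hypothetical URIs for the same, semantically unchanged property in two namespaces.
OLD_LABEL = "http://example.org/skos/v1#prefLabel"
NEW_LABEL = "http://example.org/skos/v2#prefLabel"

def labels(triples, predicate):
    """Map each subject to its label for one exact predicate URI."""
    return {s: o for s, p, o in triples if p == predicate}

def rewrite_predicates(triples, mapping):
    """Replace predicate URIs according to a hand-maintained old-to-new mapping."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]

old_data = [("c1", OLD_LABEL, "cats")]
new_data = [("c2", NEW_LABEL, "dogs")]
merged = old_data + new_data

# Without a bridge, a query on the new URI silently misses the old dataset.
naive = labels(merged, NEW_LABEL)
# With an explicit mapping, both datasets line up again.
bridged = labels(rewrite_predicates(merged, {OLD_LABEL: NEW_LABEL}), NEW_LABEL)
```

The extra mapping step is exactly the burden that disappears when an unchanged term keeps its URI.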
Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.
If either of these two guidelines is broken, then so will be the ontology-driven applications of the future.
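As a rough illustration of the two principles, the following Python sketch assigns URIs for a new version. It assumes, for the sake of the example, that a term's semantics can be captured as a comparable value (here a frozenset of axiom names); in practice, deciding whether the semantics changed is the hard part. The function name, the example namespace, and the mint callback are all hypothetical.

```python
def next_version_uris(old_terms, new_terms, mint):
    """Assign a URI to every term of a new ontology version.

    old_terms: {name: (uri, semantics)} for the previous version
    new_terms: {name: semantics} for the new version
    mint:      callable that returns a fresh URI for a term name
    """
    result = {}
    for name, semantics in new_terms.items():
        previous = old_terms.get(name)
        if previous is not None and previous[1] == semantics:
            result[name] = previous[0]   # principle 2: unchanged, keep the ID
        else:
            result[name] = mint(name)    # principle 1: changed (or new), new ID
    return result

old = {
    "broader": ("http://example.org/v1#broader", frozenset({"transitive"})),
    "prefLabel": ("http://example.org/v1#prefLabel", frozenset()),
}
new = {"broader": frozenset(), "prefLabel": frozenset()}
uris = next_version_uris(old, new, lambda n: f"http://example.org/v2#{n}")
```

In this toy run, broader loses its transitivity and gets a new URI, while prefLabel is untouched and keeps its original one.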
These maxims hold without exception for any standards that are formally released as standards. A question arises whether we need to hold to the same standards for standards like SKOS, which was never formally blessed.
The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard, or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.
Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI? Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake. The URI with the wrong semantics must keep its original unique ID.
Michael Uschold
|

|
Re: URIs and Unique IDs
Michael,
I'm not sure that it's as cut and dried as: "Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID. 2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."
There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term. In other words, the application wants to change its behavior when the semantics of a term are changed. In this case, the URI should not be changed if the semantics of a term are changed. If it were changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term. I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology. Different ontologies and different applications will require different approaches.
Mike Lang
-- Revelytix, Inc. phone: 410-584-0009 (office) 443-928-3782 (cell) skype: michael.allen.lang.jr aim: MikeJrRevelytix
|

|
Re: URIs and Unique IDs
We are trying to release a community semantic service (later this month!) that "does the right thing" in this arena. So I strongly agree with the tenor of this message. Except I am trying to imagine what implementation should happen in the _present_ for our service to be an exemplar. I am sorry for the long post, but if it is mostly valid, hopefully it can advance the discussion.
We have provisionally settled on the following principles for this service (which is intended to store domain vocabularies and terms, keep track of their versions, and let people make relations between them). I realize the focus of the original post was on URIs of the relations, but I think semantics of any terms are also important to consider, and probably apply to the relations.
Principles
A. *Any* change to a vocabulary, including to any of its terms (and their semantics) or its metadata, means the vocabulary must get a new version = new URI
B. A vocabulary contains all the terms within it, not just the terms that changed in that version
C. The nominally opaque URIs must be fairly self-consistent in their presentation, or people in the non-semantic community will misunderstand them (or rebel against using them)
D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
E. It must be possible to choose (i.e., map to, or identify) either a specific (versioned) meaning, or a 'most current' meaning, for a given concept
From these principles I've concluded
aa. A new vocabulary version results in new term versions (= new URIs) for all the terms as well (even if their semantics haven't changed, sorry -- see below for further thoughts on this)
bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be a true statement today, but false next week.) Those who choose to use the 'most current' term will get what they pay for.
dd. Any created relationship that uses a 'most current' URI should be timestamped to allow review of the historical state of the members of the triple (but note that this is strictly for understanding, since the selection of the 'most current' URI as the referenced concept explicitly permits changes to happen in that resource)
ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts
ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
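A toy Python sketch of principles D and E and conclusion dd follows. It is not the service described here: the class, the URI layout, and the per-term version counter are all assumptions made for brevity (the service versions whole vocabularies, not individual terms). It only shows how a versioned URI and a 'most current' URI can resolve differently, and how a relationship can carry a timestamp.

```python
from datetime import datetime, timezone

class TermRegistry:
    """Toy store: each term has versioned URIs plus one 'current' alias URI."""

    def __init__(self, base):
        self.base = base
        self.versions = {}  # term name -> list of definitions, oldest first

    def publish(self, name, definition):
        """Store a new definition and return its versioned URI."""
        self.versions.setdefault(name, []).append(definition)
        return f"{self.base}/{name}/v{len(self.versions[name])}"

    def current_uri(self, name):
        return f"{self.base}/{name}/current"

    def lookup(self, uri):
        """Resolve either a versioned URI or the 'most current' URI."""
        name, tag = uri[len(self.base) + 1:].split("/")
        history = self.versions[name]
        return history[-1] if tag == "current" else history[int(tag[1:]) - 1]

def relate(subject, predicate, obj, now=None):
    """Record a relationship together with the time it was asserted."""
    return (subject, predicate, obj, now or datetime.now(timezone.utc))

reg = TermRegistry("http://example.org/vocab")
v1 = reg.publish("term", "first definition")
v2 = reg.publish("term", "revised definition")
link = relate("http://example.org/data/1", "http://example.org/rel/about",
              reg.current_uri("term"))
```

A relationship made to the versioned URI keeps pointing at the first definition, while one made to the 'current' URI follows the revisions; the timestamp on the latter lets a reader work out which definition was in force when it was asserted.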
So two easy conclusions:
Yes, it is terrible for the semantics of a (nominally static) concept to change, and that concept's URI to remain the same. That breaks everything, as near as I can tell.
In the case of a subject/object term, it is clearly acceptable for the semantics of a _dynamic_ concept to change without changing the URI.
I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running. I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.
As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan. The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed. So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context.
Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics. The service described in (ee) needs this capability. (But I think sameAs doesn't apply here, as the two URIs actually reflect two different resources, which are definitionally and semantically equal, but live in a different context.) I imagine we will have to create a relationship for our own use that has this meaning for now.
If you just can't stand all those URIs that have the same semantics, and you told me I had to use the original URI that had that meaning, I would say 'ok' -- then, to meet principle (C), I would create URIs that dereference to the original URI, so that when people get confused and use the (wrong, non-existent) URI that corresponds to that term in the current version of the vocabulary, at least I could respond with useful information.
(Yes, I know this emphasizes why URIs should be opaque, and I'm afraid in this respect I am consciously doing the 'wrong' thing by making my URI algorithm all too obvious. The value added by a semantic URI is just too compelling, for the success of the project and semantic adoption in general.)
John
|

|
Re: URIs and Unique IDs
See inline comments. On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,
I'm not sure that it's as cut and dried as:
"Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID. 2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."
There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
In other words, the application wants to change its behavior when the semantics of a term are changed. In this case, the URI should not be changed if the semantics of a term are changed. If it was changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.
Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with a different semantics? In this case, do you propose that the application is notified of the new meaning or, it just changes w/o notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web.
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology. Different ontologies and different applications will require different approaches.
Probably true in general, but I need some concrete examples to be convinced that willy-nilly changing of the semantics of resources is desirable.
|

|
Re: URIs and Unique IDs
It seems to me that in cases where an application wants to use the most up-to-date version of something, you don't have to change the semantics and keep the same UID. You can instead have a subscription service which allows an application to be notified of every change to new versions. Then the application that wants the new version can have a mechanism for updating its innards to replace every occurrence of the old UID with the new one.
Applications that need to retain both can do that too. Everyone can be happy.
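A minimal Python sketch of this subscription idea, with hypothetical class names and UIDs: a feed announces each (old UID, new UID) pair, one kind of application rewrites its data to follow the new UID, and another keeps its data untouched and merely records the successor.

```python
class VersionFeed:
    """Hypothetical notification service for UID changes."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def announce(self, old_uid, new_uid):
        for callback in self.subscribers:
            callback(old_uid, new_uid)

class TrackingApp:
    """Wants the newest meaning: replaces every occurrence of the old UID."""

    def __init__(self, triples):
        self.triples = list(triples)

    def on_new_version(self, old_uid, new_uid):
        self.triples = [
            tuple(new_uid if term == old_uid else term for term in triple)
            for triple in self.triples
        ]

class ArchivingApp:
    """Retains both: leaves its data alone and remembers the successor."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.successors = {}

    def on_new_version(self, old_uid, new_uid):
        self.successors[old_uid] = new_uid

OLD = "http://example.org/v1#broader"
NEW = "http://example.org/v2#broader"
data = [("cat", OLD, "mammal")]

feed = VersionFeed()
tracker, archiver = TrackingApp(data), ArchivingApp(data)
feed.subscribe(tracker.on_new_version)
feed.subscribe(archiver.on_new_version)
feed.announce(OLD, NEW)
```

After the announcement the tracking application's triple uses the new UID, while the archiving application still holds the old triple and knows which UID superseded it.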
Michael On Sat, Nov 1, 2008 at 2:31 PM, Michael F Uschold <uschold@...> wrote:
See inline comments.
On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,
I'm not sure that its as cut and dry as:
"Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID. 2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."
There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
In other words, the application wants to change its behavior when the semantics of a term are changed. In this case, the URI should not be changed if the semantics of a term are changed. If it was changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.
Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with a different semantics? In this case, do you propose that the application is notified of the new meaning or, it just changes w/o notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web.
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology. Different ontologies and different applications will require different approaches.
Proably true in general, but I need some concrete examples to be convinced that willy nilly semantics changing of the semantics of resources is desirable.
Mike Lang On Thu, Oct 30, 2008 at 5:14 AM, Michael F Uschold <uschold@...> wrote:
I'm resending this message to the semantic web discussion group for the record.
On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on how/whether to migrate to new URIs when a new version of an ontology is published. This is largely due to the fact that there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue. Versioning information is meant to be placed on a version annotation.
However the current situation is like the wild West, and everyone will be doing different things, resulting in a mess.
Wordnet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like Wordnet. Another is to keep the exact same name space, and change the semantics of a small number of terms while keeping the same URI. A third is to keep the same URI for the unchanged terms, and mint new URIs for the terms with different semantics.
This is a problem because they have no guidelines, they are basically stumbling along in the dark.
I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.
In the current state of semantic web use, it may not matter too much what choice the SKOS team makes. This is mainly because relatively few applications will be impacted, which may be due to the fact that the applications are not driven by the ontologies.
However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that
we have no good solutions yet, do we continue to throw our hands up and
punt? Absolutely not, it is essential that a good precedent is set ASAP that is based on sound principles.
Here is how.
We should imagine a future where ontology versioning is handled properly and do things that are going to make things easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.
For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, then they will break if you change the semantics in mid-stream. Imagine an application that relied on the semantics of broader as it was originally specified with transitivity. They loaded data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible, if the same SKOS URI is used for versions with different semantics. They are different beasts, and thus MUST have different URIs.
Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URI and datasets with the new URIs. This means that the datasets will not be linked like they should, they will treat the two different URIs for the same thing as being different. If one wanted to go into OWL-Full, one can use owl:sameAs, but this is not very practical. The only reasonable solution is to have the same URI for things with the same semantics.
Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.
If either of these two guidelines is broken, then the ontology-driven applications of the future will break too.
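As a rough illustration of the two principles, here is a minimal Python sketch. The function name assign_ids, the example.org URIs, and the idea of representing a term's semantics as a plain comparable value are all assumptions made for illustration; none of this is part of SKOS or of any existing versioning tool.

```python
def assign_ids(previous, current, new_namespace):
    """Assign IDs for a new ontology version (illustrative only).

    previous: dict mapping term name -> (uid, semantics) in the old version
    current:  dict mapping term name -> semantics in the new version
    new_namespace: prefix used only when a new ID has to be minted
    """
    result = {}
    for name, semantics in current.items():
        old = previous.get(name)
        if old is not None and old[1] == semantics:
            # Principle 2: unchanged semantics keep the same ID.
            result[name] = old[0]
        else:
            # Principle 1: changed (or brand-new) semantics get a new ID.
            result[name] = new_namespace + name
    return result


# Hypothetical data: "broader" loses transitivity, "related" is untouched.
old_version = {
    "broader": ("http://example.org/v1#broader", "transitive"),
    "related": ("http://example.org/v1#related", "symmetric"),
}
new_version = {"broader": "non-transitive", "related": "symmetric"}

ids = assign_ids(old_version, new_version, "http://example.org/v2#")
print(ids["broader"])  # http://example.org/v2#broader
print(ids["related"])  # http://example.org/v1#related
```

The hard part in practice is deciding when two definitions count as the same semantics; the sketch simply assumes that comparison is available.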
These maxims hold without exception for any standards that are formally released as standards. A question arises if we need to hold to the same standards for standards like SKOS which was never formally blessed.
The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard, or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.
Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI? Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake. The URI with the wrong semantics must keep its original unique ID.
Michael Uschold
-- Revelytix, Inc. phone: 410-584-0009 (office) 443-928-3782 (cell) skype: michael.allen.lang.jr aim: MikeJrRevelytix
|

|
Re: URIs and Unique IDs
comments in line On Thu, Oct 30, 2008 at 8:08 PM, John Graybeal <graybeal@...> wrote:
We are trying to release a community semantic service (later this month!) that "does the right thing" in this arena. Excellent, glad to learn this.
So I strongly agree with the tenor of this message. Except I am trying to imagine what implementation should happen in the _present_ for our service to be an exemplar. I am sorry for the long post, but if it is mostly valid, hopefully it can advance the discussion.
We have provisionally settled on the following principles for this service (which is intended to store domain vocabularies and terms, keep track of their versions, and let people make relations between them). I realize the focus of the original post was on URIs of the relations, but I think semantics of any terms are also important to consider, and probably apply to the relations.
Principles A. *Any* change to a vocabulary, including to any of its terms (and their semantics), metadata, means the vocabulary must get a new version = new URI
Agreed. I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here, if not then what do you mean by a vocabulary?
B. A vocabulary contains all the terms within it, not just the terms that changed in that version So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.
C. The nominally opaque URIs must be fairly self-consistent in their presentation, or people in the non-semantic community will misunderstand them (or rebel against using them)
This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.
D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
I read this that a term like 'broader' in SKOS could have multiple URIs for multiple versions. If this is what you mean, then I absolutely agree with this. If this is not what you mean, then what is the difference between the 'current meaning of a term' and any other meaning of that term.
E. It must be possible to choose (i.e., map to, or identify) either a specific (versioned) meaning, or a 'most current' meaning, for a given concept
Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, with its various meanings represented as versions that are linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?
From these principles I've concluded
aa. A new vocabulary version results in new term versions (= new URIs) for all the terms as well (even if their semantics haven't changed, sorry -- see below for further thoughts on this)
I definitely disagree on this, even after reading your material below.
bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable position. I think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.
For 1, you do NOT want to change the name of the term, which was and is the right term. But you DO want to change its UID because it is a different thing.
For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader. You should be able to change the name w/o changing the UID.
cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.) Those who choose to use the 'most current' term will get what they pay for.
You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item. This is a promising idea that could probably keep everyone happy.
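To make the core-term idea a little more concrete, here is a minimal Python sketch under stated assumptions: the CoreTerm class, its method names, and the example.org identifiers are hypothetical, and versions are simply kept in the order they were added. It is not a description of any existing service.

```python
class CoreTerm:
    """A core term with its own UID and an ordered list of versions."""

    def __init__(self, core_uid):
        self.core_uid = core_uid
        self.versions = []  # (version_uid, definition), oldest first

    def add_version(self, version_uid, definition):
        self.versions.append((version_uid, definition))

    def resolve(self, version_uid=None):
        """Return a specific version, or the most current one if none is named."""
        if not self.versions:
            raise LookupError("no versions registered for " + self.core_uid)
        if version_uid is None:
            return self.versions[-1]
        for uid, definition in self.versions:
            if uid == version_uid:
                return (uid, definition)
        raise KeyError(version_uid)


# Hypothetical identifiers for two versions of 'broader'.
broader = CoreTerm("http://example.org/core#broader")
broader.add_version("http://example.org/v1#broader", "transitive")
broader.add_version("http://example.org/v2#broader", "non-transitive")

print(broader.resolve())  # ('http://example.org/v2#broader', 'non-transitive')
print(broader.resolve("http://example.org/v1#broader"))  # ('http://example.org/v1#broader', 'transitive')
```

An application that wants stability pins a version UID; one that wants the latest meaning asks through the core UID and accepts that the answer may change.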
dd. Any created relationship that uses a 'most current' URI, should be timestamped to allow review of the historical state of the members of the triple (but note that this is strictly for understanding, since the selection of the 'most current' URI as the referenced concept explicitly permits changes to happen in that resource)
Timestamping is useful, but could be expensive.
ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically in the case where the most up-to-date version is wanted. Otherwise people can look into new versions on a case by case basis.
So two easy conclusions:
Yes, it is terrible for the semantics of a (nominally static) concept to change, and that concept's URI to remain the same. That breaks everything, as near as I can tell.
Agreed.
In the case of a subject/object term, it is clearly acceptable for the semantics of a _dynamic_ concept to change without changing the URI.
Well the core URI/UID can stay the same, but each version needs to have its own UID so applications that want to use old versions don't break.
There may be some clear cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.
I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.
You have a strong intuition that I'm not able to grasp. Can you articulate why with an example?
As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan. The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.
This is true, and the reason why terms/words in wordnet belong to multiple synsets. Each synset has a unique meaning, and in the owl dataset, each synset has its own URI. So I don't find your argument convincing. Multiple context shows different uses of a term, so each use should get a different UID, not the same one.
So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context.
Maybe the wordnet example is a red herring. In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?
Here is one example where it is clearly a bad thing. The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality. It also loads and makes use of data using the ontology.
T1: application loads ontology using original terms.
T2: application loads data expressed using the original terms.
T3: all new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics.
T4: A new dataset is created which uses the new URIs.
T5: The application loads the new data.
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well.
This is clearly a bad thing. Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.
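The T1 to T7 scenario can be mimicked with a toy triple store. This is only a sketch: the URIs, the ex: resource names, and the query helper are made up for illustration, and a real application would use an RDF store rather than a Python set.

```python
# Triples are (subject, predicate, object); all identifiers are hypothetical.
OLD = "http://example.org/v1#broader"
NEW = "http://example.org/v2#broader"  # same semantics, re-minted at T3

store = set()
store.add(("ex:cat", OLD, "ex:mammal"))  # T2: data using the original terms
store.add(("ex:dog", NEW, "ex:mammal"))  # T4-T5: new data using the new URIs


def query(triples, predicates):
    """Return the subjects of all triples whose predicate is in `predicates`."""
    return sorted(s for (s, p, o) in triples if p in predicates)


# T6-T7: the application still filters on the old URI and misses the new data.
missed = query(store, {OLD})
# Only if something tells the application the two URIs mean the same thing
# can it widen the filter and see both datasets.
linked = query(store, {OLD, NEW})

print(missed)  # ['ex:cat']
print(linked)  # ['ex:cat', 'ex:dog']
```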
Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics.
This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?
The service described in (ee) needs this capability. But I think sameAs doesn't apply here, as the two URIs actually reflect two different resources, which are definitionally and semantically equal, but live in a different context.
I still can't see any advantages for creating multiple copies of exactly the same thing. Have I missed something?
I imagine we will have to create a relationship for our own use that has this meaning for now. We probably will need some new infrastructural primitives, to relate versions to each other.
If you just can't stand all those URIs that have the same semantics, and you told me I had to use the original URI that had that meaning, I would say 'ok' -- then, to meet principle (C), I would create URIs that dereference to the original URI, so that when people get confused and use the (wrong, non-existent) URI that corresponds to that term in the current version of the vocabulary, at least I could respond with useful information.
This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible.
See another thread I started on similar topic by googling ["proliferation of URIs" uschold]
(Yes, I know this emphasizes why URIs should be opaque, and I'm afraid in this respect I am consciously doing the 'wrong' thing by making my URI algorithm all too obvious. The value added by a semantic URI is just too compelling, for the success of the project and semantic adoption in general.)
John
|

|
Re: URIs and Unique IDs
Michael,
I will try to be clearer -- your confusion was my fault, sorry. Appreciate very much your comments (and charitable interpretations, n.b. 'strong intuition' :->!). On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:
Agreed. I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here, if not then what do you mean by a vocabulary?
yes, I used 'vocabulary' to reflect what our customers have, but an OWL ontology is what we will generate. B. A vocabulary contains all the terms within it, not just the terms that changed in that version
Here is my folksy perspective behind the model (more justifications near the end): If I say to a user "Here is a vocabulary dated X", the user will assume that all the terms come with that vocabulary, and the terms of that vocabulary are also dated X. So can I build a working semantic approach that accepts this assumption? So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.
No, sorry, I was sloppy and used 'terms' and 'URIs' interchangeably. Here is the easy part: You can assume that if anything in the specification of a term changes while its string of characters remain the same, I will insist on a new URI for that term (the version string will do nicely to discriminate). And if the string of characters for a term changes, that will be a new URI too.
I am using 'term' to mean 'a string of characters that likely, but not necessarily, means something to a human'. So codes and opaque terms are OK. For most ontologies we'll create, terms will be words and word phrases.
Anticipating your later comments, we concluded (you won't like this at all): 1. a URI is a suitable UID 2. a term can be part of a suitable URI The first is argued elsewhere by others.
Re the second: Since what I really wanted to do was give people a way to say "here's what this string of characters means", it doesn't bother me that the same string of characters may mean something else later -- I need the UID not for the _concept_, but for the unique string of characters. That will always be the string of characters I want that UID to refer to. So making the UID a URL that embeds the string of characters was acceptable.
I think I understand the concerns about non-opaque and non-persistent URLs, and believe that those costs are relatively low compared to the resulting early adopter benefits of this approach. [1]
This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.
I am unconvinced de-conflation can happen, at least in our lifetimes, which is why I made some of those horrible linkages above. D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
I read this that a term like 'broader' in SKOS could have multiple URIs for multiple versions. If this is what you mean, then I absolutely agree with this.
Yes, this is what I mean, but keep in mind my previous conflation. Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, and that its various meanings are related versions are linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?
Oh, I sure hope so. (Well, new relationships may be needed. Not an expert here.) We are doing it a shade outside of the 'strict infrastructure', if there is such a thing -- our server will try to be smart about the relationships between vocabulary versions (well, it has to be, to make sure the version relationships are maintained). bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable position. I think there are at least two cases: 1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted. 2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.
For 1, you do NOT want to change the name of the term, which was and is the right term. But you DO want to change its UID because it is a different thing.
For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader.
yes to all the above, well put. You should be able to change the name w/o changing the UID.
Well, OK, maybe. Not for my own vocabularies, because those are trying to define strings, not concepts. As you can see, I am hung up on which thing someone has in mind when they say the name -- is it the concept behind the name, or the name itself? I find it a lot easier to consider the name the resource of interest, and if someday my 'inflammable' is redefined to mean flammable, then my ontology will be exactly as wrong as all the books that used the 'old' definition of the word. (At least until I redefine the word. Sure hope everyone is using timestamps. :->) cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.) Those who choose to use the 'most current' term will get what they pay for.
You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item. This is a promising idea that could probably keep everyone happy.
ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically in the case where the most up-to-date version is wanted. Otherwise people can look into new versions on a case by case basis.
Yes to all the above, and to the 'timestamps may be expensive' also. I am worried about expense, but suspect I won't be able to tell for a while how resource-intensive this will be, and whether optimization will take care of it, and whether I still will be paid to "keep this problem solved." But I plan as if I will...
There may be some clear cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.
Maybe. If I declare the static concept is forever unvarying by definition, I don't think it would be strategic for an application to assume otherwise. I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage. You have a strong intuition that I'm not able to grasp. Can you articulate why with an example?
OK, my example uses 'sea surface temperature' as a subject, and 'sameAs' as a predicate. If, over time, the concept associated with 'sea surface temperature' evolves from "any measurement of any body of sea water within a meter or so of the ocean's surface" to "an informal reference to the concept of temperature near the ocean surface (deprecated as a reference to a particular measurement)", the tools I have written may produce some less-than-ideal inferences if they assume the new definition applies to old data, or vice-versa. Even if the new definition in 100 years is "measurement of the temperature of the foam we keep on top of the ocean to keep it cool", some inferences could be faulty, but the engine won't break down.
But if I've originally used 'sameAs' in mappings to mean that two concepts are analogous in certain defined ways (maybe a faulty original practice, but go with it), and then the term is redefined by general consensus to mean "refers to the exact same resource", I have some really broken results, because a key piece right in the middle of my infrastructure has changed. If you try to change important parts in a car while it's moving, bad things can happen, even if the new part is every bit as good as the old part. If we try to change the meaning of core terms used in semantic inferencing, then all the tools and things are likely to behave oddly during the change, if not afterwards as well. As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan. The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.
This is true, and the reason why terms/words in wordnet belong to multiple synsets. Each synset has a unique meaning, and in the owl dataset, each synset has its own URI. So I don't find your argument convincing. Multiple context shows different uses of a term, so each use should get a different UID, not the same one.
This is a different context. Example below. So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context.
Maybe the wordnet example is a red herring. In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?
Here's a simple example, before giving you a detailed domain-specific example: Let's say I review a vocabulary and change 80% of the definitions. But the remaining definitions are deemed good and remain unchanged. By virtue of being part of a heavily reviewed vocabulary, these remaining original terms have gained credibility -- they are more reviewed and more trusted than they were before that version was created.
For a domain example, let's go back to sea surface temperature. 5 years ago, it meant something like "any measurement of any body of sea water within a meter or so of the ocean's surface". More recently, data managers realized that wasn't specific enough. So 5 new terms were created to precisely delineate the different kinds of sea surface temperature.
Now, if I get a set of data that uses some of these new terms to label variables, and also has an item labelled 'sea surface temperature', I can infer that the use of the broader variable meant that no more specific description could be provided. Whereas in data from 5 years ago, I might replace the general term in many cases, by looking at other metadata to learn the more specific term. With the existence of the new terms, the old term has new connotations.
Here is one example where it is clearly a bad thing. The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality. It also loads and makes use of data using the ontology. T1: application loads ontology using original terms. T2: application loads data expressed using the original terms T3: all new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics.
Well, this is bad but not unmanageable. Presumably a query of the 'before' and 'after' resources for those two concepts would reveal whether or not there are differences. Or, presumably you can query the ontologies to get that info, even if you can't query the terms themselves. (Hmm, in today's semantic web a lot of times you don't have the original ontology versions either, do you? But that would be another thing that breaks the system to some degree, you don't have any ability to validate previous inferences or see what it was like when the relationships were created, so you can't validate them independently. Sigh....)
But in any case, I accept the challenge here and say again "it only works if the new URIs can say whether they are the same semantics as a previous version." Otherwise, I agree it's a bad thing.
T4: A new dataset is created which uses the new URIs.
T5: The application loads the new data.
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well. This is clearly a bad thing. Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.
One mitigation of disadvantages is obtained if most of the users map to the 'most recent version' (core concept) of the term, not specific versions. I suspect this will be likely. Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics.
This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?
An excellent point. (I'm busted!) Apparently I differentiate between explicit meanings, which one finds in the term's resource description, and implicit meanings, which one finds in larger context. The version relationship primitives have to be understood to refer to the explicit meanings only. When the definition changes explicitly, that's a URI change that no longer can be considered exactly the same concept.
I still can't see any advantages for creating multiple copies of exactly the same thing. Have I missed something?
The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it. Conceptually/abstractly I suspect this may be the right way to think of a vocabulary.
But more practically, this gives me a trivial way to generate URIs for those terms, a trivial way to capture the contents of each new version of the ontology (otherwise I have to analyze every term to decide if it is different, right?), a trivial way to explain to the user what the URI for each term will look like, and a way to tell from the term URI which vocabulary it's a part of (not that I'd ever do that to an opaque URI...).
But of course, I realize I have to go do some of these things later, in any reasonable version of the system... I just don't have to do them *instantly*...
I imagine we will have to create a relationship for our own use that has this meaning for now.
We probably will need some new infrastructural primitives, to relate versions to each other.
Just so.
This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible. See another thread I started on similar topic by googling ["proliferation of URIs" uschold]
Excellent, I looked at the summary post and I see things with your level of concern, perhaps more than the responders. (Though I liked Tim's quote: "So multiple URIs for the same thing is life, a constant tradeoff, but life is, on balance good.") I would be a relatively small scale offender for a while, but a bad example.
I will leave it there, too long a post for sure.
John
|

|
Re: URIs and Unique IDs
A short reply to the main point.
Uschold said: I still can't see any advantages for creating multiple copies of exactly the same thing. Have I missed something?
Graybeal said: The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it. Conceptually/abstractly I suspect this may be the right way to think of a vocabulary.
If you de-conflate URIs and UIDs we can have our cake and eat it too. The new ontology is a unit with a UID that is different than the original one. It is a bundle consisting of its component terms and definitions, and there is an ontology-has-component link that points to the UIDs of the most recent version. Done, perfect. Humans don't create or read UIDs, machines do. Tools and names can be used to have the user see whatever you want them to see. This scheme gives the advantage you want w/o minting new URIs for the same thing.
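The bundle idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Term` and `OntologyVersion` classes, the `publish` function, and the use of random UUIDs as UIDs are all hypothetical and do not describe any existing tool mentioned in this thread.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


def new_uid() -> str:
    # Opaque identifier meant for machines only (assumption: a random UUID).
    return uuid.uuid4().hex


@dataclass(frozen=True)
class Term:
    uid: str         # opaque UID, never shown to humans
    label: str       # human-readable name, kept separate from the UID
    definition: str


@dataclass
class OntologyVersion:
    uid: str
    # "ontology-has-component" links: label -> the Term in this version
    components: Dict[str, Term] = field(default_factory=dict)


def publish(previous: Optional[OntologyVersion],
            definitions: Dict[str, str]) -> OntologyVersion:
    """Create a new ontology version with its own UID.

    A term whose definition is unchanged keeps its existing UID;
    a term whose definition changed (or is new) gets a fresh UID.
    """
    components: Dict[str, Term] = {}
    for label, definition in definitions.items():
        old = previous.components.get(label) if previous else None
        if old is not None and old.definition == definition:
            components[label] = old
        else:
            components[label] = Term(new_uid(), label, definition)
    return OntologyVersion(new_uid(), components)


v1 = publish(None, {"broader": "original meaning", "narrower": "inverse of broader"})
v2 = publish(v1, {"broader": "revised meaning", "narrower": "inverse of broader"})
```

In this sketch `v2` is a distinct unit with its own UID, yet the unchanged term `narrower` is the same object in both versions, so no synonym identifier is minted for it.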
Methinks that the conflation of URIs and UIDs makes it hard or impossible to get this advantage unless you mint new synonym URIs. Hence the need to de-conflate. I'm convinced that something like this is the right thing to do, in principle. Finding out how it can be done in practice will be a lot of work.
BTW, this discussion has inspired me to write a paper on the topic. Probably too late to submit to the WWW conference, but I will make whatever I have available in some manner when it is ready. Thank you for helping me clarify my ideas on this.
Michael
On Mon, Nov 3, 2008 at 5:24 AM, John Graybeal <graybeal@...> wrote:
Michael,
I will try to be clearer -- your confusion was my fault, sorry. Appreciate very much your comments (and charitable interpretations, n.b. 'strong intuition' :->!).
On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:
Agreed. I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here, if not then what do you mean by a vocabulary?
yes, I used 'vocabulary' to reflect what our customers have, but an OWL ontology is what we will generate.
B. A vocabulary contains all the terms within it, not just the terms that changed in that version
Here is my folksy perspective behind the model (more justifications near the end): If I say to a user "Here is a vocabulary dated X", the user will assume that all the terms come with that vocabulary, and the terms of that vocabulary are also dated X. So can I build a working semantic approach that accepts this assumption?
So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.
No, sorry, I was sloppy and used 'terms' and 'URIs' interchangeably. Here is the easy part: You can assume that if anything in the specification of a term changes while its string of characters remain the same, I will insist on a new URI for that term (the version string will do nicely to discriminate). And if the string of characters for a term changes, that will be a new URI too.
I am using 'term' to mean 'a string of characters that likely, but not necessarily, means something to a human'. So codes and opaque terms are OK. For most ontologies we'll create, terms will be words and word phrases.
Anticipating your later comments, we concluded (you won't like this at all): 1. a URI is a suitable UID 2. a term can be part of a suitable URI The first is argued elsewhere by others.
Re the second: Since what I really wanted to do was give people a way to say "here's what this string of characters means", it doesn't bother me that the same string of characters may mean something else later -- I need the UID not for the _concept_, but for the unique string of characters. That will always be the string of characters I want that UID to refer to. So making the UID a URL that embeds the string of characters was acceptable.
I think I understand the concerns about non-opaque and non-persistent URLs, and believe that those costs are relatively low compared to the resulting early adopter benefits of this approach. [1]
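As a rough illustration of a URL that embeds both the version string and the term's string of characters, the following Python sketch mints one URI per term per vocabulary version. The base address `http://example.org/vocab`, the vocabulary name `demo`, the path layout, and the function name `term_uri` are assumptions made for the example; the thread does not specify the actual URI pattern.

```python
from urllib.parse import quote

BASE = "http://example.org/vocab"  # hypothetical base address


def term_uri(vocabulary: str, version: str, term: str) -> str:
    """Build a non-opaque URI of the form BASE/vocabulary/version/term.

    Each path segment is percent-encoded so that terms containing
    spaces or slashes still yield a valid, unambiguous URI.
    """
    segments = [quote(part, safe="") for part in (vocabulary, version, term)]
    return "/".join([BASE] + segments)


old = term_uri("demo", "20071101", "sea surface temperature")
new = term_uri("demo", "20081103", "sea surface temperature")
print(old)  # http://example.org/vocab/demo/20071101/sea%20surface%20temperature
print(new)  # http://example.org/vocab/demo/20081103/sea%20surface%20temperature
```

The same string of characters yields a different URI in each version, which is the behaviour described above; the cost is that the URI is neither opaque nor shared across versions.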
This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.
I am unconvinced de-conflation can happen, at least in our lifetimes, which is why I made some of those horrible linkages above.
D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
I read this that a term like 'broader' in SKOS could have multiple URIs for multiple versions. If this is what you mean, then I absolutely agree with this.
Yes, this is what I mean, but keep in mind my previous conflation.
Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, and that its various meanings (related versions) are linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?
Oh, I sure hope so. (Well, new relationships may be needed. Not an expert here.) We are doing it a shade outside of the 'strict infrastructure', if there is such a thing -- our server will try to be smart about the relationships between vocabulary versions (well, it has to be, to make sure the version relationships are maintained).
bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable position. I think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative that works in some cases, and some may also wish to use the older versions.
For 1, you do NOT want to change the name of the term; it was and is the right term. But you DO want to change its UID because it is a different thing.
For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader.
yes to all the above, well put. You should be able to change the name w/o changing the UID.
Well, OK, maybe. Not for my own vocabularies, because those are trying to define strings, not concepts. As you can see, I am hung up on which thing someone has in mind when they say the name -- is it the concept behind the name, or the name itself? I find it a lot easier to consider the name the resource of interest, and if someday my 'inflammable' is redefined to mean flammable, then my ontology will be exactly as wrong as all the books that used the 'old' definition of the word. (At least until I redefine the word. Sure hope everyone is using timestamps. :->)
cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.) Those who choose to use the 'most current' term will get what they pay for.
You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item. This is a promising idea that could probably keep everyone happy.
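A minimal sketch of such a lookup, in Python, might look like the following. The `TermRegistry` class, its method names, and the `example.org` URIs are hypothetical; this only illustrates the "core term resolves to the most recent version" idea, not how any actual server implements it.

```python
from typing import Dict, List, Optional, Tuple


class TermRegistry:
    """Maps a core-term URI to its versioned URIs, in publication order."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[str, str]]] = {}

    def publish(self, core_uri: str, version: str, versioned_uri: str) -> None:
        # Versions are assumed to be registered in the order they are published.
        self._history.setdefault(core_uri, []).append((version, versioned_uri))

    def resolve(self, core_uri: str, version: Optional[str] = None) -> str:
        """Return the most recent versioned URI, or a specific past one."""
        history = self._history[core_uri]
        if version is None:
            return history[-1][1]
        for known_version, uri in history:
            if known_version == version:
                return uri
        raise KeyError(f"{core_uri} has no version {version!r}")


registry = TermRegistry()
core = "http://example.org/skos/broader"
registry.publish(core, "2004", "http://example.org/skos/2004/broader")
registry.publish(core, "2008", "http://example.org/skos/2008/broader")

latest = registry.resolve(core)             # most current meaning
pinned = registry.resolve(core, "2004")     # a specific past meaning
```

Anything that stores the core URI follows the term as it evolves, while anything that stores a versioned URI stays pinned to one meaning.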
ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts
When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically in the case where the most up-to-date version is wanted; otherwise people can look into new versions on a case-by-case basis.
Yes to all the above, and to the 'timestamps may be expensive' also. I am worried about expense, but suspect I won't be able to tell for a while how resource-intensive this will be, and whether optimization will take care of it, and whether I still will be paid to "keep this problem solved." But I plan as if I will...
There may be some clear cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.
Maybe. If I declare the static concept is forever unvarying by definition, I don't think it would be strategic for an application to assume otherwise.
I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.
You have a strong intuition that I'm not able to grasp. Can you articulate why with an example?
OK, my example uses 'sea surface temperature' as a subject, and 'sameAs' as a predicate. If, over time, the concept associated with 'sea surface temperature' evolves from "any measurement of any body of sea water within a meter or so of the ocean's surface" to "an informal reference to the concept of temperature near the ocean surface (deprecated as a reference to a particular measurement)", the tools I have written may produce some less-than-ideal inferences if they assume the new definition applies to old data, or vice-versa. Even if the new definition in 100 years is "measurement of the temperature of the foam we keep on top of the ocean to keep it cool", some inferences could be faulty, but the engine won't break down.
But if I've originally used 'sameAs' in mappings to mean that two concepts are analogous in certain defined ways (maybe a faulty original practice, but go with it), and then the term is redefined by general consensus to mean "refers to the exact same resource", I have some really broken results, because a key piece right in the middle of my infrastructure has changed. If you try to change important parts in a car while it's moving, bad things can happen, even if the new part is every bit as good as the old part. If we try to change the meaning of core terms used in semantic inferencing, then all the tools and things are likely to behave oddly during the change, if not afterwards as well.
As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan. The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.
This is true, and the reason why terms/words in wordnet belong to multiple synsets. Each synset has a unique meaning, and in the owl dataset, each synset has its own URI. So I don't find your argument convincing. Multiple contexts show different uses of a term, so each use should get a different UID, not the same one.
This is a different context. Example below. So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context.
Maybe the wordnet example is a red herring. In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?
Here's a simple example, before giving you a detailed domain-specific example: Let's say I review a vocabulary and change 80% of the definitions. But the remaining definitions are deemed good and remain unchanged. By virtue of being part of a heavily reviewed vocabulary, these remaining original terms have gained credibility -- they are more reviewed and more trusted than they were before that version was created.
For a domain example, let's go back to sea surface temperature. 5 years ago, it meant something like "any measurement of any body of sea water within a meter or so of the ocean's surface". More recently, data managers realized that wasn't specific enough. So 5 new terms were created to precisely delineate the different kinds of sea surface temperature.
Now, if I get a set of data that uses some of these new terms to label variables, and also has an item labelled 'sea surface temperature', I can infer that the use of the broader variable meant that no more specific description could be provided. Whereas in data from 5 years ago, I might replace the general term in many cases, by looking at other metadata to learn the more specific term. With the existence of the new terms, the old term has new connotations.
Here is one example where it is clearly a bad thing. The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality. It also loads and makes use of data using the ontology.
T1: application loads ontology using original terms.
T2: application loads data expressed using the original terms.
T3: all new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics.
Well, this is bad but not unmanageable. Presumably a query of the 'before' and 'after' resources for those two concepts would reveal whether or not there are differences. Or, presumably you can query the ontologies to get that info, even if you can't query the terms themselves. (Hmm, in today's semantic web a lot of times you don't have the original ontology versions either, do you? But that would be another thing that breaks the system to some degree, you don't have any ability to validate previous inferences or see what it was like when the relationships were created, so you can't validate them independently. Sigh....)
But in any case, I accept the challenge here and say again "it only works if the new URIs can say whether they are the same semantics as a previous version." Otherwise, I agree it's a bad thing.
T4: A new dataset is created which uses the new URIs.
T5: The application loads the new data.
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well. This is clearly a bad thing. Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.
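The T1-T7 scenario, and the condition that "the new URIs can say whether they are the same semantics as a previous version", can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the `example.org` URIs, the record layout, and the `same_semantics_as` mapping, which stands in for whatever version-relationship primitive would actually be published.

```python
from typing import Dict, Iterable, List, Set, Tuple

OLD = "http://example.org/vocab/v1/sea_surface_temperature"
NEW = "http://example.org/vocab/v2/sea_surface_temperature"

# T2 and T5: records are (record id, URI of the term labelling the variable).
old_data: List[Tuple[str, str]] = [("obs-1", OLD)]
new_data: List[Tuple[str, str]] = [("obs-2", NEW)]
loaded = old_data + new_data


def query(records: Iterable[Tuple[str, str]], wanted: Set[str]) -> List[str]:
    """Return ids of records whose term URI is one of the wanted URIs."""
    return [record_id for record_id, uri in records if uri in wanted]


# T6/T7: filtering by the old URI alone silently misses the new dataset.
naive = query(loaded, {OLD})

# Hypothetical link published with the new version: newer URI -> older URI
# that carries exactly the same explicit definition.
same_semantics_as: Dict[str, str] = {NEW: OLD}


def expand(uris: Set[str], links: Dict[str, str]) -> Set[str]:
    """Add every URI linked (in either direction) to one already in the set."""
    expanded = set(uris)
    changed = True
    while changed:
        changed = False
        for newer, older in links.items():
            if (newer in expanded) != (older in expanded):
                expanded.update((newer, older))
                changed = True
    return expanded


complete = query(loaded, expand({OLD}, same_semantics_as))
print(naive)     # ['obs-1']
print(complete)  # ['obs-1', 'obs-2']
```

Without the link, the application has no way to know the two URIs are interchangeable; with it, the query recovers both datasets, at the price of maintaining and consulting the mapping on every lookup.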
One mitigation of disadvantages is obtained if most of the users map to the 'most recent version' (core concept) of the term, not specific versions. I suspect this will be likely.
Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics.
This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?
An excellent point. (I'm busted!) Apparently I differentiate between explicit meanings, which one finds in the term's resource description, and implicit meanings, which one finds in larger context. The version relationship primitives have to be understood to refer to the explicit meanings only. When the definition changes explicitly, that's a URI change that no longer can be considered exactly the same concept.
I still can't see any advantages for creating multiple copies of exactly the same thing. Have I missed something?
The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it. Conceptually/abstractly I suspect this may be the right way to think of a vocabulary.
But more practically, this gives me a trivial way to generate URIs for those terms, a trivial way to capture the contents of each new version of the ontology (otherwise I have to analyze every term to decide if it is different, right?), a trivial way to explain to the user what the URI for each term will look like, and a way to tell from the term URI which vocabulary it's a part of (not that I'd ever do that to an opaque URI...).
But of course, I realize I have to go do some of these things later, in any reasonable version of the system... I just don't have to do them *instantly*...
I imagine we will have to create a relationship for our own use that has this meaning for now.
We probably will need some new infrastructural primitives, to relate versions to each other.
Just so.
This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible.
See another thread I started on similar topic by googling ["proliferation of URIs" uschold]
Excellent, I looked at the summary post and I see things with your level of concern, perhaps more than the responders. (Though I liked Tim's quote: "So multiple URIs for the same thing is life, a constant tradeoff, but life is, on balance good.") I would be a relatively small scale offender for a while, but a bad example.
I will leave it there, too long a post for sure.
John
|

|
RE: URIs and Unique IDs
I agree with Michael Lang, who says that the community
(or architect) should decide how words are used in an ontology, and should agree
on changes. Common sense suggests to me that the reason for changing
the semantics of a word is to correct an error, in which case the same word
should be used, and existing systems will be freed from the error. I cannot
think of a good reason for changing the meaning of a word in the context of an
ontology otherwise.
But I think it's important to recognise that in most
real systems there are different levels of semantics.
- Firstly there are some "keywords" in OWL, whose
semantics is defined by W3C and implemented by reasoning engine builders.
- Secondly there will be some words that are not
defined as part of "OWL" but which are recognised as "keywords" by particular
software systems ("if x is a member of VIRAL_INFECTIONS do
y"). In these cases the software and the ontology are strongly bound
together.
- Thirdly there will be aspects of the ontology
which are "data driven", in that they are handled in a general way by the
software ("find the broader terms of a term").
- Fourthly, there may also be a distinction between "A box" and "T box" words.
Moreover, a word may be significant to the software in
one system, but handled in a "data driven" way by a different system.
It seems to me it is not at all clear cut to decide the
best way to modify these ontologies.
Michael,
I will try to be clearer -- your confusion was my fault, sorry.
Appreciate very much your comments (and charitable interpretations, n.b.
'strong intuition' :->!).
On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:
Agreed. I assume you mean the vocabulary is the ontology?
Are we assuming OWL ontologies here, if not then what do you mean by a
vocabulary?
yes, I used 'vocabulary' to reflect what our customers have, but
an OWL ontology is what we will generate.
B. A vocabulary contains all the terms within it,
not just the terms that changed in that version
Here is my folksy perspective behind the model (more justifications near
the end): If I say to a user "Here is a vocabulary dated X", the user will
assume that all the terms come with that vocabulary, and the terms of that
vocabulary are also dated X. So can I build a working semantic approach that
accepts this assumption?
So in the SKOS example, the new SKOS vocabulary/ontology would
contain the terms that do not change URIs as well as terms with new versions
with new URIs.
No, sorry, I was sloppy and used 'terms' and 'URIs'
interchangeably. Here is the easy part: You can assume that if anything in the
specification of a term changes while its string of characters remain the same,
I will insist on a new URI for that term (the version string will do nicely to
discriminate). And if the string of characters for a term changes, that will be
a new URI too.
I am using 'term' to mean 'a string of characters that likely, but
not necessarily, means something to a human'. So codes and opaque terms
are OK. For most ontologies we'll create, terms will be words and word
phrases.
Anticipating your later comments, we concluded (you won't like this at
all):
1. a URI is a suitable UID
2. a term can be part of a suitable URI
The first is argued elsewhere by others.
Re the second: Since what I really wanted to do was give people a way
to say "here's what this string of characters means", it doesn't bother me that
the same string of characters may mean something else later -- I need the UID
not for the _concept_, but for the unique string of characters. That will always
be the string of characters I want that UID to refer to. So making the UID a URL
that embeds the string of characters was acceptable.
I think I understand the concerns about non-opaque and non-persistent URLs,
and believe that those costs are relatively low compared to the resulting early
adopter benefits of this approach. [1]
This issue arises because of the conflation of URIs, UIDs and
human-readable IDs. Until these are de-conflated, probably this principle is
the right one. It will be unnecessary after
de-conflation.
I am unconvinced de-conflation can happen, at least in
our lifetimes, which is why I made some of those horrible linkages above.
D. It must be possible to 'look up' the current meaning of a
term, as well as specifically request any past meanings by their
URI
I read this that a term like 'broader' in SKOS could have
multiple URIs for multiple versions. If this is what you mean, then I
absolutely agree with this.
Yes, this is what I mean, but keep in mind my previous
conflation.
Agreed. You seem to be proposing the idea
of some kind of object (perhaps with a URI) that corresponds to the core
term, and that its various meanings (related versions) are linked to the
core term. This may be a workable idea. Can this be done with the
current semantic web infrastructure?
Oh, I sure hope so. (Well, new relationships may be
needed. Not an expert here.) We are doing it a shade outside of the
'strict infrastructure', if there is such a thing -- our server will try to be
smart about the relationships between vocabulary versions (well, it has to be,
to make sure the version relationships are maintained).
bb. Any significant definitional or semantic
change to a term should really create a new term, not just evolve the word
we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable
position. I think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old
meaning was the wrong one and it is not wanted.
2. there is a new alternative that works in some cases, and some may also
wish to use the older versions.
For 1, you do NOT want to change the name of the term; it
was and is the right term. But you DO want to change its UID because it
is a different thing.
For 2, you probably want to introduce a new term
with a new name and a new UID. You could have the name of the transitive
version of broader be called broaderT and the non-transitive one be called
broader.
yes to all the above, well put.
You should be able to change the name w/o changing the UID.
Well, OK, maybe. Not for my own vocabularies,
because those are trying to define strings, not concepts. As you can see, I am
hung up on which thing someone has in mind when they say the name -- is it the
concept behind the name, or the name itself? I find it a lot easier to consider
the name the resource of interest, and if someday my 'inflammable' is redefined
to mean flammable, then my ontology will be exactly as wrong as all the books
that used the 'old' definition of the word. (At least until I redefine the
word. Sure hope everyone is using timestamps. :->)
cc. Created relationships to 'most current' URIs
persist even as the semantics of that resource may change; this potentially
introduces a time quality to inferences done with these resources (e.g.,
"Today's New York Times has an article on election polls" may be true
statement today, but false next week.) Those who choose to use the
'most current' term will get what they pay for.
You might be able to have programmatic or
infrastructural capability which can return the 'most current' version of a
given core term. There might be a URI/UID for the core term, and that is what
would be accessed. There, a directive would be given that says please return
the most recent version of that item. This is a promising idea that could
probably keep everyone happy.
ee. Both the provided service, and ontology
engines in general, must be able to relate terms to their semantically
identical historical counterparts
When every version of every term has its own UID, then this
becomes feasible, though it may also be an expensive overhead.
ff. The service should be able to quickly
identify/present to its users each change in semantic meaning for a
term.
Yes, and an application should also be able to subscribe to the
core UID for a concept to be notified of any changes, so it can keep up to date
automatically in the case where the most up-to-date version is wanted;
otherwise people can look into new versions on a case-by-case
basis.
Yes to all the above, and to the 'timestamps may be
expensive' also. I am worried about expense, but suspect I won't be able
to tell for a while how resource-intensive this will be, and whether
optimization will take care of it, and whether I still will be paid to "keep
this problem solved." But I plan as if I will...
There may be some clear cut cases where
you can tell which things are static vs. dynamic. However IMHO, it is likely
that a lot (perhaps most) cases will be dependent on the needs of the
application, and the same concept may be dynamic in some applications and
static in others.
Maybe. If I declare the static concept is forever unvarying by definition, I
don't think it would be strategic for an application to assume otherwise.
I am less sanguine about this for predicates --
it seems like you're allowing replacing the engine while the car is
running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for
predicates, but it seems really inappropriate at this
stage.
You have a strong intuition that I'm not able to grasp.
Can you articulate why with an example?
OK, my example uses 'sea surface temperature' as a
subject, and 'sameAs' as a predicate. If, over time, the concept associated with
'sea surface temperature' evolves from "any measurement of any body of sea water
within a meter or so of the ocean's surface" to "an informal reference to the
concept of temperature near the ocean surface (deprecated as a reference to a
particular measurement)", the tools I have written may produce some
less-than-ideal inferences if they assume the new definition applies to old
data, or vice-versa. Even if the new definition in 100 years is
"measurement of the temperature of the foam we keep on top of the ocean to
keep it cool", some inferences could be faulty, but the engine won't break
down.
But if I've originally used 'sameAs' in mappings to mean that two concepts
are analogous in certain defined ways (maybe a faulty original practice, but go
with it), and then the term is redefined by general consensus to mean "refers to
the exact same resource", I have some really broken results, because a key piece
right in the middle of my infrastructure has changed. If you try to change
important parts in a car while it's moving, bad things can happen, even if the
new part is every bit as good as the old part. If we try to change the meaning
of core terms used in semantic inferencing, then all the tools and things are
likely to behave oddly during the change, if not afterwards as well.
As to the multiple URIs for a single concept problem that was
introduced in (aa) above, I have both a justification and a backup plan.
The justification is that the meaning of terms and their definitions
is inferred in a context, and changes to the context (the rest of the
vocabulary) can affect the implicit meaning, or usage, of a term that
nominally wasn't changed.
This is true, and the reason why terms/words in
wordnet belong to multiple synsets. Each synset has a unique meaning, and in
the owl dataset, each synset has its own URI. So I don't find your argument
convincing. Multiple contexts show different uses of a term, so each use
should get a different UID, not the same one.
This is a different context. Example below.
So even if I haven't changed the explicit definition of a term
in a new vocabulary release, it is meaningful to consider this term a new
resource, and give it a new URI, to reflect its new
context.
Maybe the wordnet example is a red herring. In any event,
can you provide a clear example of how an application would find it helpful
to have whole new sets of URIs minted for identical
things?
Here's a simple example, before giving you a detailed domain-specific
example: Let's say I review a vocabulary and change 80% of the definitions. But
the remaining definitions are deemed good and remain unchanged. By virtue
of being part of a heavily reviewed vocabulary, these remaining original terms
have gained credibility -- they are more reviewed and more trusted than they
were before that version was created.
For a domain example, let's go back to sea surface
temperature. 5 years ago, it meant something like "any measurement of
any body of sea water within a meter or so of the ocean's surface". More
recently, data managers realized that wasn't specific enough. So 5 new
terms were created to precisely delineate the different kinds of sea surface
temperature.
Now, if I get a set of data that uses some of these new terms to label
variables, and also has an item labelled 'sea surface temperature', I can infer
that the use of the broader variable meant that no more specific description
could be provided. Whereas in data from 5 years ago, I might replace the
general term in many cases, by looking at other metadata to learn the more
specific term. With the existence of the new terms, the old term has new
connotations.
Here is one example where it is clearly a bad thing. The application is
ontology-driven at a deep level. It makes use of the resources in the
coding/creation of application functionality. It also loads and makes use of
data using the ontology.
T1: The application loads the ontology using the original terms.
T2: The application loads data expressed using the original terms.
T3: All new URIs are minted when only a few have changed semantics, and there
is no indication of which ones have new semantics and which have the same
semantics.
Well, this is bad but not unmanageable. Presumably a
query of the 'before' and 'after' resources for those two concepts would reveal
whether or not there are differences. Or, presumably you can query the
ontologies to get that info, even if you can't query the terms themselves. (Hmm,
in today's semantic web a lot of times you don't have the original ontology
versions either, do you? But that would be another thing that breaks the
system to some degree, you don't have any ability to validate previous
inferences or see what it was like when the relationships were created, so you
can't validate them independently. Sigh....)
But in any case, I accept the challenge here and say again "it only works
if the new URIs can say whether they are the same semantics as a previous
version." Otherwise, I agree it's a bad thing.
T4: A new dataset is created which uses the new URIs.
T5: The application loads the new data.
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data
from the old URIs when it should return data from the new dataset as well.
This is clearly a bad thing. Your
proposal has to argue advantages that offset the disadvantage here, in order
for me to buy into it.
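The T1-T7 scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the example.org URIs, the `records` list, and the `same_semantics` mapping are hypothetical and do not come from any real vocabulary or tool.

```python
# Hypothetical URIs: the same concept published under two version namespaces.
OLD_SST = "http://example.org/vocab/v1/sea_surface_temperature"
NEW_SST = "http://example.org/vocab/v2/sea_surface_temperature"

# T2 and T5: data loaded before and after the new URIs were minted.
records = [
    {"variable": OLD_SST, "value": 14.2},  # old dataset
    {"variable": NEW_SST, "value": 15.1},  # new dataset
]


def filter_by_uri(data, uri):
    """T6: a query that matches records on the exact URI string."""
    return [r for r in data if r["variable"] == uri]


def filter_with_mapping(data, uri, same_semantics):
    """The same query, widened by an explicit 'same semantics as' mapping."""
    wanted = {uri} | same_semantics.get(uri, set())
    return [r for r in data if r["variable"] in wanted]


# T7: the old URI alone misses the record from the new dataset.
old_only = filter_by_uri(records, OLD_SST)

# If someone has stated that the two URIs share semantics, both records match.
same_semantics = {OLD_SST: {NEW_SST}}
both = filter_with_mapping(records, OLD_SST, same_semantics)
```

The second query only helps if the mapping is actually published somewhere, which is exactly the information missing at T3.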
One mitigation of disadvantages is obtained if most of
the users map to the 'most recent version' (core concept) of the term, not
specific versions. I suspect this will be likely.
Of course, it is also very important to say this new resource
has the same definition and semantics as another, previous resource,
preferably pointing back to the original instance with that
definition/semantics.
This creates an unnecessary burden and seems to
contradict your point that something in a different context will have
different semantics. If it has different semantics, then why point back to
something with identical semantics?
An excellent point. (I'm busted!) Apparently
I differentiate between explicit meanings, which one finds in the term's
resource description, and implicit meanings, which one finds in larger context.
The version relationship primitives have to be understood to refer to the
explicit meanings only. When the definition changes explicitly, that's a URI
change that no longer can be considered exactly the same concept.
I still can't see any advantages for
creating multiple copies of exactly the same thing.
Have I missed something?
The practical advantage is the one introduced at the top
-- I can consider and implement the vocabulary as a unit, carrying all of its
components along with it. Conceptually/abstractly I suspect this may be
the right way to think of a vocabulary.
But more practically, this gives me a trivial way to generate URIs for
those terms, a trivial way to capture the contents of each new version of the
ontology (otherwise I have to analyze every term to decide if it is different,
right?), a trivial way to explain to the user what the URI for each term will
look like, and a way to tell from the term URI which vocabulary it's a part of
(not that I'd ever do that to an opaque URI...).
But of course, I realize I have to go do some of these things later, in
any reasonable version of the system... I just don't have to do them
*instantly*....
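As a rough sketch of the trade-off described above: minting a URI per version is a one-line string operation, while the deferred work is the comparison of definitions between versions. The URI pattern, the function names, and the sample definitions below are all assumptions made for illustration, not part of any existing system.

```python
def mint_uri(base, vocabulary, version, term):
    """Build a version-qualified term URI (hypothetical pattern)."""
    return f"{base}/{vocabulary}/{version}/{term}"


def unchanged_terms(old_defs, new_defs):
    """Return terms whose explicit definition text is identical in both versions."""
    return sorted(t for t in new_defs if old_defs.get(t) == new_defs[t])


def same_as_previous(base, vocabulary, old_version, new_version, old_defs, new_defs):
    """Map each new URI to the old URI that carries the same explicit definition."""
    return {
        mint_uri(base, vocabulary, new_version, t): mint_uri(base, vocabulary, old_version, t)
        for t in unchanged_terms(old_defs, new_defs)
    }


# Two hypothetical releases: 'sst' was redefined, 'salinity' was not.
v1 = {"sst": "temperature near the sea surface", "salinity": "salt content"}
v2 = {"sst": "temperature within 1 m of the sea surface", "salinity": "salt content"}

links = same_as_previous("http://example.org", "ocean", "v1", "v2", v1, v2)
```

Comparing definition strings only captures the explicit meaning; any change in implicit meaning from the surrounding vocabulary is invisible to this kind of check.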
I imagine we will have to create a relationship for our own
use that has this meaning for now.
We probably will need some new infrastructural primitives, to
relate versions to each other.
Just so.
This is a practical solution which would probably be pretty easy
when URIs are de-conflated with UIDs. Though proliferation of URIs for the
same thing should be reduced whenever possible.
See another thread I started on a similar topic by googling ["proliferation of URIs" uschold]
Excellent, I looked at the summary post and I see things with your level of
concern, perhaps more than the responders. (Though I liked Tim's quote: "So multiple URIs for the same thing is
life, a constant tradeoff, but life is, on balance good.") I would be a
relatively small scale offender for a while, but a bad
example.
I will leave it there, too long a post for sure.
John
|

|
Re: URIs and Unique IDs
Michael F Uschold wrote:
>
> BTW, this discussion has inspired me to write a paper on the topic.
> Probably too late to submit to WWW conference, but I will make whatever
> I have available in some manner when it is ready
Great, I look forward to seeing it. Can you post a copy to semantic-web
when it's ready?
BTW - and perhaps this is a bit cheeky - but may I take this opportunity to
suggest (to you and others) that titles like 'URI Crisis' are a little
overly-dramatic. I'd rather see dull titles like 'URI - Mild Nuisance',
'URIs don't solve everything' or 'URIs Provide Opportunity for Further
Clarity and Best Practice'. The use of URIs provides mostly syntax and
some machinery around decentralised control; it also quite naturally and
inevitably provides many many ways to screw up. This doesn't constitute
a crisis any more than the fact that Unicode allows people to write
illogical things or bad poetry. Or that UML and OWL allow bad or vague
conceptual models to acquire the outer trappings of formality.
I find the talk of URI and identity crisis a little alarmist, and I fear
they're one of the factors that put people off from approaching this
technology. Are we really in a crisis situation? Should I stop or start
doing something asap?
cheers,
Dan
--
http://danbri.org/
|

|
Re: URIs and Unique IDs
On Nov 3, 2008, at 12:21 AM, Michael F Uschold wrote:
> Humans don't create or read UIDs, machines do. Tools and names can
> be used to have the user see whatever you want them to see. This
> scheme gives the advantage you want w/o minting new URIs for the
> same thing.
Well, this is probably the nub of the different choices. In the
domain I work in, humans -- often assisted by machines, often not --
create the vast majority of both UIDs and URIs, and there are precious
few tools and systems supporting the former. (By 'supporting' I mean
creating the association between the human-centric data that keys the
UID, and always providing the right human-centric data whenever the
UID surfaces.)
In marine science at least, this is just Not Going To Happen in any
pervasive way for quite a while. So if I want human acceptance of
semantics now, regretfully, I'm going to have to conflate.
In each of our cases, we will be spending time trying to make this
work. Along those lines, I will be thinking very hard about how to
avoid the creation of semantically duplicate URIs in our system -- I
welcome lobbying (either way) from others regarding the value of this.
(I can summarize off-list comments.) Also I look forward to the
paper, I am sure I will learn from it. Thanks.
John
|

|
Re: URIs and Unique IDs
Thanks for the comments, Dan.
I expect that for most people it is indeed a mild nuisance or similar.
For me on my job, it is a major headache and it is getting worse, not better. When the new version of Wordnet came out, we were faced with choices:
1. Go through a long and painful process to 'do it right', by finding out which URIs should be the same and which ones different, reverse engineering what we wish had been done in the first place (keep the same URIs for things that do not change).
2. Go through the process of changing our application in the appropriate places where any of the old URIs were used.
3. Don't bother upgrading to the new version.
None are attractive.
So I can agree that it is not a crisis now, but things are deteriorating, not improving. I was distressed to learn what SKOS was contemplating.
So I am arguing that there is a looming crisis.
A few months ago there was no financial crisis. Now there is.
In a few months or a year or two, there could well be a full-blown URI crisis.
My work on this is aimed at preventing this crisis from unfolding before it is too late.
Michael
On Mon, Nov 3, 2008 at 12:43 PM, Dan Brickley <danbri@...> wrote:
|

|
Re: URIs and Unique IDs
Michael,
After reading through your thread with John, I think that we are on the same page. I strongly believe (and it seems that you and John agree) that if a UID for a concept changes, the old version must have some way of pointing to the new version. I think this would call for a standard property that could be monitored so that an application would know when a new version was created. I am not sure why you left this out of your two principles. I believe if they are followed, then this principle must also be followed to avoid disrupting applications that make use of the concept.
In the case of Wordnet, had they followed your two principles, plus created pointers to new UIDs using a standard property, you would be able to build your application so that it could automatically migrate to a new version of Wordnet.
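A minimal sketch of what such a migration could look like, assuming the pointers have already been read into a plain dictionary. The `replaced_by` mapping, the function names, and the `ex:` identifiers are hypothetical; no particular standard property is implied.

```python
def latest_version(uid, replaced_by):
    """Follow 'replaced by' pointers until reaching a UID with no successor."""
    seen = set()
    while uid in replaced_by:
        if uid in seen:
            # Guard against malformed data where pointers loop back.
            raise ValueError(f"cycle in version pointers at {uid}")
        seen.add(uid)
        uid = replaced_by[uid]
    return uid


def migrate(uids, replaced_by):
    """Rewrite each UID an application uses to its most recent successor."""
    return [latest_version(u, replaced_by) for u in uids]


# Hypothetical pointers: ex:a-1 was replaced twice, ex:b-1 never changed.
replaced_by = {"ex:a-1": "ex:a-2", "ex:a-2": "ex:a-3"}
migrated = migrate(["ex:a-1", "ex:b-1"], replaced_by)
```

Whether an application should follow a pointer blindly is a separate question: under the two principles a new UID signals changed semantics, so a careful application might flag the change for review instead of migrating silently.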
Anyway, I look forward to your paper. My company, Revelytix, produces a web-based collaborative ontology editor and we currently do not really support versioning, simply because it has not been clear which direction we should go. We would be very happy to see the semantic web community reach consensus on some basic principles, it would make things much easier for us.
Michael Lang
On Sat, Nov 1, 2008 at 8:31 AM, Michael F Uschold <uschold@...> wrote:
See inline comments.
On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,
I'm not sure that it's as cut and dried as:
"Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."
There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
In other words, the application wants to change its behavior when the semantics of a term are changed. In this case, the URI should not be changed if the semantics of a term are changed. If it was changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.
Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with a different semantics? In this case, do you propose that the application is notified of the new meaning or, it just changes w/o notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web.
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology. Different ontologies and different applications will require different approaches.
Probably true in general, but I need some concrete examples to be convinced that willy-nilly changing of the semantics of resources is desirable.
Mike Lang
On Thu, Oct 30, 2008 at 5:14 AM, Michael F Uschold <uschold@...> wrote:
I'm resending this message to the semantic web discussion group for the record.
On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on how/whether to migrate to new URIs when a new version of an ontology is published. This is largely due to the fact that there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue. Versioning information is meant to be placed on a version annotation.
However the current situation is like the wild West, and everyone will be doing different things, resulting in a mess.
Wordnet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like Wordnet. Another is to keep the exact same name space, and change the semantics of a small number of terms while keeping the same URI. A third is to keep the same URI for the unchanged terms, and mint new URIs for the terms with different semantics.
This is a problem because they have no guidelines, they are basically stumbling along in the dark.
I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.
In the current state of semantic web use, it may not matter too much which option the SKOS team chooses. This is mainly because relatively few applications will be impacted, which may be due to the fact that the applications are not driven by the ontologies.
However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that
we have no good solutions yet, do we continue to throw our hands up and
punt? Absolutely not, it is essential that a good precedent is set ASAP that is based on sound principles.
Here is how.
We should imagine a future where ontology versioning is handled properly and do things that are going to make things easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.
For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, then they will break if you change the semantics in mid-stream. Imagine an application that relied on the semantics of broader as it was originally specified with transitivity. They loaded data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible, if the same SKOS URI is used for versions with different semantics. They are different beasts, and thus MUST have different URIs.
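To make the transitivity example concrete, here is a small sketch in which the two versions of 'broader' carry different URIs, so an application can tell which links to close transitively. The URIs and the `ex:` concepts are invented for illustration and are not the actual SKOS identifiers.

```python
from collections import defaultdict

# Hypothetical URIs for two versions of a 'broader' property.
BROADER_V1 = "http://example.org/vocab/v1#broader"  # specified as transitive
BROADER_V2 = "http://example.org/vocab/v2#broader"  # specified as non-transitive


def broader_pairs(triples, transitive_predicates):
    """Return all (narrower, broader) pairs, closing only over transitive predicates."""
    pairs = set()
    edges = defaultdict(set)
    for subject, predicate, obj in triples:
        pairs.add((subject, obj))  # asserted links always count
        if predicate in transitive_predicates:
            edges[subject].add(obj)
    # Depth-first walk from every subject that has a transitive link.
    for start in list(edges):
        stack = list(edges[start])
        visited = set()
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            pairs.add((start, node))
            stack.extend(edges.get(node, ()))
    return pairs


triples = [
    ("ex:trout", BROADER_V1, "ex:fish"),
    ("ex:fish", BROADER_V1, "ex:animal"),
    ("ex:oak", BROADER_V2, "ex:tree"),
    ("ex:tree", BROADER_V2, "ex:plant"),
]
pairs = broader_pairs(triples, transitive_predicates={BROADER_V1})
```

If both datasets had used a single URI, the `transitive_predicates` argument would have nothing to distinguish them by, which is the impossibility the paragraph describes.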
Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URI and datasets with the new URIs. This means that the datasets will not be linked like they should, they will treat the two different URIs for the same thing as being different. If one wanted to go into OWL-Full, one can use owl:sameAs, but this is not very practical. The only reasonable solution is to have the same URI for things with the same semantics.
Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.
If either of these two guidelines is broken, then the ontology-driven applications of the future will break too.
These maxims hold without exception for any standards that are formally released as standards. The question arises whether we need to hold to the same standards for standards like SKOS, which was never formally blessed.
The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard, or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.
Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI? Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake. The URI with the wrong semantics must keep its original unique ID.
Michael Uschold
-- Revelytix, Inc. phone: 410-584-0009 (office) 443-928-3782 (cell) skype: michael.allen.lang.jr aim: MikeJrRevelytix
|

|
Re: URIs and Unique IDs
Glad we are on the same page.
> I think this would call for a standard property that could be monitored so that an application would know when a new version was created. I am not sure why you left this out of your two principles.
Silly answer: there was no room
Real answer: I'm just getting started; they were the first two principles that came to mind.
As indicated in my other comments, I agree with your third principle.
A really big challenge will be deciding what can and should be done at the infrastructural level, vs. what should be hard and fast guidelines, vs. what should be left up to the discretion of users and developers.
Michael
On Mon, Nov 3, 2008 at 7:48 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
|

|
Re: URIs and Unique IDs
comments inline.
On Mon, Nov 3, 2008 at 6:17 PM, John Graybeal <graybeal@...> wrote:
> On Nov 3, 2008, at 12:21 AM, Michael F Uschold wrote:
> > Humans don't create or read UIDs, machines do. Tools and names can be used to have the user see whatever you want them to see. This scheme gives the advantage you want w/o minting new URIs for the same thing.
> Well, this is probably the nub of the different choices. In the domain I work in, humans -- often assisted by machines, often not -- create the vast majority of both UIDs and URIs, and there are precious few tools and systems supporting the former. (By 'supporting' I mean creating the association between the human-centric data that keys the UID, and always providing the right human-centric data whenever the UID surfaces.)
Agreed, and this is what needs to change.
> In marine science at least, this is just Not Going To Happen in any pervasive way for quite a while.
For social or technical reasons? If the latter, are the technical reasons fundamental and not going to change, or can technology evolve to improve things? What if the standard tools did it for you, would there still be resistance?
> So if I want human acceptance of semantics now, regretfully, I'm going to have to conflate.
> In each of our cases, we will be spending time trying to make this work. Along those lines, I will be thinking very hard about how to avoid the creation of semantically duplicate URIs in our system -- I welcome lobbying (either way) from others regarding the value of this. (I can summarize off-list comments.) Also I look forward to the paper, I am sure I will learn from it. Thanks.
> John
|

|
Re: URIs and Unique IDs
On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
> I strongly believe (and it seems that you and John agree) that if a
> UID for a concept changes, the old version must have some way of
> pointing to the new version.
Funny, I would have said this the other way around (new points back to
old, then the system services can provide the old -> new capability --
or is this what you are saying too?). I have this notion that *any*
change to a static resource's specifications -- definition, metadata,
semantics -- makes a new resource (this lets me compare resource_new
to resource_old and see the difference between them unambiguously).
With this vision, the resource can't change once it is created, even
to point to a new resource (you see the problem). Is this vision just
plain wrong, per the consensus?
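
The notion above can be pictured with a minimal sketch in Python. It assumes a hypothetical Resource record, a hypothetical "supersedes" back pointer, and made-up example.org URIs; none of these names come from an existing vocabulary or tool. Each resource is frozen once created, the new one points back to the old one, and a service derives the old -> new lookup without touching the old resource.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """An immutable resource specification (hypothetical structure)."""
    uri: str
    definition: str
    supersedes: Optional[str] = None  # back pointer: new -> old


def diff(old: Resource, new: Resource) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(Resource)
        if getattr(old, f.name) != getattr(new, f.name)
    }


def forward_index(resources: list) -> dict:
    """Derive the old -> new mapping from the new -> old back pointers."""
    return {r.supersedes: r.uri for r in resources if r.supersedes is not None}


# Made-up example terms; the old resource is never modified.
resource_old = Resource(
    "http://example.org/vocab/v1/salinity",
    "Salt content of sea water.",
)
resource_new = Resource(
    "http://example.org/vocab/v2/salinity",
    "Practical salinity of sea water.",
    supersedes=resource_old.uri,
)

successor_of = forward_index([resource_old, resource_new])
changes = diff(resource_old, resource_new)
```

Here successor_of plays the role of the "system services" that provide the old -> new capability, and changes lists exactly which parts of the specification differ between resource_old and resource_new.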
On Nov 3, 2008, at 3:11 PM, Michael F Uschold wrote:
> In marine science at least, this is just Not Going To Happen in any
> pervasive way for quite a while.
> For social or technical reasons? If the latter, are the technical
> reasons fundamental and not going to change, or can technology
> evolve to improve things? What if the standard tools did it for you,
> would there still be resistance?
A bit of both. The difficulty is that the community uses every tool
and standard in the universe (!), many of them custom and one-off
programs, most of them severely non-semantic, and many not very
sophisticated, in this context at least. So it isn't like we change
"the standard tools" because there are no standard tools. And the cost
of making the changes (assuming we agree on all the changes to make)
is high compared to the funds available to make the changes, and the
larger community just does not see the need (yet). Yes we can address
the semantic part, but we need a major consensus on broad approaches
to have that attitude impact actual community usage. (Working on it. :->)
John
--------------
John Graybeal <mailto: graybeal@...> -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org
|

|
Re: URIs and Unique IDs
John Graybeal wrote:
> Funny, I would have said this the other way around (new points back to
> old, then the system services can provide the old -> new capability --
> or is this what you are saying too?).
And lo, it already exists:
http://www.w3.org/2006/link#obsoletes
The only tiny little thing that we need is widespread usage and
support for it.
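
As a rough illustration of what that support might look like, here is a small Python sketch that follows obsoletes links to find the current term. It assumes the statement is read as "newer term obsoletes older term", stores triples as plain tuples rather than using an RDF library, and uses made-up example.org URIs; the latest() helper is hypothetical.

```python
OBSOLETES = "http://www.w3.org/2006/link#obsoletes"

# (subject, predicate, object) triples; the example.org URIs are made up.
# Assumed reading: the subject is the newer term, the object the older one.
triples = [
    ("http://example.org/vocab/v2/Person", OBSOLETES, "http://example.org/vocab/v1/Person"),
    ("http://example.org/vocab/v3/Person", OBSOLETES, "http://example.org/vocab/v2/Person"),
]


def latest(uri, triples):
    """Follow obsoletes links forward from uri until nothing newer obsoletes it."""
    # Invert the new -> old statements into an old -> new lookup.
    obsoleted_by = {o: s for s, p, o in triples if p == OBSOLETES}
    seen = {uri}
    while uri in obsoleted_by:
        uri = obsoleted_by[uri]
        if uri in seen:  # guard against cyclic data
            raise ValueError("cycle in obsoletes links")
        seen.add(uri)
    return uri


current = latest("http://example.org/vocab/v1/Person", triples)
```

Only the newer terms carry statements here, so the older terms never need to be edited to point forward.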
--
Toby A Inkster
<mailto: mail@...>
< http://tobyinkster.co.uk>
|

|
Re: URIs and Unique IDs
Peter,
I agree 100% with your assessment. In the semantic web world, I believe that versioning will not be very important. I think a major benefit of using semantic web technologies is that you can build an application that will adapt to changes in the semantics of a word as the semantics change in the real world.
But, as you said, there may be cases where, at a significant point in time, a community would like to version its vocabulary. The goal of this discussion is simply to develop some guidelines for versioning, when it is necessary, that will make the transition from a past version of a vocabulary to a new one as easy, accurate, and flexible as possible for the users of a vocabulary.
Mike Lang
On Tue, Nov 4, 2008 at 1:41 AM, Peter Ansell <ansell.peter@...> wrote:
----- "John Graybeal" < graybeal@...> wrote:
> On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
>
> > I strongly believe (and it seems that you and John agree) that if a
>
> > UID for a concept changes, the old version must have some way of
> > pointing to the new version.
>
> Funny, I would have said this the other way around (new points back to
> old, then the system services can provide the old -> new capability --
> or is this what you are saying too?). I have this notion that *any*
> change to a static resource's specifications -- definition, metadata,
> semantics -- makes a new resource (this lets me compare resource_new
> to resource_old and see the difference between them unambiguously).
>
> With this vision, the resource can't change once it is created, even
> to point to a new resource (you see the problem). Is this vision just
> plain wrong, per the consensus?
Should we really focus on a "ya just never know, do ya" philosophy that hurts the majority of casual users more than it helps the specialised users? If you make up a system where you require that people manually migrate all their past statements in order to use the system in a month's time then you won't be looked upon too favourably. And if you give them the choice to mass migrate their statements then what is the point if they always select "migrate all to most current versions"?
This is a very radical discussion that I don't think fits the majority of use cases that the semantic web will be applied to, as it is decidedly anti-Web-2.0, where there is constant evolution and links are relative, not static as in Web-1.0. If you really face it, meaning migrates, and the particular structure at a given instant in time isn't as relevant as the improvement in meaning anyway. If rules in the semantic web are completely reliant on data structures and unable to recognise the overall meaning that people gradually migrate towards then they are always going to be brittle, whether people are perfectly pedantic about UIDs and/or URIs or whether they end up referencing everything with relative addresses which don't focus on particular representations at particular points in time.
It isn't bad to version information at significant points in time, but the archaic once-published-always-published-never-modified culture doesn't fit with electronic technologies IMO.
(Just a few thoughts :) )
Cheers,
Peter
--
Revelytix, Inc.
phone: 410-584-0009 (office)
443-928-3782 (cell)
skype: michael.allen.lang.jr
aim: MikeJrRevelytix
|

|
RE: URIs and Unique IDs
> On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
> [ . . . ] I have this notion that *any*
> change to a static resource's specifications -- definition, metadata,
> semantics -- makes a new resource (this lets me compare resource_new
> to resource_old and see the difference between them unambiguously).
I think that's one good view that's right for some applications, but not the end of the story. For applications that can live with more instability, modifying a resource specification in place is the best solution. For others needing more certainty, the strategy that you describe is best. The most important thing is to clearly state the change policy of the resource specification.
>
> With this vision, the resource can't change once it is created, even
> to point to a new resource (you see the problem). Is this vision just
> plain wrong, per the consensus?
I don't think this needs to be a problem, because a static resource specification could include the URL of an external document that can be updated without modifying the static resource specification. The URL of the external document *would* be a part of the static resource specification, but the content of the document at that URL would *not* be a part of the static resource specification.
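
A minimal Python sketch of this arrangement, under stated assumptions: StaticSpec, status_url, the "superseded_by" key and the example.org URIs are all hypothetical names chosen for illustration, and a dictionary stands in for fetching the external document over the web.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StaticSpec:
    """A static resource specification (hypothetical structure)."""
    uri: str
    definition: str
    status_url: str  # part of the frozen spec; the document behind it is not


# Stand-in for the web: maps a URL to the current content of the document there.
external_documents = {}

spec = StaticSpec(
    uri="http://example.org/vocab/v1/salinity",
    definition="Salt content of sea water.",
    status_url="http://example.org/vocab/v1/salinity/status",
)
external_documents[spec.status_url] = {"superseded_by": None}

# Later a new version is published. Only the external document changes.
external_documents[spec.status_url] = {
    "superseded_by": "http://example.org/vocab/v2/salinity"
}


def successor(spec: StaticSpec) -> Optional[str]:
    """Look up the successor via the mutable document named in the static spec."""
    return external_documents[spec.status_url]["superseded_by"]
```

The spec object itself is identical before and after the update, so a comparison against the originally published specification still succeeds, while successor(spec) reports the newer URI.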
David Booth, Ph.D.
HP Software
+1 617 629 8881 office | dbooth@...
http://www.hp.com/go/software
Statements made herein represent the views of the author and do not necessarily represent the official views of HP unless explicitly so stated.
|

|
Re: URIs and Unique IDs
I strongly disagree that versioning will not be important. I suspect
that it will become the most profound and challenging problem to tackle
if we are to scale the application of semantic technology. Change
management is less critical in the short term for those concerned
with the linguistic notion of semantics. However, if you are concerned
with leveraging semantic models to drive/support high value proposition
mission critical systems, change management becomes a serious concern.
Versioning and change management become a show stopper if you are
going even further and intend to create full computational semantic
systems where the algorithms and data/object models of software systems
are replaced by semantic models. In each one of these three areas the
level of trust and dependencies on the asserted semantics will become
critical.
Here are a few examples:
1. Trust semantic models or ontologies to support operational/mission
systems such as:
a. Equipment, system maintenance applications
- a knowledge modeler/ontologist asserts that a General
Electric A877623 is a subclass of Turbo Prop Engine and then, in a
later version, realizes the mistake: it is a subclass of another
system. The difference affects the scheduling of maintenance for
aircraft.
- a similar model asserts that a system should be overhauled
if a certain condition occurs
b. Operational policies and compliance applications
- a knowledge modeler asserts that a person who approves a
credit rating cannot approve a loan but, in a later version of the
compliance ontology, realizes that the semantics need to be far more
sophisticated. The difference affects the ability of the compliance
system to prevent or permit fraud.
c. Medical / Bio applications
- A biomedical ontologist asserts that one protein
up-regulates a gene. Another subject matter expert asserts that the
same protein down-regulates a gene. Another researcher realizes that
it is important to tear down the model and express the context of the
scenario to capture the conflict. The difference affects the ability of
a medical diagnostic system.
d. Intelligence systems
- The model of a social / economic network for terrorists in
one model needs to be advanced so as not to create millions of false
positives.
e. Any other system that dreams of integrating vast amounts of
subject matter expertise and organizing into something more
sophisticated and operational than just a categorization system,
dictionary or primitive taxonomy.
2. Simple ontologies/semantic models, but with massive adoption
a. In one popular social networking ontology the class Person is
used by millions of people. Later it becomes critical to redefine the
class as a subclass of Social Contact in order to differentiate from
the animal or physical notion of Person in another widely used ontology.
3. In the longer term vision, semantic technology driving model driven /
ontology driven software systems
a. Declarative, rich semantic models that explicitly describe the
behaviour of parts or every aspect of a functional software system.
b. Models that explicitly express the compatibility semantics
between one software system and another so that software systems
actually understand their purpose and functionality.
Systems that are more concerned with the NLP or the linguistic notion
of "semantics" are currently a little bit more resilient to change
management because their applications tend to use statistics or
approximation to create value. Example applications would be sense
disambiguation for advertising, entity extraction, etc. For these
systems machine learning can help us cope with a lot of inconsistencies
in semantic models. However, as these systems become more mission
critical, the rationalization and harmonization of semantics between
various ontologies will start to become a serious economic issue. Using
the right version of various semantic models (such as Wordnet, DBPedia,
etc.) will become a very challenging and painful problem. This latter
area is a significant concern and area of effort/management right now.
The power of semantics can permit us to formally express and share the
semantics of things explicitly or implicitly. This can ultimately help
to actually get a grip on the ugly world of change management. However,
in the short term it will open a Pandora's box of power and change
management problems.
Conor
Conor Shankey
CTO
Reinvent, Inc - Vancouver.com
www.Reinvent.com
www.Vancouver.com
|