URIs and Unique IDs


by Uschold

I'm resending this message to the semantic web discussion group for the record.

On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on whether, or how, to migrate to new URIs when a new version of an ontology is published. This is largely because there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue.  Versioning information is meant to be placed in a version annotation.

However, the current situation is like the Wild West: everyone will do something different, resulting in a mess.

WordNet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like WordNet. Another is to keep the exact same namespace and change the semantics of a small number of terms while keeping the same URIs. A third is to keep the same URIs for the unchanged terms and mint new URIs for the terms with different semantics.

This is a problem because they have no guidelines; they are basically stumbling along in the dark.

I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.

In the current state of semantic web use, it may not matter too much which option the SKOS team chooses. This is mainly because relatively few applications will be impacted, which may in turn be because most applications are not driven by the ontologies.

However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that we have no good solutions yet, do we continue to throw our hands up and punt? Absolutely not. It is essential that a good precedent, based on sound principles, be set ASAP.

Here is how.

We should imagine a future where ontology versioning is handled properly, and do things now that will make it easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.

For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, they will break if you change the semantics in mid-stream.  Imagine an application that relies on the semantics of broader as it was originally specified, with transitivity.  It loads data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible if the same SKOS URI is used for versions with different semantics.  They are different beasts, and thus MUST have different URIs.
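The transitive-closure problem described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two URIs are hypothetical (they are not real SKOS identifiers), triples are plain Python tuples, and the application keeps its own set of predicates that it is allowed to close transitively. Because each meaning of "broader" has its own URI, the application can tell the two datasets apart.

```python
from collections import defaultdict

# Hypothetical URIs, for illustration only: one per meaning of "broader".
BROADER_TRANSITIVE = "http://example.org/skos-v1#broader"
BROADER_DIRECT = "http://example.org/skos-v2#broader"

# Predicates the application may close transitively (an assumption of this sketch).
TRANSITIVE_PREDICATES = {BROADER_TRANSITIVE}


def infer(triples):
    """Return the triples plus the transitive closure of transitive predicates."""
    result = set(triples)
    for predicate in TRANSITIVE_PREDICATES:
        # Build an adjacency map for this predicate only.
        edges = defaultdict(set)
        for s, p, o in triples:
            if p == predicate:
                edges[s].add(o)
        # Depth-first walk from every subject to collect reachable objects.
        for start in list(edges):
            stack = list(edges[start])
            seen = set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(edges.get(node, ()))
            result.update((start, predicate, node) for node in seen)
    return result


old_data = [
    ("cat", BROADER_TRANSITIVE, "mammal"),
    ("mammal", BROADER_TRANSITIVE, "animal"),
]
new_data = [
    ("oak", BROADER_DIRECT, "tree"),
    ("tree", BROADER_DIRECT, "plant"),
]
inferred = infer(old_data + new_data)
```

Here the old data gains the inferred link from "cat" to "animal", while the new data is left as asserted. If both datasets used one URI, the application would have no basis for treating them differently.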

Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URIs and datasets with the new URIs. This means that the datasets will not be linked as they should be: the application will treat the two different URIs for the same thing as different things.  If one wanted to go into OWL-Full, one could use owl:sameAs, but this is not very practical.  The only reasonable solution is to have the same URI for things with the same semantics.

Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.

If either of these two guidelines is broken, then so will be the ontology-driven applications of the future.
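As a rough sketch of how the two principles might be applied when publishing a new version, consider the function below. Everything in it is an assumption made for illustration: the function name assign_uris, the example.org namespaces, and above all the idea that a term's semantics can be represented as a comparable value (here a frozenset of axiom labels). Deciding whether two definitions really mean the same thing is the hard part, and this sketch does not address it.

```python
def assign_uris(previous, new_terms, new_namespace):
    """Choose a URI for every term in a new ontology version.

    previous:      dict mapping term name -> (uri, semantics)
    new_terms:     dict mapping term name -> semantics
    new_namespace: prefix used when a new URI has to be minted
    """
    assigned = {}
    for name, semantics in new_terms.items():
        old = previous.get(name)
        if old is not None and old[1] == semantics:
            # Principle 2: unchanged semantics keeps the existing URI.
            assigned[name] = old[0]
        else:
            # Principle 1: changed (or brand-new) semantics gets a new URI.
            assigned[name] = new_namespace + name
    return assigned


# Hypothetical version 1 of a vocabulary.
previous = {
    "broader": ("http://example.org/v1#broader", frozenset({"transitive"})),
    "prefLabel": ("http://example.org/v1#prefLabel", frozenset({"label"})),
}
# Hypothetical version 2: broader loses transitivity, related is new.
new_terms = {
    "broader": frozenset(),
    "prefLabel": frozenset({"label"}),
    "related": frozenset({"symmetric"}),
}
uris = assign_uris(previous, new_terms, "http://example.org/v2#")
```

In this example only prefLabel keeps its version 1 URI. The changed term broader and the new term related both receive URIs in the new namespace.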

These maxims hold without exception for any standards that are formally released as standards.
A question arises as to whether we need to hold to the same rules for standards like SKOS, which was never formally blessed.

The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard, or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.

Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI?
Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake.  The URI with the wrong semantics must keep its original unique ID.

Michael Uschold


Re: URIs and Unique IDs

by Michael Lang(Jr.)

Michael,

I'm not sure that it's as cut and dried as:

"Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."

There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.  In other words, the application wants to change its behavior when the semantics of a term are changed.  In this case, the URI should not be changed when the semantics of the term are changed. If it were changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.  I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology.  Different ontologies and different applications will require different approaches.

Mike Lang


--
Revelytix, Inc.

phone: 410-584-0009 (office)
          443-928-3782 (cell)
skype: michael.allen.lang.jr
aim: MikeJrRevelytix

Re: URIs and Unique IDs

by John Graybeal

We are trying to release a community semantic service (later this month!) that "does the right thing" in this arena. So I strongly agree with the tenor of this message. Except I am trying to imagine what implementation should happen in the _present_ for our service to be an exemplar.  I am sorry for the long post, but if it is mostly valid, hopefully it can advance the discussion.

We have provisionally settled on the following principles for this service (which is intended to store domain vocabularies and terms, keep track of their versions, and let people make relations between them).  I realize the focus of the original post was on URIs of the relations, but I think semantics of any terms are also important to consider, and probably apply to the relations.

Principles
  A. *Any* change to a vocabulary, including to any of its terms (and their semantics) or its metadata, means the vocabulary must get a new version = new URI
  B. A vocabulary contains all the terms within it, not just the terms that changed in that version
  C. The nominally opaque URIs must be fairly self-consistent in their presentation, or people in the non-semantic community will misunderstand them (or rebel against using them)
  D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
  E. It must be possible to choose (i.e., map to, or identify) either a specific (versioned) meaning, or a 'most current' meaning, for a given concept

From these principles I've concluded 
  aa. A new vocabulary version results in new term versions (= new URIs) for all the terms as well (even if their semantics haven't changed, sorry -- see below for further thoughts on this)
  bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
  cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be a true statement today, but false next week.)  Those who choose to use the 'most current' term will get what they pay for.
  dd. Any created relationship that uses a 'most current' URI, should be timestamped to allow review of the historical state of the members of the triple (but note that this is strictly for understanding, since the selection of the 'most current' URI as the referenced concept explicitly permits changes to happen in that resource)
  ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts 
  ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
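To make these principles and conclusions a little more concrete, here is a minimal sketch of how such a registry could behave. It is not a description of the actual service: the Registry class, the relate helper, the http://example.org/vocab base URI, and the URI layout (base/vocabulary/version/term for a fixed meaning, base/vocabulary/term for the 'most current' meaning) are all assumptions invented for this illustration, and term names are assumed not to contain slashes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

BASE = "http://example.org/vocab"  # hypothetical base URI


@dataclass
class Registry:
    versions: dict = field(default_factory=dict)  # (vocab, version) -> {term: definition}
    order: dict = field(default_factory=dict)     # vocab -> versions, oldest first

    def publish(self, vocab, version, terms):
        # Principles A and B: every change is a new version holding all terms.
        if (vocab, version) in self.versions:
            raise ValueError("a published version is never modified")
        self.versions[(vocab, version)] = dict(terms)
        self.order.setdefault(vocab, []).append(version)

    def versioned_uri(self, vocab, version, term):
        return f"{BASE}/{vocab}/{version}/{term}"

    def current_uri(self, vocab, term):
        return f"{BASE}/{vocab}/{term}"

    def lookup(self, uri):
        # Principles D and E: a versioned URI is fixed, while a
        # 'most current' URI follows the latest release.
        parts = uri[len(BASE) + 1:].split("/")
        if len(parts) == 3:
            vocab, version, term = parts
        else:
            vocab, term = parts
            version = self.order[vocab][-1]
        return self.versions[(vocab, version)][term]

    def counterparts(self, vocab, version, term):
        # Conclusion ee: other versioned URIs with an identical definition.
        definition = self.versions[(vocab, version)][term]
        return [
            self.versioned_uri(vocab, other, term)
            for other in self.order[vocab]
            if other != version
            and self.versions[(vocab, other)].get(term) == definition
        ]

    def changes(self, vocab, term):
        # Conclusion ff: versions whose definition differs from the one before.
        changed, previous = [], None
        for version in self.order[vocab]:
            definition = self.versions[(vocab, version)].get(term)
            if definition != previous:
                changed.append(version)
            previous = definition
        return changed


def relate(subject, predicate, obj):
    # Conclusion dd: record when the relation was asserted.
    return (subject, predicate, obj, datetime.now(timezone.utc))


registry = Registry()
registry.publish("weather", "1", {"rain": "liquid precipitation", "fog": "low cloud"})
registry.publish("weather", "2", {"rain": "liquid precipitation", "fog": "cloud at ground level"})
fog_v1 = registry.lookup(registry.versioned_uri("weather", "1", "fog"))
fog_now = registry.lookup(registry.current_uri("weather", "fog"))
rain_twins = registry.counterparts("weather", "2", "rain")
link = relate("http://example.org/obs/42", "http://example.org/about",
              registry.current_uri("weather", "fog"))
```

In this sketch the versioned URI for fog keeps returning the version 1 definition, the 'most current' URI returns the version 2 definition, and the unchanged term rain in version 2 can be related back to its version 1 counterpart even though it has a new URI.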

So two easy conclusions:

Yes, it is terrible for the semantics of a (nominally static) concept to change, and that concept's URI to remain the same. That breaks everything, as near as I can tell.

In the case of a subject/object term, it is clearly acceptable for the semantics of a _dynamic_ concept to change without changing the URI. 

I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running. I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.

As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan.  The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed. So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context. 

Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics. The service described in (ee) needs this capability. But I think sameAs doesn't apply here, as the two URIs actually reflect two different resources, which are definitionally and semantically equal, but live in a different context. I imagine we will have to create a relationship for our own use that has this meaning for now.

If you just can't stand all those URIs that have the same semantics, and you told me I had to use the original URI that had that meaning, I would say 'ok' -- then, to meet principle (C), I would create URIs that dereference to the original URI, so that when people get confused and use the (wrong, non-existent) URI that corresponds to that term in the current version of the vocabulary, at least I could respond with useful information.

(Yes, I know this emphasizes why URIs should be opaque, and I'm afraid in this respect I am consciously doing the 'wrong' thing by making my URI algorithm all too obvious. The value added by a semantic URI is just too compelling, for the success of the project and semantic adoption in general.)

John




--------------
John Graybeal   <graybeal@...>  -- 831-775-1956 
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org   


Re: URIs and Unique IDs

by Uschold

See inline comments.

On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,

I'm not sure that it's as cut and dried as:

"Thus, any ontology versioning system of the future will rely on these two principles:

1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."

There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
 
 In other words, the application wants to change its behavior when the semantics of a term are changed.  In this case, the URI should not be changed when the semantics of the term are changed. If it were changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.

Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with different semantics?  In this case, do you propose that the application is notified of the new meaning, or that it just changes without notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web.

 
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology.  Different ontologies and different applications will require different approaches.  

Probably true in general, but I need some concrete examples to be convinced that willy-nilly changing of the semantics of resources is desirable.
 

Mike Lang



Re: URIs and Unique IDs

by Uschold

It seems to me that in cases where an application wants to use the most up-to-date version of something, you don't have to change the semantics and keep the same UID. You can instead have a subscription service which allows an application to be notified of every change to new versions.  Then the application that wants the new version can have a mechanism for updating its innards to replace every occurrence of the old UID with the new one.

Applications that need to retain both can do that too. Everyone can be happy.
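The subscription idea could look something like the sketch below. No such service is specified in this thread, so the VersionFeed and Application classes, their method names, and the example UIDs are all hypothetical. The point is only that each application decides for itself whether to follow a replacement.

```python
class VersionFeed:
    """Hypothetical notification service for UID replacements."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish_replacement(self, old_uid, new_uid):
        # Tell every subscriber that a term with new semantics has a new UID.
        for callback in self._subscribers:
            callback(old_uid, new_uid)


class Application:
    def __init__(self, triples, follow_updates):
        self.triples = list(triples)
        self.follow_updates = follow_updates

    def on_replacement(self, old_uid, new_uid):
        if not self.follow_updates:
            return  # keep the old UID and therefore the old semantics
        # Replace every occurrence of the old UID with the new one.
        self.triples = [
            tuple(new_uid if part == old_uid else part for part in triple)
            for triple in self.triples
        ]


OLD = "http://example.org/v1#broader"  # hypothetical UIDs
NEW = "http://example.org/v2#broader"

data = [("cat", OLD, "mammal")]
tracker = Application(data, follow_updates=True)
archive = Application(data, follow_updates=False)

feed = VersionFeed()
feed.subscribe(tracker.on_replacement)
feed.subscribe(archive.on_replacement)
feed.publish_replacement(OLD, NEW)
```

After the notification, tracker holds its triple under the new UID while archive still holds the original, so both kinds of application are served without ever reusing a UID for a different meaning.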

Michael

On Sat, Nov 1, 2008 at 2:31 PM, Michael F Uschold <uschold@...> wrote:
See inline comments.

On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,

I'm not sure that its as cut and dry as:

"Thus, any ontology versioning system of the future will rely on these two principles:

1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."

There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
 
 In other words, the application wants to change its behavior when the semantics of a term are changed.  In this case, the URI should not be changed if the semantics of a term are changed. If it was changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.  

Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with a different semantics?  In this case, do you propose that the application is notified of the new meaning or, it just changes w/o notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web. 

 
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology.  Different ontologies and different applications will require different approaches.  

Proably true in general, but I need some concrete examples to be convinced that willy nilly semantics changing of the semantics of resources is desirable.
 

Mike Lang


On Thu, Oct 30, 2008 at 5:14 AM, Michael F Uschold <uschold@...> wrote:
I'm resending this message to the semantic web discussion group for the record.

On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on how/whether to migrate to new URIs when a new version of an ontology is published. This is largely due to the fact that there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue.  Versioning information is meant to be placed on a version annotation.

However the current situation is like the wild West, and everyone will be doing different things, resulting in a mess.

Wordnet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like Wordnet. Another is to keep the exact same name space, and change the semantics of a small number of terms while keeping the same URI. A third is to keep the same URI for the unchanged terms, and mint new URIs for the terms with different semantics.

This is a problem because they have no guidelines, they are basically stumbling along in the dark.

I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.

In the current state of semantic web use, it may not matter too much which option the SKOS team chooses. This is mainly because relatively few applications will be impacted, which may be due to the fact that the applications are not driven by the ontologies.

However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that we have no good solutions yet, do we continue to throw our hands up and punt? Absolutely not, it is essential that a good precedent is set ASAP that is based on sound principles.

Here is how.

We should imagine a future where ontology versioning is handled properly and do things that are going to make things easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.

For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, then they will break if you change the semantics in mid-stream.  Imagine an application that relied on the semantics of broader as it was originally specified with transitivity.  They loaded data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible, if the same SKOS URI is used for versions with different semantics.  They are different beasts, and thus MUST have different URIs.
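
To make the transitivity scenario concrete, here is a minimal Python sketch. The URIs (`ex:broader`, `ex:v1/broader`, `ex:v2/broader`), the sample triples, and the `infer` helper are all assumptions made up for illustration; this is not how any particular SKOS tool works.

```python
def infer(triples, transitive_predicates):
    """Add implied triples for every predicate listed as transitive."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s1, p1, o1 in list(result):
            if p1 not in transitive_predicates:
                continue
            for s2, p2, o2 in list(result):
                if p2 == p1 and s2 == o1 and (s1, p1, o2) not in result:
                    result.add((s1, p1, o2))
                    changed = True
    return result

# Hypothetical URIs, for illustration only.
SHARED = "ex:broader"     # one URI reused for both meanings
OLD = "ex:v1/broader"     # the transitive version
NEW = "ex:v2/broader"     # the non-transitive version

# Case 1: the same URI is used by the old and the new dataset.
shared_store = {
    ("poodle", SHARED, "dog"), ("dog", SHARED, "mammal"),   # old dataset
    ("jazz", SHARED, "music"), ("music", SHARED, "art"),    # new dataset
}
shared_result = infer(shared_store, {SHARED})

# Case 2: each meaning has its own URI, so closure applies only to OLD.
split_store = {
    ("poodle", OLD, "dog"), ("dog", OLD, "mammal"),
    ("jazz", NEW, "music"), ("music", NEW, "art"),
}
split_result = infer(split_store, {OLD})
```

In case 1 the application can only apply transitive closure to everything or to nothing, so it also derives a jazz-to-art link that the authors of the new dataset never intended. In case 2 the predicate URI itself says which triples are subject to closure.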

Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URI and datasets with the new URIs. This means that the datasets will not be linked like they should, they will treat the two different URIs for the same thing as being different.  If one wanted to go into OWL-Full, one can use owl:sameAs, but this is not very practical.  The only reasonable solution is to have the same URI for things with the same semantics.

Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.

If either of these two guidelines is broken, then so will be the ontology-driven applications of the future.
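
A rough Python sketch of the two principles, under a strong simplifying assumption: that "same semantics" can be detected by comparing definition strings. The `uid:` scheme, the `publish` function, and the sample definitions are hypothetical.

```python
import itertools

_counter = itertools.count(1)

def mint_uid():
    """Return a fresh opaque identifier (hypothetical scheme)."""
    return f"uid:{next(_counter)}"

def publish(previous, definitions):
    """Assign IDs for a new version of a vocabulary.

    previous:    {term_name: (uid, definition)} from the prior version
    definitions: {term_name: definition} for the new version
    """
    release = {}
    for name, definition in definitions.items():
        old = previous.get(name)
        if old is not None and old[1] == definition:
            # Principle 2: unchanged semantics keep the same ID.
            release[name] = old
        else:
            # Principle 1: new or changed semantics get a new ID.
            release[name] = (mint_uid(), definition)
    return release

v1 = publish({}, {"broader": "transitive", "prefLabel": "preferred label"})
v2 = publish(v1, {"broader": "not transitive", "prefLabel": "preferred label"})
```

Here `prefLabel` keeps its ID across both versions, while `broader` gets a new one in `v2`. Deciding whether the semantics really changed is the hard part and is not addressed by this sketch.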

These maxims hold without exception for any standards that are formally released as standards.
A question arises if we need to hold to the same standards for standards like SKOS which was never formally blessed.

The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard, or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.

Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI?
Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake.  The URI with the wrong semantics must keep its original unique ID.

Michael Uschold




--
Revelytix, Inc.

phone: 410-584-0009 (office)
          443-928-3782 (cell)
skype: michael.allen.lang.jr
aim: MikeJrRevelytix



Re: URIs and Unique IDs

by Uschold

comments in line

On Thu, Oct 30, 2008 at 8:08 PM, John Graybeal <graybeal@...> wrote:
We are trying to release a community semantic service (later this month!) that "does the right thing" in this arena.

Excellent, glad to learn this.
 
So I strongly agree with the tenor of this message. Except I am trying to imagine what implementation should happen in the _present_ for our service to be an exemplar.  I am sorry for the long post, but if it is mostly valid, hopefully it can advance the discussion.

We have provisionally settled on the following principles for this service (which is intended to store domain vocabularies and terms, keep track of their versions, and let people make relations between them).  I realize the focus of the original post was on URIs of the relations, but I think semantics of any terms are also important to consider, and probably apply to the relations.

Principles
  A. *Any* change to a vocabulary, including to any of its terms (and their semantics), metadata, means the vocabulary must get a new version = new URI

Agreed. I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here? If not, what do you mean by a vocabulary?
 
  B. A vocabulary contains all the terms within it, not just the terms that changed in that version

So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.
 
  C. The nominally opaque URIs must be fairly self-consistent in their presentation, or people in the non-semantic community will misunderstand them (or rebel against using them)

This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.
 
  D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI

I read this to mean that a term like 'broader' in SKOS could have multiple URIs for multiple versions. If this is what you mean, then I absolutely agree with this. If this is not what you mean, then what is the difference between the 'current meaning of a term' and any other meaning of that term?
 
  E. It must be possible to choose (i.e., map to, or identify) either a specific (versioned) meaning, or a 'most current' meaning, for a given concept

Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, and that its various meanings are versions linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?
 

From these principles I've concluded 
  aa. A new vocabulary version results in new term versions (= new URIs) for all the terms as well (even if their semantics haven't changed, sorry -- see below for further thoughts on this)

I definitely disagree on this, even after reading your material below.
 
  bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)

This is an interesting question with more than one reasonable position. I think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.

For 1, you do NOT want to change the name of the term, which was and is the right term. But you DO want to change its UID because it is a different thing.
For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader. You should be able to change the name w/o changing the UID.
 
  cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.)  Those who choose to use the 'most current' term will get what they pay for.

You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item. This is a promising idea that could probably keep everyone happy.
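
As a rough sketch of that idea, assuming a toy in-memory registry: one core ID per term, one UID per version, and two kinds of lookup. The class name, method names, and `ex:` identifiers are all hypothetical and do not describe any existing service.

```python
class TermRegistry:
    """Toy registry: one core ID per term, one UID per version of it."""

    def __init__(self):
        # core_id -> list of (version_uid, definition), oldest first
        self._versions = {}

    def add_version(self, core_id, version_uid, definition):
        """Record a new version of a core term."""
        self._versions.setdefault(core_id, []).append((version_uid, definition))

    def latest(self, core_id):
        """Return the most recent (version_uid, definition) for a core term."""
        return self._versions[core_id][-1]

    def lookup(self, core_id, version_uid):
        """Return the definition of one specific version."""
        for uid, definition in self._versions[core_id]:
            if uid == version_uid:
                return definition
        raise KeyError(version_uid)

registry = TermRegistry()
registry.add_version("ex:broader", "ex:broader/v1", "transitive")
registry.add_version("ex:broader", "ex:broader/v2", "not transitive")
```

An application that wants to track the term as it evolves would call `latest("ex:broader")`; one that depends on the original meaning would pin `ex:broader/v1` and keep working after the second version is published.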

 
  dd. Any created relationship that uses a 'most current' URI, should be timestamped to allow review of the historical state of the members of the triple (but note that this is strictly for understanding, since the selection of the 'most current' URI as the referenced concept explicitly permits changes to happen in that resource)

Timestamping is useful, but could be expensive.
 
  ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts 

When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
 
  ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.


Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically when the most up-to-date version is wanted. Otherwise, people can look into new versions on a case-by-case basis.
 

So two easy conclusions:

Yes, it is terrible for the semantics of a (nominally static) concept to change, and that concept's URI to remain the same. That breaks everything, as near as I can tell.

Agreed.
 

In the case of a subject/object term, it is clearly acceptable for the semantics of a _dynamic_ concept to change without changing the URI. 

Well the core URI/UID can stay the same, but each version needs to have its own UID so applications that want to use old versions don't break. 

There may be some clear-cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.

 

I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.

I don't follow this analogy.
 
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.

You have a strong intuition that I'm not able to grasp.  Can you articulate why with an example?
 

As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan.  The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.

This is true, and it is the reason why terms/words in wordnet belong to multiple synsets. Each synset has a unique meaning, and in the owl dataset, each synset has its own URI. So I don't find your argument convincing. Multiple contexts show different uses of a term, so each use should get a different UID, not the same one.
 
So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context. 

Maybe the wordnet example is a red herring. In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?

Here is one example where it is clearly a bad thing.
The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality. It also loads and makes use of data using the ontology.
T1: application loads ontology using original terms.
T2: application loads data expressed using the original terms
T3: all new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics. 
T4: A new dataset is created which uses the new URIs
T5: The application loads the new data
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well.

This is clearly a bad thing.   Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.
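
The T1-T7 timeline can be sketched in a few lines of Python. The class URIs and instance names are made up for illustration, and the example assumes a class whose meaning did not change between versions.

```python
# Hypothetical URIs; "Sensor" has the same meaning in both versions.
OLD_SENSOR = "ex:v1/Sensor"
NEW_SENSOR = "ex:v2/Sensor"

store = set()
# T2: data expressed using the original terms.
store |= {("s1", "rdf:type", OLD_SENSOR), ("s2", "rdf:type", OLD_SENSOR)}
# T4/T5: a new dataset uses the newly minted URI and is loaded.
store |= {("s3", "rdf:type", NEW_SENSOR)}

def instances_of(store, class_uri):
    """T6: filter the store by a single class URI."""
    return {s for s, p, o in store if p == "rdf:type" and o == class_uri}

# T7: the application still queries with the old URI.
found = instances_of(store, OLD_SENSOR)
```

`found` contains only `s1` and `s2`. The query silently misses `s3`, even though nothing about the meaning of the class changed.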



Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics. 

This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?
 
The service described in (ee) needs this capability. But I think sameAs doesn't apply here, as the two URIs actually reflect two different resources, which are definitionally and semantically equal, but live in a different context.) 

I still can't see any advantages for creating multiple copies of exactly the same thing.
Have I missed something?
 
I imagine we will have to create a relationship for our own use that has this meaning for now.  

We probably will need some new infrastructural primitives, to relate versions to each other.
 

If you just can't stand all those URIs that have the same semantics, and you told me I had to use the original URI that had that meaning, I would say 'ok' -- then, to meet principle (C), I would create URIs that dereference to the original URI, so that when people get confused and use the (wrong, non-existent) URI that corresponds to that term in the current version of the vocabulary, at least I could respond with useful information.

This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible.

See another thread I started on similar topic by googling
["proliferation of URIs" uschold]
 

(Yes, I know this emphasizes why URIs should be opaque, and I'm afraid in this respect I am consciously doing the 'wrong' thing by making my URI algorithm all too obvious. The value added by a semantic URI is just too compelling, for the success of the project and semantic adoption in general.)

John




John

--------------
John Graybeal   <graybeal@...>  -- 831-775-1956 
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org   



Re: URIs and Unique IDs

by John Graybeal

Michael,

I will try to be clearer -- your confusion was my fault, sorry.  Appreciate very much your comments (and charitable interpretations, n.b. 'strong intuition' :->!).

On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:

Agreed.  I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here, if not then what do you mean by a vocabulary?

yes, I used 'vocabulary' to reflect what our customers have, but an OWL ontology is what we will generate.
  B. A vocabulary contains all the terms within it, not just the terms that changed in that version
Here is my folksy perspective behind the model (more justifications near the end):  If I say to a user "Here is a vocabulary dated X", the user will assume that all the terms come with that vocabulary, and the terms of that vocabulary are also dated X. So can I build a working semantic approach that accepts this assumption?

So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.

No, sorry, I was sloppy and used 'terms' and 'URIs' interchangeably. Here is the easy part: You can assume that if anything in the specification of a term changes while its string of characters remains the same, I will insist on a new URI for that term (the version string will do nicely to discriminate). And if the string of characters for a term changes, that will be a new URI too.

I am using  'term' to mean 'a string of characters that likely, but not necessarily, means something to a human'.  So codes and opaque terms are OK.  For most ontologies we'll create, terms will be words and word phrases.

Anticipating your later comments, we concluded (you won't like this at all):
 1. a URI is a suitable UID
 2. a term can be part of a suitable URI
The first is argued elsewhere by others.

Re the second:  Since what I really wanted to do was give people a way to say "here's what this string of characters means", it doesn't bother me that the same string of characters may mean something else later -- I need the UID not for the _concept_, but for the unique string of characters. That will always be the string of characters I want that UID to refer to. So making the UID a URL that embeds the string of characters was acceptable.

I think I understand the concerns about non-opaque and non-persistent URLs, and believe that those costs are relatively low compared to the resulting early adopter benefits of this approach. [1]

This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.

I am unconvinced de-conflation can happen, at least in our lifetimes, which is why I made some of those horrible linkages above.
D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
I read this that a term like 'broader' in SKOS could have multiple URIs for multiple versions.  If this is what you mean, then I absolutely agree with this. 

Yes, this is what I mean, but keep in mind my previous conflation.

Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, and that its various meanings are versions linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?

Oh, I sure hope so. (Well, new relationships may be needed. Not an expert here.)  We are doing it a shade outside of the 'strict infrastructure', if there is such a thing -- our server will try to be smart about the relationships between vocabulary versions (well, it has to be, to make sure the version relationships are maintained).
  bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable position. I think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.

For 1, you do NOT want to change the name of the term, which was and is the right term. But you DO want to change its UID because it is a different thing.

For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader.

yes to all the above, well put.

You should be able to change the name w/o changing the UID.

Well, OK, maybe.  Not for my own vocabularies, because those are trying to define strings, not concepts. As you can see, I am hung up on which thing someone has in mind when they say the name -- is it the concept behind the name, or the name itself? I find it a lot easier to consider the name the resource of interest, and if someday my 'inflammable' is redefined to mean flammable, then my ontology will be exactly as wrong as all the books that used the 'old' definition of the word.  (At least until I redefine the word. Sure hope everyone is using timestamps. :->)
  cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.)  Those who choose to use the 'most current' term will get what they pay for.

You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item. This is a promising idea that could probably keep everyone happy.
  ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts 
When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
  ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically when the most up-to-date version is wanted. Otherwise, people can look into new versions on a case-by-case basis.

Yes to all the above, and to the 'timestamps may be expensive' also.  I am worried about expense, but suspect I won't be able to tell for a while how resource-intensive this will be, and whether optimization will take care of it, and whether I still will be paid to "keep this problem solved."  But I plan as if I will...
There may be some clear-cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.
Maybe.  If I declare the static concept is forever unvarying by definition, I don't think it would be strategic for an application to assume otherwise.

I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.
You have a strong intuition that I'm not able to grasp.  Can you articulate why with an example?

OK, my example uses 'sea surface temperature' as a subject, and 'sameAs' as a predicate. If, over time, the concept associated with 'sea surface temperature' evolves from "any measurement of any body of sea water within a meter or so of the ocean's surface" to "an informal reference to the concept of temperature near the ocean surface (deprecated as a reference to a particular measurement)", the tools I have written may produce some less-than-ideal inferences if they assume the new definition applies to old data, or vice-versa. Even if the new definition in 100 years is "measurement of the temperature of the foam we keep on top of the ocean to keep it cool", some inferences could be faulty, but the engine won't break down.

But if I've originally used 'sameAs' in mappings to mean that two concepts are analogous in certain defined ways (maybe a faulty original practice, but go with it), and then the term is redefined by general consensus to mean "refers to the exact same resource", I have some really broken results, because a key piece right in the middle of my infrastructure has changed.  If you try to change important parts in a car while it's moving, bad things can happen, even if the new part is every bit as good as the old part. If we try to change the meaning of core terms used in semantic inferencing, then all the tools and things are likely to behave oddly during the change, if not afterwards as well.
As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan.  The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.

This is true, and it is the reason why terms/words in wordnet belong to multiple synsets. Each synset has a unique meaning, and in the owl dataset, each synset has its own URI. So I don't find your argument convincing. Multiple contexts show different uses of a term, so each use should get a different UID, not the same one.

This is  a different context. Example below.
So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context. 
Maybe the wordnet example is a red herring. In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?

Here's a simple example, before giving you a detailed domain-specific example: Let's say I review a vocabulary and change 80% of the definitions. But the remaining definitions are deemed good and remain unchanged. By virtue of being part of a heavily reviewed vocabulary, these remaining original terms have gained credibility -- they are more reviewed and more trusted than they were before that version was created.

For a domain example, let's go back to sea surface temperature. 5 years ago, it meant something like "any measurement of any body of sea water within a meter or so of the ocean's surface". More recently, data managers realized that wasn't specific enough. So 5 new terms were created to precisely delineate the different kinds of sea surface temperature.

Now, if I get a set of data that uses some of these new terms to label variables, and also has an item labelled 'sea surface temperature', I can infer that the use of the broader variable meant that no more specific description could be provided.  Whereas in data from 5 years ago, I might replace the general term in many cases, by looking at other metadata to learn the more specific term. With the existence of the new terms, the old term has new connotations.

Here is one example where it is clearly a bad thing.
The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality. It also loads and makes use of data using the ontology.
T1: application loads ontology using original terms.
T2: application loads data expressed using the original terms
T3: all new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics. 

Well, this is bad but not unmanageable. Presumably a query of the 'before' and 'after' resources for those two concepts would reveal whether or not there are differences.  Or, presumably you can query the ontologies to get that info, even if you can't query the terms themselves. (Hmm, in today's semantic web a lot of times you don't have the original ontology versions either, do you?  But that would be another thing that breaks the system to some degree, you don't have any ability to validate previous inferences or see what it was like when the relationships were created, so you can't validate them independently. Sigh....)

But in any case, I accept the challenge here and say again "it only works if the new URIs can say whether they are the same semantics as a previous version."  Otherwise, I agree it's a bad thing.  

T4: A new dataset is created which uses the new URIs
T5: The application loads the new data
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well.

This is clearly a bad thing.   Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.
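
The T1-T7 failure can be made concrete with a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the example.org URIs, the `same_semantics_as` table, and the function names are assumptions, not part of any existing vocabulary or tool. It shows a literal-URI query missing the new data, and a version-aware query finding it once each new URI says which earlier URI has the same explicit semantics.

```python
# Hypothetical URIs for one unchanged term in two vocabulary versions.
OLD_SST = "http://example.org/vocab/v1/sea_surface_temperature"
NEW_SST = "http://example.org/vocab/v2/sea_surface_temperature"

# T2 and T5: records loaded from the old and the new dataset.
records = [
    {"id": "old-1", "variable": OLD_SST},
    {"id": "new-1", "variable": NEW_SST},
]

# Assumed bookkeeping: each new URI that kept its explicit definition
# points back to the previous URI with the same semantics.
same_semantics_as = {NEW_SST: OLD_SST}


def naive_query(records, uri):
    """T6/T7: match on the literal URI only."""
    return [r["id"] for r in records if r["variable"] == uri]


def equivalent_uris(uri, links):
    """Collect every URI connected to `uri` through same-semantics links."""
    found = {uri}
    changed = True
    while changed:
        changed = False
        for new, old in links.items():
            # Exactly one end of the link is known so far: add the other.
            if (new in found) != (old in found):
                found.update((new, old))
                changed = True
    return found


def version_aware_query(records, uri, links):
    """Match any URI declared to have the same semantics as `uri`."""
    wanted = equivalent_uris(uri, links)
    return [r["id"] for r in records if r["variable"] in wanted]
```

Calling `naive_query(records, OLD_SST)` only sees the old record, while `version_aware_query(records, OLD_SST, same_semantics_as)` sees both. The cost is that someone has to maintain the links, which is the burden debated in the rest of this exchange.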

One mitigation of disadvantages is obtained if most of the users map to the 'most recent version' (core concept) of the term, not specific versions. I suspect this will be likely.

Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics. 

This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?

An excellent point.  (I'm busted!)  Apparently I differentiate between explicit meanings, which one finds in the term's resource description, and implicit meanings, which one finds in larger context. The version relationship primitives have to be understood to refer to the explicit meanings only. When the definition changes explicitly, that's a URI change that no longer can be considered exactly the same concept.  

I still can't see any advantages for creating multiple copies of exactly the same thing.
Have I missed something?

The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it.  Conceptually/abstractly I suspect this may be the right way to think of a vocabulary. 

But more practically, this gives me a trivial way to generate URIs for those terms, a trivial way to capture the contents of each new version of the ontology (otherwise I have to analyze every term to decide if it is different, right?), a trivial way to explain to the user what the URI for each term will look like, and a way to tell from the term URI which vocabulary it's a part of (not that I'd ever do that to an opaque URI...).

But of course, I realize I have to go do some of these things later, in any reasonable version of the system... I just don't have to do them *instantly*....
I imagine we will have to create a relationship for our own use that has this meaning for now.  
We probably will need some new infrastructural primitives, to relate versions to each other.

Just so.

This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible.

See another thread I started on similar topic by googling
["proliferation of URIs" uschold]

Excellent, I looked at the summary post and I see things with your level of concern, perhaps more than the responders. (Though I liked Tim's quote: "So multiple URIs for the same thing is life, a constant tradeoff, but life is, on balance good.")  I would be a relatively small scale offender for a while, but a bad example.

I will leave it there, too long a post for sure.

John

[1] Our URI creation scheme is described at http://marinemetadata.org/apguides/ontprovidersguide/ontguideconstructinguris , with other details in that web neighborhood.


Re: URIs and Unique IDs

by Uschold

A short reply to the main point.

Uschold said:
I still can't see any advantages for creating multiple copies of exactly the same thing.
Have I missed something?
Graybeal said:

The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it.  Conceptually/abstractly I suspect this may be the right way to think of a vocabulary.

If you de-conflate URIs and UIDs we can have our cake and eat it too.

The new ontology is a unit with a UID that is different than the original one.
It is a bundle consisting of its component terms and definitions, and there is an ontology-has-component link that points to the UIDs of the most recent version. Done, perfect.  Humans don't create or read UIDs, machines do. Tools and names can be used to have the user see whatever you want them to see.  This scheme gives the advantage you want w/o minting new URIs for the same thing.

Methinks that the conflation of URIs and UIDs makes it hard or impossible to get this advantage unless you mint new synonym URIs.  Hence to de-conflate.
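
One way to picture the de-conflated scheme is the rough Python sketch below. It is an illustration under stated assumptions, not a proposal for real infrastructure: the class names, the `release` function, and the use of random UUIDs as opaque UIDs are all hypothetical. Each ontology version gets its own UID and carries all of its components, but a component whose definition did not change keeps the UID it already had.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TermVersion:
    uid: str          # opaque identifier, meant for machines only
    name: str         # human-readable label shown by tools
    definition: str   # the explicit definition of the term


@dataclass
class OntologyVersion:
    uid: str
    # "ontology-has-component" links: name -> TermVersion
    components: dict = field(default_factory=dict)


def new_uid():
    """Mint an opaque UID (assumption: a random UUID is good enough here)."""
    return uuid.uuid4().hex


def release(previous, definitions):
    """Build a new ontology version, reusing UIDs for unchanged definitions."""
    components = {}
    for name, definition in definitions.items():
        old = previous.components.get(name) if previous else None
        if old is not None and old.definition == definition:
            components[name] = old  # same thing, same UID
        else:
            components[name] = TermVersion(new_uid(), name, definition)
    return OntologyVersion(new_uid(), components)
```

For simplicity the sketch looks up the previous component by name, so it does not cover renaming a term while keeping its UID; that would need a lookup keyed on the UID instead.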

I'm convinced that something like this is the right thing to do, in principle.  Finding out how it can be done in practice will be a lot of work.
---

BTW, this discussion has inspired me to write a paper on the topic. It is probably too late to submit to the WWW conference, but I will make whatever I have available in some manner when it is ready.

Thank you for helping me clarify my ideas on this.


Michael

On Mon, Nov 3, 2008 at 5:24 AM, John Graybeal <graybeal@...> wrote:
Michael,

I will try to be clearer -- your confusion was my fault, sorry.  Appreciate very much your comments (and charitable interpretations, n.b. 'strong intuition' :->!).

On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:

Agreed.  I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here, if not then what do you mean by a vocabulary?

yes, I used 'vocabulary' to reflect what our customers have, but an OWL ontology is what we will generate.
  B. A vocabulary contains all the terms within it, not just the terms that changed in that version
Here is my folksy perspective behind the model (more justifications near the end):  If I say to a user "Here is a vocabulary dated X", the user will assume that all the terms come with that vocabulary, and the terms of that vocabulary are also dated X. So can I build a working semantic approach that accepts this assumption?

So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.

No, sorry, I was sloppy and used 'terms' and 'URIs' interchangeably. Here is the easy part: You can assume that if anything in the specification of a term changes while its string of characters remains the same, I will insist on a new URI for that term (the version string will do nicely to discriminate). And if the string of characters for a term changes, that will be a new URI too.

I am using  'term' to mean 'a string of characters that likely, but not necessarily, means something to a human'.  So codes and opaque terms are OK.  For most ontologies we'll create, terms will be words and word phrases.
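
As a rough illustration of a URI that embeds both a version string and the term's string of characters, here is a small Python sketch. The base URL, the path layout, and the `term_uri` function are hypothetical choices made for this example; this is not the scheme described in footnote [1].

```python
from urllib.parse import quote

# Hypothetical base; a real service would use its own namespace.
BASE = "http://example.org/vocab"


def term_uri(vocabulary, version, term):
    """Build a URI from vocabulary name, version string, and term string.

    Each part is percent-encoded so that terms made of words and word
    phrases (with spaces or slashes) still yield a single path segment.
    """
    parts = (vocabulary, version, term)
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)
```

With this layout, the same term string in two releases yields two different URIs, and a different term string in the same release also yields a different URI, which matches the two rules stated above.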

Anticipating your later comments, we concluded (you won't like this at all):
 1. a URI is a suitable UID
 2. a term can be part of a suitable URI
The first is argued elsewhere by others.

Re the second:  Since what I really wanted to do was give people a way to say "here's what this string of characters means", it doesn't bother me that the same string of characters may mean something else later -- I need the UID not for the _concept_, but for the unique string of characters. That will always be the string of characters I want that UID to refer to. So making the UID a URL that embeds the string of characters was acceptable.

I think I understand the concerns about non-opaque and non-persistent URLs, and believe that those costs are relatively low compared to the resulting early adopter benefits of this approach. [1]

This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.

I am unconvinced de-conflation can happen, at least in our lifetimes, which is why I made some of those horrible linkages above.
D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
I read this that a term like 'broader' in SKOS could have multiple URIs for multiple versions.  If this is what you mean, then I absolutely agree with this. 

Yes, this is what I mean, but keep in mind my previous conflation.

Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, and that its various meanings, as related versions, are linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?

Oh, I sure hope so. (Well, new relationships may be needed. Not an expert here.)  We are doing it a shade outside of the 'strict infrastructure', if there is such a thing -- our server will try to be smart about the relationships between vocabulary versions (well, it has to be, to make sure the version relationships are maintained).
  bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable position. I think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.

For 1, you do NOT want to change the name of the term; it was and is the right term.  But you DO want to change its UID because it is a different thing.

For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader.

yes to all the above, well put.

You should be able to change the name w/o changing the UID.

Well, OK, maybe.  Not for my own vocabularies, because those are trying to define strings, not concepts. As you can see, I am hung up on which thing someone has in mind when they say the name -- is it the concept behind the name, or the name itself? I find it a lot easier to consider the name the resource of interest, and if someday my 'inflammable' is redefined to mean flammable, then my ontology will be exactly as wrong as all the books that used the 'old' definition of the word.  (At least until I redefine the word. Sure hope everyone is using timestamps. :->)
  cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.)  Those who choose to use the 'most current' term will get what they pay for.

You might be able to have a programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item. This is a promising idea that could probably keep everyone happy.
  ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts 
When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
  ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically in the case where the most up-to-date version is wanted. Otherwise people can look into new versions on a case-by-case basis.
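
The 'most current' lookup and the subscription idea can be sketched together in a few lines of Python. This is only a toy in-memory model under stated assumptions: the `CoreTermRegistry` class, its method names, and the notion of a "core UID" key are hypothetical, and no existing semantic web service is implied.

```python
from collections import defaultdict


class CoreTermRegistry:
    """Tracks the versions published under each core-term UID."""

    def __init__(self):
        # core UID -> version URIs, oldest first
        self._versions = defaultdict(list)
        # core UID -> callbacks to notify when a new version appears
        self._subscribers = defaultdict(list)

    def subscribe(self, core_uid, callback):
        """Register a callback invoked as callback(core_uid, version_uri)."""
        self._subscribers[core_uid].append(callback)

    def publish(self, core_uid, version_uri):
        """Record a new version and notify subscribers of that core term."""
        self._versions[core_uid].append(version_uri)
        for callback in self._subscribers[core_uid]:
            callback(core_uid, version_uri)

    def latest(self, core_uid):
        """Resolve the core term to its most recently published version."""
        versions = self._versions.get(core_uid)
        if not versions:
            raise KeyError(f"no versions published for {core_uid}")
        return versions[-1]

    def history(self, core_uid):
        """Return all versions, so past meanings stay addressable."""
        return list(self._versions.get(core_uid, []))
```

An application that wants a fixed meaning would store a specific version URI from `history`, while one that wants to track changes would store the core UID and call `latest`, or subscribe. The expense concern raised here shows up as the size of the per-term history and the number of notifications.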

Yes to all the above, and to the 'timestamps may be expensive' also.  I am worried about expense, but suspect I won't be able to tell for a while how resource-intensive this will be, and whether optimization will take care of it, and whether I still will be paid to "keep this problem solved."  But I plan as if I will...
There may be some clear-cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot of cases (perhaps most) will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.
Maybe.  If I declare the static concept is forever unvarying by definition, I don't think it would be strategic for an application to assume otherwise.

I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.
You have a strong intuition that I'm not able to grasp.  Can you articulate why with an example?

OK, my example uses 'sea surface temperature' as a subject, and 'sameAs' as a predicate. If, over time, the concept associated with 'sea surface temperature' evolves from "any measurement of any body of sea water within a meter or so of the ocean's surface" to "an informal reference to the concept of temperature near the ocean surface (deprecated as a reference to a particular measurement)", the tools I have written may produce some less-than-ideal inferences if they assume the new definition applies to old data, or vice-versa.  Even if the new definition in 100 years is "measurement of the temperature of the foam we keep on top of the ocean to keep it cool", some inferences could be faulty, but the engine won't break down.




RE: URIs and Unique IDs

by tim.glover

 
I agree with Michael Lang, who says that the community (or architect) should decide how words are used in an ontology, and should agree on changes.  Common sense suggests to me that the reason for changing the semantics of a word is to correct an error, in which case the same word should be used, and existing systems will be freed from the error. I cannot think of a good reason for changing the meaning of a word in the context of an ontology otherwise.
 
But I think it's important to recognise that in most real systems there are different levels of semantics.
 
- Firstly there are some "keywords" in OWL, whose semantics is defined by W3C and implemented by reasoning engine builders.
 
- Secondly there will be some words that are not defined as part of "OWL" but which are recognised as "keywords" by particular software systems  ("if  x is a member of VIRAL_INFECTIONS do y").  In these cases the software and the ontology are strongly bound together.
 
- Thirdly there will be aspects of the ontology which are "data driven", in that they are handled in a general way by the software ("find the broader terms of a term").
 
- Fourthly, there may also be a distinction between "A box" and "T box" words.
 
Moreover, a word may be significant to the software in one system, but handled in a "data driven" way by a different system.
 
It seems to me it is not at all clear cut to decide the best way to modify these ontologies.



From: semantic-web-request@... [mailto:semantic-web-request@...] On Behalf Of John Graybeal
Sent: 03 November 2008 04:25
To: Michael F Uschold
Cc: semantic-web@...; aldo.gangemi@...; Conor Shankey; Peter Mika; Ora Lassila; Pan, Dr Jeff Z.; Tim Berners-Lee; Frank van Harmelen; sean.bechhofer@...
Subject: Re: URIs and Unique IDs

Michael,

I will try to be clearer -- your confusion was my fault, sorry.  Appreciate very much your comments (and charitable interpretations, n.b. 'strong intuition' :->!).

On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:

Agreed.  I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here, if not then what do you mean by a vocabulary?

yes, I used 'vocabulary' to reflect what our customers have, but an OWL ontology is what we will generate.
  B. A vocabulary contains all the terms within it, not just the terms that changed in that version
Here is my folksy perspective behind the model (more justifications near the end):  If I say to a user "Here is a vocabulary dated X", the user will assume that all the terms come with that vocabulary, and the terms of that vocabulary are also dated X. So can I build a working semantic approach that accepts this assumption?

So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.

No, sorry, I was sloppy and used 'terms' and 'URIs' interchangeably. Here is the easy part: You can assume that if anything in the specification of a term changes while its string of characters remain the same, I will insist on a new URI for that term (the version string will do nicely to discriminate). And if the string of characters for a term changes, that will be a new URI too.

I am using  'term' to mean 'a string of characters that likely, but not necessarily, means something to a human'.  So codes and opaque terms are OK.  For most ontologies we'll create, terms will be words and word phrases.

Anticipating your later comments, we concluded (you won't like this at all):
 1. a URI is a suitable UID
 2. a term can be part of a suitable URI
The first is argued elsewhere by others.

Re the second:  Since what I really wanted to do was give people a way to say "here's what this string of characters means", it doesn't bother me that the same string of characters may mean something else later -- I need the UID not for the _concept_, but for the unique string of characters. That will always be the string of characters I want that UID to refer to. So making the UID a URL that embeds the string of characters was acceptable.

I think I understand the concerns about non-opaque and non-persistent URLs, and believe that those costs are relatively low compared to the resulting early adopter benefits of this approach. [1]

This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.

I am unconvinced de-conflation can happen, at least in our lifetimes, which is why I made some of those horrible linkages above.
D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI
I read this that a term like 'broader' in SKOS could have multiple URIs for multiple versions.  If this is what you mean, then I absolutely agree with this. 

Yes, this is what I mean, but keep in mind my previous conflation.

Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI)  that corresponds to the core term, and that its various meanings are related versions are linked to the core term. This may be a workable idea. Can this be done with the current  semantic web infrastructure?

Oh, I sure hope so. (Well, new relationships may be needed. Not an expert here.)  We are doing it a shade outside of the 'strict infrastructure', if there is such a thing -- our server will try to be smart about the relationships between vocabulary versions (well, it has to be, to make sure the version relationships are maintained).
  bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)
This is an interesting question with more than one reasonable position. I think there are at least two cases:
1. there was a bonified conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.

For 1. you do NOT want to change the name f the term, was and is the right term.  But you DO want to change its UID because it is a different thing.

For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader.

yes to all the above, well put.

You should be able to change the name w/o changing the UID.

Well, OK, maybe.  Not for my own vocabularies, because those are trying to define strings, not concepts. As you can see, I am hung up on which thing someone has in mind when they say the name -- is it the concept behind the name, or the name itself? I find it a lot easier to consider the name the resource of interest, and if someday my 'inflammable' is redefined to mean flammable, then my ontology will be exactly as wrong as all the books that used the 'old' definition of the word.  (At least until I redefine the word. Sure hope everyone is using timestamps. :->)
  cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be true statement today, but false next week.)  Those who choose to use the 'most current' term will get what they pay for.

You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the the most recent version of that item. This is a promising idea that could probably keep everyone happy.
  ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts 
When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.
  ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.
Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes so it can keep up to date automatically in the case where the most uptodate version is wanted, and otherwise people can look into new versions on a case by case basis.

Yes to all the above, and to the 'timestamps may be expensive' also.  I am worried about expense, but suspect I won't be able to tell for a while how resource-intensive this will be, and whether optimization will take care of it, and whether I still will be paid to "keep this problem solved."  But I plan as if I will...
There may be some clear-cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot of (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others. 
Maybe.  If I declare the static concept is forever unvarying by definition, I don't think it would be strategic for an application to assume otherwise.

I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.
I don't follow this analogy.
I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.
You have a strong intuition that I'm not able to grasp.  Can you articulate why with an example?

OK, my example uses 'sea surface temperature' as a subject, and 'sameAs' as a predicate. If, over time, the concept associated with 'sea surface temperature' evolves from "any measurement of any body of sea water within a meter or so of the ocean's surface" to "an informal reference to the concept of temperature near the ocean surface (deprecated as a reference to a particular measurement)", the tools I have written may produce some less-than-ideal inferences if they assume the new definition applies to old data, or vice-versa.  Even if the new definition in 100 years is "measurement of the temperature of the foam we keep on top of the ocean to keep it cool", some inferences could be faulty, but the engine won't break down.

But if I've originally used 'sameAs' in mappings to mean that two concepts are analogous in certain defined ways (maybe a faulty original practice, but go with it), and then the term is redefined by general consensus to mean "refers to the exact same resource", I have some really broken results, because a key piece right in the middle of my infrastructure has changed.  If you try to change important parts in a car while it's moving, bad things can happen, even if the new part is every bit as good as the old part. If we try to change the meaning of core terms used in semantic inferencing, then all the tools and things are likely to behave oddly during the change, if not afterwards as well.
As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan.  The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.

This is true, and the reason why terms/words in wordnet belong to multiple synsets. Each synset has a unique meaning, and in the owl dataset, each synset has its own URI. So I don't find your argument convincing.  Multiple context shows different uses of a term, so each use should get a different UID, not the same one.

This is  a different context. Example below.
So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context. 
Maybe the wordnet example is a red herring.  In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?

Here's a simple example, before giving you a detailed domain-specific example: Let's say I review a vocabulary and change 80% of the definitions. But the remaining definitions are deemed good and remain unchanged.  By virtue of being part of a heavily reviewed vocabulary, these remaining original terms have gained credibility -- they are more reviewed and more trusted than they were before that version was created.

For a domain example, let's go back to sea surface temperature.  5 years ago, it meant something like "any measurement of any body of sea water within a meter or so of the ocean's surface".  More recently, data managers realized that wasn't specific enough. So 5 new terms were created to precisely delineate the different kinds of sea surface temperature.

Now, if I get a set of data that uses some of these new terms to label variables, and also has an item labelled 'sea surface temperature', I can infer that the use of the broader variable meant that no more specific description could be provided.  Whereas in data from 5 years ago, I might replace the general term in many cases, by looking at other metadata to learn the more specific term. With the existence of the new terms, the old term has new connotations.

Here is one example where it is clearly a bad thing.
The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality. It also loads and makes use of data using the ontology.
T1: application loads ontology using original terms.
T2: application loads data expressed using the original terms
T3: all new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics. 

Well, this is bad but not unmanageable. Presumably a query of the 'before' and 'after' resources for those two concepts would reveal whether or not there are differences.  Or, presumably you can query the ontologies to get that info, even if you can't query the terms themselves. (Hmm, in today's semantic web a lot of times you don't have the original ontology versions either, do you?  But that would be another thing that breaks the system to some degree, you don't have any ability to validate previous inferences or see what it was like when the relationships were created, so you can't validate them independently. Sigh....)

But in any case, I accept the challenge here and say again "it only works if the new URIs can say whether they are the same semantics as a previous version."  Otherwise, I agree it's a bad thing.  

T4: A new dataset is created which uses the new URIs
T5: The application loads the new data
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well.

This is clearly a bad thing.   Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.
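
The T1 through T7 timeline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the example.org namespaces, the record layout, and the same_semantics table are made up, and the table stands in for whatever mechanism a publisher might use to declare that a new URI keeps the semantics of an old one.

```python
OLD = "http://example.org/vocab/v1/"  # hypothetical original namespace
NEW = "http://example.org/vocab/v2/"  # hypothetical namespace minted at T3

# T2: data expressed with the original URIs.
old_data = [("buoy-1", OLD + "sea_surface_temperature", 14.2)]
# T4/T5: data expressed with the newly minted URIs.
new_data = [("buoy-2", NEW + "sea_surface_temperature", 15.1)]
loaded = old_data + new_data


def query(records, term_uris):
    """Return the records whose term URI is one of term_uris."""
    return [record for record in records if record[1] in term_uris]


# T6/T7: filtering on the old URI silently misses the new dataset.
naive = query(loaded, {OLD + "sea_surface_temperature"})

# If the publisher states which new URIs keep the old semantics,
# the application can expand the query before running it.
same_semantics = {
    OLD + "sea_surface_temperature": {NEW + "sea_surface_temperature"},
}


def expand(uri):
    """Return the URI plus every URI declared to share its semantics."""
    return {uri} | same_semantics.get(uri, set())


expanded = query(loaded, expand(OLD + "sea_surface_temperature"))

print(len(naive), len(expanded))
```

Without the same_semantics information the application cannot tell a renamed term from a redefined one, which is the core of the objection.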

One mitigation of disadvantages is obtained if most of the users map to the 'most recent version' (core concept) of the term, not specific versions. I suspect this will be likely.

Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics. 

This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?

An excellent point.  (I'm busted!)  Apparently I differentiate between explicit meanings, which one finds in the term's resource description, and implicit meanings, which one finds in larger context. The version relationship primitives have to be understood to refer to the explicit meanings only. When the definition changes explicitly, that's a URI change that no longer can be considered exactly the same concept.  

I still can't see any advantages for creating multiple copies of exactly the same thing.
Have I missed something?

The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it.  Conceptually/abstractly I suspect this may be the right way to think of a vocabulary. 

But more practically, this gives me a trivial way to generate URIs for those terms, a trivial way to capture the contents of each new version of the ontology (otherwise I have to analyze every term to decide if it is different, right?), a trivial way to explain to the user what the URI for each term will look like, and a way to tell from the term URI which vocabulary it's a part of (not that I'd ever do that to an opaque URI...).

But of course, I realize I have to go do some of these things later, in any reasonable version of the system... I just don't have to do them *instantly*....
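
To make the 'trivial way to generate URIs' point concrete, here is a small Python sketch of a version-per-vocabulary URI pattern. It is an invented pattern for illustration and does not reproduce the scheme referenced in [1]; the base URL, the segment order, and the function names are all assumptions.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "http://example.org/ont"  # hypothetical base, not a real service


def term_uri(authority, vocabulary, version, term):
    """Build a term URI that carries its vocabulary and version in the path."""
    parts = [authority, vocabulary, version, term]
    return BASE + "/" + "/".join(quote(part, safe="") for part in parts)


def vocabulary_of(uri):
    """Recover (authority, vocabulary, version) from a URI built by term_uri."""
    segments = urlsplit(uri).path.split("/")  # ["", "ont", authority, vocabulary, version, term]
    authority, vocabulary, version = (unquote(s) for s in segments[2:5])
    return authority, vocabulary, version


def release(authority, vocabulary, version, terms):
    """Mint a URI for every term of one vocabulary release, changed or not."""
    return {term: term_uri(authority, vocabulary, version, term) for term in terms}


v1 = release("lab", "parameter", "20081001", ["sea_surface_temperature", "salinity"])
v2 = release("lab", "parameter", "20081103", ["sea_surface_temperature", "salinity"])

print(v2["salinity"])
print(vocabulary_of(v2["salinity"]))
```

The convenience is visible here, and so is the cost raised elsewhere in this thread: v1 and v2 hold different URIs for 'salinity' even if its definition never changed.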
I imagine we will have to create a relationship for our own use that has this meaning for now.  
We probably will need some new infrastructural primitives, to relate versions to each other.

Just so.

This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible.

See another thread I started on similar topic by googling
["proliferation of URIs" uschold]

Excellent, I looked at the summary post and I see things with your level of concern, perhaps more than the responders. (Though I liked Tim's quote: "So multiple URIs for the same thing is life, a constant tradeoff, but life is, on balance good.")  I would be a relatively small scale offender for a while, but a bad example.

I will leave it there, too long a post for sure.

John

[1] Our URI creation scheme is described at http://marinemetadata.org/apguides/ontprovidersguide/ontguideconstructinguris , with other details in that web neighborhood.


Re: URIs and Unique IDs

by Dan Brickley-2


Michael F Uschold wrote:
>
> BTW, this discussion has inspired me to write a paper on the topic.
> Probably too late to submit to WWW conference, but I will make whatever
> I have available in some manner when it is ready

Great, I look forward to seeing it. Can you post a copy to semantic-web
when it's ready?

BTW - and perhaps this is a bit cheeky - but may I take this opportunity
to suggest (to you and others) that titles like 'URI Crisis' are a little
overly-dramatic. I'd rather see dull titles like 'URI - Mild Nuisance',
'URIs don't solve everything' or 'URIs Provide Opportunity for Further
Clarity and Best Practice'. The use of URIs provides mostly syntax and
some machinery around decentralised control; it also quite naturally and
inevitably provides many many ways to screw up. This doesn't constitute
a crisis any more than the fact that Unicode allows people to write
illogical things or bad poetry.  Or that UML and OWL allow bad or vague
conceptual models to acquire the outer trappings of formality.

I find the talk of URI and identity crisis a little alarmist, and I fear
they're one of the factors that put people off from approaching this
technology. Are we really in a crisis situation? Should I stop or start
doing something asap?

cheers,

Dan

--
http://danbri.org/


Re: URIs and Unique IDs

by John Graybeal



On Nov 3, 2008, at 12:21 AM, Michael F Uschold wrote:

> Humans don't create or read UIDs, machines do. Tools and names can  
> be used to have the user see whatever you want them to see.  This  
> scheme gives the advantage you want w/o minting new URIs for the  
> same thing.

Well, this is probably the nub of the different choices.  In the  
domain I work in, humans -- often assisted by machines, often not --  
create the vast majority of both UIDs and URIs, and there are precious  
few tools and systems supporting the former. (By 'supporting' I mean  
creating the association between the human-centric data that keys the  
UID, and always providing the right human-centric data whenever the  
UID surfaces.)

In marine science at least, this is just Not Going To Happen in any  
pervasive way for quite a while.  So if I want human acceptance of  
semantics now, regretfully, I'm going to have to conflate.

In each of our cases, we will be spending time trying to make this  
work. Along those lines, I will be thinking very hard about how to  
avoid the creation of semantically duplicate URIs in our system -- I  
welcome lobbying (either way) from others regarding the value of this.  
(I can summarize off-list comments.)  Also I look forward to the  
paper, I am sure I will learn from it.  Thanks.

John



Re: URIs and Unique IDs

by Uschold

Thanks for the comments, Dan.
 
I expect that for most people it is indeed a mild nuisance or similar.
 
For me on my job, it is a major headache and it is getting worse, not better.   When the new version of Wordnet came out, we were faced with choices:
1. Go through a long and painful process to 'do it right', by finding out which URIs should be the same and which ones different: reverse engineering what we wish had been done in the first place (keeping the same URIs for things that do not change).
2. Go through the process of changing our application in the appropriate places where any of the old URIs were used.
3. Don't bother upgrading to the new version.
 
None are attractive.
 
So I can agree that it is not a crisis now, but things are deteriorating, not improving.   I was distressed to learn what SKOS was contemplating. 
So I am arguing that there is a looming crisis.
 
A few months ago there was no financial crisis. Now there is.
 
In a few months or a year or two, there could well be a full-blown URI crisis.
 
My work on this is aimed at preventing this crisis from unfolding before it is too late.
 
Michael

On Mon, Nov 3, 2008 at 12:43 PM, Dan Brickley <danbri@...> wrote:
Michael F Uschold wrote:

BTW, this discussion has inspired me to write a paper on the topic. Probably too late to submit to WWW conference, but I will make whatever I have available in some manner when it is ready

Great, I look forward to seeing it. Can you post a copy to semantic-web when it's ready?

BTW - and perhaps this is a bit cheeky - but may I take this opportunity to suggest (to you and others) that titles like 'URI Crisis' are a little overly-dramatic. I'd rather see dull titles like 'URI - Mild Nuisance', 'URIs don't solve everything' or 'URIs Provide Opportunity for Further Clarity and Best Practice'. The use of URIs provides mostly syntax and some machinery around decentralised control; it also quite naturally and inevitably provides many many ways to screw up. This doesn't constitute a crisis any more than the fact that Unicode allows people to write illogical things or bad poetry.  Or that UML and OWL allow bad or vague conceptual models to acquire the outer trappings of formality.

I find the talk of URI and identity crisis a little alarmist, and I fear they're one of the factors that put people off from approaching this technology. Are we really in a crisis situation? Should I stop or start doing something asap?

cheers,

Dan

--
http://danbri.org/


Re: URIs and Unique IDs

by Michael Lang(Jr.)

Michael,

After reading through your thread with John, I think that we are on the same page.  I strongly believe (and it seems that you and John agree) that if a UID for a concept changes, the old version must have some way of pointing to the new version.  I think this would call for a standard property that could be monitored so that an application would know when a new version was created.  I am not sure why you left this out of your two principles.  I believe if they are followed, then this principle must also be followed to avoid disrupting applications that make use of the concept.  

In the case of Wordnet, had they followed your two principles, plus created pointers to new UIDs using a standard property, you would be able to build your application so that it could automatically migrate to a new version of Wordnet.
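
A sketch of what such automatic migration might look like, in Python. The REPLACED_BY table plays the role of the proposed standard property; its name, the UIDs, and the helper functions are hypothetical and are not real Wordnet identifiers or an existing vocabulary term.

```python
# Hypothetical "replaced by" pointers published alongside each new release:
# only terms whose semantics changed get a new UID and an entry here.
REPLACED_BY = {
    "wn/v1/synset-bank-2": "wn/v2/synset-bank-2",
    "wn/v2/synset-bank-2": "wn/v3/synset-bank-2",
}


def latest(uid, replaced_by):
    """Follow successor pointers until a UID with no successor is reached."""
    seen = {uid}
    while uid in replaced_by:
        uid = replaced_by[uid]
        if uid in seen:
            raise ValueError(f"cycle in version pointers at {uid}")
        seen.add(uid)
    return uid


def migrate(uids, replaced_by):
    """Map every UID an application uses to its most recent successor."""
    return [latest(uid, replaced_by) for uid in uids]


app_uids = ["wn/v1/synset-bank-1", "wn/v1/synset-bank-2"]
migrated = migrate(app_uids, REPLACED_BY)
print(migrated)
```

UIDs with no entry in the table are left alone, which matches the second principle: a term whose semantics did not change keeps its ID. Whether an application should follow a pointer automatically, or ask a human first because the meaning changed, is a separate policy question.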

Anyway, I look forward to your paper.  My company, Revelytix, produces a web-based collaborative ontology editor and we currently do not really support versioning, simply because it has not been clear which direction we should go.  We would be very happy to see the semantic web community reach consensus on some basic principles, it would make things much easier for us.

Michael Lang


On Sat, Nov 1, 2008 at 8:31 AM, Michael F Uschold <uschold@...> wrote:
See inline comments.

On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,

I'm not sure that it's as cut and dried as:

"Thus, any ontology versioning system of the future will rely on these two principles:

1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."

There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
 
 In other words, the application wants to change its behavior when the semantics of a term are changed.  In this case, the URI should not be changed if the semantics of a term are changed. If it was changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.  

Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with a different semantics?  In this case, do you propose that the application is notified of the new meaning or, it just changes w/o notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web. 

 
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology.  Different ontologies and different applications will require different approaches.  

Probably true in general, but I need some concrete examples to be convinced that willy-nilly changing of the semantics of resources is desirable.
 

Mike Lang


On Thu, Oct 30, 2008 at 5:14 AM, Michael F Uschold <uschold@...> wrote:
I'm resending this message to the semantic web discussion group for the record.

On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on how/whether to migrate to new URIs when a new version of an ontology is published. This is largely due to the fact that there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue.  Versioning information is meant to be placed on a version annotation.

However the current situation is like the wild West, and everyone will be doing different things, resulting in a mess.

Wordnet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like Wordnet. Another is to keep the exact same name space, and change the semantics of a small number of terms while keeping the same URI. A third is to keep the same URI for the unchanged terms, and mint new URIs for the terms with different semantics.

This is a problem because they have no guidelines, they are basically stumbling along in the dark.

I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.

In the current state of semantic web use, it may not matter too much what choice the SKOS team makes. This is mainly because relatively few applications will be impacted, which may be due to the fact that the applications are not driven by the ontologies.

However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that we have no good solutions yet, do we continue to throw our hands up and punt? Absolutely not, it is essential that a good precedent is set ASAP that is based on sound principles.

Here is how.

We should imagine a future where ontology versioning is handled properly and do things that are going to make things easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.

For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, then they will break if you change the semantics in mid-stream.  Imagine an application that relied on the semantics of broader as it was originally specified with transitivity.  They loaded data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible, if the same SKOS URI is used for versions with different semantics.  They are different beasts, and thus MUST have different URIs.
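
The transitivity scenario can be illustrated with a short Python sketch. It assumes, as the paragraph above argues, that the two versions of 'broader' get different URIs; the example.org URIs and the data are invented and do not reflect the actual SKOS namespaces or the working group's final decision.

```python
OLD_BROADER = "http://example.org/skos-old#broader"  # assumed transitive
NEW_BROADER = "http://example.org/skos-new#broader"  # assumed not transitive
TRANSITIVE = {OLD_BROADER}

triples = [
    ("cat", OLD_BROADER, "mammal"),
    ("mammal", OLD_BROADER, "animal"),
    ("sst", NEW_BROADER, "temperature"),
    ("temperature", NEW_BROADER, "property"),
]


def closure(triples, transitive_predicates):
    """Add inferred triples, but only for predicates declared transitive."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(inferred):
            if p not in transitive_predicates:
                continue
            for (c, q, d) in list(inferred):
                if q == p and c == b and (a, p, d) not in inferred:
                    inferred.add((a, p, d))
                    changed = True
    return inferred


result = closure(triples, TRANSITIVE)
print(("cat", OLD_BROADER, "animal") in result)
print(("sst", NEW_BROADER, "property") in result)
```

Because the predicate URI itself tells the application which rule applies, old and new datasets can be loaded together. If both versions shared one URI, the TRANSITIVE set could not distinguish them.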

Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URIs and datasets with the new URIs. This means that the datasets will not be linked as they should be; the application will treat the two different URIs for the same thing as being different.  If one wanted to go into OWL-Full, one can use owl:sameAs, but this is not very practical.  The only reasonable solution is to have the same URI for things with the same semantics.

Thus, any ontology versioning system of the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.

If either of these two guidelines is broken, then so will the ontology-driven applications of the future.
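
The two principles can be written down as a small ID-assignment routine. The Python sketch below is illustrative only: it compares definition strings as a crude stand-in for "same semantics" (deciding that in practice is the hard part), and the function name, ID format, and sample definitions are invented.

```python
def assign_ids(previous, new_definitions, version):
    """Assign unique IDs for a new release.

    previous:        term -> (uid, definition) from the prior release
    new_definitions: term -> definition for the new release
    Returns term -> (uid, definition) for the new release.
    """
    assigned = {}
    for term, definition in new_definitions.items():
        old = previous.get(term)
        if old is not None and old[1] == definition:
            # Principle 2: semantics unchanged, keep the same ID.
            assigned[term] = old
        else:
            # Principle 1: semantics changed (or term is new), mint a new ID.
            assigned[term] = (f"{term}/{version}", definition)
    return assigned


v1 = {
    "broader": ("broader/v1", "transitive hierarchy link"),
    "prefLabel": ("prefLabel/v1", "preferred label"),
}
v2_definitions = {
    "broader": "direct hierarchy link, not transitive",
    "prefLabel": "preferred label",
}
v2 = assign_ids(v1, v2_definitions, "v2")

print(v2["broader"][0], v2["prefLabel"][0])
```

Note that the old entry for "broader" is never overwritten in v1, so data created against "broader/v1" keeps its original meaning.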

These maxims hold without exception for any standards that are formally released as standards.
A question arises whether we need to hold to the same standards for standards like SKOS, which was never formally blessed.

The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard, or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.

Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI?
Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake.  The URI with the wrong semantics must keep its original unique ID.

Michael Uschold




--
Revelytix, Inc.

phone: 410-584-0009 (office)
          443-928-3782 (cell)
skype: michael.allen.lang.jr
aim: MikeJrRevelytix





Re: URIs and Unique IDs

by Uschold

Glad we are on the same page.
 
> I think this would call for a standard property that could be monitored so that an application would know when a new version was created.  I am not sure why you left this out of your two principles.
 
Silly answer: there was no room.
Real answer: I'm just getting started; they were the first two principles that came to mind.
 
As indicated in my other comments, I agree with your third principle.
 
A really big challenge will be deciding what can and should be done at the infrastructural level, vs. what should be hard and fast guidelines, vs. what should be left up to the discretion of users and developers.
 
Michael

On Mon, Nov 3, 2008 at 7:48 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,

After reading through your thread with John, I think that we are on the same page.  I strongly believe (and it seems that you and John agree) that if a UID for a concept changes, the old version must have some way of pointing to the new version.  I think this would call for a standard property that could be monitored so that an application would know when a new version was created.  I am not sure why you left this out of your two principles.  I believe if they are followed, then this principle must also be followed to avoid disrupting applications that make use of the concept.  

In the case of Wordnet, had they followed your two principles, plus created pointers to new UIDs using a standard property, you would be able to build your application so that it could automatically migrate to a new version of Wordnet.

Anyway, I look forward to your paper.  My company, Revelytix, produces a web-based collaborative ontology editor and we currently do not really support versioning, simply because it has not been clear which direction we should go.  We would be very happy to see the semantic web community reach consensus on some basic principles, it would make things much easier for us.

Michael Lang


On Sat, Nov 1, 2008 at 8:31 AM, Michael F Uschold <uschold@...> wrote:
See inline comments.

On Thu, Oct 30, 2008 at 4:20 PM, Michael Lang(Jr.) <michaelallenlang@...> wrote:
Michael,

I'm not sure that its as cut and dry as:

"Thus, any ontology versioning system of the future will rely on these two principles:

1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions."

There will certainly be times when an ontology-driven application is purposely dependent on the evolution of the semantics of a term.
 
 In other words, the application wants to change its behavior when the semantics of a term are changed.  In this case, the URI should not be changed if the semantics of a term are changed. If it was changed, the application would keep functioning in its original manner instead of adapting to the new meaning of the term.  

Can you think of a clear example where the application will only do the right thing when the unique identifier (UID) for a resource ceases to be used for that [conceptual] resource and is instead used for a resource with a different semantics?  In this case, do you propose that the application is notified of the new meaning or, it just changes w/o notice? Note, I'm asking about the UID in a world where it is de-conflated from the URI and the physical location on the web. 

 
I think, in general, it should be left up to the community of users and/or managers of an ontology to communicate with each other and decide what approach to take when creating a new version of an ontology.  Different ontologies and different applications will require different approaches.  

Proably true in general, but I need some concrete examples to be convinced that willy nilly semantics changing of the semantics of resources is desirable.
 

Mike Lang


On Thu, Oct 30, 2008 at 5:14 AM, Michael F Uschold <uschold@...> wrote:
I'm resending this message to the semantic web discussion group for the record.

On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@...> wrote:
Currently there is no accepted practice on how/whether to migrate to new URIs when a new version of an ontology is published. This is largely due to the fact that there is no good technology for managing versioning, and the W3C consciously (and probably sensibly) decided not to address the issue.  Versioning information is meant to be placed on a version annotation.

However the current situation is like the wild West, and everyone will be doing different things, resulting in a mess.

Wordnet published a new version and minted all new URIs even though many or most of the entries were semantically identical.
The SKOS working group is currently considering the pros and cons of various options. One is to adopt all new URIs in a new namespace, just like Wordnet. Another is to keep the exact same name space, and change the semantics of a small number of terms while keeping the same URI. A third is to keep the same URI for the unchanged terms, and mint new URIs for the terms with different semantics.

This is a problem because they have no guidelines, they are basically stumbling along in the dark.

I believe that this is an urgent matter that needs attention to prevent a nightmare from unfolding.

In the current state of semantic web use, it may not matter to much what choice the SKOS team chooses. This is mainly relatively few applications will be impacted, which may be due to the fact that the applications are not driven by the ontologies.

However, when usage of ontologies and ontology-driven applications becomes more mainstream, the differences could be profound. Given that this issue is intimately tied up with versioning, and that we have no good solutions yet, do we continue to throw our hands up and punt? Absolutely not, it is essential that a good precedent is set ASAP that is based on sound principles.

Here is how.

We should imagine a future where ontology versioning is handled properly and do things that are going to make things easy to migrate to that future. We don't know how the versioning black box will work, but we should be able to make some clear and definitive statements about WHAT it does.

For example, in the future, ontology-driven applications will be fairly mainstream. URIs are used as unique identifiers. When applications are driven from ontologies, then they will break if you change the semantics in mid-stream.  Imagine an application that relied on the semantics of broader as it was originally specified with transitivity.  They loaded data that was created using that semantics. Then the SKOS spec changes and broader is no longer transitive. New datasets are created according to this new meaning. The application loads more data. It needs to know which data is subject to transitive closure and which is not. This is impossible, if the same SKOS URI is used for versions with different semantics.  They are different beasts, and thus MUST have different URIs.

Similarly, if SKOS mints a whole new namespace and changes all the URIs, the application also has a problem. It has datasets with the old URI and datasets with the new URIs. This means that the datasets will not be linked like they should, they will treat the two different URIs for the same thing as being different.  If one wanted to go into OWL-Full, one can use owl:sameAs, but this is not very practical.  The only reasonable solution is to have the same URI for things with the same semantics.

Thus, any ontology versioning systemof the future will rely on these two principles:
1. If the semantics of a term changes, then it needs to have a new unique ID.
2. If the semantics of a term does NOT change, then it should maintain the same ID in any future versions.

If either of these two guidelines are broken, then so will the ontology-driven applications of the future.

These maxims hold without exception for any standards that are formally released as standards.
A question arises if we need to hold to the same standards for standards like SKOS which was never formally blessed.

The practical difficulties will be the same whether the standard is blessed or not. It only really depends on whether the standard is a de facto standard,or whether it is getting significant use. If users build things and ontology producers break things through carelessness, this will hinder semantic web technology adoption.

Another question is what to do if the original standard is believed to be incorrect, and the new one is the fixed one. Can one then keep the same URI?
Again, the answer should be informed by the impact on applications. The same problems will occur if you change the semantics and keep the same URI even if you are fixing a mistake.  The URI with the wrong semantics must keep its original unique ID.

Michael Uschold




--
Revelytix, Inc.

phone: 410-584-0009 (office)
          443-928-3782 (cell)
skype: michael.allen.lang.jr
aim: MikeJrRevelytix






Re: URIs and Unique IDs

by Uschold

comments inline.

On Mon, Nov 3, 2008 at 6:17 PM, John Graybeal <graybeal@...> wrote:

On Nov 3, 2008, at 12:21 AM, Michael F Uschold wrote:

Humans don't create or read UIDs, machines do. Tools and names can be used to have the user see whatever you want them to see.  This scheme gives the advantage you want without minting new URIs for the same thing.

Well, this is probably the nub of the different choices.  In the domain I work in, humans -- often assisted by machines, often not -- create the vast majority of both UIDs and URIs, and there are precious few tools and systems supporting the former. (By 'supporting' I mean creating the association between the human-centric data that keys the UID, and always providing the right human-centric data whenever the UID surfaces.)
 
 
Agreed, and this is what needs to change.
 


In marine science at least, this is just Not Going To Happen in any pervasive way for quite a while.  
 
For social or technical reasons? If the latter, are the technical reasons fundamental and not going to change, or can technology evolve to improve things? What if the standard tools did it for you, would there still be resistance?
 
So if I want human acceptance of semantics now, regretfully, I'm going to have to conflate.
 
 
 


In each of our cases, we will be spending time trying to make this work. Along those lines, I will be thinking very hard about how to avoid the creation of semantically duplicate URIs in our system -- I welcome lobbying (either way) from others regarding the value of this. (I can summarize off-list comments.)  Also I look forward to the paper, I am sure I will learn from it.  Thanks.

John



Re: URIs and Unique IDs

by John Graybeal


On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:

>  I strongly believe (and it seems that you and John agree) that if a  
> UID for a concept changes, the old version must have some way of  
> pointing to the new version.

Funny, I would have said this the other way around (new points back to  
old, then the system services can provide the old -> new capability --  
or is this what you are saying too?).  I have this notion that *any*  
change to a static resource's specifications -- definition, metadata,  
semantics -- makes a new resource (this lets me compare resource_new  
to resource_old and see the difference between them unambiguously).

With this vision, the resource can't change once it is created, even  
to point to a new resource (you see the problem).  Is this vision just  
plain wrong, per the consensus?

On Nov 3, 2008, at 3:11 PM, Michael F Uschold wrote:
> In marine science at least, this is just Not Going To Happen in any  
> pervasive way for quite a while.
> For social or technical reasons? If the latter, are the technical
> reasons fundamental and not going to change, or can technology
> evolve to improve things? What if the standard tools did it for you,
> would there still be resistance?

A bit of both. The difficulty is that the community uses every tool  
and standard in the universe (!), many of them custom and one-off  
programs, most of them severely non-semantic, and many not very  
sophisticated, in this context at least.  So it isn't like we change  
"the standard tools" because there are no standard tools. And the cost  
of making the changes (assuming we agree on all the changes to make)  
is high compared to the funds available to make the changes, and the  
larger community just does not see the need (yet).  Yes we can address  
the semantic part, but we need a major consensus on broad approaches  
to have that attitude impact actual community usage. (Working on it. :->)

John

--------------
John Graybeal   <mailto:graybeal@...>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org



Re: URIs and Unique IDs

by Toby Inkster-4


John Graybeal wrote:

> Funny, I would have said this the other way around (new points back to
> old, then the system services can provide the old -> new capability --
> or is this what you are saying too?).

And lo, it already exists:

        http://www.w3.org/2006/link#obsoletes

The only tiny little thing that we need is widespread usage and  
support for it.
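
A rough Python sketch of how a system service could derive the old -> new lookup from new -> old links. The assumption here is that statements take the form (new, obsoletes, old); the term URIs and function names are hypothetical.

```python
OBSOLETES = "http://www.w3.org/2006/link#obsoletes"


def successor_index(triples):
    """Invert new -> old 'obsoletes' links into an old -> new lookup."""
    return {old: new for new, predicate, old in triples if predicate == OBSOLETES}


def latest(uri, successors):
    """Follow old -> new links until reaching a URI that nothing obsoletes."""
    seen = {uri}
    while uri in successors:
        uri = successors[uri]
        if uri in seen:  # guard against accidental cycles in the data
            raise ValueError("cycle in obsoletes links")
        seen.add(uri)
    return uri


# Hypothetical term URIs; each new version points back to the one it replaces.
V1 = "http://example.org/terms/v1#depth"
V2 = "http://example.org/terms/v2#depth"
V3 = "http://example.org/terms/v3#depth"
links = [(V2, OBSOLETES, V1), (V3, OBSOLETES, V2)]

successors = successor_index(links)
current = latest(V1, successors)
```

The old resources are never modified; only the newer ones carry the link, and the index is computed on demand.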

--
Toby A Inkster
<mailto:mail@...>
<http://tobyinkster.co.uk>





Re: URIs and Unique IDs

by Michael Lang(Jr.)

Peter,

I agree 100% with your assessment.  In the semantic web world, I believe that versioning will not be very important.  I think a major benefit of using semantic web technologies is that you can build an application that will adapt to changes in the semantics of a word as the semantics change in the real world.  

But, as you said, there may be cases where, at a significant point in time, a community would like to version its vocabulary.  The goal of this discussion is simply to develop some guidelines for versioning, when it is necessary, that will make the transition from a past version of a vocabulary to a new one as easy, accurate, and flexible as possible for the users of a vocabulary.

Mike Lang

On Tue, Nov 4, 2008 at 1:41 AM, Peter Ansell <ansell.peter@...> wrote:

----- "John Graybeal" <graybeal@...> wrote:

> On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
>
> >  I strongly believe (and it seems that you and John agree) that if a
>
> > UID for a concept changes, the old version must have some way of
> > pointing to the new version.
>
> Funny, I would have said this the other way around (new points back to
> old, then the system services can provide the old -> new capability --
> or is this what you are saying too?).  I have this notion that *any*
> change to a static resource's specifications -- definition, metadata,
> semantics -- makes a new resource (this lets me compare resource_new
> to resource_old and see the difference between them unambiguously).
>
> With this vision, the resource can't change once it is created, even
> to point to a new resource (you see the problem).  Is this vision just
> plain wrong, per the consensus?

Should we really focus on a "ya just never know, do ya" philosophy that hurts the majority of casual users more than it helps the specialised users? If you make up a system where you require that people manually migrate all their past statements in order to use the system in a month's time then you won't be looked upon too favourably. And if you give them the choice to mass migrate their statements then what is the point if they always select "migrate all to most current versions"?

This is a very radical discussion that I don't think fits the majority of use cases that the semantic web will be applied to, as it is decidedly anti Web-2.0, where there is constant evolution and links are relative, not static as in Web-1.0. If you really face it, meaning migrates, and the particular structure at a given instant in time isn't as relevant as the improvement in meaning anyway. If rules in the semantic web are completely reliant on data structures and unable to recognise the overall meaning that people gradually migrate towards, then they are always going to be brittle, whether people are perfectly pedantic about UIDs and/or URIs or whether they end up referencing everything with relative addresses which don't focus on particular representations at particular points in time.

It isn't bad to version information at significant points in time, but the archaic once-published-always-published-never-modified culture doesn't fit with electronic technologies IMO.

(Just a few thoughts :) )

Cheers,

Peter




RE: URIs and Unique IDs

by Booth, David (HP Software - Boston)


> On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
> [ . . . ]  I have this notion that *any*
> change to a static resource's specifications -- definition, metadata,
> semantics -- makes a new resource (this lets me compare resource_new
> to resource_old and see the difference between them unambiguously).

I think that's one good view that's right for some applications, but not the end of the story.  For applications that can live with more instability, modifying a resource specification in place is the best solution.  For others needing more certainty, the strategy that you describe is best.  The most important thing is to clearly state the change policy of the resource specification.

>
> With this vision, the resource can't change once it is created, even
> to point to a new resource (you see the problem).  Is this vision just
> plain wrong, per the consensus?

I don't think this needs to be a problem, because a static resource specification could include the URL of an external document that can be updated without modifying the static resource specification.  The URL of the external document *would* be a part of the static resource specification, but the content of the document at that URL would *not* be a part of the static resource specification.
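
As a loose illustration of that arrangement in Python: the class name, field names, and URLs below are hypothetical, and a dictionary stands in for the web. The specification object is immutable, while the document behind its updates_url can change freely.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ResourceSpec:
    """A static resource specification: nothing here changes after creation."""
    uri: str
    definition: str
    updates_url: str  # the URL is part of the spec; the document behind it is not


# Stand-in for the web: content served at external URLs, which may change.
external_documents = {}

spec = ResourceSpec(
    uri="http://example.org/terms/v1#broader",
    definition="original definition text",
    updates_url="http://example.org/terms/v1/broader/updates",
)

# Later, a newer resource is announced by editing only the external document.
external_documents[spec.updates_url] = [
    "superseded by http://example.org/terms/v2#broader"
]

# The specification itself still cannot be modified in place.
try:
    spec.definition = "edited"
    spec_is_static = False
except FrozenInstanceError:
    spec_is_static = True
```

This way the old resource can "point forward" without itself being changed.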



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@...
http://www.hp.com/go/software

Statements made herein represent the views of the author and do not necessarily represent the official views of HP unless explicitly so stated.


Re: URIs and Unique IDs

by Conor Shankey-2

I strongly disagree that versioning will not be important. I suspect that it will become the most profound and challenging problem to tackle if we are to scale the application of semantic technology. Change management is less critical in the short term for those concerned with the linguistic notion of semantics. However, if you are concerned with leveraging semantic models to drive or support high-value, mission-critical systems, change management becomes a serious concern. Versioning and change management become a show stopper if you go even further and intend to create full computational semantic systems, where the algorithms and data/object models of software systems are replaced by semantic models. In each of these three areas, the level of trust in, and dependency on, the asserted semantics will become critical.

Here are a few examples:

1. Trusting semantic models or ontologies to support operational/mission systems such as:
    a. Equipment and system maintenance applications
          - A knowledge modeler/ontologist asserts that a General Electric A877623 is a subclass of Turbo Prop Engine and then, in a later version, realizes the mistake: it is a subclass of another system. The difference affects the scheduling of maintenance for aircraft.
          - A similar model asserts that a system should be overhauled if a certain condition occurs.
    b. Operational policy and compliance applications
          - A knowledge modeler asserts that a person who approves a credit rating cannot approve a loan, but in a later version of the compliance ontology realizes that the semantics need to be far more sophisticated. The difference affects the ability of the compliance system to prevent or permit fraud.
    c. Medical / bio applications
          - A biomedical ontologist asserts that a protein up-regulates a gene. Another subject matter expert asserts that the same protein down-regulates the gene. Another researcher realizes that it is important to tear down the model and express the context of the scenario to capture the conflict. The difference affects the capability of a medical diagnostic system.
    d. Intelligence systems
          - A model of a terrorist social/economic network needs to be refined so that it does not create millions of false positives.
    e. Any other system that dreams of integrating vast amounts of subject matter expertise and organizing it into something more sophisticated and operational than just a categorization system, dictionary or primitive taxonomy.

2. Simple ontologies/semantic models, but with massive adoption
    a. In one popular social networking ontology the class Person is used by millions of people. Later it becomes critical to redefine the class as a subclass of Social Contact in order to differentiate it from the animal or physical notion of Person in another widely used ontology.

3. In the longer-term vision, semantic technology driving model-driven / ontology-driven software systems
    a. Declarative, rich semantic models that explicitly describe the behaviour of parts or every aspect of a functional software system.
    b. Models that explicitly express the compatibility semantics between one software system and another so that software systems actually understand their purpose and functionality.

Systems that are more concerned with NLP or the linguistic notion of "semantics" are currently a little more resilient to change because their applications tend to use statistics or approximation to create value. Example applications would be sense disambiguation for advertising, entity extraction, etc. For these systems machine learning can help us cope with a lot of inconsistencies in semantic models. However, as these systems become more mission critical, the rationalization and harmonization of semantics between various ontologies will start to become a serious economic issue. Using the right version of various semantic models (such as Wordnet, DBPedia, etc.) will become a very challenging and painful problem. This latter area is a significant concern and area of effort/management right now.

The power of semantics can permit us to formally express and share the semantics of things explicitly or implicitly. This can ultimately help us to actually get a grip on the ugly world of change management. However, in the short term it will open a Pandora's box of power and change management problems.

Conor

Conor Shankey
CTO
Reinvent, Inc - Vancouver.com
www.Reinvent.com
www.Vancouver.com
