UMTHES and SKOS-XL

Hi everyone,

I'm posting here a discussion we started with Thomas Bandholtz on UMTHES [1] about the use of SKOS-XL there (see slides at [2]). It is a long mail, but it may be interesting for a wider audience, as UMTHES is one of the first SKOS-XL implementations!

===

Dear Thomas,

So let's go. The main issue I have is that xl:Label is used in a very "term-oriented" way in UMTHES. More precisely, I feel that you are using labels to aggregate lexical entities which indeed belong to the same "term". But these literals should be introduced as labels in basic SKOS, I think. Trying to use a concrete example from your slides:

    :4711 rdf:type skos:Concept;
        skosxl:prefLabel :wasteWater.

    :wasteWater rdf:type skosxl:Label;
        skosxl:literalForm "waste water";
        ext:lexicalVariant "wastewater";
        ext:compoundFrom (:waste :water).

"wastewater" is introduced as a lexical variant of "waste water". Per se, this is of course ok. But in basic SKOS, I would have modelled "wastewater" as a skos:altLabel or a skos:hiddenLabel of :4711. Not attaching that string to an instance of xl:Label using xl:literalForm prevents you from benefiting from the useful property chains given in XL. So I would have represented "wastewater" as an instance of xl:Label.

Of course, you may object that you can declare a property chain yourself (or several property chains) that would allow one to infer that the literals that are objects of ext:lexicalVariant triples (or of triples involving sub-properties of ext:lexicalVariant) are also objects of skos:hiddenLabel (or skos:altLabel) statements attached to the skos:Concept to which your xl:Label is attached.

But then I'd still be uncomfortable with one xl:Label giving rise to several (SKOS-basic) labels. Additionally, we actually introduced xl:labelRelation to handle cases like acronyms [1]. In your approach, acronym is a sub-property of lexicalVariant, which is clearly a different pattern from ours.

As I feel it, your choice may be perfectly grounded in terminology. Still, I'd be curious to hear whether this is a strong position of yours, or if you could accommodate a different pattern.

Maybe there can indeed be a solution accommodating both points of view (if I interpreted yours correctly, of course). Namely, introducing "wastewater" as the literalForm of an xl:Label which is not connected to any concept; just connected (by an ext:lexicalVariant which would then be a sub-property of xl:labelRelation) to :wasteWater.

Of course you can then say that the distinction between "waste water" and "wastewater" is something very important for your UMTHES and the applications you envision with it, and that "wastewater" should never be used as a basic concept label, even a hidden one. Or not even interpreted as something that could be a label...

You can also argue that the xl:Label story is quite thin in the SKOS Reference anyway, and that you can use that class as a purely technical hook for any purpose. That's indeed not far from being the truth, and if all are rightfully motivated, well, I guess we can have several ways of handling a relation such as acronymy co-exist. But well, having one of the first XL deployments departing from the meager guidelines we had put in the Reference would not be a great sign for us :-/

Apart from this issue of ext:lexicalVariant and its sub-properties, I found the rest really good, confirming my first enthusiastic reaction after your talk :-)

Two comments/questions, maybe:

1. Are you planning to add the language tags that seem to be missing on some slides (e.g. for the ext:inflection objects) in the real data?

2. Intuitively, I feel that the definition of :NonPreferredTerm (on slide 33) is too strong. I would have said that everything that is related via xl:altLabel to a concept cannot be a PreferredTerm. Otherwise there would be a conflict with the inferred basic SKOS labelling triples [2]. So the complementOf axiom would not really be needed.

But again, it's late, and I prefer to send this mail rather than letting you wait more time for my answer...

Cheers,

Antoine

[1] http://www.w3.org/2006/07/SWD/track/issues/215
[2] http://eea.eionet.europa.eu/Public/irc/envirowindows/jad/library?l=/ecoinformatics_indicator/ecoterm_5-6102009/ecoterm09-bandholtzppt/_EN_1.0_&a=d
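The compromise pattern suggested in this mail ("wastewater" as its own xl:Label, linked to :wasteWater but to no concept) could be sketched in Turtle roughly as follows. This is only an illustration, not UMTHES data: the two example.org namespace URIs, the identifier :wastewaterForm and the @en language tags are assumptions made for the sketch.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/umthes-ext#> .   # hypothetical namespace
@prefix :       <http://example.org/umthes/> .       # hypothetical namespace

# The extension property now relates two xl:Label instances
ext:lexicalVariant rdfs:subPropertyOf skosxl:labelRelation .

:4711 rdf:type skos:Concept ;
    skosxl:prefLabel :wasteWater .

:wasteWater rdf:type skosxl:Label ;
    skosxl:literalForm "waste water"@en ;
    ext:lexicalVariant :wastewaterForm .

# No concept points at this label (:wastewaterForm is a hypothetical name)
:wastewaterForm rdf:type skosxl:Label ;
    skosxl:literalForm "wastewater"@en .
```

Because no skosxl:prefLabel, skosxl:altLabel or skosxl:hiddenLabel triple points at :wastewaterForm, the SKOS-XL property chains would not turn "wastewater" into a basic SKOS label of :4711, yet the variant still carries its own literal form and language tag.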
Re: UMTHES and SKOS-XL

Ah yes. We discovered a similar problem during work on BS 8723. It was about whether to introduce a specialisation of USE/UF to cater for abbreviations/acronyms and their expansions, for which you might use tags such as AB/FT. A problem arises when the abbreviation is short for another non-preferred term rather than the preferred term. (For example, the preferred term "Information and communication technology" can have non-preferred terms "Information technology", "IT" and "ICT".) It becomes apparent that the proposed specialisation is not really a type of USE/UF. It is an inter-term relationship that can sometimes apply between non-preferred terms.

Obviously it is possible to find a way of representing this accurately, but at the expense of making the whole model more complicated and the tagging conventions more cumbersome. My personal view on this is that if you try to add more value in the shape of lexical/terminological information, you lose the virtue of simplicity. To put it another way, if you have mixed objectives (trying to achieve terminological objectives as well as enabling information retrieval) these tend to detract from each other.

Cheers
Stella

*****************************************************
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
stella@...
*****************************************************
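The "Information and communication technology" case above can be written down with SKOS-XL label relations, which is one way to let an abbreviation point at a non-preferred term. The following Turtle is a minimal sketch under assumptions: ext:abbreviationOf, the concept and label identifiers, and both example.org namespaces are hypothetical and do not come from BS 8723 or from UMTHES.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/ext#> .     # hypothetical namespace
@prefix :       <http://example.org/thes/> .    # hypothetical namespace

# Hypothetical inter-term relation, independent of USE/UF
ext:abbreviationOf rdfs:subPropertyOf skosxl:labelRelation .

:ictConcept a skos:Concept ;
    skosxl:prefLabel :infoCommTech ;
    skosxl:altLabel  :infoTech , :IT , :ICT .

:infoCommTech a skosxl:Label ;
    skosxl:literalForm "Information and communication technology"@en .

:infoTech a skosxl:Label ;
    skosxl:literalForm "Information technology"@en .

# "ICT" abbreviates the preferred term
:ICT a skosxl:Label ;
    skosxl:literalForm "ICT"@en ;
    ext:abbreviationOf :infoCommTech .

# "IT" abbreviates another non-preferred term
:IT a skosxl:Label ;
    skosxl:literalForm "IT"@en ;
    ext:abbreviationOf :infoTech .
```

The preferred/non-preferred distinction stays on the concept-to-label links, while the abbreviation relation lives purely between labels. The cost is the one named above: more resources and more links than a flat USE/UF list.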
Re: UMTHES and SKOS-XL
Dear Stella & Antoine,
Antoine has raised the essential issue; Stella came up with a related use case which can be solved using the UMTHES patterns.

UMTHES distinguishes not only prefLabel from altLabel, but also both from the multiple spelling conventions of any label. We see abbreviations/acronyms as part of such spelling conventions; others are inflectional forms of the same term, or even common misspellings. If we mixed all of this together into altLabel instances, it would not make sense any more. Stella's example about abbreviations is similar, but we separate spelling conventions ("lexical variants") from labels regardless of whether they are pref or alt.

Example:

    :4711 rdf:type skos:Concept;
        skos:prefLabel "waste water";
        skos:altLabel "sewage".

makes sense, but

    #not recommended:
    :4711 rdf:type skos:Concept;
        skos:prefLabel "waste water";
        skos:prefLabel "waste waters";
        skos:prefLabel "wastewater";
        skos:prefLabel "wastewaters";
        skos:altLabel "sewage".

looks at least somehow "unbalanced".

UMTHES knows even more about lexical complexity (a really awful issue in German). That is why we decided to use xl:Label extensions to separate such complexity from the more prominent list of labels which are directly assigned to a skos:Concept:

    # hiding lexical complexity from the list of labels
    :wasteWater rdf:type skosxl:Label;
        skosxl:literalForm "waste water";
        ext:lexicalVariant "wastewater";
        ext:lexicalVariant "wastewaters";
        ext:compoundFrom (:waste :water).

Speaking in ISO Thesaurus lingo: we do not want inflectional forms etc. to become entry terms (see http://www.w3.org/2004/02/skos/core/proposals.html#thesaurusRepresentation-11).

This is also why we really do not want to have a property chain from an ext:lexicalVariant to a skos:Concept. We appreciate the property chain from the skosxl:literalForm to the skos:Concept.

Why then do we need all those lexical variants at all? Firstly, UMTHES just has them. It is my job to serialise UMTHES in SKOS, not to change UMTHES. Secondly, we need this stuff to support automated indexing of full text documents. Machines need to be enabled to detect the Concepts behind this weird mess of character strings that makes a document (more on this in the ecoterm presentation).

See some more notes inline below.

Stella Dextre Clarke wrote:
> Ah yes. We discovered a similar problem during work on BS 8723. It was about whether to introduce a specialisation of USE/UF to cater for abbreviations/acronyms and their expansions, for which you might use tags such as AB/FT. A problem arises when the abbreviation is short for another non-preferred term rather than the preferred term.

Right.

If someone only wants the pure thesaurus, she might get along with the skos: part of UMTHES only and simply ignore the skosxl: part and the extensions. Kudos to the property chain which Antoine has mentioned: each skosxl:literalForm is equivalent to a directly assigned skos:prefLabel or skos:altLabel. So nothing would be missing.

As said above: we do not want such property chains. Anyway, hiddenLabel might also hide the lexical complexity; this might be an idea. But I don't like the idea of creating thousands of xl:Label class instances when each of them only carries "exactly one" xl:literalForm and I do not really need class instances for anything else.

This usage for acronyms (see also Stella's example above) is just an example, not part of the standard. We considered following this example in the beginning, but then we found "subproperty of lexicalVariant" more convenient. It still conforms, as far as I can see. Why should we introduce such a complex linkage chain here and waste all the resources needed to handle linked class instances instead of simple string properties? Furthermore, and maybe more importantly, I see a considerable semantic difference between a term (label) and a spelling variant of a term. That's why I do not want to handle them both equally on the model level.

See above.

I would appreciate this and I am expecting nothing else. There are more patterns which have not been "harmonised" in SKOS, such as narrowerPartitive etc., for good reasons. I don't think this is a problem. Any standard should give room for some diversity at its borders.

Why this? The pattern you recommend is not bad, but its usability depends on the intentions of the thesaurus providers.

> But well, having one of the first XL deployments departing from the meager guidelines we had put in the Reference would not be a great sign for us :-/

Anyway, I can think about this for acronyms.

I can do so, though this is not what we want to express. As each xl:Label has exactly one xl:literalForm, it necessarily has a single language. From this it can be inferred that the lexical variants of this literalForm have the same language. This is what we want to express, but I see no way to do this in Turtle or even safely in RDF/XML ...

You may be right, I'll think this over, but now I have to go out for dinner first :-)

Many thanks for your rich comments, Antoine!

Best regards,
Thomas

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
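For readers following the property-chain argument, the two kinds of chain discussed in this exchange could be written as OWL 2 property chain axioms in Turtle, roughly as below. This is a sketch under assumptions: the ext: namespace URI is hypothetical, the second group of axioms is exactly the inference UMTHES says it does not want, and whether a given reasoner accepts chains that end in a literal-valued property is not something the thread settles.

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix ext:    <http://example.org/umthes-ext#> .   # hypothetical namespace

# Chains from an xl:Label's literal form back to the basic SKOS labelling
# properties (the chains "given in XL" that both sides appreciate)
skos:prefLabel   owl:propertyChainAxiom ( skosxl:prefLabel   skosxl:literalForm ) .
skos:altLabel    owl:propertyChainAxiom ( skosxl:altLabel    skosxl:literalForm ) .
skos:hiddenLabel owl:propertyChainAxiom ( skosxl:hiddenLabel skosxl:literalForm ) .

# Hypothetical additional chains that would expose each lexical variant
# as a hidden label of the concept (rejected for UMTHES in the mail above)
skos:hiddenLabel owl:propertyChainAxiom ( skosxl:prefLabel ext:lexicalVariant ) .
skos:hiddenLabel owl:propertyChainAxiom ( skosxl:altLabel  ext:lexicalVariant ) .
```

With only the first group, :4711 would get "waste water" as its skos:prefLabel and nothing more. Adding the second group would also make "wastewater" and "wastewaters" hidden labels of :4711, which is the step that turns spelling variants into entry terms.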
Re: UMTHES and SKOS-XL

Thomas Bandholtz wrote:
> Secondly, we need this stuff to support automated indexing of full text
> documents. Machines need to be enabled to detect the Concepts behind this
> weird mess of character strings that makes a document (more on this in
> the ecoterm presentation).

Another interesting point. I sometimes hear people complain that ISO 2788-compliant thesauri do not help enough with retrieval from full text of documents that have not been humanly indexed. This is hardly surprising, since they were designed to support retrieval of documents indexed with that same vocabulary. The same is true of BS 8723-2 and the forthcoming ISO 25964-1.

When people want to use a thesaurus for full text retrieval, I sometimes suggest they could improve the results by stripping the qualifiers off the non-preferred terms. But more could be done to enhance the results of that process, by including inflectional forms, term weighting, Boolean expressions, additional less reliable clue-words, etc, and of course dropping the idea of admitting the clue-words as non-preferred synonyms with reciprocal relationships.

I sometimes wonder if a future revised version of BS 8723 or ISO 25964 should include some recommendations to this effect. What do you think?

Stella

*****************************************************
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
stella@...
*****************************************************
Re: UMTHES and SKOS-XL

Hi Stella,

remember Leonard Will's posting about "revising the ISO standard for thesauri for information retrieval" from February this year?
http://lists.w3.org/Archives/Public/public-esw-thes/2009Feb/0033.html
It had a huge diagram attached. I would be curious what has happened since then. Leonard, still on the line?

Something else regarding my previous post. I was too eager to go out for dinner, so I made a misleading error in this Turtle syntax example:

    #not recommended (and not what I wanted to write)
    :4711 rdf:type skos:Concept;
        skos:prefLabel "waste water";
        skos:prefLabel "waste waters";
        skos:prefLabel "wastewater";
        skos:prefLabel "wastewaters";
        skos:altLabel "sewage".

This is not what I wanted to say. It should read:

    #not recommended:
    :4711 rdf:type skos:Concept;
        skos:prefLabel "waste water";
        skos:altLabel "waste waters";
        skos:altLabel "wastewater";
        skos:altLabel "wastewaters";
        skos:altLabel "sewage".

Too silly! Excuse me for such a confusion, I was somehow ... hungry! Damn copy&paste in a hurry!

Best regards,
Thomas

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
Re: UMTHES and SKOS-XL

In message <4ADF6185.1060309@...>, Stella Dextre Clarke <stella@...> writes
>Thomas Bandholtz wrote:
>
>> Secondly, we need this stuff to support automated indexing of full
>>text documents. Machines need to be enabled to detect the Concepts
>>behind this weird mess of character strings that makes a document
>>(more on this in the ecoterm presentation).
>
>I sometimes wonder if a future revised version of BS 8723 or ISO 25964
>should include some recommendations to this effect. What do you think?

I would say not. "Machines detecting concepts" strikes me as an unachievable goal, certainly with our current capabilities. "Machines detecting the presence of words which are also terms in a thesaurus" is achievable, but it _isn't_ the same thing.

Richard
--
Richard Light
RE: UMTHES and SKOS-XL

Hi,

Suggestion: There are three levels of organization.
- Concepts (SKOS talk)
- Labels
- Text processing

A significant part of the issues discussed relates to what belongs on the label management level and what belongs on the text processing level (thus needing a proper definition).

Language-specific text processing and analysis (including inflection) seems to me a specialized area that global resources (language dictionaries) like WordNet can address. Stemming is also in this area. It seems to me costly if this were managed in every thesaurus.

Label management can focus on standard terms and term decomposition as relevant within a thesaurus or taxonomy (equivalence relation, compound equivalence, acronym, short-name, qualifiers ...).

Indexing and search engines combining thesaurus and text processing can use the label management layer (of the thesaurus) to configure the relevant text processing.

Concept and label processing surely belong to the thesaurus/taxonomy/... management. Text processing, I would suggest, belongs in the text processing engines.

PS:
- thanks for the UMTHES presentation - very instructive.
- would it be an idea to build on further SKOS extensions to have a common schema for artefacts like equivalence relation and compound equivalence; or for specializing some xl:labelRelation?

kr,
Johan De Smedt.
Re: UMTHES and SKOS-XL

Hi Johan,

> Suggestion: There are three levels of organization.
> - Concepts (SKOS talk)
> - Labels
> - Text processing

Good idea! I would add: Labels are skosxl; text processing is not yet really covered by skos(xl), but can be supported by extending skosxl locally.

> A significant part of the issues discussed relates to what belongs on the label management level
> and what belongs on the text processing level (thus needing a proper definition).
>
> Language-specific text processing and analysis (including inflection)
> seems to me a specialized area that global resources (language dictionaries)
> like WordNet can address.

http://wordnet.princeton.edu/wordnet/ starts with this sentence: "WordNet® is a large lexical database of English". Right. We have more than 20 languages in the European GEMET. Believe me, when it comes to language-specific text processing, English is the simplest language.

> Stemming is also in this area.
> It seems to me costly if this were managed in every thesaurus.

It is costly, sure, but as I have expressed before, UMTHES has already invested in this. The question now is how to express the results in a skosxl extension, not whether UMTHES should forget all the results of this investment. You are right on one point: in general, a thesaurus need not care about this. It is not a general requirement. But language-specific text processing needs to be solved on a language-specific level by someone somehow.

> Label management can focus on standard terms and term decomposition as relevant within a
> thesaurus or taxonomy (equivalence relation, compound equivalence, acronym,
> short-name, qualifiers ...).

Right so far. What we try to handle is this: each of these terms (= labels) has multiple spelling conventions, and a spelling variant does not make a different term on the same level. Maybe this is specific to some languages only and not such an issue in English.

> Indexing and search engines combining thesaurus and text processing can use the label
> management layer (of the thesaurus) to configure the relevant text processing.

I think this needs a third, dedicated layer.

> Concept and label processing surely belong to the thesaurus/taxonomy/... management.
> Text processing, I would suggest, belongs in the text processing engines.

Right, but text processing engines need some structure to express the diversity of term (Label) occurrences in natural language.

> PS:
> - thanks for the UMTHES presentation - very instructive.

Thanks for the flowers, I tried hard to provide some valuable contribution. As always, one has to surrender at some point of complexity (just to be on time for the meeting) and leave the rest to the next presentation, ...

> - would it be an idea to build on further SKOS extensions to have a common schema for
> artefacts like equivalence relation and compound equivalence; or for specializing
> some xl:labelRelation?

I think we should collect more examples and patterns, and we should not try to harmonise this too strictly. What we tried to implement in UMTHES is to separate a pure SKOS Core representation, which everybody can handle, from an (admittedly) more experimental extension which goes beyond established skos(xl) patterns. But UMTHES needs it now (!) as an exchange format in a real production scenario, so we cannot wait.

Thanks Johan for your comments, really helpful for thinking this over more thoroughly!

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
Re: UMTHES and SKOS-XL
Dear Robert,
Richard, when the machine has detected a term (which is quite easy so far) there are some remaining problems to be solved. I give only two examples:

"Machines detecting concepts" means getting closer and closer towards a safe automatic decision in such cases. This will not be finalised by a "big bang", but it is not "an unachievable goal" as you say. It is not yet achieved completely, but there are many approaches coming closer every time you revisit them. Give this a little more time!

Best regards,
Thomas

PS: On the other hand, if someone wants to expose her knowledge to the Semantic Web, she should use a formal language such as RDF directly and not human lingo. This would make everything much easier! (Dreaming ;-)

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
Re: UMTHES and SKOS-XL and Others!
Hi!
A few thoughts coming from this discussion:

* Indexing Authority List vs Existing Concepts Inventory: MeSH is an example of merging both. In MeSH/UMLS, Concepts have their specific labels (terms) but they are grouped in micro-hierarchies to form a Heading entry. Example:
  http://www.nlm.nih.gov/cgi/mesh/2010/MB_cgi?mode=&index=877&view=expanded

  I believe SKOS is able to represent most MeSH attributes:
  * Concept Unique Identifier is the "about"
  * Tree numbers (changing from one year to another) are a notation system
  * (Heading) Entry Unique Id is another notation system (an id within a sub-scheme)
  * Registry Number (CAS) is another notation system (an id within another scheme)
  * Terms are preferred labels or synonyms (depending on the lexical tag value)
  * Scope Notes are SKOS scope notes. The concept references within Scope Notes have to be represented somehow.
  * Annotations and others are editorial notes or other types of SKOS notes
  * Previous indexing: relatedMatch with older Heading Schemes?

  A good way remains to be found to represent Semantic Types (collections?) and Allowable Qualifiers (collections too? or a SKOS extension?).

  In this example, a difficult problem is present: the Heading entry is a specific (and not a generic) of the two other "non preferred" concepts!

* Full Natural Language Processing needs a way to efficiently treat the EXCEPTIONS: intuition says that the 80/20 rule is good enough. Reality is much more demanding: "small" linguistic errors are never accepted by humans (when visible: this is why Google does not document them!). So the representation of exceptions must be part of the design of data structures for Natural Language Processing systems. It is their main use (the general 80% rules can even be hard coded). This is way too complex to be seen as a simple SKOS extension.

* Thesaurus "projection" over a text has been used with success to generate suggestions to human indexers (not for fully automatic indexing). It is very useful, and it is true that having the necessary lexical information in a SKOS extension to achieve this would be nice. It is limited to the detection of nominal groups, but it may have problems with the different grammatical ways to express coordination between elementary concepts in a term. To succeed, this "extension" normalization effort should define properties only for that precise purpose.

In general, a focused "purpose", open to the different applications with that purpose, is the only way to deliver a working standard...

I am very, very sorry I cannot attend "Classification at CrossRoads" and the SKOS day on October the 30th in the Netherlands: I hope to be able to come on another occasion. I suppose the communications will be available?

Have a nice day!

Christophe Dupriez

--
Christophe Dupriez, Informaticien, Christophe.Dupriez@..., http://www.destin.be
DESTIN inc. SSEB, rue des Palais 44, boîte 1, B-1030 Bruxelles, Belgique
Tel: +32/2/216.66.15 Fax: +32/2/242.97.25 Cell: +32/475.77.62.11
|
|
Re: UMTHES and SKOS-XL and Others!

Dear Christophe,

I am not familiar enough with the MeSH/UMLS schema to comment on your SKOS mapping spontaneously. So I limit myself to your more general statements:

> * Full Natural Language Processing needs a way to efficiently treat
> the EXCEPTIONS: the intuition believes that 80/20 rule is good enough.
> Reality is much more demanding: "small" linguistic errors are never
> accepted by humans (when visible: this is why Google does not document
> them!).
> So the representation of exceptions must be in the design of data
> structures for Natural Language Processing systems.
> It is their main use (the general 80% rules can even be hard coded).
> This is way too complex to be seen as a simple SKOS extension.

I agree, more or less. SKOS is not made to express rules. But you may enhance xl:Label instances with certain linguistic data (specific to the given language) in order to enable NLP systems to get along with the remaining 20%. At least this is what we try in UMTHES.

> * Thesaurus "projection" over a text has been used with success to
> generate suggestions to human indexers (not for fully automatic
> indexation).

In practice, we once built a wizard making suggestions to human indexers, and after some tests people used it as a fully automatic indexation.
This was not because the wizard was perfect; it was because 80% (or even 70%) was found to be "good enough". This depends strongly on the use case.

> It is very useful and it is true that having the necessary lexical
> information in a SKOS extension to achieve this would be nice.
> It is limited to the detection of nominal groups but it may have
> problems with different grammatical ways to express coordination
> between elementary concepts in a term.
> To succeed, this "extension" normalization effort should be done to
> define properties only for that precise purpose

Can this be "normalized"? I don't see any normalized NLP methods, so I wonder how we can normalize the properties that will support such methods. Do you have something in mind?

> In general, focused "purpose", open to the different applications
> with that purpose, is the only way to deliver a working standard...

To me, any real-world conceptScheme is an individual to a certain extent.
SKOS (XL included) covers the common patterns and gives room for necessarily individual extensions. Over time, we might discover more common patterns even in the individuality of each scheme, but some diversity will always remain. I don't think this is a problem.

Referring to the UMTHES extensions, it was not the intention to provide a standardisation proposal.
UMTHES just needs a lossless RDF serialisation making the most of SKOS and extending it for our specific demands, and we need all this now.
But I would be enthusiastic about some future extensions of SKOS towards linguistics and NLP support, if they arise from this discussion.

Kind regards,
Thomas

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
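As an illustration of the thesaurus "projection" discussed in this exchange, here is a minimal sketch, assuming the lexical variants of a label are available as plain strings. The data mirrors the :wasteWater example used elsewhere in the thread; the function names and the dictionary layout are hypothetical and are not part of UMTHES or SKOS. It only does whole-word matching, so it sidesteps the harder coordination and inflection problems raised above.

```python
import re

# Hypothetical label data modelled on the thread's :wasteWater example:
# one literal form plus its lexical variants, keyed by concept id.
LABELS = {
    ":4711": {
        "literal_form": "waste water",
        "variants": ["waste waters", "wastewater", "wastewaters"],
    },
}


def build_index(labels):
    """Map every lower-cased surface form to the concept it belongs to."""
    index = {}
    for concept, label in labels.items():
        for form in [label["literal_form"], *label["variants"]]:
            index[form.lower()] = concept
    return index


def suggest_concepts(text, index):
    """Return the sorted concept ids whose forms occur in text as whole words."""
    lowered = text.lower()
    found = set()
    for form, concept in index.items():
        # \b keeps a form from matching inside a longer word.
        if re.search(r"\b" + re.escape(form) + r"\b", lowered):
            found.add(concept)
    return sorted(found)


INDEX = build_index(LABELS)
print(suggest_concepts("Treatment of wastewaters in rural areas", INDEX))
print(suggest_concepts("Water supply and waste disposal", INDEX))
```

Whether such suggestions are shown to a human indexer or applied automatically is a use-case decision, as noted in the message above.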
|
|
Re: UMTHES and SKOS-XL

Dear Thomas,

The discussion has gone quite wild, I see :-) I'll try to come back to the original UMTHES issue, first...

> Speaking in ISO Thesaurus lingo: we do not want inflectional forms etc. to become entry terms.
> Why then do we need all those lexical variants at all?
> At first, UMTHES just has them. It is my job to serialise UMTHES in SKOS, not to change UMTHES.
> Secondly, we need this stuff to support automated indexing of full text documents. Machines need to be enabled to detect the Concepts behind this weird mess of character strings that makes a document (more on this in the ecoterm presentation).

I think everything is here, and you don't need to say much more! Especially the first sentence, which can be enough to define a practice (or actually to recall it).
I now clearly see the point of the example in your slides [2], where the main-form xl:Label has dozens of variants in German. Knowing about those could be counter-productive for many user-oriented applications, except for sophisticated NLP-based tools.
Please keep the hiddenLabel solution in mind, however. I agree with your prejudice against creating more instances of xl:Label, but if you see a slight chance that UMTHES could evolve towards an even more lexically intensive thing, having the instances of xl:Label could spare you some painful model change...

Picking some elements from your mail at the bottom:

> This usage for acronyms (see also Stella's example above) is just an
> example, not part of the standard. We have considered to follow this
> example in the beginning, but then we found "subproperty of
> lexicalVariant" more convenient. It still conforms, as far as I see.

Yes!

> Why should we introduce such a complex linkage chain here and waste all
> those resources needed to handle linked class instances instead of
> simple string properties ?

The overhead is not really huge, in fact. I mean, it adds a fraction of all the triples that you already have in UMTHES; it's not as if it multiplied them by ten.

> Further more & may be more important, I see a considerable semantic
> difference between a term (label) and a spelling variant of a term.
> That's why I do not want to handle them both equally on the model level.

Yes, but as SKOS would not make the distinction (other than treating them as hiddenLabel, whereas the others would be pref or alt labels), there would not be a strong counter-argument to it from the SKOS perspective. And from your more practical perspective, you could still create two sub-classes of xl:Label, a bit like what you hint at in your presentation, in fact.

>>> I guess we can have
>>> several ways of handling a relation such as acronymy co-exist.
> I would appreciate this and I am expecting nothing else. There are more
> patterns which have not been "harmonised" in SKOS, such as
> narrowerPartitive etc. for good reasons. I don't think this is a
> problem. Any standard should give room for some diversity at its borders.
>>> But well, having one of the first XL deployments departing from the
>>> meager guidelines we had put in the Reference would not be a great
>>> sign for us :-/
> why this? The pattern you recommend is not bad, but its usability
> depends on the intentions of the thesaurus providers.
> Anyway, I can think about this for acronyms.

You're right, much of that depends on the intention of thesaurus providers. And the pattern we had is certainly not intended as normative.

>>> 1. Are you planning to add the language tag that seem to be missing
>>> on some slides (e.g. for the ext:inflection objects) in the real data?
> I can do so though this is not what we want to express. As each
> xl:label has exactly one xl:literalForm, this necessarily has a single
> language. From this can be inferred that lexical variants of this
> literalForm have the same language. This is what we want to express,
> but I see no way to do this in Turtle or even safely in RDF/XML ...

Yes. The only way to proceed is to simulate that rule by just putting the tags on all the literals that are in your data :-/
If you want to do it in a neat way, with rules, then you have to represent languages as full-fledged resources, and build axioms using them.
Note that there is some logic to this, in a way. You cannot expect the syntax to allow you to deal with something that seems very much at the model level, at least to me!

Cheers,

Antoine

> Dear Stella & Antoine,
>
> Antoine has raised the essential issue, Stella came up with a related
> use case which can be solved using the UMTHES patterns.
> UMTHES distinguishes not only prefLabel from altLabel, but also both
> from multiple spelling conventions of any label.
> We see abbreviations/acronyms as part of such spelling conventions,
> others are inflectional forms of the same term, or even common
> misspellings. If we mix this all together into altLabel instances, it
> would not make sense any more.
>
> Stella's example about abbrev is similar, but we separate spelling
> conventions ("lexical variants") from labels regardless whether they may
> be pref or alt.
> Example:
>
> :4711 rdf:type skos:Concept;
>     skos:prefLabel "waste water";
>     skos:altLabel "sewage".
>
> makes sense, but
>
> #not recommended:
> :4711 rdf:type skos:Concept;
>     skos:prefLabel "waste water";
>     skos:prefLabel "waste waters";
>     skos:prefLabel "wastewater";
>     skos:prefLabel "wastewaters";
>     skos:altLabel "sewage".
>
> looks at least somehow "unbalanced".
>
> UMTHES knows even more about lexical complexity (a really awful issue
> in German), that is why we decided to use xl:Label extensions to
> separate such complexity from the more prominent list of labels which
> are directly assigned to a skos:Concept:
>
> # hiding lexical complexity from the list of labels
> :wasteWater rdf:type skosxl:Label;
>     skosxl:literalForm "waste water";
>     ext:lexicalVariant "wastewater";
>     ext:lexicalVariant "wastewaters";
>     ext:compoundFrom (:waste :water).
> > Speaking in ISO Thesaurus lingo: we do not want inflectional forms etc. > to become entry terms. > (see > http://www.w3.org/2004/02/skos/core/proposals.html#thesaurusRepresentation-11 > ...) > > This is also why we really do not want to have a property chain from a > ext:lexicalVariant to a skos:Concept. > We appreciate the property chain from the skosxl:literalForm to the > skos:Concept. > > Why then do we need all those lexical variants at all? > At first, UMTHES just has them. It is my job to serialise UMTHES in > SKOS, not to change UMTHES. > Secondly, we need this stuff to support automated indexing of full text > documents. Machine need to be enabled to detect the Concepts behind this > weird mess of character strings that makes a document (more on this in > the ecoterm presentation). > > See some more notes inline below. > > Stella Dextre Clarke schrieb: >> Ah yes. We discovered a similar problem during work on BS 8723. It was >> about whether to introduce a specialisation of USE/UF to cater for >> abbreviations/acronyms and their expansions, for which you might use >> tags such as AB/FT. A problem arises when the abbreviation is short >> for another non-preferred term rather than the preferred term. >> (For example, the preferred term "Information and communication >> technology" can have non-preferred terms "Information technology", >> "IT" and "ICT") >> It becomes apparent that the proposed specialisation is not really a >> type of USE/UF. It is an inter-term relationship that can sometimes >> apply between non-preferred terms. Obviously it is possible to find a >> way of representing this accurately, but at the expense of making the >> whole model more complicated and the tagging conventions more cumbersome. >> >> My personal view on this is that if you try to add more value in the >> shape of lexical/terminological information, you lose the virtue of >> simplicity. 
To put it another way, if you have mixed objectives >> (trying to achieve terminological objectives as well as enabling >> information retrieval) these tend to detract from each other. > right. If someone only wants the pure thesaurus, she might get along > with the skos: part of UMTHES only and simply ignore the skosxl:+extensions. > Cudos to the property chain which Antoine has mentioned, each > skosxl:literalForm is equivalent to a directly asigned skos:pref/altLabel. > So, nothing would be missing. > >> >> Cheers >> Stella >> >> ***************************************************** >> Stella Dextre Clarke >> Information Consultant >> Luke House, West Hendred, Wantage, OX12 8RR, UK >> Tel: 01235-833-298 >> Fax: 01235-863-298 >> stella@... >> ***************************************************** >> >> >> Antoine Isaac wrote: >>> Hi everyone, >>> >>> I'm putting here a discussion we started with Thomas Brandholtz on >>> UMTHES [1] on the use of SKOS-XL there (see slides at [2]). A long >>> mail, but it can be interesting for a wider audience, as UMTHES is >>> one of the first SKOS-XL implementations! >>> >>> === >>> >>> Dear Thomas, >>> >>> So let's go. The main issue I have is that xl:Label is used in a very >>> "term-oriented" way in UMTHES. >>> More precisely, I feel that you are using labels to aggregate lexical >>> entities which which indeed are belonging to the same "term". But >>> these literals be introduced as labels in basic SKOS, I think. Trying >>> to use a concrete example from your slides: >>> >>> :4711 rdf:type skos:Concept; >>> skosxl:prefLabel :wasteWater. >>> >>> :wasteWater rdf:type skosxl:Label; >>> skosxl:literalForm "waste water"; >>> ext:lexicalVariant "wastewater"; >>> ext:compoundFrom (:waste :water). >>> >>> "wastewater" is introduced as a lexical variant of "waste water". Per >>> se, this is of course ok. >>> But in basic SKOS, I would have modelled that "wastewater" as a >>> skos:altLabel or a skos:hiddenLabel of :4711. 
As not attaching that >>> string to an instance of xl:Label using xl:literalForm prevents you >>> from benefitting from the useful property chains given in XL. So I >>> would have represented "wastewater" as an instance of xl:Label. >>> >>> Of course, you may object that you can declare yourself a property >>> chain (or property chains) that would allow to infer that the >>> literals that are objects of ext:lexicalVariant triples (or the ones >>> involving sub-properties of ext:lexicalVariant) are also objects of >>> skos:hiddenLabel (or skos:altLabel) statements attached to the >>> skos:Concept to which your xl:Label is attached. > as said, above: we do not want such property chains. > Anyway, hiddenLabel might also hide the lexical complexity, this might > be an idea. > But I don't like the idea of creating thousands of xl:Label class > instances when each of them only carries "exactly one" xl:literalForm > and I do not really need class instances for anything else. >>> But then I'd be still uncomfortable with an xl:Label giving raise to >>> several (SKOS-basic) labels. >>> Additionally, we actually introduced xl:labelRelation to handle cases >>> like acronyms [1]. In your approach, acronym is a subproperty of >>> lexicalVariant, which is clearly a different pattern from ours. > This usage for acronyms (se also Stellas example above) is just an > example, not part of the standard. We have considered to follow this > example in the beginning, but then we found "subproperty of > lexicalVariant" more convenient. It still conforms, as far as I see. >>> >>> As I feel it, your choice may be prefectly grounded in terminology. >>> Still, I'd be curious to hear whether this is a strong position of >>> yours, or if you could accomodate a different pattern. >>> >>> Maybe there can be indeed a solution accomodating both points of view >>> (if I interpreted one correctly, of course). 
Namely, introducing >>> "wastewater" as the literalForm of an xl:Label which is not connected >>> to any concept; just connected (by an ext:lexicalVariant which would >>> be then a sub-property of xl:labelRelation) to :wasteWater. > Why should we introduce such a complex linkage chain here and waste all > those recources needed to handle linked class instances instead of > simple string properties ? > > Further more & may be more important, I see a considerabel semantic > difference between a term (label) and a spelling variant of a term. > That's why I do not want to handle them both equally on the model level. > >>> >>> Of course you can say then that the distinction between "waste water" >>> and "wastewater" is something very important for your UMTHES and the >>> applications you envision with it, and that "wastewater" should never >>> be used as a basic concept label, even a hidden one. Or not even >>> interpreted as something that could be a label... > see above >>> >>> You can also argue that the xl:Label story is quite thin in the SKOS >>> Reference anyway, and that you can use that class as a purely >>> technical hook for any purpose. That's indeed not far from being the >>> truth, and if all are rightfully motivated, well, I guess we can have >>> several ways of handling a relation such as acronymy co-exist. > I would appreciate this and I am expecting nothing else. There are more > patterns which have not been "harmonised" in SKOS, such as > norrowerPartitive etc. for good reasons. I don't think this is a > problem. Any standard should give room for some diversity at its borders. > >>> But well, having one of the first XL deployments departing from the >>> meager guidelines we had put in the Reference would not be a great >>> sign for us :-/ > why this? The paatern you recommend is not bad, but its usability > depends on the intentions of the thesaurus provdiders. > Anyway, I can think about this for acronyms. 
>>> >>> >>> Apart from this issue of ext:literalVariant and its sub-properties, I >>> found the rest really good, confirming my first enthusiastic reaction >>> after your talk :-) >>> >>> Two comments/questions, maybe: >>> >>> 1. Are you planning to add the language tag that seem to be missing >>> on some slides (e.g. for the ext:inflection objects) in the real data? > I can do so theough this is not what we want to express. As each > xl:label has exactly one xl:literalForm, this necessarily has a single > language. From this can be infered that lexical variants of this > literalForm have the same language. This is what we want to express, > but I see no way to do this in Turtle or even savely in RDF/XML ... >>> >>> 2. Intuitively, I feel that the definition of :NonPreferredTerm (on >>> slide 33) is too strong. I would have said that everything that is >>> related via xl:altLabel to a concept cannot be a PreferredTerm. >>> Otherwise there would be a conflict with the inferred basic SKOS >>> labelling triples [2]. So the complementOf axiom would not be really >>> needed. > You may be right, I'll think this over, but now I have to go out for > dinner first :-) > > Many thanks for your rich comments, Antone! > > > Best regards, Thomas >>> But again, it's late, and I prefer to send this mail rather than >>> letting you wait more time for my answer... >>> Cheers, >>> >>> Antoine >>> >>> [1] http://www.w3.org/2006/07/SWD/track/issues/215 >>> [2] >>> http://eea.eionet.europa.eu/Public/irc/envirowindows/jad/library?l=/ecoinformatics_indicator/ecoterm_5-6102009/ecoterm09-bandholtzppt/_EN_1.0_&a=d >>> >>> >>> >>> >> > > > -- > Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com > innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany > Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491 > |
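The language-tag point in Antoine's reply above (no syntax-level inheritance, so the rule has to be simulated by tagging every literal when the data is written out) can be sketched as a small serialisation helper. This is only an illustration under stated assumptions: the function names are hypothetical, literals are modelled as (text, lang) pairs, and the "en" tag is assumed because the thread's example carries no tags.

```python
def tag_variants(literal_form, variants):
    """Give every untagged variant the language tag of the literal form.

    literal_form is a (text, lang) pair; variants holds plain strings
    (untagged) or (text, lang) pairs (already tagged, left unchanged).
    """
    _, lang = literal_form
    tagged = []
    for variant in variants:
        if isinstance(variant, tuple):
            tagged.append(variant)
        else:
            tagged.append((variant, lang))
    return tagged


def turtle_literal(text, lang):
    """Render a (text, lang) pair as a Turtle language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


# The "en" tag is an assumption; the thread's example carries no tags.
form = ("waste water", "en")
for text, lang in tag_variants(form, ["wastewater", "wastewaters"]):
    print("ext:lexicalVariant", turtle_literal(text, lang))
```

The rule itself lives in the export code here, not in the model, which is exactly the limitation the reply points out.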
|
|
Re: UMTHES and SKOS-XL and Others!

Hello everyone,

Johan's suggestion

> There are three levels of organization.
> - Concepts (SKOS talk)
> - Labels
> - Text processing

makes sense indeed. Like Thomas, however, I would think that the label layer falls at least partly in the SKOS(XL) scope. And in the ISO/BS one.

But to answer Stella specifically, on what I think belongs to the third point of Johan,

> I sometimes wonder if a future revised version of BS 8723 or ISO 25964 should include some recommendations to this effect. What do you think?

I also think that this is a dangerous road to go down. I mean, I certainly think that the effort of representing lexical info is very useful. And I believe that it is possible to achieve interesting stuff based on that.
But for us (simpler KOS-oriented efforts like SKOS/ISO/BS) it would be better to just focus on:
- pointing to some initiatives, such as WordNet and [1], which try to represent lexical information for NLP tools to work with;
- allowing those initiatives to be plugged onto our KOS-related efforts (or vice versa) by providing sufficient extension hooks. Which was the main rationale for SKOS-XL, in fact.

Trying to cope with all the required details is out of our scope and, I think, our expertise, even if ISO/BS committees have bright people involved ;-)
In fact, finding a core model for lexical information modelling (such as [1]) is still ongoing work, and there are multiple proposals around, which shows that it is indeed a complex matter.

Cheers,

Antoine

[1] http://code.google.com/p/lexinfo/
|
|
|
|
|
RE: UMTHES and SKOS-XL

Hi Thomas,

Thanks for the references, analysis and explanation.
You did not miss anything.

However, I want to iterate on some practical considerations.

I see SKOS primarily as an exchange format, not as a maintenance format.
Users of the EUROVOC thesaurus maintenance system manage permuted literals as properties of preferred or alternate xl:Labels.
Hence:
- the genuine managed labels are ok to have a URI that can later be used in an LOD or SPARQL service interface
- permuted literal forms do not have this quality

However, when making a SKOS compliant publication, the hidden label has relevance (as search value).
Hence EUROVOC publishes for a skos:Concept :C
- :C xl:prefLabel :ptC; xl:altLabel :nptC; skos:hiddenLabel "permuted literal form of C" .
- :ptC xl:literalForm "PT of C" .
- :nptC xl:literalForm "nPT of C" .
I think this is compliant with SKOS(XL) - comment is welcome.

The details of why
- :nptC is an alt label
- "permuted literal form of C" is a permuted label
can be found in the equivalence relationships (simple or compound) or in the permuted literal forms of either :ptC or :nptC.
This is expressed in the EUROVOC specific SKOS extension (and thus requires knowledge of the OWL schema beyond the formal OWL expressions - i.e. the documentation that goes with the schema).

The selection of something being a PT or an nPT or a permuted label is up to the thesaurus maintenance/management.
It is not always obvious whether an acronym ("OWL") or the full name ("Web Ontology Language") will be used as PT in a real world thesaurus. (Similar considerations apply to other label relations.)
However, once a name is selected as the PT, the related labels are likely (mandatory?) candidates for nPT or hidden labels.

As there currently is no SKOS extension capturing such label relations, we are now discussing which approach to take.

I would advocate that for some of the work done on the ISO standardization, it may be worthwhile to do some RDF standardization effort in the future.
Possible candidates are:
- Concept groups
- Equivalence relationships (simple and compound)
Obviously, the industry may find the schema provided in the ISO standard (UML/XML) sufficient.
kr,
Johan De Smedt.

From: Thomas Bandholtz [mailto:thomas.bandholtz@...]
Sent: Sunday, 25 October, 2009 15:23
To: Johan De Smedt; Johan De Smedt; SKOS
Cc: 'Antoine Isaac'
Subject: Re: UMTHES and SKOS-XL

From the SKOS point of view, the structure of your ev:permutedLiteralForm is very similar to that of umthes:lexicalVariant.
As both are defined as local datatype properties of skosxl:Label, the property chain S57 will not work:

S57: "The property chain (skosxl:hiddenLabel, skosxl:literalForm) is a sub-property of skos:hiddenLabel."

You are right, OWL 2 introduces property chains for owl:ObjectProperty but not for owl:DatatypeProperty.
See http://www.w3.org/TR/2009/WD-owl2-new-features-20090611/#F8:_Property_Chain_Inclusion

There has been some discussion about "formal expression of property chains" in the skos list, but no final clarification.
See http://lists.w3.org/Archives/Public/public-esw-thes/2009May/0003.html

I think that Antoine's draft in http://lists.w3.org/Archives/Public/public-swd-wg/2009Mar/0043.html is not valid OWL 2 because he refers to datatype properties.
There are some valid examples in the OWL 2 primer, for instance:

    <rdf:Description rdf:about="hasUncle">
      <owl:propertyChainAxiom rdf:parseType="Collection">
        <owl:ObjectProperty rdf:about="hasFather"/>
        <owl:ObjectProperty rdf:about="hasBrother"/>
      </owl:propertyChainAxiom>
    </rdf:Description>

To me it is not really clear why this pattern is restricted to object properties in OWL 2, but it is.

Anyway, given that S57 is valid, ev:permutedLiteralForm and umthes:lexicalVariant would need to be remodelled as xl:literalForm of some xl:Label, with some additional ev:hasPermutedLiteralForm subproperty of xl:labelRelation.
Then you can point from the Concept to the permuted form using xl:hiddenLabel.

Something like:

    # no hiddenLabel in this example
    :123 rdf:type skos:Concept;
        skosxl:prefLabel :ABC.
    :ABC rdf:type skosxl:Label;
        skosxl:literalForm "Something";
        jv:permutedLiteralForm "permuted form of Something".

would need to be modified as in the next example:

    # using hiddenLabel and a subproperty of skosxl:labelRelation
    :123 rdf:type skos:Concept;
        skosxl:prefLabel :ABC;
        skosxl:hiddenLabel :ABCperm.
    :ABC rdf:type skosxl:Label;
        skosxl:literalForm "Something".
    :ABCperm rdf:type skosxl:Label;
        skosxl:literalForm "permuted form of Something".
    :ABC jv:hasPermutedLiteralForm :ABCperm.

Looks a bit complicated, but this is how I read the SKOS specification, and Antoine pointed to this in a previous mail.

I am still not decided whether to go this way for umthes:lexicalVariant. Maybe these properties just don't need to be hiddenLabels.

Definition of skos:hiddenLabel: "A lexical label for a resource that should be hidden when generating visual displays of the resource, but should still be accessible to free text search operations."

This doesn't seem to be very clear: free text search operations may access literal values of any property, but a SPARQL query restricted to rdfs:label (including its sub-properties) wouldn't.
As far as I see, this would be the only consequence. Did I miss something?

Kind regards,
Thomas

Johan De Smedt wrote:

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
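To make the effect of S57 in the message above concrete, here is a minimal sketch of what the property chain entails for the two modellings in Thomas's example. It is not an OWL reasoner: triples are plain Python tuples, prefixed names are treated as opaque strings, and apply_chain is a hypothetical helper written for this illustration only.

```python
def apply_chain(triples, first, second, inferred):
    """Return the triples entailed by the chain (first, second) -> inferred."""
    # Collect the objects of `second` per subject, e.g. label -> literal forms.
    second_objects = {}
    for s, p, o in triples:
        if p == second:
            second_objects.setdefault(s, []).append(o)
    result = set()
    for s, p, o in triples:
        if p == first:
            for target in second_objects.get(o, []):
                result.add((s, inferred, target))
    return result


# First modelling: the permuted form is a datatype property of the label.
ORIGINAL = {
    (":123", "skosxl:prefLabel", ":ABC"),
    (":ABC", "skosxl:literalForm", "Something"),
    (":ABC", "jv:permutedLiteralForm", "permuted form of Something"),
}

# Remodelled: the permuted form is the literalForm of its own xl:Label.
REMODELLED = {
    (":123", "skosxl:prefLabel", ":ABC"),
    (":123", "skosxl:hiddenLabel", ":ABCperm"),
    (":ABC", "skosxl:literalForm", "Something"),
    (":ABCperm", "skosxl:literalForm", "permuted form of Something"),
    (":ABC", "jv:hasPermutedLiteralForm", ":ABCperm"),
}

S57 = ("skosxl:hiddenLabel", "skosxl:literalForm", "skos:hiddenLabel")
print(apply_chain(ORIGINAL, *S57))    # nothing is entailed
print(apply_chain(REMODELLED, *S57))  # one skos:hiddenLabel triple
```

Only the remodelled data yields a plain skos:hiddenLabel on the concept, which matches the reading of the specification given in the message.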
|
|
Re: UMTHES and SKOS-XLsome considerations inline: [skip]
This is a very important issue. I used to see SKOS purely as an exchange format either, but since LOD I understand skosified reference vocabularies as an important building block at runtime. This does not constrain maintenance to anything else than the vocabulary has to be serializable in SKOS so that it will meet the expectations of a SKOS aware application.
I don't see any reason why this should not be compliant with SKOS. But it may not express your semantics: You provide prefLabel and altLabel with XL, but hiddenLabel in plain SKOS, so how will you express that some permuted literal form refers to one of the labels? A Concept by itself has no literal form, so i do not understand how it may have a permuted literal form. Is there exactly one permuted literal form per Concept or per Label? Could you give an example?
I understand that "ptC" stands for "preferred term of a Concept" and "nptC" for "non-preferred term of a Concept", right? The basic ISO equivalence relationship is "preferredTerm USED FOR non-preferredTerm" with inverse "non-preferredTerm USE preferredTerm". There is no such construct in SKOS. A SKOSXL Label cannot be preferred or not by itself, it only depends on how it is linked to a Concept (pref/altLabel). (see an example below ...) Guess that is the reason why you have a EUROVOC specific SKOS extension (which we don't know so far). I wonder how you express "permuted literal forms of either :ptC or :nptC", when the permuted literal form is a rdf:Literal? This might be a special case, but xs:labelRelation is intended to point to a xl:Label instance, not to a Literal.
What is the difference from a skos:Collection?
SKOS only has altLabel prefLabel relations from a Concept to a Label. >From this arises the question whether the same Label my be pref of one Concept and alt of another? Would this be compliant? Yes (may be not intentionally). S13: "skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties." S14: "A resource has no more than one value of skos:prefLabel per language tag." These only keep you from saying something like: <Love> skos:prefLabel "love"@en ; skos:prefLabel "adoration"@en . or <Love> skos:prefLabel "love"@en ; skos:altLabel "love"@en . But the following is compliant: <A> skos:prefLabel "love"@en ; skos:altLabel "adoration"@en . <b> skos:prefLabel "adoration"@en ; skos:altLabel "love"@en . Or even more evident in XL: <A> skosxl:prefLabel :love; skosxl:altLabel :adoration . <B> skosxl:prefLabel :adoration ; skosxl:altLabel :love . :love skosxl:literalForm "love"@en . :adoration skosxl:literalForm "adoration"@en. SKOS pref/alt of a label is only known in the context of a given Concept, while ISO pref/nonPref is bound to a given label (~term). Right? If you want to have ISO equivalence in SKOS you may express something like: prefTerm subClassOf xl:Label . nonPrefTerm subclassOf xl:Label . prefTerm disjointWith nonPrefTerm . xl:prefLabel range prefTerm . xl:altLabel range nonPrefTerm . usedFor subPropertyOf xl:labelRelation; domain prefTerm; range nonPrefTerm; inverseOf use . and then: love a prefTerm; adoration a nonPrefTerm; love usedFor adoration.
? I do not really understand this.

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
RE: UMTHES and SKOS-XL

Hi Thomas,

Please find in-line, prefixed with ">>JDS-2:", clarifications on your added questions. In the examples, "ev:" stands for the EUROVOC schema prefix.

kr, Johan De Smedt.

From: Thomas Bandholtz [mailto:thomas.bandholtz@...]
Sent: Sunday, 25 October, 2009 20:13
To: Johan De Smedt
Cc: 'SKOS'; 'Antoine Isaac'
Subject: Re: UMTHES and SKOS-XL

some considerations inline: [skip]
This is a very important issue. I used to see SKOS purely as an exchange format too, but since LOD I understand skosified reference vocabularies as an important building block at runtime. This places no constraint on maintenance other than that the vocabulary has to be serializable in SKOS, so that it will meet the expectations of a SKOS-aware application.
I don't see any reason why this should not be compliant with SKOS. But it may not express your semantics:

>>JDS-2: indeed, not all semantics are expressed; that is what I try to explain below.

You provide prefLabel and altLabel with XL, but hiddenLabel in plain SKOS, so how will you express that some permuted literal form refers to one of the labels? A Concept by itself has no literal form, so I do not understand how it may have a permuted literal form. Is there exactly one permuted literal form per Concept or per Label?
Could you give an example?

>>JDS-2: example: C stands for the concept with preferred term "child abuse".

    :C xl:prefLabel :childAbuse .
    :childAbuse xl:literalForm "child abuse"@en .
    :childAbuse ev:permutedLiteralForm "abuse, child"@en .

For this, the EUROVOC publishing service generating SKOS will additionally generate:

    :C skos:prefLabel "child abuse"@en .
    :C skos:hiddenLabel "abuse, child"@en .
This is based on the following two (informally noted) rules that go with the EUROVOC schema:
- A chain xl:prefLabel([Concept][Term]) o ev:permutedLiteralForm([Term][literal]) → skos:hiddenLabel([Concept][literal]) .
- A chain xl:altLabel([Concept][Term]) o ev:permutedLiteralForm([Term][literal]) → skos:hiddenLabel([Concept][literal]) .
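The two chain rules above can be sketched as a naive join over triples. This is only an illustration under stated assumptions: triples are plain Python tuples, literals are `(text, lang)` pairs, and `infer_hidden_labels` is a made-up name; it is not how the EUROVOC publishing service is actually implemented.

```python
def infer_hidden_labels(triples):
    """Apply the two informal chain rules:

    xl:prefLabel o ev:permutedLiteralForm -> skos:hiddenLabel
    xl:altLabel  o ev:permutedLiteralForm -> skos:hiddenLabel

    Returns only the newly derived triples.
    """
    triples = set(triples)
    inferred = set()
    for concept, p1, term in triples:
        if p1 not in ("xl:prefLabel", "xl:altLabel"):
            continue
        # Second step of the chain: follow the term to its permuted forms.
        for s, p2, literal in triples:
            if s == term and p2 == "ev:permutedLiteralForm":
                inferred.add((concept, "skos:hiddenLabel", literal))
    return inferred - triples


data = [
    (":C", "xl:prefLabel", ":childAbuse"),
    (":childAbuse", "xl:literalForm", ("child abuse", "en")),
    (":childAbuse", "ev:permutedLiteralForm", ("abuse, child", "en")),
]
hidden = infer_hidden_labels(data)
```

The skos:prefLabel triple from the example is not produced here; that one follows from the property chain already defined in SKOS-XL itself rather than from the two EUROVOC rules.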
I understand that "ptC" stands for "preferred term of a Concept" and "nptC" for "non-preferred term of a Concept", right?

>>JDS-2: yes.

The basic ISO equivalence relationship is "preferredTerm USED FOR non-preferredTerm" with inverse "non-preferredTerm USE preferredTerm". There is no such construct in SKOS. A SKOS-XL Label cannot be preferred or not by itself; it only depends on how it is linked to a Concept (pref/altLabel). (See an example below ...) I guess that is the reason why you have a EUROVOC-specific SKOS extension (which we don't know so far).

I wonder how you express "permuted literal forms of either :ptC or :nptC", when the permuted literal form is an rdf:Literal?

>>JDS-2: an xl:Label may have an arbitrary number of ev:permutedLiteralForm values. This is a data property (like xl:literalForm).
>>JDS-2: in contrast though, ev:permutedLiteralForm has no cardinality constraints.
>>JDS-2: further, for any xl:Label :L, its xl:literalForm and all its ev:permutedLiteralForm values must have the same language.
This might be a special case, but xl:labelRelation is intended to point to an xl:Label instance, not to a Literal.
What is the difference from a skos:Collection?

>>JDS-2: group means any subset of concepts, while collections were aimed at representing "node labels" and "facets".
>>JDS-2: I think Stella and Antoine are better placed to respond to this accurately.
SKOS only has altLabel and prefLabel relations from a Concept to a Label. From this arises the question whether the same Label may be pref of one Concept and alt of another. Would this be compliant? Yes (maybe not intentionally).

S13: "skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties."
S14: "A resource has no more than one value of skos:prefLabel per language tag."

These only keep you from saying something like:

    <Love> skos:prefLabel "love"@en ; skos:prefLabel "adoration"@en .

or

    <Love> skos:prefLabel "love"@en ; skos:altLabel "love"@en .

But the following is compliant:

    <A> skos:prefLabel "love"@en ; skos:altLabel "adoration"@en .
    <B> skos:prefLabel "adoration"@en ; skos:altLabel "love"@en .

Or even more evident in XL:

    <A> skosxl:prefLabel :love ; skosxl:altLabel :adoration .
    <B> skosxl:prefLabel :adoration ; skosxl:altLabel :love .
    :love skosxl:literalForm "love"@en .
    :adoration skosxl:literalForm "adoration"@en .

SKOS pref/alt of a label is only known in the context of a given Concept, while ISO pref/nonPref is bound to a given label (~term). Right?

>>JDS-2: I agree, and this makes ISO Thesaurus semantics more strict than SKOS (as you demonstrate in the example below).
If you want to have ISO equivalence in SKOS you may express something like:

    prefTerm subClassOf xl:Label .
    nonPrefTerm subClassOf xl:Label .
    prefTerm disjointWith nonPrefTerm .
    xl:prefLabel range prefTerm .
    xl:altLabel range nonPrefTerm .

>>JDS-2: I do not follow these last 2 rules, as they would redefine SKOS-XL.
>>JDS-2: Instead we define ev:EquivalenceRelation, relating a prefTerm and a nonPrefTerm using properties ev:use and ev:uf respectively.
>>JDS-2: ev:use and ev:uf do have range prefTerm and nonPrefTerm respectively.
>>JDS-2: Then we say that :C xl:altLabel :nptC is entailed by:
>>JDS-2:     :C xl:prefLabel :ptC .
>>JDS-2:     :eqr rdf:type ev:EquivalenceRelation ; ev:use :ptC ; ev:uf :nptC .
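The entailment described in the JDS-2 remarks above can be sketched as follows. This is a rough illustration only: triples are plain Python tuples, the class name is spelled `ev:EquivalenceRelation` here, `infer_alt_labels` is a made-up name, and the EUROVOC schema itself is not available in this thread, so the rule is reconstructed from the description.

```python
def infer_alt_labels(triples):
    """Derive (:C, xl:altLabel, :nptC) from

        :C xl:prefLabel :ptC .
        :eqr rdf:type ev:EquivalenceRelation ; ev:use :ptC ; ev:uf :nptC .

    Returns only the newly derived triples.
    """
    triples = set(triples)
    relations = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "ev:EquivalenceRelation"
    }
    inferred = set()
    for rel in relations:
        preferred = {o for s, p, o in triples if s == rel and p == "ev:use"}
        non_preferred = {o for s, p, o in triples if s == rel and p == "ev:uf"}
        for concept, p, label in triples:
            # Only concepts whose preferred label is the USE target qualify.
            if p == "xl:prefLabel" and label in preferred:
                for npt in non_preferred:
                    inferred.add((concept, "xl:altLabel", npt))
    return inferred - triples


data = [
    (":C", "xl:prefLabel", ":ptC"),
    (":eqr", "rdf:type", "ev:EquivalenceRelation"),
    (":eqr", "ev:use", ":ptC"),
    (":eqr", "ev:uf", ":nptC"),
]
alt_labels = infer_alt_labels(data)
```

Unlike declaring ranges on xl:prefLabel and xl:altLabel, this keeps the SKOS-XL properties untouched and puts the ISO-specific semantics in the extension.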
    usedFor subPropertyOf xl:labelRelation ; domain prefTerm ; range nonPrefTerm ; inverseOf use .

and then:

    love a prefTerm . adoration a nonPrefTerm . love usedFor adoration .

? I do not really understand this.

>>JDS-2: I mean the ISO standard went a long way:
>>JDS-2: The BS preparing it defined an XML schema for Thesauri.
>>JDS-2: This covered more than SKOS (this statement is scoped to thesaurus).
>>JDS-2: In addition a model was defined using the Unified Modeling Language (UML).
>>JDS-2: Likewise this model has more specific thesaurus artifacts.

--
Thomas Bandholtz, thomas.bandholtz@..., http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
Label management information in SKOS-XL (continuing from UMTHES and SKOS-XL)
Hi!
I was asked today how I will keep track of a label's source (and other management information) in SKOS... Fortunately, SKOS-XL reifies labels. Looking at ISO 25964, I have:

Attributes of ThesaurusTerm:
- LexicalValue (String, 1): the wording of the term
- identifier (String, 1): a unique identifier for the term
- created (date, 0..1): the date when the term was created
- modified (date, 0..1): the date when the term was last modified
- source (String, 0..1): the person(s) or document(s) from which the term was taken
- Status (String, 0..1): indication of whether the term is candidate, approved, etc.
- lang (language, 0..1): a code showing the language of the term; this should be included if the thesaurus supports more than one language

The subject may be touchy, but one would like to see a standardized way to have SKOS / ISO 25964 interchangeability (something like a SKOS/ISO application profile). This would allow:
1) An ISO 25964 thesaurus editor could unload/reload using SKOS files without information loss.
2) Exchanges between different ISO 25964 thesauri could be done using the SKOS format.
3) SKOS-aware applications could support ISO 25964 "extensions" without parameterization to indicate which RDF attributes contain the supplementary ISO data.
4) SKOS would benefit from the insights of the ISO 25964 design team.

There is a rather striking difference between SKOS flexibility and extensibility (opening ways to unstandardized horizons) and ISO's willingness to build upon the past within a stricter frame. What I am suggesting is to check (and to normalize somewhat) how the complete data model of ISO can be mapped in SKOS(-XL).

By the way, label reification opens the way to labels that are composed from multiple coordinated concepts. A reified label of a coordination concept could include an rdf:Seq. This rdf:Seq would contain strings and/or references to (reified) labels from the different coordinated concepts and/or references to coordination operators (conjunctions). This could generate a dynamic literalForm based on the labels of the different coordinated concepts.

Have a nice evening,
Christophe
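The rdf:Seq idea in this message can be sketched roughly as below. Everything here is hypothetical: the sequence is modelled as a Python list of `(kind, value)` pairs rather than a real rdf:Seq, the label IRIs and the joining with single spaces are assumptions, and `render_literal_form` is a made-up name.

```python
def render_literal_form(sequence, literal_forms):
    """Build a literal form from an ordered sequence of members.

    sequence: list of ("text", string) or ("label", label_iri) pairs,
        standing in for the members of an rdf:Seq; coordination
        operators are represented as plain "text" members.
    literal_forms: dict mapping a label IRI to its current literalForm.
    """
    parts = []
    for kind, value in sequence:
        if kind == "text":
            parts.append(value)
        elif kind == "label":
            # Resolved at render time, so a change to the component
            # label is reflected in the compound label.
            parts.append(literal_forms[value])
        else:
            raise ValueError(f"unknown member kind: {kind!r}")
    return " ".join(parts)


forms = {":teaching": "teaching", ":children": "children"}
seq = [("label", ":teaching"), ("text", "of"), ("label", ":children")]
compound = render_literal_form(seq, forms)
```

Because component labels are looked up when rendering, renaming one coordinated concept's label changes the compound label without editing the sequence.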
RE: Label management information in SKOS-XL (continuing from UMTHES and SKOS-XL)

Hi Christophe,

I provided some in-line considerations.

nice evening
kr, Johan De Smedt.

From: Christophe Dupriez [mailto:christophe.dupriez@...]
Sent: Wednesday, 04 November, 2009 22:06
To: Johan De Smedt
Cc: 'Thomas Bandholtz'; 'SKOS'; 'Antoine Isaac'; Dominique Vanpée
Subject: Label management information in SKOS-XL (continuing from UMTHES and SKOS-XL)

Hi!
I was asked today how I will keep track of a label's source (and other management information) in SKOS...

>>>JDS-3: dc:source would seem a good candidate.
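As a rough sketch of what attaching management information to a reified label could look like: only dc:source comes from the suggestion above. The other property choices (dcterms:identifier, dcterms:created, dcterms:modified) and the `ex:status` property are assumptions made for illustration, as are the lower camel case dictionary keys and the name `term_to_triples`.

```python
# Assumed mapping from ISO 25964 ThesaurusTerm attributes to RDF properties.
# Only "source" -> dc:source was proposed in the thread; the rest is a guess.
ATTRIBUTE_TO_PROPERTY = {
    "identifier": "dcterms:identifier",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "source": "dc:source",
    "status": "ex:status",  # hypothetical extension property
}


def term_to_triples(label_iri, term):
    """Describe one thesaurus term as triples about a skosxl:Label."""
    triples = [
        (label_iri, "rdf:type", "skosxl:Label"),
        # The language code travels with the literal, as a (text, lang) pair.
        (label_iri, "skosxl:literalForm", (term["lexicalValue"], term.get("lang"))),
    ]
    for attribute, prop in ATTRIBUTE_TO_PROPERTY.items():
        if term.get(attribute) is not None:  # optional (0..1) attributes
            triples.append((label_iri, prop, term[attribute]))
    return triples


example = term_to_triples(
    ":childAbuse",
    {
        "lexicalValue": "child abuse",
        "lang": "en",
        "identifier": "T1",  # made-up identifier
        "source": "hypothetical source document",
    },
)
```

Because the statements hang off the label resource rather than the concept, each label of a concept can carry its own source and dates.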
Fortunately, SKOS-XL reifies labels. Looking at ISO 25964, I have:

Attributes of ThesaurusTerm:
- LexicalValue (String, 1): the wording of the term
- identifier (String, 1): a unique identifier for the term
- created (date, 0..1): the date when the term was created
- modified (date, 0..1): the date when the term was last modified
- source (String, 0..1): the person(s) or document(s) from which the term was taken
- Status (String, 0..1): indication of whether the term is candidate, approved, etc.
- lang (language, 0..1): a code showing the language of the term; this should be included if the thesaurus supports more than one language

The subject may be touchy, but one would like to see a standardized way to have SKOS / ISO 25964 interchangeability (something like a SKOS/ISO application profile). This would allow:
1) An ISO 25964 thesaurus editor could unload/reload using SKOS files without information loss.

>>>JDS-3: I think it is feasible to write a SKOS extension that captures the formal model of the ISO standard.
>>>JDS-3: Some time ago, on an earlier version of SKOS-XL, I made an exercise to cover BS8723 (which was input for the ISO standard).
>>>JDS-3: Note this exercise has not been discussed on any forum yet.

2) Exchanges between different ISO 25964 thesauri could be done using the SKOS format.
3) SKOS-aware applications could support ISO 25964 "extensions" without parameterization to indicate which RDF attributes contain the supplementary ISO data.
4) SKOS would benefit from the insights of the ISO 25964 design team.

>>>JDS-3: I support these considerations. It would provide a formal guideline for SKOS - ISO thesaurus transformation.

There is a rather striking difference between SKOS flexibility and extensibility (opening ways to unstandardized horizons) and ISO's willingness to build upon the past within a stricter frame. What I am suggesting is to check (and to normalize somewhat) how the complete data model of ISO can be mapped in SKOS(-XL).

By the way, label reification opens the way to labels that are composed from multiple coordinated concepts. A reified label of a coordination concept could include an rdf:Seq. This rdf:Seq would contain strings and/or references to (reified) labels from the different coordinated concepts and/or references to coordination operators (conjunctions). This could generate a dynamic literalForm based on the labels of the different coordinated concepts.

Have a nice evening,
Christophe
Re: Label management information in SKOS-XL (continuing from UMTHES and SKOS-XL)

On Wed, 4 Nov 2009 at 23:17:06, Johan De Smedt <johan.de-smedt@...> wrote
>Hi Christophe, I provided some in-line considerations
>
>From: Christophe Dupriez [mailto:christophe.dupriez@...]
>Sent: Wednesday, 04 November, 2009 22:06
>
>The subject may be touchy but one could like to see a standardized way
>to have SKOS / ISO 25964 interchangeability (something like an SKOS/ISO
>application profile) .
>
>This would allow:
>1) An ISO 25964 thesaurus editor could unload/reload using SKOS files
>without information losses.
>>>>JDS-3: I think it is feasible to write a SKOS extension that
>>>>captures the formal model of the ISO standard.
>>>>JDS-3: on an earlier version of SKOS-XL, I made an exercise some
>>>>time ago to cover the BS8723 (which was input for the ISO standard)
>>>>JDS-3: note this exercise was never discussed on any forum yet
>2) Exchanges between different ISO 25964 thesauri could be done using the
>SKOS format.
>3) SKOS aware applications could support ISO 25964 "extensions" without
>parameterization to indicate which RDF attributes contains the supplementary
>ISO data.
>4) SKOS would benefit from the insights of ISO 25964 design team.
>>>>JDS-3: I support these considerations. It would provide a formal
>>>>guideline for SKOS - ISO thesaurus transformation.
>There is a rather striking difference between SKOS flexibility and
>extendability (opening ways to unstandardized horizons) and ISO
>willingness to build upon the past within a stricter frame.
>
>What I am suggesting is to check (and to normalize somewhat) how the
>complete data model of ISO can be mapped in SKOS(-XL).

I am encouraged that this issue has been opened again, because as a member of the ISO 25964 working party, I am keen to resolve any divergence between SKOS and that standard.

Part 1 of that standard is still in draft, but will soon be circulated to national standardising bodies for comment - they may be willing to supply copies to interested parties, for a price :-(

Some of the issues were discussed in my message of 13th February 2009 and subsequent discussion
<http://lists.w3.org/Archives/Public/public-esw-thes/2009Feb/0033.html>

The UML model has been slightly modified since then, and I attach a copy of the latest version of the class diagram below. There are notes in the standard which give more detail, but due to ISO copyright restrictions I am very sorry not to be able to make them available; I shall however do my best to clarify any further points if anyone asks.

>By the way, labels reification opens the way to write labels which are
>written from multiple coordinated concepts.
>A reified label of a coordination concept could include an rdf:Seq.
>This rdf:Seq would contain strings and/or refers to (reified) labels from
>the different coordinated concepts and/or refers to coordination operators
>(conjunctions).
>This could generate a dynamic literalForm based on the labels of the
>different coordinated concepts.

This is the main element that is missing to allow the model to represent classification schemes and other forms of pre-coordinated knowledge organisation schemes. Such schemes typically have classes which represent compound concepts, in which concepts from more than one facet are combined, such as an activity and the people who carry out that activity. When changes of facet occur within a classification hierarchy, the relationship is one of synthesis rather than of subordination, and neither SKOS nor the ISO model yet provides for this.

I would like to see this added to our model, and I think that it will probably involve a solution on the lines that Christophe suggests above. It is not just a case of combining labels, though; presumably we have to treat the compound concept as a type of concept in the model, so that we have concepts which are made up of compounds of other, simpler, concepts, combined in a specified sequence, possibly with coordination operators, and with corresponding labels. Would anyone like to try adding this to the model below?

Coordination of concepts has previously been discussed during the development of SKOS but was not followed through because it was "too hard" to deal with within the time available, e.g.
<http://www.w3.org/2004/02/skos/core/proposals.html#coordination-8>
<http://www.w3.org/2006/07/SWD/track/issues/40>

Can we look at it again now?

Regards

Leonard Will

--
Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants    Tel: +44 (0)20 8372 0092
27 Calshot Way                        L.Will@...
ENFIELD                               Sheena.Will@...
EN2 7BQ, UK                           http://www.willpowerinfo.co.uk/