Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

This message is in regard to the discussion related to [this](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0101.html).

When I was implementing the Cerebra OWL reasoner, I came to the firm conclusion that the OWL (1.0) spec was downright broken on this point, and I fear we're in danger of breaking OWL 2.0 in exactly the same way.

Putting aside the issue of whether or not it's possible to use (only) the XML Schema datatypes to represent meaningful and implementable OWL datatype value spaces, I expect that there is consensus that when users were writing `xsd:float` and `xsd:double` without values in OWL 1.0, what they really meant was "any number". No user ever intended to restrict the semantic space to a nowhere-dense number line. If the OWL spec presupposes that most of our users would prefer a number line which does not include 1/3, my choice as an implementor would be to once again ignore the spec and be intentionally non-compliant. Doing what all my users want and expect in this case turns out to be way, way easier than doing what a broken spec would require. Any working group that would produce such a spec would clearly be putting its own interests (ease of spec authoring and political considerations) above its duty to its intended users.

(Note that in the course of the discussion I read on public-owl-wg the notions of "dense" and "continuous" seem to have become confused. I think the notion of density is probably the only one that makes a difference in terms of current OWL semantics, since number restrictions can cause inconsistencies in non-dense number lines, but continuity is really what users have in their heads.)

The [XML Schema datatype spec](http://www.w3.org/TR/xmlschema-2/) is focused on representing particular values, not on classes of values. The notion of "value spaces" is used within the spec, but only in service of representation of values---note that there's not a single value space mentioned which is continuous with respect to the reals, nor are such notions as "rationals" defined. This makes sense in terms of data serialization (the driving XML use case) and standard programming languages (where manipulation of values is the driving use case), but OWL is in a very different situation. The primary OWL use case is reasoning about the emptiness (or size) of value spaces, and the definitions provided in the XML Schema spec do not serve this purpose well.

Note that I'm not saying XML Schema is a bad spec; merely that it addresses different problems than we have.

I strongly encourage the working group to publish a spec which provides for the following types of semantic spaces:

1. A countably infinite, nowhere-dense datatype. I.e. the integers.
2. A countably infinite, dense datatype. I.e. strings.
3. An uncountably infinite, dense, continuous datatype. I.e. the reals.

I don't particularly care what each of these three is called; as long as OWL specifies the internal semantics of these three types of spaces, then it's straightforward to "implement" the datatypes users will actually want in terms of them. But, of course, the ability to use XML Schema Datatypes to encode specific values within each of these spaces would be quite convenient---and would use the XML Schema specification for *exactly* what it's good at.

-rob
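The remark that number restrictions can cause inconsistencies in non-dense number lines can be illustrated with a small Python sketch. It is not from the thread: the helper names `float32_bits`, `count_float32_between` and `midpoint` are made up for illustration, and the sketch assumes `xsd:float` corresponds to IEEE 754 single precision. It counts how many single-precision values lie in an interval (always finitely many) and contrasts that with the rationals, where a further value can always be found between any two.

```python
import struct
from fractions import Fraction


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def count_float32_between(lo: float, hi: float) -> int:
    """Count single-precision values v with lo <= v <= hi.

    Assumes 0 < lo <= hi and that both bounds are exactly representable in
    single precision. For positive finite floats, consecutive values have
    consecutive bit patterns, so the count is a difference of bit patterns.
    """
    return float32_bits(hi) - float32_bits(lo) + 1


def midpoint(a: Fraction, b: Fraction) -> Fraction:
    """Return a rational strictly between a and b (for a != b)."""
    return (a + b) / 2


# 1.0 and its immediate single-precision successor, 1 + 2**-23.
one = 1.0
next_after_one = 1.0 + 2**-23

# Only the two endpoints exist as floats in this interval, so a restriction
# demanding three distinct float values in it could never be satisfied.
floats_in_gap = count_float32_between(one, next_after_one)

# In a dense space there is always another value in between.
between = midpoint(Fraction(one), Fraction(next_after_one))

# No binary floating-point value is exactly one third.
one_third_is_a_float = Fraction(1 / 3) == Fraction(1, 3)
```

Under these assumptions `floats_in_gap` is 2 and `one_third_is_a_float` is false, which is the practical difference between a nowhere-dense and a dense value space when a reasoner has to decide whether a restricted range is empty or too small.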
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Thanks for your comments Rob. It occurs to me that both the XML Schema and the OWL working groups are in progress, that this is an issue that touches both groups, and that having the specifications be able to be read in conjunction without confusion as to their relationship would be beneficial to the overall efforts of the W3C towards harmonization and to the implementors and users of those specifications.

Perhaps we can take advantage of the felicitous timing to ensure that our respective specifications are consistent with each other by ensuring that terms are used in the same way, if necessary adding clarification, and by attempting to have any additional datatype concepts needed for a good OWL specification be incorporated into the XML Schema specification.

Towards the end of understanding the terminology, I've been trying to understand what the value space of XML Schema means, given that it doesn't mean what one would expect in a mathematical sense. Similarly, there seems to be missing an underlying type for the date types - although there is reference to timeOnTimeline, this value type is not surfaced in the type hierarchy.

One thought is whether a correct interpretation is more along the lines of considering the value spaces as data structures. In favor of this interpretation, it is clear that floats and integers are distinct, and the stated influences are Java, SQL, and machine-independent data types. Contrary to this is that it makes it harder to interpret the integers as a restriction of decimals, since representation of arbitrary precision decimals is by a different data structure, and to understand the difference between base64Binary and hexBinary, which are represented on the machine as the same data structure.

Another thought is that the value spaces are another aspect of lexical expression. This would account well for there being a difference between base64Binary and hexBinary, but not explain why these are not pattern facet restrictions on string.

Finally, I wonder if you have comments on a couple of other aspects of datatypes that appear in XML schema. Specifically, data types that are derived by list and time and date types. Clearly such concepts or similar are relevant to OWL given work on, e.g. workflow, or in spatial reasoning. Where do they fit into your view of OWL class space?

-Alan

On Jul 4, 2008, at 12:46 PM, Rob Shearer wrote:

> This message is in regard to the discussion related to [this](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0101.html).
>
> [snip]
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

[Since this discussion is going to other lists, let me explain that I'm a member of the Schema WG, and I work primarily on datatypes. Although this message was directed from Alan Ruttenberg to Rob Shearer, I'm going to inject comments since it went to the general WG lists as well.]

At 4:56 PM -0400 2008-07-04, Alan Ruttenberg wrote:

>Thanks for your comments Rob. It occurs to me that both the XML Schema and the OWL working groups are in progress, that this is an issue that touches both groups, that having the specifications be able to be read in conjunction without confusion as to their relationship would be beneficial to overall efforts of the W3C towards harmonization and the implementors and users of those specifications.
>
>Perhaps we can take advantage of the felicitous timing to ensure that our respective specifications are consistent with each other by ensuring that terms are used in the same way, if necessary adding clarification, and by attempting to have any additional datatype concepts needed for a good OWL specification be incorporated into the XML Schema specification.

I'll leave comments on this aspect to others who are more up on precise development/publication schedules.

>Towards the end of understanding the terminology, I've been trying to understand what the value space of XML Schema means, given that it doesn't mean what one would expect in a mathematical sense.

I'll have to take exception to that. I'm sure it doesn't mean what you would expect in a mathematical sense. But it does very definitely mean what I would expect in a mathematical sense. (Credentials: PhD, U.C. Berkeley, 1965, primarily in Analysis and Foundations of Mathematics; Assistant Professor of Mathematics and Computer Science, and Associate Professor of Mathematics at various times during my career.) So please don't generalize to an arbitrary "one" and imply that that's the only possible reasonable expectation.

>Similarly, there seems to be missing an underlying type for the date types - although there is reference to timeOnTimeline, this value type is not surfaced in the type hierarchy.

I'd very much like to hear how you'd do this; unlike the number datatypes, where I could envisage how to pull them together, I can't envisage a reasonable way for all the d/t datatypes to be derived from a universal one. And I did try.

>One thought is whether a correct interpretation is more along the lines of considering the value spaces as data structures.

I'm curious what you mean by "data structure" here. Reading on, it sounds like you mean various possible machine representations of the values. Let me assure you that that's not what is meant by a value space. In fact, I can think of several extremely different-appearing representations of, for example, the integers, that are nonetheless isomorphic. They are all potential machine representations of the values for the same datatype. XSD does not have anything to say about machine representations, except to say that if an implementation has two different representations of the same value, it is obligated to generally treat them the same.

>Another thought is that the value spaces are another aspect of lexical expression. This would account well for there being a difference between base64Binary and hexBinary, but not explain why these are not pattern facet restrictions on string.

base64Binary and hexBinary are different because they use entirely different lexical mappings. Different lexical mappings mean different datatypes. Except for our decision to paint the two value spaces different colors so we can tell them apart, the value spaces of these two datatypes are the same. (In this case, I suspect that the obvious equality across these two value spaces would not bother anyone. But we weren't going to do that for some obvious datatype pairs and not others.)

They are not pattern-facet restrictions on string for the same reason that float and double are not pattern-facet restrictions on string. The value spaces are different. String values are character strings; the xxxBinary values are bit-strings. Bits aren't characters.

>Finally, I wonder if you have comments on a couple of other aspects of datatypes that appear in XML schema. Specifically, data types that are derived by list and time and date types. Clearly such concepts or similar are relevant to OWL given work on, e.g. workflow, or in spatial reasoning. Where do they fit into your view of OWL class space?

You both should definitely look up the latest Public Working Draft (a Last Call draft) for XSD. I think it might clear up some of the questions, hopefully providing a better understanding or description of list datatypes and date/time datatypes.

--
Dave Peterson
SGMLWorks!
davep@...
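The point about base64Binary and hexBinary (two lexical mappings onto the same underlying bit-strings, with the value spaces "painted different colors") can be sketched in Python. This is an illustration only, not how any XSD processor is implemented: the `TypedValue` class and the sample literals are assumptions introduced here.

```python
import base64
from dataclasses import dataclass

# Two different lexical representations of the same five octets.
hex_lexical = "48656C6C6F"
b64_lexical = "SGVsbG8="

# Each datatype has its own lexical mapping from text to octets.
hex_octets = bytes.fromhex(hex_lexical)
b64_octets = base64.b64decode(b64_lexical)

# The raw octet sequences are identical.
same_octets = hex_octets == b64_octets


@dataclass(frozen=True)
class TypedValue:
    """Hypothetical model of a value tagged with ("painted by") its datatype."""
    datatype: str
    octets: bytes


hex_value = TypedValue("hexBinary", hex_octets)
b64_value = TypedValue("base64Binary", b64_octets)

# Once the datatype tag is part of the value, the two are no longer equal,
# which mirrors the disjoint-value-space decision described above.
same_typed_value = hex_value == b64_value
```

Here `same_octets` is true while `same_typed_value` is false: whether the two literals denote "the same value" depends entirely on whether the datatype tag is treated as part of the value.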
|
|
|
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 5, 2008, at 1:58 AM, Dave Peterson wrote:

>> Towards the end of understanding the terminology, I've been trying to understand what the value space of XML Schema means, given that it doesn't mean what one would expect in a mathematical sense.
>
> I'll have to take exception to that. I'm sure it doesn't mean what you would expect in a mathematical sense. But it does very definitely mean what I would expect in a mathematical sense. (Credentials: PhD, U.C. Berkeley, 1965, primarily in Analysis and Foundations of Mathematics; Assistant Professor of Mathematics and Computer Science, and Associate Professor of Mathematics at various times during my career.) So please don't generalize to an arbitrary "one" and imply that that's the only possible reasonable expectation.

I'm sorry for the overgeneralization and didn't mean to insult. It's just that as much as I think about it, I can't understand the idea that the value space of floats and the value space of decimal are disjoint. Fundamentally these represent some of the same real numbers and this isn't reflected in the spec. In addition, many numbers that can be finitely expressed and calculated with find no place in *any* of the value spaces, e.g. 1/3. It is this sense of "mathematical" that I was referring to.

I have looked at the functions and operators specification. I understand how you come to your previous points about different choice of equality, as the specification promotes decimal to float. As a matter of clarity, I probably would have called the comparison not "equality" but "equality as floats" and "equality as doubles". Considering the definition of equality, I would ask: Is that something someone would do if they weren't constrained to use floating point numbers? It is a perfectly reasonable thing to do if you don't have any more expressive numeric types, as it is a perfectly reasonable thing to do to throw an exception when a multiplication of integers exceeds the limit of the integer datatype. However, we now have libraries that support arbitrary precision integer and rational numbers. Floats can be promoted to the latter without loss of precision, as can decimal. Again, no addressing of this in the spec, nor any theoretical justification of how it is even possible to do an exact (sometimes) promotion of a decimal value to a float value if their value spaces are disjoint.

Maybe there's a way to make sense of this. I'm trying.

To offer a concrete suggestion (I'll get to putting something into the bug tracker...), and speaking to the possibility of harmonizing the OWL specification and the XSD specification, something to consider would be to add xsd:real and xsd:rational. This could at least prevent the (strong) possibility of OWL defining those types itself. Personally, I think it would be cleaner to have all the numeric types handled in the XML Schema documents. I realize that this might be a bit of work, but at least that work would have interested parties from both the OWL and XSD WGs.

I'd also consider reviewing the part of the spec that says:

> Should a derivation be made using a derivation mechanism that removes ·lexical representations· from the ·lexical space· to the extent that one or more values cease to have any ·lexical representation·, then those values are dropped from the ·value space·.

I've still no understanding of why that is a desirable thing to do, and we've discussed aspects that some might consider undesirable.

>> Similarly, there seems to be missing an underlying type for the date types - although there is reference to timeOnTimeline, this value type is not surfaced in the type hierarchy.
>
> I'd very much like to hear how you'd do this; unlike the number datatypes, where I could envisage how to pull them together, I can't envisage a reasonable way for all the d/t datatypes to be derived from a universal one. And I did try.

I had in mind subtyping the dates into those with and those without a timezone, and having each descend from a separate timeOnTimeline.

>> One thought is whether a correct interpretation is more along the lines of considering the value spaces as data structures.
>
> I'm curious what you mean by "data structure" here. Reading on, it sounds like you mean various possible machine representations of the values. Let me assure you that that's not what is meant by a value space. In fact, I can think of several extremely different-appearing representations of, for example, the integers, that are nonetheless isomorphic. They are all potential machine representations of the values for the same datatype. XSD does not have anything to say about machine representations, except to say that if an implementation has two different representations of the same value, it is obligated to generally treat them the same.

Again, it is trying to wrestle with the disjointness of float and decimal value spaces that is leading me to look for some explanation. While XSD does not explicitly speak about machine representation, that does not mean that those concepts do not (overly) influence the specification. To explain myself a bit further on this kind of analysis - I spend a lot of time developing ontologies, and searching for unspoken, but operant, knowledge and constraint and then exposing it is a common aspect of this work.

What I specifically meant by data structure in this case was the little data structure that is a floating point number, composed of parts: integer mantissa, integer exponent, sign bit, plus some symbol encodings. I compared that to integer, which doesn't have these parts. However, decimal seems to necessarily be composed of different kinds of parts.

>> Another thought is that the value spaces are another aspect of lexical expression. This would account well for there being a difference between base64Binary and hexBinary, but not explain why these are not pattern facet restrictions on string.
>
> base64Binary and hexBinary are different because they use entirely different lexical mappings. Different lexical mappings mean different datatypes.

But not disjoint value spaces.

> Except for our decision to paint the two value spaces different colors so we can tell them apart,

Why would one want to tell them apart? Why not consider a single lexical mapping that has a disjunction? More than one lexical representation can map to the same float; more than one lexical representation of a bit sequence can map to it.

> the value spaces of these two datatypes are the same. (In this case, I suspect that the obvious equality across these two value spaces would not bother anyone. But we weren't going to do that for some obvious datatype pairs and not others.

It's the obviousness, and the spec's decision to not respect that obviousness, that is my concern.

> They are not pattern-facet restrictions on string for the same reason that float and double are not pattern-facet restrictions on string. The value spaces are different. String values are character strings; the xxxBinary values are bit-strings. Bits aren't characters.

Fair enough. My mistake.

>> Finally, I wonder if you have comments on a couple of other aspects of datatypes that appear in XML schema. Specifically, data types that are derived by list and time and date types. Clearly such concepts or similar are relevant to OWL given work on, e.g. workflow, or in spatial reasoning. Where do they fit into your view of OWL class space?
>
> You both should definitely look up the latest Public Working Draft (a Last Call draft) for XSD. I think it might clear up some of the questions, hopefully providing a better understanding or description of list datatypes and date/time datatypes.

Have been. Will be doing more.

-Alan
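The claim that floats and decimals can both be promoted to rationals without loss of precision, and the distinction between "equality" and "equality as floats", can be checked with Python's standard `fractions` and `decimal` modules. This is only a sketch: Python's `float` is a double, so it stands in for `xsd:double`, and the variable names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

# The decimal literal 0.1 and the double nearest to 0.1.
decimal_value = Decimal("0.1")
double_value = 0.1

# Exact promotion of each to a rational number; no rounding takes place.
exact_decimal = Fraction(decimal_value)  # exactly 1/10
exact_double = Fraction(double_value)    # a nearby ratio with a power-of-two denominator

# "Equality as doubles": promote the decimal to a double, then compare.
equal_as_doubles = float(decimal_value) == double_value

# Equality as exact rationals: compare the two promoted values directly.
equal_as_rationals = exact_decimal == exact_double

# Some values are shared by both spaces: 0.5 is the same rational either way.
half_agrees = Fraction(Decimal("0.5")) == Fraction(0.5)
```

With these definitions `equal_as_doubles` is true but `equal_as_rationals` is false, while `half_agrees` is true. Comparison in a common rational space thus gives a single notion of equality that treats 0.5 as shared between the two datatypes and 0.1 as distinct from its nearest double.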
|
|
RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

> >Similarly, there seems to be missing an underlying type for the date types - although there is reference to timeOnTimeline, this value type is not surfaced in the type hierarchy.
>
> I'd very much like to hear how you'd do this; unlike the number datatypes, where I could envisage how to pull them together, I can't envisage a reasonable way for all the d/t datatypes to be derived from a universal one. And I did try.

Since the types date, time, dateTime, gYear, gYearMonth, gMonth, gMonthDay, and gDay are disjoint in both their value spaces and lexical spaces, I would have thought it quite easy to define a primitive type that is essentially the union of all of these (it might or might not be abstract), and derive these 8 types from this new type by restriction. Where exactly is the difficulty?

The QT operations on dates and times could be greatly simplified if this were done (well, perhaps not retrospectively...)

> base64Binary and hexBinary are different because they use entirely different lexical mappings. Different lexical mappings mean different datatypes.

This is certainly an unfortunate feature of the system. Clearly one would like all operations defined on one of these types to be equally applicable to the other. Having two different external representations of the values is really a very weak justification for making them different types. Of course it's too late to change this; but I'm sure it could have been done better. I would hope that if we introduced hexadecimal notation as an alternative lexical representation of integers we would find some way of doing it that didn't involve introducing a new primitive type.

Michael Kay
http://www.saxonica.com/
|
|
|
|
|
RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

At 12:41 PM +0100 2008-07-05, Michael Kay wrote:

>Since the types date, time, dateTime, gYear, gYearMonth, gMonth, gMonthDay, and gDay are disjoint in both their value spaces and lexical spaces, I would have thought it quite easy to define a primitive type that is essentially the union of all of these (it might or might not be abstract), and derive these 8 types from this new type by restriction. Where exactly is the difficulty?

I don't see that moments in time, segments of time, and repeating intervals make up a sensible datatype. That's my particular problem with the idea. E.g., how does one define order? Is 14:00:00 less than or equal to 1997?

However, it could be done, even if the value space seemed to contain apples and oranges, so to speak. Just as the anySimpleType and anyAtomicType are artificially constructed datatypes. Why hasn't it been suggested before?

I'm curious how the simplification would be effected for QT.

--
Dave Peterson
davep@...
|
|
RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

> I don't see that moments in time, segments of time, and repeating intervals make up a sensible datatype. That's my particular problem with the idea.

Well, one can certainly conceive of a generalization of these types that is a three-dimensional space whose axes are the start instant (perhaps unknown), the duration (perhaps zero), and the interval between repeats (perhaps infinite). Alternatively, and perhaps more conveniently, you can think of it as a seven-dimensional space containing year, month, day, hour, minute, second, and timezone-offset, allowing components at either end to be omitted, where the absence of a high-order component indicates a repeating interval and the absence of a low-order component indicates a time span.

> E.g., how does one define order? Is 14:00:00 less than or equal to 1997?

You could define an ordering (if you wanted to) by filling in the gaps, treating 14:00:00 as say 0000-01-01T14:00:00 and 1997 as 1997-01-01T00:00:00. Or you could say that the new primitive type is unordered, only the subtypes are ordered, as we do with the two duration subtypes.

> I'm curious how the simplification would be effected for QT.

Difficult to do retrospectively, but with such a type, instead of XSLT defining three functions format-date, format-time, and format-dateTime, it could have defined a single function which would work perfectly well on all eight types, as well as on other logically-consistent subtypes like gHourMinute.

Michael Kay
http://www.saxonica.com/
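The "fill in the gaps" ordering suggested above can be sketched in Python. This is a hypothetical model, not anything defined by XSD or XSLT: the `PartialDateTime` class and its `filled` method are invented for illustration, the timezone-offset component is left out to keep the sketch short, and the fill-in defaults (year 0000, month 01, day 01, time 00:00:00) are taken from the example in the message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Values substituted for omitted components, in the order
# (year, month, day, hour, minute, second).
_DEFAULTS = (0, 1, 1, 0, 0, 0)


@dataclass(frozen=True)
class PartialDateTime:
    """Hypothetical union type: any component may be omitted (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None
    minute: Optional[int] = None
    second: Optional[int] = None

    def filled(self) -> Tuple[int, ...]:
        """Return all six components with gaps replaced by the defaults."""
        parts = (self.year, self.month, self.day,
                 self.hour, self.minute, self.second)
        return tuple(d if p is None else p for p, d in zip(parts, _DEFAULTS))

    def __lt__(self, other: "PartialDateTime") -> bool:
        return self.filled() < other.filled()

    def __le__(self, other: "PartialDateTime") -> bool:
        return self.filled() <= other.filled()


# The two values from the question "Is 14:00:00 less than or equal to 1997?"
two_pm = PartialDateTime(hour=14, minute=0, second=0)
year_1997 = PartialDateTime(year=1997)

# 14:00:00 becomes 0000-01-01T14:00:00 and 1997 becomes 1997-01-01T00:00:00.
answer = two_pm <= year_1997
```

Under this particular convention `answer` is true. A different choice of defaults would give a different order, which is one reason the alternative of leaving the union type unordered is also offered in the message.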
|
|
RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

At 10:13 AM +0100 2008-07-06, Michael Kay wrote:

>> I don't see that moments in time, segments of time, and repeating intervals make up a sensible datatype. That's my particular problem with the idea.
>
>Well, one can certainly conceive of a generalization of these types that is a three-dimensional space whose axes are the start instant (perhaps unknown), the duration (perhaps zero), and the interval between repeats (perhaps infinite). Alternatively, and perhaps more conveniently, you can think of it as a seven-dimensional space containing year, month, day, hour, minute, second, and timezone-offset, allowing components at either end to be omitted, where the absence of a high-order component indicates a repeating interval and the absence of a low-order component indicates a time span.
>
>> E.g., how does one define order? Is 14:00:00 less than or equal to 1997?
>
>You could define an ordering (if you wanted to) by filling in the gaps, treating 14:00:00 as say 0000-01-01T14:00:00 and 1997 as 1997-01-01T00:00:00. Or you could say that the new primitive type is unordered, only the subtypes are ordered, as we do with the two duration subtypes.
>
>> I'm curious how the simplification would be effected for QT.
>
>Difficult to do retrospectively, but with such a type, instead of XSLT defining three functions format-date, format-time, and format-dateTime, it could have defined a single function which would work perfectly well on all eight types, as well as on other logically-consistent subtypes like gHourMinute.

Good ideas all. Fodder for Schema 2.0, I'd say. It takes time to think these things out; equality didn't diverge from identity in 1.0 because we didn't have time to think out the ramifications.

Sigh-- even standards creation is a publish-or-perish world, and if a version of the standard doesn't get out the door in a reasonable time, even if the possible improvements haven't been thought out yet, the creating standards group finds its resources gone and no standard at all gets out. One does the best one can, and hopes one hasn't closed off too many useful possibilities for the next round--or left things totally screwed up by not closing up some loopholes that leave the standard useless. A fine balancing act.

(This, of course, is preaching to the choir WRT Mike Kay himself; he's been involved in the production of at least several standards.)

--
Dave Peterson
SGMLWorks!
davep@...
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrumOn Jul 5, 2008, at 1:04 PM, Rob Shearer wrote: >>> I'm providing you with my experience: every user I've ever spoken >>> to about this topic has wanted the real number line. >>> They are used to using the xsd datatypes `float` and `double` to >>> represent number values, so they use these without values in OWL >>> to mean "some number". >> >> Do they mean bounded numbers? (i.e. with min and max sizes?) Do >> they distinguish between double and float? Do they care about >> NaNs? (Alan's users care about the latter.) > > Whether it's "forall R > 1.0^^xsd:float" or "forall R `xsd:float`" > they seem to intend a dense number line. So you had user defined restrictions on floats, interesting. > In the first case `float` is just the easiest way to specify the > value; in the second you can certainly argue that they should have > used `decimal`...but that's a pointless argument because my > reasoner didn't really support decimal. That's interesting. I think part of what we need to is select a set of sane datatypes to require. String, Integer, reals seem reasonable. >>> My experience is that the use of xsd datatypes as value spaces in >>> OWL 1.0 causes users to write what they don't mean. >> >> For me, this would suggest removing them or enforcing them more >> clearly. > > I'd suggest removing them. That's where I'm heading too. >>> My experience is that *every* ontology using `xsd:float` and >>> `xsd:double` without values would be better off using >>> `xsd:decimal`, but that the user intent was "some real >>> number" (and I should note that I'm against requiring support for >>> `xsd:decimal` values). >> >> Values? Or the datatype? In OWL 1, all these types were optional >> and poorly speced and had no documentation whatsoever. Part of the >> goal here is to spec well and document clearly any types we require. 
> > I would like to use doubles internally to represent points on the > real number line. For what lexical syntax? > Some homogeneous mix of internal representations is a pain. And I > seriously doubt that many users really care about the extra > representation power of `decimal`. It makes sense as an optional > feature reasoners can support, but it seems completely unnecessary > to require it in the spec---it's exactly the sort of thing I'd put > off implementing indefinitely under users asked for it. > > The reason `decimal` keeps coming up is just that it's dense. That's true. But there are several issues floating about, including the possibility of interaction between floats and cardinality. It seems to me that for most users, that will be a rare occurrence, even accidently. It certainly requires ranges of floats (since it's unlikely that the cardinalities required to cause a problem would be feasible anyway). E.g., if we had unbounded binary numbers then such floats would be no harder than integers. > So are we using the xsd spec as an excuse to conflate density with > complex internal representations? I don't think so. [snip] > Referring any user over to that spec to understand value spaces is > obnoxious and counter-productive: We definitely don't intend to do that, I hope. Part of our current effort is to make sure we carefully document the types we require and/ or sanction. > even WG members seem to be having trouble grokking it. (And bravo > to anyone making the pedantic point that a particular value is a > degenerate value space.) > > I contend that OWL users only want a tiny tiny number of different > value spaces to play with: integers, strings, and reals. I certainly agree that these are key. I think the group agrees too. The other types are something of a legacy. > It is possible, however, they they will want a larger number of > ways to lexically represent particular values within these three > spaces. This wouldn't surprise me at all. 
> Most importantly, I do not think there is necessarily a direct correlation between the lexical representations used to represent particular values and the value spaces in which those particular values live. I.e. users want to be able to specify particular values within the `real` value space using `xsd:float`,

You mean the type name or the lexical syntax (e.g., "12.78e-2")? I'm personally more comfortable with allowing the latter than pushing "xsd:float" as a synonym for the real value space. Your mileage obviously varies.

> but they do *not* have any interest in use of the `xsd:float` value space.

Some do at least to the extent of wanting NaN (and perhaps -0). I'd personally prefer not to shove them into the real type (certainly NaN; I suppose we could make our reals the affine reals and handle +inf).

> Thus we've got two orthogonal concepts which happen to coincide for strings and integers but not for real numbers.
>
> My proposed solution would be to use brand-new OWL names for all value spaces, but use xsd syntax to specify particular values.

Could you say what you think the lexical space of the reals should include? At least, as a first cut? (It seems decimal, scientific, and rational notation would all be useful, the first two for common ways of writing and the third for full coverage of the rationals.)

[snipped lots of useful details]

Thanks very much for those. I find them extremely helpful.

> Thanks for the feedback.

>> Cheers,
>> Bijan.

> And if you're going to request further comment from a member of the public, could you please do it on a list to which the public can post? Shifting back to the WG list excludes me from comment.

D'oh! Sorry. That was an accident. My apologies.

> (Which is fine if you don't address questions directly to me.)

Thanks again for the discussion.

Cheers,
Bijan.
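The interaction between floats and cardinality raised in this exchange can be made concrete with a small Python sketch. It is not from the thread: it assumes that the `xsd:float` value space is IEEE 754 single precision, and the helper names `float32_bits` and `count_float32_between` are hypothetical, chosen only for illustration. It counts how many float values lie in a closed interval, something that is always infinite on a dense number line.

```python
import struct

def float32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 single precision, as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def count_float32_between(lo: float, hi: float) -> int:
    """Count single-precision values v with lo <= v <= hi.

    Assumes 0 < lo <= hi and that both bounds are exactly representable
    as float32; for positive floats, bit patterns sort like the values.
    """
    return float32_bits(hi) - float32_bits(lo) + 1

lo, hi = 1.0, 1.0 + 2.0 ** -22   # hi is two float32 steps above 1.0
print(count_float32_between(lo, hi))   # prints 3
```

Under these assumptions the interval from 1.0 to 1.0 + 2^-22 holds only three float values, so a restriction asking for four distinct values in that range would be unsatisfiable over floats while remaining satisfiable over the reals or over `xsd:decimal`.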
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

>> Most importantly, I do not think there is necessarily a direct correlation between the lexical representations used to represent particular values and the value spaces in which those particular values live. I.e. users want to be able to specify particular values within the `real` value space using `xsd:float`,

> You mean the type name or the lexical syntax (e.g., "12.78e-2")?

XSD offers a lexical syntax for points that happen to lie on the real number line---that's what I suggest using it for. The easiest approach is that xsd names on their own are not valid "datatypes"; particular values encoded using xsd, however, are (because particular values are single-element value spaces).

> I'm personally more comfortable with allowing the latter than pushing "xsd:float" as a synonym for the real value space. Your mileage obviously varies.

>> but they do *not* have any interest in use of the `xsd:float` value space.

> Some do at least to the extent of wanting NaN (and perhaps -0). I'd personally prefer not to shove them into the real type (certainly NaN; I suppose we could make our reals the affine reals and handle +inf).

I'd endorse including only one zero, but I agree there's an issue with NaN. My principled stand is that it's inconsistent (a value space of size zero), but I'd definitely want to analyze the use cases to see who loses important functionality from that decision.

But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident". That's the situation we've fallen into with floats in OWL 1.0.

>> Thus we've got two orthogonal concepts which happen to coincide for strings and integers but not for real numbers.
>>
>> My proposed solution would be to use brand-new OWL names for all value spaces, but use xsd syntax to specify particular values.
> Could you say what you think the lexical space of the reals should include?

I don't know what you mean by "lexical space of the reals". I don't propose defining the reals lexically; I propose defining the value space mathematically. But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values. I expect most implementations will also support points represented as `xsd:double` and `xsd:long` as well. I do *not* think a conformant implementation should have to deal with arbitrary points represented as `xsd:decimal` (since the vast majority of users don't need the extra representational power, and there is substantial implementation burden and performance penalty for dealing with such values correctly).

> At least, as a first cut? (It seems decimal, scientific, and rational notation would all be useful, the first two for common ways of writing and the third for full coverage of the rationals.)

The WG should consider that some implementations might allow lots of xsd syntaxes but lose precision on some of them (allow use of `xsd:decimal` in ontology files for user convenience, but convert them to floats during parsing)---thus a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful. My big concern here is that an ontology will be developed and tested with a reasoner with "full" `xsd:decimal` support but then when it's used with an implementation with "imprecise" `xsd:decimal` support everything goes pear-shaped. Spitting out warnings during parsing isn't a great solution...

And of course some implementations might offer additional value spaces as well, but I'd like the spec to make it very clear that this is a very different thing than the above. For one thing, I'd suggest outlawing any use of names within the xsd namespace for value spaces, even spaces implementors have added as extensions. "Support for `xsd:decimal`" should mean `xsd:decimal` syntax for points on the real number line and nothing else.

-rob
|
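The concern about "imprecise" `xsd:decimal` support can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not a description of any real reasoner: the hypothetical helper `to_float32` stands in for an implementation that accepts decimal literals but stores them as single-precision floats, and it rounds through a Python double first, which is close enough for this purpose.

```python
import struct
from decimal import Decimal

def to_float32(lexical: str) -> float:
    """Parse a decimal literal and round it to single precision.

    Hypothetical stand-in for a parser that converts xsd:decimal
    literals to floats; goes via a double, then rounds to float32.
    """
    return struct.unpack(">f", struct.pack(">f", float(lexical)))[0]

a, b = "0.1", "0.10000000001"

# With exact decimal support the two literals denote different points.
exact_distinct = Decimal(a) != Decimal(b)

# After conversion to float32 they collapse onto the same value.
lossy_equal = to_float32(a) == to_float32(b)

print(exact_distinct, lossy_equal)
```

An ontology that relies on those two literals being different values would behave one way with exact decimal support and another way with the lossy conversion, which is the kind of silent divergence the message warns about.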
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

FYI

When I designed the mKR language, I purposely avoided placing any constraints on the space,time,view specification of context. This permits the user to choose whatever level of detail is appropriate in a given situation. The resulting descriptions are always useful, and sometimes just plain fun!

Some of my specifications:

space, time = here, now
time = past, present, future
time = yesterday, today
space = my house, the store
view = Aristotle, feminist
view = RDF, OWL, mKR, CycL, Amazon, Google

Dick

----- Original Message -----
From: "Dave Peterson" <davep@...>
To: "Michael Kay" <mike@...>; "'Alan Ruttenberg'" <alanruttenberg@...>; "'Rob Shearer'" <rob.shearer@...>
Cc: <public-webont-comments@...>; <public-owl-wg@...>; <www-xml-schema-comments@...>
Sent: Sunday, July 06, 2008 6:51 AM
Subject: RE: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

> At 10:13 AM +0100 2008-07-06, Michael Kay wrote:

>>> I don't see that moments in time, segments of time, and repeating intervals make up a sensible datatype. That's my particular problem with the idea.

>> Well, one can certainly conceive of a generalization of these types that is a three-dimensional space whose axes are the start instant (perhaps unknown), the duration (perhaps zero), and the interval between repeats (perhaps infinite). Alternatively, and perhaps more conveniently, you can think of it as a seven-dimensional space containing year, month, day, hour, minute, second, and timezone-offset, allowing components at either end to be omitted, where the absence of a high-order component indicates a repeating interval and the absence of a low-order component indicates a time span.

>> E.g., how does one define order? Is 14:00:00 less than or equal to 1997?
>> You could define an ordering (if you wanted to) by filling in the gaps, treating 14:00:00 as say 0000-01-01T14:00:00 and 1997 as 1997-01-01T00:00:00. Or you could say that the new primitive type is unordered, only the subtypes are ordered, as we do with the two duration subtypes.

>>> I'm curious how the simplification would be effected for QT.

>> Difficult to do retrospectively, but with such a type, instead of XSLT defining three functions format-date, format-time, and format-dateTime, it could have defined a single function which would work perfectly well on all eight types, as well as on other logically-consistent subtypes like gHourMinute.

> Good ideas all. Fodder for Schema 2.0, I'd say. It takes time to think these things out; equality didn't diverge from identity in 1.0 because we didn't have time to think out the ramifications. Sigh--even standards creation is a publish-or-perish world, and if a version of the standard doesn't get out the door in a reasonable time, even if the possible improvements haven't been thought out yet, the creating standards group finds its resources gone and no standard at all gets out.
>
> One does the best one can, and hopes one hasn't closed off too many useful possibilities for the next round--or left things totally screwed up by not closing up some loopholes that leave the standard useless. A fine balancing act.
>
> (This, of course, is preaching to the choir WRT Mike Kay himself; he's been involved in the production of at least several standards.)
> --
> Dave Peterson
> SGMLWorks!
>
> davep@...
>
> http://mKRmKE.org/

Ayn Rand do speak od mKR done;
knowledge := man do identify od existent done;
knowledge haspart proposition list;
mKE do enhance od "Real Intelligence" done;
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 6, 2008, at 8:07 PM, Rob Shearer wrote:

>>> Most importantly, I do not think there is necessarily a direct correlation between the lexical representations used to represent particular values and the value spaces in which those particular values live. I.e. users want to be able to specify particular values within the `real` value space using `xsd:float`,

>> You mean the type name or the lexical syntax (e.g., "12.78e-2")?

> XSD offers a lexical syntax for points that happen to lie on the real number line

It offers several and we're free to define one for owl:real. If we use any decimal notation, we have exactness problems (e.g., 1/3), but decimal is very user friendly. So, I was thinking that the valid syntax for a real would be decimal floating points and ratios of integers. We could include scientific notation as well.

> ---that's what I suggest using it for. The easiest approach is that xsd names on their own are not valid "datatypes"; particular values encoded using xsd, however, are (because particular values are single-element value spaces).

>> I'm personally more comfortable with allowing the latter than pushing "xsd:float" as a synonym for the real value space. Your mileage obviously varies.

>>> but they do *not* have any interest in use of the `xsd:float` value space.

>> Some do at least to the extent of wanting NaN (and perhaps -0). I'd personally prefer not to shove them into the real type (certainly NaN; I suppose we could make our reals the affine reals and handle +inf).

> I'd endorse including only one zero, but I agree there's an issue with NaN.

And the infinities, though we could always go for the affine real line.
> My principled stand is that it's inconsistent (a value space of size zero), but I'd definitely want to analyze the use cases to see who loses important functionality from that decision.
>
> But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident".

Well, you'll get exactness holes with binary or decimal notation, regardless of density issues.

> That's the situation we've fallen into with floats in OWL 1.0.

>>> Thus we've got two orthogonal concepts which happen to coincide for strings and integers but not for real numbers.
>>>
>>> My proposed solution would be to use brand-new OWL names for all value spaces, but use xsd syntax to specify particular values.

>> Could you say what you think the lexical space of the reals should include?

> I don't know what you mean by "lexical space of the reals".

XSD datatypes have a lexical space (e.g., the syntax) and a value space. You are suggesting, I thought, that we adopt a value space that is the reals and something about using xsd syntax (i.e., lexical spaces) for the syntax.

XSD offers exact syntax only for binary and decimals (I believe it's exact for binary). I was wondering what sort of lexical space you want.

> I don't propose defining the reals lexically;

Sure.

> I propose defining the value space mathematically.

Well, of course. But that's what XSD does as well. The decimals are a well defined mathematical set.

> But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values.

So you want a very broad lexical space for our real type, i.e., "1", "1.0", and "12.78e-2".
If we want exactness for the rationals, we need either to allow repeating (e.g., 0.333repeating) (usually done with a macron) or fraction syntax (e.g., 1/3).

> I expect most implementations will also support points represented as `xsd:double` and `xsd:long` as well.

You mean their syntax, i.e., their lexical space.

(Sorry for using the XSD terminology, but I think it's a bit clearer if we stick to it for the moment.)

> I do *not* think a conformant implementation should have to deal with arbitrary points represented as `xsd:decimal` (since the vast majority of users don't need the extra representational power, and there is substantial implementation burden and performance penalty for dealing with such values correctly).

Given that more and more languages (e.g., Java) now bundle a decimal type with their core libraries, I'm not so clear on the first. I'd like to hear more about the second.

>> At least, as a first cut? (It seems decimal, scientific, and rational notation would all be useful, the first two for common ways of writing and the third for full coverage of the rationals.)

> The WG should consider that some implementations might allow lots of xsd syntaxes but lose precision on some of them (allow use of `xsd:decimal` in ontology files for user convenience, but convert them to floats during parsing)

Obviously, this can cause quite serious interoperability problems. So I'm inclined against it on first blush.

> ---thus a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful.

This is what we're after. Anything we spec will be tightly specced. At the moment, we only have required and optional as modalities of support. I think supporting various levels of precision (or variant mapping) would be quite hard to understand.
> My big concern here is that an ontology will be developed and tested with a reasoner with "full" `xsd:decimal` support but then when it's used with an implementation with "imprecise" `xsd:decimal` support everything goes pear-shaped.

That would be bad :) There could be subtler problems if people mapped decimal syntax to binary in variant ways (i.e., which float do you take 0.1 to?)

> Spitting out warnings during parsing isn't a great solution...
>
> And of course some implementations might offer additional value spaces as well, but I'd like the spec to make it very clear that this is a very different thing than the above. For one thing, I'd suggest outlawing any use of names within the xsd namespace for value spaces, even spaces implementors have added as extensions. "Support for `xsd:decimal`" should mean `xsd:decimal` syntax for points on the real number line and nothing else.

This doesn't seem likely. Existing implementations already do different things with different xsd types. It'll be very hard to get buy-in from the RDF community. It seems like a more likely strategy is to fix a (required) set of OWL types (or core types) which are easy to understand and robust with respect to intuitive behavior, and leave the more specialized types for future people to standardize.

On this model, users would just have to decide between integers and reals. We could have quite a wide lexical space for reals (and even for integers, i.e., allow 1.0 to mean the integer 1).

But "0.1"^^xsd:float would not be required, but also we wouldn't change the meaning along the lines you suggest (we'd just be silent about it). It's fairly simple to migrate old ontologies to the new one with a simple converter. If enough implementations did it silently, that would be information for a future group.

Thanks again.

Cheers,
Bijan.
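The "wide lexical space" idea discussed in this message (integer, decimal, scientific, and fraction notation all naming points on one number line) can be sketched in Python. This is only an illustration of the proposal as discussed here, not a definition of any `owl:real` datatype; the function name `real_value` is hypothetical, and Python's `fractions.Fraction` is used because its string constructor happens to accept all four notations and keeps the value exact.

```python
from fractions import Fraction

def real_value(lexical: str) -> Fraction:
    """Map a numeric literal to an exact rational value.

    Hypothetical helper: accepts forms such as "1", "1.0",
    "12.78e-2" and "1/3", as understood by fractions.Fraction.
    """
    return Fraction(lexical.strip())

# Different lexical forms, same point on the number line.
print(real_value("1") == real_value("1.0") == real_value("1.0e0"))

# Scientific notation is converted exactly, with no binary rounding.
print(real_value("12.78e-2"))

# Fraction syntax covers rationals that no finite decimal can name.
print(real_value("1/3") * 3 == 1)
```

The design point the sketch makes is that the value is independent of the notation used to write it: several spellings of 1 compare equal, while 1/3 is reachable only through the fraction form.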
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

>> XSD offers a lexical syntax for points that happen to lie on the real number line

> It offers several and we're free to define one for owl:real. If we use any decimal notation, we have exactness problems (e.g., 1/3), but decimal is very user friendly. So, I was thinking that the valid syntax for a real would be decimal floating points and ratios of integers. We could include scientific notation as well.

Why on earth would the OWL group come up with their own syntax for encoding numbers? The XSchema guys have already done that, and people have implemented parsers for their spec. If there's going to be a syntax for rationals or algebraics, then that seems to be right up their alley.

>> But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident".

> Well, you'll get exactness holes with binary or decimal notation, regardless of density issues.

I thought I had made my proposal clear on this: the value space does not have holes. The representations supported for particular values are not sufficient to address all the points in that space, but the space itself does *not* have holes.

>> I don't know what you mean by "lexical space of the reals".

> XSD datatypes have a lexical space (e.g., the syntax) and a value space. You are suggesting, I thought, that we adopt a value space that is the reals and something about using xsd syntax (i.e., lexical spaces) for the syntax.

For the syntax of particular values. I keep trying to stress that value spaces should be kept separate from the syntax used for particular values.

> XSD offers exact syntax only for binary and decimals (I believe it's exact for binary). I was wondering what sort of lexical space you want.

XSD offers a well-defined mapping from lexical representation to IEEE floats.
XSD defines an *exact* value for each valid lexical representation. You may not like the way the mapping is defined (because the value of "1.1e0^^xsd:float" on the real number line is not equal to the value of "1.1^^xsd:decimal"), but there is no imprecision whatsoever about what each string represents. I am satisfied with the work the XSchema group did on floating-point lexical representations.

>> But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values.

> So you want a very broad lexical space for our real type, i.e., "1", "1.0", and "12.78e-2".

No. I want `real` to be a value space with no lexical connotations. I want to be able to specify a particular point in this value space using a string such as "1.0e0^^xsd:float". The XSD lexical forms are not "the lexical space for reals". There is no such thing as "the lexical space for reals". There is such a thing as "the space of lexical representations which a conformant implementation must support for particular values in the real value space", but this space is much smaller than the real value space.

> If we want exactness for the rationals, we need either to allow repeating (e.g., 0.333repeating) (usually done with a macron) or fraction syntax (e.g., 1/3).

I don't intend to support exactness for rationals. A conformant implementation should only be required to provide exact support for `xsd:int` and `xsd:float` values.

>> I expect most implementations will also support points represented as `xsd:double` and `xsd:long` as well.

> You mean their syntax, i.e., their lexical space.

Supporting these syntaxes means that reasoners must also support reasoning with the particular values representable in those syntaxes. Support for additional syntaxes does not change the underlying semantics of the real number line, but it might make implementation of those semantics a bit harder.
> (Sorry for using the XSD terminology, but I think it's a bit clearer if we stick to it for the moment.)

>> I do *not* think a conformant implementation should have to deal with arbitrary points represented as `xsd:decimal` (since the vast majority of users don't need the extra representational power, and there is substantial implementation burden and performance penalty for dealing with such values correctly).

> Given that more and more languages (e.g., Java) now bundle a decimal type with their core libraries, I'm not so clear on the first.

I'm not sure Java is an example of "more and more languages". In fact it is the flagship "you only ever need one language" proposal. And even in super-OO Java you have to program differently if you're going to play with polymorphic numbers than you would if you stuck to ints and floats.

I'd like to write a distributed OWL reasoner in Erlang. But Javascript and C are perhaps more persuasive counterexamples to your argument.

> I'd like to hear more about the second.

The most efficient bignum and decimal libraries are an order of magnitude slower than corresponding int and float calculations. Hardware is good with ints and floats.

>> ---thus a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful.

> This is what we're after. Anything we spec will be tightly specced. At the moment, we only have required and optional as modalities of support. I think supporting various levels of precision (or variant mapping) would be quite hard to understand.

But presumably you're making clear that implementations which implement some "optional" functionality, but do so in a way which contradicts the optional semantics, are non-compliant. If so, then specifying what support for additional lexical representations means (i.e. exact) would make clear that a product which parsed `xsd:decimal` but internally converted to floating point would not "support `xsd:decimal`" by the terms of the OWL 2.0 spec.
The implementors could always claim "partial support", however.

> On this model, users would just have to decide between integers and reals. We could have quite a wide lexical space for reals (and even for integers, i.e., allow 1.0 to mean the integer 1).

I'm getting really confused about what you're talking about---constants appearing in XML and RDF OWL 2.0 files should be typed; there's no need at all to guess the type based on syntax.

And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly the same point on the real number line.

> But "0.1"^^xsd:float would not be required, but also we wouldn't change the meaning along the lines you suggest (we'd just be silent about it). It's fairly simple to migrate old ontologies to the new one with a simple converter. If enough implementations did it silently, that would be information for a future group.

No idea what this means. But I'm guessing I disagree with it.

-rob
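The claim in this message that every float literal names one exact point on the real number line, and that this point can differ from the decimal reading of the same digits, can be checked with a short Python sketch. It makes assumptions that are not in the thread: `xsd:float` is modelled as IEEE 754 single precision with round-to-nearest, and the helper names `xsd_float_value` and `xsd_decimal_value` are hypothetical.

```python
import struct
from fractions import Fraction

def xsd_float_value(lexical: str) -> Fraction:
    """Exact rational value of the float32 nearest to the literal.

    Approximates the XSD mapping by rounding through a Python double
    and then to single precision.
    """
    nearest = struct.unpack(">f", struct.pack(">f", float(lexical)))[0]
    return Fraction(nearest)

def xsd_decimal_value(lexical: str) -> Fraction:
    """Exact rational value of a decimal literal."""
    return Fraction(lexical)

# "1.1e0" as a float is an exact point, but not the point 11/10.
print(xsd_float_value("1.1e0"))                               # 9227469/8388608
print(xsd_float_value("1.1e0") == xsd_decimal_value("1.1"))   # False

# "1.0e0" as a float and 1 as an integer are the same point.
print(xsd_float_value("1.0e0") == Fraction(1))                # True
```

The float reading of 1.1 is the dyadic rational 9227469/2^23, slightly above 11/10, which is why the two literals denote different values even though neither is imprecise.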
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 6, 2008, at 10:55 PM, Rob Shearer wrote:

>>> XSD offers a lexical syntax for points that happen to lie on the real number line

>> It offers several and we're free to define one for owl:real. If we use any decimal notation, we have exactness problems (e.g., 1/3), but decimal is very user friendly. So, I was thinking that the valid syntax for a real would be decimal floating points and ratios of integers. We could include scientific notation as well.

> Why on earth would the OWL group come up with their own syntax for encoding numbers?

I'm presuming we're sticking with the basic xsd framework. So types have a lexical space and a value space.

So, owl:real has a value space of the reals. But what should the lexical space be? I'd propose that at least the union of the xsd numeric types' lexical spaces be the lexical space for our new type. I would add additional syntax for exact rationals (such as 1/3). The first part is isomorphic to your proposal about xsd syntax, I believe.

> The XSchema guys have already done that, and people have implemented parsers for their spec. If there's going to be a syntax for rationals or algebraics, then that seems to be right up their alley.

They don't seem interested, alas.

>>> But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident".

>> Well, you'll get exactness holes with binary or decimal notation, regardless of density issues.

> I thought I had made my proposal clear on this: the value space does not have holes.

Sure.
> The representations supported for particular values are not sufficient to address all the points in that space, but the space itself does *not* have holes.

I just meant things like the fact that you can't write down 1/3 in decimal. That's all.

>>> I don't know what you mean by "lexical space of the reals".

>> XSD datatypes have a lexical space (e.g., the syntax) and a value space. You are suggesting, I thought, that we adopt a value space that is the reals and something about using xsd syntax (i.e., lexical spaces) for the syntax.

> For the syntax of particular values. I keep trying to stress that value spaces should be kept separate from the syntax used for particular values.

Sure. But that's true in XSD as well. From what I can tell, you want all the literals that have "xsd:float" (to pick an example) to map to (a subset of) the reals (as the value space) and constrain/enable certain syntax. So "1.0"^^xsd:float would be a syntax error.

>> XSD offers exact syntax only for binary and decimals (I believe it's exact for binary). I was wondering what sort of lexical space you want.

> XSD offers a well-defined mapping from lexical representation to IEEE floats.

Yes. I just hadn't checked the spec, hence my hesitation.

> XSD defines an *exact* value for each valid lexical representation. You may not like the way the mapping is defined (because the value of "1.1e0^^xsd:float" on the real number line is not equal to the value of "1.1^^xsd:decimal"),

No that's fine.

> but there is no imprecision whatsoever about what each string represents.

You've got the wrong string. I only hedged because I hadn't looked and I don't like to speak with certainty without looking. My point was only that there are numbers which cannot be exactly represented in binary or in decimal.

> I am satisfied with the work the XSchema group did on floating-point lexical representations.
>>> But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values.

>> So you want a very broad lexical space for our real type, i.e., "1", "1.0", and "12.78e-2".

> No. I want `real` to be a value space with no lexical connotations.

I'd be surprised if we could get consensus on abandoning the lexical space/value space language and understanding. It's pretty deeply embedded into RDF.

> I want to be able to specify a particular point in this value space using a string such as "1.0e0^^xsd:float".

Yeah, I'm kinda against that. But I would support "1.0e0^^owl:real".

> The XSD lexical forms are not "the lexical space for reals". There is no such thing as "the lexical space for reals".

Bravo! ;)

> There is such a thing as "the space of lexical representations which a conformant implementation must support for particular values in the real value space", but this space is much smaller than the real value space.

Our initial proposal for owl:real is to support, for syntax, pairs of integers with the second being non-zero (i.e., standard fraction syntax for rationals) and (at least) the algebraic reals for the value space. If you don't have equations or special constants, you can't address the irrationals or transcendentals anyway. We are aiming to support some classes of equation, but only with rational constants.

>> If we want exactness for the rationals, we need either to allow repeating (e.g., 0.333repeating) (usually done with a macron) or fraction syntax (e.g., 1/3).

> I don't intend to support exactness for rationals. A conformant implementation should only be required to provide exact support for `xsd:int` and `xsd:float` values.

I don't think that would fly.

[snip]

>> Given that more and more languages (e.g., Java) now bundle a decimal type with their core libraries, I'm not so clear on the first.
> I'm not sure Java is an example of "more and more languages". In fact it is the flagship "you only ever need one language" proposal.

I picked Java because it didn't have it for a long time and now it does. To pick another example, Python now has a bundled decimal class. Both of these are quite recent additions to popular languages. SQL supports it. Visual Basic seems to.

> And even in super-OO Java you have to program differently if you're going to play with polymorphic numbers than you would if you stuck to ints and floats.
>
> I'd like to write a distributed OWL reasoner in Erlang. But Javascript and C are perhaps more persuasive counterexamples to your argument.

Javascript is a bit odd in not supporting integers either :)

There are high quality decimal libraries for C++ (e.g., from IBM) and the committee is considering decimal support (<http://open-std.org/JTC1/SC22/WG21/>)

>> I'd like to hear more about the second.

> The most efficient bignum and decimal libraries are an order of magnitude slower than corresponding int and float calculations. Hardware is good with ints and floats.

Sure, but I wouldn't have thought that this would be a significant factor. Obviously, if the user writes really big or really small numbers, you have to deal with them anyway. If you only have user-defined types (no equations), then the operations (and number thereof) are pretty limited (inclusion and cardinality testing). I'm a bit skeptical that it makes a huge practical difference. Perhaps because it doesn't come up too much.

Also, perhaps I misrecall, but don't you want arbitrarily sized floats?

"""For the restriction "forall R `xsd:float`" I simply bounded the real number line at the min and max values of floats. Still a dense, infinite number line, but with bounds. I hated this usage, however, and would prefer if it became illegal."""

So you did bound...but you "hate it"? Which, the bounds? the universal quantifier?
Implementations could always throw a warning or error if they hit a number that is too large.

>>> ---thus a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful.

>> This is what we're after. Anything we spec will be tightly specced. At the moment, we only have required and optional as modalities of support. I think supporting various levels of precision (or variant mapping) would be quite hard to understand.

> But presumably you're making clear that implementations which implement some "optional" functionality, but do so in a way which contradicts the optional semantics, are non-compliant.

That's always a problem with optional :(

> If so, then specifying what support for additional lexical representations means (i.e. exact) would make clear that a product which parsed `xsd:decimal` but internally converted to floating point would not "support `xsd:decimal`" by the terms of the OWL 2.0 spec.

They can convert as long as the observable behavior is the same.

> The implementors could always claim "partial support", however.

If they are going to vary in observable ways, I would prefer that they make that clear in documentation and by giving warnings. A "strict" mode would also be quite welcome to me as a user.

>> On this model, users would just have to decide between integers and reals. We could have quite a wide lexical space for reals (and even for integers, i.e., allow 1.0 to mean the integer 1).

> I'm getting really confused what you're talking about---constants appearing in XML and RDF OWL 2.0 files should be typed; there's no need at all to guess the type based on syntax.
>
> And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly the same point on the real number line.

Sure. But I was talking about owl:real. It seems reasonable to allow "1.0e0^^owl:real" and "1^^owl:real". (xsd:integer could be a subtype of owl:real as well).
>> But "0.1"^^xsd:float would not be required, but also we wouldn't >> change the meaning along the lines you suggest (we'd just be >> silent about it). It's fairly simple to migrate old ontologies to >> the new one with a simple converter. If enough implementations did >> it silently, that would be information for a future group. > > No idea what this means. But I'm guessing I disagree with it. Me too :) Cheers, Bijan. |
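A minimal Python sketch of the "observable behavior" point above, assuming only the standard-library `decimal` and `fractions` modules (the variable names are illustrative, not taken from any OWL implementation). It shows that converting a decimal literal such as "0.1" to binary floating point changes the value, while a literal such as "1.0e0" survives the conversion unchanged:

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly; Decimal and Fraction can.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")
fraction_sum = Fraction(1, 10) + Fraction(2, 10)

print(float_sum == 0.3)                 # False
print(decimal_sum == Decimal("0.3"))    # True
print(fraction_sum == Fraction(3, 10))  # True

# Parsing the literal "0.1" exactly versus going through a binary double
# yields two different points on the number line.
exact = Fraction("0.1")    # exactly 1/10
via_float = Fraction(0.1)  # the exact value of the nearest double
print(exact == via_float)  # False

# Literals with a short binary expansion are unaffected by the conversion.
print(Fraction("1.0e0") == Fraction(1.0) == Fraction(1))  # True
```

So an implementation that silently stores `xsd:decimal` literals as floats can differ observably from one that keeps them exact, at least for equality tests on values like 0.1.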
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
>>>> XSD offers a lexical syntax for points that happen to lie on the >>>> real number line >>> >>> It offers several and we're free to define one for owl:real. If we >>> use any decimal notation, we have exactness problems (e.g., 1/3), >>> but decimal is very user friendly. So, I was thinking that the >>> valid syntax for a real would be decimal floating points and >>> ratios of integers. We could include scientific notation as well. >> >> Why on earth would the OWL group come up with their own syntax for >> encoding numbers? > > I'm presuming we're sticking with the basic xsd framework. So types > have a lexical space and a values space. NO. There is no lexical space to represent all of the reals. That's the whole point---the reals include lots and lots of values that cannot necessarily be represented lexically. > So, owl:real has a value space of the reals. But what should the > lexical space be? I'd propose that at least the union of the xsd > numeric types lexical spaces be the lexical space for our new type. > I would add additional syntax for exact rationals (such as 1/3). The > first part is isomorphic to your proposal about xsd syntax, I believe. Again, I have no intention of implementing rationals. And if you want to come up with some syntax for encoding rational numbers in XML I suggest you join the XSchema working group, because that's way beyond the OWL charter. >> The XSchema guys have already done that, and people have >> implemented parsers for their spec. If there's going to be a syntax >> for rationals or algebraics, then that seems to be right up their >> alley. > > They don't seem interested, alas. And I very much hope the OWL WG takes that as a sign that they should be even less interested. >>>> XSD datatypes have a lexical space (e.g., the syntax) and a value >>>> space. You are suggesting, I thought, that we adopt a value space >>>> that is the reals and something about using xsd syntax (i.e., >>>> lexical spaces) for the syntax. >> >> For the syntax of particular values.
I keep trying to stress that >> values spaces should be kept separate from the syntax used for >> particular values. > > Sure. But that's true in XSD as well. No it's [not](http://www.w3.org/TR/xmlschema-2/#value-space): "Each value in the value space of a datatype is denoted by one or more literals in its ·lexical space·." In XSD the lexical and value spaces are very tightly bound together. This should *not* be true for OWL. > From what I can tell, you want all the literals that have > "xsd:float" (to pick an example) to map to (a subset) of the reals > (as the value space) and constrain/enable certain syntax. So > "1.0"^^xsd:float would be a syntax error. That looks like a valid [float](http://www.w3.org/TR/xmlschema-2/#float) to me. But `"rob"^^xsd:float` looks like a syntax error. Again, all these syntax issues should be deferred to the XSD spec. >> . I want `real` to be a value space with no lexical connotations. > > I'd be surprised if we could get consensus on abandoning the lexical > space/value space language and understanding. It's pretty deeply > embedded into RDF. It's impossible to have a real number line and have lexical representations for all values. The "value spaces" in OWL serve a fundamentally different purpose than the "value spaces" defined in XSD. See my first message. XSD is concerned with representing values. Future incarnations can add more representations for more values, and existing data sets can be seamlessly extended with these new values. OWL is concerned with spaces of values. If the OWL 2.0 value space for numbers does not include, for example, the rationals, then the system can *not* be seamlessly extended to include the rationals, because they've already been excluded. The value spaces from XSD are inappropriate for OWL because they fail in exactly the wrong way. OWL extensions trim down value spaces. XSD extensions build up value spaces. To make this clearer, perhaps we should abandon the "value space" terminology altogether and instead talk about OWL "data domains".
I suggest that OWL have a string data domain and a number data domain. The integer data domain is a subset of the number data domain. There is absolutely no need for a float data domain. OWL implementations should support particular values encoded using the `xsd:int` and `xsd:float` lexical representations. These values are all in the number domain. >> I want to be able to specify a particular point in this value space >> using a string such as "1.0e0^^xsd:float". > > Yeah, I'm kinda against that. But I would support "1.0e0^^owl:real". That's craziness. You're crazy. Stop being crazy. >> The XSD lexical forms are not "the lexical space for reals". There >> is no such thing as "the lexical space for reals". > > Bravo! ;) > >> There is such a thing as "the space of lexical representations >> which a conformant implementation must support for particular >> values in the real value space", but this space is much smaller >> than the real value space. > > Our initial proposal for owl:real is to support for syntax, pairs of > integers with the second being non-zero (i.e., standard fraction > syntax for rationals) and (at least) the algebraic reals for the > value space. If you don't have equations or special constants, you > can't address the irrationals or transcendentals anyway. We are > aiming to support some classes of equation, but only with rational > constants. But why on earth would you cut down the value space at *all*? That just cuts off any possibility of future extension! >>> If we want exactness for the rationals, we need either to allow >>> repeating (e.g., 0.333repeating) (usually done with a macron) or >>> fraction syntax (e.g., 1/3). >> >> I don't intend to support exactness for rationals. A conformant >> implementation should only be required to provide exact support for >> `xsd:int` and `xsd:float` values. > > I don't think that would fly. Who is the vast army of users in need of support for exact rationals?
I strongly strongly suspect that if they really existed they would have pushed on the XSchema folks to give them a lexical representation---XML is kind of big as a data representation language, you know. > Also, perhaps I misrecall, but don't you want arbitrarily sized > floats? An `xsd:float` has a limited size, by definition. > """For the restriction "forall R `xsd:float`" I simply bounded the > real number line at the min and max values of floats. Still a dense, > infinite number line, but with bounds. I hated this usage, however, > and would prefer if it became illegal.""" > > So you did bound...but you "hate it"? Which, the bounds? the > universal quantifier? I hated that the user was saying `float` and I was interpreting it as "real between `FLT_MIN` and `FLT_MAX`". I hope OWL 2.0 allows only OWL data domains in such a context. (But of course an individual value is a valid data domain, and complex data domains could be built using facets with individual values.) -rob |
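To make the difference between the two readings of "forall R `xsd:float`" concrete, here is a hedged Python sketch. It assumes `xsd:float` values are IEEE 754 single-precision numbers; the function names are hypothetical and the code covers only finite, non-negative bounds that are exactly representable as float32. It counts the float32 values in an interval and compares a minimum-cardinality check on that finite set with the same check on a dense number line:

```python
import math
import struct


def float32_ordinal(x: float) -> int:
    """Position of a finite, non-negative float32 value among float32 values >= +0.0.

    x is assumed to be exactly representable as a float32.
    """
    if not math.isfinite(x) or math.copysign(1.0, x) < 0:
        raise ValueError("expected a finite, non-negative value")
    # For non-negative IEEE 754 values, bit patterns sort in numeric order.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def float32_count_between(lo: float, hi: float) -> int:
    """Number of distinct float32 values v with lo <= v <= hi."""
    return float32_ordinal(hi) - float32_ordinal(lo) + 1


def min_cardinality_satisfiable(n: int, lo: float, hi: float, dense: bool) -> bool:
    """Can n pairwise-distinct values lie in [lo, hi], assuming 0 <= lo <= hi?"""
    if dense:
        # A dense line has infinitely many points in any non-empty open interval.
        return n <= 1 or lo < hi
    return float32_count_between(lo, hi) >= n


next_after_one = 1.0 + 2.0 ** -23  # the float32 successor of 1.0

print(float32_count_between(1.0, next_after_one))                        # 2
print(min_cardinality_satisfiable(3, 1.0, next_after_one, dense=False))  # False
print(min_cardinality_satisfiable(3, 1.0, next_after_one, dense=True))   # True
print(float32_count_between(1.0, 2.0))                                   # 8388609
```

Under the literal float value space, a restriction asking for three distinct values between 1.0 and its float32 successor is unsatisfiable; under the bounded-but-dense reading it is satisfiable.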
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
On Jul 7, 2008, at 1:02 AM, Rob Shearer wrote: >>>>> XSD offers a lexical syntax for points that happen to lie on >>>>> the real number line >>>> >>>> It offers several and we're free to define one for owl:real. If >>>> we use any decimal notation, we have exactness problems (e.g., >>>> 1/3), but decimal is very user friendly. So, I was thinking that >>>> the valid syntax for a real would be decimal floating points and >>>> ratios of integers. We could include scientific notation as well. >>> >>> Why on earth would the OWL group come up with their own syntax >>> for encoding numbers? >> >> I'm presuming we're sticking with the basic xsd framework. So >> types have a lexical space and a values space. > > NO. There is no lexical space to represent all of the reals. I didn't say or imply there was. There's, afaict, no requirement that the lexical space cover the entire mapping space. Indeed, in the current owl:real, where we are talking about a value space over the algebraic reals (which *are* denumerable), we only allow rational constants. (We need the additional reals as possible solutions to equations with rational constants.) > That's the whole point---the reals include lots and lots of values > that cannot necessarily be represented lexically. I'm skeptical that that's the whole point, as it doesn't seem relevant. >> So, owl:real has a value space of the reals. But what should the >> lexical space be? I'd propose that at least the union of the xsd >> numeric types lexical spaces be the lexical space for our new >> type. I would add additional syntax for exact rationals (such as >> 1/3). The first part is isomorphic to your proposal about xsd >> syntax, I believe. > > Again, I have no intention of implementing rationals.
> > And if you want to come up with some syntax for encoding rational > numbers in XML I suggest you join the XSchema working group, > because that's way beyond the OWL charter. I'm not sure why you say that. Designing an OWL type seems well within our purview. Consider rdf:Literal. >>> The XSchema guys have already done that, and people have >>> implemented parsers for their spec. If there's going to be a >>> syntax for rationals or algebraics, then that seems to be right >>> up their alley. >> >> They don't seem interested, alas. > > And I very much hope the OWL WG takes that as a sign that they > should be even less interested. The reason (one member) gave (privately) is that they didn't think that reals beyond decimals were necessary for a schema language. I think we agree that they are for an ontology language. So, my conclusion is the opposite of your hope. >>>>> XSD datatypes have a lexical space (e.g., the syntax) and a >>>>> value space. You are suggesting, I thought, that we adopt a >>>>> value space that is the reals and something about using xsd >>>>> syntax (i.e., lexical spaces) for the syntax. >>> >>> For the syntax of particular values. I keep trying to stress that >>> values spaces should be kept separate from the syntax used for >>> particular values. >> >> Sure. But that's true in XSD as well. > > No it's [not](http://www.w3.org/TR/xmlschema-2/#value-space): "Each > value in the value space of a datatype is denoted by one or more > literals in its ·lexical space·." Oh, ick. I had interpreted that as contingent for the set defined, not as a general principle for all types in an extended system. Ick. Yes, well, as the current design for owl:real shows, we are already ignoring this constraint :( > In XSD the lexical and value spaces are very tightly bound > together. This should *not* be true for OWL. Well, not exactly, but certainly more so than I thought.
They seem to be loosening this in Schema 1.1: http://www.w3.org/TR/xmlschema11-2/#value-space """Each value in the value space of a ·primitive· or ·ordinary· datatype is denoted by one or more character strings in its ·lexical space·, according to ·the lexical mapping·; ·special· datatypes, by contrast, may include "ineffable" values not mapped to by any lexical representation. """ [snip] >> I'd be surprised if we could get consensus on abandoning the >> lexical space/value space language and understanding. It's pretty >> deeply embedded into RDF. > > It's impossible to have a real number line and have lexical > representations for all values. Yes, so we have to at least relax the Schema 1.0 constraint that every value have a corresponding literal. Thanks for pointing that out. [snip] > To make this clearer, perhaps we should abandon the "value space" > terminology altogether and instead talk about OWL "data domains". I > suggest that OWL have a string data domain and a number data > domain. The integer data domain is a subset of the number data > domain. There is absolutely no need for a float data domain. OWL > implementations should support particular values encoded using the > `xsd:int` and `xsd:float` lexical representations. These values are > all in the number domain. This goes against existing implementation and use, wherein xsd:float is disjoint from xsd:int. (The non-real values of float are a problem as well.) [snip] >>> There is such a thing as "the space of lexical representations >>> which a conformant implementation must support for particular >>> values in the real value space", but this space is much smaller >>> than the real value space. >> >> Our initial proposal for owl:real is to support for syntax, pairs >> of integers with the second being non-zero (i.e., standard >> fraction syntax for rationals) and (at least) the algebraic reals >> for the value space. 
If you don't have equations or special >> constants, you can't address the irrationals or transcendentals >> anyway. We are aiming to support some classes of equation, but >> only with rational constants. > > But why on earth would you cut down the value space at *all*? Once you let the value space vary from the lexical space, there's less need. I think it helps to have the concept if that's what you are actually using. At the time, we were motivated in part by not wanting to spook people. (If our principle is the most broad relevant type, then complex seems to be the right supertype. The algebraic reals have a lot of nice properties which make them pretty suitable for our purposes.) > That just cuts off any possibility of future extension! [snip] I'm not sure why. We're free to introduce new types or extend old types. >> For the restriction "forall R `xsd:float`" I simply bounded the >> real number line at the min and max values of floats. Still a >> dense, infinite number line, but with bounds. I hated this usage, >> however, and would prefer if it became illegal.""" >> >> So you did bound...but you "hate it"? Which, the bounds? the >> universal quantifier? > > I hated that the user was saying `float` and I was interpreting it > as "real between `FLT_MIN` and `FLT_MAX`". I hope OWL 2.0 allows > only OWL data domains in such a context. So you do want arbitrarily sized floats. Ok. > (But of course an individual value is a valid data domain, and > complex data domains could be built using facets with individual > values.) We may be reaching diminishing returns on the public debate. We can continue in private if you like, and then summarize back. Thanks for the discussion. Cheers, Bijan. |
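The lexical proposal discussed above (decimal and scientific notation plus fraction syntax such as 1/3, all denoting exact rationals) can be sketched in Python. This is only an illustration under stated assumptions: `parse_real_literal` is a hypothetical helper, and it borrows the string grammar of the standard-library `fractions.Fraction` constructor as a stand-in for whatever lexical space owl:real would actually define. The special `xsd:float` tokens are rejected because they denote non-real values.

```python
from fractions import Fraction

# Special xsd:float / xsd:double tokens that do not denote real numbers.
NON_REAL_TOKENS = {"INF", "+INF", "-INF", "NaN"}


def parse_real_literal(lexical: str) -> Fraction:
    """Map a decimal, scientific, or fraction literal to an exact rational.

    Hypothetical helper: the accepted grammar is Python's Fraction string
    grammar, used here only as an approximation of the proposed syntax.
    """
    text = lexical.strip()
    if text in NON_REAL_TOKENS:
        raise ValueError(f"{text!r} does not denote a real number")
    try:
        return Fraction(text)
    except ZeroDivisionError:
        # The proposal requires the second integer of a fraction to be non-zero.
        raise ValueError(f"{text!r} has a zero denominator") from None


# Different lexical forms can denote the same point on the number line.
print(parse_real_literal("1.0e0") == parse_real_literal("1"))    # True
# Fraction syntax gives exactness that truncated decimals cannot.
print(parse_real_literal("1/3") * 3 == 1)                        # True
print(parse_real_literal("0.333") == parse_real_literal("1/3"))  # False
```

Every literal here maps to a rational, so the lexical space is far smaller than a real (or algebraic real) value space, which matches the point that the two need not coincide.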
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
>>>> The XSchema guys have already done that, and people have >>>> implemented parsers for their spec. If there's going to be a >>>> syntax for rationals or algebraics, then that seems to be right >>>> up their alley. >>> >>> They don't seem interested, alas. >> >> And I very much hope the OWL WG takes that as a sign that they >> should be even less interested. > > The reason (one member) gave (privately) is that they didn't think > that reals beyond decimals were necessary for a schema language. I > think we agree that they are for an ontology language. So, my > conclusion is the opposite of your hope. Rational numbers, and linear equations, and n-ary data predicates, all seem *much* more relevant to data representation and model checking than satisfiability reasoning; these are systems people want to use to store and compute particular values based on input, not to check satisfiability. (The n-ary datatype use cases, for example, don't offer much insight into how such a feature could be used to draw valuable new inferences.) And yet the XSchema group---the data representation and model-checking crowd---decided that such notions were far too ambitious for even them. Again, I urge the OWL working group to follow that example and focus on the small set of features which will actually benefit users, and make sure that they get those features right. -rob |
|
|
Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
>> The integer data domain is a subset of the number data domain. >> There is absolutely no need for a float data domain. OWL >> implementations should support particular values encoded using the >> `xsd:int` and `xsd:float` lexical representations. These values are >> all in the number domain. > > This goes against existing implementation and use, wherein xsd:float > is disjoint from xsd:int. Cerebra did not make them disjoint. Neither does KAON2. And testing reveals that neither does FaCT++. The only reasoner I can find that makes them disjoint is Pellet. This sheds a lot of light on your definition of "existing implementation and use". If you intend to attempt to enshrine bugs in Clark & Parsia products in the OWL standard, I suggest that you do it as a representative of Clark & Parsia and not as a representative of Manchester, which has an interest in FaCT++. -rob |
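The disagreement about disjointness can be illustrated with a small Python sketch. It is a toy model and not the behavior of any reasoner named above: `TypedValue`, `as_typed`, and `as_number` are hypothetical names, and the "disjoint" model simply tags each value with its XSD primitive type (`xsd:int` derives from `xsd:decimal`, while `xsd:float` is its own primitive type).

```python
from fractions import Fraction
from typing import NamedTuple


class TypedValue(NamedTuple):
    """A value tagged with its primitive type (toy model of disjoint value spaces)."""
    primitive: str
    value: Fraction


def as_typed(lexical: str, primitive: str) -> TypedValue:
    """Disjoint reading: values of different primitive types are never equal."""
    return TypedValue(primitive, Fraction(lexical))


def as_number(lexical: str, primitive: str) -> Fraction:
    """Single number domain: only the point on the number line matters."""
    return Fraction(lexical)


int_one = ("1", "decimal")    # "1"^^xsd:int
float_one = ("1.0", "float")  # "1.0"^^xsd:float

disjoint_model_equal = as_typed(*int_one) == as_typed(*float_one)
number_domain_equal = as_number(*int_one) == as_number(*float_one)

print(disjoint_model_equal)  # False
print(number_domain_equal)   # True
```

In the tagged model the two literals are distinct values, so a functional data property holding both would be inconsistent; in the single number domain they are the same value and no conflict arises.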