Concerning LET or AS


Concerning LET or AS

by Jeremy Carroll-2

Thank you for your time and attention at the WG meeting today.

TopQuadrant would like Holger's earlier comment [1] to be treated as a
formal comment. (i.e. with an official WG response on this mailing list).

My understanding from today's meeting is that the response is likely to
be that the WG has already considered the LET design and believes the AS
design to be adequate
(LET is merely an abbreviated form for certain AS constructs). I also do
not believe that TopQuadrant is bringing any new information that was
not considered at your f2f meeting [2].

We however feel strongly about this, and are likely to raise a formal
objection (in the sense that we believe it would be better for the WG to
take a few weeks longer over SPARQL 1.1 and get this right, than to
deliver SPARQL 1.1 on schedule without this feature).

Thinking through particularly Steve's comments, I tried to come up with
an example illustrating how the ordering of operations that is sometimes
required is better articulated with LET than with AS.
This example is not as polished as I would like, because I believe it is
more helpful to contribute it during your F2F meeting than to delay it.

First I wish to clarify that this is not about whether or not assignment
should be in SPARQL 1.1. Assignment is in already, with the AS construct
that was discussed under item 39. This issue is purely about the syntax
and scoping rules for the single assignment capability.

Many of the processing tasks that we and our customers have
involve mapping several legacy sources together, merging them into one
RDF graph, and then doing some processing.
A frequent problem is that different legacy sources represent the same
data in different ways, e.g. with different case conventions, in
different units, or whatever. In these cases, data laundry of one sort
or another is necessary. One option for laundry is using functions and
assignment within SPARQL.

So for my example, I am taking information about alumni at a college and
trying to find the appropriate year photo for them.
I will simplify the name problem to a name consisting of a first name and
a last name (no middle initial), but people change their last name from
time to time.

The data sources that I have include:
- a current mailing database, with full-names, e-mail addresses, and
addresses
  a:fullName a:email a:address
  _:w a:fullName "John Smith" .
  _:w a:email <mailto:john.smith@...>.

- a database with students' first names, last names and former last names
   to simplify processing I just use two properties
   b:firstName
   b:lastName

  for example:
   _:x  b:firstName "John" .
   _:x b:lastName "Doe" .
   _:x b:lastName "Smith".

shows that the person known as John Doe and the person known as John
Smith are one and the same, without clarifying the chronology of the
name change.

- a database with date of matriculation, and years of study, by full
name at time of matriculation
    c:matriculationDate c:studyYears c:fullName
  _:y c:fullName "John Doe" .
  _:y c:studyYears "P1Y"^^xsd:yearMonthDuration .
  _:y c:matriculationDate "1988-09-01"^^xsd:date.  

- and a list of graduation photo names by year.
    d:year  d:fileName
   _:z d:year "1988"^^xsd:date .
   _:z d:fileName "classOf88" .

- I have arranged these photos as jpg files on the web at
http://www.example.org/photos
   http://www.example.org/photos/classOf88.jpg


SELECT ?eMail ?image
WHERE
{ ?a a:email ?eMail .
  ?a a:fullName ?fullName .
  LET ( ?fullNameSpaceNormalized=normalize-space(?fullName) )        [A]
  LET ( ?firstName=substring-before(?fullNameSpaceNormalized," ")    [B]
        ?lastName=substring-after(?fullNameSpaceNormalized," ") )
  ?b b:firstName ?firstName .
  ?b b:lastName ?lastName .
  ?b b:lastName ?altLastName .                                       [C]
  LET ( ?altName=concat(?firstName, " ", ?altLastName ) )
  ?c c:fullName ?altName .
  ?c c:studyYears ?lengthOfCourse .
  ?c c:matriculationDate ?matriculate .
  LET ( ?endDate=year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse)) )
  ?d d:year ?endDate .
  ?d d:fileName ?imageFile .
  LET ( ?image = xs:anyURI(concat("http://www.example.org/photos/", ?imageFile, ".jpg" ) ) )
}

Notes:
[A] for robustness against leading/trailing space and/or double space in
the middle
[B] cannot be combined with [A] because of rules discussed under issue 39
[C] ?altLastName can be the same as ?lastName
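
To make the data flow of the query concrete, here is a rough Python sketch of the same chain of single assignments. It is only an illustration: the helper functions are my own approximations of the XPath functions named in the query (they assume a non-empty separator, and str.split() treats more characters as whitespace than XPath does), and the study duration is simplified to a whole number of years.

```python
from datetime import date


def normalize_space(s: str) -> str:
    # Strip leading/trailing whitespace and collapse internal runs to one space.
    return " ".join(s.split())


def substring_before(s: str, sep: str) -> str:
    # Text before the first occurrence of sep ("" if sep does not occur).
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    # Text after the first occurrence of sep ("" if sep does not occur).
    _, found, tail = s.partition(sep)
    return tail if found else ""


def end_year(matriculate: date, study_years: int) -> int:
    # Assumption: the duration is a whole number of years, so adding it
    # to the date only changes the year component.
    return matriculate.year + study_years


def image_uri(image_file: str) -> str:
    # Same concatenation as the last LET in the query.
    return "http://www.example.org/photos/" + image_file + ".jpg"


# Each line below corresponds to one assignment in the query, in the same order.
full_name = normalize_space("  John   Smith ")   # [A]
first_name = substring_before(full_name, " ")    # [B]
last_name = substring_after(full_name, " ")      # [B]
alt_name = first_name + " " + "Doe"              # uses ?altLastName from source b
year = end_year(date(1988, 9, 1), 1)
print(first_name, last_name, alt_name, year)
print(image_uri("classOf88"))
```

Each value is assigned exactly once and only reads values defined above it, which is the property the LET form is meant to make visible.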

I believe the WG is considering recommending that this query should be
written as follows.

SELECT ?eMail,
       xs:anyURI(concat("http://www.example.org/photos/", ?imageFile, ".jpg" ) )  as ?image
WHERE {
  SELECT ( *
           year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse))
           AS ?endDate )
  WHERE {
    SELECT ( * concat(?firstName, " ", ?altLastName ) AS ?altName )
    WHERE {
      SELECT (* substring-before(?fullNameSpaceNormalized," ")
                  AS ?firstName,
              substring-after(?fullNameSpaceNormalized," ") AS ?lastName )
      WHERE {
        SELECT (* normalize-space(?fullName) as ?fullNameSpaceNormalized)
        WHERE {
          ?a a:email ?eMail .
          ?a a:fullName ?fullName .
        }
      }      
    ?b b:firstName ?firstName .
    ?b b:lastName ?lastName .
    ?b b:lastName ?altLastName .
    }
  ?c c:fullName ?altName .
  ?c c:studyYears ?lengthOfCourse .
  ?c c:matriculationDate ?matriculate .
  }
?d d:year ?endDate .
?d d:fileName ?imageFile .
}

(Using the equivalence from [3])
We believe that this is inferior: harder to write, harder to read, and
harder to understand. We also believe that the cost of complicating the
language by having two ways to say the same thing is well worth it.


Jeremy Carroll
AC Rep, TopQuadrant.



[1]
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Oct/0003
[2]
http://www.w3.org/2009/sparql/meeting/2009-05-06#ProjectExpressions___26___20_Assignment 

[3]
http://www.w3.org/2009/sparql/wiki/Feature:Assignment#Equivalence_with_SubSelects_and_ProjectExpressions



Re: Concerning LET or AS

by Jeremy Carroll-2

PS

My example comes from putting together the following thoughts ...
A suggestion that what people don't like about LET is that it is
'procedural'.
However, single assignment is declarative, and perceiving LET as
procedural is a failure of understanding.
Ordering constraints can be declarative or procedural; LET introduces
declarative ordering constraints.
The ordering constraints appear because of the shape of the problem: for
example if you compute an end date from a start date and a duration, you
need to know the start date and the duration.
Each LET declaratively but concisely introduces an ordering constraint.
The alternative SELECT AS WHERE construct declaratively and verbosely
introduces an ordering constraint.

There is a natural ordering to do with the flow of information. This
isn't necessarily the order of computation, but it is an order in which
it is easier for the query author to think about the query. The LET
syntax follows this natural ordering; the AS syntax does not.
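
As a rough illustration of ordering constraints that come from the shape of the problem rather than from any execution order, the sketch below records only which variables each assignment in my example reads, and lets Python's standard graphlib module derive an acceptable order. The dictionary is my own transcription of the example query; nothing here is a claim about how a SPARQL engine would evaluate it.

```python
from graphlib import TopologicalSorter

# Each assigned variable maps to the variables its defining expression reads.
# This is a purely declarative statement of the dependencies.
depends_on = {
    "fullNameSpaceNormalized": {"fullName"},
    "firstName": {"fullNameSpaceNormalized"},
    "lastName": {"fullNameSpaceNormalized"},
    "altName": {"firstName", "altLastName"},
    "endDate": {"matriculate", "lengthOfCourse"},
    "image": {"imageFile"},
}

# Any order in which every variable comes after the variables it depends on.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

Several different orders satisfy these constraints, and an implementation may pick any of them. The LET form simply writes the assignments in one such order.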






Re: Concerning LET or AS

by Jeremy Carroll-2


I thought I should share a couple of the comments I have had on this
topic from TopQuadrant colleagues:

[[
Speaking as a person who teaches this stuff, more than our own
reputation is at stake.  Perhaps this is something to add to our objection.

Technology adoption is the goal of a standard.  SPARQL fights an uphill
battle for a couple of reasons: (1) It's Not SQL.  (2) Pattern-based
retrieval is weird (witness the fabulous popularity of PROLOG as a
software engineering language).  In short, many people are looking for
reasons not to adopt it, and to stay with familiarity.
In short, the semantic web is faced with a huge hurdle in SPARQL.  And
while I applaud the "small standard" policy (which in general is a boon
to teaching and adoption), it is only good when it serves ease of adoption.

SPARQL 1.0 has some huge problems, that give SQL fans great ammunition
when it comes to saying "it's not ready" - negation and aggregates are
the biggies here, and both of them have been fixed.

Adding in another complex idiom (like nested subqueries) for something
simple (LET) will be repeating the mistake of negation.  Another reason
to say, "Wait for SPARQL 3".
I would go so far as to use the "small standard" argument the other
way.  Subqueries are difficult - in fact, in my course, I say, 'The
reason SPARQL doesn't have subquery is because it is not needed.  The
sorts of things that you use them for in SQL are done easily in a
pattern language, without resorting to a complex construct like a
subquery'.  I challenge the room to prove me wrong.  One person was able
to do so.  No SQL programmer can do it - subqueries are error prone and
confusing.

So - faced with a confusing, difficult, unsuccessful idea from SQL
(subqueries) vs a well-accepted idea from BASIC, which one fits the
"small standard" mantra better?  I can speak confidently as an educator
in this stuff.   LET wins hands-down.
]]

and

[[
Also look at our mailing list to see what our customers are doing. [X]'s
message from yesterday contains:

SELECT ?stringPredicate ?stringObject ?stringAttributesNodeName ?stringAttributeNodeName
WHERE {
   CQ:GatheredData ?predicate ?object .
   LET (?stringPredicate := smf:cast(smf:name(?predicate), xsd:string)) .
   LET (?stringObject := smf:cast(smf:name(?object), xsd:string)) .
   LET (?uuid := smf:generateUUID()) .
   ?attributes a CQ:attributes .
   LET (?attributesNodeName := smf:qname(?attributes)) .
   LET (?stringAttributesNodeName := smf:cast(smf:name(?attributesNodeName), xsd:string)) .
   LET (?stringAttributeNodeName := smf:buildString("{?stringAttributesNodeName}-{?uuid}")) .
}

Will make a nice chain of sub-SELECTs...

]]



Re: Concerning LET or AS

by Jeremy Carroll-2

Off list, Steve and Lee encouraged me to be clearer about what I thought
LET as a keyword means.

Here is my attempt at specifying it (based on the WG Wiki page).
Please note that I am not responsible for TopQuadrant's SPARQL work;
Holger is our expert, and we tend to be dependent on Andy's implementation.
So, I am happy with any corrections from Andy.

It is not important how the word LET is spelt (i.e. as far as I know,
TopQuadrant has no particular attachment to 'LET' rather than 'BIND' for
example).

================================

In the FPWD of Query 1.1 we modify rule 43 for GroupGraphPattern as follows:

[43*] GroupGraphPattern ::= '{' GroupGraphPatternLetSub '}'

[A] GroupGraphPatternLetSub ::= ( GroupGraphPatternLetSub Let '.'? )?
GroupGraphPatternSub

[B] Let ::= 'LET' '('  Var ':=' Expression  ( ','  Var ':=' Expression
)* ')'

Rules [43*] [A] and [B] are interpreted by rewriting queries involving
LET into queries
not involving LET.
We will use phi(x) to be the rewritten query of x.

If x matches rule B, then:

phi(x) = 'SELECT' '(' * '(' Expression 'AS' Var ')' ( '(' Expression
'AS' Var ')' )* ')'

(with the variables matching respectively).

If x matches rule A then

phi(x) =
phi(Let)
'WHERE' '{'
     phi(GroupGraphPatternLetSub)
'}'
GroupGraphPatternSub

The rest of the specification then applies.
==================

(Note this is a fine recipe for implementing as well).
Specifically, this prohibits forward references.
Being a macro expansion into a declarative form, this is declarative.

Jeremy




Re: Concerning LET or AS

by Jeremy Carroll-2

I made a couple of mistakes in my previous text.

Please allow me to withdraw that text and try again.

The errors were:
1) I missed the { } around the subselect
2) my modification to rule 43 lost the SubSelect expansion
3) Removed too many '(' ')' in rewrite rule

I made some modifications for clarity too.

Also as a very minor comment, rule [43] etc combined with the grammar
rules from SPARQL 1.0 do not seem to expand to the example query
immediately above.

I take the intent of rule 43 to be:

GroupGraphPattern  ::=  '{' ( SubSelect | GroupGraphPatternSub )+ '}'

(Without the +, the rule matches either a single subselect or a SPARQL
1.0 body,
but not a combination of both.)

Here is modified text:

===============================

'LET' is specified as a macro-expansion, in terms of subselect queries.
In the FPWD of Query 1.1 we modify rule 43 for GroupGraphPattern as follows:

[43*] GroupGraphPattern ::= '{' GroupGraphPatternLetSub '}'

[A] GroupGraphPatternLetSub ::= ( GroupGraphPatternLetSub LetExpr '.'?
)? GroupGraphPatternNoLetSub

[B] LetExpr ::= 'LET' '('  Var ':=' Expression  ( ','  Var ':='
Expression )* ')'

[C] GroupGraphPatternNoLetSub ::= ( SubSelect | GroupGraphPatternSub )+

Rules [A] and [B] are interpreted by rewriting queries involving LET
into queries not involving LET.
We will use phi(x) to be the rewritten query of x.
For clarity of exposition we will expand the two alternative readings of
[A] as
[A.1] GroupGraphPatternLetSub ::= GroupGraphPatternNoLetSub
[A.2] GroupGraphPatternLetSub ::=  GroupGraphPatternLetSub LetExpr '.'?
GroupGraphPatternNoLetSub
Expressions matching rule [A.1] are not rewritten.

If x matches rule B, then:

phi(x) = 'SELECT'  * '(' Expression 'AS' Var ')' ( '(' Expression 'AS'
Var ')' )*

(with the variables matching respectively).

If x matches rule A.2  with y matching GroupGraphPatternLetSub on the
R.H.S.,
z matching LetExpr, and w matching  GroupGraphPatternNoLetSub then

phi(x) =
'{' phi(z)
   'WHERE' '{'
       phi(y)
   '}'
'}'
w

After this rewrite is applied to all instances matching rules [A] and
[B], the rewritten query does not involve 'LET' and its meaning is as
given in
the rest of the specification.
======================
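
The rewrite above can be sketched as a small string-level macro expander. This is only a sketch under stated assumptions: a group body is modelled as a first LET-free pattern followed by a list of (LET bindings, following LET-free pattern) pairs, patterns are opaque strings, and the names phi_let, phi_group and expand are hypothetical helpers of my own, not part of any specification or implementation.

```python
from typing import List, Tuple

Binding = Tuple[str, str]  # (variable, expression), e.g. ("?y", "f(?x)")


def phi_let(bindings: List[Binding]) -> str:
    # Rule [B]: LET (?v := e, ...) becomes SELECT * (e AS ?v) ...
    projections = " ".join(f"({expr} AS {var})" for var, expr in bindings)
    return f"SELECT * {projections}"


def phi_group(first: str, rest: List[Tuple[List[Binding], str]]) -> str:
    # Rule [A.1]: a body with no LET is not rewritten.
    body = first
    # Rule [A.2], applied once per LET from left to right: everything
    # before the LET is wrapped in a subselect, and the pattern after
    # the LET (w in the rule) stays outside it.
    for bindings, following in rest:
        body = f"{{ {phi_let(bindings)} WHERE {{ {body} }} }} {following}".rstrip()
    return body


def expand(first: str, rest: List[Tuple[List[Binding], str]]) -> str:
    # Rule [43*]: the whole group is enclosed in braces.
    return "{ " + phi_group(first, rest) + " }"


print(expand(
    "?c c:matriculationDate ?m .",
    [([("?endDate", "year-from-date(?m)")], "?d d:year ?endDate .")],
))
```

Because each LET can only wrap what precedes it, a variable assigned by a later LET is never in scope for an earlier one, which is the prohibition on forward references.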



Re: Concerning LET or AS

by Jeremy Carroll-2

As a personal comment (sorry, you are probably sick of me by now): I don't
particularly seek a response to this comment.

I am surprised that there is concern that the LET single assignment
construct may mislead users into having an incorrect processing model in
their heads that might be overly procedural.

This surprise is because the whole point about having a declarative
semantics is that the processing model is irrelevant. Thus, with a
declarative language, we expect, perhaps even desire, that users have
incorrect processing models. Each implementation is free to use its
own processing model, and the user works with their own. For example,
when running an XSLT script, if it has some side effect of writing a
message to the console, it is often surprising when these messages get
written. This is because the easiest way to think of the XSLT processing
model is top-down left-to-right, but good implementations tend to be
lazy. This mismatch between the user's model and the implementor's
reality is desirable because:
a) it makes it easier for the user to understand the language
b) it allows the implementor to efficiently implement the language
c) the declarative language design ensures that it doesn't matter that
these two views of the processing model differ, perhaps radically.
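
The effect of lazy evaluation on side effects can be seen in a few lines of Python. This is only an analogy using a Python generator; it is not a claim about how any XSLT or SPARQL implementation works.

```python
log = []


def double(n):
    # The log entry stands in for a message written to the console.
    log.append(f"doubling {n}")
    return 2 * n


# Reading top-down, one might expect all three messages to appear here,
# but a generator expression computes nothing until a value is requested.
lazy = (double(n) for n in [1, 2, 3])
log.append("pipeline built")

first = next(lazy)  # only now is double(1) actually called
print(first, log)
```

The result is the same whichever mental model the reader holds; only the timing of the side effect reveals the difference.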

So, I think an advantage of the term LET as opposed to BIND (say) is
that LET reminds some users of procedural programming in BASIC, and
allows them to reuse that programming model. Now, while the details of
the execution flow are very different in SPARQL than in BASIC, it seems
that this apparent familiarity has pedagogical advantages.

==

For the record, TopQuadrant's position is we don't care what word is
used, whether it is LET or BIND or something else.

A further aside is that the latest release of TopBraid Composer includes
a SPARQL debugger function that exposes some aspects of the insides of
the SPARQL processing (I haven't used it). But I guess that using such a
tool would quickly disabuse you of any incorrect notions of the
processing model. (LET, I believe,  is one of the SPARQL extensions
supported by the tool).

Jeremy