Default ORDER BY

View: New views
3 Messages — Rating Filter:   Alert me  

Default ORDER BY

by Jeff Moore-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Guys,

This is regarding a recent change to the default order that a model  
uses.  I asked Paul to make that change and he did in the unfounded  
branch.  (r4120)  Thanks, Paul.  I wanted to move some of the  
discussion around it to the mailing list so others will understand  
what its about.


The issue was that Solar_Sql_Model was was always setting a default  
order in the $this->_order property.  If you did not set an order, the  
model would automatically set the order to the primary key.  
Unfortunately, this make it so there is no way to actually say "no  
order."

Where this comes into play is that when mysql uses indexes to optimize  
a query, it normally only chooses one index.  So, if you have both an  
ORDER BY and WHERE clause, mysql (at least v5), is forced to choose  
between optimizing for the ORDER BY or optimizing for the WHERE  
clause.  If it doesn't know that the query will return more than one  
record, it will optimize for the ORDER BY.  This results in cases  
where it might do an index scan on millions of records to find a  
handful of records because of the order by.  If you leave the order by  
off, it can use a different index and optimize for WHERE clause.  In  
some cases for us this was resulting in an order of magnitude  
difference in query speed.

So, its generally bad to include an ORDER BY clause if you don't  
actually need the results to be in order.


The changes in r4120 were intended to address this by not specifying a  
default order for a table, in effect making the default order "no  
order."  I've tested this and it works great.  This solution is also  
way way better than the solution I implemented in the grid branch.

Unfortunately, the related object still has a _setOrder which injects  
an order into any query that includes an eager fetch.

Consider an invoice model with the following relationship

$this->_hasMany('invoice_items');

If  you issue the following query

$items = $invoices->fetchAll('eager'=>'items');

This will issue two queries, one to fetch the invoices and another to  
fetch the items.  However the items query will still have a default  
ORDER BY clause generated.  That is because of this definition in  
Solar_Sql_Model_Related:

     protected function _setOrder($opts)
     {
         if (empty($opts['order'])) {
             // default to the foreign primary key
             $this->order = array("{$this->foreign_alias}.{$this-
 >_foreign_model->primary_col}");
         } else {
             $this->order = (array) $opts['order'];
         }
     }

Instead, may I suggest the following for Solar_Sql_Model_Related

     protected function _setOrder($opts)
     {
         if (empty($opts['order'])) {
             // default to the foreign primary key
             $this->order = array();
         } else {
             $this->order = (array) $opts['order'];
         }
     }

Which initializes order to a value which is valid but does not add  
extra ordering to the query fetching the related items.


Also consider what happens if the invoice model has a toOne  
relationship and we set an explicit order

$this->_hasOne('billing_address', 'order' => 'street');

If you issue this fetch

$items = $invoices->fetchAll('eager'=>'billing_address');

It will generate a single query, using a left join to fetch the  
addresses, but the ORDER BY will include the order from the  
relationship, thus ordering the native result set.  This doesn't  
happen with hasMany relationships.

What does it really mean to specify an 'order' option on a toOne  
relationship? There is one record, thus, order can't be important.

I'm thinking that for toOne relationships, using the 'order' option is  
probably at best "undefined." Because of the impact on the parent  
query, maybe 'order' should be ignored for toOne() or maybe result in  
an exception if used?  I kinda lean toward an exception.

You can always specify the 'order' in the fetch() method parameters.

What are your thoughts?


Best Regards,

Jeff

_______________________________________________
Solar-talk mailing list
Solar-talk@...
http://mailman-mail5.webfaction.com/listinfo/solar-talk

Re: Default ORDER BY

by Paul M Jones-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Oct 2, 2009, at 13:58 , Jeff Moore wrote:

> Unfortunately, the related object still has a _setOrder which injects
> an order into any query that includes an eager fetch.
...

> Instead, may I suggest the following for Solar_Sql_Model_Related
>
>     protected function _setOrder($opts)
>     {
>         if (empty($opts['order'])) {
>             // default to the foreign primary key
>             $this->order = array();
>         } else {
>             $this->order = (array) $opts['order'];
>         }
>     }

Sounds fine to me -- if you could enter it as a ticket in Trac so I  
don't forget, that would be helpful.


> What does it really mean to specify an 'order' option on a toOne
> relationship? There is one record, thus, order can't be important.

Sure it can; e.g., if you want the relationship to always return the  
single most recent 'foo' by date, even when there are many matching  
related records, the order becomes important.  E.g.:

     // all related foos, in whatever order
     $this->_hasMany('foos');

     // just the most-recent foo
     $this->_hasOne('latest_foo', array(
         'class' => 'Foos',
         'order' => 'date DESC',
     );

The follow-on aspects of doing that are a separate discussion; the  
point is that yes, order *can* be important in to-one relationships.



--

Paul M. Jones
http://paul-m-jones.com/




_______________________________________________
Solar-talk mailing list
Solar-talk@...
http://mailman-mail5.webfaction.com/listinfo/solar-talk

Re: Default ORDER BY

by Jeff Moore-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Oct 2, 2009, at 2:29 PM, Paul M Jones wrote:

> Sounds fine to me -- if you could enter it as a ticket in Trac so I
> don't forget, that would be helpful.

Ticketed. (#212)


>> What does it really mean to specify an 'order' option on a toOne
>> relationship? There is one record, thus, order can't be important.
>
> Sure it can; e.g., if you want the relationship to always return the
> single most recent 'foo' by date, even when there are many matching
> related records, the order becomes important.  E.g.:
>
>     // all related foos, in whatever order
>     $this->_hasMany('foos');
>
>     // just the most-recent foo
>     $this->_hasOne('latest_foo', array(
>         'class' => 'Foos',
>         'order' => 'date DESC',
>     );
>
> The follow-on aspects of doing that are a separate discussion; the
> point is that yes, order *can* be important in to-one relationships.

Paul,

In this case latest_foo is defined as a _hasOne but will actually  
return many results.  The query that will be issued if you do this:

$invoices->fetchAll(array('eager'=>'latest_foo'));

Will be something like

SELECT *
FROM invoices
LEFT JOIN foos ON foos.invoice_id = invoices.id

Which will end up returning duplicate invoice records in the top level  
results if you have multiple foos.

Is this what you meant by follow-on aspects?

Philosophically, I think that the relationship definition should be  
stable.  What I mean by that is that including it in an eager fetch  
should not change the native results in a query.  And traversing the  
relationship without eager fetching it should return the the same  
results as traversing after eager fetching it.  I think this stability  
requirement is necessary to be able to write consistent business logic  
into the models.

I think its OK for the top level fetch() to specify conditions on the  
eager clause which would violate this principle.  However, if that is  
the case, then objects should not be loaded for that relationship.  
This would force a second query if the relationship is traversed.  I  
think the join_flag eager option controls this. (?)

For example, going back to invoices and items:

$result = $invoices->fetchAll(eager('items'=>array('where'=>'status =  
"active"')));

This would fetch all the invoices with at least one active item.  
However, the items property of each invoice should contain the  
invoices described in the relationship (active and all others), not  
the invoices described in the fetch.  I think this stability is  
critical for maintaining consistency of business logic in the models.

I know we've talked about these before, but maybe we should document  
some of these guidelines so we can preserve them going forward and so  
people know what to expect.

So back to the 'order' option.  I think that 'order' makes sense in  
the context of a fetch option.  It also makes sense on a hasMany()  
relationship.  But, I don't see how it can have meaning on a toOne()  
relationship, at least given the current capabilities of the  
relationships.

I don't think we currently have the capability to define a  
'latest_foo' type relationship.

There are 11 base relationship options.  This doesn't include a bunch  
of through options.  There are 11 eager options with some overlap and  
16 fetch options with some overlap.  Those are alot of interacting  
options.  Philosophically, I think it would be helpful to throw a  
helpful exception or two for combinations that don't make much sense.  
We can always take them out when we support more cases.

Best Regards,

Jeff
_______________________________________________
Solar-talk mailing list
Solar-talk@...
http://mailman-mail5.webfaction.com/listinfo/solar-talk