Distance2D Review

View: New views
5 Messages — Rating Filter:   Alert me  

Distance2D Review

by Paul Ramsey-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Niklas, I'm just reading through the code, and I'm going to note
things into this e-mail as I go along:

- Try and keep private structures out of the liblwgeom.h file. So far
I see BBOXDIST, W_BBOX, LISTSTRUCT which are entirely confined to
measures.c. Put them into ./liblwgeom/measures.h for now (we'll have a
go at rationalizing public and private prototypes more attractively in
the PostGIS 2.0 round).
- Follow the code formatting guidelines in ./STYLE please (you can
retroactively apply them with astyle though)
- I see no test cases in cu_measures.c for your functions, we're
trying to get as much testing as possible in the library level unit
tests as we go along, so for new functions in particular we should aim
to get coverage
- This construction is a little odd:

        DISTPTS thedl;
        DISTPTS *dl = &thedl;
        lw_dist2d_comp( lw1,lw2,dl);

  Generally people just write it this way, without the extra variable,
it has the same effect:

        lw_dist2d_comp( lw1,lw2, &thedl);

 - You've declared move of your working functions with a void return
(since the return values are actually in the *DISTPTS). I'm not sure
if you have failure cases checked deeper in your functions yet, but
one way to propagate up error cases without things like null-pointer
checks is to return LW_TRUE or LW_FALSE from the functions so that
higher level error handling code can do the "right thing" with them.
- Unfortunately, you've followed the coding style of the original
functions, which is working more-or-less directly with serialized
*uchar. So, for convenience, you've defined things like W_BBOX to have
easier access to the bounds without having to re-pull them from the
structure each time. If you de-serilized to LWGEOM early on and worked
with LWGEOM, things might be less ugly.
- *Also*, in PostGIS 2.0 I am going to be getting rid of all these
*uchar valued calculation functions and replacing them with code that
works on the LWGEOM structures. So you can future-proof your code by
setting it up that way from the start...
- For your domain-specific structs, prefixing them so they are all
commonly named would be nice: DIST2D_RESULTS, DIST2D_LIST, etc.
- In DISTPTS, you have a member, 'thedir' which should really be named
'mode' and valued with a macro or enumeration (DIST2D_MAX, DIST2D_MIN)
rather than magic numbers.
- In DISTPTS, you have a member 'd' which should be 'dist' or 'distance'.
- I'm not enamoured of the down-the-rabbit-hole style in which
functions seem to delegate to function2 variants here and there for no
clear reason. Document what happens in each section of your function,
great new function points for code that is called repeatedly or in
different contexts.
- Mind you, the existing style in measures.c is really ugly to start
with, so c'est la vie.
* - I would really like to see two SQL bindings, an ST_Distance() that
calls through your proposed stack and an ST_Distance_Old() that calls
directly into the brute force functions. That way during the testing
phase we can compare the methods for accuracy on *lots* of different
data easily. We can remove those SQL bindings before release time.
- Things might fall apart if you are fed EMPTY geometries. I see some
places where npoints is not checked for > 0 on point arrays (or, that
pointarrays aren't checked for non-null on deserialization).

The most important thing to me is (*) above: enabling the ability to
test correctness on a large range of data during the 1.5 pre-release
period.

Hopefully Mark can take a quick look at the code this week, but since
it's going to be re-written in 2.0 to some extent anyways, the
stylistic stuff is less important to me than it might otherwise be.

Paul
_______________________________________________
postgis-devel mailing list
postgis-devel@...
http://postgis.refractions.net/mailman/listinfo/postgis-devel

Parent Message unknown Re: Distance2D Review

by Nicklas Avén :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Great Paul
thanks
 
From my level of experience I think I could have expected worse :-)
 
I hopefylly get some time this evening to give it a try.
 
About the testing-ability:
What I have done to test it aginst large datasets is to do something like:

select a.gid, b.gid from table1 a, table2 b
 where abs(st_distance(a.the_geom, b.the_geom)-new_distance(a.the_geom, b.the_geom))>0.0000001;
with st_distance defined pointing to the 1.4 released library and new_distance pointing to "my" library.
 
The pro with this is that we then test even my brute force part aginst 1.4 since that is also a little rewritten.
 
or should I try that binding?
 
If the above is returning any rows, st_shortestline is very convenient to see what the new function is doing wrong.
 
 
/Nicklas
 
 



2009-11-10 Paul Ramsey wrote:

Hi Niklas, I'm just reading through the code, and I'm going to note
>things into this e-mail as I go along:
>
>- Try and keep private structures out of the liblwgeom.h file. So far
>I see BBOXDIST, W_BBOX, LISTSTRUCT which are entirely confined to
>measures.c. Put them into ./liblwgeom/measures.h for now (we'll have a
>go at rationalizing public and private prototypes more attractively in
>the PostGIS 2.0 round).
>- Follow the code formatting guidelines in ./STYLE please (you can
>retroactively apply them with astyle though)
>- I see no test cases in cu_measures.c for your functions, we're
>trying to get as much testing as possible in the library level unit
>tests as we go along, so for new functions in particular we should aim
>to get coverage
>- This construction is a little odd:
>
> DISTPTS thedl;
> DISTPTS *dl = &thedl;
> lw_dist2d_comp( lw1,lw2,dl);
>
> Generally people just write it this way, without the extra variable,
>it has the same effect:
>
> lw_dist2d_comp( lw1,lw2, &thedl);
>
> - You've declared move of your working functions with a void return
>(since the return values are actually in the *DISTPTS). I'm not sure
>if you have failure cases checked deeper in your functions yet, but
>one way to propagate up error cases without things like null-pointer
>checks is to return LW_TRUE or LW_FALSE from the functions so that
>higher level error handling code can do the "right thing" with them.
>- Unfortunately, you've followed the coding style of the original
>functions, which is working more-or-less directly with serialized
>*uchar. So, for convenience, you've defined things like W_BBOX to have
>easier access to the bounds without having to re-pull them from the
>structure each time. If you de-serilized to LWGEOM early on and worked
>with LWGEOM, things might be less ugly.
>- *Also*, in PostGIS 2.0 I am going to be getting rid of all these
>*uchar valued calculation functions and replacing them with code that
>works on the LWGEOM structures. So you can future-proof your code by
>setting it up that way from the start...
>- For your domain-specific structs, prefixing them so they are all
>commonly named would be nice: DIST2D_RESULTS, DIST2D_LIST, etc.
>- In DISTPTS, you have a member, 'thedir' which should really be named
>'mode' and valued with a macro or enumeration (DIST2D_MAX, DIST2D_MIN)
>rather than magic numbers.
>- In DISTPTS, you have a member 'd' which should be 'dist' or 'distance'.
>- I'm not enamoured of the down-the-rabbit-hole style in which
>functions seem to delegate to function2 variants here and there for no
>clear reason. Document what happens in each section of your function,
>great new function points for code that is called repeatedly or in
>different contexts.
>- Mind you, the existing style in measures.c is really ugly to start
>with, so c'est la vie.
>* - I would really like to see two SQL bindings, an ST_Distance() that
>calls through your proposed stack and an ST_Distance_Old() that calls
>directly into the brute force functions. That way during the testing
>phase we can compare the methods for accuracy on *lots* of different
>data easily. We can remove those SQL bindings before release time.
>- Things might fall apart if you are fed EMPTY geometries. I see some
>places where npoints is not checked for > 0 on point arrays (or, that
>pointarrays aren't checked for non-null on deserialization).
>
>The most important thing to me is (*) above: enabling the ability to
>test correctness on a large range of data during the 1.5 pre-release
>period.
>
>Hopefully Mark can take a quick look at the code this week, but since
>it's going to be re-written in 2.0 to some extent anyways, the
>stylistic stuff is less important to me than it might otherwise be.
>
>Paul
>_______________________________________________
>postgis-devel mailing list
>postgis-devel@...
>http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>


_______________________________________________
postgis-devel mailing list
postgis-devel@...
http://postgis.refractions.net/mailman/listinfo/postgis-devel

Re: Distance2D Review -function signatures

by Paragon Corporation-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Paul,
 
FWIW:  This is exactly why I didn't want you doing this
 
 
Before you had a hardcore upgrade plan in place.
 
The reason being -- from a dev perspective, Nicklas can test the functions side by side against an old install, which is great :)
 
From a user perspective -- all their old functions will be pointing at a 1.4 or 1.3 (unless they uninstall these) -- I think we all agree this is bad.
 
Also some of your changes require a dump reload to do it right given our current install (but you wouldn't know since you'd just be pointing at 1.4/1.3)
 
I also don't understand the rush since lots of things will be ripped in PostGIS 2.0 anyway.
 
Thanks,
Regina


From: postgis-devel-bounces@... [mailto:postgis-devel-bounces@...] On Behalf Of nicklas.aven@...
Sent: Tuesday, November 10, 2009 2:33 AM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] Distance2D Review

Great Paul
thanks
 
From my level of experience I think I could have expected worse :-)
 
I hopefylly get some time this evening to give it a try.
 
About the testing-ability:
What I have done to test it aginst large datasets is to do something like:

select a.gid, b.gid from table1 a, table2 b
 where abs(st_distance(a.the_geom, b.the_geom)-new_distance(a.the_geom, b.the_geom))>0.0000001;
with st_distance defined pointing to the 1.4 released library and new_distance pointing to "my" library.
 
The pro with this is that we then test even my brute force part aginst 1.4 since that is also a little rewritten.
 
or should I try that binding?
 
If the above is returning any rows, st_shortestline is very convenient to see what the new function is doing wrong.
 
 
/Nicklas
 
 



2009-11-10 Paul Ramsey wrote:

Hi Niklas, I'm just reading through the code, and I'm going to note


>things into this e-mail as I go along:
>
>- Try and keep private structures out of the liblwgeom.h file. So far
>I see BBOXDIST, W_BBOX, LISTSTRUCT which are entirely confined to
>measures.c. Put them into ./liblwgeom/measures.h for now (we'll have a
>go at rationalizing public and private prototypes more attractively in
>the PostGIS 2.0 round).
>- Follow the code formatting guidelines in ./STYLE please (you can
>retroactively apply them with astyle though)
>- I see no test cases in cu_measures.c for your functions, we're
>trying to get as much testing as possible in the library level unit
>tests as we go along, so for new functions in particular we should aim
>to get coverage
>- This construction is a little odd:
>
> DISTPTS thedl;
> DISTPTS *dl = &thedl;
> lw_dist2d_comp( lw1,lw2,dl);
>
> Generally people just write it this way, without the extra variable,
>it has the same effect:
>
> lw_dist2d_comp( lw1,lw2, &thedl);
>
> - You've declared move of your working functions with a void return
>(since the return values are actually in the *DISTPTS). I'm not sure
>if you have failure cases checked deeper in your functions yet, but
>one way to propagate up error cases without things like null-pointer
>checks is to return LW_TRUE or LW_FALSE from the functions so that
>higher level error handling code can do the "right thing" with them.
>- Unfortunately, you've followed the coding style of the original
>functions, which is working more-or-less directly with serialized
>*uchar. So, for convenience, you've defined things like W_BBOX to have
>easier access to the bounds without having to re-pull them from the
>structure each time. If you de-serilized to LWGEOM early on and worked
>with LWGEOM, things might be less ugly.
>- *Also*, in PostGIS 2.0 I am going to be getting rid of all these
>*uchar valued calculation functions and replacing them with code that
>works on the LWGEOM structures. So you can future-proof your code by
>setting it up that way from the start...
>- For your domain-specific structs, prefixing them so they are all
>commonly named would be nice: DIST2D_RESULTS, DIST2D_LIST, etc.
>- In DISTPTS, you have a member, 'thedir' which should really be named
>'mode' and valued with a macro or enumeration (DIST2D_MAX, DIST2D_MIN)
>rather than magic numbers.
>- In DISTPTS, you have a member 'd' which should be 'dist' or 'distance'.
>- I'm not enamoured of the down-the-rabbit-hole style in which
>functions seem to delegate to function2 variants here and there for no
>clear reason. Document what happens in each section of your function,
>great new function points for code that is called repeatedly or in
>different contexts.
>- Mind you, the existing style in measures.c is really ugly to start
>with, so c'est la vie.
>* - I would really like to see two SQL bindings, an ST_Distance() that
>calls through your proposed stack and an ST_Distance_Old() that calls
>directly into the brute force functions. That way during the testing
>phase we can compare the methods for accuracy on *lots* of different
>data easily. We can remove those SQL bindings before release time.
>- Things might fall apart if you are fed EMPTY geometries. I see some
>places where npoints is not checked for > 0 on point arrays (or, that
>pointarrays aren't checked for non-null on deserialization).
>
>The most important thing to me is (*) above: enabling the ability to
>test correctness on a large range of data during the 1.5 pre-release
>period.
>
>Hopefully Mark can take a quick look at the code this week, but since
>it's going to be re-written in 2.0 to some extent anyways, the
>stylistic stuff is less important to me than it might otherwise be.
>
>Paul
>_______________________________________________
>postgis-devel mailing list
>postgis-devel@...
>http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>

_______________________________________________
postgis-devel mailing list
postgis-devel@...
http://postgis.refractions.net/mailman/listinfo/postgis-devel

Parent Message unknown Re: Distance2D Review

by Nicklas Avén :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hallo
 
I'm struggling a little with this distance thing and hope for some help.
 
1) I have been looking a little on deserializing early and working with LWGEOM as you suggested Paul. It would be wonderful, and it would probably be much cleaner code and easier to work with the logic. But I have two questions about it, is there a function for creating bbox after deserializing or what is the best way of doing that? next question is about the number of subgeometries. As it works now the number om subgeometries is presented in lwgeom_inspected, but I see no function inspecting a LWGEOM or something corresponding.
 
I just committed a change from void functions to integer as Paul suggested. I think a now have it "carrying" any false-value all the way up and out. My question is what to to do with the double-functions that are supposed to return the distance. If they got a false answer from the underlying functions I now just throw an error that something is wrong. Then the compiler varned that I didn't return values in all situations from a function that should return values so I put a "return 0.0;" after the error. I guess it doesn't matter since the error stops the process, but it cannot be the right way to do it. It looks dangorous that 0.0 is ready to take of as an answer at onceafter the error-message.
If this gets more and more confusing it's because I should have gone sleeping some hours ago.
 
I will continue looking at all this but I might not get time before next week.
 
I also did some small tests to see how it competes to the old calculation on very small geometries.
My test was 1 dataset of 300 000 two-vertex lines spread all over Norway. Then i ran a distance-calculation between all those lines and first another tvo-vertexline and then a three-vertexline and then a four and then ten vertexes.
 
What I found was that the old brutal force was about 20% faster when measuring between 2-vertex lines
when measuring between 2-vertex lines and a 3 vertex line they were about the same speed. Then the new calculation was faster and 2 vertex-line to 10 vertexline was about 3 times faster with the new algoritm.
 
The question is:
Is the usage of 2-vertex lines aginst 2 vertex lines so common so we should try take those situations the old way?
 
This have to be investigated better too I think.
 
Now it's time for sleep.
 
/Nicklas
 
 
 
 

2009-11-10 nicklas.aven@... wrote:

>
Great Paul
>
thanks
>
 
>
From my level of experience I think I could have expected worse :-)
>
 
>
I hopefylly get some time this evening to give it a try.
>
 
>
About the testing-ability:
>
What I have done to test it aginst large datasets is to do something like:
>

> select a.gid, b.gid from table1 a, table2 b
>  where abs(st_distance(a.the_geom, b.the_geom)-new_distance(a.the_geom, b.the_geom))>0.0000001;
>
>
with st_distance defined pointing to the 1.4 released library and new_distance pointing to "my" library.
>
 
>
The pro with this is that we then test even my brute force part aginst 1.4 since that is also a little rewritten.
>
 
>
or should I try that binding?
>
 
>
If the above is returning any rows, st_shortestline is very convenient to see what the new function is doing wrong.
>
 
>
 
>
/Nicklas
>
 
>
 


>
> 2009-11-10 Paul Ramsey wrote:
>
> Hi Niklas, I'm just reading through the code, and I'm going to note
> >things into this e-mail as I go along:
> >
> >- Try and keep private structures out of the liblwgeom.h file. So far
> >I see BBOXDIST, W_BBOX, LISTSTRUCT which are entirely confined to
> >measures.c. Put them into ./liblwgeom/measures.h for now (we'll have a
> >go at rationalizing public and private prototypes more attractively in
> >the PostGIS 2.0 round).
> >- Follow the code formatting guidelines in ./STYLE please (you can
> >retroactively apply them with astyle though)
> >- I see no test cases in cu_measures.c for your functions, we're
> >trying to get as much testing as possible in the library level unit
> >tests as we go along, so for new functions in particular we should aim
> >to get coverage
> >- This construction is a little odd:
> >
> > DISTPTS thedl;
> > DISTPTS *dl = &thedl;
> > lw_dist2d_comp( lw1,lw2,dl);
> >
> > Generally people just write it this way, without the extra variable,
> >it has the same effect:
> >
> > lw_dist2d_comp( lw1,lw2, &thedl);
> >
> > - You've declared move of your working functions with a void return
> >(since the return values are actually in the *DISTPTS). I'm not sure
> >if you have failure cases checked deeper in your functions yet, but
> >one way to propagate up error cases without things like null-pointer
> >checks is to return LW_TRUE or LW_FALSE from the functions so that
> >higher level error handling code can do the "right thing" with them.
> >- Unfortunately, you've followed the coding style of the original
> >functions, which is working more-or-less directly with serialized
> >*uchar. So, for convenience, you've defined things like W_BBOX to have
> >easier access to the bounds without having to re-pull them from the
> >structure each time. If you de-serilized to LWGEOM early on and worked
> >with LWGEOM, things might be less ugly.
> >- *Also*, in PostGIS 2.0 I am going to be getting rid of all these
> >*uchar valued calculation functions and replacing them with code that
> >works on the LWGEOM structures. So you can future-proof your code by
> >setting it up that way from the start...
> >- For your domain-specific structs, prefixing them so they are all
> >commonly named would be nice: DIST2D_RESULTS, DIST2D_LIST, etc.
> >- In DISTPTS, you have a member, 'thedir' which should really be named
> >'mode' and valued with a macro or enumeration (DIST2D_MAX, DIST2D_MIN)
> >rather than magic numbers.
> >- In DISTPTS, you have a member 'd' which should be 'dist' or 'distance'.
> >- I'm not enamoured of the down-the-rabbit-hole style in which
> >functions seem to delegate to function2 variants here and there for no
> >clear reason. Document what happens in each section of your function,
> >great new function points for code that is called repeatedly or in
> >different contexts.
> >- Mind you, the existing style in measures.c is really ugly to start
> >with, so c'est la vie.
> >* - I would really like to see two SQL bindings, an ST_Distance() that
> >calls through your proposed stack and an ST_Distance_Old() that calls
> >directly into the brute force functions. That way during the testing
> >phase we can compare the methods for accuracy on *lots* of different
> >data easily. We can remove those SQL bindings before release time.
> >- Things might fall apart if you are fed EMPTY geometries. I see some
> >places where npoints is not checked for > 0 on point arrays (or, that
> >pointarrays aren't checked for non-null on deserialization).
> >
> >The most important thing to me is (*) above: enabling the ability to
> >test correctness on a large range of data during the 1.5 pre-release
> >period.
> >
> >Hopefully Mark can take a quick look at the code this week, but since
> >it's going to be re-written in 2.0 to some extent anyways, the
> >stylistic stuff is less important to me than it might otherwise be.
> >
> >Paul
> >_______________________________________________
> >postgis-devel mailing list
> >postgis-devel@...
> >postgis.refractions.net/mailman/listinfo/postgis-devel
> >
> >


_______________________________________________
postgis-devel mailing list
postgis-devel@...
http://postgis.refractions.net/mailman/listinfo/postgis-devel

Re: Distance2D Review

by Paul Ramsey-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Niklas, thanks for looking over things:

On Wed, Nov 11, 2009 at 4:27 PM, Nicklas Avén
<nicklas.aven@...> wrote:

> is there a function for creating bbox after
> deserializing or what is the best way of doing that?

LWGEOM *lwg = lwgeom_deserialize(SERIALIZED_FORM(pglwg));
if( ! lwg->bbox )
  lwg->bbox = lwgeom_compute_box2d(lwg);

Actually, I think only POINT objects should show up without boxes, but
the above is comprehensive.

> next question is about
> the number of subgeometries. As it works now the number om subgeometries is
> presented in lwgeom_inspected, but I see no function inspecting a LWGEOM or
> something corresponding.

int lwcollection_ngeoms(const LWCOLLECTION *col)

> I just committed a change from void functions to integer as Paul suggested.
> I think a now have it "carrying" any false-value all the way up and out. My
> question is what to to do with the double-functions that are supposed to
> return the distance. If they got a false answer from the underlying
> functions I now just throw an error that something is wrong. Then the
> compiler varned that I didn't return values in all situations from a
> function that should return values so I put a "return 0.0;" after the error.

Yeah, I have a similar yucky solution in geography, but I return a
negative number (invalid value) so that can be caught higher if
necessary. It depends a lot on the number of error conditions and how
high we want to propogate them, unfortunately we don't have a hard and
set pattern to follow (I've seen some lovely consistent code that is
all return codes on the return and return values in pointer
arguments). Consistency is its own reward and it's own curse: do what
makes the most sense for your use case.

> If this gets more and more confusing it's because I should
> have gone sleeping some hours ago.

Sleep well!

P.
_______________________________________________
postgis-devel mailing list
postgis-devel@...
http://postgis.refractions.net/mailman/listinfo/postgis-devel