I've got a silly question about performance

View: New views
10 Messages — Rating Filter:   Alert me  

I've got a silly question about performance

by easpengren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm still getting the hang of some of the finer points of creating queries in PostGIS, as is probably obvious with my last post.

I've two tables, parcel2 that is a collection of parcels in a county and a table election, which is a table of voting precincts in the same county. I'd like to select all of the parcels in each precinct.

I have this query:

select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ, city_code from election, parcel2 where ST_contains(election.the_geom, ST_pointonsurface(parcel2.wkb_geometry));

This gets the job done, but it's very slow. What can I do to speed this up?

Eric

Re: I've got a silly question about performance

by pcreso :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


--- easpengren <phat-ass@...> wrote:

>
> I'm still getting the hang of some of the finer points of creating queries in
> PostGIS, as is probably obvious with my last post.
>
> I've two tables, parcel2 that is a collection of parcels in a county and a
> table election, which is a table of voting precincts in the same county. I'd
> like to select all of the parcels in each precinct.
>

That query looks OK, but if parcels can be split across precincts, then it
won't necessarily give the correct answer.
 
> I have this query:
>
> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ, city_code
> from election, parcel2 where ST_contains(election.the_geom,
> ST_pointonsurface(parcel2.wkb_geometry));
>
> This gets the job done, but it's very slow. What can I do to speed this up?

See the PostGIS docs about creating spatial indices on the geometry columns in
your two tables.

http://postgis.refractions.net/docs/ch04.html#id2761842
http://postgis.refractions.net/docs/ch04.html#id2761985

If you have, or if you create them, then you need to modify your query to use
them:

.... where election.the_geom && parcel2.wkb_geometry and ST_contains ...

as described in:
http://postgis.refractions.net/docs/ch04.html#id2762121



Hope this helps...

  Brent Wood
_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by easpengren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

OK, I think I've got that. When I imported these shape files into the database, an index was created. I added a Gist index to both to see what would happen. Will that cause a problem?

I do not understand Indices at all. I'll have to read up on them.

We'll see how these things come out. I would like to have something usable in the next day or two.

I'll likely end up creating a view of this query. Again, I'm still figuring this out. Speed is really quite important for me. I don't have the most powerful machine doing this stuff (more RAM will likely show up this week). What I'd like to do is make sure the software is working as efficiently as I can get it.

As I type this, I seem to have gotten a much quicker response from the server.

Wow.

Eric
Brent Wood-2 wrote:
--- easpengren <phat-ass@thinkheavyindustries.com> wrote:

>
> I'm still getting the hang of some of the finer points of creating queries in
> PostGIS, as is probably obvious with my last post.
>
> I've two tables, parcel2 that is a collection of parcels in a county and a
> table election, which is a table of voting precincts in the same county. I'd
> like to select all of the parcels in each precinct.
>

That query looks OK, but if parcels can be split across precincts, then it
won't necessarily give the correct answer.
 
> I have this query:
>
> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ, city_code
> from election, parcel2 where ST_contains(election.the_geom,
> ST_pointonsurface(parcel2.wkb_geometry));
>
> This gets the job done, but it's very slow. What can I do to speed this up?

See the PostGIS docs about creating spatial indices on the geometry columns in
your two tables.

http://postgis.refractions.net/docs/ch04.html#id2761842
http://postgis.refractions.net/docs/ch04.html#id2761985

If you have, or if you create them, then you need to modify your query to use
them:

.... where election.the_geom && parcel2.wkb_geometry and ST_contains ...

as described in:
http://postgis.refractions.net/docs/ch04.html#id2762121



Hope this helps...

  Brent Wood
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

RE: I've got a silly question about performance

by Gregory Williamson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

RE: [postgis-users] I've got a silly question about performance

The GIST index is the one you want / need for spatial operations. Make sure that you ANALYZE <tablename>; after creating the index or changing the data much (rule of thumb is maybe 10% but analyze is a fast and fairly low impact operation).

The ST_ functions usually (always?) invoke the && operator (which is the spatial operator which will use the GIST indexes); they are wrapper for the underlying functions.

Posting your information [type of hardware, OS, postgreSQL and postGIS versions, table structure and indexes, tables sizes, the query itself and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL performance mail list might be worthwhile if this list doesn't help enough.

Greg Williamson
Senior DBA
Globexplorer LLC, a company owned by DigitalGlobe


-----Original Message-----
From: postgis-users-bounces@... on behalf of easpengren
Sent: Thu 1/17/2008 12:02 AM
To: postgis-users@...
Subject: Re: [postgis-users] I've got a silly question about performance


OK, I think I've got that. When I imported these shape files into the
database, an index was created. I added a Gist index to both to see what
would happen. Will that cause a problem?

I do not understand Indices at all. I'll have to read up on them.

We'll see how these things come out. I would like to have something usable
in the next day or two.

I'll likely end up creating a view of this query. Again, I'm still figuring
this out. Speed is really quite important for me. I don't have the most
powerful machine doing this stuff (more RAM will likely show up this week).
What I'd like to do is make sure the software is working as efficiently as I
can get it.

As I type this, I seem to have gotten a much quicker response from the
server.

Wow.

Eric

Brent Wood-2 wrote:
>
>
> --- easpengren <phat-ass@...> wrote:
>
>>
>> I'm still getting the hang of some of the finer points of creating
>> queries in
>> PostGIS, as is probably obvious with my last post.
>>
>> I've two tables, parcel2 that is a collection of parcels in a county and
>> a
>> table election, which is a table of voting precincts in the same county.
>> I'd
>> like to select all of the parcels in each precinct.
>>
>
> That query looks OK, but if parcels can be split across precincts, then it
> won't necessarily give the correct answer.

>> I have this query:
>>
>> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
>> city_code
>> from election, parcel2 where ST_contains(election.the_geom,
>> ST_pointonsurface(parcel2.wkb_geometry));
>>
>> This gets the job done, but it's very slow. What can I do to speed this
>> up?
>
> See the PostGIS docs about creating spatial indices on the geometry
> columns in
> your two tables.
>
> http://postgis.refractions.net/docs/ch04.html#id2761842
> http://postgis.refractions.net/docs/ch04.html#id2761985
>
> If you have, or if you create them, then you need to modify your query to
> use
> them:
>
> .... where election.the_geom && parcel2.wkb_geometry and ST_contains ...
>
> as described in:
> http://postgis.refractions.net/docs/ch04.html#id2762121
>
>
>
> Hope this helps...
>
>   Brent Wood
> _______________________________________________
> postgis-users mailing list
> postgis-users@...
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>

--
View this message in context: http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
Sent from the PostGIS - User mailing list archive at Nabble.com.

_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by Chris Hermansen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

If you want a list of parcels in each precinct, why are you first
creating points?  That's just going to add extra - unnecessary -
geoprocessing.

Try st_intersects() and st_intersection().

If you are concerned your parcels overlap more than one precinct, get
the area() of each intersection(), get the max() of that, and group by
your other variables.

Gregory Williamson wrote:

>
> The GIST index is the one you want / need for spatial operations. Make
> sure that you ANALYZE <tablename>; after creating the index or
> changing the data much (rule of thumb is maybe 10% but analyze is a
> fast and fairly low impact operation).
>
> The ST_ functions usually (always?) invoke the && operator (which is
> the spatial operator which will use the GIST indexes); they are
> wrapper for the underlying functions.
>
> Posting your information [type of hardware, OS, postgreSQL and postGIS
> versions, table structure and indexes, tables sizes, the query itself
> and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL
> performance mail list might be worthwhile if this list doesn't help
> enough.
>
> Greg Williamson
> Senior DBA
> Globexplorer LLC, a company owned by DigitalGlobe
>
>
> -----Original Message-----
> From: postgis-users-bounces@... on behalf of
> easpengren
> Sent: Thu 1/17/2008 12:02 AM
> To: postgis-users@...
> Subject: Re: [postgis-users] I've got a silly question about performance
>
>
> OK, I think I've got that. When I imported these shape files into the
> database, an index was created. I added a Gist index to both to see what
> would happen. Will that cause a problem?
>
> I do not understand Indices at all. I'll have to read up on them.
>
> We'll see how these things come out. I would like to have something usable
> in the next day or two.
>
> I'll likely end up creating a view of this query. Again, I'm still
> figuring
> this out. Speed is really quite important for me. I don't have the most
> powerful machine doing this stuff (more RAM will likely show up this
> week).
> What I'd like to do is make sure the software is working as
> efficiently as I
> can get it.
>
> As I type this, I seem to have gotten a much quicker response from the
> server.
>
> Wow.
>
> Eric
>
> Brent Wood-2 wrote:
> >
> >
> > --- easpengren <phat-ass@...> wrote:
> >
> >>
> >> I'm still getting the hang of some of the finer points of creating
> >> queries in
> >> PostGIS, as is probably obvious with my last post.
> >>
> >> I've two tables, parcel2 that is a collection of parcels in a
> county and
> >> a
> >> table election, which is a table of voting precincts in the same
> county.
> >> I'd
> >> like to select all of the parcels in each precinct.
> >>
> >
> > That query looks OK, but if parcels can be split across precincts,
> then it
> > won't necessarily give the correct answer.
> >
> >> I have this query:
> >>
> >> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
> >> city_code
> >> from election, parcel2 where ST_contains(election.the_geom,
> >> ST_pointonsurface(parcel2.wkb_geometry));
> >>
> >> This gets the job done, but it's very slow. What can I do to speed this
> >> up?
> >
> > See the PostGIS docs about creating spatial indices on the geometry
> > columns in
> > your two tables.
> >
> > http://postgis.refractions.net/docs/ch04.html#id2761842
> > http://postgis.refractions.net/docs/ch04.html#id2761985
> >
> > If you have, or if you create them, then you need to modify your
> query to
> > use
> > them:
> >
> > .... where election.the_geom && parcel2.wkb_geometry and ST_contains ...
> >
> > as described in:
> > http://postgis.refractions.net/docs/ch04.html#id2762121
> >
> >
> >
> > Hope this helps...
> >
> >   Brent Wood
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users@...
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
> Sent from the PostGIS - User mailing list archive at Nabble.com.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users@...
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-users mailing list
> postgis-users@...
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>  


--
Regards,

Chris Hermansen · mailto:clh@...
tel:+1.604.714.2878 · fax:+1.604.733.0631
Timberline Natural Resource Group · http://www.timberline.ca
401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5

C'est ma façon de parler.

_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by Josh Livni-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Well, and again assuming he doesn't mind getting the list of parcels
whose centroids intersect each district rather than the list of parcels
that intersect each district, checking for point-polygon intersection is
going to be faster than checking for polygon-polygon intersection.  So
for speed purposes it's probably a good way to go.

For a first pass, after ensuring there is a gist index on the centroid
(or in this case point-on-surface - tho I'm not sure why), I would
expect the following to be faster than the equivalent query w/the full
polygons rather than the points:
   ...
    AND election.the_geom && ST_pointonsurface(parcel2.wkb_geometry)
    AND
ST_distance(election.the_geom,ST_pointonsurface(parcel2.wkb_geometry)) = 0
  ...
And if you really needed speed, adding a new geometry field with the
points precalculated and using that in the second line would make things
even faster.

  -Josh



Chris Hermansen wrote:

> If you want a list of parcels in each precinct, why are you first
> creating points?  That's just going to add extra - unnecessary -
> geoprocessing.
>
> Try st_intersects() and st_intersection().
>
> If you are concerned your parcels overlap more than one precinct, get
> the area() of each intersection(), get the max() of that, and group by
> your other variables.
>
> Gregory Williamson wrote:
>  
>> The GIST index is the one you want / need for spatial operations. Make
>> sure that you ANALYZE <tablename>; after creating the index or
>> changing the data much (rule of thumb is maybe 10% but analyze is a
>> fast and fairly low impact operation).
>>
>> The ST_ functions usually (always?) invoke the && operator (which is
>> the spatial operator which will use the GIST indexes); they are
>> wrapper for the underlying functions.
>>
>> Posting your information [type of hardware, OS, postgreSQL and postGIS
>> versions, table structure and indexes, tables sizes, the query itself
>> and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL
>> performance mail list might be worthwhile if this list doesn't help
>> enough.
>>
>> Greg Williamson
>> Senior DBA
>> Globexplorer LLC, a company owned by DigitalGlobe
>>
>>
>> -----Original Message-----
>> From: postgis-users-bounces@... on behalf of
>> easpengren
>> Sent: Thu 1/17/2008 12:02 AM
>> To: postgis-users@...
>> Subject: Re: [postgis-users] I've got a silly question about performance
>>
>>
>> OK, I think I've got that. When I imported these shape files into the
>> database, an index was created. I added a Gist index to both to see what
>> would happen. Will that cause a problem?
>>
>> I do not understand Indices at all. I'll have to read up on them.
>>
>> We'll see how these things come out. I would like to have something usable
>> in the next day or two.
>>
>> I'll likely end up creating a view of this query. Again, I'm still
>> figuring
>> this out. Speed is really quite important for me. I don't have the most
>> powerful machine doing this stuff (more RAM will likely show up this
>> week).
>> What I'd like to do is make sure the software is working as
>> efficiently as I
>> can get it.
>>
>> As I type this, I seem to have gotten a much quicker response from the
>> server.
>>
>> Wow.
>>
>> Eric
>>
>> Brent Wood-2 wrote:
>>    
>>> --- easpengren <phat-ass@...> wrote:
>>>
>>>      
>>>> I'm still getting the hang of some of the finer points of creating
>>>> queries in
>>>> PostGIS, as is probably obvious with my last post.
>>>>
>>>> I've two tables, parcel2 that is a collection of parcels in a
>>>>        
>> county and
>>    
>>>> a
>>>> table election, which is a table of voting precincts in the same
>>>>        
>> county.
>>    
>>>> I'd
>>>> like to select all of the parcels in each precinct.
>>>>
>>>>        
>>> That query looks OK, but if parcels can be split across precincts,
>>>      
>> then it
>>    
>>> won't necessarily give the correct answer.
>>>
>>>      
>>>> I have this query:
>>>>
>>>> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
>>>> city_code
>>>> from election, parcel2 where ST_contains(election.the_geom,
>>>> ST_pointonsurface(parcel2.wkb_geometry));
>>>>
>>>> This gets the job done, but it's very slow. What can I do to speed this
>>>> up?
>>>>        
>>> See the PostGIS docs about creating spatial indices on the geometry
>>> columns in
>>> your two tables.
>>>
>>> http://postgis.refractions.net/docs/ch04.html#id2761842
>>> http://postgis.refractions.net/docs/ch04.html#id2761985
>>>
>>> If you have, or if you create them, then you need to modify your
>>>      
>> query to
>>    
>>> use
>>> them:
>>>
>>> .... where election.the_geom && parcel2.wkb_geometry and ST_contains ...
>>>
>>> as described in:
>>> http://postgis.refractions.net/docs/ch04.html#id2762121
>>>
>>>
>>>
>>> Hope this helps...
>>>
>>>   Brent Wood
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users@...
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>      
>> --
>> View this message in context:
>> http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
>> Sent from the PostGIS - User mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@...
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@...
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>  
>>    
>
>
>  
_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by Chris Hermansen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Josh, one of us is missing something here.

Point on polygon is faster.  But in the method he proposed, first you
have to calculate the points.  And getting centroids requires visiting
all of the points on the polygon.

So I'm not seeing how this could be faster.

If he's doing it more than once on the same data set, I agree with your
pre-calculation comment.

Some other thoughts:

Note that centroids would however be a bad choice, as they can easily be
outside the polygon.

Even points within the polygons are not necessarily a good choice,
depending on how the two input geometries overlap.  I guess it would
depend on how parcels are "really" assigned to disticts - ie can the
assignment be replicated algorithmically?

Josh Livni wrote:

> Well, and again assuming he doesn't mind getting the list of parcels
> whose centroids intersect each district rather than the list of
> parcels that intersect each district, checking for point-polygon
> intersection is going to be faster than checking for polygon-polygon
> intersection.  So for speed purposes it's probably a good way to go.
>
> For a first pass, after ensuring there is a gist index on the centroid
> (or in this case point-on-surface - tho I'm not sure why), I would
> expect the following to be faster than the equivalent query w/the full
> polygons rather than the points:
>   ...
>    AND election.the_geom && ST_pointonsurface(parcel2.wkb_geometry)
>    AND
> ST_distance(election.the_geom,ST_pointonsurface(parcel2.wkb_geometry))
> = 0
>  ...
> And if you really needed speed, adding a new geometry field with the
> points precalculated and using that in the second line would make
> things even faster.
>
>  -Josh
>
>
>
> Chris Hermansen wrote:
>> If you want a list of parcels in each precinct, why are you first
>> creating points?  That's just going to add extra - unnecessary -
>> geoprocessing.
>>
>> Try st_intersects() and st_intersection().
>>
>> If you are concerned your parcels overlap more than one precinct, get
>> the area() of each intersection(), get the max() of that, and group by
>> your other variables.
>>
>> Gregory Williamson wrote:
>>  
>>> The GIST index is the one you want / need for spatial operations. Make
>>> sure that you ANALYZE <tablename>; after creating the index or
>>> changing the data much (rule of thumb is maybe 10% but analyze is a
>>> fast and fairly low impact operation).
>>>
>>> The ST_ functions usually (always?) invoke the && operator (which is
>>> the spatial operator which will use the GIST indexes); they are
>>> wrapper for the underlying functions.
>>>
>>> Posting your information [type of hardware, OS, postgreSQL and postGIS
>>> versions, table structure and indexes, tables sizes, the query itself
>>> and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL
>>> performance mail list might be worthwhile if this list doesn't help
>>> enough.
>>>
>>> Greg Williamson
>>> Senior DBA
>>> Globexplorer LLC, a company owned by DigitalGlobe
>>>
>>>
>>> -----Original Message-----
>>> From: postgis-users-bounces@... on behalf of
>>> easpengren
>>> Sent: Thu 1/17/2008 12:02 AM
>>> To: postgis-users@...
>>> Subject: Re: [postgis-users] I've got a silly question about
>>> performance
>>>
>>>
>>> OK, I think I've got that. When I imported these shape files into the
>>> database, an index was created. I added a Gist index to both to see
>>> what
>>> would happen. Will that cause a problem?
>>>
>>> I do not understand Indices at all. I'll have to read up on them.
>>>
>>> We'll see how these things come out. I would like to have something
>>> usable
>>> in the next day or two.
>>>
>>> I'll likely end up creating a view of this query. Again, I'm still
>>> figuring
>>> this out. Speed is really quite important for me. I don't have the most
>>> powerful machine doing this stuff (more RAM will likely show up this
>>> week).
>>> What I'd like to do is make sure the software is working as
>>> efficiently as I
>>> can get it.
>>>
>>> As I type this, I seem to have gotten a much quicker response from the
>>> server.
>>>
>>> Wow.
>>>
>>> Eric
>>>
>>> Brent Wood-2 wrote:
>>>    
>>>> --- easpengren <phat-ass@...> wrote:
>>>>
>>>>      
>>>>> I'm still getting the hang of some of the finer points of creating
>>>>> queries in
>>>>> PostGIS, as is probably obvious with my last post.
>>>>>
>>>>> I've two tables, parcel2 that is a collection of parcels in a
>>>>>        
>>> county and
>>>    
>>>>> a
>>>>> table election, which is a table of voting precincts in the same
>>>>>        
>>> county.
>>>    
>>>>> I'd
>>>>> like to select all of the parcels in each precinct.
>>>>>
>>>>>        
>>>> That query looks OK, but if parcels can be split across precincts,
>>>>      
>>> then it
>>>    
>>>> won't necessarily give the correct answer.
>>>>
>>>>      
>>>>> I have this query:
>>>>>
>>>>> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
>>>>> city_code
>>>>> from election, parcel2 where ST_contains(election.the_geom,
>>>>> ST_pointonsurface(parcel2.wkb_geometry));
>>>>>
>>>>> This gets the job done, but it's very slow. What can I do to speed
>>>>> this
>>>>> up?
>>>>>        
>>>> See the PostGIS docs about creating spatial indices on the geometry
>>>> columns in
>>>> your two tables.
>>>>
>>>> http://postgis.refractions.net/docs/ch04.html#id2761842
>>>> http://postgis.refractions.net/docs/ch04.html#id2761985
>>>>
>>>> If you have, or if you create them, then you need to modify your
>>>>      
>>> query to
>>>    
>>>> use
>>>> them:
>>>>
>>>> .... where election.the_geom && parcel2.wkb_geometry and
>>>> ST_contains ...
>>>>
>>>> as described in:
>>>> http://postgis.refractions.net/docs/ch04.html#id2762121
>>>>
>>>>
>>>>
>>>> Hope this helps...
>>>>
>>>>   Brent Wood
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users@...
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>>
>>>>      
>>> --
>>> View this message in context:
>>> http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
>>>
>>> Sent from the PostGIS - User mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users@...
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users@...
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>      
>>
>>
>>  
> _______________________________________________
> postgis-users mailing list
> postgis-users@...
> http://postgis.refractions.net/mailman/listinfo/postgis-users


--
Regards,

Chris Hermansen · mailto:clh@...
tel:+1.604.714.2878 · fax:+1.604.733.0631
Timberline Natural Resource Group · http://www.timberline.ca
401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5

C'est ma façon de parler.

_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by Josh Livni-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Chris,

Good point - thanks for calling me out on that.   I think we're both
making assumptions about a use case -- my assumption was he would
probably be doing a similar calculation many times, and thus my bias
towards using points rather than polys.

Also, as noted - it's hard to say whether centroids or points would
suffice or which is better.  If a parcel intersects two polygons, and
you want to know which polygon it belongs to (and only assign it to one
of the two that it intersects), and if you are working only with point
proxies for parcels, then it's possible a centroid may be a more valid
assignment than a point_on_surface, even if the centroid itself lies
outside the parcel.

Anyway, without having more detail on the actual use-case and workflow,
I have no great opinion on what methodology might work best...

Cheers,
 
 -Josh


Chris Hermansen wrote:

> Josh, one of us is missing something here.
>
> Point on polygon is faster.  But in the method he proposed, first you
> have to calculate the points.  And getting centroids requires visiting
> all of the points on the polygon.
>
> So I'm not seeing how this could be faster.
>
> If he's doing it more than once on the same data set, I agree with your
> pre-calculation comment.
>
> Some other thoughts:
>
> Note that centroids would however be a bad choice, as they can easily be
> outside the polygon.
>
> Even points within the polygons are not necessarily a good choice,
> depending on how the two input geometries overlap.  I guess it would
> depend on how parcels are "really" assigned to disticts - ie can the
> assignment be replicated algorithmically?
>
> Josh Livni wrote:
>  
>> Well, and again assuming he doesn't mind getting the list of parcels
>> whose centroids intersect each district rather than the list of
>> parcels that intersect each district, checking for point-polygon
>> intersection is going to be faster than checking for polygon-polygon
>> intersection.  So for speed purposes it's probably a good way to go.
>>
>> For a first pass, after ensuring there is a gist index on the centroid
>> (or in this case point-on-surface - tho I'm not sure why), I would
>> expect the following to be faster than the equivalent query w/the full
>> polygons rather than the points:
>>   ...
>>    AND election.the_geom && ST_pointonsurface(parcel2.wkb_geometry)
>>    AND
>> ST_distance(election.the_geom,ST_pointonsurface(parcel2.wkb_geometry))
>> = 0
>>  ...
>> And if you really needed speed, adding a new geometry field with the
>> points precalculated and using that in the second line would make
>> things even faster.
>>
>>  -Josh
>>
>>
>>
>> Chris Hermansen wrote:
>>    
>>> If you want a list of parcels in each precinct, why are you first
>>> creating points?  That's just going to add extra - unnecessary -
>>> geoprocessing.
>>>
>>> Try st_intersects() and st_intersection().
>>>
>>> If you are concerned your parcels overlap more than one precinct, get
>>> the area() of each intersection(), get the max() of that, and group by
>>> your other variables.
>>>
>>> Gregory Williamson wrote:
>>>  
>>>      
>>>> The GIST index is the one you want / need for spatial operations. Make
>>>> sure that you ANALYZE <tablename>; after creating the index or
>>>> changing the data much (rule of thumb is maybe 10% but analyze is a
>>>> fast and fairly low impact operation).
>>>>
>>>> The ST_ functions usually (always?) invoke the && operator (which is
>>>> the spatial operator which will use the GIST indexes); they are
>>>> wrapper for the underlying functions.
>>>>
>>>> Posting your information [type of hardware, OS, postgreSQL and postGIS
>>>> versions, table structure and indexes, tables sizes, the query itself
>>>> and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL
>>>> performance mail list might be worthwhile if this list doesn't help
>>>> enough.
>>>>
>>>> Greg Williamson
>>>> Senior DBA
>>>> Globexplorer LLC, a company owned by DigitalGlobe
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: postgis-users-bounces@... on behalf of
>>>> easpengren
>>>> Sent: Thu 1/17/2008 12:02 AM
>>>> To: postgis-users@...
>>>> Subject: Re: [postgis-users] I've got a silly question about
>>>> performance
>>>>
>>>>
>>>> OK, I think I've got that. When I imported these shape files into the
>>>> database, an index was created. I added a Gist index to both to see
>>>> what
>>>> would happen. Will that cause a problem?
>>>>
>>>> I do not understand Indices at all. I'll have to read up on them.
>>>>
>>>> We'll see how these things come out. I would like to have something
>>>> usable
>>>> in the next day or two.
>>>>
>>>> I'll likely end up creating a view of this query. Again, I'm still
>>>> figuring
>>>> this out. Speed is really quite important for me. I don't have the most
>>>> powerful machine doing this stuff (more RAM will likely show up this
>>>> week).
>>>> What I'd like to do is make sure the software is working as
>>>> efficiently as I
>>>> can get it.
>>>>
>>>> As I type this, I seem to have gotten a much quicker response from the
>>>> server.
>>>>
>>>> Wow.
>>>>
>>>> Eric
>>>>
>>>> Brent Wood-2 wrote:
>>>>    
>>>>        
>>>>> --- easpengren <phat-ass@...> wrote:
>>>>>
>>>>>      
>>>>>          
>>>>>> I'm still getting the hang of some of the finer points of creating
>>>>>> queries in
>>>>>> PostGIS, as is probably obvious with my last post.
>>>>>>
>>>>>> I've two tables, parcel2 that is a collection of parcels in a
>>>>>>        
>>>>>>            
>>>> county and
>>>>    
>>>>        
>>>>>> a
>>>>>> table election, which is a table of voting precincts in the same
>>>>>>        
>>>>>>            
>>>> county.
>>>>    
>>>>        
>>>>>> I'd
>>>>>> like to select all of the parcels in each precinct.
>>>>>>
>>>>>>        
>>>>>>            
>>>>> That query looks OK, but if parcels can be split across precincts,
>>>>>      
>>>>>          
>>>> then it
>>>>    
>>>>        
>>>>> won't necessarily give the correct answer.
>>>>>
>>>>>      
>>>>>          
>>>>>> I have this query:
>>>>>>
>>>>>> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
>>>>>> city_code
>>>>>> from election, parcel2 where ST_contains(election.the_geom,
>>>>>> ST_pointonsurface(parcel2.wkb_geometry));
>>>>>>
>>>>>> This gets the job done, but it's very slow. What can I do to speed
>>>>>> this
>>>>>> up?
>>>>>>        
>>>>>>            
>>>>> See the PostGIS docs about creating spatial indices on the geometry
>>>>> columns in
>>>>> your two tables.
>>>>>
>>>>> http://postgis.refractions.net/docs/ch04.html#id2761842
>>>>> http://postgis.refractions.net/docs/ch04.html#id2761985
>>>>>
>>>>> If you have, or if you create them, then you need to modify your
>>>>>      
>>>>>          
>>>> query to
>>>>    
>>>>        
>>>>> use
>>>>> them:
>>>>>
>>>>> .... where election.the_geom && parcel2.wkb_geometry and
>>>>> ST_contains ...
>>>>>
>>>>> as described in:
>>>>> http://postgis.refractions.net/docs/ch04.html#id2762121
>>>>>
>>>>>
>>>>>
>>>>> Hope this helps...
>>>>>
>>>>>   Brent Wood
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users@...
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>>
>>>>>      
>>>>>          
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
>>>>
>>>> Sent from the PostGIS - User mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users@...
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users@...
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>      
>>>>        
>>>  
>>>      
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@...
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>    
>
>
>  
_______________________________________________
postgis-users mailing list
postgis-users@...
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by easpengren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Great.

This is all great advice. I hadn't thought of the area(), st_intersection() trick.

Some parcels do cross over into more than one precinct.

Eric
Josh Livni-3 wrote:
Well, and again assuming he doesn't mind getting the list of parcels
whose centroids intersect each district rather than the list of parcels
that intersect each district, checking for point-polygon intersection is
going to be faster than checking for polygon-polygon intersection.  So
for speed purposes it's probably a good way to go.

For a first pass, after ensuring there is a gist index on the centroid
(or in this case point-on-surface - tho I'm not sure why), I would
expect the following to be faster than the equivalent query w/the full
polygons rather than the points:
   ...
    AND election.the_geom && ST_pointonsurface(parcel2.wkb_geometry)
    AND
ST_distance(election.the_geom,ST_pointonsurface(parcel2.wkb_geometry)) = 0
  ...
And if you really needed speed, adding a new geometry field with the
points precalculated and using that in the second line would make things
even faster.

  -Josh



Chris Hermansen wrote:
> If you want a list of parcels in each precinct, why are you first
> creating points?  That's just going to add extra - unnecessary -
> geoprocessing.
>
> Try st_intersects() and st_intersection().
>
> If you are concerned your parcels overlap more than one precinct, get
> the area() of each intersection(), get the max() of that, and group by
> your other variables.
>
> Gregory Williamson wrote:
>  
>> The GIST index is the one you want / need for spatial operations. Make
>> sure that you ANALYZE <tablename>; after creating the index or
>> changing the data much (rule of thumb is maybe 10% but analyze is a
>> fast and fairly low impact operation).
>>
>> The ST_ functions usually (always?) invoke the && operator (which is
>> the spatial operator which will use the GIST indexes); they are
>> wrapper for the underlying functions.
>>
>> Posting your information [type of hardware, OS, postgreSQL and postGIS
>> versions, table structure and indexes, tables sizes, the query itself
>> and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL
>> performance mail list might be worthwhile if this list doesn't help
>> enough.
>>
>> Greg Williamson
>> Senior DBA
>> Globexplorer LLC, a company owned by DigitalGlobe
>>
>>
>> -----Original Message-----
>> From: postgis-users-bounces@postgis.refractions.net on behalf of
>> easpengren
>> Sent: Thu 1/17/2008 12:02 AM
>> To: postgis-users@postgis.refractions.net
>> Subject: Re: [postgis-users] I've got a silly question about performance
>>
>>
>> OK, I think I've got that. When I imported these shape files into the
>> database, an index was created. I added a Gist index to both to see what
>> would happen. Will that cause a problem?
>>
>> I do not understand Indices at all. I'll have to read up on them.
>>
>> We'll see how these things come out. I would like to have something usable
>> in the next day or two.
>>
>> I'll likely end up creating a view of this query. Again, I'm still
>> figuring
>> this out. Speed is really quite important for me. I don't have the most
>> powerful machine doing this stuff (more RAM will likely show up this
>> week).
>> What I'd like to do is make sure the software is working as
>> efficiently as I
>> can get it.
>>
>> As I type this, I seem to have gotten a much quicker response from the
>> server.
>>
>> Wow.
>>
>> Eric
>>
>> Brent Wood-2 wrote:
>>    
>>> --- easpengren <phat-ass@thinkheavyindustries.com> wrote:
>>>
>>>      
>>>> I'm still getting the hang of some of the finer points of creating
>>>> queries in
>>>> PostGIS, as is probably obvious with my last post.
>>>>
>>>> I've two tables, parcel2 that is a collection of parcels in a
>>>>        
>> county and
>>    
>>>> a
>>>> table election, which is a table of voting precincts in the same
>>>>        
>> county.
>>    
>>>> I'd
>>>> like to select all of the parcels in each precinct.
>>>>
>>>>        
>>> That query looks OK, but if parcels can be split across precincts,
>>>      
>> then it
>>    
>>> won't necessarily give the correct answer.
>>>
>>>      
>>>> I have this query:
>>>>
>>>> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
>>>> city_code
>>>> from election, parcel2 where ST_contains(election.the_geom,
>>>> ST_pointonsurface(parcel2.wkb_geometry));
>>>>
>>>> This gets the job done, but it's very slow. What can I do to speed this
>>>> up?
>>>>        
>>> See the PostGIS docs about creating spatial indices on the geometry
>>> columns in
>>> your two tables.
>>>
>>> http://postgis.refractions.net/docs/ch04.html#id2761842
>>> http://postgis.refractions.net/docs/ch04.html#id2761985
>>>
>>> If you have, or if you create them, then you need to modify your
>>>      
>> query to
>>    
>>> use
>>> them:
>>>
>>> .... where election.the_geom && parcel2.wkb_geometry and ST_contains ...
>>>
>>> as described in:
>>> http://postgis.refractions.net/docs/ch04.html#id2762121
>>>
>>>
>>>
>>> Hope this helps...
>>>
>>>   Brent Wood
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users@postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>      
>> --
>> View this message in context:
>> http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
>> Sent from the PostGIS - User mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>  
>>    
>
>
>  
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: I've got a silly question about performance

by easpengren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I realize this is an old thread. But I thought I'd let you guys know what came of this.

I used the pointonsurface method and it worked like a charm.

After comparing the data to the list of registered voters that we're using this in conjunction with we found that several people in the county had been assigned to the wrong precinct.

The election commissioner was pleased.

I work for a local party and it's not often that we're fixing their data.

I should just contract with them. Hmmm. It might pay more...

Cheers!

Eric
Josh Livni-3 wrote:
Chris,

Good point - thanks for calling me out on that.   I think we're both
making assumptions about a use case -- my assumption was he would
probably be doing a similar calculation many times, and thus my bias
towards using points rather than polys.

Also, as noted - it's hard to say whether centroids or points would
suffice or which is better.  If a parcel intersects two polygons, and
you want to know which polygon it belongs to (and only assign it to one
of the two that it intersects), and if you are working only with point
proxies for parcels, then it's possible a centroid may be a more valid
assignment than a point_on_surface, even if the centroid itself lies
outside the parcel.

Anyway, without having more detail on the actual use-case and workflow,
I have no great opinion on what methodology might work best...

Cheers,
 
 -Josh


Chris Hermansen wrote:
> Josh, one of us is missing something here.
>
> Point on polygon is faster.  But in the method he proposed, first you
> have to calculate the points.  And getting centroids requires visiting
> all of the points on the polygon.
>
> So I'm not seeing how this could be faster.
>
> If he's doing it more than once on the same data set, I agree with your
> pre-calculation comment.
>
> Some other thoughts:
>
> Note that centroids would however be a bad choice, as they can easily be
> outside the polygon.
>
> Even points within the polygons are not necessarily a good choice,
> depending on how the two input geometries overlap.  I guess it would
> depend on how parcels are "really" assigned to disticts - ie can the
> assignment be replicated algorithmically?
>
> Josh Livni wrote:
>  
>> Well, and again assuming he doesn't mind getting the list of parcels
>> whose centroids intersect each district rather than the list of
>> parcels that intersect each district, checking for point-polygon
>> intersection is going to be faster than checking for polygon-polygon
>> intersection.  So for speed purposes it's probably a good way to go.
>>
>> For a first pass, after ensuring there is a gist index on the centroid
>> (or in this case point-on-surface - tho I'm not sure why), I would
>> expect the following to be faster than the equivalent query w/the full
>> polygons rather than the points:
>>   ...
>>    AND election.the_geom && ST_pointonsurface(parcel2.wkb_geometry)
>>    AND
>> ST_distance(election.the_geom,ST_pointonsurface(parcel2.wkb_geometry))
>> = 0
>>  ...
>> And if you really needed speed, adding a new geometry field with the
>> points precalculated and using that in the second line would make
>> things even faster.
>>
>>  -Josh
>>
>>
>>
>> Chris Hermansen wrote:
>>    
>>> If you want a list of parcels in each precinct, why are you first
>>> creating points?  That's just going to add extra - unnecessary -
>>> geoprocessing.
>>>
>>> Try st_intersects() and st_intersection().
>>>
>>> If you are concerned your parcels overlap more than one precinct, get
>>> the area() of each intersection(), get the max() of that, and group by
>>> your other variables.
>>>
>>> Gregory Williamson wrote:
>>>  
>>>      
>>>> The GIST index is the one you want / need for spatial operations. Make
>>>> sure that you ANALYZE <tablename>; after creating the index or
>>>> changing the data much (rule of thumb is maybe 10% but analyze is a
>>>> fast and fairly low impact operation).
>>>>
>>>> The ST_ functions usually (always?) invoke the && operator (which is
>>>> the spatial operator which will use the GIST indexes); they are
>>>> wrapper for the underlying functions.
>>>>
>>>> Posting your information [type of hardware, OS, postgreSQL and postGIS
>>>> versions, table structure and indexes, tables sizes, the query itself
>>>> and the results of EXPLAIN ANALYZE <query>] to the PostgreSQL
>>>> performance mail list might be worthwhile if this list doesn't help
>>>> enough.
>>>>
>>>> Greg Williamson
>>>> Senior DBA
>>>> Globexplorer LLC, a company owned by DigitalGlobe
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: postgis-users-bounces@postgis.refractions.net on behalf of
>>>> easpengren
>>>> Sent: Thu 1/17/2008 12:02 AM
>>>> To: postgis-users@postgis.refractions.net
>>>> Subject: Re: [postgis-users] I've got a silly question about
>>>> performance
>>>>
>>>>
>>>> OK, I think I've got that. When I imported these shape files into the
>>>> database, an index was created. I added a Gist index to both to see
>>>> what
>>>> would happen. Will that cause a problem?
>>>>
>>>> I do not understand Indices at all. I'll have to read up on them.
>>>>
>>>> We'll see how these things come out. I would like to have something
>>>> usable
>>>> in the next day or two.
>>>>
>>>> I'll likely end up creating a view of this query. Again, I'm still
>>>> figuring
>>>> this out. Speed is really quite important for me. I don't have the most
>>>> powerful machine doing this stuff (more RAM will likely show up this
>>>> week).
>>>> What I'd like to do is make sure the software is working as
>>>> efficiently as I
>>>> can get it.
>>>>
>>>> As I type this, I seem to have gotten a much quicker response from the
>>>> server.
>>>>
>>>> Wow.
>>>>
>>>> Eric
>>>>
>>>> Brent Wood-2 wrote:
>>>>    
>>>>        
>>>>> --- easpengren <phat-ass@thinkheavyindustries.com> wrote:
>>>>>
>>>>>      
>>>>>          
>>>>>> I'm still getting the hang of some of the finer points of creating
>>>>>> queries in
>>>>>> PostGIS, as is probably obvious with my last post.
>>>>>>
>>>>>> I've two tables, parcel2 that is a collection of parcels in a
>>>>>>        
>>>>>>            
>>>> county and
>>>>    
>>>>        
>>>>>> a
>>>>>> table election, which is a table of voting precincts in the same
>>>>>>        
>>>>>>            
>>>> county.
>>>>    
>>>>        
>>>>>> I'd
>>>>>> like to select all of the parcels in each precinct.
>>>>>>
>>>>>>        
>>>>>>            
>>>>> That query looks OK, but if parcels can be split across precincts,
>>>>>      
>>>>>          
>>>> then it
>>>>    
>>>>        
>>>>> won't necessarily give the correct answer.
>>>>>
>>>>>      
>>>>>          
>>>>>> I have this query:
>>>>>>
>>>>>> select precinct, sit_st_num, sit_st_dir, sit_st_nam, sit_st_typ,
>>>>>> city_code
>>>>>> from election, parcel2 where ST_contains(election.the_geom,
>>>>>> ST_pointonsurface(parcel2.wkb_geometry));
>>>>>>
>>>>>> This gets the job done, but it's very slow. What can I do to speed
>>>>>> this
>>>>>> up?
>>>>>>        
>>>>>>            
>>>>> See the PostGIS docs about creating spatial indices on the geometry
>>>>> columns in
>>>>> your two tables.
>>>>>
>>>>> http://postgis.refractions.net/docs/ch04.html#id2761842
>>>>> http://postgis.refractions.net/docs/ch04.html#id2761985
>>>>>
>>>>> If you have, or if you create them, then you need to modify your
>>>>>      
>>>>>          
>>>> query to
>>>>    
>>>>        
>>>>> use
>>>>> them:
>>>>>
>>>>> .... where election.the_geom && parcel2.wkb_geometry and
>>>>> ST_contains ...
>>>>>
>>>>> as described in:
>>>>> http://postgis.refractions.net/docs/ch04.html#id2762121
>>>>>
>>>>>
>>>>>
>>>>> Hope this helps...
>>>>>
>>>>>   Brent Wood
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users@postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>>
>>>>>      
>>>>>          
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/I%27ve-got-a-silly-question-about-performance-tp14910258p14913339.html
>>>>
>>>> Sent from the PostGIS - User mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users@postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users@postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>      
>>>>        
>>>  
>>>      
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>    
>
>
>  
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users