osm2pgsql tile expiry freaks me out

Hi,

I'm trying to work with the tile expiry list produced by osm2pgsql but something seems to be very wrong there. Last night's daily diff, for example, had 459 MB; about 2.3 million objects were changed. The corresponding osm2pgsql tile expiry list for zoom level 18 (done with -e 18-18) has 5.6 GB and claims to mark dirty a whopping 350 million objects. That would mean that one changed object, on average, touches more than 100 tiles on zoom level 18.

I'm running an update every 15 minutes on another server, and there I get the following, equally strange results:

date/time           tiles expired   # of changed objects
----------------------------------------------------
2009-11-04 14:30          7469596    20022
2009-11-04 14:45           792306    19125
2009-11-04 15:00        158804709    17155
2009-11-04 15:30          7092126    33276
2009-11-04 16:00           500746    47315
2009-11-04 16:15           226193    65182
2009-11-04 16:30         61437040    94737
2009-11-04 16:45         96680596    46008
2009-11-04 17:00          2195996   111912
2009-11-04 17:15          1459539    63360
2009-11-04 17:30           585327    51517
2009-11-04 17:45          4452958    27274
2009-11-04 18:00          1927180    60348
2009-11-04 18:15         90720429    23879

As you can see, the average number of edits in a 15-minute window is around 50k, but the number of expired tiles varies wildly beyond belief.

I did fix two bugs regarding tile expiry recently, one of which, I believe, could have caused larger than normal bounding boxes to be expired, but the daily numbers cited above stem from before the fix while the 15-minutely ones are from a fixed version, so that doesn't seem to make a difference. (All numbers above refer to a planet-wide database.)

I'm in the process of instrumenting osm2pgsql to find out what changes generate these huge tile lists but maybe someone has a hunch.

Bye
Frederik

--
Frederik Ramm  ##  eMail frederik@...  ##  N49°00'09" E008°23'33"

_______________________________________________
dev mailing list
dev@...
http://lists.openstreetmap.org/listinfo/dev
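The arithmetic behind "more than 100 tiles per object" and "varies wildly" can be checked with a few lines of Python. This is only a sketch over the numbers quoted in the message above; the variable and function names are made up for illustration.

```python
# (timestamp, tiles expired, changed objects), copied from the message above.
runs = [
    ("2009-11-04 14:30", 7469596, 20022),
    ("2009-11-04 14:45", 792306, 19125),
    ("2009-11-04 15:00", 158804709, 17155),
    ("2009-11-04 15:30", 7092126, 33276),
    ("2009-11-04 16:00", 500746, 47315),
    ("2009-11-04 16:15", 226193, 65182),
    ("2009-11-04 16:30", 61437040, 94737),
    ("2009-11-04 16:45", 96680596, 46008),
    ("2009-11-04 17:00", 2195996, 111912),
    ("2009-11-04 17:15", 1459539, 63360),
    ("2009-11-04 17:30", 585327, 51517),
    ("2009-11-04 17:45", 4452958, 27274),
    ("2009-11-04 18:00", 1927180, 60348),
    ("2009-11-04 18:15", 90720429, 23879),
]

# Expired tiles per changed object for each 15-minute run.
ratios = {ts: tiles / objects for ts, tiles, objects in runs}

# Average number of changed objects per run ("around 50k").
mean_objects = sum(objects for _, _, objects in runs) / len(runs)

# Runs with the highest and lowest tiles-per-object ratio.
worst_run = max(ratios, key=ratios.get)
best_run = min(ratios, key=ratios.get)

# Daily diff: roughly 350 million expired entries for 2.3 million objects.
daily_ratio = 350_000_000 / 2_300_000

if __name__ == "__main__":
    print(f"mean changed objects per run: {mean_objects:.0f}")
    print(f"daily tiles per object:       {daily_ratio:.0f}")
    for ts in (best_run, worst_run):
        print(f"{ts}: {ratios[ts]:.1f} tiles per object")
```

The per-run ratio spans more than three orders of magnitude, which is what one would expect if a handful of very large objects dominate individual runs.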
|
|
Re: osm2pgsql tile expiry freaks me out

On Wed, 2009-11-04 at 22:37 +0100, Frederik Ramm wrote:
> I'm trying to work with the tile expiry list produced by osm2pgsql
> but something seems to be very wrong there.

I never quite managed to get around to using the osm2pgsql based expiry code on the main tile server. I still use the ruby scripts written by Matt:

http://trac.openstreetmap.org/browser/applications/utils/export/tile_expiry

I have discussed the expiry scripts with Matt a couple of times and he found that the osm2pgsql based approach tended to hit the DB quite hard. He found that even though the osm2pgsql code should in theory produce more accurate results, the ruby scripts tended to work better overall.

Jon
|
|
Re: osm2pgsql tile expiry freaks me out

Hi,

Jon Burgess wrote:
> I never quite managed to get around to using the osm2pgsql based expiry
> code on the main tile server. I still use the ruby scripts written by
> Matt

I might have to resort to them as well.

> I have discussed the expiry scripts with Matt a couple of times and he
> found that the osm2pgsql based approach tended to hit the DB quite hard.

I haven't measured this thoroughly, but on the machine where I do updates every 15 minutes, the updates normally took around 150 seconds, and with expiry switched on it's more like 250 seconds or so.

> He found that even though the osm2pgsql code should in theory produce
> more accurate results, the ruby scripts tended to work better overall.

Matt's scripts don't do relations and this is probably the reason why they work well. My experiments have shown the following (for changes covering a three-hour interval):

...
  68440 | psql_out_relation (boundary) for 53134 expires 68440 tiles
  68440 | psql_out_relation (boundary) for 53136 expires 68440 tiles
  68440 | psql_out_relation for 53134 expires 68440 tiles
  68440 | psql_out_relation for 53136 expires 68440 tiles
  72248 | psql_out_relation for 45757 expires 72248 tiles
  81951 | psql_out_relation for 44882 expires 81951 tiles
  95256 | psql_out_relation for 276835 expires 95256 tiles
  95256 | psql_out_way (poly) for 23947173 expires 95256 tiles
  96010 | psql_out_relation for 7400 expires 96010 tiles
  96160 | psql_out_relation for 8648 expires 96160 tiles
 106239 | psql_out_relation for 44879 expires 106239 tiles
 132068 | psql_out_relation for 310887 expires 132068 tiles
 132068 | psql_out_relation for 310887 expires 132068 tiles
 161242 | psql_out_relation for 52411 expires 161242 tiles
 221445 | psql_out_relation for 62440 expires 221445 tiles
 228092 | psql_out_relation for 62417 expires 228092 tiles
 269451 | psql_out_relation (boundary) for 47667 expires 269451 tiles
 269451 | psql_out_relation (boundary) for 47667 expires 269451 tiles
 417795 | psql_out_relation (boundary) for 53134 expires 417795 tiles
 432478 | psql_out_way (poly) for 35421140 expires 432478 tiles
 830396 | psql_out_relation (boundary) for 47654 expires 830396 tiles
 830396 | psql_out_relation (boundary) for 47654 expires 830396 tiles
 881066 | psql_out_relation (boundary) for 44882 expires 881066 tiles
 998215 | psql_out_relation (boundary) for 45756 expires 998215 tiles
1305480 | psql_out_relation for 73347 expires 1305480 tiles
1680708 | psql_out_relation for 73340 expires 1680708 tiles
2019020 | psql_out_relation (boundary) for 7400 expires 2019020 tiles
2287500 | psql_out_relation (boundary) for 45757 expires 2287500 tiles
2510272 | psql_out_relation (boundary) for 8648 expires 2510272 tiles
4353265 | psql_out_relation (boundary) for 44879 expires 4353265 tiles
6872596 | psql_out_relation (boundary) for 52411 expires 6872596 tiles

So there are relations, especially boundary relations, where a little change to the relation expires a couple million level-18 tiles. (The largest way, #35421140, a riverbank, expires half a million.)

I suspect that at least the large results for relations are due to an inefficiency; probably the whole circumference of the relation is marked dirty if a little bit changes here or there, something which would not be necessary. (In theory, of course, a rendering rule depending on the polygon area could flick over and render a whole country pink instead of gray just because its area has changed minimally...)

Also, if the geometry of a way changes (and not its tags), then I could probably compare the new geometry to the old one and expire only where they differ - at least if expiring the whole length of the way means half a million tiles or so.

But as for tagging changes, we're quickly getting into terrain where expiry and render rules intermingle; if someone changes the "source" tag on a very large polygon way, do I really need to expire half a million tiles? But what if the same way's landuse tag is changed? It is probably a bug or an inefficiency that we have such a high number of expired tiles at the moment, but even with perfectly functioning software of course it would be possible that e.g. a large boundary gets a new admin_level or so and expiry of a very large number of tiles is actually required...

Bye
Frederik

--
Frederik Ramm  ##  eMail frederik@...  ##  N49°00'09" E008°23'33"
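The "source" versus "landuse" question raised above can be sketched as a simple tag filter. This is not something osm2pgsql does; the set of ignored keys below is purely hypothetical and would in practice have to be derived from the actual rendering rules.

```python
# Hypothetical list of keys that a stylesheet never looks at. A real list
# would have to be extracted from the rendering rules in use.
NON_RENDERED_KEYS = frozenset({"source", "created_by", "note"})


def changed_keys(old_tags, new_tags):
    """Return the keys whose value was added, removed or modified."""
    all_keys = set(old_tags) | set(new_tags)
    return {k for k in all_keys if old_tags.get(k) != new_tags.get(k)}


def tag_change_needs_expiry(old_tags, new_tags, ignored=NON_RENDERED_KEYS):
    """True if at least one changed key could influence rendering."""
    return bool(changed_keys(old_tags, new_tags) - set(ignored))


if __name__ == "__main__":
    before = {"landuse": "forest", "source": "survey"}
    # Only the "source" tag changes: no expiry under the assumption above.
    print(tag_change_needs_expiry(before, {"landuse": "forest", "source": "aerial"}))
    # The landuse value changes: the polygon has to be expired.
    print(tag_change_needs_expiry(before, {"landuse": "meadow", "source": "survey"}))
```

Such a filter only addresses pure tag edits; a geometry change still has to be handled separately.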
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, Nov 5, 2009 at 1:23 AM, Frederik Ramm <frederik@...> wrote:
> Jon Burgess wrote:
>> I have discussed the expiry scripts with Matt a couple of times and he
>> found that the osm2pgsql based approach tended to hit the DB quite hard.
>
> I haven't measured this thoroughly, but on the machine where I do
> updates every 15 minutes, the updates normally took around 150 seconds,
> and with expiry switched on it's more like 250 seconds or so.

when i tried using the osm2pgsql expiry on the minutely no-names tile server it slowly went out of sync. after about a week of running it was 2 days behind. admittedly, that EC2 server it's running on doesn't exactly have stellar disk performance.

>> He found that even though the osm2pgsql code should in theory produce
>> more accurate results, the ruby scripts tended to work better overall.
>
> Matt's scripts don't do relations and this is probably the reason why
> they work well. My experiments have shown the following (for changes
> covering a three-hour interval):

yeah, the assumption is that any change to a relation is going to (probably) involve changes to one or more nodes and ways, so the relation (probably) doesn't need expiring in total. of course, there are cases where it is necessary, but the scripts are only intended as a quick-and-dirty approximation to proper expiry.

> So there are relations, especially boundary relations, where a little
> change to the relation expires a couple million level-18 tiles. (The
> largest way, #35421140, a riverbank, expires half a million.)

is it expiring the bbox of that way, or just the tiles touching the boundary?

> I suspect that at least the large results for relations are due to an
> inefficiency; probably the whole circumference of the relation is marked
> dirty if a little bit changes here or there, something which would not
> be necessary. (In theory, of course, a rendering rule depending on the
> polygon area could flick over and render a whole country pink instead of
> gray just because its area has changed minimally...)
>
> Also, if the geometry of a way changes (and not its tags), then I could
> probably compare the new geometry to the old one and expire only where
> they differ - at least if expiring the whole length of the way means
> half a million tiles or so.

indeed. and there are operations, such as reversing the way, which might not change the rendering at all. i've been working on something that uses the method you describe to track "real" changes by using the diff/patch algorithm to find insertions, deletions and changes to the way_nodes and relation_members. it's then much easier to expire the real changes - assuming, of course, that your area-based colour change rules are absent ;-)

> But as for tagging changes, we're quickly getting into terrain where
> expiry and render rules intermingle; if someone changes the "source" tag
> on a very large polygon way, do I really need to expire half a million
> tiles? But what if the same way's landuse tag is changed? It is probably
> a bug or an inefficiency that we have such a high number of expired
> tiles at the moment, but even with perfectly functioning software of
> course it would be possible that e.g. a large boundary gets a new
> admin_level or so and expiry of a very large number of tiles is actually
> required...

indeed. it's a complex problem. there's a quick-and-dirty solution, but to do it properly, efficiently and accurately is very hard. i think jon was saying in the pub last night that diff updates and expiry already take up more resources than rendering tiles on yevaud. and that's with the quick-and-dirty solution.

cheers,

matt
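The diff-based idea mentioned above could look roughly like the following. This is not Matt's code; it is a minimal sketch that uses Python's standard difflib as a stand-in for "the diff/patch algorithm", and the function name and the `ignore_reversal` flag are invented for illustration.

```python
from difflib import SequenceMatcher


def changed_way_nodes(old_nodes, new_nodes, ignore_reversal=True):
    """Return the node ids around which a way's geometry really changed.

    old_nodes / new_nodes are the ordered node id lists of the two versions
    of a way. Unchanged stretches contribute nothing; for every insertion,
    deletion or replacement the affected nodes plus their direct neighbours
    are returned, because the segments next to a change are redrawn too.
    """
    old_nodes, new_nodes = list(old_nodes), list(new_nodes)
    if ignore_reversal and new_nodes == old_nodes[::-1]:
        # A pure reversal; only safe to skip if direction is not rendered.
        return set()

    changed = set()
    matcher = SequenceMatcher(a=old_nodes, b=new_nodes, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        changed.update(old_nodes[max(i1 - 1, 0):i2 + 1])
        changed.update(new_nodes[max(j1 - 1, 0):j2 + 1])
    return changed


if __name__ == "__main__":
    # Node 3 is replaced by node 9: only nodes 2, 3, 4 and 9 are affected.
    print(sorted(changed_way_nodes([1, 2, 3, 4, 5], [1, 2, 9, 4, 5])))
```

Only the tiles under the returned nodes (and the segments between them) would then need expiring, instead of the whole length of the way. Note that this only sees changes in the node list; a node that is moved without changing its id has to be caught separately.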
|
|
Re: osm2pgsql tile expiry freaks me out

On Wednesday 4 November 2009, Frederik Ramm wrote:
> Hi,
>
> I'm trying to work with the tile expiry list produced by osm2pgsql
> but something seems to be very wrong there. Last night's daily diff, for
> example, had 459 MB; about 2.3 million objects were changed. The
> corresponding osm2pgsql tile expiry list for zoom level 18 (done with -e
> 18-18) has 5.6 GB and claims to mark dirty a whopping 350 million
> objects. That would mean that one changed object, on average, touches
> more than 100 tiles on zoom level 18.

I'm sorry, I don't have any clue about what your problem is, but I can confirm that those figures are very high and strange.

My version is rather old: osm2pgsql SVN version 0.66-15819M. Maybe that could help spot the problem.

The figures I can provide are these:
- I'm only working on a Europe bbox (--bbox -27,31,50,72 -e 18)
- Using hourly diffs
- Every 5 hours I compute the full list of tiles to expire

My "expire_list" after 5 hours is around 50 MB on average and contains around 2M tiles to set dirty at zoom 18.

haven't you duplicates in your tiles_list ?

<blabla my life>
This may occasionally vary a lot when some bulk modifications are done to large relations (such as France admin_level 8 boundaries ;-) ) but it stays manageable at cruise speed.

On the whole, I'm not unhappy with the result, and have only run into the "large area" problem. The last big French import of Corine Land Cover data (where you would need to expire tiles inside big forests) left my tiles in a strange, funny state where I only had the perimeter of large forests/farms/...
But that is the price to pay to avoid expiring every tile when one changes the name:XY of the boundary of a big country.

A more perfect but harder approach would probably be (as said in this thread) to have a clue about what is an area and what is a border.

--
sly
Sylvain Letuffe liste@...
who am I: http://slyserv.dyndns.org
|
|
Re: osm2pgsql tile expiry freaks me out

On Wed, 4 Nov 2009, Frederik Ramm wrote:
> The
> corresponding osm2pgsql tile expiry list for zoom level 18 (done with -e
> 18-18) has 5.6 GB and claims to mark dirty a whopping 350 million
> objects.

Ouch. That sounds very high. However, I don't think I've ever tried running with "-e 18-18", I use "-e 0-17" on OpenPisteMap.

> As you can see, the average number of edits in a 15-minute window is
> around 50k, but the number of expired tiles varies wildly beyond belief.

I'm not sure you can draw any conclusions from the number of expiries varying wildly. For example - an edit that changes a single node is only going to affect a few tiles, whereas an edit that changes a relation made up of many objects has the potential to affect very large numbers of tiles.

> I'm in the process of instrumenting osm2pgsql to find out what changes
> generate these huge tile lists but maybe someone has a hunch.

I'm afraid I can't think of anything off the top of my head, the expiry stuff was running on OpenPisteMap quite successfully over several months after I implemented it. Unfortunately, I'm not currently doing updates to OpenPisteMap (pending some replacement hard disks and Postgres tuning) so I can't tell you if anything has broken recently. I'd certainly be interested in anything you find out though.

--
 - Steve
   xmpp:steve@... sip:steve@... http://www.nexusuk.org/

     Servatis a periculum, servatis a maleficum - Whisper, Evanescence
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, 5 Nov 2009, Jon Burgess wrote:
> I have discussed the expiry scripts with Matt a couple of times and he
> found that the osm2pgsql based approach tended to hit the DB quite hard.

Unfortunately there isn't really a way around hitting the database - ISTR that where possible, the expiry code tries to use data it already has in memory, but for some data (such as the deletion of an old object) there just isn't anything you can do other than going to the database to find out where that object was.

> He found that even though the osm2pgsql code should in theory produce
> more accurate results, the ruby scripts tended to work better overall.

If the Ruby scripts produce more accurate results, we should certainly look at why and try to improve the osm2pgsql code. The ruby scripts probably don't expire as many tiles - I think they only expire metatiles that have a node associated with a changed object on them, which means that it won't catch cases where a way crosses the corner of a metatile (with no nodes on the metatile) and where a filled polygon changes (resulting in a need to re-render all the tiles within it).

--
 - Steve
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, Nov 5, 2009 at 4:22 PM, Steve Hill <steve@...> wrote:
> On Thu, 5 Nov 2009, Jon Burgess wrote:
>> He found that even though the osm2pgsql code should in theory produce
>> more accurate results, the ruby scripts tended to work better overall.
>
> If the Ruby scripts produce more accurate results, we should certainly
> look at why and try to improve the osm2pgsql code.

they don't. i think when jon said they work better overall, he meant that they're usually "good enough" and are quicker to run, not that they're more accurate.

> The ruby scripts
> probably don't expire as many tiles - I think they only expire metatiles
> that have a node associated with a changed object on them, which means
> that it won't catch cases where a way crosses the corner of a metatile
> (with no nodes on the metatile) and where a filled polygon changes
> (resulting in a need to re-render all the tiles within it).

indeed. they're quick and dirty. the algorithm is just: expire all metatiles which contain a node which changed, or a node which is part of a way which changed. where "changed" means "was in the osc file".

cheers,

matt
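The quick-and-dirty rule described above fits in a few lines. The following is a Python sketch, not the Ruby scripts themselves: it uses the usual slippy-map tile formula, assumes a metatile edge of 8 tiles, and takes plain dictionaries as input where the real scripts read the .osc file and look node positions up in the database. All function and parameter names are made up for illustration.

```python
import math

METATILE = 8  # assumed metatile edge length, in tiles


def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map tile containing a WGS84 coordinate."""
    n = 2 ** zoom
    lat = max(min(lat, 85.0511), -85.0511)  # clamp to the Mercator range
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(x, n - 1), min(y, n - 1)


def metatile_of(lon, lat, zoom):
    """Top-left tile of the metatile containing the coordinate."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return x - x % METATILE, y - y % METATILE


def expire_quick_and_dirty(changed_nodes, changed_ways, node_coords, zoom):
    """Metatiles containing a changed node or a node of a changed way.

    changed_nodes: ids of nodes that appear in the change file
    changed_ways:  {way id: [node ids]} for ways in the change file
    node_coords:   {node id: (lon, lat)}
    """
    node_ids = set(changed_nodes)
    for refs in changed_ways.values():
        node_ids.update(refs)
    return {metatile_of(*node_coords[n], zoom)
            for n in node_ids if n in node_coords}
```

Because only node positions are looked at, a way that cuts across the corner of a metatile without having a node on it is missed, and so is the interior of a filled polygon, as noted in the quoted message.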
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, 5 Nov 2009, Frederik Ramm wrote:
> I suspect that at least the large results for relations are due to an
> inefficiency; probably the whole circumference of the relation is marked
> dirty if a little bit changes here or there, something which would not
> be necessary. (In theory, of course, a rendering rule depending on the
> polygon area could flick over and render a whole country pink instead of
> gray just because its area has changed minimally...)

Yes, (unless someone has changed it since I wrote it) the expiry code could be a lot smarter. If an object changes, it expires all the nodes that the old version of the object covered and all the nodes that the new version covers - in the case of geometry changes, comparing the geometries of the two versions of the object and figuring out what tiles actually have changes on them would significantly reduce the expiry count.

In the case of polygons, it tries to be slightly smart: for polygons under a certain size it expires everything within the polygon (since you might need to rerender everything within the whole polygon), but for large polygons it switches to treating it like a normal way and just expiring the perimeter. This is a tradeoff - on the one hand, expiring just the perimeter means that you might end up with out of date tiles within the polygon that don't get marked as expired, but on the other hand, expiring an entire continent could be a really really Bad Thing.

> But as for tagging changes, we're quickly getting into terrain where
> expiry and render rules intermingle; if someone changes the "source" tag
> on a very large polygon way, do I really need to expire half a million
> tiles? But what if the same way's landuse tag is changed?

Yes, this is definitely a problem - the current expiry system expires far too much stuff and really needs some knowledge about the rendering rules. As well as the examples you cite, there's also the fact that osm2pgsql doesn't know if a polygon will be rendered as a filled object or as an outline - just knowing whether you need to expire the whole polygon or just the outline could make a big difference.

To integrate osm2pgsql with the rendering rules, we either need another rendering rule definition file, or we need to be able to parse the mapnik rules and automatically water them down to some simple data about how we are treating each tag/value pair.

--
 - Steve
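The size-based tradeoff for polygons described above can be illustrated as follows. This is a simplified Python sketch and not the osm2pgsql implementation: coordinates are given directly in (fractional, non-negative) tile units, the interior is approximated by the bounding box, the outline is traced by sampling, and the threshold parameter is a made-up name.

```python
def tiles_along_segment(p, q):
    """Tiles under a segment, found by sampling at sub-tile steps.

    A sampling approximation: tiles that the segment only clips at a corner
    can be missed, which a proper line-rasterising routine would avoid.
    """
    (x0, y0), (x1, y1) = p, q
    steps = int(2 * max(abs(x1 - x0), abs(y1 - y0))) + 1
    return {(int(x0 + (x1 - x0) * i / steps), int(y0 + (y1 - y0) * i / steps))
            for i in range(steps + 1)}


def expire_polygon(ring, max_bbox_tiles):
    """Expire the whole area of a small polygon, only the outline of a big one.

    ring: list of (x, y) vertices in tile units, not repeating the first one.
    """
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    min_x, max_x = int(min(xs)), int(max(xs))
    min_y, max_y = int(min(ys)), int(max(ys))

    if max_x - min_x > max_bbox_tiles or max_y - min_y > max_bbox_tiles:
        # Large polygon: treat it like a linear way and expire the perimeter.
        tiles = set()
        for i, vertex in enumerate(ring):
            tiles |= tiles_along_segment(vertex, ring[(i + 1) % len(ring)])
        return tiles

    # Small polygon: expire everything inside its bounding box.
    return {(x, y)
            for x in range(min_x, max_x + 1)
            for y in range(min_y, max_y + 1)}
```

For a square with a side of 10 tiles and a threshold of 4, this expires the 40 tiles along the outline rather than all 121 tiles of the bounding box, at the price of leaving the interior tiles stale.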
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, 5 Nov 2009, Matt Amos wrote:
> when i tried using the osm2pgsql expiry on the minutely no-names tile
> server it slowly went out of sync. after about a week of running it
> was 2 days behind. admittedly, that EC2 server it's running on doesn't
> exactly have stellar disk performance.

To be honest, I found that just running the osm2pgsql updates really hammered the database, but you're right that the expiry stuff adds to that load. I intend to spend some time fiddling with the postgres settings to see if I can tune the performance a bit.

OpenPisteMap used to manage to just about keep up (used to lag behind during the day and then catch up at night), but the I/O rather crippled the machine. After a hard disk failed in another of my servers I had to move a lot of services onto the database server, which left me having to shut down the updates because it couldn't cope with the extra load. I just got the failed server running again with 3 brand new hard disks and imported a new planet file onto that so I could have a go at tuning the postgres settings, and then 2 of the new disks failed, so I'm waiting on them being replaced at the moment... Not having much luck with this machine. :)

> yeah, the assumption is that any change to a relation is going to
> (probably) involve changes to one or more nodes and ways, so the
> relation (probably) doesn't need expiring in total.

I think that assumption is very faulty - there are plenty of tag-changes you can make to relations that wouldn't involve updating the component ways and nodes.

> indeed. and there are operations, such as reversing the way, which
> might not change the rendering at all.

The key word here is "might" :)
Unfortunately, without knowing the rendering rules, you don't know if reversing the way would change it or not. In the case of one-way streets, it most definitely would change the rendering. The chances are that if someone is reversing the way, they are probably doing it for a rendering-related reason - i.e. they are reversing it because it has rendered wrong. Of course, you also reverse ways when you merge them, but handling merges efficiently is a whole other can of worms. :)

> indeed. it's a complex problem. there's a quick-and-dirty solution,
> but to do it properly, efficiently and accurately is very hard. i
> think jon was saying in the pub last night that diff updates and
> expiry already take up more resources than rendering tiles on yevaud.
> and that's with the quick-and-dirty solution.

The diff updates really are the killer for me - they are way more I/O intensive than a non-slim-mode planet import. I'm using reasonably meaty machines, but the last slim-mode planet import I did took days...

--
 - Steve
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, 5 Nov 2009, sly (sylvain letuffe) wrote:
> haven't you duplicates in your tiles_list ?

If there are duplicates then that indicates a very fundamental problem somewhere. The way the expired tiles are stored in memory prior to being dumped into the text file makes it impossible to get duplicates.

--
 - Steve
|
|
Re: osm2pgsql tile expiry freaks me out

On Thursday 5 November 2009, Steve Hill wrote:
> On Thu, 5 Nov 2009, sly (sylvain letuffe) wrote:
>
> > haven't you duplicates in your tiles_list ?
>
> If there are duplicates then that indicates a very fundamental problem
> somewhere. The way the expired tiles are stored in memory prior to being
> dumped into the text file makes it impossible to get duplicates.

My sentence was badly formed, I meant: "Do you have duplicates in your tiles_list ?"

In case he is doing what I do, that is: running a cron job over and over again on the same file, where duplicates might accumulate. But there is little chance he would reach such a number that way, so that was probably a bad guess.

--
sly
Sylvain Letuffe
|
|
Re: osm2pgsql tile expiry freaks me out

On Thursday 5 November 2009, Steve Hill wrote:
> The diff updates really are the killer for me - they are way more I/O
> intensive than a non-slim-mode planet import. I'm using reasonably meaty
> machines, but the last slim-mode planet import I did took days...

Same thing here. An hourly diff update (Europe only) takes 15 minutes on average with only 1 MB/s of disk I/O, so I strongly suspect access time is the key.

I suspect having main memory as large as the DB would help, and since that's going to be rather expensive, the alternatives are to switch to SSD disks or to use a RAID0 array (which is what I'm doing, with a straight x2 speed-up).

But I'll be happy to hear feedback on an SSD setup if someone has one.

--
sly
Sylvain Letuffe
|
|
Re: osm2pgsql tile expiry freaks me out

Hi,

Steve Hill wrote:
> If there are duplicates then that indicates a very fundamental problem
> somewhere. The way the expired tiles are stored in memory prior to being
> dumped into the text file makes it impossible to get duplicates.

My expire files don't contain duplicates, I checked that before reporting the large numbers here.

Bye
Frederik
|
|
Re: osm2pgsql tile expiry freaks me out

On Thu, 5 Nov 2009, sly (sylvain letuffe) wrote:
> In case he is doing like me, that is : running a cron job over and over again
> the same file where duplicates might accumulate. But there is a few chance he
> reaches such a number, so that was probably a bad guess.

Ah, ok. That makes sense. The way I handle expiry means that I get a fresh expiry list each time rather than appending to an existing one - as soon as each osm2pgsql run completes I discard any entries from the expiry list that don't have associated rendered tiles actually on the disk(*).

* it's actually a bit more complicated than this since you have to handle all the zoom levels. See:
https://subversion.nexusuk.org/trac/browser/openpistemap/trunk/scripts/expire_tiles.py
to see how I do it.

--
 - Steve
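The post-processing step described above (discard entries without a rendered tile, and take the other zoom levels into account) might be sketched like this. It is not the linked expire_tiles.py: the "zoom/x/y" line format and the root/z/x/y.png disk layout are assumptions, and all names are invented for illustration.

```python
import os


def parse_expiry_line(line):
    """Parse one expiry list entry, assumed to look like 'zoom/x/y'."""
    z, x, y = (int(part) for part in line.strip().split("/"))
    return z, x, y


def with_ancestors(tile, min_zoom):
    """Yield the tile and the tiles covering it at every lower zoom level."""
    z, x, y = tile
    while z >= min_zoom:
        yield z, x, y
        z, x, y = z - 1, x >> 1, y >> 1


def tiles_to_rerender(lines, min_zoom, is_rendered):
    """Dirty tiles at all zoom levels, keeping only those already rendered."""
    dirty = set()
    for line in lines:
        if line.strip():
            dirty.update(with_ancestors(parse_expiry_line(line), min_zoom))
    return {tile for tile in dirty if is_rendered(tile)}


def rendered_on_disk(root):
    """Build an is_rendered check for a hypothetical root/z/x/y.png layout."""
    def check(tile):
        z, x, y = tile
        return os.path.exists(os.path.join(root, str(z), str(x), f"{y}.png"))
    return check
```

Collecting the tiles in a set also removes duplicates that arise when several zoom-18 tiles share the same parent. A call such as `tiles_to_rerender(open("expire_list"), 0, rendered_on_disk("/var/tiles"))` would then yield the tiles worth re-rendering; both paths are placeholders.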
|
|
Re: osm2pgsql tile expiry freaks me out

Hi,

Frederik Ramm wrote:
> So there are relations, especially boundary relations, where a little
> change to the relation expires a couple million level-18 tiles. (The
> largest way, #35421140, a riverbank, expires half a million.)

While this doesn't completely invalidate my findings, I noticed that I arrived at very large numbers because my PostGIS database is in lat/lon and not spherical Mercator. I had adapted all the tile expiry code to work with this, but forgot to fix the EXPIRE_TILES_MAX_BBOX parameter, which means that when I ran it, osm2pgsql always expired whole polygon areas and never only the perimeter.

The fixed version *still* expires lots of tiles for large relations, but not quite so many.

Bye
Frederik
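The unit mismatch described above is easy to reproduce in isolation. The sketch below is not the osm2pgsql code; the threshold value of 20000 and the sample bounding boxes are assumptions chosen for illustration, and the degree conversion uses a rough equatorial figure.

```python
# Illustrative threshold, meant to be read in metres (spherical Mercator).
# The actual value of EXPIRE_TILES_MAX_BBOX is an assumption here.
MAX_BBOX_METRES = 20000.0

METRES_PER_DEGREE = 111320.0  # rough value at the equator


def expires_whole_area(min_x, min_y, max_x, max_y, max_bbox):
    """True if the polygon is 'small' and its whole area gets expired."""
    return (max_x - min_x) <= max_bbox and (max_y - min_y) <= max_bbox


# A country-sized bounding box, about 600 km x 800 km, in Mercator metres.
bbox_metres = (900_000.0, 6_000_000.0, 1_500_000.0, 6_800_000.0)
# Roughly the same extent expressed in degrees of longitude and latitude.
bbox_degrees = (8.0, 47.0, 14.0, 55.0)

# In metres the box exceeds the threshold: only the perimeter is expired.
in_metres = expires_whole_area(*bbox_metres, MAX_BBOX_METRES)

# In degrees no box can be wider than 360, so the unchanged threshold is
# never exceeded and every polygon is expired as a whole area.
in_degrees = expires_whole_area(*bbox_degrees, MAX_BBOX_METRES)

# Rescaling the threshold to degrees restores the intended behaviour.
max_bbox_degrees = MAX_BBOX_METRES / METRES_PER_DEGREE
in_degrees_fixed = expires_whole_area(*bbox_degrees, max_bbox_degrees)
```

With any metre-scale threshold above 360, a lat/lon database behaves as if every polygon were small, which matches the observation that whole areas were always expired.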