dadadi blog



Sunday, April 10 2011

More playing with SRTM data: altiphp

After importing SRTM data into PostGIS, it was time to query and use that data.

route-altitude-profile can do that, but this time I needed a PHP tool, so I wrote the altiphp library.

When given a longitude and a latitude, altiphp queries the database and returns the altitude. More precisely, it queries the altitude of the nearest available points and performs a bilinear interpolation (I took the idea from route-altitude-profile). When given an array of points, altiphp returns an array of altitudes.
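The interpolation idea is simple enough to sketch. Here is a rough Python version (not altiphp's PHP code; the variable names are mine), assuming the four grid points surrounding the query position have already been fetched from the database:

```python
# Rough sketch of the bilinear interpolation idea, in Python rather than PHP.
# z00, z10, z01, z11 are the altitudes of the four SRTM grid points surrounding
# (lon, lat); names and structure are illustrative, not altiphp's API.
def bilinear(lon, lat, lon0, lat0, lon1, lat1, z00, z10, z01, z11):
    """Interpolate the altitude at (lon, lat) from four surrounding grid points."""
    tx = (lon - lon0) / (lon1 - lon0)  # horizontal position inside the cell, 0..1
    ty = (lat - lat0) / (lat1 - lat0)  # vertical position inside the cell, 0..1
    # interpolate along longitude on both grid rows, then between the two rows
    bottom = z00 * (1 - tx) + z10 * tx
    top = z01 * (1 - tx) + z11 * tx
    return bottom * (1 - ty) + top * ty
```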

altiphp can also add points to produce a more accurate route profile. For example, if you have two points at the tops of two mountains, altiphp knows how to add intermediate points between them, each about 90 m from the next (90 m is the SRTM resolution). You then get an accurate route profile going down into the valley and climbing back up the other mountain. This behavior is of course optional; a sketch of the idea follows.
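Densifying a segment just means inserting evenly spaced intermediate points so that consecutive points are no more than about 90 m apart. Again as a rough Python sketch of the idea, not altiphp's actual code:

```python
import math

# Illustrative sketch of segment densification (not altiphp's actual code):
# insert evenly spaced points so consecutive points are at most ~90 m apart.
METERS_PER_DEGREE = 111320.0  # rough value, good enough for a sketch

def densify(p1, p2, step_m=90.0):
    """Return p1, intermediate points, and p2, spaced about step_m apart."""
    (lon1, lat1), (lon2, lat2) = p1, p2
    dist_m = math.hypot((lon2 - lon1) * METERS_PER_DEGREE * math.cos(math.radians(lat1)),
                        (lat2 - lat1) * METERS_PER_DEGREE)
    n = max(1, int(math.ceil(dist_m / step_m)))
    return [(lon1 + (lon2 - lon1) * i / n, lat1 + (lat2 - lat1) * i / n)
            for i in range(n + 1)]
```

Querying the altitude of each point in the densified list then gives the full profile.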

Altiphp also integrates with gisconverter.php. When gisconverter is available, altiphp can handle geometries directly.

If you have not imported SRTM data into a PostGIS database, you can use the non-PostGIS mode. In this mode, altiphp downloads the data files from the USGS server, optionally stores them for future use, and parses them to extract the data. This works fine and gives the same results as the PostGIS mode, but parsing the files is really slow, and each file, once parsed, takes about 300 MB of memory. So this method is not a good idea for a website, even a small one, but it can be helpful in some cases, for example for a seldom-used local application or for testing.
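For reference, the raw .hgt files have a very simple format: a square grid of big-endian signed 16-bit integers, 1201 × 1201 for the 3-arc-second SRTM tiles, stored row by row from north to south. A minimal Python sketch of reading one (unrelated to altiphp's own PHP parser):

```python
import numpy as np

# Minimal sketch of reading a 3-arc-second SRTM .hgt tile (not altiphp's parser).
# The file is a 1201x1201 grid of big-endian signed 16-bit integers, stored
# row by row from north to south; -32768 marks missing data.
def read_hgt(path):
    return np.fromfile(path, dtype=">i2").reshape((1201, 1201))

tile = read_hgt("N44E003.hgt")   # the file name gives the south-west corner
print(tile[0, 0])                # altitude of the north-west corner point
```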

altiphp needs PHP 5.3, is licensed under a modified BSD license, and is available on GitHub.

Sunday, April 3 2011

srtm2postgis improved

In 2000, the NASA Space Shuttle Endeavour flew an 11-day mission named SRTM. Its goal was to acquire topographic information about the Earth's surface, from 56°S to 60°N. The data has been released into the public domain, and it is one of the main sources of free global topographic data.

I wanted to play with this data, and the first part of the game was getting it into a PostGIS database. Fortunately, this had already been done as part of Route altitude profiles SRTM, a 2008 Google Summer of Code project by Sjors Provoost. srtm2postgis is a small utility that downloads SRTM data and loads it into PostGIS.

The download URLs for SRTM data have changed, so the original srtm2postgis does not work as-is. Fortunately, a fork with updated URLs is available on GitHub.

So I started tinkering with srtm2postgis and fixed a few minor bugs. For example, SRTM data is split into 14546 files; 14540 of them have all-uppercase names, and 6 have lowercase letters in their names. Those 6 files could not be downloaded. I also fixed some typos in the README and a few failing tests, but overall the script was running smoothly.

Then I spent some time trying to improve the import time. During his Summer of Code, Sjors Provoost reported importing data at about one tile per minute. A 75-tile import (44°N to 50°N and 3°E to 7°W) took me 80 minutes, which is about the same as Sjors Provoost's result.

To connect to Postgres, srtm2postgis uses pygresql, but it also uses psycopg because of its copy_from method. However, copy_from was ultimately not used: a psql subprocess was launched to run the COPY instead. A comment explained that psycopg's copy_from method was hanging. I tried it and it did not hang; as the comment was nearly 3 years old, I assume the bug has since been fixed. I'm also glad this was clearly documented, because otherwise the workaround would have been impossible to understand. With this modification, my 75-tile import took only 49 minutes. I don't really understand why this improves performance so much, but I'm happy about it.
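For reference, here is a minimal sketch of what bulk loading with copy_from looks like (using psycopg2; the table and column names are illustrative, not necessarily the ones srtm2postgis uses):

```python
import io
import psycopg2

# Minimal sketch of bulk loading with psycopg2's copy_from; the "altitude"
# table and its (x, y, alt) columns are illustrative, not srtm2postgis's schema.
conn = psycopg2.connect("dbname=srtm")
cur = conn.cursor()

rows = io.StringIO("".join("%d\t%d\t%d\n" % (x, y, alt)
                           for (x, y, alt) in [(0, 0, 120), (0, 1, 118)]))
cur.copy_from(rows, "altitude", columns=("x", "y", "alt"))

conn.commit()
```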

Then I removed pygresql entirely and used only psycopg. This allowed me, first, to open only one connection, and second, to run everything in a single transaction and commit at the very end. The import now took 42 minutes.
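The overall structure is roughly the following (again a sketch; load_tile is a hypothetical stand-in for srtm2postgis's per-tile import code):

```python
import psycopg2

# Sketch of the single-connection, single-transaction structure.
# load_tile is a hypothetical stand-in for srtm2postgis's per-tile import code.
def load_tile(cur, tile_name):
    # ...read the tile and copy_from its rows using the shared cursor...
    pass

tiles = ["N44E003", "N44E004"]       # whatever tiles you want to import

conn = psycopg2.connect("dbname=srtm")
cur = conn.cursor()
for tile_name in tiles:              # one connection, one transaction for all tiles
    load_tile(cur, tile_name)
conn.commit()                        # commit everything once, at the very end
conn.close()
```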

After improving the database-related performance, I tried to speed things up in other parts of the code. SRTM data is read with the GDAL library, which returns a 1200 * 1200 array of numpy.int16 values. Converting an int16 to a string is really slow, and on a 1200 * 1200 grid many points have the same altitude, so I use a cache to store the string representations of numpy.int16 altitudes. This resulted in a major performance improvement: my 75-tile import is now down to 28 minutes.
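The cache itself is nothing more than a dictionary keyed by altitude value; a sketch of the idea (the real code lives in srtm2postgis):

```python
# Sketch of the string cache idea: converting numpy.int16 to str is slow,
# but many grid points share the same altitude, so memoize the conversion.
str_cache = {}

def altitude_to_str(alt):
    s = str_cache.get(alt)
    if s is None:
        s = str(int(alt))        # convert once per distinct altitude value
        str_cache[alt] = s
    return s
```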

I also used a string buffer instead of a temporary file to store the data to copy, but this did not improve performance significantly. I also tried to drop the table's primary key constraint and recreate it after the import, thinking it would speed up the COPY, but it actually hurt performance a lot.

Finally, I run VACUUM ANALYZE after the import. It takes 10 minutes, but it will probably improve future query performance.
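One detail worth noting if you issue it from psycopg: VACUUM cannot run inside a transaction block, so the connection has to be switched to autocommit first. Roughly like this (a sketch; the "altitude" table name is illustrative, and srtm2postgis may handle this differently):

```python
import psycopg2
import psycopg2.extensions

# VACUUM cannot run inside a transaction block, so switch to autocommit first.
conn = psycopg2.connect("dbname=srtm")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("VACUUM ANALYZE altitude")   # "altitude" table name is illustrative
conn.close()
```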

In the end, I improved the import time from 80 to 28 minutes (38 minutes with VACUUM ANALYZE), which is nearly 3 times faster! My updated srtm2postgis version is available on GitHub.