1. Introduction
This documentation is about geospatial data. We focus on how to obtain geospatial data and how to manipulate it using, primarily, the GDAL command line tools.
1.1. Two basic kinds of geospatial data
Geospatial data comes in two primary types: raster data and vector data.
You can read in greater detail about different geospatial data formats on wikipedia, but the basic data structures are:
-
Raster data is data that is distributed on a grid.
-
A raster is just a grid of data, where each cell in the grid has some value (or values).
-
The cells are sometimes also called pixels. With image data, each pixel in the raster might have several values, such as the value of red, green and blue hues.
-
Image data thus has bands: each band is the information pertaining to the different colors.
-
-
Vector data is data that is a collection of points or polygons distributed in space.
-
"Vector" in the geospatial context doesn’t mean the same thing as in a mathematial context: vector geospatial data includes any data that has vertices that can be anywhere in space.
-
This contrasts with raster data where the data points are fixed in space, usually on a rectangular grid.
-
In most GIS systems, the handling of these two data types are different and you load them with different buttons on your GIS. Below are the two buttons in QGIS 2.14. Usually the raster button looks a bit like a grid and the vector button looks a bit like some shapes with distinct points.
These different kinds of data have different file formats. There is are two linked software libraries for processing these data, called GDAL and OGR: they are commonly used in geospatial software so you should be able to convert between the data types that are read by these two packages. Click on the links for information:
2. Topographic data
Much of what we do using LSDTopoTools is process topographic data.
Topographic data comes in a number of formats, but at a basic level most topographic data is in the form of a raster.
Topographic data is almost always single band: each pixel or cell only has one data value: the elevation. Derivative topographic data, such a slope or aspect, also tends to be in single band rasters.
It is possible to get topographic data that is not in raster format (that is, the data is not based on a grid). Occasionally you find topographic data built on unstructured grids, or point clouds, where each elevation data point has a location in space associated with it. This data format takes up more space than raster data, since on a aster you only need to supply the elevation data: the horizontal positions are determined by where the data sits in the grid. Frequently LiDAR data (LiDAR stands for Light Detection and Ranging, and is a method for obtaining very high resolution topographic data) is delivered as a point cloud and you need software to convert the point cloud to a raster.
For most of this book, we will assume that your data is in raster format.
3. Data sources
Before you can start analysing topography and working with topographic data, you will need to get data and then get it into the correct format. This page explains how to do so.
3.1. What data does LSDTopoTools take?
Although LSDTopoTools works with three data formats, the only one that retains georeferenceing information (where you are in space) is the ENVI bil format, so we highly recommend you convert all data to ENVI bil. |
Do not mix ESRI bil format with ENVI bil format! They are not the same! You need to use ENVI bil. |
The LSDTopoTools works predominantly with raster data; if you don’t know what that is you can read about it here: http://en.wikipedia.org/wiki/Raster_data. In most cases, the raster data you will start with is a digital elevation model (DEM). Digital elevation models (and rasters in general) come in all sorts of formats. LSDTopoTools works with three formats:
Data type | file extension | Description |
---|---|---|
|
This format is in plain text and can be read by a text editor. The advantage of this format is that you can easily look at the data, but the disadvantage is that the file size is extremely large (compared to the other formats, .flt). |
|
Float |
|
This is a binary file format meaning that you can’t use a text editor to look at the data.
The file size is greatly reduced compared to |
|
This is the recommended format, because it works best with GDAL (see the section GDAL), and because it retains georeferencing information. |
Below you will find instructions on how to get data into the correct format: data is delivered in a wide array of formats (e.g., ESRI bil, DEM, GeoTiff) and you must convert this data before it can be used by LSDTopoTools.
3.2. Downloading data
If you want to analyse topography, you should get some topographic data! The last decade has seen incredible gains in the availability and resolution of topographic data. Today, you can get topographic data from a number of sources. The best way to find this data is through search engines, but below are some common sources:
Source | Data type | Description and link |
---|---|---|
lidar |
This is a list of freely available lidar data maintained by Jošt Hobič. The most complete list on the web. https://arheologijaslovenija.blogspot.co.uk/p/blog-page_81.html |
|
lidar |
Lidar raster and point cloud data, funded by the National Science foundation. http://www.opentopography.org/ |
|
lidar and IfSAR |
Lidar raster and point cloud data, and IFSAR (5 m resolution or better), collated by NOAA. http://www.csc.noaa.gov/inventory/# |
|
Various (including IfSAR and lidar, and satellite imagery) |
United States elevation data hosted by the United States Geological Survey. Mostly data from the United States. http://viewer.nationalmap.gov/basic/ |
|
Various (including lidar, IfSAR, ASTER and SRTM data) |
Another USGS data page. This has more global coverage and is a good place to download SRTM 30 mdata. http://earthexplorer.usgs.gov/ |
|
lidar |
This site has lidar data from Spain: http://centrodedescargas.cnig.es/CentroDescargas/buscadorCatalogo.do?codFamilia=LIDAR |
|
lidar |
Finland’s national LiDAR dataset: https://tiedostopalvelu.maanmittauslaitos.fi/tp/kartta?lang=en |
|
lidar |
Denmark’s national LiDAR dataset: http://download.kortforsyningen.dk/ |
|
lidar |
Lidar holdings of the Environment Agency (UK): http://www.geostore.com/environment-agency/WebStore?xml=environment-agency/xml/application.xml |
|
lidar |
lidar from the Trentio, a province in the Italian Alps: http://www.lidar.provincia.tn.it:8081/WebGisIT/pages/webgis.faces |
3.3. Global datasets
There are several global topographic datasets.
-
The oldest of these is gtopo30, which was completed in 1996 and contains ~1 km resolution global data.
-
This was followed by the Shuttle Radar Topography Mission (SRTM) that produced a 90 meter resolution DEM in 2003.
-
SRTM was followed by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) 30 meter resolution global DEM in 2009.
-
2015 has seen the release of the WorldDEM, a 12 meter resolution topographic dataset.
GTOPO30, SRTM and ASTER data are all freely available for download. The WorldDEM is a commercial product. If you are looking for a global dataset at reasonable resolution that is not commercial, you will most likely choose between ASTER and SRTM. You can download ASTER and SRTM data at the same site and make your own comparisons, but ASTER has widely publicised data quality issues so we recommend the SRTM 30 meter data.
4. Projections and transformations
Many of our readers will be aware that our planet is well approximated as a sphere. Most maps and computer screens, however, are flat. This causes some problems.
To locate oneself on the surface of the Earth, many navigational tools use a coordinate system based on a sphere, first introduced by the "father or geography" Eratosthenes of Cyrene. Readers will be familiar with this system through latitude and longitude.
A coordinate system based on a sphere is called a geographic coordinate system. For most of our topographic analysis routines, a geographic coordinate system is a bit of a problem because the distance between points is measured in angular units and these vary as a function of position on the surface of the planet. For example, a degree of longitude is equal to 111.320 kilometers at the equator, but only 28.902 kilometers at a latitude of 75 degrees! For our topographic analyses tools we prefer to measure distances in length rather than in angular units.
To convert locations on a the surface of a sphere to locations on a plane (e.g., a paper map or your computer screen), a map projection is required. All of the LSDTopoTools analysis routines work on a projected coordinate system.
There are many projected coordinate systems out there, but we recommend the Universal Transverse Mercator (UTM) system, since it is a widely used projection system with units of meters.
So, before you do anything with topographic data you will need to:
-
Check to see if the data is in a projected coordinate system
-
Convert any data in a geographic coordinate systems to a projected coordinate system.
Both of these tasks can be done quickly an easily with GDAL software tools.
5. GDAL
Now that you know something about data formats, projections and transformations (since you read very carefully the preceding sections), you are probably hoping that there is a simple tool with which you can manipulate your data. Good news: there is! If you are reading this book you have almost certainly heard of GIS software, which is inviting since many GIS software packages have a nice, friendly and shiny user interface that you can use to reassuringly click on buttons. However, we do not recommend that you use GIS software to transform or project your data. Instead we recommend you use GDAL.
GDAL (the Geospatial Data Abstraction Library) is a popular software package for manipulating geospatial data. GDAL allows for manipulation of geospatial data in the Linux operating system, and for most operations is much faster than GUI-based GIS systems (e.g., ArcMap).
Here we give some notes on common operations in GDAL that one might use when working with LSDTopoTools. Much of these operations are carried out using GDAL’s utility programs, which can be downloaded from http://www.gdal.org/gdal_utilities.html. The appendices have instructions on how to get the GDAL utilities working. You will also have to be able to open a terminal or powershell. Instructions on how to do this are in the appendices.
5.1. Finding out what sort of data you’ve got
One of the most frequent operations in GDAL is just to see what sort of
data you have. The tool for doing this is gdalinfo
which is run with
the command line:
$ gdalinfo filename.ext
where filename.ext
is the name of your raster.
This is used mostly to:
-
See what projection your raster is in.
-
Check the extent of the raster.
This utility can read Arc formatted rasters but you need to navigate
into the folder of the raster and use the .adf
file as the filename.
There are sometimes more than one .adf
files so you’ll just need to
use ls -l
to find the biggest one.
5.1.1. Translating your raster into something that can be used by LSDTopoToolbox
Say you have a raster but it is in the wrong format (LSDTopoToolbox at
the moment only takes .bil
, .flt
and .asc
files) and in the wrong
projection.
LDSTopoToolbox performs many of its analyses on the basis of projected coordinates. |
You will need to be able to both change the projection of your rasters and change the format of your raster. The two utilities for this are:
5.1.2. Changing raster projections with gdalwarp
The preferred coordinate system is WGS84 UTM coordinates. For convert to
this coordinate system you use gdalwarp
. The coordinate system of the
source raster can be detected by gdal, so you use the flag -t_srs
to
assign the target coordinate system. Details about the target coordinate
system are in quotes, you want:
+proj=utm +zone=XX +datum=WGS84'
where XX
is the UTM zone.
You can find a map of UTM zones here: http://www.dmap.co.uk/utmworld.htm. For
example, if you want zone 44 (where the headwaters of the Ganges are),
you would use:
'+proj=utm +zone=XX +datum=WGS84'
Put this together with a source and target filename:
$ gdalwarp -t_srs '+proj=utm +zone=XX +datum=WGS84' source.ext target.ext
so one example would be:
$ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' diff0715_0612_clip.tif diff0715_0612_clip_UTM44.tif
note that if you are using UTM and you are in the southern hemisphere,
you should use the +south
flag:
$ gdalwarp -t_srs '+proj=utm +zone=19 +south +datum=WGS84' 20131228_tsx_20131228_tdx.height.gc.tif Chile_small.tif
There are several other flags that could be quite handy (for a complete list see the GDAL website).
-
-of
format
: This sets the format to the selected format. This means you can skip the step of changing formats withgdal_translate
. We will repeat this later but the formats forLSDTopoTools
are:Table 3. Format of outputs for GDAL Flag Description ASCGrid
ASCII files. These files are huge so try not to use them.
EHdr
ESRI float files. This used to be the only binary option but GDAL seems to struggle with it and it doesn’t retain georeferencing.
ENVI
ENVI rasters. This is the preferred format. GDAL deals with these files well and they retain georeferencing. We use the extension
bil
with these files.So, for example, you could output the file as:
$ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' -of ENVI diff0715_0612_clip.tif diff0715_0612_clip_UTM44.bil
Or for the southern hemisphere:
$ gdalwarp -t_srs '+proj=utm +zone=19 +south +datum=WGS84' -of ENVI 20131228_tsx_20131228_tdx.height.gc.tif Chile_small.bil
-
-tr
xres yres
: This sets the x and y resolution of the output DEM. It uses nearest neighbour resampling by default. So say you wanted to resample to 4 metres:$ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' -tr 4 4 diff0715_0612_clip.tif diff0715_0612_clip_UTM44_rs4.tif
LSDRasters assume square cells so you need both x any y distances to be the same -
-r
resampling_method
: This allows you to select the resampling method. The options are:Table 4. Resampling methods for GDAL Method Description near
Nearest neighbour resampling (default, fastest algorithm, worst interpolation quality).
bilinear
Bilinear resampling.
cubic
Cubic resampling.
cubicspline
Cubic spline resampling.
lanczos
Lanczos windowed sinc resampling.
average
Average resampling, computes the average of all non-NODATA contributing pixels. (GDAL versions >= 1.10.0).
mode
Mode resampling, selects the value which appears most often of all the sampled points. (GDAL versions >= 1.10.0).
So for example you could do a cubic resampling with:
$ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' -tr 4 4 -r cubic diff0715_0612_clip.tif diff0715_0612_clip_UTM44_rs4.tif
-
-te
<x_min> <y_min> <x_max> <y_max>
: this clips the raster. You can see more about this below in under the header Clipping rasters with gdal.-
UTM South: If you are looking at maps in the southern hemisphere, you need to use the
+south
flag:$ gdalwarp -t_srs '+proj=utm +zone=44 +south +datum=WGS84' -of ENVI diff0715_0612_clip.tif diff0715_0612_clip_UTM44.bil
-
5.1.3. Changing Nodata with gdalwarp
Sometimes your source data has nodata values that are weird, like 3.08x10^36 or something. You might want to change these values in your output DEM. You can do this with gdalwarp:
$ gdalwarp -of ENVI -dstnodata -9999 harring_dem1.tif Harring_DEM.bil
In the above case I’ve just changed a DEM from a tif to an ENVI bil but used -9999 as the nodata value.
5.1.4. Changing raster format with gdal_translate
Suppose you have a raster in UTM coordinates
(zones can be found here: http://www.dmap.co.uk/utmworld.htm) but
it is not in .flt
format. You can change the format using
gdal_translate
(note the underscore).
gdal_translate recognizes many file formats, but for LSDTopoTools you want either:
-
The ESRI .hdr labelled format, which is denoted with
EHdr
. -
The ENVI .hdr labelled format, which is denoted with
ENVI
. ENVI files are preferred since they work better with GDAL and retain georeferencing.
To set the file format you use the -of
flag, an example would be:
$ gdal_translate -of ENVI diff0715_0612_clip_UTM44.tif diff0715_0612_clip_UTM44.bil
Where the first filename.ext
is the source file and the second is the
output file.
5.1.5. Nodata doesn’t register
In older versions of GDAL, the NoDATA value doesn’t translate when you use gdalwarp and gdal_traslate.
If this happens to you, the simple solution is to go into the 'hdr' file and add the no data vale.
You will need to use gdalinfo to get the nodata value from the source raster, and then in the header of the destination raster,
add the line: data ignore value = -9999 (or whatever the nodata value in the source code is).
|
If you want to change the actual nodata value on an output DEM, you will need to use gdalwarp with the -dstnodata
flag.
5.1.6. Potential filename errors
It appears that GDAL considers filenames to be case-insensitive, which can cause data management problems in some cases. The following files are both considered the same:
Feather_DEM.bil feather_dem.bil
This can result in an ESRI *.hdr
file overwriting an ENVI *.hdr
file
and causing the code to fail to load the data. To avoid this ensure that
input and output filenames from GDAL processes are unique.
5.2. Clipping rasters with gdal
You might also want to clip your raster to a smaller area. This can
sometimes take ages on GUI-based GISs. An alternative is to use
gdalwarp
for clipping:
$ gdalwarp -te <x_min> <y_min> <x_max> <y_max> input.tif clipped_output.tif
or you can change the output format:
$ gdalwarp -te <x_min> <y_min> <x_max> <y_max> -of ENVI input.tif clipped_output.bil
Since this is a gdalwarp
operation, you can add all the bells and
whistles to this, such as:
-
changing the coordinate system,
-
resampling the DEM,
-
changing the file format.
The main thing to note about the -te
operation is that the clip will
be in the coordinates of the source raster (input.tif
). You can look
at the extent of the raster using gdalinfo
.
5.3. Merging large numbers of rasters
Often websites serving topographic data will supply it to you in tiles. Merging these rasters can be a bit of an annoyance if you use a GIS, but is a breeze with GDAL. The gdal_merge.py
program allows you to feed in a file with the names of the rasters you want to merge, so you can use a Linux pipe to merge all of the rasters of a certain format using just two commands:
$ ls *.asc > DEMs.txt
$ gdal_merge.py -of ENVI -o merged_dem.bil --optfile DEMs.txt
The above command works for ascii files but tif
or other file formats would work as well.
The exception to this is ESRI files, which have a somewhat bizarre structure which requires a bit of extra work to get the file list. Here is an example python script to process tiles that have a common directory name:
def GetESRIFileNamesNextMap():
file_list = []
for DirName in glob("*/"):
#print DirName
directory_without_slash = DirName[:-1]
this_filename = "./"+DirName+directory_without_slash+"dtme/hdr.adf\n"
print this_filename
file_list.append(this_filename)
# write the new version of the file
file_for_output = open("DEM_list.txt",'w')
file_for_output.writelines(file_list)
file_for_output.close()
If you use this you will need to modify the directory structure to reflect your own files.
5.4. Creating shapefiles or vector format files with GDAL
Sometimes it is useful to "polygonize" a raster.
The most common case is when you want a shapefile from a basin raster.
To do this, you can use the gdal_polygonize.py
tool.
$ gdal_polygonize.py My_AllBasins.bil -f "ESRI Shapefile" Outputfile.shp
If you do not give the file format (with the flag -f ), the script will guess the output format on the basis of the extension. This doesn’t always work! We suggest using the -f flag. Common file formats (hopefully self-explanatory) are "GeoJSON" , "KML" , "ESRI Shapefile" .
|
If you do not correctly set the output format, it will default to GML format, which in my testing does not preserve the georeferencing. |