Introduction to geospatial data

Return to the main LSDTopoTools landing page

1. Introduction

This documentation is about geospatial data. We focus on how to obtain geospatial data and how to manipulate it using, primarily, the GDAL command line tools.

1.1. Two basic kinds of geospatial data

Geospatial data comes in two primary types: raster data and vector data.

You can read in greater detail about different geospatial data formats on wikipedia, but the basic data structures are:

Raster data is data that is distributed on a grid.
- A raster is just a grid of data, where each cell in the grid has some value (or values).
- The cells are sometimes also called pixels. With image data, each pixel in the raster might have several values, such as the value of red, green and blue hues.
- Image data thus has bands: each band is the information pertaining to the different colors.
Vector data is data that is a collection of points or polygons distributed in space.
- "Vector" in the geospatial context doesn’t mean the same thing as in a mathematial context: vector geospatial data includes any data that has vertices that can be anywhere in space.
- This contrasts with raster data where the data points are fixed in space, usually on a rectangular grid.

In most GIS systems, the handling of these two data types are different and you load them with different buttons on your GIS. Below are the two buttons in QGIS 2.14. Usually the raster button looks a bit like a grid and the vector button looks a bit like some shapes with distinct points.

Figure 1. Add raster and vector buttons in QGIS. The one that looks like a grid is the add raster button

These different kinds of data have different file formats. There is are two linked software libraries for processing these data, called GDAL and OGR: they are commonly used in geospatial software so you should be able to convert between the data types that are read by these two packages. Click on the links for information:

2. Topographic data

Much of what we do using LSDTopoTools is process topographic data.

Topographic data comes in a number of formats, but at a basic level most topographic data is in the form of a raster.

Topographic data is almost always single band: each pixel or cell only has one data value: the elevation. Derivative topographic data, such a slope or aspect, also tends to be in single band rasters.

It is possible to get topographic data that is not in raster format (that is, the data is not based on a grid). Occasionally you find topographic data built on unstructured grids, or point clouds, where each elevation data point has a location in space associated with it. This data format takes up more space than raster data, since on a aster you only need to supply the elevation data: the horizontal positions are determined by where the data sits in the grid. Frequently LiDAR data (LiDAR stands for Light Detection and Ranging, and is a method for obtaining very high resolution topographic data) is delivered as a point cloud and you need software to convert the point cloud to a raster.

For most of this book, we will assume that your data is in raster format.

3. Data sources

Before you can start analysing topography and working with topographic data, you will need to get data and then get it into the correct format. This page explains how to do so.

3.1. What data does LSDTopoTools take?

Although LSDTopoTools works with three data formats, the only one that retains georeferenceing information (where you are in space) is the ENVI bil format, so we highly recommend you convert all data to ENVI bil.

Do not mix ESRI bil format with ENVI bil format! They are not the same! You need to use ENVI bil.

The LSDTopoTools works predominantly with raster data; if you don’t know what that is you can read about it here: http://en.wikipedia.org/wiki/Raster_data. In most cases, the raster data you will start with is a digital elevation model (DEM). Digital elevation models (and rasters in general) come in all sorts of formats. LSDTopoTools works with three formats:

Table 1. File input and output options
Data type	file extension	Description
Ascii	`.asc`	This format is in plain text and can be read by a text editor. The advantage of this format is that you can easily look at the data, but the disadvantage is that the file size is extremely large (compared to the other formats, .flt).
Float	`.flt` with a header file with extension `.hdr`.	This is a binary file format meaning that you can’t use a text editor to look at the data. The file size is greatly reduced compared to `.asc` data, however. This format does not retain georeferencing information.
ENVI bil format	`.bil` with a header file with extension `.hdr`.	This is the recommended format, because it works best with GDAL (see the section GDAL), and because it retains georeferencing information.

Why don’t we use GeoTiff?

GeoTIFF is a widely used raster format that has the advantage of containing georeferencing and the raster data in a single file. The disadvantage is that for C++ code you need to have two libraries (libtiff and libgeotiff) installed before you can read GeoTIFF files. Because there are many open source, easily installed tools for converting GeoTIFF files (for example, the GDAL utilities and the python GDAL bindings) we have opted for portability and not included the GeoTIFF libraries in our software. If you have GeoTIFF files, you will need to convert them to a supported format before using LSDTopoTools.

Below you will find instructions on how to get data into the correct format: data is delivered in a wide array of formats (e.g., ESRI bil, DEM, GeoTiff) and you must convert this data before it can be used by LSDTopoTools.

3.2. Downloading data

If you want to analyse topography, you should get some topographic data! The last decade has seen incredible gains in the availability and resolution of topographic data. Today, you can get topographic data from a number of sources. The best way to find this data is through search engines, but below are some common sources:

Table 2. Sources of topographic data
Source	Data type	Description and link
List maintained by Jošt Hobič	lidar	This is a list of freely available lidar data maintained by Jošt Hobič. The most complete list on the web. https://arheologijaslovenija.blogspot.co.uk/p/blog-page_81.html
opentopography	lidar	Lidar raster and point cloud data, funded by the National Science foundation. http://www.opentopography.org/
U.S. Interagency Elevation Inventory	lidar and IfSAR	Lidar raster and point cloud data, and IFSAR (5 m resolution or better), collated by NOAA. http://www.csc.noaa.gov/inventory/#
USGS national map viewer	Various (including IfSAR and lidar, and satellite imagery)	United States elevation data hosted by the United States Geological Survey. Mostly data from the United States. http://viewer.nationalmap.gov/basic/
EarthExplorer	Various (including lidar, IfSAR, ASTER and SRTM data)	Another USGS data page. This has more global coverage and is a good place to download SRTM 30 mdata. http://earthexplorer.usgs.gov/
Spanish LiDAR	lidar	This site has lidar data from Spain: http://centrodedescargas.cnig.es/CentroDescargas/buscadorCatalogo.do?codFamilia=LIDAR
Finland LiDAR	lidar	Finland’s national LiDAR dataset: https://tiedostopalvelu.maanmittauslaitos.fi/tp/kartta?lang=en
Denmark LiDAR	lidar	Denmark’s national LiDAR dataset: http://download.kortforsyningen.dk/
Environment Agency (UK) LiDAR	lidar	Lidar holdings of the Environment Agency (UK): http://www.geostore.com/environment-agency/WebStore?xml=environment-agency/xml/application.xml
Trentino (Italy) LiDAR	lidar	lidar from the Trentio, a province in the Italian Alps: http://www.lidar.provincia.tn.it:8081/WebGisIT/pages/webgis.faces

3.3. Global datasets

There are several global topographic datasets.

The oldest of these is gtopo30, which was completed in 1996 and contains ~1 km resolution global data.
This was followed by the Shuttle Radar Topography Mission (SRTM) that produced a 90 meter resolution DEM in 2003.
SRTM was followed by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) 30 meter resolution global DEM in 2009.
In 2014 SRTM released a global 30 meter dataset.
2015 has seen the release of the WorldDEM, a 12 meter resolution topographic dataset.

GTOPO30, SRTM and ASTER data are all freely available for download. The WorldDEM is a commercial product. If you are looking for a global dataset at reasonable resolution that is not commercial, you will most likely choose between ASTER and SRTM. You can download ASTER and SRTM data at the same site and make your own comparisons, but ASTER has widely publicised data quality issues so we recommend the SRTM 30 meter data.

Where to get global data

As of late 2016, opentropography.org has been hosting global datasets. If you click on raster data and then global data sets you will find various global datasets. *The download interface is MUCH BETTER that other interfaces. You do not need to sign in, it gives seamless data, you can keep track of where you have downloaded data. It is just very good. If you are looking for 90m or 30m data this website is your one stop shop.

4. Projections and transformations

Many of our readers will be aware that our planet is well approximated as a sphere. Most maps and computer screens, however, are flat. This causes some problems.

To locate oneself on the surface of the Earth, many navigational tools use a coordinate system based on a sphere, first introduced by the "father or geography" Eratosthenes of Cyrene. Readers will be familiar with this system through latitude and longitude.

A coordinate system based on a sphere is called a geographic coordinate system. For most of our topographic analysis routines, a geographic coordinate system is a bit of a problem because the distance between points is measured in angular units and these vary as a function of position on the surface of the planet. For example, a degree of longitude is equal to 111.320 kilometers at the equator, but only 28.902 kilometers at a latitude of 75 degrees! For our topographic analyses tools we prefer to measure distances in length rather than in angular units.

To convert locations on a the surface of a sphere to locations on a plane (e.g., a paper map or your computer screen), a map projection is required. All of the LSDTopoTools analysis routines work on a projected coordinate system.

There are many projected coordinate systems out there, but we recommend the Universal Transverse Mercator (UTM) system, since it is a widely used projection system with units of meters.

So, before you do anything with topographic data you will need to:

Check to see if the data is in a projected coordinate system
Convert any data in a geographic coordinate systems to a projected coordinate system.

Both of these tasks can be done quickly an easily with GDAL software tools.

5. GDAL

How do I get GDAL on my computer

GDAL (the Geospatial Data Abstraction Library) is very easy to install if you have a Linux operating system. Many people don’t have a Linux operating system, however. We reccomend using either Docker or Vagrant to manage a Linux system that lives within your home operating system. The instructions are included link::LSDTT_installation.html[in our instructions on how to install LSDTopoTools]. We bundle GDAL with LSDTopoTools installations.

You use GDAL via a command line interface. If you don’t know anything about that, please read our documentation on an introduction to Linux.

Now that you know something about data formats, projections and transformations (since you read very carefully the preceding sections), you are probably hoping that there is a simple tool with which you can manipulate your data. Good news: there is! If you are reading this book you have almost certainly heard of GIS software, which is inviting since many GIS software packages have a nice, friendly and shiny user interface that you can use to reassuringly click on buttons. However, we do not recommend that you use GIS software to transform or project your data. Instead we recommend you use GDAL.

GDAL (the Geospatial Data Abstraction Library) is a popular software package for manipulating geospatial data. GDAL allows for manipulation of geospatial data in the Linux operating system, and for most operations is much faster than GUI-based GIS systems (e.g., ArcMap).

Here we give some notes on common operations in GDAL that one might use when working with LSDTopoTools. Much of these operations are carried out using GDAL’s utility programs, which can be downloaded from http://www.gdal.org/gdal_utilities.html. The appendices have instructions on how to get the GDAL utilities working. You will also have to be able to open a terminal or powershell. Instructions on how to do this are in the appendices.

5.1. Finding out what sort of data you’ve got

One of the most frequent operations in GDAL is just to see what sort of data you have. The tool for doing this is gdalinfo which is run with the command line:

$ gdalinfo filename.ext

where filename.ext is the name of your raster.

This is used mostly to:

See what projection your raster is in.
Check the extent of the raster.

This utility can read Arc formatted rasters but you need to navigate into the folder of the raster and use the .adf file as the filename. There are sometimes more than one .adf files so you’ll just need to use ls -l to find the biggest one.

5.1.1. Translating your raster into something that can be used by LSDTopoToolbox

Say you have a raster but it is in the wrong format (LSDTopoToolbox at the moment only takes .bil, .flt and .asc files) and in the wrong projection.

LDSTopoToolbox performs many of its analyses on the basis of projected coordinates.

You will need to be able to both change the projection of your rasters and change the format of your raster. The two utilities for this are:

5.1.2. Changing raster projections with gdalwarp

The preferred coordinate system is WGS84 UTM coordinates. For convert to this coordinate system you use gdalwarp. The coordinate system of the source raster can be detected by gdal, so you use the flag -t_srs to assign the target coordinate system. Details about the target coordinate system are in quotes, you want:

+proj=utm +zone=XX +datum=WGS84'

where XX is the UTM zone. You can find a map of UTM zones here: http://www.dmap.co.uk/utmworld.htm. For example, if you want zone 44 (where the headwaters of the Ganges are), you would use:

'+proj=utm +zone=XX +datum=WGS84'

Put this together with a source and target filename:

$ gdalwarp -t_srs '+proj=utm +zone=XX +datum=WGS84' source.ext target.ext

so one example would be:

$ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' diff0715_0612_clip.tif diff0715_0612_clip_UTM44.tif

note that if you are using UTM and you are in the southern hemisphere, you should use the +south flag:

$ gdalwarp -t_srs '+proj=utm +zone=19 +south +datum=WGS84' 20131228_tsx_20131228_tdx.height.gc.tif Chile_small.tif

ESPG code for UTM

You can alternatively use the EPSG codes for UTM coordinates.

These are:

UTM North zones have EPSG:326XX where XX is the zone.
UTM South zones have EPSG:327XX where XX is the zone.

So example call would be

$ gdalwarp -t_srs EPSG:32719 20131228_tsx_20131228_tdx.height.gc.tif Chile_small.tif

There are several other flags that could be quite handy (for a complete list see the GDAL website).

-of format: This sets the format to the selected format. This means you can skip the step of changing formats with gdal_translate. We will repeat this later but the formats for LSDTopoTools are:

Table 3. Format of outputs for GDAL
Flag	Description
`ASCGrid`	ASCII files. These files are huge so try not to use them.
`EHdr`	ESRI float files. This used to be the only binary option but GDAL seems to struggle with it and it doesn’t retain georeferencing.
`ENVI`	ENVI rasters. This is the preferred format. GDAL deals with these files well and they retain georeferencing. We use the extension `bil` with these files.

So, for example, you could output the file as:

  $ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' -of ENVI diff0715_0612_clip.tif diff0715_0612_clip_UTM44.bil

Or for the southern hemisphere:

  $ gdalwarp -t_srs '+proj=utm +zone=19 +south +datum=WGS84' -of ENVI 20131228_tsx_20131228_tdx.height.gc.tif Chile_small.bil

-tr xres yres: This sets the x and y resolution of the output DEM. It uses nearest neighbour resampling by default. So say you wanted to resample to 4 metres:
```
  $ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' -tr 4 4 diff0715_0612_clip.tif diff0715_0612_clip_UTM44_rs4.tif
```
LSDRasters assume square cells so you need both x any y distances to be the same

-r resampling_method: This allows you to select the resampling method. The options are:

Table 4. Resampling methods for GDAL
Method	Description
`near`	Nearest neighbour resampling (default, fastest algorithm, worst interpolation quality).
`bilinear`	Bilinear resampling.
`cubic`	Cubic resampling.
`cubicspline`	Cubic spline resampling.
`lanczos`	Lanczos windowed sinc resampling.
`average`	Average resampling, computes the average of all non-NODATA contributing pixels. (GDAL versions >= 1.10.0).
`mode`	Mode resampling, selects the value which appears most often of all the sampled points. (GDAL versions >= 1.10.0).

So for example you could do a cubic resampling with:

$ gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84' -tr 4 4 -r cubic
diff0715_0612_clip.tif diff0715_0612_clip_UTM44_rs4.tif

-te <x_min> <y_min> <x_max> <y_max>: this clips the raster. You can see more about this below in under the header Clipping rasters with gdal.
- UTM South: If you are looking at maps in the southern hemisphere, you need to use the +south flag:
  $ gdalwarp -t_srs '+proj=utm +zone=44 +south +datum=WGS84' -of ENVI diff0715_0612_clip.tif diff0715_0612_clip_UTM44.bil

5.1.3. Changing Nodata with gdalwarp

Sometimes your source data has nodata values that are weird, like 3.08x10^36 or something. You might want to change these values in your output DEM. You can do this with gdalwarp:

$ gdalwarp -of ENVI -dstnodata -9999 harring_dem1.tif Harring_DEM.bil

In the above case I’ve just changed a DEM from a tif to an ENVI bil but used -9999 as the nodata value.

5.1.4. Changing raster format with gdal_translate

Suppose you have a raster in UTM coordinates (zones can be found here: http://www.dmap.co.uk/utmworld.htm) but it is not in .flt format. You can change the format using gdal_translate (note the underscore).

gdal_translate recognizes many file formats, but for LSDTopoTools you want either:

The ESRI .hdr labelled format, which is denoted with EHdr.
The ENVI .hdr labelled format, which is denoted with ENVI. ENVI files are preferred since they work better with GDAL and retain georeferencing.

To set the file format you use the -of flag, an example would be:

$ gdal_translate -of ENVI diff0715_0612_clip_UTM44.tif diff0715_0612_clip_UTM44.bil

Where the first filename.ext is the source file and the second is the output file.

5.1.5. Nodata doesn’t register

In older versions of GDAL, the NoDATA value doesn’t translate when you use gdalwarp and gdal_traslate. If this happens to you, the simple solution is to go into the 'hdr' file and add the no data vale. You will need to use gdalinfo to get the nodata value from the source raster, and then in the header of the destination raster, add the line: data ignore value = -9999 (or whatever the nodata value in the source code is).

If you want to change the actual nodata value on an output DEM, you will need to use gdalwarp with the -dstnodata flag.

5.1.6. Potential filename errors

It appears that GDAL considers filenames to be case-insensitive, which can cause data management problems in some cases. The following files are both considered the same:

Feather_DEM.bil feather_dem.bil

This can result in an ESRI *.hdr file overwriting an ENVI *.hdr file and causing the code to fail to load the data. To avoid this ensure that input and output filenames from GDAL processes are unique.

5.2. Clipping rasters with gdal

You might also want to clip your raster to a smaller area. This can sometimes take ages on GUI-based GISs. An alternative is to use gdalwarp for clipping:

$ gdalwarp -te <x_min> <y_min> <x_max> <y_max> input.tif clipped_output.tif

or you can change the output format:

$ gdalwarp -te <x_min> <y_min> <x_max> <y_max> -of ENVI input.tif clipped_output.bil

Since this is a gdalwarp operation, you can add all the bells and whistles to this, such as:

changing the coordinate system,
resampling the DEM,
changing the file format.

The main thing to note about the -te operation is that the clip will be in the coordinates of the source raster (input.tif). You can look at the extent of the raster using gdalinfo.

5.3. Merging large numbers of rasters

Often websites serving topographic data will supply it to you in tiles. Merging these rasters can be a bit of an annoyance if you use a GIS, but is a breeze with GDAL. The gdal_merge.py program allows you to feed in a file with the names of the rasters you want to merge, so you can use a Linux pipe to merge all of the rasters of a certain format using just two commands:

$ ls *.asc > DEMs.txt
$ gdal_merge.py -of ENVI -o merged_dem.bil --optfile DEMs.txt

The above command works for ascii files but tif or other file formats would work as well.

The exception to this is ESRI files, which have a somewhat bizarre structure which requires a bit of extra work to get the file list. Here is an example python script to process tiles that have a common directory name:

Script for creating a list of ESRI rasters for use in gdal_merge.py

def GetESRIFileNamesNextMap():

    file_list = []

    for DirName in glob("*/"):
        #print DirName

        directory_without_slash = DirName[:-1]

        this_filename = "./"+DirName+directory_without_slash+"dtme/hdr.adf\n"

        print this_filename
        file_list.append(this_filename)

    # write the new version of the file
    file_for_output = open("DEM_list.txt",'w')
    file_for_output.writelines(file_list)
    file_for_output.close()

If you use this you will need to modify the directory structure to reflect your own files.

5.4. Creating shapefiles or vector format files with GDAL

Sometimes it is useful to "polygonize" a raster. The most common case is when you want a shapefile from a basin raster. To do this, you can use the gdal_polygonize.py tool.

$ gdal_polygonize.py My_AllBasins.bil -f "ESRI Shapefile" Outputfile.shp

If you do not give the file format (with the flag -f), the script will guess the output format on the basis of the extension. This doesn’t always work! We suggest using the -f flag. Common file formats (hopefully self-explanatory) are "GeoJSON", "KML", "ESRI Shapefile".

If you do not correctly set the output format, it will default to GML format, which in my testing does not preserve the georeferencing.