telemetR: QA/QC with Spatial Attributes (15 mins)
GPS telemetry data can contain a number of different errors for a number of different reasons. Sometimes the GPS sensor records an erroneous point, or no point at all. The data can be incomplete (a record with missing coordinates) or missing entirely (the device was scheduled to get a fix but never recorded one). How often this happens can depend on the biology of the animal, the terrain, and canopy cover.
Now that we have extended the database with PostGIS, we can use its functionality to detect outliers and erroneous points. Often these points imply biologically impossible distances or velocities between two successive locations.
- Location Validity
- Window Functions
- Other QA/QC Considerations & Wrap Up
- telemetR Series
We can choose to manage erroneous telemetry data in a number of different ways. The simplest is to delete it. However, if we go this route we lose all the information about that point that might help us diagnose the error. The point may even have been correct, and we would have to re-upload the data in order to get it back. This is inefficient. A better method is to create a new field, `telemetry.validity_id`, that stores information about the validity of each point. We link this column to a new lookup table that holds the validity codes.
The initial validity of all the coordinates will be 1. This is also the default value for any coordinates newly added to the table.
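The original schema isn't reproduced here, but the setup might look like the following minimal sketch. The lookup-table name (`lu_validity`), the code values, and the descriptions are assumptions for illustration, not the series' actual definitions.

```sql
-- Sketch: lookup table of validity codes (names/values are assumptions).
CREATE TABLE lu_validity (
    validity_id integer PRIMARY KEY,
    description text NOT NULL
);

INSERT INTO lu_validity (validity_id, description) VALUES
    (1, 'valid'),
    (2, 'no coordinates'),
    (3, 'impossible spike'),
    (4, 'improbable movement');

-- New column on telemetry, defaulting every point to valid (1),
-- with a foreign key back to the lookup table.
ALTER TABLE telemetry
    ADD COLUMN validity_id integer DEFAULT 1
    REFERENCES lu_validity (validity_id);
```

The foreign key keeps `telemetry.validity_id` restricted to codes that actually exist in the lookup table.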
The simplest case is missing coordinates. This data isn't useless. The GPS device did transmit a record but was unable to get a fix, and these records are useful for calculating the error rate of the collar. Instead of deleting this data we will assign the "no coordinates" validity code to these rows.
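As a sketch, assuming the point geometry lives in a `geom` column and "no coordinates" is code 2 (both assumptions), the update is a one-liner:

```sql
-- Flag records that transmitted but never got a fix.
-- Column name (geom) and code value (2) are assumptions.
UPDATE telemetry
SET validity_id = 2
WHERE geom IS NULL;
```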
An impossible spike is a movement that would require the animal to have traveled an absurd distance in a short time. Fortunately these events are uncommon. However, depending on the temporal scale and magnitude of the movement, they can be difficult to detect. Visual inspection of the animal's trajectory is a sure method to detect these events, but to make analysis easier we need a way to detect them programmatically.
The code below uses [moving window functions](http://postgresguide.com/tips/window.html) to create steps. A step is the interval between two successive GPS locations. When calculating a step we need the data in the current row and the previous row. With this information we can calculate the time between each fix, the distance of the step, and the average speed of the step.
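The original snippet isn't shown here; a sketch of such a step query, assuming a `telemetry` table with `id`, `animal_id`, `acq_time`, and a WGS 84 `geom` point column (all names are assumptions), might look like:

```sql
-- Build steps with a moving window: each row is paired with the
-- previous fix for the same animal.
SELECT
    id,
    acq_time,
    LAG(acq_time) OVER w AS acq_time_lag,
    acq_time - LAG(acq_time) OVER w AS deltat,
    ST_DistanceSpheroid(
        LAG(geom) OVER w, geom,
        'SPHEROID["WGS 84",6378137,298.257223563]'
    ) AS dist,
    -- meters per hour; NULLIF guards against division by zero
    ST_DistanceSpheroid(
        LAG(geom) OVER w, geom,
        'SPHEROID["WGS 84",6378137,298.257223563]'
    ) / NULLIF(EXTRACT(EPOCH FROM acq_time - LAG(acq_time) OVER w) / 3600.0, 0)
        AS speed
FROM telemetry
WHERE geom IS NOT NULL
WINDOW w AS (PARTITION BY animal_id ORDER BY acq_time);
```

Partitioning by `animal_id` keeps the window from pairing the last fix of one animal with the first fix of the next.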
The database should return the following. If you’ve had to delete and re-append records your IDs may be different than mine. That is okay.
Each row has two timestamps. The first is the actual acquisition time and the second is the acquisition time from the previous row. The remaining columns are the change in time (`deltat`), the distance, and the average speed between each fix.
The code snippet above is pretty complex. Let's go a little deeper with window functions. We will recreate the output above step by step.
In this first step we simply offset the acquisition timestamps for each successive GPS fix, then calculate the time (`deltat`) between each fix.
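A minimal version of that first step, with the same assumed table and column names as before, might be:

```sql
-- LAG() pulls the previous row's timestamp into the current row;
-- subtracting the two gives the time between fixes (deltat).
SELECT
    id,
    acq_time,
    LAG(acq_time) OVER w AS acq_time_lag,
    acq_time - LAG(acq_time) OVER w AS deltat
FROM telemetry
WINDOW w AS (PARTITION BY animal_id ORDER BY acq_time);
```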
We can use `ST_DistanceSpheroid` to calculate the distance of each step. The units are meters.
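A sketch of that second step, again with assumed table and column names:

```sql
-- Step length in meters, measured on the WGS 84 spheroid between
-- the previous fix and the current fix.
SELECT
    id,
    acq_time,
    ST_DistanceSpheroid(
        LAG(geom) OVER w, geom,
        'SPHEROID["WGS 84",6378137,298.257223563]'
    ) AS dist
FROM telemetry
WHERE geom IS NOT NULL
WINDOW w AS (PARTITION BY animal_id ORDER BY acq_time);
```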
This last step calculates the average velocity between each successive GPS point in meters per hour.
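Combining the two: distance divided by elapsed hours gives average speed in meters per hour (table and column names are still assumptions):

```sql
-- Average speed (m/hr) of each step; NULLIF avoids dividing by zero
-- when two fixes share a timestamp.
SELECT
    id,
    acq_time,
    ST_DistanceSpheroid(
        LAG(geom) OVER w, geom,
        'SPHEROID["WGS 84",6378137,298.257223563]'
    ) / NULLIF(EXTRACT(EPOCH FROM acq_time - LAG(acq_time) OVER w) / 3600.0, 0)
        AS speed
FROM telemetry
WHERE geom IS NOT NULL
WINDOW w AS (PARTITION BY animal_id ORDER BY acq_time);
```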
Detect Impossible Spikes
Using distance alone isn't a very robust method for detecting impossible spikes. It can rule out absurd movements. For instance, I've had many points with coordinates in the Arctic. These points are obvious errors and result in a distance far greater than possible. However, I've also had erroneous points that are only about 5km away from the neighboring points. These are equally improbable (based on the biology of the organism, a large ungulate) but not as easily detected. There are plenty of legitimate 5km movements during ungulate migrations, especially if a transmission is missed; in that case a single step may be as great as 10km. Simply excluding points greater than an arbitrary threshold won't work.
A spike is a random error that occurs due to insufficient satellite coverage. It is unlikely that erroneous points will occur consecutively. Let's use a few other movement parameters: velocity and relative angle. A spike is a visual onomatopoeia (fig 1.); we can use the characteristics of a spike, a fast out-and-back movement, to filter them out of the data. Below we calculate the speed of the step on either side of each GPS point and the relative angle between those steps.
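One way to sketch this, assuming the same table layout as above: `LAG`/`LEAD` give the steps into and out of each point, and `ST_Azimuth` gives the bearings used for the relative angle. The thresholds (5000 m/hr, 15 degrees) and the code value (3) are assumptions, not the post's actual values, and should be tuned to your species.

```sql
-- Flag spikes: fast in, fast out, and nearly doubling back
-- (bearing to the previous point ~ bearing to the next point).
WITH steps AS (
    SELECT
        id,
        ST_DistanceSpheroid(LAG(geom) OVER w, geom,
            'SPHEROID["WGS 84",6378137,298.257223563]')
          / NULLIF(EXTRACT(EPOCH FROM acq_time - LAG(acq_time) OVER w) / 3600.0, 0)
            AS speed_in,
        ST_DistanceSpheroid(geom, LEAD(geom) OVER w,
            'SPHEROID["WGS 84",6378137,298.257223563]')
          / NULLIF(EXTRACT(EPOCH FROM LEAD(acq_time) OVER w - acq_time) / 3600.0, 0)
            AS speed_out,
        -- note: a production version should normalize this difference
        -- across the 0/360 degree boundary
        abs(degrees(ST_Azimuth(geom, LAG(geom) OVER w))
          - degrees(ST_Azimuth(geom, LEAD(geom) OVER w))) AS rel_angle
    FROM telemetry
    WHERE geom IS NOT NULL
    WINDOW w AS (PARTITION BY animal_id ORDER BY acq_time)
)
UPDATE telemetry t
SET validity_id = 3          -- 'impossible spike' (code assumed)
FROM steps s
WHERE t.id = s.id
  AND s.speed_in  > 5000
  AND s.speed_out > 5000
  AND s.rel_angle < 15;
```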
Detect Improbable Movements
Improbable movements are movements that can't be ruled out based on the biology of the animal. These coordinates are plausible but possibly not reliable. We can use the same update query above but lower the thresholds in the WHERE clause.
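A self-contained sketch of that lowered-threshold pass; as before, the thresholds (2000 m/hr, 30 degrees) and the code value (4) are assumptions:

```sql
-- Same step calculation, looser cutoffs; only touch rows still
-- marked valid so we don't overwrite the spike code.
WITH steps AS (
    SELECT
        id,
        ST_DistanceSpheroid(LAG(geom) OVER w, geom,
            'SPHEROID["WGS 84",6378137,298.257223563]')
          / NULLIF(EXTRACT(EPOCH FROM acq_time - LAG(acq_time) OVER w) / 3600.0, 0)
            AS speed_in,
        ST_DistanceSpheroid(geom, LEAD(geom) OVER w,
            'SPHEROID["WGS 84",6378137,298.257223563]')
          / NULLIF(EXTRACT(EPOCH FROM LEAD(acq_time) OVER w - acq_time) / 3600.0, 0)
            AS speed_out,
        abs(degrees(ST_Azimuth(geom, LAG(geom) OVER w))
          - degrees(ST_Azimuth(geom, LEAD(geom) OVER w))) AS rel_angle
    FROM telemetry
    WHERE geom IS NOT NULL
    WINDOW w AS (PARTITION BY animal_id ORDER BY acq_time)
)
UPDATE telemetry t
SET validity_id = 4          -- 'improbable movement' (code assumed)
FROM steps s
WHERE t.id = s.id
  AND t.validity_id = 1
  AND s.speed_in  > 2000
  AND s.speed_out > 2000
  AND s.rel_angle < 30;
```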
Check the counts of QA/QC codes.
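A quick tally, assuming the validity codes live in a lookup table as sketched earlier (the name `lu_validity` is an assumption):

```sql
-- Count telemetry rows per QA/QC code; LEFT JOIN keeps codes
-- with zero rows in the output.
SELECT v.validity_id, v.description, count(t.id) AS n
FROM lu_validity v
LEFT JOIN telemetry t USING (validity_id)
GROUP BY v.validity_id, v.description
ORDER BY v.validity_id;
```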
Other QA/QC Considerations & Wrap Up
There are a few other kinds of points that can be eliminated in this step: points that fall outside project boundaries or occur in impossible locations, if shapefiles of these features are available. That is outside the scope of this post. Such points can easily be excluded upon visual inspection; afterwards, each row with erroneous data can be manually updated, using `telemetry.id` to target the row and assign the proper validity code.
Not every erroneous point will be removed after these functions have run. In a recent scenario a GPS device didn't record the timestamps properly, and the resulting trajectory was star shaped. No amount of automated QA/QC can solve this problem. Instead I re-downloaded the data from the device's onboard storage, which fixed the problem.
- Creating an Animal Movement Database
- Extending the Database with PostGIS
- Importing New Data
- QA/QC with Spatial Attributes
- Connecting to the Database with R
- Adding More SQL Functionality
- Shiny Web Application
- A Simple RESTful API
- … more tbd