telemetR: QA/QC with Spatial Attributes

- 15 mins

GPS telemetry data can contain errors for a number of different reasons. Sometimes the GPS sensor records an erroneous point, or no point at all: a fix was scheduled but the receiver failed to record coordinates. How often this happens can depend on the biology of the animal, terrain, and canopy cover.

Now that we have extended the database with PostGIS, we can use its functionality to detect outliers and erroneous points. These points often imply biologically impossible distances or velocities between two successive fixes.

Location Validity

We can manage erroneous telemetry data in a number of different ways. The simplest is to delete it. However, if we go this route we lose all the information about that point that might help us diagnose the error. The point may even have been correct, and we would have to re-upload the data to recover it. This is inefficient. A better method is to create a new field, telemetry.validity_id, that stores information about the validity of each point. We link this column to a new lookup table, lkp_validity.

CREATE TABLE lkp_validity (
  id serial PRIMARY KEY,
  validity_desc varchar);

ALTER TABLE telemetry
ADD COLUMN
  validity_id integer REFERENCES lkp_validity(id);

INSERT INTO lkp_validity (validity_desc)
VALUES
  ('no coordinates'),
  ('valid position'),
  ('low reliability'),
  ('wrong: out of study area'),
  ('wrong: impossible spike'),
  ('wrong: impossilbe position');

UPDATE telemetry
SET validity_id = 2;

The initial validity of every existing coordinate is set to 2 (valid position). Note that the ALTER TABLE above does not define a default, so newly inserted rows will have a NULL validity_id until they are classified.
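If you want newly inserted rows to start out as valid without an explicit value, one option (a sketch, not part of the schema above) is to set a column default:

```sql
-- Make 2 ('valid position') the default validity for new rows.
ALTER TABLE telemetry
ALTER COLUMN validity_id SET DEFAULT 2;
```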

Missing Coordinates

The simplest case is missing coordinates. This data isn't useless. The GPS device did transmit data but was unable to get a fix, and these records are useful for calculating the error rate of the collar. Instead of deleting them we will assign the “no coordinates” validity code to telemetry.validity_id.

UPDATE telemetry
SET validity_id = 1
WHERE geom IS NULL;
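Since failed fixes are kept, a collar's error rate falls out of a simple aggregate. A sketch (your definition of fix rate may differ) counting the proportion of failed fixes per animal:

```sql
-- Proportion of scheduled fixes with no coordinates, per animal
-- (validity_id = 1 is 'no coordinates').
SELECT
  animal_id,
  count(*) FILTER (WHERE validity_id = 1)::numeric
    / count(*) AS failure_rate
FROM telemetry
GROUP BY animal_id;
```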

Impossible Spike

An impossible spike is a movement that would require the animal to have traveled an absurd distance in a short time. Fortunately these events are uncommon. However, depending on the temporal scale and magnitude of the movement, they can be difficult to detect. Visual inspection of the animal's trajectory is a sure method of detecting them, but we need a way to detect these events programmatically to make analysis easier.

The code below uses [moving window functions](http://postgresguide.com/tips/window.html) to create steps. A step is the interval between two successive GPS locations. To calculate a step we need data from both the current row and the previous row. With this information we can calculate the time between fixes, the length of the step, and the average speed along the step.

SELECT
  animal_id AS id,
  acq_time,
  LEAD(acq_time,-1)
    OVER (
      PARTITION BY animal_id
      ORDER BY acq_time) AS acq_time_1,
  (EXTRACT(epoch FROM acq_time) -
  LEAD(EXTRACT(epoch FROM acq_time), -1)
    OVER (
      PARTITION BY animal_id
      ORDER BY acq_time))::integer AS deltat,
  (ST_DistanceSpheroid(
    geom,
    LEAD(geom, -1)
      OVER (
        PARTITION BY animal_id
        ORDER BY acq_time),
    'SPHEROID["WGS 84",6378137,298.257223563]'))::integer AS dist,
  (ST_DistanceSpheroid(
    geom,
    LEAD(geom, -1)
      OVER (
        PARTITION BY animal_id
        ORDER BY acq_time),
    'SPHEROID["WGS 84",6378137,298.257223563]')/
  ((EXTRACT(epoch FROM acq_time) -
  LEAD(
    EXTRACT(epoch FROM acq_time), -1)
    OVER (
      PARTITION BY animal_id
      ORDER BY acq_time))+1) * 60 * 60)::numeric(8,2) AS speed
FROM telemetry
WHERE validity_id = 2
LIMIT 10;

The database should return the following. If you’ve had to delete and re-append records, your IDs may differ from mine. That’s okay.

id |        acq_time        |       acq_time_1       | deltat | dist | speed
---+------------------------+------------------------+--------+------+-------
 3 | 2005-10-16 01:01:23-07 |                        |        |      |      
 3 | 2005-10-16 09:01:40-07 | 2005-10-16 01:01:23-07 |  28817 |   40 |  4.98
 3 | 2005-10-16 13:01:17-07 | 2005-10-16 09:01:40-07 |  14377 |  321 | 80.31
 3 | 2005-10-16 17:01:51-07 | 2005-10-16 13:01:17-07 |  14434 |  286 | 71.23
 3 | 2005-10-16 21:00:55-07 | 2005-10-16 17:01:51-07 |  14344 |   41 | 10.25
 3 | 2005-10-17 01:00:54-07 | 2005-10-16 21:00:55-07 |  14399 |  202 | 50.62
 3 | 2005-10-17 05:02:23-07 | 2005-10-17 01:00:54-07 |  14489 |  107 | 26.64
 3 | 2005-10-17 09:01:42-07 | 2005-10-17 05:02:23-07 |  14359 |   65 | 16.38
 3 | 2005-10-17 13:01:19-07 | 2005-10-17 09:01:42-07 |  14377 |  226 | 56.68
 3 | 2005-10-17 17:01:19-07 | 2005-10-17 13:01:19-07 |  14400 |   21 |  5.34

Each row has two timestamps: the actual acquisition time and the acquisition time from the previous row. deltat, dist, and speed are the elapsed time, distance, and average speed between the two fixes.

Window Functions

The code snippet above is pretty complex, so let’s go a little deeper with window functions and recreate the output above step by step.

SELECT
  animal_id AS id,
  acq_time AS time_1,
  LEAD(acq_time, -1)
    OVER(
      PARTITION BY animal_id
      ORDER BY acq_time) AS time_2,
  (EXTRACT(epoch FROM acq_time) -
  LEAD(EXTRACT(epoch FROM acq_time), -1)
    OVER (
      PARTITION BY animal_id
      ORDER BY acq_time))::integer AS deltat
FROM telemetry
WHERE validity_id = 2
LIMIT 10;

In this first step we simply pair each acquisition time with the acquisition time of the previous fix, then calculate the elapsed time (deltat) between them.

SELECT
(ST_DistanceSpheroid(
  geom,
  LEAD(geom, -1)
    OVER (
      PARTITION BY animal_id
      ORDER BY acq_time),
  'SPHEROID["WGS 84",6378137,298.257223563]'))::integer AS dist
FROM telemetry
WHERE validity_id = 2
LIMIT 10;

We can use ST_DistanceSpheroid to calculate the length of the step. The units are meters.
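As a sanity check on the units, two points one degree of longitude apart on the equator should be roughly 111 km apart on the WGS 84 spheroid:

```sql
SELECT ST_DistanceSpheroid(
  ST_MakePoint(0, 0),
  ST_MakePoint(1, 0),
  'SPHEROID["WGS 84",6378137,298.257223563]');
-- ≈ 111319 (meters)
```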

SELECT
  (ST_DistanceSpheroid(
    geom,
    LEAD(geom, -1)
      OVER (
        PARTITION BY animal_id
        ORDER BY acq_time),
    'SPHEROID["WGS 84",6378137,298.257223563]')/
  ((EXTRACT(epoch FROM acq_time) -
  LEAD(
    EXTRACT(epoch FROM acq_time), -1)
    OVER (
      PARTITION BY animal_id
      ORDER BY acq_time))+1) * 60 * 60)::numeric(8,2) AS speed
FROM telemetry
WHERE validity_id = 2
LIMIT 10;

This last step calculates the average speed between each pair of successive GPS points in meters per hour. The +1 in the denominator prevents division by zero when two fixes share the same timestamp.
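We can check the arithmetic against the second row of the earlier output: 321 m in 14377 s works out to roughly 80 m per hour (the small difference from the displayed 80.31 comes from dist being rounded to an integer before display):

```sql
SELECT (321 / (14377 + 1.0) * 3600)::numeric(8,2) AS speed;
-- ≈ 80.37
```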

Detect Impossible Spikes

Using distance alone isn’t a very robust method to detect impossible spikes. It can rule out absurd movements. For instance, I’ve had many points with coordinates in the Arctic. These points are obvious errors and result in a distance far greater than possible. However, I’ve also had erroneous points that are maybe 5km away from the surrounding points. These are equally improbable (based on the biology of the organism, a large ungulate) but not as easily detected. There are plenty of legitimate 5km movements during ungulate migrations, especially if a transmission is missed; in that case a single step may be as long as 10km. Simply excluding points beyond an arbitrary threshold won’t work.


Fig 1. All of the animal's GPS fixes are confined to a small area except for an improbable spike.

A spike is a random error that occurs due to insufficient satellite coverage, and it is unlikely that erroneous points will occur consecutively. Let’s use two other movement parameters: velocity and relative angle. A spike has a distinctive shape (fig 1.), and we can use its characteristics to filter spikes out of the data. Below we calculate the speed of the step on either side of each GPS point along with the relative angle between the two steps.
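The cos(azimuth difference) term is what captures the spike shape. With hypothetical out-and-back points, heading due north and then due south gives a relative angle whose cosine is -1:

```sql
SELECT cos(
  ST_Azimuth('POINT(0 0)'::geography, 'POINT(0 1)'::geography) -
  ST_Azimuth('POINT(0 1)'::geography, 'POINT(0 0)'::geography))
  AS rel_angle;
-- ≈ -1
```

Values near -1 mean the animal appears to reverse direction completely, which, combined with high speeds on both steps, is the signature of a spike.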

UPDATE
  telemetry
SET
  validity_id = 5
WHERE
  id IN (
    SELECT id
    FROM
      (SELECT
        id,
        ST_DistanceSpheroid(
          geom,
          LAG(geom, 1)
          OVER (
            PARTITION BY animal_id
            ORDER BY acq_time),
            'SPHEROID["WGS 84",6378137,298.257223563]') /
          (EXTRACT(epoch FROM acq_time) -
           EXTRACT(epoch FROM (
              LAG(acq_time, 1)
              OVER (
                PARTITION BY animal_id
                ORDER BY acq_time)))) * 3600 AS speed_from,
        ST_DistanceSpheroid(
          geom,
          LEAD(geom, 1)
          OVER (
            PARTITION BY animal_id
            ORDER BY acq_time),
            'SPHEROID["WGS 84",6378137,298.257223563]') /
          (- EXTRACT(epoch FROM acq_time) +
           EXTRACT(epoch FROM (
              LEAD(acq_time, 1)
              OVER (
                PARTITION BY animal_id
                ORDER BY acq_time)))) * 3600 AS speed_to,
        cos(
          ST_Azimuth((LAG(geom, 1)
            OVER (
              PARTITION BY animal_id
              ORDER BY acq_time))::geography,
              geom::geography) -
          ST_Azimuth(
            geom::geography,
            (LEAD(geom, 1)
            OVER (
              PARTITION BY animal_id
              ORDER BY acq_time))::geography)) AS rel_angle
      FROM telemetry
      WHERE validity_id = 2) t
    WHERE
      rel_angle < -.99 AND
      speed_from > 2500 AND
      speed_to > 2500);

Detect Improbable Movements

Improbable movements are movements that can’t be ruled out based on the biology of the animal: the coordinates are plausible but possibly not reliable. We can use the same update query as above with lower thresholds in the WHERE clause.

UPDATE
  telemetry
SET
  validity_id = 3
WHERE
  id IN (
    SELECT id
    FROM
      (SELECT
        id,
        ST_DistanceSpheroid(
          geom,
          LAG(geom, 1)
          OVER (
            PARTITION BY animal_id
            ORDER BY acq_time),
            'SPHEROID["WGS 84",6378137,298.257223563]') /
          (EXTRACT(epoch FROM acq_time) -
           EXTRACT(epoch FROM (
              LAG(acq_time, 1)
              OVER (
                PARTITION BY animal_id
                ORDER BY acq_time)))) * 3600 AS speed_from,
        ST_DistanceSpheroid(
          geom,
          LEAD(geom, 1)
          OVER (
            PARTITION BY animal_id
            ORDER BY acq_time),
            'SPHEROID["WGS 84",6378137,298.257223563]') /
          (- EXTRACT(epoch FROM acq_time) +
           EXTRACT(epoch FROM (
              LEAD(acq_time, 1)
              OVER (
                PARTITION BY animal_id
                ORDER BY acq_time)))) * 3600 AS speed_to,
        cos(
          ST_Azimuth((LAG(geom, 1)
            OVER (
              PARTITION BY animal_id
              ORDER BY acq_time))::geography,
              geom::geography) -
          ST_Azimuth(
            geom::geography,
            (LEAD(geom, 1)
            OVER (
              PARTITION BY animal_id
              ORDER BY acq_time))::geography)) AS rel_angle
      FROM telemetry
      WHERE validity_id = 2) t
    WHERE
      rel_angle < -.98 AND
      speed_from > 300 AND
      speed_to > 300);


Figure 2. The resulting dataset once most erroneous points have been flagged. Impossible spikes and improbable movements have been excluded after running the QA/QC SQL code. There may still be a few points that require visual inspection.

Check the counts of QA/QC codes.

SELECT
  telemetry.validity_id,
  lkp_validity.validity_desc,
  count(*) AS n
FROM telemetry
JOIN lkp_validity
  ON telemetry.validity_id = lkp_validity.id
GROUP BY
  telemetry.validity_id,
  lkp_validity.validity_desc
ORDER BY telemetry.validity_id;

Other QA/QC Considerations & Wrap Up

There are a few other classes of points that can be eliminated in this step: points that fall outside project boundaries or occur in impossible locations, if shapefiles of those features are available. That is outside the scope of this post, and such points can easily be excluded upon visual inspection. After inspection, each row with erroneous data can be manually updated with the proper validity_id using telemetry.id.
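For example, a point identified by visual inspection can be corrected individually (the id below is hypothetical), and if a study-area polygon table is available (here a hypothetical study_area table in the same SRID as telemetry.geom), out-of-area points can be flagged in bulk:

```sql
-- Manually flag a single point identified by inspection.
UPDATE telemetry
SET validity_id = 6  -- 'wrong: impossible position'
WHERE id = 1234;     -- hypothetical id

-- Flag valid-coded points outside a (hypothetical) study_area polygon.
UPDATE telemetry
SET validity_id = 4  -- 'wrong: out of study area'
WHERE validity_id = 2
  AND NOT EXISTS (
    SELECT 1 FROM study_area s
    WHERE ST_Within(telemetry.geom, s.geom));
```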

Not every erroneous point will be caught by these queries. In a recent scenario a GPS device didn’t record its timestamps properly, and the resulting trajectory was star-shaped. No amount of automated QA/QC can solve that problem. Instead I re-downloaded the data from the device's onboard storage, which fixed the problem.

telemetR Series

  1. Introduction
  2. Creating an Animal Movement Database
  3. Extending the Database with PostGIS
  4. Importing New Data
  5. QA/QC with Spatial Attributes

upcoming topics
