The Challenge

The 2017 VAST Challenge MC2 required the analysis of sensor data that detected airborne pollutants on the periphery of a wildlife reserve. The objective was to identify possible sources of the detected pollutants from the spatio-temporal patterns in sensor readings.

This page provides details of the giCentre's solution to the challenge, which won the award 'Comprehensive Mini-Challenge 2 Answer'.

Team Members: Jo Wood

Tools Used: Bespoke software designed and built using Processing and the giCentre Utils library, written by the giCentre at City, University of London.

How many hours spent working on submission: Approximately 10 hours to construct software and perform analysis. A further 10 hours assembling the report and video. Additional software written for other VAST Challenges was also used.

Slides from VIS2017, presented by Johannes Liem, will be available after the presentation in Phoenix, Arizona in October 2017.

A short paper, Visual Analytic Design for Detecting Airborne Pollution Sources, outlines some of the design challenges for this kind of problem, and the video below shows the software we built to help answer the challenge questions.

 

Challenge Questions

MC2.1 Characterize the sensors' performance and operation. Are they all working properly at all times?

Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

The nine sensors, each measuring four chemical concentrations, are generally functioning continuously during the three month-long sample periods. Readings are logged at hourly intervals, 24 hours per day. Figure 1 shows the sensor readings (9 sensors; 4 chemical detectors each; 3 month-long periods). Unexpected gaps in the logged readings are symbolised as red discs, with red vertical lines to aid accurate time comparison with other features. In this figure and the others below, the vertical scale measures the square root of chemical concentration in parts per million, so it can show variation among the less extreme values in addition to spikes in chemical concentration.

Figure 1 - The 9 sensors' readings over the three month-long periods showing missing readings in red.

There are two broad patterns to the apparent absences represented by red discs. The first of these shows certain timestamps where missing data points coincide. Figure 2 shows there are 7 points in time where recordings are largely absent: midnight at the start of 2nd April, 6th April, 2nd August, 4th August, 7th August, 2nd December and 7th December. The only readings at these times (circled in Figure 2) were on the 2nd August for Sensor 3 (AGOC-3A and Methylosmolene) and on the 7th December for Sensor 6 (AGOC-3A), Sensor 7 (AGOC-3A and Appluimonia) and Sensor 8 (AGOC-3A and Methylosmolene).

Figure 2 - Missing midnight readings (in red) over the three month-long periods.

Several of these exceptions to the missing data are revealing because for some sensors they occur at or around comparatively rare spikes in chemical concentrations. Of note is the peak in Methylosmolene in Sensor 3 that surrounds the reading for the 2nd August missing data timestamp (Figure 3). This and the following observations suggest there may be some association between the points of consistently missing data and possible unusual chemical readings.

Figure 3 - Zoomed in Sensor 3 chart showing exception to 2nd August data gap coincides with Methylosmolene spike.

Midnight on the 7th December invites particular scrutiny as this occurs around the time of major peaks in Appluimonia (sensor 6, Figure 4), Methylosmolene and AGOC-3A (sensors 7 and 8, Figures 5 and 6).

Figure 4 - Zoomed in Sensor 6 chart showing exception to 7th December data gap coincides with spike in AGOC-3A and Appluimonia.

Figure 5 - Zoomed in Sensor 7 chart showing the AGOC-3A exception to 7th December data gap coincides with spike in Methylosmolene.

Figure 6 - Zoomed in Sensor 8 chart showing the Methylosmolene and AGOC-3A exceptions to 7th December data gap coincide with large spike in Methylosmolene and moderate spike in AGOC-3A.

Missing data at these points may be masking other peaks and require further investigation.

The second pattern observable from Figure 1 is a set of apparently missing readings in all sensors (but especially sensors 4, 5 and 6) in Methylosmolene. In every case other than those noted above, these missing values coincide with double readings attributed to AGOC-3A in the same sensor. This is shown in Figure 7, where duplicate readings are symbolised as green discs with vertical lines depicting the exact timestamps where they occur. In every case these are aligned with (i.e. occur at the same time as) missing Methylosmolene values (red discs on bottom row of each sensor).

Figure 7 - Zoomed in portion of sensors 5 and 6 readings showing alignment of duplicate (green discs) and missing values (red discs) and the (possible) correlation with spikes of AGOC-3A concentrations.
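
The check for missing and duplicate readings described above can be sketched in a few lines. This is a minimal illustration, not the software used for the submission: the function name, the series layout (one list of timestamps per sensor and chemical) and the example dates are all assumptions made for the sketch.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_gaps_and_duplicates(timestamps, start, end, step=timedelta(hours=1)):
    """Return (missing, duplicated) timestamps for one sensor/chemical series.

    Hypothetical helper: assumes readings are expected at every `step`
    between `start` and `end` inclusive (hourly in this challenge).
    """
    counts = Counter(timestamps)
    missing, duplicated = [], []
    t = start
    while t <= end:
        n = counts[t]          # Counter returns 0 for unseen timestamps
        if n == 0:
            missing.append(t)
        elif n > 1:
            duplicated.append(t)
        t += step
    return missing, duplicated


# Made-up three-hour example: AGOC-3A logged twice at 01:00,
# Methylosmolene not logged at 01:00.
t0 = datetime(2016, 4, 1, 0)
hour = timedelta(hours=1)
agoc = [t0, t0 + hour, t0 + hour, t0 + 2 * hour]
methyl = [t0, t0 + 2 * hour]

agoc_missing, agoc_dups = find_gaps_and_duplicates(agoc, t0, t0 + 2 * hour)
methyl_missing, methyl_dups = find_gaps_and_duplicates(methyl, t0, t0 + 2 * hour)

# Timestamps where an AGOC-3A duplicate lines up with a Methylosmolene gap.
aligned = sorted(set(agoc_dups) & set(methyl_missing))
```

Intersecting the duplicate timestamps of one chemical with the missing timestamps of another gives the alignment that the green and red discs show visually.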

Assuming that these duplicate/missing readings are the correct values but have been allocated to the wrong chemical type, the following procedure was applied. For each sensor, the mean and standard deviation were calculated over the three month-long periods for each chemical type. Labelling each pair of AGOC-3A duplicates as D1 and D2, there are two possible allocations: either

D1 -> AGOC-3A and D2 -> Methylosmolene 
or
D1 -> Methylosmolene and D2 -> AGOC-3A 

The z-scores (number of standard deviations away from the mean) for both possible allocations were calculated and the option with the lowest sum of squared z-scores was automatically selected. In other words, each value was allocated to the distribution that it more typically represented.
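
As a rough sketch of this allocation rule, the following compares the two options by their sum of squared z-scores. The function names and the example means and standard deviations are hypothetical and chosen only to illustrate the comparison.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


def allocate(d1, d2, agoc_stats, methyl_stats):
    """Allocate a duplicate pair to (AGOC-3A, Methylosmolene).

    `agoc_stats` and `methyl_stats` are (mean, sd) tuples for one sensor.
    Returns the pair ordered as (agoc_value, methyl_value), choosing the
    option with the lower sum of squared z-scores.
    """
    # Option A: D1 -> AGOC-3A, D2 -> Methylosmolene
    cost_a = z_score(d1, *agoc_stats) ** 2 + z_score(d2, *methyl_stats) ** 2
    # Option B: D1 -> Methylosmolene, D2 -> AGOC-3A
    cost_b = z_score(d2, *agoc_stats) ** 2 + z_score(d1, *methyl_stats) ** 2
    return (d1, d2) if cost_a <= cost_b else (d2, d1)


# Made-up statistics: AGOC-3A mean 1.0, sd 0.5; Methylosmolene mean 3.0, sd 1.0.
agoc_value, methyl_value = allocate(3.2, 0.9, (1.0, 0.5), (3.0, 1.0))
```

With these illustrative numbers, 0.9 is far more typical of the AGOC-3A distribution and 3.2 of the Methylosmolene distribution, so the second option is selected regardless of the order in which the duplicates were logged.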

Figure 8 shows some examples of this allocation. It can be seen that spikes occur in both AGOC-3A and Methylosmolene for these allocated values, so there remains the possibility that the values themselves are incorrect, not simply allocated to the wrong group.

Figure 8 - Zoomed in portion of sensors 5 and 6 readings showing transfer of duplicate readings from AGOC-3A to Methylosmolene (original duplicates shown as solid green discs, transferred values as green circles). Spikes remain at these points in both AGOC-3A and Methylosmolene.

There are a number of related possible explanations for the patterns seen.

  1. Some error in the sensor readings for AGOC-3A and Methylosmolene results in data being attributed to the wrong chemical type.
  2. High concentrations of one or more chemicals could be the cause of the error.
  3. The error above could result in erroneously high readings.
  4. There could be some deliberate malicious attempt to hide high readings.
     

Finally, there is a likely problem with Sensor 4 that shows a consistent increase in chemical concentration readings over time.

Figure 9 - Drift in Sensor 4 readings revealed in CUSUM chart. Here the cumulative deviation from expected readings (based on the first week of April) is shown. All 4 chemical readings show an apparent gradual increase in the underlying concentration in addition to day-to-day noisy variation and irregular spikes.

There is a small possibility that this trend could be triggered by genuine local environmental change, but given this trend is not detected by other nearby sensors, this is regarded as low probability.

MC2.2 Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?

Figure 10 provides an overview of the trends in chemical detection for all sensors. The CUSUM (Cumulative Sum) chart shows the cumulative z-score over time, which allows for trends to be detected more easily than the equivalent raw sensor readings (shown in Figure 1). 'Normal' behaviour was modelled on the mean and variance for week 1 of all sensors and so if beyond that week, chemical levels are consistently above or below normal behaviour, the CUSUM line moves above or below the baseline. This shows that in general levels of all chemicals were higher by December than they were in April (bars thicker and above the baseline towards the right of Figure 10).
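
A minimal sketch of a cumulative z-score of this kind is given below. It assumes the baseline mean and standard deviation are taken from the first `baseline_n` readings of a single series (168 readings would correspond to one week of hourly data); how the baseline was pooled across sensors in the actual software is not reproduced here, and the function name and sample values are illustrative only.

```python
from statistics import mean, stdev


def cusum_z(readings, baseline_n):
    """Cumulative sum of z-scores relative to a baseline window.

    The first `baseline_n` readings define 'normal' behaviour (an
    assumption of this sketch). Values consistently above the baseline
    push the running total upwards, and values below pull it down.
    """
    baseline = readings[:baseline_n]
    mu = mean(baseline)
    sd = stdev(baseline)
    total = 0.0
    result = []
    for r in readings:
        total += (r - mu) / sd
        result.append(total)
    return result


# Made-up series: the first three values form the baseline (mean 2, sd 1),
# after which readings sit above it and the CUSUM line climbs.
line = cusum_z([1, 2, 3, 2, 4, 4], baseline_n=3)
```

Because deviations accumulate, a small but persistent rise that is hard to see in the raw readings shows up as a steadily climbing line.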

Figure 10 - CUSUM charts for all sensors/chemicals over time.

Discounting the apparent increase in the probably erroneous Sensor 4, some of the largest increases in concentration are for Methylosmolene (sensors 2, 5, 7, 8 and 9), Chlorodinine (sensors 1, 5 and 9) and Appluimonia (sensor 5). Sensor 5 shows a general increase in three of the four chemicals, but the fact that AGOC-3A does not appear to increase suggests it does not have the same recording problem exhibited by Sensor 4. However, given the spatial proximity of sensors 4 and 5, both sensors should be checked in order to rule in or out the possibility of serious local contamination.

Figures 11-14 show in more detail the concentration levels of all four chemicals as reported by the set of 9 sensors for the three month-long periods.

Figure 11 - Square root of AGOC-3A levels over time. Taking the square root of ppm concentrations reduces height of extreme peaks revealing detail in lower concentration values. All 9 sensors scaled equally.

Figure 12 - Square root of Appluimonia levels over time. All 9 sensors scaled equally.

Figure 13 - Square root of Chlorodinine levels over time. All 9 sensors scaled equally.

Figure 14 - Square root of Methylosmolene levels over time. All 9 sensors scaled equally.

All four chemicals show a typical, largely random, noise component with occasional 'spikes' of much higher concentrations, typically 5-20 standard deviations from background levels. The most extreme spikes occur for AGOC-3A and Methylosmolene (their noise shows smaller variation when scaled by maximum peak value in Figures 11 and 14). Appluimonia shows the least spiky distribution (Figure 12).

Figures 11-14 also show the anomalous behaviour of Sensor 4 for all chemicals, suggesting that at least a large part of the trend in apparently increasing concentrations is erroneous. The fact that Sensor 5 does not show a similar pattern lends further support to the observation above that levels of Appluimonia, Chlorodinine and Methylosmolene in that area are increasing over time rather than being the product of sensor malfunction.

In addition to Sensor 5, Sensor 9 saw an increase in levels for all chemicals from the end of August (23rd-29th) and through December. Sensors 5 and 9 are geographically proximal and are the sensors that are closest to the interior of the park. The increase isn't immediately obvious from the raw concentration charts (Figures 11-14), but is revealed by the CUSUM charts (Sensor 9 shown in Figure 15). Up until the 23rd of August, detected levels are reasonably stable, but beyond that period we observe a trend of increasing concentrations due to a combination of increased spike frequency and higher general background levels. This is most strongly evident in Chlorodinine but is also present in the other three chemicals.

Figure 15 - CUSUM charts for Sensor 9 showing increase in all chemical concentrations from end of August and through December

MC2.3 Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Spatial analysis of the sensor readings suggests the factories primarily responsible for high concentration chemical releases are as follows:

Kasios Office Furniture:  AGOC-3A; Methylosmolene.

Indigo Sol Boards: Appluimonia.

Roadrunner Fitness Electronics: Chlorodinine.

Radiance ColourTek: No detectable environmental pollution

 

Spatial analysis and visualization were performed largely with a zoomable map view showing the positions of the sensors and factories, along with a timeline view showing a summary of high concentration detection events (see Figure 16). The map view was constructed as part of an integrated spatial analysis and visualization used for all three VAST Mini-challenges.

Figure 16 - Map and timeline view of the sensors and factory locations showing chemical detection events by sensor (vertically ordered) and chemical (colour hue). 

To detect the spread of airborne pollutants, selected peaks in concentration were displayed on the map view as probability cones based on the measured wind direction and strength at the time of the event. The probability cone shows the most likely source of the detected chemical by tracing a vector back in the opposite direction of the wind, expanding the likely region with distance away from sensor. The threshold that defines a detection event visualized in this way can be changed interactively. An example of events detected at 2pm on August 21st is shown in Figure 17.
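
The geometry of such a cone can be sketched as a simple bearing test. This is an illustrative simplification rather than the probability model used in the software: it assumes planar coordinates (x east, y north), a wind direction given as the compass bearing the wind blows from, a fixed half-angle, and an optional maximum range that could be derived from wind speed. All names, coordinates and parameter values are hypothetical.

```python
import math


def in_source_cone(sensor, candidate, wind_from_deg,
                   half_angle_deg=15.0, max_range=None):
    """Return True if `candidate` lies inside the upwind cone of `sensor`.

    `wind_from_deg` is assumed to be the bearing the wind blows FROM, in
    degrees clockwise from north. If the data instead records where the
    wind blows TO, add 180 before calling. `max_range` (optional) caps the
    cone length, e.g. wind speed multiplied by a plausible travel time.
    """
    dx = candidate[0] - sensor[0]   # eastward offset
    dy = candidate[1] - sensor[1]   # northward offset
    dist = math.hypot(dx, dy)
    if dist == 0:
        return True
    if max_range is not None and dist > max_range:
        return False
    # Compass bearing from sensor to candidate (0 = north, 90 = east).
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    # Smallest angular difference between that bearing and the upwind direction.
    diff = abs((bearing - wind_from_deg + 180) % 360 - 180)
    return diff <= half_angle_deg


# Made-up layout: a factory 10 units due west of a sensor at the origin.
sensor_pos = (0.0, 0.0)
factory_pos = (-10.0, 0.0)

westerly = in_source_cone(sensor_pos, factory_pos, wind_from_deg=270)
north_westerly = in_source_cone(sensor_pos, factory_pos, wind_from_deg=315)
```

A fixed angular width means the cone's cross-section grows in proportion to distance from the sensor, which gives the expanding region described above. A fuller treatment would weight locations within the cone rather than return a yes/no answer.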

Figure 17 - Example of two detection events shown on 21st August, 14:00 by Sensors 4 and 9 in map and timeline views. Cones show probable source direction based on measured wind direction and speed at time of event. Wind rose on left shows summary of prevailing wind (yellow segments) and direction and speed of wind at selected event time (white arrow).

By combining probability cones for all chemical concentration peaks, a composite picture is created (Figure 18) showing a spatial structure to the events when considered by chemical type. This composite suggests Indigo Sol Boards as the likely source for (orange) Appluimonia, Roadrunner for (green) Chlorodinine and possibly Kasios for (pink) Methylosmolene and (blue) AGOC-3A. However, more convincing evidence is provided by filtering by chemical type.

Figure 18 - Composite of all chemical detection events of at least 5.5 standard deviations from background levels. Wind probability cones show likely origin of chemicals.

Examining just extreme AGOC-3A detection events (Figure 19), we see the most likely origins are in the region of the Roadrunner Fitness and Kasios Office factories. However, Sensor 6 suggests Roadrunner is an unlikely source, given that very few detection events occurred with the prevailing NW wind (which would have carried the chemical from Roadrunner if it had been the source). In contrast, the AGOC-3A detected by this sensor is carried almost exclusively by westerly winds. With Kasios being almost due west of Sensor 6, this is the most likely origin.
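
The reasoning here amounts to counting, for one sensor, how many above-threshold events arrived on each wind direction. A small sketch follows, assuming each event is a (z-score, wind bearing) pair with the bearing given as the direction the wind blows from. The event values are invented for illustration and are not taken from the challenge data.

```python
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector(wind_from_deg):
    """Map a bearing in degrees to one of eight 45-degree compass sectors."""
    return SECTORS[int(((wind_from_deg % 360) + 22.5) // 45) % 8]


def events_by_sector(events, threshold):
    """Count events at or above `threshold` by wind sector.

    `events` is an iterable of (z_score, wind_from_deg) pairs for one
    sensor and one chemical (a hypothetical layout for this sketch).
    """
    return Counter(sector(w) for z, w in events if z >= threshold)


# Made-up events for a single sensor: two strong readings on westerly
# winds, one on a north-westerly, and one below the threshold.
sample_events = [(7.0, 268), (8.1, 275), (2.0, 315), (6.9, 310)]
tally = events_by_sector(sample_events, threshold=6.6)
```

Comparing such a tally with how often each wind direction occurs overall indicates whether events are concentrated on a particular bearing or simply follow the prevailing wind.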

Figure 19 - AGOC-3A detection events of at least 6.6 standard deviations from background levels.

Similar reasoning can be applied to Methylosmolene, which appears to originate from the same source (Figure 20).

Figure 20 - Methylosmolene detection events of at least 5 standard deviations from background levels.

Once wind direction is taken into account, the distribution of Appluimonia events can be seen as spatially distinct and uniquely focussed around Indigo Sol Boards (Figure 21). Note also that while Sensor 9 provides the primary positive evidence, Sensor 5 also supports this. As noted above, both of these sensors showed an increase in detection levels over time, suggesting the emissions from Indigo Sol Boards have increased since late August.

Figure 21 - Appluimonia detection events of at least 4.9 standard deviations from background levels with single event on the 21st August 16:00 highlighted, marking the start of a period of increased emissions.

Finally, evidence for the origin of the Chlorodinine emissions is provided in Figure 22. Sensor 6 is again particularly discriminating, ruling out Kasios as a source and instead providing compelling evidence for Roadrunner, which is located NW of the sensor.

Figure 22 - Chlorodinine detection events of at least 6 standard deviations from background levels with single event on the 27th August 2am highlighted.
