OR/17/009 Materials and methodology

Tye, A M, Kirkwood, C, Dearden, R, Rawlins, B G, Lark, R M, Lawley, R L, Entwistle, D, and Mee, K. 2017. Environmental factors influencing pipe failures. British Geological Survey Internal Report, OR/17/009.

The work was based on (i) a GIS-based data pre-processing package and (ii) a modelling package. The following sections document the GIS process and the general modelling outline.

GIS based data pre-processing

Data sources

The following YW and BGS data sets were used in the project.
Yorkshire Water provided the following data:

  • Clean water network — shapefile (ArcGIS format)
  • Waste water network — shapefile (Excel)
  • Clean water failures — shapefile (Excel)
  • Waste water failures — shapefile (Excel)
  • BGASP sewer failures — shapefile (Excel)
  • Crossings — shapefile (Excel)
  • AZNP — shapefile (Excel)
  • Raw water temperatures — shapefile (Excel)
  • Severe weather dashboard — shapefile (Excel)
  • WTW-DMA connectivity — shapefile (Excel)
  • CCTV data — shapefile (Excel)
  • Drainage area zone — MID/MIF (MapInfo)
  • Leakage control zone-DMA — MID/MIF (MapInfo)
  • Clean water area — MID/MIF (MapInfo)
  • Operational area — MID/MIF (MapInfo)
  • Waste water area — MID/MIF (MapInfo)
  • Distribution Management Area data pertaining to water source and pipe network pressure

The following BGS and OS data were used in the analysis:

  • GeoSure — vector (ArcGIS)
      • Collapsible ground
      • Compressible ground
      • Landslides
      • Running sand
      • Shrink-swell
      • Soluble rocks
      • Sulphide/Sulphate
  • Parent material — vector (ArcGIS)
  • Corrosivity — vector (ArcGIS)
  • Digital Terrain Model derived topographic indices (e.g. Slope, aspect, CTI, Elevation) — raster (ESRI grids)
  • OS Strategic road/rail — vector (ArcGIS)

Expected outputs for geostatistical analysis

The objectives of the GIS processing package were to:

  • Create a grid across the study area for the analysis. The resolution selected was a 100 m grid (i.e. each grid cell was 100 m x 100 m).
  • Calculate the following statistics for each grid cell:
      • Length of each pipe material in each cell
      • Number of clean water failures for each pipe type
      • Number of waste water failures for each pipe type
  • Summarise the following for BGS data in each cell:
      • Area covered by each classification for each GeoSure layer
      • Area covered by each classification in the soil Corrosivity layer
      • Area covered by selected attributes from the Parent Material layer
      • Summary information for each raster dataset

Data assessment and formatting

Pipe type material

The clean water failure data list the type of pipe material for each pipe that failed (NB: this does not apply to the waste water pipes, as they are all made from either concrete or clay). For the analysis, the clean water pipe failure data were separated by pipe type and grouped into categories of similar material. There were 30 different pipe types, some of which had no code associated with them. Pipe types deemed to be of similar composition were grouped to simplify the analysis (e.g. Plastic). Table 1 lists the 30 categories in the original data and how they were grouped. The groups listed as Cast iron, Plastic, Asbestos Cement and Clay were used for analysis.

Table 1    Pipe type codes, descriptions and materials used in models
Code | Description | Type | Used in model
2 | Copper | Clean |
16 | Dummy | Clean |
4 | Galvanised Steel | Clean |
12 | HDPE | Clean | Plastic Model
22 | HEP30 | Clean |
21 | HPPE | Clean |
10 | LDPE | Clean |
8 | Lead | Clean |
11 | MDPE | Clean | Plastic Model
25 | MoPVC | Clean |
24 | PE100 | Clean |
23 | PE80 | Clean |
14 | Pre-Stressed Concrete | Clean |
26 | PVCa | Clean |
20 | PVCu | Clean |
29 | Stone | Clean |
6 | uPVC | Clean |
7 | Asbestos Cement | Clean/Waste | Waste pipe used in Concrete Model
30 | Brick | Clean/Waste |
1 | Cast Iron | Clean/Waste | Cast Iron Model
28 | Concrete | Clean/Waste |
3 | Ductile Iron | Clean/Waste |
15 | Glass Reinforced Concrete | Clean/Waste |
5 | Steel | Clean/Waste |
AK | Alkathene | Waste |
BL | Bitumen | Waste |
CL | Cement | Waste | Concrete Model
CC | Concrete Box Culvert | Waste |
CSB | Concrete Segments Bolted | Waste |
CSU | Concrete Segments Unbolted | Waste |
GRP | Glass Reinforced Plastic | Waste |
IS | Insituform | Waste |
MAC | Masonry, coursed | Waste |
MAR | Masonry, random | Waste |
NA | Not Applicable | Waste |
PF | Pitch Fibre | Waste |
PL | Plastic | Waste |
PSC | Plastic/Steel Composite | Waste |
PE | Polyethylene | Waste |
PP | Polypropylene | Waste |
PVC | Polyvinyl Chloride | Waste | Plastic Model
RPM | Reinforced Plastic Matrix | Waste |
RL | Resin | Waste |
SI | Spun Iron | Waste |
U | Unknown | Waste |
VC | Vitrified Clay | Waste | Clay Model

Formatting

Prior to GIS analysis, the Yorkshire Water data (provided in Excel spreadsheet format) needed to be cleaned and reformatted so that they could be imported into the GIS (ArcGIS version 10.0). This involved ensuring that the spreadsheets were in the correct format for import into ArcGIS, i.e. column headings of 10 characters or fewer, made up of alphanumeric characters and underscores only.

Grid creation

The analysis was carried out on a custom-built grid covering the full extent of all the data. Since the clean water and waste water areas differ very slightly, the full Yorkshire Water (YW) Operational Area, provided by Yorkshire Water in shapefile format, was used to define the extent of the grid. The first step was to create a rectangular mesh (grid) covering the full extent of the YW operational area. Because the operational area is roughly heart-shaped, approximately half of the cells in the rectangular grid contained no data, and since the study area was very large (180 km E–W by 160 km N–S), the grid was unnecessarily large. To reduce processing time, any cells that did not overlap with the YW operational area were removed from the grid. A comma-separated values (csv) file was created containing the coordinates of the new grid corners. This was used to produce a 100 m x 100 m grid in ArcGIS.
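
The grid construction described above can be illustrated with a short Python sketch. This is not the ArcGIS workflow used in the project: the function names, the triangular test area and the csv column names are all hypothetical, and a cell is kept when its centre lies inside the area polygon, which is a simplification of the overlap test described in the text.

```python
import csv
import io
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_grid(polygon, cell=100.0):
    """Return (cell_id, xmin, ymin, xmax, ymax) for cells whose centre is inside polygon."""
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    x0 = math.floor(min(xs) / cell) * cell
    y0 = math.floor(min(ys) / cell) * cell
    n_cols = math.ceil((max(xs) - x0) / cell)
    n_rows = math.ceil((max(ys) - y0) / cell)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            xmin, ymin = x0 + col * cell, y0 + row * cell
            # Assumption: centre-in-polygon stands in for a full overlap test.
            if point_in_polygon(xmin + cell / 2, ymin + cell / 2, polygon):
                cells.append((len(cells) + 1, xmin, ymin, xmin + cell, ymin + cell))
    return cells


def grid_to_csv(cells):
    """Write the cell corners to csv text (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["cell_id", "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(cells)
    return buf.getvalue()


# Hypothetical triangular "operational area" in metres.
area = [(0.0, 0.0), (450.0, 0.0), (0.0, 450.0)]
cells = build_grid(area)
csv_text = grid_to_csv(cells)
```

A production workflow would test for true polygon overlap, so that cells clipped by the boundary are retained even when their centre falls outside the area.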

Length of pipe per grid square

To calculate the length of pipe per grid square, the IDENTITY tool was used to perform a geometric intersection between the pipe network data and the grid. This cuts each pipe network line at the boundary of each grid square, so that each resulting portion of pipe falls within a single grid square. The length of each portion was then calculated and the lengths summed for each grid square.
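
The effect of this intersection can be sketched in plain Python. The sketch assumes a grid aligned to the coordinate origin and a pipe given as a list of (x, y) vertices in metres; the function name and the sample pipe are hypothetical. It is not the ArcGIS implementation, only an illustration of cutting lines at cell boundaries and summing the pieces.

```python
import math
from collections import defaultdict


def length_per_cell(polyline, cell=100.0):
    """Split a polyline at grid lines and sum the length falling in each (col, row) cell."""
    totals = defaultdict(float)
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        # Collect parameters t in [0, 1] where the segment crosses a grid line.
        cuts = {0.0, 1.0}
        for a, b in ((x0, x1), (y0, y1)):
            if a != b:
                lo, hi = sorted((a, b))
                k = math.floor(lo / cell) + 1
                while k * cell < hi:
                    cuts.add((k * cell - a) / (b - a))
                    k += 1
        ts = sorted(cuts)
        seg_len = math.hypot(x1 - x0, y1 - y0)
        for t0, t1 in zip(ts, ts[1:]):
            # The midpoint of each piece identifies the cell it belongs to.
            tm = (t0 + t1) / 2
            col = math.floor((x0 + tm * (x1 - x0)) / cell)
            row = math.floor((y0 + tm * (y1 - y0)) / cell)
            totals[(col, row)] += (t1 - t0) * seg_len
    return dict(totals)


# Hypothetical pipe: 200 m eastwards, then 100 m northwards.
pipe = [(50.0, 50.0), (250.0, 50.0), (250.0, 150.0)]
lengths = length_per_cell(pipe)
```

Summing the values of `lengths` recovers the total pipe length, which is a useful check that no portion has been lost or counted twice.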

Pipe material per failure

Pipe material per failure is a simple statistical assessment (frequency analysis) of pipe material as identified within the burst datasets supplied by Yorkshire Water. The various pipe materials were grouped broadly into categories such as Cast/Ductile Iron, Plastics, Concrete, Copper and Clay.

Processing BGS/OS vector data

The aim of processing the BGS/OS data was to summarise, for each grid square, either the total area covered by each class (e.g. Class A of GeoSure Shrink-Swell) for each dataset or, in the case of the road data, the length of road of each type. The lengths of road of each type were calculated in the same way as calculating the length of pipeline per grid square, as outlined above.

Processing BGS/OS raster data

Raster datasets of terrain indices (elevation, slope, aspect and compound topographic index, CTI) were derived from NextMap Digital Terrain Model (DTM) data (50 m resolution) held by BGS and processed using ArcGIS and Spatial Analyst. Standard ESRI Spatial Analyst tools and formulae were used to derive slope and aspect. The CTI is a function of both slope and the upstream contributing area per unit width orthogonal to the direction of flow, and is a steady state wetness index. It was calculated using the Spatial Analyst tool and follows the standard formula:

CTI = ln(a/tan B)                Eq. 1

where a = upstream contributing area (m²), derived using the standard flow accumulation tools, and B = slope (radians).
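
As a small worked illustration of Eq. 1, the following Python sketch evaluates the index for a single cell. The function name and the `min_slope` floor are assumptions introduced here (a floor of some kind is needed because tan B is zero on flat ground); the report does not state how flat cells were handled.

```python
import math


def compound_topographic_index(contributing_area, slope_radians, min_slope=1e-3):
    """CTI = ln(a / tan B), with a hypothetical floor on slope to avoid division by zero."""
    slope = max(slope_radians, min_slope)
    return math.log(contributing_area / math.tan(slope))


# Hypothetical cells: the same contributing area on a gentle and a steep slope.
gentle = compound_topographic_index(1000.0, math.atan(0.1))
steep = compound_topographic_index(1000.0, math.atan(1.0))
```

For the same contributing area, the gentler slope gives the higher index, consistent with the interpretation of high CTI as greater potential wetness.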

Back interpolation of all grid data against the vector asset data was performed using MapInfo and Vertical Mapper, providing standard analysis of minimum, maximum, range and cell count of terrain coefficients (per object).

Final GIS results

The final GIS datasets consisted of separate csv files summarising the area or length of each class/road type for each dataset, keyed by the grid cell identifier (ID). These GIS datasets also summarised the length of pipes per grid cell, the number of failures per grid cell and the number of pipe failures of each pipe type per grid cell. They were provided for geostatistical analysis together with a file containing the YW grid cell IDs and the National Grid Easting/Northing of the centre point of each cell.

Topographic and categorical variables used

The continuous variables used in the modelling are shown in Table 2 along with an explanation of their relevance to pipe network failure. Similarly, Table 3 reports the categorical variables used. A continuous variable can take any of an infinite number of possible values, whereas a categorical variable can take only a limited number of values, which in this instance are defined as classes related to a geohazard.

Table 2    The continuous variables used in the model and an explanation of their impacts on pipe networks
Variable | Description | Explanation | Source
Compound Topographic Index (CTI) or Wetness Index | This is an index determining moisture in a 100 x 100 m cell as a function of slope, aspect and the upstream contributory area. It is a steady state wetness index and is commonly used to quantify topographic control on hydrological processes. | The CTI should identify areas of ground of different potential moisture contents by examining the paths surface water may follow across the landscape. Thus cells with a high CTI may have a greater potential for waterlogging and the establishment of corrosion cells. | Calculated using Terrain Analysis on the NEXTMAP 50 x 50 m Digital Terrain Model.
Slope | Average slope within a 100 x 100 m cell. | Slope steepness may dictate (i) drainage rate, (ii) ground stability or (iii) pipe movement. | Calculated using Terrain Analysis on the NEXTMAP 50 x 50 m Digital Terrain Model.
Elevation | Mean elevation within a 100 x 100 m cell. | The effects of elevation might indicate (i) the positioning of the water table, (ii) the effects of altitude and related changes in temperature. | Calculated using Terrain Analysis on the NEXTMAP 50 x 50 m Digital Terrain Model.
A-road | The density of A or major road within a 100 x 100 m cell. | A-class roads are likely to carry more and heavier traffic than other roads; however, they are generally built to better specifications than B- and C-class roads. Linked to vibrations affecting pipe integrity. | Vector map Open OS maps at 1:25000 to 1:50000.
B-road | The density of B road within a 100 x 100 m cell. | Linked to vibrations affecting pipe integrity. | Vector map Open OS maps at 1:25000 to 1:50000.
C-road | The density of C or minor road within a 100 x 100 m cell. | Although C-class roads are unlikely to have the quantity of traffic of other roads, the placement of the pipes might not be as for the other roads. Also linked to vibrations affecting pipe integrity. | Vector map Open OS maps at 1:25000 to 1:50000.
A-Resistivity | This data set identifies the likely resistivity values for the main lithological resistivity type for a geological unit. It includes different environmental situations such as variations in porosity and water saturation. | Resistivity is a controlling factor for the corrosion of metal pipes and is linked particularly to clay content. | Datasets used are DiGMapGB, the National Geotechnical Database, field resistivity values, the Berg algorithm and expert input. Entwisle et al. 2014[1].
B-Resistivity | This data set identifies the likely resistivity values for the secondary lithological resistivity type for a geological unit. It includes different environmental situations such as variations in porosity and water saturation. | Resistivity is a controlling factor for the corrosion of metal pipes and is linked particularly to clay content. | Datasets used are DiGMapGB, the National Geotechnical Database, field resistivity values, the Berg algorithm and expert input. Entwisle et al. 2014[1].
Aspect North | Northness was computed as the sine of aspect (compass direction of slope). | The direction a slope faces is important as it can affect ground thermal regimes and the moisture content of soil. A negative Aspect North is equivalent to Aspect South. | Calculated using Terrain Analysis on the NEXTMAP 50 x 50 m Digital Terrain Model.
Aspect East | Eastness was computed as the cosine of aspect (compass direction of slope). | The direction a slope faces is important as it can affect ground thermal regimes and the moisture content of soil. A negative Aspect East is equivalent to Aspect West. | Calculated using Terrain Analysis on the NEXTMAP 50 x 50 m Digital Terrain Model.
Dwellings | Number of dwellings per 100 x 100 m cell. | This variable can act as a more local proxy for how use can cause pressure changes and stress in the system. | Data obtained from the Office for National Statistics from the 2011 census.

Table 3    The categorical co-variables used in the model and an explanation of their impacts on pipe networks
Variable | Description | Explanation | Source
Parent Material | Dominant soil parent material type, derived from the DiGMap50 surface geology (DiGMap50Plus), in the 100 x 100 m cell. | Used as a generalised description of the geology of the parent material (e.g. granite, sandstone) and its possible influence on corrosivity or pipe failure. | BGS DiGMapPlus-Parent Material 1:50 000 scale (previously called the soil parent material map). (Lawley, 2011[2]).
Dominant mineralogy | Dominant bulk mineralogy, derived from the DiGMap50 surface geology (DiGMap50Plus) in the 100 x 100 m cell (e.g. dominantly carbonate/siliceous etc). | This is a very simplified classification of mineralogy and can be used to assess whether certain mineralogies (e.g. silica rich, carbonate rich, acid (igneous)) have an influence on corrosion or pipe failure. | BGS DiGMapPlus-Parent Material 1:50 000 scale (previously called the soil parent material map). (Lawley, 2011[2]).
G-Grain | The typical grain size of soil parent materials, as derived from the DiGMap50 surface geology (DiGMap50Plus), in the 100 x 100 m cell. | This gives an indication of the dominant particle size (clay, silt, sand) of the soil parent material or subsoil. Will provide information regarding drainage. | BGS DiGMapPlus-Parent Material 1:50 000 scale (previously called the soil parent material map). (Lawley, 2011[2]).
Soil Group | The typical grain size of surface soils (as predicted from the DiGMap50 surface geology (DiGMap50Plus)), for the 100 x 100 m cell. | This gives an indication of the dominant particle size (clay, silt, sand) of the surface soil. Will provide information regarding drainage. | BGS DiGMapPlus-Parent Material 1:50 000 scale (previously called the soil parent material map). (Lawley, 2011[2]).
Engineered-materials | Classification of the parent material units for use as engineering fill (partly based on the Highways Agency series 600). A description of these materials expected within the 100 x 100 m cell. | This provides information regarding the behaviour of the soil with respect to it being used as a backfill material, e.g. presence of sulphides or sulphates. | BGS DiGMapPlus-Use as Engineered Fill 1:50 000 scale. (Entwisle et al. 2013).
Collapsible ground | Collapsible ground hazard from GeoSure and applied to DiGMap-50Plus. | Collapsible ground occurs in certain deposits that consolidate very rapidly when loaded and then saturated. The resulting ground strain could affect pipework, potentially weakening corroded pipe and leading to failure. | Obtained from the BGS collapsible ground dataset consisting of 5 hazard categories. It uses the DiGMapGB-50, BGS documents on the geology and the BGS National Geotechnical Properties Database. It is based on known or likely behaviour of geological units. There is some input of expert knowledge. Aldiss, D, Diaz Doce & Northmore 2014[3], Booth et al. 2010[4], Lee and Diaz Doce (2010, 2014[5]).
Compressible Ground | Compressible ground hazard from GeoSure and applied to DiGMap-50Plus. | Compressible ground is highly deformable under load or water removal, and includes very soft clay and peat. Differential movement at the interface between compressible ground and less compressible ground, and the variation in compressibility within compressible ground, might affect the pipes. | Obtained from the BGS compressible ground dataset consisting of 5 hazard categories. The dataset was created using DiGMapGB-50, Superficial Thickness Model, the BGS National Geotechnical Properties Database and expert knowledge. Booth et al. 2010[4], Jones, L D, et al. 2015[6], Lee and Diaz Doce (2010, 2014[7]).
Landslides | Landslide hazard from GeoSure and applied to DiGMap-50Plus. | Ground movement due to landslides could weaken or break pipes, including corroded pipes. | Obtained from the BGS landslide ground dataset consisting of 5 hazard categories based on the geology and their likely behaviour, mapped landslides, the slope angle from a digital terrain model (DTM) and expert judgement. Booth et al. 2010[4], Dashwood et al. 2014[8], Lee and Diaz Doce (2010, 2014[7]).
Running Sand | Running sand hazard from GeoSure and applied to DiGMap-50Plus. | Running sand can occur where saturated sand or coarse silt is intercepted by an excavation or borehole. The flow of sand into the excavation can cause ground movement affecting pipe stability, and could weaken corroded pipe. | Obtained from the BGS running sand dataset consisting of 5 hazard categories based on the known behaviour of geological units and expert knowledge. Booth et al. 2010[4], Lee and Diaz Doce (2010, 2014[7]).
Shrink swell | Shrink-swell hazard from GeoSure and applied to DiGMap-50Plus. This hazard is usually identified from. | Increases in water content cause swelling and drying causes shrinkage. The ground movement can damage already corroded pipes. | Obtained from the BGS Shrink Swell dataset which consists of 5 hazard categories based primarily on the modified plasticity index, which is derived from the liquid limit, plastic limit and the percentage of particles less than 0.425 mm. It uses the DiGMapGB-50, BGS superficial thickness model, BGS National Geotechnical Properties Database, a simplified glacial till layer and some input from expert opinion. Booth et al. 2010[4], Diaz Doce et al. 2015[9], Lee and Diaz Doce (2010, 2014[7]); Jones and Terrington (2011)[10].
Soluble rocks | Soluble rocks (karst) hazard from GeoSure and applied to DiGMap-50Plus. | Ground movement due to dissolution of certain geological units can damage pipework, including corroded pipes. | Obtained from the BGS soluble rocks dataset that consists of 5 categories of the likely occurrence of dissolution features. It is based on the geological units (DiGMapGB-50), a digital terrain model, superficial thickness model, Glacial limits dataset, BGS superficial permeability data, known occurrences of solution features and expert opinion. Booth et al. 2010[4], Farrant et al. 2015[11], Lee and Diaz Doce 2014[7].
Soil Corrosivity | This dataset classifies soils based on their corrosive properties. It is based on the CIPRA Index, and the corrosion classification occupying the greatest area in the 100 x 100 m cell is used. | Corrosive ground can, potentially, damage some types of pipe. | Obtained from the BGS Ferrous Corrosion dataset, which is based on 5 categories of soil properties. It is based on the CIPRA classification scheme. (Tye et al. 2012).
Sulphide/Sulphate | This is the BGS sulphide/sulphate dataset. | The presence of sulphide minerals can cause corrosion through their oxidation and the formation of H2SO4. The presence of elevated sulphate is often associated with the dissolution of gypsum deposits, thus causing subsidence. |
Distribution Management Area (data) | This is YW data and is data for water source. | The source of water can have an effect on pipe networks. This can either be through the chemical nature of the water or the processing that is required before entering the system. |

Statistical modelling — outline

The data available after the pre-processing (see GIS based data pre-processing) were of two kinds. The first were records of pipe bursts for a particular pipe material and for clean or waste water, each with a particular location in space. All pipe bursts over the previous 10 years (2004 to 2014) were considered as the target variable of interest. The second were the potential explanatory factors. These were mapped on 100 × 100 m cells and include those data listed previously (Table 3). The set of pipe bursts for a particular pipe material and water type was treated as a realization of a spatial point process, specifically a non-homogeneous Poisson spatial point process (Diggle, 2013[12]). Events of such a process are mutually independent, but the expected number of events per unit area (the density of the process) may vary spatially. The ‘spatstat’ package (Baddeley and Turner, 2006[13]) for the R programming language was used for this purpose. It allows the estimation of non-homogeneous Poisson models by maximum likelihood. In the fitted models, the density of the Poisson process (i.e. the expected number of bursts per unit area) was modelled as a function of possible environmental explanatory factors.

In order to estimate meaningful models it is necessary to define the domain in two-dimensional space in which the point process is defined. For this purpose the data on pipe distribution were used (density (length) of pipes by material and water type). A burst can only be recorded in a 100 × 100 m cell where the pipe of interest occurs. In the ‘spatstat’ package a mask could be defined from the pipe density data to specify the domain within which events can possibly occur. It was also necessary to consider pipe density as an explanatory factor in the model. Pipe density varies across the Yorkshire Water area, and this inevitably induces variations in the density of the modelled spatial point process for bursts, even if external risk factors are spatially uniform. A ‘null’ model for the density of the Poisson process therefore included, by default, the log density of the pipe type (construction material) and water type (clean or waste) of interest. The other explanatory factors could then be considered, and an assessment made as to whether or not they provide additional information on the expected local density of pipe bursts.

Two alternative models for the density of pipe burst events could be compared because, in the ‘spatstat’ package, they are estimated by maximum likelihood. Two models are said to be nested if the simpler model can be regarded as a special case of the more complex one. Thus, if model A contains only log pipe density as an explanatory factor, and model B contains log pipe density and compound topographic index (CTI), then model A is nested in model B, since model A is equivalent to model B with the coefficient for CTI set to zero. The null hypothesis that CTI is unrelated to the density of pipe bursts could then be evaluated by comparing the maximized log-likelihood for model A, l_A, with that for model B, l_B. Under the null hypothesis the statistic

L = 2(l_B - l_A)       (Eq. 2)

is distributed asymptotically as chi-squared with degrees of freedom equal to the difference between the number of parameters estimated for the two models (1 here). Note that this would not be true if model A were equivalent to model B with the coefficient for CTI set at a boundary (Cox and Hinkley, 1990[14]).
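
The test can be illustrated numerically with a short Python sketch. The log-likelihood values below are hypothetical, chosen only so that the statistic sits at the conventional 5% critical value for one degree of freedom, and the function name is an assumption. The p-value uses the identity that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so the sketch is restricted to that case.

```python
import math


def likelihood_ratio_test(loglik_a, loglik_b):
    """Return (L, p) for nested models differing by one parameter (df = 1 only)."""
    stat = 2.0 * (loglik_b - loglik_a)
    # For df = 1: P(chi2 > L) = P(|Z| > sqrt(L)) = erfc(sqrt(L / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical maximized log-likelihoods for model A (null) and model B (null + CTI).
l_a = -1000.0
l_b = -998.0792705896529
stat, p_value = likelihood_ratio_test(l_a, l_b)
```

With these values the statistic is about 3.84 and the p-value about 0.05; a larger improvement in log-likelihood would give a smaller p-value.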

A more general comparison between models, not necessarily nested, can be made by computing Akaike's information criterion, AIC (Akaike, 1973[15]), which is a measure of the relative quality of statistical models for a given set of data. If a model has P parameters, and the maximized log-likelihood for its fit is l, then

AIC = 2P - 2l.       (Eq. 3)
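
A brief Python illustration of Eq. 3 follows. The candidate models, parameter counts and log-likelihoods are hypothetical values used only to show the comparison; they are not results from this study.

```python
def aic(n_params, loglik):
    """Akaike's information criterion: AIC = 2P - 2l."""
    return 2 * n_params - 2 * loglik


# Hypothetical candidates: name -> (number of parameters P, maximized log-likelihood l).
candidates = {
    "pipe density only": (2, -1000.0),
    "pipe density + CTI": (3, -995.0),
    "pipe density + slope": (3, -999.5),
}
scores = {name: aic(p, l) for name, (p, l) in candidates.items()}
best = min(scores, key=scores.get)
```

In this made-up example the slope model has a higher log-likelihood than the null model but a worse AIC, because the gain does not offset the penalty for the extra parameter.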

It can be shown that selecting the model that minimizes AIC in some set of alternative models minimizes the expected information loss from the selection process.

In this study a two-stage approach was taken to the selection of predictor variables for the non-homogeneous density of bursts of pipes for a particular water type (clean or waste) and made from a particular material. First, a list of potential explanatory factors was elicited from experts at Yorkshire Water and presented in order of importance according to expert opinion. We then used this list to propose and fit test models as follows.

  1. A ‘null’ model, as described above, where the local density of bursts is a function of the log density of pipes of the target type and material in the local 100 × 100-m cell.
  2. A series of models, each with log density of the pipe type and material and just one additional explanatory variable taken from the elicited list. Each model could be compared to the null model by means of the log-likelihood ratio.
  3. A sequence of models in which each predictor was added to the set in turn, adding predictors in the order that they were presented in the elicited list. The improvement to the model achieved by adding each predictor could be tested by comparing it with the previous model in the sequence using the log-likelihood ratio statistic L.

On this basis the predictors identified by Yorkshire Water's experts were examined, and an assessment made of the statistical evidence that they are informative about the expected density of bursts. Second, additional predictors that were not among the factors identified by the experts were considered.

When selecting candidate variables to add to those identified by elicitation, we avoided adding possible predictors that were correlated with variables already in the model. For this reason the correlations between all available continuous predictor variables were examined. We also examined the principal components of the correlation matrix. This allowed identification of additional predictor variables that were not correlated with predictors already in the model from the expert elicitation. The degree of association between a categorical and a continuous predictor variable was measured by computing the coefficient of determination for a simple linear model in which observations corresponding to different levels of the categorical variable have different mean values of the continuous variable. The square root of the coefficient of determination is comparable to the correlations between continuous variables. Having identified a subset of additional predictors, their potential value in non-homogeneous Poisson models for the density of bursts was tested in two ways:

  1. As with variables identified in the expert elicitation, models were produced in which the candidate variable was the sole predictor and compared with the null model on the log-likelihood ratio.
  2. Starting with the set of predictors identified from among those proposed from the elicitation, each of the additional predictors was added in turn, testing the improvement to the model on the log-likelihood ratio.
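
The association measure between a categorical and a continuous predictor described above can be sketched in Python as follows. This assumes the simple linear model is a one-way model of group means, so that the coefficient of determination is the between-group sum of squares divided by the total sum of squares; the function name and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def categorical_association(categories, values):
    """Return (R^2, sqrt(R^2)) for a model giving each category its own mean."""
    overall_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    ss_total = sum((v - overall_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    r_squared = ss_between / ss_total
    return r_squared, math.sqrt(r_squared)


# Hypothetical cells: a grain-size class and a continuous covariate value.
grain = ["clay", "clay", "sand", "sand"]
cti = [1.0, 3.0, 5.0, 7.0]
r_squared, r_equivalent = categorical_association(grain, cti)
```

Here the two classes have means of 2 and 6 around an overall mean of 4, giving a coefficient of determination of 0.8 and a correlation-like value of about 0.89.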

The ‘spatstat’ package provides a useful diagnostic for assessing a model once it has been fitted: the ‘lurking variable’ plot. The expected number of bursts within a local sub-region can be computed from the fitted model, and the difference between the actual number of bursts in the sub-region and this expected number is a residual. The residuals were plotted as a map. The lurking variable plot shows the accumulated sum of residuals from south to north or west to east across the study region. The accumulated residuals over the whole region sum to zero, but within the region they fluctuate, and the plot includes an envelope showing the range of deviation from zero that is consistent with random fluctuation. Excursions outside this envelope help to identify variations in the apparent density of the process that the factors included within the model do not account for.
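
The accumulation behind a lurking variable plot can be sketched in a few lines of Python. This is a simplified illustration, not the ‘spatstat’ implementation: it works on per-cell observed and expected counts (all values below are hypothetical), takes the residual as observed minus expected, and omits the envelope.

```python
def cumulative_residuals(coords, observed, expected):
    """Accumulate raw residuals (observed - expected) in order of a coordinate."""
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    running_total = 0.0
    curve = []
    for i in order:
        running_total += observed[i] - expected[i]
        curve.append((coords[i], running_total))
    return curve


# Hypothetical cells: easting of cell centre, observed bursts, expected bursts.
eastings = [300.0, 100.0, 200.0]
observed = [2, 0, 1]
expected = [1.0, 1.0, 1.0]
curve = cumulative_residuals(eastings, observed, expected)
```

In this example the curve dips to -1 in the west, where the model over-predicts, and returns to zero at the eastern edge, where the expected and observed totals balance.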

Model outputs

Model coefficients and maps

The model outputs come in two forms. Firstly, the models are composed of a series of covariates that are ranked in order of their importance. The ranking uses the log-likelihood ratio (LLR) for the model with pipe density and the covariate, relative to the null model (pipe density as the only covariate). Ranking on LLR is equivalent to ranking on the AIC. Each continuous covariate used in the model has a coefficient with either a positive or a negative sign, which indicates whether the covariate is positively or negatively associated with the number of pipe failures per unit length. Categorical variables differ in that a coefficient is produced for each class of the covariate. These can then be interpreted as to how they may be affecting pipe failure.

For interpretation of the coefficients of the continuous and categorical variables and their influence on failures in the pipe network, we use those produced when single covariates are added to the Null model. Coefficients are also produced when the covariates are added sequentially to the Null model. These sequential coefficients are used when we assess the influence of each covariate compared to the others in the production of heat maps (see 2.4.2), because in the sequential model there is a common intercept.

The second output comes in the form of maps of the modelled area, along with data regarding the cumulative sum of raw residuals of the model along the X and Y co-ordinates of the spatial area being modelled. These graphs of the cumulative sum of raw residuals are known as Lurking Variable plots. Ideally the cumulative raw residuals should lie within the limits imposed by the elliptical feature centred on zero, which represents the error in the model that can be considered random noise. Where the curve lies beyond this ellipse, the graphs show the extent to which the model is under- or over-predicting the number of expected pipe failures per unit length (density) of pipe, indicated by the raw sum of residuals. A negative residual suggests that the model is over-predicting (blue colour in Figures) whilst a positive residual suggests that the model is under-predicting (red colour in Figures). The overall performance of different models can be compared using the sums of the raw residuals from the lurking variable plots, with lower values indicating improved model fits.

Heat maps

The basic model for the non-homogeneous intensity of the pipe failure process takes the form

λ = exp{β_0 + β_1 x_1 + β_2 x_2 + ...}       Eq. 4

where β_0 is a constant intercept, β_1 is the coefficient for the first covariate x_1, and so on. For any cell in the map this gives the expected intensity of the process.

The total intensity maps were created by multiplying the individual intensity maps (each of the form exp(β_i x_i)). The resultant predictions of pipe failure intensity λ are extremely small numbers (e.g. 10^-61), so to make them user-friendly the total intensity maps were scaled using the ‘scale’ function in R, which for each cell subtracts the mean value of the output over all cells and divides by the standard deviation. The scaled output values fall in the range of single to double digit numbers and are therefore more easily symbolised, labelled and interpreted using GIS software.
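
The calculation and scaling described above can be sketched in Python as follows. The coefficients and covariate values are hypothetical, the function names are assumptions, and the standardisation uses the sample standard deviation in the manner of R's default scaling; this is an illustration rather than the code used in the study.

```python
import math
import statistics


def total_intensity(intercept, coefs, covariates):
    """Expected intensity for one cell: exp(b0 + sum of b_i * x_i)."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, covariates)))


def standardise(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical model: intercept and two coefficients, applied to four cells.
b0, betas = -12.0, [0.5, -0.02]
cells = [[1.0, 10.0], [2.0, 5.0], [0.5, 20.0], [3.0, 0.0]]
intensities = [total_intensity(b0, betas, x) for x in cells]
scaled = standardise(intensities)

# The same intensity written as a product of multiplicative components.
as_product = math.exp(b0) * math.exp(betas[0] * cells[0][0]) * math.exp(betas[1] * cells[0][1])
```

The scaled values have zero mean and unit standard deviation, so they rank cells relative to one another rather than giving absolute failure rates.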

In addition, we can decompose equation 4 into multiplicative components [exp{βixi}]. These components can be used to show how any particular covariate contributes to the expected intensity of failures across the region in heat maps. Some caution must be exercised in the interpretation of these because of the possibility of correlations among the covariates. These individual heat maps have been plotted using a standardised scale using the lowest and highest model coefficients across all the significant covariates. This enables us to plot spatially the impact of each model covariate relative to each other and identify those covariates which contribute most to the overall intensity.

There is a difference in how the values for continuous and categorical covariates are obtained for each cell. For continuous covariates (e.g. slope, length of a road type), the coefficient for each covariate from the final sequential model is multiplied by the covariate value for the cell, for example the mean slope within a square or the length of a particular road type, to give the intensity value.

For categorical covariates, the coefficient for each class obtained from the final model is the intensity value assigned to that cell.
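The two cases can be sketched as below, in Python for illustration. The sketch assumes one reading of the text: a continuous covariate contributes coefficient × cell value, a categorical covariate contributes the coefficient of the class present in the cell, and the multiplicative component is the exponential of that term, as in [exp{βixi}]. All coefficient values, covariate names and the function name `cell_terms` are hypothetical.

```python
import math

# Hypothetical coefficients from a fitted model.
continuous_coefs = {"slope": 0.04}
categorical_coefs = {
    "shrink_swell": {"low": 0.0, "moderate": 0.3, "high": 0.7},
}

def cell_terms(cell):
    """Per-covariate terms of the linear predictor for one cell."""
    terms = {}
    for name, beta in continuous_coefs.items():
        # Continuous: coefficient multiplied by the cell's value.
        terms[name] = beta * cell[name]
    for name, class_coefs in categorical_coefs.items():
        # Categorical: look up the coefficient of the class in the cell.
        terms[name] = class_coefs[cell[name]]
    return terms

# Hypothetical cell: mean slope of 5.0 and a 'moderate' hazard class.
cell = {"slope": 5.0, "shrink_swell": "moderate"}

terms = cell_terms(cell)
# Assumed multiplicative component for each covariate.
components = {name: math.exp(t) for name, t in terms.items()}
print(terms, components)
```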

References

  1. Entwisle, D C, White, J C, Busby, J P, Lawley, R S, and Cooke, I L. 2014. Electrical resistivity model of Great Britain: User Guide. British Geological Survey Open report, OR/14/030. British Geological Survey, Keyworth, Nottingham, UK. 15pp.
  2. Lawley, R. 2011. The Soil Parent Material Database: A User Guide. British Geological Survey Open Report, OR/08/034. 53pp.
  3. Aldiss, Diaz Doce & Northmore. 2014. GeoSure Version 7 methodology: Collapsible Deposits. British Geological Survey Internal Report IR/14/016. British Geological Survey, Keyworth, Nottingham, UK. 31pp.
  4. Booth, K A, Diaz Doce, D, Harrison, M & Wildman G (editors). 2010. User guide for the British Geological Survey GeoSure dataset. British Geological Survey Open report OR/10/006. British Geological Survey, Keyworth, Nottingham, UK. (www.nora.nerc.ac.uk/13840)
  5. Lee, K A and Diaz Doce. 2014. User guide for the British Geological Survey GeoSure dataset. British Geological Survey Open report OR/14/012. British Geological Survey, Keyworth, Nottingham, UK. 17pp.
  6. Jones, L D, Diaz Doce, Lee, K A & Entwisle, D C 2015. GeoSure Version 7 Methodology: Compressible Ground. British Geological Survey Internal report IR/14/015. British Geological Survey, Keyworth, Nottingham, UK. 30pp.
  7. Lee, K A and Diaz Doce. 2014. User guide for the British Geological Survey GeoSure dataset. British Geological Survey Open report OR/14/012. British Geological Survey, Keyworth, Nottingham, UK. 17pp.
  8. Dashwood, C, Diaz Doce, D, and Lee, K A. 2014. GeoSure Version 7 methodology: Landslides (Slope Instability). British Geological Survey Internal report, IR/14/014. British Geological Survey, Keyworth, Nottingham, UK. 36pp.
  9. Diaz Doce. D, Jones, L D, and Lee K A. 2015. GeoSure Version 7 Methodology: Shrink Swell. British Geological Survey Internal report, IR/14/017. British Geological Survey, Keyworth, Nottingham, UK. 28pp.
  10. Jones, L D & Terrington, R. 2011. Modelling volume change potential in the London Clay. Quarterly Journal of Engineering Geology and Hydrogeology, 44, 109–122. http://nora.nerc.ac.uk/13629/
  11. Farrant, A R, Cooper, A H, and Diaz Doce. 2015. GeoSure Version 7 methodology: Soluble rocks (Dissolution). British Geological Survey Open report, OR/14/012. British Geological Survey, Keyworth, Nottingham, UK. 63pp.
  12. Diggle, P J. 2013. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns (third edition) Boca Raton:Chapman and Hall/CRC Press.
  13. Baddeley, A, and Turner, R. 2006. Modelling spatial point patterns in R. In Baddeley, A, Gregori, P, Mateu, J, Stoica, R, and Stoyan, D (Eds). Case Studies in Spatial Point Process Modeling, Lecture Notes in Statistics Volume 185, pp.23–74.
  14. Cox, D R, and Hinkley, D V. 1990. Theoretical Statistics. Chapman & Hall, London.
  15. Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory, edited by: Petrov, B N, and Csaki, F, Akadémiai Kiadó, Budapest, 267–281.