WHY GEOSTATISTICS

 

We have long used statistical tools to deal with variables, and in that setting the variables are treated as random. However, an interesting question arises: are all the variables we use or come across in daily life random? What if they are not? Some variables might be totally deterministic, while others might fall somewhere between random and totally deterministic variables. Does it not make sense to go deeper into the field of statistics and look for ways to modify statistical principles so that they can deal with such non-random variables?

 

Geostatistics is the branch of applied statistics developed by Georges Matheron of France in the 1960s. It deals with Regionalized Variables (RVs), which are variables distributed in space. These variables fall between random variables and completely deterministic ones: they are partially deterministic within a spatial domain, but beyond that they behave as random. Unlike purely random variables, RVs exhibit spatial continuity; examples include ozone concentration in the atmosphere and the distribution of pollutants in groundwater. However, the variation in RVs is so complex that it cannot be described by any deterministic function, and therefore a stochastic approach is called for.

 

Geostatistical data, often called field data, are data collected at fixed locations. The sampling locations are fixed, but the domain they are drawn from is generally spatially continuous.

 

To explain the distribution of an RV, geostatistics uses the concept of random functions (RF), whereby the set of attribute values z(x) at all locations x is considered a particular realization of a set of spatially dependent random variables Z(x). To make this approach acceptable, certain assumptions have to be made, which are introduced under the hypothesis of stationarity.

 

 

Hypothesis of Stationarity

 

 

 

 

Measure of spatial dependence

 

Semivariance is a measure of the degree of spatial dependence between samples: it is half the average squared difference between values separated by a given distance. The magnitude of the semivariance between points depends on the distance between them. A smaller distance generally yields a smaller semivariance and a larger distance a larger semivariance. The plot of the semivariances as a function of separation distance is referred to as a semivariogram or simply a variogram (as it is commonly called these days). The variogram is a measure of how quickly things change on average. The underlying principle is that, on average, two observations closer together are more similar than two observations farther apart. The semivariance increases with distance until, at a certain distance from a point, it equals the variance around the average value and no longer increases. This produces a flat region on the semivariogram called the sill. The distance from the point of interest to where the flat region begins is termed the range or span of the regionalized variable. Within this range, locations are related to each other, and all known samples contained in this region, also referred to as the neighborhood, must be considered when estimating the unknown point of interest.
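The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`empirical_semivariogram`, `spherical_model`), the distance bins, and the four made-up sample values are all assumptions introduced here. The first function averages half the squared differences of all sample pairs falling in each distance bin; the second is the commonly used spherical model, included only to show how a sill and a range appear in a fitted curve (here `sill` is taken to mean the total sill, nugget included).

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Return (mean lag, semivariance, pair count) for each non-empty distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of all unique sample pairs (i < j)
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dist > lo) & (dist <= hi)
        n = int(mask.sum())
        if n == 0:
            continue  # skip bins that contain no pairs
        lags.append(dist[mask].mean())
        gammas.append(0.5 * sqdiff[mask].mean())  # half the mean squared difference
        counts.append(n)
    return np.array(lags), np.array(gammas), np.array(counts)

def spherical_model(h, sill, range_, nugget=0.0):
    """Spherical variogram model: rises from the nugget and is flat (= sill) beyond the range."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, gamma, 0.0)  # semivariance is zero at zero separation

# Hypothetical transect: four samples spaced one unit apart
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
values = np.array([1.0, 2.0, 2.0, 4.0])
lags, gammas, counts = empirical_semivariogram(coords, values, [0.0, 1.5, 2.5, 3.5])
print("lags:", lags, "semivariances:", gammas, "pairs:", counts)

# Model values at a few lags; beyond range_ the curve stays at the sill
model = spherical_model([0.0, 5.0, 10.0, 20.0], sill=2.0, range_=10.0)
print("spherical model:", model)
```

With real data, many more pairs per bin are needed before the empirical points are stable enough to fit a model to.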

 

Although RVs are spatially continuous, logistical constraints mean it is not always possible to sample every location. Therefore, unknown values in space must be estimated from data taken at the specific locations that can be sampled. The beauty of geostatistical interpolation/prediction lies in its ability to predict variables at such unsampled locations along with a measure of the uncertainty associated with the predictions.

 

Geostatistical interpolation / Simulation

 

Geostatistical methods of interpolation, popularly known as kriging, are based on the rationale that values at points close together in space are more likely to be similar than values at points further apart. Although kriging has its roots in the mining industry, it is now increasingly used in environmental modeling.

 

Simple kriging (SK) is the simplest method of kriging, as it considers the mean (m) to be known and constant over the study area. It is probably the least applicable, as it requires prior knowledge of the mean, which is rarely available in soil science. Ordinary kriging (OK) is used when the mean is unknown. It accounts for variations in the mean over the study area by assuming stationarity only within the kriging neighborhood.
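The difference between the two methods can be seen in a small sketch. Everything specific here is an assumption made for illustration: the exponential covariance model `exp_cov` with its `sill` and `length` parameters, the helper names, and the four made-up samples. Simple kriging solves for weights directly and applies them to deviations from the known mean; ordinary kriging adds one constraint (weights sum to one) with a Lagrange multiplier, which removes the need to know the mean. Both functions also return the kriging variance, the measure of uncertainty attached to the prediction.

```python
import numpy as np

def exp_cov(h, sill=1.0, length=10.0):
    """Exponential covariance model (an assumed choice for this sketch)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / length)

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def simple_kriging(coords, values, target, mean, cov=exp_cov):
    """Predict at one target location, assuming the mean is known."""
    C = cov(pairwise_dist(coords, coords))               # data-to-data covariances
    c0 = cov(pairwise_dist(coords, target[None, :]))[:, 0]  # data-to-target covariances
    w = np.linalg.solve(C, c0)
    estimate = mean + w @ (values - mean)
    variance = cov(0.0) - w @ c0
    return estimate, variance

def ordinary_kriging(coords, values, target, cov=exp_cov):
    """Predict at one target location with an unknown (locally constant) mean."""
    n = len(values)
    A = np.ones((n + 1, n + 1))          # last row/column enforce sum(w) = 1
    A[:n, :n] = cov(pairwise_dist(coords, coords))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(pairwise_dist(coords, target[None, :]))[:, 0]
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]              # weights and Lagrange multiplier
    estimate = w @ values
    variance = cov(0.0) - w @ b[:n] - mu
    return estimate, variance, w

# Hypothetical samples at the corners of a 10 x 10 square
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
values = np.array([1.0, 3.0, 2.0, 4.0])
target = np.array([5.0, 5.0])

est_ok, var_ok, w = ordinary_kriging(coords, values, target)
est_sk, var_sk = simple_kriging(coords, values, target, mean=2.5)
print("OK estimate, variance, weights:", est_ok, var_ok, w)
print("SK estimate, variance:", est_sk, var_sk)
```

In this sketch all samples are used for every prediction; restricting the system to samples inside a search neighborhood is what lets ordinary kriging follow a mean that drifts across the study area.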

 

The advantage of kriging over earlier interpolation methods is that it provides information on interpolation errors. Knowledge of the spatial correlation structure may also be used to generate sets of realizations (simulations) of the attribute z, which can be of great value for studying error propagation through spatial models that may be linked to a GIS.

 

The above-mentioned univariate geostatistical techniques of kriging assume stationarity in the data, i.e., the dataset does not exhibit a trend. This assumption is often not met by field-sampled data sets, which poses a severe limitation on the use of ordinary kriging. Therefore, when the variation exhibits a trend, other methods of interpolation should be tried.
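One simple way to check whether a data set exhibits a trend is to fit a first-order (planar) surface to the coordinates by least squares and see how much of the variation it explains. The sketch below is only one possible diagnostic and is not taken from the text above; the function name `fit_linear_trend` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_linear_trend(coords, values):
    """Fit z = b0 + b1*x + b2*y by least squares; return coefficients, residuals, R^2."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    X = np.column_stack([np.ones(len(values)), coords])  # design matrix [1, x, y]
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    residuals = values - X @ beta
    ss_total = np.sum((values - values.mean()) ** 2)
    r2 = 1.0 - np.sum(residuals ** 2) / ss_total
    return beta, residuals, r2

# Synthetic example: 50 random locations with a purely planar trend
rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 100.0, size=(50, 2))
values = 2.0 + 0.5 * coords[:, 0] - 0.25 * coords[:, 1]

beta, residuals, r2 = fit_linear_trend(coords, values)
print("coefficients:", beta, "R^2:", r2)
```

An R^2 close to one suggests a strong trend, in which case the stationarity assumption is doubtful for the raw values; the residuals are then a more natural candidate for variogram analysis than the data themselves.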

 

Stochastic Simulation

 

Kriging techniques minimize local error variance and tend to smooth data; ordinary kriging has a tendency to under-predict at locations of higher data values and over-predict where data values are lower. Environmental modeling and decision making often require predictions of extreme values of the attribute over the region of interest, and in such cases we wish to interpolate a rough, realistic surface or volume. Unlike kriging, stochastic simulation does not aim at minimizing local error variance but focuses on reproducing statistics such as the sample histogram or the semivariogram model, in addition to honoring the data values. The two methods have opposing goals: kriging aims at local accuracy by minimizing a covariance-based error variance, while simulation aims at reproducing spatial structure through a covariance model.

 

Stochastic models associate randomness with the regionalized variable, which can then be regarded as one among many possible realizations of a random function. Such simulated maps look more realistic than the map of statistically “best” estimates because they reproduce the spatial variability modeled from the sample information. Stochastic simulation is thus increasingly preferred to kriging for applications where the spatial variation of the measured field must be preserved. The trade-off for the better reproduction of spatial features by simulated maps is that the local “minimum error variance” property of kriging is lost. A practical consequence is that the mean prediction error tends to be larger for simulated values than for kriging estimates. It has been demonstrated that when local constraints are accounted for in stochastic simulation, the production forecast has smaller errors than one obtained from a smooth estimated map or from a simulated map that reproduces only the histogram and semivariogram.
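The contrast between a smooth kriged map and rougher simulated realizations can be reproduced on a small one-dimensional example. The sketch below uses a Cholesky-based conditional Gaussian simulation, which is only one of several simulation algorithms and is chosen here for brevity, not because the text prescribes it. The exponential covariance `exp_cov`, the grid, the known mean of zero, and the five data values are all made-up assumptions.

```python
import numpy as np

def exp_cov(h, sill=1.0, length=10.0):
    """Exponential covariance model (an assumed choice for this sketch)."""
    return sill * np.exp(-np.abs(h) / length)

rng = np.random.default_rng(0)

x = np.arange(100, dtype=float)                     # 1-D grid of locations
data_idx = np.array([5, 25, 45, 65, 85])            # grid nodes that were sampled
z_data = np.array([1.2, -0.8, 0.5, 1.5, -1.0])      # made-up data, known mean 0
free_idx = np.setdiff1d(np.arange(x.size), data_idx)  # unsampled nodes

xd, xf = x[data_idx], x[free_idx]
C_dd = exp_cov(xd[:, None] - xd[None, :])           # data-to-data covariances
C_fd = exp_cov(xf[:, None] - xd[None, :])           # unsampled-to-data covariances
C_ff = exp_cov(xf[:, None] - xf[None, :])           # unsampled-to-unsampled covariances

# Simple kriging (mean 0): weights, estimates and conditional covariance
W = np.linalg.solve(C_dd, C_fd.T).T                 # one row of weights per unsampled node
krig_free = W @ z_data
cond_cov = C_ff - W @ C_fd.T

# Conditional simulation: kriging estimate plus correlated random residuals
L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(free_idx.size))
n_real = 500
sims_free = krig_free[:, None] + L @ rng.standard_normal((free_idx.size, n_real))

# Assemble full maps; the data values are honored exactly at sampled nodes
kriged = np.empty(x.size)
kriged[data_idx] = z_data
kriged[free_idx] = krig_free
sims = np.empty((x.size, n_real))
sims[data_idx, :] = z_data[:, None]
sims[free_idx, :] = sims_free

# Roughness = sum of squared differences between neighboring nodes
rough_krig = np.sum(np.diff(kriged) ** 2)
rough_sim_mean = np.mean(np.sum(np.diff(sims, axis=0) ** 2, axis=0))
print("roughness of kriged map:", rough_krig)
print("mean roughness of realizations:", rough_sim_mean)
```

Each column of `sims` is one realization. Individually they are rougher than the kriged map, while their average across realizations approaches the kriged estimates, which is the trade-off between spatial realism and local accuracy described above.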

 

Though simulations create many different renditions that appear to be visually complex, these alternate renditions all share a visual character and all somehow manage to honor the available data. When used in this way, spatial simulation techniques are attractive not so much for their ability to generate many outcomes but rather for their ability to produce outcomes that have realistic spatial variation.