Do we need Spatial Statistics? When?

Spatial Statistics: What is it, and why use it?

This post introduces the main concepts of spatial data analysis and explains why it matters in areas such as epidemiology and ecology, among others. Before defining spatial statistics and presenting some of its concepts, I think it is useful to mention some situations in which our data need to be treated as ‘spatial data’.

People may associate spatial statistics with analyses that contain numerous maps. However, it goes beyond creating maps: spatial data analysis depends on the internal structure of the observed data. We therefore have to be careful with questions that cannot be answered directly by looking at the data.

We could make a long list of areas where spatial statistics can be applied: epidemiology, agriculture, ecology, environmental science, geology, meteorology, oceanography, and even econometrics. In all of these, we could ask questions like the following to recognize whether our data have a spatial structure:

• Does the distribution of cases of a disease form a pattern in space? Could we relate health outcomes to geographic risk factors?
• Do environmental and geographical factors influence the distribution and variability in the occurrence of fishery species?

But how can we explain what spatial statistics is? I am sure we could find countless definitions of spatial analysis or spatial statistics. We can say that spatial statistics analyzes the variability of random phenomena that are linked to their geographic locations. It is used to model the dependence among georeferenced observations (these can be point observations or areal data).

Depending on the type of data and the purpose of the spatial analysis itself, we classify spatial data sets into one of three basic types: geostatistical data, lattice data (or areal data) and point pattern data. The main difference between them lies in the set of locations at which the data are observed. Let us look at this briefly with some examples:

• Geostatistics: geostatistical data are represented by a random vector $Y(s)$, where the location $s \in D$ varies continuously over a fixed observational region ($D \subset \Re^{r}$). These data are characterized by spatial dependence between locations, and the main objective in geostatistical applications is to predict at unobserved locations within the study region. Kriging is the best-known technique for prediction with geostatistical data. Some examples of this type of data are the occurrence of a species in a region and the annual acid rain deposition in a town.

• Lattice data: here $D \subset \Re^{r}$ is again a fixed region (of regular or irregular shape), but it is partitioned into a finite number of geographical areas with well-defined boundaries, and the observations refer to those areas. A characteristic of this type of spatial data is that neighbouring areas are usually more similar than distant ones. An example is the data from agricultural field trials (here the plots form a regular lattice).
• Point pattern data: this is the last type of spatial data, where $D \subset \Re^{r}$ is itself random. The locations themselves are the outcome of a random phenomenon, so they generate spatial patterns. One example of point pattern data is the locations of a certain species of tree in a forest (here only the locations are thought of as random).
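
To make the lattice idea that "neighbouring areas are usually more similar than distant ones" a bit more concrete, here is a minimal sketch in Python. It uses Moran's I, a standard measure of spatial autocorrelation that this post does not otherwise cover, on a small regular lattice. The function names `rook_weights` and `morans_i`, the 6 x 6 grid size and the two toy surfaces are all assumptions chosen for illustration, not part of any particular library or data set.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary weights for a regular lattice: two cells are neighbours
    when they share an edge (rook contiguity). Hypothetical helper."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:  # neighbour below
                j = (r + 1) * ncols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < ncols:  # neighbour to the right
                j = r * ncols + (c + 1)
                w[i, j] = w[j, i] = 1.0
    return w


def morans_i(values, w):
    """Moran's I = (n / sum(w)) * (z' W z) / (z' z),
    where z holds the values centred on their mean."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)


# Toy lattice (assumed 6 x 6 grid of plots, as in a field trial).
nrows, ncols = 6, 6
w = rook_weights(nrows, ncols)
rows, cols = np.indices((nrows, ncols))

gradient = rows + cols            # smooth trend: neighbours look alike
checkerboard = (rows + cols) % 2  # alternating values: neighbours differ

i_gradient = morans_i(gradient, w)
i_checker = morans_i(checkerboard, w)
print(f"gradient:     I = {i_gradient:.3f}")
print(f"checkerboard: I = {i_checker:.3f}")
```

Under these assumptions, the smooth surface gives a positive value (similar neighbours), while the checkerboard gives a value close to -1 (dissimilar neighbours). Real areal data would usually have an irregular neighbourhood structure, so the weights matrix would have to be built from the actual boundaries rather than from a grid.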

The explanation above has been only a brief description of the three main types of spatial data. There are two basic methodologies for carrying out spatial analysis: classical statistics or a Bayesian approach (mainly using hierarchical Bayesian methods). The latter will be dealt with in more detail in future posts, given its importance and the many situations in which it can be applied, spatial statistics among them (see the book ‘Hierarchical Modeling and Analysis for Spatial Data’ by Banerjee et al. for more information).