Cellular automata are as old as the computer. The concept was invented by Von Neumann and Ulam at the end of the 1940s while working at the Los Alamos National Laboratory on the first computers capable of performing the large calculations required in the construction and production of the atomic bomb. Cellular automatons were designed to resolve the problem of the auto-reproduction of living beings. Since then, this notion has been used in numerous fields, including Geography. In the 1950s Hägerstand, without naming it, introduced it into the geographical thematic, particularly in relation to the modelling of the diffusion of an innovation, in the County of Asby in the south of Sweden. In recent years, this notion has tended to be incorporated into a broader and more recent paradigm, that of multi-agent systems. The latter are encompassed in the general framework of the theories of complexity and of Distributed Artificial Intelligence. Cellular automata, moreover, provide an interesting technical framework for the conduction of model simulations, as they are based on simple principles which enable quick development. Nevertheless, their usage in the context of a precise geographical issues such as hydrology, industrial risks or urban development, quickly leads to the need for complex models of geo-simulation which can often be difficult to develop.
A cellular automaton (or CA) is a formal or technological machine (usually a computer application) that enables us to simulate a dynamic system. A CA is a system composed of cells which each constitute an elementary automaton. Each cell is connected to other cells, its neighbours, thus forming a network of automata (Weisbuch, 1989). The characteristics of this network constitute its topology. The configuration (or global state) of the automaton is the ensemble of the states of the cellular network at any given moment. The initial configuration is the one that corresponds to the moment t=0; it is given a priori.
A CA can remain as an abstract network of cells without reference to a particular space. But often, and in particular in geographical applications, this network is associated with a particular spatial structure, such as a geographical network (road, hydrographical, air network etc), or a regular (or irregular) sub-division of the space. Moreover, this structure is defined as a space of dimension p. In geography, one primarily uses a space of dimension two, or 2d½ for modelling a soil in elevation (Langlois 2002 et al). Such a CA will thus be termed a geographical cellular automaton.
In order to model the complex spatial phenomena that use spatial information of different types, one needs to utilise heterogeneous cells. The different types of cells are associated, in general, with different types of spatial objects, predominantly defined in different layers of geographical objects, stemming from Geographical Information Systems (GIS). A cellular layer is then a grouping of cells which are all of the same type and which are associated with the same layer of geographical objects. For instance, if the cellular network is associated with a geographical network, each sub-division of the geographical layer is associated bijectively to a cell of the cellular layer. Consequently, this jointing enables it to recuperate the initial values of the cells from the attributes of the geographical objects.
Formalising this concept more precisely, each cell can be defined as an object that possesses states, inputs and outputs, and whose structure varies over time. If there are k variables of state per cell, they are represented by a vector Si,t of dimension k, where i is the cell index and t the time index. The inputs to cell I at time t are represented by a vector Ii,t of dimension nv, where nv is the number of neighbours to the cell. This vector contains the input values stemming from neighbouring cells. Similarly, the outputs are formalised by a vector Oi,t, built on the same principle as the input vector and which contain the output values of the cell aimed at neighbouring cells. When the cellular network is regular, each cell always has the same number of neighbours (with the exception of those at the borders), a neighbourhood operator applicable to each cell is used, which enables us to quickly calculate the respective number of each neighbouring cell of a given cell i. For instance, in a net of square cells, the operator V8 calculates the 8 neighbouring cells of a given cell, which is also known as the neighbourhood of Moore.
Here, a cell is identified by its coordinates (x, y) in the cellular network, the operator V8 calculates the positioning of the neighbours to the cell (x, y) by : V8(x, y) = (x-1,y-1), (x-1, y), (x-1, y +1), (x, y +1), (x+1, y +1), (x+1, y), (x+1, y -1), (x, y -1)
The operator V4 enables us to calculate the neighbourhood of Von Neumann, formed by the following 4 neighbours:
V4(x, y) = (x-1, y), (x, y +1), (x+1, y), (x, y -1)
The border effects have to be taken into consideration in the previous formulas because when the cell is on the border of a domain some of its neighbours are thus outside the domain. These examples delimit the immediate neighbourhoods of a cell by simple contiguity but one often uses extended neighbourhoods, defined by a radius R, which considers the cells situated at a distance from the centre cell inferior or equal to R. The choice of the metrics thus determines the type of neighbourhood that is associated. For instance, the metrics of Manhattan, delimits the neighbourhoods of the Von Neumann type, and the metrics of the max, , the neighbourhoods of the Moore type. When the sub-division is formed of hexagons, the neighbourhood operator is a more complex algorithm. Finally, if the sub-division is irregular, the operator has to be based on a data structure associated with the network, known as topological structure (Langlois, 94), which enables rapid retrieval of the ensemble of the neighbouring cells of a given cell.
The cellular dynamic rests on two internal mechanisms, the first of transition and the second of exit. The mechanism of transition Ti enables a cell i, within a certain timeframe, to change state according to its input values and current state. The exit mechanism Hi enables us to determine the output values according to the input values and the state of the cell. One can also have a third mechanism R, external to the cells, which manages the transit on the communication network between cells, by determining how the outputs of the cells are redistributed towards the inputs, that is to say, how the flow of information on the network is distributed.
If the automaton is homogenous, namely, that all the cells have the same structure and have the same function, the index i of mechanisms T and H can be disposed of. In the opposite case, where the CA is heterogeneous one has to retain the index I in order to differentiate them. Consequently, it is possible to formalise, mathematically, these three mechanisms by:
Si,t+1 = Ti(Ii,t, Si,t)
Oi,t+1 = Hi(Ii,t, Si,t)
It = R(Ot)
In the latter expression, the mechanism R operates on matrixes. Ot and It correspond to matrixes where row i of Ot contains the output of cell i and the column j of It contains the input to cell j.
It is often the case however that the mechanisms H and R are non-existent. This is so when the state of the cell is public, that is to say, when it can be read directly by the neighbouring cells. In this case, the mechanism R is implicit (reduced to the identity function) as the outputs of a stage are equal to the inputs of the next stage. The cell is then termed a simplified automaton of Moore and the CA is said to be simple.
The classic example illustrating this simple case of automaton is the game of life. Consider a cellular network formed of squares, where the neighbourhood of each cell is defined by its 8 neighbours (Neighbourhood of Moore). Each cell possesses a binary state variable, 1: the cell is alive, or 0: the cell is dead. The transition mechanism is as follows: it calculates the sum of values of the input vector, which represents the number nv of neighbours alive. If the cell is alive and that nv = 2 or 3, it remains alive, otherwise for the other values of nv it dies. If it is dead and that nv =3 it becomes alive. This can be summarized by the transition graph of figure 1:
Figure 1: graph of transition of the game of life
When managing the flows between cells, it becomes necessary to conserve the mechanisms H and R. For example, the state of the cell can correspond to a stock, the outputs are the quantities exported to neighbouring cells, if there is no change in these quantities, mechanism R is identical, and thus it becomes implicit. If there is a certain loss or weakening, for instance in respect of function to distance, mechanism R has to take care of the insertion of a certain weighting of flows. Finally, it is possible that the cellular network is more complex than a simple graph between cells. It may for example resemble an electricity transmission grid, where only certain nodes are cells, for instance the supply-cells or consumer-cells. When a supply-cell sends an output flow, the latter is distributed to the consumers in an indirect manner, by going through diverse nodes internal to the network, and in accordance with a complex management structure which takes charge of the distribution of mechanism R.
A final example in which the automaton cannot be simplified when applied to non-deterministic models, such as the diffusion of a pandemic in a network of cities, can be seen where the cells are distributed in accordance with a model based on Markov chains. In this case, the state of a cell is the number of persons infected in the city. Over time, the infection will develop more or less within each city as people, some of whom are sick, travel from one city to another. The transition mechanism T takes care of the dynamic of the pandemic within each city, the output mechanism H computes the number of persons infected who leave city i at instant t. The distribution mechanism R utilises a transition matrix of probability Pi ; it is devolved within the task of conducting a random draw in order to determine towards which cell j an infected individual from cell i will be sent, according to transition probability Pi(i, j).
1. The dimension 2d½ entails that a functional dependency of the third dimension (z) exists with regard to the first two (x and y): while for each (x, y) pair of the plan only one altitude z corresponds. This is particularly so for the modelling of altitude or height in numerical models of soil while it is not so for a true 3D object (for instance a vertical line going through a spherical object crosses the surface twice).