Data mining techniques for analysing complex simulation models

Individual-based models (IBMs) of plant populations and communities can be highly complex, reflecting the underlying dynamics of the natural systems under study. Analysing and understanding model behaviour can therefore be extremely challenging. To address this, we are collaborating with researchers from the Department of Knowledge Technologies of the Jožef Stefan Institute, Slovenia, on the application of data mining and machine learning techniques to the analysis of the IBMs we have developed.

Model analysis

Machine learning methods are being used to analyse the relationship between simulation outputs and IBM inputs (model parameters) in order to gain insight into the behaviour of the model system. Here we rely on a Monte Carlo approach in which each simulation is run with IBM parameter values sampled at random from across a predefined parameter space. By applying machine learning methods to the resulting set of input-output pairs, we can generalise over the specific simulations made and derive more general rules concerning the behaviour of the system.
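
As an illustration only, the Monte Carlo workflow described above might be sketched as follows. Everything in this example is an assumption made for demonstration: the parameter names and ranges in PARAM_SPACE, the toy_ibm function standing in for a real IBM run, and the choice of a one-split regression tree (a "decision stump") as the learning method. It is not the analysis pipeline used in the project.

```python
import random

# Hypothetical parameter space: name -> (lower bound, upper bound).
PARAM_SPACE = {
    "fecundity": (1.0, 5.0),
    "survival": (0.0, 1.0),
    "dispersal": (0.1, 2.0),
}


def sample_parameters(rng):
    """Draw one parameter set uniformly at random from PARAM_SPACE."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_SPACE.items()}


def toy_ibm(params, rng):
    """Hypothetical stand-in for an IBM run; returns a final population density."""
    base = 50.0 if params["survival"] > 0.6 else 5.0
    return base + 2.0 * params["fecundity"] + rng.gauss(0.0, 1.0)


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(samples, outputs):
    """Find the single rule 'parameter <= threshold' that best splits the outputs."""
    best = None
    for name in PARAM_SPACE:
        order = sorted(range(len(samples)), key=lambda i: samples[i][name])
        for k in range(1, len(order)):
            left = [outputs[i] for i in order[:k]]
            right = [outputs[i] for i in order[k:]]
            score = sse(left) + sse(right)
            # Place the threshold midway between the two neighbouring samples.
            threshold = 0.5 * (samples[order[k - 1]][name] + samples[order[k]][name])
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best[1], best[2]


def run_analysis(n_runs=200, seed=0):
    """Sample parameter sets, run the toy model, and learn one rule from the results."""
    rng = random.Random(seed)
    samples = [sample_parameters(rng) for _ in range(n_runs)]
    outputs = [toy_ibm(p, rng) for p in samples]
    return fit_stump(samples, outputs)


if __name__ == "__main__":
    name, threshold = run_analysis()
    print(f"Most informative split: {name} <= {threshold:.2f}")
```

The rule recovered here summarises all of the simulations at once rather than any single run. In practice a full decision tree or another learner would normally replace the single split, and the toy model would be replaced by calls to the actual IBM.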

Model simplification

We are looking to develop simplifications of the individual-based models in order to capture the key features of each system. The process of simplification in itself enhances our understanding of the systems being considered by identifying which processes and attributes are most relevant in a given situation. Such approximations can drastically reduce the dimensionality of the model and may also be amenable to mathematical analysis, providing more general insights than those obtained by simulation of specific scenarios.

Equation discovery is the area of machine learning that develops methods for the automated discovery of quantitative laws, expressed in the form of equations, in collections of measured data. When applied to IBM simulations, equation discovery aims to select an optimal structure for the equations describing the relationship between model inputs and outputs by searching through the space of possible equation structures. By choosing appropriate IBM inputs and outputs, and a grammar with which to construct equations, it is possible to obtain a dynamic model that provides a simplification of the original IBM.
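
The idea of searching a space of equation structures can be illustrated with a deliberately small sketch. The assumptions here are ours and purely illustrative: the "grammar" is reduced to a fixed list of candidate terms (CANDIDATE_TERMS), each candidate equation has the form z = c * term(x, y), the coefficient c is fitted by least squares, and the data come from a made-up relationship rather than an IBM. The result is a static input-output equation, not a dynamic model, and real equation discovery systems search far richer grammars.

```python
import random

# Hypothetical, very small "grammar": each structure is z = c * term(x, y).
CANDIDATE_TERMS = {
    "x": lambda x, y: x,
    "y": lambda x, y: y,
    "x*y": lambda x, y: x * y,
    "x**2": lambda x, y: x ** 2,
    "y**2": lambda x, y: y ** 2,
    "x/y": lambda x, y: x / y,
}


def make_toy_data(n=100, seed=0):
    """Synthetic (x, y, z) triples from an assumed law z = 3*x*y plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.5, 2.0)
        y = rng.uniform(0.5, 2.0)
        z = 3.0 * x * y + rng.gauss(0.0, 0.05)
        data.append((x, y, z))
    return data


def fit_coefficient(term, data):
    """Least-squares fit of c in z = c * term(x, y); returns (c, squared error)."""
    ts = [term(x, y) for x, y, _ in data]
    zs = [z for _, _, z in data]
    c = sum(t * z for t, z in zip(ts, zs)) / sum(t * t for t in ts)
    error = sum((z - c * t) ** 2 for t, z in zip(ts, zs))
    return c, error


def discover_equation(data):
    """Try every candidate structure and keep the one with the lowest error."""
    best = None
    for name, term in CANDIDATE_TERMS.items():
        c, error = fit_coefficient(term, data)
        if best is None or error < best[2]:
            best = (name, c, error)
    return best


if __name__ == "__main__":
    name, c, error = discover_equation(make_toy_data())
    print(f"Best structure: z = {c:.3f} * {name} (squared error {error:.4f})")
```

Because every candidate here has exactly one fitted coefficient, comparing raw squared error is a fair selection criterion. With structures of differing complexity, a penalty on equation size would be needed to avoid always favouring the largest equation.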

Contact: Graham Begg