Houghton, James with Michael Siegel, Daniel Goldsmith, Allen Moulton, Stuart Madnick and Anton Wirsch  "A Survey of Methods for Data Inclusion in System Dynamics Models: Methods, Tools and Applications", 2014 July 20-2014 July 24

A SURVEY OF METHODS FOR DATA INCLUSION IN SYSTEM DYNAMICS MODELS:
METHODS, TOOLS AND APPLICATIONS

James Houghton¹, Michael Siegel¹, Daniel Goldsmith²,
Allen Moulton¹, Stuart Madnick¹, Anton Wirsch

¹ Sloan School of Management, Massachusetts Institute of Technology
² Eduventures, Boston MA

77 Massachusetts Avenue, E70-676G
Cambridge, MA 02139

281.728.6848

houghton@mit.edu, msiegel@mit.edu, dgoldsmi@mit.edu,
amoulton@mit.edu, smadnick@mit.edu, antonw@mit.edu

In the Proceedings of The 32nd International Conference of the System Dynamics Society;
July 20-24 2014; Delft, The Netherlands.

Abstract

Numerical data is experiencing a renaissance because 1) traditional data such as census and
economic surveys are more readily accessible 2) new sensors are measuring things that have
never been measured before, and 3) previously ‘unstructured’ data - such as raw text, audio,
images, and videos - is becoming more amenable to quantification. Because of this explosion
and the popular buzz surrounding ‘Big Data’, clients expect to see strong incorporation of data
methods into dynamic models, and it is imperative that System Dynamics Modelers are fully
versed in the techniques for doing so. The SD literature contains surveys that explain methods
for including data in system dynamics modeling, but techniques have continued to develop. This
paper attempts to bring these surveys up to date, and serve as a menu of modern techniques.

1 INTRODUCTION

In 1980, Jay Forrester enumerated three types of data needed to develop the structure and
decision rules in models: numerical, written and mental data, in increasing order of importance.
While this prioritization is appropriate, it is numerical data that has experienced the most
development in the more than three decades since Forrester made his enumeration. In this paper,
we’ll focus on how numerical data can be incorporated into models when written and mental data
are known, and survey the techniques for doing so.

Motivation and Purpose

Numerical data is experiencing a renaissance because 1) traditional data such as census and
economic surveys are more readily accessible 2) new sensors are measuring things that have
never been measured before, and 3) previously ‘unstructured’ data - such as raw text, audio,
images, and videos - is becoming more amenable to quantification’.

Because of this explosion and the popular buzz surrounding ‘Big Data’, clients expect to see
strong incorporation of data methods into dynamic models, and it is imperative that System
Dynamics Modelers are fully versed in the techniques for doing so.

The SD literature contains surveys (Peterson 1976 and Eberlein 1985) that explain methods for
including data in system dynamics modeling, but techniques have continued to develop. This
paper attempts to bring these surveys up to date, and serve as a menu of modern techniques.

Structure

The paper is structured to follow the modeling process laid out by John Sterman in ‘Business
Dynamics’: 1) Problem Articulation and Boundary Selection, 2) Formulation of a Dynamic
Hypothesis, 3) Formulation of a Simulation Model, 4) Testing, 5) Policy Design and Evaluation.

Within each major step we discuss specific modeling tasks and relevant data techniques, and
give a brief overview of the mechanics of that technique. We refer the reader to seminal works
and good tutorials for further learning. Where appropriate we list some of the major software
packages that support each technique. We show how the technique can be used specifically in
the modeling task, with examples drawn from the system dynamics or related literature.

Scope

This is clearly not an exhaustive list of data techniques that could be applied to System
Dynamics. Instead, we survey the techniques that have precedent in the SD literature, and
connect them to outside traditions that further support them.

We choose only to touch on the parts of the model building process that show clear promise of
benefitting from numerical data. We choose not to investigate methods of data collection or
elicitation, which are covered elsewhere’.

2 PROBLEM ARTICULATION AND BOUNDARY SELECTION

The first and most important model-building step is to clearly identify the purpose for a model
and the problem that it hopes to solve. This step requires investigation of the dynamic behavior
of the system and its problem symptoms, and determination of the appropriate scope and
resolution of the model. Randers explains this step in the classic ‘Guidelines for model
conceptualization’; Mashayekhi and Ghili discuss problem definition as an inherently
iterative process.

Omitted Modeling Tasks
From this section we omit qualitative tasks: phrasing the research question, description of
problem behavior, and choice of model structure boundary.

2.1 IDENTIFY REFERENCE MODES

Looking at the time-series data representing the system of interest, a modeler should be able to
find the symptoms of the problem she is trying to solve. The process of identifying reference
modes is described well by Saeed and VanderWerf.

Finding reference modes within time-series data is not always straightforward, and may require
the modeler to look at the data in a variety of ways before the behavior becomes clear.

2.1.1 EXPLORATORY DATA ANALYSIS AND DATA VISUALIZATION
Exploratory data analysis refers to visualization and statistical summary techniques that are
designed to give the modeler an intuitive understanding of the nature of the data.

Description
Visualization methods generally show how a parameter of interest varies according to an
independent attribute (time, geography, property, or relationship). Summary statistics can
investigate the relationship of data with itself, by looking at averages, variances, and
correlations.

Among others, Tukey promoted Exploratory Data Analysis for development of intuition about
data, and data visualization and summary statistics are now both an integral part of introductory
statistics courses and an ongoing area of research. Tufte provides the quintessential resource
for visualization of quantitative information, and Yau gives an introduction to modern
techniques and tools. Keim discusses visualization as a form of data mining.

The vast majority of data or numerical analysis software suites provide some form of summary
statistic and data visualization capability: Python (NumPy, Matplotlib, Bokeh), R (native),
JavaScript (D3), Matlab (native), gnuplot, etc.

Application to Reference Mode Identification

To identify behavior modes, the modeler needs to have a good understanding of the dynamic
behavior of the system. Exploratory Data Analysis in the form of visualization and simple
summary statistics allows the modeler to form that intuition by presenting numerical data in
human-digestible formats. These formats allow the modeler’s eyes to look past noise to see the
variety of growth, decay, or oscillatory modes that constitute the behavior of interest.

Khan, McLucas and Linard use a variety of visual and aggregation methods to develop
reference modes for the salinity of the Murray Darling basin.
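
As a minimal sketch of the summary statistics and smoothing described above, the following uses only the Python standard library. The noisy growth series, the window length, and the function names are hypothetical illustrations, not data or code from any of the cited studies.

```python
import math
import random
import statistics

# Hypothetical noisy series: exponential growth plus measurement noise.
rng = random.Random(0)
series = [10.0 * math.exp(0.03 * t) + rng.gauss(0.0, 2.0) for t in range(100)]

def moving_average(values, window):
    """Moving average over consecutive windows; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def lag1_autocorrelation(values):
    """Correlation of the series with itself shifted by one time step."""
    mean = statistics.fmean(values)
    numerator = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    denominator = sum((a - mean) ** 2 for a in values)
    return numerator / denominator

summary = {
    "mean": statistics.fmean(series),
    "stdev": statistics.stdev(series),
    "lag1_autocorr": lag1_autocorrelation(series),
}
# Smoothing lets the eye look past noise to the underlying growth mode.
smoothed = moving_average(series, window=10)
```

A high lag-one autocorrelation together with a steadily rising smoothed series suggests a growth mode rather than noise around a constant level; plotting `series` and `smoothed` with any of the tools listed above would make the same point visually.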

2.1.2 FREQUENCY SPECTRUM ANALYSIS

Frequency Spectrum Analysis allows the modeler to see the strength of each of the oscillatory
modes in her sample data.

Description

It is not uncommon to think about time-series data as being
composed of a number of oscillations at different frequencies
superimposed on one another. Fourier transforms are used to
estimate the relative contribution of each of these oscillatory
modes to the measurements. Fourier transforms come out of the
signal processing community, with a seminal paper by Cooley and
Tukey, and a good overview of the method by Duhamel.

Software packages that support Fast Fourier Transforms and
Frequency Spectrum analysis include Python (numpy.fft), R
(spectrum in the stats package), Matlab (fft), and Excel (Analysis ToolPak).
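
The idea can be sketched with a naive discrete Fourier transform written against the Python standard library. This is an illustration under stated assumptions: the two-mode test signal is hypothetical, and a production analysis would use one of the FFT packages listed above rather than this O(n²) loop.

```python
import cmath
import math

def power_spectrum(values):
    """Naive O(n^2) discrete Fourier transform.

    Returns the power at frequencies k/n cycles per step, for k = 0 .. n//2.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant component
    powers = []
    for k in range(n // 2 + 1):
        coefficient = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                          for t, x in enumerate(centered))
        powers.append(abs(coefficient) ** 2 / n)
    return powers

# Hypothetical signal: a strong 12-step cycle superimposed on a weaker 40-step cycle.
n = 120
signal = [3.0 * math.sin(2 * math.pi * t / 12) + 1.0 * math.sin(2 * math.pi * t / 40)
          for t in range(n)]

powers = power_spectrum(signal)
dominant_k = max(range(1, len(powers)), key=powers.__getitem__)
dominant_period = n / dominant_k  # length of the strongest cycle, in time steps
```

Peaks in `powers` mark the oscillatory modes; here the largest peak corresponds to the 12-step cycle and a smaller one to the 40-step cycle.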

Application to Reference Mode Identification

Frequency Spectrum Analysis can reveal the dominant modes of oscillation, and help the modeler
identify the frequency band that captures the behavior of interest. Arango and Moxnes
demonstrate the use of Spectral Analysis to identify cyclical behavior in energy markets, as can
be seen in Figure 1.

2.1.3 PHASE PORTRAITS

FIGURE 1: MARKET PRICE HISTORY AND SPECTRAL DENSITY IN ARANGO AND MOXNES, SHOWING TWO
MAJOR OSCILLATORY MODES
(Panels: price in $/unit versus time in periods; autospectrum versus frequency in cycles/yr.)

A phase portrait factors out the time component of a model to show how a pair of system
components vary with respect to one another. The shape of the phase portrait curves can show
if one variable is driving another, or if they oscillate in-phase or opposed to one another. A
phase portrait can show if oscillations are being damped or becoming unstable, if the
oscillations follow a standard pattern, or if the system is exhibiting chaotic behavior.

Description

A phase portrait is a set of curves whose x coordinate is
described by the values of one variable of interest over time,
and the y coordinate described by another. A phase portrait
may show the direction of the flow in this space when the
system is initialized with a variety of values for each
parameter.
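
A minimal sketch of how such curves can be generated, assuming a hypothetical damped second-order system (a position stock and a velocity stock) integrated with a simple fixed-step scheme; the parameter values are illustrative only.

```python
import math

def simulate_oscillator(x0, v0, stiffness=1.0, damping=0.2, dt=0.01, steps=2000):
    """Integrate x'' = -stiffness * x - damping * x' and return (x, v) pairs."""
    x, v = x0, v0
    trajectory = [(x, v)]
    for _ in range(steps):
        acceleration = -stiffness * x - damping * v
        v += acceleration * dt  # semi-implicit Euler: update velocity first
        x += v * dt             # then position, using the new velocity
        trajectory.append((x, v))
    return trajectory

def radius(point):
    """Distance of a phase-plane point from the equilibrium at the origin."""
    return math.hypot(point[0], point[1])

# One curve per initial condition; plotting x against v for each gives the portrait.
portrait = [simulate_oscillator(x0, 0.0) for x0 in (0.5, 1.0, 1.5)]
```

Each curve spirals in toward the origin, which is the phase-plane signature of a damped oscillation; a curve spiraling outward would indicate instability.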

Phase plane analysis is frequently used in the Control Theory and Complexity Theory traditions.
A number of texts provide an introduction to phase plane analysis. See Chapter 2 in Enns and
McGuire, Chapter 2 in Tien, and Tseng. Most plotting tools can create a phase portrait, although
some provide more established mechanisms for doing so, including Python (PyDSTool) and
Matlab (pplane).

FIGURE 2: PHASE PLANE DEMONSTRATION OF SECOND ORDER SYSTEM IN GUNERALP.

Application to Reference Mode Identification

Phase portraits can be useful tools in determining behavioral modes and identifying causal
behavior. When phase plots show cyclic behavior the modeler can expect to see second
order structure driving the model. When the motion of stocks is well correlated, she should look
for a behavior that drives both, and so forth. Guneralp demonstrates the use of phase portraits
to understand the dynamic behavior of a variety of simple system structures, as can be seen in
Figure 2.

2.2 INFER APPROPRIATE AGGREGATION LEVELS

System dynamics depends on the ability to aggregate individual actors or elements into groups
whose behavior can be modeled as a unit. These groups may be based upon age, location, or
other characteristics that define the group and its expected behavior. Rahn describes how the
process of aggregation can make stochastic behavior into analytically tractable flows, and takes
a good look at this essential component of system dynamics modeling.

In many cases there will be a natural set of choices for levels of aggregation, or standard for
doing so. For instance, students may be aggregated by grade, or into Elementary, Middle, or
High School. When such a clear-cut distinction is not available, clustering algorithms can be used
on the dataset to determine groups.

2.2.1 MACHINE LEARNING CLUSTERING ALGORITHMS

Description

Clustering algorithms work to identify sets of data-points in which the difference in attributes
between members of each group is minimized, and the difference between groups is
maximized. There are a variety of clustering algorithms that produce different types of clusters.
Clustering Algorithms come out of the Artificial Intelligence community, and a survey of these
algorithms is provided by Xu and Wunsch.

Software packages that provide clustering algorithms include Python (Scikit-Learn), R (various
packages), and Matlab (Statistics Toolbox).
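
As one concrete instance, the sketch below implements k-means, which is only one of the many clustering algorithms surveyed, in plain Python. The two-attribute population and the deterministic farthest-point seeding are assumptions made for illustration; library implementations such as those listed above offer more robust initialization and many other algorithms.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=20):
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical population described by two attributes, e.g. (age, weekly hours).
rng = random.Random(0)
young = [(rng.gauss(20, 2), rng.gauss(10, 2)) for _ in range(30)]
older = [(rng.gauss(60, 2), rng.gauss(40, 2)) for _ in range(30)]
labels, centroids = kmeans(young + older, k=2)
```

Each resulting label could then define one aggregate stock, with the centroid giving representative attribute values for that group.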

Application to Data Aggregation

In situations where intuitive or standardized methods of aggregation are infeasible, the modeler
may choose to use a machine learning algorithm to identify cohorts based upon their shared
attributes. Onsel, Onsel, and Yucel; and Pruyt, Kwakkel, and Hamarat discuss clustering of
behavior modes, although direct clustering for aggregation remains to be demonstrated in SD
literature.

3 FORMULATION OF DYNAMIC HYPOTHESIS

In the second stage of model building, the modeler begins to determine the structure of the
model in a largely qualitative way. Inferring the overall structure of a model based upon data is
challenging, although there have been some attempts to do so. Data methods can be more
helpful in an iterative modeling process where the structure of a new model may be based upon
previous similar models, and the inference task is to infer which of a set of previous models best
represents the current data.

Omitted Modeling Tasks

We omit here causal mapping of the system as this is the step in which the modeler begins to
add written and mental data to the model, and as such is less of a numerical task. We also omit
the creative and intuitive task of actual hypothesis generation.

3.1 MODEL SELECTION

In some cases there may be disagreement about the general structure of the system, as
different parties have different mental models of the way the system works. If simple versions
of these mental models can be specified, data methods can be used to determine the relative
likelihood that each model structure could be responsible for creating the observed data. The
relative likelihoods of the models without respect to parameters are calculated as ‘Bayes
Factors’.

3.1.1 MARKOV CHAIN MONTE CARLO

Markov Chain Monte Carlo (MCMC) can be used to infer the distributions of input parameters to
a model when the output behavior of the system is known through measurement. In model
selection, MCMC assumes the choice of model to be the parameter of interest.

Description

MCMC chooses a random set of parameters from ‘prior’ input distributions, executes a system
model (which must have a stochastic component), and then calculates the likelihood of the
observed data given the run of the model and the chosen input parameters. The algorithm uses
this probability to decide whether to include the chosen input parameters in a posterior
distribution. After this process has been repeated on the order of tens or hundreds of thousands
of times, distributions for the input parameters can be summarized using histograms or other
density estimation methods.
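
A minimal sketch of this loop, using the random-walk Metropolis algorithm (one member of the MCMC family) to infer a single growth-rate parameter. The exponential-growth model, the uniform prior bounds, the noise level, and the proposal step size are all assumptions chosen for illustration.

```python
import math
import random

rng = random.Random(1)
TRUE_RATE, NOISE_SD = 0.05, 1.0
times = list(range(30))
# Hypothetical measurements of a stock growing exponentially from 10.
observed = [10.0 * math.exp(TRUE_RATE * t) + rng.gauss(0.0, NOISE_SD) for t in times]

def log_likelihood(rate):
    """Gaussian log-likelihood (up to a constant) of the data given a growth rate."""
    return sum(-0.5 * ((y - 10.0 * math.exp(rate * t)) / NOISE_SD) ** 2
               for t, y in zip(times, observed))

def log_prior(rate):
    """Uniform prior on [0, 0.2]."""
    return 0.0 if 0.0 <= rate <= 0.2 else -math.inf

def metropolis(n_samples, start=0.1, step=0.001):
    current = start
    current_lp = log_prior(current) + log_likelihood(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_prior(proposal)
        if proposal_lp > -math.inf:
            proposal_lp += log_likelihood(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis(20000)
posterior = samples[5000:]  # discard burn-in before summarizing
posterior_mean = sum(posterior) / len(posterior)
```

A histogram of `posterior` approximates the distribution of the growth rate given the data; packaged samplers such as those listed below handle tuning and convergence diagnostics that this sketch omits.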

Markov Chain Monte Carlo was developed to support nuclear engineering; and the primary
algorithms used in MCMC were developed by Metropolis, Hastings, and Geman and
Geman. Andrieu, Freitas, Doucet and Jordan; and Brooks give good introductions to use of
the technique. Software packages that provide MCMC algorithms include Python (PyMC),
BUGS, WinBUGS, R (MCMCpack), and Vensim.

Application to Model Selection

A modeler can use Markov Chain Monte Carlo to help choose between multiple competing
models. To do so she establishes a categorical variable that can take values corresponding to
each of the candidate models. In each MCMC run, a value (and thus a model) is selected and the
likelihood of the data given that model computed. After MCMC convergence, the relative
likelihood of each model is proportional to the number of times its categorical variable value
was selected.

Andrieu, Djuric and Doucet expand on their introduction to MCMC with a description of how
the method can be used to choose between models.

3.1.2 BAYES FACTORS

Bayes Factors serve as a relative likelihood of the validity of two models. While they won't tell
the modeler that a particular model is correct, they will tell her if one model is a better
representation of the measured data than another model.

Description

The Bayes Factor lets a modeler abstract the model from its parameters by calculating the
likelihood ratio of the two models under any set of parameters that each model takes as valid.
This takes the form of an integral of the likelihood over the parameter space. As this is often a
large multidimensional integral, the modeler can use a sampling technique such as Markov
Chain Monte Carlo to approximate its value.
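
For models with a single parameter, the integral can be sketched directly with a grid instead of MCMC. In the example below the two candidate structures (exponential versus linear decline), the uniform priors, and the data are hypothetical; the computation is done in log space to avoid numerical underflow.

```python
import math
import random

rng = random.Random(4)
NOISE_SD = 3.0
times = list(range(20))
# Hypothetical measurements of a stock draining exponentially from 100.
observed = [100.0 * math.exp(-0.1 * t) + rng.gauss(0.0, NOISE_SD) for t in times]

def exponential_model(rate, t):
    return 100.0 * math.exp(-rate * t)

def linear_model(slope, t):
    return 100.0 - slope * t

def log_likelihood(model, parameter):
    """Gaussian log-likelihood of the data under one model and parameter value."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * NOISE_SD ** 2)
        - 0.5 * ((y - model(parameter, t)) / NOISE_SD) ** 2
        for t, y in zip(times, observed)
    )

def log_marginal_likelihood(model, lower, upper, grid=2000):
    """Integrate the likelihood over a uniform prior on [lower, upper] (midpoint rule)."""
    width = (upper - lower) / grid
    log_terms = [log_likelihood(model, lower + (i + 0.5) * width) for i in range(grid)]
    peak = max(log_terms)
    total = sum(math.exp(term - peak) for term in log_terms)
    # sum(L_i * width) * prior density 1 / (upper - lower) equals sum(L_i) / grid
    return peak + math.log(total / grid)

log_bayes_factor = (log_marginal_likelihood(exponential_model, 0.0, 0.5)
                    - log_marginal_likelihood(linear_model, 0.0, 10.0))
```

A positive `log_bayes_factor` favors the exponential structure and a negative one favors the linear structure; with more than a few parameters the grid becomes impractical and sampling takes its place.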

Bayes Factors come from the statistics tradition and were first articulated by Jeffreys. Kass and
Raftery give an overview of the technique in its modern form. Software packages that
specifically facilitate the calculation of Bayes Factors include Python (PyMC) and
R (BayesFactor).

Application to Model Selection

Bayes Factors give weight to the preference one model should receive over another, and thus
give a measure of confidence in deciding to reject one model in favor of its competitor.
Alternately, the Bayes Factor may tell the modeler that there is no clear preference for one
model over the other, and that some form of averaging, or a new model altogether, would be
the preferred solution.

Raftery demonstrates the use of Bayes Factors for model selection in social research.
Opportunities exist for demonstrations of this technique in the SD literature.

4 FORMULATION OF SIMULATION MODEL

Data methods can be extremely helpful in completing dynamic models whose structures have
already been identified. The modeler can use these methods to help select equations to
represent the relationships between system components, identify values for model parameters
and initial conditions, and prepare exogenous inputs.

Omitted Modeling Tasks
While partial models and other methods of slicing the model before inference are important,
they are well established within System Dynamics and are not themselves data methods.

4.1 IDENTIFY EQUATIONS TO REPRESENT RELATIONSHIPS BETWEEN VARIABLES
When a stock and flow structure has been constructed, the next step is to implement the
equations that govern each relationship. In some situations the modeler can work from first
principles, or infer scaling laws and nonlinearities from intuition. In other situations, she may
have little understanding of how variables relate to one another and must infer the nature of
the relationship from data.

4.1.1 STRUCTURAL EQUATION MODELING

Structural Equation Modeling (SEM) attempts to infer relationships between a latent or
unobserved variable and a number of observed variables. These latent variables may be
estimators for a 'soft' quantity (such as 'morale') which itself may be further used in the model.

Description

SEM tries to fit the covariance matrix of a predictive model to the covariance of the observed
parameters. Wright provided the seminal paper on structural equation modeling. Ullman
gives a tutorial of the basics of Structural Equation Modeling, using a substance abuse model as
a motivating example. A number of books give more detailed introductions; consider Kline or
Bollen.

SEM is well represented in the psychology and econometrics traditions. Several statistical
environments provide packages for structural equation modeling, including R (sem,
OpenMx) and SPSS (Amos).

Application to Equation Identification

SEM is applied to System Dynamics modeling as a method for including 'soft' variables into
feedback models, identifying the relationship between the unobservable variable and additional
observable characteristics.

Medina-Borja and Pasupathy demonstrate the use of SEM in the context of a customer
satisfaction and branch expansion model for a bank. Roy and Mohapatra use SEM in a model
of research and development laboratories.

4.2 IDENTIFY INFLUENTIAL PARAMETERS

In parameterizing the model, it’s likely that some parameters will have a strong influence on the
outcome of the model, and others less so. If a modeler can identify these, she can prioritize the
effort given to measuring each parameter value.

4.2.1 SENSITIVITY ANALYSIS
Sensitivity analysis (also called Experimental Uncertainty Analysis or Propagation of Error) is a
method for determining the impact of small changes of a system input parameter upon the
computed output of the system.

Description

Sensitivity analysis uses numerical means such as Monte Carlo Analysis (see Section 5.2.1) to
propagate errors or uncertainties in model parameters through to the model output. It is often
performed one parameter at a time, with the relative magnitude of change between the input
and output compared with that of other parameters. It can also be performed for multiple
parameters simultaneously to determine the nonlinear interactions between parameters on the
output.
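
The one-parameter-at-a-time variant can be sketched as follows. The logistic population model, its baseline values, and the 1% perturbation are hypothetical; the reported quantity is an elasticity, the percent change in output per percent change in a parameter.

```python
def simulate_population(growth_rate, capacity, initial=10.0, dt=0.25, horizon=50.0):
    """Euler integration of a single-stock logistic growth model; returns the final stock."""
    population = initial
    for _ in range(int(horizon / dt)):
        net_growth = growth_rate * population * (1.0 - population / capacity)
        population += net_growth * dt
    return population

baseline = {"growth_rate": 0.3, "capacity": 1000.0}

def elasticity(name, relative_step=0.01):
    """Central-difference estimate of d(ln output) / d(ln parameter) at the baseline."""
    low, high = dict(baseline), dict(baseline)
    low[name] *= 1.0 - relative_step
    high[name] *= 1.0 + relative_step
    change = simulate_population(**high) - simulate_population(**low)
    return (change / simulate_population(**baseline)) / (2.0 * relative_step)

elasticities = {name: elasticity(name) for name in baseline}
# Parameters ordered from most to least influential on the final stock.
ranked = sorted(elasticities, key=lambda name: abs(elasticities[name]), reverse=True)
```

At this time horizon the stock has saturated, so the final value responds almost one-for-one to the carrying capacity and hardly at all to the growth rate; evaluating the output at an earlier time would reverse that ranking.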

Helton, Johnson, Sallaberry and Storlie give an overview of sampling-based methods for
sensitivity analysis in which they step through the various stages of the process: defining input
distributions, designing samples of those distributions and executing the model, and reviewing
the outcome; with a motivating example from industrial engineering.

Basic sensitivity analysis can be performed by any software capable of performing Monte Carlo
Analysis, although a few packages have been developed to implement more advanced
Sensitivity Analysis algorithms: Python (SALib), R (sensitivity), Matlab (Control System Toolbox).

Application to Influential Parameter Identification

When sensitivity analyses are used in the model-building stage, the modeler is interested in
finding the parameters with the greatest need for precision - these being the parameters for
which a small error can lead the model to different conclusions. For each parameter, the
modeler can calculate how tight the error bounds must be such that its relative contribution to
the model uncertainty matches that of the other parameters.

Sharp discusses methods of sensitivity analysis specifically in the System Dynamics setting.
Powell, Fair, LeClaire and Moore use sensitivity analysis to identify crucial parameters in an
infectious disease model, and Hekimoglu and Barlas apply the method to several business
management problems.

4.2.2 STATISTICAL SCREENING
Statistical Screening is a method of performing sensitivity analysis in system models that are
computationally expensive, and for which it is infeasible to take a large number of samples.

Description

Screening looks at the correlation between variables and output values, instead of looking at the
standard deviation of a number of samples when input values are varied. Welch et al. wrote an
influential paper describing the process of statistical screening.
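
A minimal sketch of the correlation-based idea, assuming a hypothetical logistic population model, uniform parameter ranges, and only 50 runs. It computes the correlation coefficient between each sampled parameter and the model output at two points in time; it is an illustration rather than a reproduction of any published screening procedure.

```python
import math
import random

def simulate_population(growth_rate, capacity, initial=10.0, dt=0.25, horizon=50.0):
    """Euler integration of logistic growth; returns the stock at every time step."""
    population, trajectory = initial, [initial]
    for _ in range(int(horizon / dt)):
        population += growth_rate * population * (1.0 - population / capacity) * dt
        trajectory.append(population)
    return trajectory

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

rng = random.Random(5)
runs = 50  # deliberately few runs, as for an expensive model
growth_rates = [rng.uniform(0.2, 0.4) for _ in range(runs)]
capacities = [rng.uniform(800.0, 1200.0) for _ in range(runs)]
trajectories = [simulate_population(g, c) for g, c in zip(growth_rates, capacities)]

def screen(step):
    """Correlation of each parameter with the output at one time step."""
    outputs = [trajectory[step] for trajectory in trajectories]
    return {"growth_rate": pearson(growth_rates, outputs),
            "capacity": pearson(capacities, outputs)}

early, late = screen(40), screen(200)  # time 10 and time 50 with dt = 0.25
```

Comparing `early` with `late` shows how influence shifts over the run: the growth rate dominates during the growth phase, the capacity dominates once the stock saturates.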

Application to Influential Parameter Identification

Statistical screening is important to the field of System Dynamics for its ability to identify and
prioritize data collection and parameter estimation tasks. Ford and Flynn demonstrate the
use of statistical screening to identify influential parameters in SD models, with examples of
sales vs. company growth, and the World3 model. Taylor, Ford, and Ford demonstrate
screening for application to a diffusion model and a rework model.

4.3 SUMMARIZE MEASURABLE PARAMETERS

In the ideal case, parameters of the model correspond to unique, observable characteristics of
the real system. The task of preparing measurements of these characteristics for inclusion in
System Dynamics models involves summarizing disaggregate sample data into single values or
distributions that represent the true distribution of the concept in the system. Graham gives
an overview of this process, along with some traditional techniques for doing so.

4.3.1 BOOTSTRAP RESAMPLING

Resampling with bootstrap allows the modeler to estimate the error in the parameters of a
probability distribution, based upon a finite number of observations of that distribution. (Note
that statistical bootstrapping is distinct from bootstrap aggregating in machine learning.)

Description

Bootstrapping works by drawing a resample, with replacement, from the sample data, and
computing the parameter of interest (mean, median, etc.) for that resample. Repeating this
process a large number of times can give confidence bounds for the likely values of the
parameter in the sample data set, which are comparable to the bounds on the true population.

FIGURE 3: DOGAN'S USE OF BOOTSTRAP METHODS TO SUMMARIZE PARAMETERS IN A MODEL OF THE
BEER GAME
(Plot: empirical cumulative distribution function, F(Alpha), with the lower and upper limits of
the 95% confidence interval.)

The bootstrap method comes from the statistics tradition, and was first introduced by Efron;
Henderson, and Diaconis and Efron provide tutorials and perspective on the method and its
variants. A variety of software packages facilitate statistical bootstrapping, including
Python (Scikit-Bootstrap), R (boot), and Matlab (Statistics Toolbox).

Application to Parameter Summarization
Bootstrapping can be used to infer distributions for model input parameters such that those
distributions can be properly used in subsequent Monte Carlo model testing. Dogan describes
this process specifically for the System Dynamics model parameter estimation setting, and
demonstrates its use with examples from The Beer Game and quality erosion in the service
industry, as seen in Figure 3.
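
A minimal sketch of a percentile bootstrap using the Python standard library. The sample of delay measurements, the number of resamples, and the function name are hypothetical; the packages listed above provide refinements such as bias-corrected intervals.

```python
import random
import statistics

def bootstrap_interval(sample, statistic, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    # Each resample has the same size as the sample and is drawn with replacement.
    estimates = sorted(
        statistic(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical sample: 40 observed delivery delays, in weeks.
data_rng = random.Random(7)
delays = [data_rng.gauss(5.0, 1.5) for _ in range(40)]

lower, upper = bootstrap_interval(delays, statistics.fmean)
```

The interval `(lower, upper)` can then bound the range from which a mean-delay parameter is drawn in later Monte Carlo runs; passing `statistics.median` instead summarizes a different parameter with the same procedure.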

4.3.2 MARKOV CHAIN MONTE CARLO
For a description of Markov Chain Monte Carlo, please see section 3.1.1.

Application to Parameter Summarization

Markov Chain Monte Carlo can be used to estimate hyperparameters (such as mean and
standard deviation) for a distribution representing an SD model parameter. In such a case, MCMC
considers the data to be the product of a statistical model that takes the hyperparameters to
specify an analytical distribution, and calculates the likelihood of the observed data. The MCMC
algorithm then can be used to generate confidence intervals for the hyperparameters, which
can be used as input to the simulation model.

4.4 INFER UNMEASURABLE PARAMETERS

There are many cases in which parameter values cannot be directly measured, but based upon
the model and its inputs and outputs, the modeler may be able to infer what the parameter
values would need to be for the system model and the data to make sense together. She can
perform this inference on either partial or full models, depending on the data that is available,
using a variety of numerical techniques to yield either single values or distributions.

4.4.1 REGRESSION

The concept of regression includes a variety of methods (Ordinary Least Squares, Orthogonal
Distance, Ridge, etc.) for identifying the single best set of parameter values for a model, where
‘best’ is defined by some objective function, such as having the minimum square error between
model predictions and system measurements.

Description

Most regression methods perform an optimization (see section 6.1.1) over the parameter space
to minimize the difference between a prediction and system measurements, as seen in Figure 4.
When applied to time-series data, regression is most effective for non-oscillatory behavior
modes.
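
The search can be sketched for a single parameter. Here a hypothetical power-law relationship is fitted by minimizing the sum of squared errors with a golden-section search; the data, the search bounds, and the choice of optimizer are assumptions for illustration, and the packages listed below offer general multi-parameter routines.

```python
import random

rng = random.Random(2)
# Hypothetical measurements of y = x ** 1.5 with additive noise.
xs = [0.5 * i for i in range(1, 21)]
ys = [x ** 1.5 + rng.gauss(0.0, 0.5) for x in xs]

def sum_squared_error(exponent):
    """Objective function: squared difference between model prediction and data."""
    return sum((y - x ** exponent) ** 2 for x, y in zip(xs, ys))

def golden_section_minimize(objective, low, high, tolerance=1e-6):
    """Shrink [low, high] around the minimum of a single-minimum function."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = low, high
    while b - a > tolerance:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if objective(c) < objective(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2

best_exponent = golden_section_minimize(sum_squared_error, 0.0, 3.0)
```

The returned exponent is the single best-fit value; regression of this kind gives no distribution for the parameter, which is where bootstrap or MCMC methods come in.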

FIGURE 4: REGRESSION SEARCHES THE PARAMETER SPACE TO MINIMIZE THE SUM OF SQUARE ERRORS
BETWEEN OBSERVED DATA AND MODEL PREDICTIONS
(Panels: Data and Model Output; Parameter Space. Axis labels: Iterations; Exponent.)

A variety of resources for learning regression methods exist; consider Ryan or Vinod. The vast
majority of statistical packages and system dynamics modeling software give some form of
regression capability: Python (SciPy, Statsmodels, Scikit-Learn), R (native), Matlab (Statistics
Toolbox), Vensim, AnyLogic (Model Calibration), etc.

Application to Model Parameter Inference

Multiple regression is a common method for inferring values for unknown parameters in system
dynamics models. Higuchi discusses parameter estimation using regression with an inventory
model demonstration. Mayerthaler, Haller, and Emberger use regression to parameterize a
land use and transportation model.

4.4.2 MARKOV CHAIN MONTE CARLO
For a description of MCMC, see Section 3.1.1.

Application to Model Parameter Inference

MCMC can be used to estimate a distribution for unknown model parameter values if the output
of the simulation model is specified to represent the parameters of a statistical distribution. In
this case, parameters can be sampled by the MCMC algorithm and the likelihood of data given
the sample parameters computed. The MCMC algorithm traverses the parameter space,
sampling from a distribution representing the likelihood that each parameter takes on a certain
value. Osgood describes the use of Markov Chain Monte Carlo for SD model parameterization.

FIGURE 5: INFERENCE OF DISTRIBUTIONS WITH MCMC IN ANDRIEU, FREITAS, AND DOUCET.

4.4.3 KALMAN FILTERING
Kalman filtering gives an efficient method for calculating the unknown state of a system given a
set of noisy measurements.

Description

At each time-step, a Kalman Filter uses a System Dynamics model to predict the current state of
the system based upon its estimate of the previous state. It then combines its prediction with a
measurement of the state (or components of the state) based upon its relative confidence in
each, as seen in Figure 6.
FIGURE 6: KALMAN FILTERING TAKES A WEIGHTED AVERAGE OF THE MODEL-PREDICTED STATE AND A
STATE MEASUREMENT
(Plotted against timestep: True Value, Model Prediction, Initial guess at each timestep, Noisy
measurement, Weighted Average.)
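
The predict-and-update cycle can be sketched for a single stock. The constant-growth model, the noise variances, and the initial guess below are hypothetical, and this scalar form omits the matrix algebra needed for models with several stocks.

```python
import math
import random

def kalman_filter(measurements, growth, process_var, measurement_var, x0, p0):
    """Scalar Kalman filter for a stock that grows by a constant factor each step."""
    estimate, variance = x0, p0
    estimates = []
    for z in measurements:
        # Predict: run the model one step forward and inflate the uncertainty.
        estimate = growth * estimate
        variance = growth * growth * variance + process_var
        # Update: weight prediction and measurement by relative confidence.
        gain = variance / (variance + measurement_var)
        estimate = estimate + gain * (z - estimate)
        variance = (1.0 - gain) * variance
        estimates.append(estimate)
    return estimates

rng = random.Random(6)
GROWTH, MEASUREMENT_SD = 1.02, 10.0
truth = [100.0 * GROWTH ** (k + 1) for k in range(50)]
measurements = [x + rng.gauss(0.0, MEASUREMENT_SD) for x in truth]

estimates = kalman_filter(measurements, GROWTH, process_var=0.01,
                          measurement_var=MEASUREMENT_SD ** 2, x0=90.0, p0=100.0)

def rmse(values):
    """Root-mean-square error against the true state."""
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(values, truth)) / len(truth))
```

The `gain` is the weight in the weighted average: close to 1 when the measurement is trusted more than the model, close to 0 in the opposite case. Comparing `rmse(estimates)` with `rmse(measurements)` shows how much the filter improves on the raw data.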

Kalman filters were developed in support of the Apollo program, and come from an Aerospace
and Control Theory background. Kalman's original paper lays out the basics of his filter, and Du
Plessis gives a very readable introduction in 'The Poor Man's Explanation of Kalman Filtering',
only a few years later. Awasthi and Raj give a survey of the filter's modern variants.

Kalman filters are available within a number of numeric tools, including Python (pykalman),
Matlab (Simulink, DSP Toolbox), R (several, compared by Tusell), Vensim and others.

Application to Model Parameter Inference

In a parameter estimation setting, parameters can be considered as unchanging states for the
Kalman Filter to infer. Examples of Kalman Filters applied to System Dynamics include
Ryzhenkov's parameterization of an economic long wave model, and Shiryaev, Golovin and
Smolin's model of a one-commodity firm.


4.5 SURROGATE A FUNCTION

In some cases, two variables exhibit a well-defined relationship, but the form of that
relationship is not well described by any simple functions. If these parts of the system are not
the subjects of interest, the modeler may choose to surrogate the relationship with some form
of piecewise model, or machine learning predictor.

4.5.1 TABLE FUNCTIONS OR LOOKUP TABLES
Table functions give the modeler the ability to approximate a nonlinear relationship between
one or more independent variables and an output variable.

Description

When an analytic representation of the relationship is complex or unavailable, the modeler
constructs a table function using a set of points from that function, and output values are found
by interpolating between those points.
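
A minimal sketch of a one-dimensional table function with linear interpolation and clamping outside the tabulated range. The table values (an effect of schedule pressure on productivity) are hypothetical, and clamping is only one of several possible policies for out-of-range inputs.

```python
from bisect import bisect_right

def lookup(table, x):
    """Linearly interpolate in a table of (input, output) pairs sorted by input.

    Inputs outside the tabulated range return the nearest end value.
    """
    xs, ys = zip(*table)
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # index of the first tabulated input greater than x
    fraction = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + fraction * (ys[i] - ys[i - 1])

# Hypothetical table: effect of schedule pressure on productivity.
effect_of_pressure = [(0.0, 0.6), (0.5, 0.8), (1.0, 1.0), (1.5, 1.1), (2.0, 1.15)]

productivity_multiplier = lookup(effect_of_pressure, 0.75)
```

With the table above, an input of 0.75 falls halfway between the second and third points, so the result is halfway between 0.8 and 1.0.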

Lookup tables were in use long before the advent of computers, and had previously been
compiled to make trigonometric or logarithmic calculations simpler. Many software packages
provide the basic data storage and interpolation function necessary to implement a lookup
table, including Python (SciPy, Pandas), R (stats), Matlab (native), Vensim, and AnyLogic.

Application to Surrogating Functions

When a relationship between variables is unknown or complex, a lookup table provides an
intuitive way to include information about the relationship in a model. Franco presents a
thorough introduction to table functions as they have historically been used in SD modeling.

4.5.2 NEURAL NETWORKS

Neural Networks have the ability to encode multidimensional nonlinear relationships based
directly upon training data, and to approximate a response to novel input which is consistent
with the nonlinear relationships they encode.

Description

Neural networks use training data to establish the relative weights of a set of links between
neural nodes; these links encode the relationships present in the training data. Test data forms
the inputs to these models, and their interaction with the established links provides the output.

Neural networks are a product of the Machine Learning community. McCulloch and Pitts [85]
developed the first concepts of Neural Networks well before they were implemented on a
computer. Holeňa, Linke, Rodemerck, and Bajer [86] use neural networks directly for surrogating
functions based upon data. Software capable of encoding neural networks includes Matlab (NN
Toolbox), Python (NeuroLab [87], PyBrain [88]), and a number of standalone packages.

Application to Surrogating Functions
System Dynamicists can use neural networks in place of table functions, especially in situations
with complex, multidimensional relationships that must be estimated from data. Alborzi [89]
demonstrates the use of a neural network to approximate a function, using the example of
gravitational attraction between two bodies.
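
In the spirit of that example, the sketch below trains a very small network (one hidden layer of tanh nodes, full-batch gradient descent) to stand in for an inverse-square relationship. It is not Alborzi's implementation: the architecture, learning rate, and data are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random

def init_params(n_hidden=4, seed=0):
    """Small random weights for a 1-input, 1-output network (assumed architecture)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return [w1, b1, w2, [0.0]]

def forward(params, x):
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return hidden, b2[0] + sum(v * h for v, h in zip(w2, hidden))

def mse(params, xs, ts):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

def train(params, xs, ts, epochs=3000, lr=0.02):
    """Full-batch gradient descent on half the mean squared error."""
    w1, b1, w2, b2 = params
    n, m = len(xs), len(w1)
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * m, [0.0] * m, [0.0] * m, 0.0
        for x, t in zip(xs, ts):
            hidden, y = forward(params, x)
            err = (y - t) / n
            g_b2 += err
            for j in range(m):
                g_w2[j] += err * hidden[j]
                back = err * w2[j] * (1.0 - hidden[j] ** 2)   # tanh derivative
                g_w1[j] += back * x
                g_b1[j] += back
        for j in range(m):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2[0] -= lr * g_b2

# Training data: force proportional to 1 / distance^2 (constants set to one).
distances = [1.0 + 0.1 * i for i in range(21)]
forces = [1.0 / d ** 2 for d in distances]

params = init_params()
mse_before = mse(params, distances, forces)
train(params, distances, forces)
mse_after = mse(params, distances, forces)
print(f"MSE before training {mse_before:.4f}, after {mse_after:.4f}")
```

Once trained, `forward(params, d)[1]` can be called from a rate or auxiliary equation in the same way a table function would be.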

5 TESTING

After the model is built, the next task is to build confidence in the model's ability to represent
the real system. Here the modeler can use data for its ability to disprove the model, show where
its weaknesses are, and determine how she can improve it. In each case the modeler is looking
at the model's ability to predict the data in a given set of conditions, and to test the model's
robustness to different types of errors.

Forrester and Senge [90] give a definitive guide to system dynamics model testing. Barlas [91]
elaborates with a procedure for conducting various forms of model testing.

Omitted Modeling Tasks

There are a number of tests that are important for building confidence in the model. Some of
these are qualitative tests, such as tests for boundary adequacy, and dimensional consistency.
Others are numerical tests with little reliance on data (beyond model calibration): tests of
conservation laws, extreme conditions, loop knockout, and surprise behavior tests. These are
omitted to help focus on the tests with strong data reliance.

5.1 COMPARE POINT PREDICTIONS WITH NUMERICAL DATA

The most logical quantitative test of a predictive model is its ability to make predictions. When a
model is calibrated with best-fit parameter values, it is only able to make point predictions,
which are unlikely to follow the true behavior of the system exactly. A modeler can look at the
difference between the model prediction and the observed behavior in both the time and
frequency domain for various parts of the model and various sections of the data.

5.1.1 SUMMARY STATISTICS

Summary statistics aggregate the difference between a model point prediction and the
observed value, according to some weighting function. These can be decomposed into
components due to bias, variance, and covariance.

Description

Summary statistics include variants on Mean Square Error, the Coefficient of Determination,
Theil's U statistics, and others. Sterman [92] describes the appropriate use of summary statistics
for System Dynamics models. Oliva shows how these metrics are calculated using Vensim.

The majority of statistical packages are capable of calculating basic summary statistics:
Python (Scikit-Learn, Statsmodels), R (Metrics), Vensim.
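
The decomposition of mean square error into bias, variance, and covariance shares (Theil's inequality statistics) can be sketched as follows. The data are made up for illustration, and population (divide-by-n) moments are used so that the three shares sum to one.

```python
import math

def theil_decomposition(simulated, actual):
    """Split mean square error into bias, variance and covariance shares."""
    n = len(actual)
    mean_s = sum(simulated) / n
    mean_a = sum(actual) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(simulated, actual)) / n
    r = cov / (sd_s * sd_a)                       # correlation coefficient
    mse = sum((s - a) ** 2 for s, a in zip(simulated, actual)) / n
    return {
        "mse": mse,
        "bias": (mean_s - mean_a) ** 2 / mse,             # unequal means
        "variance": (sd_s - sd_a) ** 2 / mse,             # unequal variation
        "covariance": 2.0 * (1.0 - r) * sd_s * sd_a / mse,  # imperfect correlation
    }

# Hypothetical historical data and model output.
actual = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0]
simulated = [11.0, 12.5, 14.0, 16.0, 18.5, 20.0]
shares = theil_decomposition(simulated, actual)
print(shares)
```

A large bias share points to a systematic offset, while error concentrated in the covariance share is point-by-point mismatch that may matter less for a model's purpose.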

Application to Point Prediction Assessment

Summary statistics are used to estimate the goodness of fit of a model to historical data. System
Dynamics modelers are often interested not in the magnitude of the total error, but in the way
that error is composed. By using a variety of summary statistics a modeler can determine if the
error in her models is relevant to the purpose of the model. As an example, Stephan uses
summary statistics to build confidence in models of software development.

5.1.2 CROSS VALIDATION

Cross validation (or out-of-sample testing) works to improve confidence that the model
represents the underlying behavior of the system, and that correspondence between the model
and the observed data is not merely a result of over-fitting the model to the data.

Description

Cross validation works by breaking a dataset into 'training' and 'testing' components. The
training set is used to parameterize the model, and the testing set is used to measure the ability
of the model to make predictions. If possible, a modeler may choose to partition the data in a
variety of ways and repeat the analysis.

When the system is time-dependent (as is usually the case with feedback models) and the data
record is not much longer than the relevant period of the system, there is a limited range of
ways that the data can be meaningfully partitioned.
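
For time-dependent data, one simple partition keeps the time ordering: calibrate on the early part of the record and test on the remainder. The sketch below assumes a hypothetical exponential-growth model and synthetic data; the function and variable names are illustrative.

```python
import math
import random

def fit_growth_rate(times, values):
    """Least-squares slope and intercept of log(value) against time."""
    logs = [math.log(v) for v in values]
    t_bar = sum(times) / len(times)
    l_bar = sum(logs) / len(logs)
    slope = (sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
             / sum((t - t_bar) ** 2 for t in times))
    return slope, l_bar - slope * t_bar

# Synthetic record: 10% growth per period with small multiplicative noise.
rng = random.Random(1)
times = list(range(30))
observed = [100.0 * math.exp(0.1 * t + rng.gauss(0.0, 0.02)) for t in times]

# Time-ordered split: the first 20 points calibrate, the last 10 test.
split = 20
train_t, test_t = times[:split], times[split:]
train_y, test_y = observed[:split], observed[split:]

rate, intercept = fit_growth_rate(train_t, train_y)
predicted = [math.exp(intercept + rate * t) for t in test_t]
out_of_sample_rmse = math.sqrt(
    sum((p - y) ** 2 for p, y in zip(predicted, test_y)) / len(test_y))
print(f"fitted rate {rate:.4f}, out-of-sample RMSE {out_of_sample_rmse:.2f}")
```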

Cross validation comes from the statistics tradition. Picard and Cook give a good overview of
the use of cross validation to combat over-fitting in regression models. Software that facilitates
cross validation includes Python (Scikit-Learn), R (a variety of packages summarized by
Starkweather), Matlab (Statistics Toolbox) and others.

Application to Prediction Assessment

Cross validation in system dynamics can be challenging, as the complex time-dependent nature
of the systems in question increases the difficulty of partitioning data into independent subsets.
Randers demonstrates out-of-sample testing in the qualitative comparison of predicted and
measured wood pulp inventory and price.

5.1.3 FAMILY MEMBER TESTS

Family Member Tests are a special form of cross-validation or out-of-sample testing, in which
the modeler uses the model structure to predict the behavior of a structurally parallel but
physically separate system. As with Cross Validation, Family Member Tests work to verify that
the model fit is due to the structure of the system model, and not to a lucky guess of
parameters.

Description

If appropriate family member systems are available, a modeler may choose to reoptimize the
model based upon data from the family-member system in order to support the fidelity of the
model structure and equations; or use the original parameter values to support the full data and
modeling process. The modeler can then apply either point-wise or statistical measures to
evaluate the model's predictive ability.

Family Member Tests are described by Forrester and Senge [90], and well elaborated by
Sterman [5] (Section 21.4.9).

Application to Prediction Assessment
System dynamics is principally concerned with identifying the structure of a system, and
structural similarities between systems. These similarities allow a model developed in the
context of one system to be applied to another, structurally similar system with few changes.
Teekasap tests a model of the economy of Thailand by evaluating its ability to fit data for the
structurally similar Malaysian economy.

5.1.4 FREQUENCY SPECTRUM ANALYSIS
For a description of the method, see section 2.1.2.

Application to Point Prediction Assessment

In systems with oscillation and noise, small changes in system parameters can lead to large
deviation between model prediction and system observation after only a few periods. Spectral
analysis, however, can reveal similarity between the strength of oscillatory modes excited in the
model and the real system.

Eberlein and Wang demonstrate the use of spectral analysis to evaluate the ability of a model
to replicate behavior modes by comparing power spectral density of the predicted and observed
behavior in the frequency domain region of interest.
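
A small illustration of why the frequency domain can be more forgiving than a point-by-point comparison: the two synthetic series below oscillate at the same frequency but are a quarter cycle out of phase. This is a sketch with made-up signals, using a direct discrete Fourier transform rather than any particular package.

```python
import cmath
import math

def power_spectrum(series):
    """Periodogram via a direct discrete Fourier transform (O(n^2), short series only)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

def dominant_frequency(series):
    """Index of the strongest non-zero frequency bin."""
    spectrum = power_spectrum(series)
    return max(range(1, len(spectrum)), key=spectrum.__getitem__)

n = 64
observed = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
simulated = [math.sin(2 * math.pi * 4 * t / n + math.pi / 2) for t in range(n)]

rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)
peak_observed = dominant_frequency(observed)
peak_simulated = dominant_frequency(simulated)
print(f"point-wise RMSE {rmse:.2f}; dominant bins {peak_observed} and {peak_simulated}")
```

The point-wise error is as large as the signal amplitude, yet both series place their power in the same frequency bin.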

5.2 COMPARE STATISTICAL PREDICTIONS WITH NUMERICAL DATA

If models are calibrated with distributions of parameter values, the modeler can develop
statistical predictions of output values. In this case, she can calculate the likelihood of the
observed data given the assumption that the model is correct. The modeler uses statistical and
graphical measures to determine how well the model fits the data.

5.2.1 MONTE CARLO ANALYSIS

Monte Carlo Analysis is helpful when the parameters of a model are given as statistical
distributions, and we want to find a statistical prediction (as opposed to a point prediction) of
the output behavior of the system.

Description

Monte Carlo Analysis draws a set of parameters from a distribution of possible values, and uses
those parameters to execute a dynamic model. The output is recorded and the process repeated
on the order of tens of thousands of times. The collected output values are summarized using a
histogram or other density estimation method to generate an expected distribution of the
behavior of the system given each input distribution.
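
The loop described above can be sketched with a one-stock model. The model, the parameter distribution, and the run count (kept to a few thousand here so the example runs quickly) are all assumptions made for illustration.

```python
import random

def simulate_stock(growth_rate, initial=100.0, dt=0.25, horizon=10.0):
    """Euler integration of d(stock)/dt = growth_rate * stock; returns the final value."""
    stock = initial
    for _ in range(round(horizon / dt)):
        stock += dt * growth_rate * stock
    return stock

def percentile(sorted_values, q):
    """Value at quantile q of an already sorted list (simple index rule)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

rng = random.Random(42)
runs = 5000
# Draw the uncertain parameter, run the model, and collect the outputs.
finals = sorted(simulate_stock(rng.gauss(0.05, 0.01)) for _ in range(runs))

p05, p50, p95 = (percentile(finals, q) for q in (0.05, 0.50, 0.95))
print(f"final stock: 5% {p05:.1f}, median {p50:.1f}, 95% {p95:.1f}")
```

The sorted outputs can equally be passed to a histogram or kernel density estimate; percentiles are used here only to keep the sketch short.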

Monte Carlo Analysis was developed to aid in nuclear energy calculations by Metropolis,
Rosenbluth, and Teller [38], and expanded and explained by Metropolis and Ulam. There are a
variety of articles and books detailing Monte Carlo and its derivatives.

Most statistical packages are capable of performing Monte Carlo Analysis, and the basic
techniques are not difficult to implement.

Application to Statistical Prediction Assessment

Monte Carlo Analysis gives the modeler the ability to propagate uncertainty in parameter
estimates through to the model output. Well executed, the method can produce calibrated
statistical forecasts of system behavior. Hagenson discusses the use of Monte Carlo
techniques to study capacity for airfield repair. Santos et al. apply the method to simulation of
pulp prices. Moxnes applies Monte Carlo Analysis to discuss decision making with regard to
greenhouse gases, and Phillips discusses the use of Monte Carlo to ensure the robustness of
model conclusions.

6 POLICY DESIGN AND EVALUATION

When a modeler is confident that her model sufficiently represents the system, and replicates
its problem behavior, she begins to craft interventions to improve performance. This requires
her to identify places in the system where she is able to take action, and determine what type
of action will have the desired effect.

Omitted Modeling Tasks

From this section we omit the qualitative tasks of brainstorming model structural changes, and
the tasks that require no new data techniques: testing for policy compatibility, robustness to
model uncertainty, etc.

6.1 EXPLORE PARAMETER CHANGE POLICIES

When leverage points have been identified, a modeler can use some form of optimization to
discover new values to drive parameters towards, respecting the costs of doing so.

6.1.1 OPTIMIZATION
The term ‘Optimization’ covers a range of methods for choosing a set of input conditions that
maximizes or minimizes a desired output state of a model.

Description

Many types of optimization exist, each with a variety of algorithmic implementations: slope
following algorithms, edge condition assessment, stochastic methods, genetic algorithms and
others. The best optimization for a specific problem depends largely on the topography of the
reward function in parameter space. For a brief overview of optimization methods in the
context of supply chain analysis, see Christou (Ch 2). For a more detailed, modern
introduction, see Chong and Zak.

Optimization is well established in Engineering and Applied Mathematics. Most computational
environments include some form of optimization; examples include Python (SciPy, DEAP),
Matlab (Optimization Toolbox), R (native, GA), Vensim, and AnyLogic.
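
As a minimal sketch of optimizing a single policy parameter, the example below tunes the adjustment time of a hypothetical stock-correction policy. Golden-section search is used because it is short and needs no derivatives; it is a stand-in chosen for illustration, not one of the methods named above, and it assumes the cost has a single minimum in the search interval. The model and cost weights are also assumptions.

```python
import math

def policy_cost(adjustment_time, target=100.0, initial=50.0,
                effort_weight=4.0, dt=0.05, horizon=40.0):
    """Simulate a stock-correction policy; cost penalizes both the gap and the effort."""
    stock, cost = initial, 0.0
    for _ in range(round(horizon / dt)):
        correction = (target - stock) / adjustment_time
        cost += dt * ((target - stock) ** 2 + effort_weight * correction ** 2)
        stock += dt * correction
    return cost

def golden_section_min(f, lo, hi, tol=1e-3):
    """Shrink [lo, hi] around the minimum of a single-minimum function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

best_time = golden_section_min(policy_cost, 0.5, 10.0)
print(f"best adjustment time {best_time:.2f}, cost {policy_cost(best_time):.1f}")
```

Correcting too quickly is expensive in effort and correcting too slowly leaves the gap open, so the cost has an interior minimum for the search to find.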

Application to Policy Change Identification

The choice of a policy intervention in System Dynamics balances a number of factors, including
performance, viability, and robustness. Optimization can be used to tailor strategies for peak
performance subject to realistic constraints. Coyle begins a conversation about optimization
methods for policy design in System Dynamics that is continued by Macedo. Graham and
Ariza demonstrate the use of policy optimization in the context of market placement strategy.

6.2 ADAPT POLICY IN LIGHT OF NEW INFORMATION

Frequently, the optimal intervention strategy is not to change a parameter or system structure
once, but to respond dynamically to the state of the system continuously or at regular intervals.
In these cases the modeler can draw from control theory and sequential decision theoretic
approaches to plan her interventions.

Hamarat, Pruyt and Loonen discuss the need and potential for adaptive, robust policy making
based upon System Dynamics models.

6.2.1 Q LEARNING

Q Learning (a form of Reinforcement Learning, closely related to Approximate Dynamic
Programming) is a method of sequential decision making which optimizes the balance between
immediate and future payoff, taking advantage of new information about state and
uncertainties that becomes available as time progresses.

Description

Q-Learning solves a dynamic programming problem that computes the expected future payoff
for a variety of possible future states, and suggests the optimal decision at each time-step
based upon the assumption that optimal decisions will also be made in future states.

Q-Learning comes from the Machine Learning and Decision Theory traditions. It was first
articulated by Watkins in his PhD thesis, and later elaborated in Machine Learning. Cybenko,
Gray and Moizumi give a tutorial and overview of the method. Several packages implement
Q-learning algorithms: Python (Reinforcement Learning Toolkit), R (qLearn); a variety of other
unpolished code examples are available.

FIGURE 7: LEARNING AND STATE AGGREGATION PROCESS IN RAHMANDAD AND FALLAH-FINI.
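
A tabular version of the update can be sketched on a toy problem. The environment below (a five-state chain with a payoff only at the far end) and the learning settings are assumptions for illustration and are unrelated to the cited applications.

```python
import random

N_STATES = 5                      # states 0..4; state 4 is the terminal goal
ACTIONS = ("left", "right")

def step(state, action):
    """Hypothetical environment: move along the chain, payoff 1 on reaching the goal."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                       # cap on episode length
            if rng.random() < epsilon:             # explore
                action = rng.choice(ACTIONS)
            else:                                  # exploit, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward = step(state, action)
            # Future value assumes the best known action is taken from the next state.
            future = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
            if state == N_STATES - 1:
                break
    return q

q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

In a System Dynamics setting the `step` function would be replaced by a call that advances the simulation model by one decision interval.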

Application to Policy Adaptation

Q-learning adds to System Dynamics modeling a programmatic method for structuring adaptive
policies capable of dealing with uncertainty. Rahmandad and Fallah-Fini introduce the use of
Q-learning and system dynamics models for adaptive policy development.

7 CONCLUSION

In conducting this survey, we have identified a number of data techniques with potential to
support System Dynamics that do not seem to be in common use. In Problem
Conceptualization: Frequency analysis for time horizon and resolution determination. In Model
Formulation: Frequency domain regression, Indirect Inference and System Identification, model-
based interpolation and filtering for exogenous inputs, Sequential Monte Carlo. In Model
Testing: Brier Score, Reliability and Sharpness Diagrams. For Policy Development and
Verification: Model-Predictive Control, virtual control groups, dynamic models for real-time
system inference and monitoring.

The breadth of techniques for data inclusion in System Dynamics models, and the diversity of
applications in which they can be useful mean that any survey of this type will be incomplete;
and the rapid pace of new development means that there will soon be techniques to add to this
list. Readers should continue to monitor developments in data mining, machine learning, and
computer science to see how lessons from these fields can be brought to bear on dynamic
models of complex systems.

FUNDING ACKNOWLEDGEMENT

This work was supported in part by the Office of Naval Research under award number
N00014-9-1-0597. Any opinions, findings, and conclusions or recommendations expressed in this
publication are those of the author(s) and do not necessarily reflect the views of the Office of
Naval Research.

This work was supported in part by the Defense Advanced Research Projects Agency
(government grant number D14AP00001). The views, opinions, and/or findings contained in this
article are those of the author and should not be interpreted as representing the official views
or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or
the Department of Defense. Approved for Public Release, Distribution Unlimited.

Research reported in this publication was supported in part by the Charles Stark Draper
Laboratory’s University Research and Development program. Any opinions, findings, and
conclusions or recommendations expressed in this publication are those of the authors and do
not necessarily reflect the views of the Charles Stark Draper Laboratory.

BIBLIOGRAPHY

1. Forrester JW. Information Sources for Modeling the National Economy. J Am Stat Assoc. 555.
doi:10.2307/2287644.

2. Mullainathan S. What Big Data Means For Social Science. Edge.org.
http://www.edge.org/panel/headcon-13-part-i-what-big-data-means-for-social-science. Published
November 11, 2013. Accessed March 6, 2014.

3. Peterson DW. Statistical Tools for System Dynamics. In: Proceedings of the International System
Dynamics Conference; 1976:841. Available at:
http://www.systemdynamics.org/conferences/1976/proceed/peter841.pdf. Accessed March 6, 2014.

4. Eberlein RL, Wang Q. Statistical Estimation and System Dynamics Models. In: Proceedings of the
International System Dynamics Conference; 1985:206. Available at:
http://www.systemdynamics.org/conferences/1985/proceed/eberl206.pdf. Accessed March 6, 2014.

5. Sterman J. Business dynamics; 2000. Available at:
http://www.citeulike.org/group/1702/article/986718. Accessed March 13, 2014.

6. Sapsford R. Data Collection and Analysis Second Edition. Available at:
http://www.uk.sagepub.com/textbooks/Book226213. Accessed March 6, 2014.

7. Randers J. Guidelines for model conceptualization. Elem Syst Dyn method. 1980. Available at:
http://scholar.google.com/scholar?hl=en&q=Randers,+J.+1980.+"Guidelines+for+Model+Conceptualization"&btnG=&as_sdt=1,22#0.
Accessed March 11, 2014.

8. Mashayekhi AN, Ghili S. System Dynamics Problem Definition as an Evolutionary Process Using
Ambiguity Concept. In: Proceedings of the International System Dynamics Conference; 2010:1157.
Available at: http://www.systemdynamics.org/conferences/2010/proceed/papers/P1157.pdf.
Accessed March 11, 2014.

9. Saeed K. Defining a problem or constructing a reference mode. Proc Int Syst Dyn Conf. 1998:16.
Available at: http://www.systemdynamics.org/conferences/1998/PROCEED/00016.PDF. Accessed March 6,
2014.

10. VanderWerf P. The use of reference modes in System Dynamics modeling. In: Proceedings of the
International System Dynamics Conference; 1981. Available at:
http://www.systemdynamics.org/conferences/1981/proceed/vande108.pdf. Accessed March 6, 2014.

11. Tukey J. Exploratory data analysis. 1977. Available at:
http://xa.yimg.com/kq/groups/16412409/1159714453/name/exploratorydataanalysis.pdf. Accessed
March 8, 2014.

12. Tufte E, Graves-Morris P. The visual display of quantitative information; 1983. Available at:
http://www.eng.auburn.edu/users/fmm0002/DVsyllabus.pdf. Accessed March 11, 2014.

13. Yau N. Visualize this: the FlowingData guide to design, visualization, and statistics; 2011.
Available at:
http://books.google.com/books?hl=en&lr=&id=G1Z3EEUOTroC&oi pg=PR13&dq=Vvisualize+this+
nath YgLMId i JnLE! WMWRcLrPqQmd-gEM4. Accessed March 11, 2014.

14. Keim DA. Information visualization and visual data mining. IEEE Trans Vis Comput Graph.
2002;8(1):1-8. doi:10.1109/2945.981847.

15. Numpy Developers. NumPy Documentation. 2013. Available at: http://www.numpy.org/. Accessed
March 13, 2014.

16. Hunter J, Dale D, Firing E, Droettboom M, The matplotlib Development Team. matplotlib: Python
Plotting — Documentation. 2013. Available at: http://matplotlib.org/. Accessed March 13, 2014.

17. Continuum Analytics. Bokeh — Documentation. 2013. Available at: http://bokeh.pydata.org/.
Accessed March 13, 2014.

18. D3.js - Data-Driven Documents. Available at: http://d3js.org/. Accessed March 18, 2014.

19. Martinez W, Martinez A, Solka J. Exploratory data analysis with MATLAB. 2004. Available at:
http://books.google.com/books?hl=en&l RVOOXxp6QC d&pg=PP1&0ts=tIRV2plgd9&sig=sR3Wf9W5dGVvYSI-EIkEiVpfu7HE.
Accessed February 4, 2014.

20. Gnuplot. Available at: http://www.gnuplot.info/. Accessed March 18, 2014.

21. Khan N, McLucas A, Linard K. Development of a reference mode for characterisation of
salinity problem in the Murray Darling Basin. 22nd Int Syst .... 2004. Available at:
http://www.systemdynamics.org/conferences/2004/SDS_2004/PAPERS/247KHAN.pdf. Accessed
March 13, 2014.

22. Cooley JW, Tukey JW. An algorithm for the machine calculation of complex Fourier series. Math
Comput. 1965;19(90):297-297. doi:10.1090/S0025-5718-1965-0178586-1.

23. Duhamel P, Vetterli M. Fast fourier transforms: A tutorial review and a state of the art. Signal
Processing. 1990;19(4):259-299. doi:10.1016/0165-1684(90)90158-U.

24. Numpy Developers. Discrete Fourier Transform (numpy.fft) — NumPy v1.8 Manual. 2013. Available
at: http://docs.scipy.org/doc/numpy/reference/routines.fft.html. Accessed March 6, 2014.

25. R: The R Stats Package. Available at:
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/00Index.html. Accessed March 18, 2014.

26. MATLAB Documentation. Available at: http://www.mathworks.com/help/matlab/. Accessed March 18,
2014.

27. Arango S, Moxnes E. Cyclical behaviour in electricity markets: an experimental study. ... Exp
Electr Mark. 2006. Available at: http://qui .banrep.gov.co icaciones/pdf/electricity.pdf.
Accessed March 18, 2014.

28. Enns RH, McGuire GC. Computer Algebra Recipes: An Advanced Guide to Scientific Modeling. New
York, NY: Springer New York; 2007. doi:10.1007/978-0-387-49333-6.

29. Tien NT. Phase Plane Analysis. In: Applied Nonlinear Control; 2002. Available at:
http://www4.hcmut.edu.vn/~nttien/Lectures/Applied nonlinear control/C.2 Phase Plane Analysis.pdf.
Accessed March 14, 2014.

30. Tseng ZS. The Phase Plane. 2008. Available at:
http://www.math.psu.edu/tseng/class/Math251/Notes-PhasePlane.pdf. Accessed March 14, 2014.

31. Clewley R. Hybrid models and biological model reduction with PyDSTool. PLoS Comput Biol. 2012.
Available at: http://dx.plos.org/10.1371/journal.pcbi.1002628. Accessed March 18, 2014.

32. Guneralp B. Exploring Structure-Behaviour Relations: Eigenvalues and Eigenvectors versus Loop
Polarities. In. ... 22nd Int Conf... 2004. Available at:
http://www.systemdynamics.org/conferences/2004/SDS_2004/PAPERS/346GUNER.pdf. Accessed
March 14, 2014.

33. Rahn RJ. Aggregation in system dynamics. Syst Dyn Rev. 1985;1(1):111-122.
doi:10.1002/sdr.4260010109.

34. Xu R, Wunsch D. Survey of clustering algorithms. IEEE Trans Neural Netw. 2005;16(3):645-78.
doi:10.1109/TNN.2005.845141.

35. Clustering — scikit-learn 0.14 documentation. Available at:
http://scikit-learn.org/stable/modules/clustering.html. Accessed March 6, 2014.

36. Onsel N, Onsel I, Yücel G. Evaluation of alternative dynamic behavior representations for
automated model output classification and clustering. systemdynamics.org. Available at:
http://www.systemdynamics.org/conferences/2013/proceed/papers/P1341.pdf. Accessed March 18,
2014.

37. Pruyt E. Doing more with Models: Illustration of a SD Approach for exploring deeply uncertain
issues, analyzing models, and designing adaptive robust policies. Available at:
http://www.systemdynamics.org/conferences/2013/proceed/papers/P1012.pdf. Accessed March 6,
2014.

38. Metropolis N. Equation of state calculations by fast computing machines. J... 2004. Available at:
http://scitation.aip.org/content/aip/journal/jcp/21/6/10.1063/1.1699114. Accessed March 14, 2014.

39. Hastings W. Monte Carlo sampling methods using Markov chains and their applications. Biometrika.
1970. Available at: http://biomet.oxfordjournals.org/content/57/1/97.short. Accessed March 8, 2014.

40. Geman S, Geman D. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of
Images. IEEE Trans Pattern Anal Mach Intell. 1984;PAMI-6(6):721-741.
doi:10.1109/TPAMI.1984.4767596.

41. Andrieu C, Freitas N De, Doucet A, Jordan M. An introduction to MCMC for machine learning. Mach
Learn. 2003. Available at: http://link.springer.com/article/10.1023/A:1020281327116. Accessed
March 8, 2014.

42. Brooks S. Markov chain Monte Carlo method and its application. J R Stat Soc Ser D (The Stat).
1998;47(1):69-100. doi:10.1111/1467-9884.00117.

43. Fonnesbeck CJ. PyMC User’s Guide. 2012. Available at: http://pymc-devs.github.io/pymc/. Accessed
March 8, 2014.

44. The BUGS Project - Bayesian inference Using Gibbs Sampling. Available at:
http://www.mrc-bsu.cam.ac.uk/bugs/. Accessed March 18, 2014.

45. Andrieu C, Djurić P, Doucet A. Model selection by MCMC computation. Signal Processing. 2001.
Available at: http://www.sciencedirect.com/science/article/pii/S0165168400001882. Accessed
March 8, 2014.

46. Jeffreys H. Some tests of significance, treated by the theory of probability. Cambridge Univ
Press. Available at:
http://journals.cambridge.org/production/action/cjoGetFulltext?fulltextid=2109108.
Accessed February 4, 2014.

47. Kass R, Raftery A. Bayes factors. J Am Stat .... 1995. Available at:
http://amstat.tandfonline.com/doi/full/10.1080/01621459.1995.10476572. Accessed February 4,
2014.

48. Raftery A. Bayesian model selection in social research. Sociol Methodol. 1995. Available at:
https://www.stat.washington.edu/raftery/Research/PDF/socmeth1995.pdf. Accessed February 4,
2014.

49. Wright S. Correlation and Causation. J Agric Res. 1921. Available at:
http://www.citeulike.org/group/3214/article/1615595. Accessed March 17, 2014.

50. Ullman J. Structural equation modeling: Reviewing the basics and moving forward. J Pers Assess.
2006. Available at: http://www.tandfonline.com/doi/abs/10.1207/s15327752jpa8701_03. Accessed
March 16, 2014.

51. Kline R. Principles and practice of structural equation modeling; 2011. Available at:
http://books.google.com/books?hl=en&lr=&i XOC&oi=fnd &pg=PR1 Ac
4Pb1P&sig=-z-eyTs_2DmdBi6kCIAykLAhpU. Accessed March 17, 2014.

52. Bollen K. Structural equation models; 1998. Available at:
http://onlinelibrary.wiley.com/doi/10.1002/0470011815.b2a13089/full. Accessed March 17, 2014.

53. Medina-Borja A, Pasupathy K. Uncovering Complex Relationships in System Dynamics Modeling:
Exploring the Use of CART, CHAID and SEM... Syst Dyn .... 2007. Available at:
http://www.systemdynamics.org/conferences/2007/proceed/papers/MEDIN423.pdf. Accessed March
17, 2014.

54. Roy S, Mohapatra P. Causality and validation of system dynamics models incorporating soft
variables: Establishing an interface with structural equation modelling. ... 2000 Int Syst Dyn ....
2000. Available at: http://www.systemdynamics.org/conferences/2000/PDFs/roy319p.pdf. Accessed
March 17, 2014.

55. Helton JC, Johnson JD, Sallaberry CJ, Storlie CB. Survey of sampling-based methods for
uncertainty and sensitivity analysis. Reliab Eng Syst Saf. 2006;91(10-11):1175-1209.
doi:10.1016/j.ress.2005.11.017.

56. Herman JD, Reed P. SALib Documentation. 2013. Available at: http://jdherman.github.io/SALib/.
Accessed March 14, 2014.

57. Sharp JA. Sensitivity Analysis Methods for System Dynamics Models. In: Proceedings of the
International System Dynamics Conference; 1976:761. Available at:
http://www.systemdynamics.org/conferences/1976/proceed/sharp761.pdf. Accessed March 14, 2014.

58. Powell D, Fair J. Sensitivity analysis of an infectious disease model. Submitt to 2005 .... 2005.
Available at: http://www.systemdynamics.org/conferences/2005/proceed/papers/LECLA330.pdf.
Accessed March 14, 2014.

59. Hekimoğlu M, Barlas Y. Sensitivity analysis of system dynamics models by behavior pattern
measures. ... Dyn Soc Syst Dyn .... 2010. Available at:
http://www.systemdynamics.org/conferences/2010/proceed/papers/P1260.pdf. Accessed March 14,
2014.

60. Welch W, Buck R, Sacks J, Wynn H. Screening, predicting, and computer experiments. .... 1992.
Available at: http://www.tandfonline.com/doi/abs/10.1080/00401706.1992.10485229. Accessed
March 17, 2014.

61. Ford A, Flynn H. Statistical screening of system dynamics models. Syst Dyn Rev. 2005. Available
at: http://onlinelibrary.wiley.com/doi/10.1002/sdr.322/abstract. Accessed March 17, 2014.

62. Taylor T, Ford D, Ford A. Improving model understanding using statistical screening. Syst Dyn
Rev. 2010. Available at: http://onlinelibrary.wiley.com/doi/10.1002/sdr.428/abstract. Accessed
March 17, 2014.

63. Graham AK. Parameter Formulation and Estimation in System Dynamics Models. In: Proceedings of
the International System Dynamics Conference; 1976:541. Available at:
http://www.systemdynamics.org/conferences/1976/proceed/graha541.pdf. Accessed March 10, 2014.

64. Efron B. Bootstrap methods: another look at the jackknife. Ann Stat. 1979. Available at:
http://www.jstor.org/stable/2958830. Accessed March 17, 2014.

65. Henderson AR. The bootstrap: a technique for data-driven statistics. Using computer-intensive
analyses to explore experimental data. Clin Chim Acta. 2005;359(1-2):1-26.
doi:10.1016/j.cccn.2005.04.002.

66. Diaconis P, Efron B. Computer-intensive methods in statistics. Sci Am. 1983. Available at:
http://statweb.stanford.edu/~ckirby/techreports/NSF/EFS NSF 196.pdf. Accessed March 17, 2014.

67. Evans C. SciKits - scikits.bootstrap. Available at: http://scikits.appspot.com/bootstrap.
Accessed March 18, 2014.

68. Canty A, Ripley B. R - Package “boot.” Available at:
http://cran.r-project.org/web/packages/boot/index.html. Accessed March 18, 2014.

69. Dogan G. Bootstrapping for confidence interval estimation and hypothesis testing for parameters
of system dynamics models. Syst Dyn Rev. 2007. Available at:
http://onlinelibrary.wiley.com/doi/10.1002/sdr.362/full. Accessed March 17, 2014.

70. Ryan T. Modern regression methods; 2008. Available at:
http://books.google.com/books?hl=en&lr=&id=ed_}Pj2pqbMC&oi=fnd 1&dq=r i ho
ds&ots=Uzxi6YsYMJ&sig=qZ_8Q1) oSiqPM. Accessed March 13, 2014.

71. Vinod H, Ullah A. Recent advances in regression methods. 1981. Available at:
http://www.getcited.org/pub/102140308. Accessed March 13, 2014.

72. StatsModels Documentation. Available at: http://statsmodels.sourceforge.net/. Accessed March 13,
2014.

73. Higuchi T. Parameter Estimation in System Dynamics Model by Multi-Optimization Technique. Proc
Int Syst Dyn Conf. 1996:221. Available at:
http://www.systemdynamics.org/conferences/1996/proceed/papers/higuc221.pdf. Accessed March
17, 2014.

74. Mayerthaler A. A land-use/transport interaction model for Austria. Proc 27th .... 2009. Available
at: http://www.systemdynamics.org/conferences/2009/proceed/papers/P1239.pdf. Accessed March 17,
2014.

75. Osgood N. Bayesian Parameter Estimation of System Dynamics Models Using Markov Chain Monte
Carlo Methods: An Informal Introduction. Proc 31st Int Conf Syst Dyn Soc. 2013:1391. Available at:
http://www.systemdynamics.org/conferences/2013/proceed/papers/P1391.pdf. Accessed March 14,
2014.

76. Kalman R. A new approach to linear filtering and prediction problems. J basic Eng. 1960.
Available at: http://flui digital collection.asme.org/article.aspx?articleid=1430402. Accessed
March 13, 2014.

77. Plessis R Du. Poor Man's Explanation of Kalman Filtering. Autonetics Div North Am Rockwell ....
1967. Available at: http://www.forth.org/fd/FD-V20N2.pdf. Accessed March 13, 2014.

78. Awasthi V, Raj K. A Survey on the Algorithms of Kalman Filter and Its Variants in State
Estimation. Vis Soft Research Dev. 2011;2(2):73. Available at:
http://www.vsrdjournals.com/vsrd/Issue/2011_Feb/5_Vishal_Awasthi_Review_Article_Feb_2011.pdf.
Accessed March 18, 2014.

79. pykalman - dead-simple Kalman Filter, Kalman Smoother, and EM library for Python. Available at:
http://pykalman.github.io/. Accessed March 8, 2014.

80. Tusell F. Kalman filtering in R. J Stat Softw. 2011. Available at:
http://stat-www.berkeley.edu/users/brill/Stat248/kalmanfiltering.pdf. Accessed March 13, 2014.

81. Ryzhenkov A. A historical fit of a model of the US long wave. Proc of the. 2002. Available at:
http://www.systemdynamics.org/conferences/2002/proceed/papers/Ryzhenk1.pdf. Accessed March
13, 2014.

82. Shiryaev V, Shiryaev E. Adaptation and Optimal Control of Firm and its State and Parameters
Estimation at Change of a Market Situation. Proc .... 2002. Available at:
http://www.systemdynamics.org/conferences/2002/proceed/papers/Shiryae1.pdf. Accessed March
13, 2014.

83. Python Data Analysis Library — pandas: Python Data Analysis Library. Available at:
http://pandas.pydata.org/. Accessed March 18, 2014.

84. Franco D. Fifty years of table functions. Proc 25th Int Conf... 2007. Available at:
http://www.systemdynamics.org/conferences/2007/proceed/papers/FRANC270.pdf. Accessed March
18, 2014.

85. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math
Biophys. 1943;5(4):115-133. doi:10.1007/BF02478259.

86. Holeňa M, Linke D, Rodemerck U, Bajer L. Neural networks as surrogate models for measurements
in optimization algorithms. ... Stoch Model .... 2010. Available at:
http://link.springer.com/chapter/10.1007/978-3-642-13568-2_25. Accessed March 18, 2014.

87. neurolab - Simple and powerfull neural network library for python - Google Project Hosting.
Available at: https://code.google.com/p/neurolab/. Accessed March 18, 2014.

88. Schaul T, Bayer J, Wierstra D, Sun Y. PyBrain. J Mach .... 2010. Available at:
http://dl.acm.org/citation.cfm?id=1756030. Accessed March 18, 2014.

89. Alborzi M. Implanting Neural Network Elements in System Dynamics Models to Surrogate Rate and
Auxiliary Variables. 2006. Available at:
http://www.systemdynamics.org/conferences/2006/proceed/papers/ALBOR145.pdf. Accessed March
18, 2014.

90. Forrester J, Senge P. Tests for building confidence in system dynamics models; 1978. Available
at: http://www.sdmodelbook.com/uploadedfile/32_0f93d902-b52f-4b1f-a658-5a5d144c3d1b_Tests for
Building Confidence JWF Senge.pdf. Accessed March 18, 2014.

91. Barlas Y. Formal aspects of model validity and validation in system dynamics. Syst Dyn Rev. 1996.
Available at: http://www.ie.boun.edu.tr/labs/sesdyn/publications/articles/Barlas_1996.pdf. Accessed
March 18, 2014.

92. Sterman J. Appropriate summary statistics for evaluating the historical fit of system dynamics
models. Dynamica. 1984. Available at:
http://www.systemdynamics.org/conferences/1983/proceed/plenary/sterm203.pdf. Accessed March
14, 2014.

93. Oliva R. A Vensim® Module to Calculate Summary Statistics for Historical Fit. Available at:
http://www.metasd.com/models/Library/Misc/TheilStatistics/D4584theil.pdf. Accessed March 14,
2014.

94. Stephan T. The use of statistical measures to validate system dynamics models. 1992. Available at:
http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA248569. Accessed
March 14, 2014.

95. Picard R, Cook R. Cross-validation of regression models. J Am Stat .... 1984. Available at:
http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1984.10478083. Accessed March 13, 2014.

96. Starkweather J. Cross Validation techniques in R: A brief overview of some methods, packages, and
functions for assessing prediction models. Available at:
http://www.unt.edu/rss/class/Jon/Benchmarks/CrossValidation1_JDS_May2011.pdf. Accessed March
13, 2014.

97. Randers J. Prediction of Pulp Prices - A Review Two Years Later. Proc Int Syst Dyn Conf. 1984:238.
Available at: http://www.systemdynamics.org/conferences/1984/proceed/rander238.pdf. Accessed
March 18, 2014.

98. Teekasap P. Dynamics of Technology Spillover through Foreign Direct Investment in Thailand under
R&D Consortia Policy. In: Proceedings of the International System Dynamics Conference.; 2010:1049.
Available at: http://www.systemdynamics.org/conferences/2010/proceed/papers/P1049.pdf.
Accessed March 17, 2014.

99. Eberlein R, Wang Q. Validation of oscillatory behavior modes using spectral analysis. Proc Int
Syst .... 1983. Available at:
http://www.systemdynamics.org/conferences/1983/proceed/parallel-vol2/eberl952.pdf. Accessed
March 11, 2014.

100. Metropolis N, Ulam S. The Monte Carlo method. J Am Stat .... 1949. Available at:
http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1949.10483310. Accessed March 13, 2014.

101. Hagenson N. System Dynamics Combined with Monte Carlo Simulation. System. 1990. Available at:
http://www.systemdynamics.org/conferences/1990/proceed/pdfs/hagen468.pdf. Accessed March 18,
2014.

102. Santos E, Galli G, Nahmias T, Sienra R, Archipavas J. Case study: Scenario and Risk Analysis in the
Pulp Industry using System Dynamics and Monte Carlo Simulation. systemdynamics.org. Available at:
http://www.systemdynamics.org/conferences/2013/proceed/papers/P1138.pdf. Accessed March 18,
2014.

103. Moxnes E. System Dynamics and Decisions Under Uncertainty. In: Proceedings of the International
System Dynamics Conference.; 1990:798. Available at:
http://www.systemdynamics.org/conferences/1990/proceed/pdfs/moxne798.pdf. Accessed March 18,
2014.

104. Phillips W. Monte Carlo Tests of Conclusion Robustness. In: Proceedings of the International
System Dynamics Conference. 1976:873. Available at:
http://www.systemdynamics.org/conferences/1976/proceed/phill873.pdf. Accessed March 18, 2014.

105. Christou IT. Quantitative Methods in Supply Chain Management. London: Springer London; 2012.
doi:10.1007/978-0-85729-766-2.

106. Chong E, Zak S. An introduction to optimization.; 2013. Available at:
http://books.google.com/books?lr=&id=iD5sOiKXHP8C&oi=fnd&pg=PT15&dq=An+Introduction+to+Optimization&ots=3PrpeYCrcd&sig=jAIZgHTZMNo7ScTdlyyKSdAL_2k.
Accessed March 18, 2014.

107. Fortin F, Rainville D. DEAP: Evolutionary algorithms made easy. J Mach .... 2012. Available at:
http://dl.acm.org/citation.cfm?id=2503311. Accessed March 18, 2014.

108. Coyle RG. The use of optimization methods for policy design in a system dynamics model. Syst Dyn
Rev. 1985;1(1):81-91. doi:10.1002/sdr.4260010107.

109. Macedo J. A reference approach for policy optimization in system dynamics models. Syst Dyn Rev.
1989;5(2):148-175. doi:10.1002/sdr.4260050205.

110. Graham AK, Ariza CA. Dynamic, hard and strategic questions: using optimization to answer a
marketing resource allocation question. Syst Dyn Rev. 2003;19(1):27-46. doi:10.1002/sdr.264.

111. Hamarat C, Pruyt E, Loonen E. A Multi-Pathfinder for Developing Adaptive Robust Policies in System
Dynamics. systemdynamics.org. Available at:
http://www.systemdynamics.org/conferences/2013/proceed/papers/P1367.pdf. Accessed March 18,
2014.

112. Watkins C. Learning from delayed rewards. 1989. Available at:
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.330022. Accessed March 18, 2014.

113. Watkins C, Dayan P. Q-learning. Mach Learn. 1992. Available at:
http://link.springer.com/article/10.1007/BF00992698. Accessed March 18, 2014.

114. Cybenko G, Gray R, Moizumi K. Q-Learning: A Tutorial and Extensions. In: Mathematics of Neural
Networks. Vol 8. Boston, MA: Springer US; 1997:24-33. Available at:
http://www.springerlink.com/index/10.1007/978-1-4615-6099-9. Accessed March 18, 2014.

115. Reinforcement Learning Toolkit: Reinforcement Learning and Artificial Intelligence. Available at:
http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLtoolkit/RLtoolkit1.0.html. Accessed March 18,
2014.

116. Rahmandad H, Fallah-Fini S. Learning Control Policies in System Dynamics Models.
systemdynamics.org. Available at:
http://www.systemdynamics.org/conferences/2008/proceed/papers/RAHMA388.pdf.
Accessed March 18, 2014.
