Bakken, Bent E., "Learning in Dynamic Simulation Games; Using Performance as a Measure", 1989

Learning in dynamic simulation games; using performance as a measure

Bent E. Bakken
System Dynamics Group
Massachusetts Institute of Technology
Cambridge, MA 02141, USA

Abstract

In a dynamic simulation game portraying a multiplier-accelerator investment problem, there are major differences between high and low
performers; high performers voice specific concerns for future states of the system, while low performers are less likely to think about
the future. Planning, especially planning that takes the deceptive nature of feedback into account, is necessary in systems that exhibit
diverging long- and short-term behaviors. A comparison of game results with written reports shows that there is a positive relationship
between performance and understanding of the game. These results are contrary to previous research where performance and understanding
have been unrelated (Broadbent et al. 1978, 1986), but can be explained by the added complexity of non-linear feedback tasks with shifts
in loop dominance. Such tasks are, in contrast to simple regression model tasks, non-routine, and therefore the verbal and behavioral
aspects of decision makers’ mental models correspond.

Introduction

Compared to the complexities and uncertainties facing a manager, an airline pilot controls a transparent system. To
use a flight simulator analogy, the manager’s educational simulation requires that she learn to perform well in
the simulated environment but, more importantly, that she be able to devise appropriate actions in
situations different from those simulated. She must be able to re-conceptualize a problem and devise appropriate
actions in the performance of daily tasks without having access to the simulated environment.

By analyzing player performance in business games, one can gain insight into how managers
assimilate model insights. However, a link between performance and understanding is necessary for such
analyses to be fruitful. Previous studies, on the contrary, have documented that performance can be unrelated to
understanding (Broadbent et al., 1978; 1986). It is thus necessary to complement performance measures with
behavioral data to gain insight into how people make decisions and subsequently learn (Jacoby et al. 1984). If the
two types of measures contradict each other, the task of inferring mental processes can indeed be difficult. But as long as
they correspond, learning can be analyzed by using several data sources in a triangulating fashion, or one can
substitute performance measures for behavioral data. Such substitution is important, since performance indicators
are often more readily available than behavioral data.

This paper reports on an experiment in which we contrast performance and behaviorally derived data in a quasi-
continuous feedback game. Decision making in the game has been extensively reported elsewhere (Sterman 1987;
1989). Low transparency leads to poor performance and faulty decision making, which in turn can be explained
by players’ initial misperceptions of supply line feedback (Sterman 1989).

In addition to attempting to find a relationship between performance and other indicators of understanding, this
paper also provides a corroboration of the heuristics proposed by earlier work on statistical estimation of decision
rules in this game (Sterman 1986; 1989). We first briefly review relevant literature on learning in static and
various feedback tasks. We then describe the experiment and report its results. Finally, we discuss the
findings.

Previous work

Dynamic tasks are harder to research than static ones; they take longer to study, may require sophisticated
mathematics to solve for optimality and often demand (what used to be very) expensive computer set-ups (Slovic
et al. 1977). Led by Tversky and Kahneman (1981), research on static decision making tasks has mainly found
that people make inappropriate decisions and act inconsistently. The underlying explanation is, in common
language, that people act according to simple rules of thumb in order to cope with an everyday environment that
would be chaotic if they had to calculate optimal solutions. These simplifying heuristics can
perform well in real-world environments (Klayman and Young-Won 1987), but in experimental situations without
feedback, decision makers can make faulty inferences (Hogarth 1981).

In particular, most real-life situations are so constructed that cue redundancy and feedback will help a non-
reflective decision maker. Even if he follows simple rules of thumb, he might perform quite well, but often he
doesn’t; the problematic dimensions of situations in which people perform poorly are not well understood (Hogarth
1981). Some issues have been dealt with, however. In outcome feedback situations, i.e. in tasks where subjects
must infer the true relationship between two variables in the presence of several, often distorted cues, it has been
shown that non-linear relationships are hard to infer (Brehmer 1978). Beyond a certain noise level, subjects are
unable to make correct inferences.

However, real-life tasks are in general not inference tasks; they require action. When a person makes a decision, he
also acts and receives consequential feedback from the task. Hogarth (1981) has proposed that the existence of
continuous tasks can explain why people use heuristics requiring little cognitive effort; it may pay off to make
many decisions and adjust according to feedback instead of relying heavily on “one-shot” decisions involving
complex information processing. Hogarth’s view is consonant with Simon’s (1981) point that cognitive
processing need not be very complex; decision making environments are often so constructed that the rules can be
very simple, yet the outcome can be satisfactory. Studying a medical task with abundantly available
information, Kleinmuntz and Thomas (1987) show that there exist cognitive effort/accuracy payoffs. In their task,
subjects rely too much on inference when the use of simple heuristics, action and feedback would have yielded higher
performance.

Similarly, when individuals have available only action feedback, they misperceive the nature of simple structural
relationships (Sterman 1986, 1989). But more importantly, subjects tend to pay attention to salient features of the
task and not to subtle aspects. In particular, Sterman shows that by paying due attention to supply lines,
performance can be improved.
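
The role of the supply line can be illustrated with a minimal sketch of an anchoring-and-adjustment ordering heuristic of the general kind discussed in that work. The function name, the parameter names and every numeric value below are illustrative assumptions; they are not the estimated decision rule or its parameters.

```python
def capital_order(expected_loss, desired_capital, capital,
                  desired_supply_line, supply_line,
                  stock_weight=0.5, supply_line_weight=0.5):
    """Hypothetical ordering heuristic: replace expected losses, then
    correct gaps in the capital stock and in the supply line (capital
    already ordered but not yet delivered)."""
    stock_adjustment = stock_weight * (desired_capital - capital)
    supply_line_adjustment = supply_line_weight * (desired_supply_line - supply_line)
    # Orders cannot be negative in this sketch.
    return max(0.0, expected_loss + stock_adjustment + supply_line_adjustment)


# Same situation, with and without attention to capital already on order.
ignoring = capital_order(50, 600, 500, 150, 250, supply_line_weight=0.0)
attending = capital_order(50, 600, 500, 150, 250, supply_line_weight=0.5)
print(ignoring, attending)
```

With a supply line weight of zero, the hypothetical player keeps ordering for a capital gap that pending deliveries would already close, which is the kind of over-ordering the text associates with neglect of the supply line.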

Brehmer (1987, 1988) has investigated a similar simulated decision making task. Instead of Sterman's statistically
estimated decision rules, he uses other protocols to discuss differences in performance. Of particular interest is his
finding that high performing players use planning (or feedforward, as he puts it) to infer future states of the
system. In a task where conditions change exponentially, such as the forest firefighting task he looks at, this ability is
crucial; without planning a couple of time periods ahead, subjects never get their firefighters to the fire in time and
keep sending fire engines to areas that have already burned out.

Broadbent et al. (1986) show that there is not necessarily a link from performance to understanding; verbalizable
knowledge is not used in game playing, since high performers give wrong answers to questions and vice versa.
Their work posits that decisions are largely subconscious, and that conscious verbalizations are not connected to
actual decision making in simple inference tasks. However, if the task is sufficiently unfamiliar, as is the one we
have chosen, decision making and verbalization should correspond; according to Rasmussen (1976) and Ericsson
and Simon (1984), verbalizations will reflect underlying inference processes if a task departs sufficiently from
routine. Although Broadbent et al.’s subjects were unfamiliar with the management task they dealt with, its linear
nature led them to make a simple linear extrapolation, a meta-task they must have been familiar with. It therefore
resembled a routine task and was not amenable to protocol analysis and questionnaires.

The experimental task

The task discussed below is embedded in a game (Sterman 1986). Its structure is related to a phenomenon
called the economic long wave and illustrates how capital self-ordering can cause fluctuations in economic
activity. In the game, individuals are in charge of the capital ordering decision for a simulated company. The
tricky issue in the model arises from the firm’s need for its own capital to produce finished goods, and from the
fact that, for a period of time, increased capacity can only be obtained by restricting deliveries to final customers.
Initially this self-reinforcing feedback loop is hard to detect; thus overcapacity builds up and cycles with a period
of some 50 years develop.
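
The self-ordering loop can be made concrete with a small equilibrium calculation. The sketch assumes, for illustration only, that production capacity equals the capital stock and that a fixed fraction of capital wears out each period and must be re-ordered from the firm itself; the function names and all numbers are hypothetical.

```python
def equilibrium_capacity(goods_demand, wear_fraction):
    """Capacity at which output covers final demand plus the firm's own
    replacement orders: capacity = goods_demand + wear_fraction * capacity."""
    if not 0 <= wear_fraction < 1:
        raise ValueError("wear_fraction must be in [0, 1)")
    return goods_demand / (1.0 - wear_fraction)


def self_ordered_share(goods_demand, wear_fraction):
    """Share of equilibrium output that the firm orders from itself."""
    capacity = equilibrium_capacity(goods_demand, wear_fraction)
    return (capacity - goods_demand) / capacity


before = equilibrium_capacity(450, 0.10)
after = equilibrium_capacity(450 * 1.10, 0.10)  # one-time 10 % step in final demand
print(round(before, 1), round(after, 1))
```

Under these assumptions every unit of additional capacity adds its own replacement orders to the firm's workload, so the capacity needed in equilibrium is larger than final demand alone would suggest.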

One trial consists of 36 decisions with a given sequence of exogenous final consumer demand. There were 4
different demand patterns: (1) a one-time step of 10 %; (2) linear growth; (3) a sinusoidal pattern; and (4) a stable level with a
random component. The first demand pattern is called basic and the others are labeled advanced. All players
played the basic game at least once, and most of them played it several times. Approximately one third of the players were asked to
use each of the advanced demand patterns. Performance was worse in the advanced game, which reflects the
higher optimal score of these games (3-4 times the basic game’s optimal score of 19) as well as the added
cognitive complexity inherent in forecasting future states in the advanced games.
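
The four exogenous demand patterns can be sketched as follows. Only the 36 periods and the 10 % step come from the text; the base level, the timing of the step, the slope of the ramp, the sine amplitude and cycle length, and the noise distribution are assumed values chosen for illustration.

```python
import math
import random

PERIODS = 36  # decisions per trial, as stated in the text


def step_demand(base=100.0, step=0.10, at=2):
    """Basic pattern: a one-time 10 % step (the timing of the step is assumed)."""
    return [base * (1 + step) if t >= at else base for t in range(PERIODS)]


def ramp_demand(base=100.0, growth_per_period=0.02):
    """Linear growth; the slope is an assumed value."""
    return [base * (1 + growth_per_period * t) for t in range(PERIODS)]


def sine_demand(base=100.0, amplitude=0.20, cycle=12):
    """Sinusoid around the base level; amplitude and cycle length are assumed."""
    return [base * (1 + amplitude * math.sin(2 * math.pi * t / cycle))
            for t in range(PERIODS)]


def noisy_demand(base=100.0, spread=0.20, seed=0):
    """Stable level with a uniform random component; spread and seed are assumed."""
    rng = random.Random(seed)
    return [base * (1 + rng.uniform(-spread, spread)) for _ in range(PERIODS)]


patterns = {"step": step_demand(), "ramp": ramp_demand(),
            "sine": sine_demand(), "noise": noisy_demand()}
for name, series in patterns.items():
    print(name, round(min(series), 1), round(max(series), 1))
```

The sine and noise series stay within a fixed band around the starting level, whereas the ramp keeps moving away from it, which is the property that matters for transfer of heuristics learned in the basic game.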

In the results reported here, we used a subject pool of 50 MIT students enrolled in two different introductory
System Dynamics classes. Most of them (80 %) were graduate students and the others were undergraduate
students. The task was a homework assignment and was explicitly graded on consistency (and not performance). The
grade on this task counted for about 10 % of the term grade and the students had about two weeks to finish the
assignment. The game was their second computer game of the term, so they were familiar with tasks wherein
“boom-and-bust” phenomena could occur. Since previous results (Bakken 1988) show no significant difference
in the same game whether played by System Dynamics novices or introductory students, we assume that the findings
are generalizable to other decision makers.

Method and results

In the first basic game trial, the optimum score was 19¹ and the median score in experiments with groups of 10 to
50 subjects has varied between 230 and 560. Thus, we (arbitrarily) defined a high performer as one having an
average score of less than 150 in the first two basic games and a low performer as one having an average score of
more than 500 in the same games. Since most of the 50 players received scores between the two extremes, and
some players only played the basic game once, only 17 players were selected for inclusion in our two groups: 6
in the low performer category and 11 in the high performer category.² Performance in the games
is shown below in Tables 1 and 2. Note that high performers do better than low performers also in the advanced games.
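
The grouping rule can be sketched as follows, assuming that a trial's score is the mean absolute gap between desired production and production capacity over the 36 decisions. The function names and the toy series are hypothetical; only the cut-offs of 150 and 500 and the group averages passed to the classifier come from the text.

```python
def trial_score(desired_production, production_capacity):
    """Mean absolute gap between desired production (DP) and production
    capacity (PC). The exact functional form is an assumption; lower is better."""
    if len(desired_production) != len(production_capacity):
        raise ValueError("series must have equal length")
    gaps = [abs(dp - pc) for dp, pc in zip(desired_production, production_capacity)]
    return sum(gaps) / len(gaps)


def performer_group(first_basic_score, second_basic_score):
    """Classify by the average of the first two basic games (cut-offs from the text)."""
    average = (first_basic_score + second_basic_score) / 2
    if average < 150:
        return "high"
    if average > 500:
        return "low"
    return "unclassified"


# Toy series, 36 periods: capacity lags a constant desired production of 500.
dp = [500.0] * 36
pc = [440.0] * 18 + [500.0] * 18
print(trial_score(dp, pc))
print(performer_group(116, 97))     # high performers' average basic scores
print(performer_group(1090, 633))   # low performers' average basic scores
```

Players whose average falls between the two cut-offs are left unclassified, which mirrors the exclusion of most of the 50 players from the two groups.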

                               | First two basic games | First basic game | Second basic game | First advanced game
Average high performers (n=11) | 106 (44)              | 116 (47)         | 97 (43)           | 649 (682)
Average low performers (n=6)   | 861 (619)             | 1090 (725)       | 633 (439)         | 284 (654)

¹ There was no relationship between the number of additional games played after the first two basic ones and subsequent advanced scores.
In the game, the score was computed according to the following formula:

$$\text{Score} = \frac{1}{n}\sum_{i=1}^{n} \left| DP_i - PC_i \right|$$

where
DP = Desired Production
PC = Production Capacity
n = 36

Thus high performance means a low score and vice versa.

² The low performers played 4.33 (.74) basic games on average, whereas the high performers played 3.54 (1.23), but we only report the
first two. [Numbers in parentheses are standard deviations.]

Table 1: Scores for high and low performers; standard deviations in parentheses.

Advanced input | Group                 | Average 2 basic runs | Average advanced trial
Sine           | High performers (n=6) | 111                  |
               | Low performers (n=1)  | 1265                 | 305
Noise          | High performers (n=3) | 84                   | 232
               | Low performers (n=1)  | 819                  | 933
Ramp           | High performers (n=4) | 119                  | 1332
               | Low performers (n=1)  | 626                  | 1866

Table 2: Distribution of scores according to performance category and nature of advanced game input.

In order to investigate determinants of high performance, data files were scrutinized to find whether
equilibrium was reached in the basic trials. Whereas none of the low performers reached equilibrium, 5 and 4 of the
high performers reached equilibrium in games 1 and 2 respectively. Likewise, written reports handed in
by the students were subjected to content analysis for the mention of a) an equilibrium state to reach and b)
self-reinforcing feedback, both crucial aspects in understanding the system’s behavior. The results are shown
below; 63 % of the high performers mention both factors whereas only 17 % of the low performing students do.

                | Equilibrium reached   | Eq. mentioned | Self-reinforcing feedback mentioned | Both mentioned
                | 1st basic | 2nd basic |               |                                     |
High performers | .50       | .40       | 1.00          | .63                                 | .63
Low performers  | .00       | .00       | .33           | .67                                 | .17

Table 3: Measures of understanding of the system taken from result file data and from content analysis of written
reports (number of hits/number of players).

In sum, there is a positive relation between behavior and performance data. This is due to the opaqueness of the
model, as evidenced by its delay structure and self-reinforcing relationships; players must develop understanding
and heuristics in the game proper. In contrast,
Broadbent et al.'s task (1978) was of another nature, so that regression extrapolation was appropriate. The
difference between a linear regression equation and a non-linear system where loop dominance shifts from
positive to negative is the main reason why we find correspondence between behavioral and performance data here
but not in Broadbent’s work.

The nature of the behavior measures in both this game and previous findings is important in order to understand
performance in the game beyond the mere recognition that performance and behavior correspond. How and why
does behavior in this system take the form it does?

Discussion of behavior data

A phenomenological account of poor performance in a related task is provided by Dörner (1980). Treating the
issue of decision making strategies in a town planning environment, he finds that low performers do poorly
mainly because they are “thematic vagabonds”. Instead of comprehensively testing a single hypothesis, they
abandon it before it can be validated. Frequent shifts in strategy are counter-productive in situations
with substantial delays, such as town planning. The investment task studied here is similar, in the sense that
ordered capital does not become available at once.

Without holding on to one policy, i.e. high or low capital ordering, the appropriate strategy might never be found.
The best policy in this game is to take the unpleasant medicine (ordering much, and thus accepting initial discrepancies
between desired and actual output) early. If such a bold step is not taken, the firm’s symptoms (a growing
backlog of orders) become successively worse and the medicine that would previously have cured the system (a
shock order of 200) will not be strong enough. In fact, the dose must be increased 10-fold if conditions are
allowed to develop without appropriate intervention early on; taking what would have been an appropriate dose
initially will, at a later stage, only make matters worse.

Is there also any evidence of a difference between high and low performers in transfer of performance? The answer
is yes; all high performers do better than low performers in the advanced game. As one would expect, however,
the difference is more marked for the noise and sine conditions. These two external inputs reflect stationary
processes with no more than 20 % deviations from the initial equilibrium. Heuristics developed in the first
basic trials, where a key issue is to calculate an expected equilibrium condition, do quite well here; subjects’ general
heuristic of not paying attention to excursions from equilibrium, limiting orders and staying in a surplus capacity
situation does well. In contrast, such heuristics are devastating when the input is a ramp. Because these players do
not pay sufficient attention to the discrepancies between actual and desired quantities, their backlog grows out of
bounds and poor performance results.

The high performers indeed show many other signs that they understand the system: 63 % of them specifically
mention positive feedback and voice concern for future equilibrium conditions, whereas only 17 % of the low
performers do the same. Their strategies are also different from those of the low performers; instead of “trying to get
production capacity up to desired production” (a common statement among low performers) they voice a concern
for what the future equilibrium of the system will be. Thus they develop detectors for excursions from the
equilibrium and succeed in avoiding them. This can explain why only mentioning positive feedback, as do 67 % of
the low performers, is inadequate for high performance.

But why is it that low performers do so poorly? In particular, how is it that they fail to take the supply line into
account (Sterman 1989)? The high performers, since they calculate equilibrium and understand the positive
feedback complexity, pay less attention to the actual numbers on the screen; they pay less attention to actual
feedback and instead use their own mental simulation of what the state of the system will be some periods ahead.
Previous findings of high performers’ lack of attention to feedback (Jacoby et al., 1984) suggest that, owing to self-
reliance and mental simulation capacity, they anticipate irregularities in the feedback and therefore pay little attention
to it. In a slightly different task, Hammond et al. (1973) have shown that if a cue is noisy, players are better off not
paying attention to outcome feedback.

Our experiment deals with a strong positive feedback instead of noise, but the same phenomenon occurs. By
anchoring on the underlying structure of the system and not just on its behavioral manifestations, high performers
are able to discount the transient behavior of the system and devise appropriate capital ordering. In fact, by
specifically addressing the particularities of this system and by developing appropriate heuristics, they are also
able to transfer to new situations much better than the low performers.

By observing one player in detail, we have corroborated the distinction between high and low performers. Using
concurrent verbal protocols (Ericsson and Simon 1984), we had access to her information processing. This player
went from “low” to “high” performance during the first two trials, so she is not part of either group previously
discussed. In the first trial, she does not develop any concept of equilibrium. She feels very frustrated by the fact
that by ordering more capital she simultaneously creates additional unfilled demand. Her initial inference, validated
by early decisions and feedback, is that it pays to be cautious; “careful ordering is better”. Although she detects the
positive feedback loop in her second basic trial and the equilibrium state in her third, the initial strategy of “don’t
be aggressive” remains a strong behavioral anchor. In her first advanced trial, she thus performs miserably with
a ramp input. Her conservative strategy, only slightly dysfunctional in her basic trials, yields disastrous results
when she must accommodate a ramp input. It takes her three advanced trials to learn that an aggressive strategy is
called for. In other words, there is more to good performance and transfer of insight than just a recognition of
self-reinforcing feedback and equilibrium.

Future work will have to address those dimensions of transfer; here we have merely established the positive
relation between several indicators of understanding and performance itself.

Implications for managerial and research practice

The main lesson from this research is that performance is a reasonable indicator of dynamic understanding. One
can thus measure performance in a game with positive feedback, delays and non-linearities and conclude that high
performers have figured out key structural properties of the system, and that they can translate that understanding
into appropriate decisions. Management consultants and researchers who use games of complex, dynamic systems
as tools to transfer systems insights can therefore safely use performance measures as first approximations of
structural understanding.

References:

Bakken, B (1988): The Learning of System Structure by Exploring Computer Games, mimeo, System Dynamics
Group, MIT

Brehmer, B (1973): Single Cue Probability Learning as a Function of the Sign and Magnitude of the Correlation
between Cue and Criterion, Organizational Behavior and Human Performance 9, 377-395

Brehmer, B (1987): Systems Design and the Psychology of Complex Systems, In: Empirical Foundations of
Information and Software Science II, Eds Rasmussen, J and P Zunde, Plenum Publishing Co

Brehmer, B (1988): Strategies in Real Time Decision Making, Paper Presented at the Judgement and Decision
Making Conference, Chicago, November 1988

Broadbent, D and B Aston (1978): Human Control in a Simulated Economic System, Ergonomics 21, No 12,
1035-1043

Broadbent, D; FitzGerald, P and M Broadbent (1986): Implicit and explicit knowledge in the control of complex
systems, British Journal of Psychology 77, 33-50

Dörner, D (1980): On the Difficulties People have in Dealing with Complexity, Simulation and Games 11, No 1

Ericsson, K and Simon, H (1984): Protocol Analysis; Verbal Reports as Data, MIT Press, Cambridge, MA

Einhorn, H and R Hogarth (1978): Confidence in Judgement: Persistence in the Illusion of Validity,
Psychological Review 85, No 5, 395-416

Hammond, K; Summers, D and D Deane (1973): Negative Effects of Outcome-Feedback in Multiple-Cue
Probability Learning, Organizational Behavior and Human Performance 9, 30-34

Hogarth, R (1981): Beyond Discrete Biases: Functional and Dysfunctional aspects of Judgemental Heuristics,
Psychological Bulletin 90, No 2, 197-217

Jacoby, J; Mazursky, D; Troutmann, T and A Kuss (1984): When Feedback is Ignored: Disutility of Outcome
Feedback, Journal of Applied Psychology 69, No 3, 531-545

Klayman, J and H Young-Won (1987): Confirmation, Disconfirmation and Information in Hypothesis Testing,
Psychological Review 94, No 2, 211-228

Kleinmuntz, D and J Thomas (1987): The Value of Action and Inference in Dynamic Decision Making,
Organizational Behavior and Human Decision Processes 39, 341-364

Rasmussen, J (1976): Outlines of a Hybrid Model of the Process Plant Operator, In Monitoring Behavior and
Supervisory Control, Eds T Sheridan and K Johanssen, Plenum Press, NY, NY

Simon, H (1981): Sciences of the Artificial, 2nd ed., MIT Press, Cambridge, MA

Slovic, P; Fischhoff, B and S Lichtenstein (1977): Behavioral Decision Theory, Annual Review of Psychology
28, 1-39

Sterman, J (1987): Testing Behavioral Simulation Models by Direct Experiment, Management Science 33, No 12,
1572-1592

Sterman, J (1989): Misperceptions of Feedback in Dynamic Decision Making, Management Science 3,
Forthcoming

Tversky, A and Kahneman, D (1981): The Framing of Decisions and the Psychology of Choice, Science Vol 211

