Analysis of Dynamic Complexity of an IT Organization
by Gerd A.T. Müller
Abstract
This analysis was done as part of an organizational development effort in 1996. The European IT department of a
computer manufacturer experienced quality and overload issues after a phase of cost reduction and centralization.
Several attempts to improve the situation with conventional methods failed. As a last step, a structured process to
understand the dynamic complexity of the organization was applied. The organizational dependencies were
documented, analyzed and communicated.
Key learnings were that key dependencies in the organization crossed organizational boundaries. This created slow,
loosely coupled feedback loops and prevented improvement of the situation. Underlying "shifting the burden" and
"accidental adversaries" patterns were found. Based on these learnings, organizational changes and metrics were
introduced, which finally solved the problems.
Hewlett-Packard Company
Schickardstr. 25 B32
71034 Boeblingen
Germany
phone +49(7031)4681017
email gerd-at_mueller@hp.com
keywords dynamic complexity, organizational boundaries, influencing factors, cause effect net, shifting the burden,
accidental adversaries, sensitivity
15.05.2001
... or how to clean the Augean stables
by Gerd A.T. Müller
The Problem
The European IT department experienced severe quality, customer satisfaction and
overload issues after a phase of cost reduction and centralization.
Context
• The European organization was located in 5 major locations (Bristol, Brussels,
  Boeblingen, Grenoble, Milan), each location having full responsibility within
  its geographical area. Reporting was to European management.
• Management asked to reduce cost by 20% to 25% by centralizing whatever is
  possible:
  - Central: service deployment and implementation, event detection and
    notification, predefined incident management
  - Remote: explorational incident management, operations bridge,
    problem management
• First implementation in one site
Observed Symptoms
Overload of people
• Not talking to each other
• Not pulling information
• Push emails with long TO, CC and BCC to blame others
• Priority conflicts
• Forgotten things, missed agreements
• Burnout
Complex, slow processes
• Many interfaces (it takes 10 people to install a server)
• Re-re-re-acknowledgement
Knowledge
• Account Operation Manager doesn’t have the expertise to specify a request
Remote
• Undefined or ill-defined service level agreements
• Mismatch between resources and workload
• Unattractive working conditions
• Can't obsolete things
• Feeling of being victims, burnout
Central
• Missing engineering resources for improvements
• Daily work prevents us from working on processes and projects
  (e.g. one engineering team spent 95% on ongoing work)
• Insufficient quality of platform services
Measurable Facts
                          1996        2000        change
size
  people                  221         296         +34%
  flex force              32%         29%
  server                  800         1600        +100%
  sites                   5           4
  teams with 7x24 shift   3           3
productivity
  server/person           3.6         5.4         +50%
  incidents/server        5..50       0.5..11     -90%..-80%
workload
  overtime                ~10..20%    <5%         -50%..-75%
  standby calls/night     3           <0.2        -70%
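The derived productivity figure and the change percentages can be recomputed from the raw counts. The following is a minimal Python sketch that uses only the 1996 and 2000 values for people and servers from the slide; the function and variable names are illustrative and not part of the original analysis.

```python
# Raw counts from the slide, as (1996, 2000) pairs.
people = (221, 296)
servers = (800, 1600)


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def ratio(numerator, denominator):
    """Element-wise ratio of two (1996, 2000) pairs."""
    return tuple(n / d for n, d in zip(numerator, denominator))


servers_per_person = ratio(servers, people)

print(f"people:         {pct_change(*people):+.0f}%")
print(f"servers:        {pct_change(*servers):+.0f}%")
print(
    f"servers/person: {servers_per_person[0]:.1f} -> "
    f"{servers_per_person[1]:.1f} "
    f"({pct_change(*servers_per_person):+.0f}%)"
)
```

Computed from the unrounded ratios, the servers/person change comes out at about +49%; the +50% on the slide corresponds to the rounded values 3.6 and 5.4.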
Attempts to Solve
Several attempts to improve the situation with conventional methods failed.
Issue: the situation is usually not analyzed from the next level of
abstraction (looking at the larger system).
Repeating pattern for problem approach
• Team meets as problem becomes too large
• Problem statement is developed, typically language processing (LP) is used:
  - Identify underlying problems
  - Develop root-cause relationships
  - Rate priority based on impact and feasibility
• Actions are initiated
• After a few months no change of situation is observable
Language Processing
[Figure: LP diagram]
Lack of resources and difficulties of working with SD&d (as an organization)
causes most of our issues as ...
What are the most important and critical issues creating today's
dissatisfactory situation?
7 Problem Analyses
team: platform services, production automation, ... management, HW,
event detection, engineering
date: [illegible]/95, 03/94, 03/94, 04/96, 04/96, 04/96
method: QMS review, fish bone, brainstorming, LP, brainstorming
what:
• Ill defined service level agreements, missing engineering resources for
  improvements, unclear responsibilities
• Too much daily business and old stuff, it's not clear to others what we do,
  unplanned requests
• No clear understanding of customer needs, no systematic improvement
  process, no performance measures guiding decisions
• Not leveraging our efforts, bad product introduction, disconnect between
  European Mgmt and country function, lack of ownership
• No clear understanding and documentation of process, very complex process.
• Dedicated resources to work on the operations monitoring process at each
  site.
• If production environment is automated then less workload due to normal
  failures.
Systemic Approach
Needed to try something different: the standard method didn’t succeed.
Radical ideas are not bad ideas!
Steps To Do
• Identify targets to change, set objectives
• Identify key drivers of the situation (influencing factors)
• Select a few relevant drivers, shoot for 10 or fewer
• Describe the cause-effect net of relevant drivers and their relationships
• Analyze the net for
  - Sensitivity
  - Effect spread out
  - Effect inclusion
  - Feedback loops
• Understand room to maneuver
• Set actions
Identify targets to change, set objectives
Availability of applications
Productivity does not meet management and customer expectations
Workload has reached an unacceptable level; overtime and rest time do not
fulfill EHS requirements
Be specific!
• Which customer needs which availability for which application/environment?
• Which ones are most important? Why?
• Who are the managers having a problem?
• Who are the customers having a problem?
• What are their expectations?
• Working time must be controllable by the employee, within legal conditions.
• Overtime should not average above 20h/week in a 12-month period. (What is
  the real legal requirement?)
• After a stand-by call people must rest for at least 11 hours.
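Objectives stated this specifically can be checked mechanically. The sketch below encodes the two workload rules in Python. The thresholds are the working numbers from the slide, which itself leaves the real legal requirement open, so both constants are placeholders; the function names and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds as stated on the slide; placeholders, not verified legal limits.
MAX_AVG_OVERTIME_H_PER_WEEK = 20.0
MIN_REST_AFTER_STANDBY = timedelta(hours=11)


def overtime_ok(weekly_overtime_hours):
    """True if average weekly overtime over the period is within the limit."""
    if not weekly_overtime_hours:
        return True
    average = sum(weekly_overtime_hours) / len(weekly_overtime_hours)
    return average <= MAX_AVG_OVERTIME_H_PER_WEEK


def rest_ok(standby_call_end, next_shift_start):
    """True if the rest after a stand-by call is at least the minimum."""
    return next_shift_start - standby_call_end >= MIN_REST_AFTER_STANDBY


# Hypothetical 12-month record: 26 heavy weeks, then 26 lighter weeks.
weeks = [25.0] * 26 + [10.0] * 26
print(overtime_ok(weeks))  # True (average is 17.5 h/week)

# Hypothetical call ending at 02:30 with the next shift starting at 09:00.
print(rest_ok(datetime(2000, 3, 1, 2, 30), datetime(2000, 3, 1, 9, 0)))  # False
```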
Identify drivers of situation ...
... from existing problem analyses:
Ill defined service level agreements, missing engineering resources for
improvements, unclear responsibilities. Too much daily business and old stuff,
it's not clear to others what we do, unplanned requests. No clear
understanding of customer needs, no systematic improvement process, no
performance measures guiding decisions. Not leveraging our efforts,
bad product introduction, disconnect between European Mgmt and country
function, lack of ownership. No clear understanding and documentation of
process, very complex process. ...
... and select a few key ones
Targets: workload, availability and productivity
Work within organization:
• Requests for implementation (engineering)
• Work orders (engineering)
• Release to production (engineering)
• Pre-defined incident management (platform services)
• Admin, explorational incident management, problem management
Trigger for activities: Customer requests (new, change)
Size of system: #systems, resources
Sensitivity
Understand how to influence the system
• What are the powerful knobs to turn?
• What are the risk factors influencing and being influenced at the same time?
• What are the most dependent factors?
Test the model by changing strengths of impact
[...] influencing and being influenced at the same time to a high degree.
Availability (1) is mainly influenced, as is workload engineering (5) to a
lower degree.
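One common way to answer these three questions is a cross-impact matrix in the style of Vester's sensitivity model: the row sum of a factor (active sum) says how strongly it influences others, the column sum (passive sum) how strongly it is influenced. The sketch below assumes that approach; the slides do not reproduce the actual matrix, so the selection of factors and all impact strengths are invented for illustration.

```python
# Hypothetical factors and impact strengths (0 = none .. 3 = strong).
# These values are illustrative only, not taken from the original analysis.
factors = [
    "availability",
    "workload engineering",
    "release to production",
    "problem management",
    "customer requests",
]

# impact[i][j] = strength with which factor i influences factor j
impact = [
    [0, 1, 0, 0, 1],  # availability
    [1, 0, 3, 2, 0],  # workload engineering
    [3, 2, 0, 1, 0],  # release to production
    [3, 1, 0, 0, 0],  # problem management
    [0, 2, 1, 0, 0],  # customer requests
]


def sensitivity(matrix):
    """Return (active sums, passive sums) for a square impact matrix."""
    n = len(matrix)
    active = [sum(matrix[i]) for i in range(n)]
    passive = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return active, passive


active, passive = sensitivity(impact)
indices = range(len(factors))

# Powerful knob: influences a lot while being influenced little.
knob = max(indices, key=lambda i: active[i] / max(passive[i], 1))
# Risk factor: influences and is influenced at the same time.
critical = max(indices, key=lambda i: active[i] * passive[i])
# Most dependent factor: is influenced the most.
dependent = max(indices, key=lambda i: passive[i])

for i in indices:
    print(f"{factors[i]:22s} active={active[i]} passive={passive[i]}")
print("knob:", factors[knob])
print("critical:", factors[critical])
print("most dependent:", factors[dependent])
```

Testing the model then amounts to changing individual entries of `impact` and checking whether the classification of the factors stays the same.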
Spread Out of RtP
Release to Production (RtP) spread out shows that within 2 steps the
whole net is impacted.
• RtP influences both other organizations without direct feedback, so
  there is no incentive to do a good job.
• RtP influences workload in its own organization unfavorably, so there
  is an incentive to save time.
Underlying pattern: Accidental Adversaries
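The spread out of a factor can be computed as a breadth-first search over the cause-effect net, limited to a number of steps. The sketch below is one possible way to do this; the net is a small hypothetical example, since the original diagram is not reproduced here, and the node names are assumptions.

```python
from collections import deque

# Hypothetical cause-effect net: edges point from cause to effect.
net = {
    "release to production": ["incidents", "workload engineering"],
    "incidents": ["workload platform services", "availability"],
    "workload engineering": ["release to production", "problem management"],
    "workload platform services": ["availability"],
    "problem management": ["incidents"],
    "availability": [],
}


def spread_out(graph, start, max_steps):
    """Map each factor reachable from start to the steps needed to reach it."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if steps[node] == max_steps:
            continue  # do not expand beyond the step limit
        for effect in graph.get(node, []):
            if effect not in steps:
                steps[effect] = steps[node] + 1
                queue.append(effect)
    del steps[start]  # report only the other factors
    return steps


reached = spread_out(net, "release to production", 2)
for factor, distance in sorted(reached.items(), key=lambda item: item[1]):
    print(f"step {distance}: {factor}")
print("whole net impacted:", len(reached) == len(net) - 1)
```

In this example net all five other factors are reached within two steps, which mirrors the observation on the slide.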
Feedback Loops
Release to Production (RtP) and Problem Management (PrM) are both
at risk of not being done if the team is under heavy workload. In such a
case resources are split among competing requests. Usually urgent
requests are prioritized over important ones (e.g. Incident Management
over PrM, work orders over RtP). If this happens the situation will
become worse with a time delay of ~3 months through the reinforcing
feedback loops.
Underlying pattern: Shifting The Burden
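Feedback loops in a cause-effect net can be found by enumerating its cycles. A loop is reinforcing when the product of its link signs is positive, and balancing when it is negative. The sketch below assumes a small signed net that paraphrases the text above; the node names and signs are illustrative, and the ~3 month delay is not modeled.

```python
# Hypothetical signed links: +1 means "more cause -> more effect",
# -1 means "more cause -> less effect".
links = {
    ("workload", "problem management"): -1,
    ("problem management", "incidents"): -1,
    ("incidents", "workload"): +1,
    ("workload", "release to production quality"): -1,
    ("release to production quality", "incidents"): -1,
}


def find_loops(signed_links):
    """Enumerate simple feedback loops as (nodes, sign) tuples."""
    graph = {}
    for (cause, effect), sign in signed_links.items():
        graph.setdefault(cause, []).append((effect, sign))
    loops = []

    def walk(start, node, path, sign):
        for effect, link_sign in graph.get(node, []):
            if effect == start:
                loops.append((tuple(path), sign * link_sign))
            # Only visit nodes sorting after start, so that each loop is
            # reported once, beginning at its smallest node name.
            elif effect not in path and effect > start:
                walk(start, effect, path + [effect], sign * link_sign)

    for start in sorted(graph):
        walk(start, start, [start], 1)
    return loops


loops = find_loops(links)
for nodes, sign in loops:
    kind = "reinforcing" if sign > 0 else "balancing"
    print(kind, "loop:", " -> ".join(nodes + (nodes[0],)))
```

Both loops in this example are reinforcing: more workload means less problem management or lower release quality, which produces more incidents and in turn more workload.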
Results
Organization stabilized after 6-12 months
• Step by step implementation of fixes
• First results visible after 3 months
Learning
• System thinking is a powerful tool to understand and document
• Too complex to communicate to management
Shifting the Burden
• Simulation for support environment developed
• Metric “incidents/(servers*day)” introduced
• Balanced scorecard implemented
Accidental Adversaries
• New organizational setup shoots for “autonomous cells” to have broad
  responsibility in one team
• Aligning metrics to have clear ownership
Next steps
• How do we broaden this knowledge in the organization?
• How do we deal with imposed organizational setup?
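The metric “incidents/(servers*day)” named above is a simple normalized rate. A minimal Python sketch follows; the function name and the incident count are hypothetical, while the 1600 servers correspond to the 2000 figure reported earlier.

```python
def incident_rate(incidents, servers, days):
    """Incidents per server per day."""
    if servers <= 0 or days <= 0:
        raise ValueError("servers and days must be positive")
    return incidents / (servers * days)


# Hypothetical month: 480 incidents on 1600 servers over 30 days.
rate = incident_rate(480, 1600, 30)
print(f"{rate:.3f} incidents/(server*day)")  # 0.010
```

Dividing by both the number of servers and the number of days presumably makes teams and periods of different size comparable, which supports the goal of aligned metrics with clear ownership.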