Principles of Reliable Systems
Lessons from Erlang
Strange Loop 2012
Garrett Smith, CloudBees
@gar1t
Reliability
Quality Wars, Circa 1980s
Things that break suck
Things that keep going are awesome
Introducing Erlang
99.9999999% Uptime
Erlang's Roots: PLEX
Pseudo-parallel, event-driven real-time programming language
Dedicated for AXE telephone exchanges
Built in the 1970s by Göran Hemdahl at Ericsson
Effective, but expensive to use (low level, complex)
A New Language!
OS-independent virtual machine
Massive fine-grained concurrency
Asynchronous message passing
Reliability over performance
Functional pragmatism over purity
Concurrency So Easy...
The Principles
Isolation
Fault detection and recovery
Separation of concerns
Black box design
State management
Avoid complexity
Isolation
Isolation All Around Us
Memory
Threads
Files
Disks
CPU Cores
Network Interfaces
Networks
Racks
Data Centers
Fault Detection and Recovery
Failing
Have to be able to detect failure
"Fail fast"
Avoid defensive measures
Limit the scope of failure
Recovery
Courtesy of South Park
Unplug the Internet
Wait five seconds
Plug Internet back in
Reboot Fixes Lots of Things
Even an F1 Front Wing!
Separation of Concerns
Small, Focused, Independent
Easier to reason about
Easier to test
Isolation effect - limited scope for change
Black Box Design
Appliances FTW!
Easy to set up (just plug in?)
Start button
Minimal controls
Reboot to fix
State Management
The Thing About State
Durability -> Recovery
Replication -> Failover
Integrity -> Repair
Consistency -> Synchronization
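The first pairing above, durability enabling recovery, can be sketched in a few lines. This is only an illustration under stated assumptions: `DurableCounter`, its JSON file layout, and the `counter.json` path are hypothetical names, not something from the talk. Each change is checkpointed to disk with a write-then-rename, so a restarted instance can pick up where the old one stopped.

```python
import json
import os
import tempfile


class DurableCounter:
    """Hypothetical example: a counter whose state survives a restart."""

    def __init__(self, path):
        self.path = path
        self.value = self._recover()

    def _recover(self):
        # Recovery: reload the last checkpoint, or start from zero.
        try:
            with open(self.path) as f:
                return json.load(f)["value"]
        except FileNotFoundError:
            return 0

    def increment(self):
        self.value += 1
        self._checkpoint()

    def _checkpoint(self):
        # Durability: write a temp file, sync it, then rename it over the
        # old checkpoint so a crash never leaves a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"value": self.value}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "counter.json")
        first = DurableCounter(path)
        first.increment()
        first.increment()
        del first  # stand-in for a crash: in-memory state is gone
        recovered = DurableCounter(path)
        return recovered.value
```

The other three pairings (replication, integrity, consistency) need more machinery than fits in a sketch like this, which is part of why state is the expensive thing to make reliable.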
Four Stages of State Management
Session Failover
Courtesy of Oracle
Session Punted
Avoid Complexity
Signs of Complexity
Dependencies
Nesting / Hierarchies
Resource Sharing
Lots of Code
Fear
Simple = Reliable
Step-by-Step Guide to All This
OS Process Isolation
No shared memory
Communicate via "message passing" (stdio, sockets, pipes)
Process termination (i.e. "fault") detection
Techniques
Standard IO "servers"
0MQ (lightweight inter-process communication via messages)
TCP / HTTP
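A standard IO "server" of the kind listed above might look like the following minimal sketch. The child program, its one-line-per-message protocol, and the `crash` request are all assumptions made up for illustration. The parent and child share no memory, talk only through pipes, and the parent detects the fault by observing the child's exit status.

```python
import subprocess
import sys

# Hypothetical child "server": reads one request per line on stdin and
# writes one reply per line on stdout. A "crash" request simulates a fault.
CHILD_SOURCE = r"""
import sys
for line in sys.stdin:
    request = line.strip()
    if request == "crash":
        sys.exit(3)  # the whole process goes away
    print(request.upper(), flush=True)
"""


def start_server():
    """Start the child in its own OS process, connected only by pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def call(server, request):
    """Send one message and wait for a one-line reply ('' means EOF)."""
    server.stdin.write(request + "\n")
    server.stdin.flush()
    return server.stdout.readline().strip()


def demo():
    server = start_server()
    reply = call(server, "hello")
    # Ask the child to fail; the parent's own memory is untouched.
    call(server, "crash")
    # Fault detection: the OS reports that the process terminated.
    exit_code = server.wait(timeout=10)
    server.stdin.close()
    server.stdout.close()
    return reply, exit_code


if __name__ == "__main__":
    print(demo())
```

The same shape carries over to sockets or 0MQ: only the transport changes, not the rule that the two sides exchange messages and nothing else.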
Actors
No shared memory (semantically)
Queues to process messages
Inter-thread communication via queue inserts (message passing)
Direct language support: Scala, Go, Erlang
Libraries: Kilim (Java), Pykka (Python), Celluloid (Ruby), libcppa (C++)
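For readers without one of those languages or libraries at hand, the actor idea can be approximated with nothing but a thread and a queue. The sketch below uses only the Python standard library; `CounterActor` and its three message names are hypothetical. The actor's state is touched only by its own thread, and every interaction is a queue insert.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: owns its state; others only send it messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Message passing is a queue insert; callers never see the state."""
        self._mailbox.put((message, reply_to))

    def join(self):
        self._thread.join()

    def _run(self):
        # Process messages one at a time, in arrival order.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                return


def demo():
    actor = CounterActor()
    for _ in range(3):
        actor.send("increment")
    reply_box = queue.Queue()  # replies are messages too
    actor.send("get", reply_to=reply_box)
    value = reply_box.get(timeout=5)
    actor.send("stop")
    actor.join()
    return value
```

Note the "semantically" in the first bullet: these threads do share an address space, so the isolation here is a convention the code keeps, not something the runtime enforces.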
Fail Fast
Avoid defensive practices
Let exceptions propagate as far as possible
Use assertions and leave them in!
Exiting the process is not a bad idea
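A small contrast may make these points concrete. The function names below are hypothetical, and the sketch assumes Python is not run with `-O`, which would strip the assertions. The first function fails fast; the second shows the defensive style that hides the fault behind a plausible-looking value.

```python
def average_latency_ms(samples):
    """Return the mean of the samples, failing fast on bad input."""
    # Assertions stay in: a broken caller should be found immediately.
    # (Assumes the interpreter is not started with -O.)
    assert len(samples) > 0, "no samples to average"
    assert all(s >= 0 for s in samples), "negative latency sample"
    return sum(samples) / len(samples)


def defensive_average_latency_ms(samples):
    """The defensive style to avoid: swallow the fault and carry on."""
    try:
        return sum(samples) / len(samples)
    except Exception:
        return 0.0  # looks like a valid result; the bug is now invisible


def main(samples):
    # No try/except here: an AssertionError propagates and, if nothing
    # above handles it, ends the process with a traceback.
    return average_latency_ms(samples)
```

Letting the process exit is only comfortable when something outside it notices and restarts it, which is where supervision comes in.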
Process Supervision
Process monitors: runit, launchd
Standard IO "servers"
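Tools like runit and launchd do this job in production. The loop below is only a toy sketch of the idea, with a hypothetical `supervise` function and a restart cap added so the example terminates; real supervisors typically keep restarting indefinitely.

```python
import subprocess
import sys


def supervise(command, max_restarts=3):
    """Run command and restart it each time it exits abnormally.

    Returns the list of exit codes observed. The restart cap is an
    assumption of this sketch, not how runit or launchd behave.
    """
    exit_codes = []
    while True:
        exit_code = subprocess.call(command)  # blocks until the child exits
        exit_codes.append(exit_code)
        if exit_code == 0:
            return exit_codes  # clean exit: nothing to recover
        if len(exit_codes) > max_restarts:
            return exit_codes  # give up after the allowed restarts


# Two hypothetical children: one always faults, one exits cleanly.
FAILING = [sys.executable, "-c", "import sys; sys.exit(1)"]
HEALTHY = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print(supervise(FAILING, max_restarts=2))
    print(supervise(HEALTHY))
```

With `max_restarts=2` the failing child runs three times in total: the initial start plus two restarts.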
Think Small
Narrowing the scope of an “application”
Appliance-oriented development
Micro SOA
Functional-style programming (e.g. keep the average function under 4 lines)
Invest in Simplicity
If it's not obvious, work until it becomes obvious
Take small steps, doing what's clearly the next thing
Avoid building for the "future"
And In Conclusion...
Twitter FTW!
@gar1t