Chicago Erlang Conference 2014 – Monitoring Complex Systems – Brian Troutwine

By Erlang Central | Published: October 23, 2014

Imagine being responsible for monitoring 100 servers. Now imagine 1,000. Each server has 100 different things to keep track of. What do you pay attention to, and what do you ignore? What is important? In this talk, Brian will show how Erlang can be used to capture more information without compromising clarity: keeping track of the forest without losing sight of the trees!

Brian will motivate the extensive instrumentation of complex computer systems and argue that such systems are technically superior to their uninstrumented equivalents. This talk builds on his Erlang Factory and Write the Docs talks on similar subjects, offering practical starting points for Erlang projects while keeping in view the human organization around the computer system. He will focus particularly on avoiding "instrumentation blindness": the challenge of interpreting and acting on the metrics a production system emits without overwhelming operators' ability to control the system effectively and prioritize its faults. He'll support this argument with historical examples and case studies from his own work.

About Brian

Brian has been working with Erlang since the Multicore Crisis was an active topic of conversation, having picked it up as an undergraduate. His interests run to the fault-tolerant, distributed side of things. He works with Erlang at AdRoll, where he is a developer on the real-time bidding team (discussed at Erlang Factory 2014). Previously he was at Rackspace, where he was a developer on the FireEngine project (discussed at Erlang Factory 2012). Brian also writes the Peculiar Books Reviewed series of reviews for the Huffington Post Code blog.