A service fail over and take over system for Erlang/OTP


By Erlang Central | Published: April 27, 2009



Erlang/OTP positions itself in the niche of building fault-tolerant software systems with redundancy across two or more independent nodes. However, Erlang/OTP comes with surprisingly little built-in support for safe failover and takeover (migration) of responsibilities between nodes. The Fail Over and Take Over System (FOTOS) presented in this paper offers mechanisms to keep state consistent across several nodes and to detect partial network failures, which prevents individual nodes from making premature decisions. Applications cooperating over the network can use this guaranteed-consistent information to reach unanimous decisions about where a failed service should be failed over to and, where a procedure was under way, at which point in it the service should resume.



Speakers:

  • Lennart Öhman

    Inventor of OTP design patterns
    Sjöland & Thyselius Telecom AB

    Lennart Öhman is a member of the Sjöland & Thyselius management team and works with project management and customer problem analysis as well as “hands-on” technical development. Mr. Öhman works worldwide, using his international experience both to assist international customers and to market S&T services and products. Lennart started his career as a systems developer at Ericsson, focusing on robust, fault-tolerant, non-stop systems written in Erlang. His first project was the Mobility Server, the first Erlang system Ericsson took into production. In conjunction with his work on the Mobility Server, he invented what later became the OTP behaviours.
