A service fail over and take over system for Erlang/OTP
Erlang/OTP positions itself in the niche of building fault-tolerant software systems with redundancy across two or more independent nodes. However, Erlang/OTP comes with surprisingly little built-in support for performing failover and takeover (migration) of responsibilities between nodes safely. The Fail Over and Take Over System (FOTOS) presented in this paper offers mechanisms to keep state consistent across several nodes and to detect partial network failures, which prevents individual nodes from making premature decisions. Applications cooperating over the network can use this guaranteed-consistent information to make unanimous decisions about where failed services should be failed over to and at which point in a potentially ongoing procedure the service should resume.
Lennart Öhman, Inventor of OTP design patterns
Sjöland & Thyselius Telecom AB
Lennart Öhman is a member of the Sjöland & Thyselius management
team and works with project management and customer problem analysis as
well as “hands-on” technical development. Mr. Öhman works worldwide
using his international experience to both assist international
customers and market S&T services and products. Lennart started his
career as a systems developer at Ericsson, focusing on robust,
fault-tolerant, and non-stop systems written in Erlang. His first project was
the Mobility Server, the first Erlang system Ericsson took into
production. In conjunction with his work on the Mobility Server, he
invented what later became the OTP behaviours.