Erlang Central

OTP Release Handling Tutorial

Revision as of 08:50, 30 June 2006 by 87.74.0.243 (Talk)

Introduction

This tutorial attempts to show by example how to build a proper OTP-based system.

A quick way to get started might be to copy this example system, and modify the configuration files. I will try to explain the purpose of each file and suggest how they may be modified.

The Example System

The example system (called "example") runs on two processors (note that the two processors can easily be two Erlang nodes on the same physical machine), and contains two applications:

  • base, which runs on both processors. This program does not do anything particularly interesting.
  • dist, which runs on only one processor at a time. This program does one interesting thing: smooth takeover of state. This involves start phases, the global name server, and some other exciting concepts.
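
As a rough illustration of how start phases are declared, here is a minimal sketch of what the application resource file for dist could look like. The module names and the phase names takeover and go are the ones that show up in the startup trace later in this tutorial. The description, the version string, the registered names, the application dependencies and the empty phase arguments are assumptions made for this sketch, not a copy of the real file.

```erlang
%% dist.app (sketch, not the file shipped with the example)
{application, dist,
 [{description, "Distributed example application"},  % assumed text
  {vsn, "1.0"},                                      % assumed version
  {modules, [dist_app, dist_super, dist_server]},
  {registered, [dist_super]},                        % assumed
  {applications, [kernel, stdlib, base]},            % assumed dependencies
  {mod, {dist_app, []}},
  %% With start_phases present, the application master first calls
  %% dist_app:start/2 and then dist_app:start_phase/3 once per phase,
  %% in the order given here. The phase arguments ([]) are placeholders.
  {start_phases, [{takeover, []}, {go, []}]}
 ]}.
```

Each phase is invoked as start_phase(Phase, StartType, PhaseArgs), where StartType is normal on an ordinary start and {takeover, Node} when the application takes over from another node.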

Basic File Structure

Note:

The code is riddled with comments. If you view it with e.g. Emacs using fontification, it may be easier to read.


With this file structure in place, you are ready to support in-service upgrade, but that will be described in a future tutorial.

Compiling the code.

There is nothing to it, really. I did not bother with makefiles and the like. Go into each src/ directory and type:

Code listing 1.1

erlc -W -o ../ebin *.erl
	  

Building the boot script.

The easiest way to build the boot script is to place yourself in the $DIR/releases/1.0 directory, start an Erlang shell, and type the following:

Code listing 1.2

Eshell V5.2  (abort with ^G)
1>
<... output snipped>
=PROGRESS REPORT==== 4-Dec-2002::16:55:34 ===
         application: sasl
          started_at: nonode@nohost

1> Dir = "/home/etxuwig/work/erlang/release_tutorial".
"/home/etxuwig/work/erlang/release_tutorial"
2> Path = [Dir ++ "/lib/*/ebin"].
["/home/etxuwig/work/erlang/release_tutorial/lib/*/ebin"]
3> Var = {"MYAPPS", Dir}.
{"MYAPPS","/home/etxuwig/work/erlang/release_tutorial"}
4> systools:make_script("example",[{path,Path},{variables,[Var]}]).
ok
	  

Note:

The file example.rel contains hard-coded versions of kernel, stdlib and sasl. Depending on your version of OTP, the building of the boot script may fail. Fortunately, the error message is quite helpful in showing what needs to be changed.
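
For orientation, a release resource file for this system would have roughly the following shape. This is a sketch, not the example.rel shipped with the tutorial: every "X.Y.Z" is a placeholder that must be replaced with the version your OTP installation actually has, and the versions given for base and dist are assumptions. The erts version 5.2 matches the emulator used in the listings of this tutorial.

```erlang
%% example.rel (sketch) : lives in releases/1.0/
{release,
 {"example", "1.0"},      % release name and version
 {erts, "5.2"},           % must match the runtime system you build with
 [{kernel, "X.Y.Z"},      % placeholder: use your installed version
  {stdlib, "X.Y.Z"},      % placeholder
  {sasl,   "X.Y.Z"},      % placeholder
  {base,   "1.0"},        % assumed: must equal vsn in base.app
  {dist,   "1.0"}         % assumed: must equal vsn in dist.app
 ]}.
```

To find the values to fill in, erlang:system_info(version) returns the erts version, and application:loaded_applications() lists the versions of the loaded applications (call application:load(sasl) first if sasl is not yet loaded).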

Now, you should be able to see an example.script file in releases/1.0/. It contains instructions for the Erlang/OTP boot loader. The .script file is converted into an Erlang binary which is stored in example.boot in the same directory.

Making a tar file.

Using systools:make_tar("example", Options) (where Options is the same list of options as for make_script/2), you can pack your release into a tar file and unpack it on a target system. The variables option, together with the -boot_var flag to erl, makes the code relocatable. See erl -man systools for more detailed instructions.

Running the example.

There are tricks for starting an embedded system and being able to attach a shell to a node, but that's another tutorial.

I will show how one could easily get something up and running on a Unix workstation. Windows users will have to translate.

  • Place yourself in $DIR/releases/1.0 (for convenience; you could of course do this from any directory).
  • Start two shell windows (e.g. xterm)
  • In one xterm, write:

    Code listing 1.3

    erl -boot ./example -config ./sys -boot_var MYAPPS $DIR -sname n1
  • In the other xterm, write:

    Code listing 1.4

    erl -boot ./example -config ./sys -boot_var MYAPPS $DIR -sname n2

It doesn't really matter if you start both nodes at once, or one at a time. In the sys.config file, a node synchronization timeout of 10 seconds was specified. After that, the first node will continue alone if the other node has not yet appeared.

Note:

The sys.config file contains hard-coded node names. You need to make sure that the node names are correct in your environment for the example to work.
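
To make the settings discussed above concrete, here is a minimal sketch of kernel parameters that would give the described behaviour. It is not the sys.config shipped with the example. The node names are taken from the listings in this tutorial (host cbe1066) and must be changed to match your own host, and the use of sync_nodes_optional rather than sync_nodes_mandatory is an assumption based on the statement that the first node continues alone after the timeout.

```erlang
%% sys.config (sketch) : the same file is read by both nodes.
[{kernel,
  [%% dist runs on one node at a time. n2 is listed first, so it has
   %% the higher priority and takes over from n1 when it comes up.
   {distributed, [{dist, [n2@cbe1066, n1@cbe1066]}]},
   %% Wait for these nodes at boot, but carry on without them ...
   {sync_nodes_optional, [n1@cbe1066, n2@cbe1066]},
   %% ... once 10 seconds (given in milliseconds) have passed.
   {sync_nodes_timeout, 10000}
  ]}].
```

The order of the node list in the distributed entry is what decides where dist ends up: reversing it would make n1 the preferred node.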

Starting one node well before the other is of course an interesting thing to try. If you start n1 first, you may see the following output:

Code listing 1.5

[etxuwig@cbe1066]: erl -boot ./example -config ./sys -boot_var MYAPPS $DIR -sname n1
Erlang (BEAM) emulator version 5.2 [hipe] [threads:0]

Eshell V5.2  (abort with ^G)
(n1@cbe1066)1> 
=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.45.0>},
                       {name,alarm_handler},
                       {mfa,{alarm_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.46.0>},
                       {name,overload},
                       {mfa,{overload,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.44.0>},
                       {name,sasl_safe_sup},
                       {mfa,{supervisor,
                                start_link,
                                [{local,sasl_safe_sup},sasl,safe]}},
                       {restart_type,permanent},
                       {shutdown,infinity},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.47.0>},
                       {name,release_handler},
                       {mfa,{release_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
         application: sasl
          started_at: n1@cbe1066
base_server starting.

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,base_super}
             started: [{pid,<0.53.0>},
                       {name,server},
                       {mfa,{base_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
         application: base
          started_at: n1@cbe1066
dist_app:start(normal, _)
dist_server starting.

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
          supervisor: {local,dist_super}
             started: [{pid,<0.58.0>},
                       {name,server},
                       {mfa,{dist_server,
                                start_link,
                                [#Fun<dist_super.0.126311943>,
                                 #Fun<dist_super.1.36309860>]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]
dist_app:start_phase(takeover, _)
dist_app:start_phase(go, _)
handle_call({go, normal},...)

=PROGRESS REPORT==== 5-Dec-2002::16:40:27 ===
         application: dist
          started_at: n1@cbe1066

(n1@cbe1066)1> 
(n1@cbe1066)1> global:whereis_name(dist_server).
<0.58.0>
(n1@cbe1066)2> dist_server:get_value().
undefined
(n1@cbe1066)3> dist_server:set_value(17).
{ok,undefined}

	

We can see that the globally registered dist_server is running locally, and we can call the API functions get_value/0 and set_value/1.

If we now start n2, dist_server should migrate over to that node (since it is so specified in the sys.config file.)

Code listing 1.6

[etxuwig@cbe1066]: erl -boot ./example -config ./sys -boot_var MYAPPS $DIR -sname n2
Erlang (BEAM) emulator version 5.2 [hipe] [threads:0]

Eshell V5.2  (abort with ^G)
(n2@cbe1066)1> 
=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.46.0>},
                       {name,alarm_handler},
                       {mfa,{alarm_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.47.0>},
                       {name,overload},
                       {mfa,{overload,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.45.0>},
                       {name,sasl_safe_sup},
                       {mfa,{supervisor,
                                start_link,
                                [{local,sasl_safe_sup},sasl,safe]}},
                       {restart_type,permanent},
                       {shutdown,infinity},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.48.0>},
                       {name,release_handler},
                       {mfa,{release_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
         application: sasl
          started_at: n2@cbe1066
base_server starting.

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,base_super}
             started: [{pid,<0.54.0>},
                       {name,server},
                       {mfa,{base_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
         application: base
          started_at: n2@cbe1066
dist_app:start({takeover,n1@cbe1066}, _)
dist_server starting.

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
          supervisor: {local,dist_super}
             started: [{pid,<0.59.0>},
                       {name,server},
                       {mfa,{dist_server,
                                start_link,
                                [#Fun<dist_super.0.126311943>,
                                 #Fun<dist_super.1.36309860>]}},
                       {restart_type,permanent},
                       {shutdown,10000},
                       {child_type,worker}]
dist_app:start_phase(takeover, {takeover,n1@cbe1066}, _)
dist_app:start_phase(go, _)

=PROGRESS REPORT==== 5-Dec-2002::17:27:15 ===
         application: dist
          started_at: n2@cbe1066

(n2@cbe1066)1> 
(n2@cbe1066)1> global:whereis_name(dist_server).
<0.59.0>
(n2@cbe1066)2> dist_server:get_value().
17
	

In the first node, n1, we can see the following output:

Code listing 1.7

=INFO REPORT==== 5-Dec-2002::17:27:15 ===
    application: dist
    exited: stopped
    type: permanent
	

We can see that dist_server brought the state variable along when migrating to the other node (it did not bring the special function objects along, in order to avoid nasty surprises.)

We can now try different combinations of starting and killing the two nodes.
