Erlang Central

Difference between revisions of "Building Non Blocking Erlang apps"

From ErlangCentral Wiki

m (Reverted edits by Kaiserpanda (Talk); changed back to last version by Ludvig)
(One intermediate revision by one user not shown)

Revision as of 08:09, 24 November 2006




Non Blocking the easy way

This HOWTO describes a common design pattern for building non blocking erlang applications. It can be difficult to discover this pattern from just reading the standard documentation as it uses a quite well hidden feature of [1]. It is well suited to server applications which must wait for slow backend systems and where the protocols are multiplexed with some transaction ID to link requests and responses. This covers many common telco and internet protocols with the horrific exception of the most common ones - HTTP and SMTP.

The obvious starting point is to have a single process owning each communication resource. One process will own the input socket, and another process will own the socket communicating with the backend system.

Following the Erlang Way (TM) the input process will spawn a new worker process for every individual request. This should happen as soon as it has worked out the boundaries of the messages and the transaction ID. This provides maximum fault isolation between individual calls - a very important property in these days of malformed requests and denial of service.

Each worker would then ideally want to make a simple synchronous gen_server:call/3 towards the process which owns the connection to the backend, and send the result back to the input process to be sent back down the input socket. The difficulty arises in modelling the backend connection process.

How to not block but still appear synchronous to every caller

The process owning the connection towards the back end system could be implemented something like this:

call(Name, Tr_id, Request) ->
    gen_server:call(Name, {request, Tr_id, Request}, ?timeout).

handle_call({request, Tr_id, Req}, From, State) ->
    Reply = do_time_consuming_remote_operation(Tr_id, Req, State#state.socket),
    {reply, Reply, State};

The obvious difficulty is that while the process waits for the response from the slow back end system all the requests from other spawned worker processes will queue up.

The gen server behaviour provides an extremely elegant solution to this. The key is to store the From parameter provided to the handle_call/3 function and use this later in a gen_server:reply(From, Reply). call.

Using active socket mode so that packets are delivered as messages the code becomes something along the lines of:

call(Name, Tr_id, Request) ->
    gen_server:call(Name, {request, Tr_id, Request}, ?timeout).

handle_call({request, Tr_id, Req}, From, State) ->
    Data = encode(Tr_id, Req),
    gen_tcp:send(State#state.socket, Data),
    ets:store(State#state.ets, {Tr_id, From}),
    {noreply, State};

handle_info({tcp, _Socket, Data}, State) ->
    {Tr_id, Result} = decode(Data),
    [{Tr_id, From}] = ets:lookup(State#state.ets, Tr_id),
    ets:delete(State#state.ets, Tr_id),
    gen_server:reply(From, Result),
    {noreply, State};

Notice that the handle_call/3 function returns {noreply, State}. The calling worker process is left waiting, but this process is immediately ready to handle the next message.

The worker process simply waits until a response with the matching transaction id arrives from the back end server. The original 'From' reference is retrieved from the ets table based on the transaction id, and the gen_server call protocol is fulfilled with the gen_server:reply(From, Result) call.

In this way the connection handling gen_server can handle all the workers concurrently while providing a nice synchronous interface to make worker implementation a doddle.

A real implementation would also send a timer message to itself in the handle_call/3 function. This be would handled in another handle_info/2 callback, where it would delete the transaction id from the ets table and return an error to the caller with gen_server:reply(From, {error, timeout}). or similar.

The case where a response is received after timeout would also need to be handled (if transaction id not in the ets table then drop the reponse), and it would be normal to check for re-use of a transaction id and return an error/raise a congestion alarm.

Download xml