Erlang Central

Commenting a Regular Expression

Revision as of 02:22, 4 September 2006 by Bfulgham (Talk | contribs)

Problem

You want to make it easy to maintain and understand your complex regular expressions.

Solution

Unfortunately, Erlang's regexp module does not (yet) support Perl-compatible regular expression syntax. That rules out the most useful technique for documenting a regular expression: comments inside the pattern itself. Your only option for explaining your intent to future readers is to place comments outside the string:

Rexp="[A-Za-z0-9][0-9][A-Za-z]*". % One letter or digit, one digit, then 0 or more letters
"[A-Za-z0-9][0-9][A-Za-z]*"
1> regexp:first_match("A0a", Rexp).
{match,1,3}

You might also aid readability by breaking complex patterns into shorter strings, then concatenating them (with the ++ operator) to form the actual regular expression for your search.
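As a minimal sketch of this approach, the pattern from the Solution can be assembled from named pieces, each carrying its own comment. The module name commented_pattern and the function pattern/0 are hypothetical names chosen for illustration; only the ++ operator and the pattern itself come from the text above.

```erlang
-module(commented_pattern).
-export([pattern/0]).

%% Build the regular expression from short, individually commented pieces.
pattern() ->
    First  = "[A-Za-z0-9]",   % one letter or digit
    Second = "[0-9]",         % one digit
    Rest   = "[A-Za-z]*",     % zero or more letters
    First ++ Second ++ Rest.
```

Because ++ produces exactly the same string as the single literal, the result of commented_pattern:pattern() can be passed to regexp:first_match/2 just as Rexp was.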

Discussion

There's really not much to say here. Erlang provides pretty weak support for commenting regular expressions. Luckily, Erlang's built-in pattern matching makes regular expressions unnecessary in many cases where other languages would need them.
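To illustrate what built-in matching can replace, here is a small sketch that classifies a line by its literal prefix without any regular expression. The module prefix_match, the function command/1, and the "GET "/"PUT " prefixes are assumptions made up for this example, not part of the original recipe.

```erlang
-module(prefix_match).
-export([command/1]).

%% A string literal followed by ++ is a valid pattern in a function head,
%% so each clause matches a fixed prefix and binds the remainder.
command("GET " ++ Path) -> {get, Path};
command("PUT " ++ Path) -> {put, Path};
command(_Other)         -> unknown.
```

Each clause documents itself: the prefix being matched is visible directly in the function head.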

However, anyone wrangling strings with regular expressions in Erlang should take care to comment the intent of each expression:

2> RegIP = "(([0-2]?[0-9][0-9]?\\.)+[0-2]?[0-9][0-9]?)".
"(([0-2]?[0-9][0-9]?\\.)+[0-2]?[0-9][0-9]?)"

This would probably be easier to understand as:

Octet = "[0-2]?[0-9][0-9]?".  % A number, 0 to 299.
IP = Octet ++ "\\." ++ Octet ++ "\\." ++ Octet ++ "\\." ++ Octet.
3> regexp:first_match("192.168.0.100", IP).
{match,1,13}
4> regexp:first_match("921.182.0", IP).
nomatch

Even with this change, the lack of the curly-bracket repetition operators ({ and }) in Erlang's regular expression syntax means you have to repeat a pattern in many places, rather than specifying how many times it should repeat.
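One way to soften this, sketched here under assumptions, is to let ordinary list functions do the repetition while building the pattern string. The module repeat_pattern and its functions repeat/2 and ip/0 are hypothetical helpers; lists:duplicate/2 and lists:append/1 are standard library calls. Note that this simply concatenates copies, so a pattern containing alternation (|) would need to be wrapped in parentheses first.

```erlang
-module(repeat_pattern).
-export([repeat/2, ip/0]).

%% Concatenate N copies of Pattern, standing in for a {N} repetition count.
repeat(Pattern, N) when is_integer(N), N >= 0 ->
    lists:append(lists:duplicate(N, Pattern)).

%% One octet followed by three more "dot plus octet" groups.
ip() ->
    Octet = "[0-2]?[0-9][0-9]?",        % a number, 0 to 299
    Octet ++ repeat("\\." ++ Octet, 3).
```

The string returned by repeat_pattern:ip() is identical to the hand-written IP pattern in the Discussion, so it can be used with regexp:first_match/2 in the same way.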