Erlang Central

Commenting a Regular Expression

From ErlangCentral Wiki

 

Revision as of 12:48, 24 September 2006

Problem

You want to make it easy to maintain and understand your complex regular expressions.

Solution

Unfortunately, Erlang does not (yet) support Perl-compatible regular expression syntax. This rules out the most useful technique for documenting a regular expression: comments inside the pattern. Your only option for explaining a pattern to future readers is to put comments outside the string:

Rexp="[A-Za-z0-9][0-9][A-Za-z]*". % One letter or digit, one digit, 0 or more letters
"[A-Za-z0-9][0-9][A-Za-z]*"
1> regexp:first_match("A0a", Rexp).
{match,1,3}

You might also aid readability by breaking complex patterns into shorter strings, then concatenating them (with the ++ operator) to form the actual regular expression for your search.
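As a rough sketch of that idea, the pattern from the example above can be split into one fragment per line, each with its own comment. The module name commented_rexp and the function pattern/0 are made up for illustration; only the ++ operator and the pattern itself come from the text.

```erlang
-module(commented_rexp).
-export([pattern/0]).

%% Builds the same pattern as Rexp above, one commented fragment per line.
%% (Hypothetical helper, not part of any library.)
pattern() ->
    "[A-Za-z0-9]"    % one letter or digit
    ++ "[0-9]"       % exactly one digit
    ++ "[A-Za-z]*".  % zero or more letters
```

The returned string is identical to the single-line version, so it can be handed to regexp:first_match/2 in the same way.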

Discussion

There's really not much to say here. Erlang provides pretty weak support for commenting regular expressions. Luckily, Erlang's built-in pattern matching makes regular expressions unnecessary in many cases where other languages would use them.
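To illustrate the kind of case where built-in matching can stand in for a regular expression, here is a minimal sketch that recognises a literal prefix and captures the rest of the string. The module name match_demo, the function method/1 and the request-line format are assumptions chosen for the example, not something the article defines.

```erlang
-module(match_demo).
-export([method/1]).

%% A string literal followed by ++ is a valid pattern in a function head:
%% it matches the prefix and binds the remainder of the string.
method("GET " ++ Path)  -> {get, Path};
method("POST " ++ Path) -> {post, Path};
method(_Other)          -> unknown.
```

Each clause documents itself, which sidesteps the commenting problem for simple prefix checks. Anything that needs repetition or alternation inside the string still calls for a regular expression.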

However, anyone wrangling strings with regular expressions in Erlang should be careful to comment the intent of each one:

2> RegIP = "(([0-2]?[0-9][0-9]?\\.)+[0-2]?[0-9][0-9]?)".
"(([0-2]?[0-9][0-9]?\\.)+[0-2]?[0-9][0-9]?)"

That pattern would probably be easier to understand as:

Octet = "[0-2]?[0-9][0-9]?".  % A number, 0 to 299.
IP = Octet ++ "\\." ++ Octet ++ "\\." ++ Octet ++ "\\." ++ Octet.
3> regexp:first_match("192.168.0.100", IP).
{match,1,13}
4> regexp:first_match("921.182.0", IP).
nomatch

Even with this change, the lack of the curly bracket operators { and } in Erlang's regular expression syntax means you have to repeat a pattern each time it occurs, rather than specifying how many times it should repeat.
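One possible way to soften this is to let ordinary Erlang code do the repeating while the pattern string is being built. The sketch below is only an illustration: the module rexp_repeat and the functions repeat/3 and ip/0 are hypothetical names, and the helper does nothing more than concatenate strings.

```erlang
-module(rexp_repeat).
-export([repeat/3, ip/0]).

%% repeat(Pattern, Sep, N): N copies of Pattern with Sep between them.
%% A plain string-building stand-in for a {N} repetition count.
repeat(Pattern, _Sep, 1) ->
    Pattern;
repeat(Pattern, Sep, N) when is_integer(N), N > 1 ->
    Pattern ++ Sep ++ repeat(Pattern, Sep, N - 1).

%% Rebuilds the IP pattern from the example above.
ip() ->
    Octet = "[0-2]?[0-9][0-9]?",  % a number, 0 to 299
    repeat(Octet, "\\.", 4).      % four octets separated by literal dots
```

Here ip/0 produces the same string as the hand-written Octet ++ "\\." ++ ... expression, so the repetition count lives in one place and the comment on Octet only has to be written once.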