Erlang Central

Soundex Matching

Revision as of 22:12, 18 August 2006 by Cyberlync (Talk | contribs)

Problem

You want to generate Soundex hashes of surnames, either for "sounds-like" indexing in databases or for retrieving information from US Census records and similar pre-existing databases.

Solution

Note: This library does not exist yet for Erlang. Scheme examples are shown for the time being.

Use the soundex library:

> (soundex "Smith")
"S530"
> (soundex "Smyth")
"S530"

Both current NARA Soundex and "old" Soundex are supported (soundex is an alias for soundex-nara):

> (soundex-nara "Ashcraft")
"A261"
> (soundex-old "Ashcraft")
"A226"
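
Until an Erlang version exists, the difference between the two variants can be illustrated with a minimal Python sketch. The function name soundex, its nara flag, and the CODES table are made up for this illustration and are not the API of any existing library. The sketch assumes the usual description of the two rule sets: under the NARA rule, H and W do not separate letters with the same code, while under the "old" rule they separate them just as vowels do.

```python
# Letter-to-digit table shared by both Soundex variants.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name, nara=True):
    """Return the four-character Soundex key for name (hypothetical helper).

    nara=True:  H and W do not separate letters with the same code.
    nara=False: H and W separate them, as vowels do (assumed "old" rule).
    """
    # Keep only ASCII letters; anything else is ignored in this sketch.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")  # code of the current run, "" if none
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:
                key += code           # a new code starts a new digit
            prev = code
        elif not (nara and ch in "HW"):
            prev = ""                 # a separator ends the current run
    return (key + "000")[:4]          # pad with zeros, truncate to 4 chars


print(soundex("Ashcraft"))              # A261
print(soundex("Ashcraft", nara=False))  # A226
```

In "Ashcraft" the S and C share code 2 and are separated only by H, so the NARA rule collapses them into one digit while the old rule emits both.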

Multiple Soundex keys based on prefix-skipping can be generated with the soundex-nara/prefixing, soundex-old/prefixing, and soundex/p procedures:

> (soundex/p "vanderlinden")
("V536" "D645" "L535")
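
The prefix-skipping step can be sketched on its own, here in Python. The PREFIXES list and the prefix_variants function are assumptions made for this illustration; the actual prefix list used by the Scheme library is not given here. The sketch repeatedly strips the longest matching prefix and returns every intermediate form of the name, each of which would then be hashed with a Soundex function.

```python
# Hypothetical list of surname prefixes; the real library's list may differ.
PREFIXES = ("van", "der", "de", "di", "la", "le", "con")


def prefix_variants(name):
    """Return the name followed by each form left after stripping a prefix."""
    variants = []
    rest = name.lower()
    while rest:
        variants.append(rest)
        # Prefixes that match and still leave at least one letter behind.
        candidates = [p for p in PREFIXES
                      if rest.startswith(p) and len(rest) > len(p)]
        if not candidates:
            break
        # Strip the longest match, so "der" wins over "de".
        rest = rest[len(max(candidates, key=len)):]
    return variants


print(prefix_variants("vanderlinden"))
# ['vanderlinden', 'derlinden', 'linden']
```

Hashing these three variants gives the three keys shown above. The matching is deliberately naive: it also splits names that merely begin with the same letters as a prefix (for example "Lee"), so a real implementation would need a more careful rule.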


Soundex is a string hash historically used by the US Census to index surnames by how they sound rather than by their precise spelling. Further general information on Soundex is available at http://www.archives.gov/research_room/genealogy/census/soundex.html.

Soundex keys are represented as four-character strings, so the equal? procedure can be used to compare them:

> (equal? (soundex "Johnson") (soundex "Jackson"))
#f
> (equal? (soundex "Johnson") (soundex "JANZEN"))
#t


None of the above applies to Erlang yet; it is here only as a placeholder until the library is implemented. Coming to a Jungerl near you...