Parsing Names with Honorifics
In Railscast #16, Ryan Bates goes over Virtual Attributes in Rails, using the standard example of storing first and last names separately while getting and setting a single full name. He uses the following simple snippet:
def full_name=(name)
  split = name.split(' ', 2)
  self.first_name = split.first
  self.last_name = split.last
end
Given that the focus was on virtual attributes, that's fine for explanation. However, the snippet splits at the first space, so it fails on names like "Franklin Delano Roosevelt" (first name "Franklin", last name "Delano Roosevelt"). Here's a version our 32nd President will like better:
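# Replaces each run of characters other than letters, commas and hyphens
# (whitespace, punctuation, digits) with a single space, then trims the ends.
# Commas survive so that "Merkin,PhD" can be split off an appendix later.
def clean(n, re = /[^[:alpha:],\-]+/)
  n.gsub(re, ' ').strip
end

# Returns [first_name, last_name]: everything before the last word, and the last word.
# first_name is '' for a one-word name; last_name is nil for a blank name.
# Leading/trailing spaces ignored.
def first_last_from_name(n)
  parts = clean(n).split(' ')
  [parts[0..-2].join(' '), parts.last]
end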
names = [
  "Bill! Merkin,PhD.",
  "Jim Thurston Howell III ",
  "Charo",
  "Heywood Jablowmie",
  "Sergei Rodriguez-Ivanoviv",
  "Polly Romanesq. ",
  " ",
  "",
]
p names.map { |n| first_last_from_name n }
# => [["Bill", "Merkin,PhD"], ["Jim Thurston Howell", "III"], ["", "Charo"], ["Heywood", "Jablowmie"], ["Sergei", "Rodriguez-Ivanoviv"], ["Polly", "Romanesq"], ["", nil], ["", nil]]
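To tie this back to the Railscast, here is a sketch of the same last-word split living inside a full_name= virtual attribute. Person is a hypothetical plain-Ruby stand-in for a Rails model (in Rails, first_name and last_name would be database columns rather than attr_accessors), and it skips the clean step to stay short.

```ruby
# Hypothetical plain-Ruby stand-in for a Rails model; in Rails these
# would be database columns, not attr_accessors.
class Person
  attr_accessor :first_name, :last_name

  # Virtual attribute reader: glue the stored parts back together,
  # skipping a blank first name.
  def full_name
    [first_name, last_name].reject { |part| part.nil? || part.empty? }.join(' ')
  end

  # Virtual attribute writer: everything before the last word is the
  # first name, the last word is the last name (same idea as first_last_from_name).
  def full_name=(name)
    parts = name.to_s.split(' ')  # awk-style split: extra whitespace is ignored
    self.first_name = parts[0..-2].join(' ')
    self.last_name  = parts.last
  end
end

fdr = Person.new
fdr.full_name = "Franklin Delano Roosevelt"
p [fdr.first_name, fdr.last_name]  # => ["Franklin Delano", "Roosevelt"]
p fdr.full_name                    # => "Franklin Delano Roosevelt"

# For comparison, split(' ', 2) from the Railscast cuts at the first space:
p "Franklin Delano Roosevelt".split(' ', 2)  # => ["Franklin", "Delano Roosevelt"]
```

Because the writer and the reader agree on where the split happens, a name survives the round trip unchanged.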
A regex is more extensible, and makes more sense for Perl refugees like me.
# Returns [first_name, last_name] (first_name is nil if there isn't one).
# Leading/trailing spaces ignored.
def first_last_from_name_re(n)
  n = clean(n)
  (n =~ / /) ? n.scan(/(.*)\s+(\S+)$/).first : [nil, n]
end
p names.map { |n| first_last_from_name_re n }
# => [["Bill", "Merkin,PhD"], ["Jim Thurston Howell", "III"], [nil, "Charo"], ["Heywood", "Jablowmie"], ["Sergei", "Rodriguez-Ivanoviv"], ["Polly", "Romanesq"], [nil, ""], [nil, ""]]
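On Ruby 1.9 or later, named captures make the same regex a little more self-documenting, and a failed match can stand in for the separate space test. This is a sketch: NAME_RE and first_last_from_name_named are hypothetical names, and the clean helper here assumes the comma-preserving character class (each run of anything but letters, commas and hyphens becomes one space).

```ruby
# Collapse each run of characters other than letters, commas and hyphens
# into a single space, then trim the ends.
def clean(n, re = /[^[:alpha:],\-]+/)
  n.gsub(re, ' ').strip
end

# Hypothetical: "first" is everything before the last run of whitespace
# (greedy .*), "last" is the final stretch of non-whitespace.
NAME_RE = /\A(?<first>.*)\s+(?<last>\S+)\z/

def first_last_from_name_named(n)
  n = clean(n)
  m = NAME_RE.match(n)
  # No whitespace means no match: treat the whole string as the last name.
  m ? [m[:first], m[:last]] : [nil, n]
end

p first_last_from_name_named("Jim Thurston Howell III ")  # => ["Jim Thurston Howell", "III"]
p first_last_from_name_named("Charo")                     # => [nil, "Charo"]
p first_last_from_name_named(" ")                         # => [nil, ""]
```

Adding another piece later (say, a middle initial) is then a matter of adding another named group rather than renumbering positional ones.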
However, as someone who can't check in at automatic airport kiosks because (no joke) my credit card thinks my last name is "IV", I like this version, which splits suffixes like that off into a separate appendix, better.
# Returns [first_name, last_name, appendix]
# (first_name and appendix are nil when absent).
# Leading/trailing spaces ignored.
# Pass your own appendix_re to recognize other suffixes.
def first_last_appendix_from_name_re(n, appendix_re = nil)
  n = clean(n)
  appendix_re ||= %q((I|II|III|IV|(?:jr|sr|m\.?d|esq|Ph\.?D)\.?))
  if n !~ / /
    [nil, n, nil]  # with no spaces, return n as the last name
  else
    n.scan(
      /\A(.*?)\s+          # everything up to the last name
      (\S+?)               # last name is the last stretch of non-whitespace
      (?:                  # But! There may be an appendix. Look for an optional group
        (?:,\s*|\s+)       # that is set off by a comma or spaces
        #{appendix_re}     # and that matches any of our standard honorifics,
      )?                   # but if not, don't worry about it.
      \Z/ix).first         # scan gives an array of arrays; \A..\Z guarantees exactly one match
  end
end
p names.map { |n| first_last_appendix_from_name_re n }
# => [["Bill", "Merkin", "PhD"], ["Jim Thurston", "Howell", "III"], [nil, "Charo", nil], ["Heywood", "Jablowmie", nil], ["Sergei", "Rodriguez-Ivanoviv", nil], ["Polly", "Romanesq", nil], [nil, "", nil], [nil, "", nil]]
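Since the appendix pattern is meant to be an overridable default, extra suffixes can be recognized without touching the main regex. A sketch, assuming the default is hoisted into a hypothetical DEFAULT_APPENDIX_RE constant and passed as an appendix_re parameter; the extra "V" and "Jnr" entries are just examples. It uses match(...).captures with \z instead of scan(...).first, which yields the same three groups here because a cleaned name containing a space always matches.

```ruby
def clean(n, re = /[^[:alpha:],\-]+/)
  n.gsub(re, ' ').strip
end

# Hypothetical constant holding the post's default list of suffixes.
DEFAULT_APPENDIX_RE = %q((I|II|III|IV|(?:jr|sr|m\.?d|esq|Ph\.?D)\.?))

def first_last_appendix_from_name_re(n, appendix_re = DEFAULT_APPENDIX_RE)
  n = clean(n)
  return [nil, n, nil] unless n.include?(' ')  # one word: treat it as the last name
  # The string is interpolated raw into the regex; its outer parens form the 3rd group.
  n.match(/\A(.*?)\s+(\S+?)(?:(?:,\s*|\s+)#{appendix_re})?\z/i).captures
end

# Example extension: a fifth generation and the British "Jnr".
EXTENDED_APPENDIX_RE = %q((I|II|III|IV|V|(?:jr|jnr|sr|m\.?d|esq|Ph\.?D)\.?))

p first_last_appendix_from_name_re("Thurston Howell V")
# => ["Thurston Howell", "V", nil]
p first_last_appendix_from_name_re("Thurston Howell V", EXTENDED_APPENDIX_RE)
# => ["Thurston", "Howell", "V"]
p first_last_appendix_from_name_re("Heywood Jablowmie, Jnr.", EXTENDED_APPENDIX_RE)
# => ["Heywood", "Jablowmie", "Jnr"]
```

With the default list, "V" isn't recognized, so it is taken as the last name; the extended list splits it off as an appendix.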
All three versions might make Japanese (and other "FamilyName GivenNames" cultures) sad.
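One way to at least not get the order wrong silently is to make it an explicit input. A minimal sketch: given_family_from_name and its family_first: keyword are hypothetical, it skips the cleaning and appendix handling, and the caller has to learn the convention from somewhere else (a locale or a form field), since the string alone can't say whether "Yamada Taro" is family-name-first.

```ruby
# Hypothetical helper returning [given_names, family_name].
# The caller must know the name order; nothing in the string tells us.
def given_family_from_name(n, family_first: false)
  parts = n.split(' ')                         # awk-style split on whitespace
  return [nil, parts.first] if parts.size < 2  # one word (or none): treat it as the family name
  if family_first
    [parts.drop(1).join(' '), parts.first]     # "Family Given..." order
  else
    [parts[0..-2].join(' '), parts.last]       # "Given... Family" order, as in the post
  end
end

p given_family_from_name("Franklin Delano Roosevelt")        # => ["Franklin Delano", "Roosevelt"]
p given_family_from_name("Yamada Taro", family_first: true)  # => ["Taro", "Yamada"]
```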
Your "clean" method strips all German "Umlaute" (ä, ö, ü, etc.). I would be pissed too, because the airport terminal would think my name is "Jrgen" instead of "Jürgen". Sorry dude, gotta make another revision.
Posted by Anonymous | March 9, 2010 at 9:45 AM