Nameparser Documentation Release 1.0.2 Derek Gulbranson Oct 26, 2018



Contents

1 Parsing Names
  1.1 Using the HumanName Parser
  1.2 Customizing the Parser with Your Own Configuration
  1.3 HumanName Class Documentation
  1.4 Naming Practices and Resources
  1.5 Release Log
  1.6 Contributing

2 Indices and tables

Python Module Index


Version 1.0.2

A simple Python module for parsing human names into their individual components.

• hn.title

• hn.first

• hn.middle

• hn.last

• hn.suffix

• hn.nickname

Supports 3 different comma placement variations in the input string.

1. Title Firstname “Nickname” Middle Middle Lastname Suffix

2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]

3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]
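As a rough illustration of how the three comma placements above can be told apart, the sketch below splits on commas and checks whether everything after the first comma looks like a suffix. It is a simplified stand-in, not nameparser's own code: `SUFFIXES`, `is_suffix()`, `comma_format()` and the returned labels are hypothetical, and the suffix set is only a small sample.

```python
# Hypothetical sample of suffixes, stored lower case and without periods.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md", "esq"}


def is_suffix(piece):
    """Compare a piece against the sample set, ignoring case and periods."""
    return piece.lower().replace(".", "") in SUFFIXES


def comma_format(full_name):
    """Return a label for which of the three comma placements a string uses."""
    parts = [p.strip() for p in full_name.split(",") if p.strip()]
    if len(parts) == 1:
        return "no commas"          # format 1
    if all(is_suffix(word) for word in parts[1].split()):
        return "suffix comma"       # format 3: only suffixes follow the comma
    return "lastname comma"         # format 2: last name comes first


print(comma_format("Dr. Juan Q. Xavier de la Vega III"))        # no commas
print(comma_format("de la Vega, Dr. Juan Q. Xavier III"))       # lastname comma
print(comma_format("Juan Q. Xavier Velasquez y Garcia, Jr."))   # suffix comma
```

Deciding on the comma layout first keeps the later, space-based classification simple, because each layout fixes where the last name and suffixes are expected.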

It attempts the best guess that can be made with a simple, rule-based approach. It’s not perfect, but it gets you pretty far.

Its main use case is English, but it may be useful for other Latin-based languages, especially if you are willing to customize it. It is not likely to be useful for languages whose names do not share the same structure as English names.

Instantiating the HumanName class with a string splits on commas and then spaces, classifying name parts based on placement in the string and matches against known name pieces like titles. It joins name pieces on conjunctions and special prefixes to last names like “del”. Titles can be chained together and include conjunctions to handle titles like “Asst Secretary of State”. It can also try to correct capitalization.

It does not attempt to correct input mistakes. When there is ambiguity that cannot be resolved by a rule-based approach, HumanName prefers to handle the most common cases correctly. For example, “Dean” is not parsed as a title because it is more common as a first name. (You can customize this behavior though, see Parser Customization Examples.)


CHAPTER 1

Parsing Names

1.1 Using the HumanName Parser

1.1.1 Example Usage

The examples use Python 3, but Python 2.6+ is supported.

>>> from nameparser import HumanName
>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name.title
'Dr.'
>>> name["title"]
'Dr.'
>>> name.first
'Juan'
>>> name.middle
'Q. Xavier'
>>> name.last
'de la Vega'
>>> name.suffix
'III'
>>> name.surnames
'Q. Xavier de la Vega'
>>> name.full_name = "Juan Q. Xavier Velasquez y Garcia, Jr."
>>> name
<HumanName : [
    title: ''
    first: 'Juan'
    middle: 'Q. Xavier'
    last: 'Velasquez y Garcia'
    suffix: 'Jr.'
    nickname: ''
]>
>>> name.middle = "Jason Alexander"
>>> name.middle
'Jason Alexander'
>>> name
<HumanName : [
    title: ''
    first: 'Juan'
    middle: 'Jason Alexander'
    last: 'Velasquez y Garcia'
    suffix: 'Jr.'
    nickname: ''
]>
>>> name.middle = ["custom","values"]
>>> name.middle
'custom values'
>>> name.full_name = 'Doe-Ray, Jonathan "John" A. Harris'
>>> name.as_dict()
{'last': 'Doe-Ray', 'suffix': '', 'title': '', 'middle': 'A. Harris', 'nickname': 'John', 'first': 'Jonathan'}
>>> name.as_dict(False) # add False to hide keys with empty values
{'middle': 'A. Harris', 'nickname': 'John', 'last': 'Doe-Ray', 'first': 'Jonathan'}
>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name2 = HumanName("de la vega, dr. juan Q. xavier III")
>>> name == name2
True
>>> len(name)
5
>>> list(name)
['Dr.', 'Juan', 'Q. Xavier', 'de la Vega', 'III']
>>> name[1:-2]
['Juan', 'Q. Xavier', 'de la Vega']

1.1.2 Capitalization Support

The HumanName class can try to guess the correct capitalization of a name entered in all upper or lower case. By default, it will not adjust the case of names entered in mixed case. To run capitalization on all names pass the parameter force=True.

Capitalize the name.

• bob v. de la macdole-eisenhower phd -> Bob V. de la MacDole-Eisenhower Ph.D.

>>> name = HumanName("bob v. de la macdole-eisenhower phd")
>>> name.capitalize()
>>> str(name)
'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> name = HumanName('Shirley Maclaine') # Don't change mixed case names
>>> name.capitalize()
>>> str(name)
'Shirley Maclaine'
>>> name.capitalize(force=True)
>>> str(name)
'Shirley MacLaine'
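The capitalization rules described above (leave mixed-case input alone unless forced, keep prefixes lower case, special-case "Mac" names and a few exceptions) can be approximated in a few lines. This is only a sketch under stated assumptions, not the library's implementation: `EXCEPTIONS`, `PREFIXES`, the `MAC` pattern and both function names are hypothetical and cover just the documented examples.

```python
import re

# Hypothetical, deliberately tiny tables modelled on the documented examples.
EXCEPTIONS = {"phd": "Ph.D.", "md": "M.D.", "ii": "II", "iii": "III", "iv": "IV"}
PREFIXES = {"de", "la", "del", "van", "von"}
MAC = re.compile(r"^(ma?c)(\w{2,})", re.IGNORECASE)  # assumed "Mac"/"Mc" pattern


def cap_word(word):
    """Capitalize one word, honouring prefixes, exceptions and Mac/Mc names."""
    key = word.lower().replace(".", "")
    if key in PREFIXES:
        return word.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    match = MAC.match(word)
    if match:
        return match.group(1).capitalize() + match.group(2).capitalize()
    return word.capitalize()


def capitalize_name(name, force=False):
    """Re-case a name only if it is all upper or all lower case, unless forced."""
    if not force and name not in (name.upper(), name.lower()):
        return name  # mixed case: assume the author knew what they wanted
    words = []
    for word in name.split():
        # Hyphenated pieces are capitalized part by part.
        words.append("-".join(cap_word(part) for part in word.split("-")))
    return " ".join(words)


print(capitalize_name("bob v. de la macdole-eisenhower phd"))
# Bob V. de la MacDole-Eisenhower Ph.D.
print(capitalize_name("Shirley Maclaine"))              # Shirley Maclaine
print(capitalize_name("Shirley Maclaine", force=True))  # Shirley MacLaine
```

Skipping mixed-case input by default avoids "correcting" names such as "Maclaine" that were typed deliberately.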


1.1.3 Nickname Handling

The content of parentheses or quotes in the name will be available from the nickname attribute.

>>> name = HumanName('Jonathan "John" A. Smith')
>>> name
<HumanName : [
    title: ''
    first: 'Jonathan'
    middle: 'A.'
    last: 'Smith'
    suffix: ''
    nickname: 'John'
]>

1.1.4 Change the output string with string formatting

The string representation of a HumanName instance is controlled by its string_format attribute. The default value, “{title} {first} {middle} {last} {suffix} ({nickname})”, includes parentheses around nicknames. Trailing commas and empty quotes and parentheses are automatically removed if the name has no nickname pieces.

You can change the default formatting for all HumanName instances by setting a new string_format value on the shared CONSTANTS configuration instance.

>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.string_format = "{title} {first} ({nickname}) {middle} {last} {suffix}"
>>> name = HumanName('Robert Johnson')
>>> str(name)
'Robert Johnson'
>>> name = HumanName('Robert "Rob" Johnson')
>>> str(name)
'Robert (Rob) Johnson'

You can control the order and presence of any name fields by changing the string_format attribute of the shared CONSTANTS instance. Don’t want to include nicknames in your output? No problem. Just omit that keyword from the string_format attribute.

>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.string_format = "{title} {first} {last}"
>>> name = HumanName("Dr. Juan Ruiz de la Vega III (Doc Vega)")
>>> str(name)
'Dr. Juan de la Vega'
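The formatting behavior described in this section (fill a template, then drop empty parentheses or quotes and stray whitespace) can be sketched with `str.format` and two regular expressions. The `render()` helper and its cleanup patterns are assumptions made for illustration; the library's own cleanup may differ in detail.

```python
import re

EMPTY_WRAPPERS = re.compile(r'\(\s*\)|"\s*"')  # "()" or empty double quotes
EXTRA_SPACE = re.compile(r"\s+")

DEFAULT_FORMAT = "{title} {first} {middle} {last} {suffix} ({nickname})"


def render(parts, string_format=DEFAULT_FORMAT):
    """Fill the template from a dict of name parts and tidy up the result."""
    text = string_format.format(**parts)   # unused keys are simply ignored
    text = EMPTY_WRAPPERS.sub("", text)    # remove wrappers left empty
    text = EXTRA_SPACE.sub(" ", text)      # collapse gaps left by empty fields
    return text.strip(" ,")                # trim spaces and trailing commas


parts = {"title": "", "first": "Robert", "middle": "", "last": "Johnson",
         "suffix": "", "nickname": "Rob"}
print(render(parts, "{title} {first} ({nickname}) {middle} {last} {suffix}"))
# Robert (Rob) Johnson
print(render(dict(parts, nickname="")))
# Robert Johnson
```

Because every field is passed to the template by keyword, leaving a keyword out of the format string is enough to hide that field.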

1.2 Customizing the Parser with Your Own Configuration

Recognition of titles, prefixes, suffixes and conjunctions is handled by matching the lower case characters of a name piece with pre-defined sets of strings located in nameparser.config. You can adjust these predefined sets to help fine tune the parser for your dataset.

1.2.1 Changing the Parser Constants

There are a few ways to adjust the parser configuration depending on your needs. The config is available in two places.


The first is via from nameparser.config import CONSTANTS.

>>> from nameparser.config import CONSTANTS
>>> CONSTANTS
<Constants() instance>

The other is the C attribute of a HumanName instance, e.g. hn.C.

>>> from nameparser import HumanName
>>> hn = HumanName("Dean Robert Johns")
>>> hn.C
<Constants() instance>

Both places are usually a reference to the same shared module-level CONSTANTS instance, depending on how you instantiate the HumanName class (see below).

Editable attributes of nameparser.config.CONSTANTS

• TITLES - Pieces that come before the name. Includes all first_name_titles. Cannot include things that may be first names.

• FIRST_NAME_TITLES - Titles that, when followed by a single name, mark that name as a first name, e.g. “King David”.

• SUFFIX_ACRONYMS - Pieces that come at the end of the name that may or may not have periods separating the letters, e.g. “m.d.”.

• SUFFIX_NOT_ACRONYMS - Pieces that come at the end of the name that never have periods separating the letters, e.g. “Jr.”.

• CONJUNCTIONS - Connectors like “and” that join the preceding piece to the following piece.

• PREFIXES - Connectors like “del” and “bin” that join to the following piece but not the preceding, similar to titles but can appear anywhere in the name.

• CAPITALIZATION_EXCEPTIONS - Dictionary of pieces that do not capitalize the first letter, e.g. “Ph.D”.

• REGEXES - Regular expressions used to find words, initials, nicknames, etc.

Each set of constants comes with add() and remove() methods for tuning the constants for your project. These methods automatically lower case and remove punctuation to normalize them for comparison.
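To make the normalization concrete, here is a minimal sketch of a set wrapper whose add() and remove() lower-case their arguments and strip periods before touching the underlying set. `NormalizedSet` is a hypothetical stand-in, not the library's class; returning `self` so that calls can be chained is modelled on the chained `remove(...).add(...)` call shown in the nameparser.config section.

```python
class NormalizedSet:
    """Hypothetical set wrapper that normalizes strings before storing them."""

    def __init__(self, items=()):
        self.items = {self._normalize(item) for item in items}

    @staticmethod
    def _normalize(text):
        # Lower case and drop periods so "Dr." and "dr" compare equal.
        return text.lower().replace(".", "")

    def add(self, *strings):
        for text in strings:
            self.items.add(self._normalize(text))
        return self  # allow chaining

    def remove(self, *strings):
        for text in strings:
            self.items.discard(self._normalize(text))  # missing items are ignored here
        return self

    def __contains__(self, text):
        return self._normalize(text) in self.items


titles = NormalizedSet(["Hon", "Dr."])
titles.remove("hon").add("Dean", "Chemistry")
print("dean" in titles, "Dr" in titles, "Hon" in titles)  # True True False
```

Normalizing on the way in means lookups never have to care how a title was originally typed.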

Other editable attributes

• string_format - controls output from str()

• empty_attribute_default - value returned by empty attributes, defaults to empty string

1.2.2 Parser Customization Examples

Removing a Title

Take a look at the nameparser.config documentation to see what’s in the constants. Here’s a quick walk through of some examples where you might want to adjust them.

“Hon” is a common abbreviation for “Honorable”, a title used when addressing judges, and is included in the default titles constant. This means it will never be considered a first name, because titles are the pieces before first names.

But “Hon” is also sometimes a first name. If your dataset contains more “Hon”s than “Honorable”s, you may wish to remove it from the titles constant so that “Hon” can be parsed as a first name.

>>> from nameparser import HumanName
>>> hn = HumanName("Hon Solo")
>>> hn
<HumanName : [
    title: 'Hon'
    first: ''
    middle: ''
    last: 'Solo'
    suffix: ''
    nickname: ''
]>
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.remove('hon')
SetManager({'right', ..., 'tax'})
>>> hn = HumanName("Hon Solo")
>>> hn
<HumanName : [
    title: ''
    first: 'Hon'
    middle: ''
    last: 'Solo'
    suffix: ''
    nickname: ''
]>

If you don’t want to detect any titles at all, you can remove all of them:

>>> CONSTANTS.titles.remove(*CONSTANTS.titles)

Adding a Title

You can also pass a Constants instance to HumanName on instantiation.

“Dean” is a common first name so it is not included in the default titles constant. But in some contexts it is more common as a title. If you would like “Dean” to be parsed as a title, simply add it to the titles constant.

You can pass multiple strings to both the add() and remove() methods and each string will be added or removed. Both functions automatically normalize the strings for the parser’s comparison method by making them lower case and removing periods.

>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> constants = Constants()
>>> constants.titles.add('dean', 'Chemistry')
SetManager({'right', ..., 'tax'})
>>> hn = HumanName("Assoc Dean of Chemistry Robert Johns", constants=constants)
>>> hn
<HumanName : [
    title: 'Assoc Dean of Chemistry'
    first: 'Robert'
    middle: ''
    last: 'Johns'
    suffix: ''
    nickname: ''
]>

1.2.3 Module-level Shared Configuration Instance

When you modify the configuration, by default this will modify the behavior of all HumanName instances. This could be a handy way to set it up for your entire project, but it could also lead to some unexpected behavior because changing the config on one instance could modify the behavior of another instance.

>>> from nameparser import HumanName
>>> instance = HumanName("")
>>> instance.C.titles.add('dean')
SetManager({'right', ..., 'tax'})
>>> other_instance = HumanName("Dean Robert Johns")
>>> other_instance # Dean parses as title
<HumanName : [
    title: 'Dean'
    first: 'Robert'
    middle: ''
    last: 'Johns'
    suffix: ''
    nickname: ''
]>

If you’d prefer new instances to have their own config values, one shortcut is to pass None as the second argument (or constants keyword argument) when instantiating HumanName. Each instance always has a C attribute, but unless you pass something falsey to the constants argument, it is a reference to the module-level config values with the behavior described above.

>>> from nameparser import HumanName
>>> instance = HumanName("Dean Robert Johns")
>>> instance.has_own_config
False
>>> instance.C.titles.add('dean')
SetManager({'right', ..., 'tax'})
>>> other_instance = HumanName("Dean Robert Johns", None) # <-- pass None for per-instance config
>>> other_instance
<HumanName : [
    title: ''
    first: 'Dean'
    middle: 'Robert'
    last: 'Johns'
    suffix: ''
    nickname: ''
]>
>>> other_instance.has_own_config
True
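The difference between the shared and per-instance configuration comes down to whether two objects hold a reference to the same mutable config. The toy classes below illustrate that pattern only; `Config`, `SHARED` and `Parser` are hypothetical names and do not reproduce the library's internals.

```python
class Config:
    """Hypothetical stand-in for a configuration object."""

    def __init__(self):
        self.titles = {"dr", "mr"}


SHARED = Config()  # plays the role of a module-level shared instance


class Parser:
    def __init__(self, constants=SHARED):
        # A falsey value (such as None) requests a fresh, private config.
        self.C = constants or Config()

    @property
    def has_own_config(self):
        return self.C is not SHARED


a = Parser()
b = Parser()
a.C.titles.add("dean")          # mutates the shared object ...
print("dean" in b.C.titles)     # True: ... so b sees the change as well

c = Parser(None)                # private config, unaffected by the change
print("dean" in c.C.titles, c.has_own_config)  # False True
```

Sharing by default keeps project-wide tuning to a single line, at the cost of the cross-instance side effects shown here.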

Don’t Remove Emojis

By default, all emojis are removed from the input string before the name is parsed. You can turn this off by setting the emoji regex to False.

>>> from nameparser import HumanName
>>> from nameparser.config import Constants
>>> constants = Constants()
>>> constants.regexes.emoji = False
>>> hn = HumanName("Sam Smith", constants=constants)
>>> str(hn)
'Sam Smith'

Config Changes May Need Parse Refresh

The full name is parsed upon assignment to the full_name attribute or instantiation. Sometimes after making changes to configuration or other inner data after assigning the full name, the name will need to be re-parsed with the parse_full_name() method before you see those changes with repr().
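The reason a refresh is needed is that parsing happens once, at assignment time, and the results are stored. The toy class below shows that mechanism in isolation; `TinyName` and its one-rule parser are hypothetical and far simpler than HumanName.

```python
class TinyName:
    """Toy parser: the first word is a title if it is in self.titles."""

    def __init__(self, full_name, titles=("dr", "mr")):
        self.titles = set(titles)
        self.title = ""
        self.rest = ""
        self.full_name = full_name  # assignment triggers a parse

    @property
    def full_name(self):
        return self._full_name

    @full_name.setter
    def full_name(self, value):
        self._full_name = value
        self.parse_full_name()

    def parse_full_name(self):
        words = self._full_name.split()
        if words and words[0].lower().rstrip(".") in self.titles:
            self.title, self.rest = words[0], " ".join(words[1:])
        else:
            self.title, self.rest = "", " ".join(words)


name = TinyName("Dean Robert Johns")
name.titles.add("dean")   # config changed after the parse already ran
print(repr(name.title))   # '' : stored result is stale
name.parse_full_name()    # re-parse with the updated config
print(repr(name.title))   # 'Dean'
```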

Adjusting names after parsing them

Each attribute has a corresponding ordered list of name pieces. If you’re doing pre- or post-processing you may wish to manipulate these lists directly. The strings returned by the attribute names just join these lists with spaces.

• o.title_list

• o.first_list

• o.middle_list

• o.last_list

• o.suffix_list

• o.nickname_list

>>> hn = HumanName("Juan Q. Xavier Velasquez y Garcia, Jr.")
>>> hn.middle_list
['Q.', 'Xavier']
>>> hn.middle_list += ["Ricardo"]
>>> hn.middle_list
['Q.', 'Xavier', 'Ricardo']

You can also replace any name bucket’s contents by assigning a string or a list directly to the attribute.

>>> hn = HumanName("Dr. John A. Kenneth Doe")
>>> hn.title = ["Associate","Professor"]
>>> hn.suffix = "Md."
>>> hn
<HumanName : [
    title: 'Associate Professor'
    first: 'John'
    middle: 'A. Kenneth'
    last: 'Doe'
    suffix: 'Md.'
    nickname: ''
]>
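The relationship between an attribute and its underlying list can be pictured as a property that joins the list on read and replaces it on write. The class below is a hedged sketch of that idea for a single bucket; `NameBuckets` is hypothetical, and storing an assigned string as one piece is an assumption of the sketch.

```python
class NameBuckets:
    """Hypothetical holder for one name bucket and its piece list."""

    def __init__(self):
        self.middle_list = []

    @property
    def middle(self):
        # Reading the attribute joins the pieces with spaces.
        return " ".join(self.middle_list)

    @middle.setter
    def middle(self, value):
        # Accept either a single string or a list of pieces.
        if isinstance(value, str):
            self.middle_list = [value]
        else:
            self.middle_list = list(value)


buckets = NameBuckets()
buckets.middle = ["Q.", "Xavier"]
buckets.middle_list += ["Ricardo"]
print(buckets.middle)   # Q. Xavier Ricardo
buckets.middle = "Jason Alexander"
print(buckets.middle)   # Jason Alexander
```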

Developer Documentation


1.3 HumanName Class Documentation

1.3.1 HumanName.parser

class nameparser.parser.HumanName(full_name=u'', constants=<Constants() instance>, encoding=u'UTF-8', string_format=None)

Parse a person’s name into individual components.

Instantiation assigns to full_name, and assignment to full_name triggers parse_full_name(). After parsing the name, these instance attributes are available.

HumanName Instance Attributes

• title

• first

• middle

• last

• suffix

• nickname

• surnames

Parameters

• full_name (str) – The name string to be parsed.

• constants (Constants) – a Constants instance. Pass None for per-instance config.

• encoding (str) – string representing the encoding of your input

• string_format (str) – python string formatting

C = <Constants() instance>
    A reference to the configuration for this instance, which may or may not be a reference to the shared, module-wide instance at CONSTANTS. See Customizing the Parser.

__eq__(other)
    HumanName instances are equal to other objects whose lower case unicode representation is the same.

__init__(full_name=u'', constants=<Constants() instance>, encoding=u'UTF-8', string_format=None)
    x.__init__(...) initializes x; see help(type(x)) for signature

are_suffixes(pieces)
    Return True if all pieces are suffixes.

as_dict(include_empty=True)
    Return the parsed name as a dictionary of its attributes.

    Parameters include_empty (bool) – Include keys in the dictionary for empty name attributes.

    Return type dict


    >>> name = HumanName("Bob Dole")
    >>> name.as_dict()
    {'last': 'Dole', 'suffix': '', 'title': '', 'middle': '', 'nickname': '', 'first': 'Bob'}
    >>> name.as_dict(False)
    {'last': 'Dole', 'first': 'Bob'}

capitalize(force=False)
    The HumanName class can try to guess the correct capitalization of a name entered in all upper or lower case. By default, it will not adjust the case of names entered in mixed case. To run capitalization on all names pass the parameter force=True.

    Parameters force (bool) – force capitalization of strings that include mixed case

    Usage

    >>> name = HumanName('bob v. de la macdole-eisenhower phd')
    >>> name.capitalize()
    >>> str(name)
    'Bob V. de la MacDole-Eisenhower Ph.D.'
    >>> # Don't touch good names
    >>> name = HumanName('Shirley Maclaine')
    >>> name.capitalize()
    >>> str(name)
    'Shirley Maclaine'
    >>> name.capitalize(force=True)
    >>> str(name)
    'Shirley MacLaine'

first
    The person’s first name. The first name piece after any known title pieces parsed from full_name.

full_name
    The name string to be parsed.

handle_firstnames()
    If there are only two parts and one is a title, assume it’s a last name instead of a first name, e.g. Mr. Johnson. Unless it’s a special title like “Sir”, then when it’s followed by a single name that name is always a first name.
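The rule described for handle_firstnames() can be pictured with a small helper that decides where a single name following a title belongs. This is a sketch of the rule only, not the method itself: `assign_single_name()` is hypothetical, and `FIRST_NAME_TITLES` is a short sample drawn from the default first_name_titles set.

```python
# Small sample of titles that are followed by a first name.
FIRST_NAME_TITLES = {"sir", "king", "queen", "dame"}


def assign_single_name(title, name):
    """Place a lone name after a title into the first or last bucket."""
    key = title.lower().replace(".", "")
    if key in FIRST_NAME_TITLES:
        return {"title": title, "first": name, "last": ""}
    return {"title": title, "first": "", "last": name}


print(assign_single_name("Mr.", "Johnson"))
# {'title': 'Mr.', 'first': '', 'last': 'Johnson'}
print(assign_single_name("Sir", "Elton"))
# {'title': 'Sir', 'first': 'Elton', 'last': ''}
```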

has_own_config
    True if this instance is not using the shared module-level configuration.

is_an_initial(value)
    Words with a single period at the end, or a single uppercase letter.

    Matches the initial regular expression in REGEXES.

is_conjunction(piece)
    Is in the conjunctions set and not is_an_initial().

is_prefix(piece)
    Lowercase and no periods version of piece is in the PREFIXES set.

is_roman_numeral(value)
    Matches the roman_numeral regular expression in REGEXES.

is_rootname(piece)
    Is not a known title, suffix or prefix. Just first, middle, last names.


is_suffix(piece)
    Is in the suffixes set and not is_an_initial().

    Some suffixes may be acronyms (M.B.A) while some are not (Jr.), so we remove the periods from piece when testing against C.suffix_acronyms.

is_title(value)
    Is in the TITLES set.

join_on_conjunctions(pieces, additional_parts_count=0)
    Join conjunctions to surrounding pieces. Title- and prefix-aware. e.g.:

    ['Mr.', 'and', 'Mrs.', 'John', 'Doe'] ==> ['Mr. and Mrs.', 'John', 'Doe']

    ['The', 'Secretary', 'of', 'State', 'Hillary', 'Clinton'] ==> ['The Secretary of State', 'Hillary', 'Clinton']

    When joining titles, saves newly formed piece to the instance’s titles constant so they will be parsed correctly later. E.g. after parsing the example names above, ‘The Secretary of State’ and ‘Mr. and Mrs.’ would be present in the titles constant set.

    Parameters

    • pieces (list) – name pieces strings after split on spaces

    • additional_parts_count (int) –

    Returns new list with pieces next to conjunctions merged into one piece with spaces in it.

    Return type list
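The merging step can be sketched as a single pass that glues each conjunction to the piece before it and the piece after it. This simplified version is not the library's method: it ignores titles, prefixes and initials, does not update any constants, and `CONJUNCTIONS` is a sample taken from the default conjunctions set.

```python
CONJUNCTIONS = {"and", "of", "the", "y", "e", "&"}  # sample of the defaults


def join_on_conjunctions(pieces, conjunctions=CONJUNCTIONS):
    """Merge each conjunction with its neighbouring pieces."""
    joined = []
    glue_next = False  # True when the previous piece was a conjunction
    for piece in pieces:
        is_conjunction = piece.lower() in conjunctions
        if joined and (is_conjunction or glue_next):
            joined[-1] = joined[-1] + " " + piece
        else:
            joined.append(piece)
        glue_next = is_conjunction
    return joined


print(join_on_conjunctions(["Mr.", "and", "Mrs.", "John", "Doe"]))
# ['Mr. and Mrs.', 'John', 'Doe']
print(join_on_conjunctions(["The", "Secretary", "of", "State", "Hillary", "Clinton"]))
# ['The Secretary of State', 'Hillary', 'Clinton']
```

A bare "e" or "y" used as an initial would be merged by this sketch, which is the kind of case the is_an_initial() check in is_conjunction() guards against.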

last
    The person’s last name. The last name piece parsed from full_name.

middle
    The person’s middle names. All name pieces after the first name and before the last name parsed from full_name.

nickname
    The person’s nicknames. Any text found inside of quotes ("") or parentheses (()).

original = u''
    The original string, untouched by the parser.

parse_full_name()
    The main parse method for the parser. This method is run upon assignment to the full_name attribute or instantiation.

    Basic flow is to hand off to pre_process() to handle nicknames. It then splits on commas and chooses a code path depending on the number of commas.

    parse_pieces() then splits those parts on spaces and join_on_conjunctions() joins any pieces next to conjunctions.

parse_nicknames()
    The content of parentheses or quotes in the name will be added to the nicknames list. This happens before any other processing of the name.

    Single quotes cannot span white space characters to allow for single quotes in names like O’Connor. Double quotes and parentheses can span white space.

    Loops through 3 REGEXES: quoted_word, double_quotes and parenthesis.
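A minimal sketch of this nickname pass, assuming three patterns of the kind described: single quotes that may not contain white space, double quotes, and parentheses. The regular expressions and the `extract_nicknames()` helper are assumptions written for illustration and are not copied from the library's REGEXES.

```python
import re

# Assumed patterns, named after the three regexes mentioned above.
QUOTED_WORD = re.compile(r"(?<!\w)'([^\s]*?)'(?!\w)")  # no white space inside
DOUBLE_QUOTES = re.compile(r'"(.*?)"')
PARENTHESIS = re.compile(r"\((.*?)\)")


def extract_nicknames(full_name):
    """Return (name without nicknames, list of nicknames found)."""
    nicknames = []
    for pattern in (QUOTED_WORD, DOUBLE_QUOTES, PARENTHESIS):
        nicknames.extend(pattern.findall(full_name))
        full_name = pattern.sub("", full_name)
    # Collapse the gaps left behind by the removed nicknames.
    return " ".join(full_name.split()), nicknames


print(extract_nicknames('Jonathan "John" A. Smith'))
# ('Jonathan A. Smith', ['John'])
print(extract_nicknames("Benjamin 'Ben' O'Connor"))
# ("Benjamin O'Connor", ['Ben'])
```

The lookbehind and lookahead around the single-quote pattern are what keep the apostrophe in "O'Connor" from being read as the start of a nickname.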


parse_pieces(parts, additional_parts_count=0)
    Split parts on spaces and remove commas, join on conjunctions and lastname prefixes. If parts have periods in the middle, try splitting on periods and check if the parts are titles or suffixes. If they are, add them to the constant so they will be found.

    Parameters

    • parts (list) – name part strings from the comma split

    • additional_parts_count (int) – if the comma format contains other parts, we need to know how many there are to decide if things should be considered a conjunction.

    Returns pieces split on spaces and joined on conjunctions

    Return type list

post_process()
    This happens at the end of the parse_full_name() after all other processing has taken place. Runs handle_firstnames().

pre_process()
    This method happens at the beginning of the parse_full_name() before any other processing of the string aside from unicode normalization, so it’s a good place to do any custom handling in a subclass. Runs parse_nicknames() and squash_emoji().

squash_emoji()
    Remove emoji from the input string.

suffix
    The person’s suffixes. Pieces at the end of the name that are found in suffixes, or pieces that are at the end of comma separated formats, e.g. “Lastname, Title Firstname Middle[,] Suffix [, Suffix]” parsed from full_name.

surnames
    A string of all middle names followed by the last name.

surnames_list
    List of middle names followed by last name.

title
    The person’s titles. Any string of consecutive pieces in titles or conjunctions at the beginning of full_name.

1.3.2 HumanName.config

The nameparser.config module manages the configuration of the nameparser.

A module-level instance of Constants is created and used by default for all HumanName instances. You can adjust the entire module’s configuration by importing this instance and changing it.

>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.remove('hon').add('chemistry','dean')
SetManager(set([u'msgt', ..., u'adjutant']))

You can also adjust the configuration of individual instances by passing None as the second argument upon instantiation.

>>> from nameparser import HumanName
>>> hn = HumanName("Dean Robert Johns", None)
>>> hn.C.titles.add('dean')
SetManager(set([u'msgt', ..., u'adjutant']))
>>> hn.parse_full_name() # need to run this again after config changes

Potential Gotcha: If you do not pass None as the second argument, hn.C will be a reference to the module config, possibly yielding unexpected results. See Customizing the Parser.

nameparser.config.CONSTANTS = <Constants() instance>
    A module-level instance of the Constants() class. Provides a common instance for the module to share to easily adjust configuration for the entire module. See Customizing the Parser with Your Own Configuration.


class nameparser.config.Constants(prefixes=set([u’dela’, u’san’, u’von’, u’le’, u’degli’, u’la’,u’abu’, u’dei’, u’vel’, u’bin’, u’do’, u’dxed’, u’di’, u’dal’,u’de’, u’da’, u’santa’, u’van’, u’du’, u’ste’, u’ibn’, u’der’,u’st’, u’dello’, u’del’, u’bon’, u’delli’, u’dos’, u’delle’,u’della’]), suffix_acronyms=set([u’dbe’, u’kbe’, u’gc’,u’gm’, u’lg’, u’idsm’, u’lt’, u’mbe’, u’clu’, u’td’, u’do’,u’kcb’, u’kcsi’, u’dmd’, u’gbe’, u’cie’, u’cmg’, u’qc’,u’gcvo’, u’dpm’, u’pmp’, u’bart’, u’ed’, u’rrc’, u’kcvo’,u’cfp’, u’dds’, u’gcmg’, u’rd’, u’dso’, u’dsm’, u’dsc’,u’dcm’, u’cgm’, u’chfc’, u’dcb’, u’cgc’, u’kcie’, u’bt’,u’dcmg’, u’gcsi’, u’vrd’, u’jd’, u’mscmsm’, u’obi’, u’obe’,u’iso’, u’mvo’, u’ch’, u’cb’, u’mba’, u’sgm’, u’vd’, u’qpm’,u’qgm’, u’dfm’, u’dcvo’, u’dfc’, u’om’, u’md’, u’ma’, u’mc’,u’bem’, u’mm’, u’erd’, u’cvo’, u’mp’, u’ud’, u’lvo’, u’vc’,u’ae’, u’cbe’, u’rvm’, u’gcie’, u’afm’, u’gcb’, u’arrc’,u’qfsm’, u’afc’, u’qam’, u’csm’, u’kcmg’, u’csi’, u’phd’,u’iom’, u’phr’, u’dvm’, u’kg’, u’cpm’, u’cpa’, u’kp’, u’kt’]),suffix_not_acronyms=set([u’jnr’, u’esq’, u’i’, u’sr’, u’v’,u’jr’, u’iv’, u’ii’, u’2’, u’snr’, u’esquire’, u’iii’, u’dr’,u’junior’]), titles=set([u’msgt’, u’coach’, u’founder’,u’manager’, u’legal’, u’rebbe’, u’chair’, u’captain’,u’ballet’, u’baron’, u’father’, u’literary’, u’keyboardist’,u’ccmsgt’, u’merchant’, u’adviser’, u’dutchess’, u’lamido’,u’mag/judge’, u’surgeon’, u’missionary’, u’prefect’,u’magnate’, u’scholar’, u’investigator’, u’excellency’,u’celebrity’, u’brother’, u’delegate’, u’judicial’, u’dir’,u’cfo’, u’sultana’, u’docent’, u’chef’, u’honourable’,u’lawyer’, u’7th’, u’subaltern’, u’business’, u’2ndlt’,u’hereditary’, u’nurse’, u’jurist’, u’admiral’, u’9th’,u’clerk’, u’theorist’, u’ranger’, u’baseball’, u’nanny’,u’abbess’, u’dramatist’, u’teacher’, u’knowledge’,u’cyclist’, u’publisher’, u’comptroller’, u’mpco-cg’,u’technical’, u’envoy’, u’united’, u’credit’, u’musicologist’,u’advertising’, u’social’, u’dra’, u’military’, u’mag-judge’, u’cmsgt’, 
u’family’, u’deputy’, u’courtier’, u’sgt’,u’private’, u’sgm’, u’composer’, u’1st’, u’bandleader’,u’army’, u’archbishop’, u’archdruid’, u’sysselmann’,u’ayatollah’, u’msg’, u’pres’, u’baba’, u’pfc’, u’lcdr’,u’biblical’, u’cwo-2’, u’musician’, u’heir’, u’flag’,u’excellent’, u’commander’, u’alderman’, u’chaplain’,u’md’, u’mg’, u’primate’, u’patriarch’, u’ms’, u’mr’,u’entertainer’, u’giani’, u’mufti’, u’suffragist’, u’division’,u’tax’, u’high’, u’critic’, u’cpo’, u’2lt’, u’spc’,u’botanist’, u’risk’, u’csm’, u’sir’, u’lama’, u’guru’,u’hon’, u’effendi’, u’wo-1’, u"king’s", u’drummer’,u’cardinal’, u’ltg’, u’banker’, u’edohen’, u’designer’,u’information’, u’customs’, u’4th’, u’mag’, u’president’,u’law’, u’sr’, u’doctor’, u’psychologist’, u’presiding’,u’chief’, u’sn’, u’sa’, u’travel’, u’se’, u’producer’,u’rabbi’, u’tsarina’, u’gyani’, u’scientist’, u’comtesse’,u’mayor’, u’developer’, u’superior’, u’archdeacon’,u’verderer’, u’theologian’, u’dr’, u’councillor’, u’maid’,u’lt’, u’ens’, u’co-chairs’, u’criminal’, u’fadm’, u’ceo’,u’goodwife’, u’comedienne’, u’brigadier’, u’commodore’,u’bgen’, u’investor’, u’mystery’, u’mathematician’,u’naturalist’, u’curator’, u’shehu’, u’neuroscientist’,u’rock’, u’maharajah’, u’financial’, u’catholicos’, u’group’,u’navy’, u’blues’, u’adjutant’, u’collector’, u’eminence’,u’special’, u’rt’, u’shayk’, u’1sgt’, u’3rd’, u’miss’,u’rep’, u’rev’, u’vadm’, u’reverend’, u’misses’, u’activist’,u’lord’, u’honorable’, u’sma’, u’associate’, u’marquise’,u’mme’, u’princess’, u’barrister’, u’monsignor’, u’british’,u’sheikh’, u’registrar’, u’generalissimo’, u’hajji’,u’first’, u’tirthankar’, u’mademoiselle’, u’playwright’,u’revenue’, u’researcher’, u’blogger’, u’ltjg’, u’smsgt’,u’elder’, u’sailor’, u’comic’, u’paleontologist’, u’co-founder’, u’engineer’, u’corporal’, u’maj’, u’district’,u’5th’, u’historian’, u’master’, u’sergeant’, u’burgess’,u’saint’, u’edmi’, u’solicitor’, u’burlesque’, u’treasurer’,u’correspondent’, u’mcpoc’, u’mcpon’, u’inventor’,u’king’, 
u’minister’, u’cartoonist’, u’states’, u’architect’,u’6th’, u’counselor’, u’countess’, u’printmaker’,u’anthropologist’, u’pro’, u’premier’, u’maharani’,u’comedian’, u’host’, u’tsar’, u’scpo’, u’goodman’,u’appellate’, u’educator’, u’pianist’, u’cwo5’, u’lecturer’,u’evangelist’, u’printer’, u’matriarch’, u’theatre’,u’exec’, u’english’, u’pharaoh’, u’majgen’, u’most’,u’assoc’, u’librarian’, u’mullah’, u’screenwriter’,u’presbyter’, u’singer’, u’duchesse’, u’docket’, u’professor’,u’mrs’, u’deacon’, u’aunt’, u’colonel’, u’marchess’,u’businessman’, u’senior’, u’ltc’, u’detective’, u’pope’,u’prin’, u’queen’, u’sheik’, u’briggen’, u’television ’,u’radio’, u’industrialist’, u’economist’, u’principal’,u’archeologist’, u’sheriff’, u’writer’, u’philantropist’,u’historien’, u’sainte’, u’apprentice’, u’headman’,u’personality’, u’do’, u’mister’, u’his’, u’psychiatrist’,u’assistant’, u’designated’, u’ecologist’, u’mgr’,u’singer-songwriter’, u’magistrate’, u’ssg’, u’banner’,u’gen’, u’prime’, u’businesswoman’, u’vizier’, u’cwo2’,u’srta’, u’linguist’, u’graf’, u’secretary’, u’1stlt’,u’pvt’, u’choreographer’, u’intelligence’, u’national’,u’memoirist’, u’tsgt’, u’analytics’, u’computer’, u’bard’,u’marchioness’, u’marquess’, u’compositeur’, u’arhat’,u’expert’, u’federal’, u’radm’, u’magistrate-judge’, u’state’,u’obstetritian’, u’discovery’, u’cartographer’, u’pv2’,u’criminologist’, u’archduke’, u’wm’, u’prior’, u’physicist’,u’jr’, u’adept’, u’police’, u’10th’, u’almoner’, u’wo5’,u’wo4’, u’wo1’, u’priestess’, u’wo3’, u’foreign’, u’award-winning’, u’col’, u’author’, u’majesty’, u’attache’, u’ltcol’,u’seigneur’, u’2nd’, u’dancer’, u’gysgt’, u’biographer’,u’technologist’, u’shaykh’, u’petty’, u’shaikh’, u’strategy’,u’arbitrator’, u’poet’, u’ssgt’, u’dame’, u’imam’, u’acolyte’,u’po3’, u’po1’, u’controller’, u’representative’, u’gaf’,u’instructor’, u’dpty’, u’painter’, u’pilot’, u’physician’,u’soccer’, u’politician’, u’consultant’, u’sultan’, u"chargxe9d’affaires", u’governor’, u’air’, 
u’cmsaf’, u’voice’,u’abbot’, u’elerunwon’, u’vc’, u’metropolitan’, u’resident’,u’attachxe9’, u’canon’, u’dissident’, u’monk’, u’player’,u’tenor’, u’wo2’, u’co-chair’, u’soldier’, u’sociologist’,u’member’, u’mobster’, u’speaker’, u’grand’, u’essayist’,u’biochemist’, u’marcher’, u’phd’, u’director’, u’warden’,u’senator’, u’vocalist’, u’priest’, u’theater’, u’mlle’,u’bailiff’, u’academic’, u’mother’, u’model’, u’corporate’,u’madame’, u’ambassador’, u’bearer’, u’madam’,u’executive’, u’actress’, u’biologist’, u’holiness’,u’prince’, u’pursuivant’, u’clergyman’, u’swordbearer’,u’photographer’, u’ltgen’, u’royal’, u’schoolmaster’,u’civil’, u’bench’, u’sgtmaj’, u’chieftain’, u’doyen’,u’prelate’, u’cdr’, u’adm’, u’warrant’, u’kingdom’,u’lyricist’, u’municipal’, u’amn’, u’capt’, u’chancellor’,u’advocate’, u’forester’, u’senior-judge’, u’judge’,u’anarchist’, u’lady’, u’rear’, u’lcpl’, u’chairs’, u’akhoond’,u’servant’, u’broadcaster’, u’journalist’, u’friar’,u’security’, u’attorney’, u’right’, u’classical’, u’staff’,u’astronomer’, u’shaik’, u’abolitionist’, u’mountaineer’,u’novelist’, u’1stsgt’, u’philosopher’, u’8th’, u’pioneer’,u’buddha’, u’prof’, u’leader’, u’officer’, u’mgysgt’,u’bg’, u’archduchess’, u’sgtmajmc’, u’marketing’,u’ornithologist’, u’lieutenant’, u’journeyman’, u’political’,u’cwo-3’, u’translator’, u’sister’, u’sra’, u’cwo-5’, u’cwo-4’, u’gentiluomo’, u’subedar’, u’pediatrician’, u’emperor’,u’software’, u’cheikh’, u’duke’, u’vicar’, u’auntie’,u’intendant’, u’1lt’, u’blessed’, u’empress’, u’entrepreneur’,u’saoshyant’, u’her’, u’zoologist’, u’flying’, u’sfc’,u’bookseller’, u’editor’, u’narrator’, u’pastor’, u’soprano’,u’uncle’, u’junior’, u’highness’, u’count’, u’illustrator’,u’marquis’, u’siddha’, u’cwo3’, u’pslc’, u’actor ’,u’vardapet’, u’us’, u’cwo4’, u’swami’, u’arranger’,u’uk’, u’heiress’, u’asst’, u’mcpo’, u’rangatira’,u’supreme’, u’ab’, u’opera’, u’general’, u’provost’,u"queen’s", u’historicus’, u’a1c’, u’pir’, u’bishop’,u’film’, u’commander-in-chief’, 
u’diplomat’, u’conductor’,u’operating’, u’bodhisattva’, u’guitarist’, u’bwana’,u’murshid’, u’field’, u’shekh’, u’mathematics’, u’wing’,u’chemist’, u’satirist’, u’woodman’, u’venerable’, u’po2’,u’druid’, u’mahdi’, u’rdml’, u’viscount’, u’bibliographer’,u’cpl’, u’ekegbian’, u’vice’, u’behavioral’, u’timi’,u’cpt’, u’animator’]), first_name_titles=set([u’cheikh’,u’pope’, u’auntie’, u’father’, u’queen’, u’sheik’,u’shaik’, u’shaykh’, u’sir’, u’shayk’, u’shaikh’, u’maid’,u’master’, u’shekh’, u’dame’, u’uncle’, u’king’, u’sister’,u’brother’, u’sheikh’, u’aunt’, u’mother’]), conjunc-tions=set([u’and’, u’e’, u’&’, u’of’, u’und’, u’y’,u’et’, u’the’]), capitalization_exceptions=((u’ii’, u’II’),(u’iii’, u’III’), (u’iv’, u’IV’), (u’md’, u’M.D.’), (u’phd’,u’Ph.D.’)), regexes=set([(u’mac’, <_sre.SRE_Patternobject>), (u’roman_numeral’, <_sre.SRE_Patternobject>), (u’double_quotes’, <_sre.SRE_Patternobject>), (u’word’, <_sre.SRE_Pattern object>),(u’emoji’, <_sre.SRE_Pattern object>), (u’parenthesis’,<_sre.SRE_Pattern object>), (u’initial’, <_sre.SRE_Patternobject>), (u’no_vowels’, <_sre.SRE_Pattern object at0x1a92400>), (u’period_not_at_end’, <_sre.SRE_Patternobject>), (u’phd’, <_sre.SRE_Pattern object>),(u’quoted_word’, <_sre.SRE_Pattern object>), (u’spaces’,<_sre.SRE_Pattern object>)]))

An instance of this class holds all of the configuration constants for the parser.

Parameters

• prefixes (set) – prefixes wrapped with SetManager.

• titles (set) – titles wrapped with SetManager.

• first_name_titles (set) – FIRST_NAME_TITLES wrapped with SetManager.

• suffix_acronyms (set) – SUFFIX_ACRONYMS wrapped with SetManager.

• suffix_not_acronyms (set) – SUFFIX_NOT_ACRONYMS wrapped with SetManager.

• conjunctions (set) – conjunctions wrapped with SetManager.

• capitalization_exceptions (tuple or dict) – CAPITALIZATION_EXCEPTIONS wrapped with TupleManager.

• regexes (tuple or dict) – regexes wrapped with TupleManager.

empty_attribute_default = u''
Default return value for empty attributes.

>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.empty_attribute_default = None
>>> name = HumanName("John Doe")
>>> name.title
None
>>> name.first
'John'

string_format = u'{title} {first} {middle} {last} {suffix} ({nickname})'
The default string format used for all new HumanName instances.
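To make the role of the format string concrete, here is a minimal standalone sketch of how such a template could be filled in. It does not use nameparser itself: render_name is a hypothetical helper, and the cleanup steps (dropping empty parentheses, collapsing white space) are assumptions based on the release log rather than the library's actual code.

```python
import re

# Same template as the documented default above.
DEFAULT_FORMAT = "{title} {first} {middle} {last} {suffix} ({nickname})"
FIELDS = ("title", "first", "middle", "last", "suffix", "nickname")


def render_name(parts, template=DEFAULT_FORMAT):
    """Hypothetical helper: fill the template from a dict of name parts."""
    # Missing parts become empty strings so str.format never raises KeyError.
    filled = template.format(**{field: parts.get(field, "") for field in FIELDS})
    # Assumed cleanup: drop empty nickname parentheses, collapse white space.
    filled = filled.replace("()", "")
    return re.sub(r"\s+", " ", filled).strip()


print(render_name({"title": "Dr.", "first": "Juan", "last": "Velasquez"}))
print(render_name({"first": "Juan", "last": "Velasquez", "nickname": "Rick"}))
```

Because the template is an ordinary str.format string, reordering the fields (for example "{last}, {first}") only requires passing a different template.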

class nameparser.config.SetManager(elements)
Easily add and remove config variables per module or instance. Subclass of collections.Set.

The only special functionality beyond that provided by set() is to normalize constants for comparison (lower case, no periods) when they are add()ed and remove()d, and to allow passing multiple string arguments to the add() and remove() methods.

add(*strings)
Add the lower case and no-period version of the string arguments to the set. Can pass a list of strings. Returns self for chaining.

add_with_encoding(s, encoding=None)
Add the lower case and no-period version of the string to the set. Pass an explicit encoding parameter to specify the encoding of binary strings that are not DEFAULT_ENCODING (UTF-8).

remove(*strings)
Remove the lower case and no-period version of the string arguments from the set. Returns self for chaining.
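The normalization and chaining behaviour described above can be illustrated with a small standalone sketch. NormalizingSet is a hypothetical stand-in written for this example; it subclasses the built-in set for brevity and is not the library's SetManager.

```python
class NormalizingSet(set):
    """Hypothetical stand-in: lower-cases and strips periods on add/remove."""

    @staticmethod
    def _normalize(value):
        return value.lower().replace(".", "")

    def add(self, *strings):
        for value in strings:
            super().add(self._normalize(value))
        return self  # returning self allows chained calls

    def remove(self, *strings):
        for value in strings:
            super().remove(self._normalize(value))
        return self


titles = NormalizingSet()
# "Dr." and "Ph.D." are stored as "dr" and "phd"; "HON" removes "hon".
titles.add("Dr.", "Hon.").add("Ph.D.").remove("HON")
print(sorted(titles))
```

Normalizing at insertion time means a later membership test only has to normalize the candidate piece once, rather than every stored constant.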

class nameparser.config.TupleManager
A dictionary with dot.notation access. Subclass of dict. Makes the tuple constants more friendly.
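As a rough illustration of what "a dictionary with dot.notation access" means, the sketch below builds a dict subclass from a tuple of pairs. DotDict is a hypothetical name used only here; the library's own implementation may differ.

```python
class DotDict(dict):
    """Hypothetical illustration of dot-notation access on a dict."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


# A tuple of (key, value) pairs converts directly into the dictionary.
exceptions = DotDict((("ii", "II"), ("phd", "Ph.D.")))
exceptions.md = "M.D."
print(exceptions.phd, exceptions["md"])
```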

1.3.3 HumanName.config Defaults

nameparser.config.titles.FIRST_NAME_TITLES = set([u'cheikh', u'pope', u'auntie', u'father', u'queen', u'sheik', u'shaik', u'shaykh', u'sir', u'shayk', u'shaikh', u'maid', u'master', u'shekh', u'dame', u'uncle', u'king', u'sister', u'brother', u'sheikh', u'aunt', u'mother'])
When these titles appear with a single other name, that name is a first name, e.g. “Sir John”, “Sister Mary”, “Queen Elizabeth”.
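The rule can be sketched as a single lookup. The helper below is hypothetical and uses only a subset of the constant; it assumes, following the 0.2.10 release note, that a lone name after any other title is treated as a last name.

```python
# Subset of FIRST_NAME_TITLES, for illustration only.
FIRST_NAME_TITLES = {"sir", "sister", "queen", "dame", "king"}


def assign_single_name(title, name):
    """Hypothetical helper: pick the slot for a lone name that follows a title."""
    key = title.lower().replace(".", "")
    slot = "first" if key in FIRST_NAME_TITLES else "last"
    return {"title": title, slot: name}


print(assign_single_name("Sir", "John"))
print(assign_single_name("Dr.", "Smith"))
```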

nameparser.config.titles.TITLES = set([u'msgt', u'coach', u'founder', u'manager', u'legal', u'rebbe', u'chair', u'captain', u'ballet', u'baron', u'father', u'literary', u'keyboardist', u'ccmsgt', u'merchant', u'adviser', u'dutchess', u'lamido', u'mag/judge', u'surgeon', u'missionary', u'prefect', u'magnate', u'scholar', u'investigator', u'excellency', u'celebrity', u'brother', u'delegate', u'judicial', u'dir', u'cfo', u'sultana', u'docent', u'chef', u'honourable', u'lawyer', u'7th', u'subaltern', u'business', u'2ndlt', u'hereditary', u'nurse', u'jurist', u'admiral', u'9th', u'clerk', u'theorist', u'ranger', u'baseball', u'nanny', u'abbess', u'dramatist', u'teacher', u'knowledge', u'cyclist', u'publisher', u'comptroller', u'mpco-cg', u'technical', u'envoy', u'united', u'credit', u'musicologist', u'advertising', u'social', u'dra', u'military', u'mag-judge', u'cmsgt', u'family', u'deputy', u'courtier', u'sgt', u'private', u'sgm', u'composer', u'1st', u'bandleader', u'army', u'archbishop', u'archdruid', u'sysselmann', u'ayatollah', u'msg', u'pres', u'baba', u'pfc', u'lcdr', u'biblical', u'cwo-2', u'musician', u'heir', u'flag', u'excellent', u'commander', u'alderman', u'chaplain', u'md', u'mg', u'primate', u'patriarch', u'ms', u'mr', u'entertainer', u'giani', u'mufti', u'suffragist', u'division', u'tax', u'high', u'critic', u'cpo', u'2lt', u'spc', u'botanist', u'risk', u'csm', u'sir', u'lama', u'guru', u'hon', u'effendi', u'wo-1', u"king's", u'drummer', u'cardinal', u'ltg', u'banker', u'edohen', u'designer', u'information', u'customs', u'4th', u'mag', u'president', u'law', u'sr', u'doctor', u'psychologist', u'presiding', u'chief', u'sn', u'sa', u'travel', u'se', u'producer', u'rabbi', u'tsarina', u'gyani', u'scientist', u'comtesse', u'mayor', u'developer', u'superior', u'archdeacon', u'verderer', u'theologian', u'dr', u'councillor', u'maid', u'lt', u'ens', u'co-chairs', u'criminal', u'fadm', u'ceo', u'goodwife', u'comedienne', u'brigadier', u'commodore', u'bgen', 
u'investor', u'mystery', u'mathematician', u'naturalist', u'curator', u'shehu', u'neuroscientist', u'rock', u'maharajah', u'financial', u'catholicos', u'group', u'navy', u'blues', u'adjutant', u'collector', u'eminence', u'special', u'rt', u'shayk', u'1sgt', u'3rd', u'miss', u'rep', u'rev', u'vadm', u'reverend', u'misses', u'activist', u'lord', u'honorable', u'sma', u'associate', u'marquise', u'mme', u'princess', u'barrister', u'monsignor', u'british', u'sheikh', u'registrar', u'generalissimo', u'hajji', u'first', u'tirthankar', u'mademoiselle', u'playwright', u'revenue', u'researcher', u'blogger', u'ltjg', u'smsgt', u'elder', u'sailor', u'comic', u'paleontologist', u'co-founder', u'engineer', u'corporal', u'maj', u'district', u'5th', u'historian', u'master', u'sergeant', u'burgess', u'saint', u'edmi', u'solicitor', u'burlesque', u'treasurer', u'correspondent', u'mcpoc', u'mcpon', u'inventor', u'king', u'minister', u'cartoonist', u'states', u'architect', u'6th', u'counselor', u'countess', u'printmaker', u'anthropologist', u'pro', u'premier', u'maharani', u'comedian', u'host', u'tsar', u'scpo', u'goodman', u'appellate', u'educator', u'pianist', u'cwo5', u'lecturer', u'evangelist', u'printer', u'matriarch', u'theatre', u'exec', u'english', u'pharaoh', u'majgen', u'most', u'assoc', u'librarian', u'mullah', u'screenwriter', u'presbyter', u'singer', u'duchesse', u'docket', u'professor', u'mrs', u'deacon', u'aunt', u'colonel', u'marchess', u'businessman', u'senior', u'ltc', u'detective', u'pope', u'prin', u'queen', u'sheik', u'briggen', u'television ', u'radio', u'industrialist', u'economist', u'principal', u'archeologist', u'sheriff', u'writer', u'philantropist', u'historien', u'sainte', u'apprentice', u'headman', u'personality', u'do', u'mister', u'his', u'psychiatrist', u'assistant', u'designated', u'ecologist', u'mgr', u'singer-songwriter', u'magistrate', u'ssg', u'banner', u'gen', u'prime', u'businesswoman', u'vizier', u'cwo2', u'srta', u'linguist', u'graf', 
u'secretary', u'1stlt', u'pvt', u'choreographer', u'intelligence', u'national', u'memoirist', u'tsgt', u'analytics', u'computer', u'bard', u'marchioness', u'marquess', u'compositeur', u'arhat', u'expert', u'federal', u'radm', u'magistrate-judge', u'state', u'obstetritian', u'discovery', u'cartographer', u'pv2', u'criminologist', u'archduke', u'wm', u'prior', u'physicist', u'jr', u'adept', u'police', u'10th', u'almoner', u'wo5', u'wo4', u'wo1', u'priestess', u'wo3', u'foreign', u'award-winning', u'col', u'author', u'majesty', u'attache', u'ltcol', u'seigneur', u'2nd', u'dancer', u'gysgt', u'biographer', u'technologist', u'shaykh', u'petty', u'shaikh', u'strategy', u'arbitrator', u'poet', u'ssgt', u'dame', u'imam', u'acolyte', u'po3', u'po1', u'controller', u'representative', u'gaf', u'instructor', u'dpty', u'painter', u'pilot', u'physician', u'soccer', u'politician', u'consultant', u'sultan', u"charg\xe9 d'affaires", u'governor', u'air', u'cmsaf', u'voice', u'abbot', u'elerunwon', u'vc', u'metropolitan', u'resident', u'attach\xe9', u'canon', u'dissident', u'monk', u'player', u'tenor', u'wo2', u'co-chair', u'soldier', u'sociologist', u'member', u'mobster', u'speaker', u'grand', u'essayist', u'biochemist', u'marcher', u'phd', u'director', u'warden', u'senator', u'vocalist', u'priest', u'theater', u'mlle', u'bailiff', u'academic', u'mother', u'model', u'corporate', u'madame', u'ambassador', u'bearer', u'madam', u'executive', u'actress', u'biologist', u'holiness', u'prince', u'pursuivant', u'clergyman', u'swordbearer', u'photographer', u'ltgen', u'royal', u'schoolmaster', u'civil', u'bench', u'sgtmaj', u'chieftain', u'doyen', u'prelate', u'cdr', u'adm', u'warrant', u'kingdom', u'lyricist', u'municipal', u'amn', u'capt', u'chancellor', u'advocate', u'forester', u'senior-judge', u'judge', u'anarchist', u'lady', u'rear', u'lcpl', u'chairs', u'akhoond', u'servant', u'broadcaster', u'journalist', u'friar', u'security', u'attorney', u'right', u'classical', u'staff', 
u'astronomer', u'shaik', u'abolitionist', u'mountaineer', u'novelist', u'1stsgt', u'philosopher', u'8th', u'pioneer', u'buddha', u'prof', u'leader', u'officer', u'mgysgt', u'bg', u'archduchess', u'sgtmajmc', u'marketing', u'ornithologist', u'lieutenant', u'journeyman', u'political', u'cwo-3', u'translator', u'sister', u'sra', u'cwo-5', u'cwo-4', u'gentiluomo', u'subedar', u'pediatrician', u'emperor', u'software', u'cheikh', u'duke', u'vicar', u'auntie', u'intendant', u'1lt', u'blessed', u'empress', u'entrepreneur', u'saoshyant', u'her', u'zoologist', u'flying', u'sfc', u'bookseller', u'editor', u'narrator', u'pastor', u'soprano', u'uncle', u'junior', u'highness', u'count', u'illustrator', u'marquis', u'siddha', u'cwo3', u'pslc', u'actor ', u'vardapet', u'us', u'cwo4', u'swami', u'arranger', u'uk', u'heiress', u'asst', u'mcpo', u'rangatira', u'supreme', u'ab', u'opera', u'general', u'provost', u"queen's", u'historicus', u'a1c', u'pir', u'bishop', u'film', u'commander-in-chief', u'diplomat', u'conductor', u'operating', u'bodhisattva', u'guitarist', u'bwana', u'murshid', u'field', u'shekh', u'mathematics', u'wing', u'chemist', u'satirist', u'woodman', u'venerable', u'po2', u'druid', u'mahdi', u'rdml', u'viscount', u'bibliographer', u'cpl', u'ekegbian', u'vice', u'behavioral', u'timi', u'cpt', u'animator'])
Cannot include things that could also be first names, e.g. “dean”. Many of these are from Wikipedia: https://en.wikipedia.org/wiki/Title. The parser recognizes chains of these, including conjunctions, allowing recognition of titles like “Deputy Secretary of State”.
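One way to picture the chaining of titles is a scan over the leading words that accepts titles, plus conjunctions that sit between two titles. This is a simplified standalone sketch with hypothetical names and small subsets of the constants; the parser's real logic is more involved.

```python
# Small subsets of TITLES and CONJUNCTIONS, for illustration only.
TITLES = {"deputy", "secretary", "state", "president", "united", "states", "dr"}
CONJUNCTIONS = {"of", "the", "and", "&"}


def _norm(word):
    return word.lower().replace(".", "")


def split_leading_titles(full_name):
    """Hypothetical helper: return (title chain, rest of the name)."""
    tokens = full_name.split()
    i = 0
    while i < len(tokens):
        if _norm(tokens[i]) in TITLES:
            i += 1
            continue
        # A run of conjunctions extends the chain only when it sits between two titles.
        j = i
        while j < len(tokens) and _norm(tokens[j]) in CONJUNCTIONS:
            j += 1
        if i > 0 and j > i and j < len(tokens) and _norm(tokens[j]) in TITLES:
            i = j
        else:
            break
    return " ".join(tokens[:i]), " ".join(tokens[i:])


print(split_leading_titles("Deputy Secretary of State John Doe"))
```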

nameparser.config.suffixes.SUFFIX_ACRONYMS = set([u'dbe', u'kbe', u'gc', u'gm', u'lg', u'idsm', u'lt', u'mbe', u'clu', u'td', u'do', u'kcb', u'kcsi', u'dmd', u'gbe', u'cie', u'cmg', u'qc', u'gcvo', u'dpm', u'pmp', u'bart', u'ed', u'rrc', u'kcvo', u'cfp', u'dds', u'gcmg', u'rd', u'dso', u'dsm', u'dsc', u'dcm', u'cgm', u'chfc', u'dcb', u'cgc', u'kcie', u'bt', u'dcmg', u'gcsi', u'vrd', u'jd', u'mscmsm', u'obi', u'obe', u'iso', u'mvo', u'ch', u'cb', u'mba', u'sgm', u'vd', u'qpm', u'qgm', u'dfm', u'dcvo', u'dfc', u'om', u'md', u'ma', u'mc', u'bem', u'mm', u'erd', u'cvo', u'mp', u'ud', u'lvo', u'vc', u'ae', u'cbe', u'rvm', u'gcie', u'afm', u'gcb', u'arrc', u'qfsm', u'afc', u'qam', u'csm', u'kcmg', u'csi', u'phd', u'iom', u'phr', u'dvm', u'kg', u'cpm', u'cpa', u'kp', u'kt'])
Post-nominal acronyms. Titles, degrees and other things people stick after their name that may or may not have periods between the letters. The parser removes periods when matching against these pieces.

nameparser.config.suffixes.SUFFIX_NOT_ACRONYMS = set([u'jnr', u'esq', u'i', u'sr', u'v', u'jr', u'iv', u'ii', u'2', u'snr', u'esquire', u'iii', u'dr', u'junior'])
Post-nominal pieces that are not acronyms. The parser does not remove periods when matching against these pieces.
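The difference between the two suffix sets can be sketched as two membership tests. The function below is a hypothetical standalone illustration using subsets of the constants. It assumes that a leading or trailing period (as in "Jr.") is still ignored for non-acronyms while periods inside the piece are kept; that detail is an assumption, not something stated above.

```python
# Subsets of the two constants, for illustration only.
SUFFIX_ACRONYMS = {"md", "phd", "jd", "cpa"}
SUFFIX_NOT_ACRONYMS = {"jr", "sr", "esq", "ii", "iii"}


def is_suffix(piece):
    """Hypothetical check mirroring the two matching rules described above."""
    lowered = piece.lower()
    # Acronyms: compare with every period removed, so "Ph.D." matches "phd".
    if lowered.replace(".", "") in SUFFIX_ACRONYMS:
        return True
    # Non-acronyms: inner periods are kept; only edge periods are stripped (assumption).
    return lowered.strip(".") in SUFFIX_NOT_ACRONYMS


for candidate in ("Ph.D.", "M.D", "Jr.", "J.r", "Smith"):
    print(candidate, is_suffix(candidate))
```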

nameparser.config.prefixes.PREFIXES = set([u'dela', u'san', u'von', u'le', u'degli', u'la', u'abu', u'dei', u'vel', u'bin', u'do', u'd\xed', u'di', u'dal', u'de', u'da', u'santa', u'van', u'du', u'ste', u'ibn', u'der', u'st', u'dello', u'del', u'bon', u'delli', u'dos', u'delle', u'della'])
Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece. They can be chained together, e.g. “von der” and “de la”. Because they only appear in middle or last names, they also signify that all following name pieces should be in the same name part. For example, “von” will be joined to all following pieces that are not prefixes or suffixes, allowing recognition of double last names when they appear after a prefix. So in “pennie von bergen wessels MD”, “von” will join with all following name pieces until the suffix “MD”, resulting in the correct parsing of the last name “von bergen wessels”.
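The joining behaviour in the “pennie von bergen wessels MD” example can be sketched as follows. This is a simplified standalone illustration with hypothetical names and subsets of the constants. It assumes the very first piece is never treated as a prefix, and it simply absorbs everything up to the next suffix, which also covers chained prefixes such as “de la”.

```python
# Subsets of PREFIXES and the suffix constants, for illustration only.
PREFIXES = {"von", "van", "de", "la", "der", "del"}
SUFFIXES = {"md", "phd", "jr"}


def join_prefixes(pieces):
    """Hypothetical helper: merge a prefix with the pieces that follow it."""
    joined = []
    i = 0
    while i < len(pieces):
        # Assumption: a leading piece is a first name, never a prefix.
        if i > 0 and pieces[i].lower() in PREFIXES:
            j = i + 1
            # Absorb every following piece until a suffix (or the end) is reached.
            while j < len(pieces) and pieces[j].lower().replace(".", "") not in SUFFIXES:
                j += 1
            joined.append(" ".join(pieces[i:j]))
            i = j
        else:
            joined.append(pieces[i])
            i += 1
    return joined


print(join_prefixes("pennie von bergen wessels MD".split()))
```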

nameparser.config.conjunctions.CONJUNCTIONS = set([u'and', u'e', u'&', u'of', u'und', u'y', u'et', u'the'])
Pieces that should join to their neighboring pieces, e.g. “and”, “y” and “&”. “of” and “the” are also included to facilitate joining multiple titles, e.g. “President of the United States”.
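Joining on conjunctions can be pictured as merging each conjunction with the piece before it and the piece after it. The function below is a simplified standalone sketch that borrows the name join_on_conjunctions for readability; it is not the library's method and ignores the edge cases the real parser handles.

```python
CONJUNCTIONS = {"and", "e", "&", "of", "und", "y", "et", "the"}


def join_on_conjunctions(pieces, conjunctions=CONJUNCTIONS):
    """Simplified sketch: glue each conjunction to its neighboring pieces."""
    joined = []
    i = 0
    while i < len(pieces):
        piece = pieces[i]
        if piece.lower() in conjunctions and joined and i + 1 < len(pieces):
            # Skip over a run of consecutive conjunctions ("of the").
            j = i
            while j < len(pieces) - 1 and pieces[j].lower() in conjunctions:
                j += 1
            # Merge the previous piece, the conjunctions, and the piece at j.
            joined[-1] = " ".join([joined[-1]] + pieces[i:j + 1])
            i = j + 1
        else:
            joined.append(piece)
            i += 1
    return joined


print(join_on_conjunctions(["Mr.", "and", "Mrs.", "John", "Doe"]))
```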

nameparser.config.capitalization.CAPITALIZATION_EXCEPTIONS = ((u'ii', u'II'), (u'iii', u'III'), (u'iv', u'IV'), (u'md', u'M.D.'), (u'phd', u'Ph.D.'))
Any pieces that are not capitalized by capitalizing the first letter.
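A minimal sketch of how such an exception table might be consulted is shown below. capitalize_piece is a hypothetical helper, and normalizing the lookup key by lower-casing and removing periods is an assumption made for the example.

```python
# The documented tuple of pairs converts directly into a lookup table.
CAPITALIZATION_EXCEPTIONS = dict((
    ("ii", "II"), ("iii", "III"), ("iv", "IV"), ("md", "M.D."), ("phd", "Ph.D."),
))


def capitalize_piece(piece):
    """Hypothetical helper: use the exception if there is one, else capitalize."""
    key = piece.lower().replace(".", "")  # assumed normalization of the lookup key
    if key in CAPITALIZATION_EXCEPTIONS:
        return CAPITALIZATION_EXCEPTIONS[key]
    return piece.capitalize()


print([capitalize_piece(p) for p in "juan velasquez iii phd".split()])
```

Without the table, plain first-letter capitalization would turn "iii" into "Iii" and "phd" into "Phd".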

nameparser.config.regexes.REGEXES = set([(u'mac', <_sre.SRE_Pattern object>), (u'roman_numeral', <_sre.SRE_Pattern object>), (u'double_quotes', <_sre.SRE_Pattern object>), (u'word', <_sre.SRE_Pattern object>), (u'emoji', <_sre.SRE_Pattern object>), (u'parenthesis', <_sre.SRE_Pattern object>), (u'initial', <_sre.SRE_Pattern object>), (u'no_vowels', <_sre.SRE_Pattern object at 0x1a92400>), (u'period_not_at_end', <_sre.SRE_Pattern object>), (u'phd', <_sre.SRE_Pattern object>), (u'quoted_word', <_sre.SRE_Pattern object>), (u'spaces', <_sre.SRE_Pattern object>)])
All regular expressions used by the parser are precompiled and stored in the config.
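The idea of precompiling patterns once and looking them up by name can be sketched with the standard re module. The patterns below are illustrative guesses written for this example; they are not the expressions the library ships, and only three of the documented keys are shown.

```python
import re

# Illustrative patterns only; the library's real expressions may differ.
REGEXES = {
    "spaces": re.compile(r"\s+"),
    "roman_numeral": re.compile(r"^(x|ix|iv|v?i{1,3}|v)$", re.IGNORECASE),
    "initial": re.compile(r"^[A-Za-z]\.?$"),
}

# Compiled once at import time, then reused for every name that is parsed.
collapsed = REGEXES["spaces"].sub(" ", "John   Q.  Doe")
print(collapsed)
print(bool(REGEXES["roman_numeral"].match("III")))
print(bool(REGEXES["initial"].match("Q.")))
```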

1.4 Naming Practices and Resources

• US_Census_Surname_Data_2000

• Naming_practice_guide_UK_2006

• Wikipedia_Anthroponymy

• Wikipedia_Naming_conventions

• Wikipedia_List_Of_Titles

1.5 Release Log

• 1.0.2 - Oct 26, 2018

– Fix handling of only nickname and last name (#78)

• 1.0.1 - August 30, 2018

– Fix overzealous regex for “Ph. D.” (#43)

– Add surnames attribute as aggregate of middle and last names

• 1.0.0 - August 30, 2018

– Fix support for nicknames in single quotes (#74)

– Change prefix handling to support prefixes on first names (#60)

– Fix prefix capitalization when not part of lastname (#70)

– Handle erroneous space in “Ph. D.” (#43)

• 0.5.8 - August 19, 2018

– Add “Junior” to suffixes (#76)

– Add “dra” and “srta” to titles (#77)

• 0.5.7 - June 16, 2018

– Fix doc link (#73)

– Fix handling of “do” and “dos” Portuguese prefixes (#71, #72)

• 0.5.6 - January 15, 2018

– Fix python version check (#64)

• 0.5.5 - January 10, 2018

– Support J.D. as suffix and Wm. as title

• 0.5.4 - December 10, 2017

– Add Dr to suffixes (#62)

– Add the full set of Italian derivatives from “di” (#59)

– Add parameter to specify the encoding of strings added to constants, use ‘UTF-8’ as fallback (#67)

– Fix handling of names composed entirely of conjunctions (#66)

• 0.5.3 - June 27, 2017

– Remove emojis from initial string by default with option to include emojis (#58)

• 0.5.2 - March 19, 2017

– Added names scraped from VIAF data, thanks daryanypl (#57)

• 0.5.1 - August 12, 2016

– Fix error for names that end with conjunction (#54)

• 0.5.0 - August 4, 2016

– Refactor join_on_conjunctions(), fix #53

• 0.4.1 - July 25, 2016

– Remove “bishop” from titles because it also could be a first name

– Fix handling of lastname prefixes with periods, e.g. “Jane St. John” (#50)

• 0.4.0 - June 2, 2016

– Remove “CONSTANTS.suffixes”, replaced by “suffix_acronyms” and “suffix_not_acronyms” (#49)

– Add “du” to prefixes

– Add “sheikh” variations to titles

– Add parameter to force capitalization of mixed case strings

• 0.3.16 - March 24, 2016

– Clarify LGPL licence version (#47)

– Skip pickle tests if pickle not installed (#48)

• 0.3.15 - March 21, 2016

– Fix string format when empty_attribute_default = None (#45)

– Include tests in release source tarball (#46)

• 0.3.14 - March 18, 2016

– Add CONSTANTS.empty_attribute_default to customize value returned for empty attributes (#44)

• 0.3.13 - March 14, 2016

– Improve string format handling (#41)

• 0.3.12 - March 13, 2016

– Fix first name clash with suffixes (#42)

– Fix encoding of constants added via the python shell

– Add “MSC” to suffixes, fix #41

• 0.3.11 - October 17, 2015

– Fix bug in capitalization exceptions (#39)

• 0.3.10 - September 19, 2015

– Fix encoding of byte strings on python 2.x (#37)

• 0.3.9 - September 5, 2015

– Separate suffixes that are acronyms to handle periods differently, fixes #29, #21

– Don’t find titles after first name is filled, fixes (#27)

– Add “chair” titles (#37)

• 0.3.8 - September 2, 2015

– Use regex to check for roman numerals at end of name (#36)

– Add DVM to suffixes

• 0.3.7 - August 30, 2015

– Speed improvement, 3x faster

– Make HumanName instances pickleable

• 0.3.6 - August 6, 2015

– Fix strings that start with conjunctions (#20)

– Handle assigning lists of names to a name attribute

– Support dictionary-like assignment of name attributes

• 0.3.5 - August 4, 2015

– Fix handling of string encoding in python 2.x (#34)

– Add support for dictionary key access, e.g. name[‘first’]

– Add ‘santa’ to prefixes, add ‘cpa’, ‘csm’, ‘phr’, ‘pmp’ to suffixes (#35)

– Fix prefixes before multi-part last names (#23)

– Fix capitalization bug (#30)

• 0.3.4 - March 1, 2015

– Fix #24, handle first name also a prefix

– Fix #26, last name comma format when lastname is also a title

• 0.3.3 - Aug 4, 2014

– Allow suffixes to be chained (#8)

– Handle trailing suffix in last name comma format (#3). Removes support for titles with periods but no spaces in them, e.g. “Lt.Gen.”. (#21)

• 0.3.2 - July 16, 2014

– Retain original string in “original” attribute.

– Collapse white space when using custom string format.

– Fix #19, single comma name format may have trailing suffix

• 0.3.1 - July 5, 2014

– Fix Pypi package, include new config module.

• 0.3.0 - July 4, 2014

– Refactor configuration to simplify modifications to constants (backwards incompatible)

– use unicode_literals to simplify Python 2 & 3 support.

– Generate documentation using sphinx and host on readthedocs.

• 0.2.10 - May 6, 2014

– If name is only a title and one part, assume it’s a last name instead of a first name, with exceptions for some titles like ‘Sir’. (#7).

– Add some judicial and other common titles. (#9)

• 0.2.9 - Apr 1, 2014

– Add a new nickname attribute containing anything in parenthesis or double quotes (Issue 33).

• 0.2.8 - Oct 25, 2013

– Add support for Python 3.3+. Thanks to @corbinbs.

• 0.2.7 - Feb 13, 2013

– Fix bug with multiple conjunctions in title

– add legal and crown titles

• 0.2.6 - Feb 12, 2013

– Fix python 2.6 import error on logging.NullHandler

• 0.2.5 - Feb 11, 2013

– Set logging handler to NullHandler

– Remove ‘ben’ from PREFIXES because it’s more common as a name than a prefix.

– Deprecate BlankHumanNameError. Do not raise exceptions if full_name is empty string.

• 0.2.4 - Feb 10, 2013

– Adjust logging, don’t set basicConfig. Fix Issue 10 and Issue 26.

– Fix handling of single lower case initials that are also conjunctions, e.g. “john e smith”. Re Issue 11.

– Fix handling of initials with no space separation, e.g. “E.T. Jones”. Fix #11.

– Do not remove period from first name, when present.

– Remove ‘e’ from PREFIXES because it is handled as a conjunction.

– Python 2.7+ required to run the tests. Mark known failures.

– tests/test.py can now take an optional name argument that will return repr() for that name.

• 0.2.3 - Fix overzealous “Mac” regex

• 0.2.2 - Fix parsing error

• 0.2.0

– Significant refactor of parsing logic. Handle conjunctions and prefixes before parsing into attribute buckets.

– Support attribute overriding by assignment.

– Support multiple titles.

– Lowercase titles constants to fix bug with comparison.

– Move documentation to README.rst, add release log.

• 0.1.4 - Use set() in constants for improved speed. setuptools compatibility - sketerpot

• 0.1.3 - Add capitalization feature - twotwo

• 0.1.2 - Add slice support

1.6 Contributing

The project is hosted on GitHub:

https://github.com/derek73/python-nameparser

Find more information about running tests and contributing to the project in the project’s contribution guide.

https://github.com/derek73/python-nameparser/blob/master/CONTRIBUTING.md

CHAPTER 2

Indices and tables

• genindex

• modindex

• search

GitHub Project: https://github.com/derek73/python-nameparser

Python Module Index

• nameparser.config

• nameparser.config.capitalization

• nameparser.config.conjunctions

• nameparser.config.prefixes

• nameparser.config.regexes

• nameparser.config.suffixes

• nameparser.config.titles

• nameparser.parser
