ONOMASTICS TO REVEAL
> GENDER GAP / GENDER PAY GAP
> DIASPORAS AND OTHER DIVERSITY ANALYTICS
Alvaro LIMA, Boston Redevelopment Authority
Elian CARSENAT, NamSor Applied Onomastics
2015-08
NamSor Gender API extracts the likely gender of
personal names
(ex. Andrea Rossini : Male; Andrea Parker : Female)
The gender gap in City of Boston employees
Original file : Employee_Earnings_Report_2012.xlsx
Genderized : output_employees_genderized.xlsx
3 simple steps to view the gender gap
Read examples of industry wide studies (airline pilots, Hollywood, start-ups, ...)
http://gendergapgrader.com/
Employees List City
Example – Boston City Employees
RapidMiner +
NamSor API
Employee_Earnings_Report_2012.xls
(no gender information in original doc)
genderized document, with detail to make
gender gap / gender pay gap statistics
Both First Name and Last Names
are used to infer gender:
- Andrea Rossini -> Male
- Andrea Parker -> Female
- O. Sokolova -> Female
- N.S. -> Unknown
- Olga S. -> Female
Gender Gap By Department
[Figure: Boston City gender gap by department, for departments having 50+ employees. 100% stacked bars showing %M, %F and %U per department, with departments ordered between "More Male" and "More Female".]
Departments shown:
- Transportation-Parking Clerk
- Law Department
- ASD Human Resources
- Boston Public Schools
- Boston Public Library
- State Boston Retirement Syst
- Dept of Voter Mobilization
- Neighborhood Development
- Boston City Council
- Elderly Commission
- Assessing Department
- Boston Cntr - Youth & Families
- Transportation Department
- Dpt of Innovation & Technology
- Inspectional Services Dept
- Boston Police Department
- Property Management
- Parks Department
- Public Works Department
- Boston Fire Department
Source: output_employees_genderized.xlsx
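The per-department shares behind such a chart can be computed in a few lines once each employee row carries an inferred gender. The following is a minimal sketch, not the RapidMiner process used for the slides: the function name, the (department, gender) row layout and the sample departments are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def gender_share_by_department(rows, min_employees=50):
    """Return {department: {"M": share, "F": share, "U": share}}.

    rows is an iterable of (department, gender) pairs, gender in {"M", "F", "U"}.
    Departments with fewer than min_employees rows are left out.
    """
    counts = defaultdict(Counter)
    for department, gender in rows:
        counts[department][gender] += 1

    shares = {}
    for department, by_gender in counts.items():
        total = sum(by_gender.values())
        if total >= min_employees:
            shares[department] = {g: by_gender[g] / total for g in ("M", "F", "U")}
    return shares


# Made-up rows for illustration only (not taken from the Boston file).
sample_rows = (
    [("Example Dept A", "M")] * 45 + [("Example Dept A", "F")] * 12 + [("Example Dept A", "U")] * 3
    + [("Example Dept B", "M")] * 20 + [("Example Dept B", "F")] * 58 + [("Example Dept B", "U")] * 2
    + [("Example Dept C", "F")] * 10  # below the 50-employee cut-off, so excluded
)

shares = gender_share_by_department(sample_rows)
# Print departments from the most female to the most male.
for department, share in sorted(shares.items(), key=lambda kv: kv[1]["F"], reverse=True):
    print(department, {g: f"{s:.0%}" for g, s in share.items()})
```

Keeping "U" as its own share, rather than dropping unknown rows, makes it visible when a department has many names that could not be genderized.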
Gender Gap By Earnings Range
[Figure: Boston City gender gap, count of employees with total earnings > $22k, $59k, $162k. 100% stacked columns, Female vs Male, for the earnings ranges 22026+, 59874+ and 162754+. Data labels on the chart: 2344, 5488, 303, 3390, 4863, 15.]
Source: output_employees_genderized.xlsx
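A count of this kind can be sketched as follows. This reads the caption's ">" literally, so an employee earning above two thresholds is counted under both; whether the original chart used cumulative counts or separate bands is an assumption here, as are the function name, the row layout and the sample rows.

```python
from collections import Counter

# Thresholds as printed on the chart axis.
THRESHOLDS = (22026, 59874, 162754)


def count_above_thresholds(rows, thresholds=THRESHOLDS):
    """Return {threshold: Counter({gender: n})} for earnings strictly above each threshold.

    rows is an iterable of (gender, total_earnings) pairs.
    """
    result = {t: Counter() for t in thresholds}
    for gender, earnings in rows:
        for t in thresholds:
            if earnings > t:
                result[t][gender] += 1
    return result


# Made-up rows for illustration only.
sample_rows = [
    ("F", 30000.0), ("F", 61000.0),
    ("M", 25000.0), ("M", 70000.0), ("M", 170000.0),
    ("U", 40000.0),
]

by_threshold = count_above_thresholds(sample_rows)
for threshold, by_gender in by_threshold.items():
    print(f"{threshold}+", dict(by_gender))
```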
Gender Pay Gap – average earnings
[Figure: Boston City gender pay gap, average earnings by Dept by Title, Female Avg vs Male Avg, on a scale from 0 to 140,000.]
(NB: Median should be used instead)
Titles shown:
- Boston Public Schools: Teacher
- Boston Public Schools: Librarian
- Boston Police Department: Police Officer
- Boston Police Department: Police Detective
- Transportation Department: Parking Meter Supervisor
- Inspectional Services Dept: Housing Inspector
- Dpt of Innovation & Technology: Sr Data Proc Sys Anl
- Dpt of Innovation & Technology: Data Proc Sys Analyst
Source: output_employees_genderized.xlsx
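The note that the median should be used instead of the average can be illustrated with a toy example: a single very high earner moves the mean a lot and the median very little. The sketch below assumes the gap is expressed as (male - female) / male, which is one common convention and not necessarily the one behind the chart; the earnings figures are made up.

```python
from statistics import mean, median


def pay_gap(female_earnings, male_earnings, stat):
    """Gap as a fraction of the male figure: (male - female) / male."""
    male = stat(male_earnings)
    female = stat(female_earnings)
    return (male - female) / male


# Made-up earnings for one job title; the male list contains one outlier.
female = [50000, 52000, 54000]
male = [50000, 52000, 54000, 200000]

mean_gap = pay_gap(female, male, mean)      # driven by the 200000 outlier
median_gap = pay_gap(female, male, median)  # barely affected by it

print(f"gap on averages: {mean_gap:.1%}")
print(f"gap on medians:  {median_gap:.1%}")
```

With small groups per department and title, a few overtime-heavy or part-year records can dominate an average, which is why the median is the safer summary.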
Is it accurate? Testing with Boston Voters List
NamSor API input: names from the Boston Voters List
NamSor API output: inferred gender (B)
Declared gender in the Voters List: (A)
How often does A = B?
Rows: declared gender (A). Columns: gender inferred by NamSor API (B).

Declared (A)   Female     Male   Unknown   Grand Total   Precision   Recall
F              194442     9625      1884        205951       95.3%    99.1%
M                6249   165069      1779        173097       96.4%    99.0%
(blank)          3339     3688       355          7382
Grand Total    204030   178382      4018        386430       95.8%    99.0%
Precision > 95%
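The slide does not define its two metrics. One reading that reproduces every published percentage is: precision = correctly inferred / (inferred Female + inferred Male), and recall = (inferred Female + inferred Male) / all rows, each taken per declared gender. The sketch below checks that reading against the counts in the table; the definitions are an assumption inferred from the numbers, and the names used are illustrative.

```python
# Counts copied from the table: declared gender (A) -> NamSor output (B).
counts = {
    "F": {"Female": 194442, "Male": 9625, "Unknown": 1884},
    "M": {"Female": 6249, "Male": 165069, "Unknown": 1779},
}
MATCH = {"F": "Female", "M": "Male"}  # which output agrees with which declared value


def precision(declared):
    """Share of classified names (Female or Male) whose output matches the declaration."""
    row = counts[declared]
    return row[MATCH[declared]] / (row["Female"] + row["Male"])


def recall(declared):
    """Share of names for which the API returned Female or Male rather than Unknown."""
    row = counts[declared]
    return (row["Female"] + row["Male"]) / sum(row.values())


# Overall figures, leaving out rows whose declared gender is blank.
correct = sum(counts[d][MATCH[d]] for d in counts)
classified = sum(counts[d]["Female"] + counts[d]["Male"] for d in counts)
total = sum(sum(row.values()) for row in counts.values())
overall_precision = correct / classified
overall_recall = classified / total

for d in counts:
    print(d, f"precision {precision(d):.1%}", f"recall {recall(d):.1%}")
print("all", f"precision {overall_precision:.1%}", f"recall {overall_recall:.1%}")
```

Note that "recall" in this reading measures coverage (how often a gender is returned at all), which differs from the usual per-class definition.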
The gender gap is measured accurately
Using declared gender: Voters List Gender Gap (actual): M 45%, F 53%, U 2%
Estimating gender using names: Voters List Gender Gap (inferred): Male 47%, Female 53%, Unknown 0%
Error rate ranges from 0% to 2% and is usually <1% (NB: it can vary with demographics, e.g. gender can’t be inferred from Chinese or Korean names)
NamSor Origin API extracts the likely country/culture of origin of personal names
Diversity of origin in City of Boston employees
Original file : Employee_Earnings_Report_2012.xlsx
Origins : output_employees_origined.xlsx
Diversity of origin in City of Boston voters
Original file : Voters List.txt
Origins : Voters_Origined.xlsx
<CAVEAT>
So far, at NamSor, we have worked mostly on international projects related to highly qualified migrants (company directors, scientists, inventors, ...)
In the context of a US City, we believe our unique onomastics technology (using names) should be combined with other more traditional approaches to geo-demographics (census, qualitative surveys ...)
We try to innovate and bring a different view, but we also hope to work hand in hand with you to reconcile the data coming from traditional geo-demographics (such as gender, race and ethnicity) with the fine-grained information that names can bring in (such as gender, linguistic root, likely country/region of origin, ...)
</CAVEAT>
Boston City Employees Diversity of origin/culture by Department
35% of Employees of
the Boston Fire Dept
have an Irish Name
15% of Employees of
Boston Public Schools
have an Irish Name
[Figure: Boston City Employees – by Dept and Origin. 100% stacked bars per department, with callouts on Irish names: Fire Dept 35% Irish, Public Schools 15% Irish.]
Boston Voters List Diversity varies district by district
[Figure: Boston City Geo-demographics by ZipCode by likely Origin. 100% stacked bars, one per ZIP code.]
ZIP codes shown: 02124, 02130, 02135, 02127, 02136, 02131, 02132, 02125, 02118, 02128, 02119, 02116, 02121, 02122, 02129, 02126, 02115, 02134, 02215, 02114, 02120, 02113, 02111, 02108, 02109, 02210, 02110, 02467, 02199, 02163, 02169, 02445, 02446, 02026
Likely origins in the legend: Russian Federation, Ghana, China, Austria, Czech Republic, Uganda, Benin, Belgium, Netherlands, Greece, Viet Nam, Kenya, Sweden, Switzerland, Portugal, South Africa, Germany, Italy, Spain, France, Ireland, British
Source: Voters_Origined.xlsx
Boston Voters List Ex. Portuguese, Spanish, Italian Names
ZIP 02125: About 21% of people with a Portuguese name live in ZIP 02125, whereas only 6.6% of people with a Spanish name and 3.8% of people with an Italian name live in that district.
ZIP 02128: Only 5.8% of people with a Portuguese name live in ZIP 02128, whereas 10.9% of people with a Spanish name and 15.2% of people with an Italian name live in that district.
Source: Voters_Origined.xlsx
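The percentages above are shares of each name origin, not shares of each ZIP code's population: the denominator is everyone with, say, a Portuguese name across the whole list. A minimal sketch of that calculation, with an assumed (origin, zip_code) row layout and made-up rows:

```python
from collections import Counter, defaultdict


def zip_distribution_by_origin(rows):
    """For each name origin, return the fraction of its voters living in each ZIP code.

    rows is an iterable of (origin, zip_code) pairs; fractions sum to 1 per origin.
    """
    counts = defaultdict(Counter)
    for origin, zip_code in rows:
        counts[origin][zip_code] += 1

    distribution = {}
    for origin, by_zip in counts.items():
        total = sum(by_zip.values())
        distribution[origin] = {z: n / total for z, n in by_zip.items()}
    return distribution


# Made-up rows for illustration only (not the Voters List figures).
sample_rows = (
    [("Portuguese", "02125")] * 2 + [("Portuguese", "02128")] + [("Portuguese", "02124")]
    + [("Italian", "02128")] * 3 + [("Italian", "02125")]
)

distribution = zip_distribution_by_origin(sample_rows)
for origin, by_zip in distribution.items():
    print(origin, {z: f"{share:.0%}" for z, share in sorted(by_zip.items())})
```

Normalizing per origin makes communities of very different sizes comparable, at the cost of hiding how large each community is in absolute terms.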
Boston Voters List: Diversity by occupation
Interesting ones: Firefighter, CNA, Cleaner, Waiter, Economist, ...
[Figure: 100% stacked bars by occupation.]
Occupations shown: UNKNOWN, STUDENT, AT HOME, ATTORNEY, CLERK, ENGINEER, PHYSICIAN, ADMIN ASST, HOMEMAKER, LABORER, SECRETARY, SOCIAL WORKER, PROFESSOR, ARCHITECT, TECHNICIAN, SELF EMPLOYED, CARPENTER, REAL ESTATE, EDUCATOR, BANKER, COORDINATOR, CASHIER, MUSICIAN, LIBRARIAN, (blank), ELECTRICIAN, SECURITY, RN, EDUCATION, THERAPIST, CUSTODIAN, REALTOR, HOUSEKEEPER, RECEPTIONIST, SUPERVISOR, CONSTRUCTION, BUS DRIVER, OPERATOR, RESEARCH, PAINTER, HEALTH CARE, CLEANER, PLUMBER, HEALTH CARE, FOOD SERVICE, ADVERTISING, AUDITOR, BUSINESS OWNER
Source: Voters_Origined.xlsx
Boston Voters List by Occupation and by Origin
[Figure: Boston City Voters - occupation breakdown by onomastic class. 100% stacked columns showing OTHER, TEACHER, STUDENT and RETIRED for each class: British, Ireland, France, Italy, Spain, Portugal, Russian, China, India, Pakistan, Somalia, Ukraine, Thailand, Japan.]
Source: Voters_Origined.xlsx
US Census vs NamSor geo-demographics
In July 2015, the US Government announced new
rules that will require all cities and towns receiving
federal housing funds to assess patterns of
segregation.
The NY Times has published interactive maps of
Boston geo-demographics, which we can compare
with the information inferred by NamSor
US Census Race Map of Boston
http://www.nytimes.com/interactive/2015/07/08/us/census-race-map.html
Using Voters List + NamSor is finer-grained
US Census: 1 pixel = 40 inhabitants
Voters List: 1 pixel = 1 voter
Voters List: we can zoom further into 051200
[Figure: map of 051200, US Census compared with Voters List + NamSor.]
Voters List: breaking down ‘White’
In 051200, ‘White’ is dominant
‘White’ can be broken down further
(Southern, Northern, Eastern, Western Europe)
Voters List: distinguishing Portuguese, Spanish, Italian
Current limitation (we’re working on it):
NamSor doesn’t see ‘Black’
Census block 101001 is 83% composed of Black inhabitants, but NamSor recognizes their names as European. That can be resolved by further work on sociolinguistics and on the relations between firstName and lastName.
Analysing other databases would provide valuable information about:
Local Businesses in Boston
Start-ups / VC / Business Angels in Boston
Students, Professors & Researchers in Boston
Tourism in Boston
...
NamSor Pricing
NamSor API price is based on volume, typically
130€ / 10,000 data rows processed
plus cost of acquisition of source data
We can support our clients with training, data sourcing, data filtering, methodology, mapping
950€ plus Tax (Time & Material, daily rate)
Price range for a typical engagement
4k€ to 45k€, on average 15k€
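As a rough worked example of the volume pricing, the sketch below applies the typical rate pro rata to the 386,430 rows of the Voters List test. Linear proration, the exclusion of tax and of source-data acquisition costs, and the function name are all assumptions; actual quotes may differ.

```python
API_PRICE_EUR_PER_10K_ROWS = 130  # "typically 130€ / 10,000 data rows processed"
DAILY_RATE_EUR = 950              # support daily rate, before tax


def estimate_cost_eur(rows, support_days=0):
    """Rough estimate: pro-rata API cost plus optional support days (tax excluded)."""
    api_cost = rows * API_PRICE_EUR_PER_10K_ROWS / 10_000
    return api_cost + support_days * DAILY_RATE_EUR


voters_rows = 386_430  # grand total of the Voters List accuracy table
api_only = estimate_cost_eur(voters_rows)
with_support = estimate_cost_eur(voters_rows, support_days=5)

print(f"API only: {api_only:,.2f} EUR")
print(f"API plus 5 support days: {with_support:,.2f} EUR")
```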
Conclusion
As a pilot, we analysed two databases (Employees,
Voters List) and discovered interesting patterns in
gender diversity as well as diversity of
culture/origin in Boston City voters and employees.
Other projects could use such data, in order to
make Boston City even more inclusive to men and
women of all origins.
The technology is simple to use and affordable.
Thank you!
http://fdimagnet.com/
http://namsor.com/
July 2013, Embassy of Lithuania in Paris
+33 6 52 77 99 07
Twitter @NamSor_com