8/12/2019 Mining Unstructured Data_ Practical Applications Presentation
New York London
Problem 1
Images: Ambro / FreeDigitalPhotos.net
How do lawyers scan, file, store & share clients' case documents efficiently?
Images: slambo_42@flickr, AnotoAB@flickr
EHR
EMR
PHR
How do doctors, patients & researchers distribute & share medical records efficiently?
Diagram labels: Foreign Financial Institution with IRS agreement; annual report; 30% withholding tax; … ownership entities; 30% withholding tax; Custodian Bank without IRS agreement
The FATCA Legislation takes effect 1 January 2013
Problem 3
How can a financial institution find U.S. citizens in masses of paperwork efficiently?
How much time do we actually spend on:
Searching, gathering info
Writing emails
Creating docs
Analyzing info
3%
introduction
unstructured data: real-life problems
unstructured data & text analytics
metadata in legal domain
healthcare records issues
compliance in finance
conclusions
Videos
Emails
Literature
Audio
News
Images
Social media
Data
Text Mining
Natural Language Processing
unstructured data
Opinion Mining
Business Intelligence
Document Organization
Data Extraction
Search
Machine Learning
Text Processing
Statistics
Linguistics
What can one mine from unstructured data?
sentiment
keywords, tags
genre
categories, taxonomy terms
entities: names, patterns, biochemical entities
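As a rough illustration of the simplest item on this list, keywords, the sketch below ranks words by frequency after dropping stopwords. It is a minimal sketch, not the method behind any tool named in this deck: the `top_keywords` function and its tiny stopword list are hypothetical, and practical keyword extraction also weighs phrases, document frequency and position.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "with", "by", "as", "at", "this", "that"}


def top_keywords(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stopword words, ties broken alphabetically."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


doc = ("The lawyer scanned the contract. The contract names the client, "
       "and the client signed the contract in New York.")
print(top_keywords(doc))  # ['contract', 'client', 'lawyer']
```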
People: U.S. politicians
News: news about U.S. politicians
Structured biological data
Unique identifiers
Literature references
Experts' annotation (free text)
Structured & unstructured data interplay
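One simple form of this interplay is using a structured list (here, people) to find the unstructured texts (news) that mention them. The sketch assumes a naive gazetteer lookup: exact, case-insensitive, whole-word matching with no aliases or disambiguation. The names and articles are made up, and `mentions` is a hypothetical helper.

```python
import re


def mentions(articles: dict[str, str], people: list[str]) -> dict[str, list[str]]:
    """Map each article id to the people from a structured list that it mentions."""
    # One whole-word, case-insensitive pattern per person in the structured list.
    patterns = {p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in people}
    return {
        art_id: [p for p, pat in patterns.items() if pat.search(text)]
        for art_id, text in articles.items()
    }


# Hypothetical data: structured names plus unstructured news text.
politicians = ["Jane Roe", "John Doe"]
news = {
    "a1": "Senator Jane Roe met voters in Ohio.",
    "a2": "Markets rallied on Tuesday.",
    "a3": "John Doe and Jane Roe debated the budget.",
}
print(mentions(news, politicians))
```

Articles with a non-empty list are the "news about U.S. politicians"; the rest can be set aside.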
scan
OCR
metadata
DMS
Assigning metadata (approximation)
15 docs per day × 3 min per doc = 0.75 h per day
240 working days per year
$200 hourly charge
0.75 h per day × 240 days × $200 per hour = $36,000 per year per lawyer
Keyword extraction: 0.0027 min per doc
15 docs × 240 days × 0.0027 min ≈ 10 min for a year's worth of docs
Image: jacockshaw@flickr
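The approximation can be reproduced directly. Every constant below comes from the slide; the keyword-extraction total works out to 9.72 min, which the slide rounds to 10 min.

```python
# All figures are taken from the slide above.
DOCS_PER_DAY = 15
MANUAL_MIN_PER_DOC = 3
WORKING_DAYS = 240
HOURLY_RATE_USD = 200
AUTO_MIN_PER_DOC = 0.0027  # keyword extraction time per document


def manual_cost_per_year() -> float:
    """Yearly cost of assigning metadata by hand, for one lawyer."""
    hours_per_day = DOCS_PER_DAY * MANUAL_MIN_PER_DOC / 60  # 0.75 h
    return hours_per_day * WORKING_DAYS * HOURLY_RATE_USD


def automatic_minutes_per_year() -> float:
    """Yearly processing time if keywords are extracted automatically."""
    return DOCS_PER_DAY * WORKING_DAYS * AUTO_MIN_PER_DOC


print(manual_cost_per_year())                  # 36000.0
print(round(automatic_minutes_per_year(), 2))  # 9.72
```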
Integrating metadata extraction with scanning
http://www.youtube.com/watch?
metadata
DMS
Efficient (legal) document processing pipeline
keywords, tags
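A minimal sketch of such a pipeline, under loud assumptions: scanning and OCR are not shown (documents arrive as text), metadata extraction is stood in for by matching a small controlled vocabulary, and the DMS is an in-memory tag index. `Document`, `extract_metadata`, `InMemoryDMS` and `process` are hypothetical names, not a real product's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str  # assumed to be OCR output already; scanning/OCR is not shown
    metadata: dict = field(default_factory=dict)


def extract_metadata(doc: Document, vocabulary: set[str]) -> Document:
    """Hypothetical metadata step: tag a document with the vocabulary terms it contains."""
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.metadata["tags"] = sorted(words & vocabulary)
    return doc


class InMemoryDMS:
    """Hypothetical stand-in for a document management system, indexed by tag."""

    def __init__(self) -> None:
        self.by_tag: dict[str, list[str]] = {}

    def store(self, doc: Document) -> None:
        for tag in doc.metadata.get("tags", []):
            self.by_tag.setdefault(tag, []).append(doc.doc_id)


def process(docs: list[Document], vocabulary: set[str], dms: InMemoryDMS) -> None:
    """Run each document through metadata extraction, then file it in the DMS."""
    for doc in docs:
        dms.store(extract_metadata(doc, vocabulary))


vocab = {"contract", "invoice", "lease"}
dms = InMemoryDMS()
process([Document("d1", "Signed lease for office space."),
         Document("d2", "Invoice and contract attached.")], vocab, dms)
print(dms.by_tag)
```

Keeping metadata extraction as its own stage means a real keyword extractor could replace `extract_metadata` without touching the storage side.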
EMR
EHR
PHR
National Alliance for Health Information Technology (NAHIT) definitions (discontinued?)
1. Name, Birth date, Blood type
2. Emergency contact(s)
3. Primary caregiver
4. Medicines, dosages, and how long taken
5. Allergies/allergic reactions
6. Date of last physical
7. Dates/results of tests and screenings
8. Major illnesses/surgeries and their dates
9. Chronic diseases
10. Family illness history
11. …
http://www.nlm.nih.gov/medlineplus/magazine/
PHI de-identification process
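The de-identification step can be sketched as dropping identifying fields from a PHR-style record and masking the patient's name in free-text notes. The field names and the choice of which fields count as identifiers are assumptions for illustration only; real de-identification standards cover many more identifier types.

```python
import re

# Assumption for illustration: which record fields count as direct identifiers.
IDENTIFYING_FIELDS = {"name", "birth_date", "emergency_contacts", "primary_caregiver"}


def deidentify(record: dict) -> dict:
    """Return a copy of a PHR-style record without the assumed identifying fields."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    # Identifiers also leak into free text, so mask the patient's name there too.
    name = record.get("name")
    if name and "notes" in clean:
        clean["notes"] = re.sub(re.escape(name), "[NAME]", clean["notes"],
                                flags=re.IGNORECASE)
    return clean


record = {
    "name": "Mary Major",
    "birth_date": "1970-01-01",
    "blood_type": "O+",
    "allergies": ["penicillin"],
    "notes": "Mary Major reports no new symptoms.",
}
print(deidentify(record))
```

The original record is left untouched, so the identified and de-identified versions can be kept apart (for example, clinical use versus research use).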
Medical researchers use patient records for discovery
www.hcpro.com
siliconangle.com/
Thanks for discussions:
Nigam Shah, Stanford
Eneida Mendonca, U Wisconsin, Madison
Irena Spasic, Cardiff Uni
FATCA COMPLIANCE STEP 1: Detect U.S. citizenship indicators
Recommended Solution from FATCA Legislation:
Query an electronic database using standard queries in programming languages
Adopt similar approaches as used for the Anti-Money-Laundering and Know-Your-Customer requirements
Note that information, data, or files are not electronically searchable if they are stored as images
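Because image-only files are not searchable, a workflow along these lines would OCR documents first and then search the text. The sketch below scans OCR'd text for a few U.S. indicators with regular expressions; the three patterns and the `find_us_indicators` helper are hypothetical illustrations, not the indicator list defined by the FATCA legislation.

```python
import re

# Hypothetical patterns for three kinds of U.S. indicator (illustration only).
INDICATOR_PATTERNS = {
    "us_birthplace": re.compile(
        r"place of birth:\s*(?:[A-Za-z .]+,\s*)?(?:USA\b|U\.S\.A\.|United States\b)",
        re.IGNORECASE),
    "us_phone": re.compile(r"\+1[\s-]?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}"),
    # Two-letter state code followed by a ZIP or ZIP+4 code, e.g. "NY 10001".
    "us_state_zip": re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?\b"),
}


def find_us_indicators(text: str) -> dict[str, list[str]]:
    """Return each indicator type found in OCR'd text, with the matching snippets."""
    hits: dict[str, list[str]] = {}
    for label, pattern in INDICATOR_PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(text)]
        if found:
            hits[label] = found
    return hits


page = ("Client: Jane Roe. Place of birth: Boston, USA. "
        "Phone: +1 212-555-0100. Address: 5 Main St, New York, NY 10001.")
print(find_us_indicators(page))
```

Any hit would only flag the client for review; per the next step, the client is then contacted for additional information or a waiver.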
Images: walmink, thomwatson@flickr
FATCA COMPLIANCE STEP 2: Contact client for additional info or a waiver
Actual Solution for the FATCA Legislation:
OCR
link analysis
entity extraction
analysis
gather the trail: client's data
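The link-analysis step can be pictured as connecting documents that share extracted entities (a name, an account number, an address) so that a client's trail can be gathered in one place. This sketch assumes entity extraction has already produced a set of entity strings per document and uses union-find to compute connected components; `link_documents` and the sample data are hypothetical.

```python
from collections import defaultdict


def link_documents(doc_entities: dict[str, set[str]]) -> list[set[str]]:
    """Group documents connected by chains of shared entities (union-find)."""
    parent = {doc: doc for doc in doc_entities}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every document to the first document seen with the same entity.
    first_seen: dict[str, str] = {}
    for doc, entities in doc_entities.items():
        for entity in entities:
            if entity in first_seen:
                union(first_seen[entity], doc)
            else:
                first_seen[entity] = doc

    groups: dict[str, set[str]] = defaultdict(set)
    for doc in doc_entities:
        groups[find(doc)].add(doc)
    return list(groups.values())


extracted = {
    "account_form": {"Jane Roe", "ACC-001"},
    "wire_transfer": {"ACC-001", "5 Main St, New York"},
    "letter": {"5 Main St, New York"},
    "unrelated": {"John Doe"},
}
print(link_documents(extracted))
```

Here the letter shares nothing with the account form directly, yet both land in the same group through the wire transfer, which is the kind of indirect trail link analysis is meant to surface.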
Efficient FATCA Compliance
Alyona Medelyan, PhD (@zelandiya): Natural Language Processing, Text Mining, Wikipedia Mining, Machine Learning
Anna Divoli, PhD (@annadivoli): Biomedical Text Mining, Search User Interfaces, Human Factors, Knowledge Discovery
Try out text analytics provided by the Pingar API! Online demo: apidemo.pingar.com
Free Sandbox account: pingar.com/get-the-api