IOP ’06 Open Source Intelligence Lesson Learned
Almaden Research Center
© 2006 IBM Corporation
Issues in using open source for intelligence
Growth and complexity of heterogeneous content
Not all open source data is equal – Quantitative vs. Qualitative
Requirements of Ecoinformatics Architectures
Digital content is growing at a dramatic rate
10^24 bytes = 1 trillion terabytes of data, which is equivalent to all the information consumed visually by all humans in a year
[Chart; x-axis: Years]
Source: IBM 2005 GTO
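As a quick check of the quantity quoted on this slide, the sketch below reads it as 10^24 bytes and assumes decimal (SI) units, in which one terabyte is 10^12 bytes. The unit and the constant names are assumptions made for illustration; the slide itself does not state them.

```python
# Assumption: decimal (SI) units, so 1 terabyte = 10**12 bytes.
BYTES_PER_TERABYTE = 10**12
ONE_TRILLION = 10**12

# One trillion terabytes, expressed in bytes.
total_bytes = ONE_TRILLION * BYTES_PER_TERABYTE

# Compare against the 10^24 figure quoted on the slide.
matches_slide = total_bytes == 10**24
print(matches_slide)
```

Under binary units (1 TiB = 2^40 bytes) the product would be somewhat larger, so the equality holds only for the decimal reading.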
The scale of open source data and its heterogeneous form increase the complexity of extracting intelligence
[Chart: categories labelled Storage online, Medical data stored, Personal multimedia, Surveillance bytes, Photos multimedia; scale marks 10^9, 10^12, 10^15, 10^21, 10^24, 10^27; further labels: Scalable, Heterogeneity, Intelligence, Structured data, Free-form text]
Source: IBM 2005 GTO
Open Source Intelligence from the periphery requires an understanding of its topology, including strengths and weaknesses
[Diagram: sources in the periphery; axis labels: Qualitative, Quantitative]
Industry Publication
Company Internal Content
Company Publication
Industry Journals
Conference Proceedings
NGO Publications
Website affiliated with an organization
User Groups / Forums
News Letters
Content Aggregators
News & Press Releases
Legal Filings
Government Publications
Blogs / Weblogs
Non-affiliated Websites
These are authoritative sources, where data is trusted and is defended
These are credentialed opinions; the source is known and can be weighted
Open opinion; it is impossible to verify the authority of the source
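The three source classes above suggest weighting content by how far its source can be trusted. The sketch below is one possible way to do that, not the presenters' method: the numeric weights, the `Document` type, the `weighted_score` function, and the assignment of the two example sources to tiers are all hypothetical, since the slide gives no numbers and the source-to-tier mapping is not recoverable from the text.

```python
from dataclasses import dataclass

# Hypothetical weights: the slide says credentialed opinions "can be weighted"
# but gives no values, so these numbers are placeholders.
TIER_WEIGHTS = {"authoritative": 1.0, "credentialed": 0.6, "open": 0.2}


@dataclass
class Document:
    source: str       # name of the source type
    tier: str         # one of the keys in TIER_WEIGHTS
    relevance: float  # topical relevance in [0, 1] from any upstream scorer


def weighted_score(doc: Document) -> float:
    """Scale topical relevance by the trust placed in the source tier."""
    return doc.relevance * TIER_WEIGHTS[doc.tier]


# Assumed tier assignments, for illustration only.
docs = [
    Document("Blogs / Weblogs", "open", 0.9),
    Document("Government Publications", "authoritative", 0.5),
]

# Highest weighted score first.
ranked = sorted(docs, key=weighted_score, reverse=True)
for doc in ranked:
    print(doc.source, round(weighted_score(doc), 2))
```

With these placeholder weights, a moderately relevant document from a trusted source outranks a highly relevant one from an unverifiable source. Whether that trade-off is right depends on the analysis at hand.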
Ecoinformatics Architectures need to be multi-layered
[Architecture diagram; labels recovered from the figure, grouped where the grouping is clear]
Customer Applications: Performance Management, Drug Research, Business Insights Workbench
Applications: Network Associations, Search, Topic Tracking, Buzz Analysis
Cross-Page Annotators: Classification, Clustering, Communities, Ranking
Per-Page Annotators: Auto Entity Spotters, Auto Geography Spotter, Porn & Dup Detection, Customer Taxonomy Spotter
Additional spotters: Date Spotters, Language Spotters, Source Spotters
Other components: Natural Clustering, Affinity Analysis, Snippet Analysis, Trending, Parsing/Tokenizing, Annotation, Searching, Index, Store
Data Acquisition (Un-Structured Data, Structured Data): World Wide Web, Blogs, Newspapers, Licensed Feeds, Data Bases, Intranet Data, Taxonomies, Commercial Data Bases
Platforms: WebFountain, Business Insights Workbench, WS OmniFind II
Axis labels: Relevancy, Volume; throughput labels: 10's, 100's, 1000's (pages/second)
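The layering on this slide separates annotators that look at one page at a time from annotators that need the whole collection. The sketch below illustrates that split with toy stand-ins. The function names (`spot_years`, `spot_language`, `rank_years`, `run_pipeline`) and the dict-based page representation are assumptions for illustration and do not reflect WebFountain's actual interfaces.

```python
from collections import Counter
from typing import Callable

Page = dict  # a page is a plain dict holding "text" plus annotation keys


def spot_years(page: Page) -> Page:
    """Toy per-page annotator: record 4-digit tokens as candidate years."""
    tokens = page["text"].split()
    page["years"] = [t for t in tokens if t.isdigit() and len(t) == 4]
    return page


def spot_language(page: Page) -> Page:
    """Toy per-page annotator: crude English check via common words."""
    words = set(page["text"].lower().split())
    page["lang"] = "en" if words & {"the", "and", "of"} else "unknown"
    return page


# Per-page annotators are independent of each other and of other pages.
PER_PAGE: list[Callable[[Page], Page]] = [spot_years, spot_language]


def rank_years(pages: list[Page]) -> list[tuple[str, int]]:
    """Toy cross-page annotator: counts years across the whole collection."""
    counts = Counter(y for p in pages for y in p["years"])
    return counts.most_common()


def run_pipeline(texts: list[str]):
    pages = [{"text": t} for t in texts]
    for annotate in PER_PAGE:          # layer 1: one page at a time
        pages = [annotate(p) for p in pages]
    return pages, rank_years(pages)    # layer 2: needs every page


pages, ranking = run_pipeline([
    "the report of 2005 cites 2005 data",
    "rapport 2004",
])
print(ranking)
```

Because per-page annotators touch only one page, they can be run in parallel at high volume, while cross-page steps must wait for the annotated collection. That is one plausible reading of the different throughput labels in the diagram.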
Finding intelligence can require different views of the same information
Open Source Trend on Web
[Chart: # of web pages (000) by month, Jan to Dec 2005; axis 0 to 70; annotation: Some event happened in August]
[Chart: # of web pages (000) by year, 2001 to 2005; axis 0 to 140]
[Chart: % of OSI web documents by person (Congressman Rob Simmons, Douglas Rushkoff, Eliot Jardines, Major General Patrick Cammaert, Mr Arno Reuser, Robert Steele); axis 0.0% to 4.5%; annotation: One dominant voice]
Context
Robert Steele: 6,440,000
"Robert Steele": 170,000
"Robert Steele" and Open Source Intelligence: 2,400
"Robert David Steele" and "Open Source Intelligence" within 5 words: 73
Network of Conference Attendees to auto-spotted Companies and Universities
In this network view we don’t care about association with “Open Source Intelligence” but with companies and universities
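The narrowest query on this slide requires two phrases to appear within 5 words of each other. The sketch below shows one way such a proximity test could work. It assumes "within n words" means at most n words lie between the two phrases; search engines differ on this definition, and the function names are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_positions(words: list[str], phrase: str) -> list[tuple[int, int]]:
    """Return (start, end) token indices of each phrase match; end is exclusive."""
    p = tokens(phrase)
    return [(i, i + len(p)) for i in range(len(words) - len(p) + 1)
            if words[i:i + len(p)] == p]


def within_n_words(text: str, phrase_a: str, phrase_b: str, n: int = 5) -> bool:
    """True if some occurrence of each phrase is separated by at most n words."""
    words = tokens(text)
    for a_start, a_end in phrase_positions(words, phrase_a):
        for b_start, b_end in phrase_positions(words, phrase_b):
            # Words between the phrases, whichever comes first;
            # a negative gap means the matches overlap and is ignored.
            gap = b_start - a_end if b_start >= a_end else a_start - b_end
            if 0 <= gap <= n:
                return True
    return False


near = "Robert David Steele has written widely on Open Source Intelligence"
far = ("Robert David Steele spoke. Many sessions later in the week, "
       "another panel covered Open Source Intelligence")

near_hit = within_n_words(near, "Robert David Steele", "Open Source Intelligence")
far_hit = within_n_words(far, "Robert David Steele", "Open Source Intelligence")
print(near_hit, far_hit)
```

Both example sentences contain both phrases, so a plain AND query would match each. Only the proximity constraint separates them, which is the kind of context narrowing the hit counts above illustrate.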
Conclusions on Open Source Intelligence
Computers don’t create intelligence, people do – computers enable smart people
Not all open source content is equal – know the sources
Not everything you see is right – it’s all about the CONTEXT
Ecoinformatics architecture supports:
- Large-scale analytics of open source content
- Integration of content other than open source
- Powerful text analytic tools to support analysis of on-topic stores