Almaden Research Center © 2006 IBM Corporation IOP ’06 Open Source Intelligence Lesson Learned



Issues in using open source for intelligence

Growth and complexity of heterogeneous content

Not all open source data is equal – Quantitative vs. Qualitative

Requirements of Ecoinformatics Architectures


Digital content is growing at a dramatic rate

[Figure: digital content plotted by year. Source: IBM 2005 GTO]

10^24 bytes = 1 trillion terabytes of data, equivalent to all the information consumed visually by all humans in a year
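As a quick check of the slide's arithmetic, the sketch below converts 10^24 bytes into terabytes. It assumes decimal (SI) units, where one terabyte is 10^12 bytes; with binary tebibytes (2^40 bytes) the same quantity comes to roughly 0.91 trillion.

```python
# Assumption: decimal (SI) units, so 1 terabyte = 10**12 bytes.
BYTES_PER_TERABYTE = 10**12
ONE_TRILLION = 10**12


def bytes_to_terabytes(n_bytes: int) -> int:
    """Convert a byte count to whole decimal terabytes."""
    return n_bytes // BYTES_PER_TERABYTE


terabytes = bytes_to_terabytes(10**24)
print(terabytes == ONE_TRILLION)  # True
print(f"{terabytes:,} TB")        # 1,000,000,000,000 TB
```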


The scale of open source data and its heterogeneous form increase the complexity of extracting intelligence

[Figure: data categories (online storage, stored medical data, personal multimedia, surveillance bytes, photos/multimedia) placed on a "Scalable" axis with ticks at 10^9, 10^12, 10^15, 10^21, 10^24, and 10^27 and a "Heterogeneity" axis with the labels "Structured data" and "Free-form text"; the chart is also labeled "Intelligence". Source: IBM 2005 GTO]
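To read the scale axis, each tick can be mapped to its SI byte unit. This small sketch assumes the ticks count bytes, which the chart itself does not state. Note that the 10^27 prefix, ronna, was only adopted by the CGPM in 2022, long after this 2006 deck.

```python
# Assumption: axis ticks are byte counts expressed as powers of ten.
SI_BYTE_UNITS = {
    9: "GB (gigabyte)",
    12: "TB (terabyte)",
    15: "PB (petabyte)",
    18: "EB (exabyte)",
    21: "ZB (zettabyte)",
    24: "YB (yottabyte)",
    27: "RB (ronnabyte)",  # prefix adopted in 2022
}


def label_tick(exponent: int) -> str:
    """Describe an axis tick of 10**exponent bytes using its SI unit, if one exists."""
    unit = SI_BYTE_UNITS.get(exponent)
    if unit is None:
        return f"10^{exponent} bytes"
    return f"10^{exponent} bytes = 1 {unit}"


for exp in (9, 12, 15, 21, 24, 27):
    print(label_tick(exp))
```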


Open Source Intelligence from the periphery requires an understanding of its topology, including its strengths and weaknesses

Source types, arranged along a Qualitative to Quantitative axis:
Industry Publication
Company Internal Content
Company Publication
Industry Journals
Conference Proceedings
NGO Publications
Website affiliated with an organization
User Groups / Forums
Newsletters
Content Aggregators
News & Press Releases
Legal Filings
Government Publications
Blogs / Weblogs
Non-affiliated Websites

The diagram labels "sources in the periphery" and distinguishes three kinds of source:
Authoritative sources: the data is trusted and defended.
Credentialed opinions: the source is known and can be weighted.
Open opinion: it is impossible to verify the authority of the source.
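The note that credentialed sources "can be weighted" suggests scoring evidence by the kind of source behind it. The sketch below is a hypothetical illustration: the `Tier` names, the weights, and the assignment of example source types to tiers are all assumptions, not values from the presentation.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical encoding of the three kinds of source on the slide."""
    AUTHORITATIVE = "authoritative"  # data is trusted and defended
    CREDENTIALED = "credentialed"    # source is known and can be weighted
    OPEN_OPINION = "open_opinion"    # authority of the source cannot be verified


# Illustrative weights only (assumption); in practice analysts would set them.
TIER_WEIGHTS = {
    Tier.AUTHORITATIVE: 1.0,
    Tier.CREDENTIALED: 0.6,
    Tier.OPEN_OPINION: 0.2,
}


@dataclass
class Mention:
    """One document supporting a claim, tagged with the tier of its source."""
    source_type: str
    tier: Tier


def weighted_support(mentions: list[Mention]) -> float:
    """Sum tier weights so that authoritative backing counts more than open opinion."""
    return sum(TIER_WEIGHTS[m.tier] for m in mentions)


# Example tier assignments are assumptions, not taken from the presentation.
mentions = [
    Mention("Government Publications", Tier.AUTHORITATIVE),
    Mention("Conference Proceedings", Tier.CREDENTIALED),
    Mention("Blogs / Weblogs", Tier.OPEN_OPINION),
]
print(round(weighted_support(mentions), 2))  # 1.8
```

A plain sum rewards the volume of supporting documents; dividing by `len(mentions)` would instead measure the average quality of the sources.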


Ecoinformatics Architectures need to be multi-layered

[Figure: layered architecture diagram with axes labeled Volume and Relevancy and a throughput scale of 10's, 100's, and 1000's (pages/second). Products shown: WebFountain, Business Insights Workbench, WS OmniFind II.]

Data sources: World Wide Web, Blogs, Newspapers, Licensed Feeds, Data Bases, Intranet Data, Taxonomies, Commercial Data Bases

Data acquisition: Un-Structured Data, Structured Data, IndexStore

Per-Page Annotators: Auto Entity Spotters, Auto Geography Spotter, Porn & Dup Detection, Customer Taxonomy Spotter, Date Spotters, Language Spotters, Source Spotters

Cross-Page Annotators: Classification, Clustering, Communities, Ranking

Applications: Network Associations, Search, Topic Tracking, Buzz Analysis

Text analytics: Parsing/Tokenizing, Annotation, Searching, Natural Clustering, Affinity Analysis, Snippet Analysis, Trending

Customer Applications: Performance Management, Drug Research, Business Insights Workbench
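The split between per-page and cross-page annotators can be sketched as a two-layer pipeline: per-page annotators look at one document at a time, while cross-page annotators need the whole annotated collection. The functions below (`date_spotter`, `language_spotter`, `cluster_by_year`, `run_pipeline`) are hypothetical toy stand-ins, not components of WebFountain or OmniFind.

```python
import re
from collections import defaultdict
from typing import Callable

# Hypothetical page record: {"url": str, "text": str, "annotations": dict}
Page = dict


def date_spotter(page: Page) -> None:
    """Per-page annotator: record four-digit years (1900-2099) found in the text."""
    page["annotations"]["years"] = re.findall(r"\b(?:19|20)\d{2}\b", page["text"])


def language_spotter(page: Page) -> None:
    """Per-page annotator (toy heuristic): mark text containing common English stopwords."""
    words = set(re.findall(r"[a-z]+", page["text"].lower()))
    page["annotations"]["language"] = "en" if words & {"the", "and", "of"} else "unknown"


def cluster_by_year(pages: list[Page]) -> dict[str, list[str]]:
    """Cross-page annotator: group page URLs under every year they mention."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for page in pages:
        for year in sorted(set(page["annotations"]["years"])):
            clusters[year].append(page["url"])
    return dict(clusters)


def run_pipeline(
    pages: list[Page],
    per_page: list[Callable[[Page], None]],
    cross_page: Callable[[list[Page]], dict],
) -> dict:
    # Layer 1: per-page annotators see one page at a time, so this loop parallelizes easily.
    for page in pages:
        page.setdefault("annotations", {})
        for annotate in per_page:
            annotate(page)
    # Layer 2: cross-page annotators run once over the whole annotated collection.
    return cross_page(pages)


pages = [
    {"url": "a", "text": "Notes of the 2005 conference"},
    {"url": "b", "text": "Coverage from 2004 and 2005"},
]
print(run_pipeline(pages, [date_spotter, language_spotter], cluster_by_year))
```

Keeping per-page work independent of other pages is what lets that layer run at a much higher page rate than the cross-page layer above it.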


Finding intelligence can require different views of the same information

[Figure titled "Open Source Trend on Web", three panels. Panel 1: number of web pages (000s) per month, Jan to Dec 2005, annotated "Some event happened in August". Panel 2: number of web pages (000s) per year, 2001 to 2005. Panel 3: % of OSI web documents (0.0% to 4.5%) by person: Congressman Rob Simmons, Douglas Rushkoff, Eliot Jardines, Major General Patrick Cammaert, Mr Arno Reuser, Robert Steele; annotated "One dominant voice".]
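In the monthly view, the August event stands out. One simple way to flag such a month automatically, sketched here as an assumption rather than the method behind the slide, is to compare each month against the mean and standard deviation of the other months. The counts in the example are synthetic, not the slide's data.

```python
from statistics import mean, stdev

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def flag_spikes(counts: list[float], k: float = 3.0) -> list[str]:
    """Return months whose count exceeds mean + k * stdev of the other eleven months.

    Leaving the candidate month out keeps a single large spike from
    inflating the baseline it is compared against.
    """
    if len(counts) != len(MONTHS):
        raise ValueError("expected one count per month")
    flagged = []
    for i, value in enumerate(counts):
        others = counts[:i] + counts[i + 1:]
        if value > mean(others) + k * stdev(others):
            flagged.append(MONTHS[i])
    return flagged


# Synthetic monthly page counts (thousands); illustrative only.
synthetic = [20, 22, 19, 21, 23, 20, 22, 65, 24, 21, 20, 22]
print(flag_spikes(synthetic))  # ['Aug']
```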


Result counts as the query is narrowed:
Robert Steele: 6,440,000
"Robert Steele": 170,000
"Robert Steele" and Open Source Intelligence: 2,400
"Robert David Steele" and "Open Source Intelligence" within 5 words: 73
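The last query restricts matches to documents where the two phrases appear within 5 words of each other. Search engines define proximity differently; this sketch assumes "within n words" means at most n tokens between the end of one phrase and the start of the other, in either order.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_starts(tokens: list[str], phrase: list[str]) -> list[int]:
    """Return every index where the phrase occurs as a contiguous token run."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def within_n_words(text: str, phrase_a: str, phrase_b: str, n: int = 5) -> bool:
    """True if phrase_a and phrase_b occur with at most n words between them."""
    tokens = tokenize(text)
    a, b = tokenize(phrase_a), tokenize(phrase_b)
    for i in phrase_starts(tokens, a):
        for j in phrase_starts(tokens, b):
            # Count the words strictly between the earlier phrase's end and the later one's start.
            gap = j - (i + len(a)) if i <= j else i - (j + len(b))
            if 0 <= gap <= n:
                return True
    return False


doc = "Robert David Steele, a pioneer of open source intelligence, spoke today."
print(within_n_words(doc, "Robert David Steele", "Open Source Intelligence"))       # True
print(within_n_words(doc, "Robert David Steele", "Open Source Intelligence", n=2))  # False
```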

Context

[Figure: Network of Conference Attendees to auto-spotted Companies and Universities]

In this network view, we care not about association with “Open Source Intelligence” but about association with companies and universities.
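A network like this can be assembled by counting the documents in which a spotted person and a spotted organization co-occur. In this sketch, the entity lists are assumed to come from upstream spotters, matching is naive case-insensitive substring search, and the names in the example are placeholders.

```python
from collections import Counter
from itertools import product


def build_affiliation_network(
    docs: list[str], people: list[str], orgs: list[str]
) -> Counter:
    """Count, for each (person, organization) pair, the documents mentioning both."""
    edges: Counter = Counter()
    for text in docs:
        lowered = text.lower()
        present_people = [p for p in people if p.lower() in lowered]
        present_orgs = [o for o in orgs if o.lower() in lowered]
        # Each co-mentioned pair gains one unit of edge weight per document.
        edges.update(product(present_people, present_orgs))
    return edges


docs = [
    "Alice Example of Example University presented the keynote.",
    "Alice Example later joined a panel with Acme Corp.",
    "Bob Sample represented Acme Corp.",
]
network = build_affiliation_network(
    docs, ["Alice Example", "Bob Sample"], ["Example University", "Acme Corp"]
)
for (person, org), weight in sorted(network.items()):
    print(f"{person} -- {org}: {weight}")
```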


Conclusions on Open Source Intelligence

Computers don’t create intelligence, people do – computers enable smart people

Not all open source content is equal – know the sources

Not everything you see is right – it’s all about the CONTEXT

Ecoinformatics architecture supports:
- Large-scale analytics of open source content
- Integration of content other than open source
- Powerful text analytics tools to support analysis of on-topic stores