Andres Dorado - Finding Relevant Data in the Cloud


© CGI GROUP INC. All rights reserved

_experience the commitment TM

The Cloud: Searching for Meaning

Finding Relevant Data in the Cloud for Actionable Decisions

APRIL 2012


Confidential

Agenda

• Information Retrieval

• The “ABC” Formula

• Some of the Challenges

• Example 1: The Right Profile

• Example 2: Like it

• Example 3: Promote it

• Conclusions

• Q&A


Information Retrieval goes beyond databases

[Figure: enterprise data stored in a DBMS and queried with “SELECT * FROM”, contrasted with a search box (“Search”, “Go”)]

Information Retrieval*, aka Search, is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

“An information need* is the topic about which the user desires to know more, and is differentiated from a query, which is what the user conveys to the computer in an attempt to communicate the information need.”

* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
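To make the difference between a collection, a query and an information need concrete, here is a minimal Python sketch of an inverted index with Boolean AND retrieval, the basic search structure described by Manning et al. The toy documents and the helper names (tokenize, build_inverted_index, boolean_and) are hypothetical and only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, query):
    """Return the IDs of documents that contain every query term (Boolean AND)."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical toy collection; IDs and texts are illustrative only.
docs = {
    1: "Cloud storage for enterprise data",
    2: "Searching big data in the cloud",
    3: "Relational databases answer SQL queries",
}
index = build_inverted_index(docs)
print(boolean_and(index, "cloud data"))  # → [1, 2]
```

A query for “data” alone also misses document 3, because without stemming “databases” and “data” are different tokens. The query is only the user's attempt to express the information need, and the gap between the two is where search quality is won or lost.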


Volume, variety and velocity… Big Data


Big Data refers to fast-growing, large data sets that cannot be managed with “traditional” Database Management Systems.

[Figure: enterprise data in a DBMS and a search box, both placed inside “The Cloud”]


The consumer market is already there, and organizations can learn from it

[Figure: personal data in “The Cloud”, accessed from an iPhone through Siri (“Searching for…”)]


Analytics is enabling these capabilities

[Figure: enterprise data in a DBMS and a search box inside “The Cloud”, with a “Big Data” layer added]

Information Retrieval applies analytic techniques such as clustering and classification to support users in browsing or filtering document collections or in further processing a set of retrieved documents.*

* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
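As one illustration of clustering in support of browsing, the Python sketch below groups short texts with single-pass (leader) clustering over bag-of-words cosine similarity. This is a deliberately simple stand-in chosen for brevity, not necessarily the kind of clustering the presentation has in mind; the threshold of 0.3, the sample texts and the function names are arbitrary assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector for one text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(texts, threshold=0.3):
    """Single-pass clustering: each text joins the first cluster whose leader
    is at least `threshold` similar to it, otherwise it starts a new cluster."""
    leaders, clusters = [], []
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for k, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[k].append(i)
                break
        else:  # no leader was similar enough
            leaders.append(vec)
            clusters.append([i])
    return clusters


texts = [
    "cloud storage pricing",
    "cloud storage security",
    "twitter sentiment analysis",
    "sentiment analysis of tweets",
]
print(leader_cluster(texts))  # → [[0, 1], [2, 3]]
```

The two cloud-storage texts end up in one group and the two sentiment-analysis texts in another, which a user could then browse as two topics. Leader clustering needs one pass and no preset number of clusters, but its result depends on input order; k-means and hierarchical clustering are common alternatives.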


The “ABC” Formula

[Figure: enterprise data in a DBMS and a search box, shown together with the labels “Analytics”, “Big Data” and “The Cloud”]

Analytics + Big Data + The “Cloud” = Enhanced Business Operations


Some of the Challenges

• Finding relevant data

• Large-scale data sets

• Quality of search results

Finding relevant data

• “A document is relevant* if it is one that the user perceives as containing information of value with respect to their personal information need.”

• “Something (A) is relevant** to a task (T) if it increases the likelihood of accomplishing the goal (G), which is implied by T.”

* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009

** Hjørland, B. and Christensen, F. S. Work tasks and socio-cognitive relevance: A specific example. 2002


Large-scale data sets

• Personal information retrieval: The system searches content on the user's own device, such as files managed by the operating system, e-mail, and other applications.

• Enterprise, Institutional, and domain-specific search: Documents are typically stored on centralized file systems and/or dedicated servers.

• Web Search: The system has to provide search over billions of documents stored on millions of computers.
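Search over collections of that size is spread across many machines. A minimal Python sketch of one common approach, a document-partitioned index in which each shard indexes a subset of the documents, answers the query locally, and a coordinator merges the local top-k lists, could look like this. Shard, ShardedIndex and the modulo-based placement are hypothetical; the shards here are plain objects in one process rather than separate computers, and scoring is raw query-term frequency so that scores from different shards are directly comparable.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Shard:
    """One partition of the collection (a stand-in for one machine)."""

    def __init__(self):
        self.docs = {}  # doc_id -> term-frequency Counter

    def add(self, doc_id, text):
        self.docs[doc_id] = Counter(tokenize(text))

    def search(self, terms, k):
        """Score local documents by summed query-term frequency; return the local top-k."""
        scored = []
        for doc_id, tf in self.docs.items():
            score = sum(tf[t] for t in terms)
            if score > 0:
                scored.append((score, doc_id))
        return heapq.nlargest(k, scored)


class ShardedIndex:
    """Document-partitioned index: each document is stored on exactly one shard."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Hypothetical placement rule: integer doc_id modulo the number of shards.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, query, k=3):
        terms = tokenize(query)
        # Scatter: each shard computes its own top-k (in parallel, in a real deployment).
        partial = [hit for shard in self.shards for hit in shard.search(terms, k)]
        # Gather: the global top-k is always contained in the union of local top-k lists.
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


idx = ShardedIndex(num_shards=3)
texts = [
    "cloud search in the cloud",
    "enterprise search",
    "cloud pricing",
    "sentiment of tweets",
    "search the cloud for data",
]
for doc_id, text in enumerate(texts):
    idx.add(doc_id, text)
print(idx.search("cloud search", k=2))  # → [0, 4]
```

Because each document's score depends only on that document, merging local top-k lists gives the exact global top-k. With IDF-based scoring, the shards would also need collection-wide document frequencies to produce comparable scores.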


Quality of search results

• To assess the effectiveness of an Information Retrieval system (i.e., the quality of its search results), a user will usually want to know two key statistics about the results the system returns for a query:

• Precision: What fraction of the returned results are relevant to the information need?

• Recall: What fraction of the relevant documents in the collection were returned by the system?
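Both statistics reduce to set arithmetic: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. A short Python sketch follows; the relevance judgments are hypothetical, and returning 0.0 when a set is empty is a convention assumed here, since the ratios are otherwise undefined.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: document IDs returned by the system
    relevant:  document IDs judged relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: the system returns 4 documents, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(retrieved=[1, 2, 3, 9], relevant=[1, 2, 3, 4, 5, 6])
print(p, r)  # → 0.75 0.5
```

The two pull in opposite directions: returning every document in the collection drives recall to 1.0 while precision collapses, which is why both are reported together.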


Example 1: The Right Profile

[Figure: the “ABC” diagram applied to recruitment: Big Data: LinkedIn (150 million professionals); Analytics: Text Mining; pipeline data in a DBMS with a search box; all inside “The Cloud”]

Analytics + Big Data + The “Cloud” = Enhanced Recruitment Process
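A minimal sketch of how text mining could surface “the right profile”: rank candidate profiles by TF-IDF cosine similarity to a job description, a standard vector-space ranking technique covered by Manning et al. This is an assumed illustration, not the approach shown in the presentation; it does not use any LinkedIn API, and the profile texts, names and job description are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one TF-IDF dict per text, using idf = log(N / df)."""
    tokenized = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    df = Counter(term for tf in tokenized for term in tf)  # document frequency
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_profiles(job_description, profiles):
    """Rank profile IDs by TF-IDF cosine similarity to the job description."""
    ids = list(profiles)
    vectors = tfidf_vectors([job_description] + [profiles[i] for i in ids])
    query, docs = vectors[0], vectors[1:]
    scores = {i: cosine(query, d) for i, d in zip(ids, docs)}
    return sorted(ids, key=lambda i: scores[i], reverse=True)


# Hypothetical profiles and job description.
profiles = {
    "ana": "java developer with cloud and hadoop experience",
    "ben": "marketing manager with social media experience",
    "eva": "data engineer hadoop spark cloud",
}
job = "hadoop cloud engineer"
print(rank_profiles(job, profiles))  # → ['eva', 'ana', 'ben']
```

The data-engineer profile ranks first because it also shares “engineer”, which is less common in the collection than “hadoop” or “cloud” and so carries more weight; the marketing profile shares no query terms and scores 0. Including the job description when computing document frequencies is a simplifying choice.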


Example 2: Like it

[Figure: the “ABC” diagram applied to customer satisfaction: Big Data: Twitter (340 million tweets/day); Analytics: Sentiment Analysis; pipeline data in a DBMS with a search box; all inside “The Cloud”]

Analytics + Big Data + The “Cloud” = Enhanced Customer Satisfaction


Public Relations using “Twitter Earth”

Case: Tracking tweets and displaying them by location
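The case combines two steps: scoring the sentiment of each tweet and aggregating the scores by location for display on a map. A minimal lexicon-based sketch in Python, assuming tweets already arrive as (location, text) pairs; the six-word LEXICON, the locations and the tweets are hypothetical, this is not a description of how Twitter Earth worked, and production sentiment analysis typically uses much larger lexicons or trained classifiers.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "slow": -1, "broken": -1}


def sentiment(text):
    """Sum of lexicon scores for the words in a tweet (>0 positive, <0 negative)."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))


def sentiment_by_location(tweets):
    """Average tweet sentiment per location, ready to be plotted on a map."""
    totals, counts = defaultdict(int), defaultdict(int)
    for location, text in tweets:
        totals[location] += sentiment(text)
        counts[location] += 1
    return {loc: totals[loc] / counts[loc] for loc in totals}


# Hypothetical (location, text) pairs.
tweets = [
    ("Montreal", "I love the new app, great update"),
    ("Montreal", "App is slow today"),
    ("Toronto", "Support was broken and I hate waiting"),
]
print(sentiment_by_location(tweets))  # → {'Montreal': 0.5, 'Toronto': -2.0}
```

Averaging per location keeps busy cities from dominating the map purely by tweet volume; showing counts alongside the averages helps readers judge how reliable each score is.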


Example 3: Promote it

[Figure: the “ABC” diagram applied to marketing: Big Data: Facebook (800 million users); Analytics: “Wisdom”; pipeline data in a DBMS with a search box; all inside “The Cloud”]

Analytics + Big Data + The “Cloud” = Enhanced Marketing Effectiveness


Social Intelligence using “Wisdom”

Case: Analyzing 10 million Facebook users to promote engineering


Conclusions

• Analytics adds capabilities to information retrieval systems that facilitate finding relevant data in the “cloud”.

• Analytics enables information retrieval systems to deal with large-scale data sets and is therefore recommended for working with Big Data.

• Analytics provides advanced techniques for more effective browsing and filtering of Big Data.

How are you driving business value with the data assets accessible to your organization?

Consider the “ABC” formula


Our commitment to you

CGI delivers outcomes your business can count on.