Proprietary & Confidential
How to combine structured and unstructured data for real business value
Nachman Geva, GM Israel; Leon Ribinik, Solution Architect
[email protected], [email protected]
+972.52.743.0563, +972.54.457.5038
Big Data Business Model: Amazon.com
• More products
• More customers
• More transactions
• More shipments
• More returns
• More…
Big Data Business Model 2000-2010
• Predictive analytics on highly structured data
1. Use structured data (or make it structured)
2. Put in a large, powerful “data warehouse”
3. Use predictive analytics software
4. Get business insight
• Examples of Business Applications
• Clickstream analysis → retail optimization
• CDR analysis → service optimization and churn
• Market basket analysis → upsell/bundling
• Risk analysis → margin and risk optimization
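As a rough illustration of the market basket example, the sketch below computes pairwise support and confidence from a list of baskets. It is a minimal sketch with hypothetical item data and a hypothetical `pair_rules` helper, not how any particular warehouse or analytics package does it; production systems typically use algorithms such as Apriori or FP-Growth over far larger data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for every rule a -> b.

    support    = fraction of baskets that contain both a and b
    confidence = fraction of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # count each item at most once per basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # Support is symmetric; confidence depends on which item comes first.
        rules[(a, b)] = (support, both / item_counts[a])
        rules[(b, a)] = (support, both / item_counts[b])
    return rules


# Hypothetical baskets for illustration only.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["butter", "jam"],
]
rules = pair_rules(baskets, min_support=0.3)
print(rules[("jam", "butter")])  # (0.5, 1.0)
```

A rule such as jam → butter with high confidence is the kind of signal an upsell or bundling recommendation would be built on.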
Disadvantages of “Old Model”
• Expensive
• Time consuming
• Rigid / Non-Agile
• Dependency & Integration
Big Data Solutions (today…)
Source: Zaponet http://www.zaponet.com/products
Problem Solved?
• Expensive?
Open-source software (OSS) on low-cost hardware (or cloud)
• Time consuming & Rigid?
Broad range of NoSQL solutions
• Dependency & Integration?
Agile
Big Data Challenges
• Current solutions still focus on a single, high-volume data type (e.g., clicks, logs)
• Latency of batch processing creates blind spots
• Questions about “what” can be answered, but the answers to “why” are contained within unstructured content
Let’s Think Bigger About Big Data…
• What Big Data source are you working with now?
– Customer behavior? (Log files? Clicks?)
– Customer transactions?
– System data? etc…
• How can you determine root causes behind Big Data trends?
• What other sources of information would help provide this insight?
– Documents?
– Web content?
– Email? etc…
Big Data: Just One Part of Extreme Information
• Big Data only addresses the challenge of volume, but not:
– Variety
– Velocity
– Complexity
• This broader enterprise picture is referred to as Extreme Information
Source: 'Big Data' Is Only the Beginning of Extreme Information Management, April 7, 2011, Gartner Group
Attivio’s Extreme Information Value Proposition
Extreme Information
Attivio completes the Big Data picture:
• Add “why” insights from unstructured content
• Access all data types for BI and decision making with one query
• Turn Big Data into real-time Active Information that initiates action
• Eliminate latency of information that causes “blind spots”
Attivio AIE: A Unified Information Access Platform
Attivio’s Active Intelligence Engine (AIE)
• Enterprise software, deployed on-premises or in the cloud (AWS, Azure), for building applications and solutions that are strategic because they consume information from multiple sources
• Integrates information of any type (structured, unstructured), in any format, from any repository, inside or outside the enterprise
• Uniquely correlates information across sources at query time, so the system is agile, not brittle
• Access information at any level – document/record, thumbnail, aggregate/trend – using search or SQL
• Build analytics that incorporate information from all information silos
[Figure: access, analytics, correlation, and integration capabilities turn content & data trapped in silos into valuable, actionable information]
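To make “correlates information across sources at query time” concrete, here is a minimal sketch that joins a text source (support tickets) to a structured source (customer records) and aggregates them in a single SQL query. SQLite's in-memory database stands in for the engine; the tables, columns, rows, and the `outage_tickets_by_region` helper are hypothetical, and this is not AIE's own query mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER, body TEXT);
INSERT INTO customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC'),
                             (3, 'Initech', 'EMEA');
INSERT INTO tickets VALUES
  (1, 1, 'Login outage after upgrade'),
  (2, 2, 'Invoice question'),
  (3, 3, 'Outage during month-end close'),
  (4, 1, 'Feature request');
""")


def outage_tickets_by_region(conn, term):
    # Correlate at query time: filter the free-text source, join it to the
    # structured source, and aggregate, without building a new combined schema.
    sql = """
        SELECT c.region, COUNT(*) AS n
        FROM tickets t JOIN customers c ON c.id = t.customer_id
        WHERE lower(t.body) LIKE '%' || lower(?) || '%'
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(sql, (term,)).fetchall()


print(outage_tickets_by_region(conn, "outage"))  # [('EMEA', 2)]
```

The point of the sketch is the shape of the question (a “why”-style text filter combined with a “what”-style aggregate in one query), not the storage engine.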
[Figure: Attivio Active Intelligence Engine (AIE) connecting information sources to consuming applications. Sources shown: structured data (ERP/CRM/other DB applications/ADBMS; data integration (ETL, MDM, etc.); data warehouses; data marts); unstructured content (CMS/documents/PDFs/email archives, etc.); distributed data management; external structured & unstructured sources (web/social media/SaaS/RSS); Hadoop/machine data, etc. (Splunk). Consumers shown: enterprise & packaged applications; search & discovery; ad hoc query tools; active dashboards; business intelligence tools; personalized content & eCommerce.]
AIE – Architecture & Capabilities
• Universal engine with ingestion workflows, query workflows, and analytic workflows, accessed through a search API and JDBC/ODBC
• Connectors: databases, content management systems, applications, and wrappers for command-line utilities, Web Services, etc.
• Ingest-side: Language Identification, Tokenization/Segmentation, Lemmatization/Stemming, Entity Extraction, Entity/Sentiment Analysis, Classification, Key Phrase Extraction
• Query-side: Active Security™, Facet Finder™, JOIN Processing, Predictive Autocomplete, Spelling Suggestions, Relevancy Ranking, Result Content & Sorting, Spotlighting, Alerts & Syndication, Recommendations
• Analytics: feed information already in AIE through workflows to do additional enrichment, transform data, or perform analytic calculations
• Models and Spotlights
• Front ends: search UI, embedded in applications, business intelligence tools, Attivio Active Dashboards
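An ingest-side workflow can be pictured as a chain of stages, each taking a document and adding fields before handing it on. This is a minimal sketch assuming documents are plain dicts; the stage functions (`tokenize`, `stem`, `extract_entities`) and `run_workflow` are hypothetical stand-ins, not AIE APIs, and the naive suffix stripping is far cruder than real lemmatization.

```python
import re
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def tokenize(doc: Document) -> Document:
    # Lowercased alphanumeric tokens; real segmenters handle many languages.
    doc["tokens"] = re.findall(r"[a-z0-9']+", str(doc["text"]).lower())
    return doc


def stem(doc: Document) -> Document:
    # Naive suffix stripping as a stand-in for lemmatization/stemming.
    def strip_suffix(word: str) -> str:
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    doc["stems"] = [strip_suffix(t) for t in doc["tokens"]]
    return doc


def extract_entities(doc: Document) -> Document:
    # Pattern-based entity extraction, limited to e-mail addresses here.
    doc["emails"] = EMAIL.findall(str(doc["text"]))
    return doc


def run_workflow(doc: Document, stages: List[Stage]) -> Document:
    # Each stage enriches the document and passes it to the next one.
    for stage in stages:
        doc = stage(doc)
    return doc


ingest = [tokenize, stem, extract_entities]
doc = run_workflow({"text": "Tickets resolved by support@example.com"}, ingest)
print(doc["stems"][:2], doc["emails"])  # ['ticket', 'resolv'] ['support@example.com']
```

Keeping each stage as an independent function is what lets a workflow be reordered or extended (for example, adding classification) without touching the others.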
Text Analytics
• Key Phrases
• Auto-Classification
• Sentiment Analysis
• Entity Sentiment
• Entity/Concept Extraction
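Entity sentiment ties polarity to the things being discussed rather than to the whole document. The sketch below is a minimal lexicon-based version: each sentence is scored by summing word polarities from a tiny hypothetical lexicon, and each entity receives the scores of the sentences that mention it. The lexicon and both helper functions are illustrative assumptions; production text analytics typically relies on trained models instead.

```python
import re

# Tiny hypothetical polarity lexicon; real systems use trained models or
# much larger, domain-tuned lexicons.
LEXICON = {"delighted": 1, "wonderful": 1, "helped": 1,
           "slow": -1, "broken": -1, "frustrated": -1}


def sentence_score(sentence: str) -> int:
    # Sum of word polarities; unknown words count as neutral (0).
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def entity_sentiment(text: str, entities: list[str]) -> dict[str, int]:
    # Attribute each sentence's score to every entity it mentions.
    scores = {e: 0 for e in entities}
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        score = sentence_score(sentence)
        for e in entities:
            if e.lower() in sentence.lower():
                scores[e] += score
    return scores


text = "The service desk was wonderful. The portal is slow and broken."
print(entity_sentiment(text, ["service desk", "portal"]))
# {'service desk': 1, 'portal': -2}
```

A document-level score would blur these two opinions together; scoring per entity keeps them apart.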
Unified Information Access - Example
Analyze & enrich unstructured data
Retain & respect normalized structure
John Smith <[email protected]>
New engagement
I am delighted that we were able to move forward … your service desk has been wonderful and helped resolve…
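The idea of enriching unstructured content while retaining its normalized structure can be sketched as adding derived fields next to, rather than in place of, the original ones. In this minimal sketch the message fields, category keywords, sender address, and `enrich_message` helper are hypothetical; only `email.utils.parseaddr` comes from the Python standard library.

```python
import re
from email.utils import parseaddr

# Hypothetical keyword lists for rule-based auto-classification.
CATEGORIES = {
    "support": {"service", "desk", "resolve"},
    "billing": {"invoice", "payment", "refund"},
}


def enrich_message(message: dict) -> dict:
    """Return a copy with derived fields under 'enrichment'.

    The original structured fields (from, subject, body) are not modified.
    """
    name, address = parseaddr(message["from"])
    words = set(re.findall(r"[a-z]+", message["body"].lower()))
    labels = sorted(c for c, keywords in CATEGORIES.items() if words & keywords)
    enriched = dict(message)  # shallow copy; the input record stays intact
    enriched["enrichment"] = {
        "sender_name": name,
        "sender_address": address,
        "categories": labels,
    }
    return enriched


msg = {
    "from": "John Smith <john.smith@example.com>",  # hypothetical address
    "subject": "New engagement",
    "body": "Your service desk has been wonderful.",
}
print(enrich_message(msg)["enrichment"])
```

Because the sender and subject stay in their original fields, structured queries (by sender, by subject) keep working alongside the new text-derived facets.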
AIE – Triples & Graphs
<triple id="1">
<entityId>P01</entityId>
<name>Joe</name>
<is>person</is>
...
</triple>
All people who live in a college town:
JOIN(is:person, INNER(JOIN(is:city, INNER(is:college, on="name=locatedIn")),
  on="livesIn=name"))
All people who live in a college town with “happy students”:
JOIN(is:person, INNER(JOIN(is:city, INNER(JOIN(is:college,
  INNER(AND(table:news, NEAR(happiest, students)), ON="name=college")),
  ON="name=locatedIn")), ON="livesIn=name"))
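The two JOIN queries can be mimicked over in-memory triples to show what they compute. This minimal sketch treats each `INNER(..., on="left=right")` as a semi-join that keeps left records with at least one matching right record; that reading, the sample entities beyond Joe, the helper names, and the NEAR window of three words are all assumptions, not documented AIE semantics.

```python
# Hypothetical sample triples; attribute names follow the query examples.
triples = [
    {"entityId": "P01", "name": "Joe", "is": "person", "livesIn": "Springfield"},
    {"entityId": "P02", "name": "Ann", "is": "person", "livesIn": "Shelbyville"},
    {"entityId": "P03", "name": "Bo", "is": "person", "livesIn": "Ogdenville"},
    {"entityId": "C01", "name": "Springfield", "is": "city"},
    {"entityId": "C02", "name": "Shelbyville", "is": "city"},
    {"entityId": "C03", "name": "Ogdenville", "is": "city"},
    {"entityId": "U01", "name": "Springfield College", "is": "college",
     "locatedIn": "Springfield"},
    {"entityId": "U02", "name": "Shelbyville Tech", "is": "college",
     "locatedIn": "Shelbyville"},
    {"table": "news", "college": "Springfield College",
     "text": "Springfield College reports the happiest students in the state"},
]


def select(kind):
    # Equivalent of the is:<kind> filter.
    return [t for t in triples if t.get("is") == kind]


def inner(left, right, on):
    # Semi-join: keep left records whose on-field matches some right record.
    left_key, right_key = on.split("=")
    keys = {rec.get(right_key) for rec in right}
    return [rec for rec in left if rec.get(left_key) in keys]


def near(text, a, b, window=3):
    # True if words a and b occur within `window` positions of each other.
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


# All people who live in a college town.
college_towns = inner(select("city"), select("college"), "name=locatedIn")
residents = inner(select("person"), college_towns, "livesIn=name")
print([p["name"] for p in residents])  # ['Joe', 'Ann']

# ...restricted to towns whose college has "happy students" news.
happy_news = [t for t in triples
              if t.get("table") == "news" and near(t["text"], "happiest", "students")]
happy_colleges = inner(select("college"), happy_news, "name=college")
happy_towns = inner(select("city"), happy_colleges, "name=locatedIn")
happy_residents = inner(select("person"), happy_towns, "livesIn=name")
print([p["name"] for p in happy_residents])  # ['Joe']
```

The second query differs from the first only by an extra, text-based condition at the innermost level, which is how unstructured content narrows a structured graph traversal.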
Scalability Model: Real Life Example
• 700M documents (Office, PDF and web pages), 1B+ security objects, plus person records and metadata
• 42k records/second
• Organized in 75k Communities of Excellence (COE)
• 500k users (350K employees, 150K partners)
• Multiple authentication schemes, but queries use single sign-on (SSO)
• Total servers required for HA solution: 8
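As a back-of-the-envelope check, and assuming the 42k records/second figure is a sustained ingestion rate with each of the 700M documents counted as one record (ignoring the security objects and person records), a full load would take roughly:

```latex
t \approx \frac{7 \times 10^{8}\ \text{records}}{4.2 \times 10^{4}\ \text{records/s}}
  \approx 1.67 \times 10^{4}\ \text{s} \approx 4.6\ \text{h}
```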
Customer Example: Large Mutual Funds Firm
Problem
• System outages costing millions per year in slower cash collections and lost productivity
• Service interruptions taking too long to resolve
Very high: Critical, diverse information sources, including application log data; documents (SharePoint, Documentum, etc.); HP Service Center data; People Central data
Very high: Troubleshooting content scattered across 60+ internal sources; specific log data that indicates a system problem must also be identified in real time
High: Heavy velocity and massive volume of log data from over 90 internal applications
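The requirement to spot problem log lines in real time can be sketched as a sliding-window burst detector. In this minimal sketch the regular expression, threshold, window length, sample log lines, and `BurstDetector` class are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
import re
from collections import deque

# Hypothetical pattern for log lines that signal a problem.
PROBLEM = re.compile(r"\b(ERROR|FATAL|timeout)\b", re.IGNORECASE)


class BurstDetector:
    """Flag when `threshold` problem lines arrive within `window_s` seconds."""

    def __init__(self, threshold: int = 3, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self.hits = deque()  # timestamps of recent problem lines, oldest first

    def observe(self, ts: float, line: str) -> bool:
        """Feed one log line; return True if the burst threshold is reached."""
        if not PROBLEM.search(line):
            return False
        self.hits.append(ts)
        # Evict hits older than the window relative to this line.
        while ts - self.hits[0] > self.window_s:
            self.hits.popleft()
        return len(self.hits) >= self.threshold


detector = BurstDetector(threshold=3, window_s=60)
stream = [
    (0, "INFO request ok"),
    (10, "ERROR db connection refused"),
    (20, "Timeout calling pricing service"),
    (100, "ERROR db connection refused"),
    (110, "FATAL worker crashed"),
    (120, "ERROR queue full"),
]
print([detector.observe(ts, line) for ts, line in stream])
# [False, False, False, False, False, True]
```

Scattered errors (at 10 s and 20 s) do not trigger an alert; three within a minute (100 s to 120 s) do, which keeps isolated noise out of the alert stream.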
Case Study
Challenge
• Build a next-generation competitive BI solution to help biopharma companies find new research areas most likely to yield new blockbuster drugs
Solution: Active Intelligence Engine
• Integrates > 20 million scientific publications, 8 million patents, 150,000 diseases, and 100,000 clinical trials
• Advanced search, structured querying, and faceted navigation of all biopharma documents
• At any time, the user can visualize results as time-series BI data
• AIE in-engine analytics calculate a “Relay Score” for each document
Outcomes
• Relay launched 9 months ahead of schedule at one-third of the development cost
“First of our competitors to market, ahead of schedule and under budget [with] views on the data just not possible in the SQL world.” --Brigham Hyde, COO, Relay™
http://relaytm.com/real-time-interactive-dashboards/
Complete Data Discovery Experience
Discovered insight from AIE text analytics
Complete discovery experience: filtering via full-text search
Visualization of unstructured content from non-relational sources
Root Cause Analysis with Unstructured Content
Contextual highlighting of details enriches visual analysis
Synonym and acronym expansion, plus automatic search on the corrected spelling, refine results
Demos
http://www.attivio.com/resources/demos.html
Thank You