Regional Forum on Cybersecurity in the Era of Emerging Technologies &

the Second Meeting of the “Successful Administrative Practices”-2017 Cairo, Egypt 28-29 November 2017

Big Data, Data Science and Analytics Challenges for Digital Transformation Introduction Note

Hisham Arafat

What is Big Data?!!

Context is important… Digital Transformation

Can we answer this question in 15 minutes?

Key Motivations in a Connected World!

Indicative Forecasted Figures for Year 2020 - Image source: The Enterprise Project

Billions of Connected Users

Dozens of Billions of Connected Devices

Dozens of Zettabytes of Generated Data

Big Data…Practical Definition!

Huge Volumes | Massive Streams | Mixed Structures | Complex Processing

1000s of sensors, ~1 TB/sensor/day: Images/Reports/Corsp.

Millions of parcels/day, ~100 TB/day: Labels/Cross App

Millions of cars, ~200+ OBDII/s: Realtime/Pred.
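To make the scale of these figures concrete, the short sketch below works through the arithmetic. It assumes exactly 1,000 sensors as a stand-in for "1000s of sensors" and decimal units (1 PB = 1,000 TB); the function names are illustrative only.

```python
# Back-of-the-envelope volume arithmetic for the figures on this slide.
# Assumptions (not from the slide): exactly 1,000 sensors, decimal units.

TB_PER_PB = 1_000  # 1 petabyte = 1,000 terabytes (decimal convention)


def daily_volume_tb(n_sources: int, tb_per_source_per_day: float) -> float:
    """Total data generated per day, in terabytes."""
    return n_sources * tb_per_source_per_day


def yearly_volume_pb(tb_per_day: float, days: int = 365) -> float:
    """Total data generated per year, in petabytes."""
    return tb_per_day * days / TB_PER_PB


sensors_tb_per_day = daily_volume_tb(1_000, 1)   # 1,000 sensors at ~1 TB each
parcels_pb_per_year = yearly_volume_pb(100)      # parcels at ~100 TB/day

print(f"Sensors: {sensors_tb_per_day / TB_PER_PB:.1f} PB/day")
print(f"Parcels: {parcels_pb_per_year:.1f} PB/year")
```

Even at the low end of "1000s", the sensor case alone reaches about a petabyte per day.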

Big Data is not the Solution… It’s the Challenge

• Shared I/O

• Shared Processing

• Limited Scalability

• Service Bottlenecks

• High Cost Factor

[Diagram: Database Cluster with Shared Buffers and Data Files, several I/O paths, a Network, and a Database Service]

Traditional Data Management Systems

How Many Oxen?

In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.

Bullock Team drawing II Ton Marshall Engine (Australia, early 20th century)

[Diagram labels: Data Nodes, Master Nodes, I/O, Network Interconnect]

• Parallel Processing

• Shared Nothing

• Linear Scalability

• Distributed Services

• Lower Cost Factor

[Diagram labels: Metadata; nodes 1, 2, 3, … n, each with its own I/O and holding User data / Replicas]

Abstraction of Big Data Platforms
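The shared-nothing, parallel-processing idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular platform is implemented: worker threads stand in for separate data nodes, and `partition`, `local_sum`, and `parallel_total` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_nodes):
    """Split records into n_nodes disjoint shards (round-robin)."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def local_sum(shard):
    """Work done by one 'node': it reads only its own shard."""
    return sum(shard)


def parallel_total(records, n_nodes=4):
    """Process every shard independently, then combine the partial results."""
    shards = partition(records, n_nodes)
    # Threads are an assumption made to keep the sketch self-contained;
    # on a real platform each shard would live on a different machine.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, shards))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_total(list(range(1_000_000)), n_nodes=8))
```

Because no shard depends on another, adding nodes means adding shards, which is the intuition behind the "Linear Scalability" bullet.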

Key Technologies and Patterns

Problem to Solve | Technique | Methods

Find Relations or Patterns of Occurrence Among Items, Actions or Events | Association Rules | Apriori, FP-Growth

Analyze and Discover the Internal Structure, Behavior, or Similarity of Observations | Clustering | k-means, k-medoids, DBSCAN, LDA

Put Newly Arriving Observations Under Pre-defined Classes or Assign Labels | Classification | Naive Bayes, LR, RF, Decision Trees, SVM

Understand How a Specific Outcome is Driven by Input Variables | Regression | Linear, Logistic, Ridge, Multinomial, LOESS

Forecast in the Short Term and Understand the Temporal Behavior of Variables | Time Series Analysis | Box-Jenkins, ARIMA, Wavelet

Analyze Unstructured Text for Searching, Retrieval, Sentiment, Networking, NLP | Text Analytics | BoW, TF-IDF, PoS, CR, Hidden Markov, TM

Respond to Future Events Prospectively | Simulation | Monte Carlo, GA

Provide a List of Appealing Recommendations on Events or Actions | Recommenders | Collaborative Filtering, Content-Based Filtering
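As a small illustration of the first row, the sketch below computes support and confidence, the two measures that association-rule methods such as Apriori rely on. It is not an implementation of Apriori itself, and the basket data and function names are made up for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # share of baskets with both
print(confidence(baskets, {"bread"}, {"milk"}))  # how often bread implies milk
```

A rule such as "bread implies milk" is typically kept only when both measures exceed thresholds chosen by the analyst.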

Data Science Models…The Value

100s of Methods!

Emerging Technologies and Apps

Blockchain Applications

Internet of Things (IoT) Solutions

Recommender Systems

Personalized Services

Personalized Medication

Preventive Maintenance

Perspective Cybersecurity

Logistics & SC Optimization

Geolocation Services

Key Challenges

Data Governance

Privacy-Preserving Analytics

Securing in Real-time

Data Structure Complexities

Distributed Deployment

Identity Management

DDoS

Configuration Management

Algorithms Threats

Volume

Velocity

Processing

Variety

Insights

Growth

Streams

Real-time

Dynamicity

Responsive

Scale out Performance

Data Flow Engines

Event Pipelines

Smart Data Formats

Perspective Deep Models

Big Data by Year 2010

Big Data…Digital Transformation

Thank You
