Talking about Cognitive Computing Platforms with Barcelona GSE Master Data Science Students
Jordi Torres, March 2015
IS BIG DATA A TREND THAT IS HERE TO STAY OR IS IT ONLY A BUZZWORD?
Big Data: NEW ERA OF COMPUTING
SOCIAL NETWORKS
MOBILE TECHNOLOGY
INTERNET OF THINGS
CLOUD COMPUTING
NEW DATA CHALLENGES
Volume Velocity Variety Veracity
NEW MANAGEMENT ENVIRONMENTS ARE REQUIRED
e.g. NoSQL DB
e.g. MapReduce
Ongoing projects in Big Data at BSC
Big Data Application Monitoring
Design and implement a set of tools to monitor and trace network traffic for Hadoop applications
Software Defined Environments
Building cloud environments embracing hardware and network heterogeneity to host a variety of workloads (JSA-SDN)
Cloud Recommendation
Enhance a decision support system that guides the selection of the right set of Clouds by analysing the trade-off between cost, reliability, risks and quality impacts
Holistic Integration of Emerging Supercomputing Technologies ERC Starting Grant David Carrera
RM for Green Infrastructure
Optimise placement of VMs to exploit the use of green energy and the interaction with energy supply and cooling systems
RM for Green Hardware
Optimise placement of VMs for energy efficiency, exploiting the ARM low-power architecture
RM for Green Software
Optimise placement of VMs for energy efficiency, focusing on the interaction exchange during the whole service layer
HI-EST project
Ongoing projects in Big Data at BSC
High Performance Key/Value Stores
Explore the use of high performance key/value databases for fast persistent memory technologies
Management of Data Streaming Environments
Explore novel architectures of the emerging IoT stream processing platforms that provide data stream composition, transformation and filtering in real time
Extending PyCOMPSs to NoSQL DB
Integration of the COMPSs runtime with Cassandra and exploration of scheduling policies driven by data locality
NoSQL Data Management Research
Design and implement a software layer that enables NoSQL databases to decouple data organization from data model, and provides them with efficient multi-level indexing support
Improving Cost-effectiveness of Big Data Deployments
Develop mechanisms for an automated characterization of the cost-effectiveness of Hadoop deployments (runtime performance vs software and hardware configuration choices)
BIG DATA IS ALREADY INFLUENCING OUR EVERYDAY LIVES
the technology is often so subtle that consumers have no idea that big data is actually helping make their lives easier
Source: http://blogs.sas.com/content/sascom/files/2012/06/BigDataEverywhere.gif
?
Example: #OnlineShopping
database of around 250 million customers
Predicts what items you might want and sends them to your nearest delivery hub
Example: #Retail
Target made headlines in 2012 for correctly identifying a pregnant teenager before her family knew about her condition.
Identified 25 items that, when purchased in a particular order …
HOW DO THEY DO IT?
Source: http://www.cadecdisseny.com/wp-content/uploads/2013/10/apt4-500x213.jpg
• feasible for companies to find interesting patterns hidden in data.
• predictive modeling attempts to set up a model to predict the probability of a specific outcome.
Source: http://bigsonata.com/wp-content/uploads/2014/07/PredictiveModeling.jpg
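As a minimal sketch of what "a model to predict the probability of a specific outcome" can look like, the following fits a logistic regression by gradient descent in plain Python. The toy features (purchases and visits last month) and the outcome (whether the customer buys again) are hypothetical and are not taken from the talk; the slides do not name a specific algorithm at this point.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(xs[0])
    n = len(xs)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Estimated probability that the outcome is 1 for feature vector x."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: [purchases_last_month, visits_last_month] -> buys again (1) or not (0).
X = [[0, 1], [1, 1], [1, 2], [2, 3], [4, 5], [5, 4], [6, 7], [7, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logistic(X, y)
```

Calling `predict_proba(model, [7, 6])` returns a value between 0 and 1 that can be read as the estimated probability of the outcome for a very active customer.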
Machine Learning Algorithms
ability of computer systems to improve their performance by exposure to data without the need to follow explicitly programmed instructions.
New algorithms:
• the term itself dates from the 1950s.
• periods of hype and high expectations alternating with periods of setback and disappointment.
Artificial Intelligence plays an important role
Computer Vision
Speech Recognition
Natural Language Processing
Artificial Intelligence: enabler of other technologies
http://cdn2.hubspot.net/hub/346378/file-529609748-png/blog-files/kcocco_twitter_data_google_prediction_api.png
Along the explosion of data …
New Algorithms
can be trained
Source: www.blogcdn.com/jobs.aol.com/articles/media/2011/08/statistics-getty.gif & cdni.wired.co.uk/620x413/d_f/dataastonishment.jpg
DATA SCIENTIST
the complexity of current big data platforms forces data scientists to spend part of their working time on unglamorous tasks such as data preparation, parameter tuning or selecting the most suitable modeling method.
The goal is to automate predictive analysis, and also to keep users and developers from wasting their time on tedious tasks related to data management and data processing.
How?
Systems will have a new cognitive abstraction layer in the software stack, offering learning tools while abstracting lower layers to simplify the big data software stack.
“cognitive layer” will enhance the knowledge extraction loop with new features
Source: www.blogcdn.com/jobs.aol.com/articles/media/2011/08/statistics-getty.gif & cdni.wired.co.uk/620x413/d_f/dataastonishment.jpg
… behaving as a “Predictive Modeling Factory”
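One way to picture a "Predictive Modeling Factory" is a loop that tries several candidate models, scores each by cross-validation, and keeps the best one, so that model selection and parameter tuning are not done by hand. The sketch below is only an illustration under that assumption: the candidate models (a majority-class baseline and k-nearest neighbours), the toy data and all function names are hypothetical, not part of any platform described in the talk.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def majority_predict(train, x):
    """Baseline: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def cross_val_accuracy(predict, data, folds=4):
    """Average accuracy over `folds` interleaved train/test splits."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct += sum(predict(train, x) == y for x, y in test)
    return correct / len(data)

def select_model(candidates, data):
    """Score every candidate and return the name of the best one plus all scores."""
    scores = {name: cross_val_accuracy(fn, data) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy data: two well-separated groups.
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"), ((1.1, 1.3), "a"),
    ((3.0, 3.1), "b"), ((3.2, 2.9), "b"), ((2.9, 3.0), "b"), ((3.1, 3.3), "b"),
]
candidates = {
    "majority": majority_predict,
    "knn_k1": lambda train, x: knn_predict(train, x, 1),
    "knn_k3": lambda train, x: knn_predict(train, x, 3),
}
best, scores = select_model(candidates, data)
```

The data scientist then only has to decide which candidates and which question to feed into `select_model`; the comparison itself is mechanical.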
WILL THE RISE OF PREDICTIVE MODELING FACTORIES ELIMINATE THE NEED FOR
DATA SCIENTISTS?
SOURCE: https://www.itu.dk/~panic/projects/queue.jpg
NO!
They could focus their effort on what really matters: figuring out the right questions to ask the predictive models.
Source: http://www.muycomputerpro.com/2015/02/20/cinco-claves-para-ser-un-gran-data-scientist
WHY NOW?
Source: http://www.computerhistory.org/
COMPUTING WAVES
The first wave of computing made numbers computable
The second wave has made text and rich media computable and accessible digitally
We are in the next wave that will also make context computable
Systems that embed predictive capabilities, providing the right functionality and content at the right time, for the right application, by continuously learning about users and predicting what they will need.
computers that address complex situations are necessary
Source: http://21weeks.com/wp-content/uploads/2012/08/96376749.jpg
AMBIGUITY AND UNCERTAINTY
These new systems will raise the potential to augment our reasoning capabilities
GIVING COMPUTERS A GREATER ABILITY TO UNDERSTAND INFORMATION, AND TO LEARN, TO REASON, AND ACT UPON IT
instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out itself.
Source: http://financeandcareer.com/wp-content/uploads/2013/03/webProgrammingInternship.jpg
PROGRAMS DATA
WHY NOW?
#1: algorithms can now be “trained” by exposing them to large data sets that were previously unavailable.
#2: the computing power necessary to implement these algorithms is now available
• Multicore Technology
• Increased network bandwidth
• Flash Technology
next technological shift
COGNITIVE COMPUTING?
Source: www.DesignedInBarcelona.com
COGNITIVE COMPUTING *
(*) Others use Smart Computing, Intelligent Computing, …
its meaning is not clear yet…
DATA
WE LABEL THIS NEW TYPE OF COMPUTING
Supercomputers Research
Big Data Technologies
Advanced Analytic Algorithms
• to the continuous development of supercomputing systems
• enabling the convergence of advanced analytic algorithms
• and big data technologies
COGNITIVE COMPUTING IS ALREADY IN BUSINESS
IBM was able to double the precision of Watson’s answers in the few years leading up to its famous victory in the quiz show Jeopardy.
COGNITIVE COMPUTING IS ALREADY IN BUSINESS
Just closed on $12.5 M in venture capital funding.
Booking a flight …
“I want a flight to MWC Barcelona with a return five days later via London.”
“COGNITIVE COMPUTING” RESEARCH IN BARCELONA
HTTP://www.bsc.es/autonomic
Ongoing projects
Probabilistic Graphical Models
Deep Learning
Multimodal Big Data Analytics
Scalable Indexing of Image/Video
I. DEEP LEARNING
Enhance neural network algorithms using high performance computing platforms, to make them more suitable for tasks that involve unstructured types of data and unlabeled datasets.
II. PGM & BAYESIAN INFERENCE
Explores an alternative methodology for applying ML on high performance computing platforms, in which a bespoke solution is formulated for each application, based on PGMs and Bayesian inference, for real-time applications.
Case study: user as a sensor (social and sensor networks) for data-driven city management
III. REAL-TIME SCALABLE INDEXING OF IMAGE/VIDEO
Using high performance computing platforms for distributed, real-time indexing and search, with near-replica detection over massive streams of shared photos (visual spam detection, copyright infringement)
[Figure: feature detection and feature description; labels: PATCH 1 to PATCH 4, KP1 to KP4]
0000 0100 1100
0010 0110 1110
0011 0111 1111
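The grid of 4-bit codes above suggests that feature descriptors are mapped to short binary codes, so that near replicas land in the same or a neighbouring cell. The slides do not spell out the method, so the following is only a sketch under that assumption: photos are stored in buckets keyed by their binary code, and a query probes every code within a small Hamming radius. The class name, photo identifiers and codes are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")

class BinaryCodeIndex:
    """Hash-table index over fixed-length binary codes."""

    def __init__(self, bits):
        self.bits = bits
        self.buckets = defaultdict(list)

    def add(self, photo_id, code):
        self.buckets[code].append(photo_id)

    def query(self, code, radius=1):
        """Return ids whose code is within `radius` bit flips of `code`."""
        hits = []
        for r in range(radius + 1):
            # Enumerate every way of flipping exactly r bits of the query code.
            for positions in combinations(range(self.bits), r):
                probe = code
                for i in positions:
                    probe ^= 1 << i
                hits.extend(self.buckets.get(probe, []))
        return sorted(hits)

# Hypothetical photos with 4-bit codes.
index = BinaryCodeIndex(bits=4)
index.add("photo_a", 0b0110)
index.add("photo_b", 0b0111)  # one bit away from photo_a
index.add("photo_c", 0b1000)  # three bits away from photo_a
near = index.query(0b0110, radius=1)
```

Probing neighbouring codes costs a number of lookups that grows quickly with the radius, which is why short codes and small radii are the usual choice for this kind of lookup.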
IV. MULTIMODAL BIG DATA ANALYTICS
Three types of data:
social network relationships
audiovisual content
metadata
Analyzing the content
object recognition
clustering
near-replica detection
face detection + verification
image style recognition
scene understanding
concept annotation
WHAT CAN WE LEARN FROM THE PHOTOS?
(1) THE WORLD OBSERVED
WHAT CAN WE LEARN FROM THE PHOTOS?
(2) THE WORLD
WHAT CAN WE LEARN FROM THE PHOTOS?
(3) THE OBSERVER
Case Study: DESIGUAL
2 DATASETS
Taggers (#desigual, #lavidaeschula, #mydesigual): 30,000 photos
Followers: 100 photos x 2K followers = 200K photos (100 GB)
Case Study: DESIGUAL
• Desigual followers characterization
• Multimodal classification (SVM, logistic regression)
desigual followers & taggers (#mydesigual, #lavidaeschula)
Case Study: DESIGUAL
• Recommendation → collaborative filtering
• Multimodal image clustering (k-means)
• Alternating least squares
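To make the k-means step concrete, here is a minimal plain-Python sketch of the algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat. The 2-D feature vectors stand in for image features and are hypothetical; the case study's actual features and implementation are not described in the slides, and the deterministic initialisation (first k points) is a simplification chosen for reproducibility.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Cluster `points` into k groups; returns (assignment, centroids)."""
    # Simplified deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical 2-D image feature vectors forming two visual groups.
features = [
    (0.10, 0.20), (0.90, 0.80), (0.20, 0.10),
    (0.80, 0.90), (0.15, 0.15), (0.85, 0.95),
]
assignment, centroids = kmeans(features, k=2)
```

In a multimodal setting the feature vector for each photo would typically combine visual descriptors with metadata or social features before clustering; how the case study combined them is not stated here.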
Master project at BSC?
CATWALK: Social Media Image Analysis for Fashion Industry Market Research
Multimedia Big Data Computing platform that operates over freely available online images from sources such as Instagram or Twitter.
CATWALK: Social Media Image Analysis for Fashion Industry Market Research
• The platform includes tools for the continuous analysis of the harvested images, generating a range of social and behavioral metrics in real-time.
• Current Research: high performance image analysis and pattern matching algorithms over a distributed and scalable stream processing framework.
• Proof of concept: carried out with Spark over MareNostrum supercomputer.
• Also: Market research, IPR protection & Business Plan
CONCLUSIONS?
Source: http://cdn01.am.infobae.com/adjuntos/163/imagenes/011/380/0011380705.jpg
• We are entering the NEXT COMPUTING WAVE that will make context computable.
• COGNITIVE COMPUTING Systems will be the next generation of Big Data Platforms.
• Systems will have a NEW COGNITIVE ABSTRACTION LAYER in the software stack.
• “Cognitive layer” will enhance the knowledge extraction loop with new features behaving as a PREDICTIVE MODELING FACTORY.
• DATA SCIENTISTS could focus their effort on what really matters: figuring out what are the right questions to ask the predictive models.
Thank you for your attention! JORDI TORRES @JordiTorresBCN
www.JordiTorres.eu
#CognitiveComputing & #Barcelona