Big Data: Big Challenges for Computer Science
Henri Bal, Vrije Universiteit Amsterdam
Multiple types of data explosions
● High-volume data
  ● 10-100x global internet traffic per year (by 2018)
● Complex data
Graphics Processing Units (GPUs)
Differences between CPUs and GPUs
● CPU: minimize latency of 1 activity (thread)
  ● Must be good at everything
  ● Big on-chip caches
  ● Sophisticated control logic
● GPU: maximize throughput of all threads using large-scale parallelism
[Figure: CPU layout with control logic, a few ALUs, and a large cache]
Example: NVIDIA Maxwell
● 16 independent streaming multiprocessors
● 2048 compute cores
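The Maxwell figures imply 2048 / 16 = 128 cores per streaming multiprocessor (SM). The sketch below is a CPU-side Python simulation, not real GPU code, of the data-parallel style GPUs are built for: each logical thread computes exactly one output element, located through the CUDA-style global index `block_idx * block_dim + thread_idx`. The function name `launch_vector_add` and the choice of a block size equal to the 128 cores of one SM are hypothetical; on real hardware the blocks run concurrently and the GPU scheduler assigns them to SMs dynamically.

```python
import math

# Figures from the slide (NVIDIA Maxwell example).
NUM_SMS = 16
TOTAL_CORES = 2048
CORES_PER_SM = TOTAL_CORES // NUM_SMS  # 128 cores per SM


def launch_vector_add(a, b, block_dim=CORES_PER_SM):
    """Simulate a CUDA-style data-parallel launch computing a + b.

    Hypothetical helper: block_dim defaults to one thread per core of an SM.
    Each logical thread handles one element; on a GPU the threads would run
    concurrently, here they run one after another.
    Returns the result list and the number of thread blocks launched.
    """
    n = len(a)
    grid_dim = math.ceil(n / block_dim)  # number of thread blocks needed
    out = [0.0] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global thread index
            if i < n:  # guard: the last block may be only partially used
                out[i] = a[i] + b[i]
    return out, grid_dim


if __name__ == "__main__":
    n = 10_000
    a = [float(i) for i in range(n)]
    b = [2.0 * i for i in range(n)]
    out, blocks = launch_vector_add(a, b)
    print(CORES_PER_SM, blocks, out[n - 1])  # prints: 128 79 29997.0
```

The design choice that matters is that no thread depends on another, so throughput scales with the number of cores rather than with the speed of any single one, which is exactly the trade-off the CPU/GPU comparison describes.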
Ongoing GPU work at VU
● Applications
  ● Multimedia data
  ● Digital forensics data
  ● Climate modelling
  ● Radio astronomy data
● Methodologies
  ● Hadoop on accelerators
  ● Programming methods for accelerators
● Teaching GPUs (with UvA)
● National ICT research infrastructure
COMMIT/
Complex data
● Still smaller in volume than astronomy data etc.
● Much more complicated, semantically rich data
● Growing fast …
Semantic web
● Make the Web smarter by injecting meaning so that machines can reason about it
● Initial idea by Tim Berners-Lee in 2001
● Has now attracted the interest of big IT companies
WebPIE: a Web-scale Parallel Inference Engine
● Web-scale parallel reasoner performing full materialization
● Orders of magnitude faster than previous work, thanks to smart parallel algorithms
● Jacopo Urbani and Frank van Harmelen (VU)
● Urbani's PhD thesis received a Christiaan Huygens nomination
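"Full materialization" means deriving every triple the inference rules allow, up front, so that later queries need no reasoning. As a rough illustration, the sketch below computes such a closure on a single machine for just two RDFS rules: subclass transitivity (rdfs11) and type inheritance (rdfs9). It is not WebPIE's algorithm: WebPIE itself runs its rules as MapReduce jobs on Hadoop, whereas this is naive forward chaining to a fixpoint. The function name `materialize` and the `ex:` names are hypothetical.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Return the closure of `triples` under two RDFS rules.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A),       (A subClassOf B) => (x type B)
    Both rules share one shape: (s p o), (o subClassOf c) => (s p c).
    Rules are re-applied until no new triple appears (a fixpoint).
    """
    closure = set(triples)
    while True:
        # Index subClassOf triples by subject: this is the join key.
        supers = defaultdict(set)
        for s, p, o in closure:
            if p == SUBCLASS:
                supers[s].add(o)
        new = set()
        for s, p, o in closure:
            if p in (SUBCLASS, TYPE):
                for c in supers.get(o, ()):
                    new.add((s, p, c))
        new -= closure
        if not new:
            return closure
        closure |= new
```

For example, starting from `ex:tom rdf:type ex:Cat`, `ex:Cat rdfs:subClassOf ex:Mammal`, and `ex:Mammal rdfs:subClassOf ex:Animal`, the closure adds three triples, including `ex:tom rdf:type ex:Animal`. The join keyed on the shared class is the part a parallel reasoner has to distribute across machines.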
Reasoning on changing data
● WebPIE must recompute everything if the data changes
● This takes on the order of 1 day on a 64-node compute cluster
● Challenge: real-time incremental reasoning, combining new (streaming) data with historic data
  ● Nanopublications (http://nanopub.org)
  ● Handling 2 million news articles per day (Piek Vossen, VU)
  ● Data streams from (health) sensors and smartphones
● Exploit massive parallel computing and GPUs
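To make the contrast with full recomputation concrete, here is a minimal sketch of incremental reasoning for additions only, using the same two RDFS rules (rdfs9 and rdfs11). Each incoming batch is joined against the existing closure (a semi-naive strategy), so the work depends on what changed rather than on the whole dataset. This is an assumption-laden illustration, not the VU approach: the class `IncrementalReasoner` and its methods are hypothetical, and deletions, which require retracting derived triples (for example with a delete-and-rederive strategy), are not handled.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


class IncrementalReasoner:
    """Keeps an rdfs9 + rdfs11 closure up to date as new triples arrive.

    Hypothetical sketch: only additions are supported.
    """

    def __init__(self):
        self.closure = set()
        self.supers = defaultdict(set)     # class -> its known superclasses
        self.by_object = defaultdict(set)  # class -> {(s, p)} with (s p class)

    def _insert(self, triples):
        for s, p, o in triples:
            self.closure.add((s, p, o))
            if p == SUBCLASS:
                self.supers[s].add(o)
            if p in (SUBCLASS, TYPE):
                self.by_object[o].add((s, p))

    def add(self, triples):
        """Add a batch and derive only the consequences involving it."""
        delta = set(triples) - self.closure
        while delta:
            self._insert(delta)  # delta joins against old and new facts
            new = set()
            for s, p, o in delta:
                if p in (SUBCLASS, TYPE):
                    # new (s p o) with known (o subClassOf c) => (s p c)
                    for c in self.supers.get(o, ()):
                        new.add((s, p, c))
                if p == SUBCLASS:
                    # known (x q s) with new (s subClassOf o) => (x q o)
                    for x, q in self.by_object.get(s, ()):
                        new.add((x, q, o))
            delta = new - self.closure  # only genuinely new triples recurse
        return self
```

Adding `ex:Mammal rdfs:subClassOf ex:Animal` to a store that already knows `ex:tom` is an `ex:Cat` and every `ex:Cat` is an `ex:Mammal` touches only the triples that mention `ex:Mammal`, instead of re-deriving the entire closure.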
Other work on complex data
● Using the semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA)
● Machine learning using GPUs (Hadoop)
  ● Joint work with Max Welling (UvA)
● Business applications
  ● With Frans Feldberg (VU, Economy)
Discussion
● We can process peta-scale (10^15, LHC) simple data with cluster and grid technology
● Exascale (10^18, SKA) may be feasible with GPUs, but requires new parallel programming methodologies
● Processing complex data is vastly more complicated, even at smaller scales
● Complex data is also escalating in size
● Dynamic (streaming) data will be next
● Processing exa-scale dynamic complex data?