Big Data: Big Challenges for Computer Science Henri Bal Vrije Universiteit Amsterdam


http://www.ae-info.org/ae/Acad_Main/News/New%20Members%202014

Multiple types of data explosions

High-volume data

10-100× the global internet traffic per year (by 2018)

Complex data

Graphics Processing Units (GPUs)

Differences between CPUs and GPUs
● CPU: minimize the latency of one activity (thread)
  ● Must be good at everything
  ● Big on-chip caches
  ● Sophisticated control logic
● GPU: maximize the throughput of all threads using large-scale parallelism

[Figure: CPU layout with control logic, four ALUs and a large cache]

Example: NVIDIA Maxwell
● 16 independent streaming multiprocessors
● 2048 compute cores
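To make "maximize the throughput of all threads" concrete: a GPU program splits the data into many small independent pieces, one per thread, grouped into thread blocks that are scheduled onto the streaming multiprocessors. The Python sketch that follows only emulates this CUDA-style indexing sequentially; it is not GPU code. The block size of 256 and the function names are illustrative assumptions, and the core count assumes 128 cores per Maxwell SM (16 × 128 = 2048).

```python
# Assumption: a Maxwell SM (SMM) contains 128 CUDA cores.
NUM_SMS = 16
CORES_PER_SM = 128
TOTAL_CORES = NUM_SMS * CORES_PER_SM  # matches the 2048 cores on the slide


def num_blocks(n_elements: int, block_size: int = 256) -> int:
    """Thread blocks needed so that every element gets one thread (ceiling division)."""
    return (n_elements + block_size - 1) // block_size


def vector_add_emulated(a, b, block_size=256):
    """Sequentially emulate a CUDA-style kernel launch: one logical thread per element."""
    n = len(a)
    out = [0] * n
    for block_idx in range(num_blocks(n, block_size)):
        for thread_idx in range(block_size):
            i = block_idx * block_size + thread_idx  # global thread index
            if i < n:  # guard: the last block may be only partly used
                out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    print(TOTAL_CORES)
    print(num_blocks(1_000_000))
    print(vector_add_emulated([1, 2, 3], [4, 5, 6], block_size=2))
```

On a real GPU the two loops disappear: all (block, thread) pairs are independent and run concurrently across the multiprocessors, which is where the throughput advantage over a latency-optimized CPU comes from.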

Ongoing GPU work at VU
● Applications
  ● Multimedia data
  ● Digital forensics data
  ● Climate modelling
  ● Radio astronomy data
● Methodologies
  ● Hadoop on accelerators
  ● Programming methods for accelerators
● Teaching GPUs (with UvA)
● National ICT research infrastructure

COMMIT/

Complex data
● Still smaller in volume than astronomy etc.
● Much more complicated, semantically rich data
● Growing fast …

Semantic web
● Make the Web smarter by injecting meaning so that machines can reason about it
● Initial idea by Tim Berners-Lee in 2001
● Has now attracted the interest of big IT companies

WebPIE: a Web-scale Parallel Inference Engine

● Web-scale parallel reasoner doing full materialization
● Orders of magnitude faster than previous work, by using smart parallel algorithms
● Jacopo Urbani + Frank van Harmelen (VU)

Christiaan Huygens nomination for Urbani's PhD thesis
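"Full materialization" means applying inference rules until no new triples can be derived. As an illustration only (not WebPIE's actual code), here is a minimal single-process sketch of two RDFS rules written as a map step, which groups triples by the class term the rule premises share, and a reduce step, which combines each group into derived triples, repeated until a fixpoint. The choice of rules (rdfs9: type propagation, rdfs11: subClassOf transitivity), the function names and the `ex:` example data are assumptions.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def map_phase(triples):
    """Map: group triples by the class term on which the rule premises join."""
    groups = defaultdict(lambda: {"inst": set(), "sub": set(), "super": set()})
    for s, p, o in triples:
        if p == TYPE:
            groups[o]["inst"].add(s)    # s is an instance of class o
        elif p == SUBCLASS:
            groups[s]["super"].add(o)   # o is a superclass of s
            groups[o]["sub"].add(s)     # s is a subclass of o
    return groups


def reduce_phase(groups):
    """Reduce: combine the triples sharing one join key into derived triples."""
    derived = set()
    for g in groups.values():
        for sup in g["super"]:
            for x in g["inst"]:
                derived.add((x, TYPE, sup))        # rdfs9: type propagation
            for sub in g["sub"]:
                derived.add((sub, SUBCLASS, sup))  # rdfs11: transitivity
    return derived


def materialize(triples):
    """Run map + reduce rounds until no new triples appear (fixpoint)."""
    closure = set(triples)
    while True:
        new = reduce_phase(map_phase(closure)) - closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    data = {
        ("ex:alice", TYPE, "ex:PhDStudent"),
        ("ex:PhDStudent", SUBCLASS, "ex:Researcher"),
        ("ex:Researcher", SUBCLASS, "ex:Person"),
    }
    for t in sorted(materialize(data) - data):
        print(t)
```

In a cluster setting the map and reduce steps would be distributed (for example with Hadoop), so that each join key is handled by one reducer; here everything runs in one process to keep the rule logic visible.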

Reasoning on changing data

● WebPIE must recompute everything if the data changes
● Takes on the order of 1 day on a 64-node compute cluster

● Challenge: real-time incremental reasoning, combining new (streaming) data & historic data
  ● Nanopublications (http://nanopub.org)
  ● Handling 2 million news articles per day (Piek Vossen, VU)
  ● Data streams from (health) sensors & smart phones

● Exploit massive parallel computing and GPUs
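The slide names incremental reasoning as an open challenge rather than describing a solution. One standard technique for handling insertions without recomputing everything is semi-naive evaluation: each round joins only the newly arrived triples (the delta) with the existing closure. The sketch that follows is a hypothetical illustration under strong assumptions: additions only, the same two RDFS rules (rdfs9 and rdfs11), an input closure that is already fully materialized, and illustrative function names.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
CHAINABLE = (TYPE, SUBCLASS)  # predicates that propagate along subClassOf


def derive(delta, closure):
    """Apply rdfs9/rdfs11 to premise pairs where at least one premise is in delta.

    Both rules have the shape (s, p, A) + (A, subClassOf, B) => (s, p, B),
    with p being rdf:type (rdfs9) or rdfs:subClassOf (rdfs11).
    """
    out = set()
    for s, p, o in delta:
        for s2, p2, o2 in closure:
            # delta triple as the first premise, closure triple as the subClassOf premise
            if p in CHAINABLE and p2 == SUBCLASS and s2 == o:
                out.add((s, p, o2))
            # delta subClassOf triple as the second premise
            if p == SUBCLASS and p2 in CHAINABLE and o2 == s:
                out.add((s2, p2, o))
    return out


def add_triples(closure, new_triples):
    """Extend an already materialized closure with newly arrived triples."""
    closure = set(closure)
    delta = set(new_triples) - closure
    while delta:
        closure |= delta                           # closure now includes the delta
        delta = derive(delta, closure) - closure   # keep only genuinely new facts
    return closure


if __name__ == "__main__":
    kb = add_triples(set(), [("ex:PhDStudent", SUBCLASS, "ex:Researcher")])
    # New facts arrive one batch at a time, as in a stream
    kb = add_triples(kb, [("ex:alice", TYPE, "ex:PhDStudent")])
    kb = add_triples(kb, [("ex:Researcher", SUBCLASS, "ex:Person")])
    for t in sorted(kb):
        print(t)
```

The work per batch depends on the size of the delta rather than on recomputing the whole closure, although this naive version still scans the closure for each delta triple; an index on the join term would avoid that. Deletions are harder, because a removed triple may have supported derived facts, and require extra machinery such as the delete-and-rederive (DRed) approach, which is not shown.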

Other work on complex data

● Use semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA)

● Machine learning using GPUs (Hadoop)
  ● Joint work with Max Welling (UvA)

● Business applications
  ● With Frans Feldberg (VU, Economy)

http://www.acba.nl/#home

Discussion

● We can process peta-scale (10^15, LHC) simple data with cluster and grid technology

● Exascale (10^18, SKA) may be feasible with GPUs, but requires new parallel programming methodologies

● Processing complex data is vastly more complicated, even at smaller scales

● Complex data is also escalating in size
● Dynamic (streaming) data will be next
● Processing exa-scale dynamic complex data?
