Big Data and Analytics with ArcGIS - Esri

Big Data and Analytics with ArcGIS

Canserina Kurnia

Technical Manager – Esri Global Asia Pacific

Agenda

• What is Big Data?

• What is Hadoop?

• How does Spatial integrate with Big Data and Hadoop?

• How do I get started?

Story Time…

Demographic

FOR EACH LOCATION

FOR EACH DEMOGRAPHIC

⬇ 50 MILE HEATMAP

Traditional Means…

14 Days

850 GB Raster Files

Better Way?

What is Big Data?

7 BILLION

50% LIVE IN CITIES!

~70% BY 2050!!!

http://www.who.int/gho/urban_health/situation_trends/urban_population_growth_text/en/

Academics

Volume

Velocity

Variety

Volume

Velocity

Variety

Veracity

Validity

Visualization

Vulnerability

But then I’ve seen…

→ data at rest

→ data in motion

→ many types

→ data in doubt

→ data that is correct

→ data in patterns

→ data at risk

→ data that is meaningful

“When the traditional means are failing you” - Anonymous

What are the new means?

http://hadoop.apache.org

What’s in a name ?

http://blog.pivotal.io/pivotal/products/demystifying-hadoop-in-5-pictures

What Is Hadoop?

• Library / Framework

• Very, Very Large Structured and Unstructured Datasets

• Multi-Node Distributed Processing

• Resilient to Commodity Hardware Failure

Hadoop Basic Stack

Hadoop Distributed File System (HDFS)

Yet Another Resource Negotiator (YARN)

Commodity Servers

MapReduce | Hive | HBase

Other Hadoop Projects

• Avro - Serialization / RPC System

• HBase - Distributed Columnar Database

• Hive - Ad Hoc “SQL” Interface

• Pig - Data Flow Parallel Execution (AML)

• ZooKeeper - Coordination Service

• More…

HDFS

• Distributed File System

• Lots and Lots of Commodity Drives

• Fault Tolerant

• Loves Big Files

• “POSIX”-Like Interface

NameNode

DataNode DataNode DataNode

HDFS Client

HDFS Resilience!

DataNode | DataNode | DataNode

BigData

Program

BigData

Program

MapReduce

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

What Is MapReduce?

• Parallel, Fault-Tolerant Framework

• Splits Large Input

• Invokes User-Defined “Map” Function

• Shuffles and Sorts

• Invokes User-Defined “Reduce” Function

Tracker

Tracker

Client

MapReduce & HDFS.jar

Thinking In MR

Map → list(K2,V2) (filter & transform)

Shuffle/Sort → K2,list(V2)

Reduce → list(K3,V3) (group & aggregate)
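The three stages above can be tried on a laptop before touching a cluster. The following is a minimal, single-process sketch, not Hadoop's actual API: `run_mapreduce`, `parity_mapper`, and `sum_reducer` are hypothetical names chosen for illustration, and the input is a small in-memory list rather than an HDFS file.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle/sort, and reduce over an in-memory list of records."""
    # Map: each input record may emit zero or more (K2, V2) pairs.
    mapped = [pair for record in records for pair in mapper(record)]

    # Shuffle/sort: gather every V2 under its K2, then visit keys in order.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: each (K2, list(V2)) group may emit zero or more (K3, V3) pairs.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def parity_mapper(number):
    # Filter out negatives, then transform to a (parity, number) pair.
    if number >= 0:
        yield ("even" if number % 2 == 0 else "odd", number)


def sum_reducer(key, values):
    # Aggregate all values that share a key.
    yield (key, sum(values))


result = run_mapreduce([1, 2, 3, 4, -5, 6], parity_mapper, sum_reducer)
print(result)
```

The mapper only ever sees one record and the reducer only ever sees one key, which is what lets a real framework spread both across many nodes.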

Geo MapReduce

DensityMap

ID1,X1,Y1

ID2,X2,Y2

ID3,X3,Y3

ID4,X4,Y4

DensityMap

function map(lineno, text)
    tokens = text.split(',')
    cell = toCell(tokens[1], tokens[2])
    emit(cell, 1)

function toCell(x, y)
    // some math !!
    return cell

function reduce(cell, iterator)
    sum = 0
    for (one : iterator)
        sum += one
    emit(cell, sum)

http://thunderheadxpler.blogspot.com/2013/03/bigdata-kernel-density-analysis-on.html
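The slide leaves `toCell` as "some math". One plausible reading, offered here only as an assumption and not as the author's implementation, is to snap each coordinate to a square grid by dividing by a cell size and flooring. In this sketch `CELL_SIZE`, `to_cell`, and `density_map` are illustrative names, and a `Counter` stands in for the shuffle and reduce steps.

```python
import math
from collections import Counter

CELL_SIZE = 10.0  # assumed cell width/height, in the same units as x and y


def to_cell(x, y, cell_size=CELL_SIZE):
    """Snap a coordinate to the (column, row) index of its square grid cell."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def density_map(lines):
    """Count points per cell from 'ID,X,Y' text lines."""
    counts = Counter()
    for line in lines:
        _id, x, y = line.split(",")
        # The map step emits (cell, 1); the Counter plays the reducer's role.
        counts[to_cell(float(x), float(y))] += 1
    return counts


sample = ["ID1,3.0,4.0", "ID2,7.5,9.9", "ID3,12.0,4.0", "ID4,-0.5,4.0"]
cells = density_map(sample)
print(cells)
```

Using `math.floor` rather than `int()` keeps negative coordinates in their own cell instead of folding them into cell 0.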

Writing MapReduce Is Hard…

http://www.cascading.org

Think of Data

Water In Pipes

Cascading pipeline

⬇ MapReduce Job

To Cell

GroupBy

Collection

Workflow Pipeline

Source

Sink

Filter

Cascading Pipe

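import cascading.operation.aggregator.Count;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.tuple.Fields;

// Feed the X,Y input fields into SpatialDensity, a user-defined spatial function
Pipe pipe = new Each("start", new Fields("X", "Y"), new SpatialDensity());
// Group by the emitted 'cell' value
pipe = new GroupBy(pipe, new Fields("cell"));
// Count the tuples in each group and name the count 'POPULATION'
pipe = new Every(pipe, Fields.GROUP, new Count(new Fields("POPULATION")));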

http://thunderheadxpler.blogspot.com/2014/01/cascading-workflow-for-spatial-binning.html
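The "water in pipes" idea can be felt without Cascading by chaining lazy generators, where each stage pulls records from the one before it. This is only an analogy sketched under assumptions: the stage names (`source`, `keep_valid`, `to_cell`, `count_by_group`), the longitude/latitude extent used by the filter, and the 1.0 cell size are all hypothetical, and nothing here compiles to a MapReduce job.

```python
from collections import Counter


def source(lines):
    # Source: parse 'ID,X,Y' lines into (x, y) tuples.
    for line in lines:
        _id, x, y = line.split(",")
        yield float(x), float(y)


def keep_valid(points):
    # Filter: drop points outside an assumed longitude/latitude extent.
    for x, y in points:
        if -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0:
            yield x, y


def to_cell(points, cell_size=1.0):
    # To Cell: replace each point with the index of its grid cell.
    for x, y in points:
        yield (int(x // cell_size), int(y // cell_size))


def count_by_group(cells):
    # GroupBy + Count: the only stage that must see all records.
    return Counter(cells)


lines = ["ID1,10.2,20.7", "ID2,10.9,20.1", "ID3,500.0,0.0", "ID4,-3.5,2.5"]
population = count_by_group(to_cell(keep_valid(source(lines))))
print(population)
```

Only the final grouping stage needs every record at once, which mirrors where a pipeline framework has to insert a shuffle.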

How About…

No Programming???

Apache Hive

“SQL”

⬇ MapReduce Job

HQL

drop table if exists logs;

-- External table: Hive reads the files in place and does not delete them on drop
create external table if not exists logs (
    ip string,
    method string,
    uri string,
    status string,
    bytes int,
    time_taken int,
    referrer string,
    user_agent string
)
partitioned by (year int, month int, day int, hour int)
row format delimited
    fields terminated by '\t'
    lines terminated by '\n'
stored as textfile
location 'hdfs://hadoop:8020/logs/';
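Because the table above is external and partitioned, Hive only sees the data in a partition once that partition has been registered, for example with `ALTER TABLE ... ADD PARTITION`. The sketch below generates such a statement for a given hour. The directory layout `<base>/<year>/<month>/<day>/<hour>` is an assumption for illustration (the slide does not show how the log files are organised), and `add_partition_statement` is a hypothetical helper.

```python
from datetime import datetime

BASE_LOCATION = "hdfs://hadoop:8020/logs"  # taken from the table definition


def add_partition_statement(ts: datetime, table: str = "logs") -> str:
    """Build a HiveQL statement that registers the partition for one hour."""
    # Assumed layout: <base>/<year>/<month>/<day>/<hour>, zero-padded.
    location = (
        f"{BASE_LOCATION}/{ts.year:04d}/{ts.month:02d}/{ts.day:02d}/{ts.hour:02d}"
    )
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year={ts.year}, month={ts.month}, day={ts.day}, hour={ts.hour}) "
        f"LOCATION '{location}';"
    )


statement = add_partition_statement(datetime(2014, 1, 5, 13))
print(statement)
```

Queries that filter on `year`, `month`, `day`, or `hour` can then skip every directory outside the requested range.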

Other Ad Hoc Engines

• Cloudera Impala

• Facebook Presto

• SparkSQL

• Bypass MR generation / Direct HDFS Access

What About Spatial ?

GIS Tools For Hadoop

• Computational Geometry Library

• Hive Spatial UDF Functions

• GeoProcessing Extensions to ArcMap

Geometry Library

• Points / Lines / Polygons

• I/O (GeoJSON, WKT, WKB, Shape)

• Spatial Relations (inside, touches, intersects,…)

• Spatial Operations (buffer, cut, convex hull,…)

• In-Memory Spatial Index

API Usage in BigData

• Map-only jobs - GeoEnrichment

- Given set of locations

- Given demographic area

- Augment location with demographic attributes
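A map-only job works here because each location can be enriched on its own, with no grouping across records. The following sketch shows the shape of such a map function under simplifying assumptions: demographic areas are axis-aligned rectangles rather than real polygons, `median_age` is a made-up attribute, and `Area`, `find_area`, and `enrich` are hypothetical names rather than anything from the Esri geometry library.

```python
from typing import NamedTuple, Optional, Sequence


class Area(NamedTuple):
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    median_age: float  # hypothetical demographic attribute


def find_area(x: float, y: float, areas: Sequence[Area]) -> Optional[Area]:
    """Return the first area whose rectangle contains the point, if any."""
    for area in areas:
        if area.xmin <= x <= area.xmax and area.ymin <= y <= area.ymax:
            return area
    return None


def enrich(line: str, areas: Sequence[Area]) -> str:
    """Map function: 'ID,X,Y' in, 'ID,X,Y,area,median_age' out (no reduce step)."""
    loc_id, x, y = line.split(",")
    area = find_area(float(x), float(y), areas)
    if area is None:
        return f"{loc_id},{x},{y},,"  # no match: leave the attributes empty
    return f"{loc_id},{x},{y},{area.name},{area.median_age}"


areas = [Area("north", 0, 5, 10, 10, 34.5), Area("south", 0, 0, 10, 5, 41.0)]
enriched = [enrich(line, areas) for line in ["ID1,2,7", "ID2,3,1", "ID3,50,50"]]
print(enriched)
```

In a real job the small set of demographic areas would be loaded into memory on every mapper, which is where an in-memory spatial index pays off.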

BigData Binning

Hive Spatial UDF

• Uses Geometry API

• Constructor

- ST_Point / ST_GeomFromGeoJSON

• Relations

- ST_Contains / ST_Buffer

• Accessor

- ST_Distance, ST_Area

Hive Spatial UDF

SELECT counties.name, count(*) total
FROM counties
JOIN earthquakes
WHERE ST_Contains(counties.boundaryshape, ST_Point(earthquakes.longitude, earthquakes.latitude))
GROUP BY counties.name
ORDER BY total desc;
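To see what the query computes, here is a small in-memory equivalent. It is a sketch under assumptions, not how the Hive UDFs are implemented: `contains` is a plain ray-casting point-in-polygon test standing in for `ST_Contains`, county boundaries are simple vertex lists without holes, and the county names and coordinates are invented sample data.

```python
from collections import Counter


def contains(polygon, x, y):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def earthquakes_per_county(counties, earthquakes):
    """Count earthquakes per county, ordered by total descending."""
    totals = Counter()
    for longitude, latitude in earthquakes:
        for name, boundary in counties.items():
            if contains(boundary, longitude, latitude):
                totals[name] += 1
    return totals.most_common()


counties = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
earthquakes = [(1, 1), (2, 3), (5, 2), (9, 9)]
ranking = earthquakes_per_county(counties, earthquakes)
print(ranking)
```

As in the SQL, every earthquake is tested against every county, which is why the cross join is expensive at scale and why a spatial index on the smaller side matters.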

GP Extensions

ArcMap

Hive/MapReduce

Workflow

PROCESSING EVOLUTION

• Transaction - Batch

• Operational - Dashboard

• Analytics - Exploration

• Intelligent - Realtime / Predictive

Schema

Variable

Schema

Big Data Partners

And More….

Blog Post: http://thunderheadxpler.blogspot.com

Thank you
