MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregation Framework and Hadoop

Solution Architect

Jay Runkel

@jayrunkel

Time Series Data: Aggregations in Action

Agenda

• Review Traffic Use Case

• Review Schema Design

• Document Retention Model

• Aggregation Queries

• Map Reduce

• Hadoop

Use Case Review

We need to prepare for this

Develop a nationwide traffic monitoring system

Traffic sensors to monitor interstate conditions

• 16,000 sensors

• Measure at one minute intervals

• Speed

• Travel time

• Weather, pavement, and traffic conditions

• Support desktop, mobile, and car navigation systems

What we want from our data

Charting and Trending

What we want from our data

Historical & Predictive Analysis

What we want from our data

Real Time Traffic Dashboard

Review Schema Design

Document Structure

{ _id: ObjectId("5382ccdd58db8b81730344e2"),
  linkId: 900006,
  date: ISODate("2014-03-12T17:00:00Z"),
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}

Sample Document Structure

A compound, unique index on linkId and date identifies the individual document

Sample Document Structure

Saves an extra index

{ _id: "900006:14031217",
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}

Sample Document Structure

Range queries: /^900006:1403/

Regex must be left-anchored & case-sensitive
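
A minimal sketch of how such an _id could be built and turned into a prefix query. It assumes the key is the link ID followed by the hour in UTC as YYMMDDHH, which is inferred from the sample value "900006:14031217"; the helper names makeLinkHourId and monthPrefixRegex are hypothetical.

```javascript
// Hypothetical helpers; the "linkId:YYMMDDHH" layout is inferred from the sample _id.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Build the _id of the document that holds one hour of data for one link.
function makeLinkHourId(linkId, date) {
  return linkId + ":" +
    pad2(date.getUTCFullYear() % 100) +
    pad2(date.getUTCMonth() + 1) +
    pad2(date.getUTCDate()) +
    pad2(date.getUTCHours());
}

// Build a left-anchored, case-sensitive regex that matches every hour of one month.
function monthPrefixRegex(linkId, date) {
  return new RegExp("^" + linkId + ":" +
    pad2(date.getUTCFullYear() % 100) +
    pad2(date.getUTCMonth() + 1));
}

const hour = new Date("2014-03-12T17:00:00Z");
console.log(makeLinkHourId(900006, hour));          // 900006:14031217
console.log(monthPrefixRegex(900006, hour).source); // ^900006:1403
```

The regex could then be used as the _id filter in a find() call. Because it is anchored at the start of the string, it describes a contiguous key range rather than an arbitrary pattern.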

Sample Document Structure

Pre-allocated, 60-element array of per-minute data

Advantages

1. In-place updates are efficient

2. Dashboards need only simple queries
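
The pre-allocation and in-place update idea can be sketched as follows. This is an illustration under assumptions, not the deck's own code: newHourlyDocument and minuteUpdateSpec are hypothetical helper names, and the update addresses one array slot by position using dot notation.

```javascript
// Hypothetical helpers illustrating the pre-allocated hourly document.

// Create an hourly document whose 60 per-minute slots hold NaN placeholders.
function newHourlyDocument(id, conditions) {
  const data = [];
  for (let minute = 0; minute < 60; minute++) {
    data.push({ speed: NaN, time: NaN });
  }
  return { _id: id, data: data, conditions: conditions };
}

// Build a $set specification that overwrites one minute's slot in place.
function minuteUpdateSpec(minute, speed, time) {
  const set = {};
  set["data." + minute + ".speed"] = speed;
  set["data." + minute + ".time"] = time;
  return { $set: set };
}

const hourlyDoc = newHourlyDocument("900006:14031217", { weather: "Light Snow" });
console.log(hourlyDoc.data.length); // 60
console.log(JSON.stringify(minuteUpdateSpec(17, 52, 237)));
// {"$set":{"data.17.speed":52,"data.17.time":237}}
```

Because every slot already exists, an update of this shape replaces values without growing the document, which is the property the first advantage refers to.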

Dashboards

[Chart: Series1, y-axis 0 to 70, x-axis from Mon Mar 10 2014 04:57:00 GMT-0700 (PDT) to Tue Mar 11 2014 21:59:00 GMT-0700 (PDT)]

db.linkData.find({_id : /^20484087:2014031/})

Supporting Queries From Navigation Systems

Navigation System Queries

What is the average speed for the last 10 minutes on 50 upcoming road segments?

Current Real-Time Conditions

Last ten minutes of speeds and times

{ _id : "I-87:10656",
  description : "NYS Thruway Harriman Section Exits 14A - 16",
  update : ISODate("2013-10-10T23:06:37.000Z"),
  speeds : [ 52, 49, 45, 51, ... ],
  times : [ 237, 224, 246, 233, ... ],
  pavement : "Wet Spots",
  status : "Wet Conditions",
  weather : "Light Rain",
  averageSpeed : 50.23,
  averageTime : 234,
  maxSafeSpeed : 53.1,
  location : {
    "type" : "LineString",
    "coordinates" : [
      [ -74.056, 41.098 ],
      [ -74.077, 41.104 ]
    ]
  }
}

Current Real-Time Conditions

Pre-aggregated metrics (averageSpeed, averageTime, maxSafeSpeed)

Current Real-Time Conditions

Geo-spatially indexed road segment (location)

db.linksAvg.update(
  { "_id" : linkId },
  { "$set" : { "lUpdate" : date },
    "$push" : {
      "times" : { "$each" : [ time ], "$slice" : -10 },
      "speeds" : { "$each" : [ speed ], "$slice" : -10 }
    }
  })

Maintaining the current conditions

Each update pushes the new value onto the end of the array and trims the array to the 10 most recent values, dropping the oldest
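
The effect of the $push with $each and a negative $slice can be emulated in plain JavaScript. This is only a sketch of the behavior, not the server's implementation; pushAndSlice and average are hypothetical names, and the sample speeds are made up.

```javascript
// Plain-JavaScript emulation of $push with $each and $slice.
function pushAndSlice(array, newValues, slice) {
  const combined = array.concat(newValues);
  // A negative slice keeps the last |slice| elements; a non-negative one keeps the first.
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);
}

// Average of the values currently in the window.
function average(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

let speeds = [52, 49, 45, 51, 50, 48, 47, 53, 55, 50];
speeds = pushAndSlice(speeds, [60], -10);
console.log(speeds.length);   // 10
console.log(speeds[0]);       // 49 (the oldest value, 52, was dropped)
console.log(average(speeds)); // 50.8
```

Keeping the window at a fixed length means a navigation query only has to read one small document per road segment.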

Document Retention

• Doc per hour: retained 2 weeks

• Doc per day: retained 2 months

• Doc per month: retained 1 year

Rollup – 1 day

// daily document
// retained for 2 months
{ _id: "link:date",
  // 24 element array
  hourly: [
    { speed: { sum: , count: }, time: { sum: , count: } },
    { speed: { sum: , count: }, time: { sum: , count: } }
  ]
}
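
One way the hourly entries of the daily document could be computed is sketched below. It assumes that each entry summarizes one hourly document's per-minute samples and that NaN placeholders are skipped; rollupHour is a hypothetical name and the sample values are made up.

```javascript
// Hypothetical rollup: summarize one hourly document as { sum, count } pairs.
function rollupHour(hourlyDoc) {
  const result = { speed: { sum: 0, count: 0 }, time: { sum: 0, count: 0 } };
  hourlyDoc.data.forEach(function (sample) {
    // Skip NaN placeholders left in slots that never received a measurement.
    if (!Number.isNaN(sample.speed)) {
      result.speed.sum += sample.speed;
      result.speed.count += 1;
    }
    if (!Number.isNaN(sample.time)) {
      result.time.sum += sample.time;
      result.time.count += 1;
    }
  });
  return result;
}

const sampleHour = {
  data: [
    { speed: 50, time: 240 },
    { speed: NaN, time: NaN },
    { speed: 40, time: 260 }
  ]
};
console.log(JSON.stringify(rollupHour(sampleHour)));
// {"speed":{"sum":90,"count":2},"time":{"sum":500,"count":2}}
```

Storing the sum and count rather than the average lets hourly entries be combined later into daily or monthly averages without revisiting the raw samples.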

Analysis With The Aggregation Framework

Pipelining operations

grep | sort | uniq

Piping command line operations

Pipelining operations

$match | $group | $sort

Piping aggregation operations

Stream of documents → Result document
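
The piping idea can be sketched in plain JavaScript, where each stage takes an array of documents and hands a new array to the next stage. This is an analogy only and not how the server executes a pipeline; the stage helpers (match, unwind, groupAverage, pipeline) and the sample documents are hypothetical.

```javascript
// Each stage takes an array of documents and returns a new array, like a shell filter.
function match(predicate) {
  return function (docs) { return docs.filter(predicate); };
}

// Emit one output document per element of the named array field.
function unwind(field) {
  return function (docs) {
    return docs.flatMap(function (doc) {
      return doc[field].map(function (element) {
        return Object.assign({}, doc, { [field]: element });
      });
    });
  };
}

// Group documents by key and average one value per group.
function groupAverage(keyOf, valueOf) {
  return function (docs) {
    const groups = new Map();
    docs.forEach(function (doc) {
      const key = keyOf(doc);
      const g = groups.get(key) || { sum: 0, count: 0 };
      g.sum += valueOf(doc);
      g.count += 1;
      groups.set(key, g);
    });
    return Array.from(groups, function (entry) {
      return { _id: entry[0], ave: entry[1].sum / entry[1].count };
    });
  };
}

// Feed the output of each stage into the next one, as a pipe would.
function pipeline(docs, stages) {
  return stages.reduce(function (current, stage) { return stage(current); }, docs);
}

// Made-up sample documents, for illustration only.
const sampleDocs = [
  { linkId: 1, data: [{ speed: 40 }, { speed: 50 }] },
  { linkId: 1, data: [{ speed: 60 }] },
  { linkId: 2, data: [{ speed: 30 }] }
];

const result = pipeline(sampleDocs, [
  match(function (doc) { return doc.linkId === 1; }),
  unwind("data"),
  groupAverage(function (doc) { return doc.linkId; }, function (doc) { return doc.data.speed; })
]);
console.log(JSON.stringify(result)); // [{"_id":1,"ave":50}]
```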

What is the average speed for a given road segment?

> db.linkData.aggregate(
    { $match: { "_id" : /^20484097:/ } },
    { $project: { "data.speed": 1, linkId: 1 } },
    { $unwind: "$data" },
    { $group: { _id: "$linkId", ave: { $avg: "$data.speed" } } }
  );
{ "_id" : 20484097, "ave" : 47.067650676506766 }

• $match: select documents on the target segment

• $project: keep only the fields we really need

• $unwind: loop over the array of data points

• $group: use the handy $avg operator

More Sophisticated Pipelines: average speed with variance

{ "$project" : {
    mean : "$meanSpd",
    spdDiffSqrd : {
      "$map" : {
        "input" : {
          "$map" : {
            "input" : "$speeds",
            "as" : "samp",
            "in" : { "$subtract" : [ "$$samp", "$meanSpd" ] }
          }
        },
        "as" : "df",
        "in" : { "$multiply" : [ "$$df", "$$df" ] }
      }
    }
} },
{ "$unwind" : "$spdDiffSqrd" },
{ "$group" : { _id : { mean : "$mean" }, variance : { "$avg" : "$spdDiffSqrd" } } }
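
The arithmetic behind this pipeline can be checked in plain JavaScript. The sketch below mirrors the two $map steps and the final $avg; it computes the mean itself, whereas the pipeline reads it from a stored meanSpd field. speedVariance is a hypothetical name and the input values are made up.

```javascript
// Population variance: the average of the squared differences from the mean.
function speedVariance(speeds) {
  const mean = speeds.reduce(function (s, v) { return s + v; }, 0) / speeds.length;
  // Inner $map with $subtract: difference of each sample from the mean.
  const diffs = speeds.map(function (samp) { return samp - mean; });
  // Outer $map with $multiply: square each difference.
  const squared = diffs.map(function (df) { return df * df; });
  // $unwind followed by $group with $avg: average the squared differences.
  const variance = squared.reduce(function (s, v) { return s + v; }, 0) / squared.length;
  return { mean: mean, variance: variance };
}

console.log(JSON.stringify(speedVariance([2, 4, 4, 4, 5, 5, 7, 9])));
// {"mean":5,"variance":4}
```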

Analysis With MapReduce

Historic Analysis

How do weather and road conditions affect traffic?

The Ask: what are the average speeds per weather, status and pavement?

MapReduce

function map() {
  for ( var i = 0; i < this.data.length; i++ ) {
    // Emit the same shape that reduce returns, because MongoDB may
    // call reduce again on its own output.
    var value = { count : 1, speedSum : this.data[i].speed };
    emit( this.conditions.weather, value );
    emit( this.conditions.status, value );
    emit( this.conditions.pavement, value );
  }
}

Each data point is emitted once per condition, for example:

"Snow", 34

"Icy spots", 34

"Delays", 34

Values emitted under the same key are grouped and passed to reduce, for example:

Weather: "Rain", speed: 44

Weather: "Rain", speed: 39

Weather: "Rain", speed: 46

function reduce ( key, values ) {
  // Start both totals at zero; each value already carries its own count.
  var result = { count : 0, speedSum : 0 };
  values.forEach( function( v ) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  });
  return result;
}

Results

results: [
  { "_id" : "Generally Clear and Dry Conditions", "value" : { "count" : 902, "speedSum" : 45100 } },
  { "_id" : "Icy Spots", "value" : { "count" : 242, "speedSum" : 9438 } },
  { "_id" : "Light Snow", "value" : { "count" : 122, "speedSum" : 7686 } },
  { "_id" : "No Report", "value" : { "count" : 782, "speedSum" : NaN } }
]
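
The results hold a count and a speed sum per condition rather than an average. A minimal sketch of turning those totals into averages is shown below, assuming a finalize-style step applied to each reduced value; reduceSpeeds and finalizeSpeeds are hypothetical names. The sketch also shows why carrying { count, speedSum } is convenient: partial results can be merged in any grouping and give the same totals.

```javascript
// Merge partial totals; every value carries its own count and speedSum.
function reduceSpeeds(key, values) {
  const result = { count: 0, speedSum: 0 };
  values.forEach(function (v) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  });
  return result;
}

// Hypothetical finalize step: turn the totals into an average speed.
function finalizeSpeeds(key, value) {
  return {
    count: value.count,
    speedSum: value.speedSum,
    average: value.speedSum / value.count
  };
}

// Totals taken from the results listing.
console.log(finalizeSpeeds("Icy Spots", { count: 242, speedSum: 9438 }).average); // 39

// Reducing in two steps gives the same totals as reducing in one step.
const a = { count: 1, speedSum: 44 };
const b = { count: 1, speedSum: 39 };
const c = { count: 1, speedSum: 46 };
const oneStep = reduceSpeeds("Rain", [a, b, c]);
const twoSteps = reduceSpeeds("Rain", [reduceSpeeds("Rain", [a, b]), c]);
console.log(JSON.stringify(oneStep) === JSON.stringify(twoSteps)); // true
```

A NaN speedSum, as in the "No Report" row, stays NaN after this step, so such keys would need to be filtered or handled separately.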

Analysis With Hadoop (using the MongoDB Connector)

Processing Large Data Sets

• Need to break data into smaller pieces

• Process data across multiple nodes
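
The split, process, and merge pattern behind these two points can be sketched in plain JavaScript. In a real cluster each chunk would be handled on a different node; here the chunks are processed one after another purely for illustration, and splitIntoChunks, partialTotals, and mergeTotals are hypothetical names.

```javascript
// Break a large array into pieces of at most chunkSize elements.
function splitIntoChunks(items, chunkSize) {
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Work done independently on each piece (on its own node in a real cluster).
function partialTotals(chunk) {
  return {
    count: chunk.length,
    sum: chunk.reduce(function (s, v) { return s + v; }, 0)
  };
}

// Combine the per-piece results into one answer.
function mergeTotals(partials) {
  return partials.reduce(function (acc, p) {
    return { count: acc.count + p.count, sum: acc.sum + p.sum };
  }, { count: 0, sum: 0 });
}

const allSpeeds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const chunks = splitIntoChunks(allSpeeds, 3);
const merged = mergeTotals(chunks.map(partialTotals));
console.log(chunks.length);             // 4
console.log(merged.sum / merged.count); // 5.5
```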

[Diagram: multiple Hadoop nodes]

Benefits of the Hadoop Connector

• Increased parallelism

• Access to analytics libraries

• Separation of concerns

• Integrates with existing tool chains

MongoDB Hadoop Connector

• Multi-source analytics

• Interactive & Batch

• Data lake

• Online, Real-time

• High concurrency & HA

• Live analytics

Operational

Post Processing and

MongoDB Connector for Hadoop

Questions?

@jayrunkel

jay.runkel@mongodb.com

Part 3 - July 16th, 2:00 PM EST

Sign up for our "Path to Proof" Program and get expert advice on implementation, architecture, and configuration.

www.mongodb.com/lp/contact/path-proof-program

HVDF: https://github.com/10gen-labs/hvdf

Hadoop Connector: https://github.com/mongodb/mongo-hadoop

Consulting Engineer, MongoDB Inc.

Bryan Reinero

Thank You
