84

Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

E�cient Storage of Big-Data for Real-time GPS

Applications

Pavan Kumar Akulakrishna

Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,

Indian Institute of Science

November 15, 2014

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Highlight
Lakshmi
Sticky Note
Can you change this to Cloud Systems Lab? even on the footer of each slide?
Page 2: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Outline

Introduction

Motivation

Related Work

Methodology

Questions

Experiment and Results

Conclusions & Future Work

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 3: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 4: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 5: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 6: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 7: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 8: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 9: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 10: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 11: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 12: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Introduction

I GPS ApplicationsI Finding current locationI Finding a route from current location to some destinationI Finding nearest police station or hospital or restaurants etc.,

I GPS Applications need to beI More responsiveI Provide realtime information

I In context of storage these applications requireI E�cient data layoutI E�cient management and distribution data

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Highlight
Lakshmi
Sticky Note
change to data distribution
Page 13: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Motivation

I One study estimates we could save more than 600 billiondollars annually by 2020 [8].

I A consumer bene�ts by savingI TimeI Fuel

I "New ways to Exploit Raw Data may bring surge ofinnovation", a study says [8].

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 14: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Motivation

I One study estimates we could save more than 600 billiondollars annually by 2020 [8].

I A consumer bene�ts by savingI TimeI Fuel

I "New ways to Exploit Raw Data may bring surge ofinnovation", a study says [8].

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 15: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Motivation

I One study estimates we could save more than 600 billiondollars annually by 2020 [8].

I A consumer bene�ts by savingI TimeI Fuel

I "New ways to Exploit Raw Data may bring surge ofinnovation", a study says [8].

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 16: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Motivation

I One study estimates we could save more than 600 billiondollars annually by 2020 [8].

I A consumer bene�ts by savingI TimeI Fuel

I "New ways to Exploit Raw Data may bring surge ofinnovation", a study says [8].

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 17: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Motivation

I One study estimates we could save more than 600 billiondollars annually by 2020 [8].

I A consumer bene�ts by savingI TimeI Fuel

I "New ways to Exploit Raw Data may bring surge ofinnovation", a study says [8].

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 18: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 19: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 20: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 21: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 22: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 23: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 24: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 25: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Related Work

I Data storage and dissemination is done in either a fullydistributed manner or in a centralized manner [2].

I Fully Distributed MannerI Geodatabase replicationI DBMS replicationI Data copying and loading tools

I Centralized MannerI Storage at centralized servers

-Stored in traditional DBMS.I No time dependant data is stored

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Sticky Note
However, trend information associated with varying time is stored for guidance values. This can still be used for arriving at realistic average values but they do not capture real-time events impact.
Page 26: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Major issues in road-network related GPS applications

I GPS applications' data is dynamic and large.

I GPS applications need real-time responsiveness.

I User queries are location-sensitive and time-variant.

I Network congestions due to communications involved.

I Use of traditional DBMS don't scale well.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 27: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Major issues in road-network related GPS applications

I GPS applications' data is dynamic and large.

I GPS applications need real-time responsiveness.

I User queries are location-sensitive and time-variant.

I Network congestions due to communications involved.

I Use of traditional DBMS don't scale well.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 28: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Major issues in road-network related GPS applications

I GPS applications' data is dynamic and large.

I GPS applications need real-time responsiveness.

I User queries are location-sensitive and time-variant.

I Network congestions due to communications involved.

I Use of traditional DBMS don't scale well.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 29: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Major issues in road-network related GPS applications

I GPS applications' data is dynamic and large.

I GPS applications need real-time responsiveness.

I User queries are location-sensitive and time-variant.

I Network congestions due to communications involved.

I Use of traditional DBMS don't scale well.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 30: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Major issues in road-network related GPS applications

I GPS applications' data is dynamic and large.

I GPS applications need real-time responsiveness.

I User queries are location-sensitive and time-variant.

I Network congestions due to communications involved.

I Use of traditional DBMS don't scale well.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 31: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: Approach

I Computation-closeness to the data is ensured to reduceI CommunicationI Latency.

I Data-redundancy is maintained to improveI PerformanceI Build fault-tolerant solution.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 32: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: Approach

I Computation-closeness to the data is ensured to reduceI CommunicationI Latency.

I Data-redundancy is maintained to improveI PerformanceI Build fault-tolerant solution.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 33: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: Approach

I Computation-closeness to the data is ensured to reduceI CommunicationI Latency.

I Data-redundancy is maintained to improveI PerformanceI Build fault-tolerant solution.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 34: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: Approach

I Computation-closeness to the data is ensured to reduceI CommunicationI Latency.

I Data-redundancy is maintained to improveI PerformanceI Build fault-tolerant solution.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 35: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: Approach

I Computation-closeness to the data is ensured to reduceI CommunicationI Latency.

I Data-redundancy is maintained to improveI PerformanceI Build fault-tolerant solution.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 36: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: Approach

I Computation-closeness to the data is ensured to reduceI CommunicationI Latency.

I Data-redundancy is maintained to improveI PerformanceI Build fault-tolerant solution.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 37: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: NineCellGrid

I In this method, data is distributed not based on the availablestorage nodes, but based on the region of area on which thecomputation is intended.

I Entire region is decomposed into L×L cells (L explained later)

I Each cell's data copied to adjacent 8 cells

I A 'NameTable' to store mapping(Latitude, Longitude)→ NodeAddress

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 38: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: NineCellGrid

I In this method, data is distributed not based on the availablestorage nodes, but based on the region of area on which thecomputation is intended.

I Entire region is decomposed into L×L cells (L explained later)

I Each cell's data copied to adjacent 8 cells

I A 'NameTable' to store mapping(Latitude, Longitude)→ NodeAddress

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 39: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: NineCellGrid

I In this method, data is distributed not based on the availablestorage nodes, but based on the region of area on which thecomputation is intended.

I Entire region is decomposed into L×L cells (L explained later)

I Each cell's data copied to adjacent 8 cells

I A 'NameTable' to store mapping(Latitude, Longitude)→ NodeAddress

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 40: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: NineCellGrid

I In this method, data is distributed not based on the availablestorage nodes, but based on the region of area on which thecomputation is intended.

I Entire region is decomposed into L×L cells (L explained later)

I Each cell's data copied to adjacent 8 cells

I A 'NameTable' to store mapping(Latitude, Longitude)→ NodeAddress

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 41: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Methodology: NineCellGrid

I NameTable

Longitude-74.49 -74.49 + f(L) -74.49 + 2*f(L) ... -73.50

Latitude 40.30 <NodeAddress> ... .. ...

40.30 + g(L) <NodeAddress> ... .. ...40.30 + 2*g(L) <NodeAddress> ... .. ...... <NodeAddress> ... .. ...41.29 <NodeAddress> ... .. ...

NameTable Here f and g are mappings from miles to Longitude and

Latitude units

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 42: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid : Dimension of cell, L

I A query is of the form 'route A-to-B'

I L is a value in miles, such that the query has A-B Euclideandistance less than or equal to L with high probability

I The probability or percentage is found computationally insteps of 3-5% using the historic query data from NHTS [13].

I Higher the L the more it will lead to centralized pattern

I Lower the L the more is the communication

I Optimum L exists (shown in Results section)

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 43: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid : Dimension of cell, L

I A query is of the form 'route A-to-B'

I L is a value in miles, such that the query has A-B Euclideandistance less than or equal to L with high probability

I The probability or percentage is found computationally insteps of 3-5% using the historic query data from NHTS [13].

I Higher the L the more it will lead to centralized pattern

I Lower the L the more is the communication

I Optimum L exists (shown in Results section)

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 44: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid : Dimension of cell, L

I A query is of the form 'route A-to-B'

I L is a value in miles, such that the query has A-B Euclideandistance less than or equal to L with high probability

I The probability or percentage is found computationally insteps of 3-5% using the historic query data from NHTS [13].

I Higher the L the more it will lead to centralized pattern

I Lower the L the more is the communication

I Optimum L exists (shown in Results section)

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 45: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid : Dimension of cell, L

I A query is of the form 'route A-to-B'

I L is a value in miles, such that the query has A-B Euclideandistance less than or equal to L with high probability

I The probability or percentage is found computationally insteps of 3-5% using the historic query data from NHTS [13].

I Higher the L the more it will lead to centralized pattern

I Lower the L the more is the communication

I Optimum L exists (shown in Results section)

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 46: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid : Dimension of cell, L

I A query is of the form 'route A-to-B'

I L is a value in miles, such that the query has A-B Euclideandistance less than or equal to L with high probability

I The probability or percentage is found computationally insteps of 3-5% using the historic query data from NHTS [13].

I Higher the L the more it will lead to centralized pattern

I Lower the L the more is the communication

I Optimum L exists (shown in Results section)

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 47: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid : Dimension of cell, L

I A query is of the form 'route A-to-B'

I L is a value in miles, such that the query has A-B Euclideandistance less than or equal to L with high probability

I The probability or percentage is found computationally insteps of 3-5% using the historic query data from NHTS [13].

I Higher the L the more it will lead to centralized pattern

I Lower the L the more is the communication

I Optimum L exists (shown in Results section)

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Highlight
Lakshmi
Sticky Note
think of changing this to "Lower the L the more it will lead to fully distributed pattern" and then use your statement to the concerns when it moves to a distributed model.
Lakshmi
Sticky Note
You may want to say by way of explaining " the key idea here is to divide the data and distribute it in such a way that majority of queries associated with different regions can be answered by specific nodes. "
Page 48: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Case Study

I Case 1: Both A and B lie in the same cell of L×L

I Any of 9 cells can process the query

I Priority is given based on Euclidean from center to A and Band task load at that node

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 49: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Case Study

I Case 2: A and B lie at 1 cell distance apart

I Number of nodes that can process the query are 4-6 nodes.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 50: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Case Study

I Case 3: A and B lie at 2 Cell distance apart

I Number of nodes that can process the query are 1-3 nodes.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 51: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Case Study

I Case 4: A and B lie at more than 2 cell distance apart

I 1/9th of the nodes communicate which is 11.11% of totalnodes involving in communication.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 52: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid Summary

Arrive at L value based on query statistics

Decompose entire map into LxL cells

Distribute each cell data to8 other surrounding cells

Generate 'NameTable' of mapping of(Latitude, Longitude ) → Node Address

Load Balance across the nodes assigningmore nodes to cells of high load factor

Reassign 'NameTable' after load balance

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 53: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Load BalancingAlgorithm: Load Balancing

Inputs: [ NameTable; Relaxation δ ]Outputs: [ Load balanced distribution; Updated NameTable ]1. Initialize load based on number of edges, vertices, updates, etc.,2. for each node E do:3. if( Load(E) < M − δ )4. Find node P closest to E with load ≥ M − δ.5. Load(P) → Load(P) + Load(E);6. Load(E) → 0;7. NameTable[Latitude(E)][Longitude(E)] = Address(P).8. endif9. endfor10.for each node E do:11. if( Load(E) 6= 0 )12. Share the load among N nodes.13. (Load(E)/N) ∈ [M − δ,M + δ] & N ∈ (1,2,3,...)14. Update NameTable.15. endif16. endfor

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 54: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Why nine cell ?

I Consider,< A→ B > route query with Euclidean distance tobe L or less than L (Our claim)

I All possible locations of A in a cell would be

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 55: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Why nine cell ?

I Consider,< A→ B > route query with Euclidean distance tobe L or less than L (Our claim)

I All possible locations of A in a cell would be

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 56: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Why nine cell ?

I Locus of B would be

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 57: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Why nine cell ?

I Best �t is..

I Hence, "NineCellGrid"

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 58: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Why nine cell ?

I Best �t is..

I Hence, "NineCellGrid"

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Highlight
Lakshmi
Sticky Note
Why nine cells?
Page 59: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Is 9 way redundancy not too much ?

I Generally, redundancy is used to maintain fault-tolerantsystems and also in cases where the data is highly important.

I State of art Replication Factor(RF) is 3, 5 or may be somecases upto 7.

I Here, in this case we are using redundancy mostly to ensureperformance than to ensure fault-tolerant and data consistency.

I And moreover, the data of a cell can be stored distributedlyacross the 9 nodes in the HDFS(Hadoop DistributedFileSystem) Format[4].

I It is also noted that speci�c type of data is stored in HDFSformat which support reduce operations; else they follow themechanism of complete data copies i.e., entire �le is stored ascopies in all the 9 cells.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 60: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Is 9 way redundancy not too much ?

I Generally, redundancy is used to maintain fault-tolerantsystems and also in cases where the data is highly important.

I State of art Replication Factor(RF) is 3, 5 or may be somecases upto 7.

I Here, in this case we are using redundancy mostly to ensureperformance than to ensure fault-tolerant and data consistency.

I And moreover, the data of a cell can be stored distributedlyacross the 9 nodes in the HDFS(Hadoop DistributedFileSystem) Format[4].

I It is also noted that speci�c type of data is stored in HDFSformat which support reduce operations; else they follow themechanism of complete data copies i.e., entire �le is stored ascopies in all the 9 cells.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 61: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Is 9 way redundancy not too much ?

I Generally, redundancy is used to maintain fault-tolerantsystems and also in cases where the data is highly important.

I State of art Replication Factor(RF) is 3, 5 or may be somecases upto 7.

I Here, in this case we are using redundancy mostly to ensureperformance than to ensure fault-tolerant and data consistency.

I And moreover, the data of a cell can be stored distributedlyacross the 9 nodes in the HDFS(Hadoop DistributedFileSystem) Format[4].

I It is also noted that speci�c type of data is stored in HDFSformat which support reduce operations; else they follow themechanism of complete data copies i.e., entire �le is stored ascopies in all the 9 cells.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 62: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Is 9 way redundancy not too much ?

I Generally, redundancy is used to maintain fault-tolerantsystems and also in cases where the data is highly important.

I State of art Replication Factor(RF) is 3, 5 or may be somecases upto 7.

I Here, in this case we are using redundancy mostly to ensureperformance than to ensure fault-tolerant and data consistency.

I And moreover, the data of a cell can be stored distributedlyacross the 9 nodes in the HDFS(Hadoop DistributedFileSystem) Format[4].

I It is also noted that speci�c type of data is stored in HDFSformat which support reduce operations; else they follow themechanism of complete data copies i.e., entire �le is stored ascopies in all the 9 cells.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 63: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Is 9 way redundancy not too much ?

I Generally, redundancy is used to maintain fault-tolerantsystems and also in cases where the data is highly important.

I State of art Replication Factor(RF) is 3, 5 or may be somecases upto 7.

I Here, in this case we are using redundancy mostly to ensureperformance than to ensure fault-tolerant and data consistency.

I And moreover, the data of a cell can be stored distributedlyacross the 9 nodes in the HDFS(Hadoop DistributedFileSystem) Format[4].

I It is also noted that speci�c type of data is stored in HDFSformat which support reduce operations; else they follow themechanism of complete data copies i.e., entire �le is stored ascopies in all the 9 cells.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 64: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Experiment and Results

I We used MPI for simulation and Dijkstra for routecomputation to study the application-level performance ofcentralized, fully distributed and NineCellGrid approach datadissemination models.

I Notations usedCGWR : "Cell Grid Without Replication"DBMSAR : DBMS as Replication factor, RF = 5Zonal : Fully distributed with RF = 5Centralized : Centralized distribution.

I Dataset

Dataset used : New York City [(40.3, 41.3),(73.5,74.5)]Vertices/Junctions : 264,346Arcs/Links : 733,846Source : DIMACS [14]

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 65: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Experiment and Results

I We used MPI for simulation and Dijkstra for routecomputation to study the application-level performance ofcentralized, fully distributed and NineCellGrid approach datadissemination models.

I Notations usedCGWR : "Cell Grid Without Replication"DBMSAR : DBMS as Replication factor, RF = 5Zonal : Fully distributed with RF = 5Centralized : Centralized distribution.

I Dataset

Dataset used : New York City [(40.3, 41.3),(73.5,74.5)]Vertices/Junctions : 264,346Arcs/Links : 733,846Source : DIMACS [14]

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 66: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Experiment and Results

I We used MPI for simulation and Dijkstra for routecomputation to study the application-level performance ofcentralized, fully distributed and NineCellGrid approach datadissemination models.

I Notations usedCGWR : "Cell Grid Without Replication"DBMSAR : DBMS as Replication factor, RF = 5Zonal : Fully distributed with RF = 5Centralized : Centralized distribution.

I Dataset

Dataset used : New York City [(40.3, 41.3),(73.5,74.5)]Vertices/Junctions : 264,346Arcs/Links : 733,846Source : DIMACS [14]

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 67: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Actual Map of the Dataset used

New York [(40.3, 41.3),(73.5,74.5)]Source: maps.google.com [1]

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 68: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Data-points in actual coordinates

Plot represents junction pointsDataset Source: Dimacs[14]

I L value for this dataset is considered 6.8 miles[13]I Each data-point contributes to 1 unit of 'Load Factor'

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 69: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Data-points in actual coordinates

Plot represents junction pointsDataset Source: Dimacs[14]

I L value for this dataset is considered 6.8 miles[13]I Each data-point contributes to 1 unit of 'Load Factor'

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 70: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

After assigning each cell Load Factor

Load Factor is the number of Junctions and Arcs present in thatregion

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 71: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

NineCellGrid Distribution is applied

Load Factor of each cell is distributed to all other 8 cells

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 72: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

After Load Balancing

Mean M = 225, Relaxation δ = 75Hence the Min = 150 and Max = 300

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 73: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Histogram comparison

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 74: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Average Turnaround Times

Number of Queries : 1000

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 75: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Finding Optimal Percentage for L

Number of Queries : 1000

I Higher the L the more it tends to centralized patternI Lower the L the more is the communication

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 76: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Throughput Comparison

Number of Queries : 1000

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 77: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Conclusions

I Results show that NineCellGrid storage method achieve betterthroughput than fully distributed and centralized storagemethods.

I Despite redundancy being high and usage of extra space is aburden we are gaining a performance speed up.

I Using this method we gain redundancy in data, fault-tolerance,highly data parallel so increased level of parallelism.

I The overall performance of our method is either higher than orcomparable with fully distributed and centralized methods.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 78: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Conclusions

I Results show that NineCellGrid storage method achieve betterthroughput than fully distributed and centralized storagemethods.

I Despite redundancy being high and usage of extra space is aburden we are gaining a performance speed up.

I Using this method we gain redundancy in data, fault-tolerance,highly data parallel so increased level of parallelism.

I The overall performance of our method is either higher than orcomparable with fully distributed and centralized methods.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Highlight
Lakshmi
Sticky Note
achieves
Lakshmi
Highlight
Lakshmi
Sticky Note
change to "being"
Lakshmi
Highlight
Lakshmi
Sticky Note
delete
Page 79: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Conclusions

I Results show that NineCellGrid storage method achieve betterthroughput than fully distributed and centralized storagemethods.

I Despite redundancy being high and usage of extra space is aburden we are gaining a performance speed up.

I Using this method we gain redundancy in data, fault-tolerance,highly data parallel so increased level of parallelism.

I The overall performance of our method is either higher than orcomparable with fully distributed and centralized methods.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Lakshmi
Highlight
Lakshmi
Sticky Note
this could be a sub-bullet of the second bullet.
Page 80: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Conclusions

I Results show that NineCellGrid storage method achieve betterthroughput than fully distributed and centralized storagemethods.

I Despite redundancy being high and usage of extra space is aburden we are gaining a performance speed up.

I Using this method we gain redundancy in data, fault-tolerance,highly data parallel so increased level of parallelism.

I The overall performance of our method is either higher than orcomparable with fully distributed and centralized methods.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 81: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Future Work

I Our future work will focus on balancing the load at each nodeby manipulating the replication factors.

I Job-scheduling algorithms that learn from previously observedqueries can be devised to improve performance even more.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 82: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Future Work

I Our future work will focus on balancing the load at each nodeby manipulating the replication factors.

I Job-scheduling algorithms that learn from previously observedqueries can be devised to improve performance even more.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 83: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

ReferencesGoogle Maps http://maps.google.com

Distributed or Centralized Tra�c Advisory Systems - The Applications Take. Otto, J.S. ; Dept. of Electr. Eng. and Comput. Sci., Northwestern Univ., Evanston, IL,USA; Bustamante, F.E.

B. Hoh, M. Gruteser, R. Herring, J. Ban, D. Work, J.C. Herrera, A. Bayen, M. Annavaram, and Q. Jacobson, "Virtual trip lines for distributed privacy preservingtraf�c monitoring," in Proc. of ACM/USENIX MobiSys, Breckenridge, CO, June 2008.

Research on the Data Storage and Access Model in Distributed Computing Environment. Haiyan Wu Coll. of Comput. and Inf. Eng., Zhejiang GongShang Univ.,Hangzhou

Google White Papers

H. Zhu, Y. Zhu, M. Li, and L. M. Ni, "HERO online real-time vehicle tracking in Shangai," in Proc. of IEEE INFOCOM, 2008.

T. Logenthiran, Dipti Srinivasan Department of Electrical and Computer Engineering National University of Singapore, Intelligent Management of Distributed StorageElements in a Smart Grid, 2011.

Shashi Shekhar, Viswanath Gunturi, Michael R. Evans,KwangSoo Yang University of Minnesota, Spatial Big-Data Challenges Intersecting Mobility and CloudComputing, 2012.

V. Taliwal, D. Jiang, H. Mangold, C. Chen, and R. Sengupta, "Empirical determination of channel characteristics for DSRC vehicle-to-vehicle communication," inProc. of ACM VANET, 2004.

Z. Wang and M. Hassan, "How much of dsrc is available for non-safety use?" in Proc. of ACM VANET, September 2008.

B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, and S. Madden, "CarTel a distributed mobile sensor computing system",in Proc. of ACM SenSys, 2006.

An ESRI Technical Paper June 2007.

NHTS: National Household Travel Survey, 2009.

DIMACS: http://www.dis.uniroma1.it/challenge9/download.shtml

The Hadoop Distributed File System, Shvachko, K., Yahoo!, Sunnyvale, CA, USA, Hairong Kuang ; Radia, S. ; Chansler, R.

Dijkstra Shortest Path Computation Algorithm, by Dijkstra.

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications

Page 84: Efficient Storage of Big-Data for Real-time GPS Applications...E cient Storage of Big-Data for Real-time GPS Applications Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy, CAD

Introduction Motivation Related Work Methodology Questions Experiment and Results Conclusions & Future Work

Thank You!

Pavan Kumar Akulakrishna Dr.J.Lakshmi & Prof.S.K.Nandy,CAD Lab, SERC,Indian Institute of Science

E�cient Storage of Big-Data for Real-time GPS Applications