TechTalk #15 Grokking: The data processing journey at AhaMove

The data processing journey at AhaMove

Presented by Thuc Nguyen - Lead Operation Engineer

On-demand logistics service

Official launch on August 10th 2015

Our engineering team (on launch day)

Data problem 1: Metrics & Reporting

1) Metrics: fulfillment rate, supplier and user growth, intraday dashboard, supplier performance

2) Reporting: multiple teams such as Business Development, Finance and Accounting, Partners, Driver Relationship Management...

Solution in the early days

- mostly on the front-end, to visualize data on web pages
- data export (on general data collections like orders, users, transactions, etc.)
- incrementally calculated cache (in-memory and physical) as the data load got bigger
- specific parameterized metric pages
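As an illustration of the incrementally calculated cache idea, here is a minimal in-memory sketch in Python. It is not AhaMove's implementation: the class name, the "COMPLETED" status value, and the definition of fulfillment rate as completed orders divided by all orders are assumptions made for the example.

```python
from collections import defaultdict


class FulfillmentCache:
    """Keeps per-day counters so a metric page never rescans all orders.

    Hypothetical sketch: names and the status value are assumptions.
    """

    def __init__(self):
        self.total = defaultdict(int)      # day -> number of orders seen
        self.completed = defaultdict(int)  # day -> number of completed orders

    def add_order(self, day, status):
        # Called once per new order; only the counters are updated.
        self.total[day] += 1
        if status == "COMPLETED":
            self.completed[day] += 1

    def rate(self, day):
        # Assumed definition: completed orders / all orders for that day.
        total = self.total.get(day, 0)
        if total == 0:
            return None
        return self.completed.get(day, 0) / total
```

Each new order costs a constant-time update, and reading the metric for a day is a lookup, which is the point of caching incrementally instead of recomputing over the whole collection.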

Scale-up stage

Exposure of:

- customized data requests
- data sources
- technical and resource limitations

=> A solution should:

- allow our staff to query and get the data they want by themselves.
- be simple for non-technical people.
- be easy to maintain.

Scale-up stage (cont.)

We chose Metabase, an open source BI tool:

- 2 query modes: query builder and native query
- data visualization
- saved queries and dashboards
- rich APIs and utilities
- alternatives: SaaS (Chartio, Tableau), OSS (SlamData)

However, Metabase does not work well with NoSQL, especially for native queries (for example, relationship matching and support for transformation functions).

Scale-up stage (cont.)

Our MongoDB status:

- MongoDB 3.2 with the WiredTiger storage engine, replicated in replica set mode

Pros:

- Flexible data schema
- Strong query language with geo support
- Strong indexing (sparse/partial, expire)
- Oplog tailing

Cons:

- Ineffective relationship queries
- Poor utility function support
- Unfriendly to SQL geeks

Scale-up stage (cont.)

We designed a data pipeline to transform MongoDB data into PostgreSQL data:

MongoDB -> MoSQL (a replication tool, syncs via the MongoDB oplog) -> PostgreSQL -> Metabase

Two of the components in the diagram are labeled "docker image".

Result: fulfilled all reporting requirements, automated 80% of the reporting work, and resolved the bottleneck in the data pipeline.
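To make the oplog-based sync concrete, the sketch below applies MongoDB-style oplog entries (insert "i", update "u" with "$set", delete "d") to a relational table. It is not MoSQL's code: the table, the column list, and the function names are hypothetical, and sqlite3 stands in for PostgreSQL only so the example is self-contained.

```python
import sqlite3

# Hypothetical mapping: which document fields become relational columns.
COLUMNS = ["_id", "status", "total"]


def make_table(conn):
    conn.execute(
        "CREATE TABLE orders (_id TEXT PRIMARY KEY, status TEXT, total REAL)"
    )


def apply_oplog_entry(conn, entry):
    """Replay one oplog-style entry against the relational copy."""
    op = entry["op"]
    if op == "i":
        # Insert: "o" holds the full document; unmapped fields are dropped.
        doc = entry["o"]
        row = [doc.get(col) for col in COLUMNS]
        conn.execute(
            "INSERT OR REPLACE INTO orders (_id, status, total) VALUES (?, ?, ?)",
            row,
        )
    elif op == "u":
        # Update: "o2" identifies the document, "o" carries the "$set" changes.
        changes = entry["o"].get("$set", {})
        for field, value in changes.items():
            if field in COLUMNS and field != "_id":  # whitelist before formatting
                conn.execute(
                    f"UPDATE orders SET {field} = ? WHERE _id = ?",
                    (value, entry["o2"]["_id"]),
                )
    elif op == "d":
        # Delete: "o" holds only the _id of the removed document.
        conn.execute("DELETE FROM orders WHERE _id = ?", (entry["o"]["_id"],))
```

A real replication tool also has to handle the initial import, nested fields, and resuming from the last processed oplog timestamp; those parts are left out here.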

Results

Our productivity on data-related work is booming: 6 staff members can write queries now (none of them knew SQL before), and every staff member can access their defined metrics.

We reduced client report preparation from 30 minutes to 5 minutes via Google Data Studio, with state-of-the-art visuals (on the right).

We have 2-way integration with Google Sheets, so that other teams like Fund Accounting can easily get the data they want via the IMPORTDATA function.
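IMPORTDATA in Google Sheets loads a URL that returns CSV or TSV text, so the serving side only needs to render query results in that format. The snippet below is a small sketch of that rendering step; the function name and the column names are hypothetical, and how the text is exposed over HTTP is not covered by the slides.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Render a header and result rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

A sheet could then pull the published result with a formula of the form =IMPORTDATA("<url of the CSV>").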

Data problem 2: Geospatial Analytics

Geospatial analytics helps us answer some common questions such as:

- Which areas have low or high demand in a specific time frame?
- Which areas have low or high supply at a given time?
- Can we ask our drivers to move from low-demand areas to high-demand ones?
- How should we present this kind of data: administrative areas (districts, wards), heatmap, hexagon, pin map?
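The demand questions above boil down to counting orders per area and time frame. The sketch below uses a plain square grid over latitude and longitude as a simplified stand-in for the hexagon or district views; the field names ("lat", "lng", "hour") and the 0.01 degree cell size are assumptions for illustration.

```python
import math
from collections import Counter


def cell_of(lat, lng, cell_deg=0.01):
    """Map a coordinate to the index of its square grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


def demand_by_cell(orders, hour, cell_deg=0.01):
    """Count orders per grid cell for one hour of the day."""
    counts = Counter()
    for order in orders:
        if order["hour"] == hour:
            counts[cell_of(order["lat"], order["lng"], cell_deg)] += 1
    return counts
```

Running the same count over driver positions gives the supply side, and comparing the two per cell points to areas where drivers could be asked to move.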

Geospatial analytics stack

MongoDB -> CartoDB (a geo-database) -> CartoJS (visualization layer) -> Leaflet (interactive base-map)

The main technologies we are using are Carto (SaaS) and Leaflet (front-end library).

Geospatial analytics showcases

Summary

- We’re making use of open source software as collective intelligence to minimize our maintenance effort.
- We’re still exploring new ways to process and present data, and we think chat apps are a promising channel for this.
- Our team motto: ‘Keep things simple’

THANK YOU FOR LISTENING

AHAMOVE

YOUR PRIVATE MOVER