TechTalk #15 Grokking: The data processing journey at AhaMove

The data processing journey at AhaMove

Presented by Thuc Nguyen - Lead Operation Engineer

On-demand logistics service

Official launch on August 10th 2015

Our engineering team (on launch day)

Data problem 1: Metrics & Reporting

1) Metrics: fulfillment rate, supplier & user growth, intraday dashboard, supplier performance

2) Reporting: multiple teams such as Business Development, Finance and Accounting, Partners, Driver Relationship Management...

Solution in the early days

mostly front-end work to visualize data on web pages

data export (on general data collections like orders, users, transactions, etc.)

incrementally calculated cache (in-memory and physical) as the data load got bigger

specific parameterized metric pages
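The incrementally calculated cache mentioned above can be sketched roughly as follows. This is a minimal illustration, not AhaMove's implementation: the class names, the `(order_id, day, status)` tuple layout, and the `"COMPLETED"` status value are all assumptions, and orders are assumed to arrive in increasing `order_id` order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class DailyCounts:
    total: int = 0
    fulfilled: int = 0


@dataclass
class FulfillmentCache:
    """Keeps per-day counters so a metric page does not rescan old orders."""

    by_day: Dict[str, DailyCounts] = field(default_factory=dict)
    last_seen_id: int = 0  # high-water mark of orders already counted

    def update(self, orders: Iterable[Tuple[int, str, str]]) -> None:
        """Fold in (order_id, day, status) tuples, skipping ones already counted.

        Assumes orders are supplied in increasing order_id order.
        """
        for order_id, day, status in orders:
            if order_id <= self.last_seen_id:
                continue  # counted in an earlier update
            counts = self.by_day.setdefault(day, DailyCounts())
            counts.total += 1
            if status == "COMPLETED":  # hypothetical status value
                counts.fulfilled += 1
            self.last_seen_id = order_id

    def rate(self, day: str) -> float:
        """Fulfillment rate for a day, or 0.0 when no orders were seen."""
        counts = self.by_day.get(day)
        if counts is None or counts.total == 0:
            return 0.0
        return counts.fulfilled / counts.total
```

Each call to `update` only touches orders newer than the stored high-water mark, which is what keeps the cost proportional to new data rather than to the whole collection.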

Scale-up stage

Exposure of:

customized data requests

data sources

technical and resource limitations

=> A solution should:

allow our staff to query and get their desired data by themselves.

be simple for non-technical people.

be easy to maintain.

Scale-up stage (cont.)

We chose Metabase - an open source BI tool

- 2 query modes: builder and native query

- visualize data

- saved queries and dashboards

- rich APIs and utilities

- alternatives: SaaS (Chartio, Tableau), OSS (SlamData)

However, Metabase does not work well with NoSQL, especially in native query mode (for example, limited support for relationship matching and transformation functions)

Scale-up stage (cont.)

Our MongoDB status:

- MongoDB 3.2 with the WiredTiger storage engine, replicated as a replica set

Pros:

- Flexible data schema

- Strong query language with geo support

- Strong indexing (sparse/partial, expire)

- Oplog tailing

Cons:

- Ineffective relationship queries

- Poor utility function support

- Unfriendly to SQL geeks

Scale-up stage (cont.)

We designed a data pipeline to transform MongoDB data into PostgreSQL data:

MongoDB -> MoSQL -> PostgreSQL -> Metabase

- MoSQL: a replication tool, sync via MongoDB oplog

- Diagram labels: docker image (twice)

Result: fulfilled all reporting requirements, automated 80% of reporting work, resolved the bottleneck in the data pipeline.
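To illustrate the idea of syncing via the oplog, here is a minimal sketch that replays oplog-style entries into a relational table. It is not MoSQL itself: SQLite from the standard library stands in for PostgreSQL, and the `app.orders` namespace, the `orders` table, and its columns are hypothetical. Only `$set`-style and replacement-style updates are handled, and a replacement only overwrites the mapped columns it carries.

```python
import sqlite3
from typing import Any, Dict, Iterable

NAMESPACE = "app.orders"              # hypothetical "<db>.<collection>" to replicate
COLUMNS = ("status", "total_fee")     # hypothetical columns mapped from documents


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the relational target table for the replicated collection."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id TEXT PRIMARY KEY, status TEXT, total_fee REAL)"
    )


def apply_oplog(conn: sqlite3.Connection, entries: Iterable[Dict[str, Any]]) -> None:
    """Replay insert ('i'), update ('u') and delete ('d') entries in order."""
    for entry in entries:
        if entry.get("ns") != NAMESPACE:
            continue  # ignore collections we do not replicate
        op = entry["op"]
        if op == "i":
            doc = entry["o"]
            conn.execute(
                "INSERT OR REPLACE INTO orders (id, status, total_fee) VALUES (?, ?, ?)",
                (str(doc["_id"]), doc.get("status"), doc.get("total_fee")),
            )
        elif op == "u":
            # 'o' is either {"$set": {...}} or a full replacement document.
            changes = entry["o"].get("$set", entry["o"])
            target_id = str(entry["o2"]["_id"])
            for column, value in changes.items():
                if column in COLUMNS:  # whitelist keeps the SQL below safe
                    conn.execute(
                        f"UPDATE orders SET {column} = ? WHERE id = ?",
                        (value, target_id),
                    )
        elif op == "d":
            conn.execute("DELETE FROM orders WHERE id = ?", (str(entry["o"]["_id"]),))
    conn.commit()
```

Fields that have no mapped column (nested paths, arrays, and so on) are simply dropped here; a real replication tool needs an explicit mapping for them.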

Results

Our productivity on data-related work is booming: 6 staff members can write queries now (none of them knew SQL before), and every staff member can access their defined metrics.

We reduced client report preparation from 30 minutes to 5 minutes via Google Data Studio, with state-of-the-art visuals (on the right)

We have 2-way integration with Google Sheets, so that other teams like Fund Accounting can easily get the data they want via the IMPORTDATA function.
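Google Sheets' IMPORTDATA reads CSV or TSV from a URL, so the serving side only has to render query results as CSV text. The helper below is a small sketch of that step; the function name and the sample columns are hypothetical, and how the text is exposed at a URL is left out.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render a header and result rows as CSV text, quoting fields when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

A sheet cell would then pull the data with a formula of the form `=IMPORTDATA("<url of the CSV>")`.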

Data problem 2: Geospatial Analytics

Geospatial analytics help us answer some common questions such as:

- Which areas have low or high demand in a specific time frame?

- Which areas have low or high supply at a given time?

- Can we ask our drivers to move from low-demand areas to high-demand ones?

- How should we present this kind of data: administrative areas (districts, wards), heatmap, hexagon, pin map?
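The demand and supply questions can be approximated by binning points into grid cells and comparing counts per cell. The sketch below uses a plain square latitude/longitude grid as a simpler stand-in for hexagons or administrative areas; the function names, the 0.01 degree cell size (roughly 1.1 km of latitude), and the sample coordinates are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
Cell = Tuple[int, int]       # integer grid indices


def cell_of(point: Point, cell_deg: float = 0.01) -> Cell:
    """Map a point to the square grid cell that contains it."""
    lat, lng = point
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


def count_by_cell(points: Iterable[Point], cell_deg: float = 0.01) -> Counter:
    """Count how many points fall into each grid cell."""
    return Counter(cell_of(p, cell_deg) for p in points)


def undersupplied_cells(
    orders: Iterable[Point], drivers: Iterable[Point], cell_deg: float = 0.01
) -> List[Tuple[Cell, int]]:
    """Return cells where orders outnumber drivers, largest gap first."""
    demand = count_by_cell(orders, cell_deg)
    supply = count_by_cell(drivers, cell_deg)
    # Counter returns 0 for cells that have no drivers.
    gaps = [(cell, demand[cell] - supply[cell]) for cell in demand]
    return sorted((g for g in gaps if g[1] > 0), key=lambda g: -g[1])
```

Cells returned by `undersupplied_cells` are candidates for asking nearby drivers to relocate; filtering the input points by time window answers the "specific time frame" variant of the question.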

Geospatial analytics stack

MongoDB -> CartoDB (a geo-database) -> CartoJs (visualization layer) -> Leaflet (interactive base-map)

The main technologies we are using are Carto (SaaS) and Leaflet (front-end library).

Geospatial analytics showcases

Summary

- We’re making use of open source software as collective intelligence to minimize our maintenance effort.

- We’re still exploring new ways to process and present data, and we think chat apps are a promising channel for this.

- Our team motto: ‘Keep things simple’

THANK YOU FOR LISTENING

AHAMOVE

YOUR PRIVATE MOVER