Mercari meets MySQL Analytics Service
SRE Team of Japan
Mercari, Inc.
About mercari
About the company
Japan’s first unicorn: listed in June 2018 on the Tokyo Stock Exchange’s Mothers market, a board for high-growth companies
Established: February 1, 2013
Offices: Tokyo, Sendai, Fukuoka, Palo Alto, Portland, Boston
Headcount: approx. 1,800 (including subsidiaries)
What is Mercari?
● Service start: July 2013
● Platforms: Android, iOS, web browsers
● Usage fee: free
○ Commission fee for sold items: 10% of the sales price
A C2C marketplace app that lets users enjoy buying and selling
Mercari in the U.S.
Focusing on a global marketplace: doing business in Japan and the U.S.
Succeeding in a market as large and diverse as the U.S. is a key milestone in achieving our mission.
“The Selling App.”
Data democratization
● All Mercari employees have the opportunity to be trained by the BI team to write SQL
○ They issue queries against anonymized data in MySQL and Google BigQuery (BQ)
○ Queries that are too slow on MySQL are run on Google BigQuery instead
Architecture
Production DB -> PII Filter -> Anon-DB: replication (synced within a few seconds)
Anon-DB -> Google BigQuery (BQ): ETL (synced once per day)
Production DB and Anon-DB run on bare-metal servers
Problems
● Google BigQuery does not support transactions or guarantee consistency
● The large-volume ETL system is too complex to manage
Synchronizing Anon-DB and BigQuery is painful...
● Loading data into Google BigQuery is append-based
● As a result:
○ Rows are easily duplicated
○ If wrong data is loaded, we have to overwrite the table
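Why append-only loading is fragile can be sketched in a few lines of Python. This is a hypothetical illustration, not Mercari's ETL: it assumes each row carries a primary key `id` and a version column `updated_at` (both names are assumptions), models an append-mode load as a plain list append, and shows the usual clean-up of keeping only the newest row per key (`dedupe_latest` is an illustrative helper).

```python
from typing import Dict, List

# Hypothetical row shape: {"id": ..., "updated_at": ..., other columns...}
Row = Dict[str, object]


def append_load(table: List[Row], batch: List[Row]) -> None:
    """Append-only load: rows are added as-is, with no primary-key check."""
    table.extend(batch)


def dedupe_latest(table: List[Row]) -> List[Row]:
    """Keep only the row with the greatest updated_at for each id, sorted by id."""
    latest: Dict[object, Row] = {}
    for row in table:
        current = latest.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])


if __name__ == "__main__":
    table: List[Row] = []
    batch = [
        {"id": 1, "updated_at": 10, "status": "on_sale"},
        {"id": 2, "updated_at": 10, "status": "sold_out"},
    ]
    append_load(table, batch)
    append_load(table, batch)  # the same daily batch retried: duplicates appear
    append_load(table, [{"id": 1, "updated_at": 20, "status": "sold_out"}])
    print(len(table))  # 5 rows stored for only 2 distinct items
    print(dedupe_latest(table))
```

In BigQuery itself the same keep-the-latest logic is typically written as a query using `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)`, which has to be maintained for every appended table.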
We expect the system to become simpler
Production DB -> PII Filter -> Anon-DB: replication
Anon-DB -> MySQL Analytics Service (MySQL with RAPID): ETL replaced by replication
Production DB and Anon-DB: bare-metal servers; MySQL Analytics Service: Oracle Cloud
Functionality
● Tables have to be loaded into the RAPID cluster explicitly:
-- Define RAPID as the secondary engine
-- For a new table:
mysql> CREATE TABLE t1 (a INT, b BLOB) SECONDARY_ENGINE RAPID;
-- Or, if the table already exists:
mysql> ALTER TABLE t1 SECONDARY_ENGINE=RAPID;
-- Then load it into RAPID:
mysql> ALTER TABLE t1 SECONDARY_LOAD;
● But this is much simpler than building and maintaining an ETL pipeline
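When many existing tables have to be offloaded, the two ALTER TABLE statements can be generated per table. The helper below, `rapid_offload_statements`, is hypothetical: it only builds SQL text, assumes unqualified table names, and leaves execution to whatever MySQL client is in use.

```python
from typing import Iterable, List


def rapid_offload_statements(tables: Iterable[str]) -> List[str]:
    """Build SQL that marks existing tables for RAPID and then loads them.

    Hypothetical helper: assumes unqualified table names (no "db.table"),
    and only returns statement text; it does not connect to MySQL.
    """
    statements: List[str] = []
    for table in tables:
        # Quote the identifier with backticks, doubling any embedded backtick.
        ident = "`" + table.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {ident} SECONDARY_ENGINE=RAPID;")
        statements.append(f"ALTER TABLE {ident} SECONDARY_LOAD;")
    return statements


if __name__ == "__main__":
    for stmt in rapid_offload_statements(["items", "users", "transaction_evidences"]):
        print(stmt)
```

Generating the text separately makes it easy to review the statements before pasting them into the `mysql>` prompt.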
Performance – Top-K
SELECT
  created, COUNT(*)
FROM
  items
WHERE
  status = 'sold_out' AND
  shipping_method = 17
GROUP BY
  created
ORDER BY 2 DESC
LIMIT 1000;
Google BigQuery: 5.5 sec
MySQL Analytics Service: 0.40 sec
On-Prem MySQL: over 2 hrs
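To make the semantics of the Top-K query concrete, here is a toy reproduction using Python's built-in `sqlite3` module. The schema and rows are hypothetical, and SQLite is only a stand-in for checking what the query returns, not for measuring performance; `top_k_created` is an illustrative name.

```python
import sqlite3

# Hypothetical toy rows: (created, status, shipping_method).
ROWS = [
    ("2019-01-01", "sold_out", 17),
    ("2019-01-01", "sold_out", 17),
    ("2019-01-01", "sold_out", 17),
    ("2019-01-02", "sold_out", 17),
    ("2019-01-02", "sold_out", 17),
    ("2019-01-03", "sold_out", 17),
    ("2019-01-03", "on_sale", 17),   # excluded by the status filter
    ("2019-01-03", "sold_out", 5),   # excluded by the shipping_method filter
]

# Same shape as the slide's query, with LIMIT bound as a parameter.
TOP_K_SQL = """
SELECT created, COUNT(*)
FROM items
WHERE status = 'sold_out' AND shipping_method = 17
GROUP BY created
ORDER BY 2 DESC
LIMIT ?
"""


def top_k_created(rows, k):
    """Load rows into an in-memory table and return the k busiest days."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (created TEXT, status TEXT, shipping_method INTEGER)")
        conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_K_SQL, (k,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_k_created(ROWS, 2))  # [('2019-01-01', 3), ('2019-01-02', 2)]
```

`ORDER BY 2 DESC` has no tie-breaker, so days with equal counts may come back in any order; adding `created` as a secondary sort key would make the output deterministic.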
Performance - JOINS SEVERAL TABLES
SELECT COUNT(i.id) cnt
FROM items i
LEFT JOIN users buyer ON i.buyer_id = buyer.id
LEFT JOIN users seller ON i.seller_id = seller.id
LEFT JOIN transaction_evidences te ON i.id = te.item_id;
Google BigQuery: 1 min 52 sec
MySQL Analytics Service: 38.0 sec
On-Prem MySQL: over 9 hrs
Conclusion - Performance
Speedup of MySQL Analytics Service
Query type             vs Google BigQuery   vs On-Prem MySQL 5.7
SIMPLE COUNT           21x                  3,200x
GROUP BY               24x                  54x
GROUP BY LongRange     0.72x                over 286x
Top-K                  14x                  over 45,000x
JOINS SEVERAL TABLES   2.9x                 over 844x
Dramatically improved!
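The BigQuery ratios can be cross-checked against the raw timings on the Top-K and JOIN slides. A minimal sketch, assuming each ratio is simply the Google BigQuery time divided by the MySQL Analytics Service time (the slides do not state the formula):

```python
# Timings copied from the Top-K and JOIN slides, in seconds.
TIMINGS = {
    "Top-K": {"bigquery": 5.5, "analytics_service": 0.40},
    "JOINS SEVERAL TABLES": {"bigquery": 1 * 60 + 52, "analytics_service": 38.0},
}


def speedup(baseline_sec: float, candidate_sec: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_sec / candidate_sec


if __name__ == "__main__":
    for name, t in TIMINGS.items():
        ratio = speedup(t["bigquery"], t["analytics_service"])
        print(f"{name}: {ratio:.3f}x vs Google BigQuery")
```

Running it prints 13.750x and 2.947x, consistent with the rounded 14x and 2.9x in the table.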
Thanks!