Problem
Searching for a solution
Architecture
Installation
Operations
What we have
A large volume of data (3 TB)
Slow report generation
3 servers (64 GB RAM, 1 TB SSD each)
Searching for a solution
MPP architecture
Reasonable cost / open source
Easy to use and administer
A sane query language
Ready-made BI tools available
Open source
MPP architecture
Extension (not fork)
cstore_fdw + pg_shard
No DML
Limited joins
No CTE
Amazon dwh
PostgreSQL 8.2
Column store
MPP architecture
$13k per TB per year
Open source
MPP architecture
Hybrid row/column store
PostgreSQL 8.2 (8.3)
PostgreSQL 8.2 (8.3)
8.3 Full text search (Apache SOLR)
8.4 Analytic (window) functions (sum(baz) OVER (PARTITION BY foo))
8.4 CTE (WITH foo AS (SELECT * FROM bar))
9.5 GROUPING SETS/CUBE/ROLLUP
9.6 parallel seq scan/aggregate (by design)
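For readers who have not used the two SQL features quoted on this slide, here is a minimal sketch of what a window function and a CTE return. It assumes an in-memory SQLite database (3.25 or newer) as a stand-in, not Greenplum, and the table `bar` with columns `foo` and `baz` is borrowed from the slide's own snippets.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse here; it is NOT Greenplum.
# Window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, baz INTEGER)")
conn.executemany(
    "INSERT INTO bar (foo, baz) VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 10)],
)

# Window function: each row is kept and also carries its group's total.
window_rows = conn.execute(
    "SELECT foo, baz, sum(baz) OVER (PARTITION BY foo) AS foo_total "
    "FROM bar ORDER BY foo, baz"
).fetchall()

# CTE: name a subquery once, then select from it like a table.
cte_rows = conn.execute(
    "WITH totals AS (SELECT foo, sum(baz) AS total FROM bar GROUP BY foo) "
    "SELECT foo, total FROM totals ORDER BY foo"
).fetchall()

print(window_rows)  # [('a', 1, 3), ('a', 2, 3), ('b', 10, 10)]
print(cte_rows)     # [('a', 3), ('b', 10)]
```

The window query returns one row per input row, while the CTE query collapses each group to a single row.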
Fast
Very fast
Open source
Very specific SQL
Yandex ClickHouse
Horrible joins
Can't delete data (*)
Alexander Zaitsev, "Moving to Yandex ClickHouse" (original title: «Переезжаем на Yandex ClickHouse»)
Tests
25M rows
[Chart: Redshift vs. Greenplum. Time in seconds (lower is better). Categories: 1 week, 1 month, 3 month]
[Chart: Yandex ClickHouse vs. Greenplum. Time in seconds (lower is better). Categories: Test 1, Test 2, Test 3]
Architecture
[Diagram: SQL, Master Node, three Segment hosts]
More details in the Tinkoff company blog on Habr
Beginner's guide
Greenplum installation guide
10G interconnect
More disks (RAID 10)
swapoff
Data loading
gpfdist: parallel file distribution program (more than 100GB)
s3 external tables (read/write/gzip)
COPY on master node (less than 100GB)
Don’t forget about VACUUM
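The size-based choice between COPY and gpfdist on the data loading slide can be sketched as a small helper that emits the SQL for each path. This is an illustration under stated assumptions, not the deck's actual tooling: the function name, the `etl-host:8081` gpfdist address, the `ext_` table prefix, and the CSV-with-header format are all hypothetical. The file path is assumed to be relative to the directory a running gpfdist serves.

```python
# The slide's rule of thumb: about 100 GB separates COPY from gpfdist.
GPFDIST_THRESHOLD_BYTES = 100 * 1024**3

def load_statements(table, path, size_bytes,
                    gpfdist_host="etl-host", gpfdist_port=8081):
    """Return the SQL statements for loading one CSV file into `table`.

    Host, port and the ext_ prefix are hypothetical placeholders.
    """
    if size_bytes < GPFDIST_THRESHOLD_BYTES:
        # Small load: plain COPY, which reads the file on the master node.
        return [f"COPY {table} FROM '{path}' WITH CSV HEADER;"]
    # Large load: segments pull the file in parallel from gpfdist
    # through a temporary external table.
    ext = f"ext_{table}"
    url = f"gpfdist://{gpfdist_host}:{gpfdist_port}/{path.lstrip('/')}"
    return [
        f"CREATE EXTERNAL TABLE {ext} (LIKE {table}) "
        f"LOCATION ('{url}') FORMAT 'CSV' (HEADER);",
        f"INSERT INTO {table} SELECT * FROM {ext};",
        f"DROP EXTERNAL TABLE {ext};",
    ]

small = load_statements("events", "/data/events.csv", 5 * 1024**3)
big = load_statements("events", "/data/events.csv", 200 * 1024**3)
print("\n".join(small))
print("\n".join(big))
```

Whichever path is used, the slide's reminder still applies: run VACUUM on tables that are reloaded or updated often.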
Data loading
No JSON type
pl/python + ujson
Don’t use JSON, please
Make columns from JSON fields (schema)
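One way to "make columns from JSON fields" is to flatten each document into a fixed schema before it reaches the database. The sketch below does this with the Python standard library. The schema, field names, and sample records are hypothetical; the deck does not show its actual fields.

```python
import csv
import io
import json

# Hypothetical target schema: (column name, path of keys inside the document).
SCHEMA = [
    ("user_id", ("user", "id")),
    ("origin", ("search", "origin")),
    ("destination", ("search", "destination")),
]

def extract(doc, path):
    """Walk nested dicts along `path`; return None when a key is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc

def json_lines_to_csv(lines):
    """Turn newline-delimited JSON into CSV text, one column per schema field."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _ in SCHEMA])
    for line in lines:
        doc = json.loads(line)
        # csv.writer renders None as an empty field.
        writer.writerow([extract(doc, path) for _, path in SCHEMA])
    return out.getvalue()

sample = [
    '{"user": {"id": 7}, "search": {"origin": "MOW", "destination": "LED"}}',
    '{"user": {"id": 8}, "search": {"origin": "LED"}}',
]
csv_text = json_lines_to_csv(sample)
print(csv_text)
```

The resulting CSV has plain typed columns, so it can be loaded like any other file and queried without parsing JSON at read time.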
Default Monitoring
Greenplum command center
Basic charts and metrics
Query monitor
Historic data
Monitoring in Aviasales
CPU+RAM+IO+LOCKS and other PostgreSQL stuff
Resource queues
Spilling queries: gp_toolkit.gp_workfile* views
Telegraf — collect metrics
Grafana dashboards (4.0 alerts)
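A custom metric such as workfile spill size could reach Telegraf as InfluxDB line protocol, for example printed by a script that Telegraf's exec input runs with the `influx` data format. The deck does not say how its collection is wired, so this is only a sketch under that assumption. The measurement name, tag, field names, and values are hypothetical, and only integer and float fields are handled.

```python
def escape(value):
    """Escape commas, equals signs and spaces in tag/field keys and tag values."""
    text = str(value)
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text

def format_field(value):
    """Render a numeric field value; integers carry the 'i' suffix."""
    if isinstance(value, bool):
        raise TypeError("boolean fields are not handled in this sketch")
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    raise TypeError(f"unsupported field type: {type(value).__name__}")

def to_line(measurement, tags, fields, ts_ns):
    """Build one line: measurement,tags fields timestamp (nanoseconds)."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape(k)}={escape(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{name}{tag_part} {field_part} {ts_ns}"

# Hypothetical values, as if read from a gp_toolkit.gp_workfile* view.
line = to_line(
    "gp_workfiles",
    {"host": "mdw"},
    {"spill_bytes": 1048576, "spill_files": 3},
    1500000000000000000,
)
print(line)
```

Once such lines land in the metrics store, a Grafana panel and alert can be defined on the field like on any built-in metric.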
5TB compressed data (14TB uncompressed)
No aggregates
Near realtime BI