PostgreSQL 9.4: What's New?
Matheus Espanhol, March 5, 2015
PostgreSQL 9.4
Released on December 18, 2014
1 year and 3 months after 9.3
Development started on June 14, 2013
Logical Decoding of the WAL
Streaming Replication 9.4
SELECT unnest(enumvals) FROM pg_settings WHERE name = 'wal_level';
   unnest
-------------
 minimal
 archive
 hot_standby
 logical
ALTER SYSTEM SET wal_level TO 'hot_standby';
ALTER SYSTEM SET max_wal_senders TO 3;
ALTER SYSTEM SET listen_addresses TO '*';
$ cat $PGDATA/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by ALTER SYSTEM command.
wal_level = 'hot_standby'
max_wal_senders = '3'
listen_addresses = '*'
SELECT name,setting,context FROM pg_settings WHERE name IN ('wal_level','max_wal_senders','listen_addresses');
       name       |  setting  |  context
------------------+-----------+------------
 listen_addresses | localhost | postmaster
 max_wal_senders  | 0         | postmaster
 wal_level        | minimal   | postmaster
$ pg_ctl restart
SELECT name,setting,context FROM pg_settings WHERE name IN ('wal_level','max_wal_senders','listen_addresses');
       name       |   setting   |  context
------------------+-------------+------------
 listen_addresses | *           | postmaster
 max_wal_senders  | 3           | postmaster
 wal_level        | hot_standby | postmaster
Streaming Replication 9.4
CREATE TABLESPACE tblspc1 LOCATION '/tablespace1' WITH (random_page_cost=2.0);
CREATE ROLE repl LOGIN REPLICATION;
In pg_hba.conf:
host replication repl 172.16.124.211/32 trust
SELECT pg_reload_conf();
On the slave server:
pg_basebackup -h 172.16.124.211 -U repl --pgdata=/postgres/data_slave --progress --write-recovery-conf --max-rate=50M --tablespace-mapping=/tablespace1=/tablespace2 --xlogdir=/xlog
129347/129347 kB (100%), 2/2 tablespaces
Streaming Replication 9.4
Still on the slave server, in postgresql.conf:
hot_standby = on
$ pg_ctl -D /postgres/data_slave start
LOG: database system was interrupted; last known up at 2015-03-04 09:27:27 BRT
LOG: entering standby mode
LOG: started streaming WAL from primary at 0/10000000 on timeline 1
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000010 has already been removed
How to solve it?
Point recovery.conf at the PITR backup server
Configure wal_keep_segments on the master
Use replication slots
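The second option above can be applied without a restart; a minimal sketch (the value 64 is an arbitrary example, not a recommendation):

```sql
-- Keep extra WAL segments on the master so a lagging standby can catch up.
-- 64 is an illustrative value; size it to your write volume and outage window.
ALTER SYSTEM SET wal_keep_segments TO 64;
SELECT pg_reload_conf();
```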
Streaming Replication 9.4
On the master:
ALTER SYSTEM SET max_replication_slots TO 1;
$ pg_ctl -D /postgres/data restart
SELECT * FROM pg_create_physical_replication_slot('slave1');
 slot_name | xlog_position
-----------+---------------
 slave1    |
SELECT slot_name,slot_type,active,restart_lsn FROM pg_replication_slots;
 slot_name | slot_type | active | restart_lsn
-----------+-----------+--------+-------------
 slave1    | physical  | f      |
Replication Slots
On the slave:
Run pg_basebackup again
Set hot_standby = on again
Add this parameter to recovery.conf:
primary_slot_name = 'slave1'
$ pg_ctl -D /postgres/data_slave start
LOG: database system was interrupted; last known up at 2015-03-04 09:54:16 BRT
LOG: entering standby mode
LOG: consistent recovery state reached at 0/17000060
LOG: redo starts at 0/17000060
LOG: record with zero length at 0/18000060
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/18000000 on timeline 1
On the master:
SELECT slot_name,slot_type,active,restart_lsn FROM pg_replication_slots;
 slot_name | slot_type | active | restart_lsn
-----------+-----------+--------+-------------
 slave1    | physical  | t      | 0/180007F8
SELECT pg_drop_replication_slot('slave1');
ERROR: replication slot "slave1" is already active
Replication Slots
Logical Decoding
On the master:
ALTER SYSTEM SET wal_level TO 'logical';
SELECT * FROM pg_create_logical_replication_slot('dextradb_slot','test_decoding');
   slot_name   | xlog_position
---------------+---------------
 dextradb_slot | 0/18055988
SELECT * FROM pg_replication_slots;
slot_name    | dextradb_slot
plugin       | test_decoding
slot_type    | logical
datoid       | 16384
database     | dextradb
active       | f
xmin         |
catalog_xmin | 787
restart_lsn  | 0/18055950
On the master:
INSERT INTO newtbl VALUES (generate_series(1,10),'Nome'||generate_series(1,10));
SELECT * FROM pg_logical_slot_get_changes('dextradb_slot',NULL,NULL);
  location  | xid |                             data
------------+-----+---------------------------------------------------------------
 0/18055A98 | 787 | BEGIN 787
 0/18055A98 | 787 | table public.newtbl: INSERT: id[integer]:1 nome[text]:'Nome1'
 0/180570D0 | 787 | table public.newtbl: INSERT: id[integer]:2 nome[text]:'Nome2'
 0/18057118 | 787 | table public.newtbl: INSERT: id[integer]:3 nome[text]:'Nome3'
 0/18057160 | 787 | table public.newtbl: INSERT: id[integer]:4 nome[text]:'Nome4'
 0/180571A8 | 787 | table public.newtbl: INSERT: id[integer]:5 nome[text]:'Nome5'
 ...
 0/18057398 | 787 | COMMIT 787
pg_recvlogical -h 172.16.124.211 -U repl --dbname=dextradb --slot=dextradb_slot --start -f output.txt
Logical Decoding
Per-table control over UPDATE and DELETE decoding:
ALTER TABLE newtbl REPLICA IDENTITY FULL;
UPDATE newtbl SET nome = 'novonome7' WHERE id = 7;
SELECT data FROM pg_logical_slot_get_changes('dextradb_slot',NULL,NULL);
                                         data
---------------------------------------------------------------------------------------
 BEGIN 861
 table public.newtbl: UPDATE: old-key: id[integer]:7 nome[text]:'Nome7' new-tuple: id[integer]:7 nome[text]:'novonome7'
 COMMIT 861
Logical Decoding
JSONB
jsonb
Binary storage of JSON documents
GIN indexes on documents and subdocuments
New manipulation functions
New operators
A combination of NoSQL and SQL
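A few of the new jsonb operators can be tried with plain literals; a minimal sketch (the column aliases are just for illustration):

```sql
SELECT '{"a": {"b": 1}}'::jsonb -> 'a'           AS subdocument,  -- -> returns jsonb
       '{"a": {"b": 1}}'::jsonb #>> '{a,b}'      AS text_at_path, -- #>> returns text
       '{"a": 1}'::jsonb ? 'a'                   AS has_key,      -- key existence
       '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}'   AS contains;     -- containment
```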
jsonb
CREATE TABLE json_table(id SERIAL PRIMARY KEY, data jsonb);
INSERT INTO json_table(data) VALUES($${ "_id" : { "$oid" : "52cdef7c4bab8bd675297d8a" }, "name" : "Wetpaint", "permalink" : "abc2" …
SELECT id,jsonb_object_keys(data) FROM json_table WHERE id = 1;
 id | jsonb_object_keys
----+-------------------
  1 | _id
  1 | name
  1 | image
  1 | offices
 ...
jsonb
SELECT jsonb_typeof(data->'founded_year') FROM json_table LIMIT 1;
 jsonb_typeof
--------------
 number
SELECT id,
       data->'twitter_username' AS twitter_account,
       data->'founded_year' AS founded_year
FROM json_table
WHERE data->'founded_year' >= '2012'
ORDER BY data->'twitter_username' DESC;
  id   | twitter_account | founded_year
-------+-----------------+--------------
  1114 | "widgetbox"     | 2012
 10637 | "whoscall_app"  | 2013
 16888 | "vistagen"      | 2013
   557 | "tripodsocial"  | 2012
 13689 | "Topify"        | 2012
  3013 | "springleap"    | 2012
  1950 | "skydeck"       | 2012
jsonb
EXPLAIN ANALYZE SELECT data->'tag_list', data->'founded_year' FROM json_table WHERE data->'tag_list' ? 'socialnews';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on json_table  (cost=0.00..2566.11 rows=19 width=886)
                         (actual time=8.226..266.002 rows=2 loops=1)
   Filter: ((data -> 'tag_list'::text) ? 'socialnews'::text)
   Rows Removed by Filter: 18799
 Planning time: 0.089 ms
 Execution time: 266.032 ms
jsonb
CREATE INDEX ON json_table USING gin((data->'tag_list'));
EXPLAIN ANALYZE SELECT data->'tag_list', data->'founded_year' FROM json_table WHERE data->'tag_list' ? 'socialnews';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on json_table  (cost=12.15..83.33 rows=19 width=886)
                                 (actual time=0.458..0.506 rows=2 loops=1)
   Recheck Cond: ((data -> 'tag_list'::text) ? 'socialnews'::text)
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on json_table_expr_idx1
         (cost=0.00..12.14 rows=19 width=0) (actual time=0.024..0.024 rows=2 loops=1)
         Index Cond: ((data -> 'tag_list'::text) ? 'socialnews'::text)
 Planning time: 0.239 ms
 Execution time: 0.545 ms
jsonb
CREATE INDEX ON json_table USING gin((data->'acquisition'->'acquiring_company') jsonb_path_ops);
CREATE INDEX
EXPLAIN ANALYZE SELECT data->'acquisition'->'acquiring_company'->>'name',
       data->'number_of_employees'
FROM json_table
WHERE data->'acquisition'->'acquiring_company' @> '{"name": "Google", "permalink": "google"}';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on json_table  (cost=20.15..91.47 rows=19 width=879)
                                 (actual time=0.229..6.611 rows=71 loops=1)
   Recheck Cond: (((data -> 'acquisition'::text) -> 'acquiring_company'::text) @> '{"name": "Google", "permalink": "google"}'::jsonb)
   Heap Blocks: exact=47
   ->  Bitmap Index Scan on json_table_expr_idx2
         (cost=0.00..20.15 rows=19 width=0) (actual time=0.059..0.059 rows=71 loops=1)
         Index Cond: (((data -> 'acquisition'::text) -> 'acquiring_company'::text) @> '{"name": "Google", "permalink": "google"}'::jsonb)
 Planning time: 0.311 ms
 Execution time: 6.752 ms
jsonb
The JSON type is still useful in the following situations:
When indexed access is not needed
When you need to preserve:
- Whitespace
- Key ordering
- Duplicate keys
Faster import and export
For every other situation, use JSONB
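The preservation difference is easy to see with a literal containing duplicate keys; json keeps the input verbatim while jsonb normalizes it (a sketch runnable in any 9.4+ session):

```sql
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- kept exactly as typed: {"b": 1, "a": 2, "a": 3}

SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- normalized (keys sorted, last duplicate wins): {"a": 3, "b": 1}
```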
jsonb
Materialized Views without locks
CREATE MATERIALIZED VIEW mvw_venda_por_cidade AS
SELECT ROW_NUMBER() OVER (ORDER BY estado) AS id,
       estado, cidade, valor_venda,
       SUM(valor_venda::numeric) OVER w AS total_venda
FROM (SELECT data->>'name' AS nome,
             data->'acquisition'->>'price_amount' AS valor_venda,
             offices.value->>'state_code' AS estado,
             offices.value->>'city' AS cidade
      FROM json_table,
           jsonb_array_elements(data->'offices') AS offices(value)
      WHERE data->'acquisition'->>'price_amount' <> 'null') AS t
WINDOW w AS (PARTITION BY cidade)
ORDER BY estado;
CREATE UNIQUE INDEX ON mvw_venda_por_cidade(id);
REFRESH MATERIALIZED VIEW CONCURRENTLY mvw_venda_por_cidade;
Materialized Views
GIN Indexes
+ compact
+ fast
SELECT taglist FROM teste_fts WHERE taglist @@ 'bigdata & socialmedia'::tsquery;
                           taglist
--------------------------------------------------------------
 'audiencetargeting' 'bigdata' 'socialmedia' 'webplugins'
 'analytics' 'bigdata' 'facebook' 'marketing' 'monitoring'
 ...
EXPLAIN ANALYZE SELECT * FROM teste_fts WHERE taglist @@ 'bigdata & socialmedia'::tsquery;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on teste_fts  (cost=0.00..449.00 rows=1 width=75)
                        (actual time=0.021..4.681 rows=2 loops=1)
   Filter: (taglist @@ '''bigdata'' & ''socialmedia'''::tsquery)
   Rows Removed by Filter: 18798
 Planning time: 0.160 ms
 Execution time: 4.709 ms
GIN Indexes
CREATE INDEX taglist_gin_idx ON teste_fts USING gin(taglist);
SELECT relname,indexrelname,idx_scan,pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes WHERE relname = 'teste_fts';
  relname  |  indexrelname   | idx_scan | pg_size_pretty
-----------+-----------------+----------+----------------
 teste_fts | taglist_gin_idx |        0 | 2016 kB
On 9.3:
  relname  |  indexrelname   | idx_scan | pg_size_pretty
-----------+-----------------+----------+----------------
 teste_fts | taglist_gin_idx |        2 | 2312 kB
The size reduction varies with how often the column's values repeat; the index can be up to 5 times smaller than on 9.3.
GIN Indexes
EXPLAIN ANALYZE SELECT * FROM teste_fts WHERE taglist @@ 'bigdata & socialmedia'::tsquery;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on teste_fts  (cost=10020.00..100024.02 rows=1 width=75)
                                (actual time=0.031..0.032 rows=2 loops=1)
   Recheck Cond: (taglist @@ '''bigdata'' & ''socialmedia'''::tsquery)
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on taglist_gin_idx
         (cost=0.00..20.00 rows=1 width=0) (actual time=0.025..0.025 rows=2 loops=1)
         Index Cond: (taglist @@ '''bigdata'' & ''socialmedia'''::tsquery)
 Planning time: 0.175 ms
 Execution time: 0.058 ms
GIN Indexes
On 9.3:
EXPLAIN ANALYZE SELECT * FROM teste_fts WHERE taglist @@ 'bigdata & socialmedia'::tsquery;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on teste_fts  (cost=20.00..24.02 rows=1 width=75)
                                (actual time=0.146..0.154 rows=2 loops=1)
   Recheck Cond: (taglist @@ '''bigdata'' & ''socialmedia'''::tsquery)
   ->  Bitmap Index Scan on teste_fts_taglist_gin_idx
         (cost=0.00..20.00 rows=1 width=0) (actual time=0.134..0.134 rows=2 loops=1)
         Index Cond: (taglist @@ '''bigdata'' & ''socialmedia'''::tsquery)
 Total runtime: 0.200 ms
GIN Indexes
Dynamic Background Workers
Infrastructure introduced in 9.3
Allows starting processes for specific tasks
These processes can now be started dynamically
Shared memory can be allocated dynamically
Base functionality for:
- Parallel processing
- Queue mechanisms
- Scheduling
...
SQL... New possibilities
SELECT AVG(total_venda) AS media_geral,
       AVG(total_venda) FILTER (WHERE estado = 'CA') AS media_ca
       -- old equivalent: AVG(CASE WHEN estado = 'CA' THEN total_venda END)
FROM mvw_venda_por_cidade;
     media_geral     |      media_ca
---------------------+---------------------
 5732497468.13186813 | 9165636101.22699387
FILTER
SELECT * FROM ROWS FROM (
json_each('{"last_name": "Mulcahy", "first_name": "Susan"}'),unnest(array['a','b'],array['a'])
) WITH ORDINALITY;
    key     |   value   | unnest | unnest | ordinality
------------+-----------+--------+--------+------------
 last_name  | "Mulcahy" | a      | a      |          1
 first_name | "Susan"   | b      |        |          2
WITH ORDINALITY/ROWS FROM/unnest(... , ...)
CREATE TABLE tabela AS SELECT generate_series(1,80) AS qtd;
INSERT INTO tabela VALUES (10);
SELECT unnest(percentile_disc(array[0.25,0.5,0.75,1]) WITHIN GROUP (ORDER BY qtd)),
       rank(80) WITHIN GROUP (ORDER BY qtd),
       dense_rank(80) WITHIN GROUP (ORDER BY qtd),
       mode() WITHIN GROUP (ORDER BY qtd)
FROM tabela;
 unnest | rank | dense_rank | mode
--------+------+------------+------
     20 |   81 |         80 |   10
     40 |   81 |         80 |   10
     60 |   81 |         80 |   10
     80 |   81 |         80 |   10
WITHIN GROUP
+ Performance
Reduced locking for some forms of ALTER TABLE
Parallel writes to the WAL buffers
Selective WAL writes for UPDATEs: only the changed columns
Faster aggregation in window functions
Performance improvements
+ Statistics
pg_stat_archiver
ALTER SYSTEM SET archive_mode TO on;
ALTER SYSTEM SET archive_command TO 'cp %p /backup/archives/%f';
$ pg_ctl restart
pg_stat_archiver
SELECT *,
       current_setting('archive_mode')::BOOLEAN
         AND (last_failed_wal IS NULL OR last_failed_wal <= last_archived_wal) AS is_archiving,
       CAST (archived_count AS NUMERIC)
         / EXTRACT (EPOCH FROM age(now(), stats_reset)) AS current_archived_wals_per_second
FROM pg_stat_archiver;
-[ RECORD 1 ]--------------------+-------------------------------
archived_count                   | 4
last_archived_wal                | 00000001000000000000001F
last_archived_time               | 2015-03-05 08:22:29.798028-03
failed_count                     | 15
last_failed_wal                  | 00000001000000000000001E
last_failed_time                 | 2015-03-05 08:21:37.663063-03
stats_reset                      | 2015-03-03 15:43:41.344486-03
is_archiving                     | t
current_archived_wals_per_second | 2.73349069156828e-05
n_mod_since_analyze
ANALYZE newtbl;
SELECT relname,n_tup_upd,n_tup_del,n_tup_hot_upd,n_dead_tup,n_mod_since_analyze,last_analyze
FROM pg_stat_user_tables;
 relname | n_tup_upd | n_mod_since_analyze |         last_analyze
---------+-----------+---------------------+-------------------------------
 newtbl  |        23 |                   0 | 2015-03-05 08:34:05.380501-03
UPDATE newtbl SET nome = 'novonome5' WHERE id = 5;
 relname | n_tup_upd | n_mod_since_analyze |         last_analyze
---------+-----------+---------------------+-------------------------------
 newtbl  |        24 |                   1 | 2015-03-05 08:34:05.380501-03
Other things...
On the slave, add to recovery.conf:
recovery_min_apply_delay = 15min
pg_ctl -D /postgres/data_slave start
SELECT now(),pg_last_xact_replay_timestamp();
              now              | pg_last_xact_replay_timestamp
-------------------------------+-------------------------------
 2015-03-04 16:00:46.344672-03 | 2015-03-04 15:45:45.881235-03
recovery_min_apply_delay
ALTER SYSTEM SET logging_collector TO on;
ALTER SYSTEM SET log_lock_waits TO on;
pg_ctl restart
BEGIN;
UPDATE newtbl SET nome = 'novonome6' WHERE id = 6;
COMMIT;
LOG: process 32106 still waiting for ShareLock on transaction 896 after 1000.059 ms
DETAIL: Process holding the lock: 31858. Wait queue: 32106.
CONTEXT: while updating tuple (0,12) in relation "newtbl"
STATEMENT: UPDATE newtbl SET nome = 'nome6' WHERE id = 6;
LOG: process 32106 acquired ShareLock on transaction 896 after 255232.737 ms
Detailed lock logging
BEGIN;
UPDATE newtbl SET nome = 'nome6' WHERE id = 6;
COMMIT;
ALTER INDEX ALL IN TABLESPACE pg_default SET TABLESPACE tblspc1;
autovacuum_work_mem
Memory available to autovacuum workers. Default: -1 (falls back to maintenance_work_mem)
huge_pages
Enables Linux huge pages, reducing the CPU cost of memory management. Default: try (uses the feature if available)
session_preload_libraries
Libraries loaded when a connection starts
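All three parameters can be set with ALTER SYSTEM like the earlier examples; the values below are purely illustrative, not recommendations:

```sql
ALTER SYSTEM SET autovacuum_work_mem TO '256MB';               -- example value
ALTER SYSTEM SET huge_pages TO 'try';                          -- the default
ALTER SYSTEM SET session_preload_libraries TO 'auto_explain';  -- example library
-- autovacuum_work_mem and session_preload_libraries take effect on reload
-- (the latter for new sessions); huge_pages requires a server restart.
```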
Administration
recovery_target = 'immediate'
Ends recovery as soon as a consistent state is reached
Added to recovery.conf on the standby server
Useful when:
A fast recovery matters more than having the most recent data
Changed default values:
work_mem: 1MB → 4MB
maintenance_work_mem: 16MB → 64MB
effective_cache_size: 128MB → 4GB
Administration
ANALYZE in stages:
$ vacuumdb --analyze-in-stages
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Selective column updates on updatable VIEWs
Triggers on FOREIGN TABLEs
Stack traces in PL/pgSQL:
GET DIAGNOSTICS var = PG_CONTEXT;
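A minimal sketch of PG_CONTEXT in context (the function name is made up for illustration):

```sql
CREATE OR REPLACE FUNCTION show_stack() RETURNS text AS $$
DECLARE
  stack text;
BEGIN
  -- capture the current PL/pgSQL call stack as text
  GET DIAGNOSTICS stack = PG_CONTEXT;
  RETURN stack;
END;
$$ LANGUAGE plpgsql;

SELECT show_stack();
```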
pg_prewarm extension:
Preload tables into shared_buffers at startup
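Usage is a single function call, sketched here against the newtbl table from the earlier examples:

```sql
CREATE EXTENSION pg_prewarm;
-- read the whole table into shared_buffers; returns the number of blocks loaded
SELECT pg_prewarm('newtbl');
```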
And more...
Migration
Parallel dump and restore
pg_upgrade
Slony
Shall we migrate?
[email protected]@dextra.com.br
Dextra Sistemas
http://www.dextra.com.br/postgres
www.pganalytics.io