View
134
Download
3
Category
Preview:
Citation preview
DEEP DIVE INTO N1QL:INTERNALS AND POWER FEATURES IN COUCHBASE 4.0Keshav Murthy Couchbase Engineeringkeshav@couchbase.com@N1QL @rkeshavmurthy
©2015 Couchbase Inc. 2
Agenda
Query Service OverviewQuery Service ArchitectureN1QL Power FeaturesQ&A
©2015 Couchbase Inc. 4
Couchbase Server Cluster Architecture
4
STORAGE
Couchbase Server 1
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Managed CacheStorage
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 2
Managed Cache
Cluster ManagerCluster Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 3
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 4
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 5
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 6
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Data Service
Index Service
Query Service
Managed CacheStorage
Managed CacheStorage
Managed CacheStorage
Managed CacheStorage
Managed CacheStorage
©2015 Couchbase Inc. 5
Couchbase Server Cluster Service Deployment
5
STORAGE
Couchbase Server 1
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Managed CacheStorage
Data Servi
ceSTORAGE
Couchbase Server 2
Managed Cache
Cluster ManagerCluster Manager
Data Servi
ceSTORAGE
Couchbase Server 3
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Data Servi
ceSTORAGE
Couchbase Server 4
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Query
Service
STORAGE
Couchbase Server 5
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Query
Service
STORAGE
Couchbase Server 6
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Index
Service
Managed CacheStorage
Managed CacheStorage Storage
STORAGE
Couchbase Server 6
SHARD7
SHARD9
SHARD5
SHARDSHARDSHARD
Managed Cache
Cluster ManagerCluster Manager
Index
Service
Storage
Managed Cache
Managed Cache
©2015 Couchbase Inc. 6
N1QL: Query Execution Flow
Clients
1. Submit the query over REST API
8. Query result
2. Parse, Analyze, create Plan
7. Evaluate: Documents to results
3. Scan Request;
index filters
6. Fetch the documents
Index
Service
Query
Service Data
Service
4. Get qualified doc keys
5. Fetch Request, doc keys
SELECT c_id, c_first, c_last, c_max FROM CUSTOMER WHERE c_id = 49165;
{ "c_first": "Joe", "c_id": 49165, "c_last": "Montana", "c_max" : 50000}
©2015 Couchbase Inc. 8
Inside a Query Service
Client
FetchParse Plan Join FilterPre-Aggregate
Offset Limit ProjectSortAggregateScan
Query ServiceInde
x Servi
ce
Data Servi
ce
©2015 Couchbase Inc. 10
Client to Query Service: REST API
Communication protocol is REST on top of HTTP
The database protocol structure is embedded within the REST API.
Query Service is stateless: All query information is embedded within the REST request.
REST is open. All REST clients work with N1QL
All N1QL clients, JDBC, ODBC drivers use REST
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
import requestsimport jsonurl = "http://localhost:8093/query"s1=”SELECT * FROM CUSTOMER WHERE C_ID = 1284";r = requests.post(url, data=s1, auth=('Administrator', 'abc'))print r.json()
©2015 Couchbase Inc. 11
Query Execution: Parse & Semantic Check
Analyzes the Query for syntax & grammar
Only verifies for existence of referenced buckets
Flexible schema means, you can refer to arbitrary attribute names
Use IS MISSING clause to check if the keyname is present
Full reference to JSON structure Nested reference: CUSTOMER.contact.address.state
Array Reference: CUSTOMER.c_contact.phone_number[0]
SQL is enhanced to access & manipulate Arrays
Fetch
ParsePlan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
©2015 Couchbase Inc. 12
Query Execution: Parse & Semantic Check
SELECT c_zip, COUNT(c_id), AVG(c_balance) FROM CUSTOMER WHERE c_state = ‘CA’ AND c_year = 2014ORDER BY COUNT(c_id) DESC LIMIT 100
Simple refererences to the attribute name, just like
columnsUse expressions, just like
SQL
Table/keyspace/bucket references.
Filters on the JSON document work just like SQL
Sorting of the result set
Top N clause.
Fetch
ParsePlan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
©2015 Couchbase Inc. 14
Query Execution: Plan
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
Each query can be executed in several ways
Create the query execution plan Access path for each keyspace reference Decide on the filters to push down Determine Join order and join method Create the execution tree
For each keyspace reference: Look at the available indices Match the filters in the query with index
keys Choose one or more indices for each
keyspace
©2015 Couchbase Inc. 15
Query Execution: Plan
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
EXPLAIN SELECT c_id, c_first, c_middle, c_last, c_balance FROM CUSTOMER WHERE c_w_id = 49 AND c_d_id = 16 AND c_last = ‘Montana’;
Explain provides the JSON representation of the query plan
Focus on the index selection and the predicates pushed down
©2015 Couchbase Inc. 16
Query Execution: Plan
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
{ "#operator": "IndexScan", "index": "CU_W_ID_D_ID_LAST", "keyspace": "CUSTOMER", … "spans": [ { "Range": { "High": [ "49", "16", "\"Montana\"" ], "Inclusion": 3, "Low": [ "49", "16", "\"Montana\"" ] },
©2015 Couchbase Inc. 17
Query Execution: Plan
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan"#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "keyspace": "CUSTOMER", "namespace": "default » }, {"#operator": "Filter”, "condition": "((((`CUSTOMER`.`C_W_ID`) = 49) and ((`CUSTOMER`.`C_D_ID`) = 16)) and ((`CUSTOMER`.`C_LAST`) = \"Montana\"))”},
©2015 Couchbase Inc. 18
Query Execution: Project
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": ”Project", "keyspace": "CUSTOMER", "namespace": "default » }, {
©2015 Couchbase Inc. 21
Query Execution: Scan
Data Service
Global Secondary
Index
View Indexes
Global Secondary
Index
Global Secondary
Index
KeyScan
IndexScan
IndexScan
Data Fetch
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
Query Service
Cluster Map
©2015 Couchbase Inc. 22
Query Execution: Fetch
List of qualified document-keys are grouped into batches.
List of the documents is obtained from the Index or specified directly via USE KEYS clause.
Fetch request is done in parallel. The join operation use the fetch
operation to get the matching document. Fetch results are streamed into next
operators. For big queries, scan-fetch-join-filter-
aggregation will be executing in parallel.
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
©2015 Couchbase Inc. 23
Query Execution: Join
You can join any two key spaces if one has document-key of the other.
You can store multiple entities within the same bucket and join between distinct groups
Uses Nested Loop JOIN now JOINs are done in the same order
specified in the query Index selection is important for the first
keyspace in the FROM clause. Qualified documents from that scan is
joined with the other Keyspace using the DOCUMENT KEYS
Fetch
Parse
Plan
JoinFilter
Offset
Limit
Project
Sort
Aggregate
Scan
©2015 Couchbase Inc. 24
Query Execution: Join
"CUSTOMER": {"C_D_ID": 10,"C_ID": 1938, "C_W_ID": 1,
"C_BALANCE": -10, "C_CITY": ”San Jose", "C_CREDIT": "GC”, \"C_DELIVERY_CNT": 0, "C_DISCOUNT": 0.3866,
"C_FIRST": ”Jay","C_LAST": ”Smith",
"C_MIDDLE": "OE", "C_PAYMENT_CNT": 1, "C_PHONE": ”555-123-1234", "C_SINCE": "2015-03-22 00:50:42.822518", "C_STATE": ”CA", "C_STREET_1": ”555, Tideway Drive", "C_STREET_2": ”Alameda",
"C_YTD_PAYMENT": 10, "C_ZIP": ”94501" }
Document key: “1.10.1938” Document key: “1.10.143”
“ORDERS”: { “O_CUSTOMER_KEY”: “1.10.1938”: "O_D_ID": 10, "O_ID": 1,
"O_ALL_LOCAL": 1, "O_CARRIER_ID": 2, "O_C_ID": 1938, "O_ENTRY_D": "2015-05-19 16:22:08.544472", "O_ID": 143, "O_OL_CNT": 10, "O_W_ID": 1}x
“ORDERS”: { “O_CUSTOMER_KEY”: “1.10.1938”:
"O_ALL_LOCAL": 1, "O_CARRIER_ID": 2, "O_C_ID": 1938, "O_D_ID": 10, "O_ENTRY_D": "2015-05-19 16:22:08.544472", "O_ID": 1355, "O_OL_CNT": 10, "O_W_ID": 3}
Document key: “1.10.1355”
©2015 Couchbase Inc. 25
Query Execution: Join
SELECT COUNT(o.O_ORDER_CNT ) AS CNT_O_OL_CNT FROM ORDERS o INNER JOIN CUSTOMER c ON KEYS (o.O_CUSTOMER_KEY) WHERE o.O_CARRIER_NAME = ”Penske” AND c.C_STATE = “CA”;
Two keyspace joins
ON Clause for the joinFetch
Parse
Plan
JoinFilter
Offset
Limit
Project
Sort
Aggregate
Scan
©2015 Couchbase Inc. 26
N1QL: JoinSELECT *FROM ORDERS o INNER JOIN CUSTOMER c ON KEYS (o.O_C_ID) LEFT JOIN PREMIUM p ON KEYS (c.C_PR_ID) LEFT JOIN demographics d ON KEYS (c.c_DEMO_ID)
Fetch
Parse
Plan
JoinFilter
Offset
Limit
Project
Sort
Aggregate
Scan
Support INNER and LEFT OUTER joins Join order follows the order in the FROM
clause. N1QL supports the nested loop joins
now. Join is always from a key of one
document(outer table) to the document key of the second document (inner table)
©2015 Couchbase Inc. 27
Query Execution: Filter
Filters not pushed to the index scan will
have to be applied.
Since the indices are maintained
asynchronously, we apply the filters again
to ensure integrity of the result set.
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
©2015 Couchbase Inc. 28
Query Execution: Aggregate, Sort, Offset, Limit
Each stream creates partial grouping & aggregates
The result set is sorted to evaluated the ORDER BY
The sort is done in parallel OFFSET and LIMIT is typically used in
pagination Evaluated after the ORDER BY clause is
evaluated.
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
AGG
Scan
©2015 Couchbase Inc. 29
Query Execution: Project
Fetch
Parse
Plan
Join
Filter
Offset
Limit
Project
Sort
Aggregate
Scan
SELECT C_ZIP, count(*) as NUMCUSTOMERS FROM CUSTOMER GROUP BY C_ZIP ORDER BY COUNT(*) DESC LIMIT 10;{ "requestID": "ff49a6e6-35f0-4eac-8d74-aa8a0aab58e7", "signature": { "C_ZIP": "json", "NUMCUSTOMERS": "number" }, "results": [ { "C_ZIP": "304811111", "NUMCUSTOMERS": 12 },... { "C_ZIP": "709811111", "NUMCUSTOMERS": 10 } ], "status": "success", "metrics": { "elapsedTime": "1.57600634s", "executionTime": "1.575851088s", "resultCount": 10, "resultSize": 228 }}
Projection
Signature of the resultset
Query execution & resultset information
©2015 Couchbase Inc. 31
Power Features: USE KEYS
Data Service
Global Secondary
Index
View Indexes
Global Secondary
Index
Global Secondary
Index
KeyScan
IndexScan
IndexScan
Data Fetch
Query Service
Cluster Map
©2015 Couchbase Inc. 32
Power Features: USE KEYS
SELECT c_id, c_first, c_middle, c_last, (c_max - c_balance) FROM CUSTOMER USE KEYS [‘1.10.1938’];
KeyScan: Directly use the Couchbase cluster map to get the document
You can give one or more values in the array From N1QL, get keys via: META(CUSTOMER).id
©2015 Couchbase Inc. 33
Power Features: USE KEYSEXPLAIN SELECT * FROM CUSTOMER USE KEYS ['1.1.1634', '1.1.1639'];{ …[ { "#operator": "Sequence", "~children": [ { "#operator": "KeyScan", "keys": "[\"1.1.1634\”, "1.1.1639”]" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "keyspace": "CUSTOMER", "namespace": "default" },
©2015 Couchbase Inc. 34
Power Features: USE KEYS
UPDATE customer USE KEYS ['1.20.981', '12.42.196'] SET c_balance = c_balance + 200;
DELETE customer USE KEYS ['1.20.198', '12.42.2848'];
Even when you use the USE KEYS, the indexes are automatically maintained.
©2015 Couchbase Inc. 36
UNNEST: Denormalized CUSTOMER Document
{"C_ZIP" : "828011111","C_STATE" : "vt",
"C_FIRST" : "ykfdbqku","C_CREDIT" : "GC","C_DELIVERY_CNT" : 0,"C_W_ID" : 1,"C_CITY" : "quhpismkzumehqhr","C_STREET_1" : "rmtxadlsxqefdcwf","C_D_ID" : 1,"ORDERS" : [
{"ORDER_LINE" : [
{"OL_AMOUNT" : 0,"OL_DELIVERY_D" : "2015-02-11T14:55:25.480Z","OL_DIST_INFO" : "yptiwgjdelfxmathbjzirvye","OL_I_ID" : 35828,"OL_SUPPLY_W_ID" : 1,"OL_QUANTITY" : 5
},{
"OL_AMOUNT" : 0,"OL_DELIVERY_D" : "2015-02-11T14:55:25.480Z","OL_DIST_INFO" : "dxhqulhcgksjgqsicujzqhdb","OL_I_ID" : 26024,"OL_SUPPLY_W_ID" : 1,"OL_QUANTITY" : 5
},}
….
©2015 Couchbase Inc. 37
Power Features: UNNEST operation
SELECT COUNT(my_order_line) AS total_orders, MAX(my_order_line.ol_delivery_d) AS max_delivery_date, MAX(my_order_line.ol_quantity) AS max_order_quantity, MAX(my_orders.o_entry_d) AS max_customer_entry, MAX(my_orders.o_ol_cnt) AS max_orderline_entry, COUNT(customer) AS total_customers FROM CUSTOMER MY_CUSTOMER UNNEST ORDERS AS my_orders UNNEST my_orders.order_line AS my_order_line ;
©2015 Couchbase Inc. 39
Power Features: Named Prepare Statement
Client
FetchParse Plan Join FilterPre-Aggregate
Offset Limit ProjectSortAggregateScan
Query ServiceInde
x Servi
ce
Data Servi
ce
©2015 Couchbase Inc. 41
Named Prepared Statement
url="http://localhost:8093/query"s = requests.Session()s.keep_alive = Trues.auth = ('Administrator','password')
query = {'statement':'prepare select * from `beer-sample` where name = [$1]’}r = s.post(url, data=query, stream=False)prepared = str(r.json()['results'][0]['name'])
for i in range (0, 5): query={'prepared': '"' + prepared + '"', 'args': '["old_hat_brewery"]' } r = s.post(url, data=query, stream=False) print i, r.json()['metrics']['executionTime']
Bind Values Many times
Prepare ONCE
©2015 Couchbase Inc. 43
Functional Indices"contacts": { "age": 46, "children": [ { "age": 17, "fname": "Aiden", "gender": "m" }, { "age": 2, "fname": "Bill", "gender": "f" } ], "email": "dave@gmail.com", "fname": "Dave", "hobbies": [ "golf", "surfing" ],
"lname": "Smith", "relation": "friend", "title": "Mr.", "type": "contact" } } ],
CREATE INDEX idx_lname_lower ON contacts(LOWER(lname)) using GSI;
SELECT count(*) FROM contacts WHERE lower(lname) = smith;
©2015 Couchbase Inc. 44
Functional Indices"contacts": { "age": 46, "children": [ { "age": 17, "fname": "Aiden", "gender": "m" }, { "age": 2, "fname": "Bill", "gender": "f" } ], "email": "dave@gmail.com", "fname": "Dave", "hobbies": [ "golf", "surfing" ],
"lname": "Smith", "relation": "friend", "title": "Mr.", "type": "contact" } } ],
The value indexed is the result of the function or expression.
The query has to use the same expression in the WHERE clause for the planner to consider using the index.
Use EXPLAIN to verify using the index.
©2015 Couchbase Inc. 46
Power Features: IntersectScan (Multi-Index Scan)
SELECT * FROM customer WHERE c_last = ’Smith’ AND c_city = 'Santa Clara';
Index scan using composite index:– Needs first N keys to be used to choose the index– Will multiple indexes on same set of columns to support filter
push down
IntersectScan using multiple indices:– Multiple indices are scanned in parallel– Provides more flexibility in using the indices for filters– Requires less number of indexes defined on table.• Can save on disk space and memory space as well.
©2015 Couchbase Inc. 47
Multi-Index Scan SELECT * FROM customer WHERE c_last = ’Smith’ AND c_city = 'Santa Clara'; "#operator": "IntersectScan", "scans": [ { "#operator": "IndexScan", "index": "idx_cust_city", "keyspace": "CUSTOMER", "limit": 9.223372036854776e+18, "namespace": "default", "spans": [ { "Range": { "High": [ "\"Santa Clara\"" ], "Inclusion": 3, "Low": [ "\"Santa Clara\"" ] }, "Seek": null } ], "using": "view" },
{ "#operator": "IndexScan", "index": "idx_last_name", "keyspace": "CUSTOMER", "limit": 9.223372036854776e+18, "namespace": "default", "spans": [ { "Range": { "High": [ "\”Smith\"" ], "Inclusion": 3, "Low": [ "\”Smith\"" ] }, "Seek": null } ],
Recommended