Google BigQuery - Big data with SQL-like query feature, but fast...

BigQuery implementation


DESCRIPTION

A technical presentation on getting started with Google BigQuery


Page 1: BigQuery implementation

Google BigQuery - Big data with SQL-like query feature, but fast...

Google BigQuery

Page 2: BigQuery implementation

BigQuery Features

● TB-level data analysis
● Fast mining response
● SQL-like query language
● Multi-dataset interactive support
● Cheap, pay-per-use pricing
● Offline job support

Page 3: BigQuery implementation

Getting Started

Page 4: BigQuery implementation

BigQuery Web UI

https://bigquery.cloud.google.com/

Page 5: BigQuery implementation

BigQuery structure

● Project
● Dataset
● Table
● Job
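The Project / Dataset / Table levels above correspond to the `[project]:[dataset].[table]` notation that the bq commands in this deck use. As a minimal sketch, assuming that notation and treating the project prefix as optional, the following splits such a reference into its parts. `parseTableRef` is a hypothetical helper for illustration, not part of any BigQuery library, and it does not handle project IDs that themselves contain a colon.

```javascript
// Hypothetical helper: split "project:dataset.table" (or "dataset.table")
// into its parts. Not part of any BigQuery client library.
function parseTableRef(ref) {
  const match = /^(?:([^:]+):)?([^.:]+)\.([^.:]+)$/.exec(ref);
  if (!match) {
    throw new Error('Expected [project:]dataset.table, got: ' + ref);
  }
  return {
    project: match[1] || null, // null when no project prefix is given
    dataset: match[2],
    table: match[3],
  };
}

console.log(parseTableRef('publicdata:samples.shakespeare'));
console.log(parseTableRef('testbq.test'));
```

Both sample references come from queries shown elsewhere in this deck.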

Page 6: BigQuery implementation

Hands-on - Import

Page 7: BigQuery implementation

The easy way - Import Wizard

Page 8: BigQuery implementation

Load data into BigQuery from the command line

CSV / JSON → Cloud Storage → BigQuery

Page 9: BigQuery implementation

Load CSV to BigQuery

gsutil cp [source] gs://[bucket-name]

# gsutil cp ~/Desktop/log.csv gs://your-bucket/
Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
Uploading: 4.59 MB/36.76 MB

bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]

# bq load project.dataset gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING

Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE
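The inline schema passed to `bq load` above is a comma-separated list of `name:type` pairs. As a rough sketch, the following builds that string from a field list and assembles the command line. `inlineSchema` and `bqLoadArgs` are hypothetical helpers, and `mydataset.mytable` is a placeholder table name; the code only prints the command and does not run it.

```javascript
// Hypothetical helper: turn [{name, type}] into "IP:STRING,DNS:STRING,...".
function inlineSchema(fields) {
  return fields.map((f) => f.name + ':' + f.type).join(',');
}

// Hypothetical helper: assemble the argument list for `bq load`
// in the order used on this slide: table, source URI, schema.
function bqLoadArgs(table, sourceUri, fields) {
  return ['load', table, sourceUri, inlineSchema(fields)];
}

// Field names and types taken from the example command above.
const logFields = [
  { name: 'IP', type: 'STRING' },
  { name: 'DNS', type: 'STRING' },
  { name: 'TS', type: 'STRING' },
  { name: 'URL', type: 'STRING' },
];

// "mydataset.mytable" is a placeholder, not a real table.
const loadArgs = bqLoadArgs('mydataset.mytable', 'gs://your-bucket/log.csv', logFields);
console.log(['bq'].concat(loadArgs).join(' '));
```

Keeping the schema as data makes it straightforward to reuse the same field list for a JSON schema file.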

Page 10: BigQuery implementation

Load JSON to BigQuery

bq load --source_format NEWLINE_DELIMITED_JSON \
  [project]:[dataset].[table] [json file] [schema file]

# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json

Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE

# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json

Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
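The NEWLINE_DELIMITED_JSON format expects one JSON object per line rather than a single JSON array, and the schema file is a JSON array of field descriptions. A minimal sketch of producing both from in-memory data follows. `toNdjson` and `toSchemaJson` are hypothetical helpers and the sample rows are made up for illustration; writing the strings to `sample.json` and `schema.json` is left out.

```javascript
// Serialize rows as newline-delimited JSON: one object per line,
// with no enclosing array and no commas between records.
function toNdjson(rows) {
  return rows.map((row) => JSON.stringify(row)).join('\n') + '\n';
}

// Hypothetical helper: schema file contents as a JSON array of
// { name, type } entries.
function toSchemaJson(fields) {
  const entries = fields.map((f) => ({ name: f.name, type: f.type }));
  return JSON.stringify(entries, null, 2);
}

// Made-up sample data, for illustration only.
const sampleRows = [
  { id: 1, label: 'first' },
  { id: 2, label: 'second' },
];
const sampleFields = [
  { name: 'id', type: 'INTEGER' },
  { name: 'label', type: 'STRING' },
];

console.log(toNdjson(sampleRows));
console.log(toSchemaJson(sampleFields));
```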

Page 11: BigQuery implementation

Hands-on - Query

Page 12: BigQuery implementation

Web way - Query Console

Page 13: BigQuery implementation

Install google_cloud_sdk (https://developers.google.com/cloud/sdk/)

Shell way - bq command

Page 14: BigQuery implementation

Shell way - bq command

bq query <sql_query>

# bq query 'select charge_unit,charge_desc,one_charge from testbq.test'

Page 15: BigQuery implementation

BigQuery - Query Language

Page 16: BigQuery implementation

Query syntax

● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT

Query support

Supported functions and operators

● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions

Page 17: BigQuery implementation

select charge_unit,charge_desc,one_charge from testbq.test

Select

+-------------+-------------+------------+
| charge_unit | charge_desc | one_charge |
+-------------+-------------+------------+
| M           | 按月計費    | 0          |
| D           | 按日計費    | 0          |
| HH          | 小時計費    | 0          |
| T           | 分計費      | 0          |
| SS          | 按次計費    | 1          |
+-------------+-------------+------------+

Page 18: BigQuery implementation

SELECT a.THEID, a.THENAME, b.DESCRIPITON
FROM user01.USER_MST a
LEFT JOIN user01.USER_DETAIL_MST b ON a.THEID = b.THEID
LIMIT 10

Join

+---------+-----------+----------------------+
| a_THEID | a_THENAME | b_DESCRIPITON        |
+---------+-----------+----------------------+
| 2       | 關於道具  | 在道具編成道具。     |
| 2       | 關於道具  | 寶玉。               |
| 1       | 關於夥伴  | 勇氣覺醒。           |
| 1       | 關於夥伴  | 編輯進行任務的隊伍。 |
| 1       | 關於夥伴  | 數個不同的類型       |
+---------+-----------+----------------------+

Page 19: BigQuery implementation

SELECT
  fullName,
  age,
  gender,
  citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
  (citiesLived.yearsLived > 1995) AND
  (children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place

Flatten

+------------+-----+--------+-------------------+
| fullName   | age | gender | citiesLived_place |
+------------+-----+--------+-------------------+
| John Doe   | 22  | Male   | Stockholm         |
| Mike Jones | 35  | Male   | Los Angeles       |
| Mike Jones | 35  | Male   | Washington DC     |
| Mike Jones | 35  | Male   | Portland          |
| Mike Jones | 35  | Male   | Austin            |
+------------+-----+--------+-------------------+
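FLATTEN in the query above unrolls the repeated `children` field so that each child value gets its own row, which is what lets the WHERE clause test `children.age` per child. The following is a rough in-memory sketch of that unrolling, not BigQuery's implementation. The `flatten` helper and the sample record are hypothetical, and emitting no rows for a record with an empty repeated field is an assumption of this sketch.

```javascript
// Hypothetical helper: emit one output row per value of a repeated field,
// copying the other fields of the record unchanged.
function flatten(records, field) {
  const rows = [];
  for (const record of records) {
    // Assumption: a missing or empty repeated field yields no rows here.
    for (const value of record[field] || []) {
      rows.push(Object.assign({}, record, { [field]: value }));
    }
  }
  return rows;
}

// Made-up record with a repeated "children" field, for illustration only.
const people = [
  {
    fullName: 'Example Person',
    children: [
      { name: 'A', age: 4 },
      { name: 'B', age: 2 },
    ],
  },
];

const flatRows = flatten(people, 'children');
console.log(flatRows.length); // one row per child
console.log(flatRows.filter((row) => row.children.age > 3));
```

After flattening, `children` holds a single value per row, so a per-child filter is an ordinary row filter.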

Page 20: BigQuery implementation

SELECT word, COUNT(word) AS count
FROM publicdata:samples.shakespeare
WHERE (REGEXP_MATCH(word,r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;

Regular Expression

+-------+-------+
| word  | count |
+-------+-------+
| ne'er | 42    |
| we'll | 35    |
| We'll | 33    |
+-------+-------+
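The pattern in the query above matches any word containing two word characters, an apostrophe, and two more word characters. The sketch below applies the same pattern with JavaScript's regex engine to a made-up word list and counts matches per distinct word, mirroring the GROUP BY / ORDER BY steps. BigQuery uses a different regex engine, so the equivalence is assumed only for this simple pattern; `countMatching` is a hypothetical helper.

```javascript
// Same pattern as in the query: two word chars, an apostrophe, two word chars.
// No "g" flag, so test() keeps no state between calls.
const contractionPattern = /\w\w'\w\w/;

// Hypothetical helper: count occurrences of each matching word and
// return [word, count] pairs sorted by count, highest first.
function countMatching(words, regex) {
  const counts = new Map();
  for (const word of words) {
    if (regex.test(word)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Made-up sample words, for illustration only.
const sampleWords = ["ne'er", "we'll", "ne'er", 'never', "We'll", "o'er"];
console.log(countMatching(sampleWords, contractionPattern));
```

Counting is case-sensitive, which is why `we'll` and `We'll` appear as separate rows in the result table above.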

Page 21: BigQuery implementation

SELECT
  TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time,
  COUNT (*) AS revision_count
FROM [publicdata:samples.wikipedia];

+----------------------------+----------------+
| top_revision_time          | revision_count |
+----------------------------+----------------+
| 2002-02-25 15:51:15.000000 | 20971          |
| 2002-02-25 15:43:11.000000 | 15955          |
| 2010-01-14 15:52:34.000000 | 3              |
| 2009-12-31 19:29:19.000000 | 3              |
| 2009-12-28 18:55:12.000000 | 3              |
+----------------------------+----------------+

Time Function
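FORMAT_UTC_USEC takes a UNIX timestamp in microseconds, which is presumably why the query multiplies the `timestamp` column by 1000000 before formatting it. The sketch below imitates the `YYYY-MM-DD HH:MM:SS.uuuuuu` output seen in the result table. `formatUtcUsec` is a hypothetical JavaScript stand-in, not the BigQuery function itself, and it assumes a non-negative input small enough to be an exact JavaScript number.

```javascript
// Hypothetical stand-in for FORMAT_UTC_USEC: format microseconds since the
// UNIX epoch as "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC.
// Assumes usec is a non-negative safe integer.
function formatUtcUsec(usec) {
  const millis = Math.floor(usec / 1000);
  const iso = new Date(millis).toISOString(); // "YYYY-MM-DDTHH:MM:SS.mmmZ"
  const microsInSecond = usec % 1000000;
  return (
    iso.slice(0, 10) + ' ' + iso.slice(11, 19) + '.' +
    String(microsInSecond).padStart(6, '0')
  );
}

// A timestamp in seconds, converted to microseconds as in the query.
const revisionSeconds = Date.UTC(2002, 1, 25, 15, 51, 15) / 1000;
console.log(formatUtcUsec(revisionSeconds * 1000000));
```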

Page 22: BigQuery implementation

SELECT DOMAIN(repository_homepage) AS user_domain, COUNT(*) AS activity_count
FROM [publicdata:samples.github_timeline]
GROUP BY user_domain
HAVING user_domain IS NOT NULL AND user_domain != ''
ORDER BY activity_count DESC
LIMIT 5;

URL Function

+-----------------+----------------+
| user_domain     | activity_count |
+-----------------+----------------+
| github.com      | 281879         |
| google.com      | 34769          |
| khanacademy.org | 17316          |
| sourceforge.net | 15103          |
| mozilla.org     | 14091          |
+-----------------+----------------+

Page 23: BigQuery implementation

Hands-on - Programming

Page 24: BigQuery implementation

● Prepare a Google Cloud Platform project
● Create a Service Account
● Generate a key from the Service Account p12 key

Prepare

Page 25: BigQuery implementation

Google Service Account

Web server application vs. service account

Page 26: BigQuery implementation

Prepare Authentications

Convert p12 key → pem key

$ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
$ openssl rsa -in privatekey.pem -out key.pem

Page 27: BigQuery implementation

Node.js - bigquery module

var bq = require('bigquery')
  , prjId = 'your-bigquery-project-id';

bq.init({
  client_secret: '/path-to-client_secret.json',
  privatekey_pem: '/path-to-privatekey.pem',
  key_pem: '/path-to-key.pem'
});

bq.job.listds(prjId, function(e, r, d) {
  if (e) console.log(e);
  console.log(JSON.stringify(d));
});

To perform operations, call the functions under bq.job.

For the bigquery module, see: https://github.com/peihsinsu/bigquery

Page 28: BigQuery implementation

/* Ref: https://developers.google.com/apps-script/advanced/bigquery */
var request = {
  query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
         'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
};
var queryResults = BigQuery.Jobs.query(request, projectId);
var jobId = queryResults.jobReference.jobId;
queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
var rows = queryResults.rows;
while (queryResults.pageToken) {
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
    pageToken: queryResults.pageToken
  });
  rows = rows.concat(queryResults.rows);
}

Google Drive way - Apps Script
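The Apps Script snippet on this slide keeps requesting results while the response carries a `pageToken`, concatenating each page's rows. The sketch below isolates that loop so it can be run without a BigQuery project. `makeFakeGetQueryResults` is a hypothetical stand-in for the real paged call, and its page tokens are simply page indexes. Guarding against a missing `rows` field is an addition made here on the assumption that an empty result may omit it.

```javascript
// Hypothetical stand-in for a paged results API: serves the given pages
// one at a time and uses the next page index as the page token.
function makeFakeGetQueryResults(pages) {
  return function getQueryResults(options) {
    const index = options && options.pageToken ? Number(options.pageToken) : 0;
    const hasNext = index + 1 < pages.length;
    return {
      rows: pages[index],
      pageToken: hasNext ? String(index + 1) : undefined,
    };
  };
}

// Same loop shape as the Apps Script code: fetch the first page, then
// follow pageToken until the response no longer includes one.
function collectAllRows(getQueryResults) {
  let result = getQueryResults({});
  let rows = result.rows || []; // assumption: rows may be absent when empty
  while (result.pageToken) {
    result = getQueryResults({ pageToken: result.pageToken });
    rows = rows.concat(result.rows || []);
  }
  return rows;
}

const fakeFetch = makeFakeGetQueryResults([['a', 'b'], ['c'], ['d', 'e']]);
console.log(collectAllRows(fakeFetch));
```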