TIME SERIES AGGREGATES USING CASSANDRA, KAIROSDB & ALCHEMY API

CCM AlchemyAPI and Real-time Aggregation

DESCRIPTION

An exploratory look into KairosDB (OpenTSDB) connected to Cassandra (CCM) and using AlchemyAPI for entity, topic and sentiment extraction. Sprinkled in is a bit of Data Modeling, Truth Tables, Primary Keys, Partition Keys and Cluster Keys. All written in Python!

Page 1: CCM AlchemyAPI and Real-time Aggregation

TIME SERIES AGGREGATES

USING CASSANDRA, KAIROSDB & ALCHEMY API

Page 2: CCM AlchemyAPI and Real-time Aggregation

• Bio-Informatics Engineer

• Business Analyst

• Data Warehouse Specialist

• System Operations / DevOps

• Founder & Lead Technologist

• Presenter, Speaker, Organizer

• Founder / Do-Gooder

• Data Engineer & Manager

Who is Victor Anjos?

KEEP TWEETING

@VictorFAnjos

@viafoura

@AlchemyAPI

@Datastax

@Data_for_Good

#BDWTO #BDW14

Page 3: CCM AlchemyAPI and Real-time Aggregation

Quick Review…

Page 4: CCM AlchemyAPI and Real-time Aggregation

Why Real-Time?

Page 5: CCM AlchemyAPI and Real-time Aggregation

REMEMBER --- TWEET

PLEASE MAKE SURE TO TWEET…

NEED TWEETS TO THE HASHTAGS BELOW AT THE END

#BDWTO #BDW14

Page 6: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... field1 int PRIMARY KEY,
        ... field2 int,
        ... field3 int);

Page 7: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... field1 int PRIMARY KEY,
        ... field2 int,
        ... field3 int);

cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1, 2, 3);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4, 5, 6);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7, 8, 9);

Page 8: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... field1 int PRIMARY KEY,
        ... field2 int,
        ... field3 int);

cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1, 2, 3);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4, 5, 6);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7, 8, 9);

cqlsh:test> SELECT * FROM example;

 field1 | field2 | field3
--------+--------+--------
      1 |      2 |      3
      4 |      5 |      6
      7 |      8 |      9

Page 9: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
-------------------
RowKey: 4
=> (column=, value=, timestamp=1374546757815000)
=> (column=field2, value=00000005, timestamp=1374546757815000)
=> (column=field3, value=00000006, timestamp=1374546757815000)
-------------------
RowKey: 7
=> (column=, value=, timestamp=1374546761055000)
=> (column=field2, value=00000008, timestamp=1374546761055000)
=> (column=field3, value=00000009, timestamp=1374546761055000)
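The cassandra-cli listing above prints column values as hex and write timestamps as microseconds since the Unix epoch. As a minimal sketch of how to read them back, assuming the `int` columns are stored as 4-byte big-endian integers (which matches the eight hex digits shown), the following Python decodes both. The helper names `decode_int` and `decode_timestamp` are illustrative only.

```python
from datetime import datetime, timezone

def decode_int(hex_value):
    """Decode a CQL int printed as hex by cassandra-cli (assumed 4-byte big-endian)."""
    return int.from_bytes(bytes.fromhex(hex_value), byteorder="big", signed=True)

def decode_timestamp(micros):
    """Convert a write timestamp in microseconds since the epoch to a UTC datetime."""
    return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)

# Values copied from the RowKey: 1 entry in the listing.
print(decode_int("00000002"))
print(decode_timestamp(1374546754299000).isoformat())
```

The empty column (`column=, value=`) carries no data to decode; it only marks that the CQL row exists.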

Page 10: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
-------------------
RowKey: 4
=> (column=, value=, timestamp=1374546757815000)
=> (column=field2, value=00000005, timestamp=1374546757815000)
=> (column=field3, value=00000006, timestamp=1374546757815000)
-------------------
RowKey: 7
=> (column=, value=, timestamp=1374546761055000)
=> (column=field2, value=00000008, timestamp=1374546761055000)
=> (column=field3, value=00000009, timestamp=1374546761055000)

Page 11: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... partitionKey1 text,
        ... partitionKey2 text,
        ... clusterKey1 text,
        ... clusterKey2 text,
        ... normalField1 text,
        ... normalField2 text,
        ... PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
        ... );

Page 12: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... partitionKey1 text,
        ... partitionKey2 text,
        ... clusterKey1 text,
        ... clusterKey2 text,
        ... normalField1 text,
        ... normalField2 text,
        ... PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
        ... );

cqlsh:test> INSERT INTO example (partitionKey1,
        ... partitionKey2, clusterKey1, clusterKey2,
        ... normalField1, normalField2) VALUES (
        ... 'partitionVal1',
        ... 'partitionVal2',
        ... 'clusterVal1',
        ... 'clusterVal2',
        ... 'normalVal1',
        ... 'normalVal2');

Page 13: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> SELECT * FROM example;

 partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
---------------+---------------+-------------+-------------+--------------+--------------
 partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2

Page 14: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

cqlsh:test> SELECT * FROM example;

 partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
---------------+---------------+-------------+-------------+--------------+--------------
 partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2

[default@test] list example;
-------------------
RowKey: partitionVal1:partitionVal2
=> (column=clusterVal1:clusterVal2:, value=, timestamp=1374630892473000)
=> (column=clusterVal1:clusterVal2:normalfield1, value=6e6f726d616c56616c31, timestamp=1374630892473000)

Page 15: CCM AlchemyAPI and Real-time Aggregation

Keys in C*

1. The first part of the composite key (inside the inner brackets) is called the “Partition Key”; the rest (not inside the inner brackets) are the “Cluster Keys”.

2. Cassandra stores columns differently when composite keys are used. The partition key becomes the row key. The cluster key values are concatenated with each column name (“:” as separator) to form the stored column names. Column values remain unchanged.

3. Cluster keys are ordered within a partition, and you are not allowed to search on arbitrary columns: you have to specify the cluster key in order and can run a range query only on the final portion of it.
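Point 2 can be sketched in a few lines of Python. This is only an illustration of the layout as cassandra-cli displays it in the earlier listing, not Cassandra's actual on-disk byte encoding, and `storage_layout` is a hypothetical helper name.

```python
def storage_layout(partition_vals, cluster_vals, fields):
    """Sketch the row key and stored column names for one CQL row."""
    # The partition key values, joined by ":", become the row key.
    row_key = ":".join(partition_vals)
    # The cluster key values prefix every stored column name.
    prefix = ":".join(cluster_vals)
    # An empty-named marker column records that the CQL row exists.
    columns = {prefix + ":": ""}
    for name, value in fields.items():
        # Text values appear hex-encoded in the cassandra-cli listing.
        columns[prefix + ":" + name] = value.encode("utf-8").hex()
    return row_key, columns

row_key, columns = storage_layout(
    ["partitionVal1", "partitionVal2"],
    ["clusterVal1", "clusterVal2"],
    {"normalfield1": "normalVal1", "normalfield2": "normalVal2"},
)
print("RowKey:", row_key)
for name, value in columns.items():
    print("=> column=%s, value=%s" % (name, value))
```

Because every stored column name starts with the cluster key values, columns within a partition sort by cluster key, which is why range queries work only along that order.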

Page 16: CCM AlchemyAPI and Real-time Aggregation

A bit of data modelling

USER ACTIVITY DATA MODEL

CREATE TABLE user_activity (
    username varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);

CREATE TABLE user_activity_history (
    username varchar,
    interaction_date varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY ((username, interaction_date), interaction_time)
);
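The second table adds `interaction_date` to the partition key, so a user's history is split into one partition per date instead of one ever-growing row. A minimal Python sketch of deriving that key follows. The `YYYY-MM-DD` string format, the use of UTC, and the name `history_partition` are assumptions; the slide only says the column is a `varchar`.

```python
from datetime import datetime, timezone

def history_partition(username, when):
    """Return the (username, interaction_date) partition key for a timestamp."""
    # Assumed bucket: one partition per user per UTC day.
    return (username, when.astimezone(timezone.utc).strftime("%Y-%m-%d"))

# "alice" is a made-up username for illustration.
key = history_partition("alice", datetime(2014, 9, 18, 23, 30, tzinfo=timezone.utc))
print(key)
```

A coarser or finer bucket (month, hour) changes only the format string; the trade-off is partition size against the number of partitions a query has to touch.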

Page 17: CCM AlchemyAPI and Real-time Aggregation

Data modelling 4 QUERIES

FIND A CAR IN A LOT

CREATE TABLE car_location_index (
    make varchar,
    model varchar,
    colour varchar,
    vehicle_id int,
    lot_id int,
    PRIMARY KEY ((make, model, colour), vehicle_id)
);

Page 18: CCM AlchemyAPI and Real-time Aggregation

Data modelling 4 QUERIES

FIND A CAR IN A LOT

Truth(iness) Table

Page 19: CCM AlchemyAPI and Real-time Aggregation

Data modelling 4 QUERIES

FIND A CAR IN A LOT

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', 'Mustang', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', 'Mustang', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', '', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', '', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', 'Mustang', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', 'Mustang', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', '', 'Blue', 1234, 8675309);
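The seven inserts above are the rows of the truth table: every combination of known and blank values for make, model and colour, except the all-blank one. A short Python sketch of generating those combinations follows; `truth_table_rows` is a hypothetical helper, and how the rows are then written to Cassandra is left out.

```python
from itertools import product

def truth_table_rows(make, model, colour):
    """Yield every (make, model, colour) combination with unknown values blanked."""
    for combo in product((make, ""), (model, ""), (colour, "")):
        # The slide omits the all-blank row, so skip it here too.
        if any(combo):
            yield combo

rows = list(truth_table_rows("Ford", "Mustang", "Blue"))
for row in rows:
    print(row)
```

With three attributes this yields 2^3 - 1 = 7 index rows per car, so the write amplification grows quickly as attributes are added.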

Page 20: CCM AlchemyAPI and Real-time Aggregation

Data modelling 4 QUERIES

FIND A CAR IN A LOT

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND colour = 'Blue';

 vehicle_id | lot_id
------------+---------
       1234 | 8675309

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND colour = 'Blue';

 vehicle_id | lot_id
------------+---------
       1234 | 8675309
       8765 | 5551212

Page 21: CCM AlchemyAPI and Real-time Aggregation

Enter KairosDB

[
  {
    "name": "archive.file.tracked",
    "datapoints": [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    }
  },
  {
    "name": "archive.file.search",
    "timestamp": 999,
    "value": 321,
    "tags": {"host": "test"}
  }
]

http://localhost:8080/api/v1/datapoints

http://localhost:8080/api/v1/datapoints/query
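The two URLs are the KairosDB REST endpoints for writing datapoints and for querying them. The sketch below shows one way to use them from Python with only the standard library. It assumes a KairosDB instance on `localhost:8080` as on the slide; the helper names, the one-minute sampling and the `sum` aggregator are illustrative choices, not taken from the talk.

```python
import json
import time
import urllib.request

KAIROSDB_URL = "http://localhost:8080"  # host and port as shown on the slide

def build_datapoint(name, timestamp_ms, value, tags):
    """Build one metric entry in the single-datapoint form shown above."""
    return {"name": name, "timestamp": timestamp_ms, "value": value, "tags": tags}

def build_query(name, minutes, aggregator="sum"):
    """Build a query body covering the last `minutes` minutes, aggregated per minute."""
    return {
        "start_relative": {"value": minutes, "unit": "minutes"},
        "metrics": [{
            "name": name,
            "aggregators": [{
                "name": aggregator,
                "sampling": {"value": 1, "unit": "minutes"},
            }],
        }],
    }

def post_json(path, payload):
    """POST a JSON payload to KairosDB and return the raw response body."""
    request = urllib.request.Request(
        KAIROSDB_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()

def main():
    """Write one datapoint stamped with the current time, then query the last hour."""
    now_ms = int(time.time() * 1000)
    point = build_datapoint("archive.file.search", now_ms, 321, {"host": "test"})
    post_json("/api/v1/datapoints", [point])
    print(post_json("/api/v1/datapoints/query", build_query("archive.file.search", 60)))
```

Calling `main()` requires a running KairosDB; the two `build_*` functions can be exercised without one.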

Page 22: CCM AlchemyAPI and Real-time Aggregation

Sentiment Analysis NLP

Page 23: CCM AlchemyAPI and Real-time Aggregation

Sentiment Analysis NLP

He loves me He loves me not

Page 24: CCM AlchemyAPI and Real-time Aggregation

AlchemyAPI

AlchemyAPI uses natural language processing technology and machine learning algorithms to extract semantic meta-data from content, such as information on people, places, companies, topics, facts, relationships, authors, and languages.
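Once each piece of content has a sentiment score, the scores can be rolled up into a time series. The sketch below averages scores per one-minute bucket in plain Python. The `(timestamp_ms, score)` input shape is hypothetical and is not AlchemyAPI's response format, and the sample values are made up for illustration.

```python
from collections import defaultdict

def average_by_minute(scored):
    """Average sentiment scores per one-minute bucket.

    `scored` is an iterable of (timestamp_ms, score) pairs (hypothetical shape).
    Returns a dict mapping each bucket's start time in ms to its mean score.
    """
    buckets = defaultdict(list)
    for timestamp_ms, score in scored:
        # Round the timestamp down to the start of its minute.
        buckets[timestamp_ms - timestamp_ms % 60_000].append(score)
    return {start: sum(s) / len(s) for start, s in sorted(buckets.items())}

sample = [(1359788400000, 0.5), (1359788410000, -0.25), (1359788470000, 1.0)]
averages = average_by_minute(sample)
print(averages)
```

Each resulting (bucket start, average) pair has the same timestamp-and-value shape as a KairosDB datapoint.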

Page 25: CCM AlchemyAPI and Real-time Aggregation

Prep Work…

Install CCM
https://gist.github.com/vanjos/6169734

Install KairosDB
https://code.google.com/p/kairosdb/wiki/GettingStarted

Get some API Keys
https://dev.twitter.com & https://apps.twitter.com/
http://www.alchemyapi.com/api/register.html