Dominique Rondé (@talk2nerd)
Alexandra Klimova (@aklimova)
Real Time Business Intelligence with Cassandra, Kafka and Hadoop
A real story @ Allianz Deutschland AG
© Copyright Allianz
Dominique Rondé Big Data Pilot
Dipl. Wirt.-Inf. (FH)
128479 hrs with Java
40831 hrs with Big Data
14047 hrs Certified Datastax Cassandra Solution Architect
Twitter: @Talk2Nerd
Alexandra Klimova Big Data Pilotesse
M.Sc. Informatik
75895 hrs with Big Data
40831 hrs with Hadoop
14047 hrs Certified Datastax Cassandra Solution Architect
Twitter: @Aklimova
We don't have an agenda.
We have some checklists!
3 May 2023
Agenda
Security Instructions
Checklist
Before Engine Start
Define the destination
Real Time Reporting
• Sold items for the current day
• Open tickets during the day
• Response time on consumer requests
• Sold items grouped by type
• Current errors
Fraud Protection
• Prevent "fake accounts"
• Identify "data grabbers"
• Detect fraud patterns
Helping decision makers to understand the market
• Risk Specialists
• Product Designers
• Marketing Experts
Our destination
TTD
Reduce the Time-to-Data
Time to Data (TTD) is the time it takes until a requester has received the data he or she needs to do his or her job.
It includes the time to
• find the source of the required data
• get the needed aggregation
• clean up the data
• write the statistical scripts
• execute and refine these scripts
• get a visualized result
Definition of TTD
Checklist
Before Taxi
Check if we know all we need
• Decoupled from all other development work: changes in analytics should not require additional work in any other application
• Allows fast deployments: learn from the data and bring improvements into production quickly
• Highly available: no event should get lost after it has been fired
• Very accurate: make sure that every event is processed
• Horizontally scalable: start small and grow with the data
Define functional requirements
• Data Privacy
• Data Security
• Data Protection
Define legal requirements
Checklist
Before Take Off
Do the first steps
Picking Measuring points
• Implement servlet filters to stay informed about HTTP headers, e.g. error code, referrer
• Implement interceptors for the O/R mapper to store the history of entities
• Instrument the web UI to send events about user interactions, e.g. changes between pages
• Instrument the Java code to send events with additional data at selected points, e.g. when a document is created
Each transfer object holds at least the
• current sessionId
• timestamp when the event occurred
• unique identifier of the event
• version identifier
In some cases it also holds the
• current authenticated user
Create some transfer objects
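The fields listed above could be carried by a small immutable class. The following is a minimal sketch, not the code used at Allianz: the class name `TransferEvent`, the factory method `create`, the field names, and the use of `java.util.UUID` for the unique identifier are all assumptions made for illustration.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.Optional;
import java.util.UUID;

// Hypothetical transfer object; all names are illustrative, not taken from the talk.
final class TransferEvent {
    private final String sessionId;          // current session id
    private final Instant timestamp;         // when the event occurred
    private final UUID eventId;              // unique identifier of this event
    private final int version;               // version identifier of the event format
    private final String authenticatedUser;  // only known in some cases, may be null

    private TransferEvent(String sessionId, Instant timestamp, UUID eventId,
                          int version, String authenticatedUser) {
        this.sessionId = Objects.requireNonNull(sessionId, "sessionId");
        this.timestamp = Objects.requireNonNull(timestamp, "timestamp");
        this.eventId = Objects.requireNonNull(eventId, "eventId");
        this.version = version;
        this.authenticatedUser = authenticatedUser;
    }

    // Creates an event stamped with the current time and a random identifier.
    static TransferEvent create(String sessionId, int version, String authenticatedUser) {
        return new TransferEvent(sessionId, Instant.now(), UUID.randomUUID(),
                version, authenticatedUser);
    }

    String getSessionId() { return sessionId; }
    Instant getTimestamp() { return timestamp; }
    UUID getEventId() { return eventId; }
    int getVersion() { return version; }

    // The user is optional because not every event happens in an authenticated context.
    Optional<String> getAuthenticatedUser() { return Optional.ofNullable(authenticatedUser); }
}
```

Carrying a version identifier in every event is what would let consumers handle old and new event layouts side by side when the format changes.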
Find an architecture
WebApplication
Reports
Dashboards
R-Scripts
Design your first CF
Design conceptual model
Specify access pattern
Choose a logical model
Configure physical model
Write a CQL script
Checklist
During Take-Off
Run everything up
But mention the difference
Start small
Add nodes
Grow up
Checklist
During Climb Out
Fill your speed-layer
Monitor the Instruments
Consume
DataStream<String> messageStream = env.addSource(
    new FlinkKafkaConsumer09<>(
        parameterTool.getRequired("topicName"),
        new SimpleStringSchema(),
        properties));

Map
DataStream<Tuple3<String, Date, Double>> clickMessageStream =
    messageStream.map(new ClickEventMapper());

Aggregate
// Three type arguments require Tuple3; Tuple2 takes only two.
DataStream<Tuple3<Date, Double, String>> aggregatedClickMessageStream =
    clickMessageStream
        .map(new KeyStreamMapper())
        .keyBy("f1")
        .timeWindow(Time.minutes(2))
        .apply(new KeyWindowFunktion());

Store
// The aggregated stream carries (eventtime, price, product), matching the INSERT column order.
CassandraSink.addSink(aggregatedClickMessageStream)
    .setQuery("INSERT INTO itemssale_by_product (eventtime, price, product) VALUES (?, ?, ?);")
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("csn-node1.development.allianz.de").build();
        }
    })
    .build();
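To make the aggregation step more concrete, the following plain-Java sketch mimics what a two-minute tumbling window keyed by product does: each event is assigned to the window whose start is its timestamp rounded down to a multiple of the window size, and values are combined per window and key. It does not use Flink. `WindowSketch`, `Sale`, `windowStart`, and `sumPerWindow` are hypothetical names, and summing the prices is an assumption, since the slides do not show what `KeyWindowFunktion` computes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a keyed tumbling window; this is not Flink code.
final class WindowSketch {
    // Two minutes, mirroring timeWindow(Time.minutes(2)) in the pipeline.
    static final long WINDOW_MILLIS = 2 * 60 * 1000L;

    // Hypothetical event: one sold item with its timestamp and price.
    static final class Sale {
        final long timestampMillis;
        final String product;
        final double price;

        Sale(long timestampMillis, String product, double price) {
            this.timestampMillis = timestampMillis;
            this.product = product;
            this.price = price;
        }
    }

    // Start of the tumbling window containing the given non-negative epoch timestamp.
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    // Sums prices per "windowStart|product" key (assumed aggregation, see lead-in).
    static Map<String, Double> sumPerWindow(List<Sale> sales) {
        Map<String, Double> sums = new TreeMap<>();
        for (Sale sale : sales) {
            String key = windowStart(sale.timestampMillis) + "|" + sale.product;
            sums.merge(key, sale.price, Double::sum);
        }
        return sums;
    }
}
```

In this sketch, an event at 119,999 ms falls into the window starting at 0, while an event at 120,000 ms opens the next window. That boundary behaviour is the reason results only become final once a window has closed.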
Use the Cassandra connector that ships with Apache Flink since version 1.1.0:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-cassandra_2.11</artifactId>
    <version>1.1.1</version>
</dependency>
Write aggregated data
@Table(keyspace = "allianz", name = "itemssale_by_product")
public class MyCustomSalesEvent implements Serializable {

    private static final long serialVersionUID = 1L;

    @Column(name = "product")
    private String product;

    @Column(name = "eventdate")
    private Date eventdate;

    @Column(name = "price")
    private double price;

    // Getters and setters
}
Write aggregated data
DataStream<MyCustomSalesEvent> clickMessageStream = messageStream.map(new ClickEventMapper());

CassandraSink.addSink(clickMessageStream)
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("csn-node1.development.allianz.de").build();
        }
    })
    .build();
Write aggregated data
Checklist
At 10,000 Feet
Make it safe and fancy
Privacy
WebApplication
Reports
Dashboards
R-Scripts
Single gateway to the data
Ad Hoc Queries
Proof of Thesis
Quick Lookups
Periodic Reports
Web-based Dashboard
3rd-Party Reporting
Expert Systems
Encryption
DC 1
Node 1
Node 3
Node 5
DC 2
Node 0
Node 4
Node 2
server_encryption_options:
    internode_encryption: all
    keystore: nasmount/conf/keystore.node0
    keystore_password: changeme
    truststore: nasmount/conf/truststore.node0
    truststore_password: changeme
    require_client_auth: true
Encryption – Just easy to enable
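Possible values for internode_encryption:
all: Cassandra encrypts all traffic between nodes.
none: Cassandra does not encrypt traffic between nodes.
dc: Cassandra encrypts the traffic between the data centers.
rack: Cassandra encrypts the traffic between the racks.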
CREATE TABLE zzz …
    WITH compression_parameters:sstable_compression = 'Encryptor'
    AND compression_parameters:cipher_algorithm = 'AES/ECB/PKCS5Padding'
    AND compression_parameters:secret_key_strength = 128;
Encryption – With DSE
• Zeppelin: fine as a tool for developers and data scientists, but not suitable for C-level reports
• MicroStrategy: supports only Cassandra 2.x; needs write permissions for the column family (?)
• Tableau: accesses Cassandra via Spark (?)
Hard to find a visualization solution
• D3.js: great for visualization and has stunning features, but needs an AngularJS developer to create a new report
• R: provides simple visualization, but needs knowledge of R
Hard to find a visualization solution
CREATE ROLE flink;
CREATE ROLE productsales;
CREATE ROLE riskanalyst;

GRANT SELECT ON allianz.solditems TO productsales;
GRANT SELECT ON allianz.riskdata TO riskanalyst;
GRANT MODIFY ON KEYSPACE allianz TO flink;
Limit read / write access
The maximum period for which some detailed information may be stored is limited by law.
We have to ensure that we meet this requirement.
TTL in Cassandra does this job well:
INSERT INTO proposal (id, date, product, price) VALUES ('p-4711', '09.09.2016', 'product-1', 50.00);
UPDATE proposal USING TTL 86400 SET firstname = 'Joe' WHERE id = 'p-4711';
UPDATE proposal USING TTL 86400 SET lastname = 'Doe' WHERE id = 'p-4711';
UPDATE proposal USING TTL 172800 SET city = 'Berlin' WHERE id = 'p-4711';
Remove outdated events
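The TTL values in the statements above are given in seconds: 86400 is one day and 172800 is two days. A small helper can derive such values from a retention period instead of hard-coding them. This is only a sketch under stated assumptions: `TtlSketch`, `ttlSeconds`, and `updateWithTtl` are hypothetical names, the retention periods are examples rather than actual legal limits, and the generated statement uses bind markers (`?`) that a driver would fill in.

```java
import java.time.Duration;

// Hypothetical helper for deriving CQL TTL values from retention periods.
final class TtlSketch {
    // Converts a retention period into the whole-second value expected by USING TTL.
    static int ttlSeconds(Duration retention) {
        if (retention.isNegative() || retention.isZero()) {
            throw new IllegalArgumentException("retention must be positive");
        }
        // toIntExact throws ArithmeticException if the value does not fit into an int.
        return Math.toIntExact(retention.getSeconds());
    }

    // Builds an UPDATE on the proposal table that sets one column with a TTL.
    // The column name is trusted input here; the values are left as bind markers.
    static String updateWithTtl(String column, Duration retention) {
        return "UPDATE proposal USING TTL " + ttlSeconds(retention)
                + " SET " + column + " = ? WHERE id = ?;";
    }
}
```

Because the TTL is attached per written column, personal fields such as the name can expire earlier than less sensitive fields of the same row, which is what the statements above do with one-day and two-day values.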
Checklist
At cruising altitude
Work with it
Circle of data
Meet the experts
Extract and Enrich data
Aggregate data
Analyse the data
Visualize
Test Hypothesis
Discuss Actions
Recalculate the Speed-Layer
WebApplication
# Load RJDBC
library(RJDBC)

# Load the Cassandra JDBC driver
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", list.files("/opt/cassandra/lib/", pattern="jar$", full.names=T))

# Connect to the Cassandra node and keyspace
casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9160/allianz")
Bring the Data to R
# Query time series data
res <- dbGetQuery(casscon, "select * from solditems limit 10")

# Transpose
tres <- t(res[2:10])

# Plot
boxplot(tres, names=res$KEY, col=topo.colors(length(res$KEY)))
title("BoxPlot of 10 Sold Items Price History")
Bring the Data to R