APPLY BIG DATA ON STATISTICS AND MONITOR QUERY TO NATIONAL DNS OF VIETNAM
Danang City | 24 August 2018
Content
1. Overview
2. Apply Big Data to log analysis
3. Deploy the new monitoring system using Big Data
4. Results and conclusion
National DNS of Vietnam
• The most critical national information system.
• Established, managed, operated and secured by VNNIC.
• Distribution:
  – 5 domestic clusters in Hanoi, HCM City and Danang City
  – 2 overseas clusters
Query .VN domain
Query log:
  – Format: ISC BIND
  – Rate: 4000 qps
  – Velocity: 60 GB/day
  – Volume:
    • 20 TB minimum (for 1 year)
    • Longer retention for reporting and statistics
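The rate, velocity and volume figures above can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and treats 4000 qps as a constant average rate; neither assumption is stated on the slide.

```python
# Back-of-the-envelope check of the query log figures quoted above.
QUERIES_PER_SECOND = 4000   # from the slide (assumed here to be an average)
LOG_GB_PER_DAY = 60         # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
# Average log size per query, assuming 1 GB = 10**9 bytes.
bytes_per_query = LOG_GB_PER_DAY * 10**9 / queries_per_day
# One year of raw logs, assuming 1 TB = 1000 GB.
tb_per_year = LOG_GB_PER_DAY * 365 / 1000

print(f"{queries_per_day:,} queries/day")
print(f"~{bytes_per_query:.0f} bytes of log per query")
print(f"~{tb_per_year:.1f} TB per year")
```

Under these assumptions one year of raw logs comes to just under 22 TB, which is in line with the "20 TB minimum" figure.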
DNS Log Monitoring System
Traditional approach
[Figure: traditional architecture. The DNS servers send logs to a syslog server; a log analyser processes the data and inserts it into a database server, which reads and writes SAN storage; the stats and monitor server takes user requests, queries the database, and sends back the results.]
The limitations of the traditional approach
• Limitations:
  – Storage capacity and speed
  – Data processing performance
  – Scalability
  – High cost
• Solutions within the traditional approach:
  – Reduce the sample time
  – Eliminate information
  – Use fewer monitoring and statistics criteria
All of these reduce the value of the information!
New approaches
• Main goal: horizontal scalability in data storage and processing.
  – Distributed storage that allows parallel access (read and write).
  – Log collection can run independently and in parallel.
  – Massively parallel data processing.
• Best solutions:
  – ELK Stack
  – Splunk
  – Hadoop (Big Data)
2. Apply Big Data to log analysis
Big Data Hadoop
Hadoop Distributed File System (HDFS):
• Filesystem written in Java (based on Google's GFS).
• Sits on top of a native filesystem (ext3, ext4, NFS, …).
• Provides redundant storage for massive amounts of data.
• Data is distributed across all nodes.
Yet Another Resource Negotiator (YARN):
• Platform for managing resources in a Hadoop cluster.
• Allocates and schedules resource usage (CPU, memory).
• Runs distributed data processing applications.
• Monitors the running applications.
HDFS
Master/Slave architecture:
• DataNode: reads and writes blocks of large files as requested by clients.
• NameNode: holds all metadata (file locations, file ownership and permissions, names of individual blocks, locations of blocks).
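To make the division of labour concrete, here is a toy model of the block-to-DataNode map that a NameNode keeps. It is only an illustration: the 128 MB block size and replication factor of 3 are common HDFS defaults assumed here, the round-robin placement is a simplification (real HDFS placement is rack-aware), and the function name, path and node names are hypothetical.

```python
import itertools
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed: common HDFS default block size
REPLICATION = 3                 # assumed: common HDFS default replication factor


def plan_blocks(path, size_bytes, datanodes):
    """Return NameNode-style metadata: block id -> DataNodes holding a replica.

    Toy round-robin placement; this is not the real HDFS placement policy.
    """
    n_blocks = max(1, math.ceil(size_bytes / BLOCK_SIZE))
    replicas_per_block = min(REPLICATION, len(datanodes))
    ring = itertools.cycle(datanodes)
    block_map = {}
    for i in range(n_blocks):
        # Consecutive nodes from the ring are distinct within one block.
        block_map[f"{path}#blk_{i}"] = [next(ring) for _ in range(replicas_per_block)]
    return block_map


# Hypothetical 300 MB log file on a hypothetical four-node cluster.
layout = plan_blocks("/logs/dns/2018-08-24.log", 300 * 1024 * 1024,
                     ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in layout.items():
    print(block_id, "->", nodes)
```

The file contents live only on the DataNodes; the NameNode holds just this kind of mapping, which is why clients ask it where blocks are and then read or write the blocks directly.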
YARN
• Resource Manager (RM):
  – Manages Node Managers (tracks heartbeats).
  – Determines how resources are allocated.
  – Creates the Application Master and tracks its heartbeat.
  – Creates containers (CPU, memory) on request from the Application Master, and deallocates them when they expire or the application completes.
• Node Manager (NM):
  – Registers with the Resource Manager and reports node resources.
  – Launches Application Masters on request from the Resource Manager.
  – Launches application processes.
  – Monitors resource usage by containers.
• Job History Server (JHS):
  – Archives the metrics and metadata of MapReduce jobs.
Hadoop ecosystem
Applying Big Data to log analysis
[Figure: pipeline stages: data source, data ingestion, data storage, data processing.]
Deploy Hadoop Big Data Cluster with Cloudera
3. Deploy the new monitoring system using Big Data
Architecture of DNS Log Monitoring System
[Figure: DNS servers feed Flume collectors; the collected data serves the monitoring dashboard (Banana) and the statistics application (JDBC).]
DNS Query Log ingestion with Flume
Deploy Flume with HA
[Figure: one Flume agent per DNS server (source, channel, Avro sink) forwards log events to collector Flume agents (Avro source, channel, sink), which write to HDFS in the Big Data cluster.]
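The slide does not say how high availability is achieved. One common pattern, which Flume supports through its failover sink processor, is to give each collector a priority and send to the highest-priority collector that is still reachable. The Python below only illustrates that selection logic; it is not Flume configuration, and the function name, collector names and priorities are hypothetical.

```python
def pick_collector(collectors, is_alive):
    """Return the highest-priority reachable collector, or None if all are down.

    collectors: dict mapping collector name -> priority (higher wins).
    is_alive:   callable taking a collector name and returning True or False.
    """
    for name, _priority in sorted(collectors.items(), key=lambda kv: -kv[1]):
        if is_alive(name):
            return name
    return None


# Hypothetical setup with two collectors, as in the diagram.
collectors = {"collector-1": 10, "collector-2": 5}

print(pick_collector(collectors, lambda name: True))
print(pick_collector(collectors, lambda name: name != "collector-1"))
```

With both collectors up the agent uses `collector-1`; if it becomes unreachable, events go to `collector-2` until the preferred collector recovers.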
DNS Log Analysis
[Figure: in each collector Flume agent, the Avro source (fed by the Avro sinks of the DNS-side agents) writes to two channels: channel 1 feeds an HDFS sink and channel 2 feeds a Solr sink.]
• Performed in the Flume collector.
• Uses Morphlines:
  – Interceptor
  – MorphlineSolrSink
• Output:
  – HDFS Sink: Hive table data (data warehouse) that can be queried with Impala; used for BI and machine learning.
  – Solr Sink: search engine that serves the monitoring system's queries.
DNS Log Analysis with Morphlines
• Morphlines support:
  – Grok
  – GeoIP (MaxMind GeoIP2 database)
  – Translate
  – Conditional (IF THEN ELSE)
  – Java code
  – Load Solr
• Parse unstructured log data into fields: client IP, server IP, query time, domain, query type, record type, …
• Enrich the information:
  – TLDs, type of domain (ASCII, IDN)
  – Version of IP address (v4 or v6)
  – Location of client
  – Query parameters (DNSSEC, TCP/UDP, EDNS, …)
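The parse-then-enrich steps can be sketched outside Morphlines. The Python below is not the deck's Morphline configuration; it is a minimal stand-in that applies a regular expression (playing the role of a Grok pattern) to one BIND-style query log line and derives a few of the enrichment fields listed above. The sample line, the field names and the flag interpretation (`E` for EDNS, `T` for TCP, `D` for the DNSSEC OK bit) are assumptions based on the usual BIND query log layout, which varies with BIND version and logging configuration. Client location is left out because it needs a GeoIP database.

```python
import ipaddress
import re

# Assumed BIND-style query log line, for illustration only (made-up names/IPs).
SAMPLE = ("24-Aug-2018 10:15:30.123 client 203.0.113.5#53211 (xn--vd-hma.vn): "
          "query: xn--vd-hma.vn IN AAAA -E(0)DT (192.0.2.53)")

# Rough equivalent of a Grok pattern for the line above.
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?:.*? )?client (?:@\S+ )?"
    r"(?P<client_ip>[0-9A-Fa-f.:]+)#\d+ \([^)]*\): "
    r"query: (?P<domain>\S+) (?P<qclass>\S+) (?P<record_type>\S+) "
    r"(?P<flags>[+-]\S*) \((?P<server_ip>[0-9A-Fa-f.:]+)\)"
)


def parse_query_line(line):
    """Parse one query log line into a dict of fields, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    try:
        rec["client_ip_version"] = ipaddress.ip_address(rec["client_ip"]).version
    except ValueError:
        return None
    flags = rec["flags"]
    labels = rec["domain"].rstrip(".").lower().split(".")
    rec["tld"] = labels[-1]
    # IDN labels are carried in ASCII with the "xn--" prefix (Punycode).
    rec["is_idn"] = any(label.startswith("xn--") for label in labels)
    rec["protocol"] = "TCP" if "T" in flags else "UDP"
    rec["edns"] = "E" in flags
    rec["dnssec_ok"] = "D" in flags
    return rec


record = parse_query_line(SAMPLE)
print(record)
```

In the deployed system each resulting record would be indexed into Solr and written to HDFS by the two Flume sinks.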
Data visualization with Banana
Lucidworks Banana:
• Similar to Elastic's Kibana.
• Uses Solr for data analysis and display.
• Based on dashboards, which contain rows of panels.
• Supported panels:
  – Histogram panel
  – Heatmap panel
  – Table panel
  – Term panel
4. Results and conclusion
Dashboard of new DNS Log Monitoring System
Features of new DNS Log Monitoring System
• Near-real-time monitoring.
• Time picker: relative or absolute.
• Flexible filters:
  – Filter on a field's value by clicking on a panel.
  – Advanced filtering.
• Monitoring criteria:
  – Number of queries per DNS cluster and DNS server
  – Top client IP addresses
  – Top domains (.VN, other TLDs)
  – Client location
  – Record type
  – IPv4/IPv6 queries
  – DNSSEC or non-DNSSEC queries
  – Query protocol (TCP/UDP)
  – Recent DNS queries
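Criteria such as "top client IP addresses" over a relative time window map naturally onto a Solr facet query, which is the kind of request a dashboard panel sends to Solr. The sketch below builds such a query string using standard Solr parameters (`q`, `fq`, `rows`, `facet`, `facet.field`, `facet.limit`, `wt`). The field names `client_ip` and `event_timestamp` and the function name are hypothetical, since the deck does not show the actual schema.

```python
from urllib.parse import parse_qs, urlencode


def top_clients_query(minutes=15, limit=10):
    """Build a Solr query string for the top client IPs in the last `minutes`."""
    params = {
        "q": "*:*",
        # Relative time window using Solr date math; field name is hypothetical.
        "fq": f"event_timestamp:[NOW-{minutes}MINUTES TO NOW]",
        "rows": 0,                   # only facet counts are needed, not documents
        "facet": "true",
        "facet.field": "client_ip",  # hypothetical field name
        "facet.limit": limit,
        "wt": "json",
    }
    return urlencode(params)


query_string = top_clients_query(minutes=5, limit=3)
print(query_string)
print(parse_qs(query_string)["fq"])
```

Swapping `facet.field` for another indexed field (record type, TLD, protocol) would give the other "top N" criteria in the same way.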
Conclusions
• The traditional approach to log analysis has many limitations: performance, scalability.
• Big Data can solve this problem and extract more valuable information from raw data sources.
• Our solution can be applied to:
  – ISP DNS servers' query logs.
  – Other IT systems' logs.
• Future plans:
  – Apply machine learning to the data warehouse to secure the National DNS System and .VN domain names.
  – Apply BI to operating the National DNS System and to policy making on Internet resource management.
Thank you!