APPLY BIG DATA ON STATISTICS AND MONITOR QUERY TO …2018.vnix-nog.vn/upload/files/13-VNNIC-ApplyBig... · Lucidworks Banana: • Similar to Elastics Kibana • Use Solr for data

APPLY BIG DATA ON STATISTICS AND MONITOR QUERY TO NATIONAL DNS OF VIETNAM

Danang City | 24 August 2018

Content

1. Overview

2. Apply Big Data to log analysis

3. Deploy the new monitoring system using Big Data

4. Results and conclusion

National DNS of Vietnam

• The most critical national information system.

• Established, managed, operated and secured by VNNIC.

• Distribution:
– 5 domestic clusters in Hanoi, HCM City and Danang City
– 2 overseas clusters

Query .VN domain

Query log:
– Format: ISC BIND
– Rate: 4000 qps
– Velocity: 60 GB/day
– Volume:
• 20 TB minimum (for 1 year)
• Longer retention for reports and statistics
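
The figures above can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 GB = 10**9 bytes) and a steady average query rate; neither assumption is stated on the slide.

```python
# Back-of-the-envelope sizing from the figures on the slide.
# Assumptions (not from the slides): decimal units and a constant
# average query rate over the whole day.
QPS = 4000                       # queries per second (from the slide)
GB_PER_DAY = 60                  # log volume per day (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = QPS * SECONDS_PER_DAY
bytes_per_query = GB_PER_DAY * 10**9 / queries_per_day
tb_per_year = GB_PER_DAY * 365 / 1000

print(f"queries per day : {queries_per_day:,}")
print(f"bytes per query : {bytes_per_query:.0f}")
print(f"TB per year     : {tb_per_year:.1f}")
```

Under these assumptions one year of logs comes to roughly 22 TB, which is in line with the 20 TB minimum quoted above.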

DNS Log Monitoring System

Traditional approach

[Figure: traditional architecture. Components: three DNS servers, syslog server, log analyser, database server, SAN storage, stats and monitor server. Labelled flows: send log (from each DNS server), process data, insert data, read & write, query data, return data, user request, send result.]

Limitations of the traditional approach

• Limitations:
– Storage capacity and speed
– Data processing performance
– Scalability
– High cost
• Workarounds within the traditional approach:
– Reduce the sampling time
– Discard information
– Use fewer monitoring and statistics criteria

All of these reduce the value of the information!

New approaches

• Main goal: horizontal scalability in data storage and processing.
– Distributed storage that supports parallel access (read and write).
– Log collection can run independently and in parallel.
– Massively parallel data processing.
• Best solutions:
– ELK Stack
– Splunk
– Hadoop (Big Data)

Big Data Hadoop

Hadoop Distributed File System (HDFS):
• Filesystem written in Java (based on Google's GFS)
• Sits on top of a native filesystem (ext3, ext4, NFS, …)
• Provides redundant storage for massive amounts of data
• Data is distributed across all nodes

Yet Another Resource Negotiator (YARN):
• Platform for managing resources in a Hadoop cluster
• Allocates and schedules resource usage (CPU, memory)
• Runs distributed data processing applications
• Monitors application processes

HDFS

Master/Slave architecture:
• DataNode: reads and writes blocks of large files on request from clients.
• NameNode: holds all metadata (file locations, file ownership and permissions, names of individual blocks, locations of blocks).
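
To make the split between metadata and data concrete, here is a toy model of the bookkeeping a NameNode does. It is only a sketch: real HDFS placement is rack-aware rather than round-robin, and the path and node names are hypothetical. The 128 MiB block size and replication factor of 3 are the usual Hadoop defaults, not values taken from the slides.

```python
import math
from itertools import cycle

BLOCK_SIZE = 128 * 1024**2   # 128 MiB, the usual dfs.blocksize default
REPLICATION = 3              # the usual dfs.replication default

def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [datanode, ...]} for one file, using round-robin placement."""
    n_blocks = max(1, math.ceil(file_size / block_size))
    ring = cycle(datanodes)
    return {i: [next(ring) for _ in range(replication)] for i in range(n_blocks)}

# Toy NameNode metadata: file path -> block map (hypothetical path and nodes).
blocks = place_blocks(300 * 1024**2, ["dn1", "dn2", "dn3", "dn4"])
namenode = {"/dns/2018-08-24.log": blocks}

for index, nodes in blocks.items():
    print(index, nodes)
```

The client asks the NameNode where the blocks live, then reads or writes the blocks directly on the listed DataNodes.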

YARN

• Resource Manager (RM):
– Manages Node Managers (tracks heartbeats)
– Determines how resources are allocated
– Creates the Application Master and tracks its heartbeat
– Creates containers (CPU, memory) on request from the Application Master; deallocates containers when they expire or the application completes
• Node Manager (NM):
– Registers with the Resource Manager and reports node resources
– Launches Application Masters on request from the Resource Manager
– Launches application processes
– Monitors resource usage by containers
• Job History Server (JHS):
– Archives MapReduce jobs' metrics and metadata
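
The container bookkeeping described above can be pictured with a deliberately simplified first-fit allocator. This is an illustration only: the real YARN schedulers (Capacity, Fair) are far more involved, and the node names and sizes below are made up.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Free resources reported by one Node Manager (hypothetical values)."""
    name: str
    vcores: int
    memory_mb: int

def allocate(nodes, vcores, memory_mb):
    """First-fit: reserve a container on the first node with enough free resources."""
    for node in nodes:
        if node.vcores >= vcores and node.memory_mb >= memory_mb:
            node.vcores -= vcores
            node.memory_mb -= memory_mb
            return node.name
    return None  # no capacity: the request would have to wait

cluster = [Node("nm1", 4, 8192), Node("nm2", 8, 16384)]
requests = [(2, 4096), (4, 8192), (2, 4096), (8, 16384)]
placements = [allocate(cluster, v, m) for v, m in requests]
print(placements)
```

The last request finds no node with enough free capacity, which is the situation in which a real cluster would queue the application until containers are released.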

Hadoop ecosystem

Applying Big Data to log analysis

[Figure: pipeline stages shown: data source, data ingestion, data storage, data processing.]

Deploy Hadoop Big Data Cluster with Cloudera

Architecture of DNS Log Monitoring System

[Figure: architecture diagram. Components shown: four DNS servers, two Flume collectors, monitoring (Banana), statistics application (JDBC).]

DNS Query Log ingestion with Flume

Deploy Flume with HA

[Figure: Flume topology. Each of the four DNS servers has a Flume agent (Source → Channel → AvroSink). Two collector Flume agents (AvroSource → Channel → Sink) write to HDFS in the Big Data cluster.]
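
The slides do not say which HA mechanism is configured. Flume offers failover and load-balancing sink processors; the sketch below only illustrates the failover idea in plain Python, with hypothetical collector names and no real Flume API involved.

```python
def send_with_failover(event, collectors):
    """Try collectors in priority order and return the name of the one that accepted the event."""
    for name, send in collectors:
        try:
            send(event)
            return name
        except ConnectionError:
            continue  # this collector is down, fall through to the next one
    raise RuntimeError("no collector available; the event stays in the channel")

received = []

def collector_a(event):
    raise ConnectionError("collector-a unreachable")  # simulated outage

def collector_b(event):
    received.append(event)

used = send_with_failover(
    "query log line",
    [("collector-a", collector_a), ("collector-b", collector_b)],
)
print(used, received)
```

With two collectors, the loss of one does not interrupt log delivery as long as the agents can reach the other.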

DNS Log Analysis

[Figure: collector detail. AvroSinks feed the AvroSource of each of the two collector Flume agents. Each collector has two channels: Channel 1 → HDFS Sink → HDFS, and Channel 2 → Solr Sink → Solr.]

• Performed in the Flume collector
• Uses Morphlines:
– Interceptor
– MorphlineSolrSink
• Output:
– HDFS Sink: Hive table data (data warehouse), which can be queried with Impala; used for BI and machine learning.
– Solr Sink: search engine that serves the monitoring system's queries.

DNS Log Analysis with Morphlines

• Morphlines support:
– Grok
– GeoIP (MaxMind GeoIP2 database)
– Translate
– Conditional (IF THEN ELSE)
– Java code
– Load Solr
• Parse unstructured log data into fields: client IP, server IP, query time, domain, query type, record type, …
• Enrich the information:
– TLDs, type of domain (ASCII, IDN)
– IP address version (v4 or v6)
– Location of client
– Query parameters (DNSSEC, TCP/UDP, EDNS, …)
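
The parse-and-enrich steps above can be sketched in plain Python. This is not the Morphline configuration from the talk: the regular expression and field names are assumptions modelled on the general shape of BIND 9 query log lines, whose exact layout varies by version. GeoIP lookup is left out because it needs a MaxMind database file.

```python
import ipaddress
import re

# Assumed line shape, modelled on BIND 9 query logging.
LINE_RE = re.compile(
    r"client (?:@0x[0-9a-f]+ )?(?P<client_ip>[0-9a-fA-F.:]+)#(?P<port>\d+) "
    r"\((?P<qname>[^)]*)\): query: \S+ (?P<qclass>\S+) (?P<qtype>\S+) "
    r"(?P<flags>[-+]\S*) \((?P<server_ip>[^)]+)\)"
)

def parse(line):
    """Parse one query log line into fields and add derived fields; None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    flags = rec["flags"]
    qname = rec["qname"].rstrip(".").lower()
    rec["ip_version"] = ipaddress.ip_address(rec["client_ip"]).version
    rec["tld"] = qname.rsplit(".", 1)[-1]
    rec["is_idn"] = any(label.startswith("xn--") for label in qname.split("."))
    rec["protocol"] = "TCP" if "T" in flags else "UDP"   # T flag: query came over TCP
    rec["edns"] = "E" in flags                            # E flag: EDNS in use
    rec["dnssec_ok"] = "D" in flags                       # D flag: DO bit set
    return rec

# Illustrative line using documentation addresses, not real log data.
sample = ("24-Aug-2018 10:15:30.123 queries: info: client 192.0.2.10#53124 "
          "(example.vn): query: example.vn IN A +E(0)D (203.0.113.1)")
print(parse(sample))
```

Each derived field corresponds to one of the enrichment items listed above, so the resulting record can be indexed and filtered by any of them.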

Data visualization with Banana

Lucidworks Banana:
• Similar to Elastic's Kibana
• Uses Solr for data analysis and display
• Based on dashboards, which contain rows of panels
• Supported panels:
– Histogram Panel
– Heatmap Panel
– Table Panel
– Term Panel

Dashboard of new DNS Log Monitoring System

Features of new DNS Log Monitoring System

• Near-real-time monitoring system.
• Time picker: relative, absolute.
• Flexible filters:
– Filter on a field's value by clicking on a panel.
– Advanced filtering.
• Monitoring criteria:
– Number of queries in DNS clusters, DNS servers
– Top client IP addresses
– Top domains (.VN, other TLDs)
– Client location
– Record type
– IPv4/IPv6 queries
– DNSSEC or non-DNSSEC queries
– Query protocol (TCP/UDP)
– Recent DNS queries
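
A "top N" criterion such as top client IP addresses maps naturally onto a Solr facet query. The sketch below only builds the request URL using standard Solr query parameters; the host, the collection name `dnslog` and the field names are hypothetical, since the slides do not give the actual schema.

```python
from urllib.parse import urlencode

def top_terms_url(base_url, collection, field, limit=10, filters=()):
    """Build a Solr /select URL that returns the most frequent values of one field."""
    params = [
        ("q", "*:*"),            # match all documents
        ("rows", 0),             # only facet counts are needed, not documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
    ]
    params += [("fq", f) for f in filters]   # filter queries, e.g. from a clicked panel
    return f"{base_url}/{collection}/select?{urlencode(params)}"

url = top_terms_url(
    "http://localhost:8983/solr", "dnslog", "client_ip",
    filters=["protocol:TCP"],
)
print(url)
```

Clicking a value in a panel would correspond to adding another filter query to the same request.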

Conclusions

• The traditional approach to log analysis has many limitations: performance, scalability.

• Big Data can solve this problem and extract more valuable information from raw data sources.

• Our solution can be applied to:
– ISP DNS servers' query logs.
– Other IT systems' logs.

• Future plans:
– Apply machine learning on the data warehouse to secure the National DNS System and .VN domain names.
– Apply BI to operating the National DNS System and to making policy on Internet resource management.

Thank you!
