APPLY BIG DATA ON STATISTICS AND MONITOR QUERY TO NATIONAL DNS OF VIETNAM Danang City | 24 August 2018


  • APPLY BIG DATA ON STATISTICS AND MONITOR QUERY TO NATIONAL DNS OF VIETNAM

    Danang City | 24 August 2018

  • Content

    1. Overview

    2. Applying Big Data to log analysis

    3. Deploy the new monitoring system using Big Data

    4. Results and conclusion

  • National DNS of Vietnam

    • Among the most critical national information systems.

    • Established, managed, operated and secured by VNNIC.

    • Distribution:
      – 5 domestic clusters in Hanoi, HCM City and Danang City
      – 2 overseas clusters

  • Query .VN domain

    Query log:
      – Format: ISC BIND
      – Rate: 4,000 queries per second (qps)
      – Velocity: 60 GB/day
      – Volume:
        • 20 TB minimum (for 1 year)
        • More when logs are kept longer for reports and statistics
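The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes the 4,000 qps rate is sustained around the clock and uses decimal units (1 TB = 1000 GB); neither assumption is stated in the slides.

```python
# Figures from the slide; a sustained rate and decimal units are assumptions.
QUERIES_PER_SECOND = 4000
LOG_GB_PER_DAY = 60

SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
bytes_per_query = LOG_GB_PER_DAY * 1000**3 / queries_per_day
tb_per_year = LOG_GB_PER_DAY * 365 / 1000

print(f"queries per day: {queries_per_day:,}")
print(f"average log bytes per query: {bytes_per_query:.0f}")
print(f"raw log volume per year: {tb_per_year:.1f} TB")
```

Under these assumptions a year of raw logs comes to roughly 22 TB, which is in line with the "20 TB minimum" stated above.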

  • DNS Log Monitoring System

  • Traditional approach

    [Diagram: three DNS servers send their logs to a syslog server. Other components shown: log analyser, database server, SAN storage, and a stats and monitor server. Arrow labels: insert data, read & write, process data, query data, return data, user request, send result.]

  • The limitations of traditional approach

    • Limitations:
      – Storage capacity and speed
      – Data processing performance
      – Scalability
      – High cost

    • Workarounds within the traditional approach:
      – Reduce the sample time
      – Discard information
      – Use fewer monitoring and statistics criteria

    All of these reduce the value of the information!

  • New approaches

    • Main goal: horizontal scalability in data storage and processing.
      – Distributed storage that supports parallel access (read and write).
      – Log collection that runs independently and in parallel.
      – Massively parallel data processing.

    • Best solutions:
      – ELK Stack
      – Splunk
      – Hadoop (Big Data)

  • Content

    1. Overview

    2. Applying Big Data to log analysis

    3. Deploy the new monitoring system using Big Data

    4. Results and conclusion

  • Big Data Hadoop

    Hadoop Distributed File System (HDFS):
      • Filesystem written in Java (based on Google's GFS).
      • Sits on top of a native filesystem (ext3, ext4, NFS, …).
      • Provides redundant storage for massive amounts of data.
      • Data is distributed across all nodes.

    Yet Another Resource Negotiator (YARN):
      • Platform for managing resources in a Hadoop cluster.
      • Allocates and schedules resource usage (CPU, memory).
      • Runs distributed data processing applications.
      • Monitors the application processes.

  • HDFS

    Master/slave architecture:
      • DataNode: reads and writes blocks of large files on request from clients.
      • NameNode: holds all metadata (file locations, file ownership and permissions, names of individual blocks, locations of blocks).
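To make the NameNode/DataNode split concrete, here is a toy sketch of the block map a NameNode keeps. The 128 MB block size and replication factor of 3 are common HDFS defaults assumed here, not values from the slides; the node names and the round-robin placement are hypothetical (real HDFS uses a rack-aware placement policy), and a day of logs is treated as a single file for simplicity.

```python
import math
from itertools import cycle

BLOCK_SIZE_MB = 128   # assumed: common HDFS default block size
REPLICATION = 3       # assumed: common HDFS default replication factor
DATANODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names


def plan_blocks(file_size_mb, datanodes=DATANODES):
    """Return a NameNode-style map: block index -> DataNodes holding a replica."""
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    ring = cycle(datanodes)
    block_map = {}
    for block in range(n_blocks):
        # Round-robin placement, for illustration only.
        block_map[block] = [next(ring) for _ in range(REPLICATION)]
    return block_map


daily_log_mb = 60 * 1024  # the 60 GB/day figure from the slides, read as GiB
block_map = plan_blocks(daily_log_mb)
print(len(block_map), "blocks,", len(block_map) * REPLICATION, "block replicas")
print("block 0 replicas:", block_map[0])
```

The map is pure metadata, which is why the NameNode can hold it while the DataNodes store the actual block contents.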

  • YARN

    • Resource Manager (RM):
      – Manages Node Managers (tracks heartbeats).
      – Determines how resources are allocated.
      – Creates the Application Master and tracks its heartbeat.
      – Creates containers (CPU, memory) on request from the Application Master. Deallocates containers when they expire or the application completes.

    • Node Manager (NM):
      – Registers with the Resource Manager and reports the node's resources.
      – Launches Application Masters on request from the Resource Manager.
      – Launches application processes.
      – Monitors resource usage by containers.

    • Job History Server (JHS):
      – Archives the metrics and metadata of MapReduce jobs.

  • Hadoop ecosystem

  • Applying Big Data to log analysis

    [Diagram: pipeline stages shown are data source, data ingestion, data storage and data processing.]

  • Deploy Hadoop Big Data Cluster with Cloudera

  • Content

    1. Overview

    2. Applying Big Data to log analysis

    3. Deploy the new monitoring system using Big Data

    4. Results and conclusion

  • Architecture of DNS Log Monitoring System

    [Diagram: components shown are four DNS servers, two Flume collectors, Monitoring (Banana) and Statistics Application (JDBC).]

  • DNS Query Log ingestion with Flume

  • Deploy Flume with HA

    [Diagram: each of the four DNS servers has a Flume agent (Source → Channel → AvroSink). Two collector Flume agents (AvroSource → Channel → Sink) receive the events and write to HDFS in the Big Data cluster.]

  • DNS Log Analysis

    [Diagram: AvroSinks feed two collector Flume agents. In each collector, the AvroSource feeds Channel 1 → HDFS Sink → HDFS and Channel 2 → Solr Sink → Solr.]

    • Performed in the Flume collector.
    • Uses Morphlines:
      • Interceptor
      • MorphlineSolrSink
    • Output:
      • HDFS Sink: Hive table data (data warehouse), which can be queried with Impala. Used for BI and machine learning.
      • Solr Sink: search engine that serves the monitoring system's queries.

  • DNS Log Analysis with Morphlines

    • Morphlines support:
      – Grok
      – GeoIP (MaxMind GeoIP2 database)
      – Translate
      – Conditional (IF THEN ELSE)
      – Java code
      – Load Solr

    • Parse unstructured log data into fields: client IP, server IP, query time, domain, query type, record type, …

    • Enrich the information:
      • TLD and type of domain (ASCII, IDN)
      • IP address version (v4 or v6)
      • Client location
      • Query parameters (DNSSEC, TCP/UDP, EDNS, …)
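The parse-and-enrich step can be illustrated outside Morphlines. The following is a minimal Python sketch, not the Morphlines configuration used in the deployment. The sample line is made up in the style of BIND 9 query logging, the output field names are hypothetical, and the flag letters (E for EDNS, T for TCP, D for the DNSSEC OK bit) follow BIND's documented query log flags. Client location is left out because it needs a GeoIP database.

```python
import ipaddress
import re

# Illustrative line in the style of BIND 9 query logging; not from the slides.
SAMPLE = (
    "24-Aug-2018 09:15:02.123 queries: info: client 2001:db8::53#41522 "
    "(xn--bcher-kva.vn): query: xn--bcher-kva.vn IN AAAA +E(0)TD (203.0.113.10)"
)

LINE_RE = re.compile(
    r"(?P<query_time>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"client (?:@0x[0-9a-f]+ )?(?P<client_ip>[0-9A-Fa-f:.]+)#(?P<client_port>\d+) "
    r"\([^)]*\): query: (?P<domain>\S+) (?P<query_class>\S+) "
    r"(?P<record_type>\S+) (?P<flags>[+-]\S*) \((?P<server_ip>[^)]+)\)"
)


def parse_and_enrich(line):
    """Split one query log line into fields, then add derived fields."""
    match = LINE_RE.search(line)
    if match is None:
        return None  # line is not a query log entry
    record = match.groupdict()
    labels = record["domain"].rstrip(".").lower().split(".")
    flags = record["flags"]
    record.update(
        tld=labels[-1],
        # Labels starting with "xn--" are punycode-encoded IDN labels.
        domain_type="IDN" if any(l.startswith("xn--") for l in labels) else "ASCII",
        ip_version=ipaddress.ip_address(record["client_ip"]).version,
        protocol="TCP" if "T" in flags else "UDP",
        edns="E" in flags,
        dnssec_ok="D" in flags,
    )
    return record


for key, value in parse_and_enrich(SAMPLE).items():
    print(f"{key}: {value}")
```

Each derived field corresponds to one of the enrichment bullets above, so the resulting record can be indexed with one searchable field per monitoring criterion.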

  • Data visualization with Banana

    Lucidworks Banana:
      • Similar to Elastic's Kibana
      • Uses Solr for data analysis and display
      • Based on dashboards, which contain rows of panels
      • Supported panels:
        – Histogram Panel
        – Heatmap Panel
        – Table Panel
        – Term Panel

  • Content

    1. Overview

    2. Applying Big Data to log analysis

    3. Deploy the new monitoring system using Big Data

    4. Results and conclusion

  • Dashboard of new DNS Log Monitoring System

  • Features of new DNS Log Monitoring System

    • Near-real-time monitoring system.
    • Time picker: relative, absolute.
    • Flexible filters:
      – Filter on a field's value by clicking on a panel.
      – Advanced filtering.

    • Monitoring criteria:
      – Number of queries per DNS cluster and DNS server
      – Top client IP addresses
      – Top domains (.VN, other TLDs)
      – Client location
      – Record type
      – IPv4/IPv6 queries
      – DNSSEC or non-DNSSEC queries
      – Query protocol (TCP/UDP)
      – Recent DNS queries
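Criteria such as "top client IP addresses" map naturally onto Solr faceting. The sketch below builds a request URL with standard Solr query parameters (`q`, `fq`, `rows`, `facet`, `facet.field`, `facet.limit`, `wt`). The host, collection name and the field names `query_time` and `client_ip` are hypothetical, since the slides do not show the actual Solr schema.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name.
SOLR_SELECT_URL = "http://solr.example.internal:8983/solr/dns_queries/select"


def top_clients_url(minutes=15, limit=10):
    """Build a facet query for the busiest client IPs in a recent time window."""
    params = [
        ("q", "*:*"),
        ("fq", f"query_time:[NOW-{minutes}MINUTES TO NOW]"),  # hypothetical date field
        ("rows", 0),                    # only facet counts are needed, not documents
        ("facet", "true"),
        ("facet.field", "client_ip"),   # hypothetical field name
        ("facet.limit", limit),
        ("wt", "json"),
    ]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(top_clients_url())
```

Swapping `facet.field` for another indexed field (record type, protocol, TLD) would cover most of the other criteria with the same request shape.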

  • Conclusions

    • The traditional approach to log analysis has many limitations: performance, scalability.

    • Big Data can solve this problem and extract more valuable information from raw data sources.

    • Our solution can also be applied to:
      – ISP DNS servers' query logs.
      – Other IT systems' logs.

    • Future plans:
      – Apply machine learning to the data warehouse to secure the National DNS System and .VN domain names.
      – Apply BI to operating the National DNS System and to making policy on Internet resource management.

  • Thank you!