IKANOW Information Security Analytics (ISA) Threat Intelligence Platform
System Architecture Guide v1.0
Continuous Cyber Security Optimization
Big Data Cyber Analytics

9/1/2015
Document Release: 1.0
Document Number: PN200
Sholeh Gregory

IKANOW, 11921 Freedom Drive, Suite 550, Reston, VA 20190
www.ikanow.com

IKANOW System Architecture Guide

IKANOW Information Security Analytics (ISA) Threat Intelligence Platform

2015

TABLE OF CONTENTS

Preface
Who Should use this Document
Conventions Used in this Document
Other IKANOW Documentation
Contact Information
1 Cyber Security Threats Landscape
Current Security Paradigm
Challenges Around Information Security Analytics
2 Solution: IKANOW Next Generation Information Security Analytics Overview
Open, Flexible, Scalable Threat Intelligence Platform
Constant Calibration of ISA Security Posture
How IKANOW ISA Works
Data Ingestion, Curation, Enrichment
3 Information Security Analytics Core Features
Three-Step Data Sources Ingestion
Comprehensive and Collaborative Visualizations and Reports
Third-Party Tools and Applications Integration
Robust Sorting and Searching
4 IKANOW ISA Solution Architecture Overview
Traditional vs. Next-Generation ISA Application Architecture Key Points
Next-Generation ISA Architecture Requirements
Front-End ISA Application Components
Back-End ISA Architecture Core Components
Middleware Data Analytics Services
5 Data Source Management
Data Sources
Data Source Documents
Document Entities
Document Associations
Matching Document Types
Top Documents
Filtered Documents
Aggregations Documents
6 Basic Data Elements
Data Objects
Data Services
Object Schema
Data Import
Harvesting Data Import
Enrichment Data Import
Data Buckets
Harvesting Configuration
Harvesting Data Enrichment
Data Enrichment Lists
Data Analytics
Data Security
Plugin Libraries
7 Data Source Processing Pipeline
Data Source Processing Types
Input Sources Processing
Custom Processing Sources
Data Input Sources

IMPORTANT NOTICE

The information contained in the document is believed to be reliable, but IKANOW makes no warranties as to its accuracy or completeness. IKANOW does not warrant or represent that any license, either express or implied, is granted under any IKANOW patent right, copyright, or other IKANOW intellectual property right relating to any combination or process in which IKANOW products or services are used. Information published by IKANOW regarding third-party products or services does not constitute a license from IKANOW to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from IKANOW under the patents or other intellectual property of IKANOW.

IKANOW Threat Analytics Platform™ is a trademark of IKANOW, Inc.

Copyright © 2015, IKANOW Inc. All rights reserved.

PREFACE

The Information Security Analytics (ISA) Threat Intelligence Platform System Architecture Guide describes the product’s core features, architecture, design requirements, and components, and the role each component plays in the IKANOW Threat Intelligence Platform solution.

WHO SHOULD USE THIS DOCUMENT

This guide is intended for the following audiences:

Cyber Security Analyst

Special Operations Engineer

Tier 1, 2, 3 Cyber Analyst

Social Media Analyst

IT Security Engineer

Chief Information Security Officer (CISO)

Sales Engineers

System Architects and Designers

CONVENTIONS USED IN THIS DOCUMENT

Table 1 describes the typographic conventions used in this guide.

Table 1. Typographic Conventions

Convention | Meaning | Example
courier font | Names of commands, files, on-screen computer output. | Edit your .login file. / Use ls -a to list all files. / machine_name% test.doc.
italics | Document titles, new terms, words to be emphasized. Variables that you replace with a real name or value. | Read Chapter 6 in User's Guide. / These are called class options. / You must be root to do this. / Type rm filename to delete a file.
boldface Consolas font | What you type. | machine_name% su / Password:

OTHER IKANOW DOCUMENTATION

IKANOW Community Edition Documentation

IKANOW Enterprise Edition Documentation

CONTACT INFORMATION

Your feedback is always welcome. Please feel free to submit questions, comments, and feedback to [email protected].

CYBER SECURITY THREATS LANDSCAPE

“Today’s cyber security threats are dynamic and asymmetric,” says Chris Morgan, President of IKANOW. “Organizations need to change their approach to tackle these new threats effectively.”

Companies, governments, and non-governmental organizations (NGOs) alike need to understand and defend themselves against advanced persistent threats. Cyber attacks are in the headlines nearly every day, indicating that virtually every enterprise has been breached. In July 2015, major data breaches at Trump Hotel Collection, AshleyMadison, UCLA Health System, Service System Associates, and St. Francis Health all made headlines.

Figure 1. IKANOW Major Breach Index, July 2015

The impact of breaches is disastrous. According to the Mandiant M-Trends Report, the average organization takes 229 days, or more than seven months, just to detect a data breach. There is a 22 percent chance that a data breach today will compromise 10,000 or more records, according to the Ponemon Institute’s 2014 Cost of Data Breach Study.

Furthermore, the average cost of a data breach for Fortune 1000 companies has risen 15 percent over the last year to $3.5 million (€3.15 million), according to the same study. For organizations that store personal health information, partner breaches compromising client information could result in regulatory fines.

Response organizations and Fortune 1000 companies face a similar big data problem in this era of increasing vulnerabilities, where attacks are increasingly dynamic and asymmetric in nature. Attacks arrive constantly, wave after wave, while organizations become more vulnerable through Bring Your Own Device (BYOD) programs, increased use of cloud storage, and ubiquitous Internet use at work.

Information security professionals need to optimize their resources to meet the rising cyber security challenges they face today. Many organizations are looking to big data, as well as elaborate networks of disparate security systems, to thwart these types of attacks. Current network security solutions collect huge amounts of data. In fact, standard security information and event management (SIEM) products collect so much data that companies struggle to operate them.

According to the 2013 SIEM Survey from EiQ Networks, 52 percent of all companies require two or more full-time analysts to manage their unwieldy SIEM deployments. This does not account for the additional monetary and personnel resources needed to analyze the extensive amount of data many organizations collect from external threat intelligence feeds, such as FireEye’s DTI, Symantec DeepInsights, and iSight Partners.

The problem, according to Mark Nicolett, a managing VP at Gartner, is not that organizations do not have enough security data. “We are not suffering from a lack of data,” Nicolett told Dark Reading. “We are suffering from a lack of intelligence in analyzing it.” In other words, collecting more data will be of no help if you cannot find the story within the data.

By taking these disparate sets of data out of their storage areas, and then analyzing and visualizing them, organizations can act on the small data that counts and ultimately improve their security posture.

CURRENT SECURITY PARADIGM

The current security paradigm is based on a big data approach. An immense amount of data is collected and stored from various sources such as log data, external threat intelligence feeds, and open source intelligence (OSINT) data.

Since this data lives in separate places, there is no efficient way for even the best cyber analysts to bring this information together, to find relevant correlations, or to take action on that data.

An advanced data analytics platform can discover the relevant, small data to conquer the big data problem. An effective threat analytics platform can ingest external threat intelligence information and enterprise security data to map known and previously mitigated attacks, along with current security data to detect attacks already underway.

CHALLENGES AROUND INFORMATION SECURITY ANALYTICS

Bringing all of that big data together into one central place in a logical way, while automating critical security tasks, allows organizations to increase productivity and streamline the use of resources required to understand current threats, bolster defenses, and detect threats.

Analyzing and visually representing the full spectrum of internal, external, structured, semi-structured, and unstructured data together allows organizations to find the small data that is meaningful and actionable. Only then can IT professionals effectively deploy limited resources and establish effective protocols for thwarting and addressing breaches.

When you understand the current threats and vulnerabilities faced by your organization, you can effectively deploy defensive resources to protect the most valuable and most vulnerable assets. In some cases, attacks target internal systems that can be disabled by distributed denial-of-service (DDoS) attacks. In many cases, cyber attacks target proprietary or customer information. Some intrusions leave a backdoor that becomes a foothold for future attacks.

Big data alone is not enough to defend against the ever-changing and ever-increasing specter of cyber threats. By rethinking the way security and threat intelligence data is collected, analyzed, and reported, security stakeholders can visualize the full threat landscape. This will enable them to find the small data that really matters, allowing them to respond to threats and develop anticipatory security strategies.

SOLUTION: IKANOW NEXT GENERATION INFORMATION SECURITY ANALYTICS OVERVIEW

Anticipating and responding to cyber threats requires a specialized set of tools, infrastructure, and support. IKANOW’s revolutionary solution offers an open, high-throughput big data technology infrastructure. Within seconds, you can locate time-sensitive information across terabytes of data, raising the efficiency of the team members and analysts who monitor cyber risks around the clock. Yet tools and technologies are useless without support. From ingestion to customizing visualizations to developing strategic operational scorecards, you can leverage IKANOW’s data science competencies to iteratively customize a cyber-risk-reduction program for your business, without ongoing cost.

IKANOW enables application of adaptable analytical techniques and measurement tools that automate data ingestion and analysis. These features offer visibility that can save weeks in detecting and defending against cyber security threats. The vast amount of security information now available requires a new approach to cyber security that leverages innovative big data management along with correlation and visualization technologies that enable information security professionals to effectively protect their network.

OPEN, FLEXIBLE, SCALABLE THREAT INTELLIGENCE PLATFORM

As shown in Figure 2, the IKANOW ISA platform:

Provides business intelligence to the CISO to drive change in an organization.

Reduces the resources required to perform critical security tasks.

Provides an additional layer of defense against advanced persistent threats (APTs).

Figure 2. IKANOW ISA Security-in-Layers Platform

CONSTANT CALIBRATION OF ISA SECURITY POSTURE

The IKANOW ISA platform delivers accelerated decision throughput by continually recalibrating your security posture as part of your overall security plan. The security posture is the approach your business takes to security, from planning to implementation. It comprises technical and non-technical policies, procedures, and controls that protect you from both internal and external threats. IKANOW ISA is:

A platform to integrate threat intelligence with enterprise data and then ingest, enrich, analyze, and visualize the results to determine the risk level and security posture.

A framework for assessing and improving the security posture of industrial control systems (ICS).

The platform pairs the right feeds for your organization with analytics that can dramatically improve an organization’s security posture.

Figure 3. ISA Security Posture Calibration

The IKANOW platform allows Fortune 1000 organizations and government agencies to quickly ingest various data sources and types, enrich the data, and then search and visualize the data in an easy-to-use interface.

HOW IKANOW ISA WORKS

The IKANOW ISA platform helps adjust the levers of your enterprise to cohesively align strategic and tactical functions. ISA first simplifies the data ingestion process by giving your analyst team tools for ingestion and curation without the need for custom development.

With the ingested data, external and internal information can be fused, and filters can then be applied to remove unnecessary data points. Ready-made visualizations are then applied to identify patterns and anomalies, and the results are shared with other teams so that they are also informed about any potential or actual security breaches. Our data science team then devises a plan to build techniques for advanced analytics, finding the optimal mediums for correlation and comparison. With customized visualizations and templates, you can baseline repeatable metrics and build cascading scorecards (dashboards) across functions to mechanize the responses used to predict cyber risks. The ISA platform equips security teams by uniting insights and creating discipline in a way that achieves accelerated decision throughput.

Figure 4. IKANOW Threat Analytics High-Level View

DATA INGESTION, CURATION, ENRICHMENT

IKANOW correlates data from multiple data feeds and social networks with corporate security information and event management (SIEM) data. The output can be as specific as identifying IP addresses that have been affected by malware. Results of the analytics are presented in reports and a dashboard, which allow threats to be easily communicated, discussed, prioritized, and resolved. This is shown in Figure 5.

Figure 5. Data Integration and Enrichment

ISA helps organizations constantly maintain an optimal security posture by aligning the strategic, tactical, and operational aspects of the business. It does this with a set of core features which make it very easy to ingest, curate and enrich data. Data sources are constantly entered into the IKANOW Threat Analytics Platform and are continually visualized and reported on using cascading scorecards, enabling each enterprise stakeholder to obtain timely results and drive the need for change in security posture accordingly.
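
The kind of correlation described in this section can be pictured with a minimal Python sketch. It is an illustration only, not IKANOW code: the threat feed, the SIEM events, and field names such as src_ip and dst_ip are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical external threat feed: IP address -> malware family.
threat_feed = {
    "203.0.113.7": "ExampleBot",
    "198.51.100.23": "SampleRAT",
}

# Hypothetical internal SIEM events (field names are assumptions).
siem_events = [
    {"host": "ws-014", "src_ip": "10.0.0.14", "dst_ip": "203.0.113.7"},
    {"host": "ws-022", "src_ip": "10.0.0.22", "dst_ip": "192.0.2.50"},
    {"host": "ws-014", "src_ip": "10.0.0.14", "dst_ip": "198.51.100.23"},
]


def correlate(events, feed):
    """Return internal IPs that contacted an address listed in the feed."""
    affected = defaultdict(set)
    for event in events:
        family = feed.get(event["dst_ip"])
        if family is not None:
            affected[event["src_ip"]].add(family)
    return {ip: sorted(families) for ip, families in affected.items()}


affected_ips = correlate(siem_events, threat_feed)
print(affected_ips)
```

In this sketch only the workstation at 10.0.0.14 is reported, because it is the only internal address that contacted hosts listed in the feed.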

INFORMATION SECURITY ANALYTICS CORE FEATURES

IKANOW ISA enables you to actively recalibrate your security posture by applying adaptable analytical techniques and measurement tools that automate analysis and decision-making processes. The key features of the IKANOW ISA platform are:

Three-Step Data Sources Ingestion

Comprehensive and Collaborative Visualizations and Reports

Third-Party Tools and Applications Integration

Robust Sorting and Searching

These capabilities provide users with the information required to effectively use the platform from source creation and management through data visualization and reporting.

THREE-STEP DATA SOURCES INGESTION

ISA offers the ability to control data sources throughout setup, testing, operation, and publishing, and includes the ability to add and suspend data sources. ISA provides support for Logstash, RSS, CSV, S3, and various APIs. In addition, the advanced source builder option allows you to add and edit JSON directly.

The first key feature is the source ingestion process that easily adds new source data to the IKANOW platform—structured, unstructured, or semi-structured in nature (think everything from SIEM data to OSINT data and social media). This will all be done in a new clean and light interface as shown in Figure 6.
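
To give a feel for what editing a source definition as JSON might involve, here is a minimal Python sketch. Every key in the JSON (title, type, url, suspended) and the example URL are assumptions made for illustration; they are not the actual IKANOW source schema, which this guide does not list.

```python
import json

# Hypothetical source definition; every key below is an assumption made
# for illustration, not the actual IKANOW source schema.
source_json = """
{
  "title": "Example threat feed",
  "type": "rss",
  "url": "https://feeds.example.com/threats.rss",
  "suspended": false
}
"""

# Source types named in this guide, lower-cased for comparison.
SUPPORTED_TYPES = {"logstash", "rss", "csv", "s3", "api"}


def load_source(text):
    """Parse a source definition and run basic sanity checks."""
    source = json.loads(text)
    if source.get("type") not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported source type: {source.get('type')!r}")
    if not source.get("title"):
        raise ValueError("source needs a title")
    return source


source = load_source(source_json)
source["suspended"] = True  # suspend the source by editing the JSON directly
print(json.dumps(source, indent=2))
```

Validating the definition before saving it is a design choice of the sketch: it catches a mistyped source type at edit time rather than at harvest time.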

Figure 6. Three-Step Data Ingestion

This eliminates having to wait for IT to add critical data sources to the platform. You can quickly analyze and search across a comprehensive set of threat data, allowing your organization to detect potential threats and prioritize its defense against them in a timely manner.

COMPREHENSIVE AND COLLABORATIVE VISUALIZATIONS AND REPORTS

ISA includes a series of reporting tools that enable you to compare threats and vulnerabilities by assigning risk levels and tracking cost information, all to help you determine your optimal security strategy.

ISA also offers a threat feed tool to aid your team in determining the ongoing value of threat feeds over time. Additional visualizations help identify patterns across data and the indicators of compromise most relevant to your team. This means you can create comprehensive visualizations across all of your InfoSec analytics data. These visualizations can be shared with team members throughout the analytical process and across levels of your organization. Enterprises can then create the necessary structures to perform self-learning in order to develop accurate pictures of results.

Figure 7. Visualizations and Reports

THIRD-PARTY TOOLS AND APPLICATIONS INTEGRATION

ISA supports the use of multiple third-party tools and applications. ISA facilitates cleansing of data with many of these tools and applications to enable ease of sharing. ISA is also directly integrated with a growing number of third-party applications, including Kibana, which can be accessed directly within ISA for direct comparability of log information.

ISA enables the use of Logstash to integrate Kibana and other third-party data analysis tools. This allows users to read and process data through Logstash and analyze it through Kibana, or another tool, at scale. This includes structured and unstructured threat intelligence data in a format customized to match your SIEM log data or any other format.
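
As a rough illustration of putting threat intelligence data into a format that matches SIEM log data, the following Python sketch renames fields and emits one JSON object per line, a form that log pipelines commonly accept. The field names and the mapping are assumptions for the example; they are not taken from ISA, Logstash, or Kibana.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from threat-feed field names to the field names
# used by the SIEM log data (all names are assumptions).
FIELD_MAP = {"indicator": "dst_ip", "seen": "timestamp", "label": "message"}


def to_log_record(intel):
    """Rename fields and normalize the timestamp to ISO 8601 in UTC."""
    record = {FIELD_MAP.get(key, key): value for key, value in intel.items()}
    seen = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    record["timestamp"] = seen.isoformat()
    record["source"] = "threat_intel"
    return record


intel_records = [
    {"indicator": "203.0.113.7", "seen": 1441065600, "label": "known C2 host"},
]

# One JSON object per line.
lines = [json.dumps(to_log_record(r), sort_keys=True) for r in intel_records]
print("\n".join(lines))
```

Once both data sets share field names and a timestamp format, the same dashboard query can cover internal logs and external indicators.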

Figure 8. Tools and Applications Integration

Using adaptable analytical techniques and measurements that automate the analysis process, including IKANOW’s visualization and collaboration functionality, can help constantly optimize your security posture by staying ahead of threats and reducing enterprise risk.

ROBUST SORTING AND SEARCHING

ISA provides data filtering and organization tools that enable you to quickly identify relevant data. Search options range from verb categories to a selection of entity options, and include the ability to tag and save past searches. Once search queries are executed, further filtering options offer additional focus across multiple predefined options, such as recent, oldest, and relevance.

You can search a combined set of data from disparate sources and formats to help uncover relationships between internal and external data, hastening the ability to see potential threats and their impact across the network.
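
The predefined orderings mentioned above (recent, oldest, and relevance) can be sketched in a few lines of Python. This is an illustrative model only: the result records, their field names, and the simple title match are assumptions, not the ISA search implementation.

```python
from datetime import date

# Hypothetical search results; the field names are assumptions.
results = [
    {"title": "Phishing campaign report", "published": date(2015, 7, 2), "score": 0.61},
    {"title": "Firewall log anomaly", "published": date(2015, 7, 20), "score": 0.87},
    {"title": "Malware sandbox report", "published": date(2015, 6, 11), "score": 0.74},
]

# Each predefined option maps to a sort key and a direction.
SORT_OPTIONS = {
    "recent": (lambda r: r["published"], True),
    "oldest": (lambda r: r["published"], False),
    "relevance": (lambda r: r["score"], True),
}


def search(items, term, order="relevance"):
    """Keep items whose title contains the term, then apply the ordering."""
    key, reverse = SORT_OPTIONS[order]
    hits = [r for r in items if term.lower() in r["title"].lower()]
    return sorted(hits, key=key, reverse=reverse)


for hit in search(results, "report", order="recent"):
    print(hit["published"], hit["title"])
```

Keeping the orderings in a lookup table means a new predefined option can be added without touching the search function.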

Using this powerful search capability, organizations no longer need to hire a developer or contact their vendors to perform these tasks. The InfoSec team can do this on its own schedule, quickly and easily.

IKANOW ISA SOLUTION ARCHITECTURE OVERVIEW

This chapter describes the next-generation system architecture for solutions like ISA that use the developer-level APIs, and shows the distinct differences between ISA and the core components.

TRADITIONAL VS. NEXT-GENERATION ISA APPLICATION ARCHITECTURE KEY POINTS

In the traditional ISA design, applications interact with the platform through RESTful APIs that incorporate various Java plugins. The next-generation ISA supports application-specific API plugins that are added as a standard operation.

NEXT-GENERATION ISA ARCHITECTURE REQUIREMENTS

The following requirements had to be met to qualify IKANOW’s next-generation ISA architecture:

Write and deploy external harvesters.

Write stand-alone streaming enrichment engines.

Develop records-based threads by using the application-specific API plugins.

View each datum as an "object" with a set of attributes that defines where it is stored and how it can be processed instead of categorizing them as "document," "record," or "custom."

Set the schema at import time and subsequently modify it.

Plug in different NoSQL technologies based on their capabilities (mapped to the schema), so that the processing will access the layer that is most sensible.

Store the original data in HDFS, thus enabling repeatability.

Let users assign roles to each node in the cluster through a centralized management user interface powered by Salt.

Decouple the user interface and applications more than in the original platform.

Build an Open Source platform from the start with a test infrastructure that enables partners and the community to contribute.

Write in the modern JVM-based language Scala for increased concurrency and reliability.

Keep the document-based threads.

Keep the analytics-based threads.

Provide an Elasticsearch data service with both read and write capabilities.

Provide access context for Tomcat.

Provide most of the management DB, including bucket CRUD, library (plugin) CRUD, share replacement CRUD, and access to the data services.

The next step is to map the ISA functional requirements onto the ISA application model architecture as illustrated in Figure 9.
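
Several of the requirements above (treating each datum as an object, setting the schema at import time and modifying it later, and mapping the schema to data services by capability) can be illustrated with a small Python sketch. The class names, capability names, and registry below are hypothetical and chosen only for the example; the actual platform is described as being written in Scala, and its APIs are not shown in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical import-time schema: the capabilities the data needs."""
    capabilities: set = field(default_factory=set)


@dataclass
class DataObject:
    """A datum viewed as an object whose attributes drive storage and processing."""
    payload: dict
    schema: Schema


# Hypothetical registry mapping a capability to the data service providing it.
DATA_SERVICES = {
    "search_index": "elasticsearch",
    "raw_storage": "hdfs",
}


def storage_layers(obj):
    """Resolve the services an object is written to, based on its schema."""
    return sorted(DATA_SERVICES[c] for c in obj.schema.capabilities)


schema = Schema(capabilities={"raw_storage"})  # set at import time
event = DataObject(payload={"src_ip": "10.0.0.14"}, schema=schema)
print(storage_layers(event))

schema.capabilities.add("search_index")  # modified afterwards
print(storage_layers(event))
```

The point of the sketch is that the object itself carries no "document" or "record" label; where it is stored follows from the capabilities declared in its schema.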

Figure 9. ISA Application Architecture Components

Figure 9 illustrates the front end, back end, and middleware of the ISA application architecture, as described in Table 2.

Table 2. ISA Components

ISA Architecture Components | Meaning
Blue | Shows the components of the ISA application layer, as described in Section Front-End ISA Application Components.
Light Blue | Indicates the core components of the ISA architecture, as explained in Section Back-End ISA Architecture Core Components.
Very Light Blue | Lists the middleware services, as described in Section Middleware Data Analytics Services.

ISA Front-End Components

ISA Middleware Components

ISA Back-End Components


FRONT-END ISA APPLICATION COMPONENTS

Table 3. Front-End ISA Application Components

Front-End ISA Application Components

Description

Management and Virtualization User Interfaces

The ISA Manager is a role-based user management mechanism that enables the ingestion and management of Data Sources. The Manager enables a single sheet of sterilized data to be leveraged for analysis, reporting, and visualization.

The Management and Virtualization user interface helps users understand and express their information needs. The interface helps users formulate their queries, select among available information sources, understand search results, and keep track of the progress of their search.

Some of the visualizations that ship with Information Security Analytics require additional data processing jobs to be executed from the platform. An IKANOW resource is required to execute these MapReduce jobs before your data appears in the visualizations.

ISA Application-Specific API Plugins

The ISA design abstracts extraction–transformation–loading (ETL), enrichment, and analytics into plugins.

ETL tools are pieces of software responsible for extracting data from several sources and for cleansing, customizing, reformatting, integrating, and inserting it into a data warehouse. Building the ETL process is potentially one of the biggest tasks of building a data warehouse.

Buckets and Sources

These are ISA-specific connectors to external data or controlling analytics.

The Buckets REST API creates, deletes, flushes, and retrieves information about buckets and bucket operations.

ISA supports full access to CRUD stores.


BACK-END ISA ARCHITECTURE CORE COMPONENTS

Table 4. Back-End ISA Core Components

Back-End ISA Core Components

Descriptions

Traditional API Plugins

Plugin interfaces cover data harvest, enrichment, and analytics, as well as the access context, which grants access to the management DB. This enables direct access to the CRUD stores, including the bucket CRUD and the binary plugin CRUD. Additionally, plugin interfaces enable direct access to the data services where data objects can be stored, including HDFS, Elasticsearch, MongoDB for documents, and Titan for entities and associations.

Although document-based threads are available through these additional APIs, analytics-based threads are accessible through both the traditional and the ISA application-specific APIs.

Supported analytic technologies:

- Hadoop with MongoDB

- HDFS

- Elasticsearch input/output

- Harvest Technologies

- Enrichments

Management Database Access Context

MongoDB

Unstructured analytics using Elasticsearch and MongoDB

Data Query Services Elasticsearch

Traditional Source APIs,

Buckets APIs

External data ISA connectors

ISA-specific connectors to external data or controlling analytics.

Including harvest, enrichment, analytics context


MIDDLEWARE DATA ANALYTICS SERVICES

Table 5. Middleware ISA Core Components

Middleware ISA Core Components

Description

Data Enrichment-Enrichment Modules

A general term, referring to processes used to enhance, refine or otherwise improve raw data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It also shows the common imperative of proactively using this data in various ways.

Analytic Technologies-Analytic Modules

Java code to perform ISA-specific analytics (in many cases the plugin will be generic in nature, with ISA-specific configuration).

ISA enables any analytic engine to be controlled via a bucket (given a Java plugin). While a simple Hadoop interface is available, ISA provides an analytic engine.

External Data Harvest Technologies

ISA enables Java plugins to control any harvester while providing the document pipeline and Logstash.

Analytic Modules

Java code to perform ISA-specific analytics (in many cases the plugin will be generic in nature, with ISA-specific configuration).

For more information about data objects, see Section Data Objects in Chapter Basic Data Elements.


DATA SOURCE MANAGEMENT

Data sources, or connectors, extract data. Visualization widgets can then be used to gain deeper knowledge or a clearer perception of the data gathered.

Source data in the platform is stored in a JSON format as a document, where the document format contains elements such as metadata, entities, and associations. Sources are made up of documents that are harvested over time.

DATA SOURCES

Sources are data connectors that pull data from databases, RSS feeds, or file shares, including directories and single files such as PDF, comma-separated values (CSV), XML, or ZIP. Each data source is assigned a Title (such as Fox News RSS), Tags (for example, News, Politics, Conservative, Republican, US), and a Type (such as News).

DATA SOURCE DOCUMENTS

Each record or piece of data ingested by a source becomes a JSON document, regardless of the format or size of the data. A document can be any of the following:

Article from an RSS feed

40-character Tweet

Row from a CSV file

40-page medical journal

Each JSON document contains:

A series of metadata fields, including title, description, source ID, date, and time

Entities, such as person or IP-internal

Associations, for example hard (subject, verb, object) vs. soft

Document Entities

Document entities are the who, what, and where that are extracted from a document.

Who: Person, Company, Organization

What: Industry Term, Product, Facility

Where: City, Province or State, Country

Document Associations

An association is an activity or relationship between entities. It can consist of a subject, a verb, and an object, optionally qualified by a location or a time. The subjects and objects can be free text while also pointing to entities within the document.
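The document structure described above can be pictured with a small sketch. Every field name below (sourceId, publishedDate, and so on) is an assumption made for illustration and is not taken from the ISA schema. The helper checks that each association subject and object also names an entity in the same document.

```python
import json

# Hypothetical ISA-style document; all field names are illustrative only.
document = {
    "title": "Example Corp opens new facility",
    "description": "Short article harvested from an RSS feed.",
    "sourceId": "example.rss.feed",
    "publishedDate": "2015-09-01T12:00:00Z",
    "entities": [
        {"name": "Jane Doe", "type": "Person"},        # who
        {"name": "Example Corp", "type": "Company"},   # who
        {"name": "Reston", "type": "City"},            # where
    ],
    "associations": [
        {
            "subject": "Jane Doe",      # free text that also names an entity
            "verb": "works for",
            "object": "Example Corp",
            "location": "Reston",
        }
    ],
}


def unresolved_references(doc):
    """Return association subjects/objects that do not name an entity."""
    names = {entity["name"] for entity in doc["entities"]}
    missing = []
    for assoc in doc["associations"]:
        for role in ("subject", "object"):
            if assoc[role] not in names:
                missing.append(assoc[role])
    return missing


# The document is plain JSON, so it serializes without extra work.
serialized = json.dumps(document, indent=2)
```

Calling unresolved_references(document) on this sample returns an empty list, because both the subject and the object match entries in the entities list.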


MATCHING DOCUMENT TYPES

When a query is issued, a large number of "matching" documents often satisfy the query criteria, particularly for a common query such as "Obama." In this example, the search yields 4.2 million results, which are not directly available to the widgets.

Top Documents

From all the matching documents that are retrieved, a ranked subset is selected according to a configurable scoring method and returned directly to the GUI for analysis. These top documents are an estimate of the most relevant documents. The default number of top documents is 100, meaning that the top 100 of the 4.2 million documents are presented in the widgets.

Filtered Documents

The widget API allows further filtering of the top documents within the GUI by selecting the subset of documents that contain a specific set of entities. This subset is called the filtered documents. In the above example, a filter for "Hillary Clinton" populates widgets with only those documents that contain both "Obama" AND "Hillary Clinton" occurrences.

Aggregations Documents

Although all matching documents contribute to the "knowledge" that a query can provide, the documents themselves are not the only objects returned from a query. Information relevant to the analysis is also summed, averaged, or otherwise aggregated across all matching documents. These results are referred to as the aggregations.
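The relationship between matching documents, top documents, filtered documents, and aggregations can be sketched as follows. This is a minimal illustration under assumed names: the score field, the run_query and filter_docs functions, and the entity-count aggregation are hypothetical and do not represent the ISA query API.

```python
from collections import Counter


def run_query(docs, term, top_n=100):
    """Split a query result into matching docs, top docs, and aggregations."""
    matching = [d for d in docs if term in d["entities"]]
    # Rank by an assumed precomputed relevance score, highest first.
    top = sorted(matching, key=lambda d: d["score"], reverse=True)[:top_n]
    # Aggregations cover ALL matching documents, not only the top ones.
    aggregations = Counter(e for d in matching for e in d["entities"])
    return matching, top, aggregations


def filter_docs(top_docs, required_entities):
    """Keep only the top documents that contain every required entity."""
    required = set(required_entities)
    return [d for d in top_docs if required <= set(d["entities"])]


docs = [
    {"id": 1, "score": 0.9, "entities": ["Obama", "Hillary Clinton"]},
    {"id": 2, "score": 0.7, "entities": ["Obama"]},
    {"id": 3, "score": 0.8, "entities": ["Obama", "Hillary Clinton"]},
    {"id": 4, "score": 0.5, "entities": ["Reston"]},
]

matching, top, aggregations = run_query(docs, "Obama", top_n=2)
filtered = filter_docs(top, ["Obama", "Hillary Clinton"])
```

In this sample, three documents match, the two highest-scoring ones become the top documents, and the aggregation still counts entities across all three matches.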


BASIC DATA ELEMENTS

DATA OBJECTS

Units of data in ISA are called data objects. They include diverse objects such as the following:

Web pages—Raw or annotated by natural language processing

Video files

Log records—individual or aggregated

Objects generated by analytics on existing data

KML overlays

Aircraft tracks

Business transactions

DATA SERVICES

Table 6 shows the data services, a set of logical ways in which data can be stored, indexed, and retrieved.

Table 6. Data Services Types

Data Service Types | Description
Document | An annotated document, which is a JSON object with a formatted sub-object describing entities, associations between entities, user comments, etc.
Search index | A searchable object.
Columnar | A related set of columns.
Graph | A collection of nodes and edges.
Storage layer | A set of "opaque" objects within a file.
Temporal, geo-spatial | Enables time and geo-specific processing.
Data warehouse | A relational view of the data well-suited to traditional OLAP-type processing.

Object Schema

How an object is handled by ISA data services is defined by its schema (DataSchemaBean). The schema describes the properties relevant to each service, for example, which columns should be stored in columnar fashion, how the graph should be constructed from the objects, and how long objects should be stored.
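As a rough illustration of a per-service schema, the sketch below models a schema as a set of optional sub-schemas, one per data service. The class and field names (DataSchema, ColumnarSchema, retention_days, and so on) are assumptions for this example and are not the actual DataSchemaBean definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnarSchema:
    columns: List[str]        # fields to store in columnar fashion


@dataclass
class GraphSchema:
    node_fields: List[str]    # fields whose values become graph nodes


@dataclass
class TemporalSchema:
    time_field: str
    retention_days: int       # how long objects should be stored


@dataclass
class DataSchema:
    search_index: bool = False
    columnar: Optional[ColumnarSchema] = None
    graph: Optional[GraphSchema] = None
    temporal: Optional[TemporalSchema] = None

    def enabled_services(self) -> List[str]:
        """List the data services this schema asks the platform to use."""
        services = []
        if self.search_index:
            services.append("search_index")
        for name in ("columnar", "graph", "temporal"):
            if getattr(self, name) is not None:
                services.append(name)
        return services


log_schema = DataSchema(
    search_index=True,
    columnar=ColumnarSchema(columns=["src_ip", "dst_ip", "bytes"]),
    temporal=TemporalSchema(time_field="timestamp", retention_days=90),
)
```

A service left as None is simply not used for objects with that schema, which mirrors the idea that processing accesses only the layers the schema maps to.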


DATA IMPORT

Data is imported into ISA by means of buckets (DataBucketBean), each of which has two import properties: harvesting and enrichment.

Harvesting Data Import

Takes data from any transport layer and, in turn, returns a set of JSON objects.

Enrichment Data Import

Takes data from the harvest and filters unwanted objects, formats or creates the desired fields, and applies internal or external functionality such as geo-location, natural language processing, lookups via other buckets, arbitrary business logic, and so on.
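The two import stages can be sketched as a pair of functions. This is only an illustration: the function names, the log field names, and the GEO_LOOKUP table are assumptions, and the lookup stands in for what might be a lookup via another bucket.

```python
import json


def harvest(raw_lines):
    """Harvest stage: turn transport-layer lines into JSON-style dicts."""
    objects = []
    for line in raw_lines:
        line = line.strip()
        if line:                              # skip empty frames
            objects.append(json.loads(line))
    return objects


# Stand-in lookup table; hypothetical data for the example.
GEO_LOOKUP = {"10.0.0.5": "internal", "203.0.113.9": "external"}


def enrich(objects):
    """Enrichment stage: filter, reshape fields, and apply a lookup."""
    enriched = []
    for obj in objects:
        if obj.get("level") == "DEBUG":       # filter unwanted objects
            continue
        out = dict(obj)
        out["message"] = out.pop("msg", "")   # create the desired field
        out["zone"] = GEO_LOOKUP.get(out.get("ip"), "unknown")
        enriched.append(out)
    return enriched


raw = [
    '{"level": "INFO", "ip": "10.0.0.5", "msg": "login ok"}',
    '{"level": "DEBUG", "ip": "10.0.0.5", "msg": "noise"}',
    '',
    '{"level": "WARN", "ip": "198.51.100.1", "msg": "port scan"}',
]
result = enrich(harvest(raw))
```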

DATA BUCKETS

Each data bucket specifies the data schema to be applied to all objects in the bucket. Data buckets also have the standard metadata shown in Table 7.

Table 7. Data Bucket Metadata

Data Bucket Metadata | Description
Access Rights | A set of access rights, as described in Data Security below.
Metadata Grouping | Grouping metadata. Buckets can be grouped in a number of different ways:
Multi-Bucket | A specific multi-bucket that is a collection of other buckets. Multiple buckets can be referenced by parent folders.
Bucket File System | Each bucket has a file system hierarchy that physically maps onto where data is stored in the storage service or HDFS.
Multiple Buckets Alias | Each bucket can also be assigned a common alias name that can refer to multiple buckets.

HARVESTING CONFIGURATION

Harvesting configuration consists of three different parts:

Table 8. Harvesting Configuration Types

Harvesting Configuration Types

Description

Harvest Technology

A JVM JAR implementation (IHarvestTechnologyModule) whose callbacks are invoked whenever pre-defined actions occur on a bucket, for example when it is created.

The Harvester is then free to do processing, typically launching or re-configuring external processes such as Hadoop, Flume, Logstash, or a web crawler. This process results in ingesting objects using either the HDFS file interface (for batch operations) or functionality provided by an injected IHarvestContext object (for streaming operations).

Harvest Module

Optionally, a set of harvest module JVM JARs whose format is defined by the author of the Harvest Technology.

ISA enables the upload, access-permissions, discovery, and retrieval of Harvest Module Libraries. These libraries will typically provide de-framing of the data from its transport layer and JSON-ification.

Harvest Technology

A list of Harvest Technology-specific JSON configuration objects.
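The callback pattern described for a Harvest Technology can be sketched in a few lines. The platform's interfaces are JVM-based. The Python classes below are a simplified stand-in, and the method names (on_new_bucket, on_delete_bucket, emit_object) and the bucket layout are hypothetical, not the IHarvestTechnologyModule or IHarvestContext signatures.

```python
class HarvestContext:
    """Hypothetical stand-in for the injected context used for streaming."""

    def __init__(self):
        self.emitted = []

    def emit_object(self, obj):
        # In a real system this would hand the object to the platform.
        self.emitted.append(obj)


class ToyHarvester:
    """Toy harvest technology whose callbacks fire on bucket actions."""

    def __init__(self):
        self.active = {}

    def on_new_bucket(self, bucket, context):
        # A real technology might launch Logstash, Flume, or a crawler here.
        self.active[bucket["name"]] = bucket["config"]
        for line in bucket["config"].get("seed_lines", []):
            context.emit_object({"bucket": bucket["name"], "line": line})

    def on_delete_bucket(self, bucket, context):
        # Stop tracking the bucket; unknown buckets are ignored.
        self.active.pop(bucket["name"], None)


context = HarvestContext()
harvester = ToyHarvester()
bucket = {"name": "firewall_logs", "config": {"seed_lines": ["a", "b"]}}
harvester.on_new_bucket(bucket, context)
```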

HARVESTING DATA ENRICHMENT

Enrichment of the data that has been harvested can take one of two forms, streaming or batch enrichment, as shown in Table 9.

Table 9. Data Enrichment Forms

Data Enrichment Forms | Description
Streaming Data Enrichment | Each object is processed as soon as it is received. Streaming enrichment uses the Storm framework together with Kafka for messaging.
Batch Enrichment | Enrichment is performed on sets of objects, which is more efficient but introduces latency and so is not suitable for alerting purposes. Batch enrichment uses the Hadoop, YARN, or Spark framework.

Typically only one of the two supported enrichment forms is used, but both can be combined. For example, you can take log records, perform batch processing on them, and then store them efficiently, while also performing a smaller set of enrichment processes in near-real-time and discarding most objects except for broadcasting "alerts" to listeners.
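The combined use of both enrichment forms can be illustrated with a toy example. It uses plain Python in place of Storm, Kafka, Hadoop, or Spark, and the record fields and the alert threshold are assumptions chosen for the sketch.

```python
def is_alert(record):
    # Hypothetical alert rule for the example.
    return record["failed_logins"] >= 5


def streaming_enrich(record, listeners):
    """Handle one object on arrival; nothing is kept except alerts."""
    if is_alert(record):
        for notify in listeners:
            notify({"alert": "brute-force suspected", "host": record["host"]})


def batch_enrich(records):
    """Process a whole set at once: cheaper per object, but delayed."""
    totals = {}
    for r in records:
        totals[r["host"]] = totals.get(r["host"], 0) + r["failed_logins"]
    return [dict(r, host_total=totals[r["host"]]) for r in records]


alerts = []
records = [
    {"host": "web1", "failed_logins": 1},
    {"host": "web2", "failed_logins": 7},
    {"host": "web1", "failed_logins": 2},
]
for rec in records:               # streaming path: one object at a time
    streaming_enrich(rec, [alerts.append])
stored = batch_enrich(records)    # batch path: the full set, later
```

The streaming path raises an alert as soon as the second record arrives, whereas the per-host totals only become available once the batch has run.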


Data Enrichment Lists

Enrichment consists of two lists, batch and streaming. Table 10 shows the contents of each list entry.

Table 10. Data Enrichment Lists

Data Enrichment Lists

Contents

JVM JAR files

A list of JVM JAR files obtained from the Enrichment Module Library. One of the JAR files in this list must implement IEnrichmentBatchModule or IEnrichmentStreamingModule.

The other JAR files can be in arbitrary format and can be used to provide functional libraries, for example the Stanford NLP set of JARs, internal utilities, etc.

JSON configuration object

A JSON configuration object passed into the module at startup.

Dependencies

The dependencies between the modules, which can be used in batch processing to enrich the objects in parallel.

Similar to the harvester, enrichment modules have an IEnrichmentModuleContext injected that enables interaction with the core framework to filter objects, log errors, etc.

At the end of the enrichment stage, batch or streaming, the extracted, transformed, and enriched object is automatically passed on to each of the data services as defined in its schema for storage and indexing. It can also be broadcast across an object bus for analytics or API listeners to process, as described in Data Analytics below.
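One way to see how module dependencies allow parallel batch enrichment is to group the modules into stages, where every module in a stage depends only on earlier stages. The sketch below uses Python's standard graphlib module. The module names and the parallel_stages helper are hypothetical, and this is not how ISA itself schedules modules.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group modules into stages; modules in one stage can run in parallel.

    `dependencies` maps each module name to the set of modules it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical enrichment modules and their dependencies.
deps = {
    "geo_lookup": {"parse"},
    "nlp_entities": {"parse"},
    "score": {"geo_lookup", "nlp_entities"},
}
stages = parallel_stages(deps)
```

Here geo_lookup and nlp_entities land in the same stage because neither depends on the other, so a batch framework would be free to run them concurrently.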

Note

A bucket can be generated without any harvesting and enrichment. It can point to an existing collection in the database or to an empty bucket that can then be populated either manually or by using analytic threads.

DATA ANALYTICS

An Analytic Thread (AnalyticThreadBean) takes data from one or more populated buckets and then applies arbitrary further processing by using user-defined technologies such as Hadoop, Spark, Storm, Mahout, and Gephi.

Furthermore, these Analytic Threads are contained in the bucket corresponding to the output location of the results. Table 11 shows the components that make up each Analytic Thread.


Table 11. Data Analytics Components

Data Analytics Components

Contents

Analytic Technology

The name of an Analytic Technology, a JVM JAR file implementing IAnalyticsTechnologyModule, whose callbacks are invoked whenever a pre-defined action occurs, such as a user interaction, the completion of another analytic thread, a bucket obtaining more data, or a regular schedule firing. The analytic technology is responsible for queuing the desired analytics as defined by the remaining items on this list.

Data Query Services

A set of inputs together with associated queries in the "language" of whichever "Data Service" is being used. For example, this could be "search term" queries, temporal queries, geo-spatial queries, "graph" queries, etc.

Analytic Modules

A list of Analytic Modules, JVM JARs managed by the ISA Library whose format is defined by the corresponding Analytic Technology.

A configuration object describing the details of the analytics input, output, etc.

A set of dependencies within the analytic thread, such as run module1, then module2, etc.

Analytic Thread

The Analytic Thread runs over the specified data and writes the output into one or more buckets with the appropriate data schemas.

The output can treat existing data in the output buckets in one of the following ways:

- Wipe and start again each time

- Add data incrementally

- Merge with existing data

Instead of taking the data At Rest from a bucket, objects can be streamed In Flight for real-time or near-real-time analytics and alerting. In this case, the analytic thread registers a bucket name and the pipeline stage: before enrichment, after enrichment, or in the middle of enrichment (after the named enrichment module).
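The three ways in which output can treat existing bucket data can be sketched as a single function. The mode names ("wipe", "append", "merge"), the write_output function, and the use of an id key for merging are assumptions made for this illustration.

```python
def write_output(bucket, results, mode, key="id"):
    """Apply analytic results to a bucket (a list of dicts) under one mode."""
    if mode == "wipe":                       # wipe and start again each time
        return list(results)
    if mode == "append":                     # add data incrementally
        return bucket + list(results)
    if mode == "merge":                      # merge with existing data
        merged = {obj[key]: obj for obj in bucket}
        for obj in results:
            # New fields overwrite old ones; untouched fields are kept.
            merged[obj[key]] = {**merged.get(obj[key], {}), **obj}
        return list(merged.values())
    raise ValueError(f"unknown output mode: {mode}")


existing = [{"id": 1, "count": 2, "host": "a"}, {"id": 2, "count": 5}]
new = [{"id": 1, "count": 3}, {"id": 3, "count": 1}]

wiped = write_output(existing, new, "wipe")
appended = write_output(existing, new, "append")
merged = write_output(existing, new, "merge")
```

In the merge case, the object with id 1 keeps its host field from the existing data while its count is replaced by the new result.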

DATA SECURITY

Security in ISA is delegated to a separate service, typically invoking an existing security scheme such as Kerberos and IKANOW ISA.

This topic, including the security architecture, is described further in the IKANOW Security Architecture Guide (TBD).


PLUGIN LIBRARIES

The following plugin functionalities can be configured from libraries (SharedLibraryBean):

Harvest Technologies

Harvest Modules

Enrichment Modules

Analytic Technologies

Analytic Modules

Access Modules

ISA provides library upload, storage, and retrieval services. Libraries are tagged for discovery and have access tokens assigned to them that determine who can use them. For example, different analytics or APIs can be restricted based on "user group," such as commercial tiers for SaaS or divisions in a large organization.

Note

Currently, for security reasons, only the administrator can upload libraries. The administrator then sets the access tokens to decide who can use the libraries.
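The combination of tags for discovery, access tokens for authorization, and administrator-only upload can be pictured with a small in-memory store. The class names, method names, and token values below are hypothetical and do not reflect the SharedLibraryBean fields or the ISA library service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SharedLibrary:
    name: str
    tags: Set[str]
    access_tokens: Set[str]      # user groups allowed to use the library


@dataclass
class LibraryStore:
    admins: Set[str]
    libraries: Dict[str, SharedLibrary] = field(default_factory=dict)

    def upload(self, user: str, library: SharedLibrary) -> None:
        # Only administrators may upload, as in the note above.
        if user not in self.admins:
            raise PermissionError(f"{user} may not upload libraries")
        self.libraries[library.name] = library

    def discover(self, tag: str, user_groups: List[str]) -> List[str]:
        """Names of libraries with the tag that the user's groups may use."""
        groups = set(user_groups)
        return sorted(
            lib.name
            for lib in self.libraries.values()
            if tag in lib.tags and lib.access_tokens & groups
        )


store = LibraryStore(admins={"admin"})
store.upload("admin", SharedLibrary("geoip", {"enrichment"}, {"free", "premium"}))
store.upload("admin", SharedLibrary("nlp", {"enrichment"}, {"premium"}))
free_view = store.discover("enrichment", ["free"])
premium_view = store.discover("enrichment", ["premium"])
```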


DATA SOURCE PROCESSING PIPELINE

A pipeline in ISA is the flow of data through different components in Informatics. A mapping in Informatics may contain Sources, Transformations, and Targets that are connected together to make up a pipeline. A single mapping can contain many such pipelines. When one pipeline is connected to another, the two form a single pipeline.

DATA SOURCE PROCESSING TYPES

The IKANOW ISA platform supports two types of complex processing: input source processing and custom source processing.

Input Sources Processing

This type of source processing allows many different types of data to be input as documents or records. Documents are larger and more complex objects that are typically generated from more complex XML/JSON, as well as from natural-language content such as websites and reports.

The ISA platform provides a powerful pipeline of templated operations to transform these data types into ISA's generic document model.

Records are smaller objects such as single-line log records, simple JSON objects, SQL records, and so on. ISA places almost no restrictions on the format of the JSON or on how it is imported into the system, although it integrates particularly well with Logstash, the popular community-driven platform for collecting, enriching, and transporting data.

Custom Processing Sources

Custom source processing involves applying custom logic to existing documents and records to enrich the system with new data and functionalities, as shown in Table 12.

Table 12. Custom Processing Sources New Data and Functionalities

New Data and Functionalities | Description
Reports | Such as spreadsheets or statistical data containing directly actionable information.
New records and documents | Typically alerts, or aggregate "events" made up of multiple documents and records.
Lookup tables | Tables that can be used to enrich new and existing documents, for example with local asset information, or to generate alerts for malicious domains, etc.

IKANOW uses the popular Hadoop ecosystem to power its custom processing capabilities, integrating its output, management, monitoring and security layers.


Figure 10 shows how these two different activities, input and custom source processing, are related.

Figure 10. Input and Custom Source Processing Relationships

The same JSON-based configuration language along with associated user interface can be used to build and maintain both types of pipelines. Typically, the elements do not mix. That is, a pipeline consists entirely of elements from either the "standard" set or the "custom" set even though there are some exceptions described below.


DATA INPUT SOURCES

In the ISA architecture, harvesting and enrichment form a more logical process based on the concept of applying a pipeline of processing elements to documents proceeding from a source. This capability is illustrated in Figure 11 below:

Figure 11. Pipeline Elements Processing


Pipeline elements can be in any order and have any cardinality.

For example, you could create metadata from raw HTML (using XPath), run an automated text extractor, pull more metadata using regex or JavaScript, return to the original raw text, and then run a different automated extractor before creating entities.

A very useful scenario involves running the data through several entity extractors, potentially using the "criteria" field to choose which one to run based on the content and metadata extracted.

Figure 11 above shows the pipeline elements, which can be approximately grouped into the categories shown in Table 13.

Table 13. Pipeline Element Categories

Pipeline Element Categories | Descriptions
Extractors | Generate mostly empty ISA documents from external data sources.
Global | Generates JavaScript artifacts that can be used by subsequent pipeline elements.
Secondary extractors | Enable new documents to be produced in large numbers from the existing metadata.
Text extraction | Manipulates the raw document content.
Metadata | Generates document metadata such as title, description, and date, as well as arbitrary content metadata, using XPath, regex, and JavaScript.
Entities and associations | Creates entities and associations out of the text.
Storage and indexing | Decides which documents to keep, which fields to keep, and which content to index as full text for searching using the GUI/API.
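The idea of an ordered pipeline whose elements can be gated by a "criteria" field can be sketched as follows. This is a simplified illustration using regular expressions only: the element functions, the dictionary layout of a pipeline element, and the document fields are assumptions and do not reproduce the ISA JSON configuration language.

```python
import re


def extract_title(doc):
    """Metadata element: pull the title out of the raw HTML."""
    match = re.search(r"<title>(.*?)</title>", doc["raw"], re.S)
    doc["metadata"]["title"] = match.group(1).strip() if match else ""
    return doc


def strip_tags(doc):
    """Text extraction element: drop tags and collapse whitespace."""
    doc["text"] = " ".join(re.sub(r"<[^>]+>", " ", doc["raw"]).split())
    return doc


def extract_ips(doc):
    """Entity element: record anything shaped like an IPv4 address."""
    for ip in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", doc["text"]):
        doc["entities"].append({"name": ip, "type": "IpAddress"})
    return doc


# Elements run in list order; "criteria" decides whether an element runs.
PIPELINE = [
    {"run": extract_title},
    {"run": strip_tags},
    {
        "run": extract_ips,
        "criteria": lambda d: "alert" in d["metadata"]["title"].lower(),
    },
]


def run_pipeline(raw, pipeline):
    doc = {"raw": raw, "text": "", "metadata": {}, "entities": []}
    for element in pipeline:
        criteria = element.get("criteria")
        if criteria is None or criteria(doc):
            doc = element["run"](doc)
    return doc


doc = run_pipeline(
    "<html><title>Alert 42</title><p>Scan from 203.0.113.9</p></html>",
    PIPELINE,
)
```

Because the entity extractor is gated on metadata produced by an earlier element, a page whose title does not contain "alert" passes through without any entities being created.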