November 4, 2016
DEEPDGA: ADVERSARIALLY-TUNED DOMAIN GENERATION AND DETECTION
Bobby Filar (@filar), Hyrum Anderson, Jonathan Woodbridge. AISec 2016
Outline
§ Motivation
§ Background
§ DeepDGA Architecture
§ Experiment(s) Setup
§ Results
§ Future Work
Motivations
§ Can we Red team vs. Blue team a known infosec problem (DGAs) leveraging Generative Adversarial Networks (GAN)?
§ Offensive: Leverage GANs to construct a deep-learning DGA designed to bypass an independent classifier.
§ Defensive: Can adversarially-generated domains augment/improve training data to harden an independent classifier?
www.endgame.com
www.endgame-2016.com
www.emdgane.com
Related Work
§ Recent work in adversarial examples
• Explaining and Harnessing Adversarial Examples (Goodfellow, 2015)
• Adversarial Perturbations Against Deep Neural Networks for Malware Classification (Papernot, 2016)
§ Key differences between other domains and INFOSEC:
• Other domains: make my model robust to occasional blind-spot examples it might come across in the wild
• Information security: discover and plug holes in my model that the adversary is actively trying to find and exploit (Red vs. Blue)
Fast gradient sign method (Goodfellow)
“What is the cost of changing X’s label to a different y?”
Background → Domain Generation Algorithms
§ Employed by malware families to bypass common C2 defenses
§ DGAs take a seed input and generate large amounts of pseudo-random domain names
§ A subset of the generated domains is registered as command and control (C2) servers
§ Botnets and malware iterate through the generated domains until they find one that is registered, then connect and establish a C2 channel
§ Asymmetric attack since defender must know all possible domains to blacklist
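The seed-based scheme described above can be sketched in a few lines. This is a toy illustration, not any real family's algorithm: a deterministic hash chain seeded with a shared string (here a date) yields the same pseudo-random domain list for attacker and infected host alike.

```python
# Toy seed-based DGA sketch: attacker and bot share only the seed,
# yet derive identical domain lists. The defender must blacklist all
# of them; the attacker registers just one.
import hashlib

def toy_dga(seed: str, count: int = 5, length: int = 12, tld: str = ".com"):
    """Generate `count` pseudo-random domains from a shared seed."""
    domains = []
    state = seed.encode()
    for _ in range(count):
        state = hashlib.sha256(state).digest()
        # Map hash bytes onto lowercase letters to form the label.
        label = "".join(chr(ord("a") + b % 26) for b in state[:length])
        domains.append(label + tld)
    return domains

todays_domains = toy_dga("2016-11-04")
```

Because the generator is deterministic in the seed, both sides compute identical lists, which is exactly what makes the attack asymmetric for the defender.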
[Diagram: the infected host queries DNS for generated domains (e.g. wedcf.com, asdfg.com → NXDomain) until bjgkre.com resolves to the C2 server at 212.211.123.01]
Domain Generation Algorithms → Cryptolocker Example
btbpurnkbqidxxclfdfrdqjasjphyrtn.org
sehccrlyfadifehntnomqgpfyunqqfft.org
konsbolyfadifehntnomqgpfyunqqfft.org
cytfiobnkjxomkmhimxhcfvtogyaiqaa.org
Domain Generation Algorithms → Character Distributions
§ DGA char dist + ML == robust defense?
§ Cryptolocker and Ramnit are both nearly uniform over the same range
• Expected, since both compute domains from a single seed
§ Suppobox concatenates random words from an English dictionary, so its distribution reflects that of the Alexa 1M
§ Much more difficult for prior DGA detection models to correctly classify
§ Our goal is to build a character-based generator that mimics the Alexa domain name distribution
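The character-distribution comparison above can be sketched as follows. The domain lists here are illustrative stand-ins, not the actual Cryptolocker or Alexa datasets:

```python
# Sketch: unigram character distributions of two domain lists, the
# kind of histogram the slide contrasts (DGA families vs. Alexa 1M).
from collections import Counter

def char_distribution(domains):
    """Return P(char) over the concatenated domain labels (TLDs stripped)."""
    counts = Counter()
    for d in domains:
        counts.update(d.split(".")[0])
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Stand-in samples: hash-like labels vs. dictionary-like labels.
dga_like = char_distribution(["btbpurnkbqidx.org", "sehccrlyfadif.org"])
alexa_like = char_distribution(["google.com", "facebook.com", "youtube.com"])
```

A detector built on such statistics works against near-uniform families but not against generators that already mimic the Alexa distribution, which motivates the character-based generator.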
AUTOENCODERS
§ Data compression algorithm
§ Models consist of an encoder, a decoder, and a loss function
• encoder: transforms the input into a low-dimensional embedding (lossy compression)
• decoder: reconstructs the original input from the embedding (decompression)
§ Goal: minimize distortion between the reconstructed output and the original input
§ Easy to train; no labels needed (unsupervised)
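A minimal sketch of the autoencoder objective, assuming only a linear encoder and decoder trained by gradient descent (DeepDGA's real model uses CNN/LSTM layers, but the reconstruction objective is the same idea):

```python
# Toy linear autoencoder: compress 8-dim vectors to a 3-dim embedding
# and minimize mean squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))             # toy "inputs"
We = rng.normal(scale=0.1, size=(8, 3))   # encoder: input -> embedding
Wd = rng.normal(scale=0.1, size=(3, 8))   # decoder: embedding -> input

def recon_loss(X, We, Wd):
    return np.mean((X @ We @ Wd - X) ** 2)

initial = recon_loss(X, We, Wd)
lr = 0.01
for _ in range(500):
    Z = X @ We                    # encode (lossy compression)
    R = Z @ Wd                    # decode (reconstruction)
    G = 2 * (R - X) / X.size      # dLoss/dR
    Wd -= lr * Z.T @ G            # gradient step on decoder
    We -= lr * X.T @ (G @ Wd.T)   # gradient step on encoder
final = recon_loss(X, We, Wd)
```

No labels appear anywhere in the loop, which is the "unsupervised" point the slide makes.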
GENERATIVE ADVERSARIAL NETWORKS
§ Adversarial game between two models
• generator: seeks to create synthetic data based on samples from the true data distribution (with added noise)
• discriminator: receives a sample and must determine whether it is synthetic (from the generator) or a true data sample
§ Goal: find a Nash-like equilibrium by pitting the models against one another
§ Harder to train; unsupervised
• lots of failure modes
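The alternating two-player game can be sketched on a toy 1-D problem (this is not DeepDGA's architecture, just the training pattern): a generator g(z) = mu + sigma·z tries to mimic samples from N(3, 1) while a logistic discriminator tries to tell real from fake.

```python
# Toy GAN loop: alternate discriminator and generator updates,
# the same red-team/blue-team rhythm used in DeepDGA's rounds.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0     # generator parameters
w, b = 0.1, 0.0          # discriminator parameters

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

lr, batch = 0.05, 64
for _ in range(300):
    real = rng.normal(3.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = mu + sigma * z
    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w -= lr * np.mean(-(1 - d_real) * real + d_fake * fake)
    b -= lr * np.mean(-(1 - d_real) + d_fake)
    # Generator step: push D(fake) -> 1 (non-saturating loss).
    d_fake = sigmoid(w * (mu + sigma * z) + b)
    mu -= lr * np.mean(-(1 - d_fake) * w)
    sigma -= lr * np.mean(-(1 - d_fake) * w * z)
```

Even this tiny game can oscillate or collapse, which is the "lots of failure modes" caveat in practice.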
Background → Frameworks
DeepDGA Architecture
DeepDGA → Autoencoder → Encoder
§ Encoder architecture taken from [Kim et al., 2015], found useful in character-level language modeling
§ Embedding learns a linear mapping for each valid domain character (20-dimensional space)
§ Convolutional filters applied to capture character combinations (bigrams/trigrams)
§ Max-pooling over time and over filters
• Gathers a fixed-length representation
§ Highway Network → LSTM
Learn the right representation of Alexa domains
DeepDGA → Autoencoder → Decoder
§ Decoder is roughly the reverse of the encoder, minus the max-pooling step
§ Domain embedding is repeated over the maximum domain length (time steps)
§ Sequence is passed to LSTM → Highway Network → Convolutional Filters
§ Softmax activation on the final layer produces a multinomial distribution over domain characters
§ Sampled to generate a new domain name modeled after the input domain name
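The final sampling step can be sketched as below. The logit vectors are made up for illustration; in the real model they come from the decoder's last layer at each time step:

```python
# Sketch: at each time step, turn a logit vector over the valid
# domain characters into a softmax distribution and sample one char.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def sample_char(logits, rng):
    """Sample one character from softmax(logits)."""
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return ALPHABET[rng.choice(len(ALPHABET), p=p)]

rng = np.random.default_rng(0)
# One made-up logit vector per time step -> one generated label.
label = "".join(sample_char(rng.normal(size=len(ALPHABET)), rng)
                for _ in range(10))
```

Sampling (rather than taking the argmax) is what lets one input seed yield many distinct domain names.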
DeepDGA → GAN
§ Simply rewire the autoencoder framework as the base of our GAN
• Accepts a random seed as input
• Outputs domains much like a valid domain name
§ Box Layer: restricts output to live in an axis-aligned box defined by embedding vectors of the training data
• Parameterizes manifold coordinates of legitimate domains
• Used in the generator to ensure it only learns domains on the legitimate (Alexa-like) domain manifold
DeepDGA → History Regularization
§ Regularize the discriminator by training it on recently generated samples as well as domains sampled from prior adversarial rounds
§ Helps the discriminator "remember" deficiencies in model coverage AND forces it to learn novel domain embeddings
§ Reduces the likelihood of the generator collapsing (i.e., generating the same domain every batch)
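One way to sketch history regularization is a simple replay pool; the class and method names here are illustrative, not from the paper:

```python
# Sketch: the discriminator's training batch mixes fresh generator
# output with domains replayed from earlier adversarial rounds.
import random

class HistoryPool:
    def __init__(self, capacity=1000, seed=0):
        self.pool, self.capacity = [], capacity
        self.rng = random.Random(seed)

    def add(self, samples):
        """Remember this round's generated domains."""
        self.pool.extend(samples)
        if len(self.pool) > self.capacity:      # drop oldest overflow
            self.pool = self.pool[-self.capacity:]

    def training_batch(self, fresh, k):
        """Fresh samples plus up to k replayed ones from prior rounds."""
        replayed = self.rng.sample(self.pool, min(k, len(self.pool)))
        return list(fresh) + replayed

pool = HistoryPool()
pool.add(["abcde.com", "fghij.com"])            # round 1 output
batch = pool.training_batch(["klmno.com"], k=2)  # round 2 batch
```

Because old samples keep resurfacing, the generator gains nothing by re-emitting a domain the discriminator already learned to reject, which discourages collapse.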
DeepDGA → Walkthrough
[Diagram: random seed → generator → www.emdgane.com → detector]
Move 1: Red Team: train the generator to randomly create impostors that trick the detector
DeepDGA → Walkthrough
[Diagram: www.emdgane.com → detector]
Move 2: Blue Team: train the detector to distinguish real domains from the generator's impostors
DeepDGA → Walkthrough
[Diagram: random seed → generator → www.emdgane.com → detector]
DeepDGA → Walkthrough
[Diagram: www.endgame.com → detector (encoder) → generator (decoder) → www.emdgane.com]
DeepDGA → Walkthrough
[Diagram: random seed → generator → www.emdgane.com → detector]
AUTOENCODED DOMAINS
<input domain> → <output domain>
clearspending → clearspending
synodos → synodos
3walq → 3walq
kayak → kayak
sportpitvl → sportpitvl
7resume → 7resume
templateism → templateism
spielefuerdich → spielefueddrch
firebaseapp → firepareapp
gilliananderson → gilliadandelson
tonwebmarketing → torwetmarketing
thetubestore → thebubestore
infusion → infunion
akorpasaji → akorpajaji
hargonis → harnonis
GAN-GENERATED DOMAINS
firiaps.com
qiurdeees.com
gyldles.com
lirneret.com
vietips.com
mivognit.com
shtrunoa.com
gilrr.com
yhujq.com
sirgivrv.com
tisehl.com
thellehm.com
sztneetkop.com
chdareet.com
statpottxy.com
laner.com
spienienitne.com
DeepDGA → Generated Domains
Experiment Setup & Results
Experiment Setup
§ Datasets
• Alexa Top 1M
• DGA family datasets
• All open source
§ Training Time
• DeepDGA (autoencoder & GAN) implemented in Keras (Python DL library)
• Autoencoder pretrained for 300 epochs
· each epoch: 256K domains randomly sampled
· batch size of 128
· 14 hours on an NVIDIA Titan X GPU
• Each adversarial round generated 12.8K samples against the detector
· ~7 minutes per round on GPU
Experiment Setup → Offensive
§ Red Team: DeepDGA vs. External Classifier
§ Random Forest model (scikit-learn, Python)
• ensemble classifier, more resistant to adversarial attacks due to low variance
§ Handcrafted feature extraction
• domain length
• entropy of character distribution
• vowel-to-consonant ratio
• n-grams
§ Model trained on Alexa Top 10K vs. DeepDGA
• Results averaged over 10-fold CV
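The handcrafted features listed above can be sketched as a single extraction function. The exact feature engineering in the original experiment may differ; this shows the idea:

```python
# Sketch: length, character entropy, vowel-to-consonant ratio, and
# character bigrams for one domain label (TLD already stripped).
import math
from collections import Counter

VOWELS = set("aeiou")

def domain_features(label: str):
    counts = Counter(label)
    n = len(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    vowels = sum(counts[v] for v in VOWELS)
    consonants = sum(c for ch, c in counts.items()
                     if ch.isalpha() and ch not in VOWELS)
    return {
        "length": n,
        "entropy": entropy,
        "vowel_consonant_ratio": vowels / max(consonants, 1),
        "bigrams": Counter(label[i:i + 2] for i in range(n - 1)),
    }

f = domain_features("google")
```

Vectors like these would then feed the Random Forest; the point of the offensive experiment is that DeepDGA's output makes exactly these statistics look Alexa-like.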
Trained explicitly to catch DeepDGA and only DeepDGA
DeepDGA vs. External Classifier
[Chart: accuracy (%), 0–100%, of the external classifier (trained to catch 11 DGA families, equally represented in the training set)]
DeepDGA → Character Distributions
§ Earlier we compared DGA families and Alexa 1M character distributions.
• Anomalous distributions were easy to identify
§ DeepDGA character distributions before the adversarial rounds also appear anomalous
§ But after the adversarial rounds, they begin to resemble Alexa 1M (still not perfect)
§ This character distribution would confound previously important features
• entropy
• vowel-to-consonant ratio
• n-grams
Experiment Setup → Defensive
§ The core of this research was to determine if adversarial examples could harden an independent classifier.
§ Augmented training dataset w/ adversarial domains generated by GAN.
§ In theory, model can be hardened against previously unobserved families (in training set)
§ Employed a leave-one-out (LOO) strategy in which an entire DGA family was held out for validation
• Baseline: model trained on the other 9 families + Alexa Top 10K
• Hardened: repeated the process with DeepDGA-generated domains added as malicious samples
Binary Classification Before/After Adversarial Hardening (TPR at a fixed 1% FPR)
Summary
§ Contributions
• Present the first known deep learning architecture to pseudo-randomly generate domain names
• Demonstrate that adversarially-crafted domain names targeting a DL model are also adversarial for an independent external classifier
• At least experimentally, those same adversarial samples can be used to augment a training set and harden an independent classifier
§ Hard problems
• GANs are hard! → Adversarial game construction
• Carefully watch the FP rate
· A dataset overloaded with augmented DGAs can increase the FP rate
· The model tries to learn that these "realistic" domain names are possibly malicious
§ Future Work
• Network: improving domain name generation (DGA) and detection
• Strengthening malware classification models
· Malicious WinAPI sequences
· Adversarially-tuned static feature vectors
Questions?