November 4, 2016
DEEPDGA: ADVERSARIALLY-TUNED DOMAIN GENERATION AND DETECTION
Bobby Filar (@filar), Hyrum Anderson, Jonathan Woodbridge. AISec 2016
Outline
§ Motivation
§ Background
§ DeepDGA Architecture
§ Experiment(s) Setup
§ Results
§ Future Work
Motivations
§ Can we Red team vs. Blue team a known infosec problem (DGAs) leveraging Generative Adversarial Networks (GAN)?
§ Offensive: Leverage GANs to construct a deep-learning DGA designed to bypass an independent classifier.
§ Defensive: Can adversarially-generated domains augment/improve training data to harden an independent classifier?
www.endgame.com
www.endgame-2016.com
www.emdgane.com
Related Work
§ Recent work in adversarial examples
• Explaining and Harnessing Adversarial Examples (Goodfellow, 2015)
• Adversarial Perturbations Against Deep Neural Networks for Malware Classification (Papernot, 2016)
§ Key differences between other domains and INFOSEC:
• Other domains: make my model robust to occasional blind-spot examples it might come across in the wild
• Information security: discover and plug holes in my model that the adversary is actively trying to find and exploit (Red vs. Blue)
Fast gradient sign method (Goodfellow)
“What is the cost of changing X’s label to a different y?”
Background → Domain Generation Algorithms
§ Employed by malware families to bypass common C2 defenses
§ DGAs take a seed input and generate large amounts of pseudo-random domain names
§ A subset of the generated domains is registered as command and control (C2) servers
§ Botnets and malware iterate through the generated domains until they find one that is registered, then connect and establish a C2 channel
§ Asymmetric attack since defender must know all possible domains to blacklist
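The seed-based scheme described above can be sketched in a few lines. This is a toy illustration, not any real family's algorithm: a deterministic hash chain seeded with a shared string (here a date) yields the same pseudo-random domain list for attacker and infected host alike.

```python
# Toy seed-based DGA sketch: attacker and bot share only the seed,
# yet derive identical domain lists. The defender must blacklist all
# of them; the attacker registers just one.
import hashlib

def toy_dga(seed: str, count: int = 5, length: int = 12, tld: str = ".com"):
    """Generate `count` pseudo-random domains from a shared seed."""
    domains = []
    state = seed.encode()
    for _ in range(count):
        state = hashlib.sha256(state).digest()
        # Map hash bytes onto lowercase letters to form the label.
        label = "".join(chr(ord("a") + b % 26) for b in state[:length])
        domains.append(label + tld)
    return domains

todays_domains = toy_dga("2016-11-04")
```

Because the generator is deterministic in the seed, both sides compute identical lists, which is exactly what makes the attack asymmetric for the defender.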
[Diagram: the infected host queries DNS for generated domains (e.g. wedcf.com, asdfg.com → NXDomain) until bjgkre.com resolves to the C2 server at 212.211.123.01]
Domain Generation Algorithms → Cryptolocker Example
btbpurnkbqidxxclfdfrdqjasjphyrtn.org
sehccrlyfadifehntnomqgpfyunqqfft.org
konsbolyfadifehntnomqgpfyunqqfft.org
cytfiobnkjxomkmhimxhcfvtogyaiqaa.org
Domain Generation Algorithms → Character Distributions
§ DGA char dist + ML == robust defense?
§ Cryptolocker and Ramnit are both nearly uniform over the same range
• Expected, since both compute domains from a single seed
§ Suppobox concatenates random words from an English dictionary, so its distribution reflects that of the Alexa 1M
§ Much more difficult for prior DGA detection models to correctly classify
§ Our goal is to build a character-based generator that mimics the Alexa domain name distribution
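The character-distribution comparison above can be sketched as follows. The domain lists here are illustrative stand-ins, not the actual Cryptolocker or Alexa datasets:

```python
# Sketch: unigram character distributions of two domain lists, the
# kind of histogram the slide contrasts (DGA families vs. Alexa 1M).
from collections import Counter

def char_distribution(domains):
    """Return P(char) over the concatenated domain labels (TLDs stripped)."""
    counts = Counter()
    for d in domains:
        counts.update(d.split(".")[0])
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Stand-in samples: hash-like labels vs. dictionary-like labels.
dga_like = char_distribution(["btbpurnkbqidx.org", "sehccrlyfadif.org"])
alexa_like = char_distribution(["google.com", "facebook.com", "youtube.com"])
```

A detector built on such statistics works against near-uniform families but not against generators that already mimic the Alexa distribution, which motivates the character-based generator.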
AUTOENCODERS
§ Data compression algorithm
§ Models consist of an encoder, a decoder, and a loss function
• encoder: transforms the input into a low-dimensional embedding (lossy compression)
• decoder: reconstructs the original input from the embedding (decompression)
§ Goal: minimize distortion between the reconstructed output and the original input
§ Easy to train; no labels needed (unsupervised)
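A minimal sketch of the autoencoder objective, assuming only a linear encoder and decoder trained by gradient descent (DeepDGA's real model uses CNN/LSTM layers, but the reconstruction objective is the same idea):

```python
# Toy linear autoencoder: compress 8-dim vectors to a 3-dim embedding
# and minimize mean squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))             # toy "inputs"
We = rng.normal(scale=0.1, size=(8, 3))   # encoder: input -> embedding
Wd = rng.normal(scale=0.1, size=(3, 8))   # decoder: embedding -> input

def recon_loss(X, We, Wd):
    return np.mean((X @ We @ Wd - X) ** 2)

initial = recon_loss(X, We, Wd)
lr = 0.01
for _ in range(500):
    Z = X @ We                    # encode (lossy compression)
    R = Z @ Wd                    # decode (reconstruction)
    G = 2 * (R - X) / X.size      # dLoss/dR
    Wd -= lr * Z.T @ G            # gradient step on decoder
    We -= lr * X.T @ (G @ Wd.T)   # gradient step on encoder
final = recon_loss(X, We, Wd)
```

No labels appear anywhere in the loop, which is the "unsupervised" point the slide makes.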
GENERATIVE ADVERSARIAL NETWORKS
§ Adversarial game between two models
• generator: seeks to create synthetic data based on samples from the true data distribution (with added noise)
• discriminator: receives a sample and must determine whether it is synthetic (from the generator) or a true data sample
§ Goal: find a Nash-like equilibrium by pitting the models against one another
§ Harder to train; unsupervised
• lots of failure modes
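The alternating two-player game can be sketched on a toy 1-D problem (this is not DeepDGA's architecture, just the training pattern): a generator g(z) = mu + sigma·z tries to mimic samples from N(3, 1) while a logistic discriminator tries to tell real from fake.

```python
# Toy GAN loop: alternate discriminator and generator updates,
# the same red-team/blue-team rhythm used in DeepDGA's rounds.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0     # generator parameters
w, b = 0.1, 0.0          # discriminator parameters

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

lr, batch = 0.05, 64
for _ in range(300):
    real = rng.normal(3.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = mu + sigma * z
    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w -= lr * np.mean(-(1 - d_real) * real + d_fake * fake)
    b -= lr * np.mean(-(1 - d_real) + d_fake)
    # Generator step: push D(fake) -> 1 (non-saturating loss).
    d_fake = sigmoid(w * (mu + sigma * z) + b)
    mu -= lr * np.mean(-(1 - d_fake) * w)
    sigma -= lr * np.mean(-(1 - d_fake) * w * z)
```

Even this tiny game can oscillate or collapse, which is the "lots of failure modes" caveat in practice.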
Background → Frameworks
DeepDGA Architecture
DeepDGA → Autoencoder → Encoder
§ Encoder architecture taken from [Kim et al., 2015], found useful in character-level language modeling
§ Embedding learns a linear mapping for each valid domain character (20-dimensional space)
§ Convolutional filters applied to capture character combinations (bigrams/trigrams)
§ Max-pooling over time and over filters
• Gathers a fixed-length representation
§ Highway Network → LSTM
Learn the right representation of Alexa domains
DeepDGA → Autoencoder → Decoder
§ Decoder is roughly the reverse of the encoder, minus the max-pooling step
§ Domain embedding is repeated over the maximum domain length (time steps)
§ Sequence is passed to LSTM → Highway Network → Convolutional Filters
§ Softmax activation on the final layer produces a multinomial distribution over domain characters
§ Sampled to generate a new domain name modeled after the input domain name
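The final sampling step can be sketched as below. The logit vectors are made up for illustration; in the real model they come from the decoder's last layer at each time step:

```python
# Sketch: at each time step, turn a logit vector over the valid
# domain characters into a softmax distribution and sample one char.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def sample_char(logits, rng):
    """Sample one character from softmax(logits)."""
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return ALPHABET[rng.choice(len(ALPHABET), p=p)]

rng = np.random.default_rng(0)
# One made-up logit vector per time step -> one generated label.
label = "".join(sample_char(rng.normal(size=len(ALPHABET)), rng)
                for _ in range(10))
```

Sampling (rather than taking the argmax) is what lets one input seed yield many distinct domain names.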
DeepDGA → GAN
§ Simply rewire the autoencoder framework as the base of our GAN
• Accepts a random seed as input
• Outputs domains much like a valid domain name
§ Box Layer: restricts output to live in an axis-aligned box defined by embedding vectors of the training data
• Parameterizes manifold coordinates of legitimate domains
• Used in the generator to ensure it only learns domains on the legitimate (Alexa-like) domain manifold
DeepDGA → History Regularization
§ Regularize the discriminator by training it on recently generated samples as well as domains sampled from prior adversarial rounds
§ Helps the discriminator "remember" deficiencies in model coverage AND forces it to learn novel domain embeddings
§ Reduces the likelihood of the generator collapsing (i.e., generating the same domain every batch)
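One way to sketch history regularization is a simple replay pool; the class and method names here are illustrative, not from the paper:

```python
# Sketch: the discriminator's training batch mixes fresh generator
# output with domains replayed from earlier adversarial rounds.
import random

class HistoryPool:
    def __init__(self, capacity=1000, seed=0):
        self.pool, self.capacity = [], capacity
        self.rng = random.Random(seed)

    def add(self, samples):
        """Remember this round's generated domains."""
        self.pool.extend(samples)
        if len(self.pool) > self.capacity:      # drop oldest overflow
            self.pool = self.pool[-self.capacity:]

    def training_batch(self, fresh, k):
        """Fresh samples plus up to k replayed ones from prior rounds."""
        replayed = self.rng.sample(self.pool, min(k, len(self.pool)))
        return list(fresh) + replayed

pool = HistoryPool()
pool.add(["abcde.com", "fghij.com"])            # round 1 output
batch = pool.training_batch(["klmno.com"], k=2)  # round 2 batch
```

Because old samples keep resurfacing, the generator gains nothing by re-emitting a domain the discriminator already learned to reject, which discourages collapse.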
DeepDGA → Walkthrough
[Diagram: random seed → generator → www.emdgane.com → detector]
Move 1: Red Team: train the generator to randomly create impostors that trick the detector
DeepDGA → Walkthrough
[Diagram: www.emdgane.com → detector]
Move 2: Blue Team: train the detector to distinguish real domains from the generator's impostors
DeepDGA → Walkthrough
[Diagram: random seed → generator → www.emdgane.com → detector]
DeepDGA → Walkthrough
[Diagram: www.endgame.com → detector (encoder) → generator (decoder) → www.emdgane.com]
DeepDGA → Walkthrough
[Diagram: random seed → generator → www.emdgane.com → detector]
AUTOENCODED DOMAINS
<input domain> → <output domain>
clearspending → clearspending
synodos → synodos
3walq → 3walq
kayak → kayak
sportpitvl → sportpitvl
7resume → 7resume
templateism → templateism
spielefuerdich → spielefueddrch
firebaseapp → firepareapp
gilliananderson → gilliadandelson
tonwebmarketing → torwetmarketing
thetubestore → thebubestore
infusion → infunion
akorpasaji → akorpajaji
hargonis → harnonis
GAN-GENERATED DOMAINS
firiaps.com
qiurdeees.com
gyldles.com
lirneret.com
vietips.com
mivognit.com
shtrunoa.com
gilrr.com
yhujq.com
sirgivrv.com
tisehl.com
thellehm.com
sztneetkop.com
chdareet.com
statpottxy.com
laner.com
spienienitne.com
DeepDGA → Generated Domains
Experiment Setup & Results
Experiment Setup
§ Datasets
• Alexa Top 1M
• DGA family datasets
• All open source
§ Training Time
• DeepDGA (autoencoder & GAN) implemented in Keras (Python DL library)
• Autoencoder pretrained for 300 epochs
· each epoch: 256K domains randomly sampled
· batch size of 128
· 14 hours on an NVIDIA Titan X GPU
• Each adversarial round generated 12.8K samples against the detector
· ~7 minutes per round on GPU
Experiment Setup → Offensive
§ Red Team: DeepDGA vs. External Classifier
§ Random Forest model (scikit-learn, Python)
• ensemble classifier, more resistant to adversarial attacks due to low variance
§ Handcrafted feature extraction
• domain length
• entropy of character distribution
• vowel-to-consonant ratio
• n-grams
§ Model trained on Alexa Top 10K vs. DeepDGA
• Results averaged over 10-fold CV
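The handcrafted features listed above can be sketched as a single extraction function. The exact feature engineering in the original experiment may differ; this shows the idea:

```python
# Sketch: length, character entropy, vowel-to-consonant ratio, and
# character bigrams for one domain label (TLD already stripped).
import math
from collections import Counter

VOWELS = set("aeiou")

def domain_features(label: str):
    counts = Counter(label)
    n = len(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    vowels = sum(counts[v] for v in VOWELS)
    consonants = sum(c for ch, c in counts.items()
                     if ch.isalpha() and ch not in VOWELS)
    return {
        "length": n,
        "entropy": entropy,
        "vowel_consonant_ratio": vowels / max(consonants, 1),
        "bigrams": Counter(label[i:i + 2] for i in range(n - 1)),
    }

f = domain_features("google")
```

Vectors like these would then feed the Random Forest; the point of the offensive experiment is that DeepDGA's output makes exactly these statistics look Alexa-like.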
Trained explicitly to catch DeepDGA and only DeepDGA
DeepDGA vs. External Classifier
[Chart: accuracy (%), 0–100%, of the external classifier (trained to catch 11 DGA families, equally represented in the training set)]
DeepDGA → Character Distributions
§ Earlier we compared DGA families and Alexa 1M character distributions.
• Anomalous distributions were easy to identify
§ DeepDGA character distributions before the adversarial rounds also appear anomalous
§ But after the adversarial rounds, they begin to resemble Alexa 1M (still not perfect)
§ This character distribution would confound previously important features
• entropy
• vowel-to-consonant ratio
• n-grams
Experiment Setup → Defensive
§ The core of this research was to determine if adversarial examples could harden an independent classifier.
§ Augmented training dataset w/ adversarial domains generated by GAN.
§ In theory, model can be hardened against previously unobserved families (in training set)
§ Employed a leave-one-out (LOO) strategy in which an entire DGA family was held out for validation
• Baseline: model trained on the other 9 families + Alexa Top 10K
• Hardened: repeated the process with DeepDGA-generated domains added as malicious samples
Binary Classification Before/After Adversarial Hardening (TPR at a fixed 1% FPR)
Summary
§ Contributions
• Present the first known deep learning architecture to pseudo-randomly generate domain names
• Demonstrate that adversarially-crafted domain names targeting a DL model are also adversarial for an independent external classifier
• At least experimentally, those same adversarial samples can be used to augment a training set and harden an independent classifier
§ Hard problems
• GANs are hard! → Adversarial game construction
• Carefully watch the FP rate
· A dataset overloaded with augmented DGAs can increase the FP rate
· The model tries to learn that these "realistic" domain names are possibly malicious
§ Future Work
• Network: improving domain name generation (DGA) and detection
• Strengthening malware classification models
· Malicious WinAPI sequences
· Adversarially-tuned static feature vectors
Questions?