An Evolutionary Method for Training Autoencoders for Deep Learning Networks
Master's Thesis Defense

Sean Lander, Master's Candidate
Advisor: Yi Shang
University of Missouri, Department of Computer Science
University of Missouri, Informatics Institute
Agenda
- Overview
- Background and Related Work
- Methods
- Performance and Testing
- Results
- Conclusion and Future Work
Overview: Deep Learning classification/reconstruction
- Since 2006, Deep Learning Networks (DLNs) have changed the landscape of classification problems
- Strong ability to create and utilize abstract features
- Lends itself easily to GPU and distributed systems
- Does not require labeled data (very important)
- Can be used for feature reduction and classification
Overview: Problem and proposed solution
- Problems with DLNs:
  - Costly to train with large data sets or high feature spaces
  - Local minima are systemic in Artificial Neural Networks
  - Hyper-parameters must be hand selected
- Proposed solutions:
  - Evolutionary approach with a local search phase
    - Increased chance of reaching the global minimum
    - Optimizes structure based on abstracted features
  - Data partitions based on population size (large data only)
    - Reduced training time
    - Reduced chance of overfitting
Background: Perceptrons
- Started with the Perceptron in the 1950s
- Only capable of linear separability
- Failed on XOR
Background: Artificial Neural Networks (ANNs)
- ANNs fell out of favor until the Multilayer Perceptron (MLP) was introduced
  - Pro: non-linear classification
  - Con: time consuming
- Advance in training: backpropagation
  - Increased training speeds
  - Limited to shallow networks
  - Error propagation diminishes as the number of layers increases
Background: Backpropagation using gradient descent
- Proposed in 1988, based on classification error
- Given m training samples:
  {(x^(1), y^(1)), ..., (x^(m), y^(m))}
- For each sample (x^(i), y^(i)), calculate its error:
  J(W,b; x^(i), y^(i)) = (1/2) ||h_{W,b}(x^(i)) - y^(i)||^2
- For all m training samples, the total error can be calculated as:
  J(W,b) = (1/m) * sum_{i=1}^{m} J(W,b; x^(i), y^(i))
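The two error formulas above can be sketched directly in numpy; this is an illustrative translation of the cost function, not code from the thesis:

```python
import numpy as np

def total_cost(X, Y, h):
    """Total backpropagation cost over m training samples.

    X, Y: (m, n) arrays of inputs and targets; h: the hypothesis h_{W,b}.
    Per-sample error is (1/2)||h(x_i) - y_i||^2; the total is their mean.
    """
    m = X.shape[0]
    per_sample = 0.5 * np.sum((h(X) - Y) ** 2, axis=1)  # J(W,b; x_i, y_i)
    return per_sample.sum() / m                          # J(W,b)

# For an autoencoder, the target is the input itself (Y = X)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
identity = lambda A: A
print(total_cost(X, X, identity))  # 0.0 -- perfect reconstruction
```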
Background: Deep Learning Networks (DLNs)
- Allow for deep networks with multiple layers
- Layers are pre-trained using unlabeled data
- Layers are "stacked" and fine-tuned
- Minimizes error degradation for deep neural networks (many layers)
- Remaining issues:
  - Still costly to train
  - Manual selection of hyper-parameters
  - Local, not global, minimum
Background: Autoencoders for reconstruction
- Autoencoders can be used for feature reduction and clustering
- "Classification error" is the ability to reconstruct the sample input
- Abstracted features (the output from the hidden layer) can be used to replace raw input for other techniques
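A minimal single-hidden-layer autoencoder forward pass illustrates the idea; the sigmoid activation and the layer sizes here are illustrative assumptions, not details taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    """Encode input x to hidden features h, then decode back to x'."""
    h = sigmoid(W1 @ x + b1)      # abstracted features: the hidden layer output
    x_rec = sigmoid(W2 @ h + b2)  # reconstruction of the input
    return h, x_rec

# 13 input features (as in UCI Wine), 4 hidden nodes; weights drawn from N(0, 0.5)
n_in, n_hid = 13, 4
W1 = rng.normal(0, 0.5, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.5, (n_in, n_hid)); b2 = np.zeros(n_in)
x = rng.random(n_in)
h, x_rec = autoencoder_forward(x, W1, b1, W2, b2)
recon_error = 0.5 * np.sum((x_rec - x) ** 2)  # the "classification error" above
```

The hidden vector h is what downstream techniques would consume in place of the raw 13-dimensional input.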
Related Work: Evolutionary and genetic ANNs
- First use of Genetic Algorithms (GAs) for ANNs in 1989
  - Two-layer ANN on a small data set
  - Tested multiple types of chromosomal encodings and mutation types
- The late 1990s and early 2000s introduced other techniques
  - Multi-level mutations and mutation priority
  - Addition of local search in each generation
  - Inclusion of hyper-parameters as part of the mutation
  - The issue of competing conventions starts to appear: two ANNs produce the same results by sharing the same nodes but in a permuted order
Related Work: Hyper-parameter selection for DLNs
- The majority of the work explored newer technologies and methods such as GPU and distributed (MapReduce) training
- Improved versions of backpropagation, such as Conjugate Gradient or Limited-Memory BFGS, were tested under different conditions
- Most conclusions pointed toward manual parameter selection via trial and error
Method 1: Evolutionary Autoencoder (EvoAE)
- IDEA: an autoencoder's power is in its feature abstraction, the hidden node output
- Training many AEs will produce more potential abstracted features
- The best AEs will contain the best features
- Joining these features should create a better AE
Method 1: Evolutionary Autoencoder (EvoAE)

[Diagram: the EvoAE cycle. Each population member (A1-A4) is an autoencoder mapping input x through hidden layer h to reconstruction x'. The cycle runs Initialization, Local Search, Crossover, and Mutation to produce the next generation of autoencoders (B1-B3, C2).]
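The Crossover and Mutation steps in the diagram can be sketched as operations on whole hidden nodes, since each hidden node (a row of W1 and the matching column of W2) carries one abstracted feature. This is a hypothetical single-point operator; the thesis's exact crossover and mutation details may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

def crossover(parent_a, parent_b):
    """Exchange hidden nodes between two parent autoencoders.

    Each parent is a dict with 'W1' of shape (h, n) and 'W2' of shape (n, h).
    Swapping rows of W1 together with the matching columns of W2 swaps
    complete abstracted features, sidestepping node-level mismatch.
    """
    h = parent_a["W1"].shape[0]
    cut = rng.integers(1, h)  # single split point over the hidden nodes
    return {
        "W1": np.vstack([parent_a["W1"][:cut], parent_b["W1"][cut:]]),
        "W2": np.hstack([parent_a["W2"][:, :cut], parent_b["W2"][:, cut:]]),
    }

def mutate(ae, rate=0.1, scale=0.5):
    """Perturb each weight with probability `rate` (mutation rate 0.1 above)."""
    for k in ("W1", "W2"):
        mask = rng.random(ae[k].shape) < rate
        ae[k] = ae[k] + mask * rng.normal(0, scale, ae[k].shape)
    return ae
```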
Method 1A: Distributed learning and mini-batches
- Training time of the generic EvoAE increases linearly with the size of the population
- ANN training time increases drastically with data size
- To combat this, mini-batches can be used, where each AE is trained against one batch and then updated
- Batch size << total data size
Method 1A: Distributed learning and mini-batches
- EvoAE lends itself to a distributed system
- Data storage becomes an issue due to data duplication

[Diagram: each batch (Batch 1 ... Batch N) cycles through Train (forward propagation, backpropagation), Rank (calculate error, sort), and GA (crossover, mutate).]
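The batch assignment above can be sketched in a few lines; the helper name and the one-batch-per-member split are illustrative assumptions:

```python
import numpy as np

def make_batches(X, pop_size, seed=0):
    """Assign each population member its own shuffled mini-batch, so the
    batch size is roughly len(X) / pop_size (batch size << total data)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    return [X[part] for part in np.array_split(idx, pop_size)]

X = np.arange(90).reshape(30, 3)
batches = make_batches(X, pop_size=5)
# 5 batches of 6 samples each, together covering all 30 samples
```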
Method 2: EvoAE with Evo-batches
- IDEA: when data is large, small batches can be representative
- Prevents overfitting, as the nodes being trained are almost always introduced to new data
- Scales well with large amounts of data, even when parallel training is not possible
- Works well on limited-memory systems: increasing the population size reduces the data per batch
- Quick training of large populations, equivalent to training a single autoencoder using traditional methods
Method 2: EvoAE with Evo-batches

[Diagram: the original data is split into partitions Data A-D; population members run Local Search, Crossover, and Mutate, with the partition assignments reshuffled between generations so each member is exposed to new data.]
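One plausible reading of the Evo-batch assignment is a simple rotation, so every member eventually trains on every partition; the thesis's exact reassignment rule may differ:

```python
def evo_batch_schedule(pop_size, n_generations):
    """Per generation, the partition index assigned to each population member.

    Rotating by one each generation guarantees each member sees new data
    every generation and all partitions over pop_size generations.
    """
    return [[(member + gen) % pop_size for member in range(pop_size)]
            for gen in range(n_generations)]

schedule = evo_batch_schedule(pop_size=4, n_generations=4)
# generation 0: [0, 1, 2, 3]; generation 1: [1, 2, 3, 0]; ...
```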
Performance and Testing: Hardware and testing parameters
- Lenovo Y500 laptop
- Intel i7 3rd generation, 2.4 GHz
- 12 GB RAM
- All weights randomly initialized to N(0, 0.5)

Per-dataset parameters:
Parameter       | Wine | Iris | Heart Disease | MNIST
Hidden Size     | 32   | 32   | 12            | 200
Hidden Std Dev  | NULL | NULL | NULL          | 80
Hidden +/-      | 16   | 16   | 6             | NULL
Mutation Rate   | 0.1  | 0.1  | 0.1           | 0.1

Parameter defaults:
Parameter       | Value
Learning Rate   | 0.1
Momentum        | 2
Weight Decay    | 0.003
Population Size | 30
Generations     | 50
Epochs/Gen      | 20
Train/Validate  | 80/20
Performance and Testing: Baseline
- The baseline is a single AE with 30 random initializations
- Two learning rates were used to create two baseline measurements:
  - Base learning rate
  - Learning rate * 0.1
Performance and Testing: Data partitioning
- Three data partitioning methods were used:
  - Full data
  - Mini-batch
  - Evo-batch
Performance and Testing: Post-training configurations
- Post-training was run in the following ways:
  - Full data (All)
  - Batch data (Batch)
  - None
- All result sets below use the Evo-batch configuration
Results: Parameters review

Parameter       | Wine | MNIST
Hidden Size     | 32   | 200
Hidden Std Dev  | NULL | 80
Hidden +/-      | 16   | NULL
Mutation Rate   | 0.1  | 0.1

Parameter defaults:
Parameter       | Value
Learning Rate   | 0.1
Momentum        | 2
Weight Decay    | 0.003
Population Size | 30
Generations     | 50
Epochs/Gen      | 20
Train/Validate  | 80/20
Results: Datasets
- UCI Wine dataset
  - 178 samples
  - 13 features
  - 3 classes
- Reduced MNIST dataset
  - 6000/1000 and 24k/6k training/testing samples
  - 784 features
  - 10 classes (0-9)
Results: Small datasets - UCI Wine

[Results chart not reproduced in this transcript.]
Results: Small datasets - UCI Wine
- Best error-to-speed: Baseline 1
- Best overall error: Full data, All
- Full data is fast on small-scale data
- Evo-batch and mini-batch are not good on small-scale data
Results: Small datasets - MNIST 6k/1k

[Results chart not reproduced in this transcript.]
Results: Small datasets - MNIST 6k/1k
- Best error-to-time: Mini-batch, None
- Best overall error: Mini-batch, Batch
- Full data slows exponentially on large-scale data
- Evo-batch and mini-batch stay close to baseline speed
Results: Medium datasets - MNIST 24k/6k

[Results chart not reproduced in this transcript.]
Results: Medium datasets - MNIST 24k/6k
- Best error-to-time: Evo-batch, None
- Best overall error: Evo-batch, Batch or Mini-batch, Batch
- Full data was too slow to run on this dataset
- EvoAE with population 30 trains as quickly as a single baseline AE when using Evo-batch
Conclusions: Good for large problems
- Traditional methods are still the preferred choice for small and toy problems
- EvoAE with Evo-batch produces effective and efficient feature reduction given a large volume of data
- EvoAE is robust against poorly chosen hyper-parameters, specifically the learning rate
Future Work
- Immediate goals:
  - Transition to a distributed system, MapReduce-based or otherwise
  - Harness GPU technology for increased speeds (~50% in some cases)
- Long-term goals:
  - Open the system for use by novices and non-programmers
  - Make the system easy to use and transparent to the user for both modification and training purposes
Thank you
Background: Backpropagation with weight decay
- The cost is prone to overfitting, so a weight decay term with coefficient lambda is added:
  J(W,b) = (1/m) * sum_{i=1}^{m} J(W,b; x^(i), y^(i)) + (lambda/2) * sum W^2
- We use this new cost to update the weights and biases given some learning rate alpha:
  W := W - alpha * dJ(W,b)/dW
  b := b - alpha * dJ(W,b)/db
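A one-line sketch of the weight-decay update, using the defaults from the testing tables (alpha = 0.1, lambda = 0.003); the function name is illustrative:

```python
import numpy as np

def decay_step(W, grad_W, alpha=0.1, lam=0.003):
    """One gradient step on the weight-decay cost:
    dJ/dW = (backprop gradient) + lam * W, then W := W - alpha * dJ/dW."""
    return W - alpha * (grad_W + lam * W)

W = np.full((2, 2), 1.0)
W_new = decay_step(W, grad_W=np.zeros((2, 2)))
# with zero error gradient, decay alone shrinks each weight:
# 1 - 0.1 * 0.003 = 0.9997
```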
Background: Conjugate Gradient Descent
- Plain gradient descent can become stuck in a loop, however, so we add a momentum term beta:
  Delta W(t) = -alpha * dJ(W,b)/dW + beta * Delta W(t-1)
  W := W + Delta W(t)
- This adds memory to the equation, as we reuse previous updates
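The momentum update above, sketched in numpy; beta = 0.9 is an illustrative value, not one taken from the thesis:

```python
import numpy as np

def momentum_step(W, grad_W, prev_delta, alpha=0.1, beta=0.9):
    """Update with memory of the previous step:
    delta(t) = -alpha * dJ/dW + beta * delta(t-1); W := W + delta(t)."""
    delta = -alpha * grad_W + beta * prev_delta
    return W + delta, delta

W = np.zeros(3)
delta = np.zeros(3)
g = np.ones(3)
W, delta = momentum_step(W, g, delta)  # delta = -0.1
W, delta = momentum_step(W, g, delta)  # delta = -0.1 + 0.9 * (-0.1) = -0.19
```

Because beta scales the previous update, repeated steps in the same direction accumulate speed, which is what carries the search out of shallow loops.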
Background: Architecture and hyper-parameters
- Architecture and hyper-parameter selection is usually done through trial and error
- Manually optimized and updated by hand
- Dynamic learning rates can be implemented to correct for sub-optimal learning rate selection
Results: Small datasets - UCI Iris
- The UCI Iris dataset has 150 samples with 4 features and 3 classes
- Best error-to-speed: Baseline 1
- Best overall error: Full data, None
Results: Small datasets - UCI Heart Disease
- The UCI Heart Disease dataset has 297 samples with 13 features and 5 classes
- Best error-to-time: Baseline 1
- Best overall error: Full data, None