National Election Prediction
Lei Xu
University of Wisconsin—Madison
ECE/CS 539 Introduction to Artificial Neural Networks
and Fuzzy Systems
ABSTRACT

The presidential election is always a hot topic in this country, and as a new round of elections approaches, I am curious about how individuals choose their president. Is there any regularity we can find to predict a person's choice from his or her personal attributes?

For my project, I construct a multi-layer perceptron (MLP) artificial neural network to predict an individual's vote from four main features: race, age, education level, and income. To make the prediction, we first need to pose it as a classification task and train the artificial neural network. The configuration of the neural network also plays an essential role, so discovering the best-performing ANN structure is a key part of the project. Finally, given access to sufficiently detailed election data, the trained artificial neural network could be used to predict the final result of the election.
PROBLEM STATEMENT

It is extremely difficult, if not impossible, for a politician to estimate whether he or she can win the coming election. A politician may be able to predict his or her ballot count in each individual region, but the final result is always hard to foresee, since it depends largely on the opponents' campaigns in each event.

In my project, however, I simplify the problem into a classification problem: use four features of an individual voter to judge that voter's election choice, and use an artificial neural network to process the voter data and generate the results.
BACKGROUND

1. About the election

The United States presidential election of 2016, scheduled for Tuesday, November 8, 2016, will be the 58th quadrennial U.S. presidential election. Voters will select presidential electors who in turn will elect a new president and vice president through the Electoral College.
The series of presidential primary elections and caucuses is taking place between February 1 and June 14, 2016, staggered among the 50 states, the District of Columbia and U.S. territories. This nominating process is also an indirect election, where voters cast ballots for a slate of delegates to a political party's nominating convention, who then in turn elect their party's presidential nominee. The 2016 Republican National Convention will take place from July 18 to July 21, 2016 in Cleveland, Ohio. The 2016 Democratic National Convention will take place from July 25 to July 28, 2016 in Philadelphia, Pennsylvania.
Businessman and reality television personality Donald Trump became the presumptive nominee of the Republican Party on May 3, 2016, after the suspension of Ted Cruz's and John Kasich's campaigns and his win in the Indiana primary. He is expected to face the as-yet undetermined nominee of the Democratic Party in the general election, presumably either Hillary Clinton or Bernie Sanders.
2. Multilayer perceptron

A multi-layer perceptron is a type of feed-forward neural network of threshold units. Multi-layer perceptrons are composed of an input layer of neurons, successive layers of intermediate units, and a layer of output neurons. The output of each layer is connected to the input of the next layer, and a synaptic weight is associated with each connection between neurons in neighboring layers. Each neuron is associated with a hyperplane and classifies its input based on which side of the hyperplane the input falls; this classification is then passed on to the neurons in the next layer. To be used for classification, the weights and activation functions of the neurons must be calibrated so that when a feature vector is presented to the input layer, the correct classification vector is produced by the output neurons.
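
As a concrete illustration, the forward pass of such a network takes only a few lines of MATLAB. This is a minimal sketch rather than the project code; the layer sizes, the logistic sigmoid, and the random weights are chosen purely for illustration.

% Minimal MLP forward pass (illustrative sketch, not the project code).
sigmoid = @(z) 1 ./ (1 + exp(-z));     % logistic activation function

x  = [1; 60; 14; 3.3458];              % one 4-feature input (race encoded as 1)
W1 = randn(50, 4);  b1 = randn(50, 1); % input layer -> 50 hidden units
W2 = randn(2, 50);  b2 = randn(2, 1);  % hidden layer -> 2 output units

h = sigmoid(W1*x + b1);                % hidden-layer activations
y = sigmoid(W2*h + b2);                % output vector, compared against [1 0] or [0 1]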
3. Back-propagation

Backpropagation is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function.

Backpropagation requires a known, desired output for each input value in order to calculate the loss-function gradient; here the desired output is an individual's election result, and the input is the four-dimensional feature vector. It is therefore usually considered a supervised learning method. It is a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer.
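
For a single sigmoid layer, the chain rule reduces to the familiar delta rule. The sketch below shows one gradient-descent update for the output layer, reusing the hypothetical W2, b2, h, and y from the forward-pass sketch above; t and eta stand for the desired output and the learning rate.

% One backpropagation update for the output layer (illustrative sketch).
t   = [1; 0];                              % known desired output, e.g. [1 0]
eta = 0.15;                                % learning rate

err    = t - y;                            % output error
delta2 = err .* y .* (1 - y);              % error times sigmoid derivative y.*(1-y)
delta1 = (W2' * delta2) .* h .* (1 - h);   % chain rule: propagate error one layer back

W2 = W2 + eta * (delta2 * h');             % gradient-descent weight update
b2 = b2 + eta * delta2;                    % bias update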
4. Cross-validation
Cross-validation is a model validation technique for assessing
how the results of a statistical analysis will generalize to an
independent data set. It is mainly used in settings where the goal
is prediction, and one wants to estimate how accurately a
predictive model will perform in practice. In a prediction
problem, a model is usually given a dataset of known data on
which training is run (training dataset), and a dataset
of unknown data (or first seen data) against which the model is
tested (testing dataset).
Cross-validation involves partitioning a sample of data into complementary subsets: a training set and a testing set. To reduce variability, multiple
rounds of cross-validation are performed using different
partitions, and the validation results are averaged over the
rounds.
In summary, cross-validation combines (averages) measures of
fit (prediction error) to correct for the optimistic nature of
training error and derive a more accurate estimate of model
prediction performance.
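
As a sketch, an n-way (k-fold) cross-validation loop in MATLAB might look like the following. Here trainMLP and testMLP are hypothetical stand-ins for the training and evaluation routines of the Appendix, and Samples/Labels are assumed to hold the feature vectors and vote labels.

% k-fold cross-validation sketch; trainMLP/testMLP are hypothetical helpers.
k    = 5;                               % number of folds
N    = size(Samples, 1);                % number of samples
perm = randperm(N);                     % shuffle the sample indices once
foldErr = zeros(1, k);

for fold = 1:k
    testIdx  = perm(fold:k:end);        % every k-th shuffled index forms one fold
    trainIdx = setdiff(perm, testIdx);  % the remaining samples train the model

    net = trainMLP(Samples(trainIdx,:), Labels(trainIdx,:));
    foldErr(fold) = testMLP(net, Samples(testIdx,:), Labels(testIdx,:));
end

cvError = mean(foldErr);                % validation error averaged over the rounds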
IMPLEMENTATION

Data

Currently, I have decided to use the turnout data set from https://vincentarelbundock.github.io/Rdatasets/doc/Zelig/turnout.html, which contains individual-level turnout data and pools several American National Election Surveys conducted during the 1992 presidential election year.
Example:

race    age    educate    income    vote
white   60     14         3.3458    1
white   51     10         1.8561    0
white   24     12         0.6304    0
In the last column, 1 represents the choice "Bill Clinton" and 0 represents the choice "George Bush".
Feature vectors

The features analyzed in this project are race, age, education background, and income (thousand per month). The reason is simple: these are the fields accessible from the data set, and each contributes to one's final vote decision to some extent. However, we cannot deny that other, more essential factors play a larger role in one's decision, such as political stance or occupation. At first I will use this data set as samples; a replacement will be made in the future if a more suitable data set can be found.
Each feature vector contains 4 features and 1 label, giving a total of 5 positions. An example feature vector layout is as follows:

Race    Age    Educate    Income    Label
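
Since race is categorical, it must be encoded numerically before it can enter the network. A possible preprocessing sketch is shown below; it assumes the turnout data has been exported to a file named turnout.csv with columns race, age, educate, income, and vote (the file name and layout are assumptions, not part of the original data distribution).

% Load the turnout data and build 4-feature vectors (sketch; file assumed).
T = readtable('turnout.csv');

race    = double(strcmp(T.race, 'white'));    % encode race: white -> 1, other -> 0
Samples = [race, T.age, T.educate, T.income]; % one 4-feature row per voter
Labels  = T.vote;                             % 1 = Clinton, 0 = Bush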
Model

For this project, the model I plan to use is a multi-layer perceptron (MLP) with 4 inputs, a two-unit output layer, and 2 hidden layers of 50 neurons each. In each neuron I use a sigmoidal activation function, with the learning rate alpha set to 0.1 and the momentum set to 0.9 at first.

With the 2000 sample data points, I choose an n-way cross-validation method to tune the network; for instance, 1500 samples may be chosen as the training data and the remaining 500 samples as the testing data. The MLP is applied to predict the vote choice of an individual (Clinton or Bush). The output labels correspond as follows:

[1 0] = Clinton
[0 1] = Bush
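
In terms of the Appendix script, this plan amounts to setting its configuration variables as below. This is a sketch of the intended setup: the variable names come from the Appendix, and the 1500/500 split is the plan stated above.

% Configuring the Appendix MLP for the turnout task (sketch).
nbrOfNeuronsInEachHiddenLayer = [50 50]; % two hidden layers, 50 neurons each
nbrOfOutUnits = 2;                       % [1 0] = Clinton, [0 1] = Bush
unipolarBipolarSelector = 0;             % unipolar sigmoid activation
learningRate = 0.1;                      % alpha value from the Model section
enable_learningRate_momentum = 1;        % enable momentum backpropagation
momentum_alpha = 0.9;                    % momentum from the Model section

perm     = randperm(2000);               % split the 2000 samples at random
trainIdx = perm(1:1500);                 % 1500 training samples
testIdx  = perm(1501:2000);              % 500 testing samples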
RESULT

Hidden layers    η       MSE
[10 10]          0.15    0.020619
[10 10]          0.01    0
[10 10]          0.10    0
[10 10 10]       0.15    0.1856
[5 5]            0.15    0
The results above show that the performance of the ANN, measured by mean square error (MSE), varies with the network configuration. Generally, the simpler structures did a better job, because a more complex ANN structure may suffer from an over-fitting problem. In statistics and machine learning, one of the most common tasks is to fit a "model" to a set of training data so as to be able to make reliable predictions on general untrained data.
In overfitting, a statistical model describes random error or
noise instead of the underlying relationship. Overfitting occurs
when a model is excessively complex, such as having too
many parameters relative to the number of observations. A
model that has been overfit has poor predictive performance, as
it overreacts to minor fluctuations in the training data.
DISCUSSION AND FUTURE WORK

Due to the simplicity of the data, it is hard for me to generalize a final conclusion about the prediction problem, because the feature types in the samples are limited: many samples are very similar yet yield different vote results. My project is almost finished, but it is far from perfect. The current data set is a little simple, and I would like to try a more challenging and complex data set and construct a more complex MLP in the future.
REFERENCES

[1] King, Gary, Michael Tomz, and Jason Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation," American Journal of Political Science, vol. 44, pp. 341–355.
[2] http://heraqi.blogspot.com.eg/2015/11/mlp-neural-network-with-backpropagation.html
[3] http://neuralnetworksanddeeplearning.com/chap3.html
[4] Professor Hu's ECE/CS 539 lecture slides.
[5] Michael Nielsen, "Neural Networks and Deep Learning," http://neuralnetworksanddeeplearning.com/about.html
Appendix
Matlab code

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Multilayer Perceptron (MLP) Neural Network Function using MATLAB:
%% An implementation for Multilayer Perceptron Feed Forward Fully
%% Connected Neural Network with a sigmoid activation function. The
%% training is done using the Backpropagation algorithm with options for
%% Resilient Gradient Descent, Momentum Backpropagation, and Learning
%% Rate Decrease. The training stops when the Mean Square Error (MSE)
%% reaches zero or a predefined maximum number of epochs is reached.
%%
%% Four example data for training and testing are included with the
%% project. They are generated by SharkTime Sharky Neural Network
%% (http://sharktime.com/us_SharkyNeuralNetwork.html)
%%
%% Copyright (C) 9-2015 Hesham M. Eraqi. All rights reserved.
%% [email protected]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Clear Variables, Close Current Figures, and Create Results Directory
clc; clear all; close all;
mkdir('Results//'); %Directory for Storing Results

%% Configurations/Parameters
dataFileName = 'sharky.spirals.points'; %sharky.linear.points - sharky.circle.points - sharky.wave.points - sharky.spirals.points
nbrOfNeuronsInEachHiddenLayer = [10]; %linear:[4] - circle:[10] - wave,spirals:[10 10]
nbrOfOutUnits = 2;
unipolarBipolarSelector = 0; %0 for Unipolar, -1 for Bipolar
learningRate = 0.15;
nbrOfEpochs_max = 5000;

enable_resilient_gradient_descent = 1; %1 for enable, 0 for disable
learningRate_plus = 1.2;
learningRate_negative = 0.5;
deltas_start = 0.9;
deltas_min = 10^-6;
deltas_max = 50;

enable_decrease_learningRate = 0; %1 for enable decreasing, 0 for disable
learningRate_decreaseValue = 0.0001;
min_learningRate = 0.05;

enable_learningRate_momentum = 0; %1 for enable, 0 for disable
momentum_alpha = 0.05;

draw_each_nbrOfEpochs = 100;

%% Read Data
importedData = importdata(dataFileName, '\t', 6);
Samples = importedData.data(:, 1:length(importedData.data(1,:))-1);
TargetClasses = importedData.data(:, length(importedData.data(1,:)));
TargetClasses = TargetClasses - min(TargetClasses);
ActualClasses = -1*ones(size(TargetClasses));

%% Calculate Number of Input and Output Nodes
nbrOfInputNodes = length(Samples(1,:)); %=Dimension of Any Input Sample
% nbrOfOutUnits = ceil(log2(length(unique(TargetClasses)))) + 1; %Ceil(Log2( Number of Classes ))
nbrOfLayers = 2 + length(nbrOfNeuronsInEachHiddenLayer);
nbrOfNodesPerLayer = [nbrOfInputNodes nbrOfNeuronsInEachHiddenLayer nbrOfOutUnits];

%% Adding the Bias as Nodes with a fixed Activation of 1
nbrOfNodesPerLayer(1:end-1) = nbrOfNodesPerLayer(1:end-1) + 1;
Samples = [ones(length(Samples(:,1)),1) Samples];

%% Calculate TargetOutputs %TODO needs to be general for any nbrOfOutUnits
TargetOutputs = zeros(length(TargetClasses), nbrOfOutUnits);
for i = 1:length(TargetClasses)
    if (TargetClasses(i) == 1)
        TargetOutputs(i,:) = [1 unipolarBipolarSelector];
    else
        TargetOutputs(i,:) = [unipolarBipolarSelector 1];
    end
end

%% Initialize Random Weights Matrices
Weights = cell(1, nbrOfLayers); %Weights connecting bias nodes with previous layer are useless, but to make code simpler and faster
Delta_Weights = cell(1, nbrOfLayers);
ResilientDeltas = Delta_Weights; %Needed in case that Resilient Gradient Descent is used
for i = 1:length(Weights)-1
    Weights{i} = 2*rand(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1))-1; %RowIndex: From Node Number, ColumnIndex: To Node Number
    Weights{i}(:,1) = 0; %Bias nodes weights with previous layer (Redundant step)
    Delta_Weights{i} = zeros(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
    ResilientDeltas{i} = deltas_start*ones(nbrOfNodesPerLayer(i), nbrOfNodesPerLayer(i+1));
end
Weights{end} = ones(nbrOfNodesPerLayer(end), 1); %Virtual Weights for Output Nodes
Old_Delta_Weights_for_Momentum = Delta_Weights;
Old_Delta_Weights_for_Resilient = Delta_Weights;

NodesActivations = cell(1, nbrOfLayers);
for i = 1:length(NodesActivations)
    NodesActivations{i} = zeros(1, nbrOfNodesPerLayer(i));
end
NodesBackPropagatedErrors = NodesActivations; %Needed for Backpropagation Training Backward Pass

zeroRMSReached = 0;
nbrOfEpochs_done = 0;

%% Iterating all the Data
MSE = -1 * ones(1, nbrOfEpochs_max);
for Epoch = 1:nbrOfEpochs_max

    for Sample = 1:length(Samples(:,1))
        %% Backpropagation Training
        %Forward Pass
        NodesActivations{1} = Samples(Sample,:);
        for Layer = 2:nbrOfLayers
            NodesActivations{Layer} = NodesActivations{Layer-1}*Weights{Layer-1};
            NodesActivations{Layer} = Activation_func(NodesActivations{Layer}, unipolarBipolarSelector);
            if (Layer ~= nbrOfLayers) %Because bias nodes don't have weights connected to previous layer
                NodesActivations{Layer}(1) = 1;
            end
        end

        % Backward Pass Errors Storage
        % (As gradient of the bias nodes are zeros, they won't contribute to previous layer errors nor delta_weights)
        NodesBackPropagatedErrors{nbrOfLayers} = TargetOutputs(Sample,:) - NodesActivations{nbrOfLayers};
        for Layer = nbrOfLayers-1:-1:1
            gradient = Activation_func_drev(NodesActivations{Layer+1}, unipolarBipolarSelector);
            for node = 1:length(NodesBackPropagatedErrors{Layer}) %For all the Nodes in current Layer
                NodesBackPropagatedErrors{Layer}(node) = sum( NodesBackPropagatedErrors{Layer+1} .* gradient .* Weights{Layer}(node,:) );
            end
        end

        % Backward Pass Delta Weights Calculation (Before multiplying by learningRate)
        for Layer = nbrOfLayers:-1:2
            derivative = Activation_func_drev(NodesActivations{Layer}, unipolarBipolarSelector);
            Delta_Weights{Layer-1} = Delta_Weights{Layer-1} + NodesActivations{Layer-1}' * (NodesBackPropagatedErrors{Layer} .* derivative);
        end
    end

    %% Apply resilient gradient descent or/and momentum to the delta_weights
    if (enable_resilient_gradient_descent) %Handle Resilient Gradient Descent
        if (mod(Epoch,200)==0) %Reset Deltas
            for Layer = 1:nbrOfLayers
                ResilientDeltas{Layer} = learningRate*Delta_Weights{Layer};
            end
        end
        for Layer = 1:nbrOfLayers-1
            mult = Old_Delta_Weights_for_Resilient{Layer} .* Delta_Weights{Layer};
            ResilientDeltas{Layer}(mult > 0) = ResilientDeltas{Layer}(mult > 0) * learningRate_plus; %Sign didn't change
            ResilientDeltas{Layer}(mult < 0) = ResilientDeltas{Layer}(mult < 0) * learningRate_negative; %Sign changed
            ResilientDeltas{Layer} = max(deltas_min, ResilientDeltas{Layer});
            ResilientDeltas{Layer} = min(deltas_max, ResilientDeltas{Layer});

            Old_Delta_Weights_for_Resilient{Layer} = Delta_Weights{Layer};

            Delta_Weights{Layer} = sign(Delta_Weights{Layer}) .* ResilientDeltas{Layer};
        end
    end
    if (enable_learningRate_momentum) %Apply Momentum
        for Layer = 1:nbrOfLayers
            Delta_Weights{Layer} = learningRate*Delta_Weights{Layer} + momentum_alpha*Old_Delta_Weights_for_Momentum{Layer};
        end
        Old_Delta_Weights_for_Momentum = Delta_Weights;
    end
    if (~enable_learningRate_momentum && ~enable_resilient_gradient_descent)
        for Layer = 1:nbrOfLayers
            Delta_Weights{Layer} = learningRate * Delta_Weights{Layer};
        end
    end

    %% Backward Pass Weights Update
    for Layer = 1:nbrOfLayers-1
        Weights{Layer} = Weights{Layer} + Delta_Weights{Layer};
    end

    %Resetting Delta_Weights to Zeros
    for Layer = 1:length(Delta_Weights)
        Delta_Weights{Layer} = 0 * Delta_Weights{Layer};
    end

    %% Decrease Learning Rate
    if (enable_decrease_learningRate)
        new_learningRate = learningRate - learningRate_decreaseValue;
        learningRate = max(min_learningRate, new_learningRate);
    end

    %% Evaluation
    for Sample = 1:length(Samples(:,1))
        outputs = EvaluateNetwork(Samples(Sample,:), NodesActivations, Weights, unipolarBipolarSelector);

        bound = (1+unipolarBipolarSelector)/2;
        if (outputs(1) >= bound && outputs(2) < bound) %TODO: Not generic rule for any number of output nodes
            ActualClasses(Sample) = 1;
        elseif (outputs(1) < bound && outputs(2) >= bound)
            ActualClasses(Sample) = 0;
        else
            if (outputs(1) >= outputs(2))
                ActualClasses(Sample) = 1;
            else
                ActualClasses(Sample) = 0;
            end
        end
    end

    MSE(Epoch) = sum((ActualClasses-TargetClasses).^2)/(length(Samples(:,1)));
    if (MSE(Epoch) == 0)
        zeroRMSReached = 1;
    end

    %% Visualization
    if (zeroRMSReached || mod(Epoch,draw_each_nbrOfEpochs)==0)
        %Draw Mean Square Error
        subplot(2,1,2);
        MSE(MSE==-1) = [];
        plot([MSE(1:Epoch)]);
        ylim([-0.1 0.6]);
        title('Mean Square Error');
        xlabel('Epochs');
        ylabel('MSE');
        grid on;

        saveas(gcf, sprintf('Results//fig%i.png', Epoch), 'jpg');
        pause(0.05);
    end

    display([int2str(Epoch) ' Epochs done out of ' int2str(nbrOfEpochs_max) ' Epochs. MSE = ' num2str(MSE(Epoch)) ' Learning Rate = ' ...
        num2str(learningRate) '.']);

    nbrOfEpochs_done = Epoch;
    if (zeroRMSReached)
        saveas(gcf, sprintf('Results//Final Result for %s.png', dataFileName), 'jpg');
        break;
    end
end
display(['Mean Square Error = ' num2str(MSE(nbrOfEpochs_done)) '.']);
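
The script above calls three helper functions (Activation_func, Activation_func_drev, and EvaluateNetwork) that are not reproduced in this Appendix. The versions below are a sketch reconstructed from how the script uses them, following the unipolar/bipolar selector convention; they are assumptions, not the original files.

function fx = Activation_func(x, unipolarBipolarSelector)
    %Sigmoid activation: unipolar (selector 0) maps to [0,1], bipolar (-1) to [-1,1].
    if (unipolarBipolarSelector == 0)
        fx = 1 ./ (1 + exp(-x));
    else
        fx = -1 + 2 ./ (1 + exp(-x));
    end
end

function fx_drev = Activation_func_drev(fx, unipolarBipolarSelector)
    %Derivative of the sigmoid, written in terms of its output fx.
    if (unipolarBipolarSelector == 0)
        fx_drev = fx .* (1 - fx);
    else
        fx_drev = 0.5 .* (1 + fx) .* (1 - fx);
    end
end

function outputs = EvaluateNetwork(Sample, NodesActivations, Weights, unipolarBipolarSelector)
    %Forward pass through the trained network for one bias-augmented sample.
    nbrOfLayers = length(NodesActivations);
    NodesActivations{1} = Sample;
    for Layer = 2:nbrOfLayers
        NodesActivations{Layer} = NodesActivations{Layer-1} * Weights{Layer-1};
        NodesActivations{Layer} = Activation_func(NodesActivations{Layer}, unipolarBipolarSelector);
        if (Layer ~= nbrOfLayers)
            NodesActivations{Layer}(1) = 1; %keep the bias node clamped to 1
        end
    end
    outputs = NodesActivations{end};
end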