Advanced Information Retrieval
Chapter 02: Modeling -
Neural Network Model
Neural Network Model
A neural network is an oversimplified representation of the neuron interconnections in the human brain:
– nodes are processing units
– edges are synaptic connections
– the strength of a propagating signal is modelled by a weight assigned to each edge
– the state of a node is defined by its activation level
– depending on its activation level, a node might issue an output signal
Neural Networks
• Neural Networks
– Complex learning systems recognized in animal brains
– Single neuron has simple structure
– Interconnected sets of neurons perform complex learning tasks
– Human brain has 10^15 synaptic connections
– Artificial Neural Networks attempt to replicate non-linear learning found in nature
[Figure: biological neuron with dendrites, cell body, and axon]
Neural Networks (cont’d)
– Dendrites gather inputs from other neurons and combine information
– Then generate non-linear response when threshold reached
– Signal sent to other neurons via axon
– Artificial neuron model is similar
– Data inputs (xi) are collected from upstream neurons and input to combination function (sigma)
[Figure: artificial neuron with inputs x1, x2, ..., xn and output y]
Neural Networks (cont’d)
– Activation function reads combined input and produces non-linear response (y)
– Response channeled downstream to other neurons
• What problems are applicable to Neural Networks?
– Quite robust with respect to noisy data
– Can learn and work around erroneous data
– Results opaque to human interpretation
– Often require long training times
Input and Output Encoding
– Neural Networks require attribute values encoded to [0, 1]
• Numeric
– Apply min-max normalization to continuous variables
– Works well when min and max known
– Also assumes new data values occur within min-max range
– Values outside range may be rejected or mapped to min or max
X* = (X − min(X)) / (max(X) − min(X)) = (X − min(X)) / range(X)
Input and Output Encoding (cont’d)
• Output
– Neural Networks always return continuous values in [0, 1]
– Many classification problems have two outcomes
– Solution uses threshold established a priori in single output node to separate classes
– For example, target variable is “leave” or “stay”
– Threshold value is “leave if output >= 0.67”
– Single output node value = 0.72 classifies record as “leave”
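The a priori threshold rule from the example reduces to a one-line decision function (class labels and threshold taken from the slide):

```python
def classify(output, threshold=0.67):
    """Map a continuous output node value in [0, 1] to a class label."""
    return "leave" if output >= threshold else "stay"

print(classify(0.72))  # "leave"
print(classify(0.40))  # "stay"
```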
Simple Example of a Neural Network
– Neural Network consists of layered, feedforward, completely connected network of nodes
– Feedforward restricts network flow to single direction
– Flow does not loop or cycle
– Network composed of two or more layers
[Figure: input nodes 1, 2, 3; hidden nodes A, B; output node Z; connection weights W1A, W1B, W2A, W2B, W3A, W3B, WAZ, WBZ and bias weights W0A, W0B, W0Z]
Input Layer    Hidden Layer    Output Layer
Simple Example of a Neural Network (cont’d)
– Most networks have Input, Hidden, Output layers
– Network may contain more than one hidden layer
– Network is completely connected
– Each node in given layer connected to every node in next layer
– Every connection has weight (Wij) associated with it
– Weight values randomly assigned 0 to 1 by algorithm
– Number of input nodes dependent on number of predictors
– Number of hidden and output nodes configurable
Simple Example of a Neural Network (cont’d)
– Combination function produces linear combination of node inputs and connection weights as single scalar value
– For node j, xij is the ith input
– Wij is weight associated with ith input node
– I + 1 inputs to node j
– x1, x2, ..., xI are inputs from upstream nodes
– x0 is constant input value = 1.0
– Each node has extra input W0j·x0j = W0j
net_j = Σ_i Wij xij = W0j x0j + W1j x1j + ... + WIj xIj
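The combination function is a dot product with a constant bias input; a minimal sketch (the weight values below are the Node A weights from the worked example later in the deck):

```python
def net_input(weights, inputs):
    """Linear combination net_j = sum_i W_ij * x_ij.

    weights[0] is the bias weight W_0j; a constant input x_0 = 1.0
    is prepended so the bias is handled like any other term.
    """
    inputs = [1.0] + list(inputs)  # x_0 = 1.0 (constant input)
    return sum(w * x for w, x in zip(weights, inputs))

# Weights W0A, W1A, W2A, W3A and record x1=0.4, x2=0.2, x3=0.7
print(net_input([0.5, 0.6, 0.8, 0.6], [0.4, 0.2, 0.7]))  # 1.32 (up to float rounding)
```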
Simple Example of a Neural Network (cont’d)
– The scalar value computed for hidden layer Node A equals
– For Node A, netA = 1.32 is input to activation function
– Neurons “fire” in biological organisms
– Signals sent between neurons when combination of inputs cross threshold
x0 = 1.0 W0A = 0.5 W0B = 0.7 W0Z = 0.5
x1 = 0.4 W1A = 0.6 W1B = 0.9 WAZ = 0.9
x2 = 0.2 W2A = 0.8 W2B = 0.8 WBZ = 0.9
x3 = 0.7 W3A = 0.6 W3B = 0.4
net_A = Σ_i WiA xiA = W0A(1.0) + W1A x1 + W2A x2 + W3A x3
      = 0.5 + 0.6(0.4) + 0.8(0.2) + 0.6(0.7) = 1.32
Simple Example of a Neural Network (cont’d)
– Firing response not necessarily linearly related to increase in input stimulation
– Neural Networks model behavior using non-linear activation function
– Sigmoid function most commonly used
– In Node A, sigmoid function takes netA = 1.32 as input and produces output
y = 1 / (1 + e^(−x))

y = 1 / (1 + e^(−1.32)) = 0.7892
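The sigmoid step can be checked directly; applying it to net_A = 1.32 reproduces the 0.7892 output from the slide:

```python
import math

def sigmoid(x):
    """Sigmoid activation: squash any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(1.32), 4))  # 0.7892
```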
Simple Example of a Neural Network (cont’d)
– Node A outputs 0.7892 along connection to Node Z, which becomes component of netZ
– Before netZ is computed, contribution from Node B required
– Node Z combines outputs from Node A and Node B through netZ
net_B = Σ_i WiB xiB = W0B(1.0) + W1B x1 + W2B x2 + W3B x3
      = 0.7 + 0.9(0.4) + 0.8(0.2) + 0.4(0.7) = 1.5

and f(net_B) = 1 / (1 + e^(−1.5)) = 0.8176
Simple Example of a Neural Network (cont’d)
– Inputs to Node Z are not data attribute values
– Rather, they are outputs from sigmoid function in upstream nodes
– Value 0.8750 output from Neural Network on first pass
– Represents predicted value for target variable, given first observation
net_Z = Σ_i WiZ xiZ = W0Z(1.0) + WAZ(0.7892) + WBZ(0.8176)
      = 0.5 + 0.9(0.7892) + 0.9(0.8176) = 1.9461

and finally, f(net_Z) = 1 / (1 + e^(−1.9461)) = 0.8750
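The whole forward pass of the worked example can be replayed in a few lines; the weights and the input record are exactly those given in the table above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_output(weights, inputs):
    # weights[0] is the bias weight W_0j; constant input x_0 = 1.0
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(net)

x = [0.4, 0.2, 0.7]                                 # one input record
out_a = node_output([0.5, 0.6, 0.8, 0.6], x)        # Node A: W0A, W1A, W2A, W3A
out_b = node_output([0.7, 0.9, 0.8, 0.4], x)        # Node B: W0B, W1B, W2B, W3B
out_z = node_output([0.5, 0.9, 0.9], [out_a, out_b])  # Node Z: W0Z, WAZ, WBZ
print(round(out_a, 4), round(out_b, 4), round(out_z, 4))  # 0.7892 0.8176 0.875
```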
Sigmoid Activation Function
– Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior depending on input value
– Function nearly linear for domain values -1 < x < 1
– Becomes curvilinear as values move away from center
– At extreme values, f(x) is nearly constant
– Moderate increments in x produce variable increase in f(x), depending on location of x
– Sometimes called “Squashing Function”
– Takes real-valued input and returns values in [0, 1]
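The "squashing" behavior is easy to observe numerically: near x = 0 the response moves noticeably with the input, while at the extremes it is nearly constant.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Near the center the response is almost linear; far out, almost flat
for x in (-10, -1, 0, 1, 10):
    print(x, round(sigmoid(x), 4))
```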
Back-Propagation
– Neural Networks are supervised learning method
– Require target variable
– Each observation passed through network results in output value
– Output value compared to actual value of target variable
– (Actual – Output) = Error
– Prediction error analogous to residuals in regression models
– Most networks use Sum of Squared Errors (SSE) to measure how well predictions fit target values

SSE = Σ_records Σ_output nodes (actual − output)²
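The SSE above sums squared errors over every output node of every record; a minimal sketch, representing each (actual, output) pair explicitly:

```python
def sse(records):
    """Sum of squared errors over all records and output nodes.

    records: list of (actual, output) pairs, one pair per output node
    per record (a single-output network has one pair per record).
    """
    return sum((actual - output) ** 2 for actual, output in records)

# e.g. first record: actual class "leave" coded as 1.0, network output 0.8750
print(sse([(1.0, 0.8750), (0.0, 0.2), (1.0, 0.9)]))  # approximately 0.065625
```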
Back-Propagation (cont’d)
– Squared prediction errors summed over all output nodes, and all records in data set
– Model weights constructed that minimize SSE
– Actual values that minimize SSE are unknown
– Weights estimated, given the data set
Neural Network for IR: From the work by Wilkinson & Hingston, SIGIR’91
[Figure: three-layer network with query term nodes (ka, kb, kc), document term nodes (k1, ..., ka, kb, kc, ..., kt), and document nodes (d1, ..., dj, dj+1, ..., dN)]
Neural Network for IR
– Three-layer network
– Signals propagate across the network
– First level of propagation:
– Query terms issue the first signals
– These signals propagate across the network to reach the document nodes
– Second level of propagation:
– Document nodes might themselves generate new signals which affect the document term nodes
– Document term nodes might respond with new signals of their own
Quantifying Signal Propagation
– Normalize signal strength (MAX = 1)
– Query terms emit initial signal equal to 1
– Weight associated with an edge from a query term node ki to a document term node ki:

Wiq = wiq / sqrt( Σ_i wiq² )

– Weight associated with an edge from a document term node ki to a document node dj:

Wij = wij / sqrt( Σ_i wij² )
Quantifying Signal Propagation (cont’d)
– After the first level of signal propagation, the activation level of a document node dj is given by:

Σ_i Wiq Wij = ( Σ_i wiq wij ) / ( sqrt(Σ_i wiq²) · sqrt(Σ_i wij²) )

which is exactly the ranking of the Vector model
– New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle
– A minimum threshold should be enforced to avoid spurious signal generation
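The first propagation level can be sketched directly: normalizing the edge weights and summing products over shared terms yields the vector-model cosine ranking. The term weights below are hypothetical, for illustration only:

```python
import math

def normalize(weights):
    """Divide each term weight by the Euclidean norm of the weight vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

def activation(query_weights, doc_weights):
    """First-level activation of a document node: sum over shared terms of
    the normalized query-edge and document-edge weights, i.e. the cosine
    ranking of the vector model."""
    q = normalize(query_weights)
    d = normalize(doc_weights)
    return sum(q[t] * d[t] for t in q if t in d)

# Hypothetical term weights (e.g. tf-idf) for a query and one document
query = {"ka": 1.0, "kb": 1.0}
doc = {"ka": 2.0, "kb": 1.0, "kc": 1.0}
print(round(activation(query, doc), 4))  # cosine of the two weight vectors
```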
Conclusions
– Model provides an interesting formulation of the IR problem
– Model has not been tested extensively
– It is not clear what improvements the model might provide