Advanced Information Retrieval
Chapter 02: Modeling -
Neural Network Model
Neural Network Model
A neural network is an oversimplified representation of the neuron interconnections in the human brain:
– nodes are processing units
– edges are synaptic connections
– the strength of a propagating signal is modelled by a weight assigned to each edge
– the state of a node is defined by its activation level
– depending on its activation level, a node might issue an output signal
Neural Networks
• Neural Networks
– Complex learning systems recognized in animal brains
– Single neuron has simple structure
– Interconnected sets of neurons perform complex learning tasks
– Human brain has 10^15 synaptic connections
– Artificial Neural Networks attempt to replicate non-linear learning found in nature
[Figure: biological neuron with dendrites, cell body, and axon]
Neural Networks (cont’d)
– Dendrites gather inputs from other neurons and combine information
– Then generate non-linear response when threshold reached
– Signal sent to other neurons via axon
– Artificial neuron model is similar
– Data inputs (xi) are collected from upstream neurons and input to combination function (sigma)
[Figure: artificial neuron with inputs x1, x2, ..., xn and output y]
Neural Networks (cont’d)
– Activation function reads combined input and produces non-linear response (y)
– Response channeled downstream to other neurons
• What problems are applicable to Neural Networks?
– Quite robust with respect to noisy data
– Can learn and work around erroneous data
– Results opaque to human interpretation
– Often require long training times
Input and Output Encoding
– Neural Networks require attribute values encoded to [0, 1]
• Numeric
– Apply min-max normalization to continuous variables
– Works well when min and max known
– Also assumes new data values occur within min-max range
– Values outside range may be rejected or mapped to min or max
X* = (X − min(X)) / (max(X) − min(X)) = (X − min(X)) / range(X)
Input and Output Encoding (cont’d)
• Output
– Neural Networks always return continuous values in [0, 1]
– Many classification problems have two outcomes
– Solution uses threshold established a priori in single output node to separate classes
– For example, target variable is “leave” or “stay”
– Threshold value is “leave if output >= 0.67”
– Single output node value = 0.72 classifies record as “leave”
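The a priori threshold rule from the example reduces to a one-line decision function (class labels and threshold taken from the slide):

```python
def classify(output, threshold=0.67):
    """Map a continuous output node value in [0, 1] to a class label."""
    return "leave" if output >= threshold else "stay"

print(classify(0.72))  # "leave"
print(classify(0.40))  # "stay"
```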
Simple Example of a Neural Network
– Neural Network consists of layered, feedforward, completely connected network of nodes
– Feedforward restricts network flow to single direction
– Flow does not loop or cycle
– Network composed of two or more layers
[Figure: input nodes 1, 2, 3; hidden nodes A, B; output node Z; connection weights W1A, W1B, W2A, W2B, W3A, W3B, WAZ, WBZ and bias weights W0A, W0B, W0Z]
Input Layer    Hidden Layer    Output Layer
Simple Example of a Neural Network (cont’d)
– Most networks have Input, Hidden, Output layers
– Network may contain more than one hidden layer
– Network is completely connected
– Each node in given layer connected to every node in next layer
– Every connection has weight (Wij) associated with it
– Weight values randomly assigned 0 to 1 by algorithm
– Number of input nodes dependent on number of predictors
– Number of hidden and output nodes configurable
Simple Example of a Neural Network (cont’d)
– Combination function produces linear combination of node inputs and connection weights as single scalar value
– For node j, xij is the ith input
– Wij is weight associated with ith input node
– I + 1 inputs to node j
– x1, x2, ..., xI are inputs from upstream nodes
– x0 is constant input value = 1.0
– Each node has extra input W0j·x0j = W0j
net_j = Σ_i Wij xij = W0j x0j + W1j x1j + ... + WIj xIj
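The combination function is a dot product with a constant bias input; a minimal sketch (the weight values below are the Node A weights from the worked example later in the deck):

```python
def net_input(weights, inputs):
    """Linear combination net_j = sum_i W_ij * x_ij.

    weights[0] is the bias weight W_0j; a constant input x_0 = 1.0
    is prepended so the bias is handled like any other term.
    """
    inputs = [1.0] + list(inputs)  # x_0 = 1.0 (constant input)
    return sum(w * x for w, x in zip(weights, inputs))

# Weights W0A, W1A, W2A, W3A and record x1=0.4, x2=0.2, x3=0.7
print(net_input([0.5, 0.6, 0.8, 0.6], [0.4, 0.2, 0.7]))  # 1.32 (up to float rounding)
```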
Simple Example of a Neural Network (cont’d)
– The scalar value computed for hidden layer Node A equals
– For Node A, netA = 1.32 is input to activation function
– Neurons “fire” in biological organisms
– Signals sent between neurons when combination of inputs cross threshold
x0 = 1.0 W0A = 0.5 W0B = 0.7 W0Z = 0.5
x1 = 0.4 W1A = 0.6 W1B = 0.9 WAZ = 0.9
x2 = 0.2 W2A = 0.8 W2B = 0.8 WBZ = 0.9
x3 = 0.7 W3A = 0.6 W3B = 0.4
net_A = Σ_i WiA xiA = W0A(1.0) + W1A x1 + W2A x2 + W3A x3
      = 0.5 + 0.6(0.4) + 0.8(0.2) + 0.6(0.7) = 1.32
Simple Example of a Neural Network (cont’d)
– Firing response not necessarily linearly related to increase in input stimulation
– Neural Networks model behavior using non-linear activation function
– Sigmoid function most commonly used
– In Node A, sigmoid function takes netA = 1.32 as input and produces output
y = 1 / (1 + e^(−x))

y = 1 / (1 + e^(−1.32)) = 0.7892
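The sigmoid step can be checked directly; applying it to net_A = 1.32 reproduces the 0.7892 output from the slide:

```python
import math

def sigmoid(x):
    """Sigmoid activation: squash any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(1.32), 4))  # 0.7892
```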
Simple Example of a Neural Network (cont’d)
– Node A outputs 0.7892 along connection to Node Z, which becomes component of netZ
– Before netZ is computed, contribution from Node B required
– Node Z combines outputs from Node A and Node B through netZ
net_B = Σ_i WiB xiB = W0B(1.0) + W1B x1 + W2B x2 + W3B x3
      = 0.7 + 0.9(0.4) + 0.8(0.2) + 0.4(0.7) = 1.5

and f(net_B) = 1 / (1 + e^(−1.5)) = 0.8176
Simple Example of a Neural Network (cont’d)
– Inputs to Node Z are not data attribute values
– Rather, they are outputs from sigmoid function in upstream nodes
– Value 0.8750 output from Neural Network on first pass
– Represents predicted value for target variable, given first observation
net_Z = Σ_i WiZ xiZ = W0Z(1.0) + WAZ(0.7892) + WBZ(0.8176)
      = 0.5 + 0.9(0.7892) + 0.9(0.8176) = 1.9461

and finally, f(net_Z) = 1 / (1 + e^(−1.9461)) = 0.8750
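The whole forward pass of the worked example can be replayed in a few lines; the weights and the input record are exactly those given in the table above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_output(weights, inputs):
    # weights[0] is the bias weight W_0j; constant input x_0 = 1.0
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(net)

x = [0.4, 0.2, 0.7]                                 # one input record
out_a = node_output([0.5, 0.6, 0.8, 0.6], x)        # Node A: W0A, W1A, W2A, W3A
out_b = node_output([0.7, 0.9, 0.8, 0.4], x)        # Node B: W0B, W1B, W2B, W3B
out_z = node_output([0.5, 0.9, 0.9], [out_a, out_b])  # Node Z: W0Z, WAZ, WBZ
print(round(out_a, 4), round(out_b, 4), round(out_z, 4))  # 0.7892 0.8176 0.875
```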
Sigmoid Activation Function
– Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior depending on input value
– Function nearly linear for domain values -1 < x < 1
– Becomes curvilinear as values move away from center
– At extreme values, f(x) is nearly constant
– Moderate increments in x produce variable increase in f(x), depending on location of x
– Sometimes called “Squashing Function”
– Takes real-valued input and returns values in [0, 1]
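The "squashing" behavior is easy to observe numerically: near x = 0 the response moves noticeably with the input, while at the extremes it is nearly constant.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Near the center the response is almost linear; far out, almost flat
for x in (-10, -1, 0, 1, 10):
    print(x, round(sigmoid(x), 4))
```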
Back-Propagation
– Neural Networks are supervised learning method
– Require target variable
– Each observation passed through network results in output value
– Output value compared to actual value of target variable
– (Actual – Output) = Error
– Prediction error analogous to residuals in regression models
– Most networks use Sum of Squared Errors (SSE) to measure how well predictions fit target values

SSE = Σ_records Σ_output nodes (actual − output)²
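The SSE above sums squared errors over every output node of every record; a minimal sketch, representing each (actual, output) pair explicitly:

```python
def sse(records):
    """Sum of squared errors over all records and output nodes.

    records: list of (actual, output) pairs, one pair per output node
    per record (a single-output network has one pair per record).
    """
    return sum((actual - output) ** 2 for actual, output in records)

# e.g. first record: actual class "leave" coded as 1.0, network output 0.8750
print(sse([(1.0, 0.8750), (0.0, 0.2), (1.0, 0.9)]))  # approximately 0.065625
```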
Back-Propagation (cont’d)
– Squared prediction errors summed over all output nodes, and all records in data set
– Model weights constructed that minimize SSE
– Actual values that minimize SSE are unknown
– Weights estimated, given the data set
Neural Network for IR: From the work by Wilkinson & Hingston, SIGIR’91
[Figure: three-layer network with query term nodes (ka, kb, kc), document term nodes (k1, ..., ka, kb, kc, ..., kt), and document nodes (d1, ..., dj, dj+1, ..., dN)]
Neural Network for IR
– Three-layer network
– Signals propagate across the network
– First level of propagation:
– Query terms issue the first signals
– These signals propagate across the network to reach the document nodes
– Second level of propagation:
– Document nodes might themselves generate new signals which affect the document term nodes
– Document term nodes might respond with new signals of their own
Quantifying Signal Propagation
– Normalize signal strength (MAX = 1)
– Query terms emit initial signal equal to 1
– Weight associated with an edge from a query term node ki to a document term node ki:

Wiq = wiq / sqrt( Σ_i wiq² )

– Weight associated with an edge from a document term node ki to a document node dj:

Wij = wij / sqrt( Σ_i wij² )
Quantifying Signal Propagation (cont’d)
– After the first level of signal propagation, the activation level of a document node dj is given by:

Σ_i Wiq Wij = ( Σ_i wiq wij ) / ( sqrt(Σ_i wiq²) · sqrt(Σ_i wij²) )

which is exactly the ranking of the Vector model
– New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle
– A minimum threshold should be enforced to avoid spurious signal generation
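The first propagation level can be sketched directly: normalizing the edge weights and summing products over shared terms yields the vector-model cosine ranking. The term weights below are hypothetical, for illustration only:

```python
import math

def normalize(weights):
    """Divide each term weight by the Euclidean norm of the weight vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

def activation(query_weights, doc_weights):
    """First-level activation of a document node: sum over shared terms of
    the normalized query-edge and document-edge weights, i.e. the cosine
    ranking of the vector model."""
    q = normalize(query_weights)
    d = normalize(doc_weights)
    return sum(q[t] * d[t] for t in q if t in d)

# Hypothetical term weights (e.g. tf-idf) for a query and one document
query = {"ka": 1.0, "kb": 1.0}
doc = {"ka": 2.0, "kb": 1.0, "kc": 1.0}
print(round(activation(query, doc), 4))  # cosine of the two weight vectors
```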
Conclusions
– Model provides an interesting formulation of the IR problem
– Model has not been tested extensively
– It is not clear what improvements the model might provide