Neural Networks 101

Artificial Neural Networks and their Applications in Industry.

Page 1: Neural Networks 101

Hello, I’m Adam Steinberger, an Edison Software Engineer in the class of 2012, working in Barrington, IL. Currently, I’m working with Team Tesla on a product called Zero Footprint. This product is a Universal Viewer that provides radiologists with an OS-independent, browser-independent online application for viewing their patients’ scans from anywhere in the world. I’m also taking a course with the rest of the 2012 Software Edisons, and for that course I’ll be talking about Artificial Neural Networks today. During my senior year at Skidmore College, I devoted most of my studies to artificial neural networks, which I used to develop artificial intelligence software for my senior thesis that plays the ancient Chinese game of Go. Now I’d like to share with you what I’ve learned from my studies.

Page 2: Neural Networks 101

In short, we’ll start by discussing what neural networks are, why they’re used in software, and their applications in industry. Then I’ll introduce the structure of biological neurons and follow up with an illustrative explanation of perceptrons and their mathematical model. Next, we’ll combine perceptrons in layers to form multilayer neural networks, and together we’ll walk through the activation process that triggers signals to travel from layer to layer, along with the Q-Learning algorithm I used in my thesis. Finally, we’ll go over how neural networks are trained on pre-existing input data using the back propagation learning algorithm. Afterwards, I’d like to present my citations and open the floor for questions.

Page 3: Neural Networks 101

Neural networks are electrical networks or circuits formed by the interconnection of information-processing nodes called neurons. These neurons can be either biological or artificial, but either way, their main purpose is to process information. They do this by taking a weighted sum of their input signals and comparing that sum to an activation level to determine whether an output signal should be produced. Over time, neural networks learn how to solve problems through the repeated updating of their connection weights and their ever-changing interconnectivity.

Page 4: Neural Networks 101

There are many reasons why neural networks are useful in the real world. A major reason is that they’re easily conceptualized, simple, and modeled on biological structures, which makes them easier for computer scientists and software engineers to implement than other forms of artificial intelligence. Because of their biological basis, neural networks have also been studied and researched by academia for decades, and they’re already an industry standard in a wide variety of well-established fields. Most importantly, there are hundreds of free neural network implementations and libraries available online, written in more programming languages than I can count.

However, there are some drawbacks to these artificial brains, if you will. For one, training a neural network takes a very long time. Even training a small network can take months of constant computing on today’s computers, and there’s no guarantee that the trained network will produce correct results on real-world problems. Also, the training process for neural networks is still somewhat of an art. It’s hard for scientists and engineers to estimate whether a network has been trained too much, which is called overtraining, or trained too little, which is called undertraining.

To help you understand this issue, let’s use the analogy of a student named Alex studying for an exam in Computer Science 101 next Monday. If the professor provides back exams with solutions and Alex decides to memorize the question-answer pairs from each of those exams, Alex will have a difficult time answering questions on Monday’s exam that aren’t exact replicas of the memorized ones. On the other hand, if Alex doesn’t spend enough time studying, Alex won’t have learned enough from the back exams’ question-answer pairs to answer the questions on Monday’s exam either. Besides problems with training

Page 5: Neural Networks 101

neural networks, it’s sometimes inefficient to solve problems with these networks because they are black-box solutions: the process for solving the problem is never revealed to the user or concretely represented within the network. These networks also produce no confidence measurements, so the user has no idea whether the answers produced by the network are correct, and neither does the network. And because a neural network isn’t tailored to solving any specific problem, it will never solve problems in the most efficient way possible.

Page 6: Neural Networks 101

Some examples of industries that use artificial neural networks include the textile, materials and geochemical, steel, coal, food, electric power, automotive, and robotics industries. A few examples of artificial intelligence you may have heard about in the news include IBM’s Deep Blue, the computer that beat world chess champion Garry Kasparov in 1997, and IBM’s Watson, the computer that beat champions Brad Rutter and Ken Jennings at the game show Jeopardy! in 2011.

Page 7: Neural Networks 101

A biological neuron is a cell in the nervous system, which includes the brain, of animals. Neurons process information based on electrical signals that pass through them. They’re made up of dendrites, a nucleus, axons, synapses, and neurotransmitters. The dendrites and axons carry electrical signals across the neuron, while the nucleus sums the incoming signals and compares that sum to an activation level. Neurotransmitters released at the synapses carry signals from one neuron’s axon to the next neuron’s dendrites.

Page 8: Neural Networks 101

A perceptron is a model of a biological neuron used by computer scientists and software engineers to create neural networks. The perceptron has a set of inputs, each with an associated weight, that send signals into the main unit. The unit takes a weighted sum of the input signals and compares it to a threshold value. If the weighted sum is greater than or equal to the threshold, the unit sends out a 1 signal; otherwise it sends out a 0 signal.

Page 9: Neural Networks 101

Here we have the mathematical model behind the perceptron. x0 through xm are input signals to the perceptron, each weighted by w0 through wm. The summing junction computes the weighted sum vk = w0·x0 + w1·x1 + … + wm·xm, which is then passed to the activation function φ (phi). If the perceptron activates, a 1 is sent to the output yk; otherwise a 0 is sent to yk.
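To make the model concrete, here is a minimal sketch of that computation in Python. The function name, example weights, and threshold are illustrative assumptions on my part, not values from the slides:

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    v = sum(w * x for w, x in zip(weights, inputs))  # the weighted sum vk
    return 1 if v >= threshold else 0                # step activation phi

# Illustrative example: weights and threshold chosen so the unit acts as a logical AND.
print(perceptron([1, 1], [1.0, 1.0], 1.5))  # -> 1 (sum 2.0 >= 1.5)
print(perceptron([1, 0], [1.0, 1.0], 1.5))  # -> 0 (sum 1.0 <  1.5)
```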

Page 10: Neural Networks 101

When we put perceptrons together in multiple layers, we unlock new and powerful learning abilities that make neural networks so useful in today’s industries. The benefit of connecting perceptrons in multilayer networks is that the network can approximate a much larger class of functions and solve much more complex problems than a single neuron can. By increasing or decreasing the number of perceptrons in the network, we can also combat the potential for overtraining or undertraining: reducing the number of hidden nodes in the middle layer reduces each node’s influence on the network. This is why many neural networks in industry have large sets of input and output layer perceptrons and very few hidden layer perceptrons.
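As a rough sketch of that layered structure (my own illustration with made-up weights, not code from the talk), here is a forward pass through a tiny two-layer network of the step-style perceptrons described earlier:

```python
def step_unit(inputs, weights, threshold):
    """One perceptron: 1 if the weighted input sum reaches the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def layer_forward(inputs, weight_rows, threshold):
    """Feed one layer: each row of weights defines one perceptron in that layer."""
    return [step_unit(inputs, row, threshold) for row in weight_rows]

# Illustrative 2-input, 2-hidden, 1-output network with made-up weights.
hidden = layer_forward([1, 0], [[1.0, 1.0], [-1.0, -1.0]], 0.5)  # hidden layer signals
output = layer_forward(hidden, [[1.0, 1.0]], 0.5)                # output layer signal
print(hidden, output)  # -> [1, 0] [1]
```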

Page 11: Neural Networks 101

Although there are many ways to drive learning in neural networks, the approach I used for my senior thesis is called Q-Learning, a reinforcement learning algorithm. In Q-Learning, an action-value function produces an expected reward for taking a specific action from a specific state, without having to compare this information with a model of its environment. At each step of the learning process, the current state s is recorded, an action is taken using a policy derived from Q, and a reward and new state s’ are observed. The Q value for that state and action is then updated using a fraction of the difference between the learned value for the new state and the value for the old state. Then the current state s becomes the new state s’. This process repeats until the state s is terminal, or in the case of my software, until the game ends.
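Sketched in Python under my own assumptions (tabular Q values, an illustrative learning rate and discount factor; the slides give the update only in words):

```python
from collections import defaultdict

Q = defaultdict(float)   # maps (state, action) pairs to expected reward
alpha, gamma = 0.1, 0.9  # learning-rate fraction and discount factor (assumed values)

def q_update(s, a, r, s_next, actions):
    """One Q-Learning step: nudge Q(s, a) toward the reward plus the
    discounted best value learned for the new state s_next."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Example step: from state 0, action 'right' earned reward 1.0 and led to state 1.
q_update(0, 'right', 1.0, 1, ['left', 'right'])
print(Q[(0, 'right')])  # -> 0.1
```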

Page 12: Neural Networks 101

Neural networks learn from pre-existing sets of training data using the back propagation algorithm. Training inputs are propagated forward through the network until output activations are generated. Training outputs are then propagated backwards through the network to generate activation deltas for the output and hidden layer neurons. Then, for each connection weight, we multiply the output delta by the input activation to calculate a gradient for that weight. The weights are then updated in the direction opposite the gradient, by subtracting a fraction of the gradient from the old weight.

Page 13: Neural Networks 101

Back propagation is an iterative process that continually updates the weights of the links between nodes in a neural network until the difference between actual outputs and training outputs falls within a certain range ε (epsilon). We start with random weights for all links in the network. For each node in the input layer, we set the training input for that node as its activation. Then we forward propagate through the network, calculating the weighted sum of the inputs to each node and applying the activation function to that sum. Next, for each node in the output layer, we calculate the difference between the actual output and the training output to get the output deltas. Afterwards, we back propagate all the way to the input layer, calculating the weighted sum of the deltas along each link in the network. Once back propagation is finished, we update the weight of each link using a fraction of the delta calculated for that link. As this process iterates, the neural network learns to process the training inputs in a way that produces outputs very close to those provided in the training data. Statisticians may help in determining the stopping criterion for back propagation, since a poor stopping criterion can result in under- or overtraining of the neural network.
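Here is a minimal sketch of that loop for a tiny one-hidden-layer network, assuming sigmoid activations (the step function described earlier isn’t differentiable, so a smooth activation is the usual substitute); the network shape, learning rate, and training data are all my own illustrative choices:

```python
import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

random.seed(1)
# 2 inputs (+1 constant bias input) -> 2 hidden nodes (+1 bias) -> 1 output node.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_output = [random.uniform(-1, 1) for _ in range(3)]
rate = 0.5  # the "fraction" of each gradient subtracted from the weight

def forward(x):
    xb = x + [1.0]  # constant bias input plays the role of the threshold
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    out = sigmoid(sum(w * h for w, h in zip(w_output, hb)))
    return xb, hb, out

def backprop_step(x, target):
    xb, hb, out = forward(x)
    # Output delta: (actual output - training output) times the sigmoid slope.
    d_out = (out - target) * out * (1 - out)
    # Hidden deltas: the output delta carried back along each link, times the slope.
    d_hidden = [d_out * w_output[j] * hb[j] * (1 - hb[j]) for j in range(2)]
    # Update every link by subtracting a fraction of (delta * input activation).
    for j in range(3):
        w_output[j] -= rate * d_out * hb[j]
    for j in range(2):
        for i in range(3):
            w_hidden[j][i] -= rate * d_hidden[j] * xb[i]

# Iterate until the actual outputs land within epsilon of the training outputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
for _ in range(5000):
    for x, target in data:
        backprop_step(x, target)
print([round(forward(x)[2], 2) for x, _ in data])  # approaches [0, 0, 0, 1] (AND)
```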

Page 14: Neural Networks 101

Thank you very much for being here to see my presentation on Artificial Neural Networks. The following citations are just a fraction of the sources I used for my senior thesis, but since I don’t have a week to talk about neural networks, I chose these select sources for my presentation.

Page 15: Neural Networks 101

Again, I’d like to thank everyone for being here today. You guys are great! Now I’d like to open the floor to questions…