Arithmetic Done by Brains and Machines:
The Ersatz Brain Project
James A. Anderson
Department of Cognitive and Linguistic Sciences
Brown University, Providence, RI 02912
Our Goal:
We want to build a first-rate, second-rate brain.
Ersatz Participants
Faculty:
Jim Anderson, Cognitive Science.
Gerry Guralnik, Physics.
David Sheinberg, Neuroscience.
Students:
Socrates Dimitriadis, Cognitive Science.
Brian Merritt, Cognitive Science.
Private Industry:
Paul Allopenna, Aptima, Inc.
Andrew Duchon, Aptima, Inc.
John Santini, Alion, Inc.
Acknowledgements
This work was supported by:
A seed money grant from the Office of the Vice President for Research, Brown University.
Phase I and Phase II SBIRs, “The Ersatz Brain Project,” to Aptima, Inc. (Woburn MA), Dr. Paul Allopenna, Project Manager. Funding from the Air Force Research Laboratory, Rome, NY
Comparison of Silicon Computers and Carbon Computers
Digital computers are
• Made from silicon
• Accurate (essentially no errors)
• Fast (nanoseconds)
• Execute long chains of logical operations (billions)
• Often irritating (because they don’t think like us).
Brains are
• Made from carbon
• Inaccurate (low precision, noisy)
• Slow (milliseconds, 10^6 times slower)
• Execute short chains of parallel alogical associative operations (perhaps 10 operations/second)
• Yet largely understandable (because they think like us).
• Huge disadvantage for carbon: more than 10^12 in the product of speed and power.
• But we still do better than them in many perceptual skills: speech recognition, object recognition, face recognition, information integration, motor control.
• One implication: Cognitive “software” uses only a few but very powerful elementary operations.
Major Point
Brains and computers are very different in their underlying hardware, leading to major differences in software.
Computers, as the result of 60 years of evolution, are great at modeling physics.
They are not great (after 50 years trying and largely failing) at modeling human cognition.
One possible reason: inappropriate hardware leads to inappropriate software.
Maybe we need something completely different: new software, new hardware, new basic operations, even new ideas about computation.
So Why Build a Brain-Like Computer?
1. Engineering.
Computers are all special purpose devices.
Many of the most important practical computer applications of the next few decades will be cognitive in nature:
• Natural language processing.
• Internet search.
• Cognitive data mining.
• Decent human-computer interfaces.
• Text understanding.
We claim it will be necessary to have a cortex-like architecture (either software or hardware) to run these applications efficiently.
2. Science:
Such a system, even in simulation, becomes a powerful research tool.
It leads to designing software with a particular structure to match the brain-like computer.
If we capture any of the essence of the cortex, writing good programs will give insight into biology and cognitive science.
If we can write good software for a vaguely brain-like computer we may show we really understand something important about the brain.
3. Personal:
It would be the ultimate cool gadget.
A technological vision:
In 2057 the personal computer you buy in Wal-Mart will have two CPUs with very different architectures:
First, a traditional von Neumann machine that runs spreadsheets, does word processing, keeps your calendar straight, etc. What they do now.
Second, a brain-like chip to
• Handle the interface with the von Neumann machine,
• Give you the data that you need from the Web or your files (but didn’t think to ask for),
• Be your silicon friend, guide, and confidant (because you understand each other).
Conventional wisdom says neurons are the basic computational units of the brain.
The Ersatz Brain Project is based on a different approximation.
The Network of Networks model was developed in collaboration with Jeff Sutton, then at Harvard Medical School, now at NSBRI.
Cerebral cortex contains intermediate level structure, between neurons and an entire cortical region.
Intermediate level brain structures are hard to study experimentally because they require recording from many cells simultaneously.
The Ersatz Brain Approximation: The Network of Networks.
Network of Networks Approximation
We use the Network of Networks [NofN] approximation to structure the hardware and to reduce the number of connections.
We assume the basic computing units are not neurons, but small (10^4 neurons) attractor networks.
Basic Network of Networks
Hardware Architecture:
• 2-dimensional array of modules
• Locally connected to neighbors
Cortical Columns: Minicolumns
“The basic unit of cortical operation is the minicolumn … It contains of the order of 80-100 neurons except in the primate striate cortex, where the number is more than doubled. The minicolumn measures of the order of 40-50 µm in transverse diameter, separated from adjacent minicolumns by vertical, cell-sparse zones … The minicolumn is produced by the iterative division of a small number of progenitor cells in the neuroepithelium.” (Mountcastle, p. 2)
VB Mountcastle (2003). Introduction [to a special issue of Cerebral Cortex on columns]. Cerebral Cortex, 13, 2-4.
Figure: Nissl stain of cortex in planum temporale.
Columns: Functional
Groupings of minicolumns seem to form the physiologically observed functional columns. Best known example is orientation columns in V1.
They are significantly bigger than minicolumns, typically around 0.3-0.5 mm.
Mountcastle’s summation:
“Cortical columns are formed by the binding together of many minicolumns by common input and short range horizontal connections. … The number of minicolumns per column varies … between 50 and 80. Long range intracortical projections link columns with similar functional properties.” (p. 3)
Cells in a column ~ (80)(100) = 8000
Elementary Modules
The activity of the non-linear attractor networks (modules) is dominated by their attractor states.
Attractor states may be built in or acquired through learning.
We approximate the activity of a module as a weighted sum of attractor states. That is: an adequate set of basis functions.
Activity of Module:
$x = \sum_i c_i a_i$
where the $a_i$ are the attractor states.
The Single Module: BSB
The attractor network we use for the individual modules is the BSB network (Anderson, 1993).
It can be analyzed using the eigenvectors and eigenvalues of its local connections.
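A minimal sketch of a BSB (Brain-State-in-a-Box) module may help here. It assumes the commonly cited update form, in which the state is fed back through the local connection matrix and then clipped to a box; the matrix `A`, the gain `alpha`, and the function names are illustrative assumptions, not values from the project.

```python
def bsb_step(x, A, alpha=1.0, limit=1.0):
    """One BSB update: add the feedback A*x to the state, then clip to the box."""
    n = len(x)
    feedback = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [max(-limit, min(limit, x[i] + alpha * feedback[i])) for i in range(n)]


def bsb_run(x, A, steps=50, alpha=1.0):
    """Iterate the update; the state is driven into a corner of the box."""
    for _ in range(steps):
        x = bsb_step(x, A, alpha)
    return x


# Hypothetical 2-unit connection matrix. Its dominant eigenvector is (1, 1)
# with eigenvalue 0.75, so small states grow along that direction.
A = [[0.5, 0.25],
     [0.25, 0.5]]

print(bsb_run([0.1, 0.05], A))    # settles in the corner [1.0, 1.0]
print(bsb_run([-0.1, -0.05], A))  # settles in the corner [-1.0, -1.0]
```

In this toy case the corners (1, 1) and (-1, -1) play the role of attractor states, and the eigenvectors of `A` determine which corner a given starting state reaches.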
Interactions between Modules
Interactions between modules are described by state interaction matrices, M.
The state interaction matrix elements give the contribution of an attractor state in one module to the amplitude of an attractor state in a connected module.
In the BSB linear region
$x(t+1) = \sum_i M_i s_i + f + x(t)$
where $\sum_i M_i s_i$ is the weighted sum from other modules, $f$ is the input, and $x(t)$ is the ongoing activity.
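The linear-region update can be sketched in a few lines. This is only an illustration of the sum described above: the matrices, state vectors, and the function names `mat_vec` and `linear_step` are assumptions chosen to keep the arithmetic easy to follow.

```python
def mat_vec(M, s):
    """Multiply a state interaction matrix M by a neighbour's state vector s."""
    return [sum(m * v for m, v in zip(row, s)) for row in M]


def linear_step(x, neighbours, f):
    """x(t+1) = sum_i M_i s_i + f + x(t), for a list of (M_i, s_i) pairs."""
    total = list(x)                      # ongoing activity x(t)
    for M, s in neighbours:              # weighted sum from other modules
        contribution = mat_vec(M, s)
        total = [t + c for t, c in zip(total, contribution)]
    return [t + fi for t, fi in zip(total, f)]   # plus the input f


# Two hypothetical neighbouring modules with 2-dimensional states.
neighbours = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]),
    ([[0.0, 0.5], [0.5, 0.0]], [1.0, 0.0]),
]
print(linear_step([0.0, 0.25], neighbours, f=[0.25, 0.0]))  # [0.75, 0.25]
```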
The Linear-Nonlinear Transition
The first BSB processing stage is linear and sums influences from other modules.
The second processing stage is nonlinear.
This linear to nonlinear transition is a powerful computational tool for cognitive applications.
It describes the processing path taken by many cognitive processes.
A generalization from cognitive science:
Sensory inputs → (categories, concepts, words)
Cognitive processing moves from continuous values to discrete entities.
Sparse Connectivity
The brain is sparsely connected. (Unlike most neural nets.)
A neuron in cortex may have on the order of 100,000 synapses.
There are more than 10^10 neurons in the brain.
Fractional connectivity is very low: 0.001%.
Implications:
• Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly.
• Connections are valuable.
• The pattern of connection is under tight control.
• Short local connections are cheaper than long ones.
Our approximation makes extensive use of local connections for computation.
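The fractional connectivity figure can be checked with a short calculation. The sketch assumes, as an upper bound, that every synapse of a neuron goes to a different neuron; the variable names are illustrative.

```python
from fractions import Fraction

synapses_per_neuron = 100_000   # order of magnitude quoted for a cortical neuron
neurons_in_brain = 10**10       # "more than 10^10" neurons

# Fraction of all neurons that one neuron could contact at most.
fraction = Fraction(synapses_per_neuron, neurons_in_brain)
percent = fraction * 100

print(float(percent))  # 0.001 (percent)
```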
Biological Evidence: Columnar Organization in Inferotemporal Cortex
Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex.
There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.
IT Response Clusters: Imaging
Tanaka (2003) used intrinsic visual imaging of cortex. Train a video camera on exposed cortex and cell activity can be picked up.
At least a factor of ten higher resolution than fMRI.
Size of response is around the size of functional columns seen elsewhere: 300-400 microns.
Columns: Inferotemporal Cortex
Responses of a region of IT to complex images involve discrete columns.
The response to a picture of a fire extinguisher shows how regions of activity are determined.
Boundaries are where the activity falls by a half.
Note: some spots are roughly equally spaced.
Active IT Regions for a Complex Stimulus
Note the large number of roughly equally distant spots (2 mm) for a familiar complex image.
Engineering Hardware Considerations
We feel that there is a size, connectivity, and computational power sweet spot at the level of the parameters of the Network of Networks model.
If an elementary attractor network has 10^4 actual neurons, that network might have 50 attractor states. Each elementary network might connect to 50 others through state connection matrices.
A brain-sized system might consist of 10^6 elementary units with about 10^11 (0.1-1 terabyte) numbers specifying the connections.
Putting 100 to 1000 elementary units on a chip gives a total of 1,000 to 10,000 chips in a cortex-sized system. This is well within the upper bounds of current technology.
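These estimates can be reproduced with a back-of-the-envelope calculation. The sketch assumes that each state connection matrix is 50 x 50 (one entry per pair of attractor states) and that a stored number takes between 1 and 8 bytes; both are assumptions made here to match the quoted ranges, not figures stated on the slide.

```python
UNITS = 10**6                 # elementary units in a brain-sized system
STATES_PER_UNIT = 50          # attractor states per elementary network
NEIGHBOURS_PER_UNIT = 50      # other networks each one connects to

# Assumption: one 50 x 50 state connection matrix per connection.
numbers_per_matrix = STATES_PER_UNIT ** 2
total_numbers = UNITS * NEIGHBOURS_PER_UNIT * numbers_per_matrix


def storage_terabytes(bytes_per_number):
    """Storage for all connection numbers, in units of 10^12 bytes."""
    return total_numbers * bytes_per_number / 10**12


def chips_needed(units_per_chip):
    """Chips required if each chip holds the given number of units."""
    return UNITS // units_per_chip


print(total_numbers)                                 # 125000000000, about 10^11
print(storage_terabytes(1), storage_terabytes(8))    # 0.125 1.0
print(chips_needed(1000), chips_needed(100))         # 1000 10000
```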
Modules (Ersatz Processing Units: EPUs)
Function of EPU Modules:
• Simulate local integration: Addition of inputs from outside, from other modules.
• Simulate local network dynamics.
• Communications Controller: Handle long range (i.e. not neighboring) interactions.
Simpler approximations are possible:
• “Cellular automaton”. (Ignore local dynamics.)
• Approximations to local dynamics.
Physical (Hardware) Module
We assume only local connections for the physical hardware.
Reason: Flexible, easy to build, easy to work with.
Software Based Connectivity
Cortical data suggest that more connections than just nearest neighbors exist.
Simulate these with EPU module software, in the Communications Controller.
Implications
Interesting bonus from this structure:
• Information transmission both local and long range can be slow.
• It will take multiple steps (a long time) to move data to distant modules.
• But: This is a feature, not a bug!
Implications
Forces us to pay attention to the temporal aspects of module behavior:
• Communication times
• Module temporal dynamics
• Note: The details of spatial arrangement of data affect communication times.
Consistent with cortical neuroscience.
Implication: We can “program” the array by manipulating these “analog” properties to control array behavior.
Ersatz Programming Peculiarities
How do you make this “computer” compute? Not with logic! It is like a hybrid analog-digital computer.
Programming Techniques:
• Spatial arrangement of data on array
• Integration of data from multiple sources
• Abstraction and discrete concept formation
• Control of computation using (analog) dynamical system parameters
• Assemblies of interacting modules.
Give one example: performance of arithmetic by a simple Ersatz-like system.
Cognitive Computation: Example - Arithmetic
• Brains and computers are very different in the way they do things, largely because the underlying hardware is so different.
• Consider a computational task that humans and computers do frequently, but by different means:
– Learning simple arithmetic facts
Learning the “Right Thing”
Cognition is not memory for facts (like computer data) but remembering the “right things” even if the right things are constructed from many experiences and don’t actually exist!
Most (99.9%) sensory input data is discarded. (The essential process of “creative data destruction.”)
What is kept are useful abstractions and transformations of the inputs.
Arithmetic
Digital computers compute the answers to problems using well-known logic-based algorithms.
Humans do it very differently.
The human algorithm for elementary multiplication facts seems to look like:
1. Find a number that is the answer to some multiplication problem and
2. A product number that is about the right size.
This is a process involving memory and estimation, not computation as traditionally understood.
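The two-step description above can be sketched as a toy recall procedure. This is not the neural network model used in the project: the "memory" is just the set of products in the 2 to 9 times tables, and the rough size estimate is simulated by perturbing the true product with a hypothetical `estimate_error` parameter.

```python
# Step 1: the answers people have memorised are "product numbers".
PRODUCTS = sorted({a * b for a in range(2, 10) for b in range(2, 10)})


def recall_product(a, b, estimate_error=0):
    """Return the familiar product closest to a rough estimate of a * b.

    estimate_error stands in for an imprecise magnitude sense (an assumption
    of this sketch). Ties go to the smaller product.
    """
    rough_size = a * b + estimate_error          # Step 2: about the right size
    return min(PRODUCTS, key=lambda p: (abs(p - rough_size), p))


print(recall_product(7, 8))                      # 56, the correct answer
print(recall_product(7, 8, estimate_error=-3))   # 54, a wrong answer that is
                                                 # still a product (6 x 9)
```

Whatever the size of the estimation error, the answer returned is always a product number close in magnitude to the correct one.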
Next, develop advantages and disadvantages of doing it this way.
A Problem with Arithmetic
• We often congratulate ourselves on the powers of the human mind.
• But why does this amazing structure have such trouble learning elementary arithmetic?
• Adults doing arithmetic are slow and make many errors.
• Learning the times tables takes children several years and they find it hard.
Brain Software: John von Neumann
Von Neumann: 1958, The Computer and the Brain
The nervous system is a complex machine which manages to do its exceedingly complex work on a rather low level of precision.
Von Neumann, as a numerical analyst, knew that errors would rapidly grow and the result would be meaningless if there were more than a few steps in the computation.
Computational Strategy
Ways to avoid problem:
• Use a small number of steps
• Use discrete (“logic-like”) operations rather than hard (“analog”) operations.
Engineering rule: Digital is easy, analog is hard.
Von Neumann: … Whatever language the central nervous system is using is characterized by less logical and arithmetical depth than we are normally used to.
A small number of powerful operations are strung together to form a mental computation.
Teaching of Mathematics
Collaborators: Prof. Kathryn Spoehr, Dr. Susan Viscuso, and Dr. David Bennett
My own interest goes back to a joint paper with Prof. Phil Davis of Brown Applied Mathematics.
Point of the paper:
The “Theorem-Proof” method of teaching mathematics has ruined mathematics in the 20th Century.
Reason for Ruination
Real mathematicians do not think this way.
Mathematicians use a complex blend of intuition, perception, and memory to understand complex systems.
Proving theorems is the last stage, to convince others that you are correct.
The effects are very hard on consumers of mathematics: engineers and scientists.
They say, “I don’t think like this.” and lose confidence in their intuitions.
Why is Arithmetic so Hard?
People are much worse than they should be at elementary arithmetic.
Elementary arithmetic fact learning involves making the right associative links between pairs of the 10 digits to give products, sums, etc.
Only a few hundred facts to learn ...
Arithmetic rules are orders of magnitude less complicated than syntax in language.
But: Takes years for children to learn arithmetic.
The Problem with Arithmetic
At the same time children are having trouble learning arithmetic they are knowledge sponges learning
– Several new words a day.
– Social customs.
– Many facts in other areas.
Association
In structure, arithmetic facts are simple associations.
Example: multiplication:
(Multiplicand)(Multiplicand) → Product
Simple association (S-R learning) was a popular idea in the 1920s (Thorndike).
Formation of arbitrary associations is the basic rationale behind flash cards.
Can learn this way, but hard and not really with “understanding.”
Multiplication
• Arithmetic facts are not arbitrary associations.
• They have an ambiguous structure that gives rise to associative interference.
4 x 3 = 12
4 x 4 = 16
4 x 5 = 20
• Initial ‘4’ has associations with many possible products.
• Ambiguity causes difficulties for simple associative systems.
Number Magnitude
• One way to cope with ambiguity is to embed the fact in a larger context.
• Numbers are much more than arbitrary abstract patterns.
• Experiment:
– Which is greater? 17 or 85
– Which is greater? 73 or 74
It takes much longer to compare 74 and 73.
When a “distance” intrudes into what should be an abstract relationship it is called a symbolic distance effect.
A computer would be unlikely to show such an effect. (Subtract numbers, look at sign.)
Magnitude Coding
Key observation: We see a similar effect when sensory magnitudes are being compared.
Deciding which of
– two weights is heavier,
– two lights is brighter,
– two sounds is louder,
– two numbers is bigger
displays the same reaction time pattern.
This effect and many others suggest that we have an internal representation of number that acts like a sensory magnitude.
Conclusion: Instead of number being an abstract symbol, humans use a much richer coding of number containing powerful sensory and perceptual components.
Argue that this “perceptual” elaboration of number is a good thing. It
– Connects abstract “number” to the physical world.
– Provides the basis for mathematical intuition.
– Is perhaps responsible for the creative aspects of mathematics.
Mathematics by Adults
Mathematics is the most lawful and abstract of the sciences.
Real mathematicians would not crudely associate a number with a weight, would they?
In fact, they do.
Consider Jacques Hadamard’s book The Psychology of Invention in the Mathematical Field (1946).
How Experts do Mathematics
Hadamard (a world class mathematician) interviewed his peers in 1943-5.
Conclusion: Most of them did not reason abstractly.
They used
• Visualization
• Auditory imagery
• Kinesthetic imagery with imagined muscle movements
for insights into “abstract” systems.
Language and formal abstract reasoning were conspicuous by their rarity.
Quotes:
“The mental pictures of the mathematicians whose answers I have received are most frequently visual, but they may also be of another kind – for example, kinetic. There can be auditive ones.”
“… practically all of (them) avoided not only the use of mental words but also the mental use of any algebraic or any precise signs … they use vague images. There are two or three exceptional cases, the most important of which is the mathematician George D Birkhoff, one of the greatest in the world, who is accustomed to visualize algebraic symbols and work with them mentally …” Hadamard
Einstein
One of Hadamard’s informants was Einstein.
The words or the language as they are written or spoken do not seem to play any role in my mechanism of thought.
Albert Einstein
To Einstein, thinking involves transforming of received sense images into a series of “memory pictures.”
Thinking began when he found a certain picture recurring in a number of series. “… such an element becomes a concept.”
These concepts are not words but can become linked to words.
It is by no means necessary that a concept must be connected with a sensorily cognizable and reproducible sign (a word) but when this is the case thinking becomes by means of that fact communicable. (Albert Einstein, Autobiographical Notes.)
Therefore, the function of words and concepts is to convince others, not necessarily yourself who had understood the system through other means.
Richard Feynman
Richard Feynman was a “kinesthetic” thinker:
Feynman said to Dyson … that Einstein’s great work had sprung from physical intuition and when Einstein stopped creating it was because ‘he stopped thinking in concrete physical images and became a manipulator of equations.’ Intuition was not just visual but also auditory and kinesthetic. Those who watched Feynman in moments of intense concentration came away with a strong, even disturbing sense of the physicality of the process, as though his brain did not stop at the gray matter but extended through every muscle in his body. A Cornell dormitory neighbor opened Feynman’s door to find him rolling about on the floor beside his bed as he worked on a problem.
James Gleick, Genius: The Life and Science of Richard Feynman
Non-Verbal Science
Among the virtuosos of intuitive (non-verbal) science are physicists with their “gedanken experiments.”
At the age of 16 Einstein performed a powerful visual thought experiment.
He assumed an observer was moving alongside an electromagnetic wave.
Think of a boat moving at the same speed and in the same direction as an ocean wave.
Waves: Water and Electro-Magnetic
See a stationary hill of water.
See a stationary electro-magnetic field?
Waves
Water wave: See a stationary hill of water.
If you traveled with the same speed and direction as an electromagnetic wave, you would see a motionless spatially varying electric and magnetic field.
Einstein knew this had been looked for and never found.
Relativity
Perhaps we did not see this because it was impossible for an observer to travel at the same velocity as an electromagnetic wave.
Results of this insight:
… a paradox upon which I had already hit at the age of 16: if I pursue a beam of light with the velocity c … I should observe such a beam as a spatially oscillatory electromagnetic field at rest. However there seems to be no such thing.
… One sees that in this paradox, the germ of the special relativity theory is already contained.
Albert Einstein, Autobiographical Notes.
Visual Image of a Proof
Hadamard gives his own visual images of a proof. The proof is by contradiction.
Theorem: There is no largest prime number.
• Suppose someone claims that P is the largest prime.
• Form the product of all the prime numbers up to P, forming a large number, N.
• Add one to N, giving N+1.
• Given this construction, all the primes up to P must give a remainder of 1 when they divide N+1.
Previously Shown: All integers are primes or the product of primes.
Therefore, either (1) the number N+1 itself is prime or (2) it is the product of two or more primes, each larger than any in the sequence of known primes that formed N+1.
I consider all primes from 2 to 11, say 2,3,5,7,11. I see a confused mass.
I form their product, 2x3x5x7x11. N being a rather large number I imagine a point far from the confused mass.
I increase that product by 1, say N+1. I see a second point a little beyond the first.
That number, if not a prime, must admit of a prime divisor. … I see a place somewhere between the confused mass and the first point.
Problems
These images are supposed to be universal.
In fact: Hadamard’s image is wrong when the largest prime used is 11.
For P=11, N+1 is 2,311, which is itself prime, so the “place” in the last image is identical to the “second point.”
If the largest prime used is P=13, N+1 = 30,031, which is the product of 59 and 509.
P=13 agrees with Hadamard’s image.
A visual image can be misleading! We need formal proofs to check our intuitions.
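The two numerical cases can be verified directly. The sketch below uses plain trial division; the function names are illustrative.

```python
def primes_up_to(p):
    """All primes from 2 to p, by trial division."""
    found = []
    for n in range(2, p + 1):
        if all(n % q != 0 for q in found):
            found.append(n)
    return found


def product_plus_one(p):
    """Product of all primes up to p, plus one."""
    result = 1
    for q in primes_up_to(p):
        result *= q
    return result + 1


def prime_factors(n):
    """Prime factorisation of n by trial division, smallest factor first."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


print(product_plus_one(11), prime_factors(product_plus_one(11)))  # 2311 [2311]
print(product_plus_one(13), prime_factors(product_plus_one(13)))  # 30031 [59, 509]
```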
Model Makes Small Mistakes, Not Big Ones
Model used a neural network based associative system.
Buzz words: non-linear, associative, dynamical system, attractor network.
The magnitude representation is built into the system by assuming there is a topographic map of magnitude somewhere in the brain.
First Observation about Arithmetic Errors
Arithmetic fact errors are not random.
• Errors tend to be close in size to the correct answer.
• In the simulations, this effect is due to the presence of the magnitude code.
Second Observation: Error Values
• Values of incorrect answers are not random.
• They are product numbers, that is, the answer to some multiplication problem.
• Only 8% of errors are not the answer to a multiplication problem.
Human Algorithm for Multiplication
The answer to a multiplication problem is:
1. Familiar (a product)
2. About the right size.
• Arithmetic fact learning is a memory and estimation process.
• It is not really a computation!
Flexible and programmable
Learning facts alone doesn’t get you far.
The world never looks exactly like what you learned.
Heraclitus (500 BC):
• It is not possible to step twice into the same river.
A major goal of learning is to apply past learning to new situations.
Getting Correct What you Never Learned: Comparisons
Consider number comparisons: Is 7 bigger than 9?
We can be sure that children do not learn number comparisons individually.
There are too many of them.
– About 100 single digit comparisons
– About 10,000 two-digit comparisons
– And so on.
Building a System to Perform Simple Arithmetic Operations
• We have a model for arithmetic learning.
• Can we now make a system capable of performing some simple mathematical operations on numbers?
• Techniques we can use include attractor networks, differential weighting of portions of an array of units, and a specialized data representation for number.
• Examples of simple operations are
1. increment, decrement,
2. greater than, less than,
3. round off.
• The current version is restricted to the digits from 1 to 10.
A bar code represents magnitude by position on a map. There are ten patterns for the digits from 1 to 10.
The patterns for each digit overlap slightly.
Bar Codes
Programming Patterns
The number map is weighted by programming patterns.
One pattern is used for each operation.
The pattern(s) for number and for operation multiply.
System dynamics gives the final answer.
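A toy version of this scheme can make the idea concrete. Everything numerical here is an assumption of the sketch: the map size, the bar width and spacing, the ramp shape of the greater-than pattern, and the use of a best-match readout (`settle`) as a stand-in for the attractor dynamics. The original programming patterns are shown only as figures, so the ramp should be read as one plausible choice, not the pattern actually used.

```python
MAP_SIZE = 52    # positions on the number map (assumed)
BAR_WIDTH = 7    # width of each digit's bar (assumed)
STEP = 5         # spacing between bars; neighbours overlap by 2 positions


def bar_code(digit):
    """Bar code for a digit from 1 to 10: a bar whose position encodes magnitude."""
    start = STEP * (digit - 1)
    return [1.0 if start <= p < start + BAR_WIDTH else 0.0 for p in range(MAP_SIZE)]


def greater_pattern():
    """Assumed programming pattern: weight grows with position on the map."""
    return [(p + 1) / MAP_SIZE for p in range(MAP_SIZE)]


def lesser_pattern():
    """Mirror image of the greater-than pattern."""
    return list(reversed(greater_pattern()))


def settle(activity):
    """Stand-in for attractor dynamics: the digit whose bar code matches best."""
    scores = {d: sum(a * b for a, b in zip(activity, bar_code(d)))
              for d in range(1, 11)}
    return max(scores, key=scores.get)


def apply_operation(digits, pattern):
    """Multiply the summed number patterns by the programming pattern, then settle."""
    combined = [sum(bar_code(d)[p] for d in digits) for p in range(MAP_SIZE)]
    weighted = [c * w for c, w in zip(combined, pattern)]
    return settle(weighted)


print(apply_operation([3, 7], greater_pattern()))  # 7
print(apply_operation([3, 4], lesser_pattern()))   # 3
```

Note that no comparison of the two digits is ever made: the weighting shifts the starting point, and the readout picks whichever stored pattern wins.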
• Count up (starting number + 1)
• Count down (starting number – 1)
• Greater than: Given two digits, output the larger.
• Lesser than: Given two digits, output the smaller.
• Round off: Given activity at a location on the array, output the nearest integer.
Basic Arithmetic Operations
Count up (starting number + 1)
Count down (starting number – 1) (mirror image of count up).
Programming Pattern: Count Up/Down
Greater than: Given two digits, output the larger.
Lesser than: Given two digits, output the smaller. (Mirror image of the “Greater than” pattern.)
Programming Pattern: Greater Than/Less Than
Round off: Given activity at a location on the array, output the nearest integer.
Programming Pattern: Round-Off
• We are manipulating the starting point in the attractor structure.
• Once the attractor structure is formed many operations can be performed without further learning.
• Operations are not “logical” but based on continuous mathematics.
• This might be considered a very simple kind of mathematical intuition.
Manipulating Starting Point
Assume something like experimental reaction time is related to the time taken to get the answer. The Greater-Than operation shows a “symbolic distance” effect just like humans do.
Experimental Data: Single Digit Number Comparisons
Bars overlap.
Integers close in magnitude show a degree of similarity in their representations. A 2002 paper in Science showed this effect in single unit recordings in primate prefrontal cortex. Note the similarity to the symbolic distance curves.
Physiological Evidence
A Nieder, DJ Freedman, EK Miller (2002). Representation of the quantity of visual items in the primate prefrontal cortex. Science 297, 1708.
Numerosity:
A problem joining ‘abstract’ quantities with pattern recognition. Given a set of identical items presented in a field, report how many items there are.
Numerosity
For humans, numbers from one to about four fall in what is called the subitizing region. Subjects “know” quickly how many objects are present. Each additional item (up to 4) adds about 40 msec to the response time.
In the counting region (more than 4 objects) each additional item adds around 300 msec. This figure is consistent with explicit counting. There is evidence of a strong “total activity” component to subitizing.
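The two regimes can be written as a piecewise-linear function. The sketch returns only the time added relative to a single item, because no baseline response time is given; the function and constant names are illustrative.

```python
SUBITIZING_LIMIT = 4          # items handled in the subitizing region
SUBITIZING_MS_PER_ITEM = 40   # approximate cost of each extra item up to 4
COUNTING_MS_PER_ITEM = 300    # approximate cost of each extra item beyond 4


def added_response_time_ms(n_items):
    """Extra response time (msec) for n_items compared with a single item."""
    if n_items < 1:
        raise ValueError("need at least one item")
    subitized = min(n_items, SUBITIZING_LIMIT) - 1
    counted = max(n_items - SUBITIZING_LIMIT, 0)
    return subitized * SUBITIZING_MS_PER_ITEM + counted * COUNTING_MS_PER_ITEM


for n in (1, 4, 5, 7):
    print(n, added_response_time_ms(n))   # 1 0, 4 120, 5 420, 7 1020
```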
Subitizing
The network of networks model propagates pattern information laterally.
Total maximum activity gives numerosity.
Number Estimation with Lateral Information Flow
Segment the field by using boundary modules in attractor states. (No lateral transmission.) (Lateral interactions can be halted by interposing lines or regions.)
Metacontrast
Which Plate has the Most Cookies?
Counting Cookies
Program Counting Cookies:
1. The image is segmented.
2. The numerosity of objects in each segment is computed using activity based lateral spread.
3. Activity measure is converted into an integer by the round-off operation.
4. Integers are compared using the greater-than operator; the largest integer is the output.
This very simple program is based on topographic and dynamic representational assumptions.
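The four steps can be sketched as a short pipeline. This is a heavily simplified illustration: segmentation is assumed to be already done (each plate arrives as a list of per-object activity values near 1.0), the lateral spread of activity is replaced by a plain sum, and the round-off and greater-than operations are ordinary arithmetic, not attractor dynamics. The plate names and activity values are made up for the example.

```python
def numerosity_activity(segment):
    """Step 2 stand-in: total activity in a segment (roughly 1.0 per object)."""
    return sum(segment)


def round_off(activity):
    """Step 3: convert an activity measure to the nearest integer."""
    return int(activity + 0.5)


def count_cookies(plates):
    """Steps 2-4 on an already segmented image; returns the winner and the counts."""
    counts = {name: round_off(numerosity_activity(cells))
              for name, cells in plates.items()}
    winner = max(counts, key=counts.get)      # Step 4: greater-than
    return winner, counts


# Step 1 (segmentation) is assumed: two plates with noisy per-cookie activity.
plates = {
    "left": [0.9, 1.1, 1.0],
    "right": [1.05, 0.95, 1.0, 0.9],
}
print(count_cookies(plates))   # ('right', {'left': 3, 'right': 4})
```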
Not just a toy problem: Can let you estimate number of similar or identical objects with a largely parallel and selective algorithm.
Magnitude
• We now see the usefulness of the “sensory” magnitude number representation.
• We can use magnitude to do computations like number comparisons without having to learn special cases.
Implications
We have constructed a system that acts like logic or symbol processing in a limited domain.
It does so by using its connection to perception to do much of the computation.
These “abstract” or “symbolic” operations display their underlying perceptual nature in effects like symbolic distance and error patterns in arithmetic.
Connect perception to abstraction and gain the power of each approach
• Humans are a hybrid computer.
• We have a recently evolved, rather buggy ability to handle abstract quantities and symbols.
• (only 100,000 years old. We have the alpha release of the intelligence software.)
• We combine symbol processing with highly evolved, extremely effective sensory and perceptual systems.
• Realized in a mammalian neocortex.
• (over 500 million years old. We have a late release, high version number of the perceptual software.)
• The two systems cooperate and work together effectively.