
I501 – Introduction to Informatics

http://informatics.indiana.edu/jbollen/I501

Informatics and computing

Lecture 8 – Fall 2010

From swarming to collaborative filtering.

http://www.csml.ucl.ac.uk/images/Netflix_Prize.jpg


Informatics: a possible parsing

Computer Science. STOP! ;-)


Let’s Observe Nature!

What do you see? Plants typically branch out. How can we model that?

Observe the distinct parts, color them, assign symbols.

Build a model. Initial state: b; rules: b -> a, a -> ab

Doesn't quite work!

[Figure: Psilophyta/Psilotum, with branch segments labeled by the symbols a and b]


Complex systems approach: looking at nature

A complex system is any system featuring a large number of interacting components (agents, processes, etc.) whose aggregate activity is nonlinear: it is not derivable from the summations of the activity of individual components.

Network identity: components form aggregate structures or functions that require more explanatory devices than those used to explain the components. Examples: genetic networks, immune networks, neural networks, social insect colonies, social networks, distributed knowledge systems, ecological networks.

Bottom-up methodology:
- Collections of simple units interacting to form a more complex whole
- Study of simple rules that produce complex behavior
- Discovery of global patterns of behavior


What about our plant?

An accurate model requires varying angles, varying stem lengths and randomness.

The Fibonacci model is similar. Sneezewort:

[Figure: Psilophyta/Psilotum, with branch segments labeled by the symbols a and b]


Fibonacci Numbers!

Rewriting production rules. Initial state: A; rules: A -> B, B -> AB

n=0: A
n=1: B
n=2: AB
n=3: BAB
n=4: ABBAB
n=5: BABABBAB
n=6: ABBABBABABBAB
n=7: BABABBABABBABBABABBAB

The length of the string follows the Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
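The rewriting above is a small deterministic L-system: every symbol is replaced by its rule at the same time in each generation. A minimal Python sketch, assuming exactly that parallel rewriting, reproduces the strings and their Fibonacci lengths; the function names `rewrite` and `generations` are illustrative, not from the lecture.

```python
# Production rules from the slide: A -> B, B -> AB (start from A).
RULES = {"A": "B", "B": "AB"}


def rewrite(word: str, rules: dict[str, str]) -> str:
    """Apply the rules to every symbol in parallel; symbols without a rule are kept."""
    return "".join(rules.get(symbol, symbol) for symbol in word)


def generations(axiom: str, rules: dict[str, str], n: int) -> list[str]:
    """Return the words for generations 0..n, starting from the axiom."""
    words = [axiom]
    for _ in range(n):
        words.append(rewrite(words[-1], rules))
    return words


if __name__ == "__main__":
    for i, word in enumerate(generations("A", RULES, 7)):
        print(f"n={i}: {word} (length {len(word)})")
```

Running the earlier plant rules (axiom b; b -> a, a -> ab) through the same functions yields words of the same Fibonacci lengths; the words themselves are mirror images of the A/B ones.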

Fibonacci numbers in Nature

Livio (2003) The Golden Ratio: The Story of PHI, the World's Most Astonishing Number


Another example: flocking in nature

Flocking occurs when large groups of animals of the same species form aggregates that behave like a coherent, single entity: herds, flocks, schools, swarms, humans.

Properties: collective flight, migration, foraging, “drafting”
- Coherence: the aggregate has its own distinguishable system behavior and form
- Adaptive: the behavior of the aggregate responds and adapts to external events (predators)
- Coordination: the behavior of individuals seems to be indicative of central control or symbolic/long-range communication, but isn’t


How to model flocking behavior?

Describing properties of aggregate behavior will only go so far:
- Study shapes of the aggregate
- Situations in which it occurs
- Dynamics, features of behavior
- Biologists fixing radios?

Lessons from complex systems:
- Complex systems behavior: not derivable from the summations of the activity of individual components
- Network identity: components form aggregate structures or functions that require more explanatory devices than those used to explain the components (~ emergence)
- Bottom-up methodology: collections of simple units interacting to form a more complex whole; study of simple rules that produce complex behavior

Parrish (2002) – Self-organized fish schools


Models of flocking behavior

Boids: Craig Reynolds, “Flocks, Herds, and Schools: A Distributed Behavioral Model”, Computer Graphics 21(4) (SIGGRAPH ’87), 1987

- Visual model of bird flocks
- Lack of centralized control
- Lack of symbolic communication

General approach: local computation, i.e. each individual tries to satisfy three steering rules:
- Collision avoidance: steer away from impact
- Speed matching: match the speed of neighboring birds
- Flock centering: steer towards the perceived flock center

Flock behavior emerges from the interactions of large groups of individuals built this way.
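The three local rules can be sketched in a few lines of Python. This is a minimal 2-D version under stated assumptions: the neighbor radius, rule weights and speed cap are made-up illustrative values, and the rules are simply summed (Reynolds combines them with priorities), so it is a sketch of the idea rather than Reynolds' implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical tuning constants: the rules fix the behaviors, not these numbers.
NEIGHBOR_RADIUS = 5.0     # how far a boid can "see"
SEPARATION_RADIUS = 1.0   # closer than this counts as a collision risk
W_SEPARATION = 0.05       # collision avoidance weight
W_ALIGNMENT = 0.05        # speed matching weight
W_COHESION = 0.01         # flock centering weight
MAX_SPEED = 2.0


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids: list[Boid], dt: float = 1.0) -> None:
    """Advance all boids one tick; each one reacts only to nearby flockmates."""
    new_velocities = []
    for b in boids:
        near = [o for o in boids
                if o is not b and math.hypot(o.x - b.x, o.y - b.y) < NEIGHBOR_RADIUS]
        ax = ay = 0.0
        if near:
            n = len(near)
            # Collision avoidance: steer away from flockmates that are too close.
            for o in near:
                d = math.hypot(o.x - b.x, o.y - b.y)
                if 0 < d < SEPARATION_RADIUS:
                    ax += W_SEPARATION * (b.x - o.x) / d
                    ay += W_SEPARATION * (b.y - o.y) / d
            # Speed matching: steer toward the neighbors' average velocity.
            ax += W_ALIGNMENT * (sum(o.vx for o in near) / n - b.vx)
            ay += W_ALIGNMENT * (sum(o.vy for o in near) / n - b.vy)
            # Flock centering: steer toward the neighbors' average position.
            ax += W_COHESION * (sum(o.x for o in near) / n - b.x)
            ay += W_COHESION * (sum(o.y for o in near) / n - b.y)
        vx, vy = b.vx + ax * dt, b.vy + ay * dt
        speed = math.hypot(vx, vy)
        if speed > MAX_SPEED:  # cap speed so the summed rules cannot blow up
            vx, vy = vx * MAX_SPEED / speed, vy * MAX_SPEED / speed
        new_velocities.append((vx, vy))
    # Update everyone from the same snapshot so update order does not matter.
    for b, (vx, vy) in zip(boids, new_velocities):
        b.vx, b.vy = vx, vy
        b.x += vx * dt
        b.y += vy * dt
```

Two boids within sight of each other flying at right angles become more aligned after a single `step`, while a boid farther away than `NEIGHBOR_RADIUS` is unaffected: no boid ever consults the flock as a whole.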


Ant trails: emergent organization driven by communication

Problem: optimize location and extraction of food sources
- Lack of centralized control
- Lack of symbolic communication

General modeling approach: local computation leads to higher-order emergent computation.
- The walk algorithm is probabilistic, but biased by pheromone concentration
- Ants leave a pheromone trail when food is found
- Pheromone evaporates with time
- Result: the colony finds the shortest path

Note: ~ greedy algorithm: hill-climbing on trail strength leads to adaptive, collective behavior.

Approaches to address the traveling salesman problem: BIOS Group, S. Kauffman (Santa Fe); see also M. Dorigo (2006), Ant Colony Optimization, IEEE Computational Intelligence Magazine, for an overview.
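The shortest-path effect can be sketched with a deterministic, mean-field version of a two-path experiment. Assumptions (not from the slide): two paths of lengths 1 and 2, choice probability proportional to pheromone, an evaporation rate of 0.1, and a deposit divided by path length because ants on the shorter path complete more trips per unit time. Expected values stand in for individual random ants so the run is reproducible; a stochastic version would sample each ant's choice instead.

```python
# Hypothetical two-path setup: lengths and rates are illustrative values.
PATH_LENGTHS = {"short": 1.0, "long": 2.0}


def choice_probabilities(pheromone: dict[str, float]) -> dict[str, float]:
    """Biased walk: a path is chosen with probability proportional to its pheromone."""
    total = sum(pheromone.values())
    return {path: level / total for path, level in pheromone.items()}


def update(pheromone: dict[str, float], evaporation: float = 0.1,
           deposit: float = 1.0) -> dict[str, float]:
    """One round: evaporate, then deposit in proportion to traffic / path length."""
    probs = choice_probabilities(pheromone)
    return {
        path: (1 - evaporation) * level + probs[path] * deposit / PATH_LENGTHS[path]
        for path, level in pheromone.items()
    }


def run(rounds: int) -> dict[str, float]:
    """Start with no preference and return the final choice probabilities."""
    pheromone = {path: 1.0 for path in PATH_LENGTHS}
    for _ in range(rounds):
        pheromone = update(pheromone)
    return choice_probabilities(pheromone)


if __name__ == "__main__":
    for rounds in (0, 10, 50, 200):
        print(rounds, round(run(rounds)["short"], 3))
```

The share of traffic on the short path starts at 0.5 and climbs toward 1: evaporation erodes the weaker trail while positive feedback reinforces the stronger one, the greedy hill-climbing on trail strength noted above.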


Abstracted: stigmergy (stigma + ergon: sign + work)

- Indirect communication between agents through the environment, i.e. through traces left in the environment
- Lack of centralized control
- The environment provides a substrate for communication and information storage, and constrains individual agents
- Emergence of complex, collective and goal-directed behavior
- Observed in social insects: termites, ants, bees
- Increasingly applied to social phenomena and technology/engineering

See:
- Heylighen F. (2007). Why is Open Access Development so Successful? Stigmergic organization and the economics of information, in: B. Lutterbeck, M. Baerwolff & R. A. Gehring (eds.), Open Source Jahrbuch 2007, Lehmanns Media, 2007, p. 165-180.
- http://www.mitpressjournals.org/doi/abs/10.1162/106454699568692


Probabilistic cleaning: ants

Very simple rules for colony clean-up:
- Pick up dead ant: if a dead ant is found, pick it up (with probability inversely proportional to the quantity of dead ants in the vicinity) and wander.
- Drop dead ant: if dead ants are found, drop the carried ant (with probability proportional to the quantity of dead ants in the vicinity) and wander.

Figure by Marco Dorigo in “Real ants inspire ant algorithms”.

See also: J. L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, L. Chretien. “The Dynamics of Collective Sorting: Robot-Like Ants and Ant-Like Robots”. From Animals to Animats: Proc. of the 1st Int. Conf. on Simulation of Adaptive Behaviour, 356-363 (1990).
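One common way to put numbers on the pick-up and drop rules is the pair of probability functions associated with Deneubourg et al. (1990): P_pick = (k1 / (k1 + f))^2 and P_drop = (f / (k2 + f))^2, where f is the perceived density of corpses nearby. The sketch below assumes that form, illustrative constants k1 = 0.1 and k2 = 0.3, f approximated as the fraction of occupied neighboring cells, and a 1-D ring of cells patrolled by randomly walking ants. It is not the code behind the Dorigo figure.

```python
import random

K_PICK, K_DROP = 0.1, 0.3  # illustrative constants, not from the lecture


def pick_probability(f: float) -> float:
    """High when few corpses are nearby, so isolated corpses get picked up."""
    return (K_PICK / (K_PICK + f)) ** 2


def drop_probability(f: float) -> float:
    """High when many corpses are nearby, so carried corpses end up on piles."""
    return (f / (K_DROP + f)) ** 2


def local_density(cells: list[int], i: int, radius: int = 2) -> float:
    """Fraction of the neighboring cells (ring topology) that hold a corpse."""
    n = len(cells)
    neighbors = [cells[(i + d) % n] for d in range(-radius, radius + 1) if d != 0]
    return sum(neighbors) / len(neighbors)


def clean_up(cells: list[int], n_ants: int = 10, steps: int = 2000, seed: int = 0):
    """Random-walking ants pick up and drop corpses; returns (cells, corpses still carried)."""
    rng = random.Random(seed)
    cells = list(cells)  # 1 = corpse, 0 = empty
    n = len(cells)
    positions = [rng.randrange(n) for _ in range(n_ants)]
    carrying = [False] * n_ants
    for _ in range(steps):
        for a in range(n_ants):
            positions[a] = (positions[a] + rng.choice((-1, 1))) % n  # wander
            i = positions[a]
            f = local_density(cells, i)
            if not carrying[a] and cells[i] == 1 and rng.random() < pick_probability(f):
                cells[i], carrying[a] = 0, True
            elif carrying[a] and cells[i] == 0 and rng.random() < drop_probability(f):
                cells[i], carrying[a] = 1, False
    return cells, sum(carrying)
```

Calling `clean_up([1, 0, 0] * 20)` and comparing the result with the evenly scattered input will typically show corpses bunched into fewer, longer runs; the exact arrangement depends on the seed. Corpses are never created or destroyed, only moved.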


Ant-inspired robots: rules (Beckers et al., 1994)

- Move: with no sensor activated, move in a straight line.
- Obstacle avoidance: if an obstacle is found, turn by a random angle to avoid it and move on.
- Pick up and drop: robots can pick up a number of objects (up to 3). If the shovel contains 3 or more objects, the sensor is activated and the objects are dropped. The robot backs up, chooses a new angle and moves on.

This results in clustering: the probability of dropping items increases with the quantity of items in the vicinity.

Figure from R. Beckers, O. E. Holland, and J. L. Deneubourg (1994). “From local actions to global tasks: Stigmergy and collective robotics”. In Artificial Life IV.
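The robot rules above amount to a small per-robot state machine. A minimal sketch, assuming a discretized world in which the robot is told whether an obstacle or an object is directly ahead; the names `Robot` and `act` and the priority order of the checks are illustrative choices, not taken from Beckers et al. Clustering only appears when many such robots share an arena.

```python
import random
from dataclasses import dataclass

SHOVEL_CAPACITY = 3  # from the slide: 3 or more objects trigger the sensor


@dataclass
class Robot:
    heading: float = 0.0  # degrees
    shovel: int = 0       # objects currently being pushed


def act(robot: Robot, obstacle_ahead: bool, object_ahead: bool,
        rng: random.Random) -> str:
    """Apply the three rules in priority order and return the action taken."""
    # Pick up and drop: a full shovel trips the sensor, so the load is left behind.
    if robot.shovel >= SHOVEL_CAPACITY:
        robot.shovel = 0
        robot.heading = rng.uniform(0.0, 360.0)  # back up and choose a new angle
        return "drop"
    # Obstacle avoidance: turn by a random angle.
    if obstacle_ahead:
        robot.heading = (robot.heading + rng.uniform(0.0, 360.0)) % 360.0
        return "turn"
    # Move: go straight, scooping up any object in front of the shovel.
    if object_ahead:
        robot.shovel += 1
        return "collect"
    return "forward"
```

Because a robot only unloads once its shovel holds three objects, drops happen where objects are already dense, which is the local bias that turns into clusters at the group level.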


Beckers et al. experiments


Luc Steels et al: ant algorithms

http://www.youtube.com/watch?v=93LwvuxDbfU


Adaptive information systems

Eric Bonabeau, “Swarm Smarts”, Scientific American, March 2000, 78.

Johan Bollen (1994): adaptive hypertext systems


Recommender systems: general principles

• People ~ n-dimensional vectors: Person = {CD/book purchases, DVDs rented, …}. The vector is a representation of the consumer; entries can be weighted (TF-IDF etc.): the “vector space model”
• Calculate similarity of users: correlation of user vectors, cosine similarity
• Group consumers according to similarity: clustering
• Similar users: discrepancies between their vectors are recommendations
• Used for all sorts of applications. Similar problem to “bag of words”. Multiple user personalities? Orthogonality? Same = better??

[Figure: buyer vectors plotted on the axes Shameboy and Plastic Operator; the angle between them measures consumer similarity]

Items: [Shameboy, Plastic Operator, Figurine, …]
Buyer 1: [1, 1, 0, 0, 0, …]
Buyer 2: [1, 0, 0, 0, 0, …]
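The vector-space idea can be made concrete with the two buyer vectors above. A minimal sketch, assuming binary purchase vectors truncated to the three named items and plain cosine similarity (no TF-IDF weighting, no clustering); `most_similar` and `recommend` are hypothetical helper names.

```python
import math

ITEMS = ["Shameboy", "Plastic Operator", "Figurine"]  # the named items only

# Purchase vectors from the slide, truncated to the three named items.
BUYERS = {
    "Buyer 1": [1, 1, 0],
    "Buyer 2": [1, 0, 0],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two consumer vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(name: str, users: dict[str, list[int]]) -> str:
    """The other user whose vector has the highest cosine similarity to `name`."""
    others = [other for other in users if other != name]
    return max(others, key=lambda other: cosine(users[name], users[other]))


def recommend(name: str, users: dict[str, list[int]], items: list[str]) -> list[str]:
    """Items the most similar user has and `name` lacks: the discrepancies."""
    neighbor = users[most_similar(name, users)]
    mine = users[name]
    return [item for item, have, theirs in zip(items, mine, neighbor) if theirs and not have]


if __name__ == "__main__":
    print(round(cosine(BUYERS["Buyer 1"], BUYERS["Buyer 2"]), 4))  # 0.7071
    print(recommend("Buyer 2", BUYERS, ITEMS))                    # ['Plastic Operator']
```

The angle between the two buyers is 45 degrees (cosine 1/√2), and the only discrepancy, Plastic Operator, becomes Buyer 2's recommendation. With many users, the same comparison would run against every other vector, or against cluster centers.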


Tracking scientists (they are people too!)

http://informatics.indiana.edu/jbollen/PLosONEmap

André Skupin

Borner/Ketan (2004)

PNAS 101(1)

Highly recommended:

http://www.scimaps.org/

Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. 2009 Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803. doi:10.1371/journal.pone.0004803


We’re all ants now?

[Figure labels: documents, interface, recommender]

- User vectors represent individual trails/exploration in an n-dimensional information space.
- Recommender systems bias the probabilistic exploration paths of users based on others’ actions: a higher probability of following existing trails.
- Analogy: set of user vectors + recommender system ~ ant trails. Solving the traveling salesman problem in n dimensions? ;-)
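The “higher probability of following existing trails” point can be illustrated with a toy model. This is a Pólya-urn-style process swapped in for illustration, not something from the lecture: each new user picks one item with weight 1 + trail_bias × (number of earlier picks). With trail_bias = 0 the choices stay spread out; larger values let early, essentially arbitrary picks snowball. The parameter names and values are hypothetical.

```python
import random


def simulate_choices(n_items: int, n_users: int, trail_bias: float,
                     seed: int = 0) -> list[int]:
    """Each user picks one item; weight = 1 + trail_bias * (earlier picks of that item)."""
    rng = random.Random(seed)
    counts = [0] * n_items
    for _ in range(n_users):
        weights = [1 + trail_bias * c for c in counts]
        pick = rng.choices(range(n_items), weights=weights)[0]
        counts[pick] += 1  # the user's choice becomes part of the "trail"
    return counts


def top_share(counts: list[int]) -> float:
    """Fraction of all choices captured by the most popular item."""
    return max(counts) / sum(counts)


if __name__ == "__main__":
    for bias in (0.0, 1.0, 10.0):
        print(bias, round(top_share(simulate_choices(10, 1000, bias)), 2))
```

Comparing `top_share` across bias values gives one crude handle on the question raised next: which recommender features produce fads and self-fulfilling prophecies rather than broader exploration.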

Modeling fads, hypes, flash crowds in cyberspace, self-fulfilling prophecies, but also long-tail effects, more optimized exploration of information space?

Which features of recommender systems promote either of the above?

Cf. youtube.com: “other users are watching” vs. batch-processed recommendations.

Emergence of COMPUTATIONAL SOCIAL SCIENCE: Lazer et al. (2009). Life in the network: the coming age of computational social science. Science, 6 February 2009; 323(5915): 721–723. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745217/


Next week readings

1. Gouth (2009). Training for Peer Review. Science Signaling 2(85), tr2. doi:10.1126/scisignal.285tr2

2. Monastersky (2005). The number that is devouring science. Chronicle of Higher Education, Section: Research & Publishing, 52(8), A12.

3. Eysenbach G (2006). Citation Advantage of Open Access Articles. PLoS Biol 4(5): e157. doi:10.1371/journal.pbio.0040157

4. Fortnow L (2009). Time for Computer Science to Grow Up. Communications of the ACM 52(8), August. doi:10.1145/1536616.1536631


Next up: your proposal assignment


Your proposal assignment

From the syllabus:

“You have a dream. Write it down in the form of a proposal for the NSF Graduate Research Fellowship Program (CISE field of study). This proposal accounts for 20% of your final grade. 10% for the final presentation in class.”

What is the NSF Graduate Research Fellowship Program?

“The program recognizes and supports outstanding graduate students who are pursuing research-based master's and doctoral degrees in fields within NSF's mission.  The GRFP provides three years of support for the graduate education of individuals who have demonstrated their potential for significant achievements in science and engineering research. The ranks of NSF Fellows include individuals who have made transformative breakthroughs in science and engineering research and have become leaders in their chosen careers and Nobel laureates.”

http://www.nsf.gov/pubs/2010/nsf10604/nsf10604.htm


How does it work?

NSF wants:
• Personal Profile
• Education and Work Experience
• Planned Graduate Program
• ** Personal Statement (2p)
• ** Previous Research Experience (2p)
• ** Proposed Plan of Research and References (2p)

I want:

1. The items marked with **, i.e. a total of 6 pages + references = 20% of grade

2. A 15-minute in-class presentation of your work (December 1 and December 8) = 10% of grade


Formatting

• ** Personal Statement (2p)
• ** Previous Research Experience (2p)
• ** Proposed Plan of Research and References (2p)

• Maximum length of two pages, including all references, citations, charts, figures, and images.

• Standard 8.5" x 11" page size, 12-point, Times New Roman font, 1" margins on all sides, and must be single spaced or greater

• No hyperlinks, only citations in References Cited section. Images may be included in the page limits.


Personal statement

Important questions to ask yourself before starting the essay:
• Why are you fascinated by your research area?
• What examples of leadership skills and unique characteristics do you bring to your chosen field?
• What personal and individual strengths do you have that make you a qualified applicant?
• How will receiving the fellowship contribute to your career goals?
• How do these activities address the Intellectual Merit and Broader Impacts criteria?

Example:

http://www.mitbrandon.com/nsfstatement.shtml


Previous Research Experience

Important questions to ask yourself before starting the essay:
• What are all of your applicable experiences?
• For each experience, what were the key questions, methodology, findings, and conclusions?
• Did you work in a team and/or independently?
• How did you assist in the analysis of results?
• How did your activities address the Intellectual Merit and Broader Impacts criteria?

Example:

http://rachelcsmith.com/NSF/DisturbancePRE.pdf


Proposed Plan of Research

Review criteria:

http://www.nsfgrfp.org/how_to_apply/review_criteria

http://www.nsf.gov/pubs/2002/nsf022/bicexamples.pdf

Intellectual merit:

How important is the proposed activity to advancing knowledge and understanding within its own field or across different fields?

How well qualified is the proposer (individual or team) to conduct the project? (If appropriate, the reviewer will comment on the quality of prior work.)

To what extent does the proposed activity suggest and explore creative, original, or potentially transformative concepts?

How well conceived and organized is the proposed activity?

Is there sufficient access to resources?


Proposed Plan of Research

Review criteria:

http://www.nsfgrfp.org/how_to_apply/review_criteria

http://www.nsf.gov/pubs/2002/nsf022/bicexamples.pdf

Broader Impacts

– Activities and projects that:

How well does the activity advance discovery and understanding while promoting teaching, training, and learning?

How well does the proposed activity broaden the participation of underrepresented groups (e.g., gender, ethnicity, disability, geographic, etc.)?

To what extent will it enhance the infrastructure for research and education, such as facilities, instrumentation, networks, and partnerships?

Will the results be disseminated broadly to enhance scientific and technological understanding?

What may be the benefits of the proposed activity to society?


General words of wisdom

You need to mind the formal criteria, checklists, etc., but what really matters:

- Don’t focus so much on the task or burden of writing a proposal as on the pleasure of outlining an interesting, relevant and successful research agenda.

- You are asking for support ($$$). Someone will make a decision to support your research. They need to see a compelling reason to do so. Your essay must make a good scientific and societal case for why one should invest in your idea and professional development.

- Make clear that you are qualified and well-positioned to execute what you propose.

- Start with the big issues. The limit is 2 pages, so try to be as succinct and to the point as you can. Make it work. Focus on the why, then on the how.

- Quality of exposition matters. Don’t annoy reviewers with jargon, crummy grammar, overly long sentences.

- Be mindful of your audience. Your reviewers will be experts, but not to the degree that you may be.


About assignment 2

Some changes you want to be mindful of:
- NEW deadline = December 1st, 4PM (16:00)
- Submission: two types
  - Partial: regular submission of Assignment 2 as planned. Graded on a 25-point scale. The grade for Assignment 1 is maintained.
  - Full: submission of one single assignment that comprises and integrates both Assignments 1 and 2. Graded on a 50-point scale. Expectation: SIGNIFICANT improvement in the portion relevant to Assignment 1. I will grade accordingly. A correct answer to the algorithm is NOT sufficient.