P. Smyth: Networks MURI Kickoff Meeting, Nov 18, 2008: Scalable Methods for the Analysis of Network-Based Data. MURI Project: University of California, Irvine. Principal Investigator: Padhraic Smyth. Kick-off Meeting, November 18th, 2008


Page 1

P. Smyth: Networks MURI Kickoff Meeting, Nov 18, 2008: 1

Scalable Methods for the Analysis of Network-Based Data

MURI Project: University of California, Irvine

Principal Investigator: Padhraic Smyth

Kick-off Meeting

November 18th 2008

Page 2

Goals for Today’s Meeting

• Review overall goals and research of MURI project

• University research groups
  – Learn about each other’s research
  – See opportunities for collaboration

• MURI team and ONR/Navy
  – MURI team: learn about ONR interests
  – ONR: learn about expertise and plans of MURI team

• Action items
  – Future meetings and collaborative activities
  – Review Year 1 research goals


Page 3

Outline

• Introductions

• Review today’s agenda
  – Schedule of talks
  – Logistics

• Overview of our MURI research project
  – Themes and goals
  – Tasks

Page 4

MURI Investigators

Carter Butts, UCI

Michael Goodrich, UCI

Dave Hunter, Penn State

David Eppstein, UCI

Padhraic Smyth, UCI

Mark Handcock, U Washington

Dave Mount, U Maryland

Page 5

MURI Project Participants

• Postdocs
  – UC Irvine
    • Romain Thibaux (Computer Science)

• Graduate Students (all UC Irvine)
  – Computer Science
    • Darren Strash
    • Lowell Trott
  – Statistics
    • Chris DuBois
  – Social Science
    • Chris Marcum
    • Lorien Jasny
    • Emma Spiro
    • Zack Almquist

Page 6

Outline

• Introductions

• Review today’s agenda
  – Schedule of talks
  – Logistics

• Overview of our MURI research project
  – Themes and goals
  – Tasks

Page 7

Agenda for MURI Kickoff Meeting at UC Irvine, November 18th 2008
Location: UC Irvine, Bren Hall, Room 4011

MORNING SESSION

8:30 Arrive, continental breakfast available

9:00 Introductions and overview of MURI proposal: Padhraic Smyth (UCI)

9:30 Research and application challenges from ONR's perspective: Martin Kruger (ONR, MURI Program Manager)

10:00 Brief discussion/Q&A session between ONR representatives and PIs

Page 8

Agenda for MURI Kickoff Meeting at UC Irvine

10:15 Break

10:30 Tutorial Session: Statistical models for network data: Mark Handcock (U Washington), Carter Butts (UCI), Dave Hunter (Penn State)
  - Fundamentals of exponential family random graph models (ERGMs)
  - Parameter estimation in ERGMs: principles and computational challenges
  - Alternative statistical frameworks such as latent space models

LUNCH

12:00 Lunch for PIs and UCI visitors at the University Club (next door to Bren Hall)

Page 9

Agenda for MURI Kickoff Meeting at UC Irvine

AFTERNOON SESSION: 15-minute brief research presentations

1:30 Studying networks through an algorithmic lens: Michael Goodrich (UCI)

1:45 Fast algorithms for computing network statistics: David Eppstein (UCI)

2:00 Data structures for dynamic and kinetic multidimensional point sets: Dave Mount (U Maryland)

2:15 Modeling dynamic and relational event data: Carter Butts (UCI)

2:30 Statistical modeling of large text collections: Padhraic Smyth (UCI)

2:45 Modeling partially observed network data: Mark Handcock (U Washington)

Page 10

Agenda for MURI Kickoff Meeting at UC Irvine

3:00 Break and Informal Discussion

3:30 Brief talks on Software and Data Sets:
  - R software for network analysis: Dave Hunter (Penn State)
  - Experimental results on real-world networks: PhD students from Carter Butts's group (Sociology Dept, UCI)
  - Large network data sets for experimentation: Chris DuBois (PhD student, Statistics Department, UCI)

Page 11

Agenda for MURI Kickoff Meeting at UC Irvine

4:15 Open Discussion on Research Plans
  - relation of MURI research topics to military applications
  - further opportunities for collaboration within the team
  - year 1 research goals

5:00 Organizational Issues and Wrap-up
  - future meetings (frequency, location)
  - encouraging interaction between team members (conference calls, weekly research meetings, etc.)
  - use of collaborative Web pages
  - action items

5:30 Adjourn

Page 12

Logistics

• Meals
  – Lunch at University Club, for PIs and non-UCI folks
  – Coffee breaks

• Wireless
  – Should be able to get 24-hour guest access

• Slides
  – Will be available online by the end of today

Page 13

Outline

• Introductions

• Review today’s agenda
  – Schedule of talks
  – Logistics

• Overview of our MURI research project
  – Themes and goals
  – Tasks

Page 14

MURI Project: Scalable Methods for Analysis of Network-Based Data

• 4 universities collaborating, 7 PIs
  – Support for (approx.) 8 graduate students and 3 postdocs or research associates

• 3-year project with possible extension to 5 years

• Time period
  – Funding arrived at UCI in September 2008
  – At other universities in Sept/Oct 2008
  – Official project start/end:
    • June 1, 2008 to May 30, 2011/2013

Page 15

A Small Social Network

[Figure: network of the MURI investigators: Butts, Smyth, Hunter, Handcock, Mount, Goodrich, Eppstein]

Page 16

A Small Social Network

[Figure: network of the MURI investigators (Smyth, Hunter, Handcock, Mount, Goodrich, Eppstein, Butts); group label: Statistics]

Page 17

A Small Social Network

[Figure: network of the MURI investigators (Smyth, Hunter, Handcock, Mount, Goodrich, Eppstein, Butts); group labels: Statistics; Algorithms & Data Structures]

Page 18

Page 19

[Figure from Carter Butts]

Page 20

Statistical Modeling of Network Data

• Statistics = principled approach for inference from data
  – Basis for optimal prediction
    • querying = computation of conditional probabilities/expectations
  – Principles for handling noisy measurements
    • e.g., noisy edge observation process
  – Integration of different sources of information
    • e.g., combining edge information with node covariates
  – Quantification of uncertainty
    • e.g., which model is a better explanation of the data
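As a small illustration of "querying = computation of conditional probabilities", here is a sketch that assumes a directed exponential random graph model (ERGM, defined on the Example slide) whose only statistics are the number of edges and the number of mutual (reciprocated) dyads; that choice of statistics, and the function name, are hypothetical. Conditional on the rest of the graph, the probability of one edge depends only on how the statistics change when that edge is switched on, so the intractable normalization constant cancels.

```python
import math

def conditional_edge_prob(edges, i, j, theta_edges, theta_mutual):
    """P(edge i->j present | all other dyads) in a directed ERGM whose
    statistics are (number of edges, number of mutual dyads).

    Hypothetical model choice for illustration. The normalization constant
    cancels, so only the change in the statistics from switching i->j on is needed.
    """
    delta_edges = 1                             # adding i->j adds one edge
    delta_mutual = 1 if (j, i) in edges else 0  # ...and one mutual dyad if j->i exists
    eta = theta_edges * delta_edges + theta_mutual * delta_mutual
    return 1.0 / (1.0 + math.exp(-eta))         # logistic function of the change score
```

For example, with `theta_edges = -1` and `theta_mutual = 1`, the query returns 0.5 when the reciprocating edge j→i is present and about 0.27 when it is absent.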

Page 21

Limitations of Existing Methods

• Network data over time
  – Relatively few statistical models for dynamic network data

• Heterogeneous data
  – e.g., few techniques for incorporating text, spatial information, etc., into network models

• Computational tractability
  – Exact computation for many network models scales exponentially in the number of nodes N

Page 22

Example

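• $G = \{V, E\}$
  – $V$ = set of $N$ nodes
  – $E$ = set of directed binary edges

• Exponential random graph model (ERGM)

$$P(G \mid \theta) = \frac{f(G; \theta)}{\sum_{G'} f(G'; \theta)}$$

The normalization constant in the denominator is a sum over all possible graphs $G'$ on the $N$ nodes.

How many graphs? Each of the $N(N-1)$ ordered pairs of distinct nodes either has an edge or not, giving $2^{N(N-1)}$ graphs.

e.g., for $N = 20$, we have $2^{380} \approx 2.5 \times 10^{114}$ graphs to sum over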
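To make the counting argument concrete, the sketch below computes the normalization constant by brute-force enumeration. It assumes the standard exponential-family form f(G; θ) = exp(θ · s(G)) and uses two illustrative statistics s(G) (edge count and mutual-dyad count); both choices, and all function names, are hypothetical and not taken from the slides.

```python
import math

def all_directed_graphs(n):
    """Yield every directed graph on n nodes (no self-loops) as a set of edges."""
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]  # N(N-1) ordered pairs
    for mask in range(2 ** len(dyads)):                             # 2^{N(N-1)} graphs
        yield {d for k, d in enumerate(dyads) if (mask >> k) & 1}

def stats(edges):
    """Illustrative statistics s(G): (edge count, mutual-dyad count)."""
    mutual = sum(1 for (i, j) in edges if i < j and (j, i) in edges)
    return (len(edges), mutual)

def log_normalizer(n, theta):
    """log of the sum of exp(theta . s(G)) over all graphs, by brute force."""
    logs = [sum(t * s for t, s in zip(theta, stats(g))) for g in all_directed_graphs(n)]
    m = max(logs)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logs))

def log_prob(edges, n, theta):
    """log P(G | theta) = theta . s(G) - log(normalization constant)."""
    return sum(t * s for t, s in zip(theta, stats(edges))) - log_normalizer(n, theta)
```

This runs instantly for N = 3 (64 graphs), but already at N = 6 there are 2^30 (about 10^9) graphs, which is why the exact normalization constant is out of reach for realistic networks.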

Page 23

Key Themes of our MURI Project

• Foundational research on new statistical estimation techniques for network data
  – e.g., principles of modeling with missing data

• New algorithms for heterogeneous network data
  – Incorporating time, space, text, other covariates

• Faster algorithms
  – e.g., approximation methods for very large data sets

• Software
  – Make network inference software publicly available (in R)

Page 24

Key Themes of our MURI Project

[Diagram: Fast Algorithms; Statistical Methods; Richer Models; Software; Large Heterogeneous Data Sets]

Page 25

Tasks

A: Fast network estimation algorithms (Eppstein, Butts)

B: Spatial representations and network data (Goodrich, Eppstein, Mount)

C: Advanced network estimation techniques (Handcock, Hunter)

D: Scalable methods for relational events (Butts)

E: Network models with text data (Smyth)

F: Software for network inference and prediction (Hunter)

Page 26

Task A: Fast Network Estimation Algorithms

• Problem:
  – Statistical inference algorithms can be slow because of repeated computation of various statistics on graphs

• Goal
  – Leverage ideas from computational graph algorithms to enable much faster computation, also enabling computation of more complex and realistic statistics

• Projects
  – Dynamic graph methods for change-score computation
  – Rapid subgraph automorphism detection for feature counting
  – Dynamic connectivity

Investigators: Eppstein, Butts
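The idea behind change-score computation can be sketched as follows: rather than recounting a statistic over the whole graph after every edge toggle, compute only how much it changes. The sketch uses the triangle count of an undirected graph as the example statistic; the class and function names are hypothetical, and this is one simple instance of the idea, not the project's algorithm.

```python
from collections import defaultdict

class UndirectedGraph:
    """Adjacency-set graph; sets make neighbour intersection cheap."""
    def __init__(self):
        self.adj = defaultdict(set)

    def toggle(self, u, v):
        """Add edge (u, v) if absent, remove it if present."""
        if v in self.adj[u]:
            self.adj[u].discard(v)
            self.adj[v].discard(u)
        else:
            self.adj[u].add(v)
            self.adj[v].add(u)

def triangle_change_score(g, u, v):
    """Triangles created by adding (u, v): one per common neighbour of u and v."""
    return len(g.adj[u] & g.adj[v])

def triangle_count(g):
    """Full recount for comparison; each triangle is seen once from each of its 3 edges."""
    total = 0
    for u in list(g.adj):
        for v in g.adj[u]:
            if u < v:
                total += len(g.adj[u] & g.adj[v])
    return total // 3
```

The change score touches only the neighbourhoods of u and v, whereas the full recount visits every edge, a difference that matters when the statistic must be re-evaluated after each of many proposed toggles.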

Page 27

Task B: Spatial Representations and Network Data

• Problem:
  – Spatial representations of network data (both latent embeddings and actual spatial information) can be quite useful, but current statistical modeling algorithms scale poorly

• Goal
  – Build on recent efficient geometric data indexing techniques in computer science to develop much faster and more efficient algorithms

• Projects
  – Improved algorithms for latent-space embeddings
  – Fast implementations for high-dimensional latent space models
  – Techniques for integrating actual and latent space geometry

Investigators: Goodrich, Eppstein, Mount
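To see where the poor scaling comes from, the sketch below evaluates a latent-space likelihood naively. It assumes the latent distance model of Hoff, Raftery and Handcock (2002) without covariates, in which each node i has a position z_i and the log-odds of an edge between i and j is α − ‖z_i − z_j‖. The function names are hypothetical.

```python
import math

def edge_prob(alpha, zi, zj):
    """P(edge between i and j) = logistic(alpha - ||z_i - z_j||)."""
    return 1.0 / (1.0 + math.exp(-(alpha - math.dist(zi, zj))))

def log_likelihood(alpha, positions, edges):
    """Naive log-likelihood of an undirected graph; visits all N(N-1)/2 pairs.

    `edges` holds pairs (i, j) with i < j.
    """
    n = len(positions)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_prob(alpha, positions[i], positions[j])
            ll += math.log(p) if (i, j) in edges else math.log1p(-p)
    return ll
```

The double loop is quadratic in N and must be repeated every time the positions are updated during fitting. Pairs that are far apart contribute almost nothing beyond the `log1p(-p)` term, which is the kind of structure geometric indexing techniques could exploit.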

Page 28

Task C: Advanced Estimation Techniques

• Problem:
  – Current statistical network inference models often make unrealistic assumptions, e.g.,
    • Assume complete (non-missing) data
    • Assume that exact computation is possible

• Goal
  – Develop new theories and techniques that relax these assumptions, i.e., methods for handling missing data and techniques for approximate inference

• Projects
  – Inference with partially observed network data
  – Approximation methods
    • Approximate likelihood techniques
    • Approximate MCMC algorithms
  – Will leverage new techniques developed in Tasks A and B

Investigators: Handcock, Hunter
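For context on what approximate MCMC would speed up, here is a sketch of a basic (exact, not approximate) Metropolis sampler for an undirected ERGM with edge and triangle terms. The model terms, parameter names and function name are assumptions for illustration. Each step proposes toggling one dyad and accepts based on the change in statistics, so the normalization constant is never computed.

```python
import math
import random

def simulate_ergm(n, theta_edges, theta_tri, steps, seed=0):
    """Metropolis sampler for an undirected ERGM with edge and triangle terms.

    Each step proposes toggling one uniformly chosen dyad (a symmetric proposal)
    and accepts with probability min(1, exp(theta . change in statistics)).
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]  # start from the empty graph
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        present = j in adj[i]
        # Change in statistics for switching the edge ON:
        # +1 edge, plus one triangle per common neighbour of i and j.
        log_ratio = theta_edges + theta_tri * len(adj[i] & adj[j])
        if present:  # switching the edge OFF reverses the sign
            log_ratio = -log_ratio
        if rng.random() < math.exp(min(0.0, log_ratio)):
            if present:
                adj[i].discard(j)
                adj[j].discard(i)
            else:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

Estimation procedures typically need very many such draws, each requiring a change-score computation, which links this task to the fast statistics of Task A.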

Page 29

Figure from Dave Hunter, Penn State

Page 30

Task D: Scalable Temporal Models

• Problem:
  – Few statistical methods for modeling temporal sequences of events among a network of actors

• Goal
  – Develop new statistical relational event models to handle an evolving set of events over time in a network context

• Projects
  – Specification of relational event statistics
  – Rapid likelihood computation for relational event models
  – Predictive event system queries
  – Interventions, forecasting, and “network steering”
  – Can build on ideas from Tasks A, B, C

Investigator: Butts
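A minimal sketch of a relational event likelihood, to show why rapid likelihood computation matters. It assumes that every ordered pair (a, b) has an event rate exp(θ · s_ab), where s_ab summarizes past events, and that rates stay constant between events. The two statistics (inertia: past a→b events; reciprocity: past b→a events) and all names are illustrative assumptions, not the project's specification.

```python
import math

def event_rate(counts, a, b, theta_inertia, theta_recip):
    """Rate of an a -> b event given past event counts (hypothetical statistics)."""
    inertia = counts.get((a, b), 0)  # how often a has already sent to b
    recip = counts.get((b, a), 0)    # how often b has sent to a
    return math.exp(theta_inertia * inertia + theta_recip * recip)

def rem_log_likelihood(events, n, theta_inertia, theta_recip):
    """Log-likelihood of time-ordered events (sender, receiver, time) among n
    actors, with rates held constant between consecutive events."""
    counts = {}
    ll, t_prev = 0.0, 0.0
    for s, r, t in events:
        # Every ordered pair is "at risk" of being the next event; this O(N^2)
        # sum per event is the cost a faster method would need to reduce.
        total = sum(event_rate(counts, a, b, theta_inertia, theta_recip)
                    for a in range(n) for b in range(n) if a != b)
        ll += math.log(event_rate(counts, s, r, theta_inertia, theta_recip))
        ll -= (t - t_prev) * total
        counts[(s, r)] = counts.get((s, r), 0) + 1
        t_prev = t
    return ll
```

With all parameters zero, every rate is 1 and the log-likelihood reduces to minus the final event time multiplied by the number of ordered pairs, which is a convenient sanity check.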

Page 31

Page 32

Task E: Network Models and Text Data

• Problem:
  – Lack of statistical techniques that can combine network and text data within a single framework (e.g., email communication)

• Goal
  – Leverage recent advances in both statistical text mining and statistical network modeling to create new combined models

• Projects
  – Latent variable models for text and network data
  – Text as exogenous data for statistical network models
  – Modeling of text and network data over time
  – Fast algorithms for statistical modeling of text/networks
  – Can build on ideas from Tasks A, B, C and D

Investigator: Smyth

Page 33

Network of email communication patterns in a corporate research lab

Page 34

Task F: Software for Network Inference and Prediction

• Goal
  – Disseminate algorithms and software to research and practitioner communities

• How?
  – By incorporating our new algorithms into the R statistical package
  – R = open-source language for statistical computing/graphics
  – MURI team has significant prior experience with developing statistical network modeling packages in R
    • network (Butts et al., 2007)
    • latentnet (Handcock et al., 2004)
    • ergm (Handcock et al., 2003)
    • sna (Butts, 2000)

• Will integrate algorithms and techniques from earlier tasks

Investigator: Hunter

Page 35

Data Sets

• Traditional social network data sets

• “Next generation” data sets
  – Dynamic network data
    • e.g., WTC communications
  – Network data with text
    • e.g., political blogs, Enron emails
  – Often much larger and richer than traditional data sets
  – See afternoon talks by PhD students Lorien Jasny and Chris DuBois

Page 36

Evaluation Methods

• Traditional statistical metrics
  – Log-likelihood on training data
  – Model comparisons using penalized and marginal likelihood

• Predictive metrics
  – e.g., for dynamic networks, prediction of edge and node properties “out of sample”, and assessment of the accuracy of these predictions
    • Classification accuracy, precision-recall, ROC curves, etc.

• Computational metrics
  – Worst- and average-case analysis
  – Empirical evaluations of computation time
  – Trade-offs of statistical/predictive accuracy with computation time
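As one concrete predictive metric, the sketch below computes the area under the ROC curve for out-of-sample edge predictions, using its rank interpretation: the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge, with ties counting one half. The function name and input format (a score and a 0/1 label per held-out dyad) are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: predicted edge scores for held-out dyads; labels: 1 if the edge exists.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of held-out dyads; a sort-based version computes the same value in O(n log n), which is preferable for large networks where most dyads are non-edges.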

Page 37

Summary

• Statistical modeling is a key approach for quantitative analysis and prediction using network data

• Existing statistical network modeling techniques are potentially very powerful
  – But are currently computationally limited to small networks

• Leverage ideas from computer science to extend the “reach” of statistical network modeling to larger networks

• Benefits
  – Computationally tractable modeling of much larger networks
  – More sophisticated representations for network models
  – New applications of statistical network modeling

Page 38

Agenda for MURI Kickoff Meeting at UC Irvine, November 18th 2008
Location: UC Irvine, Bren Hall, Room 4011

MORNING SESSION

8:30 Arrive, continental breakfast available

9:00 Introductions and overview of MURI proposal: Padhraic Smyth (UCI)

9:30 Research and application challenges from ONR's perspective: Martin Kruger (ONR, MURI Program Manager)

10:00 Brief discussion/Q&A session between ONR representatives and PIs