Graphical Representations of Knowledge and Its Distribution Cliff Behrens Information Analysis...

Graphical Representations of Knowledge and Its Distribution

Cliff Behrens
Information Analysis, Applied Research
Telcordia Technologies, Inc.
973.829.5198
cliff@research.telcordia.com

Workshop on Statistical Inference, Computing and Visualization for Graphs

Stanford University, August 1 - 2, 2003

Knowledge, Consensus and Information Sharing

[Figure: Cultural Knowledge Derived from Consensus; Individual Knowledge; Information Sharing Among Individuals in a Single COI; Consensus Knowledge]

Schemer Knowledge Validation Services

Issues with CSCW technology
– Focus of CSCW research is on new tools, less on motivating their use
– Collaborative model building often lacks scientific rigor and quality control

Schemer: Web-based technology that derives knowledge from consensus among Subject Matter Experts (SMEs)

– Knowledge-based collaboration reveals the distribution of domain expertise among panelists

– Metrics for qualifying panelists and validating the models they produce:
  validate the saliency of the domain to SMEs
  estimate the competency of SMEs
  yield best answers based on the responses of SMEs, weighted by their respective competencies

Generic service, but first tried on SIAM® influence networks

SIAM® Influence Net Example

Mathematics of Consensus Analysis (Romney et al. 1986)

Formal model consists of a data matrix $X$ containing the responses $X_{ik}$ of SMEs $i = 1, \dots, N$ on items $k = 1, \dots, M$

– from this matrix a symmetric matrix $M^*$ is estimated; its off-diagonal elements hold the empirical point estimates $M^*_{ij}$, the proportion of matching responses on all items between SMEs $i$ and $j$, corrected for guessing (if appropriate).
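The matching-proportion step can be sketched in plain Python. This is a minimal illustration, assuming multiple-choice items that all have the same number $L$ of options and the usual correction for guessing, $M^*_{ij} = (L M_{ij} - 1)/(L - 1)$, where $M_{ij}$ is the raw proportion of matches; the function name and the sample answers are hypothetical.

```python
from itertools import combinations


def corrected_agreement(responses, num_options):
    """Symmetric matrix M* of guessing-corrected match proportions.

    responses[i][k] is SME i's answer to item k, and num_options is L, the
    number of possible answers (assumed to be the same for every item).
    Diagonal entries are left at 0.0 because they are the unknowns that the
    factor analysis later solves for.
    """
    n = len(responses)
    m = len(responses[0])
    L = num_options
    mstar = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matches = sum(a == b for a, b in zip(responses[i], responses[j]))
        raw = matches / m                    # observed proportion of matching answers
        corrected = (L * raw - 1) / (L - 1)  # subtract agreement expected from guessing
        mstar[i][j] = mstar[j][i] = corrected
    return mstar


# Three hypothetical SMEs answering four items, each with options a/b/c.
answers = [
    ["a", "b", "a", "c"],
    ["a", "b", "b", "c"],
    ["a", "a", "a", "c"],
]
mstar = corrected_agreement(answers, num_options=3)
```

SMEs 0 and 1 match on 3 of 4 items, so their raw agreement of 0.75 shrinks to 0.625 once chance matches among three options are discounted.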

Obtain an approximate solution yielding estimates of the individual SME competencies (the $D^*_i$) by applying Maximum Likelihood Factor Analysis to fit the equation below and solve for the main-diagonal values:

– $M^* = D^* D^{*\prime}$

– the relative magnitude of the eigenvalues ($\lambda_1 > 3\lambda_2$) implies a single-factor solution

$D^*_i$ are the loadings of the SMEs on the first factor:

– $D^*_i = v_{1i}\sqrt{\lambda_1}$, where $v_{1i}$ is the $i$-th element of the first eigenvector
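A rough sketch of this step, assuming $M^*$ has already been computed. Note that it swaps in a simpler technique than the one named above: a one-factor principal-axis iteration driven by power-iteration eigenvectors, not Maximum Likelihood Factor Analysis. The single-factor check compares $\lambda_1$ with the magnitude of the dominant eigenvalue left after removing the first factor. All names, and the competencies used to build the test matrix, are hypothetical.

```python
import math


def _matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def leading_eigen(a, iters=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of symmetric a."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = _matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, _matvec(a, v)))  # Rayleigh quotient
    return lam, v


def estimate_competencies(mstar, rounds=100):
    """Fit M* ~ D* D*' with a one-factor principal-axis iteration.

    The unknown diagonal is seeded with each row's largest off-diagonal
    value, then repeatedly reset to the squared first-factor loadings.
    Returns (competencies, lambda1, lambda2).
    """
    n = len(mstar)
    a = [row[:] for row in mstar]
    for i in range(n):
        a[i][i] = max(a[i][j] for j in range(n) if j != i)
    for _ in range(rounds):
        lam, v = leading_eigen(a)
        for i in range(n):
            a[i][i] = lam * v[i] ** 2  # squared loading D*_i^2
    lam1, v1 = leading_eigen(a)
    # D*_i = v_1i * sqrt(lambda_1)
    competencies = [vi * math.sqrt(max(lam1, 0.0)) for vi in v1]
    # Remove the first factor; the next dominant eigenvalue plays lambda_2.
    deflated = [[a[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]
    lam2, _ = leading_eigen(deflated)
    return competencies, lam1, lam2


# Hypothetical competencies; an exact one-factor M* built from them.
true_d = [0.9, 0.7, 0.5, 0.8]
mstar = [[0.0 if i == j else true_d[i] * true_d[j] for j in range(4)] for i in range(4)]
d_hat, lam1, lam2 = estimate_competencies(mstar)
single_factor = lam1 > 3 * abs(lam2)
```

Because the test matrix is exactly rank one off the diagonal, the recovered loadings should reproduce the hypothetical competencies. Real panel data would leave a nonzero $\lambda_2$.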

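Estimated competency values ($D^*_i$) and the profile of responses for item $k$ ($X_{ik,l}$, which is 1 if SME $i$ gave answer $l$ to item $k$ and 0 otherwise) are used to compute Bayesian a posteriori probabilities for each possible answer. With $Z_k$ denoting the correct answer to item $k$, the probability of the observed response profile, given that answer $l$ is the best or “correct” one, is:

– $\Pr(\langle X_{ik} \rangle_{i=1}^{N} \mid Z_k = l) = \prod_{i=1}^{N} \left[D^*_i + \frac{1 - D^*_i}{L}\right]^{X_{ik,l}} \left[\frac{(1 - D^*_i)(L - 1)}{L}\right]^{1 - X_{ik,l}}$

where $L$ is the number of possible answers. Assuming equal prior probabilities for the $L$ answers, the posterior probability that answer $l$ is correct is this likelihood divided by its sum over all $L$ answers.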
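The likelihood and the Bayes step can be sketched for a single item as follows. This is a minimal illustration that assumes equal priors over the answers; the function name, answers and competency values are hypothetical.

```python
def answer_posteriors(answers, competencies, options):
    """Posterior probability that each option is the correct answer to one item.

    answers[i] is SME i's answer, competencies[i] is D*_i, and options lists
    the L possible answers. Priors over the options are assumed equal, so the
    posterior is the likelihood normalized over all options.
    """
    L = len(options)
    likelihood = {}
    for option in options:
        p = 1.0
        for x, d in zip(answers, competencies):
            if x == option:  # X_ik,l = 1
                p *= d + (1 - d) / L
            else:            # X_ik,l = 0
                p *= (1 - d) * (L - 1) / L
        likelihood[option] = p
    total = sum(likelihood.values())
    return {option: p / total for option, p in likelihood.items()}


# Two fairly competent SMEs choose "a"; a weak SME chooses "b".
posteriors = answer_posteriors(["a", "a", "b"], [0.8, 0.6, 0.3], ["a", "b", "c"])
```

Weighting by competency is what lets the two stronger SMEs outvote the weaker one even though the vote is only 2 to 1.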

Schemer Knowledge Validation Services

Knowledge-Based Communications Interface

Structured Collaboration and Advice Network

• User’s relation to other SMEs

• Most similar point-of-view

• Most different point-of-view

• Someone a bit more knowledgeable

• Gurus

• Novel thinkers

Information Routing

• Supports/challenges one’s point-of-view

• Supports/challenges the consensus point-of-view

SME Contact Data

• Email services

• Meeting services

• Other plug-ins
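The slide does not say how these relations are derived. One hypothetical way to fill several of them from consensus-analysis output is sketched below. The selection rules (corrected agreement $M^*_{ij}$ for similarity, competency $D^*_i$ for knowledge) are assumptions, and "novel thinkers" and information routing are not modeled.

```python
def advice_network(user, mstar, competencies):
    """Suggest contacts for one SME using hypothetical selection rules.

    mstar[i][j]     -- corrected agreement between SMEs i and j (M*_ij)
    competencies[i] -- estimated competency of SME i (D*_i)
    """
    others = [j for j in range(len(mstar)) if j != user]
    more_knowledgeable = [j for j in others if competencies[j] > competencies[user]]
    return {
        "most_similar": max(others, key=lambda j: mstar[user][j]),
        "most_different": min(others, key=lambda j: mstar[user][j]),
        # the least competent of the SMEs who are more competent than the user
        "a_bit_more_knowledgeable": (
            min(more_knowledgeable, key=lambda j: competencies[j])
            if more_knowledgeable else None
        ),
        "guru": max(others, key=lambda j: competencies[j]),
    }


# Hypothetical panel of four SMEs with a one-factor agreement matrix.
d = [0.9, 0.7, 0.5, 0.8]
mstar = [[0.0 if i == j else d[i] * d[j] for j in range(4)] for i in range(4)]
contacts = advice_network(1, mstar, d)
```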

Latent Semantic Indexing (LSI): What is it?

[Figure: Standard Vector Space Model (ndims = nterms) vs. Reduced LSI Vector Space Model (ndims << nterms); example terms “memory” and “computer”; axis “LSI Dimension 1”]

LSI: How Does It Work?

Analyze training collection of documents
– throw out stop words and mark-up
– count frequencies of words in each document

Compute term-document matrix
– store word counts as entries in a matrix
– apply appropriate weighting, e.g., log-entropy, to entries

Compute LSI vector space
– reduce term-document matrix with Singular Value Decomposition (SVD)

Fold new documents into LSI vector space
– document vector computed from weighted sum of its term vectors

Compute vector for query (“pseudo-document”)
– query vector computed from weighted sum of its term vectors

Search vector space for semantically close term/document vectors
– compute cosine of angle between query and other vectors
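The steps above can be sketched end to end in dependency-free Python. This is a toy under stated assumptions, not a production LSI: the stop-word list and regex tokenizer are illustrative, the SVD is replaced by power iteration on $AA^T$ for the top $k$ left singular vectors, and documents and queries are folded in by projecting their weighted term vectors onto $U_k$ without $\Sigma_k^{-1}$ rescaling (so a folded-in training document equals its row of $V_k\Sigma_k$). All function names are hypothetical.

```python
import math
import random
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "is"}  # illustrative


def tokenize(text):
    text = re.sub(r"<[^>]+>", " ", text)  # throw out mark-up
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def count_terms(text):
    counts = {}
    for w in tokenize(text):
        counts[w] = counts.get(w, 0) + 1
    return counts


def log_entropy(docs):
    """Term-document matrix with log-entropy weights (needs >= 2 documents)."""
    counts = [count_terms(d) for d in docs]
    terms = sorted(set().union(*counts))
    n = len(docs)
    rows, global_w = [], {}
    for t in terms:
        tf = [c.get(t, 0) for c in counts]
        gf = sum(tf)
        entropy = sum((f / gf) * math.log(f / gf) for f in tf if f > 0)
        g = 1.0 + entropy / math.log(n)  # 1 if in one document, lower if spread out
        global_w[t] = g
        rows.append([g * math.log(1 + f) for f in tf])
    return terms, rows, global_w


def left_singular_vectors(rows, k, iters=500, seed=0):
    """Top-k columns of U via power iteration on A A^T (stand-in for an SVD)."""
    aat = [[sum(x * y for x, y in zip(ri, rj)) for rj in rows] for ri in rows]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in rows]
        for _ in range(iters):
            w = [sum(a * x for a, x in zip(row, v)) for row in aat]
            for b in basis:  # stay orthogonal to vectors already found
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def fold_in(weights, terms, basis):
    """Project a weighted term vector onto U_k: a weighted sum of term vectors."""
    index = {t: i for i, t in enumerate(terms)}
    return [sum(w * u[index[t]] for t, w in weights.items() if t in index) for u in basis]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_documents(docs, query, k=2):
    """Document indices ordered by cosine with the query in a k-dim LSI space."""
    terms, rows, global_w = log_entropy(docs)
    basis = left_singular_vectors(rows, k)
    doc_vecs = [fold_in({t: rows[i][j] for i, t in enumerate(terms)}, terms, basis)
                for j in range(len(docs))]
    q_weights = {t: global_w.get(t, 0.0) * math.log(1 + f)
                 for t, f in count_terms(query).items()}
    q_vec = fold_in(q_weights, terms, basis)
    return sorted(range(len(docs)), key=lambda j: cosine(q_vec, doc_vecs[j]), reverse=True)


DOCS = [
    "computer memory chip silicon",
    "silicon chip wafer memory",
    "potato chip corn snack",
    "corn sugar potato snack",
]
ranking = rank_documents(DOCS, "memory silicon", k=2)
```

With two dimensions, the toy collection separates into a computing cluster and a food cluster, so the two computing documents rank first for the query.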

Scalability: Large Document Collections and Polysemy

[Figure: two panels titled “Many Undifferentiated Conceptual Domains/COIs”, with axes Dimension 1 and Dimension 2; the polysemous terms “chip” and “wafer” are plotted together with silicon, valley and copper, and with potato, corn and sugar]

LSI: Ongoing Work

Distributed LSI
– Needed for LSI to scale to massive document collections

Adopts a “divide and conquer” approach
– Sort documents by conceptual domain
  recognizes documents created for different COIs
  creates more semantically homogeneous subcollections
  applies cluster analysis, e.g., bisecting K-means
– Compute independent LSI vector spaces for each subcollection
  more parsimonious representations of concept domains or contexts
– Compute similarity measures between spaces
  construct graphs from terms shared by two vector spaces
  compute similarity between these two graphs
– Discover appropriate search vector spaces for a query
  cosine calculations (as before)
  relevance feedback (as before)
  query expansion

Visualizations to explore semantic context for a query in different LSI vector spaces
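Bisecting K-means, named above as one clustering option for sorting documents into subcollections, can be sketched as follows. The slide does not specify the details, so the choices here are assumptions: Euclidean distance, always splitting the largest cluster, and deterministic farthest-point seeding for each 2-means split. The input is any list of document vectors (for example, LSI document coordinates); the names are hypothetical.

```python
def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def two_means(points, iters=20):
    """Split points into two index groups with plain 2-means."""
    # Seeds: the point farthest from points[0], then the point farthest from that.
    a = max(points, key=lambda p: _dist2(p, points[0]))
    b = max(points, key=lambda p: _dist2(p, a))
    centers = [a, b]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for idx, p in enumerate(points):
            nearest = 0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            groups[nearest].append(idx)
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. identical points)
        centers = [_mean([points[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(points, k):
    """Repeatedly bisect the largest cluster until there are k clusters.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        sub = [points[i] for i in largest]
        g0, g1 = two_means(sub) if len(largest) > 1 else ([], [])
        if not g0 or not g1:
            clusters.append(largest)  # cannot split further
            break
        clusters += [[largest[i] for i in g0], [largest[i] for i in g1]]
    return clusters


# Two hypothetical groups of document vectors.
docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
clusters = bisecting_kmeans(docs, 2)
```

Always bisecting the largest cluster tends to keep subcollection sizes balanced, which suits computing one LSI space per subcollection.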

DLSI: Experiments with NSF-Movie Review Corpus

Vector Space       Dimensions   Non-stop Terms   Documents
NSF-Geology               298           25,963       3,255
NSF-Engineering           229           30,247       3,057
NSF-Biology               224           38,176       3,645
Movie Reviews             239           70,411       3,557
All Documents             282          122,685      13,514

DLSI: The Context of Term Meaning

Graph of semantic relationships between top five terms retrieved for the query {travel, center, earth} from the vector space containing only NSF geology abstracts.

Graph of semantic relationships between top five terms retrieved for the query {travel, center, earth} from the vector space containing only Ebert movie reviews.

Graph of semantic relationships between top five terms retrieved for the query {travel, center, earth} from the vector space containing all documents.

[Figure: the three term graphs; recovered node labels include center, research, earth, reports, travel, alien, science-fiction/sci-fi, cooperative, university and center/center’s]
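A graph like those described above could be built within one vector space roughly as follows. This is a hypothetical sketch: it assumes the nodes are the top-n terms by cosine with the query vector and that an edge joins two nodes whose term vectors have a cosine at or above a chosen threshold. The slides do not state how edges were defined, and the toy 2-D vectors are invented.

```python
import math
from itertools import combinations


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


def query_term_graph(term_vectors, query_vector, top_n=5, threshold=0.5):
    """Nodes: the top_n terms closest to the query in one vector space.
    Edges: (term, term, cosine) for node pairs at or above threshold."""
    ranked = sorted(term_vectors,
                    key=lambda t: cosine(term_vectors[t], query_vector),
                    reverse=True)
    nodes = ranked[:top_n]
    edges = []
    for s, t in combinations(nodes, 2):
        c = cosine(term_vectors[s], term_vectors[t])
        if c >= threshold:
            edges.append((s, t, c))
    return nodes, edges


# Invented 2-D term vectors standing in for one LSI space.
terms = {
    "earth": [1.0, 0.0],
    "center": [0.9, 0.1],
    "travel": [0.8, 0.3],
    "alien": [0.0, 1.0],
    "sugar": [-1.0, 0.0],
}
nodes, edges = query_term_graph(terms, [1.0, 0.1], top_n=3, threshold=0.95)
```

Running the same query against term vectors from different spaces (geology abstracts, movie reviews, all documents) would give different node sets, which is the contrast the three graphs illustrate.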
