Transcript
Page 1: Fusepool Machine Learning Framework

Fusepool Machine Learning Framework
June 25th, Brussels

Page 2: Fusepool Machine Learning Framework

Fusepool

Structured Content

Visualization

Enable personalized software

Page 3: Fusepool Machine Learning Framework

Outline

Introduction to adaptive interfaces: source refinement, document labeling, link prediction, adaptive layout

Simple Machine Learning: Listen-Update-Predict (LUP)

LUP in detail for document labelling

Predictive queries

Page 4: Fusepool Machine Learning Framework

Adaptive interfaces

Guillaume Bouchard (Xerox)

Page 5: Fusepool Machine Learning Framework

Customization/Contextualization of interfaces

Known and accepted by big internet companies

Not easy to implement for SMEs

Page 6: Fusepool Machine Learning Framework

Annotation tools

● To manage large knowledge bases, there is a need for efficient interfaces for annotators

● Web2.0 companies are investigating these tools

● Mixed initiative
o A learning algorithm + human interface

● Remark: a user can be an annotator for some time

Page 7: Fusepool Machine Learning Framework

Supervised automation: Introduction

Challenge
● LOD provides a huge amount of data
● Hard to organize

Goal
● Streamline KB cleaning and management through implicit and explicit feedback

Specifications
● Easy tagging of documents
● Near real-time prediction

Page 8: Fusepool Machine Learning Framework

Adaptive components in Fusepool

Document category prediction

Entity labeling

Source refinement (re-ranking based on previous user clicks)

Adaptive Layout

Page 9: Fusepool Machine Learning Framework

Simple Machine Learning: Listen-Update-Predict (LUP)

Guillaume Bouchard (Xerox)

Page 10: Fusepool Machine Learning Framework

Motivation

● Adaptive systems
● Many systems use machine learning algorithms as internal components
● The interaction between raw data, annotations, algorithms and predictions is not simple:
• Data: large and distributed (the 3 Vs: Velocity, Variety, Volume)
• Algorithms: multiple possible algorithms for the same task, slow training/inference
• Visualization: must carry the uncertainty about data, annotations and predictions

● Common problems:
• Confusion between predictions and data
• Models not automatically updated (models must be "re-trained" manually)
• No simple way to test new algorithms
• Annotations not shared across models in the same system
• Too few annotations in specific domains (no principled way to gather new annotations)

Page 11: Fusepool Machine Learning Framework

Prior art

• Patterns (and Anti-Patterns) for Developing Machine Learning Systems. SysML 2008
o https://www.usenix.org/legacy/event/sysml08/tech/rios_talk.pdf

• The Agent Learning Pattern: Implementing ML algorithms in multiagent systems
o http://www.cs.cmu.edu/~alberto/papers/LearningPatternSugarLoaf.pdf

• Gestalt, a general-purpose integrated development environment designed for the application of machine learning
o Kayur Patel (University of Washington)
o http://www.acm.org/uist/archive/adjunct/2010/pdf/doctoral_consortium/p355.pdf

• Scikit-learn. Three complementary interfaces: Estimator, Predictor, Transformer
o http://hal.inria.fr/docs/00/85/65/11/PDF/paper.pdf

• Infer.NET: Probabilistic programming. Compilation of machine learning codes
o http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-mbml-2012.pdf

• Never-Ending Language Learning (NELL). The closest to our work but focused on language
o www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf

Page 12: Fusepool Machine Learning Framework

Never Ending Language Learning

● Intelligent computer agent

● Runs forever. Every day:

1. extract, or read, information from the web

2. learn to perform this task better

● Carlson, Betteridge, Kisiel, Settles, Hruschka and Mitchell (2010) give the design principles for such an agent

Page 13: Fusepool Machine Learning Framework

Machine learning process

Page 14: Fusepool Machine Learning Framework

LUPI Module overview

Listen: Gets notified when new annotations arrive

Update: Processes annotations and updates learning models

Predict: Exposes a prediction service available to other components

Investigate: Actively asks for new annotations
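
The Listen, Update and Predict roles above can be sketched with a toy module. This is only an illustration under stated assumptions: the class `ToyLupModule`, its method names and the frequency-count "model" are hypothetical and are not the Fusepool API. Investigate is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy module: names and signatures are illustrative only,
// they are NOT taken from the Fusepool code base.
final class ToyLupModule {
    // Model state: how often each label has appeared in annotations so far.
    private final Map<String, Integer> labelCounts = new HashMap<>();
    private int total = 0;

    // Listen: entry point invoked when a new annotation arrives.
    void onAnnotation(String label) {
        update(label);
    }

    // Update: fold the new annotation into the model incrementally.
    private void update(String label) {
        labelCounts.merge(label, 1, Integer::sum);
        total++;
    }

    // Predict: relative frequency of the label among annotations seen so far.
    double predict(String label) {
        if (total == 0) {
            return 0.0;
        }
        return labelCounts.getOrDefault(label, 0) / (double) total;
    }
}
```

The point of the separation is that callers of `predict` never touch the annotation stream, and the model is refreshed as a side effect of listening rather than by a manual re-training step.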

Page 15: Fusepool Machine Learning Framework

LUP modules are monitored by the Fusepool main platform

Page 16: Fusepool Machine Learning Framework

LUP Module Implementation

● LUPEngine is a Java interface
● Location: com.xerox.services.LUPEngine
o + getGraphListener(...);
o + graphChanged(...);
o + updateModels(...);
o + predict(...);

Page 17: Fusepool Machine Learning Framework
Page 18: Fusepool Machine Learning Framework

Guillaume Bouchard (Xerox)

Page 19: Fusepool Machine Learning Framework

Supervised automation: Follow the LUP

Listen
● Users give labels to documents in the GUI
● Labels stored in annotation store

Update
● Optimize the model with latest annotations
● Warm start machine learning algorithms

Predict
● Real-time prediction based on updated model
● Visible in the GUI
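
The warm start mentioned in the Update step can be illustrated as follows. This is a minimal sketch, assuming a binary logistic-regression model trained by stochastic gradient descent; the class `WarmStartClassifier`, its fields and the learning rate are hypothetical and do not describe the actual Fusepool or Xerox services.

```java
// Hypothetical sketch of a warm-started online learner (not Fusepool code).
final class WarmStartClassifier {
    private final double[] weights;   // kept between updates: this is the warm start
    private final double learningRate;

    WarmStartClassifier(int dimension, double learningRate) {
        this.weights = new double[dimension];
        this.learningRate = learningRate;
    }

    // Update: one SGD step on the logistic loss, starting from the current
    // weights instead of re-training from scratch. label must be 0 or 1.
    void update(double[] features, int label) {
        double error = predict(features) - label;
        for (int i = 0; i < weights.length; i++) {
            weights[i] -= learningRate * error * features[i];
        }
    }

    // Predict: probability that the label is 1 under the current weights.
    double predict(double[] features) {
        double z = 0.0;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```

Because each annotation only triggers a cheap incremental step, predictions can reflect the latest labels in near real time.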

Page 20: Fusepool Machine Learning Framework

Supervised automation: Architecture

Components Process

Page 21: Fusepool Machine Learning Framework

Supervised automation: Xerox web services

Update and prediction using REST interface

Scaling up prediction to huge datasets

Page 22: Fusepool Machine Learning Framework

Listen

private class MyListener implements GraphListener {
    /**
     * Listener method: called when matching modifications are detected on
     * the Annostore. This method triggers the learning process, using
     * the updateModels(HashMap<String,String> params) method.
     */
    public void graphChanged(List<GraphEvent> list) {
        annostore = tcManager.getMGraph(ANNOTATION_GRAPH_NAME);
        for (GraphEvent e : list) {
            log.info("New #MyKindOfAnnotation !");
            HashMap<String, String> params = new HashMap<String, String>();
            // 1.) Accessing the target of the annotation
            Iterator<Triple> it = annostore.filter(e.getTriple().getSubject(),
                    new UriRef("http://www.w3.org/ns/oa#hasTarget"), null);
            // 2.) Accessing the content as text of the target,
            //     e.g. the new word to insert into the dictionary
            Resource target = it.next().getObject();
            it = annostore.filter((NonLiteral) target,
                    new UriRef("http://www.w3.org/2011/content#chars"), null);
            String newWord = it.next().getObject().toString();
            params.put("newWord", newWord);
            updateModels(params);
        }
    }
}

Page 23: Fusepool Machine Learning Framework

Update

/**
 * This method updates the learning models.
 */
public void updateModels(HashMap<String, String> params) {
    String newWord = params.get("newWord");
    log.info("Adding " + newWord + " to dictionary");
    myDictionnary.add(newWord);
}

Page 24: Fusepool Machine Learning Framework

Predict

HashMap<String, String> params = new HashMap<String, String>();
String docURI = "<http://fusepool.info/doc/pmc/2751467>";
// Build the parameters to pass to the LUP3.4 module via the predictionHub
params.put("docURI", docURI);
// Call the LUP34.predict(...) method via the predictionHub.predict(...) method
String predictedLabels = predictionHub.predict("LUP34", params);
// Dump the result of the prediction
log.info(predictedLabels);
// e.g. "tissue__0.713##sodium__0.09135##English__0.016"
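
The sample result above appears to encode each prediction as a label and a score joined by "__", with entries separated by "##". A minimal sketch of a client-side parser, assuming exactly that layout; the class `PredictionParser` is hypothetical and not part of the Fusepool API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Fusepool code): parses the "label__score##label__score"
// layout seen in the sample output. The layout itself is an assumption.
final class PredictionParser {
    static Map<String, Double> parse(String predictedLabels) {
        // LinkedHashMap preserves the order in which the labels were returned.
        Map<String, Double> scores = new LinkedHashMap<>();
        if (predictedLabels == null || predictedLabels.isEmpty()) {
            return scores;
        }
        for (String entry : predictedLabels.split("##")) {
            int separator = entry.lastIndexOf("__");
            if (separator < 0) {
                continue; // skip entries that do not match the assumed layout
            }
            String label = entry.substring(0, separator);
            double score = Double.parseDouble(entry.substring(separator + 2));
            scores.put(label, score);
        }
        return scores;
    }
}
```

Splitting on the last "__" keeps labels that themselves contain underscores intact.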

Page 25: Fusepool Machine Learning Framework

Supervised automation: Multi-task learning services

● Better prediction based on multi-task algorithm with label embedding

● Efficient learning algorithms
o Alternating optimization
o Stochastic Gradient Descent

● Efficient storage based on Cassandra

Page 26: Fusepool Machine Learning Framework

Supervised automation: Sequence diagram

1. The GUI inserts annotations

2. The Listener calls the LUP3.4 Module

3. The LUP calls the REST API

4. Then the information flows back when doing prediction

Page 27: Fusepool Machine Learning Framework

Supervised automation: Properly tested interface

Corpus       20 Newsgroups          WebKB                  Cade
Tolerance    1      2      3        1      2      3        1      2
Rank = 20    0.152  0.074  0.05     0.15   0.055  0.035    0.348  0.222
Rank = 50    0.16   0.072  0.052    0.2    0.085  0.04     0.386  0.266
Rank = 100   0.256  0.166  0.126    0.335  0.18   0.11     0.134  0.072

Page 28: Fusepool Machine Learning Framework

Predictive queries

Guillaume Bouchard (Xerox)

Page 29: Fusepool Machine Learning Framework

Motivation for predictive queries

Most prediction problems can be expressed as a query on “missing” information.

SELECT ?n WHERE
<?d, hasLabel, “WellWritten”>
<?p, isAuthor, ?d>
<?p, hasName, ?n>

Page 30: Fusepool Machine Learning Framework

Semantic Search API: Predictive SPARQL

Core idea: learn a model on the KB. Now we can query missing data!
● SPARQL is a standard query language for semantic data
● Predictive SPARQL: generalization to probabilistic models

Page 31: Fusepool Machine Learning Framework

Semantic Search API: Predictive SPARQL example

Page 32: Fusepool Machine Learning Framework

Semantic Search API: Predictive model

● Use of tensor factorization methods

● Tensor = generalization of matrices

● Scalable probabilistic models

● Based on the RESCAL approximation:

$$T_{ikj} \approx e_i^{\top} R_k \, e_j$$

where:
o $e_i$ and $e_j$ are the latent vectors of entities $i$ and $j$
o $R_k$ is the matrix of relation $k$
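
The bilinear score in the approximation above can be computed directly once the entity vectors and the relation matrix are known. A minimal sketch, assuming dense arrays and already-learned factors; the class `RescalScore` is hypothetical, and learning the factors themselves is not shown.

```java
// Hypothetical sketch (not Fusepool code): evaluates the bilinear score
// e_i^T R_k e_j for one (entity i, relation k, entity j) triple.
final class RescalScore {
    static double score(double[] ei, double[][] rk, double[] ej) {
        double total = 0.0;
        for (int a = 0; a < ei.length; a++) {
            for (int b = 0; b < ej.length; b++) {
                // Accumulate ei[a] * Rk[a][b] * ej[b] over all latent dimensions.
                total += ei[a] * rk[a][b] * ej[b];
            }
        }
        return total;
    }
}
```

A higher score suggests that the triple is more plausible under the model, which is what allows a query to rank candidate values for missing entries.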

Page 33: Fusepool Machine Learning Framework

Predictive SPARQL example

Page 34: Fusepool Machine Learning Framework

Conclusion

Guillaume Bouchard (Xerox)

Page 35: Fusepool Machine Learning Framework

Main achievements

● LUP: Listen-Update-Predict is a design pattern that provides software engineering best practices

● Predictive SPARQL: A framework for predictive queries on RDF data

Page 36: Fusepool Machine Learning Framework

Future of Fusepool

Xerox is using Fusepool for exploring and organizing its customer KB

