Fusepool Machine Learning Framework
June 25th, Brussels
Fusepool
Structured Content
Visualization
Enable personalized software
Outline
● Introduction to adaptive interfaces
  o Source refinement
  o Document labeling
  o Link prediction
  o Adaptive layout
● Simple Machine Learning: Listen-Update-Predict (LUP)
● LUP in detail for document labelling
● Predictive queries
Adaptive interfaces
Guillaume Bouchard (Xerox)
Customization/Contextualization of interfaces
Known and accepted by big internet companies
Not easy to implement for SMEs
Annotation tools
● To manage large knowledge bases, there is a need for efficient interfaces for annotators
● Web2.0 companies are investigating these tools
● Mixed initiative
  o A learning algorithm + human interface
● Remark: a user can be an annotator for some time
Supervised automation: Introduction
Challenge
● LOD provides a huge amount of data
● Hard to organize
Goal
● Streamline KB cleaning and management through implicit and explicit feedback
Specifications
● Easy tagging of documents
● Near real-time prediction
Adaptive components in Fusepool
Document category prediction
Entity labeling
Source refinement (re-ranking based on previous user clicks)
Adaptive Layout
Simple Machine Learning: Listen-Update-Predict (LUP)
Motivation
● Adaptive systems
● Many systems use machine learning algorithms as internal components
● The interaction between raw data, annotations, algorithms and predictions is not simple:
  • Data: large and distributed (the 3 Vs: Velocity, Variety, Volume)
  • Algorithms: multiple possible algorithms for the same task, slow training/inference
  • Visualization: must carry the uncertainty about data, annotations and predictions
● Common problems:
  • Confusion between predictions and data
  • Models not automatically updated (models are "re-trained" manually)
  • No simple way to test new algorithms
  • Annotations not shared across models in the same system
  • Too few annotations in specific domains (no principled way to gather new annotations)
Prior art
• Patterns (and Anti-Patterns) for Developing Machine Learning Systems. SysML 2008
  • https://www.usenix.org/legacy/event/sysml08/tech/rios_talk.pdf
• The Agent Learning Pattern: Implementing ML algorithms in multiagent systems
  • http://www.cs.cmu.edu/~alberto/papers/LearningPatternSugarLoaf.pdf
• Gestalt, a general-purpose integrated development environment designed for the application of machine learning
  • Kayur Patel (University of Washington)
  • http://www.acm.org/uist/archive/adjunct/2010/pdf/doctoral_consortium/p355.pdf
• Scikit-learn. Three complementary interfaces: estimator, predictor, transformer
  • http://hal.inria.fr/docs/00/85/65/11/PDF/paper.pdf
• Infer.NET: probabilistic programming. Compilation of machine learning code
  • http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-mbml-2012.pdf
• Never-Ending Language Learning (NELL). The closest to our work, but focused on language
  • www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf
Never Ending Language Learning
● Intelligent computer agent
● Runs forever. Every day:
1. extract, or read, information from the web
2. learn to perform this task better
● Carlson, Betteridge, Kisiel, Settles, Hruschka and Mitchell (2010) give the design principles for such an agent
Machine learning process
LUPI Module overview
Listen
● Gets notified when new annotations arrive
Update
● Processes annotations and updates the learning models
Predict
● Exposes a prediction service available to other components
Investigate
● Actively asks for new annotations
LUP modules are monitored by Fusepool main platform
LUP Module Implementation
● LUPEngine is a Java interface
● Location: com.xerox.services.LUPEngine
  o + getGraphListener(...);
  o + graphChanged(...);
  o + updateModels(...);
  o + predict(...);
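To make the pattern concrete, here is a minimal, self-contained sketch of a Listen-Update-Predict component. The slide elides the LUPEngine signatures, so the `SimpleLupEngine` interface, its `onAnnotation` method, and the `MajorityLabelEngine` class below are hypothetical names chosen for illustration. They are not the actual `com.xerox.services.LUPEngine` API, and the "model" is only a label counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified contract; NOT the real LUPEngine signatures. */
interface SimpleLupEngine {
    void onAnnotation(Map<String, String> annotation); // Listen
    void updateModels(Map<String, String> params);      // Update
    String predict(Map<String, String> params);         // Predict
}

/** Toy engine: predicts the label that has been annotated most often so far. */
final class MajorityLabelEngine implements SimpleLupEngine {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void onAnnotation(Map<String, String> annotation) {
        // Listen: every incoming annotation immediately triggers an update
        updateModels(annotation);
    }

    @Override
    public void updateModels(Map<String, String> params) {
        // Update: the "model" is just a count per label
        String label = params.get("label");
        if (label != null) {
            counts.merge(label, 1, Integer::sum);
        }
    }

    @Override
    public String predict(Map<String, String> params) {
        // Predict: return the most frequent label, or null before any annotation
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

The point of the sketch is the separation of concerns: callers only ever push annotations in and ask for predictions, so the learning algorithm behind `updateModels` can be swapped without touching the rest of the system.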
Supervised automation: Follow the LUP
Listen
● Users give labels to documents in the GUI
● Labels stored in annotation store
Update
● Optimize the model with latest annotations
● Warm start machine learning algorithms
Predict
● Real time prediction based on updated model
● Visible in the GUI
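The "warm start" idea in the Update step can be illustrated with a small online learner whose weights persist between updates, so each new annotation refines the current model instead of triggering a full retraining. This is only a sketch under stated assumptions: the slides do not say which algorithm Fusepool uses here, so a plain perceptron over bag-of-words features stands in for it, and the class name `WarmStartPerceptron` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Online perceptron for a single label (illustrative stand-in, not the Fusepool model). */
final class WarmStartPerceptron {
    // Weights persist between calls, so each update starts from the previous model
    private final Map<String, Double> weights = new HashMap<>();

    /** Score of a document: sum of the weights of its words. */
    double score(String[] words) {
        double s = 0.0;
        for (String w : words) {
            s += weights.getOrDefault(w, 0.0);
        }
        return s;
    }

    /** Predict: true when the label is predicted to apply to the document. */
    boolean predict(String[] words) {
        return score(words) > 0.0;
    }

    /** Update: one perceptron step on the newest annotation, no retraining from scratch. */
    void update(String[] words, boolean hasLabel) {
        if (predict(words) == hasLabel) {
            return; // already correct, keep the current weights
        }
        double delta = hasLabel ? 1.0 : -1.0;
        for (String w : words) {
            weights.merge(w, delta, Double::sum);
        }
    }
}
```

Because each update costs time proportional to the length of one document, predictions can reflect a new label almost immediately, which is what the near real-time requirement asks for.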
Supervised automation: Architecture
Components Process
Supervised automation: Xerox web services
Update and prediction using REST interface
Scaling up prediction to huge datasets
Listen

/**
 * Listener: graphChanged(...) is called when matching modifications are
 * detected on the Annostore. It triggers the learning process through
 * updateModels(HashMap<String, String> params).
 */
private class MyListener implements GraphListener {
    public void graphChanged(List<GraphEvent> list) {
        annostore = tcManager.getMGraph(ANNOTATION_GRAPH_NAME);
        for (GraphEvent e : list) {
            log.info("New #MyKindOfAnnotation !");
            HashMap<String, String> params = new HashMap<String, String>();
            // 1.) Access the target of the annotation
            Iterator<Triple> it = annostore.filter(e.getTriple().getSubject(),
                    new UriRef("http://www.w3.org/ns/oa#hasTarget"), null);
            if (!it.hasNext()) {
                continue; // annotation without a target: nothing to learn from
            }
            // 2.) Access the text content of the target,
            //     e.g. the new word to insert into the dictionary
            Resource target = it.next().getObject();
            it = annostore.filter((NonLiteral) target,
                    new UriRef("http://www.w3.org/2011/content#chars"), null);
            if (!it.hasNext()) {
                continue; // target without text content
            }
            String newWord = it.next().getObject().toString();
            params.put("newWord", newWord);
            updateModels(params);
        }
    }
}
Update
/**
 * Updates the learning models.
 */
public void updateModels(HashMap<String, String> params) {
    String newWord = params.get("newWord");
    log.info("Adding " + newWord + " to dictionary");
    myDictionnary.add(newWord);
}
Predict
HashMap<String, String> params = new HashMap<String, String>();
String docURI = "<http://fusepool.info/doc/pmc/2751467>";
// Build the parameters passed to the LUP3.4 module via the predictionHub
params.put("docURI", docURI);
// Call LUP34.predict(...) through predictionHub.predict(...)
String predictedLabels = predictionHub.predict("LUP34", params);
// Dump the result of the prediction, e.g.
// "tissue__0.713##sodium__0.09135##English__0.016"
log.info(predictedLabels);
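A consumer of this service needs to turn the returned string into label/score pairs. The sketch below assumes, based only on the single example output above, that entries are separated by "##" and that each entry is a label and a score joined by "__". The class name `PredictionParser` is hypothetical and not part of the Fusepool code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Parses prediction strings such as "tissue__0.713##sodium__0.09135" (assumed format). */
final class PredictionParser {
    /** Returns label -> score, in the order in which the labels appear in the input. */
    static Map<String, Double> parse(String predictedLabels) {
        Map<String, Double> scores = new LinkedHashMap<>();
        if (predictedLabels == null || predictedLabels.isEmpty()) {
            return scores;
        }
        for (String entry : predictedLabels.split("##")) {
            // Split on the last "__" so that labels containing "__" still parse
            int sep = entry.lastIndexOf("__");
            if (sep < 0) {
                continue; // skip malformed entries without a score
            }
            String label = entry.substring(0, sep);
            double score = Double.parseDouble(entry.substring(sep + 2));
            scores.put(label, score);
        }
        return scores;
    }
}
```

Calling `PredictionParser.parse(predictedLabels)` on the example output yields three entries, with "tissue" first. A non-numeric score raises `NumberFormatException`, which a caller may prefer to catch and log.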
Supervised automation: Multi-task learning services
● Better prediction based on multi-task algorithm with label embedding
● Efficient learning algorithms
  o Alternating optimization
  o Stochastic Gradient Descent
● Efficient storage based on Cassandra
Supervised automation: Sequence diagram
1. The GUI inserts annotations
2. The Listener calls the LUP3.4 Module
3. The LUP calls the REST API
4. Then the information flows back when doing prediction
Supervised automation: Properly tested interface
Corpus       20 Newsgroups          WebKB                  Cade
Tolerance    1      2      3        1      2      3        1      2
Rank = 20    0.152  0.074  0.05     0.15   0.055  0.035    0.348  0.222
Rank = 50    0.16   0.072  0.052    0.2    0.085  0.04     0.386  0.266
Rank = 100   0.256  0.166  0.126    0.335  0.18   0.11     0.134  0.072
Predictive queries
Motivation for predictive queries
Most prediction problems can be expressed as a query on "missing" information.
SELECT ?n WHERE
  <?d, hasLabel, "WellWritten">
  <?p, isAuthor, ?d>
  <?p, hasName, ?n>
Semantic Search API: Predictive SPARQL
Core idea: learn a model on the KB. Now we can query missing data!
● SPARQL is a standard query language for semantic data
● Predictive SPARQL: generalization to probabilistic models
Semantic Search API: Predictive SPARQL example
Semantic Search API: Predictive model
● Use of tensor factorization methods
● Tensor=generalization of matrices
● Scalable probabilistic models
● Based on the Rescal approximation:
$T_{ikj} \approx e_i^\top R_k e_j$
where:
  o $e_i$ and $e_j$ are the latent vectors of entities $i$ and $j$
  o $R_k$ is the relational matrix of relation $k$
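The Rescal approximation above scores a candidate triple (entity i, relation k, entity j) with a bilinear form. As a minimal sketch, assuming the entity vectors and the relational matrix have already been learned and are held as plain arrays, the score can be computed as follows. The class name `RescalScore` is hypothetical.

```java
/** Bilinear Rescal-style score for one candidate triple (illustrative sketch). */
final class RescalScore {
    /** Computes e_i^T R_k e_j for entity vectors ei, ej and relational matrix rk. */
    static double score(double[] ei, double[][] rk, double[] ej) {
        if (rk.length != ei.length) {
            throw new IllegalArgumentException("rk must have one row per component of ei");
        }
        double total = 0.0;
        for (int a = 0; a < ei.length; a++) {
            if (rk[a].length != ej.length) {
                throw new IllegalArgumentException("rk must have one column per component of ej");
            }
            for (int b = 0; b < ej.length; b++) {
                total += ei[a] * rk[a][b] * ej[b];
            }
        }
        return total;
    }
}
```

Since the relational matrix need not be symmetric, swapping the two entity vectors can change the score, which lets the model represent directed relations such as isAuthor.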
Predictive SPARQL example
Conclusion
Main achievements
● LUP: Listen-Update-Predict is a design pattern that provides software engineering best practices
● Predictive SPARQL: A framework for predictive queries on RDF data
Future of Fusepool
Xerox is using Fusepool for exploring and organizing its customer KB