arXiv:1703.03385v1 [cs.LG] 9 Mar 2017

Visual-Interactive Similarity Search for Complex Objects by Example of Soccer Player Analysis

Jürgen Bernard1, Christian Ritter1, David Sessler1, Matthias Zeppelzauer2, Jörn Kohlhammer1,3, and Dieter Fellner1,3

1Technische Universität Darmstadt, Darmstadt, Germany
2St. Pölten University of Applied Sciences, St. Pölten, Austria

3Fraunhofer Institute for Computer Graphics Research, IGD, Darmstadt, Germany
{juergen.bernard, christian.ritter, david.sessler, dieter.fellner}@gris.tu-darmstadt.de, [email protected],

[email protected]

Keywords: Information Visualization, Visual Analytics, Active Learning, Similarity Search, Similarity Learning, Distance Measures, Feature Selection, Complex Data Objects, Soccer Player Analysis, Information Retrieval

Abstract: The definition of similarity is a key prerequisite when analyzing complex data types in data mining, information retrieval, or machine learning. However, a meaningful definition is often hampered by the complexity of data objects and particularly by different notions of subjective similarity latent in targeted user groups. Taking the example of soccer players, we present a visual-interactive system that learns users' mental models of similarity. In a visual-interactive interface, users are able to label pairs of soccer players with respect to their subjective notion of similarity. Our proposed similarity model automatically learns the respective concept of similarity using an active learning strategy. A visual-interactive retrieval technique is provided to validate the model and to execute downstream retrieval tasks for soccer player analysis. The applicability of the approach is demonstrated with different evaluation strategies, including usage scenarios and cross-validation tests.

1 INTRODUCTION

The way similarity of data objects is defined and represented in an analytical system has a decisive influence on the results of the algorithmic workflow for downstream data analysis. From an algorithmic perspective, the notion of object similarity is often implemented with distance measures, which resemble an inverse relation to similarity. Many data mining approaches necessarily require the definition of distance measures, e.g., for conducting clustering or dimension reduction. In the same way, most information retrieval algorithms carry out indexing and retrieval tasks based on distance measures. Finally, the performance of many supervised and unsupervised machine learning methods depends on meaningful definitions of object similarity. The classical approach to the definition of object similarity includes the identification, extraction, and selection of relevant attributes (features), as well as the definition of a distance measure and, optionally, a mapping from distance to similarity. Furthermore, many real-world examples require additional steps in the algorithmic pipeline such as data cleansing or normalization. In practice, quality measures such as precision and recall are used to assess the quality of the similarity models and the classifiers built upon them.

In this work, we strengthen the connection between the notion of similarity of individual users and its adoption in the algorithmic definition of object similarity. Taking the example of soccer players from European soccer leagues, a manager may want to identify previously unknown soccer players matching a reference player, e.g., to occupy an important position in the team lineup. This is contrasted by a national coach who is also interested in selecting a good team. However, the national coach is independent of transfer fees and salaries, while his choice is limited to players of the respective nationality. The example sheds light on various remaining problems that many classical approaches are confronted with. First, in many approaches designers do not know beforehand which definition of object similarity is most meaningful. Second, many real-world approaches require multiple definitions of similarity to be usable for different users or user groups. Moreover, it is not even guaranteed that the notion of similarity of single users remains constant. Third, the example


Figure 1: Overview of the visual-interactive tool. Left: users are enabled to label the similarity between two soccer players (here: Radja Nainggolan and Marouane Fellaini, both from Belgium). The user's notion of similarity is propagated to the similarity learning model. Right: a visual search interface shows the model results (query: Dimitri Payet). The example resembles the similarity notion typical of a national coach: only players from the same country have very high similarity scores. Subsequently, the nearest neighbors of Dimitri Payet all come from France and have a similar field position.

of soccer players implicitly indicates that the definition of similarity becomes considerably more difficult for high-dimensional data. Finally, many real-world objects consist of mixed data, i.e., attributes of numerical, categorical, and binary type. However, most current approaches for similarity measurement are limited to numerical attributes.

We hypothesize that it is desirable to shift the definition of object similarity from an offline preprocessing routine to an integral part of future analysis systems. In this way, the individual notions of similarity of different users will be reflected more comprehensively. The precondition for the effectiveness of such an approach is a means that enables users to communicate their notion of similarity to the system. Logically, such a system requires the functionality to grasp and adopt the notion of similarity communicated by the user. Provided with such functionality, users are able to conduct various data analysis tasks relying on object similarity in a more dynamic and individual manner. This requirement shifts the definition of similarity towards active learning approaches. Active learning is a research field in the area of semi-supervised learning where machine learning models are trained with as little user feedback as possible while remaining as generalizable as possible. Beyond classical active learning, the research direction of this approach is towards visual-interactive learning, allowing users to give feedback for those objects they have precise knowledge about.

We present a visual-interactive learning system that learns the similarity of complex data objects on the basis of user feedback. The use case of soccer players serves as a relevant and intuitive example. Overall, this paper makes three primary contributions. First, we present a visual-interactive interface that enables users to select two soccer players and to submit feedback regarding their subjective similarity. The set of labeled pairs of players is depicted in a history visualization for lookup and reuse. Second, a machine learning model accepts the pairwise notions of similarity and learns a similarity model for the entire data set. An active learning model identifies player objects where user feedback would be most beneficial for the generalization of the learned model, and propagates them to a visual-interactive interface. Third, we present a visual-interactive retrieval interface enabling users to directly submit example soccer players to query for nearest neighbors. The interface combines both validation support and a downstream application of model results. The results of different types of evaluation techniques particularly assess the efficiency of the approach. In many cases it takes only five labeled pairs of players to learn a robust and meaningful model.

The remaining paper is organized as follows. Section 2 discusses related work. We present our approach in Section 3. The evaluation results are described in Section 4, followed by a discussion in Section 5 and the conclusion in Section 6.

This publication appeared in the proceedings of the 8th International Conference on Information Visualization Theory and Applications (IVAPP 2017) and received the Best Paper Award.

2 RELATED WORK

The contributions of this work are based on two core building blocks, i.e., visual-interactive interfaces (information visualization, visual analytics) and algorithmic similarity modeling (metric learning). We provide a subsection of related work for both fields.

2.1 Visual-Interactive Instance Labeling

We focus on visual-interactive interfaces allowing users to submit feedback about the underlying data collection. In the terminology of the related work, a data element is often referred to as an instance, and the feedback for an instance is called a label. Different types of labels can be gathered to create some sort of learning model. Before we survey existing approaches dealing with similarity in detail, we outline inspiring techniques supporting other types of labels.

Some techniques for learning similarity metrics are based on rules. The approach of (Fogarty et al., 2008) allows users to create rules for ranking images based on their visual characteristics. The rules are then used to improve a distance metric for image retrieval and categorization. Another class of interfaces facilitates techniques related to interestingness or relevance feedback strategies, e.g., to improve retrieval performance (Salton and Buckley, 1997). One popular application field is evaluation, e.g., asking users which of a set of image candidates is best with respect to a pre-defined quality criterion (Weber et al., 2016). In the visual analytics domain, relevance feedback and interestingness-based labeling has been applied to learn users' notions of interest, e.g., to improve the data analysis process. Behrisch et al. (Behrisch et al., 2014) present a technique for specifying features of interest in multiple views of multidimensional data. With the user distinguishing relevant from irrelevant views, the system deduces the preferred views for further exploration. Seebacher et al. (Seebacher et al., 2016) apply a relevance feedback technique in the domain of patent retrieval, supporting user-based labeling of relevance scores. Similar to our approach, the authors visualize the weight of different modalities (attributes/features). The weights are subject to change with respect to the iterative nature of the learning approach. In the visual-interactive image retrieval domain, the Pixolution web interface1 combines tag-based and example-based queries to adopt users' notions of interestingness. Recently, the notion of interestingness was adopted in prostate cancer research. A visual-interactive user interface enables

1Pixolution, http://demo.pixolution.de, last accessed on September 22nd, 2016

physicians to give feedback about the well-being status of patients (Bernard et al., 2015b). The underlying active-learning approach calculates the numerical learning function by means of a regression tree.

Classification tasks require categorical labels for the available instances. Ware et al. (Ware et al., 2001) present a visual interface enabling users to build classifiers in a visual-interactive way. The approach works well for few and well-known attributes, but requires labeled data sets for learning classifiers. Seifert and Granitzer's (Seifert and Granitzer, 2010) approach outlines user-based selection and labeling of instances as a meaningful extension of classical active learning strategies (Settles, 2009). The authors point towards the potential of combining active learning strategies with information visualization, which we adopt for both the representation of instances and learned model results. Höferlin et al. (Höferlin et al., 2012) define interactive learning as an extension which includes the direct manipulation of the classifier and the selection of instances. The application focus is on building ad-hoc classifiers for visual video analytics. Heimerl et al. (Heimerl et al., 2012) propose an active learning approach for learning classifiers for text documents. Their approach includes three user interfaces: basic active learning, visualization of instances along the classifier boundary, and interactive instance selection. Similar to our approach, the classification-based visual analytics tool by Janetzko et al. (Janetzko et al., 2014) also applies to the soccer application domain. In contrast to our application goal, the approach supports building classifiers for interesting events in soccer games by taking user-defined training data into consideration.

User-defined labels for relevance feedback, interestingness, or class assignment share the idea of binding a single label to an instance, reflecting the classical machine learning approach (f(i) = y). However, functions for learning the concept of similarity require a label representing the relation of pairs or groups of instances, e.g., in our case, f(i1, i2) = y, where y represents a similarity score. Visual-interactive user interfaces supporting such learning functions have to deal with this additional complexity. A workaround strategy often applied for the validation of information retrieval results shows multiple candidates and asks the user for the most similar instances with respect to a given query. We neglect this approach since our users do not necessarily have the knowledge to give feedback for any query instance suggested by the system. Rather, we follow a user-centered strategy where users themselves have an influence on the selection of pairs of instances.

Another way to avoid complex learning functions


is allowing users to explicitly assign weights to the attributes of the data set (Ware et al., 2001; Jeong et al., 2009). The drawback of this strategy is the necessity for users to know the attribute space in its entirety. Especially when sophisticated descriptors are applied for the extraction of features (e.g., Fourier coefficients) or deep-learned features, explicit weighting of individual features is inconceivable. Rather, our approach applies an implicit attribute learning strategy. While the similarity model indeed uses weighted attributes for calculating distances between instances (see Section 2.2), an algorithmic model derives attribute weights based on the user feedback at object level. We conclude with a visual-interactive feedback interface where users are enabled to align small sets of instances in a two-dimensional arrangement (Bernard et al., 2014). The relative pairwise distances between the instances are then used by the similarity model. We neglect strategies for arranging small sets of more than two instances in 2D since we explicitly want to include categorical and boolean attributes. It has been shown that the interpretation of relative distances for categorical data is non-trivial (Sessler et al., 2014).

2.2 Similarity Modeling

Aside from methods that employ visual-interactive interfaces for learning the similarity between objects from user input, as presented in the previous section, methods for the autonomous learning of similarity relations have been introduced (Kulis, 2012; Bellet et al., 2013). Human similarity perception is a psychologically complex process which is difficult to formalize and model mathematically. It has been shown previously that the human perception of similarity does not follow the rules of mathematical metrics, such as identity, symmetry, and transitivity (Tversky, 1977). Nevertheless, today most approaches employ distance metrics to approximate similarity estimates between two items (e.g., objects, images, etc.). Common distance metrics are the Euclidean distance (L2 distance) and the Manhattan distance (L1 distance) (Yu et al., 2008), as well as warping or edit distance metrics. The edit distance was, e.g., applied to the soccer domain in a search system where users can sketch trajectories of player movement (Shao et al., 2016).
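Such metrics can also be combined for mixed attribute types. The following sketch is our own illustration, not the paper's exact implementation: it mixes a Manhattan-style term for numerical attributes with a simple 0/1 overlap term for categorical and boolean ones, and the attribute names are hypothetical.

```python
# Sketch: a weighted distance over mixed attribute types. Numerical values
# are assumed pre-normalized to [0, 1]; categorical and boolean attributes
# contribute an overlap (0/1 mismatch) term. Illustrative only.
def mixed_distance(a, b, weights):
    """a, b: dicts attribute -> value; weights: dict attribute -> float."""
    d = 0.0
    for attr, w in weights.items():
        va, vb = a[attr], b[attr]
        if isinstance(va, (int, float)) and not isinstance(va, bool):
            d += w * abs(va - vb)                 # L1 term for numericals
        else:
            d += w * (0.0 if va == vb else 1.0)   # overlap match otherwise
    return d
```

With equal weights, identical players yield distance 0, and each mismatching categorical attribute adds its full weight.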

To better take human perception into account, and to better adapt the distance metric to the underlying data and task, an increasingly popular approach is to learn similarity or distance measures from data (metric learning). For this purpose, different strategies have been developed.

In linear metric learning, the general idea is to adapt a given distance function (e.g., a Mahalanobis-like distance function) to the given task by estimating its parameters from underlying training data. The learning is usually performed in a supervised or weakly-supervised fashion by providing ground truth in the form of (i) examples of similar and dissimilar items (positive and negative examples), (ii) continuous similarity assessments for pairs of items (e.g., provided by a human), and (iii) triplets of items with relative constraints, such as "A is more similar to B than to C" (Bellet et al., 2013; Xing et al., 2003). During training, the goal is to find parameters of the selected metric that maximize the agreement between the distance estimates and the ground truth, i.e., by minimizing a loss function that measures the differences to the ground truth. The learned metric can then be used to better cluster the data or to improve the classification performance in supervised learning.

Similar to these approaches, we also apply a linear model. Instead of learning the distance metric directly, we estimate the Pearson correlation between the attributes and the provided similarity assessments. In this way, the approach is applicable even to small sets of labeled pairs of instances. The weights explicitly model the importance of each attribute and, as a by-product, enable the selection of the most important features for downstream approaches. To exploit the full potential, we apply weighted distance measures for internal similarity calculations, including measures for categorical (Boriah et al., 2008) and boolean (Cha et al., 2005) attributes.
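One plausible reading of this correlation-based weighting is sketched below. This is an assumption about the mechanism, not the published implementation: per-attribute absolute differences between the players of each labeled pair are correlated with the user's similarity scores, and attributes whose differences anti-correlate with similarity receive high weights.

```python
# Sketch: derive attribute weights from labeled player pairs via Pearson
# correlation between per-attribute differences and similarity labels.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def attribute_weights(labeled_pairs):
    """labeled_pairs: list of (player_a, player_b, similarity in [0, 1]),
    where players are equal-length lists of normalized numerical attributes."""
    n_attrs = len(labeled_pairs[0][0])
    sims = [s for _, _, s in labeled_pairs]
    weights = []
    for a in range(n_attrs):
        diffs = [abs(p[a] - q[a]) for p, q, _ in labeled_pairs]
        # large attribute difference paired with low similarity -> important
        weights.append(max(0.0, -pearson(diffs, sims)))
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else [1.0 / n_attrs] * n_attrs
```

A handful of labeled pairs already produces non-trivial weights, which matches the paper's claim that the method works for small label sets.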

In non-linear metric learning, one approach is to learn similarity (kernels) directly without explicitly selecting a distance metric. The advantage of kernel-based approaches is that non-linear distance relationships can be modeled more easily. For this purpose, the data is first transformed by a non-linear kernel function. Subsequently, non-linear distance estimates can be realized by applying linear distance measurements in the transformed non-linear space (Abbasnejad et al., 2012; Torresani and Lee, 2006). Other authors propose multiple kernel learning, a parametric approach that tries to learn a combination of predefined base kernels for a given task (Gönen and Alpaydın, 2011). Another group of non-linear approaches employs neural networks to learn a similarity function (Norouzi et al., 2012; Salakhutdinov and Hinton, 2007). This approach has gained increasing importance due to the recent success of deep learning architectures (Chopra et al., 2005; Zagoruyko and Komodakis, 2015; Bell and Bala, 2015). The major drawback of these methods is that they require huge amounts of labeled instances for training, which are not available in our case.

Attribute | Description | Variable Type | Value Domain | Quality Issues
Name | Name of the soccer player, unique identifier | Nominal (String) | Alphabet of names | perfect
Description | Abstract of a player, for tooltips | Nominal (String) | Full text | good
Nationality | Nation of the player | Nominal (String) | 103 countries | perfect
National Team | National team, if applicable. Can be a youth team. | Nominal (String) | 155 nat. teams | sparse
Birthday | Day of birth (dd.mm.yyyy) | Date | | perfect
Birthplace | Place of birth | Nominal (String) | Alphabet of cities | good
Size | Size of the player in meters | Numerical | [1.59,2.03] | good
Current Team | Team of the player (end of last season) | Nominal (String) | Alphabet of teams | perfect
Main Position | Main position on the field | Nominal (String) | 13 positions | perfect
Other Positions | Other positions on the field | Nominal (String) | 13 positions | sparse, list
League Games | No. of games played in the current soccer league | Numerical | [0,591] | sparse
League Goals | No. of goals scored in the current soccer league | Numerical | [0,289] | sparse
Nat. Games | No. of games played for the current nat. team | Numerical | [0,150] | sparse
Nat. Goals | No. of goals scored for the current nat. team | Numerical | [0,71] | sparse

Table 1: Overview of primary data attributes about soccer players retrieved from DBpedia with SPARQL.

The above methods have in common that the learned metric is applied globally to all instances in the dataset. An alternative approach is local metric learning, which learns specific similarity measures for subsets of the data or even separate measures for each data item (Frome et al., 2007; Weinberger and Saul, 2009; Noh et al., 2010). Such approaches have advantages especially when the underlying data has heterogeneous characteristics. A related approach are per-exemplar classifiers, which even allow selecting different features (attributes) and distance measures for each item. Per-exemplar classification has been applied successfully for different tasks in computer vision (Malisiewicz et al., 2011). While our proposed approach to similarity modeling operates in a global manner, our active learning approach exploits local characteristics of the feature space by analyzing the density of labeled instances in different regions for making suggestions to the user.
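This density-based suggestion idea can be sketched as follows. The sketch is our illustration under stated assumptions: plain Euclidean distance in feature space, and "sparsely labeled" operationalized as the distance to the nearest labeled instance.

```python
# Sketch: density-based candidate suggestion. Propose unlabeled players from
# regions of the feature space that are sparsely covered by labeled instances.
from math import dist  # Euclidean distance, Python 3.8+

def suggest_candidates(unlabeled, labeled, k=2):
    """Return the k unlabeled feature vectors farthest from any labeled one."""
    def sparsity(x):
        return min(dist(x, y) for y in labeled)
    return sorted(unlabeled, key=sparsity, reverse=True)[:k]
```

In the actual tool, such suggestions are only candidates; the user keeps the final say and can label known players instead.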

The approaches above mostly require large amounts of data as well as ground truth in terms of pairs or triplets of labeled instances. Furthermore, they rely on numerical data (or at least non-mixed data) as input. We propose an approach to metric learning for unlabeled data (without any ground truth) with mixed data types (categorical, binary, and numerical), which is also applicable to small data sets and data sets with initially no labeled instances. For this purpose, we combine metric learning with active learning (Yang et al., 2007) and embed it in an interactive visualization system for immediate feedback. Our approach allows the generation of useful distance metrics from a small number of user inputs.

3 APPROACH

An overview of the visual-interactive system is shown in Figure 1. Figure 2 illustrates the interplay of the technical components assembled into a workflow. In Sections 3.2, 3.3, and 3.4, we describe the three core components in detail, after we discuss data characteristics and abstractions in Section 3.1.

3.1 Data Characterization

3.1.1 Data Source

Various web references provide data about soccer players, with information differing in scope and depth. For example, some websites offer information about market price values or sports betting statistics, while other sources provide statistics about pass accuracy in every detail. Our primary requirement for the data is its public availability, to guarantee the reproducibility of our experiments. In addition, the information about players should be comprehensible for broad audiences and demonstrate the applicability. Finally, the attributes should be of mixed types (numerical, categorical, boolean). This is why Wikipedia2, the free encyclopedia, serves as our primary data source. The structured information about players presented at Wikipedia is retrieved from DBpedia3. We access DBpedia with SPARQL (Prud'hommeaux and Seaborne, 2008), the query language for RDF4 recommended by the W3C5. We focused on Europe's five top leagues (Premier League in England, Serie A in Italy, Ligue 1 in France, Bundesliga in Germany, and LaLiga in Spain). Overall, we gathered 2,613 players engaged by the teams of the respective leagues.

2Wikipedia, https://en.wikipedia.org/wiki/Main Page, last accessed on September 22nd, 2016

3DBpedia, http://wiki.dbpedia.org/about, last accessed on September 22nd, 2016

Attribute | Description | Variable Type | Value Domain | Quality
Age | Age of the player (end of last season) | Numerical (int) | [16-43] | perfect
National Player | Whether the player has played as a national player | Boolean | [false, true] | sparse
Nat. games p.a. | Average number of national games per year | Numerical | [0.0,24.0] | sparse
Nat. goals per game | Average number of goals per national game | Numerical | < 3.0 | sparse
League games p.a. | Average number of league games per year | Numerical | [0.0,73.5] | sparse
League goals p. game | Average number of goals per league game | Numerical | < 3.0 | sparse
Position Vertical | The aggregated positions of the player as y-coordinate (from "Keeper" to "Striker") | Numerical | [0.0,1.0] | perfect
Position Horizontal | The aggregated positions of the player as x-coordinate (from left to right) | Numerical | [0.0,1.0] | perfect
Main Position LR | The horizontal position of the player as String | Nominal | {left, center, right} | perfect

Table 2: Overview of secondary data attributes about soccer players deduced from the primary data.
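A query of this kind (Section 3.1.1) could look roughly as follows. The exact properties the authors retrieved are not listed in the paper; `dbo:birthDate`, `dbo:height`, and `dbo:team` are plausible assumptions from the DBpedia ontology, shown here as a query string in Python.

```python
# Sketch: the kind of SPARQL query used to retrieve player attributes from
# DBpedia. Property choices are assumptions, not the authors' exact query.
PLAYER_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?player ?name ?birthDate ?height ?team WHERE {
  ?player a dbo:SoccerPlayer ;
          rdfs:label ?name .
  OPTIONAL { ?player dbo:birthDate ?birthDate . }
  OPTIONAL { ?player dbo:height ?height . }
  OPTIONAL { ?player dbo:team ?team . }
  FILTER (lang(?name) = "en")
}
"""
```

The OPTIONAL clauses reflect the sparsity issue discussed in Section 3.1.3: many players lack some of these properties, so requiring them all would discard instances.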

3.1.2 Data Abstraction

Table 1 provides an overview of the available information about soccer players. Important attributes for player (re-)identification are the unique name in combination with the nationality and the current team. Moreover, various numerical and categorical information is provided for similarity modeling.

Table 2 depicts the secondary data (i.e., attributes deduced from the primary data). Our strategy for the extraction of additional information is to obtain as many meaningful attributes as possible. One benefit of our approach will be a weighting of all involved attributes, making the selection of relevant features for downstream analyses an easy task. This strategy is inspired by user-centered design approaches in different application domains where we asked domain experts about the importance of attributes (features) for the similarity definition process (Bernard et al., 2013; Bernard et al., 2015a). One of the common responses was "everything can be important!" In the usage scenarios, we demonstrate how the similarity model weights the importance of primary and secondary attributes with respect to the labeled pairs of players.

4RDF, https://www.w3.org/RDF, last accessed on September 22nd, 2016

5W3C, https://www.w3.org, last accessed on September 22nd, 2016

3.1.3 Preprocessing

One of the data-centered challenges was the sparsity of some data attributes. This phenomenon can often be observed when querying less popular instances of concepts from DBpedia. To tackle this challenge, we removed attributes and instances from the data set containing only little information. Remaining missing values were marked with missing-value indicators, with respect to the type of attribute. For illustration purposes, we also removed players without an image in Wikipedia. The final data set consists of 1,172 players. An important step in the preprocessing pipeline is normalization. By default, we rescaled every numerical attribute into the relative value domain to foster metrical comparability.
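The normalization step can be sketched as follows; interpreting "relative value domain" as per-attribute min-max rescaling to [0, 1] is our assumption, and `None` stands in for the missing-value indicator.

```python
# Sketch of the normalization step: min-max rescale one numerical attribute
# column; None marks a missing value and is passed through unchanged.
def rescale(values):
    """Min-max normalize a numerical attribute column to [0, 1]."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else ((v - lo) / span if span else 0.0)
            for v in values]
```

Keeping missing values explicit, rather than imputing them, lets the distance measures treat them according to the attribute type.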

3.2 Visual-Interactive Learning Interface

One of the primary views of the approach enables users to give feedback about individual instances. Examples of the feedback interface can be seen in Figures 1 and 3. The interface for the definition of similarity between soccer players shows two players in combination with a slider control in between. The slider allows the communication of similarity scores between the two players. We decided on a quasi-continuous slider, in accordance with the continuous numerical function to be learned. However, a possible design alternative would be a feedback control with discrete levels of similarity.

Every player is represented with an image (when available and permitted), a flag icon showing the player’s nationality, as well as textual labels for the player name and the current team. These four attributes are also used for compact representations of players in other views of the tool. The visual

This publication appeared in the proceedings of the 8th International Conference on Information Visualization Theory and Applications (IVAPP 2017) and was rewarded with the Best Paper Award.

Figure 2: Workflow of the approach. Users assign similarity scores for pairs of players in the feedback interface. In the backend, the feedback is interpreted (blue) and delegated to the similarity model. Active learning support suggests players to improve the model (orange). A kNN-search supports the use case of the workflow shown in the result visualization (purple).

metaphor of a soccer field represents the players’ main positions. In addition, a list-based view provides the details about the players’ attribute values.

The feedback interface combines three additional functionalities most relevant for the visual-interactive learning approach.

First, users need to be able to define and select players of interest. This supports the idea to gather detailed feedback about instances matching the users’ expert knowledge (Seifert and Granitzer, 2010; Hoferlin et al., 2012). For this purpose, a textual query interface is provided in combination with a combobox showing players matching a user-defined query. In this way, we combine the query-by-sketch and the query-by-example paradigm for the straightforward lookup of known players.

The second ingredient for an effective active learning approach is the propagation of instances to the user that reduce the remaining model uncertainty. One crucial design decision determined that users should always be able to label players they actually know. Thus, we created a solution for the candidate selection combining automated suggestions by the model with the preference of users. The feedback interface provides two sets of candidate players, one set located at the left and the other one at the right of the interface. Replacing the left feedback instance with one of the suggested players at the left will reduce the remaining entropy regarding the current instance at the right, and vice versa. However, we are aware that other strategies for proposing unlabeled instances exist. Two obvious alternative strategies for labeling players would be a) providing a global pool of unlabeled players (e.g., in combination with drag-and-drop) or b) offering pairs of instances with low confidence. While these two strategies may be implemented in alternative designs, we recall the design decision that users need to know the instances to be labeled. In this way, we combine a classical active learning paradigm with the user-defined selection of players matching their expert knowledge.

Finally, the interface provides a history functionality for labeled pairs of players at the bottom of the feedback view (cf. Figure 1). For every pair of players, images are shown and the assigned similarity score is depicted in the center.

3.3 Similarity Modeling

3.3.1 Similarity Learning

The visual-interactive learning interface provides feedback about the similarity of pairs of instances. Thus, the feedback propagated to the system follows the learning function f(i1, i2) = y, where y is a numerical value between 0 (dissimilar) and 1


Figure 3: Similarity model learned with stars in the European soccer scene. The history provides an overview of ten labeled pairs of players. The ranking of weighted attributes assigns high correlations to the vertical position, player size, and national games. Karim Benzema served as the query player: all retrieved players share quite similar attributes with Benzema.

(very similar). Similarity learning is designed as a two-step approach. First, every attribute (feature) of the data set is correlated with the user feedback. Second, pairwise distances are calculated for any given instance of the data set.

The correlation of attributes is estimated with Pearson’s correlation coefficient. Pairwise distances between categorical attributes are transformed into the numerical space with the Kronecker delta function. The correlation for a given attribute is then estimated between the labeled pairs of instances provided by the user and the distance in the value domain obtained by that attribute. In the current state of the approach, every attribute is correlated independently to reduce computation time and to maximize the interpretability of the resulting weights. The result of this first step of the learning model is a weighting of the attributes that is proportional to the correlation.
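The first step could be sketched as follows. This is our own minimal illustration under stated assumptions: the paper does not specify how the (negative) correlation between distances and similarity labels is mapped to a non-negative weight, so we negate the Pearson coefficient and clamp at zero; all function names are hypothetical:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def kronecker_distance(a, b):
    """Kronecker delta turns categorical (in)equality into a 0/1 distance."""
    return 0.0 if a == b else 1.0

def attribute_weight(pair_distances, similarity_labels):
    # High similarity should coincide with low distance, so the raw
    # correlation is negated; negative results are clamped to zero.
    return max(0.0, -pearson(pair_distances, similarity_labels))

# Per-attribute distances of labeled pairs vs. user similarity scores.
dists = [0.1, 0.9, 0.2, 0.8]
labels = [1.0, 0.0, 0.9, 0.1]
print(round(attribute_weight(dists, labels), 2))
```

An attribute whose pairwise distances disagree with the user's labels receives a weight near zero, which matches the observed pruning of irrelevant attributes in Section 4.1.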

In a second step, the learning model calculates distances between any pair of instances. As the underlying data may consist of mixed attribute types, different distance measures are used for different types of attributes. For numerical data we employ the (weighted) Euclidean distance. For categorical attributes we choose the Goodall distance (Boriah et al., 2008) since it is sensitive to the probability of attribute values and less frequent observations are assigned higher scores. The weighted Jaccard distance (Cha et al., 2005) is used for binary attributes. The Jaccard distance neglects negative matches (both inputs false), which might be advantageous for many similarity concepts, i.e., the absence of an attribute in two items does not add to their similarity (Sneath et al., 1973). After all distance measures have been computed in separate distance matrices, all matrices are condensed into a single distance matrix by a weighted sum, where the weights represent the fraction of the sum of weights for each attribute type.

3.3.2 Active Learning Strategy

We follow an interactive learning strategy that allows for keeping the user in the loop. To support the iterative nature, we designed an active learning strategy that fosters user input for instances for which no or little information is available yet. As a starting point for active learning, the user selects a known instance from the database. Note that this is important as the user needs a certain amount of knowledge about the instance to make similarity assessments in the following (see Section 3.2). After an instance has been selected, we identify the attribute with the highest weight. Next, we estimate the farthest neighbors to the selected item under the given attribute for which no similarity assessments exist so far. A set of respective candidates is then presented to the user. This strategy is useful as it identifies pairs of items for which the system cannot make assumptions so far. The user can now select one or more proposed items and add similarity assessments. By adding assessments, the coverage of the attribute space is successfully improved, especially in sparse areas where little information was available so far.
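The candidate selection described above could be sketched as follows; this is an illustrative reconstruction, not the authors' code, and the data layout (players as dictionaries of normalized numerical attributes) is our own assumption:

```python
def suggest_candidates(selected, players, weights, labeled_pairs, k=3):
    """Suggest the k farthest unassessed players from `selected`
    under the highest-weighted attribute."""
    attr = max(weights, key=weights.get)      # most important attribute
    ref = players[selected][attr]
    candidates = [
        (abs(players[p][attr] - ref), p)
        for p in players
        if p != selected and frozenset((selected, p)) not in labeled_pairs
    ]
    candidates.sort(reverse=True)             # farthest neighbors first
    return [p for _, p in candidates[:k]]

# Toy data: normalized ages, one attribute dominating the weighting.
players = {"A": {"age": 0.10}, "B": {"age": 0.90},
           "C": {"age": 0.50}, "D": {"age": 0.15}}
weights = {"age": 0.8}
labeled = {frozenset(("A", "B"))}             # pair A-B already assessed
print(suggest_candidates("A", players, weights, labeled, k=2))
```

Because already-labeled pairs are skipped, repeated queries steer the user toward sparse regions of the attribute space, which is exactly the coverage improvement the strategy aims at.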


Figure 4: Nearest neighbor search for Lionel Messi. Even if superstars are difficult to replace, the set of provided nearest neighbors is quite reasonable. Keisuke Honda may be surprising; nevertheless, Honda has similar performance values in the national team of Japan.

3.3.3 Model Visualization

Visualizing the output of algorithmic models is crucial, e.g., to execute downstream analysis tasks (see Section 3.4). In addition, we visualize the current state of the model itself. In this way, designers and experienced users can keep track of the model improvement, its quality improvement, and its determinism. The core black-box information of this two-step learning approach is the set of attribute weights representing the correlation between attributes and labeled pairs of instances. In the middle of the tool, we make the attribute weights explicit (between the feedback interface and the model result visualization), as can be seen in the title figure. An enlarged version of the model visualization is shown at the left of Figure 4, where the model is used to execute a kNN search for Lionel Messi. From top to bottom, the list of attributes is ranked in the order of their weights. It is a reasonable point of discussion whether the attribute weights should be visualized for the final group of users of such a system. A positive argument (especially in this scientific context) is the transparency of the system, which raises trust and allows for visual validation. However, a counter-argument is biasing users with information about the attribute/feature space. Recalling that especially in complex feature spaces users do not necessarily know any attribute in detail, it may be a valid design decision to exclude the model visualization from the visual-interactive system.

3.4 Result Visualization – Visual-Interactive NN Search

We provide a visual-interactive interface for the visualization of the model output (see Figure 4). A popular use case regarding soccer players is the identification of similar players for a reference player, e.g., when a player is replaced in a team due to an upcoming transfer event. Thus, the interface of the result visualization provides a means to query for similar soccer players. We combine a query interface (query-by-sketch, query-by-example) with a list-based visualization of retrieved players. The retrieval algorithm is based on a standard k-NN search (k nearest neighbors) using the model output. For every list element of the result set, a reduced visual representation of a soccer player is depicted, including the player’s image, name, nationality, position, and team. Moreover, we show information about three attributes contributing significantly to the current similarity model. Finally, we depict rank information as well as the distance to the query for every element. The result visualization rounds up the functionality: users can train individual similarity models of soccer player similarity and subsequently perform retrieval tasks. From a more technical perspective, the result visualization closes the feedback loop of the visual-interactive learning approach. In this connection, users can analyze retrieved sets of players and give additional feedback for weakly learned instances. An example can be seen in Figure 4, showing a retrieved result set for Lionel Messi used as an example query.
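The retrieval step itself reduces to ranking by the learned distances. A minimal sketch, assuming the model output is available as a distance matrix keyed by player name (the toy names and values below are illustrative only):

```python
def knn_search(query, distances, k=6):
    """Rank all other players by their learned distance to the query."""
    ranked = sorted((d, p) for p, d in distances[query].items() if p != query)
    return [(p, d) for d, p in ranked[:k]]

# Toy distance matrix as the similarity model might emit it.
dist = {"Messi": {"Messi": 0.0, "Honda": 0.21,
                  "Neymar": 0.05, "Suarez": 0.09}}
print(knn_search("Messi", dist, k=2))
```

The returned (player, distance) pairs correspond to the rank and distance information shown for every element of the result list.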

4 EVALUATION

Providing scientific proof for this research effort is non-trivial since the number of comparable approaches is scarce. Moreover, we address the challenge of dealing with data sets which are completely unlabeled at start, making classical quantitative evaluations with ground truth test data impossible.

In the following, we demonstrate and validate the applicability of the approach with different strategies. In a first proof-of-concept scenario, a similarity model is trained for an explicitly known mental model, answering the question whether the similarity model will be able to capture a human’s notion of similarity. Second, we assess the effectiveness of the approach in two usage scenarios. We demonstrate how the tool can be used to learn different similarity models, e.g., to replace a player in the team by a set of relevant candidates. Finally, we report on experiments for the quantification of model efficiency.


Figure 5: Experiment with a mental model based on teams and player age. It can be seen that for ten labeled pairs of players the system is able to grasp this mental model.

4.1 Proof of Concept - Fixed Mental Model

The first experiment assesses whether the similarity model of the system is able to grasp the mental similarity model of a user. As an additional constraint, we limit the number of labeled pairs to ten, representing the requirement of very fast model learning. As a proof of concept, we predefine a mental model and express it with ten labels. In particular, we simulate a fictitious user who is only interested in the age of players, as well as their current team. In other words, a numerical and a categorical attribute define the mental similarity model of the experiment. If two players are likely identical with respect to these two attributes (age ±1), the user assigns the similarity score 1.0. If only one of the two attributes matches, the user feedback is 0.5, and if both attributes disagree, a pair of players is labeled with the similarity score 0.0. The ten pairs of players used for the experiment are shown in Figure 5. In addition to the labeled pairs, the final attribute weights calculated by the system are depicted. Three insights can be identified. First, it becomes apparent that the two attributes with the highest weights exactly match the pre-defined mental model. Second, the number of national games, the number of national games per year, and the number of league games also received weights. Third, the set of remaining attributes received zero weights. While the first insight validates the experiment, the second insight sheds light on attributes correlated with the mental model. As an example, we hypothesize that the age of players is correlated with the number of

games. This is a beneficial starting point for downstream feature selection tasks, e.g., when the model is to be implemented as a static similarity function. Finally, the absence of weights for most other attributes demonstrates that only few labels are needed to obtain a precise focus on relevant attributes.
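The fictitious user's labeling rule described above can be stated in a few lines; this sketch is our own formalization of the experiment's protocol, with hypothetical field names:

```python
def simulated_label(p, q):
    """Fictitious user: similarity depends only on age (within ±1 year)
    and current team; each matching attribute contributes 0.5."""
    age_match = abs(p["age"] - q["age"]) <= 1
    team_match = p["team"] == q["team"]
    return (age_match + team_match) / 2.0   # yields 0.0, 0.5 or 1.0

a = {"age": 24, "team": "FC Bayern"}
b = {"age": 25, "team": "FC Bayern"}
print(simulated_label(a, b))   # both attributes match
```

Feeding ten such labels into the learning pipeline is exactly what the experiment does; the recovered weights should then concentrate on the age and team attributes.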

4.2 Usage Scenario 1 - Top Leagues

The following usage scenario demonstrates the effectiveness of the approach. A user with much experience in Europe’s top leagues (Premier League, Serie A, Ligue 1, Bundesliga, LaLiga) rates ten pairs of prominent players with similarity values from very high to very low. The state of the system after ten labeling iterations can be seen in Figure 3. The history view shows that high similarity scores are assigned to the pairs: Mats Hummels vs. David Luiz, Luca Toni vs. Claudio Pizarro, Dani Alves vs. Juan Bernat, Toni Kroos vs. Xabi Alonso, Eden Hazard vs. David Silva, and Jamie Vardy vs. Marco Reus. Comparatively low similarity values are assigned to the pairs Philipp Lahm vs. Ron-Robert Zieler, Sandro Wagner vs. Bastian Oczipka, Manuel Neuer vs. Marcel Heller, and Robert Huth vs. Robert Lewandowski. The set of player instances resembles a vast spectrum of nationalities, ages, positions, as well as numbers of games and goals. The resulting learning model assigns high weights to the vertical position on the field, the player size, and the national games per year. Further, being a national player and the goals for the respective national teams contribute to the global notion of player similarity. In the result visualization, the user chose Karim Benzema for the nearest neighbor search. The result set (Edinson Cavani, Robert Lewandowski, Salomon Kalou, Claudio Pizarro, Klaas-Jan Huntelaar, Pierre-Emerick Aubameyang) represents the learned model quite well. All these players are strikers, of a similar age, and very successful in their national teams. In contrast, there is not a single player listed in the result who does not adhere to the described notion of similarity. In summary, this usage scenario demonstrates that the tool was able to reflect the notion of similarity of the user with a very low number of training instances.

4.3 Usage Scenario 2 - National Trainer

In this usage scenario, we take on the role of a national trainer. Our goal is to learn a similarity model which especially resembles the quality of soccer players, but additionally takes the nationality of players into account. As a result, similar players coming from the same country are classified as similar. The reason is simple: players from foreign countries cannot be positioned in a national team. Figure 1 shows the history of the ten labeled pairs of players. We assigned high similarity scores to two midfielders from Belgium, two strikers from France, two defenders from Germany, two goalkeepers from Germany, and two strikers from Belgium. In addition, four pairs of players with different nationalities were assigned considerably lower similarity scores, even if the players play at very similar positions. The result visualization shows the search result for Dimitri Payet, who was used as a query player. In this usage scenario, the result can be used to investigate alternative players for the French national team. With only ten labels, the algorithm retrieves exclusively players from France, all having expertise in the national team, and all sharing Dimitri Payet’s position (offensive midfielder).

4.4 Quantification of Efficiency

In the final evaluation strategy, we conduct an experiment to yield quantitative results for the efficiency. We assess the ‘speed’ of the convergence of the attribute weighting for a given mental model, i.e., how many learning iterations the model needs to achieve stable attribute weights. The independent variable of the experiment is the number of learning iterations, i.e., the number of instances already learned by the similarity model. The dependent variable is the change of the attribute weights of the similarity model between two learning iterations, assessed by the quantitative variable ∆w. To avoid other degrees of freedom, we fix the mental model used in the experiment. For this purpose, a small group of colleagues, all having an interest in soccer, defined labels of similarity for 50 pairs of players.

To guarantee robustness and generalizability, we run the experiment 100 times. Inspired by cross-validation, the set of training instances is permuted in every run. The result is depicted in Figure 6. The most substantial difference of attribute weights occurs at the beginning of the learning process, between the 1st and the 2nd learning iteration (∆w = 0.36). In the following, the differences decrease significantly before reaching a saturation point approximately after the 5th iteration. For the 6th and later learning iterations, ∆w is already below 0.1, and below 0.03 after the 30th iteration. To summarize, the approach only requires very few labeled instances to produce a robust learning model. This is particularly beneficial when users have very limited time, e.g., important experts in the respective application field.
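The dependent variable could be computed as sketched below. Note that the paper leaves the exact definition of ∆w open; measuring it as the L1 difference between consecutive weight vectors is our own assumption:

```python
def delta_w(prev, curr):
    """Change of the attribute weights between two learning iterations,
    measured here as the L1 difference (one possible definition of ∆w)."""
    return sum(abs(curr[a] - prev[a]) for a in curr)

w_iter1 = {"height": 0.5, "age": 0.5}
w_iter2 = {"height": 0.7, "age": 0.3}
print(delta_w(w_iter1, w_iter2))
```

In the experiment, this quantity is averaged over 100 permuted runs per iteration count; a saturating ∆w curve then indicates stable attribute weights.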

5 DISCUSSION

In the evaluation section, we demonstrated the applicability of the approach from different perspectives. However, we want to shed light on aspects that allow alternative design decisions, or may be beneficial subjects for further investigation.

Similarity vs. Distance. Distance measures are usually applied to approximate similarity relationships. This is also the case in our work. We are, however, aware that metric distances can in general not be mapped directly to similarities, especially when the dimension of the data becomes high and the points in the feature space move far away from each other. Finding suitable mappings between distance and similarity is a challenging topic that we will focus on in future research.

Active Learning Strategy. The active learning support of this approach builds on the importance (weights) of attributes to suggest new learning instances to be queried. Thus, we focus on a scalable solution that takes the current state of the model into account and binds suggestions to previously-labeled instances. Alternative strategies may involve other intrinsic aspects of the data (attributes or instances) or the model itself. For example, statistical data analysis, distributions of value domains, or correlation tests could be considered. Other active learning strategies may be inspired by classification approaches, i.e., models learning categorical label information. Concrete classes of strategies involve uncertainty sampling or query by committee (Settles, 2009).

Numerical vs. Categorical. This research effort explicitly addressed a complex data object with mixed data, i.e., objects characterized by numerical, categorical, and boolean attributes. This class of objects is widespread in the real world, and we argue that it is worth addressing this additional analytical complexity. However, coping with mixed data can benefit from a more in-depth investigation at different steps of the algorithmic pipeline.

Usability. We presented a technique that actually works but has not been thoroughly evaluated with users. Will users be able to interact with the system? We did cognitive walkthroughs and created the designs in a highly interactive manner. Still, the question arises whether domain experts will appreciate the tool and be able to work with it in an intuitive way.


Figure 6: Quantification of efficiency. The experiment shows differences in the attribute weighting between consecutive learning iterations. A saturation point can be identified, approximately after the 5th labeled pair of instances.

6 CONCLUSION

We presented a tool for the visual-interactive similarity search for complex data objects by the example of soccer players. The approach combines principles from active learning, information visualization, visual analytics, and interactive information retrieval. An algorithmic workflow accepts labels for instances and creates a model reflecting the similarity expressed by the user. Complex objects including numerical, categorical, and boolean attribute types can be included in the algorithmic workflow. Visual-interactive interfaces ease the labeling process for users, depict the model state, and represent the output of the similarity model. The latter is implemented by means of an interactive information retrieval technique. While the strategy to combine active learning with visual-interactive interfaces enabling users to label instances of interest is special, the application by the example of soccer players is, to the best of our knowledge, unique. Domain experts are enabled to express expert knowledge about similar players, and to utilize learned models to retrieve similar soccer players. We demonstrated that only very few labels are needed to train meaningful and robust similarity models, even if the data set was unlabeled at start.

Future work will include additional attributes about soccer players, e.g., market values or variables assessing the individual player performance. In addition, it would be interesting to widen the scope and transfer the strategy to other domains, e.g., in design study approaches. Finally, the performance of individual parts of the algorithmic workflow may be tested against design alternatives in future experiments.

REFERENCES

Abbasnejad, M. E., Ramachandram, D., and Mandava, R. (2012). A survey of the state of the art in learning the kernels. Knowledge and Information Systems, 31(2):193–221.

Behrisch, M., Korkmaz, F., Shao, L., and Schreck, T. (2014). Feedback-driven interactive exploration of large multidimensional data supported by visual classifier. In IEEE Visual Analytics Science and Technology (VAST), pages 43–52.

Bell, S. and Bala, K. (2015). Learning visual similarity for product design with convolutional neural networks. ACM Transactions on Graphics (TOG), 34(4):98.

Bellet, A., Habrard, A., and Sebban, M. (2013). A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709.

Bernard, J., Daberkow, D., Fellner, D. W., Fischer, K., Koepler, O., Kohlhammer, J., Runnwerth, M., Ruppert, T., Schreck, T., and Sens, I. (2015a). VisInfo: A digital library system for time series research data based on exploratory search – a user-centered design approach. International Journal on Digital Libraries, 16(1):37–59.

Bernard, J., Sessler, D., Bannach, A., May, T., and Kohlhammer, J. (2015b). A visual active learning system for the assessment of patient well-being in prostate cancer research. In IEEE VIS Workshop on Visual Analytics in Healthcare (VAHC), pages 1–8. ACM.

Bernard, J., Sessler, D., Ruppert, T., Davey, J., Kuijper, A., and Kohlhammer, J. (2014). User-based visual-interactive similarity definition for mixed data objects – concept and first implementation. In Proceedings of WSCG, volume 22. Eurographics Association.

Bernard, J., Wilhelm, N., Kruger, B., May, T., Schreck, T., and Kohlhammer, J. (2013). MotionExplorer: Exploratory search in human motion capture data based on hierarchical aggregation. IEEE Transactions on Visualization and Computer Graphics (TVCG), 19(12):2257–2266.

Boriah, S., Chandola, V., and Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation. International Conference on Data Mining (SIAM), 30(2):3.

Cha, S.-H., Yoon, S., and Tappert, C. C. (2005). Enhancing binary feature vector similarity measures.

Chopra, S., Hadsell, R., and LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition (CVPR), pages 539–546. IEEE.

Fogarty, J., Tan, D., Kapoor, A., and Winder, S. (2008). CueFlik: Interactive concept learning in image search. In SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 29–38. ACM.

Frome, A., Singer, Y., Sha, F., and Malik, J. (2007). Learning globally-consistent local distance functions for shape-based image retrieval and classification. In Conference on Computer Vision, pages 1–8. IEEE.

Gonen, M. and Alpaydın, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12(Jul):2211–2268.

Heimerl, F., Koch, S., Bosch, H., and Ertl, T. (2012). Visual classifier training for text document retrieval. IEEE Transactions on Visualization and Computer Graphics (TVCG), 18(12):2839–2848.

Hoferlin, B., Netzel, R., Hoferlin, M., Weiskopf, D., and Heidemann, G. (2012). Inter-active learning of ad-hoc classifiers for video visual analytics. In IEEE Visual Analytics Science and Technology (VAST), pages 23–32.

Janetzko, H., Sacha, D., Stein, M., Schreck, T., Keim, D. A., and Deussen, O. (2014). Feature-driven visual analytics of soccer data. In IEEE Visual Analytics Science and Technology (VAST), pages 13–22.

Jeong, D. H., Ziemkiewicz, C., Fisher, B., Ribarsky, W., and Chang, R. (2009). iPCA: An interactive system for PCA-based visual analytics. In Computer Graphics Forum (CGF), volume 28, pages 767–774. Eurographics.

Kulis, B. (2012). Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4):287–364.

Malisiewicz, T., Gupta, A., and Efros, A. A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In Conference on Computer Vision, pages 89–96. IEEE.

Noh, Y.-K., Zhang, B.-T., and Lee, D. D. (2010). Generative local metric learning for nearest neighbor classification. In Advances in Neural Information Processing Systems, pages 1822–1830.

Norouzi, M., Fleet, D. J., and Salakhutdinov, R. R. (2012). Hamming distance metric learning. In Advances in Neural Information Processing Systems, pages 1061–1069.

Prud’hommeaux, E. and Seaborne, A. (2008). SPARQL Query Language for RDF. W3C Recommendation.

Salakhutdinov, R. and Hinton, G. E. (2007). Learning a nonlinear embedding by preserving class neighbourhood structure. In AISTATS, pages 412–419.

Salton, G. and Buckley, C. (1997). Improving retrieval performance by relevance feedback. Readings in Information Retrieval, 24:5.

Seebacher, D., Stein, M., Janetzko, H., and Keim, D. A. (2016). Patent retrieval: A multi-modal visual analytics approach. In EuroVis Workshop on Visual Analytics (EuroVA), pages 013–017. Eurographics.

Seifert, C. and Granitzer, M. (2010). User-based active learning. In IEEE International Conference on Data Mining Workshops (ICDMW), pages 418–425.

Sessler, D., Bernard, J., Kuijper, A., and Kohlhammer, J. (2014). Adopting mental similarity notions of categorical data objects to algorithmic similarity functions. Vision, Modelling and Visualization (VMV), Poster.

Settles, B. (2009). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.

Shao, L., Sacha, D., Neldner, B., Stein, M., and Schreck, T. (2016). Visual-interactive search for soccer trajectories to identify interesting game situations. In IS&T Electronic Imaging Conference on Visualization and Data Analysis (VDA). SPIE.

Sneath, P. H., Sokal, R. R., et al. (1973). Numerical taxonomy. The principles and practice of numerical classification.

Torresani, L. and Lee, K.-c. (2006). Large margin component analysis. In Advances in Neural Information Processing Systems, pages 1385–1392.

Tversky, A. (1977). Features of similarity. Psychological Review, 84(4):327.

Ware, M., Frank, E., Holmes, G., Hall, M., and Witten, I. H. (2001). Interactive machine learning: letting users build classifiers. Human-Computer Studies, pages 281–292.

Weber, N., Waechter, M., Amend, S. C., Guthe, S., and Goesele, M. (2016). Rapid, detail-preserving image downscaling. In Proceedings of ACM SIGGRAPH Asia.

Weinberger, K. Q. and Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244.

Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. (2003). Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems, 15:505–512.

Yang, L., Jin, R., and Sukthankar, R. (2007). Bayesian active distance metric learning. In Conference on Uncertainty in Artificial Intelligence (UAI).

Yu, J., Amores, J., Sebe, N., Radeva, P., and Tian, Q. (2008). Distance learning for similarity estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 30(3):451–462.

Zagoruyko, S. and Komodakis, N. (2015). Learning to compare image patches via convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR). IEEE.
