Making Sense of Search: Using Graph Embedding and ......of methods, including statistical...

Preview:

Citation preview

  • Figure 1: Model Viewer – one ofthree interfaces we have developedto view and compare embeddingsof navigation graphs induced fromsearch engine query transitions.While the LSE (Laplacian spectralembedding) model to the leftshows different sub-intents for thequery “brexit” (e.g., “brexit dealwhat is it”, the ASE (Adjacencyspectral embedding) model to theright shows associated events ofcomparable public interest (e.g.,“government shutdown” and “hongkong protests”). These differentkinds of similar query may be usedto drive different kinds of queryrecommendation in the searchengine user experience.

    Making Sense of Search: Using GraphEmbedding and Visualization toTransform Query Understanding

    Jonathan LarsonMicrosoft ResearchSilverdale, WA, USAjolarso@microsoft.com

    Nathan EvansMicrosoft ResearchSilverdale, WA, USAnaevans@microsoft.com

    Darren EdgeMicrosoft ResearchCambridge, UKdaedge@microsoft.com

    Christopher WhiteMicrosoft ResearchRedmond, WA, USAchwh@microsoft.com

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.CHI ’20 Extended Abstracts, April 25–30, 2020, Honolulu, HI, USA.Copyright is held by the author/owner(s).ACM ISBN 978-1-4503-6819-3/20/04http://dx.doi.org/10.1145//3334480.3375233

    AbstractWe present a suite of interfaces for the visual explorationand evaluation of graph embeddings – machine learningmodels that reveal implicit relationships not directly ob-served in the input graph. Our focus is on the embeddingof navigation graphs induced from search engine querylogs, and how visualization of similar queries across dif-ferent embeddings, combined with the interactive tuningof results through multi-attribute ranking and post-filtering(e.g., using raw query frequency or derived entity type), canprovide a universal foundation for query recommendation.We describe the process of technology transfer from ourapplied research team to the Microsoft Bing product team,examining the critical role that visualization played in theirdecisions to ship the technology on bing.com.

    Author Keywordsquery logs; navigation graphs; graph embedding; queryrecommendation; visualization; tech transfer

    CCS Concepts•Human-centered computing → Activity centered design;Visual analytics; •Information systems → Content rank-ing; Query log analysis; Query suggestion; Evaluation ofretrieval results;

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 1

  • IntroductionWhile it is easy to make recommendations, it is hard toprovide consistently good recommendations – especiallywhen the questions can be vague and ill-formed, in any lan-guage, and on any topic. This is the fundamental problemof search engine query recommendation. A correspondingopportunity, arising from the growing dominance of mobileinternet use, is that touch navigation of recommendationlinks is significantly faster and easier than manually typingnew queries. Successive rounds of query recommendationalso allow the user to define and refine their intent in an it-erative, incremental, and exploratory fashion. Developing auniversal solution for query recommendation thus has thepotential to transform the search experience for all users.

    Stage Description

    AwarenessKnowing ofinnovation

    InterestSeeking moreinformation

    EvaluationDeciding oninitial use

    TrialLearning fromexperiences

    AdoptionDeciding oncontinued use

    Table 1: The diffusion ofinnovations (Rogers, 1962 [7]).

    Figure 2: From graph layout in 2D(top) to graph embedding projectedinto 3D (bottom). The proximity ofrelated vertices is preserved.

    This case study addresses the challenge of search queryrecommendation in the context of the Bing search engine,examining the critical role that data visualization and inter-active data interfaces played in facilitating product groupadoption of new recommendation mechanisms based ongraph embedding. We describe the different qualities of thevisual representations that drove the “diffusion” of this inno-vation (Table 1) from our applied research team in MicrosoftResearch, via key stakeholders in Bing product teams, tothe broader Bing organization and onto bing.com.

    The main lessons from this case study are twofold: (1) thatthe sensitivity of graph embedding to its configuration andthe subjectivity of evaluating recommendation quality de-mands a shared medium for stakeholders to explore differ-ent options and discover best practices; and (2) that whilestandalone explanatory data visualizations may have suf-ficient power to develop awareness of and interest in newtechnologies, shared exploratory data interfaces may benecessary to drive the real-time evaluation by individualstakeholders that leads to collective trial and adoption.

    Graph EmbeddingLet G = (V,E) represent a graph (network) of vertices(nodes) connected by edges (links), where the edges mayoptionally have a weight (strength) and direction (flow).Graph embedding [1] describes a family of machine learn-ing techniques that take conventional edge list or adjacencymatrix graph representations, which are hard to reasonabout on account of their discrete and high-dimensionalstructure, and transform them into feature vector vertexrepresentations that are relatively easier to reason aboutbecause their continuous and low-dimensional structuredefines a metric space. Graph embedding algorithms aimto perform this transformation in ways that preserve localstructure, such that vertices sharing similar edges in thegraph are mapped to similar locations in the embedding(Figure 2). The result is that rather than edges specifyingthe presence of a relationship between vertices, the dis-tance between any pair of vertices can be interpreted asa measure of their relatedness. This enables simple spa-tial implementations of three fundamental inference tasks:1) vertex nomination – given a query vertex, find its near-est neighbours in the embedded space; 2) link prediction– given a similarity threshold, find possible edges missingfrom the input graph; and 3) community detection – findclusters of vertices preferentially connected to one another.

    While stochastic approaches to graph embedding use ran-dom walks to characterize vertex neighbourhoods beforelearning multidimensional representations (e.g., DeepWalk[5] and node2vec [4]), spectral approaches use eigende-composition to factor matrix representations of the graphinto a multidimensional set of orthogonal basis vectors. Dif-ferent methods group vertices in different ways, e.g., Lapla-cian spectral embedding (LSE) groups vertices with similarconnections, while Adjacency spectral embedding (ASE)groups vertices with a similar structural role (e.g., “hub”) [6].

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 2

  • Navigation Graphs from Query LogsAs a way of understanding the relationships between searchengine queries, we are interested in inducing navigationgraphs that are represented implicitly in search enginequery logs (Figure 3). These logs record both queries andclicks (on page results, query recommendations, and ads)for anonymous user sessions using Bing search. There isno one correct way to induce such a graph: the choice ofvertex type (e.g., raw text vs normalized text vs named enti-ties only), the relative weights given to different edge types(e.g., query → query vs query → recommendation),and the minimum threshold for edge inclusion all make asignificant impact on the character of the results.

    The transfer of graph em-bedding technologies as adiffusion of innovations

    In the following series ofsidebars, we provide a com-mentary on the process oftechnology transfer usingquotes from two of our prod-uct group partners in Bing.We use the five stages of the“diffusion of innovations” tostructure these observations.

    Awareness – knowing ofinnovation. Our partnersknew of graph embeddingand its possible use, but onlyin a general sense:

    “There were teams doingsimilar things, just not usingthe technology that we’re col-laborating on and as a result,a lot of what they did waskind of handcrafted ratherthan a scaled solution.” (P2)

    “I do think that the rankingteam were aware of graphembedding and may havebeen using it in specificcases, but not across thewhole graph.” (P2)

    The use of graph representations to understand co-occurrencerelationships is a well established technique used in diversedomains ranging from literary analysis [10] to intelligenceanalysis [8]. The induction of navigation graphs from userhistory has also been practiced for at least two decades [2].Such “memory-based recommender systems” use collabo-rative filtering in conjunction with association measures likepointwise mutual information (PMI) to rank observed transi-tions [9]. The promise of embedding-based approaches isthat they are able to recommend relevant transitions evenin areas where the transition signal is sparse. Recent workfrom Taobao [11] reports substantial sparsity in item transi-tions on the Taobao e-commerce platfrom (

  • Figure 4: VS Code extension for exploring graph embeddingsusing coordinated views of a 2D node-link graph (top), a 3Dembedding projection (bottom), and a ranked list of nodes (right).

    beddings and derived data such as hierarchical “communi-ties” of related entities) such that they may be automaticallyindexed and joined with product group datasets (e.g., typesand identifiers for knowledge graph entities) for interactiveexploration in the browser. Our data application has threemain interface tabs, introduced next.

    With Hierarchy Viewer (Figure 5), the user can explore hi-erarchical communities of queries related to a given query,augmented with the query frequency and information on thedominant entity from our internal knowledge graph. Com-munities are detected from navigation graphs using a rangeof methods, including statistical (graph-based) and spatial(embedding-based) community detection techniques.

    Interest – seeking moreinformation. Our first graphvisualizations persuaded ourpartners about the value ofthe approach, but the pre-vailing organizational culturewas to persuade with data:

    “I was just showing him howour users are navigating fromentity to entity... He returnedwith this magnificent, won-derful, beautiful visualizationof what the graph looks likeand it was like, ‘Wow, thisfeels like a much more usefuland interesting dataset thanwe first thought’.” (P1)

    “For me, I was sold even atthe very beginning with theinitial tool [Figure 4], whichwas more of a strict visual-ization tool that shows youthe whole of the graph color-coded... Right there, I got it,I totally understand what thevalue would be of having allof Bing in a graph.” (P2)

    “There was initially a biaswhere people would look atthis and think it was [just] avisualization tool... but reallypeople are not interested invisualization tools in and ofthemselves.” (P2)

    With Embeddings Browser (Figure 6), the user can ex-plore many possible rankings of entity recommendationsrelated to a given query, augmented with information fromour knowledge graph and query logs (e.g., number of typed

    Figure 5: Hierarchy Viewer for the query “Gone with the wind”,listing top entities in the matching leaf cluster. Includes a leadactor (Vivien Leigh), their character (Scarlett O’Hara), and theirfamous lover not cast in the film (Lawrence Olivier).

    queries vs requery clicks). Our use of the LineUp visual-ization [3] allows the user to rerank the recommended en-tities based on any attribute (e.g., its own frequency vs itssimilarity to the query). The user can also create weightedcombinations of attributes by dragging columns onto oneanother, before dynamically reweighting the attributes (andthus reranking the results) by directly manipulating relativecolumn widths. Recommendations may also be filtered in-teractively (e.g., based on frequencies and/or entity types)to derive multiple kinds of recommendation, each with adistinctive character and purpose in terms user experience.

    With Model Viewer (Figure 1), the user can load any sub-set of the embedding models for side-by-side comparisonof the top N results across a series of queries. Result listscan be filtered to show their union (all items), intersection(common items), and symmetric difference (unique items).

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 4

  • Figure 6: Embedding Browser for the query “BMW”. Resultsranked by similarity (top) are refined by jointly ranking onfrequency (75%–25%) and filtering by entity type (bottom).

    Toggling any item highlights that item in a common color

    Evaluation – deciding oninitial use. Experimentationpersuaded segment ownersto trial in production:

    “When we meet with differentsegment owners, we showthe tool and they really likeplaying with it, because theyfind for themselves and theyunderstand for themselvesthat combining similarity withother signals works well ingeneral... eventually theyfamiliarize themselves withenough queries and resultsthat they trust the data andgive the go ahead to the en-gineers to take the raw dataand try to scale this and flightthis and ship this.” (P1)

    Trial – learning from expe-riences. Personal evalua-tions outweighed prior resultsin other areas:

    “the other segment owners,they all want to be individu-ally convinced by more thanjust a flight... when I showthem this view [Figure 6], itlooks user friendly, it lookshigh quality, and they feelempowered to make theirown decision.” (P1)

    across all juxtaposed models, while “Rank by Selection”ranks all models based on their coverage of the selecteditems (using the Jaccard set similarity measure of intersec-tion over union). Finally, “Model Stats” presents a matrixcomparing all loaded models to one another using eitherJaccard similarity or average precision (treating the modelrepresented by each row as the ground truth ranking forall other models). Together, these capabilities allow de-tailed comparison of many candidate models across a widerange of queries, allowing the user to understand which ofthe existing models works best for their needs, and which

    Figure 7: Similar products feature on bing.com, showing acarousel of recommended links for smartphones similar to thequery “iPhone XR” (Country/Region: United States – English).

    new models should be developed and evaluated next be-fore flighting or releasing to production. Figure 7 showsthe desktop and mobile experiences of one such featureshipped on bing.com – “Similar products” recommendation.

    Finally, the web application offers a link to “Segment spread-sheets” – a SharePoint folder in which the top recommen-dations for all queries associated with each predefinedsearch segment (e.g., health or automotive) are exportedto their own Excel spreadsheets for custom analysis (e.g.,by filtering and ranking in Excel, by importing to BusinessIntelligence platforms like Power BI and Tableau, or by per-forming data science using languages like R and Python).

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 5

  • How Visual Representations Drove AdoptionWe interviewed two of our primary partners in Bing to un-derstand their experience of the technology transfer pro-cess and how our interfaces have influenced sense-makingand decision-making across the organization. Both arehighly experienced program managers who have workedin a variety of Microsoft product groups before their cur-rent roles in Bing, where they are responsible for the devel-opment of new metrics and features, as well as the cross-group coordination with segment owners (e.g., for querieson health, sports, politics, food, etc.) required to ship newand improved features in production.

    A central theme emerging from these interviews was that vi-sual representations had played a vital role at each stage ofthe adoption process [7], which we have illustrated througha series of sidebars concluding in the left margin of thispage. In the sections below, we reflect on how exploratorydata interfaces helped to make sense of search.

    Adoption – deciding oncontinued use. Improve-ments in user experienceand revenue ultimately se-cured continued investment:

    “In the past we were just us-ing PMI to do related entities,but the unique challenge inproduct segments is thatthere is a very sparse signalof users transitioning fromone product to another. Thecool thing now is the similar-ity score, where even if youdon’t have a lot of raw sig-nal, we’re able to find closerelationships. So really ithelped us to create the re-lated entities feature that wecouldn’t have done just withraw data... and it’s having adirect monetary impact onBing, it’s really exciting.” (P1)

    Overall, this analysis showsthe power of exploratory datainterfaces to overcome boththe sensitivity of machinelearning and the subjectivityof human decision makers,as well as the power of graphmodels and embeddings toreveal latent value in data.

    Visual representations suggest new metrics and experiencesP1 described the close relationship between new visual-izations and metrics inspired by newly-visible phenomena:“The question is, ‘What should the next metric be, and howshould we get people to buy into that metric?’ And really vi-sualization is the key there, because if people can see theproblem, they will want to act on it and track the improve-ment over time”. From “the maybe 50-60 metrics” he hadcreated and presented to executives, navigation graph met-rics had “gained the most traction”. One reason was theclear advantage that embedding-based methods had overthe use of PMI, which was the prevailing standard prac-tice: “It’s what you’re blind to with PMI ... let’s say you’re onthis specific node. PMI is just going to tell you, ‘go here, gohere, go here, go here... [to adjacent nodes]’, but that’s notthe only thing the user should do. They should also move

    to nearby clusters”. This sense of ‘bottom up’ query cluster-ing in ways that may be independent of ‘top down’ segmentdefinitions, along with a sense for the spatial proximity ofsuch clusters, both emerged from exposure to our earlynode-link visualization of navigation graphs (Figures 3 & 4).

    Visual clusters reveal gaps in existing knowledgeP2 explained how one of the most important capabilities of-fered by our graph embedding approach was the automaticgeneration of “reasonably clustered sets where we can kindof narrow down and identify how dense the entity graphis within that particular cluster”. Tool users can thus inter-actively answer the questions, “Are there missing entitiesin that particular cluster? Are the entities too generic? Dowe need more specific entities in [our knowledge graph]?”.At the same time, both partners critiqued how the query-driven, list-based representation in this view showed only anarrow slice through the navigation graph, and thus lackedthe perceptual and navigational affordances of the node-link visualizations we had initially used to communicate theideas of navigation graphs and graph embeddings (Figures3 & 4). While our design choices with the current tool wereguided by the ‘search and list’ paradigm most familiar toour partner team, this reinforces the idea that new tools canlead to cultural change (which the tools must then follow).

    P1 gave a detailed account of how a Bing-scale navigationgraph combined with the ability to experiment in real-timeenabled a new kind of collaborative decision-making: “Let’ssay I’m on a call and they say, ‘I have this problem...’, I cango there in real-time and try their specific query, their spe-cific segment, tweak the ranking, the filtering, and show it tothem in real-time... being able to explore all this intelligencefrom the dataset in real-time is valuable because you areriding the wave of the meeting you’re in... Instead of saying,‘let’s schedule another meeting’, or ‘I’ll reply after the meet-

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 6

  • ing with some extra insights’, you can extract it in real-timeand have a discussion in real-time... there’s a lot of value inthe tool but what I like the most is that I have access to thefull dataset and I can explore it in real-time”.

    Visual juxtaposition builds understanding of methodsP1 called out Model Viewer as ending up “the most usefulto us” as it allowed direct model comparison at scale, em-phasizing their similarities and differences as well as theirsuitability to different problems: “We can compare differentembedding algorithms side-by-side and this is how we re-ally understood that one algorithm is really good for typos,one is really good for sub-intent, one is really good for re-lated entities. That was really a breakthrough there, andonce you understand which embedding algorithm is goodfor what, then you need to try it with different permutationsof hyperparameters for tuning, and this is the other partwhere this tool was useful, because then you can try like200 permutations and figure out which one’s best”. Show-ing an example query for ‘Bill Gates’, P2 also called outthe different qualities of result lists arising from the use ofAdjacency vs Laplacian spectral embedding: “In the ASEmodel, you can see that this is a really tight set of entities,especially at the top. This would be the ‘find missing entitiesand find which entities are related to a specific entity’ cut.Whereas the LSE model shows a lot of re-queries with theterm Bill Gates, which is so powerful... For example, youcould look at the popular re-query terms for all the peoplethat are like Bill Gates, and then you could rank those to getthe best set of re-query terms for that particular entity type.And by entity type, Bill Gates would be people.person, but ifyou look at this [ASE] list you can clearly see we could cre-ate a new type of people.person.billionaire”. These insights– that multiple embedding methods can be used togetherto infer new entity types as well as their sub-intents – wereboth consequences of visual model juxtaposition.

    ConclusionIn this case study, we have shown the value of both induc-ing and embedding navigation graphs as a novel approachto “making sense of search”. While the resulting data as-sets may be a necessary foundation for universal queryrecommendation, such data alone are insufficient for drivingdecisions by product owners to trial the data and associatedalgorithms in production. In our experience, such decisionsgreatly benefit from the visual representation and explo-ration of data, especially when the data are comprehensive,the queries are relevant to the viewer, and the results aretunable based on subjective notions of quality.

    At the same time, results that look good to software engi-neers and program managers in principle may not performwell with end-users in practice, and the continued use andbroader adoption of navigation graph embeddings withinBing will always depend on their measured impact on bothuser engagement and revenue. Our models are driving ex-perimental flights of both desktop and mobile experiences,as well as shipped experiences for “Similar products” and“People also search for” in a limited but expanding set ofsegments and markets. We are observing statistically sig-nificant improvements on the general metrics of sessionsuccess rate, time to success, and estimated revenue perclick, as well as recommendation clicks and carousel ac-tions. We hope that our work continues to transform theBing user experience for the better and that others maybenefit from the lessons of this case study.

    AcknowledgementsWe thank our team members in Microsoft Research, ourproduct group partners in Bing, and Carey Priebe and histeam at the Johns Hopkins University Applied Mathematicsand Statistics Department.

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 7

  • REFERENCES[1] Hongyun Cai, Vincent W. Zheng, and Kevin

    Chen-Chuan Chang. 2018. A comprehensive survey ofgraph embedding: Problems, techniques, andapplications. IEEE Transactions on Knowledge andData Engineering 30, 9 (2018), 1616–1637.

    [2] Xiaobin Fu, Jay Budzik, and Kristian J. Hammond.2000. Mining Navigation History for Recommendation.In Proceedings of the 5th International Conference onIntelligent User Interfaces (IUI ’00). ACM, New York,NY, USA, 106–112. DOI:http://dx.doi.org/10.1145/325737.325796

    [3] Samuel Gratzl, Alexander Lex, Nils Gehlenborg,Hanspeter Pfister, and Marc Streit. 2013. Lineup:Visual analysis of multi-attribute rankings. IEEEtransactions on visualization and computer graphics19, 12 (2013), 2277–2286.https://github.com/lineupjs

    [4] Aditya Grover and Jure Leskovec. 2016. node2vec:Scalable feature learning for networks. In Proceedingsof the 22nd ACM SIGKDD international conference onKnowledge discovery and data mining. ACM, 855–864.https://github.com/aditya-grover/node2vec

    [5] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena.2014. DeepWalk: Online learning of socialrepresentations. In Proceedings of the 20th ACMSIGKDD international conference on Knowledgediscovery and data mining. ACM, 701–710.https://github.com/phanein/deepwalk

    [6] Carey E. Priebe, Youngser Park, Joshua T. Vogelstein,John M. Conroy, Vince Lyzinski, Minh Tang, AvantiAthreya, Joshua Cape, and Eric Bridgeford. 2019. Ona two-truths phenomenon in spectral graph clustering.Proceedings of the National Academy of Sciences116, 13 (2019), 5995–6000.

    [7] Everett M. Rogers. 1962. Diffusion of innovations (1sted.). Free Press of Glencoe, New York. (book).

    [8] John Stasko, Carsten Görg, and Zhicheng Liu. 2008.Jigsaw: supporting investigative analysis throughinteractive visualization. Information visualization 7, 2(2008), 118–132.

    [9] Eva Suárez-García, Alfonso Landin, Daniel Valcarce,and Álvaro Barreiro. 2018. Term Association Measuresfor Memory-based Recommender Systems. InProceedings of the 5th Spanish Conference onInformation Retrieval. ACM, 6.

    [10] Romain Vuillemot, Tanya Clement, Catherine Plaisant,and Amit Kumar. 2009. What’s being said near“Martha”? Exploring name entities in literary textcollections. In 2009 IEEE Symposium on VisualAnalytics Science and Technology. IEEE, 107–114.

    [11] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang,Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scalecommodity embedding for e-commercerecommendation in alibaba. In Proceedings of the 24thACM SIGKDD International Conference on KnowledgeDiscovery & Data Mining. ACM, 839–848.

    CHI 2020 Case Study CHI 2020, April 25–30, 2020, Honolulu, HI, USA

    CS36, Page 8

    http://dx.doi.org/10.1145/325737.325796https://github.com/lineupjshttps://github.com/aditya-grover/node2vechttps://github.com/phanein/deepwalk

    IntroductionGraph EmbeddingNavigation Graphs from Query LogsData Interfaces for Making Sense of SearchHow Visual Representations Drove AdoptionVisual representations suggest new metrics and experiencesVisual clusters reveal gaps in existing knowledgeVisual juxtaposition builds understanding of methods

    ConclusionREFERENCES

Recommended