Brave new search world

  • Published on
    15-Apr-2017

Transcript

<ul><li><p>Brave New Search World. Ran Hock, Online Strategies, ran@onstrat.com</p></li>
<li><p>Brave New Search World: The nature of search is changing radically. Structure is being created from (relatively) unstructured data. The Semantic Web is becoming an actuality. Natural Language Processing (NLP) and other technologies are being extensively applied to search and search-related activities.</p><p>In the description of this talk, which you have read, I say that the nature of search is changing "radically." The changes are "radical" largely in terms of what we can do with "search," both in how and when we (information professionals and the masses alike) perform searches and in the kinds of results we get. One quick example is the speed with which a question can be asked orally, in ordinary language, and an answer (not a list of resources) received almost instantaneously.</p></li>
<li><p>Brave New Search World: These technologies are making the following kinds of things happen: knowledge graphs; entity identification in numerous applications; natural language search statements; actual searching of images (not just of image metadata). These advances are coming not just from Google but from numerous services, especially for news search.</p></li>
<li><p>Some Themes/Perspectives: What is happening is more evolutionary than revolutionary. Many, but not all, of the "pieces" of the technology have been around for a while. Structure is being derived out of (not total) chaos.
We are going from words to meaning. Google isn't the only player here. We can take real advantage of the developments. Using what you already know about search is important.</p><p>As we go along over the next 40 or so minutes, you'll notice several recurring themes or, perhaps, perspectives. The pieces are not new: the idea of augmenting results pages with collections of facts about the topic dates back to AltaVista and Yahoo in the mid-1990s. People can see structure (linguistic structure) that machines don't easily see. Bing, news sites, and many others are involved in improving search technologies. Especially if we more fully understand the basic ideas of some of the technologies, we can make fuller use of what they provide. Human searchers can still accomplish things the technologies can't.</p></li>
<li><p>Unstructuredness of Data: Part of the organization-of-knowledge problem. Particularly acute for textual material. To a computer, a word is a string of characters bounded by spaces or punctuation and has no meaning. When we are searching for something, we are searching for meaningful things, not character strings. Meaning can be derived from context by the use of NLP.</p><p>In a sense, the organization of data is at the core of what the information profession is all about. Except for a few cues such as headings, full text is just a collection of word strings to a typical computer. NLP is the magic potion that changes strings to meaning.</p></li>
<li><p>Where We Were Recently: Boolean Logic. Actually a precursor/example of Artificial Intelligence (AI) applied to search. Still a part of search AI. Boolean is (from our infancy) a central aspect of how we think, a part of our consciousness. Old approach: searching by concepts.</p><p>Boolean has been the primary "technology" since the beginning of computerized information retrieval.
Since the essence of Boolean is an intellectual means of identifying, from a group of items, those that have a specific combination of characteristics, Boolean is likely to be a big thing for a long time.</p></li>
<li><p>Where We Were Recently: Old (circa 1975 to 2???) search strategy (searching by concepts)</p><p> OR</p><p>For decades, and to varying degrees up to the present, this is the general approach that I and others who teach Internet search have used: searching by concepts, that is, identifying the essential concepts and then the alternate terms that might indicate the presence of each concept. Whether or not you search using a chart like this, this is one way a professional searcher thinks: concepts and related </p></li>
<li><p>Where We Were Recently (cont.): Ranking of web search results was/is based on a wide range (ca. 200) of factors, or signals. User-controlled field searching (intitle:, etc.). Etc.</p></li>
<li><p>The Newer Technologies: Semantic Web technologies; Artificial Intelligence (AI), used at a broad level and utilizing various AI subfields; AI - Expert Systems approaches; AI - Natural Language Processing (NLP); AI - NLP - Entity identification (extraction, disambiguation, classification, etc.); AI - Machine Learning; Big Data processing.</p><p>Though there are other technologies involved, I think the main ones regarding things currently happening are the ones listed here. And, perhaps obviously, because of the nature of these technologies, there's considerable overlap between some of these categories. Semantic ... particularly the things that can be done at the webpage level. Entity ...
And if I had to pick out the one from the list that makes the biggest difference, it is this.</p></li>
<li><p>Technologies: The Semantic Web. W3C informal definition: "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (From Tim Berners-Lee et al., "The Semantic Web," Scientific American, May 2001.)</p><p>T B-L: creator of the Web. W3C: World Wide Web Consortium, the main international organization for Web standards.</p></li>
<li><p>Technologies: The Semantic Web. Essence: strings to things; words to meaning. Technologically accomplished on webpages by means of a specialized XML markup language, etc.</p><p>If you want further details on this, go to schema.org.</p></li>
<li><p>Technologies: The Semantic Web. Idea born pre-1999. In practice, also requires other technologies such as Natural Language Processing, etc. 2006: Berners-Lee and colleagues stated that "This simple idea ... remains largely unrealized." 2013: more than four million Web domains contained Semantic Web markup.
</p><p>In practice ... Webpage markup can't, by itself, create a semantic web.</p><p>To really accomplish that, all of the words need to have meaning, and for this the markup has to be complemented by other techniques such as natural language processing.</p></li>
<li><p>Technologies: AI - Expert Systems. Search results ranking has long used an expert systems approach, mimicking what an experienced researcher looks for: words appearing in the title; number of times cited (linked to); proximity of words; words in the abstract; words in headings; etc. This will continue, more and more automatically.</p><p>One of the reasons Google became so successful is the way it mimics what a researcher looks for when looking at a collection of articles.</p></li>
<li><p>Technologies: Natural Language Processing. A part of artificial intelligence and computational linguistics. Deals with helping computers understand written and spoken languages. Plays a key role in voice input for search, natural language search statements, translations, and more.</p><p>Again, with NLP, the thrust is to turn words into meaning.</p><p>As I said before, the structure is actually present in text, but it is a challenge to program into a computer the cues that we as humans can rather easily identify.</p></li>
<li><p>Technologies: Natural Language Processing. Google's syntactic systems: predict part-of-speech tags for each word in a given sentence; identify morphological features such as gender and number; label relationships between words, such as subject, object, modification, etc.;
leverage large amounts of unlabeled data; incorporate neural net technology.</p><p>research.google.com/pubs/NaturalLanguageProcessing.html</p><p>Within NLP as it is used by Google, two major systems do a lot of the heavy work in understanding language: a syntactic system and a semantic system.</p><p>Nouns, verbs, pronouns, adjectives, adverbs, prepositions, etc.</p></li>
<li><p>Technologies: Natural Language Processing. Google's semantic systems: identify entities in free text; label them with types (such as person, location, or organization); cluster mentions of those entities within and across documents (co-reference resolution); incorporate multiple sources of knowledge and information to aid with analysis of text.</p><p>research.google.com/pubs/NaturalLanguageProcessing.html</p><p>The semantic systems go beyond the more grammatical structure and examine the broader contextual situation.</p></li>
<li><p>Technologies: Entity Extraction. A.k.a. named-entity recognition, entity identification. Complementary to other natural language processing. Identifies things, people, places, etc. within text (and speech). Relates to the idea of concepts referred to earlier.
Because text is based on language, structure is there, but the structure is not readily evident to a computer.</p></li>
<li><p>Technologies: Entity Extraction. Context-based connections allow discernment of different meanings of a word. Entity extraction draws inferences based on the logical content of the data. Entity extraction may be the single most important tool for bringing structure to unstructured data, specifically text. Also used for search query suggestions. An excellent example is found in Silobreaker.</p><p>What I'm referring to on this slide goes by several different names: entity extraction, entity identification, named-entity recognition, and perhaps other names. And it involves various subsets of activities such as entity classification and entity disambiguation.</p></li>
<li><p>Silobreaker is a Swedish company that has been around since 2005 and was one of the first news services to extensively use entity extraction.</p><p>This slide, showing a search being entered, points out that entity extraction isn't just used for indexing for the retrieval part of the search, but also for providing organized search terminology, basically a somewhat-controlled vocabulary.</p></li>
<li><p>This slide shows how Silobreaker uses named entities to visualize connections between entities.</p><p>By the way, visualizations similar to this, but showing connections between retrieved domains, have been around for a long time.
AltaVista in the mid-1990s was showing visualizations of connections between the first 200 retrieved records in a search.</p></li>
<li><p>This screenshot from a Silobreaker search shows named entities by classification: people, companies, groups, places, activities, and so on.</p></li>
<li><p>Technologies: Machine Learning. Computers teaching themselves. Google RankBrain: used in processing search results, part of Google's Hummingbird search algorithm. A way of interpreting a search statement in order to find web pages that may not have the specific words in the search statement. Uses patterns from seemingly unconnected other complex searches to find similarities in the current search, then applies that information to find the content most likely to be useful. Google regards this as the third most important signal.</p><p>One rather different technology being used is machine learning, programming that allows a computer to teach itself.</p></li>
<li><p>Technologies: Big Data. The existence of big data collections provides unprecedented opportunities for computational approaches for computers to understand text. In neural networking image entity identification experiments, the accuracy of machine learning algorithms improves vastly when used with large pools of data. "...Google's search engine queries a 100 petabyte index that incorporates over 200 indicators and whose algorithms change more than 500 times per year."</p><p>Large volumes of data now exist and, particularly where statistical analysis is a key part of the process, the more data the merrier.</p></li>
<li><p>Specific Applications of These (and Other) Technologies: continued gradual incorporation of expert techniques; natural language search statements; search by voice; image recognition and search (search of images, search by image, and facial recognition); knowledge graphs; entities in news search.</p><p>Having taken a look at the technologies involved, let's take a more specific look at where in the search process they are being applied.</p><p>There isn't
time to cover all search-related situations, but the ones listed here are the ones most central to search.</p></li>
<li><p>Gradual Incorporation of Expert Techniques: An ordinary search isn't what it used to be. Google has now quietly taken over more of the old professional searcher techniques and now automatically adds not just word variants, but synonyms.</p></li>
<li><p>Gradual Incorporation of Expert Techniques: Suggested searches (based on known connections and not just on your character string)</p><p>A "data-driven" approach (trillions of words) vs. "rules." Not just word variants.</p><p>The old synonyms (~diet) option didn't just go away. It is now applied automatically. (Few people use the OR.)</p></li>
<li><p>Gradual Incorporation of Expert Techniques: "Did you mean" is now more often "Showing results for."</p></li>
<li><p>Gradual Incorporation of Expert Techniques: Fuzzy logic. As well as searching for words that are close, Google may drop some of your concepts for some records.</p></li>
<li><p>Gradual Incorporation of Expert Techniques: If Google thinks you want specific facts and sees a matching answer, you may get that immediately.</p></li>
<li><p>Specific Applications: Natural Language Search Statements. Don't hesitate to use them!</p><p>The above two searches give different (and relevant) answers. This is especially important for Google Now and Siri!</p></li>
<li><p>Specific Applications: Voice Search. Apple (iOS): Siri. Google: Google Now. Bing: Cortana (recently deceased?). These expect natural language, so natural language will yield the best results.</p></li>
<li><p>Specific Applications: Image Recognition and Search: Search of Images. Not much recent obvious change in Bing's or Google's regular image search, but: categorization (an aspect of entity extraction) is now shown on image search results pages. Google, Microsoft (Bing), and Apple are heavily involved in research on image identification and classification. What's happening/coming can be anticipated by looking at Google Photos.</p></li>
<li><p>Specific
Applications: Image Recognition and Search: Search of Images</p><p>Bing Image Search</p></li>
<li><p>Specific Applications: Image Recognition and Search: Search of Images</p></li>
<li><p>Specific Applications: Image Recognition and Search: Search of Images. In December 2015, Microsoft beat out 5 competitors (including Google) in the ImageNet contest for machine recognition of images. Machines were trained to recognize images using a deep neural networking method. Competitors must locate and identify objects from 100,000 photographs found in Flickr and search engines and then place them in 1,000 object categories. Microsoft, the winner, had an error rate of 3.5 percent for classification and 9 percent for localization. Machine learning using neural networking is also very successfully used for translations, such as in Skype's new translation offering.</p></li>
<li><p>Specific Applications: Image Recognition and Search: Search by Image</p></li>
<li><p>Specific Applications: Image Recognition and Search: Entity and Facial Recognition in Google Photos</p></li>
<li><p>Specific Applications: Knowledge Graphs. Knowledge graphs do not originate...</p></li></ul>